CN-122024745-A - Voice acquisition and preprocessing system and method based on software and hardware collaborative acceleration


Abstract

The invention provides a voice acquisition and preprocessing system and method based on software and hardware collaborative acceleration, relating to the technical field of voice signal processing. The system comprises a subscriber line interface circuit module, a field programmable gate array module, a network acceleration processing module and a user-mode voice processing module. The subscriber line interface circuit module is used for acquiring voice signals and performing analog-to-digital conversion on them to obtain digital voice data; the field programmable gate array module is used for preprocessing the digital voice data and packaging the preprocessed data into an Ethernet frame carrying a specific voice identifier; the network acceleration processing module is used for recognizing, at the data link layer, the Ethernet frame whose Ethernet type field is the specific voice identifier and routing that frame to a user-mode application program; and the user-mode voice processing module is used for hosting the user-mode application program so as to execute application-layer voice data processing.

Inventors

ZHEN WEILIANG
LI YAN
FANG SHUAI

Assignees

北京华环电子股份有限公司

Dates

Publication Date: 20260512
Application Date: 20251224

Claims (10)

1. A voice acquisition and preprocessing system based on software and hardware collaborative acceleration, characterized by comprising: a subscriber line interface circuit module, used for collecting voice signals and performing analog-to-digital conversion on the voice signals to obtain digital voice data; a field programmable gate array module, electrically connected with the subscriber line interface circuit module and used for preprocessing the digital voice data and packaging the preprocessed data into an Ethernet frame with a specific voice identifier; a network acceleration processing module, connected with the field programmable gate array module through a network interface and used for identifying, at a data link layer, the Ethernet frame whose Ethernet type field is the specific voice identifier and routing the Ethernet frame to a user-mode application program; and a user-mode voice processing module, used for hosting the user-mode application program so as to execute voice data processing of an application layer.
2. The voice acquisition and preprocessing system based on software and hardware collaborative acceleration according to claim 1, wherein the preprocessing comprises: converting the 8-bit nonlinear PCM code output by the subscriber line interface circuit module into a 16-bit linear PCM code.
3. The voice acquisition and preprocessing system based on software and hardware collaborative acceleration according to claim 1, wherein the preprocessing comprises: calculating the level amplitude of an input signal and performing silence detection in combination with the signal zero-crossing rate; and, when silence is detected, excluding the silence data from the digital voice data and then packaging the remaining data into the Ethernet frame.
4. The voice acquisition and preprocessing system based on software and hardware collaborative acceleration according to claim 1, wherein the Ethernet frame comprises: a destination MAC address field, a source MAC address field, an optional VLAN tag field, an Ethernet type field, a packet count field, a number-of-voice-channels field, a voice channel data field, and a CRC check field; the Ethernet type field is used for identifying a dedicated voice acquisition frame, and the packet count field is used for packet loss detection and for synchronization between master and slave terminals.
5. The voice acquisition and preprocessing system based on software and hardware collaborative acceleration according to claim 1, wherein the network acceleration processing module is a Data Path Acceleration Architecture (DPAA2) module based on an NXP chip; the DPAA2 module comprises a regular expression matching engine, and the regular expression matching engine is used for identifying whether the Ethernet type field of an Ethernet frame is the specific voice identifier.
6. The system of claim 5, further comprising, after the regular expression matching engine identifies the Ethernet frame with the specific voice identifier: redirecting the Ethernet frame to a hardware frame queue, setting the scheduling priority of the hardware frame queue higher than that of an ordinary data buffer queue, and binding the hardware frame queue to a dedicated user-mode input/output interface.
7. The voice acquisition and preprocessing system based on software and hardware collaborative acceleration according to claim 6, wherein the user-mode voice processing module further comprises a user-mode input/output interface for obtaining the Ethernet frame with the specific voice identifier from the hardware frame queue by polling.
8. The voice acquisition and preprocessing system based on software and hardware collaborative acceleration according to claim 1, wherein the user-mode voice processing module is further configured to, prior to executing voice data processing at an application layer: parse the Ethernet frame with the specific voice identifier, extract the count value in the packet count field, and check its continuity against the count value of the previous frame so as to perform packet loss detection.
9. The voice acquisition and preprocessing system based on software and hardware collaborative acceleration according to claim 1, wherein the field programmable gate array module is further configured to: sample and buffer the preprocessed digital voice data at a preset time interval and, once a preset number of samples is reached, package the buffered samples into the Ethernet frame with the specific voice identifier.
10. A voice acquisition and preprocessing method based on software and hardware collaborative acceleration, characterized by comprising the following steps: collecting analog voice signals and performing analog-to-digital conversion on them through a subscriber line interface circuit module to generate digital voice data; preprocessing the digital voice data through a field programmable gate array module, and packaging the preprocessed voice data into an Ethernet frame with a specific voice identifier; identifying the specific voice identifier at a data link layer through a network acceleration processing module and, in response to successful identification, directly routing the Ethernet frame to a user-mode application program; and receiving and parsing the Ethernet frame through a user-mode voice processing module, and executing voice data processing of an application layer.
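As a non-authoritative illustration of the frame layout recited in claim 4 and the counter check recited in claim 8, the following Python sketch packs and parses such a frame in software. The claims do not give field widths or the identifier value, so everything concrete here is an assumption: the EtherType 0x88B5 (an IEEE local experimental value), a 16-bit packet count, an 8-bit channel count, no VLAN tag, no minimum-length padding, and a software CRC-32 standing in for the check field that a MAC would normally append. The function names are hypothetical.

```python
import struct
import zlib
from typing import Optional, Sequence, Tuple

# Assumption: IEEE local experimental EtherType; the patent does not state the value.
VOICE_ETHERTYPE = 0x88B5


def build_voice_frame(dst_mac: bytes, src_mac: bytes, packet_count: int,
                      channels: Sequence[bytes]) -> bytes:
    """Pack MAC header, packet count, channel count and channel data, then append CRC-32."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    header = dst_mac + src_mac + struct.pack("!H", VOICE_ETHERTYPE)
    # Assumed widths: 16-bit packet count, 8-bit number of voice channels.
    body = struct.pack("!HB", packet_count & 0xFFFF, len(channels)) + b"".join(channels)
    frame = header + body
    # The Ethernet FCS is a CRC-32 sent least significant byte first.
    return frame + struct.pack("<I", zlib.crc32(frame))


def parse_voice_frame(frame: bytes) -> Optional[Tuple[int, int, bytes]]:
    """Return (packet_count, channel_count, voice_data), or None if the frame is rejected."""
    if len(frame) < 21:  # 14-byte header + 3 bytes of counters + 4-byte CRC
        return None
    content, fcs = frame[:-4], frame[-4:]
    if struct.unpack("<I", fcs)[0] != zlib.crc32(content):
        return None  # CRC mismatch
    if struct.unpack("!H", content[12:14])[0] != VOICE_ETHERTYPE:
        return None  # not a voice acquisition frame
    packet_count, channel_count = struct.unpack("!HB", content[14:17])
    return packet_count, channel_count, content[17:]


def lost_packets(prev_count: int, count: int) -> int:
    """Frames missing between two consecutively received counters (16-bit wraparound)."""
    return (count - prev_count - 1) & 0xFFFF
```

In this sketch a receiver would keep the previous counter and call `lost_packets(previous, current)` on each parsed frame; a result of 0 means the sequence is continuous, including across the wrap from 0xFFFF to 0.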

Description

Voice acquisition and preprocessing system and method based on software and hardware collaborative acceleration

Technical Field

The invention relates to the technical field of voice signal processing, in particular to a voice acquisition and preprocessing system and method based on software and hardware collaborative acceleration.

Background

With the rapid development of the Internet of Things, the industrial Internet and real-time communication technology, voice communication is increasingly widely applied in scenarios such as industrial intercom, emergency command, voice wake-up and intelligent terminals. These scenarios place extremely high demands on the processing and transmission of voice data, including low latency, low jitter, low power consumption, and high reliability.
In conventional voice communication systems (e.g., IP phones, conference systems, intelligent voice terminals), the processing of voice data generally follows a standardized software path: after voice data is collected by an audio codec, it is first managed by an audio driver layer in the operating system (e.g., ALSA, PulseAudio), then encapsulated and transmitted by a complex network protocol stack (e.g., TCP/IP), and finally delivered to a user-mode application for further processing. This process involves frequent user/kernel mode context switching, memory copying and scheduling delay, resulting in high system latency, heavy CPU load and high power consumption, and making it difficult to serve scenarios with strict real-time requirements (such as industrial intercom, emergency communication and voice wake-up).
In the related art, some schemes adopt DMA (Direct Memory Access) or a hardware acceleration module to reduce the CPU load, but they still rely on kernel drivers to manage the data path and cannot completely bypass the operating system protocol stack. They also lack a dynamic hardware-level scheduling mechanism for voice channel priority; they do not use silence detection to dynamically adjust the acquisition timing, so invalid data is continuously transmitted; and carrying voice data in general-purpose Ethernet frames incurs additional encapsulation/parsing overhead and cannot be rapidly identified by hardware.

Disclosure of Invention

The invention provides a voice acquisition and preprocessing system and method based on software and hardware collaborative acceleration, which are used for overcoming the defects of existing voice technology, namely a long data transmission path, high CPU load, and large delay and jitter caused by dependence on the operating system kernel protocol stack, which leave it unable to meet the requirements of real-time scenarios such as industrial intercom.
The invention provides a voice acquisition and preprocessing system based on software and hardware collaborative acceleration, which comprises: a subscriber line interface circuit module, used for collecting voice signals and performing analog-to-digital conversion on the voice signals to obtain digital voice data; a field programmable gate array module, electrically connected with the subscriber line interface circuit module and used for preprocessing the digital voice data and packaging the preprocessed data into an Ethernet frame with a specific voice identifier; a network acceleration processing module, connected with the field programmable gate array module through a network interface and used for identifying, at a data link layer, the Ethernet frame whose Ethernet type field is the specific voice identifier and routing the Ethernet frame to a user-mode application program; and a user-mode voice processing module, used for hosting the user-mode application program so as to execute voice data processing of an application layer.
According to the voice acquisition and preprocessing system based on software and hardware collaborative acceleration provided by the invention, the preprocessing comprises: converting the 8-bit nonlinear PCM code output by the subscriber line interface circuit module into a 16-bit linear PCM code.
According to the voice acquisition and preprocessing system based on software and hardware collaborative acceleration provided by the invention, the preprocessing comprises: calculating the level amplitude of an input signal and performing silence detection in combination with the signal zero-crossing rate; and, when silence is detected, excluding the silence data from the digital voice data and then packaging the remaining data into the Ethernet frame.
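The two preprocessing steps described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the FPGA logic of the invention: the text only says "8-bit nonlinear PCM", so ITU-T G.711 A-law is assumed as the companding law, and the silence thresholds (`level_threshold`, `zcr_threshold`) and the rule for combining level and zero-crossing rate are illustrative choices. The function names are hypothetical.

```python
from typing import List, Sequence


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code (assumed companding law) to a 16-bit linear sample."""
    code ^= 0x55                      # undo the even-bit inversion applied by the encoder
    value = (code & 0x0F) << 4        # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment (exponent)
    if segment == 0:
        value += 8
    elif segment == 1:
        value += 0x108
    else:
        value = (value + 0x108) << (segment - 1)
    return value if code & 0x80 else -value   # sign bit set means positive


def expand_frame(codes: bytes) -> List[int]:
    """Expand a buffer of A-law codes to linear samples."""
    return [alaw_to_linear(c) for c in codes]


def is_silence(samples: Sequence[int], level_threshold: float = 200.0,
               zcr_threshold: float = 0.25) -> bool:
    """Classify a frame as silence from its mean absolute level and zero-crossing rate.

    Assumed rule: a frame counts as silence only when both the level and the
    zero-crossing rate are low, so that quiet but rapidly alternating frames
    (for example unvoiced consonants) are still transmitted.
    """
    if len(samples) < 2:
        return True
    level = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(samples) - 1)
    return level < level_threshold and zcr < zcr_threshold
```

For example, a buffer filled with the A-law code 0xD5 expands to samples of value 8, which this sketch classifies as silence, so such a frame would be dropped before packaging.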
According to the voice acquisition and preprocessing system based on the software and hardware collaborative acceleration provided by the invention, the Ethernet frame comprises: a destination MAC address field, a source MAC address, a VLAN tag optional field, an ethernet type field, a packet count field, a number of voice channels field, a voice channel data fi