CN-121999789-A - Directional pickup system of electric counter array based on neural network
Abstract
The invention relates to the technical field of voice processing and discloses a neural network-based power counter array directional pickup system. The system extracts a complex spectrum, extracts a diffusion scale, calculates the slot-direction sound pressure ridge mobility, generates a retained spectrum, generates a primary target spectrum, generates a final target spectrum, and outputs a target net direct sound pressure sequence. Around the sound field of the power counter's long-slot structure, the invention builds a spatial feature chain consisting of the sound pressure center of gravity, the second-order distribution matrix, the diffusion scale, the slot-direction projection coordinate, the sound pressure ridge elongation ratio and the slot-direction sound pressure ridge mobility, and provides a direct sound retention network, a migration compensation synthesis mechanism and a residual suppression network. The slot propagation characteristics caused by the counter structure are thereby explicitly incorporated into the discrimination and synthesis process, so that interception components, reflection and diffusion components, and residual interference are attenuated in a more targeted way in a complex window sound field, and the output target net direct sound pressure sequence is better suited to subsequent speech recognition, recording archiving and service quality inspection.
Inventors
- LIU TONGXU
- LIU WEIMIN
- YI FEI
- WANG YUNCHUAN
- WU JIAQI
- WANG CHUANGYE
- QIU DEGUI
- WANG YU
- SHEN SHILIN
- JIN JIAJIA
- LIU TONGTONG
- HE ZHIFAN
- SHANG SHUAI
Assignees
- State Grid Anhui Electric Power Co., Ltd. Bengbu Power Supply Company (国网安徽省电力有限公司蚌埠供电公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-03-24
Claims (8)
- 1. A neural network-based power counter array directional pickup system, characterized by comprising: a first module for transforming the time-domain sound pressure sequence to extract a complex spectrum; a second module for calculating sound pressure energy from the complex spectrum, combining the microphone two-dimensional coordinates with the sound pressure energy to obtain a sound pressure center of gravity, constructing a second-order distribution matrix from the sound pressure center of gravity and the sound pressure energy, and extracting a diffusion scale; a third module for projecting the sound pressure center of gravity to obtain a slot-direction projection coordinate, computing the sound pressure ridge elongation ratio from the second-order distribution matrix, averaging the sound pressure energy to obtain an average energy, and fusing the slot-direction projection coordinate displacement difference, the sound pressure ridge elongation ratio and the average energy to obtain the slot-direction sound pressure ridge mobility; a fourth module for inputting the logarithm of the sound pressure energy, the slot-direction projection coordinate, the sound pressure ridge elongation ratio and the slot-direction sound pressure ridge mobility into a direct sound retention network to output a retention weight, and generating a retained spectrum by combining the retention weight with the complex spectrum; a fifth module for extracting a migration compensation coefficient from the second-order distribution matrix, fusing the migration compensation coefficient, the slot-direction sound pressure ridge mobility and the sound pressure center of gravity to generate a compensated synthesis center, calculating channel synthesis weights from the distance between the microphone two-dimensional coordinates and the compensated synthesis center, and generating a primary target spectrum by combining the channel synthesis weights with the retained spectrum; a sixth module for calculating residual energy from the complex spectrum and the retained spectrum, inputting the residual energy, the logarithm of the primary target spectrum, the slot-direction sound pressure ridge mobility, the migration compensation coefficient and the diffusion scale into a residual suppression network to output a residual suppression gain, and generating a final target spectrum by combining the residual suppression gain with the primary target spectrum; and a seventh module for performing an inverse transform on the final target spectrum to output a target net direct sound pressure sequence.
- 2. The neural network-based power counter array directional pickup system of claim 1, wherein transforming the time-domain sound pressure sequence to extract the complex spectrum comprises: multiplying the time-domain sound pressure sequence by an analysis window function to extract a short-time data segment; and multiplying the short-time data segment by a complex exponential basis function and summing over the discrete sample index to obtain the complex spectrum.
- 3. The neural network-based power counter array directional pickup system of claim 1, wherein calculating sound pressure energy from the complex spectrum, combining the microphone two-dimensional coordinates with the sound pressure energy to obtain a sound pressure center of gravity, constructing a second-order distribution matrix from the sound pressure center of gravity and the sound pressure energy, and extracting a diffusion scale comprises: calculating the squared modulus of the complex spectrum to obtain the sound pressure energy; multiplying the microphone two-dimensional coordinates by the sound pressure energy and summing over the microphone channels to obtain a numerator, summing the sound pressure energy over the microphone channels and adding a minimum positive value to obtain a denominator, and dividing the numerator by the denominator to obtain the sound pressure center of gravity; subtracting the sound pressure center of gravity from the microphone two-dimensional coordinates to obtain a coordinate offset vector, multiplying together the coordinate offset vector, its transpose and the sound pressure energy, summing over the microphone channels to obtain a matrix numerator, and dividing the matrix numerator by the denominator to construct the second-order distribution matrix; and adding the diagonal elements of the second-order distribution matrix to extract the diffusion scale.
- 4. The neural network-based power counter array directional pickup system of claim 1, wherein projecting the sound pressure center of gravity to obtain a slot-direction projection coordinate, computing the sound pressure ridge elongation ratio from the second-order distribution matrix, averaging the sound pressure energy to obtain an average energy, and fusing the slot-direction projection coordinate displacement difference, the sound pressure ridge elongation ratio and the average energy to obtain the slot-direction sound pressure ridge mobility comprises: taking the inner product of the sound pressure center of gravity and the long-slot direction unit vector to obtain the slot-direction projection coordinate; extracting a first eigenvalue and a second eigenvalue of the second-order distribution matrix, and dividing the first eigenvalue by the sum of the second eigenvalue and a minimum positive value to obtain the sound pressure ridge elongation ratio; summing the sound pressure energy over all microphone channels and dividing by the total number of microphone channels to obtain the average energy; subtracting the slot-direction projection coordinate at the original frequency from the slot-direction projection coordinate at the frequency with the offset added, and taking the absolute value to obtain the absolute displacement difference of the slot-direction projection coordinate; and multiplying together the absolute displacement difference of the slot-direction projection coordinate, the sound pressure ridge elongation ratio and the average energy, summing over the frequency dimension to obtain a deformation numerator, summing the average energy over the frequency dimension and adding a minimum positive value to obtain a confidence denominator, and dividing the deformation numerator by the confidence denominator to obtain the slot-direction sound pressure ridge mobility.
- 5. The neural network-based power counter array directional pickup system of claim 1, wherein inputting the logarithm of the sound pressure energy, the slot-direction projection coordinate, the sound pressure ridge elongation ratio and the slot-direction sound pressure ridge mobility into the direct sound retention network to output a retention weight, and generating the retained spectrum by combining the retention weight with the complex spectrum comprises: adding a minimum positive value to the sound pressure energy and then taking the natural logarithm to obtain the logarithm of the sound pressure energy; concatenating the logarithm of the sound pressure energy, the slot-direction projection coordinate, the sound pressure ridge elongation ratio and the slot-direction sound pressure ridge mobility to construct a first input feature vector; feeding the first input feature vector into the direct sound retention network for a forward pass and applying an activation function to obtain the retention weight; and multiplying the retention weight element-wise with the complex spectrum to strip the interception components and obtain the retained spectrum.
- 6. The neural network-based power counter array directional pickup system of claim 1, wherein extracting a migration compensation coefficient from the second-order distribution matrix, fusing the migration compensation coefficient, the slot-direction sound pressure ridge mobility and the sound pressure center of gravity to generate a compensated synthesis center, calculating channel synthesis weights from the distance between the microphone two-dimensional coordinates and the compensated synthesis center, and generating a primary target spectrum by combining the channel synthesis weights with the retained spectrum comprises: dividing the first eigenvalue by the sum of the first eigenvalue, the second eigenvalue and a minimum positive value to obtain the migration compensation coefficient; multiplying together the migration compensation coefficient, the slot-direction sound pressure ridge mobility and the long-slot direction unit vector to obtain an offset vector, and subtracting the offset vector from the sound pressure center of gravity to obtain the compensated synthesis center; subtracting the compensated synthesis center from the microphone two-dimensional coordinates and taking the dot product of the resulting vector with itself to obtain a squared spatial distance, negating the squared spatial distance and dividing it by the sum of the diffusion scale and a minimum positive value to obtain an exponent argument, and applying the natural exponential function to the exponent argument to obtain an unnormalized decay value; dividing the unnormalized decay value by the sum of the unnormalized decay values over all microphone channels to obtain the channel synthesis weights; and multiplying the channel synthesis weights by the retained spectrum and summing over the microphone channels to synthesize the primary target spectrum.
- 7. The neural network-based power counter array directional pickup system of claim 1, wherein calculating residual energy from the complex spectrum and the retained spectrum, inputting the residual energy, the logarithm of the primary target spectrum, the slot-direction sound pressure ridge mobility, the migration compensation coefficient and the diffusion scale into the residual suppression network to output a residual suppression gain, and generating a final target spectrum by combining the residual suppression gain with the primary target spectrum comprises: subtracting the retained spectrum from the complex spectrum to obtain a complex difference, calculating the squared modulus of the complex difference, and summing over the microphone channels to obtain the residual energy; calculating the squared modulus of the primary target spectrum, adding a minimum positive value and taking the natural logarithm to obtain the logarithm of the primary target spectrum, and adding a minimum positive value to the residual energy and taking the natural logarithm to obtain the logarithm of the residual energy; concatenating the logarithm of the primary target spectrum, the slot-direction sound pressure ridge mobility, the migration compensation coefficient, the diffusion scale and the logarithm of the residual energy to construct a second input feature vector; feeding the second input feature vector into the residual suppression network for a forward pass and applying an activation function that limits the output range to obtain the residual suppression gain; and multiplying the primary target spectrum by the residual suppression gain to output the final target spectrum.
- 8. The neural network-based power counter array directional pickup system of claim 1, wherein performing an inverse transform on the final target spectrum to output a target net direct sound pressure sequence comprises: performing an inverse short-time Fourier transform on the final target spectrum and outputting the target net direct sound pressure sequence.
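The spatial feature chain of claims 3, 4 and 6 can be sketched in NumPy for a single time frame. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names `spatial_features` and `primary_target`, the value of `EPS`, the one-bin frequency offset `df`, the eigenvalue ordering (first = larger), and the restriction of both mobility sums to the bins where the displacement is defined are all assumptions. The networks of claims 5 and 7 are not specified in the text and are not modelled; the retained spectrum is simply passed in.

```python
import numpy as np

EPS = 1e-8  # stands in for the "minimum positive value"; its size is an assumption


def spatial_features(X, mic_xy, slot_dir, df=1):
    """Claims 3 and 4, plus the coefficient of claim 6, for one frame.

    X: (M, F) complex spectrum, mic_xy: (M, 2) microphone coordinates,
    slot_dir: (2,) unit vector along the slot, df: assumed bin offset (>= 1).
    """
    E = np.abs(X) ** 2                                  # sound pressure energy, (M, F)
    denom = E.sum(axis=0) + EPS                         # shared denominator, (F,)
    centroid = (mic_xy.T @ E / denom).T                 # center of gravity, (F, 2)
    off = mic_xy[:, None, :] - centroid[None, :, :]     # coordinate offsets, (M, F, 2)
    S = np.einsum("mf,mfi,mfj->fij", E, off, off) / denom[:, None, None]
    diffusion = np.trace(S, axis1=1, axis2=2)           # diffusion scale, (F,)

    proj = centroid @ slot_dir                          # slot-direction projection, (F,)
    lam = np.linalg.eigvalsh(S)                         # ascending eigenvalues, (F, 2)
    lam1, lam2 = lam[:, 1], lam[:, 0]                   # assumption: "first" = larger
    elong = lam1 / (lam2 + EPS)                         # ridge elongation ratio
    mean_E = E.mean(axis=0)                             # average energy per bin

    shift = np.abs(proj[df:] - proj[:-df])              # absolute displacement difference
    num = np.sum(shift * elong[:-df] * mean_E[:-df])    # deformation numerator
    mobility = num / (np.sum(mean_E[:-df]) + EPS)       # one scalar per frame

    kappa = lam1 / (lam1 + lam2 + EPS)                  # migration compensation coefficient
    return dict(centroid=centroid, diffusion=diffusion, elong=elong,
                mobility=mobility, kappa=kappa)


def primary_target(R, mic_xy, slot_dir, feats):
    """Remaining steps of claim 6. R: (M, F) retained spectrum."""
    shift_vec = (feats["kappa"] * feats["mobility"])[:, None] * slot_dir  # (F, 2)
    center = feats["centroid"] - shift_vec              # compensated synthesis center
    d2 = np.sum((mic_xy[:, None, :] - center[None]) ** 2, axis=-1)  # (M, F)
    decay = np.exp(-d2 / (feats["diffusion"] + EPS))    # unnormalized decay value
    w = decay / decay.sum(axis=0)                       # channel synthesis weights
    return np.sum(w * R, axis=0), w


# Hypothetical four-microphone layout with equal energy on every channel.
mic_xy = np.array([[1.0, 0.5], [1.0, -0.5], [-1.0, 0.5], [-1.0, -0.5]])
slot_dir = np.array([1.0, 0.0])
X = np.ones((4, 6), dtype=complex)
feats = spatial_features(X, mic_xy, slot_dir)
Y, w = primary_target(X, mic_xy, slot_dir, feats)       # retention weight taken as 1
```

In this example the symmetric layout places the center of gravity at the origin for every bin, so the projection does not move across frequency, the mobility is zero, and the four channel weights are equal. A bin with no energy could give a zero weight sum in this sketch; the claims do not say how that case is handled.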
Description
Directional pickup system of electric counter array based on neural network
Technical Field
The invention relates to the technical field of voice processing, in particular to a neural network-based power counter array directional pickup system.
Background
Power business windows typically use a counter barrier structure for document and bill handover, business inquiries and separation of staff from customers. Such windows usually have a glass barrier, an elongated conversation slot and a fixed countertop, so the speech propagation path between the customer and the teller is strongly constrained by the structure. During propagation, the target speech reaches the array microphones directly, but it is also reflected, diffracted and diffused at the boundaries of the glass, the countertop and the window, so the acquired multichannel signal contains the target direct sound together with various non-target components. In power counter scenarios that need to preserve the content of service dialogues, support speech recognition or complete recording archiving, this structural sound field directly affects the intelligibility of the target speech.
Existing counter pickup schemes mostly use single-microphone acquisition, fixed beamforming or general-purpose noise reduction algorithms. A single-microphone system can hardly exploit spatial distribution information and tends to pick up ambient sound together with the target sound. Fixed beamforming can enhance sound from a specific direction to some extent, but it does not address the propagation offset, slot-direction spreading and frequency-dependent migration caused by the glass barrier and the long-slot structure, and usually suppresses interference only coarsely by geometric direction. General-purpose speech noise reduction algorithms focus on broadband noise suppression and do not explicitly model the structural propagation pattern caused by the long slot of the counter, so the target speech tends to be attenuated along with the noise, residual reflection components are hard to remove sufficiently, and processing of time-frequency units is unstable.
With the advance of applications such as window speech recognition, dual-recording archiving, service quality inspection and intelligent assisted handling, the front-end pickup system is no longer required only to reduce background noise; it must also preserve the effective direct speech of the target speaker as far as possible within a complex counter structure. In particular, when customers and tellers speak alternately, crosstalk exists between adjacent windows and the counter surfaces produce multi-surface reflections, it is difficult to achieve spatial position discrimination, structural propagation compensation and residual interference suppression at the same time by relying only on traditional array weighting or an ordinary neural network speech enhancement method.
Disclosure of Invention
The invention provides a neural network-based power counter array directional pickup system, which solves the technical problems described in the background.
The invention provides a neural network-based power counter array directional pickup system, which comprises: a first module for transforming the time-domain sound pressure sequence to extract a complex spectrum; a second module for calculating sound pressure energy from the complex spectrum, combining the microphone two-dimensional coordinates with the sound pressure energy to obtain a sound pressure center of gravity, constructing a second-order distribution matrix from the sound pressure center of gravity and the sound pressure energy, and extracting a diffusion scale; a third module for projecting the sound pressure center of gravity to obtain a slot-direction projection coordinate, computing the sound pressure ridge elongation ratio from the second-order distribution matrix, averaging the sound pressure energy to obtain an average energy, and fusing the slot-direction projection coordinate displacement difference, the sound pressure ridge elongation ratio and the average energy to obtain the slot-direction sound pressure ridge mobility; a fourth module for inputting the logarithm of the sound pressure energy, the slot-direction projection coordinate, the sound pressure ridge elongation ratio and the slot-direction sound pressure ridge mobility into a direct sound retention network to output a retention weight, and generating a retained spectrum by combining the retention weight with the complex spectrum; a fifth module for extracting a migration compensation coefficient from the second-order distribution matrix, fusing the migration compensation coefficient, the slot-direction sound pressure ridge mobility and the sound pressure center of gravity to generate a compensated synthesis center, calculating a channel synthesis weight according to the two-dimensional coordinates of the