CN-115641861-B - Vehicle-mounted voice enhancement method, device, storage medium and equipment


Abstract

The application discloses a vehicle-mounted voice enhancement method, device, storage medium and equipment. The method first acquires vehicle-mounted auxiliary information of a target vehicle and target voice information of the vehicle-mounted users in each sound zone of the target vehicle, then enhances the target voice information using the vehicle-mounted auxiliary information to obtain enhanced target voice information, and finally performs preset operation processing on the vehicle-mounted user and/or the target vehicle according to the enhanced target voice information to obtain a processing result. Because the voice of the vehicle-mounted user in each sound zone is enhanced according to the vehicle-mounted auxiliary information, and the enhanced voice is used for subsequent preset operations such as vehicle wake-up, user positioning and recognition, the wake-up, positioning and recognition performance can be improved, which in turn improves the user's voice interaction experience while the target vehicle is being driven.

Inventors

HUANG YUANFANG
HU YU

Assignees

iFLYTEK Co., Ltd. (科大讯飞股份有限公司)

Dates

Publication Date: 20260505
Application Date: 20221013

Claims (9)

1. A vehicle-mounted voice enhancement method, comprising: acquiring vehicle-mounted auxiliary information of a target vehicle and acquiring target voice information of the vehicle-mounted users in each sound zone of the target vehicle; performing a Fourier transform on the target voice information to obtain converted target voice information; constructing a combined vector from the vehicle-mounted auxiliary information of the target vehicle and the converted target voice information, inputting the combined vector into a pre-constructed voice enhancement model, and predicting the weight of the target voice information of each sound zone of the target vehicle; multiplying each weight by the corresponding target voice information, and performing an inverse Fourier transform on the result to obtain enhanced target voice information; and performing preset operation processing on the vehicle-mounted user and/or the target vehicle according to the enhanced target voice information to obtain a processing result; wherein, when the vehicle-mounted auxiliary information of the target vehicle includes seat information of the target vehicle, constructing the combined vector from the vehicle-mounted auxiliary information of the target vehicle and the converted target voice information comprises: constructing the combined vector from the seat information of the target vehicle and the converted target voice information; or, when the vehicle-mounted auxiliary information of the target vehicle includes vehicle speed information and window state information of the target vehicle, constructing the combined vector from the vehicle-mounted auxiliary information of the target vehicle and the converted target voice information comprises: constructing the combined vector from the vehicle speed information and the window state information of the target vehicle and the converted target voice information.
2. The method according to claim 1, wherein constructing the combined vector from the seat information of the target vehicle and the converted target voice information comprises: splicing the vector corresponding to the seat information of the target vehicle with the vector corresponding to the converted target voice information and using the spliced vector as the combined vector; or constructing the combined vector from the seat information of the target vehicle and the converted target voice information by means of gating.
3. The method according to claim 1, wherein constructing the combined vector from the vehicle speed information and the window state information of the target vehicle and the converted target voice information comprises: splicing the vector corresponding to the vehicle speed information of the target vehicle, the vector corresponding to the window state information and the vector corresponding to the converted target voice information, and using the spliced vector as the combined vector; or constructing the combined vector from the vehicle speed information of the target vehicle, the window state information and the converted target voice information by means of gating.
4. The method according to any one of claims 1 to 3, wherein the voice enhancement model comprises at least one of a convolutional neural network (CNN), a recurrent neural network (RNN), and a real-valued or complex-valued network.
5. The method according to any one of claims 1 to 3, wherein inputting the combined vector into the pre-constructed voice enhancement model and predicting the weight of the target voice information of each sound zone of the target vehicle comprises: inputting the combined vector into the pre-constructed voice enhancement model, and calculating the ratio of the frequency-domain signal of the target voice information acquired in each sound zone to the frequency-domain signal of the noisy audio acquired in a preset sound zone, the ratio being used as the weight of the target voice information of each sound zone.
6. The method according to claim 1, wherein performing preset operation processing on the vehicle-mounted user and/or the target vehicle according to the enhanced target voice information to obtain a processing result comprises: waking up a preset device of the target vehicle according to the enhanced target voice information, and positioning and recognizing the vehicle-mounted user who uttered the wake-up voice, to obtain the processing result.
7. A vehicle-mounted voice enhancement device, comprising: an acquisition unit, configured to acquire vehicle-mounted auxiliary information of a target vehicle and to acquire target voice information of the vehicle-mounted users in each sound zone of the target vehicle; an enhancement unit, configured to enhance the target voice information using the vehicle-mounted auxiliary information to obtain enhanced target voice information; and a processing unit, configured to perform preset operation processing on the vehicle-mounted user and/or the target vehicle according to the enhanced target voice information to obtain a processing result; wherein the vehicle-mounted auxiliary information of the target vehicle includes seat information of the target vehicle, and the enhancement unit comprises: a first transformation subunit, configured to perform a Fourier transform on the target voice information to obtain converted target voice information; a first prediction subunit, configured to construct a combined vector from the seat information of the target vehicle and the converted target voice information, input the combined vector into a pre-constructed voice enhancement model, and predict the weight of the target voice information of each sound zone of the target vehicle; and a first calculation subunit, configured to multiply each weight by the corresponding target voice information and perform an inverse Fourier transform on the result to obtain the enhanced target voice information; or the vehicle-mounted auxiliary information of the target vehicle includes vehicle speed information and window state information of the target vehicle, and the enhancement unit comprises: a second transformation subunit, configured to perform a Fourier transform on the target voice information to obtain converted target voice information; a second prediction subunit, configured to construct a combined vector from the vehicle speed information and the window state information of the target vehicle and the converted target voice information, input the combined vector into a pre-constructed voice enhancement model, and predict the weight of the target voice information of each sound zone of the target vehicle; and a second calculation subunit, configured to multiply each weight by the corresponding target voice information and perform an inverse Fourier transform on the result to obtain the enhanced target voice information.
8. Vehicle-mounted voice enhancement equipment, comprising a processor, a memory and a system bus; wherein the processor and the memory are connected through the system bus; and the memory is configured to store one or more programs comprising instructions which, when executed by the processor, cause the processor to perform the method of any one of claims 1 to 6.
9. A computer-readable storage medium, storing instructions which, when run on a terminal device, cause the terminal device to perform the method of any one of claims 1 to 6.
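
Editorial illustration (not part of the claims): one possible reading of the weight described in claim 5 is a per-frequency-bin ratio between the spectrum of the target voice in a sound zone and the spectrum of the noisy audio. The minimal sketch below assumes that the ratio is taken between magnitudes and that a small constant guards against division by zero; the function name `ratio_weights` and the parameter `eps` are hypothetical and do not come from the patent.

```python
def ratio_weights(zone_spectrum, noisy_spectrum, eps=1e-8):
    """Return one weight per frequency bin: |zone bin| / |noisy bin|.

    Assumptions (not stated in the claims): magnitudes are compared,
    and eps is added to the denominator to avoid division by zero.
    """
    if len(zone_spectrum) != len(noisy_spectrum):
        raise ValueError("spectra must have the same number of bins")
    return [abs(s) / (abs(y) + eps)
            for s, y in zip(zone_spectrum, noisy_spectrum)]


# Toy spectra with three bins each (made-up values for illustration).
zone = [1 + 0j, 0.5j, 0j]
noisy = [2 + 0j, 1j, 1 + 0j]
weights = ratio_weights(zone, noisy)
```

In a trained system a ratio of this kind would more plausibly serve as the quantity the model learns to predict, since the clean target spectrum is not available at run time.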

Description

Vehicle-mounted voice enhancement method, device, storage medium and equipment

Technical Field

The present application relates to the field of speech processing technologies, and in particular to a vehicle-mounted voice enhancement method, device, storage medium and equipment.

Background

With rising living standards and rapid social and economic development, automobiles are used more and more widely and bring great convenience to many aspects of daily life. Voice interaction systems have likewise become common in smart automobiles. At present, the sound zones of a vehicle are generally divided according to seat position. For example, a four-seat vehicle with a driver seat, a front passenger seat, a rear seat behind the driver and a rear seat behind the front passenger is divided into four sound zones. The speaker audio for each sound zone is obtained through a directional beam or a voice separation model and sent to the back end for wake-up; the position of the person who issued the wake-up is then obtained by comparing the wake-up results, after which recognition of the target speaker is completed. However, in low signal-to-noise-ratio scenarios, such as driving at high speed with the windows open or interference from multiple speakers, the wake-up rate can be low, and even when wake-up succeeds, positioning errors can occur, so the voice interaction experience of the vehicle-mounted user is poor.

Disclosure of Invention

The main purpose of the embodiments of the application is to provide a vehicle-mounted voice enhancement method, device, storage medium and equipment that can enhance a speaker's voice according to vehicle-mounted auxiliary information, so that wake-up, positioning and recognition performance can be improved, which in turn improves the user's voice interaction experience while driving.
An embodiment of the application provides a vehicle-mounted voice enhancement method, comprising: acquiring vehicle-mounted auxiliary information of a target vehicle and acquiring target voice information of the vehicle-mounted users in each sound zone of the target vehicle; enhancing the target voice information using the vehicle-mounted auxiliary information to obtain enhanced target voice information; and performing preset operation processing on the vehicle-mounted user and/or the target vehicle according to the enhanced target voice information to obtain a processing result.
In a possible implementation, the vehicle-mounted auxiliary information of the target vehicle includes seat information of the target vehicle, and enhancing the target voice information using the vehicle-mounted auxiliary information to obtain enhanced target voice information comprises: performing a Fourier transform on the target voice information to obtain converted target voice information; constructing a combined vector from the seat information of the target vehicle and the converted target voice information, inputting the combined vector into a pre-constructed voice enhancement model, and predicting the weight of the target voice information of each sound zone of the target vehicle; and multiplying each weight by the corresponding target voice information and performing an inverse Fourier transform on the result to obtain the enhanced target voice information.
In a possible implementation, constructing the combined vector from the seat information of the target vehicle and the converted target voice information comprises: splicing the vector corresponding to the seat information of the target vehicle with the vector corresponding to the converted target voice information and using the spliced vector as the combined vector; or constructing the combined vector from the seat information of the target vehicle and the converted target voice information by means of gating.
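
Editorial illustration (not part of the original description): the seat-information variant above can be sketched end to end as transform, combine, predict weights, multiply, and inverse transform. This is a minimal sketch under several assumptions. The seat information is encoded as a single occupancy flag (1.0 occupied, 0.0 empty); `toy_model` is a hand-written stand-in for the trained voice enhancement model; the gating shown in `gate` is only one possible reading of "gating", with made-up constants `scale` and `bias`; and a naive discrete Fourier transform replaces a real FFT routine so that the example needs only the standard library. All function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; stands in for an FFT routine."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex time-domain samples."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(spectrum)) / n
            for t in range(n)]


def splice(seat_vector, features):
    """Splicing: concatenate the seat vector and the spectral features."""
    return list(seat_vector) + list(features)


def gate(seat_vector, features, scale=4.0, bias=-2.0):
    """Gating (one possible reading): a sigmoid of the seat signal scales
    every spectral feature. scale and bias are made-up constants."""
    g = 1.0 / (1.0 + math.exp(-(scale * sum(seat_vector) + bias)))
    return [g * f for f in features]


def toy_model(combined, n_seat):
    """Hypothetical stand-in for the trained model on spliced input:
    keep every bin if the seat is occupied, otherwise attenuate to 0.1."""
    occupied = combined[0] > 0.5
    return [1.0 if occupied else 0.1] * (len(combined) - n_seat)


def enhance_zone(frame, seat_vector):
    """Enhance one sound zone's frame using the splicing variant."""
    spectrum = dft(frame)                               # Fourier transform
    features = [abs(c) for c in spectrum]               # magnitude features
    combined = splice(seat_vector, features)            # combined vector
    weights = toy_model(combined, len(seat_vector))     # per-bin weights
    weighted = [w * c for w, c in zip(weights, spectrum)]
    return [s.real for s in idft(weighted)]             # inverse transform


frame = [0.0, 1.0, 0.0, -1.0]
kept = enhance_zone(frame, [1.0])        # occupied seat: frame passes through
attenuated = enhance_zone(frame, [0.0])  # empty seat: frame scaled by 0.1
gated = gate([1.0], [2.0, 4.0])          # alternative combined vector
```

With splicing, the model sees the seat flag as extra input dimensions; with gating, the seat signal modulates the speech features directly, so a model consuming gated input would take a vector with one entry per frequency bin.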
In a possible implementation, the vehicle-mounted auxiliary information of the target vehicle includes vehicle speed information and window state information of the target vehicle, and enhancing the target voice information using the vehicle-mounted auxiliary information to obtain enhanced target voice information comprises: performing a Fourier transform on the target voice information to obtain converted target voice information; constructing a combined vector from the vehicle speed information and the window state information of the target vehicle and the converted target voice information, inputting the combined vector into a pre-constructed voice enhancement model, and predicting the weight of the target voice