CN-122024716-A - Voice control method, system and network equipment
Abstract
The application relates to a voice control method, a voice control system and a network device. The method comprises: acquiring a plurality of groups of initial voice data; determining a target sound source position based on phase information of the plurality of groups of initial voice data; performing noise reduction processing on the plurality of groups of initial voice data based on the target sound source position to obtain target voice data; and performing voice recognition processing on the target voice data to generate a target control instruction, which instructs a target voice interaction device to execute the operation corresponding to the target voice instruction. In the application, a plurality of voice interaction devices each collect voice data, and the network device generates a control instruction for the target voice interaction device based on the plurality of groups of voice data. Voice control of the target voice interaction device therefore remains possible when the environment is noisy or the user is far from the target voice interaction device, which improves the voice control effect.
Inventors
- WANG YU
Assignees
- Ningbo Fotile Kitchen Ware Co., Ltd. (宁波方太厨具有限公司)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-07
Claims (10)
- 1. A voice control method, the method comprising: acquiring a plurality of groups of initial voice data, wherein the plurality of groups of initial voice data are voice data respectively collected by a plurality of voice interaction devices in response to a target voice instruction; determining a target sound source position based on phase information of the plurality of groups of initial voice data; performing noise reduction processing on the plurality of groups of initial voice data based on the target sound source position to obtain target voice data; and performing voice recognition processing on the target voice data to generate a target control instruction, wherein the target control instruction is used for instructing a target voice interaction device to execute an operation corresponding to the target voice instruction.
- 2. The voice control method according to claim 1, wherein performing noise reduction processing on the plurality of groups of initial voice data based on the target sound source position to obtain the target voice data comprises: determining a noise type based on the target sound source position; acquiring a preset noise spectrum corresponding to the noise type; performing noise reduction processing on the plurality of groups of initial voice data based on the preset noise spectrum to obtain a plurality of groups of noise-reduced voice data; and performing data fusion processing on the plurality of groups of noise-reduced voice data to obtain the target voice data.
- 3. The voice control method according to claim 2, wherein performing noise reduction processing on the plurality of groups of initial voice data based on the preset noise spectrum to obtain the plurality of groups of noise-reduced voice data comprises: performing Fourier transform processing on the plurality of groups of initial voice data to obtain a noisy spectrum corresponding to each group of initial voice data; and determining the plurality of groups of noise-reduced voice data based on a difference between the noisy spectrum and the preset noise spectrum.
- 4. The voice control method according to claim 2, wherein performing noise reduction processing on the plurality of groups of initial voice data based on the preset noise spectrum to obtain the plurality of groups of noise-reduced voice data further comprises: performing Fourier transform processing on the plurality of groups of initial voice data to obtain a noisy spectrum corresponding to each group of initial voice data; determining a noise reduction coefficient corresponding to each group of initial voice data based on the distance between the target sound source position and the respective voice interaction device; and determining the plurality of groups of noise-reduced voice data based on the noisy spectrum, the preset noise spectrum and the noise reduction coefficients.
- 5. The voice control method according to claim 2, wherein performing data fusion processing on the plurality of groups of noise-reduced voice data to obtain the target voice data comprises: determining a weight coefficient corresponding to each group of noise-reduced voice data based on the distance between the target sound source position and the respective voice interaction device; and performing data fusion processing on the plurality of groups of noise-reduced voice data based on the weight coefficients to obtain the target voice data.
- 6. The voice control method according to claim 1, wherein determining the target sound source position based on the phase information of the plurality of groups of initial voice data comprises: performing time synchronization processing on the plurality of groups of initial voice data to obtain a plurality of groups of synchronized voice data; determining a target phase difference between the plurality of groups of synchronized voice data based on the phase information of the plurality of groups of synchronized voice data; determining a target spatial spectrum based on the target phase difference, wherein the target spatial spectrum represents the degree of matching between the target phase difference and preset phase differences corresponding to different preset positions; and determining the preset position corresponding to the peak of the target spatial spectrum as the target sound source position.
- 7. The voice control method according to claim 6, wherein performing time synchronization processing on the plurality of groups of initial voice data to obtain the plurality of groups of synchronized voice data comprises: determining one of the plurality of voice interaction devices as a reference device; calculating a time difference of each group of non-reference voice data relative to reference voice data, wherein the non-reference voice data are the initial voice data corresponding to a non-reference device, the reference voice data are the initial voice data corresponding to the reference device, and the non-reference devices are the voice interaction devices other than the reference device; and shifting each group of non-reference voice data in time by its corresponding time difference to obtain the plurality of groups of synchronized voice data.
- 8. The voice control method according to claim 1, wherein performing voice recognition processing on the target voice data to generate the target control instruction comprises: performing voice recognition processing on the target voice data to obtain a target voice recognition result, wherein the target voice recognition result comprises device information and operation information corresponding to the target voice interaction device; and generating the target control instruction based on the device information and the operation information, and sending the target control instruction to the target voice interaction device.
- 9. A network device, characterized in that the network device comprises a controller configured to perform the voice control method according to any one of claims 1 to 8.
- 10. A voice control system, wherein the system comprises a plurality of acquisition terminals and a controller; the plurality of acquisition terminals are configured to respectively collect initial voice data in response to a target voice instruction and send the initial voice data to the controller; and the controller is configured to perform the voice control method according to any one of claims 1 to 8.
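Claims 6 and 7 describe a synchronize-then-search localization. The NumPy sketch below is one possible reading, not the applicant's implementation. It assumes that each device reports a capture start time on a shared network clock (the claims do not say how the time difference is obtained), that the 2-D device coordinates are known, that sound travels at 343 m/s, and that the "preset positions" form a candidate grid. The spatial spectrum is taken as the mean cosine of the residual between observed and expected inter-device phase differences, in the spirit of steered-response-power methods. The names `synchronize` and `locate_source` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def synchronize(recordings, start_times, fs, ref=0):
    """Claim 7 sketch: put every recording on the reference device's timeline.

    Assumes `start_times` are capture timestamps (seconds) on a shared clock.
    Each recording is shifted by its start-time difference relative to the
    reference, then all are trimmed to the span every device covers.
    """
    offsets = [int(round((t - start_times[ref]) * fs)) for t in start_times]
    lo = max(offsets)                                               # latest common start
    hi = min(off + len(x) for off, x in zip(offsets, recordings))   # earliest common end
    if hi <= lo:
        raise ValueError("recordings do not overlap in time")
    return np.stack([x[lo - off:hi - off] for off, x in zip(offsets, recordings)])


def locate_source(synced, fs, mic_pos, candidates, ref=0, c=SPEED_OF_SOUND):
    """Claim 6 sketch: grid search over preset positions using phase differences.

    Returns the candidate at the peak of the spatial spectrum and the spectrum.
    """
    spectra = np.fft.rfft(synced, axis=1)
    freqs = np.fft.rfftfreq(synced.shape[1], d=1.0 / fs)
    spectra, freqs = spectra[:, 1:], freqs[1:]                 # drop the DC bin
    others = [i for i in range(len(mic_pos)) if i != ref]
    # Observed (target) phase difference of each device relative to the reference.
    observed = np.angle(spectra[others] * np.conj(spectra[ref]))
    scores = np.empty(len(candidates))
    for k, p in enumerate(candidates):
        dist = np.linalg.norm(mic_pos - p, axis=1)
        tdoa = (dist[others] - dist[ref]) / c                   # expected delays (s)
        expected = -2.0 * np.pi * np.outer(tdoa, freqs)         # preset phase differences
        # cos() of the residual is wrap-safe: 1 where phases agree, ~0 on average otherwise.
        scores[k] = np.mean(np.cos(observed - expected))
    return candidates[int(np.argmax(scores))], scores
```

Timestamps rather than cross-correlation of the speech are used for synchronization in this sketch because aligning the recordings on the speech itself would also cancel the propagation delays that the phase-difference search depends on.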
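Claim 8 splits the recognition result into device information and operation information before an instruction is generated and sent. The sketch below shows one way to do that. It assumes a plain-text recognition result, hypothetical keyword tables (`DEVICE_KEYWORDS`, `OPERATION_KEYWORDS`) with made-up device IDs, and an injected transport callback standing in for whatever network link delivers the instruction. None of these names come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword tables; a production system would more likely use an NLU model.
DEVICE_KEYWORDS = {"range hood": "hood-01", "oven": "oven-01", "dishwasher": "dishwasher-01"}
OPERATION_KEYWORDS = {"turn on": "POWER_ON", "turn off": "POWER_OFF", "speed up": "SPEED_UP"}


@dataclass(frozen=True)
class ControlInstruction:
    device_id: str   # device information: which voice interaction device to control
    operation: str   # operation information: what that device should do


def build_instruction(recognized_text: str) -> Optional[ControlInstruction]:
    """Extract device and operation information from a recognition result."""
    text = recognized_text.lower()
    device = next((dev for kw, dev in DEVICE_KEYWORDS.items() if kw in text), None)
    operation = next((op for kw, op in OPERATION_KEYWORDS.items() if kw in text), None)
    if device is None or operation is None:
        return None  # incomplete command: nothing to send
    return ControlInstruction(device, operation)


def send_instruction(instruction: ControlInstruction,
                     transport: Callable[[str, dict], None]) -> None:
    """Deliver the instruction to the target device via an injected transport."""
    transport(instruction.device_id, {"op": instruction.operation})
```

Returning `None` for an incomplete result lets the caller ask the user to repeat instead of sending a half-specified command. That choice is also an assumption.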
Description
Technical Field
The present application relates to the field of voice interaction technologies, and in particular to a voice control method, a voice control system and a network device.
Background
A device with a voice interaction function can receive a user's voice command and execute the corresponding operation. However, when the user is in a noisy environment or far from the device to be controlled, the voice command may not be recognized at all, or may be recognized inaccurately, which degrades the user experience.
Disclosure of Invention
To solve the above technical problems, the application discloses a voice control method, a voice control system and a network device. A plurality of voice interaction devices each collect voice data, and the network device generates a control instruction for a target voice interaction device based on the plurality of groups of voice data. Voice control of the target voice interaction device therefore remains possible when the environment is noisy or the user is far from the target voice interaction device, which improves the voice control effect.
In one aspect, the present application provides a voice control method, the method comprising: acquiring a plurality of groups of initial voice data, wherein the plurality of groups of initial voice data are voice data respectively collected by a plurality of voice interaction devices in response to a target voice instruction; determining a target sound source position based on phase information of the plurality of groups of initial voice data; performing noise reduction processing on the plurality of groups of initial voice data based on the target sound source position to obtain target voice data; and performing voice recognition processing on the target voice data to generate a target control instruction, wherein the target control instruction is used for instructing a target voice interaction device to execute an operation corresponding to the target voice instruction.
In some embodiments, performing noise reduction processing on the plurality of groups of initial voice data based on the target sound source position to obtain the target voice data comprises: determining a noise type based on the target sound source position; acquiring a preset noise spectrum corresponding to the noise type; performing noise reduction processing on the plurality of groups of initial voice data based on the preset noise spectrum to obtain a plurality of groups of noise-reduced voice data; and performing data fusion processing on the plurality of groups of noise-reduced voice data to obtain the target voice data.
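The embodiment above does not say how a noise type is derived from the source position or where the preset spectra come from. One possible reading is sketched here under stated assumptions: a table of hypothetical noise zones (for example around a cooktop or a sink), in which the nearest zone containing the source position selects the noise type, plus a library of magnitude spectra averaged offline over noise-only recordings. Zone coordinates, radii, type names, the `"ambient"` fallback and the 256-sample frame length are all illustrative, not taken from the source.

```python
import numpy as np

# Hypothetical noise-source zones in room coordinates (metres):
# (centre_x, centre_y, radius, noise_type)
NOISE_ZONES = [
    (1.0, 0.5, 1.0, "range_hood"),
    (3.0, 0.5, 0.8, "running_water"),
]
DEFAULT_NOISE_TYPE = "ambient"


def noise_type_for_position(pos):
    """Pick the noise type of the nearest zone that contains the source position."""
    best_type, best_dist = DEFAULT_NOISE_TYPE, np.inf
    for cx, cy, radius, noise_type in NOISE_ZONES:
        d = np.hypot(pos[0] - cx, pos[1] - cy)
        if d <= radius and d < best_dist:
            best_type, best_dist = noise_type, d
    return best_type


def estimate_noise_spectrum(noise_only, frame_len=256):
    """Build a preset noise spectrum offline: mean magnitude over noise-only frames."""
    n_frames = len(noise_only) // frame_len
    frames = np.reshape(noise_only[:n_frames * frame_len], (n_frames, frame_len))
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)), axis=0)


def preset_noise_spectrum(noise_type, library):
    """Look up the stored spectrum, falling back to the ambient profile."""
    return library.get(noise_type, library[DEFAULT_NOISE_TYPE])
```

Averaging magnitudes over many frames gives a smooth profile whose length (`frame_len // 2 + 1` bins) must match the frame size used later for spectral subtraction.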
In some embodiments, performing noise reduction processing on the plurality of groups of initial voice data based on the preset noise spectrum to obtain the plurality of groups of noise-reduced voice data comprises: performing Fourier transform processing on the plurality of groups of initial voice data to obtain a noisy spectrum corresponding to each group of initial voice data; and determining the plurality of groups of noise-reduced voice data based on a difference between the noisy spectrum and the preset noise spectrum.
In some embodiments, performing noise reduction processing on the plurality of groups of initial voice data based on the preset noise spectrum to obtain the plurality of groups of noise-reduced voice data further comprises: performing Fourier transform processing on the plurality of groups of initial voice data to obtain a noisy spectrum corresponding to each group of initial voice data; determining a noise reduction coefficient corresponding to each group of initial voice data based on the distance between the target sound source position and the respective voice interaction device; and determining the plurality of groups of noise-reduced voice data based on the noisy spectrum, the preset noise spectrum and the noise reduction coefficients.
In some embodiments, performing data fusion processing on the plurality of groups of noise-reduced voice data to obtain the target voice data comprises: determining a weight coefficient corresponding to each group of noise-reduced voice data based on the distance between the target sound source position and the respective voice interaction device; and performing data fusion processing on the plurality of groups of noise-reduced voice data based on the weight coefficients to obtain the target voice data.
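The three embodiments above (spectral difference, distance-dependent noise reduction coefficient, distance-weighted fusion) can be sketched together in NumPy. This is a minimal illustration, not the applicant's implementation. The text says only that the coefficients and weights are "based on the distance", so several pieces are assumptions: the linear mapping from distance to an over-subtraction coefficient, the spectral floor, the inverse-distance weights, and the integer-sample delay alignment before fusion. The sketch also uses non-overlapping rectangular frames to stay short. Practical systems usually use overlapping windowed frames.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, used only to align channels before fusion


def distance_coefficients(distances, alpha_min=1.0, alpha_max=2.5):
    """Assumed mapping: farther devices get stronger (larger) subtraction coefficients."""
    d = np.asarray(distances, dtype=float)
    span = d.max() - d.min()
    if span == 0:
        return np.full_like(d, alpha_min)
    return alpha_min + (alpha_max - alpha_min) * (d - d.min()) / span


def spectral_subtract(x, noise_mag, alpha=1.0, floor=0.02):
    """Frame-wise magnitude subtraction of a preset noise spectrum, keeping noisy phase.

    `noise_mag` has frame_len // 2 + 1 bins; trailing samples that do not fill a
    whole frame are dropped.
    """
    frame_len = 2 * (len(noise_mag) - 1)
    n_frames = len(x) // frame_len
    out = np.zeros(n_frames * frame_len)
    for k in range(n_frames):
        seg = x[k * frame_len:(k + 1) * frame_len]
        spec = np.fft.rfft(seg)
        mag, phase = np.abs(spec), np.angle(spec)
        # Difference between noisy and preset spectra, floored to limit musical noise.
        clean = np.maximum(mag - alpha * noise_mag, floor * mag)
        out[k * frame_len:(k + 1) * frame_len] = np.fft.irfft(clean * np.exp(1j * phase), n=frame_len)
    return out


def fuse(channels, distances, fs, c=SPEED_OF_SOUND):
    """Weighted fusion with assumed inverse-distance weights (normalized to sum to 1).

    Each channel is first advanced by its extra propagation delay relative to the
    nearest device, rounded to whole samples (an added assumption).
    """
    d = np.asarray(distances, dtype=float)
    weights = 1.0 / np.maximum(d, 1e-3)
    weights /= weights.sum()
    lags = np.round((d - d.min()) / c * fs).astype(int)
    n = min(len(ch) for ch in channels) - lags.max()
    if n <= 0:
        raise ValueError("channels too short for the required alignment")
    fused = np.zeros(n)
    for ch, w, lag in zip(channels, weights, lags):
        fused += w * ch[lag:lag + n]
    return fused, weights
```

For example, `spectral_subtract(x_i, noise_mag, alpha=coeffs[i])` could denoise channel `i` with `coeffs = distance_coefficients(distances)`, and `fuse(...)` could then combine the results into a single target signal.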
In some embodiments, determining the target sound source position based on the phase information of the plurality of groups of initial voice data comprises: performing time synchronization processing on the plurality of groups of initial voice data to obtain mul