US-12626711-B2 - High-quality voice signal processing device and method through removal of ambient noise based on multi-sensor signal fusion


Abstract

A high-quality voice signal processing device through removal of ambient noise based on multi-sensor signal fusion, includes: a voice microphone sensor that senses and outputs a speaker's voice signal; an accelerometer sensor that senses vibration of the speaker's vocal cords and outputs a signal; a noise reduction processing MCU that extracts a voice section according to vocal cord vibration using the output signal of the accelerometer sensor, synthesizes a low-frequency component of the accelerometer sensor and a low-frequency component of the voice microphone sensor at different synthesis ratios based on a level of noise extracted from the output signal of the voice microphone sensor using voice section information, and restores and outputs a voice signal by adding the synthesized low-frequency components and a high-frequency component of the voice microphone sensor; and a wireless communication module that externally outputs the restored voice signal.
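The single-microphone signal flow in the abstract (and in claims 1, 2 and 9) can be illustrated with a minimal sketch. The patent does not specify the voice section extractor, how noise is removed, the crossover frequency, the reference value, or the synthesis ratios. The following are therefore assumptions: a frame-energy detector on the accelerometer signal, gating the microphone to the voice section, a one-pole low-pass filter with a complementary high band, a 1 kHz crossover, a reference value of 0.05, and weights of 0.8 and 0.2. The names `voice_mask`, `split_bands` and `fuse` are hypothetical.

```python
import math
from typing import List, Tuple


def voice_mask(acc: List[float], frame: int, threshold: float) -> List[bool]:
    """Mark each sample as voiced when the RMS of its accelerometer frame exceeds
    `threshold`. Frame-energy detection is an assumed stand-in for the
    unspecified voice section extractor."""
    mask: List[bool] = []
    for start in range(0, len(acc), frame):
        chunk = acc[start:start + frame]
        rms = math.sqrt(sum(v * v for v in chunk) / len(chunk))
        mask.extend([rms > threshold] * len(chunk))
    return mask


def split_bands(x: List[float], fs: float,
                cutoff_hz: float) -> Tuple[List[float], List[float]]:
    """Complementary band split (assumed): a one-pole low-pass filter gives the
    low band and the residual x - low gives the high band, so low + high == x."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    low: List[float] = []
    y = 0.0
    for v in x:
        y += a * (v - y)
        low.append(y)
    high = [v - lo for v, lo in zip(x, low)]
    return low, high


def fuse(mic: List[float], acc: List[float], fs: float, frame: int = 160,
         vad_threshold: float = 0.01, reference: float = 0.05,
         cutoff_hz: float = 1000.0, w_noisy: float = 0.8,
         w_quiet: float = 0.2) -> Tuple[List[float], float, float]:
    """Return (restored signal, noise level, accelerometer low-band weight).
    All numeric defaults are hypothetical; mic and acc must be equal length."""
    mask = voice_mask(acc, frame, vad_threshold)
    # Mic samples outside the voice section are treated as noise; RMS is its level.
    noise = [m for m, voiced in zip(mic, mask) if not voiced]
    noise_level = math.sqrt(sum(v * v for v in noise) / len(noise)) if noise else 0.0
    # Remove the extracted noise by gating both sensors to the voice section.
    mic_clean = [m if voiced else 0.0 for m, voiced in zip(mic, mask)]
    acc_clean = [a if voiced else 0.0 for a, voiced in zip(acc, mask)]
    mic_low, mic_high = split_bands(mic_clean, fs, cutoff_hz)
    acc_low, _ = split_bands(acc_clean, fs, cutoff_hz)
    # Noisier than the reference: weight the accelerometer low band more;
    # otherwise weight the microphone low band more. A level exactly equal to
    # the reference falls into the quiet branch here (the claims leave it open).
    w = w_noisy if noise_level > reference else w_quiet
    # Restored voice = synthesized low bands + microphone high band.
    restored = [w * al + (1.0 - w) * ml + mh
                for al, ml, mh in zip(acc_low, mic_low, mic_high)]
    return restored, noise_level, w
```

Because the band split is complementary, the output reduces to the gated microphone signal when the two sensors agree. The reference-value comparison only changes which sensor dominates below the crossover, and the microphone always supplies the high band, which the background identifies as the weakness of vibration-based pickup.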

Inventors

Seung Tae Kim
Ju In LIM
Yong Hun SONG

Assignees

INTUS. CO., LTD.

Dates

Publication Date: 20260512
Application Date: 20230912
Priority Date: 20220919

Claims (10)

1 . A high-quality voice signal processing device through removal of ambient noise based on multi-sensor signal fusion, the device comprising: a voice microphone sensor that senses and outputs a speaker's voice signal; an accelerometer sensor that senses vibration of the speaker's vocal cords and outputs a signal; a noise reduction processing microcontroller unit (MCU) that extracts a voice section according to vocal cord vibration using the output signal of the accelerometer sensor, determines a signal outside the voice section in the output signal of the voice microphone sensor as noise using voice section information, extracts and removes the output signal determined as noise, synthesizes a low-frequency component of the accelerometer sensor and a low-frequency component of the voice microphone sensor at different synthesis ratios based on a level of the noise extracted from the output signal of the voice microphone sensor using voice section information, and restores and outputs a voice signal by adding the synthesized low-frequency components and a high-frequency component of the voice microphone sensor; and a wireless communication module that externally outputs the restored voice signal, wherein in synthesizing the low-frequency component of the accelerometer sensor and the low-frequency component of the voice microphone sensor, the noise reduction processing MCU causes the low-frequency component of the accelerometer sensor to be further included when the level of the noise extracted from the output signal of the voice microphone sensor is higher than a reference value, and causes the low-frequency component of the voice microphone sensor to be further included when the level of the noise extracted from the output signal of the voice microphone sensor is lower than the reference value.
2 . The high-quality voice signal processing device of claim 1 , wherein the noise reduction processing MCU separates the output signal of the voice microphone sensor in the voice section into the low-frequency component of the voice microphone sensor and a high-frequency component of the voice microphone sensor.
3 . The high-quality voice signal processing device of claim 1 , wherein the noise reduction processing MCU includes: a voice section extractor for extracting a voice section according to vocal cord vibration using the output signal of the accelerometer sensor; an accelerometer sensor low-frequency component processing unit that processes the low-frequency component of the accelerometer sensor; a voice microphone sensor noise extraction and removal unit that determines the signal outside the voice section in the output signal of the voice microphone sensor as noise using the voice section information, and extracts and removes the signal determined as noise; a noise level determination unit that determines a level of the noise extracted from the output signal of the voice microphone sensor; a voice microphone sensor low-frequency component processing unit and a voice microphone sensor high-frequency component processing unit that separate and process the output signal of the voice microphone sensor in the voice section into the low-frequency component of the voice microphone sensor and the high-frequency component of the voice microphone sensor; a synthesis unit that synthesizes the low-frequency component of the accelerometer sensor and the low-frequency component of the voice microphone sensor at different synthesis ratios based on the noise level determined by the noise level determination unit; and a voice signal restoration output unit that restores and outputs a voice signal by adding the synthesized low-frequency components and the high-frequency component of the voice microphone sensor.
4 . A high-quality voice signal processing device through removal of ambient noise based on multi-sensor signal fusion, the device comprising: a first and a second voice microphone sensor spaced apart from each other to sense and output a speaker's voice signal; an accelerometer sensor that senses vibration of the speaker's vocal cords and outputs a signal; a noise reduction processing MCU that extracts a voice section according to vocal cord vibration using the output signal of the accelerometer sensor, determines a signal outside the voice section in the output signal of the first and second voice microphone sensor as noise using voice section information, extracts and removes the output signal determined as noise, synthesizes a low-frequency component of the accelerometer sensor and low-frequency components of the first and the second voice microphone sensor at different synthesis ratios based on a level of noise extracted from the output signals of the first and the second voice microphone sensor using voice section information, and restores and outputs a voice signal by adding the synthesized low-frequency components and high-frequency components of the first and the second voice microphone sensor; and a wireless communication module that externally outputs the restored voice signal, wherein in synthesizing the low-frequency component of the accelerometer sensor and the low-frequency components of the first and the second voice microphone sensor, the noise reduction processing MCU causes the low-frequency component of the accelerometer sensor to be further included when the level of the noise extracted from the output signals of the first and the second voice microphone sensor is higher than a reference value, and causes the low-frequency component of the first and the second voice microphone sensor to be further included when the level of the noise extracted from the output signals of the first and the second voice microphone sensor is lower than the reference value.
5 . The high-quality voice signal processing device of claim 4 , wherein the noise reduction processing MCU separates the output signals of the first and the second voice microphone sensor in the voice section into the low-frequency component of the first and the second voice microphone sensor and a high-frequency component of the first and the second voice microphone sensor.
6 . The high-quality voice signal processing device of claim 5 , wherein the noise reduction processing MCU primarily removes noise from the output signals of the first and the second voice microphone sensor using the voice section information, secondarily removes noise from the output signals of the first and second voice microphone sensors from which the noise is primarily removed using a beamforming algorithm, and thirdly removes noise from the output signals of the first and the second voice microphone sensor from which the noise is secondarily removed using the voice section information again.
7 . The high-quality voice signal processing device of claim 4 , wherein the noise reduction processing MCU includes: a voice section extractor for extracting a voice section according to vocal cord vibration using the output signal of the accelerometer sensor; an accelerometer sensor low-frequency component processing unit that processes the low-frequency component of the accelerometer sensor; a noise extraction and removal unit that performs a primary noise extraction and removal using the voice section information, a secondary noise extraction and removal using a beamforming algorithm, and a tertiary noise extraction and removal using the voice section information again on the output signals of the first and the second voice microphone sensor; a noise level determination unit that determines a level of the noise extracted through the first noise extraction and removal in the noise extraction and removal unit; a voice microphone sensor low-frequency component processing unit and a voice microphone sensor high-frequency component processing unit that separate and process the output signals of the first and the second voice microphone sensor on which the tertiary noise extraction and removal has been performed into a low-frequency component and a high-frequency component of the first and the second voice microphone sensor; a synthesis unit that synthesizes the low-frequency component of the accelerometer sensor and the low-frequency components of the first and the second voice microphone sensor at different synthesis ratios based on the noise level determined by the noise level determination unit; and a voice signal restoration output unit that restores and outputs a voice signal by adding the synthesized low-frequency components and the high-frequency components of the first and the second voice microphone sensor.
8 . The high-quality voice signal processing device of claim 7 , wherein the noise extraction and removal unit includes: a first noise extraction and removal unit that extracts and primarily removes signals outside the voice section as noise from the output signals of the first and the second voice microphone sensor using the voice section information; a second noise extraction and removal unit that secondarily removes noise from the output signals of the first and the second voice microphone sensor from which the noise is primarily removed using a beamforming algorithm; and a third noise extraction and removal unit that extracts and thirdly removes signals outside the voice section as noise from the output signals of the first and second voice microphone sensor from which the noise has been secondarily removed using the voice section information again.
9 . A high-quality voice signal processing method through removal of ambient noise based on multi-sensor signal fusion, the method comprising: extracting a voice section according to vocal cord vibration using an output signal of an accelerometer sensor; determining a signal outside the voice section in an output signal of a voice microphone sensor as noise using voice section information, extracting and removing the output signal determined as noise, and separating the output signal of the voice microphone sensor in the voice section into a low-frequency component of the voice microphone sensor and a high-frequency component of the voice microphone sensor; determining a level of the noise extracted from the output signal of the voice microphone sensor; synthesizing a low-frequency component of the accelerometer sensor and the low-frequency component of the voice microphone sensor at different synthesis ratios based on the determined noise level; and restoring and outputting a voice signal by adding the synthesized low-frequency components and the high-frequency component of the voice microphone sensor, wherein in the synthesizing of the low-frequency component of the accelerometer sensor and the low-frequency component of the voice microphone sensor, the low-frequency component of the accelerometer sensor is further included when the level of the noise extracted from the output signal of the voice microphone sensor is higher than a reference value, and the low-frequency component of the voice microphone sensor is further included when the level of the noise extracted from the output signal of the voice microphone sensor is lower than the reference value.
10 . A high-quality voice signal processing method through removal of ambient noise based on multi-sensor signal fusion, the method comprising: extracting a voice section according to vocal cord vibration using an output signal of an accelerometer sensor; determining a signal outside the voice section in output signals of a first and a second voice microphone sensor as noise using voice section information, and extracting and primarily removing the output signal determined as noise; secondarily removing noise from the output signals of the first and the second voice microphone sensor from which the noise has been primarily removed using a beamforming algorithm; determining a signal outside the voice section as noise in the signal from which the noise has been secondarily removed, thirdly removing the signal determined as noise, and separating the output signals of the first and second voice microphone sensors in the voice section into low-frequency components of the first and second voice microphone sensors and high-frequency components of the first and second voice microphone sensors; determining a level of the noise extracted from the output signals of the first and second voice microphone sensors; synthesizing a low-frequency component of the accelerometer sensor and the low-frequency components of the first and second voice microphone sensors at different synthesis ratios based on the determined noise level; and restoring and outputting a voice signal by adding the synthesized low-frequency components and the high-frequency components of the first and second voice microphone sensors, wherein in the synthesizing of the low-frequency component of the accelerometer sensor and the low-frequency components of the first and second voice microphone sensors, the low-frequency component of the accelerometer sensor is further included when the level of the noise extracted from the output signals of the first and second voice microphone sensors is higher than a reference value, and the low-frequency components of the first and second voice microphone sensors are further included when the level of the noise extracted from the output signals of the first and second voice microphone sensors is lower than the reference value.
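The three-stage noise removal for two microphones in claims 6, 8 and 10 (voice-section gating, beamforming, then gating again) can be sketched as follows. The claims name only "a beamforming algorithm". This sketch therefore assumes a fixed delay-and-sum beamformer with a known integer inter-microphone delay, which is plausible for a worn device whose geometry relative to the mouth is fixed. It also assumes that the beamformer collapses the two channels into one. Following claim 7, the noise level is taken from the stage-1 extraction. The names `gate`, `rms`, `delay_and_sum` and `three_stage_denoise` are hypothetical.

```python
import math
from typing import List, Tuple


def gate(x: List[float], mask: List[bool]) -> List[float]:
    """Zero every sample outside the voice section (treated as noise)."""
    return [v if voiced else 0.0 for v, voiced in zip(x, mask)]


def rms(x: List[float]) -> float:
    """Root-mean-square level; 0.0 for an empty list."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def delay_and_sum(x1: List[float], x2: List[float], delay: int) -> List[float]:
    """Fixed delay-and-sum beamformer (assumed): mic 2 is taken to receive the
    voice `delay` samples after mic 1, so x2[i + delay] aligns with x1[i]."""
    out: List[float] = []
    for i, v1 in enumerate(x1):
        j = i + delay
        v2 = x2[j] if 0 <= j < len(x2) else 0.0
        out.append(0.5 * (v1 + v2))
    return out


def three_stage_denoise(mic1: List[float], mic2: List[float], mask: List[bool],
                        delay: int) -> Tuple[List[float], float]:
    """Return (denoised single-channel signal, stage-1 noise level)."""
    # Stage 1: samples outside the voice section are extracted as noise and removed.
    noise = [v for ch in (mic1, mic2) for v, voiced in zip(ch, mask) if not voiced]
    s1, s2 = gate(mic1, mask), gate(mic2, mask)
    # Stage 2: beamform toward the speaker.
    beam = delay_and_sum(s1, s2, delay)
    # Stage 3: the alignment shift can smear voiced samples across section
    # boundaries, so gate again with the same voice section information.
    return gate(beam, mask), rms(noise)
```

For example, with `delay=1` the beamformer leaks half of a voiced sample into the preceding unvoiced position. The third stage removes that leak, which gives one reading of why the claims reapply the voice section information after beamforming. The resulting signal would then go through the band split and ratio-based synthesis described in claim 7.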

Description

CROSS-REFERENCE TO PRIOR APPLICATION

This application claims priority to Korean Patent Application No. 10-2022-0118014 (filed on Sep. 19, 2022), which is hereby incorporated by reference in its entirety.

BACKGROUND

The present disclosure relates to voice signal processing and, more specifically, to a high-quality voice signal processing device and method through removal of ambient noise based on multi-sensor signal fusion. The device and method enable robust voice signal processing in an external noise environment by fusing the signals of an accelerometer sensor (ACC) and a voice microphone sensor (MIC).

In general, a microphone converts a sender's voice into an electrical signal and transmits it to a receiver. Microphones include wired microphones, wireless microphones, and the like. Most are designed to pick up the voice coming out of the user's mouth while mounted on or located near the mouth. General microphones have several drawbacks: they are inconvenient, they pick up excessive noise, they cannot be used when wearing a helmet or dustproof clothing, and they can transmit unclear voice. For these reasons, throat microphones, which transmit voice through the resonance of the vocal cords, are increasingly used by special workers such as security guards and special agents as well as by ordinary people.

Unlike a general microphone, a throat microphone transmits voice signals through the vibration of the vocal cords, so the user does not need to speak loudly. This is useful for security personnel. It is also useful for ordinary people because it can transmit clearer voice signals without noise. However, since a throat microphone collects the vibration signals produced by the vocal cords and converts them into electrical signals, it must be fully isolated from the external environment and must be able to remove noise while collecting those signals.
Accordingly, the throat microphone demands a very high level of technical sophistication.

FIG. 1 shows graphs of the frequency characteristics of a voice microphone and a throat microphone, and FIG. 2 is a configuration diagram showing the principle of noise removal through active noise canceling.

In general, a throat microphone uses an inductive vibration sensor to convert vibration into an electrical signal. The inductive vibration sensor includes a diaphragm, a coil, a permanent magnet, and the like, and a lightweight coil is attached to the diaphragm. When the diaphragm and the coil vibrate together, the coil moves relative to the magnetic field of the permanent magnet at its center. The resulting change in the magnetic field induces a voltage in the coil, and in this way vocal cord vibration is converted into an electrical signal. However, the frequency response of such an inductive vibration sensor decreases in proportion to frequency. As a result, the sensor transmits high-frequency voice components less faithfully than low-frequency components, which lowers voice clarity. Technology using an accelerometer sensor for the throat microphone has also been introduced, but it too has limitations in obtaining a high-quality voice signal.

Meanwhile, active noise canceling technology is used to obtain a high-quality voice signal in a microphone environment. It handles regular low-frequency noise effectively, but it is not effective against irregular high-pitched noise and may even generate noise in certain environments. Accordingly, there is a demand for a new technology that can process a noisy input voice signal to obtain a high-quality voice signal.

PRIOR ART DOCUMENT

Patent Document

(Patent Document 1) Korean Patent Application Publication No. 10-2021-0101644
(Patent Document 2) Korean Patent No. 10-0873094
(Patent Document 3) Korean Patent Application Publication No. 10-2018-0093363

SUMMARY

In view of the above, the present disclosure provides a high-quality voice signal processing device and method through removal of ambient noise based on multi-sensor signal fusion, which enables robust voice signal processing in an external noise environment by fusing the signals of an accelerometer sensor (ACC) and a voice microphone sensor (MIC).

The present disclosure provides a high-quality voice signal processing device and method through removal of ambient noise based on multi-sensor signal fusion, which enables efficient voice signal processing by extracting and removing noise from the output signal of the voice microphone sensor (MIC) using voice section information from the accelerometer sensor (ACC).

The present disclosure provides a high-quality voice signal processing device and method through removal of ambient noise based on multi-sensor signal fusion, which makes it possible to increase the quality of a voice s