KR-102962287-B1 - VOICE RECOGNITION DEVICE, METHOD AND COMPUTER PROGRAM FOR CONTROLLING OUTPUT VOLUME


Abstract

A voice recognition device for controlling output volume may include an output gain value calculation unit that calculates an output gain value of a media playback device and an output gain value of a voice recognition device based on whether the media playback device is operated, a reference gain deviation value calculation unit that calculates a reference gain deviation value between the media playback device and the voice recognition device based on a reference output gain value of the media playback device and a reference output gain value of the voice recognition device, and a volume control unit that controls the output volume of the voice recognition device based on the output gain value of the media playback device, the output gain value of the voice recognition device, and the reference gain deviation value.
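The relationship between the two output gain values and the reference gain deviation value can be illustrated with a minimal sketch. The abstract does not give a formula, so everything below is an assumption made for illustration: gains are expressed in dB, the deviation is taken as media gain minus device gain, the correction restores the reference deviation, and the function names and the 3 dB default threshold are hypothetical.

```python
def reference_gain_deviation(ref_media_gain_db: float, ref_device_gain_db: float) -> float:
    """Reference deviation between the two devices (assumed: media minus device, in dB)."""
    return ref_media_gain_db - ref_device_gain_db


def volume_correction_db(
    media_gain_db: float,
    device_gain_db: float,
    ref_deviation_db: float,
    deviation_threshold_db: float = 3.0,  # hypothetical preset threshold
) -> float:
    """Return the dB change to apply to the voice recognition device's output.

    Assumption: no correction is made while the current gain difference stays
    below the threshold; otherwise the device output is moved so that the
    difference returns to the reference deviation.
    """
    difference = media_gain_db - device_gain_db
    if abs(difference) < deviation_threshold_db:
        return 0.0
    return difference - ref_deviation_db


# Reference state: media at -20 dB, device at -26 dB (deviation of 6 dB).
ref_dev = reference_gain_deviation(-20.0, -26.0)
# The media playback device is later raised to -10 dB while the device stays at -26 dB.
correction = volume_correction_db(-10.0, -26.0, ref_dev)
```

In this example the device output would be raised by `correction` dB so that the gap between the two outputs returns to the reference deviation. Whether the actual device uses a signed or absolute difference is not stated in the text shown here.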

Inventors

정재훈
김종엽
김희경

Assignees

주식회사 케이티

Dates

Publication Date: 2026-05-08
Application Date: 2021-03-15

Claims (19)

A voice recognition device for controlling output volume, comprising: an output gain value calculation unit that calculates an output gain value of a media playback device and an output gain value of the voice recognition device based on whether the media playback device is operating; a reference gain deviation value calculation unit that calculates a reference gain deviation value between the media playback device and the voice recognition device based on a reference output gain value of the media playback device and a reference output gain value of the voice recognition device; a volume control unit that controls the output volume of the voice recognition device based on the output gain value of the media playback device, the output gain value of the voice recognition device, and the reference gain deviation value; and an echo return loss value calculation unit that calculates an echo return loss value (ERLE, Echo Return Loss Enhancement) based on microphone input level information for sound input into a microphone of the voice recognition device during a preset time and output level information for a user voice from which echo has been removed.
The voice recognition device of claim 1, wherein, when the media playback device is operating, the output gain value calculation unit calculates the output gain value of the media playback device based on microphone input level information and echo reference sound information regarding the playback sound of the media playback device input into the microphone of the voice recognition device.
The voice recognition device of claim 1, wherein, when the media playback device is not operating, the output gain value calculation unit calculates the output gain value of the voice recognition device based on microphone input level information and echo reference sound information for sound input into the microphone of the voice recognition device.
(Deleted)
The voice recognition device of claim 1, further comprising a judgment unit that determines, based on a comparison between the echo return loss value and a predefined echo return loss threshold, the proportion of the sound played by the voice recognition device or the sound played by the media playback device among the sound input to the microphone of the voice recognition device.
The voice recognition device of claim 5, further comprising a reference output gain value setting unit that sets the output gain value of the media playback device as a reference output gain value when the media playback device is operating and the echo return loss value exceeds the predefined echo return loss threshold.
The voice recognition device of claim 5, further comprising a reference output gain value setting unit that sets the output gain value of the voice recognition device as a reference output gain value when the media playback device is not operating and the echo return loss value exceeds the predefined echo return loss threshold.
The voice recognition device of claim 1, wherein the volume control unit controls the output volume of the voice recognition device based on a difference value between the output gain value of the media playback device and the output gain value of the voice recognition device and on the reference gain deviation value when the difference value is greater than or equal to a preset volume deviation threshold.
The voice recognition device of claim 1, wherein the output range of the voice recognition device is wider than the output range of the media playback device.
A voice recognition method for controlling output volume, performed by a voice recognition device, comprising: a step of calculating an output gain value of a media playback device and an output gain value of the voice recognition device based on whether the media playback device is operating; a step of calculating a reference gain deviation value between the media playback device and the voice recognition device based on a reference output gain value of the media playback device and a reference output gain value of the voice recognition device; a step of controlling the output volume of the voice recognition device based on the output gain value of the media playback device, the output gain value of the voice recognition device, and the reference gain deviation value; and a step of calculating an echo return loss value (ERLE, Echo Return Loss Enhancement) based on microphone input level information for sound input into a microphone of the voice recognition device during a preset time and output level information for a user voice from which echo has been removed.
The voice recognition method of claim 10, wherein the step of calculating the output gain value of the media playback device and the output gain value of the voice recognition device comprises a step of calculating, when the media playback device is operating, the output gain value of the media playback device based on microphone input level information and echo reference sound information regarding the playback sound of the media playback device input into the microphone of the voice recognition device.
The voice recognition method of claim 10, wherein the step of calculating the output gain value of the media playback device and the output gain value of the voice recognition device comprises a step of calculating, when the media playback device is not operating, the output gain value of the voice recognition device based on microphone input level information and echo reference sound information for sound input into the microphone of the voice recognition device.
(Deleted)
The voice recognition method of claim 10, further comprising a step of determining, based on a comparison between the echo return loss value and a predefined echo return loss threshold, the proportion of the sound played by the voice recognition device or the sound played by the media playback device among the sound input to the microphone of the voice recognition device.
The voice recognition method of claim 14, further comprising a step of setting the output gain value of the media playback device as a reference output gain value when the media playback device is operating and the echo return loss value exceeds the predefined echo return loss threshold.
The voice recognition method of claim 14, further comprising a step of setting the output gain value of the voice recognition device as a reference output gain value when the media playback device is not operating and the echo return loss value exceeds the predefined echo return loss threshold.
The voice recognition method of claim 10, wherein the step of controlling the output volume of the voice recognition device comprises a step of controlling the output volume of the voice recognition device based on a difference value between the output gain value of the media playback device and the output gain value of the voice recognition device and on the reference gain deviation value when the difference value is greater than or equal to a preset volume deviation threshold.
The voice recognition method of claim 10, wherein the output range of the voice recognition device is wider than the output range of the media playback device.
A computer program stored on a computer-readable recording medium, comprising a sequence of instructions for controlling output volume performed by a voice recognition device, wherein the computer program, when executed by a computing device, causes the computing device to: calculate an output gain value of a media playback device and an output gain value of the voice recognition device based on whether the media playback device is operating; calculate a reference gain deviation value between the media playback device and the voice recognition device based on a reference output gain value of the media playback device and a reference output gain value of the voice recognition device; control the output volume of the voice recognition device based on the output gain value of the media playback device, the output gain value of the voice recognition device, and the reference gain deviation value; and calculate an echo return loss value (ERLE, Echo Return Loss Enhancement) based on microphone input level information for sound input into a microphone of the voice recognition device during a preset time and output level information for a user voice from which echo has been removed.
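The ERLE calculation and the reference output gain selection recited in the claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: ERLE is computed in its conventional form as the level of the microphone input minus the level of the echo-removed output, in dB; levels are taken as mean-square power over one block of samples; and the function names, the power floor, and the return format are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def level_db(samples: Sequence[float], floor: float = 1e-12) -> float:
    """Mean-square level of a block of samples in dB (floor avoids log of zero)."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(power, floor))


def erle_db(mic_samples: Sequence[float], echo_removed_samples: Sequence[float]) -> float:
    """ERLE in dB: microphone input level minus echo-removed output level."""
    return level_db(mic_samples) - level_db(echo_removed_samples)


def select_reference_gain(
    erle: float,
    erle_threshold_db: float,
    media_operating: bool,
    media_gain_db: float,
    device_gain_db: float,
) -> Optional[Tuple[str, float]]:
    """Pick which output gain becomes a reference output gain value.

    Follows the conditions in the dependent claims: a reference is set only
    when ERLE exceeds the threshold; the media playback device's gain is used
    while it is operating, otherwise the voice recognition device's gain.
    Returns None when ERLE does not exceed the threshold.
    """
    if erle <= erle_threshold_db:
        return None
    if media_operating:
        return ("media", media_gain_db)
    return ("device", device_gain_db)


# Example: the echo canceller reduces the block amplitude by a factor of ten.
mic_block = [1.0] * 100
clean_block = [0.1] * 100
measured = erle_db(mic_block, clean_block)  # about 20 dB
choice = select_reference_gain(measured, 10.0, True, -12.0, -18.0)
```

A high ERLE means most of the microphone input was echo from a known playback source, which is why the sketch treats it as the condition for trusting the measured gain as a reference.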

Description

Voice recognition device, voice recognition method and computer program for controlling output volume
This invention relates to a voice recognition device, a voice recognition method, and a computer program for controlling output volume.
Recently, the range of applications for voice recognition devices has been diversifying from AI terminals with built-in speakers (e.g., Amazon Echo) to AI set-top boxes. An AI set-top box that supports conversational voice services has a first audio output interface through the media playback device of an existing set-top box and a second audio output interface through a built-in speaker. The first audio output interface is for connection with a media playback device (or external speaker) and supports playback of video and audio from the set-top box on the media playback device. The second audio output interface is the built-in speaker of the AI set-top box, which delivers streamed music or voice recognition results to the user as TTS voice.
The output volume range of the built-in speaker is greater than the output volume range of the media playback device, and this difference in range inevitably causes a difference in the amount of volume change per step of the volume table. For example, if the built-in speaker supports an output volume range of -60 to 0 dB and the media playback device supports an output volume range of -30 to 0 dB, the built-in speaker will change by an average of 2 dB (=60/30) per step, whereas the media playback device will change by 1 dB (=30/30) per step. In this case, even though a volume change expressed in dB is perceived fairly uniformly, the per-step change differs by a factor of two, so even if the two outputs are matched to a similar loudness at one specific step, they will differ in loudness at the remaining steps.
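The step-size arithmetic in the example above can be checked with a short sketch. It assumes a volume table that is linear in dB with 30 steps for both outputs, which is what the divisions 60/30 and 30/30 imply; the function names are hypothetical.

```python
def step_size_db(min_db: float, max_db: float, steps: int) -> float:
    """Average volume change per step for a table that is linear in dB."""
    return (max_db - min_db) / steps


def level_at_step(min_db: float, max_db: float, steps: int, step: int) -> float:
    """Output level in dB at a given step index (0 = quietest, steps = loudest)."""
    return min_db + step * step_size_db(min_db, max_db, steps)


STEPS = 30  # assumed table size, implied by the 60/30 and 30/30 in the text

speaker_step = step_size_db(-60.0, 0.0, STEPS)  # built-in speaker: 2 dB per step
media_step = step_size_db(-30.0, 0.0, STEPS)    # media playback device: 1 dB per step

# Both outputs reach 0 dB at the top step, but they drift apart below it.
for step in (30, 15, 0):
    speaker = level_at_step(-60.0, 0.0, STEPS, step)
    media = level_at_step(-30.0, 0.0, STEPS, step)
    print(f"step {step:2d}: speaker {speaker:6.1f} dB, media {media:6.1f} dB, gap {media - speaker:4.1f} dB")
```

Under these assumptions the two outputs match only at the top step; at step 15 the gap is 15 dB and at step 0 it is 30 dB.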
Because of these volume differences, users may experience inconveniences such as having to readjust the volume of the media playback device, or finding the response sound from the AI set-top box's built-in speaker too quiet to hear or too loud.
FIG. 1 is a configuration diagram of an output volume control system according to one embodiment of the present invention.
FIG. 2 is a block diagram of the voice recognition device illustrated in FIG. 1 according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a voice recognition processing method performed by a voice recognition device according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for controlling the output volume of a voice recognition device according to an embodiment of the present invention.
Embodiments of the present invention are described below with reference to the attached drawings so that those skilled in the art can easily implement the invention. However, the present invention may be embodied in various different forms and is not limited to the embodiments described herein. Furthermore, in order to explain the present invention clearly, parts unrelated to the explanation have been omitted from the drawings, and similar parts are denoted by similar reference numerals throughout the specification.
Throughout the specification, when a part is described as being "connected" to another part, this includes not only cases where they are "directly connected," but also cases where they are "electrically connected" with other components interposed between them. Furthermore, when a part is described as "including" a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components.
In this specification, the term "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both.
Additionally, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. Some of the operations or functions described in this specification as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may also be performed by a terminal or device connected to that server.
Hereinafter, specific details for implementing the present invention will be described with reference to the attached configuration diagrams and process flowcharts.
FIG. 1 is a configuration diagram of an output volume control system according to one embodiment of the present invention. Referring to FIG. 1, the output volume control system may include a voice recognition device (100) and a media playback device (110). However, since the output volume control system of FIG. 1 is merely one embodiment of the present invention, the present invention is not limited to FIG. 1 and may be configured differently from FIG. 1 according to various embodiments of the present invention.
The voice recognition device (100) is a device that combines an existing artificial intelligence sp