CN-121983009-A - KTV song requesting system and method for requesting songs based on microphone voice instruction analysis


Abstract

The application relates to the technical field of intelligent voice recognition and discloses a KTV song-requesting system and method for requesting songs based on microphone voice instruction analysis. While a key on the wireless microphone is triggered, the microphone splits the captured voice signal in real time into a parallel sound-amplification signal stream and command signal stream. The microphone receiver packages the command signal stream into data frames carrying a channel state identifier and audio data, and is connected to the song-requesting set-top box through a dedicated data transmission line. The song-requesting set-top box comprises a synchronous sampling module and a command analysis engine: the synchronous sampling module synchronously samples an echo reference signal according to the channel state identifier, and the command analysis engine performs adaptive echo cancellation and semantic recognition on the audio data based on the echo reference signal so as to execute the song-requesting operation. The method is applied to the system. The application can effectively eliminate interference such as background music and improve the recognition accuracy of voice instructions.
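The abstract does not specify the data frame format or how the channel state identifier is turned into an analysis window. The Python sketch below assumes a hypothetical layout (a 1-byte state flag, a 2-byte sample count, then signed 16-bit little-endian PCM) and treats the first run of frames flagged 1 as the instruction analysis window; `pack_frame`, `parse_frame`, and `analysis_window` are illustrative names, not part of the disclosure.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical frame layout (an assumption, not specified by the patent):
#   byte 0     : channel state identifier (1 = voice command key held, 0 = idle)
#   bytes 1..2 : number of 16-bit samples N (little-endian, unsigned)
#   bytes 3..  : N signed 16-bit little-endian PCM samples
HEADER = struct.Struct("<BH")


@dataclass
class Frame:
    state: int
    samples: Tuple[int, ...]


def pack_frame(state: int, samples: List[int]) -> bytes:
    """Receiver side: wrap a block of command-stream audio with its channel state."""
    return HEADER.pack(state, len(samples)) + struct.pack(f"<{len(samples)}h", *samples)


def parse_frame(buf: bytes) -> Frame:
    """Set-top box side: recover the channel state identifier and the audio data."""
    state, n = HEADER.unpack_from(buf, 0)
    samples = struct.unpack_from(f"<{n}h", buf, HEADER.size)
    return Frame(state, samples)


def analysis_window(frames: List[Frame]) -> Optional[Tuple[int, int]]:
    """Return the [start, end) sample indices of the first run of frames whose
    state flag is 1; this span bounds the instruction analysis window."""
    pos, start = 0, None
    for f in frames:
        if f.state == 1 and start is None:
            start = pos                      # key pressed: window opens here
        elif f.state == 0 and start is not None:
            return (start, pos)              # key released: window closes here
        pos += len(f.samples)
    return (start, pos) if start is not None else None
```

For example, four 160-sample frames with states 0, 1, 1, 0 give the window (160, 480), which is the span of the set-top box's own background-music output that the synchronous sampling module would read as the echo reference.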

Inventors

LIU MINGQIANG
SONG JIAYU

Assignees

湖南四方朋友科技有限公司

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-16

Claims (10)

1. A KTV song-requesting system for requesting songs based on microphone voice command analysis, characterized by comprising: a wireless microphone, comprising a voice command key and a signal distribution circuit, wherein the signal distribution circuit is configured to split the collected voice signal in real time into a parallel sound-amplification signal stream and command signal stream while the voice command key is triggered; a microphone receiver, connected to the wireless microphone through a radio frequency link and configured to package the received command signal stream, through a data communication interface, into a data frame having a channel state identifier and audio data, and to connect to a sound amplification device through an audio output interface to output the sound-amplification signal stream; and a song-requesting set-top box, connected to the data communication interface of the microphone receiver through a dedicated data transmission line and comprising an instruction analysis engine and a synchronous sampling module, wherein the synchronous sampling module is configured to synchronously sample the background music digital signal output by the song-requesting set-top box itself within an instruction analysis window determined from the channel state identifier parsed from the data frame, the background music digital signal serving as an echo reference signal, and the instruction analysis engine is configured to perform adaptive echo cancellation on the audio data parsed from the data frame based on the echo reference signal and to perform semantic recognition on the processed audio data so as to execute the corresponding song-requesting operation.
2. The KTV song-requesting system based on microphone voice command analysis according to claim 1, wherein the wireless microphone further comprises a mute control module controlled by the voice command key, the mute control module being configured to attenuate or block the sound-amplification signal stream when the voice command key is pressed, so that the voice signal is transmitted to the song-requesting set-top box only through the command signal stream.
3. The KTV song-requesting system according to claim 1, wherein the adaptive echo cancellation processing comprises: acquiring an echo reference signal vector $\mathbf{x}(n)$ corresponding to the echo reference signal; inputting the echo reference signal vector $\mathbf{x}(n)$ into an adaptive filter having a weight vector $\mathbf{w}(n)$, wherein the weight vector $\mathbf{w}(n)$ weights the echo reference signal vector $\mathbf{x}(n)$ to simulate the echo path and generate an echo estimation signal $y(n)=\mathbf{w}^{T}(n)\,\mathbf{x}(n)$; subtracting the echo estimation signal from the audio data $d(n)$ of the data frame to obtain a residual signal $e(n)=d(n)-y(n)$, which is used as the target voice signal after echo cancellation; and updating the weight vector of the adaptive filter from the residual signal $e(n)$ and the echo reference signal vector $\mathbf{x}(n)$ using the normalized least mean squares (NLMS) algorithm, with the update formula $\mathbf{w}(n+1)=\mathbf{w}(n)+\frac{\mu\,e(n)\,\mathbf{x}(n)}{\delta+\mathbf{x}^{T}(n)\,\mathbf{x}(n)}$, where $\mathbf{w}(n+1)$ is the updated weight vector, $\mu$ is the step-size factor controlling the convergence speed and stability of the algorithm, and $\delta$ is a regularization constant that prevents the denominator from being zero.
4. The KTV song-requesting system based on microphone voice command analysis according to claim 1, wherein the instruction analysis engine comprises a deep-learning-based instruction recognition model, which takes acoustic features of the echo-cancelled audio data as input and performs joint decoding in combination with a KTV song-requesting domain-biased language model, wherein the domain-biased language model is trained on a dedicated text corpus of the KTV song-requesting domain, the dedicated text corpus comprising a set of song names, singer names, song-requesting control commands, and common spoken-language expressions of those commands.
5. The KTV song-requesting system based on microphone voice command analysis according to claim 4, wherein the joint decoding comprises: obtaining the word sequence probability $P_{AM}(W \mid X)$ calculated by the acoustic model of the instruction recognition model from the input acoustic features $X$; obtaining the domain prior probability $P_{LM}(W)$ assigned by the KTV song-requesting domain-biased language model to the word sequence $W$; and searching, by weighted fusion of the word sequence probability $P_{AM}(W \mid X)$ and the domain prior probability $P_{LM}(W)$, for the word sequence $W^{*}$ with the highest fusion score as the recognition result, with the decision logic $W^{*}=\arg\max_{W}\left[\log P_{AM}(W \mid X)+\lambda \log P_{LM}(W)\right]$, where $\lambda$ is the fusion weight of the domain-biased language model, and $\arg\max_{W}$ denotes a traversal search over all possible word sequences $W$ to find, as the output, the word sequence that maximizes the objective function.
6. The KTV song-requesting system based on microphone voice command analysis according to claim 5, wherein the instruction recognition model comprises: an acoustic feature extraction unit configured to convert the echo-cancelled audio data into a frame-level mel-frequency cepstral coefficient (MFCC) feature sequence; a temporal feature encoder, connected to the acoustic feature extraction unit and comprising a plurality of causal dilated convolution layers, configured to perform deep temporal modeling on the MFCC feature sequence and output a high-dimensional context feature sequence; and an attention decoder, connected to the temporal feature encoder and configured to apply an attention mechanism to the high-dimensional context feature sequence, generate a sequence of candidate tokens, and calculate the word sequence probability $P_{AM}(W \mid X)$.
7. The KTV song-requesting system according to claim 1, wherein the microphone receiver further comprises a collision arbitration module configured, when command signal streams from a plurality of the wireless microphones are received simultaneously, to package the command signal streams into the data frames and forward them to the song-requesting set-top box according to a preset arbitration policy.
8. The KTV song-requesting system based on microphone voice command analysis according to claim 7, wherein the arbitration policy comprises comparing the received signal strength indication (RSSI) values corresponding to the command signal streams and forwarding the command signal streams in order from the highest RSSI value to the lowest.
9. The KTV song-requesting system based on microphone voice command analysis according to claim 1, wherein the dedicated data transmission line uses the USB or I2S communication protocol, and the song-requesting set-top box sends a master clock signal to the microphone receiver through the dedicated data transmission line, so that the audio data sent by the microphone receiver is synchronized with the sampling clock of the synchronous sampling module.
10. A KTV song-requesting method based on microphone voice command analysis, applied to the KTV song-requesting system based on microphone voice command analysis according to any one of claims 1 to 9, characterized in that the method comprises the following steps: when the voice command key on a wireless microphone is triggered, starting the signal distribution circuit to split the voice signal collected by the wireless microphone in real time into a parallel sound-amplification signal stream and command signal stream; outputting, by the microphone receiver, the sound-amplification signal stream to the sound amplification device through the audio output interface, and packaging the received command signal stream, through the data communication interface, into a data frame having a channel state identifier and audio data; sending, by the microphone receiver, the data frame to the song-requesting set-top box through the dedicated data transmission line; determining, by the song-requesting set-top box, an instruction analysis window from the channel state identifier parsed from the data frame, and synchronously sampling the background music digital signal output by the song-requesting set-top box itself within the analysis window to generate an echo reference signal; and performing, by the song-requesting set-top box, adaptive echo cancellation on the audio data parsed from the data frame based on the echo reference signal, performing semantic recognition on the processed audio data, and executing the corresponding song-requesting operation according to the recognition result.
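The NLMS echo canceller described in claim 3 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming sample-by-sample processing with a tapped-delay-line reference vector $\mathbf{x}(n)=[x(n),x(n-1),\dots]$ and illustrative defaults for the tap count, `mu` ($\mu$) and `delta` ($\delta$); the function name `nlms_echo_cancel` is hypothetical, and a production canceller would typically add double-talk detection and block processing.

```python
import numpy as np


def nlms_echo_cancel(d, x, taps=64, mu=0.5, delta=1e-6):
    """Cancel the echo of reference x from the command-stream audio d (NLMS sketch).

    d : command-stream audio (user's voice plus echo of the background music)
    x : echo reference (background music sampled synchronously by the set-top box)
    Returns (e, w): the residual e(n) = d(n) - w(n)^T x(n), i.e. the echo-cancelled
    voice estimate, and the final weight vector w, i.e. the estimated echo path.
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.zeros(taps)        # weight vector w(n), models the echo path
    xbuf = np.zeros(taps)     # reference vector x(n) = [x(n), x(n-1), ..., x(n-taps+1)]
    e = np.zeros_like(d)
    for n in range(len(d)):
        xbuf = np.roll(xbuf, 1)
        xbuf[0] = x[n]
        y = w @ xbuf                                        # echo estimate y(n)
        e[n] = d[n] - y                                     # residual e(n)
        w = w + mu * e[n] * xbuf / (delta + xbuf @ xbuf)    # NLMS weight update
    return e, w
```

In a noiseless simulation where `d` is simply the reference passed through a short echo path (for instance the taps 0.6, -0.3, 0.1), the weights converge toward that path and the residual decays toward zero; with a talker present, the residual would instead carry the voice command. Normalizing by $\delta+\mathbf{x}^{T}\mathbf{x}$ keeps the effective step size independent of the music's loudness, which matters when the background volume changes between songs.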
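The joint decoding of claim 5 can be illustrated, assuming the usual log-linear (shallow-fusion) form $W^{*}=\arg\max_{W}[\log P_{AM}(W \mid X)+\lambda \log P_{LM}(W)]$, by rescoring a short candidate list. This sketch replaces the full search over all word sequences with an n-best list of whole sequences whose acoustic and language-model probabilities are already known; the function name `fuse_decode`, the `floor` probability for sequences the language model does not cover, and the example probabilities are all hypothetical.

```python
import math
from typing import Dict, Tuple

WordSeq = Tuple[str, ...]


def fuse_decode(am_scores: Dict[WordSeq, float],
                lm_scores: Dict[WordSeq, float],
                lam: float = 0.5,
                floor: float = 1e-8) -> WordSeq:
    """Return the candidate W maximizing log P_AM(W|X) + lam * log P_LM(W).

    am_scores : acoustic-model probability P_AM(W|X) for each candidate W
    lm_scores : domain-biased language-model prior P_LM(W); missing entries get `floor`
    lam       : fusion weight of the domain-biased language model
    """
    def fused(w: WordSeq) -> float:
        return math.log(am_scores[w]) + lam * math.log(lm_scores.get(w, floor))

    return max(am_scores, key=fused)
```

For example, if the acoustic model slightly prefers the misrecognition ("next", "son") at 0.45 over ("next", "song") at 0.40, but the domain model assigns them 0.001 and 0.2, then with `lam=0.5` the fused scores are about -4.25 and -1.72, so ("next", "song") wins; with `lam=0.0` the acoustic model alone picks ("next", "son"). This is the effect the domain-biased language model is meant to have on song names and control commands.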
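Claim 6 names causal dilated convolution layers for the temporal feature encoder but gives no layer sizes. As a rough illustration of the technique, the NumPy sketch below zero-pads only on the past side, so output frame t depends only on input frames up to t, and doubles the dilation per layer to widen the receptive field. The 13-dimensional MFCC input, kernel size 3, dilations 1, 2, 4, 8, 32 channels, ReLU activation, random weights, and the function names are assumptions for illustration only.

```python
import numpy as np


def causal_dilated_conv(x, weights, dilation):
    """x: (T, C_in) frame-level features; weights: (K, C_in, C_out).
    Computes y[t] = sum_k x[t - k*dilation] @ weights[k], treating x[t < 0] as 0."""
    T, K = x.shape[0], weights.shape[0]
    pad = (K - 1) * dilation
    # Zero-pad on the past side only, so no output frame sees future frames.
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])
    y = np.zeros((T, weights.shape[2]))
    for k in range(K):
        start = pad - k * dilation          # xp[start + t] == x[t - k*dilation]
        y += xp[start:start + T] @ weights[k]
    return y


def temporal_encoder(x, layer_weights, dilations):
    """Stack of causal dilated convolution layers, each followed by ReLU."""
    h = x
    for w, d in zip(layer_weights, dilations):
        h = np.maximum(causal_dilated_conv(h, w, d), 0.0)
    return h


def receptive_field(kernel_size, dilations):
    """Number of input frames (current plus past) visible to one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


rng = np.random.default_rng(0)
dilations = [1, 2, 4, 8]
channels = [13, 32, 32, 32, 32]             # 13 MFCCs in, 32-dim context features out
layer_weights = [rng.standard_normal((3, channels[i], channels[i + 1])) * 0.1
                 for i in range(len(dilations))]
mfcc = rng.standard_normal((100, 13))       # stand-in for a 100-frame MFCC sequence
context = temporal_encoder(mfcc, layer_weights, dilations)
```

With these settings each output frame summarizes the current frame and the 30 preceding ones (`receptive_field(3, [1, 2, 4, 8])` is 31), and changing a later input frame leaves all earlier outputs untouched, which is what allows the encoder to run while the user is still speaking.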
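The arbitration policy of claim 8 amounts to a descending sort of the simultaneous command streams by RSSI. A minimal sketch, assuming each received command stream is tagged with a microphone ID and an RSSI value in dBm; the `CommandStream` type and `arbitrate` function are hypothetical names, and the patent does not say how ties are broken.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommandStream:
    mic_id: int        # which wireless microphone the stream came from
    rssi_dbm: float    # received signal strength indication, in dBm
    audio: bytes       # command-stream audio payload


def arbitrate(streams: List[CommandStream]) -> List[CommandStream]:
    """Return simultaneous command streams in forwarding order, strongest RSSI first.
    Python's sort is stable (also with reverse=True), so streams with equal RSSI
    keep their arrival order."""
    return sorted(streams, key=lambda s: s.rssi_dbm, reverse=True)


incoming = [CommandStream(1, -60.0, b""),
            CommandStream(2, -45.0, b""),
            CommandStream(3, -72.0, b"")]
order = [s.mic_id for s in arbitrate(incoming)]
```

Here the streams are forwarded in the order microphone 2 (-45 dBm), microphone 1 (-60 dBm), microphone 3 (-72 dBm). Relying on stable sorting gives a simple, deterministic tie-break without extra bookkeeping.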

Description

KTV song requesting system and method for requesting songs based on microphone voice instruction analysis

Technical Field

The application relates to the technical field of intelligent voice recognition, and in particular to a KTV song-requesting system and method for requesting songs based on microphone voice instruction analysis.

Background

With the development of voice control technology, intelligent voice assistants have been widely used in consumer electronics and entertainment devices. In KTV scenarios, users increasingly want to request songs, skip songs, and perform similar operations directly by voice. However, applying this technology in a KTV environment poses distinctive and serious technical challenges.

Conventional voice assistants (e.g., smart speakers) typically use a "hot-word wake-up + continuous listening" mode of operation. This mode has obvious drawbacks. First, the loud background music played continuously in the room, together with the mixed chatter and singing of several people, forms strong ambient noise and interfering echoes that easily drown out the user's voice instructions, so the recognition rate drops sharply. Second, the open, continuous listening mode prevents the system from distinguishing the user's song-requesting commands from unrelated background conversation or lyrics, so false wake-ups and misrecognitions occur easily and disrupt normal singing. Third, having to call out a fixed wake-up word (such as 'little colleagues') before every operation is cumbersome and unnatural in a KTV scenario that requires frequent adjustments, and it breaks the continuity and immersion of singing.
Although the prior art attempts to reduce noise and cancel echo through software algorithms, in a KTV environment with complex sound sources and strong signal coupling, relying only on back-end algorithms yields limited results and cannot fundamentally solve the problem that the instruction signal is already severely contaminated at the input stage. There is therefore a need for a KTV song-requesting solution that, through system-level innovation, achieves accurate, private, and convenient voice instruction interaction in a strongly interfering environment.

Disclosure of Invention

The application aims to provide a KTV song-requesting system and method for requesting songs based on microphone voice instruction analysis, so as to solve the technical problems described in the Background.

To achieve the above purpose, the application discloses the following technical solutions:

In a first aspect, the application discloses a KTV song-requesting system for requesting songs based on microphone voice instruction analysis, comprising: a wireless microphone, comprising a voice command key and a signal distribution circuit, wherein the signal distribution circuit is configured to split the collected voice signal in real time into a parallel sound-amplification signal stream and command signal stream while the voice command key is triggered; a microphone receiver, connected to the wireless microphone through a radio frequency link and configured to package the received command signal stream, through a data communication interface, into a data frame having a channel state identifier and audio data, and to connect to a sound amplification device through an audio output interface to output the sound-amplification signal stream; and a song-requesting set-top box, connected to the data communication interface of the microphone receiver through a dedicated data transmission line and comprising an instruction analysis
engine and a synchronous sampling module, wherein the synchronous sampling module is configured to synchronously sample the background music digital signal output by the song-requesting set-top box itself within an instruction analysis window determined from the channel state identifier parsed from the data frame, the background music digital signal serving as an echo reference signal, and the instruction analysis engine is configured to perform adaptive echo cancellation on the audio data parsed from the data frame based on the echo reference signal and to perform semantic recognition on the processed audio data so as to execute the corresponding song-requesting operation.

Optionally, the wireless microphone further comprises a mute control module controlled by the voice command key, the mute control module being configured to attenuate or block the sound-amplification signal stream when the voice command key is pressed, so that the voice signal is transmitted to the song-requesting set-top box only through the command signal stream.

Optionally, the adaptive echo cancellation processing comprises: acquiring an echo reference signal vector $\mathbf{x}(n)$ corresponding to the echo reference signal; Vector the echo reference signalI