CN-122024709-A - Dynamic voice instruction analysis and distribution method


Abstract

The invention relates to the technical field of voice recognition and discloses a dynamic voice command analysis and distribution method. The method comprises the following steps: collecting multi-source voice command input data to generate an original voice command data set; performing voice signal preprocessing on the original voice command data set to generate standardized voice command data; and extracting voice features to generate voice command feature vectors. By generating standardized voice command data and voice command feature vectors through voice signal preprocessing and feature extraction during dynamic voice command analysis, the method can detect voice signal quality in real time, eliminate interference from environmental noise, ensure the clarity and reliability of voice command input, reduce command recognition deviation, and improve the accuracy and robustness of voice command analysis. At the same time, the analysis model is updated with execution feedback data, so that the system adaptively handles multi-task conflicts and sequencing problems, improving the accuracy of voice command execution and overall system performance.

Inventors

  • XIAO XINWEN

Assignees

  • 武汉大博恩科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-02-11

Claims (10)

  1. A dynamic voice command parsing and distribution method, characterized by comprising the following steps: S1, acquiring multi-source voice command input data to generate an original voice command data set; S2, performing voice signal preprocessing on the original voice command data set to generate standardized voice command data; S3, extracting voice features from the standardized voice command data based on a deep learning model to generate voice command feature vectors; S4, analyzing the command intention according to the voice command feature vectors to generate voice command intention classification data; S5, performing command target matching based on the voice command intention classification data and a preset command allocation strategy to generate target device allocation data; S6, performing real-time priority evaluation on the target device allocation data to generate a voice command priority queue; S7, performing dynamic resource allocation according to the voice command priority queue to generate command execution scheduling data; S8, executing voice commands based on the command execution scheduling data, collecting execution feedback information, and generating command execution feedback data; and S9, updating a voice command parsing model according to the command execution feedback data to generate optimized voice command parsing parameters (a runnable skeleton of this S1-S9 flow follows the claims).
  2. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the original voice command data set in S1 includes the following steps: S11, collecting user voice command input through a multi-modal sensor array, wherein the multi-modal sensor array comprises a microphone array and an environmental noise sensor; S12, performing timestamp marking and source identification on the collected voice command input to generate an original voice command data set with spatio-temporal information (sketched after the claims).
  3. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the standardized voice command data in S2 includes the following steps: S21, performing noise reduction and normalization on the original voice command data set to remove environmental noise and voice distortion; S22, segmenting the effective voice segments with a voice activity detection algorithm to generate standardized voice command data, wherein the sampling frequency of the standardized voice command data is 16 kHz (sketched after the claims).
  4. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the voice command feature vectors in S3 includes the following steps: S31, constructing a convolutional neural network model comprising a plurality of convolutional layers and pooling layers; S32, inputting the standardized voice command data into the convolutional neural network model, extracting Mel-frequency cepstral coefficients and voice spectrum features, and generating voice command feature vectors (sketched after the claims).
  5. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the voice command intention classification data in S4 includes the following steps: S41, performing sequence modeling on the voice command feature vectors based on an attention mechanism model to generate voice command context semantic data; S42, matching the voice command context semantic data against a predefined intention dictionary to generate voice command intention classification data, wherein the intention classes comprise control commands, query commands, and configuration commands (sketched after the claims).
  6. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the target device allocation data in S5 comprises the following steps: S51, establishing a device resource state database that stores the available devices and their resource load information; S52, matching the voice command intention classification data with the device resource states using a greedy algorithm to generate target device allocation data (sketched after the claims).
  7. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the voice command priority queue in S6 comprises the following steps: S61, calculating priority scores according to the urgency of each voice command and the device resource occupancy rate to generate an initial priority list; S62, ordering the initial priority list with a weighted polling algorithm to generate the voice command priority queue (sketched after the claims).
  8. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the command execution scheduling data in S7 includes the following steps: S71, dynamically adjusting the resource allocation strategy based on the real-time system load to generate adaptive scheduling parameters; S72, distributing the voice commands to the target devices according to the adaptive scheduling parameters to generate command execution scheduling data (sketched after the claims).
  9. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the command execution feedback data in S8 includes the following steps: S81, monitoring the process of a target device executing a voice command, and collecting execution success rate and delay data; S82, generating command execution feedback data, wherein the command execution feedback data comprises an execution result and an error log (sketched after the claims).
  10. The dynamic voice command parsing and distribution method according to claim 1, wherein generating the optimized voice command parsing parameters in S9 includes the following steps: S91, updating the weight parameters of the convolutional neural network model according to the command execution feedback data using an incremental learning algorithm; S92, periodically verifying the accuracy of the updated voice command parsing model to generate optimized voice command parsing parameters (sketched after the claims).
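
A minimal runnable Python skeleton of the S1-S9 data flow of claim 1. The stage bodies here are trivial stand-ins, since the claim fixes only the data flow between stages, not their implementations.

    from typing import Any, Callable, List

    Stage = Callable[[Any], Any]

    def run_pipeline(raw_input: Any, stages: List[Stage]) -> Any:
        """Thread command data through stages S1..S9 in order."""
        data = raw_input
        for stage in stages:
            data = stage(data)
        return data

    # Identity stand-ins for S1..S9; a real system would substitute the
    # preprocessing, CNN, attention, and scheduling components sketched below.
    stages = [lambda d, name=f"S{i}": {**d, "last_stage": name}
              for i in range(1, 10)]
    print(run_pipeline({"audio": b""}, stages))  # last_stage == "S9"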
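
For S11-S12 (claim 2), a minimal sketch of timestamp marking and source identification at capture time; the field names are illustrative assumptions, not terms from the patent.

    import time
    from dataclasses import dataclass, field

    @dataclass
    class RawCommand:
        audio: bytes            # raw PCM from the microphone array
        source_id: str          # which sensor or channel captured it
        noise_level_db: float   # reading from the environmental noise sensor
        timestamp: float = field(default_factory=time.time)

    def tag_input(audio: bytes, source_id: str, noise_db: float) -> RawCommand:
        # S12: attach spatio-temporal metadata to the captured utterance.
        return RawCommand(audio=audio, source_id=source_id,
                          noise_level_db=noise_db)

    sample = tag_input(b"\x00\x01", "mic_array_0", -42.5)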
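
For S21-S22 (claim 3), a rough sketch of peak normalization followed by a simple energy-based voice activity detector at the claimed 16 kHz sampling rate. The patent names voice activity detection but no specific algorithm, so the frame size and energy threshold here are illustrative assumptions.

    import numpy as np

    SR = 16_000   # claimed sampling rate (S22)
    FRAME = 320   # 20 ms frames at 16 kHz -- an illustrative choice

    def normalize(signal: np.ndarray) -> np.ndarray:
        # S21: peak normalization after (assumed) denoising.
        peak = np.max(np.abs(signal))
        return signal / peak if peak > 0 else signal

    def energy_vad(signal: np.ndarray, threshold: float = 0.02) -> np.ndarray:
        # S22: keep only frames whose RMS energy exceeds the threshold.
        n = len(signal) - len(signal) % FRAME
        frames = signal[:n].reshape(-1, FRAME)
        rms = np.sqrt(np.mean(frames ** 2, axis=1))
        voiced = frames[rms > threshold]
        return voiced.reshape(-1) if voiced.size else np.array([])

    audio = normalize(np.random.randn(SR).astype(np.float32) * 0.1)
    speech = energy_vad(audio)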
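
For S31-S32 (claim 4), a minimal sketch of Mel-frequency cepstral coefficient extraction feeding a small convolutional network, assuming librosa and PyTorch. The layer sizes are illustrative, as the patent does not specify them, and here the MFCCs are computed first and the network operates on them.

    import numpy as np
    import librosa
    import torch
    import torch.nn as nn

    def mfcc_features(signal: np.ndarray, sr: int = 16_000) -> torch.Tensor:
        # S32: Mel-frequency cepstral coefficients, shape (13, frames).
        m = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
        return torch.from_numpy(m).float().unsqueeze(0).unsqueeze(0)

    class FeatureCNN(nn.Module):
        # S31: a few convolutional and pooling layers -> fixed-length vector.
        def __init__(self, out_dim: int = 128):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),
            )
            self.head = nn.Linear(32 * 4 * 4, out_dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(self.body(x).flatten(1))

    feats = FeatureCNN()(mfcc_features(np.random.randn(16_000).astype(np.float32)))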
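
For S41-S42 (claim 5), a minimal sketch of self-attention sequence modeling followed by matching against a predefined intention dictionary, assuming PyTorch; the random prototype vectors stand in for a trained dictionary.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    DIM = 128
    # S41: self-attention over the command feature sequence.
    attn = nn.MultiheadAttention(embed_dim=DIM, num_heads=4, batch_first=True)

    # S42: intention dictionary -- one prototype per claimed intent class
    # (control / query / configuration); random placeholders here.
    intents = {name: torch.randn(DIM) for name in ("control", "query", "configure")}

    def classify(feature_seq: torch.Tensor) -> str:
        # feature_seq: (batch=1, seq_len, DIM) voice command feature vectors.
        ctx, _ = attn(feature_seq, feature_seq, feature_seq)
        pooled = ctx.mean(dim=1).squeeze(0)  # context semantic vector, (DIM,)
        scores = {k: F.cosine_similarity(pooled, v, dim=0).item()
                  for k, v in intents.items()}
        return max(scores, key=scores.get)

    print(classify(torch.randn(1, 10, DIM)))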
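
For S51-S52 (claim 6), a minimal sketch of greedy matching against a device resource state table; the device entries are illustrative stand-ins for the claimed database.

    from typing import Optional

    # S51: illustrative stand-in for the device resource state database.
    devices = [
        {"id": "lamp-1", "supports": {"control"}, "load": 0.2},
        {"id": "hub-1", "supports": {"query", "configure"}, "load": 0.5},
        {"id": "hub-2", "supports": {"query", "configure"}, "load": 0.1},
    ]

    def greedy_assign(intent: str) -> Optional[str]:
        # S52: greedily pick the least-loaded device that supports the intent.
        capable = [d for d in devices if intent in d["supports"]]
        if not capable:
            return None
        best = min(capable, key=lambda d: d["load"])
        best["load"] += 0.1  # account for the newly assigned command
        return best["id"]

    print(greedy_assign("query"))  # "hub-2": lowest load among capable devices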
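
For S61-S62 (claim 7), a minimal sketch of a priority score combining command urgency and device occupancy. The 0.7/0.3 weights are illustrative assumptions, and the claimed weighted polling algorithm is approximated here by a simple weighted-score sort.

    from dataclasses import dataclass

    @dataclass
    class QueuedCommand:
        cmd_id: str
        urgency: float    # 0..1, from intent analysis (S61)
        occupancy: float  # 0..1, target device resource occupancy (S61)

    def priority(c: QueuedCommand) -> float:
        # Urgent commands rank higher; heavily occupied targets rank lower.
        return 0.7 * c.urgency - 0.3 * c.occupancy

    def build_queue(commands):
        # S62: order the initial list by weighted score, highest first.
        return sorted(commands, key=priority, reverse=True)

    q = build_queue([
        QueuedCommand("c1", urgency=0.9, occupancy=0.8),
        QueuedCommand("c2", urgency=0.4, occupancy=0.1),
    ])
    print([c.cmd_id for c in q])  # c1 (score 0.39) before c2 (score 0.25)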
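
For S71-S72 (claim 8), a minimal sketch in which the dispatch batch size shrinks as the measured system load rises; the load bounds and batch limit are assumptions.

    def adaptive_batch_size(system_load: float, max_batch: int = 8) -> int:
        # S71: fewer commands dispatched per cycle as the system gets busier.
        system_load = min(max(system_load, 0.0), 1.0)
        return max(1, round(max_batch * (1.0 - system_load)))

    def dispatch(queue, system_load: float):
        # S72: send the next batch of queued commands to their target devices.
        return queue[: adaptive_batch_size(system_load)]

    print(adaptive_batch_size(0.75))  # -> 2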
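
For S81-S82 (claim 9), a minimal sketch that times a command's execution and packages the result, latency, and any error log into a feedback record; the field names are illustrative.

    import time
    from dataclasses import dataclass

    @dataclass
    class ExecutionFeedback:
        cmd_id: str
        success: bool
        latency_ms: float
        error_log: str = ""

    def execute_with_feedback(cmd_id: str, action) -> ExecutionFeedback:
        # S81: monitor execution, measuring latency and catching failures.
        start = time.perf_counter()
        try:
            action()
            ok, err = True, ""
        except Exception as exc:
            ok, err = False, repr(exc)  # S82: keep the error log
        latency = (time.perf_counter() - start) * 1000.0
        return ExecutionFeedback(cmd_id, ok, latency, err)

    fb = execute_with_feedback("c1", lambda: None)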
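
For S91-S92 (claim 10), a minimal sketch of one incremental gradient step from feedback-derived labels, followed by a periodic accuracy check, assuming PyTorch; the linear head and random tensors are placeholders for the claimed convolutional model and real feedback data.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 3))  # stand-in for the CNN parser head
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def incremental_update(features: torch.Tensor, labels: torch.Tensor) -> float:
        # S91: one weight update from execution-feedback examples.
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

    @torch.no_grad()
    def validate(features: torch.Tensor, labels: torch.Tensor) -> float:
        # S92: periodic accuracy check on held-out commands.
        preds = model(features).argmax(dim=1)
        return (preds == labels).float().mean().item()

    incremental_update(torch.randn(16, 128), torch.randint(0, 3, (16,)))
    print(validate(torch.randn(16, 128), torch.randint(0, 3, (16,))))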

Description

Dynamic voice instruction analysis and distribution method

Technical Field

The invention relates to the technical field of voice recognition, and in particular to a dynamic voice instruction analysis and distribution method.

Background

Speech recognition technology, also known as automatic speech recognition, aims to convert the lexical content of human speech into computer-readable input. At present, because sound conditions in practical application environments are complex and changeable, environmental noise and interference are mixed into the collected raw audio data during real-time parsing of multi-source voice commands. Whether the quality of a voice signal meets the requirements of high-precision parsing cannot be judged in real time; when the input voice contains severe distortion and reverberation, command feature extraction deviates and the accuracy of intention recognition is difficult to guarantee. A dynamic voice command parsing and distribution method is therefore proposed to solve the above problems.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a dynamic voice command parsing and distribution method, which solves the problem identified in the background art that whether the quality of a voice signal meets the requirements of high-precision parsing cannot be judged in real time.

To achieve the above purpose, the invention provides a dynamic voice command parsing and distribution method comprising the following steps: S1, acquiring multi-source voice command input data to generate an original voice command data set; S2, performing voice signal preprocessing on the original voice command data set to generate standardized voice command data; S3, extracting voice features from the standardized voice command data based on a deep learning model to generate voice command feature vectors; S4, analyzing the command intention according to the voice command feature vectors to generate voice command intention classification data; S5, performing command target matching based on the voice command intention classification data and a preset command allocation strategy to generate target device allocation data; S6, performing real-time priority evaluation on the target device allocation data to generate a voice command priority queue; S7, performing dynamic resource allocation according to the voice command priority queue to generate command execution scheduling data; S8, executing voice commands based on the command execution scheduling data, collecting execution feedback information, and generating command execution feedback data; and S9, updating a voice command parsing model according to the command execution feedback data to generate optimized voice command parsing parameters.

Preferably, generating the original voice command data set in S1 includes the following steps: S11, collecting user voice command input through a multi-modal sensor array, wherein the multi-modal sensor array comprises a microphone array and an environmental noise sensor; S12, performing timestamp marking and source identification on the collected voice command input to generate an original voice command data set with spatio-temporal information.
Preferably, generating the standardized voice command data in S2 includes the following steps: S21, performing noise reduction and normalization on the original voice command data set to remove environmental noise and voice distortion; S22, segmenting the effective voice segments with a voice activity detection algorithm to generate standardized voice command data, wherein the sampling frequency of the standardized voice command data is 16 kHz.

Preferably, generating the voice command feature vectors in S3 includes the following steps: S31, constructing a convolutional neural network model comprising a plurality of convolutional layers and pooling layers; S32, inputting the standardized voice command data into the convolutional neural network model, extracting Mel-frequency cepstral coefficients and voice spectrum features, and generating voice command feature vectors.

Preferably, generating the voice command intention classification data in S4 includes the following steps: S41, performing sequence modeling on the voice command feature vectors based on an attention mechanism model to generate voice command context semantic data; S42, matching the voice command context semantic data against a predefined intention dictionary to generate voice command intention classification data, wherein the intention classes comprise control commands, query commands, and configuration commands.

Preferably, generating the target device allocation data in S5 includes the following steps: S51, establishing a device resource state database that stores the available devices and their resource load information; S52, matching the voice command intention classification data with the device resource states using a greedy algorithm to generate target device allocation data.