CN-122024708-A - Speech instruction analysis method and system based on mine field knowledge enhancement


Abstract

The application provides a voice instruction analysis method and system based on mine field knowledge enhancement. The method comprises: constructing a mine field knowledge graph; establishing a multi-task joint learning model and training it with voice instructions from a mine operation scene; constructing a dialogue state tracking (DST) module with an LSTM network, and training a voice instruction analysis strategy network with a reinforcement learning algorithm to build a context perception analysis optimization module; and receiving a target voice instruction to be analyzed, carrying out collaborative analysis processing on the target voice instruction using the mine field knowledge graph, the multi-task joint learning model and the context perception analysis optimization module once they are constructed and trained, and outputting an instruction analysis result. By fusing the mine field knowledge graph with the multi-task joint learning model, the method achieves efficient analysis and dynamic optimization of voice instructions in a mine operation scene.

Inventors

ZOU HONGTAO
BAI XUYANG
NING ZHENXING
YAN XIAOWEN
TANG HAOWEN
WANG CHAO

Assignees

中煤科工集团信息技术有限公司

Dates

Publication Date: 20260512
Application Date: 20251224

Claims (10)

1. A voice command analysis method based on mine field knowledge enhancement, characterized by comprising the following steps: constructing a mine field knowledge graph, wherein the mine field knowledge graph comprises professional terms in a mine operation scene; establishing a multi-task joint learning model, and training the multi-task joint learning model by utilizing voice instructions in a mine operation scene, wherein an input layer of the multi-task joint learning model is a dual-channel encoder, the dual-channel encoder is used for fusing various voice acoustic features and text transcription data, and the multi-task joint learning model is used for carrying out voice instruction type classification and parameter regression; constructing a dialogue state tracking (DST) module through a long short-term memory (LSTM) network, and training a voice instruction analysis strategy network based on a reinforcement learning algorithm to build a context perception analysis optimization module; and receiving a target voice command to be analyzed currently, carrying out collaborative analysis processing on the target voice command through the trained mine field knowledge graph, multi-task joint learning model and context perception analysis optimization module, and outputting a command analysis result.
2. The method according to claim 1, wherein constructing the mine field knowledge graph comprises: collecting various types of text data in the mine field; processing the text data through a natural language processing model, and extracting entity information; performing relationship labeling based on the entity information, and defining triplet relationships in mine operation; and constructing the knowledge graph based on the triplet relationships, and storing the constructed knowledge graph in a graph database.
3. The method of claim 1, wherein the dual-channel encoder comprises a convolutional neural network (CNN) for extracting speech features and a BERT model for extracting text features; the middle layer of the multi-task joint learning model is a multi-layer Transformer model, and the output layer of the multi-task joint learning model comprises an instruction type classification head and a parameter regression head.
4. The method according to claim 3, wherein training the multi-task joint learning model using voice instructions in a mine operation scene comprises: collecting mine scene voice instruction data, and extracting Mel-frequency cepstral coefficient (MFCC) features and automatic speech recognition (ASR) transcribed text from the mine scene voice instruction data to construct a mine voice instruction data set; pre-training the convolutional neural network (CNN) for extracting speech features on a general speech data set; and combining the pre-trained CNN with the BERT model, and fine-tuning the multi-task joint learning model with the mine voice instruction data set.
5. The method of claim 4, wherein fine-tuning the multi-task joint learning model with the mine voice instruction data set comprises: taking the weighted sum of the instruction type cross-entropy loss and the mean squared error (MSE) loss as the loss function of the fine-tuning stage; and jointly training the instruction type classification head and the parameter regression head based on the loss function.
6. The method of claim 1, wherein building the context perception analysis optimization module comprises: constructing a coding model through a long short-term memory (LSTM) network, and training the coding model by utilizing historical instruction data and historical equipment state data to obtain the DST module for outputting a context vector; defining a state space, an action space and a reward function of the reinforcement learning strategy; and, in a simulated mine environment, training the strategy network with a proximal policy optimization (PPO) algorithm so as to optimize the analysis strategy selection logic of the strategy network.
7. The method of claim 1, wherein carrying out collaborative analysis processing on the target voice command and outputting a command analysis result comprises: preprocessing the target voice command, and extracting Mel-frequency cepstral coefficient (MFCC) features and automatic speech recognition (ASR) transcribed text data; performing multimodal encoding and feature fusion on the MFCC features and the ASR transcribed text data; analyzing the fused multimodal features, outputting the instruction type through an instruction type classification head, and outputting key parameters through a parameter regression head to obtain a preliminary analysis result; querying the mine field knowledge graph, and performing professional semantic verification and correction on the preliminary analysis result; and calling the DST module to output the context vector of the target voice command, and carrying out ambiguity correction on the corrected analysis result in combination with the optimal analysis strategy selected by the voice instruction analysis strategy network to obtain a final analysis result.
8. A voice command analysis system based on mine field knowledge enhancement, characterized by comprising the following modules: a first construction module, used for constructing a mine field knowledge graph, wherein the mine field knowledge graph comprises professional terms in a mine operation scene; a training module, used for establishing a multi-task joint learning model and training the multi-task joint learning model by utilizing voice commands in a mine operation scene, wherein an input layer of the multi-task joint learning model is a dual-channel encoder, the dual-channel encoder is used for fusing various voice acoustic features and text transcription data, and the multi-task joint learning model is used for carrying out voice command type classification and parameter regression; a second construction module, used for constructing a dialogue state tracking (DST) module through a long short-term memory (LSTM) network and training a voice instruction analysis strategy network based on a reinforcement learning algorithm so as to build a context perception analysis optimization module; and an analysis processing module, used for receiving the target voice instruction currently to be analyzed, carrying out collaborative analysis processing on the target voice instruction through the trained mine field knowledge graph, multi-task joint learning model and context perception analysis optimization module, and outputting an instruction analysis result.
9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the voice instruction analysis method based on mine field knowledge enhancement of any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the voice instruction analysis method based on mine field knowledge enhancement according to any one of claims 1-7.
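
As an illustration of the fine-tuning objective recited in claim 5, the following minimal Python sketch computes, for a single sample, a weighted sum of an instruction type cross-entropy loss and an MSE loss. The weights `alpha` and `beta`, the function names and the example values are assumptions made for illustration; the application does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy of the classification head for one sample."""
    return -math.log(softmax(logits)[target_index])


def mse(predicted, target):
    """Mean squared error of the regression head for one sample."""
    if not target or len(predicted) != len(target):
        raise ValueError("predicted and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def joint_loss(type_logits, type_label, param_pred, param_target,
               alpha=1.0, beta=0.5):
    """Weighted sum of both losses; alpha and beta are hypothetical weights."""
    return (alpha * cross_entropy(type_logits, type_label)
            + beta * mse(param_pred, param_target))


if __name__ == "__main__":
    # Hypothetical sample: 3 instruction types, 2 regressed parameters.
    loss = joint_loss([2.0, 0.5, -1.0], 0, [0.8, 12.0], [1.0, 10.0])
    print(f"joint loss: {loss:.4f}")
```

Because both heads contribute to one scalar, a single backward pass in an actual training framework would update the shared encoder and both heads together, which is one way to read the joint training described in claim 5.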

Description

Speech instruction analysis method and system based on mine field knowledge enhancement

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a voice instruction analysis method and system based on mine field knowledge enhancement.

Background

As the degree of automation of mine operation improves, voice command analysis has become an important link in the mine operation process. Because mine operation is high-risk, the accuracy of voice instruction analysis directly affects equipment operation safety and emergency response efficiency. If a voice command is misinterpreted, serious accidents may occur.

In the related art, voice command analysis in a mine operation scene generally uses a general-purpose language model to process single-modality information, either voice or text. In practical application, however, such methods cannot accurately analyze the technical terms of the mine scene, suffer from insufficient semantic understanding and a tendency to omit information, and cannot meet the requirements of current mine operation for real-time performance, domain expertise and dynamic adaptability in voice command analysis.

Disclosure of Invention

The present application aims to solve, at least to some extent, one of the technical problems in the related art.

Therefore, a first object of the present application is to provide a voice command analysis method based on mine field knowledge enhancement, which fuses the mine field knowledge graph with a multi-task joint learning model to achieve efficient analysis and dynamic optimization of voice commands in a mine operation scene, solves the problems of inaccurate semantic understanding, missing multimodal fusion and weak context perception in the related art, and achieves high-precision, highly robust voice command analysis.
A second object of the application is to provide a voice command analysis system based on mine field knowledge enhancement.

A third object of the present application is to propose an electronic device.

A fourth object of the present application is to propose a computer-readable storage medium.

In order to achieve the above objects, a first aspect of the present application provides a voice command analysis method based on mine field knowledge enhancement, comprising the following steps: constructing a mine field knowledge graph, wherein the mine field knowledge graph comprises professional terms in a mine operation scene; establishing a multi-task joint learning model, and training the multi-task joint learning model by utilizing voice instructions in a mine operation scene, wherein an input layer of the multi-task joint learning model is a dual-channel encoder, the dual-channel encoder is used for fusing various voice acoustic features and text transcription data, and the multi-task joint learning model is used for carrying out voice instruction type classification and parameter regression; constructing a dialogue state tracking (DST) module through a long short-term memory (LSTM) network, and training a voice instruction analysis strategy network based on a reinforcement learning algorithm to build a context perception analysis optimization module; and receiving a target voice command to be analyzed currently, carrying out collaborative analysis processing on the target voice command through the trained mine field knowledge graph, multi-task joint learning model and context perception analysis optimization module, and outputting a command analysis result.
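
The role of the mine field knowledge graph in checking a parsed command can be sketched as follows. This is a minimal illustration, assuming the graph is held as (head, relation, tail) triplets in memory rather than in a graph database; the equipment names, the `supports_action` relation, the fuzzy-matching cutoff and all function names are hypothetical and are not taken from the application. Term correction here uses the standard-library `difflib.get_close_matches`, a simple stand-in for whatever semantic verification the actual system performs.

```python
from difflib import get_close_matches

# Hypothetical triplets (head, relation, tail); illustrative only.
TRIPLES = [
    ("shearer", "located_at", "working face"),
    ("scraper conveyor", "located_at", "working face"),
    ("shearer", "supports_action", "start"),
    ("shearer", "supports_action", "stop"),
    ("scraper conveyor", "supports_action", "stop"),
]


class MineKnowledgeGraph:
    """In-memory triplet index standing in for a graph database."""

    def __init__(self, triples):
        self.index = {}
        for head, relation, tail in triples:
            self.index.setdefault((head, relation), set()).add(tail)
        self.entities = sorted({head for head, _, _ in triples})

    def correct_term(self, term, cutoff=0.6):
        """Map a possibly mis-transcribed term to the closest known entity."""
        matches = get_close_matches(term, self.entities, n=1, cutoff=cutoff)
        return matches[0] if matches else None

    def supports(self, entity, action):
        """Check whether the graph records this action for the entity."""
        return action in self.index.get((entity, "supports_action"), set())


def verify(kg, parsed):
    """Correct the device term, then validate the action against the graph."""
    entity = kg.correct_term(parsed["device"])
    if entity is None:
        return {**parsed, "valid": False}
    return {**parsed, "device": entity,
            "valid": kg.supports(entity, parsed["action"])}


kg = MineKnowledgeGraph(TRIPLES)

if __name__ == "__main__":
    # "sheerer" mimics an ASR transcription error for "shearer".
    print(verify(kg, {"device": "sheerer", "action": "stop"}))
```

A production system would likely query a graph database and use richer semantic matching; the sketch only shows how domain triplets can both repair a term and reject a command the graph does not support.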
Optionally, constructing the mine field knowledge graph comprises: collecting various types of text data in the mine field; processing the text data through a natural language processing model, and extracting entity information; performing relationship labeling based on the entity information, and defining triplet relationships in mine operation; and constructing the knowledge graph based on the triplet relationships, and storing the constructed knowledge graph in a graph database.

Optionally, the dual-channel encoder comprises a convolutional neural network (CNN) for extracting speech features and a BERT model for extracting text features, wherein the middle layer of the multi-task joint learning model is a multi-layer Transformer model, and the output layer of the multi-task joint learning model comprises an instruction type classification head and a parameter regression head.

Optionally, training the multi-task joint learning model using voice instructions in a mine operation scene comprises: collecting mine scene voice instruction data, and extracting Mel-frequency cepstral coefficient (MFCC) features and automatic speech recognition (ASR) transcribed text from the mine scene voice instruction data to construct a mine voice instruction data set; pre-training the convolutional neural network (CNN) for extracting speech features on a general speech data set; combining the pre-trained convolutional neural network (CNN) with