CN-121983088-A - Oral dialogue quality assessment method based on AI
Abstract
The invention discloses an AI-based spoken dialogue quality assessment method in the technical field of artificial intelligence and natural language processing. The method comprises: receiving a spoken dialogue audio stream, extracting text content information and acoustic prosody information, and associating and fusing them by timestamp to generate a multimodal dialogue data sequence; and, based on the comprehensive quality evaluation score and the dynamic memory bank, locating contradictory nodes and their contexts in the dialogue logical-relationship graph, constructing a local consistency reconstruction task, and updating a memory-enhanced neural network with that task. Updating the memory-enhanced neural network with the local consistency reconstruction task lets the network learn adaptively and evolve continuously from contradictions found in actual dialogues, improving consistency detection accuracy and dynamic adaptability.
Inventors
- HU YANG
Assignees
- 长春职业技术大学
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-02-06
Claims (10)
- 1. An AI-based spoken dialogue quality assessment method, characterized by comprising the following steps: receiving a spoken dialogue audio stream, extracting text content information and acoustic prosody information, and associating and fusing the text content information and the acoustic prosody information by timestamp to generate a multimodal dialogue data sequence; inputting the multimodal dialogue data sequence into a memory-enhanced neural network, retrieving associated historical memory through an attention mechanism, performing consistency checking and contradiction detection, generating a consistency detection label, and storing the consistency detection label in association with the corresponding dialogue data to form a dynamic memory bank; performing syntactic analysis on the dialogue data carrying consistency detection labels in the dynamic memory bank, screening out complete declarative sentences as nodes, analyzing the semantic associations and logical dependencies among the nodes, and constructing a dialogue logical-relationship graph; performing joint analysis on the dialogue logical-relationship graph and the dynamic memory bank, and calculating a dialogue structure consistency index to obtain a comprehensive quality evaluation score; and, based on the comprehensive quality evaluation score and the dynamic memory bank, locating contradictory nodes and their contexts in the dialogue logical-relationship graph, constructing a local consistency reconstruction task, and updating the memory-enhanced neural network with the local consistency reconstruction task.
- 2. The AI-based spoken dialogue quality assessment method according to claim 1, wherein generating the multimodal dialogue data sequence comprises the following steps: performing pre-emphasis, framing, and windowing on the spoken dialogue audio stream to generate an audio frame sequence; converting the audio frame sequence into a timestamped word-by-word text stream through a speech decoder to generate a text content information sequence; calculating fundamental frequency, energy, and duration parameters from the audio frame sequence to generate an acoustic prosody feature vector sequence; and aligning and concatenating the text content information sequence and the acoustic prosody feature vector sequence along the time dimension to generate the multimodal dialogue data sequence.
- 3. The AI-based spoken dialogue quality assessment method according to claim 1, wherein inputting the multimodal dialogue data sequence into the memory-enhanced neural network and retrieving associated historical memory through the attention mechanism comprises the following steps: inputting the multimodal dialogue data sequence into the memory-enhanced neural network, performing contextual feature encoding, and outputting a current utterance vector; and calculating, through the attention layer of the memory-enhanced neural network, the association strength between the current utterance vector and the existing history vectors in the memory store, and aggregating the existing history vectors according to the association strength to generate a comprehensive historical context vector.
- 4. The AI-based spoken dialogue quality assessment method according to claim 1, wherein forming the dynamic memory bank comprises the following steps: fusing the comprehensive historical context vector with the current utterance vector, and mapping the result to a contradiction probability through a nonlinear transformation layer of the memory-enhanced neural network to output a contradiction confidence score; comparing the contradiction confidence score with a preset consistency judgment threshold to determine whether the current utterance logically contradicts the historical content, and generating a consistency detection label; and binding the consistency detection label to the corresponding current utterance vector and timestamp, and storing it in the memory store of the memory-enhanced neural network as a new memory vector to form the dynamic memory bank.
- 5. The AI-based spoken dialogue quality assessment method according to claim 1, wherein performing syntactic analysis on the dialogue data carrying consistency detection labels in the dynamic memory bank and screening out complete declarative sentences as nodes comprises the following steps: extracting all consistency detection labels and the corresponding text content information from the dynamic memory bank to form a labeled original corpus; and performing rule-based pattern matching for a complete subject-predicate-object structure on the labeled original corpus, screening out declarative sentences with a complete grammatical structure, and defining them as logical nodes.
- 6. The AI-based spoken dialogue quality assessment method according to claim 1, wherein analyzing the semantic associations and logical dependencies among the nodes and constructing the dialogue logical-relationship graph comprises the following steps: calculating the pairwise semantic similarity between the current utterance vectors of the logical nodes, and performing semantic association judgment against a preset similarity threshold to generate a sequence of strongly semantically associated node pairs; for each node pair in that sequence, classifying the logical relationship by combining the timestamp order and the consistency detection labels in the dynamic memory bank to determine a logical relationship type; and constructing the dialogue logical-relationship graph with the logical nodes as vertices and the logical relationship types as directed edges.
- 7. The AI-based spoken dialogue quality assessment method according to claim 1, wherein the joint analysis of the dialogue logical-relationship graph and the dynamic memory bank comprises the following steps: extracting the node in-degree, node out-degree, and path connectivity features of the dialogue logical-relationship graph to generate a graph structure feature vector; counting the distribution and frequency of the consistency detection labels in the dynamic memory bank over the nodes of the dialogue logical-relationship graph to generate a label distribution feature vector; and concatenating and normalizing the graph structure feature vector and the label distribution feature vector to generate a joint analysis feature vector.
- 8. The AI-based spoken dialogue quality assessment method according to claim 1, wherein the comprehensive quality evaluation score is obtained by performing multidimensional projection mapping on the joint analysis feature vector, calculating a logical connectivity coefficient, a contradiction distribution density, and a conflict propagation strength respectively, and combining them by linear weighting.
- 9. The AI-based spoken dialogue quality assessment method according to claim 1, wherein locating the contradictory nodes and their contexts in the dialogue logical-relationship graph comprises the following steps: screening out low-quality subgraphs in the dialogue logical-relationship graph according to the comprehensive quality evaluation score, and locating the contradictory nodes by combining the consistency detection labels marked as the contradiction category in the dynamic memory bank with their timestamps; and extracting the logical nodes directly connected to the contradictory nodes, together with the corresponding dialogue text, according to the edge connections in the dialogue logical-relationship graph and the temporal order information in the dynamic memory bank, to form contradiction context fragments.
- 10. The AI-based spoken dialogue quality assessment method according to claim 1, wherein updating the memory-enhanced neural network with the local consistency reconstruction task comprises the following steps: randomly masking the contradictory node content in the contradiction context fragments to generate a masked local dialogue sequence; inputting the masked local dialogue sequence into the memory-enhanced neural network, restoring the masked contradictory node content based on the associated information in the dynamic memory bank to obtain a restored text sequence, comparing the restored text sequence with the contradictory node content, and calculating a reconstruction loss value; and updating the parameters of the memory-enhanced neural network through the backpropagation algorithm according to the reconstruction loss value to generate an updated memory-enhanced neural network.
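The graph construction and degree features recited in claims 6 and 7 can be illustrated with a minimal Python sketch. The claims do not fix a similarity measure, a threshold value, or a relation-classification rule, so cosine similarity, the 0.8 threshold, the two relation names, and the names `LogicNode`, `build_graph`, and `degree_features` are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class LogicNode:
    """One complete declarative sentence kept as a graph vertex (hypothetical layout)."""
    node_id: int
    timestamp: float      # start time of the utterance, in seconds
    vector: list          # current utterance vector produced by the network
    contradictory: bool   # consistency detection label for this utterance


def cosine(a, b):
    """Cosine similarity, assumed here as the semantic similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_graph(nodes, sim_threshold=0.8):
    """Return directed edges (src_id, dst_id, relation) for strongly associated pairs."""
    edges = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if cosine(a.vector, b.vector) < sim_threshold:
                continue  # not a strong semantic association
            # The edge points from the earlier utterance to the later one.
            src, dst = (a, b) if a.timestamp <= b.timestamp else (b, a)
            # Toy rule (assumption): the later node's label decides the edge type.
            relation = "contradicts" if dst.contradictory else "supports"
            edges.append((src.node_id, dst.node_id, relation))
    return edges


def degree_features(nodes, edges):
    """Per-node (in_degree, out_degree), one part of the graph structure features."""
    feats = {n.node_id: [0, 0] for n in nodes}
    for src, dst, _ in edges:
        feats[src][1] += 1  # outgoing edge from src
        feats[dst][0] += 1  # incoming edge to dst
    return {k: tuple(v) for k, v in feats.items()}


nodes = [
    LogicNode(0, 1.0, [1.0, 0.0], False),
    LogicNode(1, 4.5, [0.9, 0.1], True),
    LogicNode(2, 7.0, [0.0, 1.0], False),
]
edges = build_graph(nodes)
print(edges)
print(degree_features(nodes, edges))
```

In this toy input only nodes 0 and 1 are similar enough to be linked, so the graph has a single edge and node 2 stays isolated. A fuller version would add the path connectivity and label distribution features of claim 7 before any scoring.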
Description
Oral dialogue quality assessment method based on AI
Technical Field
The invention relates to the technical field of artificial intelligence and natural language processing, and in particular to an AI-based spoken dialogue quality assessment method.
Background
In intelligent voice interaction, online education, and dialogue analysis, demand for automatic assessment of spoken dialogue quality keeps growing. Conventional methods typically apply natural language processing (NLP) to the transcribed dialogue text, or evaluate it separately alongside acoustic features. They usually use a language model for semantic understanding of the text and score it with shallow indicators such as keyword matching, sentiment analysis, and topic consistency, while analyzing prosodic features of the speech (such as intonation and pauses) independently as an auxiliary indicator of fluency. This provides some insight into quality at the single-modality or simple-fusion level and forms the basis of current dialogue quality assessment.
However, these methods are limited when it comes to in-depth quality assessment of complex, multi-turn spoken dialogues. They lack effective modeling and persistent memory of dialogue history, so the context of the assessment is fragmented and logical consistency and contradictions across turns are hard to detect. They also focus on shallow statistics over isolated sentences, do not build a logical structure graph of the dialogue, and cannot quantify the semantic consistency and discourse coherence of the dialogue as a whole.
Disclosure of Invention
The present invention has been made in view of the above-described problems in the prior art.
Therefore, the invention provides an AI-based spoken dialogue quality assessment method that addresses the difficulty of detecting logical consistency and of quantifying overall semantic consistency. To solve the above technical problems, the invention provides the following technical solution. The method comprises: receiving a spoken dialogue audio stream, extracting text content information and acoustic prosody information, and associating and fusing them by timestamp to generate a multimodal dialogue data sequence; inputting the multimodal dialogue data sequence into a memory-enhanced neural network, retrieving associated historical memory through an attention mechanism, performing consistency checking and contradiction detection to generate consistency detection labels, and storing the consistency detection labels in association with the corresponding dialogue data to form a dynamic memory bank; performing syntactic analysis on the dialogue data carrying consistency detection labels in the dynamic memory bank, screening out complete declarative sentences as nodes, and analyzing the semantic associations and logical dependencies among the nodes to construct a dialogue logical-relationship graph; performing joint analysis on the dialogue logical-relationship graph and the dynamic memory bank, and calculating a dialogue structure consistency index to obtain a comprehensive quality evaluation score; and, based on the comprehensive quality evaluation score and the dynamic memory bank, locating contradictory nodes and their contexts in the dialogue logical-relationship graph, constructing a local consistency reconstruction task, and updating the memory-enhanced neural network with the local consistency reconstruction task.
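The timestamp-based association and fusion in the first step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `fuse_by_timestamp`, the tuple layouts for words and prosody frames, and the choice to average the frames that fall inside each word's time span are all hypothetical.

```python
from statistics import mean


def fuse_by_timestamp(words, frames):
    """Attach mean F0, mean energy, and duration to each timestamped word.

    words:  list of (text, start_s, end_s), e.g. from a speech decoder
    frames: list of (time_s, f0_hz, energy) prosody measurements
    Both layouts are assumptions made for this sketch.
    """
    fused = []
    for text, start, end in words:
        # Collect the prosody frames whose time falls inside the word span.
        inside = [(f0, en) for t, f0, en in frames if start <= t < end]
        if inside:
            f0_mean = mean(f0 for f0, _ in inside)
            energy_mean = mean(en for _, en in inside)
        else:
            # No frame falls inside the span; mark prosody as missing.
            f0_mean = energy_mean = None
        fused.append({
            "text": text,
            "start": start,
            "end": end,
            "duration": round(end - start, 3),
            "f0": f0_mean,
            "energy": energy_mean,
        })
    return fused


words = [("hello", 0.00, 0.30), ("world", 0.30, 0.50)]
frames = [(0.0, 100.0, 0.5), (0.1, 110.0, 0.7), (0.2, 120.0, 0.6),
          (0.3, 200.0, 0.2), (0.4, 220.0, 0.4)]
for item in fuse_by_timestamp(words, frames):
    print(item)
```

Each output record pairs one word with the prosody observed during it, which is one plausible shape for an element of the multimodal dialogue data sequence. Half-open spans (`start <= t < end`) keep a frame on a word boundary from being counted twice.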
As a preferred embodiment of the AI-based spoken dialogue quality assessment method of the invention, generating the multimodal dialogue data sequence comprises the following steps: performing pre-emphasis, framing, and windowing on the spoken dialogue audio stream to generate an audio frame sequence; converting the audio frame sequence into a timestamped word-by-word text stream through a speech decoder to generate a text content information sequence; calculating fundamental frequency, energy, and duration parameters from the audio frame sequence to generate an acoustic prosody feature vector sequence; and aligning and concatenating the text content information sequence and the acoustic prosody feature vector sequence along the time dimension to generate the multimodal dialogue data sequence. As a preferred embodiment of the AI-based spoken dialogue quality assessment method of the invention, inputting the multimodal dialogue data sequence into the memory-enhanced neural network and retrieving associated historical memory through the attention mechanism comprises the following steps: Inputting the multi-modal dialogue data sequence into a memory enhancement neural networ