CN-122000039-A - Interactive text generation method, system and application for child autism risk assessment


Abstract

The invention discloses an interactive text generation method for child autism risk assessment, comprising the following steps: first, collecting knowledge in the autism domain and initializing a knowledge database; second, capturing the user's facial expressions and voice in real time from multi-modal sensors, converting them into text features, and fusing the features; third, loading the fused features into a large language model and, in combination with a knowledge-base retrieval-augmentation technique, completing one or more rounds of interaction through an analysis chain, an interview chain and a question chain to generate interactive text. The method iterates on and optimizes the interactive text generation strategy according to the dialogue context and the multi-modal features, summarizes the scale scores with the large language model according to the dialogue context, and outputs a user portrait and the interview results. The invention also provides a generation system that implements the method, together with related applications, and has broad application value.

Inventors

CAO GUITAO
CHEN ZIYUAN
CHEN CHEN
GU JINYU

Assignees

East China Normal University (华东师范大学)

Dates

Publication Date: 20260508
Application Date: 20241108

Claims (10)

1. An interactive text generation method for child autism risk assessment, characterized by comprising the following steps: step one, collecting knowledge in the autism domain and initializing a knowledge database; step two, capturing the user's facial expressions and voice in real time from multi-modal sensors, converting them into text features, and then fusing the features; and step three, loading the fused features into a large language model and, in combination with a knowledge-base retrieval-augmentation technique, completing one or more rounds of interaction through an analysis chain, an interview chain and a question chain to generate interactive text.
2. The method of claim 1, wherein in step one, the domain knowledge comprises medical data about autism, interview outlines and related literature, wherein the medical data comprises autism-related expertise and case data; the interview outlines comprise the Modified Checklist for Autism in Toddlers (M-CHAT) and the Autism Diagnostic Interview-Revised (ADI-R), which guide the interview procedure; and the related literature comprises documents related to autism in a broad sense, including books and stories; and/or the knowledge data is managed with a vector database: the domain knowledge is converted into vector embeddings and stored in the vector database, and retrieval is performed by approximate vector matching, so that the large language model can access and use background knowledge and specific data, supporting accurate and meaningful interactive text generation.
3. The method of claim 1, wherein in step two, the user's facial expressions and voice are collected; the user's facial expression is analyzed by an emotion recognition algorithm to obtain emotion information, and/or the user's voice is analyzed by an audio analysis algorithm to obtain voice features comprising tone, speaking rate and emotion; and the emotion information, the voice features and the semantic information are fused by a text fusion algorithm and stored in a vector database in the form of text feature vectors.
4. The method of claim 3, wherein the emotion information, the voice features and the semantic information are each converted into text features; after the emotion information is obtained by the emotion recognition algorithm, the emotion vector is mapped to an emotion description expressed as text; after the voice features are obtained by the audio analysis algorithm, they are mapped to a voice description expressed as text; the emotion information is obtained by the facial expression analysis algorithm in OpenFace and a ResNet-based emotion classifier; the voice features are captured and extracted with the librosa library; and semantic analysis is performed by the pre-trained language model RoBERTa after the speech is converted into text by a Wav2Vec 2.0 model.
5. The method of claim 1, wherein in step three, the analysis chain judges the validity of the input fused features: if the fused features are valid, the flow proceeds to the subsequent interview analysis, and if they are not valid, the flow ends directly; the interview chain constructs a user portrait based on the user's responses and the interview history; and the question chain poses a question to the user based on the current strategy in combination with the multi-modal features, the user portrait and the professional scale, and the question is output as the interactive text.
6. The method of claim 5, wherein the analysis chain invokes the large language model through prompt engineering for wording normalization and semantic judgment, performs preliminary processing of the user input using predefined prompts, parses the user's text input, and determines whether it meets the interview requirements; the interview chain retrieves the professional scale from the database, analyzes the user's answers in combination with the large language model, and updates the user portrait; the question chain follows a reinforcement-learning-driven dialogue strategy combined with the Q-learning method to generate the next optimal question for the user according to the current user feedback, the user portrait and the progress through the scale, and returns interactive text to the user until the interview flow ends; and/or the questioning strategy is dynamically adjusted according to the user's feedback and state, and the effect of each question is evaluated by updating the Q value; and/or reasoning over the professional scale information and the user portrait is performed through the ReAct logic of LangChain, helping the system understand the user's current state and needs in real time.
7. The method of claim 1, wherein, during interactive text generation, iterating on and optimizing the interactive text generation strategy according to the dialogue context and the multi-modal features comprises performing strategy learning by the Q-learning method, with continuous iterative updates to achieve optimal action selection and maximize the expected long-term cumulative reward; the Q-learning method optimizes the strategy by iteratively updating the Q value, which represents the expected long-term cumulative reward obtainable by selecting action $a_t$ in a given state $s_t$, as follows: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \left[ R(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$, where $\eta$ is the learning rate; $\gamma$ is the discount factor, representing the present value of future rewards; $\max_{a'} Q(s_{t+1}, a')$ is the highest Q value obtainable by selecting the optimal action $a'$ in the next state $s_{t+1}$, reflecting potential future rewards; and $R(s_t, a_t)$ is the reward function, designed around the user's feedback and the coherence of the dialogue, for evaluating the effect of each action; the reward obtained at each dialogue time $t$ when taking action $a_t$ in state $s_t$ and transitioning to the next state is: $R(s_t, a_t) = R_{\text{context}}(s_t, a_t) + \alpha R_{\text{multi-modal}}(s_t, a_t)$, where $R_{\text{context}}$ is the reward obtained from the context features, $R_{\text{multi-modal}}$ is the reward obtained from the user's multi-modal feedback features, and $\alpha$ is a weight coefficient adjusting the relative importance of the two parts; through policy iteration and optimization, the policy model is continually updated to select the optimal action $a^*$ in each state: $a^* = \arg\max_{a} Q(s, a)$; and long-term feedback data of the user is collected and managed using the LangChain framework, and policy iterations and updates are performed.
8. The method of claim 1, wherein after the interview ends, the scale scores are summarized by the large language model based on the dialogue context, and once the user interaction is complete a comprehensive report is output containing the user portrait, the interview history and the scale statistics.
9. An interactive text generation system for implementing the interactive text generation method according to any one of claims 1 to 8, wherein the system comprises a data management module, a feature extraction and fusion module, a dynamic weight adjustment module, a text generation module, and a strategy learning and optimization module; the data management module stores and manages the related medical data, the interview outlines and the domain knowledge base; the feature extraction and fusion module extracts multi-modal features from facial expressions and voice and fuses them; the dynamic weight adjustment module adjusts the weights of the different feature sources in vector fusion to optimize the fusion result; the text generation module generates the autism risk assessment interactive text by combining a pre-trained large language model with a knowledge-base retrieval-augmentation technique; and the strategy learning and optimization module continuously iterates on and optimizes the interactive text generation strategy through a reinforcement learning algorithm.
10. Use of the interactive text generation method of any one of claims 1-8, or of the interactive text generation system of claim 9, in interactive text generation for autism risk assessment, dynamic interviews for child behavior analysis, adaptive question generation in parental interviews, and personalized feedback in autism rehabilitation training.
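
For illustration only, the Q-value update and the two-part reward described in claim 7 can be sketched as a small tabular learner. This is a minimal sketch and not the patented implementation: the class name `QLearner`, the state and action names, and the default values of the learning rate, discount factor and weight coefficient are all assumptions introduced here.

```python
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with a context reward plus a weighted multi-modal reward."""

    def __init__(self, actions, eta=0.1, gamma=0.9, alpha=0.5):
        self.actions = list(actions)
        self.eta = eta        # learning rate (assumed default)
        self.gamma = gamma    # discount factor (assumed default)
        self.alpha = alpha    # weight of the multi-modal reward (assumed default)
        self.q = defaultdict(float)  # (state, action) -> Q value, 0.0 if unseen

    def reward(self, r_context, r_multimodal):
        # R = R_context + alpha * R_multi-modal
        return r_context + self.alpha * r_multimodal

    def update(self, state, action, r_context, r_multimodal, next_state):
        # Q(s,a) <- Q(s,a) + eta * [R + gamma * max_a' Q(s',a') - Q(s,a)]
        r = self.reward(r_context, r_multimodal)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = r + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.eta * td_error
        return self.q[(state, action)]

    def best_action(self, state):
        # Greedy policy: a* = argmax_a Q(s, a)
        return max(self.actions, key=lambda a: self.q[(state, a)])


# Hypothetical question categories used as actions, and hypothetical state names.
learner = QLearner(actions=["ask_social", "ask_communication"])
new_q = learner.update(
    "start", "ask_social", r_context=1.0, r_multimodal=0.5, next_state="after_social"
)
```

In this sketch each action stands for a category of question; a real system would need its own state encoding (for example, scale progress plus a summary of the user portrait) and its own reward signals.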

Description

Interactive text generation method, system and application for child autism risk assessment

Technical Field

The invention belongs to the technical field of machine learning and computer applications, and relates to an interactive text generation method, system and application for child autism risk assessment.

Background

Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects children's social interaction, communication and behavior. Early diagnosis and intervention are critical to improving the quality of life of autistic children. Traditional diagnostic methods rely primarily on observation and questionnaires administered by specialist doctors, which are highly subjective and require significant time and effort. Because the etiology of autism is not yet clear, there is as yet no effective treatment. Apart from medication, existing autism training mainly relies on one-to-one behavior correction and simple rehabilitation games. Problems such as long training periods (most affected children need to take part in training for life), high medical costs, a shortage of medical staff, uneven standards among medical institutions, and crude auxiliary treatment tools are increasingly prominent.

Disclosure of Invention

To overcome the deficiencies of the prior art, the invention aims to provide an interactive text generation method, system and application for child autism risk assessment. To address these prominent problems, advanced information science techniques such as control technology and artificial intelligence are combined with auxiliary treatment to develop a remote interactive risk assessment system for autistic children, with the goal of improving the accuracy of autism risk assessment in children and providing a new tool for the diagnosis and treatment of autistic children.
The invention aims to provide an interactive text generation method for child autism risk assessment that achieves automated and objective autism risk assessment by combining professional interview scales with multi-modal features, improving assessment efficiency and accuracy. Information is collected according to the professional scales used in the child autism field and the multi-modal features of the user in order to support autism diagnosis. In the existing diagnostic process for autistic children, professional scales such as the M-CHAT (Modified Checklist for Autism in Toddlers) and the ADI-R (Autism Diagnostic Interview-Revised) are an important basis for a doctor's diagnosis, reflecting the child's condition across several domains such as social interaction, communication and repetitive behavior. The invention uses professional scales together with large language models to interview parents and carry out a preliminary risk assessment of child autism through multiple rounds of interactive conversation. The method aims to improve the flexibility, accuracy and user-friendliness of interview question generation and interaction during diagnosis by integrating multi-modal data analysis, dynamic weight adjustment, text generation, and strategy learning and optimization. The system captures the facial expressions, voice and text input of the child's parents in real time and dynamically generates interactive text suited to autism risk assessment based on the professional scales and the domain knowledge.
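
As an illustration of how expression and voice signals might be turned into text before being passed to a language model, the following minimal sketch maps an emotion score vector and two voice measurements to short descriptive sentences and joins them with the transcript. It is an assumption-laden example rather than the method of the invention: the label set, the numeric thresholds, and the names `Observation`, `describe_emotion`, `describe_voice` and `fuse` are all hypothetical, and the inputs are assumed to come from upstream expression, audio and speech-to-text components that are not shown.

```python
from dataclasses import dataclass

# Hypothetical label set; a real emotion classifier defines its own classes.
EMOTION_LABELS = ("neutral", "happy", "sad", "anxious")


@dataclass
class Observation:
    emotion_scores: tuple    # one score per entry in EMOTION_LABELS
    speech_rate_wpm: float   # speaking rate in words per minute
    pitch_hz: float          # average pitch in hertz
    transcript: str          # text from an upstream speech-to-text step


def describe_emotion(scores):
    """Map an emotion score vector to a one-sentence text description."""
    if len(scores) != len(EMOTION_LABELS):
        raise ValueError("expected one score per emotion label")
    top = max(range(len(scores)), key=lambda i: scores[i])
    return f"Facial expression: {EMOTION_LABELS[top]} (score {scores[top]:.2f})."


def describe_voice(rate_wpm, pitch_hz):
    """Map voice measurements to text; the thresholds are illustrative only."""
    if rate_wpm > 160:
        pace = "fast"
    elif rate_wpm < 110:
        pace = "slow"
    else:
        pace = "moderate"
    tone = "raised" if pitch_hz > 200 else "level"
    return f"Voice: {pace} speech rate, {tone} tone."


def fuse(obs):
    """Join the per-modality descriptions into one text block for a prompt."""
    parts = [
        describe_emotion(obs.emotion_scores),
        describe_voice(obs.speech_rate_wpm, obs.pitch_hz),
        f'Transcript: "{obs.transcript}"',
    ]
    return "\n".join(parts)


obs = Observation((0.1, 0.2, 0.1, 0.6), 175.0, 210.0, "He rarely points at things.")
fused_text = fuse(obs)
```

Plain concatenation is the simplest possible fusion; the dynamic weight adjustment mentioned in the text would replace it with a scheme that emphasizes some modalities over others.
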
Through a reinforcement learning mechanism, the system continuously optimizes its questioning strategy during the interview, ensuring that questions are coherent and targeted, improving diagnostic efficiency and assessment accuracy, and providing efficient technical support for the early identification of and intervention in childhood autism.

The invention provides an interactive text generation method for child autism risk assessment, comprising the following steps: step one, collecting knowledge in the autism domain and initializing a knowledge database; step two, capturing the user's facial expressions and voice in real time from multi-modal sensors, converting them into text features, and then fusing the features; and step three, loading the fused features into a large language model and, in combination with a knowledge-base retrieval-augmentation technique, completing one or more rounds of interaction through an analysis chain, an interview chain and a question chain to generate interactive text.

In step one, knowledge in the autism domain is collected and organized, and a knowledge database is initialized. This domain knowledge includes medical data about autism, interview outlines and related literature. The medical data comprises autism-related expertise and case data; the interview outlines comprise the M-CHAT (Modified Checklist for Autism in Toddlers) and ADI