CN-121983058-A - System and method for training and scoring empathic doctor-patient communication based on multi-modal analysis

CN121983058A

Abstract

The invention discloses a scoring system and method for empathic language training in doctor-patient communication based on multimodal analysis, in the field of artificial intelligence and medical informatization. The system comprises a data acquisition module, an empathic-language semantic feature extraction module, an empathic-language semantic scoring module, an empathic-language context relevance scoring module, a speech rate analysis and scoring module, a multidimensional score fusion module, and a feedback and training guidance module that closes the training loop. The data acquisition module captures the voice signals of doctor and patient, and a data preprocessing and time alignment module produces doctor voice text data and timing base parameters. The semantic feature extraction module analyzes the text to generate empathic-language semantic feature vectors; the semantic scoring module matches these vectors against a template library to produce an empathic-language semantic score; the context relevance scoring module derives an empathic-language relevance score from the patient's speech emotion semantic features and the feature vectors; and the speech rate module generates a speech rate score from the timing base parameters. The fusion module combines the scores into a comprehensive empathic communication score, and the feedback and training guidance module generates adjustment suggestions from the comprehensive score and the sub-scores, forming a closed training loop.

Inventors

  • CHEN SONGYU
  • WANG JINFAN
  • ZHANG YUE
  • SHAO JIANWEN
  • YAO LINGXI
  • HUA LEI
  • CHEN LIANG

Assignees

  • 江苏布洛氪链数据科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-02-10

Claims (10)

  1. A scoring system for empathic language training in doctor-patient communication based on multimodal analysis, characterized in that the scoring system comprises: a data acquisition module for acquiring the original voice signals and timestamp information of both doctor and patient; a data preprocessing and time alignment module for processing the original voice signals to obtain aligned doctor voice text data and timing base parameters; an empathic-language semantic feature extraction module for analyzing the doctor voice text data to generate empathic-language semantic feature vectors; an empathic-language semantic scoring module for semantically matching the feature vectors against a pre-built empathic semantic template library to generate an empathic-language semantic score; an empathic-language context relevance scoring module for generating an empathic-language relevance score from the correlation between the patient's speech emotion semantic features and the empathic-language semantic feature vectors; a speech rate analysis and scoring module for computing speech rate features from the timing base parameters and generating a speech rate score against a preset comfortable speech rate interval; a multidimensional score fusion module for normalizing and weight-fusing the semantic score, the relevance score and the speech rate score into a comprehensive empathic communication score; and a feedback and training guidance module for generating adjustment suggestions, based on comparison of the comprehensive score and each sub-score with preset thresholds, to form a closed training loop.
  2. The scoring system based on multimodal analysis of claim 1, wherein the data preprocessing and time alignment module: performs speaker separation on the original voice signal using a speaker recognition model to distinguish the doctor's voice stream from the patient's; applies noise reduction and speech enhancement to the doctor's voice stream; feeds the processed doctor voice stream into a speech recognition model to convert it into doctor voice text data; and establishes a time alignment among the original voice signal, the doctor voice text data and the timestamp information, extracting the timing base parameters from this alignment, the timing base parameters including speech duration, segment start and stop times, and words spoken per unit time.
  3. The scoring system based on multimodal analysis of claim 1, wherein the empathic-language semantic feature extraction module: splits the doctor voice text data into sentences and performs lexical and syntactic analysis on the resulting sentences; identifies empathic-language sentences containing emotion-response, understanding or soothing semantic features from the analysis results; encodes the empathic-language sentences into empathic-language semantic feature vectors of preset dimension using a pre-built medical empathic-language corpus and a semantic encoding model; and aggregates the feature vectors generated during a single doctor-patient conversation into a feature matrix representing the doctor's semantic distribution across the different empathy dimensions.
  4. The scoring system based on multimodal analysis of claim 1, wherein the empathic-language semantic scoring module: matches the empathic-language semantic feature vectors for semantic similarity against the standard semantic feature vectors of a pre-built empathic semantic template library to obtain semantic matching parameters, the library containing emotion-understanding, emotion-response and soothing-support empathic semantic templates, and the matching parameters comprising an emotion-understanding matching parameter, an emotion-response matching parameter and a soothing-support matching parameter; computes multidimensional semantic quantization indexes from the matching parameters through preset mapping functions, the indexes comprising an emotion-understanding accuracy index, an emotion-response integrity index and a language-expression softness index; and weights the indexes to generate the empathic-language semantic score.
  5. The scoring system based on multimodal analysis of claim 4, wherein the emotion-understanding accuracy index is obtained from the emotion-understanding matching parameter through a first arctangent function mapping, the emotion-response integrity index is obtained from the emotion-response matching parameter through a second arctangent function mapping, and the language-expression softness index is obtained from the soothing-support matching parameter through a third arctangent function mapping.
  6. The scoring system based on multimodal analysis of claim 1, wherein the empathic-language context relevance scoring module: performs emotion recognition on the patient voice data to generate a patient emotion semantic feature vector; computes the semantic correlation between the empathic-language semantic feature vectors and the patient emotion semantic feature vector to obtain a context matching parameter; decomposes the context matching parameter into an emotion-matching component and a question-matching component; computes an emotion-correspondence accuracy index from the emotion-matching component and a question-response pertinence index from the question-matching component; and standardizes the two indexes and generates the empathic-language relevance score from the standardized results.
  7. The scoring system based on multimodal analysis of claim 1, wherein the speech rate analysis and scoring module: computes an average speech rate, a speech rate fluctuation parameter and a pause frequency parameter from the timing base parameters; compares the average speech rate against a preset comfortable speech rate interval for doctor-patient communication to obtain a speech rate adaptation parameter; maps the adaptation parameter through a preset speech rate evaluation function to an intermediate speech rate score; and corrects the intermediate score with the fluctuation parameter and the pause frequency parameter to generate the speech rate score.
  8. The scoring system based on multimodal analysis of claim 1, wherein the multidimensional score fusion module: normalizes the empathic-language semantic score, the empathic-language relevance score and the speech rate score; and sums the normalized scores, weighted by preset weight parameters, to generate the comprehensive empathic communication score.
  9. The scoring system based on multimodal analysis of claim 1, wherein the feedback and training guidance module: compares the comprehensive empathic communication score, the empathic-language semantic score, the empathic-language relevance score and the speech rate score with their corresponding preset thresholds; if the comprehensive score or any sub-score falls below its threshold, generates and outputs targeted adjustment suggestions based on the analysis results; and in subsequent voice data acquisition, re-evaluates the communication effect against the adjustment suggestions to form an iterative closed training loop.
  10. A scoring method for empathic language training in doctor-patient communication based on multimodal analysis, for implementing the scoring system of any one of claims 1-9, comprising: collecting the original voice signals and timestamp information of both doctor and patient; processing the original voice signals to obtain aligned doctor voice text data and timing base parameters; analyzing the doctor voice text data to generate empathic-language semantic feature vectors; semantically matching the feature vectors against a pre-built empathic semantic template library to generate an empathic-language semantic score; generating an empathic-language relevance score from the correlation between the patient's speech emotion semantic features and the feature vectors; computing speech rate features from the timing base parameters and generating a speech rate score against a preset comfortable speech rate interval; normalizing and weight-fusing the semantic score, the relevance score and the speech rate score into a comprehensive empathic communication score; and generating adjustment suggestions, based on comparison of the comprehensive score and each sub-score with preset thresholds, to form a closed training loop.
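As a rough illustration of claims 4 and 5, the sketch below maps matching parameters into bounded quantization indexes with arctangent functions and weights them into a semantic score. The steepness factor `k`, the weight vector and all function names are assumptions for illustration; the patent does not disclose concrete constants.

```python
import math

def arctan_index(match: float, k: float = 1.0) -> float:
    """Map a nonnegative matching parameter into [0, 1) via arctangent.

    The steepness k is a hypothetical parameter, not a value from the patent.
    """
    return (2.0 / math.pi) * math.atan(k * match)

def semantic_score(understanding: float, response: float, soothing: float,
                   weights: tuple = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three quantization indexes (claim 4)."""
    indexes = (
        arctan_index(understanding),  # emotion-understanding accuracy index
        arctan_index(response),       # emotion-response integrity index
        arctan_index(soothing),       # language-expression softness index
    )
    return sum(w * i for w, i in zip(weights, indexes))
```

The arctangent keeps each index bounded even for large matching parameters, which makes the later normalization step (claim 8) well behaved.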
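The context relevance score of claim 6 could be sketched as follows, assuming cosine similarity as the semantic correlation measure and a simple rescaled average as the standardization step; both choices, and the separate patient emotion and question vectors, are illustrative assumptions rather than the patent's disclosed method.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def relevance_score(doctor_vec, patient_emotion_vec, patient_question_vec,
                    weights=(0.5, 0.5)):
    """Combine an emotion-matching and a question-matching component,
    each rescaled from [-1, 1] into [0, 1] before weighting."""
    emotion = (cosine_similarity(doctor_vec, patient_emotion_vec) + 1.0) / 2.0
    question = (cosine_similarity(doctor_vec, patient_question_vec) + 1.0) / 2.0
    return weights[0] * emotion + weights[1] * question
```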
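A minimal sketch of the speech rate scoring of claim 7, assuming a comfort interval in words per second and a pause-frequency correction; the interval bounds and penalty weight are made-up presets, and the fluctuation-parameter correction the claim also mentions is omitted for brevity.

```python
def speech_rate_score(word_count: int, duration_s: float, pause_count: int,
                      comfort: tuple = (1.8, 3.0),
                      penalty_per_pause: float = 0.02) -> float:
    """Score average speech rate against a comfort interval (claim 7)."""
    rate = word_count / duration_s  # average words per second
    lo, hi = comfort
    if lo <= rate <= hi:
        fit = 1.0            # inside the comfortable interval
    elif rate < lo:
        fit = max(0.0, rate / lo)   # too slow: penalize proportionally
    else:
        fit = max(0.0, hi / rate)   # too fast: penalize proportionally
    # Correct the intermediate value with the pause-frequency parameter
    pauses_per_minute = pause_count / duration_s * 60.0
    return max(0.0, fit - penalty_per_pause * pauses_per_minute)
```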
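Finally, the fusion step of claim 8 amounts to min-max normalization followed by a weighted sum. The sketch below assumes each sub-score already lives in a known range and uses illustrative weights; the patent leaves both as preset parameters.

```python
def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max normalize a score into [0, 1], clamping out-of-range values."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def fuse_scores(semantic: float, relevance: float, rate: float,
                ranges=((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)),
                weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted fusion of the three normalized sub-scores (claim 8)."""
    parts = [normalize(v, lo, hi)
             for v, (lo, hi) in zip((semantic, relevance, rate), ranges)]
    return sum(w * p for w, p in zip(weights, parts))
```

Because the weights sum to 1 and each normalized part is in [0, 1], the comprehensive score is itself bounded in [0, 1], which simplifies the threshold comparison of claim 9.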

Description

System and method for training and scoring empathic doctor-patient communication based on multi-modal analysis

Technical Field

The invention relates to the field of artificial intelligence and medical informatization, and in particular to a scoring system and method for empathic language training in doctor-patient communication based on multimodal analysis.

Background

In modern medical systems, medical services are increasingly shifting from traditional disease diagnosis and treatment to a patient-centered, integrated health management model. In this context, doctor-patient communication is a key bridge between medical expertise and the individual needs of patients, and its quality directly influences patients' understanding of their disease, their therapeutic compliance and their overall satisfaction. Research shows that a doctor's empathy during communication is one of the core factors in establishing and maintaining doctor-patient trust. However, doctor-patient communication is highly complex and unstructured: its content is carried mainly in flexible, ambiguous natural language and is accompanied by non-textual information such as speech rhythm and intonation change, which together shape the patient's emotions and understanding. How to analyze the doctor-patient communication process systematically and technically has therefore become a significant challenge in the field of medical informatization.
At present, the evaluation of doctor-patient communication ability in medical education and clinical management relies mainly on three approaches: subjective evaluation by experts through on-site observation or recording playback, which yields low result consistency and is difficult to scale; indirect evaluation via patient satisfaction questionnaires or post-visit interviews, which cannot provide real-time, fine-grained analysis of the communication process; and simple text-record analysis, which attends only to keyword statistics or emotional tendency while ignoring important information such as speech prosody. These methods generally suffer from poor real-time performance, a low degree of quantification and the lack of a continuous feedback mechanism, and they cannot meet modern medicine's demand for fine-grained assessment and training of communication ability. With the development of artificial intelligence, some schemes have attempted to analyze doctor-patient communication with the help of speech recognition, natural language processing and emotion recognition technologies. The prior art, however, still has obvious limitations: most schemes analyze only a single modality (pure text or pure speech) and fail to fuse multidimensional information; some introduce multimodal data but lack a collaborative modeling mechanism on a unified time axis that would exploit the complementary strengths of the modalities; and most research stops at emotion classification or tendency judgment, failing to turn the comprehensive capability of empathy into a computable, quantifiable technical object, and thus cannot support fine-grained assessment and training guidance.
Empathy in doctor-patient communication has the dual connotation of emotional understanding and soothing response. It has traditionally been regarded as a psychological or behavioral matter and lacks an explicit path to technical quantification, yet from the standpoint of technical realization it can be characterized comprehensively through multidimensional features such as language content, semantic association and expressive rhythm. A technical solution that decomposes empathy into computable dimensions and realizes quantitative evaluation through multimodal collaboration is therefore urgently needed. The present system and method for real-time training and scoring of empathic language in doctor-patient communication based on multimodal analysis aim to realize real-time quantitative analysis of a doctor's empathic language expression by collaboratively modeling voice, semantic and timing features on a unified time axis.

Disclosure of Invention

In view of the above shortcomings of the prior art, the present invention aims to provide a system and method for training and scoring empathic doctor-patient communication based on multimodal analysis, so as to solve the above technical problems. To this end, the invention provides the following technical scheme. The scoring system for empathic language training in doctor-patient communication based on multimodal analysis comprises: a data acquisition module for acquiring the original voice signals and timestamp information of both doctor and patient; the data preprocessing and time alignme