CN-121998474-A - Multi-modal data-based multi-person interaction innovation capability assessment system

CN 121998474 A

Abstract

The invention belongs to the technical field of individual innovation capability assessment and provides a multi-person interaction innovation capability assessment system based on multi-modal data, addressing the technical problems that traditional assessment methods lack ecological validity and struggle to quantify group collaboration in real interactive settings. The system builds standardized creative task situations on a real online conference platform, synchronously collects participants' language text, voice characteristics, and facial expression data, and, by combining natural language processing, speech signal processing, and emotion recognition techniques, performs hierarchical feature extraction and fusion modeling on the multi-modal data to achieve accurate quantification of individual innovation capability within group interaction. The invention is suitable for evaluating student ability development in educational settings and for selecting high-potential talent in enterprise settings, and offers the technical advantages of efficiency, scalability, and adaptation to real interactive situations.

Inventors

  • GUO KAIFENG
  • CHEN QUNLIN
  • LUO ZUYING
  • TANG CHUANGAO

Assignees

  • Southwest University (西南大学)

Dates

Publication Date
2026-05-08
Application Date
2025-12-18

Claims (9)

  1. A multi-modal data-based multi-person interactive innovation capability assessment system, the system comprising: a system architecture built on a real online conference platform that synchronously collects and analyzes the multi-modal behavior data of participants during discussion by designing standardized creative task situations; core function modules, wherein a scene arrangement and equipment debugging module guides participants into an online meeting room and adjusts camera angles and equipment positions to ensure data acquisition quality; an instruction explanation and practice module explains task objectives and operating specifications to participants and provides example tasks for practice; a test stage and data acquisition module conducts discussion of creative problems within a specified time and acquires voice, text, and video data; a data processing and feature extraction module performs unified organization and multi-level feature extraction on the collected three-modality data, including text semantic analysis, voice feature extraction, and facial expression recognition; an analysis, modeling, and evaluation output module constructs a multidimensional evaluation model of individual innovation ability and leadership based on the extracted features and outputs an evaluation result and a structured feedback report; and integration of natural language processing, speech signal processing, and emotion recognition artificial intelligence technologies to realize multi-modal data fusion and automatic evaluation.
  2. The system of claim 1, wherein the multi-modal data collection means specifically comprises: voice data acquisition, namely acquiring participants' voice data through high-quality audio equipment for voice feature analysis; text data acquisition, namely converting the voice data into text through speech transcription for semantic and time-series analysis; and video data acquisition, namely acquiring participants' facial video through a camera for facial expression and emotion recognition analysis.
  3. The system according to claim 1, wherein the feature fusion model specifically comprises: constructing a multi-modal feature fusion model that fuses text semantic and time-series features, voice acoustic features, and facial expression features, and building an evaluation model of individual innovation ability and leadership through a traditional machine learning algorithm or a deep neural network (a combined fusion sketch appears at the end of the description).
  4. The system of claim 1, wherein the evaluation index system comprises (illustrative sketches of these indices follow the claims): a text semantic index, namely mapping participants' voice transcription text into word vectors based on a Word2Vec word vector model and constructing a semantic network on that basis, whereby originality of views is evaluated by computing semantic distances, flexibility of thinking is quantified by analyzing view distribution with hierarchical clustering, fluency of expression is evaluated in combination with text quantity, and the degree centrality of nodes in the semantic network is computed to predict an individual's leadership in the discussion; a text time-series index, namely constructing a time-series semantic network based on the combination of speaking order and semantic similarity, wherein novelty measures the difference of an individual's utterance from prior content, jump degree evaluates the cross-over between adjacent utterances, influence describes the continuing effect of earlier utterances on subsequent ones, and centroid distance reflects an individual's deviation from the group's overall semantic center; directed centrality is further computed in a semantic network with influence as weighted edges to capture the individual's position in the group's semantic flow, and the individual's contribution level is measured in combination with the number of utterances; a voice acoustic index, namely extracting multidimensional acoustic features from participants' voice signals, including speech rate (syllables per unit time), pitch (fundamental frequency F0), intonation (pitch change pattern), pause duration and frequency, and energy intensity, whereby emotional tension (from pitch and energy fluctuation amplitude), interaction initiative (from speech duty cycle and response delay time), and emotional stability (from the variance of pitch and energy) can be calculated, thereby quantitatively evaluating an individual's emotional expression and interaction state in the discussion; and a facial expression index, namely using facial Action Unit (AU) detection and expression recognition algorithms to extract an individual's emotion types (such as pleasure, surprise, and concentration), emotion change amplitude (the difference in expression features across time periods), and emotion stability (the variance or standard deviation of expression change) during the discussion, whereby the individual's emotional expression, emotional regulation, social adaptability, and interaction proactiveness in team interaction can be quantified.
  5. The system of claim 1, further comprising an application scenario extension, the application scenario extension comprising: an educational scenario, suitable for evaluating student ability development in secondary education and university courses, capable of comprehensively assessing students' innovative thinking, language expression, team collaboration, and emotion regulation, providing a scientific basis for personalized learning guidance, ability improvement planning, and academic potential prediction, and supporting teachers in targeted intervention and feedback in classroom teaching and extracurricular activities; an enterprise scenario, suitable for talent recruitment, team capability assessment, and innovation potential assessment in science and innovation enterprises, assisting enterprises in identifying core talent with creativity, collaboration ability, and leadership potential, and providing data support for team building and post matching; and other multi-person interaction scenarios, suitable for any setting that requires evaluating individuals' innovation ability, leadership potential, and social adaptability in group collaboration, such as team brainstorming, project collaboration, enterprise training workshops, academic group discussion, and innovation competitions, where the system can quantitatively analyze participants' language expression, opinion integration ability, emotion regulation, and influence in interaction, providing a scientific basis for team building, talent selection, and ability cultivation.
  6. The system of claim 2, wherein the voice data collection process further comprises a step of preprocessing the voice data to improve the accuracy of voice feature extraction (see the acoustic sketch following the claims).
  7. The system of claim 3, wherein the feature fusion model further comprises normalizing the multi-modal features during construction to eliminate dimensional differences between the modalities and improve the stability of the assessment model.
  8. The system of claim 4, wherein the assessment metric system further comprises weighting the metrics to reflect their relative importance in the innovation-ability and leadership assessment.
  9. The system of claim 5, wherein the application scenario extension further comprises custom development for specific industries and domains to meet their particular requirements for innovation capability and leadership assessment.
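
The text semantic index of claim 4 can be illustrated with a minimal sketch. Assuming tokenized utterance transcripts from the speech-transcription module of claim 2, the snippet below trains a toy Word2Vec model, scores originality as mean semantic distance, estimates flexibility from hierarchical clustering, uses token counts as a fluency proxy, and computes degree centrality in a similarity network as the leadership predictor. The corpus, similarity threshold, and clustering cutoff are illustrative assumptions, not values from the patent.

```python
import numpy as np
import networkx as nx
from gensim.models import Word2Vec
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cosine

# Toy tokenized utterances; real input would come from speech transcription.
utterances = [
    ["reuse", "old", "tires", "as", "garden", "planters"],
    ["melt", "tires", "into", "playground", "flooring"],
    ["build", "an", "obstacle", "course", "from", "tires"],
    ["sell", "tires", "to", "artists", "for", "sculptures"],
]
model = Word2Vec(utterances, vector_size=32, min_count=1, seed=1)

def embed(tokens, wv):
    """Mean word vector of an utterance."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

E = np.array([embed(u, model.wv) for u in utterances])

# Originality: mean semantic distance of each utterance to all others.
originality = [np.mean([cosine(E[i], E[j]) for j in range(len(E)) if j != i])
               for i in range(len(E))]

# Flexibility: number of idea clusters found by hierarchical clustering.
labels = fcluster(linkage(E, method="average", metric="cosine"),
                  t=0.5, criterion="distance")
flexibility = len(set(labels))

# Fluency proxy: amount of text produced.
fluency = [len(u) for u in utterances]

# Leadership predictor: degree centrality in a semantic similarity network.
G = nx.Graph()
G.add_nodes_from(range(len(E)))
for i in range(len(E)):
    for j in range(i + 1, len(E)):
        if 1 - cosine(E[i], E[j]) > 0.3:   # illustrative threshold
            G.add_edge(i, j)
leadership = nx.degree_centrality(G)
```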
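The text time-series index can be sketched in the same spirit. The definitions below (novelty as distance from the mean of all prior utterances, jump degree as the distance between adjacent utterances, influence as mean similarity to later utterances, centroid distance from the group's semantic center, and weighted out-degree in an influence-weighted directed network) are plausible readings of claim 4, not the patented formulas.

```python
import numpy as np
import networkx as nx

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def timeseries_indices(E):
    """E: (n, dim) utterance embeddings in speaking order."""
    n = len(E)
    centroid = E.mean(axis=0)
    # Novelty: distance from the running mean of everything said so far.
    novelty = [1 - cos_sim(E[i], E[:i].mean(axis=0)) if i else 0.0
               for i in range(n)]
    # Jump degree: cross-over between adjacent utterances.
    jump = [1 - cos_sim(E[i - 1], E[i]) for i in range(1, n)]
    # Influence: continuing effect of an utterance on those after it.
    influence = [np.mean([cos_sim(E[i], E[j]) for j in range(i + 1, n)])
                 if i < n - 1 else 0.0 for i in range(n)]
    # Centroid distance: deviation from the group's semantic center.
    centroid_dist = [1 - cos_sim(E[i], centroid) for i in range(n)]
    # Directed centrality in an influence-weighted network.
    D = nx.DiGraph()
    D.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            D.add_edge(i, j, weight=cos_sim(E[i], E[j]))
    centrality = dict(D.out_degree(weight="weight"))
    return novelty, jump, influence, centroid_dist, centrality
```

Fed with the embeddings E from the previous sketch, these per-utterance scores could then be aggregated per speaker and combined with utterance counts to estimate contribution level.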
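For the voice acoustic index, and the preprocessing step of claim 6, one possible realization uses librosa, a common audio library the patent does not name. Onset peaks stand in for syllable counts (a rough speech-rate proxy), and the pyin pitch tracker supplies F0 over voiced frames; the thresholds and frequency range are assumptions.

```python
import numpy as np
import librosa

def acoustic_indices(path):
    y, sr = librosa.load(path, sr=16000)

    # Claim 6: preprocessing to improve feature-extraction accuracy.
    y = librosa.effects.preemphasis(y)          # boost high frequencies
    y, _ = librosa.effects.trim(y, top_db=30)   # strip edge silence
    y = y / (np.max(np.abs(y)) + 1e-9)          # amplitude normalization

    # Pitch (fundamental frequency F0), restricted to voiced frames.
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    f0 = f0[voiced]

    # Energy intensity per frame.
    rms = librosa.feature.rms(y=y)[0]

    # Rough speech-rate proxy: onset peaks per second (not true syllables).
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    rate = len(onsets) / (len(y) / sr)

    return {
        "pitch_mean": float(np.nanmean(f0)),
        "pitch_var": float(np.nanvar(f0)),       # feeds emotional stability
        "energy_var": float(np.var(rms)),        # feeds tension / stability
        "speech_rate": rate,
        "voiced_ratio": float(np.mean(voiced)),  # feeds interaction initiative
    }
```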
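The facial expression index presupposes an upstream AU-detection and expression-recognition stage (the patent names no specific tool; OpenFace and py-feat are common choices). The sketch below assumes that stage has already produced per-frame emotion probabilities and computes only the derived indices from claim 4: dominant emotion type, emotion change amplitude across time windows, and emotion stability. The window size and emotion labels are illustrative.

```python
import numpy as np

EMOTIONS = ["pleasure", "surprise", "concentration", "neutral"]  # illustrative

def expression_indices(probs, window=30):
    """probs: (n_frames, n_emotions) per-frame emotion probabilities."""
    probs = np.asarray(probs)
    # Dominant emotion type over the whole discussion.
    dominant = EMOTIONS[int(np.argmax(probs.mean(axis=0)))]
    # Change amplitude: shift between consecutive windowed means.
    means = [probs[i:i + window].mean(axis=0)
             for i in range(0, len(probs) - window + 1, window)]
    amplitude = (float(np.mean([np.linalg.norm(means[k + 1] - means[k])
                                for k in range(len(means) - 1)]))
                 if len(means) > 1 else 0.0)
    # Stability: inverse of the average per-emotion dispersion.
    stability = float(1.0 / (1.0 + probs.std(axis=0).mean()))
    return {"dominant_emotion": dominant,
            "change_amplitude": amplitude,
            "emotion_stability": stability}

# Example with synthetic probabilities for a 300-frame clip.
demo = np.abs(np.random.randn(300, 4))
demo /= demo.sum(axis=1, keepdims=True)
print(expression_indices(demo))
```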

Description

Multi-modal data-based multi-person interaction innovation capability assessment system

Technical Field

The invention relates to the field of evaluation systems, and in particular to a multi-person interaction innovation capability evaluation system based on multi-modal data.

Background

Current innovation ability assessment relies primarily on individual-level contextual tasks, analyzing a subject's independent innovation ability through manual scoring or single-modality data. Such methods have the following limitations: insufficient ecological validity, as traditional evaluation is divorced from real interactive environments and cannot reflect social behaviors such as viewpoint integration and conflict coordination in team collaboration; strong subjectivity, as manual scoring is easily influenced by the cultural background and experience of the rater, so the evaluation result may be biased; a single data dimension, as only the novelty, variety, and quantity of the output views are considered while process behaviors such as language processes, intra-group role collaboration, and view selection are ignored; and poor efficiency and scalability, as manual evaluation is difficult to adapt to large-scale settings and lacks an intelligent prediction and feedback mechanism. With education reform and the upgrading of enterprise innovation demands, capability assessment in real interaction scenarios has become a new requirement: enterprises need to identify staff with innovation capability, emotional stability, and leadership, and educational settings need to evaluate students' performance in group collaboration. The prior art, however, cannot provide such multi-dimensional insight in real situations. To solve the above problems, the applicant proposes a multi-person interactive innovation ability evaluation system based on multi-modal data.

Disclosure of the Invention

The invention aims to provide a multi-person interaction innovation capability assessment system based on multi-modal data, so as to solve the problems in the prior art.
In order to achieve the above purpose, the invention provides the following technical scheme. The multi-person interaction innovation capability assessment system based on multi-modal data comprises: a system architecture built on a real online conference platform that synchronously collects and analyzes the multi-modal behavior data of participants during discussion by designing standardized creative task situations; and core function modules, namely a scene arrangement and equipment debugging module for guiding participants into an online meeting room, adjusting camera angles and equipment positions, and ensuring data acquisition quality; an instruction explanation and practice module for explaining task objectives and operating specifications to participants and providing example tasks for practice; a test stage and data acquisition module for conducting discussion of creative problems within a specified time and acquiring voice, text, and video data; a data processing and feature extraction module for performing unified organization and multi-level feature extraction on the collected three-modality data, including text semantic analysis, voice feature extraction, and facial expression recognition; and an analysis, modeling, and evaluation output module for constructing a multidimensional evaluation model of individual innovation ability and leadership based on the extracted features and outputting an evaluation result and a structured feedback report. The system is technically characterized by integrating natural language processing, speech signal processing, and emotion recognition artificial intelligence technologies to realize multi-modal data fusion and automatic evaluation.

Optionally, the multi-modal data acquisition specifically includes: voice data acquisition, namely acquiring participants' voice data through high-quality audio equipment for voice feature analysis; text data acquisition, namely converting the voice data into text through speech transcription for semantic and time-series analysis; and video data acquisition, namely acquiring participants' facial video through a camera for facial expression and emotion recognition analysis.

Optionally, the feature fusion model specifically includes: constructing a multi-modal feature fusion model that fuses text semantic features, voice acoustic features, and facial expression features, and building an evaluation model of individual innovation ability and leadership through a traditional machine learning algorithm or a deep neural network (see the sketch below).

Optionally, the evaluation index system includes: a text semantic index, namely mapping participants' voice transcription text into word vectors based on a Word2Vec word vector model.
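
To make the fusion model of the description (and claims 3, 7, and 8) concrete, here is a hedged sketch: z-score normalization removes scale differences across the modalities, explicit per-index weights encode relative importance, and a random forest stands in for the "traditional machine learning algorithm". The weights, the synthetic data, and the choice of regressor are illustrative assumptions, not the patented configuration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor

# One row per participant: concatenated text, acoustic, and facial indices.
rng = np.random.default_rng(0)
X = rng.random((40, 9))        # stand-in for real extracted features
y = rng.random(40)             # stand-in for expert innovation ratings

# Claim 7: normalization to eliminate dimensional differences.
X_norm = StandardScaler().fit_transform(X)

# Claim 8: index weighting (values here are purely illustrative).
weights = np.array([1.5, 1.5, 1.0, 1.0, 1.0, 0.8, 0.8, 0.8, 0.8])
X_weighted = X_norm * weights

# Claim 3: a traditional machine-learning model as the fusion/evaluation model.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_weighted, y)
innovation_score = model.predict(X_weighted[:1])
```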