
CN-121996341-A - Digital human interaction strategy optimization method based on big data identification

CN121996341A

Abstract

The invention discloses a digital human interaction strategy optimization method based on big data recognition, relating to the technical field of intelligent interaction and digital humans. The method comprises: collecting and storing multi-modal interaction data (text, voice, expression, and interaction duration) generated during a user's interaction with a digital human; labeling the data with emotion states using a pre-trained emotion recognition model, and extracting emotion feature vectors covering emotion polarity, emotion intensity, and emotion stability; constructing a user emotion change trajectory map to recognize emotion state transition patterns; evaluating and ranking candidate interaction strategies according to the transition patterns in combination with historical interaction data, and selecting an optimized interaction strategy; and updating strategy evaluation indexes according to user feedback, thereby achieving adaptive optimization of the digital human interaction strategy.
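The per-turn pipeline in the abstract (emotion feature vectors, differences between successive turns, transition pattern codes) can be sketched in Python. The three-component vector layout (polarity, intensity, stability) follows the abstract, but the function names, the threshold `eps`, and the U/D/S label scheme are hypothetical illustrations, not the patented implementation.

```python
# Hypothetical sketch: emotion feature vectors per dialogue turn,
# per-turn difference vectors, and a simple single-step transition code.
from typing import List, Tuple

Vec = Tuple[float, float, float]  # (polarity, intensity, stability)

def diff_sequence(turns: List[Vec]) -> List[Vec]:
    """Difference vectors between successive dialogue turns."""
    return [tuple(b[i] - a[i] for i in range(3))
            for a, b in zip(turns, turns[1:])]

def code_step(d: Vec, eps: float = 0.05) -> str:
    """Encode one difference vector as polarity / intensity / stability
    trend labels. U/D/S (up/down/steady) is an assumed label scheme."""
    def lab(x: float) -> str:
        return "U" if x > eps else "D" if x < -eps else "S"
    return "".join(lab(x) for x in d)

# Three turns drifting from positive/stable toward negative/unstable.
turns = [(0.6, 0.4, 0.9), (0.1, 0.7, 0.7), (-0.4, 0.8, 0.5)]
codes = [code_step(d) for d in diff_sequence(turns)]
print(codes)  # one code per adjacent pair of turns
```

Concatenating consecutive codes from a sliding window would then yield the composite transition pattern code that is matched against a pattern library, as the claims describe.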

Inventors

  • LI DONGMING
  • YUAN CHAO
  • WU JIAQIAN

Assignees

  • 南京南数数字产业集团有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-15

Claims (10)

  1. A digital human interaction strategy optimization method based on big data identification, characterized by comprising the following steps: collecting multi-modal interaction data generated by a user during interaction with a digital human, and storing the data in an interaction database; performing emotion state labeling on the multi-modal interaction data with a pre-trained emotion recognition model, and extracting an emotion feature vector for the user's current dialogue turn; constructing a user emotion change trajectory map, and identifying the transition pattern of the user's emotional state by calculating emotion feature vector differences between successive dialogue turns; retrieving a candidate interaction strategy set from a strategy library according to the transition pattern, calculating a strategy evaluation index for each candidate interaction strategy, ranking the candidate set by the strategy evaluation indexes, and selecting the top-ranked candidate as the optimized interaction strategy; and transmitting the interaction strategy parameters of the optimized interaction strategy to a digital human execution end, collecting the user's feedback data on the optimized interaction strategy, and writing the feedback data back to the interaction database to update the strategy evaluation indexes.
  2. The digital human interaction strategy optimization method based on big data recognition of claim 1, wherein the interaction strategy parameters comprise a language style adjustment factor, a response speed control parameter, and a content recommendation weight coefficient; the digital human execution end adjusts the digital human's speech generation module according to the initial value of the language style adjustment factor, adjusts the digital human's response timing module according to the initial value of the response speed control parameter, and adjusts the digital human's content screening module according to the initial value of the content recommendation weight coefficient; and the digital human execution end is configured to generate and output interaction response content according to the adjusted parameters, with a user feedback acquisition module started synchronously.
  3. The digital human interaction strategy optimization method based on big data recognition of claim 1, wherein the strategy evaluation index is calculated as follows: retrieving, from the interaction database, historical interaction records with the same transition pattern, and extracting the subset of historical application records matching each candidate interaction strategy; counting the total number of records in the subset as the strategy's historical application count, and computing the proportion of records whose historical interaction result label indicates success as the strategy's historical success rate; computing the arithmetic mean of all historical user feedback scores in the subset as the strategy's average feedback score, and the standard deviation of those scores as the strategy's feedback stability index; computing a time decay weight from the number of days between each historical record's creation time and the current time, multiplying each historical user feedback score by its time decay weight, summing the products, and dividing by the sum of all time decay weights to obtain the strategy's time-weighted score; and multiplying the historical success rate, the average feedback score, the feedback stability index, and the time-weighted score by their respective dimension weight coefficients and summing the results to obtain the strategy evaluation index.
  4. The digital human interaction strategy optimization method based on big data recognition of claim 3, wherein the transition pattern is obtained as follows: retrieving, from the emotion feature storage table of the interaction database, all historical dialogue turn records of the current interaction session by session identifier, and extracting the emotion feature vectors of all turns in ascending order of dialogue turn number to form an emotion feature vector time series; computing the difference between the emotion feature vectors of each pair of adjacent dialogue turns to obtain an emotion change difference vector sequence; performing a pattern coding operation on the difference vector sequence to determine a polarity shift direction label, an intensity change amplitude label, and a stability trend label, generating a single-step transition pattern code sequence; constructing the user emotion change trajectory map from the single-step transition pattern code sequence; performing continuous pattern identification on the trajectory map, using a sliding window to intercept consecutive single-step transition pattern code segments from the sequence, and concatenating the single-step codes within each segment in order to form a composite transition pattern code; and matching the composite transition pattern code against a preset library of typical emotion transition patterns, computing pattern similarity scores, and selecting the transition pattern category of the sample with the highest similarity score as the transition pattern.
  5. The method of claim 4, wherein the user emotion change trajectory map is a directed graph in which nodes represent discretized emotion state categories and directed edges represent transitions between emotion states; each directed edge carries a single-step transition pattern code as an edge attribute, and the number of occurrences of that transition in the current session is recorded as the edge weight.
  6. The digital human interaction strategy optimization method based on big data recognition of claim 4, wherein the emotion feature vector is extracted as follows: performing feature extraction on the multi-modal interaction data to obtain a multi-modal emotion representation vector comprising a text emotion feature sub-vector, a voice emotion feature sub-vector, a visual emotion feature sub-vector, and a behavior emotion feature sub-vector; and inputting the multi-modal emotion representation vector into the pre-trained emotion recognition model to output the emotion feature vector.
  7. The digital human interaction strategy optimization method based on big data recognition of claim 6, wherein the multi-modal interaction data comprises text input content, voice frequency characteristic parameters, facial expression recognition coordinates, and interaction duration parameters; the emotion recognition model comprises an emotion classification layer; and the emotion classification layer comprises an emotion polarity prediction branch, an emotion intensity prediction branch, and an emotion stability prediction branch.
  8. The digital human interaction strategy optimization method based on big data recognition of claim 7, wherein the emotion polarity prediction branch maps the multi-modal emotion representation vector through a fully connected network to output an emotion polarity value; the emotion intensity prediction branch computes the mean absolute value of all dimension activations in the multi-modal emotion representation vector as a raw intensity score, and maps it to the interval from zero to one through a normalization function to output an emotion intensity coefficient; and the emotion stability prediction branch retrieves from the interaction database the historical emotion polarity values of the three consecutive turns preceding the current dialogue turn, computes the standard deviation of those historical values together with the current emotion polarity value as the emotion fluctuation amplitude, and outputs an emotion stability index after inverting and normalizing the fluctuation amplitude.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the digital human interaction strategy optimization method based on big data identification according to any one of claims 1 to 8.
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the digital human interaction strategy optimization method based on big data recognition according to any one of claims 1 to 8.
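The strategy evaluation index of claim 3 combines four quantities: historical success rate, average feedback score, feedback stability (a standard deviation), and a time-decay-weighted score, summed under dimension weight coefficients. A minimal sketch follows; the exponential half-life decay, the record layout, and the weight values are illustrative assumptions, since the claim does not fix a particular decay function or coefficients.

```python
# Hypothetical sketch of the strategy evaluation index from claim 3.
# records: list of (created_day, feedback_score, success) tuples for one
# candidate strategy under the matched transition pattern.
from statistics import mean, pstdev

def evaluation_index(records, now_day, half_life=30.0,
                     weights=(0.4, 0.3, 0.1, 0.2)):
    scores = [r[1] for r in records]
    success_rate = sum(1 for r in records if r[2]) / len(records)
    avg_score = mean(scores)                 # average feedback score
    stability = pstdev(scores)               # feedback stability index
    # Time decay weight per record, then a decay-weighted mean score.
    decay = [0.5 ** ((now_day - r[0]) / half_life) for r in records]
    aged = sum(w * s for w, s in zip(decay, scores)) / sum(decay)
    parts = (success_rate, avg_score, stability, aged)
    return sum(w * p for w, p in zip(weights, parts))

recs = [(0, 4.0, True), (20, 3.0, False), (29, 5.0, True)]
print(round(evaluation_index(recs, now_day=30), 3))
```

Candidate strategies would then be ranked by this index and the top-ranked one selected, as claim 1 describes; writing user feedback back into the records updates the index on the next retrieval.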

Description

Digital human interaction strategy optimization method based on big data identification

Technical Field

The invention relates to the technical field of intelligent interaction and digital humans, in particular to a digital human interaction strategy optimization method based on big data identification.

Background

With the rapid development of artificial intelligence, big data analysis, and multi-modal perception technology, digital humans have been widely applied as intelligent interaction carriers integrating natural language processing, speech recognition, computer vision, and affective computing in scenarios such as intelligent customer service, virtual assistants, digital government affairs, digital education, and immersive interaction. Digital human interaction technology has gradually evolved from single-turn dialogue driven mainly by rules and template replies toward context-aware interaction integrating multi-modal information such as text, voice, and expression, and has made some progress in interaction naturalness and completeness of information expression. However, most existing digital human systems focus on improving the accuracy of single-turn interactive content generation, emphasizing semantic understanding, multi-modal feature fusion, and the matching degree of output content, while giving insufficient attention to long-term statistics and dynamic modeling of the user's emotion evolution patterns, emotion stability, and strategy response effects over continuous interaction. Especially in high-frequency, multi-turn, emotion-sensitive interaction scenarios, digital humans often have difficulty adaptively adjusting their interaction strategy according to trends in the user's emotional state, resulting in a stiff interaction style, an unbalanced response rhythm, and even emotional escalation or a degraded user experience.
Therefore, how to identify and model user emotion changes by means of big data analysis on the basis of multi-modal interaction data, and to dynamically optimize the interaction strategy accordingly, has become a key problem to be solved in the technical field of digital human intelligent interaction.

CN119299805B discloses a big-data-driven digital human intelligent interaction method and system. Its technical scheme extracts the interaction audio and video frame set from the user's historical interaction videos, performs image smoothing and equalization on the video frames, extracts characteristic time-series information from the enhanced video frames, the interaction text, and the interaction audio in combination with speech recognition results, fuses the multi-modal features, generates target interaction text, interaction speech, and interaction expressions, and finally constructs the digital human's interaction video to realize intelligent interaction with the user. That scheme improves the digital human's ability to understand user input in terms of multi-modal perception and feature fusion, improves the accuracy and consistency of the interaction content and its form of expression to a certain extent, and provides strong support for deploying digital human interaction technology in real application scenarios. However, the technical solution of CN119299805B still focuses on synchronous extraction and fusion modeling of multi-modal features and on optimizing single-turn or local interactive content generation, and does not explicitly model or analyze the user's emotional state changes across successive dialogue turns.
Specifically, although that scheme can recognize multi-modal information such as text, voice, and expression, it does not quantitatively express key emotion parameters such as the user's emotion polarity, emotion intensity, and emotion stability, nor does it construct a trajectory model reflecting the evolution of the user's emotion as the interaction progresses, making it difficult to identify emotional state transition patterns such as from stable to fluctuating, or from positive to negative. As for interaction strategies, the prior art improves expression matching mainly by generating target interaction content and expressions; it lacks a strategy effect evaluation mechanism based on historical interaction big data, cannot horizontally compare and rank the actual performance of different interaction strategies under similar emotion change scenarios, and does not address continuously updating strategy evaluation indexes through user feedback. This makes it difficult to realize genuine strategy self-optimization when the digital human faces complex emotion-driven interaction scenarios.
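The three emotion parameters named above (polarity, intensity, stability) are quantified in claim 8: intensity as the normalized mean absolute activation of the representation vector, stability as an inverted, normalized standard deviation over the recent polarity history. A minimal sketch follows; the `tanh` squashing and the `1 / (1 + std)` inversion are assumed normalization choices, since the claim does not specify the functions.

```python
# Hypothetical sketch of the emotion intensity and stability computations
# described in claim 8 (the polarity branch is a learned network and is
# not reproduced here).
import math
from statistics import pstdev

def intensity(representation):
    """Mean absolute activation, squashed into (0, 1); tanh is assumed."""
    raw = sum(abs(x) for x in representation) / len(representation)
    return math.tanh(raw)

def stability(polarity_history, current_polarity):
    """Std deviation of recent polarity values as fluctuation amplitude,
    inverted and normalized via 1 / (1 + std) (an assumption)."""
    fluctuation = pstdev(polarity_history + [current_polarity])
    return 1.0 / (1.0 + fluctuation)

rep = [0.2, -0.4, 0.1, -0.3]          # toy multi-modal representation
print(intensity(rep))                  # value in (0, 1)
print(stability([0.5, 0.4, 0.6], 0.5))  # near 1.0 = stable emotion
```

A perfectly steady polarity history yields a stability index of exactly 1.0, while larger swings between turns push it toward 0, matching the inverse relationship the claim describes.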