
CN-122027840-A - Digital human intelligent interaction control method and system in private live broadcast scene

CN122027840A

Abstract

The application relates to the technical field of digital human interaction and discloses a digital human intelligent interaction control method and system for a private-domain live broadcast scene. The method obtains a group emotion change rate by applying multidimensional emotion recognition to barrage (bullet-screen) texts, logarithmic normalization by the online user count, and a dual-window ratio calculation; it triggers digital human speech-script output and action sequences according to the group emotion change rate, and adaptively updates the interaction trigger threshold and the starting value of the long-time control window through a cross-scene interaction control file. The application improves the group emotion perception precision of digital human interaction control and the cross-session reusability of control parameters in a private-domain live broadcast scene.

Inventors

  • WANG YANG
  • QIN BO

Assignees

  • 天津白马星球智能科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (10)

  1. A digital human intelligent interaction control method in a private-domain live broadcast scene, characterized by comprising the following steps: S1, performing emotion recognition on each barrage text in the barrage text stream of a private live broadcast room to obtain a multidimensional emotion intensity vector for each barrage text; S2, dividing the multidimensional emotion intensity vector by the base-2 logarithm of the sum of the online user count at the acquisition time and 2 to obtain a normalized emotion control quantity, cumulatively writing the normalized emotion control quantity into a short-time control window and a long-time control window according to the acquisition time, and dividing the average of the normalized emotion control quantity in each emotion dimension in the short-time control window by the sum of the corresponding average in the long-time control window and a preset smoothing term to obtain a group emotion change rate for each emotion dimension; S3, comparing the group emotion change rate of each emotion dimension with the interaction trigger threshold of that dimension, and, when the change rate of any dimension is not lower than its threshold, selecting a speech script from a private-domain script library according to the trigger dimension and its group emotion change rate, driving the digital human to output an action sequence to the private live broadcast room after speech synthesis of the script, and recording the control output time; and S4, within a verification time window started at the control output time, obtaining an interaction effect coefficient from the variation of the normalized emotion control quantity, writing the group emotion change rate and the interaction effect coefficient into a private-domain interaction control file, and updating, through that file, the interaction trigger threshold and the starting value of the long-time control window for the next session.
  2. The method according to claim 1, wherein step S1 comprises: inputting each barrage text in the barrage text stream of the private live broadcast room into an emotion classification model whose output layer uses a Sigmoid activation function, and computing from the output layer the activation intensity of each barrage text in each emotion dimension to obtain the emotion dimension activation intensity values for each barrage text; arranging the activation intensity values of the same barrage text in emotion dimension order to obtain the multidimensional emotion intensity vector; associating each multidimensional emotion intensity vector with the sending time of the corresponding barrage text and the private-domain user identifier to obtain a time-stamped multidimensional emotion intensity vector sequence; and writing the time-stamped sequence into a barrage emotion buffer queue in sending time order, the queue outputting to step S2 the multidimensional emotion intensity vector and the corresponding private-domain user identifier at each acquisition time.
  3. The method according to claim 1, wherein in step S2, dividing the multidimensional emotion intensity vector by the base-2 logarithm of the sum of the online user count at the acquisition time and 2 to obtain the normalized emotion control quantity comprises: recording the j-th emotion intensity value of the multidimensional emotion intensity vector as an emotion component and the online user count at the acquisition time as the live online count, and taking the base-2 logarithm of the sum of the live online count and 2 to obtain an online-scale logarithm value; and dividing the emotion component by the online-scale logarithm value to obtain the j-th normalized emotion component, and arranging the normalized emotion components of all emotion dimensions in emotion dimension order to obtain the normalized emotion control quantity.
  4. The method according to claim 3, wherein in step S2, cumulatively writing the normalized emotion control quantity into a short-time control window and a long-time control window according to the acquisition time comprises: writing the normalized emotion control quantity into a short-time control window of 30 seconds in acquisition time order, and performing an exponentially weighted moving average over the j-th normalized emotion components of all normalized emotion control quantities in the short-time control window, ordered by acquisition time, to obtain a short-time dimension average; and writing the normalized emotion control quantity into a long-time control window of 300 seconds in acquisition time order, and performing the same exponentially weighted moving average over the j-th normalized emotion components in the long-time control window to obtain a long-time dimension average.
  5. The method according to claim 4, wherein in step S2, dividing the average of the normalized emotion control quantity in each emotion dimension in the short-time control window by the sum of the corresponding average in the long-time control window and a preset smoothing term to obtain the group emotion change rate comprises: dividing the short-time dimension average by the sum of the long-time dimension average and a preset smoothing term to obtain the j-th group emotion change rate, the preset smoothing term taking the value 0.001 and serving to prevent a division-by-zero error when the long-time dimension average is zero; and performing the corresponding division for each emotion dimension to obtain the group emotion change rate of each emotion dimension.
  6. The method according to claim 1, wherein in step S3, selecting a speech script from the private-domain script library according to the trigger dimension and the corresponding group emotion change rate, and driving the digital human to output an action sequence to the private live broadcast room after speech synthesis of the script, comprises: inputting the trigger dimension and the corresponding group emotion change rate into a script selection rule of the private-domain script library, the rule dividing the group emotion change rate into a first, a second and a third value interval, and matching a corresponding speech script from the library based on the trigger dimension and the value interval to which the change rate belongs; and inputting the speech script into a speech synthesis module to obtain script audio, inputting the script audio into a digital human skeleton binding model, the model outputting an action sequence and a lip-sync video stream aligned with the duration of the script audio, pushing both to the private live broadcast room, and recording the control output time.
  7. The method according to claim 1, wherein in step S4, obtaining the interaction effect coefficient within the verification time window started at the control output time, writing the group emotion change rate and the interaction effect coefficient into the private-domain interaction control file, and updating the next session's interaction trigger threshold and starting value of the long-time control window through that file, comprises: taking the difference between the short-time dimension average of the normalized emotion control quantity in the trigger dimension at the end of the verification time window and the short-time dimension average at the control output time, and dividing that difference by the short-time dimension average at the control output time to obtain the interaction effect coefficient; writing the group emotion change rate of the trigger dimension and the interaction effect coefficient into the private-domain interaction control file in session order; and updating the next session's interaction trigger threshold for the trigger dimension based on the historical average of the group emotion change rate of that dimension in the file, and initializing the starting value of the next session's long-time control window based on the historical average of the private-domain users' normalized emotion control quantity in each emotion dimension in the file.
  8. A digital human intelligent interaction control system in a private-domain live broadcast scene, used to implement the method according to any one of claims 1-7, the system comprising: an identification module, configured to perform emotion recognition on each barrage text in the barrage text stream of the private live broadcast room to obtain a multidimensional emotion intensity vector for each barrage text; an analysis module, configured to divide the multidimensional emotion intensity vector by the base-2 logarithm of the sum of the online user count at the acquisition time and 2 to obtain a normalized emotion control quantity, cumulatively write the normalized emotion control quantity into a short-time control window and a long-time control window according to the acquisition time, and divide the average of the normalized emotion control quantity in each emotion dimension in the short-time control window by the sum of the corresponding average in the long-time control window and a preset smoothing term to obtain a group emotion change rate for each emotion dimension; a comparison module, configured to compare the group emotion change rate of each emotion dimension with the interaction trigger threshold of that dimension, and, when the change rate of any dimension is not lower than its threshold, select a speech script from the private-domain script library according to the trigger dimension and its group emotion change rate, drive the digital human to output an action sequence to the private live broadcast room after speech synthesis of the script, and record the control output time; and an updating module, configured to obtain an interaction effect coefficient from the variation of the normalized emotion control quantity within a verification time window started at the control output time, write the group emotion change rate and the interaction effect coefficient into a private-domain interaction control file, and update the next session's interaction trigger threshold and the starting value of the long-time control window through that file.
  9. A digital human intelligent interaction control device in a private-domain live broadcast scene, characterized by comprising a memory and a processor, the memory storing a computer program runnable on the processor, the processor implementing the method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, causes the processor to perform the method according to any one of claims 1 to 7.
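As a reading aid, the normalization of claim 3 can be sketched in Python. The function name and example values are illustrative, not from the patent; only the formula (emotion component divided by the base-2 logarithm of the online count plus 2) follows the claim:

```python
import math

def normalize_emotion_vector(intensity_vector, online_users):
    """Divide each emotion component by log2(online_users + 2).

    The +2 inside the logarithm keeps the divisor at least
    log2(2) = 1, so the formula is well defined even when the
    room has zero viewers (hypothetical helper, per claim 3).
    """
    scale = math.log2(online_users + 2)
    return [component / scale for component in intensity_vector]
```

With 0 viewers the divisor is 1 and the vector passes through unchanged; with 62 viewers the divisor is log2(64) = 6, so a component of 0.6 normalizes to 0.1. This makes emotion statistics from rooms of very different sizes comparable on one scale.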
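The dual-window statistic of claims 4 and 5 can likewise be sketched. The claims fix the window lengths (30 s and 300 s) and the smoothing term (0.001) but not the EWMA smoothing factor, so the `alpha` below is an assumed value; the window contents are passed in as plain lists of one dimension's normalized components, oldest first:

```python
EPS = 0.001  # preset smoothing term from claim 5

def window_ewma(samples, alpha=0.2):
    """Exponentially weighted moving average, oldest sample first.

    alpha is an assumed smoothing factor (not specified in the
    claims); an empty window averages to 0.0.
    """
    avg = None
    for s in samples:
        avg = s if avg is None else alpha * s + (1 - alpha) * avg
    return 0.0 if avg is None else avg

def group_change_rate(short_window, long_window):
    """Claim 5 ratio: short-window EWMA / (long-window EWMA + EPS).

    short_window holds the last 30 s of samples, long_window the
    last 300 s; EPS prevents division by zero when the long-window
    average is zero.
    """
    return window_ewma(short_window) / (window_ewma(long_window) + EPS)
```

A rate well above 1 means the short-term average has surged past the long-term baseline, which is exactly the "relative change trend" the method reacts to.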
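The threshold comparison and interval-based script matching of claims 1 and 6 reduce to a lookup. The library contents, the band boundaries, and the dimension name below are all invented for illustration; the claims only say the change rate is split into three value intervals per trigger dimension:

```python
# Hypothetical script library keyed by (emotion dimension, band);
# the three bands mirror the first/second/third value intervals
# of claim 6. Texts are placeholders, not from the patent.
SCRIPT_LIBRARY = {
    ("positive", "low"): "Thanks everyone, keep the messages coming!",
    ("positive", "mid"): "The room is heating up, let me show you more!",
    ("positive", "high"): "Amazing energy, let's celebrate together!",
}

def select_script(dimension, change_rate, threshold, bands=(1.2, 1.5)):
    """Return a speech script when the change rate reaches the
    trigger threshold, else None. Band boundaries are assumed."""
    if change_rate < threshold:
        return None  # claim 1: trigger only at or above threshold
    if change_rate < bands[0]:
        band = "low"
    elif change_rate < bands[1]:
        band = "mid"
    else:
        band = "high"
    return SCRIPT_LIBRARY.get((dimension, band))
```

The selected script would then go to speech synthesis and the skeleton binding model, which fall outside this sketch.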
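Finally, the feedback loop of claim 7 is a relative-change measurement plus a historical mean. Function names are illustrative; the arithmetic follows the claim directly:

```python
def interaction_effect(avg_at_output, avg_at_window_end):
    """Claim 7: (end-of-verification-window short-time average minus
    the average at control output) divided by the average at control
    output, i.e. the relative emotion change the interaction caused."""
    return (avg_at_window_end - avg_at_output) / avg_at_output

def next_threshold(historical_rates):
    """Claim 7 update rule: the next session's trigger threshold is
    the mean of the historical group emotion change rates recorded
    for that dimension in the interaction control file."""
    return sum(historical_rates) / len(historical_rates)
```

For example, a short-time average rising from 0.2 at output time to 0.3 at the end of the verification window gives an effect coefficient of 0.5, i.e. a 50% lift attributed to the interaction.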

Description

Digital human intelligent interaction control method and system in a private live broadcast scene

Technical Field

The application relates to the technical field of digital human interaction, in particular to a digital human intelligent interaction control method and system in a private-domain live broadcast scene.

Background

With the rapid development of private-domain e-commerce live broadcasting, digital human technology is widely applied to private live broadcast scenes: a virtual digital human avatar is driven to interact in real time with users in a live broadcast room through technologies such as speech recognition, natural language processing and computer vision. Existing digital live interaction control systems generally adopt an edge-computing and cloud collaborative architecture, use the WebRTC protocol for multi-terminal real-time data transmission, drive the digital human model's motion synchronization with a skeleton binding algorithm, and, after extracting semantic features with a speech recognition module, generate corresponding reply content to output to the live broadcast room.
However, prior-art digital human interaction control systems take a single barrage text as the control signal input, classify the emotion of each barrage independently, output a discrete emotion label, and accordingly trigger a one-to-one passive reply. This control structure has the following technical defects. First, discrete emotion labels lose the continuous value information of emotion intensity, so the downstream control module cannot statistically process barrage emotion signals over time, and the system cannot perceive the relative change trend of group emotion in the live broadcast room. Second, the number of online users in a private live broadcast room changes dynamically over time; the prior art directly accumulates barrage emotion signals, so periods with more online users yield higher accumulated values, emotion statistics of different periods and scenes are not dimensionally comparable, and the interaction trigger threshold cannot be reused across scenes. Third, after the digital human performs an interaction output, the prior art does not quantitatively evaluate the control effect of that interaction: the interaction trigger threshold is a fixed preset value in every broadcast, and the control parameters are not adaptively updated from historical interaction effects.
Because the prior art cannot perceive the relative change trend of group emotion and can only respond to the absolute emotion intensity of a single barrage, when group emotion in the live broadcast room changes significantly within a short time the system cannot recognize this in time and proactively trigger interaction control, so the triggering of digital human interaction control instructions is disconnected from the actual change of the group emotion state in the room. Furthermore, even if a threshold trigger mechanism is introduced, the lack of cross-scene comparability of the control signals means the threshold cannot be optimized from historical scene data. On top of this, the control effect lacks quantitative feedback, the system repeats the same exploration process in every broadcast, and the control parameters cannot be continuously optimized with the historical user emotion data accumulated in the private-domain live broadcast scene, so the user historical data assets specific to private-domain live broadcasting cannot contribute to digital human interaction control.

Disclosure of Invention

The application provides a digital human intelligent interaction control method and system in a private-domain live broadcast scene, which solve the problems that prior-art digital human interaction control systems cannot proactively trigger interaction control based on the relative change trend of group emotion and cannot adaptively update control parameters across sessions, and which improve the group emotion perception precision of digital human interaction control and the cross-session reusability of control parameters in a private-domain live broadcast scene.
In a first aspect, the application provides a digital human intelligent interaction control method in a private-domain live broadcast scene, the method comprising: S1, performing emotion recognition on each barrage text in the barrage text stream of a private live broadcast room to obtain a multidimensional emotion intensity vector for each barrage text; S2, dividing the multidimensional emotion intensity vector