CN-121982178-A - Virtual face evaluation method, device and storage medium

CN121982178A

Abstract

The application provides a virtual face evaluation method, a device, and a storage medium, belonging to the technical field of image processing. The method comprises: acquiring virtual face data generated by a virtual face generation system, together with the user behavior data used to generate that virtual face data; extracting multi-modal features of the virtual face data and of the corresponding user behavior data, respectively; calculating a plurality of sub-item evaluation values of the virtual face data based on the multi-modal features; and weighting the sub-item evaluation values to obtain the evaluation value of the virtual face data. By fusing multi-modal features of both the virtual face data and the user behavior data, the scheme reflects the realism and naturalness of the virtual face data more comprehensively and meets the need for standardized, automated evaluation of virtual face data.
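By way of illustration, the weighted aggregation described in the abstract could look like the minimal Python sketch below. The sub-item names, example scores, and weights are hypothetical; the application does not fix any of them.

```python
# Minimal sketch of the weighted scoring described in the abstract.
# Sub-item names, scores, and weights are illustrative assumptions,
# not values taken from the patent.

def overall_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weight the sub-item evaluation values into one evaluation value."""
    return sum(weights[name] * score for name, score in sub_scores.items())

# Example: emotion fit, mouth sync, naturalness, realness, minus a penalty term.
sub_scores = {"emotion_fit": 0.82, "mouth_sync": 0.74,
              "naturalness": 0.90, "realness": 0.88, "penalty": -0.05}
weights = {"emotion_fit": 0.25, "mouth_sync": 0.25,
           "naturalness": 0.25, "realness": 0.25, "penalty": 1.0}
print(overall_score(sub_scores, weights))  # 0.785
```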

Inventors

  • WANG XINYI

Assignees

  • 深圳市优必选科技股份有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-31

Claims (10)

  1. A virtual face evaluation method, the method comprising: acquiring virtual face data generated by a virtual face generation system and user behavior data used for generating the virtual face data; extracting multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively; calculating a plurality of sub-item evaluation values of the virtual face data based on the multi-modal features; and weighting the plurality of sub-item evaluation values to obtain the evaluation value of the virtual face data.
  2. The method of claim 1, wherein the calculating a plurality of sub-item evaluation values of the virtual face data based on the multi-modal features comprises: determining an emotion-fit evaluation value of the virtual face data according to the emotion vector of the virtual face data and the target emotion corresponding to the user behavior data; and/or determining a mouth-shape synchronization evaluation value of the virtual face data according to the mouth key-point motion sequence of the virtual face data and the audio features corresponding to the user behavior data; and/or determining a naturalness evaluation value of the virtual face data according to the generation parameters of any two adjacent frames of virtual face images in the virtual face data; and/or determining an acceptability evaluation value of the virtual face data by a realness discrimination model; and/or generating a penalty-item evaluation value when an image abnormality is detected in the virtual face data.
  3. The method of claim 2, wherein the extracting the multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively, comprises: identifying, by a facial emotion recognition system, a first emotion vector representing an emotion feature of the virtual face data; and identifying a second emotion vector of the target emotion corresponding to the user behavior data according to semantic information and image information in the user behavior data; and wherein the determining the emotion-fit evaluation value of the virtual face data according to the emotion vector of the virtual face data and the target emotion corresponding to the user behavior data comprises: determining the emotion-fit evaluation value of the virtual face data based on a difference between the first emotion vector and the second emotion vector.
  4. The method of claim 2, wherein the extracting the multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively, comprises: extracting audio features from the audio data of the user behavior data; and extracting a mouth key-point motion sequence of the virtual face data; and wherein the determining the mouth-shape synchronization evaluation value of the virtual face data according to the mouth key-point motion sequence of the virtual face data and the audio features corresponding to the user behavior data comprises: determining the degree of correlation between the change processes of the audio features and of the mouth key-point motion sequence; and determining the mouth-shape synchronization evaluation value of the virtual face data based on that degree of correlation.
  5. The method of claim 2, wherein the extracting the multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively, comprises: extracting the time-sequence features of the virtual face data and the generation parameters of each frame of virtual face image; and wherein the determining the naturalness evaluation value of the virtual face data according to the generation parameters of any two adjacent frames of virtual face images in the virtual face data comprises: determining adjacent virtual face images from the virtual face data according to the time-sequence features of the virtual face data; for any two adjacent frames of virtual face images, determining the difference between their generation parameters; and determining the naturalness evaluation value of the virtual face data according to the differences between the generation parameters (an illustrative sketch follows the claims).
  6. The method of any one of claims 1-5, wherein, prior to the extracting the multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively, the method further comprises: aligning the virtual face data with the user behavior data according to the timestamps of the user behavior data corresponding to the virtual face images.
  7. The method of any one of claims 1-5, further comprising: adjusting parameters of the virtual face generation system based on the evaluation value of the virtual face data.
  8. A virtual face evaluation apparatus, the apparatus comprising: an acquisition unit configured to acquire virtual face data generated by a virtual face generation system and user behavior data used for generating the virtual face data; a feature extraction unit configured to extract multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively; a first calculation unit configured to calculate a plurality of sub-item evaluation values of the virtual face data based on the multi-modal features; and a second calculation unit configured to weight the plurality of sub-item evaluation values to obtain the evaluation value of the virtual face data.
  9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the virtual face evaluation method of any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the virtual face evaluation method of any one of claims 1 to 7.
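As referenced in claim 5, the following is an illustrative Python sketch of one plausible naturalness computation. The L2 difference measure, the exponential mapping, and the scale constant are assumptions; the claim only requires that the score be derived from the generation-parameter differences between adjacent frames.

```python
import numpy as np

# Illustrative sketch of claim 5's naturalness score: measure the
# difference between the generation parameters of every pair of
# adjacent frames and map the average difference to a [0, 1] score,
# so that smaller inter-frame jumps yield a higher (more natural) score.

def naturalness_score(frame_params: np.ndarray, scale: float = 1.0) -> float:
    """frame_params: (num_frames, num_params) array of generation
    parameters, ordered by the time-sequence features of the data."""
    diffs = np.linalg.norm(np.diff(frame_params, axis=0), axis=1)
    return float(np.exp(-scale * diffs.mean()))

# Example: 4 frames with 3 generation parameters each.
params = np.array([[0.10, 0.20, 0.30],
                   [0.12, 0.21, 0.31],
                   [0.13, 0.22, 0.33],
                   [0.15, 0.24, 0.34]])
print(naturalness_score(params))  # close to 1.0 for smooth sequences
```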

Description

Virtual face evaluation method, device and storage medium

Technical Field

The application belongs to the technical field of image processing, and in particular relates to a virtual face evaluation method, a virtual face evaluation device, and a storage medium.

Background

With the development of image processing technology, image processing is being applied ever more widely to virtual face generation. For example, in fields such as movies, games, live streaming, intelligent customer service, and educational and interactive experiences, virtual faces often need to be generated to perform corresponding functions, and in practice the realism and naturalness of a virtual face's expressions strongly affect the user experience. A virtual face is typically produced by a virtual face generation system, which usually drives a model with speech, text, or motion signals to generate the corresponding expression frames. The performance of the virtual face generation system therefore not only affects the visual quality of the virtual face but also directly determines the user experience. In summary, a method is needed for evaluating the virtual faces generated by a virtual face generation system.

Disclosure of Invention

The application aims to provide a virtual face evaluation method, a virtual face evaluation device, and a storage medium, so as to address the inaccurate evaluation of virtual face generation systems and the lack of a uniform evaluation standard.

A first aspect of an embodiment of the present application provides a virtual face evaluation method, including: acquiring virtual face data generated by a virtual face generation system and user behavior data used for generating the virtual face data; extracting multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively; calculating a plurality of sub-item evaluation values of the virtual face data based on the multi-modal features; and weighting the plurality of sub-item evaluation values to obtain the evaluation value of the virtual face data.

In some embodiments, the calculating a plurality of sub-item evaluation values of the virtual face data based on the multi-modal features comprises: determining an emotion-fit evaluation value of the virtual face data according to the emotion vector of the virtual face data and the target emotion corresponding to the user behavior data; and/or determining a mouth-shape synchronization evaluation value of the virtual face data according to the mouth key-point motion sequence of the virtual face data and the audio features corresponding to the user behavior data; and/or determining a naturalness evaluation value of the virtual face data according to the generation parameters of any two adjacent frames of virtual face images in the virtual face data; and/or determining an acceptability evaluation value of the virtual face data by a realness discrimination model; and/or generating a penalty-item evaluation value when an image abnormality is detected in the virtual face data, as sketched below.
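As an illustration of the emotion-fit sub-item listed above, the following Python sketch compares the two emotion vectors. Cosine similarity is an assumed measure; the text only requires a score based on the difference between the vectors.

```python
import numpy as np

# Illustrative sketch of the emotion-fit sub-item: the first emotion
# vector is recognized from the virtual face, the second is derived
# from the user behavior data, and their difference is turned into a
# score. The cosine measure and the [0, 1] mapping are assumptions.

def emotion_fit_score(face_emotion: np.ndarray, target_emotion: np.ndarray) -> float:
    denom = np.linalg.norm(face_emotion) * np.linalg.norm(target_emotion) + 1e-8
    cos = float(np.dot(face_emotion, target_emotion) / denom)
    return (cos + 1.0) / 2.0  # map cosine in [-1, 1] to a [0, 1] score

# Example: hypothetical 5-dimensional emotion vectors
# (e.g. happy / sad / angry / surprised / neutral).
print(emotion_fit_score(np.array([0.8, 0.1, 0.0, 0.05, 0.05]),
                        np.array([0.7, 0.1, 0.1, 0.05, 0.05])))
```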
In some embodiments, the extracting the multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively, includes: identifying, by a facial emotion recognition system, a first emotion vector representing an emotion feature of the virtual face data; and identifying a second emotion vector of the target emotion corresponding to the user behavior data according to semantic information and image information in the user behavior data. Correspondingly, the determining the emotion-fit evaluation value of the virtual face data according to the emotion vector of the virtual face data and the target emotion corresponding to the user behavior data includes: determining the emotion-fit evaluation value of the virtual face data based on a difference between the first emotion vector and the second emotion vector.

In some embodiments, the extracting the multi-modal features of the virtual face data and of the user behavior data corresponding to the virtual face data, respectively, includes: extracting audio features from the audio data of the user behavior data; and extracting a mouth key-point motion sequence of the virtual face data. Correspondingly, the determining the mouth-shape synchronization evaluation value of the virtual face data according to the mouth key-point motion sequence of the virtual face data and the audio features corresponding to the user behavior data includes: determining the degree of correlation between the change processes of the audio features and of the mouth key-point motion sequence; and determining the mouth-shape synchronization evaluation value of the virtual face data based on that degree of correlation. In some embodiment
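As an illustration of the mouth-shape synchronization sub-item described above, the following Python sketch correlates a per-frame audio feature with a mouth motion sequence. The energy envelope, the mouth-opening distance, and the use of Pearson correlation are all assumptions; the text only calls for a degree of correlation between the two change processes.

```python
import numpy as np

# Illustrative sketch of the mouth-shape synchronization sub-item:
# correlate the change process of an audio feature (here, an assumed
# per-frame energy envelope) with the mouth key-point motion sequence
# (here, an assumed per-frame mouth-opening distance).

def mouth_sync_score(audio_energy: np.ndarray, mouth_opening: np.ndarray) -> float:
    """Both inputs are 1-D per-frame sequences, already aligned by
    timestamp in the manner of claim 6."""
    r = float(np.corrcoef(audio_energy, mouth_opening)[0, 1])
    return max(0.0, r)  # treat negative correlation as no synchronization

# Example: mouth opening that roughly tracks the audio energy.
energy = np.array([0.1, 0.5, 0.9, 0.4, 0.2])
opening = np.array([0.05, 0.4, 0.8, 0.5, 0.1])
print(mouth_sync_score(energy, opening))
```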