US-12620260-B2 - Information processing method, computer device, and storage medium
Abstract
An information processing method is provided. The method includes obtaining a target video. A first target image feature corresponding to a face image of each frame is obtained. A target identity coefficient and a target texture coefficient corresponding to the first target image feature are obtained from the face image of a different frame in the target video. A first target identity feature is obtained according to the target identity coefficient, and a first target texture feature is obtained according to the target texture coefficient. Once a first target feature is obtained by splicing the first target image feature, the first target identity feature, and the first target texture feature, a first target expression coefficient is obtained based on the first target feature.
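As a concrete illustration of the splicing pipeline summarized above, the following sketch concatenates a per-frame image feature with identity and texture features derived from their coefficients, then maps the spliced feature to an expression coefficient. All names, layer choices, and dimensions (the stand-in `mlp` backbones, the 3DMM-style sizes 80 and 52) are assumptions made for illustration; they are not the patent's actual networks.

```python
# Hypothetical sketch of the claimed splicing pipeline; shapes and names are
# illustrative assumptions, not the patent's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w):
    """Stand-in one-layer 'backbone' mapping a coefficient to a feature."""
    return np.tanh(x @ w)

# Per-frame inputs (assumed sizes).
image_feature = rng.standard_normal(128)   # first target image feature
identity_coef = rng.standard_normal(80)    # target identity coefficient
texture_coef = rng.standard_normal(80)     # target texture coefficient

# Backbone models turn coefficients into features (cf. the second and third
# preset backbone models of claim 6).
w_id = rng.standard_normal((80, 64))
w_tex = rng.standard_normal((80, 64))
identity_feature = mlp(identity_coef, w_id)
texture_feature = mlp(texture_coef, w_tex)

# "Splicing" is read here as concatenation of the three features.
target_feature = np.concatenate([image_feature, identity_feature, texture_feature])

# A head network maps the spliced feature to an expression coefficient.
w_head = rng.standard_normal((target_feature.size, 52))
expression_coef = mlp(target_feature, w_head)

print(target_feature.shape, expression_coef.shape)  # (256,) (52,)
```

Reading "splicing" as feature concatenation is one plausible interpretation; the claims do not fix a particular fusion operator beyond combining the three features into one.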
Inventors
- Chun Wang
- Dingheng Zeng
- Xunyi Zhou
- Ning Jiang
Assignees
- MASHANG CONSUMER FINANCE CO., LTD.
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-12-27
- Priority Date: 2022-04-08
Claims (20)
- 1 . An information processing method, comprising: obtaining a target video comprising a plurality of frames, each of the plurality of frames comprising a face image corresponding to a same object; obtaining a first target image feature corresponding to the face image of each frame; determining a first target identity coefficient and a first target texture coefficient; determining a first target identity feature based on the first target identity coefficient; determining a first target texture feature based on the first target texture coefficient; splicing the first target image feature, the first target identity feature, and the first target texture feature, and obtaining a first target feature; and determining a first target expression coefficient based on the first target feature.
- 2 . The information processing method according to claim 1 , wherein determining the first target identity coefficient and the first target texture coefficient comprises: obtaining a first identity coefficient and a first texture coefficient of the face image corresponding to the first target image feature in a previous frame of the target video; obtaining a second identity coefficient and a second texture coefficient corresponding to the first target image feature; performing a weighted summation on the first identity coefficient and the second identity coefficient, and obtaining the first target identity coefficient corresponding to the first target image feature; performing the weighted summation on the first texture coefficient and the second texture coefficient, and obtaining the first target texture coefficient corresponding to the first target image feature.
- 3 . The information processing method according to claim 2 , wherein after determining the first target expression coefficient, the method further comprises: using the first target identity coefficient to replace the second identity coefficient corresponding to the first target image feature in the face image of a current frame in the target video; using the first target texture coefficient to replace the second texture coefficient corresponding to the first target image feature in the face image of the current frame in the target video.
- 4 . The information processing method according to claim 3 , wherein obtaining the target video comprises: acquiring an initial video; extracting the face image of each frame in the initial video; determining the same object by analyzing the face image of each frame, and determining one or more video segments from the initial video, each of the one or more video segments comprising at least two frames and the same object comprised in each of the at least two frames; determining one of the one or more video segments with a number of frames greater than a preset threshold as the target video.
- 5 . The information processing method according to claim 4 , wherein determining one of the one or more video segments with the number of frames greater than the preset threshold as the target video comprises: determining the one of the one or more video segments with the number of frames greater than the preset threshold as a first target video segment; obtaining a second target video segment by performing a style transformation on the first target video segment; and determining each of the first target video segment and the second target video segment as the target video.
- 6 . The information processing method according to claim 5 , wherein, after determining the first target identity coefficient and the first target texture coefficient, the method further comprises: generating a first target loss function, comprising: inputting the first target identity coefficient into a second preset backbone model, and outputting a first identity feature; inputting the first target texture coefficient into a third preset backbone model, and outputting a first texture feature; splicing the first target image feature, the first identity feature, and the first texture feature, and obtaining a first feature; inputting the first feature into a preset head network model, and outputting a first predicted expression coefficient; generating a first predicted face three-dimensional model according to a label identity coefficient, a label texture coefficient, the first predicted expression coefficient, a label posture coefficient, and a label lighting coefficient; obtaining a first difference between a first face estimated value corresponding to the first predicted face three-dimensional model and an un-occluded area in the face image; obtaining a second difference between first predicted face three-dimensional key points corresponding to the first predicted face three-dimensional model and face three-dimensional key points in the face image; and establishing the first target loss function based on the first difference and the second difference.
- 7 . The information processing method according to claim 6 , further comprising: performing an optimization on first network parameters of the second preset backbone model, the third preset backbone model, and the preset head network model according to the first target loss function; and returning to the generating of the first target loss function, iteratively optimizing the first network parameters of the second preset backbone model, the third preset backbone model, and the preset head network model through the first target loss function that is generated, until the first target loss function converges, thereby obtaining a second target preset backbone model, a third target preset backbone model, and a target preset head network model that have been trained.
- 8 . A computer device comprising: a storage device storing one or more programs; and at least one processor, wherein the one or more programs, when executed by the at least one processor, cause the at least one processor to: obtain a target video comprising a plurality of frames, each of the plurality of frames comprising a face image corresponding to a same object; obtain a first target image feature corresponding to the face image of each frame; determine a first target identity coefficient and a first target texture coefficient; determine a first target identity feature based on the first target identity coefficient; determine a first target texture feature based on the first target texture coefficient; splice the first target image feature, the first target identity feature, and the first target texture feature, and obtain a first target feature; and determine a first target expression coefficient based on the first target feature.
- 9 . The computer device according to claim 8 , wherein the at least one processor determines the first target identity coefficient and the first target texture coefficient by: obtaining a first identity coefficient and a first texture coefficient of the face image corresponding to the first target image feature in a previous frame of the target video; obtaining a second identity coefficient and a second texture coefficient corresponding to the first target image feature; performing a weighted summation on the first identity coefficient and the second identity coefficient, and obtaining the first target identity coefficient corresponding to the first target image feature; performing the weighted summation on the first texture coefficient and the second texture coefficient, and obtaining the first target texture coefficient corresponding to the first target image feature.
- 10 . The computer device according to claim 9 , wherein after determining the first target expression coefficient, the at least one processor is further caused to: use the first target identity coefficient to replace the second identity coefficient corresponding to the first target image feature in the face image of a current frame in the target video; use the first target texture coefficient to replace the second texture coefficient corresponding to the first target image feature in the face image of the current frame in the target video.
- 11 . The computer device according to claim 10 , wherein the at least one processor obtains the target video by: acquiring an initial video; extracting the face image of each frame in the initial video; determining the same object by analyzing the face image of each frame, and determining one or more video segments from the initial video, each of the one or more video segments comprising at least two frames and the same object comprised in each of the at least two frames; determining one of the one or more video segments with a number of frames greater than a preset threshold as the target video.
- 12 . The computer device according to claim 11 , wherein the at least one processor determines one of the one or more video segments with the number of frames greater than the preset threshold as the target video by: determining the one of the one or more video segments with the number of frames greater than the preset threshold as a first target video segment; obtaining a second target video segment by performing a style transformation on the first target video segment; and determining each of the first target video segment and the second target video segment as the target video.
- 13 . The computer device according to claim 12 , wherein after determining the first target identity coefficient and the first target texture coefficient, the at least one processor is further caused to: generate a first target loss function, comprising: input the first target identity coefficient into a second preset backbone model, and output a first identity feature; input the first target texture coefficient into a third preset backbone model, and output a first texture feature; splice the first target image feature, the first identity feature, and the first texture feature, and obtain a first feature; input the first feature into a preset head network model, and output a first predicted expression coefficient; generate a first predicted face three-dimensional model according to a label identity coefficient, a label texture coefficient, the first predicted expression coefficient, a label posture coefficient, and a label lighting coefficient; obtain a first difference between a first face estimated value corresponding to the first predicted face three-dimensional model and an un-occluded area in the face image; obtain a second difference between first predicted face three-dimensional key points corresponding to the first predicted face three-dimensional model and face three-dimensional key points in the face image; and establish the first target loss function based on the first difference and the second difference.
- 14 . The computer device according to claim 13 , wherein the at least one processor is further caused to: perform an optimization on first network parameters of the second preset backbone model, the third preset backbone model, and the preset head network model according to the first target loss function; and return to the generating of the first target loss function, iteratively optimizing the first network parameters of the second preset backbone model, the third preset backbone model, and the preset head network model through the first target loss function that is generated, until the first target loss function converges, thereby obtaining a second target preset backbone model, a third target preset backbone model, and a target preset head network model that have been trained.
- 15 . A non-transitory storage medium having instructions stored thereon which, when executed by a processor of a computer device, cause the processor to perform an information processing method, the method comprising: obtaining a target video comprising a plurality of frames, each of the plurality of frames comprising a face image corresponding to a same object; obtaining a first target image feature corresponding to the face image of each frame; determining a first target identity coefficient and a first target texture coefficient; determining a first target identity feature based on the first target identity coefficient; determining a first target texture feature based on the first target texture coefficient; splicing the first target image feature, the first target identity feature, and the first target texture feature, and obtaining a first target feature; and determining a first target expression coefficient based on the first target feature.
- 16 . The non-transitory storage medium according to claim 15 , wherein determining the first target identity coefficient and the first target texture coefficient comprises: obtaining a first identity coefficient and a first texture coefficient of the face image corresponding to the first target image feature in a previous frame of the target video; obtaining a second identity coefficient and a second texture coefficient corresponding to the first target image feature; performing a weighted summation on the first identity coefficient and the second identity coefficient, and obtaining the first target identity coefficient corresponding to the first target image feature; performing the weighted summation on the first texture coefficient and the second texture coefficient, and obtaining the first target texture coefficient corresponding to the first target image feature.
- 17 . The non-transitory storage medium according to claim 16 , wherein after determining the first target expression coefficient, the method further comprises: using the first target identity coefficient to replace the second identity coefficient corresponding to the first target image feature in the face image of a current frame in the target video; using the first target texture coefficient to replace the second texture coefficient corresponding to the first target image feature in the face image of the current frame in the target video.
- 18 . The non-transitory storage medium according to claim 17 , wherein obtaining the target video comprises: acquiring an initial video; extracting the face image of each frame in the initial video; determining the same object by analyzing the face image of each frame, and determining one or more video segments from the initial video, each of the one or more video segments comprising at least two frames and the same object comprised in each of the at least two frames; determining one of the one or more video segments with a number of frames greater than a preset threshold as the target video.
- 19 . The non-transitory storage medium according to claim 18 , wherein determining one of the one or more video segments with the number of frames greater than the preset threshold as the target video comprises: determining the one of the one or more video segments with the number of frames greater than the preset threshold as a first target video segment; obtaining a second target video segment by performing a style transformation on the first target video segment; and determining each of the first target video segment and the second target video segment as the target video.
- 20 . The non-transitory storage medium according to claim 19 , wherein after determining the first target identity coefficient and the first target texture coefficient, the method further comprises: generating a first target loss function, comprising: inputting the first target identity coefficient into a second preset backbone model, and outputting a first identity feature; inputting the first target texture coefficient into a third preset backbone model, and outputting a first texture feature; splicing the first target image feature, the first identity feature, and the first texture feature, and obtaining a first feature; inputting the first feature into a preset head network model, and outputting a first predicted expression coefficient; generating a first predicted face three-dimensional model according to a label identity coefficient, a label texture coefficient, the first predicted expression coefficient, a label posture coefficient, and a label lighting coefficient; obtaining a first difference between a first face estimated value corresponding to the first predicted face three-dimensional model and an un-occluded area in the face image; obtaining a second difference between first predicted face three-dimensional key points corresponding to the first predicted face three-dimensional model and face three-dimensional key points in the face image; and establishing the first target loss function based on the first difference and the second difference.
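One concrete reading of the weighted summation recited in claims 2, 9, and 16, together with the coefficient replacement of claims 3, 10, and 17, is a running weighted sum over frames: the per-frame identity (or texture) coefficient estimate is blended with the previous frame's target coefficient, and the blended result replaces the stored coefficient for the next frame. The weight `alpha`, the toy coefficient values, and the function name below are assumptions for illustration only; the claims do not specify particular weights.

```python
# Illustrative reading of the claimed weighted summation across frames.
# `alpha` and all values are assumptions, not taken from the patent.
import numpy as np

def weighted_update(prev_target, current_estimate, alpha=0.9):
    """Weighted summation of the previous frame's target coefficient and the
    current frame's estimated coefficient (cf. claim 2)."""
    return alpha * prev_target + (1.0 - alpha) * current_estimate

# Toy per-frame identity-coefficient estimates for three frames.
frames = [np.full(3, v) for v in (1.0, 2.0, 3.0)]

target = frames[0]   # first frame: no previous target, use the estimate directly
history = [target]
for est in frames[1:]:
    # The blended result replaces the current frame's coefficient (cf. claim 3).
    target = weighted_update(target, est)
    history.append(target)

print(history[-1])   # stabilized coefficient after the third frame
```

With a large `alpha`, the target coefficient changes slowly across frames, which matches the intuition that a subject's identity and texture should be nearly constant within one video of the same object.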
Description
The present application claims priority to Chinese patent application No. 202210369409.5, filed on Apr. 8, 2022, entitled “INFORMATION PROCESSING METHOD, DEVICE, COMPUTER DEVICE, AND STORAGE MEDIUM”, the content of which is incorporated herein by reference.

FIELD

The present application relates to the field of computer vision technology, and specifically to an information processing method, a device, a computer device, and a storage medium.

BACKGROUND

Face reconstruction is a popular field in computer vision. Reconstructing a face 3D model from face images is one of the fundamental technologies for many face-related applications. In some cases, the face 3D model is constructed through a parameterized face 3D reconstruction algorithm. Such an algorithm uses a parametric face 3D model to provide prior information as constraints, so that the face 3D reconstruction problem is transformed into estimating the parameters of the parameterized face 3D model, which copes well with reconstruction in different environments. Commonly used parameterized face 3D reconstruction algorithms typically perform reconstruction by estimating 3DMM (3D Morphable Model) coefficients.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can also be obtained from these drawings without creative effort.

FIG. 1 is a schematic diagram of a scene of an information processing system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a flowchart of an information processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another flowchart of the information processing method provided by an embodiment of the present application;
FIG. 4A is a schematic diagram of a scene of the information processing method provided by an embodiment of the present application;
FIG. 4B is a schematic diagram of a framework of the information processing system provided by an embodiment of the present application;
FIG. 4C is a schematic diagram of another framework of the information processing system provided by an embodiment of the present application;
FIG. 4D is a schematic diagram of another framework of the information processing system provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an information processing device provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present application.

DESCRIPTION

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of the present application.

In order to implement subsequent operations such as determination of speakers or editing of expressions, it is often necessary to extract the expression information of a human face in an image. However, the 3DMM expression information directly extracted by commonly used image-based parameterized face 3D reconstruction algorithms is mixed with other, non-expression information, which makes the extracted expression information inaccurate and results in poor information processing accuracy.

To solve the above problem, the present application provides an information processing method, an information processing device, a computer device, and a storage medium. The information processing method can be applied to the information processing device. The information processing device may be integrated in the computer device, and the computer device may be a terminal having an information processing function. The terminal can be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart watch, etc. The computer device can also be a server, which can be an independent physical server, a server cluster or a distributed system composed of a plurality of physical servers, or a cloud server that provides basic cloud computing services. The basic cloud computing services include cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud