CN-122002091-A - Video processing method and related device

CN122002091A

Abstract

Embodiments of this application provide a video processing method and a related device. The method comprises: obtaining a first picture that contains a face image of a target object; obtaining a first video that comprises a plurality of first video frames, each of which contains a first digital object, the first digital object being a digital object of the target object; extracting the facial features of the target object's face image in the first picture and adjusting, based on those features, the facial features of the first digital object contained in each of the first video frames; and obtaining a second video, in which a plurality of second video frames respectively correspond to the plurality of first video frames, each second video frame containing a second digital object, the second digital object being the first digital object after its facial features have been processed. The method can improve the realism of the digital object in the video or the visual effect of the video.

Inventors

  • BAI RUIFENG
  • LIU HUI
  • XU DI

Assignees

  • HUAWEI CLOUD COMPUTING TECHNOLOGIES CO., LTD. (华为云计算技术有限公司)

Dates

Publication Date
2026-05-08
Application Date
2024-11-04

Claims (19)

  1. A video processing method, comprising: acquiring a first picture, wherein the first picture comprises a face image of a target object; acquiring a first video, wherein the first video comprises a plurality of first video frames, each of the plurality of first video frames comprises a first digital object, and the first digital object is a digital object of the target object; extracting facial features of the face image of the target object in the first picture, and adjusting, based on the facial features of the face image of the target object, facial features of the first digital object contained in each of the plurality of first video frames; and obtaining a second video, wherein a plurality of second video frames in the second video respectively correspond to the plurality of first video frames, each second video frame comprises a second digital object, and the second digital object is the first digital object whose facial features have been processed.
  2. The method of claim 1, wherein before the acquiring of the first video, the method further comprises: acquiring a plurality of second pictures, wherein the second pictures are images of the target object captured from different viewing angles; performing three-dimensional reconstruction on the plurality of second pictures to obtain a three-dimensional model of the first digital object; and rendering the three-dimensional model of the first digital object to generate the first video, the first video comprising a dynamic picture of the first digital object.
  3. The method according to claim 1 or 2, wherein before the obtaining of the second video, the method further comprises: performing consistency processing and fusion processing on the plurality of second video frames and the first digital objects contained in the first video frames corresponding to the plurality of second video frames, wherein the consistency processing comprises at least one of pose consistency processing, style consistency processing, or background consistency processing, and the fusion processing comprises stitching the face images in the second video frames with the non-face images of the first digital objects contained in the first video frames.
  4. The method of claim 3, wherein before the consistency processing and the fusion processing are performed on the second video frames and the first digital objects contained in the corresponding first video frames, the method further comprises: receiving adjustment information sent by a user, wherein the adjustment information is used for adjusting at least one of skin color or skin texture; and adjusting, according to the adjustment information, the facial features of the second digital objects contained in the plurality of second video frames to update the plurality of second video frames, wherein the updated plurality of second video frames are used for the consistency processing and the fusion processing.
  5. The method according to claim 3 or 4, wherein the pose consistency processing comprises: establishing a mapping relationship between first key points and second key points, wherein the first key points are facial key points of the second digital object contained in each second video frame, the second key points are facial key points of the first digital object contained in each first video frame, and the mapping relationship is a correspondence between the positions of the first key points in each second video frame and the positions of the second key points in each corresponding first video frame; and aligning, according to the mapping relationship and the positions of the second key points, the positions of the first key points in the corresponding second video frames (see the alignment sketch following the claims).
  6. The method of any one of claims 3-5, wherein the style consistency processing comprises: adjusting the style of the first digital object in the first video frame based on the style of the facial features of the second digital object, wherein the style comprises lighting and shadow, or skin color (see the color-transfer sketch following the claims).
  7. The method of any one of claims 3-6, wherein the background consistency processing comprises: eliminating the color difference between a foreground region and a background region, wherein the foreground region is the region in which the first digital object in each first video frame is located, and the background region is the region of each first video frame other than the foreground region (see the foreground-harmonization sketch following the claims).
  8. The method of any one of claims 1-7, wherein the extracting of the facial features of the face image of the target object in the first picture, and the adjusting, based on the facial features of the face image of the target object, of the facial features of the first digital object contained in each of the plurality of first video frames, comprise: extracting a face mask of the target object in the first picture and a face mask of the first digital object in each first video frame; and replacing the face mask of the first digital object in each first video frame with the face mask of the target object, and eliminating a mask gap, wherein the mask gap is a gap between the replaced mask and the region adjacent to the mask (see the mask-replacement sketch following the claims).
  9. A video processing apparatus, comprising: a first acquisition module, configured to acquire a first picture, wherein the first picture comprises a face image of a target object; a second acquisition module, configured to acquire a first video, wherein the first video comprises a plurality of first video frames, each of the plurality of first video frames comprises a first digital object, and the first digital object is a digital object of the target object; and a processing module, configured to extract facial features of the face image of the target object in the first picture, adjust, based on the facial features of the face image of the target object, facial features of the first digital object contained in each of the plurality of first video frames, and obtain a second video, wherein a plurality of second video frames in the second video respectively correspond to the plurality of first video frames, each second video frame contains a second digital object, and the second digital object is the first digital object whose facial features have been processed.
  10. The apparatus of claim 9, wherein, before the acquiring of the first video, the first acquisition module is further configured to acquire a plurality of second pictures, the second pictures being images of the target object captured from different viewing angles; and the processing module is further configured to perform three-dimensional reconstruction on the plurality of second pictures to obtain a three-dimensional model of the first digital object, and to render the three-dimensional model of the first digital object to generate the first video, the first video comprising a dynamic picture of the first digital object.
  11. The apparatus of claim 9 or 10, wherein, before the obtaining of the second video, the processing module is further configured to perform consistency processing and fusion processing on the plurality of second video frames and the first digital object contained in the first video frame corresponding to each of the plurality of second video frames, wherein the consistency processing comprises at least one of pose consistency processing, style consistency processing, or background consistency processing, and the fusion processing comprises stitching the face image in the second video frame with the non-face image of the first digital object contained in the first video frame.
  12. The apparatus of claim 11, wherein, before the consistency processing and the fusion processing are performed on the second video frames and the first digital objects contained in the corresponding first video frames, the processing module is further configured to receive adjustment information sent by a user, the adjustment information being used for adjusting at least one of skin color or skin texture; and the processing module is further configured to adjust, according to the adjustment information, the facial features of the second digital objects contained in the plurality of second video frames to update the plurality of second video frames, wherein the updated plurality of second video frames are used for the consistency processing and the fusion processing.
  13. The apparatus according to claim 11 or 12, wherein the pose consistency processing comprises: establishing a mapping relationship between first key points and second key points, wherein the first key points are facial key points of the second digital object contained in each second video frame, the second key points are facial key points of the first digital object contained in each first video frame, and the mapping relationship is a correspondence between the positions of the first key points in each second video frame and the positions of the second key points in each corresponding first video frame; and aligning, according to the mapping relationship and the positions of the second key points, the positions of the first key points in the corresponding second video frames.
  14. The apparatus of any one of claims 11-13, wherein the style consistency processing comprises: adjusting the style of the first digital object in the first video frame based on the style of the facial features of the second digital object, wherein the style comprises lighting and shadow, or skin color.
  15. The apparatus of any one of claims 11-14, wherein the background consistency processing comprises: eliminating the color difference between a foreground region and a background region, wherein the foreground region is the region in which the first digital object in each first video frame is located, and the background region is the region of each first video frame other than the foreground region.
  16. The apparatus according to any one of claims 9-15, wherein, in extracting the facial features of the face image of the target object in the first picture and adjusting, based on the facial features of the face image of the target object, the facial features of the first digital object contained in each of the plurality of first video frames, the processing module is specifically configured to: extract a face mask of the target object in the first picture and a face mask of the first digital object in each first video frame; and replace the face mask of the first digital object in each first video frame with the face mask of the target object, and eliminate a mask gap, wherein the mask gap is a gap between the replaced mask and the region adjacent to the mask.
  17. A computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory, wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so as to cause the computing device cluster to perform the method of any one of claims 1-8.
  18. A computer-readable storage medium having instructions stored therein which, when run on a computing device cluster, cause the computing device cluster to perform the method of any one of claims 1-8.
  19. A computer program product comprising program instructions that, when run on a computing device cluster, cause the computing device cluster to perform the method of any one of claims 1-8.
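
A minimal sketch of the pose-consistency alignment in claims 5 and 13, assuming 2D facial key points have already been detected for both frames. The patent does not specify a transform model; the similarity transform estimated here with OpenCV's estimateAffinePartial2D, and the helper name align_face_to_pose, are illustrative choices.

```python
import cv2
import numpy as np

def align_face_to_pose(face_img: np.ndarray,
                       first_keypoints: np.ndarray,   # (N, 2) facial key points in the second video frame
                       second_keypoints: np.ndarray,  # (N, 2) facial key points in the first video frame
                       frame_size: tuple) -> np.ndarray:
    """Warp face_img so the first key points land on the second key points."""
    # The point-to-point mapping relationship: estimate rotation, uniform
    # scale, and translation between the two key-point sets (RANSAC rejects
    # badly matched points).
    matrix, _inliers = cv2.estimateAffinePartial2D(
        first_keypoints.astype(np.float32),
        second_keypoints.astype(np.float32),
        method=cv2.RANSAC,
    )
    width, height = frame_size
    # Apply the transform so the adjusted face adopts the original frame's pose.
    return cv2.warpAffine(face_img, matrix, (width, height))
```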
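For the style-consistency processing of claims 6 and 14, one plausible realization, assumed here rather than prescribed by the patent, is Reinhard-style statistics transfer in Lab color space, which shifts the lighting and skin tone of one face toward a reference face; transfer_style is a hypothetical name.

```python
import cv2
import numpy as np

def transfer_style(src_face: np.ndarray, ref_face: np.ndarray) -> np.ndarray:
    """Shift src_face's per-channel Lab mean/std toward ref_face's."""
    src = cv2.cvtColor(src_face, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(ref_face, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std() + 1e-6
        # Re-center and re-scale each channel to match the reference statistics.
        src[..., c] = (src[..., c] - s_mean) * (r_std / s_std) + r_mean
    out = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```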
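For the background-consistency processing of claims 7 and 15, one simple assumed approach measures the color mismatch in thin bands on either side of the foreground boundary and shifts the foreground toward the background tone; the band width and the helper name harmonize_foreground are illustrative.

```python
import cv2
import numpy as np

def harmonize_foreground(frame: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Reduce the color difference between the masked foreground and the background."""
    kernel = np.ones((15, 15), np.uint8)          # band width is an arbitrary choice
    inner = cv2.erode(fg_mask, kernel)
    outer = cv2.dilate(fg_mask, kernel)
    seam_fg = (fg_mask > 0) & (inner == 0)        # band just inside the boundary
    seam_bg = (outer > 0) & (fg_mask == 0)        # band just outside the boundary
    out = frame.astype(np.float32)
    # Per-channel mean offset between the two bands, applied to the whole foreground.
    delta = out[seam_bg].mean(axis=0) - out[seam_fg].mean(axis=0)
    out[fg_mask > 0] += delta
    return np.clip(out, 0, 255).astype(np.uint8)
```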
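For the mask replacement and mask-gap elimination of claims 8 and 16, Poisson blending (OpenCV's seamlessClone) is one assumed way to suppress the seam between the replaced mask and its neighboring region; replace_face_mask is a hypothetical name, and target_face is assumed to be already warped into the frame's coordinates.

```python
import cv2
import numpy as np

def replace_face_mask(frame: np.ndarray,
                      target_face: np.ndarray,
                      face_mask: np.ndarray) -> np.ndarray:
    """Composite target_face into frame where face_mask is set, blending the seam."""
    ys, xs = np.nonzero(face_mask)
    center = (int(xs.mean()), int(ys.mean()))  # anchor the clone at the mask centroid
    # seamlessClone interpolates the boundary, removing the gap between the
    # replaced mask and the region adjacent to it.
    return cv2.seamlessClone(target_face, frame, face_mask, center, cv2.NORMAL_CLONE)
```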

Description

Video processing method and related device

Technical Field

The embodiments of this application relate to the field of computer technology, and in particular to a video processing method and a related device.

Background

With the development of virtual reality and metaverse technologies, digital objects (e.g., digital persons) have attracted attention as virtual characters, since a variety of applications can be implemented with them in a virtual digital space. A digital object is a model, built with computer graphics technology, that mimics an object in the real world (e.g., a real person). A digital object is itself a static graphic; for it to exhibit dynamic characteristics, a digital object video may be generated, i.e., a video that presents a dynamic digital object through a plurality of successive video frames. To generate such a video, an object in the real world may be simulated with three-dimensional modeling software to construct an initial three-dimensional model; details, textures, skeleton structures, and the like are then added to the initial model, which is rendered to obtain the three-dimensional digital object. After the three-dimensional digital object is obtained, an animation of it can be defined and rendered to produce the digital object video. Because the digital object obtained by three-dimensional modeling is rough, the visual effect of the digital object in a video generated this way is poor.

Disclosure of Invention

This application provides a video processing method for improving the visual effect of a digital object in a video, together with a video processing apparatus, a computing device cluster, a computer-readable storage medium, a computer program product, and the like. The method comprises: obtaining a first picture containing a face image of a target object; obtaining a first video comprising a plurality of first video frames, each of which contains a first digital object, the first digital object being a digital object of the target object; extracting facial features of the target object's face image in the first picture and adjusting, based on those features, the facial features of the first digital object in the plurality of first video frames; and obtaining a second video in which a plurality of second video frames correspond to the plurality of first video frames, each second video frame containing a second digital object, the second digital object being the first digital object after its facial features have been processed. The computing devices in the computing device cluster may obtain the first picture and/or the first video locally or from another computing device. The first picture contains one or more target objects, each having facial features. The target object may be an object in the real world; a first picture containing the target object's face image may be obtained by photographing it. The target object may be, for example, a person in the real world, and the face image the person's face. The sketch below strings these steps together.
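A rough end-to-end sketch of the described pipeline; extract_face_features and adjust_frame are hypothetical stand-ins for the feature-extraction and adjustment operations, which the description leaves unspecified.

```python
import cv2
import numpy as np

def extract_face_features(picture: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder: a real system would run a face-feature model."""
    return picture  # stand-in: pass the reference picture through unchanged

def adjust_frame(frame: np.ndarray, target_features: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for adjusting the digital object's face."""
    return frame  # stand-in: identity

def process_video(picture_path: str, video_in: str, video_out: str) -> None:
    """Read the first picture and the first video, adjust each first video
    frame, and write the resulting second video."""
    first_picture = cv2.imread(picture_path)
    target_features = extract_face_features(first_picture)

    cap = cv2.VideoCapture(video_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(video_out, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    while True:
        ok, frame = cap.read()  # a first video frame
        if not ok:
            break
        writer.write(adjust_frame(frame, target_features))  # a second video frame

    cap.release()
    writer.release()
```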
The digital object of the target object is the first digital object, for example a digital person obtained by virtualizing a real person in the real world. The video containing the first digital object may be the first video, which comprises a plurality of first video frames, each containing the first digital object; as the video plays continuously, a dynamic picture of the first digital object is presented. The target object and the first digital object each have facial features, and the corresponding facial features can be obtained from the first picture or a first video frame by feature extraction. The facial features may be local features of the face, such as the nose, mouth, eyes, or skin, or global features of the face, i.e., features that cover the entire face. After the facial features of the target object's face image are extracted, the facial features of the first digital object contained in each of the plurality of first video frames are adjusted based on them, so that the facial features of the first digital object better match the target object; the adjusted object is the second digital object. After the plurality of first video frames are adjusted, a plurality of second video frames are obtained, and ordering them in time yields the second video. In the first aspect, the first di