CN-121330131-B - Virtual element synthesis control system and method applied to 3D animation
Abstract
The invention discloses a virtual element synthesis control system and method applied to 3D animation, relating to the technical field of image processing. When the GPU renders a scene picture of the 3D animation, the picture data produced during rendering are extracted and stored on the GPU. A mask prediction model is constructed with a convolutional neural network and predicts the three-dimensional space occupancy mask generated when a virtual element interacts with a scene object. A learning network constructed with a neural network algorithm calculates the outgoing light intensity corresponding to the incident light intensity in the picture, and the contact shadow and ambient occlusion that the virtual element casts on scene objects in the 3D animation picture are calculated. The self-luminous color of the virtual element is computed from the outgoing light intensity, the contact shadow and the ambient occlusion. After virtual element synthesis is performed on each frame of the 3D animation, temporal consistency reinforcement is applied, and the 3D animation with the synthesized virtual elements is output.
Inventors
- YANG HAOGANG
Assignees
- 环球墨非(北京)科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-10-23
Claims (10)
- 1. A virtual element synthesis control method applied to 3D animation, characterized by comprising the following steps: S100, when the GPU renders a scene picture of the 3D animation, configuring the shader in the render pass to output to more than one render target, extracting the picture data produced during rendering and storing it on the GPU; S200, constructing a mask prediction model using a convolutional neural network and taking the extracted picture data as input; predicting, through the mask prediction model, the three-dimensional space occupancy mask generated when a virtual element interacts with a scene object, specifically: combining the world position texture, the world normal texture and the depth texture of the geometry buffer (G-Buffer) into a multi-channel tensor, wherein the three position values of the world position texture, the three normal vector components of the world normal texture and the depth value of the depth texture are extracted and integrated into seven feature values; the encoder downsamples to extract the spatial geometric features of the scene picture, and the decoder upsamples to predict a probability map; the mask prediction model outputs a probability mask map M_occ(P) at the same resolution as the display screen, wherein P denotes a pixel and M_occ(P) ∈ [0,1], the probability mask map representing, at pixel P, the probability that the scene picture is occluded by the space occupied by the virtual element; presetting a probability threshold, judging the mask probability of each pixel in the probability mask map against this threshold, and judging the corresponding pixel to be occluded by the virtual element when its mask probability exceeds the threshold; S300, extracting incident light intensity and illumination direction data from the picture data, constructing a learning network using a neural network algorithm, and calculating the outgoing light intensity corresponding to the incident light intensity in the picture, specifically: extracting the incident light intensity and the illumination direction from the HDR environment illumination information, constructing a fully connected network using a neural network algorithm, inputting the incident light intensity to the fully connected network and outputting the outgoing light intensity; extracting the absorption coefficient and the scattering coefficient of the virtual element for light, and calculating the transmittance of the virtual element from the two coefficients; extracting from the depth texture the depth value at which light enters the virtual element and the depth value at which it exits, taking the depth difference between the two as the light travel distance, and calculating the outgoing light intensity from the incident light intensity, the transmittance of the virtual element and the travel distance; S400, calculating the contact shadow and the ambient occlusion that the virtual element casts on scene objects in the 3D animation picture; S500, calculating the self-luminous color of the virtual element from the outgoing light intensity, the contact shadow and the ambient occlusion, extracting the background color and the space occupancy mask predicted by the mask prediction model, and performing physically based volumetric blending to output the final color of the picture; S600, performing temporal consistency reinforcement after virtual element synthesis on each frame of the 3D animation, and outputting the 3D animation after virtual element synthesis.
- 2. The virtual element synthesis control method applied to 3D animation according to claim 1, wherein the specific steps of extracting the picture data produced during rendering and storing it on the GPU in S100 are as follows: S101, when virtual elements are synthesized, extracting a scene picture of the 3D animation and rendering it, configuring the shader in the render pass to output to more than one render target, and extracting the picture data produced during rendering, the picture data comprising world position textures, world normal textures, depth textures, roughness textures and HDR environment illumination information, wherein the world position texture stores the 3D coordinates (x, y, z) of each pixel in world space, the world normal texture stores the surface unit normal vector (N_x, N_y, N_z) corresponding to each pixel, the depth texture stores the depth value D of each pixel, the roughness texture stores the material attributes of the object surface, and the HDR environment illumination information comprises the global illumination of the scene, environment cube maps and spherical harmonic illumination coefficients; all picture data is stored on the GPU to form the geometry buffer G-Buffer (a G-Buffer feature-packing sketch follows the claims).
- 3. The virtual element synthesis control method applied to 3D animation according to claim 2, wherein the specific steps of predicting, in S200, the three-dimensional space occupancy mask generated when the virtual element interacts with the scene object through the mask prediction model are as follows: S201, constructing the mask prediction model using a lightweight encoder-decoder convolutional neural network, extracting the virtual elements and the picture data of the corresponding scenes from historical virtual element syntheses, and training the mask prediction model, the model learning through training the spatial occlusion and surrounding relations between virtual elements and scenes (a model sketch follows the claims).
- 4. The virtual element synthesis control method applied to 3D animation according to claim 3, wherein the step of calculating, in S300, the outgoing light intensity corresponding to the incident light intensity in the picture comprises the following steps: S301, for each pixel occluded by a virtual element, extracting the incident light intensity and the illumination direction from the HDR environment illumination information according to the world position texture and the world normal texture; S302, calculating the outgoing light intensity in the fully connected network, specifically: extracting the absorption coefficient and the scattering coefficient of the virtual element for light, and calculating the transmittance of the virtual element from them by the formula T(t) = exp(-(β_a + β_s)·t); in the formula, T(t) represents the transmittance of the virtual element, β_a represents the absorption coefficient, β_s represents the scattering coefficient, and t represents the path travelled by the light from entering the virtual element to exiting it; extracting from the depth texture the depth value at which light enters the virtual element and the depth value at which it exits, taking the depth difference between the two as the light travel distance, and calculating the outgoing light intensity by the formula L_out = L_in · exp(-(β_a + β_s)·k); in the formula, L_out represents the outgoing light intensity, k represents the light travel distance, and L_in represents the incident light intensity (a transmittance sketch follows the claims).
- 5. The virtual element synthesis control method applied to 3D animation according to claim 4, wherein the step of calculating, in S400, the contact shadows and ambient occlusion that virtual elements cast on scene objects in the 3D animation comprises the following steps: S401, taking the position of a virtual element pixel P_v in the world position texture as the starting point, emitting a ray toward the light source direction in the scene, setting sampling points along the ray, extracting the depth value D(P_sample) of each sampling point in the scene, and extracting the depth value D_scene at the corresponding pixel position of the scene depth texture; when D(P_sample) ≥ D_scene, judging that the virtual element pixel P_v does not belong to the contact shadow area; S402, when the ray passes through a scene object, measuring the ray distance to the contact point between the ray and the scene object, setting a maximum effective distance according to the requirements of the virtual element, and calculating the shadow intensity in the contact shadow area by the formula S_contact = (1 − min(G/G_max, 1))^n; in the formula, S_contact represents the shadow intensity in the contact shadow area, G represents the ray distance to the contact point with the scene object, G_max represents the maximum effective distance, and n represents the shadow transition coefficient; S403, taking the same virtual element pixel P_v as the center, setting sampling directions within the normal hemisphere of P_v, judging for each direction whether the virtual element pixel is occluded by a scene object using the same depth comparison as in S401, counting the number of sampling directions occluded by scene objects, and calculating the ratio of occluded sampling directions to all sampling directions as the ambient occlusion factor AO(P) (a shadow and occlusion sketch follows the claims).
- 6. The virtual element synthesis control method applied to 3D animation according to claim 5, wherein the specific steps of performing, in S500, the physically based volumetric blending to output the final color of the picture are as follows: S501, calculating the self-luminous color of the virtual element from the outgoing light intensity, the contact shadow and the ambient occlusion, by the formula C_vol = L_out · (1 − S_contact) · (1 − AO(P)); in the formula, C_vol represents the self-luminous color of the virtual element; S502, extracting the background color and the space occupancy mask predicted by the mask prediction model, and performing physically based volumetric blending to output the final color of the picture by the formula C_final(p) = M_occ(p)·C_vol + (1 − M_occ(p))·C_bg(p); in the formula, C_final(p) represents the final color of the picture and C_bg(p) represents the background color (a compositing sketch follows the claims).
- 7. The virtual element synthesis control method applied to 3D animation according to claim 6, wherein the specific steps of outputting, in S600, the 3D animation after virtual element synthesis are as follows: S601, analyzing each frame of the 3D animation in time order, extracting the position of each pixel in the previous frame and in the current frame, obtaining the scene motion vector from the pixel position difference between adjacent frames, and re-projecting the final picture color output by the previous frame onto the current frame using the scene motion vector; S602, blending the final picture color calculated for the current frame with the re-projected final picture color by the formula C_final(p)_t = α·C_final(p)_t^raw + (1 − α)·C_final(p)_t^rep; in the formula, C_final(p)_t represents the blended final picture, C_final(p)_t^raw represents the final picture color calculated for the current frame, C_final(p)_t^rep represents the re-projected final picture color, and α represents the blending weight, set by the operator; S603, blending the final picture colors of every frame to obtain the blended final picture of each frame, and integrating them to output the 3D animation after virtual element synthesis (a temporal blending sketch follows the claims).
- 8. A virtual element synthesis control system applied to 3D animation, characterized by comprising a data acquisition module, a model training module, a light intensity calculation module, a shadow analysis module, a synthesis module and a temporal optimization module; the data acquisition module is used for configuring the shader in the render pass to output to more than one render target when the GPU renders a scene of the 3D animation; the model training module is used for constructing a mask prediction model using a convolutional neural network, taking the extracted picture data as input, and predicting through the mask prediction model the three-dimensional space occupancy mask generated when a virtual element interacts with a scene object; the light intensity calculation module is used for extracting incident light intensity and illumination direction data from the picture data, constructing a learning network using a neural network algorithm, and calculating the outgoing light intensity corresponding to the incident light intensity in the picture; the shadow analysis module is used for calculating the contact shadows and ambient occlusion that virtual elements cast on scene objects in the 3D animation picture; the synthesis module is used for calculating the self-luminous color of the virtual element from the outgoing light intensity, the contact shadow and the ambient occlusion, extracting the background color and the space occupancy mask predicted by the mask prediction model, and performing physically based volumetric blending to output the final color of the picture; the temporal optimization module is used for performing temporal consistency reinforcement after virtual element synthesis on each frame of the 3D animation and outputting the 3D animation after virtual element synthesis.
- 9. The virtual element synthesis control system applied to 3D animation according to claim 8, wherein the shadow analysis module comprises a shadow intensity unit and an ambient occlusion factor unit; the shadow intensity unit is used for measuring, when a ray passes through a scene object, the ray distance to the contact point between the ray and the scene object, with the maximum effective distance set according to the requirements of the virtual element, and for calculating the shadow intensity in the contact shadow area using the maximum effective distance; the ambient occlusion factor unit is used for calculating, as the ambient occlusion factor, the ratio of sampling directions judged to be occluded by scene objects to all sampling directions.
- 10. The virtual element synthesis control system applied to 3D animation according to claim 8, wherein the synthesis module comprises a self-luminous color unit and a synthesis unit; the self-luminous color unit is used for calculating the self-luminous color of the virtual element from the outgoing light intensity, the contact shadow and the ambient occlusion; the synthesis unit is used for extracting the background color and the space occupancy mask predicted by the mask prediction model and performing physically based volumetric blending to output the final color of the picture.
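The sketches below illustrate the steps referenced in the claims; they are minimal, hedged interpretations, not the patented implementation. First, the seven-value feature packing of claims 1 and 2, assuming the world position, world normal and depth textures have already been read back from the G-Buffer as arrays (all function and variable names here are hypothetical):

```python
import numpy as np

def pack_gbuffer_features(world_pos, world_normal, depth):
    """Stack G-Buffer channels into the 7-channel feature tensor of claim 1.

    world_pos    : (H, W, 3) world-space x, y, z per pixel
    world_normal : (H, W, 3) unit surface normal (N_x, N_y, N_z) per pixel
    depth        : (H, W)    depth value D per pixel
    Returns a (7, H, W) tensor: 3 position + 3 normal + 1 depth values.
    """
    features = np.concatenate(
        [world_pos, world_normal, depth[..., None]], axis=-1
    )
    return features.transpose(2, 0, 1).astype(np.float32)

# Example with dummy 4x4 textures
h, w = 4, 4
feat = pack_gbuffer_features(
    np.random.rand(h, w, 3), np.random.rand(h, w, 3), np.random.rand(h, w)
)
print(feat.shape)  # (7, 4, 4)
```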
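A minimal sketch of the lightweight encoder-decoder mask predictor of claim 3. The claims fix only the 7-channel input, a downsampling encoder, an upsampling decoder and a single-channel probability output in [0,1]; the layer widths, kernel sizes and the 0.5 threshold below are assumptions:

```python
import torch
import torch.nn as nn

class MaskPredictor(nn.Module):
    """Lightweight encoder-decoder: 7-channel G-Buffer tensor in,
    per-pixel occupancy probability M_occ(P) in [0, 1] out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # downsample, extract geometry
            nn.Conv2d(7, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(          # upsample back to screen res
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
            nn.Sigmoid(),                      # probabilities in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = MaskPredictor()
gbuffer = torch.rand(1, 7, 256, 256)           # dummy packed G-Buffer batch
m_occ = model(gbuffer)                         # (1, 1, 256, 256)
occluded = m_occ > 0.5                         # preset probability threshold
```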
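A direct transcription of the claim 4 formulas, T(t) = exp(-(β_a + β_s)·t) and L_out = L_in·T(k). The trained fully connected network of S302 is not reproduced; the closed-form attenuation stands in for it, and all coefficient values are illustrative:

```python
import numpy as np

def transmittance(beta_a, beta_s, t):
    """Beer-Lambert style transmittance T(t) = exp(-(beta_a + beta_s) * t)."""
    return np.exp(-(beta_a + beta_s) * t)

def outgoing_intensity(l_in, beta_a, beta_s, d_enter, d_exit):
    """L_out = L_in * T(k), with the travel distance k taken from the
    depth difference between where light enters and exits the element."""
    k = np.abs(d_exit - d_enter)          # light travel distance from depths
    return l_in * transmittance(beta_a, beta_s, k)

# Illustrative values: absorption 0.3, scattering 0.5, depths from the depth texture
print(outgoing_intensity(l_in=1.0, beta_a=0.3, beta_s=0.5, d_enter=2.0, d_exit=2.8))
```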
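A CPU-side sketch of the claim 5 contact shadow falloff and ambient occlusion factor. The screen-space ray march is abstracted away: the functions take precomputed hit distances and sampled depths, and S_contact = (1 − G/G_max)^n is the reconstruction given with the claim:

```python
import numpy as np

def contact_shadow(ray_hit_dist, g_max, n):
    """Shadow intensity S_contact: full at zero contact distance,
    fading to zero as the ray distance G approaches G_max."""
    g = np.clip(ray_hit_dist / g_max, 0.0, 1.0)
    return (1.0 - g) ** n

def ambient_occlusion(sample_depths, scene_depths):
    """AO(P): fraction of hemisphere sample directions whose marched depth
    indicates occlusion by scene geometry (same comparison as S401)."""
    occluded = sample_depths < scene_depths   # D(P_sample) < D_scene = blocked
    return np.count_nonzero(occluded) / len(sample_depths)

print(contact_shadow(ray_hit_dist=0.2, g_max=1.0, n=2.0))   # near contact: dark
print(ambient_occlusion(np.array([1.0, 2.5, 0.8]), np.array([2.0, 2.0, 2.0])))
```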
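A sketch of the claim 6 compositing. C_final = M_occ·C_vol + (1 − M_occ)·C_bg follows the claim's variable list; the self-luminous term is one plausible reading of S501, not a verbatim formula from the source:

```python
import numpy as np

def self_luminous_color(l_out, s_contact, ao):
    """One plausible reading of S501: outgoing intensity attenuated by
    contact shadow and ambient occlusion (reconstruction, not verbatim)."""
    return l_out * (1.0 - s_contact) * (1.0 - ao)

def composite(m_occ, c_vol, c_bg):
    """C_final(p) = M_occ(p) * C_vol + (1 - M_occ(p)) * C_bg(p), per pixel."""
    m = m_occ[..., None]                     # broadcast mask over RGB channels
    return m * c_vol + (1.0 - m) * c_bg

h, w = 2, 2
c_vol = self_luminous_color(l_out=0.8, s_contact=0.3, ao=0.2) * np.ones(3)
c_final = composite(np.full((h, w), 0.7), c_vol, np.random.rand(h, w, 3))
print(c_final.shape)  # (2, 2, 3)
```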
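A sketch of the claim 7 temporal pass: nearest-pixel reprojection along scene motion vectors, followed by the α-blend of S602. The motion vectors and the α value below are placeholders:

```python
import numpy as np

def reproject(prev_frame, motion_vectors):
    """Re-project the previous frame's final colors onto the current frame
    by shifting each pixel along its scene motion vector (nearest-pixel)."""
    h, w, _ = prev_frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys - motion_vectors[..., 1].round().astype(int), 0, h - 1)
    src_x = np.clip(xs - motion_vectors[..., 0].round().astype(int), 0, w - 1)
    return prev_frame[src_y, src_x]

def temporal_blend(c_raw, c_rep, alpha):
    """C_final(p)_t = alpha * C_raw + (1 - alpha) * C_rep  (S602)."""
    return alpha * c_raw + (1.0 - alpha) * c_rep

h, w = 4, 4
prev = np.random.rand(h, w, 3)
mv = np.zeros((h, w, 2))                     # dummy motion vectors
blended = temporal_blend(np.random.rand(h, w, 3), reproject(prev, mv), alpha=0.8)
print(blended.shape)  # (4, 4, 3)
```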
Description
Virtual element synthesis control system and method applied to 3D animation

Technical Field

The invention relates to the technical field of image processing, in particular to a virtual element synthesis control system and method applied to 3D animation.

Background

The 1950 film "Dream Island" is considered the world's first 3D animated film, marking the initial formation of 3D animation technology. In 1962, Ivan Sutherland, the father of computer graphics, developed the Sketchpad system, first used for interactive drawing with a light pen. In the early 1970s, three-dimensional computer-assisted animation systems, such as the system at Ohio State University, gradually emerged and were capable of rendering shaded images. In the 1980s, as computer graphics theory deepened, three-dimensional object modeling advanced, and keyframe parameter interpolation methods, kinematics algorithms and dynamics algorithms for animation generation appeared. In 1989, James Cameron's "The Abyss" featured the first liquid CG creature. In 1995, Pixar's "Toy Story", regarded as the first full-CG feature film, announced the arrival of the 3D animation era. Computer graphics provides the basis for virtual element synthesis in 3D animation, including 3D modeling, illumination computation, shadow processing, and material and texture rendering. For example, deep-learning-based texture synthesis can generate realistic natural surface textures, and improved illumination calculation and shading algorithms enhance the visual effect of animation. Modern engines and tools such as Maya, Unreal Engine and Unity are continuously optimized, introducing technologies such as ray tracing and real-time physical simulation, so that the fidelity and rendering speed of animation effects are remarkably improved, providing stronger tools for virtual element synthesis. However, in current 3D animation virtual element composition, traditional green-screen matting and manual rotoscoping still cannot perfectly handle the interaction of semitransparent, reflective and refractive elements (such as smoke) with the scene. The virtual element looks pasted onto the scene without real optical fusion, and its illumination, shadows and reflections must be matched manually, frame by frame, to the illumination environment of the scene, a process that is complex and error-prone.

Disclosure of Invention

The invention aims to provide a virtual element synthesis control system and method applied to 3D animation to solve the problems in the prior art.
In order to achieve the above purpose, the present invention provides the following technical solution: a virtual element synthesis control method applied to 3D animation, the method comprising the following steps: S100, when the GPU renders a scene picture of the 3D animation, configuring the shader in the render pass to output to more than one render target. Further, the specific steps of extracting the picture data produced during rendering and storing it on the GPU are as follows: S101, when virtual elements are synthesized, extracting a scene picture of the 3D animation and rendering it, configuring the shader in the render pass to output to more than one render target, and extracting the picture data produced during rendering, wherein the picture data comprises world position textures, world normal textures, depth textures, roughness textures and HDR environment illumination information; the world position texture stores the 3D coordinates (x, y, z) of each pixel in world space, the world normal texture stores the surface unit normal vector (N_x, N_y, N_z) corresponding to each pixel, the depth texture stores the depth value D of each pixel, the roughness texture stores the material attributes of the object surface, and the HDR environment illumination information comprises the global illumination of the scene, environment cube maps and spherical harmonic illumination coefficients; all picture data is stored on the GPU to form the geometry buffer G-Buffer. By outputting the shader to multiple render targets, rich picture data can be acquired synchronously, avoiding the data loss of a single output and providing a basis for subsequent multi-dimensional computation. Storing key data such as world position, normal and depth on the GPU as the geometry buffer G-Buffer lets the GPU's parallel processing capacity greatly improve data reading and subsequent computation efficiency; meanwhile, the integrated G-Buffer allows subsequent models and algorithms to call the multidimensional data directly, reducing data transmission loss. S200, constructing a mask prediction model by utilizing a convolutional neural network, and