CN-122023645-A - Random data augmentation method, device and storage medium based on three-dimensional spatial structure modeling
Abstract
The invention relates to the technical field of computer vision and three-dimensional modeling, and discloses a random data augmentation method, device, and storage medium based on three-dimensional spatial structure modeling. The method comprises: modeling the three-dimensional geometry and appearance of a target object, establishing a scene space model, and defining the intrinsic matrix of a virtual camera; generating a camera position and camera orientation that satisfy a view-frustum constraint; projecting the target object and the scene space model onto a two-dimensional image plane based on the camera's intrinsic matrix and a constructed extrinsic matrix to generate a two-dimensional base image; and generating final training samples from the two-dimensional base image. By modeling the target object's three-dimensional geometry and appearance, projecting it onto the image plane with the intrinsic and extrinsic matrices, and rendering with an illumination model, the method overcomes the inability of conventional two-dimensional augmentation to simulate viewpoint changes and illumination interaction.
Inventors
- Fan Rizhao
- Cui Zongshuai
- Zhang Yue
- Cheng Jian
- Wang Guangfu
- Wang Hao
- Lei Chao
- Li Yuxin
Assignees
- 天地科技股份有限公司北京煤炭共性技术研究分公司
- 煤炭科学研究总院有限公司
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-12-31
Claims (10)
- 1. A random data augmentation method based on three-dimensional spatial structure modeling, characterized by comprising the following steps: S100, performing three-dimensional geometric and appearance modeling of a target object, establishing a scene space model, and defining an intrinsic matrix of a virtual camera; S200, randomly setting the position and pose of the target object in the world coordinate system of the scene space model, configuring virtual light source parameters, and generating a camera position and camera orientation that satisfy a view-frustum constraint; S300, based on the intrinsic matrix of the virtual camera and an extrinsic matrix constructed from the camera position and camera orientation, projecting the posed target object and the scene space model onto a two-dimensional image plane, resolving the spatial occlusion relationship between the target object and the scene space model with a depth-buffer technique, and rendering with the virtual light source parameters and an illumination model to generate a physically consistent two-dimensional base image; S400, superimposing sensor noise, two-dimensional region occlusion, illumination variation, and imaging-artifact simulation on the two-dimensional base image to generate final training samples.
- 2. The method according to claim 1, wherein in step S100, establishing the scene space model comprises: obtaining three-dimensional data of the target class to be augmented and representing it as a mesh model comprising a vertex set, a face set, and a texture map, wherein the texture map represents the color distribution and reflectance characteristics of the target object's surface; establishing a virtual background environment model for placing the target object, wherein the virtual background environment model may adopt various three-dimensional representations, including but not limited to a geometric mesh model or a 3D Gaussian splatting set; when represented as a 3D Gaussian splatting set, each Gaussian primitive in the set comprises a position mean, a covariance matrix, an opacity, and spherical harmonic coefficients.
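The Gaussian primitive of claim 2 can be sketched as a small data structure. This is a minimal illustration only: the exact field shapes (a degree-3 spherical-harmonic basis with 16 RGB coefficient triples, a full 3x3 covariance) are assumptions common in 3D Gaussian splatting implementations, not specified by the claim.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianPrimitive:
    """One primitive of a 3D Gaussian splatting set (illustrative shapes)."""

    mean: np.ndarray        # (3,) position mean in world coordinates
    covariance: np.ndarray  # (3, 3) symmetric positive-definite covariance
    opacity: float          # alpha in [0, 1]
    sh_coeffs: np.ndarray   # (16, 3) spherical harmonic coefficients (RGB)

    def density(self, x: np.ndarray) -> float:
        """Unnormalized Gaussian density at world point x (1.0 at the mean)."""
        d = x - self.mean
        return float(np.exp(-0.5 * d @ np.linalg.inv(self.covariance) @ d))


g = GaussianPrimitive(
    mean=np.zeros(3),
    covariance=np.eye(3) * 0.01,
    opacity=0.9,
    sh_coeffs=np.zeros((16, 3)),
)
```

In real splatting renderers the covariance is usually stored factored as a rotation and per-axis scales to keep it positive definite during optimization; the direct matrix form above is kept for brevity.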
- 3. The method according to claim 1, wherein in step S200, randomly setting the position and pose of the target object in the world coordinate system comprises: randomly sampling a translation vector within a preset three-dimensional scene bounding box, and randomly sampling a rotation matrix; applying a rigid transformation, defined by the rotation matrix and the translation vector, to the vertex coordinates of the target object in its local coordinate system to obtain its vertex coordinates in the world coordinate system; and performing collision detection with an axis-aligned bounding box or an oriented bounding box of the target object, resampling the translation vector and the rotation matrix whenever volume overlap between the target object and the scene space model is detected, until the non-penetration physical constraint is satisfied.
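The sample-transform-test loop of claim 3 can be sketched as follows, using axis-aligned bounding boxes for the overlap test. The scene bounds, obstacle box, and retry limit are illustrative assumptions.

```python
import numpy as np


def random_rotation(rng):
    """Random rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs for a unique decomposition
    if np.linalg.det(q) < 0:     # ensure a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


def aabb(points):
    """Axis-aligned bounding box of a point set."""
    return points.min(axis=0), points.max(axis=0)


def overlaps(a_min, a_max, b_min, b_max):
    """Two AABBs overlap iff their intervals overlap on every axis."""
    return bool(np.all(a_min <= b_max) and np.all(b_min <= a_max))


def place_object(verts_local, scene_min, scene_max, obstacle, rng,
                 max_tries=100):
    """Resample (R, t) until the posed object's AABB clears the obstacle."""
    for _ in range(max_tries):
        R = random_rotation(rng)
        t = rng.uniform(scene_min, scene_max)
        verts_world = verts_local @ R.T + t  # rigid transform of all vertices
        if not overlaps(*aabb(verts_world), *obstacle):
            return verts_world, R, t
    raise RuntimeError("no collision-free pose found within max_tries")


# Unit cube placed in a region well clear of an obstacle box at the origin
rng = np.random.default_rng(0)
cube = np.array([[x, y, z] for x in (-0.5, 0.5)
                 for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
obstacle = (np.full(3, -1.0), np.full(3, 1.0))
verts_world, R, t = place_object(cube, np.full(3, 2.0), np.full(3, 5.0),
                                 obstacle, rng)
```

An oriented bounding box test, as the claim also permits, would be tighter but needs a separating-axis test instead of the per-axis interval check.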
- 4. The method according to claim 1, wherein in step S200, generating a camera position and camera orientation satisfying the view-frustum constraint comprises: randomly sampling the camera position within a preset space centered on the center of the target object, and computing the camera orientation with a gaze-point strategy to obtain the extrinsic matrix of the virtual camera; transforming key points of the target object from the world coordinate system to the camera coordinate system and projecting them onto the two-dimensional image plane to compute pixel coordinates and depth values; and judging the current camera position and camera orientation valid if the pixel coordinates lie within the image width and image height of the two-dimensional image plane and the depth values lie between the near and far clipping plane distances of the virtual camera.
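Claim 4's gaze-point construction and validity test can be sketched as below. The intrinsics (500 px focal length, principal point at the image center) and clipping distances are illustrative assumptions; the camera convention is z forward, y down, as in OpenCV.

```python
import numpy as np


def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Gaze-point strategy: world-to-camera extrinsics (R, t) with the camera
    z axis pointing from eye toward target. `up` must not be parallel to the
    viewing direction."""
    fwd = target - eye
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right = right / np.linalg.norm(right)
    down = np.cross(fwd, right)       # camera y axis (image "down")
    R = np.stack([right, down, fwd])  # rows are the camera axes
    return R, -R @ eye


def in_frustum(p_world, K, R, t, width, height, z_near, z_far):
    """Validity test of claim 4: pixel inside the image, depth within the
    near/far clipping-plane range."""
    p_cam = R @ p_world + t
    z = p_cam[2]
    if not (z_near < z < z_far):
        return False
    u, v, w = K @ p_cam
    return 0.0 <= u / w < width and 0.0 <= v / w < height


K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])  # assumed example intrinsic matrix
R, t = look_at(np.array([-2.0, 0.0, 0.0]), np.zeros(3))
```

With the camera 2 units from the origin, the world origin projects to the principal point at depth 2, so it passes with a far plane of 10 and fails with a far plane of 1.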
- 5. The method according to claim 1, wherein in step S300, projecting the posed target object and the scene space model onto the two-dimensional image plane comprises: transforming vertex coordinates from the world coordinate system into the camera coordinate system using the extrinsic matrix of the virtual camera to obtain camera-space coordinates containing depth values, wherein a depth value is the perpendicular distance to the camera plane along the direction of the camera optical axis; and applying perspective projection and perspective division to the camera-space coordinates using the intrinsic matrix of the virtual camera to generate pixel coordinates on the two-dimensional image plane.
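The two-stage projection of claim 5 is a standard pinhole model; a minimal vectorized sketch (the example intrinsics are an assumption):

```python
import numpy as np


def project_points(verts_world, K, R, t):
    """World -> camera -> pixel coordinates, per claim 5.

    The returned depth is the camera-space z coordinate, i.e. the distance
    along the optical axis, not the Euclidean distance to the camera center.
    """
    cam = verts_world @ R.T + t     # extrinsic transform into camera space
    uvw = cam @ K.T                 # perspective projection via intrinsics
    pix = uvw[:, :2] / uvw[:, 2:3]  # perspective division
    return pix, cam[:, 2]


K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])  # assumed example intrinsic matrix
pix, depth = project_points(np.array([[0.0, 0.0, 2.0],
                                      [0.5, 0.0, 2.0]]),
                            K, np.eye(3), np.zeros(3))
```

A point on the optical axis lands on the principal point (320, 240); a point offset 0.5 units at depth 2 shifts by 500 * 0.5 / 2 = 125 pixels.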
- 6. The method according to claim 1, wherein in step S300, resolving the spatial occlusion relationship between the target object and the scene space model with a depth-buffer technique comprises: rasterizing the geometric primitives projected onto the two-dimensional image plane into pixel fragments, and computing each fragment's depth value by barycentric coordinate interpolation; initializing a depth buffer, and setting the depth values at all pixel positions in the depth buffer to a preset maximum depth value; performing a depth test on each fragment by comparing the fragment's depth value with the depth value stored in the depth buffer; and if the fragment's depth value is smaller than the stored depth value, judging the fragment visible and updating the depth buffer and the color buffer, and if the fragment's depth value is greater than or equal to the stored depth value, discarding the fragment.
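The depth test of claim 6 can be sketched on pre-rasterized fragments (rasterization and barycentric interpolation are omitted here; each fragment is assumed already given as pixel coordinates, a depth, and a color):

```python
import numpy as np


def composite(fragments, width, height):
    """Depth-buffer compositing: a fragment survives only if it is nearer
    than the depth already stored at its pixel."""
    depth = np.full((height, width), np.inf)  # preset maximum depth value
    color = np.zeros((height, width, 3))
    for x, y, z, rgb in fragments:
        if z < depth[y, x]:   # depth test passes: fragment is visible
            depth[y, x] = z
            color[y, x] = rgb
        # z >= stored depth: fragment is occluded and discarded
    return color, depth


# Three fragments landing on the same pixel; the nearest (z = 2.0) wins
frags = [(1, 1, 5.0, (1.0, 0.0, 0.0)),
         (1, 1, 2.0, (0.0, 1.0, 0.0)),
         (1, 1, 3.0, (0.0, 0.0, 1.0))]
color, depth = composite(frags, 4, 4)
```

Note the result is order-independent: whichever order the three fragments arrive in, the nearest one ends up in the color buffer, which is exactly why z-buffering resolves occlusion without sorting primitives.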
- 7. The method according to claim 6, wherein rendering with the virtual light source parameters and the illumination model to generate the physically consistent two-dimensional base image comprises: for each pixel fragment passing the depth test, sampling the texture color of the target object's surface and interpolating a unit normal vector; constructing, from the virtual light source parameters, a line-of-sight direction vector, a light incidence direction vector, and a light reflection direction vector for the fragment; computing, under a local illumination model, an ambient component, a diffuse component, and a specular component from the texture color, the unit normal vector, the line-of-sight direction vector, the light incidence direction vector, and the light reflection direction vector; and accumulating the ambient, diffuse, and specular components to synthesize the final color value of the pixel in the two-dimensional base image.
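The ambient/diffuse/specular decomposition of claim 7 matches the classic Phong local illumination model; a per-fragment sketch (the reflectance coefficients ka, kd, ks and shininess are illustrative assumptions, not values from the claim):

```python
import numpy as np


def _unit(v):
    return v / np.linalg.norm(v)


def phong_shade(tex_color, normal, point, eye, light_pos, light_color,
                ka=0.1, kd=0.7, ks=0.2, shininess=32.0):
    """Phong-style local illumination: ambient + diffuse + specular."""
    n = _unit(normal)
    l = _unit(light_pos - point)  # light incidence direction
    v = _unit(eye - point)        # line-of-sight direction
    r = 2.0 * (n @ l) * n - l     # light reflection direction (unit if n, l are)
    ambient = ka * tex_color
    diffuse = kd * max(n @ l, 0.0) * tex_color * light_color
    specular = ks * max(r @ v, 0.0) ** shininess * light_color
    return np.clip(ambient + diffuse + specular, 0.0, 1.0)


white = np.ones(3)
# Light and eye directly above the surface point: all three terms contribute
head_on = phong_shade(white, np.array([0.0, 0.0, 1.0]), np.zeros(3),
                      np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]),
                      white)
# Light grazing along the surface: only the ambient term survives
grazing = phong_shade(white, np.array([0.0, 0.0, 1.0]), np.zeros(3),
                      np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]),
                      white)
```

The max(..., 0.0) clamps are what keep surfaces facing away from the light from receiving negative diffuse or specular energy.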
- 8. The method according to claim 1, wherein in step S400, superimposing sensor noise, two-dimensional region occlusion, illumination variation, and imaging-artifact simulation on the two-dimensional base image comprises: injecting Gaussian-distributed random noise into the pixel values of the two-dimensional base image with an additive white Gaussian noise model, and truncating the values to generate noisy image data; randomly generating a rectangular mask region on the two-dimensional image plane of the noisy image data, and replacing the pixel values falling within the rectangular mask region with a preset fill value or random noise to generate occluded image data; and applying a convolution kernel to the occluded image data to generate a blurred image, then applying a nonlinear transformation combining a gamma correction coefficient and a linear gain coefficient to the blurred image to complete the imaging-artifact simulation and illumination-variation superposition.
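The degradation chain of claim 8 can be sketched end to end on a float image in [0, 1]. The noise level, mask size, 3x3 box kernel, gamma, and gain are illustrative assumptions; the claim does not fix them.

```python
import numpy as np


def degrade(img, rng, sigma=0.02, gamma=1.1, gain=0.9):
    """Claim 8 pipeline: additive Gaussian noise -> random rectangular
    occlusion -> 3x3 box-blur convolution -> gamma/gain transform."""
    # 1. additive white Gaussian noise with value truncation to [0, 1]
    out = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
    # 2. random rectangular mask region, filled here with random noise
    h, w = out.shape[:2]
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    y1, x1 = y0 + h // 4, x0 + w // 4
    out[y0:y1, x0:x1] = rng.random(out[y0:y1, x0:x1].shape)
    # 3. 3x3 box-blur convolution (edge padding replicates border pixels)
    pad = np.pad(out, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    # 4. nonlinear gamma correction combined with a linear gain
    return np.clip(gain * out ** gamma, 0.0, 1.0)


sample = degrade(np.full((16, 16, 3), 0.5), np.random.default_rng(1))
```

Applying the stages in this order mirrors a real imaging chain: sensor noise and occlusion act on scene radiance, while blur, gain, and gamma model the optics and readout.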
- 9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method steps of any one of claims 1 to 8 when executing the computer program.
- 10. A computer-readable storage medium, characterized in that the storage medium stores computer instructions which, when executed by a processor, cause a computer device to perform the method steps of any one of claims 1 to 8.
Description
Random data augmentation method, device and storage medium based on three-dimensional spatial structure modeling

Technical Field
The invention relates to the technical field of computer vision and three-dimensional modeling, and in particular to a random data augmentation method, device, and storage medium based on three-dimensional spatial structure modeling.

Background
With the application of deep learning in computer vision, the training effect of deep neural network models depends on the quantity and quality of training samples. In practical applications, obtaining annotated data that covers diverse object appearances, illumination conditions, and complex background environments typically involves high acquisition cost and long annotation cycles. To expand training datasets and improve model generalization, data augmentation is widely applied in the pre-processing stage of model training, supplementing scarce real data with algorithmically generated samples. Existing augmentation techniques operate mainly in the two-dimensional image domain. Conventional image-processing methods apply geometric transformations such as rotation, translation, scaling, and flipping to the original two-dimensional image, adjust brightness, contrast, and saturation to simulate illumination changes, or add noise and blur to simulate image degradation. In addition, cut-and-paste augmentation increases the diversity of object-background combinations by segmenting a two-dimensional patch of the target object from the original image and pasting it onto different background images, which enriches the semantic content of training samples to a certain extent.
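The conventional two-dimensional operations described above can be sketched in a few lines; the parameter ranges (brightness factor, noise level) are illustrative assumptions:

```python
import numpy as np


def augment_2d(img, rng):
    """Conventional 2D augmentation: flip, brightness scaling, additive noise."""
    out = img[:, ::-1].copy()                                  # horizontal flip
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)       # brightness
    out = np.clip(out + rng.normal(0.0, 0.01, out.shape), 0.0, 1.0)  # noise
    return out


# A toy image whose left half is bright; flipping moves the bright half right
img = np.zeros((8, 8, 3))
img[:, :4] = 1.0
aug = augment_2d(img, np.random.default_rng(0))
```

As the following paragraphs argue, none of these operations changes the underlying viewpoint or lighting geometry, which is the gap the invention targets.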
However, augmentation at the two-dimensional image level cannot faithfully simulate real three-dimensional spatial transformations and physical illumination interaction. First, simple geometric transformations cannot generate the perspective deformation of a target object seen from different viewpoints, and simple brightness adjustment cannot reflect how changes in the light-source position affect the shading and shadows on the object's surface, so the generated samples lack geometric and shading consistency. Second, lacking depth information and physical constraints, cut-and-paste methods readily produce composites in which the scale between the target object and the background is unbalanced, the object floats in space or overlaps others in physically implausible ways, and inter-object occlusion cannot be handled correctly. In addition, conventional background fusion struggles to preserve the environment's high-frequency texture detail and parallax effects, and simple noise injection cannot simulate the complex sensor degradation and environmental interference of a real imaging pipeline. How to guarantee correct geometric perspective, illumination distribution, and physical spatial logic during augmentation, and to generate training samples with realistic texture and imaging characteristics, is therefore a technical problem to be solved in this field.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a random data augmentation method, device, and storage medium based on three-dimensional spatial structure modeling, which solve the problems that existing two-dimensional augmentation techniques cannot simulate the geometric perspective changes and illumination interaction of a target object in three-dimensional space, and cannot guarantee correct physical spatial logic and occlusion relationships between the target object and the background environment, so that the generated training samples lack realism and physical consistency. To solve these problems, the invention provides the following technical solution: a random data augmentation method based on three-dimensional spatial structure modeling, comprising the following steps: S100, performing three-dimensional geometric and appearance modeling of a target object, establishing a scene space model, and defining an intrinsic matrix of a virtual camera; S200, randomly setting the position and pose of the target object in the world coordinate system of the scene space model, configuring virtual light source parameters, and generating a camera position and camera orientation that satisfy a view-frustum constraint; S300, based on an intrinsic matrix