CN-122023671-A - Multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction

CN 122023671 A

Abstract

The invention relates to a multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction. The method performs three-dimensional reconstruction and semantic labeling on highly overlapping multi-view input images to obtain a labeled three-dimensional model carrying semantic labels; renders the labeled model under the view angle of every input image by a joint rendering technique, generating rendered depth maps and patch index maps of the labeled model; detects occluded regions from depth differences; and computes a pixel-level visibility confidence for every pixel. The semantic labels on the labeled model are then dynamically mapped into the two-dimensional images according to the occlusion detection results and the pixel-level visibility confidences. Accurate mapping from three-dimensional semantics to two-dimensional annotation is thereby achieved, targets are guaranteed to receive the same semantic label under any view angle, and consistent annotation with full view-angle coverage of each target is realized.

Inventors

  • CHEN HAIZHEN
  • WU YAWEN
  • NIE QIAN
  • ZHU MENGYUAN
  • YUAN ZHENG
  • DONG YUE
  • ZHAO SAISHUAI
  • LUO BOJIN

Assignees

  • Ningbo Institute of Surveying, Mapping and Remote Sensing Technology (Ningbo Natural Resources and Planning Survey and Monitoring Center) [宁波市测绘和遥感技术研究院(宁波市自然资源和规划调查监测中心)]

Dates

Publication Date
2026-05-12
Application Date
2026-04-08

Claims (6)

  1. A multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction, characterized by comprising the following steps: Step 1, performing three-dimensional reconstruction on highly overlapping multi-view input images to generate an original three-dimensional model, and labeling semantic tags on the original three-dimensional model to obtain a labeled three-dimensional model with semantic tags, wherein the original three-dimensional model carries a plurality of triangular patches and each triangular patch carries the semantic tag of the object to which it belongs; Step 2, rendering the labeled three-dimensional model under the view angle of every input image among the highly overlapping multi-view input images by a joint rendering technique, and synchronously generating rendered depth maps and patch index maps of the labeled three-dimensional model, wherein the numbers of rendered depth maps and of patch index maps each equal the number of input images, and the patch index maps directly record the triangular patch ID corresponding to each pixel; Step 3, estimating a scene depth map for each input image by a multi-view stereo matching method, and subtracting, pixel by pixel under the same camera view, the rendered depth map of each input image from its scene depth map to obtain the depth differences used to detect occluded pixel regions in each input image; Step 4, computing a view-angle weight for each pixel, and computing the pixel-level visibility confidence of each pixel in each input image from a preset depth-difference attenuation factor and the view-angle weight of the corresponding pixel; and Step 5, dynamically mapping the semantic labels on the labeled three-dimensional model to the two-dimensional images according to the per-pixel depth differences and the pixel-level visibility confidences.
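The claim fixes the pipeline but not an implementation. As a minimal sketch of step 5 alone, assuming per-view arrays for the rendered depth, MVS scene depth, patch index map, per-patch labels, and per-pixel confidence (all names and threshold values here are illustrative, not taken from the patent):

```python
import numpy as np

def map_labels(depth_render, depth_scene, face_index, face_labels,
               confidence, depth_thresh=0.05, conf_thresh=0.5,
               background=-1):
    """Map per-patch semantic labels into a 2-D label image.

    Pixels whose rendered/scene depth disagreement exceeds depth_thresh
    are treated as occluded and left unlabeled, as are pixels whose
    visibility confidence falls below conf_thresh. Parameter names and
    threshold values are illustrative assumptions.
    """
    labels = np.full(depth_render.shape, background, dtype=np.int64)
    occluded = np.abs(depth_render - depth_scene) > depth_thresh
    valid = (~occluded) & (confidence >= conf_thresh) & (face_index >= 0)
    labels[valid] = face_labels[face_index[valid]]
    return labels
```

Pixels with no rendered patch (index −1), occluded pixels, and low-confidence pixels all fall back to the background label, matching the claim's "abandon label mapping" behavior.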
  2. The multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction according to claim 1, wherein in step 1, the three-dimensional reconstruction of the original three-dimensional model comprises the following steps: Step a1, computing the camera poses and a sparse point cloud of the highly overlapping multi-view input images by structure-from-motion; Step a2, computing a scene depth map for each input image by a multi-view stereo matching method on the basis of the obtained camera poses and sparse point cloud, and generating a dense point cloud of the highly overlapping multi-view input images from the scene depth maps of all input images; and Step a3, generating a three-dimensional model from the obtained dense point cloud by Poisson reconstruction, the generated model being the original three-dimensional model.
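Step a2 fuses per-view depth maps into a dense point cloud by back-projecting each pixel into world space. A sketch of that lifting step, under an assumed pinhole convention x_cam = R·X_world + t with depth measured along the camera z-axis (the patent does not fix conventions, so this is an assumption):

```python
import numpy as np

def backproject_depth(depth, K, R, t):
    """Lift a depth map to world-space points.

    Inverts the assumed pinhole model x_cam = R @ X_world + t:
    X_world = R^T (d * K^{-1} [u, v, 1]^T - t) for each pixel (u, v)
    with depth d. Returns an (N, 3) array of world points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # camera frame
    world = R.T @ (cam - t.reshape(3, 1))                 # world frame
    return world.T
```

Concatenating the outputs over all views (after filtering invalid depths) yields the dense point cloud handed to Poisson reconstruction in step a3.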
  3. The multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction according to claim 2, wherein in step 1, labeling semantic labels on the original three-dimensional model comprises: assigning a category semantic label to each triangular patch of the original three-dimensional model by human-computer-interactive visual interpretation or by an automatic labeling algorithm, thereby generating a three-dimensional model with semantic labels, which is taken as the labeled three-dimensional model and denoted M = (V, F, S), wherein V = {v_i ∈ R^3 | i = 1, ..., N_v} is the vertex set formed by the vertices of all triangular patches of the labeled model M, v_i is the i-th vertex, N_v is the total number of vertices, and R^3 denotes the positions of points on the three-dimensional model; F = {f_j | j = 1, ..., N_f} is the set of all triangular patches of M, f_j is the j-th triangular patch, and N_f is the total number of triangular patches; and S = {s_j ∈ Z | j = 1, ..., N_f} is the set of semantic labels of all triangular patches of M, s_j is the label of the j-th patch, and Z denotes the category space to which the labels s_j belong.
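The tuple M = (V, F, S) maps naturally onto a small data structure. The sketch below is illustrative only; the one invariant it enforces, one semantic label per triangular patch, is the one the claim requires:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LabeledMesh:
    """M = (V, F, S): vertices, triangular patches, per-patch labels."""
    vertices: np.ndarray   # (N_v, 3) float, point positions in R^3
    faces: np.ndarray      # (N_f, 3) int, vertex indices per triangle
    labels: np.ndarray     # (N_f,)  int, semantic category per triangle

    def __post_init__(self):
        # The claim requires exactly one semantic label per patch.
        assert self.faces.shape[0] == self.labels.shape[0], \
            "one semantic label per triangular patch"
```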
  4. The method according to claim 1, wherein in step 2, the joint rendering process comprises projecting each triangular patch of the labeled three-dimensional model onto the image plane by a rasterization technique and, for every pixel passing the z-buffer test, writing the depth value and the triangular patch ID of that patch into the corresponding pixel, wherein the joint rendering function is (D_k, J_k) = Φ(M, C_k), with C_k = (K_k, R_k, t_k), R_k ∈ SO(3), t_k ∈ R^3; Φ is the symbolic representation of the joint rendering function, whose outputs are the depth map D_k and the patch index map J_k; D_k is the rendered depth map, recording the depth value of each pixel; J_k is the patch index map, recording the triangular patch ID of each pixel; C_k is the camera parameter set of the k-th image; K_k is the camera intrinsic matrix, R_k the rotation matrix, and t_k the translation vector.
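A deliberately naive software version of the claimed joint rendering (z-buffer test, then writing depth and patch ID per pixel) can be sketched as follows. For simplicity each triangle here receives its mean camera-space depth rather than per-pixel interpolated depth, an assumption a real renderer would not make:

```python
import numpy as np

def joint_render(vertices, faces, K, R, t, h, w):
    """Rasterize a mesh into a depth map D_k and a patch index map J_k.

    Naive z-buffer rasterizer: no clipping, and each triangle is given
    its mean camera depth instead of perspective-correct interpolation.
    """
    depth = np.full((h, w), np.inf)
    index = np.full((h, w), -1, dtype=np.int64)
    cam = (R @ vertices.T + t.reshape(3, 1)).T      # camera frame
    proj = (K @ cam.T).T
    uv = proj[:, :2] / proj[:, 2:3]                 # pixel coordinates
    for fid, (a, b, c) in enumerate(faces):
        tri = uv[[a, b, c]]
        z = cam[[a, b, c], 2].mean()
        if z <= 0:                                  # behind the camera
            continue
        u0, v0 = np.floor(tri.min(axis=0)).astype(int)
        u1, v1 = np.ceil(tri.max(axis=0)).astype(int)
        for py in range(max(v0, 0), min(v1 + 1, h)):
            for px in range(max(u0, 0), min(u1 + 1, w)):
                if _inside(tri, px + 0.5, py + 0.5) and z < depth[py, px]:
                    depth[py, px] = z               # z-buffer test passed:
                    index[py, px] = fid             # write depth and patch ID
    return depth, index

def _inside(tri, x, y):
    """Point-in-triangle test via signs of 2-D edge cross products."""
    signs = []
    for i in range(3):
        x0, y0 = tri[i]
        x1, y1 = tri[(i + 1) % 3]
        signs.append((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)
```

The synchronized writes to `depth` and `index` are the "joint" part of the claim: both maps come out of a single rasterization pass.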
  5. The multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction according to claim 2, wherein in step 3, the detection of occluded pixel regions comprises the following steps: Step b1, computing, pixel by pixel under the same camera view angle, the depth difference between the rendered depth map and the scene depth map of each input image, wherein the scene depth of pixel p is denoted D_scene,k(p), the rendered depth of pixel p is denoted D_k(p), and the depth difference is denoted ΔD(p) = D_k(p) − D_scene,k(p); and Step b2, judging each pixel by its depth difference: when the depth difference exceeds a preset threshold, the pixel is judged to lie in an occluded region and label mapping for that pixel is abandoned; otherwise the pixel is judged to be unoccluded.
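Steps b1 and b2 reduce to a per-pixel subtraction and a threshold. A minimal sketch, with an illustrative threshold value (the patent leaves the threshold unspecified):

```python
import numpy as np

def occlusion_mask(depth_render, depth_scene, tau=0.05):
    """Steps b1/b2: per-pixel depth difference, then thresholding.

    Returns (delta_d, occluded): delta_d[p] = |D_k(p) - D_scene,k(p)|,
    and occluded[p] = True where the difference exceeds the preset
    threshold tau (value illustrative), i.e. where label mapping is
    abandoned for that pixel.
    """
    delta_d = np.abs(depth_render - depth_scene)
    return delta_d, delta_d > tau
```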
  6. The three-dimensional reconstruction-based multi-view image semantic two-dimensional mapping labeling method according to claim 5, wherein in step 4, the pixel-level visibility confidence is calculated as Conf(p) = ω_depth × max(0, n_j · v_p), wherein Conf(p) is the pixel-level visibility confidence of pixel p, ω_depth is the depth attenuation weight governed by the depth difference attenuation coefficient γ, max(0, n_j · v_p) is the view-angle weight, n_j is the unit normal vector of the triangular patch f_j in the three-dimensional model, and v_p is the ray direction vector from the camera center to pixel p.
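The claim leaves the exact form of ω_depth open, stating only that it is governed by the attenuation coefficient γ. The sketch below assumes an exponential attenuation ω_depth = exp(−γ·|ΔD(p)|), which is one common choice, not necessarily the patent's:

```python
import numpy as np

def visibility_confidence(delta_d, normals, view_dirs, gamma=10.0):
    """Conf(p) = w_depth * max(0, n_j . v_p).

    Assumes w_depth = exp(-gamma * |dD(p)|), an illustrative form.
    delta_d:   (N,)   per-pixel depth differences
    normals:   (N, 3) unit normal of the patch seen at each pixel
    view_dirs: (N, 3) unit ray directions from camera center to pixel
    """
    w_depth = np.exp(-gamma * np.abs(delta_d))           # depth attenuation
    cos = np.maximum(0.0, np.sum(normals * view_dirs, axis=-1))  # view weight
    return w_depth * cos
```

With this form, confidence decays smoothly as the rendered and scene depths disagree, and drops to zero for patches whose normal points away from the viewing ray.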

Description

Multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction

Technical Field

The invention relates to the field of surveying and mapping, and in particular to a multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction.

Background

In the surveying and mapping field, data labeling and processing are core foundational links of machine learning and AI model training. Mainstream AI models such as deep learning models rely on supervised learning: a mapping between input features and output targets must be established through annotated data so that the models can learn the patterns and rules in the data. The scale and quality of the labeled data directly determine model accuracy.

Existing data labeling methods mainly comprise manual frame-by-frame labeling, semi-automatic labeling tools, and multi-view joint labeling. Manual frame-by-frame annotation labels the target type and position in every frame of a video or continuous image sequence; it is inefficient and costly. Semi-automatic annotation generates preliminary annotations with a pre-trained model and corrects them manually, but still requires substantial manual intervention. Multi-view joint annotation cross-verifies annotation consistency through multi-camera data, but does not solve the annotation conflicts caused by view occlusion and deformation.

In general, existing data labeling methods share the following problem: the same object is labeled inconsistently across view angles due to occlusion, illumination, or deformation, making consistent labeling with full view-angle coverage of a target difficult to achieve.
Disclosure of Invention

The technical problem to be solved by the invention is to provide, against this prior art, a multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction. The method generates a three-dimensional model from multi-view images and, by mapping the semantic annotations of the three-dimensional model back into the two-dimensional images, realizes consistent labeling with full view-angle coverage of the target.

The technical scheme adopted to solve this problem is a multi-view image semantic two-dimensional mapping labeling method based on three-dimensional reconstruction, characterized by comprising the following steps: Step 1, performing three-dimensional reconstruction on highly overlapping multi-view input images to generate an original three-dimensional model, and labeling semantic tags on the original three-dimensional model to obtain a labeled three-dimensional model with semantic tags, wherein the original three-dimensional model carries a plurality of triangular patches and each triangular patch carries the semantic tag of the object to which it belongs; Step 2, rendering the labeled three-dimensional model under the view angle of every input image by a joint rendering technique, and synchronously generating rendered depth maps and patch index maps of the labeled model, the numbers of which each equal the number of input images, the patch index maps directly recording the triangular patch ID of each pixel; Step 3, estimating a scene depth map for each input image by a multi-view stereo matching method, and subtracting, pixel by pixel under the same camera view, the rendered depth map of each input image from its scene depth map to obtain the depth differences used to detect occluded pixel regions; Step 4, computing a view-angle weight for each pixel, and computing the pixel-level visibility confidence of each pixel in each input image from a preset depth-difference attenuation factor and the corresponding view-angle weight; and Step 5, dynamically mapping the semantic labels on the labeled three-dimensional model to the two-dimensional images according to the per-pixel depth differences and the pixel-level visibility confidences. In the three-dimensional reconstruction-based multi-view image semantic two-dimensional mapping labeling method, in step 1, the three-dimensional reconstruction process of the original three-dimensional mo