CN-122023746-A - Semantic decomposition method and system for three-dimensional components of ancient building

Abstract

The invention discloses a semantic decomposition method and system for three-dimensional components of an ancient building. The method comprises: generating projection images of multiple view angles based on an original three-dimensional model; projecting the triangular patches of the original three-dimensional model onto the projection images of all view angles; inputting the projection images of the multiple view angles respectively into a trained GAS segmentation network to obtain corresponding semantic masks; finding the pixels that fall inside each two-dimensional triangle and dividing them into different polygonal areas based on the semantic masks; dividing the different polygonal areas to generate multiple new triangles; reinforcing the original three-dimensional model based on the new triangles to obtain a reinforced three-dimensional model; and re-projecting the triangular patches of the reinforced three-dimensional model and applying majority voting to the semantic masks of the projection images of all view angles to obtain a final semantic label for each triangular patch. The invention achieves accurate and complete extraction of the slender components of the building.
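The final majority-voting step can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each view has already assigned one label to every triangular patch it sees (how a view derives that label from its semantic mask is not shown), and the function name `majority_vote`, the string labels, and the alphabetical tie-break are hypothetical choices made only for this illustration.

```python
from collections import Counter

def majority_vote(labels_per_view):
    """Combine per-view triangle labels into one label per triangle.

    labels_per_view: list of dicts, one per view, mapping a triangle id to the
    label that view assigned to it. Triangles not visible in a view are absent.
    """
    votes = {}
    for view_labels in labels_per_view:
        for tri_id, label in view_labels.items():
            votes.setdefault(tri_id, Counter())[label] += 1
    # Assumption: ties are broken by the alphabetically smallest label,
    # only to keep the result deterministic.
    return {
        tri_id: min(counter.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for tri_id, counter in votes.items()
    }

views = [
    {0: "roof", 1: "ridge"},   # view 1
    {0: "roof", 1: "roof"},    # view 2
    {1: "ridge"},              # view 3 does not see triangle 0
]
final_labels = majority_vote(views)
```

Here triangle 1 receives two votes for "ridge" and one for "roof", so a single dissenting view does not change its final label.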

Inventors

GONG YIPING
LIU SIJIN
DUAN HAIWANG
LIU YUHAN
XU BO

Assignees

Hubei University of Technology (湖北工业大学)

Dates

Publication Date: 20260512
Application Date: 20260414

Claims (10)

1. A method for semantic decomposition of three-dimensional components of an ancient building, the method comprising: acquiring an original three-dimensional model of an ancient building, and generating projection images of a plurality of view angles based on the original three-dimensional model; projecting the triangular patches of the original three-dimensional model onto the projection images of the respective view angles to obtain two-dimensional triangles on the projection images; inputting the projection images of the plurality of view angles respectively into a trained GAS segmentation network to obtain corresponding semantic masks; and finding the pixels falling inside each two-dimensional triangle, dividing those pixels into different polygonal areas based on the semantic masks, dividing the different polygonal areas to generate a plurality of new triangles, reinforcing the original three-dimensional model based on the plurality of new triangles to obtain a reinforced three-dimensional model, re-projecting the triangular patches of the reinforced three-dimensional model, and applying a majority voting method to the semantic masks of the projection images of all view angles to obtain a final semantic label for each triangular patch.
2. The method of claim 1, wherein nine view angles are defined for each original three-dimensional model, and the projection images of the nine view angles include a top view and eight oblique views inclined at a first preset angle.
3. The method for semantic decomposition of three-dimensional components of an ancient building according to claim 2, wherein projecting the triangular patches of the original three-dimensional model onto the projection images of the respective view angles comprises: defining a triangular patch of the original three-dimensional model, the normal vector of the triangular patch, and the normal vector of the projection image; and projecting the triangular patch onto the projection images of the respective view angles according to the projection rule that the angle between the normal vector of the triangular patch and the normal vector of the projection image is smaller than a second preset angle.
4. The method for semantic decomposition of three-dimensional components of an ancient building according to claim 1, wherein the GAS segmentation network comprises an input feature extraction module, an integrated encoder and a progressive decoder, and wherein the processing of the GAS segmentation network comprises: inputting the projection image into the input feature extraction module, and obtaining original features through convolution, max pooling and downsampling operations; inputting the original features into the integrated encoder, and extracting and fusing multi-scale features to obtain multi-scale context fusion features; and inputting the multi-scale context fusion features into the progressive decoder to obtain the corresponding semantic masks.
5. The method for semantic decomposition of three-dimensional components of an ancient building according to claim 4, wherein the integrated encoder comprises a plurality of deformable SE bottleneck structures and an ASPP module, and wherein the processing of the integrated encoder comprises: extracting features at multiple scales from the original features through the plurality of deformable SE bottleneck structures to obtain multi-scale features; and fusing the multi-scale features through the ASPP module to obtain the multi-scale context fusion features; wherein each deformable SE bottleneck structure comprises a plurality of convolution layers and an SE attention module, and the processing of the deformable SE bottleneck structure comprises: passing the original features sequentially through a CBR convolution layer, a dCBR convolution layer, a CB convolution layer and the SE attention module to obtain first features; passing the original features through a 2D convolution layer to obtain second features; and fusing the first features and the second features and passing the result through an activation function to obtain features of the corresponding scale.
6. The method of claim 5, wherein the CBR convolution layer comprises a 1×1 convolution, a batch normalization, and a ReLU activation function connected in sequence, the dCBR convolution layer comprises a 3×3 deformable convolution, a batch normalization, and a ReLU activation function, and the CB convolution layer comprises a 1×1 convolution and a batch normalization.
7. The method for semantic decomposition of three-dimensional components of an ancient building according to claim 3, wherein dividing the different polygonal areas to generate a plurality of new triangles and enhancing the original three-dimensional model based on the plurality of new triangles comprises: extracting boundary points from each polygonal area using the Alpha Shape algorithm, and ordering all boundary points clockwise to obtain a set of boundary points; and when the number of boundary points in the set is greater than 3, triangulating the polygonal area using the Bowyer-Watson algorithm to obtain a plurality of new triangles.
8. A system for semantic decomposition of three-dimensional components of an ancient building, comprising: an acquisition module, configured to acquire an original three-dimensional model of an ancient building and to generate projection images of a plurality of view angles based on the original three-dimensional model; a projection module, configured to project the triangular patches of the original three-dimensional model onto the projection images of the respective view angles to obtain two-dimensional triangles on the projection images; a semantic segmentation module, configured to input the projection images of the plurality of view angles respectively into a trained GAS segmentation network to obtain corresponding semantic masks; and an enhancement and updating module, configured to find the pixels falling inside each two-dimensional triangle, divide those pixels into different polygonal areas based on the semantic masks, divide the different polygonal areas to generate a plurality of new triangles, enhance the original three-dimensional model based on the plurality of new triangles to obtain an enhanced three-dimensional model, re-project the triangular patches of the enhanced three-dimensional model, and apply a majority voting method to the semantic masks of the projection images of all view angles to obtain a final semantic label for each triangular patch.
9. A computer-readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor to perform the method for semantic decomposition of three-dimensional components of an ancient building according to any one of claims 1 to 7.
10. An electronic device comprising a processor and a memory, the processor being electrically connected to the memory, the memory being configured to store instructions and data, and the processor being configured to perform the steps of the method for semantic decomposition of three-dimensional components of an ancient building according to any one of claims 1 to 7.
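The view set of claim 2 and the normal-angle projection rule of claim 3 can be sketched in a few lines of Python. This is an illustrative reading rather than the patented implementation: the claims do not state how the eight oblique views are arranged, so even 45-degree azimuth spacing is assumed. The normal vector of a projection image is taken here as the unit vector pointing from the scene toward the camera. The function names and the sample angles (45 and 60 degrees) are hypothetical.

```python
import math

def view_directions(tilt_deg):
    """Return nine unit vectors: one top view plus eight oblique views."""
    dirs = [(0.0, 0.0, 1.0)]  # top view: image normal points straight up
    tilt = math.radians(tilt_deg)  # tilt measured from the vertical axis
    for k in range(8):
        az = math.radians(45.0 * k)  # assumption: even azimuth spacing
        dirs.append((math.sin(tilt) * math.cos(az),
                     math.sin(tilt) * math.sin(az),
                     math.cos(tilt)))
    return dirs

def triangle_normal(a, b, c):
    """Unit normal of triangle (a, b, c) using the right-hand rule."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        raise ValueError("degenerate triangle")
    return tuple(x / length for x in n)

def angle_deg(n1, n2):
    """Angle in degrees between two unit vectors."""
    dot = sum(p * q for p, q in zip(n1, n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def is_projected(triangle, view_normal, max_angle_deg):
    """Keep the triangle for this view only if it faces the view closely enough."""
    return angle_deg(triangle_normal(*triangle), view_normal) < max_angle_deg

dirs = view_directions(45.0)                # hypothetical first preset angle
wall = ((0, 0, 0), (0, 1, 0), (0, 0, 1))    # vertical patch facing +x
roof = ((0, 0, 0), (1, 0, 0), (0, 1, 0))    # horizontal patch facing up
```

With a hypothetical second preset angle of 60 degrees, the vertical patch is skipped in the top view but kept in the oblique view facing it, which is why oblique views matter for facade components.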

Description

Semantic decomposition method and system for three-dimensional components of ancient building

Technical Field

The invention relates to the technical field of computer vision, and in particular to a semantic decomposition method, system, storage medium and electronic device for three-dimensional components of an ancient building.

Background

Building semantic decomposition is important in fields such as the digital archiving of cultural relics, structural analysis of ancient buildings, urban knowledge modeling, and protection of cultural heritage. Three-dimensional building models based on polyhedral models or point cloud representations have made remarkable progress in decomposing coarse-grained components such as roofs, facades and windows. However, the decomposition of fine components, such as ridges, beams and columns in historic buildings or curtain wall mullions and trusses in modern buildings, remains comparatively weak. These components are not only structurally and visually critical but also carry important architectural and cultural information. Failure to extract these elongated members accurately limits the usefulness of three-dimensional building models in high-level analysis and knowledge-driven applications.

The difficulty of segmenting elongated members arises from challenges that compound geometrically and semantically. Geometrically, elongated, curved members are tightly connected to adjacent surfaces, making them difficult to extract in a topologically consistent manner. Semantically, these members have textures similar to those of the adjacent roofs or facades, which tends to make the feature response weak or ambiguous.
In recent years, image-based semantic decomposition methods have demonstrated that high-resolution texture information can compensate for geometric ambiguity to some extent, making oblique-photogrammetry building models, which contain fine geometry and texture, an important data form for building semantic decomposition. Accordingly, a growing body of research focuses on three-dimensional building parsing methods that fuse images with geometry. However, existing convolutional neural network architectures often rely on a fixed receptive field that cannot be effectively aligned with the geometric flow of elongated or curved members. As a result, these members frequently appear broken or discontinuous in two-dimensional predictions, which in turn affects the topological consistency of the three-dimensional mesh.

To address these problems, existing research has developed primarily in two technical directions. One approach learns flexible sampling offsets through deformable convolution to better align with elongated or curved structures. The other performs cross-scale feature aggregation using encoder-decoder architectures, feature pyramids and attention mechanisms, aiming to improve the extraction of fine structures. Although these strategies can alleviate the feature loss of fine components to some extent, they do not fundamentally solve the problem of accurately extracting the features of fine components. The problem is further exacerbated by the resolution mismatch between the high-resolution images and the relatively coarse three-dimensional mesh.
Elongated structures that are clearly visible in the two-dimensional images tend to degrade into sparse triangulations or overly smooth surfaces during mesh reconstruction. As a result, even refined two-dimensional semantic information, once projected onto the three-dimensional photogrammetric model, rarely yields topologically consistent, complete and fine-grained building component extraction results.

Disclosure of Invention

The invention provides a semantic decomposition method, system, storage medium and electronic device for three-dimensional components of an ancient building, which address the semantic continuity and topological consistency of elongated components during feature extraction and achieve accurate and complete extraction of the elongated components of a building.

The invention provides a semantic decomposition method for three-dimensional components of an ancient building, comprising the following steps:

Acquiring an original three-dimensional model of an ancient building, and generating projection images of a plurality of view angles based on the original three-dimensional model;
Projecting the triangular patches of the original three-dimensional model onto the projection images of the respective view angles to obtain two-dimensional triangles on the projection images;
Inputting the projection images of the plurality of view angles respectively into a trained GAS segmentation network to obtain corresponding semantic masks;
Finding the pixels falling inside a two-dimensional triangle, dividing the pixels falling in the