CN-121999337-A - Method and device for dividing functional parts of three-dimensional model, electronic equipment and storage medium

CN121999337A

Abstract

The application relates to a method, an apparatus, an electronic device, and a storage medium for dividing the functional parts of a three-dimensional model. The method comprises: obtaining point cloud data and multi-view two-dimensional images of the three-dimensional model; binding each sampling point of the point cloud data to the face ID of the curved surface it belongs to, based on the face IDs of the curved surfaces in the three-dimensional model, and binding each pixel of the multi-view two-dimensional images to the face ID of the curved surface corresponding to the two-dimensional plane where the pixel is located; performing cross-modal attention fusion on the point cloud data and the multi-view two-dimensional images through a preset semantic segmentation model, with the face IDs as the association reference, to obtain fused point cloud features; determining the functional-area label of each sampling point in the fused point cloud features; mapping the functional-area label of each sampling point to the face ID bound to that sampling point, to obtain the functional-area label corresponding to each face ID; and merging face IDs with the same functional-area label into one surface set. The application thereby achieves accurate identification of each functional part of the three-dimensional model.

Inventors

  • Xiang Huatao
  • He Hao
  • Huang Yongbo
  • Liang Jiachen
  • Wen Yinan
  • Dai Fei
  • Zhang Chengjie

Assignees

  • 重庆蓝电汽车科技有限公司 (Chongqing Landian Automobile Technology Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2026-04-07

Claims (10)

  1. A method for dividing functional parts of a three-dimensional model, the method comprising: acquiring point cloud data and multi-view two-dimensional images of a three-dimensional model, wherein the point cloud data indicate the local geometric detail of each curved surface of the three-dimensional model, and the multi-view two-dimensional images indicate the global spatial structure of the three-dimensional model; binding each sampling point of the point cloud data to the face ID of the curved surface it belongs to, based on the face ID of each curved surface in the three-dimensional model, and binding each pixel of the multi-view two-dimensional images to the face ID of the curved surface corresponding to the two-dimensional plane where the pixel is located; performing cross-modal attention fusion on the point cloud data and the multi-view two-dimensional images through a preset semantic segmentation model, with the face IDs as the association reference, to obtain fused point cloud features, and determining the functional-area label of each sampling point in the fused point cloud features; mapping the functional-area label of each sampling point to the face ID bound to that sampling point, to obtain the functional-area label corresponding to each face ID; and merging the face IDs having the same functional-area label into one surface set, wherein each surface set indicates one functional part.
  2. The method of claim 1, wherein performing cross-modal attention fusion on the point cloud data and the multi-view two-dimensional images through the preset semantic segmentation model, with the face IDs as the association reference, to obtain the fused point cloud features, and determining the functional-area label of each sampling point in the fused point cloud features, comprises: inputting the point cloud data and the multi-view two-dimensional images into the preset semantic segmentation model; performing feature aggregation on the multi-view two-dimensional images through an image encoder in the semantic segmentation model to obtain image features, and performing feature extraction on the point cloud data through a point cloud encoder in the semantic segmentation model to obtain point cloud features, wherein the image features capture the feature correlation of the same curved surface across multiple views, and the point cloud features distinguish regions that are physically adjacent but belong to different parts; performing cross-modal attention fusion of the image features and the point cloud features through a cross-modal attention layer, associating the image features and the point cloud features that correspond to the same face ID during fusion, to obtain the fused point cloud features, wherein the image encoder and the point cloud encoder share the cross-modal attention layer; and decoding and classifying the fused point cloud features through a decoder to obtain the functional-area label of each sampling point.
  3. The method of claim 2, wherein performing feature aggregation on the multi-view two-dimensional images through the image encoder in the semantic segmentation model to obtain the image features comprises: mapping the face ID bound to each pixel of the multi-view two-dimensional images into a feature vector through the image encoder; aggregating, in the spatial dimension, the feature vectors of all pixels corresponding to the same face ID within a single two-dimensional image, to obtain a single-view feature vector of each face ID in that two-dimensional image; fusing the single-view feature vectors of the same face ID across all two-dimensional images, to obtain a fused feature vector of each face ID; and concatenating the fused feature vectors of the face IDs to obtain the image features.
  4. The method of claim 2, wherein performing cross-modal attention fusion of the image features and the point cloud features through the cross-modal attention layer, and associating the image features and the point cloud features corresponding to the same face ID during fusion, to obtain the fused point cloud features, comprises: unifying the feature channels of the image features and the point cloud features through the cross-modal attention layer; performing attention-weighted fusion of the channel-unified image features and point cloud features based on face-ID consistency, to obtain first fused point cloud features, wherein the first fused point cloud features indicate a preliminary association between the image features and the point cloud features; performing residual addition and normalization on the first fused point cloud features and the channel-unified point cloud features, to obtain second fused point cloud features, wherein the second fused point cloud features incorporate the association between the image features and the point cloud features while preserving the point cloud features; and mapping the second fused point cloud features through a feed-forward network, then performing residual addition and normalization again, to obtain the final fused point cloud features.
  5. The method of claim 4, wherein performing attention-weighted fusion of the channel-unified image features and point cloud features based on face-ID consistency, to obtain the first fused point cloud features, comprises: generating a query vector from the channel-unified point cloud features, and generating a key vector and a value vector from the channel-unified image features, wherein the query vector indicates the features of each sampling point, the key vector indicates the features of each pixel, and the value vector indicates the feature value of each pixel; traversing each sampling point and each pixel, and determining a bias term according to whether the face IDs bound to the sampling point and the pixel are consistent, wherein the bias term for consistent face IDs is larger than the bias term for inconsistent face IDs; and performing attention-weighted computation based on the query vector, the key vector, the value vector, and the bias term, to obtain the first fused point cloud features.
  6. The method of claim 2, wherein decoding and classifying the fused point cloud features through the decoder to obtain the functional-area label of each sampling point comprises: mapping the features of each sampling point in the fused point cloud features, through the decoder, to the dimensions corresponding to the functional-area labels, to obtain a score of each sampling point under each functional-area label; normalizing the scores to obtain a predicted probability of each sampling point in the dimension corresponding to each functional-area label; and selecting the functional-area label with the largest predicted probability as the functional-area label of the sampling point.
  7. The method of claim 1, wherein the training process of the semantic segmentation model comprises: acquiring training data and inputting the training data into an initial semantic segmentation model, wherein the training data comprise sample multi-view two-dimensional images, sample point cloud data, and the actual functional-area label of each sampling point; processing the sample multi-view two-dimensional images and the sample point cloud data through the initial semantic segmentation model, and outputting a predicted functional-area label of each sampling point; calculating a loss between the predicted functional-area labels and the actual functional-area labels based on a joint loss function, wherein the joint loss function is constructed from a weighted cross-entropy loss, a Dice loss, and an intra-face consistency loss; and, during back-propagation, jointly updating the parameters of the image encoder, the point cloud encoder, the cross-modal attention layer, and the decoder according to the loss signals of the image encoder and the point cloud encoder, until the model converges, to obtain the trained semantic segmentation model.
  8. An apparatus for dividing functional parts of a three-dimensional model, the apparatus comprising: an acquisition module, configured to acquire point cloud data and multi-view two-dimensional images of a three-dimensional model, wherein the point cloud data indicate the local geometric detail of each curved surface of the three-dimensional model, and the multi-view two-dimensional images indicate the global spatial structure of the three-dimensional model; a binding module, configured to bind each sampling point of the point cloud data to the face ID of the curved surface it belongs to, based on the face ID of each curved surface in the three-dimensional model, and to bind each pixel of the multi-view two-dimensional images to the face ID of the curved surface corresponding to the two-dimensional plane where the pixel is located; a fusion module, configured to perform cross-modal attention fusion on the point cloud data and the multi-view two-dimensional images through a preset semantic segmentation model, with the face IDs as the association reference, to obtain fused point cloud features, and to determine the functional-area label of each sampling point in the fused point cloud features; a mapping module, configured to map the functional-area label of each sampling point to the face ID bound to that sampling point, to obtain the functional-area label corresponding to each face ID; and a division module, configured to merge the face IDs having the same functional-area label into one surface set, wherein each surface set indicates one functional part.
  9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method of any one of claims 1-7 when executing the program stored in the memory.
  10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method of any one of claims 1-7.
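As an illustrative sketch of the attention mechanism in claims 4 and 5, the following minimal NumPy example applies an additive bias that favors pixels sharing a sampling point's face ID, then performs the residual addition and normalization of claim 4. The identity Q/K/V projections, the bias values, and the normalization details are assumptions for illustration only; in the claimed model these would be learned parameters.

```python
import numpy as np

def face_id_biased_attention(point_feats, img_feats, point_fids, pixel_fids,
                             match_bias=1.0, mismatch_bias=0.0):
    """Point features (queries) attend to image features (keys/values);
    an additive bias term favors pixels whose bound face ID matches the
    sampling point's face ID (per claims 4-5)."""
    d = point_feats.shape[1]
    # Placeholder identity projections keep the sketch self-contained;
    # a trained model would use learned Q/K/V projection matrices.
    Q, K, V = point_feats, img_feats, img_feats
    scores = Q @ K.T / np.sqrt(d)                    # (n_points, n_pixels)
    bias = np.where(point_fids[:, None] == pixel_fids[None, :],
                    match_bias, mismatch_bias)       # face-ID consistency bias
    weights = np.exp(scores + bias)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over pixels
    fused = weights @ V                              # first fused features
    # Residual addition + per-point normalization, as in claim 4.
    out = fused + point_feats
    out = (out - out.mean(axis=1, keepdims=True)) / (
        out.std(axis=1, keepdims=True) + 1e-6)
    return out

rng = np.random.default_rng(0)
pf = rng.normal(size=(4, 8))                         # 4 sampling points
imf = rng.normal(size=(6, 8))                        # 6 pixels
out = face_id_biased_attention(pf, imf,
                               np.array([1, 1, 2, 2]),
                               np.array([1, 1, 1, 2, 2, 2]))
```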

Description

Method and device for dividing functional parts of a three-dimensional model, electronic equipment and storage medium
Technical Field
The present application relates to the field of neural networks, and in particular to a method and apparatus for dividing the functional parts of a three-dimensional model, an electronic device, and a storage medium.
Background
Computer-Aided Engineering (CAE) analysis is a key link in the product research and development process. In the CAE pre-processing stage, the parts of a geometric model imported from a Computer-Aided Design (CAD) system must be grouped and named, and parts with different structures must adopt different mesh division standards; this is the basis for ensuring CAE analysis precision. At present, the industry mainly relies on automated tools based on predefined rule scripts and simple string matching to group and name the parts of a CAD geometric model. The grouping basis of this approach is limited to basic properties such as material and size, so different functional parts cannot be accurately distinguished, and different mesh division standards cannot be set according to different structural characteristics. These defects cause a mismatch between the mesh division and the actual structure of the parts, directly affect the accuracy and efficiency of CAE analysis, and make it difficult to meet the practical requirements of high-precision CAE analysis.
Disclosure of Invention
The present application provides a method and apparatus for dividing the functional parts of a three-dimensional model, an electronic device, and a storage medium, aiming to solve the problem that the functional parts of a three-dimensional model cannot be accurately distinguished.
In a first aspect, the present application provides a method for dividing the functional parts of a three-dimensional model, the method comprising: acquiring point cloud data and multi-view two-dimensional images of a three-dimensional model, wherein the point cloud data indicate the local geometric detail of each curved surface of the three-dimensional model, and the multi-view two-dimensional images indicate the global spatial structure of the three-dimensional model; binding each sampling point of the point cloud data to the face ID of the curved surface it belongs to, based on the face ID of each curved surface in the three-dimensional model, and binding each pixel of the multi-view two-dimensional images to the face ID of the curved surface corresponding to the two-dimensional plane where the pixel is located; performing cross-modal attention fusion on the point cloud data and the multi-view two-dimensional images through a preset semantic segmentation model, with the face IDs as the association reference, to obtain fused point cloud features, and determining the functional-area label of each sampling point in the fused point cloud features; mapping the functional-area label of each sampling point to the face ID bound to that sampling point, to obtain the functional-area label corresponding to each face ID; and merging the face IDs having the same functional-area label into one surface set, wherein each surface set indicates one functional part.
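The last two steps above (mapping sampling-point labels onto face IDs and merging faces by label) can be sketched as follows. The majority vote used to resolve conflicting point labels within one face is an assumed tie-break strategy, not specified in the application.

```python
from collections import Counter, defaultdict

def label_faces_and_merge(point_face_ids, point_labels):
    """Map each sampling point's functional-area label onto its bound
    face ID (majority vote per face, an assumed strategy), then merge
    face IDs sharing a label into one surface set."""
    votes = defaultdict(Counter)
    for fid, lab in zip(point_face_ids, point_labels):
        votes[fid][lab] += 1
    # Functional-area label of each face ID = most frequent point label.
    face_label = {fid: c.most_common(1)[0][0] for fid, c in votes.items()}
    # Each surface set (faces sharing a label) indicates one functional part.
    surface_sets = defaultdict(set)
    for fid, lab in face_label.items():
        surface_sets[lab].add(fid)
    return face_label, dict(surface_sets)

face_label, surface_sets = label_faces_and_merge(
    [1, 1, 2, 2, 2, 3], ['bracket', 'bracket', 'shell', 'shell', 'shell', 'bracket'])
```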
Optionally, performing cross-modal attention fusion on the point cloud data and the multi-view two-dimensional images through the preset semantic segmentation model, with the face IDs as the association reference, to obtain fused point cloud features, and determining the functional-area label of each sampling point in the fused point cloud features, comprises: inputting the point cloud data and the multi-view two-dimensional images into the preset semantic segmentation model; performing feature aggregation on the multi-view two-dimensional images through an image encoder in the semantic segmentation model to obtain image features, and performing feature extraction on the point cloud data through a point cloud encoder in the semantic segmentation model to obtain point cloud features, wherein the image features capture the feature correlation of the same curved surface across multiple views, and the point cloud features distinguish regions that are physically adjacent but belong to different parts; performing cross-modal attention fusion of the image features and the point cloud features through a cross-modal attention layer, associating the image features and the point cloud features corresponding to the same face ID during fusion, to obtain the fused point cloud features, wherein the image encoder and the point cloud encoder share the cross-modal attention layer; and decoding and classifying the fused point cloud features through a decoder to obtain the functional-area label of each sampling point. Optionally, per
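The encoder-side aggregation described above (pooling pixel features per face ID within each view, then fusing the single-view vectors across views, and concatenating per face ID) might look like the following sketch. Mean pooling for both steps is an assumption; the application does not fix the aggregation operator.

```python
import numpy as np

def aggregate_face_features(views):
    """Aggregate pixel feature vectors per face ID within each view, then
    average the single-view vectors across views (mean pooling assumed).
    `views` is a list of (pixel_feats, pixel_fids) pairs, one per view."""
    per_face = {}  # face ID -> list of single-view feature vectors
    for feats, fids in views:
        for fid in np.unique(fids):
            vec = feats[fids == fid].mean(axis=0)     # spatial aggregation
            per_face.setdefault(int(fid), []).append(vec)
    # Fuse across views, then concatenate per face ID into the image feature.
    fused = {fid: np.mean(vs, axis=0) for fid, vs in per_face.items()}
    image_feature = np.concatenate([fused[f] for f in sorted(fused)])
    return fused, image_feature

# Two toy views with 2-D pixel features bound to face IDs 1 and 2.
view1 = (np.array([[1., 1.], [3., 3.], [2., 0.]]), np.array([1, 1, 2]))
view2 = (np.array([[4., 4.], [0., 2.]]), np.array([1, 2]))
fused, image_feature = aggregate_face_features([view1, view2])
```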