
EP-4738276-A1 - IDENTIFYING AND MATCHING ASSETS IN 3D DATA


Abstract

The invention pertains to a computer-implemented method for automatically identifying representations (20) of assets in a 3D model, the method comprising: segmenting the 3D model into a plurality of cluster elements, at least a subset of cluster elements including a representation of an asset; organizing each cluster element into an asset structure (25) comprising a plurality of nodes (26, 27) and edges; and computing an asset embedding (28) for each of the nodes and edges, each asset embedding representing an asset or a part thereof, wherein the method further comprises, particularly in real time: obtaining a search template from the user, the search template corresponding to a user-selected asset or a user-selected representation (10) of an asset; generating, based on the obtained search template, a template structure (15) comprising a plurality of nodes (16, 17) and edges and a template embedding (18) for each of the nodes and edges, each template embedding representing the user-selected asset or a part thereof; comparing the template embeddings with the asset embeddings of a multitude of asset structures to find representations of assets in the 3D model that are similar to the user-selected representation of an asset; and providing information about the found representations of assets to the user.
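By way of illustration, the core matching step of the abstract (comparing template embeddings with the asset embeddings of many asset structures) can be sketched as follows. This is a minimal sketch, not the claimed method: the cosine-similarity measure, the best-match-per-node scoring, and the mean-over-nodes threshold rule are all assumptions made for the example.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_assets(template_embeddings, asset_structures, threshold=0.8):
    """Return the IDs of asset structures similar to the template.

    asset_structures maps an asset ID to the list of embeddings of its
    nodes; each template node is scored against its best-matching asset
    node, and an asset matches if the mean score exceeds the threshold
    (a hypothetical scoring rule chosen for this sketch).
    """
    matches = []
    for asset_id, asset_embeddings in asset_structures.items():
        scores = [
            max(cosine_similarity(t, a) for a in asset_embeddings)
            for t in template_embeddings
        ]
        if np.mean(scores) >= threshold:
            matches.append(asset_id)
    return matches
```

In practice the comparison may instead be performed by a trained neural network (see claim 4); the sketch only illustrates the embedding-matching principle.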

Inventors

  • Winistörfer, Martin
  • Prados-Torreblanca, Andres
  • Lopez Fernandez, Luis

Assignees

  • Hexagon Innovation Hub GmbH

Dates

Publication Date
2026-05-06
Application Date
2024-11-05

Claims (15)

  1. Computer-implemented method (100) for automatically identifying representations (20) of assets in a 3D model (1) of an environment, the method comprising - segmenting (110) the 3D model (1) into a plurality of cluster elements, at least a subset of cluster elements including a representation (20) of an asset; - organizing (120) each cluster element into an asset structure (25) comprising a plurality of nodes (26, 27) and edges; and - computing (130) an asset embedding (28) for each of the nodes (26, 27) and edges, each asset embedding representing an asset or a part thereof; wherein the method further comprises: - obtaining (150) a search template from the user, the search template corresponding to a user-selected asset or a user-selected representation (10) of an asset; - generating (160), based on the obtained search template, a template structure (15) comprising a plurality of nodes (16, 17) and edges and a template embedding (18) for each of the nodes (16, 17) and edges, each template embedding (18) representing the user-selected asset or a part thereof; - comparing (170) the template embeddings (18) with the asset embeddings (28) of a multitude of asset structures (25) to find representations (20) of assets in the model (1) that are similar to the user-selected representation (10) of an asset; and - providing (190) information about the found representations (20) of assets to the user.
  2. Method (100) according to claim 1, wherein the asset structure (25) and the template structure (15) are graph structures or tree structures, particularly octrees, in particular wherein the template structure (15) is generated (160) so that the template embeddings (18) have a same format as the asset embeddings (28).
  3. Method (100) according to claim 1 or claim 2, wherein the 3D model (1) is a point cloud or a mesh, particularly wherein the environment comprises a multitude of assets of a multitude of different asset types.
  4. Method (100) according to any one of the preceding claims, wherein comparing (170) the template embeddings (18) with the asset embeddings (28) is performed by a trained neural network.
  5. Method (100) according to any one of the preceding claims, comprising a pre-processing (105) of the 3D model (1), the pre-processing (105) being completed before the template is obtained (150) from the user, wherein at least the segmenting (110) of the 3D model and the organizing (120) of the cluster elements are part of the pre-processing (105), particularly wherein also the computing (130) of the node embedding for each of the nodes and edges is part of the pre-processing (105).
  6. Method (100) according to any one of the preceding claims, wherein at least the steps of generating (160) the template structure (15), comparing (170) the template embeddings (18) with the asset embeddings (28), and providing (190) the information to the user are performed in real time.
  7. Method (100) according to any one of the preceding claims, comprising displaying (140) the 3D model (1) to a user, particularly wherein the displayed 3D model is a pre-processed 3D model.
  8. Method (100) according to claim 7, wherein obtaining (150) the template from the user comprises enabling the user to select a representation (20) of the asset in the displayed 3D model (1).
  9. Method (100) according to claim 7 or claim 8, wherein providing (190) the information about found representations (20) of assets to the user comprises displaying the found representations (20) of assets in the 3D model (1) in a highlighted manner.
  10. Method (100) according to any one of the preceding claims, wherein the 3D model (1) is a point cloud, and the obtained search template is or comprises a parametrized model, particularly a BIM or a CAD model, and generating (160) the template structure (15) comprises - generating a synthetic point cloud from the parametrized model, particularly using data augmentation; or - comparing one or more existing similar parametrized models of a specific asset and using an existing point cloud of that asset.
  11. Method (100) according to any one of the preceding claims, wherein the obtained search template comprises at least one of text, video and images related to the user-selected asset, and generating (160) the template structure (15) comprises recognizing the user-selected asset from the text, video or images, particularly wherein the 3D model (1) is a point cloud and generating (160) the template structure (15) further comprises using an existing point cloud of the recognized user-selected asset.
  12. Method (100) according to any one of the preceding claims, comprising a filtering (180) of outliers in the found representations (20) of assets before providing (190) the information to the user, in particular wherein the method comprises enabling the user to provide feedback regarding the provided found representations (20) of assets, wherein the feedback is used for improving the filtering (180), particularly wherein the feedback comprises selected examples of true positives and false positives.
  13. Method (100) according to any one of the preceding claims, wherein organizing (120) the cluster elements into the asset structure (25) is performed by a trained neural network, particularly wherein - generating (160) the template structure (15) is performed by the same trained neural network; and/or - the trained neural network is a Graph Neural Network that has been trained using symbolic rules to learn relationship constraints.
  14. Method (100) according to any one of the preceding claims, wherein organizing (120) the cluster elements into the asset structure (25) comprises: - using vision foundation models on images to identify and group assets; - enforcing multi-view consistency; - establishing the structure of all assets based on neighbourhood; - refining the structure to a required granularity; - creating an embedding vector for each element; and - assigning the embedding vectors to the structure, particularly wherein the asset structure (25) is a hierarchical structure.
  15. Computer program product comprising program code having computer-executable instructions for performing the method (100) according to any one of claims 1 to 14.
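The hierarchical asset structure of claims 13 and 14 (a plurality of nodes carrying per-node embeddings, organized by parent/child neighbourhood) can be sketched in a minimal form. The `Node` class, the `build_asset_structure` helper, and the flat `parts` input format with `id`/`parent`/`embedding` fields are hypothetical names introduced only for this illustration, not terms from the claims.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One element of a hierarchical asset structure."""
    node_id: int
    embedding: list            # per-node asset embedding (cf. claim 1)
    children: list = field(default_factory=list)

def build_asset_structure(parts):
    """Assemble a tree of Nodes from flat part records.

    Each record is a dict with 'id', 'embedding', and 'parent' (None
    for the root). Returns the root node of the hierarchy.
    """
    nodes = {p["id"]: Node(p["id"], p["embedding"]) for p in parts}
    root = None
    for p in parts:
        if p["parent"] is None:
            root = nodes[p["id"]]
        else:
            nodes[p["parent"]].children.append(nodes[p["id"]])
    return root
```

A door asset, for instance, might be represented as a root node (the door as a whole) with child nodes for frame, leaf and handle, each carrying its own embedding so that both whole assets and sub-parts can be matched.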

Description

The present invention pertains to a method of automatically identifying representations of assets in a three-dimensional data set such as a point cloud.

It is known to capture 3D data of a surrounding as a point cloud, for instance using a laser scanner or similar capture device. The captured 3D data can then be visualized on a display to a user. Sometimes, it is necessary to find a certain asset in the 3D data. For example, the user might want to find specific machines or types of doors in the point cloud data. Depending on the size of the 3D data and the number of assets in the data, this can be tedious and time-consuming work. It would therefore be desirable to provide a method allowing a user to easily initiate an automated search for assets in 3D data.

One solution is to run object detection algorithms in 3D for a specific type of asset. To distinguish the assets, the algorithms would need to be trained on many different types of assets. Disadvantageously, the specified asset must be known in advance, i.e., before the algorithms are trained and deployed. This leads to a very large number of classes, increasing the necessary capacity of the neural network. Also, there is often no or only a limited amount of data of an asset available to train on. The large number of classes combined with the limited amount of data for certain classes produces a data imbalance during training. Consequently, the algorithm is prone to classifying unseen objects into the most common classes used during training, or to detecting uncommon object classes less often.

Alternatively, instead of finding assets directly in 3D space, assets are located on corresponding images using 2D object detection and, once identified, projected into 3D to highlight the asset. Disadvantageously, this solution only works if images as well as an accurate projection between the images and the point cloud are available in the data. Other disadvantages include the multi-view aggregation, i.e., the handling of multiple detections of the same objects, and the poor performance in distinguishing between foreground and background of the point cloud, i.e., between points of the object and points behind the object from the point of view of the image.

US 2023/0028242 A1 discloses generating a unique identification code for an industrial commodity, and US 11,704,343 B2 discloses an Artificial-Intelligence-based method for associating data regarding physical-world assets from a plurality of databases. Point cloud registration, i.e., aligning different scans to a single geometrically consistent point cloud, is a main application for 3D feature descriptors. For instance, an approach for 3D point cloud registration is disclosed by K. Fu et al.: "Robust Point Cloud Registration Framework Based on Deep Graph Matching", 9 November 2022 (arXiv:2211.04696v1). A. Zeng et al.: "3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions", 9 April 2017 (arXiv:1603.08182v3), disclose an approach for matching local geometric features on real-world depth images. While this approach follows the 2D paradigm, i.e., finding correspondences between features in 3D space using a feature descriptor and a matching algorithm, these features do not represent objects or assets. There is no association between the 3D feature and an actual type of asset, such as an asset from a catalogue or library.

It is possible to consider assets as a whole or as the sum of their sub-parts. This involves extending the basic principle of 2D matching into 3D space and providing solutions on how to handle and compare similar assets or objects. For instance, this task of finding assets in 3D space based on a template could be accomplished in the following way: creating a 3D template, computing an embedding for the template, processing the point cloud with samples of the same window size, computing a list of 3D embeddings, and comparing the list with the template embeddings to identify potential matches.
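The naive sliding-window pipeline just outlined can be sketched as follows. The `embed_fn` parameter is a placeholder for any embedding function (here, the window centroid stands in for a learned embedding); window and stride handling are assumptions made for the sketch. The triple loop makes the cost visible: the number of windows grows cubically with the extent of the scene.

```python
import numpy as np

def sliding_window_embeddings(points, window, stride, embed_fn):
    """Naive 3D sliding-window sampling of a point cloud.

    Slides a cubic window of the given edge length over the bounding
    box of `points` and computes one embedding per non-empty window.
    Returns a list of (window_origin, embedding) pairs.
    """
    mins, maxs = points.min(axis=0), points.max(axis=0)
    out = []
    for x in np.arange(mins[0], maxs[0], stride):
        for y in np.arange(mins[1], maxs[1], stride):
            for z in np.arange(mins[2], maxs[2], stride):
                lo = np.array([x, y, z])
                # Select points falling inside this cubic window.
                mask = np.all((points >= lo) & (points < lo + window), axis=1)
                if mask.any():
                    out.append(((x, y, z), embed_fn(points[mask])))
    return out
```

The cubic growth in window count is exactly the computational expense criticized in the following paragraph, and the motivation for the pre-computed asset structures of the claimed method.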
While this approach is similar to 2D matching methodologies, in 3D space the sampling of the point cloud is disadvantageously very computationally expensive. Also, computation in real time is not possible, and the user needs to wait many minutes for the result. In addition, edges and fine details are difficult to segment in point clouds compared to images.

It is therefore an object of the present invention to provide an improved method for identifying assets in a point cloud based on a user input. It is a particular object to provide such a method that allows the assets to be identified reliably and in real time. At least one of these objects is achieved by the method of claim 1 and/or the dependent claims of the present invention.

The claimed invention pertains to a computer-implemented method for automatically identifying representations of assets in a 3D model of an environment. The 3D model, e.g., may be a point cloud or a mesh, and the environment may comprise a multitude of assets of a multitude of different asset types. The method comp