CN-121999281-A - Large language model-based 3D workpiece interpretable classification method


Abstract

The invention relates to the technical field of computer vision and artificial intelligence, and discloses a large language model-based interpretable classification method for 3D workpieces. The method first extracts topological features with definite geometric meaning from a 3D workpiece point cloud using persistent homology, then analyzes the persistence values of the topological features to distinguish key topological features from noise topological features. For each 3D workpiece to be classified, a unique personalized structural semantic description is first generated from that workpiece's topological features; the personalized description is then combined with the standard topological description of the corresponding workpiece category, where each standard topological description is generated by statistically summarizing the key and noise topological features of multiple samples in the category. Finally, the composite structural semantic description is taken as input, and the classification decision is made by a large language model fine-tuned with LoRA. The method achieves interpretable reasoning about the classification results of 3D workpieces.

Inventors

YE DONGSHENG
HUANG HAIBO
ZHANG XINYAO
WU XINGHONG
ZHOU YOUMEI
ZHANG RUIMIN

Assignees

Hubei University of Automotive Technology (湖北汽车工业学院)

Dates

Publication Date: 20260508
Application Date: 20260112

Claims (9)

1. A large language model-based 3D workpiece interpretable classification method, comprising the steps of: Step one, providing three-dimensional point cloud data of the 3D workpiece to be classified; Step two, extracting topological features of the 3D workpiece from the three-dimensional point cloud data using persistent homology; Step three, analyzing the extracted topological features to distinguish key topological features from noise topological features, and generating a structural semantic description based on the analysis result; Step four, generating a standard description of each workpiece category by statistical means, based on the key topological features and the noise topological features; Step five, taking the structural semantic description and the standard description of each workpiece category as the model's chain-of-thought reasoning input, classifying the 3D workpiece using a large language model fine-tuned with LoRA, and outputting a classification result.
2. The large language model-based 3D workpiece interpretable classification method according to claim 1, wherein in step one, the three-dimensional point cloud data is obtained from CAD models collected from the internet, and the original three-dimensional point cloud data is in the form of triangular mesh data consisting of vertices, edges and triangular faces.
3. The large language model-based 3D workpiece interpretable classification method according to claim 2, wherein in step one, the surface of the triangular mesh is processed with the sample_points_poisson_disk function, the Poisson disk sampling algorithm implemented in the Open3D open-source library, to generate a point cloud satisfying the Poisson disk distribution characteristic, namely the three-dimensional point cloud data.
4. The large language model-based 3D workpiece interpretable classification method according to claim 3, wherein the persistent homology in step two uses the Vietoris-Rips complex construction to convert the discrete point cloud data in the three-dimensional point cloud data into a simplicial complex capable of reflecting topological features, defined by the formula: $\mathrm{VR}_{\epsilon}(X) = \{\sigma \subseteq X \mid d(x_i, x_j) \le \epsilon,\ \forall x_i, x_j \in \sigma\}$; where $X$ is a set of coordinate points in p-dimensional space; $\epsilon$ is the distance threshold; and $d(x_i, x_j)$ is the metric function giving the distance between any two points in the point set.
5. The large language model-based 3D workpiece interpretable classification method according to claim 4, wherein in step two, the Vietoris-Rips complex construction yields the topological features of the data at distance threshold $\epsilon$; as $\epsilon$ increases, the appearance and disappearance of each topological feature can be tracked, thereby obtaining the change of the topological features of the data across different scales; this dynamic process is called a filtration, defined by the formula: $\mathrm{VR}_{\epsilon_1}(X) \subseteq \mathrm{VR}_{\epsilon_2}(X) \subseteq \cdots \subseteq \mathrm{VR}_{\epsilon_n}(X)$, where $\epsilon_1 \le \epsilon_2 \le \cdots \le \epsilon_n$.
6. The large language model-based 3D workpiece interpretable classification method according to claim 5, wherein extracting the topological features of the 3D workpiece in step two includes: during the persistent homology filtration, tracking and recording the birth time and death time of each type of topological feature, the types specifically comprising connected components, loops and voids, wherein connected components are 0-dimensional, loops are 1-dimensional, and voids are 2-dimensional and 3-dimensional; and executing the complete procedure, namely generating the point cloud satisfying the Poisson disk distribution characteristic in step one, creating the simplex tree with the Vietoris-Rips complex in step two, and computing the persistent homology, 5 consecutive times, and recording the 5 independent sets of persistent homology outputs from the 5 consecutive runs, the key topological features finally obtained being the topological features of the 3D workpiece.
7. The large language model-based 3D workpiece interpretable classification method according to claim 6, wherein step three includes: A1, computing the duration (persistence) of each topological feature of the 3D workpiece; A2, judging features whose duration is less than the minimum duration to be noise topological features and features whose duration is not less than the minimum duration to be key topological features, eliminating the noise topological features, and retaining the key topological features; A3, based on the retained key topological features, organizing the topological attributes of the individual workpiece by the dimensions corresponding to connected components, loops and voids, converting them into personalized semantic text containing the feature dimension, count, persistence and structural meaning, and thereby generating the structural semantic description.
8. The large language model-based 3D workpiece interpretable classification method according to claim 7, wherein step four includes: B1, for each category of target workpiece, collecting the key topological feature data obtained by processing multiple groups of conventional models through step three, recording the noise topological feature types corresponding to the models, and compiling per-dimension statistics on the collected key topological feature data to extract common parameter ranges, the common parameter ranges comprising the number of features in each dimension, the birth-death time interval and the persistence range; B2, based on the statistical results of B1, constructing a structured standard description template for each workpiece category, the template comprising the category name, explicit parameter ranges of the key topological features in each dimension, the physical structural meaning corresponding to the key features, and the typical noise topological feature types to be eliminated; selecting workpieces of the same category with different specifications and workpieces of different categories, performing matching tests between their key topological features and the corresponding standard templates, and computing the matching metrics; when the matching success rate for workpieces of the same category is greater than or equal to the same-category matching success rate threshold and the mismatch rate for workpieces of different categories is less than or equal to the mismatch rate threshold, the matching meets the standard, and otherwise it does not; when the matching does not meet the standard, adjusting the parameter ranges of the key features in the template in light of the test results and supplementing typical feature details until the matching meets the standard, and finally determining the standard descriptions of all workpiece categories.
9. The large language model-based 3D workpiece interpretable classification method according to claim 8, wherein step five includes: C1, concatenating the structural semantic description of the 3D workpiece to be classified with the standard description of each workpiece category to construct a complete chain-of-thought reasoning input for the model; C2, loading a pre-trained large language model fine-tuned with LoRA, and inputting the constructed chain of thought into the model; C3, the model outputting, based on the feature matching logic in the chain of thought, the category judgment for the 3D workpiece to be classified and the corresponding feature matching basis; C4, statistically verifying the model's classification results: when the matching success rate for workpieces of the same category is greater than or equal to the same-category matching success rate threshold and the mismatch rate for workpieces of different categories is less than or equal to the mismatch rate threshold, the model meets the standard, and otherwise it does not; when the model does not meet the standard, the feature description details of the chain of thought or the fine-tuning parameters of the model are adjusted in light of the results until the model meets the standard, and a stable 3D workpiece classification result is finally output.
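
As an illustration of sub-steps A1 to A3 of claim 7, the following Python sketch splits (dimension, birth, death) triples into key and noise features using a minimum duration and then renders the retained features as a short text description. It is a minimal sketch under stated assumptions, not the applicant's implementation: the function names `split_by_persistence` and `describe`, the `DIM_MEANING` table, the threshold value 0.5, the sample data and the wording of the generated sentences are all hypothetical.

```python
from collections import defaultdict

# Hypothetical labels per homology dimension, following the claim wording.
DIM_MEANING = {0: "connected component", 1: "loop", 2: "void"}


def split_by_persistence(features, min_persistence):
    """Split (dimension, birth, death) triples into key and noise features.

    A feature is 'key' when death - birth >= min_persistence, otherwise 'noise'.
    Each returned entry is (dimension, birth, death, persistence).
    """
    key, noise = [], []
    for dim, birth, death in features:
        persistence = death - birth
        target = key if persistence >= min_persistence else noise
        target.append((dim, birth, death, persistence))
    return key, noise


def describe(key_features):
    """Render retained key features as one sentence per dimension."""
    by_dim = defaultdict(list)
    for dim, _birth, _death, persistence in key_features:
        by_dim[dim].append(persistence)
    sentences = []
    for dim in sorted(by_dim):
        values = by_dim[dim]
        name = DIM_MEANING.get(dim, f"{dim}-dimensional feature")
        sentences.append(
            f"Dimension {dim}: {len(values)} {name}(s), "
            f"persistence {min(values):.2f} to {max(values):.2f}."
        )
    return " ".join(sentences)


# Made-up sample data: two long-lived features and two short-lived ones.
features = [(0, 0.0, 5.0), (0, 0.0, 0.1), (1, 0.5, 2.5), (1, 0.4, 0.45)]
key, noise = split_by_persistence(features, min_persistence=0.5)
description = describe(key)
print(description)
```

In this sketch the threshold is a single global value; the claims do not state how the minimum duration is chosen, so a per-dimension threshold would be an equally plausible reading.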

Description

Large language model-based 3D workpiece interpretable classification method

Technical Field

The invention relates to the technical field of computer vision and artificial intelligence, in particular to a large language model-based 3D workpiece interpretable classification method.

Background

Demand for 3D vision technology is undergoing a paradigm shift from "seeing" to "understanding": what production systems require of three-dimensional vision has moved from basic object perception to precise recognition, classification and geometric understanding of three-dimensional workpieces. In core application scenarios such as industrial robot sorting, automated quality inspection and intelligent warehouse management, high-precision, highly robust 3D workpiece classification has become one of the key technologies driving industrial upgrading. Although existing 3D classification methods, represented by deep learning, have shown a certain classification accuracy, the inherent drawbacks of these technical routes are increasingly becoming the bottleneck restricting their large-scale deployment in high-reliability industrial scenarios. First, in terms of interpretability, existing deep learning models are essentially complex "black box" systems: the specific three-dimensional geometric features and the logical reasoning process on which a classification decision depends are opaque, so the user obtains only the "what" of the classification result and cannot understand "why" the decision was made. This lack of a decision basis undermines the credibility and acceptability of the technology in the field.
Moreover, in terms of robustness, the features learned by existing methods are sensitive to interference such as point cloud noise, partial occlusion of the object surface and changes in acquisition viewpoint, so performance tends to degrade markedly on real industrial sites with complex working conditions and considerable uncertainty. A new technical solution that improves both interpretability and robustness is therefore needed in the field, so as to release the application potential of 3D vision technology in industrial settings.

Disclosure of Invention

(I) Technical problem to be solved

In view of the shortcomings of the prior art, the invention provides a large language model-based 3D workpiece interpretable classification method, which obtains the topological features of the workpiece through topological data analysis and achieves interpretable reasoning about the 3D workpiece classification result by using the causal reasoning capability of the large language model.

(II) Technical solution

To solve the above technical problem, the invention starts from the core requirement of improving the interpretability of 3D classification: it extracts the topological features of workpieces of different categories by persistent homology, analyzes their common regularities in connectivity, loop structure, void features and the like, and establishes a method for going from numerical topological description to structural semantic characterization, so as to shift from data-driven classification to knowledge-guided reasoning classification. The following technical solution is provided: a large language model-based 3D workpiece interpretable classification method comprising the following steps:
Step one, providing three-dimensional point cloud data of the 3D workpiece to be classified;
Step two, extracting topological features of the 3D workpiece from the three-dimensional point cloud data using persistent homology;
Step three, analyzing the extracted topological features to distinguish key topological features from noise topological features, and generating a structural semantic description based on the analysis result;
Step four, generating a standard description of each workpiece category by statistical means, based on the key topological features and the noise topological features;
Step five, taking the structural semantic description and the standard description of each workpiece category as the model's chain-of-thought reasoning input, classifying the 3D workpiece using a large language model fine-tuned with LoRA, and outputting a classification result.
Preferably, in step one, the three-dimensional point cloud data is obtained from CAD models collected from the internet, and the original three-dimensional point cloud data is in the form of triangular mesh data composed of vertices, edges and triangular faces.
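
To make step two more concrete, the following Python sketch computes only the 0-dimensional part of persistent homology (connected components) for a small point set, using a union-find structure over edges sorted by length. It is an illustrative sketch, not the method of the invention: it assumes that an edge enters the Vietoris-Rips complex when the distance threshold reaches the pairwise Euclidean distance, it ignores loops and voids entirely, and the function name `h0_persistence` and the sample points are hypothetical.

```python
import itertools
import math


def h0_persistence(points):
    """Return (birth, death) pairs of the 0-dimensional features.

    Every point is born as its own component at scale 0. Edges are processed
    by increasing length; each edge joining two distinct components ends one
    of them at that length. One component survives forever (death = inf).
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i's component, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j        # merge the two components
            pairs.append((0.0, length))    # one component dies at this scale
    pairs.append((0.0, math.inf))          # the last component never dies
    return pairs


# Made-up sample: two tight clusters that are far apart.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
pairs = h0_persistence(points)
long_lived = [p for p in pairs if p[1] - p[0] > 1.0]
print(len(pairs), len(long_lived))
```

In this example the two short-lived pairs correspond to points merging inside each cluster, while the two long-lived pairs reflect the two well-separated clusters; this is the kind of gap between short and long durations that the key versus noise distinction of step three relies on.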
Preferably, in step one, the triangular mesh surface is processed with the sample_points_poisson_disk function, the Poisson disk sampling algorithm implemented in the Open3D open-source library, so as to generate a point cloud satisfying the Poisson disk distribution characteristic, namely