
CN-122024271-A - Automatic bubble labeling and information extraction method for engineering drawing based on hybrid vision model

CN 122024271 A

Abstract

The invention discloses an automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model. The method comprises: inputting original image data of an engineering drawing; detecting and classifying drawing elements with a rotated object detection model, dividing them into five categories (measurement items, theoretical values, reference values, text descriptions, and tables), and obtaining detection result data containing positions and categories; extracting information by category, wherein the first three categories are processed by a fine-tuned vision language model that directly outputs structured data together with a generation confidence, text descriptions are likewise recognized by the vision language model, and tables are processed by OCR (optical character recognition) combined with a large language model to extract key-value pairs; verifying and correcting the outputs of the first two types of structured data based on the generation confidence; assigning a unique bubble number to each measurement item according to a preset rule based on its spatial position information; and associating and integrating all the data to automatically generate a final report containing a bubble labeling layer and a structured inspection list.

Inventors

  • HUANG SHENGCHAO
  • LU XIAOHUI
  • WANG CHUNLIN

Assignees

  • 江苏大道云隐科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-02-27

Claims (10)

  1. An automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model, characterized by comprising the following steps: S1, inputting original image data of an engineering drawing; S2, detecting and classifying the original image data based on a rotated object detection model to obtain detection result data comprising a plurality of detection units, wherein each detection unit comprises oriented bounding box coordinate information, a drawing element category, and a category confidence; the drawing element categories comprise measurement items, theoretical values, reference values, text descriptions, and tables, wherein the measurement items comprise all toleranced dimension labels, the theoretical values correspond to theoretically exact dimensions marked by boxes, and the reference values correspond to reference dimensions marked by brackets; S3, performing information extraction on the image region corresponding to each detection unit: cropping the corresponding region image data from the original image data according to the category of the detection unit; if the category is a measurement item, a theoretical value, or a reference value, inputting the region image data into a fine-tuned vision language model (VLM) to obtain first structured data and a corresponding generation confidence; if the category is a text description, inputting the region image data into the vision language model (VLM) to obtain second structured data and a corresponding generation confidence; if the category is a table, processing the region image data sequentially through an optical character recognition (OCR) module and a large language model (LLM) to obtain third structured data; S4, performing confidence verification on the first structured data and the second structured data: when the generation confidence is lower than a preset threshold, invoking the optical character recognition (OCR) module to recognize the corresponding region image data to obtain OCR verification data, and correcting the first structured data or the second structured data accordingly to obtain corrected structured data; when the generation confidence is higher than or equal to the preset threshold, taking the first structured data or the second structured data directly as the corrected structured data; S5, based on the spatial position information of the detection units whose category is measurement item, assigning a unique bubble number to each measurement item detection unit according to a preset spatial ordering rule, and generating number mapping data; and S6, performing association matching on the corrected structured data, the third structured data, and the detection result data to obtain enhanced detection records, and integrating the enhanced detection records with the number mapping data to generate a final output result comprising a bubble labeling layer and a structured inspection list.
  2. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 1, wherein the rotated object detection model in S2 is a one-stage object detection network supporting rotated box prediction, which outputs oriented bounding box parameters (cx, cy, w, h, θ), wherein (cx, cy) are the center point coordinates, (w, h) are the width and height, and θ is the rotation angle; the detection result data is obtained through the following steps: S21, standardizing the original image data into a model input tensor; S22, extracting image features with the rotated object detection model and performing multi-task prediction to obtain raw prediction data comprising oriented bounding box parameters, category probabilities, and objectness confidence; S23, performing confidence filtering, box decoding, and non-maximum suppression on the raw prediction data to obtain a candidate detection unit list; S24, packaging the candidate detection unit list into structured detection result data.
  3. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 2, wherein the non-maximum suppression in S23 computes the intersection-over-union directly between the rotated boxes (see the illustrative sketch following the claims).
  4. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 3, wherein in S3, when structured data is generated by the vision language model (VLM), the generation confidence of the first structured data or the second structured data is obtained by aggregating the prediction probabilities of the key tokens in the output sequence.
  5. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 4, wherein the first structured data in S3 is a structured object comprising nominal value, tolerance, geometric symbol, and quantity fields; the second structured data is a structured object containing the recognized text content; and the third structured data is key-value structured data extracted from the OCR text by the large language model (LLM).
  6. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 5, wherein the generation confidence verification in S4 further comprises comparing the OCR verification data with the first structured data or the second structured data to be verified at the character level or the semantic level, and performing automatic correction or conflict arbitration according to the comparison result.
  7. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 6, wherein the preset spatial ordering rule in S5 is a left-to-right, top-to-bottom, or clockwise-circling ordering rule.
  8. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 7, wherein the clockwise-circling ordering rule in S5 comprises: determining a center point of the drawing or region; computing the angle of each measurement item detection unit's reference point relative to the center point; and numbering in order of decreasing angle.
  9. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 8, wherein in S6 the association matching uses the detection unit ID of each detection unit in the detection result data as an index to look up and merge the corresponding corrected structured data or third structured data, thereby forming the enhanced detection records.
  10. The automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model according to claim 9, wherein in S6 all measurement item records are filtered out of the integrated data and their key information is extracted and arranged according to a preset report format to generate the structured inspection list; and a bubble-number annotation is drawn at the corresponding position of the drawing according to the position information of each record in the integrated data, thereby generating the bubble labeling layer.
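
Claims 2 and 3 describe a one-stage rotated detector whose post-processing runs non-maximum suppression with IoU computed on the rotated boxes themselves. The following is a minimal sketch of that post-processing step only, assuming shapely is available for polygon intersection and a simple greedy suppression policy; neither the library choice nor the threshold values come from the patent.

```python
# Sketch of rotated-box NMS (claims 2-3): IoU is computed on the oriented
# boxes, not their axis-aligned hulls. Box format is (cx, cy, w, h, theta)
# from claim 2, with theta in radians. shapely is an assumed dependency.
import math
from shapely.geometry import Polygon

def obb_to_polygon(cx: float, cy: float, w: float, h: float, theta: float) -> Polygon:
    """Convert an oriented bounding box to its four corner points."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return Polygon(corners)

def rotated_iou(a, b) -> float:
    """Intersection-over-union between two rotated boxes (claim 3)."""
    pa, pb = obb_to_polygon(*a), obb_to_polygon(*b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

def rotated_nms(detections, iou_threshold: float = 0.3):
    """Greedy NMS over (box, score, category) tuples, highest score first."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, cat in detections:
        if all(rotated_iou(box, kb) < iou_threshold for kb, _, _ in kept):
            kept.append((box, score, cat))
    return kept
```

Axis-aligned IoU would systematically misestimate the overlap of tilted dimension labels, which is why the claim insists on computing the intersection ratio on the rotated boxes themselves.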

Description

Automatic bubble labeling and information extraction method for engineering drawings based on hybrid vision model

Technical Field

The invention relates to an automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model, and belongs to the technical field of computer vision.

Background

In digital manufacturing and quality management workflows, "bubble labeling" (ballooning) of engineering drawings is a key step. The process numbers each feature on the drawing, such as dimensions, tolerances, and notes, and extracts their attributes to generate an inspection plan. Existing academic research generally classifies drawing elements into categories such as GD&T, surface roughness, chamfers, and threads. This classification by geometric feature type is detailed, but it does not map directly onto the regulatory attributes that matter in quality inspection. For example, in ballooning logic the handling of a toleranced dimension (to be measured) differs substantially from that of a reference dimension (for reference only), yet existing detection models often conflate the two. Traditional OCR tools (such as Tesseract) handle the rotated text, special symbols, and complex layouts of engineering drawings poorly; plain OCR cannot understand the semantic structure of a dimension and usually requires hand-written, complex post-processing rules.

Disclosure of Invention

The technical problem to be solved by the invention is to achieve accurate localization, attribute classification, and structured extraction of the features to be controlled in a drawing by constructing a new classification and detection scheme that combines rotated object detection with a fine-tuned VLM and OCR.
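
As a concrete reading of this classification scheme, the detection unit of claims 1-2 and the first structured data of claim 5 might be modeled as below; every field and type name is an illustrative assumption, not an identifier from the patent.

```python
# Illustrative data model for the five-category scheme and the oriented
# detection unit; names are assumptions, not the patent's own identifiers.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ElementCategory(Enum):
    MEASUREMENT_ITEM = "measurement_item"    # toleranced dimensions: to be inspected
    THEORETICAL_VALUE = "theoretical_value"  # boxed, theoretically exact dimensions
    REFERENCE_VALUE = "reference_value"      # bracketed, reference-only dimensions
    TEXT_DESCRIPTION = "text_description"
    TABLE = "table"

@dataclass
class DetectionUnit:
    unit_id: int                 # index used for association matching (claim 9)
    cx: float                    # oriented bounding box (claim 2): center x
    cy: float                    # center y
    w: float                     # width
    h: float                     # height
    theta: float                 # rotation angle in radians
    category: ElementCategory
    category_confidence: float

@dataclass
class DimensionRecord:
    """First structured data per claim 5: nominal value, tolerance,
    geometric symbol, and quantity."""
    nominal: float
    tolerance: str
    geometric_symbol: Optional[str] = None
    quantity: int = 1
```

Separating the category (what kind of element this is) from the extracted record (what the element says) is what lets the pipeline route measurement items to inspection while leaving reference values unmeasured.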
To solve this technical problem, the invention provides an automatic bubble labeling and information extraction method for engineering drawings based on a hybrid vision model, comprising the following steps. S1, inputting original image data of an engineering drawing. S2, detecting and classifying the original image data based on a rotated object detection model to obtain detection result data comprising a plurality of detection units, each detection unit comprising oriented bounding box coordinate information, a drawing element category, and a category confidence; the drawing element categories comprise measurement items, theoretical values, reference values, text descriptions, and tables, wherein the measurement items comprise all toleranced dimension labels, the theoretical values correspond to theoretically exact dimensions marked by boxes, and the reference values correspond to reference dimensions marked by brackets. S3, performing information extraction on the image region corresponding to each detection unit: cropping the corresponding region image data from the original image data according to the category of the detection unit; if the category is a measurement item, a theoretical value, or a reference value, inputting the region image data into a fine-tuned vision language model (VLM) to obtain first structured data and a corresponding generation confidence; if the category is a text description, inputting the region image data into the vision language model (VLM) to obtain second structured data and a corresponding generation confidence; if the category is a table, processing the region image data sequentially through an optical character recognition (OCR) module and a large language model (LLM) to obtain third structured data. S4, performing generation confidence verification on the first structured data and the second structured data: when the generation confidence is lower than a preset threshold, invoking the OCR module to recognize the corresponding region image data to obtain OCR verification data, and correcting the first structured data or the second structured data accordingly to obtain corrected structured data; when the generation confidence is higher than or equal to the preset threshold, taking the first structured data or the second structured data directly as the corrected structured data. S5, based on the spatial position information of the detection units whose category is measurement item, assigning a unique bubble number to each measurement item detection unit according to a preset spatial ordering rule, and generating number mapping data. S6, performing association matching on the corrected structured data, the third structured data, and the detection result data to obtain enhanced detection records, and integrating the enhanced detection records with the number mapping data to generate a final output result comprising a bubble labeling layer and a structured inspection list. The rotated object detection model in S2 is a one-stage object detection network supporting rotated box prediction.
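
The description leaves the confidence aggregation, the OCR arbitration policy, and the coordinate convention open, so the following is a minimal sketch under stated assumptions: the VLM is assumed to expose per-token probabilities (claim 4), the geometric mean is one plausible aggregation, `ocr` is a hypothetical stand-in for the OCR module of S4, and the clockwise numbering of claim 8 is taken in a y-up frame (with image coordinates, where y grows downward, the rotation sense flips). The `DetectionUnit` fields reuse the earlier sketch.

```python
# Hedged sketch of S4 (confidence-gated OCR verification) and S5 (clockwise
# bubble numbering, claim 8). The aggregation rule, threshold, and arbitration
# policy are illustrative assumptions, not fixed by the patent text.
import math

def generation_confidence(key_token_probs: list[float]) -> float:
    """Claim 4: aggregate the prediction probabilities of the key output
    tokens. The geometric mean is one plausible aggregation."""
    log_sum = sum(math.log(max(p, 1e-9)) for p in key_token_probs)
    return math.exp(log_sum / len(key_token_probs))

def verify(vlm_text: str, confidence: float, region_image, ocr, threshold: float = 0.9) -> str:
    """S4: call OCR on the region only when the generation confidence is low."""
    if confidence >= threshold:
        return vlm_text  # trust the VLM output directly
    ocr_text = ocr(region_image)
    # Claim 6 allows character- or semantic-level comparison; this sketch
    # arbitrates at the character level and prefers OCR on disagreement.
    return vlm_text if ocr_text.strip() == vlm_text.strip() else ocr_text

def assign_bubble_numbers(units, center) -> dict:
    """S5 / claim 8: number measurement items clockwise around a center
    point by decreasing angle (y-up frame assumed)."""
    cx, cy = center
    ordered = sorted(units, key=lambda u: math.atan2(u.cy - cy, u.cx - cx), reverse=True)
    return {u.unit_id: number for number, u in enumerate(ordered, start=1)}
```

The returned mapping from detection unit ID to bubble number corresponds to the number mapping data of S5, which S6 then joins with the corrected and third structured data, keyed by unit ID as in claim 9.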