CN-121582260-B - Automatic lung nodule labeling method based on multi-modal semantic affinity iterative optimization


Abstract

An embodiment of the invention provides an automatic lung nodule labeling method based on multi-modal semantic affinity iterative optimization, belonging to the technical field of data processing. The method comprises: obtaining a lung CT image and its text report; performing structured semantic parsing on the text report; extracting visual features from the lung CT image; applying semantic guidance to the deep visual feature representation based on the key semantic information to generate semantically enhanced visual guidance features; calling a segmentation foundation model based on the semantically enhanced visual guidance features and generating a plurality of candidate segmentation masks by introducing an active mutation mechanism; calculating a semantic matching score for each candidate segmentation mask; and comparing the semantic matching scores of all candidate segmentation masks and outputting the candidate with the highest score as the final lung nodule labeling result. The scheme improves labeling precision, adaptability, and interpretability.

Inventors

LIANG WEI
ZHANG JINGYI
CHEN YAN
ZHANG JUNJIE
WEN QING

Assignees

Hunan University of Technology and Business (湖南工商大学)

Dates

Publication Date: 20260508
Application Date: 20260127

Claims (5)

1. An automatic lung nodule labeling method based on multi-modal semantic affinity iterative optimization, characterized by comprising the following steps:
Step 1, acquiring a lung CT image to be labeled and a text report that corresponds to the lung CT image and describes the characteristics of a lung nodule;
Step 2, performing structured semantic parsing on the text report, and extracting the key semantic information on nodule position and morphological characteristics contained in the text report;
Step 3, extracting visual features from the lung CT image to obtain a deep visual feature representation;
Step 4, applying semantic guidance to the deep visual feature representation based on the key semantic information to generate semantically enhanced visual guidance features;
Step 5, calling a segmentation foundation model based on the semantically enhanced visual guidance features, and generating a plurality of candidate segmentation masks by introducing an active mutation mechanism, wherein the active mutation mechanism comprises a prompt mutation strategy, which specifically comprises: deriving an initial bounding box prompt $B_0 = (x_0, y_0, w_0, h_0)$ from the visual guidance features, wherein $x_0$ and $y_0$ respectively represent the abscissa and ordinate of the top-left vertex of the bounding box in the image, and $w_0$ and $h_0$ respectively represent the width and height of the bounding box; generating the $i$-th mutated bounding box prompt $B_i$ by the formula $B_i = (x_0 + \Delta x_i,\ y_0 + \Delta y_i,\ \alpha_i w_0,\ \beta_i h_0)$, wherein $\Delta x_i$ and $\Delta y_i$ are random position offsets within a preset range, and $\alpha_i$ and $\beta_i$ are random scale factors within a preset range; and taking each mutated bounding box prompt $B_i$ as input and calling the segmentation foundation model to generate the corresponding candidate segmentation mask $M_i$;
Step 6, for each candidate segmentation mask, calculating, with a pre-trained cross-modal semantic matching model, the cosine similarity between the image region features corresponding to the candidate segmentation mask and the text description features corresponding to the text report, and taking the cosine similarity as the semantic matching score of the candidate segmentation mask; wherein Step 6 specifically comprises:
Step 6.1, for the $i$-th candidate segmentation mask $M_i$, cropping the image region $R_i$ from the lung CT image according to the pixel coordinates marked as foreground in $M_i$;
Step 6.2, inputting the image region $R_i$ into the image encoder of the cross-modal semantic matching model to obtain an image feature embedding vector $\mathbf{f}_i$;
Step 6.3, inputting the text report into the text encoder of the cross-modal semantic matching model to obtain a text feature embedding vector $\mathbf{t}$;
Step 6.4, calculating the cosine similarity between the image feature embedding vector $\mathbf{f}_i$ and the text feature embedding vector $\mathbf{t}$ as the semantic matching score $s_i$ of the candidate segmentation mask $M_i$, i.e. $s_i = \dfrac{\mathbf{f}_i \cdot \mathbf{t}}{\lVert \mathbf{f}_i \rVert \, \lVert \mathbf{t} \rVert}$;
Step 7, comparing the semantic matching scores of all candidate segmentation masks, and selecting the candidate segmentation mask with the highest score for output as the final lung nodule labeling result.
2. The method according to claim 1, wherein Step 2 specifically comprises: parsing the text report, by means of a parsing function using a preset parsing scheme, into a structured semantic information set $S = \{(e_k, a_k, v_k)\}_{k=1}^{K}$ consisting of K triples, which serves as the key semantic information, wherein in each triple $(e_k, a_k, v_k)$, $e_k$ represents the medical entity being described, $a_k$ represents an attribute dimension of the medical entity, and $v_k$ represents the specific description value of the attribute dimension, the attribute dimension $a_k$ including location, size, density, morphology, and edge features.
3. The method according to claim 2, wherein the preset parsing scheme includes an implicit parsing method based on a large language model and an explicit parsing method based on rules and a knowledge base; the implicit parsing method based on the large language model comprises: embedding the text report into a predefined instruction prompt template, inputting the resulting prompt into a large language model, obtaining JSON-format output containing the triples, and converting the JSON-format output into the structured semantic information set S; the explicit parsing method based on rules and a knowledge base comprises: performing word segmentation and dependency parsing on the text report, performing pattern matching against a predefined medical description dictionary to identify an attribute value $v_k$ as the specific description value of an attribute dimension, and determining, from the syntactic dependencies, the medical entity $e_k$ that the attribute value modifies, to form the triples.
4. The method of claim 3, wherein the active mutation mechanism comprises at least one of the following strategies: a prompt mutation strategy, comprising determining initial prompt information based on the visual guidance features, generating a plurality of different prompts by applying random perturbations to the position and/or scale parameters of the initial prompt information, and inputting the different prompts into the segmentation foundation model respectively; a confidence-threshold mutation strategy, comprising fixing the prompt information input to the segmentation foundation model, and binarizing the probability map output by the model with a set of different thresholds to generate a plurality of different candidate segmentation masks; and a model-internal randomness utilization strategy, comprising calling the segmentation foundation model multiple times with different random seeds to obtain a plurality of different candidate segmentation masks.
5. The method of any one of claims 1 to 4, wherein the cross-modal semantic matching model is a vision-language dual-encoder model pre-trained by contrastive learning on a large-scale image-text pair dataset.
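
The prompt mutation strategy of claim 1 and the confidence-threshold strategy of claim 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes position offsets are additive and scale factors multiplicative, and the names `Box`, `mutate_box`, `threshold_masks`, the shift range, the scale range, and the seed are all hypothetical choices made for illustration. No real segmentation model is called; the probability map is a hand-written toy input.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    x: float  # abscissa of the top-left corner
    y: float  # ordinate of the top-left corner
    w: float  # width
    h: float  # height


def mutate_box(box: Box, n: int, max_shift: float = 5.0,
               scale_range: Tuple[float, float] = (0.8, 1.2),
               seed: int = 0) -> List[Box]:
    """Return n mutated boxes (assumed form: additive shift, multiplicative scale)."""
    rng = random.Random(seed)  # local generator so results are reproducible
    variants = []
    for _ in range(n):
        dx = rng.uniform(-max_shift, max_shift)   # random position offset in x
        dy = rng.uniform(-max_shift, max_shift)   # random position offset in y
        alpha = rng.uniform(*scale_range)         # random width scale factor
        beta = rng.uniform(*scale_range)          # random height scale factor
        variants.append(Box(box.x + dx, box.y + dy, box.w * alpha, box.h * beta))
    return variants


def threshold_masks(prob_map: Sequence[Sequence[float]],
                    thresholds: Sequence[float]) -> List[List[List[int]]]:
    """Binarize one probability map at several thresholds, one mask per threshold."""
    return [[[1 if p >= t else 0 for p in row] for row in prob_map]
            for t in thresholds]


if __name__ == "__main__":
    for b in mutate_box(Box(100, 120, 20, 16), n=3):
        print(b)
    print(threshold_masks([[0.2, 0.6], [0.9, 0.4]], [0.3, 0.5, 0.7]))
```

In a full pipeline each mutated box would be passed as a prompt to the segmentation foundation model, and each binarized map would be treated as one candidate segmentation mask.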

Description

Automatic lung nodule labeling method based on multi-modal semantic affinity iterative optimization

Technical Field

The invention relates to the technical field of data processing, and in particular to an automatic lung nodule labeling method based on multi-modal semantic affinity iterative optimization.

Background

Accurate labeling of lung nodules is a key step in early computer-aided diagnosis of lung cancer, and automatic labeling methods based on deep learning have been proposed to improve efficiency. Existing methods fall into two types. The first is a purely visual model that relies solely on CT images; its segmentation results cannot be semantically aligned with the text report describing the nodule characteristics. The second is a multi-modal method that attempts to fuse image and text; it uses feature concatenation or an attention mechanism to make a single-pass prediction, and lacks a mechanism for quantitatively evaluating and iteratively optimizing the fine-grained semantic consistency between the segmentation result and the text description. There is therefore a need for an automatic lung nodule labeling method that uses cross-modal semantic matching for iterative optimization and thereby automatically generates lung nodule labels that are highly consistent with the text description.

Disclosure of Invention

In view of the above, embodiments of the invention provide an automatic lung nodule labeling method based on multi-modal semantic affinity iterative optimization, which at least partially solves the problems of poor labeling efficiency and accuracy in the prior art.
An embodiment of the invention provides an automatic lung nodule labeling method based on multi-modal semantic affinity iterative optimization, which comprises the following steps:
Step 1, acquiring a lung CT image to be labeled and a text report that corresponds to the lung CT image and describes the characteristics of a lung nodule;
Step 2, performing structured semantic parsing on the text report, and extracting the key semantic information on nodule position and morphological characteristics contained in the text report;
Step 3, extracting visual features from the lung CT image to obtain a deep visual feature representation;
Step 4, applying semantic guidance to the deep visual feature representation based on the key semantic information to generate semantically enhanced visual guidance features;
Step 5, calling a segmentation foundation model based on the semantically enhanced visual guidance features, and generating a plurality of candidate segmentation masks by introducing an active mutation mechanism;
Step 6, for each candidate segmentation mask, calculating, with a pre-trained cross-modal semantic matching model, the cosine similarity between the image region features corresponding to the candidate segmentation mask and the text description features corresponding to the text report, and taking the cosine similarity as the semantic matching score of the candidate segmentation mask;
Step 7, comparing the semantic matching scores of all candidate segmentation masks, and selecting the candidate segmentation mask with the highest score for output as the final lung nodule labeling result.
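
Steps 6 and 7 (score every candidate mask by cosine similarity, then keep the highest-scoring one) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `toy_encoder` is a hypothetical stand-in for the image encoder of the pre-trained cross-modal matching model, the text embedding is a hand-written vector rather than real text-encoder output, and the helper names `crop_to_mask`, `cosine`, and `select_best_mask` are invented for this sketch.

```python
import math
from typing import Callable, List, Sequence, Tuple

Mask = List[List[int]]      # 1 = foreground, 0 = background
Image = List[List[float]]   # one 2D CT slice as nested lists


def crop_to_mask(image: Image, mask: Mask) -> Image:
    """Cut out the bounding rectangle of the foreground; zero the background."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return []  # empty mask yields an empty region
    return [[image[r][c] if mask[r][c] else 0.0
             for c in range(cols[0], cols[-1] + 1)]
            for r in range(rows[0], rows[-1] + 1)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def toy_encoder(region: Image) -> List[float]:
    """Hypothetical stand-in encoder: [mean intensity, pixel count]."""
    flat = [p for row in region for p in row]
    return [sum(flat) / len(flat), float(len(flat))] if flat else [0.0, 0.0]


def select_best_mask(image: Image, masks: Sequence[Mask],
                     text_vec: Sequence[float],
                     encode_image: Callable[[Image], Sequence[float]]
                     ) -> Tuple[int, List[float]]:
    """Return the index of the highest-scoring mask and all scores."""
    scores = [cosine(encode_image(crop_to_mask(image, m)), text_vec)
              for m in masks]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    image = [[0.0, 0.0, 0.0], [0.0, 9.0, 9.0], [0.0, 9.0, 9.0]]
    mask_a = [[0, 0, 0], [0, 1, 1], [0, 1, 1]]   # covers the bright region
    mask_b = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]   # mostly background
    print(select_best_mask(image, [mask_a, mask_b], [9.0, 4.0], toy_encoder))
```

With a real dual-encoder model, `encode_image` would wrap that model's image encoder and `text_vec` would come from its text encoder applied to the report; the selection logic would stay the same.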
According to a specific implementation of the embodiment of the invention, Step 2 specifically comprises: parsing the text report, by means of a parsing function using a preset parsing scheme, into a structured semantic information set $S = \{(e_k, a_k, v_k)\}_{k=1}^{K}$ consisting of K triples, which serves as the key semantic information, wherein in each triple $(e_k, a_k, v_k)$, $e_k$ represents the medical entity being described, $a_k$ represents an attribute dimension of the medical entity, and $v_k$ represents the specific description value of the attribute dimension, the attribute dimension $a_k$ including location, size, density, morphology, and edge features.

According to a specific implementation of the embodiment of the invention, the preset parsing scheme comprises an implicit parsing method based on a large language model and an explicit parsing method based on rules and a knowledge base; the implicit parsing method based on the large language model comprises: embedding the text report into a predefined instruction prompt template, inputting the resulting prompt into a large language model, obtaining JSON-format output containing the triples, and converting the JSON-format output into the structured semantic information set S; the explicit parsing method based on rules and a knowledge base comprises: performing word segmentation and dependency parsing on the text report, performing pattern matching against a predefined medical description dictionary to identify an attribute value $v_k$ as the specific description value of an attribute dimension, and determining, from the syntactic dependencies, the medical entity $e_k$ that the attribute value modifies, to fo