CN-122025180-A - Method and system for identifying condylar lesions in CBCT image based on multi-modal information


Abstract

A method for identifying condylar lesions in CBCT images based on multi-modal information comprises: preprocessing the CBCT image, the patient's quantitative data, and a medical knowledge corpus; extracting an image depth feature vector from the preprocessed CBCT image with a deep learning model and obtaining a structured quantitative data feature vector; performing a preliminary fusion of the image features and the quantitative data features to generate a comprehensive query vector; searching a pre-encoded medical knowledge text embedding library with that query vector to dynamically retrieve the knowledge most relevant to the current case and form a knowledge feature vector; performing a final fusion of the image features, quantitative data features, and knowledge features into a final fused feature vector; and inputting that vector into a lesion recognition model to output a lesion probability. A system for identifying condylar lesions in CBCT images based on multi-modal information is also provided. The method and system significantly improve the accuracy, robustness, and interpretability of condylar lesion identification.

Inventors

HAO PENGYI
CHEN RUI
WU FULI

Assignees

Zhejiang University of Technology (浙江工业大学)

Dates

Publication Date: 20260512
Application Date: 20260203

Claims (10)

1. A method for identifying condylar lesions in a CBCT image based on multi-modal information, the method comprising the following steps: Step S1, preprocessing the original CBCT image and the patient text data related to the CBCT image, outputting a preprocessed image and an encoded data feature vector, and using an encoder to encode a medical knowledge corpus into a vector library for fast retrieval; Step S2, inputting the CBCT image into a self-supervised image encoder, pre-trained on a large number of unlabeled CBCT images with a self-supervised contrastive learning framework, to extract an image depth feature vector; Step S3, inputting the image depth feature vector and the quantitative data feature vector into a trainable multi-layer perceptron module for fusion to generate a comprehensive query vector; Step S4, performing similarity retrieval between the comprehensive query vector and the knowledge base matrix, forming a matrix from the highest-ranked vectors, and aggregating that matrix by average pooling to generate a final knowledge feature vector; Step S5, concatenating the image depth feature vector, the quantitative data feature vector, and the knowledge feature vector to form a final fused feature vector, inputting the fused feature vector into a classifier to output a lesion probability, and comparing the output probability with a preset threshold to obtain the final lesion recognition result.
2. The method for identifying condylar lesions in CBCT images based on multi-modal information as claimed in claim 1, wherein step S1 proceeds as follows: Step 1.1, preprocessing the original CBCT image, including size normalization, three-dimensional denoising, and gray-value normalization, and enhancing the contrast of the lesion area by applying three-dimensional contrast-limited adaptive histogram equalization, to obtain a preprocessed image; Step 1.2, preprocessing the patient text data by applying missing-value imputation and standardization to the numerical features and missing-value imputation and one-hot encoding to the categorical features, and finally concatenating the results into a unified quantitative data feature vector; Step 1.3, processing the original medical knowledge text set in sequence with custom-dictionary word segmentation, stop-word removal, lemmatization, and similar steps to form a standardized medical knowledge corpus whose dimension is specified.
3. The method for identifying condylar lesions in CBCT images based on multi-modal information of claim 2, wherein step S1 further comprises: Step 1.4, using a language model as the text encoder and, with its matching tokenizer, converting each text unit of the corpus into an input tensor that includes two special tokens, the dimension of the input tensor being specified; Step 1.5, passing the input tensor through the model in a forward pass and extracting the hidden state vector corresponding to the token at the start position as the final semantic embedding vector of that text unit, the dimension of the embedding vector being specified; Step 1.6, collecting the semantic embedding vectors generated for all text units in the corpus to construct and store a pre-encoded text embedding vector library for fast retrieval, the dimension of the library being specified.
4. The method for identifying condylar lesions in CBCT images based on multi-modal information as claimed in claim 1, wherein step S2 proceeds as follows: Step 2.1, loading, as the image encoder, a three-dimensional Vision Transformer model that has been pre-trained on a large number of unlabeled CBCT images with a self-supervised contrastive learning framework; Step 2.2, dividing the input CBCT image into a series of regular, non-overlapping three-dimensional patches, each of a fixed size in voxels, so that the image is converted into a set of three-dimensional patches whose dimension is specified; Step 2.3, flattening each three-dimensional patch into a one-dimensional vector and mapping it to a vector of fixed dimension through a trainable linear projection layer to form a patch embedding, then arranging all patch embeddings in spatial order to form the patch embedding sequence, whose dimension is specified.
5. The method for identifying condylar lesions in CBCT images based on multi-modal information of claim 4, wherein step S2 further comprises: Step 2.4, prepending to the patch embedding sequence a learnable special classification token whose embedding vector has the same dimension as the patch embeddings, then adding a corresponding learnable position embedding to each embedding in the sequence, which gives the model the spatial position of each patch in the original three-dimensional image and forms the final model input sequence, whose dimension is specified; Step 2.5, feeding the complete input sequence to the image encoder, where it passes in turn through multiple stacked encoder layers, each comprising a multi-head self-attention module and a feed-forward network, enabling the model to compute global dependencies among all patches in the sequence; Step 2.6, obtaining the hidden state vectors of all positions from the final output layer of the model, extracting from them the output vector corresponding to the token at the sequence start position, and taking that vector as the depth feature vector representing the global semantic information of the whole CBCT image, whose dimension is specified.
6. The method for identifying condylar lesions in CBCT images based on multi-modal information as claimed in claim 1, wherein step S3 proceeds as follows: Step 3.1, concatenating the image depth feature vector and the quantitative data feature vector along the feature dimension to form a joint feature vector; Step 3.2, constructing a multi-layer perceptron module comprising a hidden layer and an output layer, the hidden layer comprising a fully connected layer that maps the input dimension to the hidden dimension and using ReLU as the nonlinear activation function, and the output layer comprising a fully connected layer that maps the hidden dimension to the final fusion dimension; Step 3.3, inputting the joint feature vector into the hidden layer of the constructed multi-layer perceptron module, which applies a linear transformation with its learnable weight matrix and bias vector and then applies the activation function to obtain the hidden-layer feature representation; Step 3.4, passing the hidden-layer feature representation to the output layer of the multi-layer perceptron module, which applies a linear projection with its weight matrix and bias vector to generate the final comprehensive query vector.
7. The method for identifying condylar lesions in CBCT images based on multi-modal information as claimed in claim 1, wherein step S4 proceeds as follows: Step 4.1, for each text embedding vector in the vector library, calculating the cosine similarity between that vector and the comprehensive query vector to obtain a list of similarity values, one per library entry; Step 4.2, sorting the similarity list in descending order and, according to the sorted result, selecting the text embedding vectors with the highest similarity, their number being a preset integer, to form the set of embedding vectors of the related knowledge texts, each vector having the text embedding dimension; Step 4.3, applying average pooling to all selected vectors, that is, averaging the values in each dimension, to obtain the aggregated knowledge feature vector, whose dimension is specified.
8. The method for identifying condylar lesions in CBCT images based on multi-modal information as claimed in claim 1, wherein step S5 proceeds as follows: Step 5.1, concatenating the three feature vectors, namely the image depth feature vector, the quantitative data feature vector, and the knowledge feature vector, along the feature dimension to obtain the final fused feature vector, whose dimension is specified; Step 5.2, defining and initializing a multi-layer perceptron classifier as the lesion recognition model, whose architecture and learnable parameters are defined as follows: a first hidden layer, which is a fully connected layer with its own weight and bias that maps the input dimension to a first hidden dimension, followed by an activation function and a Dropout layer; a second hidden layer, which is a fully connected layer with its own weight and bias that maps the first hidden dimension to a second hidden dimension, followed by an activation function and a Dropout layer; and an output layer, which is a fully connected layer with its own weight and bias that maps the second hidden dimension to 1, followed by an activation function.
9. The method for identifying condylar lesions in CBCT images based on multi-modal information of claim 8, wherein step S5 further comprises: Step 5.3, inputting the final fused feature vector into the first hidden layer of the lesion recognition model, which first performs a linear transformation and then applies an activation function to introduce nonlinearity; Step 5.4, passing the result through a Dropout layer, which during model training randomly sets the output of a portion of the neurons to zero with a preset probability; Step 5.5, inputting the result into the second hidden layer, which first performs a linear transformation and then applies an activation function to introduce nonlinearity; Step 5.6, passing the result through the second Dropout layer, which enhances the generalization ability of the model; Step 5.7, inputting the result into the output layer, which performs the final linear transformation to obtain an unnormalized logit; Step 5.8, inputting the logit into the output activation function, which maps it into a bounded interval to obtain the final lesion probability; Step 5.9, comparing the lesion probability output by the model with a preset classification threshold, the case being judged as a lesion when the probability satisfies the threshold condition and as normal otherwise.
10. A system for implementing the method for identifying condylar lesions in CBCT images based on multi-modal information according to claim 1, comprising: an image preprocessing module for preprocessing the original CBCT image and the patient text data related to the image, outputting a preprocessed image and an encoded data feature vector, and using an encoder to encode a medical knowledge corpus into a vector library for fast retrieval; an image feature extraction module for inputting the CBCT image into a self-supervised image encoder, pre-trained on a large number of unlabeled CBCT images with a self-supervised contrastive learning framework, to extract an image depth feature vector; an image and quantitative data feature fusion module for inputting the image depth feature vector and the quantitative data feature vector into a trainable multi-layer perceptron module for fusion to generate a comprehensive query vector; a knowledge retrieval and aggregation module for performing similarity retrieval between the comprehensive query vector and the knowledge base matrix, forming a matrix from a preset number of the highest-ranked vectors, and aggregating that matrix by average pooling to generate a final knowledge feature vector; and a final classification decision module for concatenating the image depth feature vector, the quantitative data feature vector, and the knowledge feature vector to form a final fused feature vector, inputting the fused feature vector into a classifier to output a lesion probability, and comparing the output probability with a preset threshold to obtain the final lesion recognition result.
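
For illustration only, the data flow of steps S3 to S5 in the claims above (fusion into a query vector, cosine-similarity retrieval with average pooling, and thresholded classification) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation of the invention: the weights are random and untrained, all dimensions are toy values, the function and parameter names are hypothetical, the fusion output dimension is assumed to equal the text embedding dimension so that cosine similarity is defined, ReLU in the classifier and a sigmoid output are assumed because the claims do not preserve the names of those activation functions, and the decision rule "probability >= threshold" is likewise an assumption. Dropout is omitted because it is only active during training.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row, y_i = w_i . x + b_i."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def cosine(a, b):
    """Cosine similarity; the small epsilon guards against zero-length vectors."""
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / (norm_a * norm_b + 1e-12)


def retrieve_and_pool(query, library, k):
    """Step S4: rank library rows by cosine similarity, keep the top k, average them."""
    order = sorted(range(len(library)),
                   key=lambda i: cosine(query, library[i]), reverse=True)
    top = order[:k]
    pooled = [sum(col) / len(top) for col in zip(*(library[i] for i in top))]
    return top, pooled


def identify(f_img, f_tab, library, params, k, threshold):
    # Step S3: concatenate image and patient data features, fuse with a small MLP.
    joint = f_img + f_tab
    hidden = relu(linear(joint, *params["fuse_hidden"]))
    query = linear(hidden, *params["fuse_out"])
    # Step S4: retrieve the most similar knowledge embeddings and mean-pool them.
    top, f_know = retrieve_and_pool(query, library, k)
    # Step S5: concatenate all three vectors and classify (inference, no dropout).
    fused = f_img + f_tab + f_know
    h1 = relu(linear(fused, *params["cls_h1"]))    # assumed activation: ReLU
    h2 = relu(linear(h1, *params["cls_h2"]))       # assumed activation: ReLU
    logit = linear(h2, *params["cls_out"])[0]
    prob = 1.0 / (1.0 + math.exp(-logit))          # assumed output: sigmoid
    return prob, int(prob >= threshold), top       # assumed rule: >= threshold


def random_layer(n_out, n_in, rng):
    """Hypothetical untrained weights, used only to exercise the shapes."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
d_img, d_tab, d_txt, d_hid = 4, 3, 5, 6            # toy dimensions
params = {
    "fuse_hidden": random_layer(d_hid, d_img + d_tab, rng),
    "fuse_out": random_layer(d_txt, d_hid, rng),
    "cls_h1": random_layer(d_hid, d_img + d_tab + d_txt, rng),
    "cls_h2": random_layer(d_hid, d_hid, rng),
    "cls_out": random_layer(1, d_hid, rng),
}
library = [[rng.uniform(-1, 1) for _ in range(d_txt)] for _ in range(8)]
f_img = [rng.uniform(-1, 1) for _ in range(d_img)]
f_tab = [rng.uniform(-1, 1) for _ in range(d_tab)]

prob, label, top = identify(f_img, f_tab, library, params, k=3, threshold=0.5)
print("lesion probability:", prob, "label:", label, "retrieved entries:", top)
```

Returning the indices of the retrieved knowledge entries alongside the probability reflects the interpretability argument made in the description: the texts behind those indices can be shown to a clinician as the knowledge that informed the prediction.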

Description

Method and system for identifying condylar lesions in CBCT image based on multi-modal information

Technical Field

The invention belongs to the field of medical image processing, and particularly relates to a method and a system for identifying condylar lesions in a CBCT image based on multi-modal information.

Background

The mandibular condyle (condyle for short) is a core functional component of the temporomandibular joint (TMJ) and bears a large biomechanical load during complex maxillofacial movements such as chewing and speech. Because of this constant functional loading and its complex anatomy, the condyle is highly susceptible to various pathological changes. Clinically, condylar lesions appear mainly as abnormal changes in bone structure, and their diagnosis depends heavily on imaging examination, in particular on bony changes of the condyle such as sclerosis, erosion, and morphological abnormality. The occurrence and development of these lesions directly reflect the health of the joint. Accurate identification and assessment of condylar lesions is therefore of decisive clinical significance for early diagnosis, disease staging, and treatment planning.

With its high spatial resolution and excellent imaging of hard tissue, cone beam computed tomography (CBCT) has become one of the most reliable imaging methods for assessing condylar bone structure. It clearly displays the fine structure of the condylar cortical and cancellous bone and provides rich detail for lesion diagnosis. However, the conventional CBCT diagnostic workflow relies largely on the subjective experience of the physician, which is time-consuming and laborious and makes the results susceptible to inter-observer variation. Especially for early or atypical lesions, minor bone changes are easily overlooked, which can lead to missed diagnosis or misdiagnosis.

Moreover, few studies currently use deep learning to identify condylar lesions in CBCT images, and the prior art relies primarily on a single image modality. This is limiting when dealing with complex temporomandibular joint diseases, because it ignores other clinical data that can provide valuable supplementary information. Furthermore, purely data-driven deep learning models often lack interpretability: their "black box" nature makes the decision process difficult to understand and trust, which is a significant obstacle in the medical field, where high reliability and safety are required.

To overcome these challenges, multi-modal methods that effectively fuse image information, clinical quantitative data, and domain expert knowledge are becoming an important trend in medical artificial intelligence research. Fusing data from different sources builds a more comprehensive view of the patient and thereby improves the accuracy and robustness of the diagnostic model. Introducing a medical knowledge base also provides prior information and constraints for the model's decisions, enhances its interpretability, and brings its diagnostic logic closer to the reasoning of a clinician. Developing a multi-modal intelligent diagnosis method that fuses images, patient quantitative data, and medical text knowledge therefore promises to break through the bottleneck of existing single-modality diagnostic techniques and offers a new solution for accurate and efficient identification of temporomandibular joint condylar lesions.
Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a method and a system for identifying condylar lesions in a CBCT image based on multi-modal information, which identify condylar lesions in CBCT images automatically and accurately and improve the reliability and efficiency of diagnosis.

The technical solution adopted to solve the technical problem is as follows:

A method for identifying condylar lesions in CBCT images based on multi-modal information comprises the following steps:

Step S1, preprocessing the original CBCT image and the patient text data related to the CBCT image, where H, W, and D denote the image height, width, and depth, and a separate dimension denotes the number of input features of the original patient text data, to obtain an output image and an encoded data feature vector whose dimension is that of the encoded patient data feature vector; and using an encoder to encode the medical knowledge corpus into a vector library for fast retrieval, the library being characterized by the total number of text entries contained in the medical knowledge corpus and by the dimension of the semantic feature vector output by the encoder;

Step S2, the CBCT image is input to an image encoder that has been pre-trained on a large number of unlabeled CBCT images, extracting image depth