CN-122025099-A - Lung cancer diagnosis and prognosis system based on a vision-language foundation model

Abstract

The application relates to the technical field of artificial intelligence and medical image processing, and discloses a lung cancer diagnosis and prognosis system based on a vision-language foundation model. The system comprises a database construction module, a vision-language contrastive pre-training module, a radiology report generation module and a downstream task prediction module. The database construction module standardizes the 3D chest CT images and the reports. The pre-training module achieves semantic alignment of image and text features by jointly optimizing a contrastive loss and an auxiliary classification loss. The report generation module compresses the visual features with a 3D spatial pooling perceptron and feeds them to a large language model to generate a diagnosis report. The downstream task prediction module reuses the visual encoder to output multidimensional assessment results such as malignancy classification, subtyping, metastasis and survival prognosis. Through cross-modal alignment and multi-task joint learning, the application addresses the bottleneck in processing 3D image features and the separation between modalities, and enables automatic assessment and auxiliary diagnosis across the whole lung cancer workflow.

Inventors

LIU CHENGCAI
WANG SHUO
SANG HAOLIN
WANG JINGWEN
HUANG XINGYU
WU GE
WANG CHENGHAO

Assignees

Beihang University (北京航空航天大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-30

Claims (10)

1. A lung cancer diagnosis and prognosis system based on a vision-language foundation model, comprising: a database construction module (10) for acquiring paired 3D chest CT image sequences and radiology reports and preprocessing them to obtain standardized lung region-of-interest data and structured abnormal-sign labels; a vision-language contrastive pre-training module (20) comprising a visual encoder (21) with a 3D Swin Transformer architecture and a language encoder (22) with a BERT architecture, for extracting semantically aligned visual features and text features by jointly optimizing a contrastive loss function and a disease-specific auxiliary classification loss function; a radiology report generation module (30) for compressing and projecting the visual features with a 3D spatial pooling perceptron (31) and feeding them to a large language model fine-tuned by low-rank adaptation to generate a radiology diagnosis report; and a downstream task prediction module (40) that reuses the visual encoder (21) and connects it to a plurality of independent task prediction heads, for outputting a lung cancer malignancy classification result, a subtyping result, a metastasis prediction result and a survival prognosis evaluation result based on the global visual features output by the visual encoder (21).
2. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 1, wherein the database construction module (10) comprises a data preprocessing unit for: segmenting the 3D chest CT image sequence with a pre-trained DenseNet-FPN model to obtain a lung mask, and cropping the bounding cuboid of the lung mask as the lung region of interest; performing intensity clipping on the voxels in the lung region of interest to remove high-intensity noise, followed by Z-score normalization; and scaling the processed data to a fixed-size three-dimensional tensor as input to the visual encoder (21).
3. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 1, wherein the vision-language contrastive pre-training module (20) is configured for: dividing the input lung region-of-interest data into spatio-temporal patch embeddings with the visual encoder (21), and extracting a 3D visual feature map carrying spatial structure information and a global visual feature vector through multiple layers of shifted-window attention modules; converting the radiology report into a text feature embedding with the language encoder (22); and computing the contrastive loss function by maximizing the cosine similarity between paired global visual feature vectors and text feature embeddings and minimizing the cosine similarity between unpaired samples.
4. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 3, wherein the vision-language contrastive pre-training module (20) further comprises a shared disease-specific auxiliary classification head for: receiving the global visual feature vector and predicting the probabilities of the abnormal-sign labels through a fully connected layer and an activation function; computing a binary cross-entropy loss as the disease-specific auxiliary classification loss function from the predicted probabilities and the ground-truth abnormal-sign labels generated by the database construction module (10); and taking a weighted sum of the contrastive loss function and the disease-specific auxiliary classification loss function as the total pre-training objective.
5. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 1, wherein the 3D spatial pooling perceptron (31) in the radiology report generation module (30) is configured for: computing a spatial attention weight map of the visual features through a 3D convolution layer; aggregating the visual features weighted by the spatial attention weight map to obtain pooled features; and compressing the pooled features into a fixed number of visual tokens through layer normalization and a multi-layer perceptron, and mapping them to the embedding space of the large language model through a projection layer (32).
6. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 5, wherein the radiology report generation module (30) employs a two-stage training strategy: a first stage that freezes the visual encoder (21) and the large language model and trains only the 3D spatial pooling perceptron (31) and the projection layer (32); and a second stage that unfreezes the visual encoder (21) and fine-tunes the parameter matrices of the large language model using low-rank adaptation; wherein a semantic consistency loss function is introduced in the second stage to constrain the generated report and the input lung region of interest to be consistent in the semantic feature space.
7. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 1, wherein the downstream task prediction module (40) employs a multi-stage unfreezing transfer strategy: unfreezing the parameters of the visual encoder (21) in sequential stages for downstream tasks of different complexity; training only the task prediction head for simple tasks; and, for complex tasks, progressively unfreezing the parameters of the visual encoder (21) from deep layers to shallow layers and fine-tuning with layer-wise learning rates.
8. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 1, wherein the downstream task prediction module (40) comprises a malignancy classification prediction head (41) and a histological and genetic typing prediction head (42): the malignancy classification prediction head (41) outputs the benign/malignant probability of a lung nodule based on a binary cross-entropy loss function; and the histological and genetic typing prediction head (42) outputs lung cancer subtype classification results and gene mutation status based on a weighted cross-entropy loss function, wherein the class weights of the weighted cross-entropy loss function are set to the reciprocal of the number of samples in each class to mitigate class imbalance.
9. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 1, wherein the downstream task prediction module (40) comprises a survival prognosis prediction head (44) configured for: converting continuous survival time into classification labels; introducing an inverse probability weighting strategy that weights the uncensored samples using the censoring distribution function estimated by the Kaplan-Meier method; and computing a weighted binary cross-entropy loss and outputting risk assessment results for progression-free survival and overall survival.
10. The lung cancer diagnosis and prognosis system based on a vision-language foundation model according to claim 1, wherein the downstream task prediction module (40) further comprises a metastasis prediction head (43) configured for: modeling lymph node metastasis and distant metastasis as a multi-label classification problem; and, based on the global visual features, simultaneously predicting the occurrence probabilities of a plurality of metastasis categories using a multi-label binary cross-entropy loss function.
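
Illustrative note (not part of the claims): the pre-training objective described in claims 3 and 4 might be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation. The claims specify only cosine similarity between paired and unpaired samples, a binary cross-entropy auxiliary loss, and a weighted sum; the symmetric InfoNCE form, the `temperature` value, the `aux_weight` value, and all function names are assumptions introduced for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax_at(logits, index):
    """Log-probability of `index` under a softmax over `logits` (stable form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_sum


def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE (assumed form): pair i is positive, other pairings negative."""
    n = len(image_feats)
    sim = [[cosine(v, t) / temperature for t in text_feats] for v in image_feats]
    # Image-to-text direction: each row of the similarity matrix is one softmax.
    img_to_txt = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image direction: each column of the similarity matrix is one softmax.
    txt_to_img = -sum(
        log_softmax_at([sim[r][j] for r in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (img_to_txt + txt_to_img)


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all (sample, abnormal sign) entries."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def total_pretraining_loss(image_feats, text_feats, sign_probs, sign_labels,
                           aux_weight=0.5, temperature=0.07):
    """Weighted sum of the two losses; aux_weight is a hypothetical value."""
    return (contrastive_loss(image_feats, text_feats, temperature)
            + aux_weight * bce_loss(sign_probs, sign_labels))


if __name__ == "__main__":
    # Toy batch of two CT/report pairs with 2-D features and one abnormal sign.
    images = [[1.0, 0.0], [0.0, 1.0]]
    reports = [[0.9, 0.1], [0.2, 0.8]]
    print(total_pretraining_loss(images, reports, [[0.8], [0.3]], [[1], [0]]))
```

In this sketch a batch whose image and text features are correctly paired yields a lower contrastive term than the same batch with the pairs swapped, which is the behaviour the claim wording asks for.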

Description

Lung cancer diagnosis and prognosis system based on a vision-language foundation model

Technical Field

The invention relates to the technical field of artificial intelligence and medical image processing, and in particular to a lung cancer diagnosis and prognosis system based on a vision-language foundation model.

Background

Lung cancer is a malignant tumor with a very high mortality rate worldwide, and early screening and accurate diagnosis are important for improving patient survival. Chest computed tomography (CT) is currently the primary imaging modality for the clinical diagnosis of lung cancer. In the traditional workflow, a radiologist must browse hundreds of CT slices one by one, write the radiology report by hand, and subjectively assess whether a nodule is benign or malignant and what the patient's prognosis is, based on image features. This process is labor-intensive and time-consuming, and the consistency and accuracy of the diagnosis depend heavily on the clinician's experience and condition, making it prone to subjective bias.

Although deep learning has been widely applied to medical image analysis, existing auxiliary diagnosis systems still have a number of limitations in clinical practice. Most existing image analysis models handle a single task, for example lung nodule detection only or benign/malignant classification only. They lack the ability to assess the whole lung cancer workflow, struggle to cover complex clinical requirements such as histological typing, gene mutation status prediction, distant metastasis assessment and survival prediction within one framework, and leave the models for different stages of diagnosis and treatment disconnected from one another.
Meanwhile, to reduce computational complexity, mainstream methods often use two-dimensional slice input or pseudo-three-dimensional processing, which damages the three-dimensional spatial continuity of the lesion to some extent and loses key spatial context. In addition, existing vision models and natural language processing models are usually built independently and cannot make effective use of the rich semantic supervision contained in radiology reports, so a modality gap remains between visual features and medical text semantics, and the features lack semantic interpretability. In particular, when a large language model is introduced for automatic report generation, the massive number of feature tokens produced by high-dimensional three-dimensional medical images causes the computational cost to grow exponentially, making end-to-end training and efficient inference difficult. Moreover, for survival prognosis analysis, conventional regression or classification models have difficulty handling the right censoring that is common in clinical follow-up data, and simply discarding censored data often introduces systematic bias into the prognostic risk assessment.

Therefore, there is a need for a comprehensive diagnosis and prognosis system that deeply fuses three-dimensional images with text semantics, overcomes the computational bottleneck of high-dimensional data, and covers multi-dimensional clinical prediction tasks.
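
Illustrative note: the right-censoring issue raised in the Background, and the inverse probability weighting with a Kaplan-Meier censoring estimate recited in claim 9, might be sketched as follows. This is a sketch under stated assumptions rather than the disclosed implementation. The binary label at a single time `horizon`, the tie handling, the `min_g` guard, and all function names are assumptions introduced for illustration.

```python
import math


def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(censoring time > t).

    `events[i]` is 1 for an observed event and 0 for a censored sample, so
    censoring is treated as the "event" of this estimator. Assumption: ties
    between event and censoring times are not handled specially.
    """
    steps, surv = [], 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        if censored:
            surv *= 1.0 - censored / at_risk
        steps.append((u, surv))

    def g(t, left=False):
        """G(t), or the left limit G(t-) when left=True."""
        value = 1.0
        for u, s in steps:
            if u < t or (u == t and not left):
                value = s
            else:
                break
        return value

    return g


def ipcw_binary_targets(times, events, horizon, min_g=1e-6):
    """Binary labels ("event by horizon") with inverse-probability weights."""
    g = km_censoring_survival(times, events)
    labels, weights = [], []
    for t, e in zip(times, events):
        if e == 1 and t <= horizon:
            # Event observed before the horizon: positive, weighted by 1/G(t-).
            labels.append(1)
            weights.append(1.0 / max(g(t, left=True), min_g))
        elif t > horizon:
            # Still under observation at the horizon: negative, weight 1/G(horizon).
            labels.append(0)
            weights.append(1.0 / max(g(horizon), min_g))
        else:
            # Censored before the horizon: outcome unknown, so weight 0.
            labels.append(0)
            weights.append(0.0)
    return labels, weights


def weighted_bce(probs, labels, weights, eps=1e-7):
    """Weighted binary cross-entropy, normalized by the total weight."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)
        total -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


if __name__ == "__main__":
    # Hypothetical follow-up times (months) and event indicators.
    follow_up = [2, 3, 5, 7, 9]
    observed = [1, 0, 1, 0, 1]
    y, w = ipcw_binary_targets(follow_up, observed, horizon=6)
    print(y, w, weighted_bce([0.7, 0.5, 0.6, 0.2, 0.1], y, w))
```

Samples censored before the horizon receive zero weight instead of being mislabeled, while the remaining samples are up-weighted to stand in for them, which is how this kind of weighting counteracts the bias that comes from simply discarding censored data.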
Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a lung cancer diagnosis and prognosis system based on a vision-language foundation model. It addresses three problems of existing lung cancer auxiliary diagnosis technology: the semantic gap between the image and text modalities, the computational bottleneck that arises when high-dimensional three-dimensional image features are adapted to a large language model, and the difficulty of covering the whole clinical workflow from diagnosis to prognosis with a single model.

To achieve the above purpose, the invention is realized by the following technical scheme:

The invention provides a lung cancer diagnosis and prognosis system based on a vision-language foundation model, which comprises a database construction module, a vision-language contrastive pre-training module, a radiology report generation module and a downstream task prediction module. The database construction module is configured to acquire paired 3D chest CT image sequences and radiology reports and to preprocess them to obtain standardized lung region-of-interest data and structured abnormal-sign labels. The vision-language contrastive pre-training module is connected to the database construction module and comprises a visual encoder based on the 3D Swin Transformer architecture and a language encoder based on the BERT architecture, configured to extract semantically aligned visual features and text features by jointly optimizing a contrastive loss function and a disease-specific auxiliary classification loss