CN-122025149-A - Tumor prediction method based on hierarchical vision-language collaboration, and electronic device
Abstract
The application provides a tumor prediction method based on hierarchical vision-language collaboration, and an electronic device. The method, applied in the field of artificial intelligence, comprises the following steps: acquiring a three-dimensional medical image of a target patient and a tumor region of interest; extracting features of key sub-regions of that region to obtain a local semantic feature set, and extracting features of the tumor and its adjacent tissue to obtain a global context feature set, the two together forming a visual feature set; constructing a clinical prior knowledge query for the target cancer type and imaging modality, generating prognosis-related language prompts through a large language model, and encoding them; solving a semantic matching optimization problem to obtain an optimal matching plan matrix, and screening elite feature subsets; enhancing the global feature set through gated fusion and applying a contrastive-learning consistency constraint; and forming the fused features into a characterization vector that is input into a prediction layer to obtain a survival risk score. The method can integrate prior knowledge with tumor pathological characteristics, so that the final survival risk score is accurate and reliable and better adapted to actual clinical evaluation requirements.
Inventors
- Deng Fuxing
- Lin Qin
- Wu Yiping
- Shang Kaili
Assignees
- 厦门大学附属第一医院 (The First Affiliated Hospital of Xiamen University; also Xiamen First Hospital, Xiamen Red Cross Hospital, Xiamen Diabetes Institute)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-04-08
Claims (10)
- 1. A tumor prediction method based on hierarchical vision-language collaboration, the method comprising: acquiring a three-dimensional medical image of a target patient and a corresponding tumor region of interest; extracting features of key sub-regions within the tumor region of interest from the three-dimensional medical image to obtain a local semantic feature set, and extracting features covering the whole tumor region of interest and its adjacent tissue regions to obtain a global context feature set, the local semantic feature set and the global context feature set forming a visual feature set; constructing a clinical prior knowledge query for a target cancer type and imaging modality, inputting the query into a large language model to generate language prompts related to survival prognosis, and encoding the language prompts to obtain a language prompt vector set; solving a semantic matching cost optimization problem between the visual feature set and the language prompt vector set to obtain an optimal matching plan matrix; screening an elite local feature subset and an elite global feature subset from the visual feature set based on the optimal matching plan matrix; performing information enhancement on the elite global feature subset with the aggregated elite local feature subset through a gating fusion mechanism to obtain an enhanced global feature set; applying a consistency constraint based on contrastive learning to the local characterization obtained by aggregating the elite local feature subset and the global characterization obtained by aggregating the enhanced global feature set; and fusing the elite local feature subset with the enhanced global feature set to form a characterization vector of the target patient, and inputting the characterization vector into a prediction layer to obtain a survival risk score, wherein the prediction layer is trained by minimizing a total objective function, and the total objective function comprises a first loss function for survival risk ranking and a second loss function generated by the contrastive-learning-based consistency constraint.
- 2. The method of claim 1, wherein extracting features of key sub-regions within the tumor region of interest from the three-dimensional medical image to obtain a local semantic feature set, and extracting features covering the whole tumor region of interest and its adjacent tissue regions to obtain a global context feature set, comprises: partitioning the tumor region of interest into a plurality of key sub-regions comprising at least a tumor core region and a tumor invasion margin region; inputting the image block of each key sub-region into a pre-trained first 3D feature extraction network to extract a corresponding feature vector, and combining the feature vectors of all key sub-regions into the local semantic feature set; and inputting an image block containing the whole tumor region of interest and its adjacent tissue regions into a pre-trained second 3D feature extraction network to extract the global context feature set.
- 3. The method of claim 1, wherein constructing the clinical prior knowledge query for the target cancer type and imaging modality, inputting the query into the large language model to generate the language prompts related to survival prognosis, and encoding the language prompts to generate the language prompt vector set, comprises: constructing, based on the target cancer type and the imaging modality, a text query for querying visual features associated with poor survival prognosis; inputting the text query into the large language model to obtain a set of natural language descriptions of visual features associated with poor survival prognosis; and mapping each description in the set of natural language descriptions into a high-dimensional vector through a pre-trained text encoder to obtain the language prompt vector set, wherein each vector in the language prompt vector set uniquely corresponds to a specific prognosis-related feature description.
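As an illustrative sketch of this query-and-encode step (the query template, the hash-based encoder stub, and all function names are assumptions for illustration, not the patent's implementation — a real system would call an LLM and a pretrained text encoder such as a CLIP- or BERT-style text tower):

```python
import hashlib
import numpy as np

def build_prior_query(cancer_type: str, modality: str) -> str:
    """Clinical prior-knowledge text query for the large language model."""
    return (f"For {cancer_type} on {modality} imaging, list visual features "
            f"that are associated with poor survival prognosis.")

def encode_description(text: str, dim: int = 16) -> np.ndarray:
    """Stand-in for a pretrained text encoder: deterministic hash -> unit vector.

    Each distinct description maps to a unique, reproducible vector, mimicking
    the claim's requirement that each prompt vector uniquely corresponds to a
    specific prognosis-related feature description.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)
```

A usage pattern would be to send `build_prior_query(...)` to the LLM, split its answer into individual feature descriptions, and encode each one to form the language prompt vector set.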
- 4. The method of claim 1, wherein solving the semantic matching cost optimization problem between the visual feature set and the language prompt vector set to obtain the optimal matching plan matrix comprises: for each visual feature vector in the visual feature set and each language prompt vector in the language prompt vector set, computing the cosine similarity between them, and determining the degree of semantic mismatch based on the cosine similarity to construct a semantic matching cost matrix; and solving, through an optimal transport algorithm based on the semantic matching cost matrix, the optimal matching plan matrix that minimizes the total matching cost.
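The cost construction and optimal transport solve can be sketched as below (a minimal entropy-regularized Sinkhorn iteration with uniform marginals; the cost definition `1 - cosine`, the regularization `eps`, the iteration count, and the function names are assumptions not specified in the patent):

```python
import numpy as np

def cosine_cost(V: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Semantic matching cost matrix: mismatch = 1 - cosine similarity."""
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    return 1.0 - V @ P.T

def sinkhorn_plan(cost: np.ndarray, eps: float = 0.1,
                  iters: int = 500) -> np.ndarray:
    """Entropy-regularized optimal transport plan with uniform marginals."""
    m, n = cost.shape
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(m)
    for _ in range(iters):           # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)
```

The resulting plan matrix is nonnegative, its rows sum to the visual marginal and its columns to the prompt marginal, and its entries quantify how strongly each visual feature matches each language prompt.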
- 5. The method of claim 1, wherein said screening the elite local feature subset and the elite global feature subset from the visual feature set based on the optimal matching plan matrix comprises: for the optimal matching plan matrix, computing the overall alignment strength of each local feature in the local semantic feature set and each global feature in the global context feature set with all language prompt vectors by summing the elements of the corresponding row, to obtain a feature alignment score for each feature; sorting, based on the feature alignment scores, all local features in the local semantic feature set and all global features in the global context feature set in descending order; and determining the top first-proportion of features in the local semantic feature set as the elite local feature subset and the top second-proportion of features in the global context feature set as the elite global feature subset, wherein the first proportion may be the same as or different from the second proportion.
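A minimal sketch of this elite-subset screening, assuming the alignment score of a feature is the row sum of its slice of the matching plan matrix and the retention ratio is a fixed fraction (both names and the default ratio are illustrative):

```python
import numpy as np

def elite_subset(features: np.ndarray, plan_rows: np.ndarray,
                 ratio: float = 0.5):
    """Keep the top-`ratio` features ranked by alignment score.

    features:  (n, d) feature vectors.
    plan_rows: (n, p) rows of the matching plan for these features,
               one column per language prompt vector.
    """
    scores = plan_rows.sum(axis=1)            # alignment with all prompts
    k = max(1, int(round(len(features) * ratio)))
    idx = np.argsort(-scores)[:k]             # descending by score
    return features[idx], idx
```

The same routine would be applied once to the local semantic feature set (with the first proportion) and once to the global context feature set (with the second proportion).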
- 6. The method of claim 1, wherein performing information enhancement on the elite global feature subset with the aggregated elite local feature subset through the gating fusion mechanism to obtain the enhanced global feature set comprises: performing pooling aggregation on the elite local feature subset and concatenating the result with the elite global feature subset, applying a linear transformation to the concatenated result through a fully connected layer, and applying a Sigmoid activation function to generate a gating signal with values in the range 0 to 1; weighting the aggregated local features by the gating signal to obtain weighted local features, and weighting the global features by one minus the gating signal to obtain weighted global features; and adding the weighted local features and the weighted global features to obtain the enhanced global feature set.
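The gating fusion mechanism can be sketched as follows (mean pooling as the aggregation, a per-dimension gate, and the weight shapes `W: (2d, d)`, `b: (d,)` are assumptions; the patent does not fix these details):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(elite_local: np.ndarray, elite_global: np.ndarray,
                 W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Enhance each global feature with the pooled elite-local aggregate.

    Each output is a convex combination gate*local + (1-gate)*global,
    with the gate produced by a fully connected layer + Sigmoid on the
    concatenation of the pooled local aggregate and the global feature.
    """
    pooled = elite_local.mean(axis=0)                 # pooling aggregation
    enhanced = []
    for g_feat in elite_global:
        z = np.concatenate([pooled, g_feat]) @ W + b  # fully connected layer
        gate = sigmoid(z)                             # gating signal in (0, 1)
        enhanced.append(gate * pooled + (1.0 - gate) * g_feat)
    return np.stack(enhanced)
```

Because the gate lies strictly in (0, 1), every dimension of the enhanced feature lies between the pooled local value and the original global value, so the enhancement can never overwrite the global context entirely.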
- 7. The method of claim 1, wherein said applying the contrastive-learning-based consistency constraint to the local characterization obtained by aggregating the elite local feature subset and the global characterization obtained by aggregating the enhanced global feature set comprises: constructing, for the same patient, that patient's local characterization and global characterization as a positive sample pair, and constructing, for different patients, the local characterization of one patient and the global characterization of another patient as negative sample pairs; computing, based on the positive and negative sample pairs, the contrastive loss from the local characterization to the global characterization and the contrastive loss from the global characterization to the local characterization, respectively; and taking the average of these two contrastive losses as the final contrastive loss value.
- 8. The method of claim 7, wherein computing the local-to-global and global-to-local contrastive losses based on the positive and negative sample pairs comprises: for each patient in a training batch, taking the cosine similarity between that patient's local characterization and global characterization as the positive sample similarity, and taking the set of cosine similarities between that patient's local characterization and the global characterizations of the other patients in the batch as the negative sample similarities; computing the contrastive loss from the local characterization to the global characterization as
  $$\mathcal{L}_{l \to g} = -\frac{1}{B} \sum_{i=1}^{B} \log \frac{\exp\!\big(\mathrm{sim}(z_i^l, z_i^g)/\tau\big)}{\sum_{j=1}^{B} \exp\!\big(\mathrm{sim}(z_i^l, z_j^g)/\tau\big)}$$
  where $B$ is the batch size, $\mathrm{sim}(\cdot,\cdot)$ denotes cosine similarity, $\tau$ is a temperature hyperparameter, $z_i^l$ and $z_i^g$ are respectively the local and global characterizations of the $i$-th patient, and $z_j^g$ is the global characterization of the $j$-th patient; and computing the contrastive loss from the global characterization to the local characterization as
  $$\mathcal{L}_{g \to l} = -\frac{1}{B} \sum_{i=1}^{B} \log \frac{\exp\!\big(\mathrm{sim}(z_i^g, z_i^l)/\tau\big)}{\sum_{j=1}^{B} \exp\!\big(\mathrm{sim}(z_i^g, z_j^l)/\tau\big)}$$
  where $z_j^l$ is the local characterization of the $j$-th patient.
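This symmetric formulation corresponds to a standard InfoNCE contrastive loss; a minimal sketch follows (the temperature default, the log-sum-exp stabilization, and the function name are assumptions):

```python
import numpy as np

def symmetric_contrastive_loss(local: np.ndarray, glob: np.ndarray,
                               tau: float = 0.1) -> float:
    """Average of local->global and global->local InfoNCE losses.

    local, glob: (B, d) per-patient characterizations; row i of each
    forms the positive pair, all cross-patient pairs are negatives.
    """
    L = local / np.linalg.norm(local, axis=1, keepdims=True)
    G = glob / np.linalg.norm(glob, axis=1, keepdims=True)
    S = (L @ G.T) / tau                          # cosine similarity / temperature

    def nce(logits: np.ndarray) -> float:
        logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))       # diagonal entries = positive pairs

    return 0.5 * (nce(S) + nce(S.T))             # average of the two directions
```

When each patient's local and global characterizations already coincide and are mutually orthogonal across patients, the loss is near zero; misaligned pairs drive it up.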
- 9. The method of claim 1, wherein the training and prediction process of the prediction layer comprises: in the training stage, constructing a prediction network serving as the prediction layer, the prediction network being a multi-layer perceptron or a Cox proportional hazards model; inputting the characterization vector of the target patient generated from training data into the prediction network to obtain a predicted risk score; and jointly optimizing, end to end, the parameters of the prediction network and of all trainable components used to generate the characterization vector by minimizing the total objective function, wherein the total objective function combines the first loss function and the second loss function, and the first loss function is a negative partial log-likelihood loss based on the Cox proportional hazards model; and in the prediction stage, inputting the characterization vector of the target patient into the trained prediction layer and outputting the survival risk score of the target patient.
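The Cox negative partial log-likelihood referenced here can be sketched as follows (a minimal version; ties in event times are not specially handled — a production implementation would use Breslow or Efron tie correction, and the normalization by event count is an assumption):

```python
import numpy as np

def cox_partial_neg_loglik(risk: np.ndarray, time: np.ndarray,
                           event: np.ndarray) -> float:
    """Negative partial log-likelihood of the Cox proportional hazards model.

    risk:  predicted risk scores (higher = worse prognosis).
    time:  follow-up times.
    event: 1 if death observed, 0 if censored.
    """
    order = np.argsort(-time)          # sort descending: risk set = prefix
    r, e = risk[order], event[order]
    # log of sum of exp(risk) over all patients still at risk at each death
    log_risk_set = np.log(np.cumsum(np.exp(r)))
    n_events = max(e.sum(), 1)         # avoid division by zero if all censored
    return float(-np.sum((r - log_risk_set) * e) / n_events)
```

For two patients with equal risk scores who both experience the event, the loss reduces to log(2)/2, since the earlier death competes against a risk set of two.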
- 10. An electronic device, the electronic device comprising: a memory for storing executable program code; a processor for calling and running the executable program code from the memory, causing the electronic device to perform the method of any one of claims 1 to 9.
Description
Tumor prediction method based on hierarchical vision-language collaboration, and electronic device
Technical Field
The application relates to the field of artificial intelligence, and in particular to a tumor prediction method based on hierarchical vision-language collaboration, and an electronic device.
Background
In oncology, medical images provide rich, non-invasive in-vivo tumor information, including tumor morphology, size, metabolic activity, and relationships with surrounding tissues, from which a patient's condition can be assessed. In recent years, deep learning models have been applied to the information carried by medical images to improve the accuracy of tumor characterization. In the related art, however, existing models analyze only the pixel data of medical images and entirely ignore the diagnostic knowledge accumulated by professional physicians over many years; they generally treat the whole tumor or image as a single unit for feature extraction, cannot effectively distinguish features inside the tumor from those of its external environment, and therefore cannot provide clinicians with a sound basis for judgment.
Disclosure of Invention
The application provides a tumor prediction method based on hierarchical vision-language collaboration, and an electronic device. The method can integrate prior knowledge with tumor pathological characteristics, so that the final survival risk score is accurate and reliable and better adapted to actual clinical evaluation requirements.
In a first aspect, a tumor prediction method based on hierarchical vision-language collaboration is provided, the method comprising: acquiring a three-dimensional medical image of a target patient and a corresponding tumor region of interest; extracting features of key sub-regions within the tumor region of interest from the three-dimensional medical image to obtain a local semantic feature set, and extracting features covering the whole tumor region of interest and its adjacent tissue regions to obtain a global context feature set, the two forming a visual feature set; constructing a clinical prior knowledge query for a target cancer type and imaging modality, inputting the query into a large language model to generate language prompts related to survival prognosis, and encoding the language prompts to obtain a language prompt vector set; solving a semantic matching cost optimization problem between the visual feature set and the language prompt vector set to obtain an optimal matching plan matrix; screening an elite local feature subset and an elite global feature subset from the visual feature set based on the optimal matching plan matrix; performing information enhancement on the elite global feature subset with the aggregated elite local feature subset through a gating fusion mechanism to obtain an enhanced global feature set; applying a consistency constraint based on contrastive learning to the local characterization obtained by aggregating the elite local feature subset and the global characterization obtained by aggregating the enhanced global feature set; and fusing the elite local feature subset with the enhanced global feature set to form a characterization vector of the target patient, which is input into a prediction layer to obtain a survival risk score, wherein the prediction layer is trained by minimizing a total objective function, and the total objective function comprises a first loss function for survival risk ranking and a second loss function generated by the contrastive-learning-based consistency constraint.
According to the method, a clinical prior knowledge query for a target cancer type and imaging modality is constructed, a large language model is used to generate and encode prognosis-related language prompts, and an optimal matching plan matrix is then solved through semantic matching cost optimization, fusing clinical prior knowledge with visual image features; that is, the diagnostic knowledge accumulated by professional physicians over many years is brought into the analysis of the information carried by medical images. A gating fusion mechanism performs targeted information enhancement on the elite global feature subset using the aggregated elite local feature subset; the elite local feature subset is then fused with the enhanced global feature set to form a characterization vector of the target patient, which is input into a prediction layer to obtain a survival risk score. In this way, the pathological information of key sub-regions within the tumor is preserved during survival risk assessment, and the patient's condition is judged jointly from the features of the tumor region of interest and its adjacent tissue regions, so that the final survival risk score is accurate and reliable and meets actual clinical evaluation requirements, comprehensively improving the accuracy, reliability, and clinical practicability of tumor survival prognosis prediction. With reference to the first aspect, in some possible implementations, the extracting features of key sub-regions in the tumor region of interest from the three-dimensional medical image to obtain a local semantic feature set, and extracting features incl