
CN-122025170-A - Rare disease identification method and system based on multi-scale vision-language prompt multi-instance learning

CN 122025170 A

Abstract

A rare disease identification method and system based on multi-scale vision-language prompt multi-instance learning are provided. The method targets whole slide images (WSIs) under the weak supervision condition that only slice-level labels are available. By introducing prior knowledge from the pathology field, a multi-scale vision-language collaborative prompt mechanism is constructed: a large language model generates descriptive text prompts corresponding to image blocks at different scales, a prototype-guided image block decoder is designed to aggregate the massive set of image block features, and a context-guided text decoder achieves cross-modal alignment.
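The prototype-guided aggregation and cross-modal similarity scoring summarized in the abstract can be sketched in NumPy. This is a minimal illustration, not the patent's implementation: the function names (`prototype_pool`, `classify`), the use of prototype norms as a stand-in for the learned importance weights, and the equal-weight fusion of the two scales are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_pool(patch_feats, prototypes):
    """Cluster patch features onto prototype vectors via attention, then
    weight-sum the updated prototypes into one slice-level vector."""
    # Attention of each prototype over all patches: shape (K, N).
    attn = softmax(prototypes @ patch_feats.T / np.sqrt(patch_feats.shape[1]), axis=1)
    updated = attn @ patch_feats                 # (K, d) updated prototype features
    # Importance weight per prototype (prototype norms used here as a
    # stand-in for the learned weights described in the patent).
    w = softmax(np.linalg.norm(updated, axis=1))
    return w @ updated                           # (d,) slice-level visual feature

def classify(slice_low, slice_high, text_low, text_high, alpha=0.5):
    """Cosine similarity between slice features and per-class text features,
    fused across scales, then softmax into class probabilities."""
    def cos(v, T):
        return (T @ v) / (np.linalg.norm(T, axis=1) * np.linalg.norm(v) + 1e-8)
    logits = alpha * cos(slice_low, text_low) + (1 - alpha) * cos(slice_high, text_high)
    return softmax(logits)

rng = np.random.default_rng(0)
d, n_low, n_high, k, c = 32, 50, 200, 8, 3       # feature dim, patch counts, prototypes, classes
f_low = prototype_pool(rng.normal(size=(n_low, d)), rng.normal(size=(k, d)))
f_high = prototype_pool(rng.normal(size=(n_high, d)), rng.normal(size=(k, d)))
probs = classify(f_low, f_high, rng.normal(size=(c, d)), rng.normal(size=(c, d)))
print(probs.shape)                               # shape (c,); entries sum to 1 up to float error
```

In a trained model the prototypes, importance weights, and fusion weights would be learned parameters; random features are used here only to exercise the shapes of the computation.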

Inventors

  • SHI JIANGBO
  • LIU YU
  • XU FENG

Assignees

  • Xi'an Jiaotong University (西安交通大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-27

Claims (9)

  1. A multi-instance learning rare disease recognition method based on multi-scale vision-language prompts, for weakly supervised classification of full-field digital pathology images (WSIs), comprising the steps of:
     Step 1, preprocessing the input full-field digital pathology image WSI, filtering background areas, and applying staining normalization to the image blocks of tissue areas;
     Step 2, cropping at different resolution levels with non-overlapping sliding windows to obtain a low-resolution image block set and a high-resolution image block set, whose sizes are the numbers of image blocks at the low and high scale respectively; inputting the low-resolution and high-resolution image blocks into a frozen pre-trained visual encoder to extract the corresponding low-scale image block features and high-scale image block features, where d is the feature dimension of an image block;
     Step 3, constructing a dual-scale visual text prompt generation module:
     Step 3.1, designing a guiding question template;
     Step 3.2, inputting the target rare disease category name into a large language model according to the designed question template to automatically generate a corresponding low-scale text prompt and high-scale text prompt, wherein the low-scale text prompt describes tissue arrangement patterns and lesion area distribution, and the high-scale text prompt describes cell morphology, nuclear characteristics and staining depth;
     Step 3.3, prepending a learnable context prefix vector to the low-scale and the high-scale text prompt to form dynamic text prompts, and encoding them with a frozen pre-trained text encoder into a low-scale prompt text feature and a high-scale prompt text feature, as expressed in formulas (1) and (2), wherein the learnable context prefix vectors are optimized along with training, gradually improving the match between the text prompts and the target task;
     Step 4, constructing a prototype-guided image block decoder module:
     Step 4.1, defining a group of learnable low-scale prototype vectors and high-scale prototype vectors;
     Step 4.2, clustering the low-scale image block features onto the low-scale prototypes through an attention mechanism to generate updated low-scale prototype features, automatically learning an importance weight for each prototype feature, obtaining the slice-level visual feature by weighted summation of all updated low-scale prototype features, and applying a trained linear transformation matrix to obtain the low-scale slice feature; the high-scale slice-level visual feature is obtained in the same way;
     Step 5, constructing a context-guided text decoder module:
     Step 5.1, taking the text features obtained in step 3 as queries, and the visual context formed by concatenating the image block features and prototype features of the corresponding scale as keys and values, performing feature interaction through a cross-attention layer, adaptively absorbing the visual information most relevant to the text description, and optimizing the generic static text features into dynamic text features highly matched to the content of the current input WSI;
     Step 5.2, computing the similarity between the optimized low-scale slice feature and the optimized low-scale text feature, and between the high-scale slice feature and the optimized high-scale text feature, realizing effective alignment of the image and text modalities; the similarities are weighted and fused, and an activation function outputs the final disease classification probability.
  2. The multi-instance learning rare disease recognition method based on multi-scale vision-language prompts according to claim 1, wherein the background filtering uses the Otsu binarization algorithm to separate tissue areas from background, only the tissue areas being retained for image block cropping, and the staining normalization uses a Z-score method to eliminate staining style differences among different WSI samples.
  3. The multi-instance learning rare disease recognition method based on multi-scale vision-language prompts according to claim 1, wherein the learnable context prefix vector is jointly optimized with the classification loss during training so that the text prompts adapt to the target task distribution.
  4. The multi-instance learning rare disease recognition method based on multi-scale vision-language prompts according to claim 1, wherein the prototype-guided image block decoder implements feature grouping by computing the similarity between image block features and prototype vectors, and uses an attention mechanism to weight and aggregate the features within each group.
  5. The multi-instance learning rare disease recognition method based on multi-scale vision-language prompts according to claim 1, wherein the context-guided text decoder module adopts a cross-attention mechanism, taking the slice-level visual representation as Query and the text features as Key and Value, or vice versa, to realize bi-directional cross-modal information interaction; the visual context in the context-guided text decoder comprises local image block details and global prototype semantics, enhancing the semantic alignment of the text features with the current WSI content.
  6. A multi-instance learning rare disease recognition system based on multi-scale vision-language prompts, for implementing the multi-instance learning rare disease recognition method as set forth in any one of claims 1 to 5, characterized by comprising: an image preprocessing unit for background filtering and staining normalization of the full-field digital pathology image in step 1; a multi-scale image block cropping unit for generating the low-scale and high-scale image block sets in step 2; a dual-scale visual text prompt generation unit for integrating the large language model and the learnable prefixes in step 3 to dynamically generate text prompts rich in pathology priors; a vision-language feature extraction unit for extracting the embedded features of the image blocks and text prompts in step 3; a prototype-guided image block decoder unit for aggregating image block features into a slice-level representation in step 4; a context-guided text decoder unit for cross-modal alignment and feature enhancement in step 5; and a classification and interpretability output unit for outputting the final disease classification probability and the key evidence image blocks in step 5.
  7. The multi-instance learning rare disease recognition system based on multi-scale vision-language prompts as claimed in claim 6, wherein the rare disease recognition system is trained under weak supervision using slice-level labels only, is suitable for small-sample rare disease diagnosis scenarios, and has cross-center generalization capability.
  8. A multi-instance learning rare disease identification device based on multi-scale vision-language prompts, comprising: a memory storing a computer program; and a processor, the computer program, when executed by the processor, implementing the multi-instance learning rare disease identification method based on multi-scale vision-language prompts as defined in any one of claims 1 to 5.
  9. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-instance learning rare disease recognition method based on multi-scale vision-language prompts as claimed in any one of claims 1 to 5.
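The preprocessing recited in claim 2 (Otsu binarization to separate tissue from background, then Z-score staining normalization) can be sketched as follows. This is a minimal NumPy illustration: the synthetic thumbnail and the function names are assumptions, not the patent's code.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing between-class variance
    of a 256-bin histogram (used to separate tissue from background)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist.astype(float) / hist.sum()
    bins = np.arange(256)
    w0 = np.cumsum(p)                    # class-0 (darker side) probability
    mu = np.cumsum(p * bins)             # cumulative mean
    mu_t = mu[-1]
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    var_between = np.zeros(256)
    var_between[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(var_between))

def zscore_normalize(patch):
    """Per-channel Z-score staining normalization: zero mean, unit variance."""
    patch = patch.astype(float)
    mean = patch.mean(axis=(0, 1), keepdims=True)
    std = patch.std(axis=(0, 1), keepdims=True) + 1e-8
    return (patch - mean) / std

rng = np.random.default_rng(1)
# Synthetic grayscale thumbnail: bright background (~230) with a darker tissue blob (~120).
thumb = np.full((64, 64), 230.0) + rng.normal(0, 5, (64, 64))
thumb[16:48, 16:48] = 120.0 + rng.normal(0, 10, (32, 32))
t = otsu_threshold(np.clip(thumb, 0, 255))
tissue_mask = thumb < t                  # keep only the darker tissue region
norm_patch = zscore_normalize(rng.integers(0, 255, (256, 256, 3)))
```

In practice the thresholding would run on a low-magnification thumbnail of the WSI, and only image blocks whose tissue fraction exceeds some cutoff would be kept for feature extraction.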

Description

Rare disease identification method and system based on multi-scale vision-language prompt multi-instance learning

Technical Field

The invention belongs to the technical field of computer vision and medical image analysis, and particularly relates to a rare disease identification method and system based on multi-scale vision-language prompt multi-instance learning.

Background

Pathological examination is the core "gold standard" of disease diagnosis and has the final say in clinical decisions. With the rapid development of whole-slide digital scanning technology, a physical slide conventionally observed under the microscope can be rapidly scanned into a whole slide image (WSI). A WSI has an extremely large pixel size and a pyramidal, hierarchical structure, which makes it very difficult to acquire large datasets with accurate pixel-level annotations. Therefore, a weakly supervised learning paradigm relying only on slice-level labels is becoming the dominant approach to processing WSIs. Currently, multiple instance learning (MIL), a representative weakly supervised algorithm, is widely adopted for WSI processing; it mainly comprises a three-stage flow of image block cropping, image block feature extraction, and slice-level feature aggregation. The main problem of existing MIL methods is their dependence on a large number of slice-level labels: the training process needs many such labels to ensure that model performance converges to the expected level. However, real scenarios such as clinically rare diseases often face a "small sample dilemma" that makes efficient feature learning difficult to support.

In recent years, vision-language models (VLMs), pre-trained on massive image-text pairs to learn broad knowledge, have acquired a universal feature representation capability [Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International Conference on Machine Learning. PMLR, 2021.]. Existing VLM-based methods nevertheless have the following limitations: (1) the text prompts lack pathology priors: most existing text prompts simply substitute a fixed class name into a template and cannot accurately distinguish the fine morphological differences among disease subtypes; (2) data dependence and computational cost are high: massive pathology image-text pairs must be collected for pre-training, which is time- and resource-consuming and hinders rapid deployment and iteration in clinical scenarios; (3) the characteristics of WSI data are hard to accommodate: general-domain VLMs are not optimized for the hierarchical structure and oversized scale of WSIs, so it is difficult to efficiently aggregate the huge number of image block features and to achieve effective alignment of the vision and language modalities.

In summary, the prior art struggles to meet the requirements of the real "small-sample, cross-center, high-precision" diagnostic scenario of clinical rare disease classification, and an efficient learning system is needed that can fuse pathology priors, effectively transfer pre-trained knowledge, and adapt to the data structure of pathology slides.
Disclosure of the Invention

In order to overcome the defects of the prior art, the invention aims to provide a rare disease identification method and system based on multi-scale vision-language prompt multi-instance learning, which simulates the diagnostic process of a pathologist and achieves high-precision classification with small samples. For whole slide images (WSIs), under the weak supervision condition of slice-level labels only, a multi-scale vision-language collaborative prompt mechanism is constructed by introducing prior knowledge from the pathology field: a large language model generates descriptive text prompts corresponding to image blocks at different scales, a prototype-guided image block decoder is designed to aggregate the massive set of image block features, and a context-guided text decoder achieves cross-modal alignment. Rare disease recognition accuracy is thereby remarkably improved, good cross-center generalization capability is provided, interpretable evidence output is supported, and no large-scale annotated data or pre-training from scratch is required, facilitating clinical deployment.

A rare disease identification method based on multi-scale vision-language prompt multi-instance learning, for weakly supervised classification of full-field digital pathology images (WSIs), comprises the steps of: step 1, preprocessing the input full-field digital pathology image WSI, filtering background areas, and applying staining normalization to the image blocks of tissue areas; step 2, cropping at different resolution levels with non-overlapping sliding windows to obtain a low-resolution image block set with high