CN-122023944-A - Digestive tract endoscope lesion classification model training method based on small sample learning


Abstract

The invention relates to the field of computer-aided diagnosis and discloses a method for training a digestive tract endoscope lesion classification model based on small sample (few-shot) learning. On the basis of a constructed modal timing diagram, the annotation information of different modalities is unified into the same coordinate system to form three-dimensional spatio-temporally aligned multi-modal data. The classified multi-modal data undergo dynamic cross-modal attention fusion, which generates a dynamic cross-modal attention weight map. Cross-modal shared features and temporal dynamic features are obtained on the basis of the weight map and combined into cross-modal temporal consistency features, which are used together with the cross-modal attention weight map to train a meta-learning classification model. The attention weights of the trained model are visualized as a heat map, and the trained model is exported and deployed to an actual application platform to classify new endoscope images. The method improves training efficiency and classification accuracy and enhances the robustness of the model.

Inventors

FU YIWEI
XUE DANDAN
JIAO CHENYANG
HE JUN
ZHAO JUNJUN
YUAN WENZHUO
DING WEIFENG

Assignees

南京索图科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260408

Claims (10)

1. A method for training a digestive tract endoscope lesion classification model based on small sample learning, characterized by comprising the following steps: S1, acquiring, by locating a lesion area, a white light endoscope image, an NBI image and a confocal image in the digestive tract, annotating the lesion area in the white light endoscope image, mapping the annotation information to the NBI image and the confocal image, and constructing a modal timing diagram; S2, on the basis of the constructed modal timing diagram, unifying the annotation information of the different modalities into the same coordinate system to form three-dimensional spatio-temporally aligned multi-modal data, and classifying the aligned multi-modal data according to lesion state; S3, performing dynamic cross-modal attention fusion on the classified multi-modal data to generate a dynamic cross-modal attention weight map; S4, using the fused dynamic cross-modal attention weight map to compare tasks across different modalities to obtain cross-modal shared features, and combining the fused dynamic cross-modal attention weight map with the timing information in the modal timing diagram and comparing different spatio-temporal tasks to obtain temporal dynamic features; S5, combining the cross-modal shared features with the temporal dynamic features to form cross-modal temporal consistency features, and training a meta-learning classification model using the cross-modal temporal consistency features together with the cross-modal attention weight map; S6, visualizing the attention weights as a heat map; and S7, exporting the trained meta-learning classification model, deploying it to an actual application platform, and classifying new digestive tract endoscope images.
2. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 1, wherein the step of constructing the modal timing diagram comprises: S101, preliminarily locating a lesion area in the digestive tract with a white light endoscope, acquiring a white light endoscope image of the lesion area, and then switching to an NBI endoscope mode during the same examination to acquire an NBI image of the same lesion area; S102, performing image preprocessing on the acquired white light endoscope image, NBI image and cell-level-resolution confocal image; and S103, annotating the lesion area in the white light endoscope image, automatically mapping the annotation information to the NBI image and the confocal image by an image registration technique so that the lesion area is spatially aligned across the multi-modal images, extracting modality-specific feature data from the annotated area, and constructing a spatio-temporally aligned modal timing diagram in combination with the examination timestamps.
3. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 2, wherein forming the three-dimensional spatio-temporally aligned multi-modal data comprises: transforming the annotation information of the different modalities from their original coordinate systems to a unified coordinate system on the basis of the modal timing diagram; calculating the structural similarity between the image of the transformed annotated area and the image of the corresponding annotated area in the reference modality image to verify the spatial alignment, and determining that the spatial alignment verification is passed if the structural similarity is greater than a preset similarity threshold; after the spatial alignment verification is passed, compensating for the acquisition time differences between the different modalities to synchronize the time axes, calculating the inter-frame difference of the aligned videos, and determining that the temporal alignment verification is passed if the inter-frame difference is smaller than a preset inter-frame difference threshold; and integrating the spatially and temporally aligned data of the different modalities with the corresponding annotation information to form the three-dimensional spatio-temporally aligned multi-modal data, and then automatically classifying the multi-modal data according to lesion state using a deep learning model.
4. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 3, wherein the classification of the multi-modal data is performed by dividing the aligned data of the different modalities into data subsets by lesion state, thereby forming multi-modal sub-datasets.
5. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 1, wherein the step of generating the dynamic cross-modal attention weight map comprises: constructing a network structure for dynamically calculating the attention weights between different modalities, wherein the network structure comprises a plurality of branches, each branch processes the data of one modality, and information is exchanged between the modalities through an interaction layer; and inputting the classified multi-modal data into the corresponding branches of the dynamic cross-modal attention network, dynamically calculating the similarity between the different modalities in the interaction layer to obtain the attention weights between them, fusing the data of the different modalities according to these attention weights, and recording the attention weights between the different modalities at each position during the fusion to generate the dynamic cross-modal attention weight map.
6. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 5, wherein using the fused dynamic cross-modal attention weight map to compare tasks across different modalities to obtain the cross-modal shared features comprises: extracting the attention weight features of the different modalities from the dynamic cross-modal attention weight map and calculating their attention weight distributions; comparing the attention weight distributions of the different modalities on the same task and calculating the difference values between the modalities; and identifying, based on the difference values, the information attended to jointly by a plurality of modalities as the cross-modal shared features.
7. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 6, wherein combining the fused dynamic cross-modal attention weight map with the timing information in the modal timing diagram and comparing different spatio-temporal tasks to obtain the temporal dynamic features comprises: performing temporal convolution on the modal timing diagram to extract temporal features from it; performing weighted aggregation of the extracted temporal features by element-wise multiplication, with the dynamic cross-modal attention weight map as the weights; comparing the features obtained after the weighted aggregation with the features of different spatio-temporal tasks; and selecting, by comparing similarity differences, the features with clear discriminative power as the temporal dynamic features.
8. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 7, wherein the training of the meta-learning classification model comprises: concatenating the extracted cross-modal shared features and temporal dynamic features to obtain a joint feature vector; reducing the dimensionality of the joint feature vector by principal component analysis, the resulting feature vector, which captures both cross-modal shared information and temporal dynamic change information, serving as the cross-modal temporal consistency features; training the meta-learning classification model using the cross-modal temporal consistency features; and exporting the trained meta-learning classification model to a deployable format for deployment on different platforms.
9. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 8, wherein the attention visualization comprises: visualizing the attention weights as a heat map, in which the attention weights are mapped to the pixels of the image and areas with higher weight values are shown in brighter colors, thereby displaying the areas of the image on which the model focuses.
10. The method for training a digestive tract endoscope lesion classification model based on small sample learning according to claim 9, wherein the deploying of the meta-learning classification model comprises: exporting the trained meta-learning classification model to a deployable format, deploying it on different platforms, and applying the deployed model to an actual digestive tract endoscope lesion classification task to classify new endoscope images.
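
By way of illustration only, the spatial and temporal alignment checks recited in claim 3 might be sketched in Python as follows. The claims do not specify how the structural similarity or the inter-frame difference is computed, so this sketch makes several assumptions: the structural similarity is a simplified single-window SSIM computed from global image statistics rather than the usual sliding-window form, the inter-frame difference is read as the mean absolute difference between time-matched frames of two aligned sequences, and the function names and threshold values (0.8 and 0.1) are hypothetical.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM from global statistics (assumption: one window = whole image)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)

def mean_interframe_difference(frames_a, frames_b):
    """Mean absolute difference between time-matched frames (one possible reading)."""
    fa = np.asarray(frames_a, dtype=np.float64)
    fb = np.asarray(frames_b, dtype=np.float64)
    return float(np.abs(fa - fb).mean())

def verify_alignment(region, ref_region, frames, ref_frames,
                     ssim_threshold=0.8, diff_threshold=0.1):
    """Return (spatial_ok, temporal_ok); thresholds are hypothetical presets."""
    # Spatial check: similarity must exceed the preset similarity threshold.
    spatial_ok = global_ssim(region, ref_region) > ssim_threshold
    if not spatial_ok:
        # The temporal check is only performed after the spatial check passes.
        return False, False
    # Temporal check: difference must fall below the preset difference threshold.
    temporal_ok = mean_interframe_difference(frames, ref_frames) < diff_threshold
    return True, temporal_ok
```

The temporal check is deliberately skipped when the spatial check fails, mirroring the order in claim 3, where the acquisition time differences are compensated only after the spatial alignment verification has passed.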

Description

Digestive tract endoscope lesion classification model training method based on small sample learning

Technical Field

The invention relates to the technical field of computer-aided diagnosis, in particular to a method for training a digestive tract endoscope lesion classification model based on small sample learning.

Background

Gastrointestinal endoscopy is an important means of diagnosing gastrointestinal disorders. However, the interpretation of endoscopic images depends heavily on the experience and expertise of doctors, and interpretations may differ between doctors. To improve the accuracy, efficiency and consistency of diagnosis, computer-aided diagnosis systems based on deep learning have been developed. Early deep learning models, such as convolutional neural networks, achieved significant success in classifying digestive tract endoscope images. These models typically require a large amount of annotated data for training in order to learn complex image features. However, in the field of digestive endoscopy, large-scale, high-quality, finely annotated datasets are difficult to acquire.
Small sample learning (few-shot learning) is a technique developed to address data scarcity. It enables a model to learn and recognize new categories quickly from only a small number of annotated samples, which makes it possible to develop computer-aided diagnosis systems that are more practical, easier to deploy and less dependent on data, and ultimately helps doctors improve diagnostic efficiency and accuracy. However, existing methods still have the following drawbacks. First, an existing method may be trained on digestive tract endoscope images of a single modality only, for example white light endoscope images alone, and a single-modality image contains limited information and may not reflect the characteristics of a lesion fully and accurately. Second, an existing method may ignore how a lesion changes dynamically in time and space, attending only to features within a single modality during feature extraction and considering neither the correlation between different modalities nor the dynamic changes of the lesion in the time dimension. Third, in a small sample learning scenario, a traditional classification model is prone to overfitting because the training samples are limited, so the model generalizes poorly to new samples.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention provides a method for training a digestive tract endoscope lesion classification model based on small sample learning, which aims to solve the problems described in the background.
The invention provides a method for training a digestive tract endoscope lesion classification model based on small sample learning, comprising the following steps: S1, acquiring, by locating a lesion area, a white light endoscope image, an NBI image and a confocal image in the digestive tract, annotating the lesion area in the white light endoscope image, mapping the annotation information to the NBI image and the confocal image, and constructing a modal timing diagram; S2, on the basis of the constructed modal timing diagram, unifying the annotation information of the different modalities into the same coordinate system to form three-dimensional spatio-temporally aligned multi-modal data, and classifying the aligned multi-modal data according to lesion state; S3, performing dynamic cross-modal attention fusion on the classified multi-modal data to generate a dynamic cross-modal attention weight map; S4, using the fused dynamic cross-modal attention weight map to compare tasks across different modalities to obtain cross-modal shared features, and combining the fused dynamic cross-modal attention weight map with the timing information in the modal timing diagram and comparing different spatio-temporal tasks to obtain temporal dynamic features; S5, combining the cross-modal shared features with the temporal dynamic features to form cross-modal temporal consistency features, and training a meta-learning classification model using the cross-modal temporal consistency features together with the cross-modal attention weight map; S6, visualizing the attention weights as a heat map; and S7, exporting the trained meta-learning classification model, deploying it to an actual application platform, and classifying new digestive tract endoscope images.
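
As a non-limiting illustration, the dynamic cross-modal attention fusion of step S3 and the visualization of step S6 might be sketched in Python as follows. The text does not define the network architecture or the similarity measure, so this sketch substitutes a parameter-free stand-in: each modality is scored at every position by its mean scaled dot-product similarity to the other modalities, and a softmax over modalities gives the attention weights. The function names, the dictionary keys and the normalization of the weights to the range [0, 1] for display are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention_fusion(features):
    """Fuse spatially aligned per-modality features.

    features: dict mapping a modality name to an array of shape
    (positions, channels). Returns the fused features (positions, channels),
    the recorded attention weight map (positions, modalities), and the
    modality order used for the weight map columns.
    """
    names = sorted(features)
    stack = np.stack(
        [np.asarray(features[n], dtype=np.float64) for n in names], axis=1
    )  # shape (P, M, C)
    scale = np.sqrt(stack.shape[-1])
    # Pairwise similarity between modalities at each position: (P, M, M).
    sim = np.einsum("pmc,pnc->pmn", stack, stack) / scale
    # Score each modality by its mean similarity to the other modalities.
    n_modalities = stack.shape[1]
    self_sim = np.einsum("pmm->pm", sim)  # diagonal entries
    scores = (sim.sum(axis=2) - self_sim) / max(n_modalities - 1, 1)
    # Attention weights over modalities, recorded per position.
    weights = softmax(scores, axis=1)  # shape (P, M)
    fused = np.einsum("pm,pmc->pc", weights, stack)
    return fused, weights, names

def to_heatmap(weight_column, height, width):
    """Reshape one modality's weights to an image and rescale to [0, 1]."""
    w = np.asarray(weight_column, dtype=np.float64).reshape(height, width)
    span = w.max() - w.min()
    return (w - w.min()) / span if span > 0 else np.zeros_like(w)
```

In this sketch, `features` would hold one entry per modality (for example under hypothetical keys such as "white_light", "nbi" and "confocal"), and one column of the returned weight map can be passed to `to_heatmap` so that larger weights map to brighter pixels. A trained interaction layer with learned projections would replace the fixed similarity score in practice.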
Preferably, the step of constructing a modal timing chart includes: S101, preliminarily positioning a lesion area in a digestive tract through a white light endoscope, acquiring a white light endoscope image of the lesion area, and then switching to an NBI endoscope mode in the same examination process to acquire an NBI image of the same lesion are