
CN-115994558-B - Pre-training method, device, equipment and storage medium of medical image coding network

CN 115994558 B

Abstract

Embodiments of the application disclose a pre-training method, apparatus, device and storage medium for a medical image coding network. The method comprises: obtaining a medical image divided into a plurality of image blocks; selecting an image block to be masked from the plurality of image blocks; masking the image block to be masked with a mask image block to obtain a masked image; obtaining a query feature and an average feature using a coding network and a momentum average network respectively, wherein the two networks have the same structure, and one of the query feature and the average feature is a global feature of the masked image while the other is a local feature of the mask image block; calculating a contrast loss based on the query feature and the average feature, and updating model parameters of the coding network according to the contrast loss; updating model parameters of the momentum average network according to the updated model parameters of the coding network; and continuing training until the coding network meets a pre-training stop condition. The method addresses the lack, in the related art, of a pre-training model for medical images.

Inventors

  • LI ANWEI

Assignees

  • 广州视源电子科技股份有限公司
  • 广州视源人工智能创新研究院有限公司

Dates

Publication Date
2026-05-12
Application Date
2021-10-18

Claims (14)

  1. A method for pre-training a medical image coding network, comprising: acquiring a three-dimensional medical image, wherein the medical image is divided into a plurality of image blocks of the same size; selecting an image block to be masked from the plurality of image blocks; masking the image block to be masked in the medical image with a mask image block to obtain a masked image; obtaining a query feature and an average feature using a coding network and a momentum average network respectively, wherein when the query feature is a first local feature of the mask image block, the average feature is a second global feature of the masked image, and when the query feature is a first global feature of the masked image, the average feature is a second local feature of the mask image block, the coding network and the momentum average network having the same structure; calculating a contrast loss based on the query feature and the average feature, and updating model parameters of the coding network according to the contrast loss; updating model parameters of the momentum average network according to the updated model parameters of the coding network; and continuing to acquire three-dimensional medical images and returning to the operation of selecting an image block to be masked until the coding network meets a pre-training stop condition; wherein the calculating a contrast loss based on the query feature and the average feature comprises: adding the currently obtained average feature to a dynamic dictionary, in which average features of other masked images are recorded; taking an average feature and a query feature belonging to the same masked image as a positive sample pair, and taking an average feature and a query feature belonging to different masked images as a negative sample pair; and calculating the contrast loss from the positive sample pair and the negative sample pairs.
  2. The pre-training method according to claim 1, further comprising: reconstructing a low-resolution image block from the first local feature or the second local feature; downsampling the image block to be masked to obtain a downsampled image block; and calculating a reconstruction loss from the low-resolution image block and the downsampled image block; wherein the updating model parameters of the coding network according to the contrast loss comprises: updating the model parameters of the coding network according to the contrast loss and the reconstruction loss.
  3. The pre-training method according to claim 2, wherein the reconstruction loss is calculated by a Smooth-L1 loss function.
  4. The pre-training method of claim 1, wherein the coding network is composed of a feature encoder, a first pooling layer, a projector, a second pooling layer, and a predictor, and obtaining the query feature using the coding network comprises: extracting features of the masked image by the feature encoder to obtain a three-dimensional feature map; acquiring, through the first pooling layer, a mask feature of the mask image block and an image feature of each other image block in the masked image from the three-dimensional feature map, wherein each image block corresponds to one image feature; mapping and projecting the mask feature and each image feature by the projector to obtain isolation features; fusing all the isolation features using the second pooling layer to obtain the first global feature of the masked image; and predicting, by the predictor, from the isolation feature corresponding to the mask image block to obtain the first local feature corresponding to the mask image block.
  5. The pre-training method of claim 4, wherein the feature encoder employs an asymmetric 3D convolutional network, the projector employs a first multi-layer perceptron network, and the predictor employs a second multi-layer perceptron network.
  6. The pre-training method of claim 4, wherein the downsampling ratio of the feature encoder is less than or equal to the size of the image block.
  7. The pre-training method of claim 6, wherein the image block size is an integer multiple of the downsampling ratio of the feature encoder, and wherein the first pooling layer employs ROI pooling.
  8. The pre-training method according to claim 1, wherein selecting an image block to be masked from the plurality of image blocks comprises: calculating the one-dimensional entropy of each image block; filtering the image blocks according to their one-dimensional entropy; and selecting one image block from the retained image blocks as the image block to be masked.
  9. The pre-training method of claim 8, wherein filtering the image blocks according to the one-dimensional entropy comprises: selecting, from all the one-dimensional entropies, those greater than a preset threshold, or selecting a preset number of the largest one-dimensional entropies after sorting them by size; and retaining the image blocks corresponding to the selected one-dimensional entropies.
  10. The pre-training method according to claim 1, further comprising, before masking the image block to be masked in the medical image with a mask image block: selecting the currently used mask image block from among a fixed-value image block, the image block to be masked itself, and the other image blocks of the medical image.
  11. The pre-training method of claim 10, wherein the fixed-value image block corresponds to a first selection probability, the other image blocks correspond to a second selection probability, and the image block to be masked corresponds to a third selection probability, the first selection probability being greater than the second selection probability and the second selection probability being greater than the third selection probability.
  12. A pre-training apparatus for a medical image coding network, comprising: an acquisition module, configured to acquire a three-dimensional medical image, the medical image being divided into a plurality of image blocks of the same size; a selection module, configured to select an image block to be masked from the plurality of image blocks; a masking module, configured to mask the image block to be masked in the medical image with a mask image block to obtain a masked image; a feature determination module, configured to obtain a query feature and an average feature using a coding network and a momentum average network respectively, wherein when the query feature is a first local feature of the mask image block, the average feature is a second global feature of the masked image, and when the query feature is a first global feature of the masked image, the average feature is a second local feature of the mask image block, the coding network and the momentum average network having the same structure; a first updating module, configured to calculate a contrast loss based on the query feature and the average feature and to update model parameters of the coding network according to the contrast loss; a second updating module, configured to update model parameters of the momentum average network according to the updated model parameters of the coding network; and a repeated-training module, configured to continue acquiring three-dimensional medical images and return to the operation of selecting an image block to be masked until the coding network meets the pre-training stop condition; wherein the first updating module comprises: a joining unit, configured to add the currently obtained average feature to a dynamic dictionary in which average features of other masked images are recorded; a sample-pair construction unit, configured to take an average feature and a query feature belonging to the same masked image as a positive sample pair and an average feature and a query feature belonging to different masked images as a negative sample pair; and a contrast learning unit, configured to calculate the contrast loss from the positive and negative sample pairs.
  13. A pre-training device for a medical image coding network, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for pre-training a medical image coding network according to any one of claims 1-11.
  14. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for pre-training a medical image coding network according to any one of claims 1-11.
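The contrast-loss construction in claims 1 and 12 (one positive pair from the same masked image, negatives drawn from a dynamic dictionary of other masked images) follows the general InfoNCE pattern used in momentum-contrast learning. Below is a minimal NumPy sketch of that loss; the temperature value and the cosine-similarity normalization are illustrative assumptions, as the claims do not specify a similarity measure.

```python
import numpy as np

def info_nce_loss(query, positive_key, queue, temperature=0.07):
    """Contrastive (InfoNCE) loss over a dynamic dictionary.

    query:        (D,) query feature from the coding network
    positive_key: (D,) average feature of the same masked image,
                  produced by the momentum average network
    queue:        (K, D) average features of other masked images
                  recorded in the dynamic dictionary (negatives)
    """
    q = query / np.linalg.norm(query)
    k_pos = positive_key / np.linalg.norm(positive_key)
    k_neg = queue / np.linalg.norm(queue, axis=1, keepdims=True)

    # Similarity logits: the positive first, then the K negatives.
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / temperature

    # Cross-entropy with the positive at index 0.
    logits -= logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]
```

The loss approaches zero when the query aligns with its positive key and none of the dictionary entries, which is what drives the two networks to agree on features of the same masked image.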
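Claims 2 and 3 add a Smooth-L1 reconstruction loss between the image block reconstructed from local features and a downsampled copy of the masked block. A minimal sketch, using the standard Smooth-L1 (Huber-style) definition with an assumed transition point beta = 1.0:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 reconstruction loss: quadratic for small errors,
    linear for large ones, averaged over all voxels."""
    d = np.abs(pred - target)
    loss = np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)
    return float(loss.mean())
```

Per claim 2, this term would be summed with the contrast loss before updating the coding network's parameters.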
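The entropy-based selection in claims 8 and 9 can be sketched as follows. This assumes intensities normalized to [0, 1] and a 256-bin histogram for the one-dimensional (intensity) entropy; the claims leave both choices open, and the threshold fallback is an illustrative assumption.

```python
import numpy as np

def one_dim_entropy(block, bins=256):
    """One-dimensional (intensity-histogram) entropy of an image block."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_block_to_mask(blocks, threshold, rng):
    """Keep blocks whose entropy exceeds the threshold (claim 9),
    then pick one retained block at random as the block to be masked."""
    entropies = [one_dim_entropy(b) for b in blocks]
    kept = [i for i, e in enumerate(entropies) if e > threshold]
    if not kept:  # fallback: keep the single highest-entropy block
        kept = [int(np.argmax(entropies))]
    return rng.choice(kept)
```

Filtering by entropy biases masking toward textured, information-rich regions rather than empty background, which is the practical point of claims 8-9.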
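Claims 10 and 11 choose what to mask with: a fixed-value block most often, some other block of the same image less often, and the block itself least often. A sketch of that sampling; the concrete probabilities (0.8, 0.15, 0.05) are assumptions, since the patent only requires the ordering p_fixed > p_other > p_self.

```python
import numpy as np

def choose_mask_block(fixed_block, to_mask, other_blocks, rng,
                      probs=(0.8, 0.15, 0.05)):
    """Pick the mask image block per claims 10-11: fixed-value block,
    another block of the image, or the block to be masked itself,
    with strictly decreasing probabilities."""
    choice = rng.choice(3, p=probs)
    if choice == 0:
        return fixed_block
    if choice == 1:
        return other_blocks[rng.integers(len(other_blocks))]
    return to_mask
```

Occasionally leaving the original block in place (the third case) is analogous to BERT-style masking, where the model must still predict tokens that were never corrupted.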

Description

Pre-training method, device, equipment and storage medium of medical image coding network

Technical Field

Embodiments of the application relate to the technical field of neural networks, and in particular to a pre-training method, device, equipment and storage medium of a medical image coding network.

Background

Deep learning (Deep Learning, DL) refers to learning the inherent regularities and representation hierarchies of sample data so that a machine can analyze and learn like a person. With the development of deep learning technology, deep learning models are widely used in medical image processing. For example, brain medical images are acquired by magnetic resonance (Magnetic Resonance, MR) examination, and brain gliomas in those images are segmented using a deep learning model to assist glioma detection. For another example, computed tomography (Computed Tomography, CT) is used to obtain lung medical images, and a deep learning model detects pulmonary nodules in them to assist nodule detection. As a further example, chest medical images are taken with a direct digital radiography (DR) system, and a deep learning model detects chest abnormalities to assist chest examination.

However, in contrast to tasks in which deep learning models process natural images (e.g., face images), tasks that process medical images lack a pre-training model built on a large dataset. A pre-training model can be regarded as a task-independent deep learning model obtained by pre-training on a big dataset; in application, it can be directly fine-tuned in combination with a specific task, so that it quickly becomes applicable to that task. At present, however, such a pre-training model for medical images is lacking. Therefore, when processing medical images, training of a deep learning model must start from randomly initialized model parameters, which places high demands on the amount of annotated medical images used for training and on the parameter-tuning strategy of the model, leading to difficult training, limited accuracy and insufficient generalization.

Disclosure of Invention

Embodiments of the application provide a pre-training method, device, equipment and storage medium of a medical image coding network, to solve the technical problem that a pre-training model for medical images is lacking in the related art.

In a first aspect, an embodiment of the present application provides a method for pre-training a medical image coding network, comprising: acquiring a three-dimensional medical image, wherein the medical image is divided into a plurality of image blocks of the same size; selecting an image block to be masked from the plurality of image blocks; masking the image block to be masked in the medical image with a mask image block to obtain a masked image; obtaining a query feature and an average feature using a coding network and a momentum average network respectively, wherein when the query feature is a first local feature of the mask image block, the average feature is a second global feature of the masked image, and when the query feature is a first global feature of the masked image, the average feature is a second local feature of the mask image block, the coding network and the momentum average network having the same structure; calculating a contrast loss based on the query feature and the average feature, and updating model parameters of the coding network according to the contrast loss; updating model parameters of the momentum average network according to the updated model parameters of the coding network; and continuing to acquire three-dimensional medical images and returning to the operation of selecting an image block to be masked until the coding network meets the pre-training stop condition.

In a second aspect, an embodiment of the present application further provides a pre-training apparatus of a medical image coding network, comprising: an acquisition module, configured to acquire a three-dimensional medical image, the medical image being divided into a plurality of image blocks of the same size; a selection module, configured to select an image block to be masked from the plurality of image blocks; a masking module, configured to mask the image block to be masked in the medical image with a mask image block to obtain a masked image; and a feature determination module, configured to obtain a query feature and an average feature using a coding network and a momentum average network respectively, wherein when the query feature is a first local feature of the mask image block, the average feature is a second global feature of the masked image
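The step "updating model parameters of the momentum average network according to the updated model parameters of the coding network" is typically an exponential moving average, as in momentum-contrast methods. A minimal sketch over a flat parameter dictionary; the momentum coefficient m = 0.999 is a common choice and an assumption here, since the patent does not fix its value.

```python
def momentum_update(coding_params, momentum_params, m=0.999):
    """EMA update of the momentum average network from the coding network:
    theta_momentum <- m * theta_momentum + (1 - m) * theta_coding.
    Only the coding network receives gradients; the momentum network
    just tracks it slowly, which keeps the dictionary keys consistent."""
    return {name: m * momentum_params[name] + (1.0 - m) * coding_params[name]
            for name in momentum_params}
```

Because m is close to 1, the momentum network evolves smoothly, so average features already stored in the dynamic dictionary remain comparable with newly computed ones.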