CN-121982468-A - Image embedding processing method and device, electronic equipment and storage medium

Abstract

The invention discloses an image embedding processing method, an image embedding processing device, electronic equipment and a storage medium. The method comprises: obtaining a first medical image to be processed and determining a first image processing model corresponding to the first medical image; and inputting the first medical image into the first image processing model and outputting a first embedding vector corresponding to the first medical image. The first image processing model comprises at least a feature extraction module, a feature fusion module and an embedding processing module. The feature extraction module extracts features of the first medical image at multiple scales, the feature fusion module obtains a fused feature map by fusing global features and local features of the first medical image, and the embedding processing module maps the fused feature map into an embedding vector of a preset dimension. With this structurally improved first image processing model, medical images can be embedded efficiently and with high precision.

Inventors

ZHANG XINGLIN
LIU JIAQI
MA BINGQI
LIU YONGHUI

Assignees

上海影苗智能科技有限公司

Dates

Publication Date: 2026-05-05
Application Date: 2026-03-30

Claims (10)

1. An image embedding processing method, characterized in that the method comprises: obtaining a first medical image to be processed, and determining a first image processing model corresponding to the first medical image, wherein the first image processing model comprises at least a feature extraction module, a feature fusion module and an embedding processing module, the feature extraction module is used for extracting features of the first medical image at multiple scales, the feature fusion module is used for obtaining a fused feature map by fusing global features and local features of the first medical image, and the embedding processing module is used for mapping the fused feature map into an embedding vector of a preset dimension; and inputting the first medical image into the first image processing model, and outputting a first embedding vector corresponding to the first medical image.
2. The method of claim 1, wherein inputting the first medical image into the first image processing model and outputting the first embedding vector corresponding to the first medical image comprises: inputting the first medical image to the feature extraction module to extract image features of the first medical image at different scales, so as to obtain a first feature map; inputting the first feature map to the feature fusion module to fuse the global features and the local features of the first medical image, so as to obtain the fused feature map; and inputting the fused feature map to the embedding processing module to perform feature compression and standardization on the fused feature map, so as to obtain the first embedding vector corresponding to the first medical image.
3. The method of claim 2, wherein the feature extraction module comprises a depthwise separable convolution unit, a convolution processing unit and a concatenation unit, wherein the depthwise separable convolution unit comprises at least one depthwise separable convolution layer, the convolution processing unit comprises at least two parallel convolution branches, at least two of the convolution branches each comprise a standard convolution layer and a dilated (atrous) convolution layer, and the convolution kernels of the standard convolution layers in the at least two convolution branches differ in size; and wherein inputting the first medical image to the feature extraction module to extract image features of the first medical image at different scales, so as to obtain the first feature map, comprises: inputting the first medical image to the depthwise separable convolution unit to convolve the spatial features and the channel features of the first medical image separately, so as to obtain a second feature map; inputting the second feature map into each convolution branch of the convolution processing unit to obtain a third feature map output by each convolution branch; and inputting the plurality of third feature maps into the concatenation unit for feature concatenation, so as to obtain the first feature map.
4. The method of claim 2, wherein the feature fusion module comprises at least a channel attention unit, a spatial attention unit and a fusion unit, and wherein inputting the first feature map to the feature fusion module to fuse the global features and the local features of the first medical image, so as to obtain the fused feature map, comprises: inputting the first feature map to the channel attention unit to adaptively learn attention weights for a plurality of channels of the first feature map and output a global feature map; inputting the global feature map to the spatial attention unit to adaptively learn attention weights for a plurality of spatial positions of the global feature map and output a local feature map; and inputting the global feature map and the local feature map to the fusion unit, and outputting the fused feature map.
5. The method of claim 2, wherein the embedding processing module comprises at least a first compression unit, a second compression unit and a projection unit, the first compression unit comprising at least one convolutional layer; and wherein inputting the fused feature map to the embedding processing module to perform feature compression and standardization on the fused feature map, so as to obtain the first embedding vector corresponding to the first medical image, comprises: inputting the fused feature map to the first compression unit for a first convolution operation that compresses channel information of the fused feature map, so as to obtain a fourth feature map; inputting the fourth feature map to the second compression unit for an adaptive average pooling operation that compresses the spatial size of the fourth feature map, so as to obtain a fifth feature map; and inputting the fifth feature map to the projection unit for flattening and normalization, so as to obtain the first embedding vector of the preset dimension.
6. The method of claim 1, wherein the first image processing model is trained by: acquiring a plurality of second medical images, wherein each second medical image is annotated with a pathology result label indicating the pathology type of that second medical image; for each second medical image, extracting features of the second medical image through a first machine learning model to obtain a second embedding vector corresponding to the second medical image; determining an embedding vector discrimination loss of the first machine learning model based on the second embedding vectors and the pathology result labels corresponding to the plurality of second medical images; and adjusting parameters of the first machine learning model based on the embedding vector discrimination loss corresponding to the plurality of second medical images, so as to obtain the first image processing model.
7. The method of claim 6, wherein determining the embedding vector discrimination loss of the first machine learning model based on the second embedding vectors and the pathology result labels corresponding to the plurality of second medical images comprises: determining a positive sample pair and a hard sample pair from the plurality of second medical images based on the second embedding vectors and the pathology result labels corresponding to the plurality of second medical images, wherein the similarity between the second embedding vectors of the two second medical images in the hard sample pair is greater than a preset similarity threshold, and the pathology result labels of those two second medical images are different; determining a first loss according to the similarity distance between the second embedding vectors of the two second medical images in the positive sample pair; determining a second loss according to the similarity distance between the second embedding vectors of the two second medical images in the hard sample pair; and computing a weighted sum of the first loss and the second loss to determine the embedding vector discrimination loss.
8. An image embedding processing apparatus, characterized in that the apparatus comprises: a data acquisition module, configured to acquire a first medical image to be processed and determine a first image processing model corresponding to the first medical image, wherein the first image processing model comprises at least a feature extraction module, a feature fusion module and an embedding processing module, the feature extraction module is used for extracting features of the first medical image at multiple scales, the feature fusion module is used for obtaining a fused feature map by fusing global features and local features of the first medical image, and the embedding processing module is used for mapping the fused feature map into an embedding vector of a preset dimension; and an image processing module, configured to input the first medical image into the first image processing model and output a first embedding vector corresponding to the first medical image.
9. An electronic device, the electronic device comprising: one or more processors; and storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image embedding processing method of any of claims 1-7.
10. A storage medium containing computer executable instructions which, when executed by a computer processor, perform the image embedding processing method as claimed in any one of claims 1 to 7.
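
For illustration only, the forward pass recited in claims 2 to 5 can be sketched in plain Python on a toy image. This is a minimal sketch under stated assumptions, not the applicant's implementation: all weights are random and untrained, the attention gates are parameter-free stand-ins for the learned attention weights, the fusion unit is taken to be an element-wise sum, the kernel sizes (3 and 5), the dilation rate (2), the standard-then-dilated ordering inside each branch, the 1x1 pooled output size and the embedding dimension (8) are arbitrary choices, and every function name (`conv2d`, `rand_w`, `extract`, `fuse`, `embed`) is hypothetical.

```python
import math
import random

def conv2d(x, w, dilation=1):
    """Zero-padded 'same' convolution. x: [C_in][H][W], w: [C_out][C_in][k][k], k odd."""
    h, wd, k = len(x[0]), len(x[0][0]), len(w[0][0])
    pad = dilation * (k // 2)

    def at(c, i, j):  # positions outside the image count as zero
        return x[c][i][j] if 0 <= i < h and 0 <= j < wd else 0.0

    return [[[sum(wo[c][a][b] * at(c, i + a * dilation - pad, j + b * dilation - pad)
                  for c in range(len(x)) for a in range(k) for b in range(k))
              for j in range(wd)] for i in range(h)] for wo in w]

def rand_w(c_out, c_in, k, rng):
    """Random, untrained kernel weights (assumption: stands in for learned parameters)."""
    return [[[[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mean2d(ch):
    return sum(map(sum, ch)) / (len(ch) * len(ch[0]))

def extract(img, rng, c=4):
    """Claim 3: depthwise separable convolution, parallel branches, concatenation."""
    # Depthwise step: one 3x3 kernel per input channel (spatial features).
    dw = [conv2d([ch], rand_w(1, 1, 3, rng))[0] for ch in img]
    # Pointwise step: 1x1 convolution that mixes channels (channel features).
    second = conv2d(dw, rand_w(c, len(img), 1, rng))
    thirds = []
    for k in (3, 5):  # the branches differ in standard-kernel size
        std = conv2d(second, rand_w(c, c, k, rng))
        thirds.append(conv2d(std, rand_w(c, c, 3, rng), dilation=2))  # dilated layer
    return [ch for t in thirds for ch in t]  # channel-wise concatenation

def fuse(first):
    """Claim 4: channel attention, then spatial attention, then fusion."""
    # Channel attention: scale each channel by a gate from its global average.
    glob = [[[v * sigmoid(mean2d(ch)) for v in row] for row in ch] for ch in first]
    c, h, wd = len(glob), len(glob[0]), len(glob[0][0])
    # Spatial attention: scale each position by a gate from the cross-channel mean.
    gate = [[sigmoid(sum(g[i][j] for g in glob) / c) for j in range(wd)] for i in range(h)]
    local = [[[g[i][j] * gate[i][j] for j in range(wd)] for i in range(h)] for g in glob]
    # Fusion unit: element-wise sum of global and local maps (assumption).
    return [[[g[i][j] + loc[i][j] for j in range(wd)] for i in range(h)]
            for g, loc in zip(glob, local)]

def embed(fused, rng, dim=8):
    """Claim 5: 1x1 convolution, average pooling, flattening, L2 normalization."""
    fourth = conv2d(fused, rand_w(dim, len(fused), 1, rng))  # compress channels
    fifth = [mean2d(ch) for ch in fourth]  # average-pool each channel to 1x1, already flat
    norm = math.sqrt(sum(v * v for v in fifth)) or 1.0
    return [v / norm for v in fifth]

rng = random.Random(0)
image = [[[rng.random() for _ in range(8)] for _ in range(8)]]  # 1-channel 8x8 toy image
vec = embed(fuse(extract(image, rng)), rng)
print(len(vec), sum(v * v for v in vec))
```

The two branches give receptive fields of different sizes over the same second feature map, which is one way to read "features at multiple scales"; the final L2 normalization makes embeddings directly comparable by cosine similarity.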

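The training objective of claims 6 and 7 can likewise be sketched in plain Python. The claims do not fix the exact formulas, so the following rests on assumptions: a positive sample pair is taken to be two images with the same pathology result label, similarity is cosine similarity, the first loss is the mean of (1 - similarity) over positive pairs, the second loss is the mean excess of similarity over the threshold for hard pairs, and the threshold and weights are arbitrary. The names `cosine`, `mine_pairs` and `discrimination_loss`, and the labels "benign" and "malignant", are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mine_pairs(embeddings, labels, sim_threshold):
    """Split all image pairs into positive pairs and hard pairs."""
    positives, hard = [], []
    for i, j in combinations(range(len(labels)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            positives.append((i, j, sim))   # same label (assumed definition)
        elif sim > sim_threshold:
            hard.append((i, j, sim))        # different labels but similar embeddings
    return positives, hard

def discrimination_loss(embeddings, labels, sim_threshold=0.5, w_pos=1.0, w_hard=1.0):
    """Weighted sum of a positive-pair loss and a hard-pair loss."""
    positives, hard = mine_pairs(embeddings, labels, sim_threshold)
    # First loss: pull positive pairs together.
    first = sum(1.0 - s for _, _, s in positives) / len(positives) if positives else 0.0
    # Second loss: push hard pairs apart until they fall below the threshold.
    second = sum(s - sim_threshold for _, _, s in hard) / len(hard) if hard else 0.0
    return w_pos * first + w_hard * second

embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
labels = ["benign", "benign", "malignant", "malignant"]
positives, hard = mine_pairs(embeddings, labels, sim_threshold=0.5)
loss = discrimination_loss(embeddings, labels)
print([(i, j) for i, j, _ in positives], [(i, j) for i, j, _ in hard], loss)
```

In a real training loop the loss would be computed on the second embedding vectors produced by the first machine learning model and back-propagated to adjust its parameters; this sketch only shows the pair selection and the weighted sum.
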
Description

Image embedding processing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer processing technologies, and in particular to an image embedding processing method and apparatus, an electronic device, and a storage medium.

Background

In the medical imaging field, image embedding is currently generally performed with a Vision Transformer (ViT) model. However, a ViT model relies on global modeling: it aggregates the features of image patches through global attention, so fine local features of the image are easily overlooked and the representation of image detail becomes blurred. In addition, ViT models need large-scale pre-training data (on the order of millions of samples or more) to converge effectively. When sample data are scarce, overfitting and weak generalization readily occur, which degrades model performance and makes it difficult to meet practical processing requirements in the medical imaging field.

Disclosure of Invention

The invention provides an image embedding processing method, an image embedding processing device, electronic equipment and a storage medium, which can embed medical images efficiently and with high precision by means of a structurally improved first image processing model.

In a first aspect, the present invention provides an image embedding processing method, including: obtaining a first medical image to be processed, and determining a first image processing model corresponding to the first medical image, wherein the first image processing model comprises at least a feature extraction module, a feature fusion module and an embedding processing module, the feature extraction module is used for extracting features of the first medical image at multiple scales, the feature fusion module is used for obtaining a fused feature map by fusing global features and local features of the first medical image, and the embedding processing module is used for mapping the fused feature map into an embedding vector of a preset dimension; and inputting the first medical image into the first image processing model, and outputting a first embedding vector corresponding to the first medical image.

In a second aspect, the present invention also provides an image embedding processing apparatus, including: a data acquisition module, configured to acquire a first medical image to be processed and determine a first image processing model corresponding to the first medical image, wherein the first image processing model comprises at least a feature extraction module, a feature fusion module and an embedding processing module, the feature extraction module is used for extracting features of the first medical image at multiple scales, the feature fusion module is used for obtaining a fused feature map by fusing global features and local features of the first medical image, and the embedding processing module is used for mapping the fused feature map into an embedding vector of a preset dimension; and an image processing module, configured to input the first medical image into the first image processing model and output a first embedding vector corresponding to the first medical image.

In a third aspect, an embodiment of the present invention further provides an electronic device, including: one or more processors; and storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement an image embedding processing method as provided in any embodiment of the present invention.

In a fourth aspect, an embodiment of the invention also provides a storage medium containing computer executable instructions which, when executed by a computer processor, perform an image embedding processing method as provided in any embodiment of the invention.

According to the technical scheme, a first medical image to be processed is obtained, and a first image processing model corresponding to the first medical image is determined. The first image processing model comprises at least a feature extraction module, a feature fusion module and an embedding processing module. The feature extraction module extracts features of the first medical image at multiple scales, so that lesion-related features in the first medical image can be captured at multiple scales. The feature fusion module obtains a fused feature map by fusing global features and local features of the first medical image, thereby fusing shallow fine-grained texture features with deep semantic features. The embedding processing module maps the fused feature map into an embedding vector of a preset dimension, and the first medical image