CN-117292139-B - Training method of feature extraction model, image processing method, device and medium
Abstract
The application discloses a training method for a feature extraction model, together with an image processing method, a device, and a medium. The training method comprises: acquiring a sample image; performing feature extraction on the sample image with a student feature extraction model and with a teacher feature extraction model, respectively, to obtain a plurality of initial feature maps of different sizes; performing feature fusion on the initial feature maps with the student feature extraction model and with the teacher feature extraction model, respectively, to obtain target feature maps corresponding to the initial feature maps, the number of target feature maps being equal to the number of initial feature maps; and adjusting parameters of the student feature extraction model using a similarity loss between the target feature maps obtained by the student feature extraction model and those obtained by the teacher feature extraction model. This scheme improves the accuracy of the trained student feature extraction model in subsequent image processing.
Inventors
- Xi Yingzhuo
- Zhang Chengcheng
- Ma Ziang
Assignees
- Hangzhou Huacheng Software Technology Co., Ltd. (杭州华橙软件技术有限公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2023-08-30
Claims (12)
- 1. A method of training a feature extraction model, comprising: acquiring a sample image; performing feature extraction on the sample image with a student feature extraction model and with a teacher feature extraction model, respectively, to obtain a plurality of initial feature maps of different sizes, wherein the complexity of the student feature extraction model is lower than that of the teacher feature extraction model; performing feature fusion on the plurality of initial feature maps with the student feature extraction model and with the teacher feature extraction model, respectively, to obtain target feature maps corresponding to the initial feature maps, wherein one initial feature map is taken as a start feature map, the remaining initial feature maps are taken as intermediate feature maps, and the start feature map has only one adjacently-sized initial feature map; for each intermediate feature map, performing feature fusion on the intermediate feature map and the adjacent preceding start feature map, or the fused feature map corresponding to the adjacent preceding intermediate feature map, to obtain a fused feature map corresponding to each intermediate feature map; performing feature fusion on the start feature map and the fused feature maps corresponding to the intermediate feature maps to obtain the target feature maps, the number of target feature maps being equal to the number of initial feature maps; and adjusting parameters of the student feature extraction model using a similarity loss between the plurality of target feature maps obtained by the student feature extraction model and the plurality of target feature maps obtained by the teacher feature extraction model, wherein the trained student feature extraction model can perform feature extraction and feature fusion on an image to be processed in a subsequent image processing process.
- 2. The training method according to claim 1, wherein before adjusting the parameters using the similarity loss between the plurality of target feature maps obtained by the student feature extraction model and the plurality of target feature maps obtained by the teacher feature extraction model, the training method further comprises: obtaining a similarity for each target feature map group, wherein each target feature map group comprises one target feature map obtained by the student feature extraction model and one target feature map obtained by the teacher feature extraction model, and the initial feature maps corresponding to the two target feature maps in a group have the same size; and determining the similarity loss based on the similarities.
- 3. The training method according to claim 1, wherein, for each intermediate feature map, performing feature fusion on the intermediate feature map and the adjacent preceding start feature map, or the fused feature map corresponding to the adjacent preceding intermediate feature map, to obtain the fused feature map corresponding to each intermediate feature map comprises: for the first intermediate feature map, concatenating it with the start feature map after the start feature map passes through a dimension-reduction module, to obtain the fused feature map corresponding to the first intermediate feature map; and for each non-first intermediate feature map, concatenating it with the fused feature map corresponding to the adjacent preceding intermediate feature map after that fused feature map passes through the dimension-reduction module, to obtain the fused feature map corresponding to each non-first intermediate feature map.
- 4. The training method according to claim 1, wherein performing feature fusion on the start feature map and the fused feature maps corresponding to the intermediate feature maps to obtain the target feature maps comprises: taking the fused feature map corresponding to the last intermediate feature map as a start fused feature map and the other fused feature maps as intermediate fused feature maps; for each intermediate fused feature map, performing feature fusion on the intermediate fused feature map and the adjacent preceding start fused feature map, or the advanced feature map corresponding to the adjacent preceding intermediate fused feature map, to obtain an advanced feature map corresponding to each intermediate fused feature map; performing feature fusion on the start feature map and the last advanced feature map to obtain an advanced feature map corresponding to the start feature map; and obtaining a target feature map corresponding to the start fused feature map and a target feature map corresponding to each advanced feature map, based on the start fused feature map and the advanced feature maps.
- 5. The training method according to claim 4, wherein, for each intermediate fused feature map, performing feature fusion on the intermediate fused feature map and the adjacent preceding start fused feature map, or the advanced feature map corresponding to the adjacent preceding intermediate fused feature map, to obtain the advanced feature map corresponding to each intermediate fused feature map comprises: for the first intermediate fused feature map, concatenating it with the start fused feature map after downsampling, to obtain the advanced feature map corresponding to the first intermediate fused feature map; and for each non-first intermediate fused feature map, concatenating it with the advanced feature map corresponding to the adjacent preceding intermediate fused feature map after downsampling, to obtain the advanced feature map corresponding to each non-first intermediate fused feature map; and wherein performing feature fusion on the start feature map and the last advanced feature map to obtain the advanced feature map corresponding to the start feature map comprises: concatenating the start feature map with the last advanced feature map after downsampling, to obtain the advanced feature map corresponding to the start feature map.
- 6. The training method according to claim 4, wherein obtaining the target feature map corresponding to the start fused feature map and the target feature map corresponding to each advanced feature map, based on the start fused feature map and the advanced feature maps, comprises: processing the start fused feature map and each advanced feature map through a dimension-reduction module to obtain the target feature map corresponding to the start fused feature map and the target feature map corresponding to each advanced feature map.
- 7. The training method according to claim 1, wherein the student feature extraction model and the teacher feature extraction model each include a plurality of cascaded channel transformation modules and a plurality of mapping modules, the plurality of initial feature maps includes an initial feature map corresponding to each channel transformation module, and performing feature extraction on the sample image with the student feature extraction model and with the teacher feature extraction model, respectively, to obtain a plurality of initial feature maps of different sizes comprises: preprocessing the sample image with the student feature extraction model and with the teacher feature extraction model, respectively, to obtain a preprocessed feature map; for the first channel transformation module, processing the preprocessed feature map to obtain the feature map output by the first channel transformation module; for each non-first channel transformation module, processing the feature map output by the preceding channel transformation module to obtain the feature map corresponding to each non-first channel transformation module; and mapping the feature maps output by the channel transformation modules with the mapping modules to obtain the initial feature map corresponding to each channel transformation module, wherein the number of channels of the initial feature map corresponding to a channel transformation module is smaller than that of the feature map output by that channel transformation module.
- 8. The training method according to claim 7, wherein each channel transformation module includes a first channel transformation sub-module with a stride of 2 and a second channel transformation sub-module with a stride of 1; processing the preprocessed feature map for the first channel transformation module to obtain the feature map output by the first channel transformation module comprises: processing the preprocessed feature map with the first channel transformation sub-module to obtain a first advanced feature map; and processing the first advanced feature map with the second channel transformation sub-module to obtain the feature map output by the first channel transformation module; and processing the feature map output by the preceding channel transformation module for each non-first channel transformation module to obtain the feature map corresponding to each non-first channel transformation module comprises: processing the feature map output by the preceding channel transformation module with the first channel transformation sub-module to obtain a second advanced feature map; and processing the second advanced feature map with the second channel transformation sub-module to obtain the feature map output by the non-first channel transformation module.
- 9. An image processing method, comprising: acquiring an image to be processed; processing the image to be processed with a student feature extraction model in an image processing model to obtain a plurality of target feature maps corresponding to the image to be processed, wherein the student feature extraction model is trained with the training method according to any one of claims 1 to 8; and performing image processing, with a detection head model in the image processing model, based on the plurality of target feature maps corresponding to the image to be processed, to obtain an image processing result for the image to be processed.
- 10. The image processing method according to claim 9, wherein the image processing includes target classification, and performing image processing, with the detection head model in the image processing model, based on the plurality of target feature maps corresponding to the image to be processed, to obtain the image processing result for the image to be processed comprises: convolving the plurality of target feature maps with a first compact convolution to obtain a plurality of convolution feature maps; inputting the plurality of convolution feature maps into a second compact convolution to obtain a classification result, wherein the classification result indicates whether a target object exists in the image to be processed; and inputting the plurality of convolution feature maps into a third compact convolution to obtain the position of the target object in the image to be processed and a confidence for the position.
- 11. An electronic device comprising a memory and a processor, wherein the memory stores program instructions and the processor invokes the program instructions from the memory to perform the method of any of claims 1-10.
- 12. A computer-readable storage medium having stored thereon program instructions which, when executed by a processor, carry out the method according to any of claims 1-10.
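The claims require only *a* similarity loss over paired student/teacher target feature maps (claims 1 and 2); they do not fix a particular measure. The following minimal NumPy sketch, under the assumption of a cosine-similarity loss per same-size pair (a common choice in feature distillation), illustrates the pairing and aggregation; every name here is hypothetical.

```python
import numpy as np

def similarity_loss(student_maps, teacher_maps):
    """Loss over paired student/teacher target feature maps.

    Each pair holds maps derived from same-sized initial feature maps
    (claim 2). The cosine measure is an assumption; the patent only
    requires that a similarity loss be computed per pair and combined.
    """
    assert len(student_maps) == len(teacher_maps)
    total = 0.0
    for s, t in zip(student_maps, teacher_maps):
        s, t = s.ravel(), t.ravel()
        cos = s @ t / (np.linalg.norm(s) * np.linalg.norm(t) + 1e-12)
        total += 1.0 - cos  # aligned maps contribute (near-)zero loss
    return total / len(student_maps)

# toy usage: three scales of (channels, height, width) feature maps
s_maps = [np.random.rand(8, 2 ** k, 2 ** k) for k in (2, 3, 4)]
t_maps = [np.random.rand(8, 2 ** k, 2 ** k) for k in (2, 3, 4)]
loss = similarity_loss(s_maps, t_maps)  # scalar in [0, 2]
```

In a real training loop this scalar would be backpropagated through the student model only, with the teacher held fixed.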
Description
Training method of feature extraction model, image processing method, device and medium

Technical Field

The present application relates to the field of computer technologies, and in particular to a training method for a feature extraction model, and to an image processing method, device, and medium.

Background

Image processing models (such as models for target detection, target recognition, and target classification) are becoming increasingly widespread. Target detection and target recognition are important tasks in computer vision, aimed at detecting images or videos and locating the positions of specific objects. However, target detection, recognition, and classification models are generally highly complex, and such models cannot be deployed on resource-constrained or embedded devices. Less complex models, in turn, suffer from lower accuracy in target detection, recognition, or classification. There is therefore an urgent need for a model with complexity low enough for deployment on embedded devices that still offers high detection performance and efficient inference.

Disclosure of Invention

The application provides at least a training method for a feature extraction model, an image processing method, a device, and a medium.
The application provides a training method for a feature extraction model, comprising: acquiring sample images; performing feature extraction on the sample images with a student feature extraction model and with a teacher feature extraction model, respectively, to obtain a plurality of initial feature maps of different sizes, wherein the complexity of the student feature extraction model is lower than that of the teacher feature extraction model; performing feature fusion on the initial feature maps with the student feature extraction model and with the teacher feature extraction model, respectively, to obtain target feature maps corresponding to the initial feature maps, the number of target feature maps being equal to the number of initial feature maps; and adjusting parameters of the student feature extraction model using a similarity loss between the target feature maps obtained by the student feature extraction model and those obtained by the teacher feature extraction model, wherein the trained student feature extraction model can perform feature extraction and feature fusion on images to be processed in a subsequent image processing process.
The application provides a training device for a feature extraction model, comprising an acquisition module, an extraction module, a fusion module, and an adjustment module. The acquisition module acquires sample images. The extraction module performs feature extraction on the sample images with a student feature extraction model and with a teacher feature extraction model, respectively, to obtain a plurality of initial feature maps of different sizes, the complexity of the student feature extraction model being lower than that of the teacher feature extraction model. The fusion module performs feature fusion on the plurality of initial feature maps with the student feature extraction model and with the teacher feature extraction model, respectively, to obtain target feature maps corresponding to the initial feature maps, the number of target feature maps being equal to the number of initial feature maps. The adjustment module adjusts parameters of the student feature extraction model using a similarity loss between the plurality of target feature maps obtained by the student feature extraction model and the plurality of target feature maps obtained by the teacher feature extraction model; the trained student feature extraction model can perform feature extraction and feature fusion on images to be processed in a subsequent image processing process.
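The extraction module's backbone is specified in claims 7-8 as cascaded channel transformation modules (each with a stride-2 and a stride-1 sub-module) followed by channel-reducing mapping modules. A minimal NumPy sketch of that wiring, with 1x1 convolutions standing in for the unspecified sub-modules and all weight shapes assumed for illustration:

```python
import numpy as np

def conv1x1(x, w, stride=1):
    """1x1 convolution on a (C, H, W) map; the stride subsamples the
    spatial axes, mimicking the stride-2 sub-module of claim 8."""
    return np.einsum('oc,chw->ohw', w, x[:, ::stride, ::stride])

def channel_transform_block(x, w_stride2, w_stride1):
    """One channel transformation module: a stride-2 sub-module that
    halves the spatial size, then a stride-1 sub-module."""
    return conv1x1(conv1x1(x, w_stride2, stride=2), w_stride1)

def backbone(pre, blocks, mapping_ws):
    """Cascaded channel transformation modules (claim 7); each block
    output is mapped to fewer channels by a mapping module to give
    one initial feature map per block."""
    initial_maps, feat = [], pre
    for (w2, w1), wm in zip(blocks, mapping_ws):
        feat = channel_transform_block(feat, w2, w1)
        initial_maps.append(conv1x1(feat, wm))  # mapping module
    return initial_maps

# toy usage: a preprocessed 8-channel 32x32 map through two blocks
pre = np.random.rand(8, 32, 32)
blocks = [(np.random.rand(16, 8), np.random.rand(16, 16)),
          (np.random.rand(32, 16), np.random.rand(32, 32))]
mapping_ws = [np.random.rand(4, 16), np.random.rand(6, 32)]
maps = backbone(pre, blocks, mapping_ws)  # shapes (4,16,16), (6,8,8)
```

Note the mapping modules output fewer channels (4 and 6) than the corresponding block outputs (16 and 32), as claim 7 requires.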
The application provides an image processing method, comprising: acquiring an image to be processed; processing the image to be processed with a student feature extraction model in an image processing model to obtain a plurality of target feature maps corresponding to the image to be processed, wherein the student feature extraction model is trained with the training method above; and performing image processing, with a detection head model in the image processing model, based on the plurality of target feature maps corresponding to the image to be processed, to obtain an image processing result for the image to be processed.

The application provides an image processing device, comprising an image acquisition module, a first image processing module, and a second image processing module, wherein the image acquisition module is used fo
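The detection head of claim 10 runs a first convolution over the target feature maps, then two parallel convolutions: one for the classification result and one for the target position with its confidence. A NumPy sketch of that branch structure, approximating each unspecified "compact convolution" with a plain 1x1 convolution and assuming a 5-channel (x, y, w, h, confidence) location output; all names and shapes are illustrative.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution (channel mixing) on a (C, H, W) feature map."""
    return np.einsum('oc,chw->ohw', w, x)

def detection_head(target_maps, w_first, w_cls, w_loc):
    """Claim 10 sketch: a shared first convolution per scale, a second
    producing per-cell class scores (is a target object present?),
    and a third producing box coordinates plus a confidence."""
    results = []
    for fmap in target_maps:
        feat = conv1x1(fmap, w_first)      # first compact convolution
        cls_scores = conv1x1(feat, w_cls)  # second: classification
        loc = conv1x1(feat, w_loc)         # third: position + confidence
        results.append((cls_scores, loc))
    return results

# toy usage: two scales of 24-channel target feature maps
tmaps = [np.random.rand(24, 8, 8), np.random.rand(24, 4, 4)]
w_first = np.random.rand(32, 24)
w_cls = np.random.rand(2, 32)   # object / no object
w_loc = np.random.rand(5, 32)   # (x, y, w, h, confidence)
out = detection_head(tmaps, w_first, w_cls, w_loc)
```

Each per-scale output pair keeps the spatial grid of its target feature map, so predictions remain localized per cell.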