CN-115731408-B - Training method and device for image processing model, electronic equipment and storage medium
Abstract
The invention provides a training method, apparatus, device, and storage medium for an image processing model comprising a visual transformation network and a category weight network. The method comprises: inputting at least one cropped sub-image corresponding to an enhanced sample image into the visual transformation network to obtain an encoded sub-image corresponding to each cropped sub-image; inputting any encoded sub-image, the cropped sub-image corresponding to that encoded sub-image, and the encoded image obtained by stitching all the encoded sub-images into the category weight network to obtain an anti-enhancement sample image corresponding to the enhanced sample image; and adjusting parameters of the visual transformation network and the category weight network based on the anti-enhancement sample image and a standard sample image corresponding to the enhanced sample image.
Inventors
- TAO SHENG
- LI CHUNLI
- LI HONGJIE
- ZHAO HONGFENG
- ZHU JIANGLIN
- LIN ZHONGKANG
Assignees
- 上海励驰半导体有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20221121
Claims (13)
- 1. A method of training an image processing model, the image processing model comprising a visual transformation network and a category weight network, the method comprising: inputting at least one cropped sub-image corresponding to an enhanced sample image into the visual transformation network, and obtaining an encoded sub-image corresponding to each of the at least one cropped sub-image; inputting any encoded sub-image, the cropped sub-image corresponding to the encoded sub-image, and an encoded image obtained by stitching all the encoded sub-images into the category weight network, and obtaining an anti-enhancement sample image corresponding to the enhanced sample image, wherein the category weight network comprises a first neural network layer, a third neural network layer, a mask, a classification coding layer, and a convolutional decoder; and adjusting parameters of the visual transformation network and the category weight network based on the anti-enhancement sample image and a standard sample image corresponding to the enhanced sample image.
- 2. The method according to claim 1, wherein before inputting the at least one cropped sub-image corresponding to the enhanced sample image into the visual transformation network and obtaining the encoded sub-images, the method further comprises: preprocessing an original sample image to obtain the standard sample image; performing enhancement processing on the standard sample image to obtain the enhanced sample image; and cropping the enhanced sample image to obtain the at least one cropped sub-image corresponding to the enhanced sample image (see the preprocessing sketch after the claims).
- 3. The method according to claim 1, wherein inputting the at least one cropped sub-image corresponding to the enhanced sample image into the visual transformation network and obtaining the encoded sub-image corresponding to each cropped sub-image comprises: inputting the at least one cropped sub-image into a linear layer included in the visual transformation network, and taking the output of the linear layer as the linear projection corresponding to each cropped sub-image; and inputting the linear projection corresponding to each cropped sub-image, together with the classification identifier of each cropped sub-image within the enhanced sample image, into an encoder included in the visual transformation network, and taking the output of the encoder as the encoded sub-image corresponding to each cropped sub-image (see the encoder sketch after the claims).
- 4. The method according to claim 1, wherein inputting any encoded sub-image, the cropped sub-image corresponding to the encoded sub-image, and the encoded image obtained by stitching all the encoded sub-images into the category weight network, and obtaining the anti-enhancement sample image corresponding to the enhanced sample image, comprises performing the following processing on each encoded sub-image and its corresponding cropped sub-image: inputting the encoded sub-image into a first neural network layer included in the category weight network, and obtaining a first coding feature corresponding to the encoded sub-image; inputting the cropped sub-image corresponding to the encoded sub-image into a second neural network layer included in the category weight network, and obtaining a second coding feature corresponding to the cropped sub-image; and inputting the encoded image into a third neural network layer included in the category weight network, and obtaining a third coding feature corresponding to the encoded image.
- 5. The method according to claim 4, wherein the processing performed on each encoded sub-image and its corresponding cropped sub-image further comprises: obtaining the autocorrelation strength of the first coding feature based on the first coding feature; masking the third coding feature to obtain a first matrix; and dividing the autocorrelation strength into a non-masked region and a masked region based on the first matrix.
- 6. The method according to claim 5, wherein the processing performed on each encoded sub-image and its corresponding cropped sub-image further comprises: determining the pixel-weighted output of the non-masked region based on the pixel weighting of the non-masked region and the second coding feature; determining the pixel-weighted output of the masked region based on the pixel weighting of the masked region and the first coding feature; determining a first stitching coefficient and a second stitching coefficient based on the non-masked region and the masked region; determining the anti-enhancement encoded sub-image corresponding to the cropped sub-image based on the first stitching coefficient, the second stitching coefficient, the pixel-weighted output of the non-masked region, and the pixel-weighted output of the masked region; stitching the anti-enhancement encoded sub-images corresponding to all the cropped sub-images to obtain an anti-enhancement encoded sample image; and performing convolutional decoding on the anti-enhancement encoded sample image to obtain the anti-enhancement sample image (see the category-weight-network sketch after the claims).
- 7. The method of claim 1, wherein adjusting the parameters of the visual transformation network and the category weight network based on the anti-enhancement sample image and the standard sample image corresponding to the enhanced sample image comprises: determining a loss function of the image processing model based on the value of each pixel in the standard sample image and the value of each pixel in the anti-enhancement sample image; and adjusting the parameters of the visual transformation network and the category weight network based on the loss function of the image processing model.
- 8. The method of claim 7, wherein determining the loss function of the image processing model based on the value of each pixel in the standard sample image and the value of each pixel in the anti-enhancement sample image comprises: determining a reconstruction loss sub-function based on the value of each pixel in the standard sample image, the value of each pixel in the anti-enhancement sample image, and an L1 loss function; determining a perceptual loss sub-function based on the value of each pixel in the standard sample image, the value of each pixel in the anti-enhancement sample image, and a visual activation heat map; determining a generator loss sub-function based on the value of each pixel in the anti-enhancement sample image; and determining the loss function of the image processing model based on the reconstruction loss sub-function, the perceptual loss sub-function, and the generator loss sub-function (see the loss sketch after the claims).
- 9. A positioning method, implemented based on an image processing model trained according to any one of claims 1-8, the method comprising: normalizing a real-time image and a mapping image respectively based on the image processing model, and obtaining real-time features corresponding to the real-time image and mapping features corresponding to the mapping image; performing feature fusion on global navigation satellite system (GNSS) information and the mapping features corresponding to the real-time image, and determining a feature map based on the result of the feature fusion; and determining positioning information of the device that acquired the real-time image based on the feature map and the real-time features (see the positioning sketch after the claims).
- 10. A training apparatus for an image processing model, the image processing model comprising a visual transformation network and a category weight network, the apparatus comprising: a first training unit, configured to input at least one cropped sub-image corresponding to an enhanced sample image into the visual transformation network and obtain an encoded sub-image corresponding to each of the at least one cropped sub-image; a second training unit, configured to input any encoded sub-image, the cropped sub-image corresponding to the encoded sub-image, and an encoded image obtained by stitching all the encoded sub-images into the category weight network to obtain an anti-enhancement sample image corresponding to the enhanced sample image, wherein the category weight network comprises a first neural network layer, a third neural network layer, a mask, a classification coding layer, and a convolutional decoder; and an adjusting unit, configured to adjust parameters of the visual transformation network and the category weight network based on the anti-enhancement sample image and a standard sample image corresponding to the enhanced sample image.
- 11. A positioning device, implemented based on an image processing model trained according to any one of claims 1-8, the device comprising: a normalization unit, configured to normalize a real-time image and a mapping image respectively based on the image processing model and obtain real-time features corresponding to the real-time image and mapping features corresponding to the mapping image; a feature fusion unit, configured to perform feature fusion on GNSS information and the mapping features corresponding to the real-time image and determine a feature map based on the result of the feature fusion; and a positioning unit, configured to determine positioning information of the device that acquired the real-time image based on the feature map and the real-time features.
- 12. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8, or the method of claim 9.
- 13. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8, or the method of claim 9.
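The preprocessing sketch referenced in claim 2 follows. It is a minimal sketch assuming PyTorch/torchvision; the concrete transforms, the 224x224 size, the 16-pixel patch size, and the helper names (resize, enhance, crop_into_patches) are illustrative assumptions, since the claims do not prescribe them.

```python
# Sketch of claim 2, assuming torchvision transforms. All parameter values
# (224x224 size, jitter strengths, 16-pixel patches) are illustrative assumptions.
import torch
import torchvision.transforms as T

resize = T.Compose([T.Resize((224, 224)), T.ToTensor()])   # preprocessing -> standard sample image
enhance = T.Compose([                                      # enhancement -> enhanced sample image
    T.ColorJitter(brightness=0.5, contrast=0.5),
    T.GaussianBlur(kernel_size=5),
])

def crop_into_patches(img: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Crop a (C, H, W) image into non-overlapping cropped sub-images."""
    c, h, w = img.shape
    tiles = img.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, H/p, W/p, p, p)
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)

# Usage: standard = resize(pil_image); enhanced = enhance(standard)
# patches = crop_into_patches(enhanced)   # at least one cropped sub-image
```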
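The encoder sketch referenced in claim 3 follows, assuming PyTorch. Modelling the claim's "classification identifier of each cropped sub-image" as a learned per-position embedding is an assumption, as are the embedding dimension, depth, and head count.

```python
# Sketch of claim 3: linear projection of cropped sub-images plus per-patch
# identifiers, fed to a transformer encoder. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class PatchEncoder(nn.Module):
    def __init__(self, patch=16, channels=3, dim=256, num_patches=196,
                 depth=4, heads=8):
        super().__init__()
        # Linear layer: flattened cropped sub-image -> linear projection
        self.linear = nn.Linear(channels * patch * patch, dim)
        # Assumed form of the per-patch classification identifier
        self.class_id = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, C, p, p) -> encoded sub-images: (B, N, dim)
        x = self.linear(patches.flatten(2))       # linear projections
        return self.encoder(x + self.class_id)    # one encoded sub-image per patch
```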
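The category-weight-network sketch referenced in claim 6 follows. It is a heavily simplified PyTorch sketch of the steps in claims 4-6: the claims do not disclose formulas for the autocorrelation strength, the first matrix, or the stitching coefficients, so the softmax self-product, sigmoid threshold, and region-area coefficients below are plausible stand-ins, not the patented computation.

```python
# Simplified sketch of claims 4-6. The autocorrelation, masking, and stitching
# formulas are assumptions; the claims leave them unspecified.
import torch
import torch.nn as nn

class CategoryWeightNet(nn.Module):
    def __init__(self, dim=256, patch=16, channels=3):
        super().__init__()
        self.net1 = nn.Linear(dim, patch * patch)                       # first neural network layer
        self.net2 = nn.Linear(channels * patch * patch, patch * patch)  # second layer
        self.net3 = nn.Linear(dim, patch * patch)                       # third layer
        self.decode = nn.Conv2d(1, channels, 3, padding=1)              # convolutional decoder
        self.patch = patch

    def forward(self, enc, crop, enc_img):
        # enc: (B, N, dim) encoded sub-images; crop: (B, N, C, p, p) cropped
        # sub-images; enc_img: (B, dim) pooled stitched encoded image.
        f1 = self.net1(enc)                          # first coding feature
        f2 = self.net2(crop.flatten(2))              # second coding feature
        f3 = self.net3(enc_img).unsqueeze(1)         # third coding feature
        strength = torch.softmax(f1 * f1, dim=-1)    # autocorrelation strength (assumed form)
        mask = (torch.sigmoid(f3) > 0.5).float()     # first matrix: masked vs non-masked pixels
        c1, c2 = 1.0 - mask.mean(), mask.mean()      # stitching coefficients (assumed form)
        out = (c1 * (1 - mask) * strength * f2       # pixel-weighted output, non-masked region
               + c2 * mask * strength * f1)          # pixel-weighted output, masked region
        B, N, _ = out.shape
        g, p = int(N ** 0.5), self.patch             # assumes a square patch grid
        img = out.view(B, g, g, p, p).permute(0, 1, 3, 2, 4).reshape(B, 1, g * p, g * p)
        return self.decode(img)                      # anti-enhancement sample image
```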
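The loss sketch referenced in claim 8 follows, assuming PyTorch. Approximating the "visual activation heat map" perceptual term with frozen VGG16 features, and the generator term with a standard non-saturating GAN loss against a caller-supplied discriminator, are both assumptions; the sub-loss weights are illustrative.

```python
# Sketch of the claims 7-8 loss. The VGG16 perceptual proxy, the discriminator,
# and the sub-loss weights are assumptions, not the claims' specification.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

l1 = nn.L1Loss()

def model_loss(anti, standard, disc, w=(1.0, 0.1, 0.01)):
    rec = l1(anti, standard)                   # reconstruction sub-loss (L1, claim 8)
    perc = l1(vgg(anti), vgg(standard))        # perceptual sub-loss (feature space)
    gen = F.softplus(-disc(anti)).mean()       # generator sub-loss: -log sigmoid(D(x))
    return w[0] * rec + w[1] * perc + w[2] * gen
```

The summed loss is then backpropagated to adjust the parameters of both the visual transformation network and the category weight network, as claim 7 describes.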
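The positioning sketch referenced in claim 9 follows. The concatenation-based GNSS fusion and cosine nearest-neighbour matching are deliberately simple placeholders, since the claim does not disclose the exact fusion or matching operations; `localize`, `map_poses`, and the other names are hypothetical.

```python
# Schematic sketch of claim 9. Fusion and matching are assumed placeholders.
import torch
import torch.nn.functional as F

def normalize(x: torch.Tensor) -> torch.Tensor:
    """Per-image normalization of real-time and mapping images."""
    return (x - x.mean()) / (x.std() + 1e-6)

def localize(model, live_img, map_imgs, map_poses, gnss):
    """Return the pose of the mapping entry whose fused feature best matches
    the real-time feature. `model`, `map_poses`, and `gnss` are hypothetical."""
    live = model(normalize(live_img))                        # (1, D) real-time feature
    maps = model(normalize(map_imgs))                        # (K, D) mapping features
    fused = torch.cat([maps, gnss.expand(maps.size(0), -1)], dim=1)  # GNSS fusion
    feature_map = F.normalize(fused, dim=1)                  # feature map from fusion
    query = F.normalize(torch.cat([live, gnss], dim=1), dim=1)
    best = (feature_map @ query.t()).argmax()                # best-matching entry
    return map_poses[best]                                   # positioning information
```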
Description
Training method and device for image processing model, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of image processing, and in particular to a training method and apparatus for an image processing model, an electronic device, and a storage medium.

Background

In the field of automatic driving or robot positioning, visual positioning of a vehicle or robot is generally performed based on a high-precision visual feature map. Visual positioning comprises two stages: map construction and positioning. The high-precision visual feature map is generated in the map construction stage; in the positioning stage, feature points are extracted from images acquired in real time and matched against the high-precision visual feature map to obtain the real-time pose of the vehicle or robot. When the images acquired in real time differ substantially from the high-precision visual feature map in brightness, sharpness, and the like, matching between them is difficult to achieve, and positioning of the vehicle or robot fails.

Disclosure of Invention

The present disclosure provides a training method, apparatus, electronic device, and storage medium for an image processing model, so as to at least solve the above technical problems in the prior art.

According to a first aspect of the present disclosure, there is provided a training method of an image processing model, the model comprising a visual transformation network and a category weight network, the method comprising: inputting at least one cropped sub-image corresponding to an enhanced sample image into the visual transformation network, and obtaining encoded sub-images respectively corresponding to the at least one cropped sub-image; inputting any encoded sub-image, the cropped sub-image corresponding to the encoded sub-image, and an encoded image obtained by stitching all the encoded sub-images into the category weight network, and obtaining an anti-enhancement sample image corresponding to the enhanced sample image; and adjusting parameters of the visual transformation network and the category weight network based on the anti-enhancement sample image and a standard sample image corresponding to the enhanced sample image. A hypothetical end-to-end sketch of this first aspect is given below.

In the above solution, before inputting the at least one cropped sub-image corresponding to the enhanced sample image into the visual transformation network and obtaining the encoded sub-images respectively corresponding to the at least one cropped sub-image, the method further includes: preprocessing an original sample image to obtain the standard sample image; performing enhancement processing on the standard sample image to obtain the enhanced sample image; and cropping the enhanced sample image to obtain the at least one cropped sub-image corresponding to the enhanced sample image.
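The following is a hypothetical single training step for the first aspect, reusing the PatchEncoder, CategoryWeightNet, and model_loss sketches given after the claims; the mean-pooling used to summarize the stitched encoded image and the shared optimizer are assumptions layered on those sketches, not the disclosure's specification.

```python
# Hypothetical training step for the first aspect; pooling and optimizer
# choices are assumptions layered on the earlier sketches.
def train_step(vit, cwn, disc, opt, patches, cropped, standard):
    enc = vit(patches)                        # encoded sub-images (B, N, dim)
    enc_img = enc.mean(dim=1)                 # stitched encoded image, pooled (assumed)
    anti = cwn(enc, cropped, enc_img)         # anti-enhancement sample image
    loss = model_loss(anti, standard, disc)   # compare against the standard sample image
    opt.zero_grad()
    loss.backward()
    opt.step()                                # adjust both networks' parameters
    return loss.item()
```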
In the above solution, inputting the at least one cropped sub-image corresponding to the enhanced sample image into the visual transformation network and obtaining the encoded sub-image respectively corresponding to the at least one cropped sub-image includes: inputting the at least one cropped sub-image into a linear layer included in the visual transformation network, and taking the output of the linear layer as the linear projection respectively corresponding to each cropped sub-image; and inputting the linear projection corresponding to each cropped sub-image, together with the classification identifier of each cropped sub-image within the enhanced sample image, into an encoder included in the visual transformation network, and taking the output of the encoder as the encoded sub-image respectively corresponding to the at least one cropped sub-image.

In the above solution, inputting any encoded sub-image, the cropped sub-image corresponding to the encoded sub-image, and the encoded image obtained by stitching all the encoded sub-images into the category weight network, and obtaining the anti-enhancement sample image corresponding to the enhanced sample image, includes performing the following processing on each encoded sub-image and the cropped sub-image corresponding to each encoded sub-image: inputting the encoded sub-image into a first neural network layer included in the category weight network, and obtaining a first coding feature corresponding to the encoded sub-image; inputting the cropped sub-image corresponding to the encoded sub-image into a second neural network layer included in the category weight network, and obtaining a second coding feature corresponding to the cropped sub-image; and inputting the encoded image into a third neural network layer included in the category weight network, and obtaining a third coding feature corresponding to the encoded image.

In the above solution, inputting any encoded sub-image, the cl