CN-116109882-B - Neural network training and class detection method, device, terminal and storage medium
Abstract
The application provides a neural network training and category detection method, device, terminal and computer-readable storage medium. The neural network training method includes: acquiring a first sample image containing a first target and a second sample image containing a second target; performing feature extraction on the first sample image and the second sample image respectively based on a neural network to obtain a first feature data set corresponding to the first sample image and a second feature data set corresponding to the second sample image; performing feature fusion on the i-th first feature map and the j-th second feature map to obtain a first fused feature map, and performing feature fusion on the j-th first feature map and the i-th second feature map to obtain a second fused feature map; and iteratively training the neural network based on the error between the first fused feature map and the second fused feature map. By fusing multi-scale features and letting the first and second fused feature maps mutually distill each other, the application prevents the model from overfitting during training.
Inventors
- Jiang Xinting
- Dun Jingyu
- Wang Yayun
Assignees
- Zhejiang Dahua Technology Co., Ltd.
Dates
- Publication Date: 2026-05-08
- Application Date: 2022-12-30
Claims (10)
- 1. A neural network training method, the training method comprising: acquiring a first sample image containing a first target and a second sample image containing a second target, wherein the first sample image is associated with a first annotation category of the first target, and the second sample image is associated with a second annotation category of the second target; performing feature extraction on the first sample image and the second sample image respectively based on a neural network to obtain a first feature data set corresponding to the first sample image and a second feature data set corresponding to the second sample image, wherein the first feature data set comprises m first feature maps of different scales extracted by the neural network, the second feature data set comprises m second feature maps of different scales extracted by the neural network, the n-th first feature map and the n-th second feature map have the same scale, n is less than or equal to m, and n and m are positive integers; performing feature fusion on the i-th first feature map and the j-th second feature map to obtain a first fused feature map, and performing feature fusion on the j-th first feature map and the i-th second feature map to obtain a second fused feature map, wherein i is not equal to j, i is less than or equal to m, j is less than or equal to m, and i and j are positive integers; and iteratively training the neural network based on an error between the first fused feature map and the second fused feature map; wherein the training method further comprises: performing weighted summation on the first annotation category and the second annotation category to obtain a generated label corresponding to the first fused feature map and the second fused feature map; performing category prediction on the first fused feature map through the neural network to obtain a first prediction category, and performing category prediction on the second fused feature map to obtain a second prediction category; and iteratively training the neural network based on error values between the first prediction category of the first fused feature map and the generated label, and error values between the second prediction category of the second fused feature map and the generated label.
- 2. The training method of claim 1, wherein the neural network comprises a feature extraction layer and at least two residual modules, the at least two residual modules comprising a first residual module and a second residual module, and the feature extraction layer, the first residual module and the second residual module are cascaded in sequence; and performing feature extraction on the first sample image and the second sample image based on the neural network to obtain the first feature data set corresponding to the first sample image and the second feature data set corresponding to the second sample image comprises: performing, by the feature extraction layer, feature extraction on the sample image to obtain an image feature map; performing downsampling and feature extraction on the image feature map through the first residual module to obtain a first feature map; and performing downsampling and feature extraction on the first feature map through the second residual module to obtain a second feature map.
- 3. The training method of claim 2, wherein the residual module comprises an input layer, a first convolution layer, a second convolution layer, a pooling layer and a third convolution layer, the input layer being connected to the first convolution layer and the pooling layer respectively, the second convolution layer being connected to the first convolution layer, and the third convolution layer being connected to the pooling layer; and performing downsampling and feature extraction on the image feature map through the first residual module to obtain the first feature map comprises: transmitting, by the input layer, the image feature map to the first convolution layer and the pooling layer respectively; performing, by the first convolution layer and the second convolution layer in sequence, feature extraction on the image feature map to obtain a target feature map; adjusting, by the pooling layer, the channel number of the image feature map to obtain a pooled feature map; downsampling, by the third convolution layer, the pooled feature map to obtain a preprocessed feature map; and performing feature fusion on the target feature map and the preprocessed feature map to obtain the first feature map (a code sketch of this two-branch structure follows the claims).
- 4. The training method of claim 2, wherein each residual module in the neural network that is not connected to the feature extraction layer has a convolution layer connected thereto; and after performing downsampling and feature extraction on the first feature map through the second residual module to obtain the second feature map, the method further comprises: adjusting, by the convolution layer, the channel number and size of the second feature map.
- 5. The training method of claim 2, wherein 0 < j < i; and performing feature fusion on the i-th first feature map and the j-th second feature map to obtain the first fused feature map comprises: performing feature fusion on the first feature map and the second feature map extracted by any residual module located between the residual module that extracts the first feature map and the residual module connected to the feature extraction layer, to obtain a fused feature map.
- 6. A class detection method, comprising: acquiring an image to be processed, wherein the image to be processed contains a target object; and performing category detection on the image to be processed by using a classification network model to obtain category information of the target object, wherein the classification network model is trained by the method of any one of claims 1-5.
- 7. A neural network training device, the training device comprising: an acquisition module, configured to acquire a first sample image containing a first target and a second sample image containing a second target, wherein the first sample image is associated with a first annotation category of the first target, and the second sample image is associated with a second annotation category of the second target; a feature extraction module, configured to perform feature extraction on the first sample image and the second sample image based on a neural network to obtain a first feature data set corresponding to the first sample image and a second feature data set corresponding to the second sample image, wherein the first feature data set comprises m first feature maps of different scales extracted by the neural network, the second feature data set comprises m second feature maps of different scales extracted by the neural network, the n-th first feature map and the n-th second feature map have the same scale, n is less than or equal to m, and n and m are positive integers; a feature fusion module, configured to perform feature fusion on the i-th first feature map and the j-th second feature map to obtain a first fused feature map, and to perform feature fusion on the j-th first feature map and the i-th second feature map to obtain a second fused feature map, wherein i is not equal to j, i is less than or equal to m, j is less than or equal to m, and i and j are positive integers; and a training module, configured to iteratively train the neural network based on an error between the first fused feature map and the second fused feature map, to perform weighted summation on the first annotation category and the second annotation category to obtain a generated label corresponding to the first fused feature map and the second fused feature map, to perform category prediction on the first fused feature map through the neural network to obtain a first prediction category, to perform category prediction on the second fused feature map to obtain a second prediction category, and to iteratively train the neural network based on error values between the first prediction category of the first fused feature map and the generated label and error values between the second prediction category of the second fused feature map and the generated label.
- 8. A class detection device, characterized in that the class detection device comprises: an image acquisition module, configured to acquire an image to be processed, wherein the image to be processed contains a target object; and a class detection module, configured to perform category detection on the image to be processed by using a classification network model to obtain category information of the target object, wherein the classification network model is trained by the method of any one of claims 1-5.
- 9. A terminal, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor is configured to execute the computer program to implement the steps of the neural network training method according to any one of claims 1 to 5 or the steps of the class detection method according to claim 6.
- 10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when executed by a processor, the computer program implements the steps of the neural network training method according to any one of claims 1 to 5 or the steps of the class detection method according to claim 6.
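The two-branch residual module of claims 2 and 3 can be pictured with a short code sketch. The following PyTorch snippet is only an illustrative reading of those claims, not the patent's implementation: the "pooling layer" that adjusts the channel number is modelled here as a 1x1 convolution, the strides, channel counts and the element-wise addition used to fuse the two branches are assumptions, and the names ResidualModule and Backbone are hypothetical.

```python
import torch
import torch.nn as nn


class ResidualModule(nn.Module):
    """Two-branch block sketched from claim 3 (hypothetical implementation).

    Main branch: conv1 -> conv2 performs feature extraction (conv1 also halves
    the spatial size here).  Shortcut branch: a channel-adjusting layer (the
    claim's "pooling layer", modelled as a 1x1 convolution) followed by conv3,
    which downsamples, giving the preprocessed feature map.  The two branches
    are fused by element-wise addition.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # main branch: feature extraction with downsampling
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch))
        # shortcut branch: channel adjustment, then downsampling convolution
        self.pool = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        target = self.conv2(self.conv1(x))    # target feature map
        shortcut = self.conv3(self.pool(x))   # preprocessed feature map
        return self.relu(target + shortcut)   # fusion of the two branches


class Backbone(nn.Module):
    """Feature extraction layer followed by cascaded residual modules (claim 2)."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(            # feature extraction layer
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        self.res1 = ResidualModule(32, 64)    # first residual module
        self.res2 = ResidualModule(64, 128)   # second residual module

    def forward(self, x: torch.Tensor):
        f0 = self.stem(x)
        f1 = self.res1(f0)                    # first feature map (larger scale)
        f2 = self.res2(f1)                    # second feature map (smaller scale)
        return [f1, f2]                       # multi-scale feature data set
```

Under these assumptions each residual module halves the spatial resolution, so the cascaded modules of claim 2 yield feature maps at m different scales, as required by claim 1.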
Description
Neural network training and class detection method, device, terminal and storage medium
Technical Field
The invention relates to the technical field of deep learning, and in particular to a neural network training and class detection method, a device, a terminal and a computer-readable storage medium.
Background
A convolutional neural network (CNN) is a popular deep learning method: high-order semantic features of objects in an image are obtained through a series of operations such as convolution and pooling, and the features are classified by fully connected (FC) layers so as to classify the image. Because its feature extraction capability far exceeds that of traditional image classification, CNN-based image classification has made great progress in classification accuracy and has high practical value; it is often used to classify vehicles, license plate types, license plate nationalities and the like in application scenarios such as intelligent transportation. However, a convolutional neural network only extracts features from a single picture for fusion; the quality of the image, or external attacks on the image, can make the extracted features non-robust or even inconsistent with the actual characteristics of the true category, which affects the application of the neural network. Moreover, because features are optimized for each picture individually, the network easily overfits during training and lacks robustness.
Disclosure of Invention
The invention mainly aims to provide a neural network training and category detection method, a device, a terminal and a computer-readable storage medium, so as to solve the problem of overfitting during training in the prior art. To solve the above technical problem, the first technical scheme adopted by the invention is to provide a neural network training method, which comprises the following steps: acquiring a first sample image containing a first target and a second sample image containing a second target; performing feature extraction on the first sample image and the second sample image respectively based on a neural network to obtain a first feature data set corresponding to the first sample image and a second feature data set corresponding to the second sample image, wherein the first feature data set comprises m first feature maps of different scales extracted by the neural network, the second feature data set comprises m second feature maps of different scales extracted by the neural network, the n-th first feature map and the n-th second feature map have the same scale, n is less than or equal to m, and n and m are positive integers; performing feature fusion on the i-th first feature map and the j-th second feature map to obtain a first fused feature map, and performing feature fusion on the j-th first feature map and the i-th second feature map to obtain a second fused feature map, wherein i is not equal to j, i is not greater than m, j is not greater than m, and i and j are positive integers; and iteratively training the neural network based on the error between the first fused feature map and the second fused feature map.
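As a hedged illustration of the training step described above, the sketch below assumes a backbone that returns a list of m multi-scale feature maps (for instance the Backbone sketched after the claims). Everything not stated in the patent is an assumption: cross-scale fusion is done with 1x1 projections, bilinear resizing and addition, the error between the two fused feature maps is taken as a mean-squared error, the weighted summation of the annotation categories uses a fixed mixing weight alpha, and the names FusionClassifier and training_step are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionClassifier(nn.Module):
    """Cross-sample fusion head (hypothetical name): projects feature maps of
    different scales to a common shape, fuses them, and predicts a category."""

    def __init__(self, chans, fused_ch, num_classes):
        super().__init__()
        # one 1x1 projection per backbone scale: chans[k] -> fused_ch
        self.proj = nn.ModuleList([nn.Conv2d(c, fused_ch, 1) for c in chans])
        self.classifier = nn.Linear(fused_ch, num_classes)

    def fuse(self, fa, ka, fb, kb, size):
        """Fuse the ka-th map of one sample with the kb-th map of the other."""
        a = F.interpolate(self.proj[ka](fa), size=size, mode="bilinear", align_corners=False)
        b = F.interpolate(self.proj[kb](fb), size=size, mode="bilinear", align_corners=False)
        return a + b  # fused feature map (assumed element-wise fusion)

    def predict(self, fused):
        pooled = F.adaptive_avg_pool2d(fused, 1).flatten(1)
        return self.classifier(pooled)  # class logits


def training_step(backbone, head, img_a, img_b, y_a, y_b, i, j,
                  num_classes, alpha=0.5, distill_w=1.0):
    """One training step: cross-sample fusion, mutual distillation, and
    classification against the generated (weighted-sum) label."""
    feats_a = backbone(img_a)  # m multi-scale feature maps of the first sample
    feats_b = backbone(img_b)  # m multi-scale feature maps of the second sample
    size = feats_a[min(i, j)].shape[-2:]  # common spatial size for both fusions

    # cross-sample fusion: (i-th of A, j-th of B) and (j-th of A, i-th of B)
    fused_1 = head.fuse(feats_a[i], i, feats_b[j], j, size)
    fused_2 = head.fuse(feats_a[j], j, feats_b[i], i, size)

    # mutual distillation: the two fused maps supervise each other (MSE assumed)
    distill_loss = F.mse_loss(fused_1, fused_2)

    # generated label: weighted sum of the two annotation categories
    soft_label = alpha * F.one_hot(y_a, num_classes).float() \
        + (1.0 - alpha) * F.one_hot(y_b, num_classes).float()

    def soft_ce(logits, target):  # cross-entropy against a soft label
        return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

    # error of each fused map's prediction against the generated label
    cls_loss = soft_ce(head.predict(fused_1), soft_label) \
        + soft_ce(head.predict(fused_2), soft_label)

    return cls_loss + distill_w * distill_loss
```

A training loop would repeatedly draw a pair of labelled images, choose indices i not equal to j with i, j at most m (0-based in the code), call training_step and back-propagate the returned loss; with the Backbone sketched earlier, chans would be (64, 128).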
In the above training method, the first sample image is associated with a first annotation category of the first target, and the second sample image is associated with a second annotation category of the second target; the training method further comprises: performing weighted summation on the first annotation category and the second annotation category to obtain a generated label corresponding to the first fused feature map and the second fused feature map; performing category prediction on the first fused feature map through the neural network to obtain a first prediction category, and performing category prediction on the second fused feature map to obtain a second prediction category; and iteratively training the neural network based on error values between the first prediction category of the first fused feature map and the generated label, and error values between the second prediction category of the second fused feature map and the generated label. The neural network comprises a feature extraction layer and at least two residual modules, wherein the at least two residual modules comprise a first residual module and a second residual module, and the feature extraction layer, the first residual module and the second residual module are cascaded in sequence; performing feature extraction on the first sample image and the second sample image based on the neural network to obtain a first feature data set corresponding to the first sample image and a second