CN-116975631-B - Model training, target detection method, electronic device and computer storage medium


Abstract

The application provides a model training method, a target detection method, an electronic device and a computer storage medium. A pre-trained first target detection network is obtained, which comprises a first feature extraction module, a first classification branch and a first regression branch. A second target detection network is built from the first feature extraction module together with a second classification branch and a second regression branch. A training set is input into the first target detection network and the second target detection network to obtain a first classification result output by the first classification branch, a first regression result of the first regression branch, a second classification result output by the second classification branch and a second regression result of the second regression branch. A distillation loss of the second target detection network is obtained from the difference information between the first classification result and the second classification result and the difference information between the first regression result and the second regression result, and the second target detection network is trained using the distillation loss, which improves model performance.

Inventors

ZHENG JIAJUN
ZHANG CHENGCHENG
MA ZIANG

Assignees

Hangzhou Huacheng Software Technology Co., Ltd. (杭州华橙软件技术有限公司)

Dates

Publication Date: 20260508
Application Date: 20230629

Claims (8)

1. A target detection method, characterized in that the target detection method comprises: inputting an image to be detected into a pre-trained target detection network; acquiring target detection information based on the classification and/or the position of a prediction frame output by the target detection network; wherein the target detection network is obtained by: acquiring a pre-trained first target detection network, wherein the first target detection network comprises a first feature extraction module, a first classification branch and a first regression branch; updating a second feature extraction module of a second target detection network using network parameters of the first feature extraction module, wherein the second target detection network comprises a second classification branch, a second regression branch and a third regression branch; inputting a training set into the first target detection network and the second target detection network, and acquiring a first classification result output by the first classification branch, a first regression result of the first regression branch, a second classification result output by the second classification branch, a second regression result of the second regression branch and a third regression result of the third regression branch; obtaining a distillation loss of the second target detection network using the difference information between the first classification result and the second classification result and the difference information between the first regression result and the second regression result; acquiring a comprehensive regression loss of the second target detection network using the intersection over union (IoU) of the second regression result with the real regression result and the IoU of the third regression result with the real regression result; acquiring an auxiliary regression loss of the second target detection network using the coordinate distribution probability difference between the second regression result and the real regression result; and training the second target detection network using the distillation loss, the comprehensive regression loss and the auxiliary regression loss.
2. The target detection method according to claim 1, wherein the first regression branch and the second regression branch are used for predicting the coordinate distribution probability of the bounding box, and the third regression branch is used for predicting the distance probability of the bounding box.
3. The target detection method according to claim 1 or 2, wherein, after training the second target detection network using the distillation loss, the target detection method further comprises: deleting the second regression branch after training of the second target detection network is completed.
4. The target detection method according to claim 1 or 2, wherein inputting the training set into the first target detection network and the second target detection network comprises: inputting the training set into the first target detection network and the second target detection network; extracting prediction feature maps of the training set at different scales using the first feature extraction module of the first target detection network and/or the second feature extraction module of the second target detection network; and acquiring regression feature vectors and/or classification feature vectors at different scales from the prediction feature maps at different scales.
5. The target detection method according to claim 1, wherein the first feature extraction module and/or the second feature extraction module comprises a lightweight backbone network and a Lite-type feature enhancement extraction module, wherein the lightweight backbone network is used for extracting a prediction feature map of the training set, and the Lite-type feature enhancement extraction module is used for enhancing the feature expression of the prediction feature map.
6. The target detection method according to claim 1, wherein, after the training set is input into the first target detection network, the target detection method further comprises: acquiring the anchor point prediction frames and classification scores output by the first target detection network; acquiring a control measurement parameter for each anchor point prediction frame using the classification score and the intersection over union (IoU) of the anchor point prediction frame with the real bounding box; and, using the control measurement parameters, dividing some of the anchor points into positive samples and the remaining anchor points into negative samples.
7. An electronic device comprising a processor and a memory, wherein the memory has program data stored therein, and the processor is configured to execute the program data to implement the target detection method according to any one of claims 1-6.
8. A computer readable storage medium for storing program data which, when executed by a processor, implements the target detection method according to any one of claims 1-6.
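
The positive/negative sample division recited in claim 6 can be illustrated with a minimal Python sketch. The claims state only that a control measurement parameter is derived from the classification score and the IoU with the real bounding box. The product form `score**alpha * iou**beta`, the top-k selection rule, and the names `iou`, `assign_samples`, `alpha`, `beta` and `top_k` are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corners of an axis-aligned box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_samples(
    pred_boxes: Sequence[Box],
    scores: Sequence[float],
    gt_box: Box,
    alpha: float = 1.0,   # assumed exponent on the classification score
    beta: float = 6.0,    # assumed exponent on the IoU
    top_k: int = 2,       # assumed number of positives per real box
) -> Tuple[List[int], List[int]]:
    """Return (positive_indices, negative_indices) for one real bounding box.

    ASSUMPTION: the metric is score**alpha * iou**beta and the top_k anchors
    with a non-zero metric become positives; the patent does not give a formula.
    """
    metrics = [(s ** alpha) * (iou(b, gt_box) ** beta)
               for b, s in zip(pred_boxes, scores)]
    ranked = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    positives = sorted(i for i in ranked[:top_k] if metrics[i] > 0)
    positive_set = set(positives)
    negatives = [i for i in range(len(metrics)) if i not in positive_set]
    return positives, negatives


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0),
             (20.0, 20.0, 30.0, 30.0), (0.0, 0.0, 5.0, 5.0)]
    cls_scores = [0.9, 0.8, 0.95, 0.5]
    print(assign_samples(boxes, cls_scores, gt))
```

In this sketch an anchor with a high classification score but no overlap with the real box (index 2 in the usage example) still ends up negative, because the metric multiplies the two quantities rather than adding them.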

Description

Model training, target detection method, electronic device and computer storage medium

Technical Field

The present application relates to the field of computer vision, and in particular to a model training method, a target detection method, an electronic device and a computer storage medium.

Background

In recent years, target detection has developed rapidly as an intensely researched direction in deep learning. Its main task is to locate and classify target objects in an image. Target detection is increasingly used in industrial and everyday applications such as security monitoring and, given limited device computing resources and the need for cost optimization, lightweight models have become an important research trend. The main lightweighting approaches currently include lightweight network design, model pruning and knowledge distillation. Some research applies knowledge distillation to lightweight target detection, typically with a large model teaching a small model so that knowledge is transferred between different models. However, this paradigm has two main drawbacks: (1) knowledge transfer between different models is inefficient, i.e. the student model often cannot learn the full knowledge of the teacher model; and (2) extensive experimentation is often required to find the best teacher model architecture.

Disclosure of Invention

The application provides a model training method, a target detection method, an electronic device and a computer readable storage medium.
In order to solve the above technical problems, the application provides a model training method comprising: obtaining a pre-trained first target detection network, wherein the first target detection network comprises a first feature extraction module, a first classification branch and a first regression branch; creating a second target detection network using the first feature extraction module, wherein the second target detection network comprises a second feature extraction module, a second classification branch and a second regression branch; inputting a training set into the first target detection network and the second target detection network, and obtaining a first classification result output by the first classification branch, a first regression result of the first regression branch, a second classification result output by the second classification branch and a second regression result of the second regression branch; obtaining a distillation loss of the second target detection network using the difference information between the first classification result and the second classification result and the difference information between the first regression result and the second regression result; and training the second target detection network using the distillation loss. The second target detection network further comprises a third regression branch, wherein the first regression branch and the second regression branch are used for predicting the coordinate distribution probability of the bounding box, and the third regression branch is used for predicting the distance probability of the bounding box.
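As a rough illustration of the distillation loss described above, the following Python sketch measures the difference between the first (teacher) and second (student) classification results and between their regression results. The text says only that "difference information" is used. The choice of KL divergence over softmax distributions, the per-side regression logits, the equal weights, and the names `softmax`, `kl_divergence`, `distillation_loss`, `w_cls` and `w_reg` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(
    t_cls: Sequence[float],            # first (teacher) classification logits
    s_cls: Sequence[float],            # second (student) classification logits
    t_reg: Sequence[Sequence[float]],  # teacher logits over coordinate bins, one list per box side
    s_reg: Sequence[Sequence[float]],  # student logits over coordinate bins, one list per box side
    w_cls: float = 1.0,                # assumed weight of the classification term
    w_reg: float = 1.0,                # assumed weight of the regression term
) -> float:
    """ASSUMPTION: both difference terms are KL divergences; the patent
    does not specify the distance measure."""
    cls_term = kl_divergence(softmax(t_cls), softmax(s_cls))
    reg_term = sum(
        kl_divergence(softmax(t), softmax(s)) for t, s in zip(t_reg, s_reg)
    ) / len(t_reg)
    return w_cls * cls_term + w_reg * reg_term


if __name__ == "__main__":
    teacher_cls, student_cls = [2.0, 0.5, -1.0], [1.0, 1.0, 0.0]
    teacher_reg = [[0.1, 2.0, 0.3]] * 4
    student_reg = [[1.0, 0.5, 0.2]] * 4
    print(distillation_loss(teacher_cls, student_cls, teacher_reg, student_reg))
```

Because both regression branches are described as predicting a coordinate distribution probability, comparing them as distributions is a natural fit; a different distance measure could be substituted without changing the overall structure.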
Training the second target detection network using the distillation loss comprises: obtaining a third regression result output by the third regression branch; obtaining a comprehensive regression loss of the second target detection network using the intersection over union (IoU) of the second regression result with the real regression result and the IoU of the third regression result with the real regression result; obtaining an auxiliary regression loss of the second target detection network using the coordinate distribution probability difference between the second regression result and the real regression result; and training the second target detection network using the distillation loss, the comprehensive regression loss and the auxiliary regression loss. After the second target detection network has been trained using the distillation loss, the model training method further comprises deleting the second regression branch. Inputting the training set into the first target detection network and the second target detection network comprises: inputting the training set into the first target detection network and the second target detection network; extracting prediction feature maps of the training set at different scales using the first feature extraction module of the first target detection network and/or the second feature extraction module of the second target detection network; and obtaining regression feature vectors and/or classification feature vectors at different scales from the prediction feature maps at different scales. The first feature extraction module and/or the second feature extraction module comprises a lightweight backbone network and a Lite