CN-114429557-B - Target classification method and system

CN114429557B

Abstract

A target classification method and system. In the method, when a detection result of a target object is received, the detection result is classified by two or more preset classification network models, the detection result comprising target object category information, target object position information, and confidence information. The intermediate classification results output by the classification network models are fused to obtain a classification result of the target object, the classification result comprising target object category information and confidence information. The categories of the training samples corresponding to any two classification network models differ. By applying this scheme, the accuracy of target detection can be improved.

Inventors

  • Li Xinwei
  • Lu Jicheng
  • Qin Liang
  • Luo Yangxiaoxuan
  • Chen Jiannan

Assignees

  • Shanghai Fudan Microelectronics Group Co., Ltd. (上海复旦微电子集团股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2020-10-14

Claims (18)

  1. A target classification method, comprising: after receiving an input image, performing target detection on the input image using a preset detection network model to obtain a detection result of a target object; when the detection result of the target object is received, classifying the detection result using two or more preset classification network models, respectively, the detection result comprising target object category information, target object position information, and confidence information; fusing the intermediate classification results output by the classification network models to obtain a classification result of the target object, the classification result comprising target object category information and confidence information; and judging, using the classification result of the target object, whether the detection result of the target object is valid, and taking a valid detection result of the target object as the recognition result of the target object; wherein judging whether the detection result is valid using the classification result comprises: performing category conversion on the target object category information in the detection result using preset first training sample category mapping relation information, and judging whether the detection result is valid based on the category conversion result to obtain the recognition result of the target object; the second training sample category mapping relation information records the training sample category of a training sample when it serves as a training sample of the detection network model, the corresponding training sample category of the same training sample when it serves as a training sample of the classification network models, and the similar training sample category information between them, a similar training sample category being a training sample category with the same visual characteristics, or one or more training sample categories at the same level in a training sample category table; and judging whether the detection result is valid based on the category conversion result comprises: judging whether the target object category information in the detection result matches the target object category information of the classification result to obtain a first judgment result; when the first judgment result indicates a match, judging the detection result valid if the confidence of the detection result is higher than a first confidence threshold or the confidence of the classification result is higher than a second confidence threshold; and when the first judgment result indicates no match, judging the detection result valid if the confidence of the detection result is higher than a third confidence threshold or the confidence of the classification result is higher than a fourth confidence threshold.
  2. The target classification method of claim 1, wherein the categories of the training samples corresponding to any two of the classification network models are completely different or partially different.
  3. The target classification method of claim 1, wherein, among the training samples corresponding to each classification network model, the difference between the numbers of training samples of any two categories is smaller than a second sample difference threshold.
  4. The target classification method of claim 3, wherein the training samples corresponding to the two or more classification network models are further determined by: taking two categories of training samples that have different visual characteristics and whose difference in number of training samples is smaller than a preset third sample difference threshold as training samples corresponding to the same classification network model, the third sample difference threshold being smaller than the second sample difference threshold.
  5. The target classification method of claim 1, wherein fusing the intermediate classification results output by the classification network models to obtain the classification result of the target object comprises: when target object category information of the same target object exists in only one intermediate classification result, taking that intermediate classification result as the classification result of the target object; when target object category information of the same target object exists in only two intermediate classification results, taking the intermediate classification result with the higher confidence of the two as the classification result of the target object; and when target object category information of the same target object exists in three or more intermediate classification results, selecting a preset number of the intermediate classification results as the classification result of the target object according to the confidence information in the three or more intermediate classification results.
  6. The target classification method of claim 1, wherein, after receiving an input image, performing target detection on the input image using a preset detection network model to obtain the detection result of the target object comprises: after receiving the input image, performing target detection on the input image using two or more preset detection network models, respectively; and fusing the intermediate detection results output by the detection network models to obtain the detection result of the target object, the intermediate detection results comprising target object category information, target object position information, and confidence information; wherein the categories of the training samples corresponding to any two detection network models differ.
  7. The target classification method of claim 6, wherein the categories of the training samples corresponding to any two of the detection network models are completely different or partially different.
  8. The target classification method of claim 6, wherein the difference between the numbers of training samples corresponding to any two of the detection network models is smaller than a preset first sample difference threshold.
  9. The target classification method of claim 6, wherein the two or more detection network models are trained by: counting the number of training samples of each category among all the training samples, and determining a first category with the most training samples and a second category with the fewest training samples; uniformly dividing the range defined by the number of training samples of the first category and the number of training samples of the second category into K training sample intervals to obtain K-1 sample number separation values, K being a positive integer greater than 1; and partitioning the categories of the training samples based on the K-1 sample number separation values and the numbers of training samples of the categories, obtaining the training samples of K detection network models, and performing training.
  10. The target classification method of claim 6, wherein fusing the intermediate detection results output by the detection network models to obtain the detection result of the target object comprises: when target object category information of the same target object exists in only one intermediate detection result, taking that intermediate detection result as the detection result of the target object; and when target object category information of the same target object exists in two or more intermediate detection results, combining the target object position information and the confidence information in the two or more intermediate detection results to obtain the detection result of the target object.
  11. The target classification method of claim 10, wherein combining the target object position information and the confidence information in the two or more intermediate detection results to obtain the detection result of the target object comprises: calculating the overlapping area between a first region and each second region, performing a culling operation on the corresponding intermediate detection results according to the size of the overlapping area, and taking the intermediate detection results remaining after the culling operation among the two or more intermediate detection results as the detection result of the target object for the current round; and taking the detection result of the target object for the current round together with the intermediate detection result with the lowest confidence among the two or more intermediate detection results as the detection result of the target object; wherein the first region is the target object region corresponding to the target object position information of one intermediate detection result, the second regions are the target object regions corresponding to the target object position information of the remaining intermediate detection results among the two or more intermediate detection results, and the confidence of the intermediate detection result containing the first region is higher than that of the intermediate detection results containing the second regions.
  12. The target classification method of claim 11, wherein performing a culling operation on the corresponding intermediate detection results according to the size of the overlapping area comprises: judging, based on the size of the overlapping area, the difference between each target object area and the overlapping area in the corresponding two intermediate detection results; when the difference between a target object area and the overlapping area is smaller than a first area threshold, performing the culling operation on the intermediate detection result containing that target object area; and when the differences between the target object areas and the overlapping area are larger than the first area threshold, performing the culling operation based on the difference between the target object areas in the corresponding two intermediate detection results.
  13. The target classification method of claim 12, wherein performing the culling operation based on the difference between the target object areas in the corresponding two intermediate detection results comprises: calculating the difference between the target object areas in the corresponding two intermediate detection results; and when the difference between the target object areas is smaller than a second area threshold, performing the culling operation on the intermediate detection result with the lower confidence.
  14. The target classification method of claim 12, wherein performing the culling operation based on the difference between the target object areas in the corresponding two intermediate detection results further comprises: updating the target object position information of the retained intermediate detection result using the target object position information of the culled intermediate detection result.
  15. The target classification method of claim 11, wherein the first regions of the intermediate detection results whose confidence is not the lowest are calculated, and the culling operation is performed, in descending order of confidence.
  16. The target classification method of claim 1, wherein the training samples corresponding to the two or more classification network models are determined by: selecting, from all the training samples, training samples with the same visual characteristics, and taking them as the training samples corresponding to one classification network model; selecting, from the training samples remaining after the selection by visual characteristics, training samples with the same item use characteristics, and taking them as the training samples corresponding to one classification network model; and selecting, from the training samples remaining after the selections by visual characteristics and item use characteristics, training samples with the same item material, and taking them as the training samples corresponding to one classification network model; wherein the categories of the training samples corresponding to any two classification network models differ.
  17. A target classification system, comprising: a detection unit adapted to, after receiving an input image, perform target detection on the input image using a preset detection network model to obtain a detection result of a target object; two or more target classification units adapted to, when the detection result of the target object is received, classify the detection result using two or more preset classification network models, respectively, the detection result comprising target object category information, target object position information, and confidence information; a second fusion unit adapted to fuse the intermediate classification results output by the classification network models to obtain a classification result of the target object, the classification result comprising target object category information and confidence information; and a judging unit adapted to judge, using the classification result of the target object, whether the detection result is valid, and to take a valid detection result of the target object as the recognition result of the target object; wherein the judging unit is adapted to perform category conversion on the target object category information in the detection result using preset first training sample category mapping relation information, and to judge whether the detection result is valid based on the category conversion result to obtain the recognition result of the target object; the second training sample category mapping relation information records the training sample category of a training sample when it serves as a training sample of the detection network model, the corresponding training sample category of the same training sample when it serves as a training sample of the classification network models, and the similar training sample category information between them, a similar training sample category being a training sample category with the same visual characteristics, or one or more training sample categories at the same level in a training sample category table; and judging whether the detection result is valid based on the category conversion result comprises: judging whether the target object category information in the detection result matches the target object category information of the classification result to obtain a first judgment result; when the first judgment result indicates a match, judging the detection result valid if the confidence of the detection result is higher than a first confidence threshold or the confidence of the classification result is higher than a second confidence threshold; and when the first judgment result indicates no match, judging the detection result valid if the confidence of the detection result is higher than a third confidence threshold or the confidence of the classification result is higher than a fourth confidence threshold.
  18. The target classification system of claim 17, wherein the detection unit comprises: two or more target detection subunits, each adapted to, after the input image is received, perform target detection on the input image using a preset detection network model; and a second fusion subunit adapted to fuse the intermediate detection results output by the detection network models to obtain the detection result of the target object, the intermediate detection results comprising target object category information, target object position information, and confidence information; wherein the categories of the training samples corresponding to any two detection network models differ.
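The fusion rule of claim 5 can be sketched as follows. This is a minimal illustration, assuming each intermediate classification result is a (category, confidence) pair already grouped per target object; the function name and the `top_n` parameter are illustrative, not taken from the patent.

```python
def fuse_classification_results(results, top_n=2):
    """Fuse intermediate results from several classification models
    for one target object, per the claim-5 rule (illustrative sketch)."""
    if len(results) == 1:
        # Only one model produced category information: use it directly.
        return [results[0]]
    if len(results) == 2:
        # Exactly two models: keep the higher-confidence result.
        return [max(results, key=lambda r: r[1])]
    # Three or more: keep a preset number of the most confident results.
    return sorted(results, key=lambda r: r[1], reverse=True)[:top_n]
```

For example, with two intermediate results the higher-confidence one wins, and with three or more only the `top_n` most confident survive.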

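The validity judgment at the end of claims 1 and 17 reduces to a four-threshold rule. The sketch below assumes the category conversion has already been applied and that a match loosens the thresholds while a mismatch tightens them; the concrete threshold values are illustrative assumptions.

```python
def is_detection_valid(det_category, det_conf, cls_category, cls_conf,
                       t1=0.5, t2=0.5, t3=0.8, t4=0.8):
    """Return True when a detection result is judged valid against the
    classification result (illustrative reading of the claim-1 rule)."""
    if det_category == cls_category:
        # First judgment result: match -> first/second thresholds apply.
        return det_conf > t1 or cls_conf > t2
    # No match -> the stricter third/fourth thresholds apply.
    return det_conf > t3 or cls_conf > t4
```

So a detection whose category agrees with the classifier ensemble passes at moderate confidence, while a disagreeing detection must be highly confident on at least one side to survive.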
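The training-sample partition of claim 9 can be sketched as splitting categories into K groups by sample count, so each detection model trains on categories of similar size. The category names, counts, and function name below are illustrative assumptions.

```python
def partition_categories_by_count(counts, k):
    """counts: {category: number_of_training_samples}.
    Returns k lists of categories, split at k-1 uniformly spaced
    sample-number separation values (illustrative sketch of claim 9)."""
    lo, hi = min(counts.values()), max(counts.values())
    step = (hi - lo) / k
    # K-1 separation values between the fewest and most populous categories.
    seps = [lo + step * i for i in range(1, k)]
    groups = [[] for _ in range(k)]
    for cat, n in counts.items():
        # Index of the interval this category's sample count falls into.
        idx = sum(n > s for s in seps)
        groups[idx].append(cat)
    return groups
```

Each resulting group then supplies the training samples for one of the K detection network models.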
Description

Target classification method and system

Technical Field

The invention relates to the field of target detection, and in particular to a target classification method and a target classification system.

Background

Deep learning is modeled on the human brain using computer system architectures and is widely applied in security inspection and other fields. Its application in the security inspection field mainly involves target detection, i.e., detecting the position and category information of target objects in an input image. Existing deep-learning-based target detection solutions typically use a single convolutional neural network (CNN) or a fully connected deep neural network (DNN) for target detection. However, the accuracy of target detection using such schemes is poor.

Disclosure of Invention

The invention aims to improve the accuracy of target detection. To this end, an embodiment of the present invention provides a target classification method, comprising: when a detection result of a target object is received, classifying the detection result using two or more preset classification network models, respectively, the detection result comprising target object category information, target object position information, and confidence information; and fusing the intermediate classification results output by the classification network models to obtain a classification result of the target object, the classification result comprising target object category information and confidence information; wherein the categories of the training samples corresponding to any two classification network models differ.
Optionally, the categories of the training samples corresponding to any two classification network models are completely different or partially different. Optionally, among the training samples corresponding to each classification network model, the difference between the numbers of training samples of any two categories is smaller than a second sample difference threshold. Optionally, the training samples corresponding to the two or more classification network models are determined by: selecting, from all the training samples, training samples with the same visual characteristics, and taking them as the training samples corresponding to one classification network model; selecting, from the training samples remaining after the selection by visual characteristics, training samples with the same item use characteristics, and taking them as the training samples corresponding to one classification network model; and selecting, from the training samples remaining after the selections by visual characteristics and item use characteristics, training samples with the same item material, and taking them as the training samples corresponding to one classification network model. Optionally, the training samples corresponding to the two or more classification network models are further determined by: taking two categories of training samples that have different visual characteristics and whose difference in number of training samples is smaller than a preset third sample difference threshold as training samples corresponding to the same classification network model, the third sample difference threshold being smaller than the second sample difference threshold.
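The grouping procedure above (visual characteristics first, then item use characteristics, then item material) can be sketched as follows. The sample records and attribute keys are assumptions made for the example, not data structures from the patent.

```python
from collections import defaultdict

def group_training_samples(samples):
    """Assign each sample record to a classifier group keyed by the first
    shared attribute found, in priority order: visual characteristics,
    then item use, then material (illustrative sketch)."""
    groups = defaultdict(list)
    remaining = list(samples)
    for attr in ("visual", "use", "material"):
        still_left = []
        for s in remaining:
            key = s.get(attr)
            if key is not None:
                groups[(attr, key)].append(s)
            else:
                # Defer to the next, lower-priority attribute.
                still_left.append(s)
        remaining = still_left
    return dict(groups)
```

Samples sharing a visual characteristic are claimed first; only the leftovers are considered for grouping by use and then by material, mirroring the "remaining training samples" wording above.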
Optionally, fusing the intermediate classification results output by the classification network models to obtain the classification result of the target object comprises: when target object category information of the same target object exists in only one intermediate classification result, taking that intermediate classification result as the classification result of the target object; when target object category information of the same target object exists in only two intermediate classification results, taking the intermediate classification result with the higher confidence of the two as the classification result of the target object; and when target object category information of the same target object exists in three or more intermediate classification results, selecting a preset number of the intermediate classification results as the classification result of the target object according to the confidence information in the three or more intermediate classification results. Optionally, the method further comprises: after receiving an input image, performing target detection on the input image using a preset detection netwo