CN-116664979-B - Pseudo tag generation method, model training method, target detection method and device

Abstract

The invention provides a pseudo tag generation method, a model training method, a target detection method, and a device. The method comprises: training a constructed target detection model with a labeled first training image to obtain a first target detection model; using the first target detection model to obtain the target features corresponding to the labeled targets in the first training image; generating a pseudo tag for an unlabeled second training image by using the first target detection model, assisted by the target features corresponding to the labeled targets; training on the labeled training image together with the training image carrying the pseudo tag to obtain a final target detection model; and performing target detection on an image to be detected with the trained target detection model. The invention can generate accurate pseudo tags for unlabeled training images, so that a better-performing target detection model can be trained on that basis and, in turn, a better detection result can be obtained on the image to be detected.

Inventors

SHENG DIAN
LIN KEN
YIN BING

Assignees

iFLYTEK Co., Ltd. (科大讯飞股份有限公司)

Dates

Publication Date: 20260505
Application Date: 20230602

Claims (14)

1. A pseudo tag generation method, comprising: training a constructed target detection model with a labeled first training image to obtain a first target detection model; obtaining, by using the first target detection model, target features corresponding to labeled targets in the first training image, so as to obtain target features respectively corresponding to a plurality of known targets; and generating a pseudo tag of an unlabeled second training image by using the first target detection model, assisted by the target features respectively corresponding to the plurality of known targets, wherein the second training image with the pseudo tag is used for training the constructed target detection model or the first target detection model; wherein the obtaining, by using the first target detection model, target features corresponding to labeled targets in the first training image comprises: obtaining N candidate target features of the first training image based on the first target detection model, and converting each candidate target feature into a candidate target detection result to obtain N candidate target detection results, N being an integer greater than 1; and determining the target features corresponding to the labeled targets in the first training image according to the N candidate target features and those of the N candidate target detection results that match the labeling information of the first training image; and wherein the generating a pseudo tag of an unlabeled second training image by using the first target detection model, assisted by the target features respectively corresponding to the plurality of known targets, comprises: dividing the target features corresponding to labeled targets of the same type into one group to obtain a plurality of target feature groups; performing target detection on the second training image by using the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target detection results of the second training image; and determining the pseudo tag of the second training image according to the plurality of candidate target detection results of the second training image.
2. The pseudo tag generation method according to claim 1, wherein the determining the target features corresponding to the labeled targets in the first training image according to the N candidate target features and those of the N candidate target detection results that match the labeling information of the first training image comprises: screening, from the N candidate target detection results of the first training image, the candidate target detection results that match the labeling information of the first training image; and acquiring the target features corresponding to the labeled targets in the first training image according to the screened candidate target detection results and the N candidate target features of the first training image.
3. The pseudo tag generation method according to claim 1, wherein the performing target detection on the second training image by using the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target detection results of the second training image comprises: acquiring image features of the second training image by using the first target detection model; decoding the image features of the second training image by using the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target features of the second training image; and converting each candidate target feature of the second training image into a candidate target detection result by using the first target detection model, to obtain the plurality of candidate target detection results of the second training image.
4. The pseudo tag generation method according to claim 3, wherein the decoding the image features of the second training image by using the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target features of the second training image comprises: traversing the plurality of target feature groups; and, for the currently traversed target feature group, decoding the second features of the second training image based on the first target detection model, assisted by that target feature group, to obtain N candidate target features of the second training image.
5. The pseudo tag generation method according to claim 1, wherein each candidate target detection result of the second training image corresponds to a confidence level; and the determining the pseudo tag of the second training image according to the plurality of candidate target detection results of the second training image comprises: filtering out, from the plurality of candidate target detection results of the second training image, the candidate target detection results whose confidence level is smaller than a preset confidence threshold; and de-duplicating, among the remaining candidate target detection results, multiple candidate target detection results for the same target, wherein the candidate target detection results that finally remain are used as the pseudo tag of the second training image.
6. The pseudo tag generation method according to claim 1, wherein the first target detection model comprises a feature extractor, an encoder, a decoder, and a mapper; and the performing target detection on the second training image by using the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target detection results of the second training image comprises: performing feature extraction on the second training image by using the feature extractor of the first target detection model to obtain a first feature of the second training image; encoding the first feature of the second training image by using the encoder of the first target detection model to obtain a second feature of the second training image; decoding the second feature of the second training image by using the decoder of the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target features of the second training image; and mapping each candidate target feature of the second training image into a candidate target detection result by using the mapper of the first target detection model, to obtain the plurality of candidate target detection results of the second training image.
7. The pseudo tag generation method according to claim 6, wherein the decoder comprises M cascaded decoding modules of identical structure, M being an integer greater than 1; each target feature in each target feature group is a feature sequence, and the i-th feature in each feature sequence is the feature output by the i-th decoding module, where 1 ≤ i ≤ M; when the second feature of the second training image is decoded by using the decoder of the first target detection model and the plurality of target feature groups, the input of each decoding module comprises target query features, reference features, and the second feature of the second training image, the output of each decoding module comprises N candidate target features, and each decoding module uses the target query features, assisted by the reference features, to extract features related to the target detection task from the second feature of the second training image; and the target query features input into the 1st decoding module are N created features, the reference features input into the 1st decoding module are a plurality of created features, the target query features input into the j-th decoding module are the N candidate target features output by the (j-1)-th decoding module, and the reference features input into the j-th decoding module are the features, in each feature sequence in the plurality of target feature groups, that were output by the (j-1)-th decoding module, where 2 ≤ j ≤ M.
8. A target detection model training method, comprising: acquiring a labeled first training image and an unlabeled second training image; generating a pseudo tag of the second training image by the pseudo tag generation method according to any one of claims 1 to 7; and training a second target detection model with the labeled first training image and the second training image with the pseudo tag to obtain a final target detection model, wherein the second target detection model is the constructed target detection model or a model obtained by preliminarily training the constructed target detection model with the labeled first training image.
9. A target detection method, comprising: acquiring an image to be detected; and performing target detection on the image to be detected by using a pre-trained target detection model, wherein the target detection model is trained by the target detection model training method according to claim 8.
10. A pseudo tag generation device, comprising a first training module, a known target feature acquisition module, and a pseudo tag generation module, wherein: the first training module is configured to train a constructed target detection model with a labeled first training image to obtain a first target detection model; the known target feature acquisition module is configured to obtain, by using the first target detection model, target features corresponding to labeled targets in the first training image, so as to obtain target features respectively corresponding to a plurality of known targets; the pseudo tag generation module is configured to generate a pseudo tag of an unlabeled second training image by using the first target detection model, assisted by the target features respectively corresponding to the plurality of known targets, wherein the second training image with the pseudo tag is used for training the constructed target detection model or the first target detection model; the known target feature acquisition module is specifically configured to obtain N candidate target features of the first training image based on the first target detection model, convert each candidate target feature into a candidate target detection result, N being an integer greater than 1, and determine the target features corresponding to the labeled targets in the first training image according to the N candidate target features and those of the N converted candidate target detection results that match the labeling information of the first training image; and the pseudo tag generation module is specifically configured to divide the target features corresponding to labeled targets of the same type into one group to obtain a plurality of target feature groups, perform target detection on the second training image by using the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target detection results of the second training image, and determine the pseudo tag of the second training image according to the plurality of candidate target detection results of the second training image.
11. A target detection model training device, comprising a training data acquisition module, the pseudo tag generation device according to claim 10, and a second training module, wherein: the training data acquisition module is configured to acquire a labeled first training image and an unlabeled second training image; the pseudo tag generation module is configured to generate a pseudo tag of the second training image; and the second training module is configured to train a second target detection model with the labeled first training image and the second training image with the pseudo tag to obtain a final target detection model, wherein the second target detection model is the constructed target detection model or a model obtained by preliminarily training the constructed target detection model with the labeled first training image.
12. A target detection device, comprising a to-be-detected image acquisition module and a target detection module, wherein: the to-be-detected image acquisition module is configured to acquire an image to be detected; and the target detection module is configured to perform target detection on the image to be detected by using a pre-trained target detection model, wherein the target detection model is obtained by using the target detection model training device according to claim 11.
13. A processing device, comprising a memory and a processor, wherein: the memory is configured to store a program; and the processor is configured to execute the program to implement the steps of the pseudo tag generation method according to any one of claims 1 to 7, the steps of the target detection model training method according to claim 8, or the steps of the target detection method according to claim 9.
14. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the pseudo tag generation method according to any one of claims 1 to 7, the steps of the target detection model training method according to claim 8, or the steps of the target detection method according to claim 9.
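
The confidence filtering and de-duplication recited in claim 5 could be sketched roughly as follows. This is only an illustration under stated assumptions: the claims do not say how duplicates "for the same target" are identified, so the sketch assumes greedy, class-wise suppression by box overlap (IoU), and the function names, tuple layout, and both threshold values are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_tags(candidates, conf_threshold=0.5, iou_threshold=0.5):
    """Turn candidate detection results into pseudo tags.

    candidates: list of (class_name, box, confidence) tuples, pooled over
    all target feature groups. Both thresholds are illustrative values.
    """
    # Step 1: drop candidates whose confidence is below the preset threshold.
    kept = [c for c in candidates if c[2] >= conf_threshold]

    # Step 2 (assumed criterion): treat two same-class boxes with a high
    # overlap as results for the same target and keep the more confident one.
    kept.sort(key=lambda c: c[2], reverse=True)
    pseudo_tags = []
    for cls, box, conf in kept:
        is_duplicate = any(
            cls == kept_cls and iou(box, kept_box) >= iou_threshold
            for kept_cls, kept_box, _ in pseudo_tags
        )
        if not is_duplicate:
            pseudo_tags.append((cls, box, conf))
    return pseudo_tags


# Toy candidates: two overlapping "cat" boxes, one low-confidence "cat",
# and a "dog" box that overlaps the first "cat" but has a different class.
candidates = [
    ("cat", (0, 0, 10, 10), 0.9),
    ("cat", (1, 1, 11, 11), 0.8),
    ("cat", (50, 50, 60, 60), 0.3),
    ("dog", (0, 0, 10, 10), 0.7),
]
pseudo_tags = select_pseudo_tags(candidates)
```

In this toy input the second "cat" box is suppressed as a duplicate of the first, the third is filtered out by confidence, and the "dog" box survives because the assumed criterion only compares boxes of the same class.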

Description

Pseudo tag generation method, model training method, target detection method and device
Technical Field
The invention relates to the technical field of computer vision, and in particular to a pseudo tag generation method, a model training method, a target detection method, and a device.
Background
Target detection refers to finding targets of interest in an image and determining the category and position of each target. It is a fundamental task in computer vision: many other tasks, such as image segmentation, target tracking, and keypoint detection, typically rely on it.
Target detection is generally performed by a trained target detection model; that is, a target detection model is trained on training images, and the trained model is then used to detect targets in an image to be detected. To train the target detection model, the training images usually need to be labeled first, and the labeled training images are then used for training. Labeling is generally done manually: the positions and categories of the targets in each training image are annotated by hand.
Training a target detection model with good performance often requires a large number of labeled training images. In practice, however, labeled training images are often scarce because of the high labor cost, which limits model training.
Disclosure of Invention
In view of the above, the invention provides a pseudo tag generation method, a model training method, a target detection method, and a device, to address the problem that model training is limited by a lack of labeled training images caused by high labor cost. The technical solution is as follows:
A pseudo tag generation method, comprising: training a constructed target detection model with a labeled first training image to obtain a first target detection model; obtaining, by using the first target detection model, target features corresponding to labeled targets in the first training image, so as to obtain target features respectively corresponding to a plurality of known targets; and generating a pseudo tag of an unlabeled second training image by using the first target detection model and the target features respectively corresponding to the plurality of known targets.
Optionally, the obtaining, by using the first target detection model, target features corresponding to labeled targets in the first training image comprises: acquiring N candidate target features of the first training image by using the first target detection model, wherein each candidate target feature is a target feature corresponding to one candidate target, and N is an integer greater than 1; converting each candidate target feature of the first training image into a candidate target detection result by using the first target detection model, to obtain N candidate target detection results of the first training image; and acquiring the target features corresponding to the labeled targets in the first training image according to the N candidate target detection results of the first training image, the labeling information of the first training image, and the N candidate target features of the first training image.
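
As a rough illustration of how the features of labeled targets might be picked out of the N candidate target features and then grouped by type, consider the sketch below. The text here does not state the matching criterion, so the sketch assumes that a candidate target detection result matches an annotation when the classes agree and the box overlap (IoU) reaches a threshold; the function names, data layout, and threshold are hypothetical.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def features_of_labeled_targets(candidate_features, candidate_results,
                                annotations, iou_threshold=0.5):
    """Return (class_name, feature) for each annotation that has a match.

    candidate_results[k] is the (class_name, box) result converted from
    candidate_features[k]; annotations are (class_name, box) labels.
    """
    matched = []
    for gt_class, gt_box in annotations:
        best_k, best_iou = None, iou_threshold
        for k, (cls, box) in enumerate(candidate_results):
            if cls != gt_class:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_k, best_iou = k, overlap
        if best_k is not None:
            # The feature that produced the matched result is taken as the
            # target feature of this labeled (known) target.
            matched.append((gt_class, candidate_features[best_k]))
    return matched


def group_by_class(matched):
    """Put target features of labeled targets of the same type in one group."""
    groups = defaultdict(list)
    for cls, feature in matched:
        groups[cls].append(feature)
    return dict(groups)


# Toy data with N = 3 candidates and two annotated targets.
candidate_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
candidate_results = [
    ("cat", (0, 0, 10, 10)),
    ("dog", (20, 20, 40, 40)),
    ("cat", (100, 100, 120, 120)),
]
annotations = [("cat", (1, 1, 10, 10)), ("dog", (22, 20, 40, 42))]

matched = features_of_labeled_targets(
    candidate_features, candidate_results, annotations)
groups = group_by_class(matched)
```

With these toy inputs the first and second candidates match the two annotations, while the third candidate matches nothing and contributes no target feature.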
Optionally, the acquiring the target features corresponding to the labeled targets in the first training image according to the N candidate target detection results of the first training image, the labeling information of the first training image, and the N candidate target features of the first training image comprises: screening, from the N candidate target detection results of the first training image, the candidate target detection results that match the labeling information of the first training image; and acquiring the target features corresponding to the labeled targets in the first training image according to the screened candidate target detection results and the N candidate target features of the first training image.
Optionally, the generating a pseudo tag of an unlabeled second training image by using the first target detection model and the target features respectively corresponding to the plurality of known targets comprises: dividing the target features corresponding to labeled targets of the same type into one group to obtain a plurality of target feature groups; performing target detection on the second training image by using the first target detection model and the plurality of target feature groups to obtain a plurality of candidate target detection results of the second training image; and determining the pseudo tag of the second training image according to the plurality of candidate target detection results of the second training image.
Optionally, the performing object detection on the s