CN-114529756-B - Image labeling method and device
Abstract
The invention discloses an image labeling method and device. The method comprises: receiving a prediction result of a target deep learning model on unlabeled sample images; screening the prediction labels of a plurality of sample images based on the prediction result to serve as training labeling images; obtaining a multi-scale image pyramid of the training labeling images and copying it into two copies; performing first data processing on one copy to obtain a first image, and performing second data processing different from the first data processing, or no processing, on the other copy to obtain a second image; inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label; calculating a corresponding loss function; and iteratively updating the target deep learning model. The method and device make full use of the labeling data output by the original target deep learning model, reducing labeling cost and improving the accuracy of the labeling data.
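The two-view step summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: images are small grayscale 2-D lists, the model is any callable returning class probabilities, `weak_augment` (random horizontal flip) and `strong_augment` (flip plus brightness shift and Gaussian noise) are hypothetical stand-ins for the second and first data processing, and the loss is the cross-entropy of the first prediction against the argmax of the second prediction, averaged over pyramid levels.

```python
import math
import random
from typing import Callable, List, Sequence

# Assumption: an image is a grayscale 2-D list with pixel values in [0, 1].
Image = List[List[float]]
Model = Callable[[Image], List[float]]  # returns class probabilities


def weak_augment(img: Image, rng: random.Random) -> Image:
    """Hypothetical weak augmentation: random horizontal flip."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def strong_augment(img: Image, rng: random.Random) -> Image:
    """Hypothetical strong augmentation: flip, brightness shift, Gaussian noise."""
    shift = rng.uniform(-0.3, 0.3)
    return [
        [min(1.0, max(0.0, p + shift + rng.gauss(0.0, 0.05))) for p in row]
        for row in weak_augment(img, rng)
    ]


def consistency_loss(p_first: Sequence[float], p_second: Sequence[float]) -> float:
    """Cross-entropy of the first prediction against the second prediction's
    argmax, which plays the role of the ground-truth label."""
    target = max(range(len(p_second)), key=lambda k: p_second[k])
    return -math.log(max(p_first[target], 1e-12))


def two_view_loss(model: Model, pyramid: List[Image], rng: random.Random) -> float:
    """Copy the pyramid, process each copy differently, and average the loss."""
    first_copy = [strong_augment(level, rng) for level in pyramid]  # first data processing
    second_copy = [weak_augment(level, rng) for level in pyramid]   # second data processing
    # In a real framework the second prediction would be detached (no gradient).
    losses = [consistency_loss(model(a), model(b)) for a, b in zip(first_copy, second_copy)]
    return sum(losses) / len(losses)


def toy_model(img: Image) -> List[float]:
    """Hypothetical two-class model: softmax over (mean, 1 - mean) brightness."""
    mean = sum(map(sum, img)) / sum(len(row) for row in img)
    exps = [math.exp(mean), math.exp(1.0 - mean)]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `two_view_loss(toy_model, [[[0.2, 0.4], [0.6, 0.8]]], random.Random(0))` returns a positive scalar. Using the weakly processed view as the target is the usual choice because its prediction tends to be more reliable than the strongly perturbed one.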
Inventors
- ZHONG CHENG
- ZHOU YINGJIE
- DENG XING
- ZHANG ZEXI
Assignees
- 传申弘安智能(深圳)有限公司
- 珠高智能科技(深圳)有限公司
Dates
- Publication Date
- 20260421
- Application Date
- 20220124
- Priority Date
- 20220124
Claims (4)
- 1. An image labeling method, characterized by comprising the following steps: acquiring image data, and selecting part of the image data by a clustering method to serve as unlabeled sample images; labeling the unlabeled sample images with a labeling tool to obtain sample labeling images; inputting the sample labeling images into a target deep learning model for small-sample training to obtain a coarse labeling model, wherein the target deep learning model comprises a first multi-scale refinement branch, a backbone network and a second multi-scale refinement branch, the first multi-scale refinement branch performs feature extraction through a second feature extraction network, the backbone network performs feature extraction through a first feature extraction network, and the second multi-scale refinement branch performs feature extraction through a third feature extraction network; stopping training and outputting a labeling result if the coarse labeling model meets a preset requirement; and adding semi-supervised cyclic training if the precision of the coarse labeling model does not meet the preset requirement; the small-sample training comprises: inputting a sample labeling image, cropping a positive sample target from the sample labeling image, and performing multi-scale scaling on the cropped positive sample target to generate a multi-scale image pyramid that serves as the input of the first multi-scale refinement branch; inputting the sample labeling image into the backbone network, inputting the corresponding multi-scale image pyramid into the first multi-scale refinement branch, and obtaining corresponding image features through the second feature extraction network, wherein the second feature extraction network shares weights with the first feature extraction network; inputting the sample labeling image into the backbone network and calculating a corresponding loss function after the sample labeling image passes through the first feature extraction network; and inputting the multi-scale image pyramid into the first multi-scale refinement branch, calculating the loss function of the first multi-scale refinement branch, and merging it into the loss function of the backbone network; the semi-supervised cyclic training comprises: receiving a prediction result of the target deep learning model on the unlabeled sample images, screening the prediction labels of a plurality of sample images based on the prediction result, and taking them as training labeling images; performing first data processing on one copy of the multi-scale image pyramid of the training labeling images to obtain a first image, and performing second data processing different from the first data processing, or no processing, on the other copy to obtain a second image; and inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label, calculating a corresponding loss function according to the first prediction label and the second prediction label, and iteratively updating the target deep learning model; screening the prediction labels of a plurality of sample images based on the prediction result as the training labeling images comprises: receiving the prediction result of the unlabeled sample images; selecting, from the prediction result, prediction boxes whose confidence is above a preset threshold as labels of the sample images; and taking the labeled sample images as the training labeling images; inputting the first image and the second image into the target deep learning model to obtain the corresponding first prediction label and second prediction label, calculating the corresponding loss function according to the first prediction label and the second prediction label, and iteratively updating the target deep learning model comprises: inputting the first image and the second image as one group of input samples into the second multi-scale refinement branch, and predicting through the third feature extraction network to obtain the first prediction label and the second prediction label, wherein the third feature extraction network shares weights with the first feature extraction network and the second feature extraction network; and taking the second prediction label as the ground-truth label of the first image obtained by the first data processing, comparing it with the first prediction label, calculating the loss function of the second multi-scale refinement branch, merging it into the loss function of the backbone network, and iteratively updating the backbone network.
- 2. The image labeling method according to claim 1, wherein the first data processing is strong data augmentation and the second data processing is weak data augmentation.
- 3. An image labeling device, employing the method of any of claims 1 to 2, comprising: an acquisition module, used for acquiring image data and selecting part of the image data by a clustering method to serve as unlabeled sample images; a labeling module, used for labeling the unlabeled sample images with a labeling tool to obtain sample labeling images; a small-sample training module, used for inputting the sample labeling images into a target deep learning model for small-sample training to obtain a coarse labeling model, wherein the target deep learning model comprises a first multi-scale refinement branch, a backbone network and a second multi-scale refinement branch, the first multi-scale refinement branch performs feature extraction through a second feature extraction network, the backbone network performs feature extraction through a first feature extraction network, and the second multi-scale refinement branch performs feature extraction through a third feature extraction network; a training stopping module, used for stopping training and outputting a labeling result if the coarse labeling model meets a preset requirement; and an adding module, used for adding semi-supervised cyclic training if the precision of the coarse labeling model does not meet the preset requirement; the small-sample training module is further used for: inputting a sample labeling image, cropping a positive sample target from the sample labeling image, and performing multi-scale scaling on the cropped positive sample target to generate a multi-scale image pyramid that serves as the input of the first multi-scale refinement branch; inputting the sample labeling image into the backbone network, inputting the corresponding multi-scale image pyramid into the first multi-scale refinement branch, and obtaining corresponding image features through the second feature extraction network, wherein the second feature extraction network shares weights with the first feature extraction network; inputting the sample labeling image into the backbone network and calculating a corresponding loss function after the sample labeling image passes through the first feature extraction network; and inputting the multi-scale image pyramid into the first multi-scale refinement branch, calculating the loss function of the first multi-scale refinement branch, and merging it into the loss function of the backbone network; the adding module comprises: a selecting module, used for receiving the prediction result of the target deep learning model on the unlabeled sample images, screening out the prediction labels of a plurality of sample images based on the prediction result, and taking them as training labeling images; an image processing module, used for acquiring a multi-scale image pyramid of the training labeling images, copying it into two copies, performing first data processing on one copy to obtain a first image, and performing second data processing different from the first data processing, or no processing, on the other copy to obtain a second image; and a training module, used for inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label, calculating a corresponding loss function according to the first prediction label and the second prediction label, and iteratively updating the target deep learning model; the selecting module is further used for: receiving the prediction result of the unlabeled sample images; selecting, from the prediction result, prediction boxes whose confidence is above a preset threshold as labels of the sample images; and taking the labeled sample images as the training labeling images; the training module is further used for: inputting the first image and the second image as one group of input samples into the second multi-scale refinement branch, and predicting through the third feature extraction network to obtain the first prediction label and the second prediction label, wherein the third feature extraction network shares weights with the first feature extraction network and the second feature extraction network; and taking the second prediction label as the ground-truth label of the first image obtained by the first data processing, comparing it with the first prediction label, calculating the loss function of the second multi-scale refinement branch, merging it into the loss function of the backbone network, and iteratively updating the backbone network.
- 4. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1 to 2.
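Claim 1 begins by selecting part of the image data with "a clustering method" but does not name the algorithm. The sketch below assumes k-means over precomputed per-image feature vectors (for example, embeddings from any pretrained network) and picks the image nearest each centroid as a representative to label; both the choice of k-means and the nearest-to-centroid rule are illustrative assumptions, and `kmeans` and `select_samples_to_label` are hypothetical helpers.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means with random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def select_samples_to_label(features: List[Vector], k: int, seed: int = 0) -> List[int]:
    """Return the indices of the images closest to each cluster centroid."""
    chosen: List[int] = []
    for c in kmeans(features, k, seed=seed):
        idx = min(range(len(features)), key=lambda i: _dist2(features[i], c))
        if idx not in chosen:
            chosen.append(idx)
    return sorted(chosen)
```

Picking one representative per cluster spreads the small labeling budget across visually distinct groups instead of spending it on near-duplicates.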
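The small-sample training step crops a positive sample target and rescales it into a multi-scale image pyramid for the first multi-scale refinement branch. A minimal sketch, assuming a grayscale image stored as a 2-D list, a bounding box in pixel coordinates with exclusive right and bottom edges, nearest-neighbour resampling, and hypothetical scale factors 0.5, 1.0 and 2.0 (the patent does not specify the scales):

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive


def crop(img: Image, box: Box) -> Image:
    """Cut the positive sample target out of the labeled image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in img[y1:y2]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resampling to an out_h x out_w grid."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_pyramid(img: Image, box: Box, scales: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Image]:
    """Crop the target once, then rescale it to every pyramid level."""
    patch = crop(img, box)
    h, w = len(patch), len(patch[0])
    return [
        resize_nearest(patch, max(1, round(h * s)), max(1, round(w * s)))
        for s in scales
    ]
```

In practice an image library's resize with bilinear interpolation would normally replace `resize_nearest`; the pure-Python version only keeps the sketch dependency-free.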
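Claim 1 merges each refinement branch's loss into the backbone loss and adds semi-supervised rounds until the coarse labeling model meets a preset requirement. The sketch below assumes a plain weighted sum (the weights `w1` and `w2` are hypothetical; the patent only says the losses are merged), treats the preset requirement as an accuracy threshold, and adds a hypothetical `max_rounds` cap so the loop always terminates. The training callables are placeholders for whatever framework is used.

```python
from typing import Callable, Optional, Tuple, TypeVar

M = TypeVar("M")  # whatever object represents the model


def merged_loss(
    backbone_loss: float,
    refine1_loss: Optional[float] = None,
    refine2_loss: Optional[float] = None,
    w1: float = 1.0,
    w2: float = 1.0,
) -> float:
    """Backbone loss plus the optional weighted losses of the two refinement branches."""
    total = backbone_loss
    if refine1_loss is not None:  # small-sample training: first refinement branch
        total += w1 * refine1_loss
    if refine2_loss is not None:  # semi-supervised training: second refinement branch
        total += w2 * refine2_loss
    return total


def run_labeling(
    train_small_sample: Callable[[], M],
    evaluate: Callable[[M], float],
    semi_supervised_round: Callable[[M], M],
    target_accuracy: float,
    max_rounds: int = 10,
) -> Tuple[M, int]:
    """Train the coarse labeling model, then add semi-supervised rounds until it
    meets the preset requirement (or the hypothetical round cap is reached)."""
    model = train_small_sample()
    rounds = 0
    while evaluate(model) < target_accuracy and rounds < max_rounds:
        model = semi_supervised_round(model)
        rounds += 1
    return model, rounds
```

Because the three feature extraction networks share weights, a single merged scalar loss is enough to update all branches in one backward pass.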
Description
Technical Field
The invention relates to the technical field of machine learning, and in particular to an image labeling method and device.
Background
The first step in solving a practical problem with a deep learning model is to obtain labeled data for the corresponding application scenario. Training a well-performing model generally requires thousands of labeled samples, so the labeling workload is huge; when the labeling task requires domain expertise, annotators must be trained before they can start work, which sharply increases labor and time costs. Labeling accuracy is equally critical: manual labeling is subject to considerable uncertainty and chance error, so different quality-inspection procedures must be designed for different scenarios and more specialized quality inspectors must be trained, making the overall cost high. An automated image labeling method is therefore needed that obtains high-precision labeled data while reducing labeling cost.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides an image labeling method that can reduce labeling cost and improve the precision of labeling data. The invention further provides an image labeling device that applies the image labeling method, and a computer-readable storage medium storing a program that implements the image labeling method.
The image labeling method comprises: receiving a prediction result of a target deep learning model on unlabeled sample images; screening the prediction labels of a plurality of sample images based on the prediction result to serve as training labeling images; obtaining a multi-scale image pyramid of the training labeling images and copying it into two copies; performing first data processing on one copy to obtain a first image, and performing second data processing different from the first data processing, or no processing, on the other copy to obtain a second image; inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label; calculating a corresponding loss function according to the first prediction label and the second prediction label; and iteratively updating the target deep learning model. The advantage of this method is that it makes full use of the labeling data output by the original target deep learning model: a plurality of training labeling images are screened out, two different treatments are applied to each training labeling image, the two results are input into the target deep learning model as one group of samples, a loss function is calculated from the two resulting prediction labels, and the target deep learning model is updated iteratively. This effectively improves the precision of the labeling data while reducing labeling cost.
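The screening step, which keeps only the prediction boxes whose confidence is above a preset threshold as the claims specify, can be sketched as follows. The `PredBox` record, its fields, and the default threshold of 0.9 are hypothetical; images left with no box above the threshold are dropped, on the assumption that only labeled images become training labeling images.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PredBox:
    """Hypothetical detection record: box corners, class name, confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float


def screen_pseudo_labels(
    predictions: Dict[str, List[PredBox]], threshold: float = 0.9
) -> Dict[str, List[PredBox]]:
    """Keep boxes with confidence strictly above the threshold, and keep only
    images that end up with at least one box (the training labeling images)."""
    training: Dict[str, List[PredBox]] = {}
    for image_id, boxes in predictions.items():
        kept = [b for b in boxes if b.score > threshold]
        if kept:
            training[image_id] = kept
    return training
```

A high threshold trades recall for pseudo-label precision, which matters here because the retained boxes are treated as ground truth in the next training round.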
According to some embodiments of the invention, screening the prediction labels of a plurality of sample images based on the prediction result as training labeling images comprises: receiving the prediction result of the unlabeled sample images, selecting from the prediction result the prediction boxes whose confidence is above a preset threshold as labels of the sample images, and taking the labeled sample images as the training labeling images.
According to some embodiments of the invention, the first data processing is strong data augmentation and the second data processing is weak data augmentation.
According to some embodiments of the invention, calculating the corresponding loss function according to the first prediction label and the second prediction label comprises taking the second prediction label as the ground-truth label of the copy of the multi-scale image pyramid on which the first data processing was performed, comparing it with the first prediction label, and calculating the corresponding loss function.
According to some embodiments of the invention, iteratively updating the target deep learning model comprises inputting the first image and the second image into a second multi-scale refinement branch of the target deep learning model, calculating a loss function of the second multi-scale refinement branch, merging it into the loss function of the main branch of the target deep learning model, and iteratively updating the target deep learning model.
According to some embodiments of the invention, the second multi-scale refinement branch of the target deep learning model shares weights with the feature extraction network of the main branch of the target deep learning model.
According to some embodim