US-12626493-B2 - Method and apparatus for determining set of training samples, method and apparatus for training model, and method and apparatus for detecting object

US 12626493 B2

Abstract

Disclosed are a method and an apparatus for determining a set of training samples, a method and an apparatus for training a model, and a method and an apparatus for detecting an object. The method for determining a set of training samples includes: performing a movement operation on an object region in a sample image to determine a plurality of enhanced object regions; and determining, based on a plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions, a set of training samples corresponding to the sample image, so that an object detection model is obtained by training a network model based on the set of training samples corresponding to the sample image. The probability that a small object region is selected into the set of training samples is increased without affecting sample selection for large object regions, thereby improving the detection accuracy of the object detection model in detecting small objects.

Inventors

Hao Sun

Assignees

Black Sesame Technologies (Chengdu) Co., Ltd.

Dates

Publication Date: 2026-05-12
Application Date: 2023-08-17
Priority Date: 2022-08-17

Claims (20)

1. A method for determining a set of training samples, comprising: performing a movement operation on an object region in a sample image to determine a plurality of enhanced object regions, wherein the plurality of enhanced object regions comprise a moved object region, wherein the movement operation comprises translating or rotating the object region by a predetermined amount relative to the sample image; and determining, based on a plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions, a set of training samples corresponding to the sample image to obtain an object detection model by training a network model based on the set of training samples corresponding to the sample image, wherein the set of training samples corresponding to the sample image comprises at least one candidate region in the plurality of candidate regions corresponding to the sample image, wherein determining the set of training samples based on the plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions comprises: for each current enhanced object region in the plurality of enhanced object regions, calculating a respective intersection over union between each of the plurality of candidate regions and the current enhanced object region to determine overlap degrees corresponding to the plurality of candidate regions respectively; determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region; and determining, based on the set of training samples corresponding to the plurality of enhanced object regions respectively, the set of training samples corresponding to the sample image, thereby affecting an accuracy level of the object detection model.
2. The method according to claim 1, wherein performing the movement operation on the object region in the sample image to determine the plurality of enhanced object regions comprises: performing, based on a preset path, the movement operation on the object region in the sample image to determine the plurality of enhanced object regions.
3. The method according to claim 1, wherein the overlap degree threshold comprises a fixed overlap degree threshold and a dynamic overlap degree threshold, and wherein determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region comprises: determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and the fixed overlap degree threshold, a first set of training samples corresponding to the current enhanced object region, wherein the first set of training samples comprises at least one candidate region in the plurality of candidate regions; determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively, an overlap degree corresponding to the current enhanced object region, and determining, based on the overlap degrees corresponding to the plurality of enhanced object regions respectively, a mean value and a standard deviation of the overlap degrees; determining, based on the first set of training samples corresponding to the plurality of enhanced object regions respectively, a stability coefficient of the current enhanced object region; determining, based on the mean value and the standard deviation of the overlap degrees, and the stability coefficient of the current enhanced object region, a dynamic overlap degree threshold of the current enhanced object region; determining, based on the overlap degrees corresponding to the plurality of candidate regions and the dynamic overlap degree threshold of the current enhanced object region, a second set of training samples corresponding to the current enhanced object region; and determining, based on the first set of training samples corresponding to the current enhanced object region and the second set of training samples corresponding to the current enhanced object region, the set of training samples corresponding to the current enhanced object region.
4. The method according to claim 1, before performing the movement operation on the object region in the sample image to determine the plurality of enhanced object regions, further comprising: for each current object region in a plurality of object regions in the sample image, performing pre-moving on the current object region to determine a plurality of pre-enhanced object regions corresponding to the current object region, wherein the plurality of pre-enhanced object regions comprise the current object region and a plurality of pre-moved object regions corresponding to the current object region; determining, based on the plurality of candidate regions corresponding to the sample image and the plurality of pre-enhanced object regions corresponding to the current object region, an overlap degree corresponding to the current object region; and determining, based on overlap degrees corresponding to the plurality of object regions respectively, a movement order of the plurality of object regions to perform the movement operation on the plurality of object regions in the sample image according to the movement order.
5. The method according to claim 1, wherein the object detection model is configured to detect an object in an image.
6. The method according to claim 1, wherein the object detection model is configured to: determine an image to be detected; and detect the image to determine the object region in the image.
7. The method of claim 1, wherein the movement operation on the object region in the sample image is determined based on simulating a moving state of an object in the sample image.
8. One or more non-transitory computer-readable storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: performing a movement operation on an object region in a sample image to determine a plurality of enhanced object regions, wherein the plurality of enhanced object regions comprise a moved object region, wherein the movement operation comprises translating or rotating the object region by a predetermined amount relative to the sample image; and determining, based on a plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions, a set of training samples corresponding to the sample image to obtain an object detection model by training a network model based on the set of training samples corresponding to the sample image, wherein the set of training samples corresponding to the sample image comprises at least one candidate region in the plurality of candidate regions corresponding to the sample image, wherein determining the set of training samples based on the plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions comprises: for each current enhanced object region in the plurality of enhanced object regions, calculating a respective intersection over union between each of the plurality of candidate regions and the current enhanced object region to determine overlap degrees corresponding to the plurality of candidate regions respectively; determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region; and determining, based on the set of training samples corresponding to the plurality of enhanced object regions respectively, the set of training samples corresponding to the sample image, thereby affecting an accuracy level of the object detection model.
9. The one or more non-transitory computer-readable storage media according to claim 8, wherein performing the movement operation on the object region in the sample image to determine the plurality of enhanced object regions comprises: performing, based on a preset path, the movement operation on the object region in the sample image to determine the plurality of enhanced object regions.
10. The one or more non-transitory computer-readable storage media according to claim 8, wherein the overlap degree threshold comprises a fixed overlap degree threshold and a dynamic overlap degree threshold, and wherein determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region comprises: determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and the fixed overlap degree threshold, a first set of training samples corresponding to the current enhanced object region, wherein the first set of training samples comprises at least one candidate region in the plurality of candidate regions; determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively, an overlap degree corresponding to the current enhanced object region, and determining, based on the overlap degrees corresponding to the plurality of enhanced object regions respectively, a mean value and a standard deviation of the overlap degrees; determining, based on the first set of training samples corresponding to the plurality of enhanced object regions respectively, a stability coefficient of the current enhanced object region; determining, based on the mean value and the standard deviation of the overlap degrees, and the stability coefficient of the current enhanced object region, a dynamic overlap degree threshold of the current enhanced object region; determining, based on the overlap degrees corresponding to the plurality of candidate regions and the dynamic overlap degree threshold of the current enhanced object region, a second set of training samples corresponding to the current enhanced object region; and determining, based on the first set of training samples corresponding to the current enhanced object region and the second set of training samples corresponding to the current enhanced object region, the set of training samples corresponding to the current enhanced object region.
11. The one or more non-transitory computer-readable storage media according to claim 8, before performing the movement operation on the object region in the sample image to determine the plurality of enhanced object regions, further comprising: for each current object region in a plurality of object regions in the sample image, performing pre-moving on the current object region to determine a plurality of pre-enhanced object regions corresponding to the current object region, wherein the plurality of pre-enhanced object regions comprise the current object region and a plurality of pre-moved object regions corresponding to the current object region; determining, based on the plurality of candidate regions corresponding to the sample image and the plurality of pre-enhanced object regions corresponding to the current object region, an overlap degree corresponding to the current object region; and determining, based on overlap degrees corresponding to the plurality of object regions respectively, a movement order of the plurality of object regions to perform the movement operation on the plurality of object regions in the sample image according to the movement order.
12. The one or more non-transitory computer-readable storage media according to claim 8, wherein the object detection model is configured to: determine an image to be detected; and detect the image to determine the object region in the image.
13. The one or more non-transitory computer-readable storage media of claim 8, wherein the movement operation on the object region in the sample image is determined based on simulating a moving state of an object in the sample image.
14. A system comprising: one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: performing a movement operation on an object region in a sample image to determine a plurality of enhanced object regions, wherein the plurality of enhanced object regions comprise a moved object region, wherein the movement operation comprises translating or rotating the object region by a predetermined amount relative to the sample image; and determining, based on a plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions, a set of training samples corresponding to the sample image to obtain an object detection model by training a network model based on the set of training samples corresponding to the sample image, wherein the set of training samples corresponding to the sample image comprises at least one candidate region in the plurality of candidate regions corresponding to the sample image, wherein determining the set of training samples based on the plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions comprises: for each current enhanced object region in the plurality of enhanced object regions, calculating a respective intersection over union between each of the plurality of candidate regions and the current enhanced object region to determine overlap degrees corresponding to the plurality of candidate regions respectively; determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region; and determining, based on the set of training samples corresponding to the plurality of enhanced object regions respectively, the set of training samples corresponding to the sample image, thereby affecting an accuracy level of the object detection model.
15. The system according to claim 14, wherein performing the movement operation on the object region in the sample image to determine the plurality of enhanced object regions comprises: performing, based on a preset path, the movement operation on the object region in the sample image to determine the plurality of enhanced object regions.
16. The system according to claim 14, wherein the overlap degree threshold comprises a fixed overlap degree threshold and a dynamic overlap degree threshold, and wherein determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region comprises: determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and the fixed overlap degree threshold, a first set of training samples corresponding to the current enhanced object region, wherein the first set of training samples comprises at least one candidate region in the plurality of candidate regions; determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively, an overlap degree corresponding to the current enhanced object region, and determining, based on the overlap degrees corresponding to the plurality of enhanced object regions respectively, a mean value and a standard deviation of the overlap degrees; determining, based on the first set of training samples corresponding to the plurality of enhanced object regions respectively, a stability coefficient of the current enhanced object region; determining, based on the mean value and the standard deviation of the overlap degrees, and the stability coefficient of the current enhanced object region, a dynamic overlap degree threshold of the current enhanced object region; determining, based on the overlap degrees corresponding to the plurality of candidate regions and the dynamic overlap degree threshold of the current enhanced object region, a second set of training samples corresponding to the current enhanced object region; and determining, based on the first set of training samples corresponding to the current enhanced object region and the second set of training samples corresponding to the current enhanced object region, the set of training samples corresponding to the current enhanced object region.
17. The system according to claim 14, before performing the movement operation on the object region in the sample image to determine the plurality of enhanced object regions, further comprising: for each current object region in a plurality of object regions in the sample image, performing pre-moving on the current object region to determine a plurality of pre-enhanced object regions corresponding to the current object region, wherein the plurality of pre-enhanced object regions comprise the current object region and a plurality of pre-moved object regions corresponding to the current object region; determining, based on the plurality of candidate regions corresponding to the sample image and the plurality of pre-enhanced object regions corresponding to the current object region, an overlap degree corresponding to the current object region; and determining, based on overlap degrees corresponding to the plurality of object regions respectively, a movement order of the plurality of object regions to perform the movement operation on the plurality of object regions in the sample image according to the movement order.
18. The system according to claim 14, wherein the object detection model is configured to detect an object in an image to be detected.
19. The system according to claim 14, wherein the object detection model is configured to: determine an image to be detected; and detect the image to determine the object region in the image.
20. The system of claim 14, wherein the movement operation on the object region in the sample image is determined based on simulating a moving state of an object in the sample image.
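
The selection step recited in the independent claims (intersection over union against each enhanced object region, a threshold test, then combining the per-region sets) can be illustrated with a minimal Python sketch. Several details here are assumptions rather than anything the claims specify: regions are axis-aligned boxes given as (x1, y1, x2, y2), the threshold is a single fixed value of 0.5, a candidate is kept when its overlap is at least the threshold, and the per-region sets are combined by set union. The function names `iou`, `select_for_region`, and `select_for_image` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a region is an axis-aligned box (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_for_region(candidates: List[Box], region: Box,
                      threshold: float) -> Set[int]:
    """Indices of candidates whose overlap with one enhanced region
    reaches the threshold (the '>=' comparison is an assumption)."""
    return {i for i, c in enumerate(candidates) if iou(c, region) >= threshold}


def select_for_image(candidates: List[Box], enhanced_regions: List[Box],
                     threshold: float = 0.5) -> Set[int]:
    """Combine the per-region sets; set union is assumed here."""
    selected: Set[int] = set()
    for region in enhanced_regions:
        selected |= select_for_region(candidates, region, threshold)
    return selected


candidates = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
original = (0, 0, 10, 10)
moved = (4, 0, 14, 10)  # the original region translated 4 px to the right

print(sorted(select_for_image(candidates, [original])))         # [0]
print(sorted(select_for_image(candidates, [original, moved])))  # [0, 1]
```

In this toy input the second candidate overlaps the original region with an IoU of only 1/3, so it is selected only once the moved region is also considered. The dynamic overlap degree threshold of the dependent claims is not sketched, because the claims do not give its formula.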

Description

CROSS-REFERENCE TO RELATED APPLICATION
This application claims the benefit under 35 U.S.C. § 119(a) of the filing date of Chinese Patent Application No. 202210987671.6, filed in the Chinese Patent Office on Aug. 17, 2022. The disclosure of the foregoing application is herein incorporated by reference in its entirety.
TECHNICAL FIELD
The present application relates to the field of deep learning technologies, and in particular, to a method and an apparatus for determining a set of training samples, a method and an apparatus for training a model, a method and an apparatus for detecting an object, a computer readable storage medium and an electronic device.
BACKGROUND
Object detection is an important application in the field of deep learning. With continuous development of deep learning technologies, there are more and more application scenarios for object detection. Object detection is widely used in scenarios such as intelligent transportation, video surveillance, and vehicle-road cooperation. Small object detection is an important branch and one of the difficulties in object detection. Specifically, a small object in an image to be detected has characteristics such as motion blur and susceptibility to occlusion, resulting in a low detection accuracy of a deep learning model in detecting the small object in the image to be detected.
SUMMARY
In view of this, embodiments of the present application provide a method and an apparatus for determining a set of training samples, a method and an apparatus for training a model, a method and an apparatus for detecting an object, a computer readable storage medium and an electronic device, to solve a problem of low detection accuracy of a deep learning model in detecting a small object in an image to be detected.
According to a first aspect, an embodiment of the present application provides a method for determining a set of training samples, including: performing a movement operation on an object region in a sample image to determine a plurality of enhanced object regions, where the plurality of enhanced object regions include a moved object region; and determining, based on a plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions, a set of training samples corresponding to the sample image to obtain an object detection model by training a network model based on the set of training samples corresponding to the sample image, where the set of training samples corresponding to the sample image includes at least one candidate region in the plurality of candidate regions corresponding to the sample image.
According to the first aspect of the present application, in some embodiments, the performing a movement operation on an object region in a sample image to determine a plurality of enhanced object regions includes: performing, based on a preset path, the movement operation on the object region in the sample image to determine the plurality of enhanced object regions.
According to the first aspect of the present application, in some embodiments, the performing, based on a preset path, the movement operation on the object region in the sample image to determine the plurality of enhanced object regions includes: performing, based on the preset path, a translation operation on the object region in the sample image to determine the plurality of enhanced object regions; and/or performing, based on the preset path, a rotation operation on the object region in the sample image to determine the plurality of enhanced object regions.
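
As an illustration of the translation and rotation operations described above, the following Python sketch generates enhanced object regions from one object region. It is a sketch under stated assumptions, not the patented implementation: the preset path is modelled as a list of (dx, dy) pixel offsets plus a list of rotation angles, a region is stored as a list of corner points, rotation is taken about the mean of the corners, and the unmoved region is kept in the output. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # Assumption: a region is stored as its corner points.


def box_to_polygon(x1: float, y1: float, x2: float, y2: float) -> Polygon:
    """Corner points of an axis-aligned box, in order around the box."""
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def translate(region: Polygon, dx: float, dy: float) -> Polygon:
    """Shift every corner by (dx, dy) relative to the image."""
    return [(x + dx, y + dy) for x, y in region]


def rotate(region: Polygon, angle_deg: float) -> Polygon:
    """Rotate the region about the mean of its corners (an assumption)."""
    cx = sum(x for x, _ in region) / len(region)
    cy = sum(y for _, y in region) / len(region)
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in region]


def enhance_along_path(region: Polygon,
                       offsets: Sequence[Tuple[float, float]],
                       angles_deg: Sequence[float] = ()) -> List[Polygon]:
    """Enhanced regions: the original (kept by assumption), one translated
    copy per preset offset, and one rotated copy per preset angle."""
    enhanced = [list(region)]
    enhanced += [translate(region, dx, dy) for dx, dy in offsets]
    enhanced += [rotate(region, a) for a in angles_deg]
    return enhanced


region = box_to_polygon(0, 0, 2, 2)
for moved in enhance_along_path(region, [(1, 0), (0, 1)], [15.0]):
    print(moved)
```

A rotated region is generally no longer axis-aligned, so computing its overlap with candidate regions would need a polygon intersection routine or a bounding-box approximation; which of these the patent intends is not stated in this passage.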
According to the first aspect of the present application, in some embodiments, the determining, based on a plurality of candidate regions corresponding to the sample image and the plurality of enhanced object regions, a set of training samples corresponding to the sample image includes: for each current enhanced object region in the plurality of enhanced object regions, calculating intersection over union between each of the plurality of candidate regions and the current enhanced object region to determine overlap degrees corresponding to the plurality of candidate regions respectively; determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region; and determining, based on the set of training samples corresponding to the plurality of enhanced object regions respectively, the set of training samples corresponding to the sample image.
According to the first aspect of the present application, in some embodiments, the overlap degree threshold includes a fixed overlap degree threshold and a dynamic overlap degree threshold, and the determining, based on the overlap degrees corresponding to the plurality of candidate regions respectively and an overlap degree threshold, a set of training samples corresponding to the current enhanced object region include