CN-116912514-B - Neural network for detecting objects in images


Abstract

Systems, apparatuses, media, and methods for identifying and classifying objects within a set of images are provided. The systems and methods receive an image depicting an object of interest (310), detect at least a portion of the object of interest within the image using a multi-layer object model (330), determine context information (340), and identify the object of interest (350) that is included in two or more bounding boxes.

Inventors

HAN WEI
YANG JIANCHAO
ZHANG NING
LI JIA

Assignees

Snap Inc.

Dates

Publication Date: 20260508
Application Date: 20171101
Priority Date: 20161101

Claims (20)

1. An apparatus-implemented method for training a multi-layer object model, the apparatus-implemented method comprising: accessing, using one or more processors of a device coupled to a memory of the device, a set of training images, each training image depicting a known object of interest; identifying, by the one or more processors, a set of bounding boxes within the set of training images, each individual bounding box in the set of bounding boxes having a resolution; for each given bounding box in the set of bounding boxes: determining, by the one or more processors, whether the resolution of the given bounding box exceeds a specified box resolution, and, responsive to determining that the resolution of the given bounding box exceeds the specified box resolution, rescaling, by the one or more processors, the resolution of the given bounding box to match the specified box resolution by: identifying a center point of the given bounding box, and modifying the given bounding box by cropping, with respect to the center point, at least a portion of the given bounding box that lies beyond the range of the specified box resolution, the specified box resolution being determined based on at least one of a type of the known object of interest or information within a label of the given bounding box; initializing, by the one or more processors, one or more model parameters of the multi-layer object model; and iteratively adjusting, by the one or more processors, the one or more model parameters while detecting the known object of interest in the set of bounding boxes using the multi-layer object model, the iterative adjusting being performed until a change in an average loss function value resulting from iteration of the one or more model parameters is below a change threshold.
2. The apparatus-implemented method of claim 1, wherein an average loss function value is obtained for two or more instances of a training image in the set of training images, each instance of the two or more instances of the training image having a different resolution.
3. The apparatus-implemented method of claim 1, wherein each individual bounding box in the set of bounding boxes has a label.
4. The apparatus-implemented method of claim 1, wherein each individual bounding box in the set of bounding boxes has a set of coordinates identifying a location within a training image in the set of training images.
5. The apparatus-implemented method of claim 1, wherein the iteratively adjusting the one or more model parameters comprises iteratively adjusting the one or more model parameters using a gradient descent algorithm.
6. The apparatus-implemented method of claim 5, wherein the gradient descent algorithm uses a back-propagation calculation.
7. The apparatus-implemented method of claim 1, wherein the initializing the one or more model parameters of the multi-layer object model comprises initializing the one or more model parameters using a Gaussian distribution.
8. The apparatus-implemented method of claim 1, wherein at least one training image of the set of training images includes or is associated with data indicative of at least one of an entity, class, type, or other identifying information of the known object of interest in the at least one training image.
9. The apparatus-implemented method of claim 1, wherein at least one training image of the set of training images comprises or is associated with data identifying a location of at least a portion of the known object of interest within the at least one training image.
10. A system for training a multi-layer object model, comprising: one or more processors; and a processor-readable storage device coupled to the one or more processors, the processor-readable storage device storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations of training a multi-layer object model, the operations comprising: accessing a set of training images, each training image depicting a known object of interest; identifying a set of bounding boxes within the set of training images, each individual bounding box in the set of bounding boxes having a resolution; for each given bounding box in the set of bounding boxes: determining whether the resolution of the given bounding box exceeds a specified box resolution, and, in response to determining that the resolution of the given bounding box exceeds the specified box resolution, rescaling the resolution of the given bounding box to match the specified box resolution by: identifying a center point of the given bounding box, and cropping, with respect to the center point, at least a portion of the given bounding box that lies beyond the specified box resolution; initializing one or more model parameters of the multi-layer object model; and iteratively adjusting the one or more model parameters until a change in an average loss function value resulting from the iteration of the one or more model parameters is below a change threshold.
11. The system of claim 10, wherein an average loss function value is obtained for two or more instances of a training image in the set of training images, each instance of the two or more instances of the training image having a different resolution.
12. The system of claim 10, wherein each individual bounding box in the set of bounding boxes has a label.
13. The system of claim 10, wherein each individual bounding box in the set of bounding boxes has a set of coordinates identifying a location within a training image in the set of training images.
14. The system of claim 10, wherein the iteratively adjusting the one or more model parameters comprises iteratively adjusting the one or more model parameters using a gradient descent algorithm.
15. The system of claim 14, wherein the gradient descent algorithm uses a back-propagation calculation.
16. The system of claim 10, wherein the initializing the one or more model parameters of the multi-layer object model comprises initializing the one or more model parameters using a Gaussian distribution.
17. The system of claim 10, wherein at least one training image of the set of training images includes or is associated with data indicative of at least one of an entity, class, type, or other identifying information of the known object of interest in the at least one training image.
18. The system of claim 10, wherein at least one training image of the set of training images includes or is associated with data identifying a location of at least a portion of the known object of interest within the at least one training image.
19. A processor-readable storage device storing processor-executable instructions that, when executed by one or more processors of a machine, cause the machine to perform operations of training a multi-layer object model, the operations comprising: accessing a set of training images, each training image depicting a known object of interest; identifying a set of bounding boxes within the set of training images, each individual bounding box in the set of bounding boxes having a resolution; for each given bounding box in the set of bounding boxes: determining whether the resolution of the given bounding box exceeds a specified box resolution, and, in response to determining that the resolution of the given bounding box exceeds the specified box resolution, rescaling the resolution of the given bounding box to match the specified box resolution by: identifying a center point of the given bounding box, and cropping, with respect to the center point, at least a portion of the given bounding box that lies beyond the specified box resolution; initializing one or more model parameters of the multi-layer object model; and iteratively adjusting the one or more model parameters until a change in an average loss function value resulting from the iteration of the one or more model parameters is below a change threshold.
20. The processor-readable storage device of claim 19, wherein an average loss function value is obtained for two or more instances of a training image in the set of training images, each instance of the two or more instances of the training image having a different resolution.
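
The two core steps of claim 1 (center-point cropping of oversized bounding boxes, then iterative parameter adjustment until the change in average loss falls below a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: a box's "resolution" is taken to be its width and height in pixels; the specified box resolution is passed in as a hypothetical `(max_w, max_h)` pair rather than derived from the object type or box label; and a single logistic unit stands in for the multi-layer object model, so the back-propagated gradient reduces to a closed form. The names `Box`, `center_crop`, and `train` and all default values are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates, (x0, y0) to (x1, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def center_crop(box: Box, max_w: int, max_h: int) -> Box:
    """Crop `box` about its center point so it fits within max_w x max_h pixels.

    Assumption: the "specified box resolution" is a (max_w, max_h) pixel size.
    """
    if box.width <= max_w and box.height <= max_h:
        return box  # does not exceed the specified resolution: leave unchanged
    cx = (box.x0 + box.x1) / 2  # center point of the given box
    cy = (box.y0 + box.y1) / 2
    w = min(box.width, max_w)
    h = min(box.height, max_h)
    x0 = math.floor(cx - w / 2)  # keep the cropped box centered on (cx, cy)
    y0 = math.floor(cy - h / 2)
    return Box(x0, y0, x0 + w, y0 + h)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.5, change_threshold=1e-6, max_iters=10_000, seed=0):
    """Gradient descent that stops once the average loss stops changing.

    Hypothetical stand-in: one logistic unit replaces the multi-layer object
    model, so back-propagation reduces to the closed-form gradient below.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    # Initialize the model parameters from a Gaussian distribution.
    weights = [rng.gauss(0.0, 0.01) for _ in range(dim)]
    bias = rng.gauss(0.0, 0.01)
    n = len(features)
    prev_avg_loss = None
    for iteration in range(max_iters):
        grad_w = [0.0] * dim
        grad_b = 0.0
        total_loss = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            total_loss -= y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe)
            err = p - y  # derivative of cross-entropy w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        avg_loss = total_loss / n
        # Stopping rule: change in average loss is below the change threshold.
        if prev_avg_loss is not None and abs(prev_avg_loss - avg_loss) < change_threshold:
            return weights, bias, avg_loss, iteration
        prev_avg_loss = avg_loss
        # Gradient-descent update of every parameter.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias, prev_avg_loss, max_iters
```

For example, `center_crop(Box(0, 0, 200, 100), 64, 64)` returns `Box(x0=68, y0=18, x1=132, y1=82)`: both dimensions are clamped to 64 pixels while the center point (100, 50) is preserved. A box already within the limit is returned unchanged, which mirrors the "responsive to determining that the resolution ... exceeds" condition. Claim 2's multi-resolution variant could be approximated by adding several rescaled copies of each training image to `features` so that the average loss spans all of them.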

Description

Neural network for detecting objects in images

This application is a divisional of Chinese patent application No. 201780067267.4, filed on November 1, 2017, with a priority date of November 1, 2016, which entered the Chinese national phase on April 29, 2019, and is entitled "Neural network for detecting objects in images."

Related Applications

The present application claims priority from U.S. Patent Application Serial No. 15/340,675, filed on November 1, 2016, which is incorporated herein by reference in its entirety.

Technical Field

Embodiments of the present disclosure generally relate to automatic processing of images. More specifically, but not by way of limitation, the present disclosure proposes systems and methods for detecting and identifying objects within a set of images.

Background

Telecommunication applications and devices may use various media, such as text, images, sound recordings, and/or video recordings, to provide communications between multiple users. For example, video conferencing allows two or more individuals to communicate with each other using a combination of software applications, telecommunications devices, and telecommunications networks. Telecommunication devices may also record video streams for transmission as messages across a telecommunication network.

Current object detection processes typically take a two-step approach: a classification model is first trained for image-level prediction on weakly labeled classification data, without using bounding boxes, and the trained classification model is then used to classify the image while taking localization into account. However, these processes often make sub-optimal use of model parameters, and knowledge transfer is difficult because of mismatches between the classification task and the localization problem.
Drawings

Each of the figures merely illustrates an example embodiment of the present disclosure and should not be taken to limit its scope.
FIG. 1 is a block diagram illustrating a networked system, according to some example embodiments.
FIG. 2 is a diagram illustrating an object detection system, according to some example embodiments.
FIG. 3 is a flowchart illustrating an example method for detecting and identifying objects within an image, according to some example embodiments.
FIG. 4 is a flowchart illustrating an example method for detecting and identifying objects within an image, according to some example embodiments.
FIG. 5 is a flowchart illustrating an example method for detecting and identifying objects within an image, according to some example embodiments.
FIG. 6 is a flowchart illustrating an example method for detecting and identifying objects within an image, according to some example embodiments.
FIG. 7 is a flowchart illustrating an example method for detecting and identifying objects within an image, according to some example embodiments.
FIG. 8 is a flowchart illustrating an example method for detecting and identifying objects within an image, according to some example embodiments.
FIG. 9 is a user interface diagram depicting an example mobile device and mobile operating system interface, according to some example embodiments.
FIG. 10 is a block diagram illustrating an example of a software architecture that may be installed on a machine, according to some example embodiments.
FIG. 11 is a block diagram presenting a graphical representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any of the methods discussed herein, according to an example embodiment.

The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the terms used.
Detailed Description

The following description includes systems, methods, techniques, sequences of instructions, and computing machine program products that illustrate embodiments of the present disclosure. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the inventive subject matter. It will be apparent, however, to one skilled in the art that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques have not been shown in detail.

A general object detection system classifies the objects present in an input image using classification labels and localizes them within that image. Object detection performance is evaluated using mean average precision (mAP), a quality measure that takes both classification and localization into account. Thus, there remains a need in the art for improved identification, modeling, interpretation, and recognition of objects within an image without or