US-12626488-B2 - Post-processing unit for neural processing unit
Abstract
According to one example of the present disclosure, a post-processing unit may be provided. The post-processing unit may be implemented in register transfer level (RTL) code and designed to interface with a neural processing unit (NPU) configured for object detection computations of a neural network model. The post-processing unit may include a processing unit configured to filter a plurality of bounding boxes transmitted from the NPU and output only those that satisfy a particular condition and one or more input registers configured to store data output from the processing unit.
Inventors
- Ho Chul Lee
Assignees
- DEEPX CO., LTD.
Dates
- Publication Date
- 20260512
- Application Date
- 20250204
- Priority Date
- 20240523
Claims (20)
- 1 . A post-processing circuit, comprising: an access circuit configured to communicate with a neural processing unit (NPU) via a bus, the NPU processing: in a first period, a first image or a first portion of the image to generate a first plurality of bounding boxes, and in a second period subsequent to the first period, a second image or a second portion of the image to generate a second plurality of bounding boxes; a processing circuit configured to: receive the first plurality of bounding boxes and filter the first plurality of bounding boxes in the first period to selectively output a subset of the first plurality of bounding boxes that satisfy a predetermined condition as data output corresponding to the first image or the first portion of the image, receive the second plurality of bounding boxes and filter the second plurality of bounding boxes in the second period to selectively output a subset of the second plurality of bounding boxes that satisfy the predetermined condition as data output corresponding to the second image or the second portion of the image, and perform hypothesis suppression on the data output corresponding to the first image or the first portion of the image in the second period; and memory configured to store the data output from the processing circuit.
- 2 . The post-processing circuit of claim 1 , wherein the data output comprises, for each of the first bounding boxes, class scores indicative of probability that classes of objects being present in each of the first bounding boxes.
- 3 . The post-processing circuit of claim 2 , further comprising a first computation circuit configured to select one or more classes for each of the first bounding boxes as the subset of the data output by comparing class scores of classes for each of the first bounding boxes.
- 4 . The post-processing circuit of claim 3 , further comprising: a second computation circuit configured to extract one or more of the first bounding boxes by comparing a class confidence score of each of the first bounding boxes with a threshold confidence score, the class confidence score representing probability that an object of a class is present in each of the first bounding boxes and derived from an object presence confidence score and the class scores, the object presence confidence score included in the data output and indicative of probability that an object is present in each of the first bounding boxes.
- 5 . The post-processing circuit of claim 4 , wherein the second computation circuit is configured to determine the class confidence score as a product of the object presence confidence score and a class score with the subset of classes extracted by the first computation circuit.
- 6 . The post-processing circuit of claim 3 , wherein the memory is further configured to store the subset of classes for each of the bounding boxes extracted by the first computation circuit.
- 7 . The post-processing circuit of claim 6 , wherein the memory comprises: a plurality of memory registers; and an address generation logic for accessing the plurality of memory registers.
- 8 . The post-processing circuit of claim 4 , wherein the processing circuit is configured to perform a non-maximum suppression (NMS) operation to perform the hypothesis suppression on the one or more bounding boxes extracted by the second computation circuit to remove redundant or overlapping bounding boxes of the plurality of first bounding boxes.
- 9 . A system comprising: a bus; a neural processing unit (NPU) coupled to the bus and configured to: perform at least multiply and accumulate operations on a first input data to generate a plurality of first bounding boxes in a first period, and perform at least multiple and accumulate operations on a second input data to generate a plurality of second bounding boxes in a second period subsequent to the first period; and a post-processing circuit coupled to the bus and comprising: an access circuit configured to communicate with the NPU via the bus to receive the first plurality of bounding boxes and the second plurality of bounding boxes, a processing circuit configured to: receive the first plurality of bounding boxes and filter the first plurality of bounding boxes in the first period to selectively output a subset of the first bounding boxes that satisfy a predetermined condition as data output corresponding to the first input data, receive the second plurality of bounding boxes and filter the second plurality of bounding boxes in the second period to selectively output a subset of the second plurality of bounding boxes that satisfy the predetermined condition as data output corresponding to the second input data, perform hypothesis suppression on the data output corresponding to the first input data in the second period; and memory configured to store the data output from the processing circuit.
- 10 . The system of claim 9 , wherein the data output comprises, for each of the first bounding boxes, class scores indicative of probability that classes of objects being present in each of the first bounding boxes.
- 11 . The system of claim 10 , wherein the post-processing circuit comprises a first computation circuit configured to select one or more classes for each of the first bounding boxes as the subset of the data output by comparing class scores of classes for each of the first bounding boxes.
- 12 . The system of claim 11 , wherein the post-processing circuit further comprises: a second computation circuit configured to extract one or more of the first bounding boxes by comparing a class confidence score of each of the first bounding boxes with a threshold confidence score, the class confidence score representing probability that an object of a class is present in each of the first bounding boxes and derived from an object presence confidence score and the class scores, the object presence confidence score included in the data output and indicative of probability that an object is present in each of the first bounding boxes.
- 13 . The system of claim 12 , wherein the second computation circuit is configured to determine the class confidence score as a product of the object presence confidence score and a class score with the subset of classes extracted by the first computation circuit.
- 14 . The system of claim 11 , wherein the memory is further configured to store the subset of classes for each of the bounding boxes extracted by the first computation circuit.
- 15 . The system of claim 14 , wherein the memory comprises: a plurality of memory registers; and an address generation logic for accessing the plurality of memory registers.
- 16 . The system of claim 12 , wherein the processing circuit is configured to perform a non-maximum suppression (NMS) operation to perform the hypothesis suppression on the one or more bounding boxes extracted by the second computation circuit to remove redundant or overlapping bounding boxes of the plurality of first bounding boxes.
- 17 . A method for performing operations associated with a neural network model, the method comprising: performing, by a neural processing unit (NPU), at least multiply and accumulate operations on a first input data to generate a plurality of first bounding boxes in a first period; sending the plurality of first bounding boxes from the NPU to a post-processing circuit via a bus in the first period; filtering the plurality of first bounding boxes and selectively outputting a subset of the first bounding boxes that satisfy a predetermined condition as data output corresponding to the first input data by the post-processing circuit; storing the data output corresponding to the first input data in a memory of the post-processing circuit; performing, by the NPU, at least multiply and accumulate operations on a second input data to generate a plurality of second bounding boxes in a second period subsequent to the first period; filtering the plurality of second bounding boxes and selectively outputting a subset of the second bounding boxes that satisfy the predetermined condition as data output corresponding to the second input data by the post-processing circuit; storing the data output corresponding to the second input data in a memory of the post-processing circuit; and performing hypothesis suppression on the data output corresponding to the first input data in the second period.
- 18 . The method of claim 17 , wherein the data output comprises, for each of the first bounding boxes, class scores indicative of probability that classes of objects being present in each of the first bounding boxes.
- 19 . The method of claim 18 , wherein filtering the plurality of first bounding boxes and selectively outputting comprises: selecting one or more classes for each of the first bounding boxes as the subset of the data output by comparing class scores of classes for each of the first bounding boxes; and extracting one or more bounding boxes by comparing a class confidence score of each of the first bounding boxes with a threshold confidence score, the class confidence score representing probability that an object of a class is present in each of the first bounding boxes and derived from an object presence confidence score and the class scores, the object presence confidence score included in the data output and indicative of probability that an object is present in each of the first bounding boxes.
- 20 . The method of claim 19 , further comprising determining the class confidence score as a product of the object presence confidence score and a class score with the subset of classes.
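The data flow recited in the claims can be illustrated in software. The following is a minimal sketch, not the claimed circuit: it selects the highest-scoring class per box, forms the class confidence score as the product of the object presence confidence score and that class score, compares it with a threshold, and then runs non-maximum suppression on the output of the previous period while the current period's boxes are being filtered. All names (`Box`, `Candidate`, `filter_boxes`, `nms`, `run_pipeline`), the corner-coordinate box layout, the `>=` comparison, the per-class greedy NMS, and the IoU threshold are assumptions made for illustration; the claims do not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Box:
    # Corner coordinates are an assumed layout, not taken from the claims.
    x1: float
    y1: float
    x2: float
    y2: float
    objectness: float              # object presence confidence score
    class_scores: Sequence[float]  # one score per class


@dataclass
class Candidate:
    box: Box
    class_id: int
    confidence: float              # objectness * selected class score


def filter_boxes(boxes: Sequence[Box], threshold: float) -> List[Candidate]:
    """Keep boxes whose class confidence score meets the threshold."""
    kept = []
    for b in boxes:
        # Class selection: compare class scores and keep the largest.
        class_id = max(range(len(b.class_scores)), key=b.class_scores.__getitem__)
        # Class confidence score as a product (claims 5, 13, 20).
        confidence = b.objectness * b.class_scores[class_id]
        if confidence >= threshold:  # ">=" is an assumption
            kept.append(Candidate(b, class_id, confidence))
    return kept


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def nms(cands: Sequence[Candidate], iou_threshold: float) -> List[Candidate]:
    """Greedy per-class NMS: drop a box that overlaps a stronger same-class box."""
    survivors: List[Candidate] = []
    for c in sorted(cands, key=lambda c: c.confidence, reverse=True):
        if all(c.class_id != s.class_id or iou(c.box, s.box) <= iou_threshold
               for s in survivors):
            survivors.append(c)
    return survivors


def run_pipeline(frames: Sequence[Sequence[Box]], threshold: float = 0.5,
                 iou_threshold: float = 0.5) -> List[List[Candidate]]:
    """In period t, filter frame t and suppress the stored output of frame t-1."""
    results: List[List[Candidate]] = []
    pending: Optional[List[Candidate]] = None  # stored output of the previous period
    for boxes in frames:
        filtered = filter_boxes(boxes, threshold)
        if pending is not None:
            results.append(nms(pending, iou_threshold))
        pending = filtered
    if pending is not None:  # drain the last period
        results.append(nms(pending, iou_threshold))
    return results
```

In this sketch, `pending` plays the role of the memory that holds the filtered data output between periods. In hardware the two stages would run concurrently, whereas the loop above only models their ordering.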
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Republic of Korea Patent Application No. 10-2024-0100508, filed on Jul. 29, 2024, and Republic of Korea Patent Application No. 10-2024-0067309, filed on May 23, 2024, which are incorporated by reference in their entirety.
BACKGROUND OF THE DISCLOSURE
Humans have the intelligence to recognize, classify, infer, predict, control, make decisions, and the like. Artificial intelligence (AI) is the artificial imitation of human intelligence. The human brain is made up of a very large number of nerve cells called neurons. Each neuron is connected to hundreds to thousands of other neurons through connections called synapses. In order to mimic human intelligence, the operation of biological neurons and the connections between neurons are modeled in a neural network (NN) model. In other words, a neural network is a system of nodes connected in a layer structure that mimics neurons.
SUMMARY OF THE DISCLOSURE
Embodiments relate to a post-processing circuit that is separate from a neural processing unit (NPU). The post-processing circuit includes an access circuit, a processing circuit and memory. The access circuit enables the post-processing circuit to communicate with the NPU via a bus. The processing circuit filters a plurality of bounding boxes received from the NPU and selectively outputs a subset of the bounding boxes that satisfy a predetermined condition as data output. The memory stores the data output from the processing circuit.
In one or more embodiments, the data output includes, for each of the bounding boxes in a region of an image, class scores indicative of the probability that objects of the respective classes are present in that bounding box.
In one or more embodiments, the post-processing circuit further includes a first computation circuit configured to select one or more classes for each bounding box as the subset of the data output by comparing class scores of classes for each bounding box.
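As an illustration of selecting "one or more classes" by comparing class scores, the sketch below ranks the scores of a single bounding box and keeps the top `k`. The function name `select_top_classes` and the parameter `k` are hypothetical. The disclosure does not state how many classes are kept or how ties are resolved, so both choices here are assumptions.

```python
from typing import List, Sequence, Tuple


def select_top_classes(class_scores: Sequence[float], k: int = 1) -> List[Tuple[int, float]]:
    """Return up to k (class_id, score) pairs with the highest class scores.

    Ties keep the lower class index first because sorted() is stable
    (an assumed tie-break rule, not taken from the disclosure).
    """
    ranked = sorted(enumerate(class_scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Example: three class scores for one bounding box.
best = select_top_classes([0.1, 0.7, 0.2], k=1)
top_two = select_top_classes([0.1, 0.7, 0.2], k=2)
```

With `k=1` this reduces to a running-maximum comparison, which needs only one comparator and one register per box when the scores arrive one at a time.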
In one or more embodiments, the post-processing circuit further includes a second computation circuit configured to extract one or more bounding boxes by comparing a class confidence score of each bounding box with a threshold confidence score. The class confidence score represents the probability that an object of a class is present in each bounding box and is derived from an object presence confidence score and the class scores. The object presence confidence score is included in the data output and is indicative of the probability that an object is present in each bounding box.
In one or more embodiments, the second computation circuit is configured to determine the class confidence score as a product of the object presence confidence score and a class score of a class in the subset of classes extracted by the first computation circuit.
In one or more embodiments, the memory further stores the subset of classes for each bounding box extracted by the first computation circuit.
In one or more embodiments, the memory includes a plurality of memory registers, and an address generation logic for accessing the plurality of memory registers.
In one or more embodiments, the processing circuit is configured to perform a non-maximum suppression (NMS) operation on the one or more bounding boxes extracted by the second computation circuit to remove redundant or overlapping bounding boxes of the plurality of bounding boxes.
Embodiments also relate to a system including a bus, a neural processing unit (NPU) and a post-processing circuit. The NPU is coupled to the bus and configured to perform at least multiply and accumulate operations on an input data to generate a plurality of bounding boxes.
The post-processing circuit is coupled to the bus and includes an access circuit configured to communicate with the NPU via the bus to receive the plurality of bounding boxes, and a processing circuit that filters the plurality of bounding boxes and selectively outputs a subset of the bounding boxes that satisfy a predetermined condition as data output. The post-processing circuit includes memory that stores the data output from the processing circuit.
Embodiments also relate to a method for performing operations associated with a neural network model. A neural processing unit (NPU) coupled to a bus performs at least multiply and accumulate operations on an input data to generate a plurality of bounding boxes. The plurality of bounding boxes are sent from the NPU to a post-processing circuit via the bus. The plurality of bounding boxes are filtered and a subset of the bounding boxes that satisfy a predetermined condition is selectively output as data output by the post-processing circuit. The data output is stored in an internal memory of the post-processing circuit.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a block diagram illustrating a system where a post-processing unit is implemented separately from a neural processing unit, in accordance with an example of the present disclosure.
FIG. 1B is a block diagram i