US-12627821-B2 - Scalable video coding for machine


Abstract

A neural processing unit (NPU) for decoding a video or a feature map is provided. The NPU may include at least one processing element (PE) for an artificial neural network, the at least one PE being configured to receive and decode data included in a bitstream. The data included in the bitstream may include data of a base layer, or the data of the base layer and data of at least one enhancement layer. An NPU for encoding a video or a feature map is also provided. The encoder NPU may include at least one PE for an artificial neural network, the at least one PE being configured to receive and encode a transmitted video or feature map, wherein the at least one PE may be further configured to output a bitstream including data of a base layer and data of at least one enhancement layer.
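The layered bitstream described in the abstract can be illustrated with a minimal sketch. The byte layout used here (a one-byte layer count, then a one-byte layer index and four-byte length before each payload) is an assumption made only for illustration and is not taken from the disclosure; the names `LayeredFrame`, `pack_frame`, and `unpack_frame` are likewise hypothetical. The sketch writes the base layer first and the enhancement layers in ascending index order, as recited in claims 11 and 18.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LayeredFrame:
    """One frame: a base layer plus zero or more enhancement layers (hypothetical container)."""
    base: bytes                                                   # encoded first feature map
    enhancements: Dict[int, bytes] = field(default_factory=dict)  # layer index (>= 1) -> payload


def pack_frame(frame: LayeredFrame) -> bytes:
    """Serialize the base layer (index 0) followed by enhancement layers in ascending index order."""
    if any(index < 1 for index in frame.enhancements):
        raise ValueError("enhancement layer indexes must start at 1")
    layers = [(0, frame.base)] + sorted(frame.enhancements.items())
    out = bytearray(struct.pack(">B", len(layers)))      # assumed header: number of layers
    for index, payload in layers:
        out += struct.pack(">BI", index, len(payload))   # assumed per-layer header: index, length
        out += payload
    return bytes(out)


def unpack_frame(data: bytes) -> LayeredFrame:
    """Parse a frame produced by pack_frame back into its layers."""
    (count,) = struct.unpack_from(">B", data, 0)
    offset = 1
    layers: Dict[int, bytes] = {}
    for _ in range(count):
        index, length = struct.unpack_from(">BI", data, offset)
        offset += 5                                      # size of the ">BI" per-layer header
        layers[index] = data[offset:offset + length]
        offset += length
    return LayeredFrame(base=layers.pop(0), enhancements=layers)
```

A decoder that needs only the first machine analysis task could stop parsing after the base layer, since every later layer is optional in this layout.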

Inventors

IlMyeong Im
Sunmi LEE

Assignees

DEEPX CO., LTD.

Dates

Publication Date: 2026-05-12
Application Date: 2023-10-13
Priority Date: 2022-06-29

Claims (20)

1. A neural processing unit (NPU) for processing feature maps, the NPU comprising: a first circuitry arranged as at least one processing element (PE) for performing operations of an artificial neural network, by using data included in a received bitstream, wherein the data included in the received bitstream comprises: data of a base layer; or the data of the base layer and data of at least one enhancement layer, wherein the data of the base layer includes a first feature map which is applied to a first machine analysis task, wherein the data of the at least one enhancement layer includes a second feature map which is applied to a second machine analysis task, wherein the first machine analysis task utilizes only the first feature map, wherein the second machine analysis task utilizes both of the first feature map and the second feature map, wherein the first machine analysis task utilizing only the first feature map is different from the second machine analysis task utilizing both of the first feature map and the second feature map, and wherein the received bitstream is generated by and transmitted from another NPU.
2. The NPU of claim 1, wherein at least a portion of the at least one enhancement layer of the received bitstream is configured to be selectively processed.
3. The NPU of claim 1, wherein at least a portion of the at least one enhancement layer is configured to be selectively processed according to an available bandwidth of a transmission channel of the received bitstream or according to a preset machine analysis task.
4. The NPU of claim 1, wherein the first and second feature maps in the received bitstream have been extracted at any intermediate layer of an artificial neural network model.
5. The NPU of claim 1, wherein an available bandwidth of a transmission channel of the received bitstream is configured to be detected.
6. The NPU of claim 1, wherein the at least one PE is configured to selectively process at least a portion of the at least one enhancement layer according to a preset machine analysis task.
7. The NPU of claim 1, wherein the at least one PE is configured to process the base layer and a first enhancement layer according to the first machine analysis task.
8. The NPU of claim 1, wherein the at least one PE is configured to process the base layer, a first enhancement layer, and a second enhancement layer according to the second machine analysis task.
9. The NPU of claim 1, wherein a number of the at least one enhancement layer included in one frame is varied according to a condition of a transmission channel.
10. The NPU of claim 1, wherein a number of the at least one enhancement layer included in one frame is determined according to a condition of a transmission channel and feedback information on the determined number is transmitted to an encoder.
11. The NPU of claim 1, wherein the at least one enhancement layer is included in one frame in ascending order according to indexes of layers of the at least one enhancement layer.
12. A neural processing unit (NPU) for encoding feature maps, the NPU comprising: a first circuitry arranged as at least one processing element (PE) for performing operations of an artificial neural network thereby outputting one or more feature maps, wherein the one or more feature maps are packed into a bitstream, wherein the bitstream includes data of a base layer; or the data of the base layer and data of at least one enhancement layer, wherein the data of the base layer includes a first feature map which is applied to a first machine analysis task, wherein the data of the at least one enhancement layer includes a second feature map which is applied to a second machine analysis task, wherein the first machine analysis task utilizes only the first feature map, wherein the second machine analysis task utilizes both of the first feature map and the second feature map, wherein the first machine analysis task utilizing only the first feature map is different from the second machine analysis task utilizing both of the first feature map and the second feature map, and wherein the bitstream is transmitted from the NPU to another NPU.
13. The NPU of claim 12, wherein a number of the at least one enhancement layer of the bitstream is adjusted according to an available bandwidth of a transmission channel.
14. The NPU of claim 12, wherein a number of the at least one enhancement layer of the bitstream is adjusted for at least one frame interval.
15. The NPU of claim 12, wherein the first and second feature maps in the bitstream are extracted at any intermediate layer of an artificial neural network model.
16. The NPU of claim 12, wherein a number of the at least one enhancement layer included in one frame is varied according to a condition of a transmission channel.
17. The NPU of claim 12, wherein the NPU is configured to receive feedback on a number of at least one enhancement layer included in one frame from a decoder.
18. The NPU of claim 12, wherein the at least one enhancement layer is included in one frame in ascending order according to indexes of layers of the at least one enhancement layer.
19. A VCM decoder for processing feature maps, the VCM decoder comprising: a first circuitry arranged as at least one processing element (PE) for performing operations of an artificial neural network, by using data included in a received bitstream, wherein the data included in the received bitstream comprises: data of a base layer, or the data of the base layer and data of at least one enhancement layer, wherein the data of the base layer includes a first feature map which is applied to a first machine analysis task, wherein the data of the at least one enhancement layer includes a second feature map which is applied to a second machine analysis task, wherein the first machine analysis task utilizes only the first feature map, wherein the second machine analysis task utilizes both of the first feature map and the second feature map, wherein the first machine analysis task utilizing only the first feature map is different from the second machine analysis task utilizing both of the first feature map and the second feature map, and wherein the received bitstream is generated by and transmitted from another NPU.
20. The VCM decoder of claim 19, wherein the first and second feature maps in the received bitstream have been extracted at any intermediate layer of an artificial neural network model.
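The bandwidth-dependent and task-dependent selection of enhancement layers recited in claims 3, 9, 10, 13, and 17 can be sketched as follows. This is only one possible reading under stated assumptions, not the claimed implementation: the function name `select_enhancement_count`, the per-frame bit budget derived from bandwidth and frame rate, and the greedy stopping rule are all hypothetical. The sketch assumes the base layer is always sent and that enhancement layers are considered in ascending index order.

```python
from typing import Sequence


def select_enhancement_count(
    layer_bits: Sequence[int],
    bandwidth_bps: float,
    frames_per_second: float,
    task_max_layers: int,
) -> int:
    """Return how many enhancement layers to include in one frame.

    layer_bits[0] is the size of the base layer in bits; layer_bits[1:] are the
    enhancement layers in ascending index order. task_max_layers is the highest
    number of enhancement layers the preset machine analysis task can use
    (0 for a task that needs only the base layer). All of these parameters are
    assumptions introduced for this sketch.
    """
    budget = bandwidth_bps / frames_per_second   # assumed per-frame bit budget
    used = layer_bits[0]                         # base layer is always transmitted
    count = 0
    # Add enhancement layers in ascending order until the budget or the task limit is reached.
    for bits in layer_bits[1:1 + task_max_layers]:
        if used + bits > budget:
            break
        used += bits
        count += 1
    return count
```

The returned count is the kind of value a decoder could report back to the encoder as feedback, so that the encoder adjusts the number of enhancement layers it packs into subsequent frames.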

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Utility patent application Ser. No. 17/898,234 filed on Aug. 29, 2022, which claims the priority of Korean Patent Application No. 10-2022-0079620 filed on Jun. 29, 2022, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.

BACKGROUND OF THE DISCLOSURE

Technical Field

The present disclosure relates to scalable video coding for a machine.

Background Art

Continuous development of the information and communication industry has led to a worldwide spread of broadcasting services having a high definition (HD) resolution. As a result, users of such services have become accustomed to high-resolution and high-definition images and/or videos, and demand has increased for high picture quality, that is, high-resolution, high-quality video such as ultra high definition (UHD) video.

Standardization of coding technology for UHD (4K, 8K, or higher) video data was completed in 2013 through high efficiency video coding (HEVC). HEVC is a next-generation video compression technology that has a higher compression rate and lower complexity than the previous H.264/AVC technology. HEVC is a key technology for effectively compressing the massive amounts of data of HD and UHD video content.

HEVC performs block-based encoding like previous compression standards. Unlike H.264/AVC, however, only one profile exists. That single HEVC profile includes a total of eight core encoding technologies, including technologies for hierarchical coding structure, transformation, quantization, intra prediction coding, inter picture motion prediction, entropy coding, loop filtering, and others.

Since adoption of the HEVC video codec in 2013, immersive video and virtual reality services using 4K and 8K video images have expanded, and a versatile video coding (VVC) standard has been developed.
VVC, also called H.266, is a next-generation video codec developed with the goal of more than twice the efficiency of the previous-generation codec, H.265 (HEVC). VVC was initially developed with 4K or higher resolution in mind, but it was also developed for 16K-level ultra-high-resolution image processing in order to handle 360-degree images as the VR market expands. In addition, as the HDR market gradually expands with the development of display technology, VVC supports not only 10-bit color depth but also 16-bit color depth, and supports brightness levels of 1000 nits, 4000 nits, and 10000 nits. Because it is being developed with the VR and 360-degree video markets in mind, it also supports variable frame rates ranging from 0 to 120 FPS.

Advancement of Artificial Intelligence

Artificial intelligence (AI) is also developing rapidly. AI refers to artificially imitating human intelligence, that is, intelligence capable of performing recognition, classification, inference, prediction, and control/decision making.

Due to the development of artificial intelligence technology and the increase in Internet of Things (IoT) devices, it is predicted that traffic between machines will explode and that image analysis performed by machines will be widely used.

SUMMARY OF THE DISCLOSURE

The inventors of the present disclosure have recognized the problem that a technique for image analysis by a machine has not yet been developed.

Accordingly, an object of the present disclosure is to provide a neural processing unit (NPU) for effectively performing image analysis by a machine.

A neural processing unit (NPU) according to an example of the present disclosure may be an NPU for decoding a video or a feature map. The NPU may include at least one processing element (PE) for an artificial neural network.
A bitstream received by the at least one PE may include base layer data alone, or base layer data together with data of at least one enhancement layer. The base layer data included in the received bitstream may be configured to be decoded by the at least one PE. Alternatively, the base layer data and the at least one enhancement layer data included in the received bitstream may be configured to be decoded by the at least one PE. At least a portion of the at least one enhancement layer of the received bitstream may be configured to be selectively processed. At least a portion of the at least one enhancement layer may be configured to be selectively processed according to an available bandwidth of a transmission channel of the received bitstream. At least a portion of the at least one enhancement layer may be configured to be selectively processed according to a preset machine analysis task. An available bandwidth of a transmission channel of the received bitstream may be configured to be detected. The at least one PE may be configured to selectively p