
CN-121986484-A - Method, apparatus and system for encoding and decoding tensors

CN121986484A

Abstract

Systems and methods for decoding a tensor from a bitstream for use by a second part of a network, the second part being a second part of a neural network. The method includes determining a network first part topology indication from the bitstream and determining a network first part weight indication from the bitstream. The method further includes determining, based on the network first part topology indication and the network first part weight indication, whether to decode the tensor from the bitstream for use by the network second part. In the event that it is determined to decode the tensor from the bitstream, the method includes decoding the tensor from the bitstream for the network second part.
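The decision described in the abstract can be sketched as follows. This is a minimal illustration only: the bitstream layout, field names, and function name are assumptions made for the sketch, not structures defined by the patent.

```python
def maybe_decode_tensor(bitstream, local_topology_indication, local_weight_indication):
    """Decode the tensor for the network second part only when the first
    part indications carried in the bitstream match the locally held network."""
    # Determine the network first part topology and weight indications
    # (here the bitstream is modelled as a plain dictionary).
    topology = bitstream["topology_indication"]
    weights = bitstream["weight_indication"]
    # Decode the tensor only when both indications match the local first part.
    if topology == local_topology_indication and weights == local_weight_indication:
        return bitstream["tensor"]
    return None  # mismatch: the second part cannot consume this tensor
```

On a mismatch the decoder skips tensor decoding entirely, which is the point of carrying the indications in the bitstream.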

Inventors

  • Christopher James Rosewarne

Assignees

  • Canon Kabushiki Kaisha

Dates

Publication Date
2026-05-05
Application Date
2024-08-28
Priority Date
2023-10-10

Claims (13)

  1. A method for decoding a tensor from a bitstream for use by a second part of a network, the method comprising: determining a network first part topology indication from the bitstream; determining a network first part weight indication from the bitstream; determining whether to decode a tensor from the bitstream based on the network first part topology indication and the network first part weight indication; and in the event that it is determined to decode a tensor from the bitstream, decoding the tensor from the bitstream for the network second part.
  2. A method for generating a neural network result for content from a bitstream, the method comprising: determining a network first part topology indication from the bitstream; determining a network first part weight indication from the bitstream; determining whether to execute the network second part based on the network first part topology indication and the network first part weight indication; and in the event that it is determined to execute the network second part, executing the network second part using a tensor decoded from the bitstream to produce the neural network result.
  3. The method of claim 1, wherein determining whether to decode a tensor from the bitstream is based on availability of a plurality of network second parts associated with the network first part, wherein one of the plurality of network second parts is selected for execution.
  4. The method of claim 2, wherein determining whether to execute the network second part is based on availability of a plurality of network second parts associated with the network first part, wherein one of the plurality of network second parts is selected for execution.
  5. The method of claim 1 or 2, wherein the network first part topology indication is a hash of a filtered ONNX network representation, the filtering preserving graph properties.
  6. The method of claim 1 or 2, wherein the network first part weight indication is a hash of a filtered ONNX network representation, the filtering preserving weight properties.
  7. The method of claim 5, wherein the hash is one of a sha256 hash, a sha1sum hash, a md5sum hash, and a crc32 hash of the filtered ONNX network representation.
  8. The method of claim 1 or 2, wherein the network first part weight indication is a hash of an MPEG NNC representation of the network weights.
  9. The method of claim 1 or 2, wherein an ONNX version is encoded in the bitstream and an association between the first and second network parts for each ONNX version is available.
  10. A decoder for decoding a tensor from a bitstream for use by a second part of a network, the decoder being configured to: determine a network first part topology indication from the bitstream; determine a network first part weight indication from the bitstream; determine whether to decode a tensor from the bitstream based on the network first part topology indication and the network first part weight indication; and in the event that it is determined to decode a tensor from the bitstream, decode the tensor from the bitstream for the network second part.
  11. A non-transitory computer readable storage medium storing a program for performing a method for decoding a tensor from a bitstream for use by a second part of a network, the method comprising: determining a network first part topology indication from the bitstream; determining a network first part weight indication from the bitstream; determining whether to decode a tensor from the bitstream based on the network first part topology indication and the network first part weight indication; and in the event that it is determined to decode a tensor from the bitstream, decoding the tensor from the bitstream for the network second part.
  12. A decoder for generating a neural network result for content from a bitstream, the decoder being configured to: determine a network first part topology indication from the bitstream; determine a network first part weight indication from the bitstream; determine whether to execute the network second part based on the network first part topology indication and the network first part weight indication; and in the event that it is determined to execute the network second part, execute the network second part using a tensor decoded from the bitstream to produce the neural network result.
  13. A non-transitory computer readable storage medium storing a program for performing a decoding method for generating a neural network result for content from a bitstream, the method comprising: determining a network first part topology indication from the bitstream; determining a network first part weight indication from the bitstream; determining whether to execute the network second part based on the network first part topology indication and the network first part weight indication; and in the event that it is determined to execute the network second part, executing the network second part using a tensor decoded from the bitstream to produce the neural network result.
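Claims 5 through 7 describe the topology and weight indications as hashes of filtered network representations. The idea can be sketched as follows, using plain Python dictionaries in place of a real ONNX graph; the data layout, field names, and filtering rules here are hypothetical simplifications, not the representation the patent specifies.

```python
import hashlib

def topology_hash(network):
    # Keep only graph properties (operator types and connections),
    # discarding weight values, then hash the filtered representation.
    filtered = repr([(node["op"], tuple(node["inputs"]))
                     for node in network["nodes"]])
    return hashlib.sha256(filtered.encode()).hexdigest()

def weight_hash(network):
    # Keep only weight values, discarding graph structure.
    filtered = repr(sorted(network["weights"].items()))
    return hashlib.sha256(filtered.encode()).hexdigest()

# Two networks with identical graphs but different weights: they share a
# topology hash while their weight hashes differ.
net_a = {"nodes": [{"op": "Conv", "inputs": ["x"]}],
         "weights": {"conv.w": (1.0, 2.0)}}
net_b = {"nodes": [{"op": "Conv", "inputs": ["x"]}],
         "weights": {"conv.w": (3.0, 4.0)}}
```

Separating the two indications in this way lets a decoder detect whether a mismatch lies in the graph structure or only in the trained weights.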

Description

Method, apparatus and system for encoding and decoding tensors

Citation of related application

The present application claims the benefit under 35 U.S.C. §119 of the filing date of Australian patent application 2023248076, filed on 10 October 2023, which is incorporated herein by reference in its entirety as if fully set forth herein.

Technical Field

The present invention relates generally to digital video signal processing and, in particular, to methods, apparatus and systems for encoding and decoding tensors from convolutional neural networks. The invention also relates to a computer program product comprising a computer readable medium having recorded thereon a computer program for encoding and decoding tensors from a convolutional neural network using video compression techniques.

Background

Convolutional neural networks (CNNs) are an emerging technology for use cases involving machine vision, such as object detection, instance segmentation, object tracking, human pose estimation, and action recognition. Applications of CNNs may involve "edge devices" with sensors and some processing power, coupled to an application server forming part of a "cloud". CNNs may require relatively high computational complexity, exceeding what edge devices can typically provide in terms of computational capacity or power consumption. Executing a CNN in a distributed manner has become one solution for running the front-end of a network on limited-capability edge devices without requiring all computational complexity to be borne by the cloud server, while the edge devices may otherwise have underutilized inference resources. In other words, distributed processing allows legacy edge devices to still provide the capability of a leading-edge CNN by distributing processing between the edge devices and other processing components such as cloud servers.
Such a distributed network architecture may be referred to as "collaborative intelligence" (CI), and provides benefits such as reusing partial results from a first part of the network for several different second parts (possibly with each second part optimized for a different task). The CI architecture introduces a need for efficient compression of tensor data for transmission over a network such as a WAN. CNNs typically comprise many layers, such as convolutional layers and fully-connected layers, with data passed from one layer to the next in the form of a "tensor". Splitting the network across different devices introduces a need to compress the intermediate multidimensional tensor data passing from one layer to the next within the CNN, to facilitate transmission over a network with bandwidth limitations or costs. Such compression of tensors may be referred to as "feature compression", and the intermediate tensor data is often referred to as "features" or "feature maps". A feature map is typically a two-dimensional (2D) array of values; a collection of feature maps, combined into a 3D (or 4D) data structure, forms a tensor, with each feature map corresponding to one "channel" of the tensor. The intermediate tensor data represents a partially processed version of the input to the neural network, such as an image frame or video frame. The International Organization for Standardization / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 2 (ISO/IEC JTC1/SC29/WG2), also known as the "Moving Picture Experts Group" (MPEG) Technical Requirements group, is assigned the task of studying requirements for compression technology in various contexts, often associated with video. WG2 ("MPEG Requirements") has established the "Feature Compression for Video Coding for Machines" (FCVCM) ad-hoc group, which was commissioned to study feature compression.
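The channel structure described above can be illustrated with a toy tensor built from nested lists; the dimensions and values are arbitrary choices for illustration, not values from the patent.

```python
# A toy tensor laid out as channels x height x width (CHW).
channels, height, width = 3, 2, 2

# Each tensor[c] is one 2D feature map; stacking the feature maps along
# the channel axis forms the 3D tensor.
tensor = [[[c * 100 + r * width + col for col in range(width)]
           for r in range(height)]
          for c in range(channels)]

feature_map = tensor[1]  # one 2D feature map = one "channel" of the tensor
```

A 4D tensor, as mentioned above, simply adds a batch axis in front of the channel axis (NCHW).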
The FCVCM ad-hoc group has issued a "Call for Proposals" (CfP) soliciting responses to form the basis of a standardization project for feature compression. Previously, responses to a "Call for Evidence" (CfE) showed that feature compression techniques can achieve results significantly better than applying the most advanced standardized compression techniques directly to tensors. CNNs typically require the weights of the various layers to be predetermined in a training phase, in which a very large amount of training data is passed through the CNN and the results produced by the network being trained are compared to "ground truth" values associated with the training data. The difference between the obtained result and the desired result is termed the "loss" and is measured using a "loss function". Using the determined loss, a process for updating the network weights, such as stochastic gradient descent (SGD) or the like, is performed. Network weight updates typically involve a back-propagation of "gradients" that starts at the output layer of the network and works backwards, terminating when the input layer of the network is updated, the