US-12621471-B2 - Method to determine encoder parameters

US12621471B2

Abstract

A method performed by an apparatus to determine compression parameters to choose when compressing images or videos for use in a machine vision task is provided. The method includes compressing an uncompressed original image or video at a plurality of different quality levels and/or bit rates to create a plurality of compressed images or videos. The method further includes, for each compressed image or video compressed at the different quality levels and/or bit rates: decompressing the compressed image or video to create a decompressed image or video, executing a machine vision algorithm on the decompressed image or video to generate machine vision results for the decompressed image or video, and deriving a performance value indicating a performance of the decompressed image or video based on comparing the machine vision results to an assumed truth.

Inventors

  • Christopher Hollmann
  • Per Wennersten
  • Jacob Ström

Assignees

  • TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)

Dates

Publication Date
2026-05-05
Application Date
2022-04-08

Claims (12)

  1. A method performed by an apparatus to determine compression parameters to choose when compressing images or videos for use in a machine vision task, the method comprising: compressing an uncompressed original image or video at a plurality of different quality levels and/or bit rates to create a plurality of compressed images or videos; for each compressed image or video compressed at the different quality levels and/or bit rates: decompressing the compressed image or video to create a decompressed image or video; executing a machine vision algorithm on the decompressed image or video to generate machine vision results for the decompressed image or video; generating an assumed truth derived from the original uncompressed image or video; deriving a performance value indicating a performance of the decompressed image or video for the machine vision task based on comparing the machine vision results to the assumed truth derived from the original uncompressed image or video; and determining a quality level and/or bit rate to choose when compressing an image or video to use in the machine vision task based on the performance value for each decompressed image or video compressed at the different quality levels and/or bit rates; and wherein generating the assumed truth comprises executing the machine vision algorithm on the uncompressed original image or video to generate the assumed truth.
  2. The method of claim 1, wherein generating the assumed truth comprises: executing the machine vision algorithm on a compressed and decompressed version of the uncompressed original image or video to generate the assumed truth.
  3. The method of claim 1, wherein compressing the uncompressed original image or video at the plurality of different quality levels and/or bit rates comprises: changing a resolution of the uncompressed original image or video to form a plurality of uncompressed original images or videos, each having a different resolution.
  4. The method of claim 3, wherein changing the resolution comprises changing a spatial resolution and/or a temporal resolution.
  5. The method of claim 1, wherein compressing the uncompressed original image or video at the plurality of different quality levels and/or bit rates comprises: compressing the uncompressed original image or video using a different quantization parameter as the different quality levels.
  6. The method of claim 1, wherein compressing the uncompressed original image or video at the plurality of different quality levels and/or bit rates comprises: compressing the uncompressed original image or video using different quantization parameters for different parts of the uncompressed original image or video.
  7. The method of claim 1, wherein compressing the uncompressed original image or video at the plurality of different quality levels and/or bit rates comprises: compressing the uncompressed original image or video using different compression algorithms.
  8. The method of claim 1, further comprising determining the bit rate to use based on the number of bits needed to store or transmit the compressed image or video.
  9. The method of claim 1, further comprising executing a rate-distortion function to set a rate-distortion performance value of a compression-distortion tradeoff, and wherein determining the quality level and/or bit rate to use in the machine vision task comprises determining the quality level and/or bit rate to use in the machine vision task based further on the rate-distortion performance value.
  10. The method of claim 1, wherein the machine vision task is object detection and the machine vision results are a list of bounding boxes that describe each object, and wherein deriving the performance value comprises: comparing each bounding box in the machine vision results to bounding boxes in the assumed truth; responsive to an amount of overlap between a bounding box in the machine vision results and a bounding box in the assumed truth being above a designated value, increasing a number of correctly identified bounding boxes; and responsive to all bounding boxes in the machine vision results being compared, deriving the performance value.
  11. An apparatus adapted to perform according to claim 1.
  12. A computer program product comprising a non-transitory storage medium including program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform according to claim 1.
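As an illustration of the selection procedure recited in claim 1, the loop below sketches one possible realization. The helper names (`compress`, `decompress`, `run_mv`, `score`) are hypothetical stand-ins for the codec, the machine vision algorithm, and the performance comparison against the assumed truth; they are not part of the claims.

```python
def choose_quality_level(original, quality_levels, compress, decompress, run_mv, score):
    """Return the quality level whose decoded output performs best on the MV task.

    All callables are hypothetical placeholders: compress/decompress stand in
    for the codec, run_mv for the machine vision algorithm, and score for the
    comparison of MV results against the assumed truth.
    """
    # Assumed truth: the MV algorithm's results on the uncompressed original,
    # as in the wherein clause of claim 1.
    assumed_truth = run_mv(original)
    best_level, best_perf = None, float("-inf")
    for level in quality_levels:
        # Encode/decode round trip at this quality level.
        decoded = decompress(compress(original, level))
        # Performance value for this decompressed image or video.
        perf = score(run_mv(decoded), assumed_truth)
        if perf > best_perf:
            best_level, best_perf = level, perf
    return best_level
```

In practice the returned level would be traded off against the bit cost of each compressed representation, as in claims 8 and 9.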

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a 35 U.S.C. § 371 national stage application of PCT International Application No. PCT/SE2022/050355, filed on Apr. 8, 2022, which in turn claims domestic priority to U.S. Provisional Patent Application No. 63/175,220, filed on Apr. 15, 2021, the disclosures and content of which are incorporated by reference herein in their entirety.

TECHNICAL FIELD

The present disclosure relates generally to communications, and more particularly to encoding and decoding methods and related devices and nodes supporting encoding and decoding.

BACKGROUND

Many of the image and video compression standards developed over recent years have primarily been directed at human viewers. How a decoded video is perceived by a human viewer has thus been the focus of many efforts. However, due to the recent increase in computational capabilities and the rise of neural networks, another category of "viewers" for visual content has emerged: machines. Computers or algorithms analyze videos and make decisions based on what they see. There are many different machine vision (MV) tasks that machines can perform, such as object detection, tracking or segmentation, action recognition, pose estimation, event prediction and many more. One of the most common tasks a machine performs is object detection, where an algorithm (e.g., a neural network) searches for objects within a video or image and generally returns a bounding box indicating the position of the object and a label indicating what kind of object it is. A number of datasets are available for training and testing neural networks that perform MV tasks.
These datasets usually consist of several components:

  • A large training set, which can be used for training the algorithm
  • A small verification set, for which results with specified algorithms are published, so users can verify that their setup is working correctly
  • A test set, which is often used in challenges
  • The ground truth (GT), which contains the correct answers for the tasks that are evaluated

Taking object detection as an example, FIG. 1 shows a bounding box from the GT and a detected bounding box. Here the area of the intersection (solid line) is divided by the area of the union (dashed line) of the two bounding boxes. If this value, usually called the intersection over union (IoU), exceeds a predetermined threshold (e.g., 0.5), the bounding box is considered detected successfully. In the following paragraphs, the IoU is referred to as overlap.

In 2019, the International Organization for Standardization's Moving Picture Experts Group (MPEG) started an investigation into the area of video coding for machines. Here the goal is to either develop a new compression standard or find a different solution that can optimize video compression for MV tasks.

SUMMARY

Current methods to compress images or videos primarily use the same encoder parameters for entire datasets, with little to no variation. In most cases there is no consideration of the performance for certain machine vision tasks on a per-image or per-video basis.

According to a first aspect of the present disclosure, there is provided a method performed by an apparatus to determine compression parameters to choose when compressing images or videos for use in a machine vision task. The method comprises compressing an uncompressed original image or video at a plurality of different quality levels and/or bit rates to create a plurality of compressed images or videos.
The method further comprises, for each compressed image or video compressed at the different quality levels and/or bit rates: decompressing the compressed image or video to create a decompressed image or video, executing a machine vision algorithm on the decompressed image or video to generate machine vision results for the decompressed image or video, and deriving a performance value indicating a performance of the decompressed image or video based on comparing the machine vision results to an assumed truth. The method comprises determining a quality level and/or bit rate to use in the machine vision task based on the performance value for each decompressed image or video compressed at the different quality levels and/or bit rates.

According to a second aspect of the present disclosure, there is provided an apparatus adapted to perform the method according to the first aspect.

According to a third aspect of the present disclosure, there is provided a computer program comprising program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform the method according to the first aspect.

According to a fourth aspect of the present disclosure, there is provided a computer program product comprising a non-transitory storage medium including program code to be executed by processing circuitry of an apparatus, whereby execution of the program code causes the apparatus to perform
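The overlap (IoU) test described in the background, together with the counting procedure of claim 10, can be sketched as follows. Boxes are assumed here to be (x1, y1, x2, y2) tuples, and the exact performance formula (fraction of correctly identified result boxes) is an illustrative assumption, not mandated by the claims.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0, inter_w) * max(0, inter_h)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def detection_performance(results, truth, threshold=0.5):
    """Count result boxes whose overlap with some assumed-truth box exceeds
    the threshold, then derive a performance value as the fraction of
    correctly identified boxes (an assumed scoring rule for illustration)."""
    correct = sum(1 for box in results
                  if any(iou(box, t) > threshold for t in truth))
    return correct / len(results) if results else 0.0
```

A perfectly overlapping pair of boxes yields an IoU of 1.0, while disjoint boxes yield 0.0; the 0.5 default threshold matches the example given in the background section.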