US-12626334-B2 - Method and device with image processing
Abstract
An electronic device, including one or more processors configured to execute instructions; and a memory storing the instructions which, when executed by the one or more processors, configures the one or more processors to generate high-quality feature data of a current frame, by implementing a feature restoration model that is provided reference feature data, received by the electronic device, and corresponding to compressed feature data of a reference image corresponding to a first time that is different from a second time to which the current frame corresponds, and low-quality feature data received by the electronic device, and corresponding to compressed data of the current frame, that has a lower image quality than the reference image; and generate a current frame of a third image quality higher than the lower image quality, based on the high-quality feature data.
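As a rough illustration of the decoding flow the abstract describes, the sketch below stands in for the feature restoration model with a simple blend of reference and current-frame features. Every function here is a hypothetical placeholder, not the patent's actual (learned) model or decoder:

```python
import numpy as np

def feature_restoration_model(ref_feat, lq_feat):
    # Hypothetical stand-in for the learned model: combine reference
    # features (from the higher-quality reference image) with the
    # low-quality features of the current frame.
    return 0.5 * ref_feat + 0.5 * lq_feat

def decode_frame(hq_feat):
    # Hypothetical decoder: identity mapping, for illustration only.
    return hq_feat

ref_feat = np.ones((4, 4))        # features of the higher-quality reference image
lq_feat = np.full((4, 4), 0.2)    # features of the lower-quality current frame

# Restore high-quality features, then decode the current frame from them.
hq_feat = feature_restoration_model(ref_feat, lq_feat)
current_frame = decode_frame(hq_feat)
```

The point of the structure, not of the arithmetic: the current frame is reconstructed from features of *two* times, a reference time and the current time.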
Inventors
- Wonhee Lee
- Seungeon KIM
Assignees
- SAMSUNG ELECTRONICS CO., LTD.
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-05-16
- Priority Date: 2022-11-08
Claims (19)
- 1 . An electronic device, comprising: one or more processors configured to execute instructions; and a memory storing the instructions which, when executed by the one or more processors, configures the one or more processors to: generate high-quality feature data of a current frame, by implementing a feature restoration model that is provided reference feature data, received by the electronic device, and corresponding to compressed feature data of a reference image corresponding to a first time that is different from a second time to which the current frame corresponds, and low-quality feature data received by the electronic device, and corresponding to compressed data of the current frame, that has a lower image quality than the reference image; and generate a current frame of a third image quality higher than the lower image quality, based on the high-quality feature data, wherein the high-quality feature data is of higher quality than the low-quality feature data and the low-quality feature data is of lower quality than the reference feature data.
- 2 . The electronic device of claim 1 , wherein the execution of the instructions configures the one or more processors to restore the reference image using a first decoder provided the reference feature data, and wherein the reference feature data is representative of having been generated by performing quantization on a feature map of a reference frame extracted from the reference frame, and the low-quality feature data is representative of having been generated by performing quantization on a feature map of the current frame extracted from the current frame.
- 3 . The electronic device of claim 1 , wherein, in the execution of the instructions, the one or more processors are configured to restore the reference frame from the reference feature data.
- 4 . The electronic device of claim 3 , wherein, in the execution of the instructions, the one or more processors are configured to: perform dequantization on the reference feature data to generate dequantized reference feature data; and generate the restored reference frame from the dequantized reference feature data.
- 5 . The electronic device of claim 1 , wherein, in the execution of the instructions, the one or more processors are configured to: perform dequantization on the high-quality feature data to generate dequantized high-quality feature data; and generate the current frame of the third image quality from the dequantized high-quality feature data.
- 6 . The electronic device of claim 1 , wherein the reference frame corresponds to an I frame comprised in a group of pictures (GOP) of a video having a plurality of frames, and the current frame corresponds to a B frame or a P frame comprised in the GOP.
- 7 . The electronic device of claim 6 , wherein the electronic device further includes a storage device configured to store respective received reference feature data for each of two or more GOPs.
- 8 . The electronic device of claim 1 , wherein the feature restoration model is a neural network comprising any one or any combination of two or more of a convolution layer, a first layer, and an attention layer, and a transformer-based neural network.
- 9 . The electronic device of claim 1 , wherein the feature restoration model is trained based on at least one of: a first loss function based on a difference between high-quality feature data, which is extracted by encoding a current training frame of a second image quality, and high-quality feature data of the current training frame, which is output by the feature restoration model that receives, as inputs, reference feature data extracted from a reference training frame of a first image quality and low-quality feature data extracted from the current training frame; and a second loss function based on a difference between the current training frame and a current training frame restored by decoding the high-quality feature data extracted by encoding the current training frame.
- 10 . An electronic device comprising: a communication device configured to: receive reference feature data extracted from a reference frame of a first image quality; and receive first low-quality residual data that indicates a difference between low-quality feature data of a previous frame and low-quality feature data extracted from a current frame of a second image quality lower than the first image quality, or receive second low-quality residual data extracted from a residual frame between a motion compensation frame, in which a motion of the current frame is compensated for, and a motion compensation frame, in which a motion of the previous frame is compensated for; one or more processors configured to execute instructions; and a memory storing the instructions which, when executed by the one or more processors, configures the one or more processors to: generate low-quality feature data of the current frame, based on the low-quality feature data of the previous frame and the first low-quality residual data, in response to a receipt of the first low-quality residual data; generate high-quality feature data of the current frame, by implementing a first feature restoration model that receives the reference feature data and the low-quality feature data of the current frame as inputs; and generate a current frame of a third image quality higher than the second image quality, based on the high-quality feature data, wherein the high-quality feature data is of higher quality than the low-quality feature data and the low-quality feature data is of lower quality than the reference feature data.
- 11 . The electronic device of claim 10 , wherein in the execution of the instructions, the one or more processors are configured to: in response to receipt of the second low-quality residual data, generate motion-compensated reference feature data, which is generated by applying a motion compensation value to the reference feature data, and high-quality residual data by implementing a second feature restoration model that is provided the second low-quality residual data; generate a decoded residual frame by decoding the high-quality residual data; and generate a current frame of a fourth image quality higher than the second image quality, based on the decoded residual frame and an inter-predicted current frame, and wherein the high-quality residual data is of higher quality than the second low-quality residual data.
- 12 . A processor-implemented method, comprising: generating high-quality feature data of a current frame, by implementing a feature restoration model that is provided reference feature data, received by an electronic device, and corresponding to compressed feature data of a reference image corresponding to a first time that is different from a second time to which the current frame corresponds, and low-quality feature data received by the electronic device, and corresponding to compressed data of the current frame, that has a lower image quality than the reference image; and generating a current frame of a third image quality higher than the lower image quality, based on the high-quality feature data.
- 13 . The method of claim 12 , wherein: the reference feature data is representative of having been generated by performing quantization on a feature map of a reference frame extracted from the reference frame, and the low-quality feature data is representative of having been generated by performing quantization on a feature map of the current frame extracted from the current frame.
- 14 . The method of claim 12 , further comprising restoring the reference frame from the reference feature data.
- 15 . The method of claim 14 , wherein the restoring of the reference frame comprises: performing dequantization on the reference feature data to generate dequantized reference feature data; and generating the restored reference frame from the dequantized reference feature data.
- 16 . The method of claim 12 , wherein the generating of the current frame of the third image quality comprises: performing dequantization on the high-quality feature data to generate dequantized high-quality feature data; and generating the current frame of the third image quality from the dequantized high-quality feature data.
- 17 . The method of claim 12 , wherein the reference frame corresponds to an I frame comprised in a group of pictures (GOP) of a video having a plurality of frames, and the current frame corresponds to a B frame or a P frame comprised in the GOP.
- 18 . The method of claim 12 , wherein the feature restoration model is a neural network comprising any one or any combination of two or more of a convolution layer, a first layer, and an attention layer, and a transformer-based neural network.
- 19 . A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 12 .
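The two training losses recited in claim 9 can be illustrated with a minimal sketch: one loss compares features restored by the model against features extracted directly from the high-quality training frame, and the other compares the training frame against its decoded reconstruction. The mean-squared-error form and all function names are assumptions for illustration; the claim does not fix a particular distance measure:

```python
import numpy as np

def feature_loss(restored_feat, target_feat):
    # Claim 9's "first loss function" (assumed MSE form): distance between
    # the restoration model's output and the high-quality features
    # extracted by encoding the current training frame.
    return float(np.mean((restored_feat - target_feat) ** 2))

def reconstruction_loss(frame, decoded_frame):
    # Claim 9's "second loss function" (assumed MSE form): distance between
    # the current training frame and the frame restored by decoding the
    # high-quality feature data.
    return float(np.mean((frame - decoded_frame) ** 2))

# Toy tensors standing in for real features/frames.
total = (feature_loss(np.ones(4), np.zeros(4))
         + reconstruction_loss(np.ones(4), np.ones(4)))
```

Claim 9's "at least one of" phrasing means a real training setup could use either loss alone or a (possibly weighted) sum of both.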
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2022-0147961, filed on Nov. 8, 2022, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
BACKGROUND
1. Field
The following description relates to a method and device with image processing.
2. Description of Related Art
A motion estimation (ME) technique may be used for video compression (encoding). The ME technique identifies a motion vector by referring to pixels from one image frame to another. ME is a compression method based on the temporal redundancy of a video: it compresses a video by removing that redundancy, using data from the video frames around the frame being compressed. Advanced Video Coding or MPEG-4 Part 10 (e.g., the H.264 codec) and High Efficiency Video Coding or MPEG-H Part 2 (e.g., the H.265 codec) are example codecs that use such temporal encoding, where B- and P-frames may be temporally encoded.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
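The block-matching motion estimation described in the related-art discussion can be sketched as an exhaustive search for the offset that minimizes the sum of absolute differences (SAD). The search window, frame contents, and function name are illustrative assumptions, not taken from any codec:

```python
import numpy as np

def best_motion_vector(ref, block, top, left, search=2):
    """Exhaustive block matching: return the (dy, dx) offset within the
    reference frame, relative to the block's position (top, left) in the
    current frame, that minimizes the sum of absolute differences."""
    h, w = block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate windows that fall outside the reference frame.
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = np.abs(ref[y:y + h, x:x + w] - block).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

ref = np.zeros((8, 8))
ref[2:4, 3:5] = 1.0                 # a bright 2x2 patch in the reference frame
cur_block = ref[2:4, 3:5].copy()    # the same patch, located at (1, 2) in the current frame
mv = best_motion_vector(ref, cur_block, top=1, left=2)
```

Here the block's best match in the reference lies one pixel down and one pixel right of its position in the current frame, so only the offset (and a residual) needs to be coded rather than the pixels themselves; this exploitation of temporal redundancy is what makes B- and P-frames cheap relative to I-frames.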
In a general aspect, a system includes one or more processors configured to execute instructions; and a memory storing the instructions which, when executed by the one or more processors, configures the one or more processors to generate high-quality feature data of a current frame, by implementing a feature restoration model that is provided reference feature data, received by an electronic device, and corresponding to compressed feature data of a reference image corresponding to a first time that is different from a second time to which the current frame corresponds, and low-quality feature data received by the electronic device, and corresponding to compressed data of the current frame, that has a lower image quality than the reference image; and generate a current frame of a third image quality higher than the lower image quality, based on the high-quality feature data.
The execution of the instructions configures the one or more processors to restore the reference image using a first decoder provided the reference feature data, and wherein the reference feature data may be representative of having been generated by performing quantization on a feature map of the reference frame extracted from the reference frame, and the low-quality feature data may be representative of having been generated by performing quantization on a feature map of the current frame extracted from the current frame.
In the execution of the instructions, the one or more processors may be configured to restore the reference frame from the reference feature data. In the execution of the instructions, the one or more processors may be configured to perform dequantization on the reference feature data to generate dequantized reference feature data; and generate the restored reference frame from the dequantized reference feature data.
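The quantize-then-dequantize round trip described above (and recited in claims 2, 4, 13, and 15) can be sketched with a simple uniform scalar quantizer. The step size and sample values are illustrative assumptions; the patent does not specify a particular quantization scheme:

```python
import numpy as np

def quantize(feature_map, step=0.1):
    # Uniform scalar quantization: map each feature value to the nearest
    # integer multiple of the step size (a simplified stand-in for the
    # quantization the claims recite).
    return np.round(feature_map / step).astype(np.int32)

def dequantize(codes, step=0.1):
    # Inverse mapping: recover approximate feature values from the
    # integer codes.
    return codes.astype(np.float32) * step

features = np.array([0.04, 0.26, 0.95], dtype=np.float32)
restored = dequantize(quantize(features))
```

The integer codes are what get compressed and transmitted; dequantization on the receiving device recovers the feature map only up to the quantization error (at most half a step per value), which is why the restored features are "representative of" rather than identical to the originals.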
In the execution of the instructions, the one or more processors may be configured to perform dequantization on the high-quality feature data to generate dequantized high-quality feature data; and generate the current frame of the third image quality from the dequantized high-quality feature data.
The reference frame may correspond to an I frame comprised in a group of pictures (GOP) of a video having a plurality of frames, and the current frame may correspond to a B frame or a P frame comprised in the GOP. The electronic device may further include a storage device configured to store respective received reference feature data for each of two or more GOPs.
The feature restoration model may be a neural network comprising any one or any combination of two or more of a convolution layer, a first layer, and an attention layer, and a transformer-based neural network.
The feature restoration model may be trained based on at least one of a first loss function based on a difference between high-quality feature data, which is extracted by encoding a current training frame of a second image quality, and high-quality feature data of the current training frame, which is output by the feature restoration model that receives, as inputs, reference feature data extracted from a reference training frame of the first image quality and low-quality feature data extracted from the current training frame; and a second loss function based on a difference between the current training frame and a current training frame restored by decoding the high-quality feature data extracted by encoding the current training frame.
In a general aspect, an electronic device includes a communication device configured to receive reference fe