EP-4738239-A1 - IMAGE PROCESSING SYSTEM, IMAGE PROCESSING METHOD, AND PROGRAM


Abstract

High-precision super-resolution processing is made possible for moving images generated from an object whose texture changes, without relying on movement information. In the image processing system, a processor acquires first to Nth input frames having a number of input pixels, acquires first to Nth intermediate frames from the respective input frames, and acquires first to Nth estimated frames from the respective intermediate frames. The processor identifies an nth color change pixel including color information that changes regardless of the movement of the object in the nth intermediate frame based on texture information of the object, and acquires nth auxiliary information by replacing the pixel value of the color change pixel in the nth cumulative feature information with a predetermined value. The machine learning model includes an output layer that outputs the nth cumulative feature information and an output layer that outputs the nth estimated frame, and learns using a plurality of training data including a learning intermediate frame, the auxiliary information in which the color change pixel has been replaced with a predetermined value, and a learning estimated frame.

Inventors

YOKOTA, KENICHIRO
KOBIKI, HISASHI
IKENOUE, SHOICHI

Assignees

Sony Interactive Entertainment Inc.
Sony Group Corporation

Dates

Publication Date: 2026-05-06
Application Date: 2024-06-21

Claims (10)

An image processing system comprising at least one processor, wherein: the at least one processor acquires each of first to Nth input frames (N is a natural number of 2 or more) having a predetermined number of input pixels; acquires first to Nth intermediate frames by generating, based on each input frame, an intermediate frame that corresponds to that input frame and has a number of intermediate pixels equal to or greater than the number of input pixels; and acquires first to Nth estimated frames having a number of estimated pixels equal to or greater than the number of intermediate pixels and greater than the number of input pixels by inputting each intermediate frame into a machine learning model; the machine learning model includes a cumulative feature information output layer to which the nth intermediate frame (n = 2, 3, ..., N) and n-1th auxiliary information based on n-1th cumulative feature information indicating a feature of the first to n-1th intermediate frames are input, and from which the nth cumulative feature information indicating a feature of the first to nth intermediate frames is output; and an estimated frame output layer to which the nth cumulative feature information is input and from which the nth estimated frame is output; the at least one processor identifies an nth color change pixel including color information that changes regardless of movement of an object in the nth intermediate frame based on texture information of the object and acquires the nth auxiliary information by replacing a pixel value of the color change pixel in the nth cumulative feature information with a predetermined value; and the machine learning model learns using a plurality of training data each including a learning intermediate frame having the number of intermediate pixels generated based on a learning input frame having the number of input pixels, the auxiliary information in which the color change pixels are replaced with a predetermined value, and a learning estimated frame having the number of estimated pixels.
The image processing system according to claim 1, wherein each of the input frames is an image obtained by rendering three-dimensional data depicting one or more objects as seen from a predetermined viewpoint.
The image processing system according to claim 2, wherein each of the input frames is an image obtained by rendering so that the viewpoint varies for each of the input frames, the at least one processor acquires variation information, which is information relating to variation of the viewpoint for each input frame in the rendering, and generates each of the intermediate frames found by interpolating the pixel value of the position corresponding to each pixel before variation in the input frame based on the variation information and each pixel of each of the input frames.
The image processing system according to claim 2 or 3, wherein the at least one processor acquires n-1th movement information, which is information indicating an amount and a direction of movement from an n-1th input frame to an nth input frame, and acquires the n-1th auxiliary information by applying movement compensation to the n-1th cumulative feature information based on the n-1th movement information.
The image processing system according to claim 4, wherein the at least one processor acquires n-1th depth information indicating the depth of each pixel in the n-1th input frame and nth depth information indicating the depth of each pixel in the nth input frame; identifies, based on the n-1th depth information and the nth depth information, an nth appearing pixel, which is a pixel in the nth intermediate frame in which all or part of the object not displayed in the n-1th intermediate frame is displayed; and acquires the n-1th auxiliary information by replacing the pixel value of the nth appearing pixel in the n-1th cumulative feature information with a predetermined value.
The image processing system according to claim 1 or 2, wherein the cumulative feature information output layer is input with the first intermediate frame and imparted auxiliary information and outputs the first cumulative feature information.
The image processing system according to claim 1 or 2, wherein the cumulative feature information is image information having the same number of pixels as the number of intermediate pixels.
The image processing system according to claim 1 or 2, wherein the color change pixel is represented by information representing infinity or not a number.
An image processing method, wherein: a processor acquires each of first to Nth input frames (N is a natural number of 2 or more) having a predetermined number of input pixels; acquires first to Nth intermediate frames by generating, based on each input frame, an intermediate frame that corresponds to that input frame and has a number of intermediate pixels equal to or greater than the number of input pixels; and acquires first to Nth estimated frames having a number of estimated pixels equal to or greater than the number of intermediate pixels and greater than the number of input pixels by inputting each intermediate frame into a machine learning model; the machine learning model includes a cumulative feature information output layer to which the nth intermediate frame (n = 2, 3, ..., N) and n-1th auxiliary information based on n-1th cumulative feature information indicating a feature of the first to n-1th intermediate frames are input, and from which the nth cumulative feature information indicating a feature of the first to nth intermediate frames is output; and an estimated frame output layer to which the nth cumulative feature information is input and from which the nth estimated frame is output; an nth color change pixel including color information that changes regardless of movement of an object in the nth intermediate frame is identified based on texture information of the object and the nth auxiliary information is acquired by replacing a pixel value of the color change pixel in the nth cumulative feature information with a predetermined value; and the machine learning model learns using a plurality of training data each including a learning intermediate frame having the number of intermediate pixels generated based on a learning input frame having the number of input pixels, the auxiliary information in which the color change pixels are replaced with a predetermined value, and a learning estimated frame having the number of estimated pixels.
A program for causing a computer to function as: input frame acquisition means for acquiring each of first to Nth input frames (N is a natural number of 2 or more) having a predetermined number of input pixels; intermediate frame acquisition means for acquiring first to Nth intermediate frames by generating, based on each input frame, an intermediate frame that corresponds to that input frame and has a number of intermediate pixels equal to or greater than the number of input pixels; and estimated frame acquisition means for acquiring first to Nth estimated frames having a number of estimated pixels equal to or greater than the number of intermediate pixels and greater than the number of input pixels by inputting each intermediate frame into a machine learning model; wherein the machine learning model includes a cumulative feature information output layer to which the nth intermediate frame (n = 2, 3, ..., N) and n-1th auxiliary information based on n-1th cumulative feature information indicating a feature of the first to n-1th intermediate frames are input, and from which the nth cumulative feature information indicating a feature of the first to nth intermediate frames is output; and an estimated frame output layer to which the nth cumulative feature information is input and from which the nth estimated frame is output; the program further causes the computer to function as identification means for identifying an nth color change pixel including color information that changes regardless of movement of an object in the nth intermediate frame based on texture information of the object and auxiliary information acquisition means for acquiring the nth auxiliary information by replacing a pixel value of the color change pixel in the nth cumulative feature information with a predetermined value; and the machine learning model learns using a plurality of training data each including a learning intermediate frame having the number of intermediate pixels generated based on a learning input frame having the number of input pixels, the auxiliary information in which the color change pixels are replaced with a predetermined value, and a learning estimated frame having the number of estimated pixels.
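
The per-frame recurrence recited in the independent claims can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `upsample_nearest`, `mask_color_change`, `toy_model`, and `super_resolve` are hypothetical names, the "model" is a hand-written blend standing in for a trained network, frames are single-channel nested lists, and NaN is assumed as the predetermined value (one of the options named in the claims). The all-NaN auxiliary information used for the first frame is likewise an assumption.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]   # single-channel image: rows of pixel values
Mask = List[List[bool]]     # True marks a color change pixel


def upsample_nearest(frame: Frame, scale: int) -> Frame:
    """Hypothetical intermediate-frame generator: nearest-neighbour upscaling."""
    return [[v for v in row for _ in range(scale)]
            for row in frame for _ in range(scale)]


def mask_color_change(features: Frame, mask: Mask, fill: float = math.nan) -> Frame:
    """Build auxiliary information: overwrite color change pixels with `fill`."""
    return [[fill if m else v for v, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]


def toy_model(intermediate: Frame, auxiliary: Frame) -> Tuple[Frame, Frame]:
    """Placeholder for the trained model (an assumption, not the real network).

    Returns (cumulative features, estimated frame). Where the auxiliary value
    is NaN the history is ignored; elsewhere it is averaged with the current pixel.
    """
    features = [[cur if math.isnan(aux) else 0.5 * (cur + aux)
                 for cur, aux in zip(i_row, a_row)]
                for i_row, a_row in zip(intermediate, auxiliary)]
    estimated = [row[:] for row in features]  # same pixel count as intermediate
    return features, estimated


def super_resolve(inputs: List[Frame], masks: List[Mask], scale: int = 2) -> List[Frame]:
    """Run frames 1..N, feeding masked features of frame n into frame n+1."""
    auxiliary = None
    outputs = []
    for frame, mask in zip(inputs, masks):
        intermediate = upsample_nearest(frame, scale)
        if auxiliary is None:
            # Assumed initial auxiliary information for the first frame: no history.
            auxiliary = [[math.nan] * len(row) for row in intermediate]
        features, estimated = toy_model(intermediate, auxiliary)
        # nth auxiliary information: nth features with color change pixels replaced.
        auxiliary = mask_color_change(features, mask)
        outputs.append(estimated)
    return outputs
```

With two 1x1 input frames of values 1.0 and 3.0 and the top-left intermediate pixel flagged as a color change pixel in the first frame, the second output keeps the fresh value 3.0 at that pixel and blends history (2.0) everywhere else.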

Description

[Technical Field]
The present invention relates to an image processing system, an image processing method, and a program.

[Background Art]
Conventionally, a technique that uses a machine learning model to estimate a high-quality still image from a low-quality still image (super-resolution) is known (see Non-Patent Document 1 below).

[Prior Art Documents]
[Non-Patent Documents]
[Non-Patent Document 1] Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang. Learning a Deep Convolutional Network for Image Super-Resolution, in Proceedings of European Conference on Computer Vision (ECCV), 2014

[Summary of Invention]
[Problem to be Solved by Invention]
The inventors of the present application are considering applying the super-resolution described above to moving images such as game screens. In super-resolution of moving images, it is believed that moving images of higher image quality can be estimated by taking into consideration not only information about each frame to be processed but also information about frames preceding it. In particular, degradation of image quality due to ghosting can be avoided by taking into consideration information that indicates the movement of an object, such as a motion vector. However, there are cases where the texture of an object changes regardless of the movement information of the object, such as when the object is a mirror or when the object has an animation texture. When super-resolution processing that takes movement information into consideration is performed on moving images generated from such objects, it may actually result in a decrease in image quality.
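
The problem described above can be illustrated with a small sketch. It assumes a one-dimensional row of pixels, integer per-pixel motion, and a simple 50/50 blend of the current frame with motion-compensated history; `reproject` and `blend` are hypothetical helpers chosen for illustration and are not taken from the present disclosure.

```python
from typing import List


def reproject(history: List[float], motion: List[int]) -> List[float]:
    """For each current pixel x, fetch the history value at x - motion[x] (clamped)."""
    n = len(history)
    return [history[min(max(x - motion[x], 0), n - 1)] for x in range(n)]


def blend(current: List[float], warped: List[float], alpha: float = 0.5) -> List[float]:
    """Mix the current frame with (motion-compensated) history."""
    return [alpha * c + (1.0 - alpha) * w for c, w in zip(current, warped)]


# Case 1: the whole scene shifts right by one pixel.
previous = [0.0, 1.0, 0.0, 0.0]
current = [0.0, 0.0, 1.0, 0.0]
ghosted = blend(current, previous)                              # no compensation
compensated = blend(current, reproject(previous, [1, 1, 1, 1]))

# Case 2: a static surface whose texture animates; motion is zero everywhere.
previous_tex = [0.25, 0.25]
current_tex = [0.75, 0.75]
stale = blend(current_tex, reproject(previous_tex, [0, 0]))
```

In case 1, `ghosted` is `[0.0, 0.5, 0.5, 0.0]` while `compensated` matches the current frame, so the motion vector removes the ghost. In case 2, `stale` is `[0.5, 0.5]`: the motion information is correct (zero), yet the history no longer describes the surface, which is the situation the color change pixel handling is meant to address.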
An object of the present disclosure is to provide an image processing system, an image processing method, and a program that, in image processing that estimates high-quality moving images from low-quality moving images using movement information and information from past frames, can perform high-precision super-resolution processing on moving images generated from objects with changing textures without relying on movement information.

[Means for Solving Problem]
An image processing system according to the present invention is an image processing system including at least one processor, wherein: the at least one processor acquires each of first to Nth input frames (N is a natural number of 2 or more) having a predetermined number of input pixels; acquires first to Nth intermediate frames by generating, based on each input frame, an intermediate frame that corresponds to that input frame and has a number of intermediate pixels equal to or greater than the number of input pixels; and acquires first to Nth estimated frames having a number of estimated pixels equal to or greater than the number of intermediate pixels and greater than the number of input pixels by inputting each intermediate frame into a machine learning model; the machine learning model includes a cumulative feature information output layer to which the nth intermediate frame (n = 2, 3, ..., N) and n-1th auxiliary information based on n-1th cumulative feature information indicating a feature of the first to n-1th intermediate frames are input, and from which the nth cumulative feature information indicating a feature of the first to nth intermediate frames is output; and an estimated frame output layer to which the nth cumulative feature information is input and from which the nth estimated frame is output; the at least one processor identifies an nth color change pixel including color information that changes regardless of movement of an object in the nth intermediate frame based on texture information of the object and acquires the nth auxiliary information by replacing a pixel value of the color change pixel in the nth cumulative feature information with a predetermined value; and the machine learning model learns using a plurality of training data each including a learning intermediate frame having the number of intermediate pixels generated based on a learning input frame having the number of input pixels, the auxiliary information in which the color change pixels are replaced with a predetermined value, and a learning estimated frame having the number of estimated pixels.

[Brief Description of Drawings]
[FIG. 1] A diagram illustrating one example of a hardware configuration of an image processing system.
[FIG. 2] A diagram illustrating an overview of an image processing system.
[FIG. 3] A diagram schematically illustrating processing in an image processing system.
[FIG. 4] A functional block diagram illustrating one example of functions realized by the image processing system.
[FIG. 5] A diagram describing processing of a rendering unit.
[FIG. 6] A diagram describing processing in an intermediate frame acquisition unit.
[FIG. 7] A diagram schematically illustrating processing for defining a color change pixel.
[FIG. 8] A flowchart illustrating one example of the flow