EP-4446996-B1 - LEARNABLE IMAGE TRANSFORMATION TRAINING METHODS AND SYSTEMS IN GRAPHICS RENDERING
Inventors
- Salmi, Arturo Tommaso
- Cséfalvay, Szabolcs
- Imber, James
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2024-03-25
Claims (15)
- A training method for training a frame transformation pipeline, the frame transformation pipeline being part of a graphics processing system and configured to transform rendered frames to produce enhanced frames comprising one or more desired characteristics exhibited in a set of target images, wherein the frame transformation pipeline comprises one or more shaders, and wherein each of the one or more shaders is defined by a parametrized mathematical function selected to be capable of replicating a particular visual characteristic, the training method comprising: receiving one or more input images having been rendered by the graphics processing system, and receiving the set of target images; applying each shader of the frame transformation pipeline to at least a portion of at least some of the one or more input images to obtain one or more candidate output frames; calculating, at a parametrized discriminator of a generative adversarial network, an indication of a similarity between visual characteristics of the candidate output frames and the set of target images; in dependence on the indication, applying a parameter update step to parameters of the discriminator and to parameters of each of one or more of the parametrized mathematical functions defining a respective one of the one or more shaders, wherein the parameter update step is configured to derive parameters of each of said one or more of the parametrized mathematical functions in order that the respective one of the shaders is arranged to impose, when applied to a frame, its respective particular visual characteristic in dependence on an extent to which the particular visual characteristic is exhibited in the set of target images.
- The method of claim 1, wherein the indication is an objective loss value calculated using an adversarial loss function.
- The method of claim 2, wherein the adversarial loss function calculates two loss components, wherein a first component is indicative of an accuracy of the discriminator in determining whether the one or more candidate output frames belongs to the set of target images, and a second component is indicative of an accuracy of the frame transformation pipeline in generating enhanced frames that exhibit the visual characteristics of the set of target images.
- The method of any preceding claim, wherein the parameter update step is controlled by a generative adversarial network, GAN, wherein the discriminator is arranged to calculate a probability that the candidate output frames belong in the set of target images.
- The method of any preceding claim, wherein the set of target images does not contain a predetermined mapping onto any of the one or more input images.
- The method of any preceding claim, wherein each parametrized mathematical function represents an image-capture characteristic, and wherein the particular visual characteristic each shader is arranged to replicate is a physical phenomenon associated with an image-capture process.
- The method of any preceding claim, wherein the parameter update step concurrently updates the parameters of the discriminator and the parameters of each of one or more of the parametrized mathematical functions.
- The method of any of claims 1 to 6, wherein the parameter update step comprises: updating the parameters of the discriminator; subsequent to updating the parameters of the discriminator, calculating, at the discriminator and using the updated parameters, an updated indication of a similarity between visual characteristics of the candidate output frames and the set of target images; and updating the parameters of each of one or more of the parametrized mathematical functions in dependence on the updated indication.
- The method of any preceding claim, wherein the frame transformation pipeline comprises at least one neural network configured to impose or further enhance frames based on a desired visual characteristic, wherein the parameter update step comprises updating network parameters defining one or more of the at least one neural network.
- The method of any preceding claim, wherein one of the one or more shaders is a lens blur shader configured to replicate lens blur, wherein the parametrized mathematical function comprises at least one kernel comprising an array of values, wherein applying the lens blur shader comprises convolving the at least one kernel over at least a portion of an array of values representing pixels of the one or more rendered frames.
- The method of any preceding claim, wherein one of the one or more shaders is a bloom shader configured to replicate the effect of light bleeding due to oversaturation in an image-capture system, wherein applying the bloom shader to an input frame of the one or more rendered frames comprises: downsampling the input frame to obtain a plurality of sub-frames each having a lower image resolution than a resolution of the input frame; for each sub-frame: extracting a luma channel; isolating portions of the extracted luma channel above a brightness threshold; applying a blurring function to the isolated portions to obtain a bloomed sub-frame; rescaling and combining each of the obtained bloomed sub-frames to obtain a bloom mask having the resolution of the input frame; and combining the bloom mask with the input frame.
- The method of any preceding claim, comprising, prior to calculating the indication at the parametrized discriminator, excising one or more regions of the one or more candidate output frames to obtain one or more edited candidate output frames, wherein the one or more excised regions contain visual artefacts that differ from the one or more desired characteristics exhibited in a set of target images.
- A training apparatus module for training a frame transformation pipeline, the frame transformation pipeline being part of a graphics processing system and configured to transform rendered frames to produce enhanced frames comprising one or more desired characteristics exhibited in a set of target images, wherein the frame transformation pipeline comprises one or more shaders, and wherein each shader is defined by a parametrized mathematical function selected to be capable of replicating a particular visual characteristic, the training apparatus module comprising one or more processors configured to: receive the set of target images and a set of input images having been rendered by the graphics processing system; apply each shader of the frame transformation pipeline to at least a portion of at least some of the set of input images to obtain one or more candidate output frames; calculate, using a parametrized discriminator of a generative adversarial network, an indication of a similarity between visual characteristics of the candidate output frames and the set of target images; in dependence on the indication, apply a parameter update step to parameters of the discriminator and to parameters of each of one or more of the parametrized mathematical functions defining a respective one of the one or more shaders, wherein the parameter update step is configured to derive parameters of each of said one or more of the parametrized mathematical functions in order that the respective one of the shaders is arranged to impose, when applied to a frame, its respective particular visual characteristic in dependence on an extent to which the particular visual characteristic is exhibited in the set of target images.
- An integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the integrated circuit manufacturing system to manufacture a training apparatus module as claimed in claim 13.
- Computer readable code configured to cause the method of any of claims 1 to 12 to be performed when the code is run.
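The multi-resolution bloom procedure recited in the claims above (downsample, extract luma, threshold, blur, then rescale and combine into a bloom mask) can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the function and parameter names (`bloom_shader`, `levels`, `threshold`, `blur_passes`) and the particular downsampling, blurring and combining operators are invented for the example.

```python
import numpy as np

def bloom_shader(frame, levels=3, threshold=0.8, blur_passes=2):
    """Illustrative bloom post-process: frame is an H x W x 3 float array
    in [0, 1]. All names and constants here are assumptions, not the
    patent's implementation."""
    h, w, _ = frame.shape
    bloom_mask = np.zeros((h, w))
    sub = frame
    for _ in range(levels):
        # Downsample by 2 with simple 2x2 averaging to obtain a sub-frame.
        sub = sub[: sub.shape[0] // 2 * 2, : sub.shape[1] // 2 * 2]
        sub = (sub[0::2, 0::2] + sub[1::2, 0::2]
               + sub[0::2, 1::2] + sub[1::2, 1::2]) / 4.0
        # Extract the luma channel (Rec. 709 weights).
        luma = sub @ np.array([0.2126, 0.7152, 0.0722])
        # Isolate portions above the brightness threshold.
        bright = np.where(luma > threshold, luma, 0.0)
        # Apply a crude blurring function (repeated 5-point box filter).
        for _ in range(blur_passes):
            padded = np.pad(bright, 1, mode="edge")
            bright = (padded[:-2, 1:-1] + padded[2:, 1:-1] + padded[1:-1, :-2]
                      + padded[1:-1, 2:] + padded[1:-1, 1:-1]) / 5.0
        # Rescale the bloomed sub-frame back to full resolution
        # (nearest-neighbour) and accumulate into the bloom mask.
        ys = (np.arange(h) * bright.shape[0] // h).clip(0, bright.shape[0] - 1)
        xs = (np.arange(w) * bright.shape[1] // w).clip(0, bright.shape[1] - 1)
        bloom_mask += bright[np.ix_(ys, xs)]
    # Combine the bloom mask additively with the input frame.
    return np.clip(frame + bloom_mask[..., None], 0.0, 1.0)
```

Applied to a frame containing a small region brighter than the threshold, the mask spreads that brightness into neighbouring pixels at each resolution level, approximating light bleed around oversaturated areas.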
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority from UK patent applications GB2305382.0 filed on 12 April 2023, and GB2305381.2 filed on 12 April 2023.
TECHNICAL FIELD
The present disclosure relates to techniques for training learnable shaders capable of replicating image characteristics, in particular for applying post-processing in graphics rendering systems.
BACKGROUND
In computer graphics, a shader is used during the rendering of a scene to calculate and apply a desired trait, or a part thereof, to the rendered frames. Shaders comprise a mathematical function or algorithm that is applied to a set of pixels or vertices of the rendered frame. Some shaders are applied after the geometry of the scene has been rendered (e.g., by rasterisation), i.e., as a post-process. Shaders apply traits such as a certain type of lighting, hue, saturation, texture and the like. Shader algorithms may also be designed to alter the position of pixels/vertices to produce a final rendered image. In principle, a shader can be implemented to apply any visual characteristic or effect to a rendered image, and multiple shaders may be used in combination to achieve a particular effect. As described herein, some shaders are used for vertex and fragment shading, and other shaders may implement a post-processing method. The term 'post-processing' is used herein to refer to applying some processing to pixel values of an existing image, e.g., an image which has been rendered by a GPU. In these cases, the pixel values of the existing image may be read back into the GPU (e.g. as the texels of a texture) before being processed and applied to yield the fragments (pixels) of a new, post-processed image. Simple post-processing shaders, e.g., that apply a certain hue to an image, can be manually coded and are thus algorithmically straightforward to implement and computationally cheap to apply during rendering.
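To illustrate how algorithmically straightforward such a manually coded post-processing shader can be, a per-pixel tint shader might be sketched as follows; the function name, parameter names and tint values are invented for this example and are not taken from the patent.

```python
def tint_shader(pixels, tint=(1.0, 0.9, 0.8)):
    """Illustrative post-processing shader: applies a fixed warm tint by
    scaling each RGB channel of each pixel, then clamping to [0, 1].
    `pixels` is a list of (r, g, b) tuples with components in [0, 1]."""
    return [tuple(min(1.0, c * t) for c, t in zip(px, tint))
            for px in pixels]
```

For example, a mid-grey pixel (0.5, 0.5, 0.5) is mapped to a warmer (0.5, 0.45, 0.4). The entire effect is a handful of multiplications per pixel, in contrast to the learned transformations discussed below.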
More generally, image transformation algorithms (i.e., algorithms that perform some form of image-to-image translation or filtering effect) have been implemented using machine learning methods, e.g., neural networks. For example, image transformation algorithms can be implemented to match a desired image characteristic or design style, without manual design or adjustment (e.g. choice of suitable parameters). In this way, a neural network can be trained to map a set of input (e.g. rendered) images to a set of target (e.g. photographic/photorealistic) images, to learn one or more arbitrary characteristics. Such image transformation algorithms operate globally, i.e., they learn all characteristics of a target.
One way to train image-transformation neural networks is to use an adversarial network, which uses a zero-sum game to train a generator network against a discriminator network. The discriminator is trained simultaneously with the generator to classify transformed images as 'fake' (i.e., having been generated by the generator) or 'real' (belonging to the set of target images). These adversarial networks are called Generative Adversarial Networks (GANs). Both networks of the GAN, i.e., the generator and the discriminator, contain learnable parameters. The goal of the generator is to produce an output that replicates characteristics of the target such that it can deceive the discriminator, and the goal of the discriminator is to distinguish between the output of the generator and the 'true' target data. In other words, the generator is trained to fool the discriminator, and the discriminator learns to distinguish the generator output from the target data. GANs can be used with both paired and unpaired data. At deployment, the generator from the GAN can therefore be used without the discriminator to transform an arbitrary input image so that it obtains characteristics of a target dataset.
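The zero-sum objective described above can be made concrete with the standard binary cross-entropy GAN loss terms. The following is an illustrative sketch (function names are invented, not from the patent), where `d_real` and `d_fake` are the discriminator's probability estimates that a target image and a generated image, respectively, belong to the target set:

```python
import math

def sigmoid(z):
    """Maps a raw discriminator score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def discriminator_loss(d_real, d_fake):
    """Discriminator objective: low when target images are scored near 1
    and generated images are scored near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator objective: low when the generator fools
    the discriminator, i.e. d_fake is pushed towards 1."""
    return -math.log(d_fake)
```

Alternating gradient steps on these two losses implement the zero-sum game: the discriminator update decreases `discriminator_loss`, while the generator update decreases `generator_loss`, each with the other network's parameters held fixed.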
Large neural networks (NNs) produced in this way can produce accurate results almost indistinguishable from a target image set. The term 'large neural network' (elsewhere referred to as a 'fully parameterised neural network') refers to a neural network with a large number of layers and parameters (e.g., around 1-10 million parameters or more). For example, fully parametrised NNs trained using a GAN can be effective in producing photorealistic images from computer-generated input images. However, large NNs are very computationally expensive to use even when optimised (e.g., when approximated using a sparser set of parameters, or using lower bit depths to represent parameters), owing to the sheer number of parameters: e.g., on the order of 10 million or more. Additionally, because neural networks are trained to simulate arbitrary characteristics (e.g., nebulous traits such as artistic style), the huge number of parameters required means that such neural networks are undifferentiated 'black boxes' that learn features indiscriminately. In other words, large NNs lear