
EP-4432217-B1 - IMAGE ENHANCEMENT


Inventors

  • Sanchit, Sanchit
  • Tack, Alexander

Dates

Publication Date
2026-05-06
Application Date
2024-03-12

Claims (15)

  1. An image processing method, the method including: by a computer processing system implementing a trained machine learning model: receiving as an input to the trained machine learning model a combination of: image characteristics of an input image, wherein the image characteristics include variables that change between an image before processing by the trained machine learning model and after the image has been processed by the trained machine learning model; and a first classification output for the input image, the first classification output relating the input image to a set of image classes; and generating, through application of the trained machine learning model, at least one visual parameter usable to generate a processed image relative to the input image; wherein: a machine learning model is trained to form the trained machine learning model by a process comprising utilising as a learning objective a reduction or minimisation of a combination of both: i) a first loss, wherein the first loss is a loss between an output image of the machine learning model that applies the at least one visual parameter and a target training image and ii) a second loss, wherein the second loss is a loss between a second classification output, provided by the machine learning model, different from the first classification output, and a known classification of the target training image.
  2. The method of claim 1, wherein the image characteristics define colour and brightness semantics of the input image.
  3. The method of claim 1, wherein the image characteristics comprise data representing a colour histogram of the input image, for example an RGB histogram.
  4. The method of any one of claims 1 to 3, wherein the trained machine learning model is a first trained machine learning model and the first classification output comprises an output of a second trained machine learning model, different to the first trained machine learning model, wherein the second trained machine learning model is trained to classify images into one of a plurality of scene classes.
  5. The method of any one of claims 1 to 4, wherein the at least one visual parameter comprises one or more of: (i) brightness, (ii) contrast, (iii) saturation, (iv) vibrance, (v) whites, (vi) blacks, (vii) shadows and (viii) highlights.
  6. The method of any one of claims 1 to 5, wherein the first loss is a mean square error loss between the output image and the target training image.
  7. The method of any one of claims 1 to 6, wherein the second loss is a multi-class cross entropy loss between the second classification output and the known classification of the target training image.
  8. The method of any one of claims 1 to 7, wherein the learning objective is a mathematical combination of the first loss and the second loss.
  9. The method of any one of claims 1 to 8, wherein the machine learning model comprises a first multilayer perceptron configured to provide the at least one visual parameter and a second multilayer perceptron configured to provide the second classification output.
  10. The method of claim 9, wherein the machine learning model comprises a third multilayer perceptron, the third multilayer perceptron configured to reduce the dimensionality of the input to the trained machine learning model, wherein the first multilayer perceptron and the second multilayer perceptron are both attached to the third multilayer perceptron.
  11. The method of any one of claims 1 to 10, wherein the machine learning model was trained based on a plurality of image pairs, each image pair comprising a target training image and a degraded image, the degraded image used to generate the output image of the machine learning model during training.
  12. The method of claim 11, wherein: a first image pair of the plurality of image pairs is associated with a first class and the degraded image of the first image pair was generated by applying a first degradation model to the target training image of the first image pair; a second image pair of the plurality of image pairs is associated with a second class and the degraded image of the second image pair was generated by applying a second degradation model to the target training image of the second image pair; the first image pair is different to the second image pair and the first degradation model is different to the second degradation model.
  13. The method of claim 12, wherein the first degradation model and not the second degradation model was selected for the first image pair due to the association of the first image pair with the first class and not the second class and the second degradation model and not the first degradation model was selected for the second image pair due to the association of the second image pair with the second class and not the first class.
  14. A computer processing system including one or more computer processors and computer-readable storage, the computer processing system configured to perform the method of any one of claims 1 to 13.
  15. Non-transient computer-readable storage storing instructions for a computer processing system, wherein the instructions, when executed by the computer processing system, cause the computer processing system to perform the method of any one of claims 1 to 13.
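Claims 9 and 10 recite a shared trunk multilayer perceptron (the "third" MLP, which reduces the input dimensionality) feeding two heads: one for the visual parameters and one for the second classification output. As an illustrative, non-authoritative sketch in PyTorch, where the input layout (a 768-bin RGB histogram concatenated with class scores), hidden width, and output sizes are assumptions not stated in the claims:

```python
import torch
import torch.nn as nn

class EnhancementModel(nn.Module):
    """Hypothetical sketch of the claimed architecture: a shared trunk MLP
    whose output feeds two heads, one predicting visual parameters
    (brightness, contrast, saturation, ...) and one predicting a class."""

    def __init__(self, in_dim=771, hidden=128, n_params=8, n_classes=5):
        super().__init__()
        # "Third" MLP (claim 10): reduces the dimensionality of the input.
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # First MLP (claim 9): provides the at least one visual parameter.
        self.param_head = nn.Linear(hidden, n_params)
        # Second MLP (claim 9): provides the second classification output.
        self.class_head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h = self.trunk(x)
        return self.param_head(h), self.class_head(h)

model = EnhancementModel()
# e.g. a 256-bin-per-channel RGB histogram (768) plus 3 scene-class scores
x = torch.randn(4, 771)
params, logits = model(x)
print(params.shape, logits.shape)  # torch.Size([4, 8]) torch.Size([4, 5])
```

Both heads being "attached" to the trunk means they share its reduced-dimension features, so the classification objective can shape the representation used for parameter prediction.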

Description

Field of the disclosure

The present disclosure relates to the field of image processing. Particular embodiments relate to a method of enhancement of a digital image through changes to one or more visual parameters of the digital image, the changes identified using a computer or computer system implementing a machine learning solution. Other embodiments relate to a computer processing system or computer-readable storage configured to perform such a method.

Background

Digital images, for example photos or videos stored as data, are pervasive in modern society. They can be and often are generated using a digital camera. Digital cameras are now highly available, including on multifunction devices such as smartphones, in addition to dedicated cameras. Digital cameras have a diverse range of specifications, including the lens size, the number of lenses and the image capture hardware. Digital images may also be generated by other mechanisms, for example using computer applications, and in recent times there has been significant discussion of the use of artificial intelligence to generate digital images, including artwork. Software or firmware may automatically process digital image data, for example digital image data generated by the image capture hardware of a digital camera or digital image data received from or via another source. Software or firmware may also or instead allow for the manual adjustment of visual parameters of digital image data, including, for example, processing the digital image in response to a manual input to adjust one or more of brightness, saturation and contrast. The software or firmware may form part of a digital camera or other image generator, or may be run on a computer system separate from the digital camera or other image generator, which computer system has received digital image data for processing. The software or firmware for processing digital images may be deployed to enhance the image.
The enhancement may aim to make the image more aesthetically pleasing. The enhancement may also or instead aim to make the image clearer or to enable information in the image to be more readily discerned. The present disclosure relates to methods for using machine-learning-based solutions to image processing, for example to allow for image enhancement. CN 113 763 296 A describes an image processing method in which a global colour feature and an image semantic feature corresponding to a video frame are obtained. Enhancement parameters of the video frame are obtained according to the global colour feature and the image semantic feature.

Summary of the disclosure

Embodiments of a method of training a machine learning model are described. The embodiments have particular application to training a machine learning model to perform image processing, for example image enhancement. The invention is defined in the independent claims. In some embodiments, the method of training includes utilising as a learning objective a reduction or minimisation of a combination of both: i) a first loss, wherein the first loss is a loss between an output image of the machine learning model that applies at least one visual parameter and a target training image, and ii) a second loss, wherein the second loss is a loss between a classification output and a known classification of the target training image. In some embodiments, the method of training includes reducing or minimising a combination of both the first loss and the second loss together with utilising for the training unsupervised image pairs as described below. In some embodiments, the method of training includes utilising one or more unsupervised image pairs, wherein an unsupervised image pair is one in which the degraded image has been generated by a computational process based on the target image of the unsupervised image pair.
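The combined learning objective described above (an image loss plus a classification loss, per claims 6-8) can be sketched in PyTorch as follows. This is illustrative only: `apply_params` is a toy stand-in for the patent's step of applying the predicted visual parameters to the image, and the 0.1 loss weighting, tensor shapes, and class labels are assumptions, since the patent specifies only "a mathematical combination" of the two losses:

```python
import torch
import torch.nn as nn

def apply_params(degraded, params):
    # Toy stand-in for the image editor: treat the first predicted
    # parameter as a per-image brightness offset.
    return degraded + params[:, :1, None, None]

mse = nn.MSELoss()          # first loss: output image vs target image
ce = nn.CrossEntropyLoss()  # second loss: predicted vs known class

# Dummy training batch (assumed shapes: 4 RGB images of 32x32).
degraded = torch.rand(4, 3, 32, 32)
target = torch.rand(4, 3, 32, 32)
target_class = torch.tensor([0, 2, 1, 3])

# Stand-ins for the model's two head outputs (claim 9).
params = torch.zeros(4, 8, requires_grad=True)
logits = torch.zeros(4, 5, requires_grad=True)

output = apply_params(degraded, params)
# Weighted sum is one possible "mathematical combination" (claim 8);
# the 0.1 weight is an invented example value.
loss = mse(output, target) + 0.1 * ce(logits, target_class)
loss.backward()
```

Minimising this sum pushes the model both to reproduce the target image and to predict its known class, so the learned features remain class-aware.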
The computational process includes applying a selected degradation model to the target image, the selection of the degradation model for use in generating an unsupervised pair being based on classification information associated with the target image of the unsupervised pair. Embodiments of a method for generating image pairs for training a machine learning model for image processing are described. In some embodiments, the method for generating the image pairs includes receiving a set of training images and scene information for the set of training images, and selecting and applying one of a plurality of degradation models to the set of training images to form a set of degraded images corresponding to the set of training images, wherein the selecting is based on the scene information. Each degraded image and corresponding training image forms an image pair for training a machine learning model. Embodiments of training a machine learning model for image processing that utilise the generated image pairs are also described. Embodiments of a method of image processing are also described. The embodiments include embodiments that uti
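The class-conditional pair generation described above (claims 12-13) can be sketched as follows. The scene-class names and degradation functions here are invented placeholders, as the patent does not specify concrete degradation models; images are simplified to flat lists of pixel intensities:

```python
# Hypothetical degradation models keyed by scene class. A real system
# might underexpose night scenes and desaturate landscapes, for example.
def underexpose(img):
    """Halve every intensity (placeholder 'night' degradation)."""
    return [p * 0.5 for p in img]

def desaturate(img):
    """Pull every intensity toward the mean (placeholder 'landscape'
    degradation)."""
    grey = sum(img) / len(img)
    return [0.5 * p + 0.5 * grey for p in img]

DEGRADATIONS = {"night": underexpose, "landscape": desaturate}

def make_pair(target_img, scene_class):
    """Select the degradation model by the scene class associated with
    the target image, then form the (degraded, target) training pair."""
    degraded = DEGRADATIONS[scene_class](target_img)
    return degraded, target_img

pair = make_pair([0.2, 0.4, 0.6], "night")  # → ([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
```

Because the degradation is chosen per class, the model sees class-appropriate defects during training, which matches the claim that the first degradation model is selected for the first pair due to its association with the first class and not the second.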