EP-4423715-B1 - AI-BASED AESTHETICAL IMAGE MODIFICATION

Inventors

LI, JI
SUN, XIAO
DAI, QI
HU, Han

Dates

Publication Date: 20260506
Application Date: 20220908

Claims (12)

A system for modifying an image, comprising: a processor; and a computer-readable medium in communication with the processor, the computer-readable medium comprising instructions that, when executed by the processor, cause the processor to control the system to perform functions of: receiving (1310) a source image having a first image configuration; determining (1320) a second image configuration for a target image, wherein the second image configuration comprises a size and a shape; providing (1330), to an artificial intelligence, AI, engine, the received source image, the AI engine trained to perform functions of: identifying (1340), based on a set of rules related to visual features, a plurality of candidate regions from the source image, each candidate region showing a different portion of the source image, wherein the set of rules includes the identified second image configuration and an aspect ratio of the target image; generating (1350) a plurality of regional proposal images based on the plurality of identified candidate regions, respectively, wherein generating comprises any one or more of resizing, warping the candidate regions, or adding a new portion to an existing candidate region such that each proposal region image has the second image configuration; determining (1360), based on prior aesthetical evaluation data, an aesthetical value of each regional proposal image; and selecting (1370), based on the determined aesthetical value of each regional proposal image, a first regional proposal image as the target image, the first region proposal image being one of the plurality of regional proposal images; extracting (1380), from the AI engine, the first regional proposal image selected as the target image; and causing (1390) the first regional proposal image to be displayed via a display of a user device.
The system of claim 1, wherein: the first image configuration includes a first image height and a first image width, the second image configuration includes a second image height and a second image width, and at least one of the first image height and width is different from at least one of the second image height and width.
The system of claim 1, wherein the AI engine comprises a first machine learning, ML, model trained to perform identifying, from the source image, the plurality of candidate regions based on the set of rules related to the visual features.
The system of claim 3, wherein the set of rules related to the visual features includes at least one of: the second image configuration for the target image; an aspect ratio or shape of the source image or target image; a location of text in the source image; a location of a logo in the source image; a location of a specific body part in the source image; a visibility tolerance; and a clarity or distortion tolerance.
The system of claim 3, wherein the AI engine further comprises a second ML model trained to perform determining, based on the prior aesthetical evaluation data, the aesthetical value of each regional proposal image.
The system of claim 5, wherein the prior aesthetical evaluation data comprises a plurality of prior user aesthetical evaluations of a plurality of sample images having different visual features and configurations.
The system of claim 5, wherein, for determining the aesthetical value of each regional proposal image, the second ML model is configured to perform: a function of processing, using a plurality of computing devices, the plurality of regional proposal images in parallel, or a plurality of functions comprising: determining, based on a set of coarse aesthetic selection rules, a first aesthetical evaluation value of each regional proposal image; identifying a set of the regional proposal images having the first aesthetical evaluation value higher than a predetermined value; determining, based on a set of fine aesthetic selection rules, a second aesthetical evaluation value of each included in the identified set of the regional proposal images; determining that the first regional proposal image has a highest second aesthetical evaluation value; and selecting the first regional proposal image as the target image.
A computer-readable medium comprising instructions that, when executed by a processor, cause the processor to control a system to perform: receiving (1310) a source image having a first image configuration; determining (1320) a second image configuration for a target image, wherein the second image configuration comprises a size and a shape; providing (1330), to an artificial intelligence, AI, engine, the received source image, the AI engine trained to perform functions of: identifying (1340), based on a set of rules related to visual features, a plurality of candidate regions from the source image, each candidate region showing a different portion of the source image, wherein the set of rules includes the identified second image configuration and an aspect ratio of the target image; generating (1350) a plurality of regional proposal images based on the plurality of identified candidate regions, respectively, wherein generating comprises any one or more of resizing, warping the candidate regions, or adding a new portion to an existing candidate region such that each proposal region image has the second image configuration; determining (1360), based on prior aesthetical evaluation data, an aesthetical value of each regional proposal image; and selecting (1370), based on the determined aesthetical value of each regional proposal image, a first regional proposal image as the target image, the first region proposal image being one of the plurality of regional proposal images; extracting (1380), from the AI engine, the first regional proposal image selected as the target image; and causing (1390) the first regional proposal image to be displayed via a display of a user device.
A method of operating a system for modifying an image, comprising: receiving (1310) a source image having a first image configuration; determining (1320) a second image configuration for a target image, wherein the second image configuration comprises a size and a shape; providing (1330), to an artificial intelligence, AI, engine, the received source image, the AI engine trained to perform functions of: identifying (1340), based on a set of rules related to visual features, a plurality of candidate regions from the source image, each candidate region showing a different portion of the source image, wherein the set of rules includes the identified second image configuration and an aspect ratio of the target image; generating (1350) a plurality of regional proposal images based on the plurality of identified candidate regions, respectively, wherein generating comprises any one or more of resizing, warping the candidate regions, or adding a new portion to an existing candidate region such that each proposal region image has the second image configuration; determining (1360), based on prior aesthetical evaluation data, an aesthetical value of each regional proposal image; and selecting (1370), based on the determined aesthetical value of each regional proposal image, a first regional proposal image as the target image, the first region proposal image being one of the plurality of regional proposal images; extracting (1380), from the AI engine, the first regional proposal image selected as the target image; and causing (1390) the first regional proposal image to be displayed via a display of a user device.
The method of claim 9, wherein: the first image configuration includes a first image height and a first image width, the second image configuration includes a second image height and a second image width, and at least one of the first image height and width is different from at least one of the second image height and width.
The method of claim 9, wherein the AI engine comprises a first machine learning, ML, model trained to perform identifying, from the source image, the plurality of candidate regions based on the set of rules related to the visual features.
The method of claim 11, wherein the AI engine further comprises a second ML model trained to perform determining, based on the prior aesthetical evaluation data, the aesthetical value of each regional proposal image.
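
As an illustration only, steps 1340 to 1370 of the independent claims, together with the coarse-then-fine selection recited in the dependent claims, might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Region` tuple, the function names, the sliding-window enumeration, the 4000 x 3000 source size and the subject location are all hypothetical, and the two toy scoring functions merely stand in for trained ML models. Pixel-level resizing, warping and padding are omitted; only the resize factor is computed.

```python
from fractions import Fraction
from math import hypot
from typing import Callable, List, Optional, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height), in source pixels


def candidate_regions(src_w: int, src_h: int, tgt_w: int, tgt_h: int,
                      scales_pct=(100, 80, 60), steps: int = 3) -> List[Region]:
    """Enumerate windows sharing the target aspect ratio (hypothetical rule set)."""
    # Largest factor at which a tgt_w x tgt_h window still fits in the source.
    fit = min(Fraction(src_w, tgt_w), Fraction(src_h, tgt_h))
    regions: List[Region] = []
    for pct in scales_pct:
        w = int(tgt_w * fit * pct / 100)
        h = int(tgt_h * fit * pct / 100)
        # Slide the window over a steps x steps grid of positions.
        for i in range(steps):
            for j in range(steps):
                left = (src_w - w) * i // (steps - 1)
                top = (src_h - h) * j // (steps - 1)
                if (left, top, w, h) not in regions:
                    regions.append((left, top, w, h))
    return regions


def select_proposal(regions: List[Region],
                    coarse_score: Callable[[Region], float],
                    fine_score: Callable[[Region], float],
                    threshold: float) -> Optional[Region]:
    """Coarse pass filters by threshold; fine pass picks the best survivor."""
    shortlist = [r for r in regions if coarse_score(r) > threshold]
    if not shortlist:
        return None
    return max(shortlist, key=fine_score)


# Assumed inputs: a 4000 x 3000 source and a subject near the upper right.
SRC_W, SRC_H = 4000, 3000
SUBJECT = (3000, 1200)


def coarse(r: Region) -> float:
    # Toy stand-in for a coarse rule: fraction of the source area retained.
    return r[2] * r[3] / (SRC_W * SRC_H)


def fine(r: Region) -> float:
    # Toy stand-in for a fine rule: prefer regions centered on the subject.
    cx, cy = r[0] + r[2] / 2, r[1] + r[3] / 2
    return -hypot(cx - SUBJECT[0], cy - SUBJECT[1])


regions = candidate_regions(SRC_W, SRC_H, 1080, 1920)
best = select_proposal(regions, coarse, fine, threshold=0.2)
if best is not None:
    # Resize factor that brings the selected region to the target width.
    print(best, 1080 / best[2])
```

Running the cheap coarse score over every proposal and the finer score only over the shortlist reflects the motivation for a two-stage selection: the more expensive evaluation is applied to fewer images.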

Description

BACKGROUND

Recent developments in digital photography have revolutionized how visual content is created, published, shared and consumed, and have contributed to the birth and success of a number of significant visual content-based social networking services, such as Instagram™ and TikTok™. It has become the norm for anyone with a mobile device or digital camera to create, modify, store and share pictures and videos through various IT and social platforms. However, different content-related platforms require or impose different image configuration requirements and restrictions. Even within the same platform, different image configuration requirements and restrictions are often imposed depending on service or function types. For example, within the Instagram™ platform, photo posts are automatically modified to have one image configuration (e.g., 1080 x 1080 pixels) while photo uploads for Instagram™ stories are automatically modified to have a different image configuration (e.g., 1080 x 1920 pixels). Because of these differences among requirements and usage scenarios, visual content created to meet one requirement or usage scenario may not be as aesthetically pleasing or visually effective as the original content when used in other usage scenarios. Hence, when the same visual content or source content is to be used for different usage scenarios (e.g., a magazine page, webpage banner, Facebook™ post, email template, newspaper advertisement, etc.), a user must manually modify the source content to generate a number of variations, ensuring that each variation meets its configuration requirements or restrictions while maintaining the same or similar aesthetical value or visual effectiveness. This requires human intelligence, training, skill and effort, which cannot be easily replicated even with a state-of-the-art machine.
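
To make the configuration mismatch concrete, the following minimal Python sketch computes the largest centered crop of one source image for each of the two example configurations mentioned above. The 4000 x 3000 source size and the function name `largest_centered_crop` are assumptions chosen for illustration. A fixed centered crop is only a naive baseline, not the technique described in this document; it shows how much of the source is discarded when content is ignored.

```python
from fractions import Fraction


def largest_centered_crop(src_w: int, src_h: int, tgt_w: int, tgt_h: int):
    """Return (left, top, width, height) of the largest centered crop of the
    source that has the target's aspect ratio."""
    ratio = Fraction(tgt_w, tgt_h)  # exact width-to-height ratio
    # Try the full source width first; fall back to full height if too tall.
    crop_w = src_w
    crop_h = int(crop_w / ratio)
    if crop_h > src_h:
        crop_h = src_h
        crop_w = int(crop_h * ratio)
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, crop_w, crop_h


# Hypothetical 4000 x 3000 source photo.
square = largest_centered_crop(4000, 3000, 1080, 1080)  # 1:1 post
story = largest_centered_crop(4000, 3000, 1080, 1920)   # 9:16 story
print(square, story)
```

Under these assumptions the 9:16 crop keeps well under half of the source width, so any subject away from the center would be cut off.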
CN 112 017 193 A relates to an image cropping device and method based on visual saliency and aesthetic score. The operation module receives the initial image and the crop aspect ratio, sends the initial image to the saliency detection module, and sends the crop aspect ratio to the crop processing module. The saliency detection module receives the initial image, performs salient area detection to obtain the initial image with a salient target frame, and sends the initial image with the salient target frame to the crop processing module. The crop processing module obtains an initial image with an initial crop frame according to the salient target frame and the crop aspect ratio, and generates an initial image with a set of candidate crop frames based on the initial crop frame; the set of candidate crop frames contains at least one candidate crop frame. Each candidate crop frame is combined with the initial image and cropped to obtain a set of candidate cropped images, which are sent to the aesthetic quality evaluation module. The aesthetic quality evaluation module evaluates the aesthetic quality score of each candidate cropped image, and sends the candidate cropped image with the highest aesthetic quality score to the display module as the final cropped image. The display module receives the final cropped image sent by the aesthetic quality evaluation module, and displays it simultaneously with the initial image.

SUMMARY

It is the object of the present invention to simplify modifying an image to meet image configurational requirements or guidelines imposed by a content sharing, printing or publishing service while reducing computational efforts. This object is solved by the subject matter of the independent claims. Preferred embodiments are defined by the dependent claims. In an implementation, a system for modifying an image includes a processor and a computer-readable medium in communication with the processor.
The computer-readable medium includes instructions that, when executed by the processor, cause the processor to control the system to perform functions of receiving a source image having a first image configuration; determining a second image configuration for a target image; providing, to an artificial intelligence (AI) engine, the received source image, the AI engine trained to perform functions of identifying, based on a set of rules related to visual features, a plurality of candidate regions from the source image, each candidate region showing a different portion of the source image; generating a plurality of regional proposal images based on the plurality of identified candidate regions, respectively, wherein each proposal region image has the second image configuration; determining, based on prior aesthetical evaluation data, an aesthetical value of each regional proposal image; and selecting, based on the determined aesthetical value of each regional proposal image, a first regional proposal image as the target image, the first region proposal image being one of the plurality of reg