EP-4740111-A1 - METHOD, DEVICE, AND MEDIUM FOR DETERMINING IMAGE FOR DISPLAY


Abstract

Implementations of the present disclosure provide a method, device, and medium for determining an image for display. The method comprises obtaining a plurality of candidate images associated with an object. The method further comprises generating a prompt for a language model based on the plurality of candidate images. The method further comprises obtaining a plurality of attractiveness ranks corresponding to the plurality of candidate images by feeding the prompt to the language model. The method further comprises determining a plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks. In addition, the method further comprises determining a target image for display from the plurality of candidate images based on the plurality of probability distributions.

Inventors

ZHANG, QI

Assignees

Beijing Youzhuju Network Technology Co., Ltd.

Dates

Publication Date: 20260513
Application Date: 20250606

Claims (20)

A method for determining an image for display, comprising: obtaining a plurality of candidate images associated with an object; generating a prompt for a language model based on the plurality of candidate images; obtaining a plurality of attractiveness ranks corresponding to the plurality of candidate images by feeding the prompt to the language model; determining a plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks; and determining a target image for display from the plurality of candidate images based on the plurality of probability distributions.
The method of claim 1, wherein generating the prompt for the language model based on the plurality of candidate images comprises: obtaining a description of task objective; obtaining an image list based on the plurality of candidate images; and generating the prompt based on the description of task objective and the image list.
The method of claim 2, wherein generating the prompt based on the description of task objective and the image list comprises: obtaining a scoring criteria associated with the object; and generating the prompt based on the description of task objective, the image list, and the scoring criteria.
The method of claim 3, wherein generating the prompt based on the description of task objective, the image list, and the scoring criteria comprises: obtaining a template of output, the template of output comprising a field of image identification and a field of attractiveness score; and generating the prompt based on the description of task objective, the image list, the scoring criteria, and the template of output.
The method of claim 1, wherein determining the target image for display from the plurality of candidate images based on the plurality of probability distributions comprises: generating a plurality of sample values corresponding to the plurality of candidate images by performing random samplings on the plurality of probability distributions respectively; and determining the target image from the plurality of candidate images based on the plurality of sample values.
The method of claim 5, wherein determining the target image from the plurality of candidate images based on the plurality of sample values comprises: determining a candidate image with a greatest sample value as the target image.
The method of claim 1, wherein determining the plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks comprises: generating a plurality of Beta distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks, a Beta distribution of the plurality of Beta distributions comprising an alpha parameter and a beta parameter, the alpha parameter indicating a number of times that users interact with a candidate image, the beta parameter indicating a number of times that users have not interacted with the candidate image.
The method of claim 7, wherein the plurality of candidate images comprises a first candidate image and a second candidate image, an attractiveness rank of the first candidate image is higher than an attractiveness rank of the second candidate image, and generating the plurality of Beta distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks comprises: initializing a plurality of alpha parameters of the plurality of Beta distributions based on the plurality of attractiveness ranks, wherein a value of an alpha parameter corresponding to the first candidate image is greater than a value of an alpha parameter corresponding to the second candidate image.
The method of claim 8, further comprising: transmitting the target image to a user device for display; receiving feedback data, the feedback data indicating whether a user has interacted with the target image; and updating a target Beta distribution corresponding to the target image based on the feedback data.
The method of claim 9, wherein updating the target Beta distribution corresponding to the target image based on the feedback data comprises: increasing a value of an alpha parameter of the target Beta distribution in response to the feedback data indicating that the user has interacted with the target image; and increasing a value of a beta parameter of the target Beta distribution in response to the feedback data indicating that the user has not interacted with the target image.
An electronic device, comprising: a memory and a processor; wherein the memory is configured to store one or more computer instructions which, when executed by the processor, cause the processor to: obtain a plurality of candidate images associated with an object; generate a prompt for a language model based on the plurality of candidate images; obtain a plurality of attractiveness ranks corresponding to the plurality of candidate images by feeding the prompt to the language model; determine a plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks; and determine a target image for display from the plurality of candidate images based on the plurality of probability distributions.
The device of claim 11, wherein the instructions causing the processor to generate the prompt for the language model based on the plurality of candidate images comprise instructions causing the processor to: obtain a description of task objective; obtain an image list based on the plurality of candidate images; and generate the prompt based on the description of task objective and the image list.
The device of claim 12, wherein the instructions causing the processor to generate the prompt based on the description of task objective and the image list comprise instructions causing the processor to: obtain a scoring criteria associated with the object; and generate the prompt based on the description of task objective, the image list, and the scoring criteria.
The device of claim 13, wherein the instructions causing the processor to generate the prompt based on the description of task objective, the image list, and the scoring criteria comprise instructions causing the processor to: obtain a template of output, the template of output comprising a field of image identification and a field of attractiveness score; and generate the prompt based on the description of task objective, the image list, the scoring criteria, and the template of output.
The device of claim 11, wherein the instructions causing the processor to determine the target image for display from the plurality of candidate images based on the plurality of probability distributions comprise instructions causing the processor to: generate a plurality of sample values corresponding to the plurality of candidate images by performing random samplings on the plurality of probability distributions respectively; and determine the target image from the plurality of candidate images based on the plurality of sample values.
The device of claim 15, wherein the instructions causing the processor to determine the target image from the plurality of candidate images based on the plurality of sample values comprise instructions causing the processor to: determine a candidate image with a greatest sample value as the target image.
The device of claim 11, wherein the instructions causing the processor to determine the plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks comprise instructions causing the processor to: generate a plurality of Beta distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks, a Beta distribution of the plurality of Beta distributions comprising an alpha parameter and a beta parameter, the alpha parameter indicating a number of times that users interact with a candidate image, the beta parameter indicating a number of times that users have not interacted with the candidate image.
The device of claim 17, wherein the plurality of candidate images comprises a first candidate image and a second candidate image, an attractiveness rank of the first candidate image is higher than an attractiveness rank of the second candidate image, and the instructions causing the processor to generate the plurality of Beta distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks comprise instructions causing the processor to: initialize a plurality of alpha parameters of the plurality of Beta distributions based on the plurality of attractiveness ranks, wherein a value of an alpha parameter corresponding to the first candidate image is greater than a value of an alpha parameter corresponding to the second candidate image.
The device of claim 18, wherein the memory is further configured to store instructions causing the processor to: transmit the target image to a user device for display; receive feedback data, the feedback data indicating whether a user has interacted with the target image; and update a target Beta distribution corresponding to the target image based on the feedback data.
A non-transitory computer-readable medium comprising instructions stored thereon which, when executed by a processor, cause the processor to: obtain a plurality of candidate images associated with an object; generate a prompt for a language model based on the plurality of candidate images; obtain a plurality of attractiveness ranks corresponding to the plurality of candidate images by feeding the prompt to the language model; determine a plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks; and determine a target image for display from the plurality of candidate images based on the plurality of probability distributions.
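
The selection loop recited in the claims (rank-based initialization of Beta distributions, one random sample per distribution, choice of the greatest sample, and a feedback update of the displayed image's distribution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `BetaArm`, `init_arms`, `choose_image`, `record_feedback`, and `prior_strength` are hypothetical, and the formula mapping a rank to an initial alpha is an assumption chosen only so that a better rank yields a larger alpha. The claims do not name the technique; the loop resembles what is commonly called Thompson sampling.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta distribution tracked for one candidate image (hypothetical name)."""
    image_id: str
    alpha: float  # grows when users interact with the displayed image
    beta: float   # grows when users see the image but do not interact


def init_arms(ranked_ids, prior_strength=1.0):
    """Build one Beta distribution per image from a best-first ranking.

    Assumption: alpha = prior_strength * (n - position) + 1, so a better
    rank gets a strictly larger alpha; beta starts at 1 for every image.
    """
    n = len(ranked_ids)
    return {
        image_id: BetaArm(image_id, prior_strength * (n - pos) + 1.0, 1.0)
        for pos, image_id in enumerate(ranked_ids)
    }


def choose_image(arms, rng=random):
    """Draw one sample per distribution; return the id with the greatest sample."""
    samples = {
        image_id: rng.betavariate(arm.alpha, arm.beta)
        for image_id, arm in arms.items()
    }
    return max(samples, key=samples.get)


def record_feedback(arms, image_id, interacted):
    """Update only the distribution of the image that was displayed."""
    arm = arms[image_id]
    if interacted:
        arm.alpha += 1.0
    else:
        arm.beta += 1.0


# Usage: "img_b" was ranked most attractive by the language model.
arms = init_arms(["img_b", "img_a", "img_c"])
shown = choose_image(arms, random.Random(0))
record_feedback(arms, shown, interacted=True)
```

Because every image keeps a nonzero chance of producing the greatest sample, lower-ranked images are still shown occasionally, and the feedback updates let observed interactions gradually outweigh the initial ranking.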

Description

METHOD, DEVICE, AND MEDIUM FOR DETERMINING IMAGE FOR DISPLAY

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to U.S. Application No. 18/890,529 filed on September 19, 2024, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

In modern mobile and web applications, several platforms allow users to upload images related to a specific object or subject, and these platforms use various algorithms or criteria to select and display the most relevant or appealing image to other users. For example, some e-commerce platforms may obtain multiple images of a product from different angles or in different settings. The platforms may then select the most visually appealing or contextually relevant image based on several factors. As another example, in some news aggregation platforms, several images might be associated with a single article. These images may include stock photos, author-provided images, or automatically generated thumbnails from video content. The platforms may use algorithms to determine which image to display in the preview.

SUMMARY

In a first aspect according to some implementations of the present disclosure, a method for determining an image for display is provided. The method comprises obtaining a plurality of candidate images associated with an object. The method further comprises generating a prompt for a language model based on the plurality of candidate images. The method further comprises obtaining a plurality of attractiveness ranks corresponding to the plurality of candidate images by feeding the prompt to the language model. The method further comprises determining a plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks. In addition, the method further comprises determining a target image for display from the plurality of candidate images based on the plurality of probability distributions.
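
The prompt-generation step of the first aspect can be illustrated with a short sketch. The claims list the parts a prompt may be built from: a description of the task objective, an image list, scoring criteria, and an output template with an image identification field and an attractiveness score field. Everything else below is an assumption made for illustration: the function names `build_prompt` and `ranks_from_response`, the JSON field names `image_id` and `attractiveness_score`, the section labels, and the idea that the model replies with JSON matching the template. The call to the language model itself is omitted.

```python
import json


def build_prompt(task_objective, image_urls, scoring_criteria=None):
    """Assemble a ranking prompt from the parts named in the claims."""
    # Image list: one numbered entry per candidate image.
    image_list = "\n".join(
        f"{i}: {url}" for i, url in enumerate(image_urls, start=1)
    )
    # Output template with an image identification field and a score field.
    output_template = json.dumps(
        [{"image_id": "<image identification>", "attractiveness_score": "<score>"}],
        indent=2,
    )
    sections = [
        f"Task objective:\n{task_objective}",
        f"Image list:\n{image_list}",
    ]
    if scoring_criteria:
        sections.append(f"Scoring criteria:\n{scoring_criteria}")
    sections.append(f"Output template (JSON):\n{output_template}")
    return "\n\n".join(sections)


def ranks_from_response(response_text):
    """Turn a JSON reply into image ids ordered from most to least attractive."""
    rows = json.loads(response_text)
    rows.sort(key=lambda row: float(row["attractiveness_score"]), reverse=True)
    return [str(row["image_id"]) for row in rows]


# Usage with a hand-written reply standing in for the language model output.
prompt = build_prompt(
    "Rank the images of this product by how attractive they are to shoppers.",
    ["a.jpg", "b.jpg"],
    scoring_criteria="Lighting, sharpness, and how clearly the product is shown.",
)
reply = '[{"image_id": 1, "attractiveness_score": 0.2}, {"image_id": 2, "attractiveness_score": 0.9}]'
ranking = ranks_from_response(reply)
```

Requesting a fixed output template keeps the reply machine-readable, so the scores can be sorted into ranks without free-text parsing.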

In a second aspect according to some implementations of the present disclosure, an electronic device comprising a memory and a processor is provided. The memory is configured to store computer instructions which, when executed by the processor, cause the processor to obtain a plurality of candidate images associated with an object. The instructions further cause the processor to generate a prompt for a language model based on the plurality of candidate images. The instructions further cause the processor to obtain a plurality of attractiveness ranks corresponding to the plurality of candidate images by feeding the prompt to the language model. The instructions further cause the processor to determine a plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks. In addition, the instructions further cause the processor to determine a target image for display from the plurality of candidate images based on the plurality of probability distributions.

In a third aspect according to some implementations of the present disclosure, a non-transitory computer-readable medium is provided. The medium comprises instructions stored thereon which, when executed by a processor, cause the processor to obtain a plurality of candidate images associated with an object. The instructions further cause the processor to generate a prompt for a language model based on the plurality of candidate images. The instructions further cause the processor to obtain a plurality of attractiveness ranks corresponding to the plurality of candidate images by feeding the prompt to the language model. The instructions further cause the processor to determine a plurality of probability distributions corresponding to the plurality of candidate images based on the plurality of attractiveness ranks. In addition, the instructions further cause the processor to determine a target image for display from the plurality of candidate images based on the plurality of probability distributions.

Any of the one or more above aspects in combination with any other of the one or more aspects. Any of the one or more aspects as described herein.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Additional aspects, features, and/or advantages of examples will be set forth in part in the following description and, in part, will be apparent from the description, or may be learned by practice of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

Implementations of the present disclosure may be understood from the following Detailed Description when read with the accompanying figures. In accordance with the standard practice in the industry, various features are not dra