CN-122020189-A - Method, device, medium and equipment for determining the generation style of a text-to-image model
Abstract
Embodiments of the disclosure provide a method, apparatus, medium, and device for determining the generation style of a text-to-image model. The method comprises: acquiring an original text that instructs image generation; obtaining, based on a preset large model, N matched texts corresponding to one or more prediction styles of the image to be generated from the original text and to the N subdivision styles of each prediction style; inputting the original text into a target text-to-image model to obtain a target image; for each prediction style, combining the target image with each matched text and with the original text to obtain N+1 image-text pairs; inputting each image-text pair into a preset image-text matching model to obtain an image-text matching score for each pair; and determining, according to the image-text matching scores, whether the prediction style is a real style of images generated by the text-to-image model. With this method, the style of images generated by a text-to-image model can be determined efficiently and automatically, reducing the time and labor costs involved.
Inventors
- HUANG ZHENYU
- YAO DONGYU
- LI HANYU
- LI MING
- MOU YALING
- JIANG RUIJIE
- QI LE
- SUN JINGWEI
Assignees
- 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20241106
Claims (10)
- 1. A method of determining a generation style of a text-to-image model, comprising: acquiring an original text that instructs image generation, and obtaining, based on a preset large model, N matched texts corresponding to one or more prediction styles of the image to be generated from the original text and to the N subdivision styles of each prediction style, wherein N is a natural number greater than or equal to 1; inputting the original text into a target text-to-image model to obtain a target image; and, for each prediction style, combining the target image with each matched text corresponding to the prediction style and with the original text to obtain N+1 image-text pairs, inputting each image-text pair into a preset image-text matching model to obtain an image-text matching score for each pair, and determining, according to the image-text matching scores, whether the prediction style is a real style of images generated by the text-to-image model.
- 2. The method of claim 1, wherein obtaining, based on the preset large model, the N matched texts corresponding to the one or more prediction styles of the image generated from the original text and to the N subdivision styles of each prediction style comprises: constructing a first prompt and inputting the first prompt into the preset large model to obtain the one or more prediction styles and the N matched texts corresponding to the N subdivision styles of each prediction style, wherein the first prompt comprises the original text and a first prompt part, the first prompt part instructing the large model to infer, from the original text, the one or more prediction styles that the generated image will present and the N matched texts corresponding to the N subdivision styles of each prediction style.
- 3. The method of claim 1, wherein determining whether the prediction style is a real style of images generated by the text-to-image model according to the image-text matching scores comprises: determining, from the image-text pairs according to their respective image-text matching scores, a first image-text pair having the highest image-text matching score; and, if the first image-text pair is obtained by combining any one of the N matched texts with the target image, determining that the prediction style is a real style of images generated by the text-to-image model.
- 4. The method of claim 3, further comprising: if the first image-text pair is obtained by combining the original text with the target image, determining that the prediction style is not a real style of images generated by the text-to-image model.
- 5. The method of claim 2, further comprising: if the prediction style is determined to be a real style of images generated by the text-to-image model, determining the subdivision style, within that real style, of the generated image according to the subdivision style corresponding to the matched text in the first image-text pair.
- 6. The method of claim 1, wherein the first prompt further comprises a second prompt part, the second prompt part providing an example text instructing image generation, a style instance of an example image generated from the example text, and instances of a plurality of matched texts corresponding to a plurality of subdivision styles of the style instance.
- 7. The method of claim 1, wherein obtaining, based on the preset large model, the N matched texts corresponding to the one or more prediction styles of the image generated from the original text and to the N subdivision styles of each prediction style comprises: retrieving, from a preset knowledge base according to the original text, the one or more prediction styles and the N matched texts corresponding to the N subdivision styles of each prediction style, wherein the knowledge base pre-associates and stores the original text, the one or more prediction styles obtained based on the preset large model, and the plurality of matched texts corresponding to the subdivision styles of each prediction style.
- 8. An apparatus for determining a generation style of a text-to-image model, comprising: an acquisition unit configured to acquire an original text that instructs image generation and to obtain, based on a preset large model, N matched texts corresponding to one or more prediction styles of the image to be generated from the original text and to the N subdivision styles of each prediction style, wherein N is a natural number greater than or equal to 1; and a determination unit configured to input the original text into a target text-to-image model to obtain a target image, to combine, for each prediction style, the target image with each matched text corresponding to the prediction style and with the original text to obtain N+1 image-text pairs, to input each image-text pair into a preset image-text matching model to obtain an image-text matching score for each pair, and to determine, according to the image-text matching scores, whether the prediction style is a real style of images generated by the text-to-image model.
- 9. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-7.
- 10. An electronic device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-7.
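The decision procedure of claims 1 and 3-5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the large model, text-to-image model, and image-text matching model are stood in for by caller-supplied placeholders, and all function and parameter names are hypothetical.

```python
# Hypothetical sketch of the style-determination step: score the N+1
# image-text pairs and decide whether the prediction style is "real".
from typing import Callable, List, Tuple

def determine_real_style(
    original_text: str,
    matched_texts: List[str],       # N matched texts, one per subdivision style
    subdivision_styles: List[str],  # the N subdivision style names
    target_image: object,           # image generated from original_text
    match_score: Callable[[object, str], float],  # image-text matching model
) -> Tuple[bool, str]:
    """Return (is_real_style, subdivision_style_or_empty_string)."""
    # Build the N+1 image-text pairs: the target image combined with each
    # matched text, plus the target image combined with the original text.
    candidates = matched_texts + [original_text]
    scores = [match_score(target_image, text) for text in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    if best == len(matched_texts):
        # The original-text pair scored highest: the prediction style is
        # not a real style of the model's generated images (claim 4).
        return False, ""
    # A matched-text pair scored highest: the prediction style is real, and
    # its subdivision style is the one tied to that text (claims 3 and 5).
    return True, subdivision_styles[best]
```

In practice the `match_score` placeholder could be any image-text similarity model (a CLIP-style scorer is one common choice); the decision itself is just an argmax over the N+1 scores.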
Description
Method, device, medium and equipment for determining the generation style of a text-to-image model

Technical Field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a method, apparatus, medium, and device for determining the generation style of a text-to-image model.

Background
A text-to-image generation model is an artificial intelligence model capable of generating image content from a text description; such models are widely applied in fields such as painting, architectural design, and educational assistance. However, because of differences in training-data sources, model configuration parameters, and the like, current text-to-image models often exhibit specific style types that are difficult to avoid. For example, images generated by some text-to-image models often have a particular pictorial style, such as photorealistic, cartoon, or abstract. Other text-to-image models typically generate images with a specific composition style, such as symmetrical composition or rule-of-thirds composition. When using a text-to-image model, a user often needs a long period of use to learn the style of the model's generated images, and it is difficult to determine that style accurately and efficiently.

Disclosure of Invention
Embodiments of the disclosure describe a method, apparatus, medium, and device for determining the generation style of a text-to-image model.
According to a first aspect, there is provided a method of determining a generation style of a text-to-image model, comprising: acquiring an original text that instructs image generation, and obtaining, based on a preset large model, N matched texts corresponding to one or more prediction styles of the image to be generated from the original text and to the N subdivision styles of each prediction style, wherein N is a natural number greater than or equal to 1; inputting the original text into a target text-to-image model to obtain a target image; and, for each prediction style, combining the target image with each matched text corresponding to the prediction style and with the original text to obtain N+1 image-text pairs, inputting each image-text pair into a preset image-text matching model to obtain an image-text matching score for each pair, and determining, according to the image-text matching scores, whether the prediction style is a real style of images generated by the text-to-image model.

According to a second aspect, there is provided an apparatus for determining a generation style of a text-to-image model, comprising: an acquisition unit configured to acquire an original text that instructs image generation and to obtain, based on a preset large model, N matched texts corresponding to one or more prediction styles of the image to be generated from the original text and to the N subdivision styles of each prediction style, wherein N is a natural number greater than or equal to 1; and a determination unit configured to input the original text into a target text-to-image model to obtain a target image, to combine, for each prediction style, the target image with each matched text corresponding to the prediction style and with the original text to obtain N+1 image-text pairs, to input each image-text pair into a preset image-text matching model to obtain an image-text matching score for each pair, and to determine, according to the image-text matching scores, whether the prediction style is a real style of images generated by the text-to-image model.

According to a third aspect, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, there is provided an electronic device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method of the first aspect.

With the methods, apparatuses, devices, and media according to embodiments of the present disclosure: first, an original text instructing image generation is acquired, and a plurality of matched texts corresponding to one or more prediction styles of the image to be generated from the original text and to a plurality of subdivision styles of each prediction style are obtained based on a preset large model. The original text is input into a target text-to-image model to obtain a target image. Then, for each prediction style, the target image is combined with each matched text corresponding to the prediction style and with the original text to obtain a plurality of image-text pairs; each image-text pair is input into a preset image-text matching model to obtain an image-text matching score for each pair; and whether the prediction style is a real style of the image generated by the text-to-image model is determined according to the image-text matching scores.
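The "first prompt" of claims 2 and 6 could be assembled as below. This is an illustrative sketch only: the exact prompt wording is not given in the source, so the instruction text, function name, and parameters here are assumptions; the optional one-shot example stands in for the "second prompt part" of claim 6.

```python
# Hypothetical construction of the first prompt: the original text plus a
# first prompt part (the inference instruction), optionally preceded by a
# second prompt part (a worked example). All wording is illustrative.
from typing import Optional

def build_first_prompt(original_text: str, n: int,
                       example: Optional[str] = None) -> str:
    # First prompt part: instructs the large model to infer the prediction
    # styles and produce N matched texts per style (claim 2).
    instruction = (
        f"Given the image-generation prompt below, infer one or more styles "
        f"the generated image is likely to present. For each style, list {n} "
        f"subdivision styles and, for each subdivision style, a matched text "
        f"combining the prompt with that subdivision style."
    )
    parts = []
    if example:
        # Second prompt part: example text, the style instance of its
        # generated image, and example matched texts (claim 6).
        parts.append("Example:\n" + example)
    parts.append(instruction)
    parts.append("Prompt: " + original_text)
    return "\n\n".join(parts)
```

The knowledge-base variant of claim 7 would simply replace the large-model call with a lookup keyed on the original text, returning prediction styles and matched texts that were pre-computed with such a prompt.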