CN-121996743-A - Data generation method and task platform

Abstract

Embodiments of this specification provide a data generation method and a task platform. The data generation method comprises: acquiring a target image and seed question-answer information corresponding to the target image; acquiring image description information corresponding to the target image according to the target image; generating a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and a preset prompt text template, wherein the preset prompt text template comprises a prompt evolution type used to increase question-answer diversity; and inputting the question-answer prompt text into a question-answer prediction model and obtaining predicted question-answer information for the target image output by the question-answer prediction model, wherein the predicted question-answer information comprises a predicted question and a predicted answer corresponding to the predicted question. The method addresses the high cost and uneven quality of manual annotation and the overly simple data produced by model generation in current approaches, and improves the quality of the predicted question-answer information.

Inventors

Lin Tingen
LUO RUN
ZHANG HAONAN
CHEN LONGZE
LIU XIONG
WU YUCHUAN
HUANG FEI
LI YONGBIN

Assignees

Alibaba (China) Co., Ltd. (阿里巴巴（中国）有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2024-11-04

Claims (17)

1. A data generation method, comprising: acquiring a target image and seed question-answer information corresponding to the target image, and acquiring image description information corresponding to the target image according to the target image; generating a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and a preset prompt text template, wherein the preset prompt text template comprises a prompt evolution type used to increase question-answer diversity; and inputting the question-answer prompt text into a question-answer prediction model, and obtaining predicted question-answer information for the target image output by the question-answer prediction model, wherein the predicted question-answer information comprises a predicted question and a predicted answer corresponding to the predicted question.
2. The method of claim 1, further comprising: inputting the predicted question-answer information and the target image into a question-answer information verification model, wherein the question-answer information verification model is used to determine whether the predicted answer is correct; obtaining a verification result output by the question-answer information verification model; and, when the verification result indicates that the predicted answer is correct, determining the predicted question-answer information as target question-answer information of the target image.
3. The method of claim 2, further comprising: taking the target question-answer information as the seed question-answer information, and continuing to perform the operation of generating a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and the preset prompt text template.
4. The method of claim 1, further comprising: acquiring target evolution demand information; and determining the prompt evolution type according to the target evolution demand information, and generating the preset prompt text template according to the prompt evolution type.
5. The method of claim 4, wherein acquiring the target evolution demand information comprises: acquiring at least one piece of reference evolution demand information; and randomly determining the target evolution demand information from the at least one piece of reference evolution demand information.
6. The method of claim 4, wherein generating the preset prompt text template according to the prompt evolution type comprises: acquiring a preset evolution capability library, wherein the preset evolution capability library comprises at least one piece of preset evolution capability information; selecting at least one piece of target evolution capability information from the at least one piece of preset evolution capability information; and generating the preset prompt text template according to the prompt evolution type, the preset evolution capability library, and each piece of target evolution capability information.
7. The method of claim 6, wherein generating the preset prompt text template according to the prompt evolution type, the preset evolution capability library, and each piece of target evolution capability information comprises: determining a prompt evolution parameter corresponding to the prompt evolution type; and splicing the prompt evolution parameter, the preset evolution capability library, and each piece of target evolution capability information to generate the preset prompt text template.
8. The method of claim 6, wherein the information type of the preset evolution capability information includes an image recognition capability and a text reasoning capability.
9. The method of claim 8, wherein the image recognition capability comprises a location capability, a reference capability, a computing capability, a text recognition capability, and a presence determination capability; and the text reasoning capability comprises a relationship description capability, a scene understanding capability, a behavior prediction capability, and a knowledge correlation capability.
10. The method of claim 1, wherein the prompt evolution type comprises any one of a cognitive reasoning evolution type, an interactive evolution type, and a fine-grained perceptual evolution type.
11. The method of claim 2, further comprising: forming an image-text question-answer data pair from the target image and the target question-answer information; and training a multimodal large language model using the image-text question-answer data pair.
12. A data generation method, applied to a cloud-side device, comprising: receiving a target image and seed question-answer information corresponding to the target image sent by a terminal-side device, and acquiring image description information corresponding to the target image according to the target image; generating a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and a preset prompt text template, wherein the preset prompt text template comprises a prompt evolution type used to increase question-answer diversity; inputting the question-answer prompt text into a question-answer prediction model, and obtaining predicted question-answer information for the target image output by the question-answer prediction model, wherein the predicted question-answer information comprises a predicted question and a predicted answer corresponding to the predicted question; and sending the predicted question-answer information to the terminal-side device.
13. The method of claim 12, further comprising: inputting the predicted question-answer information and the target image into a question-answer information verification model, wherein the question-answer information verification model is used to determine whether the predicted answer is correct; obtaining a verification result output by the question-answer information verification model; when the verification result indicates that the predicted answer is correct, determining the predicted question-answer information as target question-answer information of the target image; and sending the target question-answer information to the terminal-side device.
14. A task platform, comprising a request interface and a response unit, wherein: the request interface is configured to receive a target image and seed question-answer information corresponding to the target image sent by a terminal-side device, and to acquire image description information corresponding to the target image according to the target image; and the response unit is configured to generate a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and a preset prompt text template, wherein the preset prompt text template comprises a prompt evolution type used to increase question-answer diversity, to input the question-answer prompt text into a question-answer prediction model, and to obtain predicted question-answer information for the target image output by the question-answer prediction model, wherein the predicted question-answer information comprises a predicted question and a predicted answer corresponding to the predicted question.
15. A computing device, comprising: a memory and a processor, wherein the memory is configured to store a computer program/instructions and the processor is configured to execute the computer program/instructions, which, when executed by the processor, implement the steps of the method of any one of claims 1 to 13.
16. A computer readable storage medium storing a computer program/instruction which, when executed by a processor, implements the steps of the method of any one of claims 1 to 13.
17. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 13.
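
The template construction recited in claims 4 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name build_template, the instruction wording attached to each evolution type, the placeholder names ({description}, {seed_question}, {seed_answer}), and the simplification that each piece of evolution demand information directly names an evolution type are all hypothetical. The evolution type names and capability names are taken from claims 9 and 10.

```python
import random

# Evolution types named in claim 10. The instruction wording attached to
# each type (the "prompt evolution parameter") is hypothetical.
EVOLUTION_PARAMETERS = {
    "cognitive_reasoning": "Rewrite the question so that answering it requires multi-step reasoning about the image.",
    "interactive": "Rewrite the question as a different kind of interaction with the image.",
    "fine_grained_perception": "Rewrite the question so that it targets a small visual detail.",
}

# Capability names from claim 9, grouped by the two types in claim 8.
CAPABILITY_LIBRARY = {
    "image recognition": ["location", "reference", "computing",
                          "text recognition", "presence determination"],
    "text reasoning": ["relationship description", "scene understanding",
                       "behavior prediction", "knowledge correlation"],
}


def build_template(reference_demands, rng, num_capabilities=2):
    """Return (evolution_type, target_capabilities, template)."""
    # Claim 5: the target demand is chosen at random from the references.
    # Assumption: each demand directly names an evolution type.
    evolution_type = rng.choice(reference_demands)
    # Claim 7: determine the parameter that corresponds to the type.
    parameter = EVOLUTION_PARAMETERS[evolution_type]
    # Claim 6: select target capabilities from the preset library.
    all_capabilities = [c for group in CAPABILITY_LIBRARY.values() for c in group]
    targets = rng.sample(all_capabilities, num_capabilities)
    library_text = "; ".join(
        f"{kind}: {', '.join(names)}" for kind, names in CAPABILITY_LIBRARY.items()
    )
    # Claim 7: splice parameter, library, and targets into one template.
    # The brace placeholders are left unfilled for later use.
    template = "\n".join([
        parameter,
        f"Available capabilities: {library_text}",
        f"Focus on: {', '.join(targets)}",
        "Image description: {description}",
        "Seed question: {seed_question}",
        "Seed answer: {seed_answer}",
    ])
    return evolution_type, targets, template
```

Passing a seeded random.Random instance as rng keeps the random selection reproducible, which may be convenient when auditing which evolution type produced a given question-answer pair.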

Description

Data generation method and task platform

Technical Field

The embodiments of this specification relate to the field of computer technology, and in particular to a data generation method.

Background

With the development of computer technology, multimodal large language models have also advanced, and the quality of image-text instruction data plays an important role in the capability of such models on many vision-language tasks. Current image-text instruction data is limited in quantity and low in quality, which has become a bottleneck in the development of multimodal large language models. Existing approaches are manual annotation and model generation: manually annotated data is costly and uneven in quality, while model-generated data is affected by shallow reasoning, tends to be simple, and cannot guarantee the complexity and richness of the image-text instruction data.

Disclosure of Invention

In view of this, the embodiments of this specification provide a data generation method. One or more embodiments of this specification also relate to a task platform, a computing device, a computer-readable storage medium, and a computer program product, to address the technical deficiencies in the prior art.
According to a first aspect of the embodiments of this specification, there is provided a data generation method, comprising: acquiring a target image and seed question-answer information corresponding to the target image, and acquiring image description information corresponding to the target image according to the target image; generating a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and a preset prompt text template, wherein the preset prompt text template comprises a prompt evolution type used to increase question-answer diversity; and inputting the question-answer prompt text into a question-answer prediction model, and obtaining predicted question-answer information for the target image output by the question-answer prediction model, wherein the predicted question-answer information comprises a predicted question and a predicted answer corresponding to the predicted question.
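
The three steps of the first aspect, together with the verification and re-seeding of claims 2 and 3, can be sketched as follows. This is a minimal sketch under stated assumptions: describe_image, qa_model, and verifier are hypothetical callables standing in for the image description step, the question-answer prediction model, and the question-answer information verification model, and the template placeholders ({description}, {seed_question}, {seed_answer}) are illustrative names that do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str


def generate_qa(image, seed, template, describe_image, qa_model):
    """One pass of the three steps in the first aspect."""
    # Step 1: obtain image description information from the target image.
    description = describe_image(image)
    # Step 2: fill the preset prompt text template with the description
    # and the seed question-answer information.
    prompt = template.format(
        description=description,
        seed_question=seed.question,
        seed_answer=seed.answer,
    )
    # Step 3: the prediction model receives the image and the prompt
    # (assumption: the image is passed alongside the text).
    return qa_model(image, prompt)


def evolve(image, seed, template, describe_image, qa_model, verifier, rounds=3):
    """Claims 2 and 3: keep verified pairs and reuse each as the next seed."""
    accepted = []
    for _ in range(rounds):
        candidate = generate_qa(image, seed, template, describe_image, qa_model)
        if verifier(image, candidate):
            accepted.append(candidate)
            seed = candidate  # a verified pair becomes the new seed
    return accepted
```

In this sketch a rejected candidate leaves the seed unchanged, so the next round retries from the last verified pair. The specification does not state what happens after a failed verification, so that choice is an assumption.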
According to a second aspect of the embodiments of this specification, there is provided a data generation method, applied to a cloud-side device, comprising: receiving a target image and seed question-answer information corresponding to the target image sent by a terminal-side device, and acquiring image description information corresponding to the target image according to the target image; generating a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and a preset prompt text template, wherein the preset prompt text template comprises a prompt evolution type used to increase question-answer diversity; inputting the question-answer prompt text into a question-answer prediction model, and obtaining predicted question-answer information for the target image output by the question-answer prediction model, wherein the predicted question-answer information comprises a predicted question and a predicted answer corresponding to the predicted question; and sending the predicted question-answer information to the terminal-side device.
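
The exchange between the terminal-side device and the cloud-side device in the second aspect might look like the following. The specification does not define a message format, so the JSON schema (the image_b64, seed_qa, and predicted_qa fields), the function name handle_request, and the describe_image and qa_model callables are all hypothetical and serve only to illustrate the receive, generate, and send steps.

```python
import base64
import json


def handle_request(raw_request, describe_image, qa_model, template):
    """Cloud-side handling of one terminal-side request (hypothetical schema)."""
    # Receive: decode the target image and the seed question-answer information.
    request = json.loads(raw_request)
    image = base64.b64decode(request["image_b64"])
    seed = request["seed_qa"]
    # Acquire image description information from the target image.
    description = describe_image(image)
    # Generate the question-answer prompt text from the preset template.
    prompt = template.format(
        description=description,
        seed_question=seed["question"],
        seed_answer=seed["answer"],
    )
    # Predict: the model is assumed to return a (question, answer) tuple.
    question, answer = qa_model(image, prompt)
    # Send: serialize the predicted question-answer information for the terminal.
    return json.dumps({"predicted_qa": {"question": question, "answer": answer}})
```

Keeping the handler free of any transport code means the same function could sit behind an HTTP endpoint, a message queue consumer, or a batch job; which of these the cloud-side device uses is not stated in the specification.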
According to a third aspect of the embodiments of this specification, there is provided a task platform, comprising a request interface and a response unit, wherein: the request interface is configured to receive a target image and seed question-answer information corresponding to the target image sent by a terminal-side device, and to acquire image description information corresponding to the target image according to the target image; and the response unit is configured to generate a question-answer prompt text according to the target image, the image description information, the seed question-answer information, and a preset prompt text template, wherein the preset prompt text template comprises a prompt evolution type used to increase question-answer diversity, to input the question-answer prompt text into a question-answer prediction model, and to obtain predicted question-answer information for the target image output by the question-answer prediction model, wherein the predicted question-answer information comprises a predicted question and a predicted answer corresponding to the predicted question.
According to a fourth aspect of the embodiments of this specification, there is provided a computing device, comprising: a memory and a processor, wherein the memory is configured to store a computer program/instructions and the processor is configured to execute the computer program/instructions, which, when executed by the processor, implement the steps of the data generation method described above.
According to a fifth aspe