CN-122023571-A - Picture generation method and device comprising hands, electronic equipment and storage medium


Abstract

The application discloses a method, a device, electronic equipment and a storage medium for generating a picture that includes a hand. The method comprises: generating a first picture based on text information, the first picture including a first hand; performing hand key point detection processing on the first picture to obtain first skeleton information of the first hand; matching the first skeleton information with second skeleton information of hand skeletons in a preset skeleton library to determine a target hand skeleton from the hand skeletons in the preset skeleton library; and using the target hand skeleton as auxiliary guide information to guide generation of a second picture based on the text information. The second picture includes a second hand, the similarity between the second hand and the first hand meets a preset similarity, and the abnormality rate of the first hand is greater than that of the second hand. By regenerating the picture with the matched skeleton, the embodiments of the application reduce the likelihood of hand deformity, keep the picture consistent, and avoid the color and edge problems caused by local generation.

Inventors

SU RONG

Assignees

Beijing Sogou Technology Development Co., Ltd. (北京搜狗科技发展有限公司)

Dates

Publication Date: 20260512
Application Date: 20241106

Claims (13)

1. A method of generating a picture including a hand, the method comprising: generating a first picture based on text information, wherein the first picture comprises a first hand; performing hand key point detection processing on the first picture to obtain first skeleton information of the first hand; matching the first skeleton information with second skeleton information of hand skeletons in a preset skeleton library, and determining a target hand skeleton from the hand skeletons in the preset skeleton library; and using the target hand skeleton as auxiliary guide information to guide generation of a second picture based on the text information, wherein the picture representation of the first picture and the picture representation of the second picture both correspond to the text information, the second picture comprises a second hand, the similarity between the second hand and the first hand meets a preset similarity, and the abnormality rate of the first hand is greater than that of the second hand.
2. The method of generating a picture including a hand according to claim 1, wherein the generating a first picture based on the text information includes: inputting the text information into a first picture generation model to generate the first picture; and the using the target hand skeleton as auxiliary guide information to guide generation of a second picture based on the text information includes: inputting the target hand skeleton and the text information into a second picture generation model to generate the second picture, wherein the second picture generation model comprises the first picture generation model and an auxiliary guide structure, and the auxiliary guide structure uses the target hand skeleton as the auxiliary guide information to guide the first picture generation model to generate the second picture based on the text information.
3. The method of generating a picture including a hand according to claim 2, wherein the hand skeletons in the preset skeleton library are determined from an original hand skeleton set based on the second picture generation model.
4. The method of generating a picture including a hand according to claim 3, further comprising: obtaining an original hand skeleton set and an original text information set, wherein the original hand skeleton set comprises a plurality of original hand skeletons, and the original text information set comprises a plurality of pieces of original text information; performing picture generation processing on the original hand skeleton set and the original text information set by using the second picture generation model to obtain a plurality of pictures to be processed; and determining the hand skeletons in the preset skeleton library from the original hand skeleton set according to the hand representation states in the plurality of pictures to be processed.
5. The method of generating a picture including a hand according to claim 4, wherein the performing picture generation processing on the original hand skeleton set and the original text information set by using the second picture generation model to obtain a plurality of pictures to be processed includes: for each original hand skeleton in the original hand skeleton set, performing: taking the original hand skeleton currently being processed as a current hand skeleton; determining a first number of pieces of original text information from the original text information set; and inputting the current hand skeleton into the auxiliary guide structure as original guide information, to guide the first picture generation model to generate, based on the first number of pieces of original text information, a first number of pictures to be processed corresponding to the current hand skeleton.
6. The method of generating a picture including a hand according to claim 4, wherein the determining the hand skeletons in the preset skeleton library from the original hand skeleton set according to the hand representation states in the plurality of pictures to be processed includes: determining a second number of pictures to be processed from the first number of pictures to be processed corresponding to each original hand skeleton, wherein the hand representation state of each of the second number of pictures to be processed meets a preset hand condition; and determining, from the original hand skeleton set, the original hand skeletons that meet a quantity condition as the hand skeletons in the preset skeleton library, wherein the quantity condition characterizes that the ratio of the second number to the first number satisfies a preset ratio.
7. The method of generating a picture including a hand according to any one of claims 1-6, further comprising: performing hand key point detection processing on the hand skeletons in the preset skeleton library to obtain the second skeleton information of the hand skeletons in the preset skeleton library, wherein the second skeleton information comprises key point positions and key point spatial relations; and performing standardization processing on the second skeleton information to obtain standardized second skeleton information.
8. The method of generating a picture including a hand according to claim 7, wherein the matching the first skeleton information with the second skeleton information of the hand skeletons in the preset skeleton library, and determining the target hand skeleton from the hand skeletons in the preset skeleton library, includes: performing standardization processing on the first skeleton information to obtain standardized first skeleton information; and matching the standardized first skeleton information with the standardized second skeleton information of the hand skeletons in the preset skeleton library, and determining the target hand skeleton from the hand skeletons in the preset skeleton library.
9. The method of generating a picture including a hand according to any one of claims 1 to 6, wherein the target hand skeleton is presented in the form of an image; and the using the target hand skeleton as auxiliary guide information to guide generation of a second picture based on the text information, the second picture comprising a second hand, includes: scaling the target hand skeleton proportionally according to the proportion of the first hand; and using the scaled target hand skeleton as the auxiliary guide information to guide generation of the second picture based on the text information.
10. A picture generation device including a hand, the device comprising: a first picture generation module, configured to generate a first picture based on text information, wherein the first picture comprises a first hand; a key point detection module, configured to perform hand key point detection processing on the first picture to obtain first skeleton information of the first hand; a skeleton matching module, configured to match the first skeleton information with second skeleton information of hand skeletons in a preset skeleton library and determine a target hand skeleton from the hand skeletons in the preset skeleton library; and a second picture generation module, configured to use the target hand skeleton as auxiliary guide information to guide generation of a second picture based on the text information, wherein the picture representation of the first picture and the picture representation of the second picture both correspond to the text information, the second picture comprises a second hand, the similarity between the second hand and the first hand meets a preset similarity, and the abnormality rate of the first hand is greater than that of the second hand.
11. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the method for generating a hand-containing picture according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the hand-containing picture generation method of any one of claims 1-9.
13. A computer program, wherein the computer program when executed by a processor implements the method for generating a hand-containing picture according to any one of claims 1 to 9.
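As an illustration only, the library screening of claims 4 to 6 can be sketched as a pass-ratio filter. The claims do not fix how pictures are generated or how a hand is judged acceptable, so `generate` and `hand_is_ok` are hypothetical callables supplied by the caller, and `min_pass_ratio` is an assumed stand-in for the "preset ratio"; the name `build_skeleton_library` is likewise invented for this sketch.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # an original hand skeleton (representation is an assumption)
P = TypeVar("P")  # a generated picture to be processed


def build_skeleton_library(
    skeletons: Sequence[S],
    prompts: Sequence[str],
    generate: Callable[[S, str], P],   # hypothetical: guided generation
    hand_is_ok: Callable[[P], bool],   # hypothetical: preset hand condition
    min_pass_ratio: float = 0.8,       # assumed value for the preset ratio
) -> List[S]:
    """Keep the skeletons whose guided generations mostly yield acceptable hands."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    kept: List[S] = []
    for skeleton in skeletons:
        # First number: one picture per selected prompt for this skeleton.
        pictures = [generate(skeleton, prompt) for prompt in prompts]
        first_number = len(pictures)
        # Second number: pictures whose hand meets the preset hand condition.
        second_number = sum(1 for picture in pictures if hand_is_ok(picture))
        # Quantity condition: ratio of second number to first number.
        if second_number / first_number >= min_pass_ratio:
            kept.append(skeleton)
    return kept
```

Here `prompts` plays the role of the first number of pieces of original text information already selected from the original text information set; a skeleton survives only if enough of its generated pictures pass the check.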

Description

Picture generation method and device comprising hands, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular to a method and apparatus for generating a picture including a hand, an electronic device, and a storage medium.

Background

With the development of artificial intelligence (AI) technology, AI data processing models of many kinds have advanced rapidly. AI image generation, an important part of that technology, is iterating quickly and has entered a period of explosive growth: the general public now creates images with AI in an attempt to reduce cost or increase productivity.

However, deformity is currently one of the most important problems facing the automated production and practical deployment of AI images. For example, a person has ten fingers, the fingers vary in length and direction, and hands often interact with a variety of objects, all of which makes it difficult for a model to learn hand features. Hands also occupy a small area of the image, so the success rate of generating them is low, and hand deformities seriously affect the usability of the generated images.

Disclosure of Invention

In order to solve the problems in the prior art, embodiments of the invention provide a method, a device, electronic equipment and a storage medium for generating a picture including a hand.
The technical solution is as follows: In one aspect, a method for generating a picture including a hand is provided, the method including: generating a first picture based on text information, wherein the first picture comprises a first hand; performing hand key point detection processing on the first picture to obtain first skeleton information of the first hand; matching the first skeleton information with second skeleton information of hand skeletons in a preset skeleton library, and determining a target hand skeleton from the hand skeletons in the preset skeleton library; and using the target hand skeleton as auxiliary guide information to guide generation of a second picture based on the text information, wherein the picture representation of the first picture and the picture representation of the second picture both correspond to the text information, the second picture comprises a second hand, the similarity between the second hand and the first hand meets a preset similarity, and the abnormality rate of the first hand is greater than that of the second hand.
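The matching step is described above only as comparing first and second skeleton information. As a minimal sketch, assuming each skeleton is a list of 2D key points with the wrist at index 0, both skeletons can be standardized for position and scale and the library entry with the smallest mean key point distance selected. The key point layout, the distance measure, and the names `normalize`, `distance` and `match_skeleton` are all assumptions for illustration, not details given in the application.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def normalize(keypoints: Sequence[Point]) -> List[Point]:
    """Move key point 0 (assumed wrist) to the origin and scale to unit size."""
    x0, y0 = keypoints[0]
    shifted = [(x - x0, y - y0) for x, y in keypoints]
    scale = max(math.hypot(x, y) for x, y in shifted)
    if scale == 0:
        raise ValueError("degenerate skeleton: all key points coincide")
    return [(x / scale, y / scale) for x, y in shifted]


def distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding key points."""
    if len(a) != len(b):
        raise ValueError("skeletons must have the same number of key points")
    total = sum(math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


def match_skeleton(first: Sequence[Point],
                   library: Dict[str, Sequence[Point]]) -> str:
    """Return the name of the library skeleton closest to the detected hand."""
    if not library:
        raise ValueError("skeleton library is empty")
    query = normalize(first)
    return min(library, key=lambda name: distance(query, normalize(library[name])))
```

This standardization removes translation and scale only; it is not rotation invariant, so a rotated hand would need either an additional alignment step or library entries covering several orientations.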
In another aspect, there is provided a picture generation apparatus including a hand, the apparatus including: a first picture generation module, configured to generate a first picture based on text information, wherein the first picture comprises a first hand; a key point detection module, configured to perform hand key point detection processing on the first picture to obtain first skeleton information of the first hand; a skeleton matching module, configured to match the first skeleton information with second skeleton information of hand skeletons in a preset skeleton library and determine a target hand skeleton from the hand skeletons in the preset skeleton library; and a second picture generation module, configured to use the target hand skeleton as auxiliary guide information to guide generation of a second picture based on the text information, wherein the picture representation of the first picture and the picture representation of the second picture both correspond to the text information, the second picture comprises a second hand, the similarity between the second hand and the first hand meets a preset similarity, and the abnormality rate of the first hand is greater than that of the second hand.
In some possible embodiments, the first picture generation module is configured to input the text information into a first picture generation model to generate the first picture; and the second picture generation module is configured to input the target hand skeleton and the text information into a second picture generation model to generate the second picture, wherein the second picture generation model comprises the first picture generation model and an auxiliary guide structure, and the auxiliary guide structure uses the target hand skeleton as the auxiliary guide information to guide the first picture generation model to generate the second picture based on the text information.
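One way to picture the module split above is as four cooperating callables, with the second picture generation model formed by feeding the output of the auxiliary guide structure into the same base model. This is a rough sketch under assumptions: the class name `HandPictureGenerator`, the field names, and the calling convention `base_model(text, guidance)` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple

Picture = Any    # placeholder types; real representations are not specified
Skeleton = Any
Keypoints = Sequence[Tuple[float, float]]


@dataclass
class HandPictureGenerator:
    # First picture generation model: (text, guidance or None) -> picture.
    base_model: Callable[[str, Any], Picture]
    # Auxiliary guide structure: target hand skeleton -> guidance signal.
    guide_structure: Callable[[Skeleton], Any]
    # Key point detection module: picture -> first skeleton information.
    detect_keypoints: Callable[[Picture], Keypoints]
    # Skeleton matching module: skeleton information -> target hand skeleton.
    match_skeleton: Callable[[Keypoints], Skeleton]

    def generate_first(self, text: str) -> Picture:
        """Unguided generation from text only."""
        return self.base_model(text, None)

    def generate_second(self, text: str, target: Skeleton) -> Picture:
        """Base model plus guide structure acts as the second model."""
        return self.base_model(text, self.guide_structure(target))

    def run(self, text: str) -> Tuple[Picture, Picture]:
        first = self.generate_first(text)
        skeleton_info = self.detect_keypoints(first)
        target = self.match_skeleton(skeleton_info)
        # The whole picture is regenerated from the same text under guidance.
        second = self.generate_second(text, target)
        return first, second
```

Because the second picture is regenerated from the same text rather than patched locally, both pictures stay tied to the text information, which is the consistency property the apparatus description emphasizes.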
In some possible embodiments, the hand skeletons in the preset skeleton library are determined from the original hand skeleton set based on the second picture generation model. In some possible embodiments, the apparatus further comprises a skeleton determination module configured to: acquire an original hand skeleton set and an original text information set, wherein the original hand skeleton set comprises a plurality of original hand s