CN-121979437-A - Method, device and equipment for generating information based on image

CN121979437A

Abstract

Embodiments of this specification provide a method, device, and equipment for generating information based on images. The method comprises: acquiring a touch operation performed by a user on an image displayed by a terminal device; determining, in response to the touch operation, the area image in the image that the touch operation points to; extracting elements from the area image and determining a target element contained in the area image; and generating, based on the target element and using a generative model, an avatar having the same characteristics as the target element together with introduction information for the target element. The avatar and the introduction information can then be displayed on the terminal device.

Inventors

  • ZHOU CHUNXIAN

Assignees

  • 支付宝(杭州)数字服务技术有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-22

Claims (14)

  1. A method of generating information based on an image, comprising: acquiring a touch operation performed by a user on an image displayed by a terminal device; determining, in response to the touch operation, the area image in the image that the touch operation points to; extracting elements from the area image and determining a target element contained in the area image; and generating, based on the target element and using a generative model, an avatar having the same characteristics as the target element and introduction information for the target element, wherein the terminal device can display the avatar and the introduction information.
  2. The method of claim 1, further comprising: acquiring shooting context information of the image, wherein the context information comprises at least one of the shooting location, the shooting time, weather conditions, lighting conditions, or categories of surrounding objects; wherein generating, based on the target element and using a generative model, an avatar having the same characteristics as the target element and introduction information for the target element comprises: generating a prompt based on the context information and the target element, the prompt instructing the generative model to adjust the style of the target element in combination with the context information so as to generate an avatar reflecting the context information and introduction information for the target element; and inputting the prompt into the generative model to obtain the avatar and the introduction information for the target element, wherein the avatar and the introduction information contain the characteristics represented by the context information.
  3. The method of claim 1, wherein generating an avatar having the same characteristics as the target element comprises: obtaining a style tag, the style tag instructing the generative model to generate an avatar conforming to the style corresponding to the style tag, wherein the style tag comprises at least one of cartoon style, 2D style, 3D style, line style, ink-wash style, anthropomorphic style, and fairy-tale style; and providing the target element and the style tag to the generative model, and obtaining from the generative model an avatar that has the same characteristics as the target element and conforms to the style corresponding to the style tag.
  4. The method of claim 3, further comprising: identifying characteristic information of the target element, the characteristic information comprising at least one of its name, category, and material; wherein providing the target element and the style tag to the generative model comprises: providing the characteristic information and the style tag to the generative model, so that the generative model can generate, based on the characteristic information and the style tag, an avatar that has the same characteristics as the target element and conforms to the style corresponding to the style tag.
  5. The method of claim 1, further comprising: acquiring user characteristic information from the terminal device, the user characteristic information comprising the user's historical interaction preference information, which represents interaction information generated by the user when interacting with IP (intellectual property) characters the user prefers; and determining the user's preferred IP character according to the user characteristic information; wherein generating an avatar having the same characteristics as the target element comprises: providing information on the target element and on the user's preferred IP character to the generative model, and obtaining from the generative model an avatar that has the same characteristics as the target element and conforms to the visual style of the IP character, wherein the information on the user's preferred IP character comprises at least one of textual information in natural-language form and an image of the IP character, and the textual information comprises at least one of the IP character's name, visual information, and category information.
  6. The method of claim 5, further comprising: determining the user's cognitive level based on the user characteristic information; and generating introduction information for the target element that matches the cognitive level.
  7. The method of claim 6, further comprising: if the cognitive level is below a preset level, generating audio of the introduction information so that the user terminal plays the audio while displaying the introduction information; or, if the cognitive level is below a preset level, generating an introduction video based on the introduction information so that the user terminal plays the introduction information in video form.
  8. The method of claim 7, wherein the auditory style of the audio has the same emotional dimension as the visual style of the avatar, or the audiovisual style of the introduction video has the same emotional dimension as the visual style of the avatar, the emotional dimension comprising at least one of liveliness, calmness, childlike interest, portrayal, humor, and seriousness.
  9. The method of claim 1, wherein, if the introduction information for the target element includes audio information, the audio information is generated by a text-to-speech engine using a voice that conforms to the character features of the avatar; the text-to-speech engine can dynamically adjust the tone, speed, and intonation parameters of the audio information according to the character features of the avatar, the character features including the avatar's life-stage features, personality features, and physiological attribute features.
  10. The method of claim 1, wherein, if the introduction information for the target element includes audio information, the page on which the terminal device displays the avatar further includes an audio playback control; or, if the introduction information for the target element includes video information, the page on which the terminal device displays the avatar further includes a video playback control, and if the terminal device detects the user triggering the video playback control, the video information is played.
  11. The method of claim 1, further comprising: determining characteristic information of the target element, the characteristic information comprising information representing its category or information representing its material; and determining, according to the characteristic information, sound elements corresponding to the characteristic information, so that the terminal device can play a background sound based on the sound elements while playing the audio information for the target element.
  12. The method of claim 1, further comprising: selecting, according to the name and/or category of the target element, a classical poem matching the target element from a classical-poetry library; and the terminal device superimposing the poem on the background area of the avatar, or displaying the poem within the introduction information.
  13. An apparatus for generating information based on an image, comprising: an operation acquisition module, configured to acquire a touch operation performed by a user on an image displayed by a terminal device; an area image determination module, configured to determine, in response to the touch operation, the area image in the image that the touch operation points to; an element determination module, configured to extract elements from the area image and determine a target element contained in the area image; and an information generation module, configured to generate, based on the target element and using a generative model, an avatar having the same characteristics as the target element and introduction information for the target element, wherein the terminal device can display the avatar and the introduction information.
  14. A computing device, comprising a memory and a processor, wherein the memory is adapted to store a computer program/instructions and the processor is adapted to execute the computer program/instructions, which, when executed by the processor, implement the steps of the method of any of claims 1 to 12.
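The pipeline of claims 1 to 3 — crop an area image at the touch point, detect the target element, and build a style-aware prompt for a generative model — can be sketched as follows. This is a minimal illustration, not the patent's implementation: `extract_region`, `detect_target_element`, and `build_prompt` are hypothetical stand-ins, and the detector simply returns a fixed element where a real system would run object recognition.

```python
from dataclasses import dataclass

@dataclass
class TouchEvent:
    x: int
    y: int

def extract_region(image, touch, size=128):
    """Crop a square area image centred on the touch point (claim 1).
    `image` is a row-major list of pixel rows."""
    half = size // 2
    top = max(touch.y - half, 0)
    left = max(touch.x - half, 0)
    return [row[left:left + size] for row in image[top:top + size]]

def detect_target_element(region):
    """Stand-in element detector; a real system would run an
    object-recognition model over the area image here."""
    return {"name": "maple leaf", "category": "plant", "material": "organic"}

def build_prompt(element, style_tag, context=None):
    """Compose a prompt for the generative model (claims 2 and 3)."""
    prompt = (f"Generate an avatar with the features of {element['name']} "
              f"(category: {element['category']}) in {style_tag} style, "
              f"plus a short introduction of the element.")
    if context:
        prompt += f" Reflect this shooting context: {context}."
    return prompt
```

A terminal would then send the output of `build_prompt` to the generative model and display the returned avatar and introduction information.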

Description

Method, device and equipment for generating information based on image

Technical Field

One or more embodiments of the present disclosure relate to the field of computer technology, and in particular to a method for generating information based on an image. One or more embodiments of the present specification also relate to an apparatus, a computing device, a computer-readable storage medium, and a computer program product for generating information based on an image.

Background

As society develops, users can learn knowledge from applications in different ways, such as watching animations, watching short videos, or reading illustrated text. However, most such knowledge consists of fixed animations or pictures: it cannot present personalized teaching content according to actual needs, and it lacks interest. How to generate personalized information for a user to learn from or view, based on simple interaction, is therefore a technical problem to be solved.

Disclosure of Invention

In view of this, one or more embodiments of the present disclosure provide a method, apparatus, device, and computer-readable medium for generating information based on an image, so as to address the low interest of existing methods for generating information from images.
According to a first aspect of one or more embodiments of the present specification, there is provided a method of generating information based on an image, comprising: acquiring a touch operation performed by a user on an image displayed by a terminal device; determining, in response to the touch operation, the area image in the image that the touch operation points to; extracting elements from the area image and determining a target element contained in the area image; and generating, based on the target element and using a generative model, an avatar having the same characteristics as the target element and introduction information for the target element, wherein the terminal device can display the avatar and the introduction information.

According to a second aspect of one or more embodiments of the present specification, there is provided an apparatus for generating information based on an image, comprising: an operation acquisition module, configured to acquire a touch operation performed by a user on an image displayed by a terminal device; an area image determination module, configured to determine, in response to the touch operation, the area image in the image that the touch operation points to; an element determination module, configured to extract elements from the area image and determine a target element contained in the area image; and an information generation module, configured to generate, based on the target element and using a generative model, an avatar having the same characteristics as the target element and introduction information for the target element, wherein the terminal device can display the avatar and the introduction information.
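The four modules of the second-aspect apparatus can be organized as one cooperating class, sketched below. This is an illustrative decomposition only: the class and method names are hypothetical, `model` stands in for any callable generative model, and the element recognizer is a placeholder.

```python
class ImageInfoGenerator:
    """Mirrors the four modules of the second-aspect apparatus:
    operation acquisition, area image determination, element
    determination, and information generation."""

    def __init__(self, model):
        # `model` is any callable generative model (an assumption here).
        self.model = model

    def acquire_operation(self, event_queue):
        # Operation acquisition module: read the user's touch event.
        return event_queue.pop(0)

    def determine_area_image(self, image, touch, size=64):
        # Area image determination module: crop around the touch point.
        x, y = touch
        half = size // 2
        top, left = max(y - half, 0), max(x - half, 0)
        return [row[left:left + size] for row in image[top:top + size]]

    def determine_element(self, area_image):
        # Element determination module: stand-in recognizer; a real
        # system would run object recognition on the area image.
        return {"name": "unknown object", "category": "object"}

    def generate_info(self, element):
        # Information generation module: ask the generative model for an
        # avatar and introduction information for the target element.
        return self.model(f"avatar and introduction for {element['name']}")
```

The terminal device would then render whatever avatar and introduction the model returns.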
According to a third aspect of one or more embodiments of the present description, there is provided a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of generating information based on images when executing the computer instructions. According to a fourth aspect of one or more embodiments of the present description, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of generating information based on images. According to a fifth aspect of embodiments of the present specification, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the above method of generating information based on images.

At least one embodiment of the specification can achieve the following advantages: by acquiring a touch operation performed by a user on an image displayed by a terminal device, determining the area image that the touch operation points to, extracting a target element from the area image, and generating, based on the target element and using a generative model, an avatar with the same characteristics as the target element and introduction information for the target element, the avatar and the introduction information can be displayed on the terminal. A user can thus, driven by their own learning needs, obtain corresponding introduction information and an avatar for learning and understanding through simple interaction with the image, without manually searching for knowledge related to the target element. This improves the interest and personalization of the generated information, increases the user's motivation to learn about the target element, and improves the user experience.
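The ninth-claim idea of dynamically adjusting text-to-speech parameters to the avatar's character features could be sketched as a simple parameter mapper. The feature names (`life_stage`, `personality`) and the concrete pitch/rate values below are illustrative assumptions, not values from the patent.

```python
def tts_parameters(character):
    """Map avatar character features (life stage, personality,
    physiological attributes) to tone/speed/intonation settings,
    as in claim 9. Feature names and values are illustrative."""
    params = {"pitch": 1.0, "rate": 1.0, "intonation": "neutral"}
    if character.get("life_stage") == "child":
        params["pitch"] = 1.3   # higher pitch for a child-like voice
        params["rate"] = 0.9    # slightly slower for clarity
    if character.get("personality") == "lively":
        params["intonation"] = "animated"
    return params
```

A real engine would feed these settings into its synthesis call so that the introduction audio matches the avatar's character.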