CN-121981246-A - Accompanying processing method and device for image file and electronic equipment

Abstract

The application discloses an accompanying processing method and device for an image file, and electronic equipment. The method comprises: determining an image file currently used by a user; obtaining theme information of the image file; obtaining a question raised by the user in real time; obtaining a target picture corresponding to the question and text information corresponding to the target picture; generating an answer to the question according to the theme information, the question, the target picture, and the text information corresponding to the target picture; and providing the answer to the user. By extracting the theme information and the text information, the method and device keep answers consistent with the theme and plot of the pictures, which can reduce divergence and errors in the answers and improve their accuracy and consistency. Using the generated theme information as a preceding input to the model speeds up model inference and improves dialogue efficiency. The method is applicable to different application scenarios and can improve the user experience and the efficiency of the system.

Inventors

YANG JINRONG
ZHENG QIONGXIA
WU SHENGKAI
ZHANG SIHENG

Assignees

广州希倍思智能科技有限公司
广州视源电子科技股份有限公司
广州视源人工智能创新研究院有限公司

Dates

Publication Date: 20260505
Application Date: 20241028

Claims (10)

1. A method for processing accompaniment of an image file, comprising: determining an image file currently used by a user; acquiring theme information of the image file currently used by the user; acquiring a question raised by the user in real time; acquiring a target picture corresponding to the question and text information corresponding to the target picture, wherein the target picture is a picture in the image file; generating an answer to the question raised by the user in real time according to the theme information, the question, the target picture corresponding to the question, and the text information corresponding to the target picture; and providing the answer to the user.
2. The method of claim 1, wherein the acquiring theme information of the image file currently used by the user comprises: identifying the image file to obtain identification information corresponding to the image file; and obtaining the theme information of the image file from a preset database according to the identification information.
3. The method of claim 1, wherein the acquiring theme information of the image file currently used by the user comprises: extracting text information corresponding to page content in the image file; obtaining continuous text information corresponding to the image file according to the extracted text information corresponding to the page content; acquiring description information of page pictures in the image file; and inputting the continuous text information, the description information of the page pictures, and the page pictures in the image file into a preset large language model, and generating the theme information of the image file through the preset large language model.
4. The method of claim 3, further comprising obtaining the preset large language model, wherein the obtaining the preset large language model comprises: acquiring a data set containing relevance pictures, text information corresponding to the relevance pictures, and theme information; preprocessing the text information and the theme information in the data set to align the preprocessed text information and theme information with the relevance pictures; inputting the preprocessed relevance pictures, text information, and theme information into a large language model to be trained, so as to output predicted theme information corresponding to the relevance pictures; constraining the relation between the predicted theme information and the theme information through a loss function, so that the predicted theme information output by the large language model to be trained approaches the theme information; and taking the large language model to be trained whose output predicted theme information is closest to the theme information as the preset large language model.
5. The method of claim 4, wherein the acquiring a data set containing relevance pictures, text information corresponding to the relevance pictures, and theme information comprises: acquiring a preset number of relevance pictures; extracting text structure information from each picture in the relevance pictures, wherein the text structure information comprises text information and geometric information of the outer frame of the text; integrating the text information extracted from the picture with the geometric information of the outer frame of the text to form continuous first text information corresponding to the picture; integrating the first text information corresponding to the relevance pictures to obtain continuous second text information; acquiring description information of the pictures in the relevance pictures; obtaining theme information according to the preset number of relevance pictures, the second text information, and the description information of the pictures in the relevance pictures; and organizing the preset number of relevance pictures, the second text information, and the theme information to form the data set.
6. The method of claim 5, wherein the integrating the text information extracted from the picture with the geometric information of the outer frame of the text to form continuous first text information corresponding to the picture comprises: clustering according to the geometric information of the outer frames of the text in the picture, so that outer frames of text within a preset range belong to the same category; and integrating the text information in the outer frames of text belonging to the same category into a continuous text segment, to obtain the continuous first text information corresponding to the picture.
7. The method of claim 5, wherein the integrating the first text information corresponding to the relevance pictures to obtain continuous second text information comprises: acquiring the arrangement order of the pictures in the relevance pictures; and ordering the first text information corresponding to each picture according to the arrangement order to obtain ordered text information, wherein the ordered text information forms the second text information.
8. An apparatus for processing accompaniment of an image file, comprising: an image file determining module, configured to determine an image file currently used by a user; a theme information acquisition module, configured to acquire theme information of the image file currently used by the user; a questioning module, configured to acquire a question raised by the user in real time; an image text extraction module, configured to acquire a target picture corresponding to the question and text information corresponding to the target picture, wherein the target picture is a picture in the image file; an answer generation module, configured to generate an answer to the question raised by the user in real time according to the theme information, the question, the target picture corresponding to the question, and the text information corresponding to the target picture; and a feedback module, configured to provide the answer to the user.
9. An electronic device comprising a memory and a processor, the memory being connected to the processor, the processor being configured to execute one or more computer programs stored in the memory, the processor, when executing the one or more computer programs, causing the electronic device to implement the method of any of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by an electronic device, cause the electronic device to perform the method of any one of claims 1 to 7.
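
As an illustration of the text integration recited in claims 6 and 7, the following Python sketch is a minimal example under stated assumptions, not the claimed implementation. The `TextBox` fields, the `max_gap` threshold, and the vertical-gap grouping rule are all hypothetical; the claims only require that text outer frames within a preset range fall into the same category and that per-picture text is ordered by picture sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """Hypothetical OCR result: outer frame geometry plus recognized text."""
    x: float  # left edge of the outer frame
    y: float  # top edge of the outer frame
    w: float  # width of the outer frame
    h: float  # height of the outer frame
    text: str


def cluster_boxes(boxes: List[TextBox], max_gap: float) -> List[List[TextBox]]:
    """Group boxes whose vertical gap to the current cluster is at most max_gap.

    Assumption: a simple top-to-bottom sweep stands in for the clustering
    step; the source does not name a specific clustering algorithm.
    """
    clusters: List[List[TextBox]] = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if clusters:
            bottom = max(b.y + b.h for b in clusters[-1])
            if box.y - bottom <= max_gap:
                clusters[-1].append(box)
                continue
        clusters.append([box])
    return clusters


def page_text(boxes: List[TextBox], max_gap: float) -> str:
    """First text information: one continuous segment per cluster."""
    segments = [" ".join(b.text for b in cluster)
                for cluster in cluster_boxes(boxes, max_gap)]
    return "\n".join(segments)


def file_text(pages: List[List[TextBox]], max_gap: float) -> str:
    """Second text information: page texts joined in picture order."""
    return "\n\n".join(page_text(boxes, max_gap) for boxes in pages)


page1 = [
    TextBox(0, 30, 100, 10, "sat down."),
    TextBox(0, 10, 100, 10, "The cat"),
    TextBox(0, 200, 100, 10, "Page 1"),
]
page2 = [TextBox(0, 10, 100, 10, "The end.")]
print(file_text([page1, page2], max_gap=15))
```

In this sketch the two nearby boxes on the first page merge into one segment, the distant page label stays separate, and the pages are concatenated in the order given.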

Description

Accompanying processing method and device for image file and electronic equipment

Technical Field

The present application relates to the field of computer technologies, and in particular to a method and an apparatus for processing accompaniment of an image file, and an electronic device.

Background

Accompanied reading of an image file is intended to interpret the image file for the user, to answer questions raised by the user, and to provide relevant guidance. If the user has questions while reading the image file, the accompanying reading system can answer them and offer guidance. Prior-art approaches can only understand and converse about a few input pictures and cannot guarantee an understanding of the whole image file. A user, however, may ask a question about any page of the image file, and in that case the system may fail to answer accurately.

Disclosure of Invention

An object of the embodiments of the present application is to provide a method, an apparatus, and an electronic device for processing accompaniment of an image file, so as to improve the efficiency and accuracy of answering questions during accompaniment of the image file.
To solve the above technical problem, the embodiments of the present application provide an accompanying processing method for an image file, comprising: determining the image file currently used by a user; obtaining theme information of the image file currently used by the user; obtaining a question raised by the user in real time; obtaining a target picture corresponding to the question and text information corresponding to the target picture, wherein the target picture is a picture in the image file; generating an answer to the question according to the theme information, the question, the target picture corresponding to the question, and the text information corresponding to the target picture; and providing the answer to the user.

By extracting the theme information and the text information from the current image file, the method keeps the answer consistent with the theme and plot of the pictures, which can reduce divergence in the answers, lower the error rate, and improve accuracy and consistency. In addition, the generated theme information is used as a preceding input to the model, so a large number of historical pictures need not be input. This speeds up model inference, improves the efficiency of the dialogue system, accelerates question answering, and reduces unnecessary computation cost and time. Moreover, the method suits various task scenarios, such as picture-book reading and conference PPT summarization: by tracing back historical pictures or subsequent picture content, the system can better understand the overall plot and provide answers and explanations that better fit the user's needs.
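
The flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `Page`, `ImageFile`, `build_prompt`, `answer_question`, the `theme_db` dictionary, and the `model` callable are hypothetical names, and the prompt wording is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    """One picture of the image file and the text extracted from it."""
    image_path: str
    text: str


@dataclass
class ImageFile:
    file_id: str
    pages: List[Page]


def build_prompt(theme: str, question: str, page: Page) -> str:
    """Combine theme, target-page text, and question (hypothetical wording)."""
    return (
        f"Theme of the whole file: {theme}\n"
        f"Text on the current page: {page.text}\n"
        f"Question: {question}\n"
        "Answer in line with the theme and the current page."
    )


def answer_question(
    image_file: ImageFile,
    page_index: int,
    question: str,
    theme_db: Dict[str, str],
    model: Callable[[str, str], str],
) -> str:
    """Answer a question about one page without re-sending earlier pages.

    Assumption: `model(prompt, image_path)` stands in for any multimodal
    model call; the theme is precomputed and stored per file in `theme_db`.
    """
    theme = theme_db[image_file.file_id]   # theme information of the file
    page = image_file.pages[page_index]    # target picture for the question
    prompt = build_prompt(theme, question, page)
    return model(prompt, page.image_path)


def echo_model(prompt: str, image_path: str) -> str:
    """Stub model for demonstration; it only echoes its inputs."""
    return f"{image_path}|{prompt}"


book = ImageFile("book-1", [
    Page("p1.png", "A fox meets a crow."),
    Page("p2.png", "The crow drops the cheese."),
])
themes = {"book-1": "A fable about flattery."}
print(answer_question(book, 1, "What did the crow drop?", themes, echo_model))
```

Because the theme is looked up once per file and only the target page is passed to the model, earlier pages are not sent again with each question, which matches the inference speed-up the text attributes to using theme information as a preceding input.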
Therefore, the method of this embodiment offers clear benefits in improving inference speed, reducing the error rate, suiting different application scenarios, and improving the user experience, and it helps improve the efficiency of the image file accompanying reading system.

Optionally, obtaining the theme information of the image file currently used by the user includes: identifying the image file to obtain identification information corresponding to the image file, and obtaining the theme information of the image file from a preset database according to the identification information. In this way the theme information can be obtained directly from the database; with the theme information, the system can understand the content of the image file more comprehensively and thus better answer the questions raised by the user.

Optionally, obtaining the theme information of the image file currently used by the user includes: extracting text information corresponding to page content in the image file; obtaining continuous text information corresponding to the image file according to the extracted text information corresponding to the page content; obtaining description information of page pictures in the image file; and inputting the continuous text information, the description information of the page pictures, and the page pictures in the image file into a preset large language model, which generates the theme information of the image file. The method can synthesize various information of the image file by extracting the text information and the