
CN-115908612-B - Region-aware image-guided story-writing method

CN115908612B

Abstract

The region-aware image-guided story-ending writing method extracts keywords from the story context to obtain story development clues, retrieves a knowledge graph for those clues from an existing large-scale knowledge graph network, and filters the knowledge graph by computing its similarity to the input image. A scene graph is constructed for the input image. The filtered knowledge graph is matched against the image scene graph, and the features of the matched image objects, together with the objects connected to them in the scene graph, are taken as the key image features. The corresponding image region is regarded as the key image region that conforms to the story-context logic. The emotion features of the key image region are extracted with an image emotion extractor trained on an image emotion dataset. A more specific, consistent, and emotionally rich story ending is then generated from the story context, the filtered knowledge graph, the selected key image features, and the emotion features of the key image region.

Inventors

  • Huang Qingbao
  • Li Zhigang
  • Li Pijian

Assignees

  • Guangxi University (广西大学)

Dates

Publication Date
2026-05-05
Application Date
2022-11-21

Claims (7)

  1. A region-aware image-guided story-writing method, characterized by comprising the following steps: S1, respectively inputting a story context X and an ending-guiding image I, extracting the keywords in the story context sentences, and connecting the obtained keywords in order to form a story context development clue K; S2, retrieving from the large-scale knowledge base ConceptNet the knowledge of the keywords in the story context development clue K of step S1 to compose a knowledge graph G_R, calculating similarity scores between the concepts in the knowledge graph G_R and the input image I, selecting the concepts whose similarity scores satisfy a threshold, and obtaining the filtered knowledge graph; S3, identifying the objects in the image and the relations among the objects, and constructing a scene graph G_I of the image; S4, matching the knowledge graph filtered in step S2 with the scene graph G_I obtained in step S3 to obtain the matched image objects; S5, selecting from the scene graph G_I constructed in step S3 the image objects matched in step S4 and the objects directly connected to them, and taking the features of all these objects as the key image features; S6, obtaining the positions (x, y, w, h) of the image objects of step S5 and the objects directly connected to them in the original image, and calculating the center coordinates, width, and height of the image key region I_sub corresponding to the key image features, thereby obtaining the coordinates of the image key region; S7, acquiring the emotion features f_senti of the image key region I_sub of step S6 with an image emotion extractor trained on an image emotion dataset; S8, generating the story ending from the story context X of step S1, the knowledge graph filtered in step S2, the key image features selected in step S5, and the emotion features f_senti of the image key region obtained in step S7, the ending being generated word by word, with each output symbol of the generation formula representing one word of the generated ending.
  2. The method of claim 1, wherein the keywords in the story context sentences in step S1 are obtained with the RAKE keyword extraction tool.
  3. The method of claim 1, wherein the keyword knowledge in step S2 is expressed as a triple (H, R, T), where H represents the entity used for the query, T represents the word related to H in ConceptNet, and R represents the relation between the two; the knowledge graph G_R is the set composed of such triples.
  4. The method of claim 1, wherein the scene graph G_I of the image in step S3 is expressed as G_I = (N_I, V_I), where N_I = (N_I1, N_I2, …, N_Io) represents the set of objects identified in the image and V_I = (V_I1, V_I2, …, V_Id) represents the relations among the set of objects.
  5. The method of claim 1, wherein an edge is formed between each image object matched in step S4 and each object directly connected to it, the edge indicating that a relation exists between the two in the scene graph G_I; the edges selected from the scene graph capture the content information of the key region of the image that conforms to the story development.
  6. The method of claim 1, wherein the center coordinates of the image key region I_sub corresponding to the key image features of step S5 are calculated from the center coordinates of the individual image objects, and the width and height of the image key region I_sub are calculated from the widths and heights of the individual image objects, the i-th image object contributing its own center coordinates, width, and height.
  7. The method of claim 1, wherein the emotion features f_senti of the image key region in step S7 are obtained by applying an image emotion extractor, trained on an image emotion dataset, to the image key region I_sub.
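Claim 2 names the RAKE algorithm for keyword extraction. Below is a minimal, self-contained RAKE-style sketch (not the actual RAKE package the patent may use; the stopword list and the sample story context are illustrative only):

```python
import re
from collections import defaultdict

# Illustrative stopword list; real RAKE implementations use a much larger one.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "is", "was", "were",
             "to", "of", "in", "on", "at", "for", "with", "he", "she",
             "it", "they", "his", "her", "then", "very", "so", "that"}

def rake_keywords(text):
    """Minimal RAKE-style extraction: candidate phrases are maximal runs of
    non-stopwords; each word is scored by degree/frequency, and a phrase
    scores the sum of its word scores."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)

    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)  # degree counts words co-occurring in the phrase
    scores = {w: degree[w] / freq[w] for w in freq}
    ranked = sorted(phrases, key=lambda p: sum(scores[w] for w in p),
                    reverse=True)
    return [" ".join(p) for p in ranked]

context = ("Tom walked his old dog to the quiet park. "
           "The dog chased a red ball across the wet grass.")
print(rake_keywords(context)[:3])
```

Joining the top-ranked phrases in order would give the story development clue K of step S1.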

Description

Region-aware image-guided story-writing method

Technical Field

The invention relates to the technical fields of natural language processing and computer vision, and in particular to a region-aware image-guided story-writing method.

Background

The image-guided story-ending task is to generate, for an incomplete story context and under the constraint of a given image, an ending sentence that conforms both to the context logic and to the information of the given image. The main challenge of the task is how to make reasonable use of the image and text information, so that the generated ending is logically sound and satisfies the constraints of the image content and emotion. Early work mainly targeted the plain-text story-ending task; because of the constraints of that task itself, it often produced generalized, generic endings whose content was not specific enough. Later work proposed the image-guided story-ending task, in which the ending is generated under the constraint of an ending-related image, making the generated result more specific and realistic. However, existing models use only the features of the whole image and do not consider the relationship between regions in the image and the story context, so they lack pertinence. Meanwhile, using the image as a guide for ending generation naturally ensures that the emotion of the generated ending matches the emotion conveyed by the image. In addition, common-sense knowledge is widely applied in text generation tasks to expand existing information and help models deepen their understanding of the story context.
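As a concrete illustration of filtering retrieved common-sense knowledge against the image content (step S2 of this description), here is a minimal sketch. The patent does not specify the similarity model, so the cosine similarity over concept and image embeddings, the threshold value, and the toy vectors below are all assumptions (in practice the embeddings might come from a CLIP-style encoder):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def filter_knowledge(triples, concept_vecs, image_vec, threshold=0.5):
    """Keep (H, R, T) triples whose tail concept T is similar enough to the
    image. The embedding source is an assumption, not given by the patent."""
    kept = []
    for h, r, t in triples:
        if cosine(concept_vecs[t], image_vec) >= threshold:
            kept.append((h, r, t))
    return kept

# Toy 2-d embeddings standing in for a pretrained encoder's output.
vecs = {"beach": [0.9, 0.1], "sand": [0.8, 0.3], "office": [0.0, 1.0]}
triples = [("sea", "RelatedTo", "beach"),
           ("sea", "RelatedTo", "sand"),
           ("sea", "AtLocation", "office")]
image_vec = [1.0, 0.2]  # an image depicting a seaside scene
print(filter_knowledge(triples, vecs, image_vec))
```

Concepts unrelated to the image ("office" here) fall below the threshold and are discarded, yielding the filtered knowledge graph used in the later steps.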
However, for the image-guided story-ending task, naively acquired common-sense knowledge contains interference information that does not match the image content, so the knowledge must be filtered and the knowledge related to the image content selected, to help the model understand the input information of the different modalities.

Disclosure of the Invention

Aiming at the shortcomings of existing image-guided story-ending methods, namely that the relation between the image content and the story-context logic is not highlighted, that the guiding effect of the image emotion features on the ending is not considered, and that the role of common-sense knowledge is ignored, the invention provides a region-aware image-guided story-writing method, with the following specific scheme:

A region-aware image-guided story-writing method comprises the following steps: S1, respectively inputting a story context X and an ending-guiding image I, extracting the keywords in the story context sentences, and connecting the obtained keywords in order to form a story context development clue K; S2, retrieving from the large-scale knowledge base ConceptNet the knowledge of the keywords in the story context development clue K of step S1 to compose a knowledge graph G_R, calculating the similarity scores between the concepts in the knowledge graph G_R and the input image I, selecting the concepts whose similarity scores satisfy a threshold, and obtaining the filtered knowledge graph; S3, identifying the objects in the image and the relations among the objects, and constructing a scene graph G_I of the image; S4, matching the knowledge graph filtered in step S2 with the scene graph G_I obtained in step S3 to obtain the matched image objects; S5, selecting from the scene graph G_I constructed in step S3 the image objects matched in step S4 and the objects directly connected to them, and taking the features of all these objects as the key image features; S6, according to the positions (x, y, w, h) of the image objects of step S5 and the objects directly connected to them in the original image, calculating the center coordinates, width, and height of the image key region I_sub corresponding to the selected key image features, thereby obtaining the coordinates of the image key region; S7, acquiring the emotion features f_senti of the image key region I_sub of step S6 with an image emotion extractor trained on an image emotion dataset; S8, generating the story ending from the story context X of step S1, the knowledge graph filtered in step S2, the key image features selected in step S5, and the emotion features f_senti of the image key region acquired in step S7. Further, the keywords in the story context sentences in step S1 are obtained with the RAKE keyword extraction tool. Further, the common-sense knowledge in step S2 is expressed as a triple (H, R, T), where H represents the entity used for the query, T represents the word related to H in ConceptNet, and R represents the relation between the two; the knowledge graph G_R is the set composed of such triples. Further, the expression of the scene graph G_I of the image in step S3 is: G_I = (N_I, V_I), where N_I = (N_I1, N_I2, …, N_Io) represents the set of objects identified in the image and V_I = (V_I1, V_I2, …, V_Id) represents the relations among the set of objects. Further, the image o
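One plausible realization of the key-region computation of step S6, assuming each object position (x, y, w, h) is an axis-aligned box with (x, y) its top-left corner: take the smallest box enclosing all selected objects as the image key region I_sub. The enclosing-box rule is an assumption, since the description only states that the region's center, width, and height are derived from the individual objects' centers, widths, and heights:

```python
def key_region(boxes):
    """Compute the image key region I_sub covering all selected objects.
    Each box is (x, y, w, h) with (x, y) the top-left corner; the smallest
    enclosing box is one plausible reading of step S6."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    width, height = x1 - x0, y1 - y0
    center = (x0 + width / 2, y0 + height / 2)
    return center, width, height

# Two matched objects plus one object directly connected to them.
boxes = [(10, 20, 30, 40), (50, 10, 20, 20), (0, 0, 15, 15)]
print(key_region(boxes))
```

The resulting coordinates define the crop I_sub that is passed to the image emotion extractor in step S7.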