CN-122002105-A - Method, device and medium for constructing character relation of video
Abstract
The embodiment of the disclosure provides a method for constructing character relationships in a video, applied to the technical field of video processing and analysis. The method comprises: performing data processing on a target video to obtain video structure information of the target video; dividing the target video into at least two content levels according to the video structure information; and generating summary text level by level, from the lowest content level upward. The highest-level summary text, which carries the complete narrative logic, is then input into a pre-trained summary generation model, which converts the level-fused structured information into a coherent, refined scenario summary text. Finally, the scenario summary text is fed into the same summary generation model together with a target instruction indicating that character relationship information should be generated; the model infers and outputs character relationships based on a deep semantic understanding of the whole scenario, thereby achieving accurate construction of dynamic character relationships.
Inventors
- ZHOU CHEN
Assignees
- Beijing QIYI Century Science & Technology Co., Ltd. (北京奇艺世纪科技有限公司)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-21
Claims (10)
- 1. A character relationship construction method for a video, comprising: performing data processing on a target video to obtain video structure information of the target video; dividing the target video into at least two content levels according to the video structure information, wherein each content level comprises a plurality of video clips; generating summary text for each content level step by step, starting from the lowest content level, wherein the summary text of a higher content level is generated by fusing the summary texts of its subordinate lower content levels; inputting the summary text of the highest content level into a pre-trained summary generation model to generate scenario summary text of the target video; and inputting the scenario summary text into the summary generation model, the summary generation model outputting character relationship information between characters in the target video based on a target instruction indicating that the character relationship information is to be generated, wherein the character relationship information at least comprises relationship types between the characters and time-sequence change information of the relationship types as the scenario advances.
- 2. The method of claim 1, wherein inputting the scenario summary text into the summary generation model and outputting, by the summary generation model, character relationship information between characters in the target video based on a target instruction indicating that the character relationship information is to be generated comprises: inputting the scenario summary text and a first target instruction into the summary generation model, wherein the first target instruction instructs the summary generation model to perform the following operations: identifying target characters in the scenario summary text; determining interaction events between different target characters, and determining the episode time points, relationship states and relationship types corresponding to the interaction events; organizing the relationship states and relationship types in scenario-advancing order, based on the interaction events and their corresponding episode time points, to generate time-sequence change information of the relationship states; constructing a relationship graph from the target characters and the time-sequence changes, wherein the edges of the relationship graph represent relationship information between the characters, the relationship information comprising the relationship types between the target characters and the time-sequence change information; and outputting the relationship graph by the summary generation model.
- 3. The method according to claim 2, further comprising: associating the time-sequence change information in the relationship graph with one or more corresponding pieces of key event evidence, wherein the key event evidence comprises event descriptions, taken from the scenario summary text, that reflect relationship state transitions.
- 4. The method according to claim 3, further comprising: presenting the relationship graph; and in response to a triggering operation on an edge in the relationship graph interface, jumping to and playing the target video clip corresponding to the timestamp of the key event evidence associated with that edge.
- 5. The method according to any one of claims 1-4, further comprising: inputting the scenario summary text and a second target instruction into the summary generation model, wherein the second target instruction instructs the summary generation model to perform the following operations: identifying a target character in the scenario summary text; determining static attribute information of the target character in the scenario, wherein the static attribute information comprises at least one of an identity attribute and an occupation attribute; determining behavior pattern characteristics and personality characteristics of the target character as the scenario advances; determining at least one key event associated with the target character, and organizing the at least one key event in scenario-development order to form a key event sequence, wherein a key event is an event description showing character development or a relationship state transition of the target character; generating a character portrait of the target character based on the static attribute information, the behavior pattern characteristics, the personality characteristics and the key event sequence; and outputting, by the summary generation model, the character portraits of one or more target characters.
- 6. The method of claim 1, wherein the summary generation model is obtained by using scenario summary texts containing character relationship labels as training samples, computing a difference value by comparing the character relationship information generated by the summary generation model with the character relationship labels in the training samples, and training iteratively according to the difference value, wherein the character relationship labels at least comprise the relationship types between characters and the time-sequence change information of the relationship types as the scenario advances.
- 7. The method of claim 1, wherein the video structure information comprises a shot-cut timestamp list, subtitle information, character information and a scene-cut timestamp list of the target video, and wherein the method comprises: determining a plurality of shot segments from the target video according to the shot-cut timestamp list, and generating the summary text corresponding to each shot segment according to the subtitle information and character information corresponding to that shot segment; aggregating the plurality of shot segments into a plurality of scenes according to the scene-cut timestamp list, and generating the scene summary text corresponding to each scene according to the shot segments aggregated under the scene, their summary texts and the subtitle information; and inputting each scene summary text into the pre-trained summary generation model to generate the scenario summary text of the target video.
- 8. The method of claim 7, wherein generating the summary text for each shot segment according to the subtitle information and the character information in the shot segment comprises: generating the summary text of each shot segment according to the subtitle information, the character information and the emotion information of the background music in the shot segment.
- 9. A computing device comprising a processor and a memory for storing instructions executable by the processor, the processor being configured to read the executable instructions from the memory and execute them to implement the method of any one of claims 1-8.
- 10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed, implements the method of any one of claims 1-8.
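Claims 2-4 describe a relationship graph whose edges carry both a relationship type and its timestamped evolution, with each change linked to key event evidence so that an interface can jump to the corresponding clip. A minimal Python sketch of such an edge structure follows; all class names, field names and sample data are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class RelationChange:
    # One timestamped state transition on an edge (claims 2 and 3).
    episode_time: str      # scenario time point, e.g. "E01 05:10" (illustrative format)
    relation_type: str     # e.g. "strangers", "allies"
    relation_state: str    # free-text relationship state description
    evidence: str = ""     # key event evidence drawn from the scenario summary text

@dataclass
class RelationEdge:
    # An edge between two characters, holding the full time-sequence change information.
    source: str
    target: str
    timeline: list = field(default_factory=list)

    def add_change(self, change: RelationChange) -> None:
        # Keep changes organized in scenario-advancing order.
        self.timeline.append(change)
        self.timeline.sort(key=lambda c: c.episode_time)

    def current_type(self) -> str:
        # The latest relationship type, i.e. the state after the most recent transition.
        return self.timeline[-1].relation_type if self.timeline else "unknown"

# Build one tiny edge from hypothetical model-extracted interaction events.
edge = RelationEdge("CharacterA", "CharacterB")
edge.add_change(RelationChange("E04 33:20", "allies", "mutual trust", "joint escape"))
edge.add_change(RelationChange("E01 05:10", "strangers", "first meeting", "A rescues B"))
print(edge.current_type())  # allies
```

Because each `RelationChange` keeps its evidence and time point, a presentation layer could implement the claim-4 behavior by mapping an edge click to the timestamp stored in the latest (or selected) change.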
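Claim 6 trains the model by comparing its generated relationship information against character relationship labels and iterating on the difference value. The following toy sketch shows only the comparison-and-iterate shape of that loop; the difference function, the triple format and the "update" step are all stand-in assumptions, since the patent does not fix a model architecture or loss:

```python
def relation_difference(predicted: set, label: set) -> float:
    """Toy difference value: fraction of labeled relation triples the model missed."""
    missed = [t for t in label if t not in predicted]
    return len(missed) / max(len(label), 1)

# A hypothetical label: (character A, character B, relation type, episode time).
label = {("A", "B", "allies", "E04")}

predicted: set = set()
loss = 1.0
for step in range(3):              # iterative training, purely schematic
    loss = relation_difference(predicted, label)
    if loss == 0.0:
        break
    predicted = set(label)         # stand-in for a real gradient update
print(loss)  # 0.0
```

In a real system the difference value would drive parameter updates of the summary generation model rather than copying the label, but the compare-score-iterate structure matches the claim.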
Description
Technical Field

The disclosure relates to the technical field of video processing and analysis, and in particular to a method, device and medium for constructing character relationships in a video.

Background

With the popularization of video content such as television dramas and serials on streaming media platforms, users find it difficult to quickly and accurately understand complex character relationships and their evolution when facing tens to hundreds of hours of episodes. Whether a general viewer is deciding what to follow, or a professional is performing secondary creation or marketing analysis of the content, there is an urgent demand for automated, in-depth resolution of the dynamic relationships between characters in a video. How to automatically construct character relationships with time-sequence evolution characteristics from massive video data has therefore become a key technical problem for improving video content understanding and the value of intelligent applications. Currently, the prior art mainly handles character-relationship understanding in video in a piecemeal way. On the one hand, computer vision techniques such as face detection and clustering can identify and track the different people appearing in a video, but they only produce a list of the characters that appear and cannot reveal the social relationships between them. On the other hand, natural language processing techniques can analyze shallow associations between characters, for example by extracting co-occurrence relationships from subtitles or accompanying text; these approaches typically process visual or textual cues in isolation, yielding static and fragmentary results.
The prior art cannot effectively fuse multimodal information or perform deep semantic reasoning over it, so it suffers serious defects in the core task of character relationship generation, and the user experience is accordingly poor.

Disclosure of Invention

To solve the above technical problems, or at least partially solve them, the present disclosure provides a method, device and medium for constructing character relationships in a video. The embodiment of the disclosure provides a character relationship construction method for a video, which comprises: extracting data from a target video to obtain a shot-cut timestamp list, subtitle information, character information and a scene-cut timestamp list, providing detailed low-level data support for subsequent analysis. The target video can then be decomposed into a plurality of shot segments according to the shot-cut timestamps, and accurate shot-level summary text is generated by combining the subtitle information and character information within each shot segment, achieving a preliminary understanding and summarization of each shot's content. Next, consecutive shot segments can be aggregated into scenes using the scene-cut timestamps, and the summary texts and subtitle information of all shots within a scene are fused to generate scene-level summary text, further distilling the core drama and emotional trend of the scene. Finally, all scene summary texts are gathered and input into a deeply trained summary generation model, which comprehensively considers the logical associations and emotional threads among the scenes and generates a complete scenario summary text covering the key plot points, character development and emotional conflicts of the target video.
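The segmentation and aggregation steps above can be sketched schematically: shot-cut timestamps split the video into shot segments, and scene-cut timestamps group those shots into scenes before any summarization. A minimal sketch follows, assuming timestamps in seconds; the helper names and sample data are illustrative, and the summary generation model itself is omitted:

```python
from bisect import bisect_right

def split_shots(shot_cuts, duration):
    """Turn a sorted list of shot-cut timestamps into (start, end) shot segments."""
    bounds = [0.0] + sorted(shot_cuts) + [duration]
    return list(zip(bounds[:-1], bounds[1:]))

def group_shots_into_scenes(shots, scene_cuts):
    """Assign each shot to the scene whose cut interval contains its start time."""
    scenes = {}
    cuts = sorted(scene_cuts)
    for start, end in shots:
        scene_idx = bisect_right(cuts, start)   # index of the enclosing scene interval
        scenes.setdefault(scene_idx, []).append((start, end))
    return [scenes[k] for k in sorted(scenes)]

# Hypothetical cut lists for a 20-second clip.
shots = split_shots([4.0, 9.5, 15.0], duration=20.0)
scenes = group_shots_into_scenes(shots, scene_cuts=[9.5])
print(len(shots), len(scenes))  # 4 2
```

Each shot segment would then receive a shot-level summary from its subtitles and character information, each scene's shot summaries would be fused into a scene-level summary, and the scene summaries would finally be passed to the summary generation model.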
Through multi-level, multi-dimensional data integration and analysis, this process effectively overcomes the defects of the prior art, achieves deep semantic understanding of complex videos, accurately presents the core characters and their dynamic relationships, and meets users' dual needs of quickly grasping the scenario overview and deeply analyzing scenario details. The embodiment of the disclosure also provides a computing device comprising a processor and a memory for storing instructions executable by the processor, the processor being configured to read the executable instructions from the memory and execute them to implement the character relationship construction method of the video. The present disclosure also provides a computer-readable storage medium storing a computer program for executing the character relationship construction method of a video as provided by the embodiments of the present disclosure.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description taken in conjunction with the accompanying drawings. The same or similar reference numbers are used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and