CN-122027866-A - Game video generation method and device, electronic equipment and medium
Abstract
The present application relates to the field of computer technologies, and in particular to a method and apparatus for generating a game video, an electronic device, and a medium. The method comprises: determining a plurality of candidate video clips, each of which is a material video clip, an event video clip, or a virtual human explanation video clip, from game content text generated from a game log of a target object over a specified time period; matching at least two target video clips from the candidate clips based on the audio duration of the voice file corresponding to the game content text; and splicing the target video clips and synchronously binding them with the voice file to generate the game video of the target object. Through deep fusion of text, audio, and video content across modalities, raw game data is automatically converted into personalized, large-scale, polished game content, effectively improving the development and utilization efficiency of game data resources.
Inventors
- OU WENJIE
- CHEN YIXIN
- BIAN TENGYUE
- LIN YUE
- LONG TENGFEI
- MENG XIANGLIN
- CHEN JINGWEN
- FAN WEIMENG
- FU RONGHUI
Assignees
- NetEase (Hangzhou) Network Co., Ltd. (网易(杭州)网络有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20260211
Claims (16)
- 1. A method of generating a game video, the method comprising: obtaining a game log of a target object in a specified time period, and generating game content text based on the game log; determining a plurality of candidate video clips based on the game content text, wherein each candidate video clip is one of a material video clip generated from a video material library, an event video clip generated from a target game log corresponding to a key event in which the target object participates, and a virtual human explanation video clip generated from the game content text; matching at least two target video clips from the plurality of candidate video clips based on the audio duration of the voice file corresponding to the game content text; and splicing the at least two target video clips and synchronously binding them with the voice file to generate the game video of the target object.
- 2. The method of claim 1, wherein generating material video clips from the video material library comprises: for each original video in the video material library, applying at least two scene segmentation algorithms to compute difference values between each pair of adjacent video frames in the original video; obtaining a comprehensive difference degree for each pair of adjacent video frames based on the difference value from each scene segmentation algorithm and that algorithm's weight, wherein each algorithm weight is preset according to the visual characteristics of the video content; performing scene segmentation on the original video based on the comprehensive difference degree between each pair of adjacent video frames; and determining the video clips segmented from each original video in the video material library as material video clips.
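The weighted multi-algorithm scene segmentation of claim 2 can be sketched as follows. The two toy "algorithms" here (mean absolute pixel difference and coarse histogram distance), the weights, and the cut threshold are illustrative assumptions, not the patent's actual choices; frames are modeled as flat lists of grayscale pixel values.

```python
def mean_abs_diff(a, b):
    """Per-pixel mean absolute difference between two grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def hist_diff(a, b, bins=4, levels=256):
    """L1 distance between coarse intensity histograms, per pixel."""
    ha, hb = [0] * bins, [0] * bins
    for x in a:
        ha[x * bins // levels] += 1
    for y in b:
        hb[y * bins // levels] += 1
    return sum(abs(p - q) for p, q in zip(ha, hb)) / len(a)

def scene_cuts(frames, weights=(0.6, 0.4), threshold=50.0):
    """Return indices i where a cut occurs between frames[i-1] and frames[i].

    The comprehensive difference for each adjacent pair is the weighted sum
    of the per-algorithm difference values; a pair whose score exceeds the
    threshold is treated as a scene boundary.
    """
    algos = (mean_abs_diff, hist_diff)
    cuts = []
    for i in range(1, len(frames)):
        score = sum(w * f(frames[i - 1], frames[i])
                    for w, f in zip(weights, algos))
        if score > threshold:
            cuts.append(i)
    return cuts
```

A production version would read decoded frames (e.g. via OpenCV) rather than pixel lists, but the weighted-combination logic is the same.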
- 3. The method of claim 1, wherein generating an event video clip from a target game log corresponding to a key event in which the target object participates comprises: determining key events in which the target object participates based on the content of the game content text; acquiring the target game log corresponding to a key event and extracting event key information from it, wherein the event key information comprises at least one of an event type, participating entities, and an event result; guiding an image generation model to generate a plurality of single-frame static images based on the prompt words corresponding to the event type and the event key information; temporally extending the plurality of single-frame static images through a video generation model to obtain an initial dynamic video clip; and performing camera-movement processing on the initial dynamic video clip based on the geographic coordinates of target elements associated with the key event, to generate the event video clip.
- 4. The method of claim 3, wherein performing camera-movement processing on the initial dynamic video clip based on the geographic coordinates of the target elements associated with the key event comprises: determining a lens focus position and a lens movement path based on the geographic coordinates, in the game world, of the target elements associated with the key event; and executing the camera movement of a virtual camera over the initial dynamic video clip according to the lens focus position and lens movement path, to generate an event video clip containing a dynamic camera-movement effect.
- 5. The method of claim 4, wherein determining the lens focus position and lens movement path comprises: converting the geographic coordinates into screen coordinates on the canvas of the rendered large map through a coordinate mapping algorithm; and determining the lens focus position and lens movement path based on the occurrence position of the key event represented by the screen coordinates and the spatial distribution, in the screen coordinate system, of each target element associated with the key event.
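A minimal sketch of the coordinate mapping and camera-path derivation in claims 4 and 5. The linear world-to-canvas mapping and the centroid-to-event movement path are illustrative assumptions; the patent does not fix a particular mapping algorithm or path shape.

```python
def world_to_screen(pt, world_box, canvas_size):
    """Linearly map an (x, y) game-world coordinate into canvas pixels.

    world_box is (x_min, y_min, x_max, y_max) of the mapped world region;
    canvas_size is (width, height) of the rendered large-map canvas.
    """
    (wx0, wy0, wx1, wy1), (cw, ch) = world_box, canvas_size
    sx = (pt[0] - wx0) / (wx1 - wx0) * cw
    sy = (pt[1] - wy0) / (wy1 - wy0) * ch
    return (sx, sy)

def focus_and_path(event_pos, elements, world_box, canvas_size):
    """Focus the lens on the event position; move the camera from the
    centroid of the associated target elements toward the event, giving a
    simple dynamic camera-movement effect."""
    focus = world_to_screen(event_pos, world_box, canvas_size)
    pts = [world_to_screen(e, world_box, canvas_size) for e in elements]
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return focus, [(cx, cy), focus]
```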
- 6. The method of claim 1, wherein generating a virtual human explanation video clip from the game content text comprises: converting the game content text into a voice file, and obtaining, through a forced alignment algorithm, an alignment timestamp on the voice timeline between each character in the game content text and its corresponding pronunciation in the voice file; generating facial animation and body animation of the virtual human according to the alignment timestamps, the semantic and emotional information of the game content text, and the prosody information of the voice file; and loading a preset 3D scene and synchronously playing the facial animation and body animation to generate the virtual human explanation video clip.
- 7. The method of claim 1, wherein generating game content text based on the game log comprises: dividing the specified time period into a plurality of sub-time periods; for each sub-time period, extracting the game events of that sub-time period from the game log and classifying them by event type; for each class of events, calculating an initial importance score for each event based on event impact, number of participants, and duration; invoking a large language model to perform semantic understanding on each game event, generating a semantically enhanced target importance score, and screening out the key events under each event type according to the target importance scores; taking the key events of the sub-time period as input, invoking the large language model together with a preset sub-report template and a preset poetic narrative style instruction to generate sub-review text for the sub-time period; and generating the game content text from the sub-review texts of all sub-time periods within the specified time period.
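The initial importance scoring and per-type key-event screening of claim 7 can be sketched as below. The specific weights and the top-k cutoff standing in for the LLM-refined target score are assumptions for illustration.

```python
def initial_score(event, w_impact=0.5, w_participants=0.3, w_duration=0.2):
    """Weighted initial importance score from impact, participant count,
    and duration, per claim 7. The weights are illustrative."""
    return (w_impact * event["impact"]
            + w_participants * event["participants"]
            + w_duration * event["duration"])

def key_events_by_type(events, top_k=1):
    """Group events by type, score each, and keep the top-k per type
    (a stand-in for screening by the LLM-enhanced target score)."""
    by_type = {}
    for e in events:
        by_type.setdefault(e["type"], []).append(e)
    key = {}
    for etype, group in by_type.items():
        group.sort(key=initial_score, reverse=True)
        key[etype] = group[:top_k]
    return key
```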
- 8. The method of claim 7, wherein generating the game content text from the sub-review texts comprises: integrating the sub-review texts of the sub-time periods within the specified time period in chronological order, and extracting the major event threads and their development context; and invoking a large language model, with the integrated sub-review texts and the extracted event threads and development context as input, together with a prompt corresponding to the preset poetic narrative style, to generate the game content text.
- 9. The method of claim 1, wherein after generating the game content text and before determining the plurality of candidate video clips, the method further comprises: performing sentence segmentation on the game content text to obtain a plurality of semantically complete sentence units; invoking a large language model to screen, based on the semantic features of each sentence unit, the target game logs matching that sentence unit from the game logs of the specified time period; and establishing a mapping between the screened target game logs and the corresponding sentence units, generating a first mapping data table containing the game content text, the start and stop position marks of each sentence unit, the item identifiers of the target game logs, and the start and stop moments of the specified time period.
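The first mapping data table of claim 9 can be sketched as follows. The period-based sentence splitting and the keyword-overlap matcher are simplifying assumptions; in the claim, sentence-to-log matching is done by a large language model over semantic features.

```python
def build_first_mapping_table(text, logs, period):
    """Segment text into sentence units, record each unit's start/stop
    character positions, and attach matched target-log ids.

    logs: dict of log_id -> log text; period: (start, stop) timestamps
    of the specified time period.
    """
    rows, pos = [], 0
    for sent in [s.strip() for s in text.split(".") if s.strip()]:
        start = text.index(sent, pos)        # start/stop position marks
        stop = start + len(sent)
        words = set(sent.lower().split())
        matched = sorted(lid for lid, body in logs.items()
                         if words & set(body.lower().split()))
        rows.append({"sentence": sent, "span": (start, stop),
                     "log_ids": matched, "period": period})
        pos = stop
    return rows
```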
- 10. The method according to claim 1, wherein the method further comprises: invoking a pre-trained text semantic model to vectorize the video description text corresponding to each video clip, generating semantic vectors for text-video semantic matching and storing them in a video semantic vector list; parsing the tag information of each video clip to construct a tag vocabulary, and establishing a corresponding video index list for each tag; and constructing a second mapping data table from the video semantic vector list, the tag vocabulary, and the video index lists.
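The tag-vocabulary half of claim 10 amounts to an inverted index from tags to clip ids, sketched below. The data shapes are assumptions; the semantic-vector half would additionally embed each clip's description text with a pretrained text model.

```python
def build_tag_index(clips):
    """Build claim 10's tag vocabulary and per-tag video index lists.

    clips: dict of clip_id -> list of tag strings.
    Returns (sorted tag vocabulary, dict of tag -> sorted clip-id list).
    """
    vocab = set()
    index = {}
    for clip_id, tags in clips.items():
        for tag in tags:
            vocab.add(tag)
            index.setdefault(tag, []).append(clip_id)
    for tag in index:
        index[tag].sort()
    return sorted(vocab), index
```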
- 11. The method of claim 1, wherein determining the plurality of candidate video clips based on the game content text comprises: parsing the game content text to extract keywords, performing semantic tag rule matching according to preset matching rules, screening out initially matched video clips, and creating a candidate video index list; for each sentence unit in the game content text, sorting the candidate video clips in descending order of the similarity between the semantic features of the sentence unit and the tag information of each video clip in the candidate video index list; selecting at least one clip from the sorted video clips of each sentence unit as a candidate video clip matching that sentence unit; and aggregating the candidate video clips of all sentence units to obtain the plurality of candidate video clips matching the game content text.
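The per-sentence similarity ranking of claim 11 can be sketched with cosine similarity over toy bag-of-words vectors. The bag-of-words vectorization is a placeholder for the pretrained text semantic model of claim 10.

```python
import math

def bow(text):
    """Toy bag-of-words vector: word -> count."""
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cos_sim(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_candidates(sentence, clip_tags, top_n=1):
    """Rank clips by similarity between the sentence unit and each clip's
    tag text (descending), and keep the top_n as candidates.

    clip_tags: dict of clip_id -> tag text.
    """
    sv = bow(sentence)
    ranked = sorted(clip_tags,
                    key=lambda cid: cos_sim(sv, bow(clip_tags[cid])),
                    reverse=True)
    return ranked[:top_n]
```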
- 12. The method according to claim 1, wherein matching at least two target video clips from the plurality of candidate video clips based on the audio duration of the voice file corresponding to the game content text comprises: setting the audio duration as the initial remaining duration; traversing the candidate video clips in a preset order, and for the currently traversed candidate video clip, determining the allocated duration of the corresponding target video clip from the clip's original duration and the current remaining duration; creating a target video clip object comprising the identification information, original duration, and allocated duration of the selected candidate video clip; deducting the allocated duration from the current remaining duration to update it; and stopping the traversal when the remaining duration reaches zero, and outputting the set of created target video clip objects, wherein the sum of the allocated durations of the target video clips equals the audio duration.
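The duration-allocation loop of claim 12 is a simple greedy traversal: each clip receives the smaller of its original length and the remaining audio time, and traversal stops once the audio duration is fully covered. A minimal sketch, with illustrative field names:

```python
def allocate_durations(candidates, audio_duration):
    """Greedy allocation per claim 12.

    candidates: list of (clip_id, original_duration) in the preset
    traversal order. Returns target-clip objects whose allocated
    durations sum to exactly audio_duration (assuming the candidates'
    total length is at least the audio duration).
    """
    remaining = audio_duration
    targets = []
    for clip_id, original in candidates:
        if remaining <= 0:
            break  # audio fully covered; stop traversing
        allocated = min(original, remaining)
        targets.append({"id": clip_id,
                        "original": original,
                        "allocated": allocated})
        remaining -= allocated
    return targets
```

For example, with clips of 5 s, 4 s, and 10 s against a 7 s voice track, the first clip is used in full and the second is trimmed to 2 s.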
- 13. The method of claim 1, wherein splicing the at least two target video clips and synchronously binding them with the voice file to generate the game video of the target object comprises: invoking a video stitching algorithm to stitch the duration-allocated target video clips into an intermediate video; loading the voice file to construct a complete audio track, and synchronously binding the audio track with the intermediate video to generate an audio-video-synchronized video object; performing sentence segmentation on the game content text, based on the total duration of the audio track and textual semantic integrity, to generate a plurality of text fragments; and calculating the display time of each text fragment according to its character-count ratio, generating subtitle objects with display position and display time attributes, and overlaying the subtitle objects onto the audio-video-synchronized video object to generate the game video.
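The subtitle timing of claim 13 gives each text fragment a display window proportional to its share of the total character count. A minimal sketch, where splitting on periods stands in for the claim's semantics-aware sentence segmentation:

```python
def subtitle_objects(text, total_duration):
    """Assign each text fragment a display window proportional to its
    character-count ratio of the whole narration, per claim 13."""
    fragments = [s.strip() for s in text.split(".") if s.strip()]
    total_chars = sum(len(f) for f in fragments)
    subtitles, start = [], 0.0
    for frag in fragments:
        span = total_duration * len(frag) / total_chars
        subtitles.append({"text": frag,
                          "start": round(start, 3),
                          "end": round(start + span, 3)})
        start += span
    return subtitles
```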
- 14. A game video generation apparatus, the apparatus comprising: a text generation module configured to acquire a game log of a target object in a specified time period and generate game content text based on the game log; a multi-modal generation module configured to determine a plurality of candidate video clips based on the game content text, wherein each candidate video clip is one of a material video clip generated from a video material library, an event video clip generated from a target game log corresponding to a key event in which the target object participates, and a virtual human explanation video clip generated from the game content text; a video determination module configured to match at least two target video clips from the plurality of candidate video clips based on the audio duration of the voice file corresponding to the game content text; and a video synthesis module configured to splice the at least two target video clips and synchronously bind them with the voice file to generate the game video of the target object.
- 15. An electronic device comprising a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the electronic device is in operation, the machine-readable instructions being executable by the processor to perform the steps of the method of generating a game video as claimed in any one of claims 1 to 13.
- 16. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the game video generation method according to any one of claims 1 to 13.
Description
Game video generation method and device, electronic equipment and medium

Technical Field

The present application relates to the field of computer technologies, and in particular to a method and apparatus for generating a game video, an electronic device, and a medium.

Background

With the rapid development of the game industry and its deep fusion with artificial intelligence technology, players' demands for immersive experiences, personalized content, and social propagation are steadily rising. In the field of strategy games in particular, game data is growing explosively: a single player can generate thousands of combat records in a single season, and the game event data and combat logs accumulated across all players each day can reach hundreds of thousands or even millions of entries. However, the intelligent data processing and automatic content generation capabilities of current game systems are clearly insufficient; efficiently mining and converting the value of this massive game data is difficult, so its potential value cannot be fully developed and utilized. In the prior art, related game content is produced through manual editing and manual arrangement, which is inefficient and costly and makes personalized, large-scale content generation difficult to realize. A technical scheme is therefore needed that can intelligently and automatically generate game-related content from massive game data.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention

Accordingly, embodiments of the present application provide at least a method, an apparatus, an electronic device, and a medium for generating a game video, which can improve the development and utilization efficiency of game data resources. The application mainly comprises the following aspects.

In a first aspect, an embodiment of the present application provides a method for generating a game video, the method comprising: obtaining a game log of a target object in a specified time period, and generating game content text based on the game log; determining a plurality of candidate video clips based on the game content text, wherein each candidate video clip is one of a material video clip generated from a video material library, an event video clip generated from a target game log corresponding to a key event in which the target object participates, and a virtual human explanation video clip generated from the game content text; matching at least two target video clips from the plurality of candidate video clips based on the audio duration of the voice file corresponding to the game content text; and splicing the at least two target video clips and synchronously binding them with the voice file to generate the game video of the target object.
In a second aspect, an embodiment of the present application further provides a game video generation apparatus, the apparatus comprising: a text generation module configured to acquire a game log of a target object in a specified time period and generate game content text based on the game log; a multi-modal generation module configured to determine a plurality of candidate video clips based on the game content text, wherein each candidate video clip is one of a material video clip generated from a video material library, an event video clip generated from a target game log corresponding to a key event in which the target object participates, and a virtual human explanation video clip generated from the game content text; a video determination module configured to match at least two target video clips from the plurality of candidate video clips based on the audio duration of the voice file corresponding to the game content text; and a video synthesis module configured to splice the at least two target video clips and synchronously bind them with the voice file to generate the game video of the target object.

In a third aspect, an embodiment of the present application further provides an electronic device comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate through the bus when the electronic device is running, and the machine-readable instructions are executed by the processor to perform the steps of the game video generation method described in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the game video generation method described in the first aspect or any possible implementation of the first aspect.