
CN-121996837-A - Dialogue method, dialogue device, electronic equipment and storage medium

CN121996837A

Abstract

The application relates to a dialogue method, a dialogue device, an electronic device, and a storage medium. While media content is playing, matching scores between the scene theme of the current playing scene and the interest topics in the viewing user's interest model are analyzed in real time; when a matching score meets a condition, personalized response information is generated and output to the user, so that the device actively initiates dialogue with the user. This achieves a proactive, timely, and contextually synchronized interaction mode and greatly enhances the user's immersion and engagement while viewing. The intelligent media device thus goes beyond a single dimension of media understanding and can capture deeper artistic connotations. The interaction mode is also innovated: passive response is upgraded to active, precise interaction by analyzing the scene theme labels of the currently playing media content in real time and triggering proactive dialogue based on how they match the user interest model, keeping the dialogue synchronized with the situation and strengthening the user's emotional resonance.

Inventors

  • FU ZHIFU
  • WANG HUIBO
  • LEI XIN

Assignees

  • Shenzhen Skyworth-RGB Electronic Co., Ltd. (深圳创维-RGB电子有限公司)

Dates

Publication Date
2026-05-08
Application Date
2025-12-08

Claims (10)

  1. A dialogue method, comprising: during playback of media content, acquiring a multimodal data stream of the media content and extracting from it a scene theme describing the current playing scene; acquiring a user interest model associated with the current user, the user interest model comprising at least one interest topic generated from the current user's historical interaction data; performing a matching calculation between the scene theme and the at least one interest topic in the user interest model to obtain a target matching score; in response to the target matching score meeting a condition, generating response information based on the multimodal data stream and the user interest model; and outputting the response information to the current user.
  2. The method of claim 1, wherein acquiring the multimodal data stream of the media content and extracting the scene theme describing the current playing scene comprises: acquiring a video data stream, an audio data stream, and a text data stream of the media content at the current playing time; inputting the video data stream into a pre-trained first model to identify its content and output corresponding visual concept tags; identifying music information in the audio data stream and generating corresponding audio concept tags; identifying keywords and topic labels in the text data stream to obtain corresponding text concept tags; and performing semantic normalization on the visual, audio, and text concept tags to obtain the scene theme describing the current playing scene.
  3. The method of claim 1, wherein the matching calculation between the scene theme and the at least one interest topic in the user interest model to obtain the target matching score comprises: for each interest topic in the user interest model, acquiring the interest weight corresponding to that interest topic, calculating the similarity between the interest topic and the scene theme, and applying a set operation to the similarity and the interest weight to obtain a matching score between the interest topic and the scene theme; and determining the target matching score from the matching scores of the interest topics.
  4. The method of claim 3, wherein acquiring the interest weight corresponding to the interest topic comprises: acquiring the most recent dialogue data associated with the interest topic; analyzing that dialogue data to obtain calculation factors and time information; determining new weight data based on the calculation factors and the old weight data of the interest topic, and determining a time decay factor based on the time information; and computing the product of the new weight data and the time decay factor to obtain the interest weight corresponding to the interest topic.
  5. The method of claim 4, wherein determining the new weight data based on the calculation factors and the old weight data of the interest topic, and determining the time decay factor based on the time information, comprises: analyzing the calculation factors to obtain a dialogue-turn factor, an active-questioning factor, and an emotion-score factor; calculating the new weight data from the dialogue-turn factor, the active-questioning factor, the emotion-score factor, and the old weight data associated with the interest topic; and determining the difference between the current time and the time information, and calculating the time decay factor from that difference and a preset time decay parameter.
  6. The method of claim 1, wherein generating the response information based on the multimodal data stream and the user interest model comprises: generating structured response information and plain-text response information based on the multimodal data stream and the user interest model; and taking the structured response information and the plain-text response information as the response information.
  7. The method of claim 1 or 6, wherein outputting the response information to the current user comprises: acquiring the account subscription state of the current user; outputting the structured response information to the current user when the account subscription state is a first state; and outputting the plain-text response information to the current user when the account subscription state is a second state.
  8. A dialogue device, comprising: a scene theme extraction module, configured to acquire a multimodal data stream of media content during playback and extract from it a scene theme describing the current playing scene; an interest model acquisition module, configured to acquire a user interest model associated with the current user, the user interest model comprising at least one interest topic generated from the current user's historical interaction data; a target matching score determination module, configured to perform a matching calculation between the scene theme and the at least one interest topic in the user interest model to obtain a target matching score; a response information generation module, configured to generate, in response to the target matching score meeting a condition, response information based on the multimodal data stream and the user interest model; and a response information output module, configured to output the response information to the current user.
  9. An electronic device comprising a processor and a memory, the processor being configured to execute a dialogue control program stored in the memory to implement the dialogue method of any one of claims 1-7.
  10. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the dialogue method of any one of claims 1-7.
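The end-to-end flow claimed above (score each interest topic, trigger dialogue when the target score meets a condition, and gate the response format on subscription state) can be sketched as follows. This is an illustrative reading only: the maximum-score aggregation, the threshold value, the toy similarity function, and all names are assumptions, not details taken from the patent.

```python
THRESHOLD = 0.6  # assumed trigger condition for the target matching score


def target_match_score(scene_theme, interest_model, similarity):
    """Per claim 3: score each interest topic as similarity x weight,
    then aggregate (here: take the maximum) into the target score."""
    scores = [
        similarity(scene_theme, topic) * weight
        for topic, weight in interest_model.items()
    ]
    return max(scores) if scores else 0.0


def respond(scene_theme, interest_model, similarity, subscribed):
    """Per claims 1 and 7: trigger proactive dialogue only when the target
    score meets the condition; structured output for subscribers,
    plain text otherwise."""
    score = target_match_score(scene_theme, interest_model, similarity)
    if score < THRESHOLD:
        return None  # condition not met: no proactive dialogue
    if subscribed:
        return {"type": "structured", "topic": scene_theme, "score": score}
    return f"Noticed the scene matches your interest in {scene_theme}."
```

A toy similarity (1.0 on exact topic match, 0.2 otherwise) is enough to exercise both the trigger and the suppression paths.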

Description

Dialogue method, dialogue device, electronic equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular to a dialogue method, a dialogue device, an electronic device, and a storage medium.

Background

With the development of the Internet and the rapid maturation and popularization of large language model technology, the integration of artificial intelligence and multimedia interaction has become a core direction for the industry, and intelligent media playing devices are no longer limited to the basic function of playing media content but are being upgraded toward human-computer interaction. However, existing intelligent media playing devices are limited to passive interaction such as content recommendation before viewing: they cannot interact in real time while the user is watching, their understanding of media content is one-dimensional, they cannot stay synchronized with the user's emotion and experience during interaction, and their interaction with the user remains shallow.

Disclosure of Invention

The application provides a dialogue method, a dialogue device, an electronic device, and a storage medium, which address the problems that existing intelligent media playing devices interact with users only at a shallow level, understand media content along a single dimension, and cannot synchronize with the user's emotion in real time during interaction.
In a first aspect, the present application provides a dialogue method, comprising: during playback of media content, acquiring a multimodal data stream of the media content and extracting from it a scene theme describing the current playing scene; acquiring a user interest model associated with the current user, the user interest model comprising at least one interest topic generated from the current user's historical interaction data; performing a matching calculation between the scene theme and the at least one interest topic in the user interest model to obtain a target matching score; in response to the target matching score meeting a condition, generating response information based on the multimodal data stream and the user interest model; and outputting the response information to the current user.

In a possible implementation, acquiring the multimodal data stream of the media content and extracting the scene theme describing the current playing scene comprises: acquiring a video data stream, an audio data stream, and a text data stream of the media content at the current playing time; inputting the video data stream into a pre-trained first model to identify its content and output corresponding visual concept tags; identifying music information in the audio data stream and generating corresponding audio concept tags; identifying keywords and topic labels in the text data stream to obtain corresponding text concept tags; and performing semantic normalization on the visual, audio, and text concept tags to obtain the scene theme describing the current playing scene.
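The semantic normalization step above merges visual, audio, and text concept tags into a single scene theme. A minimal sketch of one way this could work is shown below; the synonym table and the majority vote are assumptions for illustration, not the patent's claimed method, which does not specify a normalization algorithm.

```python
from collections import Counter

# Assumed canonical-label table mapping raw tags to normalized themes.
CANONICAL = {
    "sea": "ocean", "waves": "ocean", "ocean": "ocean",
    "guitar": "music", "melody": "music", "music": "music",
}


def normalize_tags(visual_tags, audio_tags, text_tags):
    """Map every modality's tags to canonical labels, then pick the most
    frequent label across all three modalities as the scene theme."""
    counts = Counter(
        CANONICAL.get(tag.lower(), tag.lower())
        for tag in (*visual_tags, *audio_tags, *text_tags)
    )
    theme, _ = counts.most_common(1)[0]
    return theme
```

For example, visual tags ["sea", "waves"], an audio tag ["guitar"], and a text tag ["Ocean"] normalize to the theme "ocean", since three of the four tags map to that canonical label.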
In a possible implementation, the matching calculation between the scene theme and the at least one interest topic in the user interest model to obtain the target matching score comprises: for each interest topic in the user interest model, acquiring the interest weight corresponding to that interest topic, calculating the similarity between the interest topic and the scene theme, and applying a set operation to the similarity and the interest weight to obtain a matching score between the interest topic and the scene theme; and determining the target matching score from the matching scores of the interest topics.

In a possible implementation, acquiring the interest weight corresponding to the interest topic in the user interest model comprises: acquiring the most recent dialogue data associated with the interest topic; analyzing that dialogue data to obtain calculation factors and time information; determining new weight data based on the calculation factors and the old weight data of the interest topic, and determining a time decay factor based on the time information; and computing the product of the new weight data and the time decay factor to obtain the interest weight corresponding to the interest topic.

In a possible implementation, determining the new weight data based on the calculation factors and the old weight data of the interest topic, and determining the time decay factor based on the time information, comprises: analyzing the calculation factors to obtain a dialogue-turn factor, an active-questioning factor, and an emotion-score factor; calculating the new weight data from the dialogue-turn factor, the active-questioning factor, the emotion-score factor, and the old weight data associated with the interest topic; and determining the difference between the current time and the time information, and calculating the time decay factor from that difference and a preset time decay parameter.
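The interest-weight update described above (new weight from the dialogue-turn, active-questioning, and emotion-score factors, multiplied by a time decay factor derived from the time difference and a preset decay parameter) can be sketched as below. The patent names the factors but not the formula, so the linear combination, the coefficients, and the exponential decay form are all assumptions chosen for illustration.

```python
import math

DECAY = 0.1  # assumed preset time decay parameter (per day)


def interest_weight(old_weight, turns, questions, emotion,
                    last_ts, now, a=0.5, b=0.3, c=0.2):
    """Assumed update rule: blend the old weight with the three calculation
    factors, then apply exponential time decay over the elapsed time."""
    new_weight = old_weight + a * turns + b * questions + c * emotion
    delta_days = (now - last_ts) / 86400.0          # time difference in days
    decay = math.exp(-DECAY * delta_days)           # time decay factor
    return new_weight * decay                       # claim 4: their product
```

With this form, a topic untouched for ten days at DECAY = 0.1 retains e^-1 (about 37%) of its freshly updated weight, so long-idle interests gradually fall below the matching threshold.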