CN-121979474-A - Interactive broadcasting method and system


Abstract

The invention provides an interactive broadcasting method and system, and relates to the technical field of intelligent vehicle human-machine interaction. The method comprises: in response to a user's operation instruction on a graphical interface, acquiring target material data and the broadcasting permission of a target user, and uploading both to a cloud platform; generating, based on the voice timbre features, the driving signal determined from the audio features, and the pixel-level features in the target material data, together with the broadcasting permission, a target data package from which different asset combinations of avatar, voice timbre, and permission can be invoked in a voice broadcasting application; and, according to the vehicle's current real-time application scene and the broadcasting permission of the real-time target user, invoking the corresponding voice broadcasting avatar and voice broadcasting data from the target data package for synchronized broadcasting. This enriches the choice of avatars and voice timbres for voice broadcasting, reduces the load on the vehicle's computing power and memory, and improves the emotional and personalized experience of human-machine interaction.

Inventors

WEN LINTAO
ZHU SHUANG
XIA YONG

Assignees

Chery Automobile Co., Ltd. (奇瑞汽车股份有限公司)

Dates

Publication Date: 20260505
Application Date: 20260127

Claims (10)

1. An interactive broadcasting method, comprising: in response to an operation instruction of a user on a graphical interface, acquiring target material data and a broadcasting permission of a target user, and uploading the target material data and the broadcasting permission of the target user to a cloud platform; generating, based on voice timbre features in the target material data, a driving signal determined from audio features in the target material data, pixel-level features in the target material data, and the broadcasting permission, a target data package for invoking different asset combinations of avatar, voice timbre, and permission in a voice broadcasting application; and, according to a real-time application scene of a current vehicle and the broadcasting permission corresponding to a real-time target user, invoking a corresponding voice broadcasting avatar and voice broadcasting data from the target data package for synchronized broadcasting.
2. The method according to claim 1, wherein the step of acquiring the target material data and the broadcasting permission of the target user and uploading them to the cloud platform in response to the operation instruction of the user on the graphical interface comprises: in response to a first operation instruction of the user on the graphical interface, determining a type of target material to be uploaded; capturing a face image of the target user and/or acquiring preset picture material, and acquiring preset voice material; and, when the current network state meets a preset condition, uploading the target material data to the cloud platform through the data entry corresponding to the target material type.
3. The method according to claim 2, wherein the step of acquiring the target material data and the broadcasting permission of the target user and uploading them to the cloud platform in response to the operation instruction of the user on the graphical interface further comprises: in response to a second operation instruction of the user on the graphical interface, performing at least one permission editing operation to determine the broadcasting permission, wherein an application permission for target material data of a corresponding type is assigned to at least one target user account associated with the user, and a voice broadcasting permission for a corresponding scene is assigned to at least one target user account associated with the user; and synchronously uploading the broadcasting permission to the cloud platform.
4. The method according to claim 1, wherein the step of generating the target data package for invoking different asset combinations of avatar, voice timbre, and permission in the voice broadcasting application, based on the voice timbre features, the driving signal determined from the audio features in the target material data, the pixel-level features in the target material data, and the broadcasting permission, comprises: generating a voice broadcasting avatar based on the driving signal determined from the audio features in the target material data and on the pixel-level features extracted by a preset network model; performing voice recognition on the preset voice material in the target material data to obtain the voice timbre features, and training to generate voice broadcasting data; and associating the voice broadcasting data with the voice broadcasting avatar and combining them with the broadcasting permission to obtain the target data package.
5. The method according to claim 4, wherein the step of generating the voice broadcasting avatar based on the driving signal determined from the audio features in the target material data and on the pixel-level features extracted by the preset network model comprises: invoking a pre-trained variational autoencoder to extract pixel-level features from the face image and/or the preset picture material in the target material data, and encoding the pixel-level features into a feature vector; mapping, through a Transformer network, preset basic action parameters and/or the audio features extracted from the preset voice material in the target material data into a driving-signal feature matrix that matches the feature vector, wherein the driving-signal feature matrix is used to adjust the lip movements of the voice broadcasting avatar based on the audio features; and aligning and fusing the feature vector with the driving-signal feature matrix, and generating, through a diffusion Transformer, continuous animation frames representing the voice broadcasting avatar.
6. The method according to claim 1, wherein before the step of invoking the corresponding voice broadcasting avatar and voice broadcasting data from the target data package for synchronized broadcasting according to the real-time application scene of the current vehicle and the broadcasting permission corresponding to the real-time target user, the method further comprises: verifying the integrity and authenticity of the target data package, and extracting the verified target data package according to a format adapted to the current vehicle.
7. The method according to claim 1 or 6, wherein the step of invoking the corresponding voice broadcasting avatar and voice broadcasting data from the target data package for synchronized broadcasting according to the real-time application scene of the current vehicle and the broadcasting permission corresponding to the real-time target user comprises: monitoring the real-time target user logged in to the current vehicle and the real-time application scene of the current vehicle; determining, according to the broadcasting permission, first voice broadcasting avatars and first voice broadcasting data in the target data package that the real-time target user is permitted to use; and, based on the real-time application scene and/or in response to a third operation instruction of the user on the graphical interface, invoking a corresponding second voice broadcasting avatar and second voice broadcasting data from the first voice broadcasting avatars and the first voice broadcasting data, respectively, for synchronized audio and video broadcasting.
8. An interactive broadcasting system, comprising: an uploading module, configured to, in response to an operation instruction of a user on a graphical interface, acquire target material data and a broadcasting permission of a target user and upload them to a cloud platform; a generation module, configured to generate, based on voice timbre features in the target material data, a driving signal determined from audio features in the target material data, pixel-level features in the target material data, and the broadcasting permission, a target data package for invoking different asset combinations of avatar, voice timbre, and permission in a voice broadcasting application; and a broadcasting module, configured to invoke a corresponding voice broadcasting avatar and voice broadcasting data from the target data package for synchronized broadcasting according to a real-time application scene of a current vehicle and the broadcasting permission corresponding to a real-time target user.
9. An electronic device comprising a memory, a processor and a program stored on the memory and capable of running on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, wherein a computer program is stored in the readable storage medium, and the computer program, when executed, implements the method of any one of claims 1 to 7.
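
For illustration only, the verification step of claim 6 and the two-stage selection of claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not name a verification algorithm, so HMAC-SHA256 over a JSON-serialized package is assumed here, and the key, the field names (`assets`, `allowed_users`, `scenes`), and all function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List, Optional

# Hypothetical shared key; the claims do not say how authenticity is established.
SECRET_KEY = b"demo-key-not-for-production"


def sign_package(payload: bytes, key: bytes = SECRET_KEY) -> str:
    """Return an HMAC-SHA256 tag over the serialized package."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_package(payload: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Integrity and authenticity check, assumed here to be an HMAC comparison."""
    return hmac.compare_digest(sign_package(payload, key), tag)


def permitted_assets(package: Dict, user: str) -> List[Dict]:
    """First stage: keep only the assets the logged-in user may apply."""
    return [a for a in package["assets"] if user in a["allowed_users"]]


def select_for_scene(
    assets: List[Dict], scene: str, manual_choice: Optional[str] = None
) -> Optional[Dict]:
    """Second stage: an explicit user pick wins if it is among the permitted
    assets; otherwise the first asset registered for the scene is used."""
    if manual_choice is not None:
        for asset in assets:
            if asset["name"] == manual_choice:
                return asset
    for asset in assets:
        if scene in asset["scenes"]:
            return asset
    return None


# Hypothetical sample package with two avatar/voice pairs.
demo_package = {
    "assets": [
        {"name": "family", "avatar": "family.webm", "voice": "family.tts",
         "allowed_users": ["alice", "bob"], "scenes": ["welcome", "navigation"]},
        {"name": "cartoon", "avatar": "cartoon.webm", "voice": "cartoon.tts",
         "allowed_users": ["bob"], "scenes": ["entertainment"]},
    ]
}
demo_payload = json.dumps(demo_package, sort_keys=True).encode("utf-8")
demo_tag = sign_package(demo_payload)

if verify_package(demo_payload, demo_tag):
    chosen = select_for_scene(permitted_assets(demo_package, "bob"), "entertainment")
    print(chosen["name"] if chosen else "no permitted asset for this scene")
```

Filtering by permission before matching the scene mirrors the order of the steps in claim 7, so a manual selection can never reach an asset the current account is not allowed to use.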

Description

Interactive broadcasting method and system

Technical Field

The invention relates to the technical field of intelligent vehicle control, and in particular to an interactive broadcasting method and system.

Background

As vehicles become more intelligent and consumers demand more from the in-vehicle experience, in-vehicle human-machine interaction systems are evolving from purely functional interaction toward designs that also incorporate emotional elements. Mainstream in-vehicle systems currently adopt a virtual voice assistant scheme based on computer vision and speech synthesis. Such a system typically captures image data of an occupant's face through an in-cabin camera, performs facial feature extraction and modeling on a local computing unit, and finally generates an avatar associated with the occupant's appearance. For voice interaction, the system relies on text-to-speech (TTS) technology and generates personalized speech output by capturing the voiceprint features of a particular user.

In the prior art, avatar generation depends heavily on the real-time computing capability of the in-vehicle chip, which must handle computation-intensive tasks such as face recognition, feature extraction, and three-dimensional modeling. At the same time, the system must continuously collect and process multimodal data from in-vehicle sensors, including but not limited to facial images captured by high-definition cameras, voice data captured by microphone arrays, and occupant profile information collected by seat pressure sensors.
This data acquisition and processing mechanism has two notable limitations. First, the avatar library and the voice library generated by the system are limited by the diversity of the actual occupants, so when the main occupants are always the same people, the system can hardly offer a rich choice of interactions. Second, the local data processing flow puts continuous pressure on in-vehicle computing resources and easily leads to computation delays or data loss in complex driving scenarios.

Disclosure of Invention

The invention aims to provide an interactive broadcasting method and system that alleviate the above technical problems in the prior art.

In a first aspect, the present invention provides an interactive broadcasting method, including: in response to an operation instruction of a user on a graphical interface, acquiring target material data and a broadcasting permission of a target user, and uploading the target material data and the broadcasting permission of the target user to a cloud platform; generating, based on voice timbre features in the target material data, a driving signal determined from audio features in the target material data, pixel-level features in the target material data, and the broadcasting permission, a target data package for invoking different asset combinations of avatar, voice timbre, and permission in a voice broadcasting application; and, according to a real-time application scene of a current vehicle and the broadcasting permission corresponding to a real-time target user, invoking a corresponding voice broadcasting avatar and voice broadcasting data from the target data package for synchronized broadcasting.
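
As a rough illustration of what a package of "asset combinations" might look like, the sketch below pairs every avatar with every voice timbre and records which accounts may use each pair and in which scenes. The source does not define a package format, so the class names, the two permission tables (`material_access`, `scene_access`), and the rule that a pair is usable only by accounts permitted for both assets are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class AssetCombination:
    avatar_id: str
    timbre_id: str
    users: FrozenSet[str]  # accounts allowed to use both assets


@dataclass
class TargetDataPackage:
    combinations: List[AssetCombination]
    scene_access: Dict[str, Set[str]]  # account -> scenes where it may broadcast

    def invokable(self, user: str, scene: str) -> List[AssetCombination]:
        """Combinations the given account may invoke in the given scene."""
        if scene not in self.scene_access.get(user, set()):
            return []
        return [c for c in self.combinations if user in c.users]


def build_package(
    avatar_ids: List[str],
    timbre_ids: List[str],
    material_access: Dict[str, Set[str]],
    scene_access: Dict[str, Set[str]],
) -> TargetDataPackage:
    """Pair each avatar with each timbre; drop pairs that no account may use."""
    combos = []
    for avatar_id, timbre_id in product(avatar_ids, timbre_ids):
        users = material_access.get(avatar_id, set()) & material_access.get(timbre_id, set())
        if users:
            combos.append(AssetCombination(avatar_id, timbre_id, frozenset(users)))
    return TargetDataPackage(combos, scene_access)


# Hypothetical assets and permissions for an owner account and a guest account.
sample_package = build_package(
    avatar_ids=["avatar_owner", "avatar_cartoon"],
    timbre_ids=["timbre_owner", "timbre_child"],
    material_access={
        "avatar_owner": {"owner"},
        "avatar_cartoon": {"owner", "guest"},
        "timbre_owner": {"owner"},
        "timbre_child": {"owner", "guest"},
    },
    scene_access={"owner": {"navigation", "welcome"}, "guest": {"welcome"}},
)

for combo in sample_package.invokable("guest", "welcome"):
    print(combo.avatar_id, combo.timbre_id)
```

Precomputing the permitted pairs when the package is built keeps the in-vehicle lookup to a simple filter, which is in line with the stated goal of moving heavy processing off the vehicle.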
In an optional embodiment, the step of acquiring the target material data and the broadcasting permission of the target user and uploading them to the cloud platform in response to the operation instruction of the user on the graphical interface includes: in response to a first operation instruction of the user on the graphical interface, determining a type of target material to be uploaded; capturing a face image of the target user and/or acquiring preset picture material, and acquiring preset voice material; and, when the current network state meets a preset condition, uploading the target material data to the cloud platform through the data entry corresponding to the target material type.

In an optional embodiment, the step of acquiring the target material data and the broadcasting permission of the target user and uploading them to the cloud platform in response to the operation instruction of the user on the graphical interface further includes: in response to a second operation instruction of the user on the graphical interface, performing at least one permission editing operation to determine the broadcasting permission, wherein an application permission for target material data of a corresponding type is assigned to at least one target user account associated with the user, and a voice broadcasting permission for a corresponding scene is assigned to at least one target user account associated with the user; and synchronously uploading the broadcasting permission to the cloud platform.

In an optional embodiment, based on the driving signal determined by the sound ray