CN-121996072-A - Vehicle-mounted voice control method, electronic equipment and vehicle


Abstract

The disclosure relates to a vehicle-mounted voice control method, electronic equipment and a vehicle. The method is applied to the electronic equipment and belongs to the technical field of vehicle control. When a user uses the voice interaction function in a vehicle, multi-modal data comprising user-related data and vehicle-related data is fused to obtain multi-modal fusion data, and intention prediction is performed based on the multi-modal fusion data and a constructed dynamic user portrait. A voice instruction recommendation list is then generated based on the resulting potential intention set and a preset recommendation strategy. Instructions can thus be recommended intelligently according to user preference and driving environment, which improves the personalization and scene fit of the recommendations as well as the click-through rate of the voice instructions and the efficiency of user interaction.

Inventors

WANG HAO
WANG GUOJIE
ZHANG YAJUAN
WANG XUEYING

Assignees

长城汽车股份有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-23

Claims (10)

1. A vehicle-mounted voice control method, characterized in that the method comprises: in response to wake-up of a voice interaction function in a vehicle, performing data fusion processing on acquired multi-modal data to obtain corresponding multi-modal fusion data, wherein the multi-modal data comprises user-related data and vehicle-related data; constructing a dynamic user portrait, the dynamic user portrait comprising a plurality of dimension attributes characterizing a real-time state of a user; performing intention prediction processing based on the multi-modal fusion data and the dynamic user portrait to obtain a potential intention set; and generating a voice instruction recommendation list based on the potential intention set and a preset recommendation strategy, and displaying the voice instruction recommendation list.
2. The method of claim 1, wherein the performing data fusion processing on the acquired multi-modal data to obtain corresponding multi-modal fusion data comprises: extracting features from the acquired multi-modal data to obtain corresponding multi-modal feature vectors, wherein the multi-modal feature vectors comprise a user preference feature vector, a contextual feature vector, a vehicle state feature vector and a historical interaction feature vector; and performing weighted fusion on the multi-modal feature vectors based on a target weight to obtain the multi-modal fusion data, wherein the target weight is dynamically adjusted according to the multi-modal fusion data.
3. The method of claim 2, wherein the dynamic user portrait comprises a static attribute, a dynamic attribute and a contextual attribute; wherein the constructing a dynamic user portrait comprises: constructing the static attribute based on the user preference feature vector; constructing the dynamic attribute based on the historical interaction feature vector; constructing the contextual attribute based on the contextual feature vector and the vehicle state feature vector; and constructing the dynamic user portrait based on the static attribute, the dynamic attribute and the contextual attribute.
4. The method of claim 1, wherein the performing intention prediction processing based on the multi-modal fusion data and the dynamic user portrait to obtain a potential intention set comprises: performing intention recognition on the multi-modal fusion data and the dynamic user portrait to obtain an actual instruction intention; and performing association prediction on the actual instruction intention based on a pre-constructed intention association tree to obtain the potential intention set, wherein the intention association tree is obtained by pre-training on the multi-modal fusion data, and the potential intention set comprises a plurality of potential instruction intentions.
5. The method of claim 4, wherein the performing association prediction on the actual instruction intention based on the pre-constructed intention association tree to obtain the potential intention set comprises: calculating a dynamic association degree between the actual instruction intention and each potential instruction intention in the intention association tree; and determining, according to the dynamic association degree, a plurality of top-ranked potential instruction intentions to obtain the potential intention set.
6. The method of claim 1, wherein the generating a voice instruction recommendation list based on the potential intention set and a preset recommendation strategy comprises: performing instruction recommendation on the potential intention set based on the preset recommendation strategy to obtain a plurality of candidate voice instructions; and scoring and ranking the candidate voice instructions to generate the voice instruction recommendation list.
7. The method of claim 1, wherein before the displaying the voice instruction recommendation list, the method further comprises: evaluating a driving load level of the vehicle in real time according to the multi-modal fusion data; and determining a target display form based on the driving load level.
8. The method of claim 7, wherein the displaying the voice instruction recommendation list comprises: if the driving load level is high, determining that the target display form is to partially display the voice instruction recommendation list; and if the driving load level is low, determining that the target display form is to completely display the voice instruction recommendation list.
9. The method of claim 1, wherein the method further comprises: receiving an interactive feedback behavior of a user on the voice instruction recommendation list, and updating the dynamic user portrait and optimizing the potential intention set in real time based on the interactive feedback behavior.
10. A vehicle, characterized by comprising: a processor; and a memory for storing executable instructions; wherein the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the vehicle-mounted voice control method of any one of claims 1-9.
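
The ranking step of claim 5 and the load-dependent display of claims 7 and 8 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the function names, the intention labels, the association scores, the string load levels "high" and "low", and the `partial_count` parameter are all hypothetical, and the claims do not specify how the dynamic association degree or the driving load level is computed.

```python
from typing import Dict, List


def top_potential_intentions(association: Dict[str, float], k: int) -> List[str]:
    """Return the k intentions with the highest association degree (claim 5).

    `association` maps each potential instruction intention to a precomputed
    dynamic association degree; how that degree is computed is not specified
    in the source, so it is taken as given here.
    """
    ranked = sorted(association.items(), key=lambda item: item[1], reverse=True)
    return [intention for intention, _ in ranked[:k]]


def select_display(recommendations: List[str], load_level: str,
                   partial_count: int = 2) -> List[str]:
    """Choose what to show based on the driving load level (claims 7 and 8).

    High load: show only part of the list. Low load: show the full list.
    `partial_count` is an assumed parameter, not taken from the source.
    """
    if load_level == "high":
        return recommendations[:partial_count]
    if load_level == "low":
        return list(recommendations)
    raise ValueError(f"unknown driving load level: {load_level!r}")


# Hypothetical association degrees for an actual intention such as "start navigation".
association = {
    "play_music": 0.9,
    "navigate_home": 0.7,
    "call_contact": 0.4,
    "open_window": 0.2,
}

top = top_potential_intentions(association, k=3)
shown_high = select_display(top, "high")
shown_low = select_display(top, "low")
```

Keeping the ranking and the display decision in separate functions mirrors the claim structure, where the potential intention set is determined first and the display form is chosen afterwards from the driving load level.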

Description

Vehicle-mounted voice control method, electronic equipment and vehicle
Technical Field
The disclosure relates to the technical field of vehicle control, and in particular to a vehicle-mounted voice control method, electronic equipment and a vehicle.
Background
With the rapid development of Internet of Vehicles technology, vehicle-mounted voice interaction has become an important human-machine interaction mode for improving driving safety and convenience. It enables voice interaction between the driver and the vehicle and provides functions such as navigation, music playback and phone calls. Within vehicle-mounted voice interaction, the voice instruction recommendation function is an important component: it helps users get started quickly, guides them to discover and understand the available functions, and improves the user experience.
In current mainstream vehicle-mounted voice systems, the voice instruction recommendation function mostly works in a fixed, static manner. Common schemes include carousel recommendations based on a preset hot-word list, a limited set of associated instructions tied to the current application interface, or recommendations simply ordered by global usage frequency. Such schemes can neither adapt to the personalized usage habits and preferences of different users nor recommend suitable instructions according to the real-time driving environment, which harms voice interaction efficiency and the user experience.
Therefore, how to intelligently recommend instructions according to user preferences and the driving environment is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To solve the above technical problem, the disclosure provides a vehicle-mounted voice control method, electronic equipment and a vehicle.
A first aspect of an embodiment of the present disclosure provides a vehicle-mounted voice control method, applied to an electronic device, comprising: in response to wake-up of a voice interaction function in a vehicle, performing data fusion processing on acquired multi-modal data to obtain corresponding multi-modal fusion data, wherein the multi-modal data comprises user-related data and vehicle-related data; constructing a dynamic user portrait, the dynamic user portrait comprising a plurality of dimension attributes characterizing a real-time state of a user; performing intention prediction processing based on the multi-modal fusion data and the dynamic user portrait to obtain a potential intention set; and generating a voice instruction recommendation list based on the potential intention set and a preset recommendation strategy, and displaying the voice instruction recommendation list.
In some embodiments of the present disclosure, the performing data fusion processing on the acquired multi-modal data to obtain corresponding multi-modal fusion data comprises: extracting features from the acquired multi-modal data to obtain corresponding multi-modal feature vectors, wherein the multi-modal feature vectors comprise a user preference feature vector, a contextual feature vector, a vehicle state feature vector and a historical interaction feature vector; and performing weighted fusion on the multi-modal feature vectors based on a target weight to obtain the multi-modal fusion data, wherein the target weight is dynamically adjusted according to the multi-modal fusion data.
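
The weighted fusion step described above can be illustrated with a minimal sketch. It assumes that the four feature vectors have already been extracted with equal length and that the fusion is a normalized weighted sum; the source does not state the fusion formula or how the target weight is dynamically adjusted, so the weights are fixed here and the function name, dictionary keys and numeric values are hypothetical.

```python
from typing import Dict, List


def weighted_fusion(features: Dict[str, List[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Fuse equal-length feature vectors by a normalized weighted sum.

    Assumption: the fusion is a plain weighted average. The dynamic
    adjustment of the target weight mentioned in the source is not
    modeled, because the source does not say how it is done.
    """
    if not features:
        raise ValueError("no feature vectors given")
    lengths = {len(vec) for vec in features.values()}
    if len(lengths) != 1:
        raise ValueError("all feature vectors must have the same length")
    total = sum(weights[name] for name in features)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * lengths.pop()
    for name, vec in features.items():
        w = weights[name] / total  # normalize so the weights sum to 1
        for i, value in enumerate(vec):
            fused[i] += w * value
    return fused


# Hypothetical two-dimensional feature vectors for the four modalities.
features = {
    "user_preference": [1.0, 0.0],
    "contextual": [0.0, 1.0],
    "vehicle_state": [1.0, 1.0],
    "historical_interaction": [0.0, 0.0],
}
weights = {
    "user_preference": 2.0,
    "contextual": 1.0,
    "vehicle_state": 1.0,
    "historical_interaction": 0.0,
}

fused = weighted_fusion(features, weights)
```

With these illustrative inputs the weights normalize to 0.5, 0.25, 0.25 and 0, so the fused vector is a blend dominated by the user preference features.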
In some embodiments of the present disclosure, the dynamic user portrait comprises a static attribute, a dynamic attribute and a contextual attribute; wherein the constructing a dynamic user portrait comprises: constructing the static attribute based on the user preference feature vector; constructing the dynamic attribute based on the historical interaction feature vector; constructing the contextual attribute based on the contextual feature vector and the vehicle state feature vector; and constructing the dynamic user portrait based on the static attribute, the dynamic attribute and the contextual attribute.
In some embodiments of the present disclosure, the performing intention prediction processing based on the multi-modal fusion data and the dynamic user portrait to obtain a potential intention set comprises: performing intention recognition on the multi-modal fusion data and the dynamic user portrait to obtain an actual instruction intention; and performing association prediction on the actual instruction intention based on a pre-constructed intention association tree to obtain the potential intention set, wherein the intention association tree is obtained by pre-training on the multi-modal fusion data, and the potential intention set comprises a plurality of potential instruction intentions.
In some embodiments of the present disclosure, the performing, based on the pre-constructed intent association tree, association prediction on t