CN-117854493-B - Vehicle cabin voice intention recognition method and device and vehicle control method


Abstract

The application discloses a vehicle cabin voice intention recognition method, a vehicle cabin voice intention recognition device, and a vehicle control method. The method comprises: acquiring text data of a current voice input of a user, current expression and limb data of the user, and internal and external environment data of a vehicle; performing user intention recognition on the current voice input text data of the user by using a multi-classification model to obtain a first intention recognition result; performing user intention recognition, by a multi-modal recognition model, based on the current voice input text data of the user, historical voice input text data of the user, the current expression and limb data of the user, and the internal and external environment data of the vehicle to obtain a second intention recognition result; and fusing the first intention recognition result and the second intention recognition result to obtain a real intention recognition result of the user. Fusing the intention recognition results of the multi-classification model and the multi-modal recognition model can improve the accuracy of voice intention understanding, and thereby the intelligence and user experience of the vehicle cabin dialogue system.

Inventors

CAO MING
ZHANG XUAN
YIN CHAOJUN

Assignees

Zhejiang Geely Holding Group Co., Ltd.
Geely Automobile Research Institute (Ningbo) Co., Ltd.

Dates

Publication Date: 20260505
Application Date: 20240108

Claims (10)

1. A vehicle cabin voice intention recognition method, comprising: acquiring text data of a current voice input of a user, current expression and limb data of the user, and internal and external environment data of a vehicle; performing user intention recognition on the current voice input text data of the user by using a multi-classification model to obtain a first intention recognition result; performing user intention recognition, by a multi-modal recognition model, based on the current voice input text data of the user, historical voice input text data of the user, the current expression and limb data of the user, and the internal and external environment data of the vehicle to obtain a second intention recognition result; and fusing the first intention recognition result and the second intention recognition result to obtain a real intention recognition result of the user; wherein the first intention recognition result comprises a first predicted intention and its predicted probability, and the second intention recognition result comprises a second predicted intention and its predicted probability; and wherein the fusing comprises: when the predicted probability of the first predicted intention is smaller than a first preset threshold, the predicted probability of the second predicted intention is smaller than a second preset threshold, and the first predicted intention and the second predicted intention are the same, increasing the predicted probability, and taking the first predicted intention with the increased predicted probability, or the second predicted intention with the increased predicted probability, as the real intention recognition result of the user; and when the predicted probability of the first predicted intention is smaller than the first preset threshold, the predicted probability of the second predicted intention is smaller than the second preset threshold, and the first predicted intention and the second predicted intention are not the same, reducing the predicted probability, and taking the predicted intention with the reduced predicted probability as the real intention recognition result of the user.
2. The method of claim 1, further comprising storing the user's current speech input text data as user historical speech input text data.
3. The vehicle cabin voice intention recognition method according to claim 1, wherein fusing the first intention recognition result and the second intention recognition result to obtain the real intention recognition result of the user further comprises: judging, according to the intention category of the first intention recognition result, whether entity information needs to be supplemented; when entity information needs to be supplemented, extracting entity information corresponding to the first intention recognition result from the current voice input text data of the user; and fusing the first intention recognition result, the entity information, and the second intention recognition result to obtain the real intention recognition result of the user.
4. The vehicle cabin voice intention recognition method according to claim 1, wherein acquiring the text data of the current voice input of the user comprises: acquiring current voice input information of the user; and performing content recognition on the current voice input information by a voice recognition module to obtain the current voice input text data of the user.
5. The vehicle cabin voice intent recognition method of claim 1, wherein the vehicle interior-exterior environmental data includes at least one of vehicle interior temperature data, vehicle interior barometric pressure data, vehicle exterior temperature data, and weather conditions.
6. The vehicle cabin voice intent recognition method of claim 1, wherein the multi-classification model comprises a decision tree model or a deep learning model, and the multi-modal recognition model comprises a CogVLM model or a GPT model.
7. The vehicle cabin voice intention recognition method according to claim 1, wherein the multi-modal recognition model is deployed in the cloud.
8. The vehicle cabin voice intention recognition method according to claim 1, wherein fusing the first intention recognition result and the second intention recognition result to obtain the real intention recognition result of the user comprises: when the predicted probability of the first predicted intention is greater than or equal to the first preset threshold, taking the first predicted intention and its predicted probability as the real intention recognition result of the user; and when the predicted probability of the first predicted intention is smaller than the first preset threshold and the predicted probability of the second predicted intention is greater than or equal to the second preset threshold, taking the second predicted intention and its predicted probability as the real intention recognition result of the user.
9. A voice intention recognition device, comprising: a data acquisition module, configured to acquire text data of a current voice input of a user, current expression and limb data of the user, and internal and external environment data of a vehicle; a first intention recognition module, configured to perform user intention recognition on the current voice input text data of the user by using a multi-classification model to obtain a first intention recognition result; a second intention recognition module, configured to perform user intention recognition, by a multi-modal recognition model, based on the current voice input text data of the user, historical voice input text data of the user, the current expression and limb data of the user, and the internal and external environment data of the vehicle to obtain a second intention recognition result; and a result fusion module, configured to fuse the first intention recognition result and the second intention recognition result to obtain a real intention recognition result of the user; wherein the first intention recognition result comprises a first predicted intention and its predicted probability, and the second intention recognition result comprises a second predicted intention and its predicted probability; and wherein the fusing comprises: when the predicted probability of the first predicted intention is smaller than a first preset threshold, the predicted probability of the second predicted intention is smaller than a second preset threshold, and the first predicted intention and the second predicted intention are the same, increasing the predicted probability, and taking the first predicted intention with the increased predicted probability, or the second predicted intention with the increased predicted probability, as the real intention recognition result of the user; and when the predicted probability of the first predicted intention is smaller than the first preset threshold, the predicted probability of the second predicted intention is smaller than the second preset threshold, and the first predicted intention and the second predicted intention are not the same, reducing the predicted probability, and taking the predicted intention with the reduced predicted probability as the real intention recognition result of the user.
10. A vehicle control method, comprising: acquiring text data of a current voice input of a user, current expression and limb data of the user, and internal and external environment data of a vehicle; performing user intention recognition on the current voice input text data of the user by using a multi-classification model to obtain a first intention recognition result; performing user intention recognition, by a multi-modal recognition model, based on the current voice input text data of the user, historical voice input text data of the user, the current expression and limb data of the user, and the internal and external environment data of the vehicle to obtain a second intention recognition result; fusing the first intention recognition result and the second intention recognition result to obtain a real intention recognition result of the user; and generating a vehicle control instruction according to the real intention recognition result of the user to realize vehicle cabin control; wherein the first intention recognition result comprises a first predicted intention and its predicted probability, and the second intention recognition result comprises a second predicted intention and its predicted probability; and wherein the fusing comprises: when the predicted probability of the first predicted intention is smaller than a first preset threshold, the predicted probability of the second predicted intention is smaller than a second preset threshold, and the first predicted intention and the second predicted intention are the same, increasing the predicted probability, and taking the first predicted intention with the increased predicted probability, or the second predicted intention with the increased predicted probability, as the real intention recognition result of the user; and when the predicted probability of the first predicted intention is smaller than the first preset threshold, the predicted probability of the second predicted intention is smaller than the second preset threshold, and the first predicted intention and the second predicted intention are not the same, reducing the predicted probability, and taking the predicted intention with the reduced predicted probability as the real intention recognition result of the user.
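
The threshold-based fusion recited in claims 1 and 8 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `IntentResult` and `fuse`, the default thresholds of 0.8, and the `boost` and `penalty` factors are all hypothetical. The claims say only that the probability is increased when the two predicted intentions agree and reduced when they differ; the multiplicative adjustment, the choice of the higher probability as the base, and the choice of the more probable candidate on disagreement are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentResult:
    intent: str         # predicted intention label
    probability: float  # predicted probability in [0, 1]


def fuse(first: IntentResult, second: IntentResult,
         t1: float = 0.8, t2: float = 0.8,
         boost: float = 1.2, penalty: float = 0.8) -> IntentResult:
    """Fuse the multi-classification result (first) with the
    multi-modal result (second). t1/t2 are the first/second preset
    thresholds; boost/penalty are assumed adjustment factors."""
    # Claim 8: the first result is confident enough on its own.
    if first.probability >= t1:
        return first
    # Claim 8: first is below t1, second is confident enough.
    if second.probability >= t2:
        return second
    # Claim 1: both are below threshold and the intentions agree,
    # so the probability is increased (formula is an assumption).
    if first.intent == second.intent:
        raised = min(1.0, max(first.probability, second.probability) * boost)
        return IntentResult(first.intent, raised)
    # Claim 1: both are below threshold and the intentions differ,
    # so the probability is reduced. Keeping the more probable
    # candidate is an assumption; the claim does not say which one.
    best = first if first.probability >= second.probability else second
    return IntentResult(best.intent, best.probability * penalty)


if __name__ == "__main__":
    print(fuse(IntentResult("turn_on_ac", 0.5), IntentResult("turn_on_ac", 0.6)))
    print(fuse(IntentResult("open_window", 0.5), IntentResult("turn_on_ac", 0.4)))
```

A downstream dialogue system could, for example, compare the returned probability with a further cut-off before generating a vehicle control instruction; that step is likewise outside what the claims specify.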

Description

Vehicle cabin voice intention recognition method and device and vehicle control method

Technical Field

The application relates to the technical field of intelligent automobiles, in particular to a vehicle cabin voice intention recognition method, a vehicle cabin voice intention recognition device and a vehicle control method.

Background

The cabin is the part of a vehicle with which drivers and passengers interact most closely. With the continuous development of intelligent cabin and automatic driving technologies, the vehicle is no longer only a tool that helps people travel but is gradually becoming a link in people's diversified living space, so human-computer interaction in the cabin is particularly important.

At present, the most common mode of cabin human-computer interaction is voice interaction: drivers and passengers issue commands to the cabin by voice, the cabin converses with the user after processing such as speech recognition, semantic understanding, and speech synthesis, recognizes the user's intention, and executes cabin control according to the recognized intention. This mode relies on the current voice data of the driver and passengers to recognize the user's intention. Because that data is often too terse and lacks necessary information, the user's intention cannot be accurately recognized, which degrades the intelligence of cabin human-computer interaction and the user experience.
Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present application is to provide a vehicle cabin voice intention recognition method, a vehicle cabin voice intention recognition device, and a vehicle control method, which overcome, at least in part, one or more of the problems caused by the limitations and disadvantages of the related art.

To achieve the above and other related objects, the present application provides a vehicle cabin voice intention recognition method, comprising:

acquiring text data of a current voice input of a user, current expression and limb data of the user, and internal and external environment data of a vehicle;
performing user intention recognition on the current voice input text data of the user by using a multi-classification model to obtain a first intention recognition result;
performing user intention recognition, by a multi-modal recognition model, based on the current voice input text data of the user, historical voice input text data of the user, the current expression and limb data of the user, and the internal and external environment data of the vehicle to obtain a second intention recognition result; and
fusing the first intention recognition result and the second intention recognition result to obtain a real intention recognition result of the user.

In an optional embodiment of the present application, the current voice input text data of the user is saved as historical voice input text data of the user.
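
The flow of the four steps above, together with the optional saving of the current text as history, can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the application's implementation. `IntentPipeline`, `CabinContext`, and the three callables are hypothetical names, the models are stubbed with lambdas, and the fusion used here simply keeps the more probable result as a placeholder rather than the threshold rule recited in the claims.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (intention label, predicted probability)


@dataclass
class CabinContext:
    expression_limb: str  # description of the user's expression / gesture
    environment: dict     # e.g. {"cabin_temp_c": 31.0, "weather": "sunny"}


@dataclass
class IntentPipeline:
    classify: Callable[[str], Prediction]                             # multi-classification model
    multimodal: Callable[[str, List[str], CabinContext], Prediction]  # multi-modal model
    fuse: Callable[[Prediction, Prediction], Prediction]              # result fusion
    history: List[str] = field(default_factory=list)

    def recognize(self, text: str, context: CabinContext) -> Prediction:
        # First result: current voice input text only.
        first = self.classify(text)
        # Second result: current text, historical text, expression and
        # limb data, and environment data.
        second = self.multimodal(text, list(self.history), context)
        fused = self.fuse(first, second)
        # Optional embodiment: save the current text as history.
        self.history.append(text)
        return fused


# Stub models standing in for real ones (assumptions for illustration).
pipeline = IntentPipeline(
    classify=lambda text: ("unknown", 0.3),
    multimodal=lambda text, hist, ctx: ("lower_temperature", 0.9),
    fuse=lambda a, b: a if a[1] >= b[1] else b,  # placeholder fusion
)
ctx = CabinContext("fanning face with hand", {"cabin_temp_c": 31.0})
result = pipeline.recognize("it's so hot", ctx)
print(result, pipeline.history)
```

Passing the models in as callables keeps the sketch independent of where each model runs, which matches the optional embodiment in which the multi-modal recognition model is deployed in the cloud.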
In an optional embodiment of the present application, fusing the first intention recognition result and the second intention recognition result to obtain the real intention recognition result of the user further includes:

judging, according to the intention category of the first intention recognition result, whether entity information needs to be supplemented;
when entity information needs to be supplemented, extracting entity information corresponding to the first intention recognition result from the current voice input text data of the user; and
fusing the first intention recognition result, the entity information, and the second intention recognition result to obtain the real intention recognition result of the user.

In an optional embodiment of the present application, acquiring the text data of the current voice input of the user includes: acquiring current voice input audio data of the user; and performing content recognition on the current voice input audio data of the user by a voice recognition module to obtain the current voice input text data of the user.

In an optional embodiment of the present application, the vehicle interior and exterior environment data includes at least one of vehicle interior temperature data, vehicle interior air pressure data, vehicle exterior temperature data, and weather conditions.

In an optional embodiment of the present application, the multi-classification model comprises a decision tree model or a deep learning model, and the multi-modal recognition model comprises a CogVLM model or a GPT model.

In an optional embodiment of the present application, the multi-modal recognition model is deployed in the cloud.

In an optional embodiment of the application, the first intention recognition result inc