CN-121983051-A - Voice interaction control method and device, electronic equipment and storage medium

Abstract

Embodiments of the present disclosure provide a voice interaction control method and apparatus, an electronic device, and a storage medium. The voice interaction control method comprises: when voice interaction is in an awake state, monitoring an audio signal of the environment and extracting first voice feature information from the audio signal; obtaining a first recognition text based on the first voice feature information; determining, according to the first recognition text, whether a complete intention is obtained; when the complete intention is obtained according to the first recognition text, initializing an awake state countdown of the voice interaction so that the voice interaction remains in the awake state until the countdown reaches zero; and, when the awake state countdown reaches zero while a multi-round dialogue state is marked, keeping the voice interaction in the awake state. The method enables continuous interaction between the user and the voice interaction control module of the electronic device and improves the user experience.

Inventors

HE YONGQIANG

Assignees

北京罗克维尔斯科技有限公司

Dates

Publication Date: 20260505
Application Date: 20220413

Claims (18)

1. A voice interaction control method, comprising: when voice interaction is in an awake state, monitoring an audio signal of the environment and extracting first voice feature information from the audio signal; when the first voice feature information is extracted, obtaining a first recognition text based on the first voice feature information; determining, according to the first recognition text, whether a complete intention is acquired, wherein the complete intention is an intention whose information slots are all filled; when the complete intention is acquired according to the first recognition text, initializing an awake state countdown of the voice interaction, so that the voice interaction is kept in the awake state before the awake state countdown reaches zero; and when the awake state countdown reaches zero and a multi-round dialogue state is marked, keeping the voice interaction in the awake state, wherein the multi-round dialogue refers to a dialogue in which, after one voice interaction with the user, at least one further voice interaction is performed to obtain the complete intention.
2. The method of claim 1, wherein, when the awake state countdown has been initialized and has not reached zero, the method further comprises: continuing to monitor the audio signal of the environment and extracting second voice feature information from the audio signal; when the second voice feature information is extracted, processing the second voice feature information to obtain a second recognition text; determining, according to the second recognition text, whether a first valid intention is acquired; and when the first valid intention is acquired, reinitializing the awake state countdown of the voice interaction.
3. The method of claim 2, wherein, when the first valid intention is acquired, the method further comprises: judging whether the first valid intention is a complete intention; and when the first valid intention is judged to be a complete intention, performing an operation matching the first valid intention and reinitializing the awake state countdown of the voice interaction.
4. The method of claim 3, wherein, when the first valid intention is judged not to be a complete intention, the method further comprises: determining an information slot to be filled according to the first valid intention; generating and outputting slot filling prompt information according to the information slot to be filled, and marking a multi-round dialogue state, wherein the multi-round dialogue state is an interaction state in which the information slot to be filled still needs to be filled; and when the awake state countdown reaches zero and the multi-round dialogue state is marked, keeping the voice interaction in the awake state.
5. The method of claim 2, wherein determining, according to the second recognition text, whether the first valid intention is acquired comprises: when the second recognition text is determined to be a valid semantic text and is not a blacklist text, determining that the first valid intention is acquired.
6. A voice interaction control method, comprising: when voice interaction is in an awake state, monitoring an audio signal of the environment; determining, according to the audio signal, whether a complete intention is acquired; and when it is determined according to the audio signal that the complete intention is acquired, initializing an awake state countdown of the voice interaction; wherein monitoring the audio signal of the environment when the voice interaction is in the awake state comprises: obtaining a third recognition text according to the audio signal; when the third recognition text does not include matching information that matches an information slot to be filled, and a second valid intention acquired according to the third recognition text is determined to be a complete intention, judging whether the second valid intention is an intention whose corresponding operation requires interactive feedback; and when the second valid intention is an intention requiring interactive feedback, marking a non-multi-round dialogue state and reinitializing the awake state countdown of the voice interaction.
7. The method of claim 6, wherein obtaining the third recognition text according to the audio signal comprises: extracting third voice feature information from the audio signal; and when the third voice feature information is extracted, processing the third voice feature information to obtain the third recognition text.
8. The method of claim 7, wherein, after the third voice feature information is extracted and processed to obtain the third recognition text, the method further comprises: judging whether the third recognition text includes matching information that matches the information slot to be filled; when the third recognition text is judged to include the matching information, filling the matching information into the information slot to be filled; judging again whether the first valid intention is a complete intention; and when the re-judgment determines that the first valid intention is a complete intention, performing an operation corresponding to the first valid intention, marking the end of the multi-round dialogue state, and reinitializing the awake state countdown of the voice interaction.
9. The method of claim 8, wherein, when the third recognition text does not include the matching information, the method further comprises: determining, according to the third recognition text, whether a second valid intention is acquired; when the second valid intention is acquired, judging whether the second valid intention is a whitelist intention; and when the second valid intention is determined to be a whitelist intention, performing an operation corresponding to the second valid intention, marking a non-multi-round dialogue state, and reinitializing the awake state countdown of the voice interaction.
10. The method of claim 9, further comprising: when the second valid intention is not a whitelist intention, outputting the slot filling prompt information again.
11. The method of claim 9, wherein performing the operation corresponding to the second valid intention comprises: judging whether the second valid intention is a complete intention; and when the second valid intention is a complete intention, reinitializing the awake state countdown of the voice interaction.
12. The method of claim 6, further comprising: when the second valid intention is an intention that does not require voice interaction feedback, keeping the multi-round dialogue state.
13. The method of claim 6, further comprising: when the second valid intention is an intention that does not require voice interaction feedback, outputting the slot filling prompt information again.
14. The method according to any of claims 1-13, wherein, when the awake state countdown is about to reach zero, the method further comprises: reinitializing the awake state countdown, or keeping the voice interaction in the awake state, when the second voice feature information has not been interrupted, or when it has not yet been determined according to the second recognition text whether the first valid intention is acquired, or when the operation matching the first valid intention or the second valid intention has not been completed.
15. A voice interaction control apparatus, comprising: a first monitoring unit, configured to monitor an audio signal of the environment when voice interaction is in an awake state and to extract first voice feature information from the audio signal; a text recognition unit, configured to obtain a first recognition text based on the first voice feature information when the first voice feature information is extracted; a first determining unit, configured to determine, according to the first recognition text, whether a complete intention is acquired, wherein the complete intention is an intention whose information slots are all filled; and a first control unit, configured to initialize an awake state countdown of the voice interaction when the complete intention is acquired according to the first recognition text, so that the voice interaction is kept in the awake state before the awake state countdown reaches zero; wherein the first control unit is further configured to keep the voice interaction in the awake state when the awake state countdown reaches zero and a multi-round dialogue state is marked, the multi-round dialogue referring to a dialogue in which, after one voice interaction with the user, at least one further voice interaction is performed to obtain the complete intention.
16. A voice interaction control apparatus, comprising: a second monitoring unit, configured to monitor an audio signal of the environment when voice interaction is in an awake state; a second determining unit, configured to determine, according to the audio signal, whether a complete intention is acquired; and a second control unit, configured to initialize an awake state countdown of the voice interaction when it is determined according to the audio signal that the complete intention is acquired; wherein the second monitoring unit is further configured to: obtain a third recognition text according to the audio signal; when the third recognition text does not include matching information that matches an information slot to be filled, and a second valid intention acquired according to the third recognition text is determined to be a complete intention, judge whether the second valid intention is an intention whose corresponding operation requires interactive feedback; and when the second valid intention is an intention requiring interactive feedback, mark a non-multi-round dialogue state and reinitialize the awake state countdown of the voice interaction.
17. An electronic device comprising a processor and a memory, the memory for storing a computer program; the computer program, when loaded by the processor, causes the processor to perform the voice interaction control method of any of claims 1-14.
18. A computer readable storage medium, characterized in that the storage medium stores a computer program, which when executed by a processor causes the processor to implement the voice interaction control method according to any of claims 1-14.
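
As an informal illustration only, the follow-up handling recited in claims 8 to 10 could be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the returned action labels, and the idea that slot values arrive as an already extracted dictionary are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Slot name -> filled value, or None while the slot is still to be filled.
Slots = Dict[str, Optional[str]]


def missing_slots(slots: Slots) -> List[str]:
    """Return the names of the information slots that are still unfilled."""
    return [name for name, value in slots.items() if value is None]


def handle_follow_up(
    slots: Slots,
    extracted: Dict[str, str],      # hypothetical: slot values found in the third recognition text
    second_intent: Optional[str],   # hypothetical: name of a second valid intention, if any
    whitelist: Set[str],            # hypothetical: intentions allowed to interrupt slot filling
) -> Tuple[str, bool]:
    """Return (action, multi_round_flag) for one follow-up utterance."""
    pending = missing_slots(slots)
    matches = {k: v for k, v in extracted.items() if k in pending}
    if matches:
        # Claim 8: fill the matching information, then re-judge completeness.
        slots.update(matches)
        if not missing_slots(slots):
            return "execute_first_intent", False  # multi-round dialogue ends
        return "prompt_for_slots", True
    if second_intent is not None and second_intent in whitelist:
        # Claim 9: a whitelist intention is executed and the dialogue is
        # marked as non-multi-round.
        return "execute_second_intent", False
    # Claim 10: otherwise the slot filling prompt is output again.
    return "prompt_for_slots", True
```

In this sketch the caller would reinitialize the awake state countdown whenever the returned flag is False, which mirrors the countdown reinitialization recited in claims 8 and 9.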

Description

Voice interaction control method and device, electronic equipment and storage medium

This application is a divisional application filed after the parent application (application number 2022103891229, filing date April 13, 2022) was rejected, and the content of the divisional application does not go beyond the scope of the original application.

Technical Field

The disclosure relates to the technical field of human-computer interaction, and in particular to a voice interaction control method, a device, electronic equipment and a storage medium.

Background

In intelligent cabin technology, voice interaction control reduces how much the user's hands and eyes are involved. During driving, it can therefore reduce the time the driver's eyes are off the road and the time the hands are off the steering wheel, which improves driving safety. For this reason, a voice interaction control module is commonly configured in newly mass-produced high-end vehicles. While driving, the user can wake up the voice interaction control module through a specific voice instruction or a key instruction, which triggers the module to respond to the voice control instructions spoken afterwards and to execute the corresponding control operations. In addition, to avoid collecting voice feature information that the user utters unintentionally, and thus to avoid acquiring erroneous instructions, the voice interaction control module enters a dormant state after completing an interaction with the user.
However, in the related art, once the voice interaction control module has been woken up by the user and has received and executed a complete voice control instruction, it enters the dormant state. The user must then wake up the module again before it will respond to the next voice control instruction, so the user experience is poor.

Disclosure of Invention

In order to solve the above technical problems, embodiments of the present disclosure provide a voice interaction control method, an apparatus, an electronic device, and a storage medium.

In a first aspect, an embodiment of the present disclosure provides a voice interaction control method, including: when voice interaction is in an awake state, monitoring an audio signal of the environment and extracting first voice feature information from the audio signal; when the first voice feature information is extracted, obtaining a first recognition text based on the first voice feature information; determining, according to the first recognition text, whether a complete intention is acquired, wherein the complete intention is an intention whose information slots are all filled; when the complete intention is acquired according to the first recognition text, initializing an awake state countdown of the voice interaction, so that the voice interaction is kept in the awake state before the awake state countdown reaches zero; and when the awake state countdown reaches zero and a multi-round dialogue state is marked, keeping the voice interaction in the awake state, wherein the multi-round dialogue refers to a dialogue in which, after one voice interaction with the user, at least one further voice interaction is performed to obtain the complete intention.
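
The countdown behaviour of the first aspect can be illustrated with a small state sketch. It is only a sketch under stated assumptions: the class names `Intent` and `WakeSession`, the 10-second timeout, the explicit `now` timestamps, and the rule that the module goes dormant when the countdown expires outside a multi-round dialogue are hypothetical choices made for illustration and are not specified in the text above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Intent:
    name: str
    slots: Dict[str, Optional[str]]  # slot name -> value, None if unfilled

    def is_complete(self) -> bool:
        # A complete intention is one whose information slots are all filled.
        return all(value is not None for value in self.slots.values())


@dataclass
class WakeSession:
    timeout_s: float = 10.0           # hypothetical countdown length
    awake: bool = True
    multi_round: bool = False         # the "multi-round dialogue" mark
    deadline: Optional[float] = None  # None until the countdown is initialized

    def on_intent(self, intent: Intent, now: float) -> None:
        """Initialize (or reinitialize) the countdown on a complete intention."""
        if intent.is_complete():
            self.deadline = now + self.timeout_s

    def tick(self, now: float) -> bool:
        """Update and return the awake state at time `now`."""
        expired = self.deadline is not None and now >= self.deadline
        # Assumption: expiry outside a multi-round dialogue ends the awake
        # state; with the multi-round mark set, the session stays awake.
        if expired and not self.multi_round:
            self.awake = False
        return self.awake
```

Timestamps are passed in explicitly so the sketch stays deterministic; a real module would more likely read a monotonic clock such as `time.monotonic()`.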
In a second aspect, an embodiment of the present disclosure provides a voice interaction control method, including: when voice interaction is in an awake state, monitoring an audio signal of the environment; determining, according to the audio signal, whether a complete intention is acquired; and when it is determined according to the audio signal that the complete intention is acquired, initializing an awake state countdown of the voice interaction; wherein monitoring the audio signal of the environment when the voice interaction is in the awake state comprises: obtaining a third recognition text according to the audio signal; when the third recognition text does not include matching information that matches an information slot to be filled, and a second valid intention acquired according to the third recognition text is determined to be a complete intention, judging whether the second valid intention is an intention whose corresponding operation requires interactive feedback; and when the second valid intention is an intention requiring interactive feedback, marking a non-multi-round dialogue state and reinitializing the awake state countdown of the voice interaction.

In a third aspect, an embodiment of the present disclosure further provides a voice interaction control apparatus, including: the first monitoring unit i