EP-3923274-B1 - VOICE INTERACTION METHOD AND ELECTRONIC DEVICE

Inventors

LUO, Hongfeng
ZHAO, Yuxi
ZHANG, Wen

Dates

Publication Date: 20260506
Application Date: 20200314

Claims (8)

1. A voice interaction method performed by an electronic device, the method comprising the steps of:
• displaying a first interface in response to an operation of waking up a voice assistant application by a user, wherein the first interface is used to display dialog content between the user and the voice assistant application;
• receiving (S501) first voice input of the user, wherein the first voice input comprises first slot information;
• displaying (S505) a first card in the first interface in response to the first voice input, wherein the first card comprises N candidate options of the first slot information, the N candidate options are in a one-to-one correspondence with N query requests, each query request in the N query requests carries a corresponding candidate option of the first slot information, and N ≥ 1;
• after the displaying the first card in the first interface:
  ∘ receiving second voice input, wherein the second voice input comprises a filtering condition, the filtering condition serving for filtering the N candidate options based on the filtering condition, thereby arriving at one or more candidate options among the N candidate options that meet the filtering condition, and
  ∘ displaying a second card that comprises the one or more candidate options that meet the filtering conditions,
• in response to an operation of selecting a first candidate option among the one or more candidate options contained in the second card by the user, sending (S506) a first query request corresponding to the first candidate option to a first server, to provide a service result corresponding to the first voice input to the user,
• wherein after the displaying (S505) the first card in the first interface, the method further comprises:
  ∘ displaying (S509) a second interface after the electronic device switches the voice assistant application running from foreground to background; and
  ∘ displaying the first interface again after the electronic device switches the voice assistant application running to foreground again.

2. The method according to claim 1, wherein the operation of selecting the first candidate option from the one or more candidate options comprises: inputting, to the electronic device, third voice input that comprises the first candidate option, or tapping the first candidate option in the second card.

3. The method according to claim 1, wherein the first voice input further comprises second slot information, and after the sending the first query request corresponding to the first candidate option to the first server, the method further comprises:
  ∘ displaying a third card in the first interface, wherein the third card comprises M candidate options of the second slot information, the M candidate options are in a one-to-one correspondence with M query requests, the M query requests all carry the first candidate option selected by the user, each query request in the M query requests carries a corresponding candidate option of the second slot information, and M ≥ 1; and
  ∘ in response to an operation of selecting a second candidate option from the M candidate options by the user, sending a second query request corresponding to the second candidate option to the first server.

4. The method according to claim 3, wherein after the displaying the third card in the first interface, the method further comprises:
  ∘ displaying the second interface after the electronic device switches the voice assistant application running from foreground to background; and
  ∘ displaying the first interface again after the electronic device switches the voice assistant application running to foreground again.

5. The method according to claim 3 or 4, wherein the operation of selecting the second candidate option from the M candidate options comprises: tapping the second candidate option in the third card, or inputting, to the electronic device, third voice input that comprises the second candidate option.

6. The method according to any one of claims 1 to 5, wherein after the receiving the first voice input of the user, the method further comprises:
  ∘ sending the first voice input to the first server; and
  ∘ receiving the one-to-one correspondence that is between the N candidate options and the N query requests and that is sent by the first server.

7. The method according to any one of claims 3 to 5, wherein after the sending the first query request corresponding to the first candidate option to the first server, the method further comprises:
  ∘ receiving the one-to-one correspondence that is between the M candidate options and the M query requests and that is sent by the first server.

8. An electronic device (100) comprising:
  ∘ a touchscreen (194), wherein the touchscreen (194) comprises a touch-sensitive surface and a display screen;
  ∘ a communications module (150, 160);
  ∘ one or more processors (110);
  ∘ one or more memories (121);
  ∘ one or more microphones (170C); and
  ∘ one or more computer programs, wherein the one or more computer programs are stored in the one or more memories, the one or more computer programs comprise instructions, and when the instructions are executed by the electronic device (100), the electronic device is caused to perform the voice interaction method according to any one of claims 1 to 7.
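
Illustrative sketch (not part of the claims). The flow recited in claim 1 can be pictured roughly as follows: a card holds candidate options in one-to-one correspondence with query requests, a second voice input narrows the options, a selection sends exactly one request, and the dialog state survives a switch to background. All names here (`QueryRequest`, `Card`, `DialogSession`, `build_card`, `select`) are hypothetical, and the case-insensitive substring match used as the filtering condition is an assumption; the claims do not specify how a filtering condition is evaluated.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRequest:
    """Hypothetical query request carrying one candidate slot value."""
    intent: str
    slots: dict


@dataclass
class Card:
    """Maps each candidate option to exactly one query request."""
    slot_name: str
    options: dict  # candidate option (str) -> QueryRequest

    def filtered(self, condition: str) -> "Card":
        """Return a new card keeping only options that match the filter.

        Assumption: a filter matches if it occurs in the option text,
        ignoring case.
        """
        kept = {opt: req for opt, req in self.options.items()
                if condition.lower() in opt.lower()}
        return Card(self.slot_name, kept)


@dataclass
class DialogSession:
    """Keeps displayed cards so the first interface can be shown again."""
    cards: list = field(default_factory=list)
    foreground: bool = True

    def show(self, card: Card) -> None:
        self.cards.append(card)

    def to_background(self) -> None:
        # The cards are retained rather than discarded.
        self.foreground = False

    def to_foreground(self) -> Card:
        # Redisplay the most recently shown card.
        self.foreground = True
        return self.cards[-1]


def build_card(intent: str, slot_name: str, candidates: list) -> Card:
    """Create one query request per candidate option (N options, N requests)."""
    options = {c: QueryRequest(intent, {slot_name: c}) for c in candidates}
    return Card(slot_name, options)


def select(card: Card, option: str, send) -> None:
    """Send the single query request that corresponds to the chosen option."""
    send(card.options[option])
```

In this sketch, `send` stands in for whatever transport delivers the query request to the first server; passing `list.append` is enough to observe which request would be sent.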

Description

This application claims priority to Chinese Patent Application No. 201910224332.0, filed with the Chinese National Intellectual Property Administration on March 22, 2019 and entitled "VOICE INTERACTION METHOD AND ELECTRONIC DEVICE".

TECHNICAL FIELD

This application relates to the field of terminal technologies, and in particular, to a voice interaction method and an electronic device.

BACKGROUND

Human-computer interaction (HCI) is a process in which a person and a computer exchange information, through a specific interaction manner and by using a conversational language, to complete a specified task. Currently, electronic devices such as mobile phones make extensive use of graphical user interfaces (GUIs) to implement human-computer interaction with a user. With the development of voice recognition technology, a voice assistant (for example, Siri, Xiao Ai, or Celia) has been added to many electronic devices to help the user complete human-computer interaction with the electronic device.

Siri is used here as an example of the voice assistant. After the user wakes up Siri on the mobile phone, Siri may perform voice communication with the user by using a voice user interface (VUI). During voice communication, Siri may answer each query initiated by the user. However, when the voice communication between the user and Siri is interrupted, for example, if an incoming call is suddenly received while the user is in a dialog with Siri, the mobile phone automatically exits the current voice dialog with Siri. If the user expects to continue the voice communication with Siri, the user needs to wake up the voice assistant on the mobile phone again.
In other words, after a dialog process between the user and the voice assistant on the mobile phone is interrupted, the voice assistant cannot resume the current voice dialog with the user, and consequently the voice assistant on the mobile phone is inefficient.

The document US 2018/0336897 A1 describes a voice assistant system in which a user interacts with a mobile communication device. In particular, the user can issue voice commands and conduct a dialogue with the assistant. When a user query is ambiguous, the assistant asks suitable questions to resolve the ambiguity, to which the user again responds by voice commands.

US2018/308485A1 discloses systems and processes for operating a digital assistant. In one example, a method includes receiving a first speech input from a user. The method further includes identifying context information and determining a user intent based on the first speech input and the context information. The method further includes determining whether the user intent is to perform a task using a searching process or an object managing process. The searching process is configured to search data, and the object managing process is configured to manage objects. The method further includes, in accordance with a determination that the user intent is to perform the task using the searching process, performing the task using the searching process; and in accordance with a determination that the user intent is to perform the task using the object managing process, performing the task using the object managing process.

SUMMARY

The present invention is defined by the independent claims. Further advantageous developments are shown by the dependent claims.
This application provides a voice interaction method and an electronic device, so that after a dialog between a user and a voice assistant is interrupted, the voice assistant can resume the current dialog content with the user, to improve use efficiency and user experience of the voice assistant on the electronic device. To achieve the foregoing objective, the following technical solutions are used in this application. According to a first aspect, this application provides a voice interaction method, including: In response to an operation of waking up a voice assistant application by a user, an electronic device starts to run the voice assistant application in the foreground, and displays a first interface. The first interface is used to display dialog content between the user and the voice assistant application. Further, the user can provide voice input to the electronic device. First voice input of the user that is received by the electronic device is used as an example. The first voice input includes first slot information. If the semantics of the first slot information are not clear, for example, if the first slot information is departure place information and a plurality of locations on a map are related to the departure place information, the electronic device may display a first card in the first interface in response to the first voice input. The first card includes N (N ≥ 1) candidate options of the first slot information. The N candidat