EP-4055593-B1 - METHOD AND APPARATUS FOR PROVIDING VOICE ASSISTANT SERVICE

Inventors

SHIN, Jaesick
LEE, Sungho

Dates

Publication Date: 20260506
Application Date: 20210205

Claims (15)

A method of providing a voice assistant service, the method comprising: receiving (S410) a first voice command from a user; determining, from among a plurality of candidate devices, a plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) to which the first voice command is to be transmitted; transmitting (S420) information related to the first voice command to the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); respectively receiving, from the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), a plurality of service provision messages generated in response to receiving the first voice command and a plurality of pieces of service provision history information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), wherein the service provision history information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) includes time information indicating a current time when a service is selected in response to receiving the first voice command and location information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), and wherein the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) is prioritized based on the time information indicating the current time and the location information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); selecting (S440), based on the priorities assigned to the plurality of devices, at least one of the plurality of service provision messages; and outputting (S450), based on a result of the selecting, a response message in response to the first voice command.
The method of claim 1, wherein determining the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) comprises: converting the first voice command into a first text; interpreting the first text by using a natural language understanding (NLU) model; determining an intent of the user based on a result of the interpreting of the first text; and determining, based on a relevance between the intent of the user and each of the plurality of candidate devices, the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) to which the first voice command is to be transmitted, from among the plurality of candidate devices.
The method of claim 2, wherein determining the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) based on the relevance comprises: obtaining a plurality of pieces of device information respectively regarding the plurality of candidate devices; obtaining, based on the plurality of pieces of device information, a plurality of probability values indicating a degree of the relevance between the intent of the user and each of the plurality of candidate devices; and determining the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) having probability values that are greater than or equal to a threshold value.
The method of claim 1, wherein a first service provision message received from a first device among the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) includes at least one of information about whether the first device is able to provide a service in response to the first voice command, information about whether the first device is included in a first group, identification information of the first device, identification information of a first service provided by the first device in response to the first voice command, a type of the first service, or identification information of an application used to provide the first service.
The method of claim 4, wherein selecting the at least one of the plurality of service provision messages comprises: identifying, from among the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), one or more devices included in the first group; and selecting, based on the plurality of pieces of service provision history information of the identified one or more devices, at least one service provision message from among the plurality of service provision messages received from the identified one or more devices.
The method of claim 1, wherein service provision history information of a first device includes at least one of a number of times that a first service suggested by the first device in response to the first voice command was selected by the user or context information obtained when the first service was selected in response to the first voice command.
The method of claim 1, wherein: receiving the plurality of service provision messages and the plurality of pieces of service provision history information comprises respectively receiving, from the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), a plurality of pieces of device information, each piece of device information regarding sub-devices constituting a corresponding device, together with the plurality of service provision messages and the plurality of pieces of service provision history information, and selecting the at least one of the plurality of service provision messages comprises selecting at least one of the plurality of service provision messages, based on the plurality of pieces of device information regarding the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) and the plurality of pieces of service provision history information thereof.
The method of claim 1, wherein selecting the at least one of the plurality of service provision messages comprises selecting at least one of the plurality of service provision messages by using a service recommendation model, and wherein the service recommendation model comprises: an artificial intelligence (AI) algorithm trained based on a voice command, service provision histories for the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) with respect to the voice command, and a plurality of pieces of device information regarding the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5).
The method of claim 1, wherein selecting the at least one of the plurality of service provision messages comprises: assigning priorities to the plurality of service provision messages based on at least one of information about whether each device provides a service in response to the first voice command, a type of service provided by each device in response to the first voice command, a number of times that a service provided by each device in response to the first voice command was selected, context information obtained when the service was selected in response to the first voice command, or device information regarding sub-devices constituting each device; and selecting at least one of the plurality of service provision messages, based on the assigned priorities.
The method of claim 1, wherein selecting the at least one of the plurality of service provision messages comprises: assigning priorities to the plurality of service provision messages based on at least one of information about whether each device provides a service in response to the first voice command, a type of service provided by each device in response to the first voice command, a number of times that a service provided by each device in response to the first voice command was selected, context information obtained when the service provided by each device in response to the first voice command was selected, or device information regarding sub-devices constituting each device; and selecting, based on the assigned priorities, two or more service provision messages from among the plurality of service provision messages.
The method of claim 1, wherein: each of the plurality of service provision messages respectively received from the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) includes information related to a service provided by each of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) in response to the first voice command; and the outputting of the response message in response to the first voice command comprises: generating the response message to include information related to at least one service in the selected at least one of the plurality of service provision messages; and outputting the response message.
The method of claim 1, wherein: selecting the at least one of the plurality of service provision messages comprises: identifying, based on the plurality of service provision messages, one or more devices included in a first group from among the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); and by using a service recommendation model, selecting, based on the plurality of pieces of service provision history information of the identified one or more devices, at least one service provision message from among the plurality of service provision messages received from the identified one or more devices, the method further comprising: receiving a second voice command from the user; determining, based on the second voice command, a service selected by the user from among services provided by the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) in response to the first voice command; transmitting information related to the service selected by the user to the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); and training the service recommendation model by using the information related to the service selected by the user.
An apparatus for providing a voice assistant service, the apparatus comprising: a receiver (110) configured to receive a command from a user (10); a communicator (130); a memory (140) configured to store one or more instructions of a voice assistant program; and at least one processor (120) operably connected to the receiver (110), the communicator (130), and the memory (140), wherein the processor (120) is configured to: control the receiver (110) to receive a first voice command from the user (10); determine, from among a plurality of candidate devices, a plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) to which the first voice command is to be transmitted; control the communicator (130) to transmit information related to the first voice command to the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) and respectively receive, from the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), a plurality of service provision messages generated in response to receiving the first voice command and a plurality of pieces of service provision history information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); select, based on the priorities assigned to the plurality of devices, at least one of the plurality of service provision messages; and output, based on a result of the selecting, a response message in response to the first voice command, wherein the service provision history information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) includes time information indicating a current time when a service is selected in response to receiving the first voice command and location information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), and wherein the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) is prioritized based on the time information indicating the current time and the location information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5).
The apparatus of claim 13, wherein: the at least one processor is further configured to: identify, based on the plurality of service provision messages, one or more devices included in a first group from among the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); and by using a service recommendation model, select, based on the plurality of pieces of service provision history information of the identified one or more devices, at least one service provision message from among the plurality of service provision messages received from the identified one or more devices; the receiver is further configured to receive a second voice command; and the at least one processor is further configured to: determine, based on the second voice command, a service selected by the user from among services provided by the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) in response to the first voice command, control the communicator to transmit information related to the service selected by the user to the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), and train the service recommendation model by using the information related to the service selected by the user.
A computer-readable medium embodying a computer program, the computer program comprising computer readable program code that, when executed by a processor of an electronic device, causes the processor to: receive a first voice command from a user; determine, from among a plurality of candidate devices, a plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) to which the first voice command is to be transmitted; transmit information related to the first voice command to the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); respectively receive, from the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), a plurality of service provision messages generated in response to receiving the first voice command and a plurality of pieces of service provision history information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), wherein the service provision history information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) includes time information indicating a current time when a service is selected in response to receiving the first voice command and location information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5), and wherein the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5) is prioritized based on the time information indicating the current time and the location information of the plurality of devices (200-1, 200-2, 200-3, 200-4, 200-5); select, based on the priorities assigned to the plurality of devices, at least one of the plurality of service provision messages; and output, based on a result of the selecting, a response message in response to the first voice command.
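
By way of non-limiting illustration only, the device determination by probability threshold and the priority-based selection of service provision messages recited in the claims above might be sketched as follows. The class names (Candidate, Reply), the function names, the user_location parameter, and the scoring weights are all hypothetical and do not appear in the claims; the claims do not prescribe any particular scoring formula, so the one shown here is merely an assumed stand-in.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    device_id: str
    relevance: float  # assumed probability that the device suits the user's intent


@dataclass
class Reply:
    """Hypothetical service provision message plus history for one device."""
    device_id: str
    can_provide: bool    # whether the device can serve the voice command
    service: str         # service suggested by the device
    times_selected: int  # history: how often the user chose this service
    usual_hour: int      # history: hour of day (0-23) of earlier selections
    location: str        # history: where the device is located


def pick_targets(candidates, threshold):
    """Keep candidates whose relevance is at or above the threshold."""
    return [c.device_id for c in candidates if c.relevance >= threshold]


def priority(reply, current_hour, user_location):
    """Illustrative score only; the weights are assumptions, not claimed."""
    diff = abs(reply.usual_hour - current_hour)
    hour_gap = min(diff, 24 - diff)  # distance on a 24-hour clock
    score = float(reply.times_selected)
    if hour_gap <= 1:
        score += 1.0  # service was previously chosen around this time of day
    if reply.location == user_location:
        score += 1.0  # device is located where the user is assumed to be
    return score


def select_replies(replies, current_hour, user_location, k=1):
    """Return the k highest-priority replies from devices able to serve."""
    able = [r for r in replies if r.can_provide]
    ranked = sorted(
        able,
        key=lambda r: priority(r, current_hour, user_location),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch, calling select_replies with k=1 corresponds to selecting a single service provision message, and k=2 to selecting two or more messages based on the assigned priorities.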

Description

[Technical Field]

The disclosure relates to methods and apparatuses for providing a voice assistant service, and more particularly, to methods and apparatuses for providing a voice assistant service whereby at least one service is recommended among services that a plurality of devices are able to provide.

[Background Art]

With recent developments in electronic devices, such as smartphones, for performing various functions in a complex manner, electronic devices capable of speech recognition have been launched to improve operability. A speech recognition technology may be applied to a conversational user interface for outputting a response message to a question input by a user's voice in an everyday, natural language to provide a user-friendly conversational service. The conversational user interface refers to an intelligent user interface that operates by talking in a user's language. For example, electronic devices such as smartphones, computers, personal digital assistants (PDAs), portable multimedia players (PMPs), smart home appliances, navigation devices, wearable devices, etc., may provide conversational services by connecting to a server or executing an application.

Furthermore, with the advancement in artificial intelligence (AI) technology, the AI technology has also been applied to a speech recognition function to enable quick, accurate speech recognition for various utterances. An AI system is a computer system that implements human-level intelligence and enables machines to become smart by learning and making decisions on their own, compared to an existing rule-based smart system. Because the AI system improves its recognition rates and is capable of understanding a user's preferences more accurately through experience, existing rule-based smart systems are increasingly being replaced by deep learning-based AI systems.
US 2018/293484 A1 discloses conventional methods and apparatuses for providing a voice assistant service, wherein multiple virtual assistants are provided on a single electronic device. WO 2017/222503 A1 discloses a conventional system in which multiple virtual assistants are running on multiple physical devices and a communication apparatus sends queries to the virtual assistants and collects the responses.

[Disclosure]

[Technical Problem]

As more devices are capable of providing conversational services with their speech recognition function, the number and types of services that the devices are able to provide to a user have increased and become more diverse. Accordingly, in order to select and receive a desired service, a user is inconvenienced by having to fully know and utter a number of commands associated with each device.

Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments of the disclosure.

[Technical Solution]

To solve this problem, according to various embodiments of the disclosure, provided is a method by which a voice assistant service providing apparatus connected to a plurality of devices classifies the devices into groups according to capabilities of the devices in response to a user's voice command, selects an optimal service from among services that a group of devices are able to provide, and suggests the optimal service to the user.

The invention is defined by the appended set of claims. The description that follows is subject to this limitation. Any disclosure lying outside the scope of said claims is only intended for illustrative as well as comparative purposes.

[Description of Drawings]

The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1A illustrates a system for providing a voice assistant service, according to an embodiment of the disclosure;
FIG. 1B illustrates an example of a block diagram of a system for providing a voice assistant service, according to an embodiment of the disclosure;
FIG. 2 illustrates a signal flowchart of an operation method of a system for providing a voice assistant service, according to an embodiment of the disclosure;
FIG. 3 illustrates a signal flowchart of an operation method of a system for providing a voice assistant service, according to an embodiment of the disclosure;
FIG. 4 illustrates a flowchart of a method of providing a voice assistant service, according to an embodiment of the disclosure;
FIG. 5 illustrates a detailed flowchart of a method of providing a voice assistant service, according to an embodiment of the disclosure;
FIG. 6 illustrates a flowchart of an operation method of a device that interacts with a user via a voice assistant service providing apparatus, according to an embodiment of the disclosure;
FIG. 7 illustrates a detailed flowchart of an operation method of a device that interacts with a user throug