EP-4738345-A1 - ELECTRONIC DEVICE AND METHOD FOR RECOGNIZING SPEAKER BY USING EXTERNAL ELECTRONIC DEVICE

Abstract

This electronic device may comprise: a memory storing instructions; a communication circuit; a microphone; and at least one processor. The instructions may, when executed individually or collectively by the at least one processor, cause the electronic device to: obtain a voice signal; generate, from the voice signal including a wake-up word, a first recognition result for the wake-up word; transmit the first recognition result to a server connected to one or more external electronic devices capable of recognizing the wake-up word; receive, from the server, a second recognition result for a user who uttered the wake-up word; and execute, based on the second recognition result, a designated function in the electronic device with respect to the user. The second recognition result may include identification information indicating the user, obtained from an external electronic device, in which the user is registered, among the one or more external electronic devices.
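
The exchange described in the abstract can be sketched as a small simulation. This is only an illustrative sketch: the class names, field names, device identifiers, and the in-memory `StubServer` are assumptions introduced here and do not come from the application. The stub performs no networking and simply returns the user registered on an external device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstRecognitionResult:
    """Wake-up word result produced locally (field names are hypothetical)."""
    device_id: str
    wake_word_score: float  # recognition accuracy value for the wake-up word
    snr_db: float           # signal-to-noise ratio of the voice signal


@dataclass
class SecondRecognitionResult:
    """Speaker result returned by the server (field names are hypothetical)."""
    user_id: Optional[str]           # identification information, None if no match
    source_device_id: Optional[str]  # external device in which the user is registered


class StubServer:
    """In-memory stand-in for the server; it performs no networking."""

    def __init__(self, registered_speakers: dict[str, str]):
        # Maps an external device id to the user registered on that device.
        self.registered_speakers = registered_speakers
        self.received: list[FirstRecognitionResult] = []

    def submit(self, result: FirstRecognitionResult) -> SecondRecognitionResult:
        self.received.append(result)
        # Assumption: the first registered external device reports its user.
        match = next(iter(self.registered_speakers.items()), None)
        if match is None:
            return SecondRecognitionResult(None, None)
        device_id, user_id = match
        return SecondRecognitionResult(user_id, device_id)


def handle_wake_up(device_id: str, score: float, snr_db: float,
                   server: StubServer) -> str:
    """Generate the first result, send it, and act on the second result."""
    first = FirstRecognitionResult(device_id, score, snr_db)
    second = server.submit(first)
    if second.user_id is None:
        return "no user identified"
    # Hypothetical designated function: greet the identified user.
    return f"welcome {second.user_id}"
```

In this sketch, a device with no speaker model of its own (for example `"tv-101"`) still obtains a user identity because another device (`"phone-103"`) has the user registered.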

Inventors

CHOI, CHANHEE
KIM, KYUNGTAE
YOON, Hyundon

Assignees

Samsung Electronics Co., Ltd.

Dates

Publication Date: 20260506
Application Date: 20240619

Claims (15)

An electronic device (101, 105) comprising: memory comprising one or more storage media storing instructions; communication circuitry; a microphone (411, 451); and at least one processor including processing circuitry; wherein the instructions, when executed by the at least one processor, cause the electronic device (101, 105) to: obtain a voice signal via the microphone (411, 451); generate, from the voice signal including a wake-up word, a first recognition result for the wake-up word; transmit the first recognition result to a server (107) connected to one or more external electronic devices capable of recognizing the wake-up word; receive, from the server (107), a second recognition result for a user who uttered the wake-up word; and execute, based on the second recognition result, a designated function with respect to the user, wherein the second recognition result includes identification information indicating the user, obtained from an external electronic device (103), in which the user is registered, among the one or more external electronic devices.
The electronic device (101, 105) of claim 1, wherein the instructions, when executed by the at least one processor, cause the electronic device (101, 105) to: extract a plurality of feature values from the voice signal; and identify the wake-up word based on the plurality of feature values, wherein the wake-up word is used for executing each voice recognition function of the electronic device (101, 105) and the one or more external electronic devices.
The electronic device (101, 105) of claim 1, wherein the first recognition result includes at least one of a recognition accuracy value of the wake-up word in the voice signal identified by the electronic device (101, 105), a sound power level of the voice signal including the wake-up word, a signal-to-noise ratio (SNR) of the voice signal, or a timing at which the voice signal is obtained.
The electronic device (101, 105) of claim 1, wherein the second recognition result includes a third recognition result for the wake-up word transmitted from the one or more external electronic devices respectively and a fourth recognition result including the identification information, wherein the third recognition result is generated based on another voice signal including the wake-up word obtained by the one or more external electronic devices respectively, and wherein the fourth recognition result is generated based on the other voice signal obtained by the external electronic device in which the user is registered.
The electronic device (101, 105) of claim 4, wherein the fourth recognition result includes at least one of the identification information of the user who uttered the wake-up word, group information in which the user is included, or a recognition accuracy value of the user.
The electronic device (101, 105) of claim 1, wherein the instructions, when executed by the at least one processor, cause the electronic device (101, 105) to: identify, based on the first recognition result and the second recognition result, whether an indication target of the wake-up word is the electronic device; and based on identifying that the indication target is the electronic device (101, 105), execute a mode for a reception of a voice signal including a command word.
The electronic device (101, 105) of claim 1, wherein the designated function includes at least one of displaying a user interface indicating that the electronic device (101, 105) has recognized the user or executing a software application requiring an authentication of the user.
The electronic device (101, 105) of claim 1, wherein the instructions, when executed by the at least one processor, cause the electronic device (101, 105) to: identify whether a speaker recognition model for recognizing the user identified based on the second recognition result is generated; and based on identifying that the speaker recognition model is not generated, implement a training of the speaker recognition model with respect to the user by using the first recognition result.
The electronic device (101, 105) of claim 8, wherein the training of the speaker recognition model is implemented based on a designated number of voice signals with respect to the user, and wherein the designated number is identified based on signal quality of the voice signal.
A method performed by an electronic device (101, 105) comprising: obtaining a voice signal via a microphone (411, 451); generating, from the voice signal including a wake-up word, a first recognition result for the wake-up word; transmitting the first recognition result to a server (107) connected to one or more external electronic devices capable of recognizing the wake-up word; receiving, from the server (107), a second recognition result for a user who uttered the wake-up word; and executing, based on the second recognition result, a designated function with respect to the user, wherein the second recognition result includes identification information indicating the user, obtained from an external electronic device (103), in which the user is registered, among the one or more external electronic devices.
The method of claim 10, comprising: extracting a plurality of feature values from the voice signal; and identifying the wake-up word based on the plurality of feature values, wherein the wake-up word is used for executing each voice recognition function of the electronic device (101, 105) and the one or more external electronic devices.
The method of claim 10, wherein the first recognition result includes at least one of a recognition accuracy value of the wake-up word in the voice signal identified by the electronic device (101, 105), a sound power level of the voice signal including the wake-up word, a signal-to-noise ratio (SNR) of the voice signal, or a timing at which the voice signal is obtained.
The method of claim 10, wherein the second recognition result includes a third recognition result for the wake-up word transmitted from the one or more external electronic devices respectively and a fourth recognition result including the identification information, wherein the third recognition result is generated based on another voice signal including the wake-up word obtained by the one or more external electronic devices respectively, and wherein the fourth recognition result is generated based on the other voice signal obtained by the external electronic device in which the user is registered.
The method of claim 13, wherein the fourth recognition result includes at least one of the identification information of the user who uttered the wake-up word, group information in which the user is included, or a recognition accuracy value of the user.
A non-transitory computer-readable storage medium storing one or more programs including instructions that, when executed individually or collectively by at least one processor of an electronic device (101, 105) including communication circuitry and a microphone (411, 451), cause the electronic device (101, 105) to: obtain a voice signal via the microphone (411, 451); generate, from the voice signal including a wake-up word, a first recognition result for the wake-up word; transmit the first recognition result to a server (107) connected to one or more external electronic devices capable of recognizing the wake-up word; receive, from the server (107), a second recognition result for a user who uttered the wake-up word; and execute, based on the second recognition result, a designated function with respect to the user, wherein the second recognition result includes identification information indicating the user, obtained from an external electronic device (103), in which the user is registered, among the one or more external electronic devices.
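
The dependent claims on the indication target and on speaker-model training can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the claims do not state how recognition results are compared, so the rule "highest accuracy, then highest SNR" is hypothetical, as are the SNR thresholds, the sample counts, and all names.

```python
from dataclasses import dataclass


@dataclass
class WakeWordReport:
    """One device's wake-up word recognition result (hypothetical fields)."""
    device_id: str
    accuracy: float  # recognition accuracy value of the wake-up word
    snr_db: float    # signal-to-noise ratio of the voice signal


def indication_target(reports: list[WakeWordReport]) -> str:
    """Pick the device the wake-up word was most likely aimed at.

    Assumed rule: highest accuracy wins, with SNR as the tie-breaker.
    """
    best = max(reports, key=lambda r: (r.accuracy, r.snr_db))
    return best.device_id


def required_training_samples(snr_db: float) -> int:
    """Designated number of voice signals for training.

    Assumed policy: cleaner audio needs fewer samples. The thresholds
    and counts are placeholders, not values from the application.
    """
    if snr_db >= 20.0:
        return 3
    if snr_db >= 10.0:
        return 5
    return 8


def should_start_training(user_id: str, trained_users: set[str]) -> bool:
    """Train only when no speaker recognition model exists for the user."""
    return user_id not in trained_users
```

For example, with reports from `"tv-101"` (accuracy 0.91) and `"speaker-105"` (accuracy 0.95), this rule selects `"speaker-105"` as the indication target, and only that device would enter the mode for receiving a command word.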

Description

[Technical Field]

The following descriptions relate to an electronic device for recognizing a speaker using an external electronic device and a method thereof.

[Background Art]

An electronic device may obtain a sound signal from the outside via a microphone. For example, the sound signal may include speech or a voice uttered by a speaker. For example, the electronic device may recognize the speaker based on the sound signal.

The above-described information may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the above-described descriptions may be applied as prior art related to the present disclosure.

[Disclosure]

[Technical Solution]

An electronic device may include communication circuitry. The electronic device may include a microphone. The electronic device may include at least one processor. The at least one processor may be configured to obtain a voice signal via the microphone. The at least one processor may be configured to generate, from the voice signal including a wake-up word, a first recognition result for the wake-up word. The at least one processor may be configured to transmit the first recognition result to a server connected to one or more external electronic devices capable of recognizing the wake-up word. The at least one processor may be configured to receive, from the server, a second recognition result for a user who uttered the wake-up word. The at least one processor may be configured to execute, based on the second recognition result, a designated function with respect to the user in the electronic device. The second recognition result may include identification information indicating the user, obtained from an external electronic device, in which the user is registered, among the one or more external electronic devices.

A method performed by an electronic device may include obtaining a voice signal. The method may include generating, from the voice signal including a wake-up word, a first recognition result for the wake-up word. The method may include transmitting the first recognition result to a server connected to one or more external electronic devices capable of recognizing the wake-up word. The method may include receiving, from the server, a second recognition result for a user who uttered the wake-up word. The method may include executing, based on the second recognition result, a designated function with respect to the user in the electronic device. The second recognition result may include identification information indicating the user, obtained from an external electronic device, in which the user is registered, among the one or more external electronic devices.

A non-transitory computer-readable storage medium may store one or more programs including instructions that, when executed by at least one processor of an electronic device including communication circuitry and a microphone, cause the electronic device to obtain a voice signal via the microphone. The non-transitory computer-readable storage medium may store one or more programs including instructions that, when executed by the at least one processor, cause the electronic device to generate, from the voice signal including a wake-up word, a first recognition result for the wake-up word. The non-transitory computer-readable storage medium may store one or more programs including instructions that, when executed by the at least one processor, cause the electronic device to transmit the first recognition result to a server connected to one or more external electronic devices capable of recognizing the wake-up word. The non-transitory computer-readable storage medium may store one or more programs including instructions that, when executed by the at least one processor, cause the electronic device to receive, from the server, a second recognition result for a user who uttered the wake-up word. The non-transitory computer-readable storage medium may store one or more programs including instructions that, when executed by the at least one processor, cause the electronic device to execute, based on the second recognition result, a designated function with respect to the user in the electronic device. The second recognition result may include identification information indicating the user, obtained from an external electronic device, in which the user is registered, among the one or more external electronic devices.

An electronic device may include communication circuitry. The electronic device may include a microphone. The electronic device may include at least one processor. The at least one processor may be configured to obtain a voice signal via the microphone. The at least one processor may be configured to generate, from the voice signal including a wake-up word, a first recognition result for the wake-up word. The at least one processor may be configured to broadcast the first recognition result to one or more