CN-122029601-A - Electronic device and method for processing user speech

CN 122029601 A

Abstract

A method according to an embodiment may include converting a first utterance of a user into text. The method may include segmenting the text into a plurality of text segments including a first text segment and a second text segment. The method may include classifying a first text segment mapped to intent information for performing a task as a first type. The method may include classifying a second text segment that is not mapped to intent information for performing the task as a second type. The method may include performing a first task, including a device control operation on a target device, based on first intent information corresponding to the first text segment. The method may include generating pairing information by pairing the first intent information with the second text segment. The pairing information may be used to perform the first task when text corresponding to the second text segment is recognized in a second utterance of the user.

Inventors

  • Pu Xiangmen
  • Jin Guitai
  • Song Jiajin

Assignees

  • Samsung Electronics Co., Ltd.

Dates

Publication Date
2026-05-12
Application Date
2024-10-10
Priority Date
2023-10-13

Claims (14)

  1. A method, comprising: converting a first utterance of a user into text; segmenting the text into a plurality of text segments including a first text segment and a second text segment; classifying a first text segment mapped to intent information for performing a task as a first type; classifying a second text segment that is not mapped to intent information for performing the task as a second type; performing a first task including a device control operation on a target device based on first intent information corresponding to the first text segment; and generating pairing information by pairing the first intent information with the second text segment, wherein the pairing information is used to perform the first task when text corresponding to the second text segment is recognized in a second utterance of the user.
  2. The method of claim 1, further comprising: determining the pairing information to be an utterance chain when the number of pairings between the first intent information and the second text segment exceeds a threshold.
  3. The method of any one of claims 1 and 2, wherein, when text corresponding to the second text segment is recognized in the second utterance of the user, the utterance chain is used to perform the first task based on first intent information included in the utterance chain.
  4. The method of any one of claims 1 to 3, further comprising: adaptively updating the intent information included in the utterance chain based on a third utterance of the user, wherein the third utterance comprises: an utterance in which text corresponding to the second text segment and intent information different from the first intent information are recognized, or a modification request for the first intent information.
  5. The method of any one of claims 1 to 4, wherein the first intent information includes a plurality of pieces of intent information, and the first task includes a plurality of tasks corresponding to the plurality of pieces of intent information.
  6. The method of any one of claims 1 to 5, wherein the second text segment includes text indicating a situation of the user.
  7. The method of any one of claims 1 to 6, wherein the recognition of text corresponding to the second text segment in the second utterance is performed based on determining a similarity between the second text segment and a second-type text segment recognized in the second utterance.
  8. An electronic device (101; 201; 501) comprising: a processor (120; 203; 520); and a memory (130; 207; 530) storing instructions, wherein the instructions, when executed by the processor (120; 203; 520) individually or collectively, cause the electronic device (101; 201; 501) to: convert a first utterance of a user into text; segment the text into a plurality of text segments including a first text segment and a second text segment; classify a first text segment mapped to intent information for performing a task as a first type; classify a second text segment that is not mapped to intent information for performing the task as a second type; perform a first task including a device control operation on a target device based on first intent information corresponding to the first text segment; and generate pairing information by pairing the first intent information with the second text segment, wherein the pairing information is used to perform the first task when text corresponding to the second text segment is recognized in a second utterance of the user.
  9. The electronic device (101; 201; 501) of claim 8, wherein the instructions, when executed by the processor (120; 203; 520) individually or collectively, cause the electronic device (101; 201; 501) to determine the pairing information to be an utterance chain when the number of pairings between the first intent information and the second text segment exceeds a threshold.
  10. The electronic device (101; 201; 501) of any one of claims 8 and 9, wherein the utterance chain is used to perform the first task based on first intent information included in the utterance chain when text corresponding to the second text segment is recognized in a second utterance of the user.
  11. The electronic device (101; 201; 501) of any one of claims 8 to 10, wherein the instructions, when executed by the processor (120; 203; 520) individually or collectively, cause the electronic device (101; 201; 501) to: adaptively update the intent information included in the utterance chain based on a third utterance of the user, wherein the third utterance comprises: an utterance in which text corresponding to the second text segment and intent information different from the first intent information are recognized, or a modification request for the first intent information.
  12. The electronic device (101; 201; 501) of any one of claims 8 to 11, wherein the first intent information includes a plurality of pieces of intent information, and the first task includes a plurality of tasks corresponding to the plurality of pieces of intent information.
  13. The electronic device (101; 201; 501) of any one of claims 8 to 12, wherein the second text segment comprises text indicating a situation of the user.
  14. The electronic device (101; 201; 501) of any one of claims 8 to 13, wherein the recognition of text corresponding to the second text segment in the second utterance is performed based on determining a similarity between the second text segment and a second-type text segment recognized in the second utterance.
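Claims 2 and 7 describe promoting a pairing to an utterance chain once a pair count exceeds a threshold, and triggering the task when a later second-type segment is sufficiently similar to the stored one. The claims do not prescribe an implementation; the following is a minimal sketch under assumed values (the threshold, the similarity cutoff, and `difflib` string similarity are illustrative choices, not taken from the publication).

```python
# Hypothetical sketch of the utterance-chain mechanism (claims 2 and 7):
# a pairing becomes a chain once its pair count exceeds a threshold, and
# a later utterance triggers the chained task when a second-type segment
# is sufficiently similar to the chain's stored trigger text.
from difflib import SequenceMatcher

THRESHOLD = 3          # assumed pair-count threshold (claim 2)
SIMILARITY_MIN = 0.8   # assumed similarity cutoff (claim 7)

class UtteranceChain:
    def __init__(self, intent, trigger_text):
        self.intent = intent              # first intent information
        self.trigger_text = trigger_text  # second-type text segment

    def matches(self, segment):
        """Similarity-based recognition of the trigger in a new utterance."""
        ratio = SequenceMatcher(None, self.trigger_text, segment).ratio()
        return ratio >= SIMILARITY_MIN

def maybe_promote(pair_count, intent, trigger_text):
    """Promote a pairing to an utterance chain once it exceeds THRESHOLD."""
    if pair_count > THRESHOLD:
        return UtteranceChain(intent, trigger_text)
    return None

chain = maybe_promote(4, {"action": "power_on", "target": "light"}, "I'm home now")
```

`SequenceMatcher` stands in for whatever similarity measure the classifier actually uses; an embedding-based distance would serve the same role.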

Description

Electronic device and method for processing user speech

Technical Field

Embodiments of the present disclosure relate to an electronic device and method for processing user speech.

Background

Electronic devices that include a voice assistant function providing services based on user utterances have been widely distributed. An electronic device may use an artificial intelligence server to recognize a user utterance and determine its meaning and intent. The artificial intelligence server may interpret the user utterance to infer the user's intent and may perform a task according to the inferred intent. The artificial intelligence server may perform tasks according to the user's intent expressed through natural language interactions between the user and the artificial intelligence server. An electronic device including a voice assistant function may sequentially perform an operation of classifying a domain (e.g., a capsule) for processing a user utterance and an operation of performing, in the classified domain, a task (e.g., via an application) corresponding to the user utterance. The above information may be presented as related art to aid in understanding the present disclosure. No assertion or determination is made as to whether any of the foregoing might be applicable as prior art with regard to the present disclosure.

Disclosure of Invention

Technical Solution

Embodiments of the present disclosure may provide a method that includes converting a first utterance of a user into text. The method may include segmenting the text into a plurality of text segments including a first text segment and a second text segment. The method may include classifying a first text segment mapped to intent information for performing a task as a first type. The method may include classifying a second text segment that is not mapped to intent information for performing the task as a second type.
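The segmentation and two-type classification described above can be sketched in a few lines. This is not the disclosed implementation; the keyword table, the comma-based splitter, and all names below are hypothetical stand-ins for the intent-mapping model the embodiments assume.

```python
# Hypothetical sketch: split transcribed text into segments and classify
# each segment as first type (mapped to intent information) or second type
# (not mapped). A toy keyword table stands in for the real intent mapper.

INTENT_MAP = {  # illustrative intent information, not from the publication
    "turn on the light": {"action": "power_on", "target": "light"},
    "lower the blinds": {"action": "lower", "target": "blinds"},
}

def segment_text(text):
    """Split an utterance transcript into candidate segments (naive comma split)."""
    return [s.strip() for s in text.split(",") if s.strip()]

def classify_segments(segments):
    """Return (first_type, second_type): segments with and without an intent mapping."""
    first_type, second_type = [], []
    for seg in segments:
        if seg in INTENT_MAP:
            first_type.append((seg, INTENT_MAP[seg]))  # first type: mapped to intent
        else:
            second_type.append(seg)                    # second type: no mapping
    return first_type, second_type

first, second = classify_segments(segment_text("turn on the light, I'm home now"))
```

In a real assistant the dictionary lookup would be an NLU model, but the two-way split it produces is the same.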
The method may include performing a first task, including a device control operation on a target device, based on first intent information corresponding to the first text segment. The method may include generating pairing information by pairing the first intent information with the second text segment. The first task may be performed using the pairing information when text corresponding to the second text segment is recognized in a second utterance of the user.

Another embodiment of the present disclosure may provide an electronic device including a processor. The electronic device may include a memory storing instructions. The instructions, when executed individually or collectively by the processor, may cause the electronic device to convert a first utterance of the user into text. The instructions, when executed individually or collectively by the processor, may cause the electronic device to segment the text into a plurality of text segments including a first text segment and a second text segment. The instructions, when executed individually or collectively by the processor, may cause the electronic device to classify a first text segment mapped to intent information for performing a task as a first type. The instructions, when executed individually or collectively by the processor, may cause the electronic device to classify a second text segment that is not mapped to intent information for performing the task as a second type. The instructions, when executed individually or collectively by the processor, may cause the electronic device to perform a first task including a device control operation on a target device based on first intent information corresponding to the first text segment. The instructions, when executed individually or collectively by the processor, may cause the electronic device to generate pairing information by pairing the first intent information with the second text segment.
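The pairing step described above amounts to remembering which unmapped segment co-occurred with which executed intent, so the intent can be replayed later. A minimal sketch, with hypothetical names and an in-memory store standing in for whatever persistence the device actually uses:

```python
# Hypothetical sketch of pairing information: associate executed intent
# information with the second-type segment that accompanied it, then use
# the pairing to retrieve the task when that text reappears later.

pairings = {}  # second-type text -> (intent information, pair count)

def record_pairing(intent, second_segment):
    """Store or update the pairing, counting how often the pair occurs."""
    _, count = pairings.get(second_segment, (intent, 0))
    pairings[second_segment] = (intent, count + 1)

def lookup_task(segment):
    """Return the paired intent if this segment was seen before, else None."""
    entry = pairings.get(segment)
    return entry[0] if entry else None

record_pairing({"action": "power_on", "target": "light"}, "I'm home now")
```

A later utterance containing "I'm home now" would then retrieve the light-on intent via `lookup_task`, which is the replay behavior the embodiment describes.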
The first task may be performed using the pairing information when text corresponding to the second text segment is recognized in a second utterance of the user.

Another embodiment of the present disclosure may provide a method comprising converting an utterance of a user into text. The method may include segmenting the text into text segments including at least one of a first text segment and a second text segment. The method may include classifying a first text segment mapped to intent information for performing a task as a first type. The method may include classifying a second text segment that is not mapped to intent information for performing the task as a second type. The method may include identifying an utterance chain corresponding to the second text segment. The method may include performing a first task including a device control operation on a target device based on first intent information included in the utterance chain. The utterance chain may be a pairing of first intent information recognized in a preceding utterance with a second-type text segment obtained