KR-102962665-B1 - ARTIFICIAL INTELLIGENCE APPARATUS AND METHOD FOR RECOGNIZING SPEECH WITH MULTIPLE LANGUAGES


Abstract

The present disclosure provides an artificial intelligence device comprising: a microphone for acquiring a voice including a plurality of languages; and a processor that generates text data corresponding to the voice, generates a plurality of separated data by separating the text data by language, performs natural language understanding processing corresponding to the language of each of the plurality of separated data to generate a natural language understanding processing result for each of the separated data, acquires, based on the natural language understanding processing results, command information regarding a command indicated by the voice and slot information regarding an object that is the target of the command, performs an operation corresponding to the voice based on the command information and the slot information, and generates a response based on the result of the operation.
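
As a rough illustration of the separation step described in the abstract, the sketch below groups the words of a recognized sentence by language. The script-based heuristic (Hangul versus everything else), the `SeparatedData` class, and the function names are assumptions made for illustration only; the abstract does not specify how the languages are identified.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SeparatedData:
    language: str  # language type, e.g. "ko" or "en"
    text: str      # portion of the recognized text in that language


def guess_language(token: str) -> str:
    """Hypothetical heuristic: classify a token by the script of its first letter."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            return "ko" if name.startswith("HANGUL") else "en"
    return "und"  # no letters (digits, punctuation): language undetermined


def separate_by_language(text: str) -> List[SeparatedData]:
    """Group whitespace-separated tokens of the recognized text by language."""
    groups: Dict[str, List[str]] = {}
    previous = "und"
    for token in text.split():
        lang = guess_language(token)
        if lang == "und":
            lang = previous  # attach letterless tokens to the preceding language
        groups.setdefault(lang, []).append(token)
        previous = lang
    return [SeparatedData(lang, " ".join(tokens)) for lang, tokens in groups.items()]
```

For a made-up mixed utterance such as "거실 TV turn on 해줘", this grouping yields one Korean item ("거실 해줘") and one English item ("TV turn on"), each of which could then be handed to the natural language understanding model for its language.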

Inventors

유현
김병하
김예진
채종훈

Assignees

엘지전자 주식회사

Dates

Publication Date: 20260508
Application Date: 20191219

Claims (18)

A speech recognition method for speech including multiple languages, performed by an artificial intelligence device including a microphone and a processor, the method comprising: acquiring, by the microphone, a voice including a plurality of languages; generating, by the processor, text data corresponding to the voice from the voice; generating, by the processor, a plurality of separated data by separating the text data by language; performing, by the processor, natural language understanding processing corresponding to the language of each of the plurality of separated data to generate a natural language understanding processing result for each of the separated data; obtaining, by the processor, based on the natural language understanding processing results, command information regarding a command indicated by the voice and slot information regarding an object that is the target of the command; performing an operation corresponding to the voice based on the command information and the slot information; and generating, by the processor, a response based on a result of performing the operation.
The speech recognition method of claim 1, wherein the acquiring of the voice including the plurality of languages comprises receiving a voice including dual languages consisting of a first language and a second language, and the generating of the separated data comprises generating first separated data for the first language and second separated data for the second language.
The speech recognition method of claim 1, wherein the separated data includes at least one of a language type, text corresponding to the voice, whether a main intent is present, intent information, and object information.
The speech recognition method of claim 1, wherein the generating of the natural language understanding processing result comprises obtaining at least one of intent information and object information for each of the separated data.
◈Claim 5 was waived upon payment of the establishment registration fee.◈ The speech recognition method of claim 4, wherein the generating of the natural language understanding processing result comprises determining a main intent from among the intent information included in each of the separated data.
◈Claim 6 was waived upon payment of the establishment registration fee.◈ The speech recognition method of claim 5, wherein the generating of the natural language understanding processing result comprises updating, based on the main intent, the object information included in separated data whose intent information was not determined to be the main intent.
◈Claim 7 was waived upon payment of the establishment registration fee.◈ The speech recognition method of claim 5, wherein the obtaining of the intent information of the voice and the slot information comprises obtaining slot information corresponding to the main intent based on the object information.
The speech recognition method of claim 1, wherein the performing of the operation corresponding to the voice comprises: determining, based on the slot information, an external device to perform an operation corresponding to the command information; transmitting the command information to the external device; and receiving, from the external device, a result of performing the operation corresponding to the command information.
◈Claim 9 was waived upon payment of the establishment registration fee.◈ The speech recognition method of claim 5, wherein the generating of the response comprises generating a natural language response based on the language type of the separated data corresponding to the main intent.
An artificial intelligence device comprising: a microphone configured to acquire a voice including a plurality of languages; and a processor configured to generate text data corresponding to the voice from the voice, generate a plurality of separated data by separating the text data by language, perform natural language understanding processing corresponding to the language of each of the plurality of separated data to generate a natural language understanding processing result for each of the separated data, obtain, based on the natural language understanding processing results, command information regarding a command indicated by the voice and slot information regarding an object that is the target of the command, perform an operation corresponding to the voice based on the command information and the slot information, and generate a response based on a result of performing the operation.
The artificial intelligence device of claim 10, wherein the microphone receives a voice including dual languages consisting of a first language and a second language, and the processor generates first separated data for the first language and second separated data for the second language.
The artificial intelligence device of claim 10, wherein the separated data includes at least one of a language type, text corresponding to the voice, whether a main intent is present, intent information, and object information.
The artificial intelligence device of claim 10, wherein the processor obtains at least one of intent information and object information for each of the separated data.
◈Claim 14 was waived upon payment of the establishment registration fee.◈ The artificial intelligence device of claim 13, wherein the processor determines a main intent from among the intent information included in each of the separated data.
◈Claim 15 was waived upon payment of the establishment registration fee.◈ The artificial intelligence device of claim 14, wherein the processor updates, based on the main intent, the object information included in separated data whose intent information was not determined to be the main intent.
◈Claim 16 was waived upon payment of the establishment registration fee.◈ The artificial intelligence device of claim 14, wherein the processor obtains slot information corresponding to the main intent based on the object information.
The artificial intelligence device of claim 10, wherein the processor determines, based on the slot information, an external device to perform an operation corresponding to the command information, the artificial intelligence device further comprising a communication unit that transmits the command information to the external device and receives, from the external device, a result of performing the operation corresponding to the command information.
◈Claim 18 was waived upon payment of the establishment registration fee.◈ The artificial intelligence device of claim 14, wherein the processor generates a natural language response based on the language type of the separated data corresponding to the main intent.
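
The main-intent handling recited in the dependent claims (determining a main intent, obtaining slot information from object information, and responding in the language of the main intent) can be sketched as follows. This is a minimal sketch under stated assumptions: the `NluResult` class, the `confidence` field, and the rule of picking the highest-confidence intent are hypothetical, since the claims do not state how the main intent is chosen or how object information is merged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NluResult:
    language: str                     # language type of the separated data
    text: str                         # text of the separated data
    intent: Optional[str] = None      # intent information, if any was found
    confidence: float = 0.0           # hypothetical score, not named in the claims
    objects: Dict[str, str] = field(default_factory=dict)  # object information
    is_main: bool = False             # whether this item carries the main intent


def resolve_command(results: List[NluResult]) -> Tuple[str, Dict[str, str], str]:
    """Return (command, slots, response language) from per-language NLU results."""
    candidates = [r for r in results if r.intent is not None]
    if not candidates:
        raise ValueError("no intent found in any separated data")
    # Assumed rule: the intent with the highest confidence is the main intent.
    main = max(candidates, key=lambda r: r.confidence)
    main.is_main = True
    # Collect object information from every item as slots for the main intent.
    slots: Dict[str, str] = {}
    for r in results:
        if r is not main:
            slots.update(r.objects)
    slots.update(main.objects)  # objects of the main item take precedence
    return main.intent, slots, main.language
```

With a Korean item carrying only a location object and an English item carrying a "turn_on" intent and a device object, the sketch returns the English intent as the command, both objects as slots, and "en" as the language for the response.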

Description

Artificial Intelligence Apparatus and Method for Recognizing Speech with Multiple Languages

The present disclosure relates to an artificial intelligence device and method for recognizing speech containing multiple languages and generating a response.

Recently, devices that are controlled by sound input have become increasingly common. Devices equipped with voice recognition, such as AI speakers and smartphones, recognize the user's spoken voice and perform control or provide a response corresponding to the recognition result.

With increasing globalization, a user's speech often contains multiple languages. However, because speech recognition models are separated by language and each processes speech in its own language, the recognition rate for sentences containing multiple languages is low. In addition, for voice commands containing multiple languages, the intent of the command is difficult to grasp, so the command may not be understood.

FIG. 1 shows an AI device according to one embodiment of the present disclosure.
FIG. 2 shows an AI server according to one embodiment of the present disclosure.
FIG. 3 shows an AI system according to one embodiment of the present disclosure.
FIG. 4 is a block diagram showing an AI device according to one embodiment of the present disclosure.
FIG. 5 is a diagram illustrating the problems that occur when voice containing multiple languages is input.
FIG. 6 is a flowchart for explaining a speech recognition method according to one embodiment of the present disclosure.
FIG. 7 is a drawing for explaining a speech recognition process according to one embodiment of the present disclosure.
FIGS. 8 and 9 are flowcharts for explaining a process of processing speech including multiple languages in one embodiment of the present disclosure.
FIGS. 10 and 11 are flowcharts for explaining a natural language understanding processing process according to one embodiment of the present disclosure.
FIGS. 12 and 13 are flowcharts for explaining a method of performing an operation corresponding to a voice command according to one embodiment of the present disclosure.

Hereinafter, embodiments disclosed in this specification will be described in detail with reference to the attached drawings. Identical or similar components are assigned the same reference numerals regardless of the figure in which they appear, and redundant descriptions thereof will be omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably solely for ease of drafting the specification and do not in themselves have distinct meanings or roles. Furthermore, in describing the embodiments disclosed in this specification, if it is determined that a detailed description of related prior art could obscure the essence of the embodiments, such detailed description will be omitted. Additionally, the attached drawings are intended only to facilitate understanding of the embodiments disclosed in this specification; the technical concept disclosed in this specification is not limited by the attached drawings, and should be understood to include all modifications, equivalents, and substitutions that fall within the concept and technical scope of this disclosure.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. These terms are used solely for the purpose of distinguishing one component from another.

When it is stated that one component is "connected" or "coupled" to another component, it should be understood that it may be directly connected or coupled to that other component, but that other components may also be present in between. On the other hand, when it is stated that one component is "directly connected" or "directly coupled" to another component, it should be understood that there are no other components in between.

Artificial Intelligence (AI)

Artificial intelligence refers to the field of researching artificial intelligence or the methodologies to create it, while machine learning refers to the field of researching methodologies to define and solve various problems addressed within the field of artificial intelligence. Machine learning is also defined as an algorithm that improves performance on a task through steady experience with that task.

An Artificial Neural Network (ANN) is a model used in machine learning and may refer to any model with problem-solving capability that is composed of artificial neurons (nodes) forming a network through synaptic connections. An artificial neural network can be defined by the connection patterns between neurons in different layers, a learning process that updates model parameters, and an activation function that generates output values.

An artificial neural network may include an input layer, an output layer, and optiona