KR-102961722-B1 - VOICE RECOGNITION METHOD AND DEVICE


Abstract

A voice recognition device and a method thereof are disclosed. In the case where the user's intention regarding the utterance cannot be identified from the tail utterance in a speech utterance divided into a head utterance and a tail utterance, the voice recognition device according to the present invention identifies the intention from the head utterance to complete the speech utterance and provides a voice recognition processing result for the speech utterance. The voice recognition device according to the present invention may be linked with an artificial intelligence module, a robot, an augmented reality (AR) device, a virtual reality (VR) device, a device related to 5G services, etc.

Inventors

유현
김병하
김예진

Assignees

LG Electronics Inc. (엘지전자 주식회사)

Dates

Publication Date: 20260508
Application Date: 20191029

Claims (20)

A speech recognition method comprising: determining, by a first processing unit, whether to temporarily pause reception of a first utterance while the first utterance is being received; outputting, by a second processing unit, a voice recognition processing result for a second utterance received after the pause; determining, by an intention classification unit, the intention of a third utterance received after the voice recognition result of the second utterance is output; and generating, by a conversation management unit, a user voice command based on the intent of the first utterance and the third utterance, and outputting a voice recognition processing result for the user voice command.
◈Claim 2 was waived upon payment of the establishment registration fee.◈ The method of claim 1, wherein determining the intention of the third utterance comprises: extracting, from the third utterance, an entity name and first information related to the intent of the third utterance, using an entity name identification unit and an intent classification table; and inserting the entity name and the first information into slots included in the intent classification table, wherein the slots are associated with a plurality of intent items included in the intent classification table.
◈Claim 3 was waived upon payment of the establishment registration fee.◈ The method of claim 2, further comprising, after the inserting, determining whether any of the intent items has the minimum number of slots required for intent determination filled.
◈Claim 4 was waived upon payment of the establishment registration fee.◈ The method of claim 3, wherein determining whether any intent item has the minimum number of slots filled comprises: when no intent item has the minimum number of slots filled, extracting, from the first utterance, an entity name and second information related to the intent of the first utterance, using the entity name identification unit and the intent classification table; and inserting the entity name and the second information into the slots included in the intent classification table.
◈Claim 5 was waived upon payment of the establishment registration fee.◈ The method of claim 4, wherein extracting the entity name and the second information comprises requesting the user to provide the entity name and the second information when they cannot be extracted from the first utterance.
◈Claim 6 was waived upon payment of the establishment registration fee.◈ The method of claim 3, wherein determining whether any intent item has the minimum number of slots filled comprises, when at least one intent item has the minimum number of slots filled, determining that intent item to be the intent item of the third utterance.
◈Claim 7 was waived upon payment of the establishment registration fee.◈ The method of claim 1, wherein determining whether to temporarily pause reception of the first utterance comprises: determining whether the first utterance includes a filled pause, which is a non-linguistic element; when the filled pause is included in the first utterance, recognizing the filled pause as a signal to pause reception of the first utterance; and pausing reception of the first utterance.
◈Claim 8 was waived upon payment of the establishment registration fee.◈ The method of claim 7, wherein determining whether the filled pause is included comprises: recognizing one or more words from the first utterance; and comparing the recognized words with words in a pre-built filled-pause dictionary to determine whether they are identical or similar.
◈Claim 9 was waived upon payment of the establishment registration fee.◈ The method of claim 1, wherein determining whether to temporarily pause reception of the first utterance comprises pausing reception of the first utterance when a silent delay lasting a preset time occurs during reception of the first utterance.
◈Claim 10 was waived upon payment of the establishment registration fee.◈ The method of claim 1, wherein determining whether to temporarily pause reception of the first utterance comprises: determining whether the first utterance includes a preset keyword for determining a pause; and pausing reception of the first utterance when the keyword is included in the first utterance.
◈Claim 11 was waived upon payment of the establishment registration fee.◈ The method of claim 1, wherein determining whether to temporarily pause reception of the first utterance comprises, when reception of the first utterance is paused, holding on standby the voice recognition processing of the portion of the first utterance received before the pause.
The method of claim 1, further comprising receiving a trigger word before receiving the first utterance and initiating a voice recognition activation state.
A voice recognition device comprising: a first processing unit that determines whether to temporarily pause reception of a first utterance while the first utterance is being received; a second processing unit that outputs a voice recognition processing result for a second utterance received after the pause; an entity name identification unit capable of extracting an entity name from a third utterance received after the voice recognition results of the first utterance and the second utterance are output; an intent classification unit that determines an intent common to the first utterance and the third utterance; and a dialog management unit that generates a user voice command based on the common intent and outputs a voice recognition processing result for the user voice command, wherein the intent classification unit includes an intent classification table comprising a plurality of items and slots associated with the items.
The device of claim 13, wherein the first processing unit, when the first utterance includes a filled pause, which is a non-linguistic element, recognizes the filled pause as a signal to pause reception of the first utterance and pauses reception of the first utterance.
The device of claim 13, further comprising a preprocessing unit that receives a trigger word before the first utterance is received and switches the voice recognition device to an activated mode.
◈Claim 16 was waived upon payment of the establishment registration fee.◈ The device of claim 15, wherein the second processing unit maintains the activated mode of the voice recognition device after outputting the voice recognition result of the second utterance.
The device of claim 13, wherein the entity name identification unit preferentially extracts the entity name from the third utterance and, if the entity name cannot be extracted from the third utterance, extracts the entity name from the first utterance.
◈Claim 18 was waived upon payment of the establishment registration fee.◈ The device of claim 17, wherein the intent classification unit inserts into the slots at least one of the entity name extracted by the entity name identification unit, first information regarding the intent of the third utterance, and second information regarding the intent of the first utterance, and wherein the intent classification unit inserts the first information into the slot when the first information exists among the first information and the second information, and inserts the second information into the slot when the first information does not exist.
The device of claim 13, wherein the first processing unit recognizes one or more words from the first utterance, compares the words with words in a pre-built filled-pause dictionary, and pauses reception of the first utterance when the words are identical or similar to words in the filled-pause dictionary.
The device of claim 13, wherein the first processing unit pauses reception of the first utterance when a silent delay lasting a preset time occurs during reception of the first utterance.
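
The slot-filling logic recited in the claims above can be illustrated with a short sketch: fill the slots from the third utterance first, check whether any intent item has its minimum number of slots filled, and fall back to the first utterance only if none does. This is a minimal illustration under stated assumptions, not the claimed implementation. The class `IntentItem`, the functions `make_table` and `determine_intent`, the slot names, and the sample table rows are all hypothetical, and entity name and intent information are assumed to have been extracted upstream and passed in as dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class IntentItem:
    """One row of a hypothetical intent classification table."""
    name: str
    slot_names: tuple   # slots associated with this intent item
    min_slots: int      # minimum number of filled slots needed to decide
    slots: dict = field(default_factory=dict)

    def fill(self, extracted):
        # Insert only values for slots this item owns. Values already
        # present are kept, so third-utterance information has priority.
        for key, value in extracted.items():
            if key in self.slot_names and key not in self.slots:
                self.slots[key] = value

    def is_decidable(self):
        return len(self.slots) >= self.min_slots


def make_table():
    # Hypothetical table contents, for illustration only.
    return [
        IntentItem("find_cast_role", ("title", "actor", "query"), 3),
        IntentItem("play_content", ("title", "action"), 2),
    ]


def determine_intent(table, third_info, first_info):
    """Return the decided intent name, or None if the user must be asked."""
    for item in table:
        item.fill(third_info)
    decided = [item for item in table if item.is_decidable()]
    if not decided:
        # No item reached its minimum: fall back to the first utterance.
        for item in table:
            item.fill(first_info)
        decided = [item for item in table if item.is_decidable()]
    return decided[0].name if decided else None


intent = determine_intent(
    make_table(),
    third_info={"actor": "Kim Tae-hee", "query": "role"},
    first_info={"title": "Iris"},
)
print(intent)  # find_cast_role
```

In this example the third utterance alone fills only two of the three slots required for `find_cast_role`, so the title extracted from the first utterance completes the item. A return value of `None` corresponds to the case in which the user is asked to supply the missing information.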

Description

Voice Recognition Method and Voice Recognition Device

The present invention relates to a voice recognition method and a voice recognition device.

With the advancement of technology, devices and services that apply speech recognition technology have recently been introduced in many fields. Speech recognition technology is a series of processes that converts human speech into commands a computer can handle, so that a device can understand it. A speech recognition service is a series of processes in which a device recognizes a user's voice and provides a suitable corresponding service.

A user of a device equipped with voice recognition operates the device through complete utterances. However, a silent delay may occur, or interjections (filled pauses) may be uttered, during speech because the user does not know the information or words needed to complete the utterance or cannot recall them immediately. In other words, the user may hesitate because the required word does not come to mind, or may look the word up through other means or channels while trying to complete the utterance.

In this case, the speech recognition device treats the delay or interjection as the end of the utterance and processes an utterance that is actually incomplete, which produces an incorrect speech recognition processing result.

To address this, a method was proposed that divides a user's utterance into a head utterance and a tail utterance at the delay or interjection, and completes the utterance by combining the head utterance and the tail utterance while excluding the delay or interjection.
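
The head/tail division described above can be sketched as follows. This is only an illustration of the general idea, assuming the utterance is already available as a list of recognized words. The dictionary contents and the names `FILLED_PAUSES`, `split_head_tail`, and `combine_in_order` are hypothetical and do not come from the specification.

```python
# Hypothetical filled-pause dictionary; the specification does not list entries.
FILLED_PAUSES = {"um", "uh", "er", "hmm"}


def _is_filled_pause(word):
    # Exact match after normalizing case and trailing punctuation.
    return word.lower().strip(",.") in FILLED_PAUSES


def split_head_tail(words):
    """Split recognized words into (head, tail) at the first filled pause.

    Returns (words, []) when no filled pause is found.
    """
    for i, word in enumerate(words):
        if _is_filled_pause(word):
            # Skip a run of consecutive filled pauses.
            j = i
            while j < len(words) and _is_filled_pause(words[j]):
                j += 1
            return list(words[:i]), list(words[j:])
    return list(words), []


def combine_in_order(head, tail):
    """Combine head then tail in their original order, pauses excluded."""
    return " ".join(head + tail)


head, tail = split_head_tail("tell me the um weather in Seoul".split())
print(head, tail)                    # ['tell', 'me', 'the'] ['weather', 'in', 'Seoul']
print(combine_in_order(head, tail))  # tell me the weather in Seoul
```

The sketch uses exact dictionary matching for simplicity. A silent delay of a preset duration could serve as the split point in the same way, given word timestamps.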
However, conventional methods combine the head utterance and the tail utterance, separated by a delay or interjection, in their original order. Consequently, when the head utterance and the tail utterance contain the same words or information, the duplication may prevent the device from recognizing the user's utterance accurately. For example, if the head utterance is "in the drama Iris" and the tail utterance is "tell me the role Kim Tae-hee played in the drama Iris," the device recognizes the user's utterance as "in the drama Iris tell me the role Kim Tae-hee played in the drama Iris," and therefore fails to recognize the user's utterance accurately.

FIG. 1 is a block diagram of a wireless communication system to which the methods proposed in this specification can be applied.
FIG. 2 shows an example of a signal transmission/reception method in a wireless communication system.
FIG. 3 shows an example of the basic operation of a user terminal and a 5G network in a 5G communication system.
FIG. 4 is a block diagram of an AI device according to one embodiment of the present invention.
FIG. 5 is a diagram showing a voice recognition system according to the present invention.
FIG. 6 is a drawing showing the external appearance of a voice recognition device according to one embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of a voice recognition device according to one embodiment of the present invention.
FIG. 8 is a block diagram showing the specific configuration of the voice recognition processing unit illustrated in FIG. 7.
FIG. 9 is a flowchart illustrating a method by which a voice recognition device according to the present invention recognizes a user's utterance.
FIG. 10 is a flowchart illustrating a method by which a voice recognition device according to the present invention determines the intention of a third utterance.
FIG. 11 is an illustrative diagram showing an example in which a voice recognition device according to the present invention recognizes a user's utterance.

Hereinafter, embodiments disclosed in this specification will be described in detail with reference to the attached drawings. Identical or similar components are assigned the same reference numerals regardless of the drawing in which they appear, and redundant descriptions thereof are omitted. The suffixes "module" and "part" used for components in the following description are assigned or used interchangeably solely for ease of drafting the specification and do not in themselves have distinct meanings or roles. Furthermore, in describing the embodiments disclosed in this specification, a detailed description of related prior art is omitted where it is determined that such a description could obscure the essence of the embodiments. Additionally, the attached drawings are intended only to facilitate understanding of the embodiments disclosed in this specification; the technical concept disclosed in this specification is not limited by the attached drawings, and it should be understood that they include all modificati