KR-102960700-B1 - SYSTEM AND METHOD FOR NATURAL LANGUAGE UNDERSTANDING AND METHOD FOR PROCESSING LEARNING DATA THEREOF
Abstract
A natural language understanding system and a method for processing training data in that system are disclosed. A natural language understanding system according to one embodiment includes a training data analysis module that receives training data comprising one or more intents and one or more training sentences corresponding to each intent, where each intent and training sentence is composed of either a first language or a second language, and determines whether language mixing exists between intents or within the same intent of the training data; and a weight adjustment module that, based on the result of that determination, adjusts the weights of at least some of the one or more first-language tokens and one or more second-language tokens generated from the training data.
Inventors
- 박재혁
- 조강의
Assignees
- Samsung SDS Co., Ltd. (삼성에스디에스 주식회사)
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2020-11-26
Claims (16)
- A natural language understanding system comprising: a training data analysis module that receives training data comprising one or more intents and one or more training sentences corresponding to each intent, wherein the intents and training sentences are each composed of either a first language or a second language, and determines whether language mixing exists between intents or within the same intent of the training data; and a weight adjustment module that adjusts the weights of at least some of one or more first-language tokens and one or more second-language tokens generated from the training data based on a result of the determination.
- The natural language understanding system of claim 1, wherein the weights include one or more of a term frequency (TF) and an inverse document frequency (IDF).
- The natural language understanding system of claim 1, wherein, when the determination shows language mixing between intents in the training data and the proportion of first-language intents in the training data is higher than the proportion of second-language intents, the weight adjustment module adjusts the inverse document frequency of the first-language tokens generated from the training sentences corresponding to the first-language intents.
- The natural language understanding system of claim 3, further comprising a balance analysis module that determines whether the ratio of second-language tokens to first-language tokens in the training data is below a preset threshold, wherein the weight adjustment module adjusts the inverse document frequency of the first-language tokens when the balance analysis module determines that the ratio is below the preset threshold.
- The natural language understanding system of claim 3, wherein the weight adjustment module adjusts the inverse document frequency of the first-language tokens based on the ratio of first-language intents to second-language intents.
- The natural language understanding system of claim 1, wherein, when the determination shows language mixing within a specific intent of the training data and the proportion of first-language training sentences within the specific intent is higher than the proportion of second-language training sentences, the weight adjustment module adjusts the term frequency of the second-language tokens generated from the second-language training sentences.
- The natural language understanding system of claim 6, further comprising a balance analysis module that determines whether the ratio of second-language training sentences to first-language training sentences within the specific intent is below a preset threshold, wherein the weight adjustment module adjusts the term frequency of the second-language tokens when the balance analysis module determines that the ratio is below the preset threshold.
- The natural language understanding system of claim 6, wherein the weight adjustment module adjusts the term frequency of the second-language tokens based on the ratio of first-language training sentences to second-language training sentences.
- A method for processing training data, comprising: receiving training data comprising one or more intents and one or more training sentences corresponding to each intent, wherein the intents and training sentences are each composed of either a first language or a second language; determining whether language mixing exists between intents or within the same intent of the training data; and adjusting the weights of at least some of one or more first-language tokens and one or more second-language tokens generated from the training data based on a result of the determination.
- The method of claim 9, wherein the weights include one or more of a term frequency (TF) and an inverse document frequency (IDF).
- The method of claim 9, wherein, when the determination shows language mixing between intents in the training data and the proportion of first-language intents in the training data is higher than the proportion of second-language intents, the adjusting comprises adjusting the inverse document frequency of the first-language tokens generated from the training sentences corresponding to the first-language intents.
- The method of claim 11, wherein the adjusting comprises adjusting the inverse document frequency of the first-language tokens when the ratio of second-language tokens to first-language tokens in the training data is below a preset threshold.
- The method of claim 11, wherein the adjusting comprises adjusting the inverse document frequency of the first-language tokens based on the ratio of first-language intents to second-language intents.
- The method of claim 9, wherein, when the determination shows language mixing within a specific intent of the training data and the proportion of first-language training sentences within the specific intent is higher than the proportion of second-language training sentences, the adjusting comprises adjusting the term frequency of the second-language tokens generated from the second-language training sentences.
- The method of claim 14, wherein the adjusting comprises adjusting the term frequency of the second-language tokens when the ratio of second-language training sentences to first-language training sentences within the specific intent is below a preset threshold.
- The method of claim 14, wherein the adjusting comprises adjusting the term frequency of the second-language tokens based on the ratio of first-language intents to second-language intents.
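The IDF adjustment described in claims 3 and 5 can be illustrated with a minimal sketch. All data here is hypothetical, and the specific scaling rule (dividing the majority-language IDF by the intent-language ratio) is one possible reading of "based on the ratio"; the claims do not fix a formula.

```python
import math
import re

HANGUL = re.compile(r"[\uac00-\ud7a3]")

def lang(token: str) -> str:
    """Crude language detector: 'ko' if the token contains any Hangul syllable."""
    return "ko" if HANGUL.search(token) else "en"

# Hypothetical corpus: one "document" per intent (its training-sentence tokens).
# Korean-labelled intents outnumber English-labelled ones 2:1, i.e. there is
# language mixing between intents with the first language (Korean) dominant.
docs = {
    "인사": "안녕하세요 반갑습니다".split(),
    "주문": "커피 주문 할게요".split(),
    "greeting": "hello there friend".split(),
}

n_ko_intents = sum(1 for name in docs if lang(name) == "ko")  # 2
n_en_intents = sum(1 for name in docs if lang(name) == "en")  # 1
ratio = n_ko_intents / n_en_intents                           # 2.0

# Plain inverse document frequency over the intent-level documents.
vocab = {t for toks in docs.values() for t in toks}
idf = {
    t: math.log(len(docs) / sum(1 for toks in docs.values() if t in toks))
    for t in vocab
}

# Weight adjustment: scale down the IDF of majority-language (Korean) tokens
# by the intent-language ratio, leaving minority-language tokens untouched,
# so the dominant language does not overwhelm the classifier's features.
adjusted_idf = {
    t: (v / ratio if lang(t) == "ko" else v) for t, v in idf.items()
}
```

The within-intent case of claims 6 and 8 would follow the same pattern, except that the term frequency of the minority-language tokens is adjusted using the sentence-level ratio inside one intent.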
Description
The disclosed embodiments relate to natural language processing and understanding technology using machine learning.

In computer science, natural language understanding (NLU) refers to the process in which a computer receives sentences composed in a natural language that humans use for communication, such as Korean, Japanese, or English, and infers the intent of those sentences. Although various technologies exist for understanding natural language on computers, machine-learning-based NLU technologies have been the primary focus of recent research.

Most NLU models currently on the market are built for a single selected language, such as Korean or English. This makes it difficult for service providers (e.g., chatbot services) to offer multilingual services, because a separate NLU model must be created for each language the provider supports. From the perspective of service users, there is also the inconvenience of having to select the NLU model appropriate to their language before using the service. Consequently, a means of building two or more different languages into a single NLU model has become necessary.

FIG. 1 is a block diagram illustrating an NLU system (100) according to one embodiment. FIG. 2 is an exemplary diagram illustrating the configuration of training data according to one embodiment. FIG. 3 is an exemplary diagram illustrating the configuration of training data according to another embodiment. FIG. 4 is a block diagram illustrating in detail the feature extractor (104) of the NLU system (100) according to one embodiment. FIG. 5 is a flowchart illustrating a training data processing process (500) according to one embodiment. FIG. 6 is a block diagram illustrating a computing environment that includes a computing device suitable for use in exemplary embodiments.
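As a concrete illustration of the kind of multilingual training data the embodiments operate on, and of the two forms of language mixing the training data analysis module detects (between intents, and within a single intent), consider the following sketch. The intent names, sentences, and the Hangul-based language check are all hypothetical, chosen only to mirror the Korean/English example used throughout this description.

```python
import re

# Hypothetical training data: each intent maps to its training sentences.
# Intent names and sentences may each be in Korean (first language) or
# English (second language).
training_data = {
    "인사": ["안녕하세요", "반갑습니다"],                      # Korean intent, Korean sentences
    "greeting": ["hello there", "good morning"],              # English intent, English sentences
    "날씨": ["오늘 날씨 어때", "what's the weather today"],   # mixed sentences within one intent
}

HANGUL = re.compile(r"[\uac00-\ud7a3]")

def language_of(text: str) -> str:
    """Crude language detector: Korean if any Hangul syllable appears."""
    return "ko" if HANGUL.search(text) else "en"

# Mixing *between* intents: the intent labels are not all in one language.
intent_langs = {language_of(name) for name in training_data}
between_mixing = len(intent_langs) > 1

# Mixing *within* an intent: its training sentences span both languages.
within_mixing = {
    name: len({language_of(s) for s in sents}) > 1
    for name, sents in training_data.items()
}

print(between_mixing)         # True
print(within_mixing["날씨"])  # True
print(within_mixing["인사"])  # False
```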
Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to facilitate a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, it is merely illustrative, and the present invention is not limited thereto. In describing the embodiments, detailed descriptions of known technologies related to the present invention are omitted where it is determined that they would unnecessarily obscure the essence of the invention. Furthermore, the terms described below are defined in consideration of their functions within the present invention, and they may vary depending on the intentions or practices of the user or operator; such definitions should therefore be based on the content throughout this specification. Terms used in the detailed description are intended merely to describe the embodiments and are not intended to be limiting. Unless explicitly stated otherwise, expressions in the singular include the plural. In this description, expressions such as "include" or "comprise" refer to certain characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof, and should not be interpreted to exclude the existence or possibility of one or more other characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof.

FIG. 1 is a block diagram illustrating a natural language understanding (NLU) system (100) according to one embodiment. The NLU system (100) according to one embodiment is a system that constructs an NLU model by learning training data composed of a plurality of natural language sentences, and classifies the intent of an input natural language sentence using that model.
As illustrated, the NLU system (100) according to one embodiment includes a token generator (102), a feature extractor (104), and an intent classifier (106).

The token generator (102, tokenizer) receives training data and generates tokens from the training sentences included in the training data. In the disclosed embodiments, the training data includes one or more intents and one or more training sentences corresponding to each intent, and the intents and training sentences may each be composed of either a first language or a second language. For example, if the first language is Korean and the second language is English, the training data may include multiple training sentences composed in English or Korean, and each training sentence may be classified into one of multiple intents labelled in English or Korean. For convenience, the following description assumes that the first language is Korean and the second language is English; however, the disclosed embodiments are not limited to any specific languages. FIG. 2 is an illustrative diagram for expla