KR-20260063654-A - APPARATUS AND METHOD FOR DETECTING HATE SPEECH USING DATA CLEANSING-BASED LEARNING
Abstract
The present invention relates to an apparatus and method for detecting hate speech through data refinement-based learning. The apparatus comprises: a data mapping and classification unit that classifies each data instance by tracking its learning dynamics using a pre-trained BERT model; an annotator agreement calculation unit that collects the labels assigned to each instance by a plurality of annotators and calculates the inter-annotator agreement between the labels to classify the instances into consensus instances and non-consensus instances; a data refinement unit that refines each data instance, based on both the learning-pattern classification and the label-agreement classification, to generate a refined dataset; and a model learning unit that trains a plurality of neural network models on the refined dataset and selects the optimal neural network model among them.
Inventors
- 한요섭
- 김도경
- 안혜선
Assignees
- 연세대학교 산학협력단
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-10-30
Claims (9)
- A hate speech detection device based on data refinement-based learning, comprising: a data mapping and classification unit that classifies each data instance by tracking its learning dynamics using a pre-trained BERT model; an annotator agreement calculation unit that collects the labels assigned to each instance by a plurality of annotators, calculates the inter-annotator agreement between the labels, and classifies the instances into consensus instances and non-consensus instances; a data refinement unit that refines each data instance, based on both the learning-pattern classification and the label-agreement classification, to generate a refined dataset; and a model learning unit that trains a plurality of neural network models on the refined dataset and selects the optimal neural network model among them.
- The device of claim 1, wherein the data mapping and classification unit determines a learning pattern by monitoring changes in the prediction probability of each data instance over a plurality of epochs, and classifies each data instance based on the learning pattern.
- The device of claim 2, wherein the data mapping and classification unit classifies each data instance as an EtL (Easy-to-Learn) instance, an AtL (Ambiguous-to-Learn) instance, or an HtL (Hard-to-Learn) instance.
- The device of claim 1, wherein the annotator agreement calculation unit assigns each instance a hate label or a non-hate label through heterogeneous AI bots implementing the plurality of annotators.
- The device of claim 4, wherein the annotator agreement calculation unit calculates an average value of the label distribution, classifies an instance as a consensus instance if the average value is greater than or equal to a threshold, and otherwise classifies it as a non-consensus instance.
- The device of claim 1, wherein the data refinement unit classifies the non-consensus instances as noise and excludes them from the refined dataset.
- The device of claim 1, wherein the data refinement unit combines the learning dynamics and the annotator agreement through a CONELA (COnsensual Elimination of Non-consensual Easy-to-Learn and Hard-to-Learn Annotations) strategy, so that unnecessary noise and data with a high risk of misclassification are not included in the refined dataset.
- The device of claim 1, wherein the model learning unit trains a BERT (Bidirectional Encoder Representations from Transformers) model, a HateBERT model, and a RoBERTa (A Robustly Optimized BERT Pretraining Approach) model on the refined dataset, and selects the optimal neural network model through hyperparameter tuning and cross-validation.
- A method for detecting hate speech through data refinement-based learning, performed in a hate speech detection device, the method comprising: a data mapping and classification step of classifying each data instance by tracking its learning dynamics using a pre-trained BERT model; an annotator agreement calculation step of collecting the labels assigned to each instance by a plurality of annotators, calculating the inter-annotator agreement between the labels, and classifying the instances into consensus instances and non-consensus instances; a data refinement step of refining each data instance, based on both the learning-pattern classification and the label-agreement classification, to generate a refined dataset; and a model learning step of training a plurality of neural network models on the refined dataset and selecting the optimal neural network model among them.
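The learning-dynamics classification of claims 2 and 3 can be sketched as follows: after each training epoch, record the model's predicted probability for each instance's gold label, then classify instances by the mean (confidence) and standard deviation (variability) of those probabilities. The threshold values and the exact decision rules below are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def classify_by_dynamics(probs, conf_hi=0.7, var_hi=0.2):
    """Classify instances from per-epoch gold-label probabilities.

    probs: array of shape (n_instances, n_epochs) holding the model's
    predicted probability of the gold label after each epoch.
    Thresholds conf_hi and var_hi are assumed values for illustration.
    """
    confidence = probs.mean(axis=1)   # mean probability across epochs
    variability = probs.std(axis=1)   # fluctuation across epochs
    labels = []
    for c, v in zip(confidence, variability):
        if v >= var_hi:
            labels.append("AtL")      # ambiguous-to-learn: unstable predictions
        elif c >= conf_hi:
            labels.append("EtL")      # easy-to-learn: high, stable confidence
        else:
            labels.append("HtL")      # hard-to-learn: low, stable confidence
    return labels
```

In practice the probabilities would come from a pre-trained BERT model evaluated on the training set after every epoch, as in claim 1.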
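The agreement calculation of claim 5 and the CONELA strategy of claim 7 could be combined as in the sketch below. Agreement is computed here as the share of annotators voting for the majority label, and the 0.8 threshold is an assumed value; the filtering rule (eliminate instances that are both non-consensual and EtL or HtL, per the expansion of the CONELA acronym) is an interpretation and may differ from the full specification.

```python
import numpy as np

def consensus_split(label_matrix, threshold=0.8):
    """Split instances into consensus / non-consensus by annotator agreement.

    label_matrix: (n_instances, n_annotators) of binary hate/non-hate labels.
    Returns a boolean array, True where agreement meets the threshold.
    """
    mean = label_matrix.mean(axis=1)          # average label value per instance
    agreement = np.maximum(mean, 1.0 - mean)  # majority share of annotators
    return agreement >= threshold

def conela_filter(dynamics, is_consensus):
    """CONELA-style filter: drop non-consensual EtL and HtL instances.

    dynamics: list of "EtL" / "AtL" / "HtL" tags from the learning dynamics.
    Keeps an instance unless it is non-consensus AND easy- or hard-to-learn.
    """
    return [c or d == "AtL" for d, c in zip(dynamics, is_consensus)]
```

The boolean mask returned by `conela_filter` would then select the rows that make up the refined dataset of claim 1.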
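The model selection of claim 8 amounts to a grid search over model families and hyperparameters, scored by cross-validation. A minimal sketch, in which `cv_score` is a hypothetical callback returning a cross-validated metric (e.g. macro-F1) for a given configuration:

```python
from itertools import product

def select_best_model(model_names, learning_rates, cv_score):
    """Grid-search over model family and learning rate; return the best.

    cv_score(name, lr) is assumed to run k-fold cross-validation on the
    refined dataset and return a scalar score (higher is better).
    Returns a (score, model_name, learning_rate) tuple.
    """
    best = None
    for name, lr in product(model_names, learning_rates):
        score = cv_score(name, lr)
        if best is None or score > best[0]:
            best = (score, name, lr)
    return best
```

With the models named in claim 8, a call might look like `select_best_model(["bert-base", "HateBERT", "roberta-base"], [2e-5, 3e-5], cv_score=my_eval_fn)`, where `my_eval_fn` fine-tunes and evaluates each candidate.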
Description
The present invention relates to hate speech detection technology, and more specifically to a hate speech detection apparatus and method based on data refinement-based learning that can improve data quality and effectively detect implicit hate speech.

In modern society, the rapid development of online platforms and social media has expanded the scope of information sharing and communication, but it has also produced the side effect of spreading hate speech. In particular, implicit hate speech refers to expressions that promote prejudice or discrimination against specific groups or individuals without explicit profanity or slander, which makes its detection more complex and difficult. Because the meaning of implicit hate speech can vary with context and requires subjective interpretation, it remains a major challenge in the field of natural language processing (NLP).

Existing hate speech detection models have been trained primarily on explicit expressions and suffer performance degradation in new domains or languages due to limited cross-domain generalization. In addition, label inconsistencies among annotators can degrade data quality and negatively affect model performance. Accordingly, a new approach was required to improve data quality and effectively detect implicit hate speech.

FIG. 1 is a diagram illustrating a hate speech detection system according to the present invention. FIG. 2 is a diagram illustrating the system configuration of a hate speech detection device according to the present invention. FIG. 3 is a diagram illustrating the functional configuration of a hate speech detection device according to the present invention. FIG. 4 is a flowchart illustrating a method for detecting hate speech through data refinement-based learning according to the present invention.
The description of the present invention is merely an example for structural or functional explanation, and the scope of the present invention should therefore not be interpreted as limited by the embodiments described in the text. That is, since the embodiments are subject to various modifications and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing the technical concept. Furthermore, the objectives or effects presented in the present invention do not imply that a specific embodiment must include all of them or only such effects, and the scope of the present invention should not be understood as limited by them.

Meanwhile, the terms used in this application should be understood as follows. Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights shall not be limited by these terms. For example, a first component may be named a second component, and similarly, a second component may be named a first component. When one component is described as "connected" to another component, it may be directly connected to that other component, or intervening components may be present. Conversely, when one component is described as "directly connected" to another component, no intervening components are present. Other expressions describing relationships between components, such as "between" versus "directly between," or "adjacent to" versus "directly adjacent to," should be interpreted in the same way.
A singular expression should be understood to include the plural unless the context clearly indicates otherwise, and terms such as "include" or "have" specify the existence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In each step, identifiers (e.g., a, b, c) are used for convenience of explanation and do not denote an order of the steps; the steps may occur in an order different from the stated order unless a specific order is clearly indicated by the context. That is, the steps may be performed in the stated order, substantially simultaneously, or in the reverse order.

The present invention may be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. Additionally, the computer-readable recording medium may be distributed across networked computer systems, so that the computer-readable code can be stored and executed in a distributed manner.