CN-121980402-A - Viewpoint classification method and system based on feature custom weight enhancement


Abstract

The invention relates to the technical field of knowledge viewpoint classification, and in particular to a viewpoint classification method and system based on feature custom weight enhancement. The method comprises: S1, collecting viewpoint text data; S2, preprocessing the collected data to extract key feature words; S3, vectorizing the key feature words using Word2Vec; S4, applying a TF-IDF superposition enhancement weight algorithm to the vectorized key feature words; and S5, constructing a viewpoint classifier using a machine learning algorithm model and verifying the effect of the extracted key feature words on classification. The invention provides a machine-learning-driven viewpoint classification method for online collaborative knowledge construction comprising six stages. It uses Word2Vec to vectorize feature words, uses the SMOTE oversampling technique to address imbalanced sample sizes, and constructs viewpoint classifiers with four machine learning algorithm models. In particular, to distinguish the importance of different features, it defines a custom superposition enhancement weight algorithm based on TF-IDF, which improves the accuracy of machine learning viewpoint classification.

Inventors

CHEN JIE
LI JIAZHENG
TIAN XIAONAN
JIA DONGXUE
GUO ZIYI

Assignees

Shihezi University (石河子大学)

Dates

Publication Date: 20260505
Application Date: 20260106

Claims (7)

1. A viewpoint classification method based on feature custom weight enhancement, characterized by comprising the following steps: S1, collecting viewpoint text data; S2, preprocessing the collected data and extracting key feature words; S3, vectorizing the key feature words using Word2Vec; S4, applying a TF-IDF superposition enhancement weight algorithm to the vectorized key feature words to obtain the weights of the key feature words, so as to distinguish the importance of different features and thereby classify knowledge viewpoints; S5, constructing a viewpoint classifier using a machine learning algorithm model, and verifying the effect of the extracted key feature words on classification.
2. The viewpoint classification method based on feature custom weight enhancement according to claim 1, wherein the preprocessing of S2 first performs noise reduction and then extracts the key feature words using the word segmentation tool Jieba.
3. The viewpoint classification method based on feature custom weight enhancement according to claim 2, wherein in S3, Word2Vec uses the Skip-gram model to vectorize the key feature words, and the resulting vectors are weighted by TF-IDF features.
4. The viewpoint classification method based on feature custom weight enhancement according to claim 3, wherein in S4, the custom superposition enhancement weight of a feature is calculated from the following quantities (the symbols and formulas are not reproduced in this text): (1) the number of occurrences of a feature in a category; (2) the percentage of occurrences of a feature in a category; (3) the finally calculated weight of each feature; (4) the number of a feature in the given type. The number of occurrences of each feature in all types is defined by a formula in which 3 represents the 3 types and the value range of t is [1, m]. The weight assigned to each feature in a category is then calculated by a further formula.
5. The viewpoint classification method based on feature custom weight enhancement according to claim 4, wherein the machine learning algorithm model in S4 includes any one of logistic regression, random forest, the K-nearest neighbor algorithm and support vector machine.
6. A knowledge viewpoint classification system for implementing the viewpoint classification method of any one of claims 1-5, characterized in that the classification system comprises: an acquisition module for collecting the viewpoint text data; a preprocessing module for preprocessing the data collected by the acquisition module so as to extract key feature words; a vector characterization module for vectorizing the key feature words extracted by the preprocessing module using Word2Vec; a weight acquisition module for applying a TF-IDF superposition enhancement weight algorithm to the key feature words vectorized by the vector characterization module to obtain the weights of the key feature words, so as to distinguish the importance of different features and thereby classify knowledge viewpoints; and a verification module for constructing a viewpoint classifier using a machine learning algorithm model and verifying the effect of the extracted key feature words on classification.
7. An electronic device comprising at least one processor and a memory communicatively coupled to the processor, wherein the memory stores instructions for execution by the processor to enable the processor to perform the viewpoint classification method of any one of claims 1-5.
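The per-category quantities listed in claim 4 (how often a feature occurs in a category, and what share of its occurrences falls in that category) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the claim's symbols and its final weight formula do not survive in this text, so the sketch stops at counts and shares and does not attempt the weight itself. The function name `category_feature_stats`, the toy data, and the reading of "percentage" as a feature's count in one category divided by its total count over all categories are assumptions, not the patent's definitions.

```python
from collections import Counter, defaultdict


def category_feature_stats(docs):
    """Count features per category and compute each category's share.

    docs: list of (category, tokens) pairs.
    Returns (counts, totals, share) where
      counts[cat][feat] = occurrences of feat in category cat,
      totals[feat]      = occurrences of feat over all categories,
      share[cat][feat]  = counts[cat][feat] / totals[feat]  (assumed reading
                          of "percentage of occurrence in a category").
    """
    counts = defaultdict(Counter)
    for category, tokens in docs:
        counts[category].update(tokens)

    totals = Counter()
    for cat_counts in counts.values():
        totals.update(cat_counts)

    share = {
        cat: {feat: n / totals[feat] for feat, n in cat_counts.items()}
        for cat, cat_counts in counts.items()
    }
    return counts, totals, share


# Hypothetical toy corpus with three categories, mirroring the "3 types"
# mentioned in the claim.
toy_docs = [
    ("A", ["x", "y", "x"]),
    ("B", ["x"]),
    ("C", ["y", "z"]),
]
counts, totals, share = category_feature_stats(toy_docs)
```

A feature that occurs in only one category gets a share of 1.0 there, which is the kind of signal a category-aware weight could amplify; how the patent actually combines these quantities is not recoverable from this text.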

Description

Viewpoint classification method and system based on feature custom weight enhancement
Technical Field
The invention relates to the technical field of knowledge viewpoint classification, and in particular to a viewpoint classification method and system based on feature custom weight enhancement.
Background
Knowledge viewpoint classification is a core task in the field of natural language processing (NLP) that aims to classify the viewpoints, stances or emotional tendencies in text in a structured way. In light of recent research progress, the state of the art is as follows.
Rule-based methods identify viewpoint polarity using emotion dictionaries and syntactic analysis. Research in China has proposed improved rules for Chinese negation words and degree adverbs; for example, classification accuracy is improved by combining transition words with emotion words.
Statistical learning models such as SVM and random forest were widely used in early research. Scholars in China have selected features by combining TF-IDF with statistics, achieving classification accuracy above 85% on a Chinese film review dataset.
Among methods that combine bidirectional LSTM with an attention mechanism, a team in China proposed an "emotion-enhanced LSTM" that improves the F1 value by 5% on a Chinese microblog dataset by introducing emotion word embeddings and position encoding.
Current viewpoint classification methods rely on capturing semantics, which involves the representation of feature vectors. Weighting Word2Vec-vectorized feature words with TF-IDF can improve text representation to some extent, but it has several shortcomings, chiefly shallow semantic understanding, insensitivity to word order and syntactic structure, and limited data processing capacity and computational efficiency.
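The baseline that the background refers to, weighting Word2Vec vectors by TF-IDF, can be sketched as a weighted average of word vectors. This is a minimal illustration under stated assumptions, not the patent's enhanced algorithm: the embeddings are a hand-written toy dictionary standing in for trained Word2Vec vectors, the IDF uses one common smoothed variant, and the names `tfidf_weights` and `weighted_doc_vector` are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per document.

    docs: list of token lists. Uses a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1, which is one common variant (an assumption).
    """
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (cnt / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, cnt in tf.items()
        })
    return weights


def weighted_doc_vector(weights, embeddings, dim):
    """Average word vectors, each scaled by its TF-IDF weight."""
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, w in weights.items():
        vec = embeddings.get(tok)
        if vec is None:  # skip tokens without an embedding
            continue
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Toy stand-in for trained Word2Vec embeddings (hypothetical values).
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_docs = [["a", "b"], ["a", "c"]]
doc_weights = tfidf_weights(toy_docs)
doc0_vector = weighted_doc_vector(doc_weights[0], toy_embeddings, dim=2)
```

In the toy corpus, "a" appears in both documents and "b" in only one, so the first document's vector leans toward the embedding of "b". The average ignores word order entirely, which is one of the shortcomings noted above.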
Disclosure of Invention
To comprehensively address the above problems, the invention provides a viewpoint classification method and system based on feature custom weight enhancement, which can accurately classify viewpoints arising during online collaborative knowledge construction and monitor and intervene in them in real time, supporting learners' reflection on the knowledge construction process and promoting viewpoint improvement.
To avoid the inefficiency of manual coding and the insufficient interpretability of deep learning, a machine-learning-driven viewpoint classification method for online collaborative knowledge construction comprising six stages is designed. Word2Vec is used to vectorize feature words; the SMOTE oversampling technique is used to address imbalanced sample sizes; and four machine learning algorithm models, namely Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Logistic Regression (LR), are used to construct viewpoint classifiers. In particular, to distinguish the importance of different features, a custom TF-IDF superposition enhancement weight algorithm is defined, which improves the accuracy of machine learning viewpoint classification.
To achieve the above object, a first aspect of the invention provides a viewpoint classification method based on feature custom weight enhancement, comprising the following steps: S1, collecting viewpoint text data; S2, preprocessing the collected data and extracting key feature words; S3, vectorizing the key feature words using Word2Vec; S4, applying a TF-IDF superposition enhancement weight algorithm to the vectorized key feature words; S5, constructing a viewpoint classifier using a machine learning algorithm model, and verifying the effect of the extracted key feature words on classification.
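The SMOTE oversampling mentioned above creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that core idea in plain Python so the mechanism is visible; it is a simplified illustration, not the patent's implementation or a replacement for a library version. The function name `smote_sketch`, the parameter defaults, and the toy points are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from minority-class feature vectors.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and interpolate at a random position on
    the segment between them. Simplified illustration of the SMOTE idea.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sqdist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation position in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class vectors (e.g. weighted document vectors).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(toy_minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies rather than duplicating samples exactly.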
Preferably, noise reduction is performed first, and the key feature words are then extracted using the word segmentation tool Jieba.
Preferably, in S3, Word2Vec uses the Skip-gram model to vectorize the key feature words, and the resulting feature word vectors are augmented with TF-IDF values.
Preferably, in S4, the superposition enhancement weight of a feature is calculated from the following quantities (the symbols and formulas are not reproduced in this text): (1) the number of occurrences of a feature in a category; (2) the percentage of occurrences of a feature in a category; (3) the finally calculated weight of each feature; (4) the number of a feature in the given type. The number of occurrences of each feature in all types is defined by a formula in which 3 represents the 3 types and the value range of t is [1, m]. The weight assigned to each feature in a category is then calculated by a further formula.
Preferably, the machine learning algorithm model in S4 includes any one of logistic regression, random forest, the K-nearest neighbor algorithm and support vector machine.
A second aspect of the pre