CN-121997936-A - Classification marking method and system based on AI text analysis

Abstract

The invention provides a classification marking method and system based on AI text analysis, and relates to the fields of natural language processing and artificial intelligence applications. The method comprises the following steps: S1, creating and configuring an annotation model; S2, creating a text analysis project and associating an unstructured text data set with the annotation model; S3, calling a large model to perform semantic analysis on the associated unstructured text data set based on the configuration information of the annotation model, and outputting an annotation result; and S4, performing statistics on the annotation result and displaying it visually. The method and system improve the efficiency, accuracy and interpretability of emotion and topic annotation on unstructured text, and reduce both the dependence on manual annotation and its cost.

Inventors

  • SHI BEINING
  • CHEN KAI

Assignees

  • 众言科技股份有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-27

Claims (10)

  1. A classification marking method based on AI text analysis, characterized by comprising the following steps: S1, creating and configuring an annotation model; S2, creating a text analysis project, and associating an unstructured text data set with the annotation model; S3, based on configuration information of the annotation model, invoking a large model to perform semantic analysis on the associated unstructured text data set, and outputting an annotation result; and S4, performing statistics on the annotation result and displaying it visually.
  2. The classification marking method based on AI text analysis of claim 1, wherein step S1 specifically comprises: step S101, constructing a topic label system comprising at least two levels; step S102, configuring label information for the topic labels in the topic label system, wherein the label information comprises topic names, topic aliases and notes; and step S103, adding a few-shot sample data set to the annotation model, wherein each sample comprises text information, a whole-sentence emotion, topic labels and topic reasons.
  3. The classification marking method based on AI text analysis of claim 1, wherein step S2 specifically comprises: step S201, creating a text analysis project and configuring a project name; step S202, selecting at least one data source for the text analysis project from the unstructured text data set, and designating the text fields to be analyzed from the data source; and step S203, selecting at least one annotation model from the annotation models configured in step S1 and associating it with the text analysis project, thereby establishing a binding relationship between the data source and the annotation model.
  4. The method of claim 3, wherein step S202 further comprises designating a field containing time information from the data source as a time field, and selecting one or more fields as filter fields.
  5. The classification marking method of claim 1, wherein in step S4 the visual display includes: generating an emotion analysis dashboard, wherein the emotion analysis dashboard comprises the overall text quantity, positive evaluations, negative evaluations, an emotion index, an overall emotion trend graph and an emotion distribution; and generating a topic analysis dashboard, wherein the topic analysis dashboard comprises a topic mention rate TOP5, a topic qualification rate TOP5, a topic negative evaluation rate TOP5, topic statistics, a topic statistics bubble chart and a topic association degree.
  6. The classification marking method based on AI text analysis of claim 5, wherein the topic statistics include each topic and its corresponding mention count, emotion index, negative evaluation rate and positive evaluation rate; and the topic statistics bubble chart shows topic performance, wherein the bubble size represents the topic's mention count, the ring chart inside the bubble represents the ratio of the positive evaluation rate to the negative evaluation rate, and the color represents the emotional tendency of the topic.
  7. The classification marking method based on AI text analysis of claim 5, wherein the topic association degree is obtained by calculating the correlation coefficient between topics and ranges from -1 to 1, and the topic analysis dashboard classifies the association between topics according to the value of the topic association degree: below 0.3 is weak correlation, from 0.3 to 0.6 is medium correlation, and above 0.6 is strong correlation.
  8. A classification marking system based on AI text analysis, using the method of any one of claims 1 to 7, characterized by comprising a model management module, a text analysis project management module, a large model module and a visual dashboard module; the model management module is used for providing a model management configuration interface with which a user creates and configures an annotation model; the text analysis project management module is used for providing a text analysis project management interface with which a user creates a text analysis project and associates the unstructured text data set with an annotation model from the model management module; the large model module is used for receiving the association information from the text analysis project management module, calling the large model to process the unstructured text data set based on the configuration information of the annotation model, and obtaining the annotation result output by the large model; and the visual dashboard module is used for receiving the annotation result from the large model module, performing statistics on it and displaying it visually.
  9. The classification marking system based on AI text analysis of claim 8, wherein the model management module includes: a topic configuration unit, used for constructing and managing a topic label system comprising at least two levels and for configuring label information for the topic labels in the topic label system; and a sample learning unit, used for adding and managing a few-shot sample data set for the annotation model.
  10. The classification marking system based on AI text analysis of claim 9, wherein the visual dashboard module includes: an emotion analysis dashboard unit, used for generating and displaying emotion statistics and charts; a topic analysis dashboard unit, used for generating and displaying topic statistics and charts; and an original text unit, used for viewing how each text was annotated.
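
The topic association degree in claim 7 can be illustrated with a short Python sketch. The claims say only that a "correlation coefficient between topics" is calculated, so this example assumes a Pearson coefficient over per-text indicators (1 if a text is annotated with the topic, 0 otherwise). The function names, the handling of the 0.3 and 0.6 boundaries, and the choice to apply the bands to the raw coefficient rather than its absolute value are all assumptions made for illustration, not details taken from the patent.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a topic present in all texts or in none is
        # treated as uncorrelated instead of raising an error.
        return 0.0
    return cov / sqrt(var_x * var_y)


def association_level(r):
    """Map a coefficient to the bands in claim 7 (boundary handling assumed)."""
    if r < 0.3:
        return "weak"
    if r <= 0.6:
        return "medium"
    return "strong"


def topic_association(labeled_texts, topic_a, topic_b):
    """labeled_texts holds one set of topic names per annotated text."""
    a = [1 if topic_a in topics else 0 for topics in labeled_texts]
    b = [1 if topic_b in topics else 0 for topics in labeled_texts]
    r = pearson(a, b)
    return r, association_level(r)


# Hypothetical annotation results: one set of topics per text.
labeled_texts = [
    {"price", "service"},
    {"price", "service"},
    {"delivery"},
    set(),
]
r_same, level_same = topic_association(labeled_texts, "price", "service")
r_diff, level_diff = topic_association(labeled_texts, "price", "delivery")
```

In this toy data "price" and "service" always co-occur, so they fall in the strong band, while "price" and "delivery" never co-occur and receive a negative coefficient, which the literal reading of the bands places under weak correlation.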

Description

Classification marking method and system based on AI text analysis

Technical Field

The invention relates to the application fields of natural language processing and artificial intelligence, in particular to a classification marking method and system based on AI text analysis.

Background

In the field of natural language processing, automatic text annotation has progressed from an early stage that relied on rule-based systems and traditional neural network models to the current stage, in which large pre-trained language models (Large Language Models, LLMs, hereinafter referred to as large models) are widely applied. Drawing on the strong semantic understanding and generalization capabilities of large models, performing basic annotation tasks such as emotion and topic annotation through fine-tuning or prompt engineering has become mainstream industry practice.

However, for emotion and topic recognition on unstructured text, the prior art still faces significant bottlenecks. Traditional manual annotation is costly, inefficient and subject to large subjective variation. Machine-learning-based methods depend heavily on high-quality annotated data, have limited capacity to handle complex semantics, new domain knowledge and linguistic phenomena such as metaphor and irony, carry a risk of overfitting, and fall short in model generalization and annotation accuracy.

Therefore, a new method and system that incorporates the semantic understanding capability of large models is needed, one that improves recognition accuracy on unstructured text while preserving annotation efficiency and scalability, and that reduces dependence on large-scale annotated data and continuous manual maintenance.

Disclosure of Invention

The invention aims to provide a classification marking method and system based on AI text analysis that improve the efficiency, accuracy and interpretability of emotion and topic annotation on unstructured text, reduce the dependence on manual annotation and reduce its cost.

To achieve the above purpose, the invention provides a classification marking method based on AI text analysis, which comprises the following steps: S1, creating and configuring an annotation model; S2, creating a text analysis project, and associating an unstructured text data set with the annotation model; S3, based on configuration information of the annotation model, invoking a large model to perform semantic analysis on the associated unstructured text data set, and outputting an annotation result; and S4, performing statistics on the annotation result and displaying it visually.

Preferably, step S1 specifically includes: step S101, constructing a topic label system comprising at least two levels; step S102, configuring label information for the topic labels in the topic label system, wherein the label information comprises topic names, topic aliases and notes; and step S103, adding a few-shot sample data set to the annotation model, wherein each sample comprises text information, a whole-sentence emotion, topic labels and topic reasons.

Preferably, step S2 specifically includes: step S201, creating a text analysis project and configuring a project name; step S202, selecting at least one data source for the text analysis project from the unstructured text data set, and designating the text fields to be analyzed from the data source; and step S203, selecting at least one annotation model from the annotation models configured in step S1 and associating it with the text analysis project, thereby establishing a binding relationship between the data source and the annotation model.

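
The configuration built in steps S101 to S103 and consumed in step S3 can be pictured with a minimal Python sketch. The patent does not disclose a data schema or a prompt format, so the class names (`TopicLabel`, `FewShotSample`, `AnnotationModel`), the helper functions and the prompt wording below are hypothetical, and no real large-model API is called; the sketch only shows how a two-level topic label system and few-shot samples might be assembled into a prompt.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLabel:
    # Label information from step S102: name, aliases and notes.
    name: str
    aliases: list = field(default_factory=list)
    notes: str = ""
    children: list = field(default_factory=list)  # second-level TopicLabel objects


@dataclass
class FewShotSample:
    # Sample fields from step S103.
    text: str
    sentence_emotion: str
    topics: list
    reasons: list


@dataclass
class AnnotationModel:
    name: str
    topics: list
    samples: list = field(default_factory=list)


def topic_paths(labels, prefix=""):
    """Flatten the label tree into 'parent/child' lines with aliases and notes."""
    lines = []
    for label in labels:
        path = f"{prefix}/{label.name}" if prefix else label.name
        detail = path
        if label.aliases:
            detail += f" (aliases: {', '.join(label.aliases)})"
        if label.notes:
            detail += f" - {label.notes}"
        lines.append(detail)
        lines.extend(topic_paths(label.children, path))
    return lines


def build_prompt(model, text):
    """Assemble a prompt from the model configuration (wording is hypothetical)."""
    parts = [
        "Label the whole-sentence emotion and the topics of the text.",
        "Give a reason for each topic.",
        "",
        "Topics:",
    ]
    parts += [f"- {line}" for line in topic_paths(model.topics)]
    for sample in model.samples:
        parts += [
            "",
            f"Example text: {sample.text}",
            f"Emotion: {sample.sentence_emotion}",
            f"Topics: {', '.join(sample.topics)}",
            f"Reasons: {'; '.join(sample.reasons)}",
        ]
    parts += ["", f"Text: {text}"]
    return "\n".join(parts)


# Hypothetical configuration with one first-level and one second-level topic.
demo_model = AnnotationModel(
    name="demo",
    topics=[
        TopicLabel(
            "Service",
            children=[
                TopicLabel("Wait time", aliases=["queue"], notes="time spent waiting")
            ],
        )
    ],
    samples=[
        FewShotSample(
            "The queue was far too long.",
            "negative",
            ["Service/Wait time"],
            ["mentions a long queue"],
        )
    ],
)
prompt = build_prompt(demo_model, "Staff were friendly.")
```

Keeping aliases, notes and topic reasons in the configuration rather than in code matches the described goal of letting a user adjust the annotation model through an interface; the resulting `prompt` string would then be passed to whichever large model the system uses.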
Preferably, step S202 further comprises designating a field containing time information from the data source as a time field, and selecting one or more fields as filter fields.

Preferably, in step S4, the visual display includes: generating an emotion analysis dashboard, wherein the emotion analysis dashboard comprises the overall text quantity, positive evaluations, negative evaluations, an emotion index, an overall emotion trend graph and an emotion distribution; and generating a topic analysis dashboard, wherein the topic analysis dashboard comprises a topic mention rate TOP5, a topic qualification rate TOP5, a topic negative evaluation rate TOP5, topic statistics, a topic statistics bubble chart and a topic association degree.

Preferably, the topic statistics comprise each topic and its corresponding mention count, emotion index, negative evaluation rate and positive evaluation rate; and the topic statistics bubble chart shows topic performance, wherein the bubble size represents the topic's mention count, the ring chart inside the bubble represents the ratio of the positive evaluation rate to the negative evaluation rate, and the color represents the emotional tendency of the topic.

Preferably, the topic association degree is obtained by calculating the correlation coefficient between topic