CN-122019778-A - Label generation method, device and medium

Abstract

The application provides a tag generation method, device, and medium, and relates to the field of data processing. The method comprises: obtaining unstructured text data in an enterprise information system; performing context analysis on the unstructured text data using a large language model to generate an initial tag set; processing the initial tag set based on a preset rule to form a cross-level tag structure, wherein the cross-level tag structure comprises at least two tag layers with different semantic dimensions; and establishing an association relation network among the tag layers. The method and device reduce the lag in tag updates.

Inventors

  • Mo Lijun
  • Zhang Xiaodong
  • Ni Lizheng
  • Ma Lumeng
  • Gu Tangxuan

Assignees

  • 上海远图未来信息技术有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-26

Claims (10)

  1. A tag generation method, comprising: obtaining unstructured text data in an enterprise information system, wherein the unstructured text data is natural language text that has not been formatted; performing context analysis on the unstructured text data using a large language model to generate an initial tag set; and processing the initial tag set based on a preset rule to form a cross-level tag structure, wherein the cross-level tag structure comprises at least two business tag layers with different semantic dimensions, and an association relationship is established between the business tag layers.
  2. The method of claim 1, wherein performing context analysis on the unstructured text data using a large language model to generate an initial tag set comprises: segmenting the unstructured text data to extract key event segments; generating a semantic vector based on the key event segments; and matching the semantic vector against a preset tag library to generate the initial tag set.
  3. The method of claim 2, wherein generating a semantic vector based on the key event segments comprises: vectorizing the key event segments to obtain event vectors; and semantically encoding the event vectors to obtain the semantic vector.
  4. The method of any one of claims 1 to 3, wherein processing the initial tag set based on a preset rule to form a cross-level tag structure comprises: performing semantic similarity calculation on the tags in the initial tag set to determine conflicting tags and normal tags; replacing the conflicting tags with a unified tag having generalized semantics; grouping the unified tag and the normal tags to obtain grouped tags; and establishing hierarchical association relationships among the grouped tags based on a business hierarchy rule to obtain the cross-level tag structure.
  5. The method of any one of claims 1 to 3, further comprising, after forming the cross-level tag structure: performing knowledge distillation on the cross-level tag structure at a preset period to generate a hierarchical knowledge summary; and storing the hierarchical knowledge summary in a database.
  6. The method of claim 5, wherein storing the hierarchical knowledge summary in a database comprises: determining the access frequency of the hierarchical knowledge summary; dynamically adjusting the storage compression rate of the hierarchical knowledge summary according to the access frequency; and storing the hierarchical knowledge summary in the database based on the storage compression rate.
  7. The method of claim 5, wherein storing the hierarchical knowledge summary in a database comprises: determining a storage period of the hierarchical knowledge summary; and storing the hierarchical knowledge summary in the database based on the storage period.
  8. The method of any one of claims 1 to 3, wherein the large language model is obtained based on a parameter pruning operation and a quantization compression operation.
  9. An electronic device, comprising a memory and a processor, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method of any one of claims 1 to 8.
  10. A computer-readable storage medium storing computer-executable instructions which, when executed, implement the method of any one of claims 1 to 8.
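
As a rough illustration of the storage step in claim 6, the sketch below maps an access frequency to a compression level and stores the compressed hierarchical knowledge summary (abstract). Everything concrete here is an assumption and is not taken from the application: the thresholds, the direction of the mapping (rarely accessed summaries are compressed harder), the use of Python's `zlib`, and the in-memory `dict` standing in for the database. The names `compression_level`, `store_summary`, and `load_summary` are hypothetical.

```python
import zlib


def compression_level(accesses_per_day: float) -> int:
    """Map access frequency to a zlib level (1 = fastest, 9 = smallest).

    Assumed policy: hot summaries get light compression so reads stay cheap,
    cold summaries get heavy compression to save space.
    """
    if accesses_per_day >= 100:
        return 1
    if accesses_per_day >= 10:
        return 6
    return 9


def store_summary(db: dict, key: str, summary: str, accesses_per_day: float) -> None:
    """Compress the summary at the chosen level and store it under `key`."""
    level = compression_level(accesses_per_day)
    db[key] = (level, zlib.compress(summary.encode("utf-8"), level))


def load_summary(db: dict, key: str) -> str:
    """Read a stored summary back; zlib needs no level to decompress."""
    _, blob = db[key]
    return zlib.decompress(blob).decode("utf-8")
```

Because the level is re-derived on every call to `store_summary`, re-storing a summary after its access frequency changes is one simple way to make the compression rate "dynamic" in this sketch.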

Description

Label generation method, device and medium

Technical Field

The present application relates to the field of data processing, and in particular to a tag generation method, device, and medium.

Background

In today's digital age, enterprises face profound changes in enterprise knowledge management, intelligent office systems, organizational efficiency analysis, and related areas. One of the core challenges is how to efficiently process and mine large amounts of unstructured data, which exists in many forms such as text, images, audio, and video, and makes up a significant portion of total enterprise data. In building an internal enterprise knowledge base in particular, text such as the daily reports, weekly reports, and project summaries that employees produce in their routine work is fragmented, unstructured, and lacks a unified classification logic, which makes the information difficult to retrieve, use, and extract value from.

Currently, unstructured data is usually processed with a manually preset, fixed tag system. Specifically, the unstructured data to be processed is classified and tagged by predefined rules or by hand, which can cause tag updates to lag.

Disclosure of Invention

The application provides a tag generation method, device, and medium for reducing the lag in tag updates.
In a first aspect, the present application provides a tag generation method, including: obtaining unstructured text data in an enterprise information system, wherein the unstructured text data is natural language text that has not been formatted; performing context analysis on the unstructured text data using a large language model to generate an initial tag set; and processing the initial tag set based on a preset rule to form a cross-level tag structure, wherein the cross-level tag structure comprises at least two business tag layers with different semantic dimensions, and an association relationship is established between the business tag layers.

In one possible implementation, performing context analysis on the unstructured text data using a large language model to generate an initial tag set includes: segmenting the unstructured text data to extract key event segments; generating a semantic vector based on the key event segments; and matching the semantic vector against a preset tag library to generate the initial tag set.

In one possible implementation, generating a semantic vector based on the key event segments includes: vectorizing the key event segments to obtain event vectors; and semantically encoding the event vectors to obtain the semantic vector.

In one possible implementation, processing the initial tag set based on a preset rule to form a cross-level tag structure includes: performing semantic similarity calculation on the tags in the initial tag set to determine conflicting tags and normal tags; replacing the conflicting tags with a unified tag having generalized semantics; grouping the unified tag and the normal tags to obtain grouped tags; and establishing hierarchical association relationships among the grouped tags based on a business hierarchy rule to obtain the cross-level tag structure.
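
The matching, conflict-resolution, and grouping steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the applicant's implementation: hand-written 3-dimensional vectors stand in for the semantic vectors a large language model would produce, cosine similarity is assumed as the similarity measure, and the tag library, the generalized-tag lookup, the hierarchy rule, and both thresholds are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical preset tag library: tag -> toy embedding.
TAG_LIBRARY = {
    "bug fix": [1.0, 0.0, 0.0],
    "defect repair": [0.9, 0.1, 0.0],
    "client meeting": [0.0, 1.0, 0.0],
}
# Hypothetical lookup: a pair of conflicting tags -> unified, more general tag.
GENERALIZED = {frozenset({"bug fix", "defect repair"}): "defect handling"}
# Hypothetical business hierarchy rule: task-layer tag -> department-layer tag.
HIERARCHY = {"defect handling": "engineering", "client meeting": "sales"}


def match_tags(segment_vectors, threshold=0.8):
    """Initial tag set: library tags close enough to any segment vector."""
    return {
        tag
        for vec in segment_vectors
        for tag, tag_vec in TAG_LIBRARY.items()
        if cosine(vec, tag_vec) >= threshold
    }


def unify_conflicts(tags, threshold=0.9):
    """Replace near-duplicate (conflicting) tag pairs with their unified tag."""
    result = set(tags)
    ordered = sorted(tags)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            pair = frozenset({a, b})
            similar = cosine(TAG_LIBRARY[a], TAG_LIBRARY[b]) >= threshold
            if similar and pair in GENERALIZED:
                result -= pair
                result.add(GENERALIZED[pair])
    return result


def build_structure(tags):
    """Group tags under their parent layer: {department tag: [task tags]}."""
    structure = {}
    for tag in sorted(tags):
        parent = HIERARCHY.get(tag, "uncategorized")
        structure.setdefault(parent, []).append(tag)
    return structure
```

Calling `build_structure(unify_conflicts(match_tags(vectors)))` on a list of segment vectors yields a two-layer mapping in which each department-layer tag is associated with its task-layer tags, which is one simple way to represent at least two tag layers and the associations between them.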
In one possible implementation, after forming the cross-level tag structure, the method further includes: performing knowledge distillation on the cross-level tag structure at a preset period to generate a hierarchical knowledge summary; and storing the hierarchical knowledge summary in a database.

In one possible implementation, storing the hierarchical knowledge summary in a database includes: determining the access frequency of the hierarchical knowledge summary; dynamically adjusting the storage compression rate of the hierarchical knowledge summary according to the access frequency; and storing the hierarchical knowledge summary in the database based on the storage compression rate.

In one possible implementation, storing the hierarchical knowledge summary in a database includes: determining a storage period of the hierarchical knowledge summary; and storing the hierarchical knowledge summary in the database based on the storage period.

In one possible implementation, the large language model is obtained based on a parameter pruning operation and a quantization compression operation.

In a second aspect, the present application provides a tag generation apparatus, comprising: an acquisition module, configured to acquire unstructured text data in an enterprise information system, wherein the unstructured text data is natural language text that has not been formatted; a generation module, configured to perform context analysis on the unstructured text data