
CN-122021593-A - Data annotation processing method and device, computer equipment and storage medium


Abstract

The application belongs to the technical field of data processing and relates to a data annotation processing method and apparatus, a computer device, and a storage medium. The method comprises: screening specified data to be annotated from a data set; configuring the content of an annotation template based on the specified data to obtain a target annotation template; rendering the specified data into the target annotation template to generate a target annotation page; creating an annotation task corresponding to the target annotation page and distributing the task to annotators based on a task distribution mode; receiving the annotation result generated for the target annotation page; performing quality inspection on the annotation result based on a quality inspection strategy; if the annotation result passes quality inspection, performing acceptance processing on it based on an acceptance rule; and if the annotation result passes acceptance, outputting it. The method can be applied to data annotation scenarios in fields such as financial technology and healthcare, and can improve the efficiency, accuracy, and quality of data annotation.
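The flow summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Task` class, the function names, the `{text}` and `{labels}` placeholders, the round-robin distribution, and the sample claim records are all hypothetical, since the source does not specify concrete data structures, template syntax, or a distribution mode.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Optional


@dataclass
class Task:
    task_id: int
    page: str                      # rendered annotation page shown to the annotator
    annotator: str
    result: Optional[dict] = None  # filled in once the annotator submits


def screen(dataset: list[dict], rule: Callable[[dict], bool]) -> list[dict]:
    """Screen the specified data to be annotated out of the data set."""
    return [record for record in dataset if rule(record)]


def configure_template(template: str, label_names: list[str]) -> str:
    """Content configuration: fill the label options into a preset template."""
    return template.replace("{labels}", ", ".join(label_names))


def render_page(target_template: str, record: dict) -> str:
    """Data rendering: place one record's text into the configured template."""
    return target_template.replace("{text}", record["text"])


def distribute(pages: list[str], annotators: list[str]) -> list[Task]:
    """Round-robin assignment, one assumed example of a task distribution mode."""
    return [Task(i, page, who)
            for i, (page, who) in enumerate(zip(pages, cycle(annotators)))]


def collect_output(tasks: list[Task],
                   quality_check: Callable[[dict], bool],
                   acceptance_check: Callable[[dict], bool]) -> list[dict]:
    """Output only results that pass quality inspection and then acceptance."""
    output = []
    for task in tasks:
        if task.result is None or not quality_check(task.result):
            continue
        if acceptance_check(task.result):
            output.append(task.result)
    return output


# Hypothetical sample data for an auto insurance claim annotation job.
dataset = [
    {"text": "Rear bumper damaged in collision", "line": "auto"},
    {"text": "Policy renewal reminder", "line": "other"},
    {"text": "Windshield cracked on highway", "line": "auto"},
]
selected = screen(dataset, lambda r: r["line"] == "auto")
template = configure_template("Text: {text} | Labels: {labels}",
                              ["collision part", "repair cost"])
pages = [render_page(template, record) for record in selected]
tasks = distribute(pages, ["annotator_a", "annotator_b"])

# Simulated submissions: the second one is empty and should be filtered out.
tasks[0].result = {"label": "collision part"}
tasks[1].result = {"label": ""}
exported = collect_output(
    tasks,
    quality_check=lambda r: bool(r["label"]),
    acceptance_check=lambda r: r["label"] in {"collision part", "repair cost"},
)
```

Passing the quality and acceptance checks in as callables keeps the two gates separate, mirroring the order in the abstract: acceptance is only evaluated for results that have already passed quality inspection.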

Inventors

  • SHANG WENKE
  • WANG SHAOJUN
  • CHENG NING

Assignees

  • Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)

Dates

Publication Date
20260512
Application Date
20260109

Claims (10)

  1. A data annotation processing method, characterized by comprising the following steps: acquiring a pre-collected data set, and screening specified data to be annotated from the data set; performing content configuration on a preset annotation template based on the specified data to obtain a corresponding target annotation template; performing data rendering on the target annotation template based on the specified data to generate a corresponding target annotation page; creating an annotation task corresponding to the target annotation page, and distributing the annotation task to corresponding annotators based on a preset task distribution mode; receiving an annotation result corresponding to the target annotation page, the annotation result being generated after the annotators perform a data annotation operation on the annotation task; performing quality inspection on the annotation result based on a preset quality inspection strategy; if the annotation result passes the quality inspection, performing acceptance processing on the annotation result based on a preset acceptance rule; and if the annotation result passes the acceptance, outputting the annotation result.
  2. The data annotation processing method according to claim 1, wherein the step of acquiring a pre-collected data set and screening specified data to be annotated from the data set specifically comprises: executing a preset data acquisition task to acquire raw data from various channels; processing the raw data to obtain a corresponding data set; acquiring a preset data selection rule; performing data screening on the data set based on the data selection rule to obtain related data meeting the requirements; and taking the related data as the specified data.
  3. The data annotation processing method according to claim 1, wherein the step of performing content configuration on a preset annotation template based on the specified data to obtain a corresponding target annotation template specifically comprises: acquiring a preset service requirement; determining a corresponding annotation template based on the service requirement; acquiring an annotation requirement corresponding to the specified data; performing content configuration on the annotation template based on the annotation requirement to obtain a configured specified annotation template; and taking the specified annotation template as the target annotation template.
  4. The data annotation processing method according to claim 1, wherein the step of performing quality inspection on the annotation result based on a preset quality inspection strategy specifically comprises: performing preliminary quality inspection on the annotation result based on a preset quality inspection rule; if the annotation result passes the preliminary quality inspection, invoking a preset multi-round quality inspection flow; performing multi-round quality inspection on the annotation result based on the multi-round quality inspection flow; and if the annotation result passes the multi-round quality inspection, determining that the annotation result passes the quality inspection, and otherwise determining that the annotation result does not pass the quality inspection.
  5. The data annotation processing method according to claim 4, wherein the quality inspection rule includes an accuracy check rule, an integrity check rule, and a consistency check rule, and the step of performing preliminary quality inspection on the annotation result based on the preset quality inspection rule specifically comprises: performing an accuracy check on the annotation result based on the accuracy check rule; if the annotation result passes the accuracy check, performing an integrity check on the annotation result based on the integrity check rule; if the annotation result passes the integrity check, performing a consistency check on the annotation result based on the consistency check rule; and if the annotation result passes the consistency check, determining that the annotation result passes the preliminary quality inspection, and otherwise determining that the annotation result does not pass the preliminary quality inspection.
  6. The data annotation processing method according to claim 1, wherein the step of performing acceptance processing on the annotation result based on a preset acceptance rule specifically comprises: performing format conversion on the annotation result based on a preset acceptance format to obtain a corresponding target annotation result; performing matching analysis on the target annotation result based on the preset acceptance rule, and determining whether the target annotation result meets a preset acceptance criterion; if the target annotation result meets the acceptance criterion, determining that the annotation result passes the acceptance; and if the target annotation result does not meet the acceptance criterion, determining that the annotation result fails the acceptance.
  7. The data annotation processing method according to claim 1, wherein the step of outputting the annotation result specifically comprises: summarizing the annotation results to generate a corresponding first annotation result; acquiring a preset export format; performing format conversion on the first annotation result based on the export format to obtain a corresponding second annotation result; and exporting the second annotation result.
  8. A data annotation processing apparatus, comprising: a first processing module, configured to acquire a pre-collected data set and screen specified data to be annotated from the data set; a configuration module, configured to perform content configuration on a preset annotation template based on the specified data to obtain a corresponding target annotation template; a generation module, configured to perform data rendering on the target annotation template based on the specified data to generate a corresponding target annotation page; a second processing module, configured to create an annotation task corresponding to the target annotation page and distribute the annotation task to corresponding annotators based on a preset task distribution mode; a receiving module, configured to receive an annotation result corresponding to the target annotation page, the annotation result being generated after the annotators perform a data annotation operation on the annotation task; a quality inspection module, configured to perform quality inspection on the annotation result based on a preset quality inspection strategy; an acceptance module, configured to perform acceptance processing on the annotation result based on a preset acceptance rule if the annotation result passes the quality inspection; and an output module, configured to output the annotation result if the annotation result passes the acceptance.
  9. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the data annotation processing method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the data annotation processing method according to any one of claims 1 to 7.
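The ordered quality inspection described in claims 4 and 5 can be sketched in Python as follows. This is a minimal sketch under assumptions: the claims name the accuracy, integrity, and consistency checks and their order but do not define the rules themselves, so the rule bodies, `ALLOWED_LABELS`, `REQUIRED_FIELDS`, the `peer_label` field, and the two review rounds are all hypothetical stand-ins.

```python
from typing import Callable, Iterable

Result = dict
Check = Callable[[Result], bool]

# Hypothetical rule parameters; the source does not define concrete rules.
ALLOWED_LABELS = {"collision part", "repair cost", "accident time"}
REQUIRED_FIELDS = ("record_id", "label", "annotator")


def accuracy_check(result: Result) -> bool:
    """Assumed accuracy rule: the label must come from the allowed label set."""
    return result.get("label") in ALLOWED_LABELS


def integrity_check(result: Result) -> bool:
    """Assumed integrity rule: every required field is present and non-empty."""
    return all(result.get(name) not in (None, "") for name in REQUIRED_FIELDS)


def consistency_check(result: Result) -> bool:
    """Assumed consistency rule: agree with a second annotator's label, if any."""
    peer_label = result.get("peer_label")
    return peer_label is None or peer_label == result.get("label")


def preliminary_inspection(result: Result, checks: Iterable[Check]) -> bool:
    """Run the checks in order; all() stops at the first failing check."""
    return all(check(result) for check in checks)


def quality_inspection(result: Result,
                       checks: Iterable[Check],
                       rounds: Iterable[Check]) -> bool:
    """Preliminary inspection first, then every review round must pass."""
    if not preliminary_inspection(result, checks):
        return False
    return all(review(result) for review in rounds)


# Order follows claim 5: accuracy, then integrity, then consistency.
CHECKS = [accuracy_check, integrity_check, consistency_check]
# Two hypothetical review rounds, modeled as simple predicates.
ROUNDS = [lambda r: len(r["label"]) > 0, lambda r: r["annotator"] != "trainee"]

good = {"record_id": 1, "label": "collision part", "annotator": "a",
        "peer_label": "collision part"}
incomplete = {"record_id": 2, "label": "repair cost", "annotator": ""}
conflicting = {"record_id": 3, "label": "repair cost", "annotator": "b",
               "peer_label": "accident time"}

verdicts = {
    "good": quality_inspection(good, CHECKS, ROUNDS),
    "incomplete": quality_inspection(incomplete, CHECKS, ROUNDS),
    "conflicting": quality_inspection(conflicting, CHECKS, ROUNDS),
}
```

Because `all()` over a generator short-circuits, a result that fails the accuracy check is never passed to the integrity or consistency checks, and the review rounds are only reached after the preliminary inspection succeeds.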

Description

Data annotation processing method and device, computer equipment and storage medium

Technical Field

The application relates to the technical field of data processing, in particular to a data annotation processing method and apparatus, a computer device, and a storage medium, and can be applied to fields such as financial technology and healthcare.

Background

With the rapid development of natural language processing (NLP) technology, many industries are actively adopting it to reduce business operating costs. Current NLP algorithms, whether based on traditional machine learning, deep learning, or fine-tuning of pretrained models, urgently need large amounts of high-quality annotated data. At present, a common way to annotate data is offline Excel work: actual business data is collected offline in an Excel file with a preset header, an annotation administrator splits the file and distributes the parts to different annotators, and the results are collected and merged after annotation is complete. Although this scheme is flexible and can in theory support many kinds of text annotation through custom headers, it has obvious drawbacks: production data must be imported offline, annotation efficiency is low, and the quality and accuracy of the annotation results are difficult to guarantee.

Take the annotation of insurance claim data in the financial insurance field as an example. Claim information contains key content such as accident details, degree of loss, and claim amount, and different types of insurance have different annotation requirements. For example, auto insurance claims require accurate annotation of the accident time, location, collision part, repair cost, and so on, while life insurance claims require annotation of the insured's health condition, cause of death, and so on.
However, under the existing offline Excel approach, such complex and precise annotation tasks often end up incompletely or inaccurately annotated, which hinders the smooth progress of the claim settlement process.

In the healthcare field, take disease diagnosis data as an example. Medical records contain rich information such as patient symptoms, examination results, and diagnostic conclusions, and annotation must accurately distinguish the severity of different symptoms, abnormal indicators in the examination results, and so on. However, because the existing annotation approach is inefficient, large volumes of medical record data are difficult to process quickly and annotation quality is uneven, which may introduce deviations into subsequent work such as disease research and treatment planning. Therefore, it is desirable to provide a data annotation system that is efficient and ensures annotation quality.

Disclosure of Invention

The embodiments of the application aim to provide a data annotation processing method and apparatus, a computer device, and a storage medium, to solve the technical problems that existing data annotation approaches are inefficient and that the quality and accuracy of annotation results are difficult to guarantee.
In a first aspect, a data annotation processing method is provided, including: acquiring a pre-collected data set, and screening specified data to be annotated from the data set; performing content configuration on a preset annotation template based on the specified data to obtain a corresponding target annotation template; performing data rendering on the target annotation template based on the specified data to generate a corresponding target annotation page; creating an annotation task corresponding to the target annotation page, and distributing the annotation task to corresponding annotators based on a preset task distribution mode; receiving an annotation result corresponding to the target annotation page, the annotation result being generated after the annotators perform a data annotation operation on the annotation task; performing quality inspection on the annotation result based on a preset quality inspection strategy; if the annotation result passes the quality inspection, performing acceptance processing on the annotation result based on a preset acceptance rule; and if the annotation result passes the acceptance, outputting the annotation result.

In a second aspect, a data annotation processing apparatus is provided, comprising: a first processing module, configured to acquire a pre-collected data set and screen specified data to be annotated from the data set; a configuration module, configured to perform content configuration on a preset annotation template based on the specified data to obtain a corresponding target annotation template; a generation module, configured to perform data rendering on the target annotation template based on the specified data to generate a correspondi