
CN-122020198-A - Label setting method, device, equipment, storage medium and product


Abstract

The application discloses a label setting method, device, equipment, storage medium and product in the technical field of artificial intelligence. The method comprises: determining the unlabeled description dimensions of a corpus to be labeled, a description dimension being an independent direction along which the corpus is qualified or described; selecting a description dimension to be labeled from the unlabeled description dimensions; and, in response to a labeling action corresponding to the description dimension to be labeled, setting a corresponding dimension description label for the corpus to be labeled. Because the unlabeled description dimensions are determined automatically, the dimension to be labeled is selected from among them, and the corresponding dimension description label is set automatically once the annotator performs the labeling action for that dimension, the corpus can be labeled along multiple dimensions, which increases its information dimensions and label coverage.

Inventors

  • LIU XIUMEI
  • LI JIANGXU

Assignees

  • 北京奇虎科技有限公司 (Beijing Qihoo Technology Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2026-01-26

Claims (10)

  1. A label setting method, characterized in that the label setting method comprises: determining the unlabeled description dimensions of a corpus to be labeled, wherein a description dimension is an independent direction along which the corpus is qualified or described; selecting a description dimension to be labeled from the unlabeled description dimensions; and setting a corresponding dimension description label for the corpus to be labeled in response to a labeling action corresponding to the description dimension to be labeled.
  2. The label setting method according to claim 1, wherein setting the corresponding dimension description label for the corpus to be labeled in response to the labeling action corresponding to the description dimension to be labeled comprises: determining, in response to the labeling action, the corpus description label corresponding to the labeling action; analyzing the corpus to be labeled with a large model to determine selectable description labels corresponding to the description dimension to be labeled; matching the corpus description label against the selectable description labels; and, when the matching succeeds, setting the dimension description label of the corpus to be labeled to the corpus description label.
  3. The label setting method according to claim 2, wherein matching the corpus description label against the selectable description labels comprises: performing consistency matching between the corpus description label and each selectable description label; if no selectable description label is consistent with the corpus description label, performing similarity matching between the corpus description label and each selectable description label to determine a label similarity for each selectable description label; and, if there is a selectable description label whose label similarity is greater than or equal to a preset similarity threshold, determining that the matching succeeds.
  4. The label setting method according to claim 3, wherein performing similarity matching between the corpus description label and each selectable description label when no selectable description label is consistent with the corpus description label further comprises: if no selectable description label has a label similarity greater than or equal to the preset similarity threshold, constructing a label confirmation task from the corpus description label and the corpus to be labeled; acquiring at least one label confirmation result given by label setting experts for the label confirmation task; determining a label correctness confidence from the at least one label confirmation result; and, if the label correctness confidence is greater than a preset confidence threshold, determining that the matching succeeds.
  5. The label setting method according to claim 2, wherein, before analyzing the corpus to be labeled with the large model to determine the selectable description labels corresponding to the description dimension to be labeled, the method further comprises: acquiring the description labels already set for the corpus to be labeled; detecting whether the corpus description label conflicts with the set description labels; and, if there is no conflict, performing the step of analyzing the corpus to be labeled with the large model to determine the selectable description labels corresponding to the description dimension to be labeled.
  6. The label setting method according to claim 5, wherein detecting whether the corpus description label conflicts with the set description labels comprises: constructing a rule search condition from the set description labels and the corpus description label; detecting whether a conflict rule corresponding to the rule search condition exists in a conflict rule base; if such a conflict rule exists, determining that a conflict exists; and, if no such conflict rule exists, determining that no conflict exists.
  7. A label setting device, characterized in that the label setting device comprises: a determining module, configured to determine the unlabeled description dimensions of a corpus to be labeled, wherein a description dimension is an independent direction along which the corpus is qualified or described; a selection module, configured to select a description dimension to be labeled from the unlabeled description dimensions; and a setting module, configured to set a corresponding dimension description label for the corpus to be labeled in response to a labeling action corresponding to the description dimension to be labeled.
  8. A label setting apparatus, characterized in that the apparatus comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the label setting method according to any one of claims 1 to 6.
  9. A storage medium, characterized in that the storage medium is a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the label setting method according to any one of claims 1 to 6.
  10. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the steps of the label setting method according to any one of claims 1 to 6.
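Claims 4 and 6 leave open how the label correctness confidence is computed from the experts' confirmation results and how the conflict rule base is represented. The Python sketch below makes both concrete under stated assumptions, not as the applicant's implementation: the confidence is taken to be the share of experts who accept the label, the 0.6 threshold is arbitrary, and the rule base is a hypothetical set of label pairs that must not co-occur, queried with one rule search condition per label already set. All names (`label_correct_confidence`, `CONFLICT_RULES`, `has_conflict`, and so on) are illustrative.

```python
from __future__ import annotations

# ---- Expert confirmation fallback (claim 4) ---------------------------------
# Assumed value: the text only speaks of a "preset confidence threshold".
CONFIDENCE_THRESHOLD = 0.6


def label_correct_confidence(results: list[bool]) -> float:
    """Share of expert confirmation results that accept the label (assumed formula)."""
    return sum(results) / len(results) if results else 0.0


def matched_by_experts(results: list[bool],
                       threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    # The text requires the confidence to be strictly greater than the threshold.
    return label_correct_confidence(results) > threshold


# ---- Conflict detection against a conflict rule base (claims 5 and 6) -------
Tag = tuple[str, str]  # (description dimension, label)

# Hypothetical rule base: each rule is a pair of labels that must not co-occur.
CONFLICT_RULES: set[frozenset[Tag]] = {
    frozenset({("quality", "low"), ("usage", "gold_standard")}),
}


def rule_search_conditions(set_labels: dict[str, str], dimension: str,
                           label: str) -> list[frozenset[Tag]]:
    """Pair the new corpus description label with each label already set."""
    new_tag = (dimension, label)
    return [frozenset({new_tag, (dim, lbl)}) for dim, lbl in set_labels.items()]


def has_conflict(set_labels: dict[str, str], dimension: str, label: str,
                 rules: set[frozenset[Tag]] = CONFLICT_RULES) -> bool:
    """A conflict exists if any rule search condition is found in the rule base."""
    return any(cond in rules
               for cond in rule_search_conditions(set_labels, dimension, label))
```

In a full pipeline, `has_conflict` would run before the large model is called (claim 5), and `matched_by_experts` only after similarity matching has failed (claim 4).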

Description

Label setting method, device, equipment, storage medium and product

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a label setting method, device, equipment, storage medium and product.

Background

In natural language processing, text analysis, artificial intelligence and related fields, high-quality labeling of the corpus is a key prerequisite when preparing training data. Traditional corpus platforms, however, label along a single dimension, for example only classification labels or only quality scores. This leaves the corpus with a single information dimension and insufficient label coverage, and limits how subsequent models can use it.

Disclosure of Invention

The main object of the present application is to provide a label setting method, device, equipment, storage medium and product, so as to solve the technical problem in the related art that a corpus is labeled along only a single dimension, which gives poor results in actual use.

To achieve the above object, the present application provides a label setting method, the method comprising: determining the unlabeled description dimensions of a corpus to be labeled, wherein a description dimension is an independent direction along which the corpus is qualified or described; selecting a description dimension to be labeled from the unlabeled description dimensions; and setting a corresponding dimension description label for the corpus to be labeled in response to a labeling action corresponding to the description dimension to be labeled.
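The three steps of the method can be pictured with a small in-memory model. In the following Python sketch, the dimension names in `ALL_DIMENSIONS`, the `Corpus` record and the helper functions are all hypothetical, and "select the first unlabeled dimension" is only an assumed selection policy; the application does not say how the dimension to be labeled is chosen.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical description dimensions; the application does not enumerate them.
ALL_DIMENSIONS = ("topic", "quality", "sentiment", "domain")


@dataclass
class Corpus:
    text: str
    # Maps each labeled description dimension to its dimension description label.
    labels: dict[str, str] = field(default_factory=dict)


def unlabeled_dimensions(corpus: Corpus, dimensions=ALL_DIMENSIONS) -> list[str]:
    """Step 1: the description dimensions that have no label yet, in configured order."""
    return [d for d in dimensions if d not in corpus.labels]


def select_dimension(candidates: list[str]) -> str | None:
    """Step 2: choose the dimension to be labeled (here simply the first; an assumption)."""
    return candidates[0] if candidates else None


def on_labeling_action(corpus: Corpus, dimension: str, label: str) -> None:
    """Step 3: set the dimension description label when the annotator acts."""
    if dimension not in unlabeled_dimensions(corpus):
        raise ValueError(f"{dimension!r} is unknown or already labeled")
    corpus.labels[dimension] = label


if __name__ == "__main__":
    corpus = Corpus("The battery lasts two full days.", {"topic": "electronics"})
    # One simulated labeling action per dimension that is still open.
    for label in ("high", "positive", "consumer reviews"):
        dim = select_dimension(unlabeled_dimensions(corpus))
        on_labeling_action(corpus, dim, label)
    print(corpus.labels)
```

Because the unlabeled dimensions are recomputed from the labels already present, each labeling action shrinks the candidate list; in the demo, three actions fill the three dimensions that were still open, giving one label per dimension.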
Optionally, setting the corresponding dimension description label for the corpus to be labeled in response to the labeling action corresponding to the description dimension to be labeled comprises: determining, in response to the labeling action, the corpus description label corresponding to the labeling action; analyzing the corpus to be labeled with a large model to determine selectable description labels corresponding to the description dimension to be labeled; matching the corpus description label against the selectable description labels; and, when the matching succeeds, setting the dimension description label of the corpus to be labeled to the corpus description label.

Optionally, matching the corpus description label against the selectable description labels comprises: performing consistency matching between the corpus description label and each selectable description label; if no selectable description label is consistent with the corpus description label, performing similarity matching between the corpus description label and each selectable description label to determine a label similarity for each selectable description label; and, if there is a selectable description label whose label similarity is greater than or equal to a preset similarity threshold, determining that the matching succeeds.
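The large-model step and the two-stage matching described above can be sketched as follows. This is an illustration under assumptions, not the applicant's implementation: consistency is taken to mean equality after trimming and lower-casing, `difflib.SequenceMatcher.ratio()` stands in for the unspecified similarity measure, 0.8 is an arbitrary value for the preset similarity threshold, and `toy_large_model` is a hypothetical stub in place of the large model.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Callable

# Assumed value: the text only speaks of a "preset similarity threshold".
SIMILARITY_THRESHOLD = 0.8


def normalize(label: str) -> str:
    return label.strip().lower()


def label_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio stands in for the unspecified measure."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_label(corpus_label: str, selectable_labels: list[str],
                threshold: float = SIMILARITY_THRESHOLD) -> bool:
    # Stage 1: consistency matching (here: equal after trimming and lower-casing).
    if any(normalize(corpus_label) == normalize(s) for s in selectable_labels):
        return True
    # Stage 2: similarity matching; succeeds as soon as any label reaches the threshold.
    return any(label_similarity(corpus_label, s) >= threshold for s in selectable_labels)


def toy_large_model(text: str, dimension: str) -> list[str]:
    """Hypothetical stub: a real system would ask a large model for selectable labels."""
    options = {"sentiment": ["positive", "negative", "neutral"]}
    return options.get(dimension, [])


def set_label_if_matched(labels: dict[str, str], dimension: str, corpus_label: str,
                         text: str,
                         large_model: Callable[[str, str], list[str]] = toy_large_model) -> bool:
    """Set the dimension description label to the annotator's label only if it matches."""
    selectable = large_model(text, dimension)
    if match_label(corpus_label, selectable):
        labels[dimension] = corpus_label
        return True
    return False


if __name__ == "__main__":
    labels: dict[str, str] = {}
    print(set_label_if_matched(labels, "sentiment", "positively", "Great phone!"))
    print(labels)
```

Keeping the similarity function separate from `match_label` makes it possible to swap in a different measure, such as an embedding-based one, without touching the matching logic.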
Optionally, performing similarity matching between the corpus description label and each selectable description label when no selectable description label is consistent with the corpus description label further comprises: if no selectable description label has a label similarity greater than or equal to the preset similarity threshold, constructing a label confirmation task from the corpus description label and the corpus to be labeled; acquiring at least one label confirmation result given by label setting experts for the label confirmation task; determining a label correctness confidence from the at least one label confirmation result; and, if the label correctness confidence is greater than a preset confidence threshold, determining that the matching succeeds.

Optionally, before analyzing the corpus to be labeled with the large model to determine the selectable description labels corresponding to the description dimension to be labeled, the method further comprises: acquiring the description labels already set for the corpus to be labeled; detecting whether the corpus description label conflicts with the set description labels; and, if there is no conflict, performing the step of analyzing the corpus to be labeled with the large model to determine the selectable description labels corresponding to the description dimension to be labeled.

Optionally, detecting whether the corpus description label conflicts with the set description labels comprises: constructing a rule searching condition from the set description labels and the corpus description label; detecting whether a conflict rule corresponding to the rule searching condition exists in a conflict rule base; if a conflict rule corresponding to the rule searching condition exists, jud