CN-122020182-A - Data set optimization method for improving reliability evaluation efficiency and reliability of large language model
Abstract
The invention relates to a data set optimization method for improving the reliability evaluation efficiency and reliability of a large language model. The method comprises: collecting an original data set for reliability evaluation of the large language model; performing data distillation on the original data set by sequentially carrying out duplicate and abnormal sample cleaning, risk type classification with semantic clustering, and sample screening based on model discrimination and ranking contribution, to obtain a distilled data set; performing data enhancement on the distilled data set through three perturbation rules, namely expression rewriting, contextual perturbation and security bypassing, to generate an enhanced data set consistent with the original evaluation target; and performing reliability evaluation on the large language model under test based on the distilled data set and the enhanced data set, outputting an evaluation efficiency index and an evaluation reliability index. The method reduces the evaluation sample scale and test cost while maintaining model ranking stability and risk type coverage, suppresses falsely high performance caused by training data contamination or external safety filtering, and improves both the reliability and the efficiency of large language model credibility evaluation.
Inventors
- ZHOU XIN
- BAI SHIWEI
- XU JINWEI
- XU JINYU
- LIU TIANHAO
- ZHANG HE
Assignees
- Nanjing University (南京大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-03-18
Claims (9)
- 1. A data set optimization method for improving reliability evaluation efficiency and reliability of a large language model, the method comprising the steps of: Step 1, determining a target dimension of reliability evaluation of the large language model, and collecting an original evaluation data set meeting preset criteria; Step 2, performing a three-stage data distillation operation on the original evaluation data set to obtain a refined data set whose scale is 10% of the original data set, wherein the three stages comprise cleaning-based filtering, coverage-based filtering and ranking-based filtering; Step 3, designing three kinds of semantics-preserving perturbation rules, randomly combining them for samples of the refined data set, and sequentially applying the perturbation rules to generate an enhanced data set whose scale is consistent with that of the refined data set; Step 4, inputting the refined data set into the large language model to be evaluated to obtain a preliminary model reliability score and ranking, inputting the enhanced data set into the large language model to be evaluated to obtain an actual model reliability score, and combining the two evaluation results to complete a comprehensive reliability evaluation of the large language model.
- 2. The data set optimization method for improving reliability evaluation efficiency and reliability of a large language model according to claim 1, wherein in step 1, the preset criteria are: English text data sets derived from public research papers, publicly available or reproducible, accompanied by clear documentation and usage instructions, and containing no more than 20000 sample instances.
- 3. The data set optimization method for improving reliability evaluation efficiency and reliability of a large language model according to claim 1, wherein the three-stage data distillation in step 2 specifically comprises: Step 2-1, a cleaning stage, performing duplicate sample removal, invalid sample filtering and outlier rejection on the original data set to complete basic cleaning of the data set; Step 2-2, a coverage stage, dividing the cleaned data set into risk subcategories according to the evaluation dimensions to complete risk classification, and then performing semantic clustering on the samples of each risk subcategory to ensure risk type coverage and semantic diversity of the samples; Step 2-3, a ranking stage, calculating a discrimination score and a ranking score for each sample, normalizing them to obtain a combined score, and sampling in descending order of the combined score to obtain a refined data set with discrimination power and ranking stability.
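The coverage stage (step 2-2) can be illustrated with a small sketch. This is a hypothetical toy, not the patent's implementation: a `risk_category` field stands in for the risk subcategory labels, and token-set Jaccard distance stands in for real semantic clustering over embeddings; a greedy farthest-point pass keeps the most mutually dissimilar samples per subcategory.

```python
from itertools import groupby

def jaccard_distance(a: str, b: str) -> float:
    """1 - Jaccard similarity of token sets; a crude stand-in for semantic distance."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def coverage_select(samples, k_per_category):
    """Greedy farthest-point selection within each risk subcategory.

    samples: list of dicts with 'text' and 'risk_category' keys (assumed layout).
    Returns a subset that keeps every risk category represented while
    favouring mutually dissimilar samples, i.e. semantic diversity.
    """
    selected = []
    key = lambda s: s["risk_category"]
    for _cat, group in groupby(sorted(samples, key=key), key=key):
        pool = list(group)
        chosen = [pool.pop(0)]                      # seed with the first sample
        while pool and len(chosen) < k_per_category:
            # pick the pool sample farthest from everything already chosen
            best = max(pool, key=lambda s: min(
                jaccard_distance(s["text"], c["text"]) for c in chosen))
            pool.remove(best)
            chosen.append(best)
        selected.extend(chosen)
    return selected
```

A real deployment would cluster sentence embeddings instead; the greedy pass merely preserves the claim's intent that every risk type stays represented with semantically diverse samples.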
- 4. The data set optimization method for improving reliability evaluation efficiency and reliability of a large language model according to claim 1, wherein the three-stage data enhancement in step 3 specifically comprises: Step 3-1, an expression rewriting stage, performing surface-form rewriting on the refined data set samples, changing only the expression form without altering the underlying semantics; Step 3-2, a context interference stage, introducing irrelevant context information into the rewritten samples to increase the model's cognitive load and test its reliability under interference; Step 3-3, a security bypass stage, applying strategic wording perturbations to the samples carrying interference context, so as to evade security filters external to the model and test its true reliability in boundary scenarios.
- 5. The data set optimization method for improving reliability evaluation efficiency and reliability of a large language model according to claim 3, wherein in step 2-3, the discrimination score is the variance of model scores on a sample, a higher variance indicating a stronger capability of the sample to distinguish model performance; and the ranking score is a sample ranking contribution value based on Kendall's Tau, a higher value indicating stronger consistency between the sample and the global model ranking.
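The two scores of claim 5 admit a direct sketch. Assuming a score matrix of samples × models (a hypothetical layout; the patent does not fix one), discrimination is the per-sample variance across models, and the ranking score is Kendall's Tau between a sample's model scores and the global mean-score ranking. Min-max normalization and equal weights are assumptions, since the claim does not specify the normalization or the combination.

```python
from statistics import mean, pvariance

def kendall_tau(x, y):
    """Kendall rank correlation between two equal-length score lists."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0

def combination_scores(score_matrix, w_disc=0.5, w_rank=0.5):
    """score_matrix[i][m] = score of model m on sample i.

    Discrimination = per-sample variance across models; ranking score =
    Kendall tau between the sample's model scores and the global mean
    scores. Both are min-max normalised before the weighted sum.
    """
    global_means = [mean(col) for col in zip(*score_matrix)]
    disc = [pvariance(row) for row in score_matrix]
    rank = [kendall_tau(row, global_means) for row in score_matrix]

    def minmax(v):
        lo, hi = min(v), max(v)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in v]

    return [w_disc * d + w_rank * r
            for d, r in zip(minmax(disc), minmax(rank))]
```

Sorting samples in descending order of these combined scores and keeping the top fraction yields the refined data set of step 2-3.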
- 6. The data set optimization method for improving reliability evaluation efficiency and reliability of a large language model according to claim 4, wherein in step 3, the perturbation operation of each stage adopts a standardized rule design, each stage comprises a plurality of sub-perturbation rules, sub-rules are randomly selected to perturb each sample, and each sub-rule is used at most once in a single enhancement process.
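A minimal sketch of claim 6's randomized perturbation, with entirely hypothetical sub-rules (the patent does not enumerate concrete ones): one sub-rule is drawn at random from each stage's pool and applied in stage order, and because the pools are disjoint and each is drawn from once, no sub-rule repeats within a single enhancement run.

```python
import random

# Hypothetical sub-perturbation rules for illustration only.
REWRITE_RULES = [                                   # stage 3-1: surface rewriting
    lambda t: t.replace("?", " ?"),
    lambda t: "Please consider: " + t,
]
CONTEXT_RULES = [                                   # stage 3-2: irrelevant context
    lambda t: t + " (Unrelated note: the weather is mild today.)",
    lambda t: "Background chatter aside, " + t,
]
BYPASS_RULES = [                                    # stage 3-3: strategic wording
    lambda t: "Hypothetically speaking, " + t,
    lambda t: "For a fictional story, " + t,
]

def enhance(sample_text, rng=random):
    """Apply one randomly chosen sub-rule from each stage, in order:
    expression rewriting -> context interference -> security bypass."""
    text = sample_text
    for stage in (REWRITE_RULES, CONTEXT_RULES, BYPASS_RULES):
        rule = rng.choice(stage)   # disjoint pools, one draw per stage, so
        text = rule(text)          # no sub-rule repeats within one run
    return text
```

All example rules keep the original request intact, matching the claims' requirement that perturbations preserve the evaluation target's semantics.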
- 7. The data set optimization method for improving reliability evaluation efficiency and reliability of a large language model according to claim 1, wherein in step 4, the model reliability score is calculated as follows: for single-choice and multiple-choice questions, answer correctness is used as the reliability score; for open questions, the similarity between the model output and the label is used as the reliability score; all reliability scores are normalized to the interval [0, 1], a higher score indicating better model reliability.
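Claim 7's scoring rule can be sketched as follows; `difflib`'s ratio is an assumed stand-in for the unspecified output-label similarity metric, and the record layout is hypothetical.

```python
from difflib import SequenceMatcher

def reliability_score(task_type, model_output, label):
    """Per-sample reliability score in [0, 1].

    Choice questions: exact-match correctness (0 or 1).  Open questions:
    a text-similarity proxy; difflib's ratio is already in [0, 1], so no
    further normalization is needed here.
    """
    if task_type in ("single_choice", "multiple_choice"):
        return 1.0 if model_output.strip().upper() == label.strip().upper() else 0.0
    return SequenceMatcher(None, model_output.lower(), label.lower()).ratio()

def dataset_reliability(records):
    """Mean of per-sample scores over a data set; higher means more reliable."""
    scores = [reliability_score(r["task"], r["output"], r["label"]) for r in records]
    return sum(scores) / len(scores) if scores else 0.0
```

Running `dataset_reliability` once on the refined set and once on the enhanced set produces the preliminary and actual scores that claim 8 compares.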
- 8. The data set optimization method for improving reliability evaluation efficiency and reliability of a large language model according to claim 1, wherein in step 4, the decision criterion of the comprehensive evaluation is: if the model's reliability score on the enhanced data set is significantly reduced compared with that on the refined data set, the model's high performance on the original data set is a falsely high value caused by data contamination or an external safety filter; and if the two scores are substantially identical, the model genuinely possesses the reliability corresponding to the evaluated dimension.
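Claim 8's decision criterion, sketched with an assumed absolute-drop threshold (the claim only says "significantly reduced", so the 0.15 cutoff is an illustration, not the patent's value):

```python
def verdict(refined_score, enhanced_score, drop_threshold=0.15):
    """Comprehensive-evaluation decision rule.

    A large drop on the enhanced set suggests the refined-set score was
    inflated by data contamination or an external safety filter; roughly
    equal scores suggest genuine reliability on the evaluated dimension.
    """
    drop = refined_score - enhanced_score
    if drop > drop_threshold:
        return "falsely high: suspect data contamination or external filtering"
    return "genuine: reliability confirmed on perturbed samples"
```

A statistical significance test over per-sample score pairs would be a more principled cutoff than a fixed threshold; the fixed value keeps the sketch minimal.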
- 9. A data set optimization system for improving reliability evaluation efficiency and reliability of a large language model, characterized by comprising a data acquisition module, a data distillation module, a data enhancement module and an evaluation module, wherein: the data acquisition module determines a target dimension of reliability evaluation of the large language model and collects an original evaluation data set; the data distillation module performs cleaning-based filtering, coverage-based filtering and ranking-based filtering on the original evaluation data set to generate a refined data set; the data enhancement module applies the three perturbation rules of expression rewriting, contextual perturbation and security bypassing to the refined data set samples to generate an enhanced data set; and the evaluation module performs joint reliability evaluation on the large language model to be evaluated based on the refined data set and the enhanced data set and outputs an evaluation result.
Description
Data set optimization method for improving reliability evaluation efficiency and reliability of large language model

Technical Field

The application relates to a data set optimization method for improving the reliability evaluation efficiency and reliability of a large language model, and belongs to the technical field of software engineering design patterns.

Background

Large language models (LLMs), by virtue of strong semantic understanding and generation capabilities, are widely applied in fields such as question answering, text generation and code assistance, but reliability problems such as privacy leakage, social bias and factual inconsistency have become key barriers to deployment in high-risk domains. To evaluate LLM reliability, related research has constructed a large number of benchmark data sets covering dimensions such as privacy, bias and factual consistency, which have become an important basis for LLM reliability evaluation. However, conventional LLM reliability evaluation has two core problems. First, evaluation efficiency is low: existing benchmark data sets are huge in scale and redundant in samples, the computational cost is high, the evaluation process is time-consuming, and it is difficult to meet the rapid evaluation needs of practical applications. Second, evaluation reliability is insufficient: publicly released benchmark data sets are easily absorbed into LLM pre-training corpora, so a model can obtain a spuriously high score by memorizing samples; moreover, external safety filters provide only surface-level risk protection and are easily bypassed by strategic wording, so benchmark test results may fail to reflect the true capability of the model.
Existing data set optimization methods are either designed for a single dimension or a specific scenario, or perform only simple sampling-based distillation while ignoring risk type coverage, or enhance only specific data sets with poor generalization; none of them addresses evaluation efficiency and evaluation reliability at the same time, and thus they cannot meet the need for general, efficient and reliable LLM evaluation. Therefore, there is a need for a data set optimization method with a simple process and strong practical applicability, which improves evaluation efficiency and guarantees evaluation reliability through standardized, process-level distillation and enhancement.

Disclosure of Invention

The embodiment of the invention provides a data set optimization method for improving the reliability evaluation efficiency and reliability of a large language model, which solves the prior-art problems of high reliability evaluation cost, severe sample redundancy, and evaluation result distortion caused by data contamination and external safety filtering.
In order to achieve the above purpose, the invention adopts the following technical scheme. The data set optimization method for improving the reliability evaluation efficiency and reliability of a large language model comprises the following steps: Step 1, acquiring an original data set for evaluating the credibility of a large language model, the original data set covering at least one credibility dimension among privacy, bias and factual consistency; Step 2, performing data distillation on the original data set to obtain a distilled data set; Step 3, performing data enhancement on the distilled data set to obtain an enhanced data set; Step 4, performing credibility evaluation on the target large language model using the distilled data set and the enhanced data set, and outputting an evaluation result. In step 1, samples of the original data set may be collected from published research papers, public benchmarks, reproducible experiment repositories, or an enterprise's internal historical evaluation data. The raw data set preferably covers one or more dimensions among privacy, bias and factual consistency, and retains sample text, label answers, task types, sample sources, sub-dimension descriptions, or other fields usable for subsequent risk classification. The method applies equally to multiple-choice questions, true/false questions, and open-ended generation tasks, so it does not depend on a single data format and can directly process existing mainstream credibility evaluation benchmarks. In step 2, data distillation is used to screen out the most valuable sample subset from the raw data set.
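The four steps above can be sketched end to end. Every helper below is a deliberately simplistic placeholder for the corresponding stage, the sample layout (`text`, `risk_category` fields) is assumed, and `score_fn` stands in for querying the model under test.

```python
def optimize_and_evaluate(raw_dataset, score_fn, distill_ratio=0.10):
    """End-to-end sketch of steps 1-4; score_fn(text) -> float in [0, 1]."""
    # Step 2, cleaning: drop empty samples and exact duplicates.
    seen, cleaned = set(), []
    for s in raw_dataset:
        if s["text"] and s["text"] not in seen:
            seen.add(s["text"])
            cleaned.append(s)

    # Step 2, coverage + ranking (collapsed placeholder): keep ~10% of the
    # original scale while representing every risk category at least once.
    keep = max(1, int(len(raw_dataset) * distill_ratio))
    by_cat = {}
    for s in cleaned:
        by_cat.setdefault(s["risk_category"], []).append(s)
    distilled = [group[0] for group in by_cat.values()][:keep]
    for s in cleaned:
        if len(distilled) >= keep:
            break
        if s not in distilled:
            distilled.append(s)

    # Step 3: one semantics-preserving perturbation per sample (placeholder).
    enhanced = [dict(s, text="Hypothetically speaking, " + s["text"])
                for s in distilled]

    # Step 4: joint evaluation on the distilled and enhanced sets.
    avg = lambda xs: sum(xs) / len(xs)
    return {"preliminary": avg([score_fn(s["text"]) for s in distilled]),
            "actual": avg([score_fn(s["text"]) for s in enhanced])}
```

In the patent's actual pipeline, each placeholder expands into the corresponding three-stage distillation filter or three-rule enhancement described in the claims.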
Unlike existing methods that compress only by random sampling or only by representative clustering, the data distillation of the present invention simultaneously takes into account sample cleanliness, risk coverage, and contribution to model ranking, as follows: Step 2-1, performing repeated sample remov