CN-122020161-A - Sample generation method, device, computer equipment and medium based on artificial intelligence
Abstract
The application belongs to the technical field of artificial intelligence and relates to an artificial intelligence based sample generation method, a sample generation device, a computer device and a storage medium. The method comprises: acquiring a pre-constructed basic data set and extracting a sample set from the basic data set; training a service model on the sample set to obtain a target service model; using the target service model to perform capability detection on the sample set based on a greedy decoding strategy, so as to screen out a weak-capability sample set from the sample set; performing sample correction on the weak-capability sample set based on a judgment model to obtain a first sample set; performing sample expansion on the first sample set based on a sample expansion strategy to obtain a second sample set; performing sample updating based on the first sample set and the second sample set to obtain target sample data; and outputting the target sample data. The application can be applied to sample generation scenarios in fields such as financial technology and digital healthcare, and improves both the quality and the efficiency of sample data generation.
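The abstract describes a fixed sequence of stages. As a minimal orchestration sketch (not the applicant's implementation), each stage can be modelled as a pluggable callable. The `Stages` class, its field names and the toy lambdas are hypothetical; the judgment model is assumed to be captured inside the `correct` callable, and the real training, decoding and expansion logic is left abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Sample = Any  # hypothetical: any sample representation
Model = Any   # hypothetical: any trained service model


@dataclass
class Stages:
    """Hypothetical stand-ins for the steps named in the abstract and claim 1."""
    extract: Callable[[List[Sample]], List[Sample]]
    train: Callable[[List[Sample]], Model]
    screen_weak: Callable[[Model, List[Sample]], List[Sample]]
    correct: Callable[[Model, List[Sample]], List[Sample]]  # judgment model assumed inside
    expand: Callable[[List[Sample]], List[Sample]]
    update: Callable[[List[Sample], List[Sample]], List[Sample]]


def generate_samples(base_dataset: List[Sample], st: Stages) -> List[Sample]:
    sample_set = st.extract(base_dataset)     # extract a sample set from the basic data set
    model = st.train(sample_set)              # train the service model -> target service model
    weak = st.screen_weak(model, sample_set)  # capability detection (greedy decoding)
    first = st.correct(model, weak)           # sample correction -> first sample set
    second = st.expand(first)                 # sample expansion -> second sample set
    return st.update(first, second)           # sample updating -> target sample data


# Toy usage: strings as samples, a set as the "model" (purely illustrative).
toy = Stages(
    extract=lambda d: d[:4],
    train=lambda s: set(s[:2]),  # toy "model" that only knows the first two samples
    screen_weak=lambda m, s: [x for x in s if x not in m],
    correct=lambda m, w: [x.upper() for x in w],
    expand=lambda f: [x + "!" for x in f],
    update=lambda a, b: sorted(set(a + b)),
)
print(generate_samples(["a", "b", "c", "d", "e"], toy))
```

Keeping every stage injectable mirrors the module split of claim 8 and lets each stage be swapped or tested in isolation.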
Inventors
- ZHANG XULONG
- Bao Xikun
Assignees
- Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20260119
Claims (10)
- 1. A sample generation method based on artificial intelligence, comprising the steps of: acquiring a pre-constructed basic data set, and extracting a sample set from the basic data set; training a preset service model based on the sample set to obtain a trained target service model; performing capability detection processing on the sample set by using the target service model based on a preset greedy decoding strategy, so as to screen out a corresponding weak-capability sample set from the sample set; performing sample correction processing on the weak-capability sample set based on a preset judgment model to obtain a corresponding first sample set; performing sample expansion processing on the first sample set based on a preset sample expansion strategy to obtain a corresponding second sample set; performing sample updating processing based on the first sample set and the second sample set to obtain corresponding target sample data; and outputting the target sample data.
- 2. The artificial intelligence based sample generation method according to claim 1, wherein the step of performing capability detection processing on the sample set by using the target service model based on a preset greedy decoding strategy, so as to screen out a corresponding weak-capability sample set from the sample set, specifically comprises: performing prediction processing on the sample set with the target service model based on the greedy decoding strategy to obtain a corresponding predicted sequence; comparing the predicted sequence item by item with the corresponding reference label to obtain a corresponding comparison result; screening first sample data from the sample set based on the comparison result, wherein the first sample data are samples for which the target service model cannot generate the correct sequence; calculating the perplexity of the sample set, and screening, from the sample set, second sample data whose perplexity is greater than a preset threshold; integrating the first sample data and the second sample data to obtain a corresponding integrated sample set; and taking the integrated sample set as the weak-capability sample set.
- 3. The artificial intelligence based sample generation method according to claim 1, wherein the step of performing sample correction processing on the weak-capability sample set based on a preset judgment model to obtain a corresponding first sample set specifically comprises: performing prediction processing on the weak-capability sample set based on the target service model to obtain a corresponding target predicted sequence; performing prediction processing with the judgment model on the target predicted sequence and the corresponding target reference label to obtain a corresponding prediction result; acquiring a preset correction processing strategy; performing, based on the prediction result, correction processing on the weak-capability sample set with the correction processing strategy to obtain a corrected sample set; and taking the corrected sample set as the first sample set.
- 4. The artificial intelligence based sample generation method according to claim 1, wherein the step of performing sample expansion processing on the first sample set based on a preset sample expansion strategy to obtain a corresponding second sample set specifically comprises: screening a target expansion mode from a plurality of preset expansion modes; constructing a corresponding target expansion function based on the target expansion mode; inputting the first sample set into the target expansion function, and expanding the first sample set according to the target expansion mode to obtain a corresponding third sample set; filtering the third sample set based on a preset filtering strategy to obtain a corresponding fourth sample set; and taking the fourth sample set as the second sample set.
- 5. The artificial intelligence based sample generation method according to claim 4, wherein the step of screening the target expansion mode from the plurality of preset expansion modes specifically comprises: performing sample characteristic recognition on the first sample set to obtain a corresponding sample characteristic recognition result; acquiring a training target corresponding to the first sample set; performing data analysis on the sample characteristic recognition result and the training target to obtain a corresponding data analysis result; screening out, from the plurality of expansion modes, a designated expansion mode that matches the data analysis result; and taking the designated expansion mode as the target expansion mode.
- 6. The artificial intelligence based sample generation method according to claim 4, wherein the step of filtering the third sample set based on a preset filtering strategy to obtain a corresponding fourth sample set specifically comprises: performing grammar detection on the third sample set based on preset grammar rules to obtain a corresponding grammar detection result; filtering the third sample set based on the grammar detection result to obtain third sample data that pass grammar detection; performing judgment processing on the third sample data based on the judgment model to obtain a corresponding judgment result; filtering the third sample data based on the judgment result to obtain filtered fourth sample data; and taking the fourth sample data as the fourth sample set.
- 7. The artificial intelligence based sample generation method according to claim 1, wherein the step of performing sample updating processing based on the first sample set and the second sample set to obtain corresponding target sample data specifically comprises: integrating the first sample set and the second sample set to obtain corresponding first generated data; performing data arrangement processing on the first generated data to obtain corresponding second generated data; performing de-duplication processing on the second generated data to obtain corresponding third generated data; and taking the third generated data as the target sample data.
- 8. An artificial intelligence based sample generation device, comprising: a processing module, configured to acquire a pre-constructed basic data set and extract a sample set from the basic data set; a training module, configured to train a preset service model based on the sample set to obtain a trained target service model; a screening module, configured to perform capability detection processing on the sample set by using the target service model based on a preset greedy decoding strategy, so as to screen out a corresponding weak-capability sample set from the sample set; a correction module, configured to perform sample correction processing on the weak-capability sample set based on a preset judgment model to obtain a corresponding first sample set; an expansion module, configured to perform sample expansion processing on the first sample set based on a preset sample expansion strategy to obtain a corresponding second sample set; an updating module, configured to perform sample updating processing based on the first sample set and the second sample set to obtain corresponding target sample data; and an output module, configured to output the target sample data.
- 9. A computer device, comprising a memory storing computer-readable instructions which, when executed, implement the steps of the artificial intelligence based sample generation method according to any one of claims 1 to 7.
- 10. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the artificial intelligence based sample generation method according to any one of claims 1 to 7.
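Claim 2 screens weak-capability samples with two signals: an item-by-item mismatch between the greedy-decoded output and the reference label, and a perplexity above a preset threshold. The following is a minimal sketch under stated assumptions, not the applicant's code. `greedy_decode` is a hypothetical callable assumed to run the target service model with greedy (argmax) decoding and return its output as a list of items; `reference_logprobs` is a hypothetical callable assumed to return the model's per-token log-probabilities for the reference sequence (the claim does not say what the perplexity is computed over); the threshold and toy data are arbitrary. Perplexity uses its standard definition, exp(-(1/N) Σ log p(x_i | x_<i)).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    sample_id: str
    prompt: str
    reference: List[str]  # reference label as a sequence of items (hypothetical format)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def matches_reference(prediction: Sequence[str], reference: Sequence[str]) -> bool:
    """Item-by-item comparison: same length and every position equal."""
    return len(prediction) == len(reference) and all(
        p == r for p, r in zip(prediction, reference)
    )


def screen_weak_samples(
    samples: List[Sample],
    greedy_decode: Callable[[str], List[str]],            # hypothetical decoding wrapper
    reference_logprobs: Callable[[Sample], List[float]],  # hypothetical scorer
    ppl_threshold: float,
) -> List[Sample]:
    # First sample data: the greedy output differs from the reference label.
    wrong = {
        s.sample_id
        for s in samples
        if not matches_reference(greedy_decode(s.prompt), s.reference)
    }
    # Second sample data: perplexity above the preset threshold.
    uncertain = {
        s.sample_id
        for s in samples
        if perplexity(reference_logprobs(s)) > ppl_threshold
    }
    # Integrate both subsets (set union), keeping the original sample order.
    weak_ids = wrong | uncertain
    return [s for s in samples if s.sample_id in weak_ids]


# Toy usage with stub callables (hypothetical data).
data = [
    Sample("a", "add 2 3", ["call:add", "2", "3"]),
    Sample("b", "mul 4 5", ["call:mul", "4", "5"]),
    Sample("c", "sub 9 1", ["call:sub", "9", "1"]),
]


def stub_greedy(prompt: str) -> List[str]:
    op, x, y = prompt.split()
    return ["call:add" if op == "mul" else f"call:{op}", x, y]  # gets "mul" wrong


def stub_logprobs(s: Sample) -> List[float]:
    return [-3.0] * 3 if s.sample_id == "c" else [-0.1] * 3  # "c" is low-confidence


weak = screen_weak_samples(data, stub_greedy, stub_logprobs, ppl_threshold=10.0)
print([s.sample_id for s in weak])  # ['b', 'c']
```

Sample `b` is caught by the comparison and sample `c` by perplexity (exp(3) ≈ 20.1 > 10), so the two checks cover different failure types.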
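Claim 3 leaves the correction processing strategy unspecified. One plausible reading, given the background's focus on label errors, is a three-way policy driven by the judgment model's verdict: relabel, keep, or drop. The sketch below assumes exactly that. The `Verdict` enum, the drop policy and the toy judge backed by a lookup table are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, List


class Verdict(Enum):
    LABEL_CORRECT = "label_correct"            # keep the sample unchanged
    PREDICTION_CORRECT = "prediction_correct"  # label judged wrong, prediction judged right
    BOTH_WRONG = "both_wrong"                  # neither is trusted


@dataclass(frozen=True)
class Sample:
    prompt: str
    reference: str


def correct_weak_samples(
    weak: List[Sample],
    predict: Callable[[str], str],              # hypothetical: target service model
    judge: Callable[[str, str, str], Verdict],  # hypothetical: judgment model
) -> List[Sample]:
    corrected: List[Sample] = []
    for s in weak:
        prediction = predict(s.prompt)
        verdict = judge(s.prompt, prediction, s.reference)
        if verdict is Verdict.PREDICTION_CORRECT:
            corrected.append(replace(s, reference=prediction))  # relabel with the prediction
        elif verdict is Verdict.LABEL_CORRECT:
            corrected.append(s)                                 # keep as-is
        # Verdict.BOTH_WRONG: drop the sample (hypothetical policy).
    return corrected


# Toy usage: a lookup table stands in for the knowledge of the judgment model.
truth = {"2+3": "5", "4*5": "20", "9-1": "8"}
model_out = {"2+3": "5", "4*5": "21", "9-1": "9"}


def toy_judge(prompt: str, prediction: str, reference: str) -> Verdict:
    if reference == truth[prompt]:
        return Verdict.LABEL_CORRECT
    if prediction == truth[prompt]:
        return Verdict.PREDICTION_CORRECT
    return Verdict.BOTH_WRONG


weak = [Sample("2+3", "6"), Sample("4*5", "20"), Sample("9-1", "7")]
print(correct_weak_samples(weak, lambda p: model_out[p], toy_judge))
```

Here the mislabelled `2+3` sample is relabelled, the correctly labelled `4*5` sample is kept, and `9-1` is dropped because neither the label nor the prediction is judged correct.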
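Claim 6 filters expanded samples in two stages: rule-based grammar detection, then the judgment model. A minimal sketch follows, assuming, hypothetically, that each sample's target is a JSON tool call that must contain a string `name` and an object `arguments`; the application does not state its grammar rules. The judgment model is stood in for by an arbitrary predicate.

```python
import json
from typing import Callable, Dict, List

Record = Dict[str, str]  # hypothetical record with "prompt" and "target" fields


def passes_grammar(record: Record) -> bool:
    """Hypothetical grammar rule: the target must be a JSON object with a
    string "name" and an object "arguments"."""
    try:
        call = json.loads(record["target"])
    except json.JSONDecodeError:
        return False
    return (
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and isinstance(call.get("arguments"), dict)
    )


def filter_expanded(third_set: List[Record], judge: Callable[[Record], bool]) -> List[Record]:
    # Stage 1: grammar detection drops records that violate the rule.
    grammatical = [r for r in third_set if passes_grammar(r)]
    # Stage 2: the judgment model (a predicate here) decides what to keep.
    return [r for r in grammatical if judge(r)]


# Toy usage (hypothetical data and judge).
third = [
    {"prompt": "add 2 3", "target": '{"name": "add", "arguments": {"a": 2, "b": 3}}'},
    {"prompt": "bad", "target": "add(2, 3"},
    {"prompt": "sub 9 1", "target": '{"name": "sub", "arguments": {"a": 9, "b": 1}}'},
]
kept = filter_expanded(third, judge=lambda r: r["prompt"] != "sub 9 1")
print([r["prompt"] for r in kept])  # ['add 2 3']
```

Running the cheap rule check first means the (typically more expensive) judgment model only sees syntactically valid samples.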
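Claim 7 updates the samples by integrating, arranging and de-duplicating. The sketch below is one way to realise those three steps under stated assumptions: records are hypothetical `prompt`/`target` dictionaries, "arrangement" is taken to mean a deterministic sort, and two records count as duplicates when their case-folded, whitespace-collapsed prompts and whitespace-collapsed targets are equal. The application defines none of these details.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, str]  # hypothetical record with "prompt" and "target" fields


def dedup_key(record: Record) -> Tuple[str, str]:
    """Hypothetical duplicate key: normalised prompt plus whitespace-collapsed target."""
    return (
        " ".join(record["prompt"].split()).lower(),
        " ".join(record["target"].split()),
    )


def update_samples(first: Iterable[Record], second: Iterable[Record]) -> List[Record]:
    # Integrate the first and second sample sets -> first generated data.
    merged = list(first) + list(second)
    # Arrange the data with a deterministic sort -> second generated data.
    # Python's sort is stable, so on ties records from the first set stay ahead.
    arranged = sorted(merged, key=dedup_key)
    # De-duplicate, keeping the first occurrence of each key -> third generated data.
    seen: Set[Tuple[str, str]] = set()
    target: List[Record] = []
    for record in arranged:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            target.append(record)
    return target


# Toy usage (hypothetical data).
first_set = [{"prompt": "Add 2 and 3", "target": "add(2, 3)"}]
second_set = [
    {"prompt": "add  2 and 3", "target": "add(2, 3)"},
    {"prompt": "Multiply 4 by 5", "target": "mul(4, 5)"},
]
print([r["prompt"] for r in update_samples(first_set, second_set)])
```

Because the sort is stable and the first occurrence wins, a corrected sample from the first set is preferred over an expanded near-copy of it from the second set.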
Description
Technical Field
The application relates to the technical field of artificial intelligence and can be applied in fields such as financial technology and digital healthcare; in particular, it relates to an artificial intelligence based sample generation method, device, computer device and storage medium.
Background
Language models with tool-calling capability are now widely used to perform tasks such as calculation, retrieval, database querying and complex task planning, and their performance depends heavily on the sample data used to train them. Existing sample generation approaches, however, have significant limitations: they rely mainly on manually constructed data sets or on static data sets generated by strong models. When data are generated by a strong model, the quality of every generated item must be assessed through extensive manual annotation, which consumes considerable labor and time; even so, the quality of the generated samples remains uncontrollable and label errors are common. As a result, existing sample generation is both inefficient and inaccurate, which severely restricts the further development and application of language models with tool-calling capability. For example, in financial insurance, the sample data traditionally used to train language models that assist with insurance product recommendation and risk assessment are usually collected and organized manually or generated by a simple model, so their quality is uneven and they contain label errors.
Models trained on such samples therefore struggle to understand customer needs accurately in complex and changeable insurance business scenarios and to provide accurate insurance recommendations and risk assessments, which lowers the quality and efficiency of insurance services. Similarly, in the medical field, the sample data used to train language models that assist with disease diagnosis and medical decision-making suffer from the same problems. Because the sample generation approach is imperfect, data quality is hard to guarantee, and label errors may mislead the model's learning and judgment, reducing the accuracy of disease diagnosis and the soundness of medical decisions and posing potential risks to patient health. It is therefore desirable to provide an improved sample generation method that raises the quality and generation efficiency of sample data and, in turn, the performance and practical effectiveness of language models with tool-calling capability.
Disclosure of Invention
Embodiments of the application aim to provide an artificial intelligence based sample generation method, device, computer device and storage medium, so as to solve the technical problem that existing sample generation methods produce low-quality sample data with low efficiency.
In a first aspect, an artificial intelligence based sample generation method is provided, comprising: acquiring a pre-constructed basic data set, and extracting a sample set from the basic data set; training a preset service model based on the sample set to obtain a trained target service model; performing capability detection processing on the sample set by using the target service model based on a preset greedy decoding strategy, so as to screen out a corresponding weak-capability sample set from the sample set; performing sample correction processing on the weak-capability sample set based on a preset judgment model to obtain a corresponding first sample set; performing sample expansion processing on the first sample set based on a preset sample expansion strategy to obtain a corresponding second sample set; performing sample updating processing based on the first sample set and the second sample set to obtain corresponding target sample data; and outputting the target sample data. In a second aspect, an artificial intelligence based sample generation apparatus is provided, comprising: a processing module, configured to acquire a pre-constructed basic data set and extract a sample set from the basic data set; a training module, configured to train a preset service model based on the sample set to obtain a trained target service model; a screening module, configured to perform capability detection processing on the sample set by using the target service model based on a preset greedy decoding strategy, so as to screen out a corresponding weak-capability sample set from the sample set; a correction module, configured to perform sample correction processing on the weak-capability sample set based on a preset judgment model to obtain a corresponding first sample set; an expansion module, configured to perform sample expansio