CN-122022971-A - Model training method based on multi-granularity risk labels and sorting loss and related products


Abstract

The present disclosure provides a model training method based on multi-granularity risk labels and sorting loss, and related products. In the method, multi-granularity risk labels are converted into corresponding risk grades, and a risk correlation label is determined for each training sample from its risk grade and a target label gain mapping method that can be adjusted during hyperparameter optimization. The risk correlation labels serve as the ground-truth labels of the training samples, so the risk evolution of the training samples across different life-cycle stages is described more comprehensively. Using the sorting loss as the optimization objective guides the risk assessment model to focus on predicting the relative order of risks, which strengthens its ability to rank risks. In addition, the method can incorporate training samples that have a short relation duration but show early signs of risk, which improves the use of early risk signals and avoids the information loss that label flattening and coarse sample deletion cause in traditional risk assessment models.
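
The relative-order objective described in the abstract can be illustrated with a pairwise loss. The sketch below is a minimal example only: the source does not specify the form of the sorting loss, so a RankNet-style pairwise logistic loss is assumed here, and the function name `pairwise_sorting_loss` and its arguments are hypothetical.

```python
import math
from typing import Sequence


def pairwise_sorting_loss(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """Mean pairwise logistic loss over all pairs with different relevance.

    ASSUMPTION: a RankNet-style loss stands in for the "sorting loss";
    the source does not give a formula.
    scores    -- model outputs, higher means "predicted riskier"
    relevance -- risk correlation labels, higher means "actually riskier"
    """
    if len(scores) != len(relevance):
        raise ValueError("scores and relevance must have the same length")

    total = 0.0
    pairs = 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            # Only pairs where sample i is truly riskier than sample j count.
            if relevance[i] > relevance[j]:
                margin = scores[i] - scores[j]
                # log(1 + exp(-margin)), written in a numerically stable form.
                total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
                pairs += 1
    # With no comparable pairs there is nothing to rank, so the loss is zero.
    return total / pairs if pairs else 0.0
```

The loss depends only on score differences within a pair, so it penalizes a wrong relative order rather than a wrong absolute score, which is the behavior the abstract attributes to the sorting loss.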

Inventors

YANG BOCHEN
YANG LIJUAN
LUO JIAN
CHEN CHANGRU
HAO JIANGTAO
LI TENG

Assignees

百融至信(北京)科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260127

Claims (16)

1. A model training method based on multi-granularity risk labels and sorting loss, the method comprising: acquiring a training sample data set, a risk assessment model, and a hyperparameter set of the risk assessment model, wherein the training sample data set comprises training data and multi-granularity risk labels of each training sample, the multi-granularity risk labels are determined according to the performance data of the training samples over different relation durations, and the hyperparameter set comprises a label gain mapping hyperparameter; determining the risk grade corresponding to the multi-granularity risk labels of each training sample according to a preset risk label grade conversion method; determining a target label gain mapping method according to the label gain mapping hyperparameter; determining a risk correlation label of each training sample according to the risk grade of the training sample and the target label gain mapping method; determining a sorting loss of each training sample according to the training data, the risk correlation label, and the risk assessment model; and iteratively training the risk assessment model according to the sorting losses until a preset model training stop condition is met, thereby generating a target risk assessment model.
2. The method of claim 1, wherein the label gain mapping hyperparameter comprises a target label gain mapping method identifier, and wherein determining the target label gain mapping method according to the label gain mapping hyperparameter comprises: acquiring a label gain mapping method set, wherein the label gain mapping method set comprises at least one monotonically increasing label gain mapping method and a label gain mapping method identifier corresponding to each label gain mapping method; and determining the target label gain mapping method according to the target label gain mapping method identifier and the label gain mapping method set.
3. The method of claim 1, wherein the label gain mapping hyperparameter comprises a method type and a method adjustment coefficient of the target label gain mapping method, the method type comprises at least one monotonically increasing label gain mapping method type, and determining the target label gain mapping method according to the label gain mapping hyperparameter comprises: determining the target label gain mapping method according to the method type and the method adjustment coefficient.
4. The method of claim 1, wherein determining the risk correlation label of each training sample according to the risk grade of the training sample and the target label gain mapping method comprises: determining a gain mapping value of each training sample according to the risk grade of the training sample and the target label gain mapping method; and determining the risk correlation label of each training sample according to its gain mapping value.
5. The method of claim 1, wherein the training sample data set comprises a full-channel training sample data set and/or a target channel training sample data set, and the risk assessment model comprises a general risk assessment model and/or a customized risk assessment model, wherein the full-channel training sample data set comprises training data and multi-granularity risk labels of each full-channel training sample, and the target channel training sample data set comprises training data of each target channel training sample together with multi-granularity risk labels or two-class risk labels.
6. The method of claim 5, wherein iteratively training the risk assessment model according to the sorting losses until a preset model training stop condition is met, thereby generating a target risk assessment model, comprises: iteratively training the general risk assessment model or the first customized risk assessment model according to the sorting loss of each full-channel training sample or the sorting loss of each target channel training sample until a preset general model training stop condition or a preset first model training stop condition is met, so as to generate a target general risk assessment model or a target customized risk assessment model.
7. The method of claim 6, wherein after generating the target general risk assessment model, the method further comprises: generating a predicted risk correlation label of each target channel training sample according to the training data of each target channel training sample and the target general risk assessment model; combining each predicted risk correlation label with the training data of the corresponding target channel training sample to generate combined training data of each target channel training sample; determining the sorting loss of each target channel training sample according to the combined training data, the multi-granularity risk labels, and a second customized risk assessment model; and iteratively training the second customized risk assessment model according to the sorting loss of each target channel training sample until a preset second customized model training stop condition is met, thereby generating a second target customized risk assessment model.
8. The method of claim 6, wherein after generating the target general risk assessment model, the method further comprises: generating a predicted risk correlation label of each target channel training sample according to the training data of each target channel training sample and the target general risk assessment model; combining each predicted risk correlation label with the training data of the corresponding target channel training sample to generate combined training data of each target channel training sample; determining the cross-entropy loss of each target channel training sample according to the combined training data, the two-class risk labels, and a third customized risk assessment model; and iteratively training the third customized risk assessment model according to the cross-entropy loss of each target channel training sample until a preset third customized model training stop condition is met, thereby generating a third target customized risk assessment model.
9. The method of claim 1, wherein determining the sorting loss of each training sample according to the training data, the risk correlation label, and the risk assessment model comprises: inputting the training data of each training sample into the risk assessment model, and outputting a predicted risk correlation label of each training sample through the risk assessment model; and determining the sorting loss of each training sample according to the predicted risk correlation label and the risk correlation label of the training sample.
10. The method of claim 9, wherein after determining the risk correlation label of each training sample according to the risk grade of the training sample and the target label gain mapping method, the method further comprises: dividing the training sample data set into X training sample data subsets according to a preset sampling rule, wherein X is a positive integer greater than 1; and wherein inputting the training data of each training sample into the risk assessment model and outputting a predicted risk correlation label of each training sample through the risk assessment model comprises: for each training sample data subset, inputting the training data of each training sample in the subset into the risk assessment model, and outputting a predicted risk correlation label of each training sample in the subset through the risk assessment model.
11. A risk assessment method based on multi-granularity risk labels and sorting loss, the method comprising: acquiring multi-dimensional data of a target object, wherein the multi-dimensional data comprises data for assessing the risk of the target object; and inputting the multi-dimensional data into a target risk assessment model, and outputting a risk assessment result of the target object through the target risk assessment model, wherein the target risk assessment model is obtained by a training method in which a risk correlation label of each training sample in a training sample data set is determined according to the risk grade of the training sample and a target label gain mapping method, a sorting loss of each training sample is determined according to the training data of the training sample, the risk correlation label, and a risk assessment model, and the risk assessment model is iteratively trained according to the sorting losses.
12. A model training apparatus based on multi-granularity risk labels and sorting loss, the apparatus comprising: a training data acquisition unit, configured to acquire a training sample data set, a risk assessment model, and a hyperparameter set of the risk assessment model, wherein the training sample data set comprises training data and multi-granularity risk labels of each training sample, the multi-granularity risk labels are determined according to the performance behavior data of the training samples over different relation durations, and the hyperparameter set comprises a label gain mapping hyperparameter; a risk label grade conversion unit, configured to determine the risk grade corresponding to the multi-granularity risk labels of each training sample according to a preset risk label grade conversion method; a label gain mapping method determining unit, configured to determine a target label gain mapping method according to the label gain mapping hyperparameter; a risk correlation label determining unit, configured to determine a risk correlation label of each training sample according to the risk grade of the training sample and the target label gain mapping method; a sorting loss determining unit, configured to determine a sorting loss of each training sample according to the training data, the risk correlation label, and the risk assessment model; and a risk assessment model generating unit, configured to iteratively train the risk assessment model according to the sorting losses until a preset model training stop condition is met, so as to generate a target risk assessment model.
13. A model risk assessment device based on multi-granularity risk labels and sorting loss, the device comprising: a multi-dimensional data acquisition unit, configured to acquire multi-dimensional data of a target object, wherein the multi-dimensional data comprises data for assessing the risk of the target object; and a risk assessment result output unit, configured to input the multi-dimensional data into a target risk assessment model and output a risk assessment result of the target object through the target risk assessment model, wherein the target risk assessment model is obtained by a training method in which a risk correlation label of each training sample in a training sample data set is determined according to the risk grade of the training sample and a target label gain mapping method, a sorting loss of each training sample is determined according to the training data of the training sample, the risk correlation label, and a risk assessment model, and the risk assessment model is iteratively trained according to the sorting losses.
14. An electronic device, comprising: one or more processors; and a storage device having one or more programs stored thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-10 or 11.
15. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the method of any of claims 1-10 or 11.
16. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-10 or 11.
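
Claims 1 to 4 describe a pipeline that runs from multi-granularity risk labels to risk grades, through a monotonically increasing gain mapping, to risk correlation labels. A minimal sketch of that pipeline follows. The label names, grade values, mapping identifiers (`linear`, `exponential`, `quadratic`), and all function names are illustrative assumptions rather than anything stated in the source; the sketch also assumes the gain mapping value is used directly as the risk correlation label.

```python
from typing import Callable, Dict, List

GainMapping = Callable[[int], float]

# ASSUMPTION: an example "risk label grade conversion" table. The source only
# says the conversion is preset; these label names and grades are hypothetical.
RISK_GRADE_BY_LABEL: Dict[str, int] = {
    "no_risk_12m": 0,
    "mild_risk_1m": 1,
    "moderate_risk_3m": 2,
    "severe_risk_6m": 3,
}

# Claim 2 style: a set of monotonically increasing mappings keyed by identifier.
GAIN_MAPPINGS: Dict[str, GainMapping] = {
    "linear": lambda grade: float(grade),
    "exponential": lambda grade: float(2 ** grade - 1),
    "quadratic": lambda grade: float(grade ** 2),
}


def make_gain_mapping(method_type: str, coefficient: float) -> GainMapping:
    """Claim 3 style: build a mapping from a method type and a coefficient.

    ASSUMPTION: the two parameterizations below are examples chosen so that
    the mapping stays strictly increasing for non-negative grades.
    """
    if method_type == "linear":
        if coefficient <= 0:
            raise ValueError("linear coefficient must be positive")
        return lambda grade: coefficient * grade
    if method_type == "exponential":
        if coefficient <= 1:
            raise ValueError("exponential base must be greater than 1")
        return lambda grade: coefficient ** grade - 1.0
    raise ValueError(f"unknown method type: {method_type}")


def risk_correlation_labels(labels: List[str], gain: GainMapping) -> List[float]:
    """Convert each label to its risk grade, then apply the gain mapping."""
    return [gain(RISK_GRADE_BY_LABEL[label]) for label in labels]
```

For example, `risk_correlation_labels(["mild_risk_1m", "severe_risk_6m"], GAIN_MAPPINGS["exponential"])` maps grades 1 and 3 to gains 1.0 and 7.0. Treating the identifier, or the type and coefficient pair, as a hyperparameter lets a search procedure tune how strongly higher grades are separated from lower ones.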

Description

Model training method based on multi-granularity risk labels and sorting loss and related products

Technical Field

The disclosure relates to the technical field of artificial intelligence, and in particular to a model training method based on multi-granularity risk labels and sorting loss and related products.

Background

In the field of financial management, risk assessment models are key tools for quantifying the risk level of an entity. Such models typically output a risk score of 300 to 1000 points that measures the risk status of an entity sample (e.g., an individual, a business, or a transaction). Conventional risk assessment models are built mainly within a supervised learning framework: entity samples are labeled as low-risk or high-risk entities according to their behavior within a fixed relation duration (e.g., 6 months), and the model is then trained on these labels with a cross-entropy loss. This conventional approach has significant limitations: (1) the risk level of an entity sample is reduced to a binary label (low-risk entity/high-risk entity), so risk gradients of different degrees cannot be accurately reflected; (2) entity samples that have a short relation duration but show early signs of risk (such as slight abnormal behavior when only one month has been observed) do not satisfy the labeling rule of the fixed relation duration and are therefore excluded from the training samples, so these early signals, which are potentially valuable for long-term risk prediction, cannot be used by the risk assessment model. Therefore, there is a need for a model training method based on multi-granularity risk labels and sorting loss, and related products, to solve at least one of the above technical problems.

Disclosure of Invention

Embodiments of the disclosure provide a model training method based on multi-granularity risk labels and sorting loss and related products. By using the multi-granularity risk labels of the training samples over different relation durations and iteratively training the risk assessment model with the sorting loss, the risk evolution of the training samples over different relation durations can be described at a finer granularity, more differentiated supervision information is provided to the risk assessment model, and the optimized model's ability to discriminate risk rankings is strengthened.

In a first aspect, the present disclosure provides a model training method based on multi-granularity risk labels and sorting loss, the method comprising: acquiring a training sample data set, a risk assessment model, and a hyperparameter set of the risk assessment model, wherein the training sample data set comprises training data and multi-granularity risk labels of each training sample, the multi-granularity risk labels are determined according to the performance data of the training samples over different relation durations, and the hyperparameter set comprises a label gain mapping hyperparameter; determining the risk grade corresponding to the multi-granularity risk labels of each training sample according to a preset risk label grade conversion method; determining a target label gain mapping method according to the label gain mapping hyperparameter; determining a risk correlation label of each training sample according to the risk grade of the training sample and the target label gain mapping method; determining a sorting loss of each training sample according to the training data, the risk correlation label, and the risk assessment model; and iteratively training the risk assessment model according to the sorting losses until a preset model training stop condition is met, thereby generating a target risk assessment model.

In some optional embodiments, the label gain mapping hyperparameter includes a target label gain mapping method identifier, and determining the target label gain mapping method according to the label gain mapping hyperparameter includes: acquiring a label gain mapping method set, wherein the label gain mapping method set comprises at least one monotonically increasing label gain mapping method and a label gain mapping method identifier corresponding to each label gain mapping method; and determining the target label gain mapping method according to the target label gain mapping method identifier and the label gain mapping method set.

In some optional embodiments, the label gain mapping hyperparameter includes a method type and a method adjustment coefficient of the target label gain mapping method, wherein the method type includes at least one monotonically increasing label gain mapping method type, and determining the target label gain mapping method according to the label gain mapping hyperparameter includes: