CN-122022980-A - Knowledge graph-based credit overdue prediction analysis method and system

Abstract

The application discloses a knowledge graph-based method and system for predicting and analyzing credit overdue risk. The method extracts two-dimensional features of entities from the credit knowledge graph, namely network structure features and business association features, forms a numerical feature set, matches overdue labels, and removes invalid samples; mines effective association rules focused on the overdue label to form a rule base; and adopts a linkage judgment mechanism in which rule matching takes priority and a GNN model serves as the fallback. The method addresses pain points of traditional prediction such as incomplete feature coverage, single-index screening, and poor adaptability; balances prediction accuracy and coverage; improves rule mining efficiency and feature quality; and provides reliable technical support for the credit decisions of financial institutions.
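
The linkage judgment mechanism named in the abstract, in which rule matching takes priority and the GNN model serves as the fallback, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Rule` structure, the feature names, the choice of the highest-confidence rule when several match, and the `gnn_predict` callable standing in for a trained GNN are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# An item is a discretized feature: (feature name, interval grade).
Item = Tuple[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical association rule: antecedent items imply overdue risk."""
    antecedent: FrozenSet[Item]
    confidence: float


def judge(
    entity: Dict[str, str],
    rules: List[Rule],
    gnn_predict: Callable[[Dict[str, str]], float],
) -> Tuple[str, float]:
    """Return (source, risk score) for one entity.

    Rules are tried first; the fallback model is called only when no
    rule's antecedent is fully contained in the entity's items.
    """
    items = set(entity.items())
    matched = [r for r in rules if r.antecedent <= items]
    if matched:
        # Assumption: when several rules match, the most confident one wins.
        best = max(matched, key=lambda r: r.confidence)
        return "rule", best.confidence
    return "gnn", gnn_predict(entity)


# Usage with made-up rules and a stub in place of a real GNN.
example_rules = [
    Rule(frozenset({("guarantee_links", "high")}), 0.8),
    Rule(frozenset({("guarantee_links", "high"), ("loan_count", "high")}), 0.9),
]


def stub_gnn(entity: Dict[str, str]) -> float:
    return 0.25  # placeholder score, not a model output


covered = judge({"guarantee_links": "high", "loan_count": "high"}, example_rules, stub_gnn)
uncovered = judge({"guarantee_links": "low", "loan_count": "medium"}, example_rules, stub_gnn)
```

Keeping the fallback behind a plain callable means the rule base can be tested on its own, and the model is only invoked for entities the rules do not cover.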

Inventors

  • ZHAO JINYU

Assignees

  • Industrial and Commercial Bank of China Limited, Xuchang Branch (中国工商银行股份有限公司许昌分行)

Dates

Publication Date
2026-05-12
Application Date
2025-12-08

Claims (10)

  1. A knowledge graph-based credit overdue prediction analysis method, characterized by comprising the following steps: based on a credit knowledge graph, extracting network structure features and business association features of entities and converting them into a numerical original feature set; matching the original feature set against 3-year overdue records, adding overdue classification labels, and removing invalid samples; quantifying the distinguishing capability of each feature along the dimensions of association strength and statistical significance and screening by priority, wherein the first priority retains features with high association strength or strong statistical significance, the second priority retains features with medium association strength and significant statistical association, and features for which only a single index meets the standard, rather than both indexes simultaneously, are removed; discretizing the core features using quartiles combined with credit business thresholds, extracting feature combinations that meet business requirements as candidate item sets with the overdue label as a constraint, retaining only combinations that contain a high-risk single feature so as to prune ineffective computation, and finally screening effective association rules according to whether their confidence, lift, and coverage meet risk prediction requirements; and judging the risk of an entity to be predicted by matching it against the rule base, and calling the GNN model to output the result when no rule matches.
  2. The knowledge graph-based credit overdue prediction analysis method according to claim 1, wherein extracting the network structure features and business association features of the entity and converting them into a numerical original feature set specifically comprises: extracting entity core features from the credit knowledge graph by means of query statements, including network structure features reflecting the association density of the entity in the graph and business association features reflecting the entity's credit business associations; converting all extracted features into machine-readable numerical data; and binding the data to the corresponding entity IDs to form the original feature set comprising unique entity identifiers and the various numerical features.
  3. The knowledge graph-based credit overdue prediction analysis method according to claim 1, wherein matching the original feature set against the 3-year overdue records, adding overdue classification labels, and removing invalid samples specifically comprises: using the entity ID in the original feature set as the join key to match the 3-year customer overdue records of the credit business system; adding an overdue classification label to each entity, the label being 1 if an overdue record exists and 0 otherwise; and, after labeling is complete, cleaning the data by removing invalid samples that lack a corresponding overdue label and samples that are missing key features, thereby forming a feature label data set.
  4. The knowledge graph-based credit overdue prediction analysis method according to claim 1, wherein quantifying the feature distinguishing capability along the dimensions of association strength and statistical significance specifically comprises: adopting a dual-index system of mutual information and the chi-square test, wherein the mutual information value quantifies the dependency between a feature and the overdue label, and a mutual information value of not less than 0.3 indicates that the association strength meets the standard; and the chi-square test constructs a contingency table of feature values against overdue labels and judges the significance of the association by calculating the chi-square statistic and looking up the corresponding p value, wherein a p value less than 0.05 indicates a significant association between the feature and the label, and a p value less than 0.01 indicates a strongly significant association.
  5. The knowledge graph-based credit overdue prediction analysis method according to claim 4, wherein the first priority retaining features with high association strength or strong statistical significance specifically comprises: the first priority retains features with high association strength and strong statistical significance; during screening, features meeting the standard of high association strength or strong statistical significance are included in the core feature set, and features meeting both standards are marked with a priority level.
  6. The knowledge graph-based credit overdue prediction analysis method according to claim 5, wherein the second priority retaining features with medium association strength and significant statistical association specifically comprises: performing a secondary screening on the features remaining after the first-priority screening, with both indexes as the core condition; during screening, first retrieving the original data and test records of the two indexes for each feature; checking the index intervals to avoid overlap with the first-priority standard; verifying the integrity of the calculation process for features that meet the standard; recalculating any index that is borderline or in doubt; and marking features that meet the standard with the corresponding grade while recording the index values and ranges.
  7. The knowledge graph-based credit overdue prediction analysis method according to claim 1, wherein removing features for which only a single index meets the standard, rather than both indexes simultaneously, specifically comprises: taking simultaneous satisfaction of both indexes as the core criterion; checking the features remaining after the priority screening for single-index cases, namely features whose association strength meets the standard but which show no significant statistical association, and features which show a statistical association but whose association strength does not meet the standard; during screening, building an index ledger for these features that records which index is met and its value; recalculating and verifying borderline indexes and removing the feature once the result is confirmed; and recording the information of removed features to form a traceability file, thereby ensuring that every feature in the core feature set satisfies both the association strength and the statistical significance requirements.
  8. The knowledge graph-based credit overdue prediction analysis method according to claim 1, wherein discretizing the core features using quartiles combined with credit business thresholds specifically comprises: first retrieving the core feature set obtained by dual-index screening and the corresponding feature label data set; examining the numerical distribution of each feature and calculating quartiles to delimit natural distribution intervals; determining business risk thresholds for the different types of core features; adjusting the interval boundaries by fusing the quartiles with the business thresholds so that both statistical regularities and credit risk knowledge are taken into account; converting the numerical features into a categorical form of "feature name-interval grade", with interval grades defined as low, medium, and high according to business risk; and recording the relevant calculations and their basis to form a traceability document.
  9. The knowledge graph-based credit overdue prediction analysis method according to claim 1, wherein extracting feature combinations that meet business requirements as candidate item sets with the overdue label as a constraint specifically comprises: retrieving the discretized core features and the corresponding feature label data; focusing on samples whose overdue label is 1; generating 1-item sets from single discretized features, calculating their label support, and removing item sets that do not meet the standard to obtain the basic item sets; generating k-item sets from the basic item sets while retaining only combinations that contain a high-risk 1-item set, where a high-risk 1-item set must meet the dual-index strong association standard and have a high business risk grade; eliminating logically contradictory combinations by checking them against business rules; and finally forming the qualifying candidate item sets and recording the relevant information.
  10. A system using the knowledge graph-based credit overdue prediction analysis method according to any one of claims 1 to 9, comprising: a feature extraction module, configured to extract network structure features and business association features of entities based on the credit knowledge graph and convert them into a numerical original feature set; a data preprocessing module, configured to match the original feature set against the 3-year overdue records, add overdue classification labels, and remove invalid samples; a feature screening module, configured to quantify feature distinguishing capability along the dimensions of association strength and statistical significance, screen features according to the specified priority, and remove features for which only a single index meets the standard rather than both indexes simultaneously; an association rule mining module, configured to discretize the core features using quartiles combined with credit business thresholds, extract candidate item sets of feature combinations with the overdue label as a constraint, and screen effective association rules according to confidence, lift, and coverage; a rule base construction module, configured to verify the rules on a test set and retain the rules that meet the standard to form the rule base; and a risk judgment module, configured to judge the risk of an entity to be predicted by matching it against the rule base and to call the GNN model to output the result when no rule matches.
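
The dual-index feature screening recited in the claims (mutual information plus the chi-square test, with thresholds of 0.3, 0.05, and 0.01) can be illustrated with a short sketch. Several points are assumptions rather than statements from the claims: features are taken to be already discrete, mutual information is measured in nats because the claims give no unit, and since only one mutual information threshold is stated, the two tiers here are separated by p value alone. All function names are hypothetical, and the p value uses the standard recurrence for the chi-square survival function with integer degrees of freedom.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

MI_MIN = 0.3          # association strength threshold stated in the claims
P_SIGNIFICANT = 0.05  # significant association
P_STRONG = 0.01       # strongly significant association


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information (nats) between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


def chi2_sf(stat: float, dof: int) -> float:
    """P(X >= stat) for a chi-square variable with integer dof >= 1."""
    z = stat / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < dof / 2.0:
        # Recurrence Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_test(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p value from the contingency table."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    dof = (len(px) - 1) * (len(py) - 1)
    if dof == 0:  # a constant feature or label carries no information
        return 0.0, 1.0
    stat = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat, chi2_sf(stat, dof)


def screen(features: Dict[str, Sequence[Hashable]], labels: Sequence[int]) -> Dict[str, int]:
    """Map each retained feature name to its tier (1 or 2).

    Features that meet only one of the two indexes are dropped.
    """
    kept = {}
    for name, values in features.items():
        mi = mutual_information(values, labels)
        _, p = chi_square_test(values, labels)
        if mi >= MI_MIN and p < P_STRONG:
            kept[name] = 1
        elif mi >= MI_MIN and p < P_SIGNIFICANT:
            kept[name] = 2
    return kept
```

A feature that is statistically significant on a large sample but has low mutual information is dropped by `screen`, which is the single-index case the claims describe removing.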

Description

Knowledge graph-based credit overdue prediction analysis method and system

Technical Field

The invention relates to the field of predictive analysis, and in particular to a knowledge graph-based credit overdue prediction analysis method and system.

Background

Credit business is a core component of the financial system, and a financial institution's ability to manage and control credit risk bears directly on its asset security and sustainable operation. Accurately identifying and predicting overdue risk is a key link in credit risk management and control.

As credit business expands and transaction scenarios grow more complex, traditional credit risk assessment methods show clear limitations:

  • Incomplete feature extraction: existing schemes rely on single-dimension financial indicators or isolated customer information and cannot fully mine the network structure associations (such as association density) and business association features (such as credit business interactions) of entities in the credit knowledge graph.
  • Single-index feature screening: the screening step often relies on a single index and lacks two-dimensional collaborative verification of association strength and statistical significance, so ineffective features with weak distinguishing capability are easily introduced, which reduces the accuracy of risk prediction.
  • Poorly adapted discretization: feature discretization depends on either the statistical distribution or a single business threshold, without effectively fusing statistical regularities with credit risk knowledge.
  • Wasted computation in combination mining: feature combination mining does not specifically retain combinations containing high-risk single features, so a large amount of ineffective computation is performed.
  • No linkage between rules and model: the risk judgment system lacks an effective linkage between the rule base and the model, making it difficult to balance judgment efficiency and coverage.

The patent document with publication number CN119886312A discloses a decision tree-based policy rule generation method. The method obtains historical credit performance data from a credit institution, processes and samples the data according to the business scenario to obtain an original data set, builds a decision tree model, mines the rules at every node of the decision tree, selects calculation indexes in light of the credit business scenario to obtain the rules and their corresponding index information, and screens and ranks the rules output by the decision tree model against credit business indexes to obtain a policy rule combination. The original data set is obtained by acquiring the credit performance data of historical customers from the credit institution, adjusting variable data types according to the actual business scenario, performing stratified sampling according to the proportion of each sample class in the credit performance data, calculating weight variables for the sampled data, and splitting the sampled data into a training set and a test set.

These problems in the prior art leave financial institutions facing low feature quality, insufficient rule validity, and low risk identification accuracy in credit risk prediction, so the requirements of refined and intelligent risk management and control cannot be met.
To address these pain points, a full-process feature processing and risk judgment system based on the credit knowledge graph is needed. By comprehensively extracting multidimensional entity features, scientifically quantifying feature distinguishing capability, optimizing the feature discretization and combination mining logic, and establishing a linkage judgment mechanism between rules and the model, such a system would improve the scientific rigor, accuracy, and reliability of credit overdue risk prediction and provide strong support for the credit decisions of financial institutions. Optimizing existing credit overdue prediction analysis systems is therefore a problem worth solving.

Disclosure of Invention

To overcome the above drawbacks of the prior art, the invention aims to provide a knowledge graph-based credit overdue prediction analysis method, together with a knowledge graph-based credit overdue prediction analysis system.

To solve the technical problems, the invention provides the following technical solutions:

In a first aspect, a knowledge graph-based credit overdue prediction analysis method includes the steps of: based on the credit knowledge graph, extracting network structure features and business association features of the entity, and converting the ne