CN-122022991-A - Credit fraud risk prediction method and system based on distillation graph neural network

Abstract

The invention discloses a credit fraud risk prediction method and system based on a distillation graph neural network. The method first acquires credit relationship graph data and feature index data and constructs a credit data graph structure from them. It then builds a credit fraud risk prediction model, inputs the data into the model's teacher module for training to obtain soft labels and intermediate feature representations, and transfers the teacher module's knowledge to a student module through a knowledge distillation fusion module, using a temperature hyperparameter and a hybrid loss function, to complete training and optimization. Finally, the credit data to be predicted is input into the trained student module, which outputs the credit fraud risk prediction probability and result. The system comprises a data acquisition and preprocessing module, a teacher module training module, a knowledge distillation module, a student model inference module, an interpretability analysis module, and a model monitoring and updating module. The invention addresses the weak generalization, slow inference, and high deployment cost of models used in traditional credit fraud detection.

Inventors

  • LU JUNXIANG
  • ZHANG YUQING

Assignees

  • Xi'an Polytechnic University (西安工程大学)

Dates

Publication Date
2026-05-12
Application Date
2026-03-26

Claims (10)

  1. A credit fraud risk prediction method based on a distillation graph neural network, characterized by comprising: obtaining credit relationship graph data and feature index data and constructing a credit data graph structure; constructing a distillation graph neural network credit fraud risk prediction model consisting of a teacher graph neural network module, a student graph neural network module, and a knowledge distillation fusion module; inputting the data into the teacher module for training to obtain soft labels and intermediate feature representations; transferring the knowledge of the teacher module to the student module through the knowledge distillation fusion module, in combination with a temperature hyperparameter and a hybrid loss function, to complete training and optimization; and inputting credit data to be predicted, after the same preprocessing, into the trained student module to output a credit fraud risk prediction probability and result.
  2. The distillation graph neural network-based credit fraud risk prediction method according to claim 1, comprising the steps of: S1, acquiring credit relationship graph data and feature index data and constructing a credit data graph structure, wherein the relationship graph data comprises a node feature matrix and an edge relationship matrix, the feature index data comprises five primary indexes and 30 secondary indexes covering loan core information, borrower basic information, credit behavior and liability information, and historical credit performance, and the target variable is whether the client is fraudulent; S2, constructing a credit fraud risk prediction model based on a distillation graph neural network, wherein the prediction model comprises a teacher graph neural network module, a student graph neural network module, and a knowledge distillation fusion module; S3, inputting the relationship graph data and the feature index data into the teacher graph neural network module for training to obtain the soft labels and intermediate feature representations of the teacher module; S4, transferring the soft labels and intermediate feature representations of the teacher module to the student graph neural network module through the knowledge distillation fusion module, in combination with the temperature hyperparameter and the hybrid loss function, and training and optimizing the student model; S5, processing the credit data to be predicted according to step S1 and inputting it into the trained student graph neural network module to obtain the credit fraud risk prediction probability and result.
  3. The credit fraud risk prediction method based on the distillation graph neural network according to claim 2, wherein constructing the credit data graph structure in step S1 specifically comprises: constructing the graph structure with credit accounts as nodes and with transaction associations, guarantee associations, and joint application associations between accounts as edges; standardizing the 30 secondary feature indexes and handling missing values by mean imputation to form a node feature matrix X of dimension N×30, where N is the number of nodes; and traversing the association relationships between nodes to construct an edge relationship matrix E of dimension 2×M, where M is the number of edges, the first row holds source node indices, and the second row holds target node indices.
  4. The distillation graph neural network-based credit fraud risk prediction method according to claim 2, wherein the feature standardization adopts the Z-score method, calculated as $x' = \frac{x - \mu}{\sigma}$, where $x$ is the original value of the feature, $\mu$ is the mean of the feature, $\sigma$ is the standard deviation of the feature, and $x'$ is the standardized feature value.
  5. The method for predicting credit fraud risk based on distillation graph neural network according to claim 2, wherein the teacher graph neural network module is a 3-layer GCN network structure whose core implements node feature aggregation based on the graph convolution operation, calculated as $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$, where $H^{(l)}$ is the node feature matrix of layer $l$; $H^{(l+1)}$ is the output feature matrix of layer $l+1$; $\tilde{A}$ is the adjacency matrix with self-loops added; $\tilde{D}$ is the degree matrix of $\tilde{A}$, satisfying $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $W^{(l)}$ is the learnable weight matrix of layer $l$; and $\sigma$ is the activation function.
  6. The method for predicting credit fraud risk based on distillation graph neural network according to claim 5, wherein the teacher graph neural network module is specifically configured such that the input layer receives the 30-dimensional node feature matrix, the two middle layers each have 24 feature channels and use the ReLU activation function, and a Dropout layer is added after each layer to suppress overfitting, the Dropout operation being given by: [formula missing in source], defined in terms of the dropout probability and a binary mask matrix of matching dimensions whose elements follow a Bernoulli distribution; and the output layer is a fully connected layer that outputs a two-class probability distribution, which serves as the soft labels of the teacher module.
  7. The distillation graph neural network-based credit fraud risk prediction method according to claim 5, wherein the student graph neural network module is a 2-layer GCN network structure whose graph convolution formula and activation function are the same as those of the teacher graph neural network module; the student graph neural network module is specifically configured such that its input layer is consistent with the teacher module and receives the 30-dimensional node feature matrix, both layers have 24 feature channels and use the ReLU activation function and a Dropout layer, and the output layer is a fully connected layer that outputs the binary credit fraud risk prediction probability.
  8. The distillation graph neural network-based credit fraud risk prediction method according to claim 5, wherein the specific operation of the knowledge distillation fusion module comprises: introducing a temperature parameter $T$ to soften the logits output by the teacher module, obtaining the soft labels $p_i^{(t)} = \frac{\exp(z_i^{(t)}/T)}{\sum_{j=1}^{C}\exp(z_j^{(t)}/T)}$, where $z^{(t)}$ denotes the logits output by the teacher module, $T$ is the temperature hyperparameter, and $C$ is the number of classes; softening the logits output by the student model with the same temperature parameter to obtain $p_i^{(s)} = \frac{\exp(z_i^{(s)}/T)}{\sum_{j=1}^{C}\exp(z_j^{(s)}/T)}$, where $z^{(s)}$ denotes the logits output by the student model; and constructing a hybrid loss function comprising a distillation loss and a task loss, calculated as: [formula missing in source], where a balance coefficient adjusts the relative weight of soft-label knowledge transfer and hard-label supervised training, balancing the teacher module's knowledge transfer against the student model's task accuracy and thereby achieving optimized training of the student model.
  9. The method for predicting credit fraud risk based on distillation graph neural network according to claim 2, further comprising a model interpretability analysis step of: calculating feature contribution by a feature importance evaluation algorithm, the core formula being: [formula missing in source], defined in terms of the importance score of each feature, the F1 score of the original model, and a further term specific to that feature, where a higher score means the feature has a greater influence on the prediction result; and analyzing the correlation among features through a heatmap, the correlation coefficient being calculated as $\rho_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)}}$, where $\mathrm{Cov}(X_i, X_j)$ is the covariance of features $X_i$ and $X_j$, and $\mathrm{Var}(X_i)$ and $\mathrm{Var}(X_j)$ are the variances of the two features respectively; the correlation coefficient lies in [-1, 1], and the larger its absolute value, the stronger the correlation.
  10. A credit fraud risk prediction system based on a distillation graph neural network, comprising: a data acquisition and preprocessing module, used for collecting credit business data, preprocessing it, and constructing a credit data graph structure; a teacher module training module, used for training on the preprocessed credit data and outputting soft labels and intermediate feature representations; a knowledge distillation module, used for temperature parameter adjustment and hybrid loss function calculation, completing the transfer of the teacher module's knowledge to the student model; a student model inference module, used for receiving data to be predicted and outputting credit fraud risk prediction results; an interpretability analysis module, used for evaluating feature importance, analyzing correlation, and outputting a report of core influencing factors; and a model monitoring and updating module, used for monitoring the model's performance indicators in real time and ensuring the adaptability of the model.
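
Claims 3 to 5 describe mean imputation, Z-score standardization, a 2×M edge matrix, and graph convolution over an adjacency matrix with self-loops. The following is a minimal pure-Python sketch of those steps on toy data, not the patented implementation. The helper names (`mean_impute`, `z_score`, `normalized_adjacency`, `gcn_layer`), the toy values, the identity weight matrix, the symmetric degree normalization, and the choice to treat edges as undirected are all assumptions made for illustration.

```python
import math


def mean_impute(rows):
    """Replace None entries in each column with that column's mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def z_score(rows):
    """Column-wise Z-score (x - mean) / std, using the population std."""
    n, n_cols = len(rows), len(rows[0])
    out = [[0.0] * n_cols for _ in rows]
    for j in range(n_cols):
        col = [r[j] for r in rows]
        mu = sum(col) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mu) / sigma if sigma > 0 else 0.0
    return out


def normalized_adjacency(n, edge_index):
    """Build D^-1/2 (A + I) D^-1/2 from a 2 x M edge index.

    Assumption: edges are treated as undirected, so A is symmetric.
    """
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for src, dst in zip(edge_index[0], edge_index[1]):
        a[src][dst] = 1.0
        a[dst][src] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(p, q):
    """Plain matrix product of two lists of lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def gcn_layer(a_hat, h, w):
    """One graph convolution step: ReLU(a_hat @ h @ w)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy data: 3 accounts with 2 features each (the claims use 30 features).
raw = [[1.0, 10.0], [None, 20.0], [3.0, 30.0]]
edge_index = [[0, 1], [1, 2]]  # row 0: source nodes, row 1: target nodes

x = z_score(mean_impute(raw))
a_hat = normalized_adjacency(len(x), edge_index)
w = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights; learned in practice
h1 = gcn_layer(a_hat, x, w)
```

In a real model the weight matrix would be learned and the layer would be stacked three times for the teacher and twice for the student, as the claims state.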
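
Claim 9 describes an importance score tied to the model's F1 score and a correlation coefficient built from covariance and variances. The sketch below is one plausible reading, not the claimed algorithm: it assumes the importance of a feature is the drop in F1 after shuffling that feature's column (a permutation-style score), and it computes the Pearson correlation. The function names, the toy rows, and the threshold classifier `predict` are hypothetical.

```python
import math
import random


def f1_score(y_true, y_pred):
    """Binary F1 score with 1 as the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance(predict, rows, y_true, feature, seed=0):
    """F1 of the original model minus F1 after shuffling one feature.

    Assumption: shuffling is the perturbation; the source does not say.
    """
    base = f1_score(y_true, [predict(r) for r in rows])
    col = [r[feature] for r in rows]
    random.Random(seed).shuffle(col)
    shuffled = [r[:feature] + [v] + r[feature + 1:]
                for r, v in zip(rows, col)]
    return base - f1_score(y_true, [predict(r) for r in shuffled])


def pearson(a, b):
    """Cov(a, b) / sqrt(Var(a) * Var(b)); returns 0.0 if a variance is 0."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def predict(row):
    """Hypothetical classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.9, 5.0], [0.1, 3.0], [0.8, 1.0], [0.2, 4.0]]
y = [1, 0, 1, 0]

used = permutation_importance(predict, rows, y, feature=0)
ignored = permutation_importance(predict, rows, y, feature=1)
```

Because `predict` never reads feature 1, shuffling it leaves the F1 score unchanged and its importance is zero, while feature 0 can only lose F1 when shuffled.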

Description

Credit fraud risk prediction method and system based on distillation graph neural network

Technical Field

The invention belongs to the technical field of financial risk control, and particularly relates to a credit fraud risk prediction method and system based on a distillation graph neural network.

Background

As the digital transformation of financial services continues to deepen, credit business is becoming increasingly online and intelligent. At the same time, credit fraud has become markedly more concealed and complex: fraud is trending toward technology-intensive methods, organized gang operations, and diversified patterns. This places higher demands on the risk control capability of financial institutions and has become an important factor restricting the healthy development of credit business. Graph neural networks, by virtue of their message passing mechanism and permutation invariance, can effectively process the graph-structured features of credit data, accurately capture association relationships between accounts, and mine deep gang fraud and associated fraud patterns, showing notable advantages in credit fraud risk detection. However, a pure graph neural network model still has a number of technical shortcomings in practical deployment, as do traditional credit fraud detection methods, so current credit fraud risk prediction still faces several technical problems, specifically:

1. Traditional credit fraud detection relies on manual rule engines and single machine learning models. Manual rules are difficult to adapt quickly to new fraud patterns, are costly to iterate and maintain, and tend to produce high false alarm rates. A single machine learning model cannot effectively mine the complex association features among credit accounts, so its ability to recognize gang fraud and associated fraud is insufficient, its misjudgment rate is high, and it struggles to meet risk control detection requirements.

2. Although deep learning models such as graph neural networks can make up for the shortcomings of traditional methods, they have large parameter counts, slow inference, high GPU memory usage, and high deployment cost. They are difficult to adapt to the real-time risk control requirements of financial credit business, which limits their practical deployment.

3. Credit business data is inherently class-imbalanced, has high feature dimensionality, and has a complex distribution of missing values. Overfitting occurs easily during model training, model generalization is weak, and prediction stability is poor across different business scenarios and data distributions.

Knowledge distillation transfers the knowledge of a complex model to a lightweight model through a teacher-student network architecture, achieving model compression and acceleration while preserving performance. Combining knowledge distillation with graph neural networks to construct a credit fraud risk prediction model that balances detection accuracy and deployment efficiency has therefore become a key direction for addressing the current technical pain points.
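
As a rough illustration of the teacher-student transfer described above, which the claims realize with a temperature hyperparameter and a loss that mixes distillation and task terms, the following pure-Python sketch softens logits with a temperature and combines the two terms. It is a sketch under stated assumptions rather than the patent's formula: using KL divergence for the distillation term, cross-entropy for the task term, the weighting `alpha * distill + (1 - alpha) * task`, and the default values of `temperature` and `alpha` are all choices made here for illustration.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (max-shifted for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label] + eps)


def hybrid_loss(teacher_logits, student_logits, label,
                temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-label term and a hard-label term.

    Assumption: alpha weights the distillation term and (1 - alpha)
    weights the task term; the source does not give the exact form.
    """
    soft_teacher = softmax_t(teacher_logits, temperature)
    soft_student = softmax_t(student_logits, temperature)
    distill = kl_divergence(soft_teacher, soft_student)
    task = cross_entropy(softmax_t(student_logits), label)
    return alpha * distill + (1.0 - alpha) * task


# Toy binary example: class 0 = normal, class 1 = fraud.
teacher_logits = [3.0, -1.0]
student_logits = [1.0, 0.5]
loss = hybrid_loss(teacher_logits, student_logits, label=0)
```

Raising the temperature flattens the teacher's distribution, so the student also sees how the teacher ranks the less likely class; the balance coefficient then trades that signal off against the hard labels.
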
Disclosure of Invention

The invention aims to provide a credit fraud risk prediction method based on a distillation graph neural network, which solves the problems of weak model generalization, slow inference, and high deployment cost in traditional credit fraud detection. It is another object of the present invention to provide a credit fraud risk prediction system based on a distillation graph neural network.

The first technical scheme adopted by the invention is a credit fraud risk prediction method based on the distillation graph neural network, comprising: first obtaining credit relationship graph data and feature index data and constructing a credit data graph structure from them; constructing a distillation graph neural network credit fraud risk prediction model consisting of a teacher graph neural network module, a student graph neural network module, and a knowledge distillation fusion module; inputting the data into the teacher module for training to obtain soft labels and intermediate feature representations; transferring the knowledge of the teacher module to the student module through the knowledge distillation fusion module, in combination with a temperature hyperparameter and a hybrid loss function, and completing training and optimization; and finally inputting the credit data to be predicted, after the same preprocessing, into th