CN-121998766-A - Supply chain financial risk prediction method based on buyer transaction
Abstract
The invention relates to a supply chain financial risk prediction method based on buyer transactions, in the field of supply chain finance. The method draws on buyers' historical transaction data and combines a machine learning model with a systematic preprocessing flow to identify high-risk clients dynamically, overcoming the lag of traditional risk assessment. Risk categories are defined by applying the Pareto distribution rule, and the sample distribution is balanced by random oversampling, so that the model remains robust on class-imbalanced data. Training uses the second-order Taylor expansion of the XGBoost algorithm together with grid search optimization, giving high training efficiency and stable predictions. Performance is assessed comprehensively with indexes such as the receiver operating characteristic (ROC) curve and the F1 score. In addition, feature importance analysis and a threshold optimization mechanism provide interpretable decision support for managers, assisting early warning of default risk, optimization of payment terms, and stronger supply chain cooperation.
Inventors
- Ge Zelong
- LIANG ZHUOMIN
- XIE TONGTONG
- LI FAN
Assignees
- Shenzhen University (深圳大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20251215
Claims (10)
- 1. A supply chain financial risk prediction method based on buyer transactions, comprising the steps of: S1, when a supply chain financial system receives a risk prediction request, acquiring a historical transaction data set of a buyer, wherein the historical transaction data set comprises payment behavior variables and transaction interaction variables, the payment behavior variables comprise average payment time and overdue payment proportion, and the transaction interaction variables comprise whether supply chain financing is provided and whether payment terms are changed; S2, defining risk categories based on the preprocessed, standardized data by applying the Pareto distribution rule, taking the proportion of invoices not paid according to the agreed terms as a reference variable, taking the eightieth percentile of that proportion as a classification threshold, marking samples above the threshold as the high risk category and marking the remaining samples as the low risk category; S3, constructing and training an XGBoost machine learning model based on a balanced training set, first dividing the data into a training subset and a testing subset according to a preset proportion, optimizing model parameter combinations through a grid search method, and evaluating model stability with multi-fold cross-validation; S4, evaluating the trained risk prediction model on the testing subset, and quantifying model performance with three indexes, namely the receiver operating characteristic (ROC) curve, the area under the curve (AUC) and the F1 score, wherein the ROC curve is used to analyze the classification capability of the risk prediction model under different decision thresholds, the AUC provides an overall performance measure, and the F1 score jointly evaluates precision and recall; and when the evaluation indexes reach a preset standard, inputting new buyer transaction data into the risk prediction model and outputting a classification result of the high risk category or the low risk category, thereby realizing accurate prediction and early warning of supply chain financial risk.
- 2. The method according to claim 1, wherein in step S1, the historical transaction data set used in the data acquisition and preprocessing step includes payment behavior variables and transaction interaction variables, wherein the payment behavior variables include average payment time and overdue payment proportion, and the transaction interaction variables include whether supply chain financing is provided and whether payment terms are changed; when missing values are processed, mean imputation is used for continuous variables, that is, the arithmetic mean of each continuous variable is calculated and each missing value is replaced by that mean, and a decision-tree-based classification model is used to predict and fill discrete variables, that is, a decision tree model is trained on the non-missing data to predict the missing values; during correlation analysis, a redundancy index based on eigenvalue decomposition is used, redundancy being quantified as the ratio of the absolute value of the difference between the eigenvalues of a variable pair to the square root of the sum of the squares of those eigenvalues, and when the redundancy score is lower than a preset threshold, the variable is judged to be highly redundant and is removed; when the data are normalized, min-max scaling is used, in which the minimum value of each continuous variable is subtracted from each value and the result is divided by the difference between the maximum and minimum values, so that the variable values are normalized to the interval from zero to one.
- 3. The method for predicting supply chain financial risk based on buyer transactions according to claim 2, wherein in step S1, the preset threshold for the redundancy judgment in the correlation analysis is set to 0.1; and the minimum and maximum values used in min-max scaling are calculated from the historical data of each continuous variable, so that the normalized data are comparable.
- 4. The method according to claim 3, wherein in step S2, based on the preprocessed, standardized data, the Pareto distribution rule is applied to define risk categories, the proportion of invoices not paid according to the agreed terms is used as a reference variable, its eightieth percentile is used as a classification threshold, samples above the threshold are marked as the high risk category, and the remaining samples are marked as the low risk category; and to address the uneven category distribution, random oversampling is used to balance the training data, the numbers of high-risk and low-risk samples being balanced by copying high-risk category samples, so that a balanced training data set is generated.
- 5. The method of claim 4, wherein in step S2, the eightieth percentile is calculated by the inverse of the empirical cumulative distribution function, that is, the smallest real number is found at which the cumulative distribution function value is greater than or equal to 0.8; and the random oversampling technique replicates samples drawn from the high risk samples by sampling with replacement, so that the number of high risk samples equals the number of low risk samples.
- 6. The method for predicting supply chain financial risk based on buyer transactions according to claim 5, wherein in step S3, an XGBoost machine learning model is constructed and trained based on the balanced training set, the data are first divided into a training subset and a testing subset according to a preset proportion, model parameter combinations are then optimized through a grid search method, and model stability is evaluated with multi-fold cross-validation; and during training, computational efficiency is improved by using the second-order Taylor expansion of the XGBoost algorithm, and an optimized risk prediction model is generated.
- 7. The method for predicting supply chain financial risk based on buyer transactions according to claim 6, wherein in step S3, the data division uses stratified random sampling to ensure that the category distribution of the training set and the testing set is consistent with the original data; the multi-fold cross-validation divides the training set into a plurality of mutually exclusive subsets, and each subset is used in turn as the validation set to evaluate the performance of the model; and XGBoost uses the first and second derivative information of the loss function for gradient boosting during training.
- 8. The supply chain financial risk prediction method based on buyer transactions, characterized in that performance evaluation is performed on the trained risk prediction model by using the testing subset, and model performance is quantified with three indexes, namely the receiver operating characteristic (ROC) curve, the area under the curve (AUC) and the harmonic mean of precision and recall, wherein the ROC curve is formed by traversing the decision threshold and calculating the true positive rate and the false positive rate at each threshold, the true positive rate is the proportion of true positive instances among actual positive instances, the false positive rate is the proportion of false positive instances among actual negative instances, the AUC is obtained by integrating the area under the ROC curve, the harmonic mean of precision and recall is calculated from the precision and the recall, the precision is the proportion of true positive instances among predicted positive instances, and the recall is the proportion of true positive instances among actual positive instances.
- 9. The method according to claim 8, wherein in step S4, when the performance evaluation index reaches a predetermined criterion, the decision threshold is optimized, and an optimal decision threshold is selected by maximizing the difference between the true positive rate and the false positive rate, wherein the difference between the true positive rate and the false positive rate is defined as the Youden index, and the optimal decision threshold is the threshold that maximizes the Youden index.
- 10. The method of claim 9, wherein in step S4, new buyer transaction data is input into a risk prediction model to obtain a prediction probability, and then an optimal decision threshold is applied to classify, wherein when the prediction probability is greater than or equal to the optimal decision threshold, a high risk class is output, and otherwise, a low risk class is output, so that accurate prediction and early warning of the supply chain financial risk are realized.
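The mean imputation and min-max scaling recited in claims 2 and 3 can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the sample payment times are hypothetical and are not taken from the patent, and the decision-tree imputation for discrete variables and the eigenvalue-based redundancy index are not shown.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the arithmetic mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]


def min_max_scale(values):
    """Scale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to 0.0 by convention (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical average payment times in days; None marks a missing record.
avg_payment_days = [30.0, None, 45.0, 60.0, 90.0]
filled = fill_missing_with_mean(avg_payment_days)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

In this sketch the minimum and maximum come from the same historical column that is being scaled, which matches the wording of claim 3; scaling new buyer data would presumably reuse those stored values.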
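The labeling and balancing of claims 4 and 5 can be sketched in the same way. The threshold is the smallest sample value at which the empirical cumulative distribution function reaches 0.8, and the high risk class is then duplicated by sampling with replacement until both classes have the same size. The function names, the fixed random seed and the sample ratios are illustrative assumptions.

```python
import random


def empirical_quantile(values, q):
    """Return the smallest sample value x whose empirical CDF F(x) >= q."""
    if not values or not 0.0 < q <= 1.0:
        raise ValueError("need a non-empty sample and 0 < q <= 1")
    ordered = sorted(values)
    n = len(ordered)
    for rank, x in enumerate(ordered, start=1):
        if rank / n >= q:  # at least `rank` of the n values are <= x
            return x
    return ordered[-1]


def label_by_threshold(ratios, threshold):
    """1 = high risk (ratio above the threshold), 0 = low risk."""
    return [1 if r > threshold else 0 for r in ratios]


def random_oversample(samples, labels, seed=0):
    """Duplicate high risk samples (label 1) with replacement until classes match."""
    rng = random.Random(seed)
    high_idx = [i for i, y in enumerate(labels) if y == 1]
    if not high_idx:
        raise ValueError("no high risk samples to duplicate")
    deficit = (len(labels) - len(high_idx)) - len(high_idx)
    extra = [rng.choice(high_idx) for _ in range(max(deficit, 0))]
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Hypothetical proportions of invoices not paid on the agreed terms, one per buyer.
unpaid_ratio = [0.00, 0.02, 0.05, 0.05, 0.10, 0.12, 0.20, 0.30, 0.55, 0.80]
threshold = empirical_quantile(unpaid_ratio, 0.8)
labels = label_by_threshold(unpaid_ratio, threshold)
balanced_x, balanced_y = random_oversample(unpaid_ratio, labels)
print(threshold, sum(labels), len(balanced_y), sum(balanced_y))
```

With ten buyers, two fall above the threshold, so six duplicated high risk rows are added to reach eight rows per class. In practice the oversampling would normally be applied to the training subset only, so that duplicated rows do not leak into the testing subset; the claims do not spell out that ordering.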
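Claims 6 and 7 refer to the second-order Taylor expansion and to the first and second derivatives of the loss function without writing them out. For reference, the standard form of the XGBoost objective at boosting round t is given below; the notation (loss l, new tree f_t, regularization term Ω) follows common usage rather than the patent, and the logistic loss in the last line is an assumption, since the claims do not name the loss function.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}

% Assumed logistic loss, with p_i = 1 / (1 + e^{-\hat{y}_i^{(t-1)}}):
g_i = p_i - y_i, \qquad h_i = p_i (1 - p_i)
```

Because the first term inside the sum is constant at round t, fitting the new tree depends on the data only through the per-sample values g_i and h_i, which is the source of the computational efficiency mentioned in claim 6.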
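The threshold selection and final classification of claims 8 to 10 can be sketched as follows. The code traverses candidate thresholds, computes the true positive rate and false positive rate at each one, and keeps the threshold that maximizes their difference (commonly known as Youden's J statistic). The function names, labels and predicted probabilities are hypothetical; a real pipeline would take the probabilities from the trained model.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def youden_optimal_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = TPR - FPR over the observed scores."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes must be present")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion_counts(y_true, scores, t)
        j = tp / (tp + fn) - fp / (fp + tn)
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


def f1_at(y_true, scores, threshold):
    """Harmonic mean of precision and recall at the given threshold."""
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classify(probability, threshold):
    """Apply the rule of claim 10: at or above the threshold is high risk."""
    return "high risk" if probability >= threshold else "low risk"


# Hypothetical test-subset labels (1 = high risk) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.70, 0.90]
best_t, best_j = youden_optimal_threshold(y_true, scores)
print(best_t, best_j, f1_at(y_true, scores, best_t), classify(0.5, best_t))
```

On this toy data the selected threshold catches every high risk buyer at the cost of one false alarm. Other criteria, such as maximizing F1 directly, would be possible, but the claims specify the difference between the two rates.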
Description
Supply chain financial risk prediction method based on buyer transaction
Technical Field
The invention relates to the technical field of supply chain finance, and in particular to a supply chain financial risk prediction method based on buyer transactions.
Background
Supply chain finance is a financing mode built around core enterprises and premised on a real trade background, and in recent years it has become an important tool for easing the financing difficulties of small and medium-sized enterprises. In a supply chain finance scenario based on buyer transactions, a financial institution typically works around a core buyer enterprise with a high credit rating and provides financial services such as receivables financing and inventory financing to its many upstream small and medium-sized suppliers. In this business model, the buyer's credit status and ability to pay become the key elements of risk control. Actual business involves multiple parties, including core buyers, suppliers, financial institutions and third-party logistics enterprises, and the transaction chain is complex and changes dynamically. Traditional risk management relies mainly on manual review of the buyer's financial statements, historical transaction records and static credit rating information, combined with industry experience, to judge risk. As supply chains become more global and the volume of transaction data grows, the business conditions and payment behavior of buyer enterprises can change rapidly because of market fluctuations, industry policies or unexpected events, which makes the lag of static risk assessment increasingly prominent. The core technical problem facing current supply chain financial risk management and control systems is the lack of dynamic predictive capability for buyer credit risk.
The prior art generally adopts statistical analysis models based on historical data. These models often depend on financial indexes and static credit scores, so they cannot effectively capture real-time risk signals in buyers' transaction behavior. For example, a buyer may delay payment because of a temporary cash-flow problem, or suddenly change payment terms because of a supply chain interruption, and such dynamic behavioral characteristics are difficult to identify and quantify in time with traditional risk-control models. More specifically, the prior art lacks machine learning analysis of the multidimensional features of buyer transaction behavior and fails to establish a dynamic correlation model between risk indicators and payment defaults, resulting in a significant lag in early warning of potential risks. Because of these technical defects, financial institutions face the operational risks of untimely risk identification and inadequate early-warning mechanisms when carrying out accounts receivable factoring, reverse factoring and similar businesses, which can lead to bad debt losses and liquidity crises.
Disclosure of Invention
To address the technical problems in the prior art described above, the invention provides a supply chain financial risk prediction method based on buyer transactions.
The technical scheme for solving the above technical problems is as follows. A supply chain financial risk prediction method based on buyer transactions comprises the following steps: S1, when a supply chain financial system receives a risk prediction request, acquiring a historical transaction data set of a buyer, wherein the historical transaction data set comprises payment behavior variables and transaction interaction variables, the payment behavior variables comprise average payment time and overdue payment proportion, and the transaction interaction variables comprise whether supply chain financing is provided and whether payment terms are changed; S2, defining risk categories based on the preprocessed, standardized data by applying the Pareto distribution rule, taking the proportion of invoices not paid according to the agreed terms as a reference variable, taking the eightieth percentile of that proportion as a classification threshold, marking samples above the threshold as the high risk category and marking the remaining samples as the low risk category; S3, constructing and training an XGBoost machine learning model based on a balanced training set, first dividing the data into a training subset and a testing subset according to a preset proportion, optimizing model parameter combinations through a grid search method, and evaluating model stability with multi-fold cross-validation; S4, evaluating the trained risk prediction model on the testing subset, and quantifying model performance with three indexes, namely the receiver operating characteristic (ROC) curve, the area under the curve (AUC) and the F1 score, wherein the ROC curve is