CN-122022979-A - Credit rating evaluation method and system for multi-source data fusion
Abstract
The invention relates to the technical field of data management, and in particular to a credit rating evaluation method and system for multi-source data fusion. The method comprises: accessing three types of data, namely internal financial-institution data, third-party data, and low-credibility supplementary data; quantifying four credibility indexes (accuracy, completeness, timeliness, and stability) and weighting them to obtain a comprehensive credibility score for each data source; assigning fusion weights by layer; extracting key risk signals from the low-credibility data; splicing the data features with the risk signals, calculating a raw score, and mapping the raw score to a credit rating; and monitoring data-source quality in real time, dynamically updating the credibility scores and fusion weights, and adaptively adjusting the rating result. The scheme makes the credibility of multi-source data measurable, avoids noise interference from low-credibility data while retaining important risk information, removes the lag in dynamic weight adjustment, and achieves accurate, real-time updating of the credit rating.
Inventors
- XIANG CHENG
- NIU ZHIQIANG
- LI LIN
- MAO TING
Assignees
- 中国工商银行股份有限公司开封分行
Dates
- Publication Date
- 20260512
- Application Date
- 20251202
Claims (10)
- 1. A credit rating evaluation method for multi-source data fusion, characterized by comprising the following steps: accessing three types of data, namely internal financial-institution data, third-party data, and low-credibility supplementary data, unifying their timestamps, field names, and data types, and completing cleaning through missing-value filling, outlier removal, and duplicate-data deduplication, thereby laying a foundation for credibility evaluation; converting the four indexes of accuracy, completeness, timeliness, and stability into quantized values by calculating the data matching rate, field completeness, data freshness, and data-quality fluctuation coefficient, respectively; obtaining the comprehensive credibility score of each data source by weighted summation according to preset weights, establishing a mapping library, and starting a real-time monitoring process; dividing the data sources into high, medium, and low layers according to the comprehensive credibility score, and assigning corresponding fusion weights; splicing the data features of each layer with the key risk signals, calculating a raw score by weighting according to the weights, and mapping the score to a corresponding credit rating grade; and updating the credibility score of each data source in real time, adjusting the fusion weights according to score changes, and, if a high-priority risk signal exists, adaptively adjusting the supplementary weight and synchronously updating the credit rating result.
- 2. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein accessing the three types of data, unifying the timestamps, field names, and data types, and completing cleaning through missing-value filling, outlier removal, and duplicate-data deduplication to lay a foundation for credibility evaluation specifically comprises: performing rule-based mapping of corresponding fields that are represented differently across data sources; and completing cleaning by filling missing values in a targeted manner, removing outliers, and removing duplicate data according to data-source priority and timestamp rules, so that the accessed data has a uniform format, invalid data is thoroughly removed, and a high-quality data foundation is provided for the subsequent credibility evaluation.
- 3. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein converting the four indexes of accuracy, completeness, timeliness, and stability into quantized values by calculating the data matching rate, field completeness, data freshness, and data-quality fluctuation coefficient, respectively, specifically comprises: for third-party data, calculating the matching rate against associated reference data, the reference being core fields that have passed internal verification by the financial institution, and for low-credibility data, quantifying accuracy by random sampling verification; determining a core field list for each data source and quantifying completeness as the proportion of non-missing core fields; quantifying timeliness from the longest validity period of the data and the time difference between its generation and its warehousing; and calculating the fluctuation coefficient from the intermediate credibility values within a statistical period, excluding abnormal-period data and adjusting periods with insufficient data by a corresponding coefficient, to quantify stability.
- 4. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein obtaining the comprehensive credibility score of each data source by weighted summation according to preset weights, establishing a mapping library, and starting a real-time monitoring process specifically comprises: entering the four preset credibility index weights; after checking the validity of the indexes, calculating the comprehensive credibility score of each data source according to a preset formula; assigning a unique identifier to each data source and building a mapping library that stores key information of the data sources, using an adapted storage mode to ensure query and update efficiency; and starting the real-time monitoring process to collect verification data at regular intervals and store it.
- 5. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein dividing the data sources into high, medium, and low layers according to the comprehensive credibility score and assigning corresponding fusion weights specifically comprises: dividing the data sources into high, medium, and low layers by comprehensive credibility score according to a preset rule, and assigning weights along the score gradient, wherein internal financial-institution data is by default assigned the highest weight in the high-credibility layer, and the weights of the other data sources are positively matched to their scores, being tied only to the comprehensive credibility score with no additional adjustment by data type; and linking the weight assignment rule to the mapping library in real time, so that the layering and the matching weights are automatically verified and updated as the credibility scores are updated.
- 6. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein key risk signals are extracted from the low-credibility data through keyword matching and sentiment tendency analysis, which specifically comprises: constructing, with financial credit risk scenarios as the core, a multi-dimensional risk keyword library that includes synonym and variant-word mappings and is updated regularly; performing matching by combining exact matching with semantic-similarity matching, and performing sentiment tendency analysis with a pre-trained model; screening effective risk signals according to relevance, the authority of the publishing entity, publication timeliness, and deduplication conditions; structuring the signals to generate a feature set and removing invalid information; marking suspected high-risk records for review and submitting them to manual audit, the audit results being used to optimize the model and the keyword library; and linking the extraction process with the real-time monitoring process.
- 7. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein splicing the data features of each layer with the key risk signals specifically comprises: using the customer unique identifier as the core association key, extracting the valid structured features of the high-, medium-, and low-credibility layers respectively and removing redundant content; treating the key risk feature set of the low-credibility data as an independent group and horizontally splicing it with the three layers of structured features to form a credit rating feature matrix; applying field deduplication rules, checking the completeness of the matrix, and marking missing non-core fields in a standardized manner; and recording the splicing time with a unified timestamp, synchronizing timing with the real-time monitoring process, and retaining the update trail to support tracing.
- 8. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein calculating the raw score by weighting and mapping the score to a corresponding credit rating grade specifically comprises: standardizing the structured features of each layer and calculating their weighted sum according to the corresponding fusion weights; calculating a score for the key risk features according to rules and multiplying it by the supplementary weight; superposing the two to obtain the raw score; and mapping the raw score to a credit rating grade according to preset rules, wherein the grade thresholds may be fine-tuned in light of the risk signals and the proportion of high-credibility data, and the calculation and mapping are performed synchronously with the hourly update of the feature matrix.
- 9. The credit rating evaluation method for multi-source data fusion according to claim 1, wherein updating the credibility score of each data source in real time and adjusting the fusion weights according to score changes specifically comprises: collecting the latest quality data of each data source through the real-time monitoring process and recalculating the credibility score after a validity check; comparing the new score with the historical score and triggering weight adjustment when a preset condition is met; and matching the new weight according to the layered gradient rule.
- 10. A credit rating evaluation system for multi-source data fusion, for implementing the method of any one of claims 1-9, comprising: a data access module for accessing three types of data, namely internal financial-institution data, third-party data, and low-credibility supplementary data; a data standardization module for unifying the timestamps, field names, and data types of the data; a data cleaning module for completing data cleaning through missing-value filling, outlier removal, and duplicate-data deduplication, laying a foundation for credibility evaluation; a credibility index quantization module for converting the four indexes of accuracy, completeness, timeliness, and stability into quantized values by calculating the data matching rate, field completeness, data freshness, and data-quality fluctuation coefficient; a comprehensive credibility calculation module for obtaining the comprehensive credibility score of each data source by weighted summation according to preset weights and establishing a mapping library; a real-time monitoring module for starting a real-time monitoring process and updating the credibility score of each data source in real time; a credibility layering module for dividing the data sources into high, medium, and low layers according to the comprehensive credibility score and assigning a corresponding fusion weight to each layer; a risk signal extraction module for extracting key risk signals from the low-credibility data through keyword matching and sentiment tendency analysis; a credit rating calculation module for splicing the data features of each layer with the key risk signals, calculating a raw score by weighting according to the weights, and mapping the score to a corresponding credit rating grade; and a weight and rating adjustment module for adjusting the fusion weights according to changes in the credibility score and, if a high-priority risk signal exists, adaptively adjusting the supplementary weight and synchronously updating the credit rating result.
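The quantization, weighted-summation, and layering steps recited in the claims can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented formulas: the index weights, layer floors, and per-layer fusion weights are hypothetical values (the claims only call them "preset"), the `SourceMetrics` fields are invented for the example, stability is approximated as one minus the coefficient of variation of past credibility values, and each layer gets a single fixed weight rather than a score gradient.

```python
import statistics
from dataclasses import dataclass

# Hypothetical preset index weights; the claims only say they are "preset".
INDEX_WEIGHTS = {"accuracy": 0.4, "completeness": 0.2,
                 "timeliness": 0.2, "stability": 0.2}

# Hypothetical layering rule: (score floor, layer name, fusion weight).
LAYERS = [(0.8, "high", 0.6), (0.6, "medium", 0.3), (0.0, "low", 0.1)]


@dataclass
class SourceMetrics:
    """Illustrative quality measurements for one data source."""
    matched: int           # records whose core fields match the reference data
    checked: int           # records compared against the reference data
    core_present: int      # non-missing core field values
    core_total: int        # expected core field values
    age_days: float        # time from data generation to warehousing
    max_valid_days: float  # longest validity period of the data
    history: list          # past credibility values in the statistical period


def quantify(m):
    """Turn raw measurements into the four indexes, each in [0, 1]."""
    accuracy = m.matched / m.checked                 # data matching rate
    completeness = m.core_present / m.core_total     # field completeness
    timeliness = max(0.0, 1.0 - m.age_days / m.max_valid_days)  # freshness
    mean = statistics.fmean(m.history)
    # Assumed fluctuation coefficient: population std dev divided by mean.
    fluctuation = statistics.pstdev(m.history) / mean if mean > 0 else 1.0
    stability = max(0.0, 1.0 - fluctuation)
    return {"accuracy": accuracy, "completeness": completeness,
            "timeliness": timeliness, "stability": stability}


def composite_score(indexes):
    """Weighted sum of the four indexes."""
    return sum(INDEX_WEIGHTS[name] * value for name, value in indexes.items())


def assign_layer(score):
    """Return (layer, fusion_weight) for a comprehensive credibility score."""
    for floor, layer, weight in LAYERS:
        if score >= floor:
            return layer, weight
    return LAYERS[-1][1], LAYERS[-1][2]  # scores below every floor fall to "low"


third_party = SourceMetrics(matched=95, checked=100, core_present=18,
                            core_total=20, age_days=3, max_valid_days=30,
                            history=[0.9, 0.9, 0.9])
score = composite_score(quantify(third_party))
layer, fusion_weight = assign_layer(score)
print(round(score, 2), layer, fusion_weight)
```

Re-running `quantify` and `assign_layer` whenever the monitoring process collects new quality data is one simple way to realize the real-time weight update of claim 9; the trigger condition itself is left out of this sketch.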
Description
Credit rating evaluation method and system for multi-source data fusion
Technical Field
The invention relates to the technical field of data management, and in particular to a credit rating evaluation method and system for multi-source data fusion.
Background
A credit rating (CREDIT RANK) conveys to the users of the rating result, through a defined set of symbols and on the basis of rigorous analysis, easily understood credit quality information that reflects how creditworthy the rated object is. A credit rating is thus a means of expressing and transmitting evaluation information: if the rating symbols are complex and hard to distinguish, or their meaning is obscure, the evaluation information is difficult for the broad body of investors to understand and accept. In financial activities, credit ratings are widely used.
The patent document with publication number CN120182006A discloses a big-data-based financial compliance risk assessment method, which comprises: collecting and standardizing financial transaction data, customer information data, regulatory requirement data, and market data to obtain a structured data set; constructing a multi-level financial compliance risk assessment index system based on the structured data set; training a stacked fusion network comprising a decision tree, a neural network, and a support vector machine using the structured data set and the index system to obtain a financial compliance risk assessment model; identifying potential compliance risk points according to the compliance risk assessment report and quantitatively assessing the distribution weights of the compliance risk points to obtain a quantitative risk point assessment result; setting hierarchical early-warning thresholds based on that result; and, when a quantitative risk index exceeds a preset threshold, generating an early-warning signal comprising a risk point description, a risk level, and a handling suggestion, and sending it to the corresponding risk management department.
In the multi-source data fusion credit rating techniques of existing financial institutions, including the prior art above, the credibility of the data sources differs markedly. For example, internal financial-institution data is more accurate than third-party data, so weights should be adjusted dynamically according to credibility during fusion. This raises several technical problems. First, credibility is hard to quantify: the credibility of a data source must take accuracy, completeness, timeliness, and stability into account, but these indexes are difficult to quantify. Second, the credibility of a data source changes over time (for example, a data leak at a third-party credit bureau reduces its accuracy), so the weights must be monitored and adjusted in real time. Third, some low-credibility data (for example, public opinion on social media) may contain key risk signals (for example, negative news about an enterprise); discarding it outright loses information, but if it is simply assigned a weight, its key signals may be masked by noise from the high-credibility data. Optimizing existing credit rating evaluation methods is therefore a problem worth addressing.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the object of the present invention is to provide a credit rating evaluation method for multi-source data fusion, together with a credit rating evaluation system for multi-source data fusion, so as to solve the problems set forth in the Background above.
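The low-credibility dilemma described in the Background can be made concrete with a small worked example. The sketch below is an illustration under assumptions, not the patented scoring rule: the fusion weights, supplementary weights, grade floors, and feature values are all hypothetical, and an adverse risk signal is modelled as a negative rule score. It compares folding the signal into the low-credibility layer as one more feature against adding it as a separate term multiplied by a supplementary weight, which is how the claims describe superposing the key risk features onto the weighted layer features.

```python
# Hypothetical per-layer fusion weights and grade floors ("preset" in the patent).
FUSION_WEIGHTS = {"high": 0.6, "medium": 0.3, "low": 0.1}
GRADE_FLOORS = [(0.85, "AAA"), (0.75, "AA"), (0.65, "A"), (0.55, "BBB")]


def base_score(features):
    """Weighted sum of the mean standardized feature value of each layer."""
    return sum(FUSION_WEIGHTS[layer] * sum(vals) / len(vals)
               for layer, vals in features.items())


def raw_score(features, risk_score=0.0, high_priority=False):
    """Superpose the layer score and the risk term into a raw (original) score.

    risk_score is assumed to be <= 0 for adverse signals; the supplementary
    weight is raised (assumed 0.1 -> 0.3) when the signal is high priority.
    """
    supplementary_weight = 0.3 if high_priority else 0.1
    return base_score(features) + supplementary_weight * risk_score


def to_grade(score):
    """Map a raw score to a grade using the hypothetical floors."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "BB"


# Standardized features in [0, 1], grouped by credibility layer.
features = {"high": [0.9, 0.8], "medium": [0.8], "low": [0.7]}

# Option A: treat the adverse signal as one more low-layer feature (value 0.0).
diluted = dict(features, low=features["low"] + [0.0])
print(to_grade(raw_score(features)), to_grade(raw_score(diluted)))

# Option B: keep the signal as a separate, supplementary-weighted term.
print(to_grade(raw_score(features, risk_score=-0.5, high_priority=True)))
```

With these assumed numbers, folding the signal into the low layer moves the score only from about 0.82 to about 0.785 and leaves the grade unchanged, whereas the separate high-priority term lowers it to about 0.67 and changes the grade. The specific values carry no significance beyond showing why a separate supplementary weight keeps the signal visible.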
To solve the above technical problems, the invention provides the following technical scheme. In a first aspect, a credit rating evaluation method for multi-source data fusion comprises the following steps: accessing three types of data, namely internal financial-institution data, third-party data, and low-credibility supplementary data, unifying their timestamps, field names, and data types, and completing cleaning through missing-value filling, outlier removal, and duplicate-data deduplication, thereby laying a foundation for credibility evaluation; converting the four indexes of accuracy, completeness, timeliness, and stability into quantized values by calculating the data matching rate, field completeness, data freshness, and data-quality fluctuation coefficient, respectively; obtaining the comprehensive credibility score of each data source by weighted summation according to preset weights, establishing a mapping library, and starting a real-time monitoring process; dividing the data sources into high, medium, and low layers according to the comprehensive credibility score, and assigning corresponding fusion weights; splicing the data features of each layer with the key risk signals, weighting and calculati