CN-122022848-A - Enterprise credit supervision information verification method and system based on multi-source data
Abstract
The invention discloses an enterprise credit supervision information verification method and system based on multi-source data. The method comprises: performing preliminary data screening to calculate a first risk coefficient for each piece of multi-source credit supervision information; calculating a second risk coefficient from the metadata of each piece of multi-source credit supervision information; performing explicit verification; calculating the dynamic credibility of each piece of multi-source credit supervision information and obtaining a multi-source data fusion decision value through adaptive weighted fusion; constructing an enterprise credit association graph and, through implicit association risk propagation, calculating an association consistency index for the enterprise corresponding to each graph node in order to perform implicit verification; determining the contradiction time period of the multi-source credit supervision information; calculating a contradiction attention score for each piece of multi-source credit supervision information; determining the responsibility source within the contradiction time period; and performing information correction. The method improves the efficiency and accuracy of enterprise credit supervision information verification, offers better interpretability, and can be applied directly in an enterprise credit supervision information verification system.
Inventors
- ZHOU LI
- CHEN YUE
- MIAO XIAOFENG
- ZHENG YONGYUE
- ZHAO YAN
- WU GUOLI
- LI XIANGHUA
- JIANG ZHOU
- YUAN RUIFENG
- LI YIFENG
- YE RUYI
- MENG CUIZHU
Assignees
- China National Institute of Standardization (中国标准化研究院)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-05
Claims (8)
- 1. An enterprise credit supervision information verification method based on multi-source data, characterized by comprising the following steps: S1, acquiring multi-source credit supervision information and metadata of an enterprise, performing preliminary data screening to calculate a first risk coefficient for each piece of multi-source credit supervision information, calculating a second risk coefficient from the metadata of each piece of multi-source credit supervision information, and performing explicit verification by time period according to the first risk coefficient and the second risk coefficient; S2, calculating the dynamic credibility of each piece of multi-source credit supervision information, and performing adaptive weighted fusion of the multi-source credit supervision information according to the dynamic credibility to obtain a multi-source data fusion decision value; S3, constructing an enterprise credit association graph from the multi-source data fusion decision value and the basic enterprise attributes of each enterprise, continuously acquiring multi-source credit supervision information to perform implicit association risk propagation, and calculating an association consistency index for the enterprise corresponding to each graph node to perform implicit verification by time period; S4, determining the contradiction time period of the multi-source credit supervision information from the explicit verification result and the implicit verification result, calculating a contradiction attention score for each piece of multi-source credit supervision information from the first risk coefficient, the second risk coefficient and the dynamic credibility, determining the responsibility source within the contradiction time period, and performing information correction; wherein the multi-source credit supervision information comprises industrial and commercial supervision information, tax supervision information, judicial supervision information and public opinion supervision information; the metadata comprises the market environment and the data operation information at the time the credit supervision information was generated; the first risk coefficient is inversely related to the data quality of the multi-source credit supervision information; and the second risk coefficient is inversely related to the environment in which the multi-source credit supervision information was generated.
- 2. The enterprise credit supervision information verification method based on multi-source data according to claim 1, wherein calculating the first risk coefficient of each piece of multi-source credit supervision information comprises: counting the types and quantities of blank data in each piece of multi-source credit supervision information, wherein the blank data types comprise key blank data and non-key blank data; screening each piece of multi-source credit supervision information against a standard data value range, business rules and a standard data format to determine potentially abnormal data, wherein the potentially abnormal data comprise statistically abnormal data, logically abnormal data and format-abnormal data; and calculating the first risk coefficient of each piece of multi-source credit supervision information from the statistical indicators of the blank data and the potentially abnormal data, using two expressions [expressions not preserved in the source text] whose terms are, in order: the first risk coefficient of the credit supervision information of a given data source; a scale parameter; a translation parameter; the first risk score; three first risk weights; the weight of a given class of blank data of the data source; the blank count of that class of blank data; the total number of data items of that class; the total number of potentially abnormal data items of the data source; the abnormality score of each potentially abnormal data item; the abnormality score mapping function; the difference between the timestamp of the time period to which the data source's data belong and the current verification time; and the update delay risk threshold.
- 3. The enterprise credit supervision information verification method based on multi-source data according to claim 1, wherein calculating the second risk coefficient comprises: extracting the market environment and data operation information of each piece of multi-source credit supervision information, wherein the market environment comprises the industry credit deviation, the macroeconomic volatility and the relative enterprise size risk; calculating the environmental risk of the enterprise credit supervision information from the market environment, using an expression [expression not preserved in the source text] whose terms are, in order: the environmental risk of the enterprise credit supervision information; five market environment weights; the enterprise credit score for the evaluated time period; the long-term historical mean and standard deviation for the whole industry; the macroeconomic index value for the evaluated time period; the long-term trend value of the macroeconomic index; the standard deviation of the macroeconomic index; the scale position of the enterprise within its industry; the industry public opinion heat index; and the regional judicial activity index; calculating the operation risk of each piece of multi-source credit supervision information from the data operation information, using an expression [expression not preserved in the source text] whose terms are, in order: the operation risk of the credit supervision information of a given data source; five data operation weights; the time taken by a single data acquisition from the data source; the historical time taken by tasks of the same class; the number of system interfaces and manual approval steps involved in acquiring data from the data source; the abnormal access density of the data source; the number of version changes of the data source; and the trust level of the operators; and calculating the second risk coefficient of each piece of multi-source credit supervision information from the environmental risk of the enterprise credit supervision information and the operation risk of each piece of multi-source credit supervision information, using two expressions [expressions not preserved in the source text] whose terms are, in order: the second risk coefficient of the credit supervision information of a given data source; the second risk score; a scale parameter; a curvature parameter; a further curvature parameter; and a very small constant.
- 4. The enterprise credit supervision information verification method based on multi-source data according to claim 1, wherein the explicit verification by time period comprises: calculating the mean of the first risk coefficients and the mean of the second risk coefficients of all multi-source credit supervision information in the same time period; taking the product of the two means as an explicit risk score; and, when the explicit risk score is greater than an explicit risk threshold, determining that the multi-source credit supervision information of the corresponding time period fails the explicit verification.
- 5. The enterprise credit supervision information verification method based on multi-source data according to claim 1, wherein obtaining the multi-source data fusion decision value comprises: calculating the dynamic credibility of each piece of multi-source credit supervision information from its generation time, base credibility rating and historical consistency, and performing adaptive weighted fusion of the multi-source credit supervision information according to the dynamic credibility to obtain the multi-source data fusion decision value, using three expressions [expressions not preserved in the source text] whose terms are, in order: the multi-source data fusion decision value of a given enterprise in a given time period; the total number of data sources; the dynamic credibility of a given data source; the business weight of the data source; the enterprise credit supervision information from that data source for the enterprise in the time period; the dynamic weight; the base credibility rating of the data source; the timeliness sensitivity coefficient of the data source; the verification time period; the generation time of the data source's credit supervision information; the historical consistency correction factor, calculated as an exponentially weighted moving average; the correction weight; the weighted median of the enterprise's multi-source credit supervision information in the time period; and a very small constant.
- 6. The enterprise credit supervision information verification method based on multi-source data according to claim 1, wherein the implicit verification by time period comprises: setting graph nodes according to enterprises, setting association edges between graph nodes according to enterprise relationships, and using the basic enterprise attributes and the multi-source data fusion decision value of each enterprise as the node representation of the corresponding graph node, thereby constructing the enterprise credit association graph; calculating the attention coefficients between the graph node of a target enterprise and the neighbor graph nodes of its associated enterprises, normalizing them, continuously acquiring the multi-source credit supervision information of each enterprise, and performing implicit association risk propagation with a graph attention network, using two expressions [expressions not preserved in the source text] whose terms are, in order: the attention coefficient between two graph nodes; the prior weight of the enterprise relationship between the two graph nodes; the attention parameter vector; a learnable parameter matrix; the representation vectors of the two nodes; the projection matrix of the relationship type; the relationship type embedding vector; the representation vector of a graph node at a given layer after update by the graph attention layer; the set of neighbor nodes of the graph node; the normalized attention coefficient between the two graph nodes; the node representation at a given layer; the learnable parameter matrix of that layer; and a nonlinear activation function; calculating the association consistency index of the enterprise corresponding to each graph node from the updated enterprise credit association graph, using an expression [expression not preserved in the source text] whose terms are, in order: the association consistency index of a graph node; the set of second-order neighbors of the graph node; the value deviation between the graph node and a neighbor; the multi-source data fusion decision values of the two graph nodes; and the associated transaction closeness of the enterprises corresponding to the two graph nodes; and, when the association consistency index of an enterprise is greater than an implicit risk threshold, determining that the multi-source credit supervision information of that enterprise in the corresponding time period fails the implicit verification.
- 7. The enterprise credit supervision information verification method based on multi-source data according to claim 1, wherein determining the responsibility source within the contradiction time period comprises: defining any time period that fails the explicit verification or the implicit verification as a contradiction time period of the multi-source credit supervision information, and calculating the contradiction attention score of each piece of multi-source credit supervision information within the contradiction time period from the first risk coefficient, the second risk coefficient and the dynamic credibility, using an expression [expression not preserved in the source text] whose terms are, in order: the contradiction attention score of a given data source within the contradiction time period; the first risk coefficient and the second risk coefficient of the data source within the contradiction time period; the dynamic credibility of the corresponding credit supervision information; the correlation score between the corresponding credit supervision information and the query vector of the current contradiction state; and the set of data sources; and, when the contradiction attention score is greater than a contradiction attention threshold, determining that the corresponding data source is the responsibility source of the enterprise's credit risk in the target contradiction time period, and introducing manual verification to perform information correction.
- 8. An enterprise credit supervision information verification system based on multi-source data, for performing the method of any one of claims 1-7, comprising: an explicit verification module, used for performing preliminary data screening to calculate a first risk coefficient for each piece of multi-source credit supervision information, calculating a second risk coefficient from the metadata of each piece of multi-source credit supervision information, and performing explicit verification by time period according to the first risk coefficient and the second risk coefficient; a fusion module, used for calculating the dynamic credibility of each piece of multi-source credit supervision information and performing adaptive weighted fusion of the multi-source credit supervision information according to the dynamic credibility to obtain a multi-source data fusion decision value; an implicit verification module, used for constructing an enterprise credit association graph, continuously acquiring multi-source credit supervision information to perform implicit association risk propagation, and calculating an association consistency index for the enterprise corresponding to each graph node to perform implicit verification by time period; and a responsibility source tracing module, used for determining the contradiction time period of the multi-source credit supervision information, calculating the contradiction attention score of each piece of multi-source credit supervision information from the first risk coefficient, the second risk coefficient and the dynamic credibility, determining the responsibility source within the contradiction time period, and performing information correction.
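The threshold rules stated in claims 4 and 7 can be illustrated with a minimal Python sketch. It assumes the risk coefficients, the per-period check results and the contradiction attention scores have already been computed by other means (their expressions are not reproduced here). The function names, dictionary layout, period labels and threshold values are hypothetical and do not come from the patent.

```python
from statistics import mean


def explicit_check(first_coeffs, second_coeffs, explicit_threshold):
    """Explicit verification of one time period, following claim 4."""
    # Explicit risk score: product of the two per-period coefficient means.
    score = mean(first_coeffs) * mean(second_coeffs)
    # The period fails when the score is greater than the threshold.
    return score, score <= explicit_threshold


def contradiction_periods(explicit_passed, implicit_passed):
    """Periods that failed the explicit or the implicit check (claim 7).

    Both arguments map a period label to True (passed) or False (failed);
    this sketch assumes they share the same keys.
    """
    return [
        period
        for period in sorted(explicit_passed)
        if not (explicit_passed[period] and implicit_passed[period])
    ]


def responsibility_sources(attention_scores, attention_threshold):
    """Data sources whose contradiction attention score exceeds the threshold."""
    return [
        source
        for source, value in attention_scores.items()
        if value > attention_threshold
    ]


if __name__ == "__main__":
    # Hypothetical coefficients for two data sources in one period.
    score, passed = explicit_check([0.25, 0.75], [0.5, 1.0], explicit_threshold=0.3)
    print(score, passed)

    periods = contradiction_periods(
        {"2025Q1": True, "2025Q2": False},
        {"2025Q1": True, "2025Q2": True},
    )
    print(periods)

    # Hypothetical, precomputed contradiction attention scores.
    print(responsibility_sources({"tax": 0.7, "judicial": 0.1}, 0.5))
```

In this sketch a period counts as a contradiction time period as soon as either check fails, which mirrors the "explicit verification or the implicit verification" wording of claim 7; sources flagged by `responsibility_sources` would then be passed to manual verification.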
Description
Enterprise credit supervision information verification method and system based on multi-source data
Technical Field
The invention relates to the technical field of enterprise credit supervision, and in particular to an enterprise credit supervision information verification method and system based on multi-source data.
Background
With the deepening of the 'management and administration' reform and the advance of digital government construction, enterprise credit supervision is shifting from traditional manual spot checks to data-driven, intelligent early warning. The rapid development of new-generation information technologies such as big data and artificial intelligence provides a technical basis for fusing multi-source heterogeneous data in enterprise credit supervision and makes cross-department, cross-level integration of credit information possible. The aggregation and cross-validation of multi-dimensional credit supervision information, such as industrial and commercial, tax, judicial and public opinion information, has become a key means of improving supervision accuracy and preventing systemic risk.
However, conventional enterprise credit supervision information verification techniques still have significant limitations. First, existing verification methods mostly rely on static data quality assessment and ignore dynamic contextual factors at the time the data were generated, such as market fluctuations and policy adjustments, so risk warnings lag. Second, the prior art focuses on assessing the credit state of a single enterprise in isolation and does not mine the implicit associations among enterprises in depth, which makes the knock-on effects of risk transmission across entities hard to identify. Finally, contradiction detection among multi-source data stops at resolving field-level conflicts on the spot; it cannot locate the time window in which a contradiction arose or trace it to a specific data source for accurate correction, so the verification conclusions are insufficiently reliable.
The invention therefore provides an enterprise credit supervision information verification method and system based on multi-source data. Through dual-coefficient risk quantification, dynamic credibility fusion, graph neural network propagation and attention-based attribution and correction, it builds a multi-level, adaptive and interpretable intelligent verification system for enterprise credit supervision information, enabling full-domain, full-period and full-chain verification and offering a practical technical solution for complex, changing business environments and increasingly hidden credit risks.
Disclosure of Invention
The invention aims to provide an enterprise credit supervision information verification method and system based on multi-source data.
To achieve the above purpose, the invention is implemented according to the following technical scheme. The invention comprises the following steps:
acquiring multi-source credit supervision information and metadata of an enterprise, performing preliminary data screening to calculate a first risk coefficient for each piece of multi-source credit supervision information, calculating a second risk coefficient from the metadata of each piece of multi-source credit supervision information, and performing explicit verification by time period according to the first risk coefficient and the second risk coefficient;
calculating the dynamic credibility of each piece of multi-source credit supervision information, and obtaining a multi-source data fusion decision value through adaptive weighted fusion of the multi-source credit supervision information according to the dynamic credibility;
constructing an enterprise credit association graph from the multi-source data fusion decision value and the basic enterprise attributes of each enterprise, continuously acquiring multi-source credit supervision information to perform implicit association risk propagation, and calculating an association consistency index for the enterprise corresponding to each graph node to perform implicit verification by time period;
determining the contradiction time period of the multi-source credit supervision information from the explicit verification result and the implicit verification result, calculating a contradiction attention score for each piece of multi-source credit supervision information from the first risk coefficient, the second risk coefficient and the dynamic credibility, determining the responsibility source within the contradiction time period, and performing information correction;
wherein the multi-source credit supervision information comprises industrial and commercial supervision information, tax supervision information, judicial supervision information and public opinion
supervision information, th