CN-121981815-A - Multi-source data entity identification and context classification method
Abstract
The application provides a multi-source data entity identification and context classification method, comprising: generating a multi-source heterogeneous feature set according to credit information data, asset liability data and consumption transaction flow data in a credit scene; extracting entity feature primitives from the multi-source heterogeneous feature set to generate an entity feature primitive association set; generating an initial context risk cascade table based on the entity feature primitive association set; performing dynamic calibration on the initial context risk cascade table based on real-time credit transaction data to generate a calibrated context risk cascade table; and, based on a credit risk early warning mechanism of an interrupt controller, outputting the calibrated context risk cascade table to a credit approval system in the form of a risk assessment comprehensive data packet through a PCIE4.0 bus. The application makes risk assessment dynamically adaptive, reflects changes in a client's risk state in a timely manner, reduces the probability of risk misjudgment caused by insufficient timeliness of the data, and improves the timeliness and rationality of the risk assessment result.
Inventors
- Request for anonymity
Assignees
- 北京华荣信宁科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260126
Claims (10)
- 1. A method for identifying and classifying a context of a multi-source data entity, comprising: step 1, generating a multi-source heterogeneous feature set according to credit data, asset liability data and consumption transaction flow data in a credit scene; step 2, extracting entity feature primitives from the multi-source heterogeneous feature set to generate an entity feature primitive association set; step 3, generating an initial context risk cascade table based on the entity feature primitive association set; step 4, based on real-time credit transaction data, performing dynamic calibration on the initial context risk cascade table to generate a calibrated context risk cascade table; and step 5, based on a credit risk early warning mechanism of an interrupt controller, outputting the calibrated context risk cascade table to a credit approval system in the form of a risk assessment integrated data packet through a PCIE4.0 bus.
- 2. The method according to claim 1, wherein step 1, generating a multi-source heterogeneous feature set according to credit data, asset liability data and consumption transaction flow data in a credit scene, specifically comprises: invoking a multi-source data heterogeneous feature deconstructor integrated on an FPGA acceleration unit to perform heterogeneous feature deconstruction on the credit data, the asset liability data and the consumption transaction flow data in the credit scene, and generating the multi-source heterogeneous feature set.
- 3. The method according to claim 2, wherein step 1, invoking the multi-source data heterogeneous feature deconstructor integrated on the FPGA acceleration unit to perform heterogeneous feature deconstruction on the credit data, asset liability data and consumption transaction flow data in the credit scene and generate the multi-source heterogeneous feature set, specifically comprises: step 11, preloading dynamic adaptation firmware supporting multiple protocols on the FPGA acceleration unit, wherein the dynamic adaptation firmware has built-in multi-protocol adaptive matching logic so as to identify the format types of the credit investigation data, the asset liability data and the consumption transaction flow data, and to separate credit investigation metadata, asset liability metadata and consumption transaction flow metadata from the credit investigation data, the asset liability data and the consumption transaction flow data; step 12, based on configured risk association feature items, invoking a preloaded enterprise credit risk association feature item list through the multi-source data heterogeneous feature deconstructor to perform heterogeneous feature deconstruction on the credit investigation metadata, the asset liability metadata and the consumption transaction flow metadata, so as to screen out metadata related to credit risk as risk association feature items; and step 13, calculating co-occurrence dependency weights of the risk association feature items to establish a feature time sequence dependency chain, and marking the feature time sequence dependency chain with a unique user identification index to generate the multi-source heterogeneous feature set.
- 4. The method according to claim 1, wherein step 2, extracting entity feature primitives from the multi-source heterogeneous feature set to generate an entity feature primitive association set, specifically comprises: extracting entity feature primitives from the multi-source heterogeneous feature set to generate a collapsed feature subset, and generating the entity feature primitive association set according to the collapsed feature subset based on a credit business scene tag library.
- 5. The method according to claim 4, wherein step 2, extracting entity feature primitives from the multi-source heterogeneous feature set to generate a collapsed feature subset, and generating an entity feature primitive association set according to the collapsed feature subset based on a credit business scene tag library, specifically comprises: step 21, a GPU parallel computing node assigns weight coefficients according to the real-time influence priority of credit risk assessment, and salted hash coding is performed on the risk feature association items in the multi-source heterogeneous feature set based on the weight coefficients so as to obtain a weight-recoded risk feature association item set; step 22, coarsely screening out repeated items in the weight-recoded risk feature association item set according to the coding prefix, and eliminating repeated items in the weight-recoded risk feature association item set according to the uniqueness of the whole code, so as to obtain a collapsed feature subset; and step 23, based on the credit business scene tag library, matching the collapsed feature subset with the scene tag library to generate entity feature primitives, and analyzing the business logic dependence and co-occurrence rules of the entity feature primitives according to the support degree evaluation coefficients of the entity feature primitives on risk evaluation conclusions to generate the entity feature primitive association set.
- 6. The method of claim 1, wherein the generating of the initial context risk cascade table based on the entity feature primitive association set comprises constructing a credit scene specific hierarchical dimension by a context risk cascade mapper built on the DDR4 cache based on the entity feature primitive association set, and generating the context risk cascade table.
- 7. The method according to claim 6, wherein step 3, constructing the credit scene-specific hierarchical dimension through the context risk cascade mapper built on the DDR4 cache based on the entity feature primitive association set and generating the context risk cascade table, specifically comprises: step 31, the context risk cascade mapper on the DDR4 cache identifies and determines core grading dimensions, comprising user credit stage, service risk type and data aging characteristics, by analyzing the risk assessment index system and scene requirements of the current credit service, and generates a dimension attribute description table; step 32, taking a historical credit risk case data set as training samples, and generating a cascade factor mapping rule base by mining the mapping relation between the core grading dimensions and risk results; step 33, matching each primitive in the entity feature primitive association set with the cascade factor mapping rule base according to the service attribute of the primitive so as to determine an associated risk cascade factor; and step 34, generating the context risk cascade table based on the associated risk cascade factors, and storing the context risk cascade table in the DDR4 cache.
- 8. The method of claim 1, wherein step 4, performing dynamic calibration on the initial context risk cascade table based on real-time credit transaction data to generate a calibrated context risk cascade table, specifically comprises: invoking a dynamic confidence calibration model deployed on an ASIC chip, and performing dynamic calibration on the initial context risk cascade table in combination with real-time credit transaction data accessed via an RDMA protocol to generate the calibrated context risk cascade table.
- 9. The method of claim 8, wherein step 4, invoking the dynamic confidence calibration model deployed on the ASIC chip and performing dynamic calibration on the initial context risk cascade table in combination with the real-time credit transaction data accessed via the RDMA protocol to generate the calibrated context risk cascade table, specifically comprises: step 41, extracting real-time feature parameters corresponding to the primitives in the initial context risk cascade table from the real-time credit transaction data accessed via the RDMA protocol to establish a real-time feature-primitive mapping relation; step 42, based on the real-time feature-primitive mapping relation, comparing the risk cascade factors in the initial context risk cascade table with the real-time feature parameters to obtain a dynamic deviation degree, and, in combination with a credit business risk sensitivity feature threshold, determining target primitives whose dynamic deviation degree exceeds the feature threshold; and step 43, evaluating the dynamic deviation degree of each target primitive, matching a hierarchical calibration rule base to update the risk cascade factors in the initial context risk cascade table, retaining the calibration track, and generating the calibrated context risk cascade table.
- 10. The method according to claim 1, wherein step 5, based on the credit risk early warning mechanism of the interrupt controller, outputting the calibrated context risk cascade table to the credit approval system in the form of a risk assessment integrated data packet through the PCIE4.0 bus, specifically comprises: the interrupt controller performs weighted aggregation on the risk cascade factors corresponding to the same user identifier in the calibrated context risk cascade table to generate a user risk assessment total score; and mapping the risk assessment total score to a specific risk grade, and packaging it with the context risk cascade table to form the risk assessment integrated data packet for transmission to the credit approval system through the PCIE4.0 bus, wherein the risk assessment integrated data packet comprises the user identifier, the risk grade, risk cascade factor details and calibration track information.
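The calibration of claim 9 (steps 41 to 43) and the aggregation and grading of claim 10 can be illustrated with a minimal software sketch. It is only an illustration under stated assumptions, not the claimed hardware implementation: the claims do not specify the deviation measure, the calibration rule, the weights or the grade boundaries, so the names `CascadeEntry`, `calibrate`, `total_score`, `grade` and `build_packet`, the absolute-difference deviation, the `threshold` and `step` parameters, and the grade bands are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CascadeEntry:
    """One row of a context risk cascade table (hypothetical layout)."""
    user_id: str
    primitive: str
    factor: float                 # risk cascade factor, assumed to lie in [0, 1]
    weight: float = 1.0           # aggregation weight (assumed)
    trace: list = field(default_factory=list)  # calibration track: (old, new) pairs


def calibrate(table, realtime, threshold=0.2, step=0.5):
    """Steps 41-43 (sketch): update factors that deviate from real-time parameters.

    `realtime` maps (user_id, primitive) to a real-time feature parameter.
    The deviation is assumed to be a plain difference; a factor is moved a
    fraction `step` of the way toward the real-time value when the absolute
    deviation exceeds `threshold`.
    """
    for entry in table:
        key = (entry.user_id, entry.primitive)
        if key not in realtime:
            continue  # no real-time parameter mapped to this primitive
        deviation = realtime[key] - entry.factor
        if abs(deviation) > threshold:
            old = entry.factor
            entry.factor = min(1.0, max(0.0, old + step * deviation))
            entry.trace.append((old, entry.factor))  # retain the calibration track
    return table


def total_score(table, user_id):
    """Claim 10 (sketch): weighted mean of one user's risk cascade factors."""
    rows = [e for e in table if e.user_id == user_id]
    weight_sum = sum(e.weight for e in rows)
    if weight_sum == 0:
        return 0.0
    return sum(e.factor * e.weight for e in rows) / weight_sum


def grade(score, bands=((0.7, "high"), (0.4, "medium"))):
    """Map a total score to a risk grade using assumed lower bounds."""
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "low"


def build_packet(table, user_id):
    """Assemble the fields the claim lists for the integrated data packet."""
    rows = [e for e in table if e.user_id == user_id]
    score = total_score(table, user_id)
    return {
        "user_id": user_id,
        "risk_grade": grade(score),
        "total_score": score,
        "factors": {e.primitive: e.factor for e in rows},
        "calibration_track": {e.primitive: e.trace for e in rows if e.trace},
    }
```

In this sketch a table would first be passed through `calibrate` together with the real-time parameters, after which `build_packet` yields a dictionary holding the user identifier, risk grade, factor details and calibration track; transport over the PCIE4.0 bus and the interrupt-driven early warning are outside its scope.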
Description
Multi-source data entity identification and context classification method
Technical Field
The application relates to the technical field of data processing, in particular to a multi-source data entity identification and context classification method.
Background
In the field of credit business risk assessment, with the development of financial technology, the credit approval process needs to integrate multi-source heterogeneous information such as credit investigation data, asset liability data and consumption transaction flow data so as to comprehensively describe the credit condition and repayment capability of customers. Financial institutions' requirements on the precision and response efficiency of credit risk management and control keep rising. Efficiently processing the effective information in multi-source data, accurately identifying the association relations among data entities, and scientifically grading risk levels based on dynamic changes in the data have therefore become key to supporting credit approval decisions and reducing credit default risk, and directly affect the compliance and operating efficiency of credit business.
The existing multi-source data processing scheme for credit risk assessment generally performs independent feature screening on credit investigation data, asset liability data and consumption transaction flow data respectively, eliminating irrelevant fields to form a feature subset for each single data source. It then simply splices the single-source feature subsets according to customer identification fields to form a unified multi-source data set, and finally generates a static risk assessment result from that data set using preset fixed risk assessment indexes and thresholds.
However, the existing scheme has the following problems. First, the processing of multi-source data stops at independent screening and simple splicing, and the inherent associations among the features of different data sources are not deeply mined, so the identification dimension of a data entity is single and a comprehensive picture of customer risk is difficult to form. Second, risk assessment relies on static thresholds, so the risk assessment result cannot be dynamically adjusted according to real-time credit transaction data, and risk misjudgment caused by insufficient timeliness of the data is likely. Third, data transmission uses a conventional interface, which in a high-concurrency credit business scene has difficulty delivering risk information to the approval system quickly, affecting the real-time performance and accuracy of credit approval.
Disclosure of Invention
In order to solve the above technical problems, the present application provides a multi-source data entity identification and context classification method, so as to at least alleviate the above technical problems.
The technical scheme provided by the embodiment of the application is as follows: the application provides a multi-source data entity identification and context classification method, which comprises the following steps: step 1, generating a multi-source heterogeneous feature set according to credit data, asset liability data and consumption transaction flow data in a credit scene; step 2, extracting entity feature primitives from the multi-source heterogeneous feature set to generate an entity feature primitive association set; step 3, generating an initial context risk cascade table based on the entity feature primitive association set; step 4, based on real-time credit transaction data, performing dynamic calibration on the initial context risk cascade table to generate a calibrated context risk cascade table; and step 5, based on a credit risk early warning mechanism of an interrupt controller, outputting the calibrated context risk cascade table to a credit approval system in the form of a risk assessment integrated data packet through a PCIE4.0 bus.
The technical scheme provided by the application has the following technical advantages:
1. Addressing the technical problem that multi-source data is only independently screened and simply spliced, so that feature associations are not mined and the entity identification dimension is single.
As described in the background, the traditional scheme simply splices credit investigation data, asset liability data and consumption transaction flow data after independent feature screening, and does not establish associations between the features of different data sources, so the data entity identification dimension is single and customer risk is difficult to describe comprehensively. The application generates the multi-source heterogeneous feature set through step 1, integrating the three types of core credit data into a unified feature set instead of performing isolation treatment on each data