CN-121980604-A - Blockchain-based data lineage tracing method
Abstract
The invention discloses a blockchain-based data lineage tracing method comprising three stages: data acquisition and structured evidence storage; lineage information extraction and on-chain registration; and lineage verification and visual display. By automatically parsing data processing logs or SQL statements, the method extracts the source, destination, and operation information of data as it is acquired, processed, transmitted, and output, and builds a data lineage model according to preset lineage modeling rules. This makes data traceable end to end across systems and processing stages, and provides trusted collection, verifiable evidence storage, and graph-based display of lineage information. The stated advantages are a high degree of automation, strong auditability, and good extensibility.
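The automatic SQL analysis mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular expressions, the function names `extract_lineage` and `to_prov`, the `ex:` prefix, and the sample table names are all assumptions made for illustration, and the output only follows the general shape of the PROV-JSON serialization. A real parser would need a proper SQL grammar rather than regular expressions.

```python
import json
import re

# Hypothetical patterns: they only cover plain
# "INSERT INTO t SELECT ... FROM a JOIN b" statements.
TARGET_RE = re.compile(r"insert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql: str) -> dict:
    """Return the target table and the sorted source tables of one statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        raise ValueError("no INSERT INTO clause found")
    sources = sorted(set(SOURCE_RE.findall(sql)) - {target.group(1)})
    return {"operation": "INSERT_SELECT",
            "target": target.group(1),
            "sources": sources}


def to_prov(lineage: dict, task_id: str, agent_id: str) -> dict:
    """Map the extracted record to a PROV-JSON-style dictionary (assumed layout)."""
    activity = f"ex:{task_id}"
    agent = f"ex:{agent_id}"
    target = f"ex:{lineage['target']}"
    doc = {
        "prefix": {"ex": "http://example.org/"},
        "entity": {target: {}},
        "activity": {activity: {"ex:operation": lineage["operation"]}},
        "agent": {agent: {}},
        "used": {},
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": target, "prov:activity": activity}},
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": activity, "prov:agent": agent}},
    }
    # Each source table becomes an entity that the activity used.
    for i, name in enumerate(lineage["sources"], start=1):
        source = f"ex:{name}"
        doc["entity"][source] = {}
        doc["used"][f"_:u{i}"] = {"prov:activity": activity,
                                  "prov:entity": source}
    return doc


SQL = """
INSERT INTO dw.orders_daily
SELECT o.order_date, c.region, SUM(o.amount)
FROM ods.orders o JOIN ods.customers c ON o.customer_id = c.id
GROUP BY o.order_date, c.region
"""

lineage = extract_lineage(SQL)
prov_doc = to_prov(lineage, task_id="task-001", agent_id="etl-service")
print(json.dumps(lineage, indent=2))
```

The intermediate `lineage` dictionary plays the role of the structured JSON result, and `prov_doc` plays the role of the PROV-conformant model file; in this sketch both are plain dictionaries that could be serialized with `json.dumps` before hashing.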
Inventors
- TANG HANLIN
- XIAO BIN
- YANG DEJI
- XU XUBIN
- PENG CHANGGEN
- WANG DALIANG
- DING HONGFA
- NIU JIN
Assignees
- 贵州数据宝网络科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251229
Claims (5)
- 1. A blockchain-based data lineage tracing method, characterized by comprising the following steps: Step 1, data acquisition and structured evidence storage: automatically parsing data processing relationships from a data processing log or an SQL script, generating a structured lineage description file and a lineage model file conforming to the W3C PROV standard, calculating a hash value of the original file, generating a digital signature, recording them on the chain as evidence, and storing the complete metadata in an off-chain database; Step 2, lineage information extraction and on-chain registration: uploading the lineage file to a knowledge graph database, establishing lineage associations among entities, activities, and agents, generating a structured lineage record file, and completing on-chain evidence storage through smart contracts, so that the full life cycle of the data is verifiably traceable; Step 3, lineage verification and visual display: providing lineage queries by file hash, task identifier, or time interval through a unified verification interface; verifying the file content and the responsible party by recalculating the off-chain file hash and comparing it with the on-chain evidence, in combination with signature verification and a timestamp mechanism; and displaying the full lineage of the data, from source through processing to result, in a visual interface.
- 2. The blockchain-based data lineage tracing method of claim 1, wherein Step 1 comprises the steps of: S1, deploying an off-chain module on a local server or on the data warehouse side, responsible for log parsing, SQL statement parsing, and lineage extraction, and an on-chain module that stores data lineage hashes and evidence on a consortium chain; S2, initializing the database and the on-chain nodes, assigning a unique identity (ID) to each participant, and generating a corresponding key pair; S3, initializing the lineage model: presetting a lineage representation model conforming to the W3C PROV standard, defining three node types (entity, activity, and agent) and the relations among them, and establishing mapping rules that automatically convert field dependencies in SQL into the PROV model structure; S4, agreeing on data structures: defining the data structures to be used, including the sql/log files, json files, and PROV files that need to be registered on the chain during execution.
- 3. The blockchain-based data lineage tracing method of claim 1, wherein Step 2 comprises the steps of: S5, automatically collecting the execution logs and SQL files produced during data processing, identifying the type of data operation through a log monitoring module and an SQL parsing module, extracting the source tables, target tables, and field mappings involved, and storing the parsing result in structured form as a JSON file that serves as the intermediate result of lineage extraction; S6, converting the parsed JSON file, through a lineage modeling module, into a data provenance representation file conforming to the W3C PROV standard, the generated file recording the complete path of the data from generation, transmission, and processing to result generation, and calculating a file hash value ProvHash for subsequent on-chain verification; S7, on-chain evidence and hash registration: calling the on-chain smart contract interface to write the hash digests and metadata of the generated JSON file and data provenance representation file to the blockchain, wherein the on-chain data structure comprises the file hash, the corresponding off-chain identifier, the uploader identity, a timestamp, and a file description or version information; S8, after on-chain registration is complete, generating an on-chain/off-chain association index record so that lineage data can be retrieved across the two storage layers: the original off-chain lineage file can be located quickly from the on-chain transaction hash or the file hash, and the corresponding on-chain record can be queried from the uid in the off-chain database, forming a bidirectional verification mechanism.
- 4. The blockchain-based data lineage tracing method of claim 1, wherein Step 3 comprises the steps of: S9, a user initiates a verification request through the unified lineage query interface; S10, after lineage verification is complete, a knowledge graph engine is invoked to map the entities, activities, and agents in the lineage file onto a graph structure, and the path along which the data lineage was formed is displayed in a front-end visual interface, the path comprising the source entity of the data, the activity that generated it, the input and output relations of each data processing step, and the responsible agent and operation time of each node.
- 5. The blockchain-based data lineage tracing method of claim 4, wherein in step S9 the user can initiate the verification request in three ways: query by on-chain hash, in which the user enters the on-chain file hash, and the system locates the corresponding off-chain file uid from the blockchain record and reads the original file metadata from the off-chain database; query by task ID or file path, in which the corresponding record is looked up in the off-chain database by task ID or file path and automatically compared with the hash and signature registered on the chain; and query by time or uploader identity, in which historical version records of the lineage file are retrieved by uploader and timestamp, with support for time-interval filtering and version backtracking.
Description
Blockchain-based data lineage tracing method
Technical Field
The invention relates to a blockchain-based data lineage tracing method and belongs to the technical field of data management and information security.
Background
Data lineage (data provenance) technology is mainly used to record the whole process of data from generation, transmission, and processing to result formation, and to explain the source of the data, the processing applied to it, and the party responsible for it. Early research proposed theoretical models such as why-, where-, and how-provenance, laying the foundation for lineage computation. To support data tracing across systems and organizations, the W3C published the PROV family of specifications, which describe in a uniform way the relations among data entities (Entity), activities (Activity), and responsible parties (Agent), improving the interoperability of lineage data.
With the wide adoption of big data, cloud computing, and the Internet of Things, data flows have become more complex, and traditional lineage recording methods are limited in automatic extraction, cross-platform integration, and trusted verification. On the one hand, extracting lineage from logs, SQL, or ETL tools requires complex parsing and struggles to cover dynamic or unstructured data processing. On the other hand, aggregating and indexing large-scale lineage data incurs high storage and computation costs and degrades query performance. In addition, the trustworthiness of lineage data remains a prominent problem, since traditional centralized storage is easily tampered with or lost. In recent years, blockchain technology has been introduced into the data lineage field because it is decentralized, tamper-resistant, and traceable.
Some existing work achieves verifiable attestation of data operation records by placing lineage information, or its hash value, on the chain. However, existing schemes suffer from high on-chain cost, privacy leakage risk, and complex coordination between on-chain and off-chain components, and they struggle to balance trustworthiness and scalability.
Disclosure of Invention
The invention aims to provide a blockchain-based data lineage tracing method that achieves cross-system, verifiable, and low-cost trusted data tracing, thereby solving the technical problems described above.
To solve these technical problems, the technical solution of the invention is a blockchain-based data lineage tracing method comprising the following steps:
Step 1, data acquisition and structured evidence storage: automatically parsing data processing relationships from a data processing log or an SQL script, generating a structured lineage description file and a lineage model file conforming to the W3C PROV standard, calculating a hash value of the original file, generating a digital signature, recording them on the chain as evidence, and storing the complete metadata in an off-chain database;
Step 2, lineage information extraction and on-chain registration: uploading the lineage file to a knowledge graph database, establishing lineage associations among entities, activities, and agents, generating a structured lineage record file, and completing on-chain evidence storage through smart contracts, so that the full life cycle of the data is verifiably traceable;
Step 3, lineage verification and visual display: providing lineage queries by file hash, task identifier, or time interval through a unified verification interface; verifying the file content and the responsible party by recalculating the off-chain file hash and comparing it with the on-chain evidence, in combination with signature verification and a timestamp mechanism; and displaying the full lineage of the data, from source through processing to result, in a visual interface.
As a preferred embodiment, Step 1 includes the following steps:
S1, deploying an off-chain module on a local server or on the data warehouse side, responsible for log parsing, SQL statement parsing, and lineage extraction, and an on-chain module that stores data lineage hashes and evidence on a consortium chain;
S2, initializing the database and the on-chain nodes, assigning a unique identity (ID) to each participant, and generating a corresponding key pair;
S3, initializing the lineage model: presetting a lineage representation model conforming to the W3C PROV standard, defining three node types (entity, activity, and agent) and the relations among them, and establishing mapping rules that automatically convert field dependencies in SQL into a P