CN-121979959-A - Data quality monitoring system and method under data security and privacy protection
Abstract
The invention relates to a data quality monitoring system and method under data security and privacy protection. The system comprises a basic information management module, a metadata integration Agent module, an intelligent classification and grading Agent module, a desensitization rule generation Agent module, a desensitization rule execution Agent module, a monitoring rule generation Agent module, a monitoring rule execution Agent module, a monitoring and anomaly early warning Agent module, a system optimization Agent module and a visualization module. The invention combines data classification and grading, data desensitization and large model technology for data quality monitoring. Based on the business importance and sensitivity of the data, it enables differentiated data protection, intelligent generation of quality monitoring rules and differentiated monitoring strategies. It also provides secure, interpretable anomaly analysis and reporting. As a result, monitoring becomes more targeted, more secure and more adaptive.
Inventors
- WANG PENG
- LIN YING
- LIANG SHUYANG
Assignees
- Shaanxi Hanguang Digital Technology Co., Ltd. (陕西瀚光数字科技有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20260203
Claims (10)
- 1. A data quality monitoring system under data security and privacy protection, characterized by comprising a basic information management module, a metadata integration Agent module, an intelligent classification and grading Agent module, a desensitization rule generation Agent module, a desensitization rule execution Agent module, a monitoring rule generation Agent module, a monitoring rule execution Agent module, a monitoring and anomaly early warning Agent module, a system optimization Agent module and a visualization module, wherein: the basic information management module serves as the entrance of the system and is responsible for uniformly managing the connection parameters and access rights of heterogeneous data sources; the intelligent classification and grading Agent module receives the metadata acquired by the metadata integration Agent module and, through a classification and grading model and a classification and grading knowledge base, labels the metadata of the data to be monitored with its business importance and sensitivity, providing the basis for formulating the subsequent differentiated strategies; the desensitization rule generation Agent module receives the business importance and sensitivity classification results output by the intelligent classification and grading Agent module, preprocesses them to obtain a sensitivity identification result, and automatically generates, through a desensitization rule generation model and built-in desensitization rule templates, a data desensitization rule script matching each sensitivity level, so that data of each sensitivity level is properly protected; the desensitization rule execution Agent module works from two inputs, the connection information of the data to be monitored configured in the basic information management module (database address, authentication credentials and connection parameters) and the set of data desensitization rule scripts generated by the desensitization rule generation Agent module, and, under the unified scheduling of a desensitization rule execution model, calls a multi-source real-time data desensitization tool to automatically establish real-time connections to the different data sources and desensitize the data to be monitored in real time; the monitoring rule generation Agent module preprocesses the business importance and sensitivity classification results output by the intelligent classification and grading Agent module to obtain a business importance labeling result, and automatically generates a monitoring rule set matching that result by using a monitoring rule generation model and a monitoring rule knowledge base; the monitoring rule execution Agent module receives the connection information of the data to be monitored (database address, authentication credentials and connection parameters) configured in the basic information management module, together with the data quality monitoring rule set generated by the monitoring rule generation Agent module; and the monitoring and anomaly early warning Agent module periodically acquires the latest monitoring execution results generated by the monitoring rule execution Agent module and uses a monitoring analysis model to perform multidimensional analysis of the structured quality inspection results for the monitoring report.
- 2. The data quality monitoring system under data security and privacy protection according to claim 1, wherein the basic information management module supports multiple mainstream database types, including MySQL, Oracle and MongoDB, through a standardized configuration template. It defines key parameters such as the host address, port, authentication mode, database/table mapping and the contact details of the responsible person, and thereby lays an accurate and secure data-channel foundation for subsequent metadata acquisition, data desensitization, monitoring analysis and task scheduling.
- 3. The data quality monitoring system under data security and privacy protection according to claim 2, wherein the metadata integration Agent module connects to various heterogeneous data sources through JDBC and ODBC connectors. It automatically discovers and collects basic metadata, including table structures, field types, field descriptions, nullability, primary keys and uniqueness constraints, thereby achieving comprehensive awareness and unified management of the metadata of the data to be monitored.
- 4. The data quality monitoring system under data security and privacy protection according to claim 3, wherein the intelligent classification and grading Agent module uses the following criteria for business importance and sensitivity: 1) Business importance criteria: 1.1 Core data: data that directly affects the enterprise's core business and strategic decisions; 1.2 Important data: data that plays an important role in business operations and risk management; 1.3 General data: data of reference value for daily operations and auxiliary business processes; 2) Sensitivity criteria: 2.1 Public level: data that may be shared or published externally; 2.2 Internal level: data used only internally that requires basic access control; 2.3 Confidential level: data involving personal privacy, trade secrets or contract information; 2.4 Top-secret level: highly sensitive data bearing on the enterprise's core interests or compliance requirements.
- 5. The data quality monitoring system under data security and privacy protection according to claim 4, wherein the desensitization rule execution Agent module establishes its real-time connections using a different real-time monitoring mechanism for each data source type (MySQL, MongoDB, Oracle). This achieves continuous, automated dynamic data desensitization.
- 6. The data quality monitoring system under data security and privacy protection according to claim 5, wherein the monitoring rule set generated by the monitoring rule generation Agent module comprises a monitoring frequency, a monitoring rule script and a threshold. The monitoring rules cover the dimensions of integrity, accuracy, consistency, timeliness and availability. The monitoring frequency and threshold are set according to business importance and business characteristics.
- 7. The data quality monitoring system under data security and privacy protection according to claim 6, wherein the monitoring rule execution Agent module, under the scheduling of a monitoring rule execution model, calls a monitoring rule execution tool to convert static rules into executable scheduled computation tasks. It generates quality inspection results comprising a pass/fail status and detailed metrics, thereby monitoring the data to be monitored in real time. Each execution result of the scheduled task for each monitoring indicator is stored locally, providing a high-quality, low-latency decision basis for the monitoring and anomaly early warning Agent module.
- 8. The data quality monitoring system under data security and privacy protection according to claim 7, wherein the monitoring and anomaly early warning Agent module acts once analysis by a rule engine shows that the actual measured value of a data quality monitoring indicator exceeds its preset threshold. It then immediately triggers the early warning mechanism and sends the current monitoring report to the responsible person for the task through a message sending tool. The tool supports multiple notification channels, including email and DingTalk. The trigger condition is $x_{t,i} > \theta_{t,i}$, where $x_{t,i}$ denotes the inspection result of the $i$-th monitoring indicator at time $t$ and $\theta_{t,i}$ denotes the threshold of the $i$-th monitoring indicator at time $t$.
- 9. The data quality monitoring system under data security and privacy protection according to any one of claims 1 to 8, further comprising a system optimization Agent module and a result visualization module. The system optimization Agent module establishes a closed-loop optimization mechanism for the system. Through a system optimization model, the classification and grading rules generated in each classification task are stored in the classification and grading knowledge base, and the key information of the monitoring rules generated in each monitoring task is stored in the monitoring rule knowledge base. Classification and monitoring effectiveness is thus continuously improved as knowledge of different business scenarios accumulates. The result visualization module presents the monitoring results generated by the monitoring rule execution Agent module as charts on a visual cockpit, giving data managers a global, visual view of the data security and quality situation.
- 10. A data quality monitoring method based on the data quality monitoring system under data security and privacy protection according to claim 1, characterized by comprising the following steps:
  Step 1: based on a preset standardized template for data source connection parameters in the basic information management module, configure the connection information of the data to be monitored. This information includes the host address, port, authentication mode, database name, table name and the contact details of the task's responsible person. It lays an accurate and secure data-channel foundation for subsequent metadata acquisition, data desensitization, monitoring analysis and task scheduling.
  Step 2: the metadata integration Agent module loads and parses the connection information of the data to be monitored configured in the basic information management module. Under the unified scheduling of a metadata integration model, it acquires and integrates the metadata of the data to be monitored using the module's built-in acquisition tool.
  Step 3: the intelligent classification and grading Agent module receives the metadata acquired by the metadata integration Agent module. Through a classification and grading model and a classification and grading knowledge base, it labels the metadata of the data to be monitored with its business importance and sensitivity, providing the basis for formulating the subsequent differentiated strategies. The business importance criteria are:
    - Core data: directly affects the enterprise's core business and strategic decisions.
    - Important data: plays an important role in business operations and risk management.
    - General data: is of reference value for daily operations and auxiliary business processes.
  The sensitivity criteria are:
    - Public level: data that may be shared or published externally.
    - Internal level: data used only internally that requires basic access control.
    - Confidential level: data involving personal privacy, trade secrets or contract information.
    - Top-secret level: highly sensitive data bearing on the enterprise's core interests or compliance requirements.
  Step 4: the desensitization rule generation Agent module receives the business importance and sensitivity classification results output by the intelligent classification and grading Agent module and preprocesses them to obtain a sensitivity identification result. Through a desensitization rule generation model and built-in desensitization rule templates, it then automatically generates a data desensitization rule script matching each sensitivity level, so that data of each sensitivity level is properly protected.
  Step 5: the desensitization rule execution Agent module takes two inputs. The first is the connection information of the data to be monitored configured in the basic information management module (database address, authentication credentials and connection parameters). The second is the set of data desensitization rule scripts generated by the desensitization rule generation Agent module. Under the unified scheduling of a desensitization rule execution model, the module calls a multi-source real-time data desensitization tool to automatically establish real-time connections to the different data sources and desensitize the data to be monitored in real time. The real-time connections use a different real-time monitoring mechanism for each data source type (MySQL, MongoDB, Oracle), achieving continuous, automated dynamic data desensitization.
  Step 6: the monitoring rule generation Agent module preprocesses the business importance and sensitivity classification results output by the intelligent classification and grading Agent module to obtain a business importance labeling result. Using a monitoring rule generation model and a monitoring rule knowledge base, it automatically generates a matching monitoring rule set. The rule set comprises a monitoring frequency, a monitoring rule script and a threshold. The monitoring rules cover the dimensions of integrity, accuracy, consistency, timeliness and availability. The monitoring frequency and threshold are set according to business importance and business characteristics.
  Step 7: the monitoring rule execution Agent module receives the connection information of the data to be monitored configured in the basic information management module (database address, authentication credentials and connection parameters). It also receives the monitoring rule set generated by the monitoring rule generation Agent module, which comprises a Cron scheduling expression, monitoring SQL and threshold configuration. Under the scheduling of a monitoring rule execution model, it calls a monitoring rule execution tool to convert the static rules into executable scheduled computation tasks. It then produces quality inspection results comprising a pass or fail status and detailed metrics, thereby monitoring the data to be monitored in real time. The monitoring information is stored locally, providing a high-quality, low-latency decision basis for the monitoring and anomaly early warning Agent module.
  Step 8: the monitoring and anomaly early warning Agent module periodically acquires the latest monitoring execution results generated by the monitoring rule execution Agent module. It uses a monitoring analysis model to perform multidimensional aggregation analysis of the structured quality inspection results and generate a monitoring report. A rule engine checks whether the actual measured value of a data quality monitoring indicator exceeds its preset threshold, that is, whether $x_{t,i} > \theta_{t,i}$. Here $x_{t,i}$ denotes the inspection result of the $i$-th monitoring indicator at time $t$, and $\theta_{t,i}$ denotes its threshold at time $t$. When the threshold is exceeded, the early warning mechanism is triggered immediately. The monitoring report is then sent to the task's responsible person through a message sending tool that supports multiple notification channels, such as email and DingTalk.
  Step 9: the system optimization Agent module establishes a closed-loop optimization mechanism for the system. Through a system optimization model, key information such as the classification and grading rules generated in each classification task is stored in the classification and grading knowledge base. Likewise, the monitoring rules generated in each monitoring task are stored in the monitoring rule knowledge base. Classification and monitoring effectiveness is continuously improved as knowledge of different business scenarios accumulates, enabling the system to evolve itself.
  Step 10: the result visualization module analyzes the monitoring results generated by the monitoring rule execution Agent module in real time and presents them as charts on a visual cockpit. This gives data managers a global, visual view of the data security and quality situation.
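Claim 1 does not disclose the desensitization rule generation model or its built-in templates. As a minimal sketch under our own assumptions, the model can be replaced by a fixed lookup table. The table maps the sensitivity levels of claim 4, rendered here as PUBLIC, INTERNAL, CONFIDENTIAL and TOP_SECRET, to MySQL-style masking expressions. `TEMPLATES`, `generate_rule` and `generate_script` are hypothetical names, and the masking choices are illustrative rather than the patent's.

```python
# Hypothetical templates: sensitivity level -> MySQL-flavoured masking expression.
# They stand in for the undisclosed desensitization rule generation model.
TEMPLATES = {
    "PUBLIC": "{col}",                               # no masking
    "INTERNAL": "CONCAT(LEFT({col}, 3), '****')",    # keep a short prefix only
    "CONFIDENTIAL": "SHA2({col}, 256)",              # irreversible hash
    "TOP_SECRET": "NULL",                            # suppress the value entirely
}


def generate_rule(column: str, level: str) -> str:
    """Render the masking expression for one column at one sensitivity level."""
    return f"{TEMPLATES[level].format(col=column)} AS {column}"


def generate_script(table: str, labels: dict[str, str]) -> str:
    """Build a masked SELECT for a table from per-column sensitivity labels.

    Table and column names are assumed to come from trusted collected metadata;
    they are interpolated directly, not escaped.
    """
    columns = ", ".join(generate_rule(col, lvl) for col, lvl in labels.items())
    return f"SELECT {columns} FROM {table}"
```

For example, `generate_script("customer", {"id": "PUBLIC", "phone": "INTERNAL"})` yields `SELECT id AS id, CONCAT(LEFT(phone, 3), '****') AS phone FROM customer`.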
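The standardized configuration template of claim 2 could be represented as a validated record. The sketch below is an assumption about its shape, not the patent's schema. The field names, the `SUPPORTED_TYPES` set and the validation rules are hypothetical, and the record carries only the authentication mode, never the credentials themselves.

```python
from dataclasses import dataclass, field

# Hypothetical set of supported source types (claim 2 names these three).
SUPPORTED_TYPES = {"mysql", "oracle", "mongodb"}


@dataclass
class DataSourceConfig:
    """Assumed shape of the standardized connection template of claim 2."""
    db_type: str                     # e.g. "MySQL", "Oracle", "MongoDB"
    host: str
    port: int
    auth_mode: str                   # e.g. "password" (assumed values)
    # database (library) name -> tables to monitor
    table_mapping: dict[str, list[str]] = field(default_factory=dict)
    owner_contact: str = ""          # responsible person's contact details

    def __post_init__(self) -> None:
        # Reject unsupported types and impossible ports before any connection attempt.
        if self.db_type.lower() not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported database type: {self.db_type!r}")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")
```

Validating at construction time means a malformed entry is rejected when it is configured, before any downstream module tries to connect with it.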
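Claim 3 collects table structures, field types, nullability, primary keys and uniqueness constraints through JDBC/ODBC connectors. Python has no JDBC, so this sketch uses a stand-in connector: the standard-library `sqlite3` module with SQLite's `PRAGMA table_info`, `PRAGMA index_list` and `PRAGMA index_info`. `collect_metadata` is a hypothetical name. SQLite stores no column comments, so field descriptions are omitted here; on MySQL they would come from `INFORMATION_SCHEMA.COLUMNS.COLUMN_COMMENT`.

```python
import sqlite3


def collect_metadata(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Collect column-level metadata; sqlite3 stands in for a JDBC/ODBC connector.

    The table name is assumed to come from trusted configuration.
    """
    # Columns covered by a single-column UNIQUE index (SQLite creates
    # "autoindex" entries for UNIQUE constraints).
    unique_cols = set()
    for _, idx_name, is_unique, *_ in conn.execute(f"PRAGMA index_list('{table}')"):
        if is_unique:
            idx_cols = [row[2] for row in conn.execute(f"PRAGMA index_info('{idx_name}')")]
            if len(idx_cols) == 1:
                unique_cols.update(idx_cols)

    metadata = []
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info('{table}')"):
        metadata.append({
            "field": name,
            "type": col_type,
            "nullable": not notnull,
            "primary_key": bool(pk),
            "unique": bool(pk) or name in unique_cols,
        })
    return metadata
```

The same dictionary shape can be filled from other engines' catalog views, which keeps downstream classification code independent of the source type.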
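The two label scales of claim 4 map naturally onto ordered enumerations, so stricter handling can be expressed as a simple comparison. The patent does not disclose its classification model. Purely as an assumed stand-in, `classify` below matches field names against a small hypothetical keyword knowledge base and falls back to a conservative default for unknown fields.

```python
from enum import IntEnum


class Importance(IntEnum):
    GENERAL = 1
    IMPORTANT = 2
    CORE = 3


class Sensitivity(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    TOP_SECRET = 4


# Hypothetical knowledge-base entries: field-name keyword -> (importance, sensitivity).
KNOWLEDGE_BASE = {
    "id_card": (Importance.CORE, Sensitivity.TOP_SECRET),
    "phone": (Importance.IMPORTANT, Sensitivity.CONFIDENTIAL),
    "order_amount": (Importance.CORE, Sensitivity.INTERNAL),
    "remark": (Importance.GENERAL, Sensitivity.INTERNAL),
}


def classify(field_name: str) -> tuple[Importance, Sensitivity]:
    """Keyword lookup standing in for the classification and grading model.

    Unknown fields default to (IMPORTANT, CONFIDENTIAL) so that unrecognised
    data is over-protected rather than under-protected.
    """
    name = field_name.lower()
    for keyword, labels in KNOWLEDGE_BASE.items():
        if keyword in name:
            return labels
    return Importance.IMPORTANT, Sensitivity.CONFIDENTIAL
```

Because the levels are integers, a rule such as "hash everything at CONFIDENTIAL or above" can be written as `level >= Sensitivity.CONFIDENTIAL`.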
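Claim 5 says only that each source type (MySQL, MongoDB, Oracle) uses its own real-time monitoring mechanism. The sketch assumes a change-data-capture design and names MySQL binlog, MongoDB change streams and Oracle LogMiner as typical mechanisms. These are our assumptions, not the patent's disclosure, and the actual capture is replaced by an in-memory iterable of change events. `desensitize_stream` and `mask_phone` are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# Assumed change-capture mechanism per source type. The patent only states that
# the mechanisms differ by source; these are common choices, not its disclosure.
CDC_MECHANISM = {
    "mysql": "binlog",
    "mongodb": "change_stream",
    "oracle": "logminer",
}

Masker = Callable[[object], object]


def mask_phone(value: object) -> str:
    """Keep the first 3 and last 4 characters; mask the rest."""
    s = str(value)
    if len(s) <= 7:
        return "*" * len(s)
    return s[:3] + "*" * (len(s) - 7) + s[-4:]


def desensitize_stream(db_type: str, events: Iterable[dict],
                       rules: dict[str, Masker]) -> Iterator[dict]:
    """Mask each change event lazily as it arrives from the source."""
    if db_type.lower() not in CDC_MECHANISM:
        # Checked eagerly, before any event is consumed.
        raise ValueError(f"no change-capture mechanism for {db_type!r}")
    return ({k: (rules[k](v) if k in rules else v) for k, v in event.items()}
            for event in events)
```

Returning a generator keeps masking incremental, which matches the "continuous" desensitization of claim 5: each event is masked as it is consumed instead of after a full batch load.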
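Claim 6 ties monitoring frequency and thresholds to business importance. The minimal sketch below uses assumed values. Core data is checked every five minutes with zero tolerance for nulls, important data hourly, and general data once a day at 02:00 with a 5 % tolerance. The `POLICY` numbers, `MonitoringRule` and `completeness_rule` are illustrative, and only the integrity (null-ratio) dimension is shown.

```python
from dataclasses import dataclass

# Hypothetical policy: more important data is checked more often with tighter thresholds.
POLICY = {
    "CORE":      {"cron": "*/5 * * * *", "max_null_ratio": 0.0},   # every 5 minutes
    "IMPORTANT": {"cron": "0 * * * *",   "max_null_ratio": 0.01},  # hourly
    "GENERAL":   {"cron": "0 2 * * *",   "max_null_ratio": 0.05},  # daily at 02:00
}


@dataclass(frozen=True)
class MonitoringRule:
    dimension: str    # integrity, accuracy, consistency, timeliness or availability
    cron: str         # monitoring frequency as a Cron expression
    sql: str          # monitoring rule script returning one numeric measurement
    threshold: float  # an alert is due when the measurement exceeds this value


def completeness_rule(table: str, column: str, importance: str) -> MonitoringRule:
    """Generate a null-ratio rule whose frequency and threshold follow importance."""
    p = POLICY[importance]
    # NULLIF avoids division by zero on an empty table (the result becomes NULL).
    sql = (f"SELECT CAST(SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS REAL)"
           f" / NULLIF(COUNT(*), 0) FROM {table}")
    return MonitoringRule("integrity", p["cron"], sql, p["max_null_ratio"])
```

Rules for the other dimensions (for example a freshness query for timeliness) would follow the same pattern: a measurement query plus an importance-dependent frequency and threshold.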
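Claim 7 turns static rules into scheduled tasks and stores every result locally. The sketch below covers one execution of one rule. It runs the monitoring SQL and marks the result PASS when the measurement does not exceed the threshold. It then persists the result to a local SQLite table. The Cron scheduling itself is left to any cron-capable scheduler, SQLite stands in for the monitored database, and `open_result_store` and `run_check` are hypothetical names.

```python
import sqlite3
import time


def open_result_store(path: str = ":memory:") -> sqlite3.Connection:
    """Local store for execution results (pass a file path to persist them)."""
    store = sqlite3.connect(path)
    store.execute("CREATE TABLE IF NOT EXISTS check_result ("
                  "rule_id TEXT, measured REAL, threshold REAL, status TEXT, checked_at REAL)")
    return store


def run_check(source: sqlite3.Connection, store: sqlite3.Connection,
              rule_id: str, sql: str, threshold: float) -> dict:
    """Run one monitoring SQL, classify it as PASS/FAIL and persist the result."""
    measured = source.execute(sql).fetchone()[0]
    # A NULL measurement (e.g. from an empty table) is treated as a failure.
    passed = measured is not None and measured <= threshold
    result = {"rule_id": rule_id, "measured": measured, "threshold": threshold,
              "status": "PASS" if passed else "FAIL", "checked_at": time.time()}
    store.execute("INSERT INTO check_result VALUES "
                  "(:rule_id, :measured, :threshold, :status, :checked_at)", result)
    store.commit()
    return result
```

Each scheduled run would call `run_check` once, so the local table accumulates one row per execution per indicator, as claim 7 describes.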
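The early warning of claim 8 fires when an indicator's inspection result exceeds its threshold at the same time step, that is, when $x_{t,i} > \theta_{t,i}$. This sketch is written under our own assumptions. `find_violations` implements that comparison, and `alert` forwards one report per channel. The channel senders (for email or DingTalk) are caller-supplied callables, because the patent does not specify a message sending API.

```python
from typing import Callable

Sender = Callable[[str, str], None]   # (recipient, message) -> None


def find_violations(measured: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Indicators i whose inspection result x_{t,i} exceeds threshold theta_{t,i}."""
    return sorted(i for i, x in measured.items() if i in thresholds and x > thresholds[i])


def alert(measured: dict[str, float], thresholds: dict[str, float],
          owner: str, senders: dict[str, Sender]) -> list[str]:
    """Trigger the early warning: send one report to the owner on every channel."""
    violations = find_violations(measured, thresholds)
    if violations:
        report = "; ".join(f"{i}: {measured[i]} > {thresholds[i]}" for i in violations)
        for send in senders.values():
            send(owner, report)
    return violations
```

Usage, with a hypothetical in-memory channel in place of a real mail client:

```python
sent = []
alert({"null_ratio": 0.2, "delay_min": 3}, {"null_ratio": 0.05, "delay_min": 10},
      "ops@example.com", {"mail": lambda to, msg: sent.append((to, msg))})
# sent == [("ops@example.com", "null_ratio: 0.2 > 0.05")]
```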
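Claim 9 describes a closed loop in which every finished task feeds its rules back into a knowledge base. The system optimization model is not disclosed. As a deliberately simple substitute, the hypothetical `RuleKnowledgeBase` below counts the labels confirmed for each field name and suggests the most frequent one for later tasks. Majority voting is our choice, not the patent's.

```python
from collections import Counter, defaultdict
from typing import Optional


class RuleKnowledgeBase:
    """Hypothetical knowledge base that accumulates confirmed labels per field name."""

    def __init__(self) -> None:
        self._votes: defaultdict[str, Counter] = defaultdict(Counter)

    def record(self, field_name: str, label: str) -> None:
        # Each finished classification (or monitoring) task contributes one vote.
        self._votes[field_name.lower()][label] += 1

    def suggest(self, field_name: str) -> Optional[str]:
        """Most frequently confirmed label, or None for an unseen field."""
        votes = self._votes.get(field_name.lower())   # .get does not create entries
        return votes.most_common(1)[0][0] if votes else None
```

A classification step could consult `suggest` before falling back to its model, so that labels confirmed in earlier business scenarios are reused.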
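Step 8 of claim 10 aggregates structured inspection results along several dimensions before reporting. This minimal sketch assumes each result is a dict with a `status` of PASS or FAIL, plus grouping keys such as `dimension` or `table`. `aggregate` is a hypothetical name, and pass rate is our choice of summary metric.

```python
from collections import defaultdict


def aggregate(results: list[dict], by: tuple[str, ...] = ("dimension",)) -> dict:
    """Group structured check results by the given keys and compute pass rates."""
    groups = defaultdict(lambda: {"total": 0, "failed": 0})
    for r in results:
        g = groups[tuple(r[k] for k in by)]
        g["total"] += 1
        g["failed"] += r["status"] == "FAIL"   # a bool adds as 0 or 1
    return {key: {**g, "pass_rate": 1 - g["failed"] / g["total"]}
            for key, g in groups.items()}
```

Passing `by=("table", "dimension")` gives a per-table, per-dimension breakdown of the kind a monitoring report or dashboard chart could be built from.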
Description
Data quality monitoring system and method under data security and privacy protection
Technical Field
The invention relates to the field of data quality monitoring, and in particular to a data quality monitoring system and method under data security and privacy protection.
Background
Data quality monitoring means continuously monitoring and evaluating data quality indicators such as accuracy, completeness, consistency, uniqueness and timeliness through a series of technical means and tools, so as to ensure that the data meets business requirements and standards. Monitoring data quality helps enterprises find and resolve quality problems in their data in time, which improves the reliability and value of the data. Existing data quality monitoring techniques, however, are poorly targeted, insufficiently secure and not adaptive. As a result, monitoring is difficult and inefficient, and results lag or are distorted. Data is also at risk of leakage or tampering during transmission, storage and processing. Changing data and business environments are hard to handle, and labor costs and the risk of subjective decisions increase. Specifically, existing data quality monitoring techniques suffer from the following drawbacks:
1) Existing data quality monitoring techniques are poorly targeted. They do not classify and grade data by importance, which makes monitoring harder and reduces its efficiency and effectiveness.
2) Existing data quality monitoring techniques are insufficiently secure.
Existing techniques do not address encryption or privacy protection of data during transmission, storage and processing. Data may therefore be intercepted in transit, tampered with in storage or leaked during processing, causing serious data security and privacy problems.
3) Existing data quality monitoring techniques lack adaptivity. Existing monitoring rules depend heavily on expert experience and lack self-adaptive and dynamic learning capabilities, so they struggle to cope with changing data and business environments. This causes monitoring results to lag or be distorted and increases labor costs and subjective risk.
Disclosure of Invention
To solve the technical problems described in the background, the invention provides a data quality monitoring system and method under data security and privacy protection that combine data classification and grading, data desensitization and large model technology for data quality monitoring. Based on the business importance and sensitivity of the data, they enable differentiated data protection, intelligent generation of quality monitoring rules and differentiated monitoring strategies. They also provide secure, interpretable anomaly analysis and reporting. As a result, monitoring becomes more targeted, more secure and more adaptive.
The invention provides a data quality monitoring system under data security and privacy protection, characterized in that it comprises a basic information management module, a metadata integration Agent module, an intelligent classification and grading Agent module, a desensitization rule generation Agent module, a desensitization rule execution Agent module, a monitoring rule generation Agent module, a monitoring rule execution Agent module, a monitoring and anomaly early warning Agent module, a system optimization Agent module and a visualization module, wherein the basic information management module serves as the entrance of the system and is responsible for uniformly managing the connection parameters and access rights of heterogeneous data sources; the metadata integration Agent module serves as the data perception center of the system and receives the connection information of the data to be monitored configured in the basic information management module; the intelligent classification and grading Agent module receives the metadata acquired by the metadata integration Agent module and, through the classification and grading model and the classification and grading knowledge base, labels the metadata of the data to be monitored with its business importance and sensitivity, providing the basis for formulating the subsequent differentiated strategies; the desensitization rule generation Agent module receives the business importance and sensitivity classification results output by the intelligent classification and grading Agent module and preprocesses them to obtain a sensitivity identification resu