CN-122022592-A - Quantitative evaluation and treatment method for enterprise data quality

Abstract

The invention discloses a method for quantitative evaluation and governance of enterprise data quality, belonging to the technical field of data evaluation and monitoring. The method comprises: constructing a multi-level quality evaluation index system comprising primary and secondary indexes according to the business-domain differences of enterprise data; performing quality measurement on the data and calculating the secondary index scores; successively obtaining the primary index scores and a comprehensive quality score by weighted aggregation; dividing the data into a plurality of quality grades according to the comprehensive quality score; and performing a corresponding data governance operation for each grade. Through multi-source heterogeneous data acquisition, multi-level configurable quantitative index evaluation, differentiated-period monitoring and closed-loop governance based on intelligent analysis, the invention achieves automated, fine-grained and intelligent evaluation and governance of enterprise data quality, significantly improves the efficiency and effectiveness of data governance, and provides a solid guarantee for releasing the value of enterprise data assets.

Inventors

  • ZHENG XIAODONG
  • CHEN YUDI
  • Yang Daiyue
  • LIN YANG
  • Qu Yeheng
  • REN ZHIJUN
  • LIU XIFENG
  • WU HAO
  • SUN BOWEI
  • HUANG JIANCHENG
  • WANG YANG

Assignees

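  • China Energy Engineering Corporation Limited (中国能源建设股份有限公司)
  • China Energy Engineering Group Jiangsu Power Design Institute Co., Ltd. (中国能源建设集团江苏省电力设计院有限公司)
  • China Power Engineering Consulting Group Co., Ltd. (中国电力工程顾问集团有限公司)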

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-12

Claims (10)

  1. A method for quantitative evaluation and governance of enterprise data quality, comprising: formulating evaluation rules for quantifying data quality according to the business-domain differences of the enterprise data to be evaluated, and constructing a multi-level quality evaluation index system based on the evaluation rules, wherein the multi-level quality evaluation index system comprises at least one primary index and at least one secondary index corresponding to each primary index, and each primary index and each secondary index has a corresponding scoring weight; performing quality measurement on the enterprise data to be evaluated to calculate an index score for each secondary index, and, from the scoring weights and index scores of the secondary indexes, successively calculating by weighted aggregation the primary index scores and a comprehensive quality score reflecting the enterprise data as a whole; and dividing the enterprise data into a plurality of quality grades based on the comprehensive quality score, and performing a corresponding data governance operation for each quality grade.
  2. The method for quantitative evaluation and governance of enterprise data quality according to claim 1, wherein the enterprise data to be evaluated is acquired as follows: structured data acquisition, comprising collecting or receiving structured data from at least one of a business database, a log file and an API (application programming interface) through a configured data synchronization task; text description data acquisition, comprising deploying an intelligent parsing engine, performing automatic information extraction and structuring on the text data using a rule base built on regular expressions and fuzzy matching, and verifying the extraction result with a large language model; and unstructured data acquisition, comprising parsing, organizing and collecting the information in pictures and irregular documents using a multimodal large model.
  3. The method for quantitative evaluation and governance of enterprise data quality according to claim 1, wherein the multi-level quality evaluation index system specifically comprises: a primary index whose secondary indexes comprise at least one of data standard compliance rate, metadata completeness rate and business rule matching degree; a primary index whose secondary indexes comprise at least one of element integrity, record integrity and data set coverage; a primary index whose secondary indexes comprise at least one of data content accuracy, data format compliance, data repetition rate and abnormal data occurrence rate; and a consistency primary index whose secondary indexes comprise at least one of cross-system consistency and associated data consistency.
  4. The method for quantitative evaluation and governance of enterprise data quality according to claim 1, further comprising setting differentiated evaluation periods according to the security level of the enterprise data to be evaluated: for public data, an annual evaluation period; for internal data, a quarterly evaluation period; for sensitive data, a monthly evaluation period; and for confidential data, a continuous audit mechanism under which quality inspection starts immediately for newly added data.
  5. The method for quantitative evaluation and governance of enterprise data quality according to claim 1, wherein performing quality measurement on the enterprise data to be evaluated to calculate an index score for each secondary index specifically comprises: for a conventional secondary index characterizing a compliance rate, calculating the index score as the ratio of the number of entries meeting the quality standard to the total number of entries; and for a secondary index characterizing a repetition rate, calculating the index score as the ratio of the number of data entries in the intersection to the number of data entries in the union.
  6. The method for quantitative evaluation and governance of enterprise data quality according to claim 1, wherein successively calculating, by weighted aggregation from the scoring weights and index scores of the secondary indexes, the primary index scores and the comprehensive quality score reflecting the enterprise data as a whole specifically comprises: $$S = \sum_{i} W_i \sum_{j} w_{ij}\, s_{ij}$$ wherein $S$ is the comprehensive quality score, $W_i$ is the scoring weight of the i-th primary index, $w_{ij}$ is the scoring weight of the j-th secondary index under the i-th primary index, $s_{ij}$ is the score of the j-th secondary index under the i-th primary index, i indexes the primary index scoring categories, and j indexes the secondary index scoring categories.
  7. The method for quantitative evaluation and governance of enterprise data quality according to claim 6, wherein a synergistic-effect adjustment is applied to the obtained scoring result to optimize it, the adjustment expression relating the following quantities: the adjusted score; the originally calculated score; the k-th next-level sub-score; the weight corresponding to the k-th next-level sub-score; and the number of next-level child indexes directly under the parent index being adjusted.
  8. The method for quantitative evaluation and governance of enterprise data quality according to claim 1, wherein dividing the enterprise data into a plurality of quality grades based on the comprehensive quality score and performing a corresponding data governance operation for each quality grade specifically comprises: first-level governance, in which data isolation and root cause analysis are performed on data whose comprehensive quality score lies in the range 0-60% (upper bound excluded); second-level governance, in which, for data whose comprehensive quality score lies in the range 60-80% (upper bound excluded), whether to perform data isolation and root cause analysis is decided according to the actual degree of impact; and third-level governance, in which no active governance operation is performed on data whose comprehensive quality score lies in the range 80-100%.
  9. The method for quantitative evaluation and governance of enterprise data quality according to claim 8, wherein the data isolation operation comprises at least one of: suspending the business processes that involve the corresponding data; disabling access to the corresponding data; and rolling the corresponding data back to the last reliable historical version.
  10. The method for quantitative evaluation and governance of enterprise data quality according to claim 8, wherein the root cause analysis specifically comprises: screening out the data with low quality scores and identifying the key quality indexes that score low, to obtain the low-score indexes; data tracing, namely tracing upstream along the data lineage to the initial link that caused the data quality anomaly, to obtain a tracing result; pattern mining, namely applying predefined pattern rules to analyze how data anomalies cluster in the time and business dimensions, to obtain an anomaly pattern result; extracting the error and warning information from the data job logs and correlating it with the data anomalies in time and business terms, to obtain a problem log; and intelligent attribution, namely inputting the low-score indexes, the tracing result, the anomaly pattern result and the problem log information into a large language model, and generating, based on a preset prompt template, an analysis report containing the root cause of the problem.
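
The measurement and grading rules of claims 4, 5 and 8 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `ASSESSMENT_PERIOD` mapping, the tier labels and the choice to return 0.0 for empty input are assumptions made here for illustration, and scores are represented as fractions in [0, 1] rather than percentages.

```python
from typing import Callable, Hashable, Iterable

# Evaluation cadence per security level (claim 4); the key and value strings are illustrative.
ASSESSMENT_PERIOD = {
    "public": "annual",
    "internal": "quarterly",
    "sensitive": "monthly",
    "confidential": "continuous",  # new data is inspected immediately
}


def compliance_rate(entries: Iterable[object],
                    meets_standard: Callable[[object], bool]) -> float:
    """Entries meeting the quality standard divided by all entries (claim 5)."""
    items = list(entries)
    if not items:
        return 0.0  # assumption: an empty data set scores 0
    return sum(1 for e in items if meets_standard(e)) / len(items)


def repetition_rate(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Size of the intersection divided by size of the union (claim 5)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty collections share nothing
    return len(set_a & set_b) / len(union)


def governance_tier(score: float) -> str:
    """Map a comprehensive quality score in [0, 1] to a governance tier (claim 8)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < 0.60:
        return "first-level"   # data isolation and root cause analysis
    if score < 0.80:
        return "second-level"  # isolation and analysis decided by actual impact
    return "third-level"       # no active governance operation


if __name__ == "__main__":
    print(compliance_rate([1, 2, 3, 4], lambda x: x % 2 == 0))
    print(repetition_rate([1, 2, 3], [2, 3, 4]))
    print(governance_tier(0.6), ASSESSMENT_PERIOD["confidential"])
```

Treating the upper bounds of the first two ranges as exclusive means a score of exactly 60% falls into second-level governance and exactly 80% into third-level governance.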

Description

Quantitative evaluation and treatment method for enterprise data quality

Technical Field

The invention relates to a method for quantitative evaluation and governance of enterprise data quality, and belongs to the technical field of data evaluation and monitoring.

Background

Data quality evaluation is a core part of enterprise data management. It assesses data quality by systematic methods in order to raise the value of the data and the efficiency of operations, and it is of irreplaceable importance to enterprise operation and development. A scientific, systematic evaluation process can accurately identify the various quality problems in the data and lays a solid foundation for using the data effectively. Its steps, from determining quality standards through monitoring to continuous improvement, are linked one to the next and form a complete closed loop of data quality assurance. In today's digital, competitive business environment, enterprise decision-making depends heavily on the results of data analysis. Accurate, complete and consistent data provides a reliable basis for that analysis, allowing an enterprise to gain insight into market trends, understand customer needs, optimize business processes and seize the initiative in the market.

With the rapid development of technologies such as big data, artificial intelligence and cloud computing, data quality evaluation faces new opportunities and challenges. On the one hand, these emerging technologies provide more powerful tools and more efficient methods for data quality evaluation, such as using artificial intelligence algorithms for more accurate outlier detection and data repair, and relying on the computing and storage capacity of cloud computing to handle quality evaluation tasks over massive data. On the other hand, the diversity, real-time nature and complexity of data keep increasing, which places higher demands on the accuracy, timeliness and comprehensiveness of data quality evaluation.

Disclosure of Invention

The invention aims to provide a method for quantitative evaluation and governance of enterprise data quality that achieves automated, fine-grained and intelligent evaluation and governance of enterprise data quality through multi-source heterogeneous data acquisition, quantitative index evaluation, differentiated-period monitoring and closed-loop governance based on intelligent analysis. To achieve the above purpose, the invention adopts the following technical solution.
In one aspect, the invention provides a method for quantitative evaluation and governance of enterprise data quality, comprising: formulating evaluation rules for quantifying data quality according to the business-domain differences of the enterprise data to be evaluated, and constructing a multi-level quality evaluation index system based on the evaluation rules, wherein the multi-level quality evaluation index system comprises at least one primary index and at least one secondary index corresponding to each primary index, and each primary index and each secondary index has a corresponding scoring weight; performing quality measurement on the enterprise data to be evaluated to calculate an index score for each secondary index, and, from the scoring weights and index scores of the secondary indexes, successively calculating by weighted aggregation the primary index scores and a comprehensive quality score reflecting the enterprise data as a whole; and dividing the enterprise data into a plurality of quality grades based on the comprehensive quality score, and performing a corresponding data governance operation for each quality grade.
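
The two-stage weighted aggregation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names, the grouping label "accuracy" and all weights and scores are hypothetical, and the sketch assumes that the weights at each level sum to 1 and that scores lie in [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SecondaryIndex:
    name: str
    weight: float  # scoring weight within its primary index
    score: float   # measured index score, assumed to lie in [0, 1]


@dataclass
class PrimaryIndex:
    name: str
    weight: float  # scoring weight within the whole index system
    children: List[SecondaryIndex]

    def score(self) -> float:
        """Primary index score: weighted sum of its secondary index scores."""
        return sum(c.weight * c.score for c in self.children)


def comprehensive_score(primaries: List[PrimaryIndex]) -> float:
    """Comprehensive quality score: weighted sum of the primary index scores."""
    return sum(p.weight * p.score() for p in primaries)


# Hypothetical index system with illustrative weights and scores.
system = [
    PrimaryIndex("accuracy", 0.6, [
        SecondaryIndex("data content accuracy", 0.5, 0.9),
        SecondaryIndex("data format compliance", 0.5, 0.7),
    ]),
    PrimaryIndex("consistency", 0.4, [
        SecondaryIndex("cross-system consistency", 1.0, 0.5),
    ]),
]

if __name__ == "__main__":
    for p in system:
        print(p.name, round(p.score(), 4))
    print("comprehensive", round(comprehensive_score(system), 4))
```

Keeping the weights on the index objects rather than in the aggregation functions mirrors the configurable index system the method describes: changing a weight or adding a secondary index does not require touching the scoring code.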
Further, the enterprise data to be evaluated is acquired as follows: structured data acquisition, comprising collecting or receiving structured data from at least one of a business database, a log file and an API (application programming interface) through a configured data synchronization task; text description data acquisition, comprising deploying an intelligent parsing engine, performing automatic information extraction and structuring on the text data using a rule base built on regular expressions and fuzzy matching, and verifying the extraction result with a large language model; and unstructured data acquisition, comprising parsing, organizing and collecting the information in pictures and irregular documents using a multimodal large model. Further, the multi-level quality evaluation index system specifically comprises: a primary index whose secondary indexes comprise at least one of data standard compliance rate, metadata completeness rate and business rule matching degree; a primary index whose secondary indexes comprise at least one of element inte