CN-116108004-B - Data quality detection method, device, equipment and storage medium

Abstract

The invention discloses a data quality detection method, apparatus, device and storage medium, and relates to the technical field of big data processing. The method comprises: obtaining data to be detected and determining the information class to which it belongs; determining the data fields contained in the data to be detected according to the information class; selecting fields to be detected from those data fields according to a preset data quality evaluation basis; for any field to be detected, determining its data quality evaluation rule and influence level from the data quality evaluation basis of the data to be detected; and judging whether the quality of the data to be detected is qualified according to the data quality evaluation rule and the influence level corresponding to each field to be detected. The technical scheme enables rapid and accurate quality detection of massive data with a high degree of automation and without manual participation, effectively improving the efficiency and accuracy of data quality detection.

Inventors

  • YUAN MIN
  • ZHOU HAIJUN
  • CHEN SHENGKAI
  • LI CHENCHEN
  • ZHENG QIANGSHENG

Assignees

  • 数字广东网络建设有限公司 (Digital Guangdong Network Construction Co., Ltd.)

Dates

Publication Date
20260505
Application Date
20230209

Claims (9)

  1. A data quality detection method, comprising: obtaining data to be detected, determining an information class to which the data to be detected belongs, and determining the data fields contained in the data to be detected according to the information class, wherein the information class is a data information type; selecting fields to be detected from the data fields contained in the data to be detected according to a preset data quality evaluation basis; for any field to be detected, determining an evaluation rule matching the field to be detected from the data quality evaluation basis of the data to be detected, and taking that evaluation rule as the data quality evaluation rule of the field to be detected, wherein the data quality evaluation basis comprises an industry evaluation rule, a national evaluation rule and a local evaluation rule for each field to be detected; for any field to be detected, determining the influence level of the field to be detected according to its data quality evaluation rule and its evaluation description, wherein the influence level is classified into grades A, B and C; and judging whether the quality of the data to be detected is qualified according to the data quality evaluation rule and the influence level corresponding to each field to be detected.
  2. The method according to claim 1, wherein before selecting the fields to be detected from the data fields contained in the data to be detected according to the preset data quality evaluation basis, the method further comprises: probing the data to be detected, and returning any data to be detected that fails the probing.
  3. The method according to claim 1, wherein judging whether the quality of the data to be detected is qualified according to the data quality evaluation rule and the influence level corresponding to each field to be detected comprises: calculating a qualification rate of the data to be detected according to the data quality evaluation rule and the influence level corresponding to each field to be detected; and determining whether the quality of the data to be detected is qualified according to the qualification rate of the data to be detected.
  4. The method according to claim 3, wherein calculating the qualification rate of the data to be detected according to the data quality evaluation rule and the influence level corresponding to each field to be detected comprises: for any target data quality evaluation rule of any field to be detected, determining the accuracy rate of the field to be detected under the target data quality evaluation rule; for any data quality evaluation rule, determining a rule weight of the data quality evaluation rule according to the influence level corresponding to that rule and the influence levels corresponding to all the data quality evaluation rules; determining the qualification rate of each field to be detected under its corresponding data quality evaluation rule according to the accuracy rate and the rule weight; and determining the qualification rate of the data to be detected according to the qualification rate of each field to be detected.
  5. The method according to claim 4, wherein, for any data quality evaluation rule, determining the rule weight of the data quality evaluation rule according to the influence level corresponding to that rule and the influence levels corresponding to all the data quality evaluation rules comprises: for any data quality evaluation rule, determining a grade score of the influence level corresponding to the data quality evaluation rule; summing the grade scores of the influence levels corresponding to all the data quality evaluation rules to obtain a total influence grade score; and, for any data quality evaluation rule, taking the quotient of the grade score of its influence level and the total influence grade score as the rule weight of the data quality evaluation rule.
  6. The method according to claim 3, wherein determining whether the quality of the data to be detected is qualified according to the qualification rate of the data to be detected comprises: judging whether the qualification rate of the data to be detected is greater than a set threshold; if so, the quality of the data to be detected is qualified; if not, the quality of the data to be detected is unqualified.
  7. A data quality detection apparatus, comprising: a first determining module, configured to obtain data to be detected, determine an information class to which the data to be detected belongs, and determine the data fields contained in the data to be detected according to the information class, wherein the information class is a data information type; a selecting module, configured to select fields to be detected from the data fields contained in the data to be detected according to a preset data quality evaluation basis; a second determining module, configured to determine, for any field to be detected, a data quality evaluation rule and an influence level of the field to be detected from the data quality evaluation basis of the data to be detected; and a judging module, configured to judge whether the quality of the data to be detected is qualified according to the data quality evaluation rule and the influence level corresponding to each field to be detected; wherein the second determining module includes: a first determining unit, configured to determine, for any field to be detected, an evaluation rule matching the field to be detected from the data quality evaluation basis of the data to be detected, and to take that evaluation rule as the data quality evaluation rule of the field to be detected, wherein the data quality evaluation basis comprises an industry evaluation rule, a national evaluation rule and a local evaluation rule for each field to be detected; and a second determining unit, configured to determine, for any field to be detected, the influence level of the field to be detected according to its data quality evaluation rule and its evaluation description, wherein the influence level is classified into grades A, B and C.
  8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
  9. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions for causing a processor to perform the method of any one of claims 1-6.
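
Claims 3 to 6 can be read as a weighted-average calculation. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the grade scores (A = 3, B = 2, C = 1), the default threshold of 0.9, the `RuleResult` type and the sample field and rule names are all hypothetical. Multiplying each accuracy rate by its rule weight and summing the products is one plausible reading of "according to the accuracy rate and the rule weight"; the claims do not state the exact formula.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed grade scores: the claims name grades A, B and C but give no numbers.
GRADE_SCORES = {"A": 3.0, "B": 2.0, "C": 1.0}


@dataclass(frozen=True)
class RuleResult:
    """Outcome of one evaluation rule applied to one field (hypothetical type)."""
    field: str
    rule: str
    grade: str       # influence level of the rule: "A", "B" or "C"
    accuracy: float  # share of records that satisfy the rule, in [0, 1]


def rule_weights(results):
    """Claim 5: weight = grade score of the rule / sum of all grade scores."""
    if not results:
        raise ValueError("at least one rule result is required")
    total = sum(GRADE_SCORES[r.grade] for r in results)
    return [GRADE_SCORES[r.grade] / total for r in results]


def field_qualification_rates(results):
    """Claim 4 (assumed form): per field, sum of accuracy rate * rule weight."""
    rates = defaultdict(float)
    for result, weight in zip(results, rule_weights(results)):
        rates[result.field] += weight * result.accuracy
    return dict(rates)


def data_qualification_rate(results):
    """Claim 4 (assumed form): the data-level rate is the sum of field rates."""
    return sum(field_qualification_rates(results).values())


def is_qualified(results, threshold=0.9):
    """Claim 6: qualified only if the rate is greater than the set threshold."""
    return data_qualification_rate(results) > threshold


results = [
    RuleResult("id_number", "length_is_18", "A", 1.0),
    RuleResult("name", "not_empty", "B", 0.5),
    RuleResult("address", "not_empty", "C", 1.0),
]
print(data_qualification_rate(results), is_qualified(results))
```

With these sample values the weights are 3/6, 2/6 and 1/6, so the qualification rate is 5/6 (about 0.833) and the data is unqualified at the assumed threshold of 0.9. Because the weights sum to 1, a data set whose every rule has an accuracy rate of 1 scores exactly 1.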

Description

Data quality detection method, device, equipment and storage medium

Technical Field

The embodiments of the invention relate to the technical field of big data processing, and in particular to a data quality detection method, apparatus, device and storage medium.

Background

During data acquisition, storage and transmission, problems such as data errors and data loss are unavoidable. They reduce the value of the data and sometimes cause serious secondary problems. Ensuring data quality is therefore the basis for using data effectively.

At present, the quality of the data produced by each data source department is uneven because the data has not undergone governance. The common approach of performing quality inspection with Excel documents requires a great deal of time to organize the content, produces inspection results slowly, and requires repeated confirmation that the output of each sub-process of the inspection is correct. As a result, inspection efficiency is low, labor and maintenance costs are high, and the operation is complex rather than intelligent.

Disclosure of Invention

The invention provides a data quality detection method, apparatus, device and storage medium to solve the problems of low efficiency and low accuracy in data quality inspection.
According to one aspect of the present invention, there is provided a data quality detection method, including: obtaining data to be detected, determining an information class to which the data to be detected belongs, and determining the data fields contained in the data to be detected according to the information class; selecting fields to be detected from the data fields contained in the data to be detected according to a preset data quality evaluation basis; for any field to be detected, determining a data quality evaluation rule and an influence level of the field to be detected from the data quality evaluation basis of the data to be detected; and judging whether the quality of the data to be detected is qualified according to the data quality evaluation rule and the influence level corresponding to each field to be detected.

According to another aspect of the present invention, there is provided a data quality detection apparatus including: a first determining module, configured to obtain data to be detected, determine the information class to which the data to be detected belongs, and determine the data fields contained in the data to be detected according to the information class; a selecting module, configured to select fields to be detected from the data fields contained in the data to be detected according to a preset data quality evaluation basis; a second determining module, configured to determine, for any field to be detected, a data quality evaluation rule and an influence level of the field to be detected from the data quality evaluation basis of the data to be detected; and a judging module, configured to judge whether the quality of the data to be detected is qualified according to the data quality evaluation rule and the influence level corresponding to each field to be detected.
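
The first three steps of the method (information class, contained fields, fields to be detected with their rule and influence level) amount to two lookups and a filter. The sketch below illustrates that flow in Python under stated assumptions: the catalogue `FIELDS_BY_CLASS`, the table `EVALUATION_BASIS`, and every class, field and rule name in them are hypothetical examples, since the source does not specify how the information classes or the evaluation basis are stored.

```python
# Hypothetical catalogue: information class -> data fields it contains.
FIELDS_BY_CLASS = {
    "resident_registration": ["name", "id_number", "address", "remark"],
}

# Hypothetical evaluation basis: field -> (evaluation rule, influence level).
EVALUATION_BASIS = {
    "name": ("not_empty", "B"),
    "id_number": ("length_is_18", "A"),
}


def fields_to_detect(info_class):
    """Keep only the fields of the class that the evaluation basis covers."""
    return [f for f in FIELDS_BY_CLASS[info_class] if f in EVALUATION_BASIS]


def detection_plan(info_class):
    """Map each field to be detected to its rule and influence level."""
    return {f: EVALUATION_BASIS[f] for f in fields_to_detect(info_class)}


plan = detection_plan("resident_registration")
print(plan)
```

In this sketch, fields that have no entry in the evaluation basis (here `address` and `remark`) are simply not selected for detection; that selection criterion is an assumption made for illustration.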
According to another aspect of the present invention, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data quality detection method according to any one of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to execute the data quality detection method according to any one of the embodiments of the present invention.

According to the technical scheme, the data to be detected is obtained and the information class to which it belongs is determined; the data fields contained in the data to be detected are determined according to the information class; fields to be detected are selected from those data fields according to the preset data quality evaluation basis; for any field to be detected, its data quality evaluation rule and influence level are determined from the data quality evaluation basis of the data to be detected; and whether the quality of the data to be detected is qualified is judged according to the data quality evaluation rule and the influence level corresponding to each field to be detected. This achieves rapid and accurate quality detection of massive data with a high degree of automation and without manual participation, effectively reduces labor cost, and greatly improves the efficiency and accuracy of data quality detection.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of