CN-122019233-A - Data auditing method, device and equipment
Abstract
The application provides a data auditing method, device and equipment. In response to a received auditing task, a task sequence corresponding to the auditing task is generated according to the auditing task and the semantic knowledge graph corresponding to the current moment, wherein the semantic knowledge graph comprises a plurality of knowledge entities and business relationship edges among the knowledge entities, and the knowledge entities are master data stored in different business systems. Abnormal data within the target master data indicated by the task sequence is obtained using an anomaly checking tool corresponding to the task sequence. Causal association analysis is performed on the abnormal data according to the business relationship edges to determine the anomaly cause of the abnormal data. The semantic knowledge graph is updated according to the abnormal data and the confidence of the anomaly cause, and the updated semantic knowledge graph is used to execute the next received auditing task. Embodiments of the application can improve the accuracy of data auditing.
Inventors
- WANG JUE
- ZHU JIANG
- YANG CHAO
Assignees
- 华润电力投资有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260121
Claims (10)
- 1. A data auditing method, the method comprising: in response to receiving an auditing task, generating a task sequence corresponding to the auditing task according to the auditing task and a semantic knowledge graph corresponding to the current moment, wherein the semantic knowledge graph comprises a plurality of knowledge entities and business relationship edges among the plurality of knowledge entities; obtaining, based on an anomaly checking tool corresponding to the task sequence, abnormal data in which an abnormality exists in target master data indicated by the task sequence; performing causal association analysis on the abnormal data according to the business relationship edges, and determining an anomaly cause of the abnormal data; and updating the semantic knowledge graph according to the abnormal data and a confidence of the anomaly cause, and executing the next received auditing task using the updated semantic knowledge graph.
- 2. The method of claim 1, wherein the anomaly checking tool comprises at least one of a graph computation engine, a rule engine, and an anomaly detection model, the rule engine comprising a master data quality rule base; and the obtaining, based on the anomaly checking tool corresponding to the task sequence, abnormal data in which an abnormality exists in the target master data indicated by the task sequence comprises: outputting, based on the anomaly checking tool corresponding to the task sequence, first data in which an abnormality exists in the target master data indicated by the task sequence; and determining the abnormal data according to the first data; wherein, in the case that the anomaly checking tool is the graph computation engine, a source path of the target master data is determined based on the semantic knowledge graph, the integrity of the target master data is determined based on the source path, and the first data is determined from the target master data based on the integrity; in the case that the anomaly checking tool is the rule engine, a target quality rule matching the target master data is acquired from the master data quality rule base, and the first data is determined from the target master data based on the target quality rule, wherein the target quality rule is used for determining at least one of the integrity, the consistency and the accuracy of the target master data; and in the case that the anomaly checking tool is the anomaly detection model, the target master data is input into the anomaly detection model to obtain the first data.
- 3. The method of claim 1, wherein the performing causal association analysis on the abnormal data according to the business relationship edges and determining an anomaly cause of the abnormal data comprises: determining at least one candidate cause associated with the abnormal data based on the business relationship edges; mapping the abnormal data and the candidate cause to a causal latent space to obtain causal features, wherein the causal latent space characterizes potential causal relationships between the abnormal data and the candidate cause; determining a causal relationship between the abnormal data and the candidate cause according to the causal features and a preset causal knowledge base; determining a causal effect value of each candidate cause on the abnormal data according to the causal relationship, wherein the causal effect value represents the degree of influence of the candidate cause on the abnormal data; and determining the anomaly cause from the candidate causes according to the causal effect value.
- 4. The method of claim 3, wherein the determining a causal relationship between the abnormal data and the candidate cause according to the causal features and a preset causal knowledge base comprises: taking the abnormal data as a result variable and the candidate cause as a variable to be analyzed, and determining potential causal relationships between the result variable and the variable to be analyzed through a structural equation model; and screening the potential causal relationships using causal constraints stored in the causal knowledge base to obtain the causal relationship between the abnormal data and the candidate cause.
- 5. The method of claim 3, wherein the determining a causal effect value of each candidate cause on the abnormal data according to the causal relationship comprises: constructing a relationship graph according to the causal relationship, wherein the vertices of the relationship graph are the abnormal data and the candidate causes, and the edges of the relationship graph are the causal relationships between the abnormal data and the candidate causes; based on the relationship graph, performing an intervention operation on the candidate cause and determining a counterfactual conditional probability distribution of the abnormal data under the intervention operation, wherein the intervention operation changes the value of the variable corresponding to the candidate cause so as to simulate the influence of the value change on the abnormal data; and determining the causal effect value of each candidate cause on the abnormal data according to the counterfactual conditional probability distribution.
- 6. The method of claim 5, wherein the determining the anomaly cause from the candidate causes according to the causal effect value comprises: dividing the abnormal data according to a preset time window to obtain a plurality of data blocks; for a first variable of the candidate cause corresponding to each data block, randomly rearranging the first variable and then reassigning the time stamps of the first variable, while keeping the time stamps of the data in the data block other than the first variable unchanged, to obtain a control sample set; constructing a null distribution based on the control sample set, and determining the position of the causal effect value in the cumulative distribution function of the null distribution to obtain the significance corresponding to the causal effect value; and in the case that the significance is greater than a preset significance threshold, determining that the candidate cause corresponding to the causal effect value is the anomaly cause.
- 7. The method of claim 1, wherein the updating the semantic knowledge graph according to the abnormal data and the confidence of the anomaly cause comprises: constructing an anomaly group based on the anomaly cause and the abnormal data; determining the similarity between the anomaly group and a history group as the confidence, wherein the historical master data corresponding to the history group matches the target master data; and in the case that the confidence is greater than a preset confidence threshold, updating the edge weight of the business relationship edge corresponding to the abnormal data based on a weight update formula, wherein the weight update formula is $\sigma_{ij}(t+1) = \beta\,\sigma_{ij}(t) + (1-\beta)\,s$, where $\beta \in (0, 1)$, $\sigma_{ij}(t)$ is the original edge weight of the business relationship edge at the current time $t$, and $s$ is the confidence.
- 8. The method of claim 1, wherein before the generating, in response to receiving an auditing task, a task sequence corresponding to the auditing task according to the auditing task and a semantic knowledge graph corresponding to the current moment, the method further comprises: extracting entity relationships from different business systems, wherein the entity relationships comprise the master data, data association relationships among the master data, and time stamps corresponding to the master data, and master data having the same index in the different business systems is semantically aligned; arranging the entity relationships in time order according to the time stamps to obtain a time-series data stream; determining causal links between the master data according to the time-series data stream; determining the relationship weight corresponding to each data association relationship according to the causal links; and taking the master data as the knowledge entities, the data association relationships as the business relationship edges, and the relationship weights as the edge weights of the business relationship edges, to obtain the semantic knowledge graph.
- 9. A data auditing apparatus, the apparatus comprising: a generation module, configured to, in response to receiving an auditing task, generate a task sequence corresponding to the auditing task according to the auditing task and a semantic knowledge graph corresponding to the current moment, wherein the semantic knowledge graph comprises a plurality of knowledge entities and business relationship edges among the plurality of knowledge entities; an acquisition module, configured to obtain, based on an anomaly checking tool corresponding to the task sequence, abnormal data in which an abnormality exists in target master data indicated by the task sequence; a determining module, configured to perform causal association analysis on the abnormal data according to the business relationship edges and determine an anomaly cause of the abnormal data; and an updating module, configured to update the semantic knowledge graph according to the abnormal data and a confidence of the anomaly cause, and to execute the next received auditing task using the updated semantic knowledge graph.
- 10. An electronic device comprising a processor and a memory storing computer program instructions, wherein the processor, when executing the computer program instructions, implements the data auditing method according to any one of claims 1-8.
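The significance check recited in claim 6 can be pictured as a block permutation test. The following Python sketch is illustrative only and rests on several assumptions that are not in the claims: the function names `effect` and `block_permutation_significance`, the parameters `window`, `n_perm` and `seed`, and in particular the use of absolute covariance as a stand-in for the causal effect value, which the claims derive from a counterfactual distribution instead.

```python
import random


def effect(cause, outcome):
    """Stand-in effect statistic (an assumption): absolute covariance."""
    n = len(cause)
    mean_c = sum(cause) / n
    mean_o = sum(outcome) / n
    return abs(sum((c - mean_c) * (o - mean_o) for c, o in zip(cause, outcome)) / n)


def block_permutation_significance(cause, outcome, window, n_perm=500, seed=0):
    """Return the position of the observed effect in the empirical CDF of a null
    distribution built by shuffling the cause variable inside each time window."""
    rng = random.Random(seed)
    observed = effect(cause, outcome)
    null_values = []
    for _ in range(n_perm):
        shuffled = []
        for start in range(0, len(cause), window):
            block = cause[start:start + window]  # slicing copies, input is untouched
            rng.shuffle(block)                   # rearrange the cause within the block
            shuffled.extend(block)               # outcome keeps its original time stamps
        null_values.append(effect(shuffled, outcome))
    # Fraction of null samples that do not exceed the observed effect.
    return sum(1 for value in null_values if value <= observed) / n_perm


def is_anomaly_cause(significance, threshold=0.95):
    """Accept the candidate cause when the significance exceeds the threshold."""
    return significance > threshold
```

A candidate whose observed effect sits far in the upper tail of the null distribution yields a significance close to 1 and passes the threshold; the threshold value 0.95 is a hypothetical default, since the claims only say it is preset.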
Description
Data auditing method, device and equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular to a data auditing method, device and equipment.
Background
After a large enterprise group completes the centralized deployment of enterprise resource planning (ERP), the master data previously accumulated in multiple systems remains scattered. This causes business pain points such as high procurement costs and creates an urgent need for quality auditing of the master data. Data auditing methods in the related art rely mainly on a predefined static rule base (for example, format verification and dictionary matching) and a fixed metadata model to make logical judgments. However, these methods cannot sense, understand and adapt to a continuously changing business semantic context in real time. When data changes because of business semantic drift, the static auditing logic of the related art has difficulty identifying and judging it accurately, which lowers the accuracy of data auditing.
Disclosure of Invention
The data auditing method, device and equipment provided by the application can improve the accuracy of data auditing.
In a first aspect, an embodiment of the present application provides a data auditing method, including: in response to receiving an auditing task, generating a task sequence corresponding to the auditing task according to the auditing task and a semantic knowledge graph corresponding to the current moment, wherein the semantic knowledge graph comprises a plurality of knowledge entities and business relationship edges among the plurality of knowledge entities; obtaining, based on an anomaly checking tool corresponding to the task sequence, abnormal data in which an abnormality exists in target master data indicated by the task sequence; performing causal association analysis on the abnormal data according to the business relationship edges, and determining an anomaly cause of the abnormal data; and updating the semantic knowledge graph according to the abnormal data and a confidence of the anomaly cause, and executing the next received auditing task using the updated semantic knowledge graph.
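The update step can be illustrated with the edge-weight formula recited in claim 7, σ_ij(t+1) = β·σ_ij(t) + (1 − β)·s, applied only when the confidence s exceeds a preset threshold. The sketch below is a minimal Python illustration; the function name `update_edge_weight`, the dictionary representation of the graph, and the default values of `beta` and `threshold` are assumptions made for the example, not values given in the application.

```python
def update_edge_weight(weight, confidence, beta=0.8, threshold=0.6):
    """Blend the current edge weight with the confidence s.

    Implements sigma_ij(t+1) = beta * sigma_ij(t) + (1 - beta) * s.
    The weight is left unchanged when the confidence does not exceed
    the threshold. beta and threshold defaults are hypothetical.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    if confidence <= threshold:
        return weight
    return beta * weight + (1.0 - beta) * confidence


# Hypothetical graph: business relationship edges keyed by (entity_i, entity_j).
edge_weights = {("supplier", "material"): 0.5}
edge = ("supplier", "material")
edge_weights[edge] = update_edge_weight(edge_weights[edge], confidence=0.9)
```

Because β lies strictly between 0 and 1, the new weight always falls between the old weight and the confidence, so repeated high-confidence findings pull the edge weight gradually toward s rather than overwriting it.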
In a second aspect, the present application provides a data auditing apparatus, the apparatus comprising: a generation module, configured to, in response to receiving an auditing task, generate a task sequence corresponding to the auditing task according to the auditing task and a semantic knowledge graph corresponding to the current moment, wherein the semantic knowledge graph comprises a plurality of knowledge entities and business relationship edges among the plurality of knowledge entities; an acquisition module, configured to obtain, based on an anomaly checking tool corresponding to the task sequence, abnormal data in which an abnormality exists in target master data indicated by the task sequence; a determining module, configured to perform causal association analysis on the abnormal data according to the business relationship edges and determine an anomaly cause of the abnormal data; and an updating module, configured to update the semantic knowledge graph according to the abnormal data and a confidence of the anomaly cause, and to execute the next received auditing task using the updated semantic knowledge graph.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing computer program instructions, wherein the processor, when executing the computer program instructions, implements the data auditing method of any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement the data auditing method of any embodiment of the first aspect.
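The four modules of the apparatus in the second aspect can be pictured as stages chained in a fixed order. The following Python sketch only illustrates that wiring; the class name `AuditApparatus`, the type alias `Graph`, and all callable signatures are hypothetical, and the application does not prescribe any particular interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical graph form: (entity_a, entity_b) -> edge weight.
Graph = Dict[Tuple[str, str], float]


@dataclass
class AuditApparatus:
    """Chains the generation, acquisition, determining and updating modules."""
    generate: Callable[[str, Graph], List[str]]               # task -> task sequence
    acquire: Callable[[List[str]], List[str]]                 # sequence -> abnormal data
    determine: Callable[[List[str], Graph], Dict[str, str]]   # abnormal data -> cause
    update: Callable[[Graph, Dict[str, str]], Graph]          # graph -> updated graph

    def run(self, task: str, graph: Graph) -> Graph:
        """Execute one auditing task and return the graph for the next task."""
        sequence = self.generate(task, graph)
        abnormal = self.acquire(sequence)
        causes = self.determine(abnormal, graph)
        return self.update(graph, causes)
```

Returning the updated graph from `run` mirrors the closed loop described in the text: the output of one auditing task becomes the input graph of the next.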
In a fifth aspect, an embodiment of the present application provides a computer program product, wherein instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the data auditing method of any embodiment of the first aspect.
In the data auditing method, device and equipment provided by the embodiments of the application, the task sequence is constructed dynamically by combining the semantic knowledge graph at the current moment with the auditing task, instead of relying on predefined static rules, so that the task sequence accurately matches the business association logic of the master data. In the abnormal data identification stage, by means of an anomaly checking tool corresponding to the task sequence, the