
CN-121996508-A - Anomaly monitoring method and system for service-domain layered data in a data lake


Abstract

The application discloses an anomaly monitoring method and system for service-domain layered data in a data lake, and relates to the technical field of data management. The method has three stages:

- It performs form verification on a plurality of matched service layers to obtain form matching degrees, and it screens out the target service layers.
- It selects a first service layer and performs inter-layer flow verification on it, which yields a plurality of flow dependency coefficients and a flow matching degree.
- It runs a similarity check against a historical anomalous service layer set. If the layer is judged anomalous, it is added to that set.

The method addresses a shortcoming of the prior art. Anomaly monitoring of service-domain layered data in data lakes has no method tailored to layering characteristics. As a result, deep anomalies involving form matching and inter-layer flow are hard to identify, and anomalies are not discovered in time.

Inventors

  • LIU JUNHONG
  • YANG HUI
  • HUANG YI
  • DU QIAO
  • LU ZHENYU

Assignees

  • 广西桂冠电力股份有限公司 (Guangxi Guiguan Electric Power Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2026-01-27

Claims (10)

  1. An anomaly monitoring method for service-domain layered data in a data lake, characterized by comprising the following steps: acquiring service domain feature factors of a target service, and performing data layering matching on a target service area in the data lake based on the service domain feature factors to obtain a plurality of matched service layers; performing form verification on the plurality of matched service layers to obtain form matching degrees, selecting the matched service layers whose form matching degree is greater than or equal to a form matching degree threshold as target service layers, and adding the matched service layers whose form matching degree is below the threshold to a historical anomalous service layer set as confusable service layers; randomly selecting a first service layer from the target service layers, performing inter-layer flow verification to obtain a plurality of flow dependency coefficients, and obtaining a flow matching degree based on the flow dependency coefficients and data quality parameters of the first service layer; and obtaining a comprehensive anomaly degree based on the form matching degree and the flow matching degree, performing, in combination with the historical anomalous service layer set, a similarity check on a first service layer whose comprehensive anomaly degree is greater than a comprehensive anomaly threshold to obtain an anomaly similarity, performing anomaly judgment, and adding the first service layer to the historical anomalous service layer set if it is judged anomalous.
  2. The anomaly monitoring method for service-domain layered data in a data lake according to claim 1, wherein acquiring the service domain feature factors of the target service and performing data layering matching on the target service area in the data lake based on the service domain feature factors to obtain the plurality of matched service layers comprises: acquiring the service domain feature factors of the target service, the service domain feature factors comprising master data key attributes, a service flow identifier, and a management and control level; and performing data layering matching on the target service area in the data lake based on the service domain feature factors to obtain the plurality of matched service layers, wherein the target service area is determined based on the master data key attributes.
  3. The anomaly monitoring method for service-domain layered data in a data lake according to claim 1, wherein performing form verification on the plurality of matched service layers to obtain the form matching degree comprises: performing form verification on the plurality of matched service layers, the form verification comprising data form verification and content form verification; obtaining a data matching degree and a content matching degree from the results of the data form verification and the content form verification, respectively; and computing a weighted combination of the data matching degree and the content matching degree to obtain the form matching degree.
  4. The anomaly monitoring method for service-domain layered data in a data lake according to claim 1, wherein selecting the matched service layers whose form matching degree is greater than or equal to the form matching degree threshold as target service layers, and adding the matched service layers whose form matching degree is below the threshold to the historical anomalous service layer set as confusable service layers, comprises: obtaining the form matching degree threshold, wherein the threshold is obtained by mapping from the service domain feature factors; selecting the matched service layers whose form matching degree is greater than or equal to the threshold as target service layers; and taking the matched service layers whose form matching degree is below the threshold as confusable service layers and adding them to the historical anomalous service layer set.
  5. The anomaly monitoring method for service-domain layered data in a data lake according to claim 1, wherein randomly selecting the first service layer from the target service layers, performing inter-layer flow verification to obtain the plurality of flow dependency coefficients, and obtaining the flow matching degree based on the flow dependency coefficients and the data quality parameters of the first service layer comprises: randomly selecting a first service layer from the plurality of target service layers; performing inter-layer flow verification on the first service layer to obtain a source layer data proportion and an update frequency; calculating a data dependency coefficient and an update dependency coefficient based on the source layer data proportion and the update frequency; computing a weighted combination of the data dependency coefficient and the update dependency coefficient to obtain a plurality of flow dependency coefficients of the first service layer; acquiring the data quality parameters of the first service layer; and correcting the flow dependency coefficients based on the data quality parameters to obtain the flow matching degree of the first service layer, wherein the flow matching degree is the average of the flow matching degrees between the first service layer and each of a plurality of service layers.
  6. The anomaly monitoring method for service-domain layered data in a data lake according to claim 5, wherein performing inter-layer flow verification on the first service layer to obtain the source layer data proportion and the update frequency comprises: performing inter-layer flow verification on the first service layer to obtain a plurality of data sources of the first service layer, and taking the proportion of source-layer data within the data of the first service layer as the source layer data proportion, wherein the data sources comprise a plurality of service layers; and acquiring the update frequency of the first service layer, wherein the update frequency comprises an update time frequency.
  7. The anomaly monitoring method for service-domain layered data in a data lake according to claim 5, wherein calculating the data dependency coefficient and the update dependency coefficient based on the source layer data proportion and the update frequency comprises: taking the source layer data proportion of the first service layer as the data dependency coefficient; acquiring the update frequencies of the plurality of data sources of the first service layer; and obtaining the update dependency coefficient based on the update frequency of the first service layer and the average of the update frequencies of the plurality of data sources.
  8. The anomaly monitoring method for service-domain layered data in a data lake according to claim 1, wherein the final step of claim 1 comprises the following: obtaining the comprehensive anomaly degree based on the form matching degree and the flow matching degree; performing the similarity check, in combination with the historical anomalous service layer set, on the first service layer whose comprehensive anomaly degree is greater than the comprehensive anomaly threshold to obtain the anomaly similarity; performing anomaly judgment; and adding the first service layer judged anomalous to the historical anomalous service layer set. This final step comprises: computing a weighted combination of the form matching degree and the flow matching degree to obtain the comprehensive anomaly degree; acquiring the historical anomalous service layer set; when the comprehensive anomaly degree of the first service layer is greater than the comprehensive anomaly threshold, taking the first service layer as a preliminary anomalous layer, and otherwise randomly selecting a new first service layer; performing similarity verification against the historical anomalous service layer set based on the service domain feature factors, the flow dependency coefficients, and the form matching degree of the preliminary anomalous layer, and taking the historical anomalous service layers in the set whose similarity is greater than a historical anomaly threshold as a plurality of similar anomalous service layers; obtaining the anomaly similarity based on the plurality of historical anomaly similarities between the preliminary anomalous layer and the similar anomalous service layers, and performing anomaly judgment; and adding the preliminary anomalous layer to the historical anomalous service layer set if it is judged anomalous.
  9. The anomaly monitoring method for service-domain layered data in a data lake according to claim 8, wherein obtaining the anomaly similarity based on the plurality of historical anomaly similarities between the preliminary anomalous layer and the plurality of similar anomalous service layers, and performing anomaly judgment, comprises: acquiring the plurality of historical anomaly similarities between the preliminary anomalous layer and the plurality of similar anomalous service layers; obtaining the maximum and the mean of the historical anomaly similarities; and calculating a historical anomaly credibility based on the deviation between the maximum historical anomaly similarity and the mean historical anomaly similarity, correcting the mean with the credibility to obtain the anomaly similarity, and performing anomaly judgment.
  10. An anomaly monitoring system for service-domain layered data in a data lake, characterized by being configured to execute the anomaly monitoring method for service-domain layered data in a data lake according to any one of claims 1 to 9, and comprising: an information acquisition module, configured to acquire service domain feature factors of a target service and perform data layering matching on a target service area in the data lake based on the service domain feature factors to obtain a plurality of matched service layers; a form verification module, configured to perform form verification on the plurality of matched service layers to obtain form matching degrees, select the matched service layers whose form matching degree is greater than or equal to a form matching degree threshold as target service layers, and add the matched service layers whose form matching degree is below the threshold to a historical anomalous service layer set as confusable service layers; a matching degree calculation module, configured to randomly select a first service layer from the target service layers, perform inter-layer flow verification to obtain a plurality of flow dependency coefficients, and obtain a flow matching degree based on the flow dependency coefficients and data quality parameters of the first service layer; and a similarity acquisition module, configured to obtain a comprehensive anomaly degree based on the form matching degree and the flow matching degree, perform, in combination with the historical anomalous service layer set, a similarity check on a first service layer whose comprehensive anomaly degree is greater than a comprehensive anomaly threshold to obtain an anomaly similarity, perform anomaly judgment, and add the first service layer to the historical anomalous service layer set if it is judged anomalous.
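Claims 3 to 7 describe the scoring only qualitatively, using phrases such as "weighting calculation" and "correcting". The Python sketch below is one possible reading under stated assumptions; it does not reproduce the patented formulas. The assumptions are:

- The weights are hypothetical.
- The update dependency coefficient is taken as a min/max ratio between the layer's update frequency and the mean source frequency.
- The data quality correction is multiplicative.
- All function, class, and layer names are illustrative.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical weights: the claims say only "weighting calculation".
W_DATA_FORM, W_CONTENT_FORM = 0.5, 0.5   # claim 3
W_DATA_DEP, W_UPDATE_DEP = 0.6, 0.4      # claim 5


def form_matching_degree(data_match: float, content_match: float) -> float:
    """Claim 3: weighted combination of data-form and content-form matching."""
    return W_DATA_FORM * data_match + W_CONTENT_FORM * content_match


def screen_by_form(scores: dict[str, float], threshold: float):
    """Claim 4: split matched layers into target layers (score >= threshold)
    and confusable layers (score < threshold)."""
    targets = {name: s for name, s in scores.items() if s >= threshold}
    confusable = {name: s for name, s in scores.items() if s < threshold}
    return targets, confusable


@dataclass
class SourceLayer:
    """One upstream data source of the first service layer (claim 6)."""
    data_share: float   # share of the first layer's data drawn from this source, 0..1
    update_freq: float  # updates per unit time


def update_dependency(layer_freq: float, source_freqs: list[float]) -> float:
    """Claim 7, assumed form: compare the layer's update frequency with the
    mean source frequency via min/max, so equal frequencies give 1.0."""
    avg = mean(source_freqs)
    high, low = max(layer_freq, avg), min(layer_freq, avg)
    return low / high if high > 0 else 1.0


def flow_matching_degree(layer_freq: float, sources: list[SourceLayer],
                         quality: float) -> float:
    """Claim 5: one flow dependency coefficient per source, each corrected by
    the data quality parameter (assumed multiplicative), then averaged."""
    upd = update_dependency(layer_freq, [s.update_freq for s in sources])
    per_source = [
        (W_DATA_DEP * s.data_share + W_UPDATE_DEP * upd) * quality
        for s in sources
    ]
    return mean(per_source)


if __name__ == "__main__":
    scores = {"layer_a": form_matching_degree(0.9, 0.7),
              "layer_b": form_matching_degree(0.4, 0.5)}
    targets, confusable = screen_by_form(scores, threshold=0.6)
    print(sorted(targets), sorted(confusable))
    srcs = [SourceLayer(0.5, 2.0), SourceLayer(0.5, 6.0)]
    print(round(flow_matching_degree(2.0, srcs, quality=0.9), 3))
```

In this reading, claim 4 adds the confusable layers returned by `screen_by_form` to the historical anomalous service layer set. The flow matching degree is the mean of the quality-corrected per-source coefficients, which follows the averaging described in claim 5.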

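Claims 8 and 9 can be sketched in Python as follows, under assumptions the patent does not state:

- Each layer is encoded as a numeric feature vector, hypothetically built from its service domain feature factors, flow dependency coefficients, and form matching degree.
- Similarity between layers is cosine similarity.
- The comprehensive anomaly degree is one minus the weighted matching degree, so that poor matching scores high.
- The historical anomaly credibility is one minus the gap between the maximum and mean similarity.
- The thresholds and weights are placeholders.

```python
import math
from statistics import mean

# Hypothetical weights; the claims do not specify them.
W_FORM, W_FLOW = 0.5, 0.5


def comprehensive_anomaly(form_match: float, flow_match: float) -> float:
    """Claim 8, assumed form: one minus the weighted matching degree, so that
    poor form or flow matching yields a high anomaly degree."""
    return 1.0 - (W_FORM * form_match + W_FLOW * flow_match)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def anomaly_similarity(candidate: list[float], history: list[list[float]],
                       hist_threshold: float) -> float:
    """Claims 8-9: keep historical layers above the similarity threshold,
    then correct the mean similarity by a credibility derived from the
    max-vs-mean deviation (assumed: credibility = 1 - deviation)."""
    sims = [cosine(candidate, h) for h in history]
    similar = [s for s in sims if s > hist_threshold]
    if not similar:
        return 0.0
    s_max, s_mean = max(similar), mean(similar)
    credibility = 1.0 - (s_max - s_mean)
    return s_mean * credibility


def judge_layer(form_match: float, flow_match: float, features: list[float],
                history: list[list[float]], anomaly_threshold: float = 0.3,
                hist_threshold: float = 0.8, judge_threshold: float = 0.7) -> bool:
    """Return True and append `features` to `history` if the layer is judged
    anomalous; return False if it is not a preliminary anomalous layer or if
    its anomaly similarity is below `judge_threshold`."""
    if comprehensive_anomaly(form_match, flow_match) <= anomaly_threshold:
        return False  # not a preliminary anomalous layer
    if anomaly_similarity(features, history, hist_threshold) >= judge_threshold:
        history.append(features)
        return True
    return False
```

`judge_layer` returns False in two cases:

- The layer is not a preliminary anomalous layer. Claim 8 then re-samples a new first service layer.
- Its anomaly similarity is too low.

Only layers judged anomalous are appended to the history list.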
Description

Anomaly monitoring method and system for service-domain layered data in a data lake

Technical Field

The application relates to the technical field of data management, in particular to an anomaly monitoring method and system for service-domain layered data in a data lake.

Background

With the explosive growth of data volumes and the deepening digital transformation of enterprises, data lakes are widely used across industries as platforms for centralized storage and management of massive heterogeneous data. Service-domain layering is an important way of organizing and managing data in a data lake. Layering data according to business logic and data characteristics improves its usability and management efficiency.

In practice, however, layered data in a data lake runs into several problems:

- Data flows between service layers may have disordered dependencies and uneven data quality.
- Some service layers may be confused with one another because they have similar names and structures.
- New anomaly patterns keep emerging as services change and new data is continuously ingested.

In the prior art, anomaly monitoring of data-lake data focuses on basic quality dimensions such as completeness and consistency. It lacks a monitoring method tailored to the layering characteristics of service domains. Deep anomalies of layered data, such as form mismatches and inter-layer flow problems, are therefore hard to identify, and anomalies are not discovered promptly or accurately. This in turn degrades the analysis, decision-making, and business applications built on data-lake data.
Disclosure of Invention

The embodiments of the application solve a technical problem in the prior art. Anomaly monitoring of service-domain layered data in a data lake lacks a method tailored to layering characteristics. As a result, deep anomalies such as form mismatches and inter-layer flow problems are hard to identify, and anomalies are not discovered in time.

The technical solution adopted to solve this problem is as follows. In a first aspect, the application provides an anomaly monitoring method for service-domain layered data in a data lake, the method comprising: acquiring service domain feature factors of a target service, and performing data layering matching on a target service area in the data lake based on the service domain feature factors to obtain a plurality of matched service layers; performing form verification on the plurality of matched service layers to obtain form matching degrees, selecting the matched service layers whose form matching degree is greater than or equal to a form matching degree threshold as target service layers, and adding the matched service layers whose form matching degree is below the threshold to a historical anomalous service layer set as confusable service layers; randomly selecting a first service layer from the target service layers, performing inter-layer flow verification to obtain a plurality of flow dependency coefficients, and obtaining a flow matching degree based on the flow dependency coefficients and data quality parameters of the first service layer; and obtaining a comprehensive anomaly degree based on the form matching degree and the flow matching degree, performing, in combination with the historical anomalous service layer set, a similarity check on a first service layer whose comprehensive anomaly degree is greater than a comprehensive anomaly threshold to obtain an anomaly similarity, performing anomaly judgment, and adding the first service layer to the historical anomalous service layer set if it is judged anomalous.

In a second aspect, the application provides an anomaly monitoring system for service-domain layered data in a data lake, comprising: the information acquisition module is used for acquiring service domain feature factors of a target service, and performing data layering matching on a target service area in the data lake based on the service domain feature factors to obtain a plurality of matched service layers; the form verification module is used for performing form verification on the plurality of matched service layers to obtain form matching degrees, selecting the matched service layers whose form matching degree is greater than or equal to a form matching degree threshold as target service layers, and adding the matched service layers whose form matching degree is below the threshold to a historical anomalous service layer set as confusable service layers; the matching degree calculation module is