CN-121542274-B - Method and device for collecting global metadata and automatically constructing metadata association relation

Abstract

The invention relates to the technical field of big data, and in particular to a method and a device for collecting global metadata and automatically constructing metadata association relations. A main control platform and a collector Agent are designed, and metadata collectors of different types are integrated through a new technical architecture, so that all types of metadata across the global scope can be collected with a single technical architecture. Building on full-link relationship construction for metadata, association relations among the various types of metadata scattered across the full link of data circulation are established intelligently and automatically, based on a unified metadata identifier generation technique and a cross-modal relationship construction technique. Metadata collection techniques are made pluggable through plug-in standard interfaces, centralized configuration management and unified state monitoring, which solves the key engineering difficulty of seamlessly integrating multiple heterogeneous collection techniques into one platform.
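
The plug-in standard interface mentioned above could be pictured roughly as follows. This is a minimal sketch, not the patented implementation: the patent works with jar packages loaded by a class loader, whereas this illustration uses Python classes, and every name in it (CollectorPlugin, PluginManager, StaticTablePlugin, the method names and the "tables" config key) is hypothetical. Only the four kinds of method (initialization, start collection, stop collection, plug-in inspection) come from the claims.

```python
from abc import ABC, abstractmethod


class CollectorPlugin(ABC):
    """Hypothetical standard interface with the four methods named in claim 5."""

    @abstractmethod
    def initialize(self, config: dict) -> None:
        """Receive the configuration distributed by the main control platform."""

    @abstractmethod
    def start_collection(self) -> list:
        """Collect metadata and return it as a list of records."""

    @abstractmethod
    def stop_collection(self) -> None:
        """Stop collecting."""

    @abstractmethod
    def check(self) -> bool:
        """Plug-in inspection: report whether the plug-in is usable."""


class StaticTablePlugin(CollectorPlugin):
    """Toy plug-in that 'collects' the table names handed to it in the config."""

    def initialize(self, config):
        self.tables = list(config.get("tables", []))
        self.running = False

    def start_collection(self):
        self.running = True
        return [{"table": t} for t in self.tables]

    def stop_collection(self):
        self.running = False

    def check(self):
        return hasattr(self, "tables")


class PluginManager:
    """Registers plug-in classes and instantiates them on request (assumed design)."""

    def __init__(self):
        self._registry = {}

    def register(self, name, plugin_cls):
        if not issubclass(plugin_cls, CollectorPlugin):
            raise TypeError("plug-in must implement CollectorPlugin")
        self._registry[name] = plugin_cls

    def list_plugins(self):
        return sorted(self._registry)

    def create(self, name, config):
        plugin = self._registry[name]()
        plugin.initialize(config)
        return plugin


manager = PluginManager()
manager.register("static", StaticTablePlugin)
plugin = manager.create("static", {"tables": ["orders", "users"]})
print(manager.list_plugins(), plugin.start_collection())
```

Because every collector exposes the same four methods, the platform side can register, configure and monitor heterogeneous collectors without knowing how each one gathers its metadata.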

Inventors

  • DING HONGXIN
  • CHEN HUI
  • YANG SHU
  • JIAN YIPENG
  • YAN XIDI

Assignees

  • CETC Big Data Research Institute Co., Ltd. (中电科大数据研究院有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-20

Claims (8)

  1. A method for global metadata collection and automatic construction of metadata association relations, characterized by comprising the following steps:
     S1, managing a plurality of metadata collection plug-ins in a unified manner through a main control platform, wherein the main control platform comprises a plug-in manager, a plug-in warehouse, a state monitoring manager and a configuration center manager; the plug-in warehouse stores metadata collection plug-ins that have been modularized; the plug-in manager registers and manages the metadata collection plug-ins; the configuration center manager distributes the selected metadata collection plug-ins to collector Agents; and the state monitoring manager monitors the running state of the collector Agents in real time;
     S2, loading and executing the metadata collection plug-ins through the collector Agent, and collecting metadata from a plurality of links of the full chain of data circulation, wherein the collector Agent comprises a configuration client, a heartbeat client, a collection framework, collector plug-ins and a data cache module; the configuration client configures the communication information between the collector Agent and the main control platform; the heartbeat client periodically reports the running state of the collector Agent; the collection framework loads and executes the collector plug-ins; and the data cache module temporarily stores the collected metadata;
     S3, generating a unified identifier for the collected metadata, wherein the unified identifier is formed by splicing key information of the metadata, the key information comprising a timestamp, a host ID, a database name, a field name, a service type, an operation description, an operating user and a permission;
     S4, automatically constructing association relations among metadata based on the unified identifier through an intelligent association analysis engine, wherein the intelligent association analysis engine analyzes the metadata identifier content using at least one of a rule-based construction strategy, a semantic-based construction strategy and a graph-based reasoning strategy, and outputs the metadata association relations;
     S5, storing the constructed metadata association relations in a metadata repository to support metadata applications;
     wherein step S4 further comprises: when the intelligent association analysis engine adopts the rule-based construction strategy, a Drools rule engine is used to analyze the unique identifier content of the metadata, and metadata association relations are identified according to rules on time similarity and attribute overlap; when the semantic-based construction strategy is adopted, a TF-IDF model is used to calculate the semantic similarity of the metadata identifier content, and metadata dependency relations are determined based on a similarity threshold; when the graph-based reasoning strategy is adopted, a Neo4j graph database is used to store metadata entities and relations, and reasoning rules are applied to infer relations between metadata that are not directly associated;
     in the rule-based construction strategy, the rule comprises: if the timestamp difference between two metadata items is within a preset range and the two items share at least one identical attribute, an association relation is determined to exist; in the semantic-based construction strategy, the semantic similarity threshold is set to 0.9, and if the semantic similarity exceeds the threshold, a dependency relation is determined to exist; in the graph-based reasoning strategy, the reasoning rule comprises: if metadata A is associated with metadata B and metadata B is associated with metadata C, metadata A is inferred to be associated with metadata C.
  2. The method for global metadata collection and automatic construction of metadata association relations according to claim 1, wherein S1 further comprises: the main control platform reads the metadata collection plug-in jar packages in the plug-in warehouse through the plug-in manager and completes plug-in registration; the plug-in manager displays a plug-in list through a graphical interface; the configuration center manager selects metadata collection plug-ins from the plug-in list and distributes them to specified collector Agents; and the state monitoring manager collects the running logs and performance indicators of the collector Agents through a heartbeat mechanism.
  3. The method for global metadata collection and automatic construction of metadata association relations according to claim 1, wherein S2 further comprises: the collector Agent receives, through the configuration client, a collection instruction issued by the main control platform, the collection instruction comprising a target data source address, authentication information and collection parameters; the heartbeat client sends running state information to the main control platform at a preset frequency; the collection framework dynamically loads the collection plug-in jar package through a class loader and calls the plug-in execution function to start metadata collection; and the data cache module temporarily stores the collected metadata in a buffer and triggers the unified metadata identifier generation flow.
  4. The method for global metadata collection and automatic construction of metadata association relations according to claim 1, wherein in S3 the specific steps of generating the unified identifier comprise: S31, receiving metadata from the collector Agent through a metadata information receiving module; S32, reading key information from the metadata, the key information comprising a timestamp, a host ID, a database name, a field name, a service type, an operation description, an operating user and a permission; S33, splicing the key information into a character string in a preset order to form a unique metadata identifier; and S34, storing the unique identifier together with the metadata in a database.
  5. The method for global metadata collection and automatic construction of metadata association relations according to claim 1, wherein S2 further comprises: the collector plug-in comprises at least one of a JDBC probe, an API parser and an XML parser, and is modularized according to a standard interface specification, wherein the standard interface specification comprises an initialization method, a start-collection method, a stop-collection method and a plug-in inspection method.
  6. The method for global metadata collection and automatic construction of metadata association relations according to claim 1, further comprising, after S5, step S6: providing metadata application services based on the association relations in the metadata repository, including data lineage analysis, full-link data flow monitoring and root cause investigation of problems; wherein S1 further comprises: the plug-in warehouse accepts uploads of metadata collection plug-in jar packages through a graphical interface; the plug-in manager adds the plug-ins to a management list through a jar package registration function; and the configuration center manager supports batch distribution of plug-ins to a plurality of collector Agents.
  7. The method for global metadata collection and automatic construction of metadata association relations according to claim 1, wherein S4 further comprises: the intelligent association analysis engine uses the rule-based construction, semantic-based construction and graph-based reasoning strategies in cooperation; it first applies the rule-based construction strategy to identify direct associations, then applies the semantic-based construction strategy to supplement associations between similar metadata, and finally applies the graph-based reasoning strategy to infer indirect associations, so as to construct a complete metadata relationship graph.
  8. A device for global metadata collection and automatic construction of metadata association relations, characterized by comprising: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for global metadata collection and automatic construction of metadata association relations according to any one of claims 1 to 7.
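
The identifier generation and the three association strategies recited in the claims can be illustrated with a small sketch. It is written under several assumptions and is not the patented implementation: plain Python functions stand in for the Drools rule engine and the Neo4j graph database, the TF-IDF weighting uses a smoothed IDF chosen for this illustration, and the field names, the "|" delimiter and the 60-second time window are hypothetical. Only the list of key fields, the 0.9 similarity threshold and the A-B, B-C implies A-C rule come from the claims.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Field names and the "|" delimiter are assumptions; the claims list the
# kinds of key information but not their encoding.
KEY_FIELDS = ("timestamp", "host_id", "database", "field",
              "service_type", "operation", "user", "permission")


def make_identifier(meta):
    """S3: splice the key information into one string in a preset order."""
    return "|".join(str(meta[k]) for k in KEY_FIELDS)


def rule_based(a, b, max_gap_s=60):
    """Rule strategy: timestamps within a preset range and one shared attribute."""
    close = abs(a["timestamp"] - b["timestamp"]) <= max_gap_s
    shared = any(a[k] == b[k] for k in KEY_FIELDS[1:])
    return close and shared


def tfidf_vectors(docs):
    """Semantic strategy: TF-IDF weights per identifier (smoothed IDF, assumed)."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [{t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def infer_transitive(edges):
    """Graph strategy: if A-B and B-C then A-C, repeated until nothing changes."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for e1, e2 in combinations(list(closed), 2):
            if len(e1 & e2) == 1 and (e1 ^ e2) not in closed:
                closed.add(e1 ^ e2)
                changed = True
    return closed


def build_associations(records, max_gap_s=60, threshold=0.9):
    """Claim 7 order: rules, then semantic similarity, then inference."""
    names = list(records)
    ids = [make_identifier(records[n]) for n in names]
    vecs = dict(zip(names, tfidf_vectors(ids)))
    edges = set()
    for a, b in combinations(names, 2):
        if (rule_based(records[a], records[b], max_gap_s)
                or cosine(vecs[a], vecs[b]) > threshold):
            edges.add(frozenset((a, b)))
    return infer_transitive(edges)


def record(*values):
    return dict(zip(KEY_FIELDS, values))


records = {
    "A": record(1000, "h1", "sales", "amount", "ingest", "load", "etl", "read"),
    "B": record(1030, "h2", "sales", "total", "process", "clean", "spark", "write"),
    "C": record(1080, "h2", "catalog", "title", "publish", "list", "admin", "grant"),
}
for pair in sorted(sorted(p) for p in build_associations(records)):
    print(pair)
```

In this toy data set, A and B are linked by the rule strategy (30 seconds apart, same database), B and C likewise (50 seconds apart, same host), and the A to C link appears only through the transitive inference step.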

Description

Method and device for collecting global metadata and automatically constructing metadata association relation

Technical Field

The invention relates to the technical field of big data, and in particular to a method and a device for collecting global metadata and automatically constructing metadata association relations.

Background

In the field of authorized operation of public data, data circulation involves many links and participants, and its security risks are concealed, transitive and complex. To effectively manage and control the security risks of data in open use, fine-grained monitoring is needed across the whole chain of the data circulation process: data collection, processing, warehousing, data listing in the catalogue, data application, data approval, data authorization, data product development and authorized data supply. To monitor the whole chain at fine granularity, the metadata generated along the chain must be probed and collected, and analysis must be performed on the collected metadata.

Current technical solutions for full-link metadata collection in authorized operation of public data mainly face the following difficulties:

Metadata collection mainly focuses on the metadata of key data flow processes within a single link, such as database operation monitoring log metadata, data security classification and grading metadata, and data invocation metadata, and can hardly support risk analysis and monitoring of the data flow across the full link.

Full-link metadata collection often requires modifying the business systems. Each business system has to embed metadata collection techniques in every link of the data flow, which is costly and difficult to implement, can affect the stability and performance of the business system, and adapts poorly to the heterogeneous, complex environment of a public data authorized operation platform.

Under different system scenarios and for data of different modalities, metadata must be collected with different techniques. For example, an SQL parser or Sidecar technique is needed for database or file metadata, while a bytecode enhancement technique is needed for information generated during the approval process. A single metadata collection technique therefore cannot meet the collection requirements of all types of metadata across the whole data circulation process.

A complete business data chain comprises data collection, processing, warehousing, data listing in the catalogue, data application, data approval, data authorization, data product development and authorized data supply. The metadata collected in each of these processes is fragmented, and the association relations between the metadata have to be established by manual combing.
Disclosure of Invention

According to a first aspect of the present invention, there is provided a method for global metadata collection and automatic construction of metadata association relations, comprising the following steps:

S1, managing a plurality of metadata collection plug-ins in a unified manner through a main control platform, wherein the main control platform comprises a plug-in manager, a plug-in warehouse, a state monitoring manager and a configuration center manager; the plug-in warehouse stores metadata collection plug-ins that have been modularized; the plug-in manager registers and manages the metadata collection plug-ins; the configuration center manager distributes the selected metadata collection plug-ins to collector Agents; and the state monitoring manager monitors the running state of the collector Agents in real time;

S2, loading and executing the metadata collection plug-ins through the collector Agent, and collecting metadata from a plurality of links of the full chain of data circulation, wherein the collector Agent comprises a configuration client, a heartbeat client, a collection framework, collector plug-ins and a data cache module; the configuration client configures the communication information between the collector Agent and the main control platform; the heartbeat client periodically reports the running state of the collector Agent; the collection framework loads and executes the collector plug-ins; and the data cache module temporarily stores the collected metadata;

S3, generating a unified identifier for the collected metadata, wherein the unified identifier is formed by splicing key information of the metadata, the key information comprising a timestamp, a host ID, a database name, a field name, a service type, an operation description, an operating user and a permission;

S4, automatically constructing association relations among metadata based on the unified identifier through an intelligent association an