CN-121996622-A - Dynamic collection method, system and equipment for multi-level data catalogue
Abstract
The invention relates to the technical field of data processing and provides a method, a system and a device for dynamic aggregation of multi-level data directories. The method comprises: receiving, at a regional node, a directory operation application submitted by a user according to a preset data directory description strategy, and performing approval on the directory operation application; updating the state of the corresponding directory entry at the regional node according to the approval result, and storing the updated directory entry in the regional directory library; dynamically reporting the approved directory entries whose state has been updated to the global node; and, at the global node, aggregating and merging the directory entries from at least one regional node to update the global directory library and provide a unified query service. The method improves the accuracy and authority of directory information at the data source, makes data directory management process-driven with clearly defined responsibilities, balances standardization with flexible and timely directory management, and significantly reduces the technical complexity and cost of system integration and long-term maintenance.
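The regional-node part of the flow summarized above (checking an application against the description strategy, applying the approval decision, setting the entry state, and queuing the entry for reporting) can be illustrated with a minimal Python sketch. All names here (`DESCRIPTION_STRATEGY`, `RegionalNode`, `Application`, the `outbox` list) and the three example fields are hypothetical and are not taken from the patent; the state names follow the wording of claim 4.

```python
from dataclasses import dataclass, field
from enum import Enum


class Operation(Enum):
    REGISTER = "register"  # newly added registration
    CHANGE = "change"      # content change
    REVOKE = "revoke"      # directory revocation


# Hypothetical description strategy: required field name -> expected type.
DESCRIPTION_STRATEGY = {"name": str, "data_source": str, "update_frequency": str}


@dataclass
class Application:
    entry_id: str
    operation: Operation
    content: dict


@dataclass
class RegionalNode:
    library: dict = field(default_factory=dict)  # regional directory library
    outbox: list = field(default_factory=list)   # entries waiting to be reported

    def validate(self, app: Application) -> list:
        """Return the violations of the description strategy (empty if none)."""
        errors = []
        for name, expected in DESCRIPTION_STRATEGY.items():
            if name not in app.content:
                errors.append(f"missing field: {name}")
            elif not isinstance(app.content[name], expected):
                errors.append(f"wrong type for field: {name}")
        return errors

    def decide(self, app: Application, approved: bool) -> bool:
        """Apply an approval decision; only approved changes are stored and queued."""
        if not approved:
            return False
        # Assumption: a revocation carries no content, so it is not validated.
        if app.operation is not Operation.REVOKE and self.validate(app):
            return False
        state = {
            Operation.REGISTER: "validated",
            Operation.CHANGE: "updated",
            Operation.REVOKE: "invalidated",
        }[app.operation]
        previous = self.library.get(app.entry_id, {})
        content = (
            previous.get("content", {})
            if app.operation is Operation.REVOKE
            else app.content
        )
        entry = {"id": app.entry_id, "state": state, "content": content}
        self.library[app.entry_id] = entry
        self.outbox.append(entry)
        return True
```

In this sketch a rejected or non-conforming application leaves the regional library untouched, so only approved state changes ever reach the reporting queue.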
Inventors
- WU ZIXUAN
Assignees
- 中电云计算技术有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260127
Claims (10)
- 1. A method for dynamic aggregation of multi-level data directories, the method comprising: step S1, at a regional node, receiving a directory operation application submitted by a user according to a preset data directory description strategy, and performing approval on the directory operation application; step S2, updating the state of the corresponding directory entry at the regional node according to the approval result, and storing the updated directory entry in a regional directory library; step S3, dynamically reporting the directory entries that have passed approval and whose state has been updated to a global node; and step S4, at the global node, aggregating and merging directory entries from at least one regional node, updating a global directory library, and providing a unified query service.
- 2. The method for dynamic aggregation of multi-level data directories according to claim 1, wherein step S1 comprises: receiving a directory operation application submitted by a user through a directory registration interface provided by the regional node, wherein the directory operation comprises new registration, content change or directory revocation; verifying that the application content submitted by the user conforms to the preset data directory description strategy, wherein the preset data directory description strategy comprises the structured fields, data types, format constraints and semantic rules of a data directory; and submitting the verified application to an approval process, generating a task to be approved, notifying a regional administrator to carry out the approval, receiving the approval operation of the administrator, and recording the approval result and approval comments.
- 3. The method for dynamic aggregation of multi-level data directories according to claim 2, wherein the approval process comprises: setting, at the regional node, a directory approval workflow unit for managing the approval process, and sending an approval notice to a designated regional administrator when a task to be approved is generated; displaying the complete content of the directory entry to be approved to the administrator through an approval operation interface, wherein the complete content comprises the application type, the submitter, the application time and inventory information; providing two operation options, pass and reject, in the approval operation interface, and providing an input area for approval comments; and recording complete approval process information, wherein the approval process information comprises the approval serial number, approval type, approval state, approval time, approver and approval comments, thereby forming an auditable approval record.
- 4. The method according to claim 1, wherein step S2 comprises: setting, at the regional node, a regional directory state identification unit for determining the state of the directory entry according to the approval result; if the approval result is a pass and the directory operation is a new registration, marking the corresponding directory entry as being in the validated state; if the approval result is a pass and the directory operation is a content change, marking the corresponding directory entry as being in the updated state and updating the directory content; if the approval result is a pass and the directory operation is a directory revocation, marking the corresponding directory entry as being in the invalidated state; and storing the directory entry whose state has been set, together with its associated metadata, in the regional directory library.
- 5. The method of claim 1, wherein step S3 comprises: setting, at the regional node, a regional reporting service unit for monitoring directory state update events; automatically triggering a reporting process when a state update of a directory entry is detected and the corresponding approval result is a pass; packaging the directory entry to be reported, together with its operation type, state information and approval record, into a reporting data packet; and transmitting the reporting data packet to the global node through a directory reporting and communication module, with identity verification and data encryption performed during communication.
- 6. The method of claim 1, wherein in step S4, the step of aggregating and merging directory entries from at least one regional node at the global node comprises: setting, at the global node, a global directory receiving unit for receiving the reporting data packets from the regional nodes; parsing each reporting data packet and extracting the directory entry together with its globally unique identifier, operation type and state information; determining, according to the globally unique identifier, whether the directory entry is a new entry in the global directory library; if so, creating a corresponding record in the global directory library, and if not, performing content merging or state updating according to the operation type; and updating the version information and update time of the directory entry in the global directory library.
- 7. The method for dynamic aggregation of multi-level data directories according to claim 6, wherein step S4 further comprises: setting, at the global node, a directory state management unit for maintaining the consistency of directory states, wherein the directory state management unit records the current state, effective time, invalidation time and state change history of each directory entry; when a directory update report is received from a regional node, updating the corresponding record in the directory state management unit according to the state information in the report; and comparing the directory states of the regional nodes and the global node to ensure that the states in the global directory library and the regional directory libraries remain consistent.
- 8. The method for dynamic aggregation of multi-level data directories according to claim 1, wherein in step S4, the unified query service is provided as follows: setting, at the global node, a directory query interface unit that provides two query modes, an application programming interface and a structured query language; setting a search and filter unit that supports combined search across multiple dimensions, namely directory name, data source, region, industry classification and update time; and setting a directory display interface unit that supports paged display, sorting and viewing of detailed information, and displays the query results graphically.
- 9. A dynamic aggregation system for multi-level data directories, the system comprising: an application and approval module, configured to receive, at a regional node, a directory operation application submitted by a user according to a preset data directory description strategy, and to perform approval on the directory operation application; an updating and storing module, configured to update the state of the corresponding directory entry at the regional node according to the approval result and to store the updated directory entry in a regional directory library; a reporting module, configured to dynamically report the directory entries that have passed approval and whose state has been updated to a global node; and an aggregation and merging module, configured to aggregate and merge directory entries from at least one regional node at the global node, update a global directory library, and provide a unified query service.
- 10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1-8 when the program is executed.
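The merge logic recited in claim 6 (create a record when the globally unique identifier is new, otherwise merge content or update state, then refresh version and update time) can be sketched as follows. This is only an illustration under stated assumptions: the packet layout (`guid`, `operation`, `state`, `content` keys), the class names and the integer version counter are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class GlobalRecord:
    guid: str
    content: dict
    state: str
    version: int = 1
    updated_at: datetime = field(default_factory=_now)


class GlobalDirectory:
    """In-memory stand-in for the global directory library."""

    def __init__(self) -> None:
        self.records = {}

    def merge(self, packet: dict) -> GlobalRecord:
        guid = packet["guid"]
        record = self.records.get(guid)
        if record is None:
            # New entry: create the corresponding record.
            record = GlobalRecord(guid, dict(packet.get("content", {})), packet["state"])
            self.records[guid] = record
            return record
        # Existing entry: merge content for a change, otherwise only update state.
        if packet["operation"] == "change":
            record.content.update(packet.get("content", {}))
        record.state = packet["state"]
        record.version += 1
        record.updated_at = _now()
        return record
```

A possible usage, with a made-up identifier:

```python
directory = GlobalDirectory()
directory.merge({"guid": "region-a/e1", "operation": "register",
                 "state": "validated", "content": {"name": "Air quality"}})
directory.merge({"guid": "region-a/e1", "operation": "change",
                 "state": "updated", "content": {"update_frequency": "daily"}})
```

Keying every record on the identifier keeps the merge idempotent with respect to entry creation: a second report for the same entry updates the existing record instead of duplicating it.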
Description
Dynamic collection method, system and equipment for multi-level data catalogue
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a method, a system and a device for dynamic aggregation of multi-level data directories.
Background
In the current digital transformation process, data has become a core strategic asset for organizations and industries of all kinds. To manage, share and extract value from data effectively, building a unified, standardized and trustworthy data directory system is an essential foundation. A data directory is not only a systematic inventory of data resources but also a key support for turning data into assets, offering it as a service and putting it into circulation; it plays a particularly significant role in building cross-region, cross-level and cross-system data infrastructure for large organizations or at the national level.
Current data directory management technology mainly follows the typical modes below, each of which shows clear limitations in complex, large-scale application scenarios.
First, directory management based on manual compilation is common in organizations that are at an early stage of informatization or are small in scale. It typically relies on spreadsheets, documents or an internal knowledge base for manual recording and maintenance. Although simple to operate, it has obvious defects: manual updating is inefficient, information easily becomes outdated, there is no unified format or semantic specification, data quality is hard to guarantee, and automatic retrieval and integrated calling are not supported, so scalability and usability are seriously lacking.
Second, independently deployed heterogeneous data directory systems have been increasingly widely adopted as technology has developed.
Different departments or regions often build their own data directories on open-source or commercial tools such as Apache Atlas and Collibra. Although these systems work effectively within their local scope, the lack of top-level design and unified standards means they differ significantly in data model, metadata structure, interface protocol and so on, creating a "data silo" effect. As a result, no globally unified view of data assets can be formed, cross-system data discovery and sharing become extremely difficult, and overall data governance performance is limited.
Third, preliminary aggregation schemes based on interface calls attempt to extract directory information from each independent system through application programming interfaces and aggregate it at a central platform. In practice, however, this approach faces significant challenges. The core problem is the heterogeneity of directory descriptions: different systems often name, format and semantically express the same data attribute inconsistently. For example, a data source field may be defined as "source", "dataSource" or "data source", and the update frequency may be expressed as "daily", "day", "every natural day" and so on. This semantic inconsistency makes automated aggregation dependent on extensive manual mapping and cleaning, which is costly to implement and difficult to maintain. In addition, this approach synchronizes data through scheduled batch processing, so real-time updating is hard to achieve, and the central directory lags behind and becomes inconsistent with the regional directories.
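To make the cost of such manual mapping concrete, the following Python sketch shows the kind of hand-maintained alias table that interface-based aggregation tends to require. It is an illustration only: the canonical names `data_source` and `update_frequency` and the two alias tables are assumptions built from the examples in the text, not part of any cited system.

```python
# Hypothetical alias tables, maintained by hand for every connected system.
FIELD_ALIASES = {
    "source": "data_source",
    "datasource": "data_source",
    "data source": "data_source",
}
VALUE_ALIASES = {
    "update_frequency": {
        "daily": "daily",
        "day": "daily",
        "every natural day": "daily",
    },
}


def normalize(record: dict) -> dict:
    """Map heterogeneous field names and values onto canonical ones."""
    result = {}
    for key, value in record.items():
        # Unknown field names are passed through unchanged.
        canonical = FIELD_ALIASES.get(key.strip().lower(), key)
        if isinstance(value, str):
            aliases = VALUE_ALIASES.get(canonical, {})
            value = aliases.get(value.strip().lower(), value)
        result[canonical] = value
    return result
```

Every newly connected system adds entries to both tables, which is the maintenance burden the passage describes.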
When a new regional node needs to be connected, dedicated interfaces and adaptation logic usually have to be custom-developed, so the system scales and maintains poorly, and the overall architecture struggles to adapt to a dynamically changing business environment.
The defects of the prior art come down to two aspects: first, the lack of a commonly accepted and binding data directory description specification, so that data definitions and semantic expression are not unified at the source; second, the lack of a matching technical framework that supports standardized registration, process-based approval and dynamic synchronization, which makes it difficult to achieve efficient and reliable multi-level directory aggregation while ensuring data quality.
The Chinese patent with publication number CN115269927A discloses a distributed data asset directory aggregation method and system, which solves the basic problem of aggregating data directories from dispersed to centralized, but adopts a technical paradigm of centralization, batch processing and after-the-fact verification. This paradigm has inherent shortcomings with respect to key requirements of modern data asset management such as real-time dynamic synchronization, governance at the data source, process-based control of rights and responsibilities, and flexible system expansion, and the defects are caused by the failure to estab