CN-122019554-A - Knowledge graph-based enterprise data asset intelligent inventory method and system
Abstract
The invention relates to the technical field of enterprise data management, and in particular to a knowledge graph-based method and system for the intelligent inventory of enterprise data assets. The method receives metadata of the data assets to be inventoried, extracts asset features, and assigns unique asset identifiers according to a similarity threshold. It compares the feature similarity of assets at adjacent time points, identifies the same asset across time points, and unifies its identifier to generate an initial inventory result. A trained deep learning inventory model processes the metadata to obtain the initial knowledge node position and graph embedding coordinates of each data asset in the knowledge graph, and an inventory tracking model tracks the asset life cycle to obtain a target tracking result. An asset change trajectory is constructed from the graph embedding result and, combined with the tracking result, is used to dynamically verify and correct the initial inventory result. The method outputs a final inventory result with high accuracy, thereby achieving accurate identification, continuous tracking and automatic inventory of data assets.
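The identifier assignment and cross-time matching summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: cosine similarity, the greedy best-match strategy, the threshold value of 0.9, and the names `cosine` and `initial_inventory` are all assumptions chosen for illustration, and every entry within one snapshot is assumed to be a distinct asset.

```python
import math
from itertools import count


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def initial_inventory(snapshots, threshold=0.9):
    """Assign asset identifiers across adjacent time points.

    snapshots: list of snapshots, each a list of asset feature vectors.
    Returns (identifiers per snapshot, number of distinct identifiers).
    Assumption: an asset is matched only against the previous snapshot.
    """
    next_id = count(1)
    previous = []          # (asset_id, vector) pairs from the previous time point
    all_ids = set()
    ids_per_snapshot = []
    for vectors in snapshots:
        current, used = [], set()
        for vec in vectors:
            best_id, best_sim = None, threshold
            for asset_id, prev_vec in previous:
                if asset_id in used:
                    continue  # each earlier asset is matched at most once
                sim = cosine(vec, prev_vec)
                if sim >= best_sim:
                    best_id, best_sim = asset_id, sim
            if best_id is None:
                best_id = next(next_id)  # no match above threshold: new asset
            used.add(best_id)
            all_ids.add(best_id)
            current.append((best_id, vec))
        ids_per_snapshot.append([asset_id for asset_id, _ in current])
        previous = current
    return ids_per_snapshot, len(all_ids)


snapshots = [
    [[1.0, 0.0], [0.0, 1.0]],    # time point 0
    [[0.99, 0.05], [1.0, 1.0]],  # time point 1
]
ids, total = initial_inventory(snapshots)
print(ids, total)
```

In this toy input the first asset at time point 1 is nearly identical to an asset at time point 0 and keeps its identifier, while the second falls below the threshold against every unmatched earlier asset and receives a new one.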
Inventors
- WANG SHAOQING
- Yang chuangye
- Xiong Boyuan
Assignees
- 青田坤德技术服务有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260115
Claims (10)
- 1. A knowledge graph-based enterprise data asset intelligent inventory method, characterized by comprising the following steps: receiving metadata of data assets to be inventoried, identifying the asset features of the data assets to be inventoried included in each piece of the metadata, and assigning different asset identifiers to data assets to be inventoried whose asset similarity between any two asset features is smaller than a preset similarity threshold, wherein the metadata includes the data assets to be inventoried; determining the actual similarity between any two asset features in the metadata of adjacent time points, determining data assets to be inventoried whose actual similarity is not smaller than the preset similarity threshold to be the same data asset across time points, unifying the asset identifiers across the time points, and determining the number of all asset identifiers as an initial inventory result; processing the metadata through a trained deep learning inventory model to obtain the initial knowledge node position and graph embedding coordinates of the data assets included in the metadata, and tracking the life cycle of the data assets with a trained inventory tracking model to obtain a target tracking result; and determining a change trajectory of each data asset to be inventoried based on the initial knowledge node position and the graph embedding coordinates, and verifying the initial inventory result based on the change trajectory and the target tracking result to obtain an inventory result.
- 2. The knowledge graph-based enterprise data asset intelligent inventory method according to claim 1, wherein receiving the metadata of the data assets to be inventoried comprises: obtaining structured data and unstructured data through an enterprise multi-source data acquisition interface, and performing semantic alignment and entity disambiguation on the structured data and the unstructured data to obtain a snapshot of the data assets to be inventoried; extracting metadata from the snapshot at a preset time interval to obtain initial metadata of the data assets to be inventoried; and performing a preprocessing operation on the initial metadata to obtain the metadata of the data assets to be inventoried, wherein the preprocessing operation comprises schema unification, field normalization and missing value filling.
- 3. The knowledge graph-based enterprise data asset intelligent inventory method according to claim 1, wherein determining the actual similarity between any two asset features in the metadata of adjacent time points, and determining data assets to be inventoried whose actual similarity is not smaller than the preset similarity threshold to be the same data asset across time points, comprises: performing time-series alignment on the asset features included in the metadata of adjacent time points, and establishing a feature evolution window for the data asset to be inventoried corresponding to each asset feature in the current metadata; determining the actual similarity of the asset features at the time position corresponding to the feature evolution window, and, when the actual similarity is not smaller than the preset similarity threshold, determining the data assets to be inventoried corresponding to the asset features to be the same data asset across time points; and, when the actual similarity is smaller than the preset similarity threshold, determining the actual similarity between the current asset feature and the asset features included in the metadata outside the current feature evolution window, and, when that actual similarity is not smaller than the preset similarity threshold, determining the data assets to be inventoried corresponding to the asset features to be the same data asset across time points.
- 4. The knowledge graph-based enterprise data asset intelligent inventory method according to claim 1, wherein identifying the asset features of the data assets to be inventoried included in each piece of the metadata comprises: extracting local metadata features of the data asset to be inventoried, wherein the local metadata features comprise at least one of a data schema feature, a field constraint feature, a data format feature and a data magnitude feature; extracting global semantic features of the data asset to be inventoried, wherein the global semantic features comprise at least one of a business domain feature, a data sensitivity feature and an ownership relationship feature; and fusing the local metadata features and the global semantic features to obtain initial asset features of the data asset to be inventoried, and performing vector embedding on the initial asset features to obtain the asset features.
- 5. The knowledge graph-based enterprise data asset intelligent inventory method according to claim 1, wherein verifying the initial inventory result based on the change trajectory and the target tracking result to obtain the inventory result comprises: constructing a knowledge association graph model, wherein each node in the knowledge association graph represents a data asset to be inventoried and each edge weight represents a semantic association probability; and, when the change trajectory and the target tracking result satisfy a preset recheck condition, re-identifying the data assets to be inventoried, and determining the inventory result based on the re-identified data assets and the initial inventory result; wherein the preset recheck condition comprises at least one of: the actual similarity between asset features within the same change trajectory being larger than a preset mutation threshold; asset features of different change trajectories being mixed; and the relation deviation between the target tracking result and the change trajectory exceeding a preset fault tolerance threshold.
- 6. The knowledge graph-based enterprise data asset intelligent inventory method according to claim 1, further comprising, after determining the actual similarity between any two asset features in the metadata of adjacent time points: assigning asset identifiers to the data assets to be inventoried whose actual similarity is smaller than the preset similarity threshold; and, after the actual similarity has been determined for every data asset to be inventoried in all of the metadata, determining the active frequency of each asset identifier, deleting any asset identifier whose active frequency is smaller than a preset frequency threshold, and determining the number of all remaining asset identifiers as the inventory result.
- 7. The knowledge graph-based enterprise data asset intelligent inventory method according to claim 1, wherein the training process of the trained deep learning inventory model comprises: constructing a multi-level knowledge graph network comprising a bottom-layer network, a middle-layer network and a high-layer network, wherein the bottom-layer network is used to extract field-level features, the middle-layer network is used to capture entity relationship features, and the high-layer network is used to integrate business semantic features; inputting training metadata into the multi-level knowledge graph network, and adjusting the multi-level knowledge graph network through a time-series consistency constraint to obtain an initial deep learning inventory model; and performing joint optimization detection on the initial deep learning inventory model and adding a relation loss function to obtain the trained deep learning inventory model.
- 8. A knowledge graph-based enterprise data asset intelligent inventory system, the system comprising: a feature identification module, configured to receive metadata of data assets to be inventoried, identify the asset features of the data assets to be inventoried included in each piece of the metadata, and assign different asset identifiers to data assets to be inventoried whose asset similarity between any two asset features is smaller than a preset similarity threshold, wherein the metadata includes the data assets to be inventoried; a cross-time matching module, configured to determine the actual similarity between any two asset features in the metadata of adjacent time points, determine data assets to be inventoried whose actual similarity is not smaller than the preset similarity threshold to be the same data asset across time points, unify the asset identifiers across the time points, and determine the number of all asset identifiers as an initial inventory result; a model tracking module, configured to process the metadata through a trained deep learning inventory model to obtain the initial knowledge node position and graph embedding coordinates of the data assets included in the metadata, and to track the life cycle of the data assets with a trained inventory tracking model to obtain a target tracking result; and a trajectory verification module, configured to determine a change trajectory of each data asset to be inventoried based on the initial knowledge node position and the graph embedding coordinates, and to verify the initial inventory result based on the change trajectory and the target tracking result to obtain an inventory result.
- 9. A knowledge graph-based enterprise data asset intelligent inventory device, comprising a memory, a processor, and a knowledge graph-based enterprise data asset intelligent inventory program stored in the memory and executable on the processor, the program being configured to implement the steps of the knowledge graph-based enterprise data asset intelligent inventory method of any one of claims 1 to 7.
- 10. A medium, wherein a knowledge graph-based enterprise data asset intelligent inventory program is stored on the medium, and the program, when executed by a processor, implements the steps of the knowledge graph-based enterprise data asset intelligent inventory method according to any one of claims 1 to 7.
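The identifier pruning step recited in claim 6 can be illustrated with a short sketch. The claims do not define "active frequency", so this sketch assumes it is the number of time points at which an identifier appears; the function name `prune_inactive` and the default threshold are likewise hypothetical.

```python
from collections import Counter


def prune_inactive(ids_per_snapshot, min_frequency=2):
    """Drop asset identifiers that appear at too few time points.

    ids_per_snapshot: list of identifier lists, one per time point.
    Assumption: "active frequency" means the number of snapshots in which
    an identifier occurs (the claims leave the term undefined).
    Returns (set of kept identifiers, inventory count).
    """
    # set(ids) ensures an identifier counts at most once per time point
    frequency = Counter(
        asset_id for ids in ids_per_snapshot for asset_id in set(ids)
    )
    kept = {asset_id for asset_id, n in frequency.items() if n >= min_frequency}
    return kept, len(kept)


kept, inventory_count = prune_inactive([[1, 2], [1, 3], [1, 3]])
print(sorted(kept), inventory_count)
```

Here identifier 2 is seen at only one time point and is removed, so the count of remaining identifiers becomes the inventory result in the sense of claim 6.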
Description
Knowledge graph-based enterprise data asset intelligent inventory method and system
Technical Field
The invention relates to the technical field of enterprise data management, and in particular to a knowledge graph-based method and system for the intelligent inventory of enterprise data assets.
Background
With the continued deepening of enterprise digital transformation, data has become one of the core strategic assets. However, over the course of long-term informatization, enterprises accumulate large amounts of scattered, heterogeneous, cross-system data resources, which commonly leads to problems such as numerous asset types, wide-ranging sources, complex structures, data silos, an unclear asset base, redundant duplication and unclear responsibilities. Traditional data asset management relies mainly on manual inventory or simple statistics over static metadata, which makes dynamic identification and accurate tracking across the full life cycle of a data asset difficult. In particular, frequently changing data tables, temporary data sets and logically reused data interfaces give rise to omissions, misjudgments and repeated counting, which seriously affect the efficiency and accuracy of data management.
In recent years, some enterprises have tried to introduce automation tools or rule-based metadata analysis systems to inventory data assets, but these methods generally lack a deep understanding of the semantic features and evolution relationships of data assets. They cannot effectively identify homologous data assets across time and systems, nor can they cope with identity misjudgments caused by small changes in data schemas. In addition, the associations among data assets and their context information are underused, so the inventory process is isolated and static and lacks context awareness.
The foregoing is provided merely to facilitate understanding of the technical solutions of the present invention and is not an admission that the foregoing is prior art.
Disclosure of Invention
The main object of the invention is to provide a knowledge graph-based enterprise data asset intelligent inventory method and system, aiming to solve the technical problem that existing enterprise data asset inventory methods struggle to achieve accurate identification, continuous tracking and semantic association of multi-source, heterogeneous and dynamically evolving data assets, so that inventory results suffer from omissions, duplicates and misjudgments.
In order to achieve the above object, the present invention provides a knowledge graph-based enterprise data asset intelligent inventory method, the method comprising: receiving metadata of data assets to be inventoried, identifying the asset features of the data assets to be inventoried included in each piece of the metadata, and assigning different asset identifiers to data assets to be inventoried whose asset similarity between any two asset features is smaller than a preset similarity threshold, wherein the metadata includes the data assets to be inventoried; determining the actual similarity between any two asset features in the metadata of adjacent time points, determining data assets to be inventoried whose actual similarity is not smaller than the preset similarity threshold to be the same data asset across time points, unifying the asset identifiers across the time points, and determining the number of all asset identifiers as an initial inventory result; processing the metadata through a trained deep learning inventory model to obtain the initial knowledge node position and graph embedding coordinates of the data assets included in the metadata, and tracking the life cycle of the data assets with a trained inventory tracking model to obtain a target tracking result; and determining a change trajectory of each data asset to be inventoried based on the initial knowledge node position and the graph embedding coordinates, and verifying the initial inventory result based on the change trajectory and the target tracking result to obtain an inventory result.
Optionally, receiving the metadata of the data assets to be inventoried comprises: obtaining structured data and unstructured data through an enterprise multi-source data acquisition interface, and performing semantic alignment and entity disambiguation on the structured data and the unstructured data to obtain a snapshot of the data assets to be inventoried; extracting metadata from the snapshot at a preset time interval to obtain initial metadata of the data assets to be inventoried; and performing a preprocessing operation on the initial metadata to obtain the metadata of the data assets to be inventoried, wherein t