CN-122023015-A - Marking method, device, medium and system for importance level of data asset
Abstract
The application provides a method, a device, a medium and a system for marking the importance level of a data asset. The method comprises: extracting data processing link information from a metadata management system and constructing a data lineage graph according to the data processing link information; solving an approximate minimum vertex cover for the data lineage graph; and marking the importance level of the data asset according to the approximate minimum vertex cover, and displaying the marking result visually. This addresses the problem that the prior art cannot find, in a reasonable time, the minimum core asset set covering global dependencies, which leaves the protection scope too large or omits key nodes.
Inventors
- HUANG ZHOU
- WANG WEI
- WANG ZHIHAO
Assignees
- 中国工商银行股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260122
Claims (10)
- 1. A method for marking an importance level of a data asset, comprising: extracting data processing link information from a metadata management system, and constructing a data lineage graph according to the data processing link information, wherein the data lineage graph is a directed graph, nodes of the data lineage graph represent data assets, and edges of the data lineage graph represent data flow directions; solving an approximate minimum vertex cover for the data lineage graph; and marking the importance level of the data asset according to the approximate minimum vertex cover, and displaying the marking result visually.
- 2. The method of claim 1, wherein solving the approximate minimum vertex cover for the data lineage graph comprises: solving the approximate minimum vertex cover with an iterative algorithm, wherein a protection set is empty at the start of the iterative algorithm, and in each iteration the node with the largest current degree is selected and added to the protection set and all edges associated with that node are deleted from the data lineage graph, until no edge remains in the data lineage graph.
- 3. The method of claim 2, wherein solving the approximate minimum vertex cover with the iterative algorithm further comprises: a selecting step of selecting, each time, the node currently associated with the most uncovered edges as a target node; a removing step of adding the target node to the initialized protection set and removing all edges associated with the target node from an edge set; and repeating the selecting step and the removing step until the edge set is empty.
- 4. The method of claim 1, wherein extracting data processing link information from the metadata management system and constructing the data lineage graph according to the data processing link information further comprises: extracting features of each link in the data lineage graph with a machine learning model, wherein the features of a link comprise the frequency of data flow, the data volume, the sensitivity of the data, and the degree of business association between upstream and downstream data assets; calculating a weight for each link with the machine learning model, and determining a score for each link based on its weight; and filtering out irrelevant or redundant links based on the score of each link.
- 5. The method of claim 1, wherein solving the approximate minimum vertex cover for the data lineage graph further comprises: constructing a target model, wherein the target model converts the minimum vertex cover problem of the data lineage graph into a linear programming problem, each node corresponds to a decision variable indicating whether the node is selected into the cover, the objective function minimizes the total number of selected nodes, and the constraints ensure that at least one endpoint of each edge in the graph is selected; and calculating an optimal solution of the target model with a linear programming solver to obtain the decision variable value of each node.
- 6. The method of claim 1, wherein after marking the importance level of the data asset, the method further comprises: counting the historical access frequency of the data asset over a sliding window to determine an abnormal access pattern; and adjusting the importance level of the data asset according to the abnormal access pattern.
- 7. The method of claim 6, wherein adjusting the importance level of the data asset according to the abnormal access pattern comprises: collecting all access records for the data asset, and identifying, with a statistical or machine learning algorithm, an access pattern that deviates from normal behavior as the abnormal access pattern; and processing the abnormal access pattern with a neural network model to obtain a target importance level, and adjusting the importance level of the data asset to the target importance level.
- 8. A device for marking an importance level of a data asset, comprising: a construction unit for extracting data processing link information from a metadata management system and constructing a data lineage graph according to the data processing link information, wherein the data lineage graph is a directed graph, nodes of the data lineage graph represent data assets, and edges of the data lineage graph represent data flow directions; a first processing unit for solving an approximate minimum vertex cover for the data lineage graph; and a second processing unit for marking the importance level of the data asset according to the approximate minimum vertex cover and displaying the marking result visually.
- 9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program, when run, controls a device in which the computer readable storage medium is located to perform the method of any one of claims 1 to 7.
- 10. A marking system for a level of importance of a data asset, comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-7.
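The linear programming model described in claim 5 can be written out as follows. This is a sketch under stated assumptions: the symbols V (node set), E (edge set) and x_v (decision variable of node v) are notation introduced here for illustration, and the relaxation of x_v from {0, 1} to the interval [0, 1] is an assumption needed to make the problem a linear program; the claim itself only says that each variable represents whether a node is selected.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{v \in V} x_v \\
\text{s.t.} \quad & x_u + x_v \ge 1 && \forall (u, v) \in E, \\
& 0 \le x_v \le 1 && \forall v \in V.
\end{aligned}
```

One common way to turn a fractional solution into a cover, not stated in the claims, is to select every node with x_v >= 1/2. Each constraint forces at least one endpoint of every edge to reach 1/2, so the rounded set covers all edges and contains at most twice as many nodes as a true minimum vertex cover.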
Description
Marking method, device, medium and system for importance level of data asset
Technical Field
The application relates to the technical field of big data, and in particular to a method for marking the importance level of a data asset, a device for marking the importance level of a data asset, a computer-readable storage medium, and a system for marking the importance level of a data asset.
Background
A data lineage graph can be huge, reaching hundreds of millions of entities, and is of great significance for the maintenance, management and protection of data assets. Evaluating the importance of data assets on a graph of this scale requires enormous manual effort.
Scheme one (full-graph protection) applies the same level of security protection to all nodes in the data lineage graph, for example by deploying encryption and access control policies for all database tables.
Scheme two (manual rule screening) manually delimits a subset of key assets for protection based on predefined rules (e.g., data sensitivity tags, access frequency), ignoring global dependencies.
The prior art has the following defects. Full-graph protection must invest security resources in all data assets, which causes redundant overhead (such as storage encryption and permission management costs). Manual rule screening relies on subjective experience, cannot adapt dynamically to lineage changes, and struggles to cover hidden but core derived assets (e.g., intermediate tables). Existing methods cannot find, in a reasonable time, the minimum core asset set covering global dependencies, so the protection scope is too large or key nodes are omitted.
Disclosure of Invention
The main purpose of the application is to provide a method for marking the importance level of a data asset, a device for marking the importance level of a data asset, a computer-readable storage medium, and a system for marking the importance level of a data asset, so as to at least solve the problem in the prior art that the minimum core asset set covering global dependencies cannot be found in a reasonable time, leaving the protection scope too large or key nodes omitted.
To achieve the above purpose, according to one aspect of the application, a method for marking the importance level of a data asset is provided, comprising: extracting data processing link information from a metadata management system, and constructing a data lineage graph according to the data processing link information, wherein the data lineage graph is a directed graph, its nodes represent data assets, and its edges represent data flow directions; solving an approximate minimum vertex cover for the data lineage graph; and marking the importance level of the data asset according to the approximate minimum vertex cover, and displaying the marking result visually.
Optionally, solving the approximate minimum vertex cover for the data lineage graph comprises: solving the approximate minimum vertex cover with an iterative algorithm, wherein a protection set is empty at the start of the iterative algorithm, and in each iteration the node with the largest current degree is selected and added to the protection set and all edges associated with that node are deleted from the data lineage graph, until no edge remains in the data lineage graph.
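The iterative algorithm described above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function name `greedy_vertex_cover`, the example asset names, and the tie-breaking rule (smallest node name first) are assumptions made here. Edge direction is ignored when counting degree, since an edge is covered when either endpoint is selected.

```python
from collections import defaultdict


def greedy_vertex_cover(edges):
    """Return a protection set chosen by the max-degree greedy heuristic.

    `edges` is an iterable of (upstream, downstream) node-name pairs.
    Hypothetical helper: name and signature are illustrative only.
    """
    remaining = set(edges)           # edges not yet covered
    incident = defaultdict(set)      # node -> uncovered edges touching it
    for u, v in remaining:
        incident[u].add((u, v))
        incident[v].add((u, v))

    protection_set = []              # empty at the start of the iteration
    while remaining:
        # Select the node touching the most uncovered edges.
        # Ties go to the smallest node name (an assumption, for determinism).
        target = max(sorted(incident), key=lambda n: len(incident[n]))
        protection_set.append(target)
        # Delete every edge associated with the selected node.
        for edge in list(incident[target]):
            remaining.discard(edge)
            for endpoint in edge:
                incident[endpoint].discard(edge)
        del incident[target]
    return protection_set


# Hypothetical lineage: one staging table feeds two downstream assets.
lineage_edges = [
    ("raw_orders", "stg_orders"),
    ("stg_orders", "fact_sales"),
    ("stg_orders", "dim_customer"),
    ("fact_sales", "report_daily"),
]
print(greedy_vertex_cover(lineage_edges))  # ['stg_orders', 'fact_sales']
```

In this example the intermediate table `stg_orders` is selected first because it touches three edges, which matches the stated goal of surfacing hidden but core derived assets. The greedy rule is a heuristic: it always returns a valid cover but does not guarantee the minimum one.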
Optionally, solving the approximate minimum vertex cover with the iterative algorithm further comprises: a selecting step of selecting, each time, the node currently associated with the most uncovered edges as a target node; a removing step of adding the target node to the initialized protection set and removing all edges associated with the target node from an edge set; and repeating the selecting step and the removing step until the edge set is empty.
Optionally, extracting data processing link information from the metadata management system and constructing the data lineage graph according to the data processing link information further comprises: extracting features of each link in the data lineage graph with a machine learning model, wherein the features of a link comprise the frequency of data flow, the data volume, the sensitivity of the data, and the degree of business association between upstream and downstream data assets; calculating a weight for each link with the machine learning model, and determining a score for each link based on its weight; and filtering out irrelevant or redundant links according to the scores of the links.
Optionally, solving the approximate minimum vertex cover for the data lineage graph further comprises: constructing a target model, wherein the target model converts the minimum vertex cover problem of the data lineage graph into a linear programming problem, each node corresponds to a decision vari