CN-122019549-A - Mixed directory management method and system for trusted data space of automobile industry
Abstract
The invention discloses a hybrid catalog management method and system for a trusted data space in the automotive industry. It is intended to resolve the conflict between data-sovereignty protection and query efficiency that affects purely centralized and purely federated catalog architectures. The method is built on a hybrid architecture of participating nodes, a centralized catalog service and data consumers. Public-layer metadata is first collected, preprocessed and automatically graded, then desensitized and synchronized differentially to the central catalog. Query requests are verified and normalized using DID identity authentication. A routing engine then selects the query path according to a multi-dimensional cost, fine-grained distributed authorization is applied to business-layer and sensitive-layer metadata, and the results from multiple sources are finally aggregated and returned. Experimental results show that query efficiency improves by one to two orders of magnitude compared with the purely federated mode, and that synchronization bandwidth is reduced by 97% compared with the purely centralized mode. The method thus balances efficient global discovery of data resources against strict data-sovereignty protection in the automotive industry and is suited to large-scale collaborative scenarios such as supply chain collaboration.
Inventors
- CHEN CHUAN
- CHENG XU
- LV WANG
- ZHU JUN
- JIA GUORUI
- WANG NA
- WU SHUYUE
- WANG GUAN
- LI JUNKAI
Assignees
- 中汽数据(天津)有限公司 (China Automotive Data (Tianjin) Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20260413
Claims (10)
- 1. A hybrid catalog management method for a trusted data space in the automotive industry, characterized in that it is implemented on a hybrid catalog system architecture comprising participating nodes, a centralized catalog service and data consumers, the method comprising the following steps: S1, a participating node collects original metadata from business systems related to the automotive industry and, through standardized preprocessing, generates metadata objects carrying feature vectors; S2, a metadata grading module of the participating node computes a sensitivity score for each metadata object, divides the metadata objects into a public layer, a business layer and a sensitive layer according to preset thresholds, and, as required, generates public-layer summary metadata from business-layer or sensitive-layer metadata; S3, the participating node desensitizes the public-layer metadata and the public-layer summary metadata and synchronizes the desensitized increment to the centralized catalog service using a differential synchronization strategy; the centralized catalog service stores the increment in a public metadata pool and updates a global resource index, providing the data basis for global queries; S4, a data consumer submits, through a unified query interface, a query request carrying a DID identity credential; the centralized catalog service first verifies the validity of the request signature and then normalizes the legitimate original query statement into a structured query expression, providing a uniform basis for the routing decision; S5, a query routing engine of the centralized catalog service analyzes the structured query expression layer by layer, computes the comprehensive cost of three routing modes, namely centralized, federated and hybrid routing, from multidimensional factors such as query complexity, number of nodes and access rights, and selects the routing mode with the minimum comprehensive cost as the optimal routing mode; S6, if the optimal routing mode involves access to business-layer or sensitive-layer metadata, the centralized catalog service forwards the query request to the corresponding participating nodes; each participating node verifies the identity and attribute set of the data consumer through a DID identity authentication module, makes an authorization decision based on an attribute-based access control policy, and grants access to metadata of the corresponding layer only for requests that pass authorization; and S7, according to the optimal routing mode, query results are collected from the centralized catalog service and/or the participating nodes that granted authorization, the multi-source results are fused by deduplication, ranking and credibility weighting, and the aggregated, unified query result is returned to the data consumer through the unified query interface.
- 2. The hybrid catalog management method for a trusted data space in the automotive industry of claim 1, wherein S1 comprises: performing standardized processing on unstructured or semi-structured data from different business sources, including field parsing, format unification, noise cleaning and structured extraction, so as to generate metadata objects in a uniform format; the original metadata set is given by Eq. (1) and the feature structure by Eq. (2):
$$M_i = \{m_{i1}, m_{i2}, \dots, m_{in_i}\} \qquad (1)$$
$$\mathbf{x}_{ij} = \Phi(m_{ij}) = (x_{ij1}, x_{ij2}, \dots, x_{ijd}) \qquad (2)$$
where $i$ is the number of the participating node; $m_{ij}$ is the $j$-th original metadata record collected by node $i$; $n_i$ is the number of original metadata records collected; $M_i$ is the original metadata set of node $i$; $\Phi(\cdot)$ is the metadata standardization and feature extraction function, comprising field parsing, denoising and structural conversion; and $\mathbf{x}_{ij}$ is the feature vector obtained by processing the original metadata $m_{ij}$, composed of $d$ feature dimensions and used for subsequent sensitivity calculation and grading.
- 3. The hybrid catalog management method for a trusted data space in the automotive industry of claim 2, wherein S2 comprises: the metadata grading module in the participating node computes a sensitivity score for each metadata record and determines its grade, taking into account whether the record contains information that may be disclosed; metadata is classified into three layers, namely a public layer, a business layer and a sensitive layer, by computing a sensitivity score from the metadata feature vector and automatically making a grading decision against set thresholds; the sensitivity score is given by Eq. (3) and the grading rule by Eq. (4):
$$S_{ij} = \sum_{k=1}^{d} w_k \, f(x_{ijk}) \qquad (3)$$
$$L_{ij} = \begin{cases} \text{public}, & S_{ij} < \theta_L \\ \text{business}, & \theta_L \le S_{ij} < \theta_H \\ \text{sensitive}, & S_{ij} \ge \theta_H \end{cases} \qquad (4)$$
where $x_{ijk}$ is the $k$-th feature value of $\mathbf{x}_{ij}$; $f(\cdot)$ is the feature sensitivity mapping function, which maps a feature value to its sensitivity contribution; $w_k$ is the sensitivity weight coefficient, reflecting how strongly each feature influences sensitivity; $S_{ij}$ is the overall sensitivity score of the metadata record; $\theta_L$ and $\theta_H$ are the lower and upper sensitivity thresholds, used to separate the public, business and sensitive layers; and $L_{ij}$ is the final grade of the metadata record.
- 4. The hybrid catalog management method for a trusted data space in the automotive industry of claim 3, wherein S3 comprises: desensitizing and filtering the fields of metadata classified into the public layer, retaining only summary information such as non-sensitive identifiers and business tags; the system then applies a differential synchronization strategy that synchronizes only the metadata items updated or added since the last synchronization; the synchronized records enter the public metadata pool of the centralized catalog service, and the global resource index is updated; desensitization is given by Eq. (5) and differential synchronization by Eq. (6):
$$\tilde{m}_{ij} = D(m_{ij}) \qquad (5)$$
$$\Delta_i(t) = P_i(t) \setminus P_i(t_i^{\mathrm{last}}) \qquad (6)$$
where $D(\cdot)$ is the desensitization function, which deletes or hashes sensitive fields and retains only the disclosable content of the metadata; $\tilde{m}_{ij}$ is the desensitized public-layer metadata record; $P_i(t)$ is the set of public-layer metadata generated by node $i$ at time $t$; $t_i^{\mathrm{last}}$ is the time of node $i$'s last synchronization to the central catalog; and $\Delta_i(t)$ is the increment to be uploaded in this synchronization.
- 5. The hybrid catalog management method for a trusted data space in the automotive industry of claim 4, wherein S4 comprises: the data consumer submits a query request through the unified query interface together with its DID identity credential, and the centralized catalog service first verifies the signature of the querying party to ensure that the request comes from a legitimate entity; signature verification is given by Eq. (7) and query normalization by Eq. (8):
$$\mathrm{Verify}\big(pk_u,\ \sigma_u,\ (q, A_u)\big) \in \{\text{true}, \text{false}\} \qquad (7)$$
$$\hat{q} = N(q) \qquad (8)$$
where $pk_u$ is the public key of the data consumer; $\sigma_u$ is the data consumer's digital signature over the query request; $q$ is the original query expression; $A_u$ is the consumer's attribute set; $\mathrm{Verify}(\cdot)$ is the digital signature verification function that checks the validity of the request body; $N(\cdot)$ is the query normalization function that converts the input query into a unified format; and $\hat{q}$ is the normalized standard query expression.
- 6. The hybrid catalog management method for a trusted data space in the automotive industry of claim 5, wherein S5 comprises: the query routing engine analyzes the normalized query $\hat{q}$ to determine whether only public metadata is needed or whether business-layer or sensitive information is involved, and, taking into account factors such as query complexity, number of nodes and access rights, selects one of three query paths: centralized, federated or hybrid; the cost function is given by Eq. (9) and the optimal routing by Eq. (10):
$$C(r, \hat{q}) = \alpha \, T(r, \hat{q}) + \beta \, \big(1 - \mathrm{Cov}(r, \hat{q})\big) \qquad (9)$$
$$r^{*} = \arg\min_{r \in \{r_C,\, r_F,\, r_H\}} C(r, \hat{q}) \qquad (10)$$
where $r$ is a candidate routing mode, comprising the central catalog query mode $r_C$, the federated node query mode $r_F$ and the hybrid query mode $r_H$; $\hat{q}$ is the normalized query expression; $T(r, \hat{q})$ is the estimated latency of executing $\hat{q}$ in mode $r$; $\mathrm{Cov}(r, \hat{q})$ is the coverage of the query requirement achieved by mode $r$; $\alpha$ and $\beta$ are weight parameters that control the relative influence of efficiency and coverage on routing; $C(r, \hat{q})$ is the comprehensive cost of executing $\hat{q}$ in mode $r$; and $r^{*}$ is the optimal query routing mode, which minimizes the cost.
- 7. The hybrid catalog management method for a trusted data space in the automotive industry of claim 6, wherein S6 comprises: for a query request that needs to access business-layer or sensitive-layer metadata, the centralized catalog service forwards the query to the relevant participating nodes; the node-side DID identity authentication module verifies the identity of the query initiator and makes an authorization decision according to the local access policy, and only records that pass authorization are returned, thereby achieving distributed, fine-grained access control; the access policy function is given by Eq. (11) and the authorization decision by Eq. (12):
$$\pi(A_u, o) \in \{\text{permit}, \text{deny}\} \qquad (11)$$
$$R_i = \{\, m_{ij} \in M_i \mid L_{ij} \in \{\text{business}, \text{sensitive}\},\ \pi(A_u, m_{ij}) = \text{permit} \,\} \qquad (12)$$
where $\pi(A_u, o)$ is the attribute-based authorization decision for resource $o$, which matches the user's attributes against the policy and returns permit or deny; $A_u$ is the user's attribute set, including information such as role, organization and business-domain licenses; $m_{ij}$ is a metadata record of participating node $i$; and $R_i$ is the set of business-layer or sensitive-layer metadata that the node returns once authorization has passed.
- 8. The hybrid catalog management method for a trusted data space in the automotive industry of claim 7, wherein S7 comprises: according to the query routing type, the system merges the query results from the central catalog and from multiple participating nodes, performs deduplication, ranking and credibility weighting on records from different sources, and returns the final result to the data consumer in a unified manner; federated result merging is given by Eq. (13) and hybrid-mode result aggregation by Eq. (14):
$$R_F = \bigcup_{i} R_i \qquad (13)$$
$$R = F(R_C, R_F) \qquad (14)$$
where $R_i$ is the query result set from participating node $i$; $R_F$ is the set of query results from multiple participating nodes; $R_C$ is the set of query results from the central catalog; $\bigcup$ is the set union operation used to merge federated results; $F(\cdot)$ is the result fusion function in hybrid mode, which performs deduplication, ranking and relevance scoring; and $R$ is the result set finally returned to the data consumer.
- 9. A hybrid catalog management system for a trusted data space in the automotive industry, for implementing the hybrid catalog management method for a trusted data space in the automotive industry of any one of claims 1 to 8, characterized in that: the system comprises participating nodes, a centralized catalog service and data consumers; each participating node deploys a local federated catalog, a metadata grading module and a DID identity authentication module; the centralized catalog service deploys a global resource index, a public metadata pool, a query routing engine and a policy decision point; and each data consumer is configured with a unified query interface and a DID identity credential.
- 10. The hybrid catalog management system for a trusted data space in the automotive industry of claim 9, wherein: the local federated catalog stores the complete local business metadata of the data provider; this metadata contains sensitive information and is maintained only within the local node; the metadata grading module automatically grades the local business metadata into a public layer, a business layer and a sensitive layer, so that metadata is managed by sensitivity level; the DID identity authentication module verifies the identity of a data consumer that initiates a data access request and completes access-right authorization and authentication according to preset rules; the global resource index maintains, in a unified manner, the data resource information and data location information of all participating nodes across the network, forming a globally searchable resource catalog; the public metadata pool collects and stores the public-layer metadata synchronized from each participating node, contains no sensitive information and supports global public queries; the query routing engine selects the execution path, central query or federated query, according to the complexity, data distribution and permission conditions of the current query request; the policy decision point formulates and enforces a globally unified access control policy and performs permission decisions and policy control for cross-node data access; the unified query interface provides a standardized, unified external entry point for querying data resources, enabling data consumers to search for and access global data resources; and the DID identity credential serves as the digital identity of the data consumer for submitting identity authentication, request signatures and permission verification to the system.
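The preprocessing of claim 2 (field parsing, format unification, noise cleaning and feature extraction) can be sketched as follows. This is a minimal illustration, not the patented implementation: the three feature dimensions, the keyword list `SENSITIVE_KEYWORDS` and the names `MetadataObject`, `standardize`, `extract_features` and `preprocess` are all hypothetical. Following the claim, `i` numbers the participating node, `j` indexes its records, and `features` plays the role of the record's feature vector.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list used as one feature dimension; the claim does not
# specify which features the extraction function computes.
SENSITIVE_KEYWORDS = ("vin", "owner", "phone", "location", "price")


@dataclass
class MetadataObject:
    node_id: int      # i: participating node number
    record_id: int    # j: index of the record within node i's metadata set
    fields: dict      # standardized field name -> cleaned value
    features: tuple   # feature vector used later for grading


def standardize(raw: dict) -> dict:
    """Field parsing, format unification and noise cleaning (simplified)."""
    cleaned = {}
    for key, value in raw.items():
        # Unify field names: lower case, runs of non-alphanumerics become "_".
        name = re.sub(r"[^a-z0-9]+", "_", str(key).strip().lower()).strip("_")
        # Treat None and blank strings as noise and drop them.
        if value is None or (isinstance(value, str) and not value.strip()):
            continue
        cleaned[name] = value.strip() if isinstance(value, str) else value
    return cleaned


def extract_features(fields: dict) -> tuple:
    """Map a standardized record to a fixed-length feature vector (d = 3)."""
    text = " ".join(f"{k} {v}" for k, v in fields.items()).lower()
    keyword_hits = sum(text.count(kw) for kw in SENSITIVE_KEYWORDS)
    has_personal = any(w in k for k in fields for w in ("owner", "phone"))
    return (float(len(fields)), float(keyword_hits), 1.0 if has_personal else 0.0)


def preprocess(node_id: int, raw_records: list) -> list:
    """Build node i's metadata set as MetadataObjects carrying feature vectors."""
    objects = []
    for j, raw in enumerate(raw_records, start=1):
        fields = standardize(raw)
        objects.append(MetadataObject(node_id, j, fields, extract_features(fields)))
    return objects
```

For example, `preprocess(7, [{"Owner Phone ": " 138 ", "Part No": "A-1", "Note": ""}])` drops the blank `Note` field, keeps the fields `{"owner_phone": "138", "part_no": "A-1"}` and produces the feature vector `(2.0, 2.0, 1.0)`.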
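The weighted sensitivity score and three-way threshold rule of claim 3 might look like this, assuming the score is a weighted sum $S = \sum_k w_k f(x_k)$ and that a score equal to a threshold falls into the higher layer. The mapping `clip01`, the weights and the thresholds are hypothetical; the claim leaves the mapping function, the weight coefficients and the upper and lower thresholds to be configured.

```python
def clip01(value: float) -> float:
    """Hypothetical feature sensitivity mapping f: clip a feature value to [0, 1]."""
    return max(0.0, min(value, 1.0))


def sensitivity_score(x, weights, f=clip01) -> float:
    """Weighted sum of per-feature sensitivity contributions, sum_k w_k * f(x_k)."""
    if len(x) != len(weights):
        raise ValueError("feature vector and weight vector must have the same length")
    return sum(w * f(v) for w, v in zip(weights, x))


def grade(score: float, theta_low: float, theta_high: float) -> str:
    """Map a score to a layer using lower/upper thresholds theta_low < theta_high."""
    if not theta_low < theta_high:
        raise ValueError("theta_low must be smaller than theta_high")
    if score < theta_low:
        return "public"
    if score < theta_high:
        return "business"
    return "sensitive"
```

With hypothetical weights `(0.1, 0.3, 0.6)`, the feature vector `(2.0, 2.0, 1.0)` scores about 1.0 and is graded `"sensitive"` under thresholds 0.3 and 0.7. Tuning the two thresholds directly trades how much metadata becomes globally visible against how much stays on the node.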
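The two operations of claim 4, desensitization and differential synchronization, are sketched below under stated assumptions. The field lists `DROP_FIELDS` and `HASH_FIELDS` and the key `NODE_KEY` are hypothetical; the claim only says that sensitive fields are deleted or hashed. A keyed HMAC is used here instead of a bare hash because a plain hash of a low-entropy value such as a VIN can be reversed by enumeration. The increment is computed by comparing the current public-layer set with its state at the last synchronization.

```python
import hashlib
import hmac

# Hypothetical field policy and node-local key (assumptions, not from the patent).
DROP_FIELDS = {"price", "owner_phone"}
HASH_FIELDS = {"vin"}
NODE_KEY = b"node-local-secret"


def desensitize(record: dict, key: bytes = NODE_KEY) -> dict:
    """Delete or hash sensitive fields, keeping the remaining fields unchanged."""
    out = {}
    for name, value in record.items():
        if name in DROP_FIELDS:
            continue
        if name in HASH_FIELDS:
            # Keyed hash: stable pseudonym that cannot be brute-forced without the key.
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[name] = digest.hexdigest()[:16]
        else:
            out[name] = value
    return out


def differential_sync(current: dict, last_synced: dict) -> dict:
    """Return records that are new or changed since the last synchronization.

    Both arguments map record_id -> desensitized record. Deletions are not
    propagated, matching the claim's "updated or newly added" wording.
    """
    return {rid: rec for rid, rec in current.items() if last_synced.get(rid) != rec}
```

Because only the increment is uploaded, the data sent per synchronization scales with the number of changed records rather than with the size of the node's whole public layer.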
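The verify-then-normalize order of claim 5 can be illustrated as below. A real DID credential would be checked with the consumer's public key, that is, with an asymmetric signature; to stay within the Python standard library, this sketch substitutes an HMAC over a canonical JSON encoding of the query and attribute set, which is only a stand-in and requires a shared secret. The query grammar accepted by `normalize` (`key = value` clauses joined by `AND`) is likewise a hypothetical example of a normalization function.

```python
import hashlib
import hmac
import json
import re


def canonical_body(query: str, attributes: dict) -> bytes:
    """Serialize (query, attributes) deterministically so both sides hash the same bytes."""
    payload = {"q": query, "attrs": attributes}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(query: str, attributes: dict, key: bytes) -> str:
    """Stand-in for the consumer's DID signature (HMAC, not public-key crypto)."""
    return hmac.new(key, canonical_body(query, attributes), hashlib.sha256).hexdigest()


def verify(query: str, attributes: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag over the request body and compare in constant time."""
    return hmac.compare_digest(sign(query, attributes, key), signature)


def normalize(query: str) -> dict:
    """Hypothetical grammar: 'k1 = v1 AND k2=v2' -> {'k1': 'v1', 'k2': 'v2'}."""
    terms = {}
    for clause in re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE):
        name, sep, value = clause.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"unsupported clause: {clause!r}")
        terms[name.strip().lower()] = value.strip().lower()
    return dict(sorted(terms.items()))
```

Signing the attribute set together with the query means a consumer cannot reuse a valid signature while claiming different attributes, which matters because those attributes later drive the node-side authorization.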
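The route selection of claim 6 reduces to an argmin over three candidate modes. This sketch assumes the comprehensive cost takes the form $C(r,\hat q) = \alpha\,T(r,\hat q) + \beta\,(1 - \mathrm{Cov}(r,\hat q))$, with estimated latency $T$ in seconds and coverage $\mathrm{Cov}$ in [0, 1]. The per-mode estimates in the usage note are invented numbers, since the patent does not say how latency and coverage are estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEstimate:
    latency_s: float   # estimated latency of the mode, in seconds
    coverage: float    # share of the query requirement the mode can answer, in [0, 1]


def route_cost(est: RouteEstimate, alpha: float, beta: float) -> float:
    """Assumed form of the comprehensive cost: alpha * T + beta * (1 - Cov)."""
    return alpha * est.latency_s + beta * (1.0 - est.coverage)


def choose_route(estimates: dict, alpha: float = 0.5, beta: float = 0.5) -> str:
    """Pick the mode ('centralized', 'federated' or 'hybrid') with minimum cost."""
    return min(estimates, key=lambda mode: route_cost(estimates[mode], alpha, beta))
```

With `alpha = beta = 0.5` and hypothetical latencies of 0.05 s (centralized), 2.0 s (federated) and 0.6 s (hybrid), a public-only query that every mode covers fully picks `"centralized"` (cost 0.025). If the public pool covers only 30% of the query, centralized costs 0.375, federated 1.0 and hybrid 0.3, so `"hybrid"` wins.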
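Node-side authorization under claim 7 can be sketched as a small attribute-based access control check. The `Policy` shape (each listed attribute must take one of a set of allowed values), the default-deny treatment of records without a policy, and the attribute names in the usage note are assumptions; production ABAC engines usually express such rules in a dedicated policy language such as XACML.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical ABAC rule: every listed attribute must take an allowed value."""
    required: dict  # attribute name -> set of allowed values


def authorize(user_attrs: dict, policy: Policy) -> str:
    """Attribute-based decision for one resource: 'permit' or 'deny'."""
    for name, allowed in policy.required.items():
        if user_attrs.get(name) not in allowed:
            return "deny"
    return "permit"


def answer_query(records: list, user_attrs: dict, policies: dict) -> list:
    """Return only the records whose policy permits this user.

    Records with no policy are denied (default deny).
    """
    permitted = []
    for rec in records:
        policy = policies.get(rec["id"])
        if policy is not None and authorize(user_attrs, policy) == "permit":
            permitted.append(rec)
    return permitted
```

For instance, a user with attributes `{"role": "engineer", "org": "OEM-A"}` receives a record whose policy requires exactly that role and organization, but not a record that also requires a business-domain license the user lacks.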
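Result aggregation under claims 1 and 8 (a federated union, then deduplication, credibility weighting and ranking in hybrid mode) might be implemented as follows. The per-record `relevance` field, the per-source `credibility` weights and the rule of keeping the highest-weighted copy of a duplicate are assumptions chosen for illustration; the claims name the operations but not their exact form.

```python
def federated_union(node_results: dict) -> dict:
    """Union of per-node result sets, keyed by resource id (first copy kept)."""
    merged = {}
    for records in node_results.values():
        for rec in records:
            merged.setdefault(rec["id"], rec)
    return merged


def fuse(central: list, node_results: dict, credibility: dict) -> list:
    """Hybrid-mode fusion: deduplicate, credibility-weight relevance, then rank.

    Each record carries a hypothetical 'relevance' in [0, 1]; 'credibility'
    maps a source name to a hypothetical weight (0.5 if the source is unknown).
    """
    best = {}
    sources = [("central", central)] + list(node_results.items())
    for source, records in sources:
        weight = credibility.get(source, 0.5)
        for rec in records:
            score = rec["relevance"] * weight
            # Keep only the highest-scoring copy of each resource id.
            if rec["id"] not in best or score > best[rec["id"]]["score"]:
                best[rec["id"]] = {**rec, "source": source, "score": score}
    # Rank by descending score; ties are broken by id for a stable order.
    return sorted(best.values(), key=lambda r: (-r["score"], r["id"]))
```

Weighting by source credibility lets a fresher, authoritative copy returned by a participating node outrank the summary copy held in the central public pool when both describe the same resource.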
Description
Technical Field
The invention relates to the technical field of trusted data space construction, and in particular to a hybrid catalog management method and system for a trusted data space in the automotive industry.
Background
In building a trusted data space for the automotive industry, the data catalog system is the core infrastructure for discovering, locating and accessing data resources, and its architecture directly determines both the circulation efficiency of data elements and the protection of data sovereignty. Mainstream solutions currently adopt either a fully centralized or a fully federated catalog architecture. A centralized catalog provides efficient global queries but requires participants to upload complete metadata, which brings risks such as loss of data ownership, single points of failure and performance bottlenecks. A purely federated catalog preserves local control of data but suffers from low query efficiency and insufficient global visibility in a large-scale, multi-node industrial ecosystem. With the rapid development of business scenarios such as intelligent connected vehicles and collaborative supply-chain manufacturing, the automotive industry places higher demands on the real-time performance, security and collaborative efficiency of data sharing. Existing single-architecture catalog systems struggle to meet the two core requirements of efficient global discovery and strict data ownership at the same time, which severely restricts the large-scale application of data spaces in complex industrial environments. Conventional solutions fall into two typical modes. The first is a fully centralized catalog service, which requires all participants to register their metadata uniformly with a central node.
Although this approach enables fast retrieval, it carries a single-point-of-failure risk, and because participants are concerned about data sovereignty, their willingness to register is generally low, which leaves catalog coverage insufficient. The second is a fully federated catalog query, in which each node maintains its own catalog and query requests must be broadcast to all nodes. Although this approach preserves data ownership, query latency increases markedly as the number of nodes grows, and efficient global statistics and resource discovery cannot be supported, so the requirements of large-scale, real-time collaborative scenarios in the automotive industry are difficult to meet.
Disclosure of Invention
The invention provides a hybrid catalog management method and system for a trusted data space in the automotive industry that solves at least one of the technical problems described in the background. To achieve this purpose, the invention adopts the following technical solution: a hybrid catalog management method for a trusted data space in the automotive industry, comprising the following steps executed by a computer on a hybrid catalog system architecture that comprises participating nodes, a centralized catalog service and data consumers: S1, a participating node collects original metadata from business systems related to the automotive industry and, through standardized preprocessing, generates metadata objects carrying feature vectors, which provide the data basis for subsequent metadata grading; S2, a metadata grading module of the participating node computes a sensitivity score for each metadata object, divides the metadata objects into a public layer, a business layer and a sensitive layer according to preset thresholds, and, as required, generates public-layer summary metadata from business-layer or sensitive-layer metadata, thereby defining the scope of metadata synchronization; S3, the participating node desensitizes the public-layer metadata and the public-layer summary metadata and then synchronizes the desensitized incremental public-layer metadata to the centralized catalog service using a differential synchronization strategy; the centralized catalog service stores the incremental public-layer metadata in a public metadata pool and updates a global resource index, providing the data basis for global queries; S4, a data consumer submits, through a unified query interface, a query request carrying a DID identity credential; the centralized catalog service first verifies the validity of the request signature and then normalizes the legitimate original query statement into a structured query expression, providing a uniform basis for the routing decision; S5, the query routing engine of the cen