CN-121979838-A - Unified management method and system for unstructured data

Abstract

The invention belongs to the technical field of data processing. A unified management method and system for unstructured data comprises: creating a file set that carries a tenant ID attribute as a unified container, so that multi-tenant data are isolated; mapping internally and externally stored files into the virtual path space of the file set through virtual path mapping configurations of two kinds, subscription definitions and association definitions; after an access request is received, extracting and parsing the virtual path, and constructing a complete storage path from the file set ID and the mapping configuration; performing unified authority verification at the file gateway layer and shielding underlying storage differences behind a unified API; and finally calling the adapter corresponding to the data source type to access the underlying storage. This achieves unified access to unstructured data, improves the standardization of data management and access efficiency, safeguards multi-tenant data security, and suits multi-storage scenarios.

Inventors

  • LI YANG
  • SUN HAO
  • CHEN TONG
  • SUN LU
  • LI ZHONGZE
  • WU SHIWEI
  • WANG ZHAOQI
  • LI XINXIN
  • FENG YUE

Assignees

  • 山东亿云信息技术有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-05

Claims (10)

  1. A unified management method for unstructured data, characterized by comprising the following steps: creating a file set as a unified container for unstructured data, wherein the file set carries a tenant ID attribute to support multi-tenant isolation, and the file set data of different tenants are isolated from each other; mapping files stored inside the platform and files in externally registered storage systems into the virtual path space of the file set through a virtual path mapping configuration, wherein the virtual path mapping configuration comprises two modes, subscription definition and association definition; receiving a file access request, extracting a virtual path from the request URI, parsing the levels of the virtual path to distinguish a basic virtual path from a real file path, looking up the virtual path mapping configuration according to the file set ID and the basic virtual path, and combining the basic path in the virtual path mapping configuration with the real file path to construct a complete storage path; performing unified authority verification at a file gateway layer, shielding underlying storage differences through a unified API provided by a file gateway module, and responding to the file access request based on the constructed complete storage path; and calling the data source adapter corresponding to the data source type, and accessing the underlying storage system based on the constructed complete storage path, thereby achieving unified access to unstructured data.
  2. The unified management method for unstructured data according to claim 1, wherein the attributes of the file set further comprise a unique file set identifier, a file set name consisting of a Chinese name and an English code, an architecture ID, and a sensitivity level.
  3. The unified management method for unstructured data according to claim 1, wherein the subscription definition is used to map a path in an external storage system, being an object storage source or a service data source, to a virtual path in the file set, and the association definition is used to associate a single file directly with the file set.
  4. The unified management method for unstructured data according to claim 1, wherein the format of the virtual path is /tenant ID/file set code/virtual path/file path; after parsing, the first three levels form the basic virtual path, and the fourth and later levels form the real file path; and when the virtual path mapping configuration is looked up and a subscription definition is found, storage configuration information is obtained according to the data source type and ID, and the basic path in the subscription definition is then combined with the real file path to form the complete storage path.
  5. The unified management method for unstructured data according to claim 1, wherein the unified API comprises three interfaces: the first is a file list interface, which supports the parameters prefix (prefix filtering), delimiter (separator), max-keys (maximum number of results returned) and start-after (paging cursor), and returns a file list and a virtual path list; the second is a file download interface, which supports Range requests and custom Content-Type and Content-Disposition; the third is a file information interface, which returns metadata information.
  6. The unified management method for unstructured data according to claim 1, wherein the unified authority verification mechanism comprises two layers of verification: the first layer is application key verification, in which a unique AppKey is assigned to each application, the AppKey is used to verify whether the application has authority to access the designated file set, and a universal AppKey is also supported for calls made inside the system; the second layer is file-set-level authority control, in which an authority association is established between the file set and the application, so that only an authorized application can access the corresponding file set.
  7. A unified management system for unstructured data, comprising: a file set creation unit configured to create a file set as a unified container for unstructured data, the file set carrying a tenant ID attribute to support multi-tenant isolation, the file set data of different tenants being isolated from each other; a path mapping configuration unit configured to map files stored inside the platform and files in externally registered storage systems into the virtual path space of the file set through a virtual path mapping configuration, wherein the virtual path mapping configuration comprises two modes, subscription definition and association definition; a path analysis construction unit configured to receive a file access request, extract a virtual path from the request URI, parse the levels of the virtual path to distinguish a basic virtual path from a real file path, look up the virtual path mapping configuration according to the file set ID and the basic virtual path, and construct a complete storage path by combining the basic path in the virtual path mapping configuration with the real file path; a gateway authority response unit configured to perform unified authority verification at a file gateway layer, shield underlying storage differences through a unified API provided by a file gateway module, and respond to the file access request based on the constructed complete storage path; and a data source adapting unit configured to call the data source adapter corresponding to the data source type and access the underlying storage system based on the constructed complete storage path, thereby achieving unified access to unstructured data.
  8. A computer device, comprising a processor and a computer-readable storage medium; the processor being adapted to execute a computer program; the computer-readable storage medium storing a computer program which, when executed by the processor, implements the unified management method for unstructured data according to any one of claims 1 to 6.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor to perform the unified management method for unstructured data according to any one of claims 1 to 6.
  10. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the unified management method for unstructured data according to any one of claims 1 to 6.
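
The path handling recited in claims 1 and 4 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `SubscriptionDefinition`, `split_virtual_path`, `build_storage_path`, `ADAPTERS` and `access`, the source type strings "object" and "file", and the in-memory mapping table keyed by (file set ID, basic virtual path) are all assumptions made for illustration, and the adapters are stand-ins that return a string instead of calling a real storage client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubscriptionDefinition:
    """Hypothetical mapping record for one basic virtual path."""
    source_type: str  # illustrative values: "object" or "file"
    source_id: str    # ID that would be used to fetch storage configuration
    base_path: str    # path prefix inside the underlying storage system


def split_virtual_path(virtual_path: str) -> tuple[str, str]:
    """Split /tenant ID/file set code/virtual path/file path into
    (basic virtual path, real file path): first three levels vs. the rest."""
    levels = [level for level in virtual_path.split("/") if level]
    if len(levels) < 4:
        raise ValueError(f"expected at least 4 levels, got {len(levels)}")
    return "/" + "/".join(levels[:3]), "/".join(levels[3:])


def build_storage_path(file_set_id, virtual_path, mappings):
    """Look up the mapping by (file set ID, basic virtual path) and join
    its base path with the real file path."""
    basic, real = split_virtual_path(virtual_path)
    definition = mappings.get((file_set_id, basic))
    if definition is None:
        raise LookupError(f"no mapping for {basic!r} in file set {file_set_id!r}")
    return definition, definition.base_path.rstrip("/") + "/" + real


# Stand-in adapters (assumption): real ones would call a storage client.
ADAPTERS: dict[str, Callable[[str], str]] = {
    "object": lambda path: f"object-store read: {path}",
    "file": lambda path: f"file-system read: {path}",
}


def access(file_set_id, virtual_path, mappings) -> str:
    """Resolve the storage path, then dispatch on the data source type."""
    definition, storage_path = build_storage_path(file_set_id, virtual_path, mappings)
    return ADAPTERS[definition.source_type](storage_path)
```

Under these assumptions, a mapping from `("fs-001", "/tenant-a/contracts/scans")` to the base path `bucket-a/archive/` resolves the virtual path `/tenant-a/contracts/scans/2025/a.pdf` to the storage path `bucket-a/archive/2025/a.pdf`.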

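The two-layer authority verification of claim 6 can likewise be sketched in Python. This is only one possible reading: the function name `is_access_allowed`, the dictionaries `app_by_key` and `grants`, and the key value `internal-universal-key` are hypothetical, and treating the universal AppKey as bypassing the file-set-level check is an assumption, since the claim says only that a universal AppKey is supported for calls made inside the system.

```python
# Hypothetical universal key for system-internal calls (assumption).
UNIVERSAL_APP_KEYS = frozenset({"internal-universal-key"})


def is_access_allowed(app_key, file_set_id, app_by_key, grants):
    """Return True if the caller may access the given file set.

    app_by_key: maps each AppKey to its application ID (layer 1).
    grants:     maps an application ID to the set of file set IDs
                it has been authorized for (layer 2).
    """
    # Assumption: a universal AppKey skips the per-file-set check.
    if app_key in UNIVERSAL_APP_KEYS:
        return True
    # Layer 1: the AppKey must belong to a registered application.
    app_id = app_by_key.get(app_key)
    if app_id is None:
        return False
    # Layer 2: the application must be associated with the file set.
    return file_set_id in grants.get(app_id, frozenset())
```

A gateway built along these lines would run this check before resolving the virtual path, so that an unauthorized request never reaches a data source adapter.
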
Description

Unified management method and system for unstructured data

Technical Field

The invention relates to the technical field of data processing, in particular to a unified management method and system for unstructured data.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

With the rapid development of big data and cloud computing technology, enterprises and organizations face the challenge of managing massive amounts of unstructured data. Unstructured data is typically stored in the form of files, including documents, pictures, video, audio, and many other types. In practical applications, these data are often scattered across multiple different storage systems, such as object storage systems (Amazon S3, Alibaba Cloud OSS, MinIO, etc.) and file systems (local file systems, the Network File System (NFS), distributed file systems (HDFS), etc.), which makes management and access very complicated.

The main problems are as follows: (1) unstructured data from different sources are scattered across different storage systems without a unified organization and management mechanism, which makes the data difficult to find and manage; (2) different storage systems provide different access interfaces and protocols, so an application must write different access code for each storage system, which increases development and maintenance costs; (3) authority management is complex, because different storage systems have different authority management mechanisms, making unified authority control and access auditing difficult; and (4) there is no unified view: applications cannot be given a unified data view and must know the details of the underlying storage systems, which increases system complexity, and the files required by an AI model are scattered in various places and can only be merged manually, with no way to manage them centrally and form a unified data view.

Disclosure of Invention

To overcome the deficiencies of the prior art, the invention provides a unified management method and system for unstructured data which, by using a file set as a unified container in combination with a virtual path mapping mechanism and a unified file gateway, achieve unified management of and access to the platform's own unstructured files and unstructured files externally registered to the platform.

To achieve the above purpose, the invention adopts the following technical scheme:

In a first aspect, the invention provides a unified management method for unstructured data.

A unified management method for unstructured data comprises the following steps:

creating a file set as a unified container for unstructured data, wherein the file set carries a tenant ID attribute to support multi-tenant isolation, and the file set data of different tenants are isolated from each other;

mapping files stored inside the platform and files in externally registered storage systems into the virtual path space of the file set through a virtual path mapping configuration, wherein the virtual path mapping configuration comprises two modes, subscription definition and association definition;

receiving a file access request, extracting a virtual path from the request URI, parsing the levels of the virtual path to distinguish a basic virtual path from a real file path, looking up the virtual path mapping configuration according to the file set ID and the basic virtual path, and constructing a complete storage path by combining the basic path in the virtual path mapping configuration with the real file path;

performing unified authority verification at a file gateway layer, shielding underlying storage differences through a unified API provided by a file gateway module, and responding to the file access request based on the constructed complete storage path;

and calling the data source adapter corresponding to the data source type, and accessing the underlying storage system based on the constructed complete storage path, thereby achieving unified access to unstructured data.

In one implementation of the first aspect of the invention, the attributes of the file set further comprise a unique file set identifier, a file set name consisting of a Chinese name and an English code, an architecture ID, and a sensitivity level.

In one implementation of the first aspect of the invention, the subscription definition is used to map a path in an external storage system, being an object storage source or a service data source, to a virtual path in the file set, and the association definition is used to associate a single file directly with the file set.

In one implementation of the first aspect of the invention, the format of the virtual path is /tenant ID/file set code/virtual path/file path, the first three layers after