CN-121996154-A - Cross-domain multi-data-center storage management method for computing-electricity fusion

Abstract

The invention discloses a cross-domain multi-data-center storage management method for computing-electricity fusion, comprising: 1) performing cross-domain abstract modeling of data center storage resources distributed across different regions, namely assigning each data center a globally unique identifier or path, abstracting the storage nodes of each data center into a unified logical storage pool, and generating a resource vector from the storage capacity, computing power and power state of each data center as a ternary-coupled virtualized representation of that data center; 2) jointly optimizing data storage based on power supply, network bandwidth and storage load, migrating task data identified as highly concurrent and I/O-intensive to the corresponding data center, and reducing energy consumption through cache tiering and I/O path optimization; and 3) performing consistency checks on data replicas across data centers, and using a time-series prediction model to analyze trends in the load and power of storage nodes, so as to achieve fault tolerance and adaptive repair of the storage nodes.
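Step 1 of the abstract can be illustrated with a minimal Python sketch. It assumes a resource vector with one field each for storage, computing power and power state, and a logical pool that maps global path prefixes to data centers. The class names, field names, units and the prefix-matching rule are all hypothetical; the patent text does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ResourceVector:
    """Hypothetical three-part description of one data center."""
    storage_free_tb: float      # storage capacity still available
    compute_free_pflops: float  # computing power still available
    green_power_ratio: float    # share of current supply that is green, 0..1


@dataclass
class LogicalPool:
    """Assumed unified pool: global path prefix -> data center id."""
    mounts: Dict[str, str] = field(default_factory=dict)
    vectors: Dict[str, ResourceVector] = field(default_factory=dict)

    def register(self, dc_id: str, prefix: str, vector: ResourceVector) -> None:
        # Each data center gets a globally unique prefix and a resource vector.
        self.mounts[prefix] = dc_id
        self.vectors[dc_id] = vector

    def resolve(self, global_path: str) -> Tuple[str, str]:
        # Longest-prefix match so that nested prefixes resolve unambiguously.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            base = prefix.rstrip("/")
            if global_path == base or global_path.startswith(base + "/"):
                local = global_path[len(base):].lstrip("/")
                return self.mounts[prefix], local
        raise KeyError(global_path)
```

A client would call `resolve("/global/west/run1/file.dat")` and receive the owning data center plus the path local to it, which is one simple way to obtain transparent access without the client knowing where the data lives.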

Inventors

  • CHENG YAODONG
  • Kui Lichang
  • LI HAIBO
  • GU MINHAO
  • BAI YUNXIANG

Assignees

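  • 四川天府新区宇宙线研究中心 (Sichuan Tianfu New Area Cosmic Ray Research Center)
  • 中国科学院高能物理研究所 (Institute of High Energy Physics, Chinese Academy of Sciences)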

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-12

Claims (7)

  1. A cross-domain multi-data-center storage management method for computing-electricity fusion, comprising the following steps: 1) performing cross-domain abstract modeling of data center storage resources distributed across different regions, namely assigning each data center a globally unique identifier or path, abstracting the storage nodes of each data center into a unified logical storage pool so as to realize cross-domain global addressing and transparent access, and generating a resource vector from the storage capacity, computing power and power state of each data center as a ternary-coupled virtualized representation of that data center; 2) jointly optimizing data storage based on power supply, network bandwidth and storage load, migrating task data identified as highly concurrent and I/O-intensive to the corresponding data center so as to consume clean energy preferentially, and reducing energy consumption through cache tiering and I/O path optimization; 3) performing consistency checks on data replicas across data centers, and repairing a data replica by log replay or replica reconstruction when it is inconsistent with the corresponding primary replica; using a time-series prediction model to analyze trends in the load and power of storage nodes, and automatically triggering replica reconstruction and/or I/O path switching for a storage node when the prediction indicates a potential anomaly in that node or a power fluctuation exceeding a preset threshold, so as to achieve fault tolerance and adaptive repair of the storage nodes.
  2. The method of claim 1, wherein jointly optimizing data storage based on power supply, network bandwidth and storage load and migrating the task data identified as highly concurrent and I/O-intensive to the corresponding data center comprises: taking minimum total energy consumption or maximum green electricity consumption as the objective function, selecting an optimal data center subject to set constraints, and migrating the task data identified as highly concurrent and I/O-intensive to the optimal data center, wherein the constraints comprise network bandwidth, storage load and data consistency.
  3. The method of claim 2, wherein the goal is to minimize total energy consumption while taking green power priority into account. Over the set of scheduling periods and the set of data centers, the minimized objective function [formula missing in source] is built from two terms: the non-green power consumed by storage and I/O tasks at each data center in each period, and the energy consumed by the same tasks at that data center in that period that is supplied by green power. The constraints for solving the objective function include: a) a storage capacity constraint: in any scheduling period, the total size of all objects placed on a data center cannot exceed that data center's maximum available capacity; b) a bandwidth constraint: in any scheduling period, the total bandwidth actually carried by a data center cannot exceed its maximum bandwidth; c) a delay constraint: the response delay of each request cannot exceed the maximum delay limit.
  4. The method of claim 2, wherein maximum green electricity consumption means that the increase in the data center's green electricity supply ratio is greater than a set ratio threshold and the electricity price is below a set threshold.
  5. The method of claim 1, 2 or 3, wherein the task data identified as highly concurrent and I/O-intensive are migrated to the corresponding data center through a tiered migration mechanism while the I/O request routing of clients is dynamically adjusted; the tiered migration mechanism divides data into three tiers (hot, warm and cold) according to access frequency and importance, and migrates hot data first so as to maximize the energy-efficiency benefit while guaranteeing the quality of service of frequently accessed paths.
  6. The method of claim 1, wherein the consistency check on data replicas across data centers comprises: selecting one data center to process write requests and generate the operation log, taking that data center's data as the primary replica and replicating it to the other data centers, periodically comparing the checksum of the primary replica with the checksum of each data replica, and checking consistency according to the checksums.
  7. The method of claim 1, 2 or 3, wherein physical resource information and the real-time status of the storage nodes in each data center are collected by a distributed metadata management system, the physical resource information comprising storage capacity, performance and type.
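The selection described in claims 2 and 3 can be sketched in Python as a filter-then-minimize step. This is only an illustration under stated assumptions: the claims' formulas are not preserved in the text, so the energy model here (non-green energy equals task energy times one minus the green supply ratio) and all class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    name: str
    free_capacity_gb: float     # remaining storage capacity
    free_bandwidth_gbps: float  # remaining network bandwidth
    latency_ms: float           # expected response delay for clients
    green_ratio: float          # share of supply that is green, 0..1


@dataclass
class Task:
    size_gb: float
    bandwidth_gbps: float
    energy_kwh: float           # energy needed for its storage and I/O work
    max_latency_ms: float       # delay upper limit


def non_green_energy(task: Task, dc: DataCenter) -> float:
    # Assumed model: the part of the task energy not covered by green power.
    return task.energy_kwh * (1.0 - dc.green_ratio)


def select_data_center(task: Task, centers: List[DataCenter]) -> Optional[DataCenter]:
    # Keep only centers that satisfy capacity, bandwidth and delay constraints.
    feasible = [
        dc for dc in centers
        if dc.free_capacity_gb >= task.size_gb
        and dc.free_bandwidth_gbps >= task.bandwidth_gbps
        and dc.latency_ms <= task.max_latency_ms
    ]
    if not feasible:
        return None
    # Among feasible centers, minimize the non-green energy consumed.
    return min(feasible, key=lambda dc: non_green_energy(task, dc))
```

A greedy per-task choice like this ignores the interaction between tasks across scheduling periods; a full treatment of claim 3 would solve the constrained problem over all periods and data centers at once.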

Description

Cross-domain multi-data-center storage management method for computing-electricity fusion

Technical Field

The invention belongs to the technical field of multi-data-center storage management, and relates to a method for managing a multi-data-center storage system in the context of computing-electricity fusion.

Background

With the explosive growth of data volumes in fields such as large scientific facilities and high-energy physics, traditional computing and storage centers face problems such as high energy consumption, uneven regional distribution, and disconnected network and power resources. How to make full use of the green, clean energy available in the western region and to schedule green computing power and storage systems cooperatively has become a key problem that urgently needs solving. Existing multi-data-center storage management techniques fall short in unified abstraction of cross-domain resources, energy-efficiency-driven storage scheduling, and data consistency guarantees, and struggle to meet the requirements of efficient, low-carbon and stable operation in the context of computing-electricity fusion.

Disclosure of Invention

In view of the problems in the prior art, the invention aims to provide a cross-domain multi-data-center storage management method for computing-electricity fusion that achieves unified abstraction of cross-domain resources, energy-efficiency-aware storage scheduling, and intelligent data consistency and operations, thereby improving the green power utilization rate and strengthening the storage system's capacity for highly concurrent service.
The method performs cross-domain abstract modeling of data center storage resources distributed across different regions, represents storage capacity, computing power and power state as a ternary-coupled virtualized representation, and constructs a unified namespace to realize global addressing. Based on an energy-efficiency-aware scheduling mechanism, it incorporates power supply, network bandwidth and storage load into a joint optimization and dynamically adjusts data placement, I/O path selection and tiered data migration. Through a cross-domain consistency protocol and an intelligent operations mechanism, it guarantees the consistency and reliable synchronization of data replicas across multiple data centers, and realizes fault tolerance and adaptive repair based on AI prediction. The invention thereby achieves efficient energy management, cross-domain coordination and highly reliable operations in a multi-data-center environment, and can effectively improve the green electricity consumption rate and support the highly concurrent access requirements of large scientific facilities.
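The tiered data migration mentioned above can be sketched in Python as follows. The sketch assumes a score equal to access frequency times an importance weight, with two cut-off values separating hot, warm and cold data. The score formula, the thresholds and all names are hypothetical; the text only states that tiers depend on access frequency and importance and that hot data migrates first.

```python
from dataclasses import dataclass
from typing import List

# Assumed cut-offs on the combined score; the source gives no concrete values.
HOT_THRESHOLD = 100.0
WARM_THRESHOLD = 10.0


@dataclass
class DataObject:
    name: str
    accesses_per_hour: float
    importance: float  # assumed weight, 1.0 means normal importance


def score(obj: DataObject) -> float:
    # Assumed combination of access frequency and importance.
    return obj.accesses_per_hour * obj.importance


def tier(obj: DataObject) -> str:
    s = score(obj)
    if s >= HOT_THRESHOLD:
        return "hot"
    if s >= WARM_THRESHOLD:
        return "warm"
    return "cold"


def migration_order(objects: List[DataObject]) -> List[DataObject]:
    # Hot data first, then warm, then cold; higher scores first within a tier.
    rank = {"hot": 0, "warm": 1, "cold": 2}
    return sorted(objects, key=lambda o: (rank[tier(o)], -score(o)))
```

Ordering within a tier by descending score is a design choice of this sketch, intended to move the most frequently accessed data earliest.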
The technical solution of the invention is a cross-domain multi-data-center storage management method for computing-electricity fusion, comprising the following steps: 1) performing cross-domain abstract modeling of data center storage resources distributed across different regions, namely assigning each data center a globally unique identifier or path, abstracting the storage nodes of each data center into a unified logical storage pool so as to realize cross-domain global addressing and transparent access, and generating a resource vector from the storage capacity, computing power and power state of each data center as a ternary-coupled virtualized representation of that data center; 2) jointly optimizing data storage based on power supply, network bandwidth and storage load, migrating task data identified as highly concurrent and I/O-intensive to the corresponding data center so as to consume clean energy preferentially, and reducing energy consumption through cache tiering and I/O path optimization; 3) performing consistency checks on data replicas across data centers, and repairing a data replica by log replay or replica reconstruction when it is inconsistent with the corresponding primary replica; using a time-series prediction model to analyze trends in the load and power of storage nodes, and automatically triggering replica reconstruction and/or I/O path switching for a storage node when the prediction indicates a potential anomaly in that node or a power fluctuation exceeding a preset threshold, so as to achieve fault tolerance and adaptive repair of the storage nodes.
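Step 3 can be illustrated with a short Python sketch covering both halves: a checksum comparison against the primary replica with repair by reconstruction, and a trend-based trigger. Two substitutions are assumptions of this sketch, not the patent's method: SHA-256 stands in for the unspecified checksum, and a least-squares linear trend stands in for the unspecified time-series prediction model. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    # SHA-256 is an assumed choice; the source only says "checksum".
    return hashlib.sha256(data).hexdigest()


def find_inconsistent_replicas(primary: bytes, replicas: Dict[str, bytes]) -> List[str]:
    # Compare each replica's checksum with that of the primary replica.
    expected = checksum(primary)
    return sorted(dc for dc, data in replicas.items() if checksum(data) != expected)


def repair_by_reconstruction(primary: bytes, replicas: Dict[str, bytes]) -> List[str]:
    # Rebuild every stale replica from the primary; returns the repaired centers.
    stale = find_inconsistent_replicas(primary, replicas)
    for dc in stale:
        replicas[dc] = primary
    return stale


def forecast_next(samples: List[float]) -> float:
    # Linear trend fitted by least squares, extrapolated one step ahead.
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if n == 1:
        return samples[0]
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)


def should_trigger(samples: List[float], threshold: float) -> bool:
    # Trigger replica reconstruction or I/O path switching ahead of time
    # when the predicted next value exceeds the preset threshold.
    return forecast_next(samples) > threshold
```

In practice the samples could be a node's recent load or power readings, and a positive `should_trigger` result would start the repair before the node actually fails.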
Preferably, jointly optimizing data storage based on power supply, network bandwidth and storage load and migrating the task data identified as highly concurrent and I/O-intensive to the corresponding data center comprises taking the minimum total e