CN-122018816-A - Cloud computing platform data storage method based on big data analysis
Abstract
The invention discloses a cloud computing platform data storage method based on big data analysis, relating to the technical field of cloud computing data storage. It addresses the technical problems of homogeneous resource allocation in traditional cloud computing storage systems, excessive access delay for hot-spot data, cost waste from cold data occupying high-performance resources, the lack of a dynamic heat-sensing and adaptive scheduling mechanism, and unbalanced load distribution across storage nodes. The invention constructs a full-flow technical system of "storage resource tier division, big-data access frequency analysis, dynamic tiered storage, real-time migration scheduling and load-balancing calibration". Compared with the traditional unified cloud storage mode, the method not only achieves low-latency, high-speed access for high-frequency hot-spot data, a performance-cost balance for medium-frequency warm-spot data and low-cost storage for low-frequency cold-spot data, but also keeps storage resources accurately matched to data heat at all times through a big-data real-time analysis and adaptive scheduling mechanism.
Inventors
- ZHOU GUANGYUN
- Xiao Junda
Assignees
- 宁德极臻科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260410
Claims (6)
- 1. The cloud computing platform data storage method based on big data analysis is characterized by comprising the following steps: Step one, dividing storage resources into a high grade S1, a medium grade S2 and a low grade S3 according to the input/output performance, number of copies, bandwidth and redundancy strategy of the cloud computing platform, wherein S1 adopts a strong redundancy strategy of 3 or more copies, S2 adopts a balanced redundancy strategy of 2 copies, and S3 adopts a redundancy strategy of 1 copy or erasure coding; S1 is used for high-frequency hot-spot data, S2 for medium-frequency warm-spot data, and S3 for low-frequency cold-spot data. Step two, collecting and calculating data access frequency: collecting access logs of all data objects in real time through a big-data collection module, counting access times over a sliding time window, and calculating data heat with an average access frequency formula. Step three, data heat grading judgment: setting a high-frequency threshold F_H and a low-frequency threshold F_L, dividing the data into hot-spot data, warm-spot data and cold-spot data, and binding to each data object a heat label that can be updated in real time. Step four, according to the data heat label and the correspondence between heat grade and storage resource grade, the cloud computing platform central controller automatically stores hot-spot data into S1, warm-spot data into S2 and cold-spot data into S3. Step five, refreshing the average access frequency at a fixed period, calculating the absolute value of the change between the current heat value and the historical heat value, and automatically triggering data migration across storage grades when the heat change amplitude reaches a preset frequency change threshold, so that the data is adapted to its latest heat grade. Step six, load balancing and calibration of storage nodes: calculating the load rate of each node in each grade, wherein the node load rate L_j is obtained by a weighted calculation over CPU utilization, disk input/output utilization, network bandwidth utilization and memory occupancy, and L_max is a node load safety upper-limit threshold preset by the system; when L_j > L_max, the data with the lowest access frequency in that grade is migrated down one grade.
- 2. The cloud computing platform data storage method based on big data analysis of claim 1, wherein the storage resource levels satisfy: number of high-grade S1 nodes > number of medium-grade S2 nodes > number of low-grade S3 nodes; and the higher the grade, the higher the input/output operations per second of a single node, the greater the number of copies, the larger the access bandwidth, and the lower the response delay.
- 3. The cloud computing platform data storage method based on big data analysis of claim 1, wherein the average access frequency formula is F_i = n_i / T, wherein F_i is the average access frequency of the i-th data object, T is the sliding time window length, a preset constant, and n_i is the number of accesses of the i-th data object within the T time units.
- 4. The cloud computing platform data storage method based on big data analysis according to claim 3, wherein the data heat classification judgment rule is: hot-spot data, F_i ≥ F_H; warm-spot data, F_L < F_i < F_H; cold-spot data, F_i ≤ F_L; wherein F_H is the high-frequency threshold and F_L is the low-frequency threshold.
- 5. The cloud computing platform data storage method based on big data analysis of claim 4, wherein the specific mode of data migration across storage levels is: ΔF_i = |F_i^new − F_i^old|, and migration is triggered when ΔF_i ≥ ΔF_threshold, wherein ΔF_threshold is the frequency change threshold, ΔF_i is the absolute value of the heat change, F_i^new is the latest heat value of the current period, and F_i^old is the historical heat value of the previous period; original cold-spot or warm-spot data that rises to hot-spot data is migrated into S1; original hot-spot or warm-spot data that falls to cold-spot data is migrated into S3; original hot-spot data that falls to warm-spot data is migrated into S2; original cold-spot data that rises to warm-spot data is migrated into S2.
- 6. The cloud computing platform data storage method based on big data analysis of claim 5, wherein the live migration process follows the following collaborative priority rules: when data is updated from cold-spot or warm-spot data to hot-spot data, the operation of migrating it into a high-grade S1 storage node is executed preferentially, guaranteeing the high-speed access requirement of core business; when the high-grade S1 and medium-grade S2 storage node resources are insufficient or their load exceeds the limit, cold-spot data in the S1 and S2 nodes is migrated out preferentially, releasing high-performance storage resources; data objects are migrated in order of access frequency from low to high, so that the objects with the lowest access frequency are migrated first.
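The heat computation of claim 3 and the threshold classification of claim 4 can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function names, and the concrete window length and threshold values in the example, are assumptions.

```python
# Sketch of claims 3-4: average access frequency F_i = n_i / T, then
# three-way classification against thresholds F_H and F_L.

def average_access_frequency(n_i: int, window_length: float) -> float:
    """F_i = n_i / T: accesses of data object i within sliding window length T."""
    return n_i / window_length

def heat_grade(f_i: float, f_high: float, f_low: float) -> str:
    """Classify per claim 4: hot if F_i >= F_H, cold if F_i <= F_L, else warm."""
    if f_i >= f_high:
        return "hot"    # stored in S1
    if f_i <= f_low:
        return "cold"   # stored in S3
    return "warm"       # stored in S2

# Example: 120 accesses over a 60-time-unit window, thresholds F_H=1.5, F_L=0.2
f = average_access_frequency(120, 60)   # F_i = 2.0
print(heat_grade(f, 1.5, 0.2))          # hot
```

Note that claim 4 assigns the boundary values F_H and F_L to the hot and cold grades respectively (the inequalities are non-strict), which the comparison order above preserves.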
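The migration trigger of claim 5 and the grade-to-tier correspondence of claim 1 admit a similarly small sketch. Again the names are illustrative assumptions, as are the example heat values and threshold.

```python
# Sketch of claim 5: trigger cross-tier migration when the heat change
# |F_i_new - F_i_old| reaches the preset frequency change threshold.

def should_migrate(f_new: float, f_old: float, delta_threshold: float) -> bool:
    """True when the absolute heat change reaches dF_threshold."""
    return abs(f_new - f_old) >= delta_threshold

def target_tier(grade: str) -> str:
    """Claim 1 correspondence: hot -> S1, warm -> S2, cold -> S3."""
    return {"hot": "S1", "warm": "S2", "cold": "S3"}[grade]

# Example: heat rose from 0.3 to 2.1 with threshold 0.5, so the object
# migrates into the tier of its newly computed grade.
if should_migrate(2.1, 0.3, 0.5):
    print(target_tier("hot"))   # S1
```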
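The load-balancing calibration of claim 6 can be sketched as below. The patent says L_j is a weighted combination of four utilization metrics but does not fix the weights, so the equal weights here are an assumption, as are all function names.

```python
# Sketch of claim 6: compute node load rate L_j and, when L_j > L_max,
# demote the lowest-access-frequency data in the tier by one grade.

def node_load_rate(cpu: float, disk_io: float, net_bw: float, mem: float,
                   weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """L_j: weighted sum of CPU, disk I/O, network bandwidth and memory
    utilization. Equal weights are an illustrative assumption."""
    return sum(w * u for w, u in zip(weights, (cpu, disk_io, net_bw, mem)))

def select_demotions(objects, l_j: float, l_max: float, count: int = 1):
    """If L_j exceeds the safety threshold L_max, return the `count` objects
    with the lowest access frequency as candidates for demotion."""
    if l_j <= l_max:
        return []
    return sorted(objects, key=lambda o: o["freq"])[:count]

# Example: an overloaded node (L_j = 0.75 > L_max = 0.7) demotes its
# coldest object first, per the low-to-high frequency ordering of claim 6.
load = node_load_rate(0.9, 0.8, 0.7, 0.6)   # 0.75
victims = select_demotions(
    [{"id": "a", "freq": 5.0}, {"id": "b", "freq": 0.1}], load, 0.7)
print(victims[0]["id"])                     # b
```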
Description
Cloud computing platform data storage method based on big data analysis
Technical Field
The invention relates to the technical field of cloud computing data storage, in particular to a cloud computing platform data storage method based on big data analysis.
Background
Under the background of comprehensive digital transformation, the data scale generated by every industry shows explosive growth, and scenarios such as big data processing, artificial intelligence training, Internet business services and enterprise core systems are highly dependent on the storage and computing services provided by cloud computing platforms. The cloud computing platform has become the core carrier for holding mass data, circulating data and supporting business by virtue of its advantages of elastic expansion, on-demand allocation and centralized management. The data storage system is the bottom-layer foundation of the cloud computing platform, and its read-write performance, response delay, resource utilization and storage cost directly determine the operating efficiency, service quality and overall competitiveness of the upper-layer services. The data storage scheme of the traditional cloud computing platform generally suffers from a single design approach and rigid resource scheduling. Most cloud storage systems adopt a homogeneous storage architecture with unified hardware configuration, unified redundancy strategy and unified bandwidth allocation, and do not differentiate or match resources according to the access frequency, business importance and read-write characteristics of different data.
In actual operation, high-frequency hot-spot data, medium-frequency warm-spot data and long-term rarely accessed cold-spot data are stored indiscriminately on storage nodes of the same performance level, which causes a series of technical defects that are difficult to resolve through simple optimization. First, in high-concurrency, high-traffic access scenarios, input/output blocking, bandwidth preemption and rising read-write queuing delays occur easily, directly slowing service response and degrading user experience, and even causing stability problems such as service timeouts and request failures during traffic peaks. Second, a large amount of cold data that has not been accessed for a long time and has no sustained read-write performance requirement continues to occupy high-performance storage resources with high input/output operations per second, multiple copies and high bandwidth, causing serious hardware redundancy and a serious mismatch between equipment investment, machine-room energy consumption and actual demand. Third, the existing storage system lacks big-data-based real-time access behavior analysis capability: it cannot continuously collect, quantitatively calculate and dynamically sense data access frequency, nor automatically adjust and intelligently schedule storage locations as data heat rises and falls. Fourth, storage node loads are unbalanced: high-level storage nodes are prone to overload while low-level storage nodes remain idle, so overall resource utilization stays low and node loads cannot be calibrated.
Aiming at the problems of unreasonable resource allocation, high hot-spot delay, high cold-data cost, lack of dynamic scheduling and unbalanced load in the prior art, the invention provides a cloud computing platform data storage method based on big data analysis.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a cloud computing platform data storage method based on big data analysis, which is used to solve the technical problems of homogeneous resource allocation in traditional cloud computing storage systems, excessive hot-spot data access delay, cost waste caused by cold data occupying high-performance resources, the lack of a dynamic heat-sensing and adaptive scheduling mechanism, and unbalanced load distribution across storage nodes. To achieve this purpose, the cloud computing platform data storage method based on big data analysis comprises the following steps: first, classifying storage resources into high-level S1, medium-level S2 and low-level S3 according t