CN-121996158-A - Large-scale data storage optimization method and system based on artificial intelligence
Abstract
The invention relates to the technical field of data storage, and in particular to an artificial-intelligence-based large-scale data storage optimization method and system. The method comprises: obtaining large-scale data and performing AI-based multidimensional feature analysis and business impact evaluation to obtain the business impact degree of each piece of sub-data; obtaining the access frequency and access time interval of each piece of sub-data; performing importance-level classification and hierarchical mapping to storage devices, and laying out the structured and unstructured data of the large-scale data on the corresponding storage devices; obtaining the disk I/O rate and storage utilization efficiency of each storage device and performing dynamic storage optimization to generate a dynamic storage strategy for the large-scale data devices; and executing the corresponding dynamic adjustment of the device storage layout to improve storage access speed. The invention can improve the efficiency, reliability and space utilization of large-scale data storage.
Inventors
- GUO XIN
- JIN XIN
Assignees
- 西藏数联科技有限公司 (Tibet Shulian Technology Co., Ltd.)
Dates
- Publication Date
- 20260508
- Application Date
- 20251229
Claims (10)
- 1. An artificial-intelligence-based large-scale data storage optimization method, characterized by comprising the following steps: Step S1, obtaining large-scale data, wherein the large-scale data comprises structured data and unstructured data, and the unstructured data comprises image data, video data and text data, and performing AI-based multidimensional feature analysis on the large-scale data to obtain large-scale multidimensional data features; Step S2, acquiring, from the large-scale data, the access frequency and access time interval of each piece of sub-data, and performing importance-level classification evaluation on each piece of sub-data based on its business impact degree, access frequency and access time interval, to obtain the importance level of each piece of sub-data; Step S3, performing hierarchical storage-device mapping on each piece of sub-data based on its access frequency and importance level, to generate a storage mapping relation between each piece of sub-data and the storage devices; and Step S4, acquiring the disk I/O rate and storage utilization efficiency of each storage device, and performing dynamic storage optimization adjustment of the large-scale data on the storage devices based on the disk I/O rate and storage utilization efficiency, to generate a dynamic storage strategy for the large-scale data devices, so as to execute the corresponding dynamic adjustment of the device storage layout and improve storage access speed.
- 2. The artificial-intelligence-based large-scale data storage optimization method of claim 1, wherein step S1 comprises: Step S11, acquiring large-scale data, wherein the large-scale data comprises structured data and unstructured data, and the unstructured data comprises image data, video data and text data; Step S12, cleaning and standardizing the large-scale data to obtain large-scale standard data; Step S13, performing structured feature analysis on the structured data in the large-scale standard data, by statistically analyzing the field types, field sizes and distribution patterns of the structured data, to obtain structured data features; Step S14, combining the structured data features and the unstructured data features to obtain large-scale multidimensional data features; and Step S15, performing business impact evaluation on each piece of sub-data in the large-scale standard data based on the large-scale multidimensional data features, to obtain the business impact degree of each piece of sub-data.
- 3. The artificial-intelligence-based large-scale data storage optimization method of claim 2, wherein the AI-based unstructured feature analysis of the unstructured data in the large-scale standard data comprises: constructing, based on a convolutional neural network, a network architecture comprising a 5x5 convolution layer, a 3x3 convolution layer and a 1x1 pooling layer, and performing image visual feature analysis on the image data of the unstructured data to extract the color, texture and shape visual features of each image, thereby obtaining image visual features; performing video dynamic feature analysis on the video data of the unstructured data, based on a recurrent neural network combined with an optical flow method and a physical model of object motion in the video, to extract the motion trajectories and scene-switching dynamics in the video, thereby obtaining video dynamic features; performing grammatical and syntactic analysis on the text data of the unstructured data, mining text semantic features with a semantic network based on the Transformer architecture, matching words and sentence patterns in the text data to nodes in the semantic network, and searching for keywords and topics in the text data along relation paths in the semantic network, thereby obtaining text semantic features; and assigning a weight to each type of feature according to the relevance of the corresponding data type, and fusing the image visual features, the video dynamic features and the text semantic features based on these weights to obtain unstructured data features.
- 4. The artificial-intelligence-based large-scale data storage optimization method of claim 3, wherein step S15 comprises: Step S151, performing structured business-logic mining on the structured data in the large-scale standard data, based on the structured data features among the large-scale multidimensional data features and using linear regression, to obtain linear association coefficients between the structured features and the structured business logic; Step S152, performing unstructured business-logic inference on the unstructured data in the large-scale standard data, based on the unstructured data features among the large-scale multidimensional data features and using nonlinear mapping inference, to obtain nonlinear mapping indexes between the unstructured features and the unstructured business logic, wherein the nonlinear mapping indexes comprise the contribution degree, accuracy factor and integrity factor of each unstructured feature with respect to the business logic; and Step S153, evaluating the business impact of each piece of sub-data in the large-scale data with a business impact degree calculation formula, based on the linear association coefficients between the structured features and the structured business logic or on the nonlinear mapping indexes between the unstructured features and the unstructured business logic, to obtain the business impact degree of each piece of sub-data.
- 5. The artificial-intelligence-based large-scale data storage optimization method according to claim 4, wherein the business impact degree calculation formula in step S153 is given separately for structured data and for unstructured data, and the quantities in the formulas are: the business impact degree corresponding to the structured data; the business impact degree corresponding to the unstructured data; the total number of structured data features; the i-th structured data feature; the linear correlation coefficient between the i-th structured data feature and the structured business logic; the total number of unstructured data features; the j-th unstructured data feature; the accuracy factor between the j-th unstructured data feature and the business logic; the integrity factor between the j-th unstructured data feature and the business logic; and the contribution degree of the j-th unstructured data feature to the business logic.
- 6. The artificial-intelligence-based large-scale data storage optimization method of claim 1, wherein step S2 comprises: Step S21, acquiring, from the large-scale data, the historical data access service records of each piece of sub-data; Step S22, statistically analyzing the access frequency and access intervals of each piece of sub-data in the large-scale data based on its historical data access service records, to obtain the access frequency and access time interval of each piece of sub-data; Step S23, quantifying the updates of each piece of sub-data in the large-scale data based on its access frequency and access time interval, to obtain the update frequency of each piece of sub-data; and Step S24, performing importance-level classification evaluation on each piece of sub-data in the large-scale data based on its business impact degree and update frequency in combination with a decision tree, to obtain the importance level of each piece of sub-data.
- 7. The method according to claim 6, wherein the importance levels in step S24 are grade I, grade II and grade III, wherein grade I corresponds to a business impact degree of 0-39 and data updated once a month or once a year (or less often), grade II corresponds to a business impact degree of 40-79 and data updated less often than once a day but more often than once a month or once a year, and grade III corresponds to a business impact degree of 80-100 and data updated once a day or more often.
- 8. The artificial-intelligence-based large-scale data storage optimization method of claim 1, wherein step S3 comprises: Step S31, performing hierarchical storage-device mapping on each piece of sub-data in the large-scale data based on its access frequency and importance level, such that sub-data accessed several times per second with an importance level of II or above is mapped to and stored on a solid-state drive, and sub-data accessed several times per week with an importance level of I is mapped to and stored on a mechanical hard disk, thereby generating the storage mapping relation between each piece of sub-data and the storage devices; and Step S32, optimizing the storage layout of the structured data and unstructured data in the large-scale data based on the storage mapping relation between each piece of sub-data and the storage devices: structured data is stored in adjacent locations on the storage device using a clustered storage strategy to reduce the seek time of the storage device, and unstructured data is grouped into the same storage region of the storage device according to its feature similarity to facilitate fast retrieval and access.
- 9. The artificial-intelligence-based large-scale data storage optimization method of claim 1, wherein step S4 comprises: Step S41, acquiring the disk I/O rate of the storage device; Step S42, acquiring the disk access refresh count and data access frequency of the storage device, and evaluating the utilization efficiency of the storage device based on the disk access refresh count and the data access frequency, to obtain the storage utilization efficiency of the storage device; and Step S43, dynamically optimizing and adjusting the storage of the large-scale data on the storage device based on the disk I/O rate and the storage utilization efficiency, namely predicting potential faults of the storage device from its disk I/O rate and storage utilization efficiency so that data migration and device maintenance can be carried out in advance, and dynamically adjusting and monitoring the storage layout of the data on the mechanical hard disk and the solid-state drive: if the access frequency of data on the mechanical hard disk increases sharply, the data is migrated from the mechanical hard disk to the solid-state drive, and if the access frequency of data on the solid-state drive decreases sharply, the data is migrated from the solid-state drive to the mechanical hard disk, thereby generating a dynamic storage strategy for the large-scale data devices, so as to execute the corresponding dynamic adjustment of the device storage layout and improve storage access speed.
- 10. An artificial-intelligence-based large-scale data storage optimization system for performing the artificial-intelligence-based large-scale data storage optimization method of claim 1, the system comprising: a data business impact evaluation module, configured to acquire large-scale data, wherein the large-scale data comprises structured data and unstructured data, and the unstructured data comprises image data, video data and text data, and to perform AI-based multidimensional feature analysis on the large-scale data to obtain large-scale multidimensional data features; an importance level evaluation module, configured to acquire, from the large-scale data, the access frequency and access time interval of each piece of sub-data, and to perform importance-level classification evaluation on each piece of sub-data based on its business impact degree, access frequency and access time interval, to obtain the importance level of each piece of sub-data; a hierarchical mapping layout storage module, configured to perform hierarchical storage-device mapping on each piece of sub-data based on its access frequency and importance level, to generate the storage mapping relation between each piece of sub-data and the storage devices; and a device dynamic storage optimization adjustment module, configured to acquire the disk I/O rate and storage utilization efficiency of the storage device, dynamically optimize and adjust the storage of the large-scale data on the storage device based on the disk I/O rate and storage utilization efficiency, generate a dynamic storage strategy for the large-scale data devices, and execute the corresponding dynamic adjustment of the device storage layout to improve storage access speed.
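The last step of claim 3 assigns each feature type a weight according to its relevance and then fuses the image, video and text features. The claims specify neither the weighting rule nor the fusion operator, so the following Python sketch assumes that a non-negative relevance score is already available per modality, normalizes those scores into weights that sum to 1, and fuses by weighted concatenation. The function names `normalize_weights` and `fuse_features` and the modality keys are hypothetical.

```python
from typing import Dict, List


def normalize_weights(relevance: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative relevance scores into weights that sum to 1."""
    total = sum(relevance.values())
    if total <= 0:
        raise ValueError("at least one relevance score must be positive")
    return {name: score / total for name, score in relevance.items()}


def fuse_features(features: Dict[str, List[float]],
                  relevance: Dict[str, float]) -> List[float]:
    """Weighted concatenation of per-modality feature vectors.

    Concatenation (rather than a weighted sum) is an assumption here: it lets
    the image, video and text vectors have different lengths.
    """
    weights = normalize_weights(relevance)
    fused: List[float] = []
    for name in sorted(features):  # fixed modality order keeps the layout reproducible
        w = weights[name]
        fused.extend(w * x for x in features[name])
    return fused
```

For example, `fuse_features({"image": [1.0, 2.0], "text": [4.0], "video": [3.0]}, {"image": 2.0, "video": 1.0, "text": 1.0})` gives the image vector weight 0.5 and the other two 0.25 each. A learned fusion layer would be an equally valid reading of the claim; the fixed-weight version is shown only because it is the simplest form that uses explicit weights.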
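Step S151 of claim 4 derives, by linear regression, a linear association coefficient between each structured feature and the structured business logic. The claims do not say how the business logic is quantified. The sketch below therefore assumes a hypothetical per-record numeric `business_score` and fits a separate single-variable least-squares line for each feature column on z-scored data. For one predictor, that standardized slope equals Pearson's correlation coefficient, which keeps the coefficients comparable across features. This is one plausible reading, not the patented computation, and `standardized_slope` and `association_coefficients` are illustrative names.

```python
import math
from typing import Dict, List, Sequence


def standardized_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the least-squares line y ~ a + b*x after z-scoring x and y.

    With a single predictor this equals Pearson's r, so it lies in [-1, 1].
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear association
    return sxy / math.sqrt(sxx * syy)


def association_coefficients(columns: Dict[str, List[float]],
                             business_score: List[float]) -> Dict[str, float]:
    """One coefficient per structured feature column (hypothetical inputs)."""
    return {name: standardized_slope(col, business_score)
            for name, col in columns.items()}
```

A multivariate regression over all structured features at once would instead yield partial coefficients that account for correlation between features. The per-feature form is used here because claim 5 defines one coefficient per structured feature.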
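The concrete rules in claims 7 to 9 can be expressed as a small rule set: grade thresholds, SSD/HDD placement, and access-driven migration. The claims give only qualitative bands such as "several times per second" and "increases sharply". Every numeric threshold below is therefore a hypothetical choice: `MONTH_DAYS`, `HOT_PER_SECOND`, `COLD_PER_SECOND` and `SURGE_FACTOR`. Update frequency is represented as the interval between updates in days, which is also an assumption. The decision tree of step S24 is replaced by a direct table lookup that returns `None` when the business impact band and the update band disagree.

```python
from typing import Optional

# Hypothetical thresholds; the claims state only qualitative bands.
MONTH_DAYS = 30.0                      # "once a month" read as a 30-day update interval
HOT_PER_SECOND = 1.0                   # "several times per second" read as >= 1 access/s
COLD_PER_SECOND = 10 / (7 * 86_400)    # "several times per week" read as <= 10 accesses/week
SURGE_FACTOR = 10.0                    # "increases/decreases sharply" read as a 10x change


def importance_grade(impact: float, update_interval_days: float) -> Optional[int]:
    """Claim 7 table: 1 (grade I), 2 (II) or 3 (III); None if the criteria disagree."""
    if not 0 <= impact <= 100:
        raise ValueError("business impact degree must be in [0, 100]")
    impact_grade = 1 if impact < 40 else 2 if impact < 80 else 3
    if update_interval_days >= MONTH_DAYS:
        update_grade = 1          # updated monthly/yearly or less often
    elif update_interval_days > 1:
        update_grade = 2          # between daily and monthly
    else:
        update_grade = 3          # updated daily or more often
    return impact_grade if impact_grade == update_grade else None


def initial_tier(grade: Optional[int], accesses_per_second: float) -> Optional[str]:
    """Claim 8 placement; None for combinations the claim does not cover."""
    if grade is None:
        return None
    if accesses_per_second >= HOT_PER_SECOND and grade >= 2:
        return "ssd"
    if accesses_per_second <= COLD_PER_SECOND and grade == 1:
        return "hdd"
    return None


def migration_target(tier: str, old_rate: float, new_rate: float) -> Optional[str]:
    """Claim 9: promote HDD data whose access rate surges, demote SSD data whose rate collapses."""
    if tier == "hdd" and new_rate > 0 and new_rate >= SURGE_FACTOR * old_rate:
        return "ssd"
    if tier == "ssd" and old_rate > 0 and new_rate * SURGE_FACTOR <= old_rate:
        return "hdd"
    return None
```

Returning `None` for uncovered cases leaves the policy decision to the caller rather than silently inventing one. In practice the migration check would run periodically against rates measured over comparable windows.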
Description
Large-scale data storage optimization method and system based on artificial intelligence
Technical Field
The invention relates to the technical field of data storage, and in particular to an artificial-intelligence-based large-scale data storage optimization method and system.
Background
With the rapid development of information technology and the arrival of the big data era, data storage has become a key link in information processing systems. In particular, with the wide adoption of technologies such as cloud computing, the Internet of Things and artificial intelligence, the volume of stored data is growing explosively, and storing and managing massive data efficiently has become a significant challenge across industries. Conventional large-scale data storage methods, however, rely mainly on distributed storage, using a distributed file system or database system to spread the data across multiple nodes. When a storage device fails, data loss or access interruption can occur, and storage resource utilization remains low. Moreover, conventional methods apply a uniform storage strategy to all data types, structured as well as unstructured (images, videos, text, etc.), without taking the characteristics of each type into account, which seriously wastes storage space.
Disclosure of Invention
In view of the foregoing, there is a need for an artificial-intelligence-based large-scale data storage optimization method and system that solves at least one of the above problems.
To achieve the above object, an artificial-intelligence-based large-scale data storage optimization method comprises the following steps: Step S1, obtaining large-scale data, wherein the large-scale data comprises structured data and unstructured data, and the unstructured data comprises image data, video data and text data, and performing AI-based multidimensional feature analysis on the large-scale data to obtain large-scale multidimensional data features; Step S2, acquiring, from the large-scale data, the access frequency and access time interval of each piece of sub-data, and performing importance-level classification evaluation on each piece of sub-data based on its business impact degree, access frequency and access time interval, to obtain the importance level of each piece of sub-data; Step S3, performing hierarchical storage-device mapping on each piece of sub-data based on its access frequency and importance level, to generate a storage mapping relation between each piece of sub-data and the storage devices; and Step S4, acquiring the disk I/O rate and storage utilization efficiency of each storage device, and performing dynamic storage optimization adjustment of the large-scale data on the storage devices based on the disk I/O rate and storage utilization efficiency, to generate a dynamic storage strategy for the large-scale data devices, so as to execute the corresponding dynamic adjustment of the device storage layout and improve storage access speed.
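Step S2 starts from the access frequency and access time interval of each piece of sub-data. The description derives both from historical access records. A minimal Python sketch follows. It assumes the records are available as hypothetical `(sub_data_id, unix_timestamp)` pairs collected over a known observation window. From these it computes accesses per second and the mean gap between consecutive accesses. The record format and the function name `access_statistics` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def access_statistics(records: Iterable[Tuple[str, float]],
                      window_seconds: float) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return {sub_data_id: (accesses_per_second, mean_interval_seconds)}.

    records are (sub_data_id, unix_timestamp) pairs from an access log covering
    window_seconds. mean_interval_seconds is None for items accessed only once.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    times: Dict[str, List[float]] = defaultdict(list)
    for item, ts in records:
        times[item].append(ts)
    stats: Dict[str, Tuple[float, Optional[float]]] = {}
    for item, ts_list in times.items():
        ts_list.sort()  # logs are not guaranteed to be in time order
        gaps = [b - a for a, b in zip(ts_list, ts_list[1:])]
        freq = len(ts_list) / window_seconds
        mean_gap = sum(gaps) / len(gaps) if gaps else None
        stats[item] = (freq, mean_gap)
    return stats
```

For example, three accesses to item `"a"` at t = 0, 10 and 30 s in a 60 s window give 0.05 accesses per second and a mean interval of 15 s. A production system would more likely maintain these counters incrementally (for example over sliding windows) than rescan the full log.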
Further, step S1 comprises: Step S11, acquiring large-scale data, wherein the large-scale data comprises structured data and unstructured data, and the unstructured data comprises image data, video data and text data; Step S12, cleaning and standardizing the large-scale data to obtain large-scale standard data; Step S13, performing structured feature analysis on the structured data in the large-scale standard data, by statistically analyzing the field types, field sizes and distribution patterns of the structured data, to obtain structured data features; Step S14, combining the structured data features and the unstructured data features to obtain large-scale multidimensional data features; and Step S15, performing business impact evaluation on each piece of sub-data in the large-scale standard data based on the large-scale multidimensional data features, to obtain the business impact degree of each piece of sub-data. Further, the AI-based unstructured feature analysis of the unstructured data in the large-scale standard data comprises: constructing, based on a convolutional neural network, a network architecture comprising a 5x5 convolution layer, a 3x3 convolution layer and a 1x1 pooling layer, and performing image visual feature analysis on the image data of the unstructured data to extract the color, texture and shape visual features of each image, thereby obtaining image visual features; Performing video dynamic