CN-122019488-A - Data optimized storage method and big data system based on artificial intelligence
Abstract
The invention discloses a data optimized storage method and a big data system based on artificial intelligence, relating to the technical field of data storage. The method comprises: obtaining files, classifying them into pictures, documents and videos, and extracting their features using different methods; mining semantic associations among pictures, documents and videos through a multi-modal learning technology and constructing an association graph; and compressing the data, storing it hierarchically and optimizing indexes according to the semantic associations and the association graph. By intelligently analyzing data characteristics and mining semantic associations among the data, the invention achieves fused storage of pictures, documents and videos and facilitates subsequent file queries.
Inventors
- YOU JIAN
- LI YILIANG
Assignees
- 深圳极联信息技术开发有限公司
Dates
- Publication Date: 20260512
- Application Date: 20251219
Claims (9)
- 1. A data optimizing storage method based on artificial intelligence, characterized by comprising the following steps: S1, obtaining files, classifying the files into pictures, documents and videos, and extracting the features of the files using different methods; S2, mining semantic associations among pictures, documents and videos through a multi-modal learning technology, and constructing an association graph; S3, compressing the data, storing it hierarchically and optimizing indexes according to the semantic associations and the association graph; S4, storing the data, monitoring the storage state in real time, collecting storage performance indexes, and optimizing the model according to the performance indexes.
- 2. The method for optimizing and storing data based on artificial intelligence according to claim 1, wherein step S1 of obtaining files, classifying the files into pictures, documents and videos, and extracting the features of the files using different methods comprises: identifying the obtained files, classifying them into text documents, pictures and videos, and cleaning the files to remove abnormal data; and extracting visual features of the pictures using a convolutional neural network, extracting semantic features of the documents using a natural language processing technology, and extracting spatio-temporal features of the videos using a Transformer-based model.
- 3. The artificial intelligence based data optimizing storage method of claim 2, wherein step S2 of mining semantic associations among pictures, documents and videos through a multi-modal learning technology and constructing an association graph comprises: mapping the features of pictures, documents and videos to the same embedding space using a contrastive learning method, and constructing positive and negative sample pairs; optimizing the parameters of the model by minimizing the distance between positive sample pairs and maximizing the distance between negative sample pairs, so that samples of different modalities are mapped to a semantically consistent embedding space; inputting the preprocessed and feature-extracted picture, document and video data into the contrastive learning model, which continuously adjusts its parameters, learns how to map features of different modalities to the same embedding space and mines the semantic associations among them, and obtaining the optimized embedding space and model parameters through repeated iterative training until the model converges; taking the pictures, documents and videos as nodes of the graph, wherein the feature vector of each node is the feature vector mapped to the shared embedding space by the contrastive learning model; calculating the similarity between the feature vectors of two nodes using cosine similarity; setting a threshold and, when the calculated semantic association strength is greater than the threshold, adding an edge between the two nodes, the weight of the edge being the semantic association strength; and storing the node and edge information in a graph data structure, optimizing the graph, and displaying the constructed association graph visually using a graph visualization tool.
- 4. The method for optimizing and storing data based on artificial intelligence according to claim 3, wherein step S3 of compressing the data, storing it hierarchically and optimizing indexes according to the semantic associations and the association graph comprises: selecting an optimal compression algorithm according to the data characteristics and the association strength, wherein the selection method comprises using a prediction-based compression algorithm if the association strength between the data is high, jointly encoding the text features of the document and the visual features of the picture if the association strength between the document content and the picture content is high, and compressing each data type independently if the association strength between the data is low; allocating the data to a hot storage medium, a warm storage medium or a cold storage medium according to the access frequency and importance of the data, wherein the hot storage medium has a fast read-write speed and high cost, the warm storage medium has a medium read-write speed and medium cost, and the cold storage medium has a slow read-write speed and low cost, and the allocation method comprises: defining an access frequency weight $W_f$ and a data importance weight $W_i$; giving each data object a score for its access frequency and a score for its importance, wherein the access frequency is divided into high, medium and low and the importance is divided into high, medium and low; and calculating the composite score as $S = W_f S_f + W_i S_i$, wherein $S$ is the composite score, $S_f$ is the access frequency score and $S_i$ is the data importance score; and generating a multi-modal index based on the association graph to optimize data retrieval efficiency.
- 5. The method for optimizing and storing data based on artificial intelligence of claim 3, wherein the step of constructing the multi-modal index comprises: obtaining the feature vector of each node from the association graph; storing the feature vector and the similar-node list of each node in an index, storing the nodes and edges in a database, and storing the edge weights as attributes; wherein the edge weight $W_e$ is calculated by cosine similarity: $W_e = \frac{A \cdot B}{\|A\| \, \|B\|}$, wherein $A$ and $B$ are the feature vectors of the two nodes and $\|A\|$ and $\|B\|$ are the norms of the vectors; and the method for optimizing data retrieval efficiency comprises clustering the nodes using a clustering algorithm to narrow the retrieval range, quickly locating related nodes through the association graph according to the query semantics, and ranking them by edge weight.
- 6. The method for optimizing data storage based on artificial intelligence according to claim 4, wherein step S4 of storing the data, monitoring the storage state in real time, collecting storage performance indexes and optimizing the model according to the performance indexes comprises: compressing, hierarchically storing and indexing the data according to the generated strategy; monitoring the storage state in real time and collecting performance indexes; establishing an I/O workload analysis model and identifying performance bottlenecks by analyzing the type, pattern and characteristics of the workload; inputting the collected performance index data into an optimization model, which learns and adjusts according to the data, and optimizing and adjusting the storage system according to the output of the model; and evaluating whether the performance of the optimized storage system has improved, continuously monitoring the performance of the storage system to ensure that the optimization effect is maintained, and further adjusting the model according to new data feedback.
- 7. A big data system for artificial intelligence based data optimizing storage, based on the artificial intelligence based data optimizing storage method of any one of claims 1-6, characterized by comprising: a feature extraction module, used for identifying the format of the acquired files, preprocessing the files and extracting the features of the preprocessed files; an association module, used for establishing associations among the files based on the extracted features and constructing an association graph; a storage module, used for compressing the data, storing it hierarchically according to the importance of the data, and storing associated data together; and a monitoring module, used for monitoring the storage of the data, collecting storage performance indexes and optimizing storage according to the indexes.
- 8. A computer device comprising a memory and a processor, said memory storing a computer program, characterized in that said processor, when executing said computer program, implements the steps of the artificial intelligence based data optimizing storage method according to any one of claims 1-6.
- 9. A computer readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the artificial intelligence based data optimizing storage method according to any one of claims 1-6.
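The graph construction recited in claims 3 and 5 (cosine similarity between node feature vectors, with an edge added only when the similarity exceeds a threshold) can be illustrated with a minimal sketch. The function names, the adjacency-dictionary representation, the example vectors and the threshold value of 0.5 are assumptions made for illustration; the source does not specify them, and the embeddings here stand in for the output of the contrastive learning model.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Return (A . B) / (||A|| * ||B||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm_a * norm_b)


def build_association_graph(embeddings, threshold):
    """Build an undirected weighted graph as an adjacency dictionary.

    embeddings maps a node id to its feature vector in the shared
    embedding space. An edge is added only when the similarity is
    greater than the threshold; the edge weight is the similarity.
    """
    graph = {node: {} for node in embeddings}
    for u, v in combinations(embeddings, 2):
        weight = cosine_similarity(embeddings[u], embeddings[v])
        if weight > threshold:
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "picture_1": [1.0, 0.0],
    "document_1": [1.0, 1.0],
    "video_1": [0.0, 1.0],
}
graph = build_association_graph(embeddings, threshold=0.5)
```

In this example the picture and the video are orthogonal and receive no edge, while the document is linked to both. Comparing every pair of nodes is quadratic in the number of files, which is one reason the claims also mention clustering to narrow the retrieval range.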
Description
Data optimized storage method and big data system based on artificial intelligence
Technical Field
The invention relates to the technical field of data storage, in particular to a data optimized storage method and a big data system based on artificial intelligence.
Background
With the rapid development of information technology, data volumes are growing explosively, especially multimedia data such as pictures, documents and videos. Traditional storage methods generally store different types of data separately, which leads to low storage efficiency, difficult retrieval and high cost. In addition, rich semantic associations exist between different types of data, but conventional methods cannot make full use of these associations to optimize storage. A storage method capable of fusing different types of data is therefore urgently needed to improve storage efficiency, reduce cost and improve the accessibility and manageability of the data.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a data optimized storage method based on artificial intelligence, which comprises the following steps: S1, obtaining files, classifying the files into pictures, documents and videos, and extracting the features of the files using different methods; S2, mining semantic associations among pictures, documents and videos through a multi-modal learning technology, and constructing an association graph; S3, compressing the data, storing it hierarchically and optimizing indexes according to the semantic associations and the association graph; S4, storing the data, monitoring the storage state in real time, collecting storage performance indexes, and optimizing the model according to the performance indexes.
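The hierarchical storage of step S3 assigns each data object to a hot, warm or cold medium according to its access frequency and importance, using the weights and scores defined in claim 4. A minimal sketch follows, assuming the composite score is a weighted sum of the two scores. The numeric level scores, the weight values and the tier cut-offs are hypothetical and are not given in the source.

```python
# Assumed numeric scores for the three levels (high, medium, low).
LEVEL_SCORES = {"high": 3, "medium": 2, "low": 1}


def composite_score(frequency, importance, w_f=0.6, w_i=0.4):
    """Return S = W_f * S_f + W_i * S_i.

    The weighted-sum form and the default weights are assumptions.
    frequency and importance are each "high", "medium" or "low".
    """
    s_f = LEVEL_SCORES[frequency]
    s_i = LEVEL_SCORES[importance]
    return w_f * s_f + w_i * s_i


def assign_tier(score, hot_min=2.5, warm_min=1.5):
    """Map a composite score to a storage tier (cut-offs are hypothetical)."""
    if score >= hot_min:
        return "hot"
    if score >= warm_min:
        return "warm"
    return "cold"


# Example: a frequently accessed, important object and a rarely used one.
tier_active = assign_tier(composite_score("high", "high"))
tier_archive = assign_tier(composite_score("low", "low"))
tier_mixed = assign_tier(composite_score("high", "low"))
```

With these assumed values, an object that is high on both criteria lands on the hot medium, one that is low on both lands on the cold medium, and mixed cases fall on the warm medium. In practice the weights and cut-offs would be tuned against the performance indexes collected in step S4.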
Further, step S1 of obtaining files, classifying the files into pictures, documents and videos, and extracting the features of the files using different methods comprises: identifying the obtained files, classifying them into text documents, pictures and videos, and cleaning the files to remove abnormal data; and extracting visual features of the pictures using a convolutional neural network, extracting semantic features of the documents using a natural language processing technology, and extracting spatio-temporal features of the videos using a Transformer-based model.
Further, step S2 of mining semantic associations among pictures, documents and videos through a multi-modal learning technology and constructing an association graph comprises: mapping the features of the pictures, documents and videos to the same embedding space using a contrastive learning method, and constructing positive and negative sample pairs; optimizing the parameters of the model by minimizing the distance between positive sample pairs and maximizing the distance between negative sample pairs, so that samples of different modalities are mapped to a semantically consistent embedding space; inputting the preprocessed and feature-extracted picture, document and video data into the contrastive learning model, which continuously adjusts its parameters, learns how to map features of different modalities to the same embedding space and mines the semantic associations among them, and obtaining the optimized embedding space and model parameters through repeated iterative training until the model converges; taking the pictures, documents and videos as nodes of the graph, wherein the feature vector of each node is the feature vector mapped to the shared embedding space by the contrastive learning model; calculating the similarity between the feature vectors of two nodes using cosine similarity; setting a threshold and, when the calculated semantic association strength is greater than the threshold, adding an edge between the two nodes, the weight of the edge being the semantic association strength; and storing the node and edge information in a graph data structure, optimizing the graph, and displaying the constructed association graph visually using a graph visualization tool.
Further, step S3 of compressing the data, storing it hierarchically and optimizing indexes according to the semantic associations and the association graph comprises: selecting an optimal compression algorithm according to the data characteristics and the association strength, wherein the selection method comprises using a prediction-based compression algorithm if the association strength between the data is high, jointly encoding the text features of the document and the visual features of the picture if the association strength between the document content and the picture content is high, and compressing each data type independently if the association strength between the data is low; according to the access frequency and importance of the data, th