
CN-121996165-A - Data security storage system based on big data analysis

CN121996165A

Abstract

The invention discloses a data security storage system based on big data analysis, which relates to the technical field of big data security storage and comprises a packaging slicing module, a feature analysis module, a clustering modeling module and a coding storage module. The packaging slicing module performs hierarchical, sequential slicing of the original data according to temporal continuity and content similarity, generating a slice set with identifiers. The feature analysis module performs multi-scale feature detection and adaptively adjusts the feature weights through a dynamic weight distribution network, forming weighted feature vectors. The clustering modeling module clusters the feature vectors using a self-organizing feature mapping network and generates feature cluster descriptions. The coding storage module performs security enhancement coding and integrity binding according to the clustering results, forming traceable security storage units. The system preserves the internal associations among data slices and dynamically optimizes the processing weights according to the data characteristics, thereby improving the accuracy and adaptive management capability of secure storage.
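As an illustrative aid (not part of the patent text), the "integrity binding" described above — chaining each security storage unit to its predecessor by hash so the whole store is traceable — can be sketched in Python. All class, method and field names here are hypothetical; the patent does not prescribe a concrete hash construction beyond serial concatenation.

```python
import hashlib


def sha256_hex(data):
    """Hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


class SecureStore:
    """Hypothetical sketch of hash-chained security storage units.

    Each unit binds: the hash of its security-enhanced block, the overall
    hash of the original slice set, and the previous unit's verify code.
    """

    def __init__(self):
        self.units = []

    def append(self, enhanced_block, slice_set):
        block_hash = sha256_hex(enhanced_block)
        slices_hash = sha256_hex(b"".join(slice_set))
        prev = self.units[-1]["verify_code"] if self.units else ""
        # Integrity verification code = hash over the serial concatenation.
        verify_code = sha256_hex((block_hash + slices_hash + prev).encode())
        self.units.append({
            "block": enhanced_block,
            "slices_hash": slices_hash,
            "verify_code": verify_code,
        })
        return verify_code

    def verify_chain(self):
        """Recompute every verify code; any tampering breaks the chain."""
        prev = ""
        for unit in self.units:
            expect = sha256_hex(
                (sha256_hex(unit["block"]) + unit["slices_hash"] + prev).encode()
            )
            if expect != unit["verify_code"]:
                return False
            prev = unit["verify_code"]
        return True
```

Because each verify code folds in its predecessor, altering any stored block invalidates that unit and, transitively, the traceability of everything after it.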

Inventors

  • ZHONG YUFANG
  • ZHANG SHIHAO

Assignees

  • 深圳市百易信息科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-23

Claims (10)

  1. A data security storage system based on big data analysis, comprising: a packaging slicing module, configured to receive original data carrying a source tag and a time tag, perform structured packaging processing to generate a data packet set, perform hierarchical and sequential slicing according to the continuity of the data time tags and the similarity of the data content attributes, and generate a data slice set containing a layer identifier and an intra-layer slice identifier; a feature analysis module, configured to perform multi-scale feature detection on the data slice set, calculate a feature intensity value of each data slice on a time scale, a content scale and a structure scale, form slice feature vectors fusing the multi-scale feature intensities, and construct a dynamic weight distribution network, wherein the dynamic weight distribution network dynamically adjusts the weight coefficient of each scale feature according to the distribution pattern and variation trend of the feature intensity values; a clustering modeling module, configured to input the weighted slice feature vectors into a self-organizing feature mapping network, wherein the self-organizing feature mapping network performs iterative mapping and clustering in a competition layer according to the input vectors to form feature clusters, and generates a cluster center vector and a cluster boundary description for each feature cluster; and a coding storage module, configured to perform security enhancement coding according to the correspondence between the feature clusters and the data slice set, generate security enhancement data blocks, and perform integrity binding on the security enhancement data blocks to form security storage units with a complete traceability chain.
  2. The data security storage system based on big data analysis of claim 1, wherein generating the data slice set containing the layer identifier and the intra-layer slice identifier comprises: receiving original data, wherein the original data carries a source tag and a time tag; performing structured packaging processing on the original data, converting the received original data into a data packet set in a preset packaging format; performing hierarchical slicing on the data packet set, in which layering is performed according to the continuity of the data time tags and the similarity of the data content attributes, a layer identifier is assigned to each data layer, sequential slicing is performed within each data layer according to the data packet sequence identifiers, and the data slice set containing the layer identifier and the intra-layer slice identifier is generated; wherein performing structured packaging processing on the original data specifically comprises: the packaging processing including adding a sequence identifier and a packaging verification identifier; parsing the original data to identify the data structure and metadata information implicit in it; generating a unique sequence identifier for each data unit, the sequence identifier comprising coded information of the data source, the receiving time and the processing batch; packaging the parsed metadata information together with the generated sequence identifier to form a packaging header, and packaging the original data content to form a data body; and calculating a combined hash value of the packaging header and the data body as the packaging verification identifier, appended as a packaging tail after the packaging header and data body, to complete construction of the data packet set, wherein each data packet comprises a packaging header, a data body and a packaging verification identifier.
  3. The data security storage system based on big data analysis of claim 2, wherein performing hierarchical slicing on the data packet set specifically comprises: scanning the data packet set, sorting and segmenting according to the time tags of the data packets, and dividing a plurality of consecutive data packets whose time continuity exceeds a preset threshold into one time layer; within each time layer, performing sub-layer division according to the similarity of the content attributes of the data packets, and classifying data packets whose attribute similarity exceeds a preset similarity threshold into the same content sub-layer; assigning a globally unique layer identifier to each time layer and a sub-layer identifier to each content sub-layer; and within each finally determined content sub-layer, cutting in the order of the data packet sequence identifiers and in fixed-size data quantities, to generate the data slice set carrying the layer identifier, the sub-layer identifier and the intra-layer slice identifier.
  4. The data security storage system based on big data analysis of claim 1, wherein performing multi-scale feature detection on the data slice set comprises: on the time scale, analyzing the distribution density and interval regularity of the time tags of the data packets in a data slice, and calculating a time distribution entropy as the time-scale feature intensity value; on the content scale, analyzing the information entropy, keyword frequency distribution and statistical characteristics of the data content in a data slice, and calculating a content complexity as the content-scale feature intensity value; on the structure scale, analyzing the packaging format consistency, metadata integrity and internal dependency relationships corresponding to the data slice, and calculating a structure standardization degree as the structure-scale feature intensity value; and combining the time distribution entropy, content complexity and structure standardization degree corresponding to each data slice into a three-dimensional vector serving as the slice feature vector.
  5. The data security storage system based on big data analysis of claim 4, wherein constructing the dynamic weight distribution network specifically comprises: monitoring the variation trajectory of the slice feature vector corresponding to each data slice within a continuous time window; analyzing the correlation coefficients and co-variation patterns among the time distribution entropy, the content complexity and the structure standardization degree; dynamically generating a set of weight coefficients according to the stability of the variation trajectory and the dominance of the co-variation pattern, the weight coefficients respectively corresponding to the time scale, the content scale and the structure scale; and performing weighted fusion of the generated weight coefficients with the corresponding slice feature vector, and outputting a weighted slice feature vector.
  6. The data security storage system based on big data analysis of claim 1, wherein inputting the weighted slice feature vectors into the self-organizing feature mapping network specifically comprises: initializing the competition-layer nodes and connection weights of the self-organizing feature mapping network; sequentially inputting the weighted slice feature vectors into the network, and calculating the similarity between each input vector and the weight vectors of all competition-layer nodes; selecting the node with the highest similarity as the winning node, and adjusting the weight vectors of the winning node and the nodes in its neighborhood to bring them closer to the input vector; after multiple iterations, the competition-layer nodes form a topology-preserving feature map according to the distribution of the input vectors, and similar input vectors are mapped to the same or adjacent nodes, forming the feature clusters; and calculating the centroid of all input vectors in each feature cluster as the cluster center vector, and calculating the maximum distance from the in-cluster vectors to the centroid as the cluster boundary description.
  7. The data security storage system based on big data analysis according to claim 6, wherein performing security enhancement coding according to the correspondence between the feature clusters and the data slice set specifically comprises: establishing an index relation table from the data slices to the feature clusters they are mapped to; extracting the common layer identifier of a plurality of data slices mapped to the same feature cluster; aggregating the slice feature vectors corresponding to the plurality of data slices having the same layer identifier; and jointly encoding the aggregated feature vector, the cluster center vector of the feature cluster to which it belongs, and the cluster boundary description, to generate the security enhancement data block, wherein the security enhancement data block encodes both the feature information of the original slices and their attribution information in the feature space.
  8. The data security storage system based on big data analysis of claim 1, wherein performing integrity binding on the security enhancement data blocks specifically comprises: calculating a content hash value for each security enhancement data block; acquiring the original data slice set corresponding to the generated security enhancement data block, and calculating an overall hash value of the original data slice set; concatenating the content hash value of the security enhancement data block, the overall hash value of the corresponding original data slice set and the hash value of the previous security storage unit, and calculating a new hash value over the concatenation; and using the new hash value as an integrity verification code, storing it together with the security enhancement data block and the corresponding original data slice index information, to form the security storage unit with a complete traceability chain, wherein each security storage unit is linked to its preceding and succeeding units through a hash chain.
  9. The data security storage system based on big data analysis according to claim 5, wherein dynamically generating a set of weight coefficients according to the stability of the variation trajectory and the dominance of the co-variation pattern comprises: obtaining the slice feature vector of each data slice within a continuous time window, calculating the variance of each scale's feature intensity values within the window, and normalizing the reciprocal of the variance as a first basic weight measuring the variation stability of the corresponding scale's feature intensity values; within the time window, calculating Pearson correlation coefficients between the time scale and the content scale, between the time scale and the structure scale, and between the content scale and the structure scale, selecting the scale corresponding to the correlation coefficient with the largest absolute value as the dominant scale, and assigning a fixed dominance weight bonus to the dominant scale; adding the first basic weight of each scale to its dominance bonus to obtain a set of initial weight coefficients; and performing softmax normalization on the initial weight coefficients so that the weight coefficients of all scales sum to 1, finally outputting a dynamic weight coefficient group corresponding to the time scale, the content scale and the structure scale.
  10. The data security storage system based on big data analysis according to claim 7, wherein aggregating the slice feature vectors corresponding to a plurality of data slices having the same layer identifier comprises: screening out all data slices mapped to the same feature cluster according to the established index relation table from data slices to feature clusters; extracting, from the screened data slices, the subset of data slices having the same layer identifier; for each data slice in the subset, reading its corresponding weighted slice feature vector; computing the arithmetic mean of all the read weighted slice feature vectors along each feature dimension to obtain an average feature vector; and using the calculated average feature vector as the aggregated feature vector representing the subset of data slices with the same layer identifier, for subsequent security enhancement coding.
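As an illustrative aid (not part of the patent text), the weight-generation procedure of claim 9 — reciprocal-variance basic weights, a Pearson-correlation dominance bonus, then softmax normalization — can be sketched in Python. The claim does not specify which scale of the strongest-correlated pair becomes dominant; taking the first scale of that pair is an interpretive assumption here, as are the function name and the bonus value.

```python
import math


def dynamic_weights(window, dominance_bonus=0.5):
    """window: list of [time_entropy, content_complexity, structure_norm] vectors
    observed over a continuous time window. Returns three weights summing to 1."""
    n_scales = 3
    cols = [[v[i] for v in window] for i in range(n_scales)]

    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # First basic weight: normalized reciprocal variance (more stable => heavier).
    inv = [1.0 / (var(c) + 1e-9) for c in cols]
    base = [w / sum(inv) for w in inv]

    def pearson(xs, ys):
        mx, my = mean(xs), mean(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                        * sum((y - my) ** 2 for y in ys))
        return num / den if den else 0.0

    # Pearson correlation for each scale pair; the strongest pair
    # determines the dominant scale (first member, by assumption).
    pairs = [(0, 1), (0, 2), (1, 2)]
    corrs = [abs(pearson(cols[i], cols[j])) for i, j in pairs]
    dominant = pairs[corrs.index(max(corrs))][0]

    # Initial weights: basic weight plus a fixed dominance bonus.
    scored = [b + (dominance_bonus if i == dominant else 0.0)
              for i, b in enumerate(base)]

    # Softmax normalization so the three weights sum to 1.
    exps = [math.exp(s) for s in scored]
    return [e / sum(exps) for e in exps]
```

The softmax step guarantees a valid convex combination regardless of the raw scores, so the weighted fusion in claim 5 always receives weights on the simplex.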

Description

Data security storage system based on big data analysis

Technical Field

The invention belongs to the technical field of big data security storage, and particularly relates to a data security storage system based on big data analysis.

Background

Current big data secure storage systems commonly employ data slicing methods based on fixed sizes or predefined rules. This ignores the inherent temporal continuity of the data and the semantic associations of its content, so the slicing result is presented as isolated data blocks. When data retrieval, association analysis or integrity verification is performed later, the system needs extra computational overhead to reconstruct the logical relationships between the data; this reduces processing efficiency, makes it difficult to combine security policies effectively with the context and structural characteristics of the data, and limits the intelligence of storage management. At the feature-processing level, existing schemes typically extract data features along different dimensions, but fuse or evaluate them with static weight coefficients. Such fixed weights cannot respond to differences in the feature intensity distribution and variation trend of the data stream on the temporal, content and structural scales. In a dynamically changing big data environment, static feature fusion struggles to describe the data state accurately, so the feature expression is misaligned; this directly affects the accuracy of subsequent steps such as clustering, anomaly detection and security coding, and limits the adaptive protection capability of the storage system.
What is needed is a slicing method capable of maintaining the inherent logical associations of the data, and a processing mechanism capable of dynamically adjusting the evaluation weights according to the data characteristics, so as to improve the adaptability and processing efficiency of the secure storage system for complex data.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention proposes a data security storage system based on big data analysis, comprising: a packaging slicing module, configured to receive original data carrying a source tag and a time tag, perform structured packaging processing to generate a data packet set, perform hierarchical and sequential slicing according to the continuity of the data time tags and the similarity of the data content attributes, and generate a data slice set containing a layer identifier and an intra-layer slice identifier; a feature analysis module, configured to perform multi-scale feature detection on the data slice set, calculate a feature intensity value of each data slice on a time scale, a content scale and a structure scale, form slice feature vectors fusing the multi-scale feature intensities, and construct a dynamic weight distribution network, wherein the dynamic weight distribution network dynamically adjusts the weight coefficient of each scale feature according to the distribution pattern and variation trend of the feature intensity values; a clustering modeling module, configured to input the weighted slice feature vectors into a self-organizing feature mapping network, wherein the self-organizing feature mapping network performs iterative mapping and clustering in a competition layer according to the input vectors to form feature clusters, and generates a cluster center vector and a cluster boundary description for each feature cluster; and a coding storage module, configured to perform security enhancement coding according to the correspondence between the feature clusters and the data slice set, generate security enhancement data blocks, and perform integrity binding on the security enhancement data blocks to form security storage units with a complete traceability chain. Preferably, generating the data slice set containing the layer identifier and the intra-layer slice identifier includes: receiving original data, wherein the original data carries a source tag and a time tag; performing structured packaging processing on the original data, converting the received original data into a data packet set in a preset packaging format; performing hierarchical slicing on the data packet set, in which layering is performed according to the continuity of the data time tags and the similarity of the data content attributes, a layer identifier is assigned to each data layer, sequential slicing is performed within each data layer according to the data packet sequence identifiers, and the data slice set containing the layer identifier and the intra-layer slice identifier is generated; wherein performing structured packaging processing on the original data specifically includes: the packaging processing comprising adding a sequence identifier and a