CN-122020725-A - Data confidentiality method, device and storage medium

Abstract

The invention provides a data confidentiality method, a device and a storage medium. The method initializes a node distribution topological graph, generates a first storage path set and constructs an initial path mapping linked list; performs a fragmentation confusion operation on a data set to be stored, splitting each data block into data fragment units, allocating a storage path node sequence and a confusion mark sequence to each fragment, and establishing a bidirectional index linked list; sends the fragments to the storage nodes along their paths, receives storage confirmation certificates and node state snapshots, and updates the node health state records; periodically traverses the node health state records and, when a record is found to be inconsistent with the expected state, performs fragment relocation, generating new sequences and iteratively replacing the corresponding linked-list entries; and collects the storage confirmation certificates and state snapshots, generates an audit certificate sequence along the change time axis, and stores it in association with the bidirectional index linked list. The invention improves the confidentiality and reliability of data storage and makes the whole operation process auditable and traceable.

Inventors

HAN YUNJIE
HU XIAONAN
XIAO QINGQUAN
JIN YOUGANG

Assignees

Guizhou University (贵州大学)
深圳市大乘科技股份有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-04-16

Claims (12)

1. A data security method, comprising: initializing a node distribution topological graph of a confidential storage space, generating a first storage path set according to the node connection states in the node distribution topological graph, and constructing an initial path mapping linked list of the storage space based on the first storage path set; performing a fragmentation confusion operation on a data set to be stored, splitting each data block into a plurality of data fragmentation units, allocating a storage path node sequence and a confusion mark sequence to each data fragmentation unit according to the initial path mapping linked list, and establishing a bidirectional index linked list between the storage path node sequence and the confusion mark sequence; sending the data fragmentation units carrying the confusion mark sequences to the corresponding storage nodes according to the storage path node sequences, receiving the storage confirmation certificate and the node running state snapshot returned by each storage node, and updating the node health state records in the bidirectional index linked list according to the node running state snapshots; periodically traversing the node health state records in the bidirectional index linked list and, when a node health state record is detected to be inconsistent with the expected health state pre-stored in the bidirectional index linked list, performing a fragment relocation operation, generating a new storage path node sequence and a new confusion mark sequence, and iteratively replacing the corresponding entries in the bidirectional index linked list; and collecting all storage confirmation certificates and node running state snapshots, generating an audit certificate sequence of the storage operation according to the change time axis of the node health state records, and storing the audit certificate sequence in association with the bidirectional index linked list.
2. The method of claim 1, wherein performing the fragmentation confusion operation on the data set to be stored, splitting each data block into a plurality of data fragmentation units, allocating a storage path node sequence and a confusion mark sequence to each data fragmentation unit according to the initial path mapping linked list, and establishing the bidirectional index linked list between the storage path node sequence and the confusion mark sequence, comprises: taking a data block from the data set to be stored, cutting the data block into a plurality of consecutive data fragmentation units in the natural byte order of the data block, wherein each data fragmentation unit comprises a fixed number of bytes, and recording the start offset position and the end offset position of each data fragmentation unit in the original data block; extracting the node distribution topological graph from the initial path mapping linked list, obtaining a node identifier list of all storage nodes from the node distribution topological graph, and allocating the data fragmentation units by polling in the node order of the node identifier list to generate the storage path node sequence corresponding to each data fragmentation unit; allocating a confusion mark to each node in each storage path node sequence, wherein the confusion mark is determined by the node identifier of the node and the position serial number of the data fragmentation unit in the data block, and arranging all confusion marks in the order of the storage path node sequence to obtain the confusion mark sequence; forward-associating each node identifier in the storage path node sequence with the confusion mark at the corresponding position in the confusion mark sequence to generate a forward association record, and writing the forward association record into a forward index area of the bidirectional index linked list; reverse-associating each confusion mark in the confusion mark sequence with the node identifier at the corresponding position in the storage path node sequence to generate a reverse association record, and writing the reverse association record into a reverse index area of the bidirectional index linked list; and writing an index metadata block at the head of the bidirectional index linked list, wherein the index metadata block comprises a start address pointer of the forward index area, a start address pointer of the reverse index area and length information of the bidirectional index linked list, thereby completing construction of the bidirectional index linked list.
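The steps of claim 2 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the names `Fragment`, `split_block`, `confusion_mark` and `build_index` are hypothetical, the truncated SHA-256 digest is only one possible way to derive a mark from a node identifier and a fragment position, the path length of 2 is arbitrary, and the header dictionary is a simplified stand-in for the index metadata block.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Fragment:
    index: int      # position serial number of the fragment within the block
    start: int      # start offset in the original block (inclusive)
    end: int        # end offset in the original block (exclusive, an assumption)
    payload: bytes


def split_block(block: bytes, size: int) -> list[Fragment]:
    """Cut a block into consecutive fragments of `size` bytes, recording offsets.

    The final fragment may be shorter when the block length is not a multiple
    of `size`; the claim does not say how that case is handled.
    """
    return [
        Fragment(i, off, min(off + size, len(block)), block[off:off + size])
        for i, off in enumerate(range(0, len(block), size))
    ]


def confusion_mark(node_id: str, position: int) -> str:
    # Assumption: any deterministic function of (node id, position) would do.
    return hashlib.sha256(f"{node_id}:{position}".encode()).hexdigest()[:8]


def build_index(block: bytes, size: int, node_ids: list[str], path_len: int = 2):
    """Return (header, forward, reverse) for one data block."""
    forward, reverse = {}, {}
    for frag in split_block(block, size):
        # Polling (round-robin) allocation over the node identifier list.
        path = [node_ids[(frag.index + k) % len(node_ids)] for k in range(path_len)]
        marks = [confusion_mark(node, frag.index) for node in path]
        forward[frag.index] = list(zip(path, marks))   # node id -> mark
        reverse[frag.index] = list(zip(marks, path))   # mark -> node id
    # Simplified stand-in for the index metadata block at the list head.
    header = {
        "forward_entries": len(forward),
        "reverse_entries": len(reverse),
        "length": len(forward) + len(reverse),
    }
    return header, forward, reverse
```

Calling `build_index(b"abcdefghij", 4, ["n0", "n1", "n2"])` yields three fragments whose paths start at `n0`, `n1` and `n2` respectively, with the reverse records mirroring the forward ones.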
3. The data security method according to claim 2, wherein the extracting the node distribution topology map in the initial path mapping chain table, obtaining node identifier lists of all storage nodes from the node distribution topology map, performing polling allocation on the data slicing units according to the node sequence in the node identifier list, and generating the storage path node sequence corresponding to each data slicing unit, includes: Reading a topological structure descriptor of a node distribution topological graph from an initial path mapping linked list, wherein the topological structure descriptor comprises node identifiers of all storage nodes and a connection relation matrix between the nodes, and the connection relation matrix between the nodes is used for indicating whether a direct communication path exists between any two storage nodes; Ordering node identifiers of all storage nodes in the node distribution topological graph according to the sequence of adding the nodes into the storage space, and generating a node identifier list, wherein the length of the node identifier list is equal to the total number of the storage nodes; numbering the data slicing units according to the position serial numbers of the data slicing units in the data blocks, sequentially increasing the numbers from the initial serial numbers, dividing the numbers of the data slicing units by the length of the node identifier list to obtain remainder, and selecting the corresponding node identifiers in the node identifier list as the first storage node of the data slicing units according to the remainder; starting from a first storage node, sequentially selecting nodes which are adjacent to the current node and are not used by the current data slicing unit along a connection relation matrix in a node distribution topological graph, and generating a storage path node sequence of the data slicing unit, wherein the length of the storage path node sequence is determined 
by the importance level of the data slicing unit; Arranging the generated storage path node sequences of all the data slicing units according to the serial numbers of the data slicing units to obtain a storage path node sequence set corresponding to each data block, wherein the lengths of each sequence in the storage path node sequence set can be different; Recording the number of hops between adjacent nodes in each storage path node sequence, the number of hops being used to evaluate the cost of reassigning paths in subsequent shard relocation operations, and writing the number of hops to an additional record area of the doubly indexed linked list.
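As a rough illustration of the path generation in claim 3, the sketch below picks the first storage node by taking the fragment number modulo the node count and then walks the connection relation matrix, always moving to the first adjacent node not yet used by the fragment. The function name `build_path`, the rule of taking the lowest-indexed unused neighbour, and the early stop when no unused neighbour exists are assumptions; the claim does not fix these details, and the mapping from importance level to `path_len` is left to the caller.

```python
def build_path(fragment_no: int, node_ids: list[str],
               adjacency: list[list[int]], path_len: int) -> list[str]:
    """Generate a storage path node sequence for one fragment.

    adjacency[i][j] is truthy when nodes i and j have a direct communication path.
    """
    current = fragment_no % len(node_ids)   # remainder selects the first node
    path = [current]
    while len(path) < path_len:
        # First neighbour of the current node that this fragment has not used.
        nxt = next(
            (j for j, linked in enumerate(adjacency[current])
             if linked and j not in path),
            None,
        )
        if nxt is None:
            break   # assumption: stop early when no unused neighbour exists
        path.append(nxt)
        current = nxt
    return [node_ids[i] for i in path]
```

With four nodes `A`, `B`, `C`, `D` connected in a ring, fragment 0 with `path_len=3` gets the path `A, B, C`, and fragment 5 with `path_len=2` starts at `B` because 5 mod 4 is 1.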
4. The data security method of claim 2, wherein forward associating each node identifier in the sequence of storage path nodes with a confusion tag at a corresponding location in the sequence of confusion tags, generating a forward associated record, and writing the forward associated record into a forward index region of the doubly indexed linked list, comprises: The first node identifier is taken out of the node sequence of the storage path, meanwhile, the first confusion mark is taken out of the confusion mark sequence, the node identifier is taken as a key, the confusion mark is taken as a value, and a first key value pair of the forward association record is constructed; according to the sequence of the storage path node sequence and the confusion mark sequence, sequentially forming key value pairs by each node identifier and the confusion mark at the corresponding position, and arranging all the key value pairs according to the sequence to form a forward association record, wherein the length of the forward association record is equal to that of the storage path node sequence; A record identifier is allocated to the forward associated record, wherein the record identifier consists of a block identifier of a data block and a slice identifier of a data slice unit together, and is used for uniquely identifying the forward associated record in a forward index area; Mapping the forward associated record to storage slot bits in a forward index region according to the hash value of the record identifier, the forward index region being divided into a plurality of storage slot bits, each storage slot bit for storing one or more forward associated records; After writing a forward associated record in a storage slot bit of a forward index area, updating an occupation mark of the storage slot bit, wherein the occupation mark is used for indicating whether the storage slot bit is fully written, and if the storage slot bit is fully written, selecting the next free storage 
slot bit to continue writing; After the writing of all forward associated records is completed, writing a check placeholder of the forward index area at the tail part of the forward index area, wherein the check placeholder is used for marking the end position of the forward index area, and recording the start address and the end address of the forward index area at the head part of the bidirectional index linked list.
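The slot mapping of claim 4 resembles a hash table with bounded buckets. The following sketch assumes a record identifier of the form `block_id/fragment_id`, a SHA-256 based hash, and linear probing to find "the next free storage slot"; none of these choices is stated in the claim, and the class name `ForwardIndexRegion` is hypothetical.

```python
import hashlib


class ForwardIndexRegion:
    """Forward index area divided into slots, each holding a few records."""

    def __init__(self, slot_count: int, slot_capacity: int):
        self.slots = [[] for _ in range(slot_count)]
        self.capacity = slot_capacity

    def _home_slot(self, record_id: str) -> int:
        # Map the record identifier to a slot through its hash value.
        digest = hashlib.sha256(record_id.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.slots)

    def is_full(self, slot: int) -> bool:
        # Plays the role of the occupation mark in the claim.
        return len(self.slots[slot]) >= self.capacity

    def write(self, block_id: str, fragment_id: int, pairs: list) -> int:
        """Store (node id, mark) pairs and return the slot that received them."""
        record_id = f"{block_id}/{fragment_id}"
        start = self._home_slot(record_id)
        for step in range(len(self.slots)):
            slot = (start + step) % len(self.slots)   # assumed linear probing
            if not self.is_full(slot):
                self.slots[slot].append((record_id, pairs))
                return slot
        raise RuntimeError("forward index region is full")
```

A region created with `ForwardIndexRegion(slot_count=2, slot_capacity=1)` accepts two records, one per slot, and raises on the third write.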
5. The data security method according to claim 1, wherein the sending the data slicing unit carrying the confusion label sequence to the corresponding storage node according to the storage path node sequence, receiving the storage confirmation credential and the node operation state snapshot returned by each storage node, and updating the node health state record in the bidirectional index linked list according to the node operation state snapshot, includes: reading a storage path node sequence from the bidirectional index linked list, extracting a node identifier of a first storage node and a corresponding confusion mark from the storage path node sequence, packaging a data slicing unit and the confusion mark into a transmission data packet, wherein the transmission data packet comprises an original byte stream of the data slicing unit and a copy of the confusion mark; establishing a transmission connection with the first storage node, sending a transmission data packet to the first storage node through the transmission connection, and waiting for the first storage node to return a receipt confirmation message, wherein the receipt confirmation message is used for confirming that the first storage node has received the transmission data packet; Receiving a storage confirmation credential returned by the first storage node, wherein the storage confirmation credential comprises a node identifier of the first storage node, a receiving time stamp of the data slicing unit and a storage position pointer, and the storage position pointer is used for indicating a storage position of the data slicing unit on the first storage node; receiving a node running state snapshot returned by the first storage node, wherein the node running state snapshot comprises the current load level, the current available storage capacity and the node running time length of the first storage node, and the node running time length is used for evaluating the stability of the node; according to the 
current load level in the node operation state snapshot, updating a load field in a node health state record corresponding to a first storage node in the bidirectional index linked list, updating a capacity field according to the current available storage capacity, and updating a stability field according to the node operation time; And associating the updated node health status record with the storage confirmation certificate, generating a status update entry of the storage node, writing the status update entry into a node status log area of the bidirectional index linked list, wherein the node status log area is used for recording the status change history of each storage node.
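The record update at the end of claim 5 might look like the sketch below. The dataclasses and the function `apply_snapshot` are hypothetical names; the field layout follows the claim (load, capacity and stability, plus a credential with node identifier, receive timestamp and storage position pointer), while the use of a plain list as the node status log area is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class StorageCredential:
    node_id: str
    received_at: float    # receive timestamp of the fragment
    location: str         # storage position pointer on the node


@dataclass
class StateSnapshot:
    load_level: float
    free_capacity: int
    uptime_seconds: float


@dataclass
class NodeHealth:
    load: float = 0.0
    capacity: int = 0
    stability: float = 0.0


def apply_snapshot(health: dict, status_log: list,
                   credential: StorageCredential, snapshot: StateSnapshot) -> dict:
    """Update one node's health record and append a status update entry."""
    record = health.setdefault(credential.node_id, NodeHealth())
    record.load = snapshot.load_level
    record.capacity = snapshot.free_capacity
    record.stability = snapshot.uptime_seconds
    entry = {
        "node_id": credential.node_id,
        "received_at": credential.received_at,
        "location": credential.location,
        "health": replace(record),   # copy, so later updates keep history intact
    }
    status_log.append(entry)
    return entry
```

Storing a copy of the health record in each log entry keeps the state change history readable even after the live record is overwritten by a later snapshot.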
6. The method of claim 5, wherein establishing a transmission connection with the first storage node, sending a transmission data packet to the first storage node via the transmission connection, and waiting for the first storage node to return a receipt acknowledgement message, the receipt acknowledgement message being used to confirm that the first storage node has received the transmission data packet, comprises: extracting a node identifier of a first storage node from the transmission data packet, and searching network address information of the first storage node in the node distribution topological graph according to the node identifier, wherein the network address information comprises a communication port and a protocol type of the first storage node; Initiating a connection establishment request with the first storage node by using the network address information, wherein the connection establishment request comprises a sender identity and a flag bit for requesting to establish connection, and the sender identity is used for enabling the first storage node to identify the source of the request; receiving a connection establishment response returned by the first storage node, wherein the connection establishment response comprises a data transmission channel identifier distributed by the first storage node, and the data transmission channel identifier is used for distinguishing different data transmission sessions on the connection; dividing a transmission data packet into a plurality of transmission fragments through a data transmission channel designated by a data transmission channel identifier, wherein each transmission fragment carries a fragment sequence number and a total fragment number, and sequentially transmitting the transmission fragments to a first storage node; after all the transmission fragments are sent, a transmission end mark is sent to the first storage node, and the transmission end mark comprises a fragment sequence number list of 
all the transmission fragments and the data length of each fragment and is used for the first storage node to check whether all the fragments are received; And receiving a receiving confirmation message returned by the first storage node, wherein the receiving confirmation message comprises a successfully received fragment sequence number list and a missing fragment sequence number list, if the missing fragment sequence number list is not empty, retransmitting a corresponding transmission fragment, and repeating the process until the missing fragment sequence number list in the receiving confirmation message is empty.
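The retransmission loop of claim 6 can be sketched as follows. `FakeNode` is a hypothetical in-memory stand-in for a storage node that loses selected fragments once; real transport, connection setup and channel identifiers are omitted. The `max_rounds` limit is an added safeguard that the claim does not mention.

```python
class FakeNode:
    """In-memory stand-in for a storage node that drops some fragments once."""

    def __init__(self, drop_once):
        self.drop_once = set(drop_once)
        self.received = {}

    def receive(self, seq: int, total: int, data: bytes) -> None:
        if seq in self.drop_once:
            self.drop_once.discard(seq)   # lost on the first attempt only
            return
        self.received[seq] = data

    def end_of_transfer(self, seq_list):
        """Answer the end mark with (received, missing) sequence numbers."""
        got = [s for s in seq_list if s in self.received]
        missing = [s for s in seq_list if s not in self.received]
        return got, missing


def send_packet(node, packet: bytes, fragment_size: int, max_rounds: int = 5) -> int:
    """Send a packet in fragments; resend until nothing is missing.

    Returns the number of rounds that were needed.
    """
    fragments = [packet[o:o + fragment_size]
                 for o in range(0, len(packet), fragment_size)]
    all_seqs = list(range(len(fragments)))
    pending = all_seqs
    for round_no in range(1, max_rounds + 1):
        for seq in pending:
            node.receive(seq, len(fragments), fragments[seq])
        _, missing = node.end_of_transfer(all_seqs)
        if not missing:
            return round_no
        pending = missing   # retransmit only what the node reported missing
    raise TimeoutError("fragments still missing after max_rounds")
```

With `FakeNode({1, 3})` and a ten-byte packet split into fragments of three bytes, the first round loses fragments 1 and 3 and the second round completes the transfer.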
7. The method of claim 6, wherein the splitting the transmission data packet into a plurality of transmission fragments by the data transmission channel specified by the data transmission channel identifier, each transmission fragment carrying a fragment sequence number and a total fragment number, sequentially sending the transmission fragments to the first storage node, comprises: Extracting a starting offset position and an ending offset position of a data slicing unit carried in a transmission data packet, determining a byte interval range of the data slicing unit in an original data block according to the starting offset position and the ending offset position, and writing the byte interval range serving as an identity mark of the transmission data packet into a head of the transmission data packet; Generating a routing label of the transmission data packet according to the identity, wherein the routing label comprises a splicing sequence of a starting position and an ending position of a byte interval range, and binding the routing label with network address information of a first storage node to generate a routing mapping item, and the routing mapping item is used for indicating a forwarding path of the transmission data packet in a node distribution topological graph; Dividing a transmission data packet into a plurality of transmission fragments according to a preset fragment capacity, distributing a fragment sequence number for each transmission fragment, increasing the fragment sequence number from a starting number, and writing the fragment sequence number and a routing label into the fragment head of each transmission fragment together; Sequentially transmitting all transmission fragments through a data transmission channel, starting a fragment receiving waiting timer after each transmission fragment is transmitted, retransmitting the current transmission fragment if receiving confirmation of the fragment returned by the first storage node is not 
received before the fragment receiving waiting timer is overtime, and recording retransmission times in a retransmission count field of the fragment head; After all the transmission fragments are sent, a transmission end mark is sent to a first storage node, wherein the transmission end mark comprises a fragment sequence number list of all the transmission fragments, the byte length of each fragment and a copy of the identity of a transmission data packet; And receiving a receiving confirmation message returned by the first storage node, wherein the receiving confirmation message comprises a successfully received segment sequence number list and a missing segment sequence number list, if the missing segment sequence number list is not empty, retransmitting a corresponding transmission segment according to the missing segment sequence number list, and repeating the process until the missing segment sequence number list in the receiving confirmation message is empty.
8. The method of claim 1, wherein periodically traversing the node health state records in the bidirectional index linked list, performing the fragment relocation operation when a node health state record is detected to be inconsistent with the expected health state pre-stored in the bidirectional index linked list, generating a new storage path node sequence and a new confusion mark sequence, and iteratively replacing the corresponding entries in the bidirectional index linked list, comprises: starting a node health state scanning task at a preset time interval, the scanning task traversing a node state log area in the bidirectional index linked list and extracting the latest node health state record of each storage node, wherein the latest node health state record comprises the current values of a load field, a capacity field and a stability field; reading the expected health state record of each storage node from the head of the bidirectional index linked list, wherein the expected health state records are reference values pre-configured when the confidential storage space is initialized, and the reference values are used for judging whether a node is in a normal working state; comparing the load field in the latest node health state record with the load reference value in the expected health state record, comparing the capacity field with the capacity reference value, and comparing the stability field with the stability reference value, judging that the node health state is consistent if all three fields match their reference values, and otherwise judging that the node health state is inconsistent; when the node health state of a storage node is inconsistent, extracting from the bidirectional index linked list a fragment identifier list of all data fragmentation units held by that storage node, wherein the fragment identifier list is used for determining the set of data fragmentation units that need to be relocated; for each data fragmentation unit in the fragment identifier list, selecting storage nodes other than that storage node from the node distribution topological graph, generating a new storage path node sequence in order of node load level from low to high, and regenerating the confusion mark sequence corresponding to the new storage path node sequence; and writing the new storage path node sequence and the new confusion mark sequence into a temporary replacement area in the bidirectional index linked list and, after all data fragmentation units that need to be relocated have been processed, copying the contents of the temporary replacement area to the corresponding positions of the forward index area and the reverse index area, thereby completing the iterative replacement of the bidirectional index linked list.
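The health check in claim 8 compares three fields against reference values. The claim does not define what it means for a field to "match" its reference, so the sketch below assumes thresholds: load at or below the reference, capacity and stability at or above it. The names `Health`, `is_consistent` and `fragments_to_relocate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Health:
    load: float
    capacity: float
    stability: float


def is_consistent(latest: Health, expected: Health) -> bool:
    # Assumed matching rule; the claim only says the fields are compared.
    return (latest.load <= expected.load
            and latest.capacity >= expected.capacity
            and latest.stability >= expected.stability)


def fragments_to_relocate(latest: dict, expected: dict, holdings: dict) -> dict:
    """Map each inconsistent node to the fragment identifiers it holds."""
    return {
        node: holdings.get(node, [])
        for node, health in latest.items()
        if not is_consistent(health, expected[node])
    }
```

A node is flagged as soon as any one of the three comparisons fails, and every fragment it holds becomes a relocation candidate.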
9. The data security method according to claim 8, wherein for each data shard unit in the shard identifier list, selecting storage nodes other than the storage node from the node distribution topology, generating a new storage path node sequence in order of low-to-high node load level, and regenerating a confusion flag sequence corresponding to the new storage path node sequence, comprises: a fragmentation identifier of a data fragmentation unit is taken out from a fragmentation identifier list, and an original storage path node sequence of the data fragmentation unit is searched in a forward index area of a bidirectional index linked list according to the fragmentation identifier, wherein the original storage path node sequence comprises node identifiers of storage nodes with inconsistent states; Obtaining node load level snapshots of all storage nodes from a node distribution topological graph, wherein the node load level snapshots comprise the current task queue length of each storage node and the current network bandwidth occupancy rate, and the current task queue length is used for representing the busyness of the nodes; Removing storage nodes with inconsistent states from the storage node list, and sequencing the rest storage nodes according to the sequence from small to large of the current task queue length to obtain a sequenced candidate node list, wherein the node with the minimum current task queue length is ranked at the forefront of the candidate node list; Selecting a first storage node from the candidate node list as a first node of a new storage path node sequence, selecting a second storage node from the candidate node list as a second node of the new storage path node sequence, and so on to generate a new storage path node sequence, wherein the length of the new storage path node sequence is the same as that of the original storage path node sequence; Generating a new confusion mark for each node according to the node identifier of each node in 
the new storage path node sequence and the position serial number of the data slicing unit in the data block, wherein the calculation mode of the new confusion mark is consistent with that of the original confusion mark; And sequentially pairing all node identifiers in the new storage path node sequence with corresponding new confusion marks, generating a new forward association record and a new reverse association record, and storing the new forward association record and the new reverse association record in the temporary replacement area.
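The candidate selection in claim 9 amounts to filtering and sorting. In this sketch `queue_len` maps each node identifier to its current task queue length; the function name `new_path` and the error raised when too few healthy nodes remain are assumptions, since the claim does not cover that case.

```python
def new_path(original_path: list[str], unhealthy: set[str],
             queue_len: dict[str, int]) -> list[str]:
    """Build a replacement path of the same length from the least busy nodes."""
    # Drop nodes with inconsistent state, then sort by ascending queue length.
    candidates = sorted(
        (node for node in queue_len if node not in unhealthy),
        key=lambda node: queue_len[node],
    )
    if len(candidates) < len(original_path):
        raise ValueError("not enough healthy nodes for a path of this length")
    # First candidate becomes the first path node, second the second, and so on.
    return candidates[:len(original_path)]
```

For queue lengths `{"A": 5, "B": 1, "C": 3, "D": 2}` with `B` unhealthy and an original path of length two, the new path is `D, C`.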
10. The data security method of claim 9, wherein sequentially pairing all node identifiers in the new sequence of storage path nodes with corresponding new confusion marks, generating a new forward association record and a new reverse association record, and storing the new forward association record and the new reverse association record in the temporary replacement area, comprises: the method comprises the steps of taking a first node identifier from a new storage path node sequence, simultaneously taking a first confusion mark from a new confusion mark sequence, taking the node identifier as a forward index key, and taking the confusion mark as a forward index value, and constructing a first forward entry of a new forward association record; According to the sequence of the new storage path node sequence and the new confusion mark sequence, sequentially forming each node identifier and the confusion mark at the corresponding position into forward entries, and arranging all the forward entries according to the sequence to form a new forward association record, wherein the length of the new forward association record is equal to that of the new storage path node sequence; writing the new forward associated record into a forward replacement storage area of the temporary replacement area, distributing a continuous storage space for the new forward associated record in the forward replacement storage area, and writing a fragmentation identifier of a data fragmentation unit corresponding to the forward associated record in the head of the continuous storage space; Taking out a first confusion mark from the new confusion mark sequence, simultaneously taking out a first node identifier from the new storage path node sequence, taking the confusion mark as a reverse index key and the node identifier as a reverse index value, and constructing a first reverse entry of a new reverse association record; According to the sequence of the new confusion mark sequence and the new 
storage path node sequence, sequentially forming each confusion mark and the node identifier of the corresponding position into reverse entries, and arranging all the reverse entries according to the sequence to form a new reverse association record, wherein the length of the new reverse association record is equal to that of the new confusion mark sequence; Writing the new reverse association record into a reverse replacement storage area of the temporary replacement area, allocating continuous storage space for the new reverse association record in the reverse replacement storage area, and writing a fragmentation identifier of a data fragmentation unit corresponding to the reverse association record into the head of the continuous storage space.
11. A data security device, comprising: a map initialization module, used for initializing a node distribution topological graph of a confidential storage space, generating a first storage path set according to the node connection states in the node distribution topological graph, and constructing an initial path mapping linked list of the storage space based on the first storage path set; a fragmentation confusion module, used for performing a fragmentation confusion operation on a data set to be stored, splitting each data block into a plurality of data fragmentation units, allocating a storage path node sequence and a confusion mark sequence to each data fragmentation unit according to the initial path mapping linked list, and establishing a bidirectional index linked list between the storage path node sequence and the confusion mark sequence; a state updating module, used for sending the data fragmentation units carrying the confusion mark sequences to the corresponding storage nodes according to the storage path node sequences, receiving the storage confirmation certificate and the node running state snapshot returned by each storage node, and updating the node health state records in the bidirectional index linked list according to the node running state snapshots; an iteration updating module, used for periodically traversing the node health state records in the bidirectional index linked list and, when a node health state record is detected to be inconsistent with the expected health state pre-stored in the bidirectional index linked list, performing a fragment relocation operation, generating a new storage path node sequence and a new confusion mark sequence, and iteratively replacing the corresponding entries in the bidirectional index linked list; and a certificate generation module, used for collecting all storage confirmation certificates and node running state snapshots, generating an audit certificate sequence of the storage operation according to the change time axis of the node health state records, and storing the audit certificate sequence in association with the bidirectional index linked list.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.

Description

Data confidentiality method, device and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a data security method, device and storage medium.

Background

At present, common secure data storage techniques generally either store the data as a whole in a single storage node or store it across a plurality of storage nodes after a simple split; the write operation is completed along a preset storage path, and the storage location information is recorded once storage is complete. However, storage paths in this approach are relatively fixed: when the running state of the storage nodes fluctuates, the distribution of the stored data is difficult to adjust in time, and there is no effective mechanism for recording and tracing the association between node state changes and data fragments during storage, so the integrity and auditability of the storage operation are difficult to guarantee.

Disclosure of Invention

In view of the above, the present invention provides a data security method, device and storage medium.
The technical scheme of the embodiment of the invention is realized as follows: in one aspect, an embodiment of the present invention provides a data security method, including: Initializing a node distribution topological graph of the secret storage space, generating a first storage path set according to a node connection state in the node distribution topological graph, and constructing an initial path mapping linked list of the storage space based on the first storage path set; Executing a fragmentation confusion operation on a data set to be stored, splitting each data block into a plurality of data fragmentation units, distributing a storage path node sequence and a confusion mark sequence for each data fragmentation unit according to an initial path mapping linked list, and establishing a bidirectional index linked list between the storage path node sequence and the confusion mark sequence; Transmitting the data slicing units carrying the confusion mark sequences to the corresponding storage nodes according to the storage path node sequences, receiving the storage confirmation certificates and the node running state snapshots returned by each storage node, and updating the node health state records in the bidirectional index linked list according to the node running state snapshots; Periodically traversing the node health state records in the bidirectional index linked list, executing fragment repositioning operation when detecting that the node health state records are inconsistent with the pre-stored expected health states in the bidirectional index linked list, generating a new storage path node sequence and a new confusion mark sequence, and iteratively replacing corresponding entries in the bidirectional index linked list; Collecting all storage confirmation certificates and node running state snapshots, generating an audit certificate sequence of storage operation according to a change time axis of the node health state record, and storing the audit 
certificate sequence in association with a bidirectional index linked list. In another aspect, an embodiment of the present invention provides a data security device, including: a map initialization module, used for initializing a node distribution topological graph of the secret storage space, generating a first storage path set according to a node connection state in the node distribution topological graph, and constructing an initial path mapping linked list of the storage space based on the first storage path set; a fragmentation confusion module, used for executing a fragmentation confusion operation on a data set to be stored, splitting each data block into a plurality of data fragmentation units, distributing a storage path node sequence and a confusion mark sequence for each data fragmentation unit according to the initial path mapping linked list, and establishing a bidirectional index linked list between the storage path node sequence and the confusion mark sequence; a state updating module, used for sending the data slicing units carrying the confusion mark sequences to the corresponding storage nodes according to the storage path node sequences, receiving the storage confirmation certificates and the node running state snapshots returned by each storage node, and updating the node health state records in the bidirectional index linked list according to the node running state snapshots; an iteration updating module, used for periodically traversing the node health state records in the bidirectional index linked list, executing the fragment repositioning operation when detecting that the node health state reco