CN-121983106-A - Method and device for compensating read interference of solid state disk based on deep reinforcement learning


Abstract

The invention provides a method and a device for compensating read interference of a solid state disk based on deep reinforcement learning. A read operation event stream record is acquired. A read interference propagation influence map is constructed by combining the physical block topology with the historical compensation records, and a read interference evolution state representation sequence is generated. This sequence is input into a recurrent state inference network to generate a latent interference propagation state representation, which is then input into an action policy network. The action policy network makes joint decisions to generate a compensation adjustment instruction sequence comprising voltage threshold adjustments and read operation scheduling offsets. The compensation adjustment is executed, response data are collected, a reward evaluation signal is generated, and the resulting experience sample is stored in an experience replay buffer. The method can adaptively learn the optimal compensation strategy, effectively suppress the accumulation of voltage offset caused by read interference, and improve the read reliability and service life of the solid state disk while reducing the cost of compensation operations.

Inventors

YU KAI
LI CHUPENG
ZOU SAI
HU XIAONAN

Assignees

Guizhou University (贵州大学)
Shenzhen Dacheng Technology Co., Ltd. (深圳市大乘科技股份有限公司)

Dates

Publication Date: 20260505
Application Date: 20260407

Claims (10)

1. A method for compensating read interference of a solid state disk based on deep reinforcement learning, characterized by comprising the following steps: acquiring a read operation event stream record of a plurality of physical blocks in the solid state disk over a continuous read operation period, wherein the read operation event stream record comprises, for each read operation, a physical block identifier, a read operation timestamp and a read operation voltage threshold offset record; constructing a read interference propagation influence map according to the read operation event stream record and the physical block topology of the solid state disk, and generating a read interference evolution state representation sequence for each physical block according to the read interference propagation influence map and a historical compensation operation record, wherein the read interference evolution state representation sequence comprises a predicted voltage offset trajectory of each physical block in the uncompensated state and voltage offset response feedback information following historical compensation operations; inputting the read interference evolution state representation sequence into a recurrent state inference network of a deep reinforcement learning model, and performing temporal correlation modeling on the read interference evolution state representation sequence of each physical block through a gated state update mechanism of the recurrent state inference network, to generate a latent interference propagation state representation of each physical block at the current moment; inputting the latent interference propagation state representation into an action policy network of the deep reinforcement learning model, and performing joint decision processing of compensation actions for each physical block group through a multi-branch decision structure of the action policy network, to generate a compensation adjustment instruction sequence containing voltage threshold adjustment instructions and read operation scheduling offset instructions; and executing the read operation compensation adjustment of the solid state disk according to the compensation adjustment instruction sequence, collecting voltage threshold offset response observation data of each physical block in the subsequent read operation period after the compensation adjustment, generating a reward evaluation signal according to the difference between the voltage threshold offset response observation data and the predicted trajectory in the read interference evolution state representation sequence, and combining the reward evaluation signal, the read interference evolution state representation sequence and the compensation adjustment instruction sequence into a new experience sample that is stored in an experience replay buffer.
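The last step of claim 1 (reward generation and storage in the experience playback memory bank, i.e. an experience replay buffer) can be illustrated with a minimal Python sketch. The claim only says the reward depends on the difference between the observed offsets and the uncompensated prediction, and the abstract mentions reducing compensation cost; the concrete reward formula `reward_signal`, the per-instruction `cost_per_adjust` term, and the `Experience` layout are hypothetical choices made for illustration, not the patent's definitions.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence, Tuple


@dataclass
class Experience:
    """One experience sample (hypothetical layout)."""
    state_sequence: List[List[float]]           # read interference evolution state representation sequence
    instructions: List[Tuple[int, float, int]]  # (block_id, voltage_adjust, priority_offset)
    reward: float


def reward_signal(observed: Sequence[float], predicted: Sequence[float],
                  num_adjusted: int, cost_per_adjust: float = 0.01) -> float:
    """Hypothetical reward: mean reduction of |offset| relative to the uncompensated
    predicted trajectory, minus a cost proportional to the number of adjustments."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    gain = sum(abs(p) - abs(o) for o, p in zip(observed, predicted)) / len(observed)
    return gain - cost_per_adjust * num_adjusted


class ReplayBuffer:
    """Fixed-capacity FIFO experience replay buffer with uniform sampling."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest sample once full
        self._items: Deque[Experience] = deque(maxlen=capacity)

    def push(self, exp: Experience) -> None:
        self._items.append(exp)

    def sample(self, batch_size: int) -> List[Experience]:
        # Uniform sampling without replacement, capped at the current size
        return random.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)
```

For example, observed offsets `[0.10, 0.05]` against a predicted uncompensated trajectory `[0.30, 0.25]` with two adjustments give a reward of about 0.18. A bounded FIFO buffer is the simplest replay design; prioritized replay would be an equally plausible reading of the claim.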
2. The method according to claim 1, wherein constructing the read interference propagation influence map according to the read operation event stream record and the physical block topology of the solid state disk, and generating the read interference evolution state representation sequence of each physical block according to the read interference propagation influence map and the historical compensation operation record, comprises: parsing the read operation timestamp sequence of each physical block in the read operation event stream record, identifying dense read operation intervals and sparse read operation intervals in the timestamp sequence, and generating a read operation burst mode descriptor for each physical block according to the alternating distribution pattern of the dense and sparse read operation intervals; extracting the physical adjacency relations between different physical blocks from the physical block topology of the solid state disk, and determining read operation frequency interaction coefficients between the physical blocks according to the physical adjacency relations and the read operation burst mode descriptors of the physical blocks, wherein each read operation frequency interaction coefficient comprises an influence direction identifier and an influence intensity level identifier; constructing, according to the read operation frequency interaction coefficients and the physical adjacency relations between the physical blocks, a directed graph structure that takes the physical blocks as nodes and the read operation frequency interaction coefficients as edge weights, and taking the directed graph structure as the read interference propagation influence map; extracting, from the historical compensation operation record, measured voltage threshold offset values of each physical block at a plurality of historical moments together with the corresponding compensation operation records, and storing each measured value in association with its compensation operation record to form a historical compensation experience entry set; according to the directed graph structure in the read interference propagation influence map, selecting from the historical compensation experience entry set the entries of other physical blocks that have a propagation association with the current physical block, aligning the selected entries in time with the entries of the current physical block, and generating a read interference evolution state representation sequence of each physical block over a continuous time window, wherein each state node in the sequence comprises the measured voltage offset of the current physical block, the voltage offset influence value propagated from adjacent physical blocks, and the corresponding compensation operation record; and performing temporal smoothing on the read interference evolution state representation sequence, realigning the voltage offset influence values in the state nodes of adjacent physical blocks by their propagation delay according to the propagation associations in the directed graph structure to generate an aligned state node sequence with a unified time reference, and updating the confidence weight of the voltage offset response feedback information in the read interference evolution state representation sequence of each physical block according to the ratio between the measured voltage offset and the propagated voltage offset influence value of each state node in the aligned state node sequence.
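The delay realignment and confidence update at the end of claim 2 can be sketched as follows, under several assumptions not stated in the claim: all offsets are sampled on a common integer time grid, each incoming edge carries a weight and an integer propagation delay in steps, the propagated influence is the weighted, delay-shifted offset of each upstream block, and the confidence weight is the hypothetical function |m| / (|m| + |p|) of the measured-to-propagated ratio. The temporal smoothing step is omitted, and `aligned_state_nodes` and `confidence_weight` are illustrative names only.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: per-block series on a shared time grid, and incoming edges
# {target_block: [(source_block, weight, delay_steps), ...]}.
Series = Dict[int, List[float]]
Edges = Dict[int, List[Tuple[int, float, int]]]


def aligned_state_nodes(block: int, measured: Series, edges: Edges,
                        compensations: Series) -> List[Tuple[float, float, float]]:
    """Build (measured, propagated_influence, compensation) state nodes for one block,
    shifting each upstream source by its propagation delay before weighting it."""
    own = measured[block]
    nodes = []
    for t in range(len(own)):
        influence = 0.0
        for src, weight, delay in edges.get(block, []):
            if t - delay >= 0:  # only source samples that have already reached `block`
                influence += weight * measured[src][t - delay]
        nodes.append((own[t], influence, compensations[block][t]))
    return nodes


def confidence_weight(nodes: List[Tuple[float, float, float]], eps: float = 1e-9) -> float:
    """Hypothetical confidence: average share of the measured offset that is not
    explained by propagated influence, |m| / (|m| + |p|) = 1 / (1 + |p|/|m|)."""
    shares = [abs(m) / (abs(m) + abs(p) + eps) for m, p, _ in nodes]
    return sum(shares) / len(shares) if shares else 0.0
```

With this choice, a block whose measured offset is largely accounted for by its neighbours receives a lower confidence in its own compensation feedback. Other monotone functions of the same ratio would serve equally well.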
3. The method according to claim 2, wherein parsing the read operation timestamp sequence of each physical block in the read operation event stream record and identifying the dense read operation intervals and the sparse read operation intervals in the timestamp sequence comprises: extracting the read operation timestamp sequence from the read operation event stream record of each physical block, and differencing the timestamp sequence to obtain a sequence of time intervals between adjacent read operations, each interval value being the time difference between two adjacent read operations; determining a dense read operation threshold and a sparse read operation threshold according to the statistical distribution of the time interval sequence, classifying adjacent read operations whose interval value is not greater than the dense read operation threshold as consecutive read operation pairs within a dense read operation interval, and marking idle gaps between adjacent read operations whose interval value is not less than the sparse read operation threshold as sparse read operation intervals; performing connectivity merging on the consecutive read operation pairs, merging a run of consecutive read operations whose adjacent intervals all satisfy the dense condition into a single dense read operation interval, and generating, as dense read operation interval attributes, the start timestamp, the end timestamp and the number of read operations of each dense read operation interval; identifying the gaps between adjacent dense read operation intervals, classifying each gap longer than the sparse read operation threshold as a sparse read operation interval, and generating, as sparse read operation interval attributes, the idle start timestamp, the idle end timestamp and the duration of each sparse read operation interval; arranging the dense and sparse read operation intervals alternately in chronological order according to their attributes to generate the read operation burst mode descriptor of each physical block, wherein the descriptor comprises an alternating sequence of the positions of the dense read operation intervals and the positions of the sparse read operation intervals; and inputting the read operation burst mode descriptors of all physical blocks into a burst mode clusterer, clustering the descriptors of different physical blocks according to the distribution of the start timestamps of their dense read operation intervals and the similarity of the number of read operations within those intervals, generating physical block group identifiers for physical blocks with similar read operation burst modes, and storing each physical block group identifier in association with the corresponding read operation burst mode descriptor.
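The interval analysis of claim 3 can be sketched for a single physical block in a few lines of Python. The claim derives both thresholds from "the statistical distribution" of the gaps without naming a statistic, so the defaults here (dense threshold = median gap, sparse threshold = 4 × median gap) are hypothetical. Isolated reads that belong to no dense run are simply dropped, and the final clustering step is not shown. `burst_descriptor` is an illustrative name.

```python
import statistics
from typing import List, Optional, Tuple

Interval = Tuple[str, float, float, float]  # (kind, start, end, read_count_or_duration)


def burst_descriptor(ts: List[float], dense_thr: Optional[float] = None,
                     sparse_thr: Optional[float] = None) -> List[Interval]:
    """Return a time-ordered list of dense and sparse intervals for one block.

    Dense:  ("dense", start_ts, end_ts, read_count)
    Sparse: ("sparse", idle_start, idle_end, duration)
    """
    ts = sorted(ts)
    gaps = [b - a for a, b in zip(ts, ts[1:])]  # differencing step
    if not gaps:
        return []
    med = statistics.median(gaps)
    dense_thr = med if dense_thr is None else dense_thr           # hypothetical default
    sparse_thr = 4.0 * med if sparse_thr is None else sparse_thr  # hypothetical default

    # Connectivity merging: maximal runs of reads whose gaps are all <= dense_thr.
    dense: List[Tuple[float, float, int]] = []
    run_start = 0
    for i, g in enumerate(gaps + [float("inf")]):  # sentinel closes the final run
        if g > dense_thr:
            if i > run_start:  # run spans reads run_start..i (at least two reads)
                dense.append((ts[run_start], ts[i], i - run_start + 1))
            run_start = i + 1

    # Sparse intervals: idle gaps between consecutive dense intervals longer than sparse_thr.
    out: List[Interval] = []
    for k, (s, e, n) in enumerate(dense):
        out.append(("dense", s, e, float(n)))
        if k + 1 < len(dense):
            nxt = dense[k + 1][0]
            if nxt - e > sparse_thr:
                out.append(("sparse", e, nxt, nxt - e))
    return out
```

For timestamps `[0, 1, 2, 10, 11, 12, 13, 30]`, this yields a dense interval 0 to 2 (3 reads), a sparse gap 2 to 10, and a dense interval 10 to 13 (4 reads). The trailing read at 30 forms no dense run and is ignored.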
4. The method according to claim 2, wherein constructing, according to the read operation frequency interaction coefficients and the physical adjacency relations between the physical blocks, the directed graph structure that takes the physical blocks as nodes and the read operation frequency interaction coefficients as edge weights as the read interference propagation influence map comprises: determining a spatial adjacency level between physical blocks according to their physical adjacency relations, and marking each physical block pair whose adjacency level exceeds a preset adjacency threshold as a candidate propagation physical block pair, wherein each candidate propagation physical block pair comprises a source physical block identifier and a target physical block identifier; for each candidate propagation physical block pair, acquiring the read operation burst mode descriptors of the source physical block and of the target physical block, and calculating an intensity reference value of the read operation frequency interaction coefficient according to the degree of overlap on the time axis between the dense read operation intervals of the source physical block and those of the target physical block; determining the influence direction identifier of the read operation frequency interaction coefficient according to the relative positions of the source and target physical blocks in the physical topology, wherein the influence direction identifier indicates that the interference propagates unidirectionally from the source physical block to the target physical block; combining the intensity reference value with the influence direction identifier to generate a directed read operation frequency interaction coefficient for each candidate propagation physical block pair, wherein the value of the coefficient reflects the degree to which read operations on the source physical block affect the voltage threshold offset of the target physical block; constructing the directed graph structure as the read interference propagation influence map, with all physical blocks as the node set, the candidate propagation physical block pairs as the directed edge set, and the read operation frequency interaction coefficients as the directed edge weights; and performing hierarchical clustering on the nodes of the directed graph structure, dividing the physical blocks into different interference propagation levels according to the distribution of directed edge weights between nodes, generating inter-level interference propagation direction identifiers and inter-level transfer delay parameters, and storing the inter-level transfer delay parameters in the read interference propagation influence map as additional attributes of the directed edges.
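A minimal sketch of the graph construction in claim 4 follows, with several stated assumptions. Candidate pairs are taken as already oriented source to target, so the positional direction rule is applied upstream. The adjacency level is a number in [0, 1]. The edge weight is the hypothetical formula adjacency × (overlap of dense intervals / total dense time of the source). In place of the claim's unspecified hierarchical clustering, it uses a plainly different technique: each node's propagation level is the length of the longest path reaching it, computed with Kahn's topological ordering on an acyclic graph. Transfer delay parameters are not modelled.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Span = Tuple[float, float]  # one dense read interval (start, end)


def overlap(a: List[Span], b: List[Span]) -> float:
    """Total time during which dense intervals of a and b coincide."""
    total = 0.0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0.0, min(e1, e2) - max(s1, s2))
    return total


def build_graph(dense: Dict[int, List[Span]],
                pairs: List[Tuple[int, int, float]]) -> Dict[int, Dict[int, float]]:
    """pairs: (source, target, adjacency_level), already oriented source -> target.
    Edge weight = adjacency_level * overlap / source dense time (hypothetical)."""
    graph: Dict[int, Dict[int, float]] = {b: {} for b in dense}
    for src, tgt, adj in pairs:
        src_time = sum(e - s for s, e in dense[src])
        if src_time > 0:
            graph[src][tgt] = adj * overlap(dense[src], dense[tgt]) / src_time
    return graph


def propagation_levels(graph: Dict[int, Dict[int, float]]) -> Dict[int, int]:
    """Level = length of the longest path reaching each node (Kahn's algorithm).
    Assumes the graph is acyclic."""
    indeg: Dict[int, int] = defaultdict(int)
    for src in graph:
        for tgt in graph[src]:
            indeg[tgt] += 1
    level = {n: 0 for n in graph}
    queue = deque(n for n in graph if indeg[n] == 0)
    while queue:
        n = queue.popleft()
        for m in graph[n]:
            level[m] = max(level[m], level[n] + 1)
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return level
```

For a chain of three blocks whose dense intervals partly overlap, this produces levels 0, 1 and 2 along the propagation direction. A real implementation would also need a policy for cycles, which longest-path levels cannot handle.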
5. The method according to claim 1, wherein inputting the read interference evolution state representation sequence into the recurrent state inference network of the deep reinforcement learning model, performing temporal correlation modeling on the read interference evolution state representation sequence of each physical block through the gated state update mechanism of the recurrent state inference network, and generating the latent interference propagation state representation of each physical block at the current moment comprises: arranging the state nodes of each physical block in the read interference evolution state representation sequence over a continuous time window in chronological order to obtain a state node time-series chain for each physical block, wherein each state node in the chain comprises the measured voltage offset of the current physical block, the voltage offset influence value propagated from adjacent physical blocks, and the corresponding compensation operation record; inputting the state node time-series chains of all physical blocks in parallel into an input layer of the recurrent state inference network, wherein the input layer allocates an independent input channel to each physical block and the input channels share parameters, so as to extract temporal evolution patterns common to different types of physical blocks; performing, by a gated state update unit of the recurrent state inference network, step-by-step recursive state computation over the state node time-series chain of each physical block, wherein at each time step the gated state update unit computes a reset gate signal and an update gate signal from the input state node of the current time step and the hidden state output at the previous time step, selectively forgets the previous hidden state and fuses the current input information according to the reset and update gate signals, and generates the hidden state output at the current time step as the intermediate state representation of the physical block at the current moment; concatenating the hidden states output by each physical block at the last time step to obtain a state tensor of uniform dimension, inputting the state tensor into an output projection layer of the recurrent state inference network, and applying a nonlinear transformation through the output projection layer to generate the latent interference propagation state representation of each physical block at the current moment; and inputting the latent interference propagation state representation into a feedback regulation unit of the recurrent state inference network, which generates a state change trend vector from the change between each physical block's latent interference propagation state representation at the current moment and the one output at the previous moment, fuses the state change trend vector with the current latent interference propagation state representation of each physical block, and outputs the fused representation.
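The parameter sharing and gated recursion of claim 5 can be illustrated with a dependency-free Python sketch. A single GRU cell, standing in for the claim's gated state update unit, is applied to every physical block's state node chain with the same weights, and the hidden state at the last step is returned. The weight initialisation, the omission of bias terms and of the output projection layer, and the feedback rule `feedback_fuse` (current + α × (current − previous)) are all hypothetical simplifications.

```python
import math
import random
from typing import List

Vec = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(w: List[Vec], v: Vec) -> Vec:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class SharedGRU:
    """One GRU cell whose weights are shared by every physical block's input channel."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        cols = input_dim + hidden_dim

        def init() -> List[Vec]:
            return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(hidden_dim)]

        self.w_r, self.w_z, self.w_h = init(), init(), init()  # biases omitted for brevity
        self.hidden_dim = hidden_dim

    def step(self, x: Vec, h: Vec) -> Vec:
        xh = x + h                                        # concatenate [x_t, h_{t-1}]
        r = [_sigmoid(a) for a in _matvec(self.w_r, xh)]  # reset gate
        z = [_sigmoid(a) for a in _matvec(self.w_z, xh)]  # update gate
        cand_in = x + [ri * hi for ri, hi in zip(r, h)]   # [x_t, r_t * h_{t-1}]
        h_cand = [math.tanh(a) for a in _matvec(self.w_h, cand_in)]
        # z_t weights the candidate state; (1 - z_t) keeps the previous state
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

    def run(self, chain: List[Vec]) -> Vec:
        h = [0.0] * self.hidden_dim
        for x in chain:  # recurse over time steps
            h = self.step(x, h)
        return h         # hidden state at the last time step


def feedback_fuse(current: Vec, previous: Vec, alpha: float = 0.5) -> Vec:
    """Hypothetical feedback regulation: add a scaled trend vector (current - previous)."""
    return [c + alpha * (c - p) for c, p in zip(current, previous)]
```

Each state node here is a 3-vector (measured offset, propagated influence, compensation value). Calling `gru.run(chain)` once per block with the same `gru` object is what "independent channels with shared parameters" amounts to in this sketch.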
6. The method according to claim 5, wherein performing, by the gated state update unit of the recurrent state inference network, step-by-step recursive state computation over the state node time-series chain of each physical block comprises: concatenating the input state node of the current time step with the hidden state output at the previous time step to generate a concatenated state vector, and inputting the concatenated state vector into a reset gate computing unit, which applies a linear transformation followed by an activation function to generate a reset gate signal, wherein the reset gate signal controls the proportion of information forgotten from the hidden state of the previous time step; inputting the concatenated state vector into an update gate computing unit, which applies a linear transformation followed by an activation function to generate an update gate signal, wherein the update gate signal controls the proportion in which current input information and historical information are fused in the state update; resetting the hidden state output at the previous time step according to the reset gate signal, concatenating the reset hidden state with the input state node of the current time step, and inputting the concatenated vector into a candidate state computing unit, which applies a linear transformation followed by an activation function to generate a candidate hidden state; computing a weighted sum of the hidden state output at the previous time step and the candidate hidden state according to the update gate signal to generate the hidden state output at the current time step, wherein the update gate signal serves as the weight coefficient controlling the contribution of the candidate hidden state; passing the hidden state output at the current time step to the next time step as input to the gated state update unit, and outputting it to an output layer of the recurrent state inference network for subsequent processing; and grouping the hidden states of all physical blocks output at the current time step according to the propagation level of each physical block in the read interference propagation influence map, aggregating the hidden states of the physical blocks within the same propagation level to generate a level aggregation state vector, and broadcasting the level aggregation state vector back to all physical blocks of the corresponding propagation level as an additional hidden state input.
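The gate computations in claim 6 match the structure of a standard gated recurrent unit (GRU). As a sketch only, the per-step computation for one physical block can be written as below. This assumes a sigmoid σ for the reset and update gates, tanh for the candidate state, bias vectors b (which the claim does not mention), and a plain mean as the level aggregation (the claim does not name the aggregation operation). In the notation, x_t is the input state node, h_{t-1} the previous hidden state, [· ; ·] concatenation, ⊙ the element-wise product, and B_ℓ the set of physical blocks at propagation level ℓ.

```latex
\begin{aligned}
\mathbf{r}_t &= \sigma\!\left(W_r\,[\mathbf{x}_t;\,\mathbf{h}_{t-1}] + \mathbf{b}_r\right) \\
\mathbf{z}_t &= \sigma\!\left(W_z\,[\mathbf{x}_t;\,\mathbf{h}_{t-1}] + \mathbf{b}_z\right) \\
\tilde{\mathbf{h}}_t &= \tanh\!\left(W_h\,[\mathbf{x}_t;\,\mathbf{r}_t \odot \mathbf{h}_{t-1}] + \mathbf{b}_h\right) \\
\mathbf{h}_t &= (1 - \mathbf{z}_t) \odot \mathbf{h}_{t-1} + \mathbf{z}_t \odot \tilde{\mathbf{h}}_t \\
\bar{\mathbf{h}}^{(\ell)}_t &= \frac{1}{|\mathcal{B}_\ell|} \sum_{i \in \mathcal{B}_\ell} \mathbf{h}^{(i)}_t
\end{aligned}
```

The level aggregate is then broadcast back to every block i in B_ℓ as an additional hidden state input. Some GRU formulations swap the roles of z_t and 1 − z_t. The form above follows the claim, in which the update gate is the weight on the candidate hidden state.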
7. The method according to claim 5, wherein concatenating the hidden states output by each physical block at the last time step to obtain a state tensor of uniform dimension and inputting the state tensor into the output projection layer of the recurrent state inference network comprises: extracting the hidden state output by each physical block at the last time step of its state node time-series chain, wherein this hidden state aggregates all of the block's temporal evolution information from the historical start moment to the current moment; concatenating the last-step hidden states of all physical blocks in the spatial order of the physical blocks in the read interference propagation influence map to generate an initial state tensor having a spatial dimension and a feature dimension, wherein each spatial position of the initial state tensor corresponds to the hidden state vector of one physical block; inputting the initial state tensor into a spatial feature extraction unit of the output projection layer, which extracts spatial neighborhood correlation features between physical blocks through a convolution operation to generate a spatially enhanced state tensor; inputting the spatially enhanced state tensor into a nonlinear mapping unit of the output projection layer, which applies a channel-wise feature transformation through a multi-layer fully connected structure to generate an intermediate latent interference propagation state representation; inputting the intermediate latent interference propagation state representation into a temporal smoothing unit of the output projection layer, which performs weighted moving-average processing on the intermediate representation at the current moment and the latent interference propagation state representation output at the previous moment to generate the latent interference propagation state representation of each physical block at the current moment; and setting the output dimension of the latent interference propagation state representation to match the input dimension of the action policy network, so that the representation can be used directly as input data for the input layer of the action policy network.
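Two of the output projection stages in claim 7, the spatial convolution over neighbouring blocks and the weighted moving-average smoothing, can be sketched in plain Python. The sketch assumes the blocks are laid out along a single spatial axis in map order, uses a hypothetical depthwise kernel shared by all feature channels with zero padding, and uses a hypothetical smoothing factor `beta`. The multi-layer fully connected mapping between the two stages is omitted. As in most deep-learning libraries, the "convolution" is implemented as cross-correlation, which makes no difference for the symmetric kernel used here.

```python
from typing import List

Tensor2D = List[List[float]]  # [num_blocks][feature_dim], blocks in map spatial order


def spatial_conv(x: Tensor2D, kernel: List[float]) -> Tensor2D:
    """Depthwise 1-D convolution along the block axis with zero padding (odd kernel)."""
    half = len(kernel) // 2
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            j = i + k - half  # neighbouring block index
            if 0 <= j < n:
                for f in range(d):
                    out[i][f] += w * x[j][f]
    return out


def temporal_smooth(current: Tensor2D, previous: Tensor2D, beta: float = 0.8) -> Tensor2D:
    """Weighted moving average: beta * current + (1 - beta) * previous."""
    return [[beta * c + (1 - beta) * p for c, p in zip(rc, rp)]
            for rc, rp in zip(current, previous)]
```

A larger `beta` makes the latent representation track the newest hidden states more closely, while a smaller one damps step-to-step jitter before the representation reaches the action policy network.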
8. The method according to claim 1, wherein inputting the latent interference propagation state representation into the action policy network of the deep reinforcement learning model, performing joint decision processing of compensation actions for each physical block group through the multi-branch decision structure of the action policy network, and generating the compensation adjustment instruction sequence containing voltage threshold adjustment instructions and read operation scheduling offset instructions comprises: inputting the latent interference propagation state representation into a shared feature extraction backbone of the action policy network, wherein the backbone comprises a plurality of convolution layers and attention layers and extracts the spatial distribution pattern and the inter-block correlation pattern in the latent interference propagation state representation to generate a shared feature representation; inputting the shared feature representation into a voltage adjustment branch network of the action policy network, which generates a suggested voltage threshold adjustment amplitude according to the interference propagation state of each physical block in the shared feature representation, and storing each suggested value in association with its physical block identifier to form a voltage threshold adjustment instruction set; inputting the shared feature representation into a scheduling offset branch network of the action policy network, which generates a global scheduling offset strategy according to the distribution of interference propagation states over all physical blocks in the shared feature representation, wherein the global scheduling offset strategy comprises the priority offset of each physical block in the read operation queue and a read operation execution order rearrangement instruction; fusing the voltage threshold adjustment instruction set with the global scheduling offset strategy by pairing, per physical block identifier, each voltage threshold adjustment instruction with the corresponding read operation scheduling priority offset, to generate a compensation adjustment instruction item for each physical block; arranging the compensation adjustment instruction items of all physical blocks in the order of their propagation levels in the read interference propagation influence map to generate the compensation adjustment instruction sequence, wherein each item comprises a target physical block identifier, a voltage threshold adjustment amplitude and a read operation scheduling priority offset; and inputting the compensation adjustment instruction sequence into an instruction conflict detection unit of the action policy network, which detects whether a plurality of compensation adjustment instruction items targeting the same physical block group carry conflicting voltage threshold adjustment amplitudes and, if so, combines the conflicting amplitudes by weighting them with the propagation influence weights of the physical blocks in the read interference propagation influence map, to generate a conflict-free compensation adjustment instruction sequence.
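The post-processing steps of claim 8 (pairing the two branch outputs per block, ordering by propagation level, and weighted conflict resolution) can be sketched as plain data manipulation. The `Instruction` type and both function names are hypothetical. The claim does not define what counts as a conflict, so this sketch assumes that items within one physical block group conflict when their amplitudes differ, and that every item in such a group receives the influence-weighted mean amplitude. Propagation influence weights are assumed to be positive.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Instruction:
    block_id: int
    voltage_step: float   # voltage threshold adjustment amplitude
    priority_offset: int  # read-queue scheduling priority offset


def fuse_and_order(voltage: Dict[int, float], priority: Dict[int, int],
                   level: Dict[int, int]) -> List[Instruction]:
    """Pair the two branch outputs per block id, then order by propagation level."""
    items = [Instruction(b, v, priority.get(b, 0)) for b, v in voltage.items()]
    return sorted(items, key=lambda it: (level[it.block_id], it.block_id))


def resolve_conflicts(seq: List[Instruction], group: Dict[int, int],
                      influence: Dict[int, float], tol: float = 1e-6) -> List[Instruction]:
    """If items in one block group disagree on amplitude, give every item in that
    group the influence-weighted mean amplitude (hypothetical conflict rule)."""
    members: Dict[int, List[Instruction]] = {}
    for it in seq:
        members.setdefault(group[it.block_id], []).append(it)
    merged: Dict[int, float] = {}
    for g, items in members.items():
        values = [it.voltage_step for it in items]
        if max(values) - min(values) > tol:  # conflicting amplitudes in this group
            w = [influence[it.block_id] for it in items]
            merged[g] = sum(wi * v for wi, v in zip(w, values)) / sum(w)
    return [replace(it, voltage_step=merged[group[it.block_id]])
            if group[it.block_id] in merged else it
            for it in seq]
```

Because the items are frozen dataclasses, conflict resolution returns new instruction objects rather than mutating the sequence produced by the policy network.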
9. The method according to claim 8, wherein inputting the shared feature representation into the voltage adjustment branch network of the action policy network comprises: inputting the shared feature representation into a block-level feature extraction layer of the voltage adjustment branch network, which performs independent feature extraction on the feature region corresponding to each physical block in the shared feature representation to generate a feature vector for each physical block, wherein the feature vector encodes the current interference propagation state of that physical block; configuring an independent action output head for the feature vector of each physical block, wherein the action output head comprises a multi-layer perceptron that applies layer-by-layer nonlinear mapping to the input feature vector to generate a probability distribution over the candidate set of voltage threshold adjustment amplitudes for that physical block; sampling from the probability distribution over the candidate set of each physical block according to the probability values to obtain a recommended voltage threshold adjustment amplitude for each physical block, and converting the recommended value into a voltage adjustment gear code recognizable by the voltage control unit; dividing the physical blocks into a plurality of physical block groups according to their propagation levels in the read interference propagation influence map, generating recommended voltage threshold adjustment amplitudes for the physical blocks in each group, and performing propagation influence consistency adjustment on the recommended values within the same group, wherein, if the consistency adjustment finds that the recommended value of an upstream physical block in a group is larger than that of a downstream physical block in the same group, the recommended values are reallocated along the propagation direction so that the recommended value of the upstream physical block is not larger than that of the downstream physical block; and storing the consistency-adjusted recommended voltage threshold adjustment amplitude of each physical block by physical block identifier to generate the voltage threshold adjustment instruction set.
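The action head output handling in claim 9 (sampling an amplitude from the head's distribution, encoding it as a gear code, and enforcing the upstream ≤ downstream constraint) can be sketched as follows. The candidate set `CANDIDATES`, the index-based gear encoding, and the reallocation rule are hypothetical. For the rule, the group's recommended values are sorted and reassigned in propagation order (upstream first), which satisfies the constraint while keeping the group's set of values unchanged. The multi-layer perceptron that produces the logits is not shown.

```python
import math
import random
from typing import Dict, List, Sequence

CANDIDATES = [-2, -1, 0, 1, 2]  # hypothetical amplitude candidate set, in gear units


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sample_amplitude(logits: Sequence[float], rng: random.Random) -> int:
    """Sample one candidate amplitude from the action head's output distribution."""
    return rng.choices(CANDIDATES, weights=softmax(logits), k=1)[0]


def to_gear_code(amplitude: int) -> int:
    """Hypothetical gear encoding: map a signed amplitude to a non-negative index."""
    return CANDIDATES.index(amplitude)


def enforce_monotone(group_order: List[int], rec: Dict[int, int]) -> Dict[int, int]:
    """Reallocate a group's recommended values so that, along the propagation
    order (upstream first), no upstream block gets a larger value than a downstream one."""
    values = sorted(rec[b] for b in group_order)
    return {b: v for b, v in zip(group_order, values)}
```

Sampling, rather than taking the most probable amplitude, keeps the policy stochastic during training; at deployment, taking the most probable candidate would be a common alternative.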
10. A solid state disk read interference compensation device, characterized by comprising: a record acquisition module configured to acquire a read operation event stream record of a plurality of physical blocks in the solid state disk over a continuous read operation period, wherein the read operation event stream record comprises, for each read operation, a physical block identifier, a read operation timestamp and a read operation voltage threshold offset record; a map construction module configured to construct a read interference propagation influence map according to the read operation event stream record and the physical block topology of the solid state disk, and to generate a read interference evolution state representation sequence for each physical block according to the read interference propagation influence map and a historical compensation operation record, wherein the read interference evolution state representation sequence comprises a predicted voltage offset trajectory of each physical block in the uncompensated state and voltage offset response feedback information following historical compensation operations; a temporal modeling module configured to input the read interference evolution state representation sequence into a recurrent state inference network of a deep reinforcement learning model, perform temporal correlation modeling on the read interference evolution state representation sequence of each physical block through a gated state update mechanism of the recurrent state inference network, and generate a latent interference propagation state representation of each physical block at the current moment; a compensation decision module configured to input the latent interference propagation state representation into an action policy network of the deep reinforcement learning model, perform joint decision processing of compensation actions for each physical block group through a multi-branch decision structure of the action policy network, and generate a compensation adjustment instruction sequence containing voltage threshold adjustment instructions and read operation scheduling offset instructions; and a compensation adjustment module configured to execute the read operation compensation adjustment of the solid state disk according to the compensation adjustment instruction sequence, collect voltage threshold offset response observation data of each physical block in the subsequent read operation period after the compensation adjustment, generate a reward evaluation signal according to the difference between the voltage threshold offset response observation data and the predicted trajectory in the read interference evolution state representation sequence, and combine the reward evaluation signal, the read interference evolution state representation sequence and the compensation adjustment instruction sequence into a new experience sample that is stored in an experience replay buffer.

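The compensation adjustment module's reward and experience storage can be sketched as follows. The claims only state that the reward is derived from the difference between the observed offsets and the predicted (uncompensated) trajectory; the specific form used here, the mean reduction in absolute offset minus a fixed cost per non-trivial instruction (`cost_per_action`), is an assumption motivated by the stated aim of reducing compensation cost. `Experience`, `reward_signal` and `ReplayMemory` are hypothetical names, and the millivolt values are illustrative only.

```python
import random
from collections import deque
from dataclasses import dataclass

import numpy as np


@dataclass
class Experience:
    state_seq: np.ndarray  # read interference evolution state representation sequence
    actions: list          # compensation adjustment instruction sequence
    reward: float          # reward evaluation signal


def reward_signal(observed_mv, predicted_mv, actions, cost_per_action=0.5):
    """Assumed reward: mean drop in |offset| relative to the uncompensated
    prediction, minus a fixed cost for each non-trivial instruction."""
    observed = np.abs(np.asarray(observed_mv, dtype=float))
    predicted = np.abs(np.asarray(predicted_mv, dtype=float))
    suppression = float(np.mean(predicted - observed))
    n_active = sum(1 for vref, sched in actions if vref != 0 or sched != 0)
    return suppression - cost_per_action * n_active


class ReplayMemory:
    """Fixed-capacity experience replay; the oldest samples are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=10_000)
predicted = [30.0, 45.0, 60.0]  # illustrative uncompensated trajectory (mV)
observed = [28.0, 30.0, 32.0]   # illustrative post-compensation readings (mV)
actions = [(-20, 0), (0, 2)]    # (threshold step mV, scheduling offset) per group
r = reward_signal(observed, predicted, actions)
memory.push(Experience(np.zeros((6, 2)), actions, r))
print(r)  # 14.0
```

Here the offsets are suppressed by 15 mV on average, and the two active instructions cost 1.0, giving a reward of 14.0. The cost term discourages the policy from issuing adjustments that do not pay for themselves.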
Description

Method and device for compensating read interference of solid state disk based on deep reinforcement learning

Technical Field

The invention relates to the technical field of deep learning and hard disk optimization, and in particular to a method and a device for compensating solid state disk read interference based on deep reinforcement learning.

Background

The solid state disk is a core data storage device. During read operations, a read interference effect between adjacent physical blocks can cause unexpected shifts in the voltage threshold of memory cells, and once these shifts accumulate beyond a certain level, data read errors occur. Existing compensation methods for solid state disk read interference generally either apply a uniform voltage threshold adjustment to a physical block once its read count reaches a preset threshold, or make an independent compensation decision for each physical block based on statistics of its historical read frequency. In practice, however, these methods face the following problem. Because of the physical proximity of blocks and differences in their read frequencies, interference propagates between different physical blocks in the solid state disk. The read count or historical frequency statistics of a single physical block alone can hardly describe the spatial propagation characteristics and temporal evolution of inter-block interference, so the chosen compensation timing and amplitude are difficult to match with the actual accumulated interference state, which limits the compensation effect.
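For comparison, the count-triggered approach described in the background can be sketched in a few lines of Python. The threshold, the step size and all names are hypothetical, since no concrete values are given; the point of the sketch is that each block is judged only by its own read count.

```python
from dataclasses import dataclass

# Hypothetical parameters; the source gives no concrete values.
READ_COUNT_THRESHOLD = 1000  # reads that trigger compensation
UNIFORM_VREF_STEP_MV = 20    # same threshold shift for every triggered block


@dataclass
class BlockState:
    block_id: int
    read_count: int = 0
    vref_offset_mv: int = 0  # accumulated read-threshold adjustment


def baseline_compensate(blocks, read_log):
    """Count-triggered uniform compensation; returns the IDs of compensated blocks.

    Each block is judged only by its own read count: reads on neighbouring
    blocks never influence the decision.
    """
    for block_id in read_log:
        blocks[block_id].read_count += 1
    triggered = []
    for block in blocks.values():
        if block.read_count >= READ_COUNT_THRESHOLD:
            block.vref_offset_mv += UNIFORM_VREF_STEP_MV
            block.read_count = 0  # counter restarts after compensation
            triggered.append(block.block_id)
    return triggered


# Blocks 0, 1, 2 are physically adjacent; block 1 is read heavily.
blocks = {i: BlockState(i) for i in range(3)}
log = [1] * 1500 + [0] * 100 + [2] * 100
print(baseline_compensate(blocks, log))  # [1]
```

Blocks 0 and 2 sit next to the heavily read block 1, yet neither is considered for compensation, which is the limitation the background points out.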
Disclosure of Invention

In view of the above, the present invention provides a method and a device for compensating read interference of a solid state disk based on deep reinforcement learning. The technical solution of the embodiments of the invention is implemented as follows.

In one aspect, an embodiment of the present invention provides a method for compensating read interference of a solid state disk based on deep reinforcement learning, comprising:

acquiring a read operation event stream record of a plurality of physical blocks in a solid state disk during a continuous read operation period, wherein the read operation event stream record comprises, for each read operation, a physical block identifier, a read operation timestamp and a read operation voltage threshold offset record;

constructing a read interference propagation influence map according to the read operation event stream record and the physical block topology of the solid state disk, and generating a read interference evolution state representation sequence of each physical block according to the read interference propagation influence map and the historical compensation operation record, wherein the read interference evolution state representation sequence comprises a predicted voltage offset trajectory of each physical block in the uncompensated state and voltage offset response feedback information after historical compensation operations;

inputting the read interference evolution state representation sequence into a recurrent state inference network of the deep reinforcement learning model, and performing temporal correlation modeling on the read interference evolution state representation sequence of each physical block through a gated state update mechanism of the recurrent state inference network, to generate an implicit representation of the interference propagation state of each physical block at the current moment;

inputting the implicit representation of the interference propagation state into an action policy network of the deep reinforcement learning model, and performing joint decision processing of compensation actions on each physical block group through a multi-branch decision structure of the action policy network, to generate a compensation adjustment instruction sequence comprising a voltage threshold adjustment instruction and a read operation scheduling offset instruction; and

executing the read operation compensation adjustment of the solid state disk according to the compensation adjustment instruction sequence, collecting voltage threshold offset response observation data of each physical block in the read operation period following the compensation adjustment, generating a reward evaluation signal according to the difference between the voltage threshold offset response observation data and the predicted trajectory in the read interference evolution state representation sequence, and combining the reward evaluation signal, the read interference evolution state representation sequence and the compensation adjustment instruction sequence into a new experience