CN-122024806-A - On-orbit mass storage NANDFLASH single-event function interruption self-recovery method
Abstract
The invention discloses an on-orbit mass storage NANDFLASH single-event function interruption self-recovery method, and belongs to the technical field of fault recovery design for mass storage. The method comprises constructing a storage system architecture in which the NANDFLASH is independently powered, establishing a read-write pointer management mechanism and a timeout monitoring mechanism in an FPGA, and judging that a single-event function interruption has occurred when the write address pointer remains unchanged for a plurality of consecutive sampling periods or the timeout flag remains continuously set. After detecting the fault, the FPGA autonomously outputs a control pulse to the DCDC power supply module, so that the NANDFLASH alone is powered off and restarted, completing the fault self-recovery. The invention does not require powering off the whole system: only the NANDFLASH is power-cycled, so execution of the main control task is not affected. The fault recovery process is completed autonomously by the FPGA without manual intervention, and the method is broadly applicable and inexpensive to implement.
Inventors
- ZHANG YI
- QIU QINGLIN
- XU ZHENLONG
- YANG JIANG
Assignees
- Shandong Institute of Space Electronic Technology (山东航天电子技术研究所)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-23
Claims (8)
- 1. An on-orbit mass storage NANDFLASH single-event function interruption self-recovery method, characterized by comprising the following steps: S1, constructing an independently powered storage system architecture: forming a storage system from an FPGA and a NANDFLASH, selecting a DCDC power supply module with an output enable control pin, connecting the output of the DCDC power supply module to the power supply input of the NANDFLASH to provide an independent power supply channel for the NANDFLASH, and connecting the output enable control pin of the DCDC power supply module to a control output port of the FPGA; S2, establishing a read-write pointer management mechanism: providing in the FPGA a write address pointer and a read address pointer, which respectively indicate the current write address and the current read address of the NANDFLASH data storage area, accessing the NANDFLASH storage area according to the corresponding pointer when a read or write operation is executed, and updating the corresponding pointer after the operation is completed; S3, configuring a timeout monitoring mechanism: connecting the ready/busy status signal pin of the NANDFLASH to an input detection port of the FPGA, providing a timeout counter in the FPGA, starting the timeout counter when a memory operation is executed, monitoring the state change of the ready/busy status signal pin, and setting or clearing a timeout flag according to the monitoring result; S4, executing fault state judgment: the FPGA periodically samples the value of the write address pointer and the state of the timeout flag, and judges that a single-event function interruption has occurred when the write address pointer remains unchanged for a plurality of consecutive sampling periods or the timeout flag remains continuously set; and S5, implementing autonomous power-off recovery: after detecting the single-event function interruption, the FPGA outputs a low-level pulse through the control output port to the output enable control pin of the DCDC power supply module, so that the NANDFLASH alone is powered off and then powered on again, completing self-recovery from the single-event function interruption.
- 2. The method according to claim 1, wherein connecting the output enable control pin of the DCDC power supply module to the control output port of the FPGA in step S1 further comprises a power-on sequencing configuration: the power-on reset signal of the system, the configuration completion signal of the FPGA and the control output port of the FPGA are connected to the output enable control pin of the DCDC power supply module through an AND gate circuit, so that the output enable of the DCDC power supply module is controlled jointly by the power-on reset signal, the FPGA configuration completion signal and the FPGA control output port; the power-on sequencing configuration ensures that the NANDFLASH is powered on only after the system power-on reset and the FPGA configuration have both completed, thereby avoiding the logic confusion that would result from the NANDFLASH being powered before the FPGA is operational.
- 3. The method according to claim 1, wherein updating the corresponding pointer after the operation is completed in step S2 specifically comprises: when a write operation is executed, the FPGA determines the target write address in the NANDFLASH according to the value of the current write address pointer, writes data into the storage area corresponding to that address, and after the write operation is completed updates the write address pointer according to the following formula: $P_{new} = P_{old} + N_{page}$; wherein $P_{new}$ is the updated write address pointer value; $P_{old}$ is the write address pointer value before updating; and $N_{page}$ is the data amount (in bytes) of a single page write operation; when a read operation is executed, the FPGA determines the target read address in the NANDFLASH according to the value of the current read address pointer, reads data from the storage area corresponding to that address, and updates the read address pointer in the same manner after the read operation is completed.
- 4. The method according to claim 1, wherein setting or clearing the timeout flag according to the monitoring result in step S3 specifically comprises: providing in the FPGA separate timeout counters for the block erase operation, the page program operation and the page read operation, the timeout threshold of each counter being determined from the maximum allowed time of the corresponding operation given in the NANDFLASH device specification; the ready/busy status signal pin is the #R/B pin of the NANDFLASH, a low level indicating that the NANDFLASH is busy and a high level indicating that it is ready; when a memory operation is executed, the corresponding timeout counter is started and the level of the #R/B pin is monitored continuously; if the #R/B pin is detected to transition from low to high before the timeout counter reaches the timeout threshold, the operation is judged to have completed normally, the timeout flag is cleared and timing is stopped; and if the #R/B pin is still low when the timeout counter reaches the timeout threshold, the timeout flag is set.
- 5. The method according to claim 1, wherein judging in step S4 that a single-event function interruption has occurred when the write address pointer remains unchanged for a plurality of consecutive sampling periods is specifically performed as follows: the FPGA samples the current value of the write address pointer in each telemetry period and compares it with the write address pointer values sampled in the preceding telemetry periods; when the write address pointer values sampled in $N$ consecutive telemetry periods are all equal, the fault judgment condition is met, the judgment logic being expressed as: $F_{ptr} = (P_k = P_{k-1} = \cdots = P_{k-N+1})$; wherein $F_{ptr}$ is the write address pointer fault judgment result, a value of true indicating that the fault condition is met; $P_k$ is the write address pointer value sampled in the $k$-th telemetry period; $P_{k-1}$ is the write address pointer value sampled in the $(k-1)$-th telemetry period; $P_{k-N+1}$ is the write address pointer value sampled in the $(k-N+1)$-th telemetry period; $N$ is the threshold number of consecutive sampling periods; and "=" is the equality operator; the fault corresponding to this judgment condition is that a NANDFLASH single-event function interruption causes the block erase operation to fail, so that the write state machine in the FPGA cannot advance normally and the write address pointer stops updating.
- 6. The method according to claim 1, wherein judging in step S4 that a single-event function interruption has occurred when the timeout flag remains continuously set is specifically performed as follows: the FPGA samples the state of the timeout flag in each telemetry period; when the timeout flag remains set in $N$ consecutive telemetry periods, the fault judgment condition is met, the judgment logic being expressed as: $F_{to} = T_k \wedge T_{k-1} \wedge \cdots \wedge T_{k-N+1}$; wherein $F_{to}$ is the timeout fault judgment result, a value of true indicating that the fault condition is met; $T_k$ is the timeout flag state sampled in the $k$-th telemetry period, a value of true indicating that the timeout flag is set; $T_{k-1}$ is the timeout flag state sampled in the $(k-1)$-th telemetry period; $T_{k-N+1}$ is the timeout flag state sampled in the $(k-N+1)$-th telemetry period; $N$ is the threshold number of consecutive sampling periods; and "$\wedge$" is the logical AND operator; the fault corresponding to this judgment condition is that a NANDFLASH single-event function interruption causes read and write operations to behave abnormally, so that the FPGA cannot detect the low-to-high transition of the #R/B pin and the timeout flag remains set.
- 7. The method according to claim 5 or 6, wherein the combined judgment logic for the single-event function interruption in step S4 is: $F_{SEFI} = F_{ptr} \vee F_{to}$; wherein $F_{SEFI}$ is the combined single-event function interruption judgment result, a value of true indicating that a single-event function interruption is judged to have occurred; $F_{ptr}$ is the write address pointer fault judgment result; $F_{to}$ is the timeout fault judgment result; and "$\vee$" is the logical OR operator; when either $F_{ptr}$ or $F_{to}$ is true, $F_{SEFI}$ is true, and the FPGA judges that a single-event function interruption has occurred in the NANDFLASH and triggers the autonomous power-off recovery of step S5.
- 8. The method according to claim 1, wherein outputting the low-level pulse through the control output port to the output enable control pin of the DCDC power supply module in step S5 specifically comprises: in the normal working state, the FPGA continuously outputs a high level through the control output port to the RUN pin of the DCDC power supply module, and the DCDC power supply module keeps its output enabled and continuously supplies power to the NANDFLASH; when step S4 judges that a single-event function interruption has occurred, the FPGA switches the control output port from high level to low level, the DCDC power supply module turns off its output upon receiving the low level on its RUN pin, and the NANDFLASH is powered off; after the power-off has lasted for a duration $T_{off}$, the FPGA restores the control output port to high level, the DCDC power supply module restores its output upon receiving the high level on its RUN pin, and the NANDFLASH is powered on again, completing the power-off recovery; during the power-off and power-on recovery process, only the NANDFLASH is power-cycled, while the FPGA and the other parts of the system keep operating normally and the execution of the main control task is not interrupted.
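The logic recited in claims 2 to 8 can be illustrated with a small behavioural model. The following is a minimal Python sketch, not the FPGA implementation. The names `advance_pointer`, `dcdc_enabled`, `TimeoutMonitor`, `SefiDetector` and `run_pin_levels`, the 4096-byte page size, and the tick-based timing are all assumptions made for illustration; the claims do not fix any of these values.

```python
from collections import deque

PAGE_BYTES = 4096  # assumed page size; the claims only refer to a single-page data amount


def advance_pointer(ptr, page_bytes=PAGE_BYTES):
    """Claim 3: after a page operation the pointer moves on by one page."""
    return ptr + page_bytes


def dcdc_enabled(por_released, config_done, ctrl_high):
    """Claim 2: AND gate combining reset, configuration-done and FPGA control."""
    return por_released and config_done and ctrl_high


class TimeoutMonitor:
    """Claim 4: one timeout counter watching the #R/B pin for one operation type."""

    def __init__(self, threshold_ticks):
        self.threshold = threshold_ticks  # derived from the device's maximum operation time
        self.count = 0
        self.running = False
        self.flag = False

    def start(self):
        self.count = 0
        self.running = True

    def tick(self, rb_high):
        if not self.running:
            return
        if rb_high:  # pin is high again before the threshold: normal completion
            self.flag = False
            self.running = False
            return
        self.count += 1
        if self.count >= self.threshold:  # still busy when the threshold is reached
            self.flag = True
            self.running = False


class SefiDetector:
    """Claims 5 to 7: sliding window over the last n telemetry-period samples."""

    def __init__(self, n):
        self.n = n
        self.ptrs = deque(maxlen=n)
        self.flags = deque(maxlen=n)

    def sample(self, write_ptr, timeout_flag):
        self.ptrs.append(write_ptr)
        self.flags.append(timeout_flag)
        full = len(self.ptrs) == self.n
        f_ptr = full and len(set(self.ptrs)) == 1  # pointer unchanged for n periods
        f_to = full and all(self.flags)            # timeout flag set for n periods
        return f_ptr or f_to                       # claim 7: logical OR of both results


def run_pin_levels(off_ticks):
    """Claim 8: RUN pin is high, held low for off_ticks, then driven high again."""
    return [1] + [0] * off_ticks + [1]
```

In this sketch the pointer check fires whenever the pointer is static for the whole window, so it assumes that a write is expected in every telemetry period. Gating the check on pending writes would be an additional design choice that the claims do not describe.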
Description
On-orbit mass storage NANDFLASH single-event function interruption self-recovery method
Technical Field
The invention belongs to the technical field of fault recovery design for mass storage, and particularly relates to an on-orbit mass storage NANDFLASH single-event function interruption self-recovery method.
Background
NANDFLASH, as a nonvolatile, erasable, highly integrated memory, is widely used for mass data storage in aerospace applications. Integrated electronic computer systems of on-orbit satellites typically use NANDFLASH to store whole-satellite real-time telemetry data, which is transmitted back to the ground station during the measurement and control arc. However, NANDFLASH is susceptible to energetic particles in the space radiation environment, which give rise to single-event effects. When high-energy heavy ions pass through a NANDFLASH memory, ionization occurs within the floating gate and the surrounding insulating layers, generating electron-hole pairs. These electron-hole pairs drift under the electric field inside the device and form interface trap charges at the interface, shifting the threshold voltage of the transistor in the negative direction. This shift may turn on a transistor that was originally off, changing the state of a memory cell or of the logic control unit and causing a single-event upset or a single-event function interruption. A single-event function interruption is a non-destructive transient fault in which the internal control logic of the memory is disturbed, producing logic state errors so that the current operation, such as a read, write or erase, behaves abnormally.
Because a single-event function interruption affects the control logic of the device rather than the storage cells themselves, the fault cannot be cleared by conventional means such as a system reset or reloading the control software; the faulty device must be powered off and restarted to resume normal operation. In the prior art, when a single-event function interruption occurs in the NANDFLASH, the normal function of the memory is generally restored by powering off and restarting the whole system. However, a system power-off restart interrupts the main control task being executed, which is unacceptable in space-borne main control equipment with high reliability requirements. In addition, the prior art lacks an effective means of monitoring for NANDFLASH single-event function interruptions and often relies on ground measurement and control personnel to judge from telemetry data analysis whether a fault has occurred, so the fault is not discovered in time, the recovery process requires manual intervention, and autonomy is poor. Therefore, a method is needed that detects and recovers from NANDFLASH single-event function interruptions autonomously without interrupting the main control task.
Disclosure of Invention
In order to solve the problems described in the background, the invention provides an on-orbit mass storage NANDFLASH single-event function interruption self-recovery method, which comprises the following steps: S1, constructing an independently powered storage system architecture: forming a storage system from an FPGA and a NANDFLASH, selecting a DCDC power supply module with an output enable control pin, connecting the output of the DCDC power supply module to the power supply input of the NANDFLASH to provide an independent power supply channel for the NANDFLASH, and connecting the output enable control pin of the DCDC power supply module to a control output port of the FPGA; S2, establishing a read-write pointer management mechanism: providing in the FPGA a write address pointer and a read address pointer, which respectively indicate the current write address and the current read address of the NANDFLASH data storage area, accessing the NANDFLASH storage area according to the corresponding pointer when a read or write operation is executed, and updating the corresponding pointer after the operation is completed; S3, configuring a timeout monitoring mechanism: connecting the ready/busy status signal pin of the NANDFLASH to an input detection port of the FPGA, providing a timeout counter in the FPGA, starting the timeout counter when a memory operation is executed, monitoring the state change of the ready/busy status signal pin, and setting or clearing a timeout flag according to the monitoring result; S4, executing fault state judgment: the FPGA periodically samples the value of the write address pointer and the state of the timeout flag, and judges that a single-event function interruption has occurred when the write address pointer remains unchanged for a plurality of consecutive sampling periods or the timeout flag remains continuously set; and S5, implementing auto