CN-121996149-A - Selective panic relief
Abstract
The present application relates to selective panic relief. Not all panic situations in a data storage device require a host device to initiate a reset. Efficiency is gained when the data storage device can handle a panic event itself and simply inform the host device that the panic event was avoided. In the multi-tenant case, the data storage device may track the type of trace and determine whether a host-device-initiated reset is necessary or whether the data storage device may process the reset internally. The data storage device may delay a host-device-initiated reset required by one tenant until the other tenants are ready for the host-device-initiated reset.
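The delayed reset described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `ResetSynchronizer` helper and made-up tenant labels such as `PF0.VF1`; neither name comes from the application, and the sketch only shows the idea of holding a host-initiated reset until every tenant reports readiness.

```python
class ResetSynchronizer:
    """Hypothetical helper (not from the application): holds back a
    host-initiated reset until every tenant (PF or VF) reports ready."""

    def __init__(self, function_ids):
        # One readiness flag per tenant; nothing is ready at start.
        self.ready = {fid: False for fid in function_ids}
        self.host_reset_pending = False

    def request_host_reset(self, function_id):
        """A tenant needs a host-initiated reset; record it, do not act yet."""
        if function_id not in self.ready:
            raise KeyError(f"unknown function: {function_id}")
        self.host_reset_pending = True

    def mark_ready(self, function_id):
        """A tenant reports that it is prepared for the reset."""
        if function_id not in self.ready:
            raise KeyError(f"unknown function: {function_id}")
        self.ready[function_id] = True

    def try_initiate_host_reset(self):
        """Return True only when a reset is pending and all tenants are ready."""
        if self.host_reset_pending and all(self.ready.values()):
            self.host_reset_pending = False
            self.ready = {fid: False for fid in self.ready}
            return True
        return False


sync = ResetSynchronizer(["PF0", "PF0.VF1", "PF0.VF2"])
sync.request_host_reset("PF0.VF1")
sync.mark_ready("PF0.VF1")
print(sync.try_initiate_host_reset())  # False: two tenants are not ready yet
sync.mark_ready("PF0")
sync.mark_ready("PF0.VF2")
print(sync.try_initiate_host_reset())  # True: every tenant is ready
```

In this sketch the tenant that needs the reset simply waits; how readiness is actually signaled inside the controller is left open here.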
Inventors
- S. Benisti
- A. Na Weng
- J.G. Hahn
- A. basssky of BA Zal
Assignees
- SanDisk Technologies, Inc. (闪迪技术公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20250418
- Priority Date
- 20241106
Claims (20)
- 1. A data storage device, the data storage device comprising: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: track an indication of a reset of one or more Physical Functions (PFs), one or more Virtual Functions (VFs), or a combination of PFs and VFs; determine whether a reset is to occur; determine whether a workload is a read workload; and determine whether to process the reset internally or to turn to a host device to initiate the reset.
- 2. The data storage device of claim 1, wherein the one or more PFs, the one or more VFs, or a combination of the PFs and VFs comprises a first PF and a second PF, wherein the first PF comprises a first VF and a second VF.
- 3. The data storage device of claim 1, wherein the controller is configured to determine that the workload is a read workload, and wherein the controller is configured to internally process the reset.
- 4. The data storage device of claim 1, wherein the controller is configured to determine that the workload is not a read workload, and wherein the controller is configured to turn to the host device to initiate the reset.
- 5. The data storage device of claim 1, wherein the controller is configured to collect reset feedback and internal reset preparations, wherein the controller is configured to determine whether all resets are ready, and wherein the controller is configured to initiate resets for all relevant VFs and PFs.
- 6. The data storage device of claim 5, wherein the initiated reset is both internal and external.
- 7. The data storage device of claim 1, wherein the controller is configured to track a type of trace.
- 8. The data storage device of claim 7, wherein the controller is configured to store an indication of whether a workload is a read workload for each of the one or more PFs, the one or more VFs, or a combination of the PFs and VFs.
- 9. The data storage device of claim 1, wherein the controller comprises a fault detector.
- 10. The data storage device of claim 1, wherein the controller comprises a Host Interface Module (HIM) comprising a panic reset module, a transparent reset and log module, and a reset synchronization module.
- 11. A data storage device, the data storage device comprising: a memory device; and a controller coupled to the memory device, wherein the controller is configured to: operate as a multi-tenant device coupled to one or more Physical Functions (PFs), one or more Virtual Functions (VFs), or a combination of PFs and VFs; track a trace of each function of the one or more PFs, the one or more VFs, or the combination of PFs and VFs; determine whether the trace is for a read workload; and store an indication of whether the workload is a read workload.
- 12. The data storage device of claim 11, wherein the controller is configured to determine that a reset should occur and process the reset internally for a function with a read workload.
- 13. The data storage device of claim 11, wherein the controller is configured to determine that a reset should occur and turn to the host device to initiate the reset for functions having workloads other than a read workload.
- 14. The data storage device of claim 11, wherein the tracking is performed continuously.
- 15. The data storage device of claim 11, wherein the tracking is performed by determining a current workload of each of the one or more PFs, each of the one or more VFs, or each function of the combination of PFs and VFs, once a reset is indicated.
- 16. The data storage device of claim 11, wherein at least one PF comprises a plurality of VFs.
- 17. The data storage device of claim 11, wherein the storing comprises storing a value in a bitmap indicating whether the workload is a read workload or a workload other than a read workload.
- 18. A data storage device, the data storage device comprising: means for storing data; and a controller coupled to the means for storing data, wherein the controller is configured to: determine that a near-fault event has occurred; determine that the near-fault event can be handled without a host device reset; initiate host device isolation; handle a state reset; restore a system state from a state recovery database; and remove the host device isolation.
- 19. The data storage device of claim 18, wherein the controller comprises a fault indication module and a fault recovery module, and wherein the controller maintains the state recovery database.
- 20. The data storage device of claim 19, wherein the controller operates as a multi-tenant device coupled to one or more Physical Functions (PFs), one or more Virtual Functions (VFs), or a combination of PFs and VFs.
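The sequence recited in claim 18 (isolate the host, reset state, restore from the state recovery database, remove isolation) can be sketched as follows. This is only an illustrative model: the `Controller` class, its fields, and the `checkpoint` method are assumptions made for the example and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller model; every name here is an assumption."""
    state: dict = field(default_factory=dict)
    state_recovery_db: dict = field(default_factory=dict)
    host_isolated: bool = False
    log: list = field(default_factory=list)

    def checkpoint(self) -> None:
        """Save a copy of the current state into the state recovery database."""
        self.state_recovery_db = dict(self.state)

    def handle_near_fault(self, recoverable_without_host_reset: bool) -> bool:
        """Return True if the event was handled without a host device reset."""
        self.log.append("near-fault detected")
        if not recoverable_without_host_reset:
            self.log.append("escalated to host-initiated reset")
            return False
        self.host_isolated = True            # initiate host device isolation
        self.log.append("host isolated")
        self.state.clear()                   # handle the state reset
        self.log.append("state reset")
        self.state.update(self.state_recovery_db)  # restore saved state
        self.log.append("state restored")
        self.host_isolated = False           # remove host device isolation
        self.log.append("host isolation removed")
        return True


ctrl = Controller(state={"queue_depth": 8})
ctrl.checkpoint()
ctrl.state["queue_depth"] = -1               # simulate corrupted state
handled = ctrl.handle_near_fault(recoverable_without_host_reset=True)
print(handled, ctrl.state)                   # True {'queue_depth': 8}
```

The sketch keeps the host isolated for the whole reset-and-restore window so that, in this model, the host never observes the intermediate state.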
Description
Selective panic relief
Background
Technical Field
Embodiments of the present disclosure relate generally to panic relief.
Description of Related Art
The Peripheral Component Interconnect Express (PCIe) standard introduced single root input/output (I/O) virtualization (SR-IOV), which includes Physical Functions (PFs) and Virtual Functions (VFs). A PF is a full-featured PCIe function. VFs are lightweight functions that lack some configuration resources. A multi-tenant environment typically means that some type of virtualization is implemented in the device controller, such as one or more VFs, one or more PFs, or a combination thereof. More specifically, a multi-tenant environment involves multiple functions.
When a data storage device encounters an internal failure, the data storage device has several recovery paths. Some failures may be handled within the data storage device, and some failures involve resetting the host interface or otherwise disrupting host-device communications. An event involving host device interaction is referred to as a panic event. Mechanisms exist in the NVM Express (NVMe) and Open Compute Project (OCP) standards for resolving panic events while minimizing the impact on end users. Whether operating in a client or enterprise solid state drive (SSD) environment, reducing the frequency of panic events is valuable in order to avoid disrupting the host interface wherever possible.
Thus, there is a need in the art to alleviate panic events.
Disclosure of Invention
Not all panic situations in a data storage device require a host device to initiate a reset. Efficiency is gained when the data storage device can handle a panic event itself and simply inform the host device that the panic event was avoided. In the multi-tenant case, the data storage device may track the type of trace and determine whether a host-device-initiated reset is necessary or whether the data storage device may process the reset internally.
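As a rough illustration of the per-function decision just described, the sketch below keeps one bit per function and chooses a reset path from it. The `WorkloadTracker` and `ResetPath` names are hypothetical, and treating an untracked function as a non-read workload is an assumption made here to keep the default conservative.

```python
from enum import Enum


class ResetPath(Enum):
    INTERNAL = "internal"          # device processes the reset itself
    HOST_INITIATED = "host"        # device turns to the host for the reset


class WorkloadTracker:
    """Hypothetical per-function tracker: bit i is 1 when function i
    (a PF or VF) currently has a read workload."""

    def __init__(self, num_functions: int) -> None:
        self.num_functions = num_functions
        self.read_bitmap = 0       # assumption: untracked means "not read"

    def _check(self, function_id: int) -> None:
        if not 0 <= function_id < self.num_functions:
            raise ValueError(f"function id out of range: {function_id}")

    def record(self, function_id: int, is_read_workload: bool) -> None:
        """Store whether the function's workload is a read workload."""
        self._check(function_id)
        mask = 1 << function_id
        if is_read_workload:
            self.read_bitmap |= mask
        else:
            self.read_bitmap &= ~mask

    def is_read_workload(self, function_id: int) -> bool:
        self._check(function_id)
        return bool((self.read_bitmap >> function_id) & 1)

    def choose_reset_path(self, function_id: int) -> ResetPath:
        """Read workloads are reset internally; anything else goes to the host."""
        if self.is_read_workload(function_id):
            return ResetPath.INTERNAL
        return ResetPath.HOST_INITIATED


tracker = WorkloadTracker(num_functions=4)
tracker.record(0, is_read_workload=True)
tracker.record(2, is_read_workload=False)
print(tracker.choose_reset_path(0))  # ResetPath.INTERNAL
print(tracker.choose_reset_path(2))  # ResetPath.HOST_INITIATED
```

A single integer bitmap keeps the lookup to one shift and mask per function, which is why it is used in this sketch; a real controller might store the indication differently.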
The data storage device may delay a host-device-initiated reset required by one tenant until the other tenants are ready for the host-device-initiated reset.
In one embodiment, a data storage device includes a memory device and a controller coupled to the memory device, wherein the controller is configured to track an indication of a reset of one or more Physical Functions (PFs), one or more Virtual Functions (VFs), or a combination of PFs and VFs, determine whether a reset is to occur, determine whether a workload is a read workload, and determine whether to process the reset internally or to turn to a host device to initiate the reset.
In another embodiment, a data storage device includes a memory device and a controller coupled to the memory device, wherein the controller is configured to operate as a multi-tenant device coupled to one or more Physical Functions (PFs), one or more Virtual Functions (VFs), or a combination of PFs and VFs, track traces of each of the one or more PFs, the one or more VFs, or the combination of PFs and VFs, determine whether the traces are for a read workload, and store an indication of whether the workload is a read workload.
In another embodiment, a data storage device includes means for storing data and a controller coupled to the means for storing data, wherein the controller is configured to determine that a near failure event has occurred, determine that the near failure event can be handled without a host device reset, initiate host device isolation, handle a state reset, recover a system state from a state recovery database, and remove the host device isolation.
Drawings
So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
It is to be noted, however, that the appended drawings illustrate only typical embodiments of this disclosure and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.
FIG. 1 is a schematic block diagram illustrating a storage system in which a data storage device may act as a storage device for a host device, according to some embodiments.
FIG. 2 is a block diagram depicting a single root input/output (I/O) virtualization (SR-IOV) system as introduced by the standard.
FIG. 3 is a schematic diagram of an internal reset for a single port memory device.
FIG. 4 is a flow diagram illustrating the processing of a fault event according to one embodiment.
FIG. 5 is a flowchart illustrating the operation of a state recording module and database according to one embodiment.
FIG. 6 is a flowchart illustrating panic event mitigation in a multi-tenant system.
FIG. 7 is a flowchart illustrating selective reset preparation for a multi-tenant system.
FIG. 8 is a schematic diagram of a memory system with panic event mitigation resourc