CN-122019299-A - SSD link state detection method and system based on link information
Abstract
The application provides an SSD link state detection method and system based on link information. The method comprises: obtaining real-time rate values of a target link of an SSD at each moment in a current observation window; determining the transmission rate state of the target link at each moment according to each real-time rate value and a preset threshold; determining a current initial link state detection result of the target link according to the proportion of each type of transmission rate state, wherein the initial link state detection result is one of a normal state, a first abnormal state, and a second abnormal state; if the initial link state detection result is the second abnormal state, determining that the current link state detection result of the target link is a link abnormality; and if the initial link state detection result is the first abnormal state, determining the current link state detection result of the target link according to PCIe error detection information of the target link within a preset time period. The accuracy of SSD link state detection is thereby improved.
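The rate classification and the proportion-based initial decision can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function names, the string labels, and the per-sub-window weighting (each sub-window contributes the fraction of its samples in each state) are assumptions. The threshold boundaries follow claim 2, and the early exit on an all-normal sub-window and the tie rule follow claim 3, with a tie read here as a tie for the largest final value.

```python
from collections import Counter
from fractions import Fraction

NORMAL = "normal"
FIRST_ABNORMAL = "first_abnormal"
SECOND_ABNORMAL = "second_abnormal"


def classify_rate(rate, first_threshold, second_threshold):
    """Map one real-time rate value to a rate state (boundaries as in claim 2)."""
    if rate >= first_threshold:
        return NORMAL
    if rate > second_threshold:
        return FIRST_ABNORMAL
    return SECOND_ABNORMAL


def initial_link_state(rates, first_threshold, second_threshold, sub_window_len):
    """Derive the initial link state from all rate samples in one observation window."""
    if not rates:
        raise ValueError("observation window is empty")
    states = [classify_rate(r, first_threshold, second_threshold) for r in rates]
    # Split the window into sub-windows of equal length (the last may be shorter).
    sub_windows = [states[i:i + sub_window_len]
                   for i in range(0, len(states), sub_window_len)]
    # A sub-window in which every sample is normal ends the traversal early.
    for window in sub_windows:
        if all(s == NORMAL for s in window):
            return NORMAL
    # Assumed weighting: sum, per state, the fraction of samples in each sub-window.
    totals = Counter()
    for window in sub_windows:
        for state, count in Counter(window).items():
            totals[state] += Fraction(count, len(window))
    best = max(totals.values())
    winners = [s for s, v in totals.items() if v == best]
    # Equal final values are resolved to the second abnormal state.
    return winners[0] if len(winners) == 1 else SECOND_ABNORMAL
```

Exact fractions are used instead of floats so that the equality check for ties is not affected by rounding.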
Inventors
- YU CONG
- LIU GUOQUAN
- LI JIAJI
- WU ZHONGYUAN
- CHEN JIANCHUN
Assignees
- 天津中科曙光存储科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260108
Claims (10)
- 1. An SSD link state detection method based on link information, characterized by comprising the following steps: acquiring real-time rate values of a target link of an SSD at each moment in a current observation window; determining the transmission rate state of the target link at each moment according to each real-time rate value and a preset threshold, wherein the transmission rate state is one of a normal rate state, a first abnormal rate state, and a second abnormal rate state; determining the current initial link state detection result of the target link according to the proportion of each type of transmission rate state, wherein the initial link state detection result is one of a normal state, a first abnormal state, and a second abnormal state; and if the initial link state detection result is the first abnormal state, determining the current link state detection result of the target link according to PCIe error detection information of the target link within a preset time period.
- 2. The SSD link state detection method based on link information according to claim 1, wherein determining the transmission rate state of the target link at each moment according to each real-time rate value and a preset threshold comprises: if the real-time rate value of the target link at any moment is greater than or equal to a first preset threshold, determining that the transmission rate state of the target link at that moment is the normal rate state; if the real-time rate value of the target link at any moment is smaller than the first preset threshold and greater than a second preset threshold, determining that the transmission rate state of the target link at that moment is the first abnormal rate state; and if the real-time rate value of the target link at any moment is smaller than or equal to the second preset threshold, determining that the transmission rate state of the target link at that moment is the second abnormal rate state.
- 3. The SSD link state detection method based on link information according to claim 1, wherein determining the current initial link state detection result of the target link according to the proportion of each type of transmission rate state comprises: dividing the observation window evenly into a plurality of sub-windows according to a preset duration; traversing the sub-windows and, for any sub-window, if every transmission rate state in the sub-window is the normal rate state, determining that the current initial link state detection result of the target link is the normal state and stopping the traversal; counting the weighted values corresponding to each type of transmission rate state in each sub-window, and calculating the final weighted value corresponding to each type of transmission rate state in the observation window; determining the current initial link state detection result of the target link according to the magnitudes of the final weighted values; and if the final weighted values are equal, determining that the current initial link state detection result of the target link is the second abnormal state.
- 4. The SSD link state detection method based on link information according to claim 1, wherein, if the initial link state detection result is the first abnormal state, determining the current link state detection result of the target link according to PCIe error detection information of the target link within a preset time period comprises: obtaining the PCIe error detection information of the target link within the preset time period; dividing the preset time period evenly into a plurality of first sub-time periods based on a first preset duration, and counting a first index value for each first sub-time period, wherein the first index value is the number of occurrences of uncorrectable errors in the PCIe error detection information within that first sub-time period; and if the first index value of any first sub-time period is greater than or equal to a corresponding third preset threshold, determining that the current link state detection result of the target link is an abnormal state.
- 5. The SSD link state detection method based on link information according to claim 4, further comprising: if the first index value is smaller than the third preset threshold, dividing the preset time period evenly into a plurality of second sub-time periods based on a second preset duration, and counting the second index values of each second sub-time period, wherein the second index values are the numbers of occurrences of each type of correctable error in the PCIe error detection information within the second sub-time period, and the first preset duration is longer than the second preset duration; for any second sub-time period, if the plurality of second index values of the second sub-time period are all greater than a fourth preset threshold, determining the second sub-time period to be an abnormal time period; if a preset number of consecutive abnormal time periods exist within the preset time period, performing a weighted summation of all second index values of all second sub-time periods according to correctable error type to obtain a composite index value; and if the composite index value is greater than a fifth preset threshold, determining that the current link state detection result of the target link is an abnormal state.
- 6. The SSD link state detection method based on link information according to any one of claims 1-5, characterized in that the SSD link state detection method further comprises: if the current link state detection result of any target link in the SSD is an abnormal state, determining that the target link is an abnormal link and that the SSD is an abnormal SSD; and, in the current observation window, if the number of abnormal SSDs detected in the same storage device exceeds a preset abnormal number threshold, reporting device-level alarm information, and otherwise performing abnormality repair on each abnormal SSD separately.
- 7. The SSD link state detection method based on link information according to claim 6, wherein performing abnormality repair on each abnormal SSD separately comprises: for any abnormal SSD, closing all links between the abnormal SSD and the controller simultaneously and then opening all of the links simultaneously, thereby completing one abnormality repair of the abnormal SSD; and if the number of abnormality repairs of any SSD is greater than a preset repair count threshold, reporting corresponding SSD abnormality alarm information.
- 8. The SSD link state detection method of claim 7, wherein the SSD link state detection method further comprises: within a first processing time period after the abnormal SSD has undergone abnormality repair, if any first abnormal link of the abnormal SSD is again detected to be in an abnormal state, closing the first abnormal link and incrementing the abnormal closing count of the first abnormal link; in the next observation window after the first abnormal link is closed, if a non-closed link of the abnormal SSD is detected to be in an abnormal state, opening the first abnormal link; within a second processing time period after the first abnormal link is closed, if no non-closed link of the abnormal SSD is detected to be in an abnormal state, opening the first abnormal link and performing abnormality repair on the abnormal SSD; and if the abnormal closing count of any first abnormal link among the links of the SSD is greater than a preset closing count threshold, no longer opening that link; wherein the number of link ports of the SSD is greater than or equal to 2.
- 9. An SSD link state detection system based on link information, characterized by comprising an acquisition module, a rate judgment module, a first detection module, and a second detection module; the acquisition module is configured to acquire real-time rate values of a target link of an SSD at each moment in a current observation window; the rate judgment module is configured to determine the transmission rate state of the target link at each moment according to each real-time rate value and a preset threshold, wherein the transmission rate state is one of a normal rate state, a first abnormal rate state, and a second abnormal rate state; the first detection module is configured to determine the current initial link state detection result of the target link according to the proportion of each type of transmission rate state, wherein the initial link state detection result is one of a normal state, a first abnormal state, and a second abnormal state; and the second detection module is configured to determine that the current link state detection result of the target link is a link abnormality if the initial link state detection result is the second abnormal state, and to determine the current link state detection result of the target link according to PCIe error detection information of the target link within a preset time period if the initial link state detection result is the first abnormal state.
- 10. The SSD link state detection system of claim 9, wherein the SSD link state detection system further comprises an abnormality processing module; the abnormality processing module is configured to: if the current link state detection result of any target link is an abnormal state, determine that the target link is an abnormal link and that the SSD is an abnormal SSD; in the current observation window, if the number of abnormal SSDs detected in the same storage device exceeds a preset abnormal number threshold, report device-level alarm information; and otherwise perform abnormality repair on each abnormal SSD separately.
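The two-stage PCIe error check of claims 4 and 5 can be sketched in Python as follows. This is a hedged illustration only: the event representation (integer-second timestamp plus an error-kind string), the `cfg` keys, the kind names such as `bad_tlp`, and all numeric values are assumptions introduced here, and the application does not specify how error counts are collected.

```python
def bucket_counts(events, start, end, bucket_len, kinds):
    """Count events of each kind in equal-length buckets covering [start, end)."""
    n = (end - start) // bucket_len
    buckets = [{k: 0 for k in kinds} for _ in range(n)]
    for ts, kind in events:
        if kind in kinds and start <= ts < start + n * bucket_len:
            buckets[(ts - start) // bucket_len][kind] += 1
    return buckets


def link_abnormal_from_pcie_errors(events, start, end, cfg):
    """Return True if the PCIe error history marks the link as abnormal."""
    # Stage 1 (claim 4): uncorrectable errors counted over the longer sub-periods.
    unc = bucket_counts(events, start, end, cfg["first_len"], ["uncorrectable"])
    if any(b["uncorrectable"] >= cfg["third_threshold"] for b in unc):
        return True

    # Stage 2 (claim 5): correctable errors, per type, over the shorter sub-periods.
    kinds = list(cfg["weights"])
    cor = bucket_counts(events, start, end, cfg["second_len"], kinds)
    # A sub-period is abnormal only if every correctable-error count exceeds the threshold.
    abnormal = [all(b[k] > cfg["fourth_threshold"] for k in kinds) for b in cor]

    # Find the longest run of consecutive abnormal sub-periods.
    run = longest = 0
    for flag in abnormal:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    if longest < cfg["consecutive_required"]:
        return False

    # Weighted sum over all sub-periods, with one assumed weight per error type.
    composite = sum(cfg["weights"][k] * b[k] for b in cor for k in kinds)
    return composite > cfg["fifth_threshold"]
```

A call might look like `link_abnormal_from_pcie_errors(events, 0, 120, cfg)` with `cfg = {"first_len": 60, "second_len": 10, "third_threshold": 2, "fourth_threshold": 0, "consecutive_required": 2, "weights": {"bad_tlp": 1, "bad_dllp": 2}, "fifth_threshold": 5}`. Running the uncorrectable check first mirrors the ordering in the claims, where the correctable-error analysis is reached only when no first sub-time period crosses the third threshold.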
Description
SSD link state detection method and system based on link information
Technical Field
The present application relates to the field of storage device link detection technologies, and in particular to an SSD link state detection method and system based on link information.
Background
The solid state disk (SSD) is the storage medium of a storage device, and its health and running state are a research focus for every storage vendor. Although the number of failed disks a storage device can tolerate has gradually increased with the development of technologies such as redundancy, it remains very important to identify an abnormal SSD in advance and take corresponding measures. When detecting the SSD state, SMART is an important indicator that cannot be bypassed; it is provided by SSD vendors based on the running condition of their firmware and therefore has significant reference value. SMART is information that each vendor reports according to a standard protocol based on the running condition of its own firmware. Some vendors extend SMART to provide richer information and offer query methods such as Log Page. However, an SSD is inserted in a frame of the storage device and depends on the links to the controllers for service transmission, so link quality directly affects service, and the information given by SMART is limited. Therefore, none of the above information effectively and directly reflects the link condition of the SSD. A very common approach to link detection at present is that, as soon as an abnormal state is detected (that is, an abnormal value is present or an unexpected result is detected), the link is judged to be abnormal, and a related alarm is then reported or corresponding intervention measures are taken.
However, the link itself may be affected by unpredictable interference factors, which are usually brief and non-persistent; the link may recover within a short time, and the actual impact stays within an acceptable range. If the link is judged abnormal merely because an instantaneous abnormality is detected, and intervention measures are taken accordingly, the misjudgment may affect the normal operation of the SSD. In addition, for capacity expansion and other considerations, the storage device supports cascading various hard disk frames, but what is directly connected to the SSD is the frame in which the SSD is located. The current common practice is to detect each SSD in the frame as an independent individual. With this strategy, when an abnormality is detected, an abnormality of the frame's backplane or of the connection between the frame and a higher-level controller cannot be identified; such abnormalities are generally shared and usually affect more than one disk in the frame. If such an abnormality is encountered, merely identifying the abnormality of the SSD itself and taking corresponding measures may cause a number of SSDs to be identified as abnormal and queued for subsequent operations, which may not actually be effective and may even be counterproductive.
Disclosure of Invention
In view of the above technical problems, the application provides an SSD link state detection method and system based on link information, which improve the accuracy of SSD link state detection.
In a first aspect, an embodiment of the present application provides an SSD link state detection method based on link information, including: acquiring real-time rate values of a target link of an SSD at each moment in a current observation window; determining the transmission rate state of the target link at each moment according to each real-time rate value and a preset threshold, wherein the transmission rate state is one of a normal rate state, a first abnormal rate state, and a second abnormal rate state; determining the current initial link state detection result of the target link according to the proportion of each type of transmission rate state, wherein the initial link state detection result is one of a normal state, a first abnormal state, and a second abnormal state; and if the initial link state detection result is the first abnormal state, determining the current link state detection result of the target link according to PCIe error detection information of the target link within a preset time period.
The embodiment of the application provides an SSD link state detection method based on link information, which continuously obtains real-time rate values of a target link within an observation window, finely classifies the transmission rate state at each moment (normal, first abnormal, or second abnormal) based on a preset threshold, then combines the proportions of the states across the whole window to obtain an initial link state detection result, and finally compre