US-12625789-B2 - Storage system, learning model, and learning model generation method


Abstract

A failure predictor of a drive apparatus in a storage system is detected more accurately. A control apparatus 1 for a storage system S stores a learning model(s) 132M for evaluating response performance of a drive apparatus 3 with respect to execution of a command relating to input and output by the control apparatus 1. The control apparatus 1 acquires operation information of the drive apparatus 3 and inputs specified information regarding commands, which is included in the operation information, to the learning model 132M. The control apparatus 1 judges a failure predictor of the drive apparatus 3 on the basis of output relating to the response performance by the learning model 132M in response to the input of the specified information.
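
The judgment flow summarized in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names CommandInfo, OperationInfo, toy_model and judge_failure_predictor, the choice of response time in milliseconds as the response performance metric, and the coefficients in toy_model are all assumptions made for the example. The stand-in model simply returns an evaluation threshold for the observed command mix, and a failure predictor is flagged when the measured response time exceeds it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CommandInfo:
    """Specified information regarding commands (hypothetical layout)."""
    read_block_len: int    # transfer block length of read commands
    read_count: int        # number of read commands
    write_block_len: int   # transfer block length of write commands
    write_count: int       # number of write commands


@dataclass(frozen=True)
class OperationInfo:
    """Operation information acquired from one drive apparatus (hypothetical)."""
    commands: CommandInfo
    measured_response_ms: float  # assumed metric: response time in ms


# A "learning model" here is any callable mapping the command mix to an
# evaluation threshold for the response performance.
Model = Callable[[CommandInfo], float]


def toy_model(cmd: CommandInfo) -> float:
    """Stand-in for a trained model; the coefficients are invented."""
    read_volume = cmd.read_block_len * cmd.read_count
    write_volume = cmd.write_block_len * cmd.write_count
    return 1.0 + 0.001 * read_volume + 0.002 * write_volume


def judge_failure_predictor(model: Model, op: OperationInfo) -> bool:
    """Return True when the drive shows a failure predictor.

    Assumption: for response time, "worse" means larger, so the drive is
    flagged when the measured value exceeds the model's threshold.
    """
    threshold = model(op.commands)
    return op.measured_response_ms > threshold
```

Because the threshold is computed per drive from that drive's own command mix, the judgment does not depend on other drives carrying the same workload.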

Inventors

Yuki Kotake
Akihiro SHIKANO
Atsushi Sato
Mizuho MOGI
Minoru Kobayashi

Assignees

HITACHI, LTD.

Dates

Publication Date: 20260512
Application Date: 20240320
Priority Date: 20230623

Claims (11)

1. A storage system, comprising: a drive apparatus for storing data and a control apparatus for controlling input and output of data to and from the drive apparatus, wherein the control apparatus includes a processor and a memory, wherein the memory stores a learning model for evaluating a response performance of the drive apparatus with respect to execution of commands relating to the input and output by the control apparatus, and wherein the processor: acquires operation information of the drive apparatus; inputs specified information regarding the commands included in the operation information to the learning model; and judges a failure predictor of the drive apparatus based on a comparison between an evaluation threshold for evaluating response performance data indicating the response performance of the drive apparatus output by the learning model in response to the input of the specified information and the response performance data.
2. The storage system according to claim 1, wherein the specified information includes a transfer block length and a command count of both read commands and write commands among the commands; wherein the learning model has already learned training data that includes an explanatory variable which is generated from the transfer block length and the command count of the read commands and the write commands, and an objective variable which is generated from response performance data indicating a response performance of a measurement drive apparatus measured when transmitting the read commands with a first transfer block length as many as a first command count to the measurement drive apparatus and transmitting the write commands with a second transfer block length as many as a second command count to the measurement drive apparatus; and wherein the response performance data is response time, IOPS (Input/Output Per Second), latency, or throughput of the measurement drive apparatus with respect to execution of the read commands and the write commands.
3. The storage system according to claim 2, wherein the training data includes the response performance data measured when transmitting the read commands and the write commands, regarding which the first transfer block length and the second transfer block length are different, to the measurement drive apparatus.
4. The storage system according to claim 3, wherein the training data includes the response performance data measured when transmitting the read commands and the write commands, regarding which the first transfer block length and the second transfer block length are different, continuously to the measurement drive apparatus over a specified amount of time.
5. The storage system according to claim 2, wherein the training data includes the response performance data measured when transmitting the read commands and the write commands which are provided, on a regular basis, with an idling period where no load is placed on the measurement drive apparatus, to the measurement drive apparatus.
6. The storage system according to claim 5, wherein the training data includes the response performance data measured when transmitting the read commands and the write commands, which are provided with the idling period on a regular basis, continuously to the measurement drive apparatus over a specified amount of time.
7. The storage system according to claim 2, wherein the memory stores a plurality of learning models, including the learning model, in correspondence with a drive type and a capacity of the drive type, each learning model having already learned the training data of the drive apparatus, and wherein the processor: selects the learning model corresponding to each drive type and each capacity of the drive apparatus; and inputs the specified information to the selected learning model.
8. The storage system according to claim 2, wherein the specified information includes specified performance statistical information of the drive apparatus; and wherein the learning model has already learned the training data including the specified performance statistical information.
9. The storage system according to claim 8, wherein the learning model has already learned the training data including the specified performance statistical information of other drive apparatuses possessed by other storage systems which are collected via a network.
10. A learning model for evaluating response performance of a drive apparatus with respect to read commands and write commands relating to input and output of data to the drive apparatus in a storage system including the drive apparatus for storing data and a control apparatus for controlling the input and output of data to and from the drive apparatus, wherein the learning model which already learned training data that includes an explanatory variable which is generated from transfer block lengths and a command count of the read commands and the write commands, and an objective variable which is generated from response performance data indicating a response performance of a measurement drive apparatus measured when transmitting the read commands with a first transfer block length as many as a first command count to the measurement drive apparatus and transmitting the write commands with a second transfer block length as many as a second command count to the measurement drive apparatus; wherein the training data includes the response performance data which is one or more of the following: the response performance data measured when transmitting the read commands and the write commands, regarding which the first transfer block length and the second transfer block length are different, to the measurement drive apparatus; the response performance data measured when transmitting the read commands and the write commands which are provided, on a regular basis, with an idling period where no load is placed on the measurement drive apparatus, to the measurement drive apparatus; the response performance data measured when transmitting the read commands and the write commands, regarding which the first transfer block length and the second transfer block length are different, continuously to the measurement drive apparatus over a specified amount of time; and the response performance data measured when transmitting the read commands and the write commands, which are provided with the idling period on a regular basis, continuously to the measurement drive apparatus over a specified amount of time; and wherein the control apparatus is caused to function so as to output an evaluation threshold of the drive apparatus to evaluate the response performance of the drive apparatus by receiving input of the transfer block lengths and the commands count of the read commands and the write commands which are transmitted by the control apparatus to the drive apparatus.
11. A learning model generation method for evaluating response performance of a drive apparatus with respect to read commands and write commands relating to input and output of data to the drive apparatus in a storage system including the drive apparatus for storing data and a control apparatus for controlling the input and output of data to and from the drive apparatus, wherein the learning model generation apparatus method includes each of the following processing for: acquiring transfer block lengths and a command count of the read commands and the write commands; generating feature values from the transfer block lengths and the command count; generating an explanatory variable from the feature values; acquiring response performance data indicating response performance of a measurement drive apparatus measured when transmitting the read commands with a first transfer block length as many as a first command count to the measurement drive apparatus and transmitting the write commands with a second transfer block length as many as a second command count to the measurement drive apparatus; generating an objective variable from the response performance data; and learning training data including the explanatory variable and the objective variable; and wherein the training data includes the response performance data which is one or more of the following: the response performance data measured when transmitting the read commands and the write commands, regarding which the first transfer block length and the second transfer block length are different, to the measurement drive apparatus; the response performance data measured when transmitting the read commands and the write commands which are provided, on a regular basis, with an idling period where no load is placed on the measurement drive apparatus, to the measurement drive apparatus; the response performance data measured when transmitting the read commands and the write commands, regarding which the first transfer block length and the second transfer block length are different, continuously to the measurement drive apparatus over a specified amount of time; and the response performance data measured when transmitting the read commands and the write commands, which are provided with the idling period on a regular basis, continuously to the measurement drive apparatus over a specified amount of time.
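
The generation steps recited in claim 11 (acquire block lengths and command counts, derive feature values, form the explanatory variable, form the objective variable from measured response performance, then learn) can be illustrated with the minimal sketch below. It is not the method of the specification: the excerpt does not name a model type, so ordinary least squares is swapped in purely as a stand-in, and the Pattern tuple layout, the features (read volume and write volume), and the functions features, solve, fit and predict are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical I/O pattern: (read_block_len, read_count,
#                            write_block_len, write_count)
Pattern = Tuple[int, int, int, int]


def features(p: Pattern) -> List[float]:
    """Explanatory variable for one pattern: bias, read volume, write volume."""
    read_len, read_count, write_len, write_count = p
    return [1.0, float(read_len * read_count), float(write_len * write_count)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(patterns: Sequence[Pattern], response: Sequence[float]) -> List[float]:
    """Learn weights from training data via the normal equations.

    `response` is the objective variable, e.g. measured response time of a
    measurement drive for each pattern (the unit is an assumption).
    """
    rows = [features(p) for p in patterns]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], p: Pattern) -> float:
    """Evaluate the learned model for one I/O pattern."""
    return sum(w * f for w, f in zip(weights, features(p)))
```

Training patterns in which the read and write block lengths differ, as the claims require, help keep the read-volume and write-volume columns from being collinear, so this toy fit can separate the two contributions.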

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2023-117605, filed on Jul. 19, 2023, the entire contents of which are incorporated herein by reference.

BACKGROUND

Generally, in a storage system, a failure predictor is detected when the response performance of a drive apparatus with respect to an I/O (Input-Output) request from a host falls below a threshold value. Alternatively, the response performance of drive apparatuses belonging to the same RAID (Redundant Array of Inexpensive Disks) group of the storage system is compared, and the failure predictor is detected in a drive apparatus which is relatively delayed.

There is also a conventional technology that detects a failure predictor of a drive apparatus by using a machine learning model(s). For example, Japanese Patent Application Laid-Open No. 2021-43891 discloses a technology that detects the failure predictor of a drive apparatus on the premise that drive apparatuses have the same workload if they belong to the same parity group.

However, in a storage system such as SDS (Software Defined Storage), which has become widespread in recent years, the drive apparatuses which belong to the same parity group may not have the same workload, so the failure predictor of a drive apparatus cannot be detected accurately.

The present invention was devised in light of the above-described circumstances, and it is an object of the invention to detect the failure predictor of a drive apparatus in the storage system more accurately.

SUMMARY

One aspect for solving the above-described problem is a storage system including a drive apparatus for storing data and a control apparatus for controlling input and output of data to and from the drive apparatus, wherein the control apparatus has a processor and a memory; wherein the memory stores a learning model for evaluating response performance of the drive apparatus with respect to execution of a command relating to the input and output by the control apparatus; and wherein the processor: acquires operation information of the drive apparatus; inputs specified information regarding the commands included in the operation information to the learning model; and judges a failure predictor of the drive apparatus based on output relating to the response performance by the learning model in response to the input of the specified information.

According to the present invention, the failure predictor of the drive apparatus in the storage system can be detected more accurately.

The details of one or more implementations of the subject matter described in the specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram illustrating the configuration of a storage system and a learning model generation apparatus according to Embodiment 1;
FIG. 2 is a diagram illustrating the structure of a learning model management table according to Embodiment 1;
FIG. 3 is a diagram illustrating the structure of a learning model pool according to Embodiment 1;
FIG. 4 is a diagram illustrating the structure of a failure predictor judgment result table according to Embodiment 1;
FIG. 5 is a diagram illustrating the structure of a response performance measurement I/O pattern table according to Embodiment 1;
FIG. 6 is a diagram illustrating the structure of a drive apparatus response performance data table according to Embodiment 1;
FIG. 7 is a diagram illustrating the configuration and processing of a control apparatus according to Embodiment 1;
FIG. 8 is a diagram illustrating the configuration and processing of the learning model generation apparatus according to Embodiment 1;
FIG. 9 is a flowchart illustrating drive apparatus failure predictor judgment processing according to Embodiment 1;
FIG. 10 is a flowchart illustrating learning model generation processing according to Embodiment 1;
FIG. 11A is a diagram for explaining performance evaluation results of a learning model (Comparative Example) which has learned only training data acquired by setting the same transfer block length per command for read commands and write commands;
FIG. 11B is a diagram for explaining performance evaluation results of a learning model (Embodiment 1) which has learned data including training data acquired by setting different transfer block lengths per command for read commands and write commands;
FIG. 12 is a diagram for explaining actual response performance acquired by transmitting read commands and write commands to a normal drive apparatus without inserting an idling period and by transmitting the read commands and the write commands with the idling period inserted;
FIG. 13 is a diagram illustrati