EP-4742036-A1 - RESOURCE MANAGEMENT PROGRAM, RESOURCE MANAGEMENT METHOD, AND RESOURCE MANAGEMENT APPARATUS

Abstract

A computer (31) is caused to perform a process including: installing an OS on a standby server having a predetermined hardware configuration and setting the standby server to standby in a state where the OS is booted; specifying, when the standby server is incorporated into a cluster, a hardware resource to be used based on a predetermined job to be allocated to the standby server and specifying an additional resource to be added to the predetermined hardware configuration; acquiring the additional resource from a resource pool that stores a plurality of hardware resources and adding the acquired additional resource to the standby server; completing configuring the additional resource in the standby server; and incorporating the standby server into the cluster.
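
The ordering of the steps in the abstract can be illustrated with a minimal sketch. This is not the applicant's implementation: the class `StandbyServer`, its state names, and the resource keys are hypothetical, chosen only to show that resources are added after the OS is already booted and before the server joins the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyServer:
    """Hypothetical model of a standby server; all names are illustrative."""
    name: str
    resources: dict = field(default_factory=dict)  # e.g. {"gpu": 0, "memory_gb": 64}
    state: str = "bare"
    log: list = field(default_factory=list)

    def _advance(self, expected: str, new: str) -> None:
        # Enforce the step order described in the abstract.
        if self.state != expected:
            raise RuntimeError(f"cannot go from {self.state!r} to {new!r}")
        self.state = new
        self.log.append(new)

    def install_os_and_standby(self) -> None:
        # OS is installed and booted on the predetermined configuration.
        self._advance("bare", "standby")

    def add_resources(self, additional: dict) -> None:
        # Additional resources are attached while the OS is already running.
        self._advance("standby", "resources_added")
        for kind, amount in additional.items():
            self.resources[kind] = self.resources.get(kind, 0) + amount

    def complete_configuration(self) -> None:
        # Stands in for finishing the configuration of the added resources.
        self._advance("resources_added", "configured")

    def join_cluster(self, cluster: list) -> None:
        self._advance("configured", "in_cluster")
        cluster.append(self.name)


cluster = []
server = StandbyServer("standby-1", {"gpu": 0, "memory_gb": 64})
server.install_os_and_standby()
server.add_resources({"gpu": 2})
server.complete_configuration()
server.join_cluster(cluster)
```

Calling the steps out of order (for example, `join_cluster` before `complete_configuration`) raises `RuntimeError` in this sketch, which mirrors the requirement that configuration of the additional resource is completed before incorporation.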

Inventors

  • HANADA, MITSURU
  • TANEDA, MASAHIRO
  • FURUHASHI, YOSHIHIRO
  • HOSOKAWA, YUKA

Assignees

  • Fsas Technologies Inc.

Dates

Publication Date
2026-05-13
Application Date
2025-10-30

Claims (8)

  1. A resource management program causing a computer (31) to perform a process comprising: installing an operating system (OS) on a backup device having a predetermined hardware configuration and setting the backup device to standby in a state where the OS is booted; specifying, when the backup device is incorporated into a cluster, a hardware resource to be used based on a predetermined job to be allocated to the backup device and specifying an additional resource to be added to the predetermined hardware configuration; acquiring the additional resource from a resource pool (3) that stores a plurality of hardware resources and adding the acquired additional resource to the backup device; completing configuring the additional resource in the backup device; and incorporating the backup device into the cluster.
  2. The resource management program according to claim 1, wherein the specifying the additional resource includes acquiring information of the predetermined job among jobs stored in a job queue of the cluster and determining the hardware resource to be used, and setting a difference between the determined hardware resource and the predetermined hardware configuration to the additional resource.
  3. The resource management program according to claim 2, wherein the setting the backup device to standby in the state where the OS is booted includes monitoring the cluster and determining whether or not to execute the incorporation of the backup device based on an allocation state of the jobs stored in the job queue.
  4. The resource management program according to claim 1 or 2, wherein completing configuring the backup device includes loading a driver of the additional resource, reserving a resource used for the driver, and installing the driver.
  5. The resource management program according to any one of claims 1 to 4, causing the computer to further execute a process of acquiring the predetermined resource from the resource pool and adding the acquired predetermined resource to the backup device to make the backup device have the predetermined hardware configuration before installing the OS on the backup device.
  6. The resource management program according to any one of claims 1 to 5, wherein the specifying the additional resource is a process of specifying, as the additional resource, any one type or a plurality of types among a graphics processing unit (GPU), an auxiliary storage device, and a main storage device.
  7. A resource management method carried out by a computer (31), comprising: installing an OS on a backup device having a predetermined hardware configuration and setting the backup device to standby in a state where the OS is booted; specifying, when the backup device is incorporated into a cluster, a hardware resource to be used based on a predetermined job to be allocated to the backup device and determining an additional resource to be added to the predetermined hardware configuration; acquiring the additional resource from a resource pool (3) that stores a plurality of hardware resources and adding the acquired additional resource to the backup device; completing configuring the additional resource in the backup device; and incorporating the backup device into the cluster.
  8. A resource management apparatus (31) comprising: a configuration management unit (13) configured to set a backup device having a predetermined hardware configuration to standby in a state where an OS is booted; an additional configuration determination unit (14) configured to specify, when the backup device is incorporated into a cluster, a hardware resource to be used based on a predetermined job to be allocated to the backup device and to determine an additional resource to be added to the predetermined hardware configuration; a configuration change unit (16) configured to acquire the additional resource from a resource pool (3) that stores a plurality of hardware resources and to add the acquired additional resource to the backup device; a software application unit (15) configured to complete configuring the additional resource in the backup device; and a cluster management unit (11) configured to incorporate the backup device where the additional resource is completed into the cluster.
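
As a rough illustration of claim 2, the additional resource can be viewed as the per-type shortfall between what the queued job needs and what the predetermined hardware configuration already provides, which is then drawn from the resource pool. The sketch below assumes resources are expressed as simple integer quantities per type; the function names, dictionary keys, and example numbers are hypothetical and do not appear in the publication.

```python
def additional_resources(required: dict, base: dict) -> dict:
    """Return the amounts by which the job's needs exceed the base configuration."""
    extra = {}
    for kind, needed in required.items():
        shortfall = needed - base.get(kind, 0)
        if shortfall > 0:
            extra[kind] = shortfall
    return extra


def acquire_from_pool(pool: dict, extra: dict) -> None:
    """Remove `extra` from `pool` in place, or raise if the pool cannot supply it."""
    # Check everything first so a failed request leaves the pool unchanged.
    for kind, amount in extra.items():
        if pool.get(kind, 0) < amount:
            raise ValueError(f"resource pool lacks {amount} x {kind}")
    for kind, amount in extra.items():
        pool[kind] -= amount


# Hypothetical figures: base configuration of the standby device,
# requirements of the job taken from the queue, and the pool contents.
base_config = {"gpu": 0, "memory_gb": 64, "storage_gb": 500}
job_requirements = {"gpu": 2, "memory_gb": 128, "storage_gb": 200}
pool = {"gpu": 8, "memory_gb": 1024, "storage_gb": 4000}

extra = additional_resources(job_requirements, base_config)
acquire_from_pool(pool, extra)
```

In this example only the GPU and main memory fall short, so `extra` contains those two types and the storage in the pool is untouched. Types that the base configuration already covers are never requested, which is the resource-efficiency point the claims aim at.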

Description

FIELD

The embodiment discussed herein is related to a resource management program, a resource management method, and a resource management apparatus.

BACKGROUND

In core businesses or data centers of companies, a method called clustering, in which many computers such as servers are grouped and collectively managed, is frequently used. Each of the clustered computers is called a node. Clustering software such as RHOCP (Red Hat (registered trademark) OpenShift (registered trademark) Container Platform) or Rancher is available for implementing clustering. By using the clustering software, a node can be dynamically added or deleted according to the total load of the cluster. The clustering technique is particularly important, for example, when a business has a large variation in load or when business expansion is expected.

Further, as a technique for improving the convenience of clustering, a technique called a composable disaggregated infrastructure (CDI) has been developed. In a CDI system, various computer resources such as a GPU (Graphics Processing Unit) or a computer main body are pooled in advance for the system as a whole, and computers with various configurations can be dynamically prepared as needed.

The following techniques related to clustering have been disclosed. There is disclosed a technique for setting a transfer unit and an input/output (I/O) unit of a standby server device to standby, setting the status of an application execution unit to active, and switching the transfer unit and the I/O unit from standby to active when a failure is detected. In addition, there is disclosed a technique in which, when an identifier of each system switching method is set and hot standby is selected, a system switching method corresponding to a designated identifier is reflected on a standby system in a state where the standby system is activated.
In addition, for a work inheriting system including a plurality of clusters, a technique is disclosed in which the work of a cluster where a failure occurs is inherited by a hot standby cluster.

Patent Literature 1: Japanese Laid-open Patent Publication No. 2012-155540
Patent Literature 2: Japanese Laid-open Patent Publication No. 2008-269332
Patent Literature 3: International Publication Pamphlet No. WO 1997/049034

However, when a node is added to a cluster, the following problem occurs. When a node is added to a cluster in which a job remains unallocated due to resource deficiency, work to prepare the node so that it can execute the job, for example, hardware configuration construction, OS (Operating System) installation, and software deployment, is executed. This work takes, for example, about 600 seconds in total.

In addition, when an activated node is set to standby and the standby node is added to a cluster in which a job remains unallocated due to resource deficiency, the resources of the added node may be unsuitable for the job. For example, a standby node may lack a GPU with sufficient performance for a job that needs a high-performance GPU, or a high-performance GPU may be mounted on a standby node for a job that does not need a GPU.

Thus, in the clustering technique of the related art, it is difficult to add a node having a hardware configuration corresponding to a job within a short period of time, and the efficient use of resources is difficult. Even in the technique of setting the transfer unit and the I/O unit to standby, setting the status of the application execution unit to active, and executing the switching, it is difficult to add a node having a hardware configuration corresponding to a job.
In addition, in the technique of reflecting the system switching method designated for each standby system and in the technique of having a hot standby cluster inherit the work of a cluster where a failure occurs, the resources to be added are fixed, and it is difficult to add a node having a hardware configuration corresponding to a job. Accordingly, whichever of these techniques is used, the efficient use of resources is difficult.

The disclosed technique has been made in view of these circumstances, and an object thereof is to provide a resource management program, a resource management method, and a resource management apparatus capable of improving the usage efficiency of resources.

SUMMARY

According to an aspect of an embodiment, a resource management program causes a computer to perform a process including, installing an operating system (OS) on a backup device having a predetermined hardware configuration and setting the backup device to standby in a state where the OS is booted, specifying, when the backup device is incorporated into a cluster, a hardware resource to be used based on a predetermined job to be allocated to the backup device and specifying an additional resource to be added to the predetermined hardware configuration, acquiring the additional resourc