US-12619444-B2 - Analyzing and recommending initial workload sizing to be run on a cluster

Abstract

A computer-implemented method is described that provides an initial workload sizing to be run on a cluster. This includes receiving a configuration file defined to create a container using verified cluster limits. The configuration file is verified and sections of interest containing input values to define a family of a container definition are extracted. The verified configuration file is classified as Good, Bad or Neutral by a trained classification model using the input values. The configuration file is tagged with the classification, and Neutral classifications are tagged as Good with a warning. For a Bad classification, a knowledge database is consulted to identify whether family specification limits exist. If the family limits exist, the configuration file is adjusted using a new set of cluster limits associated with the family specification. If the family limits do not exist, the configuration file is adjusted using a new set of aleatory updates.
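The flow described in the abstract can be sketched as a small loop. This is only an illustration under stated assumptions, not the patented implementation: the `classify` stub stands in for the trained classification model, the `FAMILY_LIMITS` dictionary stands in for the knowledge database, and the field names, thresholds, family name, and `max_rounds` safety cap are all hypothetical.

```python
import random

# Hypothetical stand-in for the knowledge database of family specification limits.
FAMILY_LIMITS = {"web": {"cpu_millicores": 500, "memory_mib": 512}}


def classify(config):
    """Stub for the trained classification model (thresholds are invented)."""
    memory = config["limits"]["memory_mib"]
    if memory >= 512:
        return "Good"
    if memory >= 256:
        return "Neutral"
    return "Bad"


def adjust(config, rng):
    """Return an adjusted copy of a configuration that was classified as Bad."""
    adjusted = {"family": config["family"], "limits": dict(config["limits"])}
    family_limits = FAMILY_LIMITS.get(config["family"])
    if family_limits is not None:
        # Family specification limits exist: use them as the new limits.
        adjusted["limits"].update(family_limits)
    else:
        # No family limits: apply an aleatory (random) increase to each limit.
        for key, value in adjusted["limits"].items():
            step = rng.randint(max(1, value // 4), max(1, value))
            adjusted["limits"][key] = value + step
    return adjusted


def size_workload(config, rng=None, max_rounds=50):
    """Classify and adjust repeatedly until the result is Good or Neutral."""
    if rng is None:
        rng = random.Random(0)
    for _ in range(max_rounds):
        label = classify(config)
        if label == "Good":
            return config, "Good", None
        if label == "Neutral":
            # Neutral is tagged as Good, with a warning attached.
            return config, "Good", "classified Neutral"
        config = adjust(config, rng)
    raise RuntimeError("no deployable configuration found")
```

Calling `size_workload({"family": "web", "limits": {"cpu_millicores": 100, "memory_mib": 64}})` takes the family-limits branch on the first adjustment, while a family missing from `FAMILY_LIMITS` falls through to the random updates.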

Inventors

Cesar Lourenco Botti Filho
Pablo Roberto Millicay Gonzalez
Diego Brito Veiga
Jorge Damiao Barbosa das Chagas

Assignees

INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date: 2026-05-05
Application Date: 2021-11-30

Claims (20)

1. A computer implemented method for providing an initial workload sizing for a container to be run on a cluster, the method comprising: receiving a configuration file containing input values as a candidate to create the container in the cluster using verified cluster limits, wherein a plurality of container family types are defined for the cluster such that containers that use similar resources are in a same container family type, and wherein each container family type comprises a plurality of containers; verifying the configuration file; extracting strings from a limits section of the verified configuration file and a requests section of the verified configuration file to identify initial resources for the initial workload sizing; extracting strings containing input values from the verified configuration file usable to determine the container family type for the container from among the plurality of container family types defined for the cluster; classifying the verified configuration file using the input values to provide a classification of one of Good, Neutral or Bad using a trained machine learning model, wherein the trained machine learning model was trained using historical information for a plurality of historical containers of family types corresponding to the plurality of container family types defined for the cluster and wherein the classification of Bad indicates a likely runtime failure of the container; providing the classification as an output; responsive to the output containing the classification of Good or Neutral, indicating that the configuration file is ready to deploy; and responsive to the output containing the classification of Bad, consulting a knowledge database including technical specifications to determine whether family specification limits exist for the container family type of the container, responsive to determining that family specification limits exist, generating a new set of cluster limits and adjusting the configuration file using the new set of cluster limits to provide an adjusted configuration file; or responsive to determining that family specification limits do not exist, generating a new set of aleatory updates for the configuration file and adjusting the configuration file using the new set of aleatory updates to provide the adjusted configuration file; extracting strings, classifying the adjusted configuration file, providing an adjusted classification as the output, consulting the knowledge database, and generating another adjusted configuration file in the same manner as the verified configuration file repeatedly until the classification of one of Good or Neutral is provided as output to improve cluster workload deployment efficiency of the cluster and reduce runtime failure of containers in the cluster.
2. The computer implemented method as recited in claim 1, further comprising: responsive to the output containing the classification of Good, tagging the configuration file as Good; and responsive to the output containing the classification of Neutral, tagging the configuration file as Good with a warning.
3. The computer implemented method of claim 1, wherein the trained machine learning model was trained using a method comprising: scraping at least one repository having historical information for the plurality of historical containers of family types corresponding to the plurality of container family types defined for the cluster, wherein the historical information includes cluster limits, resource allocations and exit codes associated with the plurality of container family types defined for the cluster; classifying the container family type for each container of the plurality of historical containers; generating input features associated with each container family type among the plurality of container family types defined for the cluster; preparing data including the container family type and associated input features for each historical container for input to the machine learning model; and training the machine learning model with the prepared data to provide the trained machine learning model.
4. The computer implemented method as recited in claim 3, further comprising: generating a binary file from the trained machine learning model, wherein the binary file, when executed, provides the classification of one of Good, Neutral or Bad as the output, responsive to the input values of the verified configuration file.
5. The computer implemented method as recited in claim 3, wherein the at least one repository includes both a public repository and an enterprise repository, and wherein configuration files scraped from the public repository are assigned a lower weight than configuration files scraped from the enterprise repository in training the machine learning model.
6. The computer implemented method as recited in claim 1, wherein the extracting strings from the limits section of the verified configuration file and the requests section of the verified configuration file to identify initial resources for the initial workload sizing is based on an image name and a set of key labels.
7. A computer program product comprising: one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, wherein the program instructions are executable to provide an initial workload sizing for a container to be run on a cluster, wherein the program instructions include instructions executable to: receive a configuration file containing input values, the configuration file defined using a predetermined data serialization language, as a candidate to create the container in the cluster using verified cluster limits, wherein a plurality of container family types are defined for the cluster such that containers that use similar resources are in a same container family type, and wherein each container family type comprises a plurality of containers; verify the configuration file; extract strings from a limits section of the verified configuration file and a requests section of the verified configuration file to identify initial resources for the initial workload sizing; extract strings containing input values from the verified configuration file usable to determine the container family type for the container from among the plurality of family types defined for the cluster; classify the verified configuration file using the input values to provide a classification of one of Good, Neutral or Bad using a trained machine learning model, wherein the trained machine learning model was trained using historical information for a plurality of historical containers of family types corresponding to the plurality of container family types defined for the cluster, and wherein the classification of Bad indicates a likely runtime failure of the container; provide the classification as an output; responsive to the output containing the classification of one of Good or Neutral, indicate that the configuration file is ready to deploy; and responsive to the output containing the classification of Bad, consult a knowledge database including technical specifications to determine whether family specification limits exist for the container family type of the container, responsive to determining that family specification limits exist, generate a new set of cluster limits and adjust the configuration file using the new set of cluster limits to provide an adjusted configuration file; or responsive to determining that family specification limits do not exist, generate a new set of aleatory updates for the configuration file and adjust the configuration file using the new set of aleatory updates to provide the adjusted configuration file; and extract strings, classify the adjusted configuration file, provide an adjusted classification as the output, consult the knowledge database, and generate another adjusted configuration file in the same manner as the verified configuration file repeatedly until the classification of one of Good or Neutral is provided as the output to reduce runtime failure of containers in the cluster and improve cluster workload deployment efficiency of the cluster.
8. The computer program product of claim 7, wherein the trained machine learning model was trained using program instructions executable to: scrape at least one repository having historical information for the plurality of historical containers of family types corresponding to the plurality of container family types defined for the cluster, wherein the historical information includes cluster limits, resource allocations and exit codes associated with the plurality of container family types defined for the cluster; classify the container family type for each container of the plurality of historical containers; generate input features associated with each container family type among the plurality of container family types defined for the cluster; prepare data including the container family type and associated input features for each historical container for input to the machine learning model; and train the machine learning model with the prepared data to provide the trained machine learning model.
9. The computer program product as recited in claim 8, further comprising program instructions executable to: generate a binary file from the trained machine learning model, wherein the binary file, when executed, provides the classification of one of Good, Neutral or Bad as the output, responsive to the input values of the verified configuration file.
10. The computer program product as recited in claim 8, wherein the at least one repository comprises both a public repository and an enterprise repository, and wherein configuration files scraped from the public repository are assigned a lower weight than configuration files scraped from the enterprise repository in training the machine learning model.
11. The computer program product as recited in claim 7, wherein to extract strings from the limits section of the verified configuration file and the requests section of the verified configuration file to identify initial resources for the initial workload sizing is based on an image name and a set of key labels.
12. The computer program product as recited in claim 7, further comprising program instructions executable to: responsive to the output containing the classification of Good, tag the configuration file as Good; and responsive to the output containing the classification of Neutral, tag the configuration file as Good with a warning.
13. A system comprising: a processor, a computer readable memory, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to provide an initial workload sizing for a container to be run on a cluster, wherein the program instructions include instructions executable to: receive a configuration file containing input values, the configuration file defined using a predetermined data serialization language, as a candidate to create the container in the cluster using verified cluster limits, wherein a plurality of container family types are defined for the cluster such that containers that use similar resources are in the same family type, and wherein each container family type comprises a plurality of containers; verify the configuration file; extract strings from a limits section of the verified configuration file and a requests section of the verified configuration file to identify initial resources for the initial workload sizing; extract strings containing input values from the verified configuration file usable to determine the container family type for the container from among the plurality of container family types defined for the cluster; classify the verified configuration file using the input values to provide a classification of one of Good, Neutral or Bad, using a trained machine learning model, wherein the trained machine learning model was trained using historical information for a plurality of historical containers of family types corresponding to the plurality of container family types defined for the cluster, and wherein the classification of Bad indicates a likely runtime failure of the container; provide the classification as an output; responsive to the output containing the classification of Good or Neutral, indicate that the configuration file is ready to deploy; and responsive to the output containing the classification of Bad, consult a knowledge database including technical specifications to determine whether family specification limits exist for the container family type of the container, responsive to determining that family specification limits exist, generate a new set of cluster limits and adjust the configuration file using the new set of cluster limits to provide an adjusted configuration file, or responsive to determining that family specification limits do not exist, generate a new set of aleatory updates for the configuration file and adjust the configuration file using the new set of aleatory updates to provide the adjusted configuration file; extract strings, classify the adjusted configuration file, provide an adjusted classification as the output, consult the knowledge database, and generate another adjusted configuration file in the same manner as the verified configuration file repeatedly until the classification of Good or Neutral is provided as the output, to improve cluster workload deployment efficiency of the cluster and reduce runtime failure of containers in the cluster.
14. The system as recited in claim 13, further comprising program instructions executable to: responsive to the output containing the classification of Good or Neutral, tag the configuration file as Good; and responsive to the output containing the classification of Neutral, tag the configuration file as Good with a warning.
15. The system as recited in claim 13, further comprising program instructions executable to train the trained machine learning model including instructions executable to: scrape at least one repository having historical information for the plurality of historical containers of family types corresponding to the plurality of container family types defined for the cluster, wherein the historical information includes cluster limits, resource allocations and exit codes associated with each of the plurality of container family types defined for the cluster; classify the container family type for each of the plurality of container family types defined for the cluster; generate input features associated with each container family type among the plurality of container family types defined for the cluster; prepare data including the container family type and associated input features for each historical container for input to the machine learning model; and train the machine learning model with the prepared data to provide the trained machine learning model.
16. The system as recited in claim 15, further comprising program instructions executable to: generate a binary file from the trained machine learning model, wherein the binary file, when executed, provides the classification of one of Good, Neutral or Bad as the output, responsive to the input values of the verified configuration file.
17. The system as recited in claim 15, wherein the at least one repository comprises both a public repository and an enterprise repository, and wherein configuration files scraped from the public repository are assigned a lower weight than configuration files scraped from the enterprise repository in training the machine learning model.
18. The system as recited in claim 15, wherein the historical information comprises historical container resource allocations and associated container exit codes.
19. The system as recited in claim 13, wherein to extract strings from the limits section of the verified configuration file and the requests section of the verified configuration file to identify initial resources for the initial workload sizing is based on an image name and a set of key labels.
20. The system as recited in claim 13, further comprising program instructions executable to receive the configuration file containing input values from a user and, when the configuration file is ready to deploy, provide the configuration file to the user.
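The extraction steps recited in the claims (strings from the limits and requests sections, together with an image name and a set of key labels) might look roughly like the following. The claims only call for "a predetermined data serialization language"; JSON is assumed here so that the standard library suffices. The Kubernetes-style manifest layout, the sample values, the `extract_sections` function name, and the choice of `app` and `tier` as key labels are all assumptions made for illustration and do not come from the source.

```python
import json

# Hypothetical manifest; the layout mirrors a Kubernetes-style pod definition.
MANIFEST = """
{
  "metadata": {"labels": {"app": "orders-api", "tier": "backend"}},
  "spec": {"containers": [{
    "name": "orders",
    "image": "registry.example.com/orders-api:1.4",
    "resources": {
      "requests": {"cpu": "250m", "memory": "128Mi"},
      "limits": {"cpu": "500m", "memory": "256Mi"}
    }
  }]}
}
"""


def extract_sections(manifest_text, key_labels=("app", "tier")):
    """Pull the strings of interest out of a manifest for the first container."""
    # json.loads raises ValueError on malformed input, a crude verification step.
    manifest = json.loads(manifest_text)
    labels = manifest.get("metadata", {}).get("labels", {})
    container = manifest["spec"]["containers"][0]
    resources = container.get("resources", {})
    return {
        "image": container["image"],
        "labels": {k: labels[k] for k in key_labels if k in labels},
        "requests": dict(resources.get("requests", {})),
        "limits": dict(resources.get("limits", {})),
    }


sections = extract_sections(MANIFEST)
```

The returned dictionary keeps the resource values as strings, matching the claims' wording about extracting strings; converting them to numeric features for a classifier would be a separate step.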

Description

BACKGROUND

Aspects of the present invention relate generally to workloads in a cluster, and more particularly, to improving the use of resources in containers executing on a computing device.

In existing systems there are various ways of isolating processes executing on a machine. Some environments use virtual machines. Other environments use containers. Containers are typically seen as an abstraction in the application layer, whereby code and dependencies are compiled or packaged together. It is often feasible to run multiple containers on one machine. Each container instance shares the operating system (OS) kernel with other containers, with each running as an isolated process. Isolated processes may be preferred in implementations for security or other purposes.

A sample application, or a microservice, is packaged into a container image and deployed for use through the container platform. The container platform is basically client-server software facilitating the execution of the container by providing three key operational components:

A daemon, which is a process that runs in the background. This daemon manages objects like images, containers, and other communication (network), and storage (data volume) objects needed by the microservice encapsulated within the container.

An application programming interface (API) which allows programs to interact with and direct the daemon process.

A command line interface (CLI) where a client may issue commands, like “pull” and “run”, and is used to access container images from a configured registry.

The command line uses the API to control or interact with the daemon through direct commands, or scripts containing commands. The daemon, in turn, delivers the results through the Host OS System for further processing, or as a final output.
SUMMARY

In a first aspect of the invention, there is a computer-implemented process including receiving a configuration file containing input values as a candidate for a workload sizing of a container in a cluster using verified cluster limits; verifying the configuration file of the candidate; extracting sections of interest from the verified configuration file; and classifying the verified configuration file into an output containing a classification of one of Good, Neutral or Bad. In response to an output containing a classification of Good or Neutral, the process indicates that the configuration file is ready to deploy, and provides the configuration file as the workload sizing candidate. In response to the output containing a classification of Bad, the process adjusts the configuration file with a new set of cluster limits as an adjusted configuration file, and classifies the adjusted configuration file into an output containing a classification of one of Good, Neutral or Bad. The adjusting and classifying of the adjusted configuration file are repeated until an output classification of Good or Neutral is identified, and the adjusted configuration file having the Good or Neutral output classification is provided as the workload sizing candidate.

In another aspect of the invention, there is a computer program product including one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media.
The program instructions are executable to provide an initial workload sizing for a container to be run on a cluster, wherein the program instructions are executable to: receive a configuration file, defined using a predetermined data serialization language, as a candidate to create a container using verified cluster limits; verify the configuration file of the candidate; extract sections of interest from the verified configuration file, containing input values to define a family of a container definition; classify the verified configuration file into an output containing a classification of one of Good, Neutral or Bad, using a predetermined classification model using the input values; responsive to a Good classification, indicate the configuration file is ready to deploy by tagging the configuration file as Good; and responsive to a Neutral classification, indicate the configuration file is not bad, by tagging the configuration file as Good with a warning. The program instructions are further executable to: responsive to a Bad classification: determine whether family specification limits exist for the workload in one or more knowledge databases including technical specifications; responsive to determining the family specification limits exist, adjust the configuration file using a new set of cluster limits, the new set of cluster limits associated with the family specification; and responsive to determining family specification limits do not exist, adjust the configuration file using a new set of aleatory updates.

In another aspect of the invention, there is a system including a processor, a computer readable memory, one or more computer readable storage media, and pro