
US-20260126979-A1 - SYSTEMS AND METHODS FOR ON-DEMAND DEPLOYMENT OF PRE-CONFIGURED CONTAINERS


Abstract

Systems and methods to support on-demand deployment of pre-configured containers are disclosed. Exemplary implementations may store information electronically, including a particular artificial intelligence (AI) model and corresponding installation information; effectuate a presentation to a user, through a user interface, of a selectable user interface element, wherein the selectable user interface element is associated with the particular artificial intelligence model; responsive to the user selecting the selectable user interface element, provision a particular server that includes a particular Graphics Processing Unit (GPU), launch a container instance on the particular server such that the user has access to the particular GPU, install software in the container instance in accordance with the corresponding installation information, and install the particular AI model in the container instance; and/or perform other actions.

Inventors

  • Nader Khalil
  • Alecsander Fong

Assignees

  • NVIDIA CORPORATION

Dates

Publication Date
2026-05-07
Application Date
2025-12-31

Claims (20)

  1. A system configured to support on-demand deployment of pre-configured containers, the system comprising: electronic storage configured to store information electronically, wherein the stored information comprises at least one artificial intelligence (AI) model and corresponding installation information, wherein the corresponding installation information comprises references to one or more of: software applications, software libraries, or software development tools; and one or more hardware processors configured by machine-readable instructions to: effectuate or cause a presentation to a user, through a user interface, of a selectable user interface element, wherein the selectable user interface element is associated with the at least one AI model; and responsive to the user selecting the selectable user interface element: provision a server; launch a container instance; and install or configure software in the container instance, the software comprising one or more of the software applications in accordance with the corresponding installation information, the software libraries in accordance with the corresponding installation information, or the software development tools in accordance with the corresponding installation information, and wherein the container instance comprises the at least one AI model.
  2. The system of claim 1, wherein the selectable user interface element is part of a browser extension or plug-in.
  3. The system of claim 1, wherein the user has root access to the container instance running on a cloud services platform.
  4. The system of claim 1, wherein the user interface is a browser interface, and wherein the user can use the at least one AI model for inference directly from the browser interface.
  5. The system of claim 1, wherein the container instance is managed by a container management software application similar to or based on DOCKER, and wherein the container management software application performs launching the container instance and installing the software in the container instance.
  6. The system of claim 1, wherein the container instance is managed by a container cluster manager similar to or based on KUBERNETES.
  7. The system of claim 1, wherein the server uses an AMAZON™ Elastic Compute Cloud (EC2) instance provided through AMAZON WEB SERVICES™ (AWS).
  8. The system of claim 1, wherein the container instance runs on AWS, AZURE, or GCP.
  9. The system of claim 1, wherein the one or more hardware processors are further configured to: verify whether the user already has access to a given GPU through a given server; and verify whether the given GPU has sufficient capabilities for execution of the at least one AI model; wherein the container instance is launched on the given server, and wherein the at least one AI model is installed such that the execution of the at least one AI model is performed on the given GPU.
  10. The system of claim 1, wherein the user interface presents multiple selectable user interface elements that are associated with multiple AI models, respectively.
  11. The system of claim 1, wherein the software applications in accordance with the corresponding installation information include a particular version of PYTHON, and wherein the software libraries in accordance with the corresponding installation information include a particular version of CUDA.
  12. The system of claim 1, wherein the at least one AI model includes a neural network using over a billion parameters or weights.
  13. The system of claim 1, wherein the at least one AI model is a generative text-to-image AI model.
  14. The system of claim 1, wherein the at least one AI model is a large language model (LLM).
  15. A system configured to support on-demand deployment of pre-configured computing environments, the system comprising: electronic storage configured to store at least one artificial intelligence (AI) model and corresponding installation information, wherein the corresponding installation information specifies software dependencies for the at least one AI model; and one or more hardware processors configured by machine-readable instructions to: effectuate or cause a presentation to a user, through a user interface, of a selectable user interface element associated with the at least one AI model; and responsive to user interaction with the selectable user interface element: allocate computing resources for the at least one AI model; establish a computing environment using the allocated computing resources; and install or configure software in the computing environment in accordance with the corresponding installation information, and wherein the computing environment comprises the at least one AI model.
  16. The system of claim 15, wherein the user interface is a browser interface, and wherein the user can use the at least one AI model for inference directly from the browser interface.
  17. The system of claim 15, wherein the computing environment is managed by a container management software application similar to or based on DOCKER, and wherein the container management software application performs launching the computing environment and installing the software in the computing environment.
  18. The system of claim 15, wherein the one or more hardware processors are further configured to: verify whether the user already has access to a given GPU through a given server; and verify whether the given GPU has sufficient capabilities for execution of the at least one AI model; wherein the computing environment is launched on the given server, and wherein the at least one AI model is installed such that the execution of the at least one AI model is performed on the given GPU.
  19. The system of claim 15, wherein the user interface presents multiple selectable user interface elements that are associated with multiple AI models, respectively.
  20. A method of deploying artificial intelligence (AI) models, the method comprising: storing, in electronic storage, at least one AI model and corresponding information specifying software requirements for executing the at least one AI model; receiving a request to deploy the at least one AI model; and responsive to receiving the request: automatically allocating computing resources based on the information specifying the software requirements; establishing a container instance using the allocated computing resources; automatically configuring the container instance based on the information specifying the software requirements; and enabling access to the at least one AI model through the container instance.
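The deployment workflow recited in claim 20 — store a model with its software requirements, receive a request, allocate resources, establish and configure a container instance, and enable access to the model — can be sketched as a minimal Python simulation. All names and data structures below (`ModelRecord`, `ContainerInstance`, `deploy`, the server label) are hypothetical illustrations, not part of the claimed system:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelRecord:
    """Electronic storage entry: an AI model plus its installation information."""
    name: str
    software_requirements: list  # e.g. interpreter, libraries, dev tools

@dataclass
class ContainerInstance:
    """A container instance established from allocated computing resources."""
    server: str
    installed: list = field(default_factory=list)
    model: Optional[str] = None

def deploy(storage: dict, request: str) -> ContainerInstance:
    """Responsive to a deployment request: allocate resources, establish a
    container instance, configure it per the stored software requirements,
    and enable access to the requested model."""
    record = storage[request]                            # look up stored model + info
    instance = ContainerInstance(server="gpu-server-1")  # allocate resources (hypothetical)
    for pkg in record.software_requirements:             # configure per requirements
        instance.installed.append(pkg)
    instance.model = record.name                         # enable access to the model
    return instance

storage = {"llama-2-70b": ModelRecord("llama-2-70b",
                                      ["python==3.10", "cuda==12.1"])}
inst = deploy(storage, "llama-2-70b")
print(inst.model, inst.installed)
```

The sketch deliberately collapses provisioning and installation into in-memory bookkeeping; in the disclosed system those steps would act on a real server and container runtime.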

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 18/417,496, filed on Jan. 19, 2024, which is incorporated herein by reference in its entirety and for all purposes.

FIELD OF THE DISCLOSURE

The present disclosure relates to systems and methods for on-demand deployment of pre-configured containers.

BACKGROUND

Containers, which are bundles of software applications and the dependencies needed for their code to run, are known. A container may include a file system, code, a runtime environment, system tools, libraries, and other elements. Container orchestration platforms (a.k.a. container orchestrators or container cluster managers, suitable for operating and scaling containerized applications) are known, such as, e.g., KUBERNETES™.

SUMMARY

Artificial intelligence (AI) models, including but not limited to generative AI models, are usable for a wide variety of tasks due to their flexible and powerful neural networks, which include billions of parameters and/or weights. Well-known examples include Chat Generative Pre-trained Transformer (CHATGPT), DALL-E, Stable Diffusion, Large Language Model Meta AI (LLaMA), and many other AI models, most of which require substantial computing resources to use, and some of which are open source. For example, one version of LLaMA-2 includes about 70 billion parameters. Containers are highly portable due to the packaging together of their elements, including code, runtime environments, system tools, libraries, and other elements. In some implementations, containers may include infrastructure services such as storage. Containers may need persistent storage, whether on-premises, in the cloud (e.g., AMAZON WEB SERVICES™ (AWS) cloud storage), and/or other persistent storage. For example, a container may mount a storage volume and bind the volume mount to a directory.
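The volume-mount binding described above — mounting a persistent storage volume and binding it to a directory inside the container — can be expressed in the mapping format used by the docker-py SDK's `containers.run(volumes=...)` parameter. The volume and directory names below are hypothetical:

```python
# Hypothetical example of binding a named persistent volume to a directory
# inside a container, in the format expected by docker-py's containers.run().
volume_binding = {
    "model-storage": {          # named persistent volume (hypothetical name)
        "bind": "/workspace",   # directory inside the container
        "mode": "rw",           # read-write access
    }
}

# With a running DOCKER daemon available, a container could then be launched as:
#   import docker
#   client = docker.from_env()
#   client.containers.run("pytorch/pytorch", volumes=volume_binding, detach=True)

print(volume_binding["model-storage"]["bind"])
```

The actual `containers.run` call is shown only in comments because it requires a live DOCKER daemon; the binding dictionary itself is the portable part of the configuration.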
A container cluster (or simply “cluster”) may provide one or more of dynamic container placement, cluster scheduling, labels and replication controllers, connections within a cluster (e.g., using naming resolution), and/or other services. An example of a container platform suitable for creating, deploying, and sharing containers is DOCKER™, which supports the DOCKER ENGINE™ as its runtime environment. Container instances may run on cloud services platforms (also referred to as cloud computing platforms), including but not limited to AMAZON WEB SERVICES™ (AWS), MICROSOFT AZURE™, and GOOGLE™ CLOUD PLATFORM (GCP).

One aspect of the present disclosure relates to a system configured to support on-demand deployment of pre-configured containers. The system may store information electronically, including a particular artificial intelligence (AI) model and corresponding installation information. The system may effectuate a presentation to a user, through a user interface, of a selectable user interface element, wherein the selectable user interface element is associated with the particular artificial intelligence model. The system may, responsive to the user selecting the selectable user interface element, provision a particular server that includes a particular Graphics Processing Unit (GPU). As used herein, the term “GPU” refers to a computing architecture such as a High Performance Computing (HPC) architecture, and not merely or specifically to a personal unit for graphics rendering such as a graphics card. Examples of particular GPUs include the NVIDIA™ A100 architecture, the NVIDIA™ H100 architecture, the AMD™ MI250 architecture, the AMD™ MI300X architecture, and/or other architectures, including from INTEL™, APPLE™, as well as other competitors. The system may launch a container instance on the particular server such that the user has access to the particular GPU.
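The disclosure's claims also recite verifying whether a given GPU "has sufficient capabilities for execution" of the AI model. As one rough illustration of such a check, a model's weight footprint (parameter count times bytes per parameter) can be compared against a GPU's memory; the rule of thumb and the memory figures below are assumptions for illustration, not taken from the disclosure:

```python
def fits_on_gpu(param_count: int, gpu_memory_gib: float,
                bytes_per_param: int = 2) -> bool:
    """Rough sufficiency check: does the model's weight footprint
    (parameters x bytes per parameter, e.g. 2 bytes for fp16 weights)
    fit within the GPU's memory? Ignores activation and KV-cache overhead."""
    weight_gib = param_count * bytes_per_param / 2**30
    return weight_gib <= gpu_memory_gib

# A ~70-billion-parameter model at fp16 needs roughly 130 GiB for weights
# alone, exceeding a single 80 GiB accelerator; a 7B model fits comfortably.
print(fits_on_gpu(70_000_000_000, 80.0))
print(fits_on_gpu(7_000_000_000, 80.0))
```

A production check would also account for activations, the inference framework's overhead, and multi-GPU sharding, but the comparison above captures the basic capability test the claims describe.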
The system may install software in the container instance in accordance with the corresponding installation information. The system may install the particular AI model in the container instance, and/or perform other actions.

Another aspect of the present disclosure relates to a method of supporting on-demand deployment of pre-configured containers. The method may include storing information electronically, including a particular artificial intelligence (AI) model and corresponding installation information. The method may include effectuating a presentation to a user, through a user interface, of a selectable user interface element, wherein the selectable user interface element is associated with the particular artificial intelligence model. The method may include, responsive to the user selecting the selectable user interface element, provisioning a particular server that includes a particular Graphics Processing Unit (GPU). The method may include launching a container instance on the particular server such that the user has access to the particular GPU. The method may include installing software in the container instance in accordance with the corresponding installation information.