US-12619454-B2 - Autonomous cell-based control plane for scalable virtualized computing

Abstract

A number of cells of a control plane of a virtualized computing service are set up, including a first cell with one or more request processing nodes, a local instance of a data store, and metadata indicating a set of virtualization hosts. A request processor transmits a request for a virtual machine to the first cell. A request processor of the cell initiates a workflow to launch the virtual machine using a virtualization host; the workflow includes storing a record of the request in the local instance of the data store.
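
The arrangement described in the abstract can be pictured with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `Cell` class, the `route_request` function, the CRC-based cell choice, and the use of a plain list as a stand-in for the local data store instance. The patent text does not specify any of these details.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cell:
    """Hypothetical model of one control plane cell."""
    cell_id: str
    request_processors: List[str]   # request processing nodes of this cell
    host_ids: List[str]             # metadata: virtualization hosts of this cell
    # Stand-in for the cell's local instance of the data store.
    local_store: List[Dict] = field(default_factory=list)

    def launch_vm(self, request: Dict) -> Dict:
        """Run a toy launch workflow entirely inside this cell."""
        # Store a record of the request in the cell-local store first.
        record = {"request_id": request["request_id"],
                  "state": "PENDING", "host": None}
        self.local_store.append(record)
        # Naive placement (an assumption): use the first host in the metadata.
        record["host"] = self.host_ids[0]
        record["state"] = "LAUNCHED"
        return record


def route_request(cells: List[Cell], request: Dict) -> Cell:
    """Pick a cell deterministically from the request id (assumed policy)."""
    index = zlib.crc32(request["request_id"].encode("utf-8")) % len(cells)
    return cells[index]
```

In this sketch a launch touches only the chosen cell's store and host metadata, which mirrors the idea that each cell can handle its requests without consulting other cells.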

Inventors

  • UPENDRA BHALCHANDRA SHEVADE
  • Diwakar Gupta
  • MICHAEL BROOKE FURR
  • Nishant Mehta
  • Kevin P. Smith

Assignees

  • AMAZON TECHNOLOGIES, INC.

Dates

Publication Date
20260505
Application Date
20210628

Claims (17)

  1. A computer-implemented method, comprising: receiving, via one or more programmatic interfaces of a computing service of a cloud computing environment, a first request from a client to launch a first virtual machine; determining, at the computing service, that a first resource requirement indicated in the first request does not match a resource capacity of individual ones of a plurality of pre-defined virtual machine types of the computing service, wherein the first requirement is distinct from the plurality of pre-defined virtual machine types; and determining, based at least in part on one or more properties of the first request, that one or more administrative operations associated with the first virtual machine are to be processed at a cell-based control plane layer of the computing service, wherein the cell-based control plane layer comprises a plurality of cells including a first cell, wherein the first cell comprises one or more request processing nodes, and wherein the first cell comprises metadata indicating a plurality of host computers associated with the first cell; assigning the first cell to fulfill the first request to launch the first virtual machine; initiating, by a request processing node of the one or more request processing nodes of the first cell, at least a portion of a workflow to perform said launch the first virtual machine; and at a host computer of the plurality of host computers of the computing service, wherein the host computer is selected from the plurality of host computers based at least in part on the first resource requirement indicated in the first request.
  2. The computer-implemented method as recited in claim 1, wherein the first resource requirement applies to a first resource type from a set of resource types which includes computing resources, memory, storage, and networking resources, the computer-implemented method further comprising: in response to determining, at the computing service, that a second resource requirement indicated in a second request to launch a second virtual machine can be satisfied at the host computer after the first virtual machine has been launched, wherein the second resource requirement applies to a second resource type from the set of resource types, and wherein the second resource requirement does not match the resource capacity of individual ones of the plurality of pre-defined virtual machine types of the computing service: instantiating the second virtual machine at the host computer.
  3. The computer-implemented method as recited in claim 1, further comprising: implementing, using the first cell, one or more requested state changes of the first virtual machine, without interacting with another cell of the plurality of cells.
  4. The computer-implemented method as recited in claim 1, wherein determining that the one or more administrative operations associated with the first virtual machine are to be processed at the cell-based control plane layer is based at least in part on determining that a demand for processing, storage, or memory resources of the first virtual machine is below a threshold.
  5. The computer-implemented method as recited in claim 1, further comprising: instantiating, at the host computer, a second virtual machine, wherein the second virtual machine belongs to a pre-defined virtual machine type of the plurality of pre-defined virtual machine types.
  6. The computer-implemented method as recited in claim 1, further comprising: storing, at the computing service, an indication of a first limit on computing, memory, storage or networking resources to be assigned to the first virtual machine; modifying the first limit to a second limit in response to a programmatic request received at the computing service; and assigning a resource of the computing, memory, storage or networking resources to the first virtual machine based at least in part on the second limit.
  7. A system, comprising: one or more processors and corresponding memory; wherein the memory stores instructions that upon execution on or across the one or more processors: obtain, via one or more programmatic interfaces of a computing service of a cloud computing environment a first request from a client to launch a first virtual machine; determine, at the computing service, that a first resource requirement indicated in the first request does not match a resource capacity of individual ones of a plurality of pre-defined virtual machine types of the computing service, wherein the first resource requirement is distinct from the plurality of pre-defined virtual machine types; determine, based at least in part on one or more properties of the first request, that one or more administrative operations associated with the first virtual machine are to be processed at a cell-based control plane layer of the computing service, wherein the cell-based control plane layer comprises a plurality of cells including a first cell, wherein the first cell comprises one or more request processing nodes, and wherein the first cell comprises metadata indicating a plurality of host computers associated with the first cell; assign the first cell to fulfill the first request to launch the first virtual machine; and initiate, by a request processing node of the one or more request processing nodes of the first cell, at least a portion of a workflow to instantiate the first virtual machine; and the first virtual machine at a host computer at the computing service, wherein the host computer is selected from the plurality of host computers based at least in part on the first resource requirement indicated in the first request.
  8. The system as recited in claim 7, wherein the first resource requirement applies to a first resource type from a set of resource types which includes computing resources, memory, storage, and networking resources, and wherein the memory stores further instructions that upon execution on or across the one or more processors: in response to determining that a second resource requirement indicated in a second request to launch a second virtual machine can be satisfied at the host computer after the first virtual machine has been launched, wherein the second resource requirement applies to a second resource type from the set of resource types, and wherein the second resource requirement does not match the resource capacity of individual ones of the plurality of pre-defined virtual machine types of the computing service: instantiate the second virtual machine at the host computer.
  9. The system as recited in claim 7, wherein the memory stores further instructions that upon execution on or across the one or more processors: implement, using the first cell, one or more requested state changes of the first virtual machine, without interacting with another cell of the plurality of cells.
  10. The system as recited in claim 7, wherein a determination that the one or more administrative operations associated with the first virtual machine are to be processed at the cell-based control plane layer is based at least in part on a determination that a resource requirement of the first request is below a threshold.
  11. The system as recited in claim 7, wherein the memory stores further instructions that upon execution on or across the one or more processors: instantiate, at the host computer, a second virtual machine, wherein the second virtual machine belongs to a pre-defined virtual machine type of the plurality of pre-defined virtual machine types.
  12. The system as recited in claim 7, wherein the memory stores further instructions that upon execution on or across the one or more processors: store, at the computing service, an indication of a first limit on computing, memory, storage or networking resources to be assigned to the first virtual machine; modify the first limit to a second limit in response to a programmatic request received at the computing service; and assign a resource of the computing, memory, storage or networking resources to the first virtual machine based at least in part on the second limit.
  13. One or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more processors: obtain, via one or more programmatic interfaces of a computing service of a cloud computing environment, a first request from a client to launch a first virtual machine; determine, at the computing service, that a first resource requirement indicated in the first request does not match a resource capacity of individual ones of a plurality of pre-defined virtual machine types of the computing service, wherein the first resource requirement is distinct from the plurality of pre-defined virtual machine types; determine, based at least in part on one or more properties of the first request, that one or more administrative operations associated with the first virtual machine are to be processed at a cell-based control plane layer of the computing service, wherein the cell-based control plane layer comprises a plurality of cells including a first cell, wherein the first cell comprises one or more request processing nodes, and wherein the first cell comprises metadata indicating a plurality of host computers associated with the first cell; assign the first cell to fulfill the first request to launch the first virtual machine; initiate, by a request processing node of the one or more request processing nodes of the first cell, at least a portion of a workflow to instantiate the first virtual machine; and instantiate the first virtual machine at a host computer at the computing service, wherein the host computer is selected from the plurality of host computers based at least in part on the first resource requirement indicated in the first request.
  14. The one or more non-transitory computer-accessible storage media as recited in claim 13, wherein the first resource requirement applies to a first resource type from a set of resource types which includes computing resources, memory, storage, and networking resources, and wherein the one or more non-transitory computer-accessible storage media store further program instructions that when executed on or across the one or more processors: in response to determining that a second resource requirement indicated in a second request to launch a second virtual machine can be satisfied at the host computer after the first virtual machine has been launched, wherein the second resource requirement applies to a second resource type from the set of resource types, and wherein the second resource requirement does not match the resource capacity of individual ones of the plurality of pre-defined virtual machine types of the computing service: instantiate the second virtual machine at the host computer.
  15. The one or more non-transitory computer-accessible storage media as recited in claim 13, storing further program instructions that when executed on or across the one or more processors: implement, using the first cell, one or more requested state changes of the first virtual machine, without interacting with another cell of the plurality of cells.
  16. The one or more non-transitory computer-accessible storage media as recited in claim 13, wherein a determination that the one or more administrative operations associated with the first virtual machine are to be processed at the cell-based control plane layer is based at least in part on a determination that a performance requirement of the first request is below a threshold.
  17. The one or more non-transitory computer-accessible storage media as recited in claim 13, storing further program instructions that when executed on or across the one or more processors: instantiate, at the host computer, a second virtual machine, wherein the second virtual machine belongs to a pre-defined virtual machine type of the plurality of pre-defined virtual machine types.
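
The decision steps recited in the claims (checking a request against the pre-defined virtual machine types, deciding whether the cell-based layer should handle it, and selecting a host by resource requirement) can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the type catalog, the threshold values, the first-fit host choice, and all names (`PREDEFINED_TYPES`, `Host`, `use_cell_based_layer`, `select_host`, `launch`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical catalog of pre-defined virtual machine types.
PREDEFINED_TYPES: Dict[str, Dict[str, float]] = {
    "small": {"vcpus": 1, "memory_mib": 2048},
    "large": {"vcpus": 4, "memory_mib": 16384},
}


@dataclass
class Host:
    host_id: str
    free_vcpus: float
    free_memory_mib: float


def matches_predefined_type(req: Dict[str, float]) -> bool:
    """True if the requirement equals the capacity of some pre-defined type."""
    return any(req == capacity for capacity in PREDEFINED_TYPES.values())


def use_cell_based_layer(req: Dict[str, float],
                         max_vcpus: float = 2.0,
                         max_memory_mib: float = 4096) -> bool:
    """Assumed policy: flexible requests below a threshold go to the cells."""
    return (not matches_predefined_type(req)
            and req["vcpus"] < max_vcpus
            and req["memory_mib"] < max_memory_mib)


def select_host(hosts: List[Host], req: Dict[str, float]) -> Optional[Host]:
    """First-fit selection based on the request's resource requirement."""
    for host in hosts:
        if (host.free_vcpus >= req["vcpus"]
                and host.free_memory_mib >= req["memory_mib"]):
            return host
    return None


def launch(hosts: List[Host], req: Dict[str, float]) -> Optional[str]:
    """Reserve capacity on a selected host; return its id, or None."""
    host = select_host(hosts, req)
    if host is None:
        return None
    host.free_vcpus -= req["vcpus"]
    host.free_memory_mib -= req["memory_mib"]
    return host.host_id
```

Because the sketch tracks remaining capacity per host, a second flexible request can land on the same host after the first launch as long as its requirement still fits, which is the situation the dependent claims on a second virtual machine describe.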

Description

This application is a continuation of U.S. patent application Ser. No. 15/905,681, filed Feb. 26, 2018, which is hereby incorporated by reference herein in its entirety.

BACKGROUND

Many companies and other organizations operate computer networks that interconnect numerous computing systems to support their operations, such as with the computing systems being co-located (e.g., as part of a local network) or instead located in multiple distinct geographical locations (e.g., connected via one or more private or public intermediate networks). For example, data centers housing significant numbers of interconnected computing systems have become commonplace, such as private data centers that are operated by and on behalf of a single organization, and public data centers that are operated by entities as businesses to provide computing resources to customers. Some public data center operators provide network access, power, and secure installation facilities for hardware owned by various customers, while other public data center operators provide “full service” facilities that also include hardware resources made available for use by their customers.

The advent of virtualization technologies for commodity hardware has provided benefits with respect to managing large-scale computing resources for many customers with diverse needs, allowing various computing resources to be efficiently and securely shared by multiple customers. For example, virtualization technologies may allow a single physical virtualization host to be shared among multiple users by providing each user with one or more “guest” virtual machines hosted by the single virtualization host. Each such virtual machine may represent a software simulation acting as a distinct logical computing system that provides users with the illusion that they are the sole operators of a given hardware computing resource, while also providing application isolation and security among the various virtual machines. Instantiating several different virtual machines on the same host may also help increase the overall hardware utilization levels at a data center, leading to higher returns on investment.

A network-accessible service that provides virtualized computing functionality may have to manage hundreds of thousands, or even millions, of virtual machines concurrently. Some of the virtual machines, established for long-running client applications, may remain operational for weeks, months, or years. Other virtual machines may be short-lived, e.g., lasting for just a few minutes or seconds to perform a specific task on behalf of a client. The demand for different types of virtual machines may vary substantially over time. The portion of a virtualized computing service which handles administrative actions, such as the provisioning of physical resources, networking configuration and the like, may be referred to as the control plane of the service; the portion used primarily for client applications and data may be referred to as the data plane. Designing a control plane that can efficiently handle a workload to manage large, dynamically changing mixes of virtual machines with widely differing functional and performance requirements remains a non-trivial technical challenge.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates an example system environment in which a virtual computing service whose control plane comprises a plurality of autonomous cells for enhanced scalability and availability may be implemented, according to at least some embodiments.

FIG. 2 illustrates an example of a flexible virtual machine specification which may be provided by a client of a virtualized computing service which also supports a set of standardized virtual machine categories, according to at least some embodiments.

FIG. 3 provides a high-level overview of an example architecture of a control plane cell of a virtualized computing service, according to at least some embodiments.

FIG. 4 illustrates example isolation characteristics of a cell-based architecture of a control plane of a virtualized computing service, according to at least some embodiments.

FIG. 5 illustrates example components of a virtualization host which may be managed with the help of a control plane cell, according to at least some embodiments.

FIG. 6 illustrates a high-level overview of a persistent log-based data store which may be employed to store records pertaining to virtual machine state changes at a control plane cell, according to at least some embodiments.

FIG. 7 illustrates an example directed acyclic graph configuration of a persistent log-based data store, according to at least some embodiments.

FIG. 8 illustrates a provider network environment at which a virtualized computing service with a cell-based control plane may be implemented, according to at least some embodiments.

FIG. 9 is a flow diagram illustrating aspects of operations that may be performed to manage a pool of cells of a control plane of a virtualized compu