US-12625717-B2 - Client-aware container image optimization
Abstract
Mechanisms are provided for delivering a container image to a client computing device. The mechanisms receive, from the client computing device, a pull request for a requested container image. The pull request comprises a specification of cached container layers that are stored in a local container layer store of the client computing device. The mechanisms determine, based on the specification of the cached container layers, a set of one or more of the cached container layers that can be reused to provide the container image at the client computing device, thereby generating a set of reuse container layers. The mechanisms generate, based on the determination, an output container image comprising only container layers, of the requested container image, that provide files not provided by the set of reuse container layers. The mechanisms transmit the output container image to the client computing device.
Inventors
- Bhargav P. Kulkarni
- Harikrishnan Balagopal
- Padmanabha Venkatagiri Seshadri
- Ashok Pon Kumar Sree Prakash
- Akash Nayak
- Mehant Kammakomati
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-06-01
Claims (20)
- 1 . A computer-implemented method, in a data processing system, for providing a container image to a client computing device, the computer-implemented method comprising: receiving, from the client computing device, a pull request for the container image, wherein the pull request comprises a specification of the container image and an identification of each cached container layer of a plurality of cached container layers that are cached and stored locally in a local container layer store of the client computing device; determining a set of cached container layers, among the plurality of cached container layers, that is reusable to provide the container image at the client computing device, wherein the determining of the set of cached container layers is based on the specification of the container image and the identification of each cached container layer of the plurality of cached container layers; generating, based on the determining of the set of cached container layers, a set of reuse container layers; generating, based on the generating of the set of reuse container layers, an output container image comprising container layers of the container image, wherein the output container image provides a first set of files not provided by the set of reuse container layers, and the output container image does not include a second set of files present in the set of reuse container layers; and transmitting the output container image to the client computing device.
- 2 . The computer-implemented method of claim 1 , wherein the container layers of the output container image are fewer than a total number of container layers of the container image.
- 3 . The computer-implemented method of claim 1 , wherein the container layers of the output container image are different from the set of reuse container layers of the container image, wherein the container layers comprise the first set of files not present in the set of reuse container layers.
- 4 . The computer-implemented method of claim 1 , wherein the generating of the output container image further comprises executing a layer ordering operation on the container layers of the output container image to order the container layers according to a hierarchical ordering of the container layers of the output container image.
- 5 . The computer-implemented method of claim 4 , wherein the layer ordering operation comprises a topology sorting algorithm applied to a graph representation of the container layers of the output container image.
- 6 . The computer-implemented method of claim 1 , wherein the generating of the output container image comprises executing a residual correction on container images of the output container image to add the container layers of the output container image from a container layer repository.
- 7 . The computer-implemented method of claim 1 , wherein the generating of the output container image comprises executing a whiteout of a third set of files of the first set of files in the output container image to remove the third set of files that are not included in the container image.
- 8 . The computer-implemented method of claim 1 , wherein the generating of the set of reuse container layers comprises: determining a file universe, for the container image, comprising all files included in all container layers of the container image; and determining the second set of files of the plurality of cached container layers based on the file universe and a plurality of files included in the plurality of cached container layers to determine the first set of files that are not covered by the plurality of cached container layers, wherein the plurality of files of the plurality of cached container layers includes the second set of files.
- 9 . The computer-implemented method of claim 8 , wherein the determining of the second set of files comprises: identifying a first cached container layer among the plurality of cached container layers having a maximum number of a third set of files in the file universe, wherein the second set of files of the plurality of cached container layers includes the third set of files; adding the first cached container layer to the set of reuse container layers; removing the third set of files associated with the first cached container layer from the file universe to generate an updated file universe; and repeating the identifying, the adding, and the removing based on the updated file universe until no further second cached container layer, among the plurality of cached container layers, that has associated a fourth set of files in the updated file universe is found, wherein the second set of files of the plurality of cached container layers includes the fourth set of files.
- 10 . The computer-implemented method of claim 8 , wherein the determining of the second set of files comprises: determining, for each cached container layer of the plurality of cached container layers, a redundancy metric value and a contribution metric value, wherein the redundancy metric value of a first cached container layer of the plurality of cached container layers specifies a number of cached container layers, in the plurality of cached container layers, that contain a rarest file in the first cached container layer, and wherein the contribution metric value of the first cached container layer specifies a number of wanted files in the first cached container layer minus a number of unwanted files in the first cached container layer; selecting the first cached container layer based on the redundancy metric value of the first cached container layer is smallest among the plurality of cached container layers, and the contribution metric value of the first cached container layer is largest among the plurality of cached container layers; removing a third set of files, associated with the selected first cached container layer, from the file universe to generate an updated file universe, wherein the third set of files includes the rarest file and the wanted files, and the second set of files of the plurality of cached container layers includes the third set of files; and repeating the determining of the redundancy metric value and the contribution metric value, the selecting, and the removing based on the updated file universe until no further second cached container layer that has associated a fourth set of files in the updated file universe is selected, wherein the second set of files of the plurality of cached container layers includes the fourth set of files.
- 11 . A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to: receive, from a client computing device, a pull request for a container image, wherein the pull request comprises a specification of the container image and an identification of each cached container layer of a plurality of cached container layers that are cached and stored locally in a local container layer store of the client computing device; determine a set of cached container layers, among the plurality of cached container layers, that is reusable to provide the container image at the client computing device, wherein the determination of the set of cached container layers is based on the specification of the container image and the identification of each cached container layer of the plurality of cached container layers; generate, based on the determination of the set of cached container layers, a set of reuse container layers; generate, based on the generation of the set of reuse container layers, an output container image comprising container layers of the container image, wherein the output container image provides a first set of files not provided by the set of reuse container layers, and the output container image does not include a second set of files present in the set of reuse container layers; and transmit the output container image to the client computing device.
- 12 . The computer program product of claim 11 , wherein the container layers of the output container image are different from the set of reuse container layers of the container image, wherein the container layers comprise the first set of files not present in the set of reuse container layers.
- 13 . The computer program product of claim 11 , wherein the generation of the output container image further comprises executing a layer ordering operation on the container layers of the output container image to order the container layers according to a hierarchical ordering of the container layers of the output container image.
- 14 . The computer program product of claim 13 , wherein the layer ordering operation comprises a topology sorting algorithm applied to a graph representation of the container layers of the output container image.
- 15 . The computer program product of claim 11 , wherein the generation of the output container image comprises executing a residual correction on container images of the output container image to add the container layers of the output container image from a container layer repository.
- 16 . The computer program product of claim 11 , wherein the generation of the output container image comprises executing a whiteout of a third set of files of the first set of files in the output container image to remove the third set of files that are not included in the container image.
- 17 . The computer program product of claim 11 , wherein the generation of the set of reuse container layers comprises: determining a file universe, for the container image, comprising all files included in all container layers of the container image; and determining the second set of files of the plurality of cached container layers based on the file universe and a plurality of files included in the plurality of cached container layers to determine the first set of files that are not covered by the plurality of cached container layers, wherein the plurality of files of the plurality of cached container layers includes the second set of files.
- 18 . The computer program product of claim 17 , wherein the determination of the second set of files comprises: identifying a first cached container layer among the plurality of cached container layers having a maximum number of a third set of files in the file universe, wherein the second set of files of the plurality of cached container layers includes the third set of files; adding the first cached container layer to the set of reuse container layers; removing the third set of files associated with the first cached container layer from the file universe to generate an updated file universe; and repeating the identifying, the adding, and the removing based on the updated file universe until no further second cached container layer, among the plurality of cached container layers, that has associated a fourth set of files in the updated file universe is found, wherein the second set of files of the plurality of cached container layers includes the fourth set of files.
- 19 . The computer program product of claim 17 , wherein the determination of the second set of files comprises: determining, for each cached container layer of the plurality of cached container layers, a redundancy metric value and a contribution metric value, wherein the redundancy metric value of a first cached container layer of the plurality of cached container layers specifies a number of cached container layers, in the plurality of cached container layers, that contain a rarest file in the first cached container layer, and wherein the contribution metric value of the first cached container layer specifies a number of wanted files in the first cached container layer minus a number of unwanted files in the first cached container layer; selecting the first cached container layer based on the redundancy metric value of the first cached container layer is smallest among the plurality of cached container layers, and the contribution metric value of the first cached container layer is largest among the plurality of cached container layers; removing a third set of files, associated with the selected first cached container layer, from the file universe to generate an updated file universe, wherein the third set of files includes the rarest file and the wanted files, and the second set of files of the plurality of cached container layers includes the third set of files; and repeating the determining of the redundancy metric value and the contribution metric value, the selecting, and the removing based on the updated file universe until no further second cached container layer that has associated a fourth set of files in the updated file universe is selected, wherein the second set of files of the plurality of cached container layers includes the fourth set of files.
- 20 . An apparatus comprising: at least one processor; and at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to: receive, from a client computing device, a pull request for a container image, wherein the pull request comprises a specification of the container image and an identification of each cached container layer of a plurality of cached container layers that are cached and stored locally in a local container layer store of the client computing device; determine a set of cached container layers, among the plurality of cached container layers, that is reusable to provide the container image at the client computing device, wherein the determination of the set of cached container layers is based on the specification of the container image and the identification of each cached container layer of the plurality of cached container layers; generate, based on the determination of the set of cached container layers, a set of reuse container layers; generate, based on the generation of the set of reuse container layers, an output container image comprising container layers of the container image, wherein the output container image provides a first set of files not provided by the set of reuse container layers, and the output container image does not include a second set of files present in the set of reuse container layers; and transmit the output container image to the client computing device.
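The iterative selection recited in claims 8 and 9 (build a file universe, pick the cached layer that covers the most remaining files, remove those files, repeat) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function name `select_reuse_layers`, the dictionary-of-sets layer model, and the lexicographic tie-break are assumptions made for illustration, and files are identified by path only, whereas a real system would presumably also compare file contents.

```python
from typing import Dict, List, Set, Tuple


def select_reuse_layers(
    image_layers: Dict[str, Set[str]],
    cached_layers: Dict[str, Set[str]],
) -> Tuple[List[str], Set[str]]:
    """Greedily pick cached layers that cover the most still-needed files.

    Hypothetical helper: returns the chosen layer ids in selection order and
    the files that no chosen layer covers (what would still need sending).
    """
    # File universe: every file in every layer of the requested image.
    remaining: Set[str] = set().union(*image_layers.values())
    candidates = dict(cached_layers)
    reuse: List[str] = []

    while candidates:
        # Cached layer with the maximum number of files still in the universe;
        # sorting the ids makes ties resolve deterministically (an assumption).
        best = max(sorted(candidates), key=lambda lid: len(candidates[lid] & remaining))
        covered = candidates[best] & remaining
        if not covered:
            break  # no remaining cached layer contributes any further file
        reuse.append(best)
        remaining -= covered  # updated file universe
        del candidates[best]

    return reuse, remaining
```

With a requested image containing `/bin/sh`, `/lib/libc.so`, `/app/main.py`, and `/app/util.py`, and cached layers `c1` (`/bin/sh`, `/lib/libc.so`, `/etc/motd`), `c2` (`/lib/libc.so`), and `c3` (`/app/util.py`), the sketch selects `c1` and then `c3`, leaving `/app/main.py` uncovered. This simple variant ignores unwanted files in a selected layer, such as `/etc/motd`; the redundancy and contribution metrics of claims 10 and 19 address that case and are not modeled here.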
Description
BACKGROUND
The present application relates generally to an improved data processing apparatus and method, and more specifically to an improved computing tool and improved computing tool operations/functionality for performing client-aware container image optimization.
The cloud-edge is the area of a cloud network architecture where a device or local network interfaces with the cloud computing system. To illustrate the cloud-edge, the cloud computing system is often described in terms of levels, where levels 0-1 are the user devices, level 2 is the line server level, level 3 may be a plant data center level, level 4 may be a regional data center level, and level 5 may be a headquarters/cloud data center level. The breakdown of levels is not a fixed architecture and may vary, but essentially the levels indicate how close to, or how far from, the core computing resources of the cloud computing system a particular resource is located. For example, levels 0-2 may be considered part of the edge of the cloud computing system, whereas levels 3-5 may be considered to be the core of the cloud computing system.
Virtualization at the edge is increasing with the emergence of single node or lean container orchestration platforms, such as the lightweight Kubernetes distribution K3s; K3d, which is a lightweight wrapper to run K3s; single node OpenShift, which offers both control and worker node functionality in a single node; and the like.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described herein in the Detailed Description. This Summary is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In one illustrative embodiment, a method, in a data processing system, is provided for providing a container image to a client computing device.
The method comprises receiving, from a client computing device, a pull request for a requested container image. The pull request comprises a specification of cached container layers that are stored in a local container layer store of the client computing device. The method further comprises determining, based on the specification of the cached container layers, a set of one or more of the cached container layers that can be reused to provide the container image at the client computing device, thereby generating a set of reuse container layers. The method also comprises generating, based on the determination, an output container image comprising container layers, of the requested container image, that provide files not provided by the set of reuse container layers, wherein the output container image does not include files present in the set of reuse container layers. In addition, the method comprises transmitting the output container image to the client computing device.
In some illustrative embodiments, the output container image comprises fewer layers than a total number of the layers of the requested container image. In this way, the effective number of bytes that must be transmitted to the client computing device to implement the container image at the client computing device is minimized.
In some illustrative embodiments, the output container image comprises new layers of the output container image, different from layers of the requested container image, wherein the new layers comprise files not present in the cached container layers. Thus, the mechanisms of the illustrative embodiments are able to build and transmit a modified container image comprising container layers that may not be present in the stored container image.
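The request-handling flow summarized above can be sketched in Python under simplifying assumptions. The names `PullRequest`, `handle_pull`, `images`, and `known_layers` are hypothetical, layers are modeled as sets of file paths, and the reuse determination is replaced by a naive stand-in that treats every recognized cached layer overlapping the requested files as reusable. The sketch only illustrates how an output image can end up with fewer layers, and with trimmed layers that differ from the stored ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

Layer = Set[str]  # a layer modeled as the set of file paths it provides


@dataclass
class PullRequest:
    image: str                   # which image the client wants
    cached_layer_ids: List[str]  # layers already in the client's local store


def handle_pull(
    request: PullRequest,
    images: Dict[str, Dict[str, Layer]],  # image name -> {layer id -> files}
    known_layers: Dict[str, Layer],       # every layer id the server can resolve
) -> Dict[str, Layer]:
    """Return the layers to transmit, stripped of files the client can reuse."""
    image_layers = images[request.image]
    wanted: Set[str] = set().union(*image_layers.values())

    # Naive stand-in for the reuse determination (an assumption): any cached
    # layer the server recognizes that overlaps the wanted files is reusable.
    reuse_files: Set[str] = set()
    for layer_id in request.cached_layer_ids:
        files = known_layers.get(layer_id, set())
        reuse_files |= files & wanted

    # Output image: per layer, only the files not provided by reuse layers.
    output: Dict[str, Layer] = {}
    for layer_id, files in image_layers.items():
        missing = files - reuse_files
        if missing:  # a fully covered layer is not transmitted at all
            output[layer_id] = missing
    return output
```

For an image `web:1` with a `base` layer (`/bin/sh`, `/lib/libc.so`) and an `app` layer (`/app/main.py`, `/app/util.py`), a client that reports cached layers `base` and `old-app` (which holds `/app/util.py`) would receive a single trimmed `app` layer containing only `/app/main.py`.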
In some illustrative embodiments, generating the output container image comprises executing a layer ordering operation on container layers of the output container image to order them according to a hierarchical ordering of the container layers of the output container image. In some illustrative embodiments, this layer ordering operation comprises a topology sorting algorithm applied to a graph representation of the container layers of the output container image. In this way, dependencies between container layers, due to the container layer architecture, may be considered to ensure the correct combination of files is present in the implemented container image at the client computing device.
In some illustrative embodiments, generating the output container image comprises executing a residual correction on the container images of the output container image to add container layers from a container layer repository that have files of the requested container image that are not included in the container layers of the output container image, and are not in the set of reuse container layers, as one or more additional files in the output container image. In this way, files that may not be present in container layers of a container layer repository