DE-102025145247-A1 - SELF-OPTIMIZING CONTAINER IMAGE FILE SYSTEM
Abstract
A data platform is provided that builds a container image file system from a container image. The data platform runs an application that accesses the container image file system and records the directory and file access activity of the application during execution. Using this directory and file access activity, the data platform dynamically rebuilds the container image file system and mounts it as a newly rebuilt system during subsequent executions of the application.
Inventors
- David B. Bailley
- Benoit Dageville
- Egor Derevenetc
- Mihir Sathe
Assignees
- SNOWFLAKE INC.
Dates
- Publication Date: 20260513
- Application Date: 20251104
- Priority Date: 20250109
Claims (20)
- A machine-implemented method comprising: building a container image file system from a container image; executing an application that accesses the container image file system; recording directory and file access activity of the container image file system by the application during execution; dynamically reconstructing the container image file system using the directory and file access activity, the reconstructing of the container image file system resulting in a reconstructed container image file system; and mounting the reconstructed container image file system during a subsequent execution of the application.
- The machine-implemented method according to Claim 1, wherein the dynamic reconstructing of the container image file system includes creating a new version of a metadata binary large object (blob) and one or more content blobs comprising the container image file system, wherein the one or more content blobs contain files grouped based on the directory and file access activity.
- The machine-implemented method according to Claim 2, further including: partitioning two or more content blobs into a first set of content blobs that are expected to be used in their entirety during the startup of the application, and a second set of content blobs that are not expected to be used in their entirety during the startup of the application.
- The machine-implemented method according to Claim 2, further including storing the one or more content blobs in an account-specific object store that deduplicates content blobs of the one or more content blobs within an account associated with the object store.
- The machine-implemented method according to Claim 2, wherein the metadata blob is compressed and contains at least one of the following elements: directory hierarchy, file names, sizes, modification times, permissions, one or more unique identifiers, extended attributes, and link targets.
- The machine-implemented method according to Claim 2, wherein the one or more content blobs have a variable size and include a header followed by one or more file segments, the header providing an index to all segments in a respective content blob of the one or more content blobs.
- The machine-implemented method according to Claim 2, further including partitioning the one or more content blobs into categories based on usage patterns.
- The machine-implemented method according to Claim 2, further including preloading directory and file contents of the one or more content blobs into a kernel page cache based on the directory and file access activity.
- The machine-implemented method according to Claim 1, further including saving a newly built container image file system with an updated version number while a previous version of the container image file system remains active.
- A system comprising: at least one processor; and at least one memory that stores instructions which, when executed by the at least one processor, cause the system to perform operations that include: building a container image file system from a container image; executing an application that accesses the container image file system; recording the directory and file access activity on the container image file system by the application during execution; dynamically rebuilding the container image file system using the directory and file access activity, the rebuilding of the container image file system resulting in a rebuilt container image file system; and mounting the rebuilt container image file system during a subsequent execution of the application.
- The system according to Claim 10, wherein the dynamic rebuilding of the container image file system comprises creating a new version of a metadata blob and one or more content blobs comprising the container image file system, wherein the one or more content blobs contain files that are grouped based on the directory and file access activity.
- The system according to Claim 11, wherein the operations further include: splitting two or more content blobs into a first set of content blobs that are expected to be fully used during the startup of the application, and a second set of content blobs that are not expected to be fully used during the startup of the application.
- The system according to Claim 11, wherein the operations further include splitting large files into separate content blobs of the one or more content blobs aligned with historically accessed page regions.
- The system according to Claim 11, wherein the operations further include storing the one or more content blobs in an account-specific object store that deduplicates content blobs of the one or more content blobs within an account associated with the object store.
- The system according to Claim 11, wherein the metadata blob is compressed and contains at least one of the following elements: directory hierarchy, file names, sizes, modification times, permissions, one or more unique identifiers, extended attributes, and link targets.
- The system according to Claim 11, wherein the one or more content blobs have a variable size and include a header followed by one or more file segments, the header providing an index to all segments in a respective content blob of the one or more content blobs.
- The system according to Claim 11, wherein the operations further include preloading directory and file contents of the one or more content blobs into a kernel page cache based on the directory and file access activity.
- The system according to Claim 11, wherein the operations further include saving a newly built container image file system with an updated version number while a previous version of the container image file system remains active.
- A machine storage medium that stores instructions which, when executed by one or more processors of a system, cause the system to perform operations that include: building a container image file system from a container image; running an application that accesses the container image file system; recording directory and file access activity of the container image file system by the application during execution; dynamically rebuilding the container image file system using the directory and file access activity, the rebuilding of the container image file system resulting in a rebuilt container image file system; and mounting the rebuilt container image file system during a subsequent execution of the application.
- The machine storage medium according to Claim 19, wherein the dynamic rebuilding of the container image file system includes creating a new version of a metadata blob and one or more content blobs comprising the container image file system, wherein the one or more content blobs contain files that are grouped based on the directory and file access activity.
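Illustrative sketch (not part of the claims): the claims above recite grouping files into content blobs based on recorded access activity, separating blobs expected to be used during startup from the rest, and a content blob layout of a header that indexes the file segments that follow it. A minimal Python sketch of those ideas might look as follows. The function names `partition_by_access`, `pack_blob`, and `read_segment`, the JSON-encoded index, and the 4-byte length prefix are all assumptions made for illustration; the source does not specify an on-disk format or an API.

```python
import json
import struct
from typing import Dict, List, Tuple


def partition_by_access(files: Dict[str, bytes],
                        accessed: List[str]) -> Tuple[List[str], List[str]]:
    """Split paths into those seen in the access trace (first-access order)
    and the remaining paths (sorted). Hypothetical helper."""
    seen = set()
    startup: List[str] = []
    for path in accessed:
        if path in files and path not in seen:
            seen.add(path)
            startup.append(path)
    rest = sorted(p for p in files if p not in seen)
    return startup, rest


def pack_blob(files: Dict[str, bytes], paths: List[str]) -> bytes:
    """Build a variable-size blob: length-prefixed header, then file segments.
    The header is an index of (path, offset, length) for every segment.
    The JSON encoding and 4-byte prefix are assumptions, not the patent's format."""
    index = []
    body = bytearray()
    for path in paths:
        data = files[path]
        index.append({"path": path, "offset": len(body), "length": len(data)})
        body += data
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(header)) + header + bytes(body)


def read_segment(blob: bytes, path: str) -> bytes:
    """Locate one file segment through the header index, without a full scan."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    index = json.loads(blob[4:4 + header_len].decode("utf-8"))
    base = 4 + header_len
    for entry in index:
        if entry["path"] == path:
            start = base + entry["offset"]
            return blob[start:start + entry["length"]]
    raise KeyError(path)


# Example: a tiny image and a recorded access trace from one execution.
files = {"/bin/app": b"ELF", "/lib/libm.so": b"math", "/doc/README": b"docs"}
trace = ["/lib/libm.so", "/bin/app", "/lib/libm.so"]

startup, rest = partition_by_access(files, trace)
startup_blob = pack_blob(files, startup)  # expected to be read in full at startup
cold_blob = pack_blob(files, rest)        # fetched only on demand
```

Ordering the startup blob by first access is one possible design choice: it would let a reader fetch the blob sequentially in roughly the order the application needs the files. Other groupings would fit the claim language equally well.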
Description
PRIORITY CLAIM
This application claims priority to US provisional patent application serial no. 63/718,403, filed on November 8, 2024, the contents of which are incorporated herein by reference.
TECHNICAL FIELD
Examples of the disclosure generally relate to data platforms and, in particular, to methods for deploying and optimizing containerized applications.
BACKGROUND
Data platforms are frequently used for data storage and access in computing and communications contexts. In terms of architecture, a data platform can be a local data platform, a network-based data platform (e.g., a cloud-based data platform), a combination of both, and/or another type of architecture. Regarding the type of data processing, a data platform can implement online transaction processing (OLTP), online analytical processing (OLAP), a combination of both, and/or another type of data processing. Furthermore, a data platform can be or include a relational database management system (RDBMS) and/or one or more other types of database management systems. Cloud-based data platforms can exchange data between databases.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure will be better understood from the detailed description below and the accompanying drawings of various examples of the disclosure.
FIG. 1 shows an example computing environment that includes a network-based data platform communicating with a cloud storage provider user system, according to some examples.
FIG. 2 is a block diagram showing the components of a compute service manager, according to some examples.
FIG. 3 is a block diagram that illustrates the components of an execution platform, according to some examples.
FIG. 4 shows an optimized container image file system pipeline, according to some examples.
FIG. 5 illustrates an optimized container image file system computing environment, according to some examples.
FIG. 6 illustrates an optimized container image file system procedure, according to some examples.
FIG. 7 illustrates metadata and content blobs (“binary large objects”) of a container image file system, according to some examples.
FIG. 8 illustrates a sequence of operations of components of an optimized container image file system, according to some examples.
FIG. 9 illustrates a sequence of operations of components of an optimized container image file system, according to some examples.
FIG. 10 shows a schematic representation of a machine in the form of a computer system in which a set of instructions can be executed to cause the machine to perform one or more of the methods discussed herein, according to some examples.
DETAILED DESCRIPTION
Containerized applications are becoming increasingly popular for packaging and deploying software, particularly in the fields of AI and machine learning. However, with the growing complexity and size of these applications, a significant problem has emerged: slow startup times due to the need to download and unpack large container images before execution. This process can take several minutes, even on high-speed networks, resulting in a poor user experience and inefficient resource utilization. The problem is especially pronounced with AI and ML workloads, where image sizes often exceed 10 GB.
When an application packaged as a container image is to be run on a host, the traditional approach is to first download and unpack the entire image. Due to the sheer size of the artifacts and the time required for downloading and decompressing, this can add minutes to the application's startup time.
Existing solutions have attempted to address this problem, but they often lack a comprehensive and efficient approach. Some implementations rely on alternative representations of image contents or on over-processing. Others use network-mounted file systems, but these solutions may not fully optimize the startup process or may introduce additional complexities.
Furthermore, many of these solutions do not adequately address the need for continuous optimization based on actual application usage patterns, leaving room for improvement in reducing startup times for frequently run workloads.
Another challenge in the current landscape is the lack of efficient deduplication and storage optimization for container images within a single account or organization. This leads to unnecessary data duplication and increased storage costs, especially when multiple applications share common base layers or dependencies. In addition, existing solutions often lack adequate security measures, such as content encryption, which are beneficial for protecting sensitive data in containerized environments.
In some examples, a data platform, as described in this disclosure, optimizes container image file systems to improve application startup times and performance. The data platform builds a container image file system from a container image by creating a metadata binary large object (blob) that includes a directory hierarchy, file names, sizes, and attributes, while partitioning the file