EP-4738141-A1 - SYSTEM AND METHOD FOR HORIZONTAL SCALING OF MULTI-TENANT WEB APPLICATIONS
Abstract
In some implementations, the techniques described herein relate to a method including: determining whether a distributed lock is available for an operation on a shared resource via a coordination server managing locks across a plurality of application servers in a multi-tenant environment; acquiring the distributed lock when available by generating a fencing token associated with the distributed lock, the fencing token being a monotonically increasing value managed by the coordination server; verifying the fencing token by a given application server prior to performing the operation on the shared resource; invalidating cached data across application servers after performing the operation, wherein cache invalidation is triggered by the given application server that performs the operation; and releasing the distributed lock to the coordination server after the operation, allowing other application servers to acquire the distributed lock.
Inventors
- THOMAS, Melbin
- RELHAN, Suhina
- CHAIL, Rishi
- RAPURU, Amarendra
- KAR, Chandratap
- BORNHOEVD, Christof
Assignees
- Workday, Inc.
Dates
- Publication Date: 2026-05-06
- Application Date: 2025-10-27
Claims (10)
- A method comprising: determining whether a distributed lock is available for an operation on a shared resource via a coordination server managing locks across a plurality of application servers in a multi-tenant environment; acquiring the distributed lock when available by generating a fencing token associated with the distributed lock, the fencing token being a monotonically increasing value managed by the coordination server; verifying the fencing token by a given application server prior to performing the operation on the shared resource; invalidating cached data across application servers after performing the operation, the cache invalidation being triggered by the given application server that performs the operation; and releasing the distributed lock to the coordination server after the operation, allowing other application servers to acquire the distributed lock.
- The method of claim 1, further comprising: detecting a data change in one application server; generating a cache invalidation event associated with the data change; publishing the cache invalidation event to a distributed messaging system; and consuming the cache invalidation event on multiple application servers to invalidate cached data.
- The method of claim 1 or 2, further comprising: receiving a request to cancel a long-running operation on an application server; identifying the operation associated with the request; publishing a cancellation event to a distributed messaging system; consuming the cancellation event at the application server; and interrupting the long-running operation.
- The method of any one of the preceding claims, further comprising: detecting a failure of an application server; identifying affected user sessions on the application server; retrieving session data from an external session store; and restoring the affected user sessions on an available application server.
- The method of any one of the preceding claims, wherein the coordination server implements a hierarchical locking mechanism, wherein locks are applied at different levels of data objects, including individual records, tables, and tenant-specific resources, within the multi-tenant environment.
- The method of any one of the preceding claims, further comprising assigning tenants to specific application servers dynamically, based on a current load and predefined service level agreements, wherein a load balancer adjusts a distribution of tenants in response to changes in resource availability and server performance.
- The method of any one of the preceding claims, wherein the fencing token is generated as a monotonically increasing value, wherein the given application server holding the distributed lock presents the fencing token before modifying the shared resource, and a modification is executed only if the fencing token corresponds to a most recent token issued for that lock.
- A computer program that, when executed by one or more processors, causes the one or more processors to carry out the method of any one of claims 1 to 7.
- A computer readable medium storing the computer program of claim 8.
- A system comprising: a processor; and a storage medium storing thereon program logic for execution by the processor, the program logic, when executed by the processor, to cause the processor to carry out the method of any one of claims 1 to 7.
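The hierarchical locking recited in the claims above (locks at the level of individual records, tables, and tenant-specific resources) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the claimed implementation: the names `HierarchicalLockTable`, `conflicts`, and the tuple-based lock paths are hypothetical, locks are exclusive only, and everything runs in a single process rather than on a coordination server.

```python
from typing import Dict, Tuple

# Hypothetical lock path: (tenant,), (tenant, table), or (tenant, table, record).
Path = Tuple[str, ...]


def conflicts(a: Path, b: Path) -> bool:
    """Two paths conflict when one equals, contains, or is contained by the other."""
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


class HierarchicalLockTable:
    """Assumed in-memory table of exclusive locks keyed by hierarchical path."""

    def __init__(self) -> None:
        self._held: Dict[Path, str] = {}  # lock path -> owner identifier

    def try_lock(self, path: Path, owner: str) -> bool:
        # A lock on a table blocks locks on its records and on its tenant,
        # and vice versa. This sketch is not reentrant: an owner is blocked
        # by its own ancestor or descendant locks as well.
        if any(conflicts(path, held) for held in self._held):
            return False
        self._held[path] = owner
        return True

    def unlock(self, path: Path, owner: str) -> None:
        # Only the recorded owner may release a lock.
        if self._held.get(path) == owner:
            del self._held[path]
```

Under these assumptions, two servers can hold locks on different records of the same table at the same time, while a request for the whole table is refused until both record locks are released.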
Description
BACKGROUND
Multi-tenant web applications allow multiple customers or tenants to share computing resources while maintaining logical separation of data and functionality. As the number of tenants and users grows, these applications face challenges in scaling to meet increased demand. Horizontal scaling, which involves adding more server instances to distribute load, is a common approach for improving capacity and performance. However, implementing horizontal scaling in multi-tenant environments presents unique technical challenges related to load balancing, session management, caching, and maintaining data consistency across distributed systems.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 is a block diagram illustrating a horizontally scaled multi-tenant web application architecture.
FIG. 2 is a block diagram illustrating a distributed locking system.
FIG. 3 is a flow diagram illustrating a method for implementing fencing tokens in a distributed locking system.
FIG. 4 is a block diagram illustrating a load balancing and session affinity system.
FIG. 5 is a flow diagram illustrating a method for cache invalidation across multiple application servers.
FIG. 6 is a block diagram illustrating a centralized root dashboard system.
FIG. 7 is a flow diagram illustrating a method for canceling long-running operations across distributed application servers.
FIG. 8 is a block diagram illustrating a distributed job scheduling system.
FIG. 9 is a flow diagram illustrating a method for end-user session recovery in a horizontally scaled environment.
FIG. 10 is a block diagram illustrating an architecture for merging application server endpoints.
FIG. 11 is a block diagram illustrating a computing device.
DETAILED DESCRIPTION
Given the challenges described above, it has been recognized that effective solutions must address them while optimizing resource utilization and preserving application functionality.
To meet these ends, in some implementations, the techniques described herein relate to a method including: determining whether a distributed lock is available for an operation on a shared resource via a coordination server managing locks across a plurality of application servers in a multi-tenant environment; acquiring the distributed lock when available by generating a fencing token associated with the distributed lock, the fencing token being a monotonically increasing value managed by the coordination server; verifying the fencing token by a given application server prior to performing the operation on the shared resource; invalidating cached data across application servers after performing the operation, wherein cache invalidation is triggered by the given application server that performs the operation; and releasing the distributed lock to the coordination server after the operation, allowing other application servers to acquire the distributed lock.
In some implementations, the techniques described herein relate to a method, further including: detecting a data change in one application server; generating a cache invalidation event associated with the data change; publishing the cache invalidation event to a distributed messaging system; and consuming the cache invalidation event on multiple application servers to invalidate cached data.
In some implementations, the techniques described herein relate to a method, further including: receiving a request to cancel a long-running operation on an application server; identifying the operation associated with the request; publishing a cancellation event to a distributed messaging system; consuming the cancellation event at the application server; and interrupting the long-running operation.
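The lock, fencing-token, and cache-invalidation steps described above can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the implementation of the described system: `CoordinationServer`, `InvalidationBus`, `AppServer`, and all method names are hypothetical, the "distributed messaging system" is replaced by direct in-process callbacks, and lock leases never expire (in a real deployment, lease expiry is typically what makes a previously issued token stale).

```python
import itertools
import threading
from typing import Any, Callable, Dict, List, Optional


class CoordinationServer:
    """Hypothetical in-memory stand-in for the coordination server."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._counter = itertools.count(1)  # monotonically increasing tokens
        self._holder: Dict[str, int] = {}   # lock name -> token of current holder
        self._latest: Dict[str, int] = {}   # lock name -> most recent token issued

    def try_acquire(self, lock_name: str) -> Optional[int]:
        """Return a new fencing token, or None when the lock is already held."""
        with self._mutex:
            if lock_name in self._holder:
                return None
            token = next(self._counter)
            self._holder[lock_name] = token
            self._latest[lock_name] = token
            return token

    def is_current(self, lock_name: str, token: int) -> bool:
        """True only for the most recent token issued for this lock."""
        with self._mutex:
            return self._latest.get(lock_name) == token

    def release(self, lock_name: str, token: int) -> None:
        """Release the lock, but only on behalf of its current holder."""
        with self._mutex:
            if self._holder.get(lock_name) == token:
                del self._holder[lock_name]


class InvalidationBus:
    """Assumed in-process substitute for a distributed messaging system."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)

    def publish(self, key: str) -> None:
        for handler in list(self._handlers):
            handler(key)


class AppServer:
    """Hypothetical application server with a local cache over a shared store."""

    def __init__(self, coordinator: CoordinationServer, bus: InvalidationBus,
                 store: Dict[str, Any]) -> None:
        self.coordinator, self.bus, self.store = coordinator, bus, store
        self.cache: Dict[str, Any] = {}
        bus.subscribe(lambda key: self.cache.pop(key, None))

    def read(self, key: str) -> Any:
        if key not in self.cache:
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def write(self, key: str, value: Any) -> bool:
        lock_name = f"lock:{key}"
        token = self.coordinator.try_acquire(lock_name)
        if token is None:
            return False  # lock not available
        try:
            # Verify the fencing token before touching the shared resource.
            if not self.coordinator.is_current(lock_name, token):
                return False
            self.store[key] = value
            self.bus.publish(key)  # writer triggers invalidation on all servers
            return True
        finally:
            self.coordinator.release(lock_name, token)
```

With two `AppServer` instances sharing one store, a successful `write` on the first evicts the key from the second's cache, so its next `read` reloads the new value. A token obtained before a later acquisition of the same lock no longer passes `is_current`, which mirrors the "most recent token" condition in the claims.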
In some implementations, the techniques described herein relate to a method, further including: detecting a failure of an application server; identifying affected user sessions on the application server; retrieving session data from an external session store; and restoring the affected user sessions on an available application server.
In some implementations, the techniques described herein relate to a method, wherein the coordination server implements a hierarchical locking mechanism, wherein locks are applied at different levels of data objects, including individual records, tables, and tenant-specific resources, within the multi-tenant environment.
In some implementations, the techniques described herein relate to a method, further including assigning tenants to specific application servers dynamically, based on a current load and predefined service level agreements, wherein a load balancer adjusts a distribution of tenants in response to changes in resource availability and server performance.
In some implementations, the techniques described herein relate to a method, wherein the fencing token is generated as a monotonically increasing value, wherein the given application server holding the distributed lock presents the fencing token before modifying the shared resource, and a modification