US-12621241-B1 - Avoiding retry abuses in service-oriented architectures

Abstract

Methods, systems, and computer-readable storage media for a retry framework for executing retries by adding header(s) or additional data to an existing header to calls in a workflow and using a retry history table to record retries between services. In some examples, each call between services includes header(s) or additional data in a header to uniquely identify a workflow that the call belongs to and to uniquely identify a branch of the workflow. If a service is to retry a call, the service queries the retry history table to determine a number of times the call has been retried, if any. If the call has not been retried, or has been retried less than a threshold number of times, the service updates the retry history table and executes the retry. If the call has been retried at least the threshold number of times, the service returns an error.
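The decision described in the abstract can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the class name `RetryHistoryTable`, the `query` method, the `QueryResult` type, and the use of a lock to keep the check and the update together are all assumptions made for the example.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    retry: bool       # True: retry instruction; False: error
    retry_count: int  # count stored in the table after this query


class RetryHistoryTable:
    """In-memory stand-in for the retry history table (hypothetical API)."""

    def __init__(self, threshold: int, initial: int = 0) -> None:
        self._threshold = threshold
        self._initial = initial
        self._counts: dict[tuple[str, str], int] = {}
        self._lock = threading.Lock()

    def query(self, flow_id: str, branch_id: str) -> QueryResult:
        key = (flow_id, branch_id)
        with self._lock:
            # No record yet: insert one holding the initial value.
            count = self._counts.setdefault(key, self._initial)
            if count < self._threshold:
                # Below the threshold: record the retry and allow it.
                self._counts[key] = count + 1
                return QueryResult(retry=True, retry_count=count + 1)
            # At or above the threshold: report an error instead.
            return QueryResult(retry=False, retry_count=count)


table = RetryHistoryTable(threshold=2)
for _ in range(3):
    print(table.query("flow-1", "branch-a"))
```

With a threshold of 2, the first two queries for the same flow and branch yield a retry instruction and the third yields an error. A real deployment would presumably back the table with a shared store so that all services see the same counts.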

Inventors

  • Hui Li

Assignees

  • SAP SE

Dates

Publication Date
2026-05-05
Application Date
2024-11-07

Claims (20)

  1. A computer-implemented method for retrying requests between services in cloud computing systems, the method being executed by one or more processors and comprising: receiving, at a first service, a first request comprising first header data including a flow identifier for a workflow and a first branch identifier, wherein the first branch identifier uniquely identifies a first branch of a workflow corresponding to the flow identifier; transmitting, from the first service and to a second service, a second request comprising second header data including the flow identifier and a second branch identifier; determining that the second request from the first service to the second service has failed; querying, by the first service, a retry history table using a first query comprising the flow identifier and the first branch identifier; and generating a first query result responsive to the first query by: retrieving a first retry count from the retry history table, retrieving a threshold retry count, determining the first retry count is less than the threshold retry count so as to generate a first retry instruction, and incrementing the first retry count to provide a second retry count stored in the retry history table; transmitting the first query result comprising the first retry instruction; and retrying the second request from the first service to the second service so as to generate a retried second request in response to the first retry instruction in the first query result.
  2. The method of claim 1, wherein the querying by the first service the retry history further comprises: determining that the retry history table indicates absence of a record for the flow identifier and the first branch identifier, and in response: inserting a record for the first retry count into the retry history table indexed by the flow identifier and the first branch identifier; setting the first retry count for the record to an initial value; and wherein the first retry count of first query response is the initial value.
  3. The method of claim 1, further comprising: determining that the retried second request has failed; querying by the first service the retry history table using a second query comprising the flow identifier and the first branch identifier; generating a second query result responsive to the second query by: retrieving the second retry count from the retry history table, retrieving the threshold retry count, and determining the second retry count is not less than the threshold retry count so as to generate an error message; and transmitting the second query result comprising the error message.
  4. The method of claim 3, wherein the error is returned to one of a gateway and a third service.
  5. The method of claim 1, further comprising: transmitting, from the first service to a third service, a third request comprising third header data including the flow identifier and a third branch identifier, wherein the first branch identifier is the same as the second branch identifier and third branch identifier is different from both the first and second branch identifiers; determining that the third request from the first service to the third service has failed; querying by the first service the retry history table using a second query comprising the flow identifier and the third branch identifier; generating a second query result responsive to the second query by: retrieving a third retry count from the retry history table, retrieving a second threshold retry count, determining the third retry count is less than the second threshold retry count so as to generate a second retry instruction, and incrementing the third retry count to a fourth retry count stored in the retry history table; transmitting the second query response comprising the second retry instruction; and retrying the third request from the first service to the third service so as to generate a retried third request.
  6. The method of claim 5, further comprising: determining that the retried third request has failed; querying by the first service the retry history table using a third query comprising the flow identifier and the third branch identifier; generating a third query result responsive to the third query by: retrieving a fourth retry count from the retry history table, retrieving the second threshold retry count, and determining that the fourth retry count is not less than the second threshold retry count so as to generate an error message; and transmitting the third query response to the first service.
  7. The method of claim 1, wherein the first request is transmitted from a third service to the first service.
  8. The method of claim 7, wherein determining the second service has failed is based on exceeding a first timeout condition of the first service, the third service having a second timeout condition different from the first timeout condition.
  9. The method of claim 1, wherein the first request is transmitted from a gateway to the first service.
  10. The method of claim 9, further comprising receiving a response from the first service to the gateway wherein the gateway instructs the retry history table to erase any entries with the flow identifier.
  11. The method of claim 1, wherein the flow identifier is generated by a gateway.
  12. A non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for retrying requests between services in cloud computing systems, the operations comprising: receiving, at a first service, a first request comprising first header data including a flow identifier for a workflow and a first branch identifier, wherein the first branch identifier uniquely identifies a first branch of a workflow corresponding to the flow identifier; transmitting, from the first service and to a second service, a second request comprising second header data including the flow identifier and a second branch identifier; determining that the second request from the first service to the second service has failed; querying, by the first service, a retry history table using a first query comprising the flow identifier and the first branch identifier; and generating a first query result responsive to the first query by: retrieving a first retry count from the retry history table, retrieving a threshold retry count, determining the first retry count is less than the threshold retry count so as to generate a first retry instruction, incrementing the first retry count to provide a second retry count stored in the retry history table; transmitting the first query result comprising the first retry instruction; and retrying the second request from the first service to the second service so as to generate a retried second request in response to the first retry instruction in the first query result.
  13. The non-transitory computer-readable storage medium of claim 12, wherein the querying by the first service the retry history further comprises: determining that the retry history table indicates absence of a record for the flow identifier and the first branch identifier, and in response: inserting a record for the first retry count into the retry history table indexed by the flow identifier and the first branch identifier; setting the first retry count for the record to an initial value; and wherein the first retry count of first query response is the initial value.
  14. The non-transitory computer-readable storage medium of claim 12, wherein operations further comprise: determining that the retried second request has failed; querying by the first service the retry history table using a second query comprising the flow identifier and the first branch identifier; generating a second query result responsive to the second query by: retrieving the second retry count from the retry history table, retrieving the threshold retry count, and determining the second retry count is not less than the threshold retry count so as to generate an error message; and transmitting the second query result comprising the error message.
  15. The non-transitory computer-readable storage medium of claim 14, wherein the error is returned to one of a gateway and a third service.
  16. The non-transitory computer-readable storage medium of claim 12, wherein operations further comprise: transmitting, from the first service to a third service, a third request comprising third header data including the flow identifier and a third branch identifier, wherein the first branch identifier is the same as the second branch identifier and third branch identifier is different from both the first and second branch identifiers; determining that the third request from the first service to the third service has failed; querying by the first service the retry history table using a second query comprising the flow identifier and the third branch identifier; generating a second query result responsive to the second query by: retrieving a third retry count from the retry history table, retrieving a second threshold retry count, determining the third retry count is less than the second threshold retry count so as to generate a second retry instruction, and incrementing the third retry count to a fourth retry count stored in the retry history table; transmitting the second query response comprising the second retry instruction; and retrying the third request from the first service to the third service so as to generate a retried third request.
  17. A system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations for retrying requests between services in cloud computing systems, the operations comprising: receiving, at a first service, a first request comprising first header data including a flow identifier for a workflow and a first branch identifier, wherein the first branch identifier uniquely identifies a first branch of a workflow corresponding to the flow identifier; transmitting, from the first service and to a second service, a second request comprising second header data including the flow identifier and a second branch identifier; determining that the second request from the first service to the second service has failed; querying, by the first service, a retry history table using a first query comprising the flow identifier and the first branch identifier; and generating a first query result responsive to the first query by: retrieving a first retry count from the retry history table, retrieving a threshold retry count, determining the first retry count is less than the threshold retry count so as to generate a first retry instruction, and incrementing the first retry count to provide a second retry count stored in the retry history table; transmitting the first query result comprising the first retry instruction; and retrying the second request from the first service to the second service so as to generate a retried second request in response to the first retry instruction in the first query result.
  18. The system of claim 17, wherein the querying by the first service the retry history further comprises: determining that the retry history table indicates absence of a record for the flow identifier and the first branch identifier, and in response: inserting a record for the first retry count into the retry history table indexed by the flow identifier and the first branch identifier; setting the first retry count for the record to an initial value; and wherein the first retry count of first query response is the initial value.
  19. The system of claim 17, wherein operations further comprise: determining that the retried second request has failed; querying by the first service the retry history table using a second query comprising the flow identifier and the first branch identifier; generating a second query result responsive to the second query by: retrieving the second retry count from the retry history table, retrieving the threshold retry count, and determining the second retry count is not less than the threshold retry count so as to generate an error message; and transmitting the second query result comprising the error message.
  20. The system of claim 19, wherein the error is returned to one of a gateway and a third service.
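The identifier handling recited in the claims (a gateway-generated flow identifier, a branch identifier that either carries over or changes when a service forks a new branch, and gateway cleanup by flow identifier) can be illustrated with a short sketch. The header names `X-Flow-Id` and `X-Branch-Id`, the dotted branch-naming scheme, and all function names are hypothetical; the claims only require that the identifiers travel in header data and that branch identifiers be unique within a flow.

```python
import uuid

# Hypothetical header names; the claims only say "header data".
FLOW_HEADER = "X-Flow-Id"
BRANCH_HEADER = "X-Branch-Id"


def gateway_headers() -> dict[str, str]:
    """Gateway side: mint a flow identifier and a root branch identifier."""
    return {FLOW_HEADER: uuid.uuid4().hex, BRANCH_HEADER: "0"}


def same_branch_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Continue the current branch: both identifiers pass through unchanged."""
    return {FLOW_HEADER: incoming[FLOW_HEADER], BRANCH_HEADER: incoming[BRANCH_HEADER]}


def new_branch_headers(incoming: dict[str, str], index: int) -> dict[str, str]:
    """Fork a branch: keep the flow identifier, derive a distinct branch identifier."""
    branch = f"{incoming[BRANCH_HEADER]}.{index}"  # assumed naming scheme
    return {FLOW_HEADER: incoming[FLOW_HEADER], BRANCH_HEADER: branch}


def erase_flow(counts: dict[tuple[str, str], int], flow_id: str) -> None:
    """Gateway cleanup: drop every retry record that belongs to the flow."""
    for key in [k for k in counts if k[0] == flow_id]:
        del counts[key]


first = gateway_headers()
second = same_branch_headers(first)    # call to the second service
third = new_branch_headers(first, 1)   # call to the third service
counts = {
    (first[FLOW_HEADER], second[BRANCH_HEADER]): 2,
    (first[FLOW_HEADER], third[BRANCH_HEADER]): 1,
    ("other-flow", "0"): 1,
}
erase_flow(counts, first[FLOW_HEADER])
print(counts)  # only the record for "other-flow" remains
```

Because retry counts are keyed by the pair of flow identifier and branch identifier, retries on the third branch do not consume the retry budget of the first branch.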

Description

BACKGROUND
Cloud computing can be described as Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. Users can establish respective sessions, during which processing resources and bandwidth are consumed. During a session, for example, a user is provided on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications, and services). The computing resources can be provisioned and released (e.g., scaled) to meet user demand.
In cloud-based environments, applications can be provisioned using services, also referred to as microservices, which have gained popularity in service-oriented architectures (SOAs). In SOAs, applications are composed of multiple, independent services, and are deployed in standalone containers with a well-defined interface. The services are deployed and managed by a cloud platform and execute on top of a cloud infrastructure.
In such a services environment, messages or requests and responses are exchanged among the various services. If one or more services have difficulty providing a response in a reasonable amount of time, a retry request can be sent. However, in a system with dozens or hundreds of services, a small set of errors could trigger a disproportionate number of retry requests, which in turn decreases the overall efficiency of the computer system. In a multi-tenant system, delayed responses that trigger multiple retry requests from one tenant can overburden the system and lead some tenants to experience significant delays in response time.
SUMMARY
Implementations of the present disclosure are directed to retrying calls between services in cloud-based systems. More particularly, implementations of the present disclosure are directed to a retry framework for retrying calls between services in cloud-based systems.
In some implementations, actions include receiving, at a first service, a first request including first header data including a flow identifier for a workflow and a first branch identifier, wherein the first branch identifier uniquely identifies a first branch of a workflow corresponding to the flow identifier, transmitting, from the first service and to a second service, a second request including second header data including the flow identifier and a second branch identifier, determining that the second request from the first service to the second service has failed, querying, by the first service, a retry history table using a first query including the flow identifier and the first branch identifier, and generating a first query result responsive to the first query by retrieving a first retry count from the retry history table, retrieving a threshold retry count, determining the first retry count is less than the threshold retry count so as to generate a first retry instruction, and incrementing the first retry count to provide a second retry count stored in the retry history table, transmitting the first query result comprising the first retry instruction, and retrying the second request from the first service to the second service so as to generate a retried second request in response to the first retry instruction in the first query result. Other implementations of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices. 
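From the calling service's point of view, the actions above amount to a loop that consults the retry history before every retry. The sketch below treats the retry history table as an opaque callable; the names `call_with_retries`, `may_retry`, and `RetriesExhausted`, the lowercase header keys, and the use of `TimeoutError` to signal a failed call are assumptions made for illustration.

```python
from typing import Callable


class RetriesExhausted(Exception):
    """Raised when the retry history returns an error instead of a retry instruction."""


def call_with_retries(
    send: Callable[[dict[str, str]], str],
    may_retry: Callable[[str, str], bool],
    headers: dict[str, str],
) -> str:
    flow_id, branch_id = headers["flow-id"], headers["branch-id"]
    while True:
        try:
            return send(headers)
        except TimeoutError:
            # The call failed: query the retry history before retrying.
            if not may_retry(flow_id, branch_id):
                raise RetriesExhausted(f"{flow_id}/{branch_id}") from None


attempts = []


def flaky_send(headers: dict[str, str]) -> str:
    """Stand-in for the second service: fails twice, then succeeds."""
    attempts.append(headers)
    if len(attempts) < 3:
        raise TimeoutError("second service timed out")
    return "ok"


budget = {"left": 2}


def may_retry(flow_id: str, branch_id: str) -> bool:
    """Stand-in for the retry history query with a threshold of two retries."""
    budget["left"] -= 1
    return budget["left"] >= 0


print(call_with_retries(flaky_send, may_retry, {"flow-id": "f1", "branch-id": "b1"}))
```

The service never keeps its own retry counter. The decision to retry comes entirely from the shared retry history, which is what lets the limit apply per workflow branch rather than per caller.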
These and other implementations can each optionally include one or more of the following features: querying by the first service the retry history further includes determining that the retry history table indicates absence of a record for the flow identifier and the first branch identifier, and in response, inserting a record for the first retry count into the retry history table indexed by the flow identifier and the first branch identifier, setting the first retry count for the record to an initial value, and the first retry count of first query response being the initial value; actions further include determining that the retried second request has failed, querying by the first service the retry history table using a second query comprising the flow identifier and the first branch identifier, generating a second query result responsive to the second query by retrieving the second retry count from the retry history table, retrieving the threshold retry count, and determining the second retry count is not less than the threshold retry count so as to generate an error message, and transmitting the second query result including the error message; the error is returned to one of a gateway and a third service; actions further include transmitting, from the first service to a third service, a third request including third header data including the flow identifier and a third branch identifier, wherein the first branch identifier is the same as the second branch identifier and third branch identifier is differen