US-12619751-B1 - Reconstructive introspection of application flows

Abstract

Applications using complex interactions among networked services may have flows reconstructed through analysis of trace logs of the services. Logs collected independently and absent knowledge of other logs or flows may be parsed by an adaptive parser to identify service events, which may be temporally organized using timestamps and performance metrics collected during previous usage of the services. Analyzing these temporally organized events may produce a list of independent candidate events that may be associated with an application flow. These candidate events may then be sorted based on key identifiers in the events according to previously established application facts, with identifiers of events correlated to reconstruct the application flow. The reconstructed flow may then be stored, visualized, replayed or provided as input to a topology generator for further introspection.
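
For illustration only, the steps summarized in the abstract (temporal ordering, categorization by key identifier, and correlation of differing identifiers using separately established facts) might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Event` record, the `reconstruct_flow` function, the `known_links` set standing in for "previously established application facts", and the `seed_id` parameter are all hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One parsed log entry (hypothetical shape, not taken from the patent)."""
    service: str
    timestamp: datetime
    identifiers: frozenset


def reconstruct_flow(events, known_links, seed_id):
    """Return the time-ordered events of the flow that contains seed_id.

    known_links is an assumed set of (id_a, id_b) pairs, obtained
    independently of the logs, stating that two differing identifiers
    belong to the same application flow.
    """
    # Step 1: temporally organize the independently collected events.
    ordered = sorted(events, key=lambda e: e.timestamp)

    # Step 2: categorize events by each identifier they carry.
    by_id = defaultdict(list)
    for event in ordered:
        for ident in event.identifiers:
            by_id[ident].append(event)

    # Step 3: follow the known links to find all identifiers of the flow.
    matched = {seed_id}
    frontier = [seed_id]
    while frontier:
        current = frontier.pop()
        for a, b in known_links:
            other = b if a == current else a if b == current else None
            if other is not None and other not in matched:
                matched.add(other)
                frontier.append(other)

    # Step 4: collect the events for the matched identifiers, in time order.
    flow = {event for ident in matched for event in by_id.get(ident, [])}
    return sorted(flow, key=lambda e: e.timestamp)
```

In this sketch the two services never share an identifier in their logs; the events are joined only because `known_links` says that the two identifiers refer to the same flow.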

Inventors

Matthew Lee Muncy

Assignees

AMAZON TECHNOLOGIES, INC.

Dates

Publication Date: 20260505
Application Date: 20221216

Claims (18)

1 . A system, comprising: one or more processors and a memory storing program instructions that, when executed on the one or more processors, implement a trace collector configured to reconstruct an application flow, wherein to reconstruct the application flow the trace collector is configured to: analyze, to identify a plurality of events, respective independently generated event logs of a plurality of interconnected services including a first service and a second service, wherein individual ones of the plurality of events respectively comprise a timestamp and a plurality of identifiers defining a context of the respective event, wherein a first identifier defining a context of a first event from the first service is different from a second identifier defining a context of a second event from the second service, and wherein the respective ones of the trace logs individually comprise: timestamps for respective ones of a plurality of entries; identifiers of the respective interconnected services collecting the respective event logs; and identifiers of one or more artifacts associated with a context for respective ones of a plurality of entries; sort individual ones of the plurality of events according respective timestamps to identify candidate events of the application flow; categorize individual ones of the candidate events according to key identifiers including the first identifier and second identifier; identify the first identifier as matching the second identifier based at least in part on the categorizing and on additional information for the application flow, the additional information obtained independent of the respective event logs and different from the respective timestamps and respective pluralities of identifiers of the individual event logs; and report, to a client, the reconstructed application flow comprising the first event and the second event.
2 . The system of claim 1 , wherein at least one log of the respective event logs is formatted in an unknown format, and wherein analyzing the at least one log is performed using inferences derived from the at least one log according to a machine learning model.
3 . The system of claim 1 , wherein the trace collector is further configured to: reconstruct an application topology according to the matched first identifier and second identifier; and report the reconstructed application topology to the client.
4 . The system of claim 1 , wherein the plurality of interconnected services is provided by a service provider network, and wherein reconstructing the application flow is performed by an analysis service of the provider network.
5 . A method, comprising: analyzing, to identify a plurality of events, respective independently generated event logs of a plurality of interconnected services including a first service and a second service, wherein individual ones of the plurality of events respectively comprise a timestamp and a plurality of identifiers defining a context of the respective event, wherein a first identifier defining a context of a first event from the first service is different from a second identifier defining a context of a second event from the second service; sorting individual ones of the plurality of events according respective timestamps to identify candidate events of the application flow; categorizing individual ones of the candidate events according to key identifiers including the first identifier and second identifier; identifying the first identifier as matching the second identifier based at least in part on the categorizing and on additional information for the application flow, the additional information obtained independent of the respective event logs and different from the respective timestamps and respective pluralities of identifiers of the individual event logs; and reporting, to a client, a reconstructed application flow comprising the first event and the second event.
6 . The method of claim 5 , wherein at least one log of the respective trace logs is formatted in an unknown format, and wherein analyzing the at least one log is performed using inferences derived from the at least one log according to a machine learning model.
7 . The method of claim 5 , wherein individual ones of the plurality of interconnected services are communicatively coupled to other ones of the plurality of interconnected services via respective network connections.
8 . The method of claim 5 , wherein individual ones of the plurality of interconnected services are communicatively coupled to other ones of the plurality of interconnected services via shared files.
9 . The method of claim 5 , further comprising collecting, by individual ones of the plurality of interconnected services, respective ones of the trace logs, wherein the respective ones of the trace logs individually comprise: timestamps for respective ones of a plurality of entries; identifiers of the respective interconnected services collecting the respective trace logs; and identifiers of one or more artifacts associated with a context for respective ones of a plurality of entries.
10 . The method of claim 5 , wherein sorting individual ones of the plurality of events according respective timestamps is based at least in part on performance metrics of the respective interconnected services.
11 . The method of claim 5 , further comprising: reconstructing an application topology according to the matched first identifier and second identifier; and reporting the reconstructed application topology to the client.
12 . The method of claim 5 , wherein the plurality of interconnected services is provided by a service provider network, and wherein the method is performed by an analysis service of the service provider network.
13 . One or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more computing devices cause the one or more computing devices to reconstruct a process flow, comprising: analyzing, to identify a plurality of log entries, respective independently generated event logs of a plurality of interconnected subsystems including a first subsystem and a second subsystem, wherein individual ones of the plurality of log entries respectively comprise a timestamp and a plurality of identifiers defining a context of the respective log entry, wherein a first identifier defining a context of a first log entry from the first subsystem is different from a second identifier defining a context of a second log entry from the second subsystem; sorting individual ones of the plurality of log entries according respective timestamps to identify candidate log entries of the process flow; categorizing individual ones of the candidate log entries according to key identifiers including the first identifier and second identifier; identifying the first identifier as matching the second identifier based at least in part on the categorizing and on additional information for the application flow, the additional information obtained independent of the respective event logs and different from the respective timestamps and respective pluralities of identifiers of the individual event logs; and reporting, to a client, the reconstructed process flow comprising the first log entry and the second log entry.
14 . The one or more non-transitory computer-accessible storage media of claim 13 , wherein at least one log of the respective event logs is formatted in an unknown format, and wherein analyzing the at least one log is performed using inferences derived from the at least one log according to a machine learning model.
15 . The one or more non-transitory computer-accessible storage media of claim 13 , wherein reconstructing the process flow further comprises collecting, by individual ones of the plurality of interconnected subsystems, respective ones of the event logs, wherein the respective ones of the event logs individually comprise: timestamps for respective ones of a plurality of entries; identifiers of the respective interconnected subsystems collecting the respective event logs; and identifiers of one or more artifacts associated with a context for respective ones of a plurality of entries.
16 . The one or more non-transitory computer-accessible storage media of claim 13 , wherein sorting individual ones of the plurality of log entries according respective timestamps is based at least in part on performance metrics of the respective interconnected subsystems.
17 . The one or more non-transitory computer-accessible storage media of claim 13 , wherein reconstructing the process flow further comprises: reconstructing a process topology according to the matched first identifier and second identifier; and reporting the reconstructed process topology to the client.
18 . The one or more non-transitory computer-accessible storage media of claim 13 , wherein the plurality of interconnected subsystems is provided by a service provider network, and wherein reconstructing the process flow is performed by an analysis service of the service provider network.
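
As a purely illustrative aid, two of the dependent-claim features above (timestamp sorting informed by performance metrics, and topology reconstruction) might be sketched as follows. Everything here is an assumption made for the example: the claims do not say how performance metrics influence the sort or how topology edges are derived. The sketch assumes the metric is a typical per-service logging delay subtracted from each timestamp, and that consecutive events from different services imply a directed edge. The names `order_events`, `reconstruct_topology`, and `logging_delay` are hypothetical.

```python
from collections import Counter


def order_events(events, logging_delay):
    """Sort (timestamp, service) pairs after a per-service correction.

    logging_delay maps a service name to an assumed typical delay, in
    seconds, between an action occurring and its log entry being written.
    """
    adjusted = [(ts - logging_delay.get(svc, 0.0), svc) for ts, svc in events]
    return sorted(adjusted)


def reconstruct_topology(ordered_events):
    """Count directed service-to-service edges along one ordered flow."""
    edges = Counter()
    for (_, src), (_, dst) in zip(ordered_events, ordered_events[1:]):
        if src != dst:  # consecutive events on the same service add no edge
            edges[(src, dst)] += 1
    return dict(edges)
```

With raw timestamps alone, a service that logs late could appear after a service it actually called; the correction is one hedged way such previously collected metrics could change the inferred order and therefore the inferred edges.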

Description

BACKGROUND

Applications leveraging large scale network-based services, along with the systems that provide those applications and services, may evolve to incorporate complex interactions of a scale beyond straightforward analysis and understanding. As a result, optimizing, configuring and maintaining such systems may become problematic as introspection of these interactions becomes increasingly challenging with scale. Furthermore, as a provider network may provide hundreds of such services, often with countless interdependencies, application flows may become hopelessly opaque. Unfortunately, the same scaling issues that lead to such introspection challenges also present massive challenges to instrumentation and debugging, as adding even the most basic tracing information for applications may be prohibitive.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a provider network that may implement an application flow analysis service, according to some embodiments.
FIG. 2 illustrates example interactions implementing reconstruction of an application flow and topology, according to some embodiments.
FIG. 3 illustrates example interactions implementing simulation of a reconstructed application flow, according to some embodiments.
FIG. 4 is a flow diagram illustrating reconstruction of an application flow and topology, according to some embodiments.
FIG. 5 is a block diagram illustrating an example computer system, according to various embodiments.

While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the embodiments are not limited to the embodiments or drawings described.
It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). The words “include,” “including,” and “includes” indicate open-ended relationships and therefore mean including, but not limited to. Similarly, the words “have,” “having,” and “has” also indicate open-ended relationships, and thus mean having, but not limited to. The terms “first,” “second,” “third,” and so forth as used herein are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unless such an ordering is otherwise explicitly indicated.

“Based On.” As used herein, this term is used to describe one or more factors that affect a determination. This term does not foreclose additional factors that may affect a determination. That is, a determination may be solely based on those factors or based, at least in part, on those factors. Consider the phrase “determine A based on B.” While B may be a factor that affects the determination of A, such a phrase does not foreclose the determination of A from also being based on C. In other instances, A may be determined based solely on B.

The scope of the present disclosure includes any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof, whether or not it mitigates any or all of the problems addressed herein.
Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the appended claims.

DETAILED DESCRIPTION

Cloud-based applications, such as critical business applications, have evolved to leverage large scale network-based services. These applications, along with the systems that provide them as well as the services they use, develop increasingly complex interactions of a scale beyond straightforward analysis. As a result, developing, debugging, optimizing, configuring and maintaining such systems may become problematic as introspection of these complex interactions becomes increasingly challenging at scale. Incorporated in this scale, a provider network may provide hundreds of such services, often with countless interdependencies. As a result, application flows may become hopelessly opaque. At the same time, these scaling issues that lead to such introspection challenges also present challenges to instrumentation and debugging, as adding eve