US-12625754-B2 - Personalized service disruption notification

Abstract

Systems and methods are provided for generating personalized service disruption notifications. The system allocates resources of a database system to a plurality of entities, the resources of the database system being distributed in a cloud environment, and analyzes a plurality of signals on the database system. The system, in response to analyzing the plurality of signals, detects a likelihood of a service availability disruption on the database system for a first entity of the plurality of entities. The system notifies the first entity of the service availability disruption in response to detecting the likelihood of the service availability disruption.

Inventors

  • Samartha Chandrashekar
  • Kaushal Y. Jain
  • Carl Yates Perry
  • Lian Yu
  • Xiaojun Zhao

Assignees

  • SNOWFLAKE INC.

Dates

Publication Date
2026-05-12
Application Date
2023-12-06

Claims (20)

  1. A system comprising: at least one hardware processor; and at least one memory storing instructions that cause the at least one hardware processor to execute operations comprising: allocating resources of a database system to a plurality of entities, the resources of the database system being distributed in a cloud environment; analyzing a plurality of signals on the database system; in response to analyzing the plurality of signals, detecting a likelihood of a service availability disruption on the database system for a first entity of the plurality of entities; providing a graphical user interface (GUI) to the first entity that enables the first entity to define a first threshold for triggering notifications of service availability disruptions, the GUI enabling different entities of the plurality of entities to define respective thresholds used to control whether to transmit notifications about service disruptions; receiving, via the GUI, input from the first entity that defines the first threshold; comparing the likelihood of the service availability disruption to the first threshold; and based on determining that the likelihood, defined by the input received via the GUI from the first entity, transgresses the first threshold, notifying the first entity of the service availability disruption.
  2. The system of claim 1, the operations comprising: computing the likelihood of the service availability disruption based on the plurality of signals; and triggering notifying the first entity in response to determining that the likelihood transgresses a threshold value.
  3. The system of claim 1, wherein the service availability disruption affects the first entity while services continue to be made available to a second entity of the plurality of entities, the first entity being notified without notifying the second entity.
  4. The system of claim 1, wherein the resources of the database system are allocated to the first entity on a first portion of the cloud environment, and wherein the resources of the database system are allocated to a second entity on a second portion of the cloud environment.
  5. The system of claim 1, wherein the plurality of entities comprises third-party entities relative to a provider of the database system.
  6. The system of claim 1, the operations comprising: storing a job table that associates a first plurality of job features with the first entity and a second plurality of job features with a second entity; and deriving the plurality of signals from the job table.
  7. The system of claim 1, the operations comprising: receiving input from a first computing system associated with the first entity that defines the first threshold for triggering service availability disruptions; and receiving input from a second computing system associated with a second entity that defines a second threshold for triggering service availability disruptions.
  8. The system of claim 7, wherein the second entity is notified about a service availability disruption in response to determining that another likelihood of a service availability disruption for the second entity transgresses the second threshold.
  9. The system of claim 1, the operations comprising: providing, as part of notifying the first entity about the service availability disruption, an indicator in the GUI comprising information indicating an approximate time to restore service for the first entity on the database system.
  10. The system of claim 1, the operations comprising: processing the plurality of signals by a machine learning model to predict the likelihood of the service availability disruption on the database system, the machine learning model generating a prediction indicating when the service availability disruption will start.
  11. The system of claim 10, wherein the machine learning model generates outputs comprising one or more of: an identifier of a region associated with the service availability disruption, an identifier of an individual entity affected by the service availability disruption, an estimated start time of the service availability disruption, an estimated end time of the service availability disruption, a likelihood score for the service availability disruption, and indication of one or more signals of the plurality of signals that cause the service availability disruption.
  12. The system of claim 10, the operations comprising training the machine learning model by performing training operations comprising: accessing training data that associates training database signals associated with a plurality of training entities to which resources on the database system are allocated and corresponding likelihoods of service availability disruptions; processing, by the machine learning model, a first batch of the training database signals to generate an estimated likelihood of service availability disruption; computing a deviation between the estimated likelihood of the service availability disruption and the corresponding likelihood of the service availability disruption associated with the first batch of the training database signals; and updating parameters of the machine learning model based on the deviation.
  13. The system of claim 12, the operations comprising: generating the training database signals based on analyzing operations associated with the plurality of training entities on the database system; and computing the likelihoods of service availability disruptions for the training database signals based on application of Mahalanobis distance and principal component analysis (PCA) to the training database signals.
  14. The system of claim 1, wherein the plurality of signals comprises at least one of: a number of query related incidents within a first prior time interval, a duration for which queries were delayed waiting to be resumed within a second prior time interval, a queued job percentage in a third prior time interval, a valid queries success rate, a total number of valid queries submitted by an individual entity in the second prior time interval, the duration for which queries were delayed waiting to be resumed within a fourth prior time interval, an average duration of valid queries in a fifth prior time interval, a total number of failed queries in a sixth prior time interval, a number of queries waiting to be executed in a seventh prior time interval, a number of queries waiting to be executed in an eighth prior time interval, an average duration of valid queries in a ninth prior time interval, an average duration of valid queries in a tenth prior time interval, or a total number of query related incidents visible to one or more entities.
  15. The system of claim 1, the operations comprising: receiving, from a computing device of the first entity, a request from the first entity to access service availability disruptions; and in response to receiving the request, generating the graphical user interface comprising the likelihood of the service availability disruption for presentation on the computing device of the first entity.
  16. A method comprising: allocating, by one or more processors, resources of a database system to a plurality of entities, the resources of the database system being distributed in a cloud environment; analyzing a plurality of signals on the database system; in response to analyzing the plurality of signals, detecting a likelihood of a service availability disruption on the database system for a first entity of the plurality of entities; providing a graphical user interface (GUI) to the first entity that enables the first entity to define a first threshold for triggering notifications of service availability disruptions, the GUI enabling different entities of the plurality of entities to define respective thresholds used to control whether to transmit notifications about service disruptions; receiving, via the GUI, input from the first entity that defines the first threshold; comparing the likelihood of the service availability disruption to the first threshold; and based on determining that the likelihood, defined by the input received via the GUI from the first entity, transgresses the first threshold, notifying the first entity of the service availability disruption.
  17. The method of claim 16, comprising: computing the likelihood of the service availability disruption based on the plurality of signals; and triggering notifying the first entity in response to determining that the likelihood transgresses a threshold value.
  18. The method of claim 16, wherein the service availability disruption affects the first entity while services continue to be made available to a second entity of the plurality of entities, the first entity being notified without notifying the second entity.
  19. The method of claim 16, wherein the resources of the database system are allocated to the first entity on a first portion of the cloud environment, and wherein the resources of the database system are allocated to a second entity on a second portion of the cloud environment.
  20. The method of claim 16, wherein the plurality of entities comprises third-party entities relative to a provider of the database system, and wherein the graphical user interface comprises: a dashboard displaying availability related issues for the first entity, the dashboard including a time indicator specifying when data from the database system was last accessed to generate the plurality of signals; a custom alerts setup option that, when selected by the first entity, presents a threshold definition interface enabling the first entity to define the first threshold; and a detailed information region displaying predicted service availability disruptions.
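
The per-entity threshold comparison recited in claims 1, 7, and 8 can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `DisruptionNotifier`, its methods, the entity identifiers, the default threshold of 0.8, the assumption that likelihoods lie in [0, 1], and the reading of "transgresses" as "strictly exceeds" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DisruptionNotifier:
    """Keeps one notification threshold per entity (hypothetical helper)."""

    # Assumed fallback for entities that never defined a threshold;
    # the source does not specify a default value.
    default_threshold: float = 0.8
    thresholds: dict[str, float] = field(default_factory=dict)

    def set_threshold(self, entity_id: str, threshold: float) -> None:
        """Record the threshold an entity supplied (e.g. through a GUI form)."""
        # Assumption: likelihoods and thresholds are probabilities in [0, 1].
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be in [0, 1]")
        self.thresholds[entity_id] = threshold

    def entities_to_notify(self, likelihoods: dict[str, float]) -> list[str]:
        """Return the entities whose likelihood exceeds their own threshold."""
        notify = []
        for entity_id, likelihood in likelihoods.items():
            threshold = self.thresholds.get(entity_id, self.default_threshold)
            # Assumption: "transgresses" is read as "strictly greater than".
            if likelihood > threshold:
                notify.append(entity_id)
        return notify


notifier = DisruptionNotifier()
notifier.set_threshold("entity_a", 0.5)  # entity_a wants early warnings
notifier.set_threshold("entity_b", 0.9)  # entity_b only wants near-certain ones

# Both entities have the same predicted likelihood, but only entity_a's
# threshold is exceeded, so only entity_a would be notified.
to_notify = notifier.entities_to_notify({"entity_a": 0.7, "entity_b": 0.7})
print(to_notify)  # ['entity_a']
```

Keeping thresholds in a per-entity mapping means the same likelihood can trigger a notification for one entity and not another, which is the behavior claims 3 and 8 describe.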

Description

TECHNICAL FIELD

Examples of the disclosure relate generally to data platforms and databases and, more specifically, to predicting service disruptions on the databases.

BACKGROUND

Databases are widely used for data storage and access in computing applications. A goal of database storage is to provide enormous sums of information in an organized manner so that it can be accessed, managed, updated, and shared. In a database, data may be organized into rows, columns, and tables. Databases are used by various entities and companies for storing information that may need to be accessed or analyzed. Various operations performed on a database, such as joins and unions, involve combining query results obtained from different data sources (e.g., different tables, possibly on different databases) into a single query result. The accuracy and efficiency at which various operations can be performed are impacted by the schema associated with various rows/columns of the tables.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the disclosure.

FIG. 1 illustrates an example computing environment that includes a network-based data platform, in accordance with some examples of the present disclosure.

FIG. 2 is a block diagram illustrating components of a compute service manager, in accordance with some examples of the present disclosure.

FIG. 3 is a block diagram illustrating components of an execution platform, in accordance with some examples of the present disclosure.

FIG. 4 is a block diagram of a service disruption detection system, in accordance with some examples of the present disclosure.

FIG. 5 is an illustrative set of signals analyzed by the service disruption detection system, in accordance with some examples of the present disclosure.

FIG. 6 is an illustrative output of the service disruption detection system, in accordance with some examples of the present disclosure.

FIG. 7 is a flow diagram illustrating operations of the service disruption detection system, in accordance with some examples of the present disclosure.

FIG. 8 illustrates a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed for causing the machine to perform any one or more of the methodologies discussed herein, in accordance with some examples of the present disclosure.

DETAILED DESCRIPTION

Reference will now be made in detail to specific example embodiments for carrying out the inventive subject matter. Examples of these specific embodiments are illustrated in the accompanying drawings, and specific details are set forth in the following description in order to provide a thorough understanding of the subject matter. It will be understood that these examples are not intended to limit the scope of the claims to the illustrated embodiments. On the contrary, they are intended to cover such alternatives, modifications, and equivalents as may be included within the scope of the disclosure.

Data platforms are widely used for data storage and data access in computing and communication contexts. Concerning architecture, a data platform could be an on-premises data platform, a network-based data platform (e.g., a cloud-based data platform), a combination of the two, and/or include another type of architecture. With respect to type of data processing, a data platform could implement online transactional processing (OLTP), online analytical processing (OLAP), a combination of the two, and/or another type of data processing. Moreover, a data platform could be or include a relational database management system (RDBMS) and/or one or more other types of database management systems.

In a typical implementation, a data platform includes one or more databases that are maintained on behalf of a customer account. The data platform may include one or more databases that are respectively maintained in association with any number of customer accounts (e.g., entities), as well as one or more databases associated with a system account (e.g., an administrative account) of the data platform, one or more other databases used for administrative purposes, and/or one or more other databases that are maintained in association with one or more other organizations and/or for any other purposes. A data platform may also store metadata in association with the data platform in general and in association with, as examples, particular databases and/or particular customer accounts as well. The entities that are allocated services on the data platform may be third-parties relative to an entity that provides or hosts the data platform. Users and/or executing processes that are associated with a given customer account may, via one or more types of clients, be able to cause data to be ingested into the database, and may also be able to manipulate the data, add additional data, remo