US-12619818-B2 - Generating and validating data insights using machine learning models

Abstract

A system obtains a bundle of insights generated based on insight templates and provides the bundle of insights as input to a machine learning model. The system then generates a summary of the bundle of insights using the machine learning model.

Inventors

  • Nathan Drew Nichols
  • Hon Hing WANG
  • Caroline Sherman
  • Lara Thompson
  • Jonathan Alden Drake
  • Ian Arthur Booth

Assignees

  • SALESFORCE, INC.

Dates

Publication Date: 2026-05-05
Application Date: 2024-01-31

Claims (12)

  1. A method for generating insight summaries, including: receiving a natural language expression describing respective insights and relationships between the respective insights in a bundle; obtaining the bundle of the respective insights generated based on insight templates; ranking the respective insights in the bundle of insights according to relevancy of respective metrics associated with the respective insights; providing the ranked insights in the bundle as input to a machine learning model; generating a first summary of the bundle of insights using the machine learning model that is trained based on semantics derived from corpuses and metadata derived from labelled data; validating accuracy of the first summary of the bundle of insights using the machine learning model, including identifying one or more inaccuracies; and generating, using the machine learning model, a second summary of the bundle of insights correcting the identified one or more inaccuracies.
  2. The method of claim 1, further comprising validating the accuracy of the first summary of the bundle of insights using regular expressions and heuristics.
  3. The method of claim 1, wherein the machine learning model includes a first machine learning model that is used to generate the first summary of the bundle of insights and a second machine learning model that is used to validate the accuracy of the first summary of the bundle of insights.
  4. The method of claim 1, wherein at least some metrics that are associated with the respective insights in the bundle of insights have predetermined relationships to other metrics stored in a metrics database.
  5. A computer system having one or more processors and memory, wherein the memory stores one or more programs configured for execution by the one or more processors, and the one or more programs comprise instructions for: receiving a natural language expression describing respective insights and relationships between the respective insights in a bundle; obtaining the bundle of the respective insights generated based on insight templates; ranking the respective insights in the bundle of insights according to relevancy of respective metrics associated with the respective insights; providing the ranked insights in the bundle as input to a machine learning model; generating a first summary of the bundle of insights using the machine learning model that is trained based on semantics derived from corpuses and metadata derived from labelled data; validating accuracy of the first summary of the bundle of insights using the machine learning model, including identifying one or more inaccuracies; and generating, using the machine learning model, a second summary of the bundle of insights correcting the identified one or more inaccuracies.
  6. The computer system of claim 5, wherein the one or more programs further comprise instructions for validating the accuracy of the first summary of the bundle of insights using regular expressions and heuristics.
  7. The computer system of claim 5, wherein the machine learning model includes a first machine learning model that is used to generate the first summary of the bundle of insights and a second machine learning model that is used to validate the accuracy of the first summary of the bundle of insights.
  8. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer system having one or more processors and memory, the one or more programs comprising instructions for: receiving a natural language expression describing respective insights and relationships between the respective insights in a bundle; obtaining the bundle of the respective insights generated based on insight templates; ranking the respective insights in the bundle of insights according to relevancy of respective metrics associated with the respective insights; providing the ranked insights in the bundle as input to a machine learning model; generating a first summary of the bundle of insights using the machine learning model that is trained based on semantics derived from corpuses and metadata derived from labelled data; validating accuracy of the first summary of the bundle of insights using the machine learning model, including identifying one or more inaccuracies; and generating, using the machine learning model, a second summary of the bundle of insights correcting the identified one or more inaccuracies.
  9. The non-transitory computer readable storage medium of claim 8, the one or more programs comprising instructions for validating the accuracy of the first summary of the bundle of insights using regular expressions and heuristics.
  10. The non-transitory computer readable storage medium of claim 8, wherein the machine learning model includes a first machine learning model that is used to generate the first summary of the bundle of insights and a second machine learning model that is used to validate the accuracy of the first summary of the bundle of insights.
  11. The computer system of claim 5, wherein at least some metrics that are associated with the respective insights in the bundle of insights have predetermined relationships to other metrics stored in a metrics database.
  12. The non-transitory computer readable storage medium of claim 8, wherein at least some metrics that are associated with the respective insights in the bundle of insights have predetermined relationships to other metrics stored in a metrics database.
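
The rank, summarize, validate, and regenerate flow recited in the claims can be illustrated with a minimal Python sketch. This is not the patented implementation. The `Insight` class, the `relevancy` score, the `NUMBER_RE` pattern, and the `model` callable are all hypothetical names introduced for illustration. The validation step shown here is only a regular-expression heuristic in the spirit of claims 2, 6, and 9 (every figure in the summary must appear in some source insight); the model-based validation of the independent claims is not shown.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical pattern: matches figures such as "$220", "15%", or "190".
NUMBER_RE = re.compile(r"\$?\d+(?:\.\d+)?%?")


@dataclass
class Insight:
    text: str         # template-generated natural language insight
    relevancy: float  # assumed relevancy score of the associated metric


def rank_insights(bundle: List[Insight]) -> List[Insight]:
    """Order insights from most to least relevant."""
    return sorted(bundle, key=lambda i: i.relevancy, reverse=True)


def find_inaccuracies(summary: str, bundle: List[Insight]) -> List[str]:
    """Return figures in the summary that appear in none of the insights."""
    known = set()
    for insight in bundle:
        known.update(NUMBER_RE.findall(insight.text))
    return [n for n in NUMBER_RE.findall(summary) if n not in known]


def summarize_with_validation(
    bundle: List[Insight], model: Callable[[str], str]
) -> str:
    """Generate a first summary, check it, and request one correction if needed."""
    ranked = rank_insights(bundle)
    prompt = "Summarize these insights:\n" + "\n".join(i.text for i in ranked)
    first = model(prompt)
    errors = find_inaccuracies(first, ranked)
    if not errors:
        return first
    # Second pass: feed the flagged figures back to the model.
    correction = (
        prompt
        + "\nPrevious summary: " + first
        + "\nUnsupported figures to correct: " + ", ".join(errors)
    )
    return model(correction)
```

Here `model` stands in for any text-generation function that takes a prompt string and returns a summary string. The sketch makes a single correction pass and does not re-validate the second summary.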

Description

RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 63/537,808, filed Sep. 11, 2023, titled “Metric Layer Bootstrapping,” which is incorporated by reference herein in its entirety. The application relates to U.S. Utility patent application Ser. No. 18/429,072, entitled “Automatically Generating Metric Objects Using a Machine Learning Model,” filed Jan. 31, 2024, which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

The disclosed embodiments relate generally to data analytics and, more specifically, to systems and methods for automatically generating and validating data insights using machine learning models.

BACKGROUND

Data analysis is the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves the use of various techniques, methods, and tools to examine and interpret data, uncover patterns, and extract insights. The primary objective of data analysis is to gain a better understanding of the underlying trends, relationships, and characteristics within the data. Data analysis is widely used across various industries and domains, including business, finance, healthcare, science, and technology. It plays a crucial role in extracting meaningful information from large and complex datasets, helping organizations make informed decisions and gain a competitive advantage.

SUMMARY

There is an increasing demand for making business insights accessible to business users and other users (e.g., in sales, marketing, HR, finance, or others) without the need for data analysts or scientists to manually create KPIs, metrics, data visualizations, or other business insights. The consumers of business insights have the need to make data-driven decisions but typically rely on others to manually create and track metrics for a selected data source.
For example, a data analyst manually selects or creates various metadata that is used to provide business context for a metric. This process can be time consuming and inefficient. Manual creation of metrics fails to leverage metrics already created by others and may result in duplicate efforts. Accordingly, there is a need to automate the process for creating metrics and to improve the metrics themselves by augmenting metrics with additional metadata that provide additional business context, thereby improving the business insights that can be generated using the metrics.

The above deficiencies and other problems associated with generating metrics are reduced or eliminated by the disclosed systems and methods.

In accordance with some embodiments, a method is executed at a computer system having one or more processors and memory storing one or more programs configured for execution by the one or more processors. The method includes obtaining a plurality of data fields from a selected data source, wherein a first subset of the plurality of data fields corresponds to a plurality of measures and a second subset of the plurality of data fields corresponds to a plurality of dimensions. The method further includes prompting a machine learning model to generate a plurality of suggested metric objects. The method further includes, in response to prompting the machine learning model, generating a respective metric definition for each measure in the plurality of measures, wherein each generated respective metric definition includes a plurality of fields, including: (i) a name; (ii) a measure; (iii) a time dimension; and (iv) an aggregation type.

Today, the amount of data businesses have to analyze can be overwhelming, and reports are often presented in a one-size-fits-all way to serve many different teams at once. Users are left searching through the reports to find what's relevant to them, which can be time-consuming and inefficient.
For example, a user may have to filter the same dashboard in multiple ways to find the numbers that are important. Accordingly, there is a need for natural language descriptions (e.g., digests or insights) of the data trends. Further, there is a need for improving discoverability of such digests or insights.

Insights into data trends can be generated using templates. For example, templates can be used to generate accurate, simple, and short natural language expressions of insights or trends in the data, such as “CRM stock has been steadily climbing recently and hit $220, a 15% increase over 90 days ago ($190).” However, individual insights are not sufficient to give a business user a comprehensive understanding of various trends that are occurring in the data source. Accordingly, there is a need to generate summaries of multiple insights or trends. Concatenating individual template outputs, while accurate, can produce text that is repetitive, lengthy, and less legible. Accordingly, there is a need to generate summaries that are more legible. In some embodiments, large language models can be used to sum