CA-3165160-C - DECISION IMPLEMENTATION WITH INTEGRATED DATA QUALITY MONITORING

Abstract

Computer-implemented methods and systems include downstream execution for individual rule-based flagging of upstream data quality errors by receiving upstream data from a plurality of sources, identifying a downstream task to be executed, applying a plurality of rules to the upstream data, generating a plurality of outputs including at least one output for each of the plurality of rules applied to the upstream data, each of the plurality of outputs being associated with a corresponding rule of the plurality of rules, identifying a tagged population based on the plurality of outputs, determining that at least one of the plurality of outputs does not meet a corresponding rule threshold, and activating the downstream execution for the tagged population after at least one of (i) updating the corresponding rule threshold or (ii) overriding an error.

Inventors

Thomas Grimes
Kenneth Wydler

Assignees

CAPITAL ONE SERVICES, LLC

Dates

Publication Date: 20260505
Application Date: 20220623
Priority Date: 20210628

Claims (20)

What is claimed is:
1. A computer-implemented downstream execution method for individual rule-based flagging of upstream data quality errors, comprising: receiving upstream data, corresponding to an overall population of users, from a plurality of sources each source selected from one of a relational database, a non-relational database, or a file system; identifying a downstream task to be executed, the downstream task being associated with at least a portion of the overall population of users; applying a plurality of rules to the upstream data; generating a plurality of outputs including at least one output for each of the plurality of rules applied to the upstream data, each of the plurality of outputs being associated with a corresponding rule of the plurality of rules; identifying a tagged population based on the plurality of outputs, the tagged population being a subset of the overall population of users; determining that at least one of the plurality of outputs does not meet a corresponding rule threshold; generating one or more errors based on the determining that the at least one of the plurality of outputs does not meet the corresponding rule threshold; updating the corresponding rule threshold based on the one or more errors; overriding the one or more errors; and activating an execution of the downstream task for the tagged population.
2. The method of claim 1, further comprising generating a graphical representation of the at least one of the plurality of outputs that does not meet the corresponding rule threshold, the graphical representation comprising an indication of the corresponding rule threshold.
3. The method of claim 1, wherein the upstream data comprises one or more of user account information, user behavior information, user action information, user status, or user changes.
4. The method of claim 1, wherein the upstream data comprises one or more of a system status, a system profile, and a system action.
5. The method of claim 1 , wherein the corresponding rule threshold is generated by a machine learning model.
6. The method of claim 5, wherein the machine learning model is updated based on the downstream execution.
7. The method of claim 5, wherein the machine learning model is generated based on training data comprising data from past downstream executions.
8. The method of claim 5, wherein the machine learning model is generated based on training data from attributes associated with the corresponding rule.
9. The method of claim 1, further comprising organizing the upstream data based at least on a type of at least a subset of the upstream data.
10. The method of claim 9, wherein the organized upstream data associates a plurality of data points with a corresponding user.
11. The method of claim 1 further comprising modifying a first corresponding threshold of a first rule independently from modifying a second corresponding threshold of a second rule.
12. A computer-implemented downstream execution method, comprising: receiving source data from each of a plurality of sources each source selected from one of a relational database, a non-relational database, or a file system; identifying a downstream task to be executed, the downstream task being associated with at least a portion of an overall population of users; applying a plurality of rules to each of the source data from the plurality of sources; generating a plurality of outputs including at least one output for each of the plurality of rules applied to each of the source data; determining that at least one of the plurality of outputs from a first source of the plurality of sources does not meet a corresponding rule threshold; generating one or more errors based on the determining that the at least one of the plurality of outputs does not meet the corresponding rule threshold; flagging the first source based on the one or more errors; identifying a plurality of usable sources from the plurality of sources, the usable sources excluding the first source; identifying a tagged population based on the plurality of outputs associated with the usable sources, the tagged population being a subset of the overall population of users; and activating an execution of the downstream task for the tagged population.
13. The method of claim 12, further comprising: identifying a last known valid source; and including the last known valid source in the plurality of usable sources.
14. The method of claim 13, wherein the last known valid source is a previous version of the first source.
15. The method of claim 14, wherein the last known valid source previously met the corresponding rule threshold.
16. The method of claim 12, wherein more than one of the plurality of sources comprise data about a same user.
17. The method of claim 12, wherein the corresponding rule threshold is generated by a machine learning model.
18. The method of claim 12, further comprising organizing the source data based at least on a type of at least a subset of the source data.
19. The method of claim 18, wherein the organized source data associates a plurality of data points with a corresponding user.
20. A system comprising: a data storage device storing processor-readable instructions; and a processor operatively connected to the data storage device and configured to execute the instructions to perform operations that include: receiving source data from each of a plurality of sources, each source selected from one of a relational database, a non-relational database, or a file system; applying a plurality of rules to each of the source data from the plurality of sources; generating a plurality of outputs including at least one output for each of the plurality of rules applied to each of the source data; determining that at least one of the plurality of outputs from a first source of the plurality of sources does not meet a corresponding rule threshold; generating one or more errors based on the determining that the at least one of the plurality of outputs does not meet the corresponding rule threshold; flagging the first source based on the one or more errors; identifying a plurality of usable sources from the plurality of sources, the usable sources excluding the first source; identifying a downstream task to be executed, the downstream task being associated with an overall population of users; identifying a tagged population based on the plurality of outputs associated with the usable sources, the tagged population being a subset of the overall population of users; and activating an execution of the downstream task for the tagged population.
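
As an illustration of claims 13 to 15 above, the following is a minimal sketch of one way a last known valid source might be selected. It assumes each source keeps a history of versions together with the rule output each version produced; the `history` structure, the version labels, and the `last_known_valid` helper are hypothetical and are not taken from the disclosure.

```python
from typing import List, Optional, Tuple


def last_known_valid(history: List[Tuple[str, float]], threshold: float) -> Optional[str]:
    """Return the newest version whose rule output met the threshold, if any.

    `history` is assumed to be ordered oldest first as (version, output) pairs.
    """
    for version, output in reversed(history):
        if output >= threshold:
            return version
    return None


# Hypothetical version history for a flagged source: v3 is the current,
# failing version; v1 and v2 previously met the rule threshold.
history = [("v1", 0.98), ("v2", 0.97), ("v3", 0.60)]
fallback = last_known_valid(history, threshold=0.95)
```

In this sketch the fallback is the previous version `v2`, which could then be included among the usable sources in place of the flagged version.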

Description

DECISION IMPLEMENTATION WITH INTEGRATED DATA QUALITY MONITORING
TECHNICAL FIELD
[001] Various embodiments of the present disclosure relate generally to performing downstream tasks for populations, and more particularly, systems and methods for individual rule-based flagging of upstream data quality errors.
BACKGROUND
[002] Large amounts of data may be obtained from various sources and may be processed using one or more rules and/or policies to be output for a given use. Processing the data may be done in a manner that limits or otherwise modifies the large amounts of data without allowing individual rule-based flagging of low quality or incorrect data. Such processing may limit the use of the data, limit error detection in the various sources, and/or result in unintended results.
[003] The present disclosure is directed to addressing one or more of the above-referenced challenges. The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art, or suggestions of the prior art, by inclusion in this section.
SUMMARY OF THE DISCLOSURE
[004] According to certain aspects of the disclosure, methods and systems are disclosed for downstream execution with individual rule-based flagging of upstream data quality errors and include receiving upstream data, corresponding to an overall population of users, from a plurality of sources each source selected from one of a relational database, a non-relational database, or a file system, identifying a downstream task to be executed, the downstream task being associated with at least a portion of the overall population, applying a plurality of rules to the upstream data, generating a plurality of outputs including at least one output for each of the plurality of rules applied to the upstream data, each of the plurality of outputs being associated with a corresponding rule of the plurality of rules, identifying a tagged population based on the plurality of outputs, the tagged population being a subset of the overall population, determining that at least one of the plurality of outputs does not meet a corresponding rule threshold, and activating the downstream execution for the tagged population after at least one of (i) updating the corresponding rule threshold or (ii) overriding an error generated based on the determining that the at least one of the plurality of outputs does not meet the threshold.
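
The flow in paragraph [004] may be easier to follow with a small example. The following is a minimal sketch under stated assumptions, not the disclosed implementation: each rule is assumed to produce a single numeric output that passes when it is at least the rule's threshold, and the tagged population is simplified to the users whose records carry an `eligible` field. The `Rule` class, the helper functions, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Rule:
    name: str
    evaluate: Callable[[List[dict]], float]  # hypothetical: returns one numeric output
    threshold: float  # the output must be at least this value to pass


def apply_rules(records: List[dict], rules: List[Rule]) -> Dict[str, float]:
    """Generate one output per rule, keyed by the rule that produced it."""
    return {rule.name: rule.evaluate(records) for rule in rules}


def find_errors(outputs: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Name each rule whose output does not meet that rule's own threshold."""
    return [rule.name for rule in rules if outputs[rule.name] < rule.threshold]


def may_activate(errors: List[str], overridden: Set[str]) -> bool:
    """Allow downstream execution once every error has been overridden."""
    return all(name in overridden for name in errors)


def fraction_present(field: str) -> Callable[[List[dict]], float]:
    """Build a rule output: the fraction of records with a non-missing field."""
    def evaluate(records: List[dict]) -> float:
        if not records:
            return 0.0
        return sum(r.get(field) is not None for r in records) / len(records)
    return evaluate


# Illustrative upstream data; the field names are assumptions.
records = [
    {"user_id": 1, "balance": 120.0, "eligible": True},
    {"user_id": 2, "balance": None, "eligible": True},
    {"user_id": 3, "balance": 40.0, "eligible": False},
    {"user_id": 4, "balance": 75.0, "eligible": True},
]
rules = [
    Rule("balance_present", fraction_present("balance"), threshold=0.9),
    Rule("user_id_present", fraction_present("user_id"), threshold=1.0),
]

outputs = apply_rules(records, rules)
errors = find_errors(outputs, rules)  # only the balance rule misses its threshold
tagged_population = [r["user_id"] for r in records if r["eligible"]]

# Path (i): update only the failing rule's threshold, then re-check.
rules[0].threshold = 0.7
errors_after_update = find_errors(outputs, rules)

# Path (ii): alternatively, override the original error without changing thresholds.
can_run_after_override = may_activate(errors, overridden={"balance_present"})
```

Because each output stays associated with the rule that produced it, the failing rule can be adjusted or overridden on its own while the other rule's threshold is left untouched.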
[005] In another aspect, an exemplary embodiment of a computer-implemented method includes receiving source data from each of a plurality of sources each source selected from one of a relational database, a non-relational database, or a file system, identifying a downstream task to be executed, the downstream task being associated with at least a portion of an overall population, applying a plurality of rules to each of the source data from the plurality of sources, generating a plurality of outputs including at least one output for each of the plurality of rules applied to each of the source data, determining that at least one of the plurality of outputs from a first source of the plurality of sources does not meet a corresponding rule threshold, flagging the first source based on the at least one of the plurality of outputs not meeting a corresponding rule threshold, identifying a plurality of usable sources from the plurality of sources, the usable sources excluding the first source, identifying a downstream task to be executed based on the source data from the usable sources, the downstream task being associated with an overall population, identifying a tagged population based on the plurality of outputs associated with the usable sources, the tagged population being a subset of the overall population, and activating the downstream execution for the tagged population.
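
The per-source variant in paragraph [005] can be sketched in the same hedged way. The example below assumes every rule is applied to each source separately, that a source is flagged as soon as any of its outputs falls below the rule threshold, and that the tagged population is drawn only from the remaining usable sources. The source names, the `completeness` rule, and the `eligible` field are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Records = List[dict]
# Each rule pairs a function producing one numeric output with its threshold.
RuleSet = Dict[str, Tuple[Callable[[Records], float], float]]


def completeness(field: str) -> Callable[[Records], float]:
    """Build a rule output: the fraction of records with a non-missing field."""
    def evaluate(records: Records) -> float:
        if not records:
            return 0.0
        return sum(r.get(field) is not None for r in records) / len(records)
    return evaluate


def flag_sources(sources: Dict[str, Records], rules: RuleSet) -> Set[str]:
    """Flag every source with at least one output below its rule threshold."""
    flagged = set()
    for source_name, records in sources.items():
        for evaluate, threshold in rules.values():
            if evaluate(records) < threshold:
                flagged.add(source_name)
    return flagged


def tag_population(sources: Dict[str, Records], flagged: Set[str]) -> List[int]:
    """Collect eligible users from usable (unflagged) sources only."""
    usable = {name: recs for name, recs in sources.items() if name not in flagged}
    return sorted(
        {r["user_id"] for recs in usable.values() for r in recs if r.get("eligible")}
    )


# Illustrative sources standing in for a relational database, a
# non-relational database, and a file system.
sources = {
    "accounts_db": [
        {"user_id": 1, "email": "a@example.com", "eligible": True},
        {"user_id": 2, "email": "b@example.com", "eligible": False},
    ],
    "events_store": [
        {"user_id": 3, "email": None, "eligible": True},
        {"user_id": 4, "email": None, "eligible": True},
    ],
    "batch_file": [
        {"user_id": 1, "email": "a@example.com", "eligible": True},
        {"user_id": 5, "email": "e@example.com", "eligible": True},
    ],
}
rules: RuleSet = {"email_present": (completeness("email"), 0.9)}

flagged = flag_sources(sources, rules)
tagged = tag_population(sources, flagged)
```

Here only `events_store` is flagged, so users 3 and 4 are left out of the tagged population while the downstream task can still proceed for users found in the two usable sources.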
[006] In another aspect, a system includes a data storage device storing processor-readable instructions and a processor operatively connected to the data storage device and configured to execute the instructions to perform operations that include receiving source data from each of a plurality of sources, each source selected from one of a relational database, a non-relational database, or a file system, applying a plurality of rules to each of the source data from the plurality of sources, generating a plurality of outputs including at least one output for each of the plurality of rules applied to each of the source data, determining that at least one of the plurality of outputs from a first source of the plurality of sources does not meet a corresponding rule threshold, flagging the first source based on the at least one of the plurality of outputs not meeting a corresponding rule threshold, identifying a plurality of usable sources from the plurality of sources