EP-4736027-A1 - AUTOMATIC VALIDATION OF INPUT FILES

Abstract

A network system to analyze a combined output of various input files from data-based applications. The system provides custom profiling of the data from each application based on an application of one or more sets of rules. The system stores the data from any number of applications at a base level of granularity to allow direct comparison of the data from each application output. Because the data is stored at the same level of granularity, the data may be compared or processed regardless of the application from which the data is received. The system applies rules to compare the data across the applications to identify outliers, trends, or commonalities. The system may also search for and identify data fitting a specific rule across the applications to extract, modify, or label the data. The system provides a visualization of the data based on the rules applied.

Inventors

KURIAN, ROBIN J.
RATH, NEELIMA
KURUGANTHY, MADHUSHALINI
MARSHALL, JAMAAL D.

Assignees

Citibank, N.A.

Dates

Publication Date: 2026-05-06
Application Date: 2024-06-28

Claims (20)

1. A data management system, comprising: a processor of a service provider communicatively coupled to a storage device, wherein the processor executes application code instructions that are stored in the storage device to cause the system to: receive data from a plurality of different applications associated with the service provider, the received data being in a plurality of formats; process the data into a base level format with a configured level of granularity; categorize the processed data; configure a set of rules that is applicable to perform a positive operation on the categorized data with the base level format with the configured level of granularity; apply the set of rules to the processed data; generate an output of the application of rules for the categorized data; and provide a display of a visualization of the generated output for the data.
2. The data management system of claim 1, further comprising application code instructions to: compare the categorized data in a particular category with other data in the particular category; and identify outliers in the processed data.
3. The data management system of claim 2, further comprising application code instructions to isolate the identified outliers for rectification by the system.
4. The data management system of claim 1, wherein the set of rules causes the data management system to update the data in a particular category.
5. The data management system of claim 4, wherein the update to the data comprises a deletion of at least a portion of the data in the particular category across a plurality of different applications.
6. The data management system of claim 1, wherein processing the data comprises generating a custom profile for the data from each application.
7. The data management system of claim 1, wherein the data from the plurality of different applications is received in a plurality of formats and from a plurality of sources.
8. The data management system of claim 1, wherein the data from the plurality of different applications associated with the service provider comprises each set of data from each of the different applications used by every entity that maintains an account with the service provider.
9. The data management system of claim 1, wherein the visualization of the generated output for the data is a chart presented on a graphical user interface.
10. The data management system of claim 1, wherein the output of the application of rules for data from each of the plurality of applications is presented as a summarized report for each application of the plurality of applications.
11. The data management system of claim 1, wherein the application code instructions to apply a set of rules to the processed data are initiated by a user upon request.
12. The data management system of claim 1, wherein the application code instructions to apply a set of rules to the processed data are automatically initiated based on a configured schedule.
13. A method to manage data, comprising: receiving, by one or more computing devices, data from a plurality of different applications associated with the service provider, the received data being in a plurality of formats; processing, by the one or more computing devices, the data to a configured format and level of granularity; categorizing, by the one or more computing devices, the processed data; configuring, by the one or more computing devices, a set of rules that is applicable to the categorized data with a base level format with a configured level of granularity; applying, by the one or more computing devices, the set of rules to the processed data; generating, by the one or more computing devices, an output of the application of rules for the categorized data; and providing, by the one or more computing devices, a display of a visualization of the generated output for the data.
14. The method of claim 13, further comprising: categorizing the processed data; comparing processed data categorized in a particular category with other data in the particular category; and identifying outliers in the processed data.
15. The method of claim 13, wherein processing the data comprises generating a custom profile for the data from each application.
16. The method of claim 13, wherein the data from the plurality of different applications is received in a plurality of formats and from a plurality of sources.
17. A computer program product, comprising: a non-transitory computer-readable medium having computer-readable program instructions embodied thereon, the computer-readable program instructions comprising instructions to: receive data from a plurality of different applications associated with the service provider; process the data into a base level format with a configured format; categorize the processed data; configure a set of rules that is applicable to the categorized data with a base level format with a configured level of granularity; apply the set of rules to the processed data; generate an output of the application of rules for the categorized data; and provide a display of a visualization of the generated output for the data.
18. The computer program product of claim 17, further comprising instructions to update the data in a particular category.
19. The computer program product of claim 18, wherein the update to the data comprises a deletion of at least a portion of the data in the particular category.
20. The computer program product of claim 17, wherein the data from the plurality of different applications associated with the service provider comprises each set of data from each of the different applications used by every entity that maintains an account with the service provider.

Description

AUTOMATIC VALIDATION OF INPUT FILES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims benefit of priority of U.S. Patent Application No. 18/217,041, filed June 30, 2023. The content of the foregoing application is incorporated herein in its entirety by reference.
FIELD OF THE INVENTION
[0002] The technology relates generally to the field of data validation, and more particularly to methods and systems to provide a workflow-based quality engineering automation solution that automatically manages and validates data input files of multiple applications provided in different formats.
BACKGROUND
[0003] In data management systems, users attempt to process data to allow the data to be manipulated, compared, sorted, revised, validated, or have any other type of process applied. The data management system may receive data inputs from many different types of applications or processes. The different applications may provide the data in different formats or styles. When processing the data, the data management system is unable to directly compare or analyze the different data sets because the formats do not allow direct comparison. The quantity of data may be too great to process when the data is in incompatible formats.
[0004] For example, a first application, such as an application used in an access management organization, records a user phone number. A second application, such as an application on a social media site, records a second instance of the user phone number. In order to validate one or both applications, to validate a user phone number, or for any other reason, the data management system desires to compare the phone numbers. If the phone numbers are recorded and stored in different formats or using different data storage criteria, the comparison may not be possible or would require human intervention.
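The phone number example of paragraph [0004] can be illustrated with a minimal sketch. The sketch is not taken from the application. It assumes a hypothetical base format of a leading "+" followed by digits only, and a hypothetical default country code of "1" for ten-digit numbers recorded without one. The function names `normalize_phone` and `same_phone` are illustrative only.

```python
import re


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone number string to an assumed base format: '+' then digits.

    Assumption (not from the source): a ten-digit number without a leading
    '+' is a national number and receives the default country code.
    Extensions such as 'x123' are not handled and would be merged into
    the digit string.
    """
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, parentheses
    if not has_plus and len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


def same_phone(first: str, second: str) -> bool:
    """Compare two records only after both are in the base format."""
    return normalize_phone(first) == normalize_phone(second)


# Hypothetical records of one user phone number from two applications.
first_application_record = "(212) 555-0147"
second_application_record = "+1 212.555.0147"

print(first_application_record == second_application_record)  # raw strings differ
print(same_phone(first_application_record, second_application_record))
```

Once both records are reduced to the same base format, the comparison becomes a plain string equality and does not depend on the application that recorded the number.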
[0005] In another example, a conventional data management system may desire to perform a processing action across all data associated with a user, such as to delete account data of a user. The data management system may have stored data from the user in different formats from different applications. The data management system may not have an ability to capture all the instances of the user data because the search criteria are unable to capture each different format.
[0006] Conventional data management systems are unable to compare data, identify outlier data, validate systems, and modify data from multiple applications operating with different systems and/or in different formats. Nonetheless, a data management system must be able to manage all of the data received from all of the applications of institutions and businesses that service all of the users, members, clients, associates, and customers. No group of humans could search each data set stored from each application in any reasonable amount of time. Further, searching the data by a group of humans is unreasonable due to the different formats and levels of granularity of the data and the varying computer-based storage technologies.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Figure 1 is a block diagram depicting a system to manage data from multiple data-based applications.
[0008] Figure 2 is a block flow diagram depicting a method to manage data from multiple data-based applications.
[0009] Figure 3 is a block flow diagram depicting a method to delete data related to a user across multiple applications.
[0010] Figure 4 is a block flow diagram depicting a method to identify data outliers across multiple applications.
[0011] Figure 5 is an illustration of an example graphical user interface of a list of application functions and a number of rules applied per function.
[0012] Figure 6 is an illustration of an example graphical user interface of a list of the rules and outputs.
[0013] Figure 7 is an illustration of an example graphical user interface of an issue identified by the application of the rules.
[0014] Figure 8 is an illustration of an example graphical user interface displaying results of an application of a set of rules to a data set.
[0015] Figure 9 depicts a computing machine and a module.
DETAILED DESCRIPTION
Example System Architecture
[0016] Figure 1 is a block diagram depicting a system to manage data from multiple data-based applications. As depicted in Figure 1, the architecture 100 includes a first entity computing system 110, a data management system 120, and a second entity computing device 130 that are connected by communications network 99.
[0017] Each network, such as communication network 99, includes a wired or wireless telecommunication mechanism and/or protocol by which the components depicted in Figure 1 can exchange data. For example, each network 99 can include a local area network (“LAN”), a wide area network (“WAN”), an intranet, an Internet, a mobile telephone network, storage area network (SAN), personal area network (PAN), a metropolitan area network (MAN), a wireless local area network