US-20260127272-A1 - DATA PIPELINE
Abstract
An example computer system for ingesting data from multiple sources. The example computer system comprises one or more processors; and non-transitory computer-readable storage media encoding instructions which, when executed by the one or more processors, causes the computer system to: receive, from a plurality of data sources, data entries, the plurality of data sources including an external computing device and an application programming interface; determine an application for use of the data entries; transform the data entries for storage in a database; curate a history record of the data entries stored in the database; and refine the data entries for use with the application.
Inventors
- Satish Raj KATAKAM
- Ralph Pinheiro
- Umamaheshwari Thandapani
Assignees
- WELLS FARGO BANK, N.A.
Dates
- Publication Date
- 20260507
- Application Date
- 20241101
Claims (20)
- 1. A computer system for ingesting data from multiple sources, the computer system comprising: one or more processors; and non-transitory computer-readable storage media encoding instructions which, when executed by the one or more processors, causes the computer system to: receive, from a plurality of data sources, data entries, the plurality of data sources including an external computing device and an application programming interface; determine an application for use of the data entries; transform the data entries for storage in a database; curate a history record of the data entries stored in the database; and refine the data entries for use with the application.
- 2. The computer system of claim 1, wherein the instructions further cause the computer system to: provide the data entries to the application, wherein the application is a cyber-security analysis tool.
- 3. The computer system of claim 1, wherein the instructions further cause the computer system to: edit a data entry within the database; and update the history record of the data entry.
- 4. The computer system of claim 1, wherein the plurality of data sources further includes an internal computing system, the internal computing system being part of an internal network including the computer system.
- 5. The computer system of claim 1, wherein refining includes processing the data with business logic.
- 6. The computer system of claim 1, wherein the instructions further cause the computer system to: responsive to a reception of the data entries, perform controls to ingest the data entries.
- 7. The computer system of claim 6, wherein the controls include a data receipt control.
- 8. The computer system of claim 7, wherein the controls include a data completeness management control.
- 9. The computer system of claim 1, wherein the instructions further cause the computer system to: determine a source for the data entries; and select a control to ingest the data entries based on a determined source.
- 10. The computer system of claim 9, wherein the instructions further cause the computer system to: perform a selected control.
- 11. A method for ingesting data from multiple sources, the method comprising: receiving, from a plurality of data sources, data entries, the plurality of data sources including an external computing device and an application programming interface; determining an application for use of the data entries; transforming the data entries for storage in a database; curating a history record of the data entries stored in the database; and refining the data entries for use with the application.
- 12. The method of claim 11, further comprising: providing the data entries to the application, wherein the application is a cyber-security analysis tool.
- 13. The method of claim 11, further comprising editing a data entry within the database; and updating the history record of the data entry.
- 14. The method of claim 11, wherein the plurality of data sources further includes an internal computing system, the internal computing system being part of an internal network.
- 15. The method of claim 11, wherein refining includes processing the data entries with business logic.
- 16. The method of claim 11, further comprising: responsive to a reception of the data entries, performing controls to ingest the data entries.
- 17. The method of claim 16, wherein the controls include a data receipt control.
- 18. The method of claim 16, wherein the controls include a data completeness management control.
- 19. The method of claim 11, further comprising: determining a source for the data entries; and selecting a control to ingest the data entries based on determining the source.
- 20. The method of claim 19, further comprising: performing a selected control.
Description
BACKGROUND
Data ingestion is the process of gathering and importing data from various sources into a centralized location, such as a data warehouse, data lake, or database, for further processing and analysis. Ingestion involves collecting raw data from diverse origins, such as databases, application programming interfaces (APIs), files, sensors, and social media feeds, and transforming it into a usable format. Further, data ingestion includes steps to produce usable data. Once the data is collected, the data is transformed into a usable format. After the data is transformed, the data can be provided to the target system. However, the variety of data sources results in extensive development effort to process the data.
SUMMARY
Examples provided herein are directed to a data ingestion pipeline. According to one aspect, an example computer system for ingesting data from multiple sources comprises: one or more processors; and non-transitory computer-readable storage media encoding instructions which, when executed by the one or more processors, causes the computer system to: receive, from a plurality of data sources, data entries, the plurality of data sources including an external computing device and an application programming interface; determine an application for use of the data entries; transform the data entries for storage in a database; curate a history record of the data entries stored in the database; and refine the data entries for use with the application.
According to another aspect, an example method for ingesting data from multiple sources comprises: receiving, from a plurality of data sources, data entries, the plurality of data sources including an external computing device and an application programming interface; determining an application for use of the data entries; transforming the data entries for storage in a database; curating a history record of the data entries stored in the database; and refining the data entries for use with the application.
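For illustration only, the five steps recited above (receive, determine an application, transform, curate a history record, refine) can be sketched as a small in-memory pipeline. This is a minimal sketch under stated assumptions, not the disclosed implementation: the class and method names (`DataEntry`, `Pipeline`, `determine_application`, and so on), the source labels, the `"application"` payload key, the lowercase-key transform, and the `threat_analysis` refiner are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class DataEntry:
    source: str   # hypothetical label, e.g. "external_device" or "api"
    payload: Row  # raw fields as received from the source


@dataclass
class Pipeline:
    # Hypothetical registry mapping an application name to its refinement logic.
    refiners: dict[str, Callable[[Row], Row]]
    database: dict[int, Row] = field(default_factory=dict)
    history: list[Row] = field(default_factory=list)

    def determine_application(self, entry: DataEntry) -> str:
        # Assumption: the entry names its target application in its payload.
        return entry.payload.get("application", "default")

    def transform(self, entry: DataEntry) -> Row:
        # Assumption: "transform for storage" means lowercase keys plus a source tag.
        row = {str(key).lower(): value for key, value in entry.payload.items()}
        row["source"] = entry.source
        return row

    def store(self, row: Row) -> int:
        # Store the row and curate a history record for it.
        entry_id = len(self.database) + 1
        self.database[entry_id] = row
        self.history.append({
            "entry_id": entry_id,
            "action": "insert",
            "data": dict(row),
            "at": datetime.now(timezone.utc),
        })
        return entry_id

    def refine(self, entry_id: int, application: str) -> Row:
        # Apply the application's refiner to a copy; unknown applications pass through.
        refiner = self.refiners.get(application, lambda row: row)
        return refiner(dict(self.database[entry_id]))

    def ingest(self, entries: list[DataEntry]) -> list[tuple[str, Row]]:
        results = []
        for entry in entries:  # "receive" is modeled as iterating over the input
            application = self.determine_application(entry)
            entry_id = self.store(self.transform(entry))
            results.append((application, self.refine(entry_id, application)))
        return results


# Usage with a hypothetical refiner that flags high-severity rows.
pipeline = Pipeline(refiners={
    "threat_analysis": lambda row: {**row, "flagged": row.get("severity", 0) >= 7},
})
refined = pipeline.ingest([
    DataEntry("api", {"application": "threat_analysis", "Severity": 9, "Host": "web-01"}),
    DataEntry("external_device", {"Severity": 2}),
])
```

Keeping the refiners in a registry keyed by application name is one possible way to let the same stored entry be refined differently per application. The stored row itself is left unchanged by refinement in this sketch.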
The details of one or more techniques are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of these techniques will be apparent from the description, drawings, and claims.
DESCRIPTION OF THE DRAWINGS
FIG. 1 shows an example system including a data ingestion pipeline.
FIG. 2 shows example logical components of a server device of the system of FIG. 1.
FIG. 3 shows additional details of the example server device of FIG. 2.
FIG. 4 shows a method for ingesting data with the system of FIG. 1.
FIG. 5 shows example physical components of the server device of FIG. 2.
DETAILED DESCRIPTION
This disclosure relates to a data ingestion pipeline. Data ingestion and processing allow organizations to harness the power of their data, regardless of its source or format, and improve business value and innovation. For example, acquired data from outside sources can be input into analytics platforms, which enables organizations to gain insights about the data. The insights may indicate customer behavior, preferences, or other information about the organization's business. In addition, processed data can be used to train machine learning models for artificial intelligence. The data can also be used for cyber security analysis. Cyber security analysts can use the data for threat detection, prevention, and response. Ingested data can also be stored in repositories as historical data and current data. The repositories may be data warehouses or data lakes. While acquiring data for these purposes can be valuable, data from external sources is often in a format that is unusable by internal systems of the organization or entity. For example, the data may need to be cleaned, converted, and/or transformed before use. Thus, the data must be processed before it can be stored, analyzed, or used for other purposes.
Ingestion and processing to provide usable data can consume extensive resources and take considerable amounts of time. The present disclosure provides a data ingestion pipeline (DPL). Embodiments of the DPL can be implemented in a DPL system that provides a modern, well-managed, and easy-to-use data ecosystem for cyber security analysts, data scientists, incident responders, and threat hunters to self-serve analytical data to support accelerated threat detection, prevention, and response, and to confidently secure and protect customers and assets. Data can be quickly ingested with little to no development effort. The DPL system utilizes multiple knobs that can be tuned based on ingestion type and use case. Knobs are configurable parameters or settings that influence the behavior and performance of a system or process, and they allow fine-tuning of various aspects of how data is handled. In some embodiments, the DPL system uses Yet Another Markup Language (YAML). The DPL system also manages access to each set of data. For example, the DPL system may support domain-