US-20260127309-A1 - REAL-TIME DATA INGESTION AND MODEL TRAINING

Abstract

In some implementations, a system may receive, at a first type of data structure, a set of data elements of a data stream. The system may forward the set of data elements to a second type of data structure and a third type of data structure. The system may receive, based on forwarding the set of data elements to the second type of data structure and the third type of data structure, a query for machine learning training data. The system may transmit, to a computational element associated with a machine learning processing platform, information relating to the set of data elements to train a machine learning model, wherein the information includes timing information relating to a set of instances of each data element of the set of data elements.
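
The flow in the abstract can be illustrated with a minimal sketch. It assumes a `deque` as a stand-in for the first type of data structure, a `dict` for the second (the claims characterize it as key-value storage), and an in-memory SQLite table for the third (characterized in the claims as relational). The class name, method names, column names, and the `generated_at`/`received_at` fields are hypothetical and are not taken from the application.

```python
import sqlite3
import time
from collections import deque


class IngestionPipeline:
    """Hypothetical fan-out: stream buffer -> key-value store + relational store."""

    def __init__(self):
        self.stream_buffer = deque()           # stand-in for the first type of data structure
        self.kv_store = {}                     # stand-in for the second type (key-value)
        self.db = sqlite3.connect(":memory:")  # stand-in for the third type (relational)
        self.db.execute(
            "CREATE TABLE elements "
            "(key TEXT, value REAL, generated_at REAL, received_at REAL)"
        )

    def receive(self, key, value, generated_at, received_at=None):
        """Buffer one data element together with both timestamps."""
        if received_at is None:
            received_at = time.time()
        self.stream_buffer.append((key, value, generated_at, received_at))

    def forward(self):
        """Drain the buffer into both stores; the table keeps every instance."""
        while self.stream_buffer:
            key, value, generated_at, received_at = self.stream_buffer.popleft()
            self.kv_store[key] = value  # only the latest value per key
            self.db.execute(
                "INSERT INTO elements VALUES (?, ?, ?, ?)",
                (key, value, generated_at, received_at),
            )
        self.db.commit()

    def query_training_data(self, key):
        """Return every stored instance of `key`, oldest first, with its timing information."""
        rows = self.db.execute(
            "SELECT value, generated_at, received_at FROM elements "
            "WHERE key = ? ORDER BY generated_at",
            (key,),
        )
        return [
            {"value": v, "generated_at": g, "received_at": r} for v, g, r in rows
        ]
```

In this sketch the key-value store answers "what is the current value" while the relational table retains each instance with its timestamps, which is the information a training job would need to reconstruct how an element changed over time.
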

Inventors

  • Devansh DHUTIA
  • Akshina TRENTACOSTE
  • Obaidur Rehman KHAN
  • Archana SANTHIRAJ

Assignees

  • CAPITAL ONE SERVICES, LLC

Dates

Publication Date: 2026-05-07
Application Date: 2025-12-10

Claims (20)

  1. 1 - 20 . (canceled)
  2. 21 . A system, comprising: one or more memories; and one or more processors, coupled to the one or more memories, configured to: receive, at a first data structure via a network, a set of data elements, wherein the first data structure is associated with reading data from the data stream; forward the set of data elements to a second data structure and a third data structure, wherein the second data structure is associated with storing the set of data elements using a key-value technique, and wherein the third data structure is associated with storing the set of data elements using a relational database; transmit an instruction to analyze information included in the set of data elements in connection with a data policy, wherein the information includes timing information associated with the set of data elements; identify a data management action based on receiving a data report on the data policy; and transmit one or more commands to cause the data management action to be performed.
  3. 22 . The system of claim 21 , wherein the set of data elements is associated with timing information identifying at least one of a first time at which a data element is received at the first type of data structure or a second time at which the data element was generated.
  4. 23 . The system of claim 21 , wherein the one or more processors are further configured to: identify a plurality of events related to a plurality of instances of data elements of the set of data elements over a period of time; consolidate the plurality of events into a consolidated event; and perform one or more event-based actions associated with the consolidated event.
  5. 24 . The system of claim 21 , wherein transmitting the instruction to analyze information is based on at least one of: detecting a triggering event, or an auditing schedule.
  6. 25 . The system of claim 21 , wherein analyzing the information comprises: identifying at least one of changes or trends associated with the set of data elements using timing information associated with the set of data elements; and performing at least one of simulating or recreating one or more events associated with the set of data elements to determine at least one of whether a data policy is satisfied, a trend is observed, or another criterion has occurred that corresponds to the data management action.
  7. 26 . The system of claim 21 , wherein the data management action comprises removing one or more data elements, from the set of data elements, that violate the data policy.
  8. 27 . The system of claim 21 , wherein the data management action comprises an action associated with at least one of an access privilege or access control.
  9. 28 . A non-transitory computer-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the device to: receive, at a first data structure via a network, a set of data elements, wherein the first data structure is associated with reading data from the data stream; forward the set of data elements to a second data structure and a third data structure, wherein the second data structure is associated with storing the set of data elements using a key-value technique, and wherein the third data structure is associated with storing the set of data elements using a relational database; transmit an instruction to analyze information included in the set of data elements in connection with a data policy, wherein the information includes timing information associated with the set of data elements; identify a data management action based on receiving a data report on the data policy; and transmit one or more commands to cause the data management action to be performed.
  10. 29 . The non-transitory computer-readable medium of claim 28 , wherein the set of data elements is associated with timing information identifying at least one of a first time at which a data element is received at the first type of data structure or a second time at which the data element was generated.
  11. 30 . The non-transitory computer-readable medium of claim 28 , wherein the one or more instructions further cause the device to: identify a plurality of events related to a plurality of instances of data elements of the set of data elements over a period of time; consolidate the plurality of events into a consolidated event; and perform one or more event-based actions associated with the consolidated event.
  12. 31 . The non-transitory computer-readable medium of claim 28 , wherein transmitting the instruction to analyze information is based on at least one of: detecting a triggering event, or an auditing schedule.
  13. 32 . The non-transitory computer-readable medium of claim 28 , wherein the one or more instructions, that cause the device to analyze the information, cause the device to: identify at least one of changes or trends associated with the set of data elements using timing information associated with the set of data elements; and perform at least one of simulating or recreating one or more events associated with the set of data elements to determine at least one of whether a data policy is satisfied, a trend is observed, or another criterion has occurred that corresponds to the data management action.
  14. 33 . The non-transitory computer-readable medium of claim 28 , wherein the one or more instructions further cause the device to remove one or more data elements, from the set of data elements, that violate the data policy.
  15. 34 . The non-transitory computer-readable medium of claim 28 , wherein the data management action comprises an action associated with at least one of an access privilege or access control.
  16. 35 . A method, comprising: receiving, by a device, at a first data structure via a network, a set of data elements, wherein the first data structure is associated with reading data from the data stream; forwarding, by the device, the set of data elements to a second data structure and a third data structure, wherein the second data structure is associated with storing the set of data elements using a key-value technique, and wherein the third data structure is associated with storing the set of data elements using a relational database; transmitting, by the device, an instruction to analyze information included in the set of data elements in connection with a data policy, comprising timing information associated with the set of data elements; identifying, by the device, a data management action based on receiving a data report on the data policy; and transmitting, by the device, one or more commands to cause the data management action to be performed.
  17. 36 . The method of claim 35 , wherein the set of data elements is associated with timing information identifying at least one of a first time at which a data element is received at the first type of data structure or a second time at which the data element was generated.
  18. 37 . The method of claim 35 , further comprising: identifying a plurality of events related to a plurality of instances of data elements of the set of data elements over a period of time; consolidating the plurality of events into a consolidated event; and performing one or more event-based actions associated with the consolidated event.
  19. 38 . The method of claim 35 , wherein transmitting the instruction to analyze information is based on at least one of: detecting a triggering event, or an auditing schedule.
  20. 39 . The method of claim 35 , wherein analyzing the information comprises: identifying at least one of changes or trends associated with the set of data elements using timing information associated with the set of data elements; and performing at least one of simulating or recreating one or more events associated with the set of data elements to determine at least one of whether a data policy is satisfied, a trend is observed, or another criterion has occurred that corresponds to the data management action.
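
The event consolidation recited in the dependent claims (identify a plurality of events over a period of time, then consolidate them into a consolidated event) can be sketched as follows. The claims do not say how events are grouped, so this example assumes one possible rule: events for the same element and of the same kind are merged when they fall within a fixed window of the first event in the group. The `Event` fields, the function names, and the window rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    element_key: str   # which data element the event relates to (hypothetical field)
    kind: str          # e.g. "update" (hypothetical field)
    timestamp: float   # timing information for this instance


def _summarize(key, kind, window):
    """Collapse a run of events into a single consolidated event record."""
    return {
        "element_key": key,
        "kind": kind,
        "count": len(window),
        "first_seen": window[0].timestamp,
        "last_seen": window[-1].timestamp,
    }


def consolidate_events(events, window_seconds):
    """Merge events per (element_key, kind) that start within window_seconds of each other."""
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        grouped[(event.element_key, event.kind)].append(event)

    consolidated = []
    for (key, kind), group in grouped.items():
        window = [group[0]]
        for event in group[1:]:
            if event.timestamp - window[0].timestamp <= window_seconds:
                window.append(event)
            else:
                # The current window is closed; start a new one at this event.
                consolidated.append(_summarize(key, kind, window))
                window = [event]
        consolidated.append(_summarize(key, kind, window))
    return consolidated
```

An event-based action, such as sending one notification per consolidated event instead of one per raw event, could then iterate over the returned list.
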

Description

RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 18/626,256, filed Apr. 3, 2024 (now U.S. Pat. No. 12,499,262), which is incorporated herein by reference in its entirety.

BACKGROUND

A data platform may perform an ingestion procedure to collect or absorb data into object storage. For example, from a streaming source, a data platform may perform continuous ingestion. In contrast, from a batch source, the data platform may perform periodic or triggered ingestion. Data platforms may make data available for further use, such as by exposing data application programming interfaces (APIs). A system may use an API to request and receive a dataset from the data platform, which may be used for generating a visualization, generating one or more metrics, or training a model, among other examples.

SUMMARY

Some implementations described herein relate to a system for data management. The system may include one or more memories and one or more processors communicatively coupled to the one or more memories. The one or more processors may be configured to receive, at a first type of data structure, a set of data elements of a data stream, wherein the set of data elements is associated with timing information identifying at least one of a first time at which each data element is received at the data structure or a second time at which each data element was generated. The one or more processors may be configured to forward the set of data elements to a second type of data structure and a third type of data structure, at least one of the second type of data structure or the third type of data structure being associated with a computational interface. The one or more processors may be configured to detect, based on forwarding the set of data elements to the second type of data structure and the third type of data structure, a trigger to audit the set of data elements. The one or more processors may be configured to transmit, to a computational element associated with the computational interface, an instruction to analyze information included in the set of data elements in connection with a data policy. The one or more processors may be configured to receive, from the computational element associated with the computational interface, a data report on the data policy. The one or more processors may be configured to identify a data management action based on the data report on the data policy. The one or more processors may be configured to transmit one or more commands to cause the data management action to be performed.

Some implementations described herein relate to a method. The method may include receiving, by a system and at a first type of data structure, a set of data elements of a data stream, wherein the set of data elements is associated with timing information identifying at least one of a first time at which each data element is received at the data structure or a second time at which each data element was generated. The method may include forwarding, by the system, the set of data elements to a second type of data structure and a third type of data structure, at least one of the second type of data structure or the third type of data structure being associated with a computational interface. The method may include receiving, based on forwarding the set of data elements to the second type of data structure and the third type of data structure, a query for machine learning training data. The method may include transmitting, by the system and to a computational element associated with a machine learning processing platform, information relating to the set of data elements to train a machine learning model, wherein the information includes timing information relating to a set of instances of each data element of the set of data elements.

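
The audit sequence in the system summary above (analyze against a data policy, receive a data report, identify a data management action, transmit commands) can be sketched as three small functions. The summary does not specify any particular policy, so this example assumes a hypothetical retention policy that flags elements whose generation time is older than a limit, and it assumes removal as the resulting action, which is one of the actions named in the claims. All function names, field names, and the report layout are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    key: str
    generated_at: float  # second time: when the element was generated
    received_at: float   # first time: when the element was received


def analyze_policy(elements, now, max_age_seconds):
    """Build a data report listing elements older than the (hypothetical) retention limit."""
    violations = [e.key for e in elements if now - e.generated_at > max_age_seconds]
    return {"policy": "retention", "checked": len(elements), "violations": violations}


def identify_action(report):
    """Map a data report to a data management action; None when the policy is satisfied."""
    if report["violations"]:
        return "remove"
    return None


def build_commands(action, report):
    """Turn the chosen action into one command per violating element."""
    if action is None:
        return []
    return [{"command": action, "key": key} for key in report["violations"]]


# Usage: one audit pass over two elements at time t = 100 seconds.
elements = [DataElement("a", 0.0, 1.0), DataElement("b", 90.0, 91.0)]
report = analyze_policy(elements, now=100.0, max_age_seconds=50.0)
commands = build_commands(identify_action(report), report)
```

Keeping the report separate from the action step mirrors the summary, in which the analysis is delegated to a computational element and the system only decides what to do once the report comes back.
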
Some implementations described herein relate to a non-transitory computer-readable medium that stores a set of instructions. The set of instructions, when executed by one or more processors of a system, may cause the system to receive a set of data elements from a data stream associated with a first type of data structure. The set of instructions, when executed by one or more processors of the system, may cause the system to store the set of data elements in a second type of data structure and a third type of data structure. The set of instructions, when executed by one or more processors of the system, may cause the system to detect, based on forwarding the set of data elements to the second type of data structure and the third type of data structure, a trigger to audit the set of data elements. The set of instructions, when executed by one or more processors of the system, may cause the system to transmit, to a computational element associated with the second type of data structure or the third type of data structure, an instruction to analyze information included in the set of data elements in connection with a data policy. The set of instructio