
US-20260129104-A1 - SYSTEMS AND METHODS FOR GENERATING A FILTERED DATA SET


Abstract

The present disclosure relates to generating a filtered data set. Data from a plurality of systems of record of a plurality of data source providers may be accessed. A master data set generated using the data accessed from the plurality of systems of record may be maintained. Restriction policies including one or more rules for restricting sharing of data may be maintained. A filtered data set may be generated for a data source provider responsive to an application of restriction policies of other data source providers to the master data set. The filtered data set may be provisioned.

Inventors

  • Oleksandr Oleinikov
  • Oleg Rogynskyy

Assignees

  • People.ai, Inc.

Dates

Publication Date
20260507
Application Date
20260105

Claims (20)

  1. A system comprising: one or more processors configured by machine-readable instructions to: access data from a plurality of systems of record of a plurality of data source providers; maintain a master data set generated using the data accessed from the plurality of systems of record; maintain, for each data source provider of the plurality of data source providers, a respective restriction policy including one or more rules for restricting sharing of data of a respective system of record of the data source provider; generate, for a first data source provider of the plurality of data source providers, a filtered data set responsive to applying, to the master data set, the respective restriction policy of each other data source provider of the plurality of data source providers, the filtered data set comprising a plurality of filtered node profiles; and provision the filtered data set for processing data of the first data source provider.
  2. The system of claim 1, wherein the one or more processors are further configured by machine-readable instructions to: identify, for the first data source provider, a first subset of the data of the master data set stored in a first system of record of the first data source provider; and wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by applying, to a remaining subset of the data of the master data set, the respective restriction policy of each of the other data source providers of the plurality of data source providers.
  3. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by: determining, for a field-value pair of a first master node profile maintained by the one or more processors, a list of data source providers from which the field-value pair was obtained; determining, for each data source provider included in the list, whether the field-value pair satisfies the respective restriction policy of the data source provider; and restricting the field-value pair from inclusion in a filtered node profile of the plurality of filtered node profiles for the first master node profile responsive to determining that the field-value pair satisfies the respective restriction policy of each data source provider included in the list.
  4. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by: determining, for a field-value pair of a master node profile maintained by the one or more processors, a list of data source providers from which the field-value pair was obtained; determining, for each data source provider included in the list, whether the field-value pair satisfies the respective restriction policy of the data source provider; and including the field-value pair in a filtered node profile of the plurality of filtered node profiles for the master node profile responsive to determining that the field-value pair does not satisfy at least one restriction policy of a data source provider included in the list.
  5. The system of claim 4, wherein the one or more processors are configured by machine-readable instructions to provision the filtered data set for processing data of the first data source provider by transmitting instructions to store data of the filtered node profile in a first system of record of the first data source provider.
  6. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by: determining, for a field-value pair of a master node profile maintained by the one or more processors, a count of data source providers from which the field-value pair was obtained; and including the field-value pair in a filtered node profile of the plurality of filtered node profiles for the master node profile responsive to determining the count of data source providers satisfies a threshold.
  7. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by generating the filtered data set responsive to applying, to the master data set, the restriction policy of the first data source provider.
  8. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by: determining, for a field-value pair of a master node profile, a sharing type of the field-value pair, the sharing type selected from a group comprising shareable, non-shareable, and public; and including the field-value pair in a filtered node profile of the plurality of filtered node profiles for the master node profile responsive to determining that the field-value pair has a sharing type of shareable or public.
  9. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set for the first data source provider without maintaining separate filtered data sets for other data source providers.
  10. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by: determining, for each field-value pair of a first master node profile maintained by the one or more processors, a respective list of data source providers from which the field-value pair was obtained; determining, for each data source provider included in the respective list, whether each field-value pair of the first master node profile satisfies the respective restriction policy of the data source provider; and restricting all data stored in the master node profile from inclusion in the filtered data set responsive to determining each field-value pair satisfies the respective restriction policy of each data source provider included in the list.
  11. The system of claim 1, wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by: determining, for each field-value pair of a first master node profile maintained by the one or more processors, a respective list of data source providers from which the field-value pair was obtained; determining, for each data source provider included in the respective list, whether each field-value pair of the first master node profile satisfies the respective restriction policy of the data source provider; and including each field-value pair in the master node profile in the filtered data set responsive to determining at least one field-value pair of the master node profile does not satisfy a respective restriction policy of at least one data source provider included in the list.
  12. The system of claim 1, wherein the one or more processors are further configured by machine-readable instructions to: generate, for a second data source provider, a second filtered data set responsive to applying, to the master data set, the respective restriction policy of each data source provider of the plurality of data source providers other than the second data source provider; and provision the second filtered data set for processing data of the second data source provider.
  13. The system of claim 1, wherein the one or more processors are configured to generate the filtered data set by: determining, for a field-value pair of a master node profile maintained by the one or more processors, a source from which the field-value pair was obtained; determining a source type of the source; and including the field-value pair of the master node profile in a filtered node profile of the plurality of filtered node profiles for the master node profile based on the source type.
  14. The system of claim 1, wherein the one or more processors are configured to calculate a value for a field-value pair of a filtered node profile of the filtered data set based on a confidence score for the value.
  15. The system of claim 1, wherein the one or more processors are configured to generate the filtered data set by generating the filtered data set responsive to applying, to the master data set, a restriction policy of a data set generation system including the one or more processors.
  16. The system of claim 1, wherein the one or more processors are configured to provision the filtered data set for processing data of the first data source provider by storing the filtered data set in memory, and wherein the one or more processors train a machine learning model based solely on data from the filtered data set.
  17. A method, comprising: accessing, by one or more processors, data from a plurality of systems of record of a plurality of data source providers; maintaining, by the one or more processors, a master data set generated using the data accessed from the plurality of systems of record; maintaining, by the one or more processors, for each data source provider of the plurality of data source providers, a respective restriction policy including one or more rules for restricting sharing of data of a respective system of record of the data source provider; generating, by the one or more processors, for a first data source provider of the plurality of data source providers, a filtered data set responsive to applying, to the master data set, the respective restriction policy of each other data source provider of the plurality of data source providers, the filtered data set comprising a plurality of filtered node profiles; and provisioning, by the one or more processors, the filtered data set for processing data of the first data source provider.
  18. The method of claim 17, further comprising: identifying, by the one or more processors for the first data source provider, a first subset of the data of the master data set stored in a first system of record of the first data source provider; and wherein generating the filtered data set comprises generating, by the one or more processors, the filtered data set by applying, to a remaining subset of the data of the master data set, the respective restriction policy of each of the other data source providers of the plurality of data source providers.
  19. The method of claim 17, wherein generating the filtered data set for the first data source provider comprises generating, by the one or more processors, the filtered data set without maintaining separate filtered data sets for other data source providers.
  20. A method, comprising: accessing, by one or more processors, a plurality of record objects from a plurality of systems of record of a plurality of data source providers, the plurality of record objects corresponding to a plurality of entities; maintaining, by the one or more processors, for each entity of the plurality of entities, a master node profile including one or more field-value pairs obtained from at least one record object of the plurality of record objects and, for each field-value pair included in the master node profile, a respective identifier identifying a data source provider maintaining the at least one record object from which the field-value pair is obtained; maintaining, by the one or more processors, for each data source provider of the plurality of data source providers, a respective restriction policy including one or more rules for restricting sharing of field-value pairs included in record objects of a respective system of record of the data source provider; generating, by the one or more processors, for a first data source provider of the plurality of data source providers, a plurality of filtered node profiles corresponding to the plurality of master node profiles, each filtered node profile generated by: determining, for each field-value pair of a respective master node profile, a list of data source providers from which the field-value pair was obtained; determining, for each data source provider included in the list, whether the field-value pair satisfies a respective restriction policy of the data source provider; and i) restricting the field-value pair from inclusion in the filtered node profile responsive to determining that the field-value pair satisfies the respective restriction policy of each data source provider included in the list, or ii) including the field-value pair in the filtered node profile responsive to determining that the field-value pair does not satisfy the respective restriction policy of a data source provider included in the list; and provisioning, by the one or more processors, for the first data source provider, the plurality of filtered node profiles.
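
The per-field-value-pair test recited in claims 3, 4, and 20 can be illustrated with a minimal Python sketch. This is an illustration only, not the claimed implementation. The names `RestrictionPolicy`, `MasterNodeProfile`, and `build_filtered_profile` are hypothetical, and the sketch assumes that a pair "satisfies" a policy when any rule of that policy marks the pair as restricted, and that every provider in a pair's source list has an entry in `policies`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

FieldValue = Tuple[str, str]        # (field name, value)
Rule = Callable[[str, str], bool]   # assumed: True means "do not share this pair"


@dataclass
class RestrictionPolicy:
    """Hypothetical policy: a list of rules for restricting sharing."""
    rules: List[Rule] = field(default_factory=list)

    def is_satisfied_by(self, name: str, value: str) -> bool:
        # Assumption: the pair satisfies the policy if any rule restricts it.
        return any(rule(name, value) for rule in self.rules)


@dataclass
class MasterNodeProfile:
    """Hypothetical profile: each pair maps to the providers it came from."""
    entity_id: str
    sources: Dict[FieldValue, Set[str]] = field(default_factory=dict)


def build_filtered_profile(
    profile: MasterNodeProfile,
    policies: Dict[str, RestrictionPolicy],
) -> List[FieldValue]:
    """Keep a pair unless every provider that supplied it restricts it."""
    filtered: List[FieldValue] = []
    for (name, value), providers in profile.sources.items():
        restricted_by_all = all(
            policies[p].is_satisfied_by(name, value) for p in providers
        )
        if not restricted_by_all:
            filtered.append((name, value))
    return filtered


# Example: provider B restricts e-mail and phone, provider C restricts phone only.
policies = {
    "B": RestrictionPolicy([lambda n, v: n in ("email", "phone")]),
    "C": RestrictionPolicy([lambda n, v: n == "phone"]),
}
profile = MasterNodeProfile(
    "entity-1",
    {
        ("email", "a@example.com"): {"B", "C"},
        ("phone", "555-0100"): {"B", "C"},
    },
)
print(build_filtered_profile(profile, policies))
```

In this sketch a pair with an empty source list is treated as restricted, because `all()` over an empty sequence is true. That is a choice made for the sketch, not something the claims state.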

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 18/594,998, filed Mar. 4, 2024, which claims the benefit of and priority to U.S. patent application Ser. No. 17/853,797, filed Jun. 29, 2022, which claims the benefit of and priority to U.S. Provisional Patent Application No. 63/217,026, filed Jun. 30, 2021, and is a continuation-in-part of U.S. patent application Ser. No. 16/694,253, filed Nov. 25, 2019, which is a continuation of U.S. patent application Ser. No. 16/399,679, filed Apr. 30, 2019, which claims the benefit of and priority to U.S. Provisional Patent Application No. 62/747,452, filed Oct. 18, 2018, U.S. Provisional Patent Application No. 62/725,999, filed Aug. 31, 2018, and U.S. Provisional Patent Application No. 62/676,187, filed May 24, 2018, each of which is incorporated herein by reference for all purposes.

BACKGROUND

An organization may attempt to manage or maintain a system of record associated with electronic communications at the organization. The system of record can include information such as contact information, logs, and other data associated with the electronic activities. Data regarding the electronic communications can be transmitted between computing devices associated with one or more organizations using one or more transmission protocols, channels, or formats, and can contain various types of information. For example, the electronic communication can include information about a sender of the electronic communication, a recipient of the electronic communication, and content of the electronic communication. The information regarding the electronic communication can be input into a record being managed or maintained by the organization.
However, due to the large volume of heterogeneous electronic communications transmitted between devices and the challenges of manually entering data, inputting the information regarding each electronic communication into a system of record can be challenging, time consuming, and error prone.

SUMMARY

One aspect of the present disclosure relates to a system. The system may comprise one or more processors configured by machine-readable instructions to access data from a plurality of systems of record of a plurality of data source providers; maintain a master data set generated using the data accessed from the plurality of systems of record; maintain, for each data source provider of the plurality of data source providers, a respective restriction policy including one or more rules for restricting sharing of data of a respective system of record of the data source provider; generate, for a first data source provider of the plurality of data source providers, a filtered data set responsive to applying, to the master data set, the respective restriction policy of each other data source provider of the plurality of data source providers, the filtered data set comprising a plurality of filtered node profiles; and provision the filtered data set for processing data of the first data source provider.

In some implementations, the one or more processors are further configured by machine-readable instructions to identify, for the first data source provider, a first subset of the data of the master data set stored in a first system of record of the first data source provider; wherein the one or more processors are configured by machine-readable instructions to generate the filtered data set by applying, to a remaining subset of the data of the master data set, the respective restriction policy of each of the other data source providers of the plurality of data source providers.
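
The split between a first subset and a remaining subset described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the disclosed implementation. The `Record` type, the `generate_filtered_data_set` function, and the modeling of each policy as a callable that returns `True` for restricted data are all hypothetical. The sketch also assumes that the first provider's own data passes through unfiltered, and that remaining data is kept when at least one contributing provider does not restrict it.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical unit of master data with the providers that supplied it."""
    field: str
    value: str
    providers: FrozenSet[str]


# Assumed policy shape: returns True when the record must not be shared.
Policy = Callable[[Record], bool]


def generate_filtered_data_set(
    master: List[Record],
    policies: Dict[str, Policy],
    first_provider: str,
) -> List[Record]:
    # First subset: data already stored in the first provider's system of record.
    own = [r for r in master if first_provider in r.providers]
    # Remaining subset: data contributed only by other providers.
    remaining = [r for r in master if first_provider not in r.providers]
    # Apply the other providers' policies to the remaining subset only.
    shared = [
        r for r in remaining
        if not all(policies[p](r) for p in r.providers)
    ]
    return own + shared


master = [
    Record("email", "a@example.com", frozenset({"A"})),
    Record("phone", "555-0100", frozenset({"B"})),
    Record("title", "CEO", frozenset({"B", "C"})),
]
policies = {
    "A": lambda r: True,               # A restricts everything it contributes
    "B": lambda r: True,               # B restricts everything it contributes
    "C": lambda r: r.field == "phone"  # C restricts only phone numbers
}
for record in generate_filtered_data_set(master, policies, "A"):
    print(record.field, record.value)
```

Provider A's own policy is never consulted here because only the other providers' policies are applied to the remaining subset, which mirrors the "each other data source provider" wording above.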
In some implementations, the one or more processors are configured by machine-readable instructions to generate the filtered data set by determining, for a field-value pair of a first master node profile maintained by the one or more processors, a list of data source providers from which the field-value pair was obtained; determining, for each data source provider included in the list, whether the field-value pair satisfies the respective restriction policy of the data source provider; and restricting the field-value pair from inclusion in a filtered node profile of the plurality of filtered node profiles for the first master node profile responsive to determining that the field-value pair satisfies the respective restriction policy of each data source provider included in the list.

In some implementations, the one or more processors are configured by machine-readable instructions to generate the filtered data set by determining, for a field-value pair of a master node profile maintained by the one or more processors, a list of data source providers from which the field-value pair was obtained; determining, for each data source provider included in the list, whether the field-value pair satisfies the respective restriction policy of the data source provider; and including the field-value pair in a filtered node profile of