US-12625870-B2 - Systems and methods for associating data entries
Abstract
In one embodiment, a first entry in a first database is modified to include data from a highest-ranked one of one or more available data tables that correspond to the first entry. Each of one or more character fields of the modified first entry is converted into a respective one or more first-entry tokens, and each of one or more character fields of each of a plurality of second entries in a second database is converted into a respective one or more second-entry tokens. The first-entry tokens are compared to the second-entry tokens, and, in response to the comparison, it is determined whether the first entry matches one of the second entries. In response to determining that the first entry matches one of the second entries, the first entry and the matching second entry are associated with one another in one or both of the first and second databases.
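As an illustration of the tokenize-and-compare step described above, the following is a minimal sketch, not the patented implementation. It assumes that tokens are overlapping three-character substrings and that two fields are compared by the share of distinct tokens they have in common (Jaccard similarity). The function names `overlapping_tokens` and `token_overlap`, the token length `n`, and the lowercasing step are all hypothetical choices made for this example.

```python
def overlapping_tokens(field: str, n: int = 3) -> list[str]:
    """Split a character field into overlapping n-character tokens.

    The token length and the lowercasing are assumptions of this sketch.
    """
    text = field.strip().lower()
    if len(text) <= n:
        # Short fields yield a single token (or none if the field is empty).
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def token_overlap(first_field: str, second_field: str, n: int = 3) -> float:
    """Return the Jaccard similarity of the two fields' token sets."""
    a = set(overlapping_tokens(first_field, n))
    b = set(overlapping_tokens(second_field, n))
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, a misspelled vendor name such as "Mariott Hotel" still shares most of its tokens with "Marriott Hotel", which is the kind of near match that a character-by-character equality test would miss.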
Inventors
- Lu Zhang
- Nichole Haas
- Joshua Manoj
- Sri Raja Harshini Koka
Assignees
- SAP SE
Dates
- Publication Date
- 20260512
- Application Date
- 20241121
Claims (17)
- 1 . A method, comprising: loading first data entries from an expense report database and an itinerary database into a data entry store; converting, by a tokenizer module of a computing device, the first data entries in the data entry store into one or more first-entry tokens; assigning, by a weighting engine executed by the computing device, a weight to each of the one or more first-entry tokens based on a frequency of the first-entry tokens; filtering, by a filtering component of the computing device, the first data entries to remove the first data entries in expense reports that are not associated with a travel segment expense; converting, by the tokenizer, each of one or more character fields of each of a plurality of second entries in a second database into a respective one or more second-entry tokens and weighting the second-entry tokens based on a frequency of the second-entry tokens; comparing, by a comparison engine of the computing device, the first-entry tokens to the second-entry tokens; determining, by the computing device, whether the first data entries matches one of the second entries based on the comparing and weights of the first-entry tokens and the second-entry tokens, wherein determining that the first data entries matches one of the plurality of second entries is further in response to: each of a first plurality of character fields of the first data entries exactly matching a respective character field of the one of the plurality of second entries; and each of at least two of a second plurality of character fields of the first data entries at least partially matching a respective character field of the one of the plurality of second entries; associating, by the computing device, a first data entry with one of the second entries in response to determining that the first data entry matches the one of the second entries; and determining that the first data entry matches one of the plurality of second entries.
- 2 . The method of claim 1 wherein the filtering removes first data entries that have been submitted but have not been approved.
- 3 . The method of claim 1 wherein the filtering removes first data entries that have not been submitted for approval.
- 4 . The method of claim 1 wherein the filtering removes first data entries that are already associated with a travel segment entry in the itinerary database.
- 5 . The method of claim 1 wherein converting each of one or more character fields of each of the plurality of second entries in the second database into a respective one or more second-entry tokens includes converting each of the one or more character fields into a respective plurality of overlapping second-entry tokens.
- 6 . The method of claim 1 , wherein determining that the first data entries match one of the plurality of second entries is further in response to: each of a first plurality of character fields of the first data entries exactly matching a respective character field of the one of the plurality of second entries; each of at least one of a second plurality of character fields of the first data entries at least partially matching a respective character field of the one of the plurality of second entries; and each of at least one of a third plurality of character fields of the first data entries at least partially matching a respective character field of the one of the plurality of second entries.
- 7 . A system, comprising: one or more processors; and a non-transitory machine-readable medium storing a program executable by the one or more processors, the program comprising sets of instructions for: loading first data entries from an expense report database and an itinerary database into a data entry store; converting, by a tokenizer module, the first data entries in the data entry store into one or more first-entry tokens; assigning, by a weighting engine, a weight to each of the one or more first-entry tokens based on a frequency of the first-entry tokens; filtering, by a filtering component, the first data entries to remove the first data entries in expense reports that are not associated with a travel segment expense; converting, by the tokenizer, each of one or more character fields of each of a plurality of second entries in a second database into a respective one or more second-entry tokens and weighting the second-entry tokens based on a frequency of the second-entry tokens; comparing, by a comparison engine using the one or more processors, the first-entry tokens to the second-entry tokens; determining, by the one or more processors, whether the first data entries matches one of the second entries based on the comparing and weights of the first-entry tokens and the second-entry tokens, wherein determining that the first data entries matches one of the plurality of second entries is further in response to: each of a first plurality of character fields of the first data entries exactly matching a respective character field of the one of the plurality of second entries; and each of at least two of a second plurality of character fields of the first data entries at least partially matching a respective character field of the one of the plurality of second entries; associating, by the one or more processors, a first data entry with one of the second entries in response to determining that the first data entry matches the one of the second entries; and determining, by the one or more processors, that the first data entry matches one of the plurality of second entries.
- 8 . The system of claim 7 wherein the filtering removes first data entries that have been submitted but have not been approved.
- 9 . The system of claim 7 wherein the filtering removes first data entries that have not been submitted for approval.
- 10 . The system of claim 7 wherein the filtering removes first data entries that are already associated with a travel segment entry in the itinerary database.
- 11 . The system of claim 7 wherein converting each of one or more character fields of each of the plurality of second entries in the second database into a respective one or more second-entry tokens includes converting each of the one or more character fields into a respective plurality of overlapping second-entry tokens.
- 12 . The system of claim 7 , wherein determining that the first data entries match one of the plurality of second entries is further in response to: each of a first plurality of character fields of the first data entries exactly matching a respective character field of the one of the plurality of second entries; each of at least one of a second plurality of character fields of the first data entries at least partially matching a respective character field of the one of the plurality of second entries; and each of at least one of a third plurality of character fields of the first data entries at least partially matching a respective character field of the one of the plurality of second entries.
- 13 . A non-transitory machine-readable medium storing a program executable by at least one processor of a computer, the program comprising sets of instructions for: loading first data entries from an expense report database and an itinerary database into a data entry store; converting, by a tokenizer module of the computer, the first data entries in the data entry store into one or more first-entry tokens; assigning, by a weighting engine, a weight to each of the one or more first-entry tokens based on a frequency of the first-entry tokens; filtering, by a filtering component, the first data entries to remove the first data entries in expense reports that are not associated with a travel segment expense; converting, by the tokenizer of the computer, each of one or more character fields of each of a plurality of second entries in a second database into a respective one or more second-entry tokens and weighting the second-entry tokens based on a frequency of the second-entry tokens; comparing, by the computer, the first-entry tokens to the second-entry tokens; determining, by the computer, whether the first data entries matches one of the second entries based on the comparing and weights of the first-entry tokens and the second-entry tokens, wherein determining that the first data entries matches one of the plurality of second entries is further in response to: each of a first plurality of character fields of the first data entries exactly matching a respective character field of the one of the plurality of second entries; and each of at least two of a second plurality of character fields of the first data entries at least partially matching a respective character field of the one of the plurality of second entries; associating, by the computer, a first data entry with one of the second entries in response to determining that the first data entry matches the one of the second entries; and determining, by the computer, that the first data entry matches one of the plurality of second entries.
- 14 . The non-transitory machine-readable medium of claim 13 wherein the filtering removes first data entries that have been submitted but have not been approved.
- 15 . The non-transitory machine-readable medium of claim 13 wherein the filtering removes first data entries that have not been submitted for approval.
- 16 . The non-transitory machine-readable medium of claim 13 wherein the filtering removes first data entries that are already associated with a travel segment entry in the itinerary database.
- 17 . The non-transitory machine-readable medium of claim 13 wherein converting each of one or more character fields of each of the plurality of second entries in the second database into a respective one or more second-entry tokens includes converting each of the one or more character fields into a respective plurality of overlapping second-entry tokens.
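The matching condition recited in the independent claims (every field in a first group matching exactly, and at least two fields in a second group matching at least partially, with tokens weighted by frequency) can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the inverse-log weighting formula, the 0.6 threshold, the three-character tokens, and the names `trigrams`, `token_weights`, `partial_match`, and `entries_match` are all hypothetical.

```python
import math
from collections import Counter


def trigrams(text: str) -> set[str]:
    """Overlapping three-character tokens of a field (assumed token shape)."""
    text = text.strip().lower()
    if not text:
        return set()
    return {text[i:i + 3] for i in range(max(len(text) - 2, 1))}


def token_weights(fields: list[str]) -> dict[str, float]:
    """Give frequent tokens less weight; the formula is an assumption."""
    counts = Counter(tok for field in fields for tok in trigrams(field))
    return {tok: 1.0 / math.log(2.0 + c) for tok, c in counts.items()}


def partial_match(a: str, b: str, weights: dict[str, float],
                  threshold: float = 0.6) -> bool:
    """Weighted share of common tokens; the threshold is hypothetical."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return False
    total = sum(weights.get(t, 1.0) for t in union)
    shared = sum(weights.get(t, 1.0) for t in ta & tb)
    return shared / total >= threshold


def entries_match(first: dict[str, str], second: dict[str, str],
                  exact_fields: list[str], partial_fields: list[str],
                  weights: dict[str, float]) -> bool:
    """All exact fields equal, and at least two partial fields similar."""
    if any(first.get(f) != second.get(f) for f in exact_fields):
        return False
    hits = sum(
        partial_match(first.get(f, ""), second.get(f, ""), weights)
        for f in partial_fields
    )
    return hits >= 2
```

In a usage scenario, the exact group might hold fields such as a date and a currency code, while the partial group might hold free-text fields such as a vendor name, a city, and a traveler name; which fields belong to which group is a design choice that the claims leave open.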
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation of U.S. patent application Ser. No. 17/531,546, filed Nov. 19, 2021, which is a continuation of U.S. patent application Ser. No. 16/201,807, filed Nov. 27, 2018. The entire contents of U.S. patent application Ser. Nos. 16/201,807 and 17/531,546 are incorporated herein by reference in their entirety for all purposes.
BACKGROUND
The present disclosure relates to computing and data processing, and in particular, to systems and methods for matching and associating data entries, for example, in respective databases.
The widespread adoption of computers for data processing has led to a number of challenges. One challenge stems from the desirability of linking, or otherwise associating, a data entry in one database with a data entry in another database, where both databases may be part of, or may be configured to communicate with one another via, a cloud or other computing system. For example, an entity (e.g., a company) may prefer that a data entry related to a member's (e.g., employee's) travel itinerary (e.g., airfare or hotel reservation) and stored in an itinerary (transactional) database be linked, or otherwise associated, with the corresponding entry in the member's expense report stored in an expense-reporting (analytical processing) database. These challenges are compounded, for example, where the corresponding data entries are entered into their respective databases at different times, sometimes days, weeks, or even months apart, and where each of one or more of the databases includes a large number of corresponding data entries that are not linked, or otherwise associated, with one another.
SUMMARY
Embodiments of the present disclosure pertain to systems and methods for associating data entries in respective databases. In one embodiment, the present disclosure pertains to systems and methods for matching and associating corresponding data entries in respective databases.
A first data entry in a first database is modified to include data from a highest-ranked one of one or more available data tables that correspond to the first data entry. Each of one or more character fields of the modified first data entry is converted into a respective one or more first-entry tokens, and each of one or more character fields of each of a plurality of second entries in a second database is converted into a respective one or more second-entry tokens. The first-entry tokens are compared to the second-entry tokens, and, in response to the comparison, it is determined whether the first data entry matches one of the second data entries. In response to determining that the first data entry matches one of the second data entries, the first data entry and the matching second data entry are associated with one another, for example, in one or both of the first and second databases.
The following detailed description and accompanying drawings provide a better understanding of the nature and advantages of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an architecture for associating a data entry in one database with a corresponding data entry in another database, according to an embodiment.
FIG. 2 illustrates a process for associating a data entry in one database with a corresponding data entry in another database, according to an embodiment.
FIG. 3 illustrates an example architecture for associating a data entry in one database with a corresponding data entry in another database, according to an embodiment.
FIG. 4 illustrates an example process that occurs before associating a data entry in one database with a corresponding data entry in another database, according to an embodiment.
FIG. 5 illustrates an example process for associating a data entry in one database with a data entry in another database, according to an embodiment.
FIG. 6 illustrates hardware of a special purpose computing machine configured to operate and to function according to the present disclosure.
DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present disclosure. Such examples and details are not to be construed as unduly limiting the elements of the claims or the claimed subject matter as a whole. It will be evident to one skilled in the art, based on the language of the different claims, that the claimed subject matter may include some or all of the features in these examples, alone or in combination, and may further include modifications and equivalents of the features and techniques described herein.
FIG. 1 illustrates an architecture 100 for associating a data entry in one database with a data entry in another database, according to an embodiment. Features and advantages of the present disclosure include automatically associating an entry in a record, such as an expense report, stored in one database, with an en