US-12619983-B2 - Machine learning classifier based on category modeling
Abstract
Provided are systems and methods that use machine learning to draw additional inferences about transaction records from transaction strings. The inferred data can be used to build a classification model configured to map transaction strings to predefined categories. In one example, a method may include receiving a file comprising transaction strings corresponding to a plurality of transaction records, executing a machine learning model on the transaction strings to identify a plurality of categories associated with the transaction strings, generating a classifier model that comprises patterns of keywords from the transaction strings mapped to the plurality of identified categories, respectively, and storing the classifier model in a data store.
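As an illustration of a classifier model made of keyword patterns mapped to categories, the following minimal Python sketch builds a keyword-to-category table from already-labeled transaction strings and applies it to new strings. It is only a sketch under stated assumptions: the abstract does not specify the model, so a simple frequency count stands in for the machine learning step, and the function names (`build_keyword_classifier`, `classify`), the category names, and the sample strings are all hypothetical.

```python
from collections import Counter, defaultdict


def build_keyword_classifier(labeled_strings, min_count=1):
    """Map each keyword to the category it co-occurs with most often.

    Assumption: a frequency count stands in for the unspecified
    machine learning model that identifies the mappings.
    """
    counts = defaultdict(Counter)
    for text, category in labeled_strings:
        # Count each keyword once per transaction string.
        for token in set(text.lower().split()):
            counts[token][category] += 1

    classifier = {}
    for token, per_category in counts.items():
        ranked = per_category.most_common(2)
        category, n = ranked[0]
        # Skip keywords that are tied between the top two categories.
        unambiguous = len(ranked) == 1 or ranked[1][1] < n
        if n >= min_count and unambiguous:
            classifier[token] = category
    return classifier


def classify(classifier, text, default="unknown"):
    """Assign the category that receives the most keyword votes."""
    votes = Counter(
        classifier[t] for t in text.lower().split() if t in classifier
    )
    return votes.most_common(1)[0][0] if votes else default


# Hypothetical labeled transaction strings (not taken from the patent).
labeled = [
    ("uber driver payout", "gig_income"),
    ("uber partner deposit", "gig_income"),
    ("acme corp payroll", "payroll"),
    ("acme payroll direct dep", "payroll"),
]
model = build_keyword_classifier(labeled)
```

Calling `classify(model, "ACME payroll")` on this sample data selects the hypothetical `payroll` category, while a string with no known keywords falls back to `default`.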
Inventors
- Jason Robinson
- Andrew Toloff
- Tyler Howard
- Nathan Crockett
- Amanda Miguel
- Marcel Crudele
Assignees
- STEADY PLATFORM LLC
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-07-19
Claims (14)
- 1. A computing system comprising: a data store configured to store a machine learning model; and a processor configured to: receive, via an application programming interface (API) call, a file comprising transaction strings corresponding to a plurality of transaction records; execute a machine learning model on the transaction strings to identify mappings between the transaction strings and a plurality of deposit source categories; assign a label, by the machine learning model, to each of the transaction strings corresponding to its associated deposit source category as indicated in the identified mappings; generate, by the machine learning model, a classifier model that comprises respective patterns of keywords from the transaction strings mapped to the plurality of deposit source categories, respectively, based on the identified mappings; store the classifier model in the data store; receive, via a second API call, a file including a plurality of additional transaction records, the plurality of additional transaction records corresponding to additional plurality of transaction strings; execute the classifier model on the plurality of additional transaction strings to identify one of the deposit source categories to assign to each of the additional transaction records and to generate an output file including each of the additional transaction records further labeled with the deposit source category identified by the classifier model; and execute a second machine learning model on the output file including the plurality of labeled additional transaction strings to identify a counterparty of each additional transaction strings.
- 2. The computing system of claim 1, wherein the processor is further configured to parse the plurality of transaction strings and remove variable features from the parsed transaction strings prior to executing the machine learning model on the transaction strings.
- 3. The computing system of claim 2, wherein the processor is configured to remove one or more of date values, non-word characters, and whitespaces, from the transaction strings to create cleaned transaction strings.
- 4. The computing system of claim 1, wherein the plurality of categories comprises a plurality of deposit sources, and the processor is configured to execute a machine learning classification model on the transaction strings to identify which deposit source from among the plurality of deposit sources is mapped to each transaction string, respectively.
- 5. The computing system of claim 1, wherein the processor is further configured to execute a third machine learning model on the plurality of labeled additional transaction strings to verify an income of a user associated with the plurality of additional transaction strings.
- 6. A method comprising: receiving, via an application programming interface (API) call, a file comprising transaction strings corresponding to a plurality of transaction records; executing a machine learning model on the transaction strings to identify mappings between the transaction strings and a plurality of categories; assigning a label, by the machine learning model, to each of the transaction strings corresponding to its associated deposit source category as indicated in the identified mappings; generating, by the machine learning model, a classifier model that comprises respective patterns of keywords from the transaction strings mapped to the plurality of categories, respectively, based on the identified mappings; storing the classifier model in a data store; receiving, via a second API call, a file including a plurality of additional transaction records, the plurality of additional transaction records corresponding to additional plurality of transaction strings; executing the classifier model on the plurality of additional transaction strings to identify one of the deposit source categories to assign to each of the additional transaction records and to generate an output file including each of the additional transaction records further labeled with the deposit source category identified by the classifier model; and executing a second machine learning model on the output file including the plurality of labeled additional transaction strings to identify a counterparty of each additional transaction strings.
- 7. The method of claim 6, wherein the method further comprises parsing the plurality of transaction strings and removing variable features from the parsed transactions strings prior to executing the machine learning model on the transaction strings.
- 8. The method of claim 7, wherein the removing comprises deleting one or more of date values, non-word characters, and whitespaces, from the transaction strings, to create cleaned transaction strings.
- 9. The method of claim 6, wherein the plurality of categories comprises a plurality of deposit sources, and the executing comprises executing the classifier model on the transaction strings to identify which deposit source from among the plurality of deposit sources is mapped to each transaction string, respectively.
- 10. The method of claim 6, wherein the method further comprises executing a third machine learning model on the plurality of labeled additional transaction strings and the identified counterparties of the additional transaction strings to verify an income of a user associated with the plurality of additional transaction strings.
- 11. A non-transitory computer-readable medium comprising instructions which when executed by a computer cause a processor to perform a method comprising: receiving, via an application programming interface (API) call, a file comprising transaction strings corresponding to a plurality of transaction records; executing a machine learning model on the transaction strings to identify mappings between the transaction strings and a plurality of categories; assigning a label, by the machine learning model, to each of the transaction strings corresponding to its associated deposit source category as indicated in the identified mappings; generating, by the machine learning model, a classifier model that comprises respective patterns of keywords from the transaction strings mapped to the plurality of categories, respectively, based on the identified mappings; storing the classifier model in a data store; receiving, via a second API call, a file including a plurality of additional transaction records, the plurality of additional transaction records corresponding to additional plurality of transaction strings; executing the classifier model on the plurality of additional transaction strings to identify one of the deposit source categories to assign to each of the additional transaction records and to generate an output file including each of the additional transaction records further labeled with the deposit source category identified by the classifier model; and executing a second machine learning model on the output file including the plurality of labeled additional transaction strings to identify a counterparty of each additional transaction strings.
- 12. The non-transitory computer-readable medium of claim 11, wherein the method further comprises parsing the plurality of transaction strings and removing variable features from the parsed transactions strings prior to executing the machine learning model on the transaction strings.
- 13. The non-transitory computer-readable medium of claim 12, wherein the removing comprises deleting one or more of date values, non-word characters, and whitespaces, from the transaction strings, to create cleaned transaction strings.
- 14. The non-transitory computer-readable medium of claim 11, wherein the plurality of categories comprises a plurality of deposit sources, and the executing comprises executing a machine learning classification model on the transaction strings to identify which deposit source from among the plurality of deposit sources is mapped to each transaction string, respectively.
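The cleaning step recited in claims 2-3, 7-8, and 12-13 (removing date values, non-word characters, and whitespaces before the model runs) can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the function name `clean_transaction_string`, the specific date pattern (numeric `MM/DD` or `MM/DD/YYYY` with `/` or `-` separators), and the choice to collapse whitespace runs to single spaces rather than delete all whitespace are assumptions made for the example.

```python
import re

# Assumed date shapes: 07/19/2022, 7-19, 07-19-22 (other formats are not handled).
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}(?:[/-]\d{2,4})?\b")
# Anything that is neither a word character nor whitespace.
NON_WORD_RE = re.compile(r"[^\w\s]")
# Runs of one or more whitespace characters.
WHITESPACE_RE = re.compile(r"\s+")


def clean_transaction_string(raw):
    """Strip variable features from a raw transaction string."""
    text = DATE_RE.sub(" ", raw)        # remove date values first
    text = NON_WORD_RE.sub(" ", text)   # then punctuation and symbols
    # Assumption: collapse extra whitespace instead of deleting every space,
    # so that keywords stay separated for later matching.
    return WHITESPACE_RE.sub(" ", text).strip()
```

Dates are removed before punctuation so that the separators inside a date are still present when the date pattern is applied.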
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
The present application is a continuation-in-part (CIP) of U.S. patent application Ser. No. 17/342,622, filed on Jun. 9, 2021, in the United States Patent and Trademark Office, which is fully incorporated herein by reference for all purposes.
BACKGROUND
When a financial account is used in a financial transaction, for example, a payment to another, receipt of funds, transfer of funds, etc., a record is typically created by the financial institution that issued the financial account. The transaction record may include a transaction string embodied as a collection of text that provides details about a financial transaction. In particular, the transaction string may include some helpful features about the transaction such as a date of the transaction, a location of the transaction, a type or purpose of the transaction, and in some cases, an identifier of a counterparty entity (e.g., the entity that owns the other account) involved in the transaction.
Transaction strings in raw format often contain a significant amount of variability. For example, two payment transactions from an employer to an employee may cause the financial institution to create two different transaction strings with significantly different content such as different substrings, different account identifiers, different dates, different locations, and the like. The variability within the transaction strings makes it difficult to categorize transactions together for further processing.
BRIEF DESCRIPTION OF THE DRAWINGS
Features and advantages of the example embodiments, and the manner in which the same are accomplished, will become more readily apparent with reference to the following detailed description taken in conjunction with the accompanying drawings.
FIGS. 1A-1B are diagrams illustrating a host platform that is configured for categorizing transactions in accordance with example embodiments.
FIGS. 2A-2B are diagrams illustrating a process of cleaning transaction strings in accordance with example embodiments.
FIGS. 3A-3C are diagrams illustrating a process of building a classifier via machine learning in accordance with example embodiments.
FIG. 4 is a diagram illustrating batch processing of transaction records using the classifier in accordance with example embodiments.
FIG. 5 is a diagram illustrating a method for generating a classifier for classifying transaction strings into categories in accordance with an example embodiment.
FIG. 6 is a diagram illustrating an example of a computing system for use in any of the examples described herein.
Throughout the drawings and the detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated or adjusted for clarity, illustration, and/or convenience.
DETAILED DESCRIPTION
In the following description, details are set forth to provide a reader with a thorough understanding of various example embodiments. It should be appreciated that modifications to the embodiments will be readily apparent to those skilled in the art, and generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the disclosure. Moreover, in the following description, numerous details are set forth as an explanation. However, one of ordinary skill in the art should understand that embodiments may be practiced without the use of these specific details. In other instances, well-known structures and processes are not shown or described so as not to obscure the description with unnecessary detail. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Financial transactions (i.e., transactions) are events that represent the movement of money from one party to another. A bank account may include a document, file, spreadsheet, printout, digital user interface, or the like, with the history of transactions over a period of time in the form of transaction records. Transaction records can include several pieces of data, for example, a date of the transaction, an amount of the transaction, whether it was a credit or debit, and the transaction string. As described herein, a transaction string is a collection of text that provides additional detail about the transaction and might include additional date information, location information, and ideally a description of the other entity (or “counterparty”) involved in the transaction (aside from the owner of the financial account). Transaction strings are typically unique to a particular financial institution that creates the transaction string. Each financial institution may use different content, different ordering, different variability, and the like, within a transaction string. Mul