US-20260127684-A1 - Transaction Data Processing Systems and Methods
Abstract
A method comprises: determining a candidate financial record associated with a transaction between a first accounting entity and a second entity; determining, using a numerical representation generation model, a numerical representation of the candidate financial record, the numerical representation generation model having been trained on a corpus generated from historical transaction records; providing, to a transaction attribute prediction model, the numerical representation of the candidate financial record, the transaction attribute prediction model having been trained using a dataset of previously reconciled financial records, each associated with a respective first transaction attribute; determining, by the transaction attribute prediction model, at least one first transaction attribute associated with the candidate financial record.
Inventors
- Delia Rusu
- Hayden Jeune
- Rebecca Dridan
- Soon-Ee Cheah
- Brett Calcott
- Zhimin Wang
- Quentin-Gabriel Thurier
- Fubiao Qin
- Niklas Patrick Pechan
Assignees
- XERO LIMITED
Dates
- Publication Date: 20260507
- Application Date: 20251219
- Priority Date: 20201223
Claims (20)
- 1. A computer-implemented method, comprising: determining, by an accounting system comprising memory, and one or more processors configured to execute instructions stored in the memory, a set of example financial records, each example financial record being associated with a transaction between a first example entity and a second example entity, and each example financial record having a first label identifying the first example entity; for each example financial record of the set of example financial records: determining, by the accounting system, an example character string based on the example financial record; determining, by the accounting system, one or more first example substrings from the example character string; determining, by the accounting system, one or more second example substrings from the example character string, wherein the one or more second example substrings are different from the one or more first example substrings; generating, by the accounting system, a first match score for each of the one or more first example substrings by comparing the one or more first example substrings to the first label; generating, by the accounting system, a second match score for each of the one or more second example substrings by comparing the one or more second example substrings to the first label; determining, by the accounting system, a best match score based on the one or more first match scores and the one or more second match scores; determining, by the accounting system, that the best match score exceeds a threshold match score; responsive to the best match score exceeding a threshold match score, annotating by the accounting system the example financial record with an entity identifier, the entity identifier derived from the example substring associated with the best match score; determining, by the accounting system, a training dataset comprising the annotated example financial records, each annotated example financial record comprising a character string of a financial record and a label entity identifier; training, by the accounting system, an entity prediction model using the training dataset to provide a trained entity prediction model, wherein the trained entity prediction model is configured to provide a predicted entity identifier of a candidate financial record; and providing the trained entity prediction model for use in predicting entity identifiers of candidate financial records.
- 2. The method of claim 1, further comprising: determining a position indicator for the substring associated with the best match score, wherein the entity identifier comprises the position indicator.
- 3. The method of claim 1, wherein the entity identifier comprises the substring associated with the best match score.
- 4. The method of claim 1, wherein determining a best match score based on the one or more first match scores comprises determining a highest first match score of the one or more first match scores as the best match score.
- 5. The method of claim 1, further comprising: for each example financial record of the set of example financial records: determining a highest first match score of the one or more first match scores; and determining a highest second match score of the one or more second match scores, wherein determining the best match score based on the one or more first match scores and the one or more second match scores comprises determining the best match score as a higher of the highest first match score and the highest second match score.
- 6. The method of claim 1, wherein the one or more first substrings are tokens.
- 7. The method of claim 6, wherein the one or more second substrings are n-grams.
- 8. The method of claim 1, wherein the one or more first substrings are n-grams.
- 9. The method of claim 1, wherein generating the first match score for each of the one or more first substrings by comparing the one or more first substrings to the first label comprises: determining a similarity score between each of the one or more first substrings and the first label using fuzzy matching.
- 10. The method of claim 1, wherein training the entity prediction model using the training dataset comprises: for each of the annotated example financial records of the training dataset: determining, by the accounting system, one or more first substrings from the character string of the annotated example financial record; generating, by the accounting system, a first set of tokens by tokenising each of the one or more first substrings; determining, by the accounting system, one or more second substrings from the character string of the annotated example financial record, wherein the one or more second substrings are different from the one or more first substrings; generating, by the accounting system, a second set of tokens by tokenising each of the one or more second substrings; providing, by the accounting system, the first set of tokens and the second set of tokens to a numerical representation generation model of the accounting system to generate a numerical representation of the annotated example financial record; providing, by the accounting system, the numerical representation of the annotated example financial record and the respective label entity identifier as an input to the entity prediction model; determining, by the accounting system and as an output of the entity prediction model, a predicted entity identifier; comparing, by the accounting system, the predicted entity identifier with the respective label entity identifier; and determining, by the accounting system, one or more weights of the entity prediction model based on the comparing.
- 11. The method of claim 1, further comprising: determining, by the accounting system, a candidate financial record associated with a transaction between a first entity and a second entity; using, by the accounting system, the trained entity prediction model to generate a predicted entity identifier for the candidate financial record based on the candidate financial record.
- 12. The method of claim 1, further comprising: using the predicted entity identifier, by the accounting system, to: (i) reconcile the candidate financial record with a respective accounting record of the accounting system; or (ii) create a new accounting record in the accounting system.
- 13. The method of claim 11, wherein using the trained entity prediction model to generate a predicted entity identifier for the candidate financial record comprises: determining one or more first substrings from a character string of the candidate financial record; determining one or more second substrings from the character string of the candidate financial record, wherein the one or more second substrings from the character string of the candidate financial record are different from the one or more first substrings from the character string of the candidate financial record; providing the one or more first substrings from the character string of the candidate financial record and the one or more second substrings from the character string of the candidate financial record to a numerical representation generation model to generate a numerical representation of the candidate financial record; providing the numerical representation of the candidate financial record as an input to the trained entity prediction model; and determining, as an output of the trained entity prediction model, the predicted entity identifier.
- 14. The method of claim 13, further comprising: comparing the predicted entity identifier with a set of entity identifiers; and determining one or more suggested entity identifiers based on the comparing.
- 15. The method of claim 1, wherein the label entity identifier comprises an entity identifier substring extracted from the character string of the candidate financial record, and/or a label position indicator of the entity identifier substring within the character string of the candidate financial record.
- 16. The method of claim 1, wherein the entity prediction model is a multi-class classifier.
- 17. A system, comprising: one or more processors; and memory comprising computer executable instructions, which when executed by the one or more processors, cause the system to: determine a set of example financial records, each example financial record being associated with a transaction between a first example entity and a second example entity, and each example financial record having a first label identifying the first example entity; for each example financial record of the set of example financial records: determine an example character string based on the example financial record; determine one or more first example substrings from the example character string; determine one or more second example substrings from the example character string, wherein the one or more second example substrings are different from the one or more first example substrings; generate a first match score for each of the one or more first example substrings by comparing the one or more first example substrings to the first label; generate a second match score for each of the one or more second example substrings by comparing the one or more second example substrings to the first label; determine a best match score based on the one or more first match scores and the one or more second match scores; determine that the best match score exceeds a threshold match score; responsive to the best match score exceeding a threshold match score, annotate the example financial record with an entity identifier, the entity identifier derived from the example substring associated with the best match score; determine a training dataset comprising the annotated example financial records, each annotated example financial record comprising a character string of a financial record and a label entity identifier; train an entity prediction model using the training dataset to provide a trained entity prediction model, wherein the trained entity prediction model is configured to provide a predicted entity identifier of a candidate financial record; and provide the trained entity prediction model for use in predicting entity identifiers of candidate financial records.
- 18. The system of claim 17 further configured to: determine a candidate financial record associated with a transaction between a first entity and a second entity; and use the trained entity prediction model to generate a predicted entity identifier for the candidate financial record based on the candidate financial record.
- 19. A non-transient computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform operations including: determining a set of example financial records, each example financial record being associated with a transaction between a first example entity and a second example entity, and each example financial record having a first label identifying the first example entity; for each example financial record of the set of example financial records: determining an example character string based on the example financial record; determining one or more first example substrings from the example character string; determining one or more second example substrings from the example character string, wherein the one or more second example substrings are different from the one or more first example substrings; generating a first match score for each of the one or more first example substrings by comparing the one or more first example substrings to the first label; generating a second match score for each of the one or more second example substrings by comparing the one or more second example substrings to the first label; determining a best match score based on the one or more first match scores and the one or more second match scores; determining that the best match score exceeds a threshold match score; responsive to the best match score exceeding a threshold match score, annotating the example financial record with an entity identifier, the entity identifier derived from the example substring associated with the best match score; determining a training dataset comprising the annotated example financial records, each annotated example financial record comprising a character string of a financial record and a label entity identifier; training an entity prediction model using the training dataset to provide a trained entity prediction model, wherein the trained entity prediction model is configured to provide a predicted entity identifier of a candidate financial record; and providing the trained entity prediction model for use in predicting entity identifiers of candidate financial records.
- 20. The non-transient computer-readable storage medium of claim 19 further comprising: determining a candidate financial record associated with a transaction between a first entity and a second entity; and using the trained entity prediction model to generate a predicted entity identifier for the candidate financial record based on the candidate financial record.
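As a non-authoritative illustration of the annotation steps recited in claims 1 to 9, the following minimal Python sketch derives tokens and n-grams from a record's character string, scores each against the first label, and annotates the record only when the best score exceeds a threshold. Everything specific here is an assumption made for illustration and is not taken from the claims: the function names, the 0.8 threshold, the choice of word bigrams as the n-grams, and the use of the standard library's `difflib.SequenceMatcher` ratio as the fuzzy matching measure.

```python
from difflib import SequenceMatcher


def tokens(text):
    """First substrings (assumed): whitespace-delimited tokens."""
    return text.split()


def word_ngrams(text, n=2):
    """Second substrings (assumed): runs of n consecutive tokens."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def match_score(substring, label):
    """Fuzzy similarity in [0, 1]; the SequenceMatcher ratio is a stand-in measure."""
    return SequenceMatcher(None, substring.lower(), label.lower()).ratio()


def annotate(record_text, label, threshold=0.8):
    """Return an annotated record, or None if no substring exceeds the threshold."""
    candidates = tokens(record_text) + word_ngrams(record_text, 2)
    if not candidates:
        return None
    best = max(candidates, key=lambda s: match_score(s, label))
    score = match_score(best, label)
    if score <= threshold:  # the best match score must exceed the threshold
        return None
    return {
        "text": record_text,
        "entity": best,                        # substring as entity identifier
        "position": record_text.find(best),    # start offset as position indicator
        "score": score,
    }
```

For a hypothetical statement line such as `"EFTPOS ACME HARDWARE 1234 AKL"` labelled `"Acme Hardware"`, the bigram `"ACME HARDWARE"` is the best match and becomes the entity identifier. Note that `str.find` locates the bigram only when the tokens are separated by single spaces in the original string; a production system would track offsets while tokenising.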
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
This application is a continuation of U.S. patent application Ser. No. 17/693,300, filed on Mar. 11, 2022, which is a continuation of International Patent Application Serial No. PCT/NZ2021/050151, filed on Aug. 25, 2021, which claims priority to Australian Patent Application Serial No. 2020904805, filed on Dec. 23, 2020, the entire contents of each incorporated herein by reference.
TECHNICAL FIELD
Embodiments generally relate to methods, systems, and computer-readable media for determining transaction attributes of financial records and, in some embodiments, for generating accounting records using the determined transaction attributes to allow for reconciliation of the financial records.
BACKGROUND
Reconciliation is a procedure for determining that the entries (accounting records) in an accounting system match corresponding entries in a financial record, such as a bank statement, or line items in a bank statement feed. When an accountant receives a financial record, such as a bank statement, the accountant has to analyse each entry in the bank statement to identify a corresponding account and account code, and potentially further attributes associated with the entry, to reconcile the entry with corresponding entries in the accounting system. However, financial records generated by financial systems often include entries with insufficiently particularised details, which makes it difficult to identify the information relevant for reconciliation. For example, an entry may not include the name of the payer; instead, it may include a general description of the nature of the transaction, such as taxes, drawings, or wages. Because of the great degree of variability among financial records of a financial system, reconciliation can be a difficult and time-consuming task, more so for a computer program configured to automatically reconcile the data.
A person may use their experience to identify the nature of transactions, but configuring a computer program to automatically identify the nature of a transaction, as well as the parties to the transaction, is a difficult task due to the lack of standards in providing descriptions for entries in bank statements.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present disclosure as it existed before the priority date of each claim of this application.
SUMMARY
Some embodiments relate to a method comprising: determining a candidate financial record associated with a transaction between a first accounting entity and a second entity; determining, using a numerical representation generation model, a numerical representation of the candidate financial record, the numerical representation generation model having been trained on a corpus generated from historical transaction records; providing, to a transaction attribute prediction model, the numerical representation of the candidate financial record, the transaction attribute prediction model having been trained using a dataset of previously reconciled financial records, each associated with a respective first transaction attribute; determining, by the transaction attribute prediction model, at least one first transaction attribute associated with the candidate financial record.
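The flow summarised above, from candidate record to numerical representation to predicted attribute, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: a hashed character-trigram vector stands in for the trained numerical representation generation model, and a nearest-centroid classifier stands in for the transaction attribute prediction model. The names `embed`, `train`, `predict`, `DIM` and `HISTORY`, and the sample records, are hypothetical.

```python
import math
import zlib

DIM = 512  # assumed vector size for the stand-in representation


def embed(text):
    """Stand-in numerical representation: L2-normalised hashed character trigrams."""
    padded = f" {text.lower()} "
    vec = [0.0] * DIM
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        vec[zlib.crc32(trigram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def train(reconciled):
    """Stand-in attribute model: one summed vector (centroid) per attribute."""
    centroids = {}
    for text, attribute in reconciled:
        acc = centroids.setdefault(attribute, [0.0] * DIM)
        for i, x in enumerate(embed(text)):
            acc[i] += x
    return centroids


def predict(centroids, text):
    """Return the attribute whose centroid is most cosine-similar to the record."""
    vec = embed(text)

    def cosine(centroid):
        norm = math.sqrt(sum(x * x for x in centroid))
        return sum(a * b for a, b in zip(centroid, vec)) / norm if norm else 0.0

    return max(centroids, key=lambda attribute: cosine(centroids[attribute]))


# Made-up "previously reconciled" records, each with a first transaction attribute.
HISTORY = [
    ("PAYROLL WAGES MARCH", "Wages"),
    ("WAGES WEEKLY PAYROLL", "Wages"),
    ("IRD GST PAYMENT", "Taxes"),
    ("IRD INCOME TAX", "Taxes"),
]
model = train(HISTORY)
```

Calling `predict(model, "WAGES APRIL PAYROLL")` then returns the attribute of the most similar historical records. A learned embedding and a trained classifier would replace both stand-ins in practice; the sketch only shows how the two stages connect.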
The method of some embodiments further comprises: providing, to the transaction attribute prediction model, numerical representations of each of a plurality of accounting entity specified first attributes; and wherein determining, by the transaction attribute prediction model, at least one first transaction attribute associated with the candidate financial record comprises: determining the first transaction attribute associated with the candidate financial record as being one of the plurality of accounting entity specified first attributes.
The method of some embodiments further comprises: determining, using the numerical representation generation model, a numerical representation of the accounting entity specified first attributes, the numerical representation generation model having been trained on the corpus generated from historical transaction records.
In some embodiments, the accounting entity specified first attributes comprise accounting entity defined first attributes. The accounting entity specified first attributes may comprise accounting system predefined first attributes.
The method of some embodiments further comprises sending, to a computing device, the determined at least one first transaction attribute for presentation on a user interface of a reconciliation application.
The method of some embodiments further comprises: receiving, from the computing device, approval of an approved first transaction attribute of the determined at least one first transaction attributes;