EP-4742137-A1 - MACHINE LEARNING BASED SYSTEMS AND METHODS FOR IDENTIFYING EQUIVALENT ENTITIES
Abstract
A machine learning based (ML-based) computing method and system for automatically identifying equivalent entities is disclosed. The method involves obtaining transaction data, standardizing data fields, and generating unique transaction numbers. Entities with identical transaction numbers are grouped into initial groups. An ML model, utilizing an apriori algorithm, eliminates certain groups to form intermediate groups based on frequent item sets. An iterative unionization process then merges intermediate groups to form resultant groups of equivalent entities. The method further includes assessing model accuracy, with optional re-training if accuracy falls below a threshold. Standardization steps include column name alignment, datetime and numeric conversion, currency conversion, intra-company removal, and write-off removal. Resultant groups are provided to end users via interfaces, allowing accurate, automated identification of equivalent entities in financial transactions.
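The first two steps named in the abstract, generating transaction numbers and grouping entities that share one, can be illustrated with a minimal Python sketch. The field names in `KEY_FIELDS`, the `|` separator, and the upper-casing used as a stand-in for standardization are all assumptions made for illustration; the disclosure lists candidate data fields but does not fix which ones are concatenated or how.

```python
from collections import defaultdict

# Hypothetical choice of fields; the disclosure does not specify which
# standardized fields make up a transaction number.
KEY_FIELDS = ("company_code", "clearing_document", "clearing_date")


def transaction_number(record, key_fields=KEY_FIELDS):
    """Concatenate the (already standardized) key fields into one string."""
    # strip/upper is only a placeholder for the pre-processing step.
    return "|".join(str(record[f]).strip().upper() for f in key_fields)


def initial_groups(records, key_fields=KEY_FIELDS):
    """Map each transaction number to the set of entity numbers sharing it."""
    groups = defaultdict(set)
    for record in records:
        groups[transaction_number(record, key_fields)].add(record["entity_number"])
    return dict(groups)


# Illustrative records only, not data from the disclosure.
records = [
    {"company_code": "1000", "clearing_document": "cd-77",
     "clearing_date": "2024-01-05", "entity_number": "E1"},
    {"company_code": "1000", "clearing_document": "CD-77",
     "clearing_date": "2024-01-05", "entity_number": "E2"},
    {"company_code": "1000", "clearing_document": "CD-78",
     "clearing_date": "2024-01-09", "entity_number": "E3"},
]
groups = initial_groups(records)
```

In this sketch, E1 and E2 fall into the same initial group because their concatenated key fields match after normalization, while E3 forms a group of its own.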
Inventors
- Kunwar, Anupam
- Roy Burman, Ramit
- Shrivastava, Apoorva
Assignees
- Highradius Corporation
Dates
- Publication Date
- 20260513
- Application Date
- 20251111
Claims (15)
- A machine-learning based (ML-based) computing method for automatically identifying one or more equivalent entities, the ML-based computing method comprising: obtaining, by one or more hardware processors, data associated with one or more financial transactions performed by one or more entities, from one or more databases, wherein the data comprise one or more data fields associated with at least one of: one or more transaction types, one or more company codes, one or more document types, one or more document numbers, one or more posting keys, one or more posting dates, one or more invoice dates, one or more clearing dates, one or more clearing documents, one or more invoice amounts, one or more entity names, and one or more entity numbers; pre-processing, by the one or more hardware processors, the data associated with the one or more financial transactions, wherein pre-processing the data comprises standardizing the one or more data fields; generating, by the one or more hardware processors, one or more transaction numbers for each of the one or more financial transactions by concatenating the pre-processed one or more data fields; grouping, by the one or more hardware processors, one or more entities having identical transaction numbers, into one or more initial groups; eliminating, by the one or more hardware processors, one or more groups from the one or more initial groups using a machine learning (ML) model to obtain one or more intermediate groups, wherein the ML model is configured to obtain the one or more intermediate groups using an association identified within the one or more initial groups based on frequent item sets within the one or more initial groups; comparing, by the one or more hardware processors, the one or more entities across the one or more intermediate groups to obtain one or more resultant groups using an iterative unionization process, wherein each of the one or more resultant groups comprises the one or more equivalent entities performing the one or more financial transactions; and providing, by the one or more hardware processors, the one or more resultant groups as an output to one or more end users through one or more user interfaces of one or more electronic devices associated with the one or more end users.
- The ML-based computing method of claim 1, wherein standardizing the data comprises at least one of: column name standardization, datetime and numeric conversion, handling of null records, currency conversion, intra-company removal, and write-off removal.
- The ML-based computing method of claim 2, further comprising: determining, by the one or more hardware processors, whether one or more datasets comprising the data having uniformity and clarity, providing data analysis and interpretation of the data, using the column name standardization, wherein the column name standardization comprise one or more information varies from one or more types of enterprise resource planning (ERP); standardizing, by the one or more hardware processors, date and numeric formats across the one or more data fields using the datetime and numeric conversion; filtering, by the one or more hardware processors, the null records based on at least one of: one or more seasonal trends and an impact of the null records with one or more values; standardizing, by the one or more hardware processors, the one or more financial transactions based on one or more functional currencies of one or more business units to synchronize one or more transactional data across one or more sources, for consistency of the data, using the currency conversion; filtering, by the one or more hardware processors, the one or more financial transactions between the one or more business units of the one or more entities, using the intra-company removal; and ending, by the one or more hardware processors, the one or more financial transactions where one or more irrelevant invoices using a single offset record by which one or more irrelevant entities are grouped, using the write-off removal.
- The ML-based computing method of claim 1, wherein eliminating the one or more groups from the one or more initial groups using the ML model to obtain the one or more intermediate groups, comprises: obtaining, by the one or more hardware processors, information associated with the one or more initial groups, as one or more inputs to the ML model, wherein the ML model is an unsupervised ML model comprising an apriori model; setting, by the one or more hardware processors, one or more values for one or more parameters associated with the one or more entities in the ML model, wherein the one or more parameters comprise at least one of: a support threshold, a confidence threshold, and a predefined length of grouping of the one or more entities; computing, by the one or more hardware processors, a support for each entity based on a frequency of occurrence of each entity within the one or more initial groups; computing, by the one or more hardware processors, a support for a combination of the one or more entities based on a frequency of occurrence of the combination of the one or more entities, iteratively until the predefined length of grouping is attained, wherein the combination of the one or more entities comprises grouping of the one or more entities having the support exceeding the support threshold; and computing, by the one or more hardware processors, a confidence value for each combination of the one or more entities using the support computed for each entity and the support computed for the combination of the one or more entities, wherein the one or more intermediate groups are obtained from the combination of the one or more entities having the confidence value exceeding the confidence threshold.
- The ML-based computing method of claim 1, wherein the iterative unionization process comprises: determining, by the one or more hardware processors, an intersection of each intermediate group with the one or more intermediate groups; performing, by the one or more hardware processors, a unionization process between each intersected intermediate group, to obtain the one or more resultant groups; comparing, by the one or more hardware processors, the one or more intermediate groups with the one or more resultant groups to determine whether a count of the one or more resultant groups is equal to a count of the one or more intermediate groups; and repeating, by the one or more hardware processors, the iterative unionization process until the count of the one or more resultant groups is equal to the count of the one or more intermediate groups, wherein the one or more resultant groups are determined as one or more inputs to the iterative unionization process when the count of the one or more resultant groups is not equal to the count of the one or more intermediate groups.
- The ML-based computing method of claim 1, further comprising assessing, by the one or more hardware processors, an accuracy of the ML model by comparing the one or more resultant groups with one or more historical entity grouping data.
- The ML-based computing method of claim 6, further comprising: re-training, by the one or more hardware processors, the ML model by adjusting the one or more values of the one or more parameters, when the accuracy of the ML model on obtaining the one or more intermediate groups, is below a predetermined accuracy threshold value; and processing, by the one or more hardware processors, the re-trained ML model with the adjusted one or more values of the one or more parameters to optimize the accuracy of the ML model.
- A machine-learning based (ML-based) computing system for automatically identifying one or more equivalent entities, the ML-based computing system comprising: one or more hardware processors; a memory coupled to the one or more hardware processors, wherein the memory comprises a plurality of subsystems in the form of programmable instructions executable by the one or more hardware processors, and wherein the plurality of subsystems comprises: a data obtaining subsystem configured to obtain data associated with one or more financial transactions performed by one or more entities, from one or more databases, wherein the data comprise one or more data fields associated with at least one of: one or more transaction types, one or more company codes, one or more document types, one or more document numbers, one or more posting keys, one or more posting dates, one or more invoice dates, one or more clearing dates, one or more clearing documents, one or more invoice amounts, one or more entity names, and one or more entity numbers; a data pre-processing subsystem configured to pre-process the data associated with the one or more financial transactions, wherein pre-processing of the data comprises standardizing the one or more data fields; an entity grouping subsystem configured to: generate one or more transaction numbers for each of the one or more financial transactions by concatenating the pre-processed one or more data fields; group one or more entities having identical transaction numbers, into one or more initial groups; and eliminate one or more groups from the one or more initial groups using a machine learning (ML) model to obtain one or more intermediate groups, wherein the ML model is configured to obtain the one or more intermediate groups using an association identified within the one or more initial groups based on frequent item sets within the one or more initial groups; an entity identifying subsystem configured to compare the one or more entities across the one or more intermediate groups to obtain one or more resultant groups using an iterative unionization process, wherein each of the one or more resultant groups comprises the one or more equivalent entities performing the one or more financial transactions; and an output subsystem configured to provide the one or more resultant groups as an output, to one or more end users through one or more user interfaces of one or more electronic devices associated with the one or more end users.
- The ML-based computing system of claim 8, wherein standardizing the data comprises at least one of: column name standardization, datetime and numeric conversion, handling of null records, currency conversion, intra-company removal, and write-off removal.
- The ML-based computing system of claim 9, wherein: the column name standardization comprise one or more information varies from one or more types of enterprise resource planning (ERP), wherein the column name standardization is configured to determine whether one or more datasets comprising the data having uniformity and clarity, providing data analysis and interpretation of the data; the datetime and numeric conversion is configured to standardize date and numeric formats across the one or more data fields; the null records are filtered based on at least one of: one or more seasonal trends and an impact of the null records with one or more values; the currency conversion is configured to standardize the one or more financial transactions based on one or more functional currencies of one or more business units to synchronize one or more transactional data across one or more sources, for consistency of the data; the intra-company removal is configured to filter the one or more financial transactions between the one or more business units of the one or more entities; and the write-off removal is configured to end the one or more financial transactions where one or more irrelevant invoices using a single offset record by which one or more irrelevant entities are grouped.
- The ML-based computing system of claim 8, wherein in eliminating the groups from the one or more initial groups using the ML model to obtain the one or more intermediate groups, the entity grouping subsystem is configured to: obtain information associated with the one or more initial groups, as one or more inputs to the ML model, wherein the ML model is an unsupervised ML model comprising an apriori model; set one or more values for one or more parameters associated with the one or more entities in the ML model, wherein the one or more parameters comprise at least one of: a support threshold, a confidence threshold, and a predefined length of grouping of the one or more entities; compute a support for each entity based on a frequency of occurrence of each entity within the one or more initial groups; compute a support for a combination of the one or more entities based on a frequency of occurrence of the combination of the one or more entities, iteratively until the predefined length of grouping is attained, wherein the combination of the one or more entities comprises grouping of the one or more entities having the support exceeding the support threshold; and compute a confidence value for each combination of the one or more entities using the support computed for each entity and the support computed for the combination of the one or more entities, wherein the one or more intermediate groups are obtained from the combination of the one or more entities having the confidence value exceeding the confidence threshold.
- The ML-based computing system of claim 8, wherein in the iterative unionization process, the entity identifying subsystem is configured to: determine an intersection of each intermediate group with the one or more intermediate groups; perform a unionization process between each intersected intermediate group, to obtain the one or more resultant groups; compare the one or more intermediate groups with the one or more resultant groups to determine whether a count of the one or more resultant groups is equal to a count of the one or more intermediate groups; and repeat the iterative unionization process until the count of the one or more resultant groups is equal to the count of the one or more intermediate groups, wherein the one or more resultant groups are determined as one or more inputs to the iterative unionization process when the count of the one or more resultant groups is not equal to the count of the one or more intermediate groups.
- The ML-based computing system of claim 8, further comprising an accuracy assessment subsystem configured to assess an accuracy of the ML model by comparing the one or more resultant groups with one or more historical entity grouping data.
- The ML-based computing system of claim 13, further comprising a re-training subsystem configured to: re-train the ML model by adjusting the one or more values of the one or more parameters, when the accuracy of the ML model on obtaining the one or more intermediate groups, is below a predetermined accuracy threshold value; and process the re-trained ML model with the adjusted one or more values of the one or more parameters to optimize the accuracy of the ML model.
- A non-transitory computer-readable storage medium having instructions stored therein that when executed by a hardware processor, cause the processor to execute operations of: obtaining data associated with one or more financial transactions performed by one or more entities, from one or more databases, wherein the data comprise one or more data fields associated with at least one of: one or more transaction types, one or more company codes, one or more document types, one or more document numbers, one or more posting keys, one or more posting dates, one or more invoice dates, one or more clearing dates, one or more clearing documents, one or more invoice amounts, one or more entity names, and one or more entity numbers; pre-processing the data associated with the one or more financial transactions, wherein pre-processing the data comprises standardizing the one or more data fields; generating one or more transaction numbers for each of the one or more financial transactions by concatenating the pre-processed one or more data fields; grouping one or more entities having identical transaction numbers, into one or more initial groups; eliminating one or more groups from the one or more initial groups using a machine learning (ML) model to obtain one or more intermediate groups, wherein the ML model is configured to obtain the one or more intermediate groups using an association identified within the one or more initial groups based on frequent item sets within the one or more initial groups; comparing the one or more entities across the one or more intermediate groups to obtain one or more resultant groups using an iterative unionization process, wherein each of the one or more resultant groups comprises the one or more equivalent entities performing the one or more financial transactions; and providing one or more resultant groups as an output, to one or more end users through one or more user interfaces of one or more electronic devices associated with the one or more end users.
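The apriori-based elimination and the iterative unionization process recited in the dependent claims can be sketched in Python as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function names, the thresholds, and the sample groups are hypothetical, and the confidence of a combination is assumed here to be the smallest ratio of the combination's support to the support of any single entity in it, since the claims do not give the exact formula.

```python
from itertools import combinations


def support(itemset, groups):
    """Fraction of initial groups that contain every entity in itemset."""
    return sum(1 for g in groups if itemset <= g) / len(groups)


def apriori_filter(initial_groups, min_support, min_confidence, max_len=2):
    """Keep entity combinations whose support and confidence exceed thresholds."""
    groups = [frozenset(g) for g in initial_groups]
    entities = sorted({e for g in groups for e in g})
    frequent = {frozenset([e]) for e in entities
                if support(frozenset([e]), groups) > min_support}
    kept = []
    for k in range(2, max_len + 1):
        pool = sorted({e for s in frequent for e in s})
        level = []
        for combo in combinations(pool, k):
            # Apriori pruning: every (k-1)-subset must already be frequent.
            if not all(frozenset(sub) in frequent
                       for sub in combinations(combo, k - 1)):
                continue
            candidate = frozenset(combo)
            s = support(candidate, groups)
            if s <= min_support:
                continue
            level.append(candidate)
            # Assumed confidence definition (see lead-in).
            confidence = min(s / support(frozenset([e]), groups) for e in combo)
            if confidence > min_confidence:
                kept.append(set(candidate))
        if not level:
            break
        frequent = set(level)
    return kept


def union_pass(groups):
    """Merge every group with the groups it intersects."""
    merged = []
    for g in groups:
        g = set(g)
        for m in [m for m in merged if m & g]:
            g |= m
            merged.remove(m)
        merged.append(g)
    return merged


def unionize(intermediate_groups):
    """Repeat union passes until the group count stops changing."""
    current = [set(g) for g in intermediate_groups]
    while True:
        result = union_pass(current)
        if len(result) == len(current):
            return result
        current = result  # resultant groups become the next input


# Illustrative initial groups only, not data from the disclosure.
initial = [{"E1", "E2"}, {"E1", "E2"}, {"E2", "E3"},
           {"E2", "E3"}, {"E4", "E5"}, {"E1", "E6"}]
intermediate = apriori_filter(initial, min_support=0.2, min_confidence=0.4)
resultant = unionize(intermediate)
```

With these sample values, the rarely co-occurring pairs (E4, E5) and (E1, E6) are eliminated, the pairs (E1, E2) and (E2, E3) survive as intermediate groups, and unionization merges them into a single resultant group because they share E2. In this particular sketch one union pass already yields disjoint groups, so the loop exists mainly to mirror the count comparison described in the claims.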
Description
FIELD OF INVENTION
Embodiments of the present disclosure relate to machine learning based (ML-based) computing systems, and more particularly relate to an ML-based computing method and system for identifying one or more equivalent entities.
BACKGROUND
In accounts receivable, an entity may include a company, organization, or individual conducting business activities including a financial transaction. Equivalent entities refer to a group of entities within a business or financial system, where each member has the authority to carry out financial transactions, including at least one of: making payments and claiming credits, on behalf of any other member in the group. Each member of the group has a specific identifier (for example, an Entity ID). For equivalent entities, if an invoice or credit memo is addressed to a particular entity ID, then another entity ID within the same group may settle the invoice or claim the credit. The key feature of equivalent entities is their shared financial responsibilities and permissions, which allow them to act as payers for one another, creating a flexible network of financial interactions within the group.
At present, no automated solutions exist to identify equivalent entities. As a result, finance teams must manually identify and group these customers. This process usually involves carefully reviewing entity accounts and transactions to identify the interconnected financial responsibilities and permissions that define equivalent entities. The process further requires finance professionals to meticulously analyze large volumes of data, relying on their expertise and attention to detail to accurately group these entities. This manual approach is crucial for maintaining the integrity of financial operations and reporting within the organization.
However, this manual approach has significant limitations and drawbacks. It is inherently time-consuming and labor-intensive, requiring considerable human resources and meticulous attention to detail. Additionally, the accuracy of this manual process depends heavily on the experience and expertise of the finance professionals involved, introducing subjectivity and an increased risk of errors. These challenges not only slow down financial operations but also undermine the accuracy and reliability of financial reporting. As a result, the manual identification and grouping of equivalent entities presents a major bottleneck in optimizing financial processes, underscoring an urgent need for a more efficient and precise solution.
Hence, there is a need for an improved machine learning based (ML-based) computing system and method for identifying one or more equivalent entities, in order to address the aforementioned issues.
SUMMARY
This summary is provided to introduce a selection of concepts, in a simple manner, which are further described in the detailed description of the disclosure. This summary is neither intended to identify key or essential inventive concepts of the subject matter nor to determine the scope of the disclosure.
In accordance with an embodiment of the present disclosure, a machine-learning based (ML-based) computing method for automatically identifying one or more equivalent entities is disclosed. The ML-based computing method comprises obtaining, by one or more hardware processors, data associated with one or more financial transactions performed by one or more entities, from one or more databases.
The data comprise one or more data fields associated with at least one of: one or more transaction types, one or more company codes, one or more document types, one or more document numbers, one or more posting keys, one or more posting dates, one or more invoice dates, one or more clearing dates, one or more clearing documents, one or more invoice amounts, one or more entity names, and one or more entity numbers. The ML-based computing method further comprises pre-processing, by the one or more hardware processors, the data associated with the one or more financial transactions. In an embodiment, pre-processing the data comprises standardizing the one or more data fields. The ML-based computing method further comprises generating, by the one or more hardware processors, one or more transaction numbers for each of the one or more financial transactions by concatenating the pre-processed one or more data fields. The ML-based computing method further comprises grouping, by the one or more hardware processors, one or more entities having identical transaction numbers, into one or more initial groups. The ML-based computing method further comprises eliminating, by the one or more hardware processors, one or more groups from the one or more initial groups using a machine learning (ML) model to obtain one or more intermediate groups. The ML model is configured to obtain the one or more intermediate groups using an association identified within the one or more initial groups based on frequent item sets