US-12626192-B2 - System and method for detecting invalid identification documents using synthetic data for model training

Abstract

A method and system that processes images representing identification documents, such as driver's licenses and passports, and uses machine learning techniques wherein a model is trained and the resulting model is used in production to more effectively and accurately detect whether the source identification document has been invalidated. According to the teachings herein, the disclosed method and system can improve the detection of invalid identification documents, even where there are limited source documents representing invalid documents available for model training, through the generation of synthetic training data which can be used to enhance model training.

Inventors

  • Feng Xiao
  • Suchita Bhinge
  • Yaguang Li
  • Pablo Ysrrael Abreu

Assignees

  • SOCURE, INC.

Dates

Publication Date: 2026-05-12
Application Date: 2024-09-30

Claims (20)

  1. A system configured to detect invalidity indicia appearing on an identity document, the system comprising: one or more processors configured to execute computer program modules comprising a first model and a physical storage capability; a training computer program module operative to receive at least one source of training identity document data and process said training identity document data in combination with at least one background therefor to generate synthetic invalidity indicia training data comprising one or more known invalidity designations associated with said training identity document data and storing said synthetic invalidity indicia training data in an invalidity indicia database comprising said first model; an invalidity detection computer program module operative to, based on said one or more said known invalidity designations, determine a presence or lack thereof of one or more said known invalidity designations on said identity document through a matching of said synthetic invalidity indicia training data as against data extracted from said identity document; wherein said processing of said training identity document data in combination with at least one background therefor to generate synthetic invalidity indicia training data comprises an addition of a synthetically generated known invalidity designation to said training identity document data and a placement of said training identity document data, having said addition of said synthetically generated known invalidity designation, on said at least one background.
  2. The system of claim 1 wherein said invalidity detection computer program module is further operative to determine a location of said known invalidity designation as said known invalidity designation may appear on said identity document.
  3. The system of claim 1 wherein said invalidity detection computer program module is further operative to classify a type of said known invalidity designation as said known invalidity designation may appear on said identity document.
  4. The system of claim 1 wherein said known invalidity designation comprises one or more of the following: cut corner, punched shape, plurality of punched holes forming a shape or a word, punched word or punched shape.
  5. The system of claim 1 wherein said identity document comprises one of the following: driver's license, passport, social security card, or voter identification card.
  6. The system of claim 1 wherein said training identity document data comprises only valid identity documents without any known invalidity designations.
  7. The system of claim 1 wherein said training identity document data comprises both valid identity documents without any known invalidity designations and invalid identity documents including at least one known invalidity designation.
  8. The system of claim 1 wherein said training computer program module is further operative to iteratively test and refine said first model using real source data to minimize false positive results associated with identity documents not containing known invalidity designations.
  9. The system of claim 8 wherein said iterative testing and refining of said first model comprises matching of a location of each of respective holes on both the front and back of a document.
  10. The system of claim 1 wherein said first model is refined during production operation using data obtained in connection with said production operation.
  11. A computer-implemented method of generating identity verification results for identities included in online transactions, the method being implemented in a computer system comprising one or more processors configured to execute computer program modules comprising a first model and a physical storage capability, and the method comprising the steps of: receiving at least one source of training identity document data and processing said training identity document data in combination with at least one background therefor to generate synthetic invalidity indicia training data comprising one or more known invalidity designations associated with said training identity document data and storing said synthetic invalidity indicia training data in an invalidity indicia database comprising said first model; determining a presence or lack thereof of one or more said known invalidity designations on said identity document through a matching of said synthetic invalidity indicia training data as against data extracted from said identity document based on said one or more known invalidity designations; wherein said processing of said training identity document data in combination with at least one background therefor to generate synthetic invalidity indicia training data comprises an addition of a synthetically generated known invalidity designation to said training identity document data and a placement of said training identity document data having said addition of said synthetically generated known invalidity designation, on said at least one background.
  12. The method of claim 11 further including the step of determining the location of said known invalidity designation as said known invalidity designation may appear on said identity document.
  13. The method of claim 11 further comprising the step of classifying a type of said known invalidity designation as said known invalidity designation may appear on said identity document.
  14. The method of claim 11 wherein said known invalidity designation comprises one or more of the following: cut corner, punched shape, plurality of punched holes forming a shape or a word, punched word or punched shape.
  15. The method of claim 11 wherein said identity document comprises one of the following: driver's license, passport, social security card, or voter identification card.
  16. The method of claim 11 wherein said training identity document data comprises only valid identity documents without any known invalidity designations.
  17. The method of claim 11 wherein said training identity document data comprises both valid identity documents without any known invalidity designations and invalid identity documents including at least one known invalidity designation.
  18. The method of claim 11 further comprising the step of iteratively testing and refining said first model using real source data to minimize false positive results associated with identity documents not containing known invalidity designations.
  19. The method of claim 18 wherein said iterative testing and refining of said first model comprises matching of a location of each of respective holes on both the front and back of a document.
  20. The method of claim 11 wherein said first model is refined during production operation using data obtained in connection with said production operation.
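
Illustrative sketch (not part of the patent text): the synthetic data generation recited in claims 1 and 11 adds a synthetically generated invalidity designation to a training document image and then places the modified document on a background. The Python below is a minimal sketch of that idea under loud assumptions: images are plain grayscale grids (lists of lists of integers), only two designation types (cut corner, single punched hole) are modeled, and the names removed_pixels and make_synthetic_sample, the size ranges, and the label format are all hypothetical. It also returns the designation type and bounding box, which a model could use to learn the type and location mentioned in claims 2, 3, 12, and 13. The patent does not disclose this code.

```python
import random


def removed_pixels(kind, h, w, rng):
    """Return (set of (row, col) pixels removed from the document, bounding box).

    The box is (top, left, bottom, right) in document coordinates, with
    bottom and right exclusive. Assumes h >= 12 and w >= h.
    """
    if kind == "cut_corner":
        size = rng.randint(h // 6, h // 3)
        # Triangle in the top-right corner: widest on row 0, one pixel on row size - 1.
        removed = {(r, c) for r in range(size) for c in range(w - size + r, w)}
        box = (0, w - size, size, w)
    elif kind == "punched_hole":
        radius = rng.randint(h // 12, h // 6)
        cy = rng.randint(radius, h - 1 - radius)
        cx = rng.randint(radius, w - 1 - radius)
        # Filled disc centered at (cy, cx).
        removed = {
            (r, c)
            for r in range(cy - radius, cy + radius + 1)
            for c in range(cx - radius, cx + radius + 1)
            if (r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2
        }
        box = (cy - radius, cx - radius, cy + radius + 1, cx + radius + 1)
    else:
        raise ValueError(f"unsupported designation type: {kind}")
    return removed, box


def make_synthetic_sample(doc, background, kind, rng):
    """Add a designation to doc, place it on background, and return (image, label).

    The background shows through wherever document material was removed,
    which is what a photo of a punched or cut card would look like.
    The inputs are not modified.
    """
    h, w = len(doc), len(doc[0])
    bh, bw = len(background), len(background[0])
    removed, (t, l, b, r) = removed_pixels(kind, h, w, rng)

    # Random placement that keeps the whole document inside the background.
    top = rng.randint(0, bh - h)
    left = rng.randint(0, bw - w)

    image = [row[:] for row in background]
    for y in range(h):
        for x in range(w):
            if (y, x) not in removed:
                image[top + y][left + x] = doc[y][x]

    # Label in background coordinates: designation type and its bounding box.
    label = {"kind": kind, "box": (t + top, l + left, b + top, r + left)}
    return image, label


if __name__ == "__main__":
    rng = random.Random(0)
    doc = [[255] * 100 for _ in range(60)]         # stand-in for a valid document image
    background = [[0] * 300 for _ in range(200)]   # stand-in for a desk or table photo
    for kind in ("cut_corner", "punched_hole"):
        _, label = make_synthetic_sample(doc, background, kind, rng)
        print(label)
```

Because each call draws a fresh designation size, position, and placement, a small set of valid documents and backgrounds can yield many distinct labeled samples, which is the situation claims 6 and 16 describe (training data consisting only of valid documents).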

Description

FIELD OF THE DISCLOSURE
Disclosed embodiments relate to the detection of identification documents that have been invalidated, and more specifically, to the use of machine learning systems and generated synthetic data to train models to accurately detect various indicia of invalidity on such identification documents.
BACKGROUND
Identity verification is often performed in connection with online transactions in order to assess whether the person attempting the transaction is who they say they are. For example, an individual seeking to open a new checking account online may be asked to upload a photographic image of an identity document such as a driver's license for verification prior to the system allowing the transaction to proceed. If the individual is unwilling to provide an image of an identity document, or an uploaded identity document cannot be verified as establishing sufficient evidence that the individual is who they say they are, then the transaction will typically be rejected.
Once the image has been uploaded, the system proceeds to extract information from the document image in connection with the verification process. In connection with an image of a driver's license, for example, the system may scan the image using OCR or other means so that it can extract first name, last name, date of birth, driver's license number, expiration date and/or any other data contained within the driver's license. The data can then be used by the validation system to verify identity.
One issue that may arise in connection with identity verification in these contexts is that the individual providing the document image may intentionally or unintentionally provide a document that is invalid or expired. While it may be possible in many commercial applications to validate even based on an invalid ID (i.e., an expired driver's license) because the embedded data is still viable and may only be somewhat out of date, there are other applications where this is not acceptable.
By way of example, in many public sector applications, identity verification must be accomplished only with a valid and current identification document. This is often the case, for example, in mission-critical applications where known current data from a current document is required as a precondition for transaction processing. In cases where invalid identity documents should be excluded from the verification process, it is thus necessary for the processing systems to detect that the documents are invalid and prevent them from being used in connection with the verification process.
Issuers of identity documents such as governmental agencies and other issuing authorities often require expired documents to be surrendered as a condition of receiving a replacement identity document. However, this is not always the case. Some issuing authorities instead require that the document be physically modified in some way to irreversibly designate the document as invalid. One known example of invalidity designation requires cutting a corner off the document, such as a driver's license. Another example is punching a hole or other shape somewhere on the document. Yet another example is punching a large number of smaller holes in the document in the shape of a word such as “VOID” or some other indicia. Many other examples are in use for invalidating different types of documents such as driver's licenses, passports, license plates, etc.
It would be desirable to implement a system functioning with machine learning techniques in which a model can be trained to detect a wide variety of invalidity designations on a wide variety of identity documents. This presents a challenge due to the variations in documents and invalidity designations as well as the requirement to minimize false positives and false negatives when implementing invalid document detection as part of the overall identity verification process.
Unfortunately, when seeking to train a model to detect invalid identification documents, limited source documents with invalidity markings may be available for training such that training would be marginal at best. At the same time, the model must be trained sufficiently to allow for an acceptable level of false positives and false negatives where the acceptable levels are likely to be extremely low. These two competing realities make it difficult with current solutions to consistently detect documents that are truly invalid while at the same time avoiding false positives wherein documents are actually valid but incorrectly detected as invalid. Thus, as will be understood, there is a need for a system and methodology in which identity verification can be performed using identification document images and wherein images reflecting invalid identification documents are rejected so they are not permitted to be used as a data source in connection with the verification process. It is also important that this system and methodology minimize false positive and false ne