US-12619784-B2 - Systems and methods of determining compromised identity information
Abstract
A compromised data exchange system extracts data from websites using a crawler, detects portions within the extracted data that resemble personally identifying information (PII) data based on PII data patterns using a risk assessment module, and compares a detected portion to data within a database of disassociated compromised PII data to determine a match using the risk assessment module. A risk score may be assigned to a data item within the database in response to determining the match. In some embodiments, URL data may also be detected in the extracted data. The detected URL data represents further websites that can be automatically crawled by the system to detect further PII data.
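As an illustration of the pattern-based detection summarized above, the following minimal Python sketch scans extracted page text for strings that resemble PII and for URLs that could seed further crawling. The regular expressions and the names `PII_PATTERNS`, `URL_PATTERN`, `detect_pii`, and `detect_urls` are assumptions made for this example; the disclosure does not specify particular patterns, and a production system would likely need far stricter validation.

```python
import re

# Hypothetical PII data patterns (assumptions for illustration only).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

# Hypothetical URL pattern used to find further websites to crawl.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def detect_pii(text):
    """Return (field, value) pairs for substrings that resemble PII."""
    hits = []
    for field, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((field, match.group()))
    return hits


def detect_urls(text):
    """Return URLs found in the text, in order of appearance."""
    return URL_PATTERN.findall(text)
```

For example, calling `detect_pii` on a page containing `123-45-6789` and `jane@example.com` would return one `ssn` hit and one `email` hit, and `detect_urls` on the same page would return any `http` or `https` links for the crawler to visit next.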
Inventors
- Lester Leland Lockhart, III
- David Hugh Munson
- Gregor R. Bonin
- Michael Cook
Assignees
- EARLY WARNING SERVICES, LLC
Dates
- Publication Date: 20260505
- Application Date: 20230417
Claims (20)
- 1. A compromised data exchange system, comprising: a network interface; one or more processors; and a memory coupled with the one or more processors, the memory storing instructions thereon that, when executed, cause the one or more processors to: extract data from one or more websites; detect portions of the data that resemble personally identifying information (PII) data based on PII data patterns; compare a detected portion of the data to data within a database of disassociated compromised PII data to determine a match, wherein the disassociated compromised PII data comprises PII data elements that are disconnected from one another and cannot be re-associated to correlate the PII data elements to an actual consumer identity by anyone other than a data originator of the PII data elements; and assign a risk score to a data item within the database in response to determining the match.
- 2. The compromised data exchange system of claim 1, wherein the instructions further cause the one or more processors to: provide the detected portions of the data to one or both of an administrator and an artificial intelligence engine.
- 3. The compromised data exchange system of claim 1, wherein the instructions further cause the one or more processors to: encrypt the portions of the data that resemble PII data, wherein the data within the database of disassociated compromised PII data is encrypted.
- 4. The compromised data exchange system of claim 3, wherein: the portions of the data that resemble PII data are encrypted using a same set of one or more encryption keys as used to encrypt the data within the database of disassociated compromised PII data.
- 5. The compromised data exchange system of claim 4, wherein: the one or more encryption keys comprise different encryption keys for each field of PII data.
- 6. The compromised data exchange system of claim 1, wherein the instructions further cause the one or more processors to: assign a risk score to each piece of disassociated data within the database of disassociated compromised PII data associated with a particular data breach event in response to determining multiple matches between the portions and the disassociated data.
- 7. The compromised data exchange system of claim 1, wherein the instructions further cause the one or more processors to: increase a risk score for a piece of disassociated data within the database of disassociated compromised PII data in response to determining the match.
- 8. A method of analyzing compromised data, comprising: extracting data from one or more websites; detecting portions of the data that resemble personally identifying information (PII) data based on PII data patterns; comparing a detected portion of the data to data within a database of disassociated compromised PII data to determine a match, wherein the disassociated compromised PII data comprises PII data elements that are disconnected from one another and cannot be re-associated to correlate the PII data elements to an actual consumer identity by anyone other than a data originator of the PII data elements; and assigning a risk score to a data item within the database in response to determining the match.
- 9. The method of analyzing compromised data of claim 8, wherein: at least one website of the one or more websites is a website that is not indexed on search engines.
- 10. The method of analyzing compromised data of claim 9, wherein: the at least one website is associated with the dark web.
- 11. The method of analyzing compromised data of claim 8, further comprising: ranking the one or more websites based on a metric of compromised information associated with each of the one or more websites.
- 12. The method of analyzing compromised data of claim 11, wherein: ranking the one or more websites comprises combining and quantifying extracted patterns within the data from each of the one or more websites.
- 13. The method of analyzing compromised data of claim 8, further comprising: deploying new data patterns of interest based on changes in breached data posting behavior.
- 14. The method of analyzing compromised data of claim 11, wherein: the data is extracted using a crawler.
- 15. A non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to: extract data from one or more websites; detect portions of the data that resemble personally identifying information (PII) data based on PII data patterns; compare a detected portion of the data to data within a database of disassociated compromised PII data to determine a match, wherein the disassociated compromised PII data comprises PII data elements that are disconnected from one another and cannot be re-associated to correlate the PII data elements to an actual consumer identity by anyone other than a data originator of the PII data elements; and assign a risk score to a data item within the database in response to determining the match.
- 16. The non-transitory computer-readable medium of claim 15, wherein the instructions further cause the one or more processors to: identify at least one website associated with one or both of a uniform resource locator and a link from the extracted data.
- 17. The non-transitory computer-readable medium of claim 16, wherein the instructions further cause the one or more processors to: extract data from the at least one website.
- 18. The non-transitory computer-readable medium of claim 16, wherein the instructions further cause the one or more processors to: rank websites of the one or more websites from which the one or both of the uniform resource locator and the link associated with each of the at least one website was found.
- 19. The non-transitory computer-readable medium of claim 18, wherein: ranking the websites of the one or more websites is performed based on a number of different sets of PII data at each of the websites of the one or more websites.
- 20. The non-transitory computer-readable medium of claim 15, wherein: the risk score is determined at least in part based on a ranking of a webpage from which the data item was extracted.
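The interplay of claims 3 to 5 and claim 7 (protecting detected portions with the same per-field keys as the database, then raising a risk score on a match) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: HMAC-SHA256 is swapped in for the claimed encryption because it is deterministic, so equal values under the same field key compare equal; it is a keyed one-way digest, not a reversible cipher. The names `FIELD_KEYS`, `protect`, `build_store`, and `record_matches`, and the demo keys, are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-field keys (one key per PII field, as in claim 5).
# Real keys would come from a key management system, not source code.
FIELD_KEYS = {
    "ssn": b"demo-key-ssn",
    "email": b"demo-key-email",
}


def protect(field, value):
    """Return a deterministic keyed digest of one PII element.

    Assumption: HMAC-SHA256 stands in for the claimed encryption.
    """
    key = FIELD_KEYS[field]
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_store(elements):
    """Build a store of disassociated elements, each with a risk score of 0.

    Each element is stored on its own, with no link to other elements.
    """
    return {(field, protect(field, value)): 0 for field, value in elements}


def record_matches(store, detected, increment=1):
    """Increase the risk score of every stored element matched by a detection.

    Returns the number of detected portions that matched the store.
    """
    matched = 0
    for field, value in detected:
        key = (field, protect(field, value))
        if key in store:
            store[key] += increment
            matched += 1
    return matched
```

Because both sides are transformed with the same field key, the comparison never needs the plaintext of the stored elements, and a value protected under one field's key will not match the same value protected under another field's key.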
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
The present disclosure is a continuation of co-pending U.S. application Ser. No. 16/563,341, filed on Sep. 6, 2019, which is a continuation-in-part of and claims priority to U.S. application Ser. No. 16/267,297, filed on Feb. 4, 2019 and entitled “Systems and Methods of Determining Compromised Identity Information,” now U.S. Pat. No. 10,599,872, which is a continuation of U.S. application Ser. No. 15/237,519, filed on Aug. 15, 2016 and entitled “Systems and Methods of Determining Compromised Identity Information,” now U.S. Pat. No. 10,268,840, which is a continuation-in-part of and claims priority to U.S. application Ser. No. 14/960,288, filed on Dec. 4, 2015 and entitled “Compromised Identity Exchange Systems and Methods,” the complete disclosures of which are fully incorporated by reference herein for all purposes.
FIELD
The present disclosure is generally related to identification of compromised identity information, and more particularly to systems and methods of determining compromised personally identifiable information on the Internet.
BACKGROUND
Personally identifiable information (PII) may be collected by a variety of organizations, including healthcare organizations, governmental organizations, financial entities (e.g., credit card companies, banks, etc.), credit bureaus, educational institutions, and other organizations. PII includes information that can be used to uniquely identify an individual and may include, but is not limited to, the individual's full name, date of birth, social security number, bank or credit card numbers, passwords, addresses, phone numbers, and the like. Such data is increasingly maintained in electronic form, making it easier for such data to become compromised, such as through a hacking event, inadvertent disclosure, or other data breach incidents. Compromised PII data may be used for identity theft and for other nefarious purposes.
In addition to data breach events, PII can be compromised through “phishing,” which refers to a process of masquerading as a trustworthy entity in an electronic communication. An example of phishing may include a fraudulent email that appears to be from a valid source, such as, for example, a national bank or a credit card company. The fraudulent email may incorporate a uniform resource locator (URL) that re-directs the user to a fraudulent website that masquerades as a legitimate website for the real company. However, the fraudulent website may be designed to steal PII via a false transaction. For example, the fraudulent website may request “confirmation” of PII, such as, for example, a credit card number or a username and password. The “confirmed” PII may then be stored for later improper use.
Once collected, PII data may be sold on a black market through various web sites and illicit data sources. Such web sites and data sources may not be registered with standard search engines, making them difficult to find through traditional web searches. Such web sites and data sources may be part of the “dark” web, which can be represented by a large number of web servers that do not permit search engine indexing and which host information for those who know where to look.
The legitimate owner or holder of PII (such as a credit card company) may know that data has been compromised, for example, when a credit card number has been used in an attempt to conduct a fraudulent transaction. However, that alone does not necessarily reflect the degree of risk to the affected individual. For example, while a credit card number may have been compromised and used for a single attempted transaction, it may or may not be offered for sale on the dark web. Once a stolen credit card number or other compromised PII is offered for sale on the dark web, the risk associated with the compromised PII greatly increases.
Websites on the dark web that offer PII often present partial or complete samples of actual PII that can be purchased (with an opportunity to negotiate and purchase PII beyond the samples). The samples often appear on “marketplace” websites on the dark web. The marketplace websites typically display not only PII samples and a link or URL for contacting the seller, but also links to other marketplace websites where PII (and other illicit items) may be offered. It is usually impractical (and prohibitively expensive) to negotiate the purchase of PII in order to determine whether specific PII is being offered for sale. One approach in trying to determine whether compromised PII has been offered for sale would be to visit many sites and base the determination on samples of stolen PII offered at those sites, with the hope that if enough sites are visited, there is a reasonable chance of finding at one of those sites compromised PII for a specific individual (if it is being offered for sale). However, because of the vast number of sites on the dark web, particularly “marketplace” sites that offer PII “samples” and provide