US-12619766-B2 - System and method for confidential data identification with quantitative risk analysis in networks
Abstract
The present invention relates to systems and methods for calculating the information and cyber risk posed by systems and methods that process data, for automated verification of their non-compliance, and for calculating the information and cyber risk posed by that non-compliance. Disclosed are a system (100) and a method (200) for calculating information and cyber risk by identifying sensitive electronic information stored in client devices (10), such as desktops, laptops, mobile devices, and databases of shared network drives or cloud environments, connected through a communication network (20). The system (100) is capable of identifying data at rest stored in various file formats such as Word, Excel, CSV, PDF, PowerPoint, database file formats and compressed file formats. The method (200) calculates information and cyber risk and thereby identifies the potential liability or insurance value based on the volume and value of data, compliance with corporate policies for data protection, and potential areas of non-compliance.
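As an illustration of the identification step described in the abstract, the following is a minimal sketch of scanning already-extracted text for candidate identifiers. The source does not specify how identifiers are detected, so the regular expressions, the use of a Luhn checksum to filter card-number candidates, and the names `scan_text` and `luhn_valid` are all assumptions made for this example.

```python
import re

# Hypothetical patterns: the source does not say how identifiers are detected.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> dict[str, list[str]]:
    """Collect candidate identifiers from text, grouped by an assumed data class."""
    findings: dict[str, list[str]] = {"PII": [], "PCI": []}
    findings["PII"].extend(EMAIL_RE.findall(text))
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # Keep only digit runs of plausible card length that pass the checksum.
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings["PCI"].append(digits)
    return findings


if __name__ == "__main__":
    sample = "Contact alice@example.com, card 4111 1111 1111 1111."
    print(scan_text(sample))
```

A real crawler would first have to extract text from each file format (documents, spreadsheets, archives, databases); that step is outside this sketch.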
Inventors
- Shirish Dattatraya CHIRPUTKAR
Assignees
- Shirish Dattatraya CHIRPUTKAR
Dates
- Publication Date
- 20260505
- Application Date
- 20210514
Claims (10)
- 1. A system for confidential data and quantitative risk analysis in networks comprising: a plurality of client devices, each of the plurality of client devices having an input device, an output device, memory, a processor with a plurality of applications configured thereon; a communication network operably coupled with each of the plurality of client devices for providing communication links thereto; a communication unit configured to communicate with each of the plurality of client devices through the communication network, the communication unit having: a memory unit for storing a plurality of processing instructions for an operation of the communication unit, a processor unit in communication with the memory unit, the processor configured to execute the plurality of processing instructions stored in the memory unit, an input unit for providing inputs to the communication unit, an output unit for providing output of the communication unit, a crawler module having a set of instructions for scanning and searching confidential data including PII (personally identifiable information), PHI (personal/protected health information), PCI (payment card industry), business confidential data and customized general data stored at each of the plurality of client devices in multiple file formats, a data repository for storing the confidential data received from the crawler module, an artificial intelligence module configured to analyze the confidential data stored in the data repository and categorizing the confidential data into data classes, and a reporting module reporting the categorized confidential data on the output unit; wherein the artificial intelligence module is configured to: define and manage a set of rules for analyzing and categorizing the confidential data, monitor and manage operation of the set of rules through an active monitoring service, learn from metadata associated with the confidential data to identify additional related data stored on the plurality of client devices on the communication network, wherein the learning includes identifying metadata in PII, PHI, PCI, business confidential data, and regulatory data, and actively learning to identify additional related data based on the identified metadata, generate at least one of a new rule or modification to at least one of the set of rules in response to the identified additional related data, wherein the system recommends at least one of a new rule or modification based on the learning from the metadata, perform quantitative risk analysis of the identified confidential data, and calculate information risk and cyber risk associated with the confidential data to assess one or more of a liability or insurance value based on one or more predefined parameters, a volume of data, a value of data, a compliance with corporate policies for data protection, and an area of non-compliance, wherein said calculation includes at least completeness of the record, duration of exposure, probable and actual area of exposure, and a determination, by the artificial intelligence module, of whether a false identity can be established using confidential information complemented by publicly available information, wherein the quantitative risk calculation reflects the outcome of said determination, and wherein said calculation further includes a geographical location associated with storage, transmission, or exposure of the confidential data and wherein said calculating comprises executing a rule to increase at least one of the liability or insurance value assigned to the confidential data when the confidential data is accessible across multiple geographical locations or in domains or countries having stricter privacy laws.
- 2. The system as claimed in claim 1, wherein the crawler module scans the confidential data stored in the multiple file formats including plain text file formats, compressed file formats, raster/image file formats and customized file formats.
- 3. The system as claimed in claim 1, wherein an end point location extractor is coupled to the crawler module for providing location and configuration of the plurality of the client devices.
- 4. The system as claimed in claim 1, wherein the data repository is a database selected from a SQL (Structured Query Language) database, a No-SQL Database and a key-value pair database.
- 5. The system as claimed in claim 1, wherein the artificial intelligence module is configured to define and manage the set of rules, the set of rules comprising a set of knowledge rules for confirming confidential and non-confidential data in the data repository.
- 6. The system as claimed in claim 1, wherein the artificial intelligence module is configured to actively learn from the metadata identified in PII, PHI, PCI, business confidential, and regulatory data, and wherein the artificial intelligence module is configured to recommend new rules or modify an existing rule in response to the learned metadata.
- 7. A method for confidential data and quantitative risk analysis in networks comprising steps of: searching confidential data including PII (personally identifiable information), PHI (personal/protected health information), PCI (payment card industry), business confidential data and customized general data stored on a plurality of client devices by a crawler module of a communication unit via a communication network; storing the confidential data in a data repository of the communication unit; analyzing the confidential data stored in the data repository by an artificial intelligence module of the communication unit; categorizing the confidential data into data classes, by the artificial intelligence module of the communication unit, based on a defined set of knowledge rules; managing the defined set of knowledge rules by the artificial intelligence module of the communication unit; monitoring and managing operation of the set of knowledge rules through an active monitoring service; learning, by the artificial intelligence module of the communication unit, additional related data stored on the plurality of client devices on the communication network identified from metadata associated with the confidential data, wherein the learning includes identifying metadata in PII, PHI, PCI, business confidential data, and regulatory data, and actively learning to identify additional related data based on the identified metadata; generating, by the artificial intelligence module of the communication unit, at least one of a new rule or a modification to at least one of the defined set of knowledge rules in response to the learned additional related data, wherein the artificial intelligence module recommends at least one of a new rule or modification based on the learning from the metadata; calculating, by the artificial intelligence module, information risk and cyber risk to assess a liability or an insurance value based on a volume of data, a value of data, a compliance with corporate policies for data protection, and an area of non-compliance, wherein said calculating includes at least completeness of the record, duration of exposure, probable and actual area of exposure, and a determination, by the artificial intelligence module, of whether a false identity can be established using confidential data complemented by publicly available information, wherein the quantitative risk calculation reflects the outcome of said determination, and wherein said calculating further includes a geographical location associated with storage, transmission, or exposure of the confidential data and wherein said calculating comprises executing a rule to increase at least one of the liability or insurance value assigned to the confidential data when the confidential data is accessible across multiple geographical locations or in domains or countries having stricter privacy laws; and reporting categorized confidential data by a reporting module of the communication unit.
- 8. The method as claimed in claim 7, wherein the artificial intelligence module is configured to define and manage the set of knowledge rules for confirming confidential and non-confidential data in the data repository.
- 9. The method as claimed in claim 7, wherein the artificial intelligence module is configured to actively learn from the metadata identified in PII, PHI, PCI, and regulatory data, and wherein the artificial intelligence module is configured to generate the new rules or modify at least one of the set of knowledge rules in response to the identified metadata.
- 10. The method as claimed in claim 7, wherein the crawler module scans the confidential data stored in the multiple file formats including plain text file formats, compressed file formats, raster/image file formats and customized file formats.
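The risk calculation recited in claims 1 and 7 names its inputs (volume and value of data, compliance, completeness of the record, duration of exposure, the false-identity determination, and geographical location) but the claims give no formula. The following is a minimal sketch of how such inputs might be combined, assuming a simple multiplicative weighting. Every weight, the `STRICT_JURISDICTIONS` set, the `Finding` fields and the function name `liability_estimate` are hypothetical and chosen only to illustrate the rule that the value increases when data is accessible across multiple locations or in jurisdictions with stricter privacy laws.

```python
from dataclasses import dataclass

# Hypothetical set of jurisdictions treated as having stricter privacy laws.
STRICT_JURISDICTIONS = {"EU", "US-NY"}


@dataclass
class Finding:
    data_class: str                # e.g. "PII", "PHI", "PCI"
    record_count: int              # volume of data
    value_per_record: float        # assumed monetary value of one record
    completeness: float            # completeness of the record, 0.0 to 1.0
    exposure_days: int             # duration of exposure
    locations: tuple[str, ...]     # where the data is stored or accessible
    compliant: bool                # compliance with corporate policy
    false_identity_possible: bool  # outcome of the false-identity determination


def liability_estimate(f: Finding) -> float:
    """Combine the claimed factors into one figure using assumed weights."""
    base = f.record_count * f.value_per_record * f.completeness
    # Exposure multiplier grows linearly from 1.0 to 2.0 over one year (assumption).
    estimate = base * (1.0 + min(f.exposure_days, 365) / 365)
    if not f.compliant:
        estimate *= 1.5   # assumed penalty for non-compliance
    if f.false_identity_possible:
        estimate *= 2.0   # assumed uplift when a false identity could be built
    # Geography rule: increase the value for multi-location access or
    # for presence in a stricter-privacy jurisdiction.
    places = set(f.locations)
    if len(places) > 1 or places & STRICT_JURISDICTIONS:
        estimate *= 1.25
    return estimate


if __name__ == "__main__":
    local = Finding("PII", 1000, 2.0, 0.5, 0, ("IN",), True, False)
    spread = Finding("PII", 1000, 2.0, 0.5, 0, ("IN", "EU"), True, False)
    print(liability_estimate(local), liability_estimate(spread))
```

Keeping each factor as a separate multiplier makes it easy to see how one rule, such as the geography rule, changes the result for otherwise identical findings.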
Description
This application is a national phase of International Application No. PCT/IN2021/050460 filed May 14, 2021, which claims priority to India application No. 202021056477 filed Dec. 25, 2020, the entire disclosures of which are hereby incorporated by reference.
FIELD OF INVENTION
The present invention relates to a system and method for confidential data identification with quantitative risk analysis in networks, and more particularly to a system and method for quantitatively determining and presenting private-confidential and business-confidential data identifiers to produce, in a novel and streamlined way, critical cyber, financial, public relations, business continuity and other risk metrics for which organizations may be legally liable.
BACKGROUND OF THE INVENTION
Computer networks have become a significant and vital part of day-to-day life. Accordingly, machines connected to such networks have become primary tools for storing various types of private and/or confidential personal and business information. Said information, including proprietary, confidential, or other sensitive data, is put at risk as its dissemination increases. This, in turn, increases the necessity of securing said data, and enterprises and other organizations have therefore come to rely on numerous disparate tools and time-consuming, inefficient processes in an attempt to keep intruders and unauthorized personnel from accessing said information. According to the US Securities and Exchange Commission (https://www.sec.gov/about/privacy/piaguide.pdf) and Federal Trade Commission (https://www.ftc.gov/site-information/privacy-policy/privacy-impact-assessments), organizations of all sizes need an intelligent, private data discovery solution to appropriately quantify information and cyber risk so that data governance and compliance can be implemented.
Said organizations are typically ill-equipped to quantify, on a periodic basis, the information or cyber risk posed by the data at rest and in transit on their networks, which requires protection commensurate with various US and European regulations such as NYSDFS (New York State Department of Financial Services), GDPR (European General Data Protection Regulation), GLBA (Gramm Leach Bliley Act), PCI (Payment Card Information) and PHI (Personal Health Information), and the like. As a result, any insufficient protections may expose said organizations to various liabilities, including those due to regulatory non-compliance.
A prior art patent application US20120004945A1 relates to a computerized system and method for collecting, analyzing, and reporting governance, risk, and compliance information relating to an organization. The method includes specifying a target for scanning, establishing a communication link with the specified target, identifying technical data within the specified target, receiving the identified technical data, parsing the technical data into one or more lexical units, selecting a regulatory map against which the one or more lexical units are evaluated, determining whether one or more lexical units is in compliance with the selected regulatory map, and providing the results of the determining step to a user.
Another prior art patent application US20090265199A1 discloses a method for governance, risk, and compliance management which includes providing an interface for defining a control to be used to reach a goal of an organization. The control provides a procedure to be followed by the organization. The method further includes providing the interface for defining a metric for tracking progress of the organization towards reaching the goal using the procedure. The method further includes receiving metric data from an external source. The metric data corresponds to the metric.
The method further includes tracking the progress of the organization towards reaching the goal using at least the metric and the metric data and displaying the progress of the organization towards reaching the goal.
One more prior art patent U.S. Pat. No. 9,262,727B2 is directed to a system and method for searching a computing device for confidential content and reporting back any policy violations.
Yet another prior art patent U.S. Pat. No. 10,482,396B2 relates to a system and method for automated compliance verification. In particular, a compliance computer creates and sends a transmission object, which contains data referencing the rules contained in the compliance documents, to an operator server. The operator server searches for updates to the rules referenced in the transmission object and informs the entity if any updates are found. The transmission object may also reference jurisdictions and topics associated with the entity's operations, in which case the operator server uses that data to identify rules and/or updates to rules applicable to the entity's operations. The entity may then use the information from the operator server to update its compliance documents.
Hence,