US-20260127316-A1 - APPARATUS AND A METHOD FOR THE ANONYMIZATION OF USER DATA
Abstract
An apparatus for the anonymization of user data is disclosed. The apparatus includes at least a processor and a memory communicatively connected to the processor. The memory contains instructions configuring the processor to receive a plurality of user data. The memory contains instructions configuring the processor to identify a plurality of patient identifiers within the plurality of user data. The memory contains instructions configuring the processor to generate contextual data associated with each patient identifier of the plurality of patient identifiers. The memory contains instructions configuring the processor to identify one or more false positive patient identifiers within the plurality of patient identifiers as a function of the contextual data.
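The flow summarized above (find candidate identifiers, build contextual data for each, then flag false positives from that context) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the disclosure: the `Candidate` type, the word-window definition of context, and the `NON_IDENTIFIER_CUES` list are assumptions. In particular, the cue-word rule is only a stand-in for the transformer-based language processing model that the claims recite.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """A span flagged as a possible patient identifier (hypothetical type)."""
    text: str
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


def find_candidates(note: str, names: list[str]) -> list[Candidate]:
    """Flag every literal occurrence of a known name as a candidate identifier."""
    found = []
    for name in names:
        for m in re.finditer(re.escape(name), note):
            found.append(Candidate(m.group(), m.start(), m.end()))
    return sorted(found, key=lambda c: c.start)


def contextual_data(note: str, cand: Candidate, window: int = 3) -> str:
    """Return the candidate plus up to `window` words on each side (assumed definition)."""
    left = note[:cand.start].split()[-window:]
    right = note[cand.end:].split()[:window]
    return " ".join(left + [cand.text] + right)


# Hypothetical cue words suggesting a medical term rather than a person.
# A stand-in for a trained model, not the method of the disclosure.
NON_IDENTIFIER_CUES = {"disease", "syndrome", "scale", "score", "test"}


def is_false_positive(context: str) -> bool:
    """Treat a candidate as a false positive if its context contains a cue word."""
    words = {w.strip(".,;:").lower() for w in context.split()}
    return bool(words & NON_IDENTIFIER_CUES)


note = "Mr. Parkinson was seen today. History of Parkinson disease."
for cand in find_candidates(note, ["Parkinson"]):
    ctx = contextual_data(note, cand)
    print(repr(ctx), "false positive" if is_false_positive(ctx) else "identifier")
```

In this toy note, the same surface string is kept as an identifier in one place and flagged as a false positive in the other, which is the distinction the contextual data is meant to support.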
Inventors
- Karthik MURUGADOSS
- Sankar Ardhanari
Assignees
- nference, inc.
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-12-29
Claims (20)
- 1. An apparatus, wherein the apparatus comprises: at least a processor; and a memory communicatively connected to the at least a processor, wherein the memory contains instructions configuring the at least a processor to: receive a plurality of user data and a plurality of patient identifiers associated with the plurality of user data; generate contextual data associated with each patient identifier of the plurality of patient identifiers, wherein generating the contextual data associated with each patient identifier comprises identifying a set of composite terms within the plurality of user data; and identify, using a language processing model having a transformer architecture, one or more false positive patient identifiers within a plurality of patient identifiers as a function of the contextual data.
- 2. The apparatus of claim 1, wherein the user data comprises patient data.
- 3. The apparatus of claim 1, wherein at least one composite term of the set of composite terms is crafted by aggregating one or more patient identifiers with one or more related terms, wherein the aggregation creates at least a phrase that provides additional context to the at least one composite term.
- 4. The apparatus of claim 1, wherein the at least a processor is further configured to identify, using the language processing model, the one or more false positive patient identifiers by executing one or more attention mechanisms, wherein the one or more attention mechanisms is configured to: identify a plurality of relationships between words within the contextual data; and detect, as a function of the identified relationships, the one or more false positive patient identifiers within the plurality of patient identifiers.
- 5. The apparatus of claim 4, wherein the one or more attention mechanisms comprise a plurality of layers of self-attention neural networks and feed-forward neural networks.
- 6. The apparatus of claim 1, wherein the at least a processor is further configured to identify, using the language processing model, the one or more false positive patient identifiers by: determining one or more indications within the contextual data, wherein the one or more indications are determined as a function of a comparison between extracted significant terms and a semantic relationship; and classify as a function of the one or more indications, each patient identifier as a true positive or a false positive.
- 7. The apparatus of claim 6, wherein the one or more indications comprise one or more of a positive indication associated with the true positive and a negative indication associated with the false positive.
- 8. The apparatus of claim 1, wherein the at least a processor is further configured to identify, using a deep learning model, the one or more false positive patient identifiers by: weighing an importance of sub-words within the contextual data; and detecting, as a function of a weighted importance of the sub-words, one or more associations between extracted significant terms; and identify the one or more false positive patient identifiers based on the one or more associations.
- 9. The apparatus of claim 1, wherein the memory further contains instructions configuring the at least a processor to generate anonymized data corresponding to the plurality of patient identifiers.
- 10. The apparatus of claim 1, wherein receiving the plurality of user data comprises receiving the plurality of user data from one or more electronic health records (EHRs).
- 11. A method, wherein the method comprises: receiving, using at least a processor, a plurality of user data and a plurality of patient identifiers associated with the plurality of user data; generating, using the at least a processor, contextual data associated with each patient identifier of the plurality of patient identifiers, wherein generating the contextual data associated with each patient identifier comprises identifying a set of composite terms within the plurality of user data; and identifying, using a language processing model having a transformer architecture, one or more false positive patient identifiers within a plurality of patient identifiers as a function of the contextual data.
- 12. The method of claim 11, wherein the user data comprises patient data.
- 13. The method of claim 11, wherein at least one composite term of the set of composite terms is crafted by aggregating one or more patient identifiers with one or more related terms, wherein the aggregation creates at least a phrase that provides additional context to the at least one composite term.
- 14. The method of claim 11, further comprising identifying, using the language processing model, the one or more false positive patient identifiers by executing one or more attention mechanisms, wherein the one or more attention mechanisms is configured to: identify a plurality of relationships between words within the contextual data; and detect, as a function of the identified relationships, the one or more false positive patient identifiers within the plurality of patient identifiers.
- 15. The method of claim 14, wherein the one or more attention mechanisms comprise a plurality of layers of self-attention neural networks and feed-forward neural networks.
- 16. The method of claim 11, further comprising identifying, using the language processing model, the one or more false positive patient identifiers by: determining one or more indications within the contextual data, wherein the one or more indications are determined as a function of a comparison between extracted significant terms and a semantic relationship; and classify as a function of the one or more indications, each patient identifier as a true positive or a false positive.
- 17. The method of claim 16, wherein the one or more indications comprise one or more of a positive indication associated with the true positive and a negative indication associated with the false positive.
- 18. The method of claim 11, further comprising identifying, using a deep learning model, the one or more false positive patient identifiers by: weighing an importance of sub-words within the contextual data; and detecting, as a function of a weighted importance of the sub-words, one or more associations between extracted significant terms; and identify the one or more false positive patient identifiers based on the one or more associations.
- 19. The method of claim 11, further comprising generating, using the at least a processor, anonymized data corresponding to the plurality of patient identifiers.
- 20. The method of claim 11, wherein receiving the plurality of user data comprises receiving the plurality of user data from one or more electronic health records (EHRs).
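Several claims above recite attention mechanisms that identify relationships between words in the contextual data and weigh the importance of sub-words. As a rough illustration only, the following Python sketch computes single-head scaled dot-product self-attention over toy word vectors. The two-dimensional vectors are invented for the example, and the sketch assumes no learned query, key, or value projections and no feed-forward layers, so it is far simpler than the layered transformer architecture the claims describe.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product attention with Q = K = V = vectors.

    Returns (weights, outputs): weights[i][j] is how strongly token i
    attends to token j, and outputs[i] is the weighted mix of all vectors.
    """
    d = len(vectors[0])
    weights = []
    for q in vectors:
        # Dot product of the query with every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, vectors)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


# Toy, hand-made embeddings (assumption): "Parkinson" and "disease" point in
# similar directions, while "today" points elsewhere.
tokens = ["Parkinson", "disease", "today"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

weights, _ = self_attention(vectors)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 3) for w in row])
```

With these toy vectors, the row for "Parkinson" places more weight on "disease" than on "today". A downstream classifier could use that kind of relationship as evidence that the span is a medical term rather than a patient name.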
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 18/604,154, filed on Mar. 13, 2024, and titled “APPARATUS AND A METHOD FOR THE ANONYMIZATION OF USER DATA” (“the Ser. No. 18/604,154 application”), which claims the benefit of priority of U.S. Provisional Patent Application Ser. No. 63/461,121, filed on Apr. 21, 2023, and titled “SYSTEMS AND METHODS FOR IMPROVING PERFORMANCE IN A COMPUTING SYSTEM FOR DE-IDENTIFICATION” (“the 63/461,121 application”). This application claims the benefit of priority of both the Ser. No. 18/604,154 application and the 63/461,121 application, which are both incorporated by reference herein in their entirety.
FIELD OF THE INVENTION
The present invention generally relates to the field of data processing. In particular, the present invention is directed to an apparatus and a method for the anonymization of user data.
BACKGROUND
Managing sensitive data, such as, without limitation, data about patients collected by hospitals, healthcare providers, and/or caregivers, has long been a labor-intensive process. Threats to data security have also grown as systems have become increasingly interconnected.
SUMMARY OF THE DISCLOSURE
In an aspect, an apparatus is disclosed. The apparatus includes at least a processor and a memory communicatively connected to the processor. The memory contains instructions configuring the processor to receive a plurality of user data and a plurality of patient identifiers associated with the plurality of user data, generate contextual data associated with each patient identifier of the plurality of patient identifiers, wherein generating the contextual data associated with each patient identifier comprises identifying a set of composite terms within the plurality of user data, and identify, using a language processing model having a transformer architecture, one or more false positive patient identifiers within the plurality of patient identifiers as a function of the contextual data.
In another aspect, a method is disclosed. The method includes receiving, using at least a processor, a plurality of user data and a plurality of patient identifiers associated with the plurality of user data, generating, using the at least a processor, contextual data associated with each patient identifier of the plurality of patient identifiers, wherein generating the contextual data associated with each patient identifier comprises identifying a set of composite terms within the plurality of user data, and identifying, using a language processing model having a transformer architecture, one or more false positive patient identifiers within the plurality of patient identifiers as a function of the contextual data.
These and other aspects and features of non-limiting embodiments of the present invention will become apparent to those skilled in the art upon review of the following description of specific non-limiting embodiments of the invention in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For the purpose of illustrating the invention, the drawings show aspects of one or more embodiments of the invention. However, it should be understood that the present invention is not limited to the precise arrangements and instrumentalities shown in the drawings, wherein:
FIG. 1 is a block diagram of an exemplary embodiment of an apparatus for the anonymization of user data;
FIG. 2 is a block diagram of an exemplary machine-learning process;
FIG. 3 is a block diagram of an exemplary embodiment of an anonymization database;
FIG. 4 is a diagram of an exemplary embodiment of a neural network;
FIG. 5 is a diagram of an exemplary embodiment of a node of a neural network;
FIG. 6 is an illustration of an exemplary embodiment of fuzzy set comparison;
FIG. 7 is an illustration of an exemplary embodiment of a user interface;
FIG. 8 is a flow diagram of an exemplary method for the anonymization of user data; and
FIG. 9 is a block diagram of a computing system that can be used to implement any one or more of the methodologies disclosed herein and any one or more portions thereof.
The drawings are not necessarily to scale and may be illustrated by phantom lines, diagrammatic representations, and fragmentary views. In certain instances, details that are not necessary for an understanding of the embodiments or that render other details difficult to perceive may have been omitted.
DETAILED DESCRIPTION
At a high level, aspects of the present disclosure are directed to an apparatus and a method for the anonymization of user data. The apparatus includes at least a processor and a memory communicatively connected to the processor. The memory contains instructions configuring the processor to receive a plurality of user data. The memory contains instructions configuring the processor to identify a plurality of patient identifiers within the plurality of user data. The memory contains instru