EP-4738168-A1 - METHOD AND SYSTEM FOR MITIGATING ENTERPRISE DATA LEAKAGE IN QUERIES TO LARGE LANGUAGE MODELS
Abstract
Unrestricted access to services based on large language models (LLMs) can lead to potential data leakage, especially for large enterprises providing products and services to clients that require legal confidentiality guarantees. However, a blanket restriction on such services is not ideal, as these LLMs boost employee productivity. The objective of the present disclosure is to build a solution that enables enterprise employees to query such external LLMs without leaking confidential internal and client information. The QueryShield platform of the present disclosure is a platform that enterprises can use to interact with external LLMs without leaking data through queries. It detects whether a query leaks data and rephrases it to minimize data leakage while limiting the impact on its semantics. A language model is chosen from a set of lightweight candidate models that are identified and fine-tuned for this purpose using a large dataset and evaluated using multiple metrics.
Inventors
- Antony, Delton Myalil
- Ramrakhiyani, Nitin Vijaykumar
- Pawar, Sachin Sharad
- Apte, Manoj Madhav
- Alasingara Bhattachar, Rajan Mindigal
- Saglani, Divyesh
- Shaik, Imtiyazuddin
- Lodha, Sachin Premsukh
Assignees
- Tata Consultancy Services Limited
Dates
- Publication Date: 2026-05-06
- Application Date: 2025-09-22
Claims (15)
- A processor-implemented method (200), the method comprising: receiving (202), by one or more hardware processors, an input user query associated with querying a plurality of large language models (LLMs); computing (204), by the one or more hardware processors, a sensitive data leakage level associated with the input user query by prompting a trained enterprise data leakage mitigation model with a first set of specific instructions (T1) as a prefix to the input user query, wherein the sensitive data leakage level is classified as one of (i) high and (ii) low based on an associated threshold value; generating (206), by the one or more hardware processors, via the trained enterprise data leakage mitigation model, if the sensitive data leakage level is higher than a predefined threshold, a plurality of rephrased queries associated with the input user query that retain the semantics of the input user query and reduce the sensitive data leakage level, by prompting the trained enterprise data leakage mitigation model with a second set of specific instructions (T2) as a prefix to the input user query; simultaneously identifying (208), by the one or more hardware processors, types of sensitive data leakage associated with the input user query by prompting the trained enterprise data leakage mitigation model with a third set of specific instructions (T3) as a prefix to the input user query; and repeating (210), by the one or more hardware processors, the above steps until an optimal rephrased query with a sensitive data leakage level less than the predefined threshold is generated (an illustrative sketch of this detect-and-rephrase loop is provided after the claims).
- The processor-implemented method as claimed in claim 1, wherein the user is allowed to query an associated LLM from among the plurality of LLMs using the input user query only if the sensitive data leakage level is less than the predefined threshold, and wherein the identified types of sensitive data leakage associated with the input user query are intimated to the user and further utilized for updating the enterprise data leakage mitigation model (a sketch of this gating step is provided after the claims).
- The processor-implemented method as claimed in claim 1, wherein the enterprise data leakage mitigation model is obtained by: receiving a plurality of user queries from a plurality of sources, wherein the plurality of sources comprises queries generated by a group of users, ChatGPT, and queries from a plurality of publicly available datasets; obtaining a plurality of training instances for the T1, T2 and T3 tasks from a plurality of sources, wherein T1 is for classifying the sensitive data leakage level in the plurality of user queries, T2 is for generating rephrased queries if a classified query has a high sensitive data leakage level, and T3 is for detecting the types of sensitive data leakage with respect to a pre-defined list of sensitive data leakage types; obtaining gold-standard labels for the plurality of training instances of T1, T2 and T3, wherein the gold-standard labels for the training instances of T1, T2 and T3 are obtained from an associated predefined plurality of annotations; finetuning a pre-trained language model with the training instances of T1 for K epochs, wherein each of the plurality of training instances of T1 comprises input text paired with an expected labelled output text obtained from the predefined plurality of annotations, and wherein the plurality of training instances of T1 are debiased using an anomaly detection technique; finetuning the language model with the associated plurality of training instances of T1 and T3 for K epochs with a validation loss less than a predefined threshold, wherein the validation loss is computed on a validation set in each of the K epochs and an optimum validation loss is selected over the K epochs based on a predefined validation threshold, and wherein each of the plurality of training instances of T3 comprises the paired high sensitive data leakage level text and labelled output text generated based on the predefined plurality of annotations; finetuning the language model with the associated plurality of training instances of T1, T2, and T3 with the optimum validation loss, wherein each of the plurality of training instances of T2 comprises the paired high sensitive data leakage level text and rephrased output text generated based on the predefined plurality of annotations; performing the final training of the language model for K epochs with the training instances of T1, T2, and T3; and evaluating the trained language model using a plurality of metrics (a sketch of the staged fine-tuning is provided after the claims).
- The processor-implemented method as claimed in claim 3, wherein T1 is evaluated using the recall and F1 score metrics, and wherein T3 is evaluated using micro-averaged and macro-averaged F1 scores (a sketch of these metrics is provided after the claims).
- The processor-implemented method as claimed in claim 3, wherein T2 is evaluated by: (i) computing a cross-reference (CRR) score by comparing the plurality of rephrased queries, the input user queries and a gold-standard rephrased query; (ii) computing a Named Entity Leakage (NEL) as the percentage of named entity terms in the plurality of rephrased queries occurring as part of false positives; (iii) evaluating the plurality of rephrased queries for retaining the maximum original semantics of a query q using the CRR score and the NEL; and (iv) computing the precision of the label "LOW" of T1 (a sketch of the NEL computation is provided after the claims).
- A system (100) comprising: at least one memory (104) storing programmed instructions; one or more Input/Output (I/O) interfaces (112); and one or more hardware processors (102) operatively coupled to the at least one memory (104), wherein the one or more hardware processors (102) are configured by the programmed instructions to: receive an input user query associated with querying a plurality of large language models (LLMs); compute a sensitive data leakage level associated with the input user query by prompting a trained enterprise data leakage mitigation model with a first set of specific instructions (T1) as a prefix to the input user query, wherein the sensitive data leakage level is classified as one of (i) high and (ii) low based on an associated threshold value; generate, via the trained enterprise data leakage mitigation model, if the sensitive data leakage level is higher than a predefined threshold, a plurality of rephrased queries associated with the input user query that retain the semantics of the input user query and reduce the sensitive data leakage level, by prompting the trained enterprise data leakage mitigation model with a second set of specific instructions (T2) as a prefix to the input user query; simultaneously identify types of sensitive data leakage associated with the input user query by prompting the trained enterprise data leakage mitigation model with a third set of specific instructions (T3) as a prefix to the input user query; and repeat the above steps until an optimal rephrased query with a sensitive data leakage level less than the predefined threshold is generated.
- The system of claim 6, wherein the user is allowed to query an associated LLM from among the plurality of LLMs using the input user query only if the sensitive data leakage level is less than the predefined threshold, and wherein the identified types of sensitive data leakage associated with the input user query are intimated to the user.
- The system of claim 6, wherein the enterprise data leakage mitigation model is obtained by: receiving a plurality of user queries from a plurality of sources, wherein the plurality of sources comprises queries generated by a group of users, ChatGPT, and queries from a plurality of publicly available datasets; obtaining a plurality of training instances for the T1, T2 and T3 tasks from a plurality of sources, wherein T1 is for classifying the sensitive data leakage level in the plurality of user queries, T2 is for generating rephrased queries if a classified query has a high sensitive data leakage level, and T3 is for detecting the types of sensitive data leakage with respect to a pre-defined list of sensitive data leakage types; obtaining gold-standard labels for the plurality of training instances of T1, T2 and T3, wherein the gold-standard labels for the training instances of T1, T2 and T3 are obtained from an associated predefined plurality of annotations; finetuning a pre-trained language model with the training instances of T1 for K epochs, wherein each of the plurality of training instances of T1 comprises input text paired with an expected labelled output text obtained from the predefined plurality of annotations, and wherein the plurality of training instances of T1 are debiased using an anomaly detection technique; finetuning the language model with the associated plurality of training instances of T1 and T3 for K epochs with a validation loss less than a predefined threshold, wherein the validation loss is computed on a validation set in each of the K epochs and an optimum validation loss is selected over the K epochs based on a predefined validation threshold, and wherein each of the plurality of training instances of T3 comprises the paired high sensitive data leakage level text and labelled output text generated based on the predefined plurality of annotations; finetuning the language model with the associated plurality of training instances of T1, T2, and T3 with the optimum validation loss, wherein each of the plurality of training instances of T2 comprises the paired high sensitive data leakage level text and rephrased output text generated based on the predefined plurality of annotations; performing the final training of the language model for K epochs with the training instances of T1, T2, and T3; and evaluating the trained language model using a plurality of metrics.
- The system of claim 8, wherein T1 is evaluated using the recall and F1 score metrics, and wherein T3 is evaluated using micro-averaged and macro-averaged F1 scores.
- The system of claim 8, wherein T2 is evaluated by: (i) computing a cross-reference (CRR) score by comparing the plurality of rephrased queries, the input user queries and a gold-standard rephrased query; (ii) computing a Named Entity Leakage (NEL) as the percentage of named entity terms in the plurality of rephrased queries occurring as part of false positives; (iii) evaluating the plurality of rephrased queries for retaining the maximum original semantics of a query q using the CRR score and the NEL; and (iv) computing the precision of the label "LOW" of T1.
- One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause: receiving, by one or more hardware processors, an input user query associated with querying a plurality of large language models (LLMs); computing, by the one or more hardware processors, a sensitive data leakage level associated with the input user query by prompting a trained enterprise data leakage mitigation model with a first set of specific instructions (T1) as a prefix to the input user query, wherein the sensitive data leakage level is classified as one of (i) high and (ii) low based on an associated threshold value; generating, by the one or more hardware processors, via the trained enterprise data leakage mitigation model, if the sensitive data leakage level is higher than a predefined threshold, a plurality of rephrased queries associated with the input user query that retain the semantics of the input user query and reduce the sensitive data leakage level, by prompting the trained enterprise data leakage mitigation model with a second set of specific instructions (T2) as a prefix to the input user query; simultaneously identifying, by the one or more hardware processors, types of sensitive data leakage associated with the input user query by prompting the trained enterprise data leakage mitigation model with a third set of specific instructions (T3) as a prefix to the input user query; and repeating, by the one or more hardware processors, the above steps until an optimal rephrased query with a sensitive data leakage level less than the predefined threshold is generated.
- The one or more non-transitory machine-readable information storage mediums of claim 11, wherein the user is allowed to query an associated LLM from among the plurality of LLMs using the input user query only if the sensitive data leakage level is less than the predefined threshold, and wherein the identified types of sensitive data leakage associated with the input user query are intimated to the user.
- The one or more non-transitory machine-readable information storage mediums of claim 11, wherein the enterprise data leakage mitigation model is obtained by: receiving a plurality of user queries from a plurality of sources, wherein the plurality of sources comprises queries generated by a group of users, ChatGPT, and queries from a plurality of publicly available datasets; obtaining a plurality of training instances for the T1, T2 and T3 tasks from a plurality of sources, wherein T1 is for classifying the sensitive data leakage level in the plurality of user queries, T2 is for generating rephrased queries if a classified query has a high sensitive data leakage level, and T3 is for detecting the types of sensitive data leakage with respect to a pre-defined list of sensitive data leakage types; obtaining gold-standard labels for the plurality of training instances of T1, T2 and T3, wherein the gold-standard labels for the training instances of T1, T2 and T3 are obtained from an associated predefined plurality of annotations; finetuning a pre-trained language model with the training instances of T1 for K epochs, wherein each of the plurality of training instances of T1 comprises input text paired with an expected labelled output text obtained from the predefined plurality of annotations, and wherein the plurality of training instances of T1 are debiased using an anomaly detection technique; finetuning the language model with the associated plurality of training instances of T1 and T3 for K epochs with a validation loss less than a predefined threshold, wherein the validation loss is computed on a validation set in each of the K epochs and an optimum validation loss is selected over the K epochs based on a predefined validation threshold, and wherein each of the plurality of training instances of T3 comprises the paired high sensitive data leakage level text and labelled output text generated based on the predefined plurality of annotations; finetuning the language model with the associated plurality of training instances of T1, T2, and T3 with the optimum validation loss, wherein each of the plurality of training instances of T2 comprises the paired high sensitive data leakage level text and rephrased output text generated based on the predefined plurality of annotations; performing the final training of the language model for K epochs with the training instances of T1, T2, and T3; and evaluating the trained language model using a plurality of metrics.
- The one or more non-transitory machine-readable information storage mediums of claim 13, wherein T1 is evaluated using the recall and F1 score metrics, and wherein T3 is evaluated using micro-averaged and macro-averaged F1 scores.
- The one or more non-transitory machine-readable information storage mediums of claim 13, wherein T2 is evaluated by: (i) computing a cross-reference (CRR) score by comparing the plurality of rephrased queries, the input user queries and a gold-standard rephrased query; (ii) computing a Named Entity Leakage (NEL) as the percentage of named entity terms in the plurality of rephrased queries occurring as part of false positives; (iii) evaluating the plurality of rephrased queries for retaining the maximum original semantics of a query q using the CRR score and the NEL; and (iv) computing the precision of the label "LOW" of T1.
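The following is a minimal, non-authoritative sketch of the detect-and-rephrase loop recited in claims 1, 6 and 11. It assumes the trained enterprise data leakage mitigation model is exposed through a single text-to-text generate(prompt) callable; the instruction prefixes T1_PREFIX, T2_PREFIX and T3_PREFIX, the HIGH/LOW labels and the round limit are hypothetical placeholders, not wording taken from the disclosure.

```python
# Illustrative sketch only: `generate` stands in for the fine-tuned enterprise
# data leakage mitigation model; prefixes, labels and the round limit are
# assumed placeholders.
from typing import Callable, List, Tuple

T1_PREFIX = "Classify the sensitive data leakage level of this query as HIGH or LOW: "
T2_PREFIX = "Rephrase this query to reduce sensitive data leakage while keeping its meaning: "
T3_PREFIX = "List the types of sensitive data leaked by this query: "


def mitigate_query(query: str,
                   generate: Callable[[str], str],
                   max_rounds: int = 3) -> Tuple[str, str, List[str]]:
    """Iteratively classify (T1), type-check (T3) and rephrase (T2) a query
    until the model labels its leakage level LOW or the round limit is hit."""
    current = query
    level: str = "HIGH"
    leak_types: List[str] = []
    for _ in range(max_rounds):
        level = generate(T1_PREFIX + current).strip().upper()                      # task T1
        leak_types = [t.strip() for t in generate(T3_PREFIX + current).split(",") if t.strip()]  # task T3
        if level == "LOW":
            break                                                                   # safe to forward
        # Task T2: take the first candidate rephrasing and re-check it next round.
        candidates = [c.strip() for c in generate(T2_PREFIX + current).splitlines() if c.strip()]
        if candidates:
            current = candidates[0]
    return current, level, leak_types
```

In practice the rephrasing candidate could be selected by scoring every T2 candidate with T1 and keeping the one with the lowest leakage level; the sketch simply takes the first candidate for brevity.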
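Claims 2, 7 and 12 gate access to the external LLM on the outcome of that loop. A small hedged sketch, reusing mitigate_query from the previous sketch and assuming hypothetical external_llm and notify_user callables supplied by the enterprise platform:

```python
# Hypothetical gating wrapper: forwards only LOW-leakage queries to the
# external LLM and intimates the detected leakage types to the user otherwise.
from typing import Callable, List, Optional


def gated_llm_call(query: str,
                   generate: Callable[[str], str],
                   external_llm: Callable[[str], str],
                   notify_user: Callable[[List[str]], None]) -> Optional[str]:
    """Forward the (possibly rephrased) query only when its leakage level is LOW."""
    safe_query, level, leak_types = mitigate_query(query, generate)
    if level != "LOW":
        notify_user(leak_types)   # tell the user what kinds of data the query would leak
        return None               # the query is not forwarded to the external LLM
    return external_llm(safe_query)
```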
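Claims 3, 8 and 13 describe building instruction-prefixed (input, output) training pairs for T1, T2 and T3 and fine-tuning a lightweight language model in stages (T1 alone, then T1 with T3, then all three tasks). The sketch below is an assumed data-preparation and staging outline; the finetune and validation_loss callables, the record schema and the prefixes are placeholders, and the debiasing and epoch-selection details of the disclosure are not reproduced.

```python
# Hedged sketch of multi-task training data layout and staged fine-tuning.
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (input text, expected output text)

PREFIX = {
    "T1": "Classify the sensitive data leakage level of this query as HIGH or LOW: ",
    "T2": "Rephrase this query to reduce sensitive data leakage while keeping its meaning: ",
    "T3": "List the types of sensitive data leaked by this query: ",
}


def build_instances(records: List[dict]) -> Dict[str, List[Pair]]:
    """records: dicts with gold annotations 'query', 'level', 'types', 'rephrased'."""
    instances: Dict[str, List[Pair]] = {"T1": [], "T2": [], "T3": []}
    for r in records:
        instances["T1"].append((PREFIX["T1"] + r["query"], r["level"]))
        if r["level"] == "HIGH":
            instances["T3"].append((PREFIX["T3"] + r["query"], ", ".join(r["types"])))
            instances["T2"].append((PREFIX["T2"] + r["query"], r["rephrased"]))
    return instances


def staged_finetune(model,
                    instances: Dict[str, List[Pair]],
                    finetune: Callable[[object, List[Pair]], None],
                    validation_loss: Callable[[object], float],
                    k_epochs: int = 5):
    """Stage 1: T1 only; stage 2: T1 + T3; stage 3: T1 + T2 + T3."""
    best_loss = float("inf")
    for stage in (["T1"], ["T1", "T3"], ["T1", "T2", "T3"]):
        data = [pair for task in stage for pair in instances[task]]
        for _ in range(k_epochs):
            finetune(model, data)                      # one epoch of seq2seq fine-tuning
            best_loss = min(best_loss, validation_loss(model))  # track the optimum validation loss
    return model, best_loss
```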
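Claims 4, 9 and 14 evaluate T1 with recall and F1 and T3 with micro- and macro-averaged F1. A short sketch with scikit-learn, using made-up labels purely for illustration (the leakage-type names are not from the disclosure):

```python
# Toy evaluation of T1 (binary HIGH/LOW) and T3 (multi-label leakage types).
from sklearn.metrics import f1_score, recall_score
from sklearn.preprocessing import MultiLabelBinarizer

# T1: binary HIGH/LOW leakage-level classification.
t1_gold = ["HIGH", "LOW", "HIGH", "LOW"]
t1_pred = ["HIGH", "HIGH", "HIGH", "LOW"]
print("T1 recall:", recall_score(t1_gold, t1_pred, pos_label="HIGH"))
print("T1 F1:", f1_score(t1_gold, t1_pred, pos_label="HIGH"))

# T3: multi-label leakage-type detection, scored with micro/macro averaged F1.
mlb = MultiLabelBinarizer()
t3_gold = [{"client name", "financial data"}, {"credentials"}, set()]
t3_pred = [{"client name"}, {"credentials"}, {"financial data"}]
y_true = mlb.fit_transform(t3_gold)
y_pred = mlb.transform(t3_pred)
print("T3 micro F1:", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("T3 macro F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
```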
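Claims 5, 10 and 15 score T2 with a cross-reference (CRR) score and a Named Entity Leakage (NEL) percentage. The sketch below covers only NEL, under the assumption that it measures how many named-entity terms of the original query survive in the rephrased queries; the entity extractor is passed in so any NER tool can be plugged in, and the exact false-positive-based definition in the claim is not reproduced.

```python
# Hedged NEL sketch: percentage of original-query entity terms that reappear
# in the rephrased queries. `extract_entities` is a caller-supplied NER hook.
from typing import Callable, Iterable, Set


def named_entity_leakage(original: str,
                         rephrased: Iterable[str],
                         extract_entities: Callable[[str], Set[str]]) -> float:
    """Return 0.0 when the original query has no named entities."""
    source = {e.lower() for e in extract_entities(original)}
    if not source:
        return 0.0
    leaked: Set[str] = set()
    for query in rephrased:
        leaked |= source & {e.lower() for e in extract_entities(query)}
    return 100.0 * len(leaked) / len(source)
```

For example, with a toy extractor such as lambda text: {w for w in text.split() if w.istitle()}, rephrasings that still mention a client's name raise the NEL score, while fully anonymized rephrasings score 0.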
Description
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
The present application claims priority to Indian application no. 202421084709, filed on November 5, 2024.
TECHNICAL FIELD
The disclosure herein generally relates to the field of machine learning and, more particularly, to a method and system for mitigating enterprise data leakage in queries to large language models.
BACKGROUND
The rapid advancement of Generative AI (Gen-AI), especially Large Language Models (LLMs), has significantly improved productivity across various industries. These models, capable of understanding and generating human-like text, save considerable time in tasks that traditionally required extensive human effort. This efficiency allows businesses to enhance throughput without sacrificing output quality. AI is emerging as a tool that augments human capabilities, and by integrating AI, businesses can maintain a competitive edge. Companies that adopted AI experienced substantial productivity gains over those that did not, and this disparity has further expanded with the introduction of Gen-AI. However, the privacy, security and safety implications of Gen-AI demand special investigation. It has been observed that sensitive details inadvertently surface in model outputs, since these models are trained on gargantuan datasets. The accurate and coherent performance of LLMs emerges from their ability to memorize rare training samples, and this poses significant privacy threats when the datasets used to train them contain sensitive data. Conversely, there is also potential for data leakage to an LLM through user queries, as humans are the weakest link in security and privacy. LLM service providers may use this interaction data for further model training, and the model may consequently spill the same sensitive data that was once sent as a query when it is attacked. This risk is further exacerbated when employees of companies, in attempts to gain a competitive edge, leak confidential company data through their prompts to an external LLM service such as ChatGPT or Google Gemini. Despite the confidentiality guarantees provided by LLM service providers, there have been unintentional instances where chat data was leaked. This concern has led some companies to enforce an organizational ban on chat models. Such restrictions severely impact the competitive edge of a company, especially if competent in-house alternatives are not provided. There is an increasing need for a privacy-preserving prompting solution that not only safeguards against data leakage, but also ensures that the utility provided by powerful external LLMs like GPT-4o is not impacted.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for mitigating enterprise data leakage in queries to large language models (LLMs) is provided. The method includes receiving, by one or more hardware processors, an input user query associated with querying a plurality of large language models (LLMs). Further, the method includes computing, by the one or more hardware processors, a sensitive data leakage level associated with the input user query by prompting a trained enterprise data leakage mitigation model with a first set of specific instructions (T1) as a prefix to the input user query, wherein the sensitive data leakage level is classified as one of (i) high and (ii) low based on an associated threshold value.
Furthermore, the method includes generating, by the one or more hardware processors, via the trained enterprise data leakage mitigation model, if the sensitive data leakage level is higher than a predefined threshold, a plurality of rephrased queries associated with the input user query that retain the semantics of the input user query and reduce the sensitive data leakage level, by prompting the trained enterprise data leakage mitigation model with a second set of specific instructions (T2) as a prefix to the input user query. Furthermore, the method includes simultaneously identifying, by the one or more hardware processors, types of sensitive data leakage associated with the input user query by prompting the trained enterprise data leakage mitigation model with a third set of specific instructions (T3) as a prefix to the input user query. Finally, the method includes repeating, by the one or more hardware processors, the above steps until an optimal rephrased query with a sensitive data leakage level less than the predefined threshold is generated. In another aspect, a system for mitigating enterprise data leakage in queries to large language models (LLMs) is provided. The system includes at least one memory storing programmed instructions, one or more Input/Output (I/O) interfaces, and one or more hardware processors operatively coupled to the at least one memory, wherein the one or more hardware processors are configured by the programmed instructions to