US-12619781-B2 - Verbatim feedback processing for compliant retention

Abstract

A verbatim feedback processing system utilizes generative Artificial Intelligence (AI) that has been trained to rewrite text in a manner that retains the meaning, tone, sentiment, and the like, while also anonymizing the text by removing personally identifiable information. The anonymized feedback is then processed by a similarity validation component and an anonymity validation component. The similarity validation component determines whether the anonymized feedback is within a predetermined similarity threshold of the original feedback. The anonymity validation component determines whether the anonymized feedback satisfies anonymity requirements. If the anonymized feedback satisfies both the similarity and anonymity requirements, the anonymized feedback is stored in an anonymized feedback database.

Inventors

Maanasa GHANTASALA
Mastafa Hamza FOUFA

Assignees

MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date: 2026-05-05
Application Date: 2024-04-30

Claims (20)

1. A data processing system comprising:
a processor; and
a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor alone or in combination with other processors, cause the data processing system to perform functions of:
receiving a verbatim feedback item from a computing device via a network, the verbatim feedback item comprising natural language text entered via a user input device and comprising unstructured content including sentences and expressions;
generating a feedback prompt that includes the verbatim feedback item and providing the feedback prompt to an anonymizer model, the anonymizer model comprising a generative language model trained to process the natural language input to rewrite the text while preserving linguistic characteristics including meaning, tone, and sentiment, the anonymizer model being trained to generate an anonymized feedback item based on the verbatim feedback item included in the feedback prompt, the anonymized feedback item comprising natural language text;
providing the anonymized feedback item and the verbatim feedback item to a similarity validation component which generates a similarity score based on a similarity metric indicative of a level of similarity between the anonymized feedback item and the verbatim feedback item, wherein the similarity validation component processes the verbatim feedback item and the anonymized feedback item as natural language input by converting each into a semantic embedding representing linguistic meaning and context, and computes the similarity score using a similarity metric that quantifies semantic relatedness between the semantic embeddings;
providing the anonymized feedback item to an anonymity validation component which generates an anonymity score indicative of whether the anonymized feedback item includes personally identifiable information;
in response to the similarity score being within a predefined threshold for the similarity score and the anonymity score indicating that the anonymized feedback does not include personally identifiable information, saving the anonymized feedback item to an anonymized feedback store; and
in response to the similarity score not being within the predefined threshold for the similarity score or the anonymity score indicating that the anonymized feedback does include personally identifiable information, discarding the anonymized feedback item without saving the anonymized feedback item in the anonymized feedback store.
2. The data processing system of claim 1, wherein saving the anonymized feedback item includes saving metadata in association with the anonymized feedback item that identifies at least one product, application, service, and experience that the anonymized feedback item is directed to.
3. The data processing system of claim 1, wherein the similarity metric comprises cosine similarity.
4. The data processing system of claim 3, wherein the predefined threshold for the similarity score is 0 to 0.8.
5. The data processing system of claim 1, wherein the functions further comprise: transforming the anonymized feedback item to an anonymized sentence embedding and the verbatim feedback item to a verbatim sentence embedding using a pre-trained Sentence BERT (SBERT) model.
6. The data processing system of claim 1, wherein the anonymity validation component comprises at least one artificial intelligence (AI) model or machine learning model trained to recognize personal identifiable information in natural language text.
7. The data processing system of claim 1, wherein the functions further comprise: saving a plurality of anonymized feedback items in the anonymized feedback store; and processing the plurality of anonymized feedback items using a feedback analysis component, the feedback analysis component being configured to generate feedback analysis data pertaining to the plurality of anonymized feedback items, the feedback analysis data including insights and/or action items pertaining to a product, application, service, or experience to which the plurality of anonymized feedback items pertain.
8. The data processing system of claim 1, wherein the functions further comprise: generating a feedback request that includes a survey, at least one question, and/or a request for comments and displaying the feedback request in a user interface of a computing device, wherein the verbatim feedback item is received in response to the feedback request.
9. A method of processing verbatim feedback, the method comprising:
receiving a verbatim feedback item from a computing device via a network, the verbatim feedback item comprising natural language text entered via a user input device and comprising unstructured content including sentences and expressions;
generating a feedback prompt that includes the verbatim feedback item and providing the feedback prompt to an anonymizer model, the anonymizer model comprising a generative language model trained to process the natural language input to rewrite the text while preserving linguistic characteristics including meaning, tone, and sentiment, the anonymizer model being trained to generate an anonymized feedback item based on the verbatim feedback item included in the feedback prompt, the anonymized feedback item comprising natural language text;
providing the anonymized feedback item and the verbatim feedback item to a similarity validation component which generates a similarity score based on a similarity metric indicative of a level of similarity between the anonymized feedback item and the verbatim feedback item, wherein the similarity validation component processes the verbatim feedback item and the anonymized feedback item as natural language input by converting each into a semantic embedding representing linguistic meaning and context, and computes the similarity score using a similarity metric that quantifies semantic relatedness between the semantic embeddings;
providing the anonymized feedback item to an anonymity validation component which generates an anonymity score indicative of whether the anonymized feedback item includes personally identifiable information;
in response to the similarity score being within a predefined threshold for the similarity score and the anonymity score indicating that the anonymized feedback does not include personally identifiable information, saving the anonymized feedback item to an anonymized feedback store; and
in response to the similarity score not being within the predefined threshold for the similarity score or the anonymity score indicating that the anonymized feedback does include personally identifiable information, discarding the anonymized feedback item without saving the anonymized feedback item in the anonymized feedback store.
10. The method of claim 9, wherein saving the anonymized feedback item includes saving metadata in association with the anonymized feedback item that identifies at least one product, application, service, and experience that the anonymized feedback item is directed to.
11. The method of claim 9, wherein the similarity metric comprises cosine similarity.
12. The method of claim 11, wherein the predefined threshold for the similarity score is 0 to 0.8.
13. The method of claim 9, further comprising: transforming the anonymized feedback item to an anonymized sentence embedding and the verbatim feedback item to a verbatim sentence embedding using a pre-trained Sentence BERT (SBERT) model.
14. The method of claim 9, wherein the anonymity validation component comprises at least one artificial intelligence (AI) model or machine learning model trained to recognize personal identifiable information in natural language text.
15. The method of claim 9, further comprising: saving a plurality of anonymized feedback items in the anonymized feedback store; and processing the plurality of anonymized feedback items using a feedback analysis component, the feedback analysis component being configured to generate feedback analysis data pertaining to the plurality of anonymized feedback items, the feedback analysis data including insights and/or action items pertaining to a product, application, service, or experience to which the plurality of anonymized feedback items pertain.
16. The method of claim 9, further comprising: generating a feedback request that includes a survey, at least one question, and/or a request for comments and displaying the feedback request in a user interface of a computing device, wherein the verbatim feedback item is received in response to the feedback request.
17. A non-transitory computer readable medium on which are stored instructions that, when executed, cause a programmable device to perform functions of:
receiving a verbatim feedback item from a computing device via a network, the verbatim feedback item comprising natural language text entered via a user input device and comprising unstructured content including sentences and expressions;
generating a feedback prompt that includes the verbatim feedback item and providing the feedback prompt to an anonymizer model, the anonymizer model comprising a generative language model trained to process the natural language input to rewrite the text while preserving linguistic characteristics including meaning, tone, and sentiment, the anonymizer model being trained to generate an anonymized feedback item based on the verbatim feedback item included in the feedback prompt, the anonymized feedback item comprising natural language text;
providing the anonymized feedback item and the verbatim feedback item to a similarity validation component which generates a similarity score based on a similarity metric indicative of a level of similarity between the anonymized feedback item and the verbatim feedback item, wherein the similarity validation component processes the verbatim feedback item and the anonymized feedback item as natural language input by converting each into a semantic embedding representing linguistic meaning and context, and computes the similarity score using a similarity metric that quantifies semantic relatedness between the semantic embeddings;
providing the anonymized feedback item to an anonymity validation component which generates an anonymity score indicative of whether the anonymized feedback item includes personally identifiable information;
in response to the similarity score being within a predefined threshold for the similarity score and the anonymity score indicating that the anonymized feedback does not include personally identifiable information, saving the anonymized feedback item to an anonymized feedback store; and
in response to the similarity score not being within the predefined threshold for the similarity score or the anonymity score indicating that the anonymized feedback does include personally identifiable information, discarding the anonymized feedback item without saving the anonymized feedback item in the anonymized feedback store.
18. The non-transitory computer readable medium of claim 17, wherein the similarity metric comprises cosine similarity.
19. The non-transitory computer readable medium of claim 17, wherein the functions further comprise: transforming the anonymized feedback item to an anonymized sentence embedding and the verbatim feedback item to a verbatim sentence embedding using a pre-trained Sentence BERT (SBERT) model.
20. The non-transitory computer readable medium of claim 17, wherein the functions further comprise: saving a plurality of anonymized feedback items in the anonymized feedback store; and processing the plurality of anonymized feedback items using a feedback analysis component, the feedback analysis component being configured to generate feedback analysis data pertaining to the plurality of anonymized feedback items, the feedback analysis data including insights and/or action items pertaining to a product, application, service, or experience to which the plurality of anonymized feedback items pertain.
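
Illustrative note (not part of the claims): several dependent claims recite cosine similarity computed over sentence embeddings and a predefined threshold of 0 to 0.8. The following is a minimal Python sketch of that computation under stated assumptions. The embeddings are plain lists of floats standing in for the output of an SBERT model, the function names `cosine_similarity` and `within_threshold` are hypothetical, and treating the threshold as an inclusive range is an assumption, since the claims do not say how the bounds are applied.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def within_threshold(score: float, lower: float, upper: float) -> bool:
    """Assumed interpretation: the score passes if it lies in [lower, upper]."""
    return lower <= score <= upper


# Toy vectors standing in for the verbatim and anonymized sentence embeddings.
verbatim_embedding = [1.0, 1.0]
anonymized_embedding = [1.0, 0.0]
score = cosine_similarity(verbatim_embedding, anonymized_embedding)
passes = within_threshold(score, 0.0, 0.8)  # bounds mirror the recited range
```

In a real deployment the two vectors would come from an embedding model rather than being written by hand; only the scoring and range check are shown here.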

Description

BACKGROUND

Users provide written feedback pertaining to products, applications, services, experiences, and the like through various forums. This feedback is used in insight generation processes to guide development, prioritize action items, resolve confusion, and improve customer satisfaction. However, current data privacy regulations limit how feedback data can be collected, stored, and used. For example, data privacy regulations may limit how long user feedback data can be stored. These regulations depend at least in part on whether feedback data includes personal information. Data retention time limits pose challenges because processing large amounts of written text to generate useful information can be time-consuming. Data retention time limits can be avoided by anonymizing feedback data. However, anonymizing feedback data faces the same challenge as insight generation processing, as it also requires processing large amounts of written text within data retention time limits. In addition, anonymizing feedback data can alter the meaning, tone, or sentiment of the feedback, which defeats the purpose of verbatim feedback analysis. Thus, what is needed is a method of anonymizing verbatim feedback data that overcomes the limitations and challenges faced by previously known feedback processing systems.

SUMMARY

In one general aspect, the instant disclosure presents a data processing system having a processor and a memory in communication with the processor wherein the memory stores executable instructions that, when executed by the processor alone or in combination with other processors, cause the data processing system to perform multiple functions.
The functions include receiving a verbatim feedback item from a computing device via a network, the verbatim feedback item comprising natural language text; generating a feedback prompt that includes the verbatim feedback item and providing the feedback prompt to an anonymizer model, the anonymizer model being trained to generate an anonymized feedback item based on the verbatim feedback item included in the feedback prompt, the anonymized feedback item comprising natural language text; providing the anonymized feedback item and the verbatim feedback item to a similarity validation component which generates a similarity score based on a similarity metric indicative of a level of similarity between the anonymized feedback item and the verbatim feedback item; providing the anonymized feedback item to an anonymity validation component which generates an anonymity score indicative of whether the anonymized feedback item includes personally identifiable information; in response to the similarity score being within a predefined threshold for the similarity score and the anonymity score indicating that the anonymized feedback does not include personally identifiable information, saving the anonymized feedback item to an anonymized feedback store; and in response to the similarity score not being within the predefined threshold for the similarity score or the anonymity score indicating that the anonymized feedback does include personally identifiable information, discarding the anonymized feedback item without saving the anonymized feedback item in the anonymized feedback store. 
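
The sequence of functions described above (prompt generation, anonymization, the two validation checks, then save or discard) can be sketched in Python as follows. This is an illustrative sketch only, not the disclosed implementation. The class `FeedbackPipeline`, the helper `build_feedback_prompt`, the prompt wording, and the lambda stand-ins for the anonymizer model and validation components are all hypothetical. The anonymity score is simplified to a boolean, and the similarity threshold is modeled as an inclusive range, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def build_feedback_prompt(verbatim: str) -> str:
    # Hypothetical wording; the text only states that the prompt
    # includes the verbatim feedback item.
    return (
        "Rewrite the feedback below so that it keeps its meaning, tone, and "
        "sentiment but contains no personally identifiable information.\n\n"
        + verbatim
    )


@dataclass
class FeedbackPipeline:
    anonymize: Callable[[str], str]          # stand-in for the anonymizer model
    similarity: Callable[[str, str], float]  # stand-in for similarity validation
    includes_pii: Callable[[str], bool]      # stand-in for anonymity validation
    lower: float                             # assumed inclusive lower bound
    upper: float                             # assumed inclusive upper bound
    store: List[str] = field(default_factory=list)  # anonymized feedback store

    def process(self, verbatim: str) -> Optional[str]:
        """Return the saved anonymized item, or None if it was discarded."""
        anonymized = self.anonymize(build_feedback_prompt(verbatim))
        score = self.similarity(anonymized, verbatim)
        similar_enough = self.lower <= score <= self.upper
        if similar_enough and not self.includes_pii(anonymized):
            self.store.append(anonymized)
            return anonymized
        return None  # discarded without being saved


# Usage with trivial stand-ins for the model and the validation components.
pipeline = FeedbackPipeline(
    anonymize=lambda prompt: "The app crashes when I log in.",
    similarity=lambda anonymized, verbatim: 0.5,
    includes_pii=lambda text: "@" in text,
    lower=0.0,
    upper=0.8,
)
saved = pipeline.process(
    "I'm Jane (jane@example.com) and the app crashes when I log in."
)
```

Passing the components in as callables keeps the save-or-discard decision separate from any particular model, so each stand-in can be replaced independently.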
In yet another general aspect, the instant disclosure presents a method of processing verbatim feedback that comprises receiving a verbatim feedback item from a computing device via a network, the verbatim feedback item comprising natural language text; generating a feedback prompt that includes the verbatim feedback item and providing the feedback prompt to an anonymizer model, the anonymizer model being trained to generate an anonymized feedback item based on the verbatim feedback item included in the feedback prompt, the anonymized feedback item comprising natural language text; providing the anonymized feedback item and the verbatim feedback item to a similarity validation component which generates a similarity score based on a similarity metric indicative of a level of similarity between the anonymized feedback item and the verbatim feedback item; providing the anonymized feedback item to an anonymity validation component which generates an anonymity score indicative of whether the anonymized feedback item includes personally identifiable information; in response to the similarity score being within a predefined threshold for the similarity score and the anonymity score indicating that the anonymized feedback does not include personally identifiable information, saving the anonymized feedback item to an anonymized feedback store; and in response to the similarity score not being within the predefined threshold for the similarity score or the anonymity score indicating that the anonymized feedback does include personally identifiable information, discarding the anonymized feedback item without saving the anonymized feedback item in the anonymized feedback