US-12619827-B2 - System and method for intelligent generation of privilege logs
Abstract
Systems, methods, and computer readable media for intelligent generation of a privilege log are provided. These techniques may include accessing a corpus of documents and applying an unsupervised machine learning model thereto to identify a plurality of topics. Using the identified topics, the techniques include associating a plurality of categories with the identified topics and executing a classifier training model to train classifiers corresponding to the categories. These classifiers are then applied to the corpus of documents to label the documents. Subsequently, the techniques automatically generate a privilege log based upon the labels applied to the documents by the classifiers.
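The flow described in the abstract can be sketched as below. This is a minimal illustration, not the patented implementation. The topic names, the `TOPIC_TO_CATEGORY` mapping, the keyword-based stand-in classifiers, and the log columns are all assumptions made for the example; the source specifies none of them. Topic discovery by the unsupervised model is assumed to have already happened, and no classifier training is shown.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical topics (as if found by an unsupervised model) mapped to
# hypothetical privilege categories. Neither list comes from the source.
TOPIC_TO_CATEGORY = {
    "legal advice": "attorney-client",
    "litigation strategy": "work-product",
    "draft pleadings": "work-product",
}

# Stand-in "classifiers": one keyword test per category. A real system
# would train a model per category; this only illustrates the data flow.
CATEGORY_KEYWORDS = {
    "attorney-client": ("counsel", "legal advice"),
    "work-product": ("draft", "strategy"),
}


@dataclass
class Document:
    doc_id: str
    text: str
    labels: list = field(default_factory=list)


def label_documents(docs):
    """Apply every category classifier to every document."""
    for doc in docs:
        lowered = doc.text.lower()
        for category in sorted(set(TOPIC_TO_CATEGORY.values())):
            if any(word in lowered for word in CATEGORY_KEYWORDS[category]):
                doc.labels.append(category)
    return docs


def generate_privilege_log(docs):
    """Write one CSV row per labeled document; unlabeled ones are omitted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["doc_id", "privilege_basis"])
    for doc in docs:
        if doc.labels:
            writer.writerow([doc.doc_id, "; ".join(doc.labels)])
    return buffer.getvalue()


corpus = [
    Document("DOC-001", "Per outside counsel, here is the legal advice you asked for."),
    Document("DOC-002", "Lunch menu for Friday."),
    Document("DOC-003", "Attached is the draft of our litigation strategy memo."),
]
privilege_log = generate_privilege_log(label_documents(corpus))
print(privilege_log)
```

In this sketch only DOC-001 and DOC-003 reach the log, because the log is built purely from the labels the classifiers attached to each document.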
Inventors
- Jasneet Singh Sabharwal
- Somya Anand
- Ayushi Dalmia
Assignees
- RELATIVITY ODA LLC
Dates
- Publication Date
- 20260505
- Application Date
- 20230406
Claims (14)
- 1. A computer-implemented method for intelligent generation of a privilege log, the method comprising: accessing, by one or more processors, a corpus of documents, wherein the corpus of documents includes a first type of document and a second type of document; applying, by the one or more processors, an unsupervised machine learning model to the corpus of documents to identify a plurality of topics associated with the corpus of documents; associating, by the one or more processors, a plurality of categories with respective subsets of the plurality of topics; executing, by the one or more processors, a classifier training model to train a plurality of classifiers corresponding to the plurality of categories, wherein executing the classifier training model to train a classifier corresponding to a category comprises executing a first classifier training model to train a first classifier for the category; and executing a second classifier training model to train a second classifier for the category; detecting, by the one or more processors, that, for the first type of document, a performance metric for the first classifier for the category is greater than the performance metric for the second classifier for the category; detecting, by the one or more processors, that, for the second type of document, the performance metric for the second classifier for the category is greater than the performance metric for the first classifier for the category; applying, by the one or more processors, the first classifier for the category to documents of the first type of document; applying, by the one or more processors, the second classifier for the category to documents of the second type of document; and generating, by the one or more processors, a privilege log based upon the classifiers applied to documents in the corpus of documents.
- 2. The computer-implemented method of claim 1, wherein identifying a topic of the plurality of topics comprises: identifying, by the one or more processors, a cluster in a conceptual space generated by the unsupervised machine learning model.
- 3. The computer-implemented method of claim 2, further comprising: determining, by the one or more processors, that a first cluster and a second cluster exhibit a threshold amount of overlap; and corresponding, by the one or more processors, the topic of the plurality of topics to both of the first cluster and the second cluster.
- 4. The computer-implemented method of claim 1, further comprising: detecting, by the one or more processors, that a performance metric for the first classifier for the category is greater than a performance metric for the second classifier for the category; and selecting, by the one or more processors, the first classifier for the category to be the classifier corresponding to the category.
- 5. The computer-implemented method of claim 1, further comprising: presenting, by the one or more processors, a user interface that enables a user to (i) modify the categories included in the plurality of categories and/or (ii) define a rule that documents must satisfy to be associated with the category.
- 6. The computer-implemented method of claim 1, wherein executing the classifier training model to train the plurality of classifiers comprises: generating, by the one or more processors, a seed set of documents from the corpus of documents, wherein the seed set of documents include a threshold number of documents associated with each topic in the plurality of topics; and executing, by the one or more processors, the classifier training model on the seed set of documents.
- 7. The computer-implemented method of claim 1, wherein generating the privilege log comprises: inputting, by the one or more processors, a labeled document into a generative artificial intelligence model to generate a natural language description associated with the labeled document's inclusion in the privilege log.
- 8. A system for intelligent generation of a privilege log, the system comprising: one or more processors; a communication interface communicatively coupled to a document storage system storing a corpus of documents; and one or more memories storing non-transitory, computer-readable instructions that, when executed by the one or more processors, cause the system to: access, via the communication interface, the corpus of documents, wherein the corpus of documents includes a first type of document and a second type of document; apply an unsupervised machine learning model to the corpus of documents to identify a plurality of topics associated with the corpus of documents; associate a plurality of categories with respective subsets of the plurality of topics; execute a classifier training model to train a plurality of classifiers respectively corresponding to categories in the plurality of categories, wherein to execute the classifier training model to train a classifier corresponding to a category comprises executing a first classifier training model to train a first classifier for the category and executing a second classifier training model to train a second classifier for the category; detect that, for the first type of document, a performance metric for the first classifier for the category is greater than the performance metric for the second classifier for the category; detect that, for the second type of document, the performance metric for the second classifier for the category is greater than the performance metric for the first classifier for the category; apply the first classifier for the category to documents that are the first type of document; apply the second classifier for the category to documents that are the second type of document; and generate a privilege log based upon the classifiers applied to documents in the corpus of documents.
- 9. The system of claim 8, wherein to identify a topic of the plurality of topics, the instructions, when executed, cause the system to: identify a cluster in a conceptual space generated by the unsupervised machine learning model.
- 10. The system of claim 9, wherein the instructions, when executed, cause the system to: determine that a first cluster and a second cluster exhibit a threshold amount of overlap; and correspond the topic of the plurality of topics to both of the first cluster and the second cluster.
- 11. The system of claim 8, wherein the instructions, when executed, cause the system to: detect that a performance metric for the first classifier for the category is greater than a performance metric for the second classifier for the category; and select the first classifier for the category to be the classifier corresponding to the category.
- 12. The system of claim 8, wherein to execute the classifier training model to train the plurality of classifiers, the instructions, when executed, cause the system to: generate a seed set of documents from the corpus of documents, wherein the seed set of documents include a threshold number of documents associated with each topic in the plurality of topics; and execute the classifier training model on the seed set of documents.
- 13. The system of claim 8, wherein to generate the privilege log, the instructions, when executed, cause the system to: input a labeled document into a generative artificial intelligence model to generate a natural language description associated with the labeled document's inclusion in the privilege log.
- 14. A non-transitory computer-readable storage medium storing processor-executable instructions, that when executed cause one or more processors to: access a corpus of documents, wherein the corpus of documents includes a first type of document and a second type of document; apply an unsupervised machine learning model to the corpus of documents to identify a plurality of topics associated with the corpus of documents; execute a classifier training model to train a plurality of classifiers respectively corresponding to topics in the plurality of topics, wherein executing the classifier training model to train a classifier corresponding to a category comprises executing a first classifier training model to train a first classifier for the category and executing a second classifier training model to train a second classifier for the category; detect that, for the first type of document, a performance metric for the first classifier for the category is greater than the performance metric for the second classifier for the category; detect that, for the second type of document, the performance metric for the second classifier for the category is greater than the performance metric for the first classifier for the category; apply the first classifier for the category to documents that are the first type of document; apply the second classifier for the category to documents that are the second type of document; and generate a privilege log by applying a set of rules that utilizes the classifiers applied to documents in the corpus of documents.
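Claims 1, 8, and 14 recite training two classifiers per category, comparing a performance metric separately for each document type, and applying whichever classifier scores higher for that type. A minimal sketch of that selection-and-routing step follows. The choice of F1 as the metric, the document types ("email", "spreadsheet"), the toy classifiers, and all function names are assumptions for illustration; the claims do not fix any of them.

```python
from collections import defaultdict


def f1_score(predictions, truths):
    """F1 over boolean labels; returns 0.0 when there are no true positives."""
    tp = sum(p and t for p, t in zip(predictions, truths))
    fp = sum(p and not t for p, t in zip(predictions, truths))
    fn = sum(t and not p for p, t in zip(predictions, truths))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_per_type(classifiers, validation):
    """Pick, for each document type, the classifier with the best F1.

    classifiers: name -> callable(text) -> bool
    validation:  list of (doc_type, text, is_in_category)
    """
    by_type = defaultdict(list)
    for doc_type, text, truth in validation:
        by_type[doc_type].append((text, truth))

    chosen = {}
    for doc_type, examples in by_type.items():
        truths = [truth for _, truth in examples]
        scores = {
            name: f1_score([clf(text) for text, _ in examples], truths)
            for name, clf in classifiers.items()
        }
        chosen[doc_type] = max(scores, key=scores.get)
    return chosen


def apply_routed(classifiers, chosen, doc_type, text):
    """Label a document with the classifier selected for its type."""
    return classifiers[chosen[doc_type]](text)


# Two toy classifiers for one hypothetical category ("attorney-client").
classifiers = {
    "first": lambda text: "counsel" in text.lower(),
    "second": lambda text: text.startswith("PRIV"),
}

# Hypothetical labeled validation examples, two per document type.
validation = [
    ("email", "Advice from counsel attached", True),
    ("email", "Team lunch on Friday", False),
    ("spreadsheet", "PRIV;fees;2021", True),
    ("spreadsheet", "Q3;revenue;2021", False),
]

chosen = select_per_type(classifiers, validation)
print(chosen)
```

With this toy data the first classifier is selected for emails and the second for spreadsheets, so `apply_routed` sends each document to the classifier that performed better on its own type rather than to a single overall winner.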
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application 63/327,989, entitled “SYSTEM AND METHOD FOR INTELLIGENT GENERATION OF PRIVILEGE LOGS,” filed on Apr. 6, 2022, the disclosure of which is hereby incorporated herein by reference.
FIELD OF THE DISCLOSURE
The present disclosure generally relates to the intelligent generation of privilege logs and, more specifically, to applying machine learning techniques to improve the accuracy of privilege logs generated for a corpus of documents.
BACKGROUND
In various applications, a need exists to identify a set of documents within a corpus of documents that are subject to one or more privilege rules. For example, during the discovery process for a litigation, a producing party is required to produce a corpus of documents that meets the discovery conditions. Within this corpus of documents, individual documents may be covered by one or more privileges, such as attorney-client privilege, attorney work product privilege, confidential data, and/or other types of privilege. Privileged documents need not be produced by the producing party. A privilege log is a document that indicates which documents are withheld from discovery and the particular reasoning why the document is subject to a privilege claim. This privilege log enables the requesting party to review the privilege claims made by the producing party.
In many discovery processes, the corpus of documents that meet the discovery request is voluminous, often exceeding millions of documents. Thus, manual review of the corpus of documents is often unable to produce a privilege log in a timely manner. Accordingly, automated techniques are often applied to identify the privileged documents without significantly delaying the legal process. With the introduction of automated processes, it is important to ensure that automated processes are applied in a manner that accurately reflects the privilege claims.
As a result, there is a need to develop intelligent privilege log generation techniques that improve the ability of automated systems to accurately identify privileged documents within a corpus of documents, thereby improving the functionality of the automated privilege log generation computing system itself.
BRIEF SUMMARY
In one aspect, a computer-implemented method for intelligent generation of a privilege log is provided. The method includes (1) accessing, by one or more processors, a corpus of documents; (2) applying, by the one or more processors, an unsupervised machine learning model to the corpus of documents to identify a plurality of topics associated with the corpus of documents; (3) executing, by the one or more processors, a classifier training model to train a plurality of classifiers respectively corresponding to topics in the plurality of topics; (4) applying, by the one or more processors, the classifiers to documents in the corpus of documents; and (5) generating, by the one or more processors, a privilege log by applying a set of rules that utilizes the classifiers applied to documents in the corpus of documents.
In another aspect, a system for intelligent generation of a privilege log is provided. The system includes (i) one or more processors; (ii) a communication interface communicatively coupled to a document storage system storing a corpus of documents; and (iii) one or more memories storing non-transitory, computer-readable instructions.
The instructions, when executed by the one or more processors, cause the system to (1) access, via the communication interface, the corpus of documents; (2) apply an unsupervised machine learning model to the corpus of documents to identify a plurality of topics associated with the corpus of documents; (3) execute a classifier training model to train a plurality of classifiers respectively corresponding to topics in the plurality of topics; (4) apply the classifiers to documents in the corpus of documents; and (5) generate a privilege log by applying a set of rules that utilizes the classifiers applied to documents in the corpus of documents.
In another aspect, a non-transitory computer-readable storage medium storing processor-executable instructions is provided. The instructions, when executed, cause one or more processors to (1) access a corpus of documents; (2) apply an unsupervised machine learning model to the corpus of documents to identify a plurality of topics associated with the corpus of documents; (3) execute a classifier training model to train a plurality of classifiers respectively corresponding to topics in the plurality of topics; (4) apply the classifiers to documents in the corpus of documents; and (5) generate a privilege log by applying a set of rules that utilizes the classifiers applied to documents in the corpus of documents.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts an example computing process in which a corpus of electronic communication documents is analyzed to produce a privileg