US-12625874-B2 - System, method, and computer program product for searching a plurality of documents based on a text string


Abstract

Provided are systems, methods, and computer program products for searching a plurality of documents based on a text string. The system includes at least one processor programmed or configured to identify a plurality of documents including a plurality of document types, each document of the plurality of documents including a document type, receive a text string based on user input, generate, with a machine-learning model, an ordered list of document types based on the text string, search the plurality of documents for the text string to identify a subset of documents based on similarity between the text string and each document of the subset of documents, rank the subset of documents based at least partially on the similarity, a document type of each document of the subset of documents, and the ordered list of document types, and generate a graphical user interface based on the ranked list of documents.
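The abstract does not fix how "similarity between the text string and each document" is measured. As a minimal sketch, assuming a bag-of-words cosine similarity and a hypothetical cutoff of 0.2, the search step that selects the subset of documents could look like the following. `Document`, `tokenize`, `cosine`, `search_subset`, and the threshold are illustrative names and values, not taken from the patent; ranking is left to a separate step.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record: identifier, document type, and full text.
    doc_id: str
    doc_type: str
    text: str


def tokenize(text: str) -> Counter:
    # Lowercased word counts; a stand-in for whatever representation is really used.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors (0.0 if either is empty).
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_subset(query: str, docs: list[Document], threshold: float = 0.2) -> list[tuple[Document, float]]:
    # Keep every document whose similarity to the text string meets the (assumed) threshold.
    q = tokenize(query)
    scored = [(doc, cosine(q, tokenize(doc.text))) for doc in docs]
    return [(doc, score) for doc, score in scored if score >= threshold]


docs = [
    Document("d1", "transcript", "The witness testified that the contract was signed in May."),
    Document("d2", "court order", "The motion to dismiss is denied."),
]
for doc, score in search_subset("the contract was signed in May", docs):
    print(doc.doc_id, round(score, 3))  # d1 0.825
```

A production system would more likely use embeddings or a full-text index; the threshold filter only illustrates how a subset, rather than the whole corpus, is passed on to ranking.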

Inventors

Jacqueline Grace Schafer
Jose Demetrio Saura

Assignees

CLEARBRIEF, INC.

Dates

Publication Date: May 12, 2026
Application Date: Nov. 4, 2024

Claims (20)

1. A system comprising at least one processor programmed or configured to: identify a plurality of documents comprising a plurality of document types, each document of the plurality of documents comprising a document type of the plurality of document types; receive a text string based on user input; search the plurality of documents for the text string to identify a subset of documents based on similarity between the text string and each document of the subset of documents; rank the subset of documents based at least partially on the similarity between the text string and each document of the subset of documents, a document type of each document of the subset of documents, and an ordered list of document types to generate a ranked list of documents; and generate a graphical user interface based on the ranked list of documents.
2. The system of claim 1, wherein the subset of documents is ranked based at least partially on a weight of a document type of each document of the subset of documents.
3. The system of claim 2, wherein the at least one processor is further configured to: apply the weight of each document to the ranked subset of documents such that an order of the ranked subset of documents changes.
4. The system of claim 2, wherein the document type of each document is based on metadata associated with the document.
5. The system of claim 1, wherein the ordered list of document types is output by a model based on inputting the text string into the model.
6. The system of claim 1, wherein the ordered list of document types comprises a prioritized list of document types, and wherein a highest priority document type of the prioritized list is a closest match to the text string.
7. The system of claim 1, wherein the at least one processor is further configured to: determine the document type for each document of the plurality of documents by processing each document with a classification model.
8. The system of claim 1, wherein the ordered list of document types is determined based on an assertion in the text string.
9. The system of claim 1, wherein ranking the subset of documents comprises: ranking the subset of documents in an order based on a similarity score of the similarity between the text string and each document of the subset of documents; and re-ranking the subset of documents based on applying a weight to each document based on the document type and the ordered list of documents.
10. A method comprising: identifying, with at least one processor, a plurality of documents comprising a plurality of document types, each document of the plurality of documents comprising a document type of the plurality of document types; receiving, with at least one processor, a text string based on user input; searching, with at least one processor, the plurality of documents for the text string to identify a subset of documents based on similarity between the text string and each document of the subset of documents; ranking, with at least one processor, the subset of documents based at least partially on the similarity between the text string and each document of the subset of documents, a document type of each document of the subset of documents, and an ordered list of document types to generate a ranked list of documents; and generating, with at least one processor, a graphical user interface based on the ranked list of documents.
11. The method of claim 10, wherein the subset of documents is ranked based at least partially on a weight of a document type of each document of the subset of documents.
12. The method of claim 11, further comprising: applying the weight of each document to the ranked subset of documents such that an order of the ranked subset of documents changes.
13. The method of claim 11, wherein the document type of each document is based on metadata associated with the document.
14. The method of claim 10, wherein the ordered list of document types is output by a model based on inputting the text string into the model.
15. The method of claim 10, wherein the ordered list of document types comprises a prioritized list of document types, and wherein a highest priority document type of the prioritized list is a closest match to the text string.
16. The method of claim 10, further comprising: determining the document type for each document of the plurality of documents by processing each document with a classification model.
17. The method of claim 10, wherein the ordered list of document types is determined based on an assertion in the text string.
18. The method of claim 10, wherein ranking the subset of documents comprises: ranking the subset of documents in an order based on a similarity score of the similarity between the text string and each document of the subset of documents; and re-ranking the subset of documents based on applying a weight to each document based on the document type and the ordered list of documents.
19. A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: identify a plurality of documents comprising a plurality of document types, each document of the plurality of documents comprising a document type of the plurality of document types; receive a text string based on user input; search the plurality of documents for the text string to identify a subset of documents based on similarity between the text string and each document of the subset of documents; rank the subset of documents based at least partially on the similarity between the text string and each document of the subset of documents, a document type of each document of the subset of documents, and an ordered list of document types to generate a ranked list of documents; and generate a graphical user interface based on the ranked list of documents.
20. The computer program product of claim 19, wherein the subset of documents is ranked based at least partially on a weight of a document type of each document of the subset of documents.
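Claims 9 and 18 describe a two-stage ranking: order the subset by similarity score, then re-rank by applying a weight tied to each document's type and the ordered list of document types; claims 3 and 12 note that applying the weight can change the order. The claims do not say how the weights are derived or combined. The following minimal sketch assumes a hypothetical reciprocal-position scheme (weight 1/(1+i) for the i-th type in the ordered list, a lower fallback weight for types absent from the list) and multiplies it into the similarity score; `type_weights` and `rank_then_rerank` are illustrative names only.

```python
def type_weights(ordered_types: list[str]) -> dict[str, float]:
    # Hypothetical scheme: the i-th type in the ordered list gets weight 1/(1+i).
    return {doc_type: 1.0 / (1 + i) for i, doc_type in enumerate(ordered_types)}


def rank_then_rerank(scored: list[tuple[str, str, float]], ordered_types: list[str]) -> list[str]:
    # scored holds (doc_id, doc_type, similarity) triples from the search step.
    # Stage 1: rank by similarity score alone.
    first_pass = sorted(scored, key=lambda item: item[2], reverse=True)
    # Stage 2: re-rank by similarity multiplied by the document-type weight.
    weights = type_weights(ordered_types)
    fallback = 1.0 / (1 + len(ordered_types))  # assumed: unlisted types rank below listed ones
    reranked = sorted(
        first_pass,
        key=lambda item: item[2] * weights.get(item[1], fallback),
        reverse=True,
    )
    return [doc_id for doc_id, _, _ in reranked]


scored = [
    ("brief-1", "brief", 0.90),
    ("depo-1", "transcript", 0.80),
    ("order-1", "court order", 0.70),
]
print(rank_then_rerank(scored, ["transcript", "court order", "brief"]))
# ['depo-1', 'order-1', 'brief-1']
```

By similarity alone, `brief-1` would rank first; once the type weights are applied, the transcript moves to the top, which is the order change claims 3 and 12 describe. Because Python's sort is stable, documents with equal weighted scores keep their first-pass (similarity) order.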

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/754,249, filed on Jun. 26, 2024, which claims the benefit of U.S. Provisional Patent Application No. 63/523,167, filed on Jun. 26, 2023, the disclosures of which are hereby incorporated by reference in their entireties.

BACKGROUND

1. Field

This disclosure relates generally to document processing and, in non-limiting embodiments or aspects, to systems, methods, and computer program products for searching a plurality of documents based on a text string.

2. Technical Considerations

When drafting a document, such as a legal brief, authors commonly omit citations to sources, particularly factual sources, to avoid disrupting their writing. In some examples, a user may note the need for a citation but plan to add it later, such as after the brief is drafted. For example, an author may type an assertion that they recall is supported by a factual document, such as a deposition transcript, but may not recall which document contains the support.

Existing word processing applications operate independently of the source documents that correspond to assertions in a textual document being authored within the word processing application. As a result, authors must operate several additional software applications and/or computing devices to locate and cite to source documents. Further, finding an associated document to cite can be difficult and resource-intensive, especially when a quotation, or a portion of it, appears in several different documents.

SUMMARY

According to non-limiting embodiments or aspects, provided is a system comprising at least one processor programmed or configured to: identify a plurality of documents comprising a plurality of document types, each document of the plurality of documents comprising a document type of the plurality of document types; receive a text string based on user input; generate, with a machine-learning model, an ordered list of document types based on the text string; search the plurality of documents for the text string to identify a subset of documents based on similarity between the text string and each document of the subset of documents; rank the subset of documents based at least partially on the similarity, a document type of each document of the subset of documents, and the ordered list of document types; and generate a graphical user interface based on the ranked list of documents.

In non-limiting embodiments or aspects, receiving the text string comprises identifying a selected portion of a textual document, the text string comprising the selected portion of the textual document and/or a portion of the textual document related to the selected portion.

In non-limiting embodiments or aspects, generating the ordered list of document types based on the text string comprises: determining at least one assertion based on the text string; and inputting a vector representing the at least one assertion into the machine-learning model, the machine-learning model configured to output the ordered list of document types.

In non-limiting embodiments or aspects, each document of the plurality of documents comprises metadata comprising the document type.

In non-limiting embodiments or aspects, the at least one processor is further programmed or configured to: generate a citation based on a document from the ranked list of documents.
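The Summary states that a vector representing the assertion is input into a machine-learning model that outputs the ordered list of document types, but it does not name the model. As a minimal sketch, a linear classifier with a softmax stands in for that model here: one weight row and bias per document type, with types sorted by probability so the closest match comes first. The document types are taken from the examples in this disclosure; the vector is assumed to come from some upstream embedding of the assertion, and `ordered_document_types`, `DOC_TYPES`, and all weight values are hypothetical placeholders, not a trained model.

```python
import math

# Document types drawn from the examples in this disclosure.
DOC_TYPES = ["court order", "transcript", "brief", "pleading"]


def softmax(logits: list[float]) -> list[float]:
    # Numerically stable softmax over per-type scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordered_document_types(
    vector: list[float], weights: list[list[float]], biases: list[float]
) -> list[tuple[str, float]]:
    # Hypothetical linear model: one weight row and one bias per document type.
    if len(weights) != len(DOC_TYPES) or len(biases) != len(DOC_TYPES):
        raise ValueError("need one weight row and one bias per document type")
    logits = [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)
    # Highest-probability type first, i.e. the closest match to the text string.
    return sorted(zip(DOC_TYPES, probs), key=lambda pair: pair[1], reverse=True)


# Made-up weights for a 3-dimensional assertion vector (illustration only).
weights = [
    [0.1, 0.2, 0.0],  # court order
    [1.5, 0.0, 0.4],  # transcript
    [0.3, 0.1, 0.2],  # brief
    [0.0, 0.5, 0.1],  # pleading
]
ranked = ordered_document_types([1.0, 0.0, 0.5], weights, [0.0, 0.0, 0.0, 0.0])
print([doc_type for doc_type, _ in ranked])
# ['transcript', 'brief', 'court order', 'pleading']
```

Keeping the probabilities alongside the ordered types would let a downstream ranking step derive type weights from model confidence rather than from list position alone, though the disclosure does not say which it uses.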

In non-limiting embodiments or aspects, the plurality of document types comprises at least one of the following: court orders, transcripts, briefs, pleadings, or any combination thereof.

In non-limiting embodiments or aspects, the at least one processor is further programmed or configured to: classify, with at least one classification model, each document of the plurality of documents, such that each document is classified as at least one document type.

In non-limiting embodiments or aspects, receiving the text string comprises: receiving the user input; identifying the text string based on the user input; and pre-processing the text string.

According to non-limiting embodiments or aspects, provided is a computer-implemented method comprising: identifying, with at least one processor, a plurality of documents comprising a plurality of document types, each document of the plurality of documents comprising a document type of the plurality of document types; receiving, with at least one processor, a text string based on user input; generating, with at least one processor and a machine-learning model, an ordered list of document types based on the text string; searching, with at least one processor, the plurality of documents for the text string to identify a subset of documents based on similarity between the text string and each document of the subset of documents; ranking, with at least one processor, the subset of documents based at least partially on the similarity, a document type of