US-12626060-B2 - Systems and methods for facilitating text analysis

Abstract

A system for facilitating text analysis is configurable to (i) receive input text data comprising a set of reference text and at least a first set of text, wherein the set of reference text and the first set of text each comprise structured components; (ii) process the input text data utilizing a syntax and verb usage module of a natural language processing (NLP) layer; (iii) generate a mapping of structured components of the first set of text to structured components of the set of reference text by processing output of the syntax and verb usage module utilizing a similarity analysis module or a categorization module of the NLP layer; and (iv) generate an output depicting one or more aspects of the mapping.
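
The mapping step summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the bag-of-words `embed` function is a stand-in for whatever embeddings a similarity analysis module would generate, sentences play the role of structured components, and the names `embed`, `cosine`, `map_to_reference`, and the `threshold` value are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy embedding (assumption): lowercase word counts instead of a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_reference(reference, first, threshold=0.3):
    """Map each sentence of `first` to its most similar reference sentence.

    Returns {reference_index: [(first_index, score), ...]}. Sentences whose
    best score falls below `threshold` (hypothetical cutoff) stay unmapped.
    """
    ref_vecs = [embed(s) for s in reference]
    mapping = {i: [] for i in range(len(reference))}
    if not ref_vecs:
        return mapping
    for j, sentence in enumerate(first):
        vec = embed(sentence)
        scores = [cosine(vec, r) for r in ref_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            mapping[best].append((j, round(scores[best], 3)))
    return mapping


if __name__ == "__main__":
    reference = [
        "The pump shall deliver water at 5 liters per minute.",
        "The enclosure shall resist dust ingress.",
    ]
    first = [
        "Our pump design delivers water at 5 liters per minute.",
        "The enclosure gasket resists dust.",
        "Lunch is served at noon.",
    ]
    for ref_index, hits in map_to_reference(reference, first).items():
        print(ref_index, hits)
```

In this sketch the output groups first-text sentences under the reference sentence they map to, which mirrors the "output depicting one or more aspects of the mapping" only in a loose, illustrative sense.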

Inventors

JoAnna Jean Butler
Casey Hatcher Cooper
Jaden Kadesh Flint
Jason LeVoy Rogers
Keith Boyd Zook
Kyle Jordan Russell

Assignees

INTUITIVE RESEARCH AND TECHNOLOGY CORPORATION

Dates

Publication Date: 2026-05-12
Application Date: 2023-08-18

Claims (20)

1. A system for facilitating text analysis, comprising:
  one or more processors; and
  one or more hardware storage devices that store instructions that are executable by the one or more processors to configure the system to:
    process input text data utilizing a syntax and verb usage module of a natural language processing (NLP) layer, wherein the input text data comprises a set of reference text and at least a first set of text, wherein the set of reference text and the first set of text each comprise structured components;
    utilize output of the syntax and verb usage module as input to a categorization module of the NLP layer, wherein the categorization module is configured to map structured components of the first set of text to structured components of the set of reference text by generating embeddings based on the output of the syntax and verb usage module and categorizing the output of the syntax and verb usage module relative to predefined groupings determined via the categorization module based on the set of reference text;
    utilize the output of the syntax and verb usage module as input to a similarity analysis module of the NLP layer, wherein the similarity analysis module is configured to map structured components of the first set of text to structured components of the set of reference text by comparing, in an embedding space, (i) embeddings generated via the similarity analysis module based on the output of the syntax and verb usage module and (ii) embeddings generated via the similarity analysis module based on the set of reference text;
    fuse output of the categorization module and the similarity analysis module to generate a mapping of structured components of the first set of text to structured components of the set of reference text;
    generate and present, on a user interface, (i) an output depicting one or more aspects of the mapping and (ii) a prompt, wherein the output depicts, for at least one structured component of the set of reference text, an indication of one or more structured components of the first set of text that are mapped to the at least one structured component, and wherein the prompt is associated in the user interface with the indication of the one or more structured components of the first set of text;
    receive user input directed to the prompt, wherein the user input indicates rejection of the mapping of the one or more structured components of the first set of text to the at least one structured component of the set of reference text; and
    tune one or more modules of the NLP layer based on the user input.
2. The system of claim 1, wherein the structured components of the set of reference text and the first set of text comprise sentences.
3. The system of claim 1, wherein the set of reference text comprises one or more reference documents, and wherein the first set of text comprises one or more first documents.
4. The system of claim 3, wherein the structured components of the set of reference text indicate one or more requirements, and wherein the mapping comprises a mapping of structured components of the one or more first documents to the one or more requirements of the set of reference text.
5. The system of claim 1, wherein the input text data comprises output of one or more preprocessing modules.
6. The system of claim 1, wherein the output of the syntax and verb usage module is further processed by a topic analysis module prior to generating the mapping.
7. The system of claim 1, wherein the indication of the one or more structured components indicates a quantity or a confidence level associated with the one or more structured components.
8. The system of claim 7, wherein the confidence level comprises a normalized confidence score.
9. The system of claim 1, wherein the indication of the one or more structured components indicates text content or a location associated with the one or more structured components.
10. The system of claim 1, wherein the input text data further comprises a second set of text that comprises structured components, and wherein the instructions are executable by the one or more processors to further configure the system to: generate a second mapping of structured components of the second set of text to structured components of the set of reference text by processing output of the syntax and verb usage module utilizing the similarity analysis module or the categorization module of the NLP layer.
11. The system of claim 10, wherein the output depicting one or more aspects of the mapping further depicts one or more aspects of the second mapping.
12. A method for facilitating text analysis, comprising:
  processing input text data utilizing a syntax and verb usage module of a natural language processing (NLP) layer, wherein the input text data comprises a set of reference text and at least a first set of text, wherein the set of reference text and the first set of text each comprise structured components;
  utilizing output of the syntax and verb usage module as input to a categorization module of the NLP layer, wherein the categorization module is configured to map structured components of the first set of text to structured components of the set of reference text by generating embeddings based on the output of the syntax and verb usage module and categorizing the output of the syntax and verb usage module relative to predefined groupings determined via the categorization module based on the set of reference text;
  utilizing the output of the syntax and verb usage module as input to a similarity analysis module of the NLP layer, wherein the similarity analysis module is configured to map structured components of the first set of text to structured components of the set of reference text by comparing, in an embedding space, (i) embeddings generated via the similarity analysis module based on the output of the syntax and verb usage module and (ii) embeddings generated via the similarity analysis module based on the set of reference text;
  fusing output of the categorization module and the similarity analysis module to generate a mapping of structured components of the first set of text to structured components of the set of reference text;
  generating and presenting, on a user interface, (i) an output depicting one or more aspects of the mapping and (ii) a prompt, wherein the output depicts, for at least one structured component of the set of reference text, an indication of one or more structured components of the first set of text that are mapped to the at least one structured component, and wherein the prompt is associated in the user interface with the indication of the one or more structured components of the first set of text;
  receiving user input directed to the prompt, wherein the user input indicates rejection of the mapping of the one or more structured components of the first set of text to the at least one structured component of the set of reference text; and
  tuning one or more modules of the NLP layer based on the user input.
13. The method of claim 12, wherein the structured components of the set of reference text and the first set of text comprise sentences.
14. The method of claim 12, wherein the set of reference text comprises one or more reference documents, and wherein the first set of text comprises one or more first documents.
15. The method of claim 14, wherein the structured components of the set of reference text indicate one or more requirements, and wherein the mapping comprises a mapping of structured components of the one or more first documents to the one or more requirements of the set of reference text.
16. The method of claim 12, wherein the input text data comprises output of one or more preprocessing modules.
17. The method of claim 12, wherein the output of the syntax and verb usage module is further processed by a topic analysis module prior to generating the mapping.
18. The method of claim 12, wherein the indication of the one or more structured components indicates a quantity or a confidence level associated with the one or more structured components.
19. The method of claim 12, wherein the indication of the one or more structured components indicates text content or a location associated with the one or more structured components.
20. One or more hardware storage devices that store instructions that are executable by one or more processors of a system to configure the system to:
  process input text data utilizing a syntax and verb usage module of a natural language processing (NLP) layer, wherein the input text data comprises a set of reference text and at least a first set of text, wherein the set of reference text and the first set of text each comprise structured components;
  utilize output of the syntax and verb usage module as input to a categorization module of the NLP layer, wherein the categorization module is configured to map structured components of the first set of text to structured components of the set of reference text by generating embeddings based on the output of the syntax and verb usage module and categorizing the output of the syntax and verb usage module relative to predefined groupings determined via the categorization module based on the set of reference text;
  utilize the output of the syntax and verb usage module as input to a similarity analysis module of the NLP layer, wherein the similarity analysis module is configured to map structured components of the first set of text to structured components of the set of reference text by comparing, in an embedding space, (i) embeddings generated via the similarity analysis module based on the output of the syntax and verb usage module and (ii) embeddings generated via the similarity analysis module based on the set of reference text;
  fuse output of the categorization module and the similarity analysis module to generate a mapping of structured components of the first set of text to structured components of the set of reference text;
  generate and present, on a user interface, (i) an output depicting one or more aspects of the mapping and (ii) a prompt, wherein the output depicts, for at least one structured component of the set of reference text, an indication of one or more structured components of the first set of text that are mapped to the at least one structured component, and wherein the prompt is associated in the user interface with the indication of the one or more structured components of the first set of text;
  receive user input directed to the prompt, wherein the user input indicates rejection of the mapping of the one or more structured components of the first set of text to the at least one structured component of the set of reference text; and
  tune one or more modules of the NLP layer based on the user input.
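
The fusing, confidence, and rejection-feedback steps recited in the independent claims can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the claims do not specify how module outputs are fused or how modules are tuned, so the weighted average, the `FusionState` class, the `fuse` and `reject` functions, the `threshold`, and the weight-adjustment rule are all hypothetical. Per-module scores are assumed to already lie in the range 0 to 1, so the fused value serves as a normalized confidence score.

```python
from dataclasses import dataclass, field


@dataclass
class FusionState:
    """Hypothetical tunable state: per-module weights plus rejected pairs."""
    w_categorization: float = 0.5
    w_similarity: float = 0.5
    rejected: set = field(default_factory=set)


def fuse(cat_scores, sim_scores, state, threshold=0.5):
    """Fuse two score tables keyed by (component_id, reference_id).

    Assumed fusion rule: weighted average of the two module scores.
    Returns {reference_id: [(component_id, confidence), ...]} for pairs
    that were not rejected and whose fused confidence meets `threshold`.
    """
    total = state.w_categorization + state.w_similarity
    mapping = {}
    for pair in sorted(cat_scores.keys() | sim_scores.keys()):
        if pair in state.rejected:
            continue
        fused = (state.w_categorization * cat_scores.get(pair, 0.0)
                 + state.w_similarity * sim_scores.get(pair, 0.0)) / total
        if fused >= threshold:
            component_id, reference_id = pair
            mapping.setdefault(reference_id, []).append(
                (component_id, round(fused, 3)))
    return mapping


def reject(state, pair, cat_scores, sim_scores, step=0.1):
    """Record a user rejection and reduce the weight of the module that
    scored the rejected pair higher (a stand-in for tuning a module)."""
    state.rejected.add(pair)
    cat = cat_scores.get(pair, 0.0)
    sim = sim_scores.get(pair, 0.0)
    if cat > sim:
        state.w_categorization = max(0.1, state.w_categorization - step)
    elif sim > cat:
        state.w_similarity = max(0.1, state.w_similarity - step)


if __name__ == "__main__":
    cat = {("s1", "R1"): 0.9, ("s2", "R1"): 0.2, ("s2", "R2"): 0.8}
    sim = {("s1", "R1"): 0.7, ("s2", "R1"): 0.9, ("s2", "R2"): 0.6}
    state = FusionState()
    print(fuse(cat, sim, state))          # mapping before feedback
    reject(state, ("s2", "R1"), cat, sim)  # user rejects one mapped pair
    print(fuse(cat, sim, state))          # mapping after feedback
```

Keeping the rejected pairs and the weights in one small state object makes the effect of user feedback easy to inspect; a real system would presumably retrain or fine-tune the underlying modules instead of only adjusting fusion weights.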

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 63/420,501, filed on Oct. 28, 2022, and entitled “SYSTEMS AND METHODS FOR FACILITATING TEXT ANALYSIS,” the entirety of which is incorporated herein by reference.

BACKGROUND

With recent technological advancements, electronic storage of data is ubiquitous and readily utilizable by individuals and enterprises/organizations. The accessibility/usability of electronic data storage has given rise to the use/occurrence of voluminous bodies of electronically stored text data in various contexts. Consequently, many individuals/entities are expected to analyze, interpret, implement, or otherwise interact with and/or act upon large quantities of electronically stored text data. Such interactions with large quantities of text data can occur in various endeavors, such as technological, commercial, regulatory/legal, research, and/or other endeavors.

However, interacting with and/or acting upon large quantities of text data is associated with many challenges, such as being time-consuming and/or complex, which can give rise to errors. Furthermore, in many instances, multiple individuals/entities collaborate to interact with and/or act upon one or more bodies of text data. Different individuals/entities typically have different perspectives, paradigms, and/or biases that can give rise to inconsistent and/or unpredictable results from analysis of the same body of text data.

The subject matter claimed herein is not limited to embodiments that solve any challenges or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates example components of a text analysis system, in accordance with implementations of the present disclosure.
FIGS. 2, 3, and 4 illustrate example conceptual representations of output generated by a text analysis system, in accordance with implementations of the present disclosure.
FIGS. 5 and 6 illustrate example flow diagrams depicting acts associated with analyzing text utilizing a text analysis system, in accordance with implementations of the present disclosure.
FIG. 7 illustrates an example system that may comprise or implement one or more disclosed embodiments.

DETAILED DESCRIPTION

Disclosed embodiments are directed to systems, methods, devices, and/or techniques for facilitating text analysis. As noted above, interacting with and/or acting upon one or more large bodies of text data is associated with many challenges. For example, in the domain of model-based systems engineering (MBSE), systems are often associated with a high level of complexity, with numerous requirements (e.g., on the level of thousands, which can relate to performance characteristics, functions, structure/physical architecture, etc.), interfaces, documentation, etc.
Conventionally, systems engineers manually examine documents that detail myriad system requirements and manually categorize the requirements (e.g., into functional and non-functional categories) and trace the requirements to physical and/or functional architecture. Such manual analysis of large bodies of systems requirements can give rise to human errors and can amount to an arduous and time-consuming process.

As another example, in the domain of litigation discovery, the aforementioned ubiquity of electronically stored data has given rise to an increase in the scope of text data that is potentially relevant to legal disputes (e.g., electronic communications, internal documents, social media content, etc.). Thus, litigants responding to discovery requests and/or receiving discovery materials often have significant amounts of text data to examine for relevance. Conventionally, law practitioners manually examine documents to sift out irrelevant information and find relevant information in a case-specific manner.

Natural language processing (NLP) techniques implement artificial intelligence (AI) and/or machine learning (ML) technologies to configure algorithms to analyze written human languages. NLP modules are useful for performing particular interpretive tasks with respect to text data, such as determining the meaning, sentiment, and/or connotation of words and/or phrases. Conventional NLP techniques, however, ar