US-20260127158-A1 - MULTIMODAL RETRIEVAL BASED ON RELATIONAL MODELING OF ATOMIC DOCUMENT ELEMENTS
Abstract
An atomic relational retrieval system can determine a type of modality for each document of a plurality of documents having unstructured data. The system can route each document to a parser based on the type of modality. The system can parse at least the unstructured data of each document according to an atomic unit type to extract a plurality of atomic units from the document and a plurality of attributes of each atomic unit. The system can update a table in a relational database to include a record for each atomic unit, the record including a unique identifier of the atomic unit, a document identifier linking the atomic unit to its source document, and the plurality of attributes. The system can output, in response to a request for a chunk of one or more atomic units, at least one record corresponding to the chunk, where the chunk is dynamically defined.
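The ingest path summarized in the abstract (determine modality, route to a parser, extract atomic units with attributes) can be illustrated with a minimal Python sketch. It is not the disclosed implementation and rests on stated assumptions: modality is inferred from the file extension alone, text is tokenized on whitespace, an image arrives as rows of RGB tuples, and audio arrives as a list of amplitudes at an assumed 16 kHz sample rate. The names `AtomicUnit`, `detect_modality`, `PARSERS`, and `ingest` are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class AtomicUnit:
    """Hypothetical atomic-unit record: source document, unit type, attributes."""
    doc_id: str
    unit_type: str
    attributes: dict
    unit_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# Assumption: modality is inferred from the file extension alone.
EXTENSION_MODALITY = {".txt": "text", ".md": "text", ".png": "image", ".wav": "audio"}


def detect_modality(path: str) -> str:
    return EXTENSION_MODALITY.get(Path(path).suffix.lower(), "text")


def parse_text(doc_id, content):
    # Text-token units: whitespace tokens with their ordinal position.
    return [AtomicUnit(doc_id, "text_token", {"text": tok, "position": i})
            for i, tok in enumerate(content.split())]


def parse_image(doc_id, content):
    # Image-pixel units: content is assumed to be rows of (r, g, b) tuples.
    return [AtomicUnit(doc_id, "image_pixel", {"color": rgb, "position": (x, y)})
            for y, row in enumerate(content) for x, rgb in enumerate(row)]


def parse_audio(doc_id, content, sample_rate=16_000):
    # Audio-sample units: content is assumed to be a list of amplitudes.
    return [AtomicUnit(doc_id, "audio_sample",
                       {"amplitude": s, "timestamp": i / sample_rate})
            for i, s in enumerate(content)]


# Each modality maps to one parser, which fixes the atomic unit type it emits.
PARSERS = {"text": parse_text, "image": parse_image, "audio": parse_audio}


def ingest(path, doc_id, content):
    modality = detect_modality(path)
    return PARSERS[modality](doc_id, content)
```

For example, `ingest("note.txt", "doc-1", "prior auth approved")` yields three `text_token` units with positions 0, 1, and 2, each carrying its own unique identifier and the shared document identifier.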
Inventors
- Jackson Mostoller
- Parth Anand Jawale
- Isaac Lo
- Ben Barone
Assignees
- Cohere Health, Inc.
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-10-31
Claims (20)
- 1. A system comprising: one or more processors to: receive a plurality of documents comprising unstructured data; determine a type of modality for each document of the plurality of documents; route each document to a corresponding parser based on the type of modality for the document; select an atomic unit type for parsing each document based on the type of modality; parse at least the unstructured data of each document, using the corresponding parser, according to the atomic unit type to extract a plurality of atomic units from the document and a plurality of attributes of each atomic unit of the plurality of atomic units; update a table in a relational database to include a record for each atomic unit of the plurality of atomic units, the record comprising a unique identifier of the atomic unit, a document identifier linking the atomic unit to the document from which the atomic unit is extracted, and the plurality of attributes of the atomic unit; and output, in response to a request for a chunk of one or more atomic units, at least one record corresponding to the chunk, the chunk dynamically defined responsive to the request.
- 2. The system of claim 1, wherein the one or more processors are to dynamically define the chunk as a selection of the one or more atomic units based on one or more criteria indicated by the request.
- 3. The system of claim 1, wherein the one or more processors are to represent the chunk as a first table comprising one or more chunk-level attributes of the chunk and a second table comprising an identifier of the chunk and the unique identifier of each of the one or more atomic units of the chunk.
- 4. The system of claim 1, wherein the one or more processors are to output the chunk, based on the request, to include atomic units of a plurality of types of modalities.
- 5. The system of claim 1, wherein: the request is a first request indicating one or more first criteria for selection of the one or more atomic units; and the one or more processors are to output, responsive to a second request indicating one or more second criteria, a subset of the one or more atomic units of the chunk.
- 6. The system of claim 1, wherein the one or more processors are to provide, for generation of the request, a function to select the one or more atomic units according to at least one of a content attribute of the one or more atomic units or a metadata attribute of the one or more atomic units.
- 7. The system of claim 1, wherein the one or more processors are to output the at least one record to include each of text data and image data.
- 8. The system of claim 1, wherein the one or more processors are to generate the plurality of attributes of each atomic unit to include a location of the atomic unit in the document from which the atomic unit is extracted.
- 9. The system of claim 1, wherein the plurality of documents comprise a plurality of types of modalities including the type of modality, the plurality of types of modalities including at least a text type and an image type.
- 10. The system of claim 1, wherein the one or more processors are to determine the plurality of attributes of each atomic unit to include at least one of a text value or a pixel color of the atomic unit, and at least one of a position or a time stamp of the atomic unit.
- 11. The system of claim 1, wherein the atomic unit type comprises a text token type, an image pixel type, or an audio sample type, and the one or more processors are to use the corresponding parser to perform tokenization, pixel identification, or audio sampling of the document.
- 12. The system of claim 1, wherein the one or more processors are to: determine, based on the request, at least one of a relevance score, an embedding, a text representation, or a bounding box for the chunk.
- 13. A method comprising: receiving, by one or more processors, a plurality of documents comprising unstructured data; determining, by the one or more processors, a type of modality for each document of the plurality of documents; routing, by the one or more processors, each document to a corresponding parser based on the type of modality for the document; selecting, by the one or more processors, an atomic unit type for parsing each document based on the type of modality; parsing, by the one or more processors, at least the unstructured data of each document, using the corresponding parser, according to the atomic unit type to extract a plurality of atomic units from the document and a plurality of attributes of each atomic unit of the plurality of atomic units; updating, by the one or more processors, a table in a relational database to include a record for each atomic unit of the plurality of atomic units, the record comprising a unique identifier of the atomic unit, a document identifier linking the atomic unit to the document from which the atomic unit is extracted, and the plurality of attributes of the atomic unit; and outputting, by the one or more processors, in response to a request for a chunk of one or more atomic units, at least one record corresponding to the chunk, the chunk dynamically defined responsive to the request.
- 14. The method of claim 13, comprising defining the chunk as a selection of the one or more atomic units based on one or more criteria indicated by the request.
- 15. The method of claim 13, comprising structuring, by the one or more processors, the chunk as a first table comprising one or more chunk-level attributes of the chunk and a second table comprising an identifier of the chunk and the unique identifier of each of the one or more atomic units of the chunk.
- 16. The method of claim 13, wherein: the request is a first request indicating one or more first criteria for selection of the one or more atomic units; and the method comprises outputting, by the one or more processors, responsive to a second request indicating one or more second criteria, a subset of the one or more atomic units of the chunk.
- 17. The method of claim 13, comprising providing, by the one or more processors, for generation of the request, a function to select the one or more atomic units according to any of a content attribute of the one or more atomic units or a metadata attribute of the one or more atomic units.
- 18. The method of claim 13, comprising generating, by the one or more processors, the plurality of attributes of each atomic unit to include a location of the atomic unit in the document from which the atomic unit is extracted.
- 19. The method of claim 13, comprising determining, by the one or more processors, the plurality of attributes of each atomic unit to include at least one of a text value or a pixel color of the atomic unit, and at least one of a position or a time stamp of the atomic unit.
- 20. A non-transitory computer-readable medium comprising machine-readable instructions that, when executed by one or more processors, cause the one or more processors to execute operations comprising: parsing one or more documents, according to one or more modalities of the one or more documents, to extract a plurality of atomic units from the one or more documents and a plurality of attributes of each atomic unit of the plurality of atomic units; updating a database to include a record for each atomic unit of the plurality of atomic units, the record comprising a unique identifier of the atomic unit, a document identifier linking the atomic unit to the document from which the atomic unit is extracted, and the plurality of attributes of the atomic unit; and outputting, based at least on a request for a chunk of one or more atomic units, at least a portion of at least one record corresponding to the chunk.
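The record structure of claims 1 and 3 (one row per atomic unit keyed by a unique identifier and a document identifier, plus a chunk expressed as a chunk-attribute table and a chunk-membership table) can be sketched with Python's built-in `sqlite3` module. This is an illustrative schema under assumptions, not the claimed one: attributes are stored as a JSON string, the chunk-level attributes are limited to a hypothetical `label` and `relevance`, and the helpers `insert_unit`, `define_chunk`, and `chunk_records` are hypothetical.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE documents (doc_id TEXT PRIMARY KEY, modality TEXT NOT NULL);
CREATE TABLE atomic_units (
    unit_id TEXT PRIMARY KEY,
    doc_id TEXT NOT NULL REFERENCES documents(doc_id),
    unit_type TEXT NOT NULL,
    attributes TEXT NOT NULL -- JSON-encoded attributes (assumption)
);
-- First chunk table: chunk-level attributes.
CREATE TABLE chunks (chunk_id TEXT PRIMARY KEY, label TEXT, relevance REAL);
-- Second chunk table: which atomic units belong to which chunk.
CREATE TABLE chunk_members (
    chunk_id TEXT NOT NULL REFERENCES chunks(chunk_id),
    unit_id TEXT NOT NULL REFERENCES atomic_units(unit_id),
    PRIMARY KEY (chunk_id, unit_id)
);
"""


def insert_unit(conn, doc_id, unit_type, attributes):
    # One record per atomic unit, with a freshly generated unique identifier.
    unit_id = str(uuid.uuid4())
    conn.execute("INSERT INTO atomic_units VALUES (?, ?, ?, ?)",
                 (unit_id, doc_id, unit_type, json.dumps(attributes)))
    return unit_id


def define_chunk(conn, label, unit_ids, relevance=None):
    # A chunk is only a row of attributes plus membership rows; units are not copied.
    chunk_id = str(uuid.uuid4())
    conn.execute("INSERT INTO chunks VALUES (?, ?, ?)", (chunk_id, label, relevance))
    conn.executemany("INSERT INTO chunk_members VALUES (?, ?)",
                     [(chunk_id, u) for u in unit_ids])
    return chunk_id


def chunk_records(conn, chunk_id):
    # Join membership back to the atomic-unit records, in insertion order.
    rows = conn.execute(
        """SELECT a.unit_id, a.doc_id, a.unit_type, a.attributes
           FROM chunk_members m JOIN atomic_units a ON a.unit_id = m.unit_id
           WHERE m.chunk_id = ? ORDER BY a.rowid""", (chunk_id,))
    return [(uid, doc, typ, json.loads(attrs)) for uid, doc, typ, attrs in rows]
```

Because a chunk is stored only as membership rows, the same atomic unit can belong to any number of chunks without duplicating its record.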
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application claims the benefit of and priority to U.S. Provisional Application No. 63/715,425, filed Nov. 1, 2024, the disclosure of which is incorporated herein by reference in its entirety.
BACKGROUND
Information retrieval systems are used to manage, store, and retrieve large volumes of digital data from diverse sources. Unstructured data such as text, images, audio, and other multimedia formats often require specialized tools for processing and searching. However, existing systems face difficulties in handling heterogeneous data types, maintaining metadata consistency, and enabling efficient retrieval across different modalities. This can result in retrieval that performs poorly in terms of speed, compute requirements, and/or data storage requirements.
SUMMARY
Systems and methods in accordance with the present disclosure can represent documents and their components as relational data, including by extracting atomic units of data in any of a variety of modalities and grouping (e.g., chunking) the atomic units into chunks to respond to queries for data retrieval. For example, the system can provide dynamic view-based chunking in which the chunks are provided as views over the atomic units, rather than relying on chunks that are fixed when the documents are indexed. This can allow for variable granularity of retrieval without re-indexing. Metadata, including spatial and semantic annotations, can be associated directly with atomic units and can be aggregated at the chunk level through relational joins or grouping operations. In response to a query, retrieval operations can be expressed as composable relational expressions that select, filter, or aggregate atomic and chunk-level attributes from a unified multimodal corpus. This can allow for flexible and consistent information access across different data types.
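View-based chunking, as summarized above, can be sketched with SQL views over a single atomic-unit table, here via Python's `sqlite3`. Under the assumptions of this sketch, `page` stands in for a spatial annotation and `section` for a semantic annotation, both stored directly on each atomic unit; the view names `page_chunks` and `section_chunks` and the helper `chunk_text` are hypothetical. Each view aggregates chunk-level attributes with `GROUP BY`, so switching granularity means querying (or adding) a view rather than re-indexing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE atomic_units (
    unit_id INTEGER PRIMARY KEY,
    doc_id TEXT NOT NULL,
    page INTEGER NOT NULL,      -- spatial annotation (assumption)
    section TEXT NOT NULL,      -- semantic annotation (assumption)
    position INTEGER NOT NULL,
    text TEXT NOT NULL
);
-- Page-level chunks: a view, so nothing is materialized at indexing time.
CREATE VIEW page_chunks AS
SELECT doc_id, page, COUNT(*) AS n_units,
       MIN(position) AS first_pos, MAX(position) AS last_pos
FROM atomic_units GROUP BY doc_id, page;
-- Section-level chunks over the same atomic units: another granularity,
-- defined as another view, with no re-indexing of the underlying rows.
CREATE VIEW section_chunks AS
SELECT doc_id, section, COUNT(*) AS n_units,
       MIN(page) AS first_page, MAX(page) AS last_page
FROM atomic_units GROUP BY doc_id, section;
""")

conn.executemany(
    "INSERT INTO atomic_units (doc_id, page, section, position, text) "
    "VALUES (?, ?, ?, ?, ?)",
    [("doc-1", 1, "History", 0, "Patient"),
     ("doc-1", 1, "History", 1, "reports"),
     ("doc-1", 2, "History", 2, "pain"),
     ("doc-1", 2, "Plan", 3, "order"),
     ("doc-1", 2, "Plan", 4, "MRI")])


def chunk_text(conn, doc_id, section):
    # Chunk contents are assembled on demand, ordered by position.
    rows = conn.execute(
        "SELECT text FROM atomic_units WHERE doc_id = ? AND section = ? "
        "ORDER BY position", (doc_id, section))
    return " ".join(r[0] for r in rows)
```

Here the "History" section spans pages 1 and 2, so its section-level chunk cuts across two page-level chunks, even though both views read the same five atomic units.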
The system can support multi-stage retrieval operations, which can make retrieval of relevant data more efficient. For example, systems and methods as described herein can achieve faster retrieval, including retrieval that requires less intermediate data to be stored or maintained. Systems and methods in accordance with the present disclosure can be applied to retrieval tasks in any of a variety of applications, including but not limited to document generation or processing, classification, clinical workflows, administrative workflows, healthcare operations including prior authorization, scheduling, patient support, clinician support, claims processing, chart or lab processing, report generation, conversational agent management, or various combinations thereof. At least one aspect relates to a system. The system can receive a plurality of documents comprising unstructured data. The system can determine a type of modality for each document of the plurality of documents. The system can route each document to a corresponding parser based on the type of modality for the document. The system can select an atomic unit type for parsing each document based on the type of modality. The system can parse at least the unstructured data of each document according to the atomic unit type to extract a plurality of atomic units from the document and a plurality of attributes of each atomic unit. The system can update a table in a relational database to include a record for each atomic unit, the record including a unique identifier of the atomic unit, a document identifier linking the atomic unit to the document from which the atomic unit is extracted, and the plurality of attributes of the atomic unit. The system can output, in response to a request for a chunk of one or more atomic units, at least one record corresponding to the chunk, where the chunk is dynamically defined responsive to the request.
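Multi-stage retrieval, where a first request defines a chunk and a later request narrows it to a subset, can be sketched as a selection function that composes content and metadata filters and can be restricted to the result of an earlier stage. This is a hypothetical illustration using Python's `sqlite3`: `select_units`, its `contains`/`modality`/`within` parameters, and the sample rows are assumptions, and a case-insensitive substring match stands in for whatever content criterion a real system would use.

```python
import sqlite3


def select_units(conn, contains=None, modality=None, within=None):
    """Hypothetical selection function: filter atomic units by a content
    attribute (case-insensitive substring of text) and/or a metadata
    attribute (modality), optionally limited to ids from an earlier stage."""
    clauses, params = [], []
    if contains is not None:
        clauses.append("instr(lower(text), lower(?)) > 0")
        params.append(contains)
    if modality is not None:
        clauses.append("modality = ?")
        params.append(modality)
    if within is not None:
        ids = list(within)
        if not ids:
            return []  # an empty earlier stage yields an empty subset
        clauses.append(f"unit_id IN ({','.join('?' * len(ids))})")
        params.extend(ids)
    where = " AND ".join(clauses) or "1 = 1"
    sql = f"SELECT unit_id FROM atomic_units WHERE {where} ORDER BY unit_id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atomic_units "
             "(unit_id INTEGER PRIMARY KEY, doc_id TEXT, modality TEXT, text TEXT)")
conn.executemany("INSERT INTO atomic_units VALUES (?, ?, ?, ?)", [
    (1, "doc-1", "text", "Knee MRI requested"),
    (2, "doc-1", "text", "Prior authorization approved"),
    (3, "doc-2", "image", None),  # an image unit with no text value
    (4, "doc-2", "text", "MRI findings: no tear"),
])

# Stage 1 defines a chunk; stage 2 narrows it without recomputing stage 1.
chunk = select_units(conn, contains="mri")
subset = select_units(conn, contains="tear", within=chunk)
```

In this sketch the first stage returns units 1 and 4, and the second stage keeps only unit 4. Only the list of unit identifiers is carried between stages, which is one way to keep intermediate state small.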
In some implementations, the system can dynamically define the chunk as a selection of one or more atomic units based on one or more criteria indicated by the request. In some implementations, the system can represent the chunk as a first table comprising one or more chunk-level attributes of the chunk and a second table comprising an identifier of the chunk and the unique identifier of each atomic unit of the chunk. In some implementations, the system can output the chunk, based on the request, to include atomic units of a plurality of modalities. In some implementations, the request can be a first request indicating one or more first criteria for selection of atomic units, and the system can output, responsive to a second request indicating one or more second criteria, a subset of the atomic units of the chunk. In some implementations, the system can provide, for generation of the request, a function to select atomic units according to a content attribute or a metadata attribute of the atomic units. In some implementations, the system can output the record to include both text data and image data. In some implementations, the system can generate the plurality of attri