
EP-4740107-A2 - SYSTEMS AND METHODS FOR DEVELOPING A KNOWLEDGE BASE COMPRISED OF DATA COLLECTED FROM MYRIAD SOURCES


Abstract

A system for building a knowledge base is provided. The system includes a processor. The system further includes a non-transitory computer-readable storage medium containing instructions which, when executed on the processor, cause the system to perform operations. The operations include receiving multi-modal data from one or more sources; analyzing the data to determine features; storing the features in a database; receiving a search query for searching the stored features; analyzing the search query using a large language model to extract search features; generating search results from the search features; and displaying the search results on a standardized graphical user interface including a legend that displays one or more of the search features.
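The abstract describes an end-to-end flow: store extracted features, turn a free-text query into search features, match them against the store, and show the results alongside a legend of the search features. The sketch below illustrates that flow under stated assumptions. Features are stored as key-value pairs (as in claims 10 and 11), and the large language model step is replaced by a hypothetical toy parser, `extract_search_features`, that only recognizes a bedroom count and a pool. The names `KnowledgeBase` and `render` are likewise illustrative and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    # Hypothetical in-memory stand-in for the database: item id -> {key: value}.
    records: dict[str, dict[str, object]] = field(default_factory=dict)

    def store(self, item_id: str, features: dict[str, object]) -> None:
        # Merge newly determined features into the item's existing record.
        self.records.setdefault(item_id, {}).update(features)

    def search(self, criteria: dict[str, object]) -> list[str]:
        # Return ids of items whose stored features match every search feature.
        return sorted(
            item_id
            for item_id, feats in self.records.items()
            if all(feats.get(key) == value for key, value in criteria.items())
        )


def extract_search_features(query: str) -> dict[str, object]:
    """Hypothetical placeholder for the large language model step.

    A toy parser that only recognizes "<N> bedroom(s)" and "pool".
    """
    words = query.lower().replace(",", " ").split()
    features: dict[str, object] = {}
    for i, word in enumerate(words):
        if word.startswith("bedroom") and i > 0 and words[i - 1].isdigit():
            features["bedrooms"] = int(words[i - 1])
        if word == "pool":
            features["pool"] = True
    return features


def render(results: list[str], criteria: dict[str, object]) -> str:
    # The legend echoes the extracted search features next to the results.
    legend = ", ".join(f"{key}={value}" for key, value in criteria.items())
    return f"Legend: {legend}\nResults: {', '.join(results) or 'none'}"


kb = KnowledgeBase()
kb.store("listing-1", {"bedrooms": 3, "pool": True})
kb.store("listing-2", {"bedrooms": 2, "pool": True})
criteria = extract_search_features("3 bedrooms with a pool")
print(render(kb.search(criteria), criteria))
```

Exact-match filtering keeps the example short. A real system would more plausibly rank partial matches, but the abstract does not say how results are generated from the search features.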

Inventors

  • CORREDOR ORTEGA, Oscar David
  • KEENAN, Henry Forsyth
  • VALENCIA DUQUE, Andrés Felipe
  • MARTÍNEZ CASTILLO, Juan David
  • DOMINGUEZ ROSALES, Alejandro
  • PÉREZ BURITICÁ, Andrés
  • MARTINEZ, JOSE

Assignees

  • Red Atlas Inc.

Dates

Publication Date
2026-05-13
Application Date
2024-07-03

Claims (20)

  1. A system for building a knowledge base, the system comprising a processor and a non-transitory computer readable medium storing instructions such that when the instructions are executed by the processor, the system is configured to: receive multi-modal data from one or more sources; analyze the data to determine features; store the features in a database; receive a search query for searching the stored features; analyze the search query using a large language model to extract search features; generate search results from the search features; and display search results on a standardized graphical user interface including a legend having at least one or more of the search features displayed.
  2. The system of claim 1, wherein the multi-modal data includes video, unstructured text, structured text, and/or images.
  3. The system of claim 1, wherein the multi-modal data includes description of listings, documents from insurance companies, documents relating to mortgages, documents relating to banking, documents relating to student loans, documents relating to academics, or any combination thereof.
  4. The system of claim 1, wherein the system is configured to analyze the data to determine features by: receiving a ranking of questions for extracting a first feature, the ranking of questions including at least two questions associated with extracting the first feature; and extracting the first feature using the large language model, wherein inputs to the large language model include (i) one of the at least two questions and (ii) the multi-modal data.
  5. The system of claim 4, wherein the at least two questions include a key and the first feature extracted includes a value associated with the key.
  6. The system of claim 4, wherein the ranking of questions is based on historical information associated with a performance metric of the large language model, the historical information including an upvote received from a client device.
  7. The system of claim 1, wherein the system is configured to analyze the data to determine features by: determining a keyword from the multi-modal data using the large language model; generating a question from the keyword; and extracting a first feature using the large language model, wherein inputs to the large language model include (i) the generated question and (ii) the multi-modal data.
  8. The system of claim 7, wherein the system is configured to generate a confidence level associated with the extracted first feature.
  9. The system of claim 8, wherein the confidence level is based on a magnitude associated with the extracted first feature.
  10. The system of claim 1, wherein the feature is stored as a key-value pair, the key identifying a category associated with the feature and the value indicating a property associated with the category.
  11. The system of claim 10, wherein the key is a number of bedrooms and the value is an integer.
  12. The system of claim 1, wherein the search query is a sentence, a sentence fragment, or a paragraph.
  13. A system for building a knowledge base, the system comprising a processor and a non-transitory computer readable medium storing instructions such that when the instructions are executed by the processor, the system is configured to: receive multi-modal data from one or more sources, the multi-modal data including text data; analyze the data to determine features; store the features in a database; receive a search query for searching the stored features; analyze the search query using a large language model to extract search features; generate search results from the search features; and display search results on a standardized graphical user interface including a legend having at least one or more of the search features displayed.
  14. The system of claim 13, wherein the multi-modal data includes video, unstructured text, structured text, and/or images.
  15. The system of claim 13, wherein the multi-modal data includes images and text and wherein the system is configured to analyze the data to determine features includes: determining image features from images associated with a first item; determining text features from text associated with the first item; comparing the text features to the image features to determine whether there are categories in the image features missing in the text features; and in response to determining that there are categories in the image features missing in the text features, store the missing image features in the database.
  16. The system of claim 15, wherein the system is configured to analyze the data to determine features further includes: in response to determining that there are categories in the image features present in the text features, store the text features in the database.
  17. The system of claim 15, wherein a property associated with the missing image features is a Boolean value.
  18. The system of claim 15, wherein a confidence level is associated with the missing image features.
  19. The system of claim 13, wherein the system is further configured to: train an image model to extract image features.
  20. The system of claim 13, wherein the system is further configured to: receive an updated image associated with a first item; determine image features associated with the first item; compare stored features in the database against the image features to determine discrepancies in the stored features associated with the first item and the image features; and in response to determining that there is at least one discrepancy, resolve the at least one discrepancy.
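Claims 4 through 6 describe extracting a feature by asking the large language model one of at least two ranked questions, where the ranking draws on historical feedback such as upvotes from a client device. The sketch below shows one possible reading. Questions are tried in upvote order, and the first answer becomes the value for the feature's key. The LLM is represented by a hypothetical callable, `AskFn`, and `toy_ask` is a deliberately trivial stand-in that understands only one phrasing. `QuestionRanker` and its methods are illustrative names, and falling back to the next question when one goes unanswered is an assumption rather than something the claims state.

```python
import re
from collections import defaultdict
from typing import Callable, Optional

# Hypothetical LLM interface: (question, source text) -> answer, or None if unanswered.
AskFn = Callable[[str, str], Optional[str]]


class QuestionRanker:
    """Ranks alternative questions for one feature key by historical upvotes."""

    def __init__(self, key: str, questions: list[str]) -> None:
        if len(questions) < 2:
            raise ValueError("expected at least two questions per feature")
        self.key = key
        self.questions = list(questions)
        self.upvotes: dict[str, int] = defaultdict(int)

    def record_upvote(self, question: str) -> None:
        # Feedback received from a client device, used as the performance history.
        self.upvotes[question] += 1

    def ranked(self) -> list[str]:
        # Most-upvoted first; sorted() is stable, so ties keep their original order.
        return sorted(self.questions, key=lambda q: -self.upvotes[q])

    def extract(self, data: str, ask: AskFn) -> Optional[tuple[str, str]]:
        # Ask each question in ranked order; the first answer becomes the value.
        for question in self.ranked():
            answer = ask(question, data)
            if answer is not None:
                return self.key, answer
        return None


def toy_ask(question: str, data: str) -> Optional[str]:
    # Toy stand-in for the LLM that only understands one phrasing.
    if question == "How many bedrooms does the property have?":
        match = re.search(r"(\d+)-bed", data)
        return match.group(1) if match else None
    return None


ranker = QuestionRanker(
    "bedrooms",
    ["What is the bedroom count?", "How many bedrooms does the property have?"],
)
print(ranker.extract("Charming 3-bed bungalow", toy_ask))  # ('bedrooms', '3')
ranker.record_upvote("How many bedrooms does the property have?")
print(ranker.ranked()[0])
```

After the upvote, the question that actually produced an answer moves to the top of the ranking. This mirrors the claim 6 idea that historical performance feedback drives the ordering.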
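Claims 15 through 18 compare features extracted from an item's images with features extracted from its text. Image features whose categories are missing from the text are stored, and text features are kept for categories the text already covers. The sketch below assumes that both extractors have already produced features keyed by category. The `Feature` record and `merge_features` function are hypothetical names. The Boolean `pool` value and the confidence field are included to reflect claims 17 and 18.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    category: str            # the key, e.g. "pool"
    value: object            # the property, e.g. True for a Boolean feature
    confidence: float = 1.0  # confidence attached to the extracted feature


def merge_features(
    text_feats: dict[str, Feature], image_feats: dict[str, Feature]
) -> dict[str, Feature]:
    """Keep text features for categories they cover; add image-only categories."""
    merged = dict(text_feats)
    for category, feat in image_feats.items():
        if category not in text_feats:
            # Category seen in the images but missing from the text: store it.
            merged[category] = feat
    return merged


text = {"bedrooms": Feature("bedrooms", 3)}
image = {
    "bedrooms": Feature("bedrooms", 2, 0.6),
    "pool": Feature("pool", True, 0.9),
}
for category, feat in sorted(merge_features(text, image).items()):
    print(category, feat.value, feat.confidence)
```

Here the text's bedroom count wins over the image estimate, and the pool, which is visible only in the images, is added along with its confidence.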
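Claim 20 compares features stored for an item against features derived from an updated image and resolves any discrepancies. It does not say how they are resolved. The sketch below assumes a simple policy: keep whichever observation has the higher confidence, and let the newer image win ties. The `(value, confidence)` representation and the function names `find_discrepancies` and `resolve` are hypothetical choices made for illustration.

```python
# Hypothetical representation: category -> (value, confidence).
Observation = tuple[object, float]


def find_discrepancies(
    stored: dict[str, Observation], fresh: dict[str, Observation]
) -> list[str]:
    # Categories present in both sources whose values disagree.
    return sorted(k for k in stored.keys() & fresh.keys() if stored[k][0] != fresh[k][0])


def resolve(
    stored: dict[str, Observation], fresh: dict[str, Observation]
) -> dict[str, Observation]:
    # Assumed policy: keep the higher-confidence observation; ties favor the new image.
    resolved = dict(stored)
    for k in find_discrepancies(stored, fresh):
        if fresh[k][1] >= stored[k][1]:
            resolved[k] = fresh[k]
    return resolved


stored = {"pool": (False, 0.7), "bedrooms": (3, 1.0)}
fresh = {"pool": (True, 0.9), "bedrooms": (2, 0.5)}
print(find_discrepancies(stored, fresh))  # ['bedrooms', 'pool']
print(resolve(stored, fresh))
```

Other policies would fit the claim equally well, such as always trusting the most recent image or flagging conflicts for human review. A confidence comparison is used here only because the claims already attach confidence levels to extracted features.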

Description

SYSTEMS AND METHODS FOR DEVELOPING A KNOWLEDGE BASE COMPRISED OF DATA COLLECTED FROM MYRIAD SOURCES

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/511,773, filed July 3, 2023, which is hereby incorporated by reference herein in its entirety.

TECHNICAL FIELD

[0002] The present disclosure relates to information extraction and information management systems and, more specifically, to systems and methods for enriching databases with information from various data sources, including unstructured and incomplete data sources.

BACKGROUND

[0003] Information can be gathered at different granularities. Institutions rely on collected information to make decisions, for example, decisions relating to technology development, business, research, hiring, and investments. Individuals likewise rely on information to make decisions, albeit on a smaller scale. With the information explosion brought about by the democratization of knowledge discovery, important data or material can be discovered in a first camp yet remain unavailable in a second camp. Individuals or institutions in the first camp can take advantage of the discovered data or material when making decisions, but those in the second camp cannot. This asymmetry between the two camps' knowledge bases illustrates that the camp with access to less complete information may be at a comparative disadvantage. Even when the knowledge bases of two camps are similar, a camp that cannot incorporate available data can still be left at a disadvantage. In some situations, a camp is unable to incorporate available data because of the sheer volume of data generated by the information explosion. The present disclosure provides systems and methods for building knowledge bases and enriching them over time to improve accuracy and better inform decision-making processes.
SUMMARY

[0004] The term "embodiment" and like terms, e.g., "implementation," "configuration," "aspect," "example," and "option," are intended to refer broadly to all of the subject matter of this disclosure and the claims below. Statements containing these terms should be understood not to limit the subject matter described herein or to limit the meaning or scope of the claims below. Embodiments of the present disclosure covered herein are defined by the claims below, not this summary. This summary is a high-level overview of various aspects of the disclosure and introduces some of the concepts that are further described in the Detailed Description section below. This summary is not intended to identify key or essential features of the claimed subject matter. This summary is also not intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this disclosure, any or all drawings, and each claim.

[0005] According to certain aspects of the present disclosure, a system for building a knowledge base is provided. The system includes a processor. The system further includes a non-transitory computer-readable storage medium containing instructions which, when executed on the processor, cause the system to perform operations. The operations include receiving multi-modal data from one or more sources; analyzing the data to determine features; storing the features in a database; receiving a search query for searching the stored features; analyzing the search query using a large language model to extract search features; generating search results from the search features; and displaying the search results on a standardized graphical user interface including a legend that displays one or more of the search features.

[0006] According to certain aspects of the present disclosure, a system for building a knowledge base is provided. The system includes a processor. The system further includes a non-transitory computer-readable storage medium containing instructions which, when executed on the processor, cause the system to perform operations. The operations include receiving multi-modal data from one or more sources, the multi-modal data including text data; analyzing the data to determine features; associating the determined features with source identifiers based on the one or more sources; storing the features and corresponding source identifiers in a database; receiving a search query for searching the stored features; analyzing the search query using a large language model to extract search features; generating search results from the search features using at least the stored features and corresponding source identifiers; and displaying the search results on a standardized graphical user interface including a legend that displays one or more of the search features.

[0007] According to certain aspects of the present disclosure, a system for build