CN-121986329-A - Semantic search interface for data store
Abstract
A method provides visual analysis of a dataset. A system receives a natural language search query directed to a data store that includes data sources and data visualizations. The system parses the search terms to determine whether the natural language search query contains an analytic intent. The system also uses semantic search to determine whether the search terms match fields in one or more of the data sources. The system generates and displays a visual response when (i) the search terms match fields in the one or more data sources and (ii) the natural language search query contains an analytic intent. When (i) the search terms do not match fields in the data sources or (ii) the natural language search query does not contain an analytic intent, the system displays pre-written content from the data visualizations.
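The two-condition rule in the abstract can be sketched as a small decision function. This is only an illustrative sketch: the names `Response` and `choose_response` are hypothetical, and the two boolean inputs stand in for the outcomes of semantic field matching and intent parsing, which the abstract does not detail.

```python
from enum import Enum


class Response(Enum):
    """The two kinds of output named in the abstract (labels are hypothetical)."""
    GENERATED_VISUALIZATION = "generated visualization"
    PRE_WRITTEN_CONTENT = "pre-written content"


def choose_response(matches_field: bool, has_analytic_intent: bool) -> Response:
    """Pick the response type from the two determinations.

    A visual response is generated only when BOTH conditions hold;
    if either one fails, pre-written content is shown instead.
    """
    if matches_field and has_analytic_intent:
        return Response.GENERATED_VISUALIZATION
    return Response.PRE_WRITTEN_CONTENT
```

Under these assumptions, a query that matches a field but carries no analytic intent falls back to pre-written content, as does a query with an intent but no matching field.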
Inventors
- V. R. Setler
- A. Srinivasan
- A. Kanuka
Assignees
- Salesforce, Inc. (硕动力公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20240405
- Priority Date
- 20240130
Claims (20)
- 1. A method of visual analysis of a dataset, comprising: at a computing system having one or more processors and memory storing one or more programs configured for execution by the one or more processors: receiving a natural language search query directed to a plurality of data stores comprising a plurality of data sources and one or more data visualizations; parsing search terms corresponding to the natural language search query to determine whether the natural language search query contains one or more analytic intents; determining, using semantic search, whether the search terms match data fields in one or more of the plurality of data sources; based on a determination that (i) the search terms match fields in the one or more data sources and (ii) the natural language search query contains the one or more analytic intents, generating and displaying one or more visual responses; and based on a determination that (i) the search terms do not match fields in the plurality of data sources or (ii) the natural language search query does not contain the one or more analytic intents, displaying pre-written content from the one or more data visualizations.
- 2. The method as recited in claim 1, further comprising: obtaining the search terms using a federated query search that distributes a query to multiple search repositories and combines the results into a single consolidated search result.
- 3. The method of any one of claims 1 to 2, wherein the one or more analytic intents are selected from the group consisting of: grouping, aggregation, correlation, filters and limits, temporal, and geospatial.
- 4. The method of any of claims 1-2, wherein parsing the search terms further comprises identifying data fields and data values, along with the one or more analytic intents, based on the plurality of data sources and metadata of the plurality of data sources.
- 5. The method of claim 4, wherein identifying the data fields and data values comprises comparing N-grams corresponding to the search terms with available data fields for syntactic and semantic similarity.
- 6. The method of claim 5, wherein the syntactic similarity is identified using a Levenshtein distance and the semantic similarity is identified using a Wu-Palmer similarity score.
- 7. The method of any of claims 1-2, wherein the semantic search comprises: indexing each of the plurality of data stores and metadata of the plurality of data stores to obtain an index; and performing a federated search based on the index to determine whether the search terms match fields in one or more of the plurality of data sources.
- 8. The method of claim 7, wherein the indexing comprises: representing each file as a respective document vector for each data store and visualization context and associated metadata; and storing N-gram tokens from the document vectors to support partial and exact matches.
- 9. The method of claim 7, wherein performing the federated search comprises: obtaining a query vector corresponding to the search terms; encoding the query vector into query string tokens using the encoder used to generate the index; and selecting a predetermined number of candidate document vectors from the document vectors for each data store and visualization context and associated metadata, based on an amount of overlap between the query string tokens and document string tokens of the document vectors.
- 10. The method as recited in claim 9, further comprising: ranking the predetermined number of candidate document vectors using a scoring function that scores documents based on the search terms that appear in each document, regardless of the proximity of the search terms within the document.
- 11. The method of any one of claims 1 to 2, further comprising: generating and displaying the one or more visual responses based on the data fields, the data values, and the one or more analytic intents in the natural language search query.
- 12. The method of any one of claims 1 to 2, further comprising: based on a determination that (i) the semantic search returns a matching data source for the natural language search query and (ii) the search terms do not resolve to valid data fields and data values within the data source, displaying suggested queries for the data source.
- 13. The method as recited in claim 12, further comprising: generating the suggested queries using a template-based approach based on a combination of data fields from the data source and data interestingness metrics.
- 14. The method of any one of claims 1 to 2, further comprising: generating and displaying the one or more visual responses using three encoding channels (x, y, and color) and four mark types (bar, line, point, and geographic shape), supporting dynamic generation of bar charts, line charts, scatter plots, and maps across a range of analytic intents.
- 15. The method of any one of claims 1 to 2, further comprising: determining the mark type of the one or more visual responses based on a mapping between visual encodings and data types of the data fields.
- 16. The method of any one of claims 1 to 2, further comprising: generating and displaying dynamic text summaries describing the one or more visual responses using one or more statistical calculations and a large language model; providing a prompt to the large language model, the prompt containing statistical descriptions extracted from the one or more visual responses using a predefined set of heuristics; and in response to providing the prompt, receiving the dynamic text summaries from the large language model.
- 17. The method of claim 16, wherein the prompt corresponds to (i) minimum, maximum, and average values for a bar chart, and (ii) a Pearson correlation coefficient for a scatter plot.
- 18. The method of any of claims 1-2, further comprising, prior to receiving the natural language search query: receiving a user selection of a data source; presenting a graphical user interface for analyzing data in the selected data source; and providing three search options, including: (i) a question-and-answer search for interpreting analytic intent within the selected data source; (ii) an exploratory search for document-based information retrieval over indexed visualization content of the selected data source; and (iii) a design search that uses visualization metadata of the selected data source.
- 19. A computer system for visual analysis of a dataset, comprising: one or more processors; and a memory; wherein the memory stores one or more programs configured for execution by the one or more processors, and the one or more programs include instructions for: receiving a natural language search query directed to a plurality of data stores comprising a plurality of data sources and one or more data visualizations; parsing search terms corresponding to the natural language search query to determine whether the natural language search query contains one or more analytic intents; determining, using semantic search, whether the search terms match data fields in one or more of the plurality of data sources; based on a determination that (i) the search terms match fields in the one or more data sources and (ii) the natural language search query contains the one or more analytic intents, generating and displaying one or more visual responses; and based on a determination that (i) the search terms do not match fields in the plurality of data sources or (ii) the natural language search query does not contain the one or more analytic intents, displaying pre-written content from the one or more data visualizations.
- 20. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer system with a display, one or more processors, and memory, the one or more programs comprising instructions for: receiving a natural language search query directed to a plurality of data stores comprising a plurality of data sources and one or more data visualizations; parsing search terms corresponding to the natural language search query to determine whether the natural language search query contains one or more analytic intents; determining, using semantic search, whether the search terms match data fields in one or more of the plurality of data sources; based on a determination that (i) the search terms match fields in the one or more data sources and (ii) the natural language search query contains the one or more analytic intents, generating and displaying one or more visual responses; and based on a determination that (i) the search terms do not match fields in the plurality of data sources or (ii) the natural language search query does not contain the one or more analytic intents, displaying pre-written content from the one or more data visualizations.
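Claims 5 and 6 compare N-grams of the search terms with the available data fields. A minimal Python sketch of the syntactic half of that comparison follows, assuming word-level N-grams of up to three words and a length-normalized Levenshtein similarity with a 0.8 cutoff. The function names, the normalization, and the cutoff are illustrative assumptions, not details taken from the claims. The Wu-Palmer semantic score is omitted because it requires a lexical taxonomy such as WordNet.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def ngrams(tokens: List[str], max_n: int = 3) -> List[str]:
    """All contiguous word N-grams of length 1..max_n, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def best_field(query: str, fields: List[str],
               threshold: float = 0.8) -> Optional[Tuple[str, str, float]]:
    """Return (ngram, field, similarity) for the closest pair, or None.

    Similarity is 1 - distance / longer_length, an assumed normalization.
    The threshold is likewise an assumed value.
    """
    best: Optional[Tuple[str, str, float]] = None
    for gram in ngrams(query.lower().split()):
        for field in fields:
            name = field.lower()
            sim = 1.0 - levenshtein(gram, name) / max(len(gram), len(name), 1)
            if best is None or sim > best[2]:
                best = (gram, field, sim)
    return best if best is not None and best[2] >= threshold else None
```

With the hypothetical fields `Profit`, `Region`, and `Order Date`, the query "average profitt by region" resolves to `Region` as an exact match. Taken on its own, the misspelled token `profitt` scores about 0.86 against `Profit`, which is still above the assumed cutoff.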
Description
Semantic search interface for data store
Priority
The present application is a continuation of U.S. patent application Ser. No. 18/427,799, entitled "Semantic Search Interface for Data Repositories," filed January 30, 2024. U.S. patent application Ser. No. 18/427,799 claims priority from U.S. provisional application Ser. No. 63/457,367, entitled "Semantic Search Interface for Data Repositories," filed April 5, 2023. U.S. patent application Ser. No. 18/427,799 also claims priority from U.S. provisional application Ser. No. 63/461,237, entitled "Semantic Search Interface for Data Repositories," filed April 21, 2023. Each of the above applications is hereby incorporated by reference in its entirety.
Technical Field
The disclosed implementations relate generally to data visualization and, more particularly, to systems, methods, and user interfaces for semantic searching of a data store.
Background
User expectations for search interfaces are evolving. Search engines are increasingly expected to answer questions while providing contextually relevant content that helps searchers achieve their goals. Current keyword-based search methods are mostly designed for content retrieval. Their main drawback is limited support for structured query types, where a focused and specific response is generally desired. Natural language (NL) question-answering (Q&A) interfaces, on the other hand, support more fact-finding inquiries but do not support content or document discovery and retrieval. With the growing number of data stores on networks, including structured data in the form of relational databases, files, and knowledge graphs, there is a vast amount of information that supports combining responses to fact-finding questions with document retrieval.
Similarly, data stores and visualization tools host hundreds or thousands of visualizations representing a broad range of datasets, making them rich platforms for sharing and consuming knowledge. Search plays a key role in these repositories, enabling people to sift through and find content of interest (e.g., charts on specific topics, charts showing data trends, customized visualizations such as Sankey diagrams, or charts authored by specific people). Current search systems tend to rely on document retrieval techniques to provide relevant search results for a given query. However, one challenge with data stores is the sparsity of searchable text: data sources and charts typically carry limited textual information, such as titles, captions, and textual data values. There is a need to explore alternative ways of indexing and searching content given this limited textual information. Another challenge is that the current search features of data stores offer limited expressiveness for specifying search queries, restricting users to keyword searches based primarily on visualization titles and authors. In contrast, other contemporary search interfaces, such as general web search, image and video search, and social networking sites, enable users to find and discover content through rich combinations of text content (e.g., keywords or topics covered in a website), visual features of the content (e.g., looking for images with a particular background color), dates (e.g., viewing videos from the last week), geographic locations (e.g., limiting a search to a postal code or city), and even different types of media (e.g., finding similar images through reverse image search). Given the limitations of current systems, designing expressive search interfaces for data repositories requires a more thorough empirical understanding of people's search needs.
For example, what goals do people have in mind when searching in the context of a data store? How do people formulate their search queries? What are the complementary or alternative modalities? Which supporting metadata is used to filter search results?
Disclosure of Invention
Thus, there is a need for systems, methods, and interfaces for semantic searching of data repositories. Some implementations bridge the gap between two distinct search paradigms, keyword-based search methods and natural language (NL) question-answering (Q&A) interfaces, using a hybrid approach called semantic search. Semantic search applies user intent and the meaning (i.e., semantics) of words and phrases to find relevant content that may not appear verbatim in the text (the keywords themselves) but is closely related to what the searcher wants. This information retrieval technique goes beyond simple keyword matching by using techniques such as entity recognition, word sense disambiguation, and relationship extraction to interpret the searcher's intent in a query. For example, a keyword search may find documents for the query "French press," whereas a query such as "How do I quickly make strong coffee?" is better suited f