CN-121979904-A - Table data intelligent question-answering method based on multidimensional coordinate index
Abstract
The invention discloses a table data intelligent question-answering method based on a multi-dimensional coordinate index. The method comprises a multi-dimensional coordinate index system, a three-layer vector database, an intelligent question analysis and routing system, an intelligent retrieval and context construction system, and an answer generation and output system. A natural language question input by a user is received, its type is determined and a vector database is selected accordingly, retrieval is performed in parallel to construct a structured context, and an answer is generated and returned together with the related data coordinates and retrieval statistics. By constructing a multi-dimensional coordinate index, the method overcomes the shortcomings of traditional table data retrieval, achieves accurate cell-level positioning, improves retrieval precision and efficiency as well as the accuracy of context understanding, and reduces query time.
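The routing step (classify the question, then choose which vector libraries to search) can be sketched with simple keyword rules. This is a minimal illustration under stated assumptions, not the patented implementation: the keyword lists, the A1-style coordinate pattern, the names `classify_question`, `route` and `ROUTES`, and the type-to-library mapping are all hypothetical. The source states only that keyword recognition sorts questions into aggregation, entity, relation and coordinate queries.

```python
import re

# Hypothetical keyword rules: the method only states that keyword recognition
# classifies questions as aggregation, entity, relation or coordinate queries.
COORDINATE_PATTERN = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")  # A1-style refs, e.g. "B3"
AGGREGATION_PATTERN = re.compile(
    r"\b(total|sum|average|count|how many|maximum|minimum)\b", re.IGNORECASE
)
RELATION_PATTERN = re.compile(
    r"\b(compare|relationship|versus|difference between)\b", re.IGNORECASE
)

# Hypothetical mapping from question type to the vector libraries searched.
ROUTES = {
    "coordinate": ("cell",),
    "aggregation": ("column", "row"),
    "relation": ("row", "column"),
    "entity": ("cell", "row"),
}


def classify_question(question: str) -> str:
    """Return 'coordinate', 'aggregation', 'relation' or 'entity'."""
    if COORDINATE_PATTERN.search(question):
        return "coordinate"
    if AGGREGATION_PATTERN.search(question):
        return "aggregation"
    if RELATION_PATTERN.search(question):
        return "relation"
    # Fallback: treat the question as a lookup of a specific entity.
    return "entity"


def route(question: str) -> tuple[str, tuple[str, ...]]:
    """Classify the question and pick the vector libraries to search."""
    question_type = classify_question(question)
    return question_type, ROUTES[question_type]
```

For example, `route("Which department is Zhang San in?")` falls through to the entity type and searches the cell- and row-level libraries. Plain keyword rules are cheap but brittle (a product code such as "Q4" would be read as a cell reference), which is one reason a supplementary retrieval pass is useful.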
Inventors
- ZHANG BO
- JIANG XINJIAN
- HAN LIN
- YANG SHAOWU
- WEI TONG
- TAN XINGYU
Assignees
- 中国二十冶集团有限公司 (China MCC20 Group Corp., Ltd.)
Dates
- Publication Date
- 20260505
- Application Date
- 20260105
Claims (10)
- 1. A table data intelligent question-answering method based on a multi-dimensional coordinate index, characterized by comprising a multi-dimensional coordinate index system, a three-layer vector database, an intelligent question analysis and routing system, an intelligent retrieval and context construction system, and an answer generation and output system, wherein: a natural language question input by a user is received and input into the intelligent question analysis and routing system, which determines the question type and dynamically selects the vector database to be searched; based on the question type, the intelligent retrieval and context construction system searches in parallel for related cell, row, and column information in the multi-dimensional coordinate index system and intelligently expands to adjacent-region data according to the retrieved data coordinates to construct a structured context; and the constructed structured context and the user's natural language question are input into the answer generation and output system, which generates a natural language answer using a large language model and returns the related data coordinates and retrieval statistics.
- 2. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1, wherein the multi-dimensional coordinate index system is established based on the coordinates of table cells.
- 3. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1 or 2, wherein the multi-dimensional coordinate index system generates a unique coordinate for each cell in the table, establishes a bidirectional mapping between cell coordinates and values, and supports wildcard queries.
- 4. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1, wherein the three-layer vector database comprises a cell-level vector library, a row-level vector library, and a column-level vector library.
- 5. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 4, wherein the cell-level vector library stores the coordinates, column name, and value of each cell, the row-level vector library stores the concatenated text of each whole row, and the column-level vector library stores column statistics including the non-null count, unique values, and example values.
- 6. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1, wherein the intelligent question analysis and routing system determines the type of the natural language question input by the user by keyword recognition, the question types comprising aggregation, entity, relation, and coordinate queries.
- 7. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1, wherein, based on the question type, if the structured context is incomplete after it has been constructed, the intelligent retrieval and context construction system re-analyzes the question type and performs a supplementary retrieval.
- 8. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1, wherein the large language model used by the answer generation and output system comprises a DeepSeek or Ollama large language model, an accurate and concise natural language answer is generated, and the related data coordinates and retrieval statistics are returned.
- 9. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1, wherein, if the user's natural language question involves multi-table association, the intelligent question analysis and routing system extends the retrieval to the associated tables and updates the structured context through the intelligent retrieval and context construction system.
- 10. The table data intelligent question-answering method based on a multi-dimensional coordinate index according to claim 1, wherein the intelligent question analysis and routing system supports time-series data analysis: time-series features including timestamp coordinates are extracted through the three-layer vector database, the retrieval range of the three-layer vector database is adjusted according to the time-series features, and a time-series analysis result is output.
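Claim 5 describes what each of the three vector libraries holds. The sketch below builds those three kinds of text records from a header and a list of rows; it stops short of embedding them, since the source does not name an embedding model or vector store. The function names, the record layout, the 1-based `(row, column)` coordinates over data rows, and the choice to skip empty cells are assumptions for illustration only.

```python
from typing import Any


def is_null(value: Any) -> bool:
    """Treat None and blank strings as empty cells (an assumption)."""
    return value is None or (isinstance(value, str) and not value.strip())


def build_layer_records(
    header: list[str], rows: list[list[Any]], max_examples: int = 3
) -> dict[str, list[dict[str, Any]]]:
    """Build the texts that would be embedded into the three vector libraries.

    Coordinates are hypothetical 1-based (row, column) pairs over the data rows;
    None marks "whole row" or "whole column" for row- and column-level records.
    """
    cell_records: list[dict[str, Any]] = []
    row_records: list[dict[str, Any]] = []
    for r, row in enumerate(rows, start=1):
        pieces = []
        for c, (name, value) in enumerate(zip(header, row), start=1):
            if is_null(value):
                continue  # empty cells get no cell-level record
            # Cell level: coordinate, column name and value.
            cell_records.append(
                {"coord": (r, c), "text": f"row {r}, column {c} ({name}): {value}"}
            )
            pieces.append(f"{name}: {value}")
        # Row level: the whole row concatenated into one text.
        row_records.append({"coord": (r, None), "text": "; ".join(pieces)})

    column_records: list[dict[str, Any]] = []
    for c, name in enumerate(header, start=1):
        # Column level: non-null count, unique values and example values.
        values = [row[c - 1] for row in rows if c <= len(row) and not is_null(row[c - 1])]
        unique = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
        examples = unique[:max_examples]
        column_records.append(
            {
                "coord": (None, c),
                "non_null": len(values),
                "unique": len(unique),
                "examples": examples,
                "text": f"column {name}: {len(values)} non-null, "
                f"{len(unique)} unique, examples {examples}",
            }
        )
    return {"cell": cell_records, "row": row_records, "column": column_records}
```

Keeping the coordinate next to every record is what would let a retrieved vector be mapped straight back to a position in the table, which a plain text-chunk index cannot do.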
Description
Table data intelligent question-answering method based on multidimensional coordinate index
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a table data intelligent question-answering method based on a multi-dimensional coordinate index.
Background
With the development of enterprise informatization, table data (such as Excel and CSV files) has become an important carrier for storing structured information. Users typically need to obtain specific information from a table by asking natural language questions such as "Which department is Zhang San in?". Traditional methods rely on manual search or simple keyword matching, which is inefficient and cannot capture semantic associations. Conventional table queries also depend on fixed field names and database query statements, which present a high barrier for non-technical users. With the development of artificial intelligence, users expect to ask questions directly in natural language, as in a human conversation, and expect the system to analyze the question automatically, locate the data, and return the answer. The key challenges are enabling a machine to understand tables with multi-level headers, merged cells, and similar structures, and building an efficient index that supports fast data recall.
Table question-answering systems based on vector retrieval or large language models exist in the prior art, but they have the following defects:
1. Low retrieval precision: relying only on semantic similarity, they cannot accurately locate specific table cells;
2. Weak context understanding: they lack an understanding of the table structure, such as row, column, and coordinate relationships;
3. Limited query support: complex queries such as cross-row and cross-column aggregation, wildcard queries, and region expansion are not supported;
4. Poor extensibility: they are difficult to adapt to complex scenarios such as multi-table association and time-series analysis.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a table data intelligent question-answering method based on a multi-dimensional coordinate index that overcomes the shortcomings of traditional table data retrieval, achieves accurate cell-level positioning by constructing a multi-dimensional coordinate index, improves retrieval precision and efficiency as well as the accuracy of context understanding, and reduces query time.
The method comprises constructing a multi-dimensional coordinate index system, a three-layer vector database, an intelligent question analysis and routing system, an intelligent retrieval and context construction system, and an answer generation and output system. A natural language question input by a user is received and input into the intelligent question analysis and routing system, which determines the question type and dynamically selects the vector database to be searched. Based on the question type, the intelligent retrieval and context construction system searches in parallel for related cell, row, and column information in the multi-dimensional coordinate index system, intelligently expands to adjacent-region data according to the retrieved data coordinates, and constructs a structured context. The constructed structured context and the user's natural language question are input into the answer generation and output system, which generates a natural language answer using a large language model and returns the related data coordinates and retrieval statistics. Further, the multi-dimensional coordinate index system is established based on the table cell coordinates. Further, the multi-dimensional coordinate index system generates a unique coordinate for each cell in the table, establishes a bidirectional mapping between cell coordinates and values, and supports wildcard queries.
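The coordinate index described above (a unique coordinate per cell, a bidirectional coordinate/value mapping, wildcard queries, and expansion to adjacent cells) can be sketched with two dictionaries. This is an illustrative sketch only: the `CoordinateIndex` class, its method names, the `(table, row, column)` coordinate layout, the `"*"` wildcard token and the square `radius` neighborhood are assumptions, since the source does not define the coordinate format or how far the adjacent-region expansion reaches.

```python
from collections import defaultdict
from typing import Any, Union

Coord = tuple[str, int, int]  # hypothetical (table name, row, column), 1-based


class CoordinateIndex:
    """Hypothetical cell-level index with a bidirectional coordinate <-> value map."""

    def __init__(self, table: str, header: list[str], rows: list[list[Any]]):
        self.table = table
        self.header = list(header)
        self.coord_to_value: dict[Coord, Any] = {}
        self.value_to_coords: defaultdict = defaultdict(list)
        for r, row in enumerate(rows, start=1):
            for c, value in enumerate(row, start=1):
                coord = (table, r, c)  # unique coordinate for every cell
                self.coord_to_value[coord] = value
                self.value_to_coords[str(value)].append(coord)

    def lookup(self, coord: Coord) -> Any:
        """Coordinate -> value."""
        return self.coord_to_value.get(coord)

    def find(self, value: Any) -> list[Coord]:
        """Value -> every coordinate holding it (compared as text)."""
        return list(self.value_to_coords.get(str(value), []))

    def _column_number(self, col: Union[int, str]) -> Union[int, str]:
        # Accept a column name, a 1-based column number, or the wildcard "*".
        if isinstance(col, str) and col != "*":
            return self.header.index(col) + 1
        return col

    def query(self, row: Union[int, str] = "*", col: Union[int, str] = "*") -> dict[Coord, Any]:
        """Wildcard query: "*" matches any row or any column."""
        col = self._column_number(col)
        return {
            coord: value
            for coord, value in self.coord_to_value.items()
            if (row == "*" or coord[1] == row) and (col == "*" or coord[2] == col)
        }

    def expand(self, coord: Coord, radius: int = 1) -> dict[Coord, Any]:
        """Adjacent-region expansion: cells within `radius` rows and columns."""
        table, r, c = coord
        return {
            k: v
            for k, v in self.coord_to_value.items()
            if k[0] == table and abs(k[1] - r) <= radius and abs(k[2] - c) <= radius
        }
```

With this layout, `query(col="Department")` returns the whole Department column, and `expand` around a matched cell pulls in its neighbors (for example the person's name beside a department value), which is the kind of surrounding data a structured context needs. A real system would also have to handle merged cells and multi-level headers, which this sketch ignores.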
Further, the three-layer vector database includes a cell-level vector library, a row-level vector library, and a column-level vector library. Further, the cell-level vector library stores the coordinates, column name, and value of each cell, the row-level vector library stores the concatenated text of each whole row, and the column-level vector library stores column statistics including the non-null count, unique values, and example values. Further, the intelligent question analysis and routing system determines the type of the natural language question input by the user by keyword recognition, the question types comprising aggregation, entity, relation, and coordinate queries. Further, based on the question type, if the structured context is incomplete after it has been constructed, the intelligent retrieval and context construction system re-analyzes the question type and performs a supplementary retrieval. Further, the large language model adopte