CN-122021921-A - Intelligent result matching method and system based on large model question-answering mechanism


Abstract

The invention discloses an intelligent result matching method and system based on a large model question-answering mechanism, belonging to the technical fields of artificial intelligence, information retrieval and data processing. The method comprises a result data initialization and synchronization step: basic result data are acquired from a relational database, and the text is cleaned, spliced and segmented into text fragments; each text fragment is converted into a semantic vector by an embedding model and stored in a vector database, and a keyword retrieval index is constructed from the text fragments. A hybrid retrieval mechanism combining vector retrieval and BM25 is adopted and, together with fine-grained ranking by a Rerank model, the retrieval results are fused and ranked through a multidimensional algorithm formula. This overcomes the limitation of a single retrieval mode; the recall rate is improved by more than 80%, and the matching precision is more than 80%.
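
The cleaning, splicing and segmenting stage summarized in the abstract can be illustrated with a minimal sketch. It assumes character-based fixed-length windows and borrows the values chunk_size=500 and chunk_overlap=50 and the "remove HTML tags, merge whitespace" cleaning rules from steps S103 and S104 of the description. The function names (clean_text, split_text, build_fragments) are hypothetical and are not taken from the patent.

```python
import re


def clean_text(raw: str) -> str:
    """Remove HTML tags, then collapse whitespace runs into single spaces."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-length windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def build_fragments(title: str, introduction: str) -> list[str]:
    """Splice 'title + introduction', clean the result, and split it (assumed order)."""
    return split_text(clean_text(f"{title} {introduction}"))
```

The overlap keeps a sentence that straddles a window boundary visible in both neighbouring fragments, at the cost of storing some text twice. A production splitter would more likely cut on sentence or token boundaries than on raw character offsets.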

Inventors

ZHANG YILIN
PAN HAO
GONG JIAXIN
LI TIANAN
LI RENWEI
CHEN LINA

Assignees

青岛檬豆网络科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260211

Claims (10)

1. An intelligent result matching method based on a large model question-answering mechanism, characterized by comprising the following steps: S1, a result data initialization and synchronization step: acquiring basic result data from a relational database, cleaning and splicing the text, and then segmenting it into text fragments; converting each text fragment into a semantic vector using an embedding model, storing the semantic vector in a vector database, and constructing a keyword retrieval index based on the text fragments; S2, receiving a query text input by a user, and cleaning and standardizing the query text; S3, a multi-dimensional search and candidate result recall step: using a large language model to interpret and expand the preprocessed query text, generating multi-angle expanded queries and hypothetical documents; based on the expanded queries, executing semantic-vector similarity retrieval and keyword-based retrieval in parallel, each recalling candidate results; S4, a retrieval result reordering step: inputting the preliminary candidate result set and the user's original query into a reordering model, which calculates a fine-grained relevance score between each candidate result and the query, and sorting and filtering by score to obtain a high-relevance result set; S5, a result generation and output step: according to the interface type, rendering the high-relevance result set as a structured matching list or a natural language answer, and outputting it.
2. The intelligent result matching method based on the large model question-answering mechanism according to claim 1, wherein in the result data initialization and synchronization step, the result data are obtained from the relational database either by reading all data through a full synchronization interface or by receiving a single record pushed by an external system through an incremental synchronization interface.
3. The intelligent result matching method based on the large model question-answering mechanism according to claim 2, wherein the text fragment is converted into a semantic vector using the embedding model, the vector being computed as a weighted average of the word vectors of the segmented words in the text fragment.
4. The intelligent result matching method based on the large model question-answering mechanism according to claim 3, wherein in the multi-dimensional search and candidate result recall step, the expanded search conditions generated by the large language model further comprise metadata filtering conditions extracted from the user query.
5. The intelligent result matching method based on the large model question-answering mechanism according to claim 4, wherein the results recalled by the two types of retrieval are fused by a weighted fusion method, in which the semantic vector retrieval score and the keyword retrieval score are weighted and summed according to preset weight coefficients to obtain an initial hybrid retrieval score.
6. The intelligent result matching method based on the large model question-answering mechanism according to claim 5, wherein in the multi-dimensional search and candidate result recall step, the search results are further optimized using the similarity between the semantic vector of the hypothetical document and the semantic vector of each candidate result.
7. The intelligent result matching method based on the large model question-answering mechanism according to claim 6, wherein in the retrieval result reordering step, the reordering model, based on a cross-attention mechanism, concatenates the user query vector and the candidate result vector and then calculates the relevance score.
8. The intelligent result matching method based on the large model question-answering mechanism according to claim 7, wherein in the result data initialization and synchronization step, the vector database and the relational database are synchronized incrementally, and whether synchronization is needed is determined by comparing the version numbers or last-update timestamps of the same result stored on both sides.
9. An intelligent result matching system based on a large model question-answering mechanism, for implementing the method of any one of claims 1 to 8, comprising: a data storage layer, comprising a relational database for storing basic result data and a vector database for storing result semantic vectors and metadata; an AI component layer, comprising an embedding model module for generating text semantic vectors, a large language model module for query expansion and text generation, and a reordering model module for fine-grained scoring and ranking of candidate results; a search engine layer, comprising a hybrid search module for cooperatively executing semantic search and keyword search, a multi-query expansion module for generating multi-angle query conditions and hypothetical documents, and a data cleaning module for cleaning text; an interface service layer, comprising at least a data synchronization interface for triggering data synchronization, an intelligent interaction interface for receiving user queries, and a result matching interface; and an output layer for outputting the structured matching result or natural language answer.
10. The intelligent result matching system based on the large model question-answering mechanism according to claim 9, wherein the data synchronization interface of the interface service layer supports full synchronization determined by version number or timestamp as well as incremental synchronization of single records, and the structured result output by the output layer is in JSON format and comprises a result identifier, title, matching score and matching reason.
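
The weighted fusion recited in claim 5 can be illustrated with a minimal sketch. The claim only states that the two scores are weighted and summed with preset coefficients. The min-max normalization, the example weights 0.6 and 0.4, the treatment of a missing score as 0, and the function names are all assumptions made here for illustration and are not taken from the patent.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1] so the two retrievers are comparable (assumption)."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All candidates scored the same; treat them as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def fuse_scores(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.6,   # hypothetical preset coefficient
    keyword_weight: float = 0.4,  # hypothetical preset coefficient
) -> list[tuple[str, float]]:
    """Weighted sum of the two normalized scores, sorted from best to worst."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    fused = {
        doc_id: vector_weight * v.get(doc_id, 0.0) + keyword_weight * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    # Sort by descending score; break ties by id so the order is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    ranking = fuse_scores({"a": 0.9, "b": 0.5}, {"b": 12.0, "c": 3.0})
    for doc_id, score in ranking:
        print(doc_id, round(score, 3))
```

Normalizing first matters because cosine similarities and BM25 scores live on different scales. Without it, the preset weights would not express the intended balance between the two retrievers.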

Description

Intelligent result matching method and system based on large model question-answering mechanism

Technical Field

The invention relates to an intelligent result matching method and system based on a large model question-answering mechanism, and belongs to the technical fields of artificial intelligence, information retrieval and data processing.

Background

In scenarios such as result resource docking and technical consultation, traditional result retrieval systems have obvious technical problems:
The retrieval mode is single and relies heavily on exact keyword matching; the semantics of the user's requirement cannot be understood, so result resources that are "semantically relevant but keyword-mismatched" are missed, and the retrieval recall rate is low;
Data storage and synchronization are disconnected: result data are stored in a relational database that cannot directly support semantic retrieval, and there is no real-time synchronization mechanism between the relational database and the retrieval engine, so data updates lag;
The ranking precision of retrieval results is insufficient: the preliminarily recalled candidate results lack a fine-grained scoring mechanism, so irrelevant or weakly relevant results rank near the top and the user experience suffers;
User requirement descriptions are vague and scenario-specific, and traditional systems cannot improve matching accuracy through requirement expansion and logical reconstruction;
System security and extensibility are insufficient: sensitive information such as API keys is easily leaked, and a single retrieval mode is hard to adapt to the changing requirements of different scenarios.

In the prior art, vector retrieval and keyword retrieval are mostly applied independently and do not form a closed loop of collaborative retrieval. Large models are not deeply integrated with the retrieval engine, so their advantages in query expansion, semantic understanding and logical reconstruction are not fully exploited. There is also no retrieval strategy customized for the result matching scenario, so retrieval precision, recall rate and system response speed are hard to balance, which severely restricts the efficiency and accuracy of result resource docking.

Disclosure of Invention

The invention aims to provide an intelligent result matching method and system based on a large model question-answering mechanism, so as to solve the problems described in the Background. To achieve this purpose, the invention adopts the following technical scheme. The invention provides an intelligent result matching method and system based on a large model question-answering mechanism, the method comprising the following steps:

S1, initialization and synchronization of result data
S101, initializing a MySQL database and creating a result table for storing basic data such as result ids, titles and introductions;
S102, reading all result data in MySQL through a full synchronization interface, or receiving a single result record pushed by an external system through an incremental synchronization interface;
S103, cleaning the result data: removing HTML tags, merging whitespace characters, and splicing the complete text of "result title + introduction";
S104, cutting long text into fixed-length fragments (chunk_size=500, chunk_overlap=50) with a text splitter, so that the input length limit of the Embedding model is not exceeded;
S105, generating semantic vectors with the Embedding model, based on a weighted average of the word vectors of each text fragment; writing the text fragments, the associated metadata (result ids and titles) and the semantic vectors into a Chroma vector database; incrementally updating the BM25 index, with the corpus built on jieba word segmentation; thereby completing the data synchronization closed loop;

S2, user requirement receiving and preprocessing
S201, a user inputs a requirement query (in text format) through the intelligent interaction interface or the result matching interface;
S202, the data cleaning module preprocesses the user query, removing special characters and standardizing expressions, to ensure that the retrieval input is normalized;

S3, multi-dimensional retrieval and recall of candidate results
S301, for the result matching interface, generating multi-dimensional retrieval conditions through the LLM, comprising 3 expanded queries from different angles, 1 hypothetical result introduction (HyDE), and metadata filtering conditions extracted from the query;
S302, the hybrid retrieval module performs collaborative retrieval for each expanded query:
Vector retrieval: converting the query into a semantic vector through the embedding model, executing similarity retrieval in Chroma based on the cosine similarity formula, and recalling the top-k candidate results;
BM25 retrieval, namely, segmenting query wo