CN-121996799-A - Retrieval processing method, device and equipment based on multisource knowledge base data

Abstract

The invention provides a retrieval processing method, device and equipment based on multi-source knowledge base data. The method comprises: acquiring multi-source heterogeneous data; standardizing the data format and the target term of the multi-source heterogeneous data to obtain standard data; performing duplicate entity identification on the standard data and removing the duplicate entities to obtain target data; obtaining entity relations from the multi-source heterogeneous data and constructing a knowledge graph; and acquiring a target question from a user and querying the target data and the knowledge graph to obtain a target output result. The scheme manages multi-source data in a unified way, improves cross-database retrieval accuracy through the knowledge graph, and improves the real-time responsiveness of the data.

Inventors

LIU SHENGKAI
YAN GUOQING
LIU MEILU
TANG MAN

Assignees

中译语通信息科技(上海)有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-05

Claims (10)

1. A retrieval processing method based on multi-source knowledge base data, characterized by comprising the following steps: acquiring multi-source heterogeneous data; standardizing the data format and the target term of the multi-source heterogeneous data to obtain standard data; performing duplicate entity identification on the standard data and removing duplicate entities from the standard data to obtain target data; obtaining entity relations from the multi-source heterogeneous data and constructing a knowledge graph; and acquiring a target question from a user and querying the target data and the knowledge graph to obtain a target output result.
2. The retrieval processing method based on multi-source knowledge base data according to claim 1, wherein standardizing the data format and the target term of the multi-source heterogeneous data to obtain standard data comprises: performing format conversion on data of different data formats in the multi-source heterogeneous data, so as to unify the data of different data formats into standardized data; and, when the target term is identified in the standardized data, mapping the target term in the standardized data to a standard term according to a term correspondence in a preset domain dictionary to obtain the standard data.
3. The retrieval processing method based on multi-source knowledge base data according to claim 1, wherein performing duplicate entity identification on the standard data and removing duplicate entities from the standard data to obtain target data comprises: performing natural language processing on the standard data to generate a plurality of semantic vectors; performing similarity calculation on the plurality of semantic vectors, and, when a similarity score is greater than a preset threshold, determining that the two semantic vectors corresponding to that similarity score represent the same entity; and retaining one of the entities determined to be the same entity according to a preset coordination rule to obtain the target data.
4. The retrieval processing method based on multi-source knowledge base data according to claim 1, wherein obtaining entity relations from the multi-source heterogeneous data and constructing a knowledge graph comprises: extracting target entities from the multi-source heterogeneous data; performing entity classification and alignment on the target entities, and obtaining a plurality of triples according to the entity relations among the target entities; and obtaining the knowledge graph according to the triples.
5. The retrieval processing method based on multi-source knowledge base data according to claim 1, wherein acquiring a target question from a user and querying the target data and the knowledge graph to obtain a target output result comprises: acquiring the target question input by the user; retrieving in the target data according to the target question to obtain a retrieval result; and performing knowledge enhancement through the knowledge graph according to the retrieval result to obtain the target output result.
6. The retrieval processing method based on multi-source knowledge base data according to claim 5, wherein retrieving in the target data according to the target question to obtain the retrieval result comprises: performing, according to the target question, structured retrieval in the target data through keyword query to obtain a first retrieval result; vectorizing the target question and performing semantic retrieval in the target data to obtain a second retrieval result; and merging and de-duplicating the first retrieval result and the second retrieval result, and fusing the results according to a preset weight formula to obtain the retrieval result.
7. The retrieval processing method based on multi-source knowledge base data according to claim 5, wherein performing knowledge enhancement through the knowledge graph according to the retrieval result to obtain the target output result comprises: performing an expansion query in the knowledge graph through a preset query statement to obtain related entities; and calculating the relevance of each related entity according to a preset weight mechanism to obtain the target output result.
8. A retrieval processing apparatus based on multi-source knowledge base data, comprising: an acquisition module, used for acquiring multi-source heterogeneous data; and a processing module, used for standardizing the data format and the target term of the multi-source heterogeneous data to obtain standard data, performing duplicate entity identification on the standard data and removing duplicate entities from the standard data to obtain target data, obtaining entity relations from the multi-source heterogeneous data and constructing a knowledge graph, and acquiring a target question from a user and querying the target data and the knowledge graph to obtain a target output result.
9. A computing device comprising a processor and a memory storing a computer program which, when executed by the processor, performs the method of any one of claims 1 to 7.
10. A computer readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.
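The duplicate entity identification of claim 3 can be illustrated with a minimal sketch. The claims do not specify the encoder, the threshold value, or the coordination rule, so everything below is an assumption: `embed` uses character-bigram counts as a stand-in for a real semantic vector model, the threshold of 0.85 is arbitrary, and the hypothetical coordination rule keeps whichever record has more non-empty fields. The sample records are invented for illustration only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a semantic encoder (assumption): character-bigram counts."""
    t = text.lower().strip()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filled(record: dict) -> int:
    """Number of non-empty fields; used by the assumed coordination rule."""
    return sum(1 for v in record.values() if v not in (None, ""))


def deduplicate(records: list[dict], key: str = "name",
                threshold: float = 0.85) -> list[dict]:
    """Keep one record per group of records whose similarity exceeds threshold."""
    kept: list[tuple[dict, Counter]] = []
    for rec in records:
        vec = embed(rec[key])
        for i, (other, other_vec) in enumerate(kept):
            if cosine(vec, other_vec) > threshold:
                # Assumed coordination rule: retain the more complete record.
                if filled(rec) > filled(other):
                    kept[i] = (rec, vec)
                break
        else:
            # No sufficiently similar record seen so far: treat as a new entity.
            kept.append((rec, vec))
    return [rec for rec, _ in kept]


if __name__ == "__main__":
    sample = [
        {"name": "Tsinghua University", "city": ""},
        {"name": "tsinghua university ", "city": "Beijing"},
        {"name": "Peking University", "city": "Beijing"},
    ]
    for rec in deduplicate(sample):
        print(rec)
```

In practice the bigram encoder would be replaced by whatever natural language processing model produces the semantic vectors, and the pairwise loop by an approximate nearest-neighbour index for large data sets.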

Description

Retrieval processing method, device and equipment based on multisource knowledge base data

Technical Field

The invention relates to the technical field of computer information processing, in particular to a retrieval processing method, device and equipment based on multi-source knowledge base data.

Background

Current strategy databases mostly adopt an isolated architecture (such as an independent talent database or policy database) and have the following problems in data processing:

Multi-source heterogeneous data is difficult to integrate. Talent, institution and policy data in various formats (such as Excel and JSON) cannot be cleaned and standardized efficiently, so the data redundancy rate is high (> 15%) and the data quality is low (null rate > 10%). The talent, institution and policy databases operate independently and lack cross-database association capability, and the query response delay exceeds 5 seconds.

Knowledge retrieval precision is insufficient. Traditional keyword retrieval supports only single-field matching and cannot perform fuzzy matching across multiple fields such as talent name and research field, so the recall rate is lower than 70%. Because knowledge graph modeling and reasoning capabilities are lacking, implicit relationships among policies, institutions and talents (e.g. "the influence of a policy on talents in a particular field") have not been mined.

Data is not updated in real time. Data updates depend on manual batch processing with a delay of more than 1 hour, so talent movement or policy changes cannot be responded to in real time. Because real-time synchronization and conflict resolution mechanisms are lacking, the incremental update conflict rate is high (> 10%).
Disclosure of Invention

The technical problem to be solved by the invention is to provide a retrieval processing method, device and equipment based on multi-source knowledge base data, which manage multi-source data in a unified way, improve cross-database retrieval accuracy through a knowledge graph, and improve the real-time responsiveness of the data.

In order to solve the above technical problem, the technical scheme of the invention is as follows:

A retrieval processing method based on multi-source knowledge base data comprises the following steps: acquiring multi-source heterogeneous data; standardizing the data format and the target term of the multi-source heterogeneous data to obtain standard data; performing duplicate entity identification on the standard data and removing duplicate entities from the standard data to obtain target data; obtaining entity relations from the multi-source heterogeneous data and constructing a knowledge graph; and acquiring a target question from a user and querying the target data and the knowledge graph to obtain a target output result.

Optionally, standardizing the data format and the target term of the multi-source heterogeneous data to obtain standard data includes: performing format conversion on data of different data formats in the multi-source heterogeneous data, so as to unify the data of different data formats into standardized data; and, when the target term is identified in the standardized data, mapping the target term in the standardized data to a standard term according to a term correspondence in a preset domain dictionary to obtain the standard data.
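The two standardization steps above (format conversion, then term mapping through a domain dictionary) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: CSV stands in for Excel so that only the Python standard library is needed, and the dictionary entries, the function names and the `field` column name are all hypothetical.

```python
import csv
import io
import json

# Hypothetical preset domain dictionary: variant term -> standard term.
DOMAIN_DICTIONARY = {
    "AI": "artificial intelligence",
    "machine intelligence": "artificial intelligence",
    "ML": "machine learning",
}


def load_records(raw: str, fmt: str) -> list[dict]:
    """Format conversion: turn JSON or CSV text into a uniform list of dicts."""
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(raw))]
    raise ValueError(f"unsupported format: {fmt}")


def normalize_terms(record: dict, field: str = "field") -> dict:
    """Term mapping: replace a known variant in `field` with its standard term."""
    out = dict(record)
    value = out.get(field)
    if isinstance(value, str):
        cleaned = value.strip()
        # Terms not found in the dictionary are kept unchanged.
        out[field] = DOMAIN_DICTIONARY.get(cleaned, cleaned)
    return out


def standardize(sources: list[tuple[str, str]]) -> list[dict]:
    """Apply format conversion and term mapping to (raw_text, format) pairs."""
    records: list[dict] = []
    for raw, fmt in sources:
        records.extend(normalize_terms(r) for r in load_records(raw, fmt))
    return records


if __name__ == "__main__":
    # Invented sample inputs, for illustration only.
    json_src = '[{"name": "Zhang Wei", "field": "AI"}]'
    csv_src = "name,field\nLi Na,ML\n"
    for rec in standardize([(json_src, "json"), (csv_src, "csv")]):
        print(rec)
```

A production pipeline would add further loaders (for example, an Excel reader from a third-party library) behind the same `load_records` interface, so that the term mapping step never needs to know the original format.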
Optionally, performing duplicate entity identification on the standard data and removing duplicate entities from the standard data to obtain target data includes: performing natural language processing on the standard data to generate a plurality of semantic vectors; performing similarity calculation on the plurality of semantic vectors, and, when a similarity score is greater than a preset threshold, determining that the two semantic vectors corresponding to that similarity score represent the same entity; and retaining one of the entities determined to be the same entity according to a preset coordination rule to obtain the target data.

Optionally, obtaining entity relations from the multi-source heterogeneous data and constructing a knowledge graph includes: extracting target entities from the multi-source heterogeneous data; performing entity classification and alignment on the target entities, and obtaining a plurality of triples according to the entity relations among the target entities; and obtaining the knowledge graph according to the triples.

Optionally, acquiring a target question from a user and querying the target data and the knowledge graph to obtain a target output result includes: acquiring the target question input by the user; retrieving in the target data according to the target question to obtain a retrieval result; and performing knowledge enhancement through the knowled