CN-122019759-A - Enterprise consultation data retrieval method and system based on graph enhanced retrieval generation
Abstract
The invention provides an enterprise consultation data retrieval method and system based on graph-enhanced retrieval generation, relating to the technical field of knowledge graph construction and retrieval. The method comprises: performing entity recognition and relation extraction on structured semantic units, using word-segmentation semantic features, contextual features, word-segmentation position features and an entity similarity mechanism, to construct an enterprise knowledge graph; obtaining user retrieval content and searching the enterprise knowledge graph by calculating the matching similarity between the user retrieval content and each entity node, to obtain at least one entity node as a candidate matching result; optimizing the candidate matching result based on the similarity edges in the enterprise knowledge graph to obtain a final matching result; and extracting, from the entity node corresponding to the final matching result, an answer to the target question of the user retrieval content as the retrieval result. The invention improves the expressive capability of the enterprise knowledge graph and the retrieval precision.
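As a rough illustration of the graph the abstract describes, the sketch below models entity nodes with attached attributes (standing in for attribute nodes and subordinate edges), labeled relation edges, and similarity edges that carry a score. All class and function names are hypothetical, and `placeholder_similarity` is an assumed stand-in: the patent's own similarity formula is not reproduced in this text, so only the idea of comparing shared attribute categories and the numerical differences of their parameters is kept.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class EntityNode:
    name: str
    # Attribute category -> numeric attribute parameter; each pair stands in
    # for an attribute node attached through a subordinate edge.
    attributes: dict[str, float] = field(default_factory=dict)


@dataclass
class EnterpriseGraph:
    entities: dict[str, EntityNode] = field(default_factory=dict)
    relation_edges: dict[tuple[str, str], str] = field(default_factory=dict)
    similarity_edges: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_entity(self, name, **attributes):
        self.entities[name] = EntityNode(name, dict(attributes))

    def add_relation(self, source, target, label):
        self.relation_edges[(source, target)] = label


def placeholder_similarity(a, b):
    """ASSUMED stand-in score in [0, 1]; not the patent's formula."""
    shared = a.attributes.keys() & b.attributes.keys()
    union = a.attributes.keys() | b.attributes.keys()
    if not shared:
        return 0.0
    closeness = 0.0
    for category in shared:
        x, y = a.attributes[category], b.attributes[category]
        scale = abs(x) + abs(y)
        # 1.0 when the two values agree, falling towards 0.0 as they diverge.
        closeness += 1.0 if scale == 0 else 1.0 - abs(x - y) / scale
    # Dividing by the union size penalizes categories only one entity has.
    return closeness / len(union)


def link_similar_entities(graph, threshold):
    """Add a labeled similarity edge for every pair scoring above threshold."""
    for a, b in combinations(sorted(graph.entities), 2):
        score = placeholder_similarity(graph.entities[a], graph.entities[b])
        if score > threshold:
            graph.similarity_edges[(a, b)] = score
```

Storing the score on the edge itself means a later retrieval step can read it back without recomputing entity similarity.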
Inventors
- XU JIACHEN
- ZHOU TIAN
- YANG HANG
- ZHOU YIPING
- LIN WEIWEI
- ZHANG QINGLIANG
- WANG YANPENG
- LI TING
- KANG LINQI
Assignees
- 北京英大长安风险管理咨询有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260211
Claims (10)
- 1. An enterprise consultation data retrieval method based on graph-enhanced retrieval generation, characterized by comprising the following steps: performing entity recognition and relation extraction on structured semantic units, using word-segmentation semantic features, contextual features, word-segmentation position features and an entity similarity mechanism, to construct an enterprise knowledge graph, wherein the enterprise knowledge graph comprises entity nodes, attribute nodes, relation edges, similarity edges and subordinate edges, each entity node, which stores an entity, is connected through subordinate edges to the attribute nodes that store the attributes of that entity, the entity nodes are connected to one another through the relation edges and the similarity edges, and each similarity edge is labeled with the similarity between the entity nodes it connects; obtaining user retrieval content, and searching the enterprise knowledge graph by calculating the matching similarity between the user retrieval content and each entity node, to obtain at least one entity node as a candidate matching result; optimizing the candidate matching result based on the similarity edges in the enterprise knowledge graph to obtain a final matching result; and extracting, from the entity node corresponding to the final matching result, an answer to the target question of the user retrieval content as the retrieval result.
- 2. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 1, wherein the enterprise knowledge graph is constructed as follows: obtaining enterprise data and standardizing it, wherein the standardization comprises sentence segmentation, unified formatting and de-duplication; performing word segmentation on each sentence through a hidden Markov model to obtain structured semantic units in units of words; extracting, based on the structured semantic units, each entity, its corresponding entity attributes and its relations with other entities, connecting each entity to its entity attributes through subordinate edges, and connecting entities that have a relation through relation edges, to obtain an initial enterprise knowledge graph; and obtaining an identification parameter for each entity according to the entity and its entity attributes, calculating the similarity between entities according to the identification parameters, connecting entities whose similarity is greater than a preset threshold through similarity edges, and labeling each similarity edge with the similarity, to obtain the enterprise knowledge graph.
- 3. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 2, wherein the extraction based on the structured semantic units is performed as follows: establishing a plurality of entity description vectors and initializing each to an empty set, wherein the first element of an entity description vector stores the entity name and each other element stores one pair of an attribute category and its corresponding attribute parameter; establishing an entity relation matrix and initializing it to an empty set, wherein the element in the m-th row and n-th column of the entity relation matrix represents the relation between the m-th entity and the n-th entity; initializing the sentence number i=1, the entity description vector number j=1, and the element number of the j-th entity description vector [symbol and initial value missing in source]; performing the following until all sentences have been traversed: obtaining a classification label for each word of the i-th sentence through a word segmentation classification model, wherein the classification labels comprise entity, relation between entities, attribute category and attribute parameter; determining the number of entities in the i-th sentence; if the number of entities is one, determining whether the entity in the i-th sentence is already stored in an entity description vector and, if not, storing it in the j-th entity description vector and updating j to j=j+1, and then, for each attribute category and corresponding attribute parameter in the i-th sentence in turn, storing the pair in the j-th entity description vector at the position given by the element number and updating the element number [update formula missing in source]; if the number of entities is greater than one, determining whether each entity in the i-th sentence is already stored in an entity description vector and, if not, storing it in the j-th entity description vector and updating j to j=j+1, storing the relations between the entities of the i-th sentence in the entity relation matrix, and, for each attribute category and corresponding attribute parameter in the i-th sentence in turn, storing the pair in the j-th entity description vector at the position given by the element number and updating the element number [update formula missing in source]; updating i to i=i+1; and, after the traversal is complete, constructing the initial enterprise knowledge graph from the entity description vectors and the entity relation matrix.
- 4. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 3, wherein the word segmentation classification model comprises: an input layer, used to input the word to be classified and all words of the sentence containing it; a feature extraction layer, used to extract the word-segmentation semantic feature, the contextual feature and the word-segmentation position feature and to fuse them into the word-segmentation fusion feature [formulas missing in source], wherein the formulas use: a bidirectional long short-term memory neural network; a character sequence function that builds the ordered character-by-character sequence of a word; a function that looks up the vector of each character in a character-based vector index table and concatenates the vectors; the word to be classified; the word preceding the word to be classified; the word following the word to be classified; the total number of words in the sentence containing the word to be classified; the sequential position of the word to be classified in the sentence; and the word-segmentation semantic features of the preceding word and the following word, each of which is set to zero when the corresponding preceding or following word does not exist; and an output layer, used to output, according to the word-segmentation fusion feature, the probability that the word belongs to each classification label and to select the label with the highest probability as the classification label of the word [formulas missing in source], wherein the formulas use: the probability and the score that the word belongs to the k-th classification label; the natural exponential function; an intermediate parameter; three trained weights; and three trained biases.
- 5. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 4, wherein the identification parameter of each entity is obtained, and the similarity between entities is calculated from the identification parameters, as follows: calculating the identification parameter from the entity description vector of the entity [formula missing in source], wherein the formula uses: the number of the attribute category corresponding to a given element of the entity description vector of the entity; the attribute parameter of the attribute category corresponding to that element; and the total number of elements of the entity description vector of the entity; and calculating the similarity between two entities by: obtaining, based on the identification parameters, the attribute categories that both entities have, to form a common attribute category set; and calculating the similarity between the two entities from the number of attribute categories of each entity and the numerical differences of the corresponding attribute parameters [formula missing in source], wherein the formula uses the values of the attribute parameter corresponding to the t-th attribute category of the common attribute category set in each of the two entities, and T is the total number of elements in the common attribute category set.
- 6. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 1, wherein the enterprise knowledge graph is searched as follows: performing semantic analysis on the user retrieval content and extracting retrieval dimensions, wherein the retrieval dimensions comprise an entity name dimension, an attribute dimension, a relationship dimension and a retrieval target; performing the following for each entity node: matching against the enterprise knowledge graph according to the retrieval dimensions, and obtaining through cosine similarity an entity similarity score, a plurality of attribute similarity scores and a plurality of relationship similarities, wherein the entity similarity score is the similarity between the entity stored by the entity node and the entity name dimension, each attribute similarity score is the maximum similarity between an attribute dimension of the user retrieval content and all related attribute nodes, the related attribute nodes being the attribute nodes connected to the entity node through subordinate edges, and the relationship similarities are the similarities between the relationship dimension of the user retrieval content and all relation edges of the entity node; and obtaining the matching similarity of the entity node as a weighted sum of the relationship similarities, the entity similarity score and the attribute similarity scores; and, after the matching similarity of all entity nodes has been obtained, selecting the entity nodes whose matching similarity is greater than a preset threshold as the candidate matching results.
- 7. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 1, wherein optimizing the candidate matching result based on the similarity edges in the enterprise knowledge graph comprises performing the following for each candidate matching result: obtaining all entity nodes connected through similarity edges to the entity node corresponding to the candidate matching result, and taking them as nodes for optimization; obtaining the matching similarity between the user retrieval content and each node for optimization, and calculating the optimization parameter of the candidate matching result; and calculating the final score of the candidate matching result from the optimization parameter and the matching similarity; and, after the final scores of all candidate matching results have been obtained, selecting the candidate matching results whose final score is greater than a preset threshold as the final matching results.
- 8. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 7, wherein the optimization parameter of a candidate matching result is calculated as follows [formula missing in source], wherein the formula uses: the optimization parameter of the candidate matching result; the number of the entity node corresponding to the candidate matching result; the number of the entity node corresponding to the d-th node for optimization; the similarity labeled on the similarity edge between those two entity nodes; and the matching similarity between the user retrieval content and the indexed entity node.
- 9. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation of claim 8, wherein the final score of a candidate matching result is calculated as follows [formula missing in source], wherein the formula uses: the final score of the candidate matching result; the natural exponential function; and the matching similarity between the user retrieval content and the indexed entity node.
- 10. An enterprise consultation data retrieval system based on graph-enhanced retrieval generation, applied to the enterprise consultation data retrieval method based on graph-enhanced retrieval generation of any one of claims 1 to 9, characterized by comprising: an enterprise knowledge graph construction module, used to construct, through entity recognition and relation extraction, an enterprise knowledge graph based on an entity attribute structure and an entity similarity association mechanism, wherein the enterprise knowledge graph comprises entity nodes, attribute nodes, relation edges, similarity edges and subordinate edges, each entity node, which stores an entity, is connected through subordinate edges to the attribute nodes that store the attributes of that entity, the entity nodes are connected to one another through the relation edges and the similarity edges, and each similarity edge is labeled with the similarity between the entity nodes it connects; a preliminary retrieval module, used to obtain user retrieval content and to search the enterprise knowledge graph by calculating the matching similarity between the user retrieval content and each entity node, to obtain at least one entity node as a candidate matching result; and a retrieval optimization module, used to optimize the candidate matching result based on the similarity edges in the enterprise knowledge graph to obtain a final matching result, and to extract, from the entity node corresponding to the final matching result, an answer to the target question of the user retrieval content as the retrieval result.
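The retrieval flow of claims 6 to 9 can be sketched roughly as follows. The weighted sum of cosine similarities and the two threshold steps follow the claim text. Everything else is an assumption made for illustration: the dictionary layout, the weights, taking the maximum over relation similarities, and in particular the formulas for the optimization parameter and the final score, which are not reproduced in this text. The stand-ins used here are a similarity-weighted mean of the neighbours' matching similarities and a logistic boost built on the natural exponential function.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_similarity(query, node, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of entity, attribute and relation similarity.

    `query` and `node` are hypothetical dicts of embedding vectors.
    """
    entity_score = cosine(query["entity"], node["entity"])
    # Attribute score: maximum over the attribute nodes attached to the entity.
    attr_score = max((cosine(query["attribute"], a) for a in node["attributes"]),
                     default=0.0)
    # ASSUMPTION: several relation similarities are reduced with max.
    rel_score = max((cosine(query["relation"], r) for r in node["relations"]),
                    default=0.0)
    w_entity, w_attr, w_rel = weights
    return w_entity * entity_score + w_attr * attr_score + w_rel * rel_score


def select_candidates(match, threshold):
    """Entity nodes whose matching similarity exceeds the preset threshold."""
    return [node_id for node_id, score in match.items() if score > threshold]


def refine(candidates, match, similarity_edges, final_threshold):
    """Re-score candidates using the nodes reached through similarity edges.

    match: node id -> matching similarity with the user retrieval content.
    similarity_edges: node id -> {neighbour id: similarity labeled on the edge}.
    """
    final = {}
    for c in candidates:
        neighbours = similarity_edges.get(c, {})
        total = sum(neighbours.values())
        # ASSUMED optimization parameter: similarity-weighted mean of the
        # neighbours' matching similarities (0.0 when there are no neighbours).
        opt = sum(s * match[n] for n, s in neighbours.items()) / total if total else 0.0
        # ASSUMED final score: logistic boost, equal to match[c] when opt == 0.
        final[c] = match[c] * 2.0 / (1.0 + math.exp(-opt))
    return {c: s for c, s in final.items() if s > final_threshold}
```

With these stand-ins, a candidate whose similar neighbours also match the query is pushed up, while a candidate whose neighbours match poorly stays close to its original score and is more likely to fall below the final threshold.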
Description
Enterprise consultation data retrieval method and system based on graph enhanced retrieval generation

Technical Field

The invention relates to the technical field of knowledge graph construction and retrieval, and in particular to an enterprise consultation data retrieval method and system based on graph-enhanced retrieval generation.

Background

With the development of intelligent applications across industries, enterprise data retrieval based on knowledge graphs is becoming an important and common means of intelligent querying. An enterprise knowledge graph is generally constructed by performing entity recognition and relation extraction on enterprise text data; in retrieval-augmented generation, a user query is then answered through entity name matching, keyword search or graph-structure path search.

However, existing enterprise consultation data retrieval methods based on knowledge graphs and retrieval-augmented generation still have the following shortcomings. First, the graph is constructed mainly from external connections between entities, so its ability to capture potential associations between entities is weak. This limits the feature expression of the enterprise knowledge graph, and when handling more complex retrieval content or the matching feasibility of retrieval results, retrieval matching precision can be insufficient; for example, ambiguous matches can occur. Second, in the construction stage the enterprise knowledge graph is generally built through rule matching, keyword extraction or a traditional sequence labeling model, which further reduces the accuracy of the graph's expression. Third, in conventional knowledge-graph-based retrieval-augmented generation, once candidate entities are obtained, the information required for retrieval is generally taken directly according to matching scores or path lengths and then used to guide generation of the final text. Retrieval results obtained in this way are coarse, some false matches cannot be filtered out, and the feedback is redundant.

Therefore, the construction method of the enterprise knowledge graph and the retrieval method based on retrieval-augmented generation need continued optimization to improve the expressive capability of the enterprise knowledge graph and the retrieval precision.

Disclosure of Invention

The invention aims to provide an enterprise consultation data retrieval method and system based on graph-enhanced retrieval generation that improve the expressive capability of the enterprise knowledge graph and the retrieval precision.

The invention is realized by the following technical scheme. The enterprise consultation data retrieval method based on graph-enhanced retrieval generation comprises the following steps:

performing entity recognition and relation extraction on structured semantic units, using word-segmentation semantic features, contextual features, word-segmentation position features and an entity similarity mechanism, to construct an enterprise knowledge graph, wherein the enterprise knowledge graph comprises entity nodes, attribute nodes, relation edges, similarity edges and subordinate edges, each entity node, which stores an entity, is connected through subordinate edges to the attribute nodes that store the attributes of that entity, the entity nodes are connected to one another through the relation edges and the similarity edges, and each similarity edge is labeled with the similarity between the entity nodes it connects;

obtaining user retrieval content, and searching the enterprise knowledge graph by calculating the matching similarity between the user retrieval content and each entity node, to obtain at least one entity node as a candidate matching result;

optimizing the candidate matching result based on the similarity edges in the enterprise knowledge graph to obtain a final matching result; and

extracting, from the entity node corresponding to the final matching result, an answer to the target question of the user retrieval content as the retrieval result.

Preferably, the method for constructing the enterprise knowledge graph comprises the following steps:

obtaining enterprise data and standardizing it, wherein the standardization comprises sentence segmentation, unified formatting and de-duplication;

performing word segmentation on each sentence through a hidden Markov model to obtain structured semantic units in units of words;

extracting, based on the structured semantic units, each entity, its corresponding entity attributes and its relations with other entities, connecting each entity to its entity attributes through the subordinate edges, and connecting entities that have a relation through the relation edge t