CN-122019756-A - Intelligent retrieval method and system for enterprise knowledge base based on AI
Abstract
The invention relates to the technical field of intelligent retrieval for enterprise knowledge bases, and in particular to an AI-based intelligent retrieval method and system for an enterprise knowledge base. The method traverses a query semantic graph, calculates the stability coefficient of each semantic node and the contribution weights of its associated nodes, and screens out a dominant semantic node set according to the stability coefficients. A dynamic focusing window is then constructed from the connection strength between the dominant nodes and the degree of distribution dispersion of the non-dominant nodes, and the window is used to reconstruct the semantic structure of the original query into a core query statement. Retrieval is performed in the enterprise knowledge base based on the core query statement, and the matching results are output. The method analyzes the semantic structure of the query in depth and dynamically focuses on the core retrieval intent, thereby improving the accuracy and efficiency of enterprise knowledge base retrieval.
Inventors
- YANG HUA
- LI NING
- GUO JIANMING
Assignees
- 黑龙江振宁科技股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260129
Claims (10)
- 1. An AI-based intelligent retrieval method for an enterprise knowledge base, characterized by comprising the following steps: receiving an input original query statement, and structurally deconstructing the original query statement to generate a query semantic graph containing a plurality of semantic nodes; traversing all semantic nodes in the query semantic graph, and calculating the stability coefficient of each semantic node and the contribution weights of its associated nodes; screening out a dominant semantic node set from the query semantic graph according to the stability coefficients of the semantic nodes and the contribution weights of the associated nodes; constructing a dynamic focusing window of the query semantic graph according to the connection strength among the nodes in the dominant semantic node set and the degree of distribution dispersion of the other nodes in the query semantic graph that were not selected into the dominant semantic node set; reconstructing the semantic structure of the original query statement by using the dynamic focusing window to generate a semantically focused core query statement; and searching the enterprise knowledge base based on the core query statement, and outputting the matched knowledge items.
- 2. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 1, wherein the step of constructing the query semantic graph includes: performing word segmentation and part-of-speech tagging on the original query statement, and identifying and tagging all entity words, action words and modifier words; taking each tagged word as an initial semantic node, and establishing sequential connection edges between the initial semantic nodes based on the linear order of the words in the original query statement; analyzing the dependency syntactic relations among the words, and establishing dependency connection edges between the initial semantic nodes that have dependency relations; and fusing the sequential connection edges and the dependency connection edges between the initial semantic nodes to form a query semantic graph containing nodes and multiple types of connection edges.
- 3. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 2, wherein the method for calculating the stability coefficient of a semantic node is as follows: for each semantic node in the query semantic graph, counting the number of all connection edges that take the semantic node as a starting point or an end point, and taking this number as the local connectivity of the semantic node; analyzing the part-of-speech category distribution of the other semantic nodes that the connection edges of the semantic node point to or originate from, and calculating the uniformity of the part-of-speech category distribution; multiplying the local connectivity of the semantic node by the uniformity of the part-of-speech category distribution, and taking the product as the structural stability value of the semantic node; calculating the variance of the occurrence probability of the semantic node across a plurality of pre-constructed general-purpose language models as the context stability value of the semantic node; and adding the structural stability value and the context stability value of the semantic node to obtain the stability coefficient of the semantic node.
- 4. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 3, wherein the method for calculating the contribution weight of an associated node is as follows: defining any two semantic nodes in the query semantic graph that share a direct connection edge as an associated node pair; counting the number of types of all direct connection edges between the associated node pair, and calculating the edge type richness of the associated node pair; finding the shortest path length of the associated node pair in the query semantic graph, and calculating the reciprocal of the shortest path length as the path proximity; multiplying the edge type richness by the path proximity to obtain the original contribution value of each node in the associated node pair to the other node; and normalizing all original contribution values of the associated node pairs formed with a target semantic node to obtain the contribution weight of each associated node relative to the target semantic node.
- 5. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 4, wherein the method for screening out the dominant semantic node set from the query semantic graph is as follows: setting an initial stability coefficient threshold, and selecting all semantic nodes whose stability coefficient is larger than the initial stability coefficient threshold into a candidate node set; for each node in the candidate node set, calculating the sum of the contribution weights of all its associated nodes as the aggregation influence of the node; multiplying the stability coefficient of each node in the candidate node set by its aggregation influence to obtain the comprehensive significance value of the node; determining a final comprehensive significance value threshold by an adaptive quantile method according to the distribution of the comprehensive significance values of all candidate nodes; and determining the candidate nodes whose comprehensive significance value is larger than or equal to the final comprehensive significance value threshold as dominant semantic nodes, wherein all the dominant semantic nodes form the dominant semantic node set.
- 6. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 5, wherein the method for constructing the dynamic focusing window of the query semantic graph is as follows: for each pair of nodes in the dominant semantic node set, calculating the average weight of all connection edges on the shortest path between them in the query semantic graph, and taking the average weight as the connection strength of that pair of nodes; identifying all semantic nodes in the query semantic graph that are not contained in the dominant semantic node set, and calculating the standard deviation of the shortest path lengths between these non-contained semantic nodes and each node in the dominant semantic node set as the measure of the degree of distribution dispersion; performing weighted fusion of the average connection strength over all node pairs in the dominant semantic node set with the measure of the degree of distribution dispersion to generate a focusing intensity coefficient; and, taking the dominant semantic node set as the core, dynamically determining according to the magnitude of the focusing intensity coefficient the number of layers of neighbor nodes to be additionally included in the focusing range in the query semantic graph, and defining a sub-graph structure comprising the core nodes and the neighbor nodes within that range, wherein the sub-graph structure is the dynamic focusing window.
- 7. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 6, wherein the method for generating the semantically focused core query statement is as follows: arranging all semantic nodes in the sub-graph structure defined by the dynamic focusing window in the order in which their words appear in the original query statement; checking whether the original query statement contains, between adjacent semantic nodes, words that are not included in the dynamic focusing window, and if so, determining according to the parts of speech of those words and their syntactic relations with the preceding and following semantic nodes whether to insert them as connecting words or modifier words; adjusting the expression of the logical relations between words in the generated statement according to the types of the connection edges between the nodes in the dynamic focusing window, and reconstructing the grammatical structure; and integrating all the words after sequential arrangement, necessary word insertion and grammatical structure adjustment to form a core query statement that is grammatically and semantically complete.
- 8. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 7, wherein the method for searching the enterprise knowledge base based on the core query statement comprises the following steps: inputting the core query statement into a semantic coding model to obtain a high-dimensional semantic vector representation; calculating the cosine similarity between the high-dimensional semantic vector and the pre-stored semantic vectors of all knowledge items in the enterprise knowledge base; sorting all knowledge items in descending order of cosine similarity; and setting an adaptive similarity threshold related to the focusing intensity coefficient of the dynamic focusing window, and screening out the knowledge items whose cosine similarity is larger than the adaptive similarity threshold as a preliminary result set.
- 9. The intelligent retrieval method of an AI-based enterprise knowledge base of claim 8, further comprising, after generating the preliminary result set: extracting a keyword set of each knowledge item in the preliminary result set; calculating the Jaccard similarity coefficient between the core query statement and the keyword set of each knowledge item; computing the weighted harmonic mean of the cosine similarity and the Jaccard similarity coefficient of each knowledge item to obtain a final relevance score of each knowledge item; and re-ranking the knowledge items in the preliminary result set according to the final relevance score, and taking the ranked list as the finally output matched knowledge items.
- 10. An intelligent retrieval system for an AI-based enterprise knowledge base, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the steps of the intelligent retrieval method of an AI-based enterprise knowledge base as claimed in any one of claims 1 to 9.
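The node-level quantities of claims 3 and 4 can be illustrated with a minimal Python sketch. Several details are assumptions, since the claims do not fix them: the toy graph, the part-of-speech labels, the use of normalized Shannon entropy as the "uniformity" measure, the normalization of edge type richness by the number of edge types, and the language-model probabilities passed in as plain numbers. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance

# Hypothetical toy query graph: (source, target, edge_type).
EDGES = [
    ("reset", "password", "sequential"),
    ("reset", "password", "dependency"),
    ("password", "vpn", "sequential"),
    ("vpn", "password", "dependency"),
    ("urgent", "reset", "dependency"),
]
POS = {"reset": "action", "password": "entity", "vpn": "entity", "urgent": "modifier"}
EDGE_TYPES = {"sequential", "dependency"}
POS_CLASSES = {"entity", "action", "modifier"}


def incident_neighbors(node):
    """Return one neighbor per connection edge that starts or ends at node."""
    return [t if s == node else s for s, t, _ in EDGES if node in (s, t)]


def uniformity(labels):
    """Assumed measure: Shannon entropy normalized by log(number of POS classes)."""
    n = len(labels)
    if n == 0:
        return 0.0
    entropy = -sum(c / n * math.log(c / n) for c in Counter(labels).values())
    return entropy / math.log(len(POS_CLASSES))


def stability_coefficient(node, lm_probs):
    """Structural value (connectivity x uniformity) plus context value (variance)."""
    neighbors = incident_neighbors(node)
    structural = len(neighbors) * uniformity([POS[m] for m in neighbors])
    contextual = pvariance(lm_probs)  # variance across language models, as in claim 3
    return structural + contextual


def contribution_weights(target):
    """Normalized contribution of each directly connected node to target."""
    raw = {}
    for other in set(incident_neighbors(target)):
        types = {k for s, t, k in EDGES if {s, t} == {target, other}}
        richness = len(types) / len(EDGE_TYPES)  # assumed normalization
        proximity = 1 / 1  # directly connected nodes are one hop apart
        raw[other] = richness * proximity
    total = sum(raw.values())
    return {n: v / total for n, v in raw.items()} if total else {}
```

Because an associated node pair always shares a direct edge, the hop-count shortest path is 1 and the path proximity is 1 in this sketch; a weighted path length would change that term.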
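The screening step of claim 5 might look like the following sketch. The claim does not specify the "adaptive quantile method", so a fixed nearest-rank quantile `q` stands in for it here; the function name, the default thresholds and the input values are all hypothetical.

```python
import math


def select_dominant(stability, influence, init_threshold=1.0, q=0.5):
    """Screen dominant nodes from per-node stability and aggregation influence."""
    # Candidate set: stability coefficient above the initial threshold.
    candidates = [n for n, s in stability.items() if s > init_threshold]
    # Comprehensive significance value: stability x aggregation influence.
    significance = {n: stability[n] * influence[n] for n in candidates}
    if not significance:
        return set()
    # Assumed stand-in for the adaptive quantile: nearest-rank quantile q.
    ranked = sorted(significance.values())
    cut = ranked[max(0, math.ceil(q * len(ranked)) - 1)]
    return {n for n, v in significance.items() if v >= cut}


# Hypothetical inputs for a four-node query graph.
stability = {"reset": 1.74, "password": 2.52, "vpn": 1.3, "urgent": 0.6}
influence = {"reset": 1.2, "password": 1.5, "vpn": 0.8, "urgent": 0.3}
dominant = select_dominant(stability, influence)
```

With these inputs, "urgent" is dropped by the initial threshold and "vpn" falls below the median significance value, so the dominant set is "reset" and "password".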
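The dynamic focusing window of claim 6 can be sketched as below. The edge weights, the fusion weight `alpha`, and in particular the mapping from focusing intensity coefficient to number of neighbor layers (`layers_for`) are assumptions; the claim only states that the layer count depends on the coefficient. Shortest paths are measured in hops on an undirected graph.

```python
from collections import deque
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical undirected query graph: edge -> weight.
WEIGHTS = {
    frozenset(("reset", "password")): 0.9,
    frozenset(("password", "vpn")): 0.7,
    frozenset(("urgent", "reset")): 0.3,
    frozenset(("vpn", "please")): 0.2,
}
ADJ = {}
for edge in WEIGHTS:
    a, b = tuple(edge)
    ADJ.setdefault(a, set()).add(b)
    ADJ.setdefault(b, set()).add(a)
DOMINANT = {"reset", "password", "vpn"}


def shortest_path(src, dst):
    """Breadth-first search; returns the node list of a fewest-hop path."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in ADJ[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError("no path between nodes")


def connection_strength(a, b):
    """Average edge weight along the shortest path between two dominant nodes."""
    path = shortest_path(a, b)
    return mean(WEIGHTS[frozenset(p)] for p in zip(path, path[1:]))


def focus_intensity(dominant, alpha=0.5):
    """Weighted fusion of mean connection strength and dispersion (alpha assumed)."""
    strengths = [connection_strength(a, b) for a, b in combinations(sorted(dominant), 2)]
    outside = [n for n in ADJ if n not in dominant]
    hops = [len(shortest_path(n, d)) - 1 for n in outside for d in dominant]
    dispersion = pstdev(hops) if hops else 0.0
    strength = mean(strengths) if strengths else 0.0
    return alpha * strength + (1 - alpha) * dispersion


def layers_for(intensity, max_layers=2):
    """Hypothetical mapping: stronger focus means fewer extra neighbor layers."""
    return max(0, max_layers - int(intensity * max_layers))


def focus_window(dominant, layers):
    """Dominant nodes plus all neighbors within the given number of hops."""
    window, frontier = set(dominant), set(dominant)
    for _ in range(layers):
        frontier = {m for n in frontier for m in ADJ[n]} - window
        window |= frontier
    return window
```

A call such as `focus_window(DOMINANT, layers_for(focus_intensity(DOMINANT)))` returns the node set of the sub-graph; the reverse mapping (higher intensity, wider window) would be an equally valid reading of the claim.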
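The retrieval and re-ranking steps of claims 8 and 9 can be sketched as follows. The semantic coding model is replaced by hand-written two-dimensional vectors, and the linear form of the adaptive threshold (`base + gain * focus_intensity`) and the harmonic-mean weights are assumptions, since the claims do not give them. The knowledge item identifiers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity coefficient of two term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def weighted_harmonic(cos_sim, jac, w_cos=0.7, w_jac=0.3):
    """Weighted harmonic mean; zero if either score is zero (weights assumed)."""
    if cos_sim <= 0 or jac <= 0:
        return 0.0
    return (w_cos + w_jac) / (w_cos / cos_sim + w_jac / jac)


def retrieve(query_vec, query_terms, items, focus_intensity, base=0.5, gain=0.2):
    """Threshold on cosine similarity, then re-rank by the fused relevance score."""
    threshold = base + gain * focus_intensity  # assumed form of the adaptive threshold
    prelim = [(it, cosine(query_vec, it["vec"])) for it in items]
    prelim = [(it, c) for it, c in prelim if c > threshold]
    scored = [
        (it["id"], weighted_harmonic(c, jaccard(query_terms, it["keywords"])))
        for it, c in prelim
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical knowledge items with pre-stored vectors and keyword sets.
ITEMS = [
    {"id": "kb-2", "vec": [0.8, 0.6], "keywords": {"vpn", "reset"}},
    {"id": "kb-1", "vec": [1.0, 0.0], "keywords": {"vpn", "password", "reset"}},
    {"id": "kb-3", "vec": [0.0, 1.0], "keywords": {"printer"}},
]
results = retrieve([1.0, 0.0], {"reset", "vpn", "password"}, ITEMS, focus_intensity=0.5)
```

The harmonic mean penalizes items that score well on only one of the two signals, which is one plausible reason for preferring it over an arithmetic mean here.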
Description
Intelligent retrieval method and system for enterprise knowledge base based on AI
Technical Field
The invention relates to the technical field of intelligent retrieval for enterprise knowledge bases, and in particular to an AI-based intelligent retrieval method and system for an enterprise knowledge base.
Background
In the field of intelligent retrieval for enterprise knowledge bases, the prior art is built mainly around keyword matching and semantic vector matching. Keyword matching relies on surface agreement between query words and document words, retrieving by Boolean logic or by weights computed from term frequency-inverse document frequency (TF-IDF). Semantic vector matching uses a pre-trained language model to convert the query and the documents into high-dimensional vectors and obtains results by calculating the similarity between the vectors. These two types of technique constitute the currently prevailing retrieval paradigm.
The prior art has drawbacks. Keyword matching cannot understand the semantics, synonym relations and contextual logic behind the vocabulary, so the accuracy of its retrieval results is low. Semantic vector matching captures a degree of semantic relevance, but it essentially encodes the entire query statement into a single static vector representation. This ignores the rich structural information in the query statement: it cannot distinguish the primary and secondary roles or the stability of the different semantic components, nor the complex network of supporting and constraining relations among them. When faced with user queries that are lengthy, ambiguous, or contain multiple sub-intents, the prior art either introduces noise through keyword ambiguity or, by treating the query as a single undifferentiated semantic whole, fails to focus accurately on the user's core, most stable retrieval intent.
Current technology lacks the ability to perform deep, structured semantic deconstruction and analysis of query statements, and is still less able to adjust the focus of retrieval dynamically according to the internal semantic relations unique to each query. The technical problem the invention aims to solve is to break away from the traditional approach of treating the query as a bag of words or a single vector, and to achieve intelligent semantic focusing by analyzing the inherent graph-structured semantics of the query and exploiting the dynamic characteristics of that structure, so as to improve retrieval precision in complex query scenarios.
Disclosure of Invention
The invention aims to remedy the defects in the prior art, and provides an AI-based intelligent retrieval method and system for an enterprise knowledge base.
To achieve this purpose, the invention adopts the following technical scheme: an AI-based intelligent retrieval method for an enterprise knowledge base, comprising the following steps: receiving an input original query statement, and structurally deconstructing the original query statement to generate a query semantic graph containing a plurality of semantic nodes; traversing all semantic nodes in the query semantic graph, and calculating the stability coefficient of each semantic node and the contribution weights of its associated nodes; screening out a dominant semantic node set from the query semantic graph according to the stability coefficients of the semantic nodes and the contribution weights of the associated nodes; constructing a dynamic focusing window of the query semantic graph according to the connection strength among the nodes in the dominant semantic node set and the degree of distribution dispersion of the other nodes in the query semantic graph that were not selected into the dominant semantic node set; reconstructing the semantic structure of the original query statement by using the dynamic focusing window to generate a semantically focused core query statement; and searching the enterprise knowledge base based on the core query statement, and outputting the matched knowledge items.
Preferably, the step of constructing the query semantic graph includes: performing word segmentation and part-of-speech tagging on the original query statement, and identifying and tagging all entity words, action words and modifier words; taking each tagged word as an initial semantic node, and establishing sequential connection edges between the initial semantic nodes based on the linear order of the words in the original query statement; analyzing the dependency syntactic relations among the words, and establishing dependency connection edges between the initial semantic nodes that have dependency relations; and fusing the sequential connection edges and the dependency connection edges between the initial semantic nodes to form a query semantic graph containing nodes and multiple types of connection edges.
Preferably, the method for calculating the stability coefficient of