CN-122019763-A - Knowledge base retrieval method, system and medium based on behavior feedback and content characteristics
Abstract
The invention provides a knowledge base retrieval method, system and medium based on behavior feedback and content characteristics. The method comprises: automatically recognizing the content features of all knowledge segments in a knowledge base and generating content tags, which are used for intention matching; collecting and processing the various behavior data of users in the knowledge base in real time, generating a behavior flow table and a knowledge segment statistics table, and storing both in a database; constructing a multidimensional weighting algorithm engine that evaluates the knowledge segments to obtain a comprehensive score for each; and processing a user query by performing text retrieval, evaluating each retrieved knowledge segment with the multidimensional weighting algorithm engine, dynamically ranking the segments by comprehensive score, and displaying the results to the user. By combining user behavior feedback with knowledge segment feature recognition, the invention matches user intention quickly and accurately, returns highly accurate search results that conform to that intention, and improves the user experience.
Inventors
- HUANG XINEN
- ZHANG YONGXIA
- MENG FANHAO
Assignees
- 福建星网智慧科技有限公司
- 福建星网锐捷通讯股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251205
Claims (10)
- 1. A knowledge base retrieval method based on behavior feedback and content characteristics, characterized by comprising the following steps: Step S1, automatically recognizing the content features of all knowledge segments in a knowledge base and generating content tags, wherein the content tags are used for intention matching; Step S2, collecting and processing the various behavior data of users in the knowledge base in real time, generating a behavior flow table and a knowledge segment statistics table, and storing both in a database; Step S3, constructing a multidimensional weighting algorithm engine for evaluating the knowledge segments to obtain a comprehensive score for each knowledge segment; and Step S4, processing a user query, performing text retrieval, comprehensively evaluating each retrieved knowledge segment with the multidimensional weighting algorithm engine, dynamically ranking the knowledge segments by comprehensive score, and displaying the results to the user.
- 2. The method of claim 1, wherein the content tags in step S1 include a code feature tag indicating whether a knowledge segment contains programming syntax features, a step feature tag indicating whether it contains a flow description, and a data feature tag indicating whether it contains tables or statistical data.
- 3. The method according to claim 1, wherein step S2 comprises: establishing a fine-grained usage statistics mechanism for knowledge segments, recording and tracking user behavior on each knowledge segment in real time, and, by means of dedicated data structures, storing the behavior data and the knowledge segment statistics in a behavior flow table and a knowledge segment statistics table, respectively, in a database; the behavior flow table comprises a record ID, a knowledge segment ID, a user ID, a behavior type, a behavior occurrence time, a browsing duration and comment content, wherein the behavior type comprises citing, liking, favoriting, commenting and browsing; the knowledge segment statistics table uses the knowledge segment ID as its primary key and comprises the total citation count, total like count, total favorite count, total comment count, total browsing duration and last citation timestamp; the database supports real-time updating and synchronization.
- 4. The method according to claim 1, wherein the comprehensive scoring rule in step S3 uses the following formula: Score = α × Similarity + β × BehaviorScore + γ × FeatureScore, where Score is the comprehensive score of the knowledge segment, Similarity is the text similarity matching score, BehaviorScore is the score based on user behavior statistics, FeatureScore is the score based on content characteristics, and α, β and γ are weight parameters calculated from the intention recognition result of the user query and the degree to which the content characteristics of the knowledge segment match.
- 5. The method according to claim 1, wherein step S4 comprises the following steps: when a user submits a query, performing conventional text retrieval on the text entered by the user to obtain a candidate set; inputting each knowledge segment in the candidate set into the multidimensional weighting algorithm engine to obtain its comprehensive score; and dynamically ranking the knowledge segments in the candidate set from the highest comprehensive score to the lowest, and displaying the results to the user.
- 6. A knowledge base retrieval system based on behavior feedback and content characteristics, characterized by comprising: a content feature analysis module for automatically recognizing the content features of all knowledge segments in a knowledge base and generating content tags, wherein the content tags are used for intention matching; a data acquisition and processing module for collecting and processing the various behavior data of users in the knowledge base in real time, generating a behavior flow table and a knowledge segment statistics table, and storing both in a database; a multidimensional weighting algorithm engine module for constructing a multidimensional weighting algorithm engine and evaluating the knowledge segments to obtain a comprehensive score for each knowledge segment; and a retrieval and ranking module for processing a user query, performing text retrieval, comprehensively evaluating each retrieved knowledge segment with the multidimensional weighting algorithm engine, dynamically ranking the knowledge segments by comprehensive score, and displaying the results to the user.
- 7. The system of claim 6, wherein the data acquisition and processing module is configured for: establishing a fine-grained usage statistics mechanism for knowledge segments, recording and tracking user behavior on each knowledge segment in real time, and, by means of dedicated data structures, storing the behavior data and the knowledge segment statistics in a behavior flow table and a knowledge segment statistics table, respectively, in a database; the behavior flow table comprises a record ID, a knowledge segment ID, a user ID, a behavior type, a behavior occurrence time, a browsing duration and comment content, wherein the behavior type comprises citing, liking, favoriting, commenting and browsing; the knowledge segment statistics table uses the knowledge segment ID as its primary key and comprises the total citation count, total like count, total favorite count, total comment count, total browsing duration and last citation timestamp; the database supports real-time updating and synchronization.
- 8. The system of claim 6, wherein the multidimensional weighting algorithm engine module uses the following formula: Score = α × Similarity + β × BehaviorScore + γ × FeatureScore, where Score is the comprehensive score of the knowledge segment, Similarity is the text similarity matching score, BehaviorScore is the score based on user behavior statistics, FeatureScore is the score based on content characteristics, and α, β and γ are weight parameters calculated from the intention recognition result of the user query and the degree to which the content characteristics of the knowledge segment match.
- 9. The system of claim 6, wherein the retrieval and ranking module is configured for: when a user submits a query, performing conventional text retrieval on the text entered by the user to obtain a candidate set; inputting each knowledge segment in the candidate set into the multidimensional weighting algorithm engine to obtain its comprehensive score; and dynamically ranking the knowledge segments in the candidate set by comprehensive score, taking the top several knowledge segments in descending order of comprehensive score, and displaying them to the user.
- 10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1 to 5.
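As an illustration of the data structures recited in claims 3 and 7, the following is a minimal sketch using Python's built-in sqlite3 module. The table names (behavior_flow, segment_stats), the column names, the English behavior-type values and the record_event helper are all assumptions made for this example; the claims specify only which fields each table holds, not how they are named or updated.

```python
import sqlite3
import time

# Hypothetical schema: field lists follow claims 3 and 7, names are illustrative.
SCHEMA = """
CREATE TABLE behavior_flow (
    record_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    segment_id     TEXT NOT NULL,
    user_id        TEXT NOT NULL,
    behavior_type  TEXT NOT NULL CHECK (behavior_type IN
                   ('cite', 'like', 'favorite', 'comment', 'browse')),
    occurred_at    REAL NOT NULL,
    browse_seconds REAL DEFAULT 0,
    comment_text   TEXT
);
CREATE TABLE segment_stats (
    segment_id           TEXT PRIMARY KEY,
    total_citations      INTEGER DEFAULT 0,
    total_likes          INTEGER DEFAULT 0,
    total_favorites      INTEGER DEFAULT 0,
    total_comments       INTEGER DEFAULT 0,
    total_browse_seconds REAL DEFAULT 0,
    last_cited_at        REAL
);
"""

# Maps each counted behavior type to its aggregate column (fixed, trusted names).
COUNTER_COLUMN = {
    "cite": "total_citations",
    "like": "total_likes",
    "favorite": "total_favorites",
    "comment": "total_comments",
}


def record_event(conn, segment_id, user_id, behavior_type,
                 browse_seconds=0.0, comment_text=None, now=None):
    """Append one row to the flow table and update the per-segment aggregates."""
    now = time.time() if now is None else now
    with conn:  # one transaction: both tables change together or not at all
        conn.execute(
            "INSERT INTO behavior_flow (segment_id, user_id, behavior_type,"
            " occurred_at, browse_seconds, comment_text)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (segment_id, user_id, behavior_type, now, browse_seconds, comment_text),
        )
        conn.execute(
            "INSERT OR IGNORE INTO segment_stats (segment_id) VALUES (?)",
            (segment_id,),
        )
        if behavior_type == "browse":
            conn.execute(
                "UPDATE segment_stats SET total_browse_seconds ="
                " total_browse_seconds + ? WHERE segment_id = ?",
                (browse_seconds, segment_id),
            )
        else:
            column = COUNTER_COLUMN[behavior_type]  # never user-supplied text
            conn.execute(
                f"UPDATE segment_stats SET {column} = {column} + 1"
                " WHERE segment_id = ?",
                (segment_id,),
            )
        if behavior_type == "cite":
            conn.execute(
                "UPDATE segment_stats SET last_cited_at = ? WHERE segment_id = ?",
                (now, segment_id),
            )


def open_memory_db():
    """Create an in-memory database with both tables (for experimentation)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Writing the flow row and the aggregate update in one transaction is one simple way to keep the statistics table consistent with the event log; a production system with real-time synchronization requirements might instead use a streaming pipeline or database triggers.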
Description
Knowledge base retrieval method, system and medium based on behavior feedback and content characteristics
Technical Field
The invention relates to the technical field of intelligent data retrieval, and in particular to a knowledge base retrieval method, system and medium based on behavior feedback and content characteristics.
Background
As enterprise knowledge bases continue to grow, traditional knowledge base retrieval methods rely mainly on keyword matching or vector-based semantic similarity calculation. This single-signal approach has the following problems.
Conventional search results are usually ranked by the text similarity between the query terms and the knowledge content. This ignores the real value and popularity of the content in practical use, so the top-ranked results are not always what users actually need or find most useful. Some retrieval systems adjust the results using parameters such as the citation count, browsing time, likes and favorites of the knowledge segments, which optimizes the results to a certain extent. However, high-quality new content that has rarely been cited then struggles to gain exposure, while outdated or low-quality historical content that has been cited often may keep being recommended because of its historical halo, producing a Matthew effect.
In addition, conventional retrieval methods typically ignore the content features of the knowledge segments themselves, for example whether a knowledge article contains statistics, code examples, operation steps, charts and the like. When the user's query aims to find a specific type of content, such as a code example or an operation guide, the system may instead recommend a long theoretical introduction, so that the search results do not match the user's needs and retrieval is inefficient.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a knowledge base retrieval optimization method, system and medium based on user behavior feedback and content characteristics, addressing problems such as the insufficient retrieval accuracy of existing knowledge bases.
In a first aspect, the invention provides a knowledge base retrieval optimization method based on user behavior feedback and content characteristics, the method comprising:
Step S1, automatically recognizing the content features of all knowledge segments in a knowledge base and generating content tags, wherein the content tags are used for intention matching;
Step S2, collecting and processing the various behavior data of users in the knowledge base in real time, generating a behavior flow table and a knowledge segment statistics table, and storing both in a database;
Step S3, constructing a multidimensional weighting algorithm engine for evaluating the knowledge segments to obtain a comprehensive score for each knowledge segment;
Step S4, processing a user query, performing text retrieval, comprehensively evaluating each retrieved knowledge segment with the multidimensional weighting algorithm engine, dynamically ranking the knowledge segments by comprehensive score, and displaying the results to the user.
Further, the content tags in step S1 include a code feature tag, a step feature tag and a data feature tag, where the code feature tag indicates whether a knowledge segment contains programming syntax features, the step feature tag indicates whether it contains a flow description, and the data feature tag indicates whether it contains tables or statistical data.
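Steps S1, S3 and S4 can be sketched as follows, using the formula Score = α × Similarity + β × BehaviorScore + γ × FeatureScore recited in claim 4. This is a minimal illustration under stated assumptions, not the method as implemented by the applicant: the regular expressions, the keyword-based query_intent function, the definition of feature_score and the fixed weights are all hypothetical. In particular, the source states that α, β and γ are calculated from the intention recognition result and the content feature matching degree, whereas this sketch simply passes them in as constants. Similarity and BehaviorScore are assumed to be precomputed values in [0, 1].

```python
import re
from dataclasses import dataclass

# Hypothetical tagging rules: the source names the three tags, not the patterns.
TAG_PATTERNS = {
    "code": re.compile(r"^\s*(def|class|import|from)\s+\w+|[{};]\s*$", re.MULTILINE),
    "steps": re.compile(r"^\s*(step\s*\d+|\d+[.)])[\s:]", re.IGNORECASE | re.MULTILINE),
    "data": re.compile(r"\|.+\||\d+(\.\d+)?\s*%"),
}

# Hypothetical keyword lists standing in for real intention recognition.
INTENT_KEYWORDS = {
    "code": ("code", "snippet", "script"),
    "steps": ("how to", "steps", "guide"),
    "data": ("statistics", "table", "figures"),
}


def tag_segment(text):
    """Step S1: return the set of content tags whose pattern matches the text."""
    return {tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(text)}


def query_intent(query):
    """Guess which content types the query asks for (keyword heuristic)."""
    lowered = query.lower()
    return {tag for tag, words in INTENT_KEYWORDS.items()
            if any(word in lowered for word in words)}


def feature_score(tags, intent):
    """Fraction of the requested content types that the segment provides."""
    if not intent:
        return 0.0
    return len(tags & intent) / len(intent)


@dataclass
class Candidate:
    segment_id: str
    text: str
    similarity: float      # text similarity from the retrieval stage, in [0, 1]
    behavior_score: float  # score derived from behavior statistics, in [0, 1]


def rank(query, candidates, alpha=0.5, beta=0.3, gamma=0.2):
    """Steps S3 and S4: score every candidate and sort from high to low."""
    intent = query_intent(query)
    scored = []
    for cand in candidates:
        # Tags are computed on the fly here; S1 would normally precompute them.
        feat = feature_score(tag_segment(cand.text), intent)
        score = alpha * cand.similarity + beta * cand.behavior_score + gamma * feat
        scored.append((cand.segment_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a query such as "python code for add", a segment tagged as code can outrank a theoretical segment whose text similarity is slightly higher, which is the kind of mismatch the Background describes.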
Further, step S2 specifically includes: establishing a fine-grained usage statistics mechanism for knowledge segments, recording and tracking user behavior on each knowledge segment in real time, and, by means of dedicated data structures, storing the behavior data and the knowledge segment statistics in a behavior flow table and a knowledge segment statistics table, respectively, in a database; the behavior flow table comprises a record ID, a knowledge segment ID, a user ID, a behavior type, a behavior occurrence time, a browsing duration and comment content, wherein the behavior type comprises citing, liking, favoriting, commenting and browsing; the knowledge segment statistics table uses the knowledge segment ID as its primary key and comprises the total citation count, total like count, total favorite count, total comment count, total browsing duration and last citation timestamp; the database supports real-time updating and synchronization.
Further, step S1 specifically includes labeling the knowledge segments in the knowledge base with feature tags by using natural language processing (NLP) technology, regular expression matching and keyword matching rule