CN-122019727-A - Reading software intelligent search system based on natural language processing technology
Abstract
The invention discloses a reading software intelligent search system based on natural language processing technology, which relates to the technical fields of computer software and natural language processing. The system comprises a user interaction interface, a query understanding engine, a document semantic indexing and retrieval engine, a personalized ranking and content generation engine, and a user behavior acquisition and profile modeling module. The user interaction interface receives an original search request initiated by a user in a reading context. The query understanding engine is connected to the user interaction interface and performs reading-scenario-oriented deep semantic analysis on the original search request; the deep semantic analysis at least comprises identifying exploratory intents related to reading behavior and using the user's real-time reading progress and historical notes to complete references and omissions in the query, so as to output a structured query object. By triggering a rollback policy when consistency verification fails, the invention reduces the risk of outputting misleading generated results. It moves search from keyword matching to semantic understanding, from static result lists to dynamic generation, and from isolated searches to contextual dialogue, providing users with an intelligent reading and search experience.
Inventors
- FENG MIN
Assignees
- 半糖去冰科技(北京)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260226
Claims (10)
- 1. A reading software intelligent search system based on natural language processing technology, characterized by comprising: a user interaction interface (100) for receiving an original search request initiated by a user in a reading context; a query understanding engine (200), connected to the user interaction interface (100), for performing reading-scenario-oriented deep semantic analysis on the original search request, wherein the deep semantic analysis at least comprises identifying exploratory intents related to reading behavior and using the user's real-time reading progress and historical notes to complete references and omissions in the query, so as to output a structured query object; a document semantic indexing and retrieval engine (300) for constructing and storing a structure-anchored semantic index, wherein the structure-anchored semantic index comprises hierarchical structure information of a document (at least comprising chapter/section/paragraph anchor identifiers) and knowledge-graph entity anchors, and for performing semantic similarity matching combined with the document structure based on the structured query object, so as to obtain a preliminary retrieval result set; a personalized ranking and content generation engine (400), connected to the query understanding engine (200) and the document semantic indexing and retrieval engine (300), for reranking the preliminary retrieval result set based on the user behavior profile and generating summary or explanatory content matching the query intent; and a user behavior acquisition and profile modeling module (500) for acquiring and analyzing, in real time, a multi-dimensional deep behavior sequence of the user during reading, and constructing and updating a dynamic user profile; wherein the personalized ranking and content generation engine (400) is configured to introduce a reading context evidence constraint framework (RCS) when generating summary or explanatory content, the RCS comprising at least: (a) Must-Anchor Fields, which require each conclusion/answer sentence in the generated content to be bound to a section/paragraph anchor identifier, a reference fragment ID and an evidence strength field; and (b) Must-Hold Constraints, which define textual consistency constraints that the generated content must not violate, the consistency constraints comprising at least one of concept definition consistency, character relationship consistency and timeline consistency; and the personalized ranking and content generation engine (400) is further configured to perform consistency verification on the generated content and to trigger a rollback policy when the consistency verification fails, the rollback policy comprising at least one of: outputting only the retrieved evidence cards, lowering the confidence and prompting manual review, or requiring supplementary evidence for regeneration.
- 2. The reading software intelligent search system based on natural language processing technology according to claim 1, wherein the query understanding engine (200) comprises an intent recognition unit (201), a semantic parsing and entity linking unit (202) and a query expansion and correction unit (203); the intent recognition unit (201) uses a Transformer-based sequence classification model to classify the query intent as at least one of a factual query, viewpoint finding, content recommendation, comparative analysis or task execution; the semantic parsing and entity linking unit (202) recognizes named entities in the query and links them to standard entity nodes in a built-in or external knowledge graph while parsing the semantic role frame of the query; and the query expansion and correction unit (203) semantically expands the original query using a lexical network (word net) and associated concepts, and, based on the session history, performs context completion on queries containing references or omissions.
- 3. The reading software intelligent search system based on natural language processing technology according to claim 1, wherein the document semantic indexing and retrieval engine (300) comprises a document deep encoder (301), a knowledge enhancement module (302) and a hierarchical index library (303).
- 4. The reading software intelligent search system based on natural language processing technology according to claim 3, wherein the document deep encoder (301) encodes the title, body text, paragraphs and metadata of a document based on a pre-trained language model to generate multi-granularity semantic vectors at the document level and the paragraph level; the knowledge enhancement module (302) introduces external knowledge-graph information during encoding to enhance the representations of entities and concepts in the document; and the hierarchical index library (303) stores the multi-granularity semantic vectors and supports multi-level retrieval based on an approximate nearest neighbor search algorithm, each vector entry being bound to its corresponding chapter/section/paragraph anchor identifier.
- 5. The reading software intelligent search system based on natural language processing technology according to claim 1, wherein the personalized ranking and content generation engine (400) comprises an intent-driven dynamic content planner (401), a dynamic summary generator (402) and a multi-dimensional ranking model (403); the dynamic content planner (401) selects a generation mode based on the structural location distribution of the retrieval results within the document hierarchy, the generation modes comprising at least an explanatory summary, a comparative summary, a timeline summary or a relationship-map summary.
- 6. The reading software intelligent search system based on natural language processing technology according to claim 1, wherein the user behavior acquisition and profile modeling module (500) is configured to rerank the preliminary retrieval results according to deep interest focal points and cognitive state vectors mined from the user's behavior, and to dynamically generate integrated content intended to bridge the user's knowledge gaps based on the reranked result set, the user's exploratory intent and the structural location distribution of the retrieval results in the document; the user behavior acquisition and profile modeling module (500) comprises: a real-time behavior stream processor (501) for capturing the user's searching, clicking, reading duration, page-turning speed, underlining, note-taking and sharing behaviors; an interest topic evolution model (502) for extracting, from the user's long-term behavior data and using a topic model or a deep interest network, the user's interest labels in different dimensions and the trend of their strength over time; and a short-term session context sensor (503) for capturing the sequence of actions within the current search session and constructing a short-term contextualized preference vector.
- 7. The reading software intelligent search system based on natural language processing technology according to claim 1, wherein the user behavior acquisition and profile modeling module (500) is configured to construct a short-term session context vector and a long-term interest topic evolution vector, and to input the constructed vectors as features into the multi-dimensional ranking model (403) and the dynamic content planner (401) of the personalized ranking and content generation engine (400).
- 8. The reading software intelligent search system based on natural language processing technology according to claim 5, wherein the dynamic summary generator (402) adopts an encoder-decoder architecture and, according to the intent and key entities in the structured query object, extracts or generates from the top-K retrieved documents a coherent passage of summary text focused on the query point; and the ranking factors of the multi-dimensional ranking model (403) comprise at least a query-document semantic relevance score, a document preference degree based on the user's historical clicks and reading durations, a document timeliness weight, a document authority weight, a social collaborative filtering recommendation score and a context-awareness factor.
- 9. The reading software intelligent search system based on natural language processing technology according to claim 1, wherein the consistency verification comprises extracting entity and relation assertions from the generated content and matching them against the reference fragments bound through the Must-Anchor Fields, and determining that the consistency verification fails when the matching fails or an assertion conflicts with the Must-Hold Constraints.
- 10. A reading software intelligent search method based on natural language processing technology, applied to the system according to any one of claims 1-9, characterized by comprising the following steps: S1, receiving, through the user interaction interface, an original search request initiated by a user; S2, performing, by the query understanding engine, deep semantic analysis on the original search request to generate a structured query object; S3, performing, by the document semantic indexing and retrieval engine, structure-anchored semantic retrieval based on the structured query object to obtain a preliminary retrieval result set; S4, reranking, by the personalized ranking and content generation engine, the preliminary retrieval result set in combination with the user behavior profile, and generating auxiliary content under the constraints of the reading context evidence constraint framework (RCS); and S5, presenting the final ranked result list and the generated auxiliary content to the user, wherein a rollback policy is executed when the consistency verification fails.
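Claim 1 defines the RCS only at the level of required fields. The following Python sketch shows one possible in-memory shape for it, assuming hypothetical class names (`AnchoredSentence`, `MustHold`, `RCS`, `Rollback`), an assumed anchor string format and an assumed evidence-strength threshold of 0.5. It checks only the Must-Anchor binding and maps failures onto the three rollback options listed in claim 1; the order in which the options are chosen is an illustrative assumption, not part of the claim.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ConstraintKind(Enum):
    CONCEPT_DEFINITION = "concept definition consistency"
    CHARACTER_RELATIONSHIP = "character relationship consistency"
    TIMELINE = "timeline consistency"


@dataclass(frozen=True)
class MustHold:
    """A textual fact the generated content must not contradict (hypothetical shape)."""
    kind: ConstraintKind
    subject: str
    relation: str
    value: str


@dataclass
class AnchoredSentence:
    """One conclusion/answer sentence together with its Must-Anchor Fields."""
    text: str
    anchor_id: Optional[str]        # section/paragraph anchor, e.g. "ch3/s2/p5" (assumed format)
    fragment_id: Optional[str]      # ID of the retrieved reference fragment
    evidence_strength: float = 0.0  # assumed to be normalized to [0, 1]

    def is_anchored(self) -> bool:
        return bool(self.anchor_id) and bool(self.fragment_id)


@dataclass
class RCS:
    sentences: List[AnchoredSentence]
    must_hold: List[MustHold] = field(default_factory=list)


class Rollback(Enum):
    EVIDENCE_CARDS_ONLY = "output only the retrieved evidence cards"
    LOWER_CONFIDENCE_AND_REVIEW = "lower confidence and prompt manual review"
    REGENERATE_WITH_MORE_EVIDENCE = "require supplementary evidence and regenerate"


def anchor_rollback(rcs: RCS, min_strength: float = 0.5,
                    retries_left: int = 1) -> Optional[Rollback]:
    """Return None when every sentence satisfies the Must-Anchor Fields; otherwise
    pick a rollback option. Threshold and selection order are illustrative assumptions."""
    if all(s.is_anchored() and s.evidence_strength >= min_strength for s in rcs.sentences):
        return None
    if any(not s.is_anchored() for s in rcs.sentences):
        # Unbound sentences: try once more with extra evidence, then fall back to cards.
        if retries_left > 0:
            return Rollback.REGENERATE_WITH_MORE_EVIDENCE
        return Rollback.EVIDENCE_CARDS_ONLY
    # Every sentence is bound, but at least one has weak evidence.
    return Rollback.LOWER_CONFIDENCE_AND_REVIEW
```

Must-Hold Constraints are only stored here; checking generated assertions against them belongs to the consistency verification of claim 9.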
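The reference and omission completion of claim 2 can be illustrated with a deliberately simple rule-based sketch. It is not the claimed Transformer classifier or knowledge-graph linker: keyword cues stand in for intent classification, and pronouns are resolved naively to the most recently mentioned entity. `ReadingContext`, `StructuredQuery`, the cue lists and the intent labels are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical keyword cues standing in for the Transformer-based classifier of claim 2.
INTENT_CUES: Dict[str, Tuple[str, ...]] = {
    "comparative_analysis": ("compare", "difference between", " versus ", " vs "),
    "content_recommendation": ("recommend", "similar books", "what should i read"),
    "viewpoint_finding": ("opinion", "what does the author think", "argue"),
    "task_execution": ("summarize", "export", "translate"),
}
PRONOUNS = re.compile(r"\b(he|she|they|him|her|them)\b", re.IGNORECASE)
THIS_CHAPTER = re.compile(r"\bthis chapter\b", re.IGNORECASE)


@dataclass
class ReadingContext:
    book_id: str
    chapter: str                 # current position taken from real-time reading progress
    recent_entities: List[str]   # entities from notes/session history, most recent last


@dataclass
class StructuredQuery:
    text: str
    intents: List[str]
    scope: Dict[str, str]
    entities: List[str] = field(default_factory=list)


def classify_intents(query: str) -> List[str]:
    padded = f" {query.lower()} "
    hits = [name for name, cues in INTENT_CUES.items() if any(c in padded for c in cues)]
    return hits or ["factual_query"]


def complete_query(query: str, ctx: ReadingContext) -> StructuredQuery:
    # Omitted location: "this chapter" becomes the chapter the user is reading.
    text = THIS_CHAPTER.sub(lambda m: ctx.chapter, query)
    entities: List[str] = []
    if ctx.recent_entities:
        # Naive reference resolution: every pronoun maps to the most recent entity.
        last = ctx.recent_entities[-1]
        text, n = PRONOUNS.subn(lambda m: last, text)
        if n:
            entities.append(last)
    return StructuredQuery(text, classify_intents(query),
                           {"book_id": ctx.book_id, "chapter": ctx.chapter}, entities)
```

For a reader in "Chapter 3" whose latest note mentions Darcy, `complete_query("Compare him with Elizabeth in this chapter", ctx)` yields the text "Compare Darcy with Elizabeth in Chapter 3" with the intent `comparative_analysis`.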
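Claim 4 binds every vector entry to a chapter/section/paragraph anchor and retrieves in multiple levels. The sketch below assumes two levels (document, then paragraph), a hypothetical `doc/chapter/section/paragraph` anchor string, and exact cosine similarity in place of an approximate nearest neighbor library, which keeps it dependency-free but would not scale to a large corpus.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Entry:
    anchor: str               # hypothetical format "doc/chapter/section/paragraph"
    level: str                # "document" or "paragraph"
    doc_id: str
    vector: Sequence[float]


class HierarchicalIndex:
    """Two-level retrieval over anchor-bound vectors. Exact cosine search is used
    as a stand-in for the approximate nearest neighbor search named in claim 4."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add(self, entry: Entry) -> None:
        self.entries.append(entry)

    def _top(self, query: Sequence[float], level: str, k: int,
             doc_ids: Optional[Set[str]] = None) -> List[Tuple[float, Entry]]:
        cands = [e for e in self.entries
                 if e.level == level and (doc_ids is None or e.doc_id in doc_ids)]
        scored = [(cosine(query, e.vector), e) for e in cands]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]

    def search(self, query: Sequence[float], k_docs: int = 2,
               k_paras: int = 3) -> List[Tuple[str, float]]:
        # Level 1: shortlist documents. Level 2: rank paragraphs inside the shortlist only.
        docs = {e.doc_id for _, e in self._top(query, "document", k_docs)}
        return [(e.anchor, s) for s, e in self._top(query, "paragraph", k_paras, docs)]
```

Because each result carries its anchor, downstream generation can bind answer sentences to the exact paragraph that supports them.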
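Claim 5 states that the planner chooses a generation mode from the structural location distribution of the hits but does not give the rule. The following sketch uses invented thresholds purely to make the idea concrete: hits across several documents suggest a comparative summary, hits across several chapters of one document suggest a timeline summary, and concentrated hits suggest an explanatory summary. The `entity_count` input, the anchor format and the intent label are assumptions.

```python
from typing import Iterable, Tuple


def parse_anchor(anchor: str) -> Tuple[str, str]:
    """Split an anchor of the assumed form "doc/chapter/..." into (doc, chapter)."""
    doc, chapter, *_ = anchor.split("/")
    return doc, chapter


def select_mode(anchors: Iterable[str], intent: str = "factual_query",
                entity_count: int = 0) -> str:
    """Pick a generation mode from where the hits sit in the document hierarchy.
    All thresholds below are hypothetical illustrations, not values from the claims."""
    locations = [parse_anchor(a) for a in anchors]
    docs = {doc for doc, _ in locations}
    chapters = set(locations)
    if intent == "comparative_analysis" or len(docs) >= 2:
        return "comparative_summary"        # hits spread across several documents
    if entity_count >= 3:
        return "relationship_map_summary"   # many linked entities in the hits
    if len(chapters) >= 3:
        return "timeline_summary"           # hits spread across several chapters
    return "explanatory_summary"            # hits concentrated in one place
```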
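For the interest topic evolution model (502), claim 6 names a topic model or a deep interest network. As a much simpler stand-in, the sketch below scores each interest label with exponentially time-decayed, behavior-weighted counts and reports the trend as the change in strength over a recent window. The behavior weights, the 14-day half-life and the 7-day window are hypothetical values.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp in days, behavior, interest label)

# Hypothetical weights: deeper engagement contributes more to an interest label.
BEHAVIOR_WEIGHTS = {"click": 1.0, "read": 2.0, "underline": 3.0, "note": 4.0, "share": 3.0}


def interest_strengths(events: Iterable[Event], now: float,
                       half_life_days: float = 14.0) -> Dict[str, float]:
    """Exponentially time-decayed strength per interest label."""
    rate = math.log(2) / half_life_days
    scores: Dict[str, float] = defaultdict(float)
    for ts, behavior, label in events:
        age = max(0.0, now - ts)
        scores[label] += BEHAVIOR_WEIGHTS.get(behavior, 0.5) * math.exp(-rate * age)
    return dict(scores)


def interest_trend(events: Iterable[Event], now: float, window_days: float = 7.0,
                   half_life_days: float = 14.0) -> Dict[str, float]:
    """Change in strength over the last window; positive means the interest is growing."""
    evs: List[Event] = list(events)
    then = now - window_days
    current = interest_strengths(evs, now, half_life_days)
    past = interest_strengths([e for e in evs if e[0] <= then], then, half_life_days)
    return {label: current[label] - past.get(label, 0.0) for label in current}
```

With a 14-day half-life, a note taken 28 days ago contributes a quarter of its weight, so an old interest fades unless fresh behavior renews it.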
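Claim 7 feeds a short-term session vector and a long-term interest vector into the ranking model (403) and the content planner (401). One common way to build the short-term part, assumed here rather than taken from the source, is a recency-weighted mean of the embeddings of items the user interacted with in the session; the two vectors are then simply concatenated. Function names and the decay value are illustrative.

```python
from typing import List, Sequence


def session_vector(item_vectors: Sequence[Sequence[float]], decay: float = 0.7) -> List[float]:
    """Recency-weighted mean of the vectors of items acted on in the current session.
    The newest item has weight 1, the previous one `decay`, then decay**2, and so on."""
    if not item_vectors:
        return []
    n, dim = len(item_vectors), len(item_vectors[0])
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, item_vectors)) / total for d in range(dim)]


def profile_features(short_term: Sequence[float], long_term: Sequence[float]) -> List[float]:
    """Concatenate both vectors into one feature vector for the ranking model and planner."""
    return list(short_term) + list(long_term)
```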
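The ranking factors of claim 8 can be combined in many ways. The sketch below assumes the simplest one, a weighted linear sum of factors already normalized to [0, 1]; the factor keys and weights are invented for illustration and would in practice be learned, for example with a learning-to-rank method.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights for the six factors listed in claim 8.
WEIGHTS: Dict[str, float] = {
    "relevance": 0.40,    # query-document semantic relevance
    "preference": 0.20,   # preference from historical clicks and reading durations
    "timeliness": 0.10,
    "authority": 0.10,
    "social_cf": 0.10,    # social collaborative filtering score
    "context": 0.10,      # context awareness
}


@dataclass
class Candidate:
    doc_id: str
    features: Dict[str, float]   # each factor assumed normalized to [0, 1]


def score(c: Candidate, weights: Dict[str, float] = WEIGHTS) -> float:
    # A missing factor contributes 0 instead of raising an error.
    return sum(w * c.features.get(name, 0.0) for name, w in weights.items())


def rerank(cands: List[Candidate], weights: Dict[str, float] = WEIGHTS) -> List[str]:
    return [c.doc_id for c in sorted(cands, key=lambda c: score(c, weights), reverse=True)]
```

A linear model keeps each factor's contribution inspectable, which helps when explaining to a reader why a passage was ranked first.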
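Claim 9 describes the consistency verification as assertion extraction followed by matching against the bound reference fragments and the Must-Hold Constraints. The sketch below assumes the assertions have already been extracted as (subject, relation, object) triples, treats an assertion as supported only if both arguments appear verbatim in its bound fragment, and represents each Must-Hold Constraint as a required object for a (subject, relation) pair. `GeneratedSentence` and `check_consistency` are hypothetical names, and lexical matching is a crude stand-in for the entailment or relation matching a real system would need.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class GeneratedSentence:
    assertions: List[Triple]   # assumed to come from an upstream extraction step
    fragment_id: str           # Must-Anchor Field binding


def check_consistency(sentences: List[GeneratedSentence],
                      fragments: Dict[str, str],
                      must_hold: Dict[Tuple[str, str], str]) -> Tuple[bool, List[str]]:
    """Return (passed, problems); any problem means the check fails and rollback applies."""
    problems: List[str] = []
    for s in sentences:
        source = fragments.get(s.fragment_id, "").lower()
        for subj, rel, obj in s.assertions:
            # Naive support test: both arguments must appear in the bound fragment.
            if not source or subj.lower() not in source or obj.lower() not in source:
                problems.append(f"unsupported: {subj} {rel} {obj}")
            required = must_hold.get((subj, rel))
            if required is not None and required != obj:
                problems.append(f"violates must-hold: {subj} {rel} {required}")
    return (not problems, problems)
```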
Description
Reading software intelligent search system based on natural language processing technology
Technical Field
The invention relates to the technical fields of computer software and natural language processing, and in particular to an intelligent search system for reading software based on natural language processing technology.
Background
With the spread of digital reading, the number of documents in reading software has grown rapidly, and users frequently have context-dependent exploratory search needs while reading. Existing solutions include keyword retrieval, general semantic retrieval, knowledge-graph retrieval, personalized ranking and summary generation. However, they still suffer from query ambiguity caused by references and omissions in reading scenarios, insufficient anchoring to chapter structure, coarse use of behavioral signals, and generated content that lacks traceable evidence and consistency constraints. For example, knowledge-graph-based semantic retrieval can improve entity matching and templated querying, but it often lacks strong reading-context constraints and a consistency check mechanism for generated content. To address these problems, we propose a reading software intelligent search system based on natural language processing technology.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a reading software intelligent search system based on natural language processing technology, which improves search accuracy and the credibility of generated content in reading scenarios through a closed-loop mechanism of reading-context completion, structure-anchored index retrieval, evidence-constrained generation, consistency verification and rollback.
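The closed loop described above, which claim 10 spells out as steps S1 to S5, can be sketched as a function that wires the stages together. Each stage is passed in as a callable so the control flow stays visible; the stage signatures, the `SearchResponse` type and the choice of "output only the retrieved evidence cards" as the rollback action (one of the options listed in claim 1) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SearchResponse:
    ranked: List[str]
    content: Optional[str]
    evidence_cards: List[str] = field(default_factory=list)
    rolled_back: bool = False


def run_search(raw_query: str,
               understand: Callable[[str], dict],
               retrieve: Callable[[dict], List[str]],
               rerank: Callable[[List[str]], List[str]],
               generate: Callable[[dict, List[str]], str],
               verify: Callable[[str, List[str]], bool],
               max_cards: int = 3) -> SearchResponse:
    # S1: raw_query is the request received from the user interaction interface.
    query = understand(raw_query)        # S2: structured query object
    preliminary = retrieve(query)        # S3: structure-anchored retrieval
    ranked = rerank(preliminary)         # S4: personalized reranking
    content = generate(query, ranked)    # S4: RCS-constrained generation
    if verify(content, ranked):          # consistency verification
        return SearchResponse(ranked, content)  # S5: results plus generated content
    # Rollback: withhold the generated text and show only the evidence cards.
    return SearchResponse(ranked, None, evidence_cards=ranked[:max_cards], rolled_back=True)
```

Keeping the verification step between generation and presentation is what closes the loop: unverified text never reaches the user.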
To achieve the above purpose, the invention adopts the following technical solution. The reading software intelligent search system based on natural language processing technology comprises:
a user interaction interface for receiving an original search request initiated by a user in a reading context;
a query understanding engine, connected to the user interaction interface, for performing reading-scenario-oriented deep semantic analysis on the original search request, wherein the deep semantic analysis at least comprises identifying exploratory intents related to reading behavior and using the user's real-time reading progress and historical notes to complete references and omissions in the query, so as to output a structured query object;
a document semantic indexing and retrieval engine for constructing and storing a structure-anchored semantic index, wherein the structure-anchored semantic index comprises hierarchical structure information of a document (at least comprising chapter/section/paragraph anchor identifiers) and knowledge-graph entity anchors, and for performing semantic similarity matching combined with the document structure based on the structured query object, so as to obtain a preliminary retrieval result set;
a personalized ranking and content generation engine, connected to the query understanding engine and the document semantic indexing and retrieval engine, for reranking the preliminary retrieval result set based on the user behavior profile and generating summary or explanatory content matching the query intent; and
a user behavior acquisition and profile modeling module for acquiring and analyzing, in real time, a multi-dimensional deep behavior sequence of the user during reading, and constructing and updating a dynamic user profile;
wherein the personalized ranking and content generation engine is configured to introduce a reading context evidence constraint framework (RCS) when generating summary or explanatory content, the RCS comprising at least: (a) Must-Anchor Fields, which require each conclusion/answer sentence in the generated content to be bound to a section/paragraph anchor identifier, a reference fragment ID and an evidence strength field; and (b) Must-Hold Constraints, which define textual consistency constraints that the generated content must not violate, the consistency constraints comprising at least one of concept definition consistency, character relationship consistency and timeline consistency; and the personalized ranking and content generation engine is further configured to perform a consistency check on the generated content and to trigger a rollback policy when the consistency check fails, the rollback policy comprising at least one of: outputting only the retrieved evidence cards, lowering the confidence and prompting manual review, or requiring supplementary evidence for regeneration.
The query understanding engine comprises an intent recognition unit, a semantic parsing and entity linking unit and a query expansion and correction unit, wherein the intent recognition unit classifies que