CN-122021611-A - Academic literature reading and analyzing system based on high-quality recursion abstract driving
Abstract
The invention relates to the technical field of academic literature reading and analyzing, and discloses an academic literature reading and analyzing system based on high-quality recursion abstract driving. The system comprises an access and preprocessing unit, a recursion abstract generating unit, an intelligent analysis and extraction unit, an organization and visualization unit and an interaction optimizing unit, wherein the access and preprocessing unit is in signal connection with the recursion abstract generating unit, the recursion abstract generating unit is in signal connection with the intelligent analysis and extraction unit, the intelligent analysis and extraction unit is in signal connection with the organization and visualization unit, the organization and visualization unit is in signal connection with the interaction optimizing unit, and the interaction optimizing unit is in signal connection with the recursion abstract generating unit.
Inventors
- CHENG JUNHAO
- YOU BO
- CHEN YUXIANG
- Lai Yuefu
Assignees
- 浙江科技大学
Dates
- Publication Date
- 20260512
- Application Date
- 20260129
Claims (9)
- 1. An academic literature reading and analyzing system based on high-quality recursion abstract driving, comprising an access and preprocessing unit (1), a recursion abstract generating unit (2), an intelligent analyzing and extracting unit (3), an organization and visualization unit (4) and an interaction optimizing unit (5), characterized in that the access and preprocessing unit (1) is in signal connection with the recursion abstract generating unit (2), the recursion abstract generating unit (2) is in signal connection with the intelligent analyzing and extracting unit (3), the intelligent analyzing and extracting unit (3) is in signal connection with the organization and visualization unit (4), the organization and visualization unit (4) is in signal connection with the interaction optimizing unit (5), and the interaction optimizing unit (5) is in signal connection with the recursion abstract generating unit (2); the access and preprocessing unit (1) is used for performing multi-format adaptive access, redundant information cleaning and structure analysis on academic documents to generate standardized document data; the recursion abstract generating unit (2) is used for extracting the abstract of the standardized document data, performing multiple rounds of recursive optimization, and outputting a high-quality abstract by combining quality evaluation; the intelligent analyzing and extracting unit (3) is used for extracting core knowledge points from documents and abstracts, mining document associations, analyzing research methods and identifying innovation points; the organization and visualization unit (4) is used for classifying and clustering the knowledge points, constructing a knowledge graph and displaying the literature context; the interaction optimizing unit (5) is used for providing a user interaction function, collecting feedback and realizing iterative optimization of the system model.
- 2. The academic literature reading and analyzing system based on high-quality recursion abstract driving of claim 1, wherein the access and preprocessing unit (1) comprises a multi-format adaptation module (11), a redundant information cleaning module (12) and a literature structure analyzing module (13), an output end of the multi-format adaptation module (11) is in signal connection with the redundant information cleaning module (12), and an output end of the redundant information cleaning module (12) is in signal connection with the literature structure analyzing module (13); the multi-format adaptation module (11) is used for importing academic documents in PDF, DOCX and EPUB formats and in the proprietary formats of academic databases; the redundant information cleaning module (12) is used for automatically identifying and removing redundant information in the literature while retaining the core content; the literature structure analyzing module (13) is used for splitting the document into its structural units and generating standardized document structure data.
- 3. The academic literature reading and analyzing system based on high-quality recursion abstract driving of claim 2, wherein the redundant information cleaning module (12) is configured to automatically identify and remove redundant information in the literature, namely watermarks, headers and footers, repeated paragraphs and invalid format marks, through a semantic similarity algorithm, and to retain the core text, formulas, charts and reference information, where the redundant information determination satisfies the formula: $\mathrm{Sim}(T_i, T_j) = \sum_{k=1}^{n} w_k \cos\theta_k$, wherein $\mathrm{Sim}(T_i, T_j)$ is the semantic similarity of text fragments $T_i$ and $T_j$, $w_k$ is the weight of the kth feature word, $\theta_k$ is the included angle of the kth feature word in text fragments $T_i$ and $T_j$, and $n$ is the total number of feature words; when $\mathrm{Sim}(T_i, T_j)$ reaches the judgment threshold, the fragment is judged to be redundant information.
- 4. The academic literature reading and analyzing system based on high-quality recursion abstract driving according to claim 1, wherein the recursion abstract generating unit (2) comprises an abstract extraction module (21), a multi-round recursion optimization module (22) and an abstract quality evaluation module (23), an output end of the abstract extraction module (21) is in signal connection with the multi-round recursion optimization module (22), and an output end of the multi-round recursion optimization module (22) is in signal connection with the abstract quality evaluation module (23); the abstract extraction module (21) is used for extracting the core viewpoints of each structural unit of the literature and generating an initial abstract text; the multi-round recursion optimization module (22) is used for iteratively and recursively refining the initial abstract; the abstract quality evaluation module (23) is used for quantitatively scoring the abstract at each level.
- 5. The academic literature reading and analyzing system based on high-quality recursion abstract driving of claim 4, wherein the multi-round recursion optimization module (22) takes the initial abstract as input and, in combination with the semantic associations of the original text, gradually refines the abstract content through iterative recursive operations, and the abstract information coverage after each round of recursive optimization satisfies the formula: $C_r = \frac{N_r}{N}$, wherein $C_r$ is the information coverage after the r-th round of recursion, $N_r$ is the number of core information points contained in the r-th round abstract, $N$ is the total number of core information points in the document, and the number of recursion rounds ranges from 2 to 5.
- 6. The academic literature reading and analyzing system based on high-quality recursion abstract driving of claim 4, wherein the abstract quality evaluation module (23) is configured to quantitatively score the abstract at each level based on three indexes, namely information coverage, semantic accuracy and logical coherence, and to select the abstract of optimal quality, and the comprehensive score satisfies the formula: $Q = \alpha C + \beta A + \gamma L$, wherein $Q$ is the comprehensive quality score of the abstract, $C$ is the normalized information coverage score, $A$ is the normalized semantic accuracy score, $L$ is the normalized logical coherence score, $\alpha$, $\beta$ and $\gamma$ are weight coefficients, and $\alpha + \beta + \gamma = 1$.
- 7. The academic literature reading and analyzing system based on high-quality recursion abstract driving of claim 1, wherein the intelligent analyzing and extracting unit (3) comprises a core knowledge point extraction module (31), a literature association mining module (32), a research method analysis module (33) and an innovation point identification module (34), the output end of the core knowledge point extraction module (31) is in signal connection with the literature association mining module (32) and with the research method analysis module (33), and the output ends of the literature association mining module (32) and the research method analysis module (33) are each in signal connection with the innovation point identification module (34); the core knowledge point extraction module (31) is used for extracting the key knowledge elements, namely research background, research problem, experimental data and conclusions, from the literature text and the recursive abstracts to form a structured knowledge point set; the literature association mining module (32) is used for constructing citation association, theme association and method association networks among documents based on the semantic similarity of knowledge points; the research method analysis module (33) is used for identifying the theoretical models, experimental designs and data analysis methods adopted by the literature, and for breaking down the implementation steps and core parameters of each method; the innovation point identification module (34) is used for locating the breakthrough points, improvement schemes and innovative application scenarios of a document by comparing prior art documents with the recursive abstracts.
- 8. The academic literature reading and analyzing system based on high-quality recursive abstract driving of claim 1, wherein the organization and visualization unit (4) comprises a knowledge point classification and clustering module (41), a knowledge graph construction module (42) and a literature context display module (43), the output end of the knowledge point classification and clustering module (41) is in signal connection with the knowledge graph construction module (42), and the output end of the knowledge graph construction module (42) is in signal connection with the literature context display module (43); The knowledge point classification and clustering module (41) is used for classifying and archiving the extracted knowledge points according to the subject and research directions based on the discipline domain dictionary and the semantic clustering algorithm; the knowledge graph construction module (42) is used for generating a visual knowledge graph by taking knowledge points as nodes and the association relationship as edges; the literature context display module (43) is used for generating a literature reading context graph based on the recursion abstract level and the knowledge correlation network and labeling a core viewpoint evolution path and key nodes.
- 9. The academic literature reading and analyzing system based on high-quality recursive abstract driving of claim 1, wherein the interaction optimizing unit (5) comprises a user interaction module (51), a feedback collecting module (52) and a model iteration module (53), an output end of the user interaction module (51) is in signal connection with the feedback collecting module (52), and an output end of the feedback collecting module (52) is in signal connection with the model iteration module (53); The user interaction module (51) is used for providing interaction functions of abstract level switching, knowledge point retrieval, literature question answering and annotation; the feedback collection module (52) is used for recording evaluation feedback of the user on the abstract quality, the knowledge point accuracy and the visual effect and generating a feedback data set; The model iteration module (53) is used for fine tuning a recursive abstract generation algorithm and a knowledge point extraction model based on a feedback data set in an incremental training mode, and optimizing the system analysis precision and the output quality.
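The redundancy test described in claim 3 can be illustrated with a small sketch. It swaps in a standard weighted cosine similarity over word counts, which may differ from the claimed formula, and everything named here (`tokenize`, `weighted_cosine`, `drop_redundant`, the default weight of 1.0 and the 0.9 threshold) is an assumption made for illustration rather than something taken from the patent.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a fragment and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_cosine(a, b, weights=None):
    """Weighted cosine similarity between two text fragments.

    `weights` maps a feature word to its weight; words missing from
    the mapping default to 1.0 (an assumed default).
    """
    weights = weights or {}
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    vocab = set(ca) | set(cb)
    dot = sum(weights.get(w, 1.0) * ca[w] * cb[w] for w in vocab)
    norm_a = math.sqrt(sum(weights.get(w, 1.0) * ca[w] ** 2 for w in vocab))
    norm_b = math.sqrt(sum(weights.get(w, 1.0) * cb[w] ** 2 for w in vocab))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant(fragments, threshold=0.9, weights=None):
    """Keep the first copy of each fragment and drop later near-duplicates.

    The threshold value is an assumption; the source text does not give one.
    """
    kept = []
    for frag in fragments:
        if all(weighted_cosine(frag, k, weights) < threshold for k in kept):
            kept.append(frag)
    return kept


fragments = [
    "Journal of Examples, page 3",
    "We propose a recursive summarizer.",
    "Journal of Examples, page 3",
    "Results improve over the baseline.",
]
print(drop_redundant(fragments))
```

Here the repeated running header is dropped on its second appearance while the body sentences are kept. A production system would more likely compare sentence embeddings than raw word counts.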
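The scoring described in claims 5 and 6 can be sketched in the same way. The sketch assumes that coverage is the ratio of core information points in the round-r abstract to the total in the document, and that the comprehensive score is the weighted sum of the three normalized indexes, which is how the symbol definitions in those claims read. The names `RoundResult`, `coverage`, `quality` and `best_round`, and the weights 0.4/0.3/0.3, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoundResult:
    round_no: int        # recursion round r
    covered_points: int  # core information points in the round-r abstract
    accuracy: float      # normalized semantic accuracy score in [0, 1]
    coherence: float     # normalized logical coherence score in [0, 1]


def coverage(covered_points, total_points):
    """Information coverage: covered core points over total core points."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return covered_points / total_points


def quality(cov, acc, coh, alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted sum of the three indexes; the weights must sum to 1."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return alpha * cov + beta * acc + gamma * coh


def best_round(results, total_points):
    """Return the round whose abstract has the highest comprehensive score."""
    if not 2 <= len(results) <= 5:
        raise ValueError("claim 5 specifies 2 to 5 recursion rounds")
    return max(
        results,
        key=lambda r: quality(
            coverage(r.covered_points, total_points), r.accuracy, r.coherence
        ),
    )


rounds = [
    RoundResult(1, 6, 0.90, 0.80),
    RoundResult(2, 8, 0.85, 0.90),
    RoundResult(3, 9, 0.70, 0.60),
]
print(best_round(rounds, total_points=10).round_no)
```

With these made-up scores the second round wins: the third round covers more information points but loses more on accuracy and coherence than it gains on coverage.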
Description
Academic literature reading and analyzing system based on high-quality recursion abstract driving
Technical Field
The invention relates to the technical field of academic literature reading and analyzing, in particular to an academic literature reading and analyzing system based on high-quality recursion abstract driving.
Background
Academic literature is a core carrier for disseminating research results, accumulating knowledge and supporting academic communication. As research fields expand and technology advances, the number of academic documents is growing exponentially. These documents cover many disciplines, such as natural science, engineering technology, and the humanities and social sciences; they come in various formats (such as PDF, DOCX and EPUB); and their content is highly specialized and structurally complex. For researchers, students and academic workers, efficient reading and accurate analysis of academic documents are key preconditions for tracking the research frontier, distilling core viewpoints and discovering directions for innovation.
However, existing technologies and tools for reading and analyzing academic literature still have the following shortcomings. First, document format compatibility is limited: adaptability to unstructured documents such as scanned PDFs and to the proprietary formats of academic databases is poor, and preprocessing is inefficient because redundant information must be cleaned manually. Second, abstract quality is uneven: because a single extraction mode is adopted, it is difficult to balance information coverage and semantic consistency, a hierarchical and accurate abstract system cannot be formed, and the needs of different users for in-depth study of the literature are hard to meet. Third, interactivity and iterative optimization capability are insufficient: the system cannot continuously optimize model performance based on user feedback, so analysis precision and user experience are difficult to improve. An academic literature reading and analyzing system based on high-quality recursion abstract driving is therefore needed.
Disclosure of Invention
The invention aims to provide an academic literature reading and analyzing system based on high-quality recursion abstract driving so as to solve the problems described in the background.
To achieve this aim, the invention provides the following technical scheme: an academic literature reading and analyzing system based on high-quality recursion abstract driving comprises an access and preprocessing unit, a recursion abstract generating unit, an intelligent analyzing and extracting unit, an organization and visualization unit and an interaction optimizing unit, wherein the access and preprocessing unit is in signal connection with the recursion abstract generating unit, the recursion abstract generating unit is in signal connection with the intelligent analyzing and extracting unit, the intelligent analyzing and extracting unit is in signal connection with the organization and visualization unit, the organization and visualization unit is in signal connection with the interaction optimizing unit, and the interaction optimizing unit is in signal connection with the recursion abstract generating unit; the access and preprocessing unit is used for performing multi-format adaptive access, redundant information cleaning and structure analysis on the academic literature to generate standardized document data; the recursion abstract generating unit is used for extracting the abstract of the standardized document data, performing multiple rounds of recursive optimization, and outputting a high-quality abstract by combining quality evaluation; the intelligent analyzing and extracting unit is used for extracting core knowledge points from documents and abstracts, mining document associations, analyzing research methods and identifying innovation points; the organization and visualization unit is used for classifying and clustering the knowledge points, constructing a knowledge graph and displaying the literature context; the interaction optimizing unit is used for providing a user interaction function, collecting feedback and realizing iterative optimization of the system model.
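The signal connections listed above form a chain with one feedback path, from the interaction optimizing unit back to the recursion abstract generating unit. A minimal sketch of that topology follows; the dictionary keys and the `trace` helper are illustrative identifiers, not names from the patent.

```python
# Signal connections between the five units, as listed in the text.
# Keys and values are illustrative identifiers, not names from the source.
CONNECTIONS = {
    "access_preprocessing": "recursion_abstract",
    "recursion_abstract": "analysis_extraction",
    "analysis_extraction": "organization_visualization",
    "organization_visualization": "interaction_optimization",
    "interaction_optimization": "recursion_abstract",  # feedback path
}


def trace(start, hops):
    """Follow the signal connections from `start` for `hops` steps."""
    path = [start]
    for _ in range(hops):
        path.append(CONNECTIONS[path[-1]])
    return path


print(" -> ".join(trace("access_preprocessing", 5)))
```

Tracing five hops from the access and preprocessing unit ends back at the recursion abstract generating unit, which is the loop that lets user feedback drive another round of abstract generation.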
Preferably, the access and preprocessing unit comprises a multi-format adaptation module, a redundant information cleaning module and a document structure analysis module, wherein the output end of the multi-format adaptation module is in signal connection with the redundant information cleaning module, and the output end of the redundant information cleaning module is in signal connection with the document structure analysis module; the multi-format adaptation module is used for importing academic documents in PDF, DOCX and EPUB formats and in the proprietary formats of academic databases; the redundant information cleaning module is used for automatically iden