CN-122024265-A - EVA AI academic auxiliary system


Abstract

A tool for structuring and converting PDF documents parses the content of a PDF (text, tables, images, structure, and so on) and converts it into structured data (Markdown) that downstream LLM tasks can use. It is particularly suited to PDFs with complex layouts, such as academic documents and reports. The document vectorization function processes specialized medical documents accurately: it captures the core ideas, key data, and logical relations in a document and generates vectors that reflect its essential content, which prevents important information from being lost or misread and improves information processing and application. In addition, the personalized Agent adapts to and optimizes for diverse tasks in the medical academic field and fits the research directions, knowledge backgrounds, and working habits of different users. For example, when an academic paper is written it provides support that conforms to a rigorous logical structure and standard academic expression, and when academic slides are made it provides concise assistance that highlights the key points. This improves the working efficiency of medical workers and researchers and meets the high standards of the medical field.

Inventors

Request for anonymity

Assignees

上海歆语网络科技有限公司

Dates

Publication Date: 20260512
Application Date: 20250829

Claims (2)

Innovation points of the Markdown module:
① Visual perception of document structure: layout analysis is performed with a pre-trained document understanding model.
② Multi-modal parsing (text + image): OCR is used for scanned PDFs, and vector text is extracted directly from native PDFs.
③ Structure hierarchy reconstruction: the chapter tree of a document is inferred from visual cues (font size, font weight, and indentation) together with title numbering patterns (such as 1.1 and 1.2).
④ Table structure recognition: row and column information is reconstructed by cross-matching visual model output with text positions.
⑤ End-to-end structuring: a single complete pipeline runs from PDF input to JSON/HTML (JavaScript Object Notation / HyperText Markup Language) output, without relying on multiple isolated tool chains.
⑥ Support for scientific literature: processing is optimized for the typical structure of scientific literature (such as arXiv and PubMed papers), so that paper structure and metadata (such as the abstract and author information) are recognized accurately.
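Point ③ can be illustrated with a minimal Python sketch that infers a chapter tree from title numbering patterns, with font size as a fallback cue. This is not the patented implementation: the `Node` class, the `heading_level` and `build_tree` helpers, the `(text, font_size)` block format, and the 1.4x font-size threshold are all assumptions made for illustration. Font weight and indentation are left out, and a body line that happens to start with a number would be misread as a heading.

```python
import re
from dataclasses import dataclass, field

# Matches leading section numbers such as "1", "1.1", or "2.3.4".
NUMBER_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")


@dataclass
class Node:
    """One heading in the reconstructed chapter tree (hypothetical type)."""
    title: str
    level: int
    children: list = field(default_factory=list)


def heading_level(text, font_size, body_size=10.0):
    """Return a heading depth (1 = top level), or None for body text.

    The numbering pattern is checked first; an unnumbered block counts as
    a top-level heading only if its font is clearly larger than body text.
    The 1.4 factor is an assumed threshold, not a value from the source.
    """
    match = NUMBER_RE.match(text)
    if match:
        return match.group(1).count(".") + 1
    if font_size >= body_size * 1.4:
        return 1
    return None


def build_tree(blocks):
    """Build a chapter tree from (text, font_size) blocks in reading order."""
    root = Node("ROOT", 0)
    stack = [root]
    for text, font_size in blocks:
        level = heading_level(text, font_size)
        if level is None:
            continue  # body text is ignored in this sketch
        # Close every open heading at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(text, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


blocks = [
    ("1 Introduction", 14.0),
    ("Body text here.", 10.0),
    ("1.1 Background", 12.0),
    ("1.2 Methods", 12.0),
    ("2 Results", 14.0),
]
tree = build_tree(blocks)
```

In this example `tree` has two top-level children, and the first of them holds the two numbered subsections. A fuller version would combine this rule-based pass with the layout model output described in point ①.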
Innovation points of the Agent module:
① Dynamic adaptation engine for medical academic tasks: a three-dimensional "scene-specification-user" adaptation model is constructed. For task scenes such as paper writing and slide making, domain-specific specifications (such as the ICMJE paper specification and academic conference slide formats) are embedded, and the structure and expression of the output are adjusted dynamically according to the user's research direction (such as oncology or neuroscience).
② Deep knowledge association based on document vectors: structured data is converted into medical-domain vectors to enable semantic-level associative retrieval, so that, in addition to matching keywords, the system can associate logically related content such as "drug side effect" and "adverse reaction mechanism".
③ Reinforcement-learning-driven personalized iteration: the task strategy is optimized from the user's modifications to generated content. For example, after the user adjusts the citation format several times, the system adapts to that habit automatically, so that the longer the system is used, the better it fits the user's needs.
④ A closed loop of "analysis-generation-processing-feedback-optimization": while the Agent calls Markdown module data, user feedback is propagated back to optimize the vector generation weights, so that the system continuously adapts to new requirements in the medical field.
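The semantic-level associative retrieval in point ② can be sketched with cosine similarity over document vectors. The source does not specify an embedding model, so the three-dimensional vectors below are hand-written toy values, and the `cosine_similarity` and `retrieve` helpers are hypothetical names introduced only for illustration. A real system would obtain the vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors (assumed values, not produced by any real model).
index = {
    "drug side effect": [0.9, 0.1, 0.0],
    "adverse reaction mechanism": [0.8, 0.3, 0.1],
    "slide formatting guideline": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
results = retrieve(query, index, top_k=2)
```

With these toy values the two pharmacology passages rank above the slide formatting passage even though they share no keywords. That is the behaviour the claim describes.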

Description

EVA AI academic auxiliary system

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to an AI academic assistance system for medical research that aims to improve the efficiency of doctors and researchers in literature retrieval, academic writing, and data processing.

Background

The scheme aims to solve two core problems. First, when traditional RAG techniques process complex medical academic PDF documents without structural preprocessing, information is lost during vectorization and segmentation quality is poor. Second, general-purpose AI agents are poorly suited to medical academic scenarios and cannot meet industry needs such as rigor in paper writing, conciseness in slide making, and personalization. Through a modular design, PDF structural parsing and a personalized Agent work together to form a full-flow solution of "document processing-vector generation-task adaptation".

Disclosure of Invention

Markdown module

The module converts medical academic PDFs with complex layouts into a structured Markdown format through a five-step processing flow and provides a high-quality text basis for subsequent vectorization. The specific steps are as follows:

1. PDF paging and rendering: PyMuPDF is used to parse the PDF file into page images or byte streams, and each page is rendered as an image (for OCR or image models).
2. Image and text content recognition: an OCR module (such as Tesseract) processes scanned PDFs and pages with embedded pictures; for native PDFs, vector text is extracted directly as characters and OCR is not needed.
3. Block-level structure detection (Layout Analysis): a deep learning model (based on LayoutLMv, Donut, etc.) segments the document structure and detects regions such as titles, paragraphs, tables, and images. The model can be fine-tuned on labeled data, or a pre-trained model can be used directly.
4. Element identification and classification: each detected block is classified by type (such as paragraph, title, image, or table); table regions are further parsed structurally (row and column coordinates, merged cells, and so on), and formulas and images are labeled and sliced.
5. Hierarchy restoration: the chapter structure (such as Title, Section, Subsection, and Paragraph) is restored from features such as document layout, font size, and indentation.

Agent module

Based on the structured data output by the Markdown module, this module performs customized processing for task scenes specific to the medical field. It comprises a task adaptation engine and a dynamic learning mechanism:

1. The task adaptation engine covers the following tasks:
① Academic paper writing: a built-in library of medical paper templates (covering original articles, reviews, case reports, and other types) is provided; a suitable template is recommended according to the user portrait; related literature references and experimental data are filled in automatically through vector retrieval; and during writing, logical coherence (based on a discourse analysis model) and academic compliance (such as checking against the format of the International Committee of Medical Journal Editors (ICMJE)) are checked in real time to provide structural revision suggestions.
② Literature review and organization: a topic outline (such as the mechanism of action, clinical efficacy, and adverse reactions of drug A) is generated from similarity clustering of literature vectors (for example, grouping clinical trial documents on the same drug by cosine similarity). Core data from each document (such as sample size, P value, and confidence interval) is extracted, summarized, and compared in table form, and the Markdown file path of each data source is marked so that users can trace and verify it.
③ Academic slide outline writing: the core content of a paper is converted automatically into a text outline according to the wording preferences in the user portrait (such as degree of conciseness and style of logical hierarchy). Key conclusions (core data such as P values and confidence intervals) are extracted from the document vectors and organized into outline items with a clear logical hierarchy; each item highlights the core point and avoids redundant expression. The hierarchy of the outline can also be adjusted on request, for example by adding sub-items to refine content or merging items to simplify the structure, to meet the outline requirements of different academic occasions.
④ Clinical trial protocol design: according to the information of research purposes, disease types an