CN-121981110-A - Paper innovation evaluation method based on visual language model reinforcement learning

Abstract

The invention discloses a paper innovation evaluation method based on visual language model reinforcement learning, belonging to the technical field of artificial intelligence. The method addresses the technical problems of existing paper innovation evaluation: evaluation dimensions that are single and shallow, a lack of semantic understanding and contextual association, and evaluation perspectives that are static and isolated. A multidimensional innovation evaluation scale is designed, a training framework integrating a visual language model with reinforcement learning is constructed, and dynamic, global innovation evaluation is achieved by combining retrieval-augmented generation (RAG). The specific steps are innovation scale design, training data construction, supervised fine-tuning and reinforcement learning training of the model, and practical application of the model. The method enables multidimensional, interpretable and fine-grained evaluation of paper innovation, improves the accuracy, comprehensiveness and scientific rigor of the evaluation, and provides an intelligent solution for academic evaluation and scientific research management.

Inventors

HU HONGLIANG
GUO CHUANBIN
DU WEIWEI
YANG CHENG
ZHU XIAODONG
YANG WANBO
ZHANG SHANCHENG

Assignees

北京国科众安科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260306

Claims (5)

1. A paper innovation evaluation method based on visual language model reinforcement learning, characterized by comprising the following steps:
S1, designing an innovation scale, wherein the innovation scale comprises an innovation core dimension, an innovation value dimension and a demonstration rigor dimension, and the full score of the innovation scale is 100 points;
S2, training a visual language model, comprising the following substeps:
S21, crawling papers and citations in the target field, namely crawling PDF papers in a specified time range and a specified field from the arXiv website, and crawling the citation counts of the papers in the current year and the following year from Google Scholar;
S22, PDF parsing and information extraction, namely extracting the title, author, abstract, introduction, method, data, result and analysis fields of each paper with an open-source PDF parsing tool, and storing the extracted fields in a relational database;
S23, paper vectorization, namely vectorizing the title and abstract of each paper with the Qwen3-Embedding-8B model, and storing the resulting vectors in a Milvus vector database;
S24, training data construction, namely retrieving, through RAG technology, the 10 historical papers that are semantically most relevant to the target paper and were published earlier, inputting the historical paper information, the citation count of the target paper and the innovation scale as prompt information into the Qwen-VL large model, generating the score for each dimension, the comments and the total score, and removing the citation count to obtain the training and validation data;
S25, data division, namely dividing the data into SFT training data and DAPO reinforcement learning data at a ratio of 1:4, and reserving 5% of the data as a validation set in each training stage;
S26, model training, namely performing supervised fine-tuning (SFT) with cross-entropy loss as the objective, and performing DAPO reinforcement learning training with the average of the single-point prediction accuracy as the reward function, wherein single-point prediction accuracy = 1 - absolute percentage error = 1 - |true value - predicted value| / theoretical maximum of the true value, the true value is the gold-standard score of an evaluation dimension, the predicted value is the score for that dimension output by the model, and the theoretical maximum of the true value is the full score of the corresponding evaluation dimension;
S3, using the trained model, namely parsing and vectorizing the PDF paper to be evaluated, retrieving related historical papers, and inputting the information of the paper to be evaluated, the historical paper information and the innovation scale as a prompt into the trained model to generate a structured innovation evaluation result.
2. The paper innovation evaluation method based on visual language model reinforcement learning according to claim 1, wherein in S1, the innovation core dimension comprises theoretical innovation (20 points), method innovation (20 points), application and demonstration innovation (10 points) and data/resource innovation (10 points); the innovation value dimension comprises academic value (10 points) and practical value (10 points); and the demonstration rigor dimension comprises logical self-consistency (10 points) and evidence sufficiency and reliability (10 points).
3. The paper innovation evaluation method based on visual language model reinforcement learning according to claim 1, wherein in S24, the historical papers retrieved by RAG are the 10 papers with the highest semantic relevance whose publication year and month precede those of the target paper.
4. The paper innovation evaluation method based on visual language model reinforcement learning according to claim 1, wherein in S26, the training objective of the SFT stage is to enable the model to understand the evaluation task and to learn the structured output format and preliminary judgment logic; and the training objective of the DAPO stage uses a reward function that averages the single-point prediction accuracy, optimizing the model's trade-off capability, generalization robustness, comment generation quality, and the fit between predicted scores and true values.
5. The paper innovation evaluation method based on visual language model reinforcement learning according to claim 1, wherein in S3, the processing flow for the paper to be evaluated comprises parsing the PDF and extracting the title and abstract, vectorizing them with the embedding model, and retrieving 10 relevant historical papers from the Milvus database, and the evaluation result generated by the model comprises specific scores and comments.
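
As an illustration of the reward described in claim 1 (S26) combined with the dimension scores of claim 2, the following minimal Python sketch computes the single-point prediction accuracy for each dimension and averages it. The dictionary keys, function names and example scores are hypothetical and chosen only for illustration; the patent does not specify an implementation, nor how predictions outside the valid score range should be handled, so none is attempted here.

```python
# Full score per evaluation dimension, as listed in claim 2 (sums to 100).
# The key names are hypothetical labels, not identifiers from the patent.
SCALE_MAX = {
    "theoretical_innovation": 20,
    "method_innovation": 20,
    "application_demonstration_innovation": 10,
    "data_resource_innovation": 10,
    "academic_value": 10,
    "practical_value": 10,
    "logical_self_consistency": 10,
    "evidence_sufficiency_reliability": 10,
}


def single_point_accuracy(true_score, predicted_score, max_score):
    """1 - |true value - predicted value| / theoretical maximum of the true value."""
    return 1.0 - abs(true_score - predicted_score) / max_score


def reward(gold, predicted):
    """Average single-point prediction accuracy over all scale dimensions."""
    accuracies = [
        single_point_accuracy(gold[name], predicted[name], max_score)
        for name, max_score in SCALE_MAX.items()
    ]
    return sum(accuracies) / len(accuracies)


# Example: the prediction matches the gold scores except for one dimension,
# which is off by 5 points out of 20 (accuracy 0.75 for that dimension).
gold = {name: max_score for name, max_score in SCALE_MAX.items()}
predicted = dict(gold, theoretical_innovation=15)
print(reward(gold, predicted))  # (7 * 1.0 + 0.75) / 8 = 0.96875
```

Because each error is normalized by the full score of its own dimension, a 5-point miss on a 20-point dimension costs less than a 5-point miss on a 10-point dimension, and every dimension contributes equally to the average regardless of its weight in the 100-point total.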

Description

Paper innovation evaluation method based on visual language model reinforcement learning

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a paper innovation evaluation method based on visual language model reinforcement learning.

Background

With the explosive growth in the number of scientific and technical publications, evaluating the innovation of academic papers has become a key issue in academia and scientific research management. Traditional paper innovation evaluation techniques rely mainly on word-level statistics (such as TF-IDF and LDA topic models) and citation-count indicators. Although they have advanced research evaluation to some extent, they have obvious limitations:

First, the evaluation dimensions are single and shallow. Traditional methods detect new terms through word-frequency analysis or infer innovation from citation counts, ignoring the complexity and multi-level nature of innovation. They cannot distinguish innovative contributions along different dimensions such as theoretical paradigms, research methods and data resources, so the evaluation results are coarse and fall short of fine-grained requirements.

Second, semantic understanding lacks contextual association. Traditional natural language processing techniques are based on statistical rules and cannot deeply understand the underlying logic, argument structure and conceptual semantics of academic texts. They struggle to identify research that uses similar terms but differs in substance, or innovation achieved through knowledge recombination without new terms, and are therefore prone to misjudgment.

Finally, the evaluation perspective is static and isolated. Existing methods mostly analyze single papers in isolation or consider only direct citations, and cannot examine a paper against the background of the development of its field and interdisciplinary knowledge flow. Innovation is by nature a relative and dynamic concept; traditional techniques cannot track the evolution of research problems or shifts in academic paradigms, and so cannot accurately assess the true breakthrough of a paper.

The development of visual language models (VL models) and reinforcement learning provides a new path for solving the above problems. Visual language models have deep semantic understanding and knowledge association capabilities and can extract multidimensional innovation features; reinforcement learning optimizes the evaluation strategy through dynamic feedback; and combining RAG technology builds a global, dynamic evaluation perspective, thereby improving the accuracy, comprehensiveness and interpretability of the evaluation.

Disclosure of Invention

In view of the above technical problems in the related art, the invention provides a paper innovation evaluation method based on visual language model reinforcement learning that can solve these problems.
To achieve the above technical purpose, the technical solution of the invention is realized as follows: a paper innovation evaluation method based on visual language model reinforcement learning, comprising the following steps:
S1, designing an innovation scale, wherein the innovation scale comprises an innovation core dimension, an innovation value dimension and a demonstration rigor dimension, and the full score of the innovation scale is 100 points;
S2, training a visual language model, comprising the following substeps:
S21, crawling papers and citations in the target field, namely crawling PDF papers in a specified time range and a specified field from the arXiv website, and crawling the citation counts of the papers in the current year and the following year from Google Scholar;
S22, PDF parsing and information extraction, namely extracting the title, author, abstract, introduction, method, data, result and analysis fields of each paper with an open-source PDF parsing tool, and storing the extracted fields in a relational database;
S23, paper vectorization, namely vectorizing the title and abstract of each paper with the Qwen3-Embedding-8B model, and storing the resulting vectors in a Milvus vector database;
S24, training data construction, namely retrieving, through RAG technology, the 10 historical papers that are semantically most relevant to the target paper and were published earlier, inputting the historical paper information, the citation count of the target paper and the innovation scale as prompt information into the Qwen-VL large model, generating the score for each dimension, the comments and the total score, and removing the citation count to obtain the training and validation data;
S25, data division, namely dividing the data into SFT training data and DAPO reinforcement learning data at a ratio of 1:4, and reserving 5% of the data as a validation set in each training sta