CN-121997050-A - Paper influence evolution trajectory intelligent agent prediction method and system

CN121997050A

Abstract

The invention provides a method and system for predicting the evolution trajectory of paper influence with an intelligent agent. The method comprises: S1, constructing a training data set containing target papers and their true influence label values; S2, for each target paper in the training data set, constructing the reasoning input of a large language model and configuring a set of external academic information interfaces, having the large language model execute a multi-round process of reasoning and external information calls, and outputting a predicted influence value for the target paper when a termination condition is met; S3, constructing a reward signal based on the deviation between the true influence label value and the predicted influence value of the target paper; and S4, taking the maximization of the expected reward over the training data set as the optimization target, and outputting the trained model parameters. Under weakly supervised conditions that rely only on single-point or limited-time-window influence labels, the method lets the model gradually form reasonable, stable, and interpretable reasoning logic, and effectively improves the credibility, generalization ability, and decision transparency of the prediction results.

Inventors

  • Zheng Tongya
  • Li Wenda
  • Ying Yuchen
  • Jin Canghong
  • Song Mingli

Assignees

  • 浙大城市学院 (Zhejiang University City College)

Dates

Publication Date
2026-05-08
Application Date
2026-01-30

Claims (10)

  1. A method for predicting the evolution trajectory of paper influence with an intelligent agent, characterized by comprising the following steps: S1, constructing a training data set containing target papers and their true influence label values; S2, for each target paper in the training data set, constructing the reasoning input of a large language model and configuring a set of external academic information interfaces, having the large language model execute a multi-round process of reasoning and external information calls, and outputting a predicted influence value for the target paper when a termination condition is met; S3, constructing a reward signal based on the deviation between the true influence label value and the predicted influence value of the target paper; and S4, taking the maximization of the expected reward over the training data set as the optimization target, using a reinforcement learning method to optimize and update the reasoning strategy of the large language model in the multi-round reasoning and information-calling process, and outputting the trained model parameters.
  2. The method for predicting the evolution trajectory of paper influence according to claim 1, wherein constructing the training data set in step S1 specifically comprises: S11, constructing an academic database from academic network data in the target research field, the academic database comprising at least paper meta-information and relationship data; S12, setting the length of a time window for influence statistics, and counting the cumulative number of citations each target paper receives within that window after publication as the true influence label value of the target paper; S13, dividing the labeled sample set into a training set, a validation set, and a test set, and constructing the representation of the training data set.
  3. The method for predicting the evolution trajectory of paper influence according to claim 1, wherein the set of external academic information interfaces in step S2 comprises at least: a paper meta-information query interface, for querying a paper's title, publication year, author list, and publication venue information by paper identifier; a citation-relation query interface, for querying a paper's reference set or citation relations by paper identifier; an author or publication venue statistics interface, for querying the corresponding historical paper list or historical citation statistics by author identifier or venue identifier; and a similar-paper retrieval interface, for retrieving and returning a set of similar papers by paper identifier or paper text description.
  4. The method for predicting the evolution trajectory of paper influence according to claim 1, wherein the multi-round reasoning and external information-calling process in step S2 specifically comprises: under the constraint of a maximum number of reasoning rounds T, the large language model generates, round by round, new external information call requests or intermediate reasoning conclusions from the initial input and the results returned by external information calls in previous rounds; and when the large language model outputs a termination instruction or the maximum number of reasoning rounds T is reached, ending the reasoning process and generating the predicted influence value of the target paper from the current reasoning context.
  5. The method for predicting the evolution trajectory of paper influence according to claim 1, wherein constructing a reward signal based on deviation in step S3 specifically comprises: constructing a reward function R(ŷ_i, y_i) from the predicted influence value ŷ_i and the true influence label value y_i, wherein R is used to measure the closeness of the predicted result to the true value; and applying a numerical constraint or normalization to the reward signal so that it lies within a preset range [R_min, R_max], wherein R_min and R_max are both preset constants.
  6. The method of claim 5, wherein the reward function R(ŷ_i, y_i) is defined as a decreasing function of the deviation between the predicted influence value ŷ_i and the true influence label value y_i.
  7. The method for predicting the evolution trajectory of paper influence according to claim 1, wherein in step S4 the optimization and update are performed by a reinforcement learning method, and the optimization objective is: max_θ J(θ) = E_{(p_i, y_i)~D}[ R(ŷ_i, y_i) ], wherein θ denotes the parameters of the large language model, D denotes the training data set, E_{(p_i, y_i)~D}[·] denotes the expected value of the corresponding reward signal R(ŷ_i, y_i) under random sampling from the training set during training, p_i denotes the target paper, and y_i denotes the corresponding influence label value.
  8. The method of claim 1, wherein the termination condition for training comprises reaching a preset maximum number of training rounds, or the average prediction error or expected reward on the validation data set no longer improving over consecutive training rounds.
  9. A system for predicting the evolution trajectory of paper influence with an intelligent agent, characterized by comprising: a data processing module, for constructing a training data set containing target papers and their true influence label values; a reasoning and interaction module, comprising a large language model and a set of external academic information interfaces, for executing multi-round reasoning and external information calls for a target paper and outputting a predicted influence value; a reward calculation module, for constructing a reward signal based on the deviation between the true influence label value and the predicted influence value of the target paper; and a strategy optimization module, for optimizing and updating the reasoning strategy of the large language model by a reinforcement learning method with the goal of maximizing the expected reward.
  10. An electronic device, comprising a memory and a processor; the memory is configured to store a computer program; and the processor is configured to implement the paper influence evolution trajectory agent prediction method according to any one of claims 1 to 8 when executing the computer program.
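The labeling procedure of claim 2 (steps S11 through S13) can be sketched as follows. The `Paper` structure, the 5-year window default, and the 70/15/15 split ratios are illustrative assumptions for this sketch; the claims specify only that a time window is set, cumulative citations within it are counted as the label, and the labeled samples are split into training, validation, and test sets.

```python
from dataclasses import dataclass
from typing import List
import random

@dataclass
class Paper:
    paper_id: str
    pub_year: int
    citation_years: List[int]  # year of each citation the paper received

def influence_label(p: Paper, window_years: int = 5) -> int:
    """S12: cumulative citation count within `window_years` after publication."""
    return sum(1 for y in p.citation_years
               if p.pub_year <= y < p.pub_year + window_years)

def split_dataset(papers, seed=0, ratios=(0.7, 0.15, 0.15)):
    """S13: shuffle the labeled samples and split into train/val/test."""
    labeled = [(p.paper_id, influence_label(p)) for p in papers]
    random.Random(seed).shuffle(labeled)
    n = len(labeled)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    return (labeled[:n_train],
            labeled[n_train:n_train + n_val],
            labeled[n_train + n_val:])
```

With this labeling, a paper published in 2020 and cited in 2020, 2021, and 2026 would receive a label of 2 under a 5-year window, since only citations falling inside the window count.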
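The four external academic information interfaces of claim 3 can be sketched as plain callables over toy in-memory dictionaries. The data, field names, and the word-overlap retrieval heuristic are assumptions of this sketch; a real deployment would back these interfaces with an academic database or web API.

```python
# Toy in-memory stores standing in for an academic database (assumption).
PAPERS = {
    "p1": {"title": "Graph Learning", "year": 2020, "authors": ["a1"], "venue": "v1"},
    "p2": {"title": "Graph Mining", "year": 2021, "authors": ["a2"], "venue": "v1"},
}
CITES = {"p1": [], "p2": ["p1"]}  # p2 cites p1

def query_meta(paper_id):
    """Meta-information interface: title, publication year, authors, venue."""
    return PAPERS[paper_id]

def query_citations(paper_id):
    """Citation-relation interface: the paper's references and citing papers."""
    refs = CITES.get(paper_id, [])
    cited_by = [pid for pid, rs in CITES.items() if paper_id in rs]
    return {"references": refs, "cited_by": cited_by}

def author_stats(author_id):
    """Author statistics interface: historical papers and citation totals."""
    papers = [pid for pid, m in PAPERS.items() if author_id in m["authors"]]
    total = sum(len(query_citations(pid)["cited_by"]) for pid in papers)
    return {"papers": papers, "total_citations": total}

def similar_papers(text):
    """Similar-paper retrieval interface (naive shared-word heuristic)."""
    words = set(text.lower().split())
    return [pid for pid, m in PAPERS.items()
            if words & set(m["title"].lower().split())]
```

The venue statistics interface of claim 3 would follow the same pattern as `author_stats`, keyed by venue identifier instead of author identifier.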
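The multi-round loop of claim 4 can be sketched as follows. Here `llm_step` is a hypothetical stand-in for the large language model: given the accumulated context, it returns either an external information call request, an intermediate reasoning conclusion, or a terminating prediction. The action schema is an assumption of this sketch.

```python
def run_agent(initial_input, tools, llm_step, max_rounds=8):
    """Run up to `max_rounds` (the T of claim 4) rounds of reasoning
    interleaved with external information calls."""
    context = [initial_input]
    for _ in range(max_rounds):
        action = llm_step(context)
        if action["type"] == "terminate":
            # Termination instruction: emit the predicted influence value.
            return action["prediction"]
        if action["type"] == "call":
            # External information call; the result feeds the next round.
            result = tools[action["tool"]](*action["args"])
            context.append(("tool_result", action["tool"], result))
        else:
            # Intermediate reasoning conclusion.
            context.append(("thought", action["text"]))
    # Round budget exhausted: predict from the current reasoning context.
    return llm_step(context + [("force_answer", None)])["prediction"]
```

A stub `llm_step` that issues one tool call and then terminates exercises both branches of the loop.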
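Claims 5 and 6 describe a reward that measures closeness of the prediction to the true label and is constrained to a preset range [R_min, R_max]. The exact formula of claim 6 is not reproduced in the text; the inverse-deviation form below is one plausible instantiation, chosen only because it is a decreasing function of the deviation, as the claims require.

```python
def reward(y_pred: float, y_true: float, r_min: float = 0.0, r_max: float = 1.0) -> float:
    """Reward signal per claims 5-6 (illustrative form, not the patent's
    exact formula): closeness decreases with |y_pred - y_true|, then the
    value is clipped into the preset range [r_min, r_max]."""
    closeness = 1.0 / (1.0 + abs(y_pred - y_true))
    return min(r_max, max(r_min, closeness))
```

An exact prediction yields the maximum reward, and the reward decays smoothly as the deviation grows, never leaving the preset range.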
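The objective of claim 7, max_θ J(θ) = E_{(p_i, y_i)~D}[R(ŷ_i, y_i)], can be ascended with a REINFORCE-style policy gradient. The softmax policy over a few candidate predictions below is a toy stand-in for the large language model's reasoning policy; the patent does not prescribe a specific reinforcement learning algorithm, so this is an illustrative sketch only.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(theta, candidates, y_true, reward_fn, lr=0.5, rng=random):
    """One REINFORCE update on a softmax policy over candidate predictions:
    sample an action, score it with the reward signal, and move theta along
    r * grad log pi(action)."""
    probs = softmax(theta)
    idx = rng.choices(range(len(candidates)), weights=probs)[0]
    r = reward_fn(candidates[idx], y_true)
    # Gradient of log softmax at the sampled index: one_hot(idx) - probs.
    return [t + lr * r * ((1.0 if j == idx else 0.0) - probs[j])
            for j, t in enumerate(theta)]
```

Repeating this update over papers sampled from the training set concentrates probability on candidates with above-average reward, which is the sense in which the expected reward over the data set is maximized.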

Description

Paper Influence Evolution Trajectory Intelligent Agent Prediction Method and System

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a method and system for predicting the evolution trajectory of paper influence with an intelligent agent.

Background

Predicting paper influence is important for research evaluation, academic recommendation, and resource allocation. As academic networks have grown more complex, research methods in this field have undergone the following evolution, each stage with corresponding limitations:

1. Limitations of traditional statistical and deep learning methods. Early methods based on statistical models (e.g., regression, time series) are structurally simple but struggle to characterize the nonlinear relationships in academic networks. Deep learning methods, in particular graph neural networks (e.g., GNN, R-GCN), learn vector representations of papers and associated entities end to end and improve predictive performance, but their decision process depends on high-dimensional latent vectors, so the prediction logic is entirely uninterpretable, the basis of a judgment cannot be traced, and the methods are difficult to apply in scenarios demanding high reliability.

2. Existing advanced methods and their inherent problems. Recent studies have proposed more complex graph neural network models to improve prediction accuracy, for example: HINTS, which predicts a paper's future citation count through dynamic heterogeneous information network embeddings and an RNN, but whose inference depends on a complete network embedding trajectory; and H2CGL, which constructs hierarchical heterogeneous graphs and applies contrastive learning to make the representations sensitive to citations.
However, this class of approaches has three general core drawbacks. First, interpretability is lost: the decision process is a black box that provides no traceable, auditable reasoning basis. Second, adaptability under weak supervision is poor: these methods rely heavily on complete, long-term time-series label data (such as yearly citation counts), while a newly published paper usually has only short-term or single-point cumulative citation data, limiting what the model can learn under weakly supervised conditions. Third, the reasoning strategy is statically frozen: the strategy is fixed after training and cannot be dynamically adjusted and optimized from feedback on prediction results, lacking iterative cognition and the capacity for self-improvement.

3. Opportunities and challenges introduced by large language models. Large language models have shown great potential in natural language understanding and reasoning, and research has attempted to apply them to paper influence prediction. Although they make better use of textual semantics, the reasoning process remains hidden inside the model, and without explicit structural constraints and external evidence grounding, the reasoning path is unstable, so interpretability and reliability are still insufficient.

In summary, the prior art has not systematically solved three key problems: constructing interpretable inference chains, learning efficiently under weakly supervised conditions, and feedback-driven dynamic strategy optimization. A new paradigm is therefore needed that integrates the reasoning capability of large language models with a reinforcement learning optimization mechanism, to realize highly transparent, highly reliable, and continuously evolving paper influence prediction.
Disclosure of Invention

The invention addresses three problems in the paper influence prediction task: the reasoning process is uninterpretable, learning ability is limited under weakly supervised conditions, and a feedback-based mechanism for dynamically optimizing the reasoning strategy is lacking. To solve these problems, the invention provides a method, a system, an electronic device, and a storage medium for predicting the evolution trajectory of paper influence with an intelligent agent.

In a first aspect, the invention provides a method for predicting the evolution trajectory of paper influence with an intelligent agent, comprising the following steps: S1, constructing a training data set containing target papers and their true influence label values; S2, for each target paper in the training data set, constructing the reasoning input of a large language model and configuring a set of external academic information interfaces, having the large language model execute a multi-round process of reasoning and external information calls, and outputting a predicted influence value for the target paper when a termination condition is met; S3, constructing a reward signal based on the deviation between the true influence label value and the predicted influence value of the target paper; and S4, taking the maximization of the expected reward over the training data set as the optimization target, using a reinforcement learning method to optimize and update the reasoning strategy of the large language model in the multi-round reasoning and information-calling process, and outputting the trained model parameters.
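The training-termination condition described for step S4 (and in claim 8) amounts to early stopping: halt at a preset maximum number of training rounds, or when the validation metric stops improving over consecutive rounds. The `patience` parameter name below is an assumption of this sketch, not terminology from the patent.

```python
def train(step_fn, eval_fn, max_rounds=100, patience=3):
    """Run training rounds until the round budget is exhausted or the
    validation score (e.g. expected reward on the validation set) has not
    improved for `patience` consecutive rounds."""
    best, stale = float("-inf"), 0
    for rnd in range(max_rounds):
        step_fn(rnd)           # one optimization round (e.g. policy updates)
        score = eval_fn(rnd)   # validation metric after this round
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break          # no improvement in consecutive rounds
    return best
```

With a plateauing validation score, the loop stops as soon as `patience` consecutive rounds fail to improve on the best score seen so far, rather than running out the full round budget.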