CN-121996829-A - Academic paper reviewer recommendation method based on dynamic balance edge graph
Abstract
The invention discloses an academic paper reviewer recommendation method based on a dynamic balance edge graph. The method matches reviewers accurately through the dynamic balance edge graph technique, and achieves efficient reviewer prediction in complex scenarios through centroid contrastive learning and a dynamic synthesis and deletion mechanism for tail nodes. The dynamic balance edge graph technique overcomes the inaccurate reviewer selection caused by imbalanced link classes, so that suitable reviewers can be matched accurately. Centroid contrastive learning together with the dynamic synthesis and deletion of tail nodes markedly improves the robustness of the prediction model under class imbalance, ensures efficient reviewer prediction in complex scenarios, and enhances the stability and overall performance of the prediction model.
Inventors
- LV XIAOQING
- LIN HONGXIANG
- WANG TUO
- WU RUNZHI
Assignees
- Peking University (北京大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20241107
Claims (10)
- 1. An academic paper reviewer recommendation method based on a dynamic balance edge graph, characterized in that reviewers are accurately matched through the dynamic balance edge graph technique, and efficient reviewer prediction in complex scenarios is achieved through centroid contrastive learning and a dynamic synthesis and deletion mechanism for tail nodes, the method comprising the following steps:
  1) Constructing a paper-reviewer signed bipartite graph model from a dataset of academic papers and their corresponding reviewers, the signed bipartite graph model representing the evaluation interactions between each paper and each reviewer; the signed bipartite graph comprises a positive subgraph and a negative subgraph, the nodes of the bipartite graph represent papers and reviewers respectively, and the paper node set and the reviewer node set are disjoint;
  2) Performing node embedding pre-training to generate a high-dimensional representation of each node: inputting the signed bipartite graph and its positive and negative subgraphs into a graph neural network with shared weights to generate a feature representation of each node in the signed bipartite graph and in its positive and negative subgraphs, concatenating these feature representations column-wise, and then generating the feature representation of each edge through a learnable linear transformation and bias;
  3) Performing edge graph conversion on the signed bipartite graph and initializing the node features of the edge graph: converting the signed bipartite graph representation into the corresponding edge graph representation; the node classes of the edge graph comprise a head class and a tail class; the generated node embeddings are transferred to the edge graph as the initial node features, so that the link class imbalance of the bipartite graph is converted into the node class imbalance of the edge graph;
  4) Adjusting decision boundaries through centroid contrastive learning: for each node class in the edge graph, calculating the mean of the node feature vectors and taking the mean as the centroid of the class; designing an intra-class uniform loss function that draws all node feature vectors of a class toward the centroid of that class, and pushing the centroids of different classes apart in feature space through an inter-class separation loss that maximizes the distance between centroids;
  5) Dynamically synthesizing and deleting tail nodes: drawing N nodes from the head class and N nodes from the tail class according to sampling probabilities to form node pairs, and generating new nodes through a feature mixing operation; randomly connecting each new node to other nodes of the edge graph, updating the adjacency matrix, and inputting the original node features together with the new node features into the graph neural network model; then recalculating the centroids of the head class and the tail class, and judging from the distances between each synthesized node and the two centroids whether the new node is misclassified as head class; if it is misclassified, deleting the new node and repeating the node synthesis process, thereby dynamically adjusting the node distribution in the edge graph;
  6) Predicting reviewers: in the inference stage, performing inference on the original node set of the edge graph without the synthesized nodes, calculating a score for each possible reviewer-paper pair from the embedded representations of the nodes and the patterns learned from the training data, ranking the pairs by score, and selecting the top-ranked candidates as the reviewers recommended for the paper, thereby achieving accurate reviewer recommendation for academic papers.
- 2. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 1, wherein a positive sign represents a positive rating and a negative sign represents a negative rating: an edge with a positive sign represents a positive rating or recommendation of a paper by a reviewer, and an edge with a negative sign represents a negative rating or rejection.
- 3. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 2, wherein the signed bipartite graph is represented as $G = (U, V, E, X_b)$, wherein graph $G$ consists of two disjoint node sets $U$ and $V$ representing the two parts of the bipartite graph; set $U$ comprises the paper nodes $\{u_1, u_2, \dots, u_{|U|}\}$, set $V$ comprises the reviewer nodes $\{v_1, v_2, \dots, v_{|V|}\}$, $E$ represents the signed edges between paper nodes and reviewer nodes, and $X_b$ represents the feature matrix associated with the nodes and edges; each row of $X_b$ represents a node or an edge, and the columns correspond to the different features of that node or edge.
- 4. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 3, wherein, in the signed bipartite graph, the positive subgraph $G^+$ containing the positive links and the negative subgraph $G^-$ containing the negative links are represented as $G^+ = (U, V, E^+, X_b)$ and $G^- = (U, V, E^-, X_b)$, where $E^+$ represents the set of all edges with a positive sign and $E^-$ represents the set of all edges with a negative sign.
- 5. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 4, wherein the bipartite graph $G$, the positive subgraph $G^+$ and the negative subgraph $G^-$ are input into a graph neural network model in which the feature representation of a node $v$ at layer $l$ is computed from the feature representation of node $v$ at layer $l-1$ and from its set of direct neighbor nodes $N_v$; $\theta_l$ denotes the learnable parameters of layer $l$ of the graph neural network, comprising a weight matrix and a bias term; the model is trained to minimize a loss function, thereby learning the optimal value of $\theta_l$ and generating effective node feature representations. The graph neural network models that take the bipartite graph $G$, the positive subgraph $G^+$ and the negative subgraph $G^-$ as input share their parameters when the parameters are updated.
- 6. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 5, wherein feature representations are generated for the bipartite graph $G$, the positive subgraph $G^+$ and the negative subgraph $G^-$ respectively through the graph neural network model, and the feature representations of $G^+$ and $G^-$ are concatenated column-wise with the feature representation of the original graph $G$ to form a comprehensive feature matrix; the model output for $G$ is $H_o \in R^{(|U|+|V|) \times d'}$, where $H_o$ is the new feature matrix whose $i$-th row is the embedded representation of the $i$-th node in the new feature space, each node embedding combining the node's own features with those of its neighbors; $d'$ is the dimension of the node embedding in the high-dimensional space, that is, the dimension of the feature space produced by model learning; $|U|$ and $|V|$ are the numbers of nodes in the paper node set $U$ and the reviewer node set $V$ of the bipartite graph, so that the number of rows of $H_o$ is the total number $|U| + |V|$ of paper and reviewer nodes and covers all nodes of the bipartite graph $G$; the model outputs for the graphs $G^+$ and $G^-$ are denoted $H_o^{(+)} \in R^{(|U|+|V|) \times d'}$ and $H_o^{(-)} \in R^{(|U|+|V|) \times d'}$ respectively.
- 7. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 6, wherein generating the feature representation of an edge is expressed as $H_b = [H_o \,\|\, H_o^{(+)} \,\|\, H_o^{(-)}] W_o + b_o$, where $[\cdot \,\|\, \cdot]$ denotes column-wise concatenation and $W_o$ and $b_o \in R^{d'}$ are learnable parameters; the edge sign is predicted with a single-layer perceptron that computes the prediction $z_k$ for edge $e_k = (u_i, v_j)$ from the concatenated feature representations of node $i$ and node $j$, where $W_z$ is a learnable weight matrix and $b_z$ is a learnable bias term; $z_k$ is a two-dimensional vector of predicted probabilities whose first component is the probability that $e_k$ is positive and whose second component is the probability that $e_k$ is negative; the high-dimensional feature representations of the paper nodes and of the reviewer nodes are optimized with a loss $L_{pre}$ computed from the difference between the predicted sign and the actual sign over all edges $e_k$, where $y_k \in Y$ represents the actual sign of edge $e_k \in E$, and the loss combines the probability with which the model predicts that $e_k$ is positive with its complement, the probability with which the model predicts that $e_k$ is negative.
- 8. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 7, wherein constructing the edge graph in step 3) comprises: transferring the high-dimensional node feature representations generated by pre-training to the edge graph as the initial high-dimensional feature representations of the edge graph nodes, so that the link class imbalance of the bipartite graph is converted into node class imbalance, the edge graph being expressed as $G = (V, E, X)$ with node set $V = \{N(e) \mid e \in E\}$, where $N(\cdot)$ denotes the conversion of an edge of the bipartite graph into a node of the edge graph, and the node classes of the edge graph comprise a head class and a tail class; for each node class of the edge graph, the mean of the node feature vectors is calculated and taken as the centroid of the class, so that the head centroid $C_h$ is the mean of the high-dimensional feature vectors of the head class nodes $N(e_i)$ and the tail centroid $C_t$ is the mean of the high-dimensional feature vectors of the tail class nodes $N(e_j)$; the intra-class uniform loss function $L_{intra}$ is defined with $|V|$ the total number of nodes in the graph, $k$ the class index, $h$ the head class, $t$ the tail class, $N(e_i) \in V_k$ the edge graph node corresponding to edge $e_i$, and $V_k$ the node set of class $k$; it is computed from the high-dimensional representation of the $i$-th node of class $k$ in feature space, from the distance that measures the proximity of node $i$ to the class centroid, and from the relative similarity with which node $i$ belongs to class $k$; the inter-class separation loss is expressed as $L_{inter} = \exp(-d(C_h, C_t))$, where $d(C_h, C_t)$ is the Euclidean distance measuring the difference between the two centroids.
- 9. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 8, wherein designing the dynamic synthesis and deletion strategy for tail class nodes in step 5) comprises: calculating the sampling probabilities of the head class nodes and the tail class nodes according to the similarity between each node and the centroid of its class, the sampling probability being exponentially related to the distance between the node and the centroid; the quantities involved are the sampling probability that head class node $i$ is selected, the sampling probability that tail class node $j$ is selected, the distance between the high-dimensional feature vector of head class node $i$ and the head centroid $C_h$, and the distance between the embedded feature vector of tail class node $j$ and the tail centroid $C_t$; based on the sampling probabilities, $N$ nodes are drawn from the head class and $N$ nodes from the tail class to form $N$ node pairs, and for each node pair the features of a new node are synthesized by a mixing operation in feature space, yielding the features of the synthesized node $N_{syn}(e_k) \in V_{syn}$, where $\delta$ is a value sampled from the uniform distribution on $[0, 1]$; the synthesized nodes are randomly connected to other nodes of the dynamic balance edge graph and a new adjacency matrix is constructed, so that the synthesized nodes are integrated into the dynamic balance edge graph; the features are then updated by inputting the high-dimensional features of the original edge graph nodes together with the high-dimensional features of the synthesized nodes into the graph neural network model to obtain updated node features; the centroids of the head class and the tail class are recalculated from the updated node features, and an indicator $f_k$ determines, from the distances between the synthesized node and the centroids of the head class and the tail class, whether the synthesized node is misclassified as head class, where $\lambda$ is a very large positive constant; when $f_k = 1$, node $N_{syn}(e_k) \in V_{syn}$ is misclassified as head class, the misclassified nodes are deleted and new nodes are synthesized in their place so that the number of synthesized nodes remains $N$, and this procedure is repeated in every training epoch.
- 10. The academic paper reviewer recommendation method based on a dynamic balance edge graph of claim 9, wherein training the prediction model comprises two stages: a first stage that trains with the pre-trained node embeddings as features, and a second stage that trains with the intra-class uniform loss and the inter-class separation loss; in the first stage, the model takes the node embeddings obtained by pre-training as input features, so that it learns to use the node embeddings to capture the complex structure of the graph and the relations between nodes; the loss function $L$ used for end-to-end training in the second stage is expressed as $L = L_{rw} + \alpha L_{inter} + \beta L_{intra}$, where $\alpha$ and $\beta$ are loss coefficients, $L_{rw}$ is a cross-entropy loss function, $L_{intra}$ is the intra-class uniform loss and $L_{inter}$ is the inter-class separation loss; node classification is performed with the weighted cross-entropy loss $L_{rw}$, which measures the performance of the model on the signed edge classification task, where $y_k \in Y$ represents the actual sign of node $N(e_k)$, with 0 denoting a negative node and 1 denoting a positive node; the loss is computed from the $i$-th element of $p_k$, and $w_{pos}$ and $w_{neg}$ are the weights of the positive class and the negative class respectively.
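The edge graph conversion of step 3) in claim 1 can be sketched in a few lines of plain Python. The claims state that each signed bipartite edge becomes an edge graph node and that the node classes are a head class and a tail class; they do not spell out the adjacency rule or how signs map to classes. The sketch below therefore assumes the usual line-graph rule (two edge graph nodes are linked when their original edges share a paper or a reviewer) and assumes that the majority sign forms the head class. The function name `to_edge_graph` and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def to_edge_graph(signed_edges):
    """Turn signed bipartite edges (paper, reviewer, sign) into an edge graph.

    Returns (adjacency, labels): adjacency is a sorted list of index pairs
    (a, b) with a < b, and labels[i] is "head" or "tail" for edge i.
    """
    # Group edge indices by the endpoint they touch.
    incident = defaultdict(list)
    for idx, (paper, reviewer, _sign) in enumerate(signed_edges):
        incident[("paper", paper)].append(idx)
        incident[("reviewer", reviewer)].append(idx)

    # ASSUMPTION: line-graph adjacency, i.e. two edges sharing an endpoint
    # become neighbouring nodes in the edge graph.
    adjacency = set()
    for edge_ids in incident.values():
        for a, b in combinations(edge_ids, 2):
            adjacency.add((a, b))  # edge_ids is increasing, so a < b

    # ASSUMPTION: the more frequent sign is the head class, the other the tail.
    counts = Counter(sign for _, _, sign in signed_edges)
    head_sign = counts.most_common(1)[0][0]
    labels = ["head" if sign == head_sign else "tail"
              for _, _, sign in signed_edges]
    return sorted(adjacency), labels


if __name__ == "__main__":
    reviews = [("p1", "r1", +1), ("p1", "r2", +1),
               ("p2", "r1", +1), ("p2", "r2", -1)]
    adjacency, labels = to_edge_graph(reviews)
    print(adjacency)
    print(labels)
```

In this toy input three of the four reviews are positive, so the single negative review becomes the only tail class node, which illustrates how link class imbalance in the bipartite graph turns into node class imbalance in the edge graph.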
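The centroid computation of claim 8 and the synthesis and deletion loop of claim 9 can be illustrated with a simplified, standard-library sketch. Only the centroid (class mean) and $L_{inter} = \exp(-d(C_h, C_t))$ are taken directly from the claims. Everything else is an assumption made for illustration: the sampling weight $\exp(-\text{distance})$, the mixing rule $\delta h + (1-\delta) t$, and a hard distance comparison standing in for the $\lambda$-based indicator $f_k$. The graph neural network update and the random rewiring of the adjacency matrix are left out, and all function names are hypothetical.

```python
import math
import random


def centroid(vectors):
    """Mean of the feature vectors of one class (the class centroid)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def inter_class_loss(c_head, c_tail):
    """L_inter = exp(-d(C_h, C_t)) with d the Euclidean distance (claim 8)."""
    return math.exp(-math.dist(c_head, c_tail))


def sampling_weights(vectors, center):
    """ASSUMPTION: unnormalised weight exp(-distance to the class centroid)."""
    return [math.exp(-math.dist(v, center)) for v in vectors]


def synthesize_tail_nodes(head, tail, n, rng, max_rounds=1000):
    """Return up to n synthetic tail feature vectors that pass the centroid check."""
    c_head = centroid(head)
    w_head = sampling_weights(head, c_head)
    w_tail = sampling_weights(tail, centroid(tail))
    kept = []
    for _ in range(max_rounds):
        if len(kept) == n:
            break
        h = rng.choices(head, weights=w_head, k=1)[0]
        t = rng.choices(tail, weights=w_tail, k=1)[0]
        delta = rng.random()  # uniform on [0, 1)
        # ASSUMPTION: convex mix, with delta weighting the head sample.
        candidate = [delta * a + (1.0 - delta) * b for a, b in zip(h, t)]
        # Recompute the tail centroid with the candidate included.
        c_tail = centroid(tail + kept + [candidate])
        # Hard check standing in for the lambda-based indicator f_k.
        if math.dist(candidate, c_head) < math.dist(candidate, c_tail):
            continue  # closer to the head centroid: delete and re-synthesize
        kept.append(candidate)
    return kept


if __name__ == "__main__":
    head = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tail = [[5.0, 5.0], [6.0, 6.0]]
    print(inter_class_loss(centroid(head), centroid(tail)))
    print(synthesize_tail_nodes(head, tail, 3, random.Random(0)))
```

The `max_rounds` cap is a design choice of this sketch so that the loop always terminates; the claims instead describe repeating the synthesis in every training epoch until the number of synthesized nodes is restored to N.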
Description
Academic paper reviewer recommendation method based on dynamic balance edge graph

Technical Field

The invention relates to the fields of graph neural networks, link sign prediction and recommendation systems, and in particular to a peer review recommendation method for academic papers based on a dynamic balance edge graph, that is, a method for selecting and recommending reviewers.

Background

With the rapid development of online platforms such as social networks and recommendation systems, studying the interactions between users and items has become increasingly important. Prior work has examined these complex relationships extensively with the help of signed bipartite graphs. In a signed bipartite graph, links with positive and negative signs represent positive and negative relationships respectively. The analysis tasks include node classification, node ranking and link sign prediction, the last of which is particularly critical. The variety of signed bipartite graph analysis tasks has driven progress in graph representation learning. Academic paper reviewer recommendation has a similar need: suitable expert reviewers must be recommended for submitted papers in order to evaluate the content and quality of the papers and to provide feedback.

Early approaches, such as random walks and matrix factorization, focused on the structural characteristics of the graph. For example, DeepWalk samples node sequences from the graph by random walks in order to learn node embeddings, while Node2Vec is an extension of DeepWalk. With the development of deep learning, these methods have been applied to learning on signed graphs. For example, SiNE builds its objective function on balance theory to learn embeddings of signed networks. SGCN introduces a graph convolutional network to process signed graphs.
SNEA introduces a graph attention mechanism to capture complex interactions between nodes. Meanwhile, SGCL uses contrastive learning to optimize sample similarity and thereby learn robust node representations that cope with noise and data sparsity.

In addition, some methods are designed specifically for analyzing signed bipartite graphs and can be used for paper reviewer prediction. For example, Derr et al. introduced the concept of signed butterfly isomorphism. SBGNN addresses the balance problem in signed bipartite networks by modifying the message passing mechanism. SBGCL adopts a two-level graph data augmentation method that improves representation learning on signed bipartite graphs. However, existing approaches still face the problem of link class imbalance when using signed bipartite graphs to predict paper reviewers, and this problem has not been effectively addressed.

Class imbalance is common in graph representation learning. In real-world graph data, the number of nodes in the head classes is often significantly greater than the number of nodes in the tail classes. As a result, models perform well when identifying head classes but poorly when identifying tail classes. Two strategies are generally adopted to solve this problem. The first strategy adjusts the loss function so that the model pays more attention to the tail classes during training. For example, the Reweight method improves the model's ability to identify tail classes by assigning different weights to samples of different classes. The second strategy uses a generative method to balance the dataset by synthesizing nodes for the tail classes, which have fewer samples. For example, GraphSMOTE synthesizes tail class nodes by interpolation and generates new edges to balance the data.
DR-GCN and ImGAGN synthesize tail class nodes using generative adversarial networks. GraphENS attempts to extend the decision boundaries of the tail classes through synthesized subnetworks, taking into account all sample features in the graph. TailGNN enhances the feature representation of tail nodes by changing the neighborhood structure and transferring knowledge from head class nodes to tail class nodes. GraphSHA expands the decision boundary by synthesizing more challenging tail class samples and ensures that this boundary information propagates only inside the tail class, avoiding interference with the data space of neighboring classes. Although the above methods alleviate class imbalance to some extent, they do not effectively handle the extreme imbalance found in edge graphs; as a result, the feature space of the tail class nodes is excessively compressed during paper reviewer prediction, which degrades the accuracy of prediction and recommendation. Existing work typically utilizes bipartite graph models, particularly signed bipartite graphs, for peer review predictions and recom