CN-122022787-A - Method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map


Abstract

The invention discloses a method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map. The method comprises: analyzing and counting paper and patent data in a plurality of fields; constructing a network representation learner using a network representation learning algorithm; constructing an advisor-advisee relationship identifier based on a deep neural network-hybrid principal component analysis (PCA) algorithm and a pooling layer, while adding attribute network information; constructing a discipline-based new discipline map identifier so that different disciplines are classified and optimized separately; using the advisor-advisee relationship identifier to form a data set of advisor-advisee pairs; and establishing a reliable machine learning prediction model to predict the relationship between advisor and student. Statistics and modeling are used to identify the talents active in different periods of a technology and to determine their advisor-advisee relationships, for example which students a given talent has supervised.

Inventors

DU LIN
XU WEIJIE

Assignees

河南省人才数字科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260121

Claims (10)

1. A method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map, characterized by comprising the following steps:
S1, searching an existing think tank for science and technology paper data from the past 30 years, aggregating the retrieved paper data by name, and counting the papers corresponding to each talent node;
S2, organizing the collected papers, formatting the paper data, building a collaborator list for each paper, and constructing a paper data collector;
S3, extracting the paper data by person-entity information, constructing edge links among collaborators, and constructing a paper data indicator;
S4, representing the edges and nodes based on a network representation learning algorithm, constructing a network representation learner, and retaining the discipline attributes of the papers;
S5, building an advisor-advisee relationship identifier based on a deep neural network-hybrid principal component analysis (PCA) algorithm and a pooling layer, adding attribute network information, building a discipline-based new discipline map identifier, classifying and optimizing different disciplines separately, and using the advisor-advisee relationship identifier to form a data set of advisor-advisee pairs;
S6, further dividing the data set, with one part used for training and one part for testing, and optimizing the model by the AdaBoost method to finally form a trained model; and
S7, outputting and comparing the prediction results, optimizing the model based on actual data, and performing deep optimization with a long short-term memory (LSTM) recurrent neural network to generate the curated data set of advisor-advisee pairs.
2. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 1, wherein in step S5 the deep neural network-hybrid principal component analysis algorithm serves two purposes: first, data denoising; and second, dimensionality reduction for visualization, in which the original data set is reduced to the n-dimensional data set with the minimum projection distance.
3. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 2, wherein, when the advisor-advisee relationship identifier is built, the deep neural network-hybrid principal component analysis algorithm is first used to reduce the dimensionality of the data and find the most essential data field, the most essential data field being one of research institution, academic age, and name ID; because some discipline fields contain thousands of scholars, and computation time and memory use are difficult to control with that many vectors, a pooling layer is added to compress the input elements, in which adjacent vectors are first reduced to 1000 dimensions and their average is taken as the input of an encoder; and the data is first organized into a full data set covering all disciplines and then divided into specialized data sets by discipline classification based on the new discipline map identifier.
4. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 1, wherein in step S2 the collaborator list is built from each author's name and organization, and a unique UUID identifier is created for each author.
5. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 1, wherein in step S3 the paper data indicator is constructed by the following specific steps: S3.1, expressing the edges as Paper1 -> {Person1, Person2, ...}, Paper2 -> {Person3, Person4, ...}, where Paper1 is a paper and Person1 and Person2 are the collaborators associated with that paper; S3.2, calculating the data set by year to obtain a yearly data set of the statistics of interest.
6. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 5, wherein in step S4 the edges are the relationships between persons and papers, the nodes are the papers, and the discipline attributes of the papers are retained in preparation for optimizing training by discipline classification.
7. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 6, wherein in step S1 the think tank refers to an independent research institution or organization composed of expert scholars who specialize in researching and providing public policy advice; a think tank typically publishes its views and provides advice by issuing research reports, organizing seminars and forums, and participating in policy making and consultation; think tanks cover the fields of science and technology, economics, international relations, energy, and the environment; and in the science and technology field a think tank typically focuses on issues related to science and technology development trends, technological innovation, and technology policy, providing scientific and professional advice and guidance for government and enterprise decisions.
8. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 7, wherein, when the paper data in the think tank is searched, data from the current year is excluded to ensure the completeness of the data and the accuracy of collaboration trends, the start year being recorded as Y_start and the end year as Y_end.
9. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 8, wherein in step S6 optimizing the model with the data set specifically comprises the following steps: S6.1, dividing out a training data set Train1; S6.2, dividing out a test data set Test1; S6.3, training the model on Train1; S6.4, testing the model's accuracy on Test1 and tuning it.
10. The method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map according to claim 9, wherein the deep optimization with the long short-term memory (LSTM) recurrent neural network proceeds as follows:
S7.1, preprocessing the input data, the preprocessing comprising data cleaning, standardization, and splitting into a training set and a test set;
S7.2, designing the model structure, namely designing a suitable LSTM model structure according to the requirements and data characteristics of the specific task, the model being further optimized by increasing the number of LSTM layers, adjusting the size of the LSTM layers, and adding other types of layers;
S7.3, parameter tuning, namely training the model on the training set and optimizing its performance by adjusting different hyperparameters;
S7.4, regularization, namely introducing a regularization technique into the model to reduce overfitting and help the model generalize better to unseen data;
S7.5, gradient clipping, namely clipping the gradient during training to limit its maximum value and so avoid the exploding gradient problem;
S7.6, handling sequence length, namely considering truncation or padding for long sequence data so that the model can process it better, where truncation selects the sequence length to retain according to the task requirements and padding extends the sequence to a fixed length with a specific padding symbol;
S7.7, batch normalization, namely accelerating training with a batch normalization layer and improving the performance and robustness of the model; and
S7.8, evaluating and adjusting the model, namely evaluating the optimized model on the test set and adjusting and improving it according to the evaluation results until the expected performance is achieved.
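The data-handling operations named in claims 9 and 10 (the data set division of S6.1 and S6.2, the gradient clipping of S7.5, and the truncation or padding of S7.6) can be illustrated with a minimal Python sketch. The function names, the 80/20 split ratio, the padding symbol 0, and the choice of norm-based clipping are assumptions made for illustration only; the claims do not specify them, and this is not presented as the patented implementation.

```python
import math
import random

PAD = 0  # hypothetical padding symbol; the claims only require "a specific padding symbol"


def split_dataset(pairs, test_ratio=0.2, seed=0):
    """Shuffle advisor-advisee pairs and divide them into (Train1, Test1)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def pad_or_truncate(sequence, length, pad=PAD):
    """Truncate a sequence to `length` items, or right-pad it with `pad`."""
    kept = list(sequence[:length])
    return kept + [pad] * (length - len(kept))


def clip_by_norm(gradient, max_norm):
    """Rescale a gradient vector so that its L2 norm does not exceed `max_norm`.

    Norm-based clipping is one common variant; clipping each component to a
    fixed range would be another way to limit the gradient's maximum value.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]
```

For example, `pad_or_truncate([1, 2, 3], 5)` extends the sequence to a fixed length of five items, while `pad_or_truncate([1, 2, 3, 4], 2)` keeps only the first two.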

Description

Method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map

Technical Field

The invention belongs to the field of information technology, and in particular relates to a method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map.

Background

An academic network may be formed from different types of relationships, such as colleague, friend, and advisor-advisee relationships. These relationships generally reflect different interpersonal interactions. For example, in an advisor-advisee relationship, a doctoral student's research topic is generally determined by his or her advisor (i.e., the mentor), whereas in a friendship a person's daily schedule may be shaped by his or her friends. These interactions govern the dynamics and complexity of the social network. To model such interactions with network science, a particular network is abstracted into a graph of nodes and edges, where the nodes represent entities and the edges represent the different relationships. The relationships among nodes and edges can then be modeled from both a local and a global perspective using graph theory and machine learning techniques.
Traditionally, interpersonal relationship data has been built mainly through manual entry, with the individuals themselves reporting their advisor-advisee relationships. Such data is hard to collect because of subjective factors such as data privacy and security concerns, and because of high labor costs. There is also no complete, objective, large-scale curation system, which limits the reference value of advisor-advisee relationship data. Building advisor-advisee relationships at scale (on the order of millions of records) for talent relationship data would require a large number of people to investigate personal relationships in detail; the workload is enormous, and subjective factors are introduced along the way. A method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map is therefore needed to solve these problems.

Disclosure of Invention

The invention aims to provide a method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map, which builds a machine learning model on paper or scientific journal data to accurately predict the advisor-advisee relationships among talents, thereby solving the problems described in the background.
To achieve the above purpose, the invention adopts the following technical scheme. A method for predicting advisor-advisee relationships by fusing scientific and technological data with a new discipline map comprises the following steps:
S1, searching an existing think tank for science and technology paper data from the past 30 years, aggregating the retrieved paper data by name, and counting the papers corresponding to each talent node;
S2, organizing the collected papers, formatting the paper data, building a collaborator list for each paper, and constructing a paper data collector;
S3, extracting the paper data by person-entity information, constructing edge links among collaborators, and constructing a paper data indicator;
S4, representing the edges and nodes based on a network representation learning algorithm, constructing a network representation learner, and retaining the discipline attributes of the papers;
S5, building an advisor-advisee relationship identifier based on a deep neural network-hybrid principal component analysis (PCA) algorithm and a pooling layer, adding attribute network information, building a discipline-based new discipline map identifier, classifying and optimizing different disciplines separately, and using the advisor-advisee relationship identifier to form a data set of advisor-advisee pairs;
S6, further dividing the data set, with one part used for training and one part for testing, and optimizing the model by the AdaBoost method to finally form a trained model; and
S7, outputting and comparing the prediction results, optimizing the model based on actual data, and performing deep optimization with a long short-term memory (LSTM) recurrent neural network to generate the curated data set of advisor-advisee pairs.
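Steps S2 and S3 amount to turning a list of papers into a collaborator list per paper, a set of links between collaborators, and per-year counts. The following minimal Python sketch illustrates one way this could look. The sample records, the field names, the use of `uuid.uuid5` to derive an author identifier from name and organization, and the choice to weight each link by the number of shared papers are all assumptions made for illustration; the source does not prescribe them.

```python
import uuid
from collections import Counter
from itertools import combinations

# Hypothetical sample records; the field names are assumptions for illustration.
PAPERS = [
    {"id": "Paper1", "year": 2019, "discipline": "physics",
     "authors": [("Person1", "Org A"), ("Person2", "Org A")]},
    {"id": "Paper2", "year": 2020, "discipline": "physics",
     "authors": [("Person1", "Org A"), ("Person2", "Org A"), ("Person3", "Org B")]},
]


def author_id(name, organization):
    """Derive a stable UUID string from an author's name and organization."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{organization}/{name}"))


def build_collaboration_graph(papers):
    """Return (paper -> collaborator ids, co-author edge weights, papers per year)."""
    collaborators = {}
    edge_weights = Counter()
    papers_per_year = Counter()
    for paper in papers:
        ids = sorted({author_id(name, org) for name, org in paper["authors"]})
        collaborators[paper["id"]] = ids
        papers_per_year[paper["year"]] += 1
        # One undirected edge per pair of co-authors of the same paper;
        # the weight counts how many papers the pair shares.
        for a, b in combinations(ids, 2):
            edge_weights[(a, b)] += 1
    return collaborators, edge_weights, papers_per_year
```

Calling `build_collaboration_graph(PAPERS)` yields the per-paper collaborator lists, the weighted collaborator links, and a per-year paper count. Keeping each paper's `discipline` field alongside these structures would preserve the attribute that step S4 retains for later discipline-based training.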
Preferably, in step S5, the deep neural network-hybrid principal component analysis algorithm serves two purposes: first, data denoising; and second, dimensionality reduction for visualization, in which the original data set is reduced to the n-dimensional data set with the minimum projection distance. Preferably, when the advisor-advisee relationship identifier is built, the deep neural network-hybrid principal component analysis algorithm is first used to reduce the dimension