CN-116168849-B - Medicine-disease association prediction method
Abstract
The invention relates to a drug-disease association prediction method comprising the following steps. Step one, extracting neighborhood subgraphs of drug-disease associations: a bipartite graph is constructed with the drugs and diseases of a drug-disease association network as nodes and the relations between them as edges, and the adjacency matrix of the association network is built; the h-hop neighborhoods of the relevant drug node and disease node are extracted and merged into the h-hop neighborhood subgraph of the drug-disease association; the neighborhood subgraph corresponding to each association is extracted as a positive sample for model training, the same number of unassociated drug-disease pairs are randomly selected to generate negative sample data, and the data are divided into a training set and a test set. Step two, constructing the initial node features of the neighborhood subgraphs. Step three, learning with the graph neural network.
Inventors
- LUO TAO
- LIU SIKAI
Assignees
- Tianjin University (天津大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-03-07
Claims (3)
- 1. A drug-disease association prediction method, comprising the following steps:
  Step one, extracting neighborhood subgraphs of drug-disease associations:
  (1) Taking the drugs and diseases in a drug-disease association network as nodes and the relations between the drugs and the diseases as edges, establishing a bipartite graph and constructing the adjacency matrix of the association network;
  (2) Extracting the h-hop neighborhoods of the relevant drug node and disease node;
  (3) Combining the h-hop neighborhoods of the drug node and the disease node into the h-hop neighborhood subgraph of the drug-disease association;
  (4) Extracting the neighborhood subgraph corresponding to each association as a positive sample for model training, and randomly selecting the same number of unassociated drug-disease pairs to generate negative sample data;
  Step two, constructing the initial node features of the neighborhood subgraphs:
  (1) Assigning an integer label to each node in the extracted neighborhood subgraph, distinguishing the central nodes from the other nodes;
  (2) Assigning label 1 to the central nodes, which represent the drug and the disease, and labeling the other nodes using a method based on double-radius node labeling (DRN);
  (3) After a label has been assigned to each node, constructing the initial node features of the neighborhood subgraph;
  Step three, learning with the graph neural network:
  (1) For a drug node $u$ and a disease node $v$, extracting, through a plurality of graph convolution layers, a representation $h_i$ of each node in the neighborhood subgraph $G_h(u, v)$ to which the initial node features have been assigned, expressed for each node $i$ as follows:
  [formula not reproduced in the source text]
  where $W^{(k)}$ represents the trainable weight matrix of layer $k$, $N(i)$ represents the set of neighbor nodes of node $i$, $\sigma(\cdot)$ represents an activation function, $h^{(k)}$ is the activation vector of the $k$-th layer, and $h^{(0)} = x_0$, where $x_0$ represents the initial node features;
  (2) Aggregating the node representations of all nodes in the neighborhood subgraph $G_h(u, v)$ to obtain the representation $h_G$ of the neighborhood subgraph, as follows:
  $h_G = f(\{h_i : i \in N_h(u) \cup N_h(v)\})$
  where $f$ represents a concatenation aggregation function, $h_i$ is the representation of each node, and $N_h(u)$ and $N_h(v)$ represent the h-hop neighbors of drug $u$ and disease $v$;
  (3) Inputting the resulting neighborhood subgraph representation into a multi-layer perceptron (MLP), which outputs a prediction probability for the drug-disease association, expressed as follows:
  $y_{(i,j)} = w^{T} \sigma(W h_G)$
  where $y_{(i,j)} \in (0, 1)$ represents the probability that the association exists, and $W$ and $w$ are the parameters of the MLP, which maps the neighborhood subgraph representation $h_G$ to a predicted probability;
  (4) Performing iterative optimization with a gradient descent algorithm: in each iteration, calculating the loss of the neural network from the label information and the weighted binary cross-entropy loss function, and updating the neural network parameters by gradient descent according to the loss.
- 2. The drug-disease association prediction method of claim 1, wherein the initial node features of the neighborhood subgraph are constructed using one-hot encoding vectors of node labels.
- 3. The drug-disease association prediction method of claim 1, wherein the binary cross-entropy loss function is expressed as follows:
  [formula not reproduced in the source text]
  where $n$ and $m$ are the numbers of drugs and diseases, respectively; $\lambda$ is the penalty weight applied when a positive sample is misclassified; [symbol not reproduced in the source text] is the predicted probability score matrix; and $s^{+}$ and $s^{-}$ are the numbers of positive and negative samples, respectively.
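Steps one and two of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names and the toy association matrix `R` are hypothetical, hiding the target link before labeling is an assumed design choice, and the labeling formula is the double-radius node labeling hash known from the SEAL link-prediction literature, which the claims name but do not spell out.

```python
from collections import deque
import numpy as np

def build_adjacency(R):
    """Turn the n x m drug-disease matrix R into an (n+m) x (n+m) bipartite adjacency matrix."""
    n, m = R.shape
    A = np.zeros((n + m, n + m), dtype=int)
    A[:n, n:] = R          # drug i -> node i, disease j -> node n + j
    A[n:, :n] = R.T
    return A

def bfs_distances(A, source, max_hops=None, blocked=None):
    """Hop counts from `source`, optionally capped at max_hops and never entering `blocked`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_hops is not None and dist[node] == max_hops:
            continue       # do not expand beyond the h-hop boundary
        for nb in np.flatnonzero(A[node]):
            nb = int(nb)
            if nb != blocked and nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def extract_subgraph(A, u, v, h):
    """Merge the h-hop neighborhoods of u and v; u and v become local nodes 0 and 1."""
    nodes = set(bfs_distances(A, u, h)) | set(bfs_distances(A, v, h))
    order = [u, v] + sorted(nodes - {u, v})
    sub = A[np.ix_(order, order)].copy()
    sub[0, 1] = sub[1, 0] = 0   # assumption: hide the link being predicted
    return order, sub

def double_radius_labels(sub):
    """Central nodes get label 1; nodes unreachable from either centre get label 0."""
    du = bfs_distances(sub, 0, blocked=1)   # distance to the drug, ignoring the disease
    dv = bfs_distances(sub, 1, blocked=0)   # distance to the disease, ignoring the drug
    labels = np.zeros(len(sub), dtype=int)
    for i in range(2, len(sub)):
        if i in du and i in dv:
            d = du[i] + dv[i]
            labels[i] = 1 + min(du[i], dv[i]) + (d // 2) * (d // 2 + d % 2 - 1)
    labels[0] = labels[1] = 1
    return labels

def one_hot(labels, num_classes):
    """One-hot encode integer labels as initial node features (claim 2)."""
    return np.eye(num_classes, dtype=float)[labels]

# Toy network: 3 drugs (nodes 0-2) and 3 diseases (nodes 3-5).
R = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1]])
A = build_adjacency(R)
order, sub = extract_subgraph(A, u=0, v=3, h=1)   # drug 0 and disease 0
labels = double_radius_labels(sub)
X = one_hot(labels, num_classes=int(labels.max()) + 1)
print(order)             # -> [0, 3, 1, 4]
print(labels.tolist())   # -> [1, 1, 3, 3]
```

Because the features `X` depend only on the subgraph structure, this sketch needs no drug or disease attributes, which is consistent with the stated goal of working when node information is missing.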
Description
Medicine-disease association prediction method
Technical Field
The invention relates to the field of bioinformatics, and in particular to a drug-disease association prediction method based on a graph neural network. The method extracts the neighborhood subgraph of each drug-disease association and uses node labels as the node information of that subgraph. The neighborhood subgraphs then serve as training samples, with the presence or absence of the drug-disease association as the label, yielding an end-to-end drug-disease association prediction framework.
Background
Over the last decades, despite ongoing advances in genomics, life sciences and technology, discovering drugs from scratch has become very time-consuming and expensive. Given the high attrition rate, enormous cost and slow pace of new drug discovery and development, repurposing "old drugs" to treat common and rare diseases is becoming an increasingly attractive proposition, because it involves de-risked compounds and can therefore reduce overall development costs and shorten development time. Indeed, many drugs were not discovered through a systematic approach, but through an understanding of the drug's pharmacology, retrospective analysis of its clinical effects, and unfocused screening. It is therefore urgent and important to explore more efficient and systematic methods that accelerate the drug research process.
High-accuracy computational methods can effectively select potential candidate associations from among drug-disease pairs, narrowing the drug search space, reducing unnecessary experimental cost and improving drug development efficiency. How to predict drug-disease associations accurately is the central issue for computational methods. With the improvement of drug databases and breakthroughs in related algorithms, more and more association prediction methods have been proposed.
Recently, graph neural networks have shown convincing performance in biomedical network analysis. Yu et al. [1] integrate known drug-disease associations, drug-drug similarities, and disease-disease similarities into a heterogeneous network and apply graph convolution operations on the network to learn drug and disease embeddings. The embeddings from multiple graph convolution layers are combined using an attention mechanism, and unobserved drug-disease associations are scored based on the integrated embeddings. Zhao et al. [2] learn node embeddings in heterogeneous information networks from both topological and biological perspectives using graph representation learning techniques, and use a random forest classifier to predict unknown drug-disease associations. Existing prediction methods all use graph neural networks to learn node embeddings of the drugs and diseases in a drug-disease network, and then use these embeddings to predict drug-disease associations. However, when the information of the drug and disease nodes is missing, the existing methods either cannot work or predict with low accuracy, which limits their applicability.
[1] Yu Z, Huang F, Zhao X, et al. Predicting drug–disease associations through layer attention graph convolutional network. Brief. Bioinform. 2021;22:bbaa243.
[2] Zhao B-W, Hu L, You Z-H, et al. HINGRL: predicting drug–disease associations with graph representation learning on heterogeneous information networks. Brief. Bioinform. 2022;23:bbab515.
Disclosure of Invention
The invention provides a drug-disease association prediction method based on a graph neural network. The method not only works when the information of the drug and disease nodes is missing, but also achieves a considerable performance improvement over mainstream prediction methods.
The invention is realized by the following technical scheme: a drug-disease association prediction method, comprising the following steps:
Step one, extracting neighborhood subgraphs of drug-disease associations:
(1) Taking the drugs and diseases in a drug-disease association network as nodes and the relations between the drugs and the diseases as edges, establishing a bipartite graph and constructing the adjacency matrix of the association network;
(2) Extracting the h-hop neighborhoods of the relevant drug node and disease node;
(3) Combining the h-hop neighborhoods of the drug node and the disease node into the h-hop neighborhood subgraph of the drug-disease association;
(4) Extracting the neighborhood subgraph corresponding to each association as a positive sample for model training, randomly selecting the same number of unassociated drug-disease pairs to generate negative sample data, and dividing the data into a training set and a test set;
Step two, constructing the initial node features of the neighborhood subgraphs:
(1) Assigning an integer label to each node in the extracted neighborhood subgraph, distinguishing the central nodes from the other nodes;
(2) La