CN-116543852-B - DDI prediction method based on Siamese structure and graph contrast learning


Abstract

The invention discloses a DDI prediction method based on Siamese structure and graph contrast learning. The method comprises: collecting drug-drug interaction text data, a drug physicochemical property data file and a targeting relationship data file; extracting physicochemical property features and targeting relationship features of the drugs, and fusing them to obtain initial features based on the physicochemical properties and targeting relationships of the drugs; calculating a drug-drug interaction adjacency matrix, and combining it with the initial features to construct a drug-drug interaction heterogeneous graph; inputting the heterogeneous graph into a graph contrast learning model based on Siamese structure to learn the embedded features of the drug nodes; and calculating the score of the edge between any two drug nodes by using a link prediction method. The method addresses two problems: drug targeting relationship features and drug-drug interaction text data are difficult to use on their own, and model performance suffers when these data are sparse. It improves the accuracy of drug-drug interaction prediction and can be applied to identifying potential interactions between drugs.

Inventors

TENG ZHIXIA
WANG YUNZE
DONG BENZHI
WANG GUOHUA

Assignees

Northeast Forestry University (东北林业大学)

Dates

Publication Date: 20260508
Application Date: 20230329

Claims (9)

1. The DDI prediction method based on Siamese structure and graph contrast learning is characterized by comprising the following steps:
S1, collecting drug-drug interaction text data, a physicochemical property data file and a targeting relationship data file of a drug, and screening and preprocessing the collected data to obtain a drug-drug interaction original data set to be processed;
S2, extracting physicochemical property features based on the SMILES molecular format character string of the drug to be processed, and targeting relationship features based on the targets, enzymes and pathways in a physicochemical property data file of the drug to be processed, and fusing the physicochemical property features and the targeting relationship features to obtain initial features of the drug;
S3, obtaining a drug-drug interaction adjacency matrix according to the drug-drug interaction text data, and constructing a drug-drug interaction heterogeneous graph by combining the initial features based on the drug physicochemical properties and targeting relationships obtained in S2 with the drug-drug interaction adjacency matrix;
S4, inputting the drug-drug interaction heterogeneous graph into a Siamese structure-based graph contrast learning model, and learning the embedded features of the drug nodes from the graph topological structure, wherein the graph contrast learning model is a graph isomorphism network, the nodes in the graph isomorphism network are drug nodes, and the node features are the attributes of the nodes; the step of learning the embedded features of the drug nodes includes:
S41, extracting two homogeneous subgraphs with the same edge type and different node attributes from the heterogeneous graph structure containing a plurality of DDI relations;
S42, inputting the two homogeneous subgraphs, as the input objects of the weight-sharing graph isomorphism networks of the Siamese structure, into the Siamese structure-based graph isomorphism network contrast model;
S43, iteratively updating the node features in the graph isomorphism network, wherein in each layer of the graph isomorphism network, the next-layer node features of each drug node are aggregated from the current-layer node features of that drug node and the current-layer node features of other drug nodes;
S44, obtaining a compressed drug embedding vector after the iteration is completed;
S5, calculating the score of the edge between any two drug nodes by using the link prediction method, taking the embedded features of the drug nodes as input.
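Steps S41 to S44 can be illustrated with a minimal sketch. It assumes the usual graph isomorphism network update, in which a node's own features are summed with those of its neighbours and then transformed; the claim itself only says that features are aggregated from the node and "other drug nodes". A single linear map with ReLU stands in for the MLP, the same weight matrix is reused across layers for brevity, and all names (`gin_layer`, `siamese_encode`, the toy drugs `d1` to `d3`) are hypothetical. The contrastive loss is not shown.

```python
from typing import Dict, List

Vector = List[float]


def gin_layer(features: Dict[str, Vector],
              neighbors: Dict[str, List[str]],
              weight: List[Vector],
              eps: float = 0.0) -> Dict[str, Vector]:
    """One GIN-style update: h_v' = ReLU(W @ ((1 + eps) * h_v + sum of neighbor h_u))."""
    updated = {}
    for node, h in features.items():
        # Aggregate: scaled self feature plus the sum of neighbor features.
        agg = [(1.0 + eps) * x for x in h]
        for nb in neighbors.get(node, []):
            agg = [a + b for a, b in zip(agg, features[nb])]
        # Transform: one linear map followed by ReLU (assumed stand-in for the MLP).
        updated[node] = [max(0.0, sum(w * a for w, a in zip(row, agg)))
                         for row in weight]
    return updated


def siamese_encode(view_a, view_b, neighbors, weight, num_layers=2):
    """Encode two views of the same graph with the SAME weights (Siamese structure)."""
    for _ in range(num_layers):
        view_a = gin_layer(view_a, neighbors, weight)
        view_b = gin_layer(view_b, neighbors, weight)
    return view_a, view_b


# Toy example: same edges, different node attributes per view (hypothetical values).
neighbors = {"d1": ["d2"], "d2": ["d1", "d3"], "d3": ["d2"]}
view_a = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
view_b = {"d1": [0.5, 0.5], "d2": [1.0, 0.0], "d3": [0.0, 1.0]}
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the aggregation is easy to inspect

one_step = gin_layer(view_a, neighbors, weight)
emb_a, emb_b = siamese_encode(view_a, view_b, neighbors, weight)
```

Because both views pass through the same `weight`, any difference between `emb_a` and `emb_b` comes only from the node attributes, which is the property a contrastive objective would rely on.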
2. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 1, wherein in S1: the drug-drug interaction text data includes drug-drug interaction text; the drug targeting relationship data file comprises drug-target interaction data, drug-enzyme interaction data and drug-pathway interaction data; and the physicochemical property data file of the drug comprises the SMILES molecular format character string of the drug.
3. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 1, wherein S2 comprises: S21, extracting physicochemical property features, which comprises converting the SMILES molecular format character string of the drug into a drug molecular fingerprint format according to physicochemical properties, further encoding the drug molecular fingerprint into binary bit vectors, and calculating a drug-drug similarity matrix based on the bit vectors; S22, extracting targeting relationship features, which comprises converting the drug-target interaction data, drug-enzyme interaction data and drug-pathway interaction data into binary bit vectors according to the drug-target, drug-enzyme and drug-pathway interaction relationships, and calculating a drug-target similarity matrix, a drug-enzyme similarity matrix and a drug-pathway similarity matrix, respectively, based on the bit vectors; S23, fusing the four similarity feature matrices from S21 and S22 to obtain a multi-source drug similarity matrix, wherein each row of the matrix serves as a two-step drug feature of the current drug.
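The bit-vector encoding in S22 can be sketched as follows. This is an illustration under assumptions: the input is taken to be a mapping from each drug to the set of targets (or enzymes, or pathways) it interacts with, and the identifiers `drugA`, `T1` and so on are made up. Fingerprint extraction from SMILES in S21 would need a cheminformatics library and is not shown.

```python
def to_bit_vectors(drug_to_items):
    """Encode each drug's interaction set as a binary vector over a shared vocabulary."""
    # Fixed, sorted vocabulary so that every drug uses the same bit positions.
    vocabulary = sorted({item for items in drug_to_items.values() for item in items})
    index = {item: i for i, item in enumerate(vocabulary)}
    vectors = {}
    for drug, items in drug_to_items.items():
        bits = [0] * len(vocabulary)
        for item in items:
            bits[index[item]] = 1  # 1 means the drug interacts with this item
        vectors[drug] = bits
    return vocabulary, vectors


# Hypothetical drug-target interaction data.
drug_targets = {
    "drugA": {"T1", "T2"},
    "drugB": {"T2", "T3"},
    "drugC": set(),
}
vocabulary, target_vectors = to_bit_vectors(drug_targets)
```

The same function could be applied separately to the enzyme and pathway data, giving one set of bit vectors per data source.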
4. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 3, wherein the step of calculating the similarity matrix in S21 and S22 comprises: the similarity matrix is calculated by the following formula: [formula not reproduced]; wherein $N(D_A, D_B)_{11}$ is the number of attributes whose value is 1 in both bit vectors $D_A$ and $D_B$, $N(D_A, D_B)_{01}$ is the number of attributes whose value is 0 in $D_A$ and 1 in $D_B$, and $N(D_A, D_B)_{10}$ is the number of attributes whose value is 1 in $D_A$ and 0 in $D_B$; for calculating the drug-drug similarity matrix, $D_A$ and $D_B$ represent the molecular fingerprint bit vectors of drug A and drug B; for calculating the drug-target similarity matrix, $D_A$ and $D_B$ represent the One-Hot vectors of the drug-target relations; for calculating the drug-enzyme similarity matrix, $D_A$ and $D_B$ represent the One-Hot vectors of the drug-enzyme relations; for calculating the drug-pathway similarity matrix, $D_A$ and $D_B$ represent the One-Hot vectors of the drug-pathway relations.
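The formula itself is missing from this text. The three counts defined in the claim are exactly those used by the Jaccard coefficient, N11 / (N01 + N10 + N11), so the sketch below assumes that measure; the patent may use a different combination of the same counts. Function names are hypothetical.

```python
def jaccard_similarity(d_a, d_b):
    """Jaccard coefficient of two equal-length bit vectors (assumed similarity measure)."""
    n11 = sum(1 for a, b in zip(d_a, d_b) if a == 1 and b == 1)
    n01 = sum(1 for a, b in zip(d_a, d_b) if a == 0 and b == 1)
    n10 = sum(1 for a, b in zip(d_a, d_b) if a == 1 and b == 0)
    denominator = n01 + n10 + n11
    # Two all-zero vectors share no set bits; return 0.0 rather than divide by zero.
    return n11 / denominator if denominator else 0.0


def similarity_matrix(bit_vectors):
    """Pairwise similarity matrix; rows and columns follow the order of bit_vectors."""
    return [[jaccard_similarity(a, b) for b in bit_vectors] for a in bit_vectors]


# Toy bit vectors (hypothetical fingerprints or One-Hot interaction vectors).
vectors = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
sim = similarity_matrix(vectors)
```

Treating two all-zero vectors as having similarity 0.0 is a choice made here for illustration; the source does not say how that case is handled.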
5. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 3, wherein the fusing step of S23 comprises: S231, concatenating the drug-drug similarity matrix, the drug-target similarity matrix, the drug-enzyme similarity matrix and the drug-pathway similarity matrix to obtain a concatenated drug similarity matrix, wherein each row of the matrix is a one-step drug feature of the current drug obtained from physicochemical properties and targeting relationships; S232, applying a Gaussian kernel function to the concatenated drug similarity matrix to obtain a multi-source drug similarity matrix based on the Gaussian kernel function, wherein each row of the matrix is a two-step drug feature of the current drug after multi-source feature fusion.
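A minimal sketch of S231 and S232 follows, assuming row-wise concatenation and the common Gaussian (RBF) kernel form exp(-||x_i - x_j||^2 / (2 * sigma^2)). The claim does not give the kernel's exact form or bandwidth, so `sigma` and the function names are assumptions.

```python
import math


def concatenate_rows(*matrices):
    """S231: join the i-th rows of all similarity matrices into one long feature row."""
    n = len(matrices[0])
    return [sum((list(m[i]) for m in matrices), []) for i in range(n)]


def gaussian_kernel_matrix(rows, sigma=1.0):
    """S232: Gaussian kernel between every pair of concatenated feature rows."""
    n = len(rows)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared_distance = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            kernel[i][j] = math.exp(-squared_distance / (2.0 * sigma ** 2))
    return kernel


# Two toy 2x2 similarity matrices (hypothetical values) for two drugs.
structure_sim = [[1.0, 0.0], [0.0, 1.0]]
target_sim = [[1.0, 0.5], [0.5, 1.0]]

fused = concatenate_rows(structure_sim, target_sim)   # one-step drug features
kernel = gaussian_kernel_matrix(fused, sigma=1.0)     # two-step drug features
```

Each row of `kernel` would then serve as the node attribute of the corresponding drug in the heterogeneous graph.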
6. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 1, wherein the step of obtaining the drug-drug interaction adjacency matrix according to the drug-drug interaction text data in S3 comprises: the drug-drug interaction text data comprises description text describing the DDI event between each pair of drugs; keywords in the description text are replaced by fixed words to obtain the drug-drug interaction relation, wherein the keywords comprise drug names; and the drug-drug interaction adjacency matrix is constructed based on the drug-drug interaction relation.
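The construction described in this claim can be sketched as below. The placeholder words `DRUG_A` and `DRUG_B`, the record layout `(drug_a, drug_b, text)`, the symmetric adjacency matrix and the example sentences are all assumptions made for illustration; the claim only states that drug names are replaced by fixed words.

```python
def normalize_description(text, drug_a, drug_b):
    """Replace the two drug names with fixed placeholder words to get the relation text."""
    return text.replace(drug_a, "DRUG_A").replace(drug_b, "DRUG_B")


def build_adjacency(records):
    """Build a 0/1 adjacency matrix and a relation lookup from (drug_a, drug_b, text)."""
    drugs = sorted({d for a, b, _ in records for d in (a, b)})
    index = {d: i for i, d in enumerate(drugs)}
    adjacency = [[0] * len(drugs) for _ in drugs]
    relations = {}
    for drug_a, drug_b, text in records:
        i, j = index[drug_a], index[drug_b]
        adjacency[i][j] = adjacency[j][i] = 1  # assumed undirected edge
        relations[(drug_a, drug_b)] = normalize_description(text, drug_a, drug_b)
    return drugs, adjacency, relations


# Hypothetical DDI description records.
records = [
    ("DrugX", "DrugY", "DrugX may increase the effect of DrugY."),
    ("DrugY", "DrugZ", "DrugY may decrease the excretion rate of DrugZ."),
]
drugs, adjacency, relations = build_adjacency(records)
```

Normalised sentences that become identical after replacement can be grouped into one relation type, which is one way the multiple DDI relations of the heterogeneous graph could be obtained.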
7. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 5, wherein the drug-drug interaction heterogeneous graph of S3 comprises: the nodes of the drug-drug interaction heterogeneous graph are drugs; a 0 at a position in the adjacency matrix indicates that no edge exists between the two corresponding nodes, and a 1 indicates that an edge exists between them; and the attributes of each node are determined by the two-step drug features obtained from multi-source feature fusion.
8. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 1, wherein the specific steps of S5 comprise: the GNN-based link prediction model calculates the score for the existence of a link between drug nodes, and the calculation formula is as follows: $y_{u,v} = \phi(h_u^{(L)}, h_v^{(L)})$; wherein $h_u^{(L)}$ and $h_v^{(L)}$ are, respectively, the features of drug nodes $u$ and $v$ at layer $L$ as calculated by the multi-layer graph isomorphism network, and $\phi$ is the chosen prediction operation.
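As an illustration of the scoring step, the sketch below takes the prediction operation to be a dot product followed by a sigmoid. That choice is an assumption: the claim leaves the operation open, and an MLP over the two embeddings would fit it equally well. The embeddings and names are hypothetical.

```python
import math
from itertools import combinations


def link_score(h_u, h_v):
    """Score an edge from two final-layer embeddings: sigmoid of their dot product."""
    dot = sum(a * b for a, b in zip(h_u, h_v))
    return 1.0 / (1.0 + math.exp(-dot))


def score_all_pairs(embeddings):
    """Score the edge between every unordered pair of drug nodes."""
    return {(u, v): link_score(embeddings[u], embeddings[v])
            for u, v in combinations(sorted(embeddings), 2)}


# Hypothetical final-layer drug embeddings.
embeddings = {"d1": [0.0, 0.0], "d2": [1.0, 1.0], "d3": [2.0, -1.0]}
scores = score_all_pairs(embeddings)
```

A score near 1 would be read as a likely interaction and a score near 0 as an unlikely one, with the decision threshold left to the evaluation step.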
9. The DDI prediction method based on Siamese structure and graph contrast learning according to claim 1, wherein S5 further comprises performing performance evaluation on the graph contrast learning model, and the specific steps comprise: taking the area under the receiver operating characteristic curve (ROC_AUC), the accuracy (Acc), the F1 score, the precision (Precision), the recall (Recall) and the area under the PR curve (PR-AUC) as evaluation indexes; the specific calculation formulas are as follows: [formulas not reproduced]; where TP is the number of existing DDIs predicted by the model as true, TN is the number of non-existent DDIs predicted by the model as false, FP is the number of non-existent DDIs predicted by the model as true, and FN is the number of existing DDIs predicted by the model as false; ROC_AUC is obtained by plotting the ROC curve with FPR and TPR as the horizontal and vertical axes, respectively, and calculating the area under it; PR-AUC is obtained by plotting the PR curve with Recall and Precision as the horizontal and vertical axes, respectively, and calculating the area under it.
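The claim's own formulas are not reproduced in this text. The sketch below uses the standard textbook definitions of the threshold-based metrics in terms of TP, TN, FP and FN, on the assumption that the patent uses the same ones. The labels are made up, and the two AUC metrics, which need ranked scores rather than hard predictions, are not shown.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = DDI exists, 0 = no DDI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Standard definitions; a zero denominator yields 0.0 for that metric."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is also the true positive rate (TPR)
    return {
        "Acc": ratio(tp + tn, tp + tn + fp + fn),
        "Precision": precision,
        "Recall": recall,
        "F1": ratio(2 * precision * recall, precision + recall),
        "FPR": ratio(fp, fp + tn),
    }


# Hypothetical ground truth and thresholded predictions for six drug pairs.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

Sweeping the decision threshold and recording (FPR, TPR) or (Recall, Precision) at each step would give the points of the ROC and PR curves mentioned in the claim.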

Description

DDI prediction method based on Siamese structure and graph contrast learning

Technical Field

The invention relates to the technical field of bioinformatics, in particular to a DDI prediction method based on Siamese structure and graph contrast learning.

Background

Combining two or more drugs that have a synergistic effect for therapeutic purposes is called drug combination therapy. Drug combination therapy not only can improve the curative effect of the drugs, but also has less toxicity and fewer side effects. Therefore, predicting potential DDI (drug-drug interaction) relationships is very important. Early work treated DDI prediction mainly as a binary classification of whether synergy or antagonism exists between drugs. However, such binary results can hardly provide clear guidance for drug combination therapy in the real world. DDI prediction has therefore increasingly shifted to a multi-label classification problem: predicting which specific side effect or effects a pair of drugs may produce.

Typical methods for predicting drug-drug interactions are DDIMDL, MDF-SA-DDI, SSI-DDI, GMPNN, 3DGT, MTDDI, Mircle and AttentionDDI. The DDIMDL method takes chemical substructures, targets, enzymes, pathways and other physicochemical properties of drugs as drug features, and builds a prediction model based on a multilayer perceptron. The MDF-SA-DDI method takes chemical substructures, targets, enzymes and other physicochemical properties of drugs as features, performs multi-source drug fusion based on a Siamese network, a convolutional neural network and an autoencoder, performs multi-source feature fusion with a Transformer based on the self-attention mechanism, and uses a fully connected layer as the classifier to predict DDI. SSI-DDI and GMPNN take as features the chemical substructures of drugs, partitioned in different ways, and build DDI prediction models based on GAT layers and MPNN layers, respectively, together with co-attention layers. 3DGT takes drug substructures and DDI relationship text as features, embeds them with a CNN and BERT, and builds a prediction model based on a deep neural network from these features. MTDDI takes a multi-relation DDI network as its feature and obtains a DDI prediction model based on R-GCN and a tensor decoder. Mircle takes the DDI network as its feature and builds a DDI prediction model based on GCN and a bond-aware attention message propagation method. AttentionDDI takes drug similarity matrices as features and builds a DDI prediction model based on a Transformer and a Siamese encoder.

However, most DDI relationships in the real world are unknown, and known DDI relationships are sparse, especially for new drugs. Methods whose performance depends on the amount of DDI text data struggle to achieve their best results under these conditions. It is therefore important to enable models to learn potential DDI relationship information when facing sparse DDI relationships, and providing a drug-drug interaction prediction method with this capability has become an urgent problem for practitioners in the field.

Disclosure of Invention

The invention aims to provide a DDI prediction method based on Siamese structure and graph contrast learning that addresses the defects in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: the invention provides a DDI prediction method based on Siamese structure and graph contrast learning, which comprises the following steps: S1, collecting drug-drug interaction text data, a physicochemical property data file and a targeting relationship data file of a drug, and screening and preprocessing the collected data to obtain a drug-drug interaction original data set to be processed; S2, extracting physicochemical property features based on the SMILES molecular format character strings in the drug-drug interaction original data set to be processed, and targeting relationship features based on targets, enzymes and pathways, and fusing the physicochemical property features and the targeting relationship features to obtain initial features based on the physicochemical properties of the drug; S3, obtaining a drug-drug interaction adjacency matrix according to the drug-drug interaction text data, and constructing a drug-drug interaction heterogeneous graph by combining the initial features based on the drug physicochemical properties and targeting relationships obtained in S2 with the drug-drug interaction adjacency matrix; S4, inputting the drug-drug interaction heterogeneous graph into a Siamese structure-based graph contrast learning model, and learning from the graph topological structure to obtain the embedded characteristi