
CN-121983113-A - Drug-target association prediction method and system based on multi-modal fusion

CN 121983113 A

Abstract

The invention discloses a drug-target association prediction method and system based on multi-modal fusion. It proposes a modality-preserving, interactive fusion strategy: a bilinear model retains the modality-specific information of the drug and the target, such as sequence semantics and three-dimensional geometric structure, while explicitly modeling the complex high-order interactions among the modalities. Cross-modal contrastive learning is introduced as a regularization constraint so that the representations of different modalities of the same entity remain consistent. This addresses the information loss and missing interactions caused by forcing a unified representation in traditional methods.

Inventors

  • WANG JIACHENG
  • CHEN YAOJIA
  • HAN HONGBIN
  • ZOU QUAN

Assignees

  • Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China

Dates

Publication Date
2026-05-05
Application Date
2025-12-22

Claims (10)

  1. A drug-target association prediction method based on multi-modal fusion, characterized by comprising the following steps: S1, acquiring the SMILES sequence and three-dimensional structure information of a drug molecule, the amino acid sequence and three-dimensional structure information of a target protein, and a knowledge graph constructed from heterogeneous biomedical information; S2, extracting multi-modal feature representations of the drug and the target protein respectively, including: extracting a sequence semantic feature representation of the drug with a pre-trained molecular language model; extracting a three-dimensional geometric structural feature representation of the drug with a geometric neural network; extracting a sequence semantic feature representation of the target protein with a pre-trained protein language model; extracting a three-dimensional spatial structural feature representation of the target protein with a structure graph neural network; and, based on the knowledge graph, extracting knowledge-graph structural feature representations of the drug and the target protein with a graph attention network; S3, performing cross-modal interactive fusion of the sequence semantic feature representations, the structural feature representations, and the knowledge-graph structural feature representations through a bilinear fusion model, introducing a contrastive learning regularization constraint to strengthen the correlation among modalities, and completing model training with a joint loss function; S4, inputting the multi-modal feature representations of the drug and the target protein to be predicted into the trained model, and outputting a prediction result for the drug-target interaction.
  2. The multi-modal fusion-based drug-target association prediction method of claim 1, wherein the three-dimensional geometric structural feature representation is extracted by: acquiring the three-dimensional coordinate information of the drug molecule and, based on it, constructing an atom-bond graph $G = (V, \mathcal{E})$ and a bond-angle graph $H = (\mathcal{E}, A)$, wherein the node set $V$ corresponds to all atoms in the drug molecule, the edge set $\mathcal{E}$ corresponds to the chemical bonds between atoms, and the edge set $A$ corresponds to the bond angles formed between chemical bonds; taking the chemical features of atoms and the type features of chemical bonds as the node features and edge features of the atom-bond graph, respectively, and taking the geometric features of bond angles as the edge features of the bond-angle graph; inputting the complete features of the atom-bond graph and the bond-angle graph into a pre-trained geometric graph neural network; iteratively updating the representations of atoms, bonds, and bond angles through message passing; and obtaining the three-dimensional geometric structural feature representation through a pooling operation.
  3. The multi-modal fusion-based drug-target association prediction method of claim 1, wherein the three-dimensional spatial structural feature representation is extracted by: taking each amino acid residue as a graph node and constructing three classes of directed edges: sequence edges, connecting residues no more than 3 positions apart in the linear sequence; radius edges, connecting residues whose Cα atoms are less than 10 Å apart in the three-dimensional structure; and k-nearest-neighbor edges, connecting each residue to its 10 nearest residues by spatial distance; merging the three classes of edges and inputting them into the structure graph neural network, which aggregates neighbor information through relation-specific learnable transformations, fuses it with the node's own information, and obtains residue-level features after multiple layers of iteration; and pooling the residue-level features to generate the three-dimensional spatial structural feature representation of the target protein.
  4. The multi-modal fusion-based drug-target association prediction method of claim 1, wherein the knowledge-graph structural feature representation is extracted by: using the graph attention network to perform weighted aggregation over the neighbor features of each entity in the knowledge graph, capturing the entity's local topology and semantic information; introducing the TransE energy function $f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$ as a constraint, in which $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the embedding vectors of the head entity, the relation, and the tail entity, respectively; and training with a margin loss function of the form $L = \sum_{(h,r,t) \in T} \sum_{(h',r,t') \in T'} \max\bigl(0,\ \gamma + f(h, r, t) - f(h', r, t')\bigr)$, wherein $T$ is the set of positive triples, $T'$ is the set of negative triples, and $\gamma$ is the margin parameter.
  5. The multi-modal fusion-based drug-target association prediction method of any one of claims 1-4, wherein the molecular language model is a ChemBERTa model based on the Transformer architecture and pre-trained on large-scale SMILES sequences; the protein language model is a ProtBERT model based on the Transformer architecture and pre-trained on large-scale amino acid sequences; the geometric neural network is a GeoGNN model; the structure graph neural network is a GearNet model; and the graph attention network is a GAT model with a TransE loss.
  6. The multi-modal fusion-based drug-target association prediction method according to claim 1, wherein in step S3 the bilinear fusion model implements multi-modal interaction based on a Tucker decomposition, by a formula whose terms are: the entity embedding after multi-modal fusion; the latent representation of each modality after transformation; the transformation matrix of each modality; and the core tensor; and wherein s corresponds to the knowledge-graph modality, v corresponds to the structural modality of the two molecules, t corresponds to the sequence modality of the two molecules, and m corresponds to the result of fusing the modalities.
  7. The multi-modal fusion-based drug-target association prediction method of claim 1, wherein in step S3 introducing the contrastive learning regularization constraint comprises: taking pairs of representations of different modalities of the same entity as positive samples, taking pairs of modality representations of different entities as negative samples, calculating the difference between samples with a distance metric function, and optimizing modality alignment with the contrastive learning loss of formula (6), wherein $L_{CL_i}$ is the contrastive loss for the i-th entity, $M$ is the set of modalities, the paired terms are the embeddings of the i-th entity in modalities p and q, and $N$ is the number of samples in a mini-batch.
  8. The multi-modal fusion-based drug-target association prediction method according to claim 1, wherein in step S3 the joint loss function is defined in terms of the binary cross-entropy loss of the k-th modality and a learnable weight parameter, where k ranges over the modalities: s corresponds to the knowledge-graph modality, v corresponds to the structural modality of the two molecules, t corresponds to the sequence modality of the two molecules, and m corresponds to the result of fusing the modalities.
  9. The method according to claim 1, wherein in step S1 the heterogeneous biomedical information is derived from at least one of the databases KEGG, DrugBank, InterPro, and UniProt, and the knowledge graph comprises at least three of the node types drug, protein, pathway, BRITE, and biological process, and at least two of the edge types drug-protein, protein-protein, and drug-pathway.
  10. A drug-target association prediction system based on multi-modal fusion, characterized in that the system is used for implementing the method of any one of claims 1-9 and comprises: a data acquisition module, used for acquiring the SMILES sequence and three-dimensional structure information of a drug molecule, the amino acid sequence and three-dimensional structure information of a target protein, and a knowledge graph constructed from heterogeneous biomedical information; a feature extraction module, comprising a molecular sequence feature extraction unit, a drug structure feature extraction unit, a protein sequence feature extraction unit, a protein structure feature extraction unit, and a knowledge graph feature extraction unit, which are respectively used for extracting the sequence semantic feature representation of the drug, the three-dimensional geometric structural feature representation of the drug, the sequence semantic feature representation of the target protein, the three-dimensional spatial structural feature representation of the target protein, and the knowledge-graph structural feature representation; a multi-modal fusion module, used for realizing cross-modal interactive fusion through a bilinear fusion model, introducing a contrastive learning regularization constraint, and completing model training with a joint loss function; and a prediction output module, used for receiving the multi-modal feature representations of the drug and the target protein to be predicted and outputting a prediction result for the drug-target interaction through the trained model.
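
As an illustration of the residue graph described in claim 3, the following is a minimal sketch of how the three classes of directed edges could be enumerated from Cα coordinates. The function name `build_residue_edges`, the edge labels, and the brute-force distance computation are assumptions made for illustration; the patent does not specify an implementation, and the claimed method feeds the resulting edges to a structure graph neural network, which is not shown here.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]
Edge = Tuple[int, int, str]  # (source residue, target residue, edge class)


def build_residue_edges(
    ca_coords: List[Coord],
    seq_window: int = 3,   # claim 3: no more than 3 positions apart
    radius: float = 10.0,  # claim 3: C-alpha distance below 10 angstroms
    k: int = 10,           # claim 3: 10 nearest neighbour residues
) -> Set[Edge]:
    """Enumerate sequence, radius, and k-nearest-neighbour edges (hypothetical helper)."""
    n = len(ca_coords)
    edges: Set[Edge] = set()
    for i in range(n):
        # Distance from residue i to every other residue (brute force, O(n^2)).
        dists = [
            (math.dist(ca_coords[i], ca_coords[j]), j)
            for j in range(n)
            if j != i
        ]
        for d, j in dists:
            if abs(i - j) <= seq_window:
                edges.add((i, j, "sequence"))
            if d < radius:
                edges.add((i, j, "radius"))
        # The k closest residues by spatial distance.
        for _, j in sorted(dists)[:k]:
            edges.add((i, j, "knn"))
    return edges
```

Keeping the edge class as a label on each edge, rather than merging duplicates across classes, preserves the relation type that a relation-specific transformation would need.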

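The TransE constraint and margin loss named in claim 4 can be illustrated with a short sketch in the standard TransE form. The choice of the L2 norm, the one-to-one pairing of each positive triple with a single negative triple, the default margin of 1.0, and the names `transe_energy` and `margin_loss` are all assumptions for illustration; the extracted patent text does not state these details.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]
Triple = Tuple[Vec, Vec, Vec]  # (head, relation, tail) embedding vectors


def transe_energy(h: Vec, r: Vec, t: Vec) -> float:
    """Energy ||h + r - t|| using the L2 norm (the norm is an assumption)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(
    positives: List[Triple],
    negatives: List[Triple],
    margin: float = 1.0,  # hypothetical default; the patent gives no value
) -> float:
    """Sum of hinge terms max(0, margin + f(pos) - f(neg)) over paired triples."""
    total = 0.0
    for pos, neg in zip(positives, negatives):
        total += max(0.0, margin + transe_energy(*pos) - transe_energy(*neg))
    return total
```

The loss is zero once every negative triple's energy exceeds its paired positive triple's energy by at least the margin, which is what pushes true triples toward lower energy than corrupted ones.
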
Description

Drug-target association prediction method and system based on multi-modal fusion

Technical Field

The invention belongs to the field of computational bioinformatics, and in particular relates to a drug-target association prediction method and system based on multi-modal fusion.

Background

Drug-target interaction prediction is important in bioinformatics for accelerating drug discovery and reducing development cost. Traditional biological experiments are time-consuming and expensive, so computational methods and artificial intelligence tools have become important complementary means. Existing computational methods fall mainly into several categories. Methods based on medicinal chemistry features cast prediction as a classification problem, using molecular fingerprints and protein sequence features together with support vector machines, random forests, or deep learning models. Structure-based methods depend on three-dimensional information about the target and are limited when the structure is unknown. Ligand-based methods are limited when known activity data are scarce. In addition, knowledge-graph-based methods have developed rapidly; they infer potential interactions by constructing a multi-modal biomedical knowledge graph and recasting the task as a link prediction problem.
However, existing methods still face significant challenges. Knowledge graphs are often incomplete, and traditional link prediction models are easily affected by bias in the triple structure. More importantly, for multi-modal fusion, existing methods usually force the different modalities of drugs and targets, such as sequences and structures, into a single shared space. Although this can mine what the modalities have in common, it inevitably loses the information specific to each modality and cannot fully describe the complex high-order interactions among modalities, which limits the accuracy and generalization ability of the prediction model.

Disclosure of Invention

The invention aims to provide a drug-target association prediction method and system based on multi-modal fusion in view of the problems existing in the prior art. To achieve this purpose, the invention adopts the following technical scheme: a drug-target association prediction method based on multi-modal fusion, comprising the following steps:

S1, acquiring the SMILES sequence and three-dimensional structure information of a drug molecule, the amino acid sequence and three-dimensional structure information of a target protein, and a knowledge graph constructed from heterogeneous biomedical information;

S2, extracting multi-modal feature representations of the drug and the target protein respectively, including: extracting a sequence semantic feature representation of the drug with a pre-trained molecular language model; extracting a three-dimensional geometric structural feature representation of the drug with a geometric neural network; extracting a sequence semantic feature representation of the target protein with a pre-trained protein language model; extracting a three-dimensional spatial structural feature representation of the target protein with a structure graph neural network; and, based on the knowledge graph, extracting knowledge-graph structural feature representations of the drug and the target protein with a graph attention network;

S3, performing cross-modal interactive fusion of the sequence semantic feature representations, the structural feature representations, and the knowledge-graph structural feature representations through a bilinear fusion model, introducing a contrastive learning regularization constraint to strengthen the correlation among modalities, and completing model training with a joint loss function;

S4, inputting the multi-modal feature representations of the drug and the target protein to be predicted into the trained model, and outputting a prediction result for the drug-target interaction.

In the above drug-target association prediction method based on multi-modal fusion, the three-dimensional geometric structural feature representation is extracted as follows: acquiring the three-dimensional coordinate information of the drug molecule and, based on it, constructing an atom-bond graph $G = (V, \mathcal{E})$ and a bond-angle graph $H = (\mathcal{E}, A)$, wherein the node set $V$ corresponds to all atoms in the drug molecule, the edge set $\mathcal{E}$ corresponds to the chemical bonds between atoms, and the edge set $A$ corresponds to the bond angles formed between chemical bonds; and taking the chemical features of atoms and the type features of chemical bonds as the node features and edge features, respectively, of an at