CN-116884474-B - Trusted drug-target correlation prediction method

Abstract

The invention discloses a trusted drug-target correlation prediction method. The method represents the structural features of a drug-target pair: starting from the initial character sequences of the drug and the target, a convolutional representation learning module produces feature vectors that capture their structural features, and a deep evidence module then makes a trusted prediction of the drug-target correlation coefficient. A stable, trusted prediction model can be trained on the large database of drug-target association pairs built from a large number of chemical experiment results. Drug-target pairs are screened before they enter drug experiments, so that reliable active association pairs are found in advance and the drug research and development cycle is shortened. At the same time, a large number of "reliable" inactive association pairs are screened out, greatly reducing the experimental cost spent on inactive pairs.

Inventors

  • LI JIN
  • WEN CHAOYU
  • Xiong Qiangwei
  • TANG YINGCHUN
  • Xie Xingran
  • LI CANPENG

Assignees

  • Yunnan University (云南大学)

Dates

Publication Date
20260508
Application Date
20230618

Claims (6)

  1. A trusted drug-target correlation prediction method, comprising the following steps:
     (1) Obtaining feature vectors representing the structural features of the drug and the target, specifically:
     (1-1) scanning the original structure sequence (SMILES sequence) of the drug with the Morgan fingerprint generation algorithm to obtain a molecular fingerprint in the form of a binary vector; collecting the indices of all fingerprint positions coded as '1' to form the coded vector of drug C_i; and feeding the coded vector of drug C_i into the drug structure-information query matrix E_C, selecting the corresponding rows of E_C to form the feature map matrix of the drug;
     (1-2) feeding the feature map matrix of drug C_i into a convolutional neural network C_C, flattening the output of C_C into a column vector, and applying a dimension conversion through a fully connected layer to obtain the feature vector of the structural features of the drug;
     (1-3) scanning the initial protein amino acid sequence of target P_i using a character encoding table to obtain an initial encoding vector, and feeding the initial encoding vector of the target into the target structure-information query matrix E_P, selecting rows of E_P according to the indices in the encoding vector to obtain the initial feature map matrix of the target;
     (1-4) feeding the feature map matrix of target P_i into a convolutional neural network C_P, flattening the output feature map, and applying a dimension conversion through a fully connected layer to obtain the feature vector of target P_i;
     (2) Fusing the feature vectors and using the fused feature vector as the input of a deep evidence classification model, which outputs the evidence vector e_i of the drug-target pair; e_i is used to determine the parameters α_i, β_i of the predicted Dirichlet distribution Beta(p_i | α_i, β_i); a loss optimization function is constructed from this distribution, and the parameters of the deep evidence classification model are updated with minimization of the loss optimization function as the objective, wherein y_i, a random variable following the predicted binomial distribution, describes the prediction label; p_i, a random variable following the Dirichlet distribution over the binomial distribution, describes the prediction uncertainty; and BCE(·) denotes the cross-entropy function;
     (3) Inputting the new drug-target pair to be predicted into the deep evidence classification model: performing feature extraction and fusion using step (1), obtaining the predicted Dirichlet distribution Beta(p_i | α_i, β_i) of the drug-target pair using step (2), performing trusted correlation prediction, and returning the correlation coefficient of the drug-target pair and the uncertainty coefficient of the prediction to the user as the prediction result.
  2. The method of claim 1, wherein in step (2), the deep evidence classification model comprises two fully connected layers and uses ReLU as the activation function to convert the output to a non-negative value, e_i being the output evidence vector, computed with the parameters of the deep evidence classification model.
  3. The method of claim 2, wherein the evidence vector e_i is used to determine the predicted Beta distribution Beta(p_i | α_i, β_i) of the drug-target pair; according to subjective logic theory, the predicted active and inactive probabilities of the drug-target pair and the overall uncertainty of the prediction are each described in terms of this distribution, where k = 2 indicates a two-class prediction problem, i.e. prediction as active or inactive.
  4. The trusted drug-target correlation prediction method of claim 1 or claim 3, wherein in step (2), in the training phase of the deep evidence classification model, a Beta loss optimization function is constructed for each drug-target pair in the training set, wherein BCE(·) denotes the cross-entropy function; in addition, the KL divergence is chosen as a regularization term to further limit the amount of evidence assigned to the wrong class, and together with the Beta loss it forms the loss optimization function, wherein the Beta distribution parameters used in the regularization term are those obtained after the non-erroneous evidence is removed from α_i, β_i, and λ_t ∈ [0, 1] is the regularization annealing coefficient.
  5. The method of claim 4, wherein in step (2), in the training phase of the deep evidence classification model, with minimization of the loss optimization function as the objective, the structure-information query matrices E_C and E_P of the drug and the target and the parameters of the evidence module are solved for and updated synchronously by gradient descent, the loss value is gradually reduced to a given threshold, and the model parameters obtained at that point are stored.
  6. The trusted drug-target correlation prediction method of claim 1, wherein step (3) specifically comprises: (3-1) obtaining the initial SMILES sequence and the protein amino acid sequence of the drug-target pair to be predicted from the PubChem database and the RCSB PDB database, respectively; (3-2) performing feature extraction and fusion using step (1), performing trusted correlation prediction on the drug-target pair using step (2), and returning the correlation coefficient of the drug-target pair and the uncertainty coefficient u of the prediction to the user as the prediction result.
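
The index encoding and row lookup of steps (1-1) and (1-3) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the 8-bit toy fingerprint stands in for a real Morgan fingerprint (which would normally come from a cheminformatics toolkit such as RDKit), the 20-letter amino acid table is a hypothetical character encoding table, and the randomly initialized matrices stand in for the learned query matrices E_C and E_P. The convolutional networks of steps (1-2) and (1-4) are not shown.

```python
import random

def fingerprint_to_indices(bits):
    """Collect the positions of all fingerprint bits equal to 1 (the drug's coded vector)."""
    return [i for i, b in enumerate(bits) if b == 1]

def encode_sequence(seq, table):
    """Map each character of a sequence to its integer code via an encoding table."""
    return [table[ch] for ch in seq]

def lookup_rows(matrix, indices):
    """Select the rows named by `indices` to form a feature map matrix."""
    return [matrix[i] for i in indices]

def make_query_matrix(n_rows, dim, seed):
    """Hypothetical stand-in for a learned query matrix: small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]

# Toy inputs (assumptions, not data from the patent).
toy_fingerprint = [0, 1, 0, 0, 1, 1, 0, 1]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
char_table = {ch: i for i, ch in enumerate(AMINO_ACIDS)}

E_C = make_query_matrix(len(toy_fingerprint), 4, seed=0)  # one row per fingerprint bit
E_P = make_query_matrix(len(AMINO_ACIDS), 4, seed=1)      # one row per amino acid code

# Drug branch: indices of the '1' bits select rows of E_C.
drug_indices = fingerprint_to_indices(toy_fingerprint)
drug_feature_map = lookup_rows(E_C, drug_indices)

# Target branch: character codes select rows of E_P.
target_codes = encode_sequence("MKTAYIAK", char_table)
target_feature_map = lookup_rows(E_P, target_codes)
```

In both branches the result is a matrix with one row per selected index, which is the shape a convolutional network would then consume.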

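The formulas that claims 2 and 3 refer to are not present in this text. As a hedged illustration only, the sketch below uses the form commonly found in evidential deep learning under subjective logic: evidence is made non-negative with ReLU, α and β are the evidence plus one, belief masses are evidence divided by the total strength S = α + β, and uncertainty is k / S with k = 2. These formulas, the function names, and the dictionary keys are assumptions, not the patent's confirmed definitions.

```python
def relu(x):
    """ReLU activation: keeps evidence non-negative."""
    return x if x > 0.0 else 0.0

def evidence_to_beta(raw_active, raw_inactive):
    """Turn two raw network outputs into Beta parameters (assumed: parameter = evidence + 1)."""
    return relu(raw_active) + 1.0, relu(raw_inactive) + 1.0

def trusted_prediction(alpha, beta, k=2):
    """Belief masses, uncertainty, and expected activity probability (assumed forms)."""
    strength = alpha + beta
    return {
        "belief_active": (alpha - 1.0) / strength,
        "belief_inactive": (beta - 1.0) / strength,
        "uncertainty": k / strength,
        "expected_probability": alpha / strength,
    }

# Strong evidence for "active", none for "inactive" (toy numbers).
alpha, beta = evidence_to_beta(8.0, -3.0)
confident = trusted_prediction(alpha, beta)

# No evidence at all: the prediction is maximally uncertain.
vacuous = trusted_prediction(*evidence_to_beta(0.0, 0.0))
```

Under these assumed forms the two belief masses and the uncertainty always sum to one, so a pair with little total evidence is reported with high uncertainty regardless of which class it leans toward.
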
Description

Trusted drug-target correlation prediction method

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a trusted drug-target correlation prediction method.

Background

Drug discovery that meets pharmaceutical industry standards requires collaboration across many scientific fields, and verifying drug-target associations with traditional experimental methods is often time-consuming and expensive. In recent years, with the development of machine learning and deep neural networks, large databases of drug-target association pairs built from many experimental results have been used to screen drug-target pairs before they enter drug experiments, so that "active" association pairs are found in advance, experimental cost is greatly reduced, and the drug research and development cycle is shortened.

However, these methods still have limitations. A traditional machine learning model for drug-target correlation prediction can only judge whether a given drug-target pair is correlated; it cannot say how reliable that judgment is. When drug-target pairs predicted to be "active" are put into chemical experiments, the predicted activity often cannot be verified, which wastes experimental cost, while simply ignoring pairs judged "inactive" may leave some truly "active" pairs undiscovered. The model therefore needs to assess the reliability of each correlation prediction as it makes it, further reducing the experimental cost spent on "unreliable" data points.

Disclosure of Invention

In order to solve the problems in the prior art, the invention aims to provide a trusted drug-target correlation prediction method that solves the problem that existing drug-target correlation prediction models cannot judge the reliability of their results, improves the screening quality of drug-target pairs through a reliability index, and further shortens the drug discovery cycle.

To achieve this aim, the technical scheme adopted by the invention is a trusted drug-target correlation prediction method comprising the following steps:
(1) Obtaining feature vectors representing the structural features of the drug and the target;
(2) Fusing the feature vectors and using the fused feature vector as the input of a deep evidence classification model, which outputs the evidence vector e_i of the drug-target pair; e_i is used to determine the parameters α_i, β_i of the predicted Dirichlet distribution Beta(p_i | α_i, β_i); a loss optimization function is constructed from this distribution, and the parameters of the deep evidence classification model are updated with minimization of the loss optimization function as the objective, wherein y_i, a random variable following the predicted binomial distribution, describes the prediction label; p_i, a random variable following the Dirichlet distribution over the binomial distribution, describes the prediction uncertainty; and BCE(·) denotes the cross-entropy function;
(3) Inputting the new drug-target pair to be predicted into the deep evidence classification model: performing feature extraction and fusion using step (1), obtaining the predicted Dirichlet distribution Beta(p_i | α_i, β_i) of the drug-target pair using step (2), performing trusted correlation prediction, and returning the correlation coefficient of the drug-target pair and the uncertainty coefficient of the prediction to the user as the prediction result.
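
The loss formula of step (2) is not present in this text, so the following is only a sketch of one plausible reading: the expected cross entropy of the label y under Beta(p | α, β), which has a closed form in terms of the digamma function, plus the KL regularizer mentioned in claim 4, taken here as the divergence from the uniform Beta(1, 1) after the evidence for the true class has been removed. The exact form, the linear annealing schedule, and all function names are assumptions; α is taken to carry the evidence for "active" (y = 1) and β the evidence for "inactive" (y = 0).

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def expected_bce(y, alpha, beta):
    """E[BCE(y, p)] for p ~ Beta(alpha, beta), using E[ln p] = psi(alpha) - psi(alpha + beta)."""
    total = digamma(alpha + beta)
    return y * (total - digamma(alpha)) + (1 - y) * (total - digamma(beta))

def kl_to_uniform(a, b):
    """KL( Beta(a, b) || Beta(1, 1) )."""
    total = digamma(a + b)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * (digamma(a) - total) + (b - 1.0) * (digamma(b) - total)

def annealing_coefficient(epoch, ramp_epochs=10):
    """Hypothetical schedule for lambda_t in [0, 1]: linear ramp, then constant."""
    return min(1.0, epoch / ramp_epochs)

def regularized_loss(y, alpha, beta, lam):
    """Expected BCE plus lam times the KL penalty on evidence for the wrong class."""
    alpha_t = y + (1 - y) * alpha      # evidence for the true class removed
    beta_t = (1 - y) + y * beta
    return expected_bce(y, alpha, beta) + lam * kl_to_uniform(alpha_t, beta_t)
```

With this form, evidence for the correct class lowers the loss without being penalized, while any evidence placed on the wrong class raises both the expected cross entropy and the KL term, which matches the stated purpose of limiting evidence on the error category.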
As a further improvement of the present invention, in step (1), obtaining the feature vector representing the structural features of the drug specifically comprises the following steps:
(1-1) scanning the original structure sequence (SMILES sequence) of the drug with the Morgan fingerprint generation algorithm to obtain a molecular fingerprint in the form of a binary vector; collecting the indices of all fingerprint positions coded as '1' to form the coded vector of drug C_i; and feeding the coded vector of drug C_i into the drug structure-information query matrix E_C, selecting the corresponding rows of E_C to form the feature map matrix of the drug;
(1-2) feeding the feature map matrix of drug C_i into a convolutional neural network C_C, flattening the output of C_C into a column vector, and applying a dimension conversion through a fully connected layer to obtain the feature vector of the structural features of the drug.

As a further improvement of the present invention, in step (1), obtaining the feature vector representing the structural features of the target specifically comprises the following steps:
(1-3) scanning the initial protein amino acid sequence of target P_i using a character encoding table to obtain an initial encoding vector, and sending the initi