CN-120183537-B - Polypeptide drug toxicity prediction method and system based on artificial intelligence

Abstract

The invention discloses an artificial intelligence-based polypeptide drug toxicity prediction method and system in the technical field of drug screening. The method comprises: collecting sequence data and property labels of polypeptide molecules; constructing a rate constant prediction model of polypeptide metabolism to calculate rate constants; and constructing a machine learning model with kinetic simulation to obtain toxicity prediction results for the polypeptide molecules and their hydrolysis products. Combining word segmentation and context statistics with word embedding vectors of the polypeptide allows the sequence information to be extracted more fully, so that polypeptide toxicity is predicted more accurately. In addition, a first-order reaction kinetic equation is introduced so that the toxicity of polypeptide metabolites is taken into account, which completes the evaluation of the overall toxicity of the polypeptide. Furthermore, guided by the overall toxicity prediction, the corresponding amino acid combination information is extracted from the context word-frequency information, which makes the prediction of toxic fragments more accurate.

Inventors

WANG XIAOGANG
YAO JIAWEI

Assignees

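Third Affiliated Hospital of Southern Medical University (Guangdong Provincial Institute of Orthopedics) (南方医科大学第三附属医院（广东省骨科研究院）)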

Dates

Publication Date: 20260512
Application Date: 20250310

Claims (9)

1. An artificial intelligence-based polypeptide drug toxicity prediction method, characterized by comprising the following steps: S1, collecting sequence data and properties of polypeptide molecules, wherein the sequence data and properties comprise the sequences of the polypeptide molecules, related biological activity data, physicochemical property parameters, known toxicity information and possible metabolite data; S2, constructing a rate constant prediction model of polypeptide metabolism based on a Bayesian regression model; S3, constructing a polypeptide toxicity prediction model with kinetic simulation and predicting toxicity scores of the polypeptide molecules and their hydrolysis products, including extracting fragment sequences from the polypeptide sequences to generate sequence features of the polypeptides and polypeptide fragments, and inputting the selected feature vectors into the trained polypeptide toxicity prediction model to obtain toxicity prediction results for the polypeptide molecules and their hydrolysis products; wherein constructing the polypeptide toxicity prediction model based on the kinetic simulation in step S3 includes S34, calculating the rate of change of concentration with time based on kinetic modeling, constructing the polypeptide toxicity prediction model using a convolutional neural network, and establishing the relationship between polypeptides and toxicity; when the rate of change of concentration with time is calculated based on kinetic modeling, the metabolic pathway of the polypeptide P in the body is assumed to follow first-order reaction kinetics, with the rate constants predicted by the model of S2, expressed as follows: $\frac{d[P]}{dt} = -k[P]$, $\frac{d[M_i]}{dt} = k[P] - k_i[M_i]$, $i = 1, 2, \ldots, n$; wherein $[P]$ and $[M_i]$ are the concentrations of the polypeptide and of the $i$-th polypeptide metabolite, $k$ and $k_i$ are the rate constants of the polypeptide and of the polypeptide metabolite, respectively, $\frac{d[P]}{dt}$ is the rate of change of the polypeptide concentration $[P]$ with time $t$, and $\frac{d[M_i]}{dt}$ is the rate of change of the metabolite concentration $[M_i]$ with time; at time $t$, the metabolite concentration is $[M_i](t) = \frac{k[P]_0}{k_i - k}\left(e^{-kt} - e^{-k_i t}\right)$, where $[P]_0$ is the initial polypeptide concentration, with peak time $t_{\max,i} = \frac{\ln(k_i/k)}{k_i - k}$ and peak concentration $[M_i]_{\max} = [M_i](t_{\max,i})$.
2. The artificial intelligence-based polypeptide drug toxicity prediction method according to claim 1, wherein the rate constant prediction model of polypeptide metabolism in step S2 constructs a relationship between molecular weight, hydrophobicity, molecular fragment descriptors, charge distribution, number of hydrogen bond donors and acceptors and metabolic rate constants of polypeptides and their metabolites.
3. The method according to claim 2, wherein the step S2 further comprises feature selection and importance assessment, wherein key features in the metabolic rate constant prediction are screened out by using the feature importance assessment, and the contribution of each feature to the model prediction is quantified.
4. The artificial intelligence-based polypeptide drug toxicity prediction method according to claim 1, wherein step S3 further comprises: S31, data preprocessing: acquiring a drug data set and, by processing raw biochemical test data, extracting the molecular weight, hydrophobicity, molecular fragment descriptors, charge distribution, polypeptide sequences, polypeptide toxicity data, metabolite data and number of secondary structures of the polypeptides; S32, performing a word segmentation operation on the polypeptide sequence: obtaining polypeptide fragments using a sliding window of length T, counting fragment frequencies to form a key fragment list, and constructing the feature space of the sequence; S33, representing the amino acid sequence using a word embedding model, mapping each amino acid to a high-dimensional vector, and capturing semantic relationships and interactions among the amino acids to obtain a feature vector of the polypeptide sequence.
5. The artificial intelligence-based polypeptide drug toxicity prediction method according to claim 4, wherein the polypeptide toxicity prediction model constructed using a convolutional neural network is expressed by a formula [formula missing in source], wherein the terms of the formula denote the peak concentration of the i-th polypeptide metabolite, the weights of the model, and a nonlinear activation function, and i is the index of the polypeptide metabolite.
6. The method according to claim 4, wherein step S34 further comprises dividing the data set of step S31 into a training set, a validation set and a test set for training, parameter tuning and evaluation of the model, and training on the training set labeled with polypeptide toxicity to learn the sequence features and their combined relationship with polypeptide toxicity.
7. The method according to claim 4, wherein step S34 further comprises, after training, evaluating the trained model using the validation set, evaluating the model's performance on unseen data using the test set to check its accuracy and generalization ability in predicting polypeptide toxicity, and tuning and improving the model according to an analysis of the results.
8. An artificial intelligence-based polypeptide drug toxicity prediction system based on the artificial intelligence-based polypeptide drug toxicity prediction method of any one of claims 1 to 7, comprising: a data acquisition module for acquiring sequence data and property labels of the polypeptide molecules and identifying sequence combination features in the polypeptide drug molecules; a rate constant prediction module of polypeptide metabolism for predicting the rate constant of polypeptide metabolism; and a toxicity prediction module for extracting fragment sequences from a polypeptide sequence to generate sequence features of the polypeptide and the polypeptide fragments, inputting the selected feature vectors into a trained toxicity prediction model to predict the drug properties, and constructing a machine learning model with kinetic simulation to obtain toxicity prediction results for the polypeptide molecule and its hydrolysis products.
9. A computer-readable storage medium having stored thereon program instructions of an artificial intelligence-based polypeptide drug toxicity prediction method executable by one or more processors to implement the steps of the artificial intelligence-based polypeptide drug toxicity prediction method of any one of claims 1 to 7.
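
The first-order kinetics recited in claim 1 can be illustrated with a minimal sketch. It assumes that each metabolite is formed from the polypeptide with rate constant k, is eliminated with its own rate constant k_i, and starts at zero concentration; under those assumptions the metabolite concentration, peak time and peak concentration have the standard closed forms used below. The function names and the parameter `p0` (initial polypeptide concentration) are hypothetical and do not come from the patent; in the claimed method the rate constants would be supplied by the model of step S2.

```python
import math


def metabolite_concentration(t, p0, k, k_i):
    """Concentration of a metabolite at time t for P -> M_i -> (eliminated).

    Assumes d[P]/dt = -k[P], d[M_i]/dt = k[P] - k_i[M_i], [M_i](0) = 0.
    p0 is the initial polypeptide concentration (hypothetical parameter name).
    """
    if math.isclose(k, k_i):
        # Limit of the general formula when the two rate constants coincide.
        return k * p0 * t * math.exp(-k * t)
    return k * p0 / (k_i - k) * (math.exp(-k * t) - math.exp(-k_i * t))


def peak_time(k, k_i):
    """Time at which d[M_i]/dt = 0, i.e. the metabolite concentration peaks."""
    if math.isclose(k, k_i):
        return 1.0 / k
    return math.log(k_i / k) / (k_i - k)


def peak_concentration(p0, k, k_i):
    """Peak concentration, obtained by substituting the peak time back in."""
    return metabolite_concentration(peak_time(k, k_i), p0, k, k_i)


if __name__ == "__main__":
    # Illustrative values only; not data from the patent.
    p0, k, k_i = 1.0, 1.0, 2.0
    print(peak_time(k, k_i), peak_concentration(p0, k, k_i))
```

With the illustrative values k = 1 and k_i = 2, the peak time is ln 2 and the peak concentration is one quarter of the initial polypeptide concentration, which matches the equivalent closed form p0 * (k / k_i) ** (k_i / (k_i - k)).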

Description

Polypeptide drug toxicity prediction method and system based on artificial intelligence

Technical Field

The invention relates to the field of drug screening, in particular to an artificial intelligence-based polypeptide drug toxicity prediction method and system.

Background

Drug screening is a complex and time-consuming process in which drug toxicity assessment is one of the key steps. Conventional techniques for detecting the toxicity of compounds typically require biochemical tests, cellular experiments, and even animal models, which are both time-consuming and costly. The toxicity of a drug determines whether it can pass experiments and review, so rapid and effective identification of polypeptide toxicity is a key step in the development of polypeptide drugs. However, this prediction task remains challenging because of the complexity and diversity of polypeptides and the uncertainty of the hydrolysis products formed during their metabolism in vivo.

Disclosure of Invention

In order to solve the technical problems of polypeptide drug toxicity prediction in the prior art, the invention provides an artificial intelligence-based polypeptide drug toxicity prediction method and system.
The invention is realized by the following technical scheme: an artificial intelligence-based polypeptide drug toxicity prediction method, comprising: S1, collecting sequence data and properties of polypeptide molecules, wherein the sequence data and properties comprise the sequences of the polypeptide molecules, related biological activity data, physicochemical property parameters, known toxicity information and possible metabolite data; S2, constructing a rate constant prediction model of polypeptide metabolism based on a Bayesian regression model; S3, constructing a toxicity prediction model to predict the toxicity scores of the polypeptide molecules and their hydrolysis products, which comprises extracting fragment sequences from polypeptide sequences to generate sequence features of the polypeptides and polypeptide fragments, inputting the selected feature vectors into the trained toxicity prediction model to predict polypeptide toxicity, and constructing a polypeptide toxicity prediction model with kinetic simulation to obtain toxicity prediction results for the polypeptide molecules and their hydrolysis products. Further, the rate constant prediction model of polypeptide metabolism in step S2 establishes a relationship between the molecular weight, hydrophobicity, molecular fragment descriptors, charge distribution, and number of hydrogen bond donors and acceptors on the one hand and the metabolic rate constants of the polypeptide and its metabolites on the other. Further, step S2 further includes feature selection and importance evaluation: key features in the metabolic rate constant prediction are screened out using the feature importance evaluation, and the contribution of each feature to the model prediction is quantified.
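
The fragment extraction named in step S3 can be sketched as follows, assuming a sliding window of fixed length that advances one residue at a time and a key fragment list formed from the most frequent fragments. The function names, the `top_n` cutoff and the example sequences are hypothetical illustrations and are not taken from the patent, which does not state how the key fragment list is truncated.

```python
from collections import Counter


def sliding_fragments(sequence, window):
    """Return all contiguous fragments of the given length (stride 1)."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    # A sequence shorter than the window yields no fragments.
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def key_fragment_list(sequences, window, top_n):
    """Count fragment frequencies over a corpus and keep the top_n fragments.

    top_n is an assumed cutoff; ties are kept in first-seen order.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(sliding_fragments(seq, window))
    return [fragment for fragment, _ in counts.most_common(top_n)]


def count_vector(sequence, window, key_fragments):
    """Represent one sequence as fragment counts over the key fragment list."""
    counts = Counter(sliding_fragments(sequence, window))
    return [counts[fragment] for fragment in key_fragments]


if __name__ == "__main__":
    # Made-up sequences in one-letter amino acid code, for illustration only.
    corpus = ["GIGAVL", "GIGKFL"]
    keys = key_fragment_list(corpus, window=3, top_n=5)
    print(keys)
    print(count_vector("GIGGIG", 3, keys))
```

The resulting count vector is one possible form of the "feature space of the sequence"; it could be concatenated with embedding-based features before being passed to the toxicity model.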
Further, step S3 further includes: S31, data preprocessing: acquiring a drug data set and, by processing raw biochemical test data, extracting the molecular weight, hydrophobicity, molecular fragment descriptors, charge distribution, polypeptide sequences, polypeptide toxicity data, metabolite data and number of secondary structures of the polypeptides; S32, performing a word segmentation operation on the polypeptide sequence: obtaining polypeptide fragments using a sliding window of length T, counting fragment frequencies to form a key fragment list, and constructing the feature space of the sequence; S33, representing the amino acid sequence using a word embedding model, mapping each amino acid to a high-dimensional vector, and capturing semantic relationships and interactions among the amino acids to obtain a feature vector of the polypeptide sequence; S34, calculating the rate of change of concentration with time based on kinetic modeling, constructing a polypeptide toxicity prediction model using a convolutional neural network, and establishing the relationship between the polypeptide and toxicity. Further, when the rate of change of concentration with time is calculated based on kinetic modeling, the metabolic pathway of the polypeptide P in the body is assumed to follow first-order reaction kinetics, with the rate constants predicted by the model of S2, expressed as follows: $\frac{d[P]}{dt} = -k[P]$, $\frac{d[M_i]}{dt} = k[P] - k_i[M_i]$, $i = 1, 2, \ldots, n$, wherein $[P]$ and $[M_i]$ are the concentrations of the polypeptide and of the $i$-th polypeptide metabolite, $k$ and $k_i$ are the rate constants of the polypeptide and of the polypeptide metabolite, respectively, $\frac{d[P]}{dt}$ is the rate of change of the polypeptide concentration $[P]$ with time $t$, and $\frac{d[M_i]}{dt}$ is the rate of change of the metabolite concentration $[M_i]$ with time. At time $t$, the metabolite concentration is $[M_i](t) = \frac{k[P]_0}{k_i - k}\left(e^{-kt} - e^{-k_i t}\right)$, where $[P]_0$ is the initial polypeptide concentration, with peak time $t_{\max,i} = \frac{\ln(k_i/k)}{k_i - k}$. The peak concentration is obtained by back-substitution