
CN-121999881-A - Blood-brain barrier penetrating peptide prediction method and system fusing sequence and structural information


Abstract

The invention relates to a method and system for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information. The method comprises: obtaining the amino acid sequence of a peptide to be predicted; inputting the amino acid sequence into a pre-trained ESM-2 protein language model and a pre-trained ESMFold model, respectively, to obtain sequence modal features and structural modal features; performing bidirectional feature alignment and fusion through a bidirectional cross-modal collaborative attention mechanism to obtain fused residue-level features; aggregating the fused residue-level features with length-aware masked average pooling to obtain sequence global features and structure global features; concatenating the sequence global features and the structure global features; and obtaining a blood-brain barrier penetrating peptide prediction result based on the concatenated features. The invention achieves high-precision prediction and interpretable analysis of blood-brain barrier penetrating peptides, improves the screening efficiency of blood-brain barrier delivery peptides, and reduces cost.
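
The pooling and concatenation steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `masked_mean_pool`, the toy feature values, and the 0/1 mask convention (1 for a real residue, 0 for padding) are assumptions made only for illustration, and plain Python lists stand in for tensors.

```python
from typing import List


def masked_mean_pool(features: List[List[float]], mask: List[int]) -> List[float]:
    """Average residue-level feature rows, counting only rows whose mask entry is 1.

    Dividing by the number of valid residues (not the padded length) is what
    makes the pooling length-aware.
    """
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    valid = [row for row, keep in zip(features, mask) if keep]
    if not valid:
        raise ValueError("mask selects no residues")
    dim = len(valid[0])
    return [sum(row[d] for row in valid) / len(valid) for d in range(dim)]


# Toy input (hypothetical values): a peptide of 2 residues padded to length 3.
seq_feats = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]      # sequence-modality rows
struct_feats = [[0.5, 0.5], [1.5, 2.5], [9.0, 9.0]]   # structure-modality rows
mask = [1, 1, 0]                                      # last position is padding

seq_global = masked_mean_pool(seq_feats, mask)
struct_global = masked_mean_pool(struct_feats, mask)

# Concatenate the two global vectors along the channel dimension.
joint = seq_global + struct_global
print(joint)
```

Because the padded row is excluded from both the sum and the divisor, peptides of different lengths produce global vectors on a comparable scale, regardless of how much padding a batch requires.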

Inventors

  • CUI FEIFEI
  • LV JINGWEI
  • ZHANG ZILONG

Assignees

  • Hainan University (海南大学)

Dates

Publication Date
2026-05-08
Application Date
2026-01-20

Claims (9)

  1. A method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information, the method comprising: acquiring an amino acid sequence of a peptide to be predicted, inputting the amino acid sequence into a pre-trained ESM-2 protein language model, obtaining a contextual semantic representation for each amino acid residue, and extracting sequence modal features through a lightweight linear mapping; inputting the amino acid sequence into an ESMFold model for three-dimensional structure prediction, obtaining spatial coordinate information of the amino acid residues, and constructing a residue contact graph based on the spatial coordinate information to obtain structural modal features; performing bidirectional feature alignment and fusion on the sequence modal features and the structural modal features through a bidirectional cross-modal collaborative attention mechanism to obtain fused residue-level features; and aggregating the fused residue-level features with length-aware masked average pooling to obtain sequence global features and structure global features, concatenating the sequence global features and the structure global features, and obtaining a blood-brain barrier penetrating peptide prediction result based on the concatenated features.
  2. The method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information according to claim 1, wherein obtaining the contextual semantic representation for each amino acid residue and extracting the sequence modal features through a lightweight linear mapping comprises: obtaining the contextual semantic representations of the amino acid residues and taking them as sequence features; and aligning the channel dimension of the sequence features with the hidden space dimension through a lightweight linear mapping to obtain the sequence modal features.
  3. The method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information according to claim 1, wherein constructing a residue contact graph based on the spatial coordinate information to obtain structural modal features comprises: determining the spatial distance relationships between residues from the spatial coordinate information, establishing connecting edges according to the spatial distance relationships, and expanding and encoding the connecting edges with a radial basis function to obtain edge features; obtaining one-hot encodings and normalized physicochemical property vectors of the amino acid residues and taking them as node features; constructing the residue contact graph from the edge features and the node features; and aggregating structural information over the residue contact graph with a multi-head attention mechanism that incorporates edge features to obtain the structural modal features.
  4. The method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information according to claim 1, wherein performing bidirectional feature alignment and fusion on the sequence modal features and the structural modal features through a bidirectional cross-modal collaborative attention mechanism to obtain fused residue-level features comprises: computing, through the bidirectional cross-modal collaborative attention mechanism, the attention map from the sequence modal features to the structural modal features and the attention map from the structural modal features to the sequence modal features, respectively; and establishing an explicit alignment between sequence residues and structural neighborhoods based on the computed attention maps, and obtaining the fused residue-level features through stacked fusion.
  5. The method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information according to claim 1, wherein the fused residue-level features comprise residue-level features of the sequence modality and residue-level features of the structural modality, and aggregating the fused residue-level features with length-aware masked average pooling to obtain sequence global features and structure global features comprises: obtaining a mask matrix corresponding to the fused residue-level features; applying the mask matrix to the fused residue-level features, using length-aware masked average pooling, to obtain a set of valid residue features for the sequence modality and a set of valid residue features for the structural modality; and summing and averaging the features in each of the two sets to obtain the sequence global features and the structure global features.
  6. The method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information according to claim 1, wherein concatenating the sequence global features and the structure global features and obtaining the blood-brain barrier penetrating peptide prediction result based on the concatenated features comprises: normalizing the sequence global features and the structure global features, and concatenating them along the channel dimension to obtain joint features; and applying layer normalization and a nonlinear activation mapping to the joint features, outputting a prediction probability through a compact fully connected classification network, and obtaining the blood-brain barrier penetrating peptide prediction result based on the prediction probability.
  7. The method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information according to claim 1, applied to a blood-brain barrier penetrating peptide prediction model, wherein the prediction model is trained in two stages: the first stage is a representation learning stage, in which contrastive learning is trained jointly with a supervised loss to learn cross-modally consistent feature representations; and the second stage is a supervised fine-tuning stage, in which, with some modules frozen, the classification task is optimized using a class-imbalance-aware loss function.
  8. The method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information according to claim 7, wherein the class-imbalance-aware loss function is a composite loss function based on label smoothing and focal loss.
  9. A blood-brain barrier penetrating peptide prediction system that fuses sequence and structural information, the system comprising: a sequence modal feature extraction module, configured to acquire an amino acid sequence of a peptide to be predicted, input the amino acid sequence into a pre-trained ESM-2 protein language model, obtain a contextual semantic representation for each amino acid residue, and extract sequence modal features through a lightweight linear mapping; a structural modal feature extraction module, configured to input the amino acid sequence into an ESMFold model for three-dimensional structure prediction, obtain spatial coordinate information of the amino acid residues, and construct a residue contact graph based on the spatial coordinate information to obtain structural modal features; a feature alignment and fusion module, configured to perform bidirectional feature alignment and fusion on the sequence modal features and the structural modal features through a bidirectional cross-modal collaborative attention mechanism to obtain fused residue-level features; and a feature processing and prediction module, configured to aggregate the fused residue-level features with length-aware masked average pooling to obtain sequence global features and structure global features, concatenate the sequence global features and the structure global features, and obtain a blood-brain barrier penetrating peptide prediction result based on the concatenated features.
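
The edge construction described in claim 3 (distance-based connecting edges, expanded with a radial basis function) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not give a distance cutoff, RBF centers, or a width parameter, so the 8.0 cutoff, the four centers, and `gamma` below are hypothetical values, as are the function names `rbf_expand` and `build_contact_edges`. A Gaussian form is assumed for the radial basis function.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rbf_expand(distance: float, centers: Sequence[float], gamma: float) -> List[float]:
    """Expand a scalar distance into Gaussian RBF features, one per center."""
    return [math.exp(-gamma * (distance - c) ** 2) for c in centers]


def build_contact_edges(
    coords: Sequence[Coord],
    cutoff: float = 8.0,                                  # assumed cutoff distance
    centers: Sequence[float] = (2.0, 4.0, 6.0, 8.0),      # assumed RBF centers
    gamma: float = 0.5,                                   # assumed RBF width
) -> Dict[Tuple[int, int], List[float]]:
    """Connect residue pairs closer than `cutoff` and attach RBF edge features.

    Edges are stored in both directions so that attention over the graph can
    treat each residue as either source or target.
    """
    edges: Dict[Tuple[int, int], List[float]] = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                feat = rbf_expand(d, centers, gamma)
                edges[(i, j)] = feat
                edges[(j, i)] = feat
    return edges


# Toy coordinates (hypothetical): residues 0 and 1 are close, residue 2 is far away.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = build_contact_edges(coords)
print(sorted(edges))
```

In a full pipeline the coordinates would come from the ESMFold prediction, and the resulting edge features would be combined with the node features of claim 3 before attention-based aggregation.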

Description

Blood-brain barrier penetrating peptide prediction method and system fusing sequence and structural information

Technical Field

The invention relates to the technical field of biological peptide screening, and in particular to a method and system for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information.

Background

The blood-brain barrier (BBB) is a highly selective biological barrier formed by brain microvascular endothelial cells and their tight junctions. Its main function is to maintain the homeostasis of the central nervous system and to prevent harmful substances from entering brain tissue. However, the BBB also severely limits the delivery of therapeutic drugs to the brain, especially macromolecular and polypeptide drugs, which is one of the key bottlenecks in the treatment of neurological diseases. In recent years, blood-brain barrier penetrating peptides (BBBPPs) have been widely regarded as ideal carriers for brain-targeted delivery because of their small size, strong penetrating ability, and good biocompatibility. BBB penetrating peptides can cross the blood-brain barrier through various mechanisms (such as adsorption-mediated transport, receptor-mediated transport, or cell membrane penetration) and have important application prospects in drug delivery, neuroimaging, and the treatment of brain diseases. However, the barrier-crossing ability of a BBB penetrating peptide is closely related to its amino acid sequence, physicochemical properties, spatial conformation, and charge and hydrophobicity distribution, and the underlying mechanism is complex and highly nonlinear.
Currently, research on blood-brain barrier penetrating peptides relies mainly on in vivo or in vitro experimental methods, such as animal model experiments and cell transport experiments, to verify whether a polypeptide can successfully cross the BBB. Screening BBB penetrating peptides by these traditional experimental methods alone is slow and costly, and efficient screening of large candidate peptide libraries is difficult to achieve. With the rapid development of artificial intelligence, machine learning and deep learning methods have shown great potential in bioinformatics and drug development. By building a prediction model based on sequence, structure, and physicochemical features, the BBB penetration capacity of candidate polypeptides can be evaluated rapidly without relying on a large number of experiments, which improves both the screening efficiency and the depth of research on BBB penetrating peptides. On the computational side, some studies have attempted to classify and predict BBB penetrating peptides from amino acid composition or simple physicochemical features using traditional machine learning algorithms (such as support vector machines, random forests, or logistic regression).
However, verifying BBB penetration through animal or cell experiments involves a complex workflow, is time-consuming and costly, and cannot meet the needs of large-scale polypeptide screening. Existing methods based on support vector machines or random forests mostly use simple amino acid composition or low-dimensional physicochemical features, so they struggle to capture long-range dependencies and contextual information in polypeptide sequences, and their expressive power is limited. In addition, existing models generally output only a binary classification or a probability and do not explain which amino acid residues or features are important. Traditional approaches to predicting blood-brain barrier penetrating peptides are therefore clearly deficient in prediction accuracy, depth of feature mining, and biological interpretability, owing to the difficulty of acquiring data and the low rate of data utilization.

Disclosure of Invention

In view of the above technical problems, a method and a system for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information are provided, which can achieve high-precision prediction and interpretable analysis of blood-brain barrier penetrating peptides, improve the screening efficiency of blood-brain barrier delivery peptides, and reduce cost.

A method for predicting blood-brain barrier penetrating peptides by fusing sequence and structural information, the method comprising: acquiring an amino acid sequence of a peptide to be predicted, inputting the amino acid sequence