CN-120766773-B - Antihypertensive peptide classification method fusing sequence and structural multimodal features with contrastive-generative joint optimization
Abstract
The invention relates to an antihypertensive peptide classification method that fuses sequence and structural multimodal features and applies contrastive-generative joint optimization. The method comprises: extracting a sequence feature representation and a structural feature representation of a peptide fragment; enhancing the multimodal features by performing contrastive-generative joint optimization on the sequence feature representation and the structural feature representation; and classifying the peptide from the enhanced feature representations by using a preset classification model that combines a Kan-Conv structure with label-smoothing joint optimization. The method identifies the functional activity of antihypertensive peptides with high precision by jointly using three kinds of information: the sequence, the structure, and the generative latent space. It addresses the neglect of structure-sequence synergy in existing peptide function prediction and improves the screening efficiency and prediction reliability of antihypertensive peptides.
Inventors
- YANG SEN
- SHEN PENG
Assignees
- Changzhou Second People's Hospital (常州市第二人民医院)
- Changzhou University (常州大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20250603
Claims (7)
- 1. A method for classifying antihypertensive peptides by fusing sequence and structural multimodal features with contrastive-generative joint optimization, comprising the steps of: extracting a sequence feature representation and a structural feature representation of a peptide fragment; performing contrastive-generative joint optimization on the sequence feature representation and the structural feature representation to enhance the multimodal features; wherein the multimodal feature enhancement by contrastive-generative joint optimization comprises: aligning, through supervised contrastive learning, the sequence feature representation and the structural feature representation in a unified embedding space; and introducing a latent diffusion branch that learns a generative smooth mapping for the sequence feature representation and the structural feature representation in a latent space; performing peptide classification on the enhanced feature representations by using a preset classification model that combines a Kan-Conv structure with label-smoothing joint optimization; wherein performing peptide classification by using the preset classification model comprises: concatenating the enhanced feature representations along the feature dimension to generate a merged feature of a preset size; applying a plurality of parallel adaptive convolution kernel channels to the merged feature, applying batch normalization to each channel, and introducing nonlinearity through a ReLU function; applying a max pooling layer and flattening all channel and spatial dimensions into one long vector to generate a one-dimensional feature vector of a preset size; mapping the one-dimensional feature vector through two fully connected layers to obtain raw classification scores; applying a log-softmax operation to the raw classification scores to obtain, for each sample, a log-probability distribution over two classes, the two classes being antihypertensive peptide and non-antihypertensive peptide; and classifying the peptide fragment based on the log-probability distribution.
- 2. The method for classifying antihypertensive peptides by fusing sequence and structural multimodal features with contrastive-generative joint optimization according to claim 1, wherein extracting the sequence feature representation and the structural feature representation of the peptide fragment comprises: extracting the sequence feature representation of the peptide fragment based on a preset large-scale protein language model, and generating the structural feature representation using RDKit.
- 3. The method for classifying antihypertensive peptides by fusing sequence and structural multimodal features according to claim 2, wherein generating the structural feature representation using RDKit comprises: mapping the peptide fragment to a three-dimensional structure model, adding all implicit hydrogen atoms to the three-dimensional structure model, and obtaining, through a molecular conformation embedding algorithm, structural data comprising the three-dimensional coordinates of all atoms; extracting the atomic coordinates from the structural data to obtain an atomic coordinate matrix; calculating Pearson correlation coefficients among the three coordinate dimensions based on the atomic coordinate matrix to obtain a correlation coefficient matrix; and sequentially performing numerical normalization, image conversion, and feature vector generation on the correlation coefficient matrix to obtain the structural feature representation.
- 4. The method for classifying antihypertensive peptides by fusing sequence and structural multimodal features with contrastive-generative joint optimization according to claim 2, wherein performing contrastive-generative joint optimization on the sequence feature representation and the structural feature representation comprises: performing supervised contrastive learning on the sequence feature representation and the structural feature representation to obtain a sequence feature representation and a structural feature representation enhanced by contrastive learning; and performing generative learning on the sequence feature representation and the structural feature representation to obtain a sequence feature representation and a structural feature representation enhanced by generative learning.
- 5. The method for classifying antihypertensive peptides by fusing sequence and structural multimodal features with contrastive-generative joint optimization according to claim 4, wherein performing supervised contrastive learning on the sequence feature representation and the structural feature representation comprises: randomly perturbing the sequence feature representation and the structural feature representation, and stacking the perturbed feature representations vertically, in order, to form a feature matrix; computing a pairwise dot-product similarity matrix over the feature matrix; constructing a mask matrix of the same size as the similarity matrix, exponentiating the dot-product similarity matrix element by element, and multiplying the result by the mask matrix; summing all non-zero elements in each row of the multiplied matrix to obtain the denominator of the supervised contrastive loss, each row of the multiplied matrix corresponding to one augmented sample; extracting from the multiplied matrix, according to the definition of positive sample pairs, the exponentiated similarity between each augmented sample and its positive samples to form the numerator of the supervised contrastive loss; computing the contrastive loss of each augmented sample from its corresponding numerator and denominator, and averaging over samples to obtain the final supervised contrastive loss; and obtaining the sequence feature representation and the structural feature representation enhanced by contrastive learning by using the final supervised contrastive loss.
- 6. The method for classifying antihypertensive peptides by fusing sequence and structural multimodal features according to claim 4, wherein performing generative learning on the sequence feature representation and the structural feature representation comprises: mapping the sequence feature representation and the structural feature representation to a latent space to obtain an initial latent variable; injecting noise into the initial latent variable to obtain a noisy latent variable; concatenating the noisy latent variable with a normalized time step, inputting the result into a preset fully connected network, and outputting a noise prediction; recovering the initial latent variable from the noisy latent variable by using the predicted noise, thereby removing the noise component; and mapping the recovered latent variable back to the original dimension through two fully connected layers to obtain a reconstructed feature vector.
- 7. The method for classifying antihypertensive peptides by fusing sequence and structural multimodal features with contrastive-generative joint optimization according to claim 1, wherein mapping through two fully connected layers comprises: a first layer that compresses the one-dimensional feature vector of a preset size and applies a ReLU to obtain a compressed hidden representation; and a second layer that maps the compressed hidden representation to a preset number of classes and outputs the raw classification scores.
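The supervised contrastive loss recited in claim 5 can be illustrated with a minimal sketch in plain Python. This is not the patented implementation. The Gaussian perturbation, the L2 normalization, the temperature value, and all function names are assumptions made for illustration. The sketch also assumes that both modalities have already been projected to the same dimension, and that two views are a positive pair when they share a class label.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def perturb(v, rng, scale=0.01):
    """Random perturbation: small Gaussian noise (an assumed choice)."""
    return [x + rng.gauss(0.0, scale) for x in v]


def supervised_contrastive_loss(seq_feats, struct_feats, labels,
                                temperature=0.5, seed=0):
    rng = random.Random(seed)
    # Stack perturbed sequence views on top of perturbed structure views.
    views = [l2_normalize(perturb(v, rng)) for v in seq_feats]
    views += [l2_normalize(perturb(v, rng)) for v in struct_feats]
    view_labels = list(labels) + list(labels)
    n = len(views)

    # Pairwise dot-product similarities, exponentiated element by element.
    exp_sim = [[math.exp(sum(a * b for a, b in zip(views[i], views[j]))
                         / temperature)
                for j in range(n)] for i in range(n)]
    # Mask of the same size: zero on the diagonal removes self-similarity.
    masked = [[0.0 if i == j else exp_sim[i][j] for j in range(n)]
              for i in range(n)]

    losses = []
    for i in range(n):
        denominator = sum(masked[i])  # sum of the non-zero row entries
        positives = [j for j in range(n)
                     if j != i and view_labels[j] == view_labels[i]]
        if not positives:
            continue
        # Numerator terms: exponentiated similarity to each positive view.
        loss_i = -sum(math.log(masked[i][j] / denominator)
                      for j in positives) / len(positives)
        losses.append(loss_i)
    return sum(losses) / len(losses)  # average over augmented samples


if __name__ == "__main__":
    seq = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    struct = [[0.9, 0.0], [1.0, 0.2], [0.1, 1.0], [0.0, 0.8]]
    print(supervised_contrastive_loss(seq, struct, [1, 1, 0, 0]))
```

The loss is lower when views that share a label are already close in the embedding space, which is the alignment the claim relies on.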
Description
Antihypertensive peptide classification method fusing sequence and structural multimodal features with contrastive-generative joint optimization
Technical Field
The invention relates to the interdisciplinary field of bioinformatics and artificial intelligence, and in particular to an antihypertensive peptide classification method that fuses sequence and structural multimodal features and applies contrastive-generative joint optimization.
Background
In antihypertensive peptide discovery and design, traditional computational prediction methods rely mainly on hand-crafted features of amino acid sequences or simply adopt a sequence language model. These methods often ignore the folding conformation and dynamic interaction information of peptide molecules in three-dimensional space, even though the functional activity of a peptide depends not only on its linear sequence but also on its spatial structure. Although existing structure prediction tools can provide high-precision peptide tertiary structures, existing downstream prediction models rarely fuse structural information deeply with sequence information, which limits prediction performance for key functions such as peptide-receptor binding sites, cell membrane penetration, and enzyme inhibition activity. Furthermore, traditional strategies usually model sequence features and structural features separately and lack an end-to-end multi-view collaborative learning framework, which makes it difficult for a model to capture the inherent correlation between the two. Meanwhile, recent deep generative models have been remarkably successful in image and small-molecule design but have not been fully applied to peptide-level function prediction.
Disclosure of Invention
The invention aims to provide an antihypertensive peptide classification method that fuses sequence and structural multimodal features and applies contrastive-generative joint optimization, so as to solve the problems in the prior art described above. The sequence information and the structural features of peptide fragments are fused across multiple views, and on this basis supervised contrastive learning and a latent diffusion model branch are introduced for joint optimization, yielding a fine-grained encoding of the high-dimensional features of peptide molecules and a low-dimensional latent representation. The end-to-end learning framework constructed by the invention improves the accuracy of peptide function prediction, generalizes well, and is suitable for the discovery, classification, and design of functional peptide molecules such as antihypertensive peptides.
In order to achieve the above object, the present invention provides the following solution: a method for classifying antihypertensive peptides by fusing sequence and structural multimodal features with contrastive-generative joint optimization, comprising: extracting a sequence feature representation and a structural feature representation of a peptide fragment; performing contrastive-generative joint optimization on the sequence feature representation and the structural feature representation to enhance the multimodal features; and performing peptide classification on the enhanced feature representations by using a preset classification model that combines a Kan-Conv structure with label-smoothing joint optimization.
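The final classification step can be sketched as follows. The claims describe applying log-softmax to raw two-class scores and classifying from the resulting log-probability distribution. Reading the "label-based smooth" objective as standard label smoothing is an assumption, as are the smoothing factor `epsilon`, the class index convention (0 for antihypertensive, 1 for non-antihypertensive), and all function names. The convolutional Kan-Conv feature extractor is not shown.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over raw classification scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def label_smoothing_nll(log_probs, target, epsilon=0.1):
    """Negative log-likelihood against a smoothed one-hot target.

    Assumed form: the target class gets 1 - epsilon + epsilon / K and
    every other class gets epsilon / K, where K is the class count.
    """
    k = len(log_probs)
    smoothed = [epsilon / k] * k
    smoothed[target] += 1.0 - epsilon
    return -sum(q * lp for q, lp in zip(smoothed, log_probs))


def classify(scores):
    """Return the index of the class with the highest log-probability."""
    log_probs = log_softmax(scores)
    return max(range(len(log_probs)), key=log_probs.__getitem__)


if __name__ == "__main__":
    raw_scores = [2.0, 0.0]  # hypothetical output of the second layer
    log_probs = log_softmax(raw_scores)
    print(classify(raw_scores), label_smoothing_nll(log_probs, target=0))
```

With a confident, correct prediction the smoothed loss is slightly larger than the unsmoothed one, which is how label smoothing discourages overconfident scores.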
Optionally, extracting the sequence feature representation and the structural feature representation of the peptide fragment comprises: extracting the sequence feature representation of the peptide fragment based on a preset large-scale protein language model, and generating the structural feature representation using RDKit. Optionally, generating the structural feature representation using RDKit comprises: mapping the peptide fragment to a three-dimensional structure model, adding all implicit hydrogen atoms to the three-dimensional structure model, and obtaining, through a molecular conformation embedding algorithm, structural data comprising the three-dimensional coordinates of all atoms; extracting the atomic coordinates from the structural data to obtain an atomic coordinate matrix; calculating Pearson correlation coefficients among the three coordinate dimensions based on the atomic coordinate matrix to obtain a correlation coefficient matrix; and sequentially performing numerical normalization, image conversion, and feature vector generation on the correlation coefficient matrix to obtain the structural feature representation. Optionally, performing contrastive-generative joint optimization on the sequence feature representation and the structural feature representation i