CN-121999869-A - Method for high-throughput identification of natural functional short peptide
Abstract
The invention belongs to the intersection of marine biotechnology and bioinformatics, and in particular relates to a method for high-throughput identification of natural functional short peptides. The development and application of functional short peptides such as cell-penetrating peptides and antimicrobial peptides in non-model species still face bottlenecks, mainly because differences between species (amino acid preference, receptor specificity) limit how well existing functional short peptides suit complex and diverse non-model organisms. To this end, the invention proposes a method that uses machine learning to identify functional short peptides from the natural sequences of organisms at high throughput. Starting from the natural protein sequences of an organism, the method systematically generates, through a sliding-window strategy, a set of short peptide sequences that satisfy a preset length condition, and uses a deep machine learning model to predict the function of these short peptides at high throughput. A comparison threshold for functional short peptides is determined by applying a 'fragmentation, prediction, re-comparison' procedure to existing functional short peptides, and the natural functional short peptides present in the protein sequences are then identified on the basis of this threshold. The functional short peptides obtained in this way have passed the strict physicochemical screening of the machine learning model and, having arisen through the evolution of the species, are biologically well suited to it. The method markedly improves the direct applicability of functional short peptides in non-model species, and provides a precise, high-throughput technical means for their efficient development and application in non-model organisms.
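The sliding-window generation of the candidate peptide set described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `sliding_window_peptides` and `build_peptide_set` are hypothetical, and the window lengths are free parameters because the source only speaks of a "preset length condition".

```python
from typing import Iterable, Iterator, List, Tuple


def sliding_window_peptides(
    protein: str, min_len: int, max_len: int
) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every window of length min_len..max_len.

    min_len and max_len are assumptions; the source does not state
    the actual length condition.
    """
    if min_len < 1 or max_len < min_len:
        raise ValueError("require 1 <= min_len <= max_len")
    seq = protein.strip().upper()
    for length in range(min_len, max_len + 1):
        # range() is empty when the protein is shorter than the window
        for start in range(len(seq) - length + 1):
            yield start, seq[start:start + length]


def build_peptide_set(
    proteins: Iterable[str], min_len: int, max_len: int
) -> List[str]:
    """Collect the unique candidate peptides from a collection of proteins."""
    unique = set()
    for protein in proteins:
        for _, peptide in sliding_window_peptides(protein, min_len, max_len):
            unique.add(peptide)
    return sorted(unique)
```

Calling `build_peptide_set(["ACDEF", "CDEFG"], 3, 3)` yields the four distinct 3-mers of the two toy sequences. Deduplicating at this stage keeps the later prediction step from scoring the same peptide repeatedly; the start offsets from `sliding_window_peptides` are what would be needed to map a positive prediction back to its position in the protein.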
Inventors
- LIU SHIKAI
- SUN DEQI
- WANG TIANCHANG
- LU RUIYI
- LI QI
Assignees
- 中国海洋大学 (Ocean University of China)
Dates
- Publication Date: 20260508
- Application Date: 20260119
Claims (6)
- 1. A method for high-throughput identification of natural functional short peptides, comprising the following steps: (1) Constructing a prediction set of short peptides: acquiring the set of protein coding sequences to be predicted from NCBI GenBank, removing redundancy with CD-HIT at a sequence identity threshold of ≥ 90%, and systematically generating, through a sliding-window strategy, a set of sequences that satisfy the short peptide length condition. (2) Predicting functional short peptides: learning the sequence features of the short peptides with an Encoder layer, converting all short peptides in the set into high-dimensional numerical vectors with a protein language embedding model (ESM, ProtTrans, SeqVec or UniRep), feeding the vector set into the optimal model obtained by machine learning training for prediction, and extracting the short peptides predicted to be positive. (3) Predicting functional short peptides within protein coding regions: inserting existing functional short peptides into non-functional short peptide sequences and carrying out a 'sequence fragmentation, machine learning prediction, re-comparison' procedure to determine a threshold range that accurately extracts the functional short peptides, and then, on the basis of the verified threshold range, extracting the natural functional peptide fragments in the protein sequences that exceed the threshold range. (4) Identifying physicochemical properties and chemically modifying the short peptides: using tools such as MEME Suite and ProtParam to evaluate physicochemical properties of the peptide fragments such as net charge, hydrophobicity and isoelectric point, discarding functional short peptides whose physicochemical properties deviate excessively, and, according to the experimental purpose, modifying the amino acids of the qualifying functional peptide fragments with respect to cell escape, nuclear localization and the like.
- 2. The method for high-throughput identification of natural functional short peptides according to claim 1, wherein identifying natural functional short peptides directly from the protein sequences of related organisms in step (1) reduces the effect on receptor binding of the small number of amino acid changes caused by species differences, which is significant for developing species-specific, highly functional peptide fragments.
- 3. The method for high-throughput identification of natural functional short peptides according to claim 1, wherein the method is well matched to high-throughput machine learning models and exploits their high-throughput advantage more fully while adding the advantage of biocompatibility; it reduces the uncertainty of randomly generated sequences in high-throughput prediction and avoids the heavy cost of transforming and modifying functional short peptides for suitability at the time of use.
- 4. The method according to claim 1, wherein in step (3) existing functional short peptides are inserted into protein sequences to fix the threshold range for accurately identifying functional short peptides, and the exact positions of potential functional short peptides in natural protein sequences are determined on the basis of this threshold, thereby overcoming the interference of redundant peptide segments with the core region during localization.
- 5. The method for high-throughput identification of natural functional short peptides according to claim 1, wherein the organisms involved in the method include the organism of interest itself and other organisms (e.g., viruses, symbiotic bacteria) whose coding regions contain functional short peptides capable of interacting directly with the proteins of the organism of interest. By mining functional short peptides that acquired a natural ability to interact with the proteins of the target organism during evolution, the method aims to significantly reduce the low efficiency and high noise associated with high-throughput prediction.
- 6. The method for high-throughput identification of natural functional short peptides according to claim 1, wherein the method has a wide range of application: cell-penetrating peptides, antimicrobial peptides, neuropeptides and the like can be mined in batches from large numbers of natural protein sequences, with the advantages of high throughput and high adaptability.
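The threshold-determination procedure of step (3) in claim 1 (insertion of a known functional peptide, fragmentation, prediction, re-comparison) can be sketched as follows. This is one possible reading under stated assumptions, not the patented implementation: all function names are hypothetical, `toy_score` is a placeholder that merely counts Lys/Arg residues and stands in for the trained model applied to protein language embeddings, and the fixed window length and the `min_overlap` cutoff are illustrative choices the source does not specify.

```python
from typing import Callable, List, Tuple


def toy_score(peptide: str) -> float:
    """Placeholder scorer (fraction of Lys/Arg residues); NOT the patent's model."""
    return sum(peptide.count(res) for res in "KR") / len(peptide)


def fragment(seq: str, window: int) -> List[Tuple[int, str]]:
    """Break a sequence into all overlapping windows of a fixed length."""
    return [(i, seq[i:i + window]) for i in range(len(seq) - window + 1)]


def calibrate_threshold(
    background: str,
    known: str,
    position: int,
    window: int,
    score: Callable[[str], float],
    min_overlap: float = 0.8,  # assumed cutoff for "fragment recovers the insert"
) -> Tuple[float, float]:
    """Insert a known peptide, fragment, score, and compare back to the insert.

    Returns (lower, upper): the highest score among fragments that miss the
    insert and the lowest score among fragments that recover it.
    """
    spiked = background[:position] + known + background[position:]
    inside, outside = [], []
    for start, peptide in fragment(spiked, window):
        # Re-comparison: how much of this fragment lies in the inserted peptide
        lo = max(start, position)
        hi = min(start + window, position + len(known))
        overlap = max(0, hi - lo) / window
        (inside if overlap >= min_overlap else outside).append(score(peptide))
    if not inside or not outside:
        raise ValueError("need fragments both inside and outside the insert")
    lower, upper = max(outside), min(inside)
    if lower >= upper:
        raise ValueError("scores do not separate the inserted peptide")
    return lower, upper


def extract_hits(
    protein: str, window: int, score: Callable[[str], float], threshold: float
) -> List[Tuple[int, str]]:
    """Return (start, fragment) for natural fragments scoring at or above threshold."""
    return [(i, p) for i, p in fragment(protein, window) if score(p) >= threshold]
```

As a usage example, `calibrate_threshold("GS" * 10, "RKKRRQRRR", 10, 9, toy_score)` returns a range whose upper bound can then be passed to `extract_hits` as the threshold for scanning natural protein sequences. Reporting a range rather than a single cutoff mirrors the claim's wording ("threshold range") and leaves the final stringency to the user.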
Description
Method for high-throughput identification of natural functional short peptide
Technical Field
The invention belongs to the intersection of biotechnology and bioinformatics, and in particular relates to a method for high-throughput identification of natural functional short peptides.
Background
Functional short peptides (cell-penetrating peptides, antimicrobial peptides, neuropeptides and the like) are bioactive molecules composed of a small number of amino acids, and combine a simple structure with diverse functions. Because they have low toxicity and good biocompatibility and are easy to synthesize and to modify chemically, they have great potential in life science and medical research fields such as drug delivery, anti-infection treatment and neuromodulation. With the development of synthetic biology and nanotechnology, the modification and optimization of short peptides (for example, improving stability, targeting and half-life) has become a research hotspot and plays an increasingly important role in precision medicine and life science research. As the functions of functional short peptides continue to be explored, they are also becoming increasingly important in the study of non-model species. With the deep integration of biotechnology and artificial intelligence, deep machine learning (DML) has become a core engine driving the identification of functional short peptides. It greatly accelerates the discovery of rules, the prediction of functions and the identification of novel short peptides from massive and complex biological data, and has brought a paradigm shift to biological research and drug development. However, differences between species (amino acid preference, receptor specificity) limit the direct application of proven functional short peptides in complex and diverse non-model organisms.
Although existing de novo prediction and random library construction can quickly yield a large number of candidate peptide fragments, the candidate short peptides still require laborious verification and modification, so the application of high-throughput predicted functional short peptides in complex non-model species remains a bottleneck. Functional short peptides are ubiquitous in the protein sequences of organisms and play an important role in life processes. For example, cell-penetrating peptides within biological proteins help macromolecules enter and exit cells, antimicrobial peptides commonly found in the innate immune system resist pathogen infection during the non-specific immune phase, and the neuropeptides of lower organisms take part in regulating growth and development. Given that differences between species prevent functional short peptides widely used in higher organisms from being applied directly in non-model organisms, identifying functional short peptides directly from the protein sequences of related organisms avoids the limitation that high-throughput prediction relies only on physicochemical properties, and allows functional short peptides to be identified directly and accurately from protein sequences that have become highly adapted to the target organism in the course of evolution.
To address this problem, a method for mining functional short peptides from natural protein sequences is provided. It is compatible with the many deep machine learning models now available, markedly improves the efficiency of functional short peptide discovery, compensates for the biological context that machine learning training lacks, provides highly suitable functional short peptides for complex and diverse non-model species, reduces subsequent modification and adjustment, and accelerates the application of functional short peptides in non-model species. The invention is based on this.
Disclosure of Invention
For the above reasons, the invention aims to provide a method for high-throughput identification of natural functional short peptides that is compatible with the many deep machine learning models now available, markedly improves the applicability of functional short peptides, compensates for the biological context that deep machine learning lacks, provides biologically relevant functional short peptides for complex and diverse non-model species, reduces subsequent multi-aspect modification and adjustment, and accelerates the application of functional short peptides in non-model species. To achieve this object, the technical scheme of the invention discloses a method for identifying functional short peptides from the protein sequences of related organisms, comprising the following steps: Constructing a prediction set of short peptides: acquiring a protein coding seq