CN-121999889-A - Dynamic quantification method for interaction of polypeptide molecules and biological membrane
Abstract
The application discloses a dynamic quantification method for the interaction of polypeptide molecules with biological membranes, applied to the post-processing of molecular dynamics (MD) simulations. Atomic-level trajectory data obtained from MD simulation are preprocessed and then sampled with a fixed-length sliding window. Features are extracted from the trajectory data in each sliding window to obtain a total contact number signal, a unit lipid contact intensity signal, an insertion depth signal and an insertion intensity signal, and these multi-source signals are fused into a feature vector for each window. The feature vectors are input into a trained teacher-student model, which outputs the type of membrane disturbance caused by the polypeptide in each window together with a confidence score, and the final result is produced by weighted voting. The method can simultaneously characterize low-frequency collective deformation and medium-to-high-frequency local events, and classifies windows into predefined disturbance categories.
Inventors
- HU CUIHUA
- JING RAN
- WANG YANG
- CHEN YUJUAN
- DONG LITONG
- WANG LU
- WANG ZUOBIN
- WANG YING
- TIAN LIGUO
- LIU LANJIAO
Assignees
- 长春理工大学 (Changchun University of Science and Technology)
Dates
- Publication Date
- 20260508
- Application Date
- 20260119
Claims (8)
- 1. A method for dynamically quantifying the interaction of a polypeptide molecule with a biological membrane, the method being applied to the post-processing of molecular dynamics simulations and comprising: preprocessing atomic-level trajectory data obtained by molecular dynamics simulation to eliminate coordinate jumps caused by periodic boundary conditions, and then sampling the data with a fixed-length sliding window; extracting features from the atomic-level trajectory data in each sliding window to obtain a total contact number signal between polypeptide atoms and lipid head-group atoms, a unit lipid contact intensity signal for each lipid type, and an insertion depth signal and an insertion intensity signal describing the insertion of the polypeptide molecule into the lipid bilayer, and fusing these multi-source signals to obtain a feature vector for each sliding window; and inputting the feature vector of each sliding window into a trained teacher-student model that outputs the type of membrane disturbance caused by the polypeptide molecule in that window together with a confidence score, and outputting a final result by weighted voting.
- 2. The method of claim 1, wherein preprocessing the atomic-level trajectory data obtained by molecular dynamics simulation comprises: correcting the distance between any two atoms according to the minimum image convention; or incrementally unwrapping the coordinates of consecutive frames, correcting and accumulating any displacement that crosses half a box length, so as to obtain a continuous trajectory; or constructing a local coordinate system based on the membrane central plane or a local neighborhood, so that the computation of local geometric quantities is not disturbed by jumps across the periodic boundary.
- 3. The method of claim 1, wherein the total contact number signal between polypeptide atoms and lipid head-group atoms and the unit lipid contact intensity signal of each lipid type are obtained by: extracting the polypeptide atoms and lipid head-group atoms at each time step in the sliding window, computing the distance between each polypeptide atom and each lipid head-group atom, determining whether each pair is in contact according to a preset contact threshold, and counting the contacts to obtain the total contact number signal in the sliding window; at each time step, counting the polypeptide-head-group atom pairs in contact for each lipid type to obtain the contact count of that lipid type, and computing the contact proportion of each lipid type from the number of head-group atoms of that type; and normalizing the contact proportion of each lipid type by the total number of contacts to obtain the unit lipid contact intensity signal of each lipid type in the sliding window.
- 4. The method of claim 1, wherein the insertion depth signal and the insertion intensity signal of the polypeptide molecule inserted into the lipid bilayer are obtained by: extracting the coordinates of the polypeptide atoms and lipid head-group atoms at each time step in the sliding window, and selecting the lipid head-group atoms within a specified radius centered on the projection of the polypeptide centroid onto the membrane central plane, the central plane being parallel to the upper and lower lipid head-group layers; at each time step, averaging the projections of the selected head-group atoms onto the membrane normal and computing the distance between this average and the membrane central plane, which gives the insertion depth signal of the polypeptide molecule in the sliding window; and, at each time step, taking the distance between the projection of the polypeptide centroid onto the membrane normal and the membrane central plane as the insertion intensity signal of the polypeptide molecule in the sliding window.
- 5. The method of claim 1, wherein the multi-source signals in each sliding window are fused by: performing a continuous wavelet transform on the insertion depth signal in the sliding window to obtain time-frequency coefficients and an energy spectrum, dividing the energy spectrum into low-, medium- and high-frequency bands on a time scale or a logarithmic scale to obtain the energy ratio of each band, and using the energy spectrum and the band energy ratios as time-frequency energy features; calculating the mean, standard deviation, quantiles and maximum of the total contact number signal, the unit lipid contact intensity signal, the insertion depth signal and the insertion intensity signal, and fitting a linear trend to each, to obtain the corresponding statistical features and rate-of-change features; and concatenating the statistical and rate-of-change features of these four signals with the time-frequency energy features to obtain the feature vector of the sliding window.
- 6. The method of claim 1, wherein the teacher-student model is pre-trained by: determining the type of membrane disturbance caused by the polypeptide molecule according to a plurality of preset rules based on the quantiles of key indicators in each sliding window, and labeling the sliding windows that satisfy these rules, wherein the key indicators comprise the quantiles of the total contact number signal, the insertion depth signal and the insertion intensity signal, and the high-frequency energy ratio; and performing semi-supervised self-training of the teacher-student model using the feature vectors of the labeled and unlabeled sliding windows.
- 7. The method of claim 6, wherein determining the type of membrane disturbance caused by the polypeptide molecule according to the preset rules comprises: when the quantiles of the total contact number signal, the insertion depth signal and the insertion intensity signal in the sliding window are all less than or equal to their respective 30% quantiles, determining that there is no disturbance; when the quantiles of the total contact number signal and the insertion intensity signal in the sliding window are greater than or equal to their respective 70% quantiles, determining that the disturbance is surface adsorption; when the quantiles of the insertion depth signal and the insertion intensity signal in the sliding window are greater than or equal to their respective 70% quantiles, determining that the disturbance is shallow insertion; and when the rule-based disturbance score in the sliding window is greater than or equal to its 70% quantile and the high-frequency energy ratio is greater than or equal to 75%, determining that the disturbance is membrane perforation, wherein the rule-based disturbance score is obtained by normalizing the score of each key indicator.
- 8. The method of claim 6, wherein performing semi-supervised self-training of the teacher-student model using the feature vectors of the labeled and unlabeled sliding windows comprises: training a teacher model with the feature vectors of the labeled, consecutive sliding windows as input and their labels as output; using the trained teacher model to predict, from the feature vector of each unlabeled sliding window, the probability of each disturbance category and a confidence score; and training a student model with the high-confidence unlabeled sliding windows together with the feature vectors of the labeled consecutive sliding windows, iterating until the set of newly added samples converges.
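Claim 1 samples the trajectory with a fixed-length sliding window and ends with a weighted vote over the window-level predictions. The Python sketch below assumes windows are cut with a fixed stride and that each window's vote weight is simply its confidence score; the helper names `sliding_windows` and `weighted_vote` and the `stride` parameter are illustrative, not taken from the application.

```python
from collections import defaultdict


def sliding_windows(n_frames: int, window: int, stride: int) -> list[tuple[int, int]]:
    """Half-open [start, end) frame ranges of fixed-length windows (stride is hypothetical)."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]


def weighted_vote(labels, confidences):
    """Each window votes for its predicted category with weight equal to its confidence.

    Returns the winning category and the normalized vote share of every category."""
    scores = defaultdict(float)
    for label, conf in zip(labels, confidences):
        scores[label] += float(conf)
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("need at least one window with positive confidence")
    winner = max(scores, key=scores.get)
    return winner, {label: s / total for label, s in scores.items()}


print(sliding_windows(10, 4, 2))  # [(0, 4), (2, 6), (4, 8), (6, 10)]
print(weighted_vote(["adsorption", "insertion", "insertion"], [0.9, 0.5, 0.3])[0])  # adsorption
```

With these three windows a plain majority vote would pick insertion, whereas confidence weighting picks adsorption (0.9 against 0.5 + 0.3 = 0.8).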
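Claim 2 names two standard ways of removing periodic-boundary jumps: minimum-image distances and incremental unwrapping. A minimal NumPy sketch of both follows, assuming an orthorhombic box of constant size (a trajectory with a fluctuating box would need per-frame box lengths); `minimum_image_distance` and `unwrap_trajectory` are hypothetical names.

```python
import numpy as np


def minimum_image_distance(a, b, box):
    """Distance between points a and b under the minimum image convention."""
    box = np.asarray(box, float)
    d = np.asarray(b, float) - np.asarray(a, float)
    d -= box * np.round(d / box)  # fold the separation into [-box/2, box/2]
    return float(np.linalg.norm(d))


def unwrap_trajectory(wrapped, box):
    """Incrementally unwrap coordinates of shape (n_frames, n_atoms, 3).

    Any frame-to-frame step longer than half a box length is treated as a boundary
    crossing and corrected; the corrected steps are accumulated from frame 0."""
    wrapped = np.asarray(wrapped, float)
    box = np.asarray(box, float)
    unwrapped = np.empty_like(wrapped)
    unwrapped[0] = wrapped[0]
    for t in range(1, len(wrapped)):
        step = wrapped[t] - wrapped[t - 1]
        step -= box * np.round(step / box)  # remove the cross-half-box jump
        unwrapped[t] = unwrapped[t - 1] + step
    return unwrapped
```

A particle moving +1 per frame along x in a box of length 10 is stored as 8, 9, 0, 1; unwrapping restores 8, 9, 10, 11.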
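The contact features of claim 3 can be sketched as follows. The array layout, the 0.4 default cutoff and the reading of the final normalization (per-type contact proportion divided by the frame's total contact count) are assumptions, since the claim fixes neither the threshold value nor the exact formula; `contact_signals` is a hypothetical helper.

```python
import numpy as np


def contact_signals(pep_xyz, head_xyz, head_types, box, cutoff=0.4):
    """Per-frame total contacts and per-lipid-type unit contact intensity.

    pep_xyz:    (n_frames, n_pep, 3) polypeptide atom coordinates
    head_xyz:   (n_frames, n_head, 3) lipid head-group atom coordinates
    head_types: (n_head,) lipid type of each head-group atom
    box:        (3,) orthorhombic box lengths, assumed constant
    cutoff:     contact threshold (hypothetical default, same unit as coordinates)
    """
    pep_xyz = np.asarray(pep_xyz, float)
    head_xyz = np.asarray(head_xyz, float)
    head_types = np.asarray(head_types)
    box = np.asarray(box, float)
    types = np.unique(head_types).tolist()
    n_frames = len(pep_xyz)
    total = np.zeros(n_frames)
    intensity = {t: np.zeros(n_frames) for t in types}
    for f in range(n_frames):
        d = head_xyz[f][None, :, :] - pep_xyz[f][:, None, :]  # (n_pep, n_head, 3)
        d -= box * np.round(d / box)  # minimum image convention
        in_contact = np.linalg.norm(d, axis=-1) <= cutoff  # (n_pep, n_head) booleans
        total[f] = in_contact.sum()
        if total[f] == 0:
            continue  # no contacts in this frame: intensities stay 0
        for t in types:
            mask = head_types == t
            proportion = in_contact[:, mask].sum() / mask.sum()  # contacts per head atom of type t
            intensity[t][f] = proportion / total[f]  # normalized by the total contact count
    return total, intensity
```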
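A minimal sketch of the two insertion signals in claim 4, assuming the membrane normal lies along z, the central plane is the mean height of all head-group atoms in the frame, and only head-group atoms of the leaflet on the peptide's side are averaged (the claim does not say which leaflet is used). The selection radius and the name `insertion_signals` are hypothetical, and the centroid is unweighted.

```python
import numpy as np


def insertion_signals(pep_xyz, head_xyz, box, radius=1.5):
    """Per-frame insertion depth and insertion intensity (membrane normal along z).

    pep_xyz: (n_frames, n_pep, 3); head_xyz: (n_frames, n_head, 3); box: (3,) constant.
    radius:  hypothetical lateral selection radius around the projected peptide centroid."""
    pep_xyz = np.asarray(pep_xyz, float)
    head_xyz = np.asarray(head_xyz, float)
    box_xy = np.asarray(box, float)[:2]
    n_frames = len(pep_xyz)
    depth = np.full(n_frames, np.nan)  # NaN where no head-group atom is selected
    intensity = np.zeros(n_frames)
    for f in range(n_frames):
        z_center = head_xyz[f, :, 2].mean()  # central plane height (assumption)
        centroid = pep_xyz[f].mean(axis=0)  # unweighted polypeptide centroid
        intensity[f] = abs(centroid[2] - z_center)  # centroid distance from the central plane
        dxy = head_xyz[f, :, :2] - centroid[:2]
        dxy -= box_xy * np.round(dxy / box_xy)  # lateral minimum image
        near = np.linalg.norm(dxy, axis=1) <= radius
        same_side = np.sign(head_xyz[f, :, 2] - z_center) == np.sign(centroid[2] - z_center)
        sel = near & same_side  # local head groups of the peptide-side leaflet (assumption)
        if sel.any():
            depth[f] = abs(head_xyz[f, sel, 2].mean() - z_center)
    return depth, intensity
```

Frames in which no head-group atom is selected return a NaN depth, which would need to be filled or dropped before feature extraction.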
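Claim 5's fusion step can be sketched with NumPy alone. Here a hand-written complex-Morlet CWT stands in for whatever wavelet library the applicants used, the three bands are equal thirds of a log-spaced scale grid, and the quantiles are fixed at 25/50/75%; all of these choices and the function names are assumptions. With several lipid types, each type's unit contact intensity signal would contribute its own block of statistics.

```python
import numpy as np


def morlet_cwt(x, scales, w0=6.0):
    """Complex-Morlet CWT by direct correlation; a stand-in for a library CWT."""
    x = np.asarray(x, float)
    x = x - x.mean()
    n = len(x)
    coefs = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))  # symmetric support of about +-4 scale units
        t = np.arange(-half, half + 1) / s
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t - t ** 2 / 2) / np.sqrt(s)
        full = np.convolve(x, np.conj(psi[::-1]))  # correlation with the wavelet
        coefs[i] = full[half:half + n]  # keep the n centred samples
    return coefs


def band_energy_ratios(x, n_scales=24):
    """Energy fractions [low, mid, high] from three log-spaced groups of scales."""
    scales = np.geomspace(1.0, max(len(x) / 4, 2.0), n_scales)
    energy = (np.abs(morlet_cwt(x, scales)) ** 2).sum(axis=1)  # energy per scale
    high, mid, low = (b.sum() for b in np.array_split(energy, 3))  # small scale = high frequency
    total = high + mid + low
    return np.array([low, mid, high]) / total if total > 0 else np.zeros(3)


def stats_and_slope(sig):
    """Mean, std, 25/50/75% quantiles, maximum and least-squares slope of one signal."""
    sig = np.asarray(sig, float)
    slope = np.polyfit(np.arange(len(sig)), sig, 1)[0]
    return np.array([sig.mean(), sig.std(), *np.quantile(sig, [0.25, 0.5, 0.75]), sig.max(), slope])


def window_feature_vector(contacts, unit_intensity, depth, intensity):
    """Concatenate per-signal statistics with the band energy ratios of the depth signal."""
    parts = [stats_and_slope(s) for s in (contacts, unit_intensity, depth, intensity)]
    return np.concatenate(parts + [band_energy_ratios(depth)])
```

Small scales correspond to high frequencies, so a rapidly alternating depth signal puts most of its energy in the last of the three ratios.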
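The labeling rules of claim 7 could be applied as below. The sketch assumes each window is first reduced to one summary value per indicator (for example its median), that the 30% and 70% thresholds are percentiles taken across all windows, that the composite score is the mean of min-max-normalized indicators, and that later rules override earlier ones where they overlap; none of these details is fixed by the claim, and `rule_labels` is a hypothetical name.

```python
import numpy as np

LABELS = {-1: "unlabeled", 0: "no disturbance", 1: "surface adsorption",
          2: "shallow insertion", 3: "membrane perforation"}


def _minmax(a):
    """Scale one indicator to [0, 1] across windows (a constant indicator maps to 0)."""
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)


def rule_labels(contacts, depth, intensity, hf_ratio):
    """Weak labels per window from one summary value per window and indicator.

    Windows that match no rule are returned as -1 and left for self-training."""
    c, d, s, h = (np.asarray(a, float) for a in (contacts, depth, intensity, hf_ratio))
    p = np.percentile  # thresholds are percentiles across all windows (assumption)
    score = (_minmax(c) + _minmax(d) + _minmax(s) + _minmax(h)) / 4  # hypothetical composite
    labels = np.full(len(c), -1)
    labels[(c <= p(c, 30)) & (d <= p(d, 30)) & (s <= p(s, 30))] = 0
    labels[(c >= p(c, 70)) & (s >= p(s, 70))] = 1
    labels[(d >= p(d, 70)) & (s >= p(s, 70))] = 2
    labels[(score >= p(score, 70)) & (h >= 0.75)] = 3  # later rules win on overlap (assumption)
    return labels
```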
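Claim 8 does not name the teacher or student model, so the self-training loop below uses a deliberately tiny nearest-centroid classifier as a stand-in. The confidence threshold, the iteration cap, the reuse of each student as the next teacher and the names `CentroidClassifier` and `self_train` are assumptions; any classifier that outputs class probabilities could be substituted.

```python
import numpy as np


class CentroidClassifier:
    """Tiny stand-in model: softmax over negative distances to class centroids."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == k].mean(axis=0) for k in self.classes_])
        return self

    def predict_proba(self, X):
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=-1)
        z = np.exp(-(dist - dist.min(axis=1, keepdims=True)))  # numerically stable softmax
        return z / z.sum(axis=1, keepdims=True)


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    """Teacher-student self-training; returns the final model and a mask of pseudo-labeled windows."""
    X_lab = np.asarray(X_lab, float)
    y_lab = np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab, float)
    model = CentroidClassifier().fit(X_lab, y_lab)  # initial teacher
    used = np.zeros(len(X_unlab), dtype=bool)
    pseudo = np.empty(len(X_unlab), dtype=y_lab.dtype)
    for _ in range(max_iter):
        proba = model.predict_proba(X_unlab)
        new = ~used & (proba.max(axis=1) >= threshold)
        if not new.any():  # converged: no newly added windows
            break
        pseudo[new] = model.classes_[proba[new].argmax(axis=1)]  # freeze labels at acceptance
        used |= new
        X_train = np.vstack([X_lab, X_unlab[used]])
        y_train = np.concatenate([y_lab, pseudo[used]])
        model = CentroidClassifier().fit(X_train, y_train)  # student, reused as the next teacher
    return model, used
```

The loop stops once an iteration adds no new window, which is one reading of iterating "until the newly added samples converge".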
Description
Technical Field
The application belongs to the technical fields of biophysics and computer science, and particularly relates to a dynamic quantification method for the interaction of polypeptide molecules with biological membranes.
Background
Molecular dynamics (MD) simulation is one of the core technologies for revealing the dynamic behavior and interaction mechanisms of biomolecules at the atomic scale, and is widely applied in fields such as drug design and materials science. However, a single MD simulation can produce terabyte-scale trajectory data, and efficiently and accurately extracting valuable physicochemical information from such high-dimensional, complex time series is a key challenge in the field. Time-frequency analysis, in particular the continuous wavelet transform (CWT), is a powerful signal-processing tool whose advantage is that it produces a two-dimensional time-scale map that accurately captures the instantaneous frequency characteristics of non-stationary signals. The prior art combines CWT with unsupervised clustering to identify cooperative motion patterns in the three-dimensional atomic displacements of MD trajectories. Existing MD trajectory post-processing techniques mainly have the following shortcomings:
(1) Reliance on a single indicator carries a risk of false positives. Conventional analysis methods often rely on a few preset statistics, such as root mean square deviation (RMSD), contact count and membrane thickness. Although intuitive, these indicators may behave non-monotonically when describing complex, nonlinear biological processes, leading to wrong conclusions about the physical mechanism.
For example, when studying the interaction between a polypeptide and a cell membrane, the number of contacts with the membrane head groups may decrease as the polypeptide moves from adsorption on the membrane surface to complete insertion into the membrane core, so relying on this indicator alone easily misjudges "deep insertion" as "weak adsorption".
(2) Lack of multi-scale integration of dynamic information. Biological processes often involve dynamic events on different time scales; for example, rapid local vibrations of a molecule (high-frequency events) occur simultaneously with slow global conformational changes (low-frequency events). Traditional methods such as the Fourier transform only provide the global frequency distribution of a signal over a period of time and cannot reveal when a particular dynamic event occurs. It is therefore difficult for the prior art to distinguish transient, high-frequency local membrane depressions caused by molecular insertion from the low-frequency collective undulations of the membrane itself.
(3) Low automation and difficulty of label generation. In recent years, supervised machine learning has been introduced into MD trajectory analysis, but its application is limited by the need for high-quality training data. Because MD trajectories often contain millions of frames, manually labeling the physical mechanism frame by frame is impractical. A more systematic approach is the Markov state model (MSM) workflow; although it can partition states automatically, the workflow is complex, involves multiple steps such as PCA dimensionality reduction and clustering, and is sensitive to parameter choices.
In view of the foregoing, there is a strong need in the art for a new MD trajectory analysis method that is automated, quantitatively accurate, and provides both a process-level interpretation and a global conclusion, so as to overcome the limitations of the prior art.
Disclosure of Invention
Therefore, the application aims to provide a dynamic quantification method for the interaction of polypeptide molecules with biological membranes that simultaneously represents low-frequency collective deformation and medium-to-high-frequency local events through feature fusion of multi-source signals, thereby improving the separability of non-stationary disturbance processes; classifies and scores predefined disturbance categories through semi-supervised self-training; and derives a global conclusion from the window-level process description through confidence-weighted voting. The application provides a dynamic quantification method for the interaction between polypeptide molecules and biological membranes, applied to the post-processing of molecular dynamics simulations, comprising the following steps: preprocessing atomic-level trajectory data obtained by molecular dynamics simulation to eliminate coordinate jumps caused by periodic boundary conditions, and then sampling with a fixed-length sliding window; extracting features from the atomic-level trajectory data in each sliding window to obtain a total contact number signal of polypeptide atoms and