CN-122024844-A - Antigen prediction method and device
Abstract
The invention relates to the technical field of antigen prediction and discloses an antigen prediction method comprising the following steps: Step 1, collecting HA sequences and HI data of known viral proteins; Step 2, preprocessing the HA sequences and the HI data to obtain an original similarity matrix and an original antigen distance matrix, respectively; Step 3, performing adaptive Fourier decomposition based on the maximal selection principle to obtain an enhanced similarity matrix, and constructing a preset model based on PDM and MLP; Step 4, downsampling the enhanced similarity matrix to obtain similarity matrices for three scale windows; Step 5, performing multi-scale mixing on the similarity matrices of the three scale windows to obtain a fused similarity matrix, and training the PDM module of the preset model; Step 6, training the MLP module of the preset model according to the fused similarity matrix and the original antigen distance matrix to obtain an antigen variation prediction model; and Step 7, outputting, by the antigen variation prediction model, the antigen distances of the viruses to be analyzed.
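Step 2 turns raw HI titers into the original antigen distance matrix. The source names a low-rank matrix completion method and an "antigen distance calculation formula" (claim 6) without stating either, so the following is only a minimal sketch under loudly stated assumptions: NaNs and titers below a detection threshold are treated as missing ("low reaction values"), gaps are filled by iterative truncated-SVD imputation in log2 space, the distance follows the antigenic-cartography convention d_ij = max_k log2 H_kj - log2 H_ij, and normalization is min-max to [0, 1]. The function name `antigen_distance_matrix` and its parameters are hypothetical.

```python
import numpy as np


def antigen_distance_matrix(titers, threshold=10.0, rank=1, n_iter=200):
    """HI titer table (rows: test viruses, columns: antisera) -> normalized distance matrix.

    Hypothetical sketch: the threshold, imputation scheme, distance formula and
    normalization are all assumptions, not the patent's stated formulas.
    """
    H = np.asarray(titers, dtype=float)
    # Assumed rule: NaN entries and titers below the threshold count as missing.
    observed = np.isfinite(H) & (np.nan_to_num(H, nan=0.0) >= threshold)
    logH = np.full(H.shape, np.nan)
    logH[observed] = np.log2(H[observed])

    # Low-rank completion by iterative truncated SVD (hard impute) in log2 space.
    X = np.where(observed, logH, np.nanmean(logH))  # start gaps at the observed mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(observed, logH, approx)  # keep measured titers, update gaps only

    # Cartography-style distance: gap to the strongest titer in each antiserum column.
    D = X.max(axis=0, keepdims=True) - X
    # Analogue of Step D3: min-max normalization to [0, 1].
    span = D.max() - D.min()
    return (D - D.min()) / span if span > 0 else np.zeros_like(D)
```

For example, `antigen_distance_matrix([[640, 160, 40], [320, 640, np.nan], [80, 5, 1280]])` returns a 3 x 3 matrix in [0, 1]; the titer of 5 falls below the assumed threshold and is imputed together with the NaN entry.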
Inventors
- QIAN TAO
- YANG ZIFENG
- HAN ZITIAN
- QU WEI
- PAN WEIQI
- WANG YANG
- CHEN RUIHAN
- LIU HANKE
Assignees
- 广州医科大学附属第一医院(广州呼吸中心)
- 澳门科技大学
- 广州呼吸健康研究院(广州呼吸疾病研究所)
Dates
- Publication Date
- 20260512
- Application Date
- 20251230
Claims (8)
- 1. An antigen prediction method, comprising the steps of: Step 1, collecting HA sequences and HI data of known viral proteins; Step 2, preprocessing the HA sequences and the HI data to obtain an original similarity matrix and an original antigen distance matrix, respectively; Step 3, performing adaptive Fourier decomposition based on the maximal selection principle according to the original similarity matrix and the original antigen distance matrix, screening out an important antigenic site set and a similarity set of the important antigenic sites, concatenating the similarity set of the important antigenic sites with the original similarity matrix to obtain an enhanced similarity matrix, and constructing a preset model based on PDM and MLP according to the enhanced similarity matrix and the original antigen distance matrix; Step 4, downsampling the enhanced similarity matrix into three scale windows to obtain similarity matrices of the three scale windows; Step 5, performing multi-scale mixing on the similarity matrices of the three scale windows to obtain a fused similarity matrix, and training the PDM module of the preset model; Step 6, training the MLP module of the preset model according to the fused similarity matrix and the original antigen distance matrix to obtain an antigen variation prediction model; and Step 7, inputting the similarity matrix of a virus to be analyzed into the antigen variation prediction model, which outputs the antigen distance of the virus to be analyzed.
- 2. The antigen prediction method according to claim 1, wherein Step 4 comprises the sub-steps of: Step A1, dividing three scale windows whose resolutions are high, medium and low, respectively; Step A2, downsampling the enhanced similarity matrix and placing the results into the three scale windows, respectively; and Step A3, mapping the enhanced similarity matrix in each scale window into high-dimensional feature vectors through a convolutional mapping to obtain the similarity matrices of the three scale windows.
- 3. The method according to claim 2, wherein in Step A1 the high resolution is n, the medium resolution is [formula not reproduced], and the low resolution is [formula not reproduced], where n is the dimension of each row of the enhanced similarity matrix in Step 3, i.e., the dimension of each sample.
- 4. The antigen prediction method according to claim 1, wherein Step 5 comprises the sub-steps of: Step B1, decomposing the similarity matrix of each scale window into low-frequency channel data and high-frequency channel data; Step B2, processing the high-frequency channel data of the three scale windows in a bottom-up manner to obtain first data to be fused for each of the three scale windows; Step B3, processing the low-frequency channel data of the three scale windows in a top-down manner to obtain second data to be fused for each of the three scale windows; Step B4, training the PDM module of the preset model by back-propagation according to the original antigen distance matrix during the bottom-up and top-down processing; and Step B5, adding the first data to be fused and the second data to be fused in the high-resolution scale window to obtain the fused similarity matrix.
- 5. The antigen prediction method according to claim 1, wherein Step 6 comprises the sub-steps of: Step C1, applying average pooling to the fused similarity matrix to obtain a target similarity matrix; and Step C2, training the MLP module of the preset model by back-propagation according to the target similarity matrix and the original antigen distance matrix to obtain the antigen variation prediction model.
- 6. The antigen prediction method according to claim 1, wherein Step 2 comprises the steps of: Step D1, processing the HA sequences by pattern-induced multiple sequence alignment to obtain the original similarity matrix; Step D2, handling missing data and low reaction values in the HI data by a low-rank matrix completion method, and then applying an antigen distance calculation formula to the HI data to obtain an initial antigen distance matrix; and Step D3, normalizing the initial antigen distance matrix to obtain the original antigen distance matrix.
- 7. The method according to claim 1, wherein in Step 3 the maximal selection principle is: [formula not reproduced]; wherein the symbols denote, respectively, the index of the i-th antigenic site selected by the maximal selection principle, the original antigen distance matrix, and the column vector of the j-th candidate important antigenic site at the i-th iteration after removal of linear correlation; and [formula not reproduced]; wherein the symbols denote, respectively, the column vector corresponding to the selected candidate important antigenic site j and the stored orthogonal basis vectors, which ensure that no linear correlation arises among the selected important antigenic sites.
- 8. An antigen prediction device for implementing the antigen prediction method according to any one of claims 1 to 7, comprising: a data acquisition module for collecting HA sequences and HI data of known viral proteins; a data processing module for preprocessing the HA sequences and the HI data to obtain an original similarity matrix and an original antigen distance matrix, respectively; a feature enhancement module for performing adaptive Fourier decomposition based on the maximal selection principle according to the original similarity matrix and the original antigen distance matrix, screening out an important antigenic site set and a similarity set of the important antigenic sites, concatenating the similarity set of the important antigenic sites with the original similarity matrix to obtain an enhanced similarity matrix, and constructing a preset model based on PDM and MLP according to the enhanced similarity matrix and the original antigen distance matrix; a scale decomposition module for downsampling the enhanced similarity matrix into three scale windows to obtain similarity matrices of the three scale windows; a scale mixing module for performing multi-scale mixing on the similarity matrices of the three scale windows to obtain a fused similarity matrix and training the PDM module of the preset model; an MLP training module for training the MLP module of the preset model according to the fused similarity matrix to obtain an antigen variation prediction model; and a prediction module for inputting the similarity matrix of a virus to be analyzed into the antigen variation prediction model, which outputs the antigen distance of the virus to be analyzed.
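Claims 2 to 5 describe a forward pass: three-window downsampling, a split into low- and high-frequency channels, bottom-up mixing of the high-frequency channel, top-down mixing of the low-frequency channel, fusion in the high-resolution window, then average pooling and an MLP. The sketch below only illustrates that data flow under assumptions: medium and low resolutions of n/2 and n/4 (claim 3's formulas are not reproduced in this text), non-overlapping average pooling for downsampling, a moving average for the frequency split, and random untrained linear maps standing in for the learned PDM and MLP layers. The convolutional embedding of Step A3 and all back-propagation training are omitted. The bottom-up/top-down split resembles the Past-Decomposable-Mixing (PDM) block of the TimeMixer time-series model; whether the patent's PDM module follows that design is an assumption. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def avg_pool(x, k):
    """Non-overlapping average pooling along the last axis (length must be divisible by k)."""
    b, n = x.shape
    return x.reshape(b, n // k, k).mean(axis=2)


def moving_average(x, k=3):
    """Low-frequency part: length-preserving moving average with edge padding (k odd)."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    kernel = np.ones(k) / k
    return np.stack([np.convolve(row, kernel, mode="valid") for row in xp])


def linear(n_in, n_out):
    """Random, untrained linear map standing in for a learned layer (hypothetical)."""
    W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return lambda x: x @ W


def multiscale_mix(S):
    """S: (batch, n) enhanced similarity rows, n divisible by 4. Returns (batch, n)."""
    scales = [S, avg_pool(S, 2), avg_pool(S, 4)]   # high, medium, low (assumed n, n/2, n/4)
    low = [moving_average(x) for x in scales]       # Step B1: low-frequency channel
    high = [x - t for x, t in zip(scales, low)]     # Step B1: high-frequency channel
    for i in (0, 1):                                # Step B2: bottom-up, fine -> coarse
        high[i + 1] = high[i + 1] + linear(high[i].shape[1], high[i + 1].shape[1])(high[i])
    for i in (1, 0):                                # Step B3: top-down, coarse -> fine
        low[i] = low[i] + linear(low[i + 1].shape[1], low[i].shape[1])(low[i + 1])
    return high[0] + low[0]                         # Step B5: add in the high-resolution window


def predict_distance(S, hidden=16):
    """Step C1 then an untrained two-layer MLP giving one distance per input row."""
    pooled = avg_pool(multiscale_mix(S), 2)         # Step C1 (pooling window 2 assumed)
    h = np.maximum(0.0, linear(pooled.shape[1], hidden)(pooled))  # ReLU hidden layer
    return linear(hidden, 1)(h).ravel()
```

In this simplified pass the bottom-up step leaves the high-resolution high-frequency channel unchanged, so the fused output receives coarse-scale information only through the top-down low-frequency path; how the coarser windows enter the training of Step B4 is not specified in the source.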
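Claim 7 defines the maximal selection principle only verbally (its formulas are not reproduced in this text): at iteration i, each candidate site's column vector is orthogonalized against the stored basis, and the index that best matches the antigen distance data is kept. The following is a minimal greedy sketch of that idea with Gram-Schmidt orthogonalization, not the patent's exact formula. Assumptions: candidate sites are the columns of a matrix `X`, the target `Y` holds antigen distance values in rows aligned with `X`, and the selection score is the norm of the projection of `Y` onto the normalized, orthogonalized candidate. The name `maximal_selection` and its parameters are hypothetical.

```python
import numpy as np


def maximal_selection(X, Y, n_sites, eps=1e-10):
    """Greedily pick columns of X (candidate sites) that best explain Y.

    X: (n_samples, n_candidates) per-site features (hypothetical layout).
    Y: (n_samples,) or (n_samples, m) antigen distance target.
    Returns the selected column indices and the orthonormal basis they span.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(X.shape[0], -1)
    basis, selected = [], []
    for _ in range(n_sites):
        best_j, best_score, best_q = None, -1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            v = X[:, j].copy()
            for q in basis:                  # Gram-Schmidt against the stored basis
                v -= (q @ v) * q
            norm = np.linalg.norm(v)
            if norm < eps:                   # linearly dependent on sites already chosen
                continue
            q_j = v / norm
            score = np.linalg.norm(Y.T @ q_j)  # size of Y's projection onto q_j
            if score > best_score:
                best_j, best_score, best_q = j, score, q_j
        if best_j is None:                   # no independent candidate left
            break
        selected.append(best_j)
        basis.append(best_q)
    Q = np.column_stack(basis) if basis else np.empty((X.shape[0], 0))
    return selected, Q
```

Because every accepted vector is orthogonalized against the stored basis, a candidate that duplicates an already selected site has zero residual norm and is never chosen, which mirrors the stated purpose of the stored orthogonal basis vectors.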
Description
Antigen prediction method and device
Technical Field
The invention relates to the technical field of antigen prediction, in particular to an antigen prediction method and device.
Background
Pandemic influenza viruses acquire gene segments from waterfowl hosts through reassortment, so that the virus expresses a novel HA surface glycoprotein to which most people have little or no immunity. Genetic variation can lead to antigenic variation, but the two are not always correlated. Vaccination efforts have had limited success because of insufficient vaccine coverage and mismatches between vaccine strains and circulating strains. Genetic variation has driven extensive evolution of the HA amino acid sequence, particularly in the HA1 head domain. Vaccination is the most effective means of preventing influenza-related morbidity and mortality. Although antigenic evolution is a major challenge in vaccine preparation, the genetic distance between vaccine strains and circulating strains may be predictive of vaccine efficacy. Predicting influenza virus antigenicity, and with it the trend of antigenic evolution, is therefore important for responding to avian influenza A virus epidemics and for vaccine research, development and production. However, existing prediction methods, such as random forest algorithms and LSTM models, have limitations: their feature selection and weight allocation are often insufficiently adaptive, which makes it difficult to capture subtle changes in antigenic evolution accurately. The problem addressed by this scheme is therefore how to design a more accurate antigen prediction method.
Disclosure of Invention
The main aim of the invention is to provide an antigen prediction method in which, first, key sites of antigenic evolution are screened out by adaptive Fourier decomposition based on the maximal selection principle; then, information is passed between different spatial scales through multi-scale decomposition (dividing high-, medium- and low-resolution scale windows) and multi-scale mixing, so that the globally optimal feature combination is captured and prediction accuracy is improved; and finally, the linear mapping mechanism of a pure MLP architecture is used to avoid the quadratic complexity and vanishing-gradient problems of attention-mechanism computation, which enhances the stability of the model architecture. An antigen prediction device is also provided.
To achieve the above purpose, the present application adopts the following technical scheme:
An antigen prediction method comprising the steps of: Step 1, collecting HA sequences and HI data of known viral proteins; Step 2, preprocessing the HA sequences and the HI data to obtain an original similarity matrix and an original antigen distance matrix, respectively; Step 3, performing adaptive Fourier decomposition based on the maximal selection principle according to the original similarity matrix and the original antigen distance matrix, screening out an important antigenic site set and a similarity set of the important antigenic sites, concatenating the similarity set of the important antigenic sites with the original similarity matrix to obtain an enhanced similarity matrix, and constructing a preset model based on PDM and MLP according to the enhanced similarity matrix and the original antigen distance matrix; Step 4, downsampling the enhanced similarity matrix into three scale windows to obtain similarity matrices of the three scale windows; Step 5, performing multi-scale mixing on the similarity matrices of the three scale windows to obtain a fused similarity matrix, and training the PDM module of the preset model; Step 6, training the MLP module of the preset model according to the fused similarity matrix and the original antigen distance matrix to obtain an antigen variation prediction model; and Step 7, inputting the similarity matrix of a virus to be analyzed into the antigen variation prediction model, which outputs the antigen distance of the virus to be analyzed.
Preferably, Step 4 includes the following sub-steps: Step A1, dividing three scale windows whose resolutions are high, medium and low, respectively; Step A2, downsampling the enhanced similarity matrix and placing the results into the three scale windows, respectively; and Step A3, mapping the enhanced similarity matrix in each scale window into high-dimensional feature vectors through a convolutional mapping to obtain the similarity matrices of the three scale windows.
Preferably, in Step A1, the high resolution is n, the medium resolution is [formula not reproduced], and the low resolution is [formula not reproduced], where n is the dimension of each row of the enhanced similarity matrix in Step 3, i.e., the dimension of each