
CN-121999855-A - Method and device for predicting interaction between nanobody and antigen


Abstract

The application relates to the technical field of bioinformatics, and in particular to a method and a device for predicting the interaction between a nanobody and an antigen. The method comprises: obtaining sequence features of the nanobody and the antigen with a protein large language model and generating from them an initial nanobody-antigen interaction probability; using an antigen binding site prediction model to determine, from those sequence features, the estimated binding site on the antigen; extracting the corresponding target features and fusing them with the sequence features; and updating the initial interaction probability with the fused features to obtain the final nanobody-antigen interaction prediction. By emphasizing the amino acid sites on the binding interface when predicting nanobody-antigen interaction, the method markedly improves prediction performance.

Inventors

  • LIU MIN
  • DENG JUNTAO
  • DONG MINGYU
  • LIU TAO
  • ZHANG YABIN
  • GU MIAO

Assignees

  • Tsinghua University (清华大学)

Dates

Publication Date
2026-05-08
Application Date
2024-11-01

Claims (10)

  1. A method for predicting the interaction between a nanobody and an antigen, comprising the steps of: extracting features of a target amino acid sequence with a protein large language model to obtain sequence features of the nanobody and the antigen that satisfy preset conditions; generating an initial nanobody-antigen interaction probability from the sequence features of the nanobody and the antigen, and predicting antigen binding sites from the same sequence features with an antigen binding site prediction model to obtain estimated binding sites of the antigen; and extracting target features of the estimated antigen binding sites with a preset prompt encoder, fusing the target features with the sequence features of the nanobody and the antigen to obtain fused features, and updating the initial nanobody-antigen interaction probability with the fused features to obtain a final prediction of the nanobody-antigen interaction.
  2. The method according to claim 1, wherein extracting the features of the amino acid sequence with the protein large language model to obtain the sequence features of the nanobody and the antigen that satisfy the preset conditions comprises: representing each amino acid in the sequence by the embedding vector output by the last hidden layer of the protein large language model; and pooling the per-amino-acid features under a target pooling strategy comprising minimum pooling, average pooling and maximum pooling, so as to obtain the sequence features of the nanobody and the antigen that satisfy the preset conditions.
  3. The method according to claim 2, wherein generating the initial nanobody-antigen interaction probability from the sequence features of the nanobody and the antigen comprises: constructing, from those sequence features, a first predicted interaction probability based on the minimum pooling strategy, a second predicted interaction probability based on the average pooling strategy, and a third predicted interaction probability based on the maximum pooling strategy; and generating the initial nanobody-antigen interaction probability from the first, second and third predicted interaction probabilities.
  4. The method according to claim 1, wherein predicting the antigen binding sites from the sequence features of the nanobody and the antigen with the antigen binding site prediction model to obtain the estimated binding sites of the antigen comprises: in the antigen binding site prediction model, average-pooling the features of each nanobody residue to obtain a final nanobody feature; concatenating the final nanobody feature with the feature of each antigen residue to obtain concatenated features; and inputting the concatenated features into a target neural network composed of a plurality of residual blocks to obtain the estimated binding sites of the antigen.
  5. The method according to claim 1, wherein extracting the target features of the estimated antigen binding sites with the preset prompt encoder and fusing them with the sequence features of the nanobody and the antigen to obtain the fused features comprises: obtaining an attention representation of the estimated antigen binding sites with the preset prompt encoder; determining an attention embedding of the estimated antigen binding sites from the attention representation; and fusing the attention embedding into the sequence features of the nanobody and the antigen to obtain the fused features.
  6. The method according to claim 1, wherein the initial nanobody-antigen interaction probability is calculated by a formula in which X_1 and X_2 denote the amino acid sequences of the nanobody and the antigen, respectively; the other symbols denote the feature-mapping function of the protein large language model ESM-2 and the i-th multi-layer perceptron; and P_min, P_mean and P_max denote the three pooling strategies (minimum pooling, average pooling and maximum pooling), respectively.
  7. A device for predicting the interaction between a nanobody and an antigen, comprising: an extraction module configured to extract features of a target amino acid sequence with a protein large language model to obtain sequence features of the nanobody and the antigen that satisfy preset conditions; a generation module configured to generate an initial nanobody-antigen interaction probability from the sequence features of the nanobody and the antigen, and to predict antigen binding sites from the same sequence features with an antigen binding site prediction model to obtain estimated binding sites of the antigen; and a prediction module configured to extract target features of the estimated antigen binding sites with a preset prompt encoder, fuse the target features with the sequence features of the nanobody and the antigen to obtain fused features, and update the initial nanobody-antigen interaction probability with the fused features to obtain a final nanobody-antigen interaction prediction.
  8. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the nanobody-antigen interaction prediction method according to any one of claims 1 to 6.
  9. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the nanobody-antigen interaction prediction method according to any one of claims 1 to 6.
  10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the nanobody-antigen interaction prediction method according to any one of claims 1 to 6.
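
Claims 2, 3 and 6 describe turning per-residue language-model embeddings into pooled sequence features and an initial interaction probability. The Python sketch below illustrates that flow under stated assumptions: random NumPy arrays stand in for ESM-2 last-hidden-layer embeddings, the pooled nanobody and antigen vectors are concatenated before each MLP, the MLPs are untrained two-layer networks, and the three per-pooling probabilities are combined by a plain average. The claims do not disclose the combination rule or the MLP architecture, so these choices and all names (`pool`, `MLP`, `initial_probability`) are hypothetical.

```python
import numpy as np


def pool(residue_emb: np.ndarray) -> dict:
    """Min, mean and max pooling over the residue axis (claim 2).

    residue_emb has shape (L, d): one embedding vector per amino acid.
    """
    return {
        "min": residue_emb.min(axis=0),
        "mean": residue_emb.mean(axis=0),
        "max": residue_emb.max(axis=0),
    }


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MLP:
    """Hypothetical two-layer perceptron with random (untrained) weights."""

    def __init__(self, in_dim: int, hidden: int, rng: np.random.Generator):
        self.w1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, x: np.ndarray) -> float:
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return float(sigmoid(h @ self.w2 + self.b2)[0])


def initial_probability(nb_emb, ag_emb, mlps):
    """Claim 3: one probability per pooling strategy, then combine them.

    ASSUMPTION: the three probabilities are averaged; the claims only say
    the initial probability is generated from all three.
    """
    nb_pooled, ag_pooled = pool(nb_emb), pool(ag_emb)
    probs = []
    for i, name in enumerate(("min", "mean", "max")):
        pair = np.concatenate([nb_pooled[name], ag_pooled[name]])
        probs.append(mlps[i](pair))  # i-th MLP for the i-th pooling strategy
    return float(np.mean(probs)), probs


# Demo with stand-in embeddings (real ESM-2 widths are far larger than 8).
rng = np.random.default_rng(0)
d = 8
nb_emb = rng.normal(size=(120, d))  # stand-in for nanobody residue embeddings
ag_emb = rng.normal(size=(300, d))  # stand-in for antigen residue embeddings
mlps = [MLP(2 * d, 16, rng) for _ in range(3)]
p0, per_pool = initial_probability(nb_emb, ag_emb, mlps)
print(round(p0, 3), [round(p, 3) for p in per_pool])
```

Averaging is only one option; a learned weighting of the three branches would fit the claim wording equally well.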
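
Claim 4 specifies the binding-site predictor's data flow: average-pool the nanobody residues, concatenate that vector with every antigen residue, and pass the result through stacked residual blocks. A minimal NumPy sketch follows; the residual-block form (`x + relu(x W1) W2`), the sigmoid output head, the random untrained weights and the 0.5 threshold used to pick estimated binding sites are all assumptions, and the class names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class ResidualBlock:
    """Hypothetical residual block x + relu(x @ W1) @ W2, applied per residue."""

    def __init__(self, dim: int, rng: np.random.Generator):
        self.w1 = rng.normal(scale=0.1, size=(dim, dim))
        self.w2 = rng.normal(scale=0.1, size=(dim, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x + relu(x @ self.w1) @ self.w2


class BindingSitePredictor:
    """Scores every antigen residue as a candidate binding site (claim 4)."""

    def __init__(self, d: int, n_blocks: int, rng: np.random.Generator):
        dim = 2 * d  # nanobody vector concatenated with one antigen residue
        self.blocks = [ResidualBlock(dim, rng) for _ in range(n_blocks)]
        self.w_out = rng.normal(scale=0.1, size=(dim,))  # assumed linear head

    def __call__(self, nb_emb: np.ndarray, ag_emb: np.ndarray) -> np.ndarray:
        nb_vec = nb_emb.mean(axis=0)  # average-pool nanobody residues
        tiled = np.broadcast_to(nb_vec, (ag_emb.shape[0], nb_vec.shape[0]))
        x = np.concatenate([tiled, ag_emb], axis=1)  # shape (L_antigen, 2d)
        for block in self.blocks:
            x = block(x)
        logits = x @ self.w_out  # one logit per antigen residue
        return 1.0 / (1.0 + np.exp(-logits))


# Demo with stand-in embeddings.
rng = np.random.default_rng(1)
d = 8
nb_emb = rng.normal(size=(120, d))
ag_emb = rng.normal(size=(300, d))
predictor = BindingSitePredictor(d, n_blocks=3, rng=rng)
site_prob = predictor(nb_emb, ag_emb)
site_idx = np.flatnonzero(site_prob > 0.5)  # ASSUMPTION: 0.5 threshold
print(site_prob.shape, site_idx[:10])
```

Because the nanobody is reduced to a single averaged vector, the per-residue scores do not depend on the order of nanobody residues; all per-residue variation comes from the concatenated antigen residue features.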
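
Claim 5 fuses an attention embedding of the estimated binding sites with the sequence features, and claim 1 then uses the fused features to update the initial probability. The sketch below is one possible reading, not the patented design: the prompt encoder is assumed to be single-head scaled dot-product attention with the pooled nanobody vector as query and the predicted binding-site residues as keys and values, fusion is assumed to be concatenation, and the update is assumed to add a linear correction to the logit of the initial probability. `PromptEncoder`, `update_probability` and `w_fuse` are hypothetical names.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


def logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)
    return float(np.log(p / (1.0 - p)))


class PromptEncoder:
    """Hypothetical single-head attention over predicted binding-site residues."""

    def __init__(self, d: int, rng: np.random.Generator):
        self.wq = rng.normal(scale=0.1, size=(d, d))
        self.wk = rng.normal(scale=0.1, size=(d, d))
        self.wv = rng.normal(scale=0.1, size=(d, d))

    def __call__(self, nb_vec: np.ndarray, site_emb: np.ndarray):
        q = nb_vec @ self.wq    # query from the pooled nanobody, shape (d,)
        k = site_emb @ self.wk  # keys, shape (n_sites, d)
        v = site_emb @ self.wv  # values, shape (n_sites, d)
        weights = softmax(k @ q / np.sqrt(q.shape[0]))  # attention weights
        return weights @ v, weights  # attention embedding and its weights


def update_probability(p0, nb_emb, ag_emb, site_idx, encoder, w_fuse):
    nb_vec = nb_emb.mean(axis=0)
    ag_vec = ag_emb.mean(axis=0)
    if site_idx.size == 0:
        return p0  # ASSUMPTION: no predicted site leaves p0 unchanged
    att_emb, _ = encoder(nb_vec, ag_emb[site_idx])
    fused = np.concatenate([nb_vec, ag_vec, att_emb])  # fusion by concatenation
    delta = float(fused @ w_fuse)  # hypothetical linear correction head
    return float(1.0 / (1.0 + np.exp(-(logit(p0) + delta))))


# Demo with stand-in embeddings and a hand-picked site list.
rng = np.random.default_rng(2)
d = 8
nb_emb = rng.normal(size=(120, d))
ag_emb = rng.normal(size=(300, d))
site_idx = np.array([10, 11, 12, 57, 58])  # e.g. output of a binding-site model
encoder = PromptEncoder(d, rng)
w_fuse = rng.normal(scale=0.1, size=(3 * d,))
p_final = update_probability(0.62, nb_emb, ag_emb, site_idx, encoder, w_fuse)
print(round(p_final, 3))
```

Working in logit space keeps the updated value a valid probability and makes the update a no-op when the correction term is zero.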

Description

Method and device for predicting interaction between nanobody and antigen

Technical Field

The application relates to bioinformatics, artificial intelligence, biomedicine and related technical fields, and in particular to a method and a device for predicting the interaction between a nanobody and an antigen.

Background

Some studies have applied molecular dynamics or machine learning methods to nanobody-antigen interaction (NAI) prediction. However, most of these methods require the precise structures of the nanobody and the antigen, which limits their applicability to NAI prediction. With the development of next-generation sequencing, protein sequence information can now be obtained quickly and at low cost, so there is a more pressing need for efficient sequence-based NAI prediction methods. Because nanobodies and antigens are both proteins, existing sequence-based protein interaction prediction methods can serve as a reference for NAI prediction. However, applying conventional protein interaction prediction methods to NAI prediction often leads to negative transfer, possibly because nanobody sequences differ substantially in pattern from general protein sequences. Under the "sequence-structure-function" paradigm, the amino acid sequence of a protein determines its spatial structure, which in turn determines its function; accordingly, sequence-based protein interaction methods must first characterize the protein sequences before they can predict protein interactions.
Previous studies have shown that protein representations from large language models perform better on many different types of tasks, including protein-protein binding affinity prediction, and the current state-of-the-art sequence-based protein interaction prediction methods, D-SCRIPT and Topsy-Turvy, both use protein large language models in their design. However, most related work feeds protein embeddings pre-trained by a language model directly into a protein interaction prediction model as input, without making full use of the features obtained through this unsupervised training. In addition, current sequence-based protein interaction prediction methods treat all sites on a protein sequence as equally important in the model input, which reduces their prediction performance.

Disclosure of Invention

The present application is based on the inventors' recognition and understanding of the following problems:

Nanobodies are protein fragments derived from the variable domains of the heavy-chain antibodies found in camelids and sharks. Compared with conventional monoclonal antibodies, nanobodies retain the ability to bind antigens specifically while having a smaller molecular weight, lower immunogenicity and stronger tissue penetration. Nanobodies can form various non-covalent bonds with antigens, and the nanobody-antigen interaction (NAI) problem is an important branch of the protein-protein interaction (PPI) problem; research on it has important implications for elucidating immune mechanisms and for de novo nanobody design. In recent years, nanobodies have developed rapidly and have been widely used in detection, therapy and other fields. Public nanobody databases continue to be released, which has stimulated research into related methods.
However, existing research has focused mainly on structure prediction, naturalness evaluation or binding site prediction for nanobodies. Few studies have applied deep learning to NAI prediction, although this research direction has been attracting increasing attention in biomedicine, bioinformatics and artificial intelligence. As targeted protein drugs, nanobodies are valued above all for their ability to bind specifically to the antigen of interest. The regions through which an antibody and an antigen interact are called the paratope and the epitope, respectively. In both nanobodies and conventional IgG (immunoglobulin G), the paratope is generally located on the complementarity-determining regions (CDRs) of the variable (V) region. The difference is that conventional IgG recognizes an antigen through the CDRs of both the heavy and light chains, whereas nanobodies have only a heavy chain and therefore rely solely on its CDRs to recognize an antigen. To compensate for the reduced sequence diversity caused by the missing light chain, the CDR3 region of nanobodies is longer than that of conventional IgG. Thus, although smaller than conventional IgG, nanobodies can still bind specifically to many types of antigens. In recent years, some attention and research has been paid