CN-121999876-A - Amino acid sequence generation model training method and antibody generation method

CN121999876A

Abstract

The embodiments of this specification provide an amino acid sequence generation model training method and an antibody generation method. The training method comprises: obtaining an amino acid sequence sample data set for a target project; obtaining a project protein scoring model for the target project according to the amino acid sequence sample data set; generating a predicted amino acid sequence according to a preset amino acid sequence generation model; obtaining an amino acid sequence score corresponding to the predicted amino acid sequence based on a preset general protein scoring model and the project protein scoring model; and training the amino acid sequence generation model according to the amino acid sequence score, continuing until a generation model training stop condition is reached. Because the amino acid sequence generation model is trained with the project protein scoring model and the general protein scoring model in combination, the accuracy and adaptability of amino acid sequence generation are improved, and sequences with greater potential are generated for the target project.
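As a non-limiting illustration, the control flow summarized above can be sketched in Python. The source does not say how the two scores are combined, so the weighted average here is an assumption, and `combined_score`, `should_stop`, `train`, `project_weight`, and the callables passed in are hypothetical names introduced only for this sketch. The stop rule follows the condition recited in the claims (a run of consecutive high-score rounds longer than a threshold, and/or a preset number of rounds).

```python
from typing import Callable, List


def combined_score(seq: str,
                   general_scorer: Callable[[str], float],
                   project_scorer: Callable[[str], float],
                   project_weight: float = 0.5) -> float:
    """Combine the two scores. A weighted average is an ASSUMPTION;
    the source only says the score is based on both models."""
    first = general_scorer(seq)    # basic quality of the sequence
    second = project_scorer(seq)   # potential for the target project
    return (1.0 - project_weight) * first + project_weight * second


def should_stop(scores: List[float], score_threshold: float,
                max_consecutive: int, max_rounds: int) -> bool:
    """Stop when the round budget is used up, or when the trailing run of
    high-score rounds is longer than max_consecutive."""
    if len(scores) >= max_rounds:
        return True
    run = 0
    for s in reversed(scores):
        if s < score_threshold:
            break
        run += 1
    return run > max_consecutive


def train(generate: Callable[[], str],
          update: Callable[[str, float], None],
          general_scorer: Callable[[str], float],
          project_scorer: Callable[[str], float],
          score_threshold: float = 0.8,
          max_consecutive: int = 2,
          max_rounds: int = 100) -> List[float]:
    """Generate, score, update, and repeat until the stop condition holds."""
    scores: List[float] = []
    while not should_stop(scores, score_threshold, max_consecutive, max_rounds):
        seq = generate()
        score = combined_score(seq, general_scorer, project_scorer)
        update(seq, score)  # hypothetical hook that adjusts the generator
        scores.append(score)
    return scores
```

In this sketch `generate` and `update` stand in for the amino acid sequence generation model and its training step; any real scoring models would be supplied through `general_scorer` and `project_scorer`.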

Inventors

  • LI MINGYANG
  • YIN MINGZE
  • FU KUN
  • WANG ZHENG

Assignees

  • 阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)

Dates

Publication Date
20260508
Application Date
20241107

Claims (17)

  1. An amino acid sequence generation model training method, comprising: acquiring an amino acid sequence sample data set for a target project, and obtaining a project protein scoring model for the target project according to the amino acid sequence sample data set; generating a predicted amino acid sequence according to a preset amino acid sequence generation model; acquiring an amino acid sequence score corresponding to the predicted amino acid sequence based on a preset general protein scoring model and the project protein scoring model, wherein the amino acid sequence score characterizes the adaptation potential, for the target project, of the protein corresponding to the amino acid sequence; and training the amino acid sequence generation model according to the amino acid sequence score, and continuing to train the amino acid sequence generation model until a generation model training stop condition is reached.
  2. The method of claim 1, wherein acquiring the amino acid sequence sample data set for the target project comprises: acquiring at least one piece of positive sample amino acid sequence data for the target project, wherein the positive sample amino acid sequence data corresponds to an antibody that is active for the target project; obtaining at least one piece of negative sample amino acid sequence data based on each piece of positive sample amino acid sequence data; and generating the amino acid sequence sample data set from the positive sample amino acid sequence data and the negative sample amino acid sequence data.
  3. The method of claim 2, wherein obtaining at least one piece of negative sample amino acid sequence data based on each piece of positive sample amino acid sequence data comprises: determining target positive sample amino acid sequence data, wherein the target positive sample amino acid sequence data is any one piece of the positive sample amino acid sequence data; acquiring at least one positive sample amino acid sequence fragment based on the target positive sample amino acid sequence data; and generating at least one piece of negative sample amino acid sequence data from each positive sample amino acid sequence fragment.
  4. The method of claim 1, wherein obtaining the project protein scoring model for the target project according to the amino acid sequence sample data set comprises: obtaining, from the amino acid sequence sample data set, target amino acid sequence sample data and a sample score corresponding to the target amino acid sequence sample data, wherein the amino acid sequence sample data set comprises at least one piece of amino acid sequence sample data, and the target amino acid sequence sample data is any one piece of the amino acid sequence sample data; inputting the target amino acid sequence sample data into the project protein scoring model, and obtaining a predicted score output by the project protein scoring model; and training the project protein scoring model based on the sample score and the predicted score, and continuing to train the project protein scoring model until a scoring model training stop condition is reached.
  5. The method of claim 1, wherein generating the predicted amino acid sequence according to the preset amino acid sequence generation model comprises: determining at least one amino acid sequence length from the amino acid sequence sample data set; and generating, by the amino acid sequence generation model, the predicted amino acid sequence based on a target amino acid sequence length, wherein the target amino acid sequence length is any one of the amino acid sequence lengths.
  6. The method of claim 5, wherein generating, by the amino acid sequence generation model, the predicted amino acid sequence based on the target amino acid sequence length comprises: inputting a preset initial amino acid sequence into the amino acid sequence generation model, obtaining a subsequent amino acid generated by the amino acid sequence generation model, and splicing the subsequent amino acid onto the initial amino acid sequence; and continuing to input the initial amino acid sequence, as extended by the spliced amino acids, into the amino acid sequence generation model and to obtain the subsequent amino acid generated by the amino acid sequence generation model, until the length of the extended initial amino acid sequence reaches the target amino acid sequence length.
  7. The method of any one of claims 1-6, wherein acquiring the amino acid sequence score corresponding to the predicted amino acid sequence based on the preset general protein scoring model and the project protein scoring model comprises: inputting the predicted amino acid sequence into the general protein scoring model, and obtaining a first sequence score generated by the general protein scoring model, wherein the first sequence score characterizes the basic quality of the predicted amino acid sequence; inputting the predicted amino acid sequence into the project protein scoring model, and obtaining a second sequence score generated by the project protein scoring model, wherein the second sequence score characterizes the potential of the predicted amino acid sequence for the target project; and determining the amino acid sequence score corresponding to the predicted amino acid sequence according to the first sequence score and the second sequence score.
  8. The method of any one of claims 1-6, wherein training the amino acid sequence generation model according to the amino acid sequence score comprises: determining reward information corresponding to the predicted amino acid sequence based on the amino acid sequence score; and adjusting the amino acid sequence generation model according to the reward information.
  9. The method of any one of claims 1-6, wherein the generation model training stop condition comprises: the number of consecutive high-score rounds being greater than a threshold, wherein a high-score round is a training round whose sequence score is greater than or equal to a preset threshold; and/or the number of training rounds reaching a preset number of training rounds.
  10. An antibody generation method, comprising: obtaining an amino acid sequence generation model for a target project, wherein the amino acid sequence generation model is trained by the method of any one of claims 1-9; generating at least one antibody amino acid sequence corresponding to the target project according to the amino acid sequence generation model; and obtaining an antibody characteristic protein corresponding to each antibody amino acid sequence based on that antibody amino acid sequence, and obtaining at least one antibody corresponding to the target project according to the antibody characteristic proteins.
  11. The method of claim 10, wherein obtaining the antibody characteristic protein corresponding to each antibody amino acid sequence based on that antibody amino acid sequence comprises: obtaining a protein folding strategy and determining a target antibody amino acid sequence, wherein the target antibody amino acid sequence is any one of the antibody amino acid sequences; and folding the target antibody amino acid sequence according to the protein folding strategy to generate the antibody characteristic protein corresponding to the target antibody amino acid sequence.
  12. An amino acid sequence generation model training method applied to a cloud-side device, comprising: receiving a model acquisition instruction sent by a terminal-side device, wherein the model acquisition instruction comprises an amino acid sequence sample data set for a target project; obtaining a project protein scoring model for the target project according to the amino acid sequence sample data set; generating a predicted amino acid sequence according to a preset amino acid sequence generation model; acquiring an amino acid sequence score corresponding to the predicted amino acid sequence based on a preset general protein scoring model and the project protein scoring model, wherein the amino acid sequence score characterizes the adaptation potential, for the target project, of the protein corresponding to the amino acid sequence; training the amino acid sequence generation model according to the amino acid sequence score, and continuing to train the amino acid sequence generation model until a generation model training stop condition is reached; and obtaining a model generation result based on the amino acid sequence generation model, and sending the model generation result to the terminal-side device.
  13. An antibody generation method applied to a cloud-side device, comprising: receiving an antibody generation instruction sent by a terminal-side device, and acquiring an amino acid sequence generation model for a target project according to the antibody generation instruction, wherein the amino acid sequence generation model is trained by the method of any one of claims 1-9; generating at least one antibody amino acid sequence corresponding to the target project according to the amino acid sequence generation model; acquiring an antibody characteristic protein corresponding to each antibody amino acid sequence based on that antibody amino acid sequence, and acquiring at least one antibody corresponding to the target project according to the antibody characteristic proteins; and obtaining an antibody generation result according to each antibody, and sending the antibody generation result to the terminal-side device.
  14. A cloud training platform, comprising a request interface and a response unit, wherein: the request interface is configured to receive a task generation request sent by a terminal device, the task generation request comprising request information; and the response unit is configured to acquire an amino acid sequence generation model based on the request information, wherein the amino acid sequence generation model is trained by the method of any one of claims 1-9, and to generate task information based on the amino acid sequence generation model, wherein the task information is used by the terminal device to execute an antibody generation task.
  15. A computing device, comprising a memory and a processor, wherein the memory is configured to store a computer program/instructions and the processor is configured to execute the computer program/instructions, which, when executed by the processor, implement the steps of the method of any one of claims 1-13.
  16. A computer readable storage medium storing a computer program/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-13.
  17. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-13.
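As a non-limiting illustration, two of the data-handling steps recited in claims 3, 5, and 6 can be sketched in Python. The claims do not say how a negative sample is derived from a positive fragment, so substituting every residue inside the fragment is an assumption made only for this sketch. The names `make_negative`, `pick_target_length`, `generate_sequence`, and the `next_residue` callable (a stand-in for the amino acid sequence generation model) are hypothetical.

```python
import random
from typing import Callable, List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_negative(positive: str, frag_start: int, frag_len: int,
                  rng: random.Random) -> str:
    """Derive a negative sample from one fragment of a positive sample.
    ASSUMPTION: each residue in the fragment is replaced by a different
    residue; the source does not specify the transformation."""
    fragment = positive[frag_start:frag_start + frag_len]
    mutated = "".join(
        rng.choice([a for a in AMINO_ACIDS if a != residue])
        for residue in fragment
    )
    return positive[:frag_start] + mutated + positive[frag_start + frag_len:]


def pick_target_length(samples: List[str], rng: random.Random) -> int:
    """Choose a target length from the lengths present in the sample set."""
    return rng.choice(sorted({len(s) for s in samples}))


def generate_sequence(next_residue: Callable[[str], str],
                      initial: str, target_length: int) -> str:
    """Autoregressive generation: feed the current sequence to the model,
    splice the returned residue onto it, and repeat until the target
    length is reached."""
    seq = initial
    while len(seq) < target_length:
        residue = next_residue(seq)
        if not residue:
            raise ValueError("model returned no residue")
        seq += residue
    return seq
```

For example, `generate_sequence(model_step, "M", pick_target_length(samples, rng))` would extend the one-residue start sequence `"M"` to a length drawn from the sample set, where `model_step` is whatever callable wraps the trained model.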

Description

Amino acid sequence generation model training method and antibody generation method

Technical Field

Embodiments of the present specification relate to the field of computer technology, and in particular to an amino acid sequence generation model training method, an antibody generation method, a cloud training platform, a computing device, a computer readable storage medium, and a computer program product.

Background

With the development of the field of antibody design, computational methods have had a significant impact on antibody discovery. Antibodies are proteins produced by the immune system that bind specifically to antigens and thereby protect against disease. Because the specificity of an antibody is determined primarily by its complementarity-determining regions (CDRs), CDR design is a major focus of computational antibody design. Progress in deep learning has led to new computational prediction methods for antibody design, which have made antibody discovery more efficient to some extent.

However, current deep learning models can generate only the CDR regions of an antibody. Generating only the CDR regions leaves the antibody candidates with insufficient potential and diversity, while generating the whole antibody with existing models has low accuracy, so experiments that use deep learning models to generate antibodies are inefficient. To address these drawbacks, an amino acid sequence generation model training method is needed that trains a deep learning model capable of generating antibodies with high accuracy and diversity, so as to improve the efficiency of antibody generation experiments.
Disclosure of Invention

In view of this, the embodiments of the present specification provide an amino acid sequence generation model training method, an antibody generation method, an amino acid sequence generation model training method applied to a cloud-side device, and an antibody generation method applied to a cloud-side device. One or more embodiments of the present specification also relate to a cloud training platform, a computing device, a computer readable storage medium, and a computer program product, so as to address the technical drawbacks of the prior art.

According to a first aspect of the embodiments of the present specification, there is provided an amino acid sequence generation model training method, comprising: acquiring an amino acid sequence sample data set for a target project, and obtaining a project protein scoring model for the target project according to the amino acid sequence sample data set; generating a predicted amino acid sequence according to a preset amino acid sequence generation model; acquiring an amino acid sequence score corresponding to the predicted amino acid sequence based on a preset general protein scoring model and the project protein scoring model, wherein the amino acid sequence score characterizes the adaptation potential, for the target project, of the protein corresponding to the amino acid sequence; and training the amino acid sequence generation model according to the amino acid sequence score, and continuing to train the amino acid sequence generation model until a generation model training stop condition is reached.
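As a non-limiting illustration of training the generation model from a sequence score, the following Python sketch treats the score as a reward and applies a REINFORCE-style policy-gradient update. This choice of algorithm is an assumption: the source states only that reward information is determined from the score and the model is adjusted accordingly. The toy policy (one position-independent categorical distribution over residues) and the names `softmax`, `sample_sequence`, `reinforce_step`, `baseline`, and `lr` are hypothetical and are not the model described in the source.

```python
import math
import random
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the residue logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits: List[float], length: int,
                    rng: random.Random) -> str:
    """Sample each residue independently from the toy policy."""
    probs = softmax(logits)
    return "".join(rng.choices(AMINO_ACIDS, weights=probs, k=length))


def reinforce_step(logits: List[float], seq: str, reward: float,
                   baseline: float = 0.0, lr: float = 0.1) -> List[float]:
    """One policy-gradient update (ASSUMED algorithm).
    For this policy, d log p(seq) / d logit_a = count_a - len(seq) * p_a."""
    probs = softmax(logits)
    advantage = reward - baseline
    updated = list(logits)
    for i, aa in enumerate(AMINO_ACIDS):
        grad = seq.count(aa) - len(seq) * probs[i]
        updated[i] += lr * advantage * grad
    return updated
```

A sequence with a reward above the baseline makes its residues more likely to be sampled again, and one below the baseline makes them less likely; a real generation model would replace the single logit vector with its own parameters.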
According to a second aspect of the embodiments of the present specification, there is provided an antibody generation method, comprising: acquiring an amino acid sequence generation model for a target project, wherein the amino acid sequence generation model is trained by the above amino acid sequence generation model training method; generating at least one antibody amino acid sequence corresponding to the target project according to the amino acid sequence generation model; and obtaining an antibody characteristic protein corresponding to each antibody amino acid sequence based on that antibody amino acid sequence, and obtaining at least one antibody corresponding to the target project according to the antibody characteristic proteins.

According to a third aspect of the embodiments of the present specification, there is provided an amino acid sequence generation model training method applied to a cloud-side device, comprising: receiving a model acquisition instruction sent by a terminal-side device, wherein the model acquisition instruction comprises an amino acid sequence sample data set for a target project; obtaining a project protein scoring model for the target project according to the amino acid sequence sample data set; generating a predicted amino acid sequence according to a preset amino acid sequence generation model; acquiring an amino acid sequence score corresponding to the predicted amino acid sequence based on a preset general protein scoring model and the project protein scoring model, wherein the amino acid sequence score characterizes the adaptation po