CN-122024839-A - Method, device, medium and program product for determining a target enzyme expressed in a target host
Abstract
The application provides a method, device, medium and program product for determining a target enzyme expressed in a target host. The method comprises screening and determining one or more target enzymes from heterologous enzymes that fulfill a desired function, using a codon sequence prediction model and based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes. The codon sequence prediction model is used to determine codon sequence information and corresponding scoring information. The scoring information is used to compare expression levels among different codon sequences: an enzyme whose codon sequence scores higher can have a higher expression level in the target host, so that enzymes expressed in the target host can be screened out rapidly and the screening cost is reduced. An enzyme function prediction model and/or an enzyme-substrate interaction prediction model can additionally be combined to screen by expression level, catalytic reaction type and/or enzyme activity, so as to obtain enzymes that have the desired function, are efficiently expressed and are highly active.
Inventors
- Zhang Deqiang
- Tan Wuwei
Assignees
- 上海分子之心智能科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-02
Claims (14)
- 1. A method for determining a target enzyme expressed in a target host, the method comprising: determining one or more heterologous enzymes that fulfill a desired function; and screening and determining one or more target enzymes from the one or more heterologous enzymes by using a codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes, wherein the codon sequence prediction model is used to determine codon sequence information and corresponding scoring information, and the scoring information is used to screen the enzymes.
- 2. The method of claim 1, wherein the screening and determining of one or more target enzymes from the one or more heterologous enzymes by using a codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes, wherein the codon sequence prediction model is used to determine codon sequence information and corresponding scoring information, and the scoring information is used to screen the enzymes, comprises: screening and determining one or more target enzymes from the one or more heterologous enzymes by using a codon sequence prediction model, an enzyme function prediction model and/or an enzyme-substrate interaction prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes, wherein the codon sequence prediction model is used to determine codon sequence information and corresponding scoring information, the enzyme function prediction model is used to determine enzyme catalytic reaction type information, the enzyme-substrate interaction prediction model is used to determine interaction prediction information, and the scoring information, the enzyme catalytic reaction type information and/or the interaction prediction information are used to screen the enzymes.
- 3. The method of claim 2, wherein the screening and determining of one or more target enzymes from the one or more heterologous enzymes by using a codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes, wherein the codon sequence prediction model is used to determine codon sequence information and corresponding scoring information, and the scoring information is used to screen the enzymes, comprises: for each heterologous enzyme, determining the codon sequence information and corresponding scoring information of that heterologous enzyme by using the codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzyme; for each first candidate enzyme, determining the enzyme catalytic reaction type information of that first candidate enzyme by using the enzyme function prediction model based on the amino acid sequence information corresponding to the first candidate enzyme; and for each second candidate enzyme, determining corresponding interaction prediction information by using the enzyme-substrate interaction prediction model based on the amino acid sequence information corresponding to the second candidate enzyme and corresponding substrate information, wherein the interaction prediction information comprises binding probability information, and screening and determining one or more target enzymes from the one or more second candidate enzymes based on the interaction prediction information.
- 4. The method of claim 3, wherein determining the codon sequence information and corresponding scoring information of each heterologous enzyme by using the codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzyme comprises: determining probability distribution information of the codons corresponding to each amino acid in the heterologous enzyme by using the codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzyme; and determining the codon sequence information and corresponding scoring information of the heterologous enzyme based on the probability distribution information of the codons corresponding to each amino acid in the heterologous enzyme.
- 5. The method according to claim 3 or 4, wherein determining the enzyme catalytic reaction type information of each first candidate enzyme by using the enzyme function prediction model based on the amino acid sequence information corresponding to the first candidate enzyme comprises: determining embedding information corresponding to the first candidate enzyme by using the enzyme function prediction model based on the amino acid sequence information corresponding to the first candidate enzyme, wherein the enzyme function prediction model is trained by a contrastive learning method; and querying an enzyme catalytic reaction type knowledge base based on the embedding information to determine the enzyme catalytic reaction type information corresponding to the first candidate enzyme, wherein the enzyme catalytic reaction type knowledge base comprises a plurality of pieces of enzyme catalytic reaction type information and the average embedding information corresponding to each piece of enzyme catalytic reaction type information.
- 6. The method of any one of claims 3 to 5, wherein the enzyme-substrate interaction prediction model comprises a structure prediction model and an affinity prediction model; and determining the corresponding interaction prediction information by using the enzyme-substrate interaction prediction model based on the amino acid sequence information corresponding to the second candidate enzyme and the corresponding substrate information comprises: determining corresponding complex structure information through the structure prediction model based on the amino acid sequence information corresponding to the second candidate enzyme and the corresponding substrate information; cropping the structure based on the complex structure information to determine corresponding cropped structure information, wherein the cropped structure information comprises all substrate atoms and a plurality of amino acids of the second candidate enzyme that meet the cropping conditions; and determining the corresponding interaction prediction information through the affinity prediction model based on the cropped structure information.
- 7. The method of any one of claims 3 to 6, wherein the interaction prediction information further comprises affinity strength information; and the screening and determining of one or more target enzymes from the one or more second candidate enzymes based on the interaction prediction information comprises: screening and determining one or more target enzymes from the one or more second candidate enzymes based on the binding probability information; and, if the number of the one or more target enzymes exceeds a corresponding threshold, performing a final screening of the one or more target enzymes based on the affinity strength information.
- 8. The method according to any one of claims 1 to 7, further comprising: determining corresponding codon sequence prediction information by using a codon sequence prediction model based on sample amino acid sequence information, corresponding sample structure information and host information, wherein the codon sequence prediction information comprises probability distribution information of the codons corresponding to each amino acid in the sample amino acid sequence information; and performing model optimization based on the sample codon sequence information corresponding to the sample amino acid sequence information and on the codon sequence prediction information, in combination with a corresponding loss function, to obtain the trained codon sequence prediction model.
- 9. The method according to any one of claims 2 to 8, further comprising: for each sample enzyme in a sample enzyme dataset, determining the embedding information corresponding to that sample enzyme by using an enzyme function prediction model; for each sample enzyme, sampling a corresponding positive sample and a corresponding negative sample from the sample enzyme dataset, wherein the positive sample has the same enzyme catalytic reaction type information as the sample enzyme and the negative sample has enzyme catalytic reaction type information different from that of the sample enzyme; and performing model optimization based on the sample enzyme sequence and the embedding information corresponding to the positive sample and the negative sample, in combination with a corresponding contrastive loss function, to obtain the trained enzyme function prediction model.
- 10. The method according to claim 9, further comprising: determining the average embedding information corresponding to each piece of enzyme catalytic reaction type information based on the embedding information of the one or more sample enzymes in the sample enzyme dataset that correspond to that enzyme catalytic reaction type information, and establishing a corresponding enzyme catalytic reaction type knowledge base.
- 11. The method of any one of claims 1 to 10, wherein the target host is Pichia pastoris.
- 12. A computer device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1 to 11.
- 13. A computer-readable storage medium having stored thereon a computer program/instruction which, when executed by a processor, performs the steps of the method according to any one of claims 1 to 11.
- 14. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 11.
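The knowledge-base lookup recited in claims 5 and 10 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the embeddings are toy two-dimensional vectors standing in for the output of the enzyme function prediction model, the reaction type labels are placeholders, and the use of cosine similarity to query the knowledge base is an assumption, since the claims do not name a distance measure. The function names `build_knowledge_base` and `query_reaction_type` are hypothetical.

```python
import math
from collections import defaultdict


def build_knowledge_base(samples):
    """Average the embeddings of all sample enzymes that share a reaction type.

    samples: iterable of (reaction_type, embedding) pairs, where embedding is a
    list of floats. Returns {reaction_type: average_embedding}.
    """
    grouped = defaultdict(list)
    for reaction_type, embedding in samples:
        grouped[reaction_type].append(embedding)
    knowledge_base = {}
    for reaction_type, embeddings in grouped.items():
        dim = len(embeddings[0])
        knowledge_base[reaction_type] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return knowledge_base


def cosine(a, b):
    """Cosine similarity between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query_reaction_type(knowledge_base, embedding):
    """Return the reaction type whose average embedding is most similar."""
    return max(knowledge_base, key=lambda rt: cosine(knowledge_base[rt], embedding))


# Toy data: labels and vectors are illustrative only.
samples = [
    ("hydrolase", [1.0, 0.0]),
    ("hydrolase", [0.8, 0.2]),
    ("oxidoreductase", [0.0, 1.0]),
    ("oxidoreductase", [0.2, 0.8]),
]
kb = build_knowledge_base(samples)
predicted = query_reaction_type(kb, [0.9, 0.3])
print(predicted)
```

Storing one average embedding per reaction type keeps the query cost proportional to the number of reaction types rather than to the number of sample enzymes, which is one plausible reason for the averaging step in claim 10.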
Description
Method, device, medium and program product for determining a target enzyme expressed in a target host
Technical Field
The present application relates to the field of bioinformatics, and in particular to a technique for determining a target enzyme expressed in a target host.
Background
With the rapid development of biomanufacturing and green biocatalysis technologies, enzymes, as efficient and environmentally friendly biocatalysts, are widely applied in fields such as medicine, food, the chemical industry and energy. Screening functionally similar heterologous enzymes for those that can be efficiently expressed in a particular host (e.g., E. coli, Pichia pastoris, Saccharomyces cerevisiae) is an important challenge in the development of industrial enzymes. The current screening process generally comprises two parts. First, high-throughput screening with a chromogenic or fluorogenic substrate identifies enzymes with the required catalytic activity. Then the small number of candidate enzymes obtained from this primary screen are cloned one by one into expression vectors suitable for the specific host, and small-scale expression verification experiments are carried out to select enzymes that can meet the requirements of large-scale production. The whole screening process has a long cycle, low efficiency and high cost.
Disclosure of Invention
It is an object of the present application to provide a method, apparatus, medium and program product for determining a target enzyme expressed in a target host.
According to one aspect of the present application, there is provided a method for determining a target enzyme expressed in a target host, the method comprising: determining one or more heterologous enzymes that fulfill a desired function; and screening and determining one or more target enzymes from the one or more heterologous enzymes by using a codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes, wherein the codon sequence prediction model is used to determine codon sequence information and corresponding scoring information, and the scoring information is used to screen the enzymes.
According to one aspect of the present application, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of any of the methods described above.
According to one aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of any of the methods described above.
According to one aspect of the present application, there is provided a computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of any of the methods described above.
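The score-based screening described above can be sketched in Python under stated assumptions. The text says only that the codon sequence prediction model yields codon sequence information and scoring information; here the model output is replaced by hand-written per-residue codon probability distributions, the codon sequence is taken as the most probable codon at each position, and the score is assumed to be the mean log-probability of the chosen codons. These choices, the enzyme names and the function names `decode_and_score` and `screen_by_score` are hypothetical and are not taken from the application.

```python
import math


def decode_and_score(codon_probs):
    """Pick the most probable codon per residue and score the sequence.

    codon_probs: list with one dict per amino acid, mapping codon -> probability.
    Returns (codon_sequence, score), where score is the mean log-probability of
    the selected codons (an assumed scoring rule).
    """
    codons = []
    log_prob_sum = 0.0
    for distribution in codon_probs:
        codon, prob = max(distribution.items(), key=lambda kv: kv[1])
        codons.append(codon)
        log_prob_sum += math.log(prob)
    return "".join(codons), log_prob_sum / len(codon_probs)


def screen_by_score(candidates, top_k):
    """Rank heterologous enzymes by score and keep the top_k highest-scoring."""
    scored = []
    for name, codon_probs in candidates.items():
        sequence, score = decode_and_score(codon_probs)
        scored.append((name, sequence, score))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


# Toy distributions standing in for model output for two three-residue enzymes.
candidates = {
    "enzyme_A": [{"ATG": 1.0}, {"GCT": 0.7, "GCC": 0.3}, {"AAG": 0.8, "AAA": 0.2}],
    "enzyme_B": [{"ATG": 1.0}, {"TTG": 0.4, "CTG": 0.35, "TTA": 0.25}, {"GAA": 0.6, "GAG": 0.4}],
}
top = screen_by_score(candidates, top_k=1)
print(top[0][0], top[0][1])
```

Averaging the log-probabilities over the sequence length makes scores comparable between enzymes of different lengths, which matters when the score is used to rank candidates against one another.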
According to one aspect of the present application, there is provided an apparatus for determining a target enzyme expressed in a target host, the apparatus comprising: a first module, configured to determine one or more heterologous enzymes that fulfill a desired function; and a second module, configured to screen and determine one or more target enzymes from the one or more heterologous enzymes by using a codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes, wherein the codon sequence prediction model is used to determine codon sequence information and corresponding scoring information, and the scoring information is used to screen the enzymes.
Compared with the prior art, the method for determining the target enzyme comprises determining one or more heterologous enzymes that fulfill a desired function, and screening and determining one or more target enzymes from the one or more heterologous enzymes by using a codon sequence prediction model based on the amino acid sequence information, the structure information and the target host information corresponding to the heterologous enzymes, wherein the codon sequence prediction model is used to determine codon sequence information and corresponding scoring information, and the scoring information is used to screen the enzymes. The scoring information is used to compare expression levels among different codon sequences, and an enzyme whose codon sequence scores higher can have a higher expression level in the target host, so that enzymes that are efficiently expressed in the target host can be rapidly screened out, the screening efficiency is improved, and the screen