KR-20260068012-A - Codon optimization for high-level gene expression and its use
Abstract
The present invention relates to a codon optimization technique for high-level expression of recombinant proteins and to its applications. Specifically, by systematically modifying the third base of the codons constituting a gene, the expression efficiency of recombinant proteins in host cells, particularly plant cells, can be dramatically improved. By enabling the mass production of vaccines and therapeutic protein drugs that were previously difficult to commercialize because of low expression levels, the invention opens new possibilities for the development of protein-based therapeutics.
Inventors
- 황인환
- 부 티 꾸이
- 석옥희
- 오경석
- 강향주
- 손은주
Assignees
- 포항공과대학교 산학협력단
- 주식회사 바이오앱
Dates
- Publication Date: 2026-05-13
- Application Date: 2025-11-05
- Priority Date: 2024-11-05
Claims (19)
- A gene sequence encoding a recombinant protein, wherein the third base of the codons constituting the gene sequence is preferentially cytosine (C) and secondarily guanine (G).
- The gene sequence of claim 1, wherein the third base of a codon encoding one or more amino acids selected from the group consisting of Ala(A), Leu(L), Ile(I), Val(V), Ser(S), Pro(P), Thr(T), Tyr(Y), His(H), Asn(N), Asp(D), Cys(C), Arg(R), Phe(F), and Gly(G) is cytosine (C), and the third base of a codon encoding one or more amino acids selected from the group consisting of Gln(Q), Lys(K), and Glu(E) is guanine (G).
- The gene sequence of claim 1, wherein the third base of the codons encoding Ala(A), Leu(L), Ile(I), Val(V), Ser(S), Pro(P), Thr(T), Tyr(Y), His(H), Asn(N), Asp(D), Cys(C), Arg(R), Phe(F), and Gly(G) is cytosine (C), and the third base of the codons encoding Gln(Q), Lys(K), and Glu(E) is guanine (G).
- The gene sequence of claim 3, wherein the codon encoding Ser(S) is TCC.
- A codon optimization method for redesigning a gene sequence while keeping the encoded amino acid sequence unchanged, the method comprising a step of replacing a codon in the gene sequence whose third base is adenine (A) and/or thymine (T) with a synonymous codon of the amino acid encoded by that codon whose third base is cytosine (C) or guanine (G).
- The codon optimization method of claim 5, comprising a step of replacing a codon in the gene sequence whose third base is adenine (A) and/or thymine (T) with a synonymous codon of the amino acid encoded by that codon whose third base is cytosine (C).
- The codon optimization method of claim 6, comprising a step of preferentially replacing a codon in the gene sequence whose third base is adenine (A) and/or thymine (T) with a synonymous codon of the amino acid encoded by that codon whose third base is cytosine (C), and secondarily replacing any such codon that was not replaced with a cytosine (C)-ending codon with a synonymous codon whose third base is guanine (G).
- The codon optimization method of claim 5, comprising a step of replacing a codon in the gene sequence whose third base is adenine (A), thymine (T), and/or guanine (G) with a synonymous codon of the amino acid encoded by that codon whose third base is cytosine (C).
- The codon optimization method of claim 8, comprising a step of preferentially replacing a codon in the gene sequence whose third base is adenine (A), thymine (T), and/or guanine (G) with a synonymous codon of the amino acid encoded by that codon whose third base is cytosine (C), and secondarily replacing any such codon that was not replaced with a cytosine (C)-ending codon with a synonymous codon whose third base is guanine (G).
- The codon optimization method of claim 5, comprising a step of replacing the codons encoding Ser(S) with TCC.
- The codon optimization method of claim 5, comprising a step of replacing a codon encoding any one or more amino acids selected from the group consisting of Ala(A), Leu(L), Ile(I), Val(V), Ser(S), Pro(P), Thr(T), Tyr(Y), His(H), Asn(N), Asp(D), Cys(C), Arg(R), Phe(F), and Gly(G) with a synonymous codon whose third base is cytosine (C).
- The codon optimization method of claim 5, comprising a step of replacing a codon encoding any one or more amino acids selected from the group consisting of Gln(Q), Lys(K), and Glu(E) with a synonymous codon whose third base is guanine (G).
- The codon optimization method of claim 5, wherein the method enhances protein expression by improving the translation efficiency of the gene in a host cell.
- The codon optimization method of claim 13, wherein the host cell is a plant cell.
- A recombinant expression vector comprising the gene sequence of any one of claims 1 to 4, or a gene sequence redesigned by the codon optimization method of any one of claims 5 to 14.
- Transformed plant cells or transformed plants into which the recombinant expression vector of claim 15 has been introduced.
- The transformed plant cell or transformed plant of claim 16, for use in (i) reprogramming of plant traits, (ii) gene editing, and/or (iii) production of recombinant proteins.
- A method for producing a recombinant protein, comprising the steps of: (a) culturing the transformed plant cell or transformed plant of claim 16; and (b) recovering the recombinant protein from the cultured transformed plant cell or transformed plant.
- The method for producing a recombinant protein of claim 18, wherein the plant is selected from: food crops including rice, wheat, barley, corn, soybeans, potatoes, red beans, oats, and sorghum; vegetable crops including Arabidopsis thaliana, napa cabbage, radish, chili pepper, strawberry, tomato, watermelon, cucumber, cabbage, Korean melon, pumpkin, green onion, onion, and carrot; specialty crops including ginseng, tobacco, cotton, sesame, sugarcane, sugar beet, perilla, peanuts, and rapeseed; fruit trees including apple trees, pear trees, jujube trees, peaches, grapes, citrus fruits, persimmons, plums, apricots, lemons, and bananas; and floricultural plants including roses, carnations, chrysanthemums, lilies, sunflowers, cosmos, and tulips.
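The method claims (claims 5 to 12) reduce to a per-codon substitution rule: prefer a synonymous codon ending in C, otherwise one ending in G. The Python sketch below is a minimal illustration under stated assumptions. It uses the standard genetic code and leaves stop codons untouched, since the claims do not address them. The function names and the toy sequence are hypothetical and do not come from the patent.

- With `replace_g=True`, it follows the variant of claims 8 and 9, where A-, T-, and G-ending codons are all candidates.
- With `replace_g=False`, it follows claims 6 and 7, where only A- and T-ending codons are candidates.
- Serine is always set to TCC, as in claim 10.
- `gc3` computes the fraction of codons with G or C at the third position, the quantity reported as %GC3 in the description.

```python
# Hypothetical sketch of the claimed third-base rule (not the applicants' code):
# prefer a C-ending synonymous codon, otherwise a G-ending one.

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TCAG at each position.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_STRING[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into non-overlapping codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate a coding sequence with the standard genetic code ('*' = stop)."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def preferred_codon(aa: str) -> str:
    """Synonymous codon for `aa` whose third base is C, else G."""
    if aa == "S":
        return "TCC"  # serine has two C-ending codons; claims 4 and 10 name TCC
    synonyms = sorted(c for c, a in CODON_TABLE.items() if a == aa)
    for third in "CG":  # C first, G second
        for codon in synonyms:
            if codon[2] == third:
                return codon
    raise ValueError(f"no C- or G-ending codon for {aa!r}")


def optimize(seq: str, replace_g: bool = True) -> str:
    """Swap candidate codons for preferred_codon(); stop codons are kept (assumption)."""
    targets = {"A", "T"} | ({"G"} if replace_g else set())
    out = []
    for codon in split_codons(seq):
        aa = CODON_TABLE[codon]
        if aa == "S":
            codon = "TCC"  # claim 10: serine codons replaced with TCC
        elif aa != "*" and codon[2] in targets:
            codon = preferred_codon(aa)
        out.append(codon)
    return "".join(out)


def gc3(seq: str) -> float:
    """Fraction of codons whose third base is G or C (%GC3 divided by 100)."""
    codons = split_codons(seq)
    return sum(c[2] in "GC" for c in codons) / len(codons)


if __name__ == "__main__":
    toy = "ATGTCTAAAGGAGAACTTTTTACTTAA"  # hypothetical toy CDS: M S K G E L F T *
    opt = optimize(toy)
    print(opt)                                      # ATGTCCAAGGGCGAGCTCTTCACCTAA
    print(translate(toy) == translate(opt))         # True
    print(round(gc3(toy), 3), round(gc3(opt), 3))   # 0.111 0.889
```

The sketch swaps only synonymous codons, so the encoded protein is unchanged. On the toy sequence, the third-position GC fraction rises from 1/9 to 8/9. Only the stop codon TAA keeps an A at the third position.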
Description
Codon optimization technology for high-level gene expression and its use
The present invention relates to technology for producing recombinant proteins in host cells. More specifically, it relates to a codon optimization technology that dramatically enhances the expression of recombinant proteins in host cells by systematically changing specific bases of the codons that constitute a gene, and to applications thereof.
In general, a gene is transcribed into mRNA, and the mRNA is translated by ribosomes to synthesize protein. During translation, the base sequence of the mRNA is converted into the amino acid sequence of the protein in units of codons, each consisting of three bases that specify one amino acid. That is, starting from the first start codon (ATG) in the gene's base sequence, every three bases form a codon, and these codons, read consecutively without overlap, determine the amino acid sequence of the protein. Proteins are built from 20 types of amino acids, and the stop codons that terminate translation add one more meaning, for 21 in total. However, there are 64 (4 × 4 × 4) possible codons, more than the number of meanings they must encode. Consequently, many amino acids can each be specified by two or more codons, and some by as many as six. Codons that encode the same amino acid in this way are called synonymous codons. Because of them, different gene sequences can encode the same protein. In most cases synonymous codons differ only at the third base. For the six-codon amino acids arginine, serine, and leucine, however, synonymous codons also differ at the first base, and for serine at the second base as well. To date, no clear rules are known that govern which synonymous codon is used for a particular amino acid in natural genes. However, it is well known that protein expression efficiency varies with the combination of codons used, even when they encode the same protein.
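The reading-frame and degeneracy facts above can be checked mechanically. The Python sketch below is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and a sequence of A, C, G, and T only. The helper names are illustrative, not from the patent. The sketch does three things:

- confirms that 64 codons cover 21 meanings;
- finds the six-codon amino acids and the codon positions at which their synonyms differ;
- reads non-overlapping codons from the first ATG.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TCAG at each position.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_STRING[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonym_groups() -> dict[str, list[str]]:
    """Group the 64 codons by meaning: 20 amino acids plus '*' for stop."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        groups[aa].append(codon)
    return dict(groups)


def varying_positions(codons: list[str]) -> set[int]:
    """1-based codon positions at which the given codons are not all identical."""
    return {pos + 1 for pos in range(3) if len({c[pos] for c in codons}) > 1}


def read_frame(seq: str) -> list[str]:
    """Non-overlapping codons from the first ATG through the first in-frame stop."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start < 0:
        return []
    codons = []
    for i in range(start, len(seq) - 2, 3):  # trailing partial codon is ignored
        codon = seq[i:i + 3]
        codons.append(codon)
        if CODON_TABLE[codon] == "*":
            break
    return codons


if __name__ == "__main__":
    groups = synonym_groups()
    print(len(CODON_TABLE), len(groups))  # 64 21
    for aa in sorted(a for a, cs in groups.items() if len(cs) == 6):
        print(aa, sorted(varying_positions(groups[aa])))
    # prints L [1, 3], then R [1, 3], then S [1, 2, 3]
    print(read_frame("GGATGGCTTCTTAAGG"))  # ['ATG', 'GCT', 'TCT', 'TAA']
```

Only Met (ATG) and Trp (TGG) have a single codon. That is why the text says "many" amino acids, rather than every amino acid, have synonymous codons.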
In particular, some synonymous codons yield higher protein production efficiency than others, and expression levels can be enhanced by rebuilding the gene sequence with these codons. This process is generally called codon optimization, and it is a widely used strategy when redesigning genes to express recombinant proteins in a specific host cell. Previously known codon optimization methods are based mainly on the abundance of tRNAs in the host cell. They analyze the gene copy number or expression level of the tRNAs whose anticodons are complementary to particular codons, and then replace rarely used codons with more frequently used ones. This approach has been reported to increase protein productivity in various cases, such as the expression of insect control proteins in tomato and tobacco, human proteins in E. coli, and photoproteins and luciferases in mammalian cells. However, codon optimization is not always successful. For example, it has been reported that applying plant-optimized codons to the HPV-16 L1 protein did not improve its expression in plants compared with the version using human codons (Maclean et al., 2007, Journal of General Virology, 88(5), 1460-1469). Because expression levels are thus sometimes hard to predict from codon frequency alone, new concepts such as codon harmonization, which takes mRNA secondary structure, protein folding, and function into account, have been proposed (Webster et al., 2017, Biotechnology and Bioengineering, 114(2), 492-502; Mauro, 2018, BioDrugs, 32(3), 183-191). In summary, despite the studies and attempts made to date, universal rules for codon optimization have not been established. In particular, no general principles applicable to recombinant protein expression across different host systems are known.
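The conventional frequency-based approach described above can be sketched in a few lines, together with the codon adaptation index (CAI) that Figure 1A reports. This is a minimal illustration of the prior-art idea, not the method of the invention. The codon counts in `HYPOTHETICAL_USAGE` are made up, not measured host data. The CAI here follows the common Sharp and Li formulation: the geometric mean of each codon's relative adaptiveness, skipping Met, Trp, and stop codons. CAIcal may differ in detail.

```python
import math

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TCAG at each position.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_STRING[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS: dict[str, list[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

# Made-up host codon counts for illustration only (NOT measured data).
# Every synonym of every amino acid used must be listed.
HYPOTHETICAL_USAGE = {
    "ATG": 100,
    "AAA": 30, "AAG": 70,
    "GAA": 55, "GAG": 45,
    "TTA": 10, "TTG": 25, "CTT": 20, "CTC": 15, "CTA": 10, "CTG": 20,
}


def relative_adaptiveness(usage: dict[str, int]) -> dict[str, float]:
    """w(codon) = count(codon) / count of the most used synonym."""
    return {
        codon: count / max(usage[s] for s in SYNONYMS[CODON_TABLE[codon]])
        for codon, count in usage.items()
    }


def codons_of(seq: str) -> list[str]:
    """Non-overlapping codons of an in-frame sequence (partial tail ignored)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def cai(seq: str, usage: dict[str, int]) -> float:
    """Geometric mean of w over the codons, skipping Met, Trp and stops."""
    w = relative_adaptiveness(usage)
    logs = [math.log(w[c]) for c in codons_of(seq) if CODON_TABLE[c] not in "MW*"]
    return math.exp(sum(logs) / len(logs))


def frequency_optimize(seq: str, usage: dict[str, int]) -> str:
    """Prior-art style: replace every codon with its most frequently used synonym."""
    out = []
    for codon in codons_of(seq):
        aa = CODON_TABLE[codon]
        if aa != "*":
            codon = max(SYNONYMS[aa], key=lambda s: usage[s])
        out.append(codon)
    return "".join(out)


if __name__ == "__main__":
    toy = "ATGAAAGAGCTA"  # hypothetical toy CDS: M K E L
    opt = frequency_optimize(toy, HYPOTHETICAL_USAGE)
    print(opt)  # ATGAAGGAATTG
    print(round(cai(toy, HYPOTHETICAL_USAGE), 3), cai(opt, HYPOTHETICAL_USAGE))  # 0.52 1.0
```

Under these made-up counts, the frequency rule picks GAA for glutamate because it is the more common synonym. The third-base rule of the claims would choose GAG instead. Which choice expresses better in a real host is an empirical question, which is the point the Maclean et al. example makes.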
Figure 1 shows that increasing the A and T content at the third codon position reduces protein expression in Nicotiana benthamiana. Figure 1A shows, for the GFP wild type (GFP_V1) and the variants GFP_V2 and GFP_V3, the following values, calculated with the CAIcal program (http://genomes.urv.es/CAIcal):
- DNA length;
- codon adaptation index (CAI);
- total GC content (%GC);
- GC content at the first (%GC1), second (%GC2), and third (%GC3) codon positions.
Figure 1B shows a Western blot comparing GFP protein expression levels among the GFP wild type (GFP_V1) and the variants (GFP_V2 and GFP_V3), and Figure 1C shows a quantitative analysis of the Western blot band intensities. Figure 2 shows that the decrease in GFP protein expression observed in Figure 1 in Nicotiana benthamiana is independent of transcription levels. Figure 2A compares the transcription levels of the V1, V2, and V3 GFP genes measured by qRT-PCR. Figure 2B sh