
CN-121975808-A - Method for high-throughput screening of sgRNA skeleton high-activity mutant and application thereof


Abstract

The invention belongs to the technical fields of gene editing and molecular biology, and in particular relates to a high-throughput method for screening high-activity sgRNA scaffold (skeleton) mutants and applications thereof. In the method, the lower stem, upper stem and loop regions of the sgRNA are first modified by introducing mismatches, substitutions, extensions or truncations, producing a large sgRNA scaffold variant library with diverse sequences and structures. A mammalian CRISPRa high-throughput screening system is then established, and deep sequencing is combined with machine learning to analyze the sequencing data of the scaffolds in the screen. This analysis mines the key sequence and structural features that determine scaffold activity, from which general sgRNA scaffold design rules are derived, and high-efficiency scaffold variants are finally obtained by screening. These variants can be used to build non-repetitive sgRNA arrays, enabling non-repetitive sgRNA units for multi-target, multi-node or multifunctional regulation, and they effectively address the cloning difficulty and genetic instability caused by repeated scaffold sequences in the Cas9 system.

Inventors

  • QIAO YUNBO
  • YU WENXIA
  • DAI XIANGPING

Assignees

  • Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (上海交通大学医学院附属第九人民医院)

Dates

Publication Date
2026-05-05
Application Date
2026-02-06

Claims (10)

  1. A method for high-throughput screening of high-activity sgRNA scaffold mutants in mammalian cells, comprising the following steps: S1, constructing an sgRNA scaffold mutation library: designing and synthesizing sgRNA scaffold variant sequences carrying at least one mutation type selected from sequence substitution, mismatch, extension, shortening, insertion, deletion, polynucleotide substitution and pairing exchange, in at least one of the lower stem, the upper stem and the 5-bp extension region of the sgRNA scaffold structure, to form a large-scale mutation library; S2, establishing a mammalian-cell CRISPRa screening system: delivering the sgRNA scaffold mutation library into mammalian cells via a viral vector such that each cell predominantly carries a single sgRNA scaffold variant; S3, phenotype-based cell sorting and deep sequencing: after transfection, dividing the cells into at least four activity-grade groups according to GFP fluorescence intensity, collecting the cells of each group separately and extracting genomic DNA; S4, calculating variant enrichment and screening high-activity variants: calculating an enrichment factor for each variant by comparing the change in its sequencing-read abundance in the populations of different GFP activity grades relative to the initial library, and selecting the sgRNA scaffold variants that are significantly enriched in the high-activity populations; S5, building a structure-function prediction model by machine learning: extracting the sequence features and secondary-structure features of each sgRNA scaffold variant, using these features together with the corresponding enrichment factors or activity grades as training data, and training a model with a machine learning algorithm so that it can predict sgRNA scaffold activity; S6, extracting, on the basis of the model, the key sequence and structural features that affect sgRNA scaffold activity, so as to form design principles that guide the design of new sgRNA scaffolds.
  2. The method of claim 1, wherein in step S1 the sgRNA scaffold mutation library is constructed by the following approaches: (1) base-substitution design of the lower stem, the upper stem and the 5-bp extension region, in which homopolymer sequences with 4 or more consecutive identical nucleotides are filtered out and the GC content is kept within 30% to 70%; (2) on the basis of the pairwise-substitution design, further introducing at least one type of structural variant selected from GA bulge variants, AAGU bulge variants, GAAA loop variants, paired-region shortening variants, paired-region extension variants, 1-2-bp mismatch variants, unpaired-base insertion variants, pairing-exchange variants, polynucleotide-substitution variants and structure-disruption variants.
  3. The method of claim 1, wherein in step S1 the sgRNA scaffold mutation library is constructed by mutating at least one of the lower stem, the upper stem and the 5-bp extension region of the sgRNA V3 scaffold, the nucleotide sequence of which is shown in SEQ ID NO. 1.
  4. The method of claim 1, wherein the sequence features and secondary-structure features used in step S5 include at least one of GC content, GC count, number of TT dinucleotide occurrences, position-specific base identity, domain pairing pattern, stem length, bulge length, loop characteristics and melting temperature.
  5. The method of claim 1, wherein in step S5 the machine learning algorithm is a tree-based ensemble algorithm selected from at least one of XGBoost, CatBoost and LightGBM, and the model is interpreted using the SHAP method to identify the features with the greatest impact on activity.
  6. A high-activity sgRNA scaffold mutant screened by the method of any one of claims 1 to 5, characterized in that its nucleotide sequence has at least one of the following structural features: (a) the lower stem region has a fixed length and a GC content not exceeding a preset threshold; (b) the upper stem region is sequence-flexible and tolerates complete replacement; (c) the bulge region does not form additional internal base pairs; (d) the 5-bp extension region is T-rich and has a higher melting temperature; (e) it comprises at least one enhancing motif selected from TTC, CTT and TC.
  7. The high-activity sgRNA scaffold mutant of claim 6, wherein its nucleotide sequence is any one of the sequences shown in SEQ ID NOs. 2 to 11.
  8. A library of sgRNA scaffold variants comprising at least 5000 different sgRNA scaffold variant sequences, the variants being obtained by systematic mutagenesis of at least one of the lower stem, the upper stem and the 5-bp extension region, with the sgRNA V3 scaffold as reference, the mutation types comprising at least one of sequence substitutions, mismatches, extensions, shortenings, insertions, deletions, polynucleotide substitutions and pairing exchanges, and the library comprising the sgRNA scaffold mutant of claim 6 or 7; the nucleotide sequence of the sgRNA V3 scaffold is shown in SEQ ID NO. 1.
  9. A method for constructing a non-repetitive multi-target sgRNA expression vector, characterized in that the high-activity sgRNA scaffold mutant of claim 6 or 7 is used, or different sgRNA scaffold variants with low mutual sequence homology are selected from the library of claim 8, and each is joined to a different spacer sequence to construct a multi-sgRNA vector expressed in tandem or independently.
  10. Use of the high-activity sgRNA scaffold mutant of claim 6 or 7, or of the library of claim 8, in the preparation of a CRISPR system for gene editing, gene activation, gene repression or epigenetic regulation in mammalian cells.
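The design filters in claim 2 (1) can be sketched as follows. This is a minimal illustration, not the inventors' pipeline: it assumes the homopolymer and GC-content rules are applied to the redesigned region only (the claim does not say whether flanking scaffold sequence counts), it enumerates every full replacement of one region, and the function names and toy scaffold are hypothetical. SEQ ID NO. 1 and the real region coordinates are not reproduced here.

```python
from itertools import product
from typing import Iterator, Tuple


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    best = run = 1 if seq else 0
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def passes_filters(region: str, max_run: int = 3,
                   gc_min: float = 0.30, gc_max: float = 0.70) -> bool:
    """Claim 2 (1): reject runs of >= 4 identical nt; keep GC content in 30%-70%."""
    return (max_homopolymer_run(region) <= max_run
            and gc_min <= gc_fraction(region) <= gc_max)


def region_replacements(scaffold: str, start: int, end: int) -> Iterator[Tuple[str, str]]:
    """Yield (new_region, full_variant) for each filtered replacement of scaffold[start:end].

    Assumption: filters see only the redesigned region, so homopolymer runs
    that cross the region boundary into the flanking scaffold are not checked.
    """
    original = scaffold[start:end]
    for combo in product("ACGT", repeat=end - start):
        region = "".join(combo)
        if region != original and passes_filters(region):
            yield region, scaffold[:start] + region + scaffold[end:]


# Toy example: redesign a hypothetical 3-nt window of a placeholder scaffold.
variants = list(region_replacements("AAACGTAAA", 3, 6))
```

Because the number of candidates grows as 4^n, exhaustive enumeration is only practical for short windows such as the 5-bp extension region; longer stems would call for sampling or for the structured variant classes listed in claim 2 (2).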
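Step S4 of claim 1 does not state a formula for the enrichment factor. One common choice, assumed here, is the log2 ratio of a variant's read frequency in a sorted GFP bin to its frequency in the initial library, with a pseudocount so that variants missing from one sample do not cause division by zero. The bin names, the `threshold` value and all helper names below are hypothetical.

```python
import math
from typing import Dict, List

Counts = Dict[str, int]  # variant id -> sequencing read count


def frequencies(counts: Counts, variants: List[str], pseudocount: float) -> Dict[str, float]:
    """Pseudocount-smoothed read frequency of each variant within one sample."""
    total = sum(counts.get(v, 0) for v in variants) + pseudocount * len(variants)
    return {v: (counts.get(v, 0) + pseudocount) / total for v in variants}


def log2_enrichment(library: Counts, sorted_bin: Counts,
                    pseudocount: float = 0.5) -> Dict[str, float]:
    """log2(frequency in the sorted GFP bin / frequency in the initial library)."""
    variants = sorted(set(library) | set(sorted_bin))
    lib_f = frequencies(library, variants, pseudocount)
    bin_f = frequencies(sorted_bin, variants, pseudocount)
    return {v: math.log2(bin_f[v] / lib_f[v]) for v in variants}


def enriched_variants(library: Counts, bins: Dict[str, Counts], top_bin: str = "high",
                      threshold: float = 1.0, pseudocount: float = 0.5) -> List[str]:
    """Variants whose log2 enrichment in the top GFP bin is >= threshold, best first.

    `bins` may hold all four (or more) GFP activity grades; only `top_bin` is scored here.
    """
    scores = log2_enrichment(library, bins[top_bin], pseudocount)
    hits = [v for v, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda v: scores[v], reverse=True)
```

Scoring every bin, not just the top one, would give each variant an enrichment profile across activity grades, which is one way to derive the activity labels used as training targets in step S5.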
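Claim 4 lists the features fed to the model in step S5. The sketch below computes only the purely sequence-derived ones (GC count and content, overlapping TT dinucleotide count, position-specific base identity as one-hot flags) plus counts of the TTC, CTT and TC motifs named in claim 6 (e). Two assumptions: melting temperature is approximated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough estimate for short oligonucleotides such as the 5-bp extension region, and the extension-region coordinates passed in are hypothetical. Structural features (stem length, bulge length, loop characteristics, pairing pattern) would need an RNA secondary-structure predictor and are not computed.

```python
from typing import Dict, Optional, Tuple


def count_overlapping(seq: str, motif: str) -> int:
    """Count motif occurrences with a sliding window (so 'TTT' contains two 'TT')."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate in degrees C: 2*(A+T) + 4*(G+C). Rough, short oligos only."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def scaffold_features(seq: str,
                      ext_region: Optional[Tuple[int, int]] = None) -> Dict[str, float]:
    """Sequence-level features for one scaffold variant (DNA alphabet)."""
    seq = seq.upper()
    gc_count = sum(base in "GC" for base in seq)
    feats: Dict[str, float] = {
        "gc_count": gc_count,
        "gc_content": gc_count / len(seq),
        "tt_count": count_overlapping(seq, "TT"),
    }
    # Candidate enhancing motifs from claim 6 (e).
    for motif in ("TTC", "CTT", "TC"):
        feats[f"motif_{motif}"] = count_overlapping(seq, motif)
    # Assumed Tm proxy for the (hypothetically located) 5-bp extension region.
    if ext_region is not None:
        start, end = ext_region
        feats["ext_tm_wallace"] = wallace_tm(seq[start:end])
    # Position-specific base identity as one-hot flags.
    for i, base in enumerate(seq):
        for b in "ACGT":
            feats[f"pos{i}_{b}"] = int(base == b)
    return feats
```

The resulting dictionaries can be stacked into a feature matrix for a tree-based ensemble model of the kind named in claim 5; variants of different lengths would need alignment before the one-hot position features line up.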

Description

Technical Field

The invention belongs to the technical fields of gene editing and molecular biology, and in particular relates to a high-throughput method for screening high-activity sgRNA scaffold mutants and applications thereof.

Background

The CRISPR/Cas9 system is a powerful gene-editing tool in the life sciences and is widely used because of its high editing efficiency, good targeting specificity, ease of use and low cost. CRISPR/Cas9 originated as an adaptive immune mechanism in bacteria. When foreign DNA invades, the Cas1-Cas2 complex recognizes a protospacer adjacent motif (PAM, e.g. 5'-NGG-3') on the target DNA, excises the adjacent protospacer fragment and integrates it into the CRISPR locus as a new spacer, creating an immunological memory. Upon reinfection, the precursor crRNA (composed of repeat and spacer sequences) and the trans-activating crRNA (tracrRNA) form a duplex by base pairing through the repeat sequence; this duplex is cleaved mainly by RNase III, with Cas9 helping to stabilize the complex, to yield mature crRNA. The crRNA:tracrRNA complex then guides the Cas9 protein to recognize the NGG PAM on the target DNA, locate the protospacer and introduce a sequence-specific double-strand break. In 2012, the Charpentier and Doudna groups fused tracrRNA and crRNA into a single chimeric RNA, named the single guide RNA (sgRNA). The 20 nt at the 5' end of the sgRNA specify the target site through Watson-Crick base pairing, so SpCas9 can recognize the PAM and cleave DNA using a single guide RNA.
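To make the split between the 20-nt targeting sequence and the rest of the sgRNA concrete, the sketch below scans one strand of a DNA target for SpCas9 NGG PAMs, takes the 20 nt immediately 5' of each PAM as the protospacer, and writes the sgRNA as spacer plus scaffold in RNA letters. It is a simplified illustration under stated assumptions: only the given strand is scanned (sites on the opposite strand appear as CCN on this strand and are ignored), the scaffold is passed in as a parameter because SEQ ID NO. 1 is not reproduced in this text, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def find_ngg_sites(target: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (protospacer_start, protospacer, pam) for each NGG PAM on this strand."""
    target = target.upper()
    sites = []
    # Zero-width lookahead so overlapping PAMs ('AGGG' holds AGG and GGG) are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", target):
        pam_start = match.start()
        if pam_start >= spacer_len:  # need a full-length protospacer 5' of the PAM
            start = pam_start - spacer_len
            sites.append((start, target[start:pam_start], match.group(1)))
    return sites


def assemble_sgrna(protospacer: str, scaffold_dna: str) -> str:
    """sgRNA (5'->3') = spacer (same sequence as the protospacer) + scaffold, as RNA."""
    return (protospacer + scaffold_dna).upper().replace("T", "U")
```

Swapping `scaffold_dna` for a different scaffold variant while keeping the spacer fixed is exactly the degree of freedom the screening method explores.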
The part of the fused sgRNA that excludes the target-recognition (spacer) sequence is called the scaffold (backbone). In practical gene-editing applications, multiple sgRNAs are increasingly used in the same cell to target a single locus or several loci. A multi-sgRNA strategy can raise targeting efficiency and broaden the targeting range while greatly reducing the time and labor cost of experiments, so it has significant practical value. Its wider adoption, however, is limited by two core technical bottlenecks. First, it is technically difficult to deliver multiple sgRNAs efficiently into the same cell. Second, if the sgRNAs are expressed in tandem, the highly repeated classical scaffold sequence makes vector construction much harder and limits the number of sgRNAs that can be concatenated, which severely restricts the use of this strategy in multifunctional screens and complex gene-regulation studies. Obtaining additional sgRNA scaffold sequences with low mutual sequence similarity and with activity equal to or better than that of the original scaffold is therefore key to overcoming this bottleneck and to further improving the editing efficiency and application range of CRISPR tools. Current CRISPR/Cas9-based multi-target approaches generally rely on multiple Pol III promoters (e.g. the U6 and H1 promoters) to drive separate sgRNA expression cassettes and thereby achieve multiplex gene editing or multifunctional regulation.
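A minimal sketch of how low-homology scaffolds might be picked for a tandem array, assuming homology is measured as the length of the longest exact substring two scaffolds share (a crude proxy for the repeats that drive recombination) and using a hypothetical `max_shared` cutoff. The greedy pass keeps a candidate only if it shares no substring longer than the cutoff with any scaffold already chosen; sorting candidates by measured activity beforehand would favor the most active variants. The patent does not specify this procedure, and the function names are hypothetical.

```python
from typing import List


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest exact substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending at a[i-1], b[j-1]
                best = max(best, cur[j])
        prev = cur
    return best


def pick_non_repetitive(candidates: List[str], max_shared: int) -> List[str]:
    """Greedily keep candidates whose shared substrings with all kept ones are <= max_shared."""
    chosen: List[str] = []
    for seq in candidates:
        if all(longest_common_substring(seq, kept) <= max_shared for kept in chosen):
            chosen.append(seq)
    return chosen
```

Two copies of the same classical scaffold share their entire length, so this check would always reject the second copy, which illustrates why repeating one scaffold across a tandem array is problematic.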
However, multi-promoter systems have clear inherent drawbacks that seriously compromise their performance. Introducing multiple promoters rapidly increases vector size, which lowers viral packaging efficiency and readily triggers recombination within the viral genome, causing packaging failure. Multiple U6/H1 promoters also interfere with one another, unbalancing the expression levels of different sgRNAs and undermining the consistency of targeted editing or regulation. In addition, it is difficult to build a library containing more than 3 sgRNAs with such systems, so they cannot meet the requirements of large-scale high-throughput screening or complex multi-pathway regulation studies. As a result, the multi-promoter strategy is increasingly unsuited to demanding applications such as multiplex regulation, analysis of complex network functions and large-scale screening. More importantly, the classical sgRNA scaffold is structurally highly repetitive; when a plurality of promoter-