The CCV Center has ended. As of 10/28/2015 this website will no longer be updated.

Programmable DNA-targeting & modification systems

CRISPR

CRISPR systems are a very recently characterized set of systems used by microbes to defend themselves from invading viruses and plasmids. The microbes store short DNA segments derived from these invaders in special locations within their own genomes, and express "CRISPR-associated" (CAS) proteins that use these stored sequences to recognize and then cut the corresponding sequences in the viral or plasmid DNA. Importantly, CRISPR systems use RNA generated from their stored sequences to identify the DNA to be attacked, and a single set of CAS proteins works with all of these CRISPR RNAs. This makes CRISPR systems much easier to retarget to DNA sequences of our choosing than genome editing systems such as TAL Effectors or Zinc Fingers, which require a different protein to be designed for each target site. For this reason, CRISPR systems have become an area of intense research and have changed the field of DNA engineering in a very short time. The CCV has had a leading role in adapting CRISPR systems for use in human cells and other higher organisms (PubMed), adapting them to other purposes (PubMed), and developing new CRISPR systems for general use (PubMed). CRISPR systems are so new that many of their features are still being explored. Shown on the left is a structure of S. pyogenes Cas9 in a complex with a "guide" RNA and a targeted strand of DNA (PubMed, RCSB).

Zinc Finger

Zinc Fingers (ZFs) are small protein domains of around 30 amino acids (aa) that each recognize a 3 base pair (bp) stretch of double-stranded DNA (dsDNA). Individual ZF domains can be laid out in arrays that recognize longer stretches of 6, 9, or more bp. When each of a pair of Zinc Finger arrays is fused to a domain that cuts DNA (most commonly the nuclease domain of the restriction enzyme FokI), and the two ZF arrays recognize adjacent genomic sequences separated by a short spacer, the pair of fusions comprises a "Zinc Finger Nuclease" (ZFN) that will efficiently cut the genome within the spacer. ZFNs can be used to generate both random mutations and precise changes in the genome at the targeted site. ZFs and ZFNs have been studied for over two decades because the four amino acids (aas) within the ZF protein that control recognition of its target 3 bp sequence have been identified and a "code" for predicting which aas are needed to bind each of the 64 possible 3 bp ZF targets has been discerned. However, using this code to design arrays recognizing long DNA targets is not simple because the code is only approximate: some ZFs exhibit affinity for a fourth base outside the principal 3 bp sequence, and this can cause adjacent ZFs in an array to interfere with each other. Therefore special methods are needed to design ZFs that efficiently recognize long sequences, and CCV investigator Keith Joung has long been a leader in developing such methods and in using them within ZFNs to modify genomic sequences (PubMed). Although recently characterized systems such as TAL Effectors and CRISPR have been a focus of current research because they present simpler systems for DNA targeting, the large knowledge base available for Zinc Fingers, the many resources available for them (e.g., the ZifDB database), and their very small size still present significant advantages.
Thus, the CCV continues to work with them and develop them for new purposes (PubMed). Shown on the left is a structure comprising two identical Zif268 Zinc Finger proteins (each containing a tandem array of three Zinc Finger domains) binding adjacent copies of their 9 bp DNA binding sequences on a single 21 bp dsDNA (PubMed, RCSB).
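
The counting behind the Zinc Finger description above (3 bp per finger, 64 possible 3 bp targets, arrays covering 6, 9, or more bp) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the code models the idealized one-finger-per-3-bp picture, not the fourth-base effects that make the real code approximate. The example target is the 9 bp site commonly cited for Zif268, which is an assumption here since the text above does not spell it out.

```python
from itertools import product

BASES = "ACGT"


def all_triplets():
    """Enumerate every possible 3 bp subsite a single finger could face."""
    return ["".join(p) for p in product(BASES, repeat=3)]


def split_into_triplets(target):
    """Split a target into consecutive 3 bp subsites, one per finger.

    Idealized model: it ignores the cross-talk between adjacent fingers
    that makes real array design harder.
    """
    target = target.upper()
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3 bp")
    if set(target) - set(BASES):
        raise ValueError("target must contain only A, C, G, T")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


if __name__ == "__main__":
    # 4 bases at each of 3 positions gives 4**3 = 64 possible subsites.
    print(len(all_triplets()))
    # A 9 bp site (assumed Zif268 example) needs a three-finger array.
    print(split_into_triplets("GCGTGGGCG"))
```

A three-finger array therefore corresponds to three 3 bp subsites, and a pair of such arrays in a ZFN addresses 18 bp in total plus the spacer between them.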

TALE

In 2009, when their DNA-binding code was deciphered (see PubMed), Transcription Activator-Like Effector (TALE) proteins, used in nature by Xanthomonas bacteria that infect plants, emerged as a simple and efficient protein system for recognizing and binding to long double-stranded DNA (dsDNA) sequences. Their simplicity lies in the fact that, unlike Zinc Fingers, the TALE DNA-binding code allows one to specify recognition of each bp within a long DNA target sequence independently of the others. TALE proteins contain arrays of repeated blocks of 33-34 amino acids (aa). Each repeat recognizes a single bp in a DNA sequence by dint of the particular pair of aas at positions 12 and 13, known as the Repeat Variable Diresidue (RVD). Once the TALE code was broken, the CCV moved quickly to develop methods to synthesize TALEs efficiently (PubMed), to characterize and optimize their use as genome engineering tools and gene activators (PubMed), and, more recently, to develop them as targeted epigenomic regulators (PubMed). Shown here is a structure of a natural Xanthomonas TALE containing 23.5 repeats binding to its natural DNA target sequence within a 36 bp dsDNA containing this sequence (PubMed, RCSB).
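
Because each repeat reads one bp independently of its neighbors, choosing the repeats for a target reduces to a per-base lookup. The sketch below illustrates this under stated assumptions: the RVD table uses the commonly cited canonical associations (NI for A, HD for C, NN for G, NG for T), which are not listed in the text above, and the function name is hypothetical. Real designs involve further considerations, for example NN is often reported to bind A as well as G, so this is a sketch of the principle and not a design tool.

```python
# Commonly cited RVD-to-base associations (an assumption; not given in the text).
RVD_FOR_BASE = {
    "A": "NI",
    "C": "HD",
    "G": "NN",  # often reported to tolerate A as well
    "T": "NG",
}


def rvds_for_target(target):
    """Return one RVD per base of the target, in order (hypothetical helper)."""
    rvds = []
    for base in target.upper():
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unsupported base: {base!r}")
        rvds.append(RVD_FOR_BASE[base])
    return rvds


if __name__ == "__main__":
    # Each base maps to its RVD without reference to neighboring bases.
    print(rvds_for_target("TACG"))
```

Contrast this with Zinc Finger arrays, where the choice for one 3 bp subsite can depend on its neighbors and so cannot be reduced to a simple independent lookup.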

Lambda Exo/Bet

Most of the genomic engineering methods highlighted on this page rely on proteins that can be directed to cut the DNA near the genomic location we wish to change. These cuts are often sufficient to generate a random mutation at a target gene that can then inactivate the gene, but we can also make precise changes to a genomic locus if we provide donor DNA that contains the specific altered sequence we wish to have appear there. However, in the bacterium E. coli we can make small precise changes to genomic loci extremely efficiently without cutting the DNA, simply by providing donor DNA and inducing the β protein of the bacteriophage λ, a method sometimes called recombineering. The β protein, in concert with other λ and host factors, allows the donor DNA to be incorporated in the correct place when E. coli replicates its genome. The Church lab within the CCV has developed optimized strains and procedures that turn this process into a method now called Multiplexed Automated Genome Engineering (MAGE), to the point where, with MAGE, ~10 precise changes can be made at a time to the E. coli genome (PubMed). Within the CCV, we have been attempting to develop a version of MAGE that works in human cells (PubMed). Shown here is a structure of the λ protein Exo (RCSB), which works in concert with β. Although β is the protein most critical for MAGE and has been studied for decades, its structure has still not been solved.

Meganuclease

Meganucleases are enzymes found in organisms from across the tree of life that efficiently recognize and cut long DNA sequences of up to several tens of bp. Because long DNA sequences will only occur rarely in a genome by chance, meganucleases that target a particular location in a genome have the potential to cut extremely specifically at that location, so that the possibility of off-target cuts is minimized. For these reasons, meganucleases such as I-Sce I (from baker's yeast), which recognizes an 18 bp sequence, have been widely used in research on DNA targeting and on cellular responses and repair processes associated with double-stranded DNA (dsDNA) cuts. However, like recombinases and unlike TAL Effectors and Zinc Fingers, the protein domains within meganucleases that recognize DNA do not have a modular structure that allows them to be reprogrammed to target sequences of our choosing, which has limited their use in genomic engineering. Nevertheless, structures of a number of meganucleases have been solved, which has allowed the regions of the protein that contact and recognize DNA to be identified, and using this information a number of research laboratories and companies have made headway in reprogramming meganucleases by modifying these protein regions (see, e.g., reference). In the meantime, meganucleases remain of interest in applications where very high specificity is desired and because they can work efficiently in human cells. Shown here is a structure of a meganuclease that has been engineered to recognize a biologically important locus in human cells (RCSB).
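
The claim that long recognition sequences rarely occur by chance can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a deliberately simple model that is not from the text above: bases are independent and equally frequent, both strands are counted, and the site is not its own reverse complement. Real genomes are not random, so the result is an order-of-magnitude guide only. The function name and the round genome size of 3 billion bp (roughly the size of the human genome) are illustrative assumptions.

```python
def expected_chance_sites(site_len, genome_len, both_strands=True):
    """Expected number of chance matches to one specific site.

    Assumes independent, equally frequent bases (a simplifying assumption),
    so each position matches with probability 1 / 4**site_len.
    """
    if site_len <= 0:
        raise ValueError("site_len must be positive")
    if genome_len < site_len:
        raise ValueError("genome_len must be at least site_len")
    positions = genome_len - site_len + 1  # possible start positions
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** site_len


if __name__ == "__main__":
    genome = 3_000_000_000  # assumed round figure for a human-sized genome
    for n in (6, 18):
        print(n, expected_chance_sites(n, genome))
```

Under these assumptions an 18 bp site is expected to occur by chance roughly 0.09 times in a genome of this size, whereas a 6 bp site, typical of a common restriction enzyme, would be expected about 1.5 million times.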

Recombinase

Recombinases are enzymes found within viruses and microbes that can invert a sequence, insert or remove a sequence, or, in the right circumstances, effect an exchange of pairs of DNA sequences. Recombinases perform these operations by binding and bringing together the specific ~35 bp sequences that flank the DNA that will be moved, and then cutting them and rejoining the halves to their opposite partners. Several recombinases such as Cre (from bacteriophage P1), phage λ Integrase and Excisionase, and Flp (from baker's yeast) have been developed into extremely useful tools for biological research and biotechnology, because they provide highly efficient and specific ways of manipulating DNA sequences, including sequences in living cells. A drawback of recombinases for genome engineering is that, like meganucleases and unlike TAL Effectors and Zinc Fingers, the recombinase protein domains that recognize the ~35 bp target sequences do not have a modular structure and so cannot be easily reprogrammed to other sequences we would like to target. Therefore, most work with recombinases requires that we use the ~35 bp regions that they recognize in nature. To use them to move or exchange sequences in genomic engineering applications, one must therefore place the native ~35 bp flanks around the sequences one wishes to manipulate, and then, after recombination has been effected, either remove them again or leave them in place as "scars" of the recombination operation, which may have unwanted effects if we are modifying genomes in living cells. Nevertheless, recombinases are extremely useful because, unlike TAL Effectors, Zinc Fingers and CRISPR, they are naturally suited to precise replacement of long sequences of many thousands of bp of genomic DNA with DNA of our choosing, rather than just a few bp, and because they do not generate random mutations as these other systems potentially do.
Moreover, the ability to engineer even a little flexibility into recombinase recognition can be highly useful, especially if it allows a recombinase to recognize and operate on multiple ~35 bp target sequences independently (so-called "orthogonal" sites). The CCV has explored the use of directed evolution to retarget recombinases, as well as ways to improve their accuracy (PubMed). Shown here is a structure of Cre recombinase positioned at a Holliday junction created between cut and rejoined double-stranded DNA strands in the process of recombination (RCSB).

Group II Introns

Group II introns are a class of RNA sequences that can insert DNA copies of themselves at specific locations in a genome. The location in the genome into which the intron will insert is controlled by a set of short sequence regions within the intron, and these short sequences can be recoded so that the insertion can be directed to any of a large set of possible genomic locations. By these means, versions of Group II introns known as Targetrons have been used to disrupt genes in a number of microbes by programming the introns to insert within the gene. For this purpose, Group II introns are useful because they are programmable (unlike transposons) and because their operation relies on the RNA itself and requires very few host factors. These properties have allowed them to be used in species other than their native ones, which can be important when one needs to knock out genes in an organism that is difficult to modify genetically. In this way, the Church lab within the CCV used a Group II intron to knock out a gene in a Clostridium species important in biofuel research (PubMed). While Group II introns present an interesting RNA-based mechanism for performing targeted gene knockouts, their successful use in mammalian cells has not been reported. Moreover, the programming of these introns to different locations is limited by the short size of the targeting regions in the intron, which in turn limits their ability to be targeted precisely and specifically to positions of interest in very large genomes such as the human genome. CRISPR/CAS systems therefore appear to be much better positioned for RNA-based targeting in human cells. Shown here is a structure of a Group II intron from the deep sea organism Oceanobacillus iheyensis (RCSB).
