DNA, often thought of as “the blueprint of life,” contains instructions for building proteins that cells need to survive and function properly. But DNA isn't perfect and errors can occur during replication. Sometimes, this can result in snippets of the DNA building blocks called nucleotides—G (guanine), A (adenine), T (thymine), C (cytosine)—getting repeated too many times in a row.
This can lead to a type of mutation, known as a nucleotide repeat expansion, that can alter the structure and function of vital proteins and give rise to rare neurodegenerative conditions like Huntington's disease and amyotrophic lateral sclerosis (ALS).
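To make the idea of a repeat "getting repeated too many times in a row" concrete, here is a minimal Python sketch that counts the longest uninterrupted run of a repeat unit. The function name `longest_repeat_run` and both sequences are made up for illustration; they are not real gene sequences, and the repeat counts are not clinical thresholds.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CAG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Toy sequences (assumptions for illustration only).
typical = "ATG" + "CAG" * 5 + "CCGCCA"
expanded = "ATG" + "CAG" * 12 + "CCGCCA"

print(longest_repeat_run(typical))   # 5
print(longest_repeat_run(expanded))  # 12
```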
New research from Whitehead Institute Member Ankur Jain, graduate student Rachel Anderson, and their colleagues takes a closer look at how the repeat sequence involved in Huntington's disease—a CAG repeat—leads to the production of abnormal proteins that misfold and clump within cells, clogging up important cellular processes.
Their findings, published in the journal Molecular Cell on January 30, reveal that the expanded CAG repeat can interfere with splicing. As depicted in the illustration below, splicing is the process in which portions of RNA that do not encode proteins, known as introns, are cut out. The remaining sections, called exons, are then joined together to form the final messenger RNA that carries instructions for building a protein.
According to the researchers, the expanded CAG repeat creates new markers, called splice acceptor sites, which cause the cutting and pasting of genetic information to occur at different junctions than usual.
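The effect of moving a splice junction can be sketched with a toy string model in Python. Everything here is an assumption for illustration: the `splice` helper, the short sequences, and the position of the shifted acceptor are invented, and real splice site selection is far more complex than string slicing.

```python
def splice(pre_mrna: str, donor: int, acceptor: int) -> str:
    """Join the sequence before `donor` to the sequence from `acceptor` onward."""
    if not 0 <= donor <= acceptor <= len(pre_mrna):
        raise ValueError("expected 0 <= donor <= acceptor <= len(pre_mrna)")
    return pre_mrna[:donor] + pre_mrna[acceptor:]


# Toy pre-mRNA: exons in upper case, intron in lower case (not a real gene).
exon1 = "AUGGCC"
intron = "guaaguuucuag"
exon2 = "GAUCAGCAGCAGUAA"
pre_mrna = exon1 + intron + exon2

donor = len(exon1)
usual_acceptor = len(exon1) + len(intron)

# Usual junction: the whole intron is removed and the exons are joined.
normal = splice(pre_mrna, donor, usual_acceptor)

# Hypothetical junction created inside the repeat, six bases further along.
shifted = splice(pre_mrna, donor, usual_acceptor + 6)

print(normal)   # AUGGCCGAUCAGCAGCAGUAA
print(shifted)  # AUGGCCCAGCAGUAA
```

The two outputs differ only in where the second piece begins, which is enough to change the message that would be read into protein.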
“The question of why the brains of patients with repeat expansion disorders have spurious proteins has confounded scientists for some time,” says Jain, who is also an assistant professor of biology and a Thomas D. and Virginia W. Cabot Career Development Professor at the Massachusetts Institute of Technology. “Now, because we have an understanding of the molecular mechanism, we can try to target the splicing pathway and diminish the production of these proteins.”
Unfolding RNA hairpins
RNA is less stable than DNA, and common RNA analysis approaches rely on an enzyme called reverse transcriptase. Although DNA is usually read into RNA in a cell, this enzyme does the opposite, reading RNA molecules into a complementary DNA strand (cDNA). This allows researchers to closely analyze the RNA sequences without risking degradation of genetic information.
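As a rough illustration of what "reading RNA into a complementary DNA strand" means at the level of letters, the following Python sketch applies base pairing rules to a toy RNA. The function name `reverse_transcribe` and the input sequence are assumptions for illustration; the code models only the pairing rules, not the enzyme's chemistry.

```python
# Base pairing between an RNA template and the DNA strand built against it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA (written 5'->3') complementary to `rna` (written 5'->3')."""
    # The new strand is antiparallel to the template, so read the RNA backwards.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


rna = "AUGCAGCAGCAG"  # toy RNA, not a real transcript
print(reverse_transcribe(rna))  # CTGCTGCTGCAT
```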
But reverse transcription of repeat-containing RNAs comes with its own challenges—these molecules tend to fold back on themselves, forming hairpin loops, and when these loops do not fully unwind during reverse transcription, researchers are left with gaps and errors in the cDNA.
In the new paper, Jain and Anderson used a different approach to sensitively reverse transcribe repeat-containing RNAs into cDNA. Specifically, the researchers worked with an enzyme called TGIRT (Thermostable Group II Intron Reverse Transcriptase) that stays active at high temperatures, allowing it to break open the hairpin structures and capture repeat-containing sequences with higher fidelity.
“When you heat up an egg, it turns white because the proteins in the egg are unfolding due to high temperature. We're exploiting the same thing but with RNA structures,” says Anderson.
Then the researchers began mapping these repeats onto a reference genome, which serves as a guide for genetic information in a human, but they quickly ran into challenges. The “letters” that make up a human genome (G, A, T, and C) combine in various sequences to form the strands of DNA in our cells.
This means that repeated patterns in the human genome are inevitable (repeat-based diseases arise only when a single sequence, like CAG, is repeated too many times in a row), and each pattern can occur at multiple locations in the genome. So pinpointing where a repeat-containing RNA originated is like reconstructing a story from fragmented sentences without context.
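The mapping problem described above can be sketched in a few lines of Python. The reference string, the reads, and the `find_all` helper are all invented for illustration, and this is plain exact string matching rather than the alignment method used in the study. It shows that a read made only of repeats matches more than one place, while a read that includes neighboring sequence matches just one.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` occurs, overlaps included."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Toy "genome" containing the same CAG run at two different places.
reference = "TTACAGCAGCAGGATCCGTACAGCAGCAGTTGACCATG"

repeat_only_read = "CAGCAGCAG"   # repeat with no surrounding context
flanked_read = "CAGCAGGATC"      # repeat plus the sequence next to it

print(find_all(reference, repeat_only_read))  # [3, 20]
print(find_all(reference, flanked_read))      # [6]
```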
Source: Whitehead Institute for Biomedical Research