Research
Chemical biology of nucleic acids
Recent advances in the understanding of nucleic acid function have shown that non-coding sequences have key roles in regulating many cellular processes, from transcription and translation to cell division and genome stability.
Rather than forming the typical Watson-Crick base pairs of a double helix, many non-coding sequences in DNA and RNA display non-standard structural features. For example, guanine-rich sequences can adopt stable four-stranded structures called G-quadruplexes. We hypothesise that the formation of such structures in vivo is critical to biological function and relevant to medicine. We aim to elucidate the role of such structures in cancer and in normal cells. By applying small molecules that selectively target such non-canonical structural elements, we further aim to develop novel approaches that could be used in the treatment of cancer.
While nucleic acids generally adopt a well-known double helical structure through guanine-cytosine and adenine-thymine base pairing, some sequences can take on alternative structures. In guanine (G)-rich regions, G bases can adopt stable intramolecular arrangements mediated by Hoogsteen hydrogen bonding to form several stacked G-tetrads. Within the human genome, we have shown that potential G-quadruplex forming sequences, with the consensus G₃₋₆N₁₋₇G₃₋₆N₁₋₇G₃₋₆N₁₋₇G₃₋₆ (where N is any base), are common (Figure 1). These sequences are particularly concentrated in or near the promoters and first introns of many genes, including oncogenes such as MYC, KIT and RAS. The accumulated evidence for G-quadruplex structure and function is based largely on data from biophysical, structural and in vitro studies. We are therefore investigating the existence of G-quadruplex nucleic acids in living systems, and are seeking robust evidence of their biological function and their validity as drug targets.

Figure 1
G-quadruplex formation mediated by Hoogsteen hydrogen bonding (left). Stacked G-tetrads in an intramolecular G-quadruplex (top right) and the putative consensus sequence for G-quadruplex formation (bottom right) (see Huppert and Balasubramanian, Nucleic Acids Res 2005; 33: 2908).
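
The consensus motif quoted above (four runs of three to six guanines separated by loops of one to seven bases) can be written as a regular expression. The following is a minimal sketch of such a scan, not the published search method: the names `PQS_PATTERN` and `find_pqs` are illustrative assumptions, loops are allowed to contain any base including G, and only the strand supplied is searched.

```python
import re

# Consensus from the text: G3-6 N1-7 G3-6 N1-7 G3-6 N1-7 G3-6
# Assumption: N is any of A, C, G or T (loops may themselves contain G).
G_RUN = "G{3,6}"
LOOP = "[ACGT]{1,7}"
PQS_PATTERN = re.compile(f"{G_RUN}(?:{LOOP}{G_RUN}){{3}}")


def find_pqs(sequence):
    """Return (start, end, motif) for each non-overlapping match.

    Coordinates are 0-based, end-exclusive, on the strand supplied.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Four human telomeric repeats contain four G-runs, so one motif is found.
    for start, end, motif in find_pqs("TTAGGGTTAGGGTTAGGGTTAGGG"):
        print(start, end, motif)
```

Because `finditer` reports non-overlapping matches, a long G-rich tract yields fewer hits than an exhaustive enumeration of every possible motif would. To cover the complementary strand, the same function can be applied to the reverse complement of the input.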
DNA G-quadruplexes
DNA G-quadruplexes are implicated in a range of biological processes from the control of cell division to the regulation of gene transcription. We are investigating several aspects of DNA G-quadruplex biology. First, we wish to prove the existence and survey the extent of G-quadruplex formation in living cells, and to understand how this might be regulated in cancer phenotypes. To do this we are using a variety of probes to stabilise G-quadruplexes in cells. We have chemically synthesised a number of small molecules with high selectivity for G-quadruplex over duplex DNA. We are also using G-quadruplex-binding proteins as probes, exploiting both natural binding proteins, such as nucleolin, and synthetic proteins that we have generated, such as specific G-quadruplex-binding antibodies and zinc-finger proteins. By isolating genomic DNA bound to these probes, we can use chromatin immunoprecipitation together with next-generation sequencing (ChIP-seq) technologies to determine the sites of G-quadruplex formation genome-wide.
We are also exploring how G-quadruplexes in promoters influence gene transcription and/or DNA replication in cancer cells. Many groups, including ours, have intensively studied the biophysical and structural characteristics of G-quadruplexes found in the promoters of human oncogenes. Chemical biology studies on cancer cells in culture have also shown that G-quadruplex-binding small molecules can suppress oncogene expression. While such studies are highly suggestive of G-quadruplex formation at promoters during transcription, we are now actively investigating in vivo how G-quadruplexes form at promoter sites in genomic DNA, using structural mapping and genetic approaches.
During cell division, it is vital that the ends of chromosomes are neither shortened nor recognised by DNA damage response pathways; otherwise genome instability will result. Telomeres protect chromosomes from damage by virtue of their DNA sequence. This sequence, composed of tandem TTAGGG repeats, is required to recruit a protective protein complex, known as shelterin, to telomeres. We and others have shown that the TTAGGG sequence is capable of forming stable DNA G-quadruplex structures in vitro, and studies in invertebrate systems suggest that G-quadruplex formation is cell cycle regulated (Figure 2). Telomeres are also actively transcribed into telomeric repeat-containing RNA (TERRA). While TERRA can form stable G-quadruplexes in vitro, it is not known whether this is true in vivo or whether it is needed for normal telomere function. Also of note is the observation that 85% of primary tumours show increased expression of the enzyme telomerase, which is required to maintain telomeres. By applying small molecule and protein probes, together with genetic approaches, we aim to definitively prove the existence and understand the regulation of DNA telomeric G-quadruplexes and TERRA in human cells.

Figure 2
DNA damage and uncapping of telomeres are induced by a G-quadruplex-binding small molecule. In treated cells (b), POT1 (green spots) is lost from telomeres compared to untreated cells (a). The compound induces DNA damage as measured by gamma-H2AX foci (red spots in c). This suggests that a DNA damage response is stimulated by loss of POT1 from telomeres. Double staining under conditions of partial POT1 loss shows that DNA damage occurs at telomeres (see Rodriguez et al., J Am Chem Soc 2008; 130: 15758).
RNA G-quadruplexes
There is much evidence that G-quadruplexes are widely present in RNA and that they may be associated with several key aspects of RNA biology. For example, G-quadruplexes in the 3'-UTR of insulin-like growth factor II mRNA play a role in post-transcriptional endonucleolytic cleavage. Furthermore, we have recently shown that a conserved RNA G-quadruplex motif in the 5'-UTR of the human NRAS proto-oncogene can modulate translation (Figure 3). We have also shown that small molecule ligands can target such RNA G-quadruplexes and thus influence translation. The recent identification of RNA helicases with G-quadruplex resolving activity suggests that RNA quadruplexes exist normally in vivo. Our bioinformatics analysis highlights that large numbers of human RNA transcripts contain a potential G-quadruplex forming region. This raises an important question: how widespread are G-quadruplexes in RNA transcripts and what is their functional relevance? To address this we are using a combination of genome-wide ChIP-seq and chemical biology approaches to identify and map G-quadruplex structures within the transcriptome.

Figure 3
G-quadruplexes in the 5'-UTR of NRAS mRNA modulate translation. Luciferase reporter constructs containing a G-quadruplex upstream of the translation start site show significantly reduced translation levels compared to mRNAs containing no G-quadruplex or a mutated one (see Kumari et al., Nat Chem Biol 2007; 3: 218).
Non-coding RNA function
More generally, we are exploring other non-coding structural elements formed by RNA molecules. Recently, much evidence has pointed to a role for micro-RNAs and long non-coding RNAs (lncRNAs) in epigenetic regulation and cancer. For example, lncRNAs are thought to associate with chromatin-modifying complexes to delineate active versus silent chromatin domains. In breast tumours, the lncRNA HOTAIR shows high expression, with increased levels correlating with greater tumour spread and patient death. HOTAIR is thought to interact with the Polycomb repressive complex 2 (PRC2) to remodel chromatin to a more embryonic phenotype. In cell culture models, over-expression of HOTAIR promotes a metastatic phenotype while repression inhibits cancer invasiveness. In collaboration with Adele Murrell's laboratory, we are defining which domains in HOTAIR are required for the HOTAIR-PRC2 interaction. We are also designing small molecule ligands that disrupt HOTAIR binding to PRC2 with a view to reversing metastasis.
