1. Introduction
The concept that DNA guanine (G)-quadruplexes (G4s) are a novel class of biomolecular targets for cancer drug discovery originated in the late 1980s [1], [2], although the ability of guanine derivatives to self-assemble into four-stranded structures has been recognized since 1962 [3]. Telomeric-G4, which forms in the single-stranded telomeric guanine-rich DNA sequences of eukaryotic chromosomes under near-physiological conditions, was the first example of G4 formation in organisms [1], [4], [5]. In 1997, human telomerase was found to be inhibited by telomeric-G4-interactive small molecules [6]. Subsequently, the formation of a G4 in the promoter region of the human oncogene MYC (MYC-G4) was reported in 2002. This study provided initial evidence for the biological significance of DNA G4s in oncogene expression and demonstrated that stabilizing such G4s with G4-interactive small molecules can downregulate oncogenes [7]. It sparked interest in non-telomeric G4s and established the targeting of oncogene promoter G4s as an alternative strategy for cancer therapy, with MYC-G4 as a model system. Since then, a large number of oncogene promoter G4s have been discovered, including KRAS-, PDGFR-β-, BCL-2-, c-KIT-, VEGF-, and EGFR-G4s [8], [9], [10], [11], [12], [13], [14], [15]. The 2009 Nobel Prize in Physiology or Medicine was awarded to Elizabeth Blackburn, Carol Greider, and Jack Szostak for the discovery of how chromosomes are protected by telomeres and the enzyme telomerase. Since then, G4s have garnered significant attention as promising anticancer drug targets. In addition to cancers, G4s are linked to various other diseases, including neurodegenerative diseases [16], [17] and viral and parasitic infections [18], [19]. Massive expansion of GGGGCC repeats has been implicated in frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) [17]. Helicase defects can lead to the accumulation of G4s in both the transcriptome and the genome, resulting in many severe inherited diseases, such as Fanconi anemia (FANCJ-deficient), primordial dwarfism (BLM-deficient), and Werner syndrome (WRN-deficient) [17], [20]. A strong correlation between G4 disruption or formation and disease onset has also been reported recently for conditions including sporadic Alzheimer’s disease, severe familial coagulopathy, atopic dermatitis, myocardial infarction, and deafness [21]. Therefore, it is of great significance to understand G4 structures, unravel the biological functions of G4s, and explore novel G4-targeting compounds for disease treatment.
DNA G4s are nucleic acid secondary structures consisting of two or more stacked G-tetrads, each formed by four guanines from G-rich sequences arranged in a cyclic plane and connected via Hoogsteen hydrogen bonds. Stable stacking of the G-tetrads also requires coordination of monovalent cations, such as K⁺ or Na⁺, by the guanine O6 atoms (Fig. 1(a)) [22], [23]. Although G4s can form intermolecularly or intramolecularly, most biologically relevant G4s are intramolecular and have a core comprising three G-tetrads [24]. In contrast to the rather uniform structure of duplex DNA, G4s are characterized by high structural diversity (Fig. 1(b)) [24], as they can vary in the directionality of the G-tracts and in loop type, length, and sequence. G4s also differ in the capping structures that cover the external tetrads. For example, MYC-G4 and VEGF-G4 are canonical right-handed parallel-stranded G4s that follow the well-established rules for G4 formation, with the consensus sequence $\mathrm{G}_{\geq 3}(\mathrm{N}_{1-7}\mathrm{G}_{\geq 3})_{\geq 3}$, where N is any nucleotide (Fig. 1(c)) [24]. Telomeric (Tel) hybrid-1 and hybrid-2 G4s are hybrid structures consisting of three parallel-oriented G-tracts and one antiparallel-oriented G-tract, arranged in a different order in each form (Fig. 1(c)) [24]. In contrast, KRAS-G4, PDGFR-β-G4, and EGFR-G4 adopt bulged, fill-in vacancy, and snap-back G4 topologies, respectively, which deviate from the canonical G4 structures (Fig. 1(c)) [15], [24]. Moreover, hTERT-G4 is a distinctive tertiary DNA G4 structure consisting of an end-to-end stacked pair of G4s connected by a 26-base loop; it was the first reported instance of such a configuration (Fig. 1(c)) [25]. These distinct G4 structures are proposed to interact with various proteins to perform specific biological functions [26]. This structural diversity, unique among DNA secondary structures, presents opportunities for developing drugs that specifically target individual G4s.
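The canonical consensus rule, which requires at least four runs of three or more guanines separated by loops of 1-7 nucleotides, can be turned into a simple pattern search for putative G4-forming sequences. The Python sketch below is illustrative only: the function name `find_putative_g4s` is hypothetical, it scans a single strand (the complementary C-rich strand would need a separate search), it lets loops contain any base including G, and it returns non-overlapping matches. Noncanonical G4s such as the bulged, vacancy, and snap-back topologies described above fall outside this pattern by design.

```python
import re

# G>=3 (N1-7 G>=3)>=3: a run of at least three guanines followed by at least
# three more repeats of a 1-7 nt loop (any base) plus another run of >=3 guanines.
PQS_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3,}")


def find_putative_g4s(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each non-overlapping match
    of the canonical G4 consensus on the given strand only."""
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in PQS_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Four human telomeric repeats, d(TTAGGG)4, followed by two extra T residues.
    telomeric = "TTAGGG" * 4 + "TT"
    print(find_putative_g4s(telomeric))  # [(3, 'GGGTTAGGGTTAGGGTTAGGG')]
    # Only three G-tracts: not enough to satisfy the consensus.
    print(find_putative_g4s("GGGTTGGGTTGGG"))  # []
```

Genome-wide predictors used in practice add scoring, strand handling, and overlap resolution; the regular expression here only encodes the consensus itself.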
Using G4-specific antibodies and chemical probes, researchers have mapped DNA G4s in the human genome and visualized them in living human cells [27], [28], [29], [30], [31]. Under G4-inducing conditions, the human genome has the potential to form over 70 000 G4s [32]. However, only around 10 000 G4 structures have been identified in human chromatin and living cells [30], [31]. These findings suggest that DNA G4 formation is a dynamic process closely associated with specific chromosome structures, genomic features, and cell states. Furthermore, G4-forming sequences are highly enriched in key human gene regulatory regions, such as 5′-untranslated regions, oncogene promoters, and telomeres [30], [31]. DNA G4s, which can be regarded as epigenetic modulators, regulate a range of cellular processes, including gene transcription, translation, and replication; genome stability; and telomere maintenance [26], [27], [33]. The presence of DNA G4s is substantially associated with highly transcribed genes involved in cancer and neurodegenerative disorders [27], [34]. In human cancer cells, DNA G4 formation is prominent in specific cell-cycle phases, and the structures exist in a dynamic equilibrium between folded and unfolded states. G4-stabilizing compounds can shift this equilibrium toward the folded state and thereby suppress oncogene expression by interfering with the interactions between G4s and transcription factors or other functional proteins, ultimately leading to cancer cell death [31]. Moreover, DNA G4 formation is associated with increased heterogeneity in breast cancers, which could facilitate tumor stratification and the discovery of personalized treatment strategies [35]. Taken together, DNA G4s have been recognized as distinct targets for cancer treatment, particularly for targeting “undruggable” and drug-resistant proteins such as MYC, EGFR, KRAS, and PDGFR-β [36], [37].
Natural products and their derivatives have historically played an important role in drug discovery, particularly for cancer and infectious diseases [38], [39]. In fact, more than 60% of approved anticancer drugs are derived from natural sources, including the well-known antitumor drugs paclitaxel, teniposide, camptothecin, and vincristine [39]. Natural products have intrinsic advantages in structural complexity and functional diversity [38]. Notably, many natural and nature-derived compounds, including telomestatin, quindoline i, berberine, coptisine, epiberberine, and sanguinarine, have been found to bind DNA G4 structures with high affinity and to show anticancer activity (Fig. 2) [37], [40], [41], [42], [43], [44]. These compounds are likely only the tip of the iceberg: many more natural products with DNA G4-targeting activity remain to be discovered.
In this review, we summarize and evaluate recent progress on natural and nature-derived DNA G4 binders, with an emphasis on structural studies of DNA G4s in complex with natural small molecules. We then discuss the challenges and opportunities in developing DNA G4-targeting drugs based on natural products.
2. Telomestatin and its derivatives are potent telomeric-G-quadruplex stabilizers
Human telomeres typically consist of 5-8 kilobases (kb) of G-rich d(TTAGGG)ₙ DNA repeats, with a single-stranded 3′-end overhang of 150-200 nucleotides [45], [46], [47]. In general, 50-200 bases of telomeric DNA are lost per round of replication, and the cell ultimately undergoes senescence and apoptosis when the telomeric DNA reaches a critical length [46], [48]. However, telomerase, which is selectively activated in most tumor cells, maintains telomere length through its reverse transcriptase activity and thus plays a critical role in cellular transformation and immortalization [48], [49]. G-rich human telomeres can form two distinct types of DNA G4s, namely hybrid-1 and hybrid-2, under physiologically relevant solution conditions. Stabilization of telomeric G4s by small molecules could hinder telomere extension mediated by telomerase or by the alternative lengthening of telomeres (ALT) pathway, leading to telomere shortening and, ultimately, cancer cell death [24], [50], [51]. Therefore, developing telomeric-G4 stabilizers has emerged as a promising anticancer strategy supported by extensive studies [24], [52], [53], [54], [55], [56], [57], [58].
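The length and loss-rate ranges above imply a finite number of divisions before telomeres become critically short in cells without telomerase or ALT. The back-of-the-envelope Python sketch below makes that arithmetic explicit. The text does not give a value for the critical length, so `CRITICAL_BP` is a hypothetical placeholder chosen only for illustration, and the model assumes a constant loss per division, which real cells do not strictly follow.

```python
def divisions_until_critical(initial_bp: int, loss_per_division_bp: int, critical_bp: int) -> int:
    """Whole divisions possible before telomere length falls to critical_bp,
    assuming a constant loss per division and no telomere maintenance."""
    if loss_per_division_bp <= 0:
        raise ValueError("loss_per_division_bp must be positive")
    usable_bp = max(initial_bp - critical_bp, 0)
    return usable_bp // loss_per_division_bp


# Ranges quoted in the text: 5-8 kb telomeres, 50-200 bases lost per division.
# HYPOTHETICAL critical length, for illustration only (not given in the source).
CRITICAL_BP = 4_000

fastest = divisions_until_critical(5_000, 200, CRITICAL_BP)  # short telomere, fast loss
slowest = divisions_until_critical(8_000, 50, CRITICAL_BP)   # long telomere, slow loss
print(f"{fastest}-{slowest} divisions before reaching {CRITICAL_BP} bp")
# prints: 5-80 divisions before reaching 4000 bp
```

When telomerase or ALT replaces the lost repeats, the net loss per division approaches zero and the count is no longer bounded (the function rejects a non-positive loss for that reason); restoring progressive shortening is what telomeric-G4 stabilizers aim for.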
Telomestatin, a natural macrocyclic compound isolated from Streptomyces anulatus 3533-SV4, consists of seven oxazole rings and a thiazoline ring and exhibits potent inhibitory activity against telomerase (Fig. 2) [59]. It specifically inhibits telomerase activity by stabilizing telomeric-G4s, with a half-maximal inhibitory concentration of 5 nmol·L⁻¹ in the telomerase repeat amplification protocol assay (IC₅₀-TRAP), much more potent than other G4-interactive molecules reported at the time [42], [59]. In addition, telomestatin exhibits antitumor activity against a number of cancer cell lines while displaying negligible toxicity to normal cells. Telomestatin has therefore been considered a promising anticancer drug candidate and has attracted considerable research attention [57], [60]. However, its low water solubility, together with the difficulty of obtaining large amounts of the compound, limits further clinical evaluation [61], [62].
Notably, a number of telomestatin derivatives with improved physicochemical and biological properties have been reported [57], [63], [64], [65], [66], [67], [68], [69]. Among them, L2H2-6M(2)OTD (L2H) (Fig. 2), which contains six oxazole rings and two alkyl amine side chains, showed bioactivity comparable to that of telomestatin (IC₅₀-TRAP = 20 nmol·L⁻¹) [57]. The alkyl amine side chains are positively charged under physiological conditions, which facilitates electrostatic interaction with the negatively charged phosphate backbone of telomeric-G4 and improves water solubility. Importantly, the nuclear magnetic resonance (NMR) solution structure of the Tel-hybrid-1 G4 in complex with L2H has been determined by Chung et al. [60] (Fig. 3). The complex structure showed that L2H disrupts the original T-A base pair in the 5′-end capping structure of the Tel-hybrid-1 G4 and instead stacks on the 5′-end outer G-tetrad, forming extensive π-stacking interactions. The two alkyl amine side chains lie close to the phosphate groups of the nucleotides, enabling electrostatic interactions that play a critical role in establishing strong binding affinity. To date, this solution structure is the only available complex structure of a human telomeric-G4 bound to a telomestatin derivative, and it provides valuable insight into the design of telomeric-G4-targeting drugs. Accordingly, many L2H derivatives with various functional groups have been synthesized [63], [66], [67], [68], [70].
3. Quindoline and its derivatives as MYC G-quadruplex stabilizers
Aside from telomeric DNA G4s, oncogene promoter G4s have attracted great attention, especially the MYC oncogene promoter G4 [22]. Stabilizing oncogene promoter G4 structures to modulate downstream gene expression has emerged as an alternative therapeutic strategy for cancer [8], [22]. Pioneering work from the Hurley group showed that a DNA G4 forms in the MYC proximal promoter region (MYC-G4), acts as a transcriptional repressor, and can be stabilized by G4-targeting small molecules to inhibit MYC transcription [7]. This work established a paradigm for subsequent promoter G4 studies of genes such as KRAS, PDGFR-β, BCL2, c-KIT, VEGF, and many others [8], [9], [10], [11], [12], [13], [14].
MYC, a validated “driver” oncogene, is frequently deregulated and is overexpressed in more than 80% of human cancers [71], [72], [73]. The MYC oncoprotein plays pivotal roles in tumor cell proliferation, differentiation, metastasis, and drug resistance and is considered an attractive cancer therapeutic target [71], [72], [73]. However, MYC is notoriously “undruggable” because its flat surface lacks a pocket for compound binding [72], [73], [74]. Importantly, small molecules that bind and stabilize MYC-G4 can effectively reduce MYC expression, ultimately leading to cancer cell death [7]. Therefore, MYC-G4 represents an alternative target for downregulating MYC signaling and has attracted great attention from the cancer research community.
The structure of the major ligand-free MYC-G4 was determined by our lab in 2005 [75]. However, finding a small molecule that binds to MYC-G4 with sufficient affinity and specificity for structure determination proved challenging [24]. Despite extensive efforts to find a specific MYC-G4 stabilizer, it took six more years before we determined the first high-resolution NMR solution structure of MYC-G4 in complex with two molecules of quindoline i (a quindoline derivative). This was also the first complex structure of a biologically relevant promoter G4 bound to small molecules (Fig. 3(b)) [76]. Quindoline i is a derivative of the naturally occurring indoloquinoline alkaloid cryptolepine [77], which was isolated from Cryptolepis triangularis N.E.Br. Cryptolepine has a variety of bioactivities, such as antimicrobial (notably antibacterial), anti-inflammatory, and anticancer activities [77], [78]. Moreover, cryptolepine has been used as an antimalarial drug in Central and West African countries [41], [78]. In 2000, cryptolepine and its derivatives were shown to have antitumor activity, which may be connected with their interactions with DNA G4s [79]. Since then, a number of cryptolepine derivatives have been synthesized, some of which, including quindoline i, have shown potent MYC-G4 stabilization [41], [80], [81].
NMR titration of quindoline i into MYC-G4 yielded spectra of high quality, suitable for structure determination [76]. The resulting 2:1 quindoline i-MYC-G4 complex structure revealed several unexpected features, such as the quindoline i-induced rearrangement of the flanking residues to form a binding pocket (Fig. 3(b)) [76]. At each end, quindoline i recruits the adjacent flanking residue (A6 or T23) to form a “quasi-triad plane” that stacks on an outer G-tetrad of the MYC-G4. The binding of quindoline i involves both electrostatic and π-stacking interactions; in particular, the diethylamino groups may interact electrostatically with the DNA phosphate backbone. Notably, this was the first structural study to show two small molecules bound simultaneously to a DNA G4. It should also be pointed out that the crescent-shaped quindoline i covers only two guanines of a G-tetrad for π-stacking. This differs from the previously described larger, symmetric cyclic fused-ring compounds, including telomestatin and TMPyP4, which overlap evenly with all four guanines of the external G-tetrad for maximum stacking interactions.
However, compounds with symmetric cyclic fused rings appear to bind G4s with less specificity, whereas asymmetric compounds with a smaller stacking skeleton, such as quindoline and quarfloxin, are more likely to bind a specific G4 in a defined manner. It is therefore of great interest to develop more crescent-shaped compounds for specific G4-targeting drug discovery. Indeed, inspired by quindoline, subsequent studies have discovered a growing number of compounds specific for a particular G4 among both natural and synthetic compounds [36], [82], [83], [84], [85], [86], [87], [88], [89], [90].
4. Berberine and its derivatives as telomeric and promoter G-quadruplex stabilizers
In general, the selectivity of G4-targeting compounds is unsatisfactory, as it is often difficult for such compounds to differentiate effectively among multiple DNA G4s. A well-known example is berberine, a natural isoquinoline alkaloid found in many medicinal plants, including Hydrastis canadensis, Coptis chinensis, and Berberis aristata [91], [92]. Berberine has a broad array of pharmacological activities attributed to its interactions with proteins and nucleic acids, such as anticancer, anti-inflammatory, antimicrobial, and antidiabetic activities, and has been used in traditional Chinese prescriptions for hundreds of years [91], [92]. Recently, berberine and its derivatives were found to be effective G4 stabilizers, revealing a new mechanism of action for the berberine scaffold in drug development [37], [93]. Notably, berberine is non-selective among DNA G4s and can bind both telomeric-G4 and a number of gene promoter G4s.
4.1. Berberine binds to human parallel telomeric-G4 with a 6:2 stoichiometry
The first structure of a DNA G4 in complex with berberine was determined by X-ray crystallography in 2013, revealing six berberine molecules bound to a parallel G4 dimer formed by the human telomeric sequence [94]. In this crystal structure, two berberine molecules pair with each other to form a coplanar layer with their concave sides at the center, which then stacks onto the outer G-tetrad of the parallel telomeric-G4 (Fig. 4). Interestingly, the A2 and T13 residues from two different DNA monomers form a Watson-Crick base pair, creating a binding pocket at each of the two 5′-end sites where the paired berberines are located [94]. In this way, a drug-stabilized 5′-end-to-5′-end DNA G4 dimer is formed, with an overall berberine-G4 binding stoichiometry of 6:2 [94]. However, the biological significance of this crystal structure may be limited, because the 6:2 arrangement could be a consequence of crystal packing rather than a distinct complex that exists in solution. Nevertheless, the dimeric binding mode derived from the crystal structure has been used in many studies to develop G4-targeting berberine derivatives [95], [96], [97].
4.2. Berberine and coptisine bind to MYC and KRAS promoter G-quadruplexes
We investigated the binding of berberine to various DNA G4s using ¹H NMR titration experiments. The results showed that berberine preferentially binds to parallel DNA G4s, including the MYC and KRAS promoter G4s. In contrast to the dimeric binding mode in the crystal structure, we found that berberine binds to MYC-G4 (dissociation constant Kd ≈ 9.9 μmol·L⁻¹) in a monomeric form, with a 2:1 berberine-MYC-G4 binding stoichiometry [97]. Mass spectrometry in this study clearly showed two major high-affinity binding sites for berberine on MYC-G4, which differs from the 6:2 binding mode previously reported in the crystalline state [94], [97]. The NMR solution structure of the 2:1 berberine-MYC-G4 complex shows that berberine recruits a flanking residue to form a ligand-base co-plane that stacks on the 5′- or 3′-external G-tetrad (Figs. 5(a) and (b)). At each binding site, two distinct, well-defined berberine orientations coexist and are related by a rotation of about 180° (Figs. 5(a) and (b)). The wide binding pocket formed by the outer G-tetrad and the flanking residues in parallel G4s gives the bound ligand more flexibility to adopt different conformations.
KRAS-G4 is another actively studied promoter G4; it acts as a transcriptional regulator and is amenable to small-molecule targeting to downregulate KRAS expression [9]. The KRAS oncogene is one of the most frequently mutated oncogenes and contributes to a number of human cancers [37], [98]. Although KRAS-G4 was discovered over ten years ago and a number of KRAS-G4-binding compounds have been reported, no high-resolution structure of a KRAS-G4-ligand complex had been solved until recently, which severely impeded the further development of KRAS-G4-targeting drugs [37]. In 2022, we determined two NMR solution structures of KRAS-G4 in complex with the natural products berberine (Kd ≈ 0.55 μmol·L⁻¹) and coptisine (Kd ≈ 0.50 μmol·L⁻¹) [37]. In both complex structures, the 2:1 binding stoichiometry and base-recruiting mechanism are again observed and appear to be key features of berberine derivatives bound to parallel G4s in solution (Fig. 6). Notably, KRAS-G4 contains a unique thymine bulge, which forms a Watson-Crick base pair with a flanking adenine in the ligand-free form. Upon berberine or coptisine binding, however, this A-T base pair is disrupted, and the adenine forms a co-plane with the ligand that stacks on the 3′-end G-tetrad. The bulged thymine partially covers the bound ligand and participates in forming the 3′-end binding pocket (Fig. 6). This unique thymine bulge could serve as a recognition element to enhance ligand selectivity for KRAS-G4. Moreover, the KRAS-G4-coptisine complex is stabilized by an additional H-bond between the five-membered methylenedioxy ring of coptisine and adenine H61. However, the residues of the 4-nucleotide (nt) loop are not involved in forming the binding pocket, which may be worth exploring further. In addition, based on the determined complex structures, novel berberine derivatives could be designed by introducing different side chains at the C1, C12, and C13 positions to achieve higher affinity and selectivity for KRAS-G4.
4.3. Berberine binds to a dGMP-fill-in vacancy G-quadruplex formed in the PDGFR-β gene promoter
The vacancy G4 (vG4) is a unique type of DNA G4 formed by three G₃ tracts and one G₂ tract, which leaves a G-vacancy site in an incomplete G-tetrad [99], [100]. vG4s are less stable than canonical G4s because of this G-vacancy site [99]. However, the G-vacancy site can be specifically filled in by guanine derivatives, such as cyclic guanosine monophosphate (cGMP) and deoxyguanosine-5′-monophosphate (dGMP). This provides an opportunity to develop selective vG4-targeting drugs by designing small-molecule-guanine conjugates that use the G-vacancy site as an anchor point [101], [102]. Moreover, the formation of G-fill-in vG4s suggests potential gene-regulatory mechanisms linked to guanine metabolite concentrations in cells and implies new opportunities for drug development [101], [103], [104], [105]. In 2020, we determined the NMR solution structure of the first dGMP fill-in vG4, from the PDGFR-β gene promoter [101]. We also identified the natural alkaloid berberine as a small molecule that specifically binds and stabilizes this distinct type of dGMP fill-in vG4 (Kd ≈ 1.6 μmol·L⁻¹) [93]. We then determined the NMR structure of the ternary berberine-dGMP-vG4 complex in potassium solution (Fig. 7) [93], the first complex structure of a small molecule bound to a fill-in vG4. As in the berberine-MYC-G4 complex, each berberine recruits an adenine residue from one of the two flanking sequences to form a “quasi-triad plane” that stacks on one of the two outer G-tetrads of the fill-in vG4. The binding involves π-stacking and electrostatic interactions. A minor ligand conformation also coexists at each of the two binding sites in the berberine-dGMP-PDGFR-β-vG4 complex, just as in the binary berberine-MYC-G4 complex (Fig. 5, Fig. 7) [93], [97]. This study reveals the interactions of berberine with a biologically relevant vG4 and provides structural insight for the design of vG4-targeting berberine derivatives.
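The composition of vG4s described above (three G₃ tracts plus one G₂ tract) suggests a simple way to flag candidate vG4-forming sequences. The Python sketch below is a deliberately simplified heuristic, not a validated predictor: the function name `find_vg4_candidates`, the 1-7 nt loop limit (borrowed from the canonical consensus), and the single-strand scan are all assumptions, and it ignores other ways in which a G-vacancy might arise.

```python
import re


def find_vg4_candidates(sequence: str, max_loop: int = 7) -> list[tuple[int, int]]:
    """Return (start, end) spans of four consecutive G-runs in which exactly one
    run has two guanines and the other three have at least three, with loops of
    1..max_loop nt between consecutive runs (single strand only)."""
    seq = sequence.upper()
    # Maximal runs of two or more guanines; isolated single Gs count as loop bases.
    runs = [(m.start(), m.end()) for m in re.finditer(r"G{2,}", seq)]
    hits = []
    for i in range(len(runs) - 3):
        window = runs[i:i + 4]
        loops_ok = all(1 <= window[k + 1][0] - window[k][1] <= max_loop for k in range(3))
        lengths = [end - start for start, end in window]
        # All runs are >= 2 nt, so exactly one G2 run means the other three are >= G3.
        if loops_ok and lengths.count(2) == 1:
            hits.append((window[0][0], window[-1][1]))
    return hits


if __name__ == "__main__":
    print(find_vg4_candidates("GGGTTGGGTTGGGTTGG"))  # [(0, 17)]: G3/G3/G3/G2 pattern
    print(find_vg4_candidates("GGGTGGGTGGGTGGG"))    # []: canonical, no vacancy
```

Because fill-in by dGMP or cGMP depends on the actual fold, any sequence flagged this way would still need structural confirmation.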
4.4. Epiberberine specifically binds to human Tel-hybrid-2 G-quadruplex
Epiberberine is a berberine analog that differs in the positions of the methoxy and methylenedioxy groups (Fig. 2) [106]. Our lab determined the structure of the biologically relevant human Tel-hybrid-2 G4 (Tel2G4) back in 2007 [51], and it took almost ten years to find a suitable small molecule that can specifically bind to this unique structure [106]. In 2018, we solved the NMR solution structure of Tel2G4 in complex with epiberberine (Kd ≈ 0.016 μmol·L⁻¹) (Fig. 8) [53], [106]. Unlike berberine, which binds to the promoter G4 with a 2:1 stoichiometry, epiberberine binds specifically to the 5′-end G-tetrad of Tel2G4. This specific binding induces a significant rearrangement of the flanking residues and the TTA loop at the 5′-end, forming a well-fitted binding pocket. Epiberberine recruits the flanking adenine to form an H-bonded ligand-base co-plane that stacks on top of the 5′-end outer G-tetrad. This layer is in turn covered by a T:T:A triad layer and another T:T base pair through π-stacking interactions. Such an extensive four-layer binding pocket had not been described in G4-ligand complexes before.
Notably, the structurally similar alkaloids berberine (Kd ≈ 1.99 μmol·L⁻¹), coptisine (Kd ≈ 0.33 μmol·L⁻¹), and palmatine (Kd ≈ 0.74 μmol·L⁻¹) bind Tel2G4 much more weakly [53], suggesting that the positions of the methoxy and methylenedioxy groups are crucial for specific recognition. Significantly, epiberberine is so specific for Tel2G4 that it can convert other telomeric-G4 structures into the hybrid-2 form under physiological conditions, the first reported example of a small molecule doing so. Overall, this study provides structural insight into ligand interactions with the human telomeric-G4 and offers a model system for developing specific Tel2G4-targeting drugs.
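The Tel2G4 Kd values above can be put on a common scale by computing fold selectivity and the corresponding difference in standard binding free energy, using ΔG° = RT ln(Kd/c°) with c° = 1 mol·L⁻¹. The Python sketch below treats each Kd as an apparent single-site constant and assumes a temperature of 298.15 K; the text does not state the measurement temperature, so the free-energy values are illustrative only.

```python
import math

R = 8.314462618  # gas constant, J·mol⁻¹·K⁻¹
T = 298.15       # ASSUMED temperature in K; not stated in the text

# Apparent Kd values for Tel2G4 quoted above, in μmol·L⁻¹.
kd_tel2g4_umol = {
    "epiberberine": 0.016,
    "berberine": 1.99,
    "coptisine": 0.33,
    "palmatine": 0.74,
}


def delta_g_kj(kd_umol: float, temp_k: float = T) -> float:
    """Standard binding free energy in kJ/mol from Kd, with c° = 1 mol/L."""
    kd_molar = kd_umol * 1e-6
    return R * temp_k * math.log(kd_molar) / 1000.0


ref = kd_tel2g4_umol["epiberberine"]
for name, kd in kd_tel2g4_umol.items():
    fold = kd / ref  # how many times weaker than epiberberine
    ddg = delta_g_kj(kd) - delta_g_kj(ref)  # equals RT ln(kd / ref)
    print(f"{name:>12}: Kd = {kd:g} umol/L, {fold:6.1f}-fold weaker, ddG = {ddg:+.1f} kJ/mol")
```

On these assumptions, the roughly 124-fold difference between berberine and epiberberine corresponds to about 12 kJ/mol, which is the scale of discrimination that the positions of the methoxy and methylenedioxy groups confer.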
5. Many other natural products as DNA G-quadruplex binders
Given the structural diversity and polymorphism of the human DNA G4s, it is reasonable to look for potent and selective G4 binders among natural products. Indeed, a growing list of naturally occurring DNA G4 binders has been reported beyond the above-discussed G4-binding natural alkaloids, including distamycin A [107], Fe(III)-protoporphyrin IX (hemin) [108], colchicine [109], pegaharmine D [84], sanguinarine [40], chelerythrine [40], piperine [110], magnoflorine [111], triptolide [112], jatrorrhizine [113], fangchinoline [114], evodiamine [114], isaindigotone [58], quinazoline [115], and schizocommunin derivatives [116].
Distamycin A, a canonical DNA minor-groove binder, was found to bind to the two opposite grooves of an intermolecular parallel [d(TGGGGT)]₄ G4 with a 4:1 binding stoichiometry and a binding constant (Kb) of (4.0 ± 3.0) × 10⁶ L·mol⁻¹ [107]. This finding suggests that combining G4 ligands with DNA duplex binders could yield novel G4-binding compounds with enhanced G4 specificity and affinity. Furthermore, inspired by the artificial G4 probe (G4P) protein, a flexible DNA duplex binder could be used to link two G4-binding compounds, forming a clamp-like ligand for stable anchoring [31]. Hemin, a rigid natural macrocycle with a chelated Fe(III) ion, can be captured by G4 structures formed in the expanded hexanucleotide repeat RNA (Kd ≈ 3 μmol·L⁻¹) and DNA of the C9orf72 gene. This interaction enhances peroxidase activity and is associated with the development of neurodegenerative diseases such as ALS and FTD [117]. A positively charged center and an aromatic macrocyclic skeleton are natural features of G4-binding compounds and have attracted intense interest from researchers as targets for structural modification. In general, the cationic center and the side chains are the main sites of modification, as represented by the pentacationic manganese(III)-porphyrin complex (association constant Ka = 10⁸ L·mol⁻¹ for human telomeric-G4) and the porphyrin-bridged tetranuclear platinum complexes with significant G4-binding specificity [118], [119], [120], [121], [122], [123]. Among the G4-binding natural products discovered so far, many have crescent-shaped skeletons with significant G4-stabilizing activity, such as sanguinarine (Ka ≈ 1.16 × 10⁶ L·mol⁻¹ for KRAS-G4) [124], jatrorrhizine (Ka ≈ 0.90 × 10⁶ L·mol⁻¹ for KRAS-G4) [124], and chelerythrine (Kd ≈ 0.25 μmol·L⁻¹ for VEGFA-G4) [125]. The drug potential of sanguinarine and chelerythrine is limited by their toxicity toward normal cells [126], [127]; however, structural modifications may help attenuate such side effects. Interestingly, schizocommunin is not a rigid crescent-shaped compound but has a relatively flexible carbon skeleton; an intramolecular hydrogen bond contributes greatly to its planarity and thus to its G4-binding affinity [116], [128]. It is therefore worth considering the introduction of H-bond-forming fragments to enhance the planarity of other candidate compounds while retaining a certain degree of flexibility. Tertiary nitrogen centers could also be quaternized to introduce positive charges, thereby enhancing interactions with the electron-rich G4s [129], [130]. Many studies have investigated structural modifications of crescent-shaped natural products to achieve higher G4-binding specificity and better pharmacological activity [131], [132], [133]. Because structures have not yet been determined for many of these G4-ligand complexes, it is crucial to understand how such natural products recognize specific G4s and to use this knowledge to develop novel nature-derived drugs.
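Because the affinities in this section are reported partly as association constants (Ka or Kb, in L·mol⁻¹) and partly as dissociation constants (Kd, in μmol·L⁻¹), a quick conversion helps when comparing them. For a simple 1:1 equilibrium, Kd = 1/Ka. The Python sketch below applies that relation to the quoted values. Treating every value as a 1:1 constant is an assumption (distamycin A, for example, binds with 4:1 stoichiometry), and the values refer to different G4 targets, so the converted numbers should not be read as a direct ranking.

```python
def ka_to_kd_umol(ka_l_per_mol: float) -> float:
    """Convert an association constant (L/mol) to Kd in μmol/L, assuming 1:1 binding."""
    if ka_l_per_mol <= 0:
        raise ValueError("association constant must be positive")
    return 1e6 / ka_l_per_mol


def kd_umol_to_ka(kd_umol: float) -> float:
    """Convert Kd in μmol/L back to an association constant in L/mol."""
    if kd_umol <= 0:
        raise ValueError("dissociation constant must be positive")
    return 1e6 / kd_umol


# Association constants quoted in the text (L/mol), labeled by ligand and target.
quoted_ka = {
    "distamycin A / [d(TGGGGT)]4 (Kb)": 4.0e6,
    "Mn(III)-porphyrin / telomeric-G4": 1e8,
    "sanguinarine / KRAS-G4": 1.16e6,
    "jatrorrhizine / KRAS-G4": 0.90e6,
}

for label, ka in quoted_ka.items():
    print(f"{label}: apparent Kd = {ka_to_kd_umol(ka):.3g} umol/L")

# Chelerythrine is reported directly as Kd = 0.25 μmol/L for VEGFA-G4.
print(f"chelerythrine / VEGFA-G4: Ka = {kd_umol_to_ka(0.25):.2e} L/mol")
```

On this 1:1 reading, sanguinarine and jatrorrhizine correspond to sub-micromolar and roughly micromolar Kd values for KRAS-G4, comparable in magnitude to those of berberine and coptisine discussed earlier.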
6. Challenges and opportunities for DNA G4-targeting drug design based on natural products
Structural studies of natural small-molecule-G4 complexes have provided a valuable structural basis and insights for targeting human promoter G4s and telomeric-G4s. Macrocyclic molecules, such as telomestatin derivatives, are similar in size to a G-tetrad and cover all four guanines of the outer G-tetrad in the Tel-hybrid-1 G4, enabling extensive π-stacking and electrostatic interactions. These large macrocyclic molecules generally bind with high affinity but low selectivity across DNA G4s of different topologies, which makes it challenging for them to target a specific G4. In contrast, small crescent-shaped pharmacophores with suitable functional groups, such as berberine, epiberberine, and quindolines, are more likely to bind biologically relevant intramolecular G4s in a specific manner. Because crescent-shaped small molecules cover only two guanines, they often recruit a flanking residue to form a “quasi-triad plane” that stacks over the outer G-tetrad for specific G4 binding. Unlike DNA minor-groove binders, for which skeleton length and flexibility are key to sequence-specific targeting and to adapting to the helical topology of duplex DNA [134], [135], [136], [137], crescent-shaped G4-binding compounds are characterized by extended aromatic ring systems, positively charged centers, and modifiable side chains. The central positively charged nitrogen of a crescent-shaped pharmacophore is normally positioned above the negatively polarized carbonyl groups of the G-tetrad, resulting in strong electrostatic interactions. Moreover, possible hydrogen bonds between the ligand and the recruited flanking residues are an important factor that reinforces specific interaction, as in the binding of quindoline i and berberine to MYC-G4, epiberberine to telomeric-G4, and coptisine to KRAS-G4. Meanwhile, the modifiable cationic side chains can interact with the grooves or phosphate backbones of G4s through steric and electrostatic interactions, contributing additional forces for the specific recognition of distinct DNA G4s. Collectively, optimal small molecules use a combination of steric effects, π-stacking, H-bonds, and electrostatic interactions to recognize individual G4s specifically, and these combined interactions are best identified through NMR solution structure studies.
The main challenge of current G4-targeting drug discovery is G4 selectivity. Structural studies of human genomic G4s have shown that many G4s share the same general features: a stacked G-tetrad core with several short loops. Reported G4-targeting compounds stack on the outer G-tetrad for extensive π-stacking interactions; therefore, it is difficult for these compounds to distinguish among different G4s, especially those with similar topologies. Notably, recently discovered unique G4s, such as vG4s [101], [104], [105], bulge-containing G4s [37], and stem-loop-containing G4s [138], [139], [140], may provide opportunities to develop selective G4-targeting drugs, because their features are distinct from those of canonical G4s and can be exploited for more selective ligand design. For example, inspired by a dual-specific targeting approach [141], [142], natural G4-binding ligands could be conjugated with specific DNA duplex binders to target particular stem-loop G4s or G4s with suitable grooves. Similarly, natural ligands could be modified with guanine analogs that partly restore the integrity of vG4s, enabling vG4-specific binding [102], [104], [93], [143]. For G4s with bulges or loops, natural compounds could be conjugated to complementary base analogs through flexible carbon linkers to enable complementary base pairing and hydrogen bonding. Beyond G4s with special topologies or sequence compositions, the unwound single DNA strands adjacent to G4s could be targeted by ligand-“oligonucleotide” conjugates, such as ligand-peptide nucleic acid (PNA) conjugates, to achieve individual G4 targeting [144], [145]. For instance, a conjugate of the G4 ligand naphthalene diimide (NDI) with a PNA that hybridizes to G4 flanking sequences was reported to bind specifically to the G4 within the long terminal repeat (LTR) region of human immunodeficiency virus type 1 (HIV-1), suggesting its potential for treating acquired immune deficiency syndrome (AIDS) [144]. Notably, cell membrane permeability must be considered when designing ligand-PNA conjugates. Platinum halide modification could also be used to achieve covalent binding to flanking or loop bases and to reduce off-target effects, yielding more potent cancer therapies [146], [147]. In addition, carbohydrate moieties could be attached to the ligand to increase selectivity toward G4s in cancer cells [87], [148], [149]. Because studies of ligand conjugates have rarely involved natural products, further experiments are needed to verify the feasibility of these strategies.
The enantioselectivity of natural and nature-derived compounds could also be exploited to achieve higher specificity toward targeted G4s. Many synthetic metal complexes have exhibited enantioselectivity toward specific G4s [150], [151], [152], [153], [154], [155]; however, how the chirality of natural and nature-derived compounds affects their G4-binding activity is less well understood. For ligand skeletons containing chiral carbons, the effects of chirality warrant further investigation. For example, the derivative (S)-telomestatin was reported to have much higher telomeric-G4 binding and telomerase-inhibitory activity than natural (R)-telomestatin [52]. Pegaharmine D, isolated as a racemic mixture, binds parallel G4s specifically and shows remarkable inhibitory activity against cancer cells, but the effects of its chirality on G4 binding and bioactivity have not yet been investigated [84]. Because many alkaloids are chiral and their enantiomers can differ in pharmacological activity, it would be of great significance to dissect the relationship between ligand chirality and G4 binding affinity and selectivity through detailed structural analysis. Altogether, the strategies described above could be combined to achieve more specific G4 targeting and greater therapeutic efficacy.
Another challenge facing G4-targeting drug discovery is the timely circulation and worldwide sharing of newly discovered natural alkaloids. Hundreds to thousands of novel alkaloids with diverse skeletons have been isolated from terrestrial and marine organisms, including plants, fungi, and bacteria [156], [157], [158], [159], [160], [161], [162], [163]. Many of them contain planar aromatic systems and positively charged centers. In particular, many marine alkaloids carry naturally occurring halogen substituents, which have been found to stabilize π-stacking through their electron-withdrawing effects [164], [165], [166]. Halogen groups could also help improve the lipid solubility and bioavailability of ligands [116]. Tajuddeen et al. [167] reported a versatile new class of axially chiral N,C-coupled naphthylisoquinoline alkaloids isolated from plants, which may have the potential to bind G4s. However, screening these newly discovered natural alkaloids for G4-binding activity is hampered by their limited natural sources, difficult synthesis, and lack of commercial availability. Another reason may be that G4s are relatively new targets for small molecules and are not as well known as targets such as duplex DNA and proteins. In fact, since its discovery in 2001, telomestatin has remained the only known G4-binding natural ligand isolated from microorganisms, and it has been extensively derivatized to improve its bioactivity [57], [59]. Therefore, numerous novel natural products remain to be tested and derivatized for their G4-binding potential, which would advance the field of natural or nature-derived drug discovery based on G4 interactions.
At present, an important open question is whether it is necessary to target a specific G4 for disease treatment. Cancer cells and other diseased cells have been found to contain more G4s and to show more pronounced genomic instability than normal cells. Many reported G4-binding compounds bind multiple G4s and exhibit prominent anticancer activity with little or no toxicity toward normal cells [20], [168], [169], [170]. For example, berberine, discussed earlier in this review, has been found to bind the G4s formed in telomeres and in the promoters of MYC, KRAS, and PDGFR-β, and to inhibit the proliferation of non-small cell lung cancer (NSCLC) cells. Thus, targeting multiple G4s is worth considering for the treatment of complex diseases. However, for diseases caused by one or a few canonical aberrant genes (e.g., EGFR and KRAS mutations and C9orf72 hexanucleotide repeat expansions) or those related to synthetic lethality [8], [27], [117], [171], targeting an individual G4 could be an effective and safer therapy and thus remains an important direction for drug design. The “oncogene addiction” hypothesis holds that some cancers depend on one or a few driver genes for growth and viability, and that these genes define appropriate cancer targets [172], [173]. Therefore, targeting aberrantly expressed genes through their promoter G4s could be a fruitful avenue for developing new cancer drugs, especially as an alternative strategy for “undruggable” and drug-resistant proteins such as MYC, KRAS, and EGFR. Collectively, more experimental validation at the cellular and animal levels is needed to clarify the advantages and disadvantages of individual-G4 and multiple-G4 targeting strategies.
7. Conclusions
Natural products have clear importance and advantages in drug design, as they constitute a vast and inspiring chemical library to be tested and derivatized. Numerous studies have revealed that G4s participate in various aberrant biological processes as epigenetic and regulatory elements in replication, transcription, and translation. G4 formation can inhibit DNA methylation and influence nucleosome assembly, making G4s distinctive epigenetic markers. In addition, G4 formation can cause mutagenesis, deletions, transpositions, rearrangements, and copy number alterations, which are important sources of genomic instability and disease [174], [175]. G4s in promoter regions can recruit transcription factors and may even alter chromatin architecture and regulate gene expression by promoting long-range interactions, including promoter-enhancer interactions and chromatin looping, mediated by chromatin remodeling proteins and long-loop G4 formation [176], [177], [178], [179], [180]. The formation and stabilization of R-loops are also closely related to G4 formation [176], [181]. The long-range interactions and R-loops mediated by G4s, together with regulatory proteins, may promote liquid-liquid phase separation (LLPS) to ensure efficient biological processes [26]; such processes are prominent in cancer cells, supporting their rampant proliferation and nutrient consumption. However, the specific mechanisms still require experimental confirmation. Furthermore, aberrant G4 formation in coding regions can act as a simple “roadblock,” inducing replication fork stalling and the termination of transcription and translation, thereby disrupting cellular homeostasis and causing pathological lesions. To maintain orderly cellular processes in normal cells, helicases and other nucleic acid-binding proteins are frequently needed to unwind G4s, a process that also involves DNA damage repair pathways [174], [182], [183], [184]. Once the balance between G4 formation and unwinding is disrupted and DNA repair errors occur owing to congenital or environmental factors, diseases such as cancer can arise. Therefore, in addition to targeting proteins, it is judicious to develop G4-targeting drugs that exploit the structural features distinguishing quadruplexes from double-stranded DNA (dsDNA). Natural G4-targeting drugs can compete with G4-binding proteins and impede gene expression, acting as a “brake,” and thus offer a promising strategy for disease treatment.
Structural analyses of G4-ligand complexes can reveal ligand binding mechanisms and guide novel drug design. The determined complex structures show that natural small molecules specifically recognize G4s through a combination of interactions, including π-stacking, H-bonds, electrostatic interactions, and steric effects. These specific binding modes under near-physiological conditions are best studied by NMR solution structure analysis, because crystallization often produces artificial ligand-G4 complexes as a result of crystal packing, as in the case of the 6:2 berberine-G4 crystal structure. Unlike macrocyclic molecules, which directly cover the entire G-tetrad for both π-stacking and electrostatic interactions, small crescent-shaped pharmacophores can recruit flanking residues to form a “quasi-triad plane” that stacks over the outer G-tetrad, facilitating extensive π-stacking and electrostatic interactions. Specific H-bonds formed within the “quasi-triad plane” promote higher binding affinity. Targeted structural modifications of natural products are greatly needed to improve their G4 selectivity and bioactivity.
The development of various small-molecule modification strategies to enhance G4-binding specificity and affinity has made it more feasible to target individual G4s, while targeting multiple G4s is emerging as an important strategy for treating complex diseases. Pidnarulex (CX-5461), a first-in-class G4-targeting compound, is under clinical evaluation for the treatment of patients with BRCA1/2-deficient breast and ovarian cancers [169]. Given the significant correlation between G4 formation and tumor progression, the development of G4-binding drugs will open a new window for disease treatment. More novel natural alkaloids, especially those from marine organisms and endophytic fungi, should be tested for their G4-binding ability and pharmacological activity; meanwhile, structural analysis of ligand-G4 complexes is imperative to guide drug derivatization. Since natural products have historically been an important source of therapeutic drug leads, we believe that more naturally occurring G4-targeting drugs with exceptional G4 selectivity and high therapeutic efficacy will be discovered in the future.
Acknowledgment
This research was supported by the National Institutes of Health (R01CA177585, U01CA240346, and R01CA153821) (DY), the Purdue Center for Cancer Research (P30CA023168), the National Natural Science Foundation of China (82173707 and 82322065), the Program for Jiangsu Province Innovative Research Scholar (JSSCRC2021512), and the “Double First-Class” University Project (CPUQNJC22_08).
Compliance with ethics guidelines
Kai-Bo Wang, Yingying Wang, Jonathan Dickerhoff, and Danzhou Yang declare that they have no conflict of interest or financial conflicts to disclose.