Introduction
RNA-binding proteins (RBPs) are critical in modulating RNA metabolism
and are linked with erroneous gene regulation in a wide range of disease
conditions [1]. The human genome codes for more than 3500 RBPs
[2]. Their emerging role is underscored by genome-wide studies
indicating that hundreds of these RBPs are significantly dysregulated in
a variety of cancer types [3], where some are even identified as
potential cancer drivers [4]. In addition, RBPs are implicated in
numerous somatic and Mendelian genetic diseases that affect multiple
organ systems in humans, including metabolic, neurodegenerative,
musculoskeletal and connective tissue diseases [2]. Altered
expression or function of RBPs translates into aberrant control of
target RNAs, and hence gene expression, ultimately driving pathological
phenotypes [5]. RBP-RNA interactions are driven by RNA-binding
domains (RBDs) and are often dysregulated in human cancers [5].
Importantly, many RBPs bind the same subset of target RNAs, potentially
acting in a synergistic or competitive manner [6]. However, the
molecular mechanisms by which RBPs direct their specificity, including
the selective use of their constituent RBDs to target specific RNA
types, remain elusive. RBPs are known to interact with RNA molecules
through two RNA-binding motifs (RNP1 and RNP2). RNPs within an RBD
provide some underlying principles about how an RBP recognizes specific
RNA species [7]. RNPs are evolutionarily conserved among many RBPs
and correspond to a β1-α1-β2-β3-α2-β4 structural arrangement [8]. The
two beta strands found in the middle of this arrangement are known to
interact with RNA either as the octameric RNP1 motif with the conserved
sequence (R/K)-G-(F/Y)-(G/A)-(F/Y)-V-X-(F/Y) [9] or as the hexameric
RNP2 motif with the conserved sequence
(L/I)-(F/Y)-(V/I)-X-(N/G)-L [9]. RNP-RNA interactions are
predominantly hydrophobic, and aromatic residues are especially
important in mediating the interaction through van der Waals forces, π-π
stacking interactions with nucleotide bases [10] and π-sugar ring
interactions [10]. Additionally, basic residues in these conserved
motifs form salt bridges with phosphate groups to enhance stability
[10]. Previous studies have established the role of aromatic and
basic residues in interactions of hnRNP A1 [11,12] and Lin28 [13]
with RNA molecules.
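As an illustration only, the two consensus sequences quoted above can be written as regular expressions and used to scan a protein sequence for candidate RNP1 and RNP2 motifs. The sketch below is not a method used in this study: the function name `find_rnp_motifs` and the toy sequence are hypothetical, X is treated as any residue, and matching is strict, so RNP motifs that deviate from the consensus would be missed.

```python
import re

# Consensus motifs as quoted in the text; X (any residue) becomes "." in the regex.
RNP1 = re.compile(r"[RK]G[FY][GA][FY]V.[FY]")  # octameric RNP1
RNP2 = re.compile(r"[LI][FY][VI].[NG]L")       # hexameric RNP2


def find_rnp_motifs(seq):
    """Return (motif name, 1-based start, matched residues) for every strict hit."""
    seq = seq.upper()
    hits = []
    for name, pattern in (("RNP1", RNP1), ("RNP2", RNP2)):
        # Wrapping the pattern in a lookahead also reports overlapping matches.
        for m in re.finditer(f"(?=({pattern.pattern}))", seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical toy sequence built to contain one copy of each motif;
# it is NOT a real protein sequence.
toy = "MSAAALFVGNLSPEEDAAAKGFGFVDFAAA"

for name, start, residues in find_rnp_motifs(toy):
    print(name, start, residues)
```

A scan of this kind only flags sequence-level candidates; whether a hit lies on an RNA-contacting β-strand still has to be checked against a structure or a structural model.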
Nucleolin (NCL), a multifunctional RBP, is overexpressed in many
cancers and disease conditions [14]. NCL is involved in a myriad of
cellular processes that are ultimately tied to its RNA/DNA-binding
functions, which regulate gene expression controlling cell survival,
growth and/or death. These roles include sensing stress [15], ribosome
biogenesis [16], chromatin remodeling [17], DNA replication,
transcription, messenger RNA (mRNA) turnover [18], induction
[19] and inhibition [20] of translation, and microRNA (miRNA)
biogenesis [21]. NCL protein is organized into distinct functional
domains: (a) the highly acidic N-terminal domain with basic stretches,
which contains the nuclear localization signal, is heavily phosphorylated
during the cell cycle by stage-specific kinases, and drives the histone
chaperone activity of NCL [17]; (b) the glycine and arginine-rich
(RGG/GAR) C-terminal domain, which plays a critical role in
protein-protein interactions, such as those with ribosomal proteins [22]
and the tumor suppressor p53 [23], and is also implicated in
non-specific interactions with RNA; and (c) the central region, which
comprises two to four distinct RNA-binding domains and is critical for
the interaction of NCL with different species of RNAs [24]. Most eukaryotic
species, including plants, contain only two RBDs in NCL protein, where
the individual RBDs are better conserved among the orthologs than
within the protein. Interestingly, NCL from Dictyostelium
discoideum uniquely possesses an odd number of RBDs (three),
suggesting a unique RNA binding profile in this organism (Singh Lab,
unpublished). In vertebrates, including humans, NCL has evolved to
contain four RBDs, of which RBDs 3 and 4 are unique to these
organisms [24]. It is also well established that RBDs 1 and 2 are
sufficient for certain NCL-RNA interactions, specifically binding to
mRNA [25,26] and rRNA molecules [16]. The newly emerged RBDs 3 and
4 suggest potential evolutionarily novel functions of NCL in these
higher organisms. However, in contrast to RBDs 1 and 2, RBDs 3 and 4
have remained overlooked and understudied.
NCL regulates gene expression by binding both coding (mRNA) and
non-coding RNA species (rRNA, miRNA, and lncRNA). It is well
established that NCL interacts with RNA preferentially through stem-loop
structures, including apical loops or hairpin loops [20,26], and through
AU-rich [18,27] and G-rich elements [28,29], both serving as signature
sequence or structural motifs for NCL-RNA affinity. In fact, a G-rich
stem-loop structure called nucleolin recognition element (NRE), found in
pre-ribosomal RNA, establishes a primary role of NCL in processing rRNA.
Similarly, NCL demonstrates high affinity for the 11 nt
single-stranded evolutionarily conserved motif (ECM) found 5 nt
downstream of the pre-rRNA processing site [30]. Additional RNA motifs
with which NCL is known to interact include AU-rich elements (AREs),
G-quadruplex structures in the tumor suppressor TP53 mRNA
[25] and a stem-loop-forming GCCCGG motif in GADD45α mRNA in the
DNA damage response [29]. NCL-mRNA interactions mediated by its RBDs
influence mRNA turnover rate or translation [25,26], while
NCL-lncRNA binding also has implications in RNA localization
[31,32]. Overall, it is clear that NCL-RNA interactions have a
profound influence on many cellular processes that control growth,
proliferation, and survival.
MicroRNAs (miRNAs), a class of short non-coding RNA molecules, are
often dysregulated in cancers, where aberrant miRNA processing
is linked to tumorigenesis [33]. Processing of primary-miRNA
(pri-miRNA) to precursor-miRNA (pre-miRNA) in animals is mediated via
the microprocessor complex (MPC) in the nucleus. The pre-miRNA is then
transported to the cytoplasm by shuttle proteins and subsequently
processed into its mature form by Dicer [34,35]. In plants, on the
other hand, both the pri- and pre-miRNA are processed solely in the
nucleus by Dicer-Like1 protein (DCL-1) and a few more helper proteins
[36,37]. A similar mechanism also exists in Dictyostelium
discoideum, a slime mold species, where the double-stranded RNA (dsRNA)
binding protein RbdB processes pre-miRNA molecules [38]. NCL is
known to interact with Drosha and DGCR8, the active components of the
microprocessor complex [21]. We therefore propose that the emergence
of NCL-RBD3-4 in higher organisms coincides with the evolution of the
NCL role in miRNA processing and that RBD3-4 of NCL possess
sequence/structural determinants that specifically recognize miRNA
precursor molecules.
The focus of this study is to elucidate the selective preference of
specific NCL RBDs for the recognition of miRNAs using an in
silico approach. Structural information for NCL RBDs and miRNA
molecules is either unavailable or limited to partial structures. To
fill these structural gaps, in this study we generated 3D models of the
human NCL central region containing all four RBDs, of various
tandem pairs of RBDs, and of selected miRNAs. Our data include
much-needed structural models of NCL-RBDs and miRNAs, and predicted
scenarios of NCL-miRNA interactions from RNA-protein docking algorithms.
Our study suggests a predominant role of NCL RBDs 3 and 4 in miRNA
target specificity and provides details about key motifs/residues at the
NCL-substrate interface responsible for specific NCL-miRNA interactions.
Structural modeling and in silico analysis tools provide valuable
information to fill in the knowledge gaps and offer a cost-effective
and rational entry point for experimental design. Ultimately, the
insights from this study can guide future studies to identify new
drug design targets to regulate NCL functions in gene expression during
tumorigenesis.