The seesaw between normal function and protein aggregation: How functional interactions may increase protein solubility

Protein aggregation has been studied for at least 3 decades, and many of the principles that regulate this event are relatively well understood. Here, however, we present a different perspective to explain why proteins aggregate: we argue that aggregation may occur as a side‐effect of the lack of one or more natural partners that, under physiologic conditions, would act as chaperones. This would explain why the same surfaces that have evolved for functional purposes are also those that favour aggregation. In the course of reviewing this field, we substantiate our hypothesis with three paradigmatic examples that argue for the generality of our proposal. An obvious corollary of this hypothesis is, of course, that targeting the physiological partners of a protein could be the most direct and specific approach to designing anti‐aggregation molecules. Our analysis may thus inform a different strategy for combating diseases of protein aggregation and misfolding.


FIGURE 1
Proteins are similar to toothed gear wheels. They must work in groups to carry out their functions; in isolation, their very meaning is lost. (A) A group of wheels that, by turning together, allow the movement of the whole mechanism. (B) Isolated (disconnected) toothed wheels. (C) The ternary complex of two components of iron-sulphur cluster biogenesis from bacteria (pdb id 4eb5). The complex is formed by the dimeric desulphurase IscS (in two blue tones) and the scaffold protein IscU (red and magenta). The complex is dynamic but stable.

An implication of the last statement is that aggregation could be viewed as caused by the lack of natural partners. We could thus say that physiologic interactors are the chaperones that keep proteins in solution.
Different evidence demonstrates a direct link between normal and aberrant functions. A systematic analysis of the protein-protein complexes in the Protein Data Bank (PDB), for instance, showed that protein interface regions are more prone to aggregate than other surfaces.
This indicated that many of the interactions that promote formation of functional complexes, including hydrophobic and electrostatic forces, can potentially also cause abnormal self-assembly. [8] Accordingly, we demonstrated that the same regions of a protein that have evolved to endow the molecule with its function are those that promote aggregation. [9] This implies that regions involved in protein aggregation may originate from an authentic physiological functional need that, under specific conditions, may backfire and result in pathology. [10] A direct relation between normal and aberrant function strongly suggests studying the physiologic function to gain a better understanding of aggregation. This is at variance with the common tendency to consider aggregation the key to understanding pathology, often completely neglecting the non-pathologic function(s) of the protein under study and the environment in which it operates. There seems to be, in other words, a barrier that separates studies aiming at elucidating the non-pathologic function of proteins from investigations of protein aggregation. Another crucial consequence of a partner-orphan model of aggregation is that normal partners could inform drug design.
The perspective of a partner-orphan mechanism as the origin of at least some misfolding diseases may drastically change our perception of protein aggregation: self-assembly would be a late side-effect of the disease and not, as has often been assumed, its cause.
In this review, we discuss three paradigmatic examples that argue for the generality of our working hypothesis. We also analyse the case of the Aβ peptide as a classic example that has been widely studied without questioning why the peptide is present in the brain. Our analysis may inform new strategies to combat protein aggregation and misfolding and demonstrate that the study of normal function holds specific value for understanding the pathological behaviour of proteins.

A STRIKING EXAMPLE OF A PARTNER-ORPHAN MECHANISM: THE ROLE OF UBIQUITIN BINDING IN ATAXIN-3 AGGREGATION
FIGURE 2
Relationship between physiologic function and aggregation. (A) Proteins may evolve to have exposed hydrophobic regions to mediate interactions with their partners, indicated in the figure with magenta and blue patches. (B) When the substrate is absent, the protein may be unstable and/or misfold leaving the exposed surfaces naked. (C) The regions functionally important can then promote aggregation.

The first example we encountered in which a direct relationship between physiologic function and aggregation was immediately clear is that of ataxin-3 (UniProt ID P54252). [11] This is a 40-43-kDa cysteine protease with specific ubiquitin hydrolase enzymatic functions. [12,13] Ataxin-3 takes part in different cellular pathways and seems to play an important role both in the ubiquitin proteasome machinery [14] and in gene expression regulation. [15] Ataxin-3 is ubiquitously expressed in different cell types of peripheral and neuronal tissues. [16][17][18][19][20] Although not essential, the protein is evolutionarily conserved. [21] Its structure consists of a globular N-terminal Josephin domain, followed by two ubiquitin-interacting motifs (UIMs), a polymorphic tract of polyglutamine (polyQ) repeat, and a more flexible C-terminus that, in some isoforms, contains a third UIM. Two nuclear export signals and one nuclear localisation signal are also present in the C-terminus. [22] We recently demonstrated that, although generally more flexible, the C-terminus of ataxin-3 contains well-defined secondary structure elements. [23] Ataxin-3 is linked to the progressive neurodegenerative Machado-Joseph disease (MJD; MIM catalogue no. 109150), also known as spinocerebellar ataxia type 3 (or SCA3). The disease occurs as the consequence of the pathological expansion of CAG repeats in a coding region of the ATXN3 gene, which results in a tract of polyQ repeats within ataxin-3.
The repeat length correlates with disease onset: normal individuals have 12-44 glutamine repeats, whereas MJD patients have expanded polyQ repeats above 53 up to approximately 87 glutamines. [24,25] A direct link between pathology and polyQ is testified by an inverse correlation between the age at onset and the number of CAG repeats in the ATXN3 gene [26,27] within a variability of the age at onset [28] that is probably influenced by a mixture of environmental [29] and genetic factors. [30] A hallmark of MJD is the formation of macromolecular aggregates containing the pathological expanded ataxin-3. [31] It is still unclear whether aggregation is initiated by full-length ataxin-3 or its calpain and caspase cleavage fragments, since both are able to aggregate in vitro and in vivo: [32][33][34] while polyQ expansion is deemed to be essential for disease onset, the isolated Josephin domain is strongly prone to aggregation and fibrillation in vitro, suggesting that the protein contains multiple aggregation motifs as is also the case for other disease-associated proteins, such as prions (for a comprehensive review see [35] ).
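The repeat ranges quoted above can be written down as a small lookup. The following Python sketch is purely illustrative: the function name `classify_polyq` and the labels are hypothetical, the handling of the boundary value 53 and of counts that fall between the quoted ranges is an assumption, and the result has no diagnostic value.

```python
# Ranges as quoted in the text; how boundary values are treated is an assumption.
NORMAL_RANGE = (12, 44)   # glutamine repeats reported for normal individuals
EXPANDED_ABOVE = 53       # MJD patients: above 53, up to approximately 87


def classify_polyq(repeats: int) -> str:
    """Map a polyQ repeat count onto the ranges quoted in the text."""
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    low, high = NORMAL_RANGE
    if low <= repeats <= high:
        return "normal"
    if repeats > EXPANDED_ABOVE:
        # The upper figure (about 87) is approximate, so no upper cutoff is applied.
        return "expanded"
    # Counts below 12 or from 45 to 53 are not covered by the quoted ranges.
    return "outside quoted ranges"
```

A call such as `classify_polyq(70)` falls in the expanded class, whereas a count of 50 is reported as outside the quoted ranges instead of being forced into either group.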
Ataxin-3 binds and cleaves preferentially polyubiquitin chains of at least four subunits. The catalytic triad is in the Josephin domain, which is thus the enzymatically active region. Several ubiquitin-binding sites are observed along the ataxin-3 length, two or possibly three being in the Josephin domain, [36] two or three more (depending on the isoform) corresponding to the UIMs in the protein C-terminus. [37] In the course of our analyses on ataxin-3 and Josephin aggregation, we observed that the surfaces of Josephin evolved to bind ubiquitin are the same that promote aggregation. [9] This was evidenced by mutation of even only one amino acid in the ubiquitin-binding surfaces appreciably reducing self-assembly. [9] Conversely, addition of mono-ubiquitin resulted in a strong reduction of aggregation, presumably through masking the hydrophobic surfaces that promote aggregation.
From these studies, we deduced that ataxin-3 has exposed hydrophobic surfaces evolved to recognise the substrate and other partners. When the substrate is absent the same surfaces could promote misfolding and aggregation (Figure 2).
These results thus argue for a direct relationship between normal function, physiologic partners and aberrant aggregation.
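The masking argument can be made semi-quantitative with a simple 1:1 binding isotherm. The Python sketch below is a toy model and not the analysis used in the cited work: it assumes a single binding site, a partner in large excess over the protein, and a hypothetical dissociation constant. The names `fraction_bound`, `fraction_exposed` and `kd_um` are illustrative.

```python
def fraction_bound(partner_um: float, kd_um: float) -> float:
    """Fraction of protein whose binding surface is occupied (1:1 isotherm)."""
    if partner_um < 0 or kd_um <= 0:
        raise ValueError("partner concentration must be >= 0 and Kd must be > 0")
    return partner_um / (kd_um + partner_um)


def fraction_exposed(partner_um: float, kd_um: float) -> float:
    """Fraction of protein whose binding surface is left unmasked."""
    return 1.0 - fraction_bound(partner_um, kd_um)


if __name__ == "__main__":
    kd = 10.0  # hypothetical Kd in micromolar, chosen only for illustration
    for partner in (0.0, 10.0, 100.0):
        exposed = fraction_exposed(partner, kd)
        print(f"partner {partner:6.1f} uM -> exposed fraction {exposed:.2f}")
```

In this picture, removing the partner leaves every binding surface exposed, while a partner concentration well above the dissociation constant masks most of them. The model says nothing about aggregation kinetics; it only illustrates how partner availability sets the population of unmasked surfaces.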

WHAT ABOUT RNA IN PROTEIN AGGREGATION?
Mislocalisation of the nuclear heterogeneous ribonucleoprotein transactive response DNA binding protein 43 (TDP-43) to the cytoplasm, and its subsequent aggregation into toxic inclusions, is the common hallmark found in over 97% of all cases of amyotrophic lateral sclerosis (ALS). [38] Increasing evidence supports the hypothesis that these aggregates have a direct toxic effect and are not just a consequence. [39] TDP-43 is a highly conserved and ubiquitously expressed protein, which was initially identified thanks to its ability to bind HIV-1 TAR DNA and act as a transcriptional repressor. [40] When non-pathologic, TDP-43 is a nuclear protein, [41] but, under pathologic conditions, the protein becomes predominant in the cytoplasm where it forms the inclusions. [42][43][44] Altered subcellular localisation has thus been suggested as an integral part of the disease pathogenesis.
TDP-43 is a modular protein that contains an N-terminus with a nuclear localisation signal, [45,46] two tandem highly conserved RNA-recognition motifs (RRM), which preferentially bind single-stranded UG or TG-rich nucleotide sequences, [46] and a disordered glycine-rich low complexity C-terminus. This region contains most of the pathologically significant mutations [47][48][49] and was for this reason mistakenly considered the only region important for TDP-43 aggregation. This view was supported by early studies that had shown that the C-terminus was sufficient for aggregation. [50] Taking on board the hypothesis of normal interactions as the basis of the design of specific anti-aggregants, we reasoned that we could perhaps use RNA as a way to prevent aggregation. We first demonstrated that the isolated tandem RRMs and a longer construct excluding only the C-terminus are able to aggregate and misfold in vitro under mildly denaturing conditions; these were generated by subjecting the proteins to a temperature scan or simply incubating them at physiologic temperature. [51,52] This study established that, even if not the main promoters of aggregation, other regions, such as the RNA binding motifs, are part of the aggregation process of the full-length protein. In agreement with our data, clinically important TDP-43 mutations have been found in the RNA binding motifs. [53] We then selected RNA aptamers that could mimic the properties of the natural RNA partners of the protein. These are small molecules that are increasingly being used in different applications as an alternative to antibodies. They have the advantage of being small, manageable, and easily obtainable. We used an aptamer (RNA12) already described in the literature as able to bind the RRM domains of TDP-43 with nanomolar affinities [54] and its negative control, taken as the reverse and complementary sequence of RNA12.
We found that stoichiometric ratios of RNA12 have a strong effect on aggregation as seen by the kinetics followed by thioflavin T fluorescence and quantification of the soluble protein. [52] Conversely the control had no effect or, if any, a negative one, an observation that indicates how relevant the role of a specific partner is in determining the solubility of a protein.
We also demonstrated that the effect is not purely electrostatic since heparin, the polyanion often used to mimic nucleic acids, did not have any effect. Scrambled sequences of RNA12 were found to have a lower binding constant and an effect ca. 20% lower than the original aptamer, indicating that the effect is not only compositional but also sequence dependent. The reverse and complementary scrambled sequences did not have any effect.
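Thioflavin T traces of this kind are often summarised with an empirical sigmoid, from which a half-time and a lag time can be read. The Python sketch below shows that common description. It is not necessarily the analysis used in the cited study, and the function names and parameters (`baseline`, `amplitude`, `k_app`, `t_half`) are chosen here for illustration.

```python
import math


def tht_signal(t: float, baseline: float, amplitude: float,
               k_app: float, t_half: float) -> float:
    """Empirical sigmoid for a ThT trace: baseline + amplitude / (1 + exp(-k_app * (t - t_half)))."""
    return baseline + amplitude / (1.0 + math.exp(-k_app * (t - t_half)))


def lag_time(k_app: float, t_half: float) -> float:
    """Lag time where the tangent at the midpoint meets the baseline: t_half - 2 / k_app."""
    if k_app <= 0:
        raise ValueError("apparent rate constant must be positive")
    return t_half - 2.0 / k_app
```

Within this description, an additive that protects the protein would be expected to show up as a lower plateau (`amplitude`), a longer lag time, or both, whereas an inert control would leave the fitted parameters essentially unchanged.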
Our findings suggest a model for the disease mechanism of TDP-43 associated pathologies (Figure 3). In the normal cell, TDP-43 shuttles in and out of the nucleus, bringing out its cargoes. Under pathological conditions, mutations may prevent RNA binding, leaving the protein "naked". This will thus allow misfolding and aggregation, causing the protein to accumulate in the cytoplasm. Our findings also highlight the dual role of RNA: when RNA-protein interactions are tight and specific, RNA may act as a chaperone that keeps TDP-43 in solution.
When binding is weak, mostly electrostatic, and unspecific, RNA functions as a polyanion and induces phase separation. A strong confirmation of our conclusions was recently published: RNA-binding deficient TDP-43, as induced by mutations and/or acetylation within the RRM domains, was shown to lead to separation of the protein into intranuclear liquid spherical shells with liquid cores, which were named anisosomes because of their similarities to liquid crystals. [55] Formation of these droplets when RNA binding is impaired proves the role of this interaction for the physical state of TDP-43.

THE SCENARIO GETS MORE COMPLEX: CALCIUM REGULATED INTERACTIONS KEEP ANNEXIN A11 IN SOLUTION
A more recent and compelling example of partner-orphan aggregation is that of annexin A11. This is a candidate gene that was only recently added to SOD1, TDP-43, FUS and many other genes as a putative cause of ALS. [56] As in all the other cases, annexin A11-related ALS is associated with, and thought to cause pathology by, protein aggregation and misfolding. [56] Several different mutations were identified in annexin A11 by screening a large cohort of familial ALS patients. [56,57] An implication of this protein in human disease is not new, as deregulation and mutations in the ANXA11 gene are known to be associated with autoimmune diseases, sarcoidosis and cancer. [58] Annexin A11 is a 56 kDa protein that belongs to the large family of Ca²⁺-binding annexins whose function is lipid binding. Annexins are subdivided into twelve different subfamilies, most of which are characterised by a highly conserved C-terminal annexin core formed by four helical repeats of an annexin motif, with the exception only of annexin A6, which contains eight annexin motifs. [59] These motifs are approximately 70 amino acids long and arranged in five α-helices, called A to E. [60] Ca²⁺ is coordinated by residues at the loops connecting the AB and DE helical hairpins. Helix C packs orthogonally against the other components of the bundle.
Lipid binding involves the core domain. Regulation is mediated in the whole family by a Ca²⁺-induced conformational rearrangement, which may be further regulated by the N-terminus, which has great variability among the various sub-families. The length of the N-terminus is also variable, typically being between 10 and 30 residues. In annexin A1, for instance, the N-terminus is 33 residues. In the Ca²⁺-free form this region of annexin A1 inserts into the core domain and forms a new helix that pushes aside helix B of the third annexin motif. [61][62][63]

FIGURE 4
Model of annexin A11 aggregation as proposed by Smith et al. [56] (A) Under physiologic conditions, annexin A11 forms calcium-mediated interactions with calcyclin. In the absence of calcium the low-complexity N-terminal domain would fold back and interact with the C-terminal annexin domain, in analogy with what happens with annexin A1. In the presence of calcium, the N-terminus would become available for interaction with calcyclin through a helix encompassing G38 and D40. This regulation would prevent protein aggregation and allow normal function. (B) When these residues are mutated, calcyclin interaction would be impaired and the protein would become susceptible to aggregation, almost certainly promoted by the low-complexity domain.

Ca²⁺ binding forces a remodelling of the domain that causes helix B to displace the N-terminal helix and come out from the core domain where it unfolds. Annexin A11 has a much longer N-terminus than the rest of the family members, comprising ∼200 residues. The sequence of this region is enriched in Gly, Tyr, and Pro, suggesting intrinsic disorder, [64] and seems to be important for ALS in that the most common ALS-associated mutations, p.D40G and p.G38R, map there. [56] The N-terminus mediates interactions with a number of other proteins among which are the apoptosis-linked gene-2 protein (ALG-2) and S100A6 (calcyclin). [65] On the basis of a weak sequence homology, it was suggested that the N-terminus of annexin A11 could contain a putative helical motif around residues 38-59 that could dictate, by analogy with annexin A1, a similar regulation. [56] The suggested model of regulation would thus be that also in annexin A11 the helix could interact with the core in a calcium-dependent fashion and, when released by Ca²⁺ binding, would then facilitate binding to calcyclin. Both the annexin A11 p.G38R and p.D40G variants have been proven experimentally to abolish binding to calcyclin.

This evidence makes it clear that the involvement of annexin A11 in ALS is consistent with the concept of a partner-orphan disease: mutations that abolish interaction with calcyclin would expose surfaces that would normally be protected. This in turn would induce aggregation and misfolding as seen in disease (Figure 4).

NATURAL INTERACTIONS AND BEYOND: WHAT ROLE DO QUINARY INTERACTIONS PLAY?
A question that arises spontaneously from these analyses concerns the non-specific interactions that occur in the crowded environment of the cell. Do they play a role in protein solubility? An aspect of physiological aggregation that has gained appreciable attention in the last 2 decades is the influence of the cellular environment on protein stability. It is now commonly accepted that the cellular milieu is crowded and confined, [66] two conditions that were predicted to result in the stabilisation of folded proteins owing to the reduction of available volume. [67] This interpretation was probably optimistic, possibly due to an overestimation of the volume difference between folded and unfolded species. [68] Recently, a review on the chemical organisation of biological systems has proposed a more complex but possibly more realistic model according to which the same macromolecule may be affected in different ways, depending on the cell cycle, the specific compartment where it resides, or the forces acting on the cell. [69] These authors stressed the concept that the factors shaping protein interaction networks can affect the free energy of marginally stable proteins and sparsely-populated states, directing the evolutionary pathways in an organism.
More generally, it has been pointed out that the presence of many macromolecular components in the cell media may give rise to stabilisation or destabilisation effects under the general concept of quinary structure (the fifth level of protein architecture) in addition to primary, secondary, tertiary and quaternary structures. The concept was coined to refer to macromolecular interactions that are transient in vivo. [70] In the words of the author: "Such interactions will not be evident from the composition of purified proteins, but they may constitute an important source of constraints on changes in primary structure." It is now believed that quinary interactions are shaped, in the course of evolution, by adaptation to the physiology of cells [65] and can lead either to stabilisation or to destabilisation of proteins, depending on relative interactions. [71] This means that, at least in some cases, these transient and unspecific interactions might exert a weaker and probably less durable protective effect against protein aggregation. A possible consequence of quinary interactions in the case of misfolding diseases might be a contribution to the long lag times observed in some diseases: the presence of quinary interactions might stabilise, at least temporarily, species with a strong intrinsic tendency to aggregate.
All these considerations add a further layer of complexity to the problem of which interactions are "chaperoning" and which ones detrimental, but, at the same time, they contribute to a deeper understanding of the field.

DISEASES IN SEARCH OF A PARTNER: THE EXAMPLE OF AΒ
The presented examples strongly suggest that a way to approach diseases whose therapy has remained stagnant for years might be to search for natural interactors. An example that cries out for a change in perspective is that of the Aβ peptides associated with Alzheimer disease (AD), a pathology that remains out of the reach of treatment. The mechanism of AD [72] is inextricably linked to the presence of the Aβ peptides that are the proteolytic products of the amyloid precursor protein through the action of secretases. The non-symptomatic phase of AD often lasts several years, a period during which the peptides form plaques containing amyloid fibrils of Aβ. [73] These species were originally regarded as the main toxic determinant causing AD according to what is now known as the "amyloid hypothesis." [72] It is interesting to note that, although the aggregates were first observed by Alois Alzheimer in 1906 and considered responsible for the disease, they were soon discarded as possible culprits by the same scientist. [74] In fact, although the amyloid hypothesis has dominated the AD literature for many years, it became more and more evident that the toxicity of the aggregates is probably just marginal. The main cause of cell damage came to be attributed to the Aβ oligomers of different stoichiometries formed during the aggregation pathway. [75] Unfortunately, the difficulty of isolating oligomers in quantities large enough for structural studies hampers a better understanding of this hypothesis, as elegantly described in a recent review. [76] It is thus essential to ask whether Aβ peptides have a nonpathological function that could enable us to find beneficial interactions to stabilise the peptides. Itzhaki and coworkers have for years advocated a role of Aβ in innate immunity. [77] In support of their hypothesis, it has been shown that Aβ peptides are highly conserved across vertebrates [78] and present in all healthy individuals.
[76] Their complex biosynthesis is inconsistent with the merely occasional formation of protein debris. We recently demonstrated that Aβ peptides are structurally and functionally similar to anti-microbial peptides. [79] Drawing inspiration from the already rich interactome of Aβ might thus provide a better answer for AD treatment. Among the several interacting proteins, a particularly interesting one could be transthyretin, whose levels have been shown to enhance Aβ solubility. [80,81]

More examples may reiterate the same message: tau, for instance, a protein whose function is to bind microtubules and maintain their stability in axons, aggregates when mutations prevent interaction with microtubules. [82] Other interesting examples, such as insulin [83] or macroglobulin, [84] are associated with diseases other than neurodegeneration. [...] suggests that providing the partner, either in full or mimicked by small molecules, may be a better strategy to prevent protein aggregation. This aspect has important consequences for future strategies in drug design. A shift of the paradigm also suggests the importance of a holistic view of proteins, seen not just as the cause of aberrant function but, more generally, as entities that need to be understood in full before starting therapeutic interventions.