Peptides in proteins

The protein universe as we know it is composed of folded structures and intrinsically disordered regions. The latter may adopt structures upon interaction with binding partners. In addition, some proteins contain C‐terminal extensions which act as independent functional units in the context of the protein. Since their activity does not depend on the protein context, they can be considered peptides in proteins. To illustrate this principle, we here discuss the C‐terminal extensions of IgM antibodies, which dictate their assembly, and the molecular chaperones Hsp90, Hsp70, and Hsp104, which use C‐terminal peptide extensions as docking sites for interaction with different co‐chaperones.


| INTRODUCTION
Proteins are known as chains of amino acids which fold into well-defined three-dimensional structures. In recent years, intrinsically disordered proteins were discovered which are devoid of structure in isolation. Additionally, an increasing number of proteins have been identified which contain both folded and intrinsically disordered regions (IDRs). About one-third of the human proteome may contain IDRs. 1,2 Importantly, these IDRs can adopt three-dimensional structures upon interaction with ligands such as DNA or other proteins. This allows specific interactions with different ligands and thus expands the repertoire of complexes, as seen, for example, for transcription factors. 3 The range of structural elements present in proteins is further expanded by N-terminal signal sequences of 30 to 50 residues, which are needed for targeting to mitochondria and the endoplasmic reticulum (ER) and are cleaved after translocation across the membrane. 4 In addition, short (three to four residues) C-terminal targeting sequences exist. 5 Here, we highlight the existence of C-terminal elements in proteins which act similarly to independent peptides attached to a protein.
Antibodies are structures consisting of several domains that each adopt the Ig fold. 6 The variable N-terminal domains of the heavy chain (HC) and the light chain (LC) (VH and VL, respectively) bind to antigens with high affinity and specificity. 7 The C-terminal part (Fc) interacts with receptors and is responsible for effector functions that drive the immune response, such as activating the complement system. 8 Two classes of antibodies, IgM and IgA, contain a short peptide extension of ~20 residues located at the C-terminus of their HCs. This so-called tailpiece (tp) dictates the polymerization of the entire antibody into complex macromolecular structures such as IgA dimers or IgM hexamers. Therefore, this process is of major importance for the immune response of vertebrates.
Since most progress has been made in the investigation of the IgM-tp (μtp), we focus here on the role of the μtp in IgM hexamer formation.
Upon initial exposure to an antigen, IgM is the first antibody produced in the primary immune response. 9-11 To ensure rapid activity, IgMs are secreted before B cells undergo somatic hypermutation, which increases antigen binding affinity and specificity. 12 Consequently, IgMs generally show lower antigen affinity than other immunoglobulin classes. However, this is compensated by oligomerization (Figures 1A and 2), leading to multiple antigen binding sites in one antibody complex and an overall high avidity. 13 IgMs are secreted as pentamers and hexamers. In pentameric IgM, one IgM subunit is replaced by a small cysteine-rich protein called the J chain (~15 kDa). 14-16 Its function and secondary structure remain unknown. 17 While most other Igs comprise three constant heavy chain domains, the HC of IgM is C-terminally extended by the Cμ4 domain (Figure 1A,B). 18,19 In contrast to IgM, IgA is highly antigen-specific and accomplishes rather local immune functions. 20 Therefore, the majority of IgA molecules serve as a first-line defense against foreign pathogens. 21 IgA exists in two distinct, tissue-dependent isoforms, IgA1 and IgA2. 21 Similar to IgM, secretory IgA comprises a tp which is required to form defined dimers in a complex with the J chain. 16
FIGURE 1 Tailpiece peptide-mediated assembly of IgM. (A) Scheme of hexameric IgM. HC and LC domains are illustrated in blue and gray, respectively. S-S indicates a disulfide bond. (B) Crystal structure of the monomeric Cμ4 domain (PDB: 4JVW, green) with the modeled domain boundary (orange ribbons) and μtp extension (red ribbons) as indicated by SAXS measurements, with the corresponding amino acid sequence of the murine μtp. (C) SAXS-, NMR-, and X-ray-derived modeled structure of the hexameric IgM Fc fragment including the Cμ2 (orange), Cμ3 (blue), and Cμ4 (green) domains as well as the μtp (red ribbons)
The IgM hexamer and pentamer are highly symmetric assemblies (Figure 1A,C). 22 Electron microscopy (EM) and small angle X-ray scattering (SAXS) revealed a mushroom-shaped quaternary structure of the polymer with the Cμ4 domains and the μtps located in the stem. 23,24 As early as 1986, it was found that the μtp is required for IgM oligomerization. 25 More recently, progress was made on the mechanistic role of the μtp and the contribution of its residues (Müller et al, 2013; Pasalic et al, 2017).
The μtp and the IgA-tp (αtp) share remarkable sequence homology (Figure 3A). Of highest importance, both contain the important [...]
An important aspect in this context is the definition of the boundaries of the protein domains to which peptides are attached. They are usually determined on a structural (e.g., middle domain of Hsp90), functional (e.g., ATP-binding domain of Hsp90), or genetic (e.g., constant Ig domains) basis. Accordingly, peptide extensions of proteins are often assigned as part of a protein domain. 26,27 Given this heterogeneity of domain and peptide boundary definitions, the exact length of the tp is also poorly defined. On the functional level, it was shown that the very C-terminal 18 residues of the IgM HC (Pro-Tyr) are necessary for μtp function. 28 However, the presence of Gly and Lys in both IgM and IgA suggests that they are part of the conserved tp sequence (Figure 3A). Recently, we proposed a protein stability-based approach to assign domain boundaries of Ig domains. 29 The conformational stability of an IgG domain was strongly increased by simply shifting the C-terminal domain boundary. The reason for this strong stability enhancement is the interaction of the C-terminal Lys with an adjacent α-helical loop, ensuring the formation and stabilization of this α-helix and a very C-terminal β-strand. Accordingly, Lys in particular is expected to have a beneficial effect on the stability of the Cμ4 domain and should therefore be considered part of the Cμ4 domain. The sequence of the μtp should thus be defined from Pro559 to Tyr576 of the μHC (Figure 1B).
Interestingly, the secondary structure of the μtp is still unknown (Figure 1B). Since it could not be resolved along with the Cμ4 domain by X-ray crystallography, the μtp is potentially flexible or intrinsically unfolded and therefore likely to adopt various transient conformations. 24 Secondary structure prediction suggested two β-strand segments, a long one (Thr560-Thr571) and a very short one (Cys-Tyr), linked by coiled residues. 28 Of note, within the μtp, the "NVS" motif containing Asn563 represents an active N-glycosylation site.
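The "NVS" motif fits the general N-glycosylation sequon Asn-X-Ser/Thr, in which X can be any residue except Pro. As a minimal sketch of how such a sequon can be located, the Python snippet below scans a tailpiece string and reports the residue number of each matching Asn. The sequence string is assembled only from residues named in this review (Pro559, Thr560, Tyr562, the NVS motif, Leu566, Ile567, Met568, Thr571, Cys575, Tyr576); positions that are not stated here are written as the placeholder "X" rather than guessed, and the names `MU_TP` and `find_sequons` are purely illustrative. A sequon match only indicates a potential site; it does not show that the site is actually glycosylated.

```python
import re

# Tailpiece residues 559-576 as far as they are named in the text.
# "X" marks positions the text does not specify (placeholder, not a real residue).
MU_TP = "PTXYNVSLIMXXTXXXCY"
FIRST_RESIDUE = 559  # residue number of the first character (Pro559)

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence, first_residue=1):
    """Return (residue_number_of_Asn, motif) for every N-X-S/T sequon."""
    return [
        (match.start() + first_residue, match.group(1))
        for match in SEQUON.finditer(sequence)
    ]


if __name__ == "__main__":
    print(find_sequons(MU_TP, FIRST_RESIDUE))  # [(563, 'NVS')]
```

With the numbering offset of 559, the single hit is reported at Asn563, in agreement with the site described above.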
The μtp glycan supports the assembly of IgM pentamers with the J chain. 30,31 Remarkably, the isolated μtp peptide is able to form dimers upon disulfide bond formation, which is accompanied by a shift in its secondary structure toward a larger β-strand content. 28 The role of each μtp residue in IgM assembly was determined by alanine-scanning mutagenesis 28 employing Cμ4-tp. Cμ4-tp is the smallest IgM construct capable of forming regular higher-order oligomers in vitro, i.e., hexamers of Cμ4-tp dimers, 24 corresponding to hexamers of full-length IgM. This screen emphasized that Cys575, the penultimate residue in the μtp, is essential for IgM polymerization.
However, Cys575 alone was not sufficient for oligomerization in some tp variants, indicating the importance of other μtp residues. 25,32 The effects of mutations in μtp residues were highly similar for Cμ4-tp in vitro and full-length IgM in cells, suggesting that the μtp is indeed the decisive factor in IgM oligomerization and that the Cμ4-tp domain is an excellent minimal model to study IgM association. Several positions were identified for which the mutation to Ala did not interfere with oligomer formation. However, when Tyr562, Val564, Leu566, Ile567, Met568, and Cys575 were mutated to Ala, no oligomerization was observed, and either no IgM was secreted or only monomers were secreted from cells. Thus, in particular, the hydrophobic region from Val564 to Met568 and Cys575 are of major importance for IgM oligomerization. This hydrophobic stretch is located within the predicted β-strand, suggesting that this secondary structure element and/or its hydrophobicity may be important for oligomerization.
IgM oligomerization is a multistep mechanism governed by conformational transitions of the μtp (Figure 2): the initial step required for polymerization is the formation of a disulfide bond between two adjacent Cys575 residues, connecting two IgM subunits to a dimer. Upon this event, the μtp undergoes a conformational shift leading to the exposure of largely hydrophobic amino acids which had formerly been buried. It seems that hydrophobic interactions lead to oligomer formation via transient intermediate states, finally giving rise to Cμ4-tp pentamers and hexamers.
[...] rouxii, and S. cerevisiae. Acidic amino acids are depicted in blue, basic amino acids in magenta, small and hydrophobic amino acids in red, and hydroxyl-, sulfhydryl-, amine, and glutamine in green
Interestingly, the C-terminal μtp is also sufficient to drive oligomerization when fused to other Ig classes such as IgG. 33
Besides polymerization, the μtp serves an important protein quality control (QC) function in the early secretory pathway of the ER. 35 Given that about one-third of the entire proteome matures in the ER, QC is of major importance to ensure that properly folded, functional proteins are secreted and to prevent overloading of the protein biosynthesis and secretion machinery. 36,37 To meet these requirements, a number of chaperones and enzymes manage secretory protein maturation and QC in the ER. The redox-specific enzyme ERp44 can specifically detect free, reduced cysteines of proteins by forming mixed disulfide bonds with its clients. 38 Accordingly, the unassembled μHC is not secreted by the cell but retained by ERp44 in the ER lumen, to be either correctly assembled into IgM oligomers or eventually degraded if assembly does not occur. 39 Interestingly, this mechanism involves another C-terminal peptide motif of the "captured" proteins: ERp44 and other ER-resident proteins such as BiP (the ER-resident Hsp70 homolog) have a C-terminal RDEL or KDEL motif, respectively, that mediates interaction with the KDEL receptor, preventing secretion of these ER-resident proteins. 40
Taken together, the C-terminal tp extensions of IgM and IgA are essential for Ig oligomerization, for ER QC, and for the interaction with other proteins such as the J chain. Thus, they are at the heart of the biological function of these antibodies and of an efficient immune response.

| PEPTIDE-GUIDED ASSEMBLY WITHIN THE MOLECULAR CHAPERONE MACHINERY
Further striking examples of peptides in proteins are the C-terminal peptide extensions that emerged in several molecular chaperones upon the transition from prokaryotes to eukaryotes. These peptide tails are used as anchors for binding to partner proteins. This concept is most extensively used in the Hsp90 machinery. Hsp90 is important for the folding, stability, and activity of several hundred different client proteins in the cytosol of eukaryotic cells. 41-43 As the interactome of Hsp90 includes kinases, transcription factors, ribosomal proteins, and proteins of the ubiquitination system, the regulatory potential of the chaperone ranges from signal transduction, transcription, and translation to protein synthesis and degradation. 44-47 Hsp90 from prokaryotes and eukaryotes exhibits a conserved three-domain structure (Figure 4) [...] Table 1). Their interaction is mediated by a spe[...]
Crystal structures of yeast Hsp82, yeast Hsp104, and human Hsp70. The closed conformation of the yeast Hsp90 dimer is shown on the left (blue) with a schematic representation of the tail domains as blue spheres with the C-terminal amino acids MEEVD (PDB: 2CG9, derived from the Hsp82-Sba1 complex). In the middle, a monomer Hsp104 NBD is depicted in red (PDB: 5VY9). The C-terminal tail region is depicted as red spheres with the C-terminal amino acids DDLD. The transition of the C-domain to the tail piece is depicted as sticks with dark blue colored nitrogen and red oxygen atoms. On the right, the Hsp70 NBD is depicted in dark green (PDB: 5BN8), and the SBD is depicted in light green (PDB: 4PO2). The C-terminal tail region is depicted as light green spheres with the C-terminal amino acids EEVD. The transition of the C-domain to the tail piece is depicted as sticks with dark blue colored nitrogen and red oxygen atoms. The tail piece of the second Hsp82 monomer is depicted as cartoon. Both structures show the last amino acids breaking the helix conformation
Tetratricopeptide repeat (TPR) co-chaperones are also used to recruit further effector proteins to Hsp90. In this context, Tah1/RPAP3 mediates Pih1 binding and stabilization of the Hsp90-bound complex. 71 Tah1/RPAP3 is also involved in rRNA processing 72 and mTOR stability. 73 In contrast to these co-chaperones, which feature a single TPR domain, Hop/Sti1 contains three TPR domains with different specificities for Hsp90 and for the molecular chaperone Hsp70. Upon the transition from prokaryotes to eukaryotes, Hsp70 also acquired a C-terminal extension of ~30 amino acids containing an EEVD motif that binds to TPR domains. In general, the Hsp90 system is functionally strongly con[...]
Strikingly, the concept of adding a peptide extension to a molecular chaperone upon the transition from prokaryotes to eukaryotes is also seen in yeast Hsp104. This hexameric AAA+ ATPase (Figure 4) is able to unfold proteins and dissolve protein aggregates. 80-82 The acidic C-terminus (IDDDLD) of Hsp104 (Figure 3D) interacts with the TPR domains of the co-chaperones Sti1 (TPR1), Cpr7, and Cns1. 83 Hsp104 is of major importance under severe heat stress, where survival increases up to 1000-fold compared with strains lacking Hsp104. In addition to its disaggregase activity, Hsp104 can prevent the aggregation of proteins and keep them soluble. 84 Interestingly, metazoa lack an Hsp104 orthologue. Different components in metazoa seem to compensate for this loss. 85,86
Taken together, the C-terminal peptide extensions of Hsp90, Hsp70, and Hsp104 bind to the TPR domains of a number of different proteins, resulting in a competition for co-chaperone binding which is influenced by the levels of the respective proteins.
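The recognition principle summarized here, in which a short C-terminal motif marks a chaperone as a binding partner for TPR domains, can be illustrated as a simple suffix check. The following Python sketch uses only the motifs named in this review (MEEVD, EEVD, and DDLD); the example sequences are made-up placeholders rather than real protein sequences, and the names `TPR_BINDING_MOTIFS` and `classify_c_terminus` are hypothetical. A suffix match says nothing about binding affinity or about the competition between co-chaperones.

```python
# C-terminal motifs named in the text, mapped to an illustrative label.
TPR_BINDING_MOTIFS = {
    "MEEVD": "Hsp90-type tail",
    "EEVD": "Hsp70-type tail",
    "DDLD": "Hsp104-type tail",
}


def classify_c_terminus(sequence):
    """Return (motif, label) if the sequence ends with a listed motif, else None."""
    # Check longer motifs first so that MEEVD is not reported as EEVD.
    for motif in sorted(TPR_BINDING_MOTIFS, key=len, reverse=True):
        if sequence.upper().endswith(motif):
            return motif, TPR_BINDING_MOTIFS[motif]
    return None


if __name__ == "__main__":
    # Placeholder sequences for illustration only; just the last residues matter.
    examples = ["GGSGGSMEEVD", "GGSGGSGPTIEEVD", "GGSIDDDLD", "GGSGGSKDEL"]
    for seq in examples:
        print(seq, "->", classify_c_terminus(seq))
```

The last placeholder ends in KDEL, which is a retention motif rather than one of the listed TPR-binding tails, so the function returns `None` for it.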

| CONCLUSIONS
The examples discussed in this review establish that nature evolved protein tags as independent functional units that can be added to the C-terminal ends of proteins to foster protein interactions. Interestingly, in all cases discussed, versions of the respective proteins exist in which the peptide tags are lacking. The repertoire ranges from short extensions of a few amino acids to more than 30 residues, thus covering the range typical for functional peptides. Different functional principles become obvious: in the case of antibodies, the homotypic interaction of two tail peptides seems to convey the steric information for the formation of specific oligomers. This requires not only the covalent linkage of the tp cysteines but also specific structural information encoded in the tp's hydrophobic region. Thus, the μtp is a rather complex functional unit. In contrast, the current knowledge on the Hsp90/70-tp suggests that the mode of action of this peptide extension is less complicated. Its function seems to correspond to that of a fishing rod with the (M)EEVD motif at its end as the bait for TPR-domain-containing proteins. Here, complexity is added by the range of TPR proteins competing for the interaction and by the expansion of function that is achieved through the additional domains present in the TPR proteins.
A comprehensive view of peptides in proteins is still lacking. Given that the two examples described here originate from completely different areas, it can be envisioned that many more (and different) peptide tags exist in different functional contexts. It will be interesting to see how this basic theme is orchestrated to support diverse biological processes.
[...] of the monomers are illustrated in light and dark blue with the C-terminal extension and the MEEVD motif (terminal blue square). The transfer of the substrate (yellow circle, S) from the Hsp70/Hsp40 chaperones (Hsp70 nucleotide binding domain (NBD), dark green; Hsp70 substrate binding domain (SBD), light green; Hsp40, red) is shown with possible co-chaperone interactions in the Hsp90 states. The substrate leaves Hsp90 after ATP hydrolysis. TPR domains of co-chaperones are depicted as U-shaped boxes. Hop (purple) enters the cycle at the ATP-unbound open state, binding to Hsp70 and Hsp90 with its three TPR domains, and leaves the cycle upon closing of Hsp90. The PPIases Fkbp51 and Fkbp52 (ochre) enter the cycle upon Hsp90 ATP binding. The co-chaperone p23 (brown) stabilizes the Hsp90 closed state and dissociates from Hsp90 after ATP hydrolysis on the right. Schematic representations of the co-chaperones Cyp40 (brown), PP5 (dark gray), RPAP3 (orange), and CHIP (white) are shown on the right