This article was originally published online as an accepted preprint. The “Published Online” date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at email@example.com
Fibrous proteins are present within all kingdoms of life and widely divergent organisms, ranging from protozoa to humans. One of the most abundant proteins in mammals is the fibrous protein collagen. Although collagen fibers have some remarkable mechanical properties,1 several extracorporeal protein fibers exceed even these. Prominent examples are the byssus of mussels, which contains pre-collagens as major building blocks, and the silk fibers used in the cocoon of the mulberry silkworm Bombyx mori or in the orb webs of spiders. For details see recent reviews on human collagens,1 mussel pre-collagens,2, 3 silkworm silk,4, 5 and spider silk.6–9
Here, we highlight some similarities concerning assembly of the proteins into fibers. All four protein classes exhibit a highly repetitive core region with specific amino acid motifs repeated up to several hundred times. The repetitive core regions largely determine the macroscopic properties of the fibers, i.e., their mechanical behavior. However, the precise control of assembly initiation and of fiber elongation depends on non-repetitive terminal domains, which can be post-translationally modified and proteolytically processed, as is the case with mammalian collagens (Figure 1). In both silkworm and spider silk, as well as in the mussel pre-collagens, the terminal regions are likely maintained in the mature protein, possibly without proteolytic processing.3, 10 The role of the termini during storage and assembly has been thoroughly characterized in the case of mammalian collagen. In this article, we focus on recent results concerning the role of terminal domains in the storage and assembly of spider silk proteins.
SPIDER SILK PROTEINS
Female spiders from the order Araneae are able to produce up to seven different silk types, each tailored to a specific purpose (e.g., web frame, capture spiral, prey wrapping, gluing) and each with different mechanical properties (for more details see Refs. 11–14 and other chapters of this special issue). The underlying proteins, also called spidroins, are named after the glands in which they are produced, e.g., major ampullate spidroin (MaSp), flagelliform spidroin (FlagSp), or aciniform spidroin (AcSp). Despite the different mechanical properties of the fibers, which are based on different amino acid sequences and on slightly different processing conditions, they all share one common structural principle at the molecular level. All spider silk proteins comprise a large repetitive core sequence, accounting for approximately 90% of the amino acid residues, flanked by non-repetitive amino- (NRN) and carboxy-terminal (NRC) domains (Figure 1D). The large core domain consists of ensemble repeats, each consisting of 40–200 amino acids (Figure 2A), which may be repeated up to 100 times in some cases.15–18 Some ensembles (in MaSp, MiSp, and FlagSp) comprise distinct amino acid motifs, like β-sheet-crystal forming polyalanine stretches, but the type and arrangement of those small motifs differ significantly between silk types. Therefore, the sequence similarity in the ensemble repeats can be quite low between different silk types.12, 15
In contrast to the core domains, the amino- and carboxy-terminal domains are highly conserved for each silk type throughout different species (Figure 2B) and in some cases even between different silk types (Figure 2C).19–22 The sequence similarity may decrease between distantly related species, but interestingly, the predicted size of the termini seems to be consistent. While mature NRN-domains (without signal sequences for secretion) are composed of 120–130 amino acids, NRC-domains are always smaller, with 87–110 amino acid residues on average.22 In comparison, NRN-domains seem to have a higher homology and similarity than NRC-domains,22 indicating a more conserved function for the amino-terminal region.
THE TERMINAL DOMAINS OF DRAGLINE SILK PROTEINS
The existence of highly conserved terminal domains in different silk types and species is a first indication of an important function. For a long time, it was not clear whether the non-repetitive terminal domains are actually present in the mature protein fiber, but both immunological experiments using specific antibodies and mass spectrometry analysis revealed the presence of at least the carboxy-terminal domains in mature fibers of different silk types.10, 22–24 Since the two dragline silk spidroins, MaSp1 and MaSp2, have slightly different amino acid sequences, specific antibodies were raised against the individual NRC-domains. Using these antibodies, it was possible to visualize the non-homogeneous distribution of the two MaSp proteins within the fiber.25 In addition, several results based on sequence comparisons, together with structural data of different silk types, strongly indicate that the non-repetitive amino-terminal domain is also present in the mature fiber.20, 22, 26
A significant step toward understanding the role of the terminal domains in storage and assembly was the determination of atomic-resolution structures of the non-repetitive terminal regions of MaSp proteins. Interestingly, both terminal domains are mainly composed of α-helix bundles but adopt different, unique folds, neither of which has known structural homologues so far.26–28
The MaSp NRN domains from Euprosthenops australis and Latrodectus hesperus (black widow) consist of five α-helices and are monomeric at pH 6.8 and above. On acidification they form tight, antiparallel dimers (Figures 3A and 3B). The dimer interface is mainly hydrophobic and conserved throughout different species and silk types.26 The dimer reveals an interesting charge distribution with positive and negative charges clustering in opposite directions, forming a macromolecular dipole. Modeling experiments showed that this feature seems to be conserved in different NRN domains from different species and silk types. Another interesting feature of the dimeric form is a structurally conserved crevice located at the poles. The crystal structure of NRN of E. australis revealed that the linker region of one subunit connecting NRN with the repetitive core domain fits into this crevice, probably acting as a structural lock.26
The NRC domain of the MaSp2 analog ADF3 (Araneus diadematus fibroin 3) from A. diadematus (European garden spider) also consists of five α-helices (Figure 3C). In contrast to the amino-terminal domain, the NRC region is a permanent, parallel dimer, covalently linked by a disulfide bond. The cysteine residue in each monomer is highly conserved within MaSp NRC domains (Figure 2B). The α-helices are arranged in a way that one helix of each monomer is domain swapped into the other monomer, forming a clamp-like structural arrangement.27 Strikingly, the primary sequence of the NRC region is the most hydrophobic part of the known ADF3 sequence, with all hydrophobic residues being buried within the dimer in its folded state. The solvent exposed surface shows an even distribution of hydrophilic patches ensuring the solubility of the domain.27
MaSp TERMINI SWITCH THEIR STRUCTURE UPON EXTERNAL TRIGGERS
To fully understand the properties of spider silk fibers, it is necessary to investigate the relationship between the structure of the mature fiber and the primary sequence of the underlying proteins. It is also important to understand the assembly process in which the spidroins are converted from a soluble structure into a solid fiber, taking only milliseconds or less at ambient temperatures (Figure 4).29, 30 After secretion from the glandular cells, the major ampullate spidroins are stored in the ampulla at remarkably high concentrations of up to 50% (w/v) at approximately pH 7 and in the presence of sodium chloride.31 Under such storage conditions, micellar-like structures of the spidroins have been reported, likely representing a metastable state that prevents undesirable aggregation.32–34 The assembly process is initiated in the tapered S-shaped spinning duct, where sodium and chloride ions are exchanged for the more kosmotropic potassium and phosphate ions, accompanied by a reduction of the pH value to about 6.2. As shear forces increase along the duct, β-sheet crystals form and align along the fiber axis. Recent results, supported by the atomic resolution structures of MaSp-termini (see above), indicate that both terminal domains act as environmentally triggered structural switches significantly influencing the solubility as well as the initiation of assembly.26–28, 35
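As a rough worked illustration of the acidification step described above, and assuming the approximate values quoted in the text (about pH 7 in the ampulla and about pH 6.2 in the spinning duct), the corresponding change in proton concentration follows directly from the definition of pH. The subscripts "ampulla" and "duct" are labels introduced here for illustration only:

```latex
\mathrm{pH} = -\log_{10}[\mathrm{H}^{+}]
\quad\Longrightarrow\quad
\frac{[\mathrm{H}^{+}]_{\mathrm{duct}}}{[\mathrm{H}^{+}]_{\mathrm{ampulla}}}
= 10^{\,\mathrm{pH}_{\mathrm{ampulla}} - \mathrm{pH}_{\mathrm{duct}}}
= 10^{\,7.0 - 6.2}
= 10^{0.8} \approx 6.3
```

Under these assumptions, a seemingly small shift of 0.8 pH units corresponds to a roughly sixfold increase in proton concentration. The exact factor depends on the actual pH values, which are given only approximately.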
Both MaSp1 amino-terminal regions, one from Euprosthenops australis and one from L. hesperus (black widow), are monomeric under storage conditions (pH 7 in the presence of sodium chloride). In the complete absence of salt, the NRN domain has a slight tendency to dimerize, which is fully suppressed in the presence of 300 mM sodium chloride.28, 36 The presence of the NRN domain in engineered, recombinant spider silk proteins slows aggregation by more than one order of magnitude, indicating a stabilizing role during storage.26
Acidification to a pH below 6.4, which naturally occurs during the spinning process, induces the formation of a tight antiparallel NRN dimer, probably acting as an intermolecular crosslinker. This dimerization is completely reversible and strongly depends on two charged residues. One aspartic acid is highly conserved in all sequenced NRN-domains from different species and spidroins.22 Mutating this aspartic acid residue into its non-charged counterpart, asparagine, prevents dimerization as observed by mass spectrometry.35 NRN further assembles into higher oligomeric structures as seen in light-scattering experiments, supporting the view that NRN is critically involved in storage and assembly.26
In contrast to NRN, the NRC domain is a covalently linked dimer with one intermolecular disulfide bond. In further contrast to the antiparallel dimer of the amino-terminal domain, the NRC domain forms a parallel dimer, acting as a “clamp” to hold two spidroins together. Recombinant spidroins bearing the NRC domain are able to form supramolecular assemblies resembling micellar-like structures, depending on the spidroin concentration and the ionic strength of the solution. The presence of NRC seems to stabilize spidroins against phosphate-induced aggregation.37 Therefore, together with the NRN domain, NRC also plays an important role in storage.38, 39
Four charged residues per monomer in the NRC domain of ADF3 are the only charged residues in the entire known ADF3 sequence.18 These four charged residues form two intramolecular salt bridges located at the junction of the terminal domain and the repetitive core domain, and they play an important role in stabilizing the dimeric NRC structure. In the presence of salt, the structure of the dimer is stabilized. With decreasing ionic strength, the structure surrounding the salt bridges becomes more flexible, exposing hydrophobic patches which can enhance the assembly process in the spinning duct.
In vitro, it could be shown that the presence of NRC domains significantly influences fiber assembly, yielding even macroscopic fibers which could not be detected in the absence of such a domain.40–42 This behavior is even more pronounced in the presence of shear stress, where recombinant spidroins with NRC assemble into millimeter-long fibers, while in the absence of the NRC domain only amorphous aggregation could be detected.27, 39, 43 The assembled fibers seem to consist of bundles of smaller fibrils ordered along the fiber axis. Intriguingly, β-sheet crystals formed by the polyalanine stretches are aligned along the axis of these fibers, similar to their arrangement in the natural silk fiber.44, 45 This finding indicates that, in addition to its involvement in storage of the spidroins, the NRC domain also influences the pre-orientation of the structures of the repetitive core domain.
Combining structural information of both terminal domains with data from fibril assembly experiments, it is evident that both terminal domains contribute to the solubility of the spidroins under storage conditions, either by preventing unspecific aggregation or by facilitating the formation of micellar-like structures in the dope. Upon transfer of the dope to the spinning duct, the environmental conditions change (including acidification, ion exchange, and shear stress), altering the fold of the terminal domains. Under these altered conditions, both domains show a structural “switch.” In the case of the amino-terminal domain, this results in a tight antiparallel dimer which acts as an intermolecular crosslinker facilitating the assembly process. The structural switch in NRC-domains, on the other hand, leads to exposure of hydrophobic patches which are able to enhance the assembly process and to direct spidroin alignment.
Small terminal domains are highly important for the assembly of structural proteins such as collagens or silks. Recent structural insights into termini of major ampullate spidroins indicate their involvement in both storage and assembly. One remaining question concerns the contribution of the termini to the mechanical properties of the mature fiber.
Building on the first solved protein structures, it would be of interest to characterize terminal domains of other silk types and of silks from other spider species. Similarities or differences in their structure and function would deepen the understanding of the silk assembly process, a prerequisite for mimicking silk fiber assembly in vitro.
The authors thank Eileen Lintz for proofreading the article.