The shell-forming proteome of Lottia gigantea reveals both deep conservations and lineage-specific novelties



B. Marie, UMR 6282 (ex 5561) CNRS Biogéosciences, Université de Bourgogne, Dijon, France

Fax: + 33 (0)3 80 39 63 87

Tel: + 33 (0)3 80 39 63 72


F. Marin, UMR 6282 (ex 5561) CNRS Biogéosciences, Université de Bourgogne, Dijon, France

Fax: + 33 (0)3 80 39 63 87

Tel: + 33 (0)3 80 39 63 72



Proteins that are occluded within the molluscan shell, the so-called shell matrix proteins (SMPs), are an assemblage of biomolecules attractive to study for several reasons. They increase the fracture resistance of the shell by several orders of magnitude, determine the polymorph of CaCO3 deposited, and regulate crystal nucleation, growth initiation and termination. In addition, they are thought to control the shell microstructures. Understanding how these proteins have evolved is also likely to provide deep insight into events that supported the diversification and expansion of metazoan life during the Cambrian radiation 543 million years ago. Here, we present an analysis of SMPs isolated from the CaCO3 shell of the limpet Lottia gigantea, a gastropod that constructs an aragonitic cross-lamellar shell. We identified 39 SMPs by combining proteomic analysis with genomic and transcriptomic database interrogations. Among these proteins are various low-complexity domain-containing proteins, enzymes such as peroxidases, carbonic anhydrases and chitinases, acidic calcium-binding proteins and protease inhibitors. This list is likely to contain the most abundant SMPs of the shell matrix. It reveals the presence of both highly conserved and lineage-specific biomineralizing proteins. This mosaic evolutionary pattern suggests that there may be an ancestral molluscan SMP set upon which different conchiferan lineages have elaborated to produce the diversity of shell microstructures we observe nowadays.


Novel protein sequences reported in this article have been deposited in the Swiss-Prot database under accession nos. B3A0P1–B3A0S4


AIM, acid-insoluble matrix

ASM, acid-soluble matrix

BMSP, blue mussel shell protein

CA, carbonic anhydrase

EF-hand, calcium-binding motif (E-helix-loop-F-helix) in a “hand” configuration

EGF, epidermal growth factor

EST, expressed sequence tag

IGF–BP, insulin-growth factor-binding protein

kbp, kilo-base pair

LamGL, laminin G-like

LUSP, Lottia uncharacterized shell protein

RLCD, repeated low-complexity domain

SCP, secreted cysteine-rich protein

SMP, shell matrix protein

WAP, whey-acidic protein

ZP, zona pellucida


Over the last ∼543 million years, molluscs have evolved a wide variety of mineralized shell structures to serve a range of biological functions. The evolutionary success of this morphological innovation is reflected in their presence in almost every ecological niche on the planet. The broad morphological diversity of the 100 000+ species of shell-bearing molluscs [1] extends to a tremendous diversity of mineralogical textures found within the shell, including ‘prismatic’, ‘nacreous’, ‘foliated’, ‘cross-lamellar’, ‘granular’ and ‘homogeneous’ structures [2-5]. Despite this morphological and mineralogical diversity, all molluscan shells are synthesized by a deeply conserved mechanism; they are the result of the secretory activity of an evolutionarily homologous tissue known as the mantle which extrudes inorganic ions and/or amorphous mineral precursors, together with an extracellular organic matrix. All these ingredients self-assemble very precisely in an acellular medium at the interface between the mantle epithelium and the mineralization front. The organic matrix is incorporated into, and surrounds nascent CaCO3 crystals during the shell layer deposition.

Although the organic matrix represents only a fraction of the total shell weight (usually between 0.1 and 5% w/w), it is known to be essential for both controlling shell formation [6], and for imparting many of the remarkable physical properties (such as fracture resistance) on the mature biomineral. The biochemical characteristics of the organic matrix, usually purified and studied following decalcification of the shell, indicate that it is comprised of a heterogeneous set of macromolecules including mainly proteins, together with variable amounts of polysaccharides and, to a lesser extent, lipids and pigments [7-15].

The protein fraction of this organic matrix has been the subject of much research [16, 17]. Since the elucidation of the full-length primary structure of nacrein [18], the first molluscan shell matrix protein (SMP) to be described (from the pearl oyster), the number of SMPs appearing in public sequence databases has gradually increased. More recently, various high-throughput sequencing approaches, based on the screening of mantle-derived cDNA libraries and on next-generation sequencing methodologies such as RNA-seq, have been employed, increasing this rate of discovery [19-22]. Although these DNA- and RNA-based techniques have significantly increased the number of shell-forming candidate protein sequences, they must be cross-referenced with alternative methods in order to identify true shell-forming proteins. Proteomic analyses focused on the characterization of organic material extracted directly from the shell, combined with the interrogation of mantle-derived nucleic acid datasets, constitute one such approach. This strategy has led to the description and robust identification of numerous novel SMPs from various molluscan species [23-27].

One key question concerning the evolution of the Mollusca is whether the diversity of extant shell structures, most of which appeared early during the evolution of this phylum [3, 28, 29], are in fact constructed from similar SMP assemblages, i.e. whether they truly share a common origin. There is little evidence for the existence of homologous SMPs shared within and between the various bivalve and gastropod models studied to date [20, 23, 24, 27].

In this study, we employed a proteomic approach to investigate the SMPs of an emerging model for biomineralization [30, 31], the giant limpet Lottia gigantea. The significant advantage of conducting such a proteomic investigation on SMPs of L. gigantea is that this is the first mollusc for which a draft genome and significant expressed sequence tag (EST) resources are publicly available. We describe the primary structure of 39 SMPs associated with the calcified shell, and based on conserved motifs we discuss the putative functions of these proteins in the calcifying matrix. We also search for homologues of these SMPs in other conchiferan molluscs, and discuss possible scenarios of molecular evolution of SMP genes and the origin of cross-lamellar shell structures.


The shell of L. gigantea

As in other Lottiidae [32], the shell of L. gigantea is a multilayered organomineral structure (Fig. 1). The thin nonmineralized outermost periostracum is composed only of organic components. The rest of the shell is highly calcified and is composed of five distinct layers, named according to their position relative to the myostracum layer (M): M + 3 (outermost), M + 2, M + 1, M and M − 1 (innermost). The outermost M + 3 layer is calcitic and consists of an assemblage of large irregular spherulitic and prismatic structures composed of a mosaic of granular submicron grains [30]. The M + 2 layer consists of aragonitic small microneedle prisms, stacked obliquely to the surface. The M + 1 and M − 1 layers possess a characteristic cross-lamellar construction consisting of complicated hierarchical aragonite structures with first-, second- and third-order lamellae [31]. The M layer contains large prismatic aragonite structures that are perpendicular to the shell surface.

Figure 1.

Shell layers of the giant limpet L. gigantea. (A) Low magnification SEM view of a transverse cross-section of the shell, and schematic representations of the different layers. (B) SEM of the cross-sectional area (boxed area in A) showing the five calcified shell layers (M + 3, M + 2, M + 1, M, M–1). (C) SEM detailing the different calcified layers. The outermost M + 3 layer consists of calcitic large irregular spherulitic and prismatic structures. The M + 2 layer consists of aragonitic small microneedle prisms. The M + 1 and M–1 layers possess a characteristic cross-lamellar structure. The M layer, the myostracum, contains large prismatic aragonite structures perpendicular to the shell surface.

In order to remove all potential bacterial, protein and soft tissue contaminants, and to investigate only proteins that are intimately associated with the mineral phase (i.e. SMPs), aragonitic shell layers of L. gigantea (M + 2, M + 1, M and M–1) were carefully cleaned by mechanical abrasion of the periostracum and the outermost M + 3 layer, and crushed into minute fragments that were thoroughly decontaminated with sodium hypochlorite. Following decalcification of this powdered shell material with cold acetic acid (5% at 4 °C), we extracted SMPs associated with the combined aragonitic layers (M + 2, M + 1, M and M–1). Proteins associated with the acid-insoluble matrix (AIM) represented ~ 0.5% of dry powdered shell weight, whereas the proteins associated with the acid-soluble matrix (ASM) represented only 0.05%.

Lottia gigantea shell matrix proteins

When analysed using 1D denaturing SDS/PAGE, ASM and AIM proteins displayed few discrete bands (Fig. 2). ASM and AIM protein banding patterns shared a few components, such as the prominent AIM bands that were found around 35, 25 and 13 kDa. Twelve gel bands (b1–b12) were excised from the AIM SDS/PAGE and analysed by LC-MS/MS for protein identification. The rest of the AIM SDS/PAGE profile (without b1–b12 bands) was similarly analysed, without supplementary protein identification. Unfractionated ASM and AIM proteins were also analysed by LC-MS/MS following cleavage by trypsin. Peak lists generated from the MS/MS spectra were directly interrogated against the draft genome assembly Lotgi1_GeneModels_AllModels_20070424_aa using mascot software. The resulting data were investigated manually and filtered in order to remove redundant protein entries. In this manner, we could unambiguously identify 39 SMPs (Table 1 and Appendix S1). The full-length or partial sequences of 34 of these 39 SMPs are also present in L. gigantea EST datasets, and have now been deposited into the Swiss-Prot database (accession numbers B3A0P1–B3A0S4). We note that almost all conceptually translated genomic sequences that match our MS/MS peptides possess a predicted signal peptide (Table 1 and Appendix S1). This indicates that these bioinformatically predicted proteins are likely to represent their entire N–terminus and to be genuinely secreted by the mantle.

Table 1. Lottia gigantea shell matrix proteins. For complete data including MASCOT scores, blast results and sequence details see Table S1. Maj., major protein; Min., minor protein; Abs., absent from the detected protein list
| Function | Protein name (domain) | Fraction (band/MW)^a | Calc. molecular mass and pI | SP | Lgig acc.^b | EST tissue | SWP acc. | Mann et al. [33] |
|---|---|---|---|---|---|---|---|---|
| Enzyme | Peroxidase-like 1 (Peroxidase, G-rich, K-rich domains) | AIM (b1–b12) | 191 kDa, pI = 5.0 | | | | | |
| | Peroxidase-like 2 (Peroxidase, D-rich, G-rich domains) | AIM (b1–b12)^c | 158 kDa, pI = 8.4 | | | | | |
| | Peroxidase-like 3 (Peroxidase domain) | AIM (b2∼120 kDa) | 117 kDa, pI = 7.6 | | | | | |
| | CA–1 (CA domain) | AIM (b6∼35 kDa) | 42 kDa, pI = 6.5 | | | | | |
| | CA–2 (CA, D-rich domains) | AIM (b4∼70 kDa) | 69 kDa, pI = 5.9 | | | | | |
| | Glycosidase 1 (Glyco_hydro_23 domain) | AIM (b5∼55 kDa) | 43 kDa, pI = 5.0 | | | | | |
| | Glycosidase 2 (Glyco_hydro_31, DE-rich domains) | AIM (b5∼55 kDa) | 57 kDa, pI = 4.8 | Yes | 174920 | Not found | | Min. |
| | Cyclophilin (Pro-isomerase domain) | AIM (b8∼18 kDa) | 21 kDa, pI = 4.8 | | | | | |
| Extracellular matrix-related | BMSP-like^d (2 von Willebrand A, 2 CBM_14, LamGL domains) | AIM (b1∼160 kDa) | 173 kDa, pI = 8.5 | ? | New ORF^d | Larvae | B3A0P4 | Maj. |
| | Uncharacterized protein 2/LUSP–2 (SCP domain) | AIM (b6∼35 kDa) | 35 kDa, pI = 9.1 | | | | | |
| | Uncharacterized protein 3/LUSP-3 (SCP domain) | AIM (b6∼35 kDa) | 33 kDa, pI = 9.6 | | | | | |
| | Uncharacterized protein 17/LUSP–17 (2 EGF-like, ZP domains) | | 53 kDa, pI = 4.7 | | | | | |
| | Uncharacterized protein 24/LUSP–24 (2 EGF-like, ZP domains) | | 50 kDa, pI = 4.8 | | | | | |
| | Uncharacterized protein 14/LUSP–14 (Chitin_bind_3 domain) | | 89 kDa, pI = 7.5 | | | | | |
| | Uncharacterized protein 20/LUSP–20 (3 CBM_14, LamGL domains) | | 81 kDa, pI = 6.9 | | | | | |
| | Perlustrin (IGF–BP domain) | AIM (b9∼13 kDa) | 11 kDa, pI = 4.0 | | | | | |
| RLCD-containing | Uncharacterized protein 1/LUSP–1 (G-, Q-, M-rich RLCDs) | AIM (b4∼70 kDa) | 68 kDa, pI = 9.2 | | | | | |
| | Uncharacterized protein 11/LUSP–11 (M- and G-rich domains) | AIM (b10∼10 kDa) | 13 kDa, pI = 10.8 | | | | | |
| | Uncharacterized protein 12/LUSP–12 (M- and G-rich domains) | AIM (b10∼10 kDa) | 13 kDa, pI = 9.8 | | | | | |
| | Uncharacterized protein 5/LUSP-5 (A-rich domains) | AIM (b9∼13 kDa) | 23 kDa, pI = 10.4 | | | | | |
| | Uncharacterized protein 7/LUSP-7 (PTTGGQ-repeat domains) | | 28 kDa, pI = 5.4 | | | | | |
| | Uncharacterized protein 22/LUSP–22 (Q-rich repeat domains) | | 45 kDa, pI = 6.9 | | | | | |
| | Uncharacterized protein 25/LUSP–25 (M-rich repeat domains) | | 18 kDa, pI = 9.9 | | | | | |
| Acidic | Asp-rich protein (D-rich domains) | AIM (b3∼80 kDa) | 42 kDa, pI = 3.5 | | | | | |
| | Uncharacterized protein 23/LUSP–23 (D- and E-rich repeat domains) | | 22 kDa, pI = 3.6 | | | | | |
| | Uncharacterized protein 9/LUSP–9 (no characterized domain) | AIM (b5∼55 kDa) | 29 kDa, pI = 3.7 | | | | | |
| | Uncharacterized protein 10/LUSP–10 (D- and Q-rich repeat domains) | | 68 kDa, pI = 3.8 | | | | | |
| Ca-binding | EF-hand containing protein-1 (2 EF-hand domains) | | 23 kDa, pI = 6.5 | | | | | |
| | EF-hand containing protein-2 (2 EF-hand domains) | | 12 kDa, pI = 4.7 | | | | | |
| Protease inhibitor | Perlwapin (5 WAP domains) | | 43 kDa, pI = 7.9 | | | | | |
| Unknown | Uncharacterized protein 4/LUSP-4 (no characterized domain) | AIM (b8∼18 kDa) | 17 kDa, pI = 8.8 | | | | | |
| | Uncharacterized protein 6/LUSP-6 (no characterized domain) | AIM (b7∼25 kDa) | 20 kDa, pI = 9.6 | | | | | |
| | Uncharacterized protein 8/LUSP-8 (no characterized domain) | AIM (b8∼18 kDa) | 18 kDa, pI = 10.1 | | | | | |
| | Uncharacterized protein 13/LUSP–13 (no characterized domain) | AIM (b6∼35 kDa) | 26 kDa, pI = 8.5 | | | | | |
| | Uncharacterized protein 15/LUSP–15 (no characterized domain) | AIM (b6∼35 kDa) | 37 kDa, pI = 6.0 | | | | | |
| | Uncharacterized protein 16/LUSP–16 (no characterized domain) | | 23 kDa, pI = 9.8 | | | | | |
| | Uncharacterized protein 18/LUSP–18 (no characterized domain) | | 55 kDa, pI = 5.7 | | | | | |
| | Uncharacterized protein 19/LUSP–19 (no characterized domain) | | 27 kDa, pI = 9.3 | | | | | |
| | Uncharacterized protein 21/LUSP–21 (no characterized domain) | | 16 kDa, pI = 9.7 | | | | | |

^a Bands (b1–b12) were excised from AIM SDS/PAGE, as described in Fig. 2.
^b The Lotgi1_GeneModels_AllModels_20070424_aa complete genomic database used for the proteomic searches was downloaded from the Lottia gigantea v0.1 genome project website.
^c We notice that both peroxidase-like 1 and peroxidase-like 2 proteins, or their fragmented peptides, were detected in all MS/MS analysed SDS/PAGE bands from b1 to b12 (Table S1).
^d For this protein a new ORF model was deduced from the genomic sequence (Fig. S3).
MW, molecular weight; SP, signal peptide; Lgig acc., Lottia gigantea genomic database accession number; SWP acc., Swiss-Prot accession number.


Figure 2.

Main shell matrix proteins of L. gigantea. SDS/PAGE separation of acid-insoluble and acid-soluble SMPs. ASM and AIM SMPs were separated on a 4–15% gradient SDS/PAGE gel under denaturing conditions and stained with Coomassie Brilliant Blue. The 12 most intensely stained bands of the AIM (b1–12) were excised for further analysis by MS/MS. A schematic representation of the identified proteins is shown on the right. Grey shaded domains indicate RLCDs. The Asp-rich protein (indicated by *) is likely to possess extensive glycosylation. Red bars indicate signal peptide sequences as determined by signalp 3.0.

Our proteomic analysis of L. gigantea SMPs reveals a diversity of SMP structures that can be broadly categorized into one of the following seven classes: repetitive low-complexity domain-containing (RLCD), extracellular matrix-related, enzymes, acidic (low predicted pI), calcium-binding, protease inhibitor and finally orphan proteins with no identifiable domains (Table 1). Although this list of SMPs is not exhaustive (indeed other proteins are known to be present in the L. gigantea shell matrix, see [33]), we believe it is likely to contain most of the abundant SMPs of the aragonitic shell layers because we were able to identify the predominant SDS/PAGE protein bands with a striking match between the expected and observed molecular masses (Fig. 2 and Appendix S1). Indeed, most, if not all, of the peptides analysed from the bands corresponded to the identified proteins. Furthermore, most of the SMPs we identified appear to be the predominant SMPs in Mann et al.'s dataset [33] (Fig. S1). Interestingly, some of these SMPs (e.g. peroxidase-1, -2, and -3, LUSP–1 and -9) were not, or were only partially, detected by Mann et al. (Fig. S1). For example, we were able to identify three full-length peroxidases (Table 1) that were a minor fraction of the Mann et al. dataset. In addition, LUSP–1, which appears to be one of the main components of the L. gigantea AIM (Fig. 2), was not detected by Mann et al. (Fig. S1). These differences may reflect genuine biological variation in the organic contents of the shells of L. gigantea, because Mann et al. investigated the whole shell (comprising the calcitic M + 3 layer, together with the aragonitic layers), whereas we restricted our analysis to the aragonitic shell layers (M + 2, M + 1, M and M–1), and/or may be the result of subtle differences in shell cleaning, matrix extraction and analysis methods.

RLCD-containing SMPs

One of the most striking results of our analysis is the qualitative abundance (at least 13 of 39) of proteins possessing blocks of similar or identical amino acids (Table 1; Fig. 3). These RLCD-containing proteins can be subdivided into three groups.
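The notion of "blocks of similar or identical amino acids" can be illustrated with a simple sliding-window composition scan. The following is a minimal sketch only, not the procedure used in this study: the function names (residue_fractions, biased_regions), the window length and the enrichment threshold are all assumptions chosen for illustration, and the example sequence is artificial.

```python
from collections import Counter


def residue_fractions(seq):
    """Return the fraction of each residue in a protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def biased_regions(seq, residue, window=20, min_fraction=0.4):
    """Find stretches enriched in one residue.

    Every window of length `window` whose content of `residue` reaches
    `min_fraction` is flagged; overlapping or adjacent flagged windows are
    merged. Regions are returned as 0-based, end-exclusive (start, end)
    tuples. Window size and threshold are illustrative assumptions.
    """
    seq = seq.upper()
    regions = []
    for start in range(len(seq) - window + 1):
        fraction = seq[start:start + window].count(residue) / window
        if fraction >= min_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Artificial example: a Gln block flanked by mixed residues.
example = "ACDEFGHIKL" + "Q" * 10 + "ACDEFGHIKL"
print(biased_regions(example, "Q", window=10, min_fraction=0.8))  # [(8, 22)]
```

Because flagged windows are merged, the reported region extends slightly beyond the strict repeat block; a stricter boundary definition would require an additional trimming step.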

Figure 3.

Schematic summary of L. gigantea's RLCD-containing SMPs. Schematic representations of the primary structure of RLCD-containing SMPs isolated from the shell of L. gigantea. (A) RLCD domains of peroxidase–1 and peroxidase–2. (B) LUSP–1, -7, -8, -10 and -22 possess noticeable Q-rich repeats or domains. (C) LUSP–11, -12 and -25 exhibit both M- and G-rich domains. Each protein sequence possesses a signal sequence indicated by a red bar. RLCDs are indicated in light grey, with specific repeats indicated by small white boxes.

The first group possesses, in addition to RLCDs, conserved enzymatic domains such as peroxidase, carbonic anhydrase (CA) or glycosidase domains (Fig. 3A). Lgi-peroxidase–1 and -2 contain recognizable RLCDs rich in the following amino acids: aspartic acid, lysine, glycine, serine, proline, arginine and glutamic acid. We also detected an RLCD rich in Gly and Glu within glycosidase–2. Similarly, the CA–2 protein possesses supernumerary Asp- and Glu-rich domains in its C–terminus. Several previously described SMPs also combine such RLCDs with enzymatic domains. For example, the CA domain of nacrein (first isolated from the pearl oyster Pinctada fucata) is split by the insertion of an RLCD rich in Gly and Asn [18]. This supernumerary RLCD of nacrein has been proposed to regulate the activity of the CA domain, acting as an inhibitor of the precipitation of calcium carbonate [34]. It is possible that these RLCDs, embedded within or adjacent to enzymatically functional domains, are responsible for conferring on these protein isoforms their specificity for biomineralization purposes. However, this hypothesis awaits further investigation.

Glutamine-rich domains characterize the second group of RLCD-containing proteins (Fig. 3B). We identified six L. gigantea uncharacterized shell proteins (LUSP) with high Gln contents, some of which had additional RLCDs rich in other residues. SMPs rich in Gln have also been found in bivalves; for example, MPN88 was previously characterized from the oyster Pinctada margaritifera [19]. To date, however, no clear function has been attributed to such Gln-rich SMPs. Interestingly, vertebrate teeth contain various Gln-rich proteins belonging to the secretory calcium-binding phosphoprotein families, including amelotin, amelogenin and enamelin. Secretory calcium-binding phosphoproteins are believed to interact with calcium ions and regulate mineralization processes in vertebrates [35].

The third group of RLCD proteins contains three members, none of which exhibits any sequence similarity with any other proteins. LUSP–11, LUSP–12 and LUSP–25 contain Met- and Gly-rich domains (Fig. 3C). Putative full-length ORFs for these three proteins were deduced from L. gigantea EST and genomic resources. As with Gln-rich domains, the significance of Met- and Gly-rich domains in CaCO3 biomineralization is unknown. However, we have noticed that the shell matrices of the gastropod Haliotis asinina [23] and the bivalves P. margaritifera and P. maxima also contain noticeable Met-rich proteins, such as MRNP34 [36].


We detected three different peroxidase-domain-containing proteins in L. gigantea shell matrices. Peptides of the RLCD-containing Lgi-peroxidase–1 and -2 were detected in all MS/MS experiments derived from SDS/PAGE bands b1 to b12 (Fig. 2 and Table S1). This suggests that Lgi-peroxidase–1 and -2 are either extremely abundant in the shell matrix, and/or are cleaved into a wide range of peptide lengths after being secreted from the mantle and incorporated into the calcifying shell matrix. Interestingly, all three Lgi-peroxidases cluster together in our phylogenetic reconstruction (Fig. S2). Because no other peroxidase breaks this strongly supported clade (posterior probability 0.98), these three limpet peroxidases may have been produced by two gene duplication events in an ancestor that directly gave rise to the Lottia lineage. In addition, these three L. gigantea peroxidase-encoding genes are all located on the same genomic scaffold (sca_32; Table S2) within 157 kbp of each other.

Interestingly, a similar peroxidase (H2A0M7) has been recently retrieved from the shell matrix of the prismatic layer of the pearl oyster P. margaritifera [27]. Peroxidases catalyse the oxidation of many aromatic amines and phenols by hydrogen peroxide. These enzymes have long been associated with molluscan shell formation [37]. The function of such peroxidases within the calcifying shell matrix, or even whether they exhibit peroxidase activity once secreted by the mantle, is unknown. One hypothesis would be that these enzymes act in the same way as the melanogenic peroxidase found in the ink gland of the cuttlefish Sepia officinalis, serving to cross-link proteins [38]. Biomineral-associated peroxidases might therefore be involved in biomineral–hydrogel formation via protein matrix framework assembly [39]. Similar functional activity is thought to be mostly provided by two tyrosinases in the Pinctada shell matrix [40].

Carbonic anhydrases

CA is a ubiquitous metalloenzyme found in animals, plants and bacteria which catalyses the reversible hydration of carbon dioxide, according to the equation CO2 + H2O ↔ HCO3− + H+. This enzyme is believed to be essential for biomineral formation because bicarbonate, the product of the catalytic process, can directly react with calcium ions to form calcium carbonate. Furthermore, CA has been found in the organic matrices of various metazoan skeletons [41-45]. We detected two different CAs, Lgi-CA–1 and Lgi-CA–2, in L. gigantea shell AIMs and ASMs. Both of these proteins possess a highly conserved α–CA domain in addition to a Gly- and Glu-rich RLCD present in the C–terminus of Lgi-CA–2 (Fig. 4A). Their CA domains possess the conserved active residues known from well-studied α–CAs [46], suggesting that these two Lottia CAs are active enzymes. In support of this, we detected significant, specific CA activity in the ASM fraction (Fig. 4B).

Figure 4.

Structure and activity of two CAs isolated from the shell of L. gigantea. (A) Amino acid sequences of CA–1 (upper) and CA–2 (lower). The peptide sequences detected by MS/MS are indicated in red. Signal sequences are underlined. Stars indicate stop codons. (B) CA activity of the ASM derived from L. gigantea shells. Commercial CA derived from bovine erythrocytes was used as a positive control and acetazolamide (AZ) was used as a specific inhibitor of carbonic anhydrase activity.

Asp-rich, low pI proteins

Another group of proteins that emerged from our analyses were the acidic Asp-rich proteins ‘Asp-rich’ (Fig. 5A), LUSP–23, LUSP–9 and LUSP–10 with predicted pI values of 3.5, 3.6, 3.7 and 3.8, respectively (Table 1). According to Coomassie Brilliant Blue-stained SDS/PAGE gels, the abundant protein ‘Asp-rich’ (which also has the lowest predicted pI) has an apparent molecular mass of 80 kDa (Fig. 5). In contrast to this, the predicted molecular mass for the nonglycosylated mature form is only 42 kDa. A likely explanation for this discrepancy is the observation that this band was intensively stained with the cationic dye Alcian Blue, suggesting that ‘Asp-rich’ bears extensive acidic polysaccharide moieties. The hydrophobicity ‘Kyte and Doolittle’ [47] plot of the ‘Asp-rich’ protein suggests that it might also exhibit a coiled-coil structure (Fig. 5B).
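For readers unfamiliar with this type of plot, a Kyte and Doolittle hydropathy profile is simply a moving average of per-residue hydropathy indices along the sequence. The sketch below illustrates the principle under stated assumptions: the function name hydropathy_profile and the window length of 9 residues are illustrative choices (the window used for Fig. 5B is not stated here), and the test peptide is artificial. The index values are those of the published Kyte and Doolittle scale [47].

```python
# Kyte and Doolittle hydropathy indices for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along `seq`.

    The i-th value is the average index of residues i .. i + window - 1.
    The default window length is an assumption made for illustration.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Artificial acidic stretch: strongly negative (hydrophilic) values are expected.
profile = hydropathy_profile("DDDEDDDSDDDEDDD", window=9)
print(min(profile), max(profile))
```

Strongly negative values along most of the profile would be expected for an Asp-rich sequence, since Asp and Glu both carry an index of −3.5 on this scale.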

Figure 5.

An Asp-rich SMP isolated from the shell of L. gigantea. (A) Amino acid sequence of Asp-rich protein. The peptide sequences detected by MS/MS are indicated in red. The signal peptide sequence is underlined. A star indicates the stop codon. Amino acids boxed in black or white indicate putative phosphorylation or glycosylation sites, respectively. (B) Hydrophobicity ‘Kyte and Doolittle’ plot of Asp-rich protein suggesting a coiled-coil structure. (C) 12% SDS/PAGE of L. gigantea AIM, stained with Coomassie Brilliant Blue or Alcian Blue, pH 1. The molecular mass markers are indicated on the left. The red arrow localizes the 70-kDa band excised for MS analysis that contains the mature acidic Asp-rich glycoprotein.

The presence of such unusually acidic proteins in the molluscan shell matrix is known from the pioneering work of Meenakshi et al. [48], Crenshaw [7] and Weiner and Hood [49], and has been further confirmed by several investigations [50-54]. However, because of the technical challenges of isolating and purifying these acidic proteins, reports of their primary sequence are rare [55-58]. To our knowledge, the Asp-rich protein detected here, together with MSP–1 extracted from the calcitic foliated layer of Patinopecten shell [55], is one of the most acidic molluscan SMPs described to date. Although there are several theoretical models regarding the function that these acidic proteins play in the process of shell formation [59], to date only a few in vivo functional studies have tested these theories [56].

Blue mussel shell protein (BMSP)-like

Our MS/MS analyses of a Coomassie Brilliant Blue-stained band with an apparent molecular mass of ∼160 kDa (b1 in Fig. 2) identified peptides on the genomic scaffold sca_149. After re-evaluating this genomic locus with an ORF-finding tool (Fig. S3), we identified a protein with a calculated molecular mass of 173 kDa. Two molluscan SMPs shared sequence similarity with this novel L. gigantea protein: BMSP–220 (derived from the blue mussel Mytilus galloprovincialis; G1UCX0); and Pif-177 (derived from P. fucata; C7G0B5) (Fig. 6). These proteins all possess von Willebrand A, peritrophin-A chitin-binding and RLCD domains. The M. galloprovincialis BMSP [60] and the L. gigantea BMSP-like proteins also possess a laminin G-like (LamGL) domain and a poly(T) domain between the von Willebrand A domain and the LamGL domain. The M. galloprovincialis BMSP and the L. gigantea BMSP-like proteins also share the highest sequence similarity in these domains. The P. fucata Pif protein was recently shown to bind both CaCO3 and chitin, and by RNAi to play a role in nacre formation in vivo [56]. Given that L. gigantea does not form nacre, it will be interesting to determine the function of the L. gigantea BMSP-like protein.

Figure 6.

BMSP-like SMPs isolated from the shell of L. gigantea. Schematic representations of the primary structure of L. gigantea BMSP-like, M. galloprovincialis BMSP and P. fucata Pif proteins. von Willebrand A, peritrophin-A chitin-binding (CB), RLCDs and LamGL domains are indicated. Sequence similarity scores between selected domains are the percentage of amino acid identity.

Epidermal growth factor and zona pellucida domain-containing SMPs

We also detected two similar proteins (LUSP–17 and LUSP–24), each containing two epidermal growth factor (EGF)-like domains and one zona pellucida (ZP) domain in their C–termini. Although separate EGF-like and ZP domains are commonly encountered in organic matrix proteins associated with calcification processes [20, 61, 62], the presence of both domains in one protein is less common. Previous proteomic investigations have described one similar protein from the shell matrix of the Pacific oyster Crassostrea gigas [25], and two from the Pinctada shell matrix [27]. A sequence alignment of these latter proteins with the two EGF-like SMPs of L. gigantea is presented in Fig. 7, and illustrates the strong conservation of each domain. LUSP–17 and LUSP–24 are also located on the same genomic scaffold (sca_66). This, in combination with their high degree of sequence identity (79%), strongly suggests that they originated from a gene duplication event.
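A sequence identity figure of this kind is derived from a pairwise alignment. As a minimal sketch, and assuming one common convention (identical residues divided by the number of aligned columns in which neither sequence has a gap), the calculation can be written as follows. The function name percent_identity, the gap symbol and the toy alignment are assumptions for illustration; the convention used to obtain the value reported above is not specified here, and other denominators (for example the full alignment length) give different percentages.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence carries a gap are excluded from both
    the numerator and the denominator (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment: 4 identical residues over 5 ungapped columns.
print(percent_identity("ACDE-F", "ACDQGF"))  # 80.0
```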

Figure 7.

Two EGF-like SMPs isolated from the shell of L. gigantea. A sequence alignment of EGF-like proteins retrieved from the shell of L. gigantea (B3A0R6 and B3A0S3) against C. gigas (P86785), and P. maxima (P86953 and P86954). Signal peptides (yellow), EGF-like (green) and ZP (blue) domains are highlighted. Stars at the end of each sequence indicate a stop codon.

EGF-like domains are involved in a wide variety of functions such as protein/protein recognition, protein aggregation, molecular signalling or Ca2+-binding ability [63]. ZP domains are present in a range of extracellular filament or matrix proteins from a wide variety of eukaryotic organisms, and are characterized by eight conserved cysteine residues, which are involved in protein polymerization processes [64]. Furthermore, the urine-secreted protein uromodulin (Tamm–Horsfall protein, Q91X17), which exhibits three EGF domains and one ZP domain, can potentially contribute to colloid osmotic pressure and modulate the formation of supersaturated salts and their crystals [65]. Similar functions could readily be attributed to the EGF- and ZP-containing SMPs and be integrated into a theoretical model of calcified shell biomineralization. However, these hypotheses await validation by functional experiments.

Other SMPs


We also detected a protein in the L. gigantea shell matrix presenting sequence similarities with cyclophilins (Fig. S4). Cyclophilins are peptidyl–prolyl isomerases that are believed to mostly facilitate protein folding. In mice, the absence of expression of cyclophilin B has been shown to induce severe osteogenesis imperfecta [66]. Although the specific role of this enzyme in calcium carbonate mineralization is not known, Jackson and co-workers [20] described a cyclophilin gene highly expressed in the nacre forming cells of the pearl oyster P. maxima.


Two different glycosidase-related proteins were also detected in Lottia's shell matrix (Fig. S5). The first, named Lgi-glycosidase–1, contains a characteristic glycosyl_hydrolase_23 domain and shares significant sequence similarity with putative lysozyme 2 proteins described from other molluscs [67]. The second, named Lgi-glycosidase–2, possesses Asp- and Glu-rich domains and a conserved glyco_hydro_31 domain. Chitin and other insoluble polysaccharides are major nonprotein components of molluscan shells [68-70]. In classical models of mollusc shell biomineralization, these molecules form a framework of parallel layers between which silk-like and acidic proteins are sandwiched [11, 59]. Lottia's glycosidase SMPs might reasonably be expected to modify this chitin/polysaccharide framework during biomineral formation.

EF-hand containing proteins

We also identified two short proteins containing two EF-hand domains (Fig. S6). Similar proteins have been described previously from the shell matrix of the bivalves Pinctada and Venerupis [26, 27]. Two consecutive EF-hand domains are known to bind Ca2+ ions with high affinity [71], and are observed in many extracellular matrix proteins, such as calmodulins, troponin–C or S–100, often in association with other domains.

Secreted cysteine-rich protein-like proteins

Two similar L. gigantea SMPs, LUSP–2 and LUSP–3, contain characteristic secreted cysteine-rich protein (SCP) domains (Fig. S6). The genes encoding LUSP–2 and LUSP–3 are located at adjacent genomic loci (sca_35). Three additional SCP-domain-containing proteins, which were not detected in our MS/MS analyses, are also present on this scaffold (Fig. S7). Interestingly, a similar SCP-containing protein has recently been described from the nacre of P. margaritifera [27]. Because SCP domains have also been described in association with a variety of extracellular matrix proteins, no clear function has yet been assigned to such domains in the context of biomineralization.


We also detected a protein in the shell matrix of L. gigantea with sequence similarities to perlustrin, a protein containing an insulin-like growth factor binding protein (IGF–BP) domain first isolated from the nacre of the abalone Haliotis laevigata (Fig. S8) [72]. This Lgi-IGF–BP is characterized by a pattern of 12 conserved Cys residues. Interestingly, vertebrate bone matrix contains IGF–BPs, which are involved in bone formation, possess an effective affinity for growth factors of the insulin type, and function by modulating IGF metabolism.


We detected in Lottia's shell matrix one protein containing five whey-acidic protein (WAP) domains, and with high overall sequence similarity to the perlwapin family (Fig. S8). WAP consists of two ‘four-disulfide core’ domains that are present in various serine-proteinase inhibitors. Perlwapin proteins, containing such WAP domains, have been identified in Haliotis [23, 73, 74] and from the shell of the blue mussel M. galloprovincialis [24]. However, whereas Lgi-perlwapin contains five WAP domains, the other perlwapins from the species listed above possess only one to three WAP domains.


Nine other L. gigantea SMPs do not display any sequence similarity with previously described proteins, nor do they possess recognizably conserved domains (Table 1). These proteins were categorized as orphans. Comparative metazoan genome analyses suggest that every taxonomic group contains 10–20% of these so-called ‘orphan’ or ‘taxonomically restricted’ genes. Such genes are thought to underlie mechanisms that can support the generation of morphological novelties [75]. Interestingly, all molluscan shell matrices broadly investigated at the ‘-omic’ level (genomic, transcriptomic or proteomic) contain such orphan proteins. The presence of such orphans may reflect the evolvability of the molluscan shell matrix, suggesting that the appearance of such new proteins within the SMP set could potentially be related to modification of the biomineral structure through evolutionary time. Perhaps more than any other, this class of biomineral-associated proteins highlights the need for in vivo gene function assays to be developed for molluscan biomineralizing systems.


RLCD-containing SMPs

RLCD proteins are a prominent feature of all shell-forming proteomes studied to date. Most, if not all, of the RLCD-containing SMPs we have detected appear to be lineage-specific proteins, supporting the idea that such biomineralizing proteins have evolved independently in the different molluscan models. Various RLCD-containing proteins are present in a wide range of metazoan-secreted structures, for example silk fibroin [76], the mussel byssus [77] or the insect chorion [78]. Molluscan shell-forming proteins with RLCDs include nacrein and lustrin–A, which contain GN- or GS-rich domains [18, 74]; MSI60 and CL10Contig2, which contain poly(G) and poly(A) blocks [79]; Pif-177, which contains D-rich domains [56]; MPN88, which contains Q-, M- and G-rich repeated sequences [19]; and the Shematrin family, which bears numerous GY-rich domains [80]. RLCDs are likely to represent regions with intrinsically disordered conformations thought to be structurally unstable [81]. Such domains possess low binding affinity for other organic macromolecules (such as proteins or polysaccharides), but weakly bind mineral surfaces and ions in aqueous phases. Indeed, GY or GN repeats of the nacrein and shematrins have been proposed to weakly bind Ca2+ ions [34, 80], whereas the D-rich domains of Pif-177 were shown to directly bind aragonitic mineral surfaces [56]. It has also been proposed that the poly(G), poly(A), or poly(S) regions of MSI60, CL10Contig2 or lustrin–A may confer elastomeric properties to the mature biomineral [23, 74, 79, 82]. Given that RLCD proteins are a major component of the protein fraction within a wide range of molluscan shells, they are likely to play crucial roles in shell formation and/or in imparting to the shell certain physical properties such as fracture resistance.

Conservation of SMPs and their evolution

Given that L. gigantea does not form nacre, one of the most surprising results of our study was the detection of various proteins that share high sequence similarities with SMPs previously identified from the nacro-prismatic shells of Pinctada bivalves. Figure 8 summarizes the co-occurrence of SMPs known from various molluscan models of biomineralization: bivalves of the genus Pinctada [19, 27]; abalone (genus Haliotis) [23]; and L. gigantea (comprising the proteins reported here together with the 23 main shell proteins identified by Mann et al. [33]). Protein sequence alignments and overall domain conservation suggest that most of the eight proteins shared between Lottia and Pinctada (CAs, BMSP and EGF-like in particular) may be true orthologues (Figs 3, 4, 6 and 7 and Fig. S6). For the two proteins shared between L. gigantea and the Haliotids, IGF–BP (perlustrin) and perlwapin, accurate evolutionary relationships (orthology versus paralogy) are difficult to assign because the sequence-based similarities between these proteins are restricted to amino acid positions that are specific to the IGF–BP and WAP families (Fig. S8).

Figure 8.

A broad comparison of molluscan SMPs. (A) A summary of the shared and lineage-specific SMPs described to date from L. gigantea and various Pinctada and Haliotis species. Numbers correspond to the number of different SMPs detected to date for each model; for example, we can distinguish 32 different SMPs from the 39 different SMPs we have identified for L. gigantea (e.g. when considering one entry for the two CAs). (B) After categorizing these SMPs into eight broad categories it is clear that proteins with RLCDs are a common feature of the molluscan shell-forming secretome. Most proteins shared between L. gigantea and Pinctada species fall into the extracellular matrix category. Grey boxes indicate proteins detected by Mann et al. [33] to be minor components of the shell matrix.

Counterintuitively, we have found more SMPs shared between L. gigantea and bivalves of the genus Pinctada than between the gastropods L. gigantea and abalone. This trend was also independently described in a transcriptomic comparison of the mantle tissues of L. gigantea, P. maxima and H. asinina [20]. One potential explanation for these observations is that the shell-forming secretome of the abalone has accumulated more changes since its divergence from a limpet–abalone ancestor than the limpet has since its divergence from a bivalve–gastropod ancestor. Given the fundamental crystallographic differences between the limpet and abalone shells (presence/absence of nacre and crossed lamellae, for example), such a scenario is conceivable. Complicating this issue is the fact that beyond the species and genus level, molluscan shell microstructures are notoriously evolutionarily plastic. To a large degree this plasticity must be the result of the evolution of the organic molecules that coordinate deposition of the shell (past ocean chemistries and temperatures would also affect shell evolution). Our molecular data are compatible with the hypothesis of a genuine affiliation between cross-lamellar structures and nacre [3]. However, a well-resolved, robust and taxonomically well-represented phylogenetic tree for the Conchifera is essential before any scenarios of shell evolution can be proposed and then tested. Fortunately, recent genomic efforts are moving towards this goal [83, 84]. In addition to such a resource, better taxon sampling of mantle tissue transcriptomes and shell proteomes would allow us to better understand how this shelled diversity has been generated over the last 550 million years.


The availability of genome, proteome and transcriptome scale datasets from non-model organisms is enabling more complete assessments of complex biological processes to be performed. Molluscan shell formation is certainly one such process that will benefit from these analyses. By combining a proteomic analysis of SMPs extracted from the shell of L. gigantea with a draft genome assembly, we have identified several new biomineralizing proteins, and further characterized several others. Many of these proteins are characterized by apparently lineage-specific arrangements of RLCDs and highly conserved enzymatic domains such as CA, peroxidase and glycosidase. Even when combined with a recent analysis by Mann et al. [33], the complete shell-forming proteome of L. gigantea is unlikely to have been described, and further work will probably identify additional components. Indeed, it remains possible that the trypsin hydrolysis of a few SMPs generates only peptides of unsuitable length (too short or too long) for MS analysis, making them undetectable by a classical proteomic approach [85]. However, many of the primary shell-forming proteins are likely to be in hand, and it is becoming increasingly clear that the challenge that now faces the field is to characterize the function of these proteins using in vivo techniques.

Materials and methods


Fresh L. gigantea shells (5–7 cm in length) were collected from the West Pacific coast of the USA (California). Shell microstructure was observed with a Philips XL-30 LaB6 scanning electron microscope in back-scattered electron mode.

Shell matrix extraction

The external organic layer, the periostracum, and the outermost M + 3 calcified layer that presents burrowing traces were mechanically removed under cold water in order to avoid shell heating, then the rest of the shell, comprising the M + 2, M + 1, M and M − 1 layers, was crushed into fragments of ~ 1 mm2. Any other superficial organic contaminants were removed by incubating shell fragments in NaOCl (1%, v/v) for 24 h; the fragments were then thoroughly rinsed with water and subsequently ground into a fine powder that was sieved (> 200 μm). All protein extractions were performed at 4 °C, as previously described [52]. Briefly, powdered samples were decalcified overnight in cold dilute acetic acid (5%, v/v), which was slowly added by an automated titrator (Titronic Universal, Mainz, Germany) at a flow rate of 100 μL every 5 s. The solution (final pH ~ 4.2) was centrifuged at 3900 g (30 min). The resulting pellet, corresponding to the AIM, was rinsed six times with MilliQ water, freeze-dried and weighed. The supernatant containing ASM was filtered (5 μm) and concentrated with an Amicon ultra-filtration system on a Millipore® membrane (10 kDa cut-off). The final solution (> 5 mL) was extensively dialysed against 1 L of MilliQ water (six water changes) before being freeze-dried and weighed.

CA activity measurement

The miniaturized colorimetric method developed by Maren [86] was employed for measuring the CA activity (EC of the shell ASM. The experiment was carried out under stabilized flow of CO2, in an ice-containing vessel. Four hundred microlitres of phenol red (12.5 mg·L−1 in 2.6 mM NaHCO3) were mixed with 200 μL of water and 100 μL of sample. The reaction was initiated by adding 100 μL of freshly made carbonate buffer (0.6 M Na2CO3, 0.412 M NaHCO3) and the time interval until the colour changed from red to yellow was monitored. This colour change characterizes the pH decrease of the solution (from 8.2 to 7.3), resulting from the production of protons during the reaction catalysed by the CA (CO2 + H2O ↔ HCO3− + H+). The enzyme unit (EU) activity was calculated according to the following equation: activity units (EU) = (T0 − T)/T; where T and T0 are the reaction times required for the pH change with and without a catalyst, respectively. Acetazolamide was used as a specific inhibitor of the reaction. Commercial bovine CA and BSA were used as positive and negative controls, respectively.
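The enzyme unit equation above can be illustrated with a minimal Python sketch. The function name and the timings are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def ca_activity_units(t_blank_s: float, t_sample_s: float) -> float:
    """Enzyme units from colour-change times: EU = (T0 - T) / T.

    t_blank_s  -- T0, time for the pH change without a catalyst (seconds)
    t_sample_s -- T, time for the pH change with the sample (seconds)
    """
    if t_blank_s <= 0 or t_sample_s <= 0:
        raise ValueError("reaction times must be positive")
    return (t_blank_s - t_sample_s) / t_sample_s


# Hypothetical timings (assumed values, not data from the paper).
blank_time = 60.0
sample_time = 20.0
print(ca_activity_units(blank_time, sample_time))  # 2.0
```

A sample that does not accelerate the reaction (T equal to T0) gives 0 EU, which is the behaviour expected of the BSA negative control.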


The fractionation of matrix macromolecules was performed under denaturing conditions by monodimensional SDS/PAGE (Mini-Protean 3; Bio-Rad). One milligram of matrix (both ASM and AIM) was suspended in 200 μL of Laemmli sample buffer [87], heat denatured (10 min, 100 °C) then centrifuged for 1 min at 12 000 g. Ten microlitres of the supernatant, representing a maximum of 50 μg of matrix, were loaded onto gels. Following SDS/PAGE under denaturing conditions (4–15% acrylamide gel), proteins were visualized with Coomassie Brilliant Blue (CBB G-250; Biosafe, Bio-Rad). Alternatively, putative glycosylations were investigated by staining with Alcian Blue 8GX [88], at pH 1 in order to specifically stain sulfated sugars.

Sample preparation for proteomic analysis

An in-gel digestion procedure was performed for 12 predominant protein bands visualized from the electrophoresis gel of the AIM (Fig. 2). These bands were excised from Coomassie Brilliant Blue-stained gels and completely destained by a wash with 400 μL of 50 mM NH4HCO3/CH3CN (50/50) mixture for 15 min at 37 °C. Reduction was performed with 50 μL of 10 mM dithiothreitol in 50 mM NH4HCO3 for 15 min at 50 °C. Alkylation was performed with 50 μL of 100 mM iodoacetamide for 15 min at room temperature in the dark. The reagents were removed and the gel pieces were dried using 100 μL of CH3CN. Gel pieces were then treated with 0.4 μg trypsin (Sequence grade; Promega, Madison, WI, USA) in 20 μL of 50 mM NH4HCO3 for 45 min at 50 °C under 800 rpm agitation. The supernatant was removed and stored. The gel pieces were extracted with 30 μL of H2O:CH3CN:HCOOH (68 : 30 : 2) mixture for 30 min at 30 °C. Finally, both supernatant extracts were pooled, dried in a vacuum concentrator and resuspended in 13 μL of 0.1% trifluoroacetic acid.

In-solution digestion of unfractionated L. gigantea ASM and AIM was also performed. These samples (0.1 and 1 mg, respectively) were reduced with 50 μL of 10 mM dithiothreitol in 50 mM NH4HCO3 for 30 min at 50 °C. Alkylation was performed with 50 μL of 100 mM iodoacetamide in 50 mM NH4HCO3 for 30 min at room temperature in the dark. The solution was then treated with 1 μg of trypsin (Sequence grade; Promega) in 10 μL of 50 mM NH4HCO3 overnight at 37 °C. The sample was dried in a vacuum concentrator and resuspended in 30 μL of 0.1% trifluoroacetic acid and 2% CH3CN.

Peptide fractionation and data acquisition

MS was performed using a Q-Star XL nanospray quadrupole/time-of-flight tandem mass spectrometer, nanospray-Qq-TOF-MS/MS (Applied Biosystems, Villebon-sur-Yvette, France), coupled to an online nano liquid chromatography system (Ultimate Famos Switchos from Dionex, Amsterdam, The Netherlands). One microlitre of each sample was loaded onto a trap column (PepMap100 C18; 5 μm; 100 Å; 300 μm × 5 mm; Dionex), washed for 3 min at 25 μL·min−1 with 0.05% trifluoroacetic acid/2% acetonitrile, then eluted onto a C18 reverse phase column (PepMap100 C18; 3 μm; 100 Å; 75 μm × 150 mm; Dionex). Peptides were separated at a flow rate of 0.300 μL·min−1 with a linear gradient of 5–80% acetonitrile in 0.1% formic acid over 120 min. MS data were acquired automatically using Analyst QS 1.1 software (Applied Biosystems). Following an MS survey scan over the m/z 400–1600 range, MS/MS spectra were sequentially and dynamically acquired for the three most intense ions over the m/z 65–2000 range. The collision energy was set by the software according to the charge and mass of the precursor ion. MS and MS/MS data were recalibrated using internal reference ions from a trypsin autolysis peptide at m/z 842.51 [M + H]+ and m/z 421.76 [M + 2H]2+.
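The two reference ions used for recalibration are the singly and doubly protonated forms of the same trypsin autolysis peptide. The following Python sketch shows the arithmetic relating the two m/z values; the function names are illustrative, and the proton mass is a standard physical constant rather than a value taken from this study.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (standard constant)


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Neutral peptide mass M from the m/z of an [M + zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a peptide of neutral mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Start from the singly charged reference ion quoted in the text.
peptide_mass = neutral_mass_from_mz(842.51, 1)
doubly_charged = mz_from_neutral_mass(peptide_mass, 2)
print(round(doubly_charged, 2))  # 421.76
```

The computed doubly charged value agrees with the second reference ion quoted in the text.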

MS data analysis

Protein identification was performed using the MASCOT search engine (version 2.1; Matrix Science, London, UK) against protein databases derived from the EST and the genomic libraries of L. gigantea, comprising 252 091 and 23 851 sequences and downloaded (March 2010) from the NCBI server and the L. gigantea genome website, respectively. LC-MS/MS data were searched using carbamidomethylation as a fixed modification, and methionine oxidation as a variable modification. The peptide mass and fragment ion tolerances were set to 0.5 Da. Only protein identifications with at least two different peptide hits and/or that were independently obtained from two different samples were considered to be valid. The peptide hits were manually confirmed by the interpretation of the raw LC-MS/MS spectra with Analyst QS software (Version 1.1). Quality criteria were the peptide MS value, the assignment of major peaks to uninterrupted y- and b-ion series of at least three to four consecutive amino acids and the match with the de novo interpretations proposed by the software.
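The acceptance rule for protein identifications (at least two different peptides and/or detection in two different samples) can be expressed as a simple filter. This is a minimal sketch only: the `ProteinHit` class, the accessions and the peptide strings are hypothetical, and the manual spectrum confirmation step described above is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ProteinHit:
    """Peptides matched to one database entry, grouped by sample of origin."""
    accession: str
    peptides_by_sample: Dict[str, Set[str]] = field(default_factory=dict)


def is_valid_identification(hit: ProteinHit) -> bool:
    """Accept a hit with >= 2 different peptides and/or hits in >= 2 samples."""
    distinct_peptides: Set[str] = set()
    for peptides in hit.peptides_by_sample.values():
        distinct_peptides |= peptides
    samples_with_hits = sum(1 for p in hit.peptides_by_sample.values() if p)
    return len(distinct_peptides) >= 2 or samples_with_hits >= 2


# Hypothetical hits (made-up accessions and peptides, for illustration only).
hits = [
    # two different peptides from a single sample: accepted
    ProteinHit("hyp_A", {"ASM": {"PEPTIDEK", "ANOTHERK"}}),
    # one peptide found independently in two samples: accepted
    ProteinHit("hyp_B", {"ASM": {"PEPTIDEK"}, "AIM": {"PEPTIDEK"}}),
    # one peptide from one sample: rejected
    ProteinHit("hyp_C", {"AIM": {"PEPTIDEK"}}),
]
accepted = [h.accession for h in hits if is_valid_identification(h)]
print(accepted)  # ['hyp_A', 'hyp_B']
```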

Sequence analysis

Protein sequence identification was performed using blastp and tblastn analyses against Swiss-Prot, GenBank's nr db and dbEST using the online tools provided by the UniProt and NCBI servers. Signal peptides were predicted using SignalP 3.0, and conserved domains were predicted using SMART and InterProScan. Following signal peptide removal, theoretical masses and pI values were determined using the ExPASy ProtParam tool.
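To make the theoretical mass step concrete, the following Python sketch removes a predicted signal peptide and sums average residue masses. It is a stand-in for illustration and not the ProtParam implementation: the function names, the example sequence and the signal peptide length are all hypothetical, and the residue masses are standard average values assumed here rather than taken from the paper. The pI calculation is not covered.

```python
# Standard average residue masses in Da (assumed reference values).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.1086, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # Da, added once for the free N- and C-termini


def mature_sequence(precursor: str, signal_length: int) -> str:
    """Drop the first signal_length residues (the predicted signal peptide)."""
    if not 0 <= signal_length < len(precursor):
        raise ValueError("signal peptide length out of range")
    return precursor[signal_length:]


def average_mass(sequence: str) -> float:
    """Theoretical average mass of an unmodified peptide chain, in Da."""
    try:
        residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None
    return residues + WATER_MASS


# Hypothetical precursor and signal peptide length (illustration only).
precursor = "MKLLVVAAGSGDDK"
mature = mature_sequence(precursor, 8)
print(mature, round(average_mass(mature), 2))
```

In practice the signal peptide length would come from the signal peptide prediction, and post-translational modifications such as glycosylation would shift the observed mass away from this theoretical value.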

Sequence alignment and phylogenetic analysis

Representative full-length sequences of the major nonvertebrate metazoan peroxidases were selected from the results of a BLAST search performed with the three peroxidases from L. gigantea SMPs, using UniProt and NCBI online tools, against Swiss-Prot, GenBank's nr db and dbEST. The multiple alignment was created using T-Coffee 6.85 [89] set to standard parameters. Phylogenetic reconstructions were performed using the maximum likelihood method implemented in PhyML from the server [90].


We thank Eric Edsinger-Gonzales for providing shell material. BM thanks Jérome Thomas for handling shell pictures of Lottia and Pinctada. The work of BM was supported by the GDR ADEQUA consortium (Coordinator Nathalie Cochennec-Loreau/Yannick Gueguen), while the work of NG and FM was supported via ANR Accro-Earth (ref. BLAN06-2_159971, coordinator Gilles Ramstein, LSCE). Additional support includes ITN BIOMINTEC and the INTERRVIE program (2010). DJJ is supported by DFG funding to the University of Göttingen through the German Excellence Initiative.