Proteomic fingerprinting facilitates biodiversity assessments in understudied ecosystems: A case study on integrated taxonomy of deep sea copepods

Accurate and reliable biodiversity estimates of marine zooplankton are a prerequisite to understand how changes in diversity can affect whole ecosystems. Species identification in the deep sea is significantly impeded by high numbers of new species and decreasing numbers of taxonomic experts, hampering any assessment of biodiversity. We used in parallel morphological, genetic, and proteomic characteristics of specimens of calanoid copepods from the abyssal South Atlantic to test if proteomic fingerprinting can accelerate estimating biodiversity. We cross‐validated the respective molecular discrimination methods with morphological identifications to establish COI and proteomic reference libraries, as they are a pre‐requisite to assign taxonomic information to the identified molecular species clusters. Due to the high number of new species only 37% of the individuals could be assigned to species or genus level morphologically. COI sequencing was successful for 70% of the specimens analysed, while proteomic fingerprinting was successful for all specimens examined. Predicted species richness based on morphological and molecular methods was 42 morphospecies, 56 molecular operational taxonomic units (MOTUs) and 79 proteomic operational taxonomic units (POTUs), respectively. Species diversity was predicted based on proteomic profiles using hierarchical cluster analysis followed by application of the variance ratio criterion for identification of species clusters. It was comparable to species diversity calculated based on COI sequence distances. Less than 7% of specimens were misidentified by proteomic profiles when compared with COI derived MOTUs, indicating that unsupervised machine learning using solely proteomic data could be used for quickly assessing species diversity.


| INTRODUCTION
In light of the worldwide observed and predicted changes in biodiversity, there is an urgent need to measure spatial and temporal variation in biodiversity from local to global scales, to understand what impacts these changes might have on communities and ecosystems, and how biodiversity can be conserved (Costello, 2001; Ash et al., 2009). The deep sea is largely underinvestigated even though it constitutes the largest environment on earth, and little information is available on the diversity of different metazoan groups (Ramirez-Llodra et al., 2011), which varies on local, regional and global scales (Rex & Etter, 2010; Vinogradova, 1997). The increasing interest in exploring resources provided by the deep sea, such as polymetallic nodules, and the contemplated mining activities, will probably have an immense, yet unforeseeable impact on the inhabiting fauna (Cuyvers et al., 2018). Therefore, a fast and reliable assessment of species diversity is crucial to set baselines in biodiversity, understand the relationship of species to the surrounding environment, detect the influence of anthropogenic disturbances on species compositions and allow a sustainable use of deep-sea resources. Valid species identification is the first step to understand population structures, abundance, and diversity of the communities in the deep sea. While some taxa, such as polychaetes, are thought to be more widespread (Watling et al., 2013), others are found to have many endemic species, many of which occur only as singletons and most of which are new to science (e.g., Rex & Etter, 2010 and references therein). This hinders any assessment of regional diversity and biogeographic patterns.
Calanoid copepods often dominate benthopelagic communities of the deep sea (Wishner, 1980), showing high diversity (Bradford-Grieve, 2004) that in some areas is hypothesized to be comparable to pelagic waters (Renz & Markhaseva, 2015). Approximately 2,800 species of calanoid copepods are currently described (Park & Ferrari, 2009), although a large number of so far undetected species is expected to exist in underinvestigated ecosystems such as the deep sea. This is reflected in the fact that >70% of genera detected in recent years in the abyssal benthopelagic realm of the South Atlantic and Southern Ocean were new to science, with most of them being endemic to the benthopelagic zone and specialized to living within the vicinity of the seabed (Renz & Markhaseva, 2015). The identification of individual calanoids within the benthic boundary layer is significantly impeded by the morphological conservation of the group (Renz & Markhaseva, 2015) and, moreover, as is the case for many taxa, by an increasing lack of taxonomic expertise.
Furthermore, almost no identification literature is available for juvenile stages of calanoid copepods.
Molecular techniques have added a new dimension to the traditional phenotypic approach and allow researchers to overcome taxonomic difficulties in the identification of species with different life history stages. They can improve and contribute to estimating species diversity and were successfully applied to understand the diversity of calanoid copepods (e.g., Blanco-Bercial et al., 2014; Bucklin, Hopcroft, et al., 2010; Laakmann et al., 2013), resolve problematic taxa (e.g., Aarbakke et al., 2014; Hill et al., 2001) or reveal cryptic and new species (Caudill & Bucklin, 2004; Chen & Hare, 2011; Goetze, 2003). In many of these studies, species identification was enabled by analysing the mitochondrial cytochrome c oxidase subunit I gene (COI) as the metazoan species-specific DNA sequence. However, COI barcoding of individual species is time- and cost-intensive.
Proteomic fingerprinting, a tool commonly used for species and strain identification in microbiology (as reviewed by Croxatto et al., 2012), is a relatively new approach in metazoan studies. Proteomic spectra are determined using matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS). Pilot studies on proteomic fingerprinting of metazoan taxa indicate that different species can be distinguished (e.g., Feltens et al., 2010; Kaufmann et al., 2012; Mazzeo et al., 2008; Volta et al., 2012), including closely related freshwater copepod species, calanoid copepods from the North Sea (Laakmann et al., 2013) and tropical Atlantic (Bode et al., 2017; Kaiser et al., 2018) and benthic harpacticoid species (Rossel & Martínez Arbizu, 2018, 2019). Most of these studies either aimed to prove the general concept of proteomic fingerprints for a certain taxon group or to apply the method to the field using a reference library with pre-identified species. However, in the benthopelagic layers of the deep sea, we expect a high number of new, undescribed species, making the establishment of such a reference library rather time- and cost-intensive.
The use of unsupervised classification methods with proteomic fingerprints is a promising technique that may allow a prediction of species diversity without prior species identification (Rossel & Martínez Arbizu, 2020). It implies that specimens can be separated on species level using the dissimilarity of their MALDI spectra as the only source of information. The aim of our study was therefore to (i) find an adequate unsupervised technique to estimate species richness of a community from proteomic profiles only, (ii) evaluate resolution, accuracy and efficiency of species separation based on proteomic fingerprinting using cross-validation with morphology and COI sequencing, and (iii) provide for the first time data on diversity and composition of benthopelagic abyssal copepods using an integrated morphological, genetic and proteomic approach.

| Sampling
Calanoid copepods were collected in the benthopelagic boundary layer (BBL) during RV Meteor cruise M79-1 (Supporting Information 1) (Project DIVA 3) in the South Atlantic Ocean from station 580 (14°58.91' S, 29°56.48' W) at a depth of 5139 m using an epibenthic sledge (EBS; Brenke, 2005). The sledge consists of a closable 500 µm epi- and supranet, each with an opening of 1 m width and 0.35 m height. Both nets end up in a cod end with a mesh size of 300 µm.
The net openings are positioned 0.2-0.6 m (epinet) and 0.77-1.12 m above the seabed (supranet). The EBS was hauled over the seabed at 1 knot for 10 min. Nets were opened before starting the trawl and closed before starting to haul the net from the seabed. On board, the samples were immediately fixed in 96% pure undenatured ethanol.
Ethanol was exchanged within 24 h of sampling and samples were constantly cooled for molecular analyses.

| Sorting, identification, and specimen preparation for molecular analysis
Calanoid copepods were sorted in the laboratory and all individuals were classified into adult females, adult males and copepodites (altogether 358 individuals, Table 1). Adult stages were identified to genus or species level if possible, using a stereomicroscope and a microscope, and assigned to a morphotype. In some cases, it was necessary to dissect oral limbs and swimming legs to allow for genus identification. Individuals of all morphotypes were transferred to the collection of the DZMB to allow for later descriptions of new species, leaving 259 specimens for molecular analyses.
All of the 259 individuals were cut in half for further molecular analysis to allow for concurrent measurements of molecular genetic analysis and proteomic fingerprinting from the same specimens.
Molecular genetic analyses were conducted using the metasome and the urosome, while proteomic mass spectra were established using only the cephalosome of the individuals (except for individuals > 4 mm, where only the anterior part of the cephalosome was taken for analysis). Pre-tests with epipelagic copepods showed no significant differences in proteomic composition using the whole body or only parts of the cephalosome (personal observation).
Sequences were assembled, edited and checked for reading frames using the software GENEIOUS prime 2019 created by Biomatters (available from http://www.geneious.com/). The data sets were translated into amino acid alignments and checked for stop codons to avoid pseudogenes. Using BLAST (Altschul et al., 1990), sequences were compared with those available in GenBank. All new sequences were deposited in GenBank (Table 1). Multiple alignments of COI were performed in MEGA version 6.06 (Tamura et al., 2013) using default settings and the MUSCLE algorithm (Edgar, 2004).

| Protein mass fingerprinting analysis (MALDI-TOF MS)
Proteomic profiles were determined for 259 specimens. The copepod tissue was quickly dried at room temperature. Depending on sample size 5-10 µl matrix solution (α-cyano-4-hydroxycinnamic acid as saturated solution in 50% acetonitrile, 47.5% LC-MS grade water, and 2.5% trifluoroacetic acid) was added. After at least 10 min extraction 1.2 µl of each sample was added onto the target plate.
Protein mass spectra were measured from 2 to 20 kDa using a linear-mode MALDI-TOF system (Microflex LT/SH, Bruker Daltonics). Peak intensities were analysed during random measurement in the range between 2 and 10 kDa using a centroid peak detection algorithm, a signal-to-noise threshold of 2 and a minimum intensity threshold of 400 with a peak resolution higher than 400 for mass spectra evaluation. The proteins/oligonucleotide method was employed for fuzzy control with a maximal resolution 10 times above the threshold. For each sample, 240 satisfactory shots were summed up. Spectra were analysed using the R packages MALDIquant (Gibb & Strimmer, 2012) and MALDIquantForeign (Gibb, 2013). Peaks were detected using a signal-to-noise ratio (SNR) of 7 after square-root transformation, Savitzky-Golay smoothing, baseline removal (SNIP algorithm) and normalization (TIC) of spectra. Peaks were repeatedly binned until the intensity matrix reached a stable peak number (tolerance 0.002, strict approach) and missing values were interpolated from the corresponding spectrum. All signals with an SNR below 1.75 were assumed to be below the detection limit and set to zero in the final peak matrix.
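The order of the preprocessing steps can be illustrated with a minimal, standard-library Python sketch. It is not the MALDIquant implementation: the SNIP baseline is a simplified variant, Savitzky-Golay smoothing is omitted for brevity, and the noise estimate (median absolute deviation) as well as all function names and the `half_window` parameter are assumptions made for illustration only.

```python
import math
from statistics import median

def sqrt_transform(intensities):
    """Variance-stabilising square-root transformation."""
    return [math.sqrt(max(v, 0.0)) for v in intensities]

def snip_baseline(intensities, iterations=20):
    """Simplified SNIP: clip each point to the mean of its neighbours at distance p."""
    base = list(intensities)
    for p in range(1, iterations + 1):
        prev = list(base)
        for i in range(p, len(base) - p):
            base[i] = min(prev[i], 0.5 * (prev[i - p] + prev[i + p]))
    return base

def tic_normalize(intensities):
    """Scale a spectrum so that its total ion current (sum of intensities) is 1."""
    total = sum(intensities)
    return [v / total for v in intensities] if total > 0 else list(intensities)

def mad_noise(intensities):
    """Noise level as scaled median absolute deviation (assumed estimator)."""
    med = median(intensities)
    return 1.4826 * median(abs(v - med) for v in intensities)

def detect_peaks(mz, intensities, snr=7.0, half_window=2):
    """Keep local maxima whose intensity exceeds snr times the noise level."""
    noise = mad_noise(intensities)
    peaks = []
    for i in range(half_window, len(intensities) - half_window):
        window = intensities[i - half_window:i + half_window + 1]
        if intensities[i] == max(window) and intensities[i] > snr * noise:
            peaks.append((mz[i], intensities[i]))
    return peaks

def preprocess(mz, raw, snr=7.0):
    """Square-root transform, baseline removal, TIC normalization, peak picking."""
    transformed = sqrt_transform(raw)
    baseline = snip_baseline(transformed)
    corrected = [v - b for v, b in zip(transformed, baseline)]
    return detect_peaks(mz, tic_normalize(corrected), snr=snr)
```

Calling `preprocess(mz, raw)` on one spectrum returns a list of `(m/z, intensity)` pairs. Binning peaks across spectra and the second threshold (SNR below 1.75 set to zero) would follow as separate steps.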

| COI barcodes
TABLE 1 Calanoid copepod specimens from the South Atlantic, Project DIVA 3, station 580, for morphological identification and molecular analysis; for the analysis using matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) spectra, the anterior part (cephalosome) of the individuals was used; for the molecular genetic analysis of COI, the posterior part (metasome + urosome) of the individuals was used

A COI fragment of 658 base pairs (bp) (minimum sequence length 334 bp) was analysed by neighbour-joining analysis based on uncorrected pairwise genetic distances using the software MEGA version 6.06 (Tamura et al., 2013). For comparative purposes, neighbour-joining analysis based on the K80 model (Kimura 2-parameter, K2P; Kimura, 1980) was performed with the following settings: equal base frequencies, one transition and one transversion rate, and 10,000 bootstrap replicates, again using MEGA version 6.06. Both models resulted in comparable tree topologies. We therefore chose the tree topology based on uncorrected p-distances to represent our data set and to determine molecular operational taxonomic units (MOTUs).
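For readers unfamiliar with the two distance measures, the following Python sketch computes the uncorrected p-distance and the K2P distance for one pair of aligned sequences. It is a textbook illustration rather than the MEGA implementation; the function names are hypothetical, and skipping alignment columns with gaps or ambiguous bases (pairwise deletion) is an assumption.

```python
import math

PURINES = {"A", "G"}

def count_differences(seq_a, seq_b):
    """Return (compared sites, transitions, transversions) for two aligned sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumed pairwise deletion)
        sites += 1
        if a == b:
            continue
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions

def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    return (ts + tv) / sites

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance: d = -1/2 ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For closely related sequences the two measures are nearly identical, which is consistent with both models yielding comparable tree topologies; K2P exceeds the p-distance increasingly as divergence grows.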

Four commonly used internal cluster criteria were tested for their ability to separate specimens at species level using the R package NbClust (Charrad et al., 2014; Table 3): the variance ratio criterion by Caliński and Harabasz (1974), the Dunn index (Dunn, 1974), the silhouette analysis (Rousseeuw, 1987) and the gap analysis (Tibshirani et al., 2001). Each of these iteratively applied algorithms uses a specific measure to find significant clusters in the data set. The first two follow a quite similar approach in providing the number of clusters that are best separated and most compact. The Dunn index uses a ratio of separation (the minimum of pairwise distances between clusters) and compactness (the maximum of pairwise distances within a cluster), while the variance ratio criterion is based on the ratio of the between-cluster sum of squares to the within-cluster sum of squares.
The latter also includes a penalty factor for the number of clusters tested. The often-applied silhouette analysis uses the difference between (normalized) separation and compactness instead of a ratio.
For each data point a silhouette width is calculated, and the average of these widths then provides the validation criterion for the tested solution. Another approach is followed in the gap analysis, which compares the compactness of the clusters within the data set with that of clusters of a random data set to validate whether the solution is significantly different from a random structure. In order to test which cluster validation method is best suited to identify species-level structures in the data, we applied all criteria to the subset of samples whose species-level identity was verified (labelled) by COI barcode (N = 182). We evaluated the consistency between MALDI data and MOTUs using four external cluster validation criteria provided by the R package ClusterR: the Rand, Hubert & Arabie adjusted Rand, Fowlkes & Mallows and Jaccard measures (Table 3; for all indices as well as a detailed discussion of them, see Jaccard, 1908; Rand, 1971; Fowlkes & Mallows, 1983; Hubert & Arabie, 1985; Milligan & Cooper, 1986; Wagner & Wagner, 2007). Due to the good clustering results of the criterion by Caliński and Harabasz (1974), we then used this criterion to estimate the total number of species based on all MALDI samples (N = 259).
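The variance ratio criterion can be sketched in a few lines of standard-library Python. This is a toy illustration, not the NbClust workflow: single linkage is assumed for the hierarchical clustering (mirroring the consensus clustering step), the naive agglomeration is only suitable for small examples, and all function names are hypothetical. The index is computed as CH(k) = [BSS/(k - 1)] / [WSS/(n - k)], and the first local maximum over k is taken as the estimated number of clusters.

```python
import math

def hellinger(rows):
    """Hellinger transformation: square root of row-wise relative intensities."""
    return [[math.sqrt(v / sum(row)) for v in row] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, k):
    """Naive single-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters

def calinski_harabasz(points, clusters):
    """Variance ratio criterion: (BSS / (k - 1)) / (WSS / (n - k))."""
    n, k, dim = len(points), len(clusters), len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    bss = wss = 0.0
    for members in clusters:
        centroid = [sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)]
        bss += len(members) * euclidean(centroid, grand) ** 2
        wss += sum(euclidean(points[i], centroid) ** 2 for i in members)
    return math.inf if wss == 0 else (bss / (k - 1)) / (wss / (n - k))

def first_local_maximum(scores):
    """Smallest k whose score exceeds the scores of both neighbouring k."""
    ks = sorted(scores)
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        if scores[k] > scores[prev] and scores[k] > scores[nxt]:
            return k
    return None
```

The within-cluster sum of squares shrinks towards zero as k approaches the number of specimens, so the index can grow without bound there. Searching for the first local maximum, rather than the global one, sidesteps this.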

| Consensus clustering of protein mass spectra
To determine the stability of species clusters, a consensus clustering was performed using HC with single linkage in the R package ConsensusClusterPlus (Wilkerson & Hayes, 2010). A consensus matrix was calculated based on 100 repetitions of HC using Euclidean distances of Hellinger-transformed peak intensities based on subsets of features (f; here compounds) and samples (s), respectively (i.e., f = 0.8/s = 0.8, f = 1/s = 0.8, f = 0.8/s = 1, f = 0.5/s = 1). Outer clustering of the consensus matrix was again done using HC with single linkage and cluster stability was inspected visually. The number of clusters was inferred from each consensus analysis using the proportion of ambiguous clustering (PAC) as internal validation measure (Șenbabaoğlu et al., 2014).
PAC is defined as the fraction of sample pairs with consensus values in the interval between 0 (i.e., sample pairs that are never in the same cluster) and 1 (i.e., sample pairs that are always in the same cluster). In a truly stable clustering, the consensus matrix contains only 0 and 1, and the PAC has a score of 0. Here, we used 0.1 as the lower and 0.9 as the upper boundary of this interval. From this we inferred the optimal number of clusters as the one with the lowest PAC. As for the agglomerative clustering, we applied consensus clustering first to the labelled subset of specimens (N = 182) to validate the method and then to the whole data set (N = 259), using all samples and only 50% of features (s = 1, f = 0.5), to predict species numbers.
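The PAC calculation described above can be sketched as follows. This is a minimal Python illustration, not the ConsensusClusterPlus code: the representation of each clustering run as a dictionary of sample index to cluster label, the function names, and the use of strict inequalities at the 0.1 and 0.9 boundaries are assumptions.

```python
from itertools import combinations

def consensus_matrix(runs, n):
    """Fraction of runs in which each sample pair shares a cluster.

    runs: list of dicts {sample_index: cluster_label}, one per resampled
    clustering; a sample missing from a dict was not drawn in that run.
    Returns {(i, j): consensus value} for pairs drawn together at least once.
    """
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    return {(i, j): together[i][j] / sampled[i][j]
            for i, j in combinations(range(n), 2) if sampled[i][j] > 0}

def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of sample pairs with an ambiguous consensus value."""
    values = list(consensus.values())
    ambiguous = sum(1 for v in values if lower < v < upper)
    return ambiguous / len(values)
```

Given a hypothetical dictionary `pac_by_k` that maps each tested number of clusters to its PAC score, the preferred solution would be `min(pac_by_k, key=pac_by_k.get)`.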

| Calculation of diversity
We calculated diversity in the calanoid community using the Shannon
Nine specimens could only be identified to family or higher taxon level, as these individuals most probably belonged to new, undescribed genera. Only 12 specimens out of 358 (3.3%) could be assigned to six already described species (Xancithrix ohmani,  Renz & Markhaseva, 2015).

| Discrimination of operational taxonomic units based on proteomic profiles (POTU)
Proteome profiles could be successfully measured for all of the 259 specimens that went into molecular analysis. In total 588 molecular compounds (with a signal to noise ratio >7 in at least one organism in the data set) were determined and used for all further analysis.
For 182 out of these 259 specimens COI barcodes were successfully determined and these specimens were used to validate the ap- set. Both the variance ratio criterion (Caliński and Harabasz, 1974) and the Dunn index (Dunn, 1974) provided an optimal solution of 182 clusters, i.e., each specimen formed its own cluster. Both methods also showed a distinct local maximum at a cluster number of 58 and 53, respectively (Figure 3a,b). The commonly used gap analysis (Tibshirani et al., 2001) and the silhouette analysis (Rousseeuw, 1987) did not reveal any of this substructure, leading to an extreme under- and overestimation, respectively, when compared with MOTUs (Figure 3c,d). External cluster validation revealed the highest success rates for delimiting clusters at MOTU level (and thus probably species level) using the variance ratio criterion (Caliński and Harabasz, 1974). We therefore defined these clusters as POTUs, and their intercomparison with the identified MOTUs revealed that 13 out of 182 individuals (7%) were misidentified by proteomic-based clustering. However, MOTU identification in two of these clusters, (M42 Xanthocalanus sp., M40 Indet., M34 Indet.) and (M46 Indet., M32 Indet.), derived from short COI sequence lengths only, and MOTU delimitation within these clusters strongly depended on the model applied (personal observation). Thus, consistency between POTUs and species may be even higher.
FIGURE 2 Euclidean distance based on Hellinger-transformed peak intensities between (inter-spec.) and within (intra-spec.) POTUs: boxplots comprise true positives and true negatives, i.e., samples with consistent classification with MOTU assignment; red dots are Euclidean distances of false-positive and false-negative samples, i.e., distances between the wrongly assigned samples and all other samples in the false cluster and the correct cluster, respectively
Euclidean distances of false-positive and false-negative assigned specimens were not distinctly different from the correct inter-and intraspecific distances, respectively (Figure 2).
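The four external validation measures used to compare POTU and MOTU assignments are all based on counting sample pairs. They can be sketched in standard-library Python as below. This is a generic illustration rather than the ClusterR code; the function names and the toy labels in the usage note are hypothetical, and the formulas are undefined (division by zero) when a partition consists only of singletons or of a single cluster.

```python
from collections import Counter
from math import comb, sqrt

def pair_counts(labels_a, labels_b):
    """Count pairs grouped together in both partitions, in a, in b, and in total."""
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return both, same_a, same_b, comb(len(labels_a), 2)

def external_indices(labels_a, labels_b):
    """Rand, adjusted Rand (Hubert & Arabie), Fowlkes & Mallows and Jaccard indices."""
    both, same_a, same_b, total = pair_counts(labels_a, labels_b)
    expected = same_a * same_b / total  # pairs expected together by chance
    return {
        "rand": (total + 2 * both - same_a - same_b) / total,
        "adjusted_rand": (both - expected) / (0.5 * (same_a + same_b) - expected),
        "fowlkes_mallows": both / sqrt(same_a * same_b),
        "jaccard": both / (same_a + same_b - both),
    }
```

For example, `external_indices(["M1", "M1", "M1", "M2", "M2", "M3"], ["P1", "P1", "P2", "P2", "P2", "P3"])` compares a toy MOTU assignment with a toy POTU assignment. All four indices equal 1 when the two partitions are identical.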
Consensus clustering was used to estimate cluster stability.
Between 52 and 60 clusters were inferred from consensus clustering followed by PAC, depending on the percentage of samples and features (compounds) included (Table 3, Figure 4, Figure 5). The substructure of most clusters was low, reflecting overall high stability of the clustering results. However, some clusters with stronger substructure (e.g., M18 Xanthocalanus sp.) and some linked cluster groups (e.g., M52 Indet., M15 Indet., M21 Indet.) are more sensitive to incorrect delimitation. Clustering based on 100% of samples and 50% of compounds delimited 59 POTUs and misidentified 11 out of 182 specimens (6%) when compared to MOTUs. Six of the multi-MOTU clusters displayed no evident substructure.

| DISCUSSION
The Millennium Ecosystem Assessment provided strong evidence that the abundance of many species is declining, and that species distributions have been substantially altered due to manifold anthropogenic activities (Millennium Ecosystem Assessment, 2005). A fast and reliable provision of comprehensive baselines for biodiversity is urgently needed for ecosystem management, yet remains a major challenge, specifically in understudied marine ecosystems such as the deep sea. Next to morphological identification, DNA-based methods such as barcoding/metabarcoding, as well as the recently emerged rapid analyses using MALDI-TOF mass spectrometry to identify specimens using proteomic fingerprinting, were shown to accelerate the process of specimen identification in biodiversity assessments. A crucial step in using these methods is to build reference libraries that connect morphological data to species-specific COI barcodes and proteome fingerprints. Here, for the first time, we report a study assessing species biodiversity of the highly specialized benthopelagic calanoid copepod community in the deep sea below 5,000 m in the South Atlantic, by a combined approach of morphological and molecular methods.
FIGURE 3 Identification of number of species clusters based on agglomerative hierarchical clustering using Euclidean distances derived from proteomic composition and internal cluster validation by (a) the variance ratio (Caliński and Harabasz, 1974), (b) the Dunn index (Dunn, 1974), (c) the gap analysis (Tibshirani et al., 2001) and (d) the silhouette analysis (Rousseeuw, 1987); red dots in (a) and (b) indicate the first maximum in the variance ratio and Dunn index, respectively

| Species identification by a combined morphological and molecular approach -a methodological evaluation
Four criteria of a method are essential for biodiversity assessments: (i) the resolution, i.e., the taxonomic level of discrimination that can be reached, (ii) the accuracy, i.e., the percentage of correct classification, (iii) the net identification rate, i.e., the proportion of all available specimens of a sample that can be identified to species level, which is a result of resolution, accuracy and loss rate during the process and, finally, (iv) the cost-benefit ratio, i.e., the effort in
FIGURE 4 Hierarchical consensus clustering of proteomic profiles with 100% of samples and 50% of features (compounds) applying the proportion of ambiguous clustering as internal validation measure (Șenbabaoğlu et al., 2014) to identify cluster stability (here: 59 clusters). Colours of the clusters refer to the MOTUs as identified by COI; MOTU number and, where possible, the assigned name of the MOTU is provided; * indicates individuals where MOTU and POTU delimitation was not consistent
FIGURE 5 Identification of number of species clusters based on consensus clustering with 100% of samples and 50% of features (compounds) using Euclidean distances derived from proteomic composition and the proportion of ambiguous clustering (PAC) as internal validation measure (Șenbabaoğlu et al., 2014); the red dot indicates the first minimum of PAC

| Taxonomic resolution
The barcode region of the mitochondrial COI gene is a well-established character for species-level identification of many marine metazoan taxa and has been shown to be successful in a vast range of studies on calanoid copepods (e.g., Blanco-Bercial et al., 2014; Bucklin, Hopcroft, et al., 2010; Laakmann et al., 2013; Machida et al., 2009). COI sequence analysis revealed 60 MOTUs when using the proposed sequence similarity of 97% (Hebert, Ratnasingham, et al., 2003
Inter- and intraspecific variation of proteomic fingerprints is far less understood. Several studies have shown that copepods exhibit clear differences between species in proteomic mass spectra (Bode et al., 2017; Kaiser et al., 2018; Laakmann et al., 2013; Riccardi et al., 2012), and that these spectra can be used for species identification using supervised machine learning techniques with pre-established libraries (Rossel and Martínez Arbizu, 2018). Yet, no gold standard for unsupervised species delimitation has been developed. A previous study successfully applied partitioning around medoids (PAM) clustering in combination with the silhouette index to predict species number (Rossel and Martínez Arbizu, 2020). PAM and also k-means clustering were not applicable to our study due to the expected unbalanced data set with most probably many singletons and small sample size. Thus, we applied agglomerative hierarchical clustering (HC) in combination with different cluster validation methods.
While the silhouette index seems to be very problematic for data sets with many singletons, the gap analysis only resolved larger structures in the data set. The two criteria based on the ratio of a separation and a compactness measure of the clusters (Caliński-Harabasz and Dunn) were most accurate in delimiting POTUs at species level. These strong differences emphasize that POTU identification is highly sensitive to the applied approach, especially in "difficult" data sets with many singletons and/or unbalanced composition. However, regardless of the unsupervised algorithm applied, all approaches require a consistent, stable and taxon-independent species gap in the similarity of mass spectra. At least for the species included in this study this prerequisite seems to be fulfilled, as classification of MOTUs and POTUs was generally consistent. More validation studies on species-level delimitation as well as the development of standard pipelines will be needed to establish proteomic fingerprinting as an assessment tool for biodiversity in understudied species communities.
In conclusion, taxonomic resolution of all three methods was similar, with morphospecies, MOTUs and POTUs most likely representing distinct biological species.

| Accuracy
Another factor to be evaluated is the accuracy, i.e., percentage of specimens that can be correctly assigned to a species or genus name. This process is inevitably linked to morphological identification based on expert taxonomic knowledge, either during the study itself or by information coming from an integrative reference library.
The genetic distance-based assignment of MOTUs using the basic local alignment search tool (BLAST) method (Altschul et al., 1990) provided the opportunity to assign MOTUs to already described and sequenced species, thereby adding to the species information obtained by the data set. This supports the importance of taxonomically comprehensive DNA barcode databases when morphological identification is not possible, for example, in juveniles. At present, proteomic profiles still require morphological or genetic intercalibration if more information than only diversity or species number is needed. Standard pipelines for supervised species identification in combination with central proteomic libraries still need to be established. The intercomparison of MOTUs and POTUs revealed that 7% of specimens were assigned differently. A slightly smaller error rate of 4% has been observed applying clustering on simulated data sets (Rossel and Martínez Arbizu, 2020). Misidentification of single specimens was neither detectable using a direct comparison of sample distances nor by consensus clustering, indicating that misidentification is probably caused by variance in the proteomic profiles.
The stability of clusters was relatively high; however, some clusters seem to be more predestined for unstable delimitation than others.
Overall, it is evident that the accuracy of proteomic fingerprinting for species discrimination of calanoid copepods is lower than that of COI sequencing and of morphological identification.

| Net identification rate
The net identification rate was lowest for morphological identification, as it allowed morphospecies delimitation for only 37% of the specimens found, due to the tedious work and expert knowledge required for species identification, as well as the almost exclusive limitation to adult individuals. Molecular methods contributed significantly to the estimation of diversity by providing the possibility of including juvenile stages into the analysis. COI sequencing was successful in 70% of the specimens that were included in the molecular analyses. The loss rate of 30% most likely originated from the need for taxa-specific optimization, i.e., the sometimes low affinity of the universal COI barcoding primers by Folmer et al. (1994)
Barcoding also requires experienced staff (at least well-trained technician level) and the cost of consumables is generally high, more than 5 Euro per specimen. MALDI-TOF, on the other hand, is a fast and low-cost (0.1 Euro consumables) method, which can be accomplished easily after very short training, resulting in relatively low personnel costs.

| Species richness and diversity
The deep sea is by far less explored than coastal areas, although it occupies 60% of the planet (Costello et al., 2010), and its stability and large area may accommodate a high species richness (Grassle, 1989). Calanoid copepods form the most numerous taxon in pelagic waters, often making up 80% of the zooplankton biomass in the water column (Mauchline, 1998
In conclusion, cross-validation of proteomic fingerprinting with morphology and COI sequencing proved a generally consistent species discrimination of calanoid copepods for all three methods.
Based on this, proteomic fingerprinting added significantly to the biodiversity assessment, as it was the only method allowing for a successful analysis of all individuals examined. With this method, an extremely high species diversity of calanoids, as well as a high degree of singletons, could be detected. Morphological information revealed that most of these species are new to science. Therefore, we consider proteomic fingerprinting to be an accurate, fast, inexpensive, and therefore highly promising assessment tool, which can provide comprehensive baselines of species diversity, not only in epipelagic monitoring studies, but also in deep-sea studies with a high number of unknown species. The method is still in its infancy in marine science. Reference libraries allowing taxonomic information to be assigned to species-specific proteomic features need to be established and filled before the method can be applied as a "stand-alone" tool. Also, we will have to enhance our understanding of the uncertainties and pitfalls of the method. However, although taxonomic expertise remains the keystone for any biodiversity assessment, and COI barcodes provide reliable information for species discrimination and assignment, an integration of proteomic fingerprinting will clearly enhance and accelerate the identification processes in biodiversity studies.