Analytical flow cytometry (FCM) is well suited for the analysis of phytoplankton communities in fresh and sea waters. Measuring the light scatter and autofluorescence properties of particles by FCM provides optical fingerprints, which enable different phytoplankton groups to be separated. A submersible version of the CytoSense flow cytometer (the CytoSub) has been designed for in situ autonomous sampling and analysis, making it possible to monitor phytoplankton at short temporal scales and obtain accurate information about its dynamics. For data analysis, manual clustering is usually performed a posteriori: data are displayed on histograms and scatterplots, and groups are discriminated by drawing and combining regions (gating). The purpose of this study is to make the data analysis more objective by applying a nonmanual, consistent method that automatically discriminates clusters of particles. In other words, we seek partitioning methods based on the optical fingerprint of each particle. Because the CytoSense records the full pulse shape for each variable, it quickly generates a large and complex dataset. The shape, length, and area of each curve were chosen as descriptors for the analysis. To test the method, numerical experiments were performed on simulated curves. The method was then applied to, and validated on, data from phytoplankton cultures. Promising results were obtained with a mixture of species whose optical fingerprints overlapped considerably and could not be accurately separated using manual gating. © 2011 International Society for Advancement of Cytometry
In the euphotic layer of the ocean, oxygenic photosynthesis is responsible for virtually all biochemical production of organic matter, resulting in an annual flux of 4 × 10¹⁵ moles of carbon (1). This biological pump constitutes the most important carbon sink at the oceanic scale, keeping the atmospheric carbon dioxide concentration 150 to 200 ppmv lower than it would be without phytoplankton in the ocean (2). Marine primary production represents 45% of total primary production on Earth (1), whereas marine phytoplankton biomass accounts for only 2% of global photosynthetic biomass. This high productivity can be explained by high potential growth rates and short life cycles (3). Biological uptake of carbon is almost entirely carried out by small phytoplankton (<10 μm), under the control of light, nutrients (4), grazing, and viral lysis.
Because of the complex origin of the chloroplast, phytoplankton constitute a polyphyletic taxon (5, 6). This deep taxonomic diversity results in high functional diversity (7): evolutionary processes that optimized light harvesting have produced different sets of chlorophylls and accessory pigments (carotenoids, phycobiliproteins, etc.) (8). Phytoplankton are also morphologically diverse, varying in shape and size as a result of adaptation to physical processes (such as hydrodynamics and irradiance), grazing (formation of colonies, extracellular spikes), and nutrient uptake (variation of the volume/surface ratio) (9–13).
To understand the complex dynamics of the phytoplankton community and how biotic and abiotic factors control them, accurate information is needed at various spatial (from the cell to the ocean) and temporal (from hours to years) scales. Taxonomic analysis by optical microscopy has reached its limits, as it is time consuming and requires experienced taxonomists (14); consequently, high-frequency analysis (typically several times per hour) remains out of reach with this approach. Faster techniques such as high pressure liquid chromatography (HPLC) and spectrofluorimetry have therefore been developed and successfully applied to aquatic studies, but they provide only bulk measurements. Analytical flow cytometry (FCM) has become an attractive alternative, as it can measure at high frequency and at the single-particle level. For each particle passing through a light source (typically one or several laser beams), a set of real values related to light scattering and fluorescence (natural or induced) is recorded.
Although it is an ataxonomic method, FCM allows particle clusters within an aquatic sample to be discriminated based on their optical fingerprints (fluorescence signatures and scattering properties). Over the last 20 years, flow cytometers have been designed for marine applications (10), among them the CytoSense instrument (Cytobuoy B.V.). A particular feature of this instrument is its capacity to record the full pulse shape along each particle for both scatter and fluorescence signals (15). Scanning cells sequentially in this way provides more information on the morphological variability within the phytoplankton community. Monitoring phytoplankton clusters at high frequency has revealed unexpected dynamics in relation to strong wind events and physicochemical conditions (16). Additionally, Thyssen et al. demonstrated the capability of this flow cytometer to identify groups that could not be discerned with more conventional instruments (16, 17).
After collecting data with the CytoSense, the usual approach is to reduce each pulse to classical descriptors (inertia, fill factor, asymmetry, number of peaks, length, etc.) using the Cytoclus© software (Cytobuoy B.V., The Netherlands). Data are displayed as scatterplots and histograms that facilitate the visualization and identification of particle clusters with similar optical properties. Clusters are usually created by manually drawing and combining regions (gating). Defining groups in this arbitrary way is not always objective and can lead to errors, in particular when clusters overlap or shift position, or when different pulse shapes yield similar classical descriptors. The aim of this study is to provide an observer-independent, consistent method to analyze the data and define clusters automatically (Fig. 1). Despite the many established tools available for multivariate analysis, few researchers have worked on automating FCM data processing. The major advances have been obtained with artificial neural networks (18–23), mixture models (24–26), and discriminant analysis (27, 28). Because longitudinal information related to particle morphology appears clearly in the pulse shapes, one of our goals is to assess to what extent the statistical analysis of functions (29), i.e., of the shapes of the full raw pulses, offers an advantage over using only the usual descriptors. The shape, length, and area of the recorded curves were therefore chosen as descriptors in this study. Several tests were performed on simulated pulses to assess the efficiency of the clustering method, which was then validated on biological data collected from phytoplankton cultures.
Figure 1. General scheme of the proposed method (MDS: multidimensional scaling). Data collected by the CytoSub (top right) form a large and complex dataset (top left): five pulse shape signals (forward and sideward light scatter, FWS and SWS, respectively; red, orange, and yellow fluorescence, FLR, FLO, and FLY, respectively). From the raw signals, pulse lengths and amplitudes (conventional descriptors) and pulse shapes (functional descriptors) are computed. Distance matrices are computed from these descriptors, and classification methods are then applied to find the various clusters (bottom right). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
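To make the two kinds of descriptor in Figure 1 concrete, here is a minimal Python sketch, assuming each pulse is a list of non-negative samples taken at a constant interval `dt`. The helpers `pulse_length`, `pulse_area`, `pulse_amplitude`, and `pulse_shape` are hypothetical: the above-threshold definition of length, the trapezoidal area, and the resampling to a fixed 32-point grid scaled to unit area are illustrative choices, not the CytoSense or CytoClus definitions.

```python
def pulse_length(pulse, dt=1.0, threshold=0.0):
    """Length: time during which the signal exceeds `threshold` (assumed definition)."""
    return dt * sum(1 for v in pulse if v > threshold)

def pulse_area(pulse, dt=1.0):
    """Area under the pulse, by the trapezoidal rule."""
    return dt * sum((a + b) / 2.0 for a, b in zip(pulse, pulse[1:]))

def pulse_amplitude(pulse):
    """Amplitude: maximum value of the signal."""
    return max(pulse)

def pulse_shape(pulse, n_points=32):
    """Functional descriptor: the pulse linearly resampled onto `n_points`
    evenly spaced positions (so pulses of any length share one grid), then
    divided by its area so that only the shape, not size or brightness, remains.
    Requires at least two samples."""
    m = len(pulse)
    resampled = []
    for i in range(n_points):
        x = i * (m - 1) / (n_points - 1)   # position on the original index axis
        j = min(int(x), m - 2)             # left neighbour used for interpolation
        frac = x - j
        resampled.append(pulse[j] * (1 - frac) + pulse[j + 1] * frac)
    area = pulse_area(resampled, dt=1.0 / (n_points - 1))
    return [v / area for v in resampled]

pulse = [0, 1, 3, 4, 3, 1, 0]
print(pulse_length(pulse), pulse_area(pulse), pulse_amplitude(pulse))  # 5.0 12.0 4
shape = pulse_shape(pulse)
print(len(shape))                                                      # 32
```

The first three functions return one real number per pulse, which is what gating works with; `pulse_shape` keeps a whole curve, so two particles with equal length, area, and amplitude can still be told apart by their shapes.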
Discussion and Conclusion
Flow cytometric analysis of aquatic microorganisms is routinely used to assess their abundance, diversity, and dynamics (10). Data analysis for conventional flow cytometers is based on a set of real values (peak, area, pulse width) corresponding to the light scatter and fluorescence signals recorded for each particle as it is intercepted by the light source. Clusters are drawn on various histograms and dotplots, and their interpretation relies on the operator's expertise. This approach is particularly well suited to specific observations in samples with known groups (cultures, previously analyzed samples). In aquatic environmental studies, the main purpose of conventional flow cytometric analysis is to define these groups, count the cells, and obtain information at the group level (basic statistics of the light scatter and fluorescence signals, for instance mean, median, mode, and standard deviation). In aquatic environments, phytoplankton diversity is huge, comprising thousands of species with various shapes and spanning four orders of magnitude in size. Some species are harmful and need to be monitored at high frequency to detect any health risk as early as possible.
Advances in electronics and computing have led to more compact instruments able to record a growing number of variables (10). Some instruments can even capture images of the particles as they flow through the cytometer. Particular models such as the CytoSub (15) and the FlowCytobot (Heidi Sosik and Rob Olson, WHOI) have been developed specifically for the marine field (41). Once deployed in situ (moored or on a buoy), these instruments can perform automated analyses of phytoplankton cells at a scheduled sampling frequency. With the CytoSub, up to six analyses per hour can be scheduled, quickly generating a huge quantity of data. Such high-frequency analysis provides information that is out of reach with classical methods (16). Automating the analysis therefore becomes critical: performing it manually would quickly become impractical (time consuming, lack of objectivity in the clustering, etc.).
Phytoplankton analysis with the CytoSub flow cytometer is innovative in that it is based on recording the pulse shape along each particle. It is a compromise between full taxonomic identification and conventional flow cytometry: it provides information on phytoplankton diversity without fully addressing the complexity of taxonomic identification. Shape analysis becomes relevant when recognizing shapes and the differences between them can be described by mathematical laws.
This study proposes automated data processing to provide efficient tools for the objective analysis of the full optical fingerprints of phytoplankton. To test and validate the clustering methods, we started with numerical experiments on simulated objects known a priori: the classes are defined in advance and the dataset is tunable (heterogeneity within classes; number and relative abundance of each class). Moreover, an unlimited number of situations can be generated, from easy cases to more complex ones. After the clustering methods were tested and validated, experiments were performed with real data collected from the flow cytometric analysis of several cultures (20 different strains belonging to various taxa). Several important features of the results obtained with the numerical experiments and the phytoplankton cultures deserve discussion. The distance invariant to orientation acts as a deformation of the functional space, gathering shapes that are similar up to a symmetry. The experiments carried out with the six simulated families demonstrated the effectiveness of computing this orientation-invariant distance. When the relative abundances of the classes differ greatly (Fig. 6), the partition splits the most abundant group, and the clustering does not converge to the correct partition. In other words, the predominance of one group in a natural sample could prevent the identification of other, less abundant groups.
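This section does not define the orientation-invariant distance, so the following is only a minimal sketch of one plausible reading: a pulse and its time-reversed copy (the same asymmetric cell crossing the beam in the opposite direction) count as the same shape, and the distance to a curve is the smaller of the L2 distances to that curve and to its mirror image. The function names are hypothetical.

```python
import math

def l2_distance(f, g, dt=1.0):
    """Discretized L2 distance between two curves sampled on the same grid."""
    return math.sqrt(dt * sum((a - b) ** 2 for a, b in zip(f, g)))

def orientation_invariant_distance(f, g, dt=1.0):
    """Hypothetical orientation-invariant distance: compare f with g and with
    g reversed in time, and keep the smaller value, so that mirror-image
    shapes are treated as identical."""
    return min(l2_distance(f, g, dt), l2_distance(f, g[::-1], dt))

head_first = [0.0, 1.0, 4.0, 2.0, 1.0, 0.0]   # asymmetric pulse
tail_first = head_first[::-1]                 # same cell, opposite direction
print(l2_distance(head_first, tail_first))                     # about 2.83
print(orientation_invariant_distance(head_first, tail_first))  # 0.0
```

Taking the minimum over the two orientations is what "gathers shapes similar by symmetry": mirror images collapse onto one point of the functional space, while genuinely different shapes keep a positive distance.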
In aquatic environments, natural samples contain few large phytoplankton cells (i.e., >20 μm or chain-forming species) and many small ones (42, 43); this imbalance will have to be taken into account. By varying the variability within the simulated families, we obtained different results for the tested clustering methods, and testing and comparing several methods proved beneficial. However, no approach can be considered better than another in general; each is more or less suited to a particular case. In this study, fuzzy clustering fitted the type of data generated by the CytoSub better, yielding a higher classification success than the K-medoids method. This result is due to the specificity of the fuzzy method, in which each particle has a degree of membership in every cluster; this enables better separation of overlapping groups.
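For readers unfamiliar with K-medoids, here is a minimal sketch that clusters particles directly from a precomputed distance matrix, which is what makes it usable with shape-based distances. It is a simple alternating (Voronoi-iteration) variant with a farthest-first initialization, not necessarily the PAM algorithm or the implementation used in this study, and it does not cover fuzzy clustering. The `classification_success` helper is likewise an assumption: it scores a partition against known culture labels under the best one-to-one matching of clusters to classes.

```python
import itertools

def assign(D, medoids):
    """Index of the closest medoid for every particle."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]

def k_medoids(D, k, max_iter=100):
    """Alternating k-medoids on a distance matrix D (list of lists)."""
    n = len(D)
    # Farthest-first initialization: start from particle 0, then repeatedly
    # add the particle farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(D, medoids)
        # Update step: the new medoid of each cluster is the member with the
        # smallest total distance to the other members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:                     # keep the medoid of an empty cluster
                new_medoids.append(medoids[c])
            else:
                new_medoids.append(min(members, key=lambda m: sum(D[m][j] for j in members)))
        if new_medoids == medoids:              # converged
            break
        medoids = new_medoids
    return assign(D, medoids), medoids

def classification_success(true_labels, cluster_labels):
    """Share of particles correctly classified under the best one-to-one
    mapping of clusters to known classes (brute force over permutations, so
    only for a handful of classes; assumes no more clusters than classes)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = 0
    for perm in itertools.permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best = max(best, hits)
    return best / len(true_labels)

x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.3]              # toy one-dimensional descriptor
D = [[abs(a - b) for b in x] for a in x]
labels, medoids = k_medoids(D, 2)
print(labels, medoids)                          # [0, 0, 0, 1, 1, 1] [1, 4]
print(classification_success(list("AAABBB"), labels))  # 1.0
```

Each particle here belongs to exactly one cluster. A fuzzy method would instead return, for each particle, a degree of membership in every cluster (for example, `fanny` in R's `cluster` package works from the same kind of dissimilarity matrix).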
To handle the complex data collected with the CytoSub (i.e., the optical fingerprints corresponding to the five raw pulses), we needed a way to deal with descriptors of different types, such as length, AUP (area under the pulse), and the functional shape of the various optical fingerprints. The distance matrix of each descriptor was first computed individually, and the matrices were then successfully combined. In looking for the most efficient clustering method, our primary focus was to determine whether using the functional shape could be more efficient than the classical method (i.e., based on real numbers). To do so, two datasets from phytoplankton cultures (Amphidinium carterae and Tetraselmis tetrathele) were selected and artificially mixed into a single data file. When both species were analyzed with the CytoClus software, i.e., the software dedicated to CytoSub data analysis using the classical method with conventional descriptors, the toxic and nontoxic species could not be adequately distinguished: their optical fingerprints were too similar to form distinct clusters in the classical two-dimensional dotplots, preventing any efficient manual separation. By contrast, the autonomous clustering method was clearly efficient (Fig. 9): the classification success reached about 78%, and the two species were well discriminated. Another aim was to test the contribution of shape-related information compared with the classical descriptors. In this case, the gain was about 10 percentage points between the classical descriptors alone and the combination of functional shape and classical descriptors, a modest but significant improvement. Shape-related information appeared useful when particles presented morphological modifications or typical features, such as the repetition of a similar pattern (for instance, chain-forming cells) or the presence of appendages, usually linked to environmental adaptation.
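Per-descriptor distance matrices can be merged with a weighted average. The sketch below assumes each matrix is first scaled by its largest entry so that descriptors in different units (length, AUP, shape) contribute on a comparable scale; this normalization and the helper names `normalize` and `combine` are illustrative assumptions, not necessarily the scheme used in this study. The `weights` argument plays the role of the tunable descriptor weights.

```python
def normalize(D):
    """Divide a distance matrix by its largest entry (copy unchanged if all zero)."""
    top = max(max(row) for row in D)
    if top == 0:
        return [row[:] for row in D]
    return [[d / top for d in row] for row in D]

def combine(matrices, weights):
    """Weighted mean of max-normalized distance matrices of equal size."""
    total = float(sum(weights))
    n = len(matrices[0])
    out = [[0.0] * n for _ in range(n)]
    for D, w in zip(matrices, weights):
        Dn = normalize(D)
        for i in range(n):
            for j in range(n):
                out[i][j] += (w / total) * Dn[i][j]
    return out

# Three particles: toy distances from pulse length and from pulse shape.
D_length = [[0, 2, 4], [2, 0, 2], [4, 2, 0]]
D_shape = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(combine([D_length, D_shape], [1, 1])[0])   # [0.0, 0.75, 0.5]
print(combine([D_length, D_shape], [1, 3])[0])   # [0.0, 0.875, 0.25]
```

Particles 0 and 2 have identical shapes but the most different lengths; giving the shape matrix three times the weight (second call) halves their combined distance, which is the kind of case-by-case tuning the weights allow.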
Conversely, shape-related information was less efficient for small particles, because their shape tends to be spherical and the corresponding optical fingerprints are therefore dominated by a Gaussian-shaped curve (16). Nevertheless, full pulse shapes are surprisingly useful even for cells smaller than the height of the focused laser beam (5 μm). From the analysis of very small particles (2 to 6 μm in size), the following statements can be made: (i) by considering observations as “curves” (actually “densities” would be more appropriate), one takes into account moments of all orders, not only the mean and variance; (ii) most signals are bell shaped, but signal shapes vary greatly because of differences in skewness (data not shown); (iii) moreover, the maximum is not always central, leading to asymmetrical curves, and this is potentially linked to cell morphology; (iv) considering the entire optical fingerprint (i.e., all five variables), these slight variations in signal shape induce a decoupling between signals. This constitutes additional information compared with the classical method, which handles only the length, height, or area under the signal.
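Point (i) can be illustrated by normalizing a pulse to unit mass and computing its moments over sample positions. In this minimal sketch (the function name and the use of sample indices as positions are assumptions), a pulse and its mirror image have the same length, height, and area, and the same variance, yet opposite skewness and different peak positions, which a description by length, height, or area alone would miss.

```python
def pulse_moments(pulse):
    """Treat a non-negative pulse as a density over sample positions and
    return its mean position, variance, skewness and peak position."""
    mass = float(sum(pulse))
    xs = range(len(pulse))
    mean = sum(x * p for x, p in zip(xs, pulse)) / mass
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, pulse)) / mass
    skew = sum((x - mean) ** 3 * p for x, p in zip(xs, pulse)) / mass / var ** 1.5
    peak = max(xs, key=lambda x: pulse[x])
    return mean, var, skew, peak

skewed = [0, 3, 2, 1, 0]        # peak early, long tail
mirror = skewed[::-1]           # same length, height and area
for p in (skewed, mirror):
    mean, var, skew, peak = pulse_moments(p)
    print(f"mean={mean:.2f} var={var:.2f} skew={skew:+.2f} peak={peak}")
```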
Through all the experiments described in this study, with numerical simulations and real data from more than 20 cultures, a new method of analysis has been validated. It is new in that it combines conventional descriptors with the pulse shapes, which complements the previous work by Boddy and collaborators, who considered peak integrated values and pulse widths (18). The main known difficulty with unsupervised classification methods is choosing the number of clusters: thanks to the Silhouette coefficient, the optimal number of groups can be found without any human intervention. The maximum of the Silhouette coefficient values also indicates the robustness, or consistency, of the associated partition. The Silhouette also provides a visual display of the data and a rational criterion for selecting splits. However, this display is only readable when the number of observations is reasonable, which is not the case for CytoSub datasets with their large number of observations (several thousand cells). Subsampling methods can nevertheless be used to evaluate the final number of clusters with clearer displays. Another original and useful feature of the method is its flexibility, owing to the weights that can be associated with the distance matrix of each descriptor: the operator can tune the weight applied to each variable according to its relevance and thus adapt the method to any particular case. By handling the raw pulse shape as a functional descriptor, the method fully exploits the potential of the CytoSub flow cytometer. Admittedly, this study does not present results of an analysis of natural samples, which would be needed to address all the complexity that can occur in the field (various clusters, high biodiversity, background noise, etc.).
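The Silhouette criterion used to choose the number of clusters can be computed directly from a distance matrix. For particle i, with a(i) its mean distance to the other members of its cluster and b(i) its smallest mean distance to the members of another cluster, s(i) = (b(i) − a(i)) / max(a(i), b(i)), and the partition with the highest mean s(i) is retained. The sketch below (helper names are hypothetical) scores candidate partitions that any clustering method could supply, for example one per number of clusters k; its cost grows at least quadratically with the number of particles, which is one reason subsampling helps for datasets of several thousand cells.

```python
def silhouette(D, labels):
    """Mean silhouette width of a partition (needs at least two clusters)."""
    n = len(D)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:                 # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(D[i][j] for j in own) / len(own)
        b = min(sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def choose_partition(D, candidates):
    """Return the key (e.g. number of clusters k) of the best-scoring partition."""
    return max(candidates, key=lambda k: silhouette(D, candidates[k]))

x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.3]
D = [[abs(a - b) for b in x] for a in x]
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 0, 1, 1, 2]}
print({k: round(silhouette(D, lab), 2) for k, lab in candidates.items()})
print("chosen k:", choose_partition(D, candidates))   # 2
```

In practice the candidate labels would come from the clustering run for each k, and a combined per-descriptor distance matrix would replace the toy `D`.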
The main reason is that testing the efficiency of the clustering methods required knowing the composition of the sample, so that the clustering results could be compared with what was expected. Work with natural samples is ongoing and will be addressed in other studies.
Automated sample acquisition, data analysis, and clustering open the way to high-frequency spatiotemporal analysis, which was previously out of reach because of logistical constraints (need for operators, shipboard work depending on ship availability and weather, etc.). Oceanographic cruises, for instance, are limited in space, even when their track covers a long distance, and above all in time, so they fail to provide the spatial coverage and temporal resolution required to obtain a realistic picture of the marine environment and to detect changes within it. To meet these challenges, much effort has been dedicated to automating measurements and making instruments autonomous, in order to build monitoring systems that deliver sufficient online data. This is the impetus of the Global Ocean Observing System (GOOS), endorsed by the United Nations (UNESCO), and, in Europe, of the European GOOS initiative, EuroGOOS. The International Council for the Exploration of the Sea (ICES) and the Mediterranean Science Commission (CIESM) are also developing such activities (see TRANSMED: http://www.ciesm.org/marine/programs/transmed.htm, CIESM pilot project). High-frequency surveys should bring new information, which is essential for better understanding the complex dynamics of phytoplankton communities in relation to their environment.