Graphical analysis of flow cytometer data for characterizing controlled fluorescent protein display on λ phage



As native virus particles typically cannot be resolved using a flow cytometer, the general practice is to label the particles with fluorescent dyes. In this work, an attempt was made to use a common commercial flow cytometer to characterize a phage display strategy that allows for controlled levels of protein display, in this case, eGFP. To achieve this characterization, a number of data processing steps were needed to ensure that the observed phenomena reflected real differences in the phages produced. Phage display of eGFP resulted in altered side scatter and fluorescence profiles, and sub-populations could be identified within what would otherwise be considered uniform populations. Surprisingly, this study has found that side scatter may be used in the future to characterize the display of nonfluorescent proteins. © 2012 International Society for Advancement of Cytometry

Unlike small molecules, whose concentration and molecular make-up can be readily determined by a number of well-established techniques, viruses fall within a size range and have a complexity that make them difficult to characterize and quantify. To this day, the most commonly used techniques for identifying and quantifying viruses are based on identifying plaques of dead cells obtained after adding serial dilutions of a virus solution, although a number of alternate techniques have been developed. Hirshfeld et al. were among the first to report a methodology to obtain both concentration and size characteristics of a virus based on staining the nucleic acid content of the virus with a fluorescent dye and observing the result using a fluorescence microscope (1). Not long after, Hercher et al. developed a method based on the principle of flow cytometry and light scattered by virus particles (2). Although viruses are known to scatter light, virus particles are much smaller than cells, so the wavelength and the angular distribution of light scattered by these particles cannot be directly correlated to shape, structure, or size alone.

Viruses are too small in particle size to be discriminated solely on the basis of their light scatter properties using commercially available benchtop flow cytometers. As such, much work has been done to combine both light scattering characteristics of the particles with fluorescence similarly to what was originally done by Hirshfeld et al. (1). Significant progress on the detection of viruses by flow cytometry occurred with the advent of nucleic acid staining dyes that emit in the green with high fluorescence yield after excitation at 488 nm. Marie et al. first described an approach to detect marine viruses on a standard flow cytometer (FACSort, Becton Dickinson, CA), followed by the work of Brussaard et al. that investigated viruses from a number of different families including Baculoviridae, Herpesviridae, Myoviridae, Phycodnaviridae, Picornaviridae, Podoviridae, Retroviridae, and Siphoviridae (3–5). A relatively recent comprehensive methodology paper based on this work, but focusing primarily on bacteriophage enumeration has also been published (6). Although it was recommended to specifically determine which conditions are optimal when analyzing specific phage species, one set of variables could be optimized for mixed bacteriophage samples—distinguishing them primarily based on the fluorescence staining of their respective genomes.

There are some exceptions to the general pattern of using a fluorescent stain for the discrimination and quantification of virus by flow cytometry. For example, recent work on the expression of green fluorescent protein (GFP) on the surface of T7 bacteriophage used flow cytometry to quantify the affinity of GFP-based binders (7). However, the actual characterization of GFP display was performed by fluorescence correlation spectroscopy, similar to Slootweg et al. (8). The application of flow cytometry to the analysis of virus particles displaying GFP has thus far been limited in its scope.

Given an interest in controlled protein expression and phage display, we set out to investigate whether it was possible to analyze eGFP display on λ phage using a commercial flow cytometer. Different levels of eGFP decoration were achieved using a bidimensional genetic system, capitalizing on the ability of bacteriophage λ to display foreign proteins on its surface through translational fusions to the C terminus of gpD, one of the major capsid proteins (9). The bidimensional genetic system consisted of a λ Dam capsid mutation complemented in trans by a plasmid that confers temperature inducible expression of a gpD::eGFP translational fusion, in tandem with glutamine (SupE) suppression of the Dam15 amber mutation of the infecting phage.

The purpose of this work was to develop a methodology to characterize the display of eGFP on λ phage using flow cytometry, including the optimization of sample acquisition, data processing, and data interpretation. This methodology goes beyond simple enumeration and qualitatively assesses the expression of GFP on the surface of bacteriophage λ, identifying sub-populations in fluorescent phage produced at four different culturing temperatures: 30, 35, 37, and 39°C. While the methodology described in this report is framed in the context of surface eGFP display, it is equally applicable to the analysis of surface display in general. It could then be used to examine the effect of other process parameters, for example, length of exposure to high temperatures (10).


Flow Cytometric Sample Preparation

Phage, cells, and plasmids used in this work can be found in Table 1.

Table 1. Bacterial strains and plasmids
Cell/phage/plasmid designation | Genotype | Phenotype | Source/reference
Phage strains
λF7 | λDam15imm21cIts | gpD λ heteroimmune |
Bacterial strains
BB4 | supE supF hsdR [F] proAB+ lacIq lacZΔM15] | | Agilent Technologies
W3101 | F-, galT22, λ-, IN(rrnD-rrnE)1, rph-1 | | CGSC #4467
W3101 SupE | F-, galT22, λ-, IN(rrnD-rrnE)1, rph-1, crcA280::Tn10, glnV44(AS) | | This study
Plasmids
pPL451 gpD | pM-cI857-pL-cI857-pL-D-tL | | This study
pPL451 gpD::eGFP | pM-cI857-pL-cI857-pL-D::gfp-tL | | This study

Phage amplification

Cultures of transformed W3101 SupE [pPL451gpD::eGFP] Escherichia coli cells were grown on plates at 30, 35, 37, or 39°C overnight, while cultures of W3101 [pPL451gpD] were grown at 37°C only, prior to the addition of primary lysate dilutions. 1:10 dilutions of primary lysates were prepared in 1 mL of TN buffer (0.01 M Tris–HCl and 0.1 M NaCl, pH 7.8, Fisher Scientific, USA). Lysate dilutions were added to 300 μL of either W3101 SupE [pPL451gpD::eGFP] or W3101 [pPL451gpD] cells before adding 3 mL of top agar (Bacto Tryptone and Bacto Agar from Difco Laboratories, Sparks, MD). The solution was poured evenly onto LB plates and left at room temperature until set. Plates were then incubated for 24 h at the temperature at which the cells had been grown and then used for lysate preparation. About 10 mL of TN buffer was added to the surface of the plate, which was incubated for an additional 8 h at 4°C. After incubation, the TN buffer was removed from the plate with a pipette and added to a fresh conical tube along with the underlying top agar. The tube was mixed vigorously and centrifuged at 12,000 rpm (Avanti J-E Centrifuge, Beckman Coulter, Mississauga, Canada) at 4°C for 20 min. The supernatant was poured into a fresh ice-cold (0°C) conical tube, 20–40 μL of CHCl3 (Fisher Scientific, USA) was added, and the tube was shaken vigorously to kill any remaining cells. Lysates were then precipitated for purification and concentration purposes with 20% polyethylene glycol (PEG)-8000 (Fisher Scientific, USA), 2.5 M NaCl using a standard protocol (11) and resuspended in fresh TN buffer. Lysates were then filtered through a sterile 0.45 μm syringe filter (BD Discardit, India) to remove any remaining cell debris. To remove cellular proteins, particularly unincorporated gpD::eGFP, lysates were purified by gel chromatography (12), which offers lysate purity comparable to CsCl centrifugation and can be conducted successfully with smaller volumes.
Briefly, lysates were passed through a 50–150 μL agarose size exclusion column (4% beads, ABT, Spain) in buffer containing 10 mM Tris (pH 7.5) and 1 mM MgCl2. This method was repeated on phage and cell controls. Samples were titered at each step of purification by standard viability assays on fresh BB4 E. coli cells. Final phage titers ranged from 10^10 to mid-10^11 phage/mL (Table 2). Samples were stored at 4°C prior to being analyzed by flow cytometry without further treatment, except for dilution where stated. All samples were analyzed within 1 week of preparation between November 2010 and October 2011.

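Table 2. Phage titers
Lysate | Temperature of induction (°C) | PFU/mL | FACS counts/mL
λF7-supE-eGFP | 30 | 9.6 × 10^10 | 7.51 × 10^6
λF7-supE-eGFP | 35 | 3.8 × 10^11 | 1.41 × 10^7
λF7-supE-eGFP | 37 | 1.3 × 10^10 | 2.94 × 10^7
λF7-supE-eGFP | 39 | 7.2 × 10^10 | 2.84 × 10^7
λF7 | 37 | 1.0 × 10^10 | 1.39 × 10^7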

Phage titration

Viable counts of phage were quantified by plaque assay using serial dilutions in TN buffer added to BB4 E. coli cells. Cells and phage were mixed in agar and then plated on LB medium. The plates were incubated overnight at 37°C and inspected for plaques. Table 2 contains the titers of the stocks used for the analysis by flow cytometry.


Controls

In the development of this work, a number of negative controls were used. These included deionized water (DI), phosphate-buffered saline (PBS), TN buffer, noninfected W3101 SupE [pPL451gpD] cell lysate (herein referred to as gpD), noninfected W3101 SupE [pPL451gpD::eGFP] cell lysate (herein referred to as gpD::eGFP), and λF7 infected W3101 [pPL451gpD] cell lysate (herein referred to as λF7). Unless stated otherwise, all samples and controls were analyzed in triplicate.

DI, phosphate-buffered saline (PBS), and TN buffer

These were pure negative controls, used to determine the cleanliness of the system and of the diluent used.


gpD

This control was used to determine whether cell debris interfered with the analysis of the phage particles in this study. Cells producing gpD from a plasmid (pPL451gpD) were lysed and treated in the same manner as those that were infected. This control was run at 37°C, a representative temperature for optimal growth and expression.


gpD::eGFP

This control was used to determine whether the expression of the gpD::eGFP fusion protein contributed to the fluorescence signal detected when analyzing the phage particles. Cells producing gpD::eGFP from a plasmid (pPL451gpD::eGFP) were lysed and treated in the same manner as those that were infected. This control was run at 37°C, a representative temperature for optimal growth and expression.


λF7

In this control, λF7 phage, which has an amber mutation within the gpD sequence that prevents the expression of functional gpD, is grown in E. coli containing a plasmid that allows the production of gpD (pPL451gpD), thus allowing λF7 phage to be produced. Cell lysates were purified according to the methods described in the Phage Amplification section. This control provided a means of determining whether nonfluorescent phage could be detected with the chosen flow cytometer settings. Phage were propagated at 37°C, which is an optimal temperature for the production of the phage.

Flow Cytometric Analysis

Data acquisition

All data were collected using a FACSCalibur flow cytometer (BD Biosciences, San Jose, CA) equipped with a 15 mW air-cooled argon-ion laser, with an excitation wavelength of 488 nm. Full technical specifications of the flow cytometer can be found on the BD Biosciences website ( Cytometry_TechSpec.pdf). Side scatter (SSC) photomultiplier tube (PMT) voltage was set to 500 V and fluorescence (FL) PMT voltage to 525 V, both with logarithmic amplification. FL detection was performed with a 530 nm bandpass filter. Samples were run for 30 s at the low flow setting (20 μL/min). No compensation was performed as only a single fluorescence channel was used.

Two bases of comparison were initially considered for data acquisition—fixed acquisition time or fixed event number. A fixed acquisition time strategy was chosen due to its better representation of noise. The false detection of events in PBS was found to occur at a relatively constant rate. Thus, keeping the acquisition time fixed allowed for a constant baseline of noise for all samples, regardless of event count. The strategy also allowed for the detection and comparison of negative controls, which would have been hard to interpret on a fixed event number basis due to their much lower detection rates.
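The argument for a fixed acquisition time can be illustrated with a short Python sketch. It assumes that false events arrive at a constant rate, as reported for PBS; the noise rate, sample rates, and event target below are hypothetical values chosen only for illustration and are not taken from the measurements.

```python
ACQUISITION_S = 30.0       # fixed acquisition time used in this work
NOISE_RATE = 20.0          # hypothetical false-event rate (events/s)
TARGET_EVENTS = 10_000     # hypothetical stopping count for a fixed-event strategy


def noise_fixed_time(noise_rate, acquisition_s=ACQUISITION_S):
    """Expected noise events when every sample is acquired for the same time."""
    return noise_rate * acquisition_s


def noise_fixed_count(noise_rate, sample_rate, n_events=TARGET_EVENTS):
    """Expected noise events when acquisition stops after n_events total events."""
    duration_s = n_events / (sample_rate + noise_rate)
    return noise_rate * duration_s


# Two hypothetical samples: a dilute control and a concentrated lysate.
sample_rates = {"dilute": 100.0, "concentrated": 5000.0}

fixed_time = {name: noise_fixed_time(NOISE_RATE) for name in sample_rates}
fixed_count = {name: noise_fixed_count(NOISE_RATE, rate)
               for name, rate in sample_rates.items()}

for name in sample_rates:
    print(name, round(fixed_time[name]), round(fixed_count[name], 1))
```

Under these assumptions the expected noise contribution is identical for both samples at fixed time, whereas at fixed event count it depends strongly on how quickly the sample itself produces events.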

All list mode data files are available on a protected WebDAV site. The password is available through the corresponding author.

PMT voltage calibration

PMT voltage settings were determined using design of experiments (DOE) principles. Negative controls as well as lysates from λF7 infected W3101 SupE [pPL451gpD::eGFP] cultures (herein referred to as λF7-SupE-eGFP) carried out at 30, 37, and 39°C and diluted 1×, 3×, 9×, 27×, and 81× with PBS were tested using a combination of various PMT voltage settings. Initially, a 2^2 factorial design was used to test a SSC voltage range of 250–500 V and a FL voltage range of 475–525 V. Four runs were performed to test each combination of the extreme (corner) points, one run for each of the following (SSC, FL) settings: (250, 475 V), (250, 525 V), (500, 475 V), (500, 525 V). A centre point triplicate (375, 500 V) was used to assess consistency. The highest setting combination of (500, 525 V) was found to be the only one that gave a visible cluster for the least diluted sample. Following this result, a second 2^2 factorial design was used to test higher SSC and FL voltage ranges of 500–600 V and 525–575 V, respectively, using the same procedure as detailed above. The (500, 525 V) setting was once again identified as the best, this time based on its ability to discriminate between the various dilutions and culturing temperatures of λF7-SupE-eGFP samples. The differences in events captured between successive dilutions were used as a simple metric to assess the signal to noise ratio.
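The run list implied by the two factorial designs can be written out explicitly. The Python sketch below is only an illustration of the design layout; the function name is hypothetical, and the centre point of the second design is assumed to be the midpoint of each range, by analogy with the first design.

```python
from itertools import product


def two_level_design(ssc_range, fl_range, centre_replicates=3):
    """List (SSC, FL) voltage settings for a 2^2 factorial design with centre points."""
    corners = list(product(ssc_range, fl_range))   # four extreme combinations
    centre = (sum(ssc_range) / 2, sum(fl_range) / 2)
    return corners + [centre] * centre_replicates


first_design = two_level_design((250, 500), (475, 525))
second_design = two_level_design((500, 600), (525, 575))

for ssc, fl in first_design:
    print(f"SSC = {ssc:g} V, FL = {fl:g} V")
```

Note that the (500, 525 V) setting is a corner point of both designs, which is why it could be compared directly across the two rounds.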

Data Processing

All data was processed using the “R” programming language (R Development Core Team, 2011). The packages flowCore (13), data.table (14), and ggplot2 (15) were of particular importance for data importing, general processing, and plotting, respectively.


Thresholding

To minimize any bias that could be incurred by setting an arbitrary threshold, the threshold settings for side scatter (SSC) and fluorescence (FL) were set to zero. During initial investigation, most samples were observed to have a major concentration of particles with a relative fluorescence value of one on the logarithmic scale, equal to a fluorescence level below the minimum detection limit (referred to as nonfluorescent throughout this work). This grouping did not fit into the observed patterns of fluorescent data, as it was composed of a broad spectrum of low fluorescence levels, all of which were below the detection limit and therefore artificially labeled as having a relative fluorescence value of one. As the number of events in this group was bounded by side scatter detection and not fluorescence, it could not be compared directly to the number of events at any other relative fluorescence and was excluded from graphical representation. A parallel argument was used for events with a relative side scatter value of one. Only events with both SSC and FL greater than 1 were considered in subsequent analysis. An alternative to this approach, using the entire data set and the “logicle” display method (16, 17), was considered but rejected due to the inability to easily differentiate noise along the axes.
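As a minimal illustration of this filtering rule, the following Python sketch separates events at the detection floor from those retained for density analysis. The event values and the function name are hypothetical; the original processing was carried out in R.

```python
def split_events(events, floor=1.0):
    """Split (SSC, FL) relative values into analysis events and floor counts.

    Events at the detection floor (relative value of 1) on either axis are
    excluded from density analysis but are still counted.
    """
    kept = [(ssc, fl) for ssc, fl in events if ssc > floor and fl > floor]
    n_nonfluorescent = sum(1 for _, fl in events if fl <= floor)
    n_fluorescent = sum(1 for _, fl in events if fl > floor)
    return kept, n_nonfluorescent, n_fluorescent


# Hypothetical list-mode events as (side scatter, fluorescence) pairs.
events = [(1.0, 1.0), (2.5, 1.0), (1.0, 3.2), (4.1, 1.8), (12.0, 2.2)]
kept, n_nonfluorescent, n_fluorescent = split_events(events)
print(kept, n_nonfluorescent, n_fluorescent)
```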

Density estimation

Transformation of fluorescence and side scatter data for individually detected events to an event density (relative frequency) at given scatter or fluorescence levels was done using (Gaussian) kernel density estimates as part of the base R functionality in 1D (either relative fluorescence or side scatter values) and using the “MASS” package (18) in 2D. Gaussian kernels for 1D density calculations were used to stay consistent with 2D calculations (both relative fluorescence and side scatter values), for which bivariate Gaussian kernels were the only available choice. Density curves generated from kernel density estimates scaled by the total event count were deemed superior to histograms generated from discrete binning as density estimates are inherently continuous and allow natural smoothing of the final distributions. Furthermore, kernel density estimation does not require an explicit choice of bin width (an implicit calculation is done automatically as part of the algorithm), allowing for more consistent results between samples. Contrary to discrete binning, however, density values generated from kernel density approximations cannot be interpreted directly. This is because the integral of the density curve is what equates to the total event count and not the sum as for histograms; therefore, the actual density values have no physical meaning and can only be used as a relative measure of density within samples. Scaling the density integral by the total sample event count allowed inter-sample comparison. Calculated density values were assigned to each detected event as a third characteristic variable (or dimension) using interpolation (performed in 2D using the “fields” package (19)). Events with lower density values were treated as rarer and more likely to be outliers or noise, while those with a high density as representative of sub-population clusters.
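The count scaling described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the R code used in this work: the bandwidth follows a Silverman-style rule of thumb, which is assumed here to approximate the automatic choice, and the input data are simulated log-scaled fluorescence values.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(values):
    """Silverman-style rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** -0.2


def count_scaled_density(values, grid, bandwidth=None):
    """Gaussian kernel density on `grid`, scaled so that it integrates to len(values)."""
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(values)
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values)
            for g in grid]


# Simulated log10 relative fluorescence values (hypothetical data).
random.seed(1)
log_fl = [random.gauss(0.5, 0.1) for _ in range(200)]
grid = [-1.0 + 0.01 * i for i in range(301)]
density = count_scaled_density(log_fl, grid)

# Trapezoidal integral of the curve; it should be close to the event count.
step = grid[1] - grid[0]
area = sum((a + b) / 2.0 * step for a, b in zip(density, density[1:]))
print(round(area, 1), len(log_fl))
```

Because the area under the curve, rather than the height at any one point, corresponds to the event count, curves from samples with different totals can be overlaid and compared directly.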

Graphical representation

1D event density was represented directly by plots of kernel density estimates, scaled by the number of events in the sample. A number of visualization options were considered for the presentation of 2D density, including single contour confidence intervals, multiple line contours, filled color gradients, and 3D surfaces. In each case, the goal was to identify clusters of events, corresponding to peaks in event densities, found for a combination of side scatter and fluorescence ranges. For the preliminary investigation, it was decided to use a combination of contour plots and discrete color gradients. A discrete binning method was chosen for the generation of color gradients to avoid unnecessary smoothing, which would hide highly overlapping sub-populations. The choice of hexagon bins was an aesthetic one, as hexagons look more like points but give better quantitative data than the overlap of semi-transparent point “clouds.” Contour lines were added to give better shape discrimination.

Parameter extraction

General sub-population cluster positions were identified visually from contour plots and defined by side scatter and fluorescence regions. To allow quantitative analysis, cluster peak locations (equivalent to local modes in the density function and measured in terms of fluorescence and side scatter) were chosen as characteristic of the clusters and extracted from the graphical data. Density values were calculated and assigned to each event in a region as previously described in the density estimation section. Events with the highest 5% of density values were chosen as representative of a significant peak in that region; their mean side scatter and fluorescence values were calculated to give the location of the peak (Fig. 1). While the result of this process is very similar to the calculation of a local density mode in a defined region, it is more robust to noise than a naive calculation. A sub-population cluster is not guaranteed to have a single well-formed peak. Artifacts in detection or density calculation may result in the presence of multiple sharp peaks within a single broader one. By calculating the mean fluorescence and side scatter values from a number of points that are associated with higher densities, the overall location of the broader cluster is established with greater precision.

Figure 1.

Schematic representing density filtration of un-smoothed, discretely binned data. (1) A local maximum range is isolated and relative density values (counts) assigned to each event. (2) 95% of the events, ordered by density, are filtered out. (3) The mean of log-scaled event side scatter values is calculated and taken as an estimate of peak location.
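The density filtration steps can be sketched as follows. This Python example is a simplified stand-in for the procedure described above: it evaluates a bivariate Gaussian kernel directly at each event rather than interpolating from a gridded estimate, and the bandwidth, region limits, and simulated events are all hypothetical.

```python
import math
import random


def event_densities(points, h):
    """Unnormalised bivariate Gaussian kernel density evaluated at each event."""
    densities = []
    for x, y in points:
        total = 0.0
        for u, v in points:
            total += math.exp(-0.5 * (((x - u) / h) ** 2 + ((y - v) / h) ** 2))
        densities.append(total)
    return densities


def peak_location(points, ssc_range, h=0.05, keep_fraction=0.05):
    """Mean (log SSC, log FL) of the densest `keep_fraction` of events in a region."""
    low, high = ssc_range
    region = [(s, f) for s, f in points if low <= s < high]
    if not region:
        raise ValueError("no events fall inside the requested region")
    densities = event_densities(region, h)
    n_keep = max(1, math.ceil(keep_fraction * len(region)))
    # Rank events by density and keep the top fraction (95% are filtered out).
    order = sorted(range(len(region)), key=lambda i: densities[i], reverse=True)
    top = order[:n_keep]
    mean_ssc = sum(region[i][0] for i in top) / n_keep
    mean_fl = sum(region[i][1] for i in top) / n_keep
    return mean_ssc, mean_fl


# Simulated log10-scaled events: one cluster plus uniform background noise.
random.seed(0)
cluster = [(random.gauss(1.5, 0.1), random.gauss(0.4, 0.1)) for _ in range(400)]
noise = [(random.uniform(1.0, 2.0), random.uniform(0.0, 1.0)) for _ in range(40)]
peak = peak_location(cluster + noise, ssc_range=(1.0, 2.0))
print(peak)
```

Averaging over the densest events, rather than taking the single densest one, is what makes the estimate less sensitive to spurious sharp peaks inside a broader cluster.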


Biological System Under Study

As previously stated in the introduction, different levels of eGFP decoration were achieved using a bidimensional genetic system. The λ phage in this study is unable to produce the gpD capsid protein, essential for the formation of λ phage, because of an amber mutation in the sequence coding for gpD. To allow the production of phage, a plasmid expressing the gene for gpD or gpD::eGFP must be carried by the cells, or the cells themselves must be amber suppressors allowing gpD to be expressed from the gene in the phage containing the amber mutation. Additional control on the level of expression of gpD or gpD::eGFP from the plasmid is obtained by having the gene downstream from the lambda PL promoter that is regulated by the temperature sensitive λ CI[Ts]857 repressor. Therefore, amplification of phage and decoration of the phage with eGFP are controlled by both temperature and how well the cells can suppress the amber mutation (in this manuscript only one suppressor strain is presented).

Count Analysis

The most basic type of sub-population discrimination consisted of the division between fluorescent and nonfluorescent events. Figure 2a compares the number of nonfluorescent (fluorescence = 1) and fluorescent (fluorescence > 1) events detected in control samples, PBS, gpD::eGFP and λF7, as well as a mid-temperature sample of λF7-SupE-eGFP. Both λF7 and λF7-SupE-eGFP were grown at 37°C. While the number of fluorescent events is significantly higher than nonfluorescent ones in λF7-SupE-eGFP, all control samples not containing phage have higher numbers of nonfluorescent particles. For samples with phage (λF7 and λF7-SupE-eGFP), the nonfluorescent counts are comparable while λF7-SupE-eGFP has significantly more fluorescent events. Exclusion of nonfluorescent particles from graphical analysis was argued on the basis of data representation in the section on Thresholding; however, it is clear from Figure 2 that this decision has the added benefit of further differentiating the target phage samples from background. This idea is reinforced by examining the effect of dilution on the two event types for λF7-SupE-eGFP samples that had been cultured at different temperatures (Fig. 2b). The number of fluorescent events detected in λF7-SupE-eGFP samples diluted 10-fold appears to be generally 10 times lower than the number in samples diluted one-fold. While diluting to a total of 100-fold seems to bring the event number of all but the 37°C sample to the range of background noise, the 37°C sample conforms to ideal behaviour. Nonfluorescent events show a much greater degree of deviation from ideality in their response to dilution at all temperatures and for all dilutions, making it difficult to suggest what these events represent other than instrument noise.

Figure 2.

(a) Mean fluorescent and nonfluorescent event counts for PBS, gpD::eGFP, λF7, and λF7-SupE-eGFP (37°C). Error bars represent 95% confidence intervals calculated from triplicate measurements. (b) Effect of dilution on mean fluorescent and nonfluorescent event counts for λF7-SupE-eGFP samples, cultured at 30, 35, 37, and 39°C. Error bars represent 95% confidence intervals calculated from triplicate measurements. The larger error bars for the 30°C sample are the result of removing one observation as an obvious outlier.

For fluorescent events, the only possibly significant deviation from ideal dilution can be observed for the first dilution of the 37°C sample. Since the dilution of samples is unlikely to have significant error (with proper technique), even small deviations from expected results can be attributed as an anomalous property of the sample or as an artifact of flow cytometry. The fact that the following 10-fold dilution (from 10-fold to 100-fold total) at the same temperature appears perfectly ideal suggests that the deviation in the first dilution is likely due to an artifact of measurement rather than an issue with the sample. One possible explanation for a lower than expected effect of dilution at high sample concentration may be the detection of multiple particles as a single event (termed coincidence). Indeed, Brussaard et al. have suggested that a detection rate of 100–1,000 events per second is required to avoid coincidence (4), while the detection rate of the one-fold dilution solution was in the area of 5,000 events per second.
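The two checks discussed here, ideality of dilution and risk of coincidence, amount to simple arithmetic. The Python sketch below uses the 30 s acquisition time and the upper limit of 1,000 events per second cited above; the event counts are hypothetical and chosen only to mimic a detection rate of about 5,000 events per second in the least diluted sample.

```python
ACQUISITION_S = 30.0   # acquisition time used in this work
MAX_RATE = 1000.0      # upper detection rate cited to avoid coincidence (events/s)


def event_rate(n_events, acquisition_s=ACQUISITION_S):
    """Mean detection rate in events per second."""
    return n_events / acquisition_s


def dilution_deviation(count_before, count_after, fold=10.0):
    """Observed over expected count after a further `fold` dilution (1.0 is ideal)."""
    return count_after / (count_before / fold)


# Hypothetical fluorescent event counts keyed by total fold dilution.
counts = {1: 150_000, 10: 21_000, 100: 2_100}

rates = {fold: event_rate(n) for fold, n in counts.items()}
flagged = [fold for fold, rate in rates.items() if rate > MAX_RATE]
first_step = dilution_deviation(counts[1], counts[10])
second_step = dilution_deviation(counts[10], counts[100])

print(rates, flagged, first_step, second_step)
```

In this illustration, a ratio above one for the first dilution step alongside an ideal second step is the pattern that would be expected if coincidence caused the most concentrated sample to be undercounted.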

Response to temperature further differentiates the two event types. Following a relatively small increase between 30 and 35°C, the number of fluorescent events increases by an order of magnitude between 35 and 37°C, in agreement with the CI857 temperature-lability profile and subsequent expression of gpD::eGFP. Considering that the number of fluorescent events detected in the gpD::eGFP control is more than an order of magnitude lower than in the lowest amber-suppressed (λF7-SupE-eGFP) sample, it is likely that the temperature dependence of fluorescent event numbers is in fact due to an increase in the number of observed GFP-tagged phage (as opposed to free gpD::eGFP in solution). In contrast, the temperature dependence of nonfluorescent particles is far less pronounced, albeit still apparent. However, as there is no guarantee that the fluorescence of all GFP-tagged phage would be observed, some temperature dependence is to be expected in what was classified as nonfluorescent particles in this work.

1D Density Analysis

Following preliminary analysis of event counts, fluorescence and side scatter distributions were analyzed independently of each other as 1D density curves (Fig. 3).

Figure 3.

(a) 1D fluorescent density distributions for PBS, λF7, and λF7-SupE-eGFP samples, cultured at 30, 35, 37, and 39°C. Events with fluorescence or side scatter values of 1 were excluded. (b) 1D side scatter density distributions for PBS, λF7, and λF7-SupE-eGFP samples, cultured at 30, 35, 37, and 39°C. Events with fluorescence or side scatter values of 1 were excluded.


Fluorescence

A relationship was observed between the fluorescence distribution and culturing temperature (Fig. 3a). The fluorescence distributions of λF7-SupE-eGFP cultured at 30 and 35°C do not differ significantly in shape and the differences can be attributed to count alone. The λF7 (grown at 37°C) auto-fluorescence shares a similar distribution to that of λF7-SupE-eGFP cultured at 35°C. It is possible that when eGFP is present, a greater number of fluorescent particles can be detected from lower temperature cultivations. Higher culturing temperatures corresponded well with expected higher fluorescence above 35°C due to increased gpD::eGFP expression and decoration at these temperatures. Between 30 and 35°C, the greatest change was found for events with a relative fluorescence between 10^0 (1.0) and 10^0.2 (∼1.6) (Fig. 3a). In contrast, increasing the temperature to 37°C resulted in a much smaller increase of events with a relative fluorescence of ∼1 and the appearance of events with relative fluorescence values as high as 10^0.6 (∼4). Not only was there a greater overall number of fluorescent events (Fig. 2), but the relative fluorescence of these events was also greater, suggesting a greater density of GFP display per phage. Increasing the culturing temperature to 39°C did not significantly alter the total number of events detected (Fig. 2) but did alter the fluorescence distribution. The impact of increasing the expression level of gpD::eGFP by increasing the temperature past 37°C on the generation of GFP-tagged phage is therefore limited.

Side scatter

Unlike fluorescence, side scatter distributions demonstrated more pronounced changes in their overall shape (Fig. 3b). Most prominent is the difference between λF7 and λF7-SupE-eGFP samples. While the auto-fluorescence distribution of λF7 was comparable to that of λF7-SupE-eGFP, all λF7-SupE-eGFP samples have significantly greater side scatter events. The differences between the λF7-SupE-eGFP samples are also more pronounced than for fluorescence. While the change in culturing temperature from 30 to 35°C resulted in an increase of low fluorescence events (Fig. 3a), the general shape of the fluorescence distribution was quite similar. The same change in temperature, however, resulted in an increase of high side scatter events (in the 10^1 to 10^2 range) and altered the shape of the distribution by increasing its mean and mode, which are more consistent with the distributions of the 37 and 39°C samples. Increasing the temperature from 35 to 37°C increased the relative frequency of particles found near the modes of the distributions at 35°C and above.

2D Density Analysis

Graphical analysis

From a naive point of view, it could be expected that as more protein is displayed on the surface of the phage, the overall shape of the phage and how it scatters light would change, along with changes in fluorescence because of a greater number of fluorescent proteins being displayed. From the 1D analysis, both single dimensions allowed the observation of changes between samples. Using 2D visualization further yields evidence of sub-populations in the form of event clusters (Fig. 4). Not only is the detection of these clusters important, but there is also evidence of differences in their response to temperature. Between 30 and 35°C, cluster B shows only limited fluorescence variation but a definite increase in side scatter. Cluster A, on the other hand, remains in the same position. Increasing the temperature from 35 to 37°C shows a more pronounced difference, and cluster B can be considered to split into Bi and Bii, the latter of which has significantly higher fluorescence and side scatter than the original cluster B. Further increasing the temperature to 39°C has no impact on cluster Bi, but causes a change in the fluorescence and side scatter of cluster Bii, as well as in the cluster shape. While some of these changes are quite subtle, each sample represents over a hundred thousand observed events, making for highly reproducible results as confirmed by replicate analysis (data not shown).

Figure 4.

2D density distribution of fluorescence versus side scatter for λF7-SupE-eGFP samples, cultured at 30, 35, 37, and 39°C. Events with fluorescence or side scatter values of 1 were excluded. Grey hexagons represent regions where at least one event was detected. The density distribution at each temperature is scaled to a constant height. Contour lines represent fractions of maximum density (density quantiles ranging from 0.10 to 0.95 in intervals of 0.05).

While clusters form the most readily available features for analysis, they need not be the only ones. The extent and overall shape of the contour lines present a more qualitative form of analysis. For example, while the 30°C sample events appear highly concentrated around the major cluster, the 35°C sample events are much more spread with respect to side scatter, if not fluorescence (Fig. 4). On the other hand, the 37 and 39°C samples show very similar outside contours despite the significant changes in position, orientation, and shape of the major Bii cluster. While the contour lines in Figure 4 have been drawn around density values scaled to the highest peak in a given sample, further analysis using an absolute representation is also possible.

Cluster quantification

The presence of event clusters made a strong case against using summary statistics over all events in a sample. Ignoring sample heterogeneity was judged as having a high likelihood of overlooking subtle but potentially important temperature-dependent effects. Therefore, cluster locations, characterized by local modes, were chosen to describe the different samples. The choice of hybrid mean-mode statistics using density filtering was made to avoid the variability of local modes, as major clusters have been observed to have multiple smaller peaks within them. Clusters were compared quantitatively in cases where such a comparison made contextual sense, such as for the 37 and 39°C samples. As evidenced by the replicates in Figure 5a, the cluster means, which have been overlaid on top of the contour plots, are quite reproducible. To more easily compare the results from the samples collected from the cultures carried out at 37 and 39°C, bar graphs of the cluster means are shown in Figures 5b and 5c. As can be seen, cluster Bii shows a reduction in terms of mean side scatter from 37 to 39°C (Fig. 5c).

Figure 5.

(a) Mean side scatter and fluorescent values for λF7-SupE-eGFP samples, cultured at 37 and 39°C, divided into three clusters and plotted over contour lines. Means were calculated on log-scaled values from events with density values above the 95th percentile. Grey hexagons represent regions in each cluster where at least one event was detected with density above the 95th percentile. Clusters were defined by side scatter ranges of 10^0.02–10^0.18, 10^0.18–10^0.40, and 10^0.40–10^4.00 for clusters A, Bi, and Bii, respectively. The density distribution at each temperature is scaled to a constant height. Contour lines represent fractions of maximum density (density quantiles ranging from 0.10 to 0.95 in intervals of 0.05). (b) Mean side scatter values for λF7-SupE-eGFP samples, cultured at 37 and 39°C, divided into three clusters. Means were calculated on log-scaled values, from events with density values above the 95th percentile. Clusters were defined by side scatter ranges of 10^0.02–10^0.18, 10^0.18–10^0.40, and 10^0.40–10^4.00 for clusters A, Bi, and Bii, respectively. (c) Mean side scatter values for λF7-SupE-eGFP samples, cultured at 37 and 39°C, divided into three clusters. Means were calculated on log-scaled values, from events with density values above the 95th percentile. Clusters were defined by side scatter ranges of 10^0.02–10^0.18, 10^0.18–10^0.40, and 10^0.40–10^4.00 for clusters A, Bi, and Bii, respectively.


The analysis of trends in event counts and the division of fluorescent and nonfluorescent events laid the groundwork for density analysis. Given that events at the detection limits of side scatter and fluorescence have a wide range of secondary parameter values, this data could not be easily reconciled. This, however, does not mean that events pertaining to the phage were not captured at these detection limits. Indeed, in Figures 2a and 2b, a significant number of events attributable to the phage were found in both the “nonfluorescent” and “fluorescent” categories. Furthermore, Figure 2a also shows that native phage exhibit an intrinsic fluorescence that can be captured by the flow cytometer used in this study.

Brussaard et al. have previously shown that different phage and viruses can be differentiated using biparametric dot plots of side scatter and fluorescence when their genome is stained with a dye and analyzed by flow cytometry (4). As an extension to this work and the work of Marie et al., Jorio et al. used a similar approach to characterize a secondary population of aggregated virus. Still, both approaches relied on very clear differences in populations (3, 20). Clear distinctions are not always available in flow cytometry data, be it data collected from the analysis of cells or viruses. Today, especially with the number of fluorophores that can be detected by commercial flow cytometers, what once would have been interpreted as uniform populations are found to be quite heterogeneous in composition. In fact, one of the major limitations of the analysis conducted early on as part of this work was the reliance on dot plots. Dot plots are inherently poor at describing populations because all events falling within the same fluorescence and side scatter area essentially appear as a single dot, as in the biparametric analyses conducted on other viruses (3, 4, 20). As a result, only large differences can be resolved and subtle changes are deemed inconsequential. This limitation holds whether discussing subtle differences in virus populations or cell populations (21). To overcome this, density information is required. Although density plots are available in commercial software like FlowJo, for this work, it was decided that working with the raw listmode data would allow more flexibility in the analysis. Furthermore, given that it was not clear from the outset what the best threshold value was for collecting the flow cytometric data, it was decided that all data would be collected and analyzed. The analysis of the data from Figure 3 onwards was done once the data was smoothed using kernel density estimates. Each event was then assigned a density value calculated with respect to its neighboring values (in terms of side scatter and fluorescence) to facilitate later filtration and cluster identification (Fig. 5).
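As an illustration of assigning a density value to each event, the following minimal sketch evaluates a two-dimensional Gaussian kernel density estimate at every event. The kernel type, the bandwidth of 0.1 log10 units, and the function name are assumptions, since the text does not specify them. The quadratic loop is only practical for small samples; a binned or library-based estimator would be needed for full listmode files.

```python
import math


def event_densities(log_ssc, log_fl, bandwidth=0.1):
    """Gaussian kernel density estimate evaluated at every event.

    log_ssc, log_fl: log10-scaled side scatter and fluorescence values.
    bandwidth: kernel width in log10 units (hypothetical value).
    """
    n = len(log_ssc)
    # Normalization of an isotropic 2D Gaussian kernel, averaged over n events.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    densities = []
    for xi, yi in zip(log_ssc, log_fl):
        total = 0.0
        for xj, yj in zip(log_ssc, log_fl):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


# Toy data: three tightly grouped events and one isolated event.
log_ssc = [0.30, 0.31, 0.29, 2.00]
log_fl = [1.00, 1.01, 0.99, 3.00]
dens = event_densities(log_ssc, log_fl)
```

Events in the tight group receive higher density values than the isolated event, which is what allows sparse events to be filtered out before clusters are identified.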

As can be seen in Figure 3a, there is no clear differentiation between the overall shapes of the fluorescence frequency curves, especially between λF7 and λF7-SupE-eGFP grown at 35°C; however, the side scatter profiles of these two samples show a marked difference. This is very likely due to imperfect repression by the CI857 repressor, resulting in leaky expression of gpD::eGFP at this temperature. The parent pPL451 plasmid, unlike a λ prophage, is multicopy (pBR322 derivative) and lacks the additional transcriptional terminators (tR1–tR4) that further regulate prophage derepression. As such, the leaky control system allows the incorporation of very small amounts of gpD::eGFP into the capsid, resulting in a change to the overall scatter properties of the phage particle. It can therefore be extrapolated that this technique may be suitable for analyzing even small levels of nonfluorescent displayed protein on the surface of λ phage.

Cluster assignment allows an additional layer of characterization and will lead to the quantification of phage. Although it is expected that the number of fluorescent phage should be proportional to the number of events detected, it is still unclear which method will be best to validate the exact number of particles detected in the different clusters.


Analysis of phage displaying eGFP by flow cytometry has been found to yield a rich source of data. In this work, four approaches to make use of this data have been explored, namely basic counts, 1D fluorescence and side scatter densities, 2D fluorescence and side scatter densities, and cluster quantification. No one approach can be labeled as the best; each was able to provide useful information about the samples examined. While 2D density analysis can be argued to yield the most comprehensive information, its implementation resulted directly from conclusions drawn from the count analysis. Most importantly, flow cytometry was able to identify event clusters corresponding to distinct sub-populations that showed different temperature-dependent responses, an intrinsic fluorescence of phage particles, and a distinct shift in the side scatter properties of the phage when a complementing plasmid carrying the gpD::eGFP gene was present.


The authors acknowledge Tranum Kaur, Ian Mann, and Jian Xiong for their help in preparing and running samples on the flow cytometer, and Maud Gorbet for allowing access to her lab.