A fundamental tenet of scientific research is that published results are open to independent validation and refutation. Minimum data standards aid data providers, users, and publishers by providing a specification of what is required to unambiguously interpret experimental findings. Here, we present the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard, stating the minimum information required to report flow cytometry (FCM) experiments. We brought together a cross-disciplinary international collaborative group of bioinformaticians, computational statisticians, software developers, instrument manufacturers, and clinical and basic research scientists to develop the standard. The standard was subsequently vetted by the International Society for Advancement of Cytometry (ISAC) Data Standards Task Force, Standards Committee, membership, and Council. The MIFlowCyt standard includes recommendations about descriptions of the specimens and reagents included in the FCM experiment, the configuration of the instrument used to perform the assays, and the data processing approaches used to interpret the primary output data. MIFlowCyt has been adopted as a standard by ISAC, representing the FCM scientific community including scientists as well as software and hardware manufacturers. Adoption of MIFlowCyt by the scientific and publishing communities will facilitate third-party understanding and reuse of FCM data. © 2008 International Society for Advancement of Cytometry
Flow cytometry (FCM) systems have been available to investigators for over 30 years, and the field continues to advance at a rapid rate. FCM has been responsible for major progress in basic and clinical research by enabling the phenotypic and functional characterization of individual cells in a high-throughput manner. Advances in the technology now allow for automated, multiparametric analyses of thousands of samples per day (1). Each data set can consist of multidimensional descriptions of millions of individual cells, producing data similar in size and complexity to gene expression microarrays. Like the microarray field, the ability to collect FCM data is outpacing the computational means for data handling and analysis. Furthermore, the lack of reporting standardization limits collaboration, independent validation/refutation, and meta-analysis, and thus minimizes the value of the wealth of existing FCM data because of poor annotation (2). Such technological advances, in the absence of reporting standards, also introduce considerable challenges for the producers, reviewers, and consumers of today's biomedical and immunology literature (3).
To address these shortcomings, we have brought together a cross-disciplinary international collaborative group of bioinformaticians, computational statisticians, software developers, instrument manufacturers, and clinical and basic research scientists to develop a minimum information reporting standard for FCM. This large collaborative group proposed the initial version of the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt). The proposal underwent several stages of consultation by the International Society for Advancement of Cytometry (ISAC) Data Standards Task Force (DSTF), which has been responsible for all FCM-related standards for the past 30 years, such as the FCS (4) data file standard. MIFlowCyt was further reviewed by the broader scientific community, approved by the ISAC DSTF and the ISAC Data Standards Committee, and recently adopted as an ISAC standard by the ISAC Council. MIFlowCyt is also endorsed by the Data Interoperability Steering Committee of the Division of Allergy, Immunology, and Transplantation within the National Institute of Allergy and Infectious Diseases (NIAID), which strongly supports the use of MIFlowCyt and related data format standards.
Throughout the biomedical sciences, minimum information checklists are beginning to find favor with scientists, publishers, and funders alike. Such checklists ensure that descriptions of methods, resulting data, and analyses support the unambiguous interpretation, corroboration, and reuse of data (5). As emphasized by several cross-community integration efforts, e.g., FuGE (6), MIBBI (5), OBI (7), most biological and biomedical investigations have similar design patterns. Typically, they are driven by a hypothesis or purpose, which guides the experimental design. Samples or specimens are prepared and reagents are used to evaluate analytes within the samples. Data are acquired using instruments, followed by data processing and analysis to interpret and conclude the experiment.
To simplify cross-domain comparison and integration, the required minimum information specified by MIFlowCyt (see Supplementary Material) is organized similarly to minimal information requirements for other MIBBI-registered high-throughput biological experiments, i.e., organizing the information by experiment overview, sample/specimen description, instrument details, and data analysis (Table 1). The organization of MIFlowCyt does not specify the order or format in which experiment annotation should be provided, and MIFlowCyt is not intended to replace any additional information already collected within any particular group. The actual information is expected to come from several sources. Research scientists provide detailed information about the experiment overview and design, information about the samples used in the experiment including preparation, treatment, and staining details, and details about data analysis procedures. FCM instruments are expected to provide details about configuration and settings and analytical software details relevant to performed analyses. An example of a MIFlowCyt-compliant experiment annotation is provided as Supplementary Material.
Table 1. Components of a MIFlowCyt-compliant experiment description
| Flow sample (specimen) | Material |
| Data analysis | List-mode data |
| Instrument details | Instrument identification |
MIFlowCyt is explicit in its requirements, following constraint and design recommendations from professional standardization bodies (e.g., www.ietf.org/rfc/rfc2119.txt; http://standards.ieee.org/guides/style/). The word "shall" is used to indicate that a particular item is an absolute requirement of the specification. It identifies information that is absolutely crucial in order to allow for interpretation of the experiment. "Shall" combined with "if relevant" indicates information that is not generally applicable but could impact the interpretation of a particular experiment in certain cases. The word "should" is used to indicate that a particular item is recommended but not required. The normative version of this standard is available as Supplemental Information to this manuscript. The key design considerations for MIFlowCyt are described later.
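The three requirement levels can be made concrete with a small compliance check. The following is a minimal sketch, not part of the normative standard: the item names (`purpose`, `experiment_variables`, `organism_ontology_term`) and the function `missing_required` are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    SHALL = "shall"                           # absolute requirement
    SHALL_IF_RELEVANT = "shall, if relevant"  # required only when applicable
    SHOULD = "should"                         # recommended, not required


@dataclass(frozen=True)
class Item:
    name: str     # hypothetical item label, not taken from the normative text
    level: Level


def missing_required(items, provided, relevant=frozenset()):
    """Return the names of items that must be reported but are absent."""
    missing = []
    for item in items:
        required = item.level is Level.SHALL or (
            item.level is Level.SHALL_IF_RELEVANT and item.name in relevant
        )
        if required and not provided.get(item.name):
            missing.append(item.name)
    return missing


# Illustrative checklist; the real item list is in the normative document.
CHECKLIST = [
    Item("purpose", Level.SHALL),
    Item("experiment_variables", Level.SHALL_IF_RELEVANT),
    Item("organism_ontology_term", Level.SHOULD),
]

annotation = {"purpose": "Compare CD25 expression in smokers and nonsmokers"}
print(missing_required(CHECKLIST, annotation))
print(missing_required(CHECKLIST, annotation, relevant={"experiment_variables"}))
```

In this sketch a "should" item never causes a failure, while a "shall, if relevant" item is enforced only when the annotator declares it relevant to the experiment.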
An experiment is the evaluation of a set of one or more samples with a specific purpose or objective, such as testing a hypothesis or diagnosing a patient. Information about the overall experiment design shall be provided to understand specific experiment details in the proper context. These details shall include a brief description of the purpose of the experiment and a summary of findings including the interpretation of the results or outcomes. It shall be supported by a list of keywords, preferably from a controlled vocabulary (such as MeSH, www.nlm.nih.gov/mesh/), to facilitate quick orientation and cross-experiment comparison.
If relevant, details shall be provided about experimental variables, i.e., attributes that differ between samples within an experiment due to preexisting differences in sample states (conditional variables) or experimental manipulation of the samples (manipulated variables), as they are important in understanding the relationship between samples within the experiment (e.g., smoker versus nonsmoker).
Quality control measures (e.g., replicates, calibrations, control assays) shall be described to support comparative statistical evaluation.
Contact information shall be included for further information requests.
Flow Sample/Specimen Description
A description of the sample (or specimen, as commonly used in the clinical domain) material and its preparation, treatment, and evaluation is crucial for understanding the experiment. MIFlowCyt specifies three types of samples: biological, environmental, and other samples. For all sample types, a description of the sample shall include information about the nature of the sample material (such as peripheral blood, seawater, or dyed plastic beads). For biological samples, a description of the source shall be provided (e.g., blood as the source of mononuclear cells) and the organism from which they were originally derived shall be identified. Ontologies and/or standardized vocabularies, such as NCBI taxonomy (www.ncbi.nlm.nih.gov/Taxonomy), should be used to facilitate correct interpretation.
Sample treatment descriptions shall specify treatment agents and conditions essential for experiment interpretation or cross-experiment querying, comparisons, and analyses. Access to information about the reagents and how they were used is critical for the correct interpretation of data. Reagents in a FCM experiment are conceptually similar to reagents used in other experiment types: they serve the purpose of measuring analytes and have analyte detector and reporter components. In a FCM experiment, the analyte is often a cell surface protein (e.g., CD25), the analyte detector is often an antibody (e.g., anti-CD25) that specifically binds the analyte, and the reporter is typically a fluorochrome (e.g., FITC).
A description of the entity(ies), disposition(s), or process(es) being evaluated (e.g., CD25, apoptosis, membrane permeability) shall be provided whenever there is ambiguity about the analyte that is being measured by the reagent. For example, if propidium iodide (PI) is added to permeabilized cells, it binds to DNA and thus measures DNA content (e.g., for the evaluation of cell cycle status). When PI is added to nonpermeabilized cells, it is taken up selectively by membrane-compromised cells and is thus an indicator of membrane integrity and cell viability. This is an example of a single reagent measuring different attributes depending on the conditions used for the preparation of the experiment sample, which illustrates the importance of capturing these details.
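The analyte, analyte detector, and reporter roles described above can be captured in a simple record. The sketch below is illustrative only: the class `ReagentUse` and the function `describe_pi` are hypothetical names, and the record treats PI as both detector and reporter because the dye itself binds the target and fluoresces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReagentUse:
    analyte: str           # what is actually being measured
    analyte_detector: str  # component that binds the analyte
    reporter: str          # component that produces the signal


def describe_pi(permeabilized: bool) -> ReagentUse:
    """Annotate propidium iodide staining; the analyte depends on preparation."""
    analyte = "DNA content" if permeabilized else "membrane integrity"
    return ReagentUse(
        analyte=analyte,
        analyte_detector="propidium iodide",
        reporter="propidium iodide",
    )


# Antibody-based reagent from the text: anti-CD25 conjugated to FITC.
cd25_stain = ReagentUse(analyte="CD25", analyte_detector="anti-CD25", reporter="FITC")

print(cd25_stain)
print(describe_pi(permeabilized=True).analyte)
print(describe_pi(permeabilized=False).analyte)
```

Recording the analyte explicitly, rather than inferring it from the reagent name, is what removes the ambiguity in the PI case.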
Flow cytometers generally consist of three major subsystems: fluidic, optical, and electronic. Flow cytometers operate by passing a sample-containing fluid stream through a beam of light of a specific wavelength. The scattered and emitted fluorescent light is then collected and transformed to electrical signals, involving a combination of optical and electronic components such as lenses, filters, mirrors, beam splitters, and detectors. The instrument and its configuration and settings directly affect the resulting data. Therefore, it is essential to capture detailed descriptions of selected components of cytometer subsystems in order to properly interpret experimental results.
The instrument manufacturer and model number shall be sufficient if these uniquely specify the required instrument components. Details listed in this section are only required for user-configurable components and cases in which the make and model number are not sufficient to identify the required information. Providing all of the required instrumentation details for configurable components may seem tedious; however, most of these are expected to be provided by instrument manufacturers and the instruments themselves.
The fundamental information linking sample characteristics to data includes excitation wavelengths of light sources and wavelengths passed by each of the filters. However, it has been demonstrated that other details are also critical (8, 9). For example, fluorochrome excitation depends on laser power, material degradation over time influences the performance of optical filters, and photomultiplier tube gain depends strongly on voltage settings. The amount of light, polarization, and the beam size where the particle is interrogated depend significantly on the optics used to transmit the light from its source to the flow cell component of the fluidics subsystem. The amount of light reaching the flow cell is dependent on the number of optical surfaces involved, the optical coatings used, and the efficiency of any optical fibers used. Thus, the type of flow cell shall be specified as it can affect sensitivity by impacting the background and laser power requirements (10). The specification of the full optical path shall include the light source(s), an enumeration of all optical components (e.g., optical filters, beam splitters, mirrors), the detector used, and the characteristic of the signal (e.g., height, width, area) that is digitized into the parameter value.
A fundamental principle guiding the publication of scientific results is that supporting data must be made available in a form that allows for independent evaluation of the conclusions, as reflected in the data and materials availability policies of most journals (11, 12). A common raw data output of a FCM experiment is a list-mode data file, e.g., FCS (4). Ideally, a list-mode data file contains the set of measured parameters (e.g., fluorescence intensities) for each event (e.g., a cell) collected in the sample. These data are crucial for independent reanalysis, and, therefore, either they shall be provided directly or the details shall be stated on how they may be requested.
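As an illustration of how a list-mode file locates its contents, the sketch below reads the fixed-width HEADER of an FCS file. It assumes the FCS 3.x HEADER layout (a 6-byte version string, 4 spaces, then six 8-byte ASCII offsets for the TEXT, DATA, and ANALYSIS segments); the function name `parse_fcs_header` and the example offsets are hypothetical, and a complete reader would also need to parse the TEXT segment and decode the DATA segment.

```python
SEGMENTS = ("text", "data", "analysis")


def parse_fcs_header(header: bytes) -> dict:
    """Read the version string and segment offsets from an FCS HEADER."""
    if len(header) < 58:
        raise ValueError("an FCS 3.x HEADER is at least 58 bytes long")
    version = header[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError("not an FCS file: " + repr(version))
    result = {"version": version}
    position = 10  # bytes 6-9 are spaces
    for segment in SEGMENTS:
        for edge in ("begin", "end"):
            field = header[position:position + 8].decode("ascii").strip()
            # A blank field is treated as 0 (segment absent) in this sketch.
            result[f"{segment}_{edge}"] = int(field) if field else 0
            position += 8
    return result


# Synthetic header for illustration only; the offsets are made up.
example_header = b"FCS3.0" + b" " * 4 + b"".join(
    str(offset).rjust(8).encode("ascii")
    for offset in (58, 1023, 1024, 9023, 0, 0)
)
info = parse_fcs_header(example_header)
print(info["version"], info["data_begin"], info["data_end"])
```

With a real file, the first 58 bytes could be obtained with `open(path, "rb").read(58)` and passed to the same function.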
Fluorescence compensation is a data transformation process used to subtract the fluorescence signal due to one fluorochrome from the fluorescence signal due to another fluorochrome when the fluorochromes have overlapping emission spectra. Compensation can be performed after signal detection but before digitization (“hardware compensation”), or after the data have been collected (“software compensation”). Mathematically, compensation is a straightforward application of linear algebra describable by a compensation matrix, which is often part of the FCS data files. The type of compensation shall be described (e.g., no compensation, hardware compensation, computed compensation) and the spillover or compensation matrix shall be provided whenever available.
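The linear algebra behind software compensation can be sketched for two fluorochromes. The example assumes one common convention, in which `spillover[i][j]` is the fraction of fluorochrome i's signal seen in detector j, so that observed = true × spillover and the compensation matrix is the inverse of the spillover matrix. The function names and the numeric values are hypothetical.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate(event, spillover):
    """Recover per-fluorochrome signals from one observed two-detector event."""
    comp = invert_2x2(spillover)  # compensation matrix
    # Row vector times matrix: true = observed x inverse(spillover).
    return [
        event[0] * comp[0][0] + event[1] * comp[1][0],
        event[0] * comp[0][1] + event[1] * comp[1][1],
    ]


# Made-up spillover: 20% of dye 1 appears in detector 2,
# and 10% of dye 2 appears in detector 1.
spillover = [[1.0, 0.2],
             [0.1, 1.0]]

# Observed values produced by true signals of 1000 (dye 1) and 500 (dye 2).
observed = [1000 * 1.0 + 500 * 0.1, 1000 * 0.2 + 500 * 1.0]
print([round(value, 6) for value in compensate(observed, spillover)])
```

Reporting the matrix itself, as the standard requires, lets a third party repeat or undo this transformation on the list-mode data.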
Typically, FCM data are analyzed in the context of gates. Gating is a data-filtering process by which populations of events are defined based on parameter value subset combinations. Gating descriptions shall be clear and unambiguous, preferably by mathematical descriptions of each gate boundary. It is important to define qualitative gate descriptors as well as to state denominators used to calculate percentages of events within a gate. The overall gating strategy shall be traceable up to the set of all events.
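A gating description that meets these requirements can be sketched as rectangular gates with explicit boundaries, an explicit parent, and an explicit denominator. This is a minimal illustration, not a gating standard: the class `Gate`, the parameter names (`FSC-A`, `SSC-A`, `FITC-A`), the boundaries, and the events are all made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gate:
    name: str
    bounds: dict                     # parameter -> (min inclusive, max exclusive)
    parent: Optional["Gate"] = None  # None means the gate is set on all events

    def contains(self, event) -> bool:
        """An event is in the gate only if it is also in every ancestor gate."""
        if self.parent is not None and not self.parent.contains(event):
            return False
        return all(lo <= event[p] < hi for p, (lo, hi) in self.bounds.items())

    def lineage(self) -> str:
        """Trace the gating strategy back to the set of all events."""
        prefix = self.parent.lineage() if self.parent else "all events"
        return prefix + " > " + self.name

    def percent_of_parent(self, events):
        """Return (count in gate, denominator, percentage of the parent gate)."""
        inside = sum(1 for e in events if self.contains(e))
        if self.parent is None:
            denominator = len(events)
        else:
            denominator = sum(1 for e in events if self.parent.contains(e))
        percent = 100.0 * inside / denominator if denominator else 0.0
        return inside, denominator, percent


events = [
    {"FSC-A": 50000, "SSC-A": 20000, "FITC-A": 1500},
    {"FSC-A": 52000, "SSC-A": 21000, "FITC-A": 80},
    {"FSC-A": 5000, "SSC-A": 900, "FITC-A": 20},
    {"FSC-A": 48000, "SSC-A": 18000, "FITC-A": 2200},
]
lymphocytes = Gate("lymphocytes", {"FSC-A": (30000, 70000), "SSC-A": (10000, 30000)})
cd25_pos = Gate("CD25+", {"FITC-A": (1000, 262144)}, parent=lymphocytes)

print(cd25_pos.lineage())
print(lymphocytes.percent_of_parent(events))
print(cd25_pos.percent_of_parent(events))
```

Returning the denominator alongside the percentage makes it unambiguous whether a value such as "CD25+ cells" is reported relative to the parent gate or to all events.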
The importance of minimal data reporting standards is illustrated by both community-driven standards such as MIAME (13), MIRIAM (14), MISFISHIE (15), and the CONSORT checklist (16), and cross-community efforts including MIBBI (5), FuGE (6), and OBI (7). MIFlowCyt fills a recognized gap in biomedical data reporting standards by outlining the minimum information required to interpret FCM experiments, understand the conclusions reached, and make comparisons to experiments performed by different laboratories. Nebulous standards and guidelines lead to alternative interpretations resulting in nonuniform, inconsistent reporting (17). MIFlowCyt follows the minimal specification design and overall goals as specified by MIBBI, and it clearly and explicitly identifies the minimum required information for reporting FCM experiments. It communicates the minimum threshold for data exchange and study interpretation, enumerating all of the items required to be reported and the depth of required details. Through MIBBI, MIFlowCyt also aims to contribute to the development of a shared checklist focused on the common experiment components shared among all communities.
MIFlowCyt represents the first crucial step in our efforts to develop universal solutions for representing, collecting, annotating, archiving, analyzing, and disseminating FCM data. Such a minimum information standard sets the stage for the integration of data repositories and data analysis tools. MIFlowCyt describes the minimum information content required to interpret a FCM experiment, independent of the form in which the information is provided. While specification of the form (i.e., data formats) is also essential to facilitate accurate exchange and reuse of experiment data and development of new analytical tools (17, 18), these needs are being addressed through separate, coordinated efforts (19, 20). For example, a specification for the provision of gating details in XML has been accepted by ISAC's DSTF for consideration as the basis for an eventual standard (http://flowcyt.sourceforge.net). Adoption of MIFlowCyt and related standards by Laboratory Information Management Systems (LIMS) would ensure effective interoperability between investigators and other software applications.
The development of MIFlowCyt has been coordinated with submission of applicable FCM-specific terms into OBI and other related ontologies, enabling the potential for structured queries and automated data analysis. Through FICCS (Flow Informatics and Computational Cytometry Society; www.ficcs.org), coordinated efforts have been initiated to develop a data model based on MIFlowCyt by extending FuGE. Publicly available FCM data files are commonly attached as supplementary materials to publications—an unreliable method that does not preserve the data files (17). Therefore, as MIAME guidelines led to expansion of public microarray data repositories such as ArrayExpress (www.ebi.ac.uk/arrayexpress) and GEO (www.ncbi.nlm.nih.gov/geo), we expect MIFlowCyt to influence FCM in a similar way. For example, the Immunology Database and Analysis Portal (www.immport.org) has adopted the MIFlowCyt framework for the capture of FCM data from investigators funded by the Division of Allergy, Immunology, and Transplantation of the NIAID. In addition, efforts are underway under the auspices of ISAC to develop a public data repository for FCM data based on MIFlowCyt.
MIFlowCyt is an ISAC standard stating the minimum information required to report FCM experiments. The normative version of this standard and an example of a MIFlowCyt-compliant experiment annotation are available as Supplemental Information to this manuscript and from www.isac-net.org/, http://flowcyt.sourceforge.net/, and http://mibbi.sourceforge.net/. Broad adoption of MIFlowCyt by research scientists, clinicians, instrument manufacturers, software developers, and publishers will foster scientific collaboration. It will preserve the integrity and quality of FCM data, which will significantly contribute to the advancement of biomedical sciences by enabling scientists to build upon previous findings.
R.R.B. is a Michael Smith Foundation for Health Research Scholar and an ISAC Scholar. The authors thank the International Society for the Advancement of Cytometry, the Flow Informatics and Computational Cytometry Society, and the Data Interoperability Steering Committee of the Division of Allergy, Immunology, and Transplantation within the National Institute of Allergy and Infectious Diseases for their support and contribution to this effort. The authors also thank James Wood for chairing the ISAC adoption process.