Cytochemical flow analysis of intracellular G6PD and aggregate analysis of mosaic G6PD expression

Abstract

Background: Medicines that exert oxidative pressure on red blood cells (RBCs) can cause severe hemolysis in patients with glucose-6-phosphate dehydrogenase (G6PD) deficiency. Because of X-chromosome inactivation, females heterozygous for G6PD (1 allele encoding a G6PD-deficient protein and the other a normal protein) produce 2 RBC populations, each expressing exclusively 1 allele. This G6PD mosaic is not captured by routine G6PD tests.

Methods: An open-source software tool for interpreting G6PD cytofluorometric data is described. The tool expresses data as % bright RBCs, that is, cells with normal G6PD activity. It was applied to 242 specimens, including 89 from heterozygous females, collected from 2 geographically and ethnically distinct populations: an African American cohort (USA) and a Karen and Burman ethnic cohort (Thailand).

Results: The tool allowed comparison of data across 2 laboratories and both populations. Hemizygous males and homozygous females cluster within narrow ranges of % bright cells. Mean values are 96% (normal) and 6% (deficient) in males, and 97% (normal) and 2% (deficient) in females. Heterozygous females show a distribution of 10%-85% bright cells, with a mean of 50%. The distributions are associated with the severity of the G6PD mutation.

Conclusions: Consistent cytofluorometric G6PD analysis facilitates interlaboratory comparison of cellular G6PD profiles and contributes to understanding primaquine-associated hemolytic risk.

high doses of primaquine required for radical cure. Consequently, the WHO recommends testing for G6PD deficiency, when possible, before administering curative doses of primaquine. [5] Glucose-6-phosphate dehydrogenase (G6PD) deficiency is an inherited, X-linked trait. [6,7]

Hemizygous males and homozygous females are either severely G6PD deficient or normal. The status depends on whether they carry wild-type alleles or alleles encoding a G6PD enzyme with compromised stability and activity. Heterozygous females, with 1 normal allele and 1 mutated allele, present with a broader range of G6PD activity. During embryonic development, one of the 2 X chromosomes is randomly inactivated in each cell (lyonization). As a result, females have populations of red blood cells that express G6PD deficiency in fixed proportions, typically ranging from 20% to 80%. [7,8]

Safe case management of P. vivax with 8-aminoquinolines requires knowledge of the patient's G6PD status to prevent severe hemolytic anemia. When G6PD activity is too low, 8-aminoquinolines should not be administered. Several quantitative and qualitative assays are available for diagnosing G6PD deficiency by measuring residual G6PD activity in whole blood. [9,10] Qualitative tests cannot, however, stratify women with intermediate activity above 30%-40% of normal G6PD activity. Quantitative tests, in turn, provide no information on the relative representation of the 2 alleles in a heterozygous female's red blood cell population.
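The limitation of whole-blood quantitative assays can be made concrete with a simple mixing calculation. The sketch below assumes, purely for illustration, that aggregate activity is the cell-fraction-weighted average of the activities of the 2 RBC populations. The function name and all numbers are hypothetical and in arbitrary units, not measured values.

```python
def whole_blood_activity(frac_normal, normal_activity, deficient_activity):
    """Aggregate activity under an assumed linear mixing model (arbitrary units).

    frac_normal: fraction of RBCs expressing the normal allele (0..1).
    """
    return frac_normal * normal_activity + (1.0 - frac_normal) * deficient_activity


# Hypothetical mosaic A: half the cells normal, deficient cells with no residual activity.
mosaic_a = whole_blood_activity(0.50, 12.0, 0.0)
# Hypothetical mosaic B: a quarter of the cells normal, deficient cells with some residual activity.
mosaic_b = whole_blood_activity(0.25, 12.0, 4.0)
print(mosaic_a, mosaic_b)  # prints 6.0 6.0
```

Both hypothetical mosaics yield the same aggregate value. A single whole-blood measurement therefore cannot tell them apart, whereas a per-cell measurement can.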
Cytochemical staining of red blood cells, followed by observation with either microscopy or flow cytometry, is the only way to observe the mosaic red blood cell population in heterozygous females. [11-17] Cytofluorometric assays offer an opportunity to determine relative G6PD activity at the level of the individual red blood cell. Recently developed complementary methodologies make these assays more robust for wider use. [18,19] It is thus possible to identify females with distinct populations of RBCs expressing either a G6PD-deficient or a normal allele, and thereby to measure intermediate G6PD activity in heterozygous females. So far, however, there has been no method to standardize this process.

Arbitrary, operator-set thresholds in cytofluorometric assays are subjective and influence how the percentage of G6PD-normal cells is designated. Here, a new software tool is presented that automates analysis of the percentage of G6PD-normal cells and removes operator and instrument variation. The tool is Web based, requires no programming skills, and can be used to standardize interpretation of cytofluorometric data. Additionally, 2 datasets, 1 from Thailand and 1 from the USA, are used to demonstrate the utility of the tool for determining G6PD heterozygosity in females. The associations between mosaic profiles, G6PD activity, and hemoglobinopathies have been published separately. [12,20-22]
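Why operator-set gates matter can be illustrated with synthetic data. In the sketch below, the profile (2 overlapping Gaussian populations on a 0-1000 scale), the gate positions and the function name are hypothetical. For the same specimen, moving a single manually placed gate changes the reported % bright cells substantially.

```python
import random


def percent_above(values, gate):
    """Percentage of events whose fluorescence exceeds a manually placed gate."""
    return 100.0 * sum(v > gate for v in values) / len(values)


random.seed(7)
# Hypothetical heterozygous profile: overlapping dim and bright RBC populations (0-1000 scale).
sample = ([random.gauss(200, 90) for _ in range(5000)]
          + [random.gauss(600, 120) for _ in range(5000)])

# Three plausible operator choices for the dim/bright gate on the same specimen.
results = {gate: percent_above(sample, gate) for gate in (300, 400, 500)}
for gate, pct in results.items():
    print(f"gate {gate}: {pct:.1f}% bright")
```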

| Human subjects research and specimen handling
US donor blood specimens were obtained through Bioreclamation, Inc. (Westbury, NY, USA). They were collected between January 2012 and January 2016 from volunteers who were at least 18 years of age and who signed consent under institutional review board protocol 2010-017 of the Schulman IRB (Cincinnati, OH, USA). All donors were of African American origin and presented at a recruitment center in New York, USA. Specimens were transported on cold packs in ethylenediaminetetraacetic acid (EDTA) anticoagulant venipuncture vacuum tubes and were stored at 4°C. Specimens were processed 2 to 4 days after blood collection. Tests for a given comparison were typically conducted on the same day for each blood sample. No personal identification data were collected, and all G6PD assays were performed independently and blinded to G6PD status.

[…] Ethics Committee (REC). The protocol was also reviewed by the Community Advisory Board at SMRU, which is composed of representatives from the communities served by SMRU. Volunteers who met the inclusion criteria underwent a detailed informed consent process and provided written consent before enrolling in the study.
In total, specimens from 242 volunteer donors were included in the analysis reported here: 97 from the USA, of which 49 were females and 48 were males, and 145 from Thailand, of which 95 were females and 50 were males.

| G6PD genotyping
The G6PD sequence of all specimens used in this report was confirmed by DNA sequencing as described previously. [12]

| Cytofluorometry
Whole blood specimens were characterized for intracellular G6PD activity by flow cytometry as described previously. [15] Ten microliters of a 50% hematocrit red blood cell suspension were diluted into 90 μL of 0.9% NaCl and combined with 100 μL of sodium nitrite […] RBCs per replicate in the FL1 channel (533 ± 30 nm).

| Software tool development
The mathematical algorithm and graphical user interface (GUI) of the analysis tool were developed, and the tool was validated, using the statistical software R (http://www.r-project.org/; R Foundation for Statistical Computing, Vienna, Austria). Custom scripts were written to implement the algorithm and the GUI. The R flowCore package was used to import flow cytometry data, and the R shiny package was used to build the GUI. The flowCore package was developed within the Bioconductor project (WA, USA). [23] Shiny is a Web application framework for the R programming environment. [24] The G6PD flow data analysis tool, named mosaic G6PD flow, is freely accessible to all users on the Shiny server. [25]

| Statistical methods
All statistical analyses were conducted in the open-source statistical computing language R (http://www.r-project.org/). The sensitivity and specificity of the cytofluorometric assay were calculated using DNA sequencing as the reference standard.
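For reference, both metrics follow from a 2 × 2 table of assay calls against the sequencing result: sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The following is a minimal sketch. The function name and the toy labels are hypothetical, and the way the original R analysis binarized the flow results is not specified in the text.

```python
def sensitivity_specificity(predicted, reference):
    """Sensitivity and specificity of a binary classification against a reference standard.

    predicted, reference: equal-length sequences of booleans (True = condition present,
    e.g. heterozygous by the flow assay vs. heterozygous by DNA sequencing).
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 4 sequencing-confirmed heterozygotes and 6 non-heterozygotes.
reference = [True] * 4 + [False] * 6
predicted = [True, True, True, False] + [False] * 5 + [True]
sens, spec = sensitivity_specificity(predicted, reference)
print(sens, spec)
```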

| Genotypic characterization of blood specimens
The G6PD alleles of all specimens were sequenced; a summary is provided in Table 1. In the US African American sample, 31 of 97 donors were confirmed heterozygous females. In the Thai sample set, 58 of 145 specimens were confirmed heterozygous females.

In the Thai set, all heterozygous females except 3 carried a Mahidol G6PD-deficient allele. The remaining 3 carried the Mediterranean allele, the Kaiping allele, and a newly identified allele named "Shoklo." One homozygous deficient woman carried 2 different mutations (Mahidol and Orissa), and the remaining homozygous deficient women carried Mahidol. Likewise, all hemizygous G6PD-deficient males carried Mahidol except one, who carried Viangchan. [21]

| Characterization of blood specimens by G6PD enzyme activity assay
All specimens were characterized for G6PD activity with the quantitative G6PD kit from Trinity Biotech. Descriptive statistics of G6PD activity for the different genotypes are provided in Table 2. The association between G6PD activity and various hematological characteristics is described elsewhere. [21]

| G6PD-normal to G6PD-deficient red blood cell ratio calculation
The cytofluorometric method allows observation of mosaic red blood cell populations in specimens from females by measuring G6PD activity in individual erythrocytes. An algorithm was developed to standardize interpretation of cytofluorometric data. [25] The algorithm is summarized in the following steps (Figure 1):
Step 1: The correct amplification mode (log or linear) is selected, and the correct channel numbers are assigned to FL1, forward scatter (FSC), and side scatter (SSC).
Step 2: QC. After import of the .fcs file, the lowest and highest 5% of FSC and SSC values are truncated. This helps remove unhealthy cells from the sample population (Figure 1A).
Step 3: Standardization. A scaling algorithm is applied to the data to account for variation between cytometers due to differences in gain and amplification settings. All new data are adjusted for gain and amplification effects to fit a standard fluorescence range of 0 to 1000.
Step 4: Kernel density estimation. For each set of remaining FITC values, the distribution of intensity is estimated by kernel density estimation with a Gaussian kernel. The resulting histogram data are converted into a probability density function for analysis of features. An algorithm searches for local maxima corresponding to dim and bright peaks (Figure 1B).
Step 5: Data smoothing. A smoothing function is applied to the distribution of intensity obtained from kernel density estimation. Smoothing removes small artifacts in the data and leaves only the major features of the distribution (Figure 1C).
Step 6: Thresholding. A hard cutoff threshold is applied to the standardized and smoothed data. Where local maxima lie too close to one another, smaller secondary peaks within a 15% margin are not counted as separate (dim or bright) peaks (Figure 1D).
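Steps 2 to 6 can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the tool's R code. The following choices are all assumptions:

- rescaling to 0-1000 by a nominal instrument maximum;
- a fixed kernel bandwidth of 25;
- a moving-average smoother;
- a 5% relative height cutoff;
- reading the 15% margin as 15% of the 0-1000 range (150 units).

The final split at the density minimum between the outermost peaks is a hypothetical stand-in for how % bright is computed. All function names are hypothetical, and Step 1 is represented only by the fixed (FSC, SSC, FL1) tuple order.

```python
import math
import random


def trim_scatter(events, frac=0.05):
    """Step 2 (QC): drop events whose FSC or SSC lies in the lowest or highest `frac`
    of that channel. `events` is a list of (fsc, ssc, fl1) tuples."""
    n = len(events)
    limits = []
    for ch in (0, 1):  # 0 = FSC, 1 = SSC
        vals = sorted(e[ch] for e in events)
        limits.append((vals[int(frac * n)], vals[int((1 - frac) * n) - 1]))
    return [e for e in events
            if all(lo <= e[ch] <= hi for ch, (lo, hi) in enumerate(limits))]


def rescale(fl1, instrument_max):
    """Step 3 (assumed form): map raw FL1 values from 0..instrument_max onto 0..1000."""
    return [1000.0 * v / instrument_max for v in fl1]


def gaussian_kde(values, grid, bandwidth):
    """Step 4: Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]


def moving_average(y, half_window=2):
    """Step 5: moving-average smoothing; the window shrinks at the edges."""
    out = []
    for i in range(len(y)):
        seg = y[max(0, i - half_window): i + half_window + 1]
        out.append(sum(seg) / len(seg))
    return out


def find_peaks(grid, dens, min_rel_height=0.05, margin=150.0):
    """Step 6: keep local maxima above a hard height cutoff; a smaller peak lying
    within `margin` of a taller one is not counted as a separate peak."""
    cutoff = min_rel_height * max(dens)
    candidates = [i for i in range(1, len(dens) - 1)
                  if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1] and dens[i] >= cutoff]
    kept = []
    for i in sorted(candidates, key=lambda k: dens[k], reverse=True):
        if all(abs(grid[i] - grid[j]) > margin for j in kept):
            kept.append(i)
    return sorted(kept)


def percent_bright(fl1_raw, instrument_max, bandwidth=25.0):
    """Hypothetical final step: split at the density minimum between the outermost
    peaks. Returns (peak positions, % bright), or (peaks, None) if unimodal."""
    x = rescale(fl1_raw, instrument_max)
    grid = [5.0 * k for k in range(201)]  # 0, 5, ..., 1000
    dens = moving_average(gaussian_kde(x, grid, bandwidth))
    peaks = find_peaks(grid, dens)
    if len(peaks) < 2:
        return [grid[i] for i in peaks], None
    split = min(range(peaks[0], peaks[-1] + 1), key=lambda k: dens[k])
    bright = sum(1 for v in x if v > grid[split])
    return [grid[i] for i in peaks], 100.0 * bright / len(x)


# Synthetic heterozygous-like specimen: about half dim, half bright events.
random.seed(1)
events = []
for _ in range(2000):
    fl1 = random.gauss(150, 40) if random.random() < 0.5 else random.gauss(700, 60)
    events.append((random.gauss(500, 80), random.gauss(300, 50), fl1))
kept = trim_scatter(events)
peaks, pct = percent_bright([e[2] for e in kept], instrument_max=1000.0)
print(peaks, pct)
```

Splitting at the valley between peaks, rather than at a fixed fluorescence value, keeps the synthetic example independent of where the 2 populations happen to sit on the scale. Whether the tool does the same is not stated in the text.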

| The Web browser-accessible software tool for G6PD cytofluorometric data normalization and interpretation
A publicly available, Web browser-based software tool was developed so that users running the same cytofluorometric assay can analyze data in the same way. [15] The resulting graphical user interface (GUI) contains a sidebar and 2 tabs, as shown in Figure 2.

| Relative portions of normal to deficient cells across 2 diverse populations
The normalization process for G6PD cytofluorometric data described above was applied to data collected from donors on the Thailand/Myanmar border and in New York, USA. All specimens were collected from healthy adults and were accompanied by DNA sequence and reference quantitative G6PD activity data. Genotype and phenotype (by the quantitative G6PD assay) were 100% concordant (data not shown). The distributions of % bright cells from the standardized flow data analysis for the different genotypes are shown in Figure 3 and described in Table 3.

FIGURE 1 Process for standardized interpretation of cytofluorometric intra-red blood cell glucose-6-phosphate dehydrogenase (G6PD) data. Data processing is shown for a heterozygous specimen (panels A and B) and a normal hemizygous male (panels C and D). A, Removal of the lower and upper 5% of forward scatter (FSC) and side scatter (SSC) values; the empirical cumulative distribution functions of FSC and SSC for a clinical specimen are shown. B, After normalization of the data and generation of kernel density estimates, an algorithm is applied to identify peak maxima associated with dim (G6PD-deficient) and bright (G6PD-normal) red blood cells. C, Data are smoothed to remove small artifacts; the normalized intensity versus frequency for the FL1 channel is shown pre- and postsmoothing. D, A hard cutoff threshold is applied to the standardized and smoothed data to allow for only maximal peak […]

Box plots of the distributions of % bright cells observed per specimen per G6PD genotype, showing minimum and maximum (whiskers), 1st and 3rd quartiles (boxes), and means. The distributions are shown only for genotypes with more than 1 representative specimen in the sample set. The statistics are described in Table 3.

[…] homozygous normal (+1/+1) and heterozygous normal (+1/+2), and (5) female deficient, including homozygous deficient (−1/−1) and heterozygous deficient (−1/−2). Similarly, the male samples cluster clearly by deficient and normal phenotypes. To confirm this more rigorously, k-means clustering with 1 to 6 clusters was applied, and the within-cluster variance was examined as a function of the number of clusters (Figure 4B). By the elbow method, the optimal number of clusters is 3 for females and 2 for males. When the female and male data are partitioned into 3 and 2 clusters, respectively, the data cluster roughly as expected by G6PD phenotype (Figure 4C). This indicates that % bright cells may be a suitable metric for determining intermediate G6PD activity phenotypes. An alternative approach would be to set thresholds in % bright cells visually from the data and to apply these thresholds as a categorization method.
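The elbow analysis can be sketched on 1-dimensional % bright values with a plain Lloyd's k-means. The quantile-spaced starting centers, the function name and the toy values below are assumptions for illustration. They are not the study data or the R routine behind Figure 4.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on 1-D data with quantile-spaced starting centers.
    Returns (centers, labels, within-cluster sum of squares)."""
    xs = sorted(values)
    centers = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]

    def nearest(v):
        # Index of the closest current center (reads the latest `centers`).
        return min(range(k), key=lambda c: (v - centers[c]) ** 2)

    for _ in range(iters):
        labels = [nearest(v) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    labels = [nearest(v) for v in values]
    wcss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return centers, labels, wcss


# Hypothetical % bright values for 16 female specimens (not study data).
female = [1, 2, 2, 3, 4, 30, 45, 50, 55, 60, 70, 95, 96, 97, 98, 99]
curve = {k: kmeans_1d(female, k)[2] for k in range(1, 7)}
for k, w in curve.items():
    print(k, round(w, 1))
```

Plotting `curve` against k and looking for the point where further clusters stop reducing the within-cluster sum of squares sharply is the elbow criterion. With these toy values, the drop from k = 1 to k = 3 is large.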

| Interpretation of normalized cytofluorometric data to determine allele composition
In this scenario, a female sample with 10% to 85% bright cells may be characterized as having intermediate G6PD activity. Preliminary visual thresholds for determining G6PD activity are tabulated in Table 4.
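A minimal sketch of threshold-based categorization for female specimens follows. Only the 10%-85% intermediate band comes from the text. The labels below 10% (deficient) and above 85% (normal), the inclusive boundaries, and the function name are assumptions for illustration.

```python
def classify_female(pct_bright, low=10.0, high=85.0):
    """Hypothetical categorization of a female specimen by % bright cells.
    Only the intermediate band (low..high, inclusive) is taken from the text."""
    if pct_bright < low:
        return "deficient"     # assumed label
    if pct_bright > high:
        return "normal"        # assumed label
    return "intermediate"


for value in (2.0, 50.0, 97.0):
    print(value, classify_female(value))
```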

| DISCUSSION
DNA sequencing is the only way to determine definitively the G6PD allele composition in both males and females. However, sequencing can reliably predict the phenotype only in males and in females with either 2 normal or 2 deficient G6PD alleles. Sequencing cannot provide any phenotypic information in the […]

Table 3 note: The statistics are shown only for the genotypes with more than 1 representative specimen in the sample set. This distribution is represented in Figure 3.

Two hundred forty-two blood samples, collected from 2 geographically distinct donor populations and processed in 2 distinct laboratories, were analyzed for G6PD status by DNA sequencing, G6PD enzyme activity, and intracellular RBC G6PD activity by flow cytometry. The comparison of enzyme activity with genotype and intracellular RBC G6PD activity profiles has been described elsewhere. [12]

Three factors may explain why some females who are G6PD normal by genotype are categorized as heterozygous by the flow data:

- genuine biology, in that these women may have a relatively high percentage of G6PD-deficient red blood cells;
- compromised specimen integrity, since the flow assay has been observed to be highly sensitive to how the specimen is handled after collection;
- limitations of the tool in analyzing the flow data.

Formulations have been developed to improve specimen handling for cytofluorometric G6PD assays. [18,19] Further data across laboratories and G6PD genotypes will need to be analyzed to validate the thresholds described here.
The % bright cell output is not an exact enumeration of the relative representation of the 2 alleles in the red blood cell population of heterozygous females. Specimens from deficient subjects also contain some red blood cells with normal G6PD levels, which contribute to the bright cell portion. Conversely, G6PD-normal subjects have old red blood cells with low intracellular G6PD, which fall in the dim cell portion.

The Mahidol allele is considered to confer a more severe G6PD deficiency phenotype than the A− allele. Correspondingly, the cell profiles of Mahidol hemizygous deficient males appear more polarized toward low percentages of G6PD bright cells than those of the A− population. The same is observed for females heterozygous for the Mahidol allele versus the A− allele. However, neither difference was significant by t test (P = 0.2 and P = 0.052, respectively).
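The bias described above can be expressed as a simple mixture. This is a hedged illustration, not a method used in this study. Suppose a fraction f of RBCs carries the normal allele, and normal-allele and deficient-allele cells are scored bright at rates b_N and b_D. The observed fraction of bright cells is then f·b_N + (1 − f)·b_D, so f = (observed − b_D)/(b_N − b_D). The sketch below uses the hemizygous male means from the abstract (96% and 6%) as stand-ins for b_N and b_D. That choice, the clamping to [0, 1], and the function name are assumptions.

```python
def corrected_normal_fraction(pct_bright, pct_bright_normal=96.0, pct_bright_deficient=6.0):
    """Hypothetical mixture correction: estimate the fraction of RBCs expressing the
    normal allele, assuming each population yields bright cells at the rate seen in
    hemizygous males. The result is clamped to [0, 1]."""
    f = (pct_bright - pct_bright_deficient) / (pct_bright_normal - pct_bright_deficient)
    return min(1.0, max(0.0, f))


print(corrected_normal_fraction(51.0))  # prints 0.5
```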
In conclusion, a publicly available, Web browser-based tool that requires no programming skills has been developed to standardize analysis of cytofluorometric G6PD data. This tool allows interlaboratory data comparison and broader cross-geographical and genotypic analysis of G6PD deficiency, especially in females. [21] An important next step is to relate these phenotypic data to the risk of hemolysis on exposure to oxidative stress, such as primaquine and its metabolites.

CONSENT FOR PUBLICATION
Not applicable.

AVAILABILITY OF DATA AND MATERIAL
All DNA sequence data have been made available in NCBI GenBank. The software tool described in this article is publicly available at https://mkalnoky.shinyapps.io/MosaicG6PDflow/.