Exploring the genomic basis of early childhood caries: a pilot study

Objective A genetic component in early childhood caries (ECC) is theorized, but no genome‐wide investigations of ECC have been conducted. This pilot study is part of a long‐term research program aimed to: (1) determine the proportion of ECC variance attributable to the human genome and (2) identify ECC‐associated genetic loci. Methods The study's community‐based sample comprised 212 children (mean age=39 months; range = 30–52 months; males = 55%; Hispanic/Latino = 35%, African‐American = 32%; American Academy of Pediatric Dentistry definition of ECC prevalence = 38%). Approximately 2.4 million single nucleotide polymorphisms (SNPs) were genotyped using DNA purified from saliva. A P < 5 × 10−8 criterion was used for genome‐wide significance. SNPs with P < 5 × 10−5 were followed‐up in three independent cohorts of 921 preschool‐age children with similar ECC prevalence. Results SNPs with minor allele frequency ≥5% explained 52% (standard error = 54%) of ECC variance (one‐sided P = 0.03). Unsurprisingly, given the pilot's small sample size, no genome‐wide significant associations were found. An intergenic locus on 4q32 (rs4690994) displayed the strongest association with ECC [P = 2.3 × 10−6; odds ratio (OR) = 3.5; 95% confidence interval (CI) = 2.1–5.9]. Thirteen loci with suggestive associations were followed‐up – none showed evidence of association in the replication samples. Conclusion This study's findings support a heritable component of ECC and demonstrate the feasibility of conducting genomics studies among preschool‐age children.


Introduction
Early childhood caries (ECC) is a persistent and possibly growing public health problem. In the USA, recent data suggest that one in four children aged 6 years or younger has experienced ECC when defined according to the American Academy of Pediatric Dentistry (AAPD) criteria [the presence of one or more decayed (non-cavitated or cavitated lesions), missing (due to caries), or filled tooth surfaces in any primary tooth in a child under the age of six] 1. Moreover, recent evidence suggests that ECC prevalence may be increasing, particularly among ethnic minority and socially disadvantaged populations, leading to substantial disparities in children's oral health 2. Joint efforts of professional, academic, community, and policy stakeholders are focused on addressing this important health problem 3, which tends to disproportionately affect children in families from lower socioeconomic strata 4.
From a pathogenetic standpoint, dental caries results from complex interactions among acid-producing members of the biofilm, fermentable carbohydrates, and many host factors, including susceptible tooth surfaces and saliva. For this reason, caries has been thought to be largely modulated by behavioural and environmental risk factors, such as diet and fluoride exposure. The disease is associated with substantial functional, quality of life, and economic costs, whereas restorative care is not curative and often fails to arrest the disease process 5. Since the late 1950s, dental caries has been shown to have a substantial genetic component 6-8. Estimates of the proportion of disease variance explained by genetics (often referred to as "heritability") have ranged from 30% to 70%, with higher estimates found for primary versus permanent dentition caries 9-12. Numerous candidate-gene studies have since been conducted to investigate the postulated role of several hypothesized genes in caries aetiology in children and adults 13. Studies in this body of literature have largely targeted enamel development and mineralization genes, as well as genes involved in the immune response in early childhood. As reviewed by Vieira and colleagues 12, these studies have had mixed results, and currently no consensus knowledge of the genetic basis of ECC exists. This is not surprising, given the very small sample size of most dental caries investigations (typically up to a few hundred subjects) compared to genomics studies conducted for other common diseases and traits, which frequently include upwards of 50,000-100,000 participants.
Genome-wide association studies (GWAS) have been successful in identifying associations between common genetic variants [primarily single nucleotide polymorphisms (SNPs)] and several common diseases or traits, including asthma, diabetes, colorectal cancer, cardiovascular, and psychiatric conditions 14. The advent of GWAS has enabled the 'unbiased' scan (i.e., a hypothesis-free exploration) of millions of SNPs across the human genome in an efficient manner. It is anticipated that GWAS will illuminate the contribution of genomics in oral health and care 15, although progress to date has been slow. Only one GWAS of caries in the primary dentition has been conducted, using a sample of 1300 European-American (white) children aged 3-12 years old 16. This study identified several genetic loci with suggestive evidence of association that could have plausible biological roles in childhood caries, but found no genome-wide statistically significant associations. To address the knowledge gap of genomics in ECC, we have embarked on a long-term research programme aimed to study the genetic underpinnings of ECC among a large sample of community-based preschool-age children. Here, we present the results of a pilot GWAS of ECC (i.e., among children aged 71 months old or younger), conducted in a multi-racial/ethnic sample of preschool-age children enrolled in a community-based study of childhood oral health. As part of this pilot and feasibility study, we estimated the heritability of ECC and attempted to replicate loci with suggestive associations in the discovery sample in three external cohorts of preschool-age children.

Materials and methods
This pilot GWAS was conducted using DNA extracted from saliva samples collected from a multi-ethnic sample (Table 1) of 212 low-income preschool children (ages 2-4) enrolled in the Zero-Out Early Childhood Caries (ZOE) study (UNC-Chapel Hill IRB #08-1185) previously reported by Barakat et al. 17 The planned recruitment for a large-scale GWAS of ECC is approximately 6000 children enrolled in Head Start centres across North Carolina, currently undertaken as the ZOE 2.0 study. Detailed enrolment procedures and the inclusion and exclusion criteria of the ZOE study are reported in Born et al. 18 Briefly, participating children were enrolled in Early Head Start programs or living in nearby 'control' locations in North Carolina and were examined by a single clinical examiner at the child's preschool or a nearby community location using portable dental equipment. According to the AAPD ECC definition, any child with a single decayed (cavitated or non-cavitated), missing (presumably due to caries), or filled tooth surface was classified as having ECC. A secondary measure, caries severity, was ascertained using the dmfs index, which is the sum of surface-level diagnoses for decayed, missing (extracted due to caries), or restored tooth surfaces. Diagnosis of surface-level caries lesions was based on NIDCR visual criteria at the non-cavitated level (i.e., d1) without radiographs, following a toothbrush prophylaxis, compressed air-drying, and artificial light with the use of magnification. Excellent intra-examiner reliability was achieved for surface-level caries lesion diagnoses: κ = 0.85, 95% CI = 0.83-0.88, upon duplicate examination of 23 children within a 3-day period. Sociodemographic and behavioural risk factors were collected via structured, computer-assisted parent interviews that were administered in English or Spanish.
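The two outcome measures described above can be illustrated with a minimal sketch. The surface codes and the function names below are hypothetical and are not the study's own coding scheme; the sketch only assumes that each examined primary tooth surface carries one code.

```python
from collections import Counter

# Hypothetical surface-level codes (an assumption, not the study's scheme):
# "s" = sound, "d" = decayed (cavitated or non-cavitated),
# "m" = missing due to caries, "f" = filled/restored.
AFFECTED = ("d", "m", "f")


def dmfs(surfaces):
    """Sum of decayed, missing (due to caries), and filled surfaces."""
    counts = Counter(surfaces)
    return sum(counts[code] for code in AFFECTED)


def has_ecc(surfaces):
    """Binary case definition: at least one affected surface."""
    return dmfs(surfaces) >= 1


# Two made-up children, for illustration only.
child_a = ["s"] * 86 + ["d", "f"]  # two affected surfaces
child_b = ["s"] * 88               # no affected surfaces

for name, child in [("child_a", child_a), ("child_b", child_b)]:
    print(name, "dmfs =", dmfs(child), "ECC =", has_ecc(child))
```

The binary definition discards severity information that the dmfs count retains, which is why the two outcomes are analysed separately.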
Saliva samples were collected alongside the clinical examinations using the Oragene DNA Genotek OG-575 kit. Consent for saliva donation for genomics analyses was given by 96% (n = 331) of eligible children in the pilot study, and saliva samples were obtained from 64% of those (n = 213). The most frequent reasons for not obtaining a saliva sample were lack of cooperation (18%) and inadequate salivation (12%; Fig. S1). DNA extraction, quantitation, and quality assessment were performed at the UNC-Chapel Hill Biospecimen Processing facility with good results, that is, DNA of sufficient quantity and quality was obtained for high-density genotyping purposes. Mean DNA yields according to quantitation method were [μg (SD)]: optical density (OD): 44.1 (26.7); Picogreen (PG): 29.1 (15.6); human-specific RNAseP: 3.9 (1.6; Fig. S2). Moreover, >80% of samples had an A260/A280 ratio between 1.6 and 2.0 and a 260/230 ratio above 1.5.
Genotyping was performed at the UNC-Chapel Hill Mammalian Genotyping core using the Illumina HumanOmni2.5-8 bead chip (offering ~2.4 million markers). Genotyping quality control procedures included HapMap CEPH trios and duplicates, seven blind duplicate samples, identification of sex and sample mismatches, and generation of sample call and error rates. After the exclusion of one contaminated sample, 212 high-quality genotyped samples were obtained [i.e., no sex mismatches, median (range) sample call rate = 99.8% (96.1%-99.9%) and concordance rate = 99.996% (99.991%-99.997%)] and were carried forward to analyses.
After quality control, ~2.3 million SNPs were used to estimate the heritability of ECC, both with and without adjustment for age, sex, and ancestry, using Genome-wide Complex Trait Analysis (GCTA) software 19 and Minor Allele Frequency (MAF) thresholds of 1%, 5%, and 10%. Low-frequency (MAF 0.5%-5%) and rare (MAF <0.5%) polymorphisms can contribute to variability in complex traits or disease; however, due to the small sample size of this pilot GWAS analysis and the likelihood of inducing spurious findings, they were excluded from analyses and reporting. To estimate and test heritability, GCTA employs a random-effects mixed linear model and restricted maximum-likelihood regression adjusting for age, sex, and ancestry 19. Heritability was estimated for the binary ECC case definition and the continuous measure of disease severity, the conventional d1,2-3mfs index (the sum of decayed, missing due to caries, and filled/restored primary tooth surfaces) combining non-cavitated and cavitated caries lesions. Of note, P-values reported for heritability estimates are one-sided, as variance explained (by genetics or any other source) cannot take negative values. Genetic associations of the ~1.4 million common SNPs (MAF ≥5%) with the binary ECC case definition were tested using logistic regression while adjusting for age, sex, and ancestry. Ancestry adjustments were performed to control for population stratification 20 (i.e., systematic differences in allele frequency between cases and controls that can induce spurious associations) via the generation of 10 ancestry principal components (PCs). Although these PCs do not have a straightforward interpretation, they represent axes of common, genetically determined ancestry in the study sample, and are treated as covariates (e.g., confounders) in statistical analyses. The conventional genome-wide significance threshold for GWAS is P < 5 × 10−8. In addition, a more lenient threshold (P < 10−5) was considered to identify 'suggestive', albeit non-significant, evidence of association as a means of highlighting additional candidate genes. Loci of interest were visualized using LocusZoom software 21.
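As an illustration of how these thresholds partition association results, the sketch below classifies P-values and converts them to the −log10 scale used in association plots. The function names are hypothetical. The follow-up cut-off of 5 × 10−5 is taken from the abstract and is exposed as a parameter because the text quotes more than one lenient threshold. The comment relating 5 × 10−8 to a Bonferroni correction reflects the usual rationale (0.05 divided by roughly one million independent common variants) and is not a statement from this study.

```python
import math

GENOME_WIDE = 5e-8  # conventional GWAS threshold quoted in the text
FOLLOW_UP = 5e-5    # follow-up criterion quoted in the abstract (adjustable)

# Usual rationale (an assumption here): 0.05 / ~1 million independent tests.
assert math.isclose(0.05 / 1_000_000, GENOME_WIDE)


def classify(p, follow_up=FOLLOW_UP):
    """Label a P-value against the two thresholds."""
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < follow_up:
        return "suggestive"
    return "not prioritized"


def neg_log10(p):
    """Transform a P-value to the -log10 scale used on plot y-axes."""
    return -math.log10(p)


# The two lead SNPs reported in this study.
for rsid, p in [("rs4690994", 2.3e-6), ("rs439888", 5.3e-6)]:
    print(rsid, classify(p), round(neg_log10(p), 2))
```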
Association of the prioritized SNPs (P < 5 × 10−5) was examined in three independent cohorts comprising 921 preschool children from the Center for Oral Health Research in Appalachia study 22 (COHRA, n = 326; mean age = 35 months; ECC prevalence = 25%), the Iowa Fluoride Study 23 (IFS, n = 348; mean age = 60 months; ECC prevalence = 35%), and the Genetic, Environment and Health Initiative Research Study 24 (GEIRS, n = 247; mean age = 48 months; ECC prevalence = 25%). Replication was considered using three criteria: (1) consistency in the direction of association (i.e., the same risk allele observed across samples) and P-values less than the Bonferroni-corrected statistical significance threshold in all three replication samples; (2) directional consistency and P-values less than a nominal threshold (P < 0.05); and (3) directional consistency between prioritized SNP associations in the discovery (ZOE) and the three replication samples, as determined by a binomial test (P < 0.05).
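The first two replication criteria can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: the number of tests entering the Bonferroni correction is not given in the text, so 13 (one per followed-up SNP) is assumed; criterion 2 is assumed to require nominal significance in every cohort; and the look-up values are made up.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def same_direction(discovery_or, replication_or):
    """True when both odds ratios fall on the same side of 1."""
    return (discovery_or > 1) == (replication_or > 1)


def meets_criterion(discovery_or, lookups, p_threshold):
    """lookups: one (odds_ratio, p_value) pair per replication cohort."""
    return all(
        same_direction(discovery_or, or_rep) and p < p_threshold
        for or_rep, p in lookups
    )


N_SNPS = 13  # assumption: one test per followed-up SNP
strict = bonferroni_threshold(0.05, N_SNPS)

# Hypothetical look-up results for one SNP in three cohorts (not study data).
example = [(1.4, 0.20), (0.9, 0.61), (1.2, 0.04)]
print("criterion 1 met:", meets_criterion(3.5, example, strict))
print("criterion 2 met:", meets_criterion(3.5, example, 0.05))
```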

Results
The prevalence of ECC among the 212 participating children (mean age = 39 months; range = 30-52 months) was 38%. The demographic characteristics of this multi-ethnic/racial sample are provided in Table 1. When considering all common SNPs (MAF ≥5%; approximately 1.4 million), the heritability (h²) of ECC was 52%, P = 0.03 (or h² = 44% when including all SNPs with MAF ≥1%, that is, ~1.9 million SNPs). This estimate diminished after adjustment for ancestry: h² = 13% (P = 0.4). This lower proportion corresponds to the ECC variance explained by genetics that is common and shared across all ancestry (and effectively racial/ethnic) groups. Similarly, heritability was markedly lower, at 14% (P = 0.01), for ECC severity (d1-2,3mfs index) using the same set of common SNPs (MAF ≥5%) compared to the binary ECC case definition (Table 2).
J. L. Ballantine et al.
Defects in this gene result in autosomal recessive, non-syndromic, sensorineural deafness. Other variants in this gene have been associated with nephrolithiasis and reduced bone density. Replication was attempted for all 13 SNPs demonstrating suggestive statistical association with ECC separately within each of the three replication samples. None of the SNPs showed genetic association in replication samples, and only 15 of the 39 SNP look-ups showed directional association concordance.
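The directional concordance result (15 of 39 look-ups) can be put in context with a one-sided binomial (sign) test. The sketch below assumes that the 39 look-ups are independent and that, in the absence of any true association, each look-up matches the discovery direction with probability 0.5; the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 13 SNPs x 3 replication cohorts = 39 look-ups, 15 of them concordant.
p_value = binomial_upper_tail(15, 39)
print(f"P(X >= 15 of 39) = {p_value:.3f}")
```

Because 15 is below the 19.5 concordant look-ups expected by chance, the one-sided P-value exceeds 0.5, in line with the absence of replication reported above.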

Discussion
This report presents the results of the first GWAS of ECC, which was conducted among a small pilot sample of 212 community-based preschool-age children participating in a dental public health study. First and foremost, the study demonstrates that genomics investigations of common oral health traits, including ECC, are feasible among preschool-age children in non-clinical settings, with a key enabling feature being saliva collection,
Fig. 1. Regional association plots of the top two loci that emerged with the strongest evidence of association (lowest P-values, even though not genome-wide significant) with ECC among the 212 preschool-age children participating in the ZOE genome-wide association study. Left panel (a): the 4q32 locus (lead SNP: rs4690994; P = 2.3 × 10−6; odds ratio (OR) = 3.5; 95% confidence interval (CI) = 2.1-5.9); right panel (b): the 20q22 locus (lead SNP: rs439888, an intronic SNP; P = 5.3 × 10−6; OR = 3.6; 95% CI = 2.1-6.2). Position on the x-axis corresponds to genomic coordinates, and position on the y-axis corresponds to each SNP's −log10(P-value). The top, or 'lead', SNP is coloured purple, whereas other polymorphisms are colour coded by their r², a measure of linkage disequilibrium, with the lead SNP. Plots were generated using LocusZoom 21.
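As a rough consistency check on the statistics quoted in the Fig. 1 caption, an odds ratio and its 95% CI can be converted back to an approximate P-value. The sketch assumes a Wald-type interval that is symmetric on the log-odds scale, which the text does not state explicitly, and the function name is illustrative. Because the inputs are rounded, only the order of magnitude is expected to match the reported values.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_or(odds_ratio, ci_low, ci_high):
    """Back out the log-odds standard error, z statistic, and two-sided P."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


for label, args in [("4q32 rs4690994", (3.5, 2.1, 5.9)),
                    ("20q22 rs439888", (3.6, 2.1, 6.2))]:
    se, z, p = wald_p_from_or(*args)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, P = {p:.1e}")
```

Both approximate P-values fall on the order of 10−6, the same order as the values reported for the two lead SNPs.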
experience. An alternative approach to circumvent this issue could be the interrogation of the diseased-only surfaces rather than the complete dmfs index; however, this metric could also be confounded by access to care issues, which would affect the ratio of treated vs. untreated disease. On the other hand, heritability was substantially lower when our models were adjusted for ancestry via the inclusion of 10 principal components; this result should be treated with caution, as such adjustments can produce statistically unstable estimates due to the small sample size. Nevertheless, it is indicative of the impact of race/ethnicity-specific influences, which are at play in a racially mixed sample such as that of our study.
Our study did not consider traditional risk factors for ECC, including socioeconomic status, diet, oral hygiene, and fluoride exposures. This could be performed via the conduct of stratified analyses or via the inclusion of additional terms for these factors in genetic models. As noted earlier, traditional risk factors, although strongly associated with the trait or disease under study, are not confounders of the genetic associations and adjustments for these factors are not performed 27 . Nevertheless, stratification by such factors or examination of gene-environment (e.g., fluoride) interactions have been informative in previous investigations 16,28 and should be explored in cases where the sample size permits. Interestingly, some biological pathways that are genetically controlled 16 could be operating via clinical 29 (e.g., saliva composition, enamel properties, immunity, and metabolism) or behavioural risk factors, with sweet taste preference being suggested by relatively recent studies 30 .
In sum, the major novelty and strength of this study was the opportunity to do an unbiased scan of the human genome without a priori hypotheses for the first time, in a narrow-age range sample, appropriate for the study of ECC. This study also benefits from the uniform clinical examination protocol and the opportunity to replicate or generalize its findings to external samples of almost one thousand preschool-age children. Lastly, race- or ethnicity-specific results were not examined in these analyses due to the small sample size of the respective strata; although this study characteristic could further reduce the statistical power, we consider that the inclusion of under-studied racial/ethnic groups in this pilot investigation is a novel, positive element.
Why this paper is important to paediatric dentists
• The study confirms that a substantial heritable component of ECC exists.
• Genomics studies are feasible among preschool-age children using saliva samples obtained during dental examinations: DNA of good quality and sufficient amount, suitable for high-density genotyping, can be obtained from saliva.
• Future large or collaborative multi-ethnic/multi-racial studies are likely to identify specific genetic influences for ECC, which can help better understand, prevent, and treat this early-onset, aggressive childhood disease.
Additional Supporting Information may be found in the online version of this article:
Fig. S1. Results of the saliva sample collection process among the 346 preschool-age children participating in the ZOE study.
Fig. S2. Quantitation of DNA purified from saliva samples among the 213 preschool-age children that donated a sample for the ZOE GWAS study.
Fig. S3. Quantile-Quantile (QQ) plot of GWAS results of ECC among the 212 preschool-age children participating in the ZOE GWAS.
Fig. S4. Manhattan plot of the ~1.4 million association results [y-axis corresponds to −log10(P-value)] of genotyped SNPs with the ECC case definition, arranged by chromosome, among the 212 preschool-age children participating in the ZOE GWAS.