Optimizing the quality of clinical studies on oral microbiome: A practical guide for planning, performing, and reporting

Abstract
With this review, we aim to increase the quality standards for clinical studies with microbiome as an output parameter. We critically address the existing body of evidence for good quality practices in oral microbiome studies based on 16S rRNA gene amplicon sequencing. First, we discuss the usefulness of microbiome profile analyses. Is a microbiome study actually the best approach for answering the research question? This is followed by addressing the criteria for the most appropriate study design, sample size, and the necessary data (study metadata) that should be collected. Next, we evaluate the available evidence for best practices in sample collection, transport, storage, and DNA isolation. Finally, we provide an overview of possible sequencing options (eg, 16S rRNA gene hypervariable regions, sequencing platforms), processing and data interpretation approaches, as well as requirements for meaningful data storage, sharing, and reporting.


| INTRODUCTION
About a decade ago, when the first publications on the oral microbiome using high throughput 16S ribosomal RNA gene amplicon sequencing appeared, 1-4 the methodologies of sample processing, sequencing, and the downstream bioinformatic analyses still had to evolve. Today, sequencing costs per megabase have dropped by at least 100-fold and anyone can send the samples to a sequencing facility and, depending on the services provided, receive fully or partially analyzed data. In the last decade, hundreds of original research articles addressing the oral microbiome have been published, providing an immense volume of information and knowledge on this topic.
However, the problem we are currently facing is that a vast amount of this microbiome data seems to have been created simply because it was both convenient and possible to do so. Frequently this has been done without collecting any oral health-related information. The data are explored for potential associations and correlations, which are commonly mistaken for causality, leading to overestimation of the clinical relevance and impact of the microbiome on the etiology and pathogenesis of various conditions. Reading such papers raises the question of whether microbiome sequencing was really necessary. Was this the best methodology for answering the original research question?
With this review, we aim to increase the quality standards for clinical studies with microbiome as an output parameter. To this end, we have critically assessed the current evidence for the best quality practices in oral microbiome studies. This evidence regards the entire process of the clinical study, including the research question, study design, required subject and sample information (metadata), sample type and collection, storage, and processing. We provide a brief overview of the options in the fast-growing field of sequencing itself and the downstream analyses for novices in the field of 16S rRNA gene amplicon sequencing. Finally, we list open questions that remain to be addressed to further increase the quality of the studies on oral microbiome.

| RESEARCH QUESTION
The first studies on the human microbiome could be categorized as "the phone book" or as "the fishing expedition" studies, as they described the immense and previously unseen bacterial diversity in and on humans. The data from such descriptive studies provided evidence that the attempts of characterizing the human microbiome before the next generation sequencing era were far from complete.
New members of the microbiota, never isolated in the laboratory, became "visible" and could be linked to different intra-oral habitats or pathologic conditions. This has even led to the shift in paradigms on the exclusivity of specific, easily cultivable bacteria in the etiology of oral diseases. that individuals with a high copy-number of this gene had a distinct microbiome composition. 5 After posing the research question, one should investigate the available methodology and carefully judge if obtaining the data on oral microbiome under the circumstances of the planned study will be the best way to provide answers to the question at hand.
Perhaps there are other more direct and more economic ways of addressing the same research question. A recent study on the signs of acculturation of Mexican-American women in the USA 6 could be used to exemplify this issue. The authors collected detailed acculturation questionnaires regarding language, diet, alcohol consumption, and more. A mouthwash sample, originally collected for genetic analyses, was used for microbiome sequencing. None of the questions in the questionnaires regarded oral health care habits, visits to a dental care professional, or oral complaints. The authors concluded that immigration and adaptation to life in the USA were associated with differences in oral microbial profiles in these women. The more acculturated they were, the higher the relative abundance of the genus Streptococcus and the lower their bacterial diversity. However, since streptococci are among the primary colonizers of teeth, their higher abundance and the lower bacterial diversity in general might indicate a higher oral hygiene level, a plausible effect of acculturation. A simple oral hygiene index or at least a question about toothbrushing habits would have been more informative than the wealth of data provided by microbiome analysis.

| CONSIDERATIONS FOR STUDY DESIGN
There are two broad categories of studies for biomedical research, observational and interventional (experimental), each with their own advantages and limitations. In oral microbiome research, the most commonly used designs of observational studies are case-control studies, cross-sectional studies, and cohort studies, while interventional studies are usually randomized clinical trials.

| Cross-sectional and case-control studies
In microbiome studies with a cross-sectional or case-control design, two or more groups are compared. These are relatively low-cost studies that generate results quickly, which explains their popularity in the microbiome field. In cross-sectional studies, the subjects are selected (randomly) from a population, based on a different exposure, for instance, current smokers vs never smokers, measured at one moment in time. The case-control design involves cases or outcomes of interest, for example, patients with a certain disease, and selected controls. These controls should be comparable (matched) with the cases as much as possible, with the exception of the disease or the condition of interest. A cross-sectional study is by definition limited to a single measurement of the oral microbiome and cannot assess temporality or causality. The same applies to case-control studies, but this study design can also be longitudinal.
It should be noted that both types are vulnerable to several types of bias. 7 The high risk of bias in cross-sectional and case-control studies is partly related to the complexity and dynamic nature of the host-microbiome interactions, as well as insufficient matching between the cases and controls. The results of such studies are difficult to reproduce and sometimes even contradict other studies addressing the same disease or condition. For instance, a recent review identified studies on the oral microbiome of oral squamous cell carcinoma patients in comparison with healthy controls and highlighted large heterogeneity and contradictions in the microbial taxa associated with disease or health among the included studies. 8 The findings of some of these studies could be biased by poorly matched control groups. Even though the oral squamous cell carcinoma patients and controls in one of the reviewed studies were matched by age and gender, 9 they were not matched by lifestyle factors known to affect the oral microbiome, namely, tobacco smoking and betel quid chewing (Asian plant compounds with stimulatory substances). The control group consisted of 50% smokers and 28% betel-chewers, while in the oral squamous cell carcinoma group these were 83% and 90%, respectively.
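How far apart two groups are on a matching variable can be expressed as a standardized difference between proportions. The sketch below applies one common pooled-variance form of that measure to the smoking and betel quid percentages quoted for the reviewed study. The function name is hypothetical, and the measure is offered only as an illustration: it is not taken from the reviewed study or from the review itself.

```python
from math import sqrt


def standardized_difference(p_cases: float, p_controls: float) -> float:
    """Standardized difference between two proportions (pooled-variance form)."""
    pooled_var = (p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / 2
    if pooled_var == 0:
        return 0.0  # both groups are 0% or 100%: no spread to standardize by
    return (p_cases - p_controls) / sqrt(pooled_var)


# Proportions quoted in the text: carcinoma group vs control group
smoking = standardized_difference(0.83, 0.50)
betel = standardized_difference(0.90, 0.28)
print(f"smoking: {smoking:.2f}, betel quid chewing: {betel:.2f}")
```

Values above roughly 0.1 are often read as a sign of imbalance, although that threshold is a rule of thumb rather than a formal test. Both factors here come out far above it, which is consistent with the concern that the groups were poorly matched on lifestyle.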
Another example of poor matching, which biased the study outcomes, ignored the differences in oral health between children with autism and healthy controls. 10 The authors found significant differences in salivary and plaque microbiome between the two groups, but also reported that children with autism had significantly higher decayed, missing, filled surfaces and gingival bleeding than the controls. The observed differences in microbiome could merely reflect the differences in oral health, and may have nothing to do with autism.
Besides various lifestyle factors and differences in oral health status, numerous other factors are known to affect the oral microbiome. These are summarized in the section on subject and sample metadata (section 3.5) and in Table 1. Because these can potentially influence study outcomes, they should not be ignored when selecting study subjects.

| Cohort studies
Cohort studies are considered the gold standard for observational research and can be performed both retrospectively and prospectively. In retrospective cohort studies, the oral samples, stored in a biobank, have usually been collected for purposes other than oral microbiome analyses, for instance, for genetic assessment as in the acculturation study mentioned above. 6 These studies frequently did not collect any information regarding the oral health of individuals, as the purpose for storing the sample in the biobank was related to general health.
Prospective cohort studies follow a specific outcome that has been planned upfront, and the study subjects are examined and their microbiomes assessed at several time points as they get older. Consequently, prospective cohort studies usually create high quality data accompanied by the appropriate metadata, allowing both assessment of the role of the oral microbiome in the etiology of the disease and potential for disease risk prediction. For instance, a recent publication on an Australian cohort of 134 children, followed from 2 months until 4 years of age, demonstrated the potential of the salivary microbiome in predicting the development of early childhood caries. 11

| Interventional studies
Interventional or experimental studies aim to assess the therapeutic or preventive effects of specific interventions by the investigator. The most common and strongest interventional study design is a randomized controlled trial, which is preferably triple blinded, for study subjects, clinical investigators, and (bio)statisticians. A traditional randomized controlled trial involves study subjects randomly allocated into two or more groups, where the intervention (the test) is compared with a control (positive or negative), or to no intervention at all. The strengths of this design are allocation concealment, the possibility to measure compliance and dropout, to analyze results by intention to treat, and to assess each treatment arm in the same manner, preferably using good clinical practice guidelines. A crossover randomized controlled trial design is a variant of a randomized controlled trial where the same individual is allocated randomly to start with one intervention, followed by a sufficiently long washout period, and completing with the other intervention. There are several aspects to consider when planning an intervention study.

| The observer effect
In oral health research, the study subjects may change their usual behavior, for instance, by temporarily improving their oral hygiene practices just because they are being meticulously observed by dental professionals. This is evidenced by an improved oral health status, such as lowered gingival bleeding and plaque scores in the first week of the study on healthy Dutch young adults (the authors' unpublished findings). As the study proceeds, the clinical outcomes (eg, bleeding, plaque indices) tend to increase, suggesting that the individuals get used to being observed and their behavior returns to what was normal for them. Because the clinical changes are related to changes in the microbial composition of dental plaque, 12 this implies that the composition of the samples collected at the start of the study may differ from those collected later, even without an active intervention. A solution to circumvent this issue could be introducing a "false" start of the study, followed by a "real" start, such as a second baseline visit, once the study subjects adapt to being observed. To date, there is no evidence available for the optimal number of visits or duration of such an adaptation period.

TABLE 1 (partial)
- Gender 32,196
- Socioeconomic status 29,32
- Education level 32
- Ethnicity 32,36,37,197,198
General health factors:
- Recent history of exposure to antibiotics 39
- Medication use 41
- Systemic diseases 72,199,200
In bold: the minimum information that should be recorded and reported for reliable data interpretation.

| Temporal stability
Although the oral microbiome is shown to be the most stable niche among different niches of the human body, 13

| Population normalization at the start of the study
Another issue, relevant for interventional studies, regards oral prophylaxis prior to abstaining from the oral hygiene measures within an experimental gingivitis protocol. Some of these studies perform professional tooth cleaning at the start of the study in order to normalize their population to the same plaque level before entering the nonbrushing phase, 15,16 while others do not. 17,18 The magnitude and duration of the changes introduced by the prophylaxis step to the natural oral ecosystem will be highly individual and might introduce additional noise to the microbiome data, potentially leading to an underestimation of the effects of the intervention being studied or to clinically less relevant findings.

| Intervention
Additionally, the duration and the dosage of an intervention are frequently either arbitrarily chosen or based on the estimated clinical effects or even preliminary findings from in vitro experiments. However, in cases where subtle ecological changes are expected, for example, as a result of food supplements containing pre- or probiotics, the ecosystem might need substantial time to remodel from one state to another. This would mean that longer intervention periods are required before any changes in composition can be measured. Therefore, when assessing novel ecological interventions, researchers might prefer to evaluate the minimum exposure required in a pilot experiment before setting up a much more costly and elaborate full-scale randomized controlled trial. 19 The data obtained from such a pilot would also provide the basic information needed for power estimation of the main study.

| Power analysis tools for microbiome studies
To date, reporting of power analyses in the methods sections of studies with microbiome as the main output parameter is scarce and is mainly limited to methodology papers, describing various tools for sample size calculations using microbiome data. 21-24 Some of these tools use frequency distributions of individual taxa, for example, operational taxonomic units, for sample size calculation. 21,24 Others are based on the measurement of the change in the community structure, for example, on pairwise distances between samples instead of changes in the relative abundances of individual taxa. 21
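As a point of orientation only, the sketch below shows the textbook normal-approximation sample size calculation for comparing the mean of a single outcome (for instance, an alpha diversity index) between two equally sized groups. It is not one of the microbiome-specific tools cited above, it ignores the multivariate community structure and multiple testing that those tools are designed for, and the function name and defaults are assumptions chosen for illustration. The standardized effect size would typically be estimated from pilot data.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (difference in means divided
    by the common standard deviation). Normal approximation, equal group sizes.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical effect sizes, as might be estimated from a pilot study
for d in (0.3, 0.5, 0.8):
    print(f"effect size {d}: {n_per_group(d)} subjects per group")
```

The steep growth in required sample size as the effect size shrinks illustrates why subtle ecological shifts need either larger groups or a design that reduces between-subject variation, such as a crossover trial.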

| Subject and sample metadata
The data created by sequencing of any microbial communities, including those of oral origin, remain just plain data without scientifically reliable interpretation if they lack accompanying information (the metadata) on the samples and study population (Table 1). In this section, we provide a summary of the current evidence for the measurable effects of demographic factors, general and oral health, as well as behavior of the individual on the oral microbiome (Figure 1). Based on this evidence, we have listed the optimal and the minimum required information (

| Basic information
On all occasions, basic information about the study should be provided. This includes the sample collection date, a description of the study aim(s), inclusion and exclusion criteria of the study population, and the instructions given to the study subjects before the sample collection, such as duration of abstaining from toothbrushing, food or drink intake, chewing gum and mouthwash use.
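One way to make sure this basic information travels with every sample is to store it in a fixed record and check for gaps before sequencing. The sketch below is a minimal illustration under stated assumptions: the class, its field names, and the choice of hours as the unit are all hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SampleRecord:
    """Basic information for one sample (illustrative field names, not a standard)."""
    sample_id: str
    collection_date: date
    study_aim: str
    inclusion_criteria: list
    exclusion_criteria: list
    # Instructions followed before sampling, in hours (None = not recorded)
    hours_since_toothbrushing: Optional[float] = None
    hours_since_food_or_drink: Optional[float] = None
    hours_since_chewing_gum: Optional[float] = None
    hours_since_mouthwash: Optional[float] = None

    def missing_fields(self) -> list:
        """Names of fields that are empty or were not recorded."""
        return [name for name, value in vars(self).items()
                if value is None or value == "" or value == []]


# Hypothetical sample with only the toothbrushing instruction recorded
record = SampleRecord(
    sample_id="S001",
    collection_date=date(2020, 1, 15),
    study_aim="Compare salivary microbiome of smokers and never smokers",
    inclusion_criteria=["age 18-30 years"],
    exclusion_criteria=["antibiotic use in the previous 3 months"],
    hours_since_toothbrushing=12.0,
)
print(record.missing_fields())
```

A check like this can be run at the time of sample collection, when a missing answer can still be obtained from the study subject.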

Socioeconomic status and education level
A demographic factor known to be indirectly related to general and oral health involves both the socioeconomic status and education level of the individual. 33-35 Several studies report a relation between income or education level and oral microbial composition: a microarray study on stimulated saliva of 292 Danish adults with low levels of caries and periodontitis found that socioeconomic status explained 20% of the overall variance in salivary microbiome. 29 30 A study on supragingival and subgingival plaque and saliva from 192 subjects belonging to four major ethnicities in the USA also found taxonomic differences by ethnicity. 36 Interestingly, even genetically closely related populations, such as Japanese and South Korean orally healthy adults, differ in their salivary microbiome. 37

| General health factors known to affect oral health and oral microbiome
Any clinical study should have clearly predefined inclusion and exclusion criteria regarding general health of the study subjects, because of a strong relation between the oral and general health of the individual. 38 Previous exposure to antibiotics 39

Exposure to antibiotics
Regarding antibiotic exposure, there is no consensus on the minimum duration between exposure and enrollment in studies on oral microbiome. In studies on the gut microbiome, an arbitrary period of 6 months since the antibiotic exposure is frequently used, although shorter periods have also been applied. 19 In a randomized controlled trial, healthy individuals were exposed to either clindamycin, ciprofloxacin, amoxicillin, minocycline, or placebo, and their fecal and salivary microbiota were assessed over 1 year. 42 Antibiotics had very limited impact on the salivary microbiome, with effects on bacterial diversity and community structure being measurable right after the exposure, but already becoming indiscernible after 1 month, while the gut microbiota needed up to 1 year to recover from the exposure to some of the antibiotics. The response to antibiotics is highly individual and might also be influenced by the underlying infection when antibiotics are clinically prescribed instead of being tested on healthy individuals. Currently, most studies on the oral microbiome use a minimum of 2 or 3 months since the end of the antibiotic therapy as an inclusion criterion.
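A washout rule of this kind is easy to apply consistently if it is written down as an explicit check at enrollment. The sketch below is an illustration only: the function name is hypothetical, the default of 90 days is an assumed stand-in for "3 months", and treating a missing date as "no exposure" is an assumption that a real protocol would need to state explicitly.

```python
from datetime import date
from typing import Optional


def meets_antibiotic_washout(end_of_therapy: Optional[date],
                             enrollment: date,
                             min_days: int = 90) -> bool:
    """True if at least min_days separate the end of therapy from enrollment.

    end_of_therapy=None is treated here as "no antibiotic exposure reported".
    """
    if end_of_therapy is None:
        return True
    return (enrollment - end_of_therapy).days >= min_days


# Hypothetical subject who finished a course of antibiotics on 1 January 2020
print(meets_antibiotic_washout(date(2020, 1, 1), date(2020, 3, 31)))  # 90 days later
print(meets_antibiotic_washout(date(2020, 1, 1), date(2020, 3, 30)))  # 89 days later
```

Recording the actual end-of-therapy date, rather than only a yes/no eligibility flag, keeps the option of re-analyzing the data with a different washout threshold.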

Systemic diseases
A history of systemic diseases that have implications for oral health, such as diabetes, rheumatoid arthritis, and cardiovascular disease, 38 should be recorded. A general characteristic of well-being such as frailty has also been shown to relate to differences in oral microbiome. 43 The oral microbiome was shown to differ by the body mass index of the study subjects. 28 Pregnancy is a typical exclusion criterion, as changes induced by pregnancy hormones on the entire body, including the oral microbial ecosystem, 44 will seriously bias the study outcomes. Additionally, if the study population involves infants and young children, information regarding the mode of birth 39,45,46 and predelivery antibiotic prophylaxis as a result of a cesarean section might be relevant to record.

| Oral health-related factors known to affect oral microbiome
Oral health status
One of the most important factors with the strongest evidence for an impact on the oral ecosystem and its microbiome is the oral health status of the individual. Gingival and periodontal health, 15,47 dental caries, 28,48 presence of intra-oral implants 49

Oral hygiene habits
Daily oral hygiene practices (eg, frequency of toothbrushing, efficacy of plaque removal) affect the oral health of the individual. Therefore, it is not surprising that the salivary microbiome has been shown to reflect the oral hygiene level of the individual, both in children and in adults. 28,54 Nevertheless, numerous published studies fail to record metadata on oral health or oral hygiene behavior of the subjects. For instance, no oral health-related information was collected in a recent study comparing the salivary microbial composition of chronic fatigue syndrome patients with that of age-, gender-, and body mass index-matched healthy controls. 55 Because chronic fatigue syndrome is a neurological disorder, differences in the attitude and level of self-performed oral care could be the most likely explanation for the observed differences in salivary microbiome between the cases and controls, and should have been assessed in addition to the microbiome analyses. Similarly, the oral microbiome from mouthrinse samples of diabetics was compared with those of obese and nonobese nondiabetics without any oral status assessment. 56

Prolonged use of a particular oral health care product with additives aiming at ecological modification of the oral ecosystem has also been shown to affect oral microbial composition. For example, use of a toothpaste containing enzymes and proteins 57 or a toothpaste containing arginine has been shown to affect supragingival plaque composition, 58,59 while twice-daily mouthrinse with amine fluoride and stannous fluoride for 2 weeks resulted in microbial shifts in tongue and saliva. 60 Frequency of tongue brushing was recently shown not only to affect tongue microbiome composition, but also to be related to the effects of chlorhexidine mouthwash on the microbial composition. 61 Besides recording tongue cleaning habits, the presence of tongue piercings should also be noted and preferably used as an exclusion criterion, as these could be potential reservoirs of taxa associated with periodontitis. 62 In addition to piercings, wearing appliances such as fixed braces during orthodontic treatment should be considered as an exclusion criterion, as these increase plaque retention and affect the oral microbial ecosystem. 63,64

Self-reported oral health status
There is no doubt that oral health and oral hygiene behavior have a direct impact on the oral microbiome and vice versa. However, it is not always possible to conduct a clinical intra-oral examination. In such cases, at least the minimum information on oral care habits and oral health status should be obtained through questionnaires that have been validated for self-assessment of oral health, for example, for the periodontal status 65,66 and dental caries experience. 67 To date, we are not aware of a single validated questionnaire that could be used for all oral health-related factors, as shown in Table 1, and therefore a composite of questions from different questionnaires should be used.

Smoking
Currently, there is ample evidence that smoking tobacco has devastating effects not only on general health, but also on oral health and the oral microbiome. 68-76 Because smoking cessation also leads to measurable changes in oral microbiota, 77 the time elapsed since an individual stopped smoking should be recorded.
Modern alternatives to tobacco smoking, such as electronic cigarette smoking, have not yet been investigated in any great detail, although a pilot study on this topic did not find any difference between electronic cigarette smokers (N = 10) and nonsmoking controls (N = 10). 69 Besides conventional tobacco cigarette smoking, smoking of dokha (an Arabic tobacco product) was also shown to affect oral microbiome and lead to dysbiosis, while microbiota of shisha (a water pipe) smokers did not differ from nonsmokers in a study with 330 subjects from the United Arab Emirates. 78

Alcohol consumption
Another lifestyle factor with prominent health effects is alcohol consumption. In the oral cavity, alcohol is metabolized by oral bacteria into acetaldehyde, which is a known carcinogen. 79 A study of 1044 US adults found that heavy and moderate drinkers had higher alpha diversity and that their oral microbiome differed from nondrinkers. 80 It should be noted, though, that this study lacked any information regarding the oral health status or oral care habits of the individuals and therefore the differences observed could have been biased by these factors. To correct for these confounders the authors used surrogate oral health indicators: presence of Porphyromonas gingivalis and Aggregatibacter actinomycetemcomitans for periodontal disease and a high proportion of Streptococcus mutans for caries, which is a very simplified view of oral diseases.

Chewing psycho-stimulatory substances
In some cultures, but especially in south Asia, southeastern Asia, and the Pacific, areca nut or betel quid chewing is a widespread habit, and has become a leading cause of oral cancer in those areas of the world. 81 Differences in the oral microbiome of betel chewers compared with the control group were reported. 81 Chewing of leaves and twigs of khat that provide amphetamine-like effects is another habit gaining popularity among certain cultures 82 and is shown to affect the oral microbiome. 83

Diet
Diet has been shown to explain a considerable part of the variation in gut microbiome composition. 40 The strongest evidence of the effects of dietary components on the oral microbiome regards sugar intake. 84-86 Differences have been found among the oral microbiomes obtained in the Philippines from hunter-gatherers, who rely on fishing, hunting, and gathering, compared with traditional farmers who rely on cultivated rice and vegetables in their diet, and those living on a Western diet. 87 Differences in self-reported bovine milk intake were associated with oral microbial differences in Swedish adolescents. 88 A study following African celiac children, who switched from an African-style, gluten-free diet, known to contain noncertified foods contaminated with gluten, to an Italian-style diet of certified gluten-free products for 60 days, reported changes in their salivary microbiome and metabolome composition. 89 However, a study of Italian subjects following a habitual omnivore (N = 55), ovo-lacto-vegetarian (N = 55), or vegan (N = 51) diet for at least 1 year before the sample collection found no differences in their salivary microbiome. 90 By contrast, a more recent study comparing healthy Danish vegans (N = 78) and omnivores (N = 82) did find differences in their salivary microbiomes. 91 Furthermore, a study of 282 US subjects assessing the effects of the frequency of consumption of beverages containing sugar, meat, poultry, fish, vegetables, and fruits in the week prior to the sample collection found taxa that differentiated the dietary habits. 32 Analysis of food frequency questionnaire data in comparison with oral microbiome found that saturated fatty acid and vitamin C intake correlated with differences in microbial composition in a study of 182 Americans. 92 Breastfeeding has been shown to lead to different oral microbiota compared with formula-fed infants. 93 A recent study found that effects of partial vs no breastfeeding were still evidenced in the salivary microbiomes of 2- and 7-year-olds. 39 A recent randomized controlled trial with Estonian schoolchildren assessed the long-term effects of candies containing different polyols (erythritol, xylitol, or sorbitol) on salivary microbiome composition. 94 The group consuming erythritol-containing candies for 3 years during school days had the microbiome deviating the most from the rest and the lowest caries scores at the end of the intervention.

| Other factors potentially influencing oral microbiome
Climate, season of enrollment, and time of the day
It has been reported that populations living in different geographic and climatic environments (Alaska, Germany, or Africa) differ in their salivary microbiomes. 95 This study, however, did not account for any oral health status-related confounders.
The composition of some human microbial habitats has been shown to depend on the season of enrollment into the study, for example, in the case of the nasal microbiome of infants. 96

Tap water quality and composition
An interesting observation arose from a citizen science project in Spain involving 1555 adolescents (aged 13-15 years) and their teachers from 40 Spanish schools. 100 Their salivary microbiome varied not only by different lifestyle and oral hygiene habits, but also by certain parameters (eg, alkalinity, water hardness) of the tap water in the municipality they lived in. The authors concluded that drinking water may contribute to the shaping of the oral microbiota.

| Intra-oral niches
The oral cavity is a complex ecosystem, consisting of different niches with compositionally different communities, where shedding (mucosal tissue) and nonshedding (dental hard tissue) surfaces form two major, compositionally distinct niches (Figure 2). 101 Thus, a universal "oral microbiome" sample that would represent the entire ecosystem does not exist. Besides these tissue-related differences, there is a spatial gradient, shaped by salivary flow, from the front to the back of the mouth. 102 The anatomic location (eg, upper buccal molar surface vs lower lingual) has been shown to affect the composition of supragingival plaque within the same individual. 103 To date, numerous types of samples have been used to study the oral microbiome, each with their own advantages and limitations (

Stimulated and unstimulated saliva
It should be realized that saliva itself is not a niche. It is a continuously produced bodily fluid, with microbial and host cells that are dislodged from the oral surfaces and collected together with salivary components. Both its volume and biochemical composition will be different if salivary secretion is passive or stimulated, for example, by a masticatory or gustatory stimulus. 105 To date,

TABLE 2

Unstimulated saliva 106
Advantages: A proxy for oral microbiome; noninvasive; self (home) sampling and repeated sampling possible
Limitations: Does not represent a specific niche; relatively time-consuming and drooling might feel uncomfortable (5 min)

Stimulated saliva 106
Advantages: A proxy for oral microbiome; noninvasive; self (home) sampling and repeated sampling possible; faster collection and less discomfort than unstimulated saliva
Limitations: Does not represent a specific niche; requires a gum base or parafilm; more diluted than unstimulated sample; chewing activity might affect sample content

Oral rinse/mouthwash 108
Advantages: A proxy for oral microbiome; noninvasive; self (home) sampling and repeated sampling possible; fast (30-60 s)
Limitations: Does not represent a specific niche; requires a mouth rinse, constituents of which might affect the composition; more diluted than saliva sample

Pooled supragingival plaque 118
Advantages: Represents an intra-oral niche, relevant for oral health; contains low human DNA proportion
Limitations: Sample composition depends on time since toothbrushing and brushing efficiency; self-sampling possible but less reliable than by trained researcher; sampling all surfaces time-consuming; repeated sampling possible only after regrowth of dental plaque

Site-specific supragingival dental plaque 104
Advantages: Represents a specific dental site; allows discrimination between caries lesions and intact surfaces
Limitations: Sampling requires a trained researcher and a clinical setting; surfaces need to be cleaned and diagnosed in a different appointment; time since cleaning needs to be standardized; low sample biomass; repeated sampling possible only after regrowth of dental plaque

Subgingival plaque 13
Advantages: Represents an intra-oral niche, relevant for oral health; possible to sample specific sites repeatedly if using paperpoints
Limitations: Sampling requires removal of supragingival plaque by a trained researcher in a clinical setting; low sample biomass in cases without periodontal pockets; use of paperpoints: risk of DNA contaminants; use of curettes: risk of damage to periodontium and not suitable for frequent resampling

Interproximal plaque 207
Advantages: Represents an intra-oral niche, relevant for oral health; possible to sample specific interdental area and assess effectiveness of anti-biofilm measures on plaque stagnation sites; high bacterial diversity
Limitations: Interdental hygiene habits affect sample composition; not possible to sample with deficient restorations; low sample biomass; highly trained researcher required; repeated sampling limited: requires accumulation of mature biofilm; for replicates within 1 wk: comparable but different sites should be sampled

Tongue swab 13
Advantages: Represents an intra-oral niche; easy to sample; self-sampling and repeated sampling possible; most stable intra-oral niche in general; sufficient material to sample
Limitations: Tongue brushing habit affects the composition; low similarity with dental plaque composition; high compositional stability might limit the applicability in intervention studies; high human DNA proportion

Buccal swab 13
Advantages: Represents an intra-oral niche; relatively easy to sample; self-sampling and repeated sampling possible
Limitations: Low bacterial diversity; potential contamination with other surfaces (eg, teeth) and saliva will affect composition; high human DNA proportion

Tonsillar swab 13
Advantages: Represents an intra-oral niche, relevant for oral and general health; microbial community not disturbed by toothbrushing
Limitations: Requires trained researcher to sample; uncomfortable and uneasy sample collection procedure; repeated sampling only possible after considerable time

Palatal swab 13
Advantages: Represents an oral niche, specifically relevant for oral health of full upper denture wearers; relatively easy to sample; repeated sampling possible

Another recent study, though, comparing Scope mouthwash samples with unstimulated saliva, did report differences in microbial composition between these two sample types. 111 Another study compared saliva collected by passive drooling, active spitting, and 10 mL saline rinse for 1 minute. 113 Higher amounts of total and bacterial DNA were obtained from the oral rinse samples, followed by spit and drool samples. The alpha diversity tended to be higher in the oral rinse samples than the others, while microbial composition was driven by individual subject and did not differ by sample type.
In summary, unstimulated saliva and the sample collected by oral rinse seem to differ, especially in the DNA yield and bacterial diversity, but compositional differences are minor.

| Niche-specific oral samples
As stated above, saliva or oral rinse samples do not represent a certain intra-oral niche, but compositionally they do resemble samples from the mucosal surfaces. 101,112 Oral diseases involve specific surfaces and are often biofilm-initiated; therefore, site-specific samples are frequently preferred to salivary or rinse samples (Table 2).
Some studies using saliva have failed to detect differences that were clinically discernible, for example, between cases with and without caries, and concluded that saliva is not the best sample for that purpose. 114 Again, depending on the aims and hypotheses of the study, the most appropriate sample type(s) should be selected.
If the aim is to assess overall microbiome diversity, then collecting multiple samples from different niches will be the most appropriate approach. One specific niche or sample type might be more dis-

Niche-specificity in children
The oral microbiome in children undergoes various developmental stages, following the anatomic changes occurring because of tooth eruption and growth, and changes in feeding habits. 115 [...] irrespective of the sample type. 116 Cariogenic bacteria (S. mutans, Streptococcus sobrinus) were not limited to the dental surfaces and also increased with age. 117 The lowest proportion of these microorganisms was found in subgingival plaque, while the mucosal swab sample contained the highest proportions of S. sobrinus and S. mutans.
Recently, findings that saliva and supragingival plaque in young children harbor very different microbial communities have been confirmed by 16S rDNA amplicon sequencing. 48,118 Plaque was shown to have a higher alpha diversity than saliva. A longitudinal study on maturation of the oral microbiome in 119 caries-free children showed that both saliva and plaque undergo distinct compositional changes in the period from 1 to 4 years of age. 118

Niche stability and comparability in adults
As stated in the sections above, different intra-oral niches will result in different microbiome outcomes. The choice of the sample could also be based on the robustness or temporal stability of the niche.
Once established, oral microbial communities remain relatively stable, with tongue dorsum being the least variable in time, and subgingival plaque being the most (Figure 3). 13 Intra-individual temporal stability and inter-individual differences were recently assessed in a study comparing tongue, saliva, and supragingival plaque in 10 individuals sampled at daily, weekly, and monthly intervals for up to 1 year. 119 The authors found that plaque was significantly more variable than tongue or saliva. Additionally, they demonstrated that machine-learning approaches could assign the samples to the right individual with 88%, 96%, and 97% accuracy, when using tongue, supragingival plaque, and salivary microbiome data, respectively.
This suggests that tongue, although less variable, harbors a microbiome that is less discriminatory among adult individuals than saliva or plaque.
A relevant question regarding sample choice in periodontitis patients is whether subgingival plaque has to be collected in order to assess the effects of interventions. Perhaps other samples such as saliva or tongue swab, both of which are less invasive and less time-consuming to collect but are known to differ significantly from dental plaque, are sufficient for observing differences over time or among study groups and could lead to conclusions comparable with those based on subgingival plaque samples. A recent study compared microbial profiles of 14 periodontitis patients before and after periodontal therapy, obtained from supragingival and subgingival plaque, chewing-stimulated saliva, and tongue swab samples. 120 The authors found that the relative abundances of 12 periodontitis-associated taxa, based on the red and orange complexes, 121

Dental calculus
Supragingival plaque data from healthy individuals (the Human Microbiome Project data set) were recently compared with data from modern and ancient dental calculus samples, where the modern calculus samples originated from both healthy subjects and periodontitis patients. 123 There was a distinct separation in microbial profiles between plaque and calculus, with calculus samples having a higher proportion of periodontal disease-associated species, irrespective of oral health status.
In summary, the choice of sample type(s) for a study on the oral microbiome is not always straightforward, and should be made after evaluating relevance with regard to the study purpose and feasibility in relation to the costs and logistics of the study.

| Controls for microbiome studies
Another important aspect in the design of a microbiome study is planning and including both negative and positive controls to process alongside the biologic samples with each sample batch (Table 3).

| Negative controls
Inclusion of negative or blank controls allows assessing and correcting for potential contamination. Contamination with bacterial DNA from nonsample sources was never an issue in the studies targeting specific taxa or assessing microbial composition by culture. In a recent review, a large number of genera included in the list of the common contaminant taxa belonged to the normal human (oral) microbiome, originating from the laboratory personnel. 126 The sources of DNA contaminants range from the sampling materials and laboratory environment, researchers, and consumables, to DNA extraction kits and laboratory reagents. 126 For example, infant nasopharyngeal samples clustered by the lot number of the DNA extraction kit used in the study. 127 Besides the effects of the production lot, the DNA extraction blank controls across multiple studies have been shown to share several taxa, 127 leading to a new term in this field, the "kitome". 128 Three types of negative controls should be included with each sample batch for microbiome analysis (Table 3).

| Positive controls
In addition to the negative controls, two types of positive controls should be included and processed together with each batch of samples (Table 3).

| Sample collection methodology
Depending on the sample type, different sample collection methods are used; most of these were discussed in section 3.6 (Sample type choice). Irrespective of the sample type, the sampling method should have a low risk of introducing contamination to the sample. For instance, for subgingival plaque sampling, sterile curettes should be chosen over paperpoints if these cannot be claimed to be bacterial DNA-free. 124 Additionally, the sampling method should be feasible to perform under the conditions of the given study, considering the skills of the operator (eg, self-sampling at home, sampling by a medical nurse or a trained dental professional).

| Sample transport and storage
In

| Bacterial DNA isolation
Only when the entire sample collection process is finished can the next step in sample processing, the isolation of bacterial DNA, start.
It is advisable to postpone DNA isolation if the DNA is not going to be processed further right away, because DNA deteriorates and loses its quality during prolonged storage. Another reason for postponing DNA isolation and processing all samples in one go is to reduce the risk of batch bias. Importantly, the samples belonging to different treatments should be randomized to avoid sample differences because of a batch effect, while time series (samples collected from the same individual at numerous time points) should preferably be processed together, in one batch, to reduce the inter-sample variability. Different sample types (eg, saliva, plaque, and mucosal tissue biopsy) will require different first steps in their processing. If the samples are stored in a transport fluid, this usually needs to be removed first. Thereafter, the obtained sample pellet should be subjected to cell lysis. This can be done either chemically (eg, with phenol/chloroform, TRIS-EDTA buffer), enzymatically (eg, using lysozyme, proteinase K, achromopeptidase) or mechanically (eg, by bead beating in the presence of high-density beads), but preferably by a combination of these methods. Cells of the Gram-positive bacteria are generally more difficult to lyse and will require a mechanical lysis step. 132 After cell lysis, the DNA needs to be separated from the lysate and purified. This is typically done using one of the commercially available DNA isolation kits developed for specific sample types and purposes (Table 5). In gut microbiome research, a recent extensive study systematically compared 21 DNA isolation protocols for fecal samples, and reported a large variation in DNA yield and quality between the protocols used. 133 The protocols that performed best resulted in higher alpha diversity and included steps of mechanical lysis with zirconia beads and shaking.

TABLE 4 Comparison of different oral sample transportation and storage methods
A similar study has not yet been performed on oral samples. Researchers who perform more than one microbiome study and aim to compare findings across their own studies should use the same laboratory facilities, equipment, and sample processing protocols throughout, while ensuring that all procedures are performed under aseptic conditions. 126
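The batch-allocation advice in this section (randomize treatments across batches, but keep each individual's time series within one batch) can be sketched as follows. This is a minimal illustration under stated assumptions, not a validated allocation procedure: the function name `assign_batches`, the subject identifiers, the batch capacity of 12 samples, and the random seed are all hypothetical.

```python
import random
from collections import defaultdict


def assign_batches(subjects, batch_capacity, seed=42):
    """Assign whole subjects to DNA isolation batches.

    subjects: dict mapping subject id -> (treatment, number_of_samples).
    All samples of one subject (eg, a time series) stay in the same batch,
    and treatments are interleaved so that each batch holds a mix of groups.
    """
    rng = random.Random(seed)

    # Group subjects by treatment and shuffle the order within each group.
    by_treatment = defaultdict(list)
    for subject_id in sorted(subjects):
        treatment, _n_samples = subjects[subject_id]
        by_treatment[treatment].append(subject_id)
    for ids in by_treatment.values():
        rng.shuffle(ids)

    # Take subjects from each treatment group in turn (round-robin).
    queues = [by_treatment[t] for t in sorted(by_treatment)]
    order = []
    while any(queues):
        for queue in queues:
            if queue:
                order.append(queue.pop())

    # Fill batches sequentially without splitting a subject over two batches.
    batches, current, used = [], [], 0
    for subject_id in order:
        n_samples = subjects[subject_id][1]
        if n_samples > batch_capacity:
            raise ValueError(f"{subject_id} does not fit in a single batch")
        if used + n_samples > batch_capacity:
            batches.append(current)
            current, used = [], 0
        current.append(subject_id)
        used += n_samples
    if current:
        batches.append(current)
    return batches


# Hypothetical study: 8 subjects, 2 groups, 3 time points per subject.
subjects = {f"P{i:02d}": ("treatment" if i % 2 else "placebo", 3) for i in range(1, 9)}
batches = assign_batches(subjects, batch_capacity=12)
for number, batch in enumerate(batches, start=1):
    print(number, [(s, subjects[s][0]) for s in batch])
```

Recording the seed alongside the batch list makes the allocation itself reproducible and reportable.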

| AMPLICON PREPARATION AND SEQUENCING
The steps after the sample DNA isolation, purification, and quantification [...] are some crucial decisions that should be made upfront.

| Hypervariable region choice
The 16S rRNA gene, which encodes a component of the small ribosomal subunit, is approximately 1500 base pairs long and contains nine highly conserved parts, which are nearly identical in most bacteria, and nine hypervariable regions, parts of which have slowly evolved and can be used for discriminating different bacterial taxa. Unfortunately, different hypervariable regions evolved differently and there is no single region that would be able to distinguish all bacterial lineages. 135,136 The optimal approach would be to sequence the entire gene, about 1500 bases. To date, this is possible with only a few sequencing technologies (Table 6). However, most studies currently use technologies that are high throughput and deliver shorter but high quality sequencing reads. For these, one has to choose which region(s) or combinations of regions to target.
The importance of the 16S rRNA gene hypervariable region choice was clearly illustrated in the early days of the next generation sequencing era using 454 pyrosequencing technology: results obtained from sequencing different hypervariable regions (V1-V3, V4-V6, V7-V9) of subgingival plaque bacterial DNA differed significantly. 137 For example, the genus Fusobacterium accounted for 18% of the sequences in the data set from V1-V3, for 4% in the V4-V6 data set, but was not detected at all in the data set from V7-V9.
Thus, the latter region was not discriminatory enough for this specific taxon, and the respective sequences were classified at a higher taxonomic level (eg, class, order, or phylum) instead.
To assist in the choice of region, especially if there is no previously published comparison available on the particular sample type targeted by different regions, there are tools available which allow in silico assessment of the taxa that could at least theoretically be distinguished by specific primers aimed to amplify specific hypervariable regions. 138,139 Since the introduction of the V4-based Illumina MiSeq protocol, 140 the V4 hypervariable region has been frequently chosen ahead of others. This is mainly because this region is entirely covered by the two 250 nucleotide paired-end reads (thus, sequenced from both ends, creating a complete overlap), thereby reducing the error rate to a minimum. 141 Besides the taxonomic differences introduced by the use of different hypervariable region(s), each primer pair will have its own primer bias: some taxa will be amplified more efficiently than others. Although most prokaryotes share the conserved regions of the 16S rRNA gene, there are no universal primers which will amplify all bacterial taxa. Some primer sets include degenerate bases (a mix of a number of possible bases instead of a single base) to reduce mismatches with bases of the 16S rRNA region and improve amplification of taxa that otherwise would not amplify or amplify less efficiently.
In summary, differences in universal primers and in hypervariable regions will affect the data obtained from the sequencing run.
Again, as with sample processing, one should choose the methodology carefully and be consistent, as the results will not be directly comparable with studies using different methods.
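To make the role of degenerate bases concrete, the sketch below expands a primer written with IUPAC ambiguity codes into a regular expression and checks which reference sequences it would match exactly. It is a toy illustration only: the primer and the three sequences are invented, only perfect matches on the given strand are considered, and it is not one of the in silico tools cited above.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Translate a (possibly degenerate) primer into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def matched_taxa(primer, sequences):
    """Return the sorted names of sequences containing an exact primer site."""
    pattern = primer_to_regex(primer)
    return sorted(name for name, seq in sequences.items()
                  if pattern.search(seq.upper()))


# Invented reference sequences that differ at the primer binding site.
references = {
    "taxon_1": "TTACGCTAGG",
    "taxon_2": "TTACGTTCGG",
    "taxon_3": "TTACGATAGG",
}

print(matched_taxa("ACGCTA", references))  # non-degenerate primer
print(matched_taxa("ACGYTM", references))  # degenerate primer (Y = C/T, M = A/C)
```

In this toy case the degenerate primer matches one more taxon than the non-degenerate one, while the third taxon is still missed, which mirrors the point that no primer set is truly universal.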

| Sequencing platform choice
Yet another design option is the choice of sequencing platform. The first sequencing method was developed in 1977 by Sanger et al 142 and was revolutionary for that time, maintaining a monopoly until

| BIOINFORMATICS
The sequences obtained using one of the next generation sequencing technologies need to be processed into a data set that can be used for testing the study hypothesis. In the early days of microbiome research, researchers had to rely on separate, custom-made scripts, using a command line and requiring long computing times. 4,149 In the past decade, this field has evolved from numerous web-interfaces and software packages that combine several tools to complete, self-contained data-processing and analysis pipelines such as QIIME and mothur (Table 7). For the advantages and shortcomings of the majority of these tools, refer to systematic comparisons published elsewhere. 150,151 Below, we briefly summarize the data-processing steps and issues that are of importance in the generation of valid study outcomes.

| Data quality-filtering
First, the sequences have to be quality-filtered: the bases or reads with low quality scores (assigned to each read during the sequencing run) have to be removed. There is no default way to filter low quality regions or reads. The filtering depends on the sequencing platform, pipeline, and specific filtering method used. Therefore, these details should be reported in manuscripts. Each read is assigned to its sample of origin based on the barcode or index sequence. If the barcode and the primer were part of the sequence, these are trimmed off. Paired-end reads are merged. Reads not assigned to any samples, reads of insufficient length, or reads with ambiguous bases, are generally removed. 152 Next, chimeras or sequences that result from chimeric amplification during the PCR process need to be identified and removed. 153 One can choose a specific software for identification of chimeric sequences or rely on tools provided by the respective complete data-processing pipeline. 150 In QIIME the default method is ChimeraSlayer, 154 while in mothur it is UCHIME. 155
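As an illustration of the read-level filtering described above, the following sketch removes reads that are too short, contain ambiguous bases, or have a low mean quality score. It is a simplified example, not the filtering implemented in any particular pipeline: it assumes Phred+33 encoded quality strings, and the thresholds (`min_length`, `min_mean_quality`) and read names are arbitrary, hypothetical values.

```python
def phred_scores(quality_string, offset=33):
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def passes_filter(sequence, quality, min_length=5, min_mean_quality=20):
    """Apply three simple checks: length, ambiguous bases, mean quality."""
    if len(sequence) < min_length:
        return False
    if "N" in sequence.upper():
        return False
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_quality


# Invented reads as (name, sequence, quality string) tuples.
reads = [
    ("read_ok", "ACGTACGT", "IIIIIIII"),         # Phred 40 at every base
    ("read_low_quality", "ACGTACGT", "########"),  # Phred 2 at every base
    ("read_short", "ACG", "III"),
    ("read_ambiguous", "ACGTNCGT", "IIIIIIII"),
]

kept = [name for name, seq, qual in reads if passes_filter(seq, qual)]
print(kept)
```

Whatever thresholds are used in a real study, they belong in the methods section, as noted above.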

| Operational taxonomic units
After quality-filtering, sequences are usually grouped (clustered) in operational taxonomic units, typically at a 97% similarity level, which was proposed in the 1990s as an approximation of bacterial species. 156 This threshold leads to a reduced contribution of potential errors introduced both by PCR and sequencing in the final data set, called an operational taxonomic unit

| Single nucleotide resolution
To reduce the dependency from sequencing errors and to obtain a data set at a single nucleotide resolution, thus at 100% instead of 97% sequence similarity, different error-correction or denoising approaches have become available (

| Taxonomy assignment
Each feature (eg, operational taxonomic unit, zero-radius operational taxonomic unit, or minimum entropy decomposition) in the data table needs to be assigned a taxonomy. This is done by comparing the sequences from the data set with the sequences in a 16S rRNA gene reference database. There are large databases, such as SILVA, 167 Greengenes, 168 and the Ribosomal Database Project, 169 containing bacterial sequences from all areas of microbiology, and specific databases limited to a single bacterial habitat, such as HOMD 170 and CORE, 171 both of which are limited to sequences of microbiota previously associated with the oral cavity.
The advantage of using databases tailored for the oral microbes is their higher taxonomic resolution than the broad databases. On the other hand, oral samples, especially if originating from immunocompromised individuals or very young children, may contain sequences that are not normally found in the oral cavity but are common in other environments such as water or soil. Depending on the sample, a larger or smaller proportion of the sequences in the data set will not be assigned taxonomy using the oral database alone or will be classified at a very low resolution such as phylum or even domain level. Therefore, a taxonomy assignment with one of the broad-range databases should be performed in parallel to the oral database.
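One possible way to combine the two parallel assignments is sketched below: for each feature, the deeper (more resolved) classification is kept, with the oral database preferred in case of a tie. This merging rule is an assumption made for illustration, not a procedure prescribed by any of the databases or pipelines mentioned; the feature names and lineages are example data.

```python
def combine_assignments(oral, broad):
    """Merge two taxonomy assignments, keeping the deeper lineage per feature.

    oral, broad: dicts mapping feature id -> list of ranks, from domain
    downward. Returns feature id -> (source database label, lineage).
    """
    combined = {}
    for feature in sorted(set(oral) | set(broad)):
        oral_lineage = oral.get(feature, [])
        broad_lineage = broad.get(feature, [])
        if len(oral_lineage) >= len(broad_lineage):
            combined[feature] = ("oral", oral_lineage)
        else:
            combined[feature] = ("broad", broad_lineage)
    return combined


# Example: feature_2 is only resolved to domain level by the oral database.
oral_db = {
    "feature_1": ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
                  "Streptococcaceae", "Streptococcus"],
    "feature_2": ["Bacteria"],
}
broad_db = {
    "feature_1": ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
                  "Streptococcaceae"],
    "feature_2": ["Bacteria", "Proteobacteria", "Alphaproteobacteria",
                  "Sphingomonadales", "Sphingomonadaceae", "Sphingomonas"],
}

taxonomy = combine_assignments(oral_db, broad_db)
for feature, (source, lineage) in taxonomy.items():
    print(feature, source, ";".join(lineage))
```

Keeping the source label per feature makes it possible to report which database each classification came from.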
TABLE 7 An overview of 16S rRNA gene amplicon data-processing software

| Process | Tool | Description |
| --- | --- | --- |
| Quality control | FastQC 217 | Quality control of raw sequencing data |
| Self-contained analysis pipelines (including quality-filtering, chimera removal, the construction of OTU tables, assignment of taxonomy, with or without data analyses) | QIIME 218 | Quantitative Insights Into Microbial Ecology. Software pipeline from raw sequencing data until data interpretation (visualization, statistical tests) |
| | QIIME 2 192 | Redesigned QIIME. Supports processing the sequence data as well as downstream analyses |
| | Mothur 219 | A single software package for the analysis of amplicon sequencing data |
| | MG-RAST 220 | MetaGenome Rapid Annotation using Subsystem Technology, a web-based pipeline, also used for the analysis of shotgun metagenomics data |
| | USEARCH (UPARSE) 221,222 | Software that supports all steps necessary to produce an OTU [...] |
| [...] | [...] | Produces sOTU with single-nucleotide resolution (putative error-free sequences); processes each sample independently |

Abbreviations: ASVs, amplicon sequence variants; MED, minimum entropy decomposition; OTU, operational taxonomic unit; sOTU, sub-operational taxonomic unit; zOTU, zero-radius operational taxonomic unit

6.3 | Data analyses

| Assessment of study controls
One crucial step before addressing the research question and looking at the study outcomes is a critical assessment of the study controls (Table 3). Negative controls are contaminated if they present high DNA yield relative to the samples and a high number of sequencing reads per control, at or above the detection limit of the positive controls (if these were included at various dilutions). In such a case, the data from the samples which had low DNA yield, resulted in a low number of reads, or both, should be discarded.
Next, the sequences dominating in the controls should be compared with those in the samples. After identification of the contaminants, these should be subtracted from the final data set and reported as such. One may use a very conservative approach by removing all taxa present in the controls, but this may lead to removal of taxa that are truly present in the samples. There are filtering approaches available that would avoid the aforementioned issue. 172-174 Also, one can use predictive modeling provided by tools such as SourceTracker 175 to identify putative contaminants in the data set.
In large-scale studies involving several sequencing runs and processing batches, the data from the positive controls should be used to assess the run-to-run variability and a potential batch effect.
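The comparison between controls and samples can be illustrated with a deliberately simple heuristic: flag a taxon as a putative contaminant only if its mean relative abundance is higher in the negative controls than in the biologic samples. This rule is an assumption chosen for the example; it is neither one of the published filtering approaches cited above nor SourceTracker, and the counts are invented.

```python
def relative_abundance(counts):
    """Convert a dict of read counts into proportions."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: count / total for taxon, count in counts.items()}


def mean_relative_abundance(group, taxon):
    """Mean proportion of one taxon over all libraries in a group."""
    proportions = [relative_abundance(counts).get(taxon, 0.0)
                   for counts in group.values()]
    return sum(proportions) / len(proportions)


def flag_contaminants(samples, negative_controls):
    """Flag taxa that are relatively more abundant in controls than in samples."""
    taxa = set()
    for counts in list(samples.values()) + list(negative_controls.values()):
        taxa.update(counts)
    return sorted(
        taxon for taxon in taxa
        if mean_relative_abundance(negative_controls, taxon)
        > mean_relative_abundance(samples, taxon)
    )


# Invented read counts per taxon.
samples = {
    "saliva_1": {"Streptococcus": 900, "Veillonella": 90, "Ralstonia": 10},
    "saliva_2": {"Streptococcus": 700, "Veillonella": 295, "Ralstonia": 5},
}
negative_controls = {
    "extraction_blank": {"Ralstonia": 45, "Streptococcus": 5},
}

flagged = flag_contaminants(samples, negative_controls)
print(flagged)
```

Unlike removing every taxon seen in a control, this rule keeps the taxon that dominates the samples even though a few of its reads also appear in the blank.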

| Data normalization issue
Although sequencing is performed with an equimolar amplicon mix of the samples, there are always inaccuracies in library size standardization and amplicon pool mixing, as well as in the sequencing process itself. These inaccuracies may lead to a 10- or even 100-fold range in the number of reads per individual sample, which in turn will influence the study results: the samples sequenced at a higher depth will have higher species richness (number of taxa) than those at a lower sequencing depth, without any biologic reason behind these differences. Therefore, the data need to be normalized before the downstream analyses can be performed.
Currently, the most commonly applied normalization is rarefaction [...] 177: for sample groups with large (10-fold) differences in the mean sequencing depth, rarefying was shown to lower the false discovery rate compared with a normalization by distribution used in DESeq2. 178 Another study addressing the normalization issue concluded that the best method will depend on the exact structure of the data. 179 To date, there is no consensus on the best method for normalization, but one should be aware that the method used may impact the study results. 177,179
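Rarefying, as one of the normalization options discussed here, amounts to randomly subsampling every sample to the same number of reads without replacement and discarding samples that fall below that depth. The sketch below shows the idea on invented counts; the function names, the depth of 100 reads, and the seed are hypothetical choices, and real pipelines provide their own implementations.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a dict of read counts to `depth` reads without replacement.

    Returns None when the sample has fewer reads than the requested depth.
    """
    if sum(counts.values()) < depth:
        return None
    # One list entry per read, labeled with the feature it belongs to.
    pool = [feature for feature, count in sorted(counts.items())
            for _ in range(count)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for feature in drawn:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def richness(counts):
    """Number of features observed at least once."""
    return sum(1 for count in counts.values() if count > 0)


# Invented samples sequenced at a 10-fold different depth.
deep_sample = {"A": 500, "B": 300, "C": 190, "D": 10}
shallow_sample = {"A": 60, "B": 30, "C": 10}

deep_rarefied = rarefy(deep_sample, depth=100)
shallow_rarefied = rarefy(shallow_sample, depth=100)
too_shallow = rarefy({"A": 5}, depth=100)
print(richness(deep_sample), richness(deep_rarefied), richness(shallow_rarefied))
```

Because the subsampling is random, the seed should be fixed and reported if the rarefied table is to be reproduced.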

| Data compositionality issue
The compositionality of the data is an issue that is frequently un- [...] To deal with compositional data, it is advised to perform logarithmic ratio transformation (eg, log-ratio, centered log-ratio, isometric log-ratio, additive log-ratio transformations). 180 These transformations, however, suffer from the sparsity of the data (a high count of zeros in the data table; logarithm of zero is undefined). For that, specific methods can be applied such as implemented in the zCompositions R package 180,182 or pseudo-counts can be used, although there is no consensus on the pseudo-count value. 177 In their work, Gloor et al 180 provide a list of methods and downstream analyses that account for data compositionality. It is important to realize that ignoring the compositional nature of the data may lead to erroneous conclusions not based on true biologic differences.

FIGURE 4 Illustration of the compositionality of the microbiome data, from Gloor et al 180 (A) showing that the data obtained after sequencing cannot provide information on the absolute abundance of bacteria. The number of counts (reads) in the data set reflects the proportion of counts per feature (eg, operational taxonomic unit, gene) per sample, multiplied by the sequencing depth, thus the relative abundances. The bar charts in (B) show the difference between the bacterial count and the proportion of bacteria for two features, A (red) and B (gray) in three samples. Features A and B in samples 2 and 3 appear with the same relative abundance, although the absolute counts in the environment were different. The table in (C) shows real and perceived changes for each sample in transition from one sample to another
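As a concrete example of one of the transformations named above, the centered log-ratio of a sample is the logarithm of each count minus the mean of the logarithms over all features in that sample. The sketch below applies it with a pseudo-count to handle zeros; the pseudo-count of 0.5 is an arbitrary illustrative value (as noted, there is no consensus on it), and the counts are invented.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.

    The pseudo-count is added to every feature so that zeros can be logged.
    """
    logs = [math.log(count + pseudocount) for count in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# A sample containing a zero count: the pseudo-count makes the log defined.
sparse_sample = [120, 30, 0, 50]
transformed = clr(sparse_sample)
print([round(value, 3) for value in transformed])

# Two samples with the same proportions but a 10-fold different depth
# give the same values when no pseudo-count is needed.
shallow = clr([10, 20, 70], pseudocount=0)
deep = clr([100, 200, 700], pseudocount=0)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(shallow, deep)))
```

The transformed values of a sample always sum to zero, and the second comparison shows why the read counts carry only relative, not absolute, information.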

| Downstream analysis tools
Finally, what remains is to make sense out of the data. Already in the planning stage of the study, one is advised to become acquainted with the range of downstream tools which can be used for analyzing the data, depending on the study design and hypothesis. For this, reading tutorials and user manuals of one of the major pipelines such as mothur or QIIME may be useful. These will include, but will not be limited to, the alpha (within-sample) and beta (between-samples) diversity assessment, data visualization by ordination techniques such as principal component analysis or principal coordinate analysis, and the use of appropriate statistics. Some of these approaches have been clearly explained and illustrated by Goodrich et al. 19 Beyond the tools implemented in the above-mentioned data analysis pipelines, several software tools are available for comparison of two or more groups of microbial communities or for identifying differential taxa between the groups (Table 8).
Only recently, specific tools for longitudinal microbiome data sets have become available. 183 The QIIME 2 pipeline now supports analyses of time-series data using the q2-longitudinal software plugin, 184 while new dynamic models have been reconstructed from time series data. 185
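To illustrate what alpha (within-sample) and beta (between-samples) diversity mean in practice, the sketch below computes one commonly used measure of each kind: the Shannon index and the Bray-Curtis dissimilarity. The choice of these two measures is an assumption made for the example (many alternatives exist), and the count vectors are invented; the major pipelines ship their own implementations.

```python
import math


def shannon(counts):
    """Shannon diversity index (natural logarithm) of one sample."""
    total = sum(counts)
    proportions = [count / total for count in counts if count > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples (0 = identical)."""
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))


# Invented count vectors over the same four features.
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]
disjoint_a = [50, 50, 0, 0]
disjoint_b = [0, 0, 50, 50]

print(round(shannon(even_sample), 3), round(shannon(dominated_sample), 3))
print(bray_curtis(even_sample, even_sample), bray_curtis(disjoint_a, disjoint_b))
```

An evenly distributed community has a higher Shannon index than one dominated by a single feature, while Bray-Curtis ranges from 0 for identical samples to 1 for samples sharing no features.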

| Reporting of the study, data deposition, and reuse
As already stated in the section on study metadata (section 3.5), each study should be reported in a way that study methods can be reproduced. A detailed description of the study population with the necessary metadata, detailed and properly referenced methods on sample collection procedure, sample processing, as well as the steps involved in data creation and processing, should be reported.
Recently, FAIR guiding principles for scientific data management and stewardship have been proposed. 186 FAIR stands for Findable, Accessible, Interoperable, and Reusable, and refers to improved infrastructure that will support the reuse of scientific data. The majority of current research data are obtained with public funding and should therefore be publicly available.

| Data depository
Most journals but also research funding organizations require authors to make their sequencing data available. This could be "available upon request", but most often data deposition is required in publicly available databases, such as the Sequence Read Archive (often referred to as the Short Read Archive) of the National Center for Biotechnology Information 187 and the European Nucleotide Archive. 188 Together with the data, a minimum amount of information on the experiment has to be provided, including details on sequencing, such as the target gene or gene region, the sequencing method used, and a reference to the publication with details regarding the study. 189,190 Both the Sequence Read Archive and European Nucleotide Archive deploy data standards and checks on data submission in collaboration with the Genomic Standards Consortium. However, it still remains the responsibility of authors to make the data and metadata publicly available.

| Data reuse
Usually an omics study, such as a microbiome study, is published in a relatively condensed way, presenting the major findings in the main text and supplementary material. This, however, does not mean that the scientific value of the data is exhausted. For example, the data obtained in the aforementioned study on the effects of antibiotics 42 were reanalyzed by experts in data modeling [...] 191 Although combining microbiome data sets from several studies into a single data set and performing a meta-analysis is currently still a challenge because of heterogeneity in study methodology, it will certainly become of high scientific value. This is only possible if the data are findable (via an accession number in the publication) and well documented by all necessary metadata. Often, reanalysis of the deposited data of a single study takes substantial effort and necessitates contacting the authors because of missing metadata, erroneous data accession numbers, or a lack of description of the processing steps, even in the accompanying publication.
In general, reproducing the study outcomes should always be possible if the methods used are provided in sufficient detail in the publication. To assist researchers, recent microbiome processing pipelines, for example, QIIME 2 192 and DADA2/phyloseq, 193 focus more on reproducible workflows.

| OPEN QUESTIONS REGARDING THE QUALITY OF ORAL MICROBIOME STUDIES
Currently, the body of scientific knowledge is not always large or robust enough to pose a meaningful hypothesis for every microbiome-based study. 194 It is also likely that other confounding factors besides those listed in Table 1