The mouth‐opening muscular performance in adults with and without temporomandibular disorders: A systematic review

Abstract
Background: The mouth‐opening muscular performance in patients with temporomandibular disorders (TMDs) is unclear. Understanding the impairments of this muscle group within specific TMDs is important to develop proper management strategies.
Objective: To characterise the mouth‐opening muscular performance in adults with and without TMDs.
Methods: PubMed, EMBASE, CINAHL, Scopus, Web of Science and Cochrane databases were searched from inception to 12 November 2020. Bibliographies were searched for additional articles, including grey literature. Case‐control, cross‐sectional and interventional studies reporting mouth‐opening muscular strength and/or endurance were included. Risk of bias was assessed by the SIGN checklist for case‐control studies and by the NIH quality assessment tool for cross‐sectional studies. Results were pooled with a random‐effects model. Confidence in cumulative evidence was determined by means of the GRADE guidelines.
Results: Fourteen studies were included; most were rated as having a moderate risk of bias. Only three studies assessed patients with TMDs and the other 11 assessed healthy adults. Significant sex differences in muscular performance were found for healthy adults in the review (strength deficit for females versus males). There was a significant reduction in maximal mouth opening performance (strength and endurance) in the three studies that assessed patients with temporomandibular disorders.
Conclusion: Sex plays a significant role in maximal mouth opening strength. There is a lack of reliable data on the normal mouth‐opening strength and endurance of healthy adults as well as for patients with TMDs.
Implications: The lack of reliable data for patients with TMDs and of comparable data for healthy adults highlights directions for future research.


| INTRODUCTION
The masticatory muscles are divided into two main categories according to their function as mouth openers or mouth closers. 1 The mouth closers are the masseter, temporalis and medial pterygoid muscles, which work against gravity and are more dominant and stronger than the mouth openers. 1 They are, therefore, considered one of the most common sites of pain in the masticatory system. 1 The mouth closers are also closely involved in both awake and sleep bruxism (masticatory muscle activity during sleep or wakefulness). 2 The main opener muscle of the mouth is the lateral pterygoid muscle, which also contributes to protrusion and lateral deviation of the mandible, both of which are movements required for normal mastication. 3 The other mouth opening synergists are the supra- and infra-hyoid muscles, which are also involved in different oromotor functions, such as tongue stability, swallowing and speech. 1 There are four suprahyoid muscles on each side of the mouth (the stylohyoid, digastric, mylohyoid and geniohyoid) and two infrahyoid muscles on each side of the anterior neck (the sternohyoid and omohyoid).
The muscular performance of the mouth closers has been intensively researched in both healthy controls and patients with temporomandibular disorders (TMDs). [4][5][6][7] In contrast, comparable knowledge on mouth openers is very limited. A recent systematic review and meta-analysis that assessed the muscular function of patients with TMDs included no study that measured the function of the mouth openers, compared with 22 studies that evaluated the function of the mouth closers. 8 The most widely researched population among the few available studies that did assess the muscular performance of the mouth openers comprised healthy elderly individuals from Japan. [9][10][11] One of the main reasons given for under-researching the mouth openers is that the initial phase of functional mouth opening requires relaxation of the mouth closers rather than activation of the mouth opening muscles. 1 This argument is mainly valid for the initial phase of mouth opening but not for common masticatory muscle functions, such as yawning or even gum chewing, that require muscular activation of the mouth openers. 12 Furthermore, given that patients with TMDs are very likely to present with over-activity of the mouth closers, 2 it could be hypothesised that their mouth openers are also required to be active during the initial phase of mouth opening in order to overcome the actions of the closers. It is also very likely that, similar to other regions of the human body, the agonist-antagonist muscular relationship is a relevant factor in rehabilitation of the associated musculoskeletal disorders. 13,14
The aim of this review was to systematically evaluate the existing evidence on the muscular performance of the mouth openers in patients with TMDs. The research questions were as follows:
1. What is the normal range of human mouth-opening muscular performance (strength and endurance)?
2. Are there standardised, valid and reliable tests to measure mouth-opening muscular performance (strength and endurance)?
3. Is mouth-opening muscular performance (strength and endurance) impaired in patients with TMDs compared to healthy controls?

| METHODS
A review protocol was developed according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 15 and registered with PROSPERO prior to initiating this systematic review (Registration date: Dec 15, 2020, CRD42020220878). 16 Table 1.

| Identification and selection of studies
Reference lists from the included studies were also scanned to identify additional relevant studies. No restriction was placed on publication date. Studies identified by the search were transferred to EndNote X9 (Clarivate Analytics) and duplicates were removed. The remaining studies were then uploaded into Covidence systematic review software (Veritas Health Innovation), where two independent reviewers (TG and AEP) screened the titles and abstracts to identify potentially eligible articles. The full texts of the remaining studies were retrieved for further assessment and were included or excluded according to the eligibility criteria (Figure 1). Reasons for exclusions during full-text screening were recorded for future reference. All stages of the screening and assessment were performed independently by the two reviewers, and meetings were held periodically to compare and discuss decisions. In the case of disagreement, a third review member (LP) was consulted.

| Outcome measures
The main outcome measure of this study was muscular performance during mouth opening which included maximal muscle strength and muscle endurance. The secondary outcome measure was muscular performance during mandibular protrusion (a component of full mouth opening) which included maximal muscle strength and endurance.

| Data extraction
Following inclusion into this analysis, data were extracted from each study by means of a standardised form, which had been […] or unclear data were annotated as "not specified" or "unsure", respectively, and the authors of those publications were contacted for clarification. Data collection was performed independently by two reviewers (TG and AEP). Any disagreements were resolved through consultation with a third review member (LP) and the outcome was documented. All forms were stored for future reference. Results for each relevant outcome measure were extracted by one reviewer (TG) and recorded directly into a protected file.

Web of Science search strategy: ("mouth opening" OR "jaw opening" OR suprahyoid* OR "supra hyoid*") AND ( […]
Inclusion criteria: Observational and interventional studies, studies containing baseline data for ≥1 outcome measure (muscular strength, power, force or endurance) of adult patients with any medical condition or healthy controls
Exclusion criteria: Reviews, single case, cadaver, animal […]

| Risk of bias assessment
The risk of bias for each eligible study was evaluated independently by two reviewers (TG and AEP) using two different quality assessment tools. The SIGN checklist 17 was used for case-control studies and for interventional studies which included cases and controls (Appendix 2). The NIH quality assessment tool 18 was used for cross-sectional studies and for interventional studies which included only one homogeneous group (Appendix 3). The main domains of both quality assessment tools explored (a) sample selection and characteristics, (b) assessor blinding, (c) validity, reliability, and standardisation of outcome measures, (d) confounders and (e) statistical methods. Prior to their implementation, the SIGN and NIH checklist items were discussed by the two reviewers (TG and AEP) and underwent a pilot assessment to ensure consistency in marking. Each reviewer completed the SIGN/NIH checklist for the included studies and determined an overall risk of bias rating of low (score of 9-12 methodological points), moderate (score of 5-8 methodological points), or high (score of 0-4 methodological points). Inter-rater agreement was calculated with Cohen's Kappa. Any disagreement was resolved through discussion with a third review member (LP). The authors of the publications were contacted for clarification in the case of unclear or missing information.
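The rating bands and the agreement statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the reviewers' actual procedure: the function names are hypothetical, the scores are invented for the example, and the kappa is the plain unweighted form (the review does not state whether a weighted variant was used).

```python
from collections import Counter


def risk_of_bias_rating(score: int) -> str:
    """Map a 0-12 methodological score to the rating bands stated in the text."""
    if not 0 <= score <= 12:
        raise ValueError("score must be between 0 and 12")
    if score >= 9:
        return "low"
    if score >= 5:
        return "moderate"
    return "high"


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores for five studies (NOT data from the review).
scores_a = [10, 7, 6, 3, 8]
scores_b = [9, 7, 4, 3, 8]
ratings_a = [risk_of_bias_rating(s) for s in scores_a]
ratings_b = [risk_of_bias_rating(s) for s in scores_b]
kappa = cohens_kappa(ratings_a, ratings_b)
```

In this invented example the two raters agree on four of five studies, and kappa discounts the share of that agreement that would be expected by chance.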

| Data analysis
The outcome measure data were compared between studies to establish patterns within and/or between the patient populations and control groups. A meta-analysis of the primary outcome measures was planned where there were ≥5 studies with (a) low to moderate risk of bias and (b) similar assessment and measurement techniques. Results for eligible studies were pooled using Review Manager via a random effects model. Mean differences and standardised mean differences were used to determine differences between subgroups, with 95% confidence intervals (CIs) and heterogeneity calculated by means of Cochran's Q test. 15 Studies with high risk of bias, heterogeneous assessment procedures or incomplete statistical reporting (e.g. absence of standard deviation [SD] values) were not included in this meta-analysis.
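To make the pooling step concrete, the following Python sketch computes a per-study mean difference and pools several studies under a random-effects model, reporting Cochran's Q alongside. It assumes the DerSimonian-Laird estimator of between-study variance; the review does not state which estimator was used, and the function names and inputs here are hypothetical.

```python
import math


def mean_difference(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return the mean difference between two groups and its variance."""
    md = mean_1 - mean_2
    variance = sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2
    return md, variance


def pool_random_effects(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance (fixed-effect) weights and the fixed-effect estimate.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "df": k - 1,
        "tau2": tau2,
    }


# Hypothetical effects and variances for two studies (NOT data from the review).
result = pool_random_effects([1.0, 3.0], [1.0, 1.0])
```

Q is compared against a chi-squared distribution with `df` degrees of freedom to test for heterogeneity; when Q does not exceed `df`, the estimated between-study variance is zero and the pooled result coincides with the fixed-effect estimate.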

| Confidence in cumulative evidence
The confidence in cumulative evidence was assessed for each outcome according to GRADE guidelines. [19][20][21][22][23][24][25][26] Each outcome was given an overall confidence level of "high", "moderate", "low" or "very low", taking into consideration factors, such as risk of bias, consistency of results, effect size and sample size.
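As a rough illustration of how the four confidence levels relate to one another, the sketch below tracks a starting level and the number of levels by which it is downgraded. It is only bookkeeping under stated assumptions: GRADE ratings rest on reviewer judgement rather than a formula, the domain names and the one-level-per-concern rule are simplifications, upgrading is omitted, and the function name is hypothetical.

```python
# Ordered from lowest to highest confidence.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_confidence(start_level: str, concerns: dict) -> str:
    """Downgrade a starting level by one step per domain flagged as a concern."""
    if start_level not in LEVELS:
        raise ValueError(f"unknown level: {start_level}")
    downgrades = sum(1 for flagged in concerns.values() if flagged)
    # The rating cannot fall below "very low".
    index = max(LEVELS.index(start_level) - downgrades, 0)
    return LEVELS[index]


# Hypothetical judgement for one outcome (NOT the review's actual assessment).
example = grade_confidence(
    "low",  # assumption: a body of observational evidence starts at "low"
    {"risk of bias": False, "inconsistency": True, "imprecision": False},
)
```

In practice each domain is weighed qualitatively, and a serious concern may lower the rating by more than one level.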

| Study selection
The progression of studies through the review process is demonstrated in Figure 2.
The database search identified 2506 studies, of which 1455 were duplicates. Following screening of titles, abstracts and full texts, 14 studies met the eligibility criteria and were included in this review. The studies excluded at the full-text stage and the reasons for exclusion are listed in Table 2.

FIGURE 2 PRISMA flowchart of included and excluded studies
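The record counts reported above imply the number of unique records screened and the number excluded overall. The short Python check below derives both; the variable names are illustrative, and the split of exclusions between title/abstract and full-text screening is not given in this section, so only the total is computed.

```python
# Counts reported in the text.
identified = 2506   # records returned by the database search
duplicates = 1455   # records removed as duplicates
included = 14       # studies meeting the eligibility criteria

# Unique records that entered title/abstract screening.
screened = identified - duplicates

# Records excluded across all screening stages combined.
excluded = screened - included
```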

| Study characteristics
The characteristics of each eligible study are shown in Table 3.
Twelve of the fourteen studies were observational (8 cross-sectional, 3 case-control and 1 reliability) and two were interventional (one randomised controlled trial and one clinical trial). The most frequently used outcome measure was maximal mouth opening strength (12 studies), and only two studies measured muscular endurance. 27,28 Three studies used the same measurement device and a similar testing procedure (jaw-opening sthenometer by Livert), 9,29,30 two other studies used similar devices 31,32 and the remaining nine studies each used a unique ad hoc measurement device with different testing procedures.

| Participants
A total of 1867 adults were included across the 14 studies (mean age = 39.8 ± 12.0 years). All studies included data on sex which could be pooled, and they yielded 1122 females (60%) and 755 males (40%). The combined study participants were divided into two main subgroups according to their health condition: 1651 healthy controls (mean age = 39.8 ± 12; 57% females) and 216 patients with TMDs (mean age = 37.6 ± 11.6; 83% females).

| Outcomes
The studies which evaluated each of the two subgroups are shown in Table 4. Eleven of the fourteen included studies evaluated the mouth opening performance of healthy controls and three of patients with TMDs (two compared to controls and one with TMDs only).

| Risk of bias
Assessment of the risk of bias of each study included in this systematic review is summarised in Table 5a […]

The "general pain" TMD group (according to DC/TMD) had lower endurance than the "local pain" TMD group (DC/TMD) in both jaw opening and protrusion. No exact values are reported, only box plots.

| Main findings
A summary of the findings for each included study is provided in Table 4.

| Healthy subjects
Thirteen studies assessed the muscular performance of mouth opening among healthy participants (age ≤65 years; n = 1651; 941 females and 710 males). Only three of those studies used a similar measurement device and procedure, and therefore were not appro- […]

Each study used different measurement devices and protocols.
Two studies measured mouth opening endurance 29,30 and the other one determined maximal mouth opening strength as an outcome measure. 31 Two studies compared the muscular performance of patients with pain-related and/or intra-articular TMDs to healthy controls, 30,31 and one study compared the muscular performance of two different pain-related TMDs subgroups. 29 Significant reductions of muscular performance were found among patients with TMDs compared to healthy controls, with no difference between TMD subgroups. 30,31 Patients with TMD-related pain who presented with "general pain" demonstrated lower endurance compared to those without "general pain". 29

| Confidence in cumulative evidence
Based upon the GRADE guidelines, 22 there is only low-quality evidence to support the findings of mouth opening strength among healthy adults due to the high variability of findings, the different measurement devices and procedures and the lack of reliability and validity. Importantly, there is only very low quality of evidence to support the findings for patients with TMDs due to a very low number of relevant studies, together with the use of different measurement devices and procedures.

| DISCUSSION
This is the first systematic review to comprehensively examine human mouth opener muscle performance. The findings suggest that sex and age influence maximal mouth opening strength in the healthy population, although there are large gaps and limitations in the reliability and accuracy of these findings. A very small volume of evidence was found for patients with TMDs. In contrast to the information available on mouth closer muscles, the evidence regarding the muscular endurance of the mouth opener muscles for both healthy and patient populations is extremely limited.

| Healthy adults
As expected, the largest volumes of evidence of mouth opener muscular performance applied to healthy adults who provided the reference data of normal muscular function to which other groups of patients could be compared. However, these data are extremely

| TMDs
Only three studies that assessed the mouth-opening muscular

| Limitations
The limitations of this study were primarily due to the relatively small volume of available literature. Only fourteen studies met the eligibility criteria of this review, and no homogenic group was found

| Future direction
This review highlights the need for future research into several important areas of interest. The most basic scientific need is to establish a valid and reliable measurement device and testing procedure for the maximal strength and endurance capacity of mouth opening muscles (both mandibular depressors and protrusion muscles 1). This will require a well-designed intra- and inter-tester reliability study on healthy controls followed by patients with TMDs in order to validate such a test. A proper real-time observation study on the mouth-opening muscular performance will be required, probably using real-time ultrasonography and/or electromyography devices.
After validating the muscular performance tests, baseline data of healthy controls of different ages will be needed, ideally by performing an international multicentre study. The normal agonist-antagonist muscular performance ratio between the mouth opener and closer musculature of males and females of different age groups would be another interesting factor for observation at this stage of research, similar to the existing data on different musculoskeletal regions, such as the knee and shoulder. 13,36,37 The application of the physiological muscular performance data as a reference for comparison with different relevant patient populations in an international multicentre study (TMDs, dysphagia, obstructive sleep apnoea and bruxism) will comprise the next step for investigation. That step may help to identify clinical subgroups that would benefit from muscular rehabilitation programs tailored specifically to improve the mouth-opening muscular performance. The clinical implications of the results are to carefully screen for clinical signs and symptoms of the mouth openers in patients with TMDs and to address them during the multidisciplinary rehabilitation process.

| CONCLUSION
This is the first systematic review to comprehensively examine mouth-opening muscular performance in healthy and TMDs populations. The findings suggest significant influence of the parameters of sex and age, similar to the findings for other muscle groups. This review also exposes several major gaps in the current literature regarding mouth-opening muscular performance. One is the lack of a valid and reliable test for this unique muscle group, another is the need for an estimation of normal physiological muscular performance and the third is the proper evaluation of muscular performance in patients with common relevant disorders, such as TMD, dysphagia, obstructive sleep apnoea and bruxism.

CONFLICT OF INTEREST
None.

AUTHOR'S CONTRIBUTIONS
All authors were involved in study inception and design, and critical manuscript revision. TG collected, analysed and interpreted data and wrote the manuscript. AEP screened papers, extracted data, assessed risk of bias and critically reviewed the manuscript.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/joor.13303.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

2.4 Notes. Summarise the authors' conclusions. Add any comments on your own assessment of the study, and the extent to which it answers your question and mention any areas of uncertainty raised above.

(a) Unless a clear and well defined question is specified in the report of the review, it will be difficult to assess how well it has met its objectives or how relevant it is to the question you are trying to answer on the basis of the conclusions.
(b) Study participants may be selected from the target population (all individuals to which the results of the study could be applied), the source population (a defined subset of the target population from which participants are selected), or from a pool of eligible subjects (a clearly defined and counted group selected from the source population). If the study does not include clear definitions of the source population it should be rejected.
(c) All selection and exclusion criteria should be applied equally to cases and controls. Failure to do so may introduce a significant degree of bias into the results of the study.
(d) Differences between the eligible population and the participants are important, as they may influence the validity of the study. A participation rate can be calculated by dividing the number of study participants by the number of eligible subjects. It is more useful if calculated separately for cases and controls. If the participation rate is low, or there is a large difference between the two groups, the study results may well be invalid due to differences between participants and non-participants. In these circumstances, the study should be downgraded, and rejected if the differences are very large.
(e) Even if participation rates are comparable and acceptable, it is still possible that the participants selected to act as cases or controls may differ from other members of the source population in some significant way. A well conducted case-control study will look at samples of the non-participants among the source population to ensure that the participants are a truly representative sample.
(f) The method of selection of cases is of critical importance to the validity of the study. Investigators have to be certain that cases are truly cases, but must balance this with the need to ensure that the cases admitted into the study are representative of the eligible population. The issues involved in case selection are complex, and should ideally be evaluated by someone with a good understanding of the design of case-control studies. If the study does not comment on how cases were selected, it is probably safest to reject it as a source of evidence.
(g) Just as it is important to be sure that cases are true cases, it is important to be sure that controls do not have the outcome under investigation. Control subjects should be chosen so that information on exposure status can be obtained or assessed in a similar way to that used for the selection of cases. If the methods of control selection are not described, the study should be rejected. If different methods of selection are used for cases and controls the study should be evaluated by someone with a good understanding of the design of case-control studies.
(h) If there is a possibility that case ascertainment can be influenced by knowledge of exposure status, assessment of any association is likely to be biased. A well conducted study should take this into account in the design of the study.
(i) The primary outcome measures used should be clearly stated in the study. If the outcome measures are not stated, or the study bases its main conclusions on secondary outcomes, the study should be rejected. Where outcome measures require any degree of subjectivity, some evidence should be provided that the measures used are reliable and have been validated prior to their use in the study.
(j) Confounding is the distortion of a link between exposure and outcome by another factor that is associated with both exposure and outcome. The possible presence of confounding factors is one of the principal reasons why observational studies are not more highly rated as a source of evidence. The study should indicate which potential confounders have been considered, and how they have been allowed for in the analysis. Clinical judgement should be applied to consider whether all likely confounders have been considered. If the measures used to address confounding are considered inadequate, the study should be downgraded or rejected. A study that does not address the possibility of confounding should be rejected.
(k) Confidence limits are the preferred method for indicating the precision of statistical results, and can be used to differentiate between an inconclusive study and a study that shows no effect. Studies that report a single value with no assessment of precision should be treated with extreme caution.
(l) Rate the overall methodological quality of the study, using the following as a guide:
High quality (++): Majority of criteria met. Little or no risk of bias. Results unlikely to be changed by further research.
Acceptable (+): Most criteria met. Some flaws in the study with an associated risk of bias. Conclusions may change in the light of further studies.
Low quality (0): Either most criteria not met, or significant flaws relating to key aspects of study design. Conclusions likely to change in the light of further studies.
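The participation rate described in the checklist notes (participants divided by eligible subjects, calculated separately for cases and controls) can be illustrated with a short Python sketch. The function name and all counts are hypothetical, and the checklist gives no numeric cut-off for a "low" rate or a "large" difference, so none is applied here.

```python
def participation_rate(participants: int, eligible: int) -> float:
    """Return the share of eligible subjects who took part in the study."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    if not 0 <= participants <= eligible:
        raise ValueError("participants must be between 0 and eligible")
    return participants / eligible


# Hypothetical counts (NOT data from any study in the review).
cases_rate = participation_rate(45, 60)
controls_rate = participation_rate(80, 200)

# Difference between groups, for the reviewer to judge qualitatively.
rate_gap = abs(cases_rate - controls_rate)
```

Reporting the two rates side by side makes it easier to see when cases and controls were recruited with very different success, which the notes flag as a reason to downgrade a study.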

APPENDIX 2 (Continued)
APPENDIX 3

Criteria | Yes | No | Other (CD, NR, NA)

GUIDANCE FOR ASSESSING THE QUALITY OF OBSERVATIONAL COHORT AND CROSS-SECTIONAL STUDIES
The guidance document below is organized by question number from the tool for quality assessment of observational cohort and cross-sectional studies.

QUESTION 1. RESEARCH QUESTION
Did the authors describe their goal in conducting this research? Is it easy to understand what they were looking to find? This issue is important for any scientific paper of any type. Higher quality scientific research explicitly defines a research question.

QUESTIONS 2 AND 3. STUDY POPULATION
Did the authors describe the group of people from which the study participants were selected or recruited, using demographics, location, and time period? If you were to conduct this study again, would you know who to recruit, from where, and from what time period?
Is the cohort population free of the outcomes of interest at the time they were recruited?
An example would be men over 40 years old with type 2 diabetes who began seeking medical care at Phoenix Good Samaritan Hospital between January 1, 1990 and December 31, 1994. In this example, the population is clearly described as: (1) […]

In cohort studies, it is crucial that the population at baseline is free of the outcome of interest. For example, the nurses' population above would be an appropriate group in which to study incident coronary disease. This information is usually found either in descriptions of population recruitment, definitions of variables, or inclusion/exclusion criteria.
You may need to look at prior papers on methods in order to make the assessment for this question. Those papers are usually in the reference list.
If fewer than 50% of eligible persons participated in the study, then there is concern that the study population does not adequately represent the target population. This increases the risk of bias.

QUESTION 4. GROUPS RECRUITED FROM THE SAME POPULATION AND UNIFORM ELIGIBILITY CRITERIA
Were the inclusion and exclusion criteria developed prior to recruitment or selection of the study population? Were the same underlying criteria used for all of the subjects involved? This issue is related to the description of the study population, above, and you may find the information for both of these questions in the same section of the paper.
Most cohort studies begin with the selection of the cohort; participants in this cohort are then measured or evaluated to determine their exposure status. However, some cohort studies may recruit or select exposed participants in a different time or place than unex- […]

[…] cases, the answer would be "yes." However, observational cohort studies often do not report anything about power or sample sizes because the analyses are exploratory in nature. In this case, the answer would be "no." This is not a "fatal flaw." It just may indicate that attention was not paid to whether the study was sufficiently sized to answer a prespecified question, i.e., it may have been an exploratory, hypothesis-generating study.

QUESTION 6. EXPOSURE ASSESSED PRIOR TO OUTCOME MEASUREMENT
This question is important because, in order to determine whether an exposure causes an outcome, the exposure must come before the outcome. With either of these types of cohort studies, the cohort is followed forward in time (i.e., prospectively) to assess the outcomes that occurred in the exposed members compared to nonexposed members of the cohort. Therefore, you begin the study in the present by looking at groups that were exposed (or not) to some biological or behavioral factor, intervention, etc., and then you follow them forward in time to examine outcomes. If a cohort study is conducted properly, the answer to this question should be "yes," since the exposure status of members of the cohort was determined at the beginning of the study before the outcomes occurred.
For retrospective cohort studies, the same principle applies.
The difference is that, rather than identifying a cohort in the present and following them forward in time, the investigators go back in time (i.e., retrospectively) and select a cohort based on their exposure status in the past and then follow them forward to assess the outcomes that occurred in the exposed and nonexposed cohort members. Because in retrospective cohort studies the exposure and outcomes may have already occurred (it depends on how long they follow the cohort), it is important to make sure that the exposure preceded the outcome.
Sometimes cross-sectional studies are conducted (or cross-sectional analyses of cohort-study data), where the exposures and outcomes are measured during the same timeframe. As a result, cross-sectional analyses provide weaker evidence than regular cohort studies regarding a potential causal relationship between exposures and outcomes. For cross-sectional analyses, the answer to Question 6 should be "no."

[…] (yes/no), then this question should be given an "NA," and it should not count negatively towards the quality rating.

[…] which has been tested and calibrated) and a standardized protocol (e.g., patient is seated for 5 minutes with feet flat on the floor, BP is taken twice in each arm, and all four measurements are averaged). In each of these cases, the former would get a "no" and the latter a "yes."

QUESTION 9. EXPOSURE MEASURES AND ASSESSMENT
Here is a final example that illustrates the point about why it is important to assess exposures consistently across all groups: If people with higher BP (exposed cohort) are seen by their providers more frequently than those without elevated BP (nonexposed group), it also increases the chances of detecting and documenting changes in health outcomes, including CVD-related events. Therefore, it may lead to the conclusion that higher BP leads to more CVD events.
This may be true, but it could also be due to the fact that the subjects with higher BP were seen more often; thus, more CVD-related events were detected and documented simply because they had more encounters with the health care system. Thus, it could bias the results and lead to an erroneous conclusion.

QUESTION 11. OUTCOME MEASURES
Were the outcomes defined in detail? Were the tools or methods for measuring outcomes accurate and reliable (for example, have they been validated or are they objective)? This issue is important because it influences confidence in the validity of study results. Also important is whether the outcomes were assessed in the same manner within groups and between groups.
An example of an outcome measure that is objective, accurate, and reliable is death, the outcome measured with more accuracy than any other. But even with a measure as objective as death, there can be differences in the accuracy and reliability of how death was assessed by the investigators. Did they base it on an autopsy report, death certificate, death registry, or report from a family member?
Another example is a study of whether dietary fat intake is related to blood cholesterol level (cholesterol level being the outcome), and the cholesterol level is measured from fasting blood samples that are all sent to the same laboratory. These examples would get a "yes." An example of a "no" would be self-report by subjects that they had a heart attack, or self-report of how much they weigh (if body weight is the outcome of interest).
Similar to the example in Question 9, results may be biased if one group (e.g., people with high BP) is seen more frequently than another group (people with normal BP) because more frequent encounters with the health care system increase the chances of outcomes being detected and documented.

QUESTION 12. BLINDING OF OUTCOME ASSESSORS
Blinding means that outcome assessors did not know whether the participant was exposed or unexposed. It is also sometimes called "masking." The objective is to look for evidence in the article that the person(s) assessing the outcome(s) for the study (for example, examining medical records to determine the outcomes that occurred in the exposed and comparison groups) is masked to the exposure status of the participant. Sometimes the person measuring the exposure is the same person conducting the outcome assessment. In this case, the outcome assessor would most likely not be blinded to exposure status because they also took measurements of exposures. If so, make a note of that in the comments section.
As you assess this criterion, think about whether it is likely that the person(s) doing the outcome assessment would know (or be able to figure out) the exposure status of the study participants. If the answer is no, then blinding is adequate. An example of adequate blinding of the outcome assessors is to create a separate committee, whose members were not involved in the care of the patient and had no information about the study participants' exposure status.
The committee would then be provided with copies of participants' medical records, which had been stripped of any potential exposure information or personally identifiable information. The committee would then review the records for prespecified outcomes according to the study protocol. If blinding was not possible, which is sometimes the case, mark "NA" and explain the potential for bias.

QUESTION 13. FOLLOWUP RATE
Higher overall followup rates are always better than lower followup rates, even though higher rates are expected in shorter studies, whereas lower overall followup rates are often seen in studies of longer duration. Usually, an acceptable overall followup rate is considered 80 percent or more of participants whose exposures were measured at baseline. However, this is just a general guideline. For example, a 6-month cohort study examining the relationship between dietary sodium intake and BP level may have over 90 percent followup, but a 20-year cohort study examining effects of sodium intake on stroke may have only a 65 percent followup rate.
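The guideline above can be expressed as a simple calculation. The following is a minimal sketch, assuming the followup rate is the number of participants with outcome data at followup divided by the number whose exposures were measured at baseline; the function names and the participant counts are hypothetical, chosen only to match the percentages in the example.

```python
def followup_rate(n_baseline: int, n_followed: int) -> float:
    """Percentage of baseline participants who were followed up."""
    if n_baseline <= 0:
        raise ValueError("n_baseline must be positive")
    if not 0 <= n_followed <= n_baseline:
        raise ValueError("n_followed must be between 0 and n_baseline")
    return 100.0 * n_followed / n_baseline


def meets_guideline(rate_percent: float, threshold_percent: float = 80.0) -> bool:
    """Check a rate against the general 80 percent guideline (not a strict rule)."""
    return rate_percent >= threshold_percent


# Hypothetical counts for a short and a long cohort study.
short_study = followup_rate(n_baseline=500, n_followed=460)
long_study = followup_rate(n_baseline=500, n_followed=325)

print(short_study, meets_guideline(short_study))
print(long_study, meets_guideline(long_study))
```

Because the 80 percent figure is only a general guideline, a result below the threshold is a prompt to weigh study duration and the likely direction of attrition bias, not an automatic "no."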

QUESTION 14. STATISTICAL ANALYSES
Were key potential confounding variables measured and adjusted for, such as by statistical adjustment for baseline differences?
Logistic regression or other regression methods are often used to account for the influence of variables not of interest.
This is a key issue in cohort studies, because statistical analyses need to control for potential confounders, in contrast to an RCT, where the randomization process controls for potential confounders. All key factors that may be associated with both the exposure of interest and the outcome, and that are not of interest to the research question, should be controlled for in the analyses.
For example, in a study of the relationship between cardiorespiratory fitness and CVD events (heart attacks and strokes), the study should control for age, BP, blood cholesterol, and body weight, because all of these factors are associated both with low fitness and with CVD events. Well-done cohort studies control for multiple potential confounders.
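To illustrate why adjustment matters, the sketch below compares a crude risk ratio with one adjusted for a single confounder. The text above mentions regression; this sketch instead uses Mantel-Haenszel stratification, a simpler technique that needs no external library. All counts are hypothetical and were constructed so that age alone explains the crude association between low fitness and CVD events.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts for one level of the confounder (hypothetical data)."""
    exposed_events: int
    exposed_total: int
    unexposed_events: int
    unexposed_total: int


def crude_risk_ratio(strata: list[Stratum]) -> float:
    """Risk ratio after pooling all strata, ignoring the confounder."""
    a = sum(s.exposed_events for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_events for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a * n0) / (c * n1)


def mantel_haenszel_risk_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel risk ratio, adjusted for the stratifying variable."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        total = s.exposed_total + s.unexposed_total
        numerator += s.exposed_events * s.unexposed_total / total
        denominator += s.unexposed_events * s.exposed_total / total
    return numerator / denominator


# Exposure: low fitness. Outcome: CVD event. Confounder: age group.
older = Stratum(exposed_events=160, exposed_total=800,
                unexposed_events=40, unexposed_total=200)
younger = Stratum(exposed_events=10, exposed_total=200,
                  unexposed_events=40, unexposed_total=800)

crude = crude_risk_ratio([older, younger])
adjusted = mantel_haenszel_risk_ratio([older, younger])
print(f"crude RR = {crude:.3f}, age-adjusted RR = {adjusted:.3f}")
```

In this constructed data set the crude risk ratio is 2.125, while the age-adjusted risk ratio is 1.0, because within each age group the event risk is the same for both fitness levels. A cohort study that did not adjust for age would report an association produced entirely by confounding.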

SOME GENERAL GUIDANCE FOR DETERMINING THE OVERALL QUALITY RATING OF OBSERVATIONAL COHORT AND CROSS-SECTIONAL STUDIES
The questions on the form are designed to help you focus on the key concepts for evaluating the internal validity of a study. They are not intended to create a list that you simply tally up to arrive at a summary judgment of quality.
Internal validity for cohort studies is the extent to which the results reported in the study can truly be attributed to the exposure being evaluated and not to flaws in the design or conduct of the study; in other words, the ability of the study to draw associative conclusions about the effects of the exposures being studied on outcomes. Any such flaws can increase the risk of bias.
Critical appraisal involves considering the potential for selection bias, information bias, measurement bias, or confounding (the mixture of exposures that one cannot tease out from each other).
Examples of confounding include co-interventions, differences at baseline in patient characteristics, and other issues throughout the questions above. High risk of bias translates to a rating of poor quality. Low risk of bias translates to a rating of good quality. (Thus, the greater the risk of bias, the lower the quality rating of the study.) In addition, the more attention in the study design to issues that can help determine whether there is a causal relationship between the exposure and outcome, the higher the quality of the study. These include exposures occurring prior to outcomes, evaluation of a dose-response gradient, accuracy of measurement of both exposure and outcome, sufficient timeframe to see an effect, and appropriate control for confounding, all concepts reflected in the tool.
Generally, when you evaluate a study, you will not see a "fatal flaw," but you will find some risk of bias. By focusing on the concepts underlying the questions in the quality assessment tool, you should ask yourself about the potential for bias in the study you are critically appraising. For any box where you check "no" you should ask, "What is the potential risk of bias resulting from this flaw in study design or execution?" That is, does this factor cause you to doubt the results that are reported in the study or doubt the ability of the study to accurately assess an association between exposure and outcome?
The best approach is to think about the questions in the tool and how each one tells you something about the potential for bias in a study. The more you familiarize yourself with the key concepts, the more comfortable you will be with critical appraisal. Examples of studies rated good, fair, and poor are useful, but each study must be assessed on its own based on the details that are reported and consideration of the concepts for minimizing bias.