Volume 57, Issue 1

A Covariance Estimator for GEE with Improved Small‐Sample Properties

Lloyd A. Mancl

Corresponding Author

Department of Dental Public Health Sciences, University of Washington, Box 357475, Seattle, Washington 98195, U.S.A.

*email: lman@biostat.washington.edu

Timothy A. DeRouen

Department of Dental Public Health Sciences, University of Washington, Box 357475, Seattle, Washington 98195, U.S.A.

Department of Biostatistics, University of Washington, Box 357475, Seattle, Washington 98195, U.S.A.

First published: 24 May 2004
Citations: 271

Abstract

Summary. In this paper, we propose an alternative covariance estimator to the robust covariance estimator of generalized estimating equations (GEE). Hypothesis tests using the robust covariance estimator can have inflated size when the number of independent clusters is small. Resampling methods, such as the jackknife and bootstrap, have been suggested for covariance estimation when the number of clusters is small. A drawback of the resampling methods when the response is binary is that the methods can break down when the number of subjects is small due to zero or near‐zero cell counts caused by resampling. We propose a bias‐corrected covariance estimator that avoids this problem. In a small simulation study, we compare the bias‐corrected covariance estimator to the robust and jackknife covariance estimators for binary responses for situations involving 10–40 subjects with equal and unequal cluster sizes of 16–64 observations. The bias‐corrected covariance estimator gave tests with sizes close to the nominal level even when the number of subjects was 10 and cluster sizes were unequal, whereas the robust and jackknife covariance estimators gave tests with sizes that could be 2–3 times the nominal level. The methods are illustrated using data from a randomized clinical trial on treatment for bone loss in subjects with periodontal disease.
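
To make the idea of a bias correction concrete, here is a minimal sketch; it is not the authors' implementation. It assumes the simplest possible GEE: an intercept-only model with identity link and an independence working correlation, so the estimate is just the pooled mean. It also assumes the bias-corrected estimator has the form in which each cluster's residual vector is premultiplied by (I - H_ii)^{-1}, where H_ii is the cluster's leverage matrix; the abstract itself does not state this formula. Under these assumptions the correction reduces to inflating each cluster's residual sum by N / (N - n_i), where N is the total number of observations and n_i is the size of cluster i. The function name and the toy data are hypothetical.

```python
def mean_gee_variances(clusters):
    """Return (mean, robust variance, bias-corrected variance).

    Assumed setting (illustrative only): intercept-only GEE, identity link,
    independence working correlation. Needs at least two clusters.
    """
    n_total = sum(len(c) for c in clusters)
    mean = sum(sum(c) for c in clusters) / n_total

    robust = 0.0
    corrected = 0.0
    for c in clusters:
        # Sum of residuals within this cluster.
        resid_sum = sum(y - mean for y in c)
        # Robust (sandwich) contribution: squared residual sum over N^2.
        robust += resid_sum ** 2 / n_total ** 2
        # Assumed bias correction: the residual sum is inflated by
        # N / (N - n_i), so the contribution is divided by (N - n_i)^2.
        corrected += resid_sum ** 2 / (n_total - len(c)) ** 2
    return mean, robust, corrected


# Hypothetical toy data: three subjects, four binary responses each.
toy_clusters = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
toy_mean, toy_robust, toy_corrected = mean_gee_variances(toy_clusters)
print(toy_mean, toy_robust, toy_corrected)
```

With K clusters of equal size, the corrected variance in this sketch is exactly (K / (K - 1))^2 times the robust variance, so the correction matters most when the number of clusters is small and fades as K grows.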

