Volume 59, Issue 1

Marginal Analyses of Clustered Data When Cluster Size Is Informative

John M. Williamson

Division of HIV/AIDS Prevention, National Center for HIV, STD and TB Prevention, Centers for Disease Control and Prevention, MS E‐37, 1600 Clifton Road, NE, Atlanta, Georgia 30333, U.S.A.

email: jow5@cdc.gov

Somnath Datta

Department of Statistics, University of Georgia, Athens, Georgia 30602, U.S.A.

Glen A. Satten

Division of Laboratory Science, National Center for Environmental Health, Centers for Disease Control and Prevention, MS F‐24, 1600 Clifton Road, NE, Atlanta, Georgia 30333, U.S.A.

First published: 24 March 2003
Citations: 117

Abstract

Summary. We propose a new approach to fitting marginal models to clustered data when cluster size is informative. This approach uses a generalized estimating equation (GEE) that is weighted inversely with the cluster size. We show that our approach is asymptotically equivalent to within‐cluster resampling (WCR; Hoffman, Sen, and Weinberg, 2001, Biometrika 73, 13–22), a computationally intensive approach in which replicate data sets containing a randomly selected observation from each cluster are analyzed, and the resulting estimates averaged. Using simulated data and an example involving dental health, we show the superior performance of our approach compared to unweighted GEE, the equivalence of our approach with WCR for large sample sizes, and the superior performance of our approach compared with WCR when sample sizes are small.
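
As a rough illustration of the three estimators the summary contrasts, the sketch below uses the simplest possible marginal model: an intercept-only mean with an independence working correlation. In that special case the inversely weighted estimating equation reduces to the mean of the cluster means. The toy data and the function names (unweighted_mean, cluster_weighted_mean, wcr_mean) are assumptions made for illustration and do not come from the article.

```python
import random
from statistics import mean

# Hypothetical toy data: each inner list is one cluster (e.g. one patient's
# teeth), 1 = diseased. Smaller clusters carry more disease here, so cluster
# size is informative about the outcome.
clusters = [[1, 1], [1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]

def unweighted_mean(clusters):
    """Pool every observation; large clusters dominate the estimate."""
    return mean(y for cluster in clusters for y in cluster)

def cluster_weighted_mean(clusters):
    """Solve sum_i (1/n_i) * sum_j (y_ij - mu) = 0 for mu.

    With weights 1/n_i the solution is the mean of the cluster means,
    so every cluster contributes equally regardless of its size.
    """
    return mean(mean(cluster) for cluster in clusters)

def wcr_mean(clusters, n_resamples=2000, seed=0):
    """Within-cluster resampling: draw one observation per cluster,
    estimate mu from that reduced data set, and average over resamples."""
    rng = random.Random(seed)
    estimates = [
        mean(rng.choice(cluster) for cluster in clusters)
        for _ in range(n_resamples)
    ]
    return mean(estimates)

pooled = unweighted_mean(clusters)          # 3/14, about 0.214
weighted = cluster_weighted_mean(clusters)  # 5/12, about 0.417
resampled = wcr_mean(clusters)              # Monte Carlo value near `weighted`
```

In this toy setting the pooled estimate is pulled toward the large, disease-free cluster, while the cluster-weighted and resampled estimates target the same per-cluster quantity; the weighted version gets there in a single pass, without the Monte Carlo averaging that WCR needs.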

