Statistics
Biometrics Virtual Issue
FREE: Classic Journal Content
To celebrate the International Year of Statistics we have created a special Virtual Issue of 15 classic papers from Biometrics. You can read all these articles FREE during 2013.

Detecting Disease Outbreaks Using Local Spatiotemporal Methods
From the abstract: A real-time surveillance method is developed with emphasis on rapid and accurate detection of emerging outbreaks. We develop a model with relatively weak assumptions regarding the latent processes generating the observed data, ensuring a robust prediction of the spatiotemporal incidence surface. Estimation occurs via a local linear fitting combined with day-of-week effects, where spatial smoothing is handled by a novel distance metric that adjusts for population density.

Cause-Specific Cumulative Incidence Estimation and the Fine and Gray Model Under Both Left Truncation and Right Censoring
From the abstract: The standard estimator for the cause-specific cumulative incidence function in a competing risks setting with left truncated and/or right censored data can be written in two alternative forms. One is a weighted empirical cumulative distribution function and the other a product-limit estimator. This equivalence suggests an alternative view of the analysis of time-to-event data with left truncation and right censoring: individuals who are still at risk or experienced an earlier competing event receive weights from the censoring and truncation mechanisms. As a consequence, inference on the cumulative scale can be performed using weighted versions of standard procedures.

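The equivalence described in this abstract can be checked numerically in its simplest special case. The Python sketch below assumes right censoring only (no left truncation), distinct observation times, and status codes 0 = censored, k ≥ 1 = failure from cause k; the function names are hypothetical. It computes the cumulative incidence of one cause twice: as a product-limit (Aalen-Johansen-type) estimate, and as an empirical CDF in which each observed failure is weighted by 1/Ĝ(T_i−), the inverse of the Kaplan-Meier estimate of the censoring survivor function just before T_i.

```python
from typing import List, Tuple

# (time, status): status 0 = censored, status k >= 1 = failure from cause k.
Record = Tuple[float, int]


def cif_product_limit(data: List[Record], cause: int, t: float) -> float:
    """Product-limit (Aalen-Johansen-type) cumulative incidence of `cause` at time t.

    Assumes right censoring only and distinct observation times.
    """
    records = sorted(data)
    n = len(records)
    surv = 1.0  # overall Kaplan-Meier survival S(u-) just before the current time
    cif = 0.0
    for j, (time, status) in enumerate(records):
        if time > t:
            break
        at_risk = n - j
        if status == cause:
            cif += surv / at_risk          # S(u-) times the cause-specific hazard jump
        if status != 0:
            surv *= 1.0 - 1.0 / at_risk    # a failure from any cause lowers S
    return cif


def cif_weighted_ecdf(data: List[Record], cause: int, t: float) -> float:
    """Empirical CDF of `cause` failures, each weighted by 1 / G(T_i-).

    G is the Kaplan-Meier estimate of the censoring survivor function.
    """
    records = sorted(data)
    n = len(records)
    cens_surv = 1.0  # G(u-) just before the current time
    total = 0.0
    for j, (time, status) in enumerate(records):
        if time > t:
            break
        at_risk = n - j
        if status == cause:
            total += 1.0 / cens_surv       # inverse-probability-of-censoring weight
        if status == 0:
            cens_surv *= 1.0 - 1.0 / at_risk
    return total / n
```

With distinct times, the number at risk at the j-th ordered time equals n·Ŝ(t_j−)·Ĝ(t_j−), which is why the two sums agree term by term. Handling left truncation would add truncation weights that this sketch omits.
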
Identifiability of Models for Multiple Diagnostic Testing in the Absence of a Gold Standard
From the abstract: We discuss the issue of identifiability of models for multiple dichotomous diagnostic tests in the absence of a gold standard (GS) test. Data arise as multinomial or product-multinomial counts depending upon the number of populations sampled. Models are generally posited in terms of population prevalences, test sensitivities and specificities, and test dependence terms. It is commonly believed that if the degrees of freedom in the data meet or exceed the number of parameters in a fitted model then the model is identifiable. Goodman (1974, Biometrika 61, 215–231) established that this was not the case a long time ago.

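A small numerical illustration of why counting degrees of freedom is not enough. The sketch assumes three conditionally independent tests applied in a single population (an assumption made here, not taken from the article). The data then have 2^3 − 1 = 7 degrees of freedom and the model has 7 parameters (one prevalence, three sensitivities, three specificities), yet two parameter sets related by the well-known label switch (π, Se, Sp) → (1 − π, 1 − Sp, 1 − Se) produce identical cell probabilities. This is only one simple source of non-identifiability, commonly removed by requiring Se + Sp > 1; it is not Goodman's argument, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def pattern_probs(prev: float, sens: Sequence[float],
                  spec: Sequence[float]) -> Dict[Tuple[int, ...], float]:
    """Probabilities of all 2^k test-result patterns for k conditionally independent tests.

    prev : disease prevalence; sens/spec : per-test sensitivity and specificity.
    """
    probs = {}
    for pattern in product((0, 1), repeat=len(sens)):
        p_diseased, p_healthy = prev, 1.0 - prev
        for y, se, sp in zip(pattern, sens, spec):
            p_diseased *= se if y else 1.0 - se   # P(result | diseased)
            p_healthy *= 1.0 - sp if y else sp    # P(result | healthy)
        probs[pattern] = p_diseased + p_healthy
    return probs


def switch_labels(prev, sens, spec):
    """Map (pi, Se, Sp) to (1 - pi, 1 - Sp, 1 - Se)."""
    return 1.0 - prev, [1.0 - sp for sp in spec], [1.0 - se for se in sens]


if __name__ == "__main__":
    k = 3
    print("degrees of freedom:", 2 ** k - 1, "parameters:", 1 + 2 * k)
    params = (0.3, [0.9, 0.8, 0.85], [0.95, 0.9, 0.7])
    a = pattern_probs(*params)
    b = pattern_probs(*switch_labels(*params))
    print("largest cell difference:", max(abs(a[c] - b[c]) for c in a))
```
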
Incorporating Predictor Network in Penalized Regression with Application to Microarray Data
From the abstract: We consider penalized linear regression, especially for “large p, small n” problems, for which the relationships among predictors are described a priori by a network. A class of motivating examples includes modeling a phenotype through gene expression profiles while accounting for coordinated functioning of genes in the form of biological pathways or networks. To incorporate the prior knowledge of the similar effect sizes of neighboring predictors in a network, we propose a grouped penalty based on the Lγ-norm that smoothes the regression coefficients of the predictors over the network.

Modeling Data with Excess Zeros and Measurement Error: Application to Evaluating Relationships between Episodically Consumed Foods and Health Outcomes
From the abstract: Dietary assessment of episodically consumed foods gives rise to nonnegative data that have excess zeros and measurement error. Tooze et al. (2006, Journal of the American Dietetic Association 106, 1575–1587) describe a general statistical approach (National Cancer Institute method) for modeling such food intakes reported on two or more 24-hour recalls (24HRs) and demonstrate its use to estimate the distribution of the food's usual intake in the general population. In this article, we propose an extension of this method to predict individual usual intake of such foods and to evaluate the relationships of usual intakes with health outcomes.

Chop-Lump Tests for Vaccine Trials
From the abstract: This article proposes new tests to compare the vaccine and placebo groups in randomized vaccine trials when a small fraction of volunteers become infected. A simple approach that is consistent with the intent-to-treat principle is to assign a score, say W, equal to 0 for the uninfecteds and some postinfection outcome X > 0 for the infecteds. One can then test the equality of this skewed distribution of W between the two groups. This burden of illness (BOI) test was introduced by Chang, Guess, and Heyse (1994, Statistics in Medicine 13, 1807–1814).

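The burden-of-illness score itself is straightforward to compute. The Python sketch below builds W from an infection indicator and a post-infection outcome X, then compares the two arms with a generic two-sided permutation test on the difference in mean W. The permutation test is a stand-in chosen for illustration; it is neither the original BOI test of Chang, Guess, and Heyse nor the chop-lump tests the article proposes, and the function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence


def boi_scores(infected: Sequence[bool], outcome: Sequence[Optional[float]]) -> List[float]:
    """Burden-of-illness score: W = 0 for uninfected volunteers, W = X > 0 for infected ones."""
    scores = []
    for is_infected, x in zip(infected, outcome):
        if not is_infected:
            scores.append(0.0)
        elif x is None or x <= 0:
            raise ValueError("an infected volunteer needs a post-infection outcome X > 0")
        else:
            scores.append(float(x))
    return scores


def permutation_pvalue(w_vaccine: Sequence[float], w_placebo: Sequence[float],
                       n_perm: int = 5000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in mean W between arms."""
    rng = random.Random(seed)
    observed = abs(mean(w_vaccine) - mean(w_placebo))
    pooled = list(w_vaccine) + list(w_placebo)
    n_vaccine = len(w_vaccine)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of scores to arms
        diff = abs(mean(pooled[:n_vaccine]) - mean(pooled[n_vaccine:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because most volunteers score exactly 0, the distribution of W is a point mass at zero plus a skewed positive part, which is the feature the abstract highlights.
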
Exploiting gene-environment independence for analysis of case-control studies: An empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency
From the abstract: Standard prospective logistic regression analysis of case–control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern “retrospective” methods, including the “case-only” approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated.

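The bias-efficiency trade-off can be illustrated with a generic shrinkage rule. The sketch below assumes one interaction parameter with an unconstrained (prospective) estimate, its variance V, and a constrained estimate that relies on gene-environment independence. The weight on the unconstrained estimate is ψ²/(ψ² + V), where ψ is the difference between the two estimates. This weighting is an assumption made for illustration; the article should be consulted for its exact estimator and variance formula, and the function name is hypothetical.

```python
def eb_shrinkage(beta_unconstrained: float, var_unconstrained: float,
                 beta_constrained: float) -> float:
    """Weighted combination of two estimates of the same interaction parameter.

    beta_unconstrained : estimate not relying on gene-environment independence
    var_unconstrained  : its estimated variance
    beta_constrained   : more efficient estimate assuming independence
    The weighting rule below is an illustrative assumption.
    """
    if var_unconstrained <= 0:
        raise ValueError("variance must be positive")
    psi_sq = (beta_unconstrained - beta_constrained) ** 2  # squared disagreement
    weight = psi_sq / (psi_sq + var_unconstrained)          # in [0, 1)
    return weight * beta_unconstrained + (1.0 - weight) * beta_constrained
```

When the two estimates agree, the rule returns the constrained (efficient) estimate; as they diverge relative to the sampling variance, which signals a possible violation of independence, it moves toward the unconstrained one.
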
Haplotype-based Regression Analysis and Inference of Case-Control Studies with Unphased Genotypes and Measurement Errors in Environmental Exposures
From the abstract: It is widely believed that risks of many complex diseases are determined by genetic susceptibilities, environmental exposures, and their interaction. Chatterjee and Carroll (2005, Biometrika 92, 399–418) developed an efficient retrospective maximum-likelihood method for analysis of case–control studies that exploits an assumption of gene–environment independence and leaves the distribution of the environmental covariates to be completely nonparametric. Spinka, Carroll, and Chatterjee (2005, Genetic Epidemiology 29, 108–127) extended this approach to studies where certain types of genetic information, such as haplotype phases, may be missing on some subjects.

Semiparametric regression of multi-dimensional genetic pathway data: least square kernel machines and linear mixed models
From the abstract: We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect.

An empirical Bayes method for estimating epistatic effects of quantitative trait loci
From the abstract: The genetic variance of a quantitative trait is often controlled by the segregation of multiple interacting loci. Linear model regression analysis is usually applied to estimating and testing effects of these quantitative trait loci (QTL). Including all the main effects and the effects of interaction (epistatic effects), the dimension of the linear model can be extremely high. Variable selection via stepwise regression or stochastic search variable selection (SSVS) is the common procedure for epistatic effect QTL analysis. These methods are computationally intensive, yet they may not be optimal. The LASSO (least absolute shrinkage and selection operator) method is computationally more efficient than the above methods.

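The growth in dimension is easy to quantify. Assuming one numeric genotype code per locus (for example −1/0/1, an assumption here) and only pairwise additive-by-additive interactions, p loci give p main effects plus p(p − 1)/2 epistatic effects, so 100 loci already yield 100 + 4950 = 5050 columns. The hypothetical helpers below build one design-matrix row and count its columns; dominance terms and higher-order interactions are ignored.

```python
from itertools import combinations
from typing import List, Sequence


def epistatic_design_row(genotypes: Sequence[float]) -> List[float]:
    """Main effects followed by all pairwise products (epistatic terms) for one individual."""
    main = list(genotypes)
    pairwise = [genotypes[i] * genotypes[j]
                for i, j in combinations(range(len(genotypes)), 2)]
    return main + pairwise


def n_effects(n_loci: int) -> int:
    """Main effects plus pairwise epistatic effects: p + p(p - 1)/2."""
    return n_loci + n_loci * (n_loci - 1) // 2


if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(p, "loci ->", n_effects(p), "effects")
```
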
Distance-based tests for homogeneity of multivariate dispersions
From the abstract: The traditional likelihood-based test for differences in multivariate dispersions is known to be sensitive to nonnormality. It is also impossible to use when the number of variables exceeds the number of observations. Many biological and ecological data sets have many variables, are highly skewed, and are zero-inflated. The traditional test and even some more robust alternatives are also unreasonable in many contexts where measures of dispersion based on a non-Euclidean dissimilarity would be more appropriate. Distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed here.

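The core idea of a distance-based dispersion test can be sketched in a few lines. The Python sketch below is restricted to Euclidean distance, for which the group centroid is the coordinate-wise mean. It computes each observation's distance to its group centroid, summarises group differences in those distances with a one-way ANOVA F statistic, and obtains a p-value by permuting the distances across groups. The Euclidean restriction and the choice of permutation scheme are simplifications made here; supporting an arbitrary dissimilarity, as the article does, needs machinery this sketch omits, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean, which is the centroid under Euclidean distance."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def distances_to_centroid(points: Sequence[Point]) -> List[float]:
    """Euclidean distance from each point to its group centroid."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]


def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for the values in `groups`."""
    values = [z for g in groups for z in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((z - m) ** 2 for g, m in zip(groups, means) for z in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def dispersion_test(groups: Sequence[Sequence[Point]], n_perm: int = 999,
                    seed: int = 1) -> Tuple[float, float]:
    """F statistic on distances to centroids, with a permutation p-value."""
    z = [distances_to_centroid(g) for g in groups]
    f_obs = f_statistic(z)
    sizes = [len(g) for g in z]
    pooled = [v for g in z for v in g]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign distances to groups at random
        perm_groups, start = [], 0
        for size in sizes:
            perm_groups.append(pooled[start:start + size])
            start += size
        if f_statistic(perm_groups) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)
```

Testing the distances rather than the raw coordinates is what makes the procedure a test of spread rather than of location.
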
Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles
From the abstract: A mixed model is a flexible tool for joint modeling purposes, especially when the gathered data are unbalanced. However, computational problems due to the dimension of the joint covariance matrix of the random effects arise as soon as the number of outcomes and/or the number of used random effects per outcome increases. We propose a pairwise approach in which all possible bivariate models are fitted, and where inference follows from pseudo-likelihood arguments.

Doubly robust estimation in missing data and causal inference models
From the abstract: The goal of this article is to construct doubly robust (DR) estimators in ignorable missing data and causal inference models. In a missing data model, an estimator is DR if it remains consistent when either (but not necessarily both) a model for the missingness mechanism or a model for the distribution of the complete data is correctly specified. Because with observational data one can never be sure that either a missingness model or a complete data model is correct, perhaps the best that can be hoped for is to find a DR estimator. DR estimators, in contrast to standard likelihood-based or (nonaugmented) inverse probability-weighted estimators, give the analyst two chances, instead of only one, to make a valid inference.

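One standard DR estimator of a population mean under ignorable (missing at random) nonresponse is the augmented inverse-probability-weighted form μ̂ = n⁻¹ Σ_i [R_i Y_i / π̂(X_i) − (R_i − π̂(X_i)) m̂(X_i) / π̂(X_i)], where R_i indicates that Y_i is observed, π̂ is a fitted missingness model and m̂ a fitted outcome regression. The Python sketch below implements this form and illustrates the "two chances" property on simulated data: once with the correct π and a deliberately wrong m, and once with a wrong π and the correct m. The simulation design and function names are invented for illustration; the article treats more general missing data and causal inference models.

```python
import math
import random
from typing import List, Sequence, Tuple


def dr_mean(y: Sequence[float], observed: Sequence[int],
            pi_hat: Sequence[float], m_hat: Sequence[float]) -> float:
    """Augmented inverse-probability-weighted (doubly robust) estimate of E[Y].

    y        : outcomes; entries for unobserved units are ignored
    observed : 1 if Y_i is observed, 0 otherwise
    pi_hat   : fitted P(observed = 1 | X_i) from a missingness model
    m_hat    : fitted E[Y | X_i] from an outcome regression
    """
    terms = []
    for yi, ri, pi, mi in zip(y, observed, pi_hat, m_hat):
        ipw = ri * yi / pi if ri else 0.0   # IPW term, zero when Y is missing
        augmentation = (ri - pi) / pi * mi  # correction from the outcome model
        terms.append(ipw - augmentation)
    return math.fsum(terms) / len(terms)


def simulate(n: int, seed: int) -> Tuple[List[float], List[float], List[int], List[float]]:
    """Hypothetical design: Y = 2 + 3X + noise, P(observed | X) = 0.2 + 0.6X, so E[Y] = 3.5."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    pi_true = [0.2 + 0.6 * xi for xi in x]
    r = [1 if rng.random() < p else 0 for p in pi_true]
    return x, y, r, pi_true


if __name__ == "__main__":
    x, y, r, pi_true = simulate(50_000, seed=7)
    wrong_m = [0.0] * len(x)                # misspecified outcome model
    right_m = [2.0 + 3.0 * xi for xi in x]  # correct outcome model
    wrong_pi = [0.5] * len(x)               # misspecified missingness model
    print("correct pi, wrong m:", dr_mean(y, r, pi_true, wrong_m))
    print("wrong pi, correct m:", dr_mean(y, r, wrong_pi, right_m))
```

For large n both printed estimates should be close to the true mean 3.5, whereas an estimator relying only on the wrong model in either scenario would generally not be.
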
Survival model predictive accuracy and ROC curves
From the abstract: The predictive accuracy of a survival model can be summarized using extensions of the proportion of variation explained by the model, or R², commonly used for continuous response models, or using extensions of sensitivity and specificity, which are commonly used for binary response models. In this article we propose new time-dependent accuracy summaries based on time-specific versions of sensitivity and specificity calculated over risk sets. We connect the accuracy summaries to a previously proposed global concordance measure, which is a variant of Kendall's tau.

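Time-specific sensitivity and specificity over risk sets can be computed empirically as follows. At an observed failure time t, the cases are the subjects failing at t and the controls are the subjects whose observed time exceeds t. Sensitivity at threshold c is the fraction of cases with marker above c, specificity is the fraction of controls at or below c, and AUC(t) is the fraction of case-control pairs ordered correctly (ties count one half). Pooling those pairs across all failure times gives a concordance index equal to an average of the AUC(t) values weighted by their numbers of pairs. The sketch assumes that a higher marker means higher risk and uses crude empirical proportions; the article's estimators are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (time, event, marker): event = 1 for an observed failure at `time`, 0 for censoring.
Subject = Tuple[float, int, float]


def risk_set_split(data: Sequence[Subject], t: float) -> Tuple[List[float], List[float]]:
    """Markers of cases (failing at t) and controls (observed time > t)."""
    cases = [m for (time, event, m) in data if event == 1 and time == t]
    controls = [m for (time, _, m) in data if time > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    return cases, controls


def sens_spec_at(data: Sequence[Subject], t: float, c: float) -> Tuple[float, float]:
    """Sensitivity P(M > c | fail at t) and specificity P(M <= c | still at risk after t)."""
    cases, controls = risk_set_split(data, t)
    sens = sum(1 for m in cases if m > c) / len(cases)
    spec = sum(1 for m in controls if m <= c) / len(controls)
    return sens, spec


def pair_counts(data: Sequence[Subject], t: float) -> Tuple[float, int]:
    """Correctly ordered (case, control) pairs at t, ties counted as 1/2, and total pairs."""
    cases, controls = risk_set_split(data, t)
    ordered = sum(1.0 if mc > mk else 0.5 if mc == mk else 0.0
                  for mc in cases for mk in controls)
    return ordered, len(cases) * len(controls)


def auc_at(data: Sequence[Subject], t: float) -> float:
    """Empirical AUC(t): fraction of risk-set case-control pairs ordered correctly."""
    ordered, pairs = pair_counts(data, t)
    return ordered / pairs


def concordance(data: Sequence[Subject]) -> float:
    """Pool pairs over all failure times that still have someone at risk afterwards."""
    ordered_total, pairs_total = 0.0, 0
    for t in sorted({time for (time, event, _) in data if event == 1}):
        if not any(time > t for (time, _, _) in data):
            continue  # no one left at risk after t
        ordered, pairs = pair_counts(data, t)
        ordered_total += ordered
        pairs_total += pairs
    return ordered_total / pairs_total
```
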
N-mixture models for estimating population size from spatially replicated counts
From the abstract: Spatial replication is a common theme in count surveys of animals. Such surveys often generate sparse count data from which it is difficult to estimate population size while formally accounting for detection probability. In this article, I describe a class of models (N-mixture models) which allow for estimation of population size from such data. The key idea is to view site-specific population sizes, N, as independent random variables distributed according to some mixing distribution (e.g., Poisson). Prior parameters are estimated from the marginal likelihood of the data, having integrated over the prior distribution for N.

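For one site with repeat counts y_1, ..., y_T, the marginal likelihood described above is L = Σ_N Poisson(N; λ) Π_t Binomial(y_t; N, p), summing over N ≥ max_t y_t. The Python sketch below evaluates its logarithm, assuming a Poisson mixing distribution, a detection probability p that is constant across sites and visits, independent sites, and truncation of the infinite sum at a user-chosen bound n_max (a hypothetical parameter). Maximising the total log-likelihood over λ and p is left to any numerical optimiser; the function names are hypothetical.

```python
import math
from typing import Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(N = k) for N ~ Poisson(lam), lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(Y = k) for Y ~ Binomial(n, p), 0 < p < 1."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def site_loglik(counts: Sequence[int], lam: float, p: float, n_max: int = 200) -> float:
    """log of sum_{N = max(counts)}^{n_max} Poisson(N; lam) * prod_t Binomial(counts[t]; N, p)."""
    if lam <= 0 or not 0 < p < 1:
        raise ValueError("need lam > 0 and 0 < p < 1")
    if max(counts) > n_max:
        raise ValueError("n_max must be at least the largest observed count")
    terms = [
        poisson_logpmf(n, lam) + sum(binom_logpmf(y, n, p) for y in counts)
        for n in range(max(counts), n_max + 1)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in terms))


def nmixture_loglik(data: Sequence[Sequence[int]], lam: float, p: float,
                    n_max: int = 200) -> float:
    """Total log-likelihood over independent sites; data[i] holds the repeat counts at site i."""
    return sum(site_loglik(counts, lam, p, n_max) for counts in data)
```

With a single visit the sum reduces to a Poisson(λp) probability (binomial thinning of a Poisson count), which gives a convenient correctness check. It is the repeated visits at each site that let λ and p be separated.
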