Simulation of Longitudinal Exposure Data with Variance-Covariance Structures Based on Mixed Models

Authors

  • Peng Song,

    1. Operations Research Program, North Carolina State University, Raleigh, NC, USA
  • Jianping Xue,

    Corresponding author
    1. National Exposure Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, NC, USA
    2. Operations Research Program, North Carolina State University, Raleigh, NC, USA
  • Zhilin Li

    1. Center for Research in Scientific Computation & Department of Mathematics, North Carolina State University, Raleigh, NC, USA

Address correspondence to Jianping Xue, U.S. EPA, 109 T.W. Alexander Drive, MD E205–02, Research Triangle Park, NC 27711, USA; xue.jianping@epa.gov.

Abstract

Longitudinal data are important in exposure and risk assessments, especially for pollutants that have long half-lives in the human body and for which chronic exposure at current environmental levels raises concerns about human health effects. Large longitudinal data sets for human exposure studies are usually difficult and expensive to obtain. This article reports a new simulation method that generates longitudinal data with flexible numbers of subjects and days. Mixed models are used to describe the variance-covariance structures of the input longitudinal data. Based on the estimated model parameters, simulated data are generated with statistical characteristics similar to those of the input data. Three criteria are used to determine similarity: the overall mean and standard deviation, the variance component percentages, and the average autocorrelation coefficients. Following a discussion of mixed models, a simulation procedure is presented and numerical results are shown for one human exposure study. Simulations of three sets of exposure data successfully meet the above criteria. In particular, the simulations consistently retain the relative weights of inter- and intrasubject variance found in the input data, and autocorrelations are also well reproduced. Compared with other simulation algorithms, the new method stores more information about the overall distribution of the input data, which allows it to satisfy the multiple statistical criteria above. In addition, it generates values from numerous data sources and simulates continuous observed variables better than current methods. The method also provides flexible options in both the modeling and simulation procedures to suit various user requirements.
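As an illustration of the three similarity criteria named in the abstract, the following is a minimal sketch and not the authors' procedure. It assumes a random-intercept mixed model with a stationary first-order autoregressive (AR(1)) within-subject error, which is only one of many possible variance-covariance structures. The function names, the parameter values, and the simple method-of-moments variance component estimator are all hypothetical choices made for this example.

```python
import numpy as np


def simulate_longitudinal(n_subjects, n_days, mean, var_between, var_within,
                          rho, seed=0):
    """Simulate y[i, t] = mean + b[i] + e[i, t] (assumed model, see lead-in).

    b[i] is a subject-level random intercept (inter-subject variance) and
    e[i, t] is a stationary AR(1) series (intra-subject variance) with
    lag-1 correlation rho.
    """
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, np.sqrt(var_between), size=n_subjects)
    e = np.empty((n_subjects, n_days))
    # Draw day 0 from the stationary distribution of the AR(1) process.
    e[:, 0] = rng.normal(0.0, np.sqrt(var_within), size=n_subjects)
    # Innovation SD chosen so the marginal variance stays at var_within.
    innov_sd = np.sqrt(var_within * (1.0 - rho ** 2))
    for t in range(1, n_days):
        e[:, t] = rho * e[:, t - 1] + rng.normal(0.0, innov_sd, size=n_subjects)
    return mean + b[:, None] + e


def variance_component_percentages(y):
    """Crude one-way ANOVA (method-of-moments) split of the variance.

    Returns (percent inter-subject, percent intra-subject). This estimator
    ignores autocorrelation, so it is only an approximation here.
    """
    n, d = y.shape
    subj_means = y.mean(axis=1)
    var_within = ((y - subj_means[:, None]) ** 2).sum() / (n * (d - 1))
    var_between = max(subj_means.var(ddof=1) - var_within / d, 0.0)
    total = var_between + var_within
    return 100.0 * var_between / total, 100.0 * var_within / total


def mean_lag1_autocorrelation(y):
    """Average over subjects of the sample lag-1 autocorrelation."""
    c = y - y.mean(axis=1, keepdims=True)
    num = (c[:, 1:] * c[:, :-1]).sum(axis=1)
    den = (c ** 2).sum(axis=1)
    return float((num / den).mean())


# Hypothetical parameter values, chosen only for illustration.
y = simulate_longitudinal(n_subjects=500, n_days=60, mean=2.0,
                          var_between=1.0, var_within=1.0, rho=0.5, seed=1)
pct_between, pct_within = variance_component_percentages(y)
print("overall mean and SD:", y.mean(), y.std(ddof=1))
print("variance components (%):", pct_between, pct_within)
print("average lag-1 autocorrelation:", mean_lag1_autocorrelation(y))
```

In a workflow like the one the abstract describes, the same three summaries would be computed on the input data and on the simulated data and then compared. In this sketch the parameters are set by hand instead of being estimated from a fitted mixed model.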
