Topics of statistical theory for register-based statistics and data integration


  • Correction added on 10 August 2015, after the initial online publication. A duplicate article, with a DOI in the format 10.1111/stan.508 was deleted from EV on 30 July 2015. This DOI now aliases to: 10.1111/j.1467-9574.2011.00508.x in STAN 66:1.


Official statistics production based on a combination of data sources, including sample survey, census and administrative registers, is becoming more and more common. Reduction of response burden, gains of production cost efficiency as well as potentials for detailed spatial-demographic and longitudinal statistics are some of the major advantages associated with the use of integrated statistical data. Data integration has always been an essential feature associated with the use of administrative register data. But survey and census data should also be integrated, so as to widen their scope and improve the quality. There are many new and difficult challenges here that are beyond the traditional topics of survey sampling and data integration. In this article we consider statistical theory for data integration on a conceptual level. In particular, we present a two-phase life-cycle model for integrated statistical microdata, which provides a framework for the various potential error sources, and outline some concepts and topics for quality assessment beyond the ideal of error-free data. A shared understanding of these issues will hopefully help us to collocate and coordinate efforts in future research and development.