Standard Article

Statistical Databases

  1. Josep Domingo-Ferrer

Published Online: 15 SEP 2008

DOI: 10.1002/9780470050118.ecse417

Wiley Encyclopedia of Computer Science and Engineering

How to Cite

Domingo-Ferrer, J. 2008. Statistical Databases. Wiley Encyclopedia of Computer Science and Engineering. 1–10.

Author Information

  1. Rovira i Virgili University, Tarragona, Catalonia, Spain

Abstract

Statistical databases are databases that contain statistical information. Such databases are usually released by national statistical institutes, but they are occasionally also released by health-care authorities (epidemiology) or by private organizations (e.g., consumer surveys). Statistical databases typically come in three formats:

  • Tabular data, that is, tables with counts or magnitudes, which are the classic output of official statistics.

  • Queryable databases, that is, online databases to which the user can submit statistical queries (sums, averages, etc.).

  • Microdata, that is, files in which each record contains information on an individual (a citizen or a company).
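The three formats can be related through a toy example. The sketch below assumes a hypothetical five-record microdata file with made-up values. It derives a count table and a magnitude table from the microdata and answers an average query the way a queryable database would. The record fields and the helpers `count_table`, `magnitude_table`, and `query_average` are illustrative names, not part of any real system.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (made-up values).
microdata = [
    {"id": 1, "region": "North", "sector": "Retail", "turnover": 120},
    {"id": 2, "region": "North", "sector": "Industry", "turnover": 450},
    {"id": 3, "region": "South", "sector": "Retail", "turnover": 95},
    {"id": 4, "region": "South", "sector": "Retail", "turnover": 130},
    {"id": 5, "region": "South", "sector": "Industry", "turnover": 610},
]


def count_table(records, row, col):
    """Tabular data: number of respondents in each (row, col) cell."""
    return Counter((r[row], r[col]) for r in records)


def magnitude_table(records, row, col, value):
    """Tabular data: total of a magnitude variable in each (row, col) cell."""
    totals = {}
    for r in records:
        key = (r[row], r[col])
        totals[key] = totals.get(key, 0) + r[value]
    return totals


def query_average(records, value, predicate):
    """Queryable database: average of `value` over records matching `predicate`.

    Returns None when no record matches.
    """
    selected = [r[value] for r in records if predicate(r)]
    return sum(selected) / len(selected) if selected else None
```

For instance, `count_table(microdata, "region", "sector")[("South", "Retail")]` is 2 and the matching magnitude cell is 225. The (North, Retail) cell holds a single respondent, so publishing its magnitude would reveal that company's turnover exactly, which is the kind of disclosure SDC is meant to prevent.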

The distinctive feature of statistical databases is that they should provide useful statistical information without revealing private information about the individuals to whom they refer (respondents). Indeed, supplying data to national statistical institutes is compulsory in most countries, but in return those institutes commit to preserving the privacy of respondents.

Inference control in statistical databases, also known as statistical disclosure control (SDC), is a discipline that seeks to protect data in statistical databases so that they can be published without revealing confidential information that can be linked to specific respondents. SDC is applied to protect respondent privacy in areas such as official statistics, health statistics, and e-commerce (sharing of consumer data). Because data protection ultimately means data modification, the challenge for SDC is to achieve protection with minimum loss of the accuracy sought by database users.
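The trade-off between protection and accuracy can be made concrete with one well-known masking method, univariate microaggregation. It is not described in this abstract and is used here only as an example. The sketch sorts the values, partitions them into groups of at least k consecutive values, and replaces each value by its group mean, so that every released value is shared by at least k respondents. The sum of squared errors serves as a simple, assumed measure of information loss. The functions `microaggregate` and `information_loss` are hypothetical names.

```python
def microaggregate(values, k):
    """Univariate microaggregation sketch (hypothetical helper).

    Sorts the values, splits them into groups of k consecutive values
    (the last group absorbs any remainder, so every group has at least
    k members), and replaces each value by its group mean.
    Returns the masked values in the original record order.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need 1 <= k <= len(values)")
    # Record indices ordered by value, so groups contain similar values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)
    n_groups = len(values) // k
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked


def information_loss(original, masked):
    """Sum of squared errors between original and masked values."""
    return sum((o - m) ** 2 for o, m in zip(original, masked))
```

Applied to the turnover values `[120, 450, 95, 130, 610]` with k = 2, the two smallest values both become 107.5 and the other three become their common mean, while the total (and hence the overall mean) is preserved exactly. With k = 1 nothing changes and the loss is zero; larger groups typically give more protection at the cost of more information loss.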

Keywords:

  • statistical databases;
  • inference control;
  • statistical disclosure control;
  • statistical disclosure limitation;
  • data privacy;
  • microdata;
  • tabular data