What distinguishes LITS data
We define LITS data as a type of time-series data where information is recorded not only on the numbers of a species at a particular site but also ancillary information on the state of individuals. This information may be time-invariant attributes like sex or genotype; single events like date of birth or death; or features that change with time such as weight, spatial location, clutch or litter size, and parasite load. LITS data often also include information about links between individuals, for example parentage, identity of mates, or position within a dominance hierarchy. The biological data may be associated with information about the physical environment, for example local climatic and weather fluctuations or spatial patterns in topography or soil factors. The precise distinction between what are, and what are not, LITS data is of course somewhat fuzzy, and in practice we take long-term to be more than 5 years or generations, and have concentrated on datasets with the richest individual-based data.
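The kinds of information listed above can be sketched as a small data model. This is a minimal illustration in Python, not the schema of any actual LITS dataset: every class and field name (`Individual`, `Observation`, `mother_id` and so on) is hypothetical, and the helper `is_long_term` simply applies the working threshold of more than 5 years given in the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Observation:
    """A time-varying measurement on one individual (hypothetical fields)."""
    when: date
    trait: str      # e.g. "weight_kg", "litter_size", "parasite_load"
    value: float


@dataclass
class Individual:
    """One marked animal in a hypothetical LITS dataset."""
    ident: str
    sex: Optional[str] = None          # time-invariant attribute
    birth: Optional[date] = None       # single event
    death: Optional[date] = None       # single event
    mother_id: Optional[str] = None    # link to another individual
    observations: List[Observation] = field(default_factory=list)


def study_span_years(individuals: List[Individual]) -> int:
    """Number of calendar years spanned by all recorded observations."""
    years = [o.when.year for ind in individuals for o in ind.observations]
    return max(years) - min(years) + 1 if years else 0


def is_long_term(individuals: List[Individual], threshold: int = 5) -> bool:
    """Apply the working definition: more than `threshold` years of data."""
    return study_span_years(individuals) > threshold
```

Keeping time-varying measurements in a separate list, rather than as columns on the individual, is one simple way to accommodate traits that are recorded at irregular intervals.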
The efficient storage, preservation and dissemination of LITS data are important for two reasons. First, by their nature LITS data take a long time to collect and, especially in studies that record many attributes of different individuals, they require many man-hours and much expense. LITS data are often organizationally complex, and their interpretation requires detailed documentation which, if lost, may render the time-series worthless or of reduced value. Second, LITS data are often extremely valuable for exploring ecological and evolutionary hypotheses. We pick three examples illustrating this importance from the many that we could quote.
Moyes et al. (2006) used LITS data from a population of red deer (Cervus elaphus) to examine survival costs of reproduction. They report that current reproduction reduces the survival probability over the coming year, especially in young and old individuals. In contrast, an increase in cumulative reproductive effort, defined as the proportion of years in which a female has bred since sexual maturity, increases the probability of survival. This positive effect on survival of cumulative reproduction is generated by differences in individual fitness and such effects can only be identified with data on complete life histories (i.e. reproductive histories and mortality information) from a large number of individual females.
A second example comes from the Galapagos. The study of Darwin's finches (Geospiza spp.) on the small island of Daphne Major by Peter and Rosemary Grant and colleagues has produced one of the most important LITS datasets ever collected. Among other projects, the Grants used this dataset to show how the ecological and evolutionary dynamics of one species, G. fortis, were influenced by the colonization of Daphne Major by a competitor species, G. magnirostris. The arrival of G. magnirostris caused a decline in the population density of G. fortis as interspecific competition for food increased, which also generated a response to selection in its beak shape (Grant & Grant 2006). Detecting these ecological and evolutionary consequences required detailed information on food availability, population size, individual phenotypic characteristics and mother-offspring links collected over three decades. Without these data such insights could not have been achieved.
Our final example shows how the analysis of multigenerational pedigrees from wild populations has provided insights into the consequences of human actions on ecological and evolutionary change. Once again, this understanding could not have been realized without LITS data on individual mortality, morphometric measurements, and a pedigree also built on individual-level genetic data. The bighorn sheep (Ovis canadensis) population from Ram Mountain, Alberta, Canada, has been studied at the individual level for three decades. Trophy hunting of large-horned males occurs, leading to selection against large horns. This selection has led to an evolutionary response towards smaller-horned animals, as they survive longer and now achieve many matings because of the absence of large-horned competitors (Coltman et al. 2003).
The LITS project
The United Kingdom has a long history of collecting LITS data, and this has involved ecologists ranging from large-scale multi-institutional teams such as the Soay sheep (Ovis aries) project (Clutton-Brock & Pemberton 2004) through to individuals, such as Tim Healing's work on the Skomer Island bank vole (Clethrionomys glareolus) (Healing et al. 1983). LITS data are a significant national resource for understanding the local response to environmental change, as well as a major contribution of the UK to international science. The aim of the LITS project was three-fold.
1. To survey the LITS datasets concerning terrestrial vertebrates that had been collected in the UK, and by UK-based researchers conducting work overseas.
2. To assess their archival status – whether they were in digital or non-digital form, stored securely, and sufficiently documented.
3. To produce a web resource that would include a catalogue of datasets with associated metadata (defined below) and, where requested, complete datasets.
Many of the UK LITS datasets (see Appendix S1 in Supplementary material) come from some of the best-known ecological studies. To survey the UK LITS corpus we sent a provisional list to the holders of the datasets known to us and asked for information about any we had overlooked. We believe the list of 71 datasets we identified is reasonably comprehensive, but on the website we request information about new projects and any we have overlooked.
The second aim of the project was to assess the archival status of the different datasets, and any threats to their continued availability. Risk of loss is difficult to assess quantitatively, but we used criteria such as the current research status of the data owner (active, retired or changed career, or deceased, in order of increasing risk), the number of recent analyses using the dataset (more recent activity indicating less risk), and the method of data storage (relational databases, spreadsheets, paper files, or field notebooks, in order of increasing risk). Based on this, datasets were classified as low, medium or high risk. The distribution of risk amongst the 71 datasets indicated that although most of the UK's holdings were at low or medium risk of loss (58% and 22%, respectively), approximately 20% were categorized as being at high risk. Although we have informed data owners about our assessment of the archival status of their datasets, we are aware that this may be sensitive information, and it is not provided on the website. Within the LITS project we had funds for data rescue and used the risk assessment to prioritize how these were spent.
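To illustrate how ordinal criteria of this kind might be combined into a three-level class, a minimal sketch follows. Only the ordering within each criterion comes from the text; the numeric scores, the cut-offs for recent analyses, the thresholds between classes, and all names (`OWNER_RISK`, `STORAGE_RISK`, `classify_risk`) are illustrative assumptions and should not be read as the scheme the project actually applied.

```python
# Orderings follow the text; the scores and thresholds are assumptions.
OWNER_RISK = {"active": 0, "retired_or_changed_career": 1, "deceased": 2}
STORAGE_RISK = {
    "relational_database": 0,
    "spreadsheet": 1,
    "paper_files": 2,
    "field_notebooks": 3,
}


def analysis_risk(recent_analyses: int) -> int:
    """More recent analyses indicate less risk (cut-offs are assumed)."""
    if recent_analyses >= 3:
        return 0
    if recent_analyses >= 1:
        return 1
    return 2


def classify_risk(owner_status: str, recent_analyses: int, storage: str) -> str:
    """Sum the three ordinal scores and map the total to a risk class."""
    score = (
        OWNER_RISK[owner_status]
        + analysis_risk(recent_analyses)
        + STORAGE_RISK[storage]
    )
    if score <= 2:
        return "low"
    if score <= 4:
        return "medium"
    return "high"
```

Under these assumed weights, a dataset held in field notebooks by a deceased owner with no recent analyses scores 7 and is classed as high risk, whereas an actively analysed relational database scores 0 and is classed as low risk.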
Information that provides contextual value to data and assists users in its accurate interpretation is termed metadata. Metadata are diverse and may not only describe things that have been measured or recorded but also document the way in which data are stored, relationships amongst the data, manipulations that have been performed on them, as well as stewardship information. For the LITS catalogue we requested summary metadata (Table 1) from data owners. An advantage of collecting this information in a common format is that it facilitates searching and the comparison of different data resources. Where actual datasets have been mounted on the website, more specific information about each data field and the relationships between fields is provided.
Table 1. Summary metadata stored on UK LITS datasets
|Field|Description|
|---|---|
|Unique ID|A unique identification number|
|Title|A title that provides a description of the dataset that should distinguish it from other projects|
|Abstract|A description of the dataset that will usually describe the objectives, key aspects, design and general methodology of the study|
|Key-words|Key-words to facilitate easy searching|
|Data owner|Name and contact details of the data owner(s)|
|Data contact|Name and contact details of the first contact to deal with questions regarding the use or interpretation of the data|
|Associated parties|Name and contact details of people who have intellectual property rights on the data. They may have assisted in collection, maintenance or documentation of the data|
|Usage rights|The access to the data permitted by the data owner|
|Geographical coverage|The geographical location of the study site|
|Temporal coverage|The time span over which data were collected|
|Taxonomic coverage|The taxonomic identity of the species studied|
|Methods and sampling|Description of sampling methodologies|
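As an illustration of why a common format facilitates searching, the fields of Table 1 can be sketched as a single record type with a simple text search over it. This is a hypothetical in-memory representation in Python, not the format used by the LITS catalogue; the class name `SummaryMetadata`, the attribute names and the `search` function are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SummaryMetadata:
    """One catalogue entry with the fields listed in Table 1."""
    unique_id: int
    title: str
    abstract: str
    keywords: List[str]
    data_owner: str
    data_contact: str
    usage_rights: str
    geographical_coverage: str
    temporal_coverage: str
    taxonomic_coverage: str
    methods_and_sampling: str
    associated_parties: List[str] = field(default_factory=list)


def search(catalogue: List[SummaryMetadata], term: str) -> List[SummaryMetadata]:
    """Return entries whose title, abstract, taxon or key-words mention `term`."""
    needle = term.lower()
    hits = []
    for entry in catalogue:
        haystack = " ".join(
            [entry.title, entry.abstract, entry.taxonomic_coverage, *entry.keywords]
        ).lower()
        if needle in haystack:
            hits.append(entry)
    return hits
```

Because every entry exposes the same fields, one case-insensitive query can be run across all datasets without per-dataset handling.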
The LITS web resource
The outcome of the LITS project is available at http://www.imperial.ac.uk/litsproject/ and consists of a catalogue of LITS datasets with their associated metadata. The data depository is an implementation of the Metacat metadata catalogue system developed by a consortium of computer scientists, biologists and environmental scientists (Jones et al. 2001). This system is a flexible database specifically designed for metadata storage, query, and access as part of a distributed network of ecological research databases. It can store a diverse range of metadata and data and has a simple user interface allowing flexible search and query. The database uses ecological metadata language (EML; Jones et al. 2001) to store the data. EML is the emerging standard for describing ecological data and is a derivative of extensible mark-up language (XML: Bray, Paoli & Sperberg-McQueen 2000).
Currently, two full datasets have been mounted on the website and a third is being made ready. The first is a 15-year dataset of populations of kestrels (Falco tinnunculus) in England and Scotland (Village 1990). The study includes information on the breeding success, diet, sightings, distribution and morphometrics of approximately 1000 marked birds. The second dataset is on a population of rooks (Corvus frugilegus) (Patterson, Dunnet & Goodbody 1988) monitored for 11 years. This dataset holds information on sightings, breeding success, morphometrics, and mortality of almost 7000 birds. The third dataset is a 28-year study of a population of sparrowhawks (Accipiter nisus) (Newton & Rothery 1997) and holds data on breeding success and survival for 829 individuals. Additionally, data on eggshell thickness and pesticide residue are included.
Unrestricted access has been granted by the data owners to the kestrel and rook datasets, while the sparrowhawk data are embargoed until 2009 to allow the data owner to complete some analyses. The data owner can allow unrestricted access, no access, or password-restricted access. Different parts of the dataset can be accorded different access levels. The three datasets so far have been constructed in Microsoft Access because of its widespread availability, though any format can be used. We hope that further full datasets will be added to the website in time. The website is currently maintained by the NERC Centre for Population Biology, and will ultimately be transferred to the NERC Centre for Ecology and Hydrology designated data centre.
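Per-part access levels of the kind described above can be sketched as a simple lookup. The example below is a minimal illustration in Python and does not describe how the website or Metacat enforces access; the enum `Access`, the function `can_read`, the part names in the usage note, and the choice to deny access to unlisted parts are all assumptions.

```python
from enum import Enum
from typing import Dict


class Access(Enum):
    UNRESTRICTED = "unrestricted"
    PASSWORD = "password-restricted"
    NONE = "no access"


def can_read(
    levels: Dict[str, Access],
    part: str,
    password_ok: bool = False,
    default: Access = Access.NONE,
) -> bool:
    """Decide whether one part of a dataset may be read.

    `levels` maps a part of the dataset (for example a table name) to the
    access level chosen by the data owner; parts not listed fall back to
    `default`, which is assumed here to be the most restrictive level.
    """
    level = levels.get(part, default)
    if level is Access.UNRESTRICTED:
        return True
    if level is Access.PASSWORD:
        return password_ok
    return False
```

For example, with `levels = {"sightings": Access.UNRESTRICTED, "breeding": Access.PASSWORD}`, any user could read the hypothetical `sightings` part, whereas `breeding` would require `password_ok=True`.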