Journal of the Royal Statistical Society: Series A (Statistics in Society)

Edited By: J. Carpenter and H. Goldstein

Impact Factor: 1.852

ISI Journal Citation Reports © Ranking: 2016: 11/49 (Social Sciences Mathematical Methods); 21/124 (Statistics & Probability)

Online ISSN: 1467-985X

Associated Title(s): Journal of the Royal Statistical Society: Series B (Statistical Methodology), Journal of the Royal Statistical Society: Series C (Applied Statistics), Significance

165:2


Modelling the data measurement process for the index of production by K. Patterson, Journal of the Royal Statistical Society, Series A, Statistics in Society, Volume 165 (2002), Part 2, pages 279 - 296.

Data set: the data set comprises 13 vintages of data on the United Kingdom Index of Production (IoP).

Data organisation: the data file is UKIOP.DAT. The data are organised with one row per observation period and one column per vintage, so the data are read as a T x m array, where T is the number of observation periods and m is the total number of vintages. The overall sample period was 1973m9 to 1999m12; the effective estimation period, after lagging and allowing for the most recently available final vintage (v = 13), is therefore 1974m1 to 1998m11. The data in UKIoP.DAT are for 1973m9 to 1998m11, to allow reproduction of the results in the paper.
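As a rough illustration of the T x m layout, the following Python sketch reads one row per month and one column per vintage. It assumes a whitespace-delimited numeric file with no header row and a first observation in 1973m9; the actual layout of the data file is not documented here, and the function name read_vintages and the sample values are hypothetical.

```python
from io import StringIO

def read_vintages(handle, start_year=1973, start_month=9):
    """Return a list of ((year, month), [v1, ..., vm]) rows.

    Assumes whitespace-delimited numbers, one line per observation period.
    """
    rows = []
    year, month = start_year, start_month
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append(((year, month), [float(x) for x in fields]))
        # Advance the period label by one month.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return rows

# Made-up values with m = 2 vintages, purely to show the shape.
sample = StringIO("100.1 100.3\n101.0 101.2\n99.8 99.9\n100.5 100.6\n100.9 101.1\n")
data = read_vintages(sample)
```

With the real file, the handle would come from open() on the downloaded data file, and each row would hold 13 values.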

The cycle dummy variable can be computed as follows (in RATS format the semicolons can be omitted; in TSP format, omit the set instruction):

smpl 1974:1 1976:4;

set cd1 = 1;

smpl 1976:5 1980:1;

set cd1 = 0;

smpl 1980:2 1981:12;

set cd1 = 1;

smpl 1982:1 1990:6;

set cd1 = 0;

smpl 1990:7 1992:8;

set cd1 = 1;

smpl 1992:9 1998:12;

set cd1 = 0;

smpl 1973:1 1998:12;
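For readers without RATS or TSP, the same dummy can be sketched in Python. This is only a translation of the sample ranges listed above, covering 1974:1 to 1998:12; the names month_range, CD1_ON and cycle_dummy are hypothetical, and the final smpl statement, which resets the sample, is not reproduced.

```python
def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

# Periods during which cd1 = 1, copied from the smpl/set statements above.
CD1_ON = [
    ((1974, 1), (1976, 4)),
    ((1980, 2), (1981, 12)),
    ((1990, 7), (1992, 8)),
]

def cycle_dummy(period):
    """Return 1 if the (year, month) period falls in any cd1 = 1 range, else 0."""
    return 1 if any(lo <= period <= hi for lo, hi in CD1_ON) else 0

cd1 = {p: cycle_dummy(p) for p in month_range((1974, 1), (1998, 12))}
```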

The exceptional event dummy variables are computed as:

smpl 1974:1 1998:12;

set dv79m1 = 0;

smpl 1979:1 1979:1;

set dv79m1 = 1;

smpl 1974:1 1998:12;

set dv79m2 = 0;

smpl 1979:2 1979:2;

set dv79m2 = 1;
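The two exceptional event dummies are single-month pulses. A minimal Python sketch of the same construction follows; the helper name pulse_dummy is hypothetical, and the period range is taken from the smpl statements above.

```python
def pulse_dummy(periods, event):
    """Return a dict that is 1 in the event month and 0 in every other month."""
    return {p: 1 if p == event else 0 for p in periods}

# Monthly periods 1974:1 to 1998:12 as (year, month) pairs.
periods = [(y, m) for y in range(1974, 1999) for m in range(1, 13)]

dv79m1 = pulse_dummy(periods, (1979, 1))
dv79m2 = pulse_dummy(periods, (1979, 2))
```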

Contact details:

Kerry Patterson

Department of Economics, University of Reading,

PO Box 279, Whiteknights, Reading,

RG6 6AA

UK

E-mail k.d.patterson@reading.ac.uk

tel: 01189 318159 (from outside the UK 0044 1865 318159)

Dataset (8kb)

Predicting successful and unsuccessful transitions from school to work by using sequence methods by D. McVicar and M. Anyadike-Danes, Journal of the Royal Statistical Society, Series A, Statistics in Society, Volume 165 (2002), Part 2, pages 317 - 334.

The data cover 712 individuals. For each individual there are 14 characteristics variables, including a unique identifier (id) and 72 monthly activity variables from July 1993 through to June 1999.

  • ID - unique individual identifier.
  • Weight - sample weights.
  • Male - binary dummy for gender, 1=male.
  • Catholic - binary dummy for community, 1=Catholic.
  • Belfast, Neastern, Southern, Seastern, Western - Binary dummies for location of school, one of five Education and Library Board areas in Northern Ireland.
  • Grammar - binary dummy indicating type of secondary education, 1=grammar school.
  • Funemp - binary dummy indicating father's employment status at time of survey, 1=father unemployed.
  • Gcse5eq - binary dummy indicating qualifications gained by the end of compulsory education, 1=5+ GCSEs at grades A-C, or equivalent.
  • Fmpr - binary dummy indicating SOC code of father's current or most recent job, 1=SOC1 (professional, managerial or related).
  • Livboth - binary dummy indicating living arrangements at time of first sweep of survey (June 1995), 1=living with both parents.
  • Monthly Activity Variables are coded 1-6, 1=school, 2=FE, 3=employment, 4=training, 5=joblessness, 6=HE.
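The activity coding can be illustrated with a short Python sketch that counts the months spent in each state for one individual. The 72-month sequence below is invented for illustration, and the function name months_by_activity is hypothetical; only the code-to-label mapping comes from the list above.

```python
from collections import Counter

# Coding of the monthly activity variables, as listed above.
ACTIVITY = {
    1: "school",
    2: "FE",
    3: "employment",
    4: "training",
    5: "joblessness",
    6: "HE",
}

def months_by_activity(sequence):
    """Count the months spent in each activity for one coded sequence."""
    counts = Counter(ACTIVITY[code] for code in sequence)
    return {label: counts.get(label, 0) for label in ACTIVITY.values()}

# Hypothetical individual: 24 months school, 12 FE, 6 jobless, 30 employed.
example = [1] * 24 + [2] * 12 + [5] * 6 + [3] * 30
summary = months_by_activity(example)
```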

Contact details:

D. McVicar

Northern Ireland Economic Research Centre

46 - 48 University Road

Belfast

BT7 1NJ

UK

E-mail: D.McVicar@qub.ac.uk

Dataset (28kb)

Effects of neighbourhood demographic shifts on findings of environmental injustice: a New York City case-study by M. Talih and R. D. Fricker, Jr, Journal of the Royal Statistical Society, Series A, Statistics in Society, Volume 165 (2002), Part 2, pages 375 - 397.

Included are 2 tab-delimited data files:

- Dataset1.tab (446.7 KB) : Tract-level Census data, manufacturing zones & geographic info.

- Dataset2.tab (5.6 KB) : Toxics Release Inventory (TRI) site location info.

Questions about the datasets should be addressed to makram.talih@yale.edu

--

Dataset1.tab

- Tab delimited text file with select variables from the 1970, 1980, and 1990 Census.

- Also included are geographic coordinates and manufacturing information at the tract level.

- Coordinates are expressed in kilometres from a fixed origin for the North American Datum (NAD) 83, New York State/Long Island plane.

- Manufacturing information is based on zoning maps provided by the NYC Department of City Planning.

First line of the file contains the variable (column) names.

First column contains the identifier for each row. This is just the tract unique identifier.

Number of columns (excluding row names): 31

Number of rows (excluding header row): 1462

Variable definitions:

Tract Census 1990 tract number.

First 5 digits correspond to county FIPS code.

If the sequence of remaining digits is of length 6, then a "decimal" is implied: the core tract number has a maximum of 4 digits.

In Talih & Fricker (2002), we used the 1990 tract geography in order to obtain the maps of the clusters.

Hence, using the tract comparability files, we disaggregate the data from the combined 70-80-90 tract to the component 1990 tracts.
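The tract numbering rule can be sketched in Python as follows. The identifiers in the usage lines are hypothetical examples, not values checked against Dataset1.tab, and the sketch assumes the identifier always has at least one digit after the 5-digit county code; the function name split_tract_id is also an assumption.

```python
def split_tract_id(tract_id):
    """Split a tract identifier into (county_fips, tract_number_as_text)."""
    text = str(tract_id)
    county, rest = text[:5], text[5:]
    if len(rest) == 6:
        # Implied decimal: up to 4 digits of core number, then a 2-digit suffix.
        core, suffix = rest[:4], rest[4:]
        tract = f"{int(core)}.{suffix}"
    else:
        # No suffix: the remaining digits are the core tract number.
        tract = str(int(rest))
    return county, tract

# Hypothetical identifiers for illustration only.
with_suffix = split_tract_id("36047045597")
without_suffix = split_tract_id("360470579")
```

Under these assumptions the first call gives county 36047 with tract 455.97, and the second gives county 36047 with tract 579.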

Adj.Area Area (in the 1990 Census) of tract, with parks and cemeteries excluded. Tracts with zero adjusted area consist only of parks and/or cemeteries.

1436 tracts have non-zero adjusted area.

Area is expressed in square kilometres.

Easting Kilometres East from a fixed origin for the New York State/Long Island plane.

Northing Kilometres North from a fixed origin for the New York State/Long Island plane.

Tract centroid coordinates are those of the block centroid within the tract that minimizes the maximum distance to every other block centroid within the tract.

Tracts with zero adjusted area have not been given coordinate values.
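The centroid rule described above amounts to a minimax choice among block centroids. A small Python sketch, assuming planar coordinates in kilometres and using made-up block centroids (the function name minimax_centroid is hypothetical):

```python
import math

def minimax_centroid(block_centroids):
    """Return the block centroid whose largest distance to any other is smallest."""
    def worst_distance(p):
        # Distance from p to the block centroid farthest away from it.
        return max(math.dist(p, q) for q in block_centroids)
    return min(block_centroids, key=worst_distance)

# Made-up block centroids (easting, northing) in km.
blocks = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 0.5)]
center = minimax_centroid(blocks)
```

Here the block at (1.0, 0.0) is selected, since its farthest neighbour is 1 km away while every other candidate has a farther one. Ties, if any, are resolved in favour of the first candidate in the list, which is a choice of this sketch and not something stated in the notes.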

M1/M2/M3 Type of Manufacturing within the tract, as determined from the NYC-DCP zoning maps.

M1=light, M2=medium, M3=heavy. Some tracts have a mixture of manufacturing activities.

A zero entry in all three columns indicates a non-manufacturing tract.

md.rnt70 1970: Median Rent

md.rnt80 1980: Median Rent (in 1970 Dollars)

md.rnt90 1990: Median Rent (in 1970 Dollars)

md.val70 1970: Median Value of Owner-Occupied Units

mn.val80 1980: Mean Housing Unit Value (in 1970 Dollars)

md.val90 1990: Median Housing Unit Value (in 1970 Dollars)

med.hi69 1969: Median Household Income

med.hi79 1979: Median Household Income (1970 $$)

med.hi89 1989: Median Household Income (1970 $$)

hisp.pr70 1970: % Hispanic

wh.pr70 1970: % Not Hispanic: White

bk.pr70 1970: % Not Hispanic: Black

HISP.PR80 1980: % Hispanic

WH.PR80 1980: % Not Hispanic: White

BK.PR80 1980: % Not Hispanic: Black

HISP.PR90 1990: % Hispanic

WH.PR90 1990: % Not Hispanic: White

BK.PR90 1990: % Not Hispanic: Black

total70 1970: Total Population

TOTAL80 1980: Total Population

TOTAL90 1990: Total Population

density70 1970: Population Density per sq. km.

DENSITY80 1980: Population Density per sq. km.

DENSITY90 1990: Population Density per sq. km.

--

Note on conversions to a fixed (1990 Census) tract geography:

Essentially, the data for each combined 70-80-90 tract are split into the component 1990 tracts. This does not affect the income measures or the percentages, because the demographics over the combined tract are assumed "uniform": counts for the component tracts are proportional to those of the combined tract, with proportionality constant equal to the ratio of the area of the component tract to the area of the combined tract. As a result, total population counts are occasionally non-integer.
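A minimal Python sketch of this area-proportional split is given below. The tract labels, areas and population count are invented, the function name split_counts is hypothetical, and whether total or adjusted area was used for the split is not stated in these notes.

```python
def split_counts(combined_count, component_areas):
    """Allocate a combined-tract count to component tracts in proportion to area."""
    total_area = sum(component_areas.values())
    return {
        tract: combined_count * area / total_area
        for tract, area in component_areas.items()
    }

# Hypothetical combined tract with two 1990 components (areas in sq. km).
parts = split_counts(1001.0, {"455.97": 0.75, "455.98": 0.25})
```

With these made-up inputs the components receive 750.75 and 250.25 persons, which shows how non-integer population counts arise.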

--

Notes on Combined 70-80-90 Census Files [07/07/2000] Prepared by Jennifer Pace -- RAND

RACE & INCOME VARIABLES

1990 CENSUS

Downloaded from www.census.gov Census of Population and Housing 1990, STF 3A

Racial Splits that were already given:

Non-Hispanic: White, Black, Asian, American Indian, Other Race

Hispanic: White, Black, Asian, American Indian, Other Race

Percentages were calculated as the raw number divided by the sum of all categories.

Median Household Income, Median Home Value, and Median Rent were also given. However, these numbers were not used in the combined file (1970-1990), because they changed when tracts were combined. See Section on COMPUTING MEDIANS.

1980 CENSUS

RAND data facility CF-273 Census of Population and Housing 1980, STF 3A

Breakdowns Given ->

Total: White

Black

American Indian, Eskimo, Aleut, Japanese, Chinese, Filipino, Korean, Asian Indian, Vietnamese, Hawaiian, Guamanian, Samoan, Other Asian (combined to get Total Asian/American Indian)

Other Race, Hispanic

Other Race, Not Hispanic

Hispanic White, Hispanic Black, Hispanic Asian/American Indian, Hispanic Other Race

Subtracted these Hispanic numbers from the Total numbers to get Non-Hispanic numbers

Percentages calculated as the raw numbers divided by the sum of all categories.
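The subtraction and percentage steps can be sketched in Python as follows. The counts are invented, the function name race_percentages is hypothetical, and treating "Hispanic" as the sum over the Hispanic race categories is an assumption made to match the % Hispanic variables in Dataset1.tab.

```python
def race_percentages(total_by_race, hispanic_by_race):
    """Return percentage shares for non-Hispanic race groups and for Hispanics."""
    # Non-Hispanic count = total count minus Hispanic count, race by race.
    categories = {
        f"Not Hispanic: {race}": total_by_race[race] - hispanic_by_race[race]
        for race in total_by_race
    }
    categories["Hispanic"] = sum(hispanic_by_race.values())
    # Percentages are raw numbers divided by the sum of all categories.
    denominator = sum(categories.values())
    return {name: 100.0 * n / denominator for name, n in categories.items()}

# Made-up tract counts for illustration.
totals = {"White": 600, "Black": 300, "Other": 100}
hispanic = {"White": 100, "Black": 50, "Other": 50}
shares = race_percentages(totals, hispanic)
```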

Median Household Income and Median Rent were also given. However, these numbers were not used in the combined file (1970-1990), because they changed when tracts were combined. See Section on COMPUTING MEDIANS.

1970 CENSUS

RAND data facility CF-036: 1970 Census of Population and Housing, 4th Count A; 1970 Census of Population and Housing, 5th Count

From 5th Count:

Breakdowns Given ->

Total: White

Black

Indian, Japanese, Chinese, Filipino, Other Race (combined to get Other Race)

Income given in two groups: Family Income and Unrelated Individuals Income. I combined these two categories to get Household Income. Rent and Value categories also given. See COMPUTING MEDIANS.

From 4th Count A:

Up to three rows per tract. 1= Total, 2= White, 3= Black. Summed over the age categories to get population for each row.

Then looked at the Spanish indicator* to subtract out the Hispanic population in each row. This gave me Hispanic White, Hispanic Black, and Total Hispanic.

Hispanic Other Race = Hisp Tot - (Hisp Wh + Hisp Black)

These Hispanic Numbers were then merged onto the 5th Count data. Non-Hispanic Numbers were computed from the 5th Count race variables, subtracting out the Hispanic numbers.

* Spanish indicator refers to people of Puerto Rican birth or parentage or Spanish-speaking: took the maximum of these two categories.

ADJUSTING FOR INFLATION:

All Dollar Values have been converted to 1970 Dollars.

- 1990 values were divided by 3.366

- 1980 values were divided by 1.993
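The conversion is a simple division by the year's factor. A Python sketch, with the divisors taken from the list above and the names DEFLATOR and to_1970_dollars chosen for illustration:

```python
# Divisors used to express values in 1970 dollars, keyed by census year.
DEFLATOR = {1970: 1.0, 1980: 1.993, 1990: 3.366}

def to_1970_dollars(value, census_year):
    """Convert a dollar amount from the given census year to 1970 dollars."""
    return value / DEFLATOR[census_year]

# A made-up 1990 median rent of $500 expressed in 1970 dollars.
rent_1990_in_1970_dollars = to_1970_dollars(500.0, 1990)
```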

COMPUTING MEDIANS

Because aggregate values were not given, means could not be computed in most cases. Medians were calculated in the following way:

After tracts were combined -> look at the count of people in each category. For example, in 1970 there are 9 rent categories:

Less than $40

$ 40 - $ 59

$ 60 - $ 79

$ 80 - $ 99

$100 - $149

$150 - $199

$200 - $249

$250 - $299

$300 or more

If the middle person is in category 6 ($150 - $199) the median rent is recorded as $175. For the "Less than" categories, the value is half of the upper bound ($20 in this case). For the "More than" categories, the value is the lower bound ($300 in this case). Otherwise the value is halfway between the lower and upper bounds of the category. This process of calculating medians may be altered later.
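The rule above can be sketched in Python for the 1970 rent categories. Two details are assumptions of this sketch and are not stated in the notes: the "middle person" is taken to be in the first category whose cumulative count reaches half the total, and the halfway value is computed between a category's lower bound and the next category's lower bound, which reproduces the $175 example. The names RENT_LOWER_BOUNDS, category_value and grouped_median are hypothetical, and the counts are invented.

```python
# Lower bounds of the nine 1970 rent categories listed above.
RENT_LOWER_BOUNDS = [0, 40, 60, 80, 100, 150, 200, 250, 300]

def category_value(k, lower_bounds):
    """Dollar value recorded when the median falls in category k (0-based)."""
    if k == 0:
        return lower_bounds[1] / 2          # "Less than": half the upper bound
    if k == len(lower_bounds) - 1:
        return lower_bounds[k]              # "or more": the lower bound
    return (lower_bounds[k] + lower_bounds[k + 1]) / 2  # halfway point

def grouped_median(counts, lower_bounds=RENT_LOWER_BOUNDS):
    """Return the recorded median for category counts, or None if all are zero."""
    total = sum(counts)
    if total == 0:
        return None
    cumulative = 0
    for k, n in enumerate(counts):
        cumulative += n
        if 2 * cumulative >= total:         # middle person is in category k
            return category_value(k, lower_bounds)

# Made-up counts: the middle person falls in category 6 ($150 - $199).
median_rent = grouped_median([5, 5, 10, 10, 15, 30, 15, 5, 5])
```

With these counts the recorded median is 175.0, matching the worked example in the text.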

Note: For Value of Housing Unit the median is given for 1970 and 1990, but the MEAN is given for 1980 (median could not be calculated)

TRACT COMPARABILITY:

Note: Deleted all tracts with the suffix "99" - these are boat-houses.

Linking several years of data and tract changes...

Sources:

1) Census of Population and Housing, 1980: 1980-1970 Tract Comparability File (for 1970 -> 1980 changes).

2) 1980 -> 1990: all non-matching tracts appear to be splits.

These are obvious changes.

County 70-80-90 # 1970 Tract 1980 Tract 1990 Tract

-----------------------------------------------------------------------------------

047 3 3.01 3.01, 3.02

047 455 455 455.97, 455.98

047 491 491, 493 491, 493 491, 493

047 546 546 546.98

047 579 579, 589 579, 589 579, 589

047 598 598, 626 598, 626 598, 626

047 600 600 600.97, 600.98

047 606 606 606.97

047 610 610.01, 610.02 610.01, 610.02 610.01, 610.97

047 616 616 616.97, 616.98

047 628 628 628.98

047 666 666 666.98

047 758 758 758.98

047 882 882, 884 882, 884 882, 884

047 910 910, 1132 910, 1132 910, 1132

047 916 916, 918 916, 918 916, 918

047 1040 1040, 1070 1070 1070

047 1190 1190 1190.97

047 1202 1202 1202.97, 1202.98

081 248 248, 250 248, 250 248, 250

081 456 456 456.98

081 641 641.01

081 664 664 664.98

081 716 716 716.98

081 769 769.01, 769.02 769.01, 769.02 769.02, 769.97, 769.98

081 773 773 773.97, 773.98

081 803.01 803.01, 837 803.01, 837 803.01, 837

081 964 964, 972, 992 964, 972, 992 964, 972, 992

081 1072 1072.01 1072.01, 1072.02 1072.01, 1072.02

081 1113 1113, 1123 1113, 1123 1113, 1123

081 1267 1267 1267.98

--

Dataset2.tab

- Tab delimited text file with coordinates of TRI site locations, geocoded from most recent street address and TIGER/Line files (Topologically Integrated Geographic Encoding and Referencing), which are provided by the US Census Bureau.

- Coordinates are expressed in kilometres from a fixed origin for the North American Datum (NAD) 83, New York State/Long Island plane.

- The TRI database is available through the US Environmental Protection Agency data repository.

First line of the file contains the variable (column) names. First column contains the identifier for each row. This is just the facility's unique identifier.

Number of columns (excluding row names): 2

Number of rows (excluding header row): 150

Variable definitions:

Easting Kilometres East from a fixed origin for the New York State/Long Island plane.

Northing Kilometres North from a fixed origin for the New York State/Long Island plane.
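Because both files express coordinates in kilometres on the same plane, straight-line distances between tract centroids and TRI sites can be computed directly. The Python sketch below uses invented coordinates and a hypothetical function name, nearest_site_distance; it is not a reproduction of the analysis in the paper.

```python
import math

def nearest_site_distance(tract_xy, site_xys):
    """Return the distance in km from a tract centroid to its nearest TRI site."""
    return min(math.dist(tract_xy, site) for site in site_xys)

# Made-up (easting, northing) pairs in km.
tract = (300.0, 60.0)
sites = [(303.0, 64.0), (310.0, 60.0)]
distance_km = nearest_site_distance(tract, sites)
```

For these made-up points the nearest site is 5 km away.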

Dataset (194kb)
