Classical reliability theory assumes that each individual has the same true score on both testing occasions, a condition described as stable. If some individuals’ true scores differ between testing occasions, a condition described as unstable, the estimated reliability can be misleading. Stable unstable reliability theory (SURT) treats stability versus instability as an empirically testable question. SURT assumes a population that mixes stable and unstable individuals in unknown proportions, with w_i denoting the probability that individual i is stable. This probability serves as the weight on individual i’s test scores in a weighted correlation coefficient r_w, which is the reliability under SURT. If all w_i = 1, then r_w equals the classical reliability coefficient; classical theory is thus a special case of SURT. Typically r_w is larger than the conventional reliability r, and confidence intervals on true scores are typically shorter than conventional intervals. r_w is computed with routines in a publicly available R package.
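To make the idea of a weighted reliability coefficient concrete, the following minimal sketch computes a weighted Pearson correlation between scores from two testing occasions. It assumes the weighted coefficient takes the standard weighted Pearson form, which the text above does not spell out. The function name `weighted_corr`, the scores, and the weights are hypothetical illustrations and are not taken from the R package. Under SURT the weights would be estimated by the model; here they are supplied by hand.

```python
import math


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of paired scores x and y with weights w.

    Assumption: the standard weighted Pearson form. This is an illustrative
    sketch, not the routine from the R package mentioned in the text.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y, and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means of the two testing occasions.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted sums of squares and cross-products about the weighted means.
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))

    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five individuals on two occasions.
occasion_1 = [10, 12, 14, 16, 18]
occasion_2 = [11, 13, 13, 17, 10]  # the last individual changes sharply

# All weights equal to 1 reproduces the ordinary (classical) correlation.
r_classical = weighted_corr(occasion_1, occasion_2, [1, 1, 1, 1, 1])

# Hand-picked weights: the last individual is treated as probably unstable.
r_weighted = weighted_corr(occasion_1, occasion_2, [1, 1, 1, 1, 0.1])

print(f"classical r = {r_classical:.3f}, weighted r_w = {r_weighted:.3f}")
```

With all weights set to 1 the function reduces to the ordinary Pearson correlation, mirroring the statement that classical theory is a special case. Downweighting the individual whose score shifts between occasions raises the coefficient in this toy data set, which illustrates, but does not prove, why the weighted coefficient is typically larger than the conventional one.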