Medical diagnostic tests must have appropriate validity and high reliability to qualify as adequate assessment tools. In the absence of a gold standard, available diagnostic tests are imperfect, so their reliability must be evaluated carefully. Kappa statistics are often used to assess the reliability of two or more diagnostic tests. However, these statistics are imprecise in the typical case where the prevalence of the target disease is unknown. Although latent class models can be used to assess reliability, they cannot estimate it for two tests because the model is unidentifiable: it has more parameters than degrees of freedom. An alternative approach for two tests is to stratify the two-by-two contingency table, assuming that the sensitivity and specificity of each test are the same in all strata and that the prevalence rates differ across strata. Because stratification is essentially a multi-sample analysis, it should not be applied when the subsamples (i.e., centers) are randomly selected from a larger population. In this article, we propose a mixed-effects model to evaluate the reliability of two tests in trials conducted at multiple randomly selected centers. Several distributions for the prevalence rates across subpopulations are considered. Simulation studies show that the proposed method performs well. An analysis of real data is also reported. Copyright © 2013 John Wiley & Sons, Ltd.
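To illustrate why kappa is sensitive to prevalence, the sketch below computes the expected Cohen's kappa for two tests whose sensitivity and specificity are fixed at 0.9, evaluated at several prevalence rates. It assumes the two tests are conditionally independent given true disease status. The helpers `cell_probs` and `cohen_kappa` are hypothetical illustrations, not the article's method.

```python
def cell_probs(prev, se1, sp1, se2, sp2):
    """Expected 2x2 cell probabilities for two binary tests.

    Assumes (for illustration) that the tests are conditionally
    independent given true disease status. Keys are (test1, test2)
    with 1 = positive and 0 = negative.
    """
    probs = {}
    for r1 in (1, 0):
        for r2 in (1, 0):
            # Probability of each test's result among the diseased...
            d1 = se1 if r1 else 1 - se1
            d2 = se2 if r2 else 1 - se2
            # ...and among the non-diseased.
            n1 = 1 - sp1 if r1 else sp1
            n2 = 1 - sp2 if r2 else sp2
            probs[(r1, r2)] = prev * d1 * d2 + (1 - prev) * n1 * n2
    return probs


def cohen_kappa(probs):
    """Cohen's kappa computed from 2x2 cell probabilities."""
    po = probs[(1, 1)] + probs[(0, 0)]          # observed agreement
    p1 = probs[(1, 1)] + probs[(1, 0)]          # P(test 1 positive)
    p2 = probs[(1, 1)] + probs[(0, 1)]          # P(test 2 positive)
    pe = p1 * p2 + (1 - p1) * (1 - p2)          # chance agreement
    return (po - pe) / (1 - pe)


# Same sensitivity and specificity, different prevalence rates.
for prev in (0.5, 0.2, 0.05):
    k = cohen_kappa(cell_probs(prev, 0.9, 0.9, 0.9, 0.9))
    print(f"prevalence={prev:.2f}  kappa={k:.3f}")
```

Under these assumptions the observed agreement is 0.82 at every prevalence, yet kappa falls from 0.64 at 50% prevalence to about 0.25 at 5% prevalence, so the same pair of tests can appear more or less reliable depending on the population tested.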
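The identifiability problem for two tests can be checked by counting parameters against degrees of freedom. A minimal sketch, assuming a standard latent class model with K conditionally independent binary tests: each test has one sensitivity and one specificity shared by all strata, each stratum has its own prevalence, and each stratum's 2^K table supplies 2^K - 1 free cell probabilities. The helper `parameter_and_df_count` is hypothetical.

```python
def parameter_and_df_count(n_tests, n_strata):
    """Parameter and degree-of-freedom counts for a latent class model.

    Assumes conditionally independent binary tests whose sensitivity and
    specificity are common to all strata, with one prevalence per stratum.
    """
    # Sensitivity and specificity per test, plus one prevalence per stratum.
    params = 2 * n_tests + n_strata
    # Each stratum's 2^K table has 2^K cells that must sum to 1.
    df = n_strata * (2 ** n_tests - 1)
    return params, df


for k, s in [(2, 1), (2, 2), (2, 3), (3, 1)]:
    params, df = parameter_and_df_count(k, s)
    status = "df >= parameters" if df >= params else "too few df (unidentifiable)"
    print(f"tests={k} strata={s}: parameters={params}, df={df}, {status}")
```

With two tests in a single population there are 5 parameters but only 3 degrees of freedom; splitting the data into two strata gives 6 of each. Having at least as many degrees of freedom as parameters is necessary but not sufficient for identifiability, which is why the stratified approach also requires the prevalence rates to differ across strata.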
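The contrast between fixed strata and randomly selected centers can be illustrated with a data-generating sketch. The abstract does not specify the article's model or the prevalence distributions it considers, so this sketch assumes, purely for illustration, that each center's prevalence is drawn from a Beta(alpha, beta) distribution, that each test's sensitivity and specificity are common to all centers, and that the tests are conditionally independent given disease status. `simulate_center_tables` and its parameters are hypothetical.

```python
import random


def simulate_center_tables(n_centers, n_per_center, se, sp, alpha, beta, seed=0):
    """Hypothetical simulation of 2x2 tables from randomly selected centers.

    Assumptions (illustrative only): center prevalence ~ Beta(alpha, beta);
    se = (se1, se2) and sp = (sp1, sp2) are shared by all centers; the two
    tests are conditionally independent given true disease status.
    Returns a list of (prevalence, counts), where counts[t1][t2] is the
    number of subjects with test results t1, t2 (1 = positive).
    """
    rng = random.Random(seed)
    tables = []
    for _ in range(n_centers):
        prev = rng.betavariate(alpha, beta)  # random center-level prevalence
        counts = [[0, 0], [0, 0]]
        for _ in range(n_per_center):
            diseased = rng.random() < prev
            # Positive with probability Se if diseased, 1 - Sp otherwise.
            t1 = int(rng.random() < (se[0] if diseased else 1 - sp[0]))
            t2 = int(rng.random() < (se[1] if diseased else 1 - sp[1]))
            counts[t1][t2] += 1
        tables.append((prev, counts))
    return tables


for prev, counts in simulate_center_tables(
    5, 200, se=(0.9, 0.85), sp=(0.95, 0.9), alpha=2.0, beta=8.0
):
    print(f"prevalence={prev:.3f}  table={counts}")
```

Rerunning with a different seed draws a new set of center prevalences, mirroring the selection of a new set of centers; in this setting inference concerns the distribution that generates the prevalences rather than the particular centers that happened to be sampled.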