Algorithm S1 The ENCODE algorithm

Algorithm S2 Window definition

Algorithm S3 The tSNPsMultiPassGreedy algorithm

Algorithm S4 The ReconstructUnassayedSNPs algorithm

Table S1 Selection of tSNPs and prediction of tagged SNPs in each of the 22 autosomes in the HapMap populations (results shown for analysis using parameters of 20 eigenSNPs and 98% accuracy). The total number of polymorphic SNPs for each population and chromosome is also reported.

Table S2 Percentage of SNPs in chromosome 1 lying within windows of given physical size (base pairs) for the parameter combination 98% accuracy and 20 eigenSNPs in HapMap phase 2 data.

Table S3 Percentage of SNPs across all autosomes lying within windows of given size (number of SNPs) for the parameter combination 98% accuracy and 20 eigenSNPs in HapMap phase 2 data.

Table S4 Results computed using the GWAS data for Parkinson's disease. We extracted the common SNPs between the dataset under study and the HapMap phase 2 CEU data; we used the latter data to identify tSNPs and to compute prediction coefficients. The table depicts the percentage of SNPs selected as tSNPs and the error in tagged SNPs for each input parameter combination. False positives are the number of spurious associations with P value ≤ 10⁻⁴ in the reconstructed dataset.
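As an illustration only, the three quantities named in the Table S4 caption could be computed along the following lines. This is a minimal sketch, not the authors' code: the function names are hypothetical, the error is assumed to be the fraction of mismatched genotype entries, and a "spurious" association is assumed to mean a SNP that reaches the P value threshold in the reconstructed dataset but not in the original one.

```python
# Hypothetical helpers for illustration; none of these names come from the paper.

def percent_tsnps(n_tsnps, n_total_snps):
    """Percentage of SNPs selected as tSNPs."""
    return 100.0 * n_tsnps / n_total_snps


def reconstruction_error(true_genotypes, predicted_genotypes):
    """Fraction of tagged-SNP genotype entries predicted incorrectly.

    Both arguments are lists of rows (one row per individual), with
    genotypes encoded as comparable values such as 0, 1, 2.
    Assumption: error is measured as a simple mismatch rate.
    """
    mismatches = 0
    total = 0
    for true_row, pred_row in zip(true_genotypes, predicted_genotypes):
        for t, p in zip(true_row, pred_row):
            total += 1
            if t != p:
                mismatches += 1
    return mismatches / total


def count_false_positives(p_original, p_reconstructed, threshold=1e-4):
    """Count SNPs significant in the reconstructed data only.

    Assumption: an association is spurious when its P value is at or
    below the threshold after reconstruction but above it in the
    original genotypes.
    """
    return sum(
        1
        for p_orig, p_rec in zip(p_original, p_reconstructed)
        if p_rec <= threshold and p_orig > threshold
    )
```

For example, `count_false_positives([0.5, 1e-5, 0.2], [1e-5, 1e-6, 0.3])` counts only the first SNP, since the second is already significant in the original data.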

Figure S1 An overview of our approach showing the interplay between Algorithms S2, S3, and S4.

Figure S2 Histogram of window sizes in terms of number of SNPs for all autosomes in the Asian population. Results for all six parameter combinations of accuracy and number of eigenSNPs, used in the analysis, are shown.

Figure S3 Histogram of window sizes in terms of number of SNPs for all autosomes in the European population. Results for all six parameter combinations of accuracy and number of eigenSNPs, used in the analysis, are shown.

Figure S4 Histogram of window sizes in terms of number of SNPs for all autosomes in the African population. Results for all six parameter combinations of accuracy and number of eigenSNPs, used in the analysis, are shown.

Figure S5 Prediction error distribution among SNPs with varying rare allele frequencies (RAF) in chromosome 1 datasets. The two rows represent different accuracy parameters used.

Figure S6 Performance comparison with Tagger on the ENCODE regions. For each region, our algorithm was run with nine parameter combinations (90%, 95%, and 98% target accuracy, and 20, 15, and 10 eigenSNPs). The blue line shows the percentage of SNPs needed and the corresponding reconstruction error for each of these nine parameter combinations. In each case, Tagger was restricted to the same number of tSNPs as needed by our approach. Coverage corresponds to the percentage of total SNPs captured by Tagger. Our approach always provides perfect coverage, so its coverage is not plotted. The x-axis corresponds to the percentage of SNPs selected as tSNPs. The seven subfigures correspond to (A) region ENm010.7p15.2, (B) region ENm014.7q32.33, (C) region ENr112.2p16.3, (D) region ENr113.4q26, (E) region ENr131.2q37.1, (F) region ENr213.18q12.1, and (G) region ENr232.9q34.11.

Filename                  Format  Size  Description
AHG_673_sm_SuppMat.pdf            196K  Supporting info item

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.