Taiwan schizophrenia linkage study: Lessons learned from endophenotype-based genome-wide linkage scans and perspective


  • Wei J. Chen

    Corresponding author
    1. Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
    2. Genetic Epidemiology Core Laboratory, Center of Genomic Medicine, National Taiwan University, Taipei, Taiwan
    3. Department of Psychiatry, College of Medicine and National Taiwan University Hospital, National Taiwan University, Taipei, Taiwan
    4. Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, Taiwan
    • Correspondence to:

      Wei J. Chen, M.D., Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, 17 Xu-Zhou Road, Taipei 100, Taiwan.

      E-mail: wjchen@ntu.edu.tw


  • Conflict of interest: none.


The Taiwan Schizophrenia Linkage Study (TSLS) was initiated with a linkage strategy for locating multiple genes, each of small to moderate effect, and aimed to recruit a large enough sample of pairs of affected siblings and their families ascertained from a multisite study. With a sample of 607 families successfully recruited, a total of 2,242 individuals (1,207 affected and 1,035 unaffected) from 557 families were genotyped using 386 microsatellite markers spaced at an average of 9-cM intervals. Here the author reviews the establishment of TSLS and the initial signal derived from a linkage scan using the diagnosis of schizophrenia. Based on the limited success of the initial linkage analysis, a sufficient-component causal model is proposed to incorporate endophenotypes and genes for schizophrenia. Four types of candidate endophenotype measured in TSLS, including schizotypal personality, Continuous Performance Test, Wisconsin Card Sorting Test, and niacin skin flush test, are briefly described. The author discusses different strategies of linkage analysis incorporating these endophenotypes, including quantitative trait loci (QTL) linkage analysis, clustering-derived subgroups, ordered subset analysis (OSA), and latent classes for linkage scan. Then the author summarizes the linkage signals generated from six studies of endophenotype-based linkage analysis using TSLS, including QTL scan of neurocognitive performance, QTL scan of niacin skin flush, the family cluster of attention deficit and execution deficit, OSA by schizophrenia–schizotypy factors, nested OSA by age at onset and neurocognitive performance, and the latent class of deficit schizophrenia for linkage analysis. The perspective of combining next-generation sequencing with linkage analysis of families is also discussed. © 2013 Wiley Periodicals, Inc.


Schizophrenia is a common and often disabling mental illness, and its etiology remains largely unresolved to this day [van Os and Kapur, 2009]. Evidence for a substantial genetic contribution to schizophrenia has been drawn mainly from family, twin, and adoption studies [Tsuang, 2000]. Genetic studies of schizophrenia have come a long way since that genetic contribution was first recognized [Rodriguez-Murillo et al., 2012]. In the 1990s, the field was excited by the prospect of genome-wide linkage scans using hundreds of highly polymorphic microsatellite markers. Because the mode of inheritance for schizophrenia was unknown, it was proposed that a non-parametric approach would be optimal for a complex disease such as schizophrenia [Risch, 1990a]. Another line of reasoning is to use so-called endophenotypes [Gottesman and Shields, 1973] to increase the power of capturing the genetic susceptibility underpinning schizophrenia.

Under these circumstances, an ambitious plan to recruit sib-pairs co-affected with schizophrenia and their other family members nationwide in Taiwan was initiated in 1998 with sponsorship from the National Institute of Mental Health in the United States [Cloninger et al., 1998]. With the completion of the Taiwan Schizophrenia Linkage Study (TSLS), along with other supplementary measures funded by local funding agencies, many interesting findings have been generated from this sample of affected sib-pair families. In this article, the author reviews: (1) the establishment of TSLS and the initial signal derived from a traditional linkage scan using the diagnosis of schizophrenia; (2) the need for a sufficient-component causal model to incorporate endophenotypes and genes; (3) candidate endophenotypes measured in TSLS; (4) strategies of linkage analysis incorporating endophenotypes; and (5) signals generated from a variety of endophenotype-based linkage analyses using TSLS. The perspective of combining next-generation sequencing with linkage analysis of families is also discussed.



By the time of study planning for TSLS, several linkage analyses of schizophrenia had produced equivocal results [Riley and McGuffin, 2000; Badner and Gershon, 2002]. One major concern was that schizophrenia is most likely caused by many genes interacting with environmental factors [Faraone and Tsuang, 1985; McGue and Gottesman, 1989; Risch, 1990b]. The effect of each disease gene would be small, whereas the sample sizes of most studies had been small with not enough statistical power for linkage analysis [Lander and Kruglyak, 1995]. Furthermore, most samples had been complex in ethnic composition, which would compromise the power of linkage analyses if there was disease locus heterogeneity between strata or if analyses could not be adjusted for ethnic differences in marker allele frequencies [Cloninger et al., 1998].

In response to these concerns, the design of TSLS adopted a linkage strategy for locating multiple genes, each of small to moderate effect, and aimed to recruit a large enough sample of pairs of affected siblings and their families ascertained from a multisite study, rather than looking for major genes in large extended pedigrees [Risch, 1990a].

Ascertainment of Probands and Their Families

During the period of 1998–2002, TSLS was set to collect sib-pairs who were co-affected with schizophrenia and had at least two living first-degree relatives. The affected sib-pair probands, who met the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria for schizophrenia or schizoaffective disorder, depressed type, were identified from 78 psychiatric hospitals or health centers throughout Taiwan during the study period. The field work was described in more detail by Hwu et al. [2005].

In TSLS, all participating probands and first-degree relatives were interviewed with the Diagnostic Interview for Genetic Studies (DIGS) [Nurnberger et al., 1994], which contains the modified Structured Interview for Schizotypy (SIS) [Kendler et al., 1989]. The Chinese version of the DIGS and its reliability have been described previously [Chen et al., 1998b]. Interviews with the Chinese version of the DIGS were carried out by research assistants who had received standardized training. In addition to the DIGS, interviewers used the Chinese version of the Family Interview for Genetic Studies (FIGS) [NIMH Genetics Initiative, 1992] to collect relevant information on relatives. Two psychiatrists independently reviewed all available information, including the DIGS, the FIGS, hospital records, and the interviewer's notes, and each independently determined best-estimate lifetime psychiatric diagnoses according to the DSM-IV criteria. If the two disagreed about a diagnosis, a third psychiatrist was consulted, and a consensus diagnosis was reached after discussion.

Among 883 potential families identified for this study, 52 had too few family members, leaving 831 families to be contacted. Of these, 190 (22.9%) refused to participate and 34 were excluded. Finally, 607 families with at least two siblings affected with schizophrenia, along with 1,276 first-degree relatives, were successfully recruited in TSLS. The proband recruitment rate was estimated to be 0.4 per 10,000 population for the whole nation [Hwu et al., 2005].

Initial Linkage Signal

Genotyping of the TSLS was completed by the Center for Inherited Disease Research, using 386 microsatellite markers spaced at an average of 9-cM intervals [Faraone et al., 2006]. Compared with several other linkage scans of schizophrenia, the marker density was at the lower end [Ng et al., 2009]. Non-Mendelian inheritance and excessive recombination events were checked, and erroneous genotypes were removed accordingly. The linkage sample included a total of 2,242 genotyped individuals, comprising 1,207 affected individuals and 1,035 unaffected individuals from 557 families with sib-pairs co-affected with schizophrenia.

The initial linkage scan found five regions with non-parametric linkage z (NPL-Z) scores of 2.0 or greater: 2.08 for D1S551 (113.7 cM) at 1p31.1, 2.31 for D2S410 (125.2 cM) at 2q14.1, 2.00 for D4S2361 (93.5 cM) at 4q21.23, 2.07 for D15S1012 (36 cM) at 15q14, and 2.88 for D10S2327 (100.92 cM) at 10q22.3. Because the 10q22.3 finding of the largest NPL-Z (uncorrected P = 0.02) at 100.9 cM was consistent with a previously reported NPL-Z of 4.27 at 107.2 cM on chromosome 10 [Fallin et al., 2003], this locus was reported as confirmed, although it did not attain genome-wide significance (empirical P = 0.14).

To date, TSLS remains the largest linkage study for schizophrenia of a single ethnic group. It is also one of the first projects that deposited its original data at NIMH, which allows other researchers to get access to the data to try out their novel ideas, as one group of Australian researchers did [Holliday et al., 2009]. TSLS was later incorporated in a meta-analysis of 32 linkage scans [Ng et al., 2009] and the results met empirical criteria for “aggregate” genome-wide significance at 10q22.3.


Heterogeneity and Endophenotype

The limited success of linkage analysis of schizophrenia called for more scrutiny on the relationship between genotype and phenotype, particularly the issue of heterogeneity underpinning the diagnosis of schizophrenia. As a useful illustration, Tsuang et al. [1990] proposed a conceptual framework to depict some working models in explaining the multiple pathways that can lead to the same clinical outcome. They distinguished indicators of heterogeneity in different levels: level I in etiology (genetic or environmental), level II in pathophysiology, and level III in clinical symptoms. In particular for level II indicators, many different pathophysiological pathways may be involved in the genesis of the disease, such as brain morphology, biochemical and pharmacological abnormality, and neuropsychological abnormality.

The indicator in pathophysiology is also referred to as endophenotype, that is, the phenotype that underlies the clinical expressions and usually can be measured in a more objective manner [Gottesman and Shields, 1973]. Since this level is closer to etiology, it implies that it can be more easily elucidated in terms of the genes' action, and hence might be useful for obtaining homogeneous subsamples in the search for underpinning susceptibility genes [Gottesman and Gould, 2003].

Susceptibility Versus Modifier Genes

It was pointed out later that some genes may influence susceptibility only (called susceptibility genes), while others may influence clinical features only (called modifier genes) [Fanous and Kendler, 2005]. The effect of modifier genes is more like an add-on factor than a predisposition. From this perspective, an outcome due to pure modifier genes would not be an ideal candidate endophenotype. Meanwhile, some genes may have a mixed effect of susceptibility and modifier.

Prodrome and Surrogate Marker

Another important issue that is related to but distinct from endophenotype is the prodrome or pre-onset form of disease. To some extent, the concept of prodrome in schizophrenia is similar to the surrogate marker in a clinical trial, in which an outcome that precedes the final outcome is chosen in order to make an earlier judgment on whether an intervention is effective [Prentice, 1989]. Precursor signs and symptoms precede the disorder without predicting it with certainty and operate as predisposing risk factors, while a prodrome is identified only retrospectively after the subject meets criteria for a full-blown disorder [Eaton et al., 1995]. Under these circumstances, the concept of endophenotype is not necessarily useful for predicting onset among individuals at high risk, though an endophenotype does indicate a predisposition toward susceptibility among the general population [Cornblatt et al., 2002].

A Sufficient-Component Causal Model

Thus, from the nucleotides (the physical constituents of genes) to phenotype, there are many possible pathways in between [Strohman, 2002]. In this aspect, the sufficient-component causal model [Rothman, 1976], which is commonly used in epidemiology to illustrate specific hypotheses about mechanisms of action [Greenland and Brumback, 2002], can be used to incorporate the relationship of a variety of endophenotypes to the underlying susceptibility genes and final onset of the disease.

The concept of this sufficient-component causal model is depicted in Figure 1. For each type of endophenotype, its constituent susceptibility genes and environmental factors are shown in separate wedges. Once the whole pie is filled, the onset of the illness is initiated; but clinical manifestation can be further tuned by modifier genes. Each endophenotype represents a separate sufficient component of the cause, in which different numbers of susceptibility genes are involved. An endophenotype at a level close to the clinical phenotype (e.g., syndromic) is assumed to have accumulated more components than one at a level relatively far from the clinical phenotype (e.g., cellular). Many schizophrenia-related aberrant gene expressions [Glatt et al., 2005; Glatt et al., 2011; Lai et al., 2011], likely due to accumulated consequences of gene–environment interplay, can be treated as cellular endophenotypes if they can be stably detected before the onset of disease.

Figure 1.

The relationship of different types of endophenotype (cellular, neurophysiological, or syndromic) to the underlying susceptibility genes (denoted as A1, A2, B1, B2, C1, C2, and C3), environmental factors (denoted as EA1, EA2, EB1, EC1, and EC2), prodrome, onset of the disease, modifier genes (denoted as M1 and M2), and clinical manifestation in a sufficient-component causal model. For each type of endophenotype, its constituent susceptibility genes and environmental factors are shown in separate wedges. Once the whole pie is filled, the onset of the illness is initiated; but clinical manifestation can be further tuned by modifier genes.

We can further assume that some genes may be involved in more than one endophenotype, which will lead to gene–gene interaction for the occurrence of the disease. For example, susceptibility gene A1 appears in all three types of endophenotype, while susceptibility gene B1 appears in both the neurophysiological and syndromic endophenotypes. A person with at-risk genotypes at both genes A1 and B1 will have all three pathways to the disease, whereas a person with an at-risk genotype at A1 but not at B1 (or at B1 but not at A1) will have only the cellular pathway (or the syndromic pathway). That means the co-occurrence of A1 and B1 at-risk genotypes has a greater effect than the sum of the effects of either one alone.

Similarly, some environmental factors may be involved in more than one endophenotype. This allows for gene–environment interaction. The completion of a causal pie needs further components other than endophenotype. Hence, an endophenotype is different from the prodrome of the illness, in which the whole pie is nearly completed and signifies the imminent onset of the full-blown illness. In contrast to susceptibility genes, modifier genes exert their effect after the whole pie is completed.

An ideal process in search for useful endophenotype may start with syndromic or dimensional candidates, next move to candidates at physiological level and finally to cellular level. However, there is no definitive clue indicating which type of endophenotypic marker is closer to the underlying genes and hence will lead more directly to their discovery. A rigorous examination whether a candidate fulfills all the criteria for an endophenotype [Gottesman and Gould, 2003] is probably the best guideline to follow.


In TSLS, several candidate endophenotypes were incorporated in the measurement, including schizotypal personality, two neuropsychological tests, and niacin skin flush. The rationale and the way they were measured are briefly described in the following.

Schizotypal Personality

Schizotypy has been postulated as the latent construct that underlies the genetic susceptibility to schizophrenia and constitutes the continuum between disease states and normal traits [Fanous et al., 2001]. Previous studies also indicated that schizotypal personality, measured either as dimensional scores [Chen et al., 1998b] or as an axis II diagnosis [Chang et al., 2002], exhibited familial aggregation in the non-psychotic relatives of schizophrenia patients. Using different scales in non-clinical samples, our research also indicated that there was a factor structure similar to that in relatives of schizophrenia patients [Chen et al., 1997] and that there was a substantial genetic contribution to those factors of schizotypy in adolescent twins [Lin et al., 2007a].

In TSLS, an individual's schizotypy was assessed using the modified SIS in the DIGS 2.0, which contains 13 global ratings on a seven-point scale for schizotypal symptoms and six global ratings on a five-point scale for schizotypal signs, with higher scores representing more severe schizotypal features. The inter-rater reliability of the SIS ratings ranged from 0.7 to 1.0 [Lien et al., 2010b].

Continuous Performance Test

Sustained attention deficits as measured on the Continuous Performance Test (CPT) are present not only in schizophrenia patients but also in subjects with schizotypal personality disorder and in non-psychotic relatives of schizophrenia patients [Chen and Faraone, 2000]. Family studies found an elevated recurrence risk ratio for CPT deficits among non-psychotic parents or siblings [Chen et al., 2004] and a positive association between the severity of the CPT deficits and the familial loading for schizophrenia [Tsuang et al., 2006]. The performance deficits in schizophrenia patients were not amenable to treatment with neuroleptics [Liu et al., 2000] and were stable over the short-term clinical course [Liu et al., 2002].

During a CPT session, numbers from 0 to 9 were randomly presented for 50 msec each, at a rate of one per second, on a 2-inch-square matrix of green light-emitting diodes [Chen et al., 1998a]. Each subject undertook two CPT sessions: the undegraded 1–9 task and the 25% degraded 1–9 task. Sensitivity (d′), derived from the hit rate (probability of response to target trials) and false alarm rate (probability of response to non-target trials), reflects an individual's ability to discriminate target stimuli from non-target ones. The adjusted z scores of the CPT indices [Chen et al., 1998b] were derived by standardizing the raw scores with adjustments for sex, age, and education against a community sample of 345 individuals [Chen et al., 1998a]. Extreme CPT scores were winsorized: scores below −5 were set to −5 and scores above 5 were set to 5.
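As a rough illustration of the two computations mentioned above, the sketch below uses the standard signal-detection definition of d′ (the z-transformed hit rate minus the z-transformed false alarm rate) and a simple clamp for winsorizing. The function names are hypothetical, and the source does not say how TSLS handled hit or false alarm rates of exactly 0 or 1, so this sketch simply requires rates strictly between 0 and 1.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Standard signal-detection sensitivity: z(hit rate) - z(false alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the boundaries.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def winsorize(z_score: float, limit: float = 5.0) -> float:
    """Clamp an adjusted z score to the interval [-limit, limit]."""
    return max(-limit, min(limit, z_score))


if __name__ == "__main__":
    # Hypothetical subject: responds to 90% of targets and 10% of non-targets.
    print(round(d_prime(0.90, 0.10), 3))
    # An extreme adjusted z score is pulled back to the boundary.
    print(winsorize(-7.2))
```

The adjustment of raw scores for sex, age, and education against the community sample is not reproduced here, since the source does not give the adjustment model.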

Wisconsin Card Sorting Test

Executive functioning as measured by the Wisconsin Card Sorting Test (WCST) has been found to be impaired in schizophrenia patients [Goldberg et al., 1987] and their first-degree relatives [Szoke et al., 2005], though the magnitude of familial aggregation is modest at best, as shown in relatives of schizophrenia patients [Laurent et al., 2000; Lin et al., 2012] or non-clinical adolescent twins [Chou et al., 2010].

A computerized version of the WCST [Lin et al., 2000] was used in TSLS. Subjects were required to match response cards to the four stimulus cards along one of three dimensions (color, form, or number) by pressing one of the 1–4 number keys on the computer keyboard. Subjects were not informed of the correct sorting principle, nor were they told when the principle would shift during the test, but they were given feedback (“Right” or “Wrong”) on the screen after each trial. The testing continued until all 128 cards were sorted. Eight performance indices as described in the WCST manual [Heaton et al., 1993] were used for subsequent analyses, including Total Errors, Non-perseverative Errors, Perseverative Errors, Perseverative Responses, Categories Achieved, Trials to Complete First Category, Conceptual Level Response, and Failure to Maintain Set. The adjusted z scores of the WCST with adjustments for sex, age, and education were derived for each individual against a group of 392 healthy controls [Lin et al., 2012], and extreme values were also winsorized.

Niacin Skin Flush

Niacin (nicotinic acid) can induce a visible skin flush response that is mediated by the release of vasodilatory prostaglandins from the skin [Morrow et al., 1992]. Studies consistently found lowered incidence or intensity of flush response to niacin skin patch in individuals with schizophrenia [Messamore et al., 2003; Liu et al., 2007a]. Attenuated niacin flush response has also been demonstrated in non-psychotic first-degree relatives of patients with schizophrenia [Waldo, 1999; Shah et al., 2000; Lin et al., 2007b], with a heritability of 47–54% [Lin et al., 2007b]. Greater familial loading for schizophrenia was associated with more impairment in flush response to niacin [Chang et al., 2009].

During the niacin skin patch test, patches of absorbent paper were used to apply niacin in the form of aqueous methyl nicotinate (AMN) [Liu et al., 2007a]. Equal volumes of three different concentrations (0.1, 0.01, and 0.001 M) of AMN, as well as a blank negative control, were applied topically to each subject's forearm skin for 5 min and then removed. After the niacin patches were removed, the skin flush response was rated at 5, 10, and 15 min on a 4-point scale from 0 to 3. The inter-rater reliability for the flush scoring was excellent [Lin et al., 2007b]. When research assistants rated the flush response, they were required to take a photo of each response, and a random sample of the photos was periodically checked by a psychiatrist to ensure the quality of the data.


Different strategies to incorporate the information from those candidate endophenotypes in linkage scan have been applied using the families of TSLS and are briefly described in the following.

QTL Linkage Analysis

Because many candidate endophenotypes were continuous measures, one obvious way to incorporate this information in a linkage scan is to perform QTL linkage analysis. In these analyses, NPL-Z scores, which are based on the Wilcoxon rank-sum test [Kruglyak and Lander, 1995], were calculated at all available markers throughout the genome to test for allele sharing identical by descent among individuals with similar traits using the Merlin software [Abecasis et al., 2002]. The genome-wide significance level was evaluated via simulations based on the original family structure, marker informativeness, spacing, and missing data, with phenotypic measurement and affection status being preserved.
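The simulation-based significance level described above can be sketched as follows, assuming each simulated null replicate has already been scanned and reduced to its genome-wide maximum NPL-Z. The function name and the "+1" small-sample correction are assumptions made for illustration; the source does not state the exact estimator used in the TSLS analyses.

```python
from typing import Sequence


def empirical_genomewide_p(observed_max: float,
                           simulated_maxima: Sequence[float]) -> float:
    """Estimate a genome-wide empirical P value.

    observed_max: the largest statistic seen in the real genome scan.
    simulated_maxima: the largest statistic from each null replicate
        (genotypes simulated on the original pedigrees, phenotypes kept).
    The +1 in numerator and denominator is a common convention (assumed
    here) that keeps the estimate away from exactly zero.
    """
    exceed = sum(1 for m in simulated_maxima if m >= observed_max)
    return (exceed + 1) / (len(simulated_maxima) + 1)


if __name__ == "__main__":
    # Made-up maxima from four null replicates, for illustration only.
    print(empirical_genomewide_p(3.0, [1.0, 2.0, 3.5, 2.5]))
```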

Nevertheless, because the families of TSLS were not ascertained on these continuous measures, there were two possible ways to do QTL linkage analysis: either affected individuals only or all available individuals (including both affected individuals and their unaffected relatives) were subject to QTL linkage scan. The interpretation of these two analyses depends on whether the continuous measure being investigated represents susceptibility genes, modifier genes, or genes with a mixed effect of susceptibility and modifier.

Clustering-Derived Subgroups for Linkage Scan

Identifying more homogeneous subgroups of families with a similar pattern in a certain endophenotype is one way to tackle the concern about heterogeneity and inconsistent findings in genetic analyses of schizophrenia. Although several previous studies tried to subgroup schizophrenia patients using endophenotypes [Liu et al., 2006b, 2007e], the subgrouping in these studies was mainly based on a predetermined cut-off, which might not result in the most homogeneous subsets. Alternatives are data-driven approaches such as cluster analyses [Jain et al., 1999]. Since clustering methods are designed for unrelated individuals, a family-based clustering strategy [Lin et al., 2009] was needed for applying such an approach in TSLS.

To follow up on the original linkage signal derived from the linkage scan of the whole sample of TSLS families, a two-stage clustering strategy was adopted [Liu et al., 2011]. In the first stage, all affected siblings with information on four indicators (CPT undegraded d′, CPT degraded d′, WCST Perseverative Errors, and WCST Categories Achieved), a total of 817 individuals from 476 families, were subjected to clustering. Dissimilarity between a pair of patients was measured using the Euclidean distance between their vectors of Pearson correlation coefficients with respect to all other patients. The four largest clusters of patients were derived and named after their neuropsychological performances: (1) attention deficit and execution deficit; (2) attention deficit and execution non-deficit; (3) attention non-deficit and execution deficit; and (4) attention non-deficit and execution non-deficit.
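The dissimilarity measure described above can be sketched in a few lines. This is one reading of the description, not the published implementation: each patient is represented by a profile of the four indicator scores, correlations are taken between profiles, and "all other patients" is assumed to exclude the two patients being compared. Function names are hypothetical, and a profile with no variation across indicators would need separate handling because its correlation is undefined.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def profile_dissimilarity(profiles: Sequence[Sequence[float]],
                          i: int, j: int) -> float:
    """Euclidean distance between the correlation vectors of patients i and j.

    Each patient's correlation vector holds the Pearson correlation of
    their profile with every other patient's profile (i and j excluded,
    which is an assumption of this sketch).
    """
    others = [k for k in range(len(profiles)) if k not in (i, j)]
    return sqrt(sum(
        (pearson(profiles[i], profiles[k]) - pearson(profiles[j], profiles[k])) ** 2
        for k in others
    ))


if __name__ == "__main__":
    # Made-up adjusted z scores on the four indicators for four patients.
    scores = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
    print(profile_dissimilarity(scores, 0, 1))  # identical profiles
    print(profile_dissimilarity(scores, 0, 2))  # opposite profiles
```

Two patients with the same pattern across indicators end up close together even if their overall levels differ, which is the usual motivation for a correlation-based measure.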

In the second stage, each family was categorized into a cluster if all affected siblings of the family belonged to the same patient cluster. The four family clusters were clearly differentiated by the four neuropsychological indicators. Hence, subsequent linkage scans can be performed separately for these four clusters of families.
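The second-stage rule can be written down directly. In this minimal sketch, the family identifiers, cluster labels, and function name are hypothetical; a family receives a cluster label only when every affected sibling carries the same patient-level label, and is otherwise left unassigned.

```python
from typing import Dict, List, Optional


def assign_family_clusters(
    sibling_clusters: Dict[str, List[int]],
) -> Dict[str, Optional[int]]:
    """Map each family to a cluster if all affected siblings agree.

    sibling_clusters: family id -> patient-level cluster labels of its
        affected siblings (from the first clustering stage).
    Returns family id -> shared cluster label, or None when the siblings
    fall into different clusters (or no sibling was clustered).
    """
    assignment: Dict[str, Optional[int]] = {}
    for family, labels in sibling_clusters.items():
        if labels and len(set(labels)) == 1:
            assignment[family] = labels[0]
        else:
            assignment[family] = None
    return assignment


if __name__ == "__main__":
    example = {"F1": [1, 1], "F2": [1, 3], "F3": [4, 4, 4]}
    print(assign_family_clusters(example))
```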

Ordered Subset Analysis of Linkage Scan

A discovery-based genome-wide scan such as ordered subset analysis (OSA) has the advantage of not requiring a priori specification of the subset [Hauser et al., 2004]. The method ranks each family by a family-level value of a disease-related covariate of interest and identifies the contiguous subset of families that maximizes the evidence for linkage. Such family covariates were chosen depending on the aim of the analyses, such as the average scores of a schizophrenia factor from affected individuals and its corresponding schizotypy factor from non-psychotic relatives within a family [Lien et al., 2010b], or the average values of the age at onset, CPT scores, and WCST scores from affected siblings [Lien et al., 2011]. The genome-wide linkage analysis was then conducted in a series of subsets incrementally, starting with the rank that had the greatest family covariate. The procedure was repeated until all families were included in the final subset.
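The incremental procedure can be illustrated with a deliberately simplified sketch. It assumes that each family contributes an additive linkage score at a single map position and adds families one at a time in descending covariate order. A real OSA also maximizes over positions along the chromosome and adds families with tied covariates together, so this is an outline of the idea rather than the published algorithm. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def ordered_subset_scan(
    families: Sequence[Tuple[str, float, float]],
) -> Tuple[List[str], float]:
    """Find the top-ranked subset of families with the largest summed score.

    families: (family id, family-level covariate, per-family linkage
        score at one position); scores are assumed additive.
    Returns the ids in the best subset and that subset's summed score.
    """
    ranked = sorted(families, key=lambda f: f[1], reverse=True)
    best_score, best_size, running = float("-inf"), 0, 0.0
    for size, (_, _, score) in enumerate(ranked, start=1):
        running += score  # score of the subset made of the top `size` families
        if running > best_score:
            best_score, best_size = running, size
    return [f[0] for f in ranked[:best_size]], best_score


if __name__ == "__main__":
    toy = [("A", 3.0, 0.8), ("B", 2.0, 0.5), ("C", 1.0, -0.9), ("D", 0.5, 0.2)]
    subset, score = ordered_subset_scan(toy)
    print(subset, round(score, 2))
```

In the toy data, adding family C lowers the cumulative score, so the scan keeps only the two families with the highest covariate values.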

If OSA-derived subsets of families have tied covariates, the families might be further re-ranked by means of another endophenotype, that is, nested OSA. The nested OSA procedure was repeated to seek a nested subset of families that resulted in a further increase in the linkage signal on the particular chromosome over that of the original subset [Lien et al., 2011].

A challenging issue in applying OSA is to determine the statistical significance of the results, which involves chromosome-wide significance and genome-wide significance [Lien et al., 2011]. Under the null hypothesis that the ranking of the covariate is independent of the family's linkage scores on the target chromosome, the families are randomly permuted with respect to the covariate ranking and a chromosome-wide P-value for each chromosome is obtained. The genome-wide significance is evaluated by simulating the genotype data of the families and then estimating the empirical significance level with correction for multiple phenotypes. More details are described in Figure 2 of Lien et al. [2011]. To maintain the overall genome-wide significance level after considering multiple informative phenotypes used in a study, an ad hoc correction procedure can be performed. For example, in Lien et al. [2011], the correction was calculated as LOD(corrected) = 3.6 + log10(number of tests).
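The two ideas in this paragraph, the permutation of families against the covariate ranking and the ad hoc threshold, can be sketched as below. The threshold function follows the formula quoted from Lien et al. [2011]. The permutation routine is a simplified stand-in: it assumes additive per-family scores at one position, uses the maximum cumulative score as the test statistic, and applies a "+1" correction, none of which is specified in the source. All names are hypothetical.

```python
import math
import random
from typing import Sequence


def corrected_lod_threshold(n_tests: int, base: float = 3.6) -> float:
    """Ad hoc threshold quoted in the text: 3.6 + log10(number of tests)."""
    return base + math.log10(n_tests)


def max_subset_score(scores_in_rank_order: Sequence[float]) -> float:
    """Largest cumulative score over the nested subsets of one ranking."""
    best, running = float("-inf"), 0.0
    for score in scores_in_rank_order:
        running += score
        best = max(best, running)
    return best


def permutation_p(covariates: Sequence[float], family_scores: Sequence[float],
                  n_perm: int = 1000, seed: int = 0) -> float:
    """Permutation P value for the covariate ordering (simplified).

    Under the null, the ranking is unrelated to the families' linkage
    scores, so the scores are shuffled against the ranking.
    """
    rng = random.Random(seed)
    order = sorted(range(len(family_scores)),
                   key=lambda i: covariates[i], reverse=True)
    observed = max_subset_score([family_scores[i] for i in order])
    shuffled = list(family_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_subset_score(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # With three phenotypes tested, the quoted rule raises the threshold above 3.6.
    print(round(corrected_lod_threshold(3), 3))
```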

Latent Classes for Linkage Scan

Another data-driven approach to empirically derive clinical subtypes for schizophrenia is latent class analysis, a statistical method for identifying subtypes of related cases from multivariate categorical data [Rindskopf and Rindskopf, 1986]. A group of researchers from Australia applied latent class analysis to 1,236 participants with schizophrenia in TSLS [Holliday et al., 2009] using the Latent Gold software. Models with 1 to 10 latent classes were fitted, and the best-fitting model was selected on the basis of the Bayesian information criterion. A series of latent class analyses identified four classes with qualitative differences between the identified phenotypic subgroups. A profile analysis of variance further validated these differences, suggesting an underlying phenotypic structure of four distinct categories rather than a single continuum of liability.
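Model selection by the Bayesian information criterion can be illustrated with the usual definition, BIC = −2 ln L + k ln n, where L is the maximized likelihood, k the number of free parameters, and n the sample size; the model with the smallest BIC is preferred. The log-likelihoods and parameter counts below are invented for illustration and are not results from Holliday et al. [2009].

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 ln L + k ln n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_model(fits: Dict[int, Tuple[float, int]], n_obs: int) -> int:
    """Return the number of latent classes whose fit has the lowest BIC.

    fits: number of classes -> (maximized log-likelihood, free parameters).
    """
    return min(fits, key=lambda c: bic(fits[c][0], fits[c][1], n_obs))


if __name__ == "__main__":
    # Hypothetical fits for 1 to 3 classes; the sample size is taken from the text.
    toy_fits = {1: (-1000.0, 5), 2: (-900.0, 11), 3: (-895.0, 17)}
    print(best_model(toy_fits, n_obs=1236))
```

In the toy example, the third class improves the likelihood too little to offset the penalty for its extra parameters, so the two-class model is chosen.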

Among these four classes, two classes demonstrated familial aggregation. The first (LC2) described a group with severe negative symptoms, disorganization, and pronounced functional impairment, resembling “deficit schizophrenia.” The second (LC3) described a group with minimal functional impairment, mild or absent negative symptoms, and low disorganization. Genome-wide linkage analyses were then conducted for these two latent classes by assigning affected individuals to their most likely latent class on the basis of the posterior probabilities of latent class membership. For each, individuals endorsing that class were considered affected for the linkage analysis.
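The assignment step can be sketched as follows: each affected individual is placed in the latent class with the highest posterior probability, and membership of the target class then serves as the affection status for that class-specific linkage scan. The identifiers, probabilities, and function names are hypothetical, and how ties or non-members were coded in the original analysis is not stated in the source.

```python
from typing import Dict, Sequence


def modal_class(posteriors: Sequence[float]) -> int:
    """Return the 1-based index of the class with the highest posterior."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k]) + 1


def affection_for_class(posterior_by_person: Dict[str, Sequence[float]],
                        target_class: int) -> Dict[str, bool]:
    """Flag each person as affected for the scan of one latent class.

    A person counts as affected when their most likely class is the
    target class; everyone else is flagged False in this sketch.
    """
    return {person: modal_class(p) == target_class
            for person, p in posterior_by_person.items()}


if __name__ == "__main__":
    toy = {"a": [0.1, 0.6, 0.2, 0.1], "b": [0.7, 0.1, 0.1, 0.1]}
    print(affection_for_class(toy, target_class=2))
```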


To date, the families of TSLS have been subjected to a variety of genome-wide linkage scans incorporating information from candidate endophenotypes. The endophenotypes and analytic strategies used in these analyses included: (1) QTL scan of neurocognitive performance [Lien et al., 2010a]; (2) QTL scan of niacin skin flush [Lien et al., 2013]; (3) the family cluster of attention deficit and execution deficit [Liu et al., 2011]; (4) OSA by schizophrenia–schizotypy factors [Lien et al., 2010b]; (5) nested OSA by age at onset and neurocognitive performance [Lien et al., 2011]; and (6) the latent class of deficit schizophrenia for linkage analysis [Holliday et al., 2009]. The results of these endophenotype-based linkage analyses are succinctly discussed in the following.

QTL Scan of Neurocognitive Performance

To identify loci influencing neurocognitive performance in schizophrenia, a genome-wide QTL linkage scan was conducted using the families of TSLS [Lien et al., 2010a], with the hypothesis that the neurocognitive performances in patients with schizophrenia reflect a latent trait that may partially overlap with the heritable pathophysiology of the disorder. Neurocognitive performance examined included eight indices on the CPT and eight indices on the WCST.

At the beginning, genome-wide multipoint non-parametric QTL linkage analyses were performed in affected individuals only. One linkage peak attaining genome-wide significance was identified: 12q24.32 for undegraded CPT hit rate (NPL-Z scores = 3.32, genome-wide empirical P = 0.037, number of families used = 509). When non-psychotic relatives were included in the analyses, none of the signals reached genome-wide significance.

The 12q24.32 region identified here as a quantitative trait locus has not been consistently implicated in previous linkage studies of schizophrenia, which suggests that the analysis of endophenotypes provides information beyond what is seen in analyses that rely on diagnoses. This region with linkage to a particular neurocognitive feature may inform functional hypotheses for further genetic studies of schizophrenia.

QTL Scan of Niacin Skin Flush

Among the families of the TSLS, 115 families had at least two affected siblings with information on the niacin skin test, consisting of 226 affected individuals and 137 unaffected individuals. This subsample was then subjected to QTL linkage analyses conducted in affected individuals only as well as in the whole family. Among regions with an NPL-Z score of ≥3.0, only the NPL-Z score of 3.39 at 14q32.12 for 0.01 M at 5 min, obtained in the analyses conducted in affected individuals only, reached genome-wide significance (genome-wide empirical P = 0.03).

The finding of this study raises some interesting issues. If niacin flush response represents a novel endophenotype for schizophrenia [Lin et al., 2007b; Chang et al., 2009], including unaffected individuals in the linkage scans should increase the power, since unaffected individuals with attenuated flush response presumably carry the same genetic susceptibility. Yet the affected-individuals-only approach and the whole-family approach provided different linkage peaks. A possible explanation for this seemingly paradoxical finding is that the QTL with greater linkage signals in affected individuals may contain genes that influence the expression of symptoms following the onset of the disease, so-called modifier genes [Nadeau, 2001; Fanous and Kendler, 2005].

Another possibility is that the locus with linkage signal revealed in the analyses of the affected individuals might contain genes that influence both the susceptibility and clinical features of schizophrenia, hence termed susceptibility-modifier genes [Fanous and Kendler, 2005]. For example, an NPL-Z score of 3.39, reaching genome-wide significance, was found at locus 14q32.12 for the analyses of niacin flush responses using the affected individuals only (i.e., modifier), whereas the corresponding linkage signal remained strong, with an NPL-Z score of 2.87, for the analyses incorporating the niacin responses of unaffected individuals as well (i.e., susceptibility).

The locus with the strongest linkage signal found in this study, 14q32.12, had little support in the original diagnosis-based linkage scan [Faraone et al., 2006]. Intriguingly, one microRNA, hsa-miR-432, whose aberrant expression in the peripheral blood of schizophrenia patients was reported by two different studies [Lai et al., 2011; Gardiner et al., 2012], is also located at 14q32.12. Together, these findings point to a locus worthy of further investigation into the etiology of schizophrenia.

The Family Cluster of Attention Deficit and Execution Deficit

By means of a two-stage clustering analysis on four indicators (CPT undegraded d′, CPT degraded d′, WCST Perseverative Errors, and WCST Categories Achieved), four clusters of families were derived: (1) attention deficit and execution deficit; (2) attention deficit and execution non-deficit; (3) attention non-deficit and execution deficit; and (4) attention non-deficit and execution non-deficit [Liu et al., 2011]. Then non-parametric linkage analyses of schizophrenia were performed separately for each of the four family clusters. The results showed that the maximal NPL-Z score was 3.70 (P = 0.00008) in the family cluster of attention deficit and execution deficit (number of families = 90) on marker D10S195. A permutation test that repeated random sampling of 90 families 10,000 times indicated that the empirical P-value of this NPL-Z score was 0.0023. For the other three family clusters, the NPL scores for D10S195 were lower than that in the entire sample.
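
The permutation step described above can be illustrated with a minimal sketch. It is not the code used by Liu et al. [2011]; it assumes that a per-family NPL Z score at the marker is already available for every family and that family scores are combined with equal weights as the sum divided by the square root of the number of families. The function names and the simulated scores are hypothetical.

```python
import numpy as np

def subset_npl_z(family_z, idx):
    """Combine per-family NPL Z scores for the families in idx (equal weights)."""
    z = family_z[idx]
    return z.sum() / np.sqrt(len(z))

def empirical_p(family_z, observed, subset_size, n_perm=10_000, seed=0):
    """Fraction of random family subsets whose combined score reaches `observed`."""
    rng = np.random.default_rng(seed)
    n = len(family_z)
    hits = 0
    for _ in range(n_perm):
        # Draw a random subset of families of the same size as the cluster.
        idx = rng.choice(n, size=subset_size, replace=False)
        if subset_npl_z(family_z, idx) >= observed:
            hits += 1
    return hits / n_perm

if __name__ == "__main__":
    # Simulated per-family scores (assumption: illustrative values, not TSLS data).
    sim = np.random.default_rng(1).normal(loc=0.1, scale=1.0, size=557)
    print(empirical_p(sim, observed=3.70, subset_size=90))
```

The empirical P-value is simply the share of random 90-family subsets that score at least as high as the cluster of interest, which is why it answers the question of whether the cluster's signal could arise from subsetting alone.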

After obtaining this homogeneous subgroup of 90 attention deficit and execution deficit families, 79 single nucleotide polymorphisms (SNPs) around D10S195 were genotyped in these families for fine mapping, which led to the discovery of ANXA7, PPP3CB, DNAJC9, and ZMYND17 genes as potential candidate genes for schizophrenia [Liu et al., 2011].

OSA by Schizophrenia–Schizotypy Factors

To examine whether incorporating schizotypy factors with schizophrenia factors in OSA using TSLS would help identify susceptibility genes, researchers aligned each schizotypal factor with a corresponding schizophrenia factor in calculating an average family covariate value [Lien et al., 2010b]. After preliminary analysis, they decided to align Negative Schizotypy and Social Isolation/Introversion with Negative Schizophrenia, and align Positive Schizotypy and Interpersonal Sensitivity with Positive Schizophrenia. These four combinations between schizophrenia and schizotypy factors were then used as the covariates to rank families in the subsequent OSA analysis.

The largest NPL-Z score obtained was 3.83 at 10q22.3 in a subset of 395 families (71% of the whole sample). When the corresponding chromosome-wide NPL-Z distribution on chromosome 10 for this subset of families was compared with that of the original families, the OSA results corroborated, strengthened, and narrowed the linkage peaks previously reported on chromosome 10q.
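
The core of the OSA procedure, ranking families by a covariate and searching for the subset with the strongest linkage evidence, can be sketched as follows. This is a simplified illustration, not the software used in the TSLS analyses: it assumes that each family contributes an additive LOD score at each map position, and it omits the permutation step that OSA uses to judge whether the increase over the full sample is significant. All names are hypothetical.

```python
import numpy as np

def ordered_subset_analysis(family_lod, covariate, ascending=True):
    """Find the covariate-ordered subset of families with the largest LOD.

    family_lod: array (n_families, n_positions) of per-family LOD contributions.
    covariate:  array (n_families,) with one covariate value per family.
    Returns (best_lod, n_families_in_subset, position_index).
    """
    order = np.argsort(covariate, kind="stable")
    if not ascending:
        order = order[::-1]
    # Row k holds the LOD curve of the first k + 1 families in the ranking.
    cumulative = np.cumsum(family_lod[order], axis=0)
    max_per_subset = cumulative.max(axis=1)
    k = int(np.argmax(max_per_subset))
    pos = int(np.argmax(cumulative[k]))
    return float(cumulative[k, pos]), k + 1, pos

if __name__ == "__main__":
    lod = np.array([[1.0, 0.0], [0.5, 0.2], [-2.0, 0.1]])
    cov = np.array([1.0, 2.0, 3.0])
    print(ordered_subset_analysis(lod, cov))
```

Families are added one at a time in covariate order, so the search only considers subsets that are contiguous in the ranking; this keeps the number of candidate subsets equal to the number of families.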

Nested OSA by Age at Onset and Neurocognitive Performance

Since earlier age at onset as well as greater CPT or WCST deficits were associated with greater familial loading for schizophrenia, these indices were ranked using either ascending order (age at onset, four CPT indices, and three WCST indices) or descending order (four CPT indices and five WCST indices) [Lien et al., 2011]. Out of these 17 indices, a maximum non-parametric logarithm of odds (LOD) score of 4.17 at 2q22.1 was found in 295 families (53% of the families, mean age at onset = 18.70) ranked by increasing age at onset, which was a significant increase in the maximum LOD score compared with that obtained in the initial linkage analyses using all available families.

Since many of the families had the same mean values of age at onset, a further subsetting by one CPT or WCST index was conducted and resulted in a further increase in the LOD score on the same chromosomal region. Among them, undegraded and degraded CPT false alarm rates showed a significant increase in the nested OSA-based LOD on 2q22.1, with a LOD score of 7.36 in 228 families (mean adjusted z score of false alarm rate = 1.84, mean age at onset = 18.40) and 7.71 in 243 families (mean adjusted z score of false alarm rate = 1.49, mean age at onset = 18.40), respectively. Both the LOD score of 7.36 and that of 7.71 reached genome-wide significance (both P < 0.001).

These nested OSA linkage scans revealed possible evidence of linkage on chromosome 2q22.1 in families of schizophrenia patients with higher CPT false alarm rates nested within the families with younger age at onset. Intriguingly, both a meta-analysis of linkage analyses [Ng et al., 2009] and a genome-wide association study [Sullivan et al., 2008] also found markers on 2q22.1 to be associated with schizophrenia. Further fine-mapping studies among patients with early age at onset and CPT deficits may help identify the genetic variants involved in the genetic susceptibility to this subtype of schizophrenia.

The Latent Class of Deficit Schizophrenia for Linkage Analysis

Two classes derived from latent class analysis using the sample of TSLS demonstrated familial aggregation, LC2 (resembling “deficit schizophrenia”) and LC3 (a group with minimal functional impairment, mild or absent negative symptoms, and low disorganization) [Holliday et al., 2009]. Thus genome-wide linkage analyses were conducted for these two latent classes, with LC2 including 257 individuals contained within 64 families and LC3 including 93 individuals contained within 23 families. A genome-wide significant linkage to 1q23–25 (LOD = 3.78, empiric genome-wide P = 0.01) was found for the linkage analyses of the latent class of deficit schizophrenia (LC2).

This region has been implicated in schizophrenia susceptibility in two genome scan meta-analyses of schizophrenia [Lewis et al., 2003; Ng et al., 2009], suggesting that variants in 1q23–25 specifically increase the risk for a negative/deficit subtype of schizophrenia. Despite the small number of families in this latent class, the high phenotypic homogeneity of LC2 might have increased the proportion of 1q-linked families and power to detect linkage.


Summary of the Linkage Signals

TSLS has collected a large number of families with sib-pairs co-affected with schizophrenia, and its participants had several candidate endophenotypes measured as well. Although the initial linkage signal was not very strong, subsequent endophenotype-based linkage scans have strengthened the initial signal and led to the discovery of several other linked regions of interest.

As summarized in Table I, a variety of endophenotype-based genome-wide linkage scans on TSLS have helped reveal intriguing loci that might be involved in the genesis of the schizophrenia characterized by the specified endophenotype. First, although these endophenotype-based analyses included fewer families than the original schizophrenia-based analysis, their linkage signals were stronger than the initial signal and all reached genome-wide significance. Some analyses included even fewer than 100 families. A persisting concern over the use of endophenotypes to subgroup schizophrenia patients is the unknown balance between the power gained from increased genetic homogeneity and the power lost from smaller sample sizes [Visscher et al., 2012]. The applications of such approaches using TSLS indicate that choosing traits that have empirical support as endophenotypes can tip the balance towards a gain in power.
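
The balance between homogeneity and sample size can be made concrete with a back-of-the-envelope sketch. It rests on strong simplifying assumptions that are not part of the TSLS analyses: per-family NPL Z scores are combined with equal weights (sum divided by the square root of the number of families), every linked family contributes the same expected score, and unlinked families contribute zero on average. Under these assumptions the expected combined score is the linked fraction times the per-family mean times the square root of the number of families. The function names are hypothetical.

```python
import math

def expected_npl_z(n_families, linked_fraction, mean_linked_z):
    """Expected combined NPL Z under the simplifying assumptions stated above."""
    return linked_fraction * mean_linked_z * math.sqrt(n_families)

def required_enrichment(n_total, n_subset):
    """Factor by which the linked fraction must rise in the subset for its
    expected score to match that of the full sample."""
    return math.sqrt(n_total / n_subset)

if __name__ == "__main__":
    # A 90-family subset drawn from 557 genotyped families.
    print(round(required_enrichment(557, 90), 2))
```

In this simplified setting, a 90-family subset of the 557 genotyped families gains power only if the proportion of linked families in it is roughly 2.5 times that in the whole sample, which illustrates why the choice of endophenotype matters.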

Table I. Summary of Linkage Signals Derived From Genome-Wide Linkage Scan Incorporating a Variety of Endophenotypes in Taiwan Schizophrenia Linkage Study
| Study [reference] | Phenotype | Analysis strategy | No. families used | Cytogenetic location | Markers | Non-parametric linkage statistics | Genome-wide empirical P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Faraone et al. [2006] | DSM-IV schizophrenia | Non-parametric linkage analysis | 557 | 10q22.3 | D10S2327 | NPL-Z = 2.88 | 0.14 |
| 2. Lien et al. [2010a] | CPT undegraded hit rate | QTL non-parametric linkage analysis | 509 | 12q24.32 | D12S2078 | NPL-Z = 3.32 | 0.03 |
| 3. Lien et al. [2013] | Niacin flush response for 0.01 M at 5 min | QTL non-parametric linkage analysis | 115 | 14q32.12 | D14S617 | NPL-Z = 3.39 | 0.03 |
| 4. Liu et al. [2011] | DSM-IV schizophrenia | A family cluster of attention deficit and execution deficit | 90 | 10q22.3 | D10S195 | NPL-Z = 3.70 | 0.0023 |
| 5. Lien et al. [2010b] | DSM-IV schizophrenia | OSA by negative schizophrenia-social isolation/introversion | 395 | 10q22.3 | D10S2327 | NPL-Z = 3.83 | 0.04 |
| 6. Lien et al. [2011] | DSM-IV schizophrenia | Nested OSA by age at onset and CPT undegraded false alarm rate | 228 | 2q22.1 | D2S442 | LOD = 7.36 | <0.001 |
|  |  | Nested OSA by age at onset and CPT degraded false alarm rate | 243 | 2q22.1 | D2S442 | LOD = 7.71 | <0.001 |
| 7. Holliday et al. [2009] | Deficit schizophrenia | Latent class LC2 (“deficit schizophrenia”) | 64 | 1q23–25 | D1S1619 | LOD = 3.78 | 0.01 |

Second, different endophenotypes have led to different linked regions. In particular, several loci distinct from the initial locus at 10q22.3 have been identified through these endophenotype-based linkage scans. Whether the associated endophenotype that led to the discovery of the linked locus characterizes the underlying pathophysiological impairment remains to be investigated. These results are compatible with our hypotheses that different endophenotypes represent separate sufficient-components of the cause for schizophrenia, in which different susceptibility genes are involved.

Limitations of Applying Linkage Analysis on Complex Diseases

In the search for susceptibility genes for schizophrenia, linkage analysis can identify excess co-segregation of the putative alleles underlying a phenotype with the alleles at a marker locus in family data. These efforts often lead to positional candidate genes for the disease under investigation. Based on those positional candidate genes, hundreds of candidate genes have been reported, but no strongly replicated findings have emerged. There are several possible reasons why linkage analysis has not succeeded in identifying well-replicated susceptibility genes as it did for Mendelian diseases.

First, parametric linkage analysis can be biased by inappropriate parameter specification, as shown in earlier linkage analyses using parametric LOD methods. Nevertheless, this limitation can be overcome by using non-parametric methods, such as those used for TSLS.

Second, linkage analysis has limited resolution. As shown in the meta-analysis of 32 linkage analyses of schizophrenia, the majority of the studies used about 350–500 microsatellite markers, while some later studies used 5,868 SNPs [Ng et al., 2009]. Given the structure of two-generation human families, any recombination fraction of less than 1% becomes impractical to detect. Thus, the indicated linkage region usually remains large, for example, 10 cM. In this respect, genome-wide association studies have better power in detecting alleles associated with complex diseases [Manolio et al., 2008].
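
The resolution limit can be illustrated by converting map distance into recombination fraction and counting the recombinants one would expect to observe. The sketch below uses Haldane's map function, which assumes no crossover interference; this choice, the function names, and the example numbers are assumptions made for illustration and are not taken from the TSLS analyses.

```python
import math

def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in centimorgans (Haldane)."""
    d = distance_cm / 100.0  # convert cM to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def expected_recombinants(distance_cm, n_meioses):
    """Expected number of recombinant meioses between two loci."""
    return n_meioses * haldane_theta(distance_cm)

if __name__ == "__main__":
    for cm in (1, 10):
        print(cm, haldane_theta(cm), expected_recombinants(cm, 100))
```

At 1 cM the recombination fraction is just under 1%, so about one recombinant is expected per hundred informative meioses, whereas at 10 cM the fraction is about 9%. With few informative meioses per family, intervals much narrower than several centimorgans are therefore hard to resolve by linkage alone.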

Other Strategies That Utilized TSLS

In addition to genome-wide linkage analysis, TSLS has been used in other approaches for the genetic dissection of schizophrenia. One approach is to subject the samples of TSLS to fine mapping of a variety of candidate regions or genes, such as AKT1 [Liu et al., 2006d], G72 and d-amino acid oxidase [Liu et al., 2006a], RGS4 [Liu et al., 2006c], DTNBP1 [Liu et al., 2007c], PPP3CC [Liu et al., 2007e], HTF9C [Liu et al., 2007d], NOTCH4 [Liu et al., 2007b], D22S278 at 22q12.3 [Liu et al., 2008], IL-6 [Liu et al., 2010], 6p [Lin et al., 2009], and DRD2 [Glatt et al., 2009]. Another approach, the genome-wide association study, has not been applied to TSLS, since its sample is underpowered given the low odds ratios of the associated SNPs discovered so far.


Despite these achievements, several issues remain to be addressed about the endophenotype-based linkage scan, in particular, and linkage analysis, in general, for schizophrenia. First, linkage analysis is not an end in itself; the final goal is to narrow down the region to find the genes that confer susceptibility to, or modify the expression of, schizophrenia. However, further mapping efforts on linked regions identified in linkage studies of non-Mendelian complex diseases have not led to much success in the identification of causal genes and risk alleles [Bailey-Wilson and Wilson, 2011]. One possibility is that the signal was spurious and hence not replicated across studies. Although TSLS is certainly not immune from such false positives, there are signs that its linkage signals are relatively robust. For example, the locus 10q22.3 was identified in the initial linkage scan [Faraone et al., 2006], consolidated further in a later meta-analysis of linkage studies [Ng et al., 2009], and then reinforced in two endophenotype-based linkage scans, including a family cluster of attention deficit and execution deficit [Liu et al., 2011] and OSA by negative schizophrenia-social isolation/introversion [Lien et al., 2010b].

Second, the resolution of linkage analysis in families is limited in the search for “common variants” that tend to have small effects (e.g., an odds ratio of <2.0) [Risch and Merikangas, 1996]. However, the genetic variants underlying so-called common complex diseases may turn out to be rare variants that were not sufficiently covered by existing genome-wide association chips [Manolio et al., 2009]. For TSLS, so far only one linkage region has been fine-mapped, using 79 haplotype tag SNPs from the HapMap selected with the criteria of minor allele frequency above 0.1 and r² above 0.8 [Liu et al., 2011]. However, even for this fine mapping, the average inter-marker distance of the SNPs was 43 kilobases (kb), and several candidate genes were implicated in the results. Thus, the fact that a causal locus has not yet been found for a linkage region may be because we have not sequenced enough DNA in the region in a large enough sample of people [Bailey-Wilson and Wilson, 2011].

The advent of next-generation sequencing technology makes systematic and agnostic scans throughout the genome, for either common or rare disease risk variants of small or large effect size, economically feasible [Cirulli and Goldstein, 2010]. Under this circumstance, it becomes feasible to perform family studies of DNA sequence data. This approach can be helpful in two important respects for the search for genetic susceptibility to schizophrenia [Bailey-Wilson and Wilson, 2011]. First, the families can be used to determine whether a sequence variant is segregating within the family or is a de novo mutation. Second, families can help to determine which variants segregate with a disease or endophenotype within the family, which is a typical linkage analysis. In particular, family-based linkage studies have greater power in detecting rare variants of large effect than population-based case-control studies. Thus, the perspective of next-generation linkage analysis has shed new light on the utility of the well-characterized families of schizophrenia in TSLS. Besides the search for genomic information, the sample of TSLS can be enriched by incorporating other –omics information, such as epigenomics and metabolomics. Nevertheless, the information in the latter –omics data is tissue-dependent. This makes the interpretation of information derived from peripheral blood difficult and calls for careful validation.
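
The first use of families mentioned above, telling a segregating variant from a de novo mutation, can be illustrated with a minimal sketch for a single autosomal biallelic variant in a parent-offspring trio. It is a toy classification under stated assumptions: genotypes are coded as counts of the alternate allele (0, 1, or 2), genotype calls are taken at face value, and the names are hypothetical. Real pipelines also weigh sequencing depth and genotype quality before calling a variant de novo.

```python
# Number of alternate alleles a parent can transmit, given the parent's genotype.
TRANSMIT = {0: {0}, 1: {0, 1}, 2: {1}}

def classify_variant(child, father, mother):
    """Classify a child's genotype relative to both parents' genotypes."""
    possible = {a + b for a in TRANSMIT[father] for b in TRANSMIT[mother]}
    if child in possible:
        return "inherited" if child > 0 else "not carried"
    # A heterozygous child with two homozygous-reference parents.
    if father == 0 and mother == 0 and child == 1:
        return "candidate de novo"
    return "Mendelian inconsistency"

if __name__ == "__main__":
    print(classify_variant(1, 0, 0))
    print(classify_variant(1, 1, 0))
```

Variants classified as inherited are the ones that can then be tested for co-segregation with the disease or endophenotype within the family.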


TSLS was supported by grants from the National Institute of Mental Health, USA (1R01-MH-59624-01), the National Health Research Institutes, Taiwan (NHRI-90-8825PP; NHRI–EX91,92-9113PP), National Science Council, Taiwan (NSC-91-3112-B-002-011, 92-3112-B-002-019), and the support from the Genomic Medicine Research Program of Psychiatric Disorders, National Taiwan University Hospital. The writing of this article was supported in part by the National Health Research Institutes, Taiwan (NHRI-EX100-10048PI) and National Science Council, Taiwan (101-2314-B-002-134-MY3). The research team of TSLS included Ming T. Tsuang, MD, PhD (principal investigator); Stephen V. Faraone, PhD (co-principal investigator); Hai-Gwo Hwu, MD (Taiwan principal investigator), and Wei J. Chen, MD, ScD (Taiwan co-principal investigator). The other participating researchers and hospitals were reported in the acknowledgement section of Hwu et al. [2005].