The relationship between case–control differential gene expression from brain tissue and genetic associations in schizophrenia

Large numbers of genetic loci have been identified that are known to contain common risk alleles for schizophrenia, but linking associated alleles to specific risk genes remains challenging. Given that most alleles that influence liability to schizophrenia are thought to do so by altered gene expression, intuitively, case–control differential gene expression studies should highlight genes with a higher probability of being associated with schizophrenia and could help identify the most likely causal genes within associated loci. Here, we test this hypothesis by comparing transcriptome analysis of the dorsolateral prefrontal cortex from 563 schizophrenia cases and 802 controls with genome‐wide association study (GWAS) data from the third wave study of the Psychiatric Genomics Consortium. Genes differentially expressed in schizophrenia were not enriched for common allelic association statistics compared with other brain‐expressed genes, nor were they enriched for genes within associated loci previously reported to be prioritized by genetic fine‐mapping. Genes prioritized by Summary‐based Mendelian Randomization were underexpressed in cases compared to other genes in the same GWAS loci. However, the overall strength and direction of expression change predicted by SMR were not related to that observed in the differential expression data. Overall, this study does not support the hypothesis that genes identified as differentially expressed from RNA sequencing of bulk brain tissue are enriched for those that show evidence for genetic associations. Such data have limited utility for prioritizing genes in currently associated loci in schizophrenia.


| INTRODUCTION
The recent and largest genome wide association study (GWAS) of schizophrenia identified 287 loci associated with schizophrenia (Trubetskoy et al., 2022). Due to linkage disequilibrium (LD), most associated loci encompass several genes, and it is not known with certainty which genes are functionally impacted by the associated risk alleles. The prioritization of individual genes within GWASassociated loci is a major challenge for genomics research and is key to translational outcomes. Aiming to identify the most likely schizophrenia-relevant genes, the Psychiatric Genomics Consortium (PGC) (Trubetskoy et al., 2022) applied various gene prioritization methods to the 287 identified loci. These methods, based on the inference of causal SNPs from positional fine-mapping using FINE-MAP (Benner et al., 2016), evidence that the associated alleles colocalize with expression quantitative trait loci (eQTLs) through Summary-based Mendelian Randomization (SMR) (Zhu et al., 2016), and data from chromatin conformation (Hi-C) analysis, highlighted 120 genes as likely to explain associations at some of the loci. However, likely associated genes were not identified at the majority of loci, and for those prioritized genes, the evidence is probabilistic rather than definitive. Thus, there is considerable scope for further attempts at linking the association signals from that study to specific genes.
Given that few common variant associations to schizophrenia can credibly be attributed to nonsynonymous alleles (Trubetskoy et al., 2022), it is widely believed the effects of most are likely to be mediated by effects on gene expression, a hypothesis for which there is some evidence (Bray & O'Donovan, 2019;Huo, Li, Liu, Li, & Luo, 2019;Richards et al., 2012). If this hypothesis is correct, then genes that are differentially expressed in people with schizophrenia should be enriched for those that are functionally affected by the causal alleles underpinning genetic associations. Moreover, if this enrichment is nontrivial, differential gene expression data could inform gene prioritization from GWAS data. Research into other polygenic conditions, such as diabetes, has benefited from using this approach (Chen et al., 2008;Parikh, Lyssenko, & Groop, 2009), but thus far, it has not yet made an important impact on genomic studies of schizophrenia.
Transcriptomics studies of postmortem brain samples have led to the identification of large numbers of differentially expressed genes in the brains of people with schizophrenia (Collado-Torres et al., 2019;Fromer et al., 2016;Gandal, Haney, et al., 2018;Gandal, Zhang, et al., 2018;Hoffman et al., 2019;Jaffe et al., 2018;Ramaker et al., 2017). Analyses of genomic and transcriptomic case-control datasets have shown that genes with similar biological functions are enriched for associations in both types of data, including, for example, genes related to neuronal signaling (Gandal, Zhang, et al., 2018;Jaffe et al., 2018). This overlap further supports the hypothesis of a relationship between genomics and differential expression transcriptomics data, albeit that support is indirect. Here, using the largest schizophrenia GWAS dataset (Trubetskoy et al., 2022), we now directly test the relationship between differential gene expression and genetic association to evaluate the utility of differential gene expression for refining gene prioritization in schizophrenia. Downstream analysis was restricted to expressed genes (>5 counts per million in at least 10 samples). Gene-level counts were normalized using trimmed mean of M-values (TMM) normalization in edgeR (Robinson, McCarthy, & Smyth, 2010), log-transformed, and precision-weights calculated using the limma/voom approach (Law, Chen, Shi, & Smyth, 2014). Surrogate variable analysis was implemented using the sva package (Leek, Johnson, Parker, Jaffe, & Storey, 2012), and the number of surrogate variables (SVs) estimated using the "Leek" method (Leek, 2011) and set to 7. The brain bank of origin (including subcollection for CMC and BrainGVEX) was used as a covariate in the null model.
Differential gene expression tests between schizophrenia cases and controls were then performed using a weighted least-squares linear regression in limma with the following model: Gene expression $ brain bank + SV [1-7] + Dx, where Dx is control or schizophrenia.
To ensure that downstream analyses of the data derived above were not specific to the procedures used for determining differential expression, we repeated the analyses using a published schizophrenia case-control RNA sequencing analysis performed by the PsychEN-CODE consortium (Gandal, Zhang, et al., 2018), hereafter referred to as PsychENCODE. The RNA-seq samples used by PsychENCODE overlapped with our own but included additional samples from three studies (UCLA-ASD, Yale-ASD, BrainSpan) (Gandal, Zhang, et al., 2018). A further difference in that study (Gandal, Zhang, et al., 2018) was the set of covariates used: age, age 2 , batch, sex, postmortem interval, RNA integrity (RIN), RIN 2 , brain bank, brain region, 24 aggregate sequencing metrics (seqPCs), seqPC3 2 , and 4 surrogate variables.
Both the present and PsychENCODE datasets were filtered to retain only protein-coding genes.

| Genomic data
Association and gene prioritization data were taken from the largest available schizophrenia case-control GWAS (Trubetskoy et al., 2022) of up to 76,755 people with schizophrenia and 243,649 controls, which reported common variant associations at 287 distinct genomic loci, referred to herein as the PGC. Genome-wide significant SNPs (p < 5 Â 10 À8 ) were assigned to 1,565 unique protein-coding genes using gene boundaries from Ensembl assembly GRCh37. Finemapping of the significant loci, implemented by FINEMAP (Benner et al., 2016), identified a FINEMAP-broad set of 435 protein-coding genes which contained at least one SNP in the FINEMAP 95% credible set for each locus. The FINEMAP 95% credible set for each locus is the minimum set of SNPs for which the sum of the posterior probabilities (PPs) of being causal includes 95% of the locus-wide PP. The PGC further defined a subset of 64 protein-coding genes as FINEMAP-prioritized based on additional criteria, requiring the loci to remain significant after replication in an extended GWAS, and to either contain the entire set of credible SNPs, or at least one nonsynonymous or untranslated region variant with PP ≥ 0.1. The PGC also identified a broad set of 101 SMR genes where there was evidence consistent with the hypothesis that the GWAS association colocalized with an eQTL for that gene and where the HEIDI test suggested LD was not an evident explanation for that colocalization (Zhu et al., 2016). From these, we defined here 53 SMR-broad genes which were protein-coding and derived only from the subset of eQTLs from the adult brain. SMR genes were further prioritized (n = 35) by the PGC based on additional criteria, including colocalization of eQTLs with FINEMAP credible SNPs, or evidence from Hi-C chromosomal conformation analysis (Trubetskoy et al., 2022). We removed one FINEMAP-prioritized and two SMR-prioritized genes not represented in our case-control transcriptomic data. Gene sets are presented in Table S1. β values from adult brain SMR analysis were used as a continuous measure of evidence from SMR.

| Regression modeling
In primary analyses, a linear model was fitted regressing the differential expression t-statistic for each gene on either a continuous gene property (for example, gene-wide association statistic from MAGMA, or observed/expected upper bound fraction (LOEUF) score (Karczewski et al., 2020) for loss of function mutation intolerance, see below) or a binary variable denoting prioritization by one or more of the above criteria. Analyses were performed using either the signed tstatistic (a positive value indicating increased expression in schizophrenia, a negative value decreased expression in schizophrenia) or, where there was no prior expectation of direction-specific effects, the absolute t-statistic. Sets of differentially expressed genes, defined by p value thresholding, were tested by logistic regression. To compare the differential expression statistics of prioritized genes with those of the remaining genes in the same GWAS locus, we performed a conditional logistic regression analysis in R, using the loci as strata. In all regression analyses, average gene expression (mean log 2 transcripts per million) was included in the model as a covariate.

| MAGMA genetic association analyses
Genes and gene sets were tested for enrichment for genetic associa-

| Partitioned heritability analysis
We applied LD score regression analysis to estimate the heritability explained by various differentially expressed sets of genes using published methods Finucane et al., 2015).
Summary statistics (Trubetskoy et al., 2022) were filtered as described above, and SNPs were assigned to each gene set using a 10 kb window around the transcribed region of each gene (Kim et al., 2019).
Each gene set annotation was tested against a baseline LD model containing 97 annotations describing a broad set of coding, conserved and regulatory regions (Gazal et al., 2017;Hujoel, Gazal, Hormozdiari, van de Geijn, & Price, 2019). In primary analyses, annotations were tested for heritability enrichment, each compared to the remaining SNPs in the genome. To evaluate the relative heritability enrichment of an annotation, conditioned on the baseline and any additional annotations, we extracted the partitioned heritability coefficient τ c , which was converted to a Z-score by dividing by the standard error.

| Correlates of case-control differential expression
The results of differential gene expression analysis are provided in Table S2.
The probability of a gene being reported as differentially expressed can be biased by its transcript abundance (Yoon & Nam, 2017). Consistent with this, the differential expression t-statistic was related to the DLPFC expression level (mean log 2 transcripts per million) such that highly expressed transcripts had a greater likelihood of being differentially expressed (β = .13, p = 6.4 Â 10 À23 ) ( Figure 1a). Subsequent regression analyses controlled for this effect by covarying for mean tissue expression.
Genes with low tolerance to loss-of-function (LoF) mutations are enriched for genetic association with schizophrenia (Trubetskoy et al., 2022) and have a higher expression on average (Karczewski et al., 2020). We tested if the differential expression was also associated with LoF intolerance, using the LOEUF score while controlling for mean tissue expression. Higher expression in cases vs. controls was associated with LoF mutation intolerance (i.e., lower LOEUF score; β = À1.21, p = 1.1 Â 10 À66 ) (Figure 1b). Thresholding for differential expression at a 0.05 FDR cut-off gave a similar picture: genes with significant upregulation having lower tolerance to LoF mutations (logistic regression analysis: β = À.61, p = 1.2 Â 10 À46 ) while significantly downregulated genes were more tolerant to LoF mutations (β = .50, p = 1.5 Â 10 À36 ).

| Relationship of differential expression to GWAS statistics
We found no relationship between differential gene expression and measures of the gene-wide significance using MAGMA (β = À.0067, p = .14). Moreover, genes (n = 1,565) mapping to genome-wide significant loci were not more likely to be differentially expressed than other brain-expressed genes that did not map to the loci (β = À.059, p = .38) (Figure 2a). Repeated analyses in another differential gene expression analysis undertaken by PsychENCODE (Gandal, Zhang, et al., 2018) yielded similar results (Figure 2b).
Genes reported as significant in the differential expression analyses were enriched for heritability. However, they were not more enriched than those that did not show such evidence of differential expression (Table 1), indicating this might reflect the heritability enrichment expected at brain expressed genes. Adding brain-expressed genes to the conditional annotations resulted in none of the gene sets, differentially expressed or not, being significantly enriched for heritability as indicated by the τ c statistic (Table 1). As a positive control, we repeated these analyses using targets of Fragile X mental retardation protein (FMRP) (Darnell et al., 2011), a gene set with strong prior evidence for genetic association with schizophrenia (Clifton, Rees, et al., 2021;Pardiñas et al., 2018). This set retained strong evidence for enriched SNP heritability (Table 1) even after conditioning for brain expression.
Genes in the SMR-broad or SMR-prioritized sets had a lower signed differential expression t-statistic than the remaining genes F I G U R E 1 Gene property correlates of differential expression. (a) Read count bias in differential gene expression statistics. Shown is the absolute t-statistic (Abs t) from differential expression analysis and the mean log transcripts per million (TPM) for each protein-coding gene in RNA sequencing analyses comparing schizophrenia cases and controls. Differential expression is more likely to be detected in highly expressed genes. Average expression was used as a covariate in subsequent regression analyses. (b) Imbalanced differential expression of loss-of-function intolerant genes. Shown is the (directional) t-statistic from differential expression analysis and the loss-of-function observed/expected upper bound fraction (LOEUF) for each protein-coding gene. Genes less tolerant to loss of function mutations were more likely to be overexpressed in cases compared to controls in the locus (this study: SMR-broad β = À.15, p = .0021; SMR-  Note: p values are presented based on the significance of the enrichment for SNP heritability, after conditioning on 97 baseline annotations (Gazal et al., 2017;Hujoel et al., 2019), and after the addition of all brain-expressed genes to the conditional annotations. Proportion of total heritability, h 2 .
F I G U R E 2 Case-control differential expression of GWAS prioritized genes. Results are presented based on findings from case-control differential gene expression analyses performed in the current study (top row) or PsychENCODE (bottom row). (a, b) The absolute t-statistic (Abs t) from differential expression analysis was compared between all genes overlapping with any of 270 GWAS significant loci and all remaining protein-coding genes.
(c-f) The absolute t-statistic from differential expression analysis was compared between FINEMAP-broad or FINEMAPprioritized genes and the remaining genes in the locus. (g-j) The signed t-statistic from differential expression analysis was compared between SMR-broad or SMR-prioritized genes and the remaining genes in the locus. (k, l) The absolute β value from SMR analysis was compared between genes with the same, or different, direction of effect in SMR and differential expression analyses. Group differences were analyzed using logistic regression analysis (a, b, k, l) or conditional logistic regression analysis (c-j). Asterisks indicate significant (p < .05) differences between groups

| DISCUSSION
Large-scale transcriptomic and genomic studies have identified many genes that show evidence for association with schizophrenia. However, while individual genes have been reported with significant associations from both types of data (Fromer et al., 2016), differential gene expression data from transcriptomics has not yet informed systematic efforts at gene prioritization from GWAS. Moreover, it is unclear whether such information should be so used, and if it is, what weight to assign to it. Aiming to clarify those questions, in this study of coding genes, we sought to quantify the relationships between differential gene expression, genome wide association test statistics, associated loci, and genes prioritized within loci. Overall, we find no evidence for robust relationships between differential gene expression and several measures of genetic association, suggesting that in its current form, the former does not assist with mapping GWAS associations to specific genes.
We observed that genes that are less tolerant to LoF mutation are more likely to be overexpressed in people with schizophrenia than in controls. This finding superficially may have some resonance with earlier reports that LoF mutation intolerant genes are enriched for association with schizophrenia (Pardiñas et al., 2018;Trubetskoy et al., 2022). However, further analyses suggest these are unrelated observations. First, genes that show differential gene expression in schizophrenia showed no evidence of being enriched for genetic associations. Second, genes mapping to schizophrenia-associated loci were not more likely to be differentially expressed than genes outside these loci. Third, surprisingly, genes prioritized by fine-mapping showed evidence for less differential expression than other genes at the associated loci. Fourth, genes showing evidence for differential expression were not more enriched for SNP-based heritability than brain expressed genes that were not differentially expressed. Like our study, a previous analysis found that genes differentially expressed in schizophrenia are enriched for heritability compared with other genes in the genome (Gandal, Zhang, et al., 2018) but did not test whether this could be explained by the property of being brain-expressed (Pardiñas et al., 2018) rather than being differentially expressed. Here, we found that regardless of whether genes were upregulated, downregulated, or neither, they were similarly enriched for SNP heritability.
In all cases, the enrichment disappeared after conditioning on brain expression.
Our observation that FINEMAP-prioritized genes were significantly less likely to be differentially expressed than the remaining genes in the locus is surprising. One possible but highly speculative interpretation comes from a report (Gandal, Haney, et al., 2018) that in people with schizophrenia, antipsychotic drugs partially normalize primary disorder-relevant transcriptomic changes. If this is correct, then true susceptibility genes, which are expected to be enriched among those prioritized by fine-mapping, might be less likely than random genes to show differential expression due to normalization by antipsychotics. This potential effect can be tested by comparing the transcriptional patterns of genes associated with schizophrenia and pharmacological treatment.
Genes prioritized for association with schizophrenia through SMR were more likely to be expressed at lower levels in schizophrenia than other genes in the same loci, regardless of whether the putatively causal eQTLs are expected to increase or decrease expression. The absence of an expected relationship between the direction of expression predicted by the genomic-eQTL data and that observed directly by differential expression adds further to our suggestion that differential gene expression in its current form does not provide a useful means to prioritize genomic association findings. However, the observation that, as a class, SMR-associated genes were observed to be expressed at lower levels in schizophrenia than in controls in the differential gene expression study is intriguing. We can only speculate as to a possible mechanism, but one possibility is that regardless of the direction of cis-acting effects, the net effect of causal genetic variation on brain pathology is to reduce the representation of the cellular fraction in bulk tissue in which those genes are expressed, leading to the observation of lower expression. Single-cell rather than bulk tissue case-control expression studies are necessary to evaluate that possibility.
We note that conclusions drawn in the present study concerning concordance between observed and expected gene expression might be limited by the tissue. First, our results are based on studies of bulk brain tissue containing multiple cell types with independent expression dynamics. Genomics and transcriptomics may show stronger patterns of concordance with the expansion of cell-specific case-control studies of differential gene expression and cell-specific eQTLs. Second, the current study was based on RNA samples derived from patients with ages ranging from 17 to more than 90. Since the expression of schizophrenia-associated genes varies across different ages (Clifton et al., 2019), and molecular pathology during brain development is thought to influence predisposition for schizophrenia (Clifton, Collado-Torres, et al., 2021;O'Brien et al., 2018;Owen, O'Donovan, Thapar, & Craddock, 2011;Weinberger, 2017), genetic effects predicted by GWAS might only manifest during specific developmental periods.
Differential gene expression between cases and controls reflects the effects of many factors beyond the cis-acting effects of common genetic variation that are the focus of the present article. Some of these, for example, trans-acting genetic effects, epigenetic modification from exposure to environmental risk factors, and expression changes that are of pathophysiological relevance but are downstream of the primary genetic changes, may be more readily exposed by transcriptomics than by current approaches to genomics.