Genome sequencing for rightward hemispheric language dominance

Abstract Most people have left‐hemisphere dominance for various aspects of language processing, but only roughly 1% of the adult population has atypically reversed, rightward hemispheric language dominance (RHLD). The genetic‐developmental program that underlies leftward language laterality is unknown, as are the causes of atypical variation. We performed an exploratory whole‐genome‐sequencing study, with the hypothesis that strongly penetrant, rare genetic mutations might sometimes be involved in RHLD. This hypothesis was motivated by analogy with situs inversus of the visceral organs (left‐right mirror reversal of the heart, lungs and so on), which is sometimes due to monogenic mutations. The genomes of 33 subjects with RHLD and 34 subjects (14 left‐handed) with typical language laterality were sequenced and analyzed with reference to large population‐genetic data sets. The sample was powered to detect rare, highly penetrant, monogenic effects if they were present in at least 10 of the 33 RHLD cases and no controls, but no individual gene was mutated in more than five RHLD cases while being un‐mutated in controls. A hypothesis derived from invertebrate mechanisms of left‐right axis formation led to the detection of an increased mutation load, in RHLD subjects, within genes involved with the actin cytoskeleton. The latter finding offers a first, tentative insight into molecular genetic influences on hemispheric language dominance.

Of people with RHLD, 90% are also left-handed. 3 Therefore, RHLD usually involves a broader reorganization of left-right laterality than that of language functions alone, but it may represent an etiological group distinct from the bulk of left-handers.
Gene expression and in utero ultrasound studies of human embryos have indicated that lateralized development is already underway in the human central nervous system by 5 to 8 weeks postconception, [6][7][8] which indicates a genetic-developmental program underlying the typical form of functional brain laterality. One study reported a nonsignificant heritability (<1%) for the laterality of speech sound perception, based on the dichotic listening method and considering the full range of trait variation from left- to right-ear advantage. 9 However, atypical functional language dominance, that is, a categorical trait defined to include both RHLD and ambilateral dominance, has been shown to have a heritability of roughly 30%, measured with functional transcranial Doppler sonography during language production. 9,10 There have been no twin or family-based studies of the heritability of RHLD itself, likely due to the rarity of the trait. Twin and family studies have reported moderate heritability estimates for left-handedness (24%-39%), 10,11 although heritability estimates based on genomic similarity between unrelated people in the general population are much lower for left-handedness (heritability = 1%-3%). 12,13 Regardless, the molecular mechanisms for the initial "symmetry breaking" process in the mammalian brain, that is, for establishing a left-right axis in the very early embryo, remain unknown. 14 In contrast, much is known about the developmental origins of asymmetry of the visceral organs (eg, heart, lungs and so on). Increased activation of the nodal signaling cascade on the left side of an early embryonic structure called the node ultimately results in asymmetric organogenesis. 15 Motile cilia within the node are important for this process, because their unidirectional rotation, arising from the chirality of their protein constituents, produces a right-to-left fluid flow that triggers left-sided nodal expression. 15,16
Monogenic mutations in genes that encode components of motile cilia, or that otherwise affect ciliary function, can cause primary ciliary dyskinesia (PCD) together with situs inversus totalis (SIT), a condition affecting roughly 1/6000 to 1/8000 people, in which the visceral organs are arranged as the mirror image of the usual layout. 16,17 PCD with SIT is genetically heterogeneous and can be caused by mutations in at least 37 different genes, 18 although one gene, DNAH5, accounts for 15% to 28% of cases. 19,20 Intriguingly, people with PCD and SIT do not show an increased rate of RHLD or left-handedness, which suggests a fundamental dissociation between nodal-ciliary mechanisms of visceral axis formation and the functional brain lateralities for language and hand dominance. [21][22][23] Thus, the typical form of human brain functional laterality may instead originate from a genetic-developmental mechanism that is brain-intrinsic. Recent studies in Drosophila have shown that cellular chirality induces left-right asymmetry of individual organs in an organ-intrinsic manner, independently of the ciliary-nodal pathway. [24][25][26][27] In these mechanisms, chirality is a transient property of whole-cell morphology at key points in embryonic development. 24 A role for actin-related genes in establishing cellular chirality has been observed in both invertebrate (Drosophila, snail) [24][25][26][27] and vertebrate models (cultured cells, frog, zebrafish), 24,28,29 suggesting that this mechanism is important for establishing left-right organ asymmetry across bilaterian groups. To our knowledge, apart from the cilia-related nodal signaling pathway, cellular chirality is the only biological mechanism that has been shown to give rise to organ asymmetry in multicellular animals.
Recent analyses of the UK Biobank data set, based on more than 300 000 participants, have reported that alleles of the microtubule-associated gene MAP2 have very small effects on the probability of becoming left-handed, as do some other loci that did not clearly implicate individual genes. 30,31 However, the rarer trait of RHLD, found in only roughly 10% of left-handers and less than 1% of right-handers, has not been the subject of any previous molecular genetic study. By analogy with SIT, here we investigated whether RHLD might sometimes arise from high-penetrance genetic mutations. We sequenced the genomes of 33 people with RHLD, as assessed using fMRI, as well as 34 typically lateralized subjects (20 right-handed, 14 left-handed), and interrogated the data with reference to large population genetic databases (Figure 1).
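The prevalence figures quoted above fit together through the law of total probability and Bayes' rule. As a rough consistency check only, the sketch below combines the roughly 10% rate of RHLD among left-handers with a left-handedness rate taken from the UK Biobank counts given in the Methods (32 367 of 330 474). The RHLD rate among right-handers (0.1%) is a hypothetical value chosen within the stated "less than 1%" bound, and the function name is illustrative.

```python
def rhld_given_handedness(p_left, p_rhld_given_left, p_rhld_given_right):
    """Combine handedness-specific RHLD rates into population-level figures.

    Returns (P(RHLD), P(left-handed | RHLD)) using the law of total
    probability and Bayes' rule.
    """
    p_right = 1.0 - p_left
    joint_left = p_left * p_rhld_given_left      # P(RHLD and left-handed)
    joint_right = p_right * p_rhld_given_right   # P(RHLD and right-handed)
    prevalence = joint_left + joint_right        # P(RHLD)
    return prevalence, joint_left / prevalence   # Bayes' rule


# Share of left-handers, from the UK Biobank counts reported in the Methods.
p_left = 32_367 / 330_474
# Hypothetical: 0.1% RHLD among right-handers (the text only says "< 1%").
prevalence, frac_left = rhld_given_handedness(p_left, 0.10, 0.001)
print(f"P(RHLD) = {prevalence:.4f}; P(left-handed | RHLD) = {frac_left:.2f}")
```

With these assumed inputs, RHLD prevalence comes out at about 1% and roughly nine in ten RHLD cases are left-handed, in line with the figures cited in the text.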
As this was an exploratory study, we performed separate analyses under recessive and dominant models, allowing for allelic heterogeneity (different causative mutations within a given gene) and genetic heterogeneity (causative mutations in different genes). We also tested for an increased rate of rare mutations in RHLD within specific candidate gene sets, to assess whether an increased load of mutations affecting specific biological processes might increase the chance of having RHLD.
The candidate sets included genes involved in visceral laterality or the actin cytoskeleton, as well as a set of 18 genes which have been tentatively associated with human brain laterality in previous studies. 14 Figure 2 and Figure S2.
The subjects in this study were recruited from two separate sources, namely the BIL&GIN data set (France) and the GOAL data set (Belgium).

| BIL&GIN
We studied hemispheric lateralization for three language tasks, namely production, reading and listening, using fMRI to calculate Global Hemispheric Functional Laterality Indexes (HFLIs), as described previously. 33 Each participant underwent a slow event-related functional MRI protocol comprising three runs, one for each language task, presented in random order. The three runs followed the same structure, alternating execution of the task at the sentence level and at the word-list level. fMRI data analysis was performed using the SPM5 software (www.fil.ion.ucl.ac.uk/spm/). Scans of each participant and each run were normalized to our site-specific template, corrected for motion during the run, and then warped into standard Montreal Neurological Institute (MNI) space using trilinear interpolation, with subsequent smoothing using a 6-mm full width at half maximum (FWHM) Gaussian filter. We then computed, for each participant, the BOLD signal difference maps and associated t-maps corresponding to the "sentence vs word-list" contrast for the production, reading and listening runs. For each individual and each language task, we computed an HFLI using the LI-toolbox applied to the individual contrast t-map of that task. 34 A two-step procedure was then implemented to select RHLD subjects and typically lateralized controls. We first selected the 10 individuals previously identified as strongly right-lateralized in this data set using a stringent criterion based only on language production (HFLI for language production < −50). 3 Then, to identify individuals exhibiting a right-lateralized profile in all three language conditions who may have been overlooked in the first step, we modeled the joint distribution of the three HFLIs using a mixture of 3D Gaussian functions and applied a robust consensus clustering approach. 35
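A laterality index of this kind is conventionally computed as LI = 100 × (L − R)/(L + R), where L and R summarize activation in the left and right hemispheres, so that values run from −100 (fully rightward) to +100 (fully leftward). The sketch below is a simplified stand-in and not the LI-toolbox algorithm, which uses a more elaborate bootstrap procedure: it assumes L and R are sums of suprathreshold t-values within hemispheric masks, and then applies the production-task cutoff of −50 described above. The function names and toy t-values are hypothetical.

```python
def laterality_index(left_t, right_t, t_threshold=0.0):
    """Simplified laterality index in [-100, 100].

    left_t / right_t: voxel t-values inside left and right hemisphere masks.
    Only suprathreshold t-values contribute (a simplification; this is not
    the LI-toolbox bootstrap procedure).
    """
    left = sum(t for t in left_t if t > t_threshold)
    right = sum(t for t in right_t if t > t_threshold)
    if left + right == 0:
        raise ValueError("no suprathreshold signal in either hemisphere")
    return 100.0 * (left - right) / (left + right)


def is_strongly_right_lateralized(hfli_production, cutoff=-50.0):
    """Stringent RHLD criterion on the production task: HFLI < -50."""
    return hfli_production < cutoff


# Hypothetical toy t-values for one subject's "sentence vs word-list" map.
left_t = [1.0, 2.0, 0.5, -1.0]
right_t = [4.0, 5.0, 3.0, 0.2]
hfli = laterality_index(left_t, right_t)
print(round(hfli, 1), is_strongly_right_lateralized(hfli))
```

Here the right hemisphere carries most of the suprathreshold signal, so the index falls below −50 and the toy subject meets the stringent criterion.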

| GOAL
Sixteen RHLD participants were selected from a larger data set of healthy left-handers (N = 250) 37 that was first screened with a behavioral visual half-field task to identify likely RHLD subjects; RHLD status was then confirmed using fMRI to calculate Global HFLIs based on a language production task. 2 Participants were asked to covertly generate as many words as possible beginning with a letter presented in the middle of the screen for 15 seconds. Ten different letters were presented in randomized order. The baseline condition consisted of […] The 16 strongly right-lateralized individuals all met a stringent criterion for RHLD (HFLI for language production < −50). 2 Twelve controls were recruited separately, and their language lateralization was assessed using the same fMRI paradigm. The 12 controls each had a strongly leftward HFLI (>50). HFLI distributions for RHLD and control subjects are shown in Figure 2, and information on sex is given in Table […]

Sequencing was performed at an average coverage depth of 20×, with 90 base pair (bp) paired-end reads for 11 of the RHLD subjects and 14 controls, and 150 bp paired-end reads for six RHLD subjects and eight controls. Raw reads were cleaned by excluding adapter sequences, reads with low-quality bases over more than 50% of their length, and reads with unknown bases over more than 10% of their length. Clean reads were mapped onto the human reference genome (hg19) using the software Burrows-Wheeler Aligner. 39 BAM files were sorted using SAMtools v1.2 40
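The read-cleaning rules above can be expressed as a simple per-read predicate. The sketch below is an illustration, not the sequencing provider's pipeline: it assumes Phred+33 quality encoding and a hypothetical low-quality cutoff of Q < 20 (the actual cutoff is not stated), and it omits adapter removal. A read with exactly 50% low-quality bases or exactly 10% unknown bases is kept, since only reads exceeding those fractions are excluded.

```python
def phred33_scores(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def keep_read(seq, qual_string, low_q=20, max_low_frac=0.5, max_n_frac=0.1):
    """Return True if a read passes the low-quality and unknown-base filters.

    low_q=20 is a hypothetical cutoff. Reads are dropped when MORE than
    max_low_frac of bases are low quality or MORE than max_n_frac are 'N'.
    """
    scores = phred33_scores(qual_string)
    if not seq or len(seq) != len(scores):
        raise ValueError("sequence and quality must be non-empty and of equal length")
    n = len(seq)
    low_frac = sum(1 for q in scores if q < low_q) / n
    n_frac = seq.upper().count("N") / n
    return low_frac <= max_low_frac and n_frac <= max_n_frac


good_q = "I" * 10                        # 'I' encodes Q40 in Phred+33
print(keep_read("ACGTACGTAC", good_q))   # passes both filters
print(keep_read("ACGTNNGTAC", good_q))   # 20% unknown bases: dropped
```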

| Stratification and inbreeding
Within the BIL&GIN and GOAL data sets separately, population structure was assessed by calling genotypes from the sequence data for variants in hg19.vcf.gz with minor allele frequencies (MAFs) > 10% in each data set, 38 which had been pruned to low linkage disequilibrium (LD) with one another using the program PLINK (v1.9) (maximum LD r² = 0.2). 44,45 Multidimensional scaling was used to visualize the major dimensions of genome-wide variability (Figure S1). None of the first five dimensions was associated with RHLD vs control status in either data set (all |T| < 1, P > .33). Inbreeding was assessed using the F coefficient estimated within each data set with PLINK (v1.9). 45 This measure was not associated with RHLD vs control status in either data set (both |T| < 1, P > .39).
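The inbreeding coefficient reported by PLINK's --het option is a method-of-moments estimate, F = (O − E)/(N − E), where O is the observed number of homozygous genotypes, E the number expected under Hardy-Weinberg equilibrium given the allele frequencies, and N the number of non-missing genotypes. The sketch below implements that formula directly for one individual; PLINK's implementation may differ in details (for example, in how allele frequencies are estimated), and the toy genotypes are hypothetical.

```python
def inbreeding_f(genotypes, allele_freqs):
    """Method-of-moments inbreeding coefficient for one individual.

    genotypes: alternate-allele counts per SNP (0, 1, 2), or None if missing.
    allele_freqs: population frequency of the alternate allele at each SNP.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_nonmissing = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue                                # skip missing genotypes
        n_nonmissing += 1
        if g != 1:                                  # 0 or 2 = homozygous
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)   # HWE homozygosity: 1 - 2pq
    if n_nonmissing == expected_hom:
        raise ValueError("F is undefined when every SNP is expected homozygous")
    return (observed_hom - expected_hom) / (n_nonmissing - expected_hom)


freqs = [0.5, 0.5, 0.5, 0.5]
print(inbreeding_f([0, 1, 2, 1], freqs))   # as homozygous as expected: 0.0
print(inbreeding_f([0, 2, 0, 2], freqs))   # fully homozygous: 1.0
```

Positive values indicate excess homozygosity relative to HWE expectations, and negative values indicate excess heterozygosity.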
Note that common genetic variants were only used for the purposes of assessing population stratification and inbreeding within the data sets, whereas the rest of the study was focused on rare genetic variation, which has the potential to involve highly penetrant effects.

| Annotation of SNPs and indels
SNPs and indels were annotated using Annovar 46 and Variant Effect Predictor (v88). 47 In the genome, nonsynonymous protein-coding variants and variants affecting splice donor and acceptor sites are, a priori, the most likely to grossly alter gene function. Accordingly, Gemini (v.20.0) 48 was used to select protein-coding variants with "MEDIUM" or "HIGH" impact severity annotations, as well as noncoding variants with "HIGH" impact severity annotations (in practice, those altering splice donor or acceptor sites). Additional filtering was performed in R and comprised the removal of "MEDIUM" variants with a PolyPhen 49 prediction of "benign". MAF information was assigned as the maximum MAF across the gnomAD (v1), ExAC (v3), 1KG, and ESP data sets (ie, "max_aaf_all" in Gemini), which together comprise whole-exome or whole-genome data from more than 120 000 people from various population data sets 50 (http://evs.gs.washington.edu/EVS/, http://www.internationalgenome.org/home).
Within the BIL&GIN and GOAL data sets separately, any variant present in at least 19 participants (cases or controls) was excluded, as such variants are likely to be platform-specific errors or common variants not previously detected by other sequencing platforms or protocols. These variants would also necessarily be present in at least two control subjects in BIL&GIN or three controls in GOAL, and hence are unlikely to be high-penetrance mutations for RHLD.
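The variant-selection rules described above (impact severity, PolyPhen, population frequency, and recurrence within a data set) can be sketched as a single predicate. The record fields below are hypothetical names that mirror the annotations in the text rather than Gemini's actual schema, and the rarity threshold `max_aaf` is a hypothetical parameter because this section does not state the value used. The pigeonhole helper reproduces the reasoning that a variant carried by 19 participants must include at least two controls in BIL&GIN (17 RHLD cases, that is, 33 in total minus 16 in GOAL) and three in GOAL (16 cases).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    # Hypothetical field names mirroring the annotations described in the text.
    gene: str
    severity: str              # impact severity: "HIGH", "MEDIUM" or "LOW"
    protein_coding: bool
    polyphen: Optional[str]    # PolyPhen prediction, e.g. "benign", or None
    max_aaf_all: float         # maximum population allele frequency across databases
    n_carriers: int            # participants (cases or controls) carrying the variant


def passes_filters(v: Variant, max_aaf: float, max_carriers: int = 18) -> bool:
    """Apply the impact, PolyPhen, rarity and recurrence filters to one variant.

    max_aaf is a hypothetical rarity threshold (not stated in this section).
    Variants carried by 19 or more participants are dropped.
    """
    if v.protein_coding:
        if v.severity not in ("HIGH", "MEDIUM"):
            return False
        if v.severity == "MEDIUM" and v.polyphen == "benign":
            return False
    elif v.severity != "HIGH":     # noncoding: keep only HIGH (splice sites)
        return False
    if v.max_aaf_all > max_aaf:
        return False
    return v.n_carriers <= max_carriers


def min_control_carriers(n_carriers: int, n_cases: int) -> int:
    """Pigeonhole bound: carriers beyond the number of cases must be controls."""
    return max(0, n_carriers - n_cases)


print(min_control_carriers(19, 17), min_control_carriers(19, 16))  # 2 3
```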

| Gene-level testing
The BIL&GIN and GOAL data sets were combined for subsequent analysis.
We first verified that the total number of mutated genes per subject did not differ significantly between RHLD and control subjects. […] We performed a post hoc filtering step in which we further excluded from consideration, as potential monogenic effects, all genes mutated in at least one control subject, as these genes were unlikely to be monogenic causes of RHLD. Note that this filter was applied only after the statistical analysis, so as not to bias the multiple testing correction.

| Mutational load in gene sets
We tested whether the RHLD cases had an increased mutational load in specific candidate gene sets (see the Introduction for the rationale; Table S2). Only gene sets comprising at least 10 genes were considered.
To test for an increased mutational load within a given gene set in RHLD, the number of mutated genes (as defined above) within the set was summed per subject and compared between RHLD subjects and controls using a one-tailed exact binomial test. The test took as inputs the sum of mutated genes in RHLD subjects only, the total sum across RHLD subjects and controls combined, and the proportion of all subjects who were RHLD (33/67). Again, as an exact test, the binomial test remains valid at this sample size and does not require assumptions about the number of mutations per individual.
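The load test described above maps directly onto a binomial upper-tail probability: successes are the mutated-gene total in RHLD subjects, trials are the combined total, and the success probability is the proportion of subjects who are cases. The sketch below computes that tail exactly with rational arithmetic from the Python standard library; the function names and the per-subject counts in the usage lines are hypothetical, and this is an illustration of the test rather than the authors' code.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(k: int, n: int, p: Fraction) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), using rational arithmetic."""
    tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return float(tail)


def gene_set_load_test(case_counts, control_counts) -> float:
    """One-tailed exact binomial test for excess mutational load in cases.

    case_counts / control_counts: number of mutated genes in the gene set
    for each subject. Successes = case total, trials = combined total,
    p = proportion of subjects who are cases (33/67 in this study).
    """
    k = sum(case_counts)
    n = k + sum(control_counts)
    p = Fraction(len(case_counts), len(case_counts) + len(control_counts))
    return binomial_upper_tail(k, n, p)


# Hypothetical per-subject counts for one gene set (33 cases, 34 controls).
cases = [1] * 12 + [0] * 21
controls = [1] * 3 + [0] * 31
print(f"one-tailed P = {gene_set_load_test(cases, controls):.4f}")
```

Using `Fraction` keeps the tail sum exact, which avoids rounding concerns for small tail probabilities at this sample size.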

| Association with handedness within the UK Biobank
Because the large majority of people with RHLD are left-handed, any monogenic contributions to RHLD would likely also be strongly penetrant for left-handedness. We therefore checked whether a specific mutation of interest in the gene TCTN1, rs188817098, which we initially considered a potential candidate for causing RHLD in some subjects (see Results), is also associated with handedness in the UK Biobank cohort data. There were 330 474 subjects (32 367 left-handed) available for this analysis. In this data set, rs188817098 had been directly genotyped and was in Hardy-Weinberg equilibrium (P = 1), and the minor allele C had a frequency of 0.001305. Handedness (UK Biobank field ID: 1707.0.0) was self-reported and coded for the present purposes as "left-handed" or "right-handed", as described elsewhere. 54 We performed association analysis of rs188817098 with handedness using the program BOLT-LMM (v2.3), which uses linear mixed-effects regression under an additive genetic model. 55 The top 40 principal components capturing genetic diversity in the genome-wide genotype data, calculated using fastPCA 56 and provided by the UK Biobank, 57 were included as covariates to control for population structure, as well as sex, age, genotyping array, and assessment center. The UK Biobank data were obtained as part of research application 16 066, with Clyde Francks as the principal applicant. The data collection for the UK Biobank has been described elsewhere. 58 Informed consent was obtained by the UK Biobank for all participants.
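Under Hardy-Weinberg equilibrium, a minor allele frequency q implies expected genotype proportions (1 − q)², 2q(1 − q) and q². The sketch below applies this to the figures given above (330 474 subjects, minor allele frequency 0.001305) to indicate roughly how many carriers of the rs188817098 minor allele the association analysis could draw on. It is illustrative arithmetic under the HWE assumption, not output from the UK Biobank data, and the function name is hypothetical.

```python
def hwe_expected_counts(n_subjects: int, minor_allele_freq: float):
    """Expected genotype counts (major hom, het, minor hom) under HWE."""
    q = minor_allele_freq
    p = 1.0 - q
    return n_subjects * p * p, n_subjects * 2 * p * q, n_subjects * q * q


hom_major, het, hom_minor = hwe_expected_counts(330_474, 0.001305)
print(f"expected heterozygotes ~ {het:.0f}, minor homozygotes ~ {hom_minor:.2f}")
```

With these inputs, roughly 860 heterozygous carriers and fewer than one minor-allele homozygote are expected.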

| Monogenic mutational models
We focused on mutations in the 33 RHLD cases that are known to be relatively rare in the general population on the basis of large-scale genetic databases and that are predicted to disrupt protein sequence, while not being mutated in the set of 34 control subjects (see Methods). As noted above, a given gene would need to be a monogenic cause in at least 10 or 11 of the 33 RHLD cases in this study, and not mutated in controls, to be detected at a significant level after multiple testing correction. No gene met this threshold under either the dominant or the recessive model. Under the recessive model, no gene was even nominally significant (ie, unadjusted P < .05), a level that could have been reached by a gene mutated in as few as five RHLD cases and no controls.
In the dominant model, TCTN1 was the only nominally significant gene (P = .0267 before multiple testing correction), with five RHLD cases and no controls having heterozygous mutations (Table 2).

| Gene-set analysis
We analyzed a small number of candidate gene sets involved either in visceral laterality or in the actin cytoskeleton (see Introduction for the rationale). We observed an enrichment of mutations within the "actin cytoskeleton" (GO:0015629) gene set, which comprises 205 human genes (Table S1; Table 3 and Figure S3). This suggests that individuals with RHLD have a significant enrichment of rare, disruptive mutations in genes involved in actin cytoskeleton structure and function.
In contrast, no differences were found between participants with RHLD and controls for the GO sets "cilium" (GO:0005929) or "left-right axis specification" (GO:0070986), for sets defined on the basis of visceral laterality phenotypes or disorders, 18,20 or for the set of 18 candidate genes that have been tentatively associated with human brain laterality in previous studies (Table 3), consistent with language dominance being largely or wholly independent of these pathways and gene sets.
We investigated subsets of genes belonging to specific components of the actin cytoskeleton, including "actin filament" (GO:0005884), "myosin complex" (GO:0016459), and "cortical actin cytoskeleton" (GO:0030864), but saw no significant increase in mutation rates in RHLD within these sets (Table S2). This may indicate that the subsets of actin cytoskeleton genes most relevant to lateralized brain development have not been defined within the GO. […] (Table S3). This pattern indicates that left-handedness without RHLD is not linked to an increased rate of mutations in actin cytoskeleton genes, and that the tentative increase was a specific property of the RHLD subjects.
Per-data-set analysis showed that the increased mutational load in the actin cytoskeleton gene set was mostly driven by the BIL&GIN data set (P = .0006); the effect was not significant in the GOAL data set (P = .4), despite a similar trend toward increased mutational load in RHLD cases (Table S4, Figure S3).

| DISCUSSION
Laterality is an important feature of the human brain's structural and functional organization. 14,61,62 Despite this, very little is known of the genetic contributions to typical brain laterality and its variation. In the present study, we performed the first molecular genetic investigation of RHLD, a trait which is present in only roughly 1% of the population.
We focused on relatively rare coding variants that are predicted to disrupt protein function. In this study, a highly penetrant gene mutated in roughly one-third of the RHLD cases, and in no controls, could have been detected at a significant level after adjusting for multiple testing. This corresponds to a level of genetic heterogeneity similar to that found in situs inversus of the visceral organs when it occurs together with PCD, for which up to roughly one quarter of cases are due to mutations in a single gene, DNAH5. 19
However, in the present study, we found no individual gene mutated in RHLD at this level. It remains possible that some monogenic causes of RHLD were present in our data set but could not be distinguished with the present sample size. Note that the sample size precluded an investigation of common genetic effects with low penetrance, that is, the kinds of effects tested in typical genome-wide association studies of common traits. The approach here was necessarily focused only on rare variants, which might sometimes act as highly penetrant mutations. Nonetheless, our data suggest that substantial genetic heterogeneity is likely to be involved in any heritable contribution to RHLD, even if some individual effects might be strongly penetrant. As noted in the Introduction, non-leftward language dominance has previously been shown to have a heritability of roughly 30%, although the trait definition in that study included ambilateral individuals in addition to RHLD. 10 As RHLD is mostly found in left-handed people, 3 […] In the present study, candidate genes that have been tentatively associated with human brain laterality in previous studies showed no evidence for an increased mutation load in RHLD. The only gene among these with more mutations in RHLD cases than in controls was AR (eight in RHLD cases, six in controls). For most of these genes, there is no clear mechanism that might link them to left-right axis determination through chiral properties.
We also found no evidence that candidate gene sets involved in visceral laterality or PCD have an enrichment of rare, protein-altering mutations in RHLD. This finding is consistent with the fact that people with situs inversus of the viscera, when it occurs together with PCD, have shown normal population rates of left-handedness and left-hemisphere language dominance. [21][22][23] Therefore, there appears to be a developmental disconnect between nodal-ciliary-induced visceral laterality and the functional brain lateralities for hand dominance and language. This suggests that at least some aspects of human functional brain laterality arise from an independent and unknown mechanism, which may be brain-intrinsic.
A molecular-developmental pathway for laterality in the zebrafish brain has been relatively well described, but it appears to take its original cues from the nodal-visceral pathway, so its relevance for human functional brain laterality is unclear. 63,64 A relatively small-scale genome-wide association study in humans reported that genes involved in visceral laterality showed an enrichment of association signals with left-vs-right hand motor skill, 65 but a much larger study of binary-trait handedness in the UK Biobank data set, based on roughly 350 000 subjects, found no genetic link between handedness and visceral asymmetry genes. 30 Early life factors, including birth weight, twinning and breastfeeding, can also influence handedness, but their effects are far too small to be predictive at the individual level. 54 Intriguingly, situs inversus of the visceral organs may be associated with left-handedness when it is not due to mutations affecting the nodal-ciliary pathway, 23 although no causal genes were identified in a recent study that investigated the trait combination of situs inversus and left-handedness without PCD. 66 Here, we found initial evidence that people with RHLD have an elevated rate of rare, protein-altering mutations in genes involved in the structure and function of the actin cytoskeleton. This effect was robust to the use of either left- or right-handed control groups, and was therefore a specific property of the RHLD subjects in this data set, rather than of left-handedness in general. We speculate that functional language laterality may be grounded in an evolutionarily ancient mechanism for inducing organ-intrinsic left-right morphogenesis, which can be traced back to the ancestral bilaterians and which arises from fundamental aspects of cellular biology and mechanics. 24,25,27 Developmental studies will be needed to assess whether cellular chirality is transiently present before asymmetric embryonic development of the mammalian brain.
An understanding of how mutations of actin cytoskeleton genes might affect such a process will depend on detailed analysis of cellular models. An increased load of heterozygous mutations in genes affecting the actin cytoskeleton might affect brain laterality, while being otherwise well tolerated during development, due to compensation by non-mutated alleles at most of the genes involved. Given that common variants of the microtubule-associated gene MAP2 have recently been associated with left-handedness by large-scale GWAS, 30,31 our findings here in relation to RHLD may be broadly concordant, insofar as they also implicate the cytoskeleton in the developmental origins of human brain laterality.
The possible link of RHLD to actin cytoskeleton genes will need to be replicated in larger independent data sets. Within the present study, we combined the BIL&GIN and GOAL data sets to maximize the power to detect genetic effects on RHLD, although the functional tasks used to define RHLD differed between these two data sets: hemispheric dominance was defined using a contrast at the sentence level in BIL&GIN, and a word-level contrast in GOAL (see Methods).
However, we are not aware of any existing or ongoing large-scale data collection in which a harmonized phenotypic measure of RHLD will become available and which would be well powered for GWAS.
Given the sample size for the present study, we focused on rare, protein-altering mutations which had the potential to be highly penetrant effects. Whole genome sequence data, of the type produced in the present study, also contain information on noncoding variation.
Rare noncoding variation has recently been implicated in neurodevelopmental disorders such as autism, 67,68 and a significant fraction of this variation is potentially important for gene function and regulation. 69 Noncoding sequence makes up 98% of the genome, and interpreting the variation within these regions is challenging. Several attempts have been made to rank potentially causative variants across the genome using scores that integrate different types of information, including DNA sequence conservation, regulatory information, 70 and population genomic data. These ranking approaches include CADD, 71 DANN, 72 GWAVA, 73 M-CAP, 74 MetaSVM 75 and REVEL. 76 However, they are not very concordant with one another. 69 Moreover, these methods rely on assumptions about the deleteriousness or pathogenicity of variants, so the overall approach is not an obvious fit for a non-pathogenic trait such as RHLD. We therefore did not investigate noncoding variation, which must await larger sample sizes and an improved understanding of the role of rare noncoding variation in non-disease phenotypic variation.
Data sets based on hundreds of thousands of participants, such as the UK Biobank, 77 permit estimation of how much of the variance in brain traits can be explained by common genetic variants, and the detection of genetic loci with very small effect sizes. However, such large data sets usually come at the expense of detailed and accurate phenotypic characterization. Structural 78 or resting-state-derived indices 79 that correlate with language lateralization may offer alternative ways to study RHLD in large data sets, but these approaches will always be indirect.
Hence, the approach taken in the present study is complementary to large-scale studies. We expect that convergent evidence arising from different strategies will help us better understand the biological underpinnings of language lateralization.

ACKNOWLEDGMENTS
The authors thank all the study participants.