Non‐coding regulatory elements: Potential roles in disease and the case of epilepsy

Non‐coding DNA (ncDNA) refers to the portion of the genome that does not code for proteins and accounts for the greatest physical proportion of the human genome. ncDNA includes sequences that are transcribed into RNA molecules, such as ribosomal RNAs (rRNAs), microRNAs (miRNAs), long non‐coding RNAs (lncRNAs) and un‐transcribed sequences that have regulatory functions, including gene promoters and enhancers. Variation in non‐coding regions of the genome has an established role in human disease, with growing evidence from many areas, including several cancers, Parkinson's disease and autism. Here, we review the features and functions of the regulatory elements that are present in the non‐coding genome and the role that these regions have in human disease. We then review the existing research in epilepsy and emphasise the potential value of further exploring non‐coding regulatory elements in epilepsy. In addition, we outline the most widely used techniques for recognising regulatory elements throughout the genome, current methodologies for investigating variation and the main challenges associated with research in the field of non‐coding DNA.


INTRODUCTION
Why do we not have a better understanding of disease biology? What generates wide phenotypic variation in a disease even when the major components of the disease are readily recognisable? Does this heterogeneity arise from environmental factors or genetic background, or both? In the case of brain disorders, the brain is a complex organ with both environmental and genetic determinants of structure, function and disease. Focusing on internal determinants, the complexity of the brain is intuitively linked to the number and connectivity of differentiated cell types that express cell-specific genes, exhibit unique properties and perform specialised functions [1].
At the genomic level, complexity is not simply determined by the expression of cell-specific protein-coding genes but may relate to the way these genes are regulated, with a significant role for non-coding DNA (ncDNA), amongst other influences.
ncDNA refers to the portion of the genome that does not code for proteins and accounts for the greatest proportion of the human genome. Indeed, it is estimated that only about 2% of the human genome encodes proteins, with the rest being non-protein-coding [2].
Of this, the exact percentage carrying functional properties is yet to be clarified, with different estimates being proposed [3]. Historically, a large percentage of the non-coding genome was referred to as 'junk DNA', due to the prevailing sentiment at the time that it lacked any functional relevance and was essentially useless [4][5][6]. This idea has persisted, predominantly due to the difficulty of investigating such a large and complex field, and the lack of appropriate techniques. In recent years, improvements in sequencing technologies, expression assays and advances in data handling and analysis have made it possible to study the ncDNA and have led to a greater appreciation of its role in human health and disease.
The biology of ncRNAs and their roles in disease are not covered in this review (see previous reviews [7,8]). Further, the 5′ and 3′ untranslated regions (UTRs), located at either end of the mRNA, also fall into this category of ncDNA. UTRs are crucial for the regulation of protein expression. For example, the translation of upstream open reading frames (uORFs) within the 5′ UTR is known to be a common mechanism for controlling the level of protein production downstream, according to a general model based on competition for ribosome binding [9]. Due to their role in mRNA translation initiation, the question arises whether these regions should be classified as coding DNA rather than non-coding DNA. Therefore, despite the importance of UTRs in the regulation of gene expression, due to the ambiguity in their definition, UTRs will not be discussed further. ncDNA also contains un-transcribed regions that function as regulatory elements, which represent the main focus of this review. These include gene promoters and transcription factor binding sites, enhancers, transposable elements (TEs) and topologically associating domain (TAD) boundaries (Figure 2). ncDNA and variation in non-coding regions of the genome have an established role in human disease, with evidence in cancers, Parkinson's disease and autism, amongst others [10][11][12][13].
There is still an open question about the causes of the wide phenotypic variability that characterises many epilepsies with a known genetic cause; ncDNA may contribute to this variability [14]. Non-coding regulatory regions may harbour variations that influence gene expression, have a disease-modifying effect, or influence treatment response. Here, we consider the evidence supporting the role of ncDNA in epilepsy. The regulatory elements that are present in the non-coding genome will be described and the role that these regions have in human disease will be discussed. Examples of the importance that non-coding regulatory regions have in human diseases will be reported, and the existing research in the field of epilepsy will be reviewed. We make the case for further research to appreciate the potential value of non-coding regulatory elements in epilepsy. In addition, this review will also outline the most widely used techniques for recognising regulatory elements throughout the genome, the main methodologies for investigating variation and the main challenges associated with research in the field of ncDNA.

Key points
• Non-coding regulatory elements: description and role in disease.
• Techniques for the identification of non-coding regulatory elements.
• Methodologies to investigate non-coding variation and challenges.
• Existing research in the field of epilepsy.

NON-CODING REGULATORY REGIONS
One of the most important classes of non-coding regulatory elements of the genome is the gene promoter, which is essential for determining the direction of transcription, indicating the sense strand of the DNA and regulating gene expression. The gene promoter is located upstream of, and partially overlapping with, the transcription start site (TSS) of the gene it regulates, thus occupying the first part of the 5′ UTR region, as shown in Figure 2A [15]. The minimal portion of the promoter required to initiate transcription is called the core promoter; it spans between 60 and 120 base-pairs and represents the transcriptional machinery assembly site [16][17][18]. The core promoter contains the RNA polymerase binding site, the TSS and optional motifs, including the Goldberg-Hogness box (commonly called the TATA box), the Initiator element (Inr), the downstream promoter element (DPE) and the TFIIB recognition element (BRE) [19]. Such optional motifs may extend downstream of the TSS: for example, the DPE motif, when present, is located 28-33 nucleotides after the TSS (Figure 2A) [20].
Beyond the core promoter is the proximal promoter, located 250 bp upstream of the TSS, and usually extending up to 1000-2000 bp [21].
The proximal promoter contains binding sites for both general and sequence-specific transcription factors [22].
The activity of promoters is influenced by additional regulatory sequences that can modulate the expression of genes from a genomic location even further away. These elements are called enhancers, DNA sequences that range from 50 to 1500 bp in length, to which proteins called activators and repressors can bind. The interaction of the enhancer and the activator/repressor results in the creation of a chromatin loop that can shift the enhancer closer to the gene promoter and allows mediator proteins to be recruited. Mediator proteins either promote or prevent the binding of RNA polymerase, resulting in promotion or repression of the target gene expression. Genes can be modulated by different enhancers, and each enhancer can modulate multiple genes [15]. Enhancers may be located thousands of base pairs away from the target gene, either upstream or downstream of the TSS [15]. The interaction between enhancer and the target promoter occurs through chromatin loops and is supported by proteins called cohesins. Chromatin loops represent non-random three-dimensional folds of chromatin that generate physical interactions between distantly located genetic sequences, including long-range interactions between regulatory sequences and the corresponding target genes [23]. Clusters of multiple enhancers may occur in the genome: these typically exhibit similar activity and regulate the same genes. Such redundancy of enhancers may be crucial, especially during development, to provide robustness in case of loss-of-function mutations and to ensure the correct spatiotemporal expression of target genes, necessary to guide development [24][25][26]. Protein-coding gene redundancy has mostly been lost during evolution, and it is possible that today's redundancy involves regulatory elements rather than protein-coding genes in order to achieve differential gene expression in various tissues with the least amount of 'space' in the genome [27].
FIGURE 1 RNA types and functions. Protein-coding genes are transcribed as pre-mRNAs, which undergo post-transcriptional modifications, becoming mature mRNAs. Among the post-transcriptional modifications of pre-mRNAs is the removal of introns (splicing), which occurs through snRNAs. snRNAs, which guide the splicing process, can be divided into two classes: one class never leaves the nucleus, while another class undergoes post-transcriptional modifications in the cytoplasm, before re-entering the nucleus and being functional. The process of splicing may lead to the formation of circRNAs. circRNAs have their 5′ and 3′ ends bound together and are involved in the regulation of alternative splicing of the same genes from which they derive. circRNAs can also interact with miRNAs and inhibit their activity. Mature mRNAs exit the nucleus and reach the cytoplasm, where they are translated into proteins. mRNA translation involves ribosomes, macromolecules composed of proteins and rRNAs. rRNAs are initially transcribed as pre-rRNAs and undergo post-transcriptional modifications that involve snoRNAs. snoRNAs guide the post-transcriptional modifications of rRNAs, snoRNAs and tRNAs. tRNAs are also involved in the mRNA translation process: tRNAs recognise specific mRNA codons and carry the corresponding amino acids to the protein synthesis site. Translation of mRNAs into proteins may be prevented by miRNAs. miRNAs are transcribed in the nucleus as pre-miRNAs, which exit the nucleus and undergo cleavage steps, becoming functional. miRNAs show complementarity with a target mRNA, bind to these target mRNAs and induce their cleavage. Another class of RNA are lncRNAs, which have multiple functions, including interacting with DNA and recruiting regulatory proteins to modulate histone modification. piRNAs are transcribed as precursor molecules, which exit the nucleus and interact with the regulatory proteins PIWI (abbreviation of P-element Induced WImpy testis in Drosophila). The piRNA-PIWI complex is involved in stem cell differentiation and silencing of transposable elements, acting both at the transcription level, by silencing the gene, and post-transcription level, inducing the cleavage of mRNA.
Enhancer functionality may be limited by TAD boundaries (or insulator elements), which represent another class of DNA regulatory elements that are capable of blocking the physical interaction between enhancer and gene promoter [28]. TAD boundaries also function as a chromatin barrier: these regions interact with cohesins and CCCTC-binding factor (CTCF), a transcriptional repressor protein, forming a complex that constitutes a physical impediment to prevent the excessive spread of heterochromatin [28].
TEs represent another class of ncDNA elements involved in regulatory control. TEs are capable of altering their position in the genome and can be divided into two different categories, based on the mechanism of transposition. Retrotransposons, TEs of Class 1, use a 'copy and paste' mechanism: through reverse-transcription and the production of an RNA intermediate, they introduce a new copy of themselves to a different genetic location. Transposons, Class 2 TEs, use a 'cut and paste' mechanism: their sequence includes the genetic code for the transposase enzyme, which they use to excise themselves from one genetic locus and integrate into a different one [29].
Most copies of TEs in our genome have lost the ability to mobilise due to mutations and now have a fixed genetic location. TEs that are still mobile within an individual's genome mobilise predominantly in germ cells and during early embryogenesis [30,31]. Since TEs often include TSSs and other regulatory sequences in their own sequence, their mobilisation has contributed to the formation of novel tissue-specific promoters and transcription factor binding sites (TFBSs), which now have a role in ensuring the correct spatiotemporal gene expression patterns during development [32,33]. Furthermore, TE mobilisation also occurs in somatic cells.
This happens in the brain, particularly in the hippocampus, where the somatic transposition of the human long interspersed nuclear element-1 (LINE-1, also called L1) in neural precursor cells contributes to neuronal hippocampal diversity [31,34]. However, TE insertion into the genome may also have a deleterious effect and cause disease. Examples include haemophilia A, the first disease in which an association with TEs was proven, and neurofibromatosis type 1 (NF1) [35,36]. Another example is Rett syndrome, which is a neurological condition caused by mutations in the methyl-CpG-binding protein 2 gene (MECP2). MECP2 is a regulator of L1 transposition, and patients with Rett syndrome, carrying a mutated MECP2, show increased L1 mobilisation, which possibly contributes to the Rett phenotype [37,38]. The regulation and silencing of TE transposition are complex and rely on several elements: piRNAs, which interact with PIWI proteins and drive repressive chromatin marks on the promoter region of TEs, and zinc finger proteins containing the Krüppel-associated box (KRAB-ZFPs), which bind to the TE sequence and recruit additional proteins, ultimately adding repressive chromatin marks to the promoter region of the TE [39,40].
FIGURE 2 Schematic representation of non-coding regulatory elements. (A) Gene promoters represent the site where the transcriptional machinery assembly occurs, and transcription factors interact to regulate the expression of genes. Gene promoters include the core promoter, the minimal required portion to initiate the transcription, and the proximal promoter. (B) Enhancers are regulatory elements that can be bound by activators and repressors and modulate the expression of target genes. (C) Topologically associating domain (TAD) boundaries may prevent the physical interaction between enhancers and the target genes. (D) Transposable elements are regulatory sequences capable of altering their position in the genome. Retrotransposons use a 'copy and paste' mechanism and introduce a copy of themselves to a different genetic location; transposons use a 'cut and paste' mechanism: they excise themselves from one genetic locus and integrate into a different one. Translocation of transposable elements (TEs) occurs predominantly in germ cells and during early embryogenesis, when TEs contribute to defining the temporal expression of genes. BREu: upstream TFIIB Recognition Element, BREd: downstream TFIIB Recognition Element, TATA box, Inr: Initiator element, DPE: downstream promoter element, TSS: transcription start site, gDNA: genomic DNA.

NON-CODING VARIATION
The term 'non-coding variation' refers to genetic changes that occur within non-coding regions of the genome. Non-coding variation may be represented either by single-nucleotide polymorphisms (SNPs) or structural variations, including short insertions or deletions (collectively called InDels), copy-number variations (CNVs) and repeat expansions.
SNPs, described as single nucleotide substitutions at specific positions of a DNA sequence, represent the most common type of DNA variation. It is estimated that 90% of disease-associated SNPs fall in non-coding sequences of the genome [41]. However, for most of the millions of SNPs that have been identified in ncDNA by the Human Genome Project and investigated in genome-wide association studies (GWAS), we do not yet understand the functional implications.

Structural variations include small insertions and deletions (defined as the loss and gain of sequences up to 1 kb in length), duplications, inversions, translocations and CNVs, which represent duplications and deletions of sequences greater than 1 kb in length [42,43]. CNVs have been observed throughout the genome, but interestingly not all chromosomes are affected equally: some chromosomes typically have large numbers of CNVs (such as chromosomes 19 and 22), whereas other regions are described as CNV-deserts, often being devoid of CNVs [44,45].
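The 1 kb size threshold described above can be written as a small rule. The following is a minimal sketch, assuming that each variant is summarised only by a type label and a length in base pairs; the function name and the labels are illustrative and are not taken from any variant-calling tool.

```python
def classify_by_size(kind: str, length_bp: int) -> str:
    """Label a variant using the 1 kb cut-off described in the text.

    kind: "deletion", "insertion" or "duplication" (hypothetical labels).
    length_bp: number of bases lost or gained.
    """
    if length_bp < 1:
        raise ValueError("length_bp must be positive")
    if kind not in {"deletion", "insertion", "duplication"}:
        raise ValueError(f"unsupported variant kind: {kind}")
    # Insertions and deletions of up to 1 kb are grouped as InDels.
    if kind in {"deletion", "insertion"} and length_bp <= 1000:
        return "InDel"
    # Deletions and duplications longer than 1 kb are treated as CNVs.
    if kind in {"deletion", "duplication"} and length_bp > 1000:
        return "CNV"
    # Anything else (for example a long insertion) is left unclassified here.
    return "other structural variation"
```

Inversions and translocations are deliberately left out, because they cannot be described by a length alone.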
Another type of structural variation is the repeat expansion, in which a repeated DNA sequence increases in copy number: the size of the repeat unit may vary from trinucleotide repeats to 12-nucleotide long sequences, and the number of times this sequence is repeated is also variable [46,47]. Examples of diseases caused by repeat expansions occurring in non-coding sequences include Friedreich's ataxia, amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) [48,49]. Non-coding repeat expansions have also been found in some epilepsies, including progressive myoclonus epilepsy of the Unverricht-Lundborg type (EPM1), associated with the expansion of a dodecamer repeat in the promoter region of the cystatin B gene (CSTB), and benign adult familial myoclonic epilepsy (BAFME), associated with intronic expansions of a five-nucleotide long sequence (TTTCA or TTTTA): expansions of this sequence have been identified in introns of different genes in different patients [50][51][52].
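To make the notion of repeat number concrete, the sketch below counts the longest uninterrupted run of a repeat unit in a sequence that is already known. It is an illustration only, assuming an error-free sequence string; it is not a method for calling expansions from sequencing reads, and the function name is hypothetical.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    # Match one or more consecutive copies of the motif (case-insensitive).
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(sequence.upper())]
    # Convert the longest run from bases to repeat units.
    return max(run_lengths, default=0) // len(motif)
```

For instance, a sequence containing four consecutive copies of TTTCA followed, after a short gap, by two further copies would give a value of 4 for the motif TTTCA.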

TECHNIQUES FOR IDENTIFYING REGULATORY ELEMENTS
In order to detect and analyse variation in non-coding regulatory elements, it is first necessary to localise these regions in the genome, as summarised in the pipeline in Figure 3. Several strategies, which are outlined in the following sections, can be used to achieve this (Table 1). The most reliable and accurate method for predicting the location of regulatory elements is to integrate data from multiple methods, as will be described in the 'Online databases' section.

Transcription factor binding site localisation
One strategy for identifying non-coding regulatory elements is to localise transcription factor binding sites (TFBSs) across the genome, which can be achieved using Chromatin immunoprecipitation coupled with massively parallel DNA sequencing (ChIP-seq), or by associating open-chromatin assays with computational footprinting methods [53][54][55]. Moreover, all possible binding sites of a particular transcription factor can be examined using position-weight matrix methods (PWM) [53,56].
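The position-weight matrix approach can be illustrated with a short sketch. It assumes a uniform background base frequency of 0.25 and a simple pseudocount, and the count matrix is invented for a four-base TATA-like motif purely for demonstration; it does not represent any published matrix, and all names are hypothetical.

```python
import math

# Invented per-position base counts for a toy TATA-like motif (not real data).
TOY_COUNTS = [
    {"T": 9, "A": 1},
    {"A": 9, "T": 1},
    {"T": 9, "A": 1},
    {"A": 9, "T": 1},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Calling `scan("GGTATACC", log_odds_pwm(TOY_COUNTS), 4.0)` reports a single candidate site starting at position 2 (0-based). In practice the threshold strongly affects the number of predicted sites, which is one reason such predictions are usually combined with experimental data.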
FIGURE 3 Flowchart of the steps required to study non-coding genetic variations. After DNA collection and whole-genome sequencing (WGS), the next step is the identification of non-coding regulatory elements, which can be achieved using different approaches, described in Table 1. In the subsequent discovery phase, variant calling tools are used to identify genetic variations in the cohort of interest. For the functional annotation of variants, one reliable strategy is to assess the functional consequences of variants using multiple techniques and compare these predictions to provide robustness to the interpretation. Finally, the functional consequences of variants are evaluated and then validated using wet-lab experiments. WGS: whole-genome sequencing.
Note: All strategies listed contribute to phase number 3 of the pipeline illustrated in Figure 3. TFBS: transcription factor binding site, TF: transcription factor.

Chromatin accessibility assays
Chromatin accessibility assays enable the prediction of the three-

Comparative genomics tools
Comparative genomics tools may be used to identify in silico conserved non-coding sequences, which may correspond to functional regions according to the evolutionary conservation principle [74]. The hypothesis is that vertebrates use the same regulatory sequences across phylogeny to control gene expression and, assuming that mutations in such sequences are deleterious or disadvantageous to the organism, these regions are likely to have remained stable and unmutated throughout evolution [75,76]. However, non-coding sequences are known to have a higher evolutionary turnover than protein-coding sequences, such that conservation per se is not a strong indicator of functional relevance but may be useful if combined with other types of data. Examples of comparative genomics and sequence conservation tools are the Basic Local Alignment Search Tool (BLAST), PhastCons and PhyloP [77][78][79][80][81].
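The idea of scoring conservation along an alignment can be sketched in a few lines. The example below computes a naive per-column identity and is far cruder than PhastCons or PhyloP, which rely on phylogenetic models; it assumes that the sequences are already aligned to equal length with '-' marking gaps, and the function names are hypothetical.

```python
from collections import Counter


def column_identity(aligned_seqs):
    """Fraction of sequences sharing the most common base in each column.

    Gaps ('-') are counted as mismatches.
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned_seqs):
        bases = Counter(base.upper() for base in column if base != "-")
        most_common = max(bases.values(), default=0)
        scores.append(most_common / len(column))
    return scores


def conserved_windows(scores, window, min_mean):
    """Start indices of windows whose mean identity is at least min_mean."""
    return [
        start
        for start in range(len(scores) - window + 1)
        if sum(scores[start:start + window]) / window >= min_mean
    ]
```

As noted above, a high score from a measure of this kind is at most suggestive and is better interpreted alongside other types of data.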

Online databases
The use of online databases is the most comprehensive and convenient method to identify regulatory regions throughout the genome because these integrate data from many of the assays described above to accurately locate functional elements. The main limitation of these databases is that the majority of the regulatory elements for which they provide information, especially enhancers, are simply putative, predicted regulatory regions, of which only a small portion have been experimentally validated [82]. Examples include the Encyclopedia of DNA Elements (ENCODE database), the Functional Annotation of the Mammalian Genome project (FANTOM5), the PsychENCODE Consortium and the machine learning tool RefMap [83,84].
All the methods and strategies described above contribute to phase 3 of the pipeline illustrated in Figure 3.

VARIANT ANNOTATION
Once the variants falling in regulatory regions of the genome have been identified, the next and most difficult step is variant annotation, that is, assigning a functional impact to each variant. This is a major challenge due to various factors, including the issue of identifying the target gene(s) regulated by a particular regulatory element [85]. Regulatory elements may modulate the expression of nearby genes, referred to as cis-regulation, or distantly located genes, called trans-regulation. A further complication is the presence of groups of alleles that co-occur and are co-inherited, a phenomenon known as linkage disequilibrium (LD).
LD structure complicates the functional evaluation of variants because, assuming a group of SNPs in LD, it is difficult to determine the actual causal variant that affects the phenotype [85,86]. Additionally, variants falling in regulatory regions are likely to have a small and quantitative functional effect, which is much more difficult to detect and interpret than the large qualitative consequences caused by many deleterious variants in protein-coding genes [87,88].
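The strength of LD between two variants is commonly summarised by the squared correlation r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch below computes this quantity, assuming phased haplotypes for two biallelic SNPs coded as 0/1; the function name and the input layout are illustrative.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: iterable of (a, b) pairs, where a and b are 0 or 1 and give
    the allele carried at SNP A and SNP B on the same haplotype.
    """
    haplotypes = list(haplotypes)
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom
```

When r² approaches 1, the two SNPs are almost always inherited together, so association data alone cannot indicate which of them, if either, is causal.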
There are several methods utilised to score non-coding variants and predict their potential functional impact (Table 2). One of the most widely used prioritisation and functional prediction tools is the sequence, non-coding RNA or non-coding regulatory region) [94,95].
Another useful tool for assessing the implications of non-coding variation is the Genotype-Tissue Expression database (GTEx), which represents a comprehensive public catalogue of tissue-specific gene expression and regulation data [96]. The information stored in the GTEx catalogue can be used to determine whether a queried variant functions as an expression quantitative trait locus (eQTL) for specific genes and is capable of modulating the expression of protein-coding genes. In the investigation of non-coding variants, eQTL data, such as those produced by GTEx, may be useful for identifying likely target genes that are affected by a particular non-coding regulatory variant [97]. For brain-related data and neurological disorder studies, the same type of information obtained from GTEx can be collected through the PsychENCODE Consortium: a multi-site project that aims at creating a comprehensive catalogue of the gene regulatory landscape of the human brain [98].
TABLE 2 Techniques and resources used to annotate non-coding variants
An additional tool that may be useful to annotate non-coding variants in brain-related studies is Hi-C coupled multimarker analysis of genomic annotation (H-MAGMA) [99].

NON-CODING VARIATION IN DISEASE
The first evidence of direct involvement of non-coding variation in disease dates back to 1982, when a single nucleotide substitution was detected in the promoter region of the haemoglobin subunit beta gene (HBB) (encoding β-globin, a subunit of haemoglobin) and was found to reduce HBB gene expression [100]. In cancer biology, many non-coding variants have been found in the regulatory regions of cancer-related genes and have been found to function as cancer-drivers [10,11,109]. One example is the case of the telomerase reverse transcriptase gene (TERT) promoter. TERT encodes the catalytic subunit of telomerase, an enzyme that regulates elongation of telomeres, repeated sequences localised at the ends of chromosomes, the purpose of which is to protect chromosomes from end-to-end fusion and end-degradation [110]. End-degradation represents the physiological loss of the terminal portion of the DNA strand that occurs during DNA replication. To prevent the loss of coding sequences, telomeres create a protective cap at the ends of chromosomes that is gradually degraded during DNA replication cycles. Physiologically, telomeres progressively shorten throughout life due to the inactivation of TERT [111]. In cancer cells, TERT reactivates and results in minimal telomere shortening, leading to telomere stabilisation and the capacity for indefinite cell proliferation [112]. Variations within the TERT promoter were originally identified in melanoma: a highly recurrent promoter variant was found to be involved in the tumorigenic process, as it created a novel binding motif for transcription factors, thus supporting permanent telomerase expression [113].
TERT promoter mutation status may influence treatment response and can be used to predict how patients will respond to treatments: for example, TERT promoter variants have been associated with resistance to radiotherapy in patients with glioma [119][120][121]. The status of the TERT promoter may also be used to stratify patients and predict prognosis [122][123][124][125]. Furthermore, the TERT promoter is currently being investigated as a potential therapeutic target: a recently published preclinical study explored the use of programmable CRISPR-based base editing on the TERT promoter to reverse the mutation that activates TERT expression and thereby inhibit tumour growth [110,112].

EVIDENCE OF NON-CODING VARIATION IN THE EPILEPSIES
Many epilepsies are characterised by significant and usually unexplained phenotypic variability, which could be potentially associated with genetic variation in non-coding sequences, functioning as disease modifiers. Additionally, ncDNA may potentially harbour genetic variants that act as disease risk variants, which could be explored as prognostic biomarkers to stratify patients and identify those with a higher risk of developing comorbidities or experiencing more severe symptoms [126]. The investigation of ncDNA in epilepsy has been primarily aimed at non-coding RNAs [127,128]. Additional work has been carried out investigating the methylation state and epimutations in non-coding DNA regions [129,130]; however, overall, the non-coding regulatory regions of the genome remain relatively unexplored.

Variation in promoter regions
There is some limited evidence of variation in the promoter region of reduced expression and reduced channel function, leading to a more severe phenotype, thus indicating a disease-modifier effect [134].
Neither of these studies has been replicated.
Additionally, evidence exists for a relationship between altered methylation status in the promoter regions of genes and the pathogenesis of epilepsy. Although not confirmed, such methylation alterations may occur because of genetic variations in gene promoter sequences. One example is temporal lobe epilepsy, in which hypermethylation in the promoter region of the reelin gene (RELN) was reported [135]. In addition, Belhedi et al. described an increased methylation status in the promoter region of the carboxypeptidase A6 gene (CPA6) in patients with focal epilepsy and febrile seizures [136]. Neither study has been replicated, and the credibility of RELN as a gene of relevance in epilepsy per se has been questioned [137]. While still a relatively young field, important research has been carried out on changes in the methylation state of non-coding sequences [129,130].

Non-coding structural variations
Structural variations are also known to play a relevant role in epilepsy, and there is evidence of structural variation occurring in non-coding sequences, associated with epilepsy pathogenesis. One example is benign adult familial myoclonic epilepsy (BAFME), also described as [51][138][139][140]. Such heterogeneity of culpable genes and the presence of the repeat expansion in all the types of BAFME suggests a correlation between the repeat expansion and the pathogenesis of BAFME, regardless of the gene in which the expansion occurs [51,138,139].
Another example is progressive myoclonus epilepsy of the  [144,145].
One study reported somatic copy number gains in the enhancer region of the epidermal growth factor receptor gene (EGFR) and the promoter region of the platelet-derived growth factor receptor alpha gene (PDGFRA), without alterations in the coding sequence, in brain tissue from patients with focal cortical dysplasia operated for treatment-resistant epilepsy. In addition to the amplification of non-coding regulatory elements, an upregulation of EGFR and PDGFRA was also reported. However, this correlation has not been experimentally confirmed and the mechanism responsible for this association was not addressed in the study [146].
Monlong et al. investigated CNVs in a cohort of 198 patients with epilepsy and found an enrichment of non-coding CNVs close to known epilepsy genes, which likely fall into regulatory sequences [147]. However, the functional effect of these non-coding CNVs has not been experimentally verified.

STRATEGY TO INVESTIGATE NON-CODING VARIATION AND CHALLENGES
A putative schematic representation of the steps to be taken to investigate non-coding genetic variation is shown in Figure 3. In the preliminary phases of the pipeline, blood is currently the gold standard source for DNA collection, as blood-derived DNA samples are of higher quality and are more likely to pass stringent quality controls (QCs) of DNA integrity and concentration, whereas saliva-derived samples, for example, are more prone to QC failure [148]. The major drawback of this approach is the inability to detect somatic genetic variations: in the case of epilepsy, for example, somatic variations may occur in the brain, and might only be detectable with the use of brain tissue-derived DNA samples. Relating to the sequencing approach, although the majority of studies use Whole Exome Sequencing (WES) or targeted sequencing panels, neither of these two methods generates information about non-coding elements; for genome-wide exploration, Whole Genome Sequencing (WGS) is the most appropriate approach.
As shown in the figure, the final phase of the workflow is the functional validation of the candidate variants, which is performed predominantly using wet-lab experimental approaches. The most widely used experimental methods are luciferase reporter assays and the use of the CRISPR system to generate viable models. The luciferase reporter assay aims to compare the level of luciferase expression in the presence and absence of a variant of interest [149]. The CRISPR-mediated genome editing system can be used to generate both cellular and animal models, such as murine models, carrying the candidate variation, thus allowing evaluation of the functional impact in vivo [150].
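The comparison made in a luciferase reporter assay can be summarised as a fold change between the variant and reference constructs. The sketch below assumes a design in which each reporter reading is paired with a reading from a co-transfected control reporter used for normalisation; this design, the function names and any numbers used with them are assumptions for illustration and are not taken from the studies cited.

```python
from statistics import mean


def normalised_activity(reporter, control):
    """Divide each reporter reading by its paired control reading."""
    if not reporter or len(reporter) != len(control):
        raise ValueError("reporter and control must be non-empty and paired")
    return [r / c for r, c in zip(reporter, control)]


def fold_change(variant_norm, reference_norm):
    """Mean normalised activity of the variant construct relative to the reference."""
    return mean(variant_norm) / mean(reference_norm)
```

With invented readings in which the variant construct gives half the normalised signal of the reference, `fold_change` returns 0.5, consistent with reduced regulatory activity; a real analysis would also require replicate experiments and a statistical test.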
However, in the investigation of non-coding variants, multiple challenges need to be highlighted. First, predicting the functional consequences of non-coding variation is difficult, and different annotation tools can give discordant predictions: this is a major limitation that also applies to the investigation of protein-coding variants [151]. Furthermore, unlike for protein-coding variants, there is currently no variation database that would allow researchers to quickly determine whether a non-coding variant has been previously observed and linked to disease. Second, the detection of non-coding regulatory regions, such as promoter regions, is complicated by the existence of overlapping genes: distinct genes that share a genetic region. It is estimated that about one-quarter of human protein-coding genes overlap with each other. Most of these are co-expressed in the same tissue type and are likely to be co-regulated [152]. The presence of overlapping genes makes it difficult not only to identify the regulatory sequences flanking a protein-coding gene, but also to understand the functional consequences of variants. Indeed, for a pair of overlapping genes, genetic variations falling in the regulatory sequence upstream of both genes may have an impact on the expression of both genes, while variants falling in the regulatory sequence upstream of the second gene may also fall within the coding sequence of the first gene, thus further complicating the functional interpretation of non-coding variants.
Third, the reliability of data on regulatory element localisation: to date, most of the available data, particularly for enhancers, is simply prediction, with only a small portion being experimentally validated data, thus limiting the robustness of results and highlighting the need to produce experimentally-validated enhancer data resources. Recently, non-human model systems have been used to identify and experimentally validate non-coding regulatory elements [153].
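The overlapping-gene problem raised as the second challenge can be flagged early in an analysis by listing gene pairs whose coordinates intersect. The following is a minimal sketch, assuming that genes are given as (name, start, end) tuples with inclusive coordinates on a single chromosome and ignoring strand; the gene names in any example are placeholders.

```python
def overlapping_pairs(genes):
    """Return pairs of gene names whose coordinate ranges overlap.

    genes: list of (name, start, end) tuples on one chromosome,
    with inclusive coordinates.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    pairs = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b > end_a:
                break  # genes are sorted by start, so no later gene can overlap
            pairs.append((name_a, name_b))
    return pairs
```

Variants that fall near any gene in a reported pair would then need to be interpreted with respect to both genes.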

CONCLUSIONS AND FUTURE PERSPECTIVE
In conclusion, despite the lack of a comprehensive and systematic genome-wide investigation of non-coding regulatory variation in epilepsy, we suggest that ncDNA has the potential to be of relevance in epilepsy research. Non-coding regulatory regions may harbour variations that influence gene expression and contribute to the phenotypic variability of disease, may have a disease-modifying effect, or influence treatment response. Furthermore, the findings of studies in other fields indicate that non-coding variation may also represent the main source of disease causation, and, eventually, may represent potential therapeutic targets for innovative treatment strategies. Overall, the non-coding genome represents an exciting area to investigate.