A machine‐learned analysis of human gene polymorphisms modulating persisting pain points to major roles of neuroimmune processes

Abstract Background Human genetic research has implicated functional variants of more than one hundred genes in the modulation of persisting pain. Artificial intelligence and machine‐learning techniques may combine this knowledge with results of genetic research gathered in any context, which permits the identification of the key biological processes involved in chronic sensitization to pain. Methods Based on published evidence, a set of 110 genes carrying variants reported to be associated with modulation of the clinical phenotype of persisting pain in eight different clinical settings was submitted to unsupervised machine‐learning aimed at functional clustering. Subsequently, a mathematically supported subset of genes, comprising those most consistently involved in persisting pain, was analysed by means of computational functional genomics in the Gene Ontology knowledgebase. Results Clustering of genes with evidence for a modulation of persisting pain elucidated a functionally heterogeneous set. The situation cleared when the focus was narrowed to a genetic modulation consistently observed throughout several clinical settings. On this basis, two groups of biological processes, the immune system and nitric oxide signalling, emerged as major players in sensitization to persisting pain, which is biologically highly plausible and in agreement with other lines of pain research. Conclusions The present computational functional genomics‐based approach provided a computational systems‐biology perspective on chronic sensitization to pain. Human genetic control of persisting pain points to the immune system as a source of potential future targets for drugs directed against persisting pain. Contemporary machine‐learned methods provide innovative approaches to knowledge discovery from previous evidence. Significance We show that knowledge discovery in genetic databases and contemporary machine‐learned techniques can identify relevant biological processes involved in Persitent pain.


Introduction
Persisting pain has a high prevalence (Elliott et al., 1999;Breivik et al., 2006;van Hecke et al., 2013) affecting a significant proportion of the world's population (Breivik et al., 2006). Its pathophysiology is incompletely understood, which is reflected in the limited success of available treatment options (Moore et al., 2009Derry et al., 2012) and has stimulated intense research on this topic (Kringel and L€ otsch, 2015). In this context, the study of human gene polymorphisms that modulate the persisting pain phenotype is an accepted research approach which has been pursued for more than 50 years (Godinova, 1965). A genetic background to persisting pain is clearly reflected by a protective effect against persisting pain exerted, for example, by a haplotype of the guanosine-5 0 -triphosphate (GTP) cyclohydrolase 1 gene (GCH1), originally reported to be composed of 15 genetic variants (Tegeder et al., 2006(Tegeder et al., , 2008, or by a reduction in the perceived intensity of pain exerted, for example, by a deletion/ insertion variant in the serotonin transporter genelinked polymorphic region (5-HTTLPR; gene: SLC6A4) reportedly reducing the perception of heat pain (Lindstedt et al., 2011;Hooten et al., 2013;Kunz et al., 2016). On the other hand, increased risk for persisting pain is conferred, for example, by the rs12584920 variant of the 5-hydroxytryptamine receptor 2A gene (HTR2A) (Nicholl et al., 2011) or the rs734784 polymorphism in the potassium voltage-gated channel modifier, subfamily S member 1, gene (KCNS1) .
Human genetic research during the last decade has identified many common variants of more than a hundred different genes spread across the genome that modulate the phenotype of persisting pain in several different clinical settings (Table 1). Thanks to concomitant developments in computer science, including progress in artificial intelligence, machinelearning and knowledge discovery in databases (Ashburner et al., 2000), the analysis of fundamental, complex biological functions has become increasingly possible. This allows persisting pain to be approached at a functional genomics level by combining the information on genetic modulation acquired in clinical studies with current knowledge of the function of human genes. This active research topic has already led to the identification of candidate genes for further clinical genetic pain research (L€ otsch et al., 2013) and highlighted key pathophysiological processes of pain which may be targeted for future pharmacological treatment options (Ultsch et al., 2016).
In the present analysis, information on genes, for which empirical evidence indicates the existence of variants that modulate the clinical phenotype of persisting pain, was analysed at a functional genomics level. In this way, the biochemical, cellular and/or physiological properties of each and every gene product can be investigated to gain an understanding of the relationship between the genome and the phenotype on a global genomewide scale (Gibson and Muse, 2009). Applying machine-learned techniques (Fig. 1), the genes currently known to have relevance to human persisting pain were analysed for functional patterns that may provide insight into its pathophysiology based on current research activities, applying a data-driven approach without using prior hypotheses about the most important biological functions characterizing persisting pain. By applying methods of precisely calculated item selection (Ultsch and L€ otsch, 2015), the present analysis aimed to identify a subset of genes that were most consistently reported to be involved in the modulation of persisting pain with a subsequent analysis of the main biological functions exerted by the products of these genes.

Search strategy
A set of genes relevant to persisting pain, based on published associations of their variants with phenotypic differences in persisting pain patients, was obtained from (1) a PubMed database (accessed in August 2016), by searches for the string '(chronic OR persisting OR neuropathic OR back OR inflammatory OR musculoskeletal OR visceral OR widespread OR idiopathic OR fibromyalgia) AND pain AND (polymorphism OR variant) NOT review' and (2) publications starting from the year 2000, which is close to the first sequencing of the human genome (Lander et al., 2001;Venter et al., 2001) marking the beginning of a new area of genetic research, and (3) published overviews on pain genetics (e.g. Edwards, 2006;Tegeder and L€ otsch, 2009;Mogil, 2012;Zorina-Lichtenwalter et al., 2016). To avoid redundancies, reports of positive associations of any gene variant were included only once per clinical setting in the present analysis. This implies that not every variant found to be functionally associated with a persisting pain phenotype was taken into account. The resulting information for each gene, thus, comprised (1) a positive report of a gene

Data analysis
Data were analysed using the R software package (version 3.3.2 for Linux; http://CRAN.R-project.org/; R Development Core Team, 2008) on an Intel Xeon â computer running on Ubuntu Linux 16.04.1 64-bit. The analysis employed several methods of machinelearning that, as described previously (L€ otsch and Ultsch, 2017a), may be referred to as a set of methods that can automatically detect patterns in data and then use the uncovered patterns to predict or classify future data, to observe structures such as subgroups in the data or to extract information from the data suitable to derive new knowledge (Murphy, 2012;Dhar, 2013). More detailed descriptions including definitions of key concepts have been provided elsewhere (L€ otsch and Ultsch, 2017a). The analysis aimed at describing the functional genomics of persisting pain based on the biological roles of the genes that reportedly carry variants modulating that phenotype. The biological roles were assessed as biological processes in which the genes are involved, defined as series of events or molecular functions with a defined beginning and end (Ashburner et al., 2000) and queried from the Gene Ontology (GO) knowledgebase that provides the acquired knowledge about the biological functions of gene products, described with a controlled vocabulary of GO terms (Ashburner et al., 2000).
Thus, the basis on which the present functional picture of persisting pain was created consisted of the biological processes in which the genes carrying modulatory variants were reported to be involved. The functional picture of persisting pain was sought pursuing two different analytical paths (Fig. 1). In a first approach, functional subgroups were sought in the set of human genes, variants of which have been associated with modulation of the clinical phenotype of persisting pain. This was addressed by applying a clustering algorithm on the matrix given by the genes versus their annotated biological processes; an approach that previously proofed as suitable for gene function based classifications (L€ otsch and Ultsch, 2016a). In a second approach, the hypothesis was pursued that the functional genomics of persisting pain will prevail across clinical settings irrespective of the disease that had originally triggered the process. Therefore, the most informative subsets of the genes were identified using a computed item categorization technique The gene, NCBI number, clinical setting of persisting pain are shown together with a key reference in which this association was reported. The studies are given grouped for the relevant gene, however, in arbitrary order of clinical settings or publication year. (Ultsch and L€ otsch, 2015) and subsequently, techniques of knowledge discovery were applied to identify the particular biological roles exerted by this set of genes as opposed to the biological functions exerted by a similarly sized random set of genes. This analysis was performed twice, once using the most informative genes resulting from above-mentioned analysis, and, to strengthen the evidence for pain reliance, again using the additional selection criterion that the genes should be listed among pain-relevant genes in the PainGenes database (Lacroix-Fralish et al., 2007). The analytical steps are described in detail in the following.

Functional clustering of genes
A first analytical approach to the functional genomics of persisting pain reflected in the set of genes that carry variants reported to modulate the phenotype aimed at finding functional clusters of genes.

Generation of a gene versus functional feature matrix
As a prerequisite for functional clustering of genes, a gene versus function matrix was created. Therefore, the biological functions in which the genes, or their products, are involved were queried from the GO knowledge base (http://www.geneontology.org/). The GO knowledge base is searchable for three major categories, consisting of biological process, cellular component and molecular function. As the most suitable representation of processes that are involved in the chronification of central sensitization to pain, the GO category biological process was selected as previously (L€ otsch et al., 2013;L€ otsch and Ultsch, 2016b;Ultsch et al., 2016). According to the GO knowledgebase, this category contains one or more ordered collections of molecular functions involving chemical or physical transformations such as cell growth and maintenance or signal transduction (Ashburner et al., 2000). However, not all processes known to be influenced by the genes were sought, but only those that were annotated to the present set of genes more often than expected for any similarly sized random set of genes. Therefore, to capture biological processes that are particularly relevant to the present gene set, the set of genes was submitted to overrepresentation analysis (ORA; Backes et al., 2007). As intended, this compared the occurrence (as defined by its annotation term) of the particular set of genes covered by a GO term with the number of genes expected to be defined by this term. The significance of the association of a GO term with the expected list of genes was determined by means of a Fisher's exact test (Fisher, 1922). A p-value threshold, t p , of 1 9 10 À6 was applied and subsequently, the obtained results were additionally corrected for multiple testing according to Bonferroni (Hochberg, 1988). The result was the desired 'gene versus biological process' matrix ( Fig. 2) that, rescaled as [0,1] indicating the absence or presence, respectively, of the involvement of a gene in a particular biological process, provided a filtered representation of the particularly important processes in which the analysed genes were involved while disregarding processes that would have been found by chance in any similarly sized gene set. This ORA-based filtering of gene functions was previously found to facilitate the functional analysis of gene sets including a context of pain and analgesia (L€ otsch and Ultsch, 2016a; L€ otsch et al., 2017).

Machine-learned cluster detection
Following the creation of the gene versus functional feature matrix, expressed as 'gene versus biological process' matrix, the feature space required for functional gene clustering was established as D ¼ fðx i Þ j x i 2 N d ; i ¼ 1; . . .; ng comprising the d biological process to which the n genes in the analysed set were annotated. This feature space was searched for a cluster structure. Among several methods available for clustering, a method of unsupervised machine-learning shown recently to provide a viable unbiased clustering of high-dimensional biomedical data, outperforming classical clustering algorithms , was chosen. Specifically, the data space was projected from the high-dimensional feature space D onto a two-dimensional self-organizing map (SOM) of the Kohonen type (Kohonen, 1982). This map was composed of a toroid grid (Ultsch, 2003), i.e. a projection plane where opposite edges are connected. The grid had a size of 25 9 35 artificial neurons chosen according to the proposals of SOM size determination described previously . Each of the artificial neurons held a position vector, which carried the information about the biological processes associated with each gene, and a further parameter, which carried 'weights' of the same dimensions as the input dimensions. The weights were initially randomly drawn from the range of the data variables and subsequently adapted to the data during the learning phase of 25 epochs. The Euclidean distance was used for process (dis-)similarity; very general processes, such as 'biological process' that is the root term of the polyhierarchy carry the same value for all genes and therefore do not influence this distance. Following training of the neural network, an emergent SOM (ESOM; L€ otsch and Ultsch and Sieman, 1990) was obtained that represented the genes as the localizations of their 'best matching units' (BMU), which are neurons that carried the vector most similar to a gene's data vector.
Following the projection of the data on the grid of neurons, an extension of the Kohonen map was applied to obtain clusters of genes. Specifically, the distance structure in the high-dimensional space was visualized using the so-called U-matrix (Ultsch and Sieman, 1990;L€ otsch and Ultsch, 2014). The clusters became visible using a geographical map analogy where 'mountain ranges' represent large distances in the feature space that can be used to visually separate data clusters, whereas low 'valleys' represent sets of genes that are related to similar biological processes and therefore belong to the same cluster. The 'map' was further enhanced by calculating a so-called P-matrix (Ultsch, 2003), which displays the probability of a data point as pðxÞ ¼j fdata points x i j dðx i ; xÞ\ ¼ rg j estimated as the number of data points in a sphere with radius r around x at each grid point on the ESOM's output grid. The calculations were performed using our R library 'Umatrix' (https://cran.r-project.org/package=Umatrix; L€ otsch et al., 2018a).

Functional genomics analysis of most informative genes reported to modulate persisting pain
A second analytical approach at the functional genomics of persisting pain reflected in the set of genes that carry variants reported to modulate the phenotype aimed at identifying the biological roles of those genes that had been implicated most consistently in this context. Specifically, the present analysis aimed at the discovery of new knowledge about pain, rather than about the underlying disease, from reports about a genetic modulation of a clinical symptom involving pain. This implies a distinction between the modulation of pain and the modulation of the disease-causing pain that has usually not been made in the included reports. For example, in rheumatic diseases a genetic variant could modulate the progress and severity of inflammatory processes and via this it could finally modulate pain, or a genetic variant could directly modulate the individual perception of pain and therefore, similar nociceptive stimuli produced by the inflammatory processes could cause different degrees of pain. However, in the interest of the present analysis, we assessed only the direct modulations of pain, which were expected to be provided most likely by those genes that had been implicated in modulation of persisting pain in more than one clinical setting.

Identification of genes most consistently reported to modulate persisting pain
To identify the most informative subset of genes reportedly modulating clinical persisting pain, a cut-off criterion had to be defined for the number of different clinical settings required for inclusion in further analyses. To avoid arbitrary criteria, the cut-off was obtained by applying an item categorization technique used to separate 'the important few' from the 'trivial many' (Juran, 1975). As a most suitable technique, because providing a mathematically based cut-off, computed ABC analysis was chosen for the present purpose, supported by previous demonstrations that it is suitable for item selection tasks in biomedical research (Ultsch and L€ otsch, 2015;L€ otsch and Ultsch, 2017b;L€ otsch et al., 2018b).
ABC analysis requires a set of positive numbers, which was given by the column sums of the 'clinical settings versus genes' matrix ( Fig. 4 top). The vector of the sums of clinical settings with positive reports of the modulatory involvement a gene's variant was submitted to computed ABC analysis (Ultsch and L€ otsch, 2015), which aims to divide a set of positive datahere the set of genes scored according to their involvement in clinical settings Figure 2 Matrix plot of the association of identified genes (rows; Table 1) with biological processes (columns), according to the annotations of the Gene Ontology (GO) knowledgebase (Ashburner et al., 2000), filtered for statistical significance in the present context by means of an overrepresentation analysis with a p-value threshold of 1 9 10 À6 and correction for multiple testing according to Bonferroni. The matrix displays a yes/no scaling [1,0], colour-coded as blue or white, respectively. The figure has been created using the R software package (version 3.3.2 for Linux; http:// CRAN.R-project.org/; R Development Core Team, 2008). Specifically, this plot was created using the 'heatmap.2' functions of the R package 'gplots' (Warnes G. R.; https://cran.r-project.org/package=gplots) with the build-in clustering of the plotting routine disabled (R switches 'Colv = FALSE, Rowv = FALSE'). For descriptions of the GO terms (abscissa), see the AmiGO search tool at http://amigo.geneontology.org/ (Carbon et al., 2009). of persisting pain -into three disjointed subsets called 'A', 'B' and 'C'. Subset 'A' comprises 'the important few', subset 'C' comprises clearly nonprofitable values, i.e. 'the trivial many' (Juran, 1975), whereas subset 'B' includes items that provide still a balance between effort and gain. Therefore, the limit separating subset 'C', i.e. the genes for which a modulation of persisting clinical pain provides the least relevant information, was chosen as the limit for the inclusion of genes in further analyses. These calculations were made using our R package 'ABCanalysis' (https://cran.r-project.org/ package=ABCanalysis; Ultsch and L€ otsch, 2015).

Functional genomics analysis of genes most consistently reported to modulate persisting pain
Following identification of the most informative genes as members of ABC sets 'A' or 'B', the functional genomics picture of persisting pain arising from these genes was analysed. This was obtained by applying ORA as described above, again using a p-value threshold of t p = 1 9 10 À6 with Bonferroni a-correction for multiple testing. The focus of this analysis was, however, the hierarchical representation of the complete knowledge on the biological roles of genes that carry polymorphisms observed to modulate persisting pain phenotypes. This was provided in a directed acyclic graph (DAG; Thulasiraman and Swamy, 1992). In this graph, the top-down, branching polyhierarchy of GO terms starts with the most broadly defined terms and progresses towards the branches, representing GO terms with the narrowest definition (details). These calculations were made using our R package 'dbtORA' (https://github.com/IME-TMP-FFM/db tORA; Lippmann et al., 2018), which has been designed for knowledge discovery in the GO.
As the complete DAG usually contains many GO terms (e.g. 64 GO terms in the present ORA), the information was transformed into a more intelligible form using the method of 'functional abstraction' . This aims to reduce the numbers of GO terms using a heuristic search algorithm that identifies so-called functional areas , which are GO terms that qualify by their informational importance as headlines representing specific aspects (taxonomies) of the complete DAG with maximal coverage, precision, informational value and conciseness .
To narrow the focus even more to pain-relevant genes, the ORA was repeated using the set intersection of the most informative genes identified as described above by means of ABC analysis, with the genes listed among known pain-relevant genes in the PainGenes database (http://www.jbldesign. com/jmogil/enter.html; Lacroix-Fralish et al., 2007). These mainly include genes found, in at least three independent studies in transgenic mice, to contribute to the modulation of pain and identified using PubMed searches, with the addition of further genes (L€ otsch et al., 2013) comprising those causally implicated in human hereditary diseases associated with extreme pain phenotypes (summarized in, e.g. L€ otsch et al., 2017), and genes coding for the targets of approved analgesic drugs or of novel analgesics currently in clinical phases of development (L€ otsch and Geisslinger, 2011). This provided a set of n = 535 'pain genes' (Fig. 1 bottom).

Results
As a result of a literature search, a total of 110 unique genes were identified in eight different clinical settings of chronic central sensitization to pain, including back pain, inflammatory pain, musculoskeletal pain, neuropathic pain, visceral pain, widespread pain, idiopathic pain and miscellaneous pain, for which functional associations of genetic variants with differences in the phenotype of persisting pain had been reported (Table 1). Some of the included studies used a genomewide approach; however, many were candidate gene studies.

Functional clustering of genes carrying variants reportedly modulating persisting clinical pain
In a first analytical approach, functional subgroups were sought in the set of human genes, variants of which have been associated with modulation of the clinical phenotype of persisting pain. A filtered representation of the particularly important processes in which the analysed genes were involved while disregarding processes that would have been found by chance in any similarly sized gene set, was obtained by means of overrepresentation analysis (ORA) of the biological processes to which the genes were annotated in the GO knowledgebase. This identified d = 258 biological processes (GO terms), given the p-value threshold of 1 9 10 À6 and the a-correction according to Bonferroni. One gene was neglected in this analysis, COL6A4P1, the collagen type VI alpha 4 pseudogene 1, because it was not referenced in the GO.
Subsequently, the 109 9 258 sized 'gene to biological process matrix' (Fig. 2) thus obtained was analysed for functional subgroups of genes. Using unsupervised machine-learning implemented as self-organizing artificial neuronal network of the Kohonen type enhanced by the U-matrix (i.e. an emergent self-organizing map, ESOM; Ultsch and Sieman, 1990;L€ otsch and Ultsch, 2014), the highdimensional data space was projected onto a twodimensional toroid grid. On this grid, the U-matrix was visualized by applying a geographical landscape analogy (Fig. 3) providing a visual structure that could be employed for clustering of genes. The results indicated a large cluster comprising more than half of the genes (n = 58). However, this cluster was still composed of functionally very different genes, pointing towards a large heterogeneity of the genes chosen as candidate modulators of persisting pain in the different clinical studies, or was found with genomewide association studies (GWAS) without a focus on a particular gene. In addition, six smaller clusters were suggested, but their distinct separation was occasionally incomplete or, according to the U-matrix landscape analogy, they were not clearly presented as 'valleys' but consisted merely of 'mountain' zones separated by slightly higher ridges (Fig. 3). In ESOM/U-matrix based clustering, this indicates rather large intracluster distances.
As the present method has been shown to be well able to identify existent cluster structures while being unlikely to show false clusters (Ultsch, 2005;Ultsch and L€ otsch, 2017), the main result of this analysis was that there is considerable heterogeneity among the genes reported to be involved in persisting pain without a clear functional focus on common general processes underlying this trait.

Functional genomics analysis of most informative genes reported to modulate persisting pain
In a second analytical approach, the hypothesis was pursued that the functional genomics of persisting pain will prevail across clinical settings irrespective of the disease that had originally triggered the process. However, the set intersection of the genes associated with any of the eight different clinical settings was empty. Nevertheless, the hypothesis that modulatory effects on chronic central sensitization to pain, rather than on the underlying disease-causing pain, are exerted by the same genes across several clinical settings could be pursued in several genes have been shown to be involved in more than one clinical setting of chronic central sensitization.
To identify how many clinical settings qualify as a cut-off, an ABC analysis was applied to the sum of settings in which each of the 110 genes was involved (column sums in the matrix plot in Fig. 4, top). This assigned 22 genes to ABC sets 'A' or 'B' ( Table 2) that were included in further analysis as they could be regarded as best suited to perform an evidencebased functional genomics analysis of chronic central sensitization to pain, considering the difficulty of separation between a modulation of the pain-causing disease from a modulation of the perception and processing of chronic nociceptive input when a gene was involved only in a single clinical setting of persisting pain.
ORA identified 64 GO terms associated with this particular subset of genes more often than expected by chance, given the chosen p-value thresholds of 1 9 10 À6 and correction for multiple testing according to Bonferroni. These terms provided a functional genomics perspective of the genes with variants Figure 3 Result of a projection of the genes carrying variants reportedly modulating persistent pain onto a self-organizing map (SOM) of the Kohonen type (Kohonen, 1982). Following projection of the genes on the grid of neurons, based on their functional annotations in the Gene Ontology knowledgebase (Ashburner et al., 2000) (n = 109; one of the 110 genes in Table 1 was not annotated in the GO), the distance and density structures in the highdimensional space were visualized using the so-called U*-matrix (Ultsch, 2003). Specifically, a trained SOM represents a topology preserving mapping of n high-dimensional data points x i ε D, where D denotes the data space, onto a two-dimensional grid of neurons. A neuron n and the neurons in its neighbourhood N(n) on the output grid of the SOM represent points in the data space. The sum of distances between n and N(n) in the high-dimensional space, combined with the respective density probabilities, is shown on a U*-matrix as a height value (U-height) at neuron n. Large U-heights mean that there is a large gap in the data space. Low U-heights mean that the points are close to each other within the data space . The dots indicate the so-called best matching units (BMUs) of the SOM, which are those neurons whose weight vector is most similar to the input, i.e. the representation of the vector of the annotation of genes to GO terms. The BMUs are coloured according to the obtained clustering of the data space and labelled with the respective gene symbols. The cluster structure emerged from visualization of distances and density structures between neurons in the high-dimensional space by means of a U*-Matrix (Izenmann, 2009). Top: here, the genes represented by the BMUs are annotated. Bottom: 3D-display of the U-matrix in which the 'valleys', 'ridges' and 'basins' can be seen. Valleys indicate clusters of functionally similar genes based on the significant GO term annotations. The figure was created using the R software package (version 3.3.2 for Linux; http://CRAN.R-project.org/; R Development Core Team, 2008) using our R library 'Umatrix' (https://cran.r-project.org/package=Umatrix; L€ otsch et al., 2018a).
shown to modulate the persisting pain sensitization phenotype in different clinical settings. Their graphical representation visualized their arrangement in the GO polyhierarchy (Fig. 5). Following application of functional abstraction , the main biological functions of the 22 gene carrying variants modulating persisting pain, were grouped around seven centres of biological activity or functional areas (Table 2; Fig. 5).
A first functional area to emerge was 'immune system process', represented in this particular gene set as an important common biological function (Fig. 5 middle left). The most significantly associated immune regulatory processes were 'regulation of immune effector processes' (GO:0002697, p = 2.8 9 10 À8 ) and 'positive regulation of lymphocyte mediated immunity' (GO:0002708, p = 2.9 9 10 À7 ). A second functional area centred on 'reactive oxygen species metabolic process' (GO:0072593, p = 4.5 9 10 À11 ) and comprised mainly nitric oxide signalling related processes (Fig. 5 middle right) such as 'nitric oxide biosynthetic process' (GO:0006809, p = 3.2 9 10 À12 ) and its regulation.
The main results, i.e. the most specific functional areas pointing at immune processes and nitric oxide Figure 4 Number of clinical settings in which variants in the respective genes have been reported to be associated with modified phenotypes of chronic pain. Left: Matrix plot of the clinical settings (rows) versus the genes (n = 110, columns split in two halves, sorted for column sums in descending order). The column sums are displayed in the bottom row below the matrix. The numbers are displayed colour-coded with 0 = white, 1 = blue, >1 darker blue). Right: ABC plot of the cumulative distribution function of the sums of clinical settings in which the genes were reportedly involved (column sums in the top matrix; bottom line). The ABC set limits are indicated as red lines (for further details about an ABC analysis, see (Ultsch and L€ otsch, 2015)). The figure was created using the R software package (version 3.3.2 for Linux; http://CRAN.R-project.org/; R Development Core Team, 2008). Specifically, the matrix plot was created using the 'heatmap.2' functions of the R package 'gplots' (Warnes G. R.; https:// cran.r-project.org/package=gplots), with the build-in clustering of the plotting routine disabled (R switches 'Colv=FALSE, Rowv=FALSE'). The ABC curve was drawn using our R library 'ABCanalysis' (https://cran.r-project.org/package=ABCanalysis; Ultsch and L€ otsch, 2015). signalling as key biological processes involved in persisting pain across more than a single clinical setting, prevailed when narrowing the gene set further on those that are also listed in the PainGenes database (Fig. 6). As the set intersection between the 22 genes identified above and the genes of the Pain-Genes database included only n = 13 genes, the ORA applying the same statistical thresholds resulted in fewer additional significant GO terms (Fig. 6). Table 2 Genes for which functional involvement of their variants has been reported in more than one clinical setting of persisting pain (number of these clinical settings given in the third column) were assigned to ABC sets 'A' or 'B' (Figure 4), comprising the most profitable information for an evidence-based functional genomics analysis of persisting pain.
Interleukin-1 receptor antagonist Transforming growth factor, beta 1 Purinergic receptor P2X, ligand gated ion channel, 7 2 X X X X X X X

Discussion
The present analysis used empirical evidence for functional human genetic variants to approach the genetic architecture of persisting pain. Although the evidence was collected from separate studies, its combination permitted a limited genomewide association analysis of the trait. Methods for data mining and machine-learned knowledge discovery were applied to publicly available databases in order to relate knowledge, acquired in the context of clinical pain research, on genes that modulate the phenotype of persisting pain with data on the biological functions of these human genes acquired in any context, without restriction to pain research (L€ otsch et al., 2013). The initial analysis of the whole set of genes showing positive results from clinical pain association studies indicated that, apart from a minority of genes that could be topically grouped such as interleukin or histocompatibility complex-related genes, most genes displayed very heterogeneous functions and the analysis did not illuminate the pathophysiology of persisting pain beyond the functions of the single genes. This was probably due to the fact that data were drawn mainly from candidate gene or GWAS approaches. This selection probably introduced a research bias by (1) addressing genetic modulators in the context of the underlying disease and (2) including a selection of genetic markers that mimic other successful reports of comparable studies.
The situation became clearer when the analytical focus was narrowed to genetic modulations consistently observed across several clinical conditions with potential underlying painful diseases. This reduced the analytical bias generated by genetic modulations responsible for a specific pain-causing disease and is in keeping with the contemporary approach to persisting pain as a distinct condition of central sensitization to pain and not merely a symptom of another underlying chronic disease. Consequently, it would be expected that the trait is modulated by specific genes which should be reflected by observations on its modulation in clinical research. The mathematically precise calculation provided by the ABC analysis (Ultsch and L€ otsch, 2015), developed in order to select the most promising or profitable items from a larger set of items, resulted in identification of a set of 22 genes which could be then be assessed in a computational functional genomics analysis of persisting pain.
A major finding of this analysis of available evidence on genetic modulation of persisting pain was the particular importance of two groups of biological processes indicating involvement (1) of the immune system and of (2) nitric oxide signalling in persisting pain. Involvement of both processes is biologically highly plausible; however, their emergence as major process groups from a functional genomics analysis of data from clinical genetic research on persisting pain was not anticipated. Specific roles for the present subset of 22 genes, with repeated evidence for involvement in persisting pain, were exhibited by the 12 genes annotated as 'immune system process' (Table 1). This subset included interleukin (IL-1B, IL-4, IL-6, IL-10) (Dinarello, 1994;Choi and Reiser, 1998;Mocellin et al., 2004;Nemeth et al., 2004), interleukin receptor (IL-1RN, IL-4R) (Bittar and Bittar, 1996) and histocompatibility complex-related (HLA-B) genes (Dupont and Ceppellini, 1989), which have been shown to be involved in immunological mechanisms of pain (Sato et al., 2002;de Rooij et al., 2009). This is also supported by published evidence for the further genes in this list, such as, TNF (Vassalli, 1992;Franchimont et al., 1999), TGFB1 (Li et al., 2006), GCH1 (Schott et al., 1993), P2RX7 (Schwartz et al., 2009) and SOD2 (Wells et al., 2003). The second major process group emerging from the functional genomics analysis of the key evidence for genetic modulation of clinical persisting pain was nitric oxide signalling, in particular metabolic processes, summarized in this context under the GO term 'reactive oxygen species metabolic process' which includes the genes IL-6 (Deakin et al., 1995), TNF (Deakin et al., 1995;Katusic et al., 1998), ESR1 (Clapauch et al., 2014), IL-10 (Cattaruzza et al., 2003), TGFB1 (Saura et al., 2005), GCH1 (Katusic et al., 1998;Zhang et al., 2007), IL-1B (Katusic et al., 1998), IL-4 (Coccia et al., 2000, P2RX7 (Gendron et al., 2003), SOD2 (Fridovich, 1978). It is widely accepted that nitric oxide (NO) is critically involved in persisting pain (Chung, 2004). It has been shown that NO is produced in the spinal dorsal horn neurons in response to extensive nociceptive inputs and then it diffuses out and increases neurotransmitter release from primary afferent terminals, thereby contributing to central sensitization and persisting pain (Lin et al., 1999). Recent findings seem to indicate that not only NO is a mediator of persisting pain that accompanies inflammation, other reactive oxygen species like superoxide (SO) might also participate in persisting pain (Schwartz et al., 2008). Kim and colleagues found that NO and SO contribute to persisting pain via two separate and independent pathways and a recent study has shown that capsaicin-induced hyperalgesia is a consequence of superoxide build-up in spinal dorsal horn neurons. Superoxide dismutase (SOD-2) encoded by gene SOD2 is a major p-value threshold, t p = 1 9 10 À6 and Bonferroni a correction) of 22 genes ( Table 2) carrying variants reported to modulate the persisting pain phenotypes in different clinical settings (ABC sets 'A' or 'B') versus all human genes. A top-down representation of the annotations (GO terms) is shown representing a systems-biology perspective of the biological processes modulated by this set of genes. Each ellipse represents a GO term. The graphical representation follows the standard of the GO knowledge base, where GO terms are related to each other by 'is-a', 'part-of', and 'regulates' relationships forming a branching polyhierarchy organized in a directed acyclic graph (DAG; Thulasiraman and Swamy, 1992). Top: Significant GO terms are shown as coloured ellipses with the number of member genes, the number of expected genes by chance and the significance of the deviation in the observed from the expected number of genes indicated. The biological processes in which the present n = 22 genes are involved can be summarized by seven primary 'functional areas' or headlines presenting particular aspects (taxonomies) of the complete polyhierarchy at maximum coverage, precision, informational value and conciseness . The ellipses are colour-coded using yellow for a 'headline'; i.e. a GO term that by its location in the polyhierarchy may serve as headlines for a branch of the hierarchy, red for significantly overrepresented terms and white for nonsignificant terms that need to be displayed to preserve the polyhierarchical structure of the DAG. Blue vertices or blue labels are the most specific terms (leaves of the DAG) at the end of a taxonomy (branch) in the polyhierarchy. Bottom: The GO terms (biological processes) taken from the functional areas, shown above in the DAG, are shown with larger fonts for better readability. The figure was created using the R software package (version 3.3.2 for Linux; http://CRAN.R-project.org/; R Development Core Team, 2008) and our R package 'dbtORA' (https://github.com/IME-TMP-FFM/dbtORA; Lippmann et al., 2018).  (Table 2), versus all human genes, that (1) carry variants reported to modulate the persisting pain phenotypes, (2) belonging to the subset of 22 of these genes that are supported by evidence that this modulation applies to several clinical settings (ABC sets 'A' or 'B') and (3) were also part of a set of 535 genes, based on current pain research data, that can be considered, with their products, to be involved in the pathophysiology of pain (Ultsch et al., 2016). Left part: The Venn diagram (Venn, 1880) shows the set intersections on which this selection of 13 genes is based. Right part: Top-down representation of the annotations (GO terms) representing the systems-biology perspective of the biological processes modulated by this set of genes organized in a branching polyhierarchy forming a directed acyclic graph (DAG; Thulasiraman and Swamy, 1992). The figure represents the results of an overrepresentation analysis with parameters for p-value threshold, t p = 1 9 10 À6 and Bonferroni a correction. The biological processes in which the n = 13 genes are involved can be summarized by three primary 'functional areas' or headlines presenting particular aspects (taxonomies) of the complete polyhierarchy at maximum coverage, precision, informational value and conciseness . The ellipses are colour-coded using yellow for a 'headline'; i.e. a GO term that by its location in the polyhierarchy may serve as headlines for a branch of the hierarchy, red for significantly overrepresented terms located in the polyhierarchy below a headline and white for nonsignificant terms that need to be displayed to preserve the polyhierarchical structure of the DAG. Blue vertices or blue labels are the most specific terms (leaves of the DAG) at the end of a taxonomy (branch) in the polyhierarchy. Bottom: The GO terms (biological processes) taken from the functional areas, shown above in the DAG, are shown with larger fonts for better readability. The figure was created using the R software package (version 3.3.2 for Linux; http://CRAN.R-pro ject.org/; R Development Core Team, 2008) and our R package 'dbtORA' (https://github.com/IME-TMP-FFM/dbtORA; Lippmann et al., 2018). The Venn diagram was drawn using the R library 'Vennerable' (Swinton J., https://r-forge.r-project.org/R/?group_id=474).
determinant suggesting a therapeutic potential for the manipulation of spinal SOD-2 activity in pain conditions (Schwartz et al., 2009). The role of cytokines was further highlighted by further restricting the gene subset most consistently related to persisting pain by identifying the appearance of the relevant gene in an independently created list of 535 so-called pain genes (Ultsch et al., 2016). These were genes relevant to pain listed by several sources: mainly the Pain Genes Database (http://www.jbldesign.com/jmogil/enter.html (Lacroix-Fralish et al., 2007). The overlap of the set of the 22 genes, with repeated records of variants that modulate the clinical phenotype of persisting pain, with the 535 'pain genes' comprised 13 genes (ADRB2, COMT, ESR1, GCH1, IL-10, IL-1B, IL-4, IL-6, P2RX7, SCN9A, SOD2, TNF, TRPV1). After performing a similar ORA (Fig. 6) as that described above, these 13 genes were found to be mainly involved in cytokine production, covered by the significant GO terms 'chemokine metabolic process' (GO:0050755, p = 3.3 9 10 À7 ), 'nitrogen compound metabolic process' and again 'response to stimulus', the most significant term being 'nitric oxide biosynthetic process' (GO:0006809, p = 1 9 10 À14 ).
The involvement of the immune system in persisting pain is plausible from a biological perspective. One of the sites of interaction of the immune system with persisting pain has been identified as neuroimmune crosstalk at the glial-opioid interface (Tian et al., 2012). Previous research has shown that glial and immune cells, including astrocytes, microglia/ macrophages, as well as T lymphocytes, are key cells activated during persisting pain, which contribute to pain persistence (Calvo et al., 2012;von Hehn et al., 2012). A role for local production of cytokines in the central nervous system during inflammatory conditions associated with persisting pain, such as rheumatoid arthritis  or fibromyalgia , as well as evidence for central nervous system sensitization by cytokines (Aden et al., 2010) also suggests such an immune system interaction with persisting pain states. Indeed, it has recently been suggested that the predominance of pain sensitization during chronic diseases in women is closely linked to the effects of female sex hormones on the neuroimmune system (Rosen et al., 2017).
The present results clearly support the modulation of neuroimmune system processes as a promising strategy in the development of novel analgesic drugs against persisting pain. This may be possible along several lines. For example, involvement of glial cells in opiate actions has been shown recently (Chen et al., 2010;Boue et al., 2012). Consequently, the elucidation of pain-and opioid-induced mechanisms at the level of glial and immune cells could lead to improvement of pain management. In an animal model, ibudilast, a nonselective phosphodiesterase-inhibiting, anti-inflammatory drug that also blocks glial activation probably via antagonism at the Toll-like receptor 4 (Jia et al., 2012), restored morphine-induced antinociception following tolerance development (Lilius et al., 2009). Similarly, minocycline, a tetracycline that inhibits microglial activation and proliferation, also seems to attenuate morphine tolerance in mouse models of neuropathic pain (Chen et al., 2010). Hence, increasing evidence points towards the immune system as a potential source of future targets for analgesic drugs directed against persisting pain.
By gathering the relevant reports, the present analysis centres on the current evidence about a genetic modulation of persisting pain on a gene level, without going into the details of single nucleotide polymorphism level as usually applied in review papers. However, the approach was centred on machinelearned knowledge discovery from the gathered evidence and was based on published evidence gathered from studies in which the authors had used a candidate gene approach or had performed a GWAS without a gene-specific hypothesis. Therefore, the present analysis implies a research bias given by the original hypotheses or on the inclusion of frequently addressed genetic variants in the analysed studies. The question addressed with the present analysis was about the greater functional perspective emerging from successful clinical studies of the genetic modulation of persisting pain. Importantly, while the analysis is biased with respect to the gene selection, made by the authors of the included studies, its results are not biased for a particular functional genomics perspective as this had not been an gene inclusion criterion in the analysed studies. Nevertheless, the present selection of repeatedly reported associations implies an advantage of frequently included genes that have been attracted research interest through the last several years, such as OPRM1, GCH1 or COMT.
With the caution advised by the implicit research bias regarding the gene selection, the results of the present analysis were (1) unexpected considering that hypotheses about the involvement of immune system processes or of nitric oxide signalling were not preformulated for the present analysis and (2) biologically plausible and completely compatible with current research activities on persisting pain in the light of increasingly acknowledgement of an involvement of immune processes that has attracted European Pain Federation EFIC â concerted research activities (Kringel and L€ otsch, 2015), including that on the role of the glial-opioid interface in persisting pain (http://gloria.helsinki.fi). Hence, although the evidence was generated in separate studies, the combination of positive findings of a genetic modulation of persisting pain allows a limited yet valid genomewide analysis of the trait, providing a more comprehensive picture of functional background of persisting pain from genomics perspective than associations of single genotypes.

Conclusions
While many studies have focused on particular genes, the present analysis pursued the question whether their combined results may provide more complex insights into the pathophysiology of persisting pain in humans. In keeping with the contemporary trend towards 'big data' analysis in biomedical research, the current empirical data on modulation of persisting pain via human genetic polymorphisms have been subjected here to a computational functional genomics analysis. This evidence was then analysed for emergent, principal pathophysiological processes that characterize persisting sensitization to pain. Analysis of 110 unique genes, with variants that have been reported to modulate the clinical phenotype of persisting pain, led to the selection of functionally heterogeneous genes. By focusing on genes that have been repeatedly associated with modulation of persisting pain phenotypes in several clinical settings, a clearer picture emerged of the main processes identified by current human genetics research on persisting pain. A mathematically supported, precise selection of a subset of genes was possible using a computational functional genomics approach. On this basis, including the research bias of current clinical genetic association studies, the evidence gathered so far points to the view of persisting pain as a trait resulting from alterations in the immune system and/or in nitric oxide signalling, a concept that is biologically highly plausible and agrees with other lines of pain research. While analysing existing evidence and therefore limited to previously shown functional pathways, the present computational functional genomics-based approach provides a computational systems-biology perspective on chronic sensitization to pain by summarizing the empirical evidence gathered in many separate studies. Moreover, human genetic research on persisting pain emphasizes the immune system as a potential source of important future targets for analgesic drugs directed against persisting pain and demonstrates that contemporary machine-learned methods offer innovative approaches to knowledge discovery from previous evidence.

Author contributions
JL, DK and EK conceived and designed the analysis. JL, CL and DK analysed the data. JL, DK, EK and MJP wrote the article. JL, AU, DK and EK involved in discussion of methods and results.