Amyotrophic lateral sclerosis (ALS), also known as Lou Gehrig's disease or Charcot disease, is the most common form of adult-onset motor neuron degeneration, with a peak age of onset in the sixth or seventh decade [Valdmanis et al., 2009; Van Damme and Robberecht, 2009]. The prognosis is very poor, with a median survival of about three years from symptom onset. The causes of ALS are gradually being identified, but despite extensive research and rapidly increasing knowledge of the disease mechanisms, there remains no cure [van Es et al., 2010]. A disease-modifying drug treatment exists in the form of riluzole, a benzothiazole derivative with a modest effect on survival. Noninvasive ventilation also extends survival and improves quality of life [Wijesekera and Leigh, 2009].
Although ALS is generally considered a single disease entity, there are various classifications based on genetic and phenotypic patterns, and it is probably more appropriate to consider it a syndrome of motor neuron degeneration with multiple causes. Advances in technology have led to an enormous increase in the volume of research data produced, and a corresponding need for storage, analysis, and interpretation, particularly as our understanding of the relationships between genotype and phenotype matures.
An effective method for the curation and organization of such data is the online database [Lill et al., 2011]. Mutation databases of human genes are now prominent in all areas of healthcare, and experts in genetic diseases may curate published and unpublished mutations in locus-specific databases (LSDBs) [Claustres et al., 2002; Fokkema et al., 2011]. These databases of Mendelian disease mutations play a fundamental role in research, diagnostics, and genetic healthcare [George et al., 2008]. Although the original role of databases was simply to store data, modern databases often have an associated computational, bioinformatics, or analytical role that encourages interpretation of the data.
Here, we present such a system in the form of the ALS Online Database (ALSoD; http://alsod.iop.kcl.ac.uk), whose homepage is shown in Figure 1. This freely available database has been transformed from a single-gene storage facility recording mutations in the SOD1 gene into a multigene ALS bioinformatics repository and analytical instrument that combines genotype, phenotype, and geographical information with associated analysis tools.
ALSoD is not only a central repository for genetic information on the more than 100 ALS-related genes reported to date; it also shows graphs of the gender, age of onset, phenotype, and family history distributions of the patient data stored in the database, broken down by gene, mutation, or phenotypic group. The analytical tools include a comparison tool to evaluate genes side by side or jointly with user-configurable features; a pathogenicity prediction tool that uses a combination of computational approaches to distinguish variants likely to have no functional effect from disease-associated mutations with damaging consequences; and a credibility tool that enables ALS researchers to assess objectively the evidence for association with ALS. Furthermore, integration of external tools, systems for feedback and annotation by users, and two-way links to collaborators hosting complementary databases further enhance the functionality of ALSoD.
Database Structure and Overview
Funding and Sponsorship
The ALSoD database is a joint project of the World Federation of Neurology and European Network for the Cure of ALS, and is funded through grants from the ALS Association, Motor Neurone Disease Association, ALS Canada, ALS Therapy Alliance, and MNDA Iceland.
The database schema now allows flexibility and expansion through redesigned tables, rewritten queries, and the implementation of stored procedures. Redundant tables have been removed, and a more flexible structure is in place. The original database has been archived. New genes have been added to the tables, and a facility has been designed to add further genes easily. ALSoD now permits only registered users to submit novel gene, mutation, and patient data, and these submissions are regularly validated by an ALS expert.
The Web site has been given a facelift, and some pages have been redesigned for better visual representation of data. The graphical user interface allows data to be viewed and interpreted at a glance rather than in tabular format.
Collaborations and Embedded Tools
ALSoD uses third-party open-source bioinformatics tools, embedded as Java applets, to provide computational analysis within the database. For example, a combination of ClustalW and Jalview [Waterhouse et al., 2009] provides multiple sequence alignments across species for selected genes (Fig. 2 shows the multiple alignment and mutations for the SOD1 gene). GeneMANIA [Warde-Farley et al., 2010] allows users to select genes of interest for prediction of interactions. A Google Earth API is used to view maps of mutation, risk, and exposure distributions. Because many ALS gene variants are found in both familial and apparently sporadic ALS, a two-way link to the ALSGene database provides evidence of association to complement the genotype–phenotype correlations available from familial ALS information in ALSoD [Lill et al., 2011]. A similar link to fALS Connect, a collaboration between multiple interested agencies in the United States, including the patient organization The ALS Association and the research group The Northeast ALS (NEALS) Clinical Trials Consortium, makes ALSoD relevant for patients and carers as well as the scientific community. The database has been adopted by the Human Variome Project (http://www.humanvariomeproject.org) and the GWAS Phenomap Project (http://www.gwascentral.org/gwasphenomap).
Integrated Bioinformatics Links
To avoid bias, users can retrieve gene-specific information through external links that are generated automatically for each gene and open in new windows. The links are built systematically from each gene's unique identifiers and point to broad databases and bioinformatics tools freely available online. The scientific and nonscientific external links integrated into ALSoD include HGNC [White et al., 1997], Entrez Gene [Maglott et al.], UCSC Browser [Fujita et al.], Protein Structure [Rose et al.], OMIM [Amberger et al., 2011; Amberger et al., 2009], GeneCards [Safran et al.], ProtScale [Gasteiger et al., 2005], KEGG [Kanehisa et al., 2000], UniProt [Jain et al., 2009], iHOP [Hoffmann and Valencia, 2004], Pathway in KEGG [Kanehisa et al., 2000], GeneTests [Pagon et al., 2002], AmiGO [Carbon et al., 2009], Ensembl [Hubbard et al., 2009], NCBI [Sherry et al., 2001], Life Science DB (Japan) [Yoshida et al., 2010], ALSGene [Lill et al., 2010], GeneWiki [Huss III et al., 2008], WolframAlpha (Maret), and WikiGenes [Hoffmann, 2008].
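Generating links "automatically for each gene" amounts to filling URL templates with identifiers stored for that gene. The sketch below is a minimal illustration under stated assumptions: the `LINK_TEMPLATES` dictionary, the `GeneIds` fields, and the template URLs are hypothetical, are not taken from ALSoD, and external URL formats change over time. Opening links in new windows is a front-end concern and is not shown.

```python
from dataclasses import dataclass

# Hypothetical URL templates keyed by resource name; ALSoD's actual templates
# are not given in the text, and external sites may change their URL formats.
LINK_TEMPLATES = {
    "Entrez Gene": "https://www.ncbi.nlm.nih.gov/gene/{entrez_id}",
    "GeneCards": "https://www.genecards.org/cgi-bin/carddisp.pl?gene={symbol}",
    "OMIM": "https://www.omim.org/entry/{omim_id}",
}


@dataclass
class GeneIds:
    """Identifiers stored for one gene (field names are illustrative)."""
    symbol: str
    entrez_id: str
    omim_id: str


def external_links(gene: GeneIds) -> dict[str, str]:
    """Build one outbound link per resource from the gene's stored identifiers."""
    fields = {"symbol": gene.symbol, "entrez_id": gene.entrez_id, "omim_id": gene.omim_id}
    # str.format ignores keyword arguments a template does not use.
    return {name: template.format(**fields) for name, template in LINK_TEMPLATES.items()}
```

Because every link derives from stored identifiers rather than hand-entered URLs, adding a gene or a resource requires no per-gene editing.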
Feedback is collected in two main ways: a Facebook page for ALSoD (http://www.facebook.com/srch.php#!/pages/ALSoD/307667685943735) and a direct feedback page on the ALSoD Website. Comments are displayed publicly, and a reCAPTCHA tool presents text readable only by human users to prevent spammers from infiltrating the system. A news page generates automated summaries of ALS genetics news, and surveys conducted through the freely available online survey tool “SurveyMonkey” are embedded in the user interface.
In addition, by tracking the registered country of origin of page viewing and download requests, accessibility of the ALSoD database to the international ALS community can be monitored directly.
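Monitoring accessibility by country reduces to aggregating the request log. The following is a minimal sketch, assuming each request is recorded with an anonymized visitor key and a country; the `PageRequest` record and both function names are hypothetical and do not describe ALSoD's actual logging. The same aggregation yields the kind of totals reported for ALSoD (visits, unique visitors, countries).

```python
from collections import Counter
from typing import NamedTuple


class PageRequest(NamedTuple):
    """One page view or download (hypothetical log record)."""
    visitor_id: str  # anonymized visitor key
    country: str     # country recorded for the request


def summarize_access(requests: list[PageRequest]) -> dict[str, int]:
    """Count total requests, distinct visitors, and distinct countries."""
    return {
        "visits": len(requests),
        "unique_visitors": len({r.visitor_id for r in requests}),
        "countries": len({r.country for r in requests}),
    }


def requests_by_country(requests: list[PageRequest]) -> Counter:
    """Number of requests per country, e.g. to spot under-served regions."""
    return Counter(r.country for r in requests)
```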
ALSOD v0.1 Beta
First online in 1995, ALSOD (as it was formerly known), hosted at www.alsod.org, was developed to store genetic and clinical information and to help researchers identify correlations between phenotype and genotype for SOD1 mutations in ALS. The database contained data only for the SOD1 gene, as this was the only gene known at the time to be linked to familial ALS [Radunovic and Leigh, 1999].
In 1999, the database was first fully functional and available for the research community.
In 2008, about 100 different mutation sites across the SOD1 sequence, with corresponding clinical information, had been collated. Mutations in SOD1 were linked to hypothetical 3D structures of the mutant protein hosted on a University College London server developed by Andrew Martin's team [Wroe et al., 2008]. Fifty users from 17 institutions registered with ALSOD to submit ALS patient and mutation data, and 122 SOD1 mutation records from 97 individuals with familial ALS were stored.
The website was relocated to http://alsod.iop.kcl.ac.uk/als following loss of the alsod.org domain. Data could be downloaded freely and the database queried to look for a specific mutation type in four ALS genes (SOD1, ALS2, VAPB, NEFH) or for specific information on patient data.
ALSoD v3.0 Current Structure
ALSoD is now a relational database with a greatly increased volume of data, owing to submissions by researchers and regular updates by the database curators. The schema has been redesigned so that familial and sporadic ALS patient data, associated mutations, and newly published ALS genes can be added easily in the future.
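To illustrate why a normalized relational layout makes adding genes "uncomplicated", here is a minimal sketch using Python's built-in `sqlite3`. The text does not say which database engine ALSoD uses, and the table and column names below (`gene`, `mutation`, `patient`, `protein_change`, and so on) are hypothetical. SQLite has no stored procedures, so the named queries are plain Python functions here. As a simplification, each patient is linked to exactly one mutation.

```python
import sqlite3

# Hypothetical normalized schema: genes, their mutations, and patients carrying them.
SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE mutation (
    mutation_id    INTEGER PRIMARY KEY,
    gene_id        INTEGER NOT NULL REFERENCES gene(gene_id),
    protein_change TEXT NOT NULL,
    pathogenic     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (gene_id, protein_change)
);
CREATE TABLE patient (
    patient_id  INTEGER PRIMARY KEY,
    mutation_id INTEGER NOT NULL REFERENCES mutation(mutation_id),
    sex         TEXT,
    onset_site  TEXT,
    onset_age   REAL,
    familial    INTEGER
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_gene(conn: sqlite3.Connection, symbol: str) -> int:
    """Adding a gene is a single row insert; no table redesign is needed."""
    return conn.execute("INSERT INTO gene (symbol) VALUES (?)", (symbol,)).lastrowid


def gene_summary(conn: sqlite3.Connection, symbol: str) -> tuple[int, int]:
    """Return (distinct mutations, patients) recorded for one gene."""
    return conn.execute(
        """
        SELECT COUNT(DISTINCT m.mutation_id), COUNT(p.patient_id)
        FROM gene g
        JOIN mutation m ON m.gene_id = g.gene_id
        LEFT JOIN patient p ON p.mutation_id = m.mutation_id
        WHERE g.symbol = ?
        """,
        (symbol,),
    ).fetchone()
```

The per-gene mutation and patient totals shown on each overview page correspond to a query of the `gene_summary` kind.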
More than 100 ALS-related genes have now been added to the database, with a current total of 431 mutations (195 pathogenic) and data on 589 patients. Fifteen of the mutations have not been published anywhere other than ALSoD. Since 2009, ALSoD Web pages have been visited over 280,000 times by more than 22,900 unique visitors from 140 countries. There are 26 registered contributors, excluding those from the host institution, and 33 publications have cited functionalities or updates available on ALSoD.
Examples of Functionality
Users are able to summarize the relationship between mutational data and clinical patient data visually. On every gene overview page, the total number of mutations and patients collated from publications is displayed, together with associated phenotypic information: the limb-to-bulbar onset ratio, age of onset as a box plot, the male-to-female ratio, and the familial-to-sporadic ratio. Key publications for each gene are listed and can be sorted by first author, year, or title. A further section lists genetic variations in tabular format, with their base pair positions, associated statistical results, and the author, year, and title of the publication (where available).
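The overview statistics above are simple counts plus a five-number summary for the age-of-onset box plot. Below is a minimal sketch, assuming one record per patient with possibly missing fields; the `Patient` fields and the `overview` function are hypothetical. Records with a missing value are skipped for that statistic only, and quartiles use `statistics.quantiles` with its default (exclusive) method, which may differ from whatever charting library ALSoD uses.

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """One patient record; None marks an unreported field (illustrative schema)."""
    sex: Optional[str]          # "M" or "F"
    onset_site: Optional[str]   # "limb" or "bulbar"
    familial: Optional[bool]
    onset_age: Optional[float]


def _pair(values, a, b) -> tuple[int, int]:
    """Return (count of a, count of b); missing values match neither."""
    return sum(v == a for v in values), sum(v == b for v in values)


def overview(patients: list[Patient]) -> dict:
    """Ratios and box-plot summary for one gene's patients."""
    summary = {
        "limb_vs_bulbar": _pair([p.onset_site for p in patients], "limb", "bulbar"),
        "male_vs_female": _pair([p.sex for p in patients], "M", "F"),
        "familial_vs_sporadic": _pair([p.familial for p in patients], True, False),
    }
    ages = sorted(p.onset_age for p in patients if p.onset_age is not None)
    if len(ages) >= 2:  # quantiles() needs at least two data points
        q1, median, q3 = statistics.quantiles(ages, n=4)
        summary["onset_box"] = (ages[0], q1, median, q3, ages[-1])  # min, Q1, median, Q3, max
    return summary
```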
The data stored on ALSoD are also available as statistical reports (Fig. 3) via the reports tool at http://alsod.iop.kcl.ac.uk/Statistics/report.aspx. For example, at the time of writing, this shows that the average age of onset of ALS across all deposited genes is 45 years, with a range from 1 year (ALS2) to 67 years (CRYM). The 20 most frequent mutations are shown with links to more details on each. The first six are SETX: Leu389Ser, recorded in 38 of 54 patients; FUS: Arg521His, recorded in 33 of 72 patients; SOD1: Leu144Phe, recorded in 26 of 194 patients; UBQLN2: Pro497His, recorded in 19 of 35 patients; VAPB: Pro56Ser, recorded in 18 of 19 patients; and TARDBP: Ala382Thr, recorded in 11 of 75 patients.
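A "most frequent mutations" report can be produced by counting patient records per (gene, mutation) pair. This sketch assumes the denominator in "recorded in x of y patients" is the number of patients recorded for that gene; the text does not state this explicitly. The input format and the `top_mutations` name are hypothetical, and the example data in any test are placeholders, not ALSoD data.

```python
from collections import Counter
from typing import Iterable


def top_mutations(records: Iterable[tuple[str, str]], n: int = 20) -> list[tuple[str, str, int, int]]:
    """Rank mutations by number of patients.

    records: one (gene, protein_change) pair per patient.
    Returns (gene, protein_change, patients_with_mutation, patients_for_gene).
    """
    records = list(records)
    per_mutation = Counter(records)                       # patients per mutation
    per_gene = Counter(gene for gene, _ in records)       # patients per gene
    return [(gene, change, count, per_gene[gene])
            for (gene, change), count in per_mutation.most_common(n)]
```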
If a mutation is subsequently reported in another individual, this can be seen at http://alsod.iop.kcl.ac.uk/Statistics/pathogenicity.aspx. This is important as it provides stronger evidence of pathogenicity. For example, the Arg521Cys mutation in the FUS gene originally reported from France has more recently been found in Italy. There is currently no system for flagging mutations subsequently found in controls.
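Flagging recurrence means grouping reports by mutation and keeping those seen in more than one distinct individual, so that a single patient described in two papers is not counted twice. Below is a minimal sketch, assuming each report carries a mutation label, an individual identifier, and a country; the identifiers and the `recurrent_mutations` name are hypothetical. As the text notes, reports from controls are not handled.

```python
from collections import defaultdict
from typing import Iterable


def recurrent_mutations(reports: Iterable[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Return {mutation: sorted countries} for mutations seen in >1 distinct individual.

    reports: (mutation, individual_id, country) triples; the same individual
    reported twice (e.g. in two publications) counts once.
    """
    individuals = defaultdict(set)
    countries = defaultdict(set)
    for mutation, individual, country in reports:
        individuals[mutation].add(individual)
        countries[mutation].add(country)
    return {m: sorted(countries[m]) for m, ids in individuals.items() if len(ids) > 1}
```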
Where there are sufficient data for genotype–phenotype correlations, a comparative study of selected genes can be performed on the detailed analysis page (Fig. 4) at http://alsod.iop.kcl.ac.uk/Statistics/analysis.aspx. A user-configurable form lets users choose two genes to compare and, if needed, restrict the query to patients within a particular age-of-onset range. For example, comparing the available data for TARDBP and FUS shows that 73% of patients with a TARDBP mutation have limb onset, compared with 78% for FUS; 63% of those with a TARDBP mutation are male, compared with 55% for FUS; and 65% of those with a TARDBP mutation have a family history of ALS, compared with 85% of those with a FUS mutation. The mean age of onset is higher for patients with a TARDBP mutation (55 years) than for those with a FUS mutation (46 years). Geographical data are also deposited: in this case, the largest share of recorded TARDBP mutations has come from Italy (32%), but 11 countries have reported cases, spanning Europe, North America, and Asia. For FUS, the situation is quite different, with just five countries reporting mutations, the largest share from Belgium (43%).
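The comparison described above applies the same per-gene profile to two genes under a shared age-of-onset filter. The following is a minimal sketch, assuming every case has all fields reported (the real tool presumably computes each percentage over patients with that field recorded); the `Case` fields and the function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """One patient record; field names are illustrative, not ALSoD's schema."""
    gene: str
    onset_site: str        # "limb" or "bulbar"
    sex: str               # "M" or "F"
    family_history: bool
    onset_age: float
    country: str


def pct(part: int, whole: int) -> Optional[int]:
    """Whole-number percentage, or None when there are no patients."""
    return round(100 * part / whole) if whole else None


def profile(cases, gene, min_age=None, max_age=None) -> dict:
    """Summarize one gene, optionally restricted to an inclusive age-of-onset range."""
    sel = [c for c in cases
           if c.gene == gene
           and (min_age is None or c.onset_age >= min_age)
           and (max_age is None or c.onset_age <= max_age)]
    n = len(sel)
    by_country = Counter(c.country for c in sel)
    top_country, top_n = by_country.most_common(1)[0] if sel else (None, 0)
    return {
        "n": n,
        "limb_onset_pct": pct(sum(c.onset_site == "limb" for c in sel), n),
        "male_pct": pct(sum(c.sex == "M" for c in sel), n),
        "family_history_pct": pct(sum(c.family_history for c in sel), n),
        "mean_onset_age": round(sum(c.onset_age for c in sel) / n, 1) if n else None,
        "n_countries": len(by_country),
        "top_country": (top_country, pct(top_n, n)),
    }


def compare(cases, gene_a, gene_b, min_age=None, max_age=None) -> dict:
    """Side-by-side profiles for two genes under the same age filter."""
    return {g: profile(cases, g, min_age, max_age) for g in (gene_a, gene_b)}
```

Reporting the country with the largest share, rather than "most" cases, avoids overstating a plurality such as 32% as a majority.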
A more detailed tabular version of the charts is shown at the bottom of the analysis page, including the author who generated the data, linked to the PubMed abstract via its PubMed ID. This is particularly useful for researchers who wish to verify the information presented.
Scenario: A researcher is interested in understanding how SOD1 mutations relate to ALS, and in particular, if there are any codons that are more likely to be mutated than others.
On the homepage, the researcher can see that SOD1 is listed as causative for ALS, with a description of ALS1. Selecting this gene by clicking “Select” brings up an overview page. The header lists the gene symbol, name, and alternative names; the gene inheritance category (in this case, familial ALS genes also found in sporadic ALS); the category of gene function (oxidative stress); the locus; two one-sentence summaries; and the total number of mutations recorded together with the total number of patients with genotype data (here, 165 mutations in 264 individuals). Below this are a graphic of the clinical presentation showing that 93.4% of those with phenotype information had limb onset, a pair of graphs showing the distribution of age of onset, and two pie charts showing the male-to-female ratio and the proportion of familial and sporadic cases. The raw numbers are given in a table below the graphic. There are then a number of bioinformatics links and a literature review. Finally, a list of references is given for every deposited mutation.
Clicking the diagram link takes the researcher to a diagram of the first 106 pathogenic nonsynonymous mutations. The Google Map link presents a Google Earth display with the country of origin of each mutation highlighted.
The researcher is particularly interested in the A4V mutation. Using the analysis tools to predict pathogenicity, the researcher finds that A4V is predicted to be pathogenic, and the graphics below show that it has only been reported in familial ALS, with a mean age of onset of 47 and origin in the USA, Sweden, and Canada.
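The text states only that the pathogenicity tool combines several computational approaches; it does not name the predictors or say how their outputs are combined. One simple combination rule is a majority vote with ties reported as uncertain. The sketch below uses that rule as an assumption; the predictor names (`tool_a` and so on) and the `consensus_call` function are hypothetical placeholders, not the tools ALSoD actually uses.

```python
from typing import Mapping


def consensus_call(predictions: Mapping[str, bool]) -> str:
    """Majority vote over per-tool calls (an assumed combination rule).

    predictions maps a (hypothetical) predictor name to True if that tool
    calls the variant damaging and False if it calls it tolerated.
    """
    if not predictions:
        return "no prediction"
    damaging = sum(predictions.values())
    tolerated = len(predictions) - damaging
    if damaging > tolerated:
        return "predicted pathogenic"
    if tolerated > damaging:
        return "predicted non-pathogenic"
    return "uncertain"  # tie between tools
```

A weighted scheme or a trained meta-predictor would be equally plausible designs; a plain vote is shown only because it is the easiest rule to inspect.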
Comparing SOD1 with TARDBP shows that bulbar onset is much more common for TARDBP (27%), the mean age of onset is higher, and the proportion of patients with no family history of ALS is also higher. Mutations in both genes have been found in many countries.
Analysis of the interaction networks of the two genes (under Analysis, Interactions) shows there are only a few links between the networks. Adding in FUS shows it to be a close interaction partner of TARDBP but not of SOD1.
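Statements such as "only a few links between the networks" and "a close interaction partner of TARDBP but not of SOD1" can be made concrete on any interaction graph. This sketch takes each gene's network to be the gene plus its direct partners and counts edges running between the two networks. The definitions, function names, and the toy edge list in the tests (placeholder genes A, B, C) are assumptions for illustration; they are not real interactions, and GeneMANIA's output format is not modeled.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (u, v) interaction pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def network(graph, gene):
    """A gene's network: the gene itself plus its direct partners."""
    return graph.get(gene, set()) | {gene}


def links_between(edges, net_a, net_b):
    """Count interaction edges with one endpoint in each network."""
    return sum(1 for u, v in edges
               if (u in net_a and v in net_b) or (u in net_b and v in net_a))


def direct_partners(graph, gene_a, gene_b):
    """True if the two genes interact directly."""
    return gene_b in graph.get(gene_a, set())
```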
The ALSoD database collects genotype and phenotype information on ALS genes directly deposited by researchers or reported in publications, and is one of the oldest such databases in continuous use. Through links and collaborations with other databases such as ALSGene, and through analysis tools built into ALSoD, it is able to reveal patterns in data that would not otherwise be visible, and acts as a continuous review of ALS genetics and the correlation of genotype with phenotype.
A few issues with the original design meant that registration and access were cumbersome, discouraging users from registering. In addition, the database relied completely on the goodwill of the research community for updates, there was no easy way to include new genes, and there was no easy way to incorporate the latest advances in bioinformatics. These issues have all been addressed in modernizing ALSoD over the last few years.
In the field of ALS, as in other areas, available genetic data, phenotypic classifications, and relationships between mutations and clinical presentation have grown rapidly, and it can be difficult for those not involved in genetics on a daily basis to keep up. ALSoD fulfills an essential role in integrating and collating this otherwise overwhelming dataset into an understandable and manageable form.
A further difficulty faced by anyone interested in ALS genetics is that the number of putatively associated genes is far larger than the number of genes most people would accept as involved. ALSoD helps with this indirectly because the amount of available data on a particular gene corresponds with the amount of research and, therefore, to some extent, with the credibility of that gene as an ALS gene. We plan to formalize this in the future with an algorithm designed to objectively rank genes by the level of evidence supporting their role in ALS. This approach is already taken by databases such as ALSGene in association studies [Lill et al., 2011], and implementing it for familial ALS genes and for genotype–phenotype correlations is a logical extension of this process. More bioinformatics hyperlinks and integration with websites for the use of clinicians and nonclinicians, scientists, and the lay public are also under development [Wroe et al., 2008]. These include a section to deal with high throughput sequencing data. Data supporting a causative gene variation can be dealt with in the existing infrastructure, but what is more difficult is managing sequence data without clear annotation on the likely relationship of any variants seen with ALS. We will develop a tool for storage and display of such data to facilitate the identification of common patterns once sufficient sequences are deposited.
Feedback from users suggests that extending ALSoD to include integrated information from other species, such as mouse and Drosophila, would be welcome. We will begin the process of linking to the relevant external databases and collating published evidence. Once we have a critical mass of information, we will seek collaboration with experts in each field to help with analysis and presentation of the data.
Having grown from a single-gene mutation database, the ALS Online Database is now a major repository for all ALS genes, with multiple tools for researchers familiar with ALS genetics and summaries for those wishing to keep up with genetic advances. With the support and feedback of the ALS research community, it will continue to develop and expand, providing a continuous review of ALS genetics.
We are especially grateful for the long-standing and continued funding of this project from the ALS Association and the MND Association of Great Britain and Northern Ireland. We also thank ALS Canada, MNDA Iceland, and the ALS Therapy Alliance for support. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health. Aleks Radunovic, Nigel Leigh, and Ian Gowrie originally conceived ALSoD. ALSoD is a joint project of the World Federation of Neurology (WFN) and the European Network for the Cure of ALS (ENCALS).
Conflicts of Interest: A.A.C. is on the Scientific Advisory Board of the ALSGene database, the Biomedical Research Advisory Panel of the Motor Neurone Disease Association, and is a consultant for Biogen Idec and Cytokinetics. He receives royalties for the books “The Brain: A Beginner's Guide” (Oneworld Publications) and “The Genetics of Complex Human Diseases” (Cold Spring Harbor Laboratory Press).