LETTERS TO THE EDITOR
Reactome – a curated knowledgebase of biological pathways: megakaryocytes and platelets
Article first published online: 30 OCT 2012
© 2012 International Society on Thrombosis and Haemostasis
Journal of Thrombosis and Haemostasis
Volume 10, Issue 11, pages 2399–2402, November 2012
JUPE, S., AKKERMAN, J. W., SORANZO, N. and OUWEHAND, W. H. (2012), Reactome – a curated knowledgebase of biological pathways: megakaryocytes and platelets. Journal of Thrombosis and Haemostasis, 10: 2399–2402. doi: 10.1111/j.1538-7836.2012.04930.x
- Issue published online: 30 OCT 2012
- Accepted manuscript online: 17 SEP 2012 10:29AM EST
- Received: 30 June 2012, accepted: 5 September 2012
Reactome (http://www.reactome.org) is a free, open-source database and website of human biological pathways. Pathways are built from connected biological ‘reactions’, or pathway steps, which encompass all types of biological event, for example binding, phosphorylation and transport, as well as classic biochemical reactions. Each reaction is derived from the literature and includes a citation to a publication that experimentally validates the event described. The aim is to represent a consensus view of human biological pathways as a free reference and core dataset for biologists (Figs 1 and 2).
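To make the idea of ‘connected reactions’ concrete, the following Python sketch models a pathway as a list of reactions whose outputs feed the inputs of later steps, each step carrying at least one literature citation. The class and field names (`Reaction`, `Pathway`, `citations`, `links`) are hypothetical illustrations and are not Reactome's data schema, which is considerably richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reaction:
    """One pathway step (hypothetical model): entities in, entities out."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]
    citations: tuple[str, ...]  # literature evidence for this step


@dataclass
class Pathway:
    name: str
    reactions: list[Reaction] = field(default_factory=list)

    def add(self, reaction: Reaction) -> None:
        # Mirror the curation rule: every step needs a supporting citation.
        if not reaction.citations:
            raise ValueError(f"{reaction.name} has no literature citation")
        self.reactions.append(reaction)

    def links(self) -> list[tuple[str, str]]:
        """Ordered reaction pairs where an output of one is an input of the other."""
        return [
            (a.name, b.name)
            for a in self.reactions
            for b in self.reactions
            if a is not b and a.outputs & b.inputs
        ]
```

In this toy model, a binding step that produces a receptor complex is linked to any later step that consumes that complex, which is how individual reactions chain into a pathway.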
The content of Reactome is based on information provided by expert biologists; it is converted into reactions and pathways by Reactome curators and then peer-reviewed by a second expert. Reactions and pathways are extensively cross-referenced to databases such as Ensembl (http://www.ensembl.org/index.html), GO (http://www.ebi.ac.uk/QuickGO), PubMed (http://www.ncbi.nlm.nih.gov/pubmed), ChEBI (http://www.ebi.ac.uk/chebi/index.jsp), UniProt (http://www.uniprot.org) and OMIM (http://www.ncbi.nlm.nih.gov/omim). Pathways are human-centric but may incorporate pathway steps manually inferred to exist in humans on the basis of data from model organisms; these are clearly differentiated from pathway steps that have been experimentally determined in humans. Pathways for species other than human are computationally inferred by a process based on orthology, and over 20 additional species are currently represented. Tools on the Reactome website allow interactive visualization of pathways and support analyses such as pathway over-representation (pathway enrichment) analysis, pathway expansion to include protein–protein and protein–small molecule interactions, and the overlay of expression data onto pathways for pathway differential expression analysis. All of these tools are compatible with, and designed to operate on, user-supplied datasets. Pathways can be exported in a variety of formats, including the BioPAX and Systems Biology Markup Language (SBML) standards (for further information see http://sbml.org).
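Pathway over-representation analysis asks whether a user's list of genes or proteins overlaps a pathway more than would be expected by chance. The sketch below assumes a one-sided hypergeometric test with no multiple-testing correction; the function names are hypothetical, and the statistics used by the Reactome analysis tools may differ.

```python
from __future__ import annotations

from math import comb


def overrepresentation_p(n_universe: int, n_pathway: int, n_query: int, n_overlap: int) -> float:
    """One-sided hypergeometric tail P(X >= n_overlap).

    n_universe: all identifiers that could be observed
    n_pathway:  identifiers annotated to the pathway
    n_query:    identifiers in the user's list
    n_overlap:  identifiers in both the list and the pathway
    """
    upper = min(n_pathway, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrich(query: set[str], pathways: dict[str, set[str]], universe: set[str]) -> list[tuple[str, int, float]]:
    """Return (pathway, overlap, p-value) tuples, most significant first."""
    query = query & universe  # ignore identifiers outside the universe
    results = []
    for name, members in pathways.items():
        members = members & universe
        k = len(query & members)
        p = overrepresentation_p(len(universe), len(members), len(query), k)
        results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])
```

In practice the p-values for many pathways would be corrected for multiple testing (for example by the Benjamini–Hochberg procedure) before interpretation.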
Reactome covers many areas of biology, such as DNA replication and repair, membrane trafficking, synaptic transmission and receptor-based signaling pathways. Each of these topics contains relevant biological pathways and associated diagrams. Pathways relevant to megakaryocyte and platelet biology fall largely within the major topic of Hemostasis, which currently (October 2012) contains 40 pathways comprising 347 reactions. Subtopics within Hemostasis include platelet adhesion to exposed collagen, nitric oxide metabolism, platelet sensitization by low-density lipoprotein, adenosine diphosphate signaling through P2Y purinergic receptors, thrombin activation of proteinase-activated receptors, glycoprotein (GP)VI-mediated (Fig. 2) and αIIbβ3-mediated signaling, platelet calcium regulation, and platelet degranulation.
The Platelet Web (http://plateletweb.bioapps.biozentrum.uni-wuerzburg.de/plateletweb.php) is a dataset with an associated website representing a platelet-relevant subset of a generic human protein–protein interaction network derived from the Human Protein Reference Database or from large-scale yeast two-hybrid (Y2H) studies. Platelet specificity comes from including proteins for which platelet proteomics or transcriptomics data exist, and the set was further annotated with data on the platelet phosphoproteome. This approach is fundamentally different from, and partly complementary to, that of Reactome: it aims to represent comprehensively all proteins known to exist in platelets and presents a network of the identified interactions between them. However, it does not attempt to group these interactions into recognizable ‘canonical’ pathways, to explain their context, or to highlight platelet-specific processes of particular interest as opposed to widespread metabolic processes. Nor does it distinguish between interactions studied and described in the peer-reviewed literature and unfamiliar interactions that might be novel elements of platelet processes or artifacts of the technology used to identify them. Y2H technology is recognized as a highly artificial measure of protein interaction that can suggest interactions with no in vivo relevance. The dataset underlying the Platelet Web site cannot be downloaded and can only be queried via the website for individual proteins. In contrast, Reactome pathways and the data schema can be downloaded in a variety of reusable formats, including the SBML and BioPAX standards, or as a list of protein identifiers.
There are simple and advanced query interfaces, BioMart representation and an application programming interface offering two alternative methods of bulk querying, and there are several tools that allow the user to analyze their own datasets by comparison or overlay onto Reactome pathways.
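Overlaying a user-supplied dataset, such as expression fold changes keyed by gene or protein identifier, onto pathways can be sketched as a join between the dataset and pathway membership. The function `overlay` and its summary fields (`coverage`, `mean`, `hits`) are hypothetical illustrations, not part of any Reactome interface.

```python
from __future__ import annotations

from statistics import mean


def overlay(values: dict[str, float], pathways: dict[str, set[str]]) -> dict[str, dict]:
    """Summarize user-supplied values per pathway (hypothetical helper).

    values:   identifier -> measurement (e.g. a log2 fold change)
    pathways: pathway name -> set of member identifiers
    """
    summary = {}
    for name, members in pathways.items():
        hits = sorted(members & set(values))  # members that have a measurement
        summary[name] = {
            # fraction of pathway members covered by the user's data
            "coverage": len(hits) / len(members) if members else 0.0,
            # average measurement over covered members, if any
            "mean": mean(values[h] for h in hits) if hits else None,
            "hits": hits,
        }
    return summary
```

Reporting coverage alongside the mean helps flag pathways where only a few members were measured, so that a strong average is not over-interpreted.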
The HaemAtlas is a comprehensive compendium of the transcripts present in the six main peripheral blood cell types and in erythroblasts and megakaryocytes. It identifies genes with a significantly higher transcript level in the megakaryocytic lineage than in the seven remaining lineages. The atlas has recently been expanded with information about changes in the transcriptome of the erythroid and megakaryocytic lineages during differentiation from haematopoietic stem cells. Among the over-expressed category are transcripts for platelet-specific surface receptors in which mutations are known to impair platelet function, such as the receptor for von Willebrand factor (VWF), GPIbα/Ibβ/IX/V, and the receptor for fibrinogen, vitronectin and VWF, GPIIb/IIIa (integrin αIIbβ3). Information about mutations underlying inherited platelet-type bleeding disorders, such as Bernard–Soulier syndrome, Glanzmann’s thrombasthenia and Wiskott–Aldrich syndrome, is maintained in databases at different institutes, and there is no central portal covering all disorders (examples of such databases can be found at http://sinaicentral.mssm.edu/intranet/research/glanzmann and http://bioinf.uta.fi/WASbase). The information in these databases is generally not linked to knowledge about signalling pathways such as that held in Reactome.
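The notion of a lineage-enriched transcript can be illustrated with a simplified rule: a gene is called enriched in the target lineage if its log2 expression there exceeds that in every other lineage by a fixed margin. This is an assumption for illustration only; the text above describes a test for significantly higher transcript levels, whereas this sketch uses a fixed fold-change margin. The function name, the threshold `min_log2_fc` and the lineage labels are hypothetical.

```python
from __future__ import annotations


def lineage_specific(
    expr: dict[str, dict[str, float]],
    target: str = "MK",
    min_log2_fc: float = 1.0,
) -> list[str]:
    """Genes whose log2 level in `target` beats every other lineage by min_log2_fc.

    expr: gene -> {lineage label: log2 expression}; labels are hypothetical.
    """
    hits = []
    for gene, levels in expr.items():
        others = [v for lineage, v in levels.items() if lineage != target]
        # Require a measurement in the target and at least one comparison lineage.
        if target in levels and others and levels[target] - max(others) >= min_log2_fc:
            hits.append(gene)
    return sorted(hits)
```

Comparing against the maximum of the other lineages makes the rule strict: a gene that is also highly expressed in, say, erythroblasts would not be called megakaryocyte-enriched.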
Several recent candidate-gene studies and genome-wide association studies of platelet traits have identified nearly a hundred common coding and non-coding single-nucleotide polymorphisms (SNPs) that affect platelet function [5,6], volume and count. About a third of these SNPs lie in or near genes encoding known regulators of megakaryopoiesis and of platelet formation and survival. The remainder lie in or near genes encoding proteins from a diverse array of known functional categories, but their role in megakaryocyte and platelet biology remains to be elucidated. The results of genome-wide association studies (GWAS) are maintained in an online catalogue (http://www.genome.gov/gwastudies). Overlaying GWAS results with pathway knowledge in Reactome can be used to build protein–protein interaction networks that can reveal hitherto unappreciated interactions. It is hoped that such networks will help researchers unravel the role and function of this new group of key regulators of megakaryopoiesis and of platelet formation and function. Knowledge about the effects of common sequence variants on platelet phenotypes is of no immediate clinical use, because their effect sizes on the risk of bleeding and thrombotic events are small.
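One simple way to overlay GWAS results on interaction data is to take the genes near associated SNPs as seeds, keep every interaction that touches a seed, and flag non-seed proteins that connect two or more seeds as candidate links. The sketch assumes an undirected list of protein–protein interactions; the function name and the ‘connector’ criterion are hypothetical choices, not a published method.

```python
from __future__ import annotations

from collections import defaultdict


def gwas_subnetwork(
    seeds: set[str], edges: list[tuple[str, str]]
) -> tuple[list[tuple[str, str]], list[str]]:
    """Seed-centred subnetwork plus non-seed 'connector' proteins (hypothetical).

    seeds: genes located in or near GWAS-associated SNPs
    edges: undirected protein-protein interactions
    """
    adjacency: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Keep every interaction touching at least one seed, in a canonical order.
    sub = {tuple(sorted((a, b))) for a, b in edges if a in seeds or b in seeds}

    # A connector is a non-seed protein that interacts with two or more seeds.
    connectors = {
        node for node in adjacency
        if node not in seeds and len(adjacency[node] & seeds) >= 2
    }
    return sorted(sub), sorted(connectors)
```

Connectors are only hypotheses: as noted for Y2H data above, an interaction in the input list may have no in vivo relevance, so curated pathway context remains important when interpreting them.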
This will change with the increasing use of next-generation sequencing technologies (NGST). Global scientific initiatives to decipher the coding fraction (exome) or the entire sequence of hundreds of thousands of human genomes will ultimately lead to a complete catalogue of sequence variants in human populations of different ethnicities, and future association studies may identify rare variants with large effect sizes on clinical phenotypes. Several of these variants are likely to become part of the routine diagnostic work-up of patients, particularly those with early-onset thrombotic and bleeding disorders. The more immediate application of NGST is in the area of Rare Diseases whose genetic basis has not yet been resolved. It has now become feasible and affordable to survey the entire coding fraction of the human genome by so-called exome sequencing, and this approach has already been applied successfully to identify rare variants and mutations underlying Rare Diseases. For example, sequencing the exomes of a relatively small number of patients led to the discovery that NBEAL2 is the causative gene for Grey Platelet syndrome [8,9], and that the compound inheritance of a low-frequency regulatory SNP and a rare null mutation in the RBM8A gene causes Thrombocytopenia and Absent Radii syndrome. These findings show the superiority of exome sequencing over linkage studies in large numbers of pedigrees.
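The filtering step at the heart of exome-based studies of Rare Diseases can be sketched as follows: keep predicted loss-of-function (null) variants that are rare in a reference population, then look for genes that also carry a low-frequency regulatory variant, mirroring the compound inheritance described for RBM8A. The allele-frequency thresholds, the consequence labels (borrowed from Sequence Ontology-style terms) and all names are assumptions. The sketch also ignores phase, so it cannot confirm that the two variants lie on different alleles.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str  # Sequence Ontology-style label (assumed)
    pop_af: float     # allele frequency in a reference population


# Consequences treated as null (loss-of-function) in this sketch.
NULL_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def candidate_genes(
    variants: list[Variant], rare_af: float = 0.001, low_af: float = 0.05
) -> dict[str, list[str]]:
    """Genes with rare null variants, and those also carrying a low-frequency
    regulatory variant (hypothetical thresholds; phase is not checked)."""
    rare_null = {
        v.gene for v in variants
        if v.consequence in NULL_CONSEQUENCES and v.pop_af < rare_af
    }
    low_freq_regulatory = {
        v.gene for v in variants
        if v.consequence == "regulatory_region_variant" and rare_af <= v.pop_af < low_af
    }
    return {
        "rare_null": sorted(rare_null),
        "compound_candidates": sorted(rare_null & low_freq_regulatory),
    }
```

A real pipeline would add quality filters, segregation or phasing information and comparison with unaffected individuals before any variant is considered causative.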
To allow physicians and patients with rare inherited bleeding, platelet and thrombotic disorders to reap the full benefits of the genome revolution, carefully curated databases such as Reactome are key building blocks: they allow clinically relevant information to be visualized on cellular pathways and linked to data resources that aim to catalogue the relationships between rare sequence variants and clinical phenotypes. In partnership with scientific and clinical experts in the field of rare inherited platelet and bleeding disorders, we have commenced a systematic curation effort that aims to combine literature information on causative rare variants with information from disparate databases to create a single Locus Reference Genomic (LRG) database (http://www.lrg-sequence.org). This database will link gene-centric information with clinical phenotype descriptions and with information about studies of novel treatments, for example at Orphanet (http://www.orpha.net). This initiative is overseen by the Scientific and Standardization Committee ThromboGenomics (http://www.thrombogenomics.org.uk) of the ISTH. The LRG database is supported by both the European Bioinformatics Institute and the National Center for Biotechnology Information, which guarantees seamless integration with other databases such as dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP) and Ensembl. Similarly, Orphanet is a long-term, sustainable database initiative and one of the reference portals for information on Rare Diseases and orphan drugs. Orphanet aims to help improve the diagnosis, care and treatment of these patients by accurately capturing, annotating and cataloguing clinical phenotype information.
In conclusion, global collaboration is urgently needed to curate knowledge about the relationship between rare sequence variants with large effect sizes and clinical phenotypes, and to integrate the information from disparate disorder-specific databases into a single, freely accessible database environment and related websites.
Development of the Reactome database was supported by grants from the National Human Genome Research Institute at the National Institutes of Health (grant number P41 HG003751) and the European Union 6th Framework Programme ‘ENFIN’ (grant number LSHG-CT-2005-518254). Funding for the open access charge: National Institutes of Health grant number P41 HG003751. W. H. Ouwehand is supported by a grant from the National Institute for Health Research England (grant number NIHR:RP-PG-0310-1002).
Disclosure of Conflict of Interests
The authors state that they have no conflict of interest.