Reactome (http://www.reactome.org) is an open-source, expert-authored, peer-reviewed, manually curated database of reactions, pathways and biological processes. We provide an intuitive web-based user interface to pathway knowledge and a suite of data analysis tools. The Pathway Browser is a Systems Biology Graphical Notation-like visualization system that supports manual navigation of pathways by zooming, scrolling and event highlighting, and that exploits PSI Common Query Interface web services to overlay pathways with molecular interaction data from the Reactome Functional Interaction Network and interaction databases such as IntAct, ChEMBL and BioGRID. Pathway and expression analysis tools employ web services to provide ID mapping, pathway assignment and over-representation analysis of user-supplied data sets. By applying Ensembl Compara to curated human proteins and reactions, Reactome generates pathway inferences for 20 other species. The Species Comparison tool summarizes the results for each of these species in a table showing the number of orthologous proteins found in each pathway, from which users can navigate to inferred details for specific proteins and reactions. Reactome's diverse pathway knowledge and suite of data analysis tools provide a platform for data mining, modeling and analysis of large-scale proteomics data sets. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 8).
A major challenge for researchers and bioinformaticians is the integration of experimental and computational proteomics results with information relating to specific biological pathways. How are lists of protein-coding genes with somatic mutations identified in a survey of tumors, or lists of proteins whose expression level is changed in response to an experimental stress or a clinical disease to be mined effectively for insights into causes of disease and their physiological mechanisms? Biological pathway databases can help meet this need by facilitating the capture of the relationships between genes, proteins and small molecules in a computable data model. They provide information on biological reactions at the molecular level and indicate how these reactions can be grouped to provide a specification of a higher order process such as apoptosis. Unlike printed textbooks, pathway databases have the freedom to expand in breadth and depth, to be queried interactively, to adapt their visual display to the needs of individual research communities, and to connect to other internet resources.
There are several distinct approaches to constructing a useful pathway database. One distinguishing feature is the domain of the database: some databases focus on the transformation of small molecules (intermediary metabolism), while others place more emphasis on signal transduction and higher order biological processes. Another distinguishing feature is the level of curation. Some pathway databases are fully curated: each pathway is authored and reviewed by an expert and includes numerous comments, literature citations and data such as enzyme activators and inhibitors. At the other extreme are automatic schemes in which pathways are inferred by computational processes with little or no literature curation.
One of the earliest and best-known databases of biological pathways is the Kyoto Encyclopedia of Genes and Genomes, or KEGG 1, 2. KEGG focuses on intermediary metabolism rather than higher level pathways, for a broad range of species. The BioCarta project (www.biocarta.com) is a human-specific pathway database that focuses on higher order processes such as signaling. At heart, the BioCarta pathway data are a series of colorful high-resolution diagrams oriented towards education. MetaCyc is a richly curated database of pathways involved in primary and secondary metabolism that supports data mining and other software-driven applications 3. Built on the principles of Wikipedia, WikiPathways is an open, community-curated biological pathway database that aggregates material from the individual databases mentioned here 4. NCI-PID combines pathway data from Reactome and BioCarta with curated biomolecular interactions and signaling pathways related to cancer 5. Science Signaling's Database of Cell Signaling is an expert-authored and peer-reviewed curated database of signaling pathways 6. NCBI BioSystems functions as a repository of pathway data integrated with the gene, protein, literature and chemical data held in the Entrez database system 7.
Gathering, integrating, visualizing and analyzing pathway data together with other types of biological data is a challenging undertaking. Pathway databases gather and exchange data in different file formats and database dumps. However, two standard pathway data formats have reduced the complexity of data exchange and allow databases to cooperate more effectively. Systems Biology Markup Language (SBML) is an XML-based language for the exchange of computational models of biological pathways and processes 8. Visualization and model simulation tools such as CellDesigner 9 and COPASI 10 are compatible with SBML files. Biological Pathway Exchange (BioPAX) uses the Web Ontology Language (OWL) to support the exchange of biomolecular and genetic interactions, gene regulation networks, and metabolic and signaling pathway data. Tools such as Cytoscape 11 and Chisio BioPAX Editor 12 enable visualization and manipulation of BioPAX files.
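To make the SBML description concrete, the sketch below parses a tiny, hand-written SBML Level 2 Version 4 fragment with Python's standard `xml.etree.ElementTree` module and lists each reaction with its reactant and product names. The fragment (an isomerization of DHAP to GA3P, the reaction catalyzed by triosephosphate isomerase) is an illustrative assumption, not a file exported from Reactome; real exports carry far more annotation, and a dedicated library such as libSBML is the more robust choice.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative SBML Level 2 Version 4 fragment (not a Reactome export).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="example">
    <listOfCompartments>
      <compartment id="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="s1" name="DHAP" compartment="cytosol"/>
      <species id="s2" name="GA3P" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" name="DHAP is isomerized to GA3P">
        <listOfReactants><speciesReference species="s1"/></listOfReactants>
        <listOfProducts><speciesReference species="s2"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def summarize_reactions(sbml_text):
    """Return (reaction name, reactant names, product names) for every reaction."""
    root = ET.fromstring(sbml_text)
    # Map species ids to human-readable names.
    names = {s.get("id"): s.get("name") for s in root.iterfind(".//sbml:species", NS)}
    summary = []
    for rxn in root.iterfind(".//sbml:reaction", NS):
        reactants = [names[r.get("species")]
                     for r in rxn.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [names[p.get("species")]
                    for p in rxn.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        summary.append((rxn.get("name"), reactants, products))
    return summary


print(summarize_reactions(SBML))
```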
The Reactome database of human pathways, reactions and biological processes 13–16 employs a reductionist data model, which attempts to represent all of biology as reactions that convert input physical entities into output physical entities. The input and output entities of a reaction can be proteins, nucleic acids, chemical compounds or complexes of these entities. Every reaction and entity in Reactome is associated with a species and is assigned to a cellular location. Some reactions may span more than one compartment. For example, the reaction in which the P2Y11 receptor binds adenosine triphosphate (ATP) has plasma membrane and extracellular components 17. Each reaction is supported by experimental evidence, represented by links to the appropriate literature references. Reactions are then grouped into ordered causal chains to form pathways, which can contain reactions, other pathways or both. Pathways in turn have been grouped into approximately 160 canonical pathways, each of which corresponds to a substantial, tightly connected domain of human biology such as carbohydrate metabolism 18, solute transport regulatory pathways, GPCR signal transduction 19, cell-cycle regulation and innate immunity 20.
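The reductionist data model can be illustrated with a few Python dataclasses: physical entities carry a compartment and a species, reactions convert input entities into output entities (and may therefore span compartments), and pathways nest reactions and other pathways. The class and field names below are hypothetical simplifications for illustration, not Reactome's actual schema, and the literature-evidence field is left empty rather than filled with invented citations.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class PhysicalEntity:
    name: str
    compartment: str
    species: str = "Homo sapiens"


@dataclass
class Reaction:
    name: str
    inputs: List[PhysicalEntity]
    outputs: List[PhysicalEntity]
    literature: List[str] = field(default_factory=list)  # evidence references
    species: str = "Homo sapiens"

    def compartments(self):
        """All compartments touched by the reaction; may be more than one."""
        return sorted({e.compartment for e in self.inputs + self.outputs})


@dataclass
class Pathway:
    name: str
    events: List[Union["Pathway", Reaction]] = field(default_factory=list)

    def reactions(self):
        """Yield every reaction, descending into nested sub-pathways."""
        for event in self.events:
            if isinstance(event, Pathway):
                yield from event.reactions()
            else:
                yield event


# Hypothetical instances modeled on the P2Y11 example in the text.
atp = PhysicalEntity("ATP", "extracellular region")
receptor = PhysicalEntity("P2Y11", "plasma membrane")
bound = PhysicalEntity("ATP:P2Y11", "plasma membrane")
binding = Reaction("P2Y11 binds ATP", inputs=[atp, receptor], outputs=[bound])
pathway = Pathway("Example pathway", [Pathway("Example sub-pathway", [binding])])

print(binding.compartments())  # ['extracellular region', 'plasma membrane']
print([r.name for r in pathway.reactions()])  # ['P2Y11 binds ATP']
```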
Reactome curators work with collaborating faculty-level biologists to create pathways and reactions from published primary research articles and reviews. Together they work through a domain of biology to create a human- and computer-accessible description, linking all of the genes, proteins, literature citations and controlled vocabulary data together. Pathways, reactions, protein and small molecule entities are cross-referenced by accession numbers and identifiers to a number of well-established databases, including NCBI Entrez Gene, Ensembl and UniProt, the UCSC and HapMap Genome Browsers, KEGG Compound, PubChem Substance and ChEBI 1, 2, 21–29. Physical entities and events are further linked to ‘Molecular Function’, ‘Biological Process’ and ‘Cellular Component’ terms from the Gene Ontology (GO) vocabularies 30, 31, and literature citations are linked to PubMed 32. Post-translational modifications are represented in Reactome with terms from PSI-MOD 33. In the other direction, incoming links connect UniProt, ChEBI, Ensembl, Entrez Gene, WormBase and the GO Consortium back to Reactome 21, 24, 26, 27, 30, 31, 34. For example, an incoming link from a UniProt protein entry leads to Reactome pages that describe the function of the protein in one or more biological processes, the complexes it participates in, its position in the pathway diagrams and the literature citations that back these assertions.
The tutorial that follows will illustrate how browsing, searching, analyzing and visualizing Reactome pathway data are useful in interpreting proteomics data sets. Please note that this information is based on Reactome in early 2011. The contents of the database and some of the web pages may have changed slightly since this tutorial was written.
2 Navigating the Reactome website
The main user entry point to Reactome is the website, located at http://www.reactome.org (Fig. 1). The home page is divided into three main sections and provides access to the database and the suite of pathway analysis and data mining tools. The navigation bar at the top of the page provides access to background information describing Reactome (‘About’), a list of pathways (‘Content’), the user guides and a description of the data model (‘Documentation’), additional data analysis tools (‘Tools’), and software and data sets in MySQL, BioPAX, SBML and PSI-MITAB formats (‘Download’). The buttons on the left-hand side of the home page provide access to some of the popular data analysis tools and downloadable data sets. A simple search tool allows the user to query the contents of the Reactome database. The main text section provides information ‘About Reactome’, an example pathway (‘Pathway of the Month’), access to web tutorials and up-to-the-minute Reactome news.
3 Browsing Reactome pathway diagrams
At the top of the browser page is the ‘Search and Analyze’ panel, which consists of a search text box for querying the elements of the pathway diagram and the ‘Analyze, Annotate & Upload’ button that controls the interactive tools associated with pathway diagrams. The ‘Pathways’ panel, on the left side, organizes all the canonical pathways in a hierarchy. The sub-pathways and reactions within each canonical pathway can be displayed or hidden by clicking the plus (+) symbol to the left of the pathway name. Clicking a pathway name in the hierarchy displays the corresponding pathway diagram or diagram section in the ‘Visualization’ panel to the right. Google Maps-like tools in the upper left corner of the ‘Visualization’ panel enable zooming and scrolling across the pathway diagram.
When a pathway in the hierarchy is selected, it is highlighted in bright green in the hierarchy and its parent pathways are highlighted in green. In the diagram on the right, green squares highlight the nodes of all reactions that are components of that pathway. Moving the cursor over any sub-pathway or reaction in the hierarchy of the selected pathway highlights that event with a green square on each reaction node. Highlighted reactions are also visible in the thumbnail diagram, which can be used to navigate quickly to the region of interest in the main diagram. Canonical pathway diagrams, such as ‘Cell Cycle, Mitotic’, may contain sub-pathways that have their own diagrams; these are represented as boxes with green boundaries in the diagram. Navigation downward in the event hierarchy, either by choosing sub-processes of the current one in the hierarchy on the left or by choosing a pathway box or reaction node in the diagram on the right, can continue until a single reaction is highlighted. Moving the cursor over a reaction edge or physical entity node of the pathway causes its name to appear in a popup window.
Underneath the ‘Visualization’ panel is the ‘Details’ panel, which provides a description of the selected pathway, reaction or physical entity. Pathway descriptions provide a text summary giving an overview of the pathway, the GO biological process term for the pathway and the GO cellular component term for its location in the cell, as well as published literature references linked to PubMed. If the pathway is not supported by direct experimental data but has been inferred from a pathway in another species, this is noted with the phrase ‘This event is deduced on the basis of event(s)’ and a link to the reference pathway.
Clicking a reaction node presents the reaction description in the ‘Details’ panel, with information about the reaction including the input and output physical entities, the catalyst and the precise component within a catalyst complex (or domain within a simple catalyst) that enables the reaction to occur. A description of any molecule represented in the pathway diagrams can be displayed below the diagram by selecting the physical entity node within the diagram. Physical entities within the ‘Details’ panel are linked to external bioinformatics resources.
Context-sensitive menus, accessible from the ‘Visualization’ panel view of a reaction, provide additional functionality while navigating the Pathway Browser (Fig. 2C and D). Right-clicking a physical entity opens a menu whose options depend on the entity and include: (i) a list of the other pathways in Reactome in which the selected entity participates; (ii) a display of the physical entities that make up the complex; and (iii) a list of interactors of the entity (described later). The menu bar at the bottom of the ‘Details’ panel provides download options to retrieve static pathway diagram files and data in BioPAX Level 2 and 3, SBML and Protégé formats. These batch data dumps allow researchers to download lists of all proteins that participate in a pathway or sub-pathway and support data exchange, analysis and modeling 8, 36–38.
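Because BioPAX Level 3 files are RDF/XML, the list of proteins in a downloaded pathway can be extracted with a generic XML parser. The sketch below reads `displayName` from every `bp:Protein` element of a tiny, hand-written fragment (an illustrative assumption, not a Reactome export). Real exports also describe proteins through classes such as `ProteinReference`, so a dedicated toolkit such as Paxtools or an RDF library is more robust for production use.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written, illustrative BioPAX Level 3 fragment (not a Reactome export).
BIOPAX = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Pathway rdf:ID="Pathway1">
    <bp:displayName>Example pathway</bp:displayName>
  </bp:Pathway>
  <bp:Protein rdf:ID="Protein1">
    <bp:displayName>TPI1</bp:displayName>
  </bp:Protein>
  <bp:Protein rdf:ID="Protein2">
    <bp:displayName>GAPDH</bp:displayName>
  </bp:Protein>
  <bp:SmallMolecule rdf:ID="SmallMolecule1">
    <bp:displayName>DHAP</bp:displayName>
  </bp:SmallMolecule>
</rdf:RDF>"""


def protein_names(owl_text):
    """Return the sorted, de-duplicated display names of all bp:Protein elements."""
    root = ET.fromstring(owl_text)
    names = set()
    for protein in root.iter(f"{{{BP}}}Protein"):
        name = protein.findtext(f"{{{BP}}}displayName")
        if name:
            names.add(name.strip())
    return sorted(names)


print(protein_names(BIOPAX))  # ['GAPDH', 'TPI1']
```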
Reactome data sets are a high-quality resource for pathway-based data analysis. However, the usefulness of Reactome as a platform for high-throughput data analysis is limited by its low coverage of human proteins. To increase protein coverage and the associated functional annotations, we have integrated molecular interaction and network data into the Reactome pathway diagrams. The molecular interaction overlay allows the display of proteins and chemicals that interact with proteins in a Reactome pathway.
As mentioned before, selecting ‘Display Interactors’ from the context-sensitive menu displays the interactors of an individual protein (Fig. 2B). The ‘Analyze, Annotate and Upload’ feature located at the top left corner of the Pathway Browser is used to overlay all interactors for all pathway proteins. Moving the cursor over a protein interactor displays a popup window with the gene name and UniProt accession number of the protein. The molecular interaction overlay employs PSI Common Query Interface (PSICQUIC) web services to import binary interaction data from individual interaction databases into Reactome pathway diagrams (http://code.google.com/p/psicquic). The default interaction database is IntAct. PSICQUIC data sources available for overlay in this way include APID, BioGRID, ChEMBL, DIP, InnateDB, IntAct, iRefIndex, MatrixDB, MINT, MPIDB, Reactome, Reactome functional interactions (Reactome-FIs) and STRING 25, 27, 39–57. Two of these data sets, ‘Reactome’ and ‘Reactome-FIs’, were generated by the Reactome group. ‘Reactome’ represents interaction data derived from Reactome reactions and complexes. ‘Reactome-FIs’ contains approximately 210 000 functional interactions encompassing over 10 000 human proteins (46% of SwissProt entries for human). It combines curated interactions from Reactome and other pathway databases, including Panther 58, KEGG, NCI-PID and CellMap (http://cancer.cellmap.org/cellmap/), with interactions from interaction data sets and interactions derived from co-expression data, protein domain–domain interactions, text mining and GO annotations 53.
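PSI-MITAB 2.5, the tab-delimited format returned by PSICQUIC services, has 15 columns, the first two holding the identifiers of interactors A and B (for example `uniprotkb:P60174`). The sketch below builds a protein-to-interactors map from such rows, roughly the information needed to draw a halo of interactors around a pathway protein. The rows are invented for illustration and are not records from IntAct or any other database; keeping only UniProt identifiers is a simplifying assumption.

```python
def _uniprot_accession(field):
    """Return the first uniprotkb accession in a '|'-separated MITAB identifier field."""
    for ident in field.split("|"):
        db, _, acc = ident.partition(":")
        if db == "uniprotkb" and acc:
            return acc
    return None


def interactor_map(lines):
    """Map each UniProt accession to the set of accessions it interacts with."""
    partners = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a MITAB 2.5 row
        a = _uniprot_accession(cols[0])
        b = _uniprot_accession(cols[1])
        if a and b:
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)
    return partners


# Invented example rows: interactor A, interactor B, then 13 unused columns.
rows = [
    "\t".join(["uniprotkb:P60174", "uniprotkb:P04406"] + ["-"] * 13),
    "\t".join(["uniprotkb:P60174", "uniprotkb:P00558"] + ["-"] * 13),
]
print(interactor_map(rows))
```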
The nodes and edges of the overlaid network are interactive, providing links to relevant data sources. For example, clicking a protein interactor opens a new web page displaying the UniProt entry for the selected protein. Selecting an edge opens a new web page displaying the interaction entry in the current interaction database. If a new database is selected from the ‘Analyze, Annotate and Upload’ button while interactors are displayed for a set of pathway proteins, those proteins are submitted to the new database and the display is updated automatically. As well as querying databases for interactions, users can upload their own interactions in the PSI-MITAB format to be overlaid onto the pathway. The ‘Submit a new PSICQUIC Service’ option gives access to a PSICQUIC service that is not listed in the PSICQUIC registry. Interactions can be colored according to a confidence level that reflects the amount of experimental data available. All the interactions displayed in the pathway diagram can be viewed as a list in the ‘Table of Interactors for Pathway’ of the ‘Analyze, Annotate and Upload’ feature. This table lists the proteins in the pathway along with their interactors from the currently selected interaction database. A full list of interactors for each pathway protein can be downloaded in the PSI-MITAB format.
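A user-defined interaction file for upload can be produced with a few lines of Python. This is a minimal sketch, assuming that the Pathway Browser accepts plain MITAB 2.5 rows without a header line and that user confidence scores may be placed in column 15; the `author-score` label and the example accessions are illustrative choices, not values documented here as required by the browser.

```python
def to_mitab_row(acc_a, acc_b, score=None):
    """Build one 15-column MITAB 2.5 row; unknown columns are filled with '-'."""
    cols = ["-"] * 15
    cols[0] = f"uniprotkb:{acc_a}"
    cols[1] = f"uniprotkb:{acc_b}"
    if score is not None:
        # Column 15 holds confidence values as 'source:value'; the 'author-score'
        # label is a hypothetical choice for user-supplied scores.
        cols[14] = f"author-score:{score}"
    return "\t".join(cols)


def write_mitab(interactions, path):
    """Write (acc_a, acc_b, score) tuples to a MITAB file for upload."""
    with open(path, "w", encoding="utf-8") as handle:
        for acc_a, acc_b, score in interactions:
            handle.write(to_mitab_row(acc_a, acc_b, score) + "\n")


print(to_mitab_row("P60174", "P04406", 0.8))
```

Filling unknown columns with `-` keeps every row at exactly 15 fields, which is what MITAB 2.5 parsers generally expect.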
5 Analysis of proteomics data sets using Reactome tools
Protein–protein interaction detection methodologies such as yeast two-hybrid 59, phage display 60, protein microarrays 61 and affinity chromatography followed by MS 62 have been used to create large interaction data sets. Biomolecular interaction databases such as DIP 63, BIND 55, BioGRID 56, IntAct 43 and MINT contain interaction data sets for yeast 59, 62, 64, bacteria 65–68, fruit fly 69, worm 70 and human 71–73. Several MS-based technologies, including MALDI-TOF-MS 74, LC-MS 75 and SELDI-TOF-MS 76, have been used to study the proteome. MS has also been instrumental in the discovery and characterization of protein post-translational modifications 77 and biomarkers 78. Search engines such as MASCOT 79 and X!Tandem 80 identify peptides by matching MS or tandem MS spectra against protein sequence databases. The PRIDE (PRoteomics IDEntifications) database integrates protein databases, literature citations and post-translational modification data to promote proteomics data analysis 81. With high-throughput proteomic technologies, however, it has become increasingly important to have analysis tools that can integrate and visualize thousands of data points in the context of pathway diagrams. Reactome facilitates detailed computational analysis of proteomics data by capturing published knowledge about reactions, pathways and biological processes and by providing a series of bioinformatics tools that integrate the results with the pathway visualization system.
6 Querying Reactome
Most users will probably find the Simple search tool, accessible from all the web pages, sufficient for querying the Reactome database and website (Fig. 3A). Users can submit a word, database identifier or phrase and retrieve a list of corresponding database records. For example, a simple query for the protein name TP53 yields 524 hits in different data categories (pathways, reactions, proteins and others). The ‘Others’ category represents literature references, complexes, inhibitions, activations or anything else not covered by the first three categories. A subset of the results can be displayed if some of these categories are not required: deselect the boxes that are not required and click the Show button to refresh the search results page. Each result is clickable and links to the appropriate Reactome page. To restrict the search to a specific species, use the Species drop-down menu on the search results page.
Working example of Reactome Simple Search: Retrieve all Reactome instances that involve the TPI1 enzyme.
(ii) Enter ‘TPI1’ in the search box and click the ‘Search’ button. In a few seconds, a list of Reactome reactions, pathways and entities should appear in the search results tab.
(iii) Click ‘Protein: UniProt:P60174 TPI (Homo sapiens)’ to open the TPI1 protein summary page.
(iv) Return to the search results page and click ‘Pathway: Gluconeogenesis (Homo sapiens)’ to open the gluconeogenesis pathway diagram in the Pathway Browser.
The Advanced (Extended) search, which can be accessed via the Tools menu in the main menu bar on all web pages, supports more customizable, complex and logical queries. This Extended search method allows specific schema-based queries for particular types of Reactome data (Fig. 3B). Specifically, it searches for records (instances) in the database by multiple field (attribute) values, and the individual conditions are combined with boolean AND operators. For example, a query to retrieve all reactions that consume GDP and produce GTP would be prepared by choosing class ‘Reaction’, selecting field name input and entering GDP into the search box, and then picking field name output on the next row and entering GTP.
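The AND semantics of the extended search can be sketched as a filter over attribute values: a record is returned only if it belongs to the requested class and every (field, value) condition holds. The records below are toy dictionaries invented for illustration, and the function is a hypothetical stand-in for the server-side query, not Reactome's implementation.

```python
def extended_search(records, cls, conditions):
    """Return records of class `cls` that satisfy ALL (field, value) conditions."""
    def has_value(record, field, value):
        found = record.get(field)
        # Multi-valued attributes (e.g. reaction inputs) are stored as lists.
        values = found if isinstance(found, (list, tuple, set)) else [found]
        return value in values

    return [record for record in records
            if record.get("class") == cls
            and all(has_value(record, f, v) for f, v in conditions)]


# Toy records (not Reactome data).
records = [
    {"class": "Reaction", "name": "toy reaction 1", "input": ["GDP", "ATP"], "output": ["GTP", "ADP"]},
    {"class": "Reaction", "name": "toy reaction 2", "input": ["GTP", "H2O"], "output": ["GDP", "Pi"]},
    {"class": "Complex", "name": "toy complex", "input": ["GDP"], "output": ["GTP"]},
]

hits = extended_search(records, "Reaction", [("input", "GDP"), ("output", "GTP")])
print([r["name"] for r in hits])  # ['toy reaction 1']
```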
Working example of Reactome Advanced Search: Find all plasma membrane-associated complexes whose name includes the word EGFR.
(ii) Under ‘Tools’ in the navigation bar, select ‘Advanced Search’.
(iii) Select ‘Complex’ from the ‘Restrict search to class’ drop-down menu.
(iv) In the first row, select ‘name’ under ‘Field name’, select ‘with the EXACT PHRASE’ from the next drop-down menu and type ‘EGFR’ into the final text box.
(v) In the second row, select ‘species’ under ‘Field name’, select ‘with the EXACT PHRASE’ from the next drop-down menu and type ‘Homo sapiens’ into the final text box.
(vi) In the third row, select ‘compartment’ under ‘Field name’, select ‘with the EXACT PHRASE’ from the next drop-down menu and type ‘Plasma membrane’ into the final text box.
(vii) Click the ‘Search’ button to retrieve the human complexes whose names contain EGFR and that are located in the plasma membrane.
7 Pathway analysis
The Pathway Analysis tool analyzes user-supplied lists of genes, proteins and small molecules and provides ID mapping, pathway assignment and over-representation analysis. Clicking the ‘Pathway Analysis’ button on the Reactome homepage launches a data entry page that allows the user to input a list of gene, protein or small molecule identifiers. Several identifier types and accession numbers are currently supported, including UniProt, GenBank/EMBL/DDBJ, RefPep, RefSeq, EntrezGene, OMIM, InterPro, Affymetrix, Agilent, Illumina and Ensembl. The data entry page supports both typing and pasting identifiers into the text area provided, or uploading a text file of identifiers from the user's computer. Two pathway analyses can be performed. By default, the simpler of these analyses will be selected, ‘ID mapping and pathway assignment’. This analysis takes a set of accession numbers or identifiers and maps them to Reactome pathways. The results are presented in a sortable table that can be downloaded as a spreadsheet or as a comma-separated or tab-delimited file for further analysis (Fig. 4A).
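The ‘ID mapping and pathway assignment’ step amounts to a lookup from identifiers to pathways followed by export as delimited text. This sketch uses a hypothetical in-memory lookup table in place of the Reactome database; the column names follow the ‘ID’ and ‘Pathway names’ columns mentioned in the working example below, but the exact layout of Reactome's downloadable file is not reproduced.

```python
import csv
import io


def assign_pathways(identifiers, id_to_pathways):
    """Map each submitted identifier to the pathways it participates in."""
    rows = []
    for raw in identifiers:
        ident = raw.strip()
        if not ident:
            continue  # ignore blank lines in pasted input
        pathways = sorted(id_to_pathways.get(ident, []))
        rows.append({"ID": ident, "Pathway names": "; ".join(pathways) or "-"})
    return rows


def to_delimited(rows, delimiter="\t"):
    """Serialize rows as tab-delimited (default) or comma-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ID", "Pathway names"],
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy lookup table; in practice this knowledge comes from the Reactome database.
lookup = {"P60174": ["Glycolysis", "Gluconeogenesis"]}
print(to_delimited(assign_pathways(["P60174", "UNKNOWN1", ""], lookup)), end="")
```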
A more complex pathway analysis tool is ‘Over-representation analysis’. This tool determines which events (pathways and/or reactions) are statistically enriched in a set of genes or proteins as specified by a submitted list of identifiers (Fig. 4B). The results of the over-representation analysis are provided as a color-coded interactive list of events. Each event is colored according to the probability (from a hypergeometric test) of seeing a given number or more proteins in this event by chance. The top-level events are ordered according to the lowest p-value of their components. The warmer the color, the higher the level of over-representation is for a given pathway. Selecting an event name will link to the Pathway Browser and clicking on the plus (+) next to the pathway name provides access to the protein identifiers from the submitted list that are found in the pathway, along with the corresponding UniProt IDs. The results are also provided as a table of statistically over-represented events as an ordered list that can be downloaded.
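The p-value described above is the upper tail of the hypergeometric distribution. With N proteins in the background, K of them in an event, n submitted proteins and k of those in the event, p is the sum over i from k to min(n, K) of C(K, i) C(N - K, n - i) / C(N, n). The sketch below computes it exactly with integer arithmetic. Which background set Reactome uses for N is not stated here, so treat it as an assumption; if SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same value.

```python
from math import comb


def overrepresentation_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: proteins in the background set
    K: background proteins that belong to the event (pathway or reaction)
    n: submitted proteins that map to the background
    k: submitted proteins found in the event
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("expected 0 <= k <= n <= N and 0 <= K <= N")
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


print(overrepresentation_pvalue(5, 5, 5, 10))  # 1/252, about 0.00397
```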
Working example of Reactome Pathway Analysis: Annotate a list of UniProt identifiers with Reactome reaction and pathway data and identify statistically over-represented events.
(iii) Click the ‘Example’ button on the ‘Pathway Analysis’ page and then click ‘Analyze’. This demonstrates the ‘ID mapping and pathway assignment’ feature. After a few seconds, a table of results entitled ‘Pathway Assignment’ will appear.
(iv) In the ‘UniProt’ column, click the UniProt ID: O00139 link to open the reference UniProt protein record in a new page.
(v) Return to the ‘Pathway Assignment’ table and click the upside-down triangle on the left of the ‘ID’ column header to sort the table by UniProt ID; UniProt ID: Q9Y6Y9 should now be in the top row.
(vi) In the ‘Pathway names’ column, click ‘Toll Receptor Cascades’; the Pathway Browser should open in a new page.
(vii) Return to the results table. At the top of the table you should see a download bar. Select the file format of your choice and click the ‘Download’ button to save the results to a file.
(viii) Repeat Steps 1–2 but select ‘Over-representation analysis’ before clicking ‘Analyze’. This demonstrates the ‘Over-representation analysis’ feature. After a few seconds, a color-coded interactive list of events will appear.
(ix) Click the plus (+) before an event name to reveal the ‘Matching identifiers’ list of the identifiers and associated proteins that contributed to the over-representation score.
(x) Scroll down the page to the ‘Statistically over-represented events as an ordered list’ section to view the same results in tabular form.
(xi) Click ‘Results in a tab-delimited text file’ to download the results data.
(xii) Scroll down the page to the ‘Mapping from submitted identifiers to Reactions’ section to view the same results as a list of reactions for each protein.
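Once the tab-delimited results file has been downloaded, it can be filtered and ranked locally. This sketch assumes column headers named `Event name` and `p-value`, which are hypothetical; check the header row of the file you actually download and adjust the arguments. The 0.05 threshold and the example rows are likewise only illustrations.

```python
import csv
import io


def significant_events(tsv_text, name_column="Event name", p_column="p-value", alpha=0.05):
    """Return (event, p) pairs with p < alpha, most significant first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        p = float(row[p_column])
        if p < alpha:
            hits.append((row[name_column], p))
    return sorted(hits, key=lambda hit: hit[1])


example = "Event name\tp-value\nEvent A\t0.01\nEvent B\t0.2\nEvent C\t0.001\n"  # invented values
print(significant_events(example))  # [('Event C', 0.001), ('Event A', 0.01)]
```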
8 Expression analysis
Proteomic researchers are producing vast quantities of structural and functional data on proteins through large-scale experiments that assess protein abundance, post-translational modifications and protein–protein interactions. The Expression analysis tool helps with the biological interpretation of these different data types. Clicking the ‘Expression Analysis’ button on the Reactome homepage opens a form that allows entry of a user-specified list of identifiers and numerical values. As with the pathway analysis, the expression analysis tool accepts the same accession numbers and identifiers, including those used by popular commercial platforms. However, the expression analysis tool also accepts numerical values (e.g. abundance, fold change or a statistical value) and shows how abundance levels affect events (reactions and pathways) in the cell. Once the data are submitted for analysis, the expression results are presented in a sortable table that can be downloaded in comma- or tab-separated format or as a spreadsheet (Fig. 5A). A ‘View’ button embedded in the results table launches the Pathway Browser and displays the relevant pathway diagram (Fig. 5B). The physical entities in the pathway diagram are color-coded according to the submitted numerical values. The color scale automatically adjusts to fit the range represented in the data set, with red for the highest values and dark blue for the lowest, and the submitted identifier and value are overlaid onto the physical entities. Gray boxes are proteins or small molecules with no associated values in the input data. Black entities represent complexes that have values for at least one of their proteins. The ‘Experiment Browser’, at the bottom of the colored pathway diagram, allows the user to step through different time points or visualize changes in abundance levels across multiple samples.
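The coloring rules described above can be sketched as follows: values are rescaled to the minimum and maximum of the submitted data set and interpolated linearly from dark blue to red, entities without data are gray, and a complex is black if at least one component has a value. The linear RGB interpolation, the exact RGB values and the choice to show a complex with no measured components in gray are assumptions, and the Pathway Browser's actual palette may differ. The protein names and values in the example are invented.

```python
DARK_BLUE, RED = (0, 0, 139), (255, 0, 0)
GRAY, BLACK = (128, 128, 128), (0, 0, 0)


def scale_color(value, vmin, vmax):
    """Linearly interpolate between dark blue (vmin) and red (vmax)."""
    t = 1.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(DARK_BLUE, RED))


def color_entities(entities, values):
    """entities: entity name -> list of component identifiers (one for a simple entity).
    values: identifier -> submitted numerical value."""
    if not values:
        return {name: GRAY for name in entities}
    vmin, vmax = min(values.values()), max(values.values())
    colors = {}
    for name, components in entities.items():
        if len(components) > 1:  # treated as a complex
            colors[name] = BLACK if any(c in values for c in components) else GRAY
        elif components[0] in values:
            colors[name] = scale_color(values[components[0]], vmin, vmax)
        else:
            colors[name] = GRAY  # no value in the input data
    return colors


entities = {"SMAC": ["SMAC"], "CASP9": ["CASP9"], "XIAP": ["XIAP"],
            "toy complex": ["SMAC", "XIAP"]}
values = {"SMAC": 2.0, "CASP9": -1.0}  # invented values
print(color_entities(entities, values))
```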
Working example of Reactome Expression Analysis: Visualize thousands of data points from an expression data set in the context of Reactome pathway diagrams.
(iii) Click the ‘Example’ button on the ‘Upload expression data’ page and then click ‘Analyze’. After a few seconds, a table of results entitled ‘Expression per Pathway’ will appear.
(iv) Click the arrows of the ‘% in data’ column to reorder the table so that the pathways with the highest percentage of hits from the data set are at the top.
(v) Click the ‘View’ button for ‘Intrinsic Pathway for Apoptosis’ to open the Pathway Browser in a new window. Be sure your browser is configured to allow pop-ups from Reactome.
(vi) In the top left-hand corner of the diagram there is an icon with four different sizes of blue circle, which allows you to choose the zoom level and scroll across the pathway diagram. Click the second highest circle to zoom out and use the arrows to scroll around the pathway diagram.
(vii) Mouse over one of the black physical entities (complexes) to show the name of the complex.
(viii) Right-click the same complex entity and select ‘Display Participating Molecules’. A popup box should appear with a grid of colored squares inside it, representing expression levels for the complex components.
(ix) At the base of the diagram you will see a bar containing the text ‘Experiment: 10h_control’ and two arrows. Click the forward arrow five times. The colors of some of the entities will change, reflecting changes in their abundance over the course of the study.
(x) Type ‘Smac’ into the search box in the ‘Search and Analyze’ panel to see the autocomplete feature of the pathway search. Select ‘SMAC [cytosol]’ and click ‘Search map’. In the ‘Search results’ panel on the left, click the ‘SMAC [cytosol]’ query link. This opens the pathway hierarchy for the ‘Apoptosis’ pathway in the left panel, centers the pathway diagram and highlights the ‘SMAC’ physical entity of the ‘Intrinsic Pathway for Apoptosis’ sub-pathway diagram with a green box.
(xi) Mouse over this entity. The ‘SMAC [cytosol]’ popup window will appear, displaying the data point identifier and expression value.
(xii) Zoom back in to the highest zoom level, navigate to ‘SMAC [cytosol]’ and right-click it to show the context-sensitive menu. Select ‘Display Interactors’. A halo of interacting proteins will appear around the physical entity.
(xiii) Click the line connecting ‘SMAC [cytosol]’ to ‘RNF85’ to open a new page with the IntAct interaction record for these two proteins.
(xiv) Click the ‘RNF85’ node to open the UniProt entry for the protein in a new page.
9 Comparative analysis of biological pathways
Organism-based comparative analysis of biological pathways yields information on their evolution, on disease, on biotechnological applications and on pharmacological targets. Reactome provides predicted pathways for 20 evolutionarily divergent model organisms, including Arabidopsis thaliana, Bos taurus, Caenorhabditis elegans, Canis familiaris, Danio rerio, Dictyostelium discoideum, Drosophila melanogaster, Escherichia coli, Mus musculus, Saccharomyces cerevisiae and Rattus norvegicus. These species were chosen because of the completeness of their genome sequences and annotations, and because together they embody more than four billion years of evolution and span the major branches of life. Twelve of the 20 non-human species also belong to the GO Reference Genome annotation project 30. Protein homology data obtained from Ensembl Compara 82 are used to infer reactions by orthology in those species for which high-quality whole-genome sequence data are available. Selecting the species of interest in the ‘Switch species’ drop-down menu in the upper left corner of the Pathway Browser displays the model organism pathway diagrams (Fig. 2A).
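Orthology-based inference can be sketched as replacing each human protein with its ortholog in the target species and carrying small molecules over unchanged. The sketch below uses the simplifying assumption that a reaction is inferred only when every protein participant has an ortholog; Reactome's actual inference rules (for example for complexes and sets) are more elaborate and are not reproduced here. The ortholog table is a toy example, not Ensembl Compara output.

```python
def infer_participants(participants, orthologs):
    """Project human reaction participants onto another species.

    participants: list of (name, kind) pairs, kind being "protein" or "small_molecule"
    orthologs: human protein -> ortholog in the target species
    Returns the projected participant list, or None if any protein lacks an ortholog.
    """
    projected = []
    for name, kind in participants:
        if kind == "small_molecule":
            projected.append(name)  # small molecules are species-independent
        elif name in orthologs:
            projected.append(orthologs[name])
        else:
            return None  # simplifying assumption: no ortholog, no inferred reaction
    return projected


toy_orthologs = {"TPI1": "Tpi1"}  # toy human -> mouse mapping
print(infer_participants([("TPI1", "protein"), ("DHAP", "small_molecule")], toy_orthologs))
print(infer_participants([("TPI1", "protein"), ("GAPDH", "protein")], toy_orthologs))
```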
The Species Comparison tool allows users to compare the predicted pathways of a model organism with those of Homo sapiens to find reactions and pathways common to both the selected species and human. The tool is launched by pressing the ‘Species Comparison’ button on the sidebar on the left-hand side of the home page. Once a non-human species has been selected, the results of the species comparison are presented as a sortable HTML table that can be downloaded as a spreadsheet or as a comma-separated or tab-delimited file for further analysis (Fig. 6A). A ‘View’ button embedded in the results table launches the Pathway Browser and displays the comparative pathway diagram (Fig. 6B). The physical entities in the pathway diagram are color-coded: (i) yellow indicates that the protein's ortholog is present in the comparison species; (ii) blue indicates that the protein is only known in human and that no ortholog could be found in the comparison species; (iii) gray indicates that inference was not possible, e.g. for small molecules; and (iv) black indicates that the entity is a complex.
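The color rules of the comparison diagram, together with the percentage of a pathway's proteins that have an ortholog, can be sketched as below. That the ‘% in other species’ column is computed exactly this way is an assumption made for illustration, and the protein identifiers and ortholog table are toy values.

```python
def comparison_color(kind, has_ortholog=False):
    """Color an entity in a species-comparison diagram."""
    if kind == "complex":
        return "black"
    if kind != "protein":
        return "gray"  # inference not possible, e.g. small molecules
    return "yellow" if has_ortholog else "blue"


def percent_with_orthologs(pathway_proteins, orthologs):
    """Percentage of a pathway's human proteins that have an ortholog."""
    proteins = set(pathway_proteins)
    if not proteins:
        return 0.0
    found = sum(1 for p in proteins if p in orthologs)
    return 100.0 * found / len(proteins)


toy_orthologs = {"P1": "m1", "P2": "m2", "P3": "m3"}
print(percent_with_orthologs(["P1", "P2", "P3", "P4"], toy_orthologs))  # 75.0
print(comparison_color("protein", "P4" in toy_orthologs))  # blue
```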
Working example of Reactome Species Comparison Tool: Compare the murine predicted pathways with those of Homo sapiens to identify common reactions and pathways.
(iii) Select species ‘Mus musculus’ from the drop-down menu and click the ‘Apply’ button. After a few seconds, a table of results entitled ‘Species Comparison’ will appear.
(iv) Click the head of the column labeled ‘% in other species’. The table rows should reorder so that the pathways with the greatest overlap between mouse and human are at the top.
(v) Click the head of the column labeled ‘Pathway name’. The table rows should revert to alphabetical order by pathway name.
(vi) Scroll down the results page and click the ‘View’ button for the pathway ‘Metabolism of amino acids and derivatives’ to open the Pathway Browser in a new window. Make sure your browser is configured to allow pop-ups from Reactome.
(vii) In the top left-hand corner of the diagram there is an icon with four blue circles of different sizes, which lets the user choose a zoom level and scroll across the pathway diagram. Click the second-lowest circle to zoom out, and use the arrows to scroll around the pathway diagram. About two dozen yellow entities are visible; these are pathway entities conserved between the two species. The two blue entities represent proteins that are only found in Homo sapiens.
(viii) Mouse over one of the black entities (complexes), right-click and select ‘Display Participating Molecules’. A pop-up should appear containing a grid of colored squares, representing complex components common to both species.
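Steps (iv) and (v) can also be reproduced offline on the downloaded table. The sketch below assumes a tab-delimited export whose header row contains the ‘Pathway name’ and ‘% in other species’ columns, with percentages written either as plain numbers or with a trailing ‘%’; the exact layout of the real export may differ. The sample rows are placeholders, not Reactome results.

```python
import csv
import io


def read_table(tsv_text):
    """Parse a tab-delimited Species Comparison export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def percent(row, column="% in other species"):
    """Convert a cell such as '87' or '87%' to a float (empty cells count as 0)."""
    value = row[column].strip().rstrip("%").strip()
    return float(value) if value else 0.0


def by_overlap(rows):
    """Step (iv): greatest mouse-human overlap first."""
    return sorted(rows, key=percent, reverse=True)


def by_name(rows):
    """Step (v): alphabetical order by pathway name."""
    return sorted(rows, key=lambda row: row["Pathway name"].lower())


sample = (
    "Pathway name\t% in other species\n"
    "Pathway B\t40\n"
    "Pathway A\t95%\n"
    "Pathway C\t70\n"
)
rows = read_table(sample)
print([r["Pathway name"] for r in by_overlap(rows)])
# ['Pathway A', 'Pathway C', 'Pathway B']
```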
10 Using Reactome BioMart for data integration
BioMart 83 is a query-oriented data mining tool that can be used for rapid bulk querying, data integration and downloading of Reactome data. BioMart can link queries together, so that the results contain information from more than one database. For example, it is possible to find the ENSEMBL IDs associated with the genes in selected Reactome pathways by linking a Reactome query to an ENSEMBL query. The Reactome BioMart can be accessed via the Tools menu in the main navigation bar on all web pages (Fig. 7). Simple or complex queries can be created in two ways: using the preformatted Reactome queries at the top of the page, or using the regular BioMart query interface below the canned query selector.
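BioMart performs the link between a Reactome query and an ENSEMBL query on the server. As an illustration of what the link amounts to, the sketch below joins two already-downloaded result tables on a shared UniProt identifier column. The column names (`uniprot_id`, `pathway`, `ensembl_gene`) and the identifiers in the example are hypothetical placeholders.

```python
from collections import defaultdict


def link_results(left_rows, right_rows, key="uniprot_id"):
    """Inner-join two lists of result rows (dicts) on a shared key column.

    Every left row is combined with every right row carrying the same key,
    which is roughly what linking two BioMart queries produces.
    """
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    linked = []
    for row in left_rows:
        for match in index.get(row[key], []):
            linked.append({**row, **match})
    return linked


reactome_rows = [
    {"pathway": "Pathway A", "uniprot_id": "ID_1"},
    {"pathway": "Pathway B", "uniprot_id": "ID_2"},
]
ensembl_rows = [{"uniprot_id": "ID_1", "ensembl_gene": "GENE_X"}]
print(link_results(reactome_rows, ensembl_rows))
# [{'pathway': 'Pathway A', 'uniprot_id': 'ID_1', 'ensembl_gene': 'GENE_X'}]
```

Rows without a partner (here ‘Pathway B’) are dropped, as in an inner join.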
The small set of preformatted or canned queries can be used without needing to understand the details of the BioMart query interface. A canned query selector allows users to choose from one of the currently available queries, to find: (i) a list of pathways for specific species; (ii) a list of reactions for specific pathways; (iii) a list of proteins for specific pathways; (iv) a list of complexes for specific proteins; (v) a list of pathways for specific genes; (vi) a list of genes for specific pathways and (vii) a list of reactions for specific genes. The results are presented in a regular BioMart results page (Fig. 7) and can be exported as HTML, as tab-separated values (TSV) or as an Excel spreadsheet.
The regular BioMart query interface lets users define their own queries. Users control both how the data are ‘filtered’, which limits the records that are integrated, and the ‘attributes’, which correspond to the columns of data included in the results. Proteomics researchers can use Reactome BioMart in two ways: to query a Reactome data set on its own, or to link it to a data set from another database. Selecting a ‘database’ and a ‘data set’ initiates the regular query. In addition to the Reactome database, a number of other databases are available, currently UniProt, ENSEMBL and PRIDE. Reactome provides four data sets, ‘complex’, ‘interaction’, ‘pathway’ and ‘reaction’, that are accessible to BioMart queries. For example, select the ‘pathway’ data set if you would like to find all pathways associated with a given UniProt ID. The next step is to select the ‘Filters’ that restrict the query, e.g. ‘Limit to Species’ – Homo sapiens. If you do not use the species filter, the results will contain information from all species known to Reactome. The selected ‘Attributes’ define which data are displayed in the results.
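A regular query is fully described by its data set, filters and attributes. BioMart servers can accept the same information as an XML query document; the sketch below builds one with Python's standard library. The ‘pathway’ data set name comes from the text, but the internal filter and attribute names (`species_selection`, `pathway_name`, `uniprot_id`) are hypothetical placeholders: the real internal names would have to be taken from the Reactome BioMart configuration.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes, formatter="TSV"):
    """Return a BioMart-style XML query for a single data set.

    filters:    mapping of filter name -> value (restricts the records)
    attributes: sequence of attribute names (the result columns)
    """
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter=formatter,
        header="1",
        uniqueRows="1",
    )
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ET.tostring(query, encoding="unicode")


xml_query = build_query(
    "pathway",
    {"species_selection": "Homo sapiens"},  # hypothetical filter name
    ["pathway_name", "uniprot_id"],         # hypothetical attribute names
)
print(xml_query)
```

Filters and attributes map directly onto the two panels of the interface: the former narrow the rows, the latter choose the columns.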
The second ‘Data set’ link in the left-hand panel is used to choose another data set, providing the opportunity to integrate Reactome data with a data set from another database. For example, to find the evidence that supports the existence of the proteins associated with a set of pathways, select ‘pathway’ as the first data set, then select ‘UNIPROT (EBI UK) UNIPROT’ as the second. In the second data set, click on ‘Attributes’, expand the ‘Protein attributes’ category in the right panel by clicking the ‘+’ symbol, and select ‘Protein existence’ to include this attribute in the final results.
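Once a linked pathway/UniProt result has been downloaded, the ‘Protein existence’ column can be summarized per pathway, for instance to see how many proteins in each pathway have evidence at the protein level. The sketch assumes the result is available as rows with hypothetical column names `pathway_name`, `uniprot_id` and `protein_existence`; the category labels in the example follow UniProt's protein existence levels, and the rows themselves are placeholders.

```python
from collections import Counter, defaultdict


def existence_summary(rows, pathway_col="pathway_name",
                      protein_col="uniprot_id", existence_col="protein_existence"):
    """Count distinct proteins per pathway in each protein-existence category."""
    summary = defaultdict(Counter)
    seen = set()
    for row in rows:
        key = (row[pathway_col], row[protein_col])
        if key in seen:
            continue  # a protein may appear on several rows of a linked result
        seen.add(key)
        summary[row[pathway_col]][row[existence_col]] += 1
    return summary


rows = [
    {"pathway_name": "Pathway A", "uniprot_id": "ID_1",
     "protein_existence": "Evidence at protein level"},
    {"pathway_name": "Pathway A", "uniprot_id": "ID_1",
     "protein_existence": "Evidence at protein level"},
    {"pathway_name": "Pathway A", "uniprot_id": "ID_2",
     "protein_existence": "Evidence at transcript level"},
]
for pathway, counts in existence_summary(rows).items():
    print(pathway, dict(counts))
```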
Application programming interfaces (APIs) provide more flexible and interactive connections for automated data exchange between local programs and pathway databases. Reactome has a Perl-based API providing access to BioMart data sets. Perl and Java APIs and SOAP web services also give programmatic access to Reactome's MySQL database. Describing their functionality is beyond the scope of this tutorial, but documentation is available via the Reactome website download page (http://www.reactome.org/download/index.html).
Working example of Reactome BioMart: Query and extract Reactome protein and pathway annotations.
(ii) Under ‘Tools’ in the navigation bar, select ‘BioMart: query, link’.
(iii) Select ‘Find list of pathways for specific proteins’ from the ‘Canned query’ drop-down menu and click the ‘GO’ button. This will perform a preformatted BioMart query.
(iv) Click the ‘Show example’ button on the BioMart page and then click ‘Run query’. After a few seconds, a table of results will appear. Clicking the ‘Protein UniProt IDs’ and ‘Pathway IDs’ in the table links to the UniProt database and to Reactome pathway diagrams, respectively.
(v) Click the ‘New’ button towards the top of the page to reset the query submission page.
(vi) Choose the database ‘REACTOME’ from the ‘CHOOSE DATABASE’ drop-down menu of the regular BioMart query section (below the canned query). A new selector should appear, saying ‘CHOOSE DATASET’.
(vii) Click on the ‘CHOOSE DATASET’ selector. There should be four data sets: ‘complex’, ‘interaction’, ‘pathway’ and ‘reaction’. Select ‘reaction’.
(viii) Click on ‘Filters’ on the left-hand side of the page, and then, on the right-hand side of the page, select the ‘Homo sapiens’ filter from the ‘Limit to Species’ drop-down menu.
(ix) Click on ‘Attributes’ on the left-hand side of the page, and then, on the right-hand side of the page, select the attributes ‘Reaction name’, ‘Protein UniProt ID’ and ‘Protein name’.
(x) Click on the ‘Results’ button towards the top of the page.
(xi) Click on a ‘Reaction DB_ID’ link to visualize the corresponding Reactome reaction in the Pathway Browser.
(xii) Go back to the BioMart results table, and then click the ‘Go’ button above the results table. This will download the BioMart results as a TSV file.
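The TSV file downloaded in step (xii) can then be intersected with a list of proteins from a proteomics experiment. This minimal sketch assumes the file's header row uses the attribute display names selected in step (ix) (‘Reaction name’, ‘Protein UniProt ID’, ‘Protein name’); if the export uses different headers, the column names would need adjusting. The function name and sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def reactions_for_proteins(tsv_text, my_ids):
    """Map each reaction to the user-supplied UniProt IDs that take part in it."""
    wanted = {i.strip() for i in my_ids}
    hits = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        uniprot = row["Protein UniProt ID"].strip()
        if uniprot in wanted:
            hits[row["Reaction name"]].add(uniprot)
    return dict(hits)


sample = (
    "Reaction name\tProtein UniProt ID\tProtein name\n"
    "Reaction 1\tID_1\tProtein one\n"
    "Reaction 1\tID_2\tProtein two\n"
    "Reaction 2\tID_3\tProtein three\n"
)
print(reactions_for_proteins(sample, ["ID_1", "ID_3"]))
# {'Reaction 1': {'ID_1'}, 'Reaction 2': {'ID_3'}}
```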
11 Challenges and future directions for Reactome
Pathway databases such as Reactome have made important advances in recent years in data visualization and analysis. However, some challenges remain. Our current curation practices allow Reactome to capture, with a high degree of accuracy, pathway annotations encompassing many areas of normal and developmental biology. However, one major caveat of manually curated databases is their low coverage of physical entities. Reactome curators will continue to systematically annotate proteins, and we intend to extend our annotations to new signaling pathways and biological processes. Reactome already contains annotations of a few tissue-specific processes derived from annotations of generic processes. One area of focus for future curation is pathways associated with pathological and infectious diseases. Furthermore, pathway annotations in Reactome have concentrated on the properties and functions of proteins. However, a substantial part of the human genome is transcribed into non-coding RNAs, and these entities contribute to the regulation of signaling and other biological processes 84–86.
Pathway databases like Reactome must maintain and increase their commitment to collaboration and to integrating biochemical, biological, biophysical and chemical information through shared data exchange formats. Reactome has been exchanging data with a number of databases, including NCI-PID, and is currently working with WikiPathways to create a specific data exchange framework. We would welcome further links between other databases and Reactome, and the integration of Reactome pathway data into other bioinformatics resources. Reactome does not currently store information on enzyme kinetics or protein-binding affinities. Information on enzyme kinetics is highly dependent on experimental conditions, which would need to be described systematically to allow one-to-one comparisons. Reactome can, however, provide systems biologists with a reaction graph into which kinetic data from other sources could be integrated. There is a need for much more quantitative data, such as reaction kinetics, entity stoichiometry, molecule concentrations and other cell- or tissue-specific data. Reactome will continue to support the SBML and BioPAX formats, as they can accommodate these additional attributes.
The integration of pathway and interaction data has been a key element of the Reactome redevelopment. Only one drug interaction database (ChEMBL) currently provides a PSICQUIC web service; the rest are all protein–protein interaction data sets. Overlaying protein–small molecule data from resources such as PubChem or from proprietary sources may enable the identification of novel lead compounds. Reactome will need to maintain its molecular interaction interface as further such web services are deployed.
The Reactome data model, curation software tools, and data visualization and analysis tools have focused on pathways and reactions associated with human biology. We have previously worked with model organism groups, notably for rice, Arabidopsis thaliana 87, fruit fly (http://fly.reactome.org), other plants (Gramene) and M. tuberculosis, to build pathway databases on the Reactome model. One future goal is to create further manually curated model organism pathway databases.
We have developed a new, intuitive web interface to visualize and analyze pathway data and to promote integrated research on pathways. Further development will centre on evaluating its usability and functionality and refining it accordingly. New analysis tools could be developed to improve the visualization of expression data and to integrate data from other ‘omics’ databases, such as expression, protein localization or transcription factor data, and from clinical resources. The Reactome group will continue to develop and distribute open software and standard operating procedures for the management of pathway information in order to encourage standardization and reuse.
Development of the Reactome website, data model and data analysis tools described in this tutorial is the result of the concerted work of the Reactome curators and developers. We are also grateful to the many scientists who collaborated with us to build the Reactome pathway content. This work was supported by grants from the National Human Genome Research Institute at the National Institutes of Health (grant number P41 HG0037510) and the European Union 6th Framework Programme ‘ENFIN’ (grant number LSHG-CT-2005-518254).
The authors have declared no conflict of interests.