Mass spectrometry has commonly been used to aid in identifying proteins since 1959. Tandem mass spectrometry combines aspects of peptide mass fingerprinting and amino acid sequencing to increase the probability of matching an unknown protein sample to a known one. In tandem mass spectrometry, peptides with a particular mass-to-charge ratio are isolated and then fragmented by collision with a gas. The new peaks that result from the collisions correspond to fragments that contain different numbers of amino acids. With this information, it is possible to determine the amino acid sequence of the selected peptide. By conducting two-dimensional electrophoresis, selecting an isolated protein dot from the gel, and performing tandem MS on it, the selected protein can be identified by its partial peptide sequence. The simulation described here mimics this laboratory procedure. MS is used in many experiments, such as the identification of bacteria and of gel-separated proteins, and in teaching expression proteomics. A simulation of a mass spectrometer gives students without direct access to a tandem mass spectrometer the opportunity to complete a protein identification experiment and learn more about proteomics.
Computer simulations offer a low-cost, safe, information-rich supplement to hands-on experimentation in the classroom. Supplementing a lecture or reading-based course with simulations has been shown to improve information retention, practical skill use, and willingness to study. Student grades improve, as does students' self-assessed confidence in their acquired knowledge [6–8]. Fewer students have been shown to receive D's and F's in courses that use laboratory simulations, and students are more actively engaged in the learning process.
There are several ways to incorporate computer simulations into traditional teaching methods. A simulation of a laboratory experiment can provide a concrete visual example of an abstract and difficult-to-grasp concept. The burden of a high student-to-teacher ratio can be partially alleviated by allowing students to navigate through a simulation together. Simulations can be used on a day-by-day basis or can be incorporated into an overall course design, an approach that has been used to create effective distance learning courses.
2DE Tandem MS is a software program containing two simulations, 2D electrophoresis and tandem mass spectrometry, both designed for use in studying the proteomes of single-celled organisms. The program was originally designed for Windows; support has since been expanded to include Mac OS X, Linux, and the popular web browsers on all of those platforms. Early programming on this project was done in C++ and Java AWT, which are both platform dependent; the current approach is to develop applications using Java Swing, which in most cases leads to applications that operate on the most common operating systems and browsers. Using Java AWT components and absolute positioning in a graphical user interface (GUI) can lead to inconsistencies in display across platforms, so that a Windows user sees a different display than a Macintosh user. With a Java Swing GUI, 2DE Tandem MS looks the same across almost all platforms.
In this article, we explain the features and assumptions built into the simulation, describe the biochemical principles behind it, and provide suggestions for implementing it in the classroom or lab. The Supporting Information contains one additional figure and some suggested exercises at different levels of sophistication.
The 2-D electrophoresis simulation GUI consists of a gel canvas on the right and a control panel on the left (Fig. 1a). The gel canvas is where the protein separation is animated and explored. The control panel contains the buttons needed to load the desired proteome file, adjust the settings of the simulation, control the animations, and further interact with the protein separation. FASTA, GenBank, PDB, and custom e2d file formats are supported. An e2d file is generated each time a source file in one of the other three formats is loaded. It contains the same information as the source file, plus the isoelectric point (pI) and molecular weight of each protein, which speeds up subsequent loading. A pH range for the isoelectric focusing (IEF) portion of the animation can be selected from the default choices or entered manually. The percent acrylamide of the gel can also be set, as either a fixed or a gradient acrylamide concentration. The animation buttons can be used to play or pause the animation on the gel canvas, set the animations back to their initial stages, or reset the gel with no proteome file loaded. The IEF animation takes place across the top of the gel canvas and ends with colored bars, each representing proteins with similar pI values, in their proper positions according to the pH range chosen by the user (Fig. 1a, top). The pI value range of each colored bar is 1% of the full pH range for the IEF separation. Once the IEF animation has finished, the sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) animation can begin; the proteins that were initially sorted by pI along the user-specified pH range then migrate through the virtual gel and are sorted by molecular weight (Fig. 1a, bottom). Once the animation has stopped, the user can click on any of the protein dots to open the protein information box (Fig. 1b), which provides the title, molecular weight, pI value, and function of the protein, and has buttons that send the sequence for searching in Expasy BLAST, NCBI, or UniProt. There is also a button to send the protein's sequence to the tandem mass spectrometer simulation. Each protein dot is color coded for easier identification.

The user can further interact with and explore the separation using the control panel. The “display protein list” feature allows the user to select specific proteins for removal from the gel before running the animations again. The “search protein field” feature allows the user to find specific proteins by either including or excluding search terms from the loaded proteins' titles, functions, or sequences. Carrying out a search immediately clears all disqualified proteins from the gel canvas, but this can be undone using the reset button of the feature. The “generate HTML page” feature creates an HTML document containing all currently loaded proteins sorted by title, pI value, molecular weight, or function. The “record to CSV” feature creates a CSV file (a list of comma-separated values that can be opened in spreadsheet programs) listing the titles, sequences, pI values, molecular weights, and x and y coordinates on the gel canvas of the currently loaded proteins.
The tandem mass spectrometry simulation can be used alone or in conjunction with the 2-D electrophoresis simulation (Fig. 2). The GUI contains the buttons for operating the simulation on the left, a graph of the initial mass spectrometry output on the bottom, and a graph of the second mass spectrometry output on top. A protein sequence can be obtained from the 2-D electrophoresis simulation, entered manually, or loaded from a FASTA file (Fig. 2, box on the top left). The user can select from four different proteases to digest the peptide. The mass/charge range of the x-axis of both graphs can be adjusted or left at the default of 0–3000. The output from the first mass spectrometer in the tandem MS is displayed in the lower spectrum (Fig. 2, bottom right). The user can click on any of these peaks to find its exact mass and perform further fragmentation, the output of which is displayed in the top graph as B and Y fragment peaks with the residue letter designations displayed between adjacent peaks (Fig. 2, top right). The user can toggle whether they see only B fragment peaks or Y fragment peaks for easier reading of the protein sequence (for an explanation of B and Y fragments see the Biochemistry Principles section). The built-in help files provide step-by-step instructions on how to use the simulation, along with definitions and explanations.
2DE Tandem MS employs several principles of biochemistry to illustrate the properties of gel electrophoresis and peptide fragmentation realistically. Two-dimensional electrophoresis begins with IEF (where proteins are separated by isoelectric point) followed by SDS-PAGE (where they are separated by molecular weight). Before the IEF animation of the 2-D electrophoresis simulation, 2DE Tandem MS calculates the pI and MW of each protein and assigns each protein a final position in the pH range based on its pI value and the minimum and maximum pH set by the user (Fig. 1a, Choose pH box). A brief animation then demonstrates the separation of proteins along the pH gradient, with the bands of protein ending at their assigned positions. Before running the SDS-PAGE simulation, the user can select the percent acrylamide using the “Choose Acrylamide %” drop-down menu (Fig. 1a, middle left). At the user's prompting, the SDS-PAGE animation begins: the proteins enter the gel and then migrate toward the bottom of the screen. The percent acrylamide selected by the user determines how far each protein migrates down the gel canvas (Fig. 1a, bottom right).
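The placement of a protein along the IEF strip can be illustrated with a short sketch. It assumes a linear pH gradient with the low pH at the left edge, and it assumes that proteins whose pI falls outside the chosen range are simply not placed; neither detail is stated for 2DE Tandem MS. The function names and the `canvas_width` parameter are hypothetical. The 100-bar default mirrors the statement that each colored bar spans 1% of the pH range.

```python
def ief_fraction(pi, ph_min, ph_max):
    """Fractional position (0..1) of a pI within a linear pH gradient.

    Returns None when the pI lies outside the chosen range (an assumption
    about how out-of-range proteins are handled).
    """
    if not ph_min < ph_max:
        raise ValueError("ph_min must be lower than ph_max")
    if pi < ph_min or pi > ph_max:
        return None
    return (pi - ph_min) / (ph_max - ph_min)


def ief_bar_index(pi, ph_min, ph_max, n_bars=100):
    """0-based index of the bar (1% of the pH range each) holding this pI."""
    fraction = ief_fraction(pi, ph_min, ph_max)
    if fraction is None:
        return None
    # A pI exactly at ph_max belongs to the last bar, not to a 101st one.
    return min(int(fraction * n_bars), n_bars - 1)


def ief_x_position(pi, ph_min, ph_max, canvas_width):
    """Horizontal pixel position on a canvas of the given width."""
    fraction = ief_fraction(pi, ph_min, ph_max)
    return None if fraction is None else fraction * canvas_width


if __name__ == "__main__":
    for pi in (4.2, 6.5, 9.8, 11.0):
        print(pi, ief_bar_index(pi, 3.0, 10.0), ief_x_position(pi, 3.0, 10.0, 600))
```

Narrowing the pH range (for example from 3–10 to 4–7) spreads the same proteins over more bars, which is why a narrow range resolves proteins with similar pI values better.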
In an actual experiment, users pick spots from the 2D gels, extract the proteins, digest them, and submit them for sequencing by mass spectrometry. This process is replicated in the simulation, with the advantage that the virtual process always works. The protein information box (Fig. 1b) appears when a user clicks on an individual protein spot. Clicking on the “Run Mass Spectrum” button then transfers the protein sequence to the mass spectrometer simulation, which can be accessed by clicking on the Mass Spectrometer tab in the simulation. The tandem mass spectrometry simulation gives the user a choice of four different proteases to digest the proteins. The selection of the protease should be based on the primary sequence of the protein. The simulation uses the standard target sites for trypsin (cuts at the C-terminal side of Lys and Arg), chymotrypsin (C-terminal side of Phe, Tyr, Trp), proteinase K (C-terminal side of Ala, Phe, Ile, Leu, Val, Trp, Tyr), and thermolysin (N-terminal side of Ile, Leu, Met, Val) to generate the peptide fragments. The digested fragments undergo ionization and then enter the first mass spectrometer, where they are separated by mass-to-charge ratio. When a peak from the first mass spectrometer (Fig. 2, bottom right) is selected for further fragmentation, the digested peptide enters a collision chamber, where a chemical bond between a carbonyl and an amino group along its backbone is broken, leaving two fragments that each carry a single positive charge: a B fragment, where the carbonyl carbon is triple-bonded to the oxygen, and a Y fragment, where the amino group has two extra hydrogens. There are several other bonds on the peptide molecule that could be broken, but B and Y fragments are the most commonly seen in tandem MS experiments and the easiest to interpret.
The bombardment in tandem mass spectrometry is carefully controlled so that the bulk of the peptides are broken only once, allowing B and Y fragments of each possible size to be obtained from the original peptide. Our simulation replicates this process. Figure 3a indicates the cleavage points on the peptide backbone (ProThrGluGlyCysMetAsnLeu or PTEGCMNL), along with the B fragment (ProThrGluGly or PTEG, Fig. 3b) and Y fragment (CysMetAsnLeu or CMNL, Fig. 3c) that result from cleavage of the peptide bond between glycine and cysteine. The structures of all the B and Y fragments that appear in the tandem MS simulation (Fig. 2, top right) for this peptide can be found in Supporting Information Fig. S1. Because only one CONH bond is broken per peptide and each charged fragment differs in size from its neighbors by the mass of one residue, it is possible to use the mass difference between successive B-fragment peaks or Y-fragment peaks to “read” the residue sequence of the peptide selected from the output graph. The B-fragments are read left to right, and the Y-fragments are read right to left (Fig. 2, upper spectrum) to give the full sequence of the peptide.
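Reading a sequence from peak spacings can be sketched in a few lines. The example below is an illustration under stated assumptions, not the program's code: it uses monoisotopic residue masses and singly protonated fragments, and it assumes that every B and Y fragment from length 1 upward appears as a peak. Whether 2DE Tandem MS uses monoisotopic or average masses is not stated. The names `MONO`, `b_ions`, `y_ions`, and `read_ladder` are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def b_ions(peptide):
    """m/z of singly charged B fragments, shortest first (N-terminal pieces)."""
    masses, total = [], PROTON
    for residue in peptide:
        total += MONO[residue]
        masses.append(total)
    return masses


def y_ions(peptide):
    """m/z of singly charged Y fragments, shortest first (C-terminal pieces)."""
    masses, total = [], WATER + PROTON
    for residue in reversed(peptide):
        total += MONO[residue]
        masses.append(total)
    return masses


def read_ladder(peaks, tol=0.01):
    """Name the residue that accounts for each gap between adjacent peaks.

    Leu and Ile have the same mass, so that gap is reported as "I/L".
    A gap that matches no residue is reported as "?".
    """
    peaks = sorted(peaks)
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        matches = sorted(r for r, m in MONO.items() if abs(m - gap) <= tol)
        residues.append("/".join(matches) if matches else "?")
    return residues


if __name__ == "__main__":
    peptide = "PTEGCMNL"  # the example peptide from Fig. 3
    print("B ladder:", read_ladder(b_ions(peptide)))
    print("Y ladder:", read_ladder(y_ions(peptide)))
```

Read from low to high m/z, the B ladder yields the residues after the first one in N-to-C order, while the Y ladder yields the residues before the last one in C-to-N order, which is why the Y-fragments are read right to left.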
To build a lightweight application that could be easily downloaded over the Internet, some assumptions about the 2-D electrophoresis and mass spectrometry laboratory experiments were made, and certain details were omitted from the simulations.
The 2-D electrophoresis simulation assumes that all proteins in the given proteome are expressed at an equal level, that no post-translational modifications have occurred, and that all proteins are soluble. The pI calculations conducted by 2DE Tandem MS are based on fixed pKa values for the ionizable amino acid side chains and terminal groups in each peptide chain. The results of these calculations are comparable to those of the Expasy Compute pI/MW Tool. The mass spectrometry simulation omits any interference from the bulk solution, does not try to simulate the effects of noise in the output graphs, and omits fragmentation patterns that are not of the B or Y type. All protease digestions are assumed to be complete. Immonium ions are also omitted from the tandem mass spectrometry output graph to avoid confusion in the identification of B and Y fragments.
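The cleavage rules listed earlier, combined with the assumption of complete digestion, can be expressed as a small in-silico digest. This is a minimal sketch rather than the program's implementation: the `RULES` table and the `digest` function are hypothetical names, and no exceptions (such as suppressed cleavage next to proline) are applied because none are described for the simulation.

```python
# Protease -> (side of the target residue that is cut, target residues).
RULES = {
    "trypsin": ("C", set("KR")),
    "chymotrypsin": ("C", set("FYW")),
    "proteinase K": ("C", set("AFILVWY")),
    "thermolysin": ("N", set("ILMV")),
}


def digest(sequence, protease):
    """Cut at every target site (complete digestion) and return the peptides."""
    side, targets = RULES[protease]
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue not in targets:
            continue
        # C-terminal cut falls after the residue, N-terminal cut before it.
        cut = i + 1 if side == "C" else i
        # Ignore cuts at the very ends, which would give empty peptides.
        if start < cut < len(sequence):
            peptides.append(sequence[start:cut])
            start = cut
    peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    demo = "MKTAYRGL"  # arbitrary illustrative sequence
    for name in RULES:
        print(f"{name:13s}", digest(demo, name))
```

Because every site is cut, the peptides always concatenate back to the original sequence; a real digest would also contain missed cleavages.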
2DE Tandem MS can be downloaded freely from https://sourceforge.net/projects/jbf/ and comes with extensive help files, two highly interactive simulations, and two sample proteome files. After download, the 2DE_Tandem_MS.zip file must be unzipped and its files extracted to a folder (designated by the user) in the desired final location in the host computer's file structure. To begin running the program, navigate inside that folder and double-click the 2DE_Tandem_MS.jar file. The host computer must have Java installed to run 2DE Tandem MS, but any platform with Java installed can run it.
Obtaining New Proteome Files
The user can follow these steps to import additional proteomes.
2. Double-click on the “Complete proteome set” link for any bacterium.
3. Click on the orange “Download” button in the upper right-hand corner of the screen.
4. The browser will now display a list of file formats in which the complete proteome set can be downloaded. Click on the “Download” link in the FASTA format box for “Canonical sequence data in FASTA format.”
5. Save the downloaded FASTA file in the data folder of the 2DE_Tandem_MS file structure with the name of the organism followed by a file extension of .faa (e.g., Acinetobacter_baumannii.faa).
6. Close and restart 2DE Tandem MS if it is currently running.
7. Click the Add Proteins button. The new .faa file will now be accessible from the Load Protein Data File Dialog Box inside the simulations of 2DE Tandem MS.
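Before loading a newly saved file, it can be helpful to confirm that it parses as FASTA. The sketch below is a generic reader, not the parser used by 2DE Tandem MS. The function name `parse_fasta` is hypothetical and the two sample records are invented for illustration.

```python
def parse_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA text lines."""
    title, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


if __name__ == "__main__":
    sample = """>protein_1 example record
MKTAYIAKQR
QISFVKSHFS
>protein_2 another example
MSTNPKPQRK
"""
    for title, sequence in parse_fasta(sample.splitlines()):
        print(title, len(sequence))
```

To check a downloaded proteome, the same function can be given an open file handle, for example `open("data/Acinetobacter_baumannii.faa")`. A file that yields no records was probably saved in a format other than FASTA.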
Under the Hood
2DE Tandem MS uses several quantitative algorithms to guide its electrophoresis simulation. The pI value of each protein loaded into the electrophoresis simulation is calculated from pKa values in the literature and is used to determine where in the gel's pH range the protein settles. The calculation is carried out immediately after parsing the sequences of the proteins in the loaded file, and is conducted protein by protein. For each protein, a value named charge is initially set to 0, and a value named pH is initially set to 7. Each amino acid in the sequence is then evaluated as either an acid or a base, and, based on its side chain pKa value, a certain amount is either added to or subtracted from the running total of charge. Equations (1) and (2) display the calculations used to determine that amount.
(1) If the side chain is classified as an acid:
(2) If the side chain is classified as a base:
After every amino acid in the protein sequence has been evaluated and the C-terminus and N-terminus charges have been accounted for, the charge total is checked. If it is within 0.005 of 0, the calculations cease and the pH value that was in use is set as the protein's pI value. If the charge total is outside this range, a binary search algorithm resets the boundaries of low and high pH, and a new pH value is set as the mean of the new boundaries. At the beginning of the calculations, the low pH is defined as 0 and the high pH is defined as 14. Refer to Eqs. (3) and (4) for the binary algorithm.
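The procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the program's source. It assumes the standard Henderson-Hasselbalch fractional charges for the per-residue amounts and for the termini, and it assumes that a positive net charge raises the low pH boundary while a negative one lowers the high boundary. The pKa table is illustrative and may differ from the values used by 2DE Tandem MS. All names are hypothetical.

```python
# Illustrative pKa values (assumed for this sketch).
ACID_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
BASE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def acid_charge(pka, ph):
    """Charge of an acidic group: the deprotonated fraction, counted negative."""
    return -1.0 / (1.0 + 10 ** (pka - ph))


def base_charge(pka, ph):
    """Charge of a basic group: the protonated fraction, counted positive."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def net_charge(sequence, ph):
    """Running charge total over the termini and all ionizable side chains."""
    charge = base_charge(N_TERM_PKA, ph) + acid_charge(C_TERM_PKA, ph)
    for residue in sequence:
        if residue in ACID_PKA:
            charge += acid_charge(ACID_PKA[residue], ph)
        elif residue in BASE_PKA:
            charge += base_charge(BASE_PKA[residue], ph)
    return charge


def isoelectric_point(sequence, tol=0.005, max_iter=100):
    """Binary search for the pH at which the net charge is within tol of 0."""
    low, high, ph = 0.0, 14.0, 7.0
    for _ in range(max_iter):
        charge = net_charge(sequence, ph)
        if abs(charge) < tol:
            break
        if charge > 0:
            low = ph    # still positive: the pI lies at a higher pH
        else:
            high = ph   # negative: the pI lies at a lower pH
        ph = (low + high) / 2.0
    return ph


if __name__ == "__main__":
    for seq in ("DDDD", "PTEGCMNL", "KKKK"):
        print(seq, round(isoelectric_point(seq), 2))
```

Because the net charge falls steadily as the pH rises, halving the interval always closes in on the pI, and the 0.005 tolerance is typically reached within a few dozen iterations.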
To portray the protein separation accurately, representative “protein dots” move vertically down the gel canvas as the simulation progresses. The y position of each dot on the gel is calculated in iterations based on the percent acrylamide used in the gel, the molecular weight of the protein, and the position of the dot after the previous iteration. Equations (5) and (6) display the calculations used for each dot as it progresses down the gel canvas.
(5) If a percent acrylamide range for gradient separations is selected:
(6) If a fixed percent acrylamide is selected:
Using the 2DE Tandem MS application, students can explore the proteome of any single-cell organism that is available as a file in FASTA, GenBank, or PDB format. The application comes with two initial proteomes: Escherichia coli and Pseudomonas putida KT2440. In the Supporting Information section, a series of exercises is provided to help instructors introduce the 2DE Tandem MS simulation to students. The application is used most effectively in a computer classroom where it has already been installed on all the computers. As an alternative, students can be asked to install the application on their own computers as described above and bring them to class. Once students have the program running, we follow this pattern:
1. Load a proteome file, often the E. coli K12 FASTA file.
2. Select IEF conditions. Users can select from existing pH ranges (3–10 or 4–7) or select their own pH range (by simply entering two numbers separated by a hyphen).
3. Select SDS-PAGE conditions. Options are fixed concentrations ranging from 5% to 18%, or four different gradient concentration ranges (4–15%, 4–20%, 8–16%, 10–20%).
4. Run the IEF simulation.
5. Run the SDS-PAGE simulation.
6. Pick a spot and follow the links to databases to find more information (Fig. 1b).
7. Explore the Search options to look for proteins by title, function, or sequence.
8. Select one protein spot and transfer its sequence to the Tandem MS simulation.
9. Select a protease to digest the protein.
10. Run the first mass spectrometer to get the molecular weights of the fragments.
11. Run the second MS to get the sequence for the fragment.
The process is described in detail in the Supporting Information exercises and the help files.
The 2DE simulation was used in two different sections of the course Molecular Modeling and Proteomics at the Rochester Institute of Technology, which is required for bioinformatics majors and can be taken as an upper division elective by biochemistry and biotechnology majors. A total of 21 students were involved between the two sections. The simulation was used in conjunction with a wet lab experiment, the separation of serum proteins by 2DE. The use of the simulation gave students exposure to further analysis of the proteins following separation, something that was not available in the wet lab. The exercise included an exploration of the E. coli proteome in the simulation, a comparison of the pI and molecular weight values from the simulation with the values reported for the same proteins from E. coli on the Swiss 2D-PAGE web site [26; world-2dpage.expasy.org/swiss-2dpage/], and a comparison of homologous proteins from other organisms using the simulation. The exercise also included a brief survey containing four open-ended questions.
Students were asked to identify the most interesting finding they encountered in these exercises. The most common comment (14 of 21 students) was about the variation of pI values between species within the simulation and the difference in pI values between the simulation and the actual experimental results posted on Swiss-2DPAGE. Four of the students indicated that they enjoyed using the simulation, but one student preferred using BLAST and the Expasy calculator for pI and molecular weight to complete the assignment. Two students commented on the amount of data that could be accessed from the simulation.
Next we asked them to suggest additional capabilities for the simulation. Six students suggested improvements to the user interface and help files. Students also suggested that the search engine could be improved by adding features such as searching by gene identification number or molecular weight range.
In response to a question about problems with the simulation, 10 students indicated that the application had crashed on one or more occasions. Three reported problems with loading certain GenBank (*.gbk) files. Two complained that the files loaded too slowly.
Finally, we asked for suggested improvements or additions to the exercises that would make this lab period more interesting and informative. Six students complained that the original exercise (find and describe pI and MW for 10 proteins) was repetitive and tedious. Three students suggested creating more limited data sets so that the search process would yield results more quickly. Three students also suggested creating a detailed protocol or wizard to take users through the process the first time.
The suggestions made by the students informed the ongoing development of the project. The coding was converted from Java AWT to Java Swing, which dramatically reduced software failures in subsequent testing in Windows, Macintosh OS X, and Linux. Their suggestions for explaining the application were incorporated into the Help files for the program and their comments on the exercises were considered in designing the exercises attached as Supporting Information.
The 2-D electrophoresis and tandem mass spectrometry simulations are stable and ready for classroom application, but new features are planned for both. The 2-D electrophoresis simulation will be expanded to allow the user to load a second protein file and compare the two proteomes on the gel canvas, using color-coded dots to indicate homologous proteins and proteins unique to either of the two organisms. The user will be able to define homology based on percent similarity. The tandem mass spectrometry simulation will incorporate images into its output graph, allowing students to view chemical drawings of the peptide sequence they have identified.
In addition to expanding the existing simulations, more laboratory experiments are being explored for possible incorporation into the collection of simulations. A 1-D electrophoresis simulation and an ion-exchange simulation are both under construction, and a reversed-phase chromatography simulation is also planned. As they are developed, these resources will be released on the SourceForge site (https://sourceforge.net/projects/jbf/) or can be obtained by contacting the corresponding author (paul. email@example.com).