UNIT 1.3 Searching NCBI Databases Using Entrez

  1. Gretchen Gibney,
  2. Andreas D. Baxevanis

Published Online: 1 JUN 2011

DOI: 10.1002/0471250953.bi0103s34

Current Protocols in Bioinformatics

How to Cite

Gibney, G. and Baxevanis, A. D. 2011. Searching NCBI Databases Using Entrez. Current Protocols in Bioinformatics. 34:1.3:1.3.1–1.3.25.

Author Information

  1. Bethesda, Maryland

Publication History

  1. Published Online: 1 JUN 2011
  2. Published Print: JUN 2011

Figure 1.3.1. The Entrez unified results page, showing the number of hits to each of Entrez's component databases fitting the query. Clicking on any of the numbers to the left of the database name takes the user to the results found in that particular database.

Figure 1.3.2. Results of a text-based Entrez query using Boolean operators against PubMed. The initial query (from Fig. 1.3.1) is shown in the search box near the top of the window. Each entry gives the title of the paper, names of the authors, and the citation information. An individual record can be viewed by clicking on the hyperlinked title of that paper.

Figure 1.3.3. An example of a PubMed record in Abstract format, as returned through Entrez. This Abstract view is for the fourth reference shown in Figure 1.3.2. The view provides connections to related articles, sequence information, and the actual, full-text journal article. See text for details.

Figure 1.3.4. Related citations for an entry found in PubMed. The original entry from Figure 1.3.3 (Cho et al., 1994) is at the top of the list, indicating that this is the parent entry.

Figure 1.3.5. The Entrez Gene page for the DCC (deleted in colorectal carcinoma) gene. The screen shows that this is a protein-coding gene and provides information on the genomic context of DCC and the encoded protein. An extensive collection of links to other NCBI and external databases is provided along the right-hand side of the window. See text for details.

Figure 1.3.6. The dbSNP GeneView page for the DCC gene. The information on individual SNPs is shown in the table towards the bottom of the screen. Each SNP occupies two lines of the table, with one line showing the “contig reference” (the more common allele) and the other showing the SNP (the less common allele). For example, the first two rows in the table show a contig reference A for which there is a documented SNP, changing the A to a G. At the protein level, this changes the amino acid at position 3 of the DCC protein from asparagine to serine. The rows are colored red since this is a “nonsynonymous SNP;” that is, the SNP produces a discrete change at the amino acid level. In contrast, the fifth and sixth rows of the table are shown in green, indicating that this record is for a “synonymous SNP;” the entries describe a SNP where the contig reference (T) and the SNP allele (C) ultimately produce the same amino acid (Asp).
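
The red/green distinction described in this caption can be sketched in a few lines of Python. This is only an illustration, not how dbSNP computes its annotations: the caption gives the alleles and the resulting amino acids but not the codons, so the codons used here (AAT, GAT) and the helper name `classify_snp` are assumptions chosen to be consistent with the caption, and the codon table is deliberately partial.

```python
# Partial codon table: only the codons needed for this illustration.
# The specific codons are assumptions; the caption names only the
# alleles (A>G, T>C) and the amino acids (Asn, Ser, Asp).
CODON_TABLE = {
    "AAT": "Asn", "AAC": "Asn",
    "AGT": "Ser", "AGC": "Ser",
    "GAT": "Asp", "GAC": "Asp",
}


def classify_snp(ref_codon, position, alt_base):
    """Classify a single-base change within one codon.

    position is the 0-based index of the changed base in the codon.
    Returns (reference amino acid, variant amino acid, label).
    """
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, label


# A>G at the second base of an assumed AAT codon: Asn becomes Ser.
print(classify_snp("AAT", 1, "G"))
# T>C at the third base of an assumed GAT codon: Asp stays Asp.
print(classify_snp("GAT", 2, "C"))
```

The first call corresponds to the rows shown in red in the figure and the second to the rows shown in green.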

Figure 1.3.7. The RefSeq protein entry corresponding to the original Cho et al. (1994) publication shown in Figure 1.3.2, in GenPept format. See text for details.

Figure 1.3.8. The OMIM entry for the DCC gene. Each entry includes information such as the gene symbol, alternate names for the disease, a description of the disease, a clinical synopsis, and references.

Figure 1.3.9. An example of a list of allelic variants that can be obtained through OMIM. The figure shows the four allelic variants for the DCC gene, two leading to cancers of the digestive tract and two that are associated with a movement disorder. The description under each allelic variant provides information specific to that particular mutation.

Figure 1.3.10. Gene Expression Omnibus (GEO) DataSets for the DCC gene. For each DataSet, a brief description of the experiment is provided, as well as a schematic of the gene expression profile derived in the study.

Figure 1.3.11. The MedlinePlus page devoted to information for both laymen and physicians on DCC and disorders related to DCC. The information available through this page is often much more appropriate to provide to patients, since the level of writing is geared towards nonprofessionals. Often, MedlinePlus entries include interactive tutorials for various procedures related to the disease of interest.

Figure 1.3.12. The ClinicalTrials.gov page showing actively recruiting clinical trials relating to colorectal neoplasms. Information on each trial, including the principal investigator of the trial and qualification criteria, can be found by clicking on the name of the trial.

Figure 1.3.13. Searches saved through My NCBI can be recalled, viewed, and updated through the Saved Searches option under My Saved Data on the user's My NCBI page. See text for details.

Figure 1.3.14. Formulating a search against the nucleotide portion of Entrez. The initial query is shown in the text box near the top of the window (DNA-binding), and the nucleotide entries matching the query are displayed below. See text for details.

Figure 1.3.15. Using the Limits feature of Entrez to limit a search to a particular organism. See text for details.

Figure 1.3.16. Results of a limited search against the nucleotide portion of Entrez. The initial query is shown in the text box near the top of the window (methanothermobacter), and the nucleotide entries matching the query are displayed below. Note the caution (!) icon next to the words Limits Activated at the top of the results page, indicating that the results displayed have been “limited,” here to a particular organism (Fig. 1.3.15). See text for details.
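
An organism restriction like the one applied through Limits can also be written directly into the query text using an Entrez field tag. The sketch below is a minimal illustration of that idea; the helper name `limit_to_organism` is hypothetical, and it is an assumption that the resulting string retrieves the same records as the Limits setting shown in the figures.

```python
def limit_to_organism(term, organism):
    """Build an Entrez query string restricted to one organism.

    The organism name is quoted so that multi-word names are treated
    as a single phrase, and tagged with the [Organism] field.
    """
    if not term.strip() or not organism.strip():
        raise ValueError("term and organism must be non-empty")
    return f'({term.strip()}) AND "{organism.strip()}"[Organism]'


# Example using the query term from Figure 1.3.14 and the organism
# named in this caption.
print(limit_to_organism("DNA-binding", "Methanothermobacter"))
```

Typing the restriction into the search box keeps it visible in the query itself, whereas a limit set through the Limits page stays active until it is removed.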

Figure 1.3.17. Combining individual queries using the Advanced Search feature of Entrez. Each search performed in the last 8 hr is saved and given a number in Search History. The searches can be combined using the search numbers and the Boolean operators AND, OR, or NOT. See text for details.
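
The effect of combining numbered searches can be modeled with ordinary set operations. The following is a rough sketch of the semantics only, not Entrez's implementation: the function names are hypothetical, and the record identifiers in the toy history are made up for illustration.

```python
from functools import reduce

# Each Boolean operator maps onto one set operation.
OPERATORS = {
    "AND": set.intersection,  # records found by every search
    "OR": set.union,          # records found by any search
    "NOT": set.difference,    # records in the first search but not the rest
}


def combine_expression(numbers, operator):
    """Build the text typed into the search box, e.g. '#17 AND #18'."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(f"#{n}" for n in numbers)


def combine_results(history, numbers, operator):
    """Combine the result sets of numbered searches, left to right."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return reduce(OPERATORS[operator], (history[n] for n in numbers))


# Toy search history: search number -> set of made-up record IDs.
history = {17: {"rec1", "rec2", "rec3"}, 18: {"rec2", "rec3", "rec4"}}

print(combine_expression([17, 18], "AND"))
print(sorted(combine_results(history, [17, 18], "AND")))
```

With this toy history, AND keeps only the records shared by both searches, OR keeps all four records, and NOT keeps the records unique to search #17.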

Figure 1.3.18. Entries resulting from the combination of two individual Entrez queries. The query term producing the results is shown in the Search Box near the top of the window (#17 AND #18). The numbers correspond to those assigned to the previously performed searches listed in Figure 1.3.17. See text for details.

Figure 1.3.19. The structure summary for 1HMF, resulting from a direct query of the structures accessible through the Entrez system. The entry shows header information from the corresponding MMDB entry, links to PubMed, and links to the taxonomy of the source organism. Structure neighbors, as assessed by VAST, can be found by clicking on the long bar (purple on screen) next to the Protein key. The structure itself can be viewed by clicking on the Structure View in Cn3D button, thereby spawning the Cn3D viewer.

Figure 1.3.20. The structure of 1HMF rendered using Cn3D version 4.1, an interactive molecular viewer. Cn3D can be used as a helper application to any Web browser or as a stand-alone application. In panel A, the backbone of the structure is shown as a worm, with the coloring indicating secondary structural regions; in this case, there are three α-helices, shown in green, with a “crayon” indicating the length and directionality of the helix. Four residues have been highlighted in the sequence window, and those residues are shown in yellow in the structure window. In panel B, the rendering of the structure has been changed, showing the structure in space-filling style, with the coloring being done by charge (red, negative; blue, positive). For both panels, the coloring shown in the structure window is mirrored in the sequence window below. See text for details.

Figure 1.3.21. Changing the rendering and coloring of selected parts of a structure. The Style Options window also allows individual residues to be numbered and the dimensions of side chains and other features to be changed. See text for details.

Figure 1.3.22. An overview of the relationships in the Entrez integrated information retrieval system. Each node represents one of the elements that can be accessed through Entrez, and the lines represent how each component database connects to the others. Entrez is under continuous evolution, with new components being added and the interrelationships between the elements changing dynamically. (Figure from The Entrez Search and Retrieval System, The NCBI Handbook; see Internet Resources.) A Flash-based version of this figure can be found at http://www.ncbi.nlm.nih.gov/Database/datamodel/index.html.