Liver cancer is one of the most commonly diagnosed cancers worldwide and one of the most frequent causes of cancer-related death [Jemal et al., 2011]. Hepatocellular carcinoma (HCC) has a high mortality rate and is the major histological subtype among primary liver cancers. To date, the detailed pathogenesis of HCC is still under investigation, and clinical therapies for this disease are limited and mostly inefficient [Farazi and DePinho, 2006; Llovet et al., 2003; Thorgeirsson and Grisham, 2002; Whittaker et al., 2010]. Among the problems faced by researchers are the differing susceptibility to HCC worldwide [El-Serag and Rudolph, 2007; Kew, 2002], the dissimilar responses of individuals to common therapies [Worns and Galle, 2010], and the diverse survival and recurrence rates in different populations [Artinyan et al., 2010]. This diversity is partially due to human genetic variations such as mutations, polymorphisms, and mtDNA copy number [Kirk et al., 2005; Lee et al., 2004; Sheen et al., 2003; Wong et al., 2000; Wu et al., 2007; Yamada et al., 2006]. In the several decades since the association between human genetic variations and HCC was first investigated, no integrated database has been available. To obtain a systematic view of the current research status of this association, and to accelerate the search for and identification of new HCC genetic markers, we collected the studies published to date from PubMed, extracted and sorted the data, and constructed dbHCCvar, an online database of the human genetic variations investigated in HCC. Each entry in dbHCCvar records the detailed information and major research data of an eligible study, and also provides links to two frequently used databases, NCBI and UniProt.
One of the major challenges of managing HCC is achieving more individualized treatment [Rahbari et al., 2011]. At present, HCC epidemiologic data are all fragmented and have been obtained from diverse populations [Venook et al., 2010]. Although several HCC databases have already been published [He et al., 2010; Liang et al., 2002; Su et al., 2007], these databases mainly focus on the HCC proteome, abnormally expressed genes, and the HCC network. However, to date, a comprehensive database providing integrated epidemiologic data of genetic variations in HCC is unavailable. Therefore, it is expected that dbHCCvar will serve as a valuable platform that will allow researchers to efficiently obtain information about specific human genetic variations in this disease. We hope that the dbHCCvar database will accelerate the identification of new human genetic markers for HCC, and promote the development of preventative therapies and individualized treatment for HCC.
MATERIALS AND METHODS
Text Mining and Screening
In this study, we searched PubMed (http://www.ncbi.nlm.nih.gov/sites/pubmed) for articles published before June 20, 2011, using 408 combinations of the keywords: “HCC”/“hepatic carcinoma”/“hepatocarcinoma”/“hepatocellular carcinoma”/“hepatoma”/“liver cancer,” “polymorphism”/“SNP”/“mutation”/“mitochondria,” and “risk”/“susceptibility”/“clinicopathological”/“progression”/“development”/“treatment”/“toxicity”/“radiotherapy”/“chemotherapy”/“drug”/“reaction”/“response”/“outcome”/“efficacy”/“survival”/“recurrence”/“prognosis.” The initial collection of search results was further screened. Only articles written in English that tested the association between human genetic variations (mutations and polymorphisms in both nuclear DNA and mtDNA) and HCC (risk, clinical pathology, drug reaction, survival, or recurrence) with sample sizes larger than 30 were included. Reviews, articles investigating other liver diseases, and studies lacking statistical analyses were excluded. After screening, a total of 183 research articles were used for database construction.
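The count of 408 keyword combinations follows from taking one term from each of the three groups above (6 × 4 × 17). The following is a minimal sketch of how such a query list could be enumerated. The function name `build_queries` and the use of `AND` to join the terms are assumptions made for illustration; the exact query syntax used for the PubMed search is not stated here.

```python
from itertools import product

# Keyword groups as listed in the text.
DISEASE_TERMS = [
    "HCC", "hepatic carcinoma", "hepatocarcinoma",
    "hepatocellular carcinoma", "hepatoma", "liver cancer",
]
VARIATION_TERMS = ["polymorphism", "SNP", "mutation", "mitochondria"]
OUTCOME_TERMS = [
    "risk", "susceptibility", "clinicopathological", "progression",
    "development", "treatment", "toxicity", "radiotherapy",
    "chemotherapy", "drug", "reaction", "response", "outcome",
    "efficacy", "survival", "recurrence", "prognosis",
]


def build_queries(disease_terms, variation_terms, outcome_terms):
    """Return one query string per (disease, variation, outcome) triple.

    Joining the quoted terms with AND is an assumption, not the
    documented search syntax of the study.
    """
    return [
        f'"{d}" AND "{v}" AND "{o}"'
        for d, v, o in product(disease_terms, variation_terms, outcome_terms)
    ]


queries = build_queries(DISEASE_TERMS, VARIATION_TERMS, OUTCOME_TERMS)
print(len(queries))  # 408
```

Enumerating the combinations programmatically makes it easy to confirm that no keyword triple is skipped or repeated.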
Data were independently extracted from each eligible paper by two investigators and comprised the following: journal, title, genes investigated, genetic variations investigated, detailed information on the variations investigated, samples investigated, sample origin, information on additional factors (cirrhosis, HBV, HCV, aflatoxin, alcohol, and smoking), and statistical analyses of the association between genetic variations and HCC susceptibility, progression, survival, and recurrence.
Genetic variations are recorded in dbHCCvar using the notation of the original publication. For consistency, we additionally converted the originally reported notation according to the standard mutation nomenclature recommendations [den Dunnen and Antonarakis, 2000]. In this study, human genetic variations were grouped as either mutations or polymorphisms. Abnormal DNA sequence variants were listed as mutations, and DNA sequence variants that are common in the population were classified as polymorphisms. The cutoff point between a mutation and a polymorphism was 1%: if the second most frequent allele of a variation had a frequency of 1% or more in the general population, the variation was classified as a polymorphism.
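The 1% rule can be written as a small function. This is a minimal sketch under stated assumptions: the function name `classify_variation`, its input format (a list of population allele frequencies as fractions), and the treatment of a variation with only one known allele as a mutation are illustrative choices, not part of the dbHCCvar implementation.

```python
POLYMORPHISM_CUTOFF = 0.01  # the 1% cutoff described in the text


def classify_variation(allele_frequencies):
    """Classify a variation from its population allele frequencies.

    The variation is a polymorphism if the second most frequent allele
    reaches the cutoff, and a mutation otherwise. A single known allele
    is treated as a second-allele frequency of 0 (an assumption).
    """
    ranked = sorted(allele_frequencies, reverse=True)
    second_most_frequent = ranked[1] if len(ranked) > 1 else 0.0
    if second_most_frequent >= POLYMORPHISM_CUTOFF:
        return "polymorphism"
    return "mutation"


print(classify_variation([0.97, 0.03]))    # polymorphism
print(classify_variation([0.995, 0.005]))  # mutation
```

Note that a frequency of exactly 1% counts as a polymorphism, matching the "1% or more" wording.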
Additional Information Collected
The gene symbol, gene ID, full name, aliases, and UniProt ID of the genes with variations included in dbHCCvar were obtained from NCBI [Maglott et al., 2011] and UniProt [UniProt Consortium, 2010]. For polymorphisms included in dbHCCvar whose rs number was not mentioned in the original article, the rs number was looked up in other related articles and in the NCBI SNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/).
All data were first compiled in an Excel file and then imported into Microsoft SQL Server as a database. The web front end was implemented using ASP (Active Server Page), an integral part of the Microsoft IIS (Internet Information Server) web server, running on Windows XP. Data updates will be carried out in two ways. The first is to use myNCBI (http://www.ncbi.nlm.nih.gov/sites/myncbi/) to periodically add the latest eligible reports to dbHCCvar. The second is guest uploading followed by administrator verification. All updated data will be shown on the “What's new” page of dbHCCvar.
dbHCCvar is freely accessible at http://GenetMed.fudan.edu.cn/dbHCCvar. Visitors who have difficulty accessing the site can contact the database administrator by email at email@example.com; the administrator will reply within two working days.
As of June 20, 2011, 666 entries referring to 513 unique human genetic variations had been listed in dbHCCvar. The detailed content of an entry in dbHCCvar is shown in Figure 1. Each entry stores detailed information on a genetic variation and its association with HCC as reported by one reference. Where the reference reported the influence on HCC of the variation combined with additional factors, this was also recorded.
The dbHCCvar database website includes nine modules: homepage, function, abbreviations, what's new, contact us, acknowledgments, links, statistics, and user guide.
(1) Homepage gives a brief description of the dbHCCvar database.
(2) Function page is made up of four subpages: the search page (Fig. 1), the browse page, the download page, and the upload page (Fig. 2). In the search subpage, users can perform a quick search by entering an NCBI gene ID, a UniProt ID, a gene name or alias, an rs number in NCBI, or a variation in HGV format. The search subpage also provides four search options: search by variant type, by ethnicity, by additional factors, and by phenotype. Using one or more of these options, users can conveniently retrieve the entries for the genetic variations they are interested in. In the browse subpage, users can skim through all entries in dbHCCvar and gain a general impression of the human genetic variations that have been investigated to date and their association with HCC. In the download subpage, users can download the latest version of the database as a zip file. In the upload subpage, authors of new publications are welcome to upload general information and major research data from their latest studies to dbHCCvar. After verification by the database administrator within three working days, an eligible uploaded study will be formally added to dbHCCvar.
(3) Abbreviations page lists all abbreviations used in this database and their corresponding full names.
(4) “What's new” page presents the latest publications that investigated the association between human genetic variations and HCC. These include publications added directly by the database administrator through periodic searching of PubMed, as well as publications uploaded by authors and verified by the administrator.
(5) “Contact us” page shows contact information. Users are welcome to contact us with biological or technical questions, suggestions, and error corrections.
(6) Acknowledgments page lists the researchers who provided valuable advice and help during database construction and manuscript preparation. Financial support is also acknowledged.
(7) Links page provides links to a series of frequently used general databases and to HCC-related databases that users may be interested in.
(8) Statistics page presents all statistical data of dbHCCvar, including the number of papers, the number of unique genetic variations, and the number of genes, as well as other statistical information.
(9) “User guide” page concisely introduces the functions of this database listed above and helps visitors use the database.
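The four search options described for the search subpage (variant type, ethnicity, additional factors, and phenotype) can be combined, with each supplied option narrowing the result set. The following is a minimal sketch of that filtering logic. The `Entry` fields, the `search` function, and the sample records (which use placeholder gene names and invented values) are hypothetical and do not reflect the actual dbHCCvar schema, server code, or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical fields; the real dbHCCvar schema is not described here.
    gene_symbol: str
    variant_type: str  # "mutation" or "polymorphism"
    ethnicity: str
    phenotype: str     # e.g. "susceptibility", "progression", "survival"
    additional_factors: frozenset = frozenset()


def search(entries, variant_type=None, ethnicity=None,
           additional_factor=None, phenotype=None):
    """Return entries matching every option that was supplied.

    Options left as None are ignored, so one or more can be combined.
    """
    results = []
    for entry in entries:
        if variant_type is not None and entry.variant_type != variant_type:
            continue
        if ethnicity is not None and entry.ethnicity != ethnicity:
            continue
        if (additional_factor is not None
                and additional_factor not in entry.additional_factors):
            continue
        if phenotype is not None and entry.phenotype != phenotype:
            continue
        results.append(entry)
    return results


# Placeholder records for illustration only (not real dbHCCvar data).
sample_entries = [
    Entry("GENE_A", "mutation", "Asian", "susceptibility",
          frozenset({"HBV", "aflatoxin"})),
    Entry("GENE_B", "polymorphism", "Asian", "susceptibility",
          frozenset({"HBV"})),
    Entry("GENE_B", "polymorphism", "Caucasian", "survival",
          frozenset({"alcohol"})),
]

hits = search(sample_entries, variant_type="polymorphism",
              additional_factor="HBV")
print([e.gene_symbol for e in hits])  # ['GENE_B']
```

Treating unset options as "no constraint" mirrors the described behavior that users may apply one or more of the four options.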
The content of the dbHCCvar database is shown in Figure 3. As of June 20, 2011, there were 666 entries in dbHCCvar. These entries, drawn from 183 publications, include information on 513 unique human genetic variations (35.3% mutations distributed in 10 genes, 64.7% polymorphisms distributed in 151 genes) among 158 genes. The mutations recorded in dbHCCvar comprise 99.4% somatic mutations and 0.6% germline mutations. Of the total 666 entries, 532 contain data on association with HCC susceptibility, 133 provide data on association with HCC progression, and 83 supply data on association with HCC survival or recurrence.
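The figures above can be cross-checked with simple arithmetic. The sketch below derives approximate counts from the reported percentages and shows why the per-gene and per-phenotype numbers exceed their totals. It assumes that the 158 genes are the union of the genes carrying mutations and those carrying polymorphisms; the derived values are rounded estimates, not figures reported by the database.

```python
TOTAL_VARIATIONS = 513
TOTAL_ENTRIES = 666

# Approximate counts implied by the reported percentages.
mutations = round(0.353 * TOTAL_VARIATIONS)
polymorphisms = round(0.647 * TOTAL_VARIATIONS)
print(mutations, polymorphisms)  # 181 332

# 10 genes with mutations + 151 genes with polymorphisms exceeds 158,
# so some genes carry both kinds of variation (inclusion-exclusion,
# assuming 158 is the union of the two gene sets).
genes_with_both = 10 + 151 - 158
print(genes_with_both)  # 3

# The phenotype counts sum to more than the number of entries, so some
# entries report on more than one phenotype category.
extra_assignments = (532 + 133 + 83) - TOTAL_ENTRIES
print(extra_assignments)  # 82
```

The two overlaps explain the apparent discrepancies: the gene counts and phenotype counts are not mutually exclusive partitions of the totals.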
In dbHCCvar, TP53 genetic variations (130 mutations and four polymorphisms) were the most frequently recorded (145 entries in total). With regard to polymorphisms, GSTM1 polymorphisms (two polymorphisms) were reported most frequently in the publications used for dbHCCvar construction (17 articles). It is not surprising that TP53 mutations and GSTM1 polymorphisms are the most frequently studied genetic variations. TP53 mutations were found to occur frequently in human cancer about two decades ago [Hussain et al., 2007; Levine and Oren, 2009], and the association between GSTM1 polymorphism and HCC began to attract attention more than 15 years ago [London et al., 1995; Wang et al., 2010; White et al., 2008].
As of June 20, 2011, dbHCCvar contains a total of 666 entries. Although records of polymorphisms are distributed across 151 genes, entries on mutations are fewer and mostly concern TP53. Considering the important role that mutations play in the development of cancer, including HCC [Frank, 2003; Frank and Nowak, 2004; Wong and Ng, 2008], more statistical investigations of the association between various types of mutations and HCC are needed. During data collection and database construction, we found few reports investigating the association between genetic variations and HCC progression, drug response, survival, or recurrence. These therapy-related parameters have been reported to vary among individuals [Artinyan et al., 2010; Huynh, 2010; Tanaka and Arii, 2010; Worns and Galle, 2010]. Thus, it is important and urgent for researchers to accelerate investigation in these areas.
dbHCCvar is easy to operate and has several user-oriented features. First, its search function provides multiple options: search by query (NCBI gene ID, UniProt ID, gene symbol or alias, rs number in NCBI, or variation in HGV format), by variant type, by ethnicity, by additional factors, or by phenotype. Second, dbHCCvar provides a browse and download service for the entire dataset. Researchers who would like an overview of the human genetic variations investigated in HCC can access all data in dbHCCvar by simply pressing the “browse” or “download” button on the homepage of the dbHCCvar website. Third, timely data updating is another important feature. To add newly reported data promptly, dbHCCvar uses two update approaches: periodic updates carried out by the database administrator, and an option for authors of new publications to upload their latest data to the database. Together, these approaches help keep dbHCCvar up to date. Furthermore, information about additional factors (cirrhosis, HBV, HCV, aflatoxin, alcohol, and smoking) has been collected and recorded for every entry of dbHCCvar, making systematic analysis of multiple HCC-related factors more convenient. In addition, links to two common biological databases (NCBI and UniProt) have been placed in every entry of dbHCCvar, facilitating cross-database searches. These features make dbHCCvar convenient for its users.
DISCUSSION AND PROSPECTS
HCC is the major histological subtype among primary liver cancers, and is one of the most common cancers with high mortality worldwide. To date, several HCC databases have been constructed [He et al., 2010; Liang et al., 2002; Su et al., 2007], which mainly focus on the HCC proteome, abnormally expressed genes, and the HCC network. However, a database with integrated epidemiological data of genetic variations in HCC is not yet available. To help accelerate the identification of new genetic markers for HCC, and to promote individualized treatment, we constructed dbHCCvar, a database that integrates the currently fragmented epidemiological data of genetic variations in HCC.
At present, investigations and reports on the association between different human genetic variations and HCC risk, progression, survival, and recurrence are appearing at a fast pace. More and more new human mutations and polymorphisms are being studied in order to identify new biomarkers for HCC prevention, diagnosis, and prognosis. The dbHCCvar database administrator will continue to track new eligible reports in PubMed and periodically update the database. In addition, dbHCCvar encourages authors of new eligible reports to upload their data to this database. With these two update routes, dbHCCvar is intended to provide current information to its users.
In this study, we constructed dbHCCvar, a database of human genetic variations that have been investigated for their association with HCC (risk, clinical pathology, drug reaction, survival, or recurrence). Through free access and web browsing, users of this database can quickly gain a general understanding of the current research status of the relationship between human genetic variations and HCC, including whether the relationship between a particular human genetic variation and HCC has been previously investigated, and whether previous studies have consistent or conflicting results. In addition, diverse search options are supplied in dbHCCvar, making it convenient for users to obtain information on one or more specific variants. The functions and information that dbHCCvar provides should increase the efficiency of research on HCC-related human genetic variations, thus accelerating the identification of useful HCC genetic markers. We hope that dbHCCvar will be a useful tool for researchers in pertinent fields, and will benefit the prevention and therapy of HCC.