Herbal hepatotoxicity: a critical review


  • Rolf Teschke,

    Corresponding author
    • Department of Internal Medicine II, Division of Gastroenterology and Hepatology, Klinikum Hanau, Academic Teaching Hospital of the Medical Faculty of the Goethe University Frankfurt/Main, Frankfurt Main, Germany
  • Christian Frenzel,

    1. Department of Medicine I, University Medical Center Hamburg Eppendorf, Hamburg, Germany
  • Xaver Glass,

    1. Office of the Dean, Medical Faculty of the Goethe University Frankfurt/Main, Frankfurt Main, Germany
  • Johannes Schulze,

    1. Office of the Dean, Medical Faculty of the Goethe University Frankfurt/Main, Frankfurt Main, Germany
  • Axel Eickhoff

    1. Department of Internal Medicine II, Division of Gastroenterology and Hepatology, Klinikum Hanau, Academic Teaching Hospital of the Medical Faculty of the Goethe University Frankfurt/Main, Frankfurt Main, Germany


Professor Rolf Teschke MD, Department of Internal Medicine II, Klinikum Hanau, Academic Teaching Hospital of the Goethe University of Frankfurt/Main, Leimenstrasse 20, D-63450 Hanau, Germany.

Tel.: +49 6 1812 1859

Fax: +49 618 1296 4211

E-mail: rolf.teschke@gmx.de


This review deals with herbal hepatotoxicity, also termed herb induced liver injury (HILI), and critically summarizes the pitfalls associated with the evaluation of assumed HILI cases. Analysis of the relevant publications reveals that several dozen different herbs and herbal products have been implicated in causing toxic liver disease, but major quality issues limit the validity of causality attribution. In most of these reports, discussions about quality specifications regarding herbal products, case data presentation and causality assessment methods prevail. Though the production of herbal drugs is under regulatory surveillance and quality aspects are normally not a matter of concern, low quality of the less regulated herbal supplements may be a critical issue considering product batch variability, impurities, adulterants and herb misidentifications. Regarding case data presentation, essential diagnostic information is often lacking, as is the use of valid and liver specific causality assessment methods that also consider alternative diseases. At present, causality is best assessed by using the Council for International Organizations of Medical Sciences (CIOMS) scale in its original or updated form, which should primarily be applied prospectively by the treating physician when evaluating a patient rather than retrospectively by regulatory agencies. To cope with these problems, manufacturers, physicians and regulatory agencies should pursue a common approach that strives for the best quality. We propose steps for improvements with impact on future cases of liver injury by herbs, herbal drugs and herbal supplements.


Herbal hepatotoxicity or herb induced liver injury (HILI) is causally related to natural products consumed by humans. Usually these herbs are avoided by animals due to protective mechanisms, as nicely described two decades ago [1, 2]. Herbs synthesize a broad spectrum of chemicals with beneficial properties when used in appropriate amounts and with toxic features when consumed in excess. When herbivorous animals encounter these plants, they normally avoid them because of their often unpleasant, strong, bitter or fetid taste. The same plants are collected by humans for herbal preparations. Herbs are used either in their original forms as teas and food additives, or are manufactured into herbal products like herbal drugs and herbal supplements. Although herbs and herbal products were erroneously considered safe for a long time, there is now growing evidence that herbs may cause adverse reactions of variable severity involving numerous organs including the liver [2, 3].

The diagnosis of HILI is a particular clinical and regulatory challenge, as shown by recent analyses involving cases of suspected greater celandine (GC) hepatotoxicity [4-6] and by previous reports related to a few dozen different herbs and herbal products [3, 7-32]. Each individual case of assumed HILI must primarily be considered as a signal of safety concern, which requires further follow-up evaluation as to whether this signal is correct. This approach will show whether a particular herb or herbal product was hepatotoxic in an individual patient. Using this method of evaluation, numerous shortcomings emerged, mostly related to quality issues.

This review critically summarizes the pitfalls associated with the evaluation of assumed HILI cases and proposes steps for improvements with their desired impact on upcoming cases of liver injury by herbs, herbal drugs and herbal supplements.

Greater celandine hepatotoxicity and quality issues

Primary GC hepatotoxicity has been assumed in 69 cases, with 21 cases published as case reports [4], and 48 spontaneous reports communicated to the German regulatory agency BfArM (Bundesinstitut für Arzneimittel und Medizinprodukte) [5, 6]. Of these spontaneous reports, the regulatory agency finally assumed a probable or possible causality in 22 cases [5, 6], but the evaluation algorithm in use remained undeclared [6]. Also, in most of the 21 published case reports, information on the causality level and type of algorithm was lacking [4]. Reassessment of these 21 case reports and the 22 spontaneous reports using both the original and the updated scale of the Council for International Organizations of Medical Sciences (CIOMS) resulted in four cases with a highly probable GC hepatotoxicity and in 12 cases with a probable causality [4, 5]. Comparing these 16 cases with established causality with the initial 69 signals of safety concern [4-6] indicates a system with a low threshold for case reporting and high over-reporting since only 23% of the cases were likely correctly diagnosed upon appropriate causality assessment. Therefore, thorough evaluation is advised as an important quality criterion.
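The arithmetic behind the 23% figure can be retraced in a few lines. The following is a minimal sketch in Python; the function name `confirmed_fraction` and the variable names are hypothetical and chosen for illustration, while the counts are those reported above.

```python
def confirmed_fraction(confirmed: int, reported: int) -> float:
    """Share of reported signals that retain an established causality level."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    return confirmed / reported


# Counts taken from the text above.
published_case_reports = 21
spontaneous_reports = 48
highly_probable = 4
probable = 12

reported = published_case_reports + spontaneous_reports  # 69 initial signals
confirmed = highly_probable + probable                   # 16 cases with established causality

share = confirmed_fraction(confirmed, reported)
print(f"{confirmed}/{reported} = {share:.0%}")  # prints "16/69 = 23%"
```

The remaining 53 of the 69 signals are those for which a probable or highly probable causality was not established on reassessment.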

Detailed analyses of the initial 69 cases of suspected GC hepatotoxicity revealed numerous criteria of low quality including confounding variables [4-6]. Among these were insufficiently or otherwise poorly documented cases, case duplicates and causality obviously being unrelated to GC intake. Striking differences in data quality existed between published case reports [4] and spontaneous reports [5, 6]. In particular, product information and treatment modalities were poorly documented in the case reports [4] but satisfactorily in the spontaneous reports [5, 6]. Exclusion of hepatitis A–C infections was provided in all published case reports [4] but inconsistently in spontaneous reports [5, 6]. In both groups, major shortcomings were evident in documentation of diagnostic criteria and exclusion of alternative diagnoses [4, 5].

Despite these pitfalls in the course of case evaluation, there are ample data to characterize GC hepatotoxicity as a typical liver disease, based on data from the 16 cases with established causality for GC [4, 5]. Considering laboratory data, pathogenetic aspects and clinical manifestations, GC hepatotoxicity emerges as a specific form of hepatocellular pattern of injury, likely based on an idiosyncratic subtype caused by a rare metabolic aberration, and with clinical features of an acute liver toxicity.
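The term "hepatocellular pattern" refers to a laboratory classification of liver injury. A minimal sketch of the commonly used R ratio follows. The formula and the cut-offs of 5 and 2 are the conventional ones from the hepatotoxicity literature and are an assumption here, since this review does not spell them out; the function name and the example values are hypothetical.

```python
def injury_pattern(alt: float, alt_uln: float, alp: float, alp_uln: float) -> str:
    """Classify the liver injury pattern from ALT and ALP.

    R = (ALT / ULN of ALT) / (ALP / ULN of ALP), where ULN is the upper
    limit of the normal range. The cut-offs below are the conventional
    ones (assumption: not taken from this review).
    """
    if min(alt, alt_uln, alp, alp_uln) <= 0:
        raise ValueError("all values must be positive")
    r = (alt / alt_uln) / (alp / alp_uln)
    if r >= 5:
        return "hepatocellular"
    if r <= 2:
        return "cholestatic"
    return "mixed"


# Hypothetical laboratory values for illustration only (U/L).
print(injury_pattern(alt=800, alt_uln=40, alp=150, alp_uln=120))  # prints "hepatocellular"
```

Expressing both enzymes as multiples of their own normal range is what makes values from different laboratories comparable, which is one reason Table 2 asks for the normal range alongside each initial value.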

General aspects and quality specifications of HILI

The general topic of liver injury due to herbal medicines is an important but neglected subject. For most herbs, HILI is a rare disease, occurring in a few susceptible individuals, and has characteristics similar to those of drug induced liver injury (DILI) [3, 28-32] as well as many other liver diseases unrelated to herbs and drugs [33-35]. In addition, diseases of other organs such as the gall bladder, bile ducts and pancreas may mimic symptoms of HILI and DILI [35]. Thus, clinical presentation alone does not allow the diagnosis of HILI, unless supplementary information is provided and evaluated.

Though a literature search reveals numerous different herbs and herbal products that have been implicated in causing toxic liver disease [4-32], case data are often scattered [4-32] and confounded by alternative diagnoses [36]. In the majority of these reports, problems of quality specifications prevailed regarding the herbal product, case data presentation and causality assessment method [4, 5, 19, 23, 28-60] (Table 1).

Table 1. Quality standards for assessing cases with suspected herbal hepatotoxicity
Items with required quality specifications
  1. Required quality specifications of herbal products refer to herbs, herbal drugs and herbal supplements including herbal mixtures. Additional details of quality specifications are discussed for herbal products [19, 23, 28, 30, 31, 36, 37, 39-46], case data [4, 5, 32, 33, 35, 36, 39, 47-52] and causality evaluation [4, 5, 28, 29, 32-39, 49, 50, 53-57].
  • Herbal products
    • Good Agricultural Practices (GAPs)
    • Good Manufacturing Practices (GMPs)
    • Definition of plant family, subfamily, species, subspecies and variety
    • Definition of plant part
    • Definition of solvents and solubilizers
    • Lack of impurities, adulterants and misidentifications
    • Minimum of batch to batch variability
    • Minimum of product to product variability
    • Lack of variety to variety variability
  • Case data
    • Qualified data acquisition and documentation of complete data
    • Transparent presentation of all data, not just the tip of the iceberg
    • Initial assessment of a temporal association, then of a causal relationship
  • Causality evaluation
    • Liver specific causality assessment method
    • Assessment method validated for hepatotoxicity
    • Structured and quantitative method
    • Use of the CIOMS scale
    • Assessment by skilled hepatologist with clinical experience
    • Regulatory assessment with assistance of external experts
    • High degree of transparency of causality assessment results
    • Presentation of the results item by item with individual scores

Quality of herbs and herbal products

The production of herbal drugs is under regulatory surveillance and their quality normally is not a matter of concern [5, 6]. For unregulated herbal supplements, however, poor quality may be a critical issue [30, 36] with major implications for assessing causality of suspected herbal hepatotoxicity. In particular, causality attribution of herbal supplements is hampered if quality standards are neglected during the production process [36]. Good manufacturing practices (GMPs) are outlined by the WHO [40-42], and other quality requirements are found in the relevant literature [19, 23, 28, 30, 31, 36, 37, 39, 43-46] as presented in Table 1 in detail.

Case data quality

General rules for high quality data regarding both case publications and spontaneous reports are summarized (Table 1), and the specific key elements required for each individual case of suspected herbal hepatotoxicity undergoing clinical evaluation and causality assessment are presented (Table 2) [4, 5, 33, 35, 39, 47-52]. Depending on the clinical presentation, several hundred other liver diseases may be of potential relevance and ought to be considered. A compilation of important alternative diagnoses may serve as a reminder for clinicians [35], in addition to common liver diseases normally ruled out at the beginning of any clinical assessment [33, 35, 39, 50].

Table 2. Requirements for causality evaluations of cases with suspected herbal hepatotoxicity
Key elements essential for sophisticated causality assessment
  1. Latency period indicates time from herb start to symptoms, alternatively to abnormal liver tests. The data are derived from various reports on cases of drug and herbal hepatotoxicity [4, 5, 33, 35, 36, 39, 47-50]. For exclusion of other differential diagnoses, recommendations are given in special reports [33, 35, 39, 50]. ALT, alanine aminotransferase; ALP, alkaline phosphatase; AST, aspartate aminotransferase; CMV, cytomegalovirus; EBV, Epstein Barr virus; HAV, hepatitis A virus; HBV, hepatitis B virus; HCV, hepatitis C virus; HEV, hepatitis E virus; HSV, herpes simplex virus; VZV, varicella zoster virus.
  • Details and clinical characteristics of patients
    • Gender, age, body weight, height, BMI
    • Ethnicity, profession
    • Past medical history regarding general diseases and specifically liver diseases
    • Definition of risk factors such as age and alcohol
    • Alcohol and drug use
    • Statement regarding current treatment including steroids or ursodeoxycholic acid
  • Herbs and their use
    • Brand name with details of ingredients, plant parts, batch number, and expiry date
    • Identification as herbal drug or herbal supplement
    • Herb as an ingredient of a polyherbal product or an undetermined herbal product
    • Manufacturer with address
    • Indication of herbal use with dates of symptoms leading to herbal treatment
    • Daily dose with details of the application form
    • Exact date of herb start and herb end
  • Clinical course and temporal association
    • Timeframes of challenge, latency period and dechallenge
    • Accurate dates of emerging new symptoms after herb start in chronological order
    • Accurate date of initially increased liver values
    • Verification or exclusion of a temporal association
  • Liver values
    • ALT value initially including normal range
    • ALT values during dechallenge at least on days 8 and 30, as well as later on
    • ALT values during dechallenge to exclude a second peak
    • ALT normalization with exact date and actual value
    • ALP value initially including normal range
    • ALP values during dechallenge at least on days 8 and 30, as well as later on
    • ALP values during dechallenge to exclude a second peak
    • ALP normalization with exact date and actual value
    • AST value initially including normal range
    • Laboratory criteria for definition of hepatotoxicity and its pattern
  • Alternative diagnoses
    • Assessment of pre-existing and co-existing liver unrelated diseases
    • Assessment of pre-existing and co-existing liver diseases
    • Consideration of the several hundred other possible liver diseases
    • Providing details to exclude alternative diagnoses
    • Assessment and exclusion of HAV, HBV, HCV, HEV, CMV, EBV, HSV, VZV
    • Liver and biliary tract imaging including color Doppler sonography of liver vessels
    • Specific evaluation of alcoholic, cardiac, autoimmune and genetic liver diseases
    • Individual quantitative score of each alternative diagnosis
    • Comedicated synthetic drugs, herbal drugs, herbal and dietary supplements
    • Individual quantitative score of each individual comedication
  • Re-exposure and known hepatotoxicity of the herb
    • Definition of and search for accidental, unintended re-exposure
    • Assessing and individual scoring of unintended re-exposure
    • Search for evidence of prior known hepatotoxicity of the suspected herb
    • Assessing and individual scoring of known hepatotoxicity caused by the herb
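The key elements of Table 2 lend themselves to a structured case record that can be checked for completeness before causality is assessed. The following Python sketch is illustrative only: the class `SuspectedHiliCase`, its field names and the two helper functions are hypothetical, and the record covers just a small subset of the listed elements. The latency calculation follows the table footnote (time from herb start to abnormal liver tests).

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class SuspectedHiliCase:
    """A small, hypothetical subset of the key elements listed in Table 2."""
    brand_name: Optional[str] = None
    daily_dose: Optional[str] = None
    herb_start: Optional[date] = None
    herb_end: Optional[date] = None
    first_abnormal_liver_test: Optional[date] = None
    alt_initial: Optional[float] = None
    alt_upper_normal: Optional[float] = None
    hepatitis_a_to_c_excluded: Optional[bool] = None


def missing_elements(case: SuspectedHiliCase) -> List[str]:
    """Names of all elements that were not documented (still None)."""
    return [f.name for f in fields(case) if getattr(case, f.name) is None]


def latency_days(case: SuspectedHiliCase) -> Optional[int]:
    """Days from herb start to first abnormal liver test, if both dates are known."""
    if case.herb_start is None or case.first_abnormal_liver_test is None:
        return None
    return (case.first_abnormal_liver_test - case.herb_start).days


# Hypothetical, incompletely documented case for illustration.
case = SuspectedHiliCase(
    brand_name="Example herbal product",
    herb_start=date(2011, 3, 1),
    first_abnormal_liver_test=date(2011, 4, 5),
    alt_initial=800.0,
)
print(missing_elements(case))
print(latency_days(case))  # prints 35
```

A list of missing elements of this kind could accompany a report so that the assessor, or the regulatory agency, knows which data to request.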

Quality of causality assessment

The preferred tool for causality assessment of hepatotoxicity cases is the CIOMS scale, either in its original [33, 34] or its updated form [35, 50]. The CIOMS scale is liver specific and validated for hepatotoxicity [34, 55]. These features are lacking in the Naranjo scale [58], the WHO global introspection method [60], referred to as the WHO method for short [56], and the ad hoc causality approach [59]. Discussions and uncertainties about hepatotoxicity causality abound if it is assessed by the Naranjo scale [28, 39, 57, 61, 62], the WHO method [56] or the ad hoc approach [48, 56, 63]. Since these approaches are neither liver specific nor validated for hepatotoxicity, they are considered obsolete for drug and herbal hepatotoxicity causality assessment [56].

Although the quality of HILI case documentation is important, it is not always possible to obtain complete information in order to establish causality in a given case, unless the regulatory agency receives further information upon active request. CIOMS based assessments do not reject cases with incomplete data, but address missing items either by subtracting or withholding scores [33, 35]. This contrasts with the WHO method, which lacks a list of required items and thereby does not specifically consider case data quality [56, 60]. Apart from not being validated for liver toxicity, the WHO method surprisingly lacks even general or specific validation for any adverse reaction of organs unrelated to the liver [56, 60, 64-67].
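How a structured, quantitative scale can tolerate incomplete data may be sketched as follows. This is not the CIOMS scale itself: the item names and the example scores are hypothetical, and the sketch models only the withholding of points for undocumented items, not the subtraction of points. The grading thresholds are those commonly cited for the CIOMS scale and are an assumption here, as this review does not state them.

```python
from typing import List, Mapping, Optional, Tuple


def causality_level(total: int) -> str:
    """Map a total score to a causality level.

    Thresholds as commonly cited for the CIOMS scale
    (assumption: not stated in this review).
    """
    if total >= 9:
        return "highly probable"
    if total >= 6:
        return "probable"
    if total >= 3:
        return "possible"
    if total >= 1:
        return "unlikely"
    return "excluded"


def total_score(item_scores: Mapping[str, Optional[int]]) -> Tuple[int, List[str]]:
    """Sum the documented items and list the undocumented ones.

    An item scored None is treated as not documented: its points are
    withheld (it contributes 0) and its name is reported back.
    """
    total = sum(v for v in item_scores.values() if v is not None)
    undocumented = sorted(k for k, v in item_scores.items() if v is None)
    return total, undocumented


# Hypothetical item names and scores for illustration only.
example = {
    "time_to_onset": 2,
    "course_after_cessation": 3,
    "risk_factors": 1,
    "comedication": 0,
    "alternative_causes": None,     # not documented, points withheld
    "known_hepatotoxicity": 2,
    "unintended_reexposure": None,  # not documented, points withheld
}
total, undocumented = total_score(example)
print(total, causality_level(total), undocumented)
```

Reporting the individual item scores together with the list of undocumented items corresponds to the item-by-item presentation called for in Table 1.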

Cases of herbal hepatotoxicity are normally presented as case reports, which do not allow characterization of general herbal hepatotoxicity for the herbal product used. However, accidental re-exposure and/or thorough causality assessment methods have provided clear evidence for hepatotoxic properties of some herbal products, in addition to GC [4, 5]. Among these are Ayurvedic herbs [38], Chaparral [12], Chinese herbal mixture [10, 13, 68], germander [9, 15], a few Herbalife products [21, 22], Ho Shou Wu [69], Jin Bu Huan [11, 70], Kava [71], Ma Huang [72], mistletoe [7], senna [8] and Syo Saiko To [14]. As opposed to these herbs and herbal products, causality could not be established for herbs such as black cohosh (BC) [36, 39, 57] and Pelargonium sidoides (PS) [64-67], using the CIOMS scale [33, 35]. For BC, discussions focused on lack of transparency, poor data presentation, contradictory case data, confounding variables, overall case over-reporting and the use of the Naranjo scale [36, 39, 55, 57, 61, 62]. In suspected PS cases, similar shortcomings were evident and also included overlooked case duplications and retraction, case over-reporting as a specific pharmacovigilance problem and the application of the WHO method as the causality assessment tool [66, 67]; these issues became topics for further discussion [64, 65].

Future challenges of HILI

There is sufficient evidence that past assessment of HILI cases was often inappropriate. To better recognize current shortcomings and to ensure safe use of herbal products, responsibility is preferentially shared between manufacturers, reporting physicians, assessing hepatologists and regulatory agencies. Considering the popularity of herbal products and the need to ensure their overall safety, product quality has to be maintained by the manufacturers, complete case data are to be provided by reporting physicians and the best assessment method should be developed or used by regulatory agencies. Though key quality specifications have been provided for each level of responsibility (Tables 1, 2), some open questions remain. As long as high report numbers are provided as an argument for causality instead of a valid causality assessment resulting in fewer cases with stringent probability, the problem of data quantity vs. quality remains [36, 55, 57, 62]. In addition, incorrect diagnoses in assumed HILI cases need to be minimized during the care of patients with hepatotoxicity [4, 5, 36, 66, 67].

Case over-reporting

Poor data quality is a well recognized problem [36, 39, 61, 66, 67]. Thus, favouring report quantity (counting cases with low quality reports) of suspected HILI over fewer cases with a high and valid causality level (based on complete, good data) is a dilemma for regulatory agencies [28, 36, 55, 57, 62]. Using inappropriate causality algorithms adds another problem. This combination inevitably leads to initially high numbers of HILI signals, which will be culled to low numbers of cases upon rigorous assessment [36, 39, 49, 54, 55, 57, 61, 62, 66, 67]. Regulatory case over-reporting therefore refers to regulatory agencies using the total number of cases as signals of safety concern rather than applying appropriate causality assessments to identify good quality cases with a highly probable or probable causality. It remains to be decided on an individual basis whether cases with a possible causality are to be included in the respective analyses. Even with strong signals of safety concern in high numbers of reported cases, stringent and valid causality assessments must be applied. Regulatory agencies are well advised to prevent case over-reporting by not going public with cases as signals of safety concern, unless accompanied by established causality levels using the CIOMS scale. For the detection of a signal of safety concern, no causal relationship between herbal intake and adverse reaction needs to be proven [64]. However, publicizing signals in the absence of causal proof will create public confusion as well as scientific discussions [64-67]. Since the cases underlying signals of safety concern were often first presented to the regulatory agencies months or even years earlier, sufficient time is available for appropriate causality assessments.

The problem of regulatory case over-reporting has been well demonstrated by the European Medicines Agency (EMA) assessing 31 spontaneous cases of assumed HILI by BC reported from EU countries. Following the application of the CIOMS scale, only a single case of suspected HILI remained with a possible causality for BC, equivalent to 3% of the 31 reported cases [73]. A similar degree of regulatory case over-reporting was recognized for PS, when the CIOMS scale was applied for the causality assessment [66, 67]. These examples do not necessarily support the notion that a few true cases among the signals of safety concern are only the tip of an iceberg of unreported cases. Another problem emerges when regulatory case analysis contains procedural errors and inconsistencies, resulting in an inflated number of high-grade causality levels [36, 57, 61]. In suspected HILI cases, quality of causality assessment is more important than quantity of counted cases, not vice versa.

Missed diagnoses

The high number of missed diagnoses in suspected HILI cases is disturbing; it becomes apparent when causality for the herb is disproven upon assessment [4, 5, 36, 53, 61, 66, 67]. This problem is also described for assumed DILI cases [35]. Missed diagnoses may become a matter for a liability court [39, 74]. Undetected alternative diagnoses in patients with purported HILI also create concern, because delayed institution of the appropriate therapy is associated with the risk of prolonged or permanent health hazards. However, in any case of suspected HILI, discontinuation of the accused herb(s) as well as any compound with hepatotoxic potential is mandatory, just to err on the side of caution. Correct diagnoses are often missed when a temporal association between herbal use and the observed liver disease is given a higher diagnostic priority than other data and assessments.

CIOMS scale in prospective use

For future cases of suspected HILI, a pragmatic approach is desirable, beginning with the physician who assumes that a liver disease might be caused by a herbal product. Apart from assessing the clinical presentation and excluding alternative causes, the CIOMS scale should be used to ensure that all relevant data are considered to establish or disprove causality [33, 35]. Case data and the CIOMS framework listing each individual item point by point should be provided to the regulatory agency for further evaluation and refinement of the submitted CIOMS data. There is some uncertainty, however, as to how familiar physicians are with the CIOMS scale and whether it is practical to ask physicians to list each point and provide the data to the regulatory agencies for all cases of suspected HILI, particularly as submitting reports of suspected adverse reactions is generally voluntary and not rewarded. We prefer fewer reports with complete CIOMS details that establish clear causality levels to multiple reports that lack details and lead to dispute.

It is desirable for a regulatory agency to be active in completing data sets as needed and not to be a passive reporting portal. Discrepancies will be limited when physicians and regulatory agencies use the CIOMS scale as an identical assessment tool. This method is unequivocal in its questions and transparent with respect to the answers. In the past, the EMA has used the CIOMS scale to assess HILI cases [73] and thereby gained a good reputation as a trend setter. Hopefully, other regulatory agencies will follow in the future. Though this approach will facilitate good pharmacovigilance, the legal and regulatory framework needs to be improved, especially for herbal supplements. Neither the Naranjo scale, the WHO method nor the ad hoc approach is a substitute for the CIOMS scale, and they should be abandoned for evaluation of case reports and spontaneous reports of HILI and DILI cases [49].


Causality assessment of herbal hepatotoxicity is a major clinical and regulatory challenge owing to low product quality, poor case data presentation and use of insufficient causality algorithms. Future improvements should take these shortcomings into consideration and provide efficient tools to overcome the problems discussed. Regulatory cases should be presented as signals of safety concern only when accompanied by transparent data and valid CIOMS causality results that provide highly probable or probable causality levels for the product under consideration. Publicizing signals without causality assessment creates avoidable public confusion and scientific discussions.

Competing Interests

There are no competing interests to declare.