Keywords:

  • natural product;
  • neural network;
  • quantitative composition–activity relationship;
  • rational drug design;
  • support vector regression

Abstract

  1. Top of page
  2. Abstract
  3. Methods
  4. Results
  5. Discussion
  6. Conclusion
  7. Acknowledgments
  8. References

Herbal medicine has been successfully applied in clinical therapeutics throughout the world. Following the concept of quantitative composition–activity relationship, the present study proposes a computational strategy to predict the bioactivity of herbal medicine and to design new botanical drugs. As a case study, the quantitative relationship between the chemical composition and the cholesterol-lowering effect of Qi-Xue-Bing-Zhi-Fang, a herbal medicine widely used in China, was investigated. Quantitative composition–activity relationship models generated by multiple linear regression, artificial neural networks, and support vector regression differed in predictive accuracy. Moreover, the proportion of two active components of Qi-Xue-Bing-Zhi-Fang was optimized on the basis of the quantitative composition–activity relationship model to obtain a new formulation. Validation experiments showed that the optimized formulation has greater activity. The results indicate that the presented method is an efficient approach to botanical drug design.

In recent years, there has been increasing interest in the use of herbal medicine for the prevention and treatment of various illnesses, especially as people realize that modern medicine cannot always provide a ‘cure-all’ solution for human diseases (1). In China, almost half of the commercial drugs are botanical drugs, which are developed from herbal medicines and widely applied in the treatment of various chronic diseases including cardiovascular disease and cancer. Usually, a herbal medicine is composed of several herbs in appropriate proportions (2). The constituent herbs of a given herbal medicine and their proportions are determined according to traditional medical knowledge. Unlike modern drugs, which take the form of a single chemical ingredient, herbal medicine may contain hundreds of chemical compounds. Many researchers believe that the synergistic effect of different active ingredients contributes to the therapeutic effect of herbal medicine (3). Modern clinical trials have shown that a botanical drug composed of multiple herbs in a certain proportion has greater efficacy than a single herb (4). Therefore, a modern botanical drug can be produced as a combination of different active components from herbs. However, because of the complex chemical composition of herbal medicine, appropriate methods for identifying active compounds and optimizing the formulation of herbal medicine are lacking.

In the modern pharmaceutical industry, computer-aided drug design methods, such as quantitative structure–activity relationship (QSAR) studies, have greatly accelerated the pace of drug discovery in recent decades (5,6). The underlying assumption behind QSAR analysis is that the variation of biologic activity within a group of compounds can be correlated with the variation of their respective structural and chemical features. Many machine-learning methods have been extensively applied to the process of drug discovery for building QSAR models. Over the last decades, multiple linear regression (MLR) has been among the most widely used methods to derive a linear mapping between the activity and the values of structural features (7). However, multicollinear or over-abundant descriptors may make the results of MLR analysis unstable. Therefore, various types of neural networks, such as backpropagation neural networks, Bayesian regularized neural networks, and probabilistic neural networks, are regarded as valuable tools for investigating non-linear QSAR (8). Support vector machine (SVM) is a newly developed supervised machine-learning paradigm based on statistical learning theory. Because of its stability and active learning capability, SVM has been successfully used to find active compounds at different stages of the drug discovery process (9).

Because detailed structural information on all compounds in herbal medicine is not available, the QSAR method cannot be used directly to predict the bioactivity of herbal medicine. However, the variation in biologic activity of herbal medicine is tightly associated with the variation in its chemical composition. This relationship between chemical composition and biologic activity has been termed the quantitative composition–activity relationship (QCAR; 10). By quantitatively analyzing the chemical composition–bioactivity relationship, a mathematical model can be established to predict the activity of herbal medicine in a manner that parallels current QSAR studies. Moreover, the optimal combination of components can be calculated from the QCAR model, which makes it possible to integrate different active components into a more effective botanical drug.

In the present study, several computational approaches, including MLR, artificial neural networks (ANNs), and support vector regression (SVR), are employed to model the relationship between the chemical composition and the cholesterol-lowering effect of Qi-Xue-Bing-Zhi-Fang (QXF), one of the best-known herbal medicines for the treatment of cardiovascular disease. An optimal combination of two active fractions of QXF was obtained. Animal experiments confirmed that the optimized combination exhibits greater efficacy than the original formulation. The present study offers an efficient way to develop combinatory botanical drugs from herbal medicine.

Methods

Basic concept of quantitative composition–activity relationship

Quantitative composition–activity relationship is the correlation between the chemical composition of a complex chemical system, such as a combination drug or a herbal medicine, and its biologic activity. Usually, herbal medicine can be divided into several fractions called components, which are relatively well defined and easy to control during preparation (11). The activity of a botanical drug can be calculated from the descriptors of these components through mathematical functions or models.

Suppose herbal medicine R is composed of several kinds of herbs with a total weight $X_R$, and that the aqueous extract of this formulation can be divided into n different components $C_i$ (i = 1, 2, …, n) with yield rates $T_i$ (i = 1, 2, …, n). The weight of each fraction can then be represented as

$$X_i = X_R \times T_i, \quad i = 1, 2, \ldots, n \tag{1}$$

Hence, herbal medicine R can be regarded as a combination of different components, represented by a vector $[X_1, X_2, \ldots, X_n]$. Moreover, m kinds of biologic properties can be assayed and denoted by the column vector $Y = [Y_1, Y_2, \ldots, Y_m]^T$. If the weights of some fractions of R are changed to obtain another combination R′, represented by $[X'_1, X'_2, \ldots, X'_n]$, its bioactivity may vary accordingly. Therefore, the relationship between the biologic activity and the chemical composition of R can be represented by a mathematical function:

$$Y = f(X_1, X_2, \ldots, X_n) \tag{2}$$

If a single fraction $C_i$ is active, the bioactivity of the entire combination increases as the amount of $C_i$ increases, and the function can be simplified to a linear or log-linear function. However, synergistic effects of several active components should be taken into consideration in the QCAR model. A non-linear model is therefore more suitable for describing the complex relationship between active fractions and the integrated pharmaceutical efficacy.

To obtain a useful data set, many combinations of R with different proportions are needed. An experimental design D can help to arrange an appropriate number of combinations. D is an m × n matrix, where m is the number of combination groups and $D_{jk}$ (j ∈ [1, m], k ∈ [1, n]) is the proportion of component $C_k$ in combination j, which lies in the range [0, 1]. Thus, the chemical composition of the m groups is represented by the matrix $X_{m \times n} = D \times X$, where $X = [X_1, X_2, \ldots, X_n]$; that is, combination j contains the weight $D_{jk} X_k$ of component $C_k$. Subsequently, the pharmacologic activity of these combinations can be assayed and represented by a matrix Y. Moreover, the control group can be regarded as the combination containing none of the fractions, which is expressed as the vector [0, 0, …, 0]; its bioactivity is $Y_0$. To eliminate the influence of the different dosages of each fraction in the original formulation, the chemical information of these groups should be normalized. The relative weight $x_i$ of the ith fraction $C_i$ is calculated on a linear scale based on its maximum amount ($X_{i\,\max}$) and minimum amount ($X_{i\,\min}$) over all groups, as shown in eqn 3:

$$x_i = \frac{X_i - X_{i\,\min}}{X_{i\,\max} - X_{i\,\min}} \tag{3}$$

The relative bioactivity is obtained in the same way.
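
The construction of the composition matrix from a design matrix, followed by the linear scaling to relative weights described above, can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study; the function names and the small three-component example are hypothetical, and the elementwise reading of D × X (entry j, k equals the proportion of component k times its full weight) is an assumption.

```python
def composition_matrix(design, full_weights):
    """Row j holds the weights design[j][k] * full_weights[k] of combination j."""
    return [[d * w for d, w in zip(row, full_weights)] for row in design]


def min_max_normalize(matrix):
    """Scale every column to [0, 1] using its minimum and maximum over all rows."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in matrix:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical example: 3 components, 4 combinations (last row is the control).
full_weights = [0.4, 0.1, 0.05]  # illustrative weights in grams, not study data
design = [
    [1.0, 1.0, 1.0],
    [0.5, 0.0, 1.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0],
]
weights = composition_matrix(design, full_weights)
relative = min_max_normalize(weights)
```

The same `min_max_normalize` function can be applied to a one-column matrix of activities to obtain relative bioactivities.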

Based on the proposed data set, identifying the QCAR of herbal medicine is transformed into a pattern recognition or function approximation problem, just as in QSAR studies. To increase computational accuracy and efficiency, artificial intelligence techniques such as MLR, ANNs, and SVM have been employed to establish a linear or non-linear model based on the chemical information data set X and the pharmacologic information data set Y.

Three algorithms, namely MLR, radial basis function artificial neural network (RBFANN), and support vector regression (SVR), were applied in modeling the QCAR of QXF. MLR was performed using SPSS 10.0 (SPSS Inc., Chicago, IL, USA), while SVR was performed with the LIBSVM software, which is provided by Chih-Chung Chang and Chih-Jen Lin and was obtained online. The ANN algorithm was implemented by an in-house program in MATLAB 6.5 (MathWorks Inc., Natick, MA, USA). All calculations were run on a computer with a 1.1 GHz Celeron processor and 128 MB RAM. The detailed algorithms of MLR, ANNs, and SVR can be found in Refs (12,13).

Designing of botanical drug by QCAR model

The proposed quantitative chemical composition–activity model can be used to identify active components and to design botanical drugs.

Again for herbal medicine R, suppose the active components are $X_1$ and $X_2$. In the original formulation, their proportion is $T_1/T_2$ and their total weight is $X = X_1 + X_2$. If the proportion of $X_1$ to $X_2$ is varied while their total weight X is held constant, different bioactivities may be produced because of the variation in composition. The QCAR model can be used to predict the efficacy at each proportion and to select the best one. A further pharmacologic trial then tests the validity of the computational result by comparing the activities of the original and the optimized proportions. In this way, botanical drugs can be designed more efficiently. A brief outline of rational drug design for combinatory botanical drugs based on the QCAR model is illustrated in Figure 1.

Figure 1.  Procedure of rational drug design for botanical drugs through quantitative composition–activity relationship (QCAR) study. First, the herbal medicine is divided into several components. These components are then recombined to obtain a series of training samples with known chemical composition and activity. Machine learning methods are adopted to build a prediction model or to identify active components. Subsequently, the QCAR model can be used to predict the bioactivities of component combinations in arbitrary proportions. In this way, the optimal combination of active components can be selected and developed into a new combinatory botanical drug with greater bioactivity.
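
The selection step in this procedure, scanning the proportion of two active components at a fixed total weight and keeping the split with the best predicted activity, can be sketched as below. This is only an illustration under stated assumptions: `best_ratio` and `toy_model` are hypothetical names, the toy model is not fitted to any data, and "higher predicted activity is better" is assumed; a trained QCAR model would take the place of `toy_model`.

```python
def best_ratio(predict, total_weight, steps=20):
    """Scan splits of a fixed total weight between two components.

    `predict` is any callable (w1, w2) -> predicted activity, higher is better.
    Returns (w1, w2, predicted_activity) for the best split on the grid.
    """
    best = None
    for i in range(steps + 1):
        w1 = total_weight * i / steps
        w2 = total_weight - w1
        activity = predict(w1, w2)
        if best is None or activity > best[2]:
            best = (w1, w2, activity)
    return best


# Toy stand-in for a trained QCAR model (NOT fitted to any real data):
# its activity peaks when the first component makes up 40% of the total.
def toy_model(w1, w2):
    share = w1 / (w1 + w2)
    return 1.0 - (share - 0.4) ** 2


w1, w2, activity = best_ratio(toy_model, total_weight=0.1, steps=20)
```

A grid scan is used here because only one free variable (the proportion) remains once the total weight is fixed; the split it returns still has to be confirmed by a pharmacologic trial, as the procedure requires.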

Data sets

Qi-Xue-Bing-Zhi-Fang, composed of six herbs including Ligusticum chuanxiong Hort., Radices paeoniae rubra, and Carthamus tinctorius, has been successfully used in China to treat many cardiovascular diseases, including atherosclerosis and hypertension. The aqueous extract of this formulation contains hundreds of chemical compounds including salts, proteins, amyloses, flavonoids, and paeonosides. A porous resin chromatographic technique was used to divide it into six different components, as described in our previous study (14). Briefly, the six plant materials were mixed according to the traditional formula and then extracted three times (2 h each time). After filtration and concentration, the extract was separated on D101 porous resin columns using a step gradient. The six collected fractions were named Component GS, GP, PP, PF, FP, and UK, respectively.

A uniform experimental design was adopted to arrange combinations of the six components in varying amounts. Twenty-two combinations were obtained for pharmacologic assays. The effect of water extracts of this formulation in counteracting the increase in total cholesterol (TC) was determined in hyperlipidemic rats. The chemical composition and bioactivity data of the sample sets are shown in Table 1. The following is a brief description of the animal trial:

Table 1.   Chemical and biologic information of 22 combinations of QXF

Sample   Weight of components (g)                        Bioactivity
number   GS      GP      PP      PF      FP      UK      TC (mM)
1        0.2560  0.0720  0.0350  0.0320  0.0200  0       2.42 ± 0.99
2        0.2500  0.0700  0.0070  0.0060  0.0080  0.0090  2.71 ± 1.10
3        0       0.1080  0       0.0480  0       0.0070  2.59 ± 1.04
4        0.6780  0       0.0090  0.0080  0.0530  0       2.90 ± 1.57
5        0.2130  0.0120  0       0.0530  0.0670  0.0010  2.53 ± 0.98
6        0.0310  0.0090  0.0420  0.0190  0.0490  0.0060  3.32 ± 1.16
7        0       0.2030  0       0       0.0560  0.0010  3.11 ± 0.45
8        0.4270  0.0120  0.0290  0       0.0070  0.0080  3.58 ± 1.83
9        0.0550  0       0.0370  0.0690  0       0.0050  2.57 ± 0.95
10       0       0       0.0520  0.0480  0       0.0070  2.70 ± 1.70
11       0.0440  0.0620  0.0600  0       0.0690  0       2.70 ± 1.50
12       0       0       0       0       0.0160  0.0190  3.29 ± 1.46
13       0.0680  0.0190  0.0460  0.0420  0.0530  0       2.52 ± 0.92
14       0.3490  0.0100  0.0470  0.0040  0.0550  0.0010  2.92 ± 1.04
15       0       0.1050  0.0250  0.0230  0.0060  0.0070  2.87 ± 0.83
16       0.0550  0.0770  0.0740  0       0       0.0050  2.90 ± 1.28
17       0.5240  0.1470  0       0.0130  0       0       3.03 ± 2.17
18       0.3200  0       0.0430  0.0400  0.0050  0.0030  2.81 ± 1.37
19       0       0.0770  0.0070  0.0690  0.0430  0       1.99 ± 0.67
20       0.2560  0.0720  0.0170  0.0160  0.0400  0.0020  3.60 ± 1.75
21       0.1280  0       0.0170  0.0160  0.1000  0.0020  3.34 ± 1.38
22       0.1920  0.0540  0.0260  0.0240  0.0300  0.0030  3.53 ± 2.37
Blank    0       0       0       0       0       0       1.57 ± 0.23
Control  0       0       0       0       0       0       5.40 ± 2.64

TC, total cholesterol; QXF, Qi-Xue-Bing-Zhi-Fang.

Male Wistar rats (160–190 g body weight) were obtained from Xiyuan Hospital Animal Center (Beijing, China). The experimental protocol was in accordance with the Guideline for the Care and Use of Laboratory Animals of Xiyuan Hospital. Rats were divided into 24 groups (n = 10 per group). One group was fed ordinary forage and served as the blank group; one group of hyperlipidemic rats, induced by a high-cholesterol diet, served as the control group. The remaining 22 groups were fed the high-cholesterol diet together with the component combinations, at dosages ranging from 1.3 to 5.8 g per 100 g rat weight. After 2 weeks of feeding, blood was collected for the determination of plasma TC concentration and other biologic data. The plasma TC level was significantly reduced in the majority of treated groups. The collected data were summarized as mean values and standard deviations.

Results

Comparing performance of MLR, radial basis function artificial neural network (RBFANN), and SVR in QCAR modeling

First, MLR was used to investigate the relationship between the composition of QXF and its biologic activity. A stepwise regression algorithm was adopted to obtain the linear QCAR model below.

  • image(4)

In the above function, $Y_{TC}$ is the concentration of TC in serum, and $m_{PP}$, $m_{PF}$, and $m_{FP}$ represent the weights of fractions PP, PF, and FP in the formulation, respectively. The predictive result of this model is illustrated in Figure 2A. As shown in Table 2, the mean error of prediction (MEP) and the relative standard error of prediction (RSEP) of this linear model are relatively high, indicating that a linear model could not properly capture the complex relationship between the components and their bioactivity. Subsequently, two non-linear machine-learning algorithms, RBFANN and SVR, were employed to establish the QCAR model of QXF.

Figure 2.  Predictive results of leave-one-out cross-validation. This figure shows the predicted activity and actual activity of 22 component combinations. Panels A–C show the results obtained by multiple linear regression, RBFANN, and support vector regression, respectively. The solid square ‘▪’ is the experimental mean value of total cholesterol and the hollow circle ‘○’ is the predicted value. The standard deviation of each sample is represented by the bar.

Table 2.   Prediction error of MLR, RBFANN, and SVR

             Leave-one-out cross-validation (%)
Algorithm    MEP      RSEP
MLR          27.3     32.1
RBFANN       9.52     11.72
SVR          8.02     10.29

SVR, support vector regression; MLR, multiple linear regression; MEP, mean error of prediction; RSEP, relative standard error of prediction.

Some important parameters of the ANN and SVR models had to be determined first. In the RBFANN implementation, the radius (spread) value greatly affects the accuracy of the model. In the SVR implementation, parameters such as the regularization constant C and the kernel parameter σ must be determined in advance. According to the cross-validation results, a suitable radius value for the RBFANN was 0.93 and the optimal {σ, C} of SVR was {0.3, 20}.
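
Choosing a kernel width by cross-validation, as described above, can be sketched in a few lines. To keep the example self-contained, the sketch swaps in a simple Gaussian-kernel weighted average (a Nadaraya-Watson smoother) as a stand-in; it is neither the RBFANN nor the SVR used in the study. The function names, the candidate grid, and the toy data are all hypothetical.

```python
import math


def kernel_predict(train_x, train_y, query, spread):
    """Gaussian-kernel weighted average of the training targets."""
    weights = []
    for row in train_x:
        dist2 = sum((a - b) ** 2 for a, b in zip(row, query))
        weights.append(math.exp(-dist2 / (2.0 * spread ** 2)))
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to the plain mean
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def loo_error(xs, ys, spread):
    """Mean squared leave-one-out prediction error for a given spread."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = kernel_predict(train_x, train_y, xs[i], spread)
        errors.append((pred - ys[i]) ** 2)
    return sum(errors) / len(errors)


def select_spread(xs, ys, candidates):
    """Return the candidate spread with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_error(xs, ys, s))


# Illustrative data only (not taken from Table 1).
xs = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
ys = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
best = select_spread(xs, ys, candidates=[0.05, 0.1, 0.3, 1.0, 3.0])
```

For a model with two parameters, such as {σ, C} in SVR, the same idea applies with `candidates` replaced by pairs and the error function evaluating the corresponding model.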

Leave-one-out cross-validation was performed to measure the performance of the different machine learning algorithms. The predictive results of RBFANN and SVR are illustrated in Figure 2B,C. The differences between experimental values and predicted values were within the standard deviation of the animal experiments. As shown in Table 2, the RBFANN and SVR models have similar RSEP values of about 10%, and the MEP of the SVR model was slightly lower than that of RBFANN. Although the predictive accuracy of these models was not very high, it is acceptable considering the uncertainty and crudeness of the biologic assay. This result indicates that non-linear approaches such as ANNs and SVR are more suitable for developing QCAR models.
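
The two error measures reported in Table 2 are not defined in the text. A minimal sketch under commonly used definitions is given below; both formulas are assumptions, and the study may have used different ones. MEP is taken as the mean absolute relative error in percent, and RSEP as 100 × sqrt(Σ(ŷ − y)² / Σ y²). The input values are made up for illustration.

```python
import math


def mep(actual, predicted):
    """Mean error of prediction: mean absolute relative error, in percent (assumed definition)."""
    relative_errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


def rsep(actual, predicted):
    """Relative standard error of prediction, in percent (assumed definition)."""
    squared_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    squared_actual = sum(a ** 2 for a in actual)
    return 100.0 * math.sqrt(squared_error / squared_actual)


# Made-up values for illustration; these are not results from the study.
actual = [2.0, 4.0]
predicted = [2.2, 3.6]
mep_value = mep(actual, predicted)
rsep_value = rsep(actual, predicted)
```

In a leave-one-out setting, `predicted` would hold, for each sample, the value predicted by a model trained on the other samples.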

Optimization of QXF by QCAR model

As described in the section ‘Designing of botanical drug by QCAR model’, the QCAR model can be used to design combinatory botanical drugs with better efficacy. The activities of combinations of two major active components, PF and FP, were predicted with the proposed RBFANN model. As illustrated in Figure 3, the activities of different combinations varied according to the dosages of components PF and FP. For a fixed dosage, the maximal predicted activity was obtained when the ratio of FP to PF was near 2:1 (Figure 4).

Figure 3.  Predicted relative activities of the combinations composed of component PF and FP with different dosage.

Figure 4.  Predicted activity of combinations of the two components PF and FP in different proportions, based on the quantitative composition–activity relationship model. The result indicates that when the ratio of PF to FP is nearly 2, viz. the amount of PF is twice that of FP, the combination achieves maximal activity.

To validate the computational results, an additional experiment was performed in the same manner as described in the Data sets section. The decrease in plasma cholesterol induced by QXF and by three combinations of components PF and FP was measured. As illustrated in Figure 5, QXF and the combinations of PF and FP significantly reduced the plasma TC level. Moreover, the combination with the computed optimal proportion of FP and PF has greater efficacy than QXF (ratio 1:1) and the combination with ratio 1:2. Compared with the original QXF, the chemical composition of the optimized combination is relatively simple and easier to control by modern analytical methods (14,15). Therefore, the combination produced by the proposed QCAR model has the potential to be developed into a new botanical drug.

Figure 5.  Effects of Qi-Xue-Bing-Zhi-Fang (QXF) and combinations of component PF and FP on plasma lipids level (n = 10). Bars represent standard deviations. Compared with control group, *p < 0.05, **p < 0.01; compared with QXF group, #p < 0.05, ## p < 0.01.

Discussion

Herbal medicine has developed from ancient medical philosophy over thousands of years, and its efficacy has been validated in countless clinical applications. Although the exact mechanism of action of botanical drugs is still controversial, the widespread application of herbal products has demonstrated their usefulness. At present, the chief strategy for the development of botanical drugs is bioassay-guided fractionation, which demands a huge workload in sample separation and bio-screening. In many cases, some active ingredients can be discovered in this way, but the pure compound obtained often does not have the same therapeutic effect as the original formulation. On the other hand, because these active components are derived from clinically useful herbal medicines, an appropriate combination of active components will improve the hit rate in discovering potential botanical drugs. Thus, as suggested by Jia et al. (16), new botanical drugs should be designed as combinations of several active components to preserve the holistic and synergistic effects of herbal medicine.

In the present study, the QXF extract was divided into six fractions by D101 resin column chromatography, which is one of the most widely used approaches for the isolation and purification of natural products. The reproducibility of the fractionation process was evaluated by high-performance liquid chromatographic analysis, as described in another of our reports (17). The variation in the content of several active compounds was <2% and the similarity of the chromatographic fingerprints was more than 0.99, which indicates that the fractionation process has excellent reproducibility. Thus, a feasible way to produce combined herbal medicine involves two steps. The first step is the rapid and repeatable isolation of active components by preparative or semi-preparative chromatography. The second step is combining these active components in proper proportions.

In clinical practice, traditional Chinese medicine (TCM) practitioners often adjust the proportion of herbs according to the status of the patient, which changes the chemical composition of the herbal medicine. This adjustment usually relies on the clinical experience of the doctor. In our study, interpretation of the QCAR models gives insight into the chemical–biologic space and allows the combinatorial range of active components to be narrowed. The calculated optimal proportion was further validated by bioassays. Such focused screening can reduce the number of experiments and increase the effectiveness of botanical drug design. Although the accuracy of the QCAR model is not very high, it compares reasonably with the accuracy of QSAR models in their early stage, such as the works of Hansch and Fujita in the 1960s (18). More careful experimental design and more precise bioassay operation will help to improve the performance of the QCAR model. Moreover, if automated sample preparation approaches and high-throughput screening techniques can be integrated with the above-mentioned method, new drug discovery based on herbal medicine will be greatly accelerated.

Conclusion

In the present study, a novel computational strategy for designing new botanical drugs from herbal medicine was developed. We have presented our preliminary results in modeling the QCAR of QXF and optimizing its chemical composition. A new combination of active components from QXF was obtained by an integrated method of separation, bioassay, computational modeling, and experimental validation. Our future work will concentrate on improving the performance of the QCAR model and applying the present method to study the relationships between composition and activity of other herbal medicines. Studies on simultaneously predicting various activities of herbal medicine and on multi-objective optimization for designing botanical drugs are also under way. Such research can be expected to offer a new paradigm for the rational design of botanical drugs.

Acknowledgments

The author would like to thank Prof. Dazuo Shi of Beijing Xiyuan Hospital for providing biological data. This project was financially supported by the Chinese National Basic Research Priorities Program (No. 2005CB523402) and a key grant from National Natural Science Foundation of China (No. 90209005).
