Basic concept of quantitative composition–activity relationship
Quantitative composition–activity relationship (QCAR) is the correlation between the chemical composition of a complex chemical system, such as a combination drug or an herbal medicine, and its biologic activity. Usually, an herbal medicine can be divided into several fractions called components, which are relatively unambiguous and easily controlled in the preparation process (11). The activity of a botanical drug can then be calculated from the descriptors of these components through a mathematical function or model.
Suppose herbal medicine R is composed of several kinds of herbs with a total weight X_{R}. The aqueous extract of this formulation can be divided into n different components C_{i} (i = 1, 2,…,n), each with a yield rate T_{i} (i = 1, 2,…,n). Thus, the weight of each fraction can be represented as
 X_{i} = X_{R} × T_{i}, i = 1, 2,…,n (1)
Hence, herbal medicine R can be regarded as a combination of different components, represented by the vector [X_{1}, X_{2},…,X_{n}]. Moreover, m kinds of biologic properties can be assayed and denoted by the column vector Y = [Y_{1}, Y_{2},…,Y_{m}]^{T}. If the weights of some fractions of R are changed to obtain another combination R′, represented by [X′_{1}, X′_{2},…,X′_{n}], its bioactivity may vary accordingly. Therefore, the relationship between the biologic activity and the chemical composition of R can be represented by a mathematical function:
 Y = f(X_{1}, X_{2},…,X_{n}) (2)
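The bookkeeping behind the composition vector can be sketched in a few lines of Python. The total weight, the yield rates, and the scaling factors below are hypothetical values chosen only for illustration; they are not taken from the study.

```python
def component_weights(total_weight, yield_rates):
    """Weight of each fraction: X_i = X_R * T_i."""
    return [total_weight * t for t in yield_rates]

# Hypothetical inputs (assumptions, not data from the study).
X_R = 100.0              # assumed total weight of the herbs (g)
T = [0.12, 0.05, 0.03]   # assumed yield rates of three components

# Composition vector [X_1, ..., X_n] of the original formulation R.
X = component_weights(X_R, T)

# A modified combination R' changes the amount of some fractions.
# Assumed here: component 2 is halved and component 3 is omitted.
scale = [1.0, 0.5, 0.0]
X_prime = [s * x for s, x in zip(scale, X)]

print("R  :", X)
print("R' :", X_prime)
```

Each such vector is one input to the function relating composition to bioactivity; the function itself has to be estimated from assay data.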
If a single fraction C_{i} is active, the bioactivity of the entire combination increases with the amount of C_{i}, and the function can be simplified to a linear or log-linear function. However, synergistic effects among several active components should be taken into consideration in a QCAR model. A nonlinear model is therefore more suitable for describing the complex relationship between the active fractions and the integrated pharmaceutical efficacy.
To obtain a useful data set, many combinations of R with different proportions are needed. An experimental design D can help to arrange an appropriate number of combinations. D is an m × n matrix, where m here denotes the number of combination groups and D_{jk} (j ∈ [1, m], k ∈ [1, n]) is the proportion of component C_{k} in combination j, which lies in the range [0, 1]. Thus, the chemical composition information of the m groups is represented by the matrix X_{m × n} = D × X, X = [X_{1}, X_{2},…, X_{n}]; that is, the entry for combination j and component k is D_{jk}X_{k}. Subsequently, the pharmacologic activity of these combinations can be assayed and represented by a matrix Y. Moreover, the control group can be regarded as the combination containing none of the fractions, which is expressed as the vector [0,0,…,0]. Its bioactivity is Y_{0}. To eliminate the influence of the different dosages of each fraction in the original formulation, the chemical information of these groups should be normalized. The relative weight of the ith fraction C_{i} is calculated on a linear scale based on its maximum amount (X_{i max}) and minimum amount (X_{i min}) over all groups, as shown in eqn 3:
 x_{i} = (X_{i} − X_{i min}) / (X_{i max} − X_{i min}) (3)
The relative bioactivity is obtained in the same way.
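As a minimal sketch of this linear scaling, the following Python snippet rescales each column of a composition matrix to the range [0, 1]. The GS and GP weights are those of samples 1 to 4 in Table 1 plus the all-zero control row, so the minimum and maximum here refer to this subset only, not to the full data set. Mapping a constant column to zero is an assumption made to avoid division by zero.

```python
def min_max_scale(column):
    """Linear scaling of one column to [0, 1] using its own min and max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def normalize_columns(rows):
    """Apply min_max_scale to every column of a row-major matrix."""
    scaled_columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]

# GS and GP weights (g): samples 1-4 of Table 1, then the control row.
weights = [
    [0.2560, 0.0720],
    [0.2500, 0.0700],
    [0.0,    0.1080],
    [0.6780, 0.0],
    [0.0,    0.0],
]
relative = normalize_columns(weights)

for raw, rel in zip(weights, relative):
    print(raw, "->", [round(v, 4) for v in rel])
```

The same function can be applied to the column of measured bioactivity values to obtain the relative bioactivity.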
Based on the proposed data set, identifying the QCAR of an herbal medicine is transformed into a pattern recognition or function approximation problem, as in QSAR studies. To increase computational accuracy and efficiency, techniques such as multiple linear regression (MLR), artificial neural networks (ANNs), and support vector machines (SVM) have been employed to establish a linear or nonlinear model based on the chemical information data set x and the pharmacologic information data set Y.
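To make the function approximation view concrete, the linear case can be sketched as an ordinary least-squares fit. The snippet below is a plain-Python illustration that solves the normal equations by Gaussian elimination; it is not the SPSS procedure used in the study, and the training data are synthetic, noise-free values generated from an assumed linear relation.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_mlr(x_rows, y):
    """Least-squares fit of y = b0 + sum_k b_k * x_k; returns [b0, b1, ...]."""
    Z = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    p = len(Z[0])
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(p)]
    return solve_linear_system(ZtZ, Zty)

def predict_mlr(coef, row):
    """Evaluate the fitted linear model for one composition vector."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))

# Synthetic relative compositions of two components (assumed values).
x_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
# Assumed ground truth: y = 1.0 - 0.6 * x1 + 0.3 * x2.
y = [1.0 - 0.6 * a + 0.3 * b for a, b in x_rows]

coef = fit_mlr(x_rows, y)
print("coefficients:", [round(c, 6) for c in coef])
print("prediction at [0.2, 0.8]:", round(predict_mlr(coef, [0.2, 0.8]), 6))
```

With real assay data the residuals would not vanish, and a purely linear model cannot represent synergy between components, which motivates the nonlinear alternatives.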
Three algorithms, namely MLR, radial basis function artificial neural network (RBFANN), and support vector regression (SVR), were applied in modeling the QCAR of QXF. MLR was performed using SPSS 10.0 (SPSS Inc., Chicago, IL, USA), while SVR was performed with the LIBSVM software, which is provided by Chih-Chung Chang and Chih-Jen Lin and was obtained from the Internet. The ANN algorithm was implemented by an in-house program in MATLAB 6.5 (MathWorks Inc., Natick, MA, USA). All calculations were run on a computer with a Celeron 1.1 GHz processor and 128 MB of RAM. The detailed algorithms of MLR, ANNs, and SVR can be found in Refs (12,13).
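The idea of a radial basis function network can be illustrated with a deliberately simplified sketch. It assumes one Gaussian basis function centered on every training sample and output weights obtained from a lightly regularized linear solve; the in-house MATLAB program is not described here and may well differ. The kernel width, the ridge term, and the training data (a synthetic product-type interaction between two components) are all assumptions.

```python
import math

def gaussian(u, v, width):
    """Gaussian radial basis function of the distance between u and v."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * width ** 2))

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_rbf(x_rows, y, width=0.5, ridge=1e-8):
    """Output weights for a network with one center per training sample."""
    n = len(x_rows)
    G = [[gaussian(x_rows[i], x_rows[j], width) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear_system(G, y)

def predict_rbf(weights, centers, row, width=0.5):
    """Weighted sum of the basis functions evaluated at one input."""
    return sum(w * gaussian(row, c, width) for w, c in zip(weights, centers))

# Synthetic relative compositions; activity appears only when both
# components are present (assumed product-type interaction y = x1 * x2).
x_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y = [a * b for a, b in x_rows]

weights = fit_rbf(x_rows, y)
for row, target in zip(x_rows, y):
    print(row, target, round(predict_rbf(weights, x_rows, row), 6))
```

Unlike a linear fit, this model can reproduce an interaction of this kind on the training points, at the cost of two tuning parameters (width and regularization) that would normally be chosen by validation.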
Data sets
QiXueBingZhiFang (QXF), composed of six herbs including Ligusticum chuanxiong Hort., Radices paeoniae rubra, and Carthamus tinctorius, has been successfully used in China to treat many cardiovascular diseases, including atherosclerosis and hypertension. The aqueous extract of this formulation contains hundreds of chemical compounds, including salts, proteins, amyloses, flavonoids, and paeonosides. A porous resin chromatographic technique was used to divide it into six different components, as described in our previous study (14). Briefly, the six plant materials were mixed according to the traditional formula and then extracted three times (2 h each time). After filtration and concentration, the extract was separated on D101 porous resin columns using a step gradient. The six collected fractions were named Component GS, GP, PP, PF, FP, and UK, respectively.
A uniform experimental design was adopted to arrange combinations of the six components in varying amounts. Twenty-two combinations were obtained for pharmacologic assays. The cholesterol-lowering effect of these combinations, measured as the reduction in elevated total cholesterol (TC), was determined in hyperlipidemic rats. Chemical composition and bioactivity data of the sample sets are shown in Table 1. The animal trial is briefly described after Table 1.
Table 1. Chemical and biologic information of 22 combinations of QXF (weight of components in g; bioactivity as plasma TC)
Sample number  GS (g)  GP (g)  PP (g)  PF (g)  FP (g)  UK (g)  TC (mM, mean ± SD) 
1  0.2560  0.0720  0.0350  0.0320  0.0200  0  2.42 ± 0.99 
2  0.2500  0.0700  0.0070  0.0060  0.0080  0.0090  2.71 ± 1.10 
3  0  0.1080  0  0.0480  0  0.0070  2.59 ± 1.04 
4  0.6780  0  0.0090  0.0080  0.0530  0  2.90 ± 1.57 
5  0.2130  0.0120  0  0.0530  0.0670  0.0010  2.53 ± 0.98 
6  0.0310  0.0090  0.0420  0.0190  0.0490  0.0060  3.32 ± 1.16 
7  0  0.2030  0  0  0.0560  0.0010  3.11 ± 0.45 
8  0.4270  0.0120  0.0290  0  0.0070  0.0080  3.58 ± 1.83 
9  0.0550  0  0.0370  0.0690  0  0.0050  2.57 ± 0.95 
10  0  0  0.0520  0.0480  0  0.0070  2.70 ± 1.70 
11  0.0440  0.0620  0.0600  0  0.0690  0  2.70 ± 1.50 
12  0  0  0  0  0.0160  0.0190  3.29 ± 1.46 
13  0.0680  0.0190  0.0460  0.0420  0.0530  0  2.52 ± 0.92 
14  0.3490  0.0100  0.0470  0.0040  0.0550  0.0010  2.92 ± 1.04 
15  0  0.1050  0.0250  0.0230  0.0060  0.0070  2.87 ± 0.83 
16  0.0550  0.0770  0.0740  0  0  0.0050  2.90 ± 1.28 
17  0.5240  0.1470  0  0.0130  0  0  3.03 ± 2.17 
18  0.3200  0  0.0430  0.0400  0.0050  0.0030  2.81 ± 1.37 
19  0  0.0770  0.0070  0.0690  0.0430  0  1.99 ± 0.67 
20  0.2560  0.0720  0.0170  0.0160  0.0400  0.0020  3.60 ± 1.75 
21  0.1280  0  0.0170  0.0160  0.1000  0.0020  3.34 ± 1.38 
22  0.1920  0.0540  0.0260  0.0240  0.0300  0.0030  3.53 ± 2.37 
Blank  0  0  0  0  0  0  1.57 ± 0.23 
Control  0  0  0  0  0  0  5.40 ± 2.64 
Male Wistar rats (160–190 g weight) were obtained from Xiyuan Hospital Animal Center (Beijing, China). The experimental protocol was in accordance with the Guideline for the Care and Use of Laboratory Animals of Xiyuan Hospital. Rats were divided into 24 groups (n = 10 per group). One group was fed ordinary forage and served as the blank group; one group of rats with hyperlipidemia induced by a high-cholesterol diet served as the control group. The remaining twenty-two groups were fed the high-cholesterol diet together with the component combinations, at dosages ranging from 1.3 to 5.8 g per 100 g of rat weight. After 2 weeks of feeding, blood was collected for the determination of plasma TC concentration and other biologic data. The plasma TC level in the majority of treated groups was significantly reduced. The collected data were summarized statistically as mean value and standard deviation.