Database characterization based on chemotype frequency analysis
The hierarchical classification of chemotypes at different resolution levels is known to be well suited for structurally large data sets (7,8). Xu and Johnson (7,8) developed a methodology to group chemotypes into five basic categories: complete 2D structures, cyclic systems, side chains, ring systems and functional groups. In this work, the molecular database was analyzed considering chemotypes at two resolution levels (cyclic systems and cyclic system skeletons), Figure 1, which can be associated with molecular scaffolds and molecular skeletons, respectively (7–9). Chemotypes were classified and ranked by their frequency in the database. The data set contains 191 cyclic systems and 86 cyclic system skeletons. The most frequent chemotypes at both resolution levels are shown in Figure 4. The most frequent cyclic system is PP97T (12.5%), which corresponds to molecules sharing a common 1,5-diphenyl-1H-pyrazole scaffold, Figure 4A. It is worth noting that similar scaffolds based on three ring systems also have notable frequencies, for example 6CEKT (5.5%), L2U5P (4.0%), USKZ6T (4.0%), E5CGF (2.3%) and USKFM (2.1%). On the other hand, the most frequent cyclic system skeleton is C2NYM (34.5%), which comprises most of the scaffolds containing 1,2-diarylheterocyclic or 1,2-diarylcarbocyclic systems, Figure 4B.
Figure 4. Most frequent cyclic systems (A) and cyclic system skeletons (B) found in the current cyclooxygenase inhibitors database (frequency ≥ 10). Chemotype identifier, frequency, and percentage are displayed.
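The frequency ranking described above amounts to counting how many compounds fall into each chemotype and expressing that count as a percentage of the database. The sketch below assumes every compound has already been assigned a chemotype identifier (the assignment itself, done here with the Xu and Johnson methodology, is not reproduced); `rank_chemotypes` and the toy identifiers are hypothetical. Setting `min_frequency=10` would mirror the cut used in Figure 4.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_chemotypes(assignments: Iterable[Tuple[str, str]],
                    min_frequency: int = 1) -> List[Tuple[str, int, float]]:
    """Rank chemotypes by frequency.

    assignments: (compound_id, chemotype_id) pairs, one per compound.
    Returns (chemotype_id, frequency, percentage of the database) tuples,
    most frequent first, keeping chemotypes with frequency >= min_frequency.
    """
    counts = Counter(chemotype for _, chemotype in assignments)
    total = sum(counts.values())
    return [
        (chemotype, n, 100.0 * n / total)
        for chemotype, n in counts.most_common()
        if n >= min_frequency
    ]


# Hypothetical toy assignments (not from the study).
toy = [("m1", "A"), ("m2", "A"), ("m3", "B"), ("m4", "A")]
print(rank_chemotypes(toy))                   # [('A', 3, 75.0), ('B', 1, 25.0)]
print(rank_chemotypes(toy, min_frequency=2))  # [('A', 3, 75.0)]
```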
Having identified and quantified the most frequent cyclic systems and cyclic system skeletons, the characterization of the activity and selectivity profile of each chemotype is addressed in the next section using the CASE plots.
Chemotype activity and selectivity-enrichment plots
In this work, chemotype activity and selectivity enrichment (CASE) plots are proposed as a graphical representation to analyze activity and selectivity enrichment for each chemotype in a set of molecules with activity against two targets or biological end-points. These plots are an extension of the previously described chemotype-enrichment plots, graphical representations that show the relationship between occupancy and activity enrichment for a set of chemotypes at a given level of structural resolution. It is worth mentioning that chemotype-enrichment plots were proposed for the analysis of molecules with activity against a single target or biological end-point (9); hence, they give no information about the selectivity of a particular chemotype. As described in the Methods, scaffolds rich in active and selective compounds were identified in the CASE plots (Figure 2).
Different levels of activity and selectivity can be used in CASE plots. Table 1 shows SEF values for the most frequent cyclic systems and cyclic system skeletons at two selectivity thresholds (SEF100 and SEF10). At the cyclic system resolution level, the chemotypes with high frequency in the database and high SEF (>1) at both selectivity levels are PP97T, 6CEKT, L2U5P, QZ3TX, E5CGF, USKFM and RF2VF. Considering different selectivity levels is important to distinguish highly selective from weakly selective chemotypes. For example, the cyclic system USZ6T has SEF100 = 0 but SEF10 = 1.03; hence, USZ6T is rich in molecules with low selectivity in the current data set. A similar analysis of cyclic system skeletons reveals that C2NYM, RDJ81, JK9S5 and 8E4BT are the most frequent chemotypes enriched in selective molecules.
Table 1. Most frequent cyclic systems and cyclic system skeletons analyzed at two thresholds of selectivity
| Entire database |
| Cyclic systems |
| Cyclic system skeletons |
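The SEF values in Table 1 depend on a selectivity threshold (100-fold for SEF100, 10-fold for SEF10). The exact enrichment formulas are given in the Methods and are not reproduced here; the sketch below assumes the usual enrichment-factor form (fraction of flagged compounds within the chemotype divided by the fraction in the whole data set) and assumes a compound counts as COX-2 selective when IC50(COX-1)/IC50(COX-2) reaches the threshold. The functions `enrichment_factor` and `selective_at` and the toy data are hypothetical; passing an activity criterion as `is_flagged` would give an AEF-style value instead.

```python
from typing import Callable, Dict, Tuple

# compound_id -> (chemotype_id, IC50 COX-1, IC50 COX-2); both IC50 values in the same unit
Dataset = Dict[str, Tuple[str, float, float]]


def enrichment_factor(data: Dataset, chemotype: str,
                      is_flagged: Callable[[float, float], bool]) -> float:
    """Fraction of flagged compounds within `chemotype` divided by the fraction
    of flagged compounds in the whole data set (ASSUMED definition)."""
    flags = {cid: is_flagged(c1, c2) for cid, (_, c1, c2) in data.items()}
    members = [cid for cid, (chem, _, _) in data.items() if chem == chemotype]
    if not members:
        raise ValueError(f"no compounds with chemotype {chemotype!r}")
    frac_database = sum(flags.values()) / len(flags)
    if frac_database == 0:
        raise ValueError("no flagged compounds in the data set")
    frac_chemotype = sum(flags[cid] for cid in members) / len(members)
    return frac_chemotype / frac_database


def selective_at(fold: float) -> Callable[[float, float], bool]:
    """ASSUMED criterion: COX-2 selective if IC50(COX-1) / IC50(COX-2) >= fold."""
    return lambda ic50_cox1, ic50_cox2: ic50_cox1 / ic50_cox2 >= fold


# Hypothetical toy data (not from the study); IC50 in nM.
toy = {
    "m1": ("A", 5000.0, 10.0),  # 500-fold selective
    "m2": ("A", 800.0, 20.0),   # 40-fold
    "m3": ("B", 300.0, 15.0),   # 20-fold
    "m4": ("B", 50.0, 40.0),    # 1.25-fold
}

sef100_b = enrichment_factor(toy, "B", selective_at(100))  # 0.0: no member of B reaches 100-fold
sef10_b = enrichment_factor(toy, "B", selective_at(10))    # about 0.67
```

Toy chemotype B loosely mirrors the USZ6T pattern: none of its members reaches 100-fold selectivity, so its SEF100 is 0, while the looser 10-fold threshold gives a nonzero value.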
CASE plots for cyclic systems and cyclic system skeletons are shown in Figure 5. As discussed in the Methods, CASE plots can be divided into four regions, I–IV (Figure 2), and points located in different regions represent different activity and selectivity patterns. Frequency values were mapped (Figure 5A) using a continuous color scale from green (less frequent) to red (more frequent).
Figure 5. Chemotype activity and selectivity enrichment (CASE) plots for COX-2 inhibitors at two different chemotype resolutions. CASE plots are divided into regions I–IV representing different activity and selectivity enrichment patterns. (A) More frequent chemotypes (frequency ≥ 10); points are color-coded by frequency using a continuous scale from green (less frequent) to red (more frequent). (B) Less frequent chemotypes (frequency < 10).
Region IV is considered the most important zone in this study because it contains active and selective chemotypes. In Figure 5A, region IV identifies the most frequent (frequency ≥ 10), active (AEF > 1) and selective (SEF100 > 1) chemotypes. For cyclic systems, there are six examples, namely PP97T, 6CEKT, USKFM, L2U5P, QZ3TX and E5CGF. Chemotypes E5CGF, QZ3TX, L2U5P and USKFM have the highest AEF values (3.12, 2.92, 2.76 and 2.67, respectively) and SEF100 values (2.49, 2.53, 2.52 and 2.67, respectively). Nevertheless, PP97T is the most frequent cyclic system in region IV (AEF = 1.18 and SEF100 = 1.64) and in the entire database (frequency = 82). For cyclic system skeletons, three chemotypes are located in region IV, namely C2NYM, RDJ81 and JK9S5. JK9S5 has the highest AEF and SEF100 values (2.60 and 2.95, respectively), but it is not the most frequent chemotype. In contrast, C2NYM has lower AEF (1.31) and SEF100 (1.59) values than JK9S5 and RDJ81, but it is the most frequent cyclic system skeleton in the entire database (227 compounds).
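Region IV membership, as stated above, requires AEF > 1 and SEF100 > 1 once Figure 5A has been restricted to chemotypes with frequency ≥ 10. A minimal sketch of that filter, applied to the cyclic-system values quoted in the preceding paragraph; 6CEKT is left out because its values are not quoted in the text, and `HYPO1` is a hypothetical chemotype added only to show a point that falls outside region IV. The class and function names are illustrative.

```python
from typing import List, NamedTuple


class ChemotypeStats(NamedTuple):
    name: str
    aef: float     # AEF, as defined in the Methods
    sef100: float  # SEF at the 100-fold selectivity threshold


def region_iv(stats: List[ChemotypeStats], cutoff: float = 1.0) -> List[ChemotypeStats]:
    """Keep chemotypes with AEF > cutoff and SEF100 > cutoff, sorted by decreasing AEF."""
    hits = [s for s in stats if s.aef > cutoff and s.sef100 > cutoff]
    return sorted(hits, key=lambda s: s.aef, reverse=True)


# AEF and SEF100 values reported in the text for frequent cyclic systems;
# HYPO1 is hypothetical and is not part of the study.
cyclic_systems = [
    ChemotypeStats("PP97T", 1.18, 1.64),
    ChemotypeStats("USKFM", 2.67, 2.67),
    ChemotypeStats("L2U5P", 2.76, 2.52),
    ChemotypeStats("QZ3TX", 2.92, 2.53),
    ChemotypeStats("E5CGF", 3.12, 2.49),
    ChemotypeStats("HYPO1", 1.50, 0.40),
]

print([s.name for s in region_iv(cyclic_systems)])
# ['E5CGF', 'QZ3TX', 'L2U5P', 'USKFM', 'PP97T']
```

Sorting by AEF reproduces the ordering described in the text, and PP97T ends up last despite being by far the most frequent, which is why frequency and enrichment are read together on the plot.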
Based on the CASE plots at both chemotype resolution levels, chemotypes in region IV comprise the highest proportion of active and selective molecules, and all of them can be classified as 1,2-diarylheterocyclic or 1,2-diarylcarbocyclic systems. Interestingly, when both chemotype resolutions are compared, the cyclic systems E5CGF, 6CEKT, USKFM and PP97T are grouped under a common parent cyclic system skeleton, C2NYM.
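The two resolution levels form a simple hierarchy in which each cyclic system maps to one parent cyclic system skeleton. A minimal sketch of rolling region IV cyclic systems up to their parents, using only the parent-child relations stated in the text (the four C2NYM children listed in the preceding paragraph, plus QZ3TX under JK9S5 and L2U5P under RDJ81 from the DAD map analysis); the mapping and function names are hypothetical, and in practice the parent comes from the chemotype classification rather than a hand-written table.

```python
from collections import defaultdict
from typing import Dict, List

# Child cyclic system -> parent cyclic system skeleton, as stated in the text.
PARENT_SKELETON: Dict[str, str] = {
    "PP97T": "C2NYM",
    "6CEKT": "C2NYM",
    "E5CGF": "C2NYM",
    "USKFM": "C2NYM",
    "QZ3TX": "JK9S5",
    "L2U5P": "RDJ81",
}


def children_by_skeleton(parent_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a child -> parent mapping into parent -> sorted list of children."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        groups[parent].append(child)
    return {parent: sorted(children) for parent, children in groups.items()}


print(children_by_skeleton(PARENT_SKELETON)["C2NYM"])
# ['6CEKT', 'E5CGF', 'PP97T', 'USKFM']
```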
The information extracted from CASE plots agrees with previous reports in which 1,2-diarylheterocyclic and 1,2-diarylcarbocyclic systems were described as potent and selective COX-2 inhibitors (14,15). Interestingly, some FDA-approved anti-inflammatory drugs in clinical use belong to the most important chemotypes: celecoxib (cyclic system PP97T/cyclic system skeleton C2NYM) for human use and deracoxib (PP97T/C2NYM) for veterinary use. Other important selective COX-2 inhibitors that have recently been withdrawn from the market, namely rofecoxib, valdecoxib and parecoxib, can be classified as chemotype C2NYM, whereas etoricoxib is classified as RDJ81.
In addition, the analysis of chemotypes in region III of the CASE plots could be useful when chemotypes comprising molecules with strong activity against both targets are desired, for example R79ZD (2,3-dihydro-1-benzofuran system).
Some low-frequency chemotypes are also of interest because they may show high AEF and SEF values. However, these chemotypes can be misleading at first glance. For example, Figure 5B shows a substantial number of chemotypes located in region IV for both cyclic systems and cyclic system skeletons, but most of them are represented by only one or two molecules in the current database, which is too little information to support the activity and selectivity potential of these particular chemotypes.
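The caution about singletons can be made concrete under an assumed enrichment-factor form, EF = (a/n)/(A/N), where a of the n compounds in a chemotype and A of the N compounds in the database meet the activity (or selectivity) criterion. This form is an assumption for illustration; the definitions actually used are given in the Methods. For a chemotype with a single compound, the ratio can only take two values:

```latex
\mathrm{EF} = \frac{a/n}{A/N}, \qquad n = 1 \;\Rightarrow\; \mathrm{EF} \in \left\{ 0,\ \frac{N}{A} \right\}
```

With a hypothetical 35% of active compounds in the database (N/A ≈ 2.86), one active singleton would score above PP97T (AEF = 1.18, 82 compounds), even though its value rests on a single measurement.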
Although region IV of the CASE plots provides interesting and valuable SAR information about the scaffolds, the chemotype analysis by itself does not capture the influence of the side chains. This is addressed in the next section.
Chemotype distribution in DAD maps
Chemotypes identified in region IV of the CASE plots are especially valuable for SAR characterization. Figure 6 shows DAD maps for the six most frequent, active and selective cyclic systems (PP97T, 6CEKT, L2U5P, QZ3TX, E5CGF and USKFM) and the three cyclic system skeletons (C2NYM, RDJ81 and JK9S5). Of note, pairwise comparisons in this analysis were restricted to compounds sharing the same chemotype.
Figure 6. Dual activity difference maps for chemotypes located in region IV of CASE plots. Each point represents a pairwise comparison in which both molecules share the same chemotype. Data points are color-coded by chemotype: PP97T (gray), 6CEKT (brown), E5CGF (black), USKFM (pink), L2U5P (purple), QZ3TX (blue), C2NYM (red), RDJ81 (green) and JK9S5 (orange). For clarity, each chemotype is depicted in an individual panel labeled with its code and pairwise frequency.
The results show that, even though all selected chemotypes are structurally related, their distributions in the DAD maps can differ. This is valuable information because different chemotypes can be associated with different activity and selectivity patterns. The number of data pairs located in regions Z1–Z5 for the selected chemotypes is shown in Table 2. At the cyclic system resolution, chemotype PP97T is widely distributed among Z1 and Z3–Z5, each with a frequency > 20%. This result suggests that PP97T has a diverse SAR, in which structural changes lead to changes in activity against COX-1 only (Z3 = 21.3%) or COX-2 only (Z4 = 21.6%), and hence in selectivity, as well as to parallel changes against both targets (Z1 = 24.4%). Interestingly, PP97T has the lowest proportion of pairs in Z5 (26.5%) compared with 6CEKT, E5CGF, USKFM, L2U5P and QZ3TX (47.1–67.7%); therefore, changes in the side chains of this chemotype lead to large changes in activity or selectivity in most cases. Chemotypes 6CEKT, USKFM and QZ3TX are frequent in regions Z3 and Z4 (Z3 = 16.5–29.2%; Z4 = 19.8–22.2%); thus, structural changes in these chemotypes are strongly associated with changes in selectivity. These chemotypes are also frequent in Z5 (47.1–58.2%), showing a large number of pairs with no changes in activity. In addition, they have low frequencies in Z1 (2.5–9.7%); therefore, few pairs are associated with parallel changes in activity. Chemotype E5CGF is frequent in Z3 (39%); thus, most of the structural changes that alter activity do so against COX-1. This chemotype is also frequent in Z5 (59%). Finally, chemotype L2U5P is relatively frequent in Z4 (15.4%); therefore, structural changes in molecules with this chemotype are mainly related to changes in activity against COX-2. Pairs of this chemotype are also frequent in Z5 (67.7%). At the cyclic system skeleton level, C2NYM is widely distributed among Z1 and Z3–Z5, the same behavior as its child cyclic system PP97T.
The same tendency was observed for JK9S5 (distributed among Z3–Z5) and RDJ81 (distributed among Z1, Z4 and Z5) relative to their child cyclic systems QZ3TX and L2U5P, respectively.
Table 2. Pairwise activity differences distribution in regions Z1–Z5 of DAD maps for each selected chemotype
| Chemotypes | Total (Z1–Z5)ᵃ | Z1ᵇ | Z1 (%)ᶜ | Z2 | Z2 (%) | Z3 | Z3 (%) | Z4 | Z4 (%) | Z5 | Z5 (%) |
| Cyclic systems |
| Cyclic system skeletons |
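The zone percentages discussed above come from classifying every within-chemotype pair by its activity differences against the two targets. The sketch below assumes activities expressed as pIC50 computed from IC50 values in nM, a threshold of 1 log unit for a "large" difference, and zone boundaries consistent with the descriptions in this section (Z1 parallel changes, Z2 opposite changes, Z3 COX-1 only, Z4 COX-2 only, Z5 no large change); the threshold actually used is given in the Methods. All function names and the three toy compounds are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Tuple

ZONES = ("Z1", "Z2", "Z3", "Z4", "Z5")


def pic50(ic50_nM: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); the input is given in nM."""
    return 9.0 - math.log10(ic50_nM)


def dad_zone(d_cox1: float, d_cox2: float, t: float = 1.0) -> str:
    """Assign a pair to a DAD zone from its activity differences (ASSUMED threshold t)."""
    big1, big2 = abs(d_cox1) > t, abs(d_cox2) > t
    if big1 and big2:
        # Both targets change: same direction (Z1) or opposite directions (Z2, inverse SAR).
        return "Z1" if d_cox1 * d_cox2 > 0 else "Z2"
    if big1:
        return "Z3"  # large change against COX-1 only
    if big2:
        return "Z4"  # large change against COX-2 only
    return "Z5"      # similar activity against both targets


def zone_distribution(compounds: Dict[str, Tuple[float, float]],
                      t: float = 1.0) -> Tuple[Dict[str, float], int]:
    """Percentage of pairs per zone for compounds that all share ONE chemotype.

    compounds: compound_id -> (IC50 COX-1 in nM, IC50 COX-2 in nM).
    Returns (percentage of pairs by zone, number of pairs).
    """
    counts = Counter()
    for a, b in combinations(sorted(compounds), 2):
        d1 = pic50(compounds[a][0]) - pic50(compounds[b][0])
        d2 = pic50(compounds[a][1]) - pic50(compounds[b][1])
        counts[dad_zone(d1, d2, t)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least two compounds")
    return {z: 100.0 * counts[z] / total for z in ZONES}, total


# Hypothetical compounds of a single chemotype (not from the study).
toy = {"x": (1000.0, 10.0), "y": (1000.0, 1000.0), "z": (10.0, 10.0)}
dist, n_pairs = zone_distribution(toy)
```

Feeding one chemotype at a time into `zone_distribution` respects the restriction, noted earlier, that only compounds sharing the same chemotype are compared.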
Structure–activity relationships with DAD maps
Some important structural modifications that lead to activity and selectivity changes for selected chemotypes can be analyzed using DAD maps. Figure 7 shows a DAD map with examples of pairs of molecules for chemotype PP97T in different regions. Also, panels Z1–Z5 depict the chemical structures and activities of selected pairs for both cyclooxygenases in each region of the DAD map.
Figure 7. Dual activity difference (DAD) map comprising 3321 pairwise comparisons for the 82 compounds with cyclic system PP97T. Selected pairs are labeled with the compound codes. Chemical structures of selected pairs are depicted in panels Z1–Z4 (named after the zone in which each pair is located), and the IC50 values (nM) for COX-1 and COX-2 are shown below each structure.
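The number of pairwise comparisons in the Figure 7 caption is the number of unordered pairs that can be formed from the 82 PP97T compounds:

```latex
\binom{82}{2} = \frac{82 \times 81}{2} = 3321
```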
Compound pairs 104_144 and 104_128 are examples of pairs in zone Z1, where structural changes are associated with similar changes in activity against both cyclooxygenases. These pairs suggest that a trifluoromethyl group at position 3 combined with chlorine or fluorine at position 4 increases activity against both targets. Z2 is a very attractive zone because pairs located in this region are associated with inverse SAR, that is, opposite changes in activity against the two targets. Some examples in Z2 are pairs 146_147, 114_147, 113_150 and 113_128. In these examples, replacing the sulfonamide with a chlorine or methoxy group shifts selectivity toward COX-1, which shows that the sulfonamide substituent is very important for COX-2 selectivity. Additionally, the examples in regions Z1 and Z2 suggest that hydrogen at position 4 (see compounds 146 and 114) favors COX-2 selectivity in 3-(trifluoromethyl)- and 3-(difluoromethyl)pyrazole derivatives compared with halogen-substituted molecules (e.g., 144 and 128). Pairs in region Z3 are associated with changes in activity against COX-1 and little or no change against COX-2. Examples are pairs 94_113 and 101_113, where the absence of the sulfonamide strongly increases activity against COX-1 but has little effect on COX-2. Also, pair 141_108 suggests that changing the position of the phenylsulfonamide group has a marked impact, leading to a loss of selectivity. Pairs in region Z4 are associated with changes in activity against COX-2 and little or no change against COX-1. Data points 110_480 and 133_480 are examples of pairs in Z4. These pairs suggest that the acetylaminomethyl substituent leads to a loss of COX-2 activity and selectivity. This observation is also supported by comparing the well-known selective COX-2 inhibitor celecoxib with the non-selective compound 480, which differ only in the acetylaminomethyl substituent.
These same Z4 pairs suggest that some changes in the phenyl ring at position 5 are tolerated without loss of selectivity. This observation also holds for pairs 110_124 and 114_124, located in Z5. Additional examples in Z4 are 119_134 and 119_120, where replacing the amino group at position 4 of the pyrazole ring with a halogen such as chlorine or bromine favors COX-2 selectivity. Other substituents at position 4 of the pyrazole ring, such as fluorine or methylsulfonyl, reduce activity and selectivity (pair 97_104, located in Z5). Interestingly, pyrazole derivatives substituted at position 3 with trifluoromethyl or difluoromethyl (e.g., 146 and 114) are highly selective, as are pyrazole derivatives substituted with bromine or chlorine at position 4 (e.g., 134 and 120); however, derivatives combining both substitution patterns show low selectivity (e.g., 144). It is worth mentioning that the SAR discussed in this work is highly dependent on the current database; hence, additional observations could arise with other databases screened against the same targets.