Multivariate analysis of cytokine profiles in pregnancy complications

Problem: The immunoregulation needed to tolerate the semiallogeneic fetus during pregnancy includes a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancy as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also of interest to compare groups using multi-cytokine data sets and multivariate analysis. Such analysis can further examine how great the differences are, and which cytokines differ more than others.
Methods: Various multivariate statistical tools, such as the Cramer test, classification and regression trees, partial least squares regression figures, the 2-dimensional Kolmogorov-Smirnov test, principal component analysis, and the gap statistic, were used to compare cytokine data of normal vs anomalous groups of different pregnancy complications.
Results: Multivariate analysis assisted in examining whether the groups were different, how strongly they differed, and in what ways they differed, and further provided evidence for subgroups in 1 group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication.
Conclusion: This work contributes to a better understanding of cytokine interactions and may have important implications for targeting cytokine balance modulation or for the design of future medications or interventions that best direct management or prevention from an immunological approach.

reactions and are responsible for cell-mediated inflammatory reactions, delayed-type hypersensitivity (DTH) and tissue injury in infectious and autoimmune diseases. On the other hand, Th2 cells secrete IL-4, IL-5, IL-6, IL-9, IL-10, and IL-13 and are associated with help for antibody production by B cells.
It has been proposed that successful pregnancy in mice is a Th2 phenomenon and that abnormally elevated concentrations of Th1-type cytokines are associated with spontaneous miscarriage in mice and humans. 8,9 It appears therefore that cytokines may have positive and negative effects on pregnancy depending on the types and levels of cytokines secreted.
Given this scenario, we and others have undertaken studies with the objective of elucidating the possible roles of cytokines in human pregnancy and ascertaining whether there are differences in cytokine profiles in normal human pregnancy as compared to unexplained pregnancy complications. We analysed supernatants of mitogen-stimulated peripheral blood lymphocyte cultures for a selected panel of Th1 and Th2 cytokines. Our data, as well as those of others, support the hypothesis that normal successful pregnancy is a Th2 phenomenon, while several unexplained pregnancy complications such as recurrent spontaneous miscarriage (RSM), premature rupture of membranes (PROM), pregnancy-induced hypertension (PIH), and preterm delivery (PTD) are associated with an elevated Th1-type cytokine profile. [10][11][12][13][14][15][16][17][18] However, the majority of the studies have compared individual levels or ratios of a small number of cytokines. Given that cytokines form a network of interacting entities and that a single cytokine or a ratio of two may not provide sufficient information about the overall immune reactivity, it is of interest to study the combined levels of several cytokines, as this may provide a better picture of immune reactivity. Moreover, multivariate cytokine profile analysis may also suggest cytokine importance, that is, a mathematical measure of the contribution of individual cytokines in separating the test group from its comparable/matching healthy control. This will shed light on the extent to which cytokine profiles are related to (or can predict) different pregnancy conditions: if the cytokine levels can predict the pregnancy conditions with high accuracy, this would suggest that the cytokines are an important element of the disease process. If instead the cytokine levels only poorly predict pregnancy conditions, this would suggest that factors other than the cytokines being studied are contributing to a greater extent.
This may also have important implications for targeting cytokine balance modulation or for the design of future medications or interventions that best direct management or prevention from an immunological approach.
Looking at pregnancy complication data from another angle, it does not appear that earlier work has focused strongly on how different these groups are. Rather, many previous studies have focused on whether statistically significant differences between normal and complication groups could be found by, for example, comparing the mean cytokine concentration values of each group. However, statistical significance, in the form of a small P-value, is not the whole story; it is also important to know whether the values are really very different, or only slightly different, despite having small P-values. Indeed, we should consider the practical significance, not just the statistical significance, of the difference in cytokine values between groups. Practical, or clinical, significance can be expected to be related to the actual size of the difference; if the difference is small (but statistically significant), this suggests little practical or clinical significance or benefit. On the other hand, if the difference is large, then it suggests practical significance, such as the ability of a drug to alleviate symptoms.
Another question, which has received little if any attention in the analysis of cytokine data for pregnancy complications, is the presence of subgroups in the data. It may be helpful to investigate whether there are subgroups within the same pregnancy complication. Because the cause of the pregnancy complications studied was unknown (cases with known cause were excluded), patient groups may be made up of different subgroups within a certain complication. For example, cytokine levels may display a large difference from the normal pattern in 1 subgroup, while there may also be another subgroup of women in the same group presenting the same complication, but due to other unknown reasons where cytokines are contributors but not necessarily the major players.
Keeping the above points in mind, we aimed to use the statistical approaches to study and quantify the connection between cytokine

TABLE 1 Groups of women studied along with the number of patients (n), clinical history, and mean gestational age

Lastly, we also investigate whether there are subgroups within each complicated pregnancy condition.

| Subjects
The groups of subjects studied are as detailed elsewhere. 10

| Mitogen-induced activation of PBMC
Peripheral blood from subjects in the groups of Normal Delivery, RSM, PTD, PIH, and PROM was stimulated with mitogen.

| ELISA for cytokines
Cytokine levels were determined by sandwich ELISA using kits ob-

| Statistical analysis
All zero/undetectable cytokine concentration values were replaced with the minimum detectable (ie, sensitivity) value, as is standard practice. The small fraction of missing cytokine values was replaced with the median values of the samples (a conservative technique that is not biased towards specific characteristics of the data). All concentrations were log-transformed because a log scale is a more natural scale on which to study cytokine concentrations; log-transformation is also standard in the literature. After this, the data were centered (by subtracting the mean) and scaled (by dividing by the standard deviation).
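The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the function name `preprocess_cytokines`, the use of NaN to mark missing values, the choice of a per-cytokine median, the base-10 logarithm, and the population standard deviation are all assumptions made for the example.

```python
import numpy as np


def preprocess_cytokines(raw, sensitivity):
    """Illustrative preprocessing of a (samples x cytokines) concentration table.

    raw         : 2-D array-like, NaN marks a missing measurement (assumption).
    sensitivity : 1-D array-like, minimum detectable value per cytokine.
    """
    x = np.array(raw, dtype=float)
    sens = np.broadcast_to(np.asarray(sensitivity, dtype=float), x.shape)

    # 1. Zero/undetectable values are set to the assay sensitivity.
    undetectable = x <= 0  # NaN compares as False, so missing values are untouched
    x[undetectable] = sens[undetectable]

    # 2. Missing values are set to the median of that cytokine (assumed per column).
    medians = np.nanmedian(x, axis=0)
    missing = np.isnan(x)
    x[missing] = np.broadcast_to(medians, x.shape)[missing]

    # 3. Log-transform (the base is an assumption; the text does not state it).
    x = np.log10(x)

    # 4. Center and scale each cytokine to mean 0 and standard deviation 1.
    return (x - x.mean(axis=0)) / x.std(axis=0)
```

For example, `preprocess_cytokines([[0, 10], [100, float("nan")], [10, 1000]], [1, 2])` returns a 3 x 2 array with no missing entries in which each column has mean 0 and unit standard deviation.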
Some of the cytokine values (log transformed and scaled) deviated strongly from normal distributions; hence, we employed nonparametric statistical tests to measure the differences in groups (see below). A P-value of <.05 was considered statistically significant in this study. Data and statistical analysis were performed with Scikit-learn 19 using the iPython interface. 20 We compared the multivariate cytokine data between groups. To do so, the main tools used were as follows:

| Multivariate comparison of the cytokine pattern between groups
The multivariate Cramer Test detects more complex differences in cytokine patterns than differences in single cytokines or simple ratios between groups. As shown in Table 2, cytokine patterns in pregnancy complications were statistically significantly different from those of the respective control groups of matching gestational age. This indicates statistically significant differences in cytokine profiles between healthy and complicated pregnancy groups of RSM, PTD, PIH, and PROM (P = .0001, .023, .0097, and .009, respectively) (Table 2).
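To make the idea of a multivariate two-sample comparison concrete, the following is a minimal sketch of a Cramer-type test based on Euclidean distances between samples. It is not the implementation used in the study: the function name `cramer_test`, the use of a permutation scheme to obtain the P-value, and the default of 999 permutations are assumptions made for illustration.

```python
import numpy as np


def cramer_test(x, y, n_perm=999, seed=0):
    """Cramer-type two-sample test with a permutation P-value (illustrative).

    x, y : arrays of shape (m, d) and (n, d), one row per subject.
    Returns (statistic, p_value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    pooled = np.vstack([x, y])

    # Pairwise Euclidean distances between all pooled samples, computed once.
    diff = pooled[:, None, :] - pooled[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    def statistic(idx_x, idx_y):
        # Mean between-group distance minus half of each mean within-group distance.
        between = dist[np.ix_(idx_x, idx_y)].mean()
        within_x = dist[np.ix_(idx_x, idx_x)].mean()
        within_y = dist[np.ix_(idx_y, idx_y)].mean()
        return (m * n / (m + n)) * (between - 0.5 * within_x - 0.5 * within_y)

    indices = np.arange(m + n)
    observed = statistic(indices[:m], indices[m:])

    # Permutation null: reshuffle group labels and recompute the statistic.
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(m + n)
        if statistic(perm[:m], perm[m:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the statistic uses all cytokines jointly through the distances between subjects, it can respond to differences in the joint pattern that would not show up in any single cytokine or ratio. With a permutation scheme, the smallest attainable P-value is 1/(n_perm + 1).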

| Evaluating how different the groups are, using various methods
To understand better how strongly any 2 groups differ from each other, and in what way they differ, we plotted the PLSR projections (Figure 1), reported the 2-dimensional Kolmogorov-Smirnov distance, and calculated the classification accuracy as compared to the ROC AUC (Table 2).
The results of all these tests for all groups can be summarized as follows. Among the tested groups, 1st Tri and RSM are the most separated, which we infer from the obvious separation between the groups in the PLSR figure (Figure 1A), the large K-S distance (0.92), and a high ROC AUC (0.88) (Table 2).
2nd Tri is also different from PTD and appears more clustered on the edge of the PTD projection (Figure 1B). Visually, the 2 groups are quite different, but also have some points with overlap. Further, the K-S distance of 0.58 and ROC AUC value of 0.78 suggest that the 2 groups are different, but not extremely distinct, and certainly less separated than the 1st Tri and RSM groups (Table 2).
Although the ROC AUC between ND and PIH is somewhat high  Figure 1C). Further, the K-S distance is the lowest among all comparisons (0.53), indicating that the groups are somewhat different, but not strongly distinct (Table 2).
Finally, comparing ND to PROM, Figure 1D shows quite a strong overlap between the groups, suggesting that the cytokine profiles are quite similar. The K-S distance is quite low (0.57), and the ROC AUC value also points to the groups being only modestly different (0.68) (Table 2).
Thus we see that while all pairs of groups are statistically significantly different, with small P-values in the multivariate Cramer test, they vary considerably both in how strongly and in which ways they differ.
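For readers unfamiliar with a 2-dimensional K-S distance, the sketch below follows the quadrant-counting construction commonly attributed to Fasano and Franceschini. Whether the study used this variant, and which 2 dimensions were supplied (for example, the 2 PLSR scores), is not stated in the text, so both are assumptions, as is the function name `ks2d_distance`.

```python
import numpy as np


def _quadrant_fractions(points, origin):
    """Fraction of points in each of the 4 quadrants around `origin`."""
    right = points[:, 0] > origin[0]
    up = points[:, 1] > origin[1]
    return np.array([
        np.mean(right & up),
        np.mean(~right & up),
        np.mean(~right & ~up),
        np.mean(right & ~up),
    ])


def _max_quadrant_difference(origins, a, b):
    """Largest difference in quadrant fractions between a and b over all origins."""
    return max(
        np.abs(_quadrant_fractions(a, o) - _quadrant_fractions(b, o)).max()
        for o in origins
    )


def ks2d_distance(a, b):
    """Two-sample 2-D K-S distance, averaged over origins taken from each sample."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d_a = _max_quadrant_difference(a, a, b)  # origins at the points of sample a
    d_b = _max_quadrant_difference(b, a, b)  # origins at the points of sample b
    return 0.5 * (d_a + d_b)
```

The distance lies between 0 and 1: identical samples give 0, and two samples that occupy completely different regions of the plane give values close to 1, which makes it a direct measure of how far apart two groups are rather than only of whether they differ.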

| Investigating subgroups within complication groups
We further investigated whether there exist subgroups within each of the pregnancy complications (RSM, PTD, PIH, and PROM). The reason for this investigation is that subgroups may imply different causes for the complication.
We were not able to find clear evidence of subgroups within RSM, PTD, or PROM; the clearest evidence of clusters within a pregnancy complication was in the PIH group. To search for subgroups, we first examined PCA plots of all the groups. Figure 2 shows the PCA projection of the PIH data onto the first two principal components of the ND data. Visual examination of the PCA plot in Figure 2 suggests that the PIH group is made up of 2 subgroups, which we have shown as crosses and triangles. We will call the PIH subgroup on the "edge" of the PCA projection PIH-out, and the other subgroup PIH-in. Using the Gap Statistic, we also found that the PIH group has 2 subgroups/clusters. Further, employing the K-means clustering method (with K = 2) to assign each sample to 1 of 2 subgroups, we find an exact agreement between the subgroups visually apparent in the PCA plot and the subgroups found by the gap statistic with K-means.
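The subgroup search described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's code: the helper names `project_onto_reference_pcs` and `gap_statistic` are hypothetical, the reference data for the gap statistic are drawn uniformly over the bounding box of the data (the simpler of the reference distributions proposed for this method), and the number of reference sets is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def project_onto_reference_pcs(reference, other, n_components=2):
    """Fit PCA on the reference group (e.g. ND) and project both groups onto it."""
    pca = PCA(n_components=n_components).fit(reference)
    return pca.transform(reference), pca.transform(other)


def _log_within_dispersion(x, k, seed):
    """log of the K-means within-cluster sum of squares for k clusters."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(x)
    return np.log(km.inertia_)


def gap_statistic(x, k_max=4, n_ref=20, seed=0):
    """Return (chosen k, list of gap values for k = 1..k_max)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    low, high = x.min(axis=0), x.max(axis=0)
    gaps, errors = [], []
    for k in range(1, k_max + 1):
        # Dispersion expected when there is no cluster structure (uniform reference).
        ref = [
            _log_within_dispersion(rng.uniform(low, high, size=x.shape), k, seed)
            for _ in range(n_ref)
        ]
        gaps.append(float(np.mean(ref) - _log_within_dispersion(x, k, seed)))
        errors.append(float(np.std(ref) * np.sqrt(1.0 + 1.0 / n_ref)))
    # Smallest k whose gap is within one standard error of the gap at k + 1.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errors[k]:
            return k, gaps
    return k_max, gaps
```

Once the gap statistic suggests a number of clusters, `KMeans(n_clusters=k).fit_predict(x)` assigns each sample to a subgroup, and these labels can then be compared with the grouping seen in the PCA plot.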
Having found 2 subgroups, it is interesting to analyze them further.  Figure 3A), while the remaining 9 form a subgroup which has cytokine profiles that are quite different to ND (denoted as PIH-out, Figure 3B).
FIGURE 2 A PCA plot of ND data (yellow dots), with the PIH data (blue crosses and triangles) projected onto the same axes. The crosses and the triangles denote the 2 subgroups found by visual inspection, the gap statistic, and k-means clustering

Table 3 further depicts the analysis of the 2 subgroups as compared to the ND group. As expected, the multivariate Cramer Test P-value, the K-S distance, and the classification accuracy all indicate a far greater difference for ND vs PIH-out than for ND vs PIH-in. In other words, the subgroup PIH-in is very similar to ND, but not exactly identical in distribution, while the subgroup PIH-out is very different from ND.
In summary, we found that the PIH group is actually made up of 2 subgroups/clusters, one having a multivariate cytokine profile very similar to ND, while the other subgroup is quite different from ND. This suggests that among the patients in our PIH group, a subgroup (PIH-out) had PIH consequences which were reflected by cytokine imbalances as compared to ND, while this was not the case for the PIH-in subgroup of patients, suggesting that their presentation is not mediated or reflected by cytokine imbalances.
It is also worth noting that the multivariate Cramer Test, which detects differences in cytokine patterns, did not find a significant difference between ND and the PIH-in group (P = .087), while, as expected, the difference between ND and PIH-out was extremely significant (P = .0000001). The largest cytokine importance/contribution values in the PIH-out group were for IL-5 (0.89) and IFNγ (0.10) (Table 3). The large K-S distance (1.0) and large ROC AUC value of 0.91 corroborate the same finding (Table 3).

| DISCUSSION
Cytokines are known to work in a complex hierarchical network and most of them show pleiotropic, redundant, and synergistic actions, making a full understanding of the balance very challenging. While several reports have suggested the use of cytokines as potential biomarkers, a single biomarker may be insufficient; thus it was suggested that it would be more appropriate to use ratios of 2 cytokines or to develop a multivariate "cytokine signature" based on the pattern of several cytokines produced by peripheral blood mononuclear cells. [25][26][27] We compared multivariate cytokine profiles of several pregnancy complications to gestationally age-matched control groups, using several statistical techniques. It was previously established that the groups are different in terms of individual cytokine levels and simple ratios, but we sought to investigate these differences further. Studying all tested cytokines, our main aim was to see how different the panels are, and in which ways they differ, in pathological pregnancies as compared to normal controls.
We, therefore, analyzed our data with a variety of multivariate statistical techniques that can detect more complex differences in patterns than simple differences in individual levels or ratios. It is useful to employ such multivariate methods because it is in principle possible that none of early third trimesters is an important risk factor for PTD in asymptomatic women. 36 Similarly, increased production of the pro-inflammatory cytokines IL-1, TNFα, and IL-6 by placental cells and by amniotic and chorionic decidual tissues in PTD has been reported. 11,37 It is also interesting and novel that the analysis of our PIH group showed that it is made up of 2 clusters, one with a multivariate cytokine profile that is similar to healthy controls, while the other is quite distinct. This method of analysis and the findings may help in explaining the long-standing and wide-ranging controversy in the literature about the association of different cytokines with different pregnancy complications. Taking PIH as an example, while there is substantial evidence supporting a role of cytokines in the pathogenesis of PIH, the underlying pathophysiologic mechanisms are still unclear, with several proposed pathways. 35 It is possible that among immune causes, different immune mechanisms operate at different interfaces during the different stages of pre-eclampsia, where the final stage in all cases would be placental damage and the manifestations of PIH. 35,38 Other factors, such as HLA (Human Leukocyte Antigen) expression by the trophoblast, secreted trophoblast-derived factors, cytokine genotyping polymorphism and others, all contribute to the complexity and warrant further consideration. 35,[38][39][40] To the best of our knowledge, this is the first study to report simultaneous measurement of multiple cytokines from several differ-