Introduction
Carcinoma of the endometrium is the most common female pelvic malignancy^{1}. Initial preoperative evaluation of patients suspected of having a carcinoma of the endometrium includes transvaginal sonography with or without color Doppler imaging and endometrial biopsy.
The distinction between FIGO surgical Stages Ib and Ic^{2} endometrial carcinoma (assessed postoperatively) is determined by the degree of myometrial invasion (less than 50% in Stage Ib, more than 50% in Stage Ic)^{3}. This is an important prognostic factor^{4} and in many institutions it determines the treatment protocol. Accurate preoperative distinction between patients with Stage Ia or Ib carcinoma and those with Stage Ic or higher would allow identification of high-risk patients who might need pelvic lymphadenectomy. This matters because, in many countries, patients who need lymphadenectomy are referred to a gynecological oncologist, whereas those who do not are operated on by a general gynecologist or surgeon.
Several techniques are used to estimate the depth of myometrial invasion, but all have specific limitations. Intraoperative gross visual inspection and frozen section do not allow preoperative planning of the surgical procedure. Franchi et al.^{5} reported an accuracy of 85.3% in predicting the degree of myometrial invasion in a series of 403 patients using intraoperative gross visual inspection, whereas Kucera et al.^{6} reported an accuracy of 88% using frozen section in a combined set of 624 patients. Contrast-enhanced magnetic resonance imaging (MRI) is the most reliable method. In a meta-analysis, Kinkel et al.^{7} reported an area under the receiver–operating characteristics (ROC) curve (AUC) of 91% for the prediction of myometrial invasion. However, MRI is costly, has limited availability and is not appropriate for all patients (e.g. those with claustrophobia, obesity or contrast allergies). Different groups^{8–18} have studied the value of transvaginal sonography and color Doppler imaging using different morphological or color Doppler parameters, with considerable variation in the results. Arko and Takac^{19} published one of the largest series investigating the use of transvaginal sonography to estimate the depth of myometrial invasion (120 patients), reporting an accuracy of 73% in predicting myometrial invasion.
In our study of patients with endometrial carcinoma, we analyzed ultrasound measurements obtained from transvaginal sonography with color Doppler imaging, together with histopathological data obtained from preoperative endometrial biopsy (Pipelle^{®} de Cornier). We then explored whether they contributed to the prediction of myometrial invasion as assessed postoperatively by the final histopathological examination (gold standard). Moreover, we aimed to construct models to predict the presence of deep myometrial invasion, which could help the clinician to identify preoperatively patients who might need more extensive surgery.
Methods
We first collected data from 97 consecutive patients with endometrial carcinoma who underwent sonography between September 1994 and February 2000 by a single operator (D.T.)^{20}. Here we refer to these patients as the ‘training set’. Their mean age was 65.9 (range, 45–83) years, and 88 of the women were postmenopausal. The distribution of the surgical FIGO stages was as follows: 24 Stage Ia, 35 Stage Ib, 12 Stage Ic, eight Stage II, 13 Stage III and five Stage IV. The histopathological subtypes were: 76 endometrioid adenocarcinoma, three serous papillary and 18 mixed type (five of which had a clear cell and three a serous papillary component). Fifty-four tumors were highly differentiated, 18 moderately and 25 poorly. Tumors with a serous papillary or a clear cell component were considered to be poorly differentiated.
All patients gave informed consent and underwent a preoperative ultrasound examination with transvaginal sonography and color Doppler imaging in the Department of Obstetrics and Gynecology (University Hospitals Leuven) using the same protocol. The uterus was assessed in both sagittal and coronal planes with an Acuson Sequoia (SiemensAcuson Inc., Mountain View, CA, USA) ultrasound system, equipped with highly sensitive color Doppler imaging capability and a MultiHertz intravaginal probe with a field of view of 140°. The color Doppler imaging examination always included measurements of flow indices from both uterine arteries and subendometrial blood vessels. High-quality transparent color copies (Agfa Drystar, Agfa Gevaert, Mortsel, Belgium) and schematic handmade drawings of the sonographic findings were obtained for every patient.
Histopathology was assessed preoperatively by endometrial biopsy using a Pipelle de Cornier, which has been shown to accurately reflect histopathological parameters^{21–23}. The patients were divided into two groups according to the final histopathological examination of the hysterectomy specimen: those with surgical Stages Ia or Ib and those with surgical Stages Ic or higher.
Several morphological parameters visualized by grayscale transvaginal sonography were available for univariate analysis: endometrial (ET) and myometrial (MT) thickness; endometrial (EV) and uterine (UV) volume; ET/uterine anteroposterior diameter (AP); EV/UV; MT/AP; endometrial echogenicity (EE: homogeneous or heterogeneous); and endometrial lining (EL: regular or irregular). EV and UV (expressed in mL) were derived from three diameters of the endometrium or the uterus measured in two perpendicular planes, using the formula for a prolate ellipsoid: π/6 × D1 × D2 × D3 (where D1, D2 and D3 are the three diameters of the structure). Blood flow indices obtained using color and spectral Doppler ultrasound included intratumoral peak systolic velocity (PSV), time-averaged maximum mean velocity (TAMXV), resistance index (RI) and pulsatility index (PI). Furthermore, uterine artery PSV, TAMXV (maximum of the values measured at both the left and right uterine arteries, i.e. the worst case), RI and PI (minimum of the values measured at both the left and right uterine arteries) were measured. The subjective assessment by the gynecologist of the depth of myometrial invasion (using a four-value scoring system: 0 = Stage Ia; 1 = Stage Ib; 2 = Stage Ic; 3 = Stage II or higher) was also recorded. The gynecologist was not blinded to the histological results and tumor grading, but he based his assessment mainly on the volume of the tumor and the myometrium remaining between tumor and serosa.
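As an illustration of the volume formula above, the following minimal Python sketch computes a prolate ellipsoid volume and the EV/UV ratio. The function names are hypothetical, and the diameters are assumed to be given in centimeters (so that the result is in mL); the text does not state the unit in which the diameters were recorded.

```python
import math


def ellipsoid_volume_ml(d1_cm, d2_cm, d3_cm):
    """Prolate ellipsoid volume, pi/6 * D1 * D2 * D3.

    Assumption: diameters are in cm, so the result is in cm^3 (= mL).
    """
    return math.pi / 6.0 * d1_cm * d2_cm * d3_cm


def volume_ratio(ev_ml, uv_ml):
    """EV/UV ratio; both volumes must use the same unit."""
    if uv_ml <= 0:
        raise ValueError("uterine volume must be positive")
    return ev_ml / uv_ml


# Hypothetical measurements, for illustration only (not patient data).
ev = ellipsoid_volume_ml(3.0, 2.0, 2.5)
uv = ellipsoid_volume_ml(8.0, 5.0, 4.5)
print(round(ev, 2), round(uv, 2), round(volume_ratio(ev, uv), 3))
```

Because the factor π/6 cancels in the ratio, EV/UV depends only on the products of the three diameters.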
Univariate analysis
Univariate analysis was performed using the SAS software package (Release 8.01; SAS Institute Inc., Cary, NC, USA). We used the Wilcoxon rank-sum test (for continuous data) and Fisher's exact test (for categorical data) to calculate P-values that reflected whether there was a significant difference for a certain variable between patients with surgical Stages Ia or Ib and patients with surgical Stages Ic or higher^{24}. In addition, the ROC curves and the AUCs were estimated^{25} and compared^{26} for the individual parameters using custom scripts written in MATLAB (Version 6.5 Release 13; The Mathworks, Inc., Natick, MA, USA; see also Epstein et al.^{27}, in which the same scripts were applied). The optimal cutoff point on the ROC curve was defined as the point that gave the best tradeoff between sensitivity and specificity (the point at which the tangent to the ROC curve had a slope of 1, which can be shown to maximize the sum of sensitivity and specificity). The resulting sensitivity and specificity values were also calculated. All hypothesis tests were two-sided, and P < 0.05 was used as the level of significance.
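The ROC analysis described above can be sketched in a few lines of Python. This is not the authors' MATLAB code: it is a minimal illustration that estimates the AUC as the Mann-Whitney statistic (which is closely related to the Wilcoxon rank-sum test) and searches observed values for the cutoff that maximizes sensitivity plus specificity. The function names and the toy scores are hypothetical, and the sketch assumes that larger values indicate deep invasion (for a parameter such as MT the direction would need to be reversed).

```python
def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the AUC: P(pos > neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def best_cutoff(scores_pos, scores_neg):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec.

    A case is called positive when its score is strictly above the cutoff.
    On ties in sens + spec, the lowest such cutoff is kept.
    """
    best = None
    for c in sorted(set(scores_pos) | set(scores_neg)):
        sens = sum(s > c for s in scores_pos) / len(scores_pos)
        spec = sum(s <= c for s in scores_neg) / len(scores_neg)
        if best is None or sens + spec > best[1] + best[2]:
            best = (c, sens, spec)
    return best


# Toy scores, for illustration only (not patient data).
deep = [5, 6, 7, 8]      # Stage Ic or higher
shallow = [1, 2, 3, 6]   # Stage Ia or Ib
print(auc(deep, shallow), best_cutoff(deep, shallow))
```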
Multivariate analysis
We trained three models (i.e. used the patients of the training set to determine the coefficients of a model in order to optimize its ability to differentiate between patients with and without deep myometrial invasion) based on a set of variables selected after stepwise logistic regression analysis. Subsequently, these models were validated prospectively on a new and independent set of patients. A schematic overview of the multivariate analysis procedure is given in Figure 1.
Variable selection
With multivariate stepwise logistic regression analysis (using stepwise selection in the LOGISTIC procedure from SAS) we aimed to select the variables that contributed significantly to a standard logistic regression model that predicted deep myometrial invasion. We considered the following variables for inclusion in the model: the ultrasound parameters discussed above, the number of fibroids detected during ultrasound examination (NF; range 0–2; this parameter has been reported to be a potential factor disturbing sonographic prediction, leading to overestimation of invasion^{28}), the degree of differentiation of the cancer, the presence of a clear cell component and the presence of a serous papillary component. Note that the latter three (histopathological) variables were assessed preoperatively by endometrial biopsy (using Pipelle de Cornier). In the model obtained at the end of the stepwise logistic regression analysis, only variables with a coefficient significantly different from zero (P-value < 0.05; Wald chi-square statistic) were allowed^{29}. Note that only 74 of the 97 patients from the training set could be used for the stepwise logistic regression analysis because of missing values in some of the variables considered.
Model-building
The variables selected after the stepwise logistic regression analysis were subsequently used to fit a standard logistic regression model and least squares support vector machine (LSSVM) models^{30} with linear and radial basis function (RBF) kernels to the training set.
Support vector machines are a relatively new method for solving classification problems and have already been used extensively for various applications, including medical ones^{31} (for more details, see the Opinion published in the same issue of this Journal^{32}).
Since the models in this section were based on only a subset of the variables used during variable selection and since only the patients without missing values in any of these variables could be taken into account, the number of patients in the model-building step (94) was larger than the number of patients used in the variable selection step (74). As described above, the single-valued output of the models could also be analyzed and compared using the Wilcoxon rank-sum test and ROC curves, and could also be used to estimate an optimal cutoff point or threshold for these models. Patients with a model output larger than this cutoff were then predicted to have deep myometrial invasion.
The standard logistic regression model was fitted with the LOGISTIC procedure from SAS. The class labels for patients with Stages Ia or Ib were 0, and they were 1 for patients with Stage Ic or higher. The Wald chi-square statistic was used to assess the significance of the coefficient of a certain variable in the fitted model.
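To make the Wald test concrete, the sketch below recovers a standard error, the Wald chi-square statistic and a two-sided P-value from a coefficient and its 95% confidence interval. It assumes that the interval is a symmetric Wald interval (estimate ± 1.96 standard errors), which the text does not state explicitly; the function name is hypothetical. The example values are the coefficient 2.36 with 95% CI 0.82 to 3.91 reported in the Results.

```python
import math


def wald_from_ci(beta, ci_low, ci_high, z_crit=1.959964):
    """Return (standard error, Wald chi-square, two-sided P-value).

    Assumption: the CI is a symmetric Wald interval, beta +/- z_crit * SE.
    """
    se = (ci_high - ci_low) / (2.0 * z_crit)
    z = beta / se
    chi_square = z * z  # 1 degree of freedom
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, chi_square, p_value


se, chi_square, p_value = wald_from_ci(2.36, 0.82, 3.91)
print(round(se, 3), round(chi_square, 2), round(p_value, 4))
```

Under this assumption the computed P-value is close to the reported 0.0027; small differences are expected because the published coefficient and interval are rounded.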
Using LSSVMlab version 1.5^{30,33} for MATLAB, we trained two LSSVM models, one with a linear and one with an RBF kernel. It is possible to write an LSSVM with a linear kernel as a simple linear equation in its variables. An LSSVM with an RBF kernel has a more complex form (in this case it was a sum with 95 terms), which is why it is not stated explicitly in this manuscript.
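For readers unfamiliar with the method, the following pure-Python sketch shows the standard LSSVM classifier formulation: training reduces to solving one linear system, and the output for a new case is a kernel sum over all training cases plus a bias term. It is not the LSSVMlab implementation. The class labels of −1 and +1, the regularization constant `gamma`, the kernel width `sigma2` and the toy data are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, z, sigma2=1.0):
    """RBF kernel exp(-||x - z||^2 / sigma2); the scaling is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def train_lssvm(X, y, kernel, gamma=10.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * kernel(X[i], X[j])
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    solution = solve(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]  # bias b, support values alpha


def lssvm_output(x, X, y, b, alpha, kernel):
    """Model output: one kernel term per training case, plus the bias."""
    return sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X)) + b


# Toy data, for illustration only: two well-separated groups.
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]]
y = [-1, -1, 1, 1]
b, alpha = train_lssvm(X, y, rbf_kernel)
print(lssvm_output([3.5, 3.0], X, y, b, alpha, rbf_kernel))
```

In this formulation a model trained on N cases has N kernel terms plus a bias, which would give 95 terms for 94 training patients; whether that is how the authors counted the terms is an assumption.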
Prospective validation
In the previous section, the AUCs of the mathematical models were estimated using the same collection of patients that was used to fit or train these models. This could have led to results that were too optimistic. Therefore, we prospectively validated our results using independent data from 78 consecutive new patients, which became available after the derivation of these models (collected prospectively). Here we refer to these patients as the ‘independent test set’. The mean age of the patients in the test set was 64.1 years (range, 31–89 years) and 72 of them were postmenopausal. They were assessed using the same protocol as that used for the patients of the training set. The distribution of their FIGO stages was: 14 Stage Ia, 36 Stage Ib, 16 Stage Ic, one Stage II, nine Stage III and two Stage IV. The following histopathological subtypes were present: 59 endometrioid adenocarcinoma, one mucinous, two serous papillary, 15 mixed type (of which nine had a serous papillary and four a clear cell component) and one endometrial tumor with unspecified histopathological subtype. Forty tumors were highly differentiated, 14 moderately and 24 poorly. Using these independent test data, we calculated the AUCs of the three models discussed above and compared them with the AUC of the subjective assessment of the expert. We also evaluated the performance of our models at the optimal cutoff points obtained after the ROC analysis of the training set. We used the method described by Hanley and McNeil^{25,26} to estimate the sample size needed to reach statistical significance.
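The uncertainty of an AUC estimated on a test set of this size can be gauged with the standard error formula of Hanley and McNeil, sketched below in Python. This is only an illustration under assumptions: the function names are hypothetical, a normal approximation is used for the interval, and the authors' exact variance estimator may differ. The example uses the stage counts given above (28 patients with Stage Ic or higher, 50 with Stage Ia or Ib) and the AUC of 0.72 reported for the subjective assessment in Table 2.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC (Hanley and McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation 95% CI, clipped to the range [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


low, high = auc_confidence_interval(0.72, n_pos=28, n_neg=50)
print(round(low, 2), round(high, 2))
```

With these inputs the interval is approximately 0.60 to 0.84, close to the interval reported in Table 2. Because the standard error shrinks only with the square root of the sample size, small differences between AUCs require large test sets to detect.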
Results
The results (based on the training set) of the univariate analysis of the ultrasound parameters and the subjective assessment are presented in Table 1. Of all the ultrasound parameters, EV/UV had the largest AUC (78%), comparable to that of the subjective assessment (79%; difference not statistically significant). Also, there was no significant difference between the AUC of EV/UV and the AUCs of ET, MT, EV, ET/AP and MT/AP. Compared to these morphological parameters, the AUCs of the blood flow indices were low. Uterine artery RI and PI were higher in Stages Ia–Ib compared with Stages Ic or higher (differences were significant but P-values were close to 5%).
Table 1. Univariate analysis of the ultrasound parameters, the subjective assessment, the standard logistic regression model and the least squares support vector machines (LSSVM) models with a linear and radial basis function (RBF) kernel (training set, n = 97)

| Parameter | Range | AUC [95% CI] | Optimal cutoff value* | Sensitivity (%) | Specificity (%) | Mean or proportion in Stage Ia or Ib | Mean or proportion in Stage Ic or higher | P |
|---|---|---|---|---|---|---|---|---|
| Endometrial thickness (ET) (mm) | 2–65 | 0.76 [0.66, 0.86] | 14 | 81 | 64 | 15 | 25 | < 0.0001 |
| Myometrial thickness (MT) (mm) | 2–18 | 0.71 [0.59, 0.82] | 8 | 74 | 61 | 8.8 | 6.4 | 0.001 |
| Endometrial volume (EV) (mL) | 0–84 | 0.76 [0.66, 0.86] | 4.9 | 71 | 69 | 8.2 | 18 | < 0.0001 |
| Uterine volume (UV) (mL) | 16–1075 | 0.61 [0.49, 0.72] | 89 | 58 | 69 | 91 | 147 | 0.08 |
| ET/uterine anteroposterior diameter (AP) | 0.07–1.5 | 0.75 [0.65, 0.86] | 0.43 | 72 | 71 | 0.37 | 0.54 | < 0.0001 |
| EV/UV | < 0.0001–0.75 | 0.78 [0.68, 0.87] | 0.09 | 69 | 80 | 0.07 | 0.15 | < 0.0001 |
| MT/AP | 0.04–0.44 | 0.75 [0.64, 0.85] | 0.17 | 74 | 75 | 0.24 | 0.15 | < 0.0001 |
| Endometrial echogenicity (EE) (% heterogeneous) | n/a | 0.60 [0.49, 0.72] | n/a | 65 | 56 | 44% | 65% | 0.06 |
| Endometrial lining (EL) (% irregular) | n/a | 0.61 [0.50, 0.73] | n/a | 78 | 44 | 56% | 78% | 0.03 |
| Intratumoral PSV (cm/s) | 0–0.96 | 0.61 [0.49, 0.73] | 0.13 | 59 | 64 | 0.14 | 0.21 | 0.09 |
| Intratumoral TAMXV (cm/s) | 0–0.77 | 0.61 [0.49, 0.73] | 0.06 | 82 | 46 | 0.09 | 0.14 | 0.09 |
| Intratumoral RI | 0.05–1 | 0.62 [0.48, 0.75] | 0.5 | 50 | 78 | 0.62 | 0.54 | 0.08 |
| Intratumoral PI | 0.23–6.0 | 0.61 [0.48, 0.74] | 0.61 | 38 | 88 | 1.4 | 1.1 | 0.10 |
| Uterine artery peak systolic velocity (PSV) (cm/s) | 0.09–2.1 | 0.51 [0.39, 0.65] | 0.62 | 31 | 84 | 0.49 | 0.53 | 0.81 |
| Uterine artery TAMXV (cm/s) | 0.04–0.75 | 0.57 [0.45, 0.70] | 0.25 | 37 | 80 | 0.20 | 0.24 | 0.27 |
| Uterine artery resistance index (RI) | 0.41–1.2 | 0.64 [0.52, 0.76] | 0.71 | 49 | 78 | 0.78 | 0.71 | 0.03 |
| Uterine artery pulsatility index (PI) | 0.16–6.0 | 0.64 [0.52, 0.76] | 1.3 | 49 | 78 | 1.9 | 1.5 | 0.04 |
| Subjective assessment (Stage Ia: 0; Stage Ib: 1; Stage Ic: 2; Stage II or higher: 3) | 0–3 | 0.79 [0.69, 0.88] | 1 | 61 | 86 | 0: 51%; 1: 36%; 2: 12%; 3: 2% | 0: 13%; 1: 26%; 2: 39%; 3: 21% | < 0.0001 |
| Standard logistic regression | 0–1 | 0.89 [0.83, 0.96] | 0.45 | 77 | 86 | 0.21 | 0.65 | < 0.0001 |
| LSSVM with linear kernel | −1.5 to 1.4 | 0.88 [0.81, 0.95] | −0.31 | 91 | 73 | −0.52 | 0.20 | < 0.0001 |
| LSSVM with RBF kernel | −1.2 to 0.93 | 0.99 [0.97, 1] | −0.30 | 97 | 100 | −0.74 | 0.56 | < 0.0001 |
Multivariate stepwise logistic regression selected the degree of differentiation, NF, ET and EV as variables that contributed significantly in a standard logistic regression model aiming to discriminate between patients with and without deep myometrial invasion on the final histopathological assessment. None of the blood flow indices was selected.
The resulting logistic regression model fitted to the training data was given by:
y = 1 / (1 + exp(−(β_{0} + β_{1} × DD1 + β_{2} × DD2 + β_{3} × NF + β_{4} × ET + β_{5} × EV)))
where DD1 and DD2 equal 1 if the tumor is moderately or poorly differentiated, respectively, and 0 otherwise, and where y is the model output, a number on a continuous scale between 0 and 1. (Because only missing values in the four selected variables had to be taken into account, 94 patients could be used to fit the three models, which is more than the 74 patients used for variable selection.) A patient was predicted to have a tumor of Stage Ia or Ib if y ≤ a certain cutoff level and a tumor of Stage Ic or higher if y > this cutoff level. The coefficients (rounded to two decimal places) were: β_{0} = −3.70 (95% CI, −5.53 to −1.86, P < 0.0001), β_{1} = 2.36 (95% CI, 0.82 to 3.91, P = 0.0027), β_{2} = 2.42 (95% CI, 1.00 to 3.84, P = 0.0008), β_{3} = −2.45 (95% CI, −4.23 to −0.67, P = 0.0070), β_{4} = 0.20 (95% CI, 0.07 to 0.32, P = 0.0021) and β_{5} = −0.11 (95% CI, −0.19 to −0.03, P = 0.0054). These coefficients indicate that the predicted probability of deep myometrial invasion was higher for moderately or poorly differentiated tumors and increased with increasing ET, and that it decreased with increasing NF and EV. The negative influence of the EV was unexpected, but can be seen as a nonlinear effect of the ET (since EV∼ET^{3}). The performance of the standard logistic regression model on the training data and the optimal cutoff level are also summarized in Table 1.
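As a worked illustration, the sketch below evaluates a logistic model with the reported coefficients and applies the cutoff of 0.45 from Table 1. The assignment of coefficients to variables (β_{1} and β_{2} to DD1 and DD2, β_{3} to NF, β_{4} to ET, β_{5} to EV) is an assumption inferred from the signs described in the text and from the order in which the variables are listed. The units (ET in mm, EV in mL) follow Table 1, and the function names and example patients are hypothetical. It is not intended for clinical use.

```python
import math

# Reported coefficients; the variable assignment is an assumption (see text).
COEF = {
    "intercept": -3.70,
    "DD1": 2.36,   # 1 if moderately differentiated, else 0
    "DD2": 2.42,   # 1 if poorly differentiated, else 0
    "NF": -2.45,   # number of fibroids (0-2)
    "ET": 0.20,    # endometrial thickness (mm)
    "EV": -0.11,   # endometrial volume (mL)
}


def logistic_output(differentiation, nf, et_mm, ev_ml):
    """Model output y in (0, 1); differentiation is 'high', 'moderate' or 'poor'."""
    dd1 = 1 if differentiation == "moderate" else 0
    dd2 = 1 if differentiation == "poor" else 0
    linear = (
        COEF["intercept"]
        + COEF["DD1"] * dd1
        + COEF["DD2"] * dd2
        + COEF["NF"] * nf
        + COEF["ET"] * et_mm
        + COEF["EV"] * ev_ml
    )
    return 1.0 / (1.0 + math.exp(-linear))


def predicts_deep_invasion(y, cutoff=0.45):
    """Stage Ic or higher is predicted when y exceeds the cutoff."""
    return y > cutoff


# Hypothetical patients, for illustration only.
y_a = logistic_output("poor", nf=0, et_mm=25, ev_ml=18)
y_b = logistic_output("high", nf=0, et_mm=15, ev_ml=8.2)
print(round(y_a, 3), predicts_deep_invasion(y_a))
print(round(y_b, 3), predicts_deep_invasion(y_b))
```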
The resulting LSSVM model with a linear kernel fitted to the training data was given by:
y = β_{0} + β_{1} × DD + β_{2} × NF + β_{3} × ET + β_{4} × EV
where DD equals 1, 2 and 3 if the tumor is highly, moderately and poorly differentiated, respectively, and where y is the model output, which is a number on a continuous scale. Again, a patient was predicted to have a tumor of Stage Ia or Ib if y ≤ a certain cutoff level and a tumor of Stage Ic or higher if y > this cutoff level. The coefficients (rounded to two decimal places) were: β_{0} = −1.44, β_{1} = 0.37, β_{2} = −0.37, β_{3} = 0.05 and β_{4} = −0.03. According to the sign of these coefficients, the influence of the different variables was qualitatively the same as that in the logistic regression model.
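The linear-kernel model can be evaluated in the same way. The sketch below assumes that the coefficients apply, in order, to DD, NF, ET (mm) and EV (mL), an assignment inferred from the signs and from the order in which the selected variables are listed; the cutoff of −0.31 is taken from Table 1, and the function name and example patients are hypothetical.

```python
# Reported coefficients; the variable assignment is an assumption (see text).
B0, B_DD, B_NF, B_ET, B_EV = -1.44, 0.37, -0.37, 0.05, -0.03
CUTOFF = -0.31  # optimal cutoff on the training set (Table 1)


def linear_lssvm_output(dd, nf, et_mm, ev_ml):
    """Model output y; dd is 1 (high), 2 (moderate) or 3 (poor)."""
    return B0 + B_DD * dd + B_NF * nf + B_ET * et_mm + B_EV * ev_ml


# Hypothetical patients, for illustration only.
y_a = linear_lssvm_output(dd=3, nf=0, et_mm=25, ev_ml=18)
y_b = linear_lssvm_output(dd=1, nf=0, et_mm=15, ev_ml=8.2)
print(round(y_a, 3), y_a > CUTOFF)  # above the cutoff: Stage Ic or higher
print(round(y_b, 3), y_b > CUTOFF)  # at or below the cutoff: Stage Ia or Ib
```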
As mentioned previously, the LSSVM model with an RBF kernel could not be written in a simplified form and is therefore not stated explicitly here. However, it could be implemented easily in, for example, Microsoft Excel. The model output was a single continuous number that had to be compared with a certain cutoff level. The performance of the LSSVM models with a linear and RBF kernel on the training data and the optimal cutoff levels are also described in Table 1.
Evaluated on the training set, the standard logistic regression and the LSSVM models with a linear and RBF kernel had a larger AUC than did the subjective assessment. This difference was only significant for the LSSVM with an RBF kernel (P < 0.0001) and had borderline significance for the standard logistic regression model (P = 0.0595).
The results of the prospective validation, which was possible in only 76 (of 78) test-set patients because of missing values in EV, are presented in Table 2 and Figure 2. From these results we can conclude that prospective evaluation on the independent test set resulted in a higher AUC only for the LSSVM model with an RBF kernel (difference not significant) and in an equally good AUC for the LSSVM model with a linear kernel when compared with the AUC of the subjective assessment. The performance of the standard logistic regression model was poor. At the optimal cutoff value, the likelihood ratio for a positive result (positive likelihood ratio, LR+) of the subjective assessment was better than that of the LSSVM models. The opposite was true for the negative likelihood ratio (LR−). This means that, at the chosen cutoff level, the LSSVM models were better at ruling out deep myometrial invasion than they were at ruling it in, when compared with the subjective assessment.
Table 2. Prospective validation: performance of the standard logistic regression model and the least squares support vector machines (LSSVM) models with linear and radial basis function (RBF) kernels for the patients of the independent test set; comparison with the ultrasound parameter (endometrial/uterine volume (EV/UV)) from Table 1 with the best discriminatory potential and the subjective assessment (n = 78 for the subjective assessment and n = 76 for EV/UV and the mathematical models)

| Parameter or model | AUC [95% CI] | Optimal cutoff value* | Sensitivity (%) | Specificity (%) | LR+ | LR− |
|---|---|---|---|---|---|---|
| EV/UV | 0.70 [0.58, 0.82] | 0.085 | 57 | 72 | 2.1 | 0.59 |
| Subjective assessment | 0.72 [0.59, 0.84] | 1 | 61 | 80 | 3.0 | 0.49 |
| Standard logistic regression | 0.66 [0.53, 0.79] | 0.45 | 50 | 75 | 2.0 | 0.67 |
| LSSVM with linear kernel | 0.72 [0.59, 0.84] | −0.31 | 75 | 69 | 2.4 | 0.36 |
| LSSVM with RBF kernel | 0.77 [0.66, 0.87] | −0.30 | 79 | 67 | 2.4 | 0.32 |
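The likelihood ratios in Table 2 follow directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The short Python sketch below recomputes them from the rounded percentages in the table; the function name is hypothetical, and small discrepancies with the published values (for example in the last digit) are expected because the table entries are rounded.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for sensitivity and specificity given as fractions."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded sensitivity and specificity values taken from Table 2.
table_2 = {
    "Subjective assessment": (0.61, 0.80),
    "LSSVM with linear kernel": (0.75, 0.69),
    "LSSVM with RBF kernel": (0.79, 0.67),
}
for name, (sens, spec) in table_2.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

A lower LR− means that a negative result lowers the odds of deep invasion more strongly, which is why the LSSVM models are described as better at ruling it out.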
Discussion
Our study indicates that single morphological parameters do not improve the predictive power when compared with subjective assessment, and that spectral Doppler analysis does not contribute to the prediction of the degree of myometrial invasion in endometrial cancer. Combining the degree of differentiation, ET, EV and NF in an LSSVM model with a linear or RBF kernel might deliver predictions that are as reliable as the subjective impression of an experienced sonologist. Even assuming that a real difference exists between the true AUC of the LSSVM model with an RBF kernel and the true AUC of the subjective assessment, the number of patients in the independent test set was not sufficient to reach statistical significance in a prospective evaluation. If the values in Table 2 represent the true AUCs (i.e. those that would be achieved by infinite populations), one would need a sample size of approximately 919 patients to be able to detect, with 80% power, the difference between these AUCs as being statistically significant^{34}. Confirmation of the performance of LSSVM models with an RBF kernel in larger prospective studies is therefore necessary.
As could be expected and as is explained in the Opinion of this issue^{32}, the performance on the test set (i.e. the level of generalization) of the LSSVM model with a linear kernel was better than that of the standard logistic regression model. Evaluation on the training set (Table 1) gave the opposite order of performance, although the difference was small. The LSSVM model with an RBF kernel had the best overall performance, both on the training set and on the independent test set. This is an indication that nonlinear effects might play a role in the distinction between patients with and those without deep myometrial invasion. The better sensitivity for deep invasion of the LSSVM model could be helpful in selecting patients who might benefit from a pelvic lymphadenectomy by an experienced surgeon.
It is important to emphasize that the models described in this study might not be ready to be implemented in routine clinical practice. First of all, the measurements that were considered in our study all originated from the same sonologist. Because of differences that might exist between different centers, or even individual sonologists (who might, for example, use different ultrasound equipment), the models discussed here should be tested on multicenter prospective data using a stringent and detailed protocol; we have planned this multicenter prospective study. Moreover, the techniques used by the same expert might undergo subtle changes with time, causing a drop in model performance when the model is applied on new patients. These comments also apply to the evaluation of the degree of differentiation, a variable that was also included in our models. This parameter is, at least partially, a subjective measure that can differ between centers, between pathologists and in time. There is also the possibility of change in the characteristics of the population of patients, causing new patients to be drawn from a distribution different from the one that was used to derive the models. This again might cause a drop in model performance when applied to new data.
Despite these possible limitations, we believe that the proposed models could represent a simple and inexpensive method that might contribute to the preoperative distinction between low- and high-risk patients, allowing for better preoperative allocation of patients with endometrial carcinoma. Further research is therefore needed in this area.
Acknowledgements
This research was supported by Research Council KUL: GOAMefisto 666, GOA AMBioRICS, IDO (IOTA Oncology, Genetic networks), several PhD/postdoc & fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects G.0115.01 (microarrays/oncology), G.0240.99 (multilinear algebra), G.0407.02 (support vector machines), G.0413.03 (inference in bioi), G.0388.03 (microarrays for clinical use), G.0229.03 (ontologies in bioi), G.0241.04 (functional genomics), G.0499.04 (Statistics), research communities (ICCoS, ANMMM, MLDM); IWT: PhD Grants, STWWGenprom (gene promotor prediction), GBOUMcKnow (Knowledge management algorithms), GBOUSQUAD (quorum sensing), GBOUANA (biosensors); Belgian Federal Science Policy Office: IUAP P5/22 (Dynamical Systems and Control: Computation, Identification and Modelling, 2002–2006); EURTD: FP5CAGE (Compendium of Arabidopsis Gene Expression); ERNSI: European Research Network on System Identification; FP6NoE Biopattern; FP6IP eTumours.