Psychometric properties of the pelvic organ prolapse symptom score


Dr S Hagen, Nursing, Midwifery and Allied Health Professions Research Unit, Glasgow Caledonian University, Cowcaddens Road, Glasgow G4 0BA, UK.


Objective  To assess the internal consistency, construct validity and sensitivity to change of a pelvic organ prolapse symptom score (POP-SS).

Design  Analysis of data from three prolapse studies, including symptomatic and asymptomatic women who completed the POP-SS.

Setting  (1) A community setting in New Zealand, (2) two gynaecology outpatient departments in Scotland and (3) a gynaecological surgery department in Scotland.

Population or sample  (1) Participants from a survey of postnatal women at 12-year follow up, invited to complete a prolapse questionnaire and have prolapse assessment, (2) new gynaecology outpatients presenting with prolapse symptoms, randomised to pelvic floor muscle training (PFMT) or control and (3) women having anterior and/or posterior prolapse surgery, randomised to mesh insert or no mesh.

Method  Data were analysed to assess internal consistency, construct validity and sensitivity to change of the POP-SS.

Main outcome measures  Cronbach’s alpha, significance of differences in POP-SS scores between studies and significance of difference in POP-SS scores pre- to post-intervention.

Results  For internal consistency, Cronbach’s alpha ranged from 0.723 to 0.828. Women having surgery had higher POP-SS scores than those having conservative management (mean difference 5.0, 95% CI 3.1–6.9), who in turn had higher scores than the asymptomatic women (mean difference 5.9, 95% CI 4.4–7.4). Significant differences in POP-SS score were detected after surgery and PFMT. The improvement due to surgery was significantly greater than that associated with PFMT (z = −3.006, P = 0.003).

Conclusion  The POP-SS has good internal consistency and construct validity and is sensitive to change.


Pelvic organ prolapse (POP), a common female condition, is symptomatic descent, from the normal anatomical position, of the vaginal walls, apex or vault.1 Women with prolapse present with a variety of symptoms (vaginal, urinary, bowel, back, abdominal and sexual symptoms). Some of these symptoms are specifically associated with the descending pelvic organs protruding into the vaginal canal, for example feeling of a bulge or something coming down. Others, such as urinary and bowel problems, can co-exist and may be related to or independent of the prolapse. It is important in research and clinical practice that we quantify such symptoms using standardised instruments with known psychometric properties.

Many instruments exist for measuring urinary symptoms and associated quality of life, including 17 questionnaires that the International Consultation on Incontinence (ICI) classed as grade A (i.e. having established reliability, validity and responsiveness demonstrated in one or more data sets).1 Far fewer are available for the specific symptoms of prolapse. In 2005, the ICI1 concluded that questionnaires in this area were ‘poorly developed to date and required encouragement’; two questionnaires of grade B (validity and reliability established with rigour or validity, reliability and responsiveness indicated) were identified (Pelvic Floor Disorder Inventory [PFDI]2 and Pelvic Floor Impact Questionnaire [PFIQ]2) and an additional five were in early development (grade C) (e.g. P-QoL and International Consultation on Incontinence Questionnaire [ICIQ] Vaginal Symptoms Module).

Since then, work has been published on the above prolapse measures (short-form versions of the PFDI and PFIQ3, the P-QoL4 and ICIQ Vaginal Symptoms Module5). The most prominent of these measures2,4 are fairly lengthy, cover a range of symptoms and include a number of subscales, for example relating to urinary and bowel symptoms. It could be argued that these commonly co-existing symptoms are better measured using validated, condition-specific instruments such as those developed by the ICIQ group6 and that there remains a need for a brief symptom index that encapsulates the presence and extent of key prolapse symptoms. We report in this study on a scale that fulfils this need.

At the start of a programme of work on prolapse in 2000, when we sought a brief validated prolapse symptom scale, no suitable scale was available. We thus developed a simple set of key questions covering the symptoms caused or exacerbated specifically by prolapse, which could serve as the primary outcome measure for subsequent randomised controlled trials of various interventions for POP. The key questions formed the basis for a POP symptom score (POP-SS).

Our intention was to supplement the POP-SS with a number of existing validated scales aimed specifically at urinary (ICIQ-UI SF),7 bowel (ICIQ-BS)6 and sexual symptoms (Pelvic Organ Prolapse/Urinary Incontinence Sexual Questionnaire, PISQ-12),8 so that these functions could be assessed independently.

We administered the POP-SS to women in a number of research studies to generate data on its acceptability and performance. This article presents the findings regarding psychometric properties of the POP-SS, including internal consistency, construct validity and sensitivity to change.


The pelvic organ prolapse symptom score

The POP-SS consists of seven items, each with a 5-point Likert response set (0 = never, 1 = occasionally, 2 = sometimes, 3 = most of the time and 4 = all of the time) (Table 1). The question format and response set were modelled on those used by the ICIQ group to standardise outcome measures in pelvic floor dysfunction research and clinical practice.6 The items were developed from reviewing the literature in the course of undertaking a number of prolapse-related Cochrane systematic reviews9–11 and from discussion with gynaecologists, physiotherapists and women with prolapse. Some of the items are similar to those in other instruments since they target universally acknowledged symptoms associated with prolapse (e.g. a feeling of something coming down in the vagina). A total score (range 0–28) is calculated by summing the seven individual symptom responses to derive the POP-SS score. In addition, women indicate which one of the seven symptoms causes them most bother (Table 1).
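The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' scoring software: the function name `pop_ss_total` is hypothetical, and rejecting incomplete or out-of-range questionnaires is an assumption, since the text does not say how missing responses were handled.

```python
from typing import Sequence

N_ITEMS = 7        # symptom questions A1-A7
MAX_RESPONSE = 4   # 0 = never ... 4 = all of the time


def pop_ss_total(responses: Sequence[int]) -> int:
    """Sum the seven item responses into a total score in the range 0-28.

    Assumption: a questionnaire with missing or out-of-range answers is
    rejected rather than imputed; the source does not specify this.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not isinstance(r, int) or not 0 <= r <= MAX_RESPONSE:
            raise ValueError(f"response {r!r} is outside 0-{MAX_RESPONSE}")
    return sum(responses)
```

For example, a woman answering "never" to every question scores 0, and one answering "all of the time" to every question scores the maximum of 28.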

Table 1.  POP-SS: percent of women responding positively to symptom questions in each study

How often during the last 4 weeks have you had the following symptoms (0 = never, 1 = occasionally, 2 = sometimes, 3 = most of the time and 4 = all of the time)

Item | Study 1 ProLong (n = 435) (%) | Study 2 POPPY (n = 47) (%) | Study 3 IMPRESS (n = 66) (%)
A1 A feeling of something coming down from or in your vagina? | 16.2 | 78.7 | 89.2
A2 An uncomfortable feeling or pain in your vagina which is worse when standing? | 13.0 | 67.4 | 70.8
A3 A heaviness or dragging feeling in your lower abdomen/tummy? | 27.0 | 63.8 | 81.5
A4 A heaviness or dragging feeling in your lower back? | 23.7 | 59.6 | 66.2
A5 A need to strain (push) to empty your bladder? | 24.1 | 56.5 | 72.3
A6 A feeling that your bladder has not emptied completely? | 38.1 | 63.8 | 87.7
A7 A feeling that your bowel has not emptied completely? | 46.4 | 63.8 | 76.9
A8* Which of the symptoms above (questions A1–A7) causes you most bother? | A7, 39.3 | N/A | A1, 40.0

* The symptom most often identified as causing most bother is shown, with the percentage of respondents who chose this symptom. This question was used only in study 1 and study 3. N/A, not applicable.

At an early stage, the POP-SS was assessed in qualitative interviews with ten women (mean age 49 years) during which they completed the seven questions as part of a larger questionnaire. Women who had either stage I (n = 5) or stage II (n = 5) prolapse were purposively selected to represent the range of prolapse types (four rectocele, three cystocele, two rectocele + cystocele and one uterine prolapse). The ‘think aloud’ method12 was used to encourage women to make explicit their understanding of the questions and rationale for responses chosen. Women were also asked to comment on the comprehensiveness and acceptability of the questionnaire. This approach provided evidence of content validity and acceptability, since women could understand the questions, and found them acceptable and relevant to the symptoms that troubled them in relation to their prolapse.13

The POP-SS has to date been used in three studies,14–16 undertaken by the same research group, described below.

Data sets

Study 1: Prolapse and incontinence: Long-term research (ProLong)14

In New Zealand, in 2005, 435 women who had responded to a survey investigating postnatal urinary and faecal incontinence17 were followed up 12 years after giving birth. All women completed the POP-SS, and a subgroup of 166 women agreed to have objective prolapse assessment using the Pelvic Organ Prolapse-Quantification (POP-Q) system.18 Women were not known to be symptomatic of prolapse: they were selected entirely on the basis of their involvement in the earlier survey.

Study 2: Pelvic organ prolapse physiotherapy (POPPY) feasibility study15

In 2003/04, in a feasibility study at two Scottish centres, focussing on stage I or II prolapse, 47 women were randomised to either a pelvic floor muscle training (PFMT) intervention group or a control group receiving only a prolapse-related lifestyle advice leaflet. Objective quantification of prolapse type and severity was carried out at baseline and 6 months in both groups using the POP-Q,18 and women completed a postal questionnaire including the POP-SS at baseline, 20 and 26 weeks.

Study 3: Insertion of mesh or sutures for prolapse surgery success (IMPRESS)16

In 2005, at one Scottish gynaecology centre, 66 women completed the POP-SS before and 6 months after having prolapse surgery (anterior and/or posterior repair). No POP-Q data were collected.

Analysis of the data resulting from these studies contributed information regarding internal consistency, construct validity and sensitivity to change of the POP-SS.

Psychometric properties

It is desirable for questions within a scale that measure the same concept, in this case the extent of prolapse symptoms, to be highly correlated, a property known as ‘internal consistency’. Internal consistency of the POP-SS was assessed using data from studies 1, 2 and 3.

A valid scale is one that measures what it is intended to measure, and this is best assessed by comparison with a ‘gold standard’ measure of the same quantity (criterion validity).19 When no gold standard measure exists, as is the case for prolapse symptoms, it is appropriate to assess construct validity instead. Hypotheses or constructs can be established regarding the responses to the scale, and if the hypotheses are supported by the data, this provides evidence of construct validity. A form of construct validity known as trait validity was investigated by testing the hypothesis that scores at baseline (i.e. prior to any treatment) would be lowest in an asymptomatic group of women (study 1), intermediate in a conservative management group (study 2) and highest in a surgical intervention group (study 3).

Ability to detect change in prolapse symptoms due to an intervention is an important scale property. Sensitivity to change of the POP-SS was assessed by testing for a significant pre- to post-intervention improvement in scores using data from study 2 (PFMT intervention) and study 3 (surgical intervention). The improvement in scores was expected to be greater in study 3.

Statistical analysis

The three data sets described above were analysed separately and combined as appropriate to examine the properties of the POP-SS. The POP-SS data were found to be non-normally distributed in several of the samples, particularly post-intervention when symptoms are likely to have resolved; thus, primarily nonparametric methods were used.

Cronbach’s alpha20 was used to assess internal consistency of the seven-item POP-SS using data from studies 1, 2 and 3. Good internal consistency was assumed if Cronbach’s alpha was between 0.7 and 0.9.21 It is undesirable for alpha to be too high as this suggests redundancy in the items of the scale.
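As an illustration of the statistic, the standard formula for Cronbach's alpha, k/(k − 1) × (1 − sum of item variances / variance of the total score), can be sketched as follows. This is not the SPSS routine used in the study; the function names are hypothetical, and treating the 0.7 and 0.9 bounds as inclusive is an assumption.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    Each row holds one respondent's answers to the k items.
    Sample variances (n - 1 denominator) are used throughout.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def is_good_consistency(alpha: float) -> bool:
    """Criterion from the text: alpha between 0.7 and 0.9.

    Assumption: both bounds are treated as inclusive.
    """
    return 0.7 <= alpha <= 0.9
```

Under this criterion, the values reported for the POP-SS (0.723 to 0.828) would all count as good, whereas a value such as 0.97 would point to redundant items.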

In assessing trait validity, initially, mean and median scores for the three study groups were tabulated. Nonparametric one-way analysis of variance (Kruskal–Wallis) was used to test for a significant difference between groups. Parametric analysis of variance, with post hoc t tests of differences between group means with Bonferroni correction, was also performed.
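The Kruskal–Wallis comparison of three groups can be sketched in plain Python as below. This is a minimal illustration, not the SPSS procedure used in the study: the helper names are hypothetical, the statistic includes the usual correction for ties, and the P value shortcut relies on the fact that a chi-square distribution with 2 degrees of freedom (three groups) has upper tail exp(−H/2).

```python
import math
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = midranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h: float) -> float:
    """Upper-tail chi-square probability for df = 2 (three groups only)."""
    return math.exp(-h / 2)
```

Applied to the reported statistic of 176.730 on 2 degrees of freedom, `p_value_three_groups` returns a value far below 0.001, in line with the P value quoted in the results.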

In terms of sensitivity to change, the Wilcoxon paired test was used to test for statistically significant differences between pre- and post-intervention POP-SS scores within studies. Differences between studies in pre- to post-intervention change in score were tested using the Mann–Whitney U test.
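The between-study comparison of change scores can be illustrated with a Mann–Whitney U test using the large-sample normal approximation. The sketch below is an assumption-laden illustration rather than the SPSS routine: function names are hypothetical, the variance is corrected for ties, no continuity correction is applied, and the sign of z depends on which group is passed first.

```python
import math
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def mann_whitney_z(a: Sequence[float], b: Sequence[float]) -> float:
    """Normal-approximation z for the Mann-Whitney U test (tie-corrected)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    n = n1 + n2
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group a
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        raise ValueError("all observations are identical")
    return (u1 - n1 * n2 / 2) / math.sqrt(var_u)


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up change scores (pre minus post), purely to show the call pattern.
pfmt_change = [1, 0, 3, 2, -1, 4]
surgery_change = [9, 12, 5, 10, 7, 14]
z = mann_whitney_z(pfmt_change, surgery_change)
p = two_sided_p(z)
```

As a check on the conversion from z to P, `two_sided_p(-3.006)` rounds to 0.003, matching the P value reported for the comparison between the surgery and PFMT groups.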

Analysis was undertaken using SPSS 14.0 for Windows software (SPSS Inc., Chicago, IL, USA), and a 5% level of significance was used throughout.


Sample characteristics

The women in study 3 (surgery group) were oldest and those in study 1 (asymptomatic) were youngest, reflecting the differing study populations (Table 2).

Table 2.  Characteristics of women from included studies

Variable | Study 1 | Study 2 | Study 3
Maximum sample size | 435 (166 with POP-Q) | 47 | 66
Median age in years (range) | 40 (28–57) | 57 (31–72) | 61 (43–84)
POP-Q at baseline, n (%) | | |
Stage 0 | 3 (2) | 1 (2)* | All women were stage II, III or IV
Stage I | 59 (35) | 13 (29) |
Stage II | 101 (61) | 30 (67) |
Stage III | 3 (2) | 1 (2)* |
Leading edge POP type, n (%) | | |
Anterior | 86 (52) | 17 (70) | 30 (48)
Posterior | 32 (19) | 4 (16) | 13 (20)
Anterior = posterior | 43 (26) | 2 (8) | 19 (30)**
Superior | 3 (1) | 1 (4) |
Median POP-SS (range) | | |
Baseline | 1 (0–16) | 8 (0–21) | 13 (3–28)
20-week/6-month post-intervention | n/a | 7.5 (2–21) | 3 (0–22)
26-week post-intervention | n/a | 6 (1–21) | n/a

* One woman presenting with prolapse symptoms but found to be stage 0 on examination was included. One woman with stage 3 prolapse was erroneously included.

** Women who had both anterior and posterior repair were assumed to have equal leading edges. n/a indicates where data were not available for particular studies.

Internal consistency

The correlation among questions within the POP-SS was assessed in the individual study data sets. Cronbach’s alpha values (Table 3) indicate that the seven POP-SS items have good internal consistency, that is, alpha > 0.7. The POPPY study (study 2), which had the smallest sample size, had slightly lower Cronbach’s alpha values at both the 20- and 26-week follow-up time-points.

Table 3.  Internal consistency of the POP-SS

Study | Cronbach’s alpha | n
Study 1 | 0.823 | 421
Study 2, 20 weeks | 0.737 | 38
Study 2, 26 weeks | 0.723 | 39
Study 3, 6-month post-operative | 0.828 | 62

Construct validity

The median POP-SS score at baseline was highest in the surgery study (study 3), followed by the conservative intervention study (study 2) and lowest in the study of asymptomatic women (study 1) (Table 2). A significant difference between groups (Kruskal–Wallis χ² = 176.730, df = 2, P < 0.001) was detected. The ProLong mean score was significantly lower than that at baseline from POPPY (mean difference −5.9, 95% CI [−7.4, −4.4]) and IMPRESS (mean difference −10.9, 95% CI [−12.2, −9.6]), and the baseline POPPY mean score was significantly lower than that for IMPRESS (mean difference −5.0, 95% CI [−6.9, −3.1]). That is, the POP-SS scores differed between studies in a predictable way.

Table 1 shows where the differences in POP-SS scores between studies arose. In the asymptomatic group of women (study 1), a low percentage responded positively to having each of the seven symptoms. A feeling of incomplete bladder emptying (38%) and of incomplete bowel emptying (46%) were the symptoms most commonly reported, and the latter was the symptom that women said caused most bother. In contrast, only 16% reported a feeling of something coming down. Percentages were consistently higher (in excess of 50% for each symptom) in the conservative treatment group (study 2), with the most commonly reported symptom being a feeling of something coming down (79%) (Table 1). In the surgical group (study 3), the percentages were the highest of all studies, across all symptom questions. Most women in this study reported a feeling of something coming down (89%): this was both the most prevalent symptom (although not reported by everyone) and the one that most women identified as causing most bother.

Sensitivity to change

In both the POPPY and the IMPRESS studies, a significant decrease in score after the interventions was detected (Table 4). The average decrease in score was significantly greater in the IMPRESS women than in the POPPY women (z = −3.006, P = 0.003); that is, there was greater improvement in the surgery group than in the PFMT group. Thus, the POP-SS was able to detect the changes brought about by both types of intervention, and a difference in the magnitude of the change was distinguishable between studies.

Table 4.  Sensitivity to change of the POP-SS: paired tests

Study | Mean pre-intervention | Mean post-intervention | Mean difference (pre − post) | n | Wilcoxon | P value
Study 3 | 13.52 | 4.34 | 9.20 | 61 | −6.069 | <0.001

* Only data from intervention women are included: control women received only a lifestyle leaflet, and no significant change in POP-SS score was detected.


Summary of aims

Our objective was to investigate the psychometric properties of a brief prolapse symptom scale (POP-SS), which might be used as an outcome measure in future trials of various prolapse interventions and in clinical practice. No suitable validated scale of this nature was available at the onset of our programme of work. There are now a number of published prolapse instruments that are reported to be valid and reliable; however, their length and complexity may make them impractical for some purposes. To our knowledge, a brief scale of this kind that is reliable, valid and sensitive to change is still lacking in the literature.

Internal consistency

Good internal consistency was confirmed across the three studies, and the POP-SS compared favourably with other instruments in this respect. Digesu et al.4 found Cronbach’s alpha to be in excess of 0.80 in their assessment of the P-QoL, and Barber et al.2 reported Cronbach’s alpha of 0.82 for the Pelvic Organ Prolapse Distress Inventory and 0.97 for the Pelvic Organ Prolapse Impact Questionnaire, which are the relevant subscales of the PFDI and PFIQ. It is reassuring that all POP-SS items appear to be measuring the same trait, that is, there is homogeneity of the items within the scale. The value of Cronbach’s alpha did not exceed 0.9, which would have suggested that the questions were too highly correlated and that some items were redundant. The findings suggest that a simple summation of scores over the seven symptom questions makes a reasonable index.19 Such a summary symptom score is a desirable continuous primary outcome measure for trials of prolapse interventions, allowing treatment effects to be measured in terms of a number of important symptoms simultaneously. This does not of course rule out analysing responses from each question individually, and indeed, this may be a useful approach to take in the clinical setting where it is important to consider a woman’s symptoms in turn and assess how they change over time.

It is interesting that the internal consistency of the POP-SS is good (Cronbach’s alpha 0.823) in a sample of women selected without knowledge of their status with regard to prolapse (ProLong). This is encouraging should the POP-SS be used in trials of interventions to prevent prolapse.


Construct validity

The three study populations were representative of women with differing profiles of prolapse. Study 1 comprised a group of women who had participated in a postnatal survey 12 years previously, and for them, prolapse status was therefore unknown. Study 2 included women opting for conservative treatment, predominantly with stage I or II prolapse. Finally, study 3 included women with prolapse of stage II or greater, having prolapse repair surgery. These groups of women would be expected to have different symptoms leading to their differing treatment choices, or in the case of study 1, to no treatment for prolapse being sought. The ability of the POP-SS to differentiate between these groups, as indicated by the significant difference in scores, supports the trait validity of the scale. The predicted ordering in average group scores was observed in the data, providing additional evidence of validity. In a similar analysis, the P-QoL domain scores were also found to differ significantly between symptomatic and asymptomatic women.4 Other studies have investigated validity in terms of the relationship between symptom scores and prolapse severity; however, to date, it is not clear whether increasing symptoms are correlated with increasing prolapse severity.22 Analysis of the relationship between the POP-SS and the POP-Q is underway and will contribute information to this debate.

Sensitivity to change

The POP-SS could detect change due to both conservative and surgical interventions, and as expected, the improvement in symptoms was greater in women who had surgery. This is an important property for a scale that is to be used in trials establishing the effectiveness of interventions for treatment of prolapse. The sensitivity to change of the P-QoL and PFDI/PFIQ has not been reported. The short forms of the PFDI and PFIQ were, however, found to have moderate to excellent responsiveness 3–6 months after surgery.3

Implications for further research/use of POP-SS

Our aim was to develop a scale that was brief and contained only the key symptoms important in obtaining a view of how prolapse is affecting a woman. It could be argued that the three questions within the POP-SS relating to bladder and bowel address symptoms that are not experienced exclusively by women with prolapse. Generally, we avoided such questions in our scale; however, these symptoms, more than others, are frequently linked with prolapse and were considered worth including. The feeling of incomplete emptying of the bladder and of the bowel were the symptoms most commonly reported in the ProLong study (study 1), in which women did not necessarily have prolapse. This perhaps reflects the fact that these symptoms are also experienced by women who do not have prolapse. The prevalence of these symptoms was, however, far higher in the studies of women with confirmed prolapse.

The POP-SS was developed from a wide perspective, drawing on published research, clinical expertise and qualitative data from women with prolapse. It would be desirable to undertake further qualitative work investigating how women with different profiles respond to POP-SS items and how well changes in scores reflect important modifications in their symptoms. Data are being gathered currently on the test–retest reliability of the POP-SS and on its relationship with the observed POP-Q measure. Examination of the psychometric properties of the POP-SS in other treatment groups, for example women being fitted with a vaginal pessary, is also warranted.

In prolapse research, the choice of an appropriate measure is still the subject of debate. There is a need to review and produce recommendations on the currently available prolapse questionnaires.


Conclusion

The POP-SS has good internal consistency. It is valid as a measure of prolapse symptoms, as scores differed predictably between groups of women known to differ in their prolapse symptoms. Finally, it is sensitive to the change brought about by treatment for prolapse, specifically surgical repair and PFMT.

The POP-SS is a brief questionnaire that is acceptable to women and lends itself to both the research and the clinical environment.

Disclosure of interests


Contribution to authorship

S.H. was the principal investigator on one study, carried out the main data analysis and drafted the manuscript. C.G. was the principal investigator on one study, assisted with interpretation of the findings, commented on draft manuscripts and approved the final submission. L.S. was involved in the analysis of the POP-SS data, commented on draft manuscripts and approved the final submission. D.S. was coinvestigator on one study and was involved in data collection, commented on draft manuscripts and approved the final submission. C.B. carried out the qualitative research on the POP-SS, drafted material for the manuscript and approved the final submission.

Details of ethical approval

For each study, the procedures received ethical approval from the relevant research ethics committee.

  1. ProLong: Lower South Regional Ethics Committee, New Zealand, ethics ref. LRS/05/04/009, approved 31 May 2005.
  2. POPPY: (1) Southern General Hospital Ethics Committee, paper no. EC/02/S/115, approved 25 September 2002 and (2) Grampian Research Ethics Committee, project no. 02/0243, approved 11 March 2003.
  3. IMPRESS: Grampian Research Ethics Committee, project no. 04/MRE10/72, approved 9 May 2005.


Funding

  1. ProLong: University of Otago Postgraduate Scholarship in Obstetrics and Gynaecology.
  2. POPPY: Health Services Research Committee grant, Chief Scientist Office, Scottish Government.
  3. IMPRESS: None.


Acknowledgements

Collaborators in each of the studies: Dr Ian Ramsay, Dr Stewart Pringle, Dr Robert Hawthorn, Dr John Tierney, Dr Christine Bain, Dr Kevin Cooper, Ms Lynne Swan, Professor Don Wilson, Professor Peter Herbison, Dr Nicola Dean, Ms Gaye Ellis and Dr Sabeena Allahdin. S.H. is funded by the Chief Scientist Office, Scottish Government.