Endemic goiter in pregnant women: utility of the simplified classification of thyroid size by palpation and urinary iodine as screening tests

Authors

  • Rutila Castañeda,

    Corresponding author
    1. Unidad de Investigación en Epidemiología de la Nutrición, División de Investigación Epidemiológica y en Servicios de Salud, Coordinación de Investigación en Salud, Instituto Mexicano del Seguro Social, Mexico, DF, Mexico
  • Diana Lechuga,

    1. Servicios de Salud, Hidalgo, Mexico
  • Rosa Isela Ramos,

    1. Unidad de Investigación en Epidemiología de la Nutrición, División de Investigación Epidemiológica y en Servicios de Salud, Coordinación de Investigación en Salud, Instituto Mexicano del Seguro Social, Mexico, DF, Mexico
  • Clementina Magos,

    1. Instituto Nacional de Diagnóstico y Referencia Epidemiológica, Secretaría de Salud, Mexico, DF, Mexico
  • Maribel Orozco,

    1. Unidad de Investigación en Epidemiología de la Nutrición, División de Investigación Epidemiológica y en Servicios de Salud, Coordinación de Investigación en Salud, Instituto Mexicano del Seguro Social, Mexico, DF, Mexico
  • Homero Martínez

    1. Unidad de Investigación en Epidemiología de la Nutrición, División de Investigación Epidemiológica y en Servicios de Salud, Coordinación de Investigación en Salud, Instituto Mexicano del Seguro Social, Mexico, DF, Mexico

*Correspondence: Dr R. Castañeda, Coordinación de Investigación en Salud, Bloque B, Unidad de Congresos, 4o. Piso, Centro Médico Nacional Siglo XXI, IMSS, Avenida Cuauhtémoc 330, Col. Doctores, Mexico, DF 06725, Mexico.

Abstract

Objective To validate urinary iodine (I) excretion and the simplified classification of goiter by palpation, comparing them with ultrasound of the thyroid gland as the gold standard, to identify endemic goiter in pregnant women.

Population and setting 300 pregnant women identified in referral hospitals, in three geographic regions.

Methods Two endocrinologists, previously trained, evaluated thyroid size by palpation and by ultrasound. Urinary iodine excretion in a sample of urine was determined. Thyroid size below the 90th centile by ultrasound was considered normal.

Results The mean age of the women studied was 23 years. The prevalence of low weight for gestational age was 39% and of anaemia 47%. In our sample, 120 μg I/L was the best cut off for low urinary iodine excretion to identify endemic goiter in pregnant women (sensitivity 57%, specificity 70%, likelihood ratio 1.4). The prevalence of goiter was 10% using ultrasound. Palpation had a sensitivity for identifying goiter of 94% (95% CI 89–99%), a specificity of 80% (95% CI 75–85%), a likelihood ratio of 4.7, a positive post-test probability of 36.5% and a negative post-test probability of 99%.

Conclusions Low urinary iodine excretion identified up to 46% of women with goiter. This test by itself is not useful as a screening tool to identify pregnant women at risk of goiter. Identification of thyroid size by palpation was a better screening test. However, when both tests were combined in parallel, up to 100% of women with goiter were correctly identified. Our results suggest that the commonly used cut off point of 100 μg I/L to identify low urinary iodine excretion may under-estimate the prevalence of iodine deficiency disorders when used during pregnancy.

Introduction

Iodine is an essential component of thyroid hormones. Iodine deficiency is characterised by a slowing of metabolic processes, which translates into deficits in growth and development. This effect is present in animals and in humans [1–3]. In 1983, Hetzel used the term “iodine deficiency disorders” to encompass a series of clinical and functional disorders, which include goiter, low birthweight, miscarriage, and different degrees of hypothyroidism [4,5].

The World Health Organisation (WHO), the United Nations Children's Fund (UNICEF) and the International Council for the Control of Iodine Deficiency Disorders (ICCIDD) all agree on the need to eliminate iodine deficiency disorders. To do this, populations at risk must be identified, in order to target interventions and to monitor their impact. The public health relevance of these actions became evident in the commitment of the nations that participated in the World Summit for Children in 1990, which set as a goal the elimination of iodine deficiency disorders by the year 2000 [6–8].

To establish monitoring at a population level, we need screening tests with a sensitivity high enough to identify even borderline cases [6,7,9,10]. In iodine deficiency disorders, this is required because it is important to establish interventions at the public health level that, even when applied to false positive cases, will result in more benefit than harm [11]. Effective salt iodination is one such intervention [12].

For decades, identification of thyroid size by clinical examination has been the most widely used test. This method relied on a classification of goiter in five degrees of severity: 0 = normal; 1A = thyroid enlarged on manual exam, but not visible; 1B = thyroid enlarged and visible with the neck in hyperextension; 2 = thyroid visible even with the neck in normal position; 3 = thyroid visible at a distance. However, this method has a low sensitivity (38%), even when applied by experts [13]. This is particularly true in small children [5]. In 1995, a joint WHO/UNICEF/ICCIDD proposal simplified the classification to only three degrees of severity: 0 = normal thyroid size; 1 = thyroid enlarged on manual exam, but not visible; 2 = thyroid enlarged and visible. To date, this classification has only been used in school age children, and its use as a screening method compared with a gold standard has not been properly documented [4,6]. There is an urgent need to do so, particularly in pregnant women, as they constitute a high risk group for iodine deficiency in the newborn [14,15].

The objective of this study was to validate the field methods generally used to identify endemic goiter in pregnant women, namely urinary iodine and the simplified classification of thyroid enlargement, comparing them alone or combined in parallel against thyroid size measured by ultrasound as the gold standard.

Methods

The study was designed as an evaluation of diagnostic tests for screening. The study population consisted of pregnant women who lived in three distinct geographic areas in the state of Hidalgo, Mexico. The three areas were selected purposefully: a region known to be endemic for goiter (Huejutla), a rural area in which local clinical records showed a low prevalence of goiter (Ixmiquilpan) and an urban area surrounding the capital city (Pachuca), in which we did not expect to find iodine deficiency. We chose these different sites in order to capture the full spectrum of the expected iodine deficiency, as suggested by ICCIDD [8]. We calculated the sample size for a screening test, assuming an expected sensitivity of the tests of 80%, with a confidence interval (CI) of 10% (i.e. acceptable error of 5%) and 80% power for the test [16]. The corresponding sample size was 300 pregnant women (100 in each area). We followed a quota sampling scheme, identifying pregnant women who presented for delivery in referral hospitals of the public health care system.

Two endocrinologists carried out the identification of thyroid size. Training was carried out in the thyroid clinic of a tertiary level hospital belonging to the Mexican Social Security Institute in Mexico City. To standardise the two observers, they evaluated women without thyroid disease, with hypothyroidism, with Graves–Basedow disease, post-I-131 treatment, post-thyroidectomy and women with simple or nodular goiter. The two observers achieved a weighted kappa inter-observer coefficient of 70% before going out to the field.

Clinical exam of thyroid size was carried out independently by the two observers, both of whom were blind to the results of the gold standard. They used the simplified classification of goiter in three degrees. Facing the interviewed woman, the examiner placed the fingers over the sternum to feel the gland and asked the patient to swallow, to be sure that the gland had been properly identified. If the thyroid lobules were smaller than the distal phalanx of the patient, the gland's size was considered normal.

Aside from this clinical exam, the interviewers also recorded gestational age in weeks, as assessed from the date of last menses. They also took weight and height measurements, as well as the height of uterine fundus.

Interviewed women were asked to provide a urine sample to determine urinary iodine. The Sandell–Kolthoff method with manual acid digestion was used [17]. Mild urinary iodine deficiency was defined as a concentration of 50–99 μg I/L, moderate deficiency as a concentration between 20 and 49 μg I/L and severe iodine deficiency as a concentration <20 μg I/L [1–6]. However, late pregnancy brings physiologic changes, such as increased glomerular filtration, which in turn lead to a greater excretion of iodine in urine. To take these into account, for analysis we defined different cut offs based on the centile distribution of our sample.

Thyroid size was determined by ultrasound, using a portable Philips unit, with a 7.5 MHz transducer. Measurements were taken with the interviewee facing the interviewer in a sitting, resting position, and both thyroid lobules were measured in terms of length, width and depth. The volume of the gland was calculated following the formula published by Blum in 1986 [18]: (length, in cm) × (width, in cm) × (depth, in cm) × 0.479. Individual values were calculated for each lobule and summed up to give the gland's total volume.
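The volume formula above can be sketched in a few lines of Python. This is only an illustration: the function names and the example measurements are hypothetical, and the 13 cc cut off is the 90th centile reported later in the Results.

```python
# Minimal sketch of the ellipsoid-type volume formula quoted in the text:
# volume = length x width x depth x 0.479, with all dimensions in cm.

GOITER_CUTOFF_CC = 13.0  # 90th centile reported in the Results section


def lobe_volume_cc(length_cm, width_cm, depth_cm, factor=0.479):
    """Volume of a single thyroid lobule in cubic centimetres (cc)."""
    return length_cm * width_cm * depth_cm * factor


def gland_volume_cc(right_lobe, left_lobe):
    """Total gland volume: the sum of the two lobule volumes."""
    return lobe_volume_cc(*right_lobe) + lobe_volume_cc(*left_lobe)


# Hypothetical measurements (length, width, depth) in cm for one woman.
right = (4.0, 1.7, 1.6)
left = (3.9, 1.6, 1.5)

total = gland_volume_cc(right, left)
enlarged = total >= GOITER_CUTOFF_CC
print(f"total volume = {total:.2f} cc, enlarged = {enlarged}")
```

If the ultrasound unit reports dimensions in millimetres, they need to be divided by 10 before being passed to these functions.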

The demographic and clinical data of the sample population were described using descriptive statistics. The cut off point for urinary iodine excretion to classify an individual as deficient was defined after plotting sensitivity and specificity values in a receiver operating characteristic (ROC) curve. We calculated the sensitivity, specificity, likelihood ratio and efficiency of urinary iodine excretion and of thyroid size by the three-point classification, alone and combined in parallel (i.e. considering a case positive when a woman had either or both tests positive), comparing them with thyroid size as determined by ultrasound, defined as the gold standard. Thyroid size values were expressed in cubic centimetres (cc) and plotted in a centile graph. Values under the 90th centile were considered normal. We correlated total thyroid volume with each individual lobule's volume, in order to evaluate the predictive power of a single measurement. ANOVA was used to identify statistically significant differences among the three study groups for variables with a parametric distribution, and Kruskal–Wallis was used for urinary iodine excretion, which showed a non-parametric distribution. Post hoc tests were carried out to single out the group that differed from the rest. All statistical analyses were done using the statistical software package SPSS for Windows 7.0 (SPSS, Chicago, Illinois, USA).
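As an illustration of the parallel rule described above, the following Python sketch computes sensitivity, specificity and efficiency from per-woman test results. The data are invented for ten hypothetical women and the function names are assumptions; the study itself used SPSS.

```python
# Sketch: sensitivity, specificity and efficiency against a gold standard,
# for one test alone and for two tests combined in parallel (either positive).


def diagnostic_summary(test_positive, truth_positive):
    """Return sensitivity, specificity and efficiency for paired booleans."""
    pairs = list(zip(test_positive, truth_positive))
    tp = sum(1 for t, g in pairs if t and g)
    fp = sum(1 for t, g in pairs if t and not g)
    fn = sum(1 for t, g in pairs if not t and g)
    tn = sum(1 for t, g in pairs if not t and not g)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "efficiency": (tp + tn) / len(pairs),
    }


def parallel(test_a, test_b):
    """Parallel combination: positive when either or both tests are positive."""
    return [a or b for a, b in zip(test_a, test_b)]


# Invented results for ten hypothetical women (not study data).
palpation = [True, True, False, False, True, False, False, False, False, False]
low_iodine = [False, True, True, False, False, False, True, False, False, False]
ultrasound = [True, True, True, False, False, False, False, False, False, False]

alone = diagnostic_summary(palpation, ultrasound)
combined = diagnostic_summary(parallel(palpation, low_iodine), ultrasound)
print(alone)
print(combined)
```

In this toy data set the parallel rule raises sensitivity and lowers specificity, which is the general trade-off of combining tests in parallel.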

This study received ethical approval from the Mexican Social Security Institute Research Review Committee. All those who participated in the study did so voluntarily, having given their informed consent. All pregnant women with goiter identified by the gold standard test received Lipiodol.

Results

The description of the sample population in each of the three geographic study areas is shown in Table 1. There were no statistically significant differences between regions in terms of mean age, gestational age and frequency of low weight for gestational age, although the latter was lowest in Pachuca (35%) and highest in Huejutla (45%). Likewise, haemoglobin values and urinary iodine excretion were consistently lower in the endemic goiter region of Huejutla, compared with the other two regions. Thyroid volume was also largest in Huejutla, although the differences were not statistically significant. The differences were statistically significant for mean height, which was lower in Huejutla compared with the peri-urban area of Pachuca, and mean body weight, which was lowest in Huejutla and highest in Pachuca. The centile distribution of thyroid volume showed 13 cc to be the 90th centile; therefore, this was the cut off used to identify an enlarged thyroid. Mean total volume of the thyroid gland was largest in Huejutla and lowest in Ixmiquilpan (Table 2). There was also a statistically significant difference between right and left length and depth of the thyroid measurements, with the largest volumes in women in Huejutla and lowest in Ixmiquilpan. The correlation coefficient (r) between left front thyroid lobule and total volume of the gland was 0.81 (P = 0.003).

Table 1.  General characteristics of the women studied in each of the three areas. Values are given as mean [S.D.] or %.

Characteristic                          Pachuca (n = 100)   Ixmiquilpan (n = 100)   Huejutla (n = 100)
Age (years)                             23.48 [5.76]        23.43 [5.29]            23.28 [5.87]
Weight (kg)                             62.40 [11.30]       61.44 [10.27]           55.69 [9.11]*
Height (cm)                             153.09 [7.30]       150.92 [6.48]           147.70 [5.81]*
Miscarriages (%)                        17                  15                      5
Gestational age (weeks)                 29.34 [8.77]        30.4 [6.89]             28.40 [9.00]*
Low weight for gestational age (%)      35                  37                      45
Height of uterine fundus (cm)           25.13 [6.50]        29.1 [7.60]             24.72 [6.70]*
Haemoglobin (g/dL)                      12.14 [1.48]        11.03 [1.87]*           11.40 [1.57]*

  *P < 0.05 by ANOVA.

Table 2.  Size of the pregnant women's thyroid gland, urinary iodine excretion and prevalence of goiter by manual exam and low urinary iodine excretion in each of the three study areas. Values are given as mean [S.D.], (median) and %.

Measure                                 Pachuca (n = 100)   Ixmiquilpan (n = 100)   Huejutla (n = 100)   Total (n = 300)
Total thyroid volume (cc)               8.37 [1.98]         7.42 [2.28]             11.13 [4.69]*
Right thyroid lobule (mm)
  Width                                 16.80 [8.57]*       14.85 [1.89]            17.31 [2.53]*
  Length                                37.86 [7.00]        35.77 [3.85]            40.23 [4.70]*
  Depth                                 15.88 [8.72]        14.67 [2.04]            15.94 [2.40]*
Left thyroid lobule (mm)
  Width                                 15.91 [8.60]        14.39 [1.94]            17.19 [3.41]*
  Length                                37.60 [6.77]        35.49 [4.01]            40.31 [4.53]*
  Depth                                 15.84 [8.65]        14.55 [1.83]            16.24 [3.01]*
Urinary iodine excretion (μg/L)         116.68              124.95                  109.00
  Median (min–max)                      (10–150)            (6–150)                 (4–150)**
Prevalence of goiter (%)
  Thyroid enlargement by clinical exam (a)   19             20                      52                   30.3
    Grade 0                             81                  80                      48                   69.7
    Grade 1                             19                  20                      44                   27.7
    Grade 2                             0                   0                       8                    2.6
Low urinary iodine excretion (%)
  Low (<100 μg I/L)                     31                  21                      41                   31
  Mild (50–99 μg I/L)                   21                  14                      26                   21
  Moderate (20–49 μg I/L)               7                   4                       11                   7
  Severe (<20 μg I/L)                   3                   3                       4                    3

  (a) Grade 0: normal; grade 1: thyroid palpable but not visible; grade 2: thyroid palpable and visible.
  *P < 0.05 by ANOVA.
  **P < 0.05 by Kruskal–Wallis.

Table 2 shows the results of the tests to identify goiter and low urinary iodine excretion. By manual exam, thyroid size was normal (Grade 0) in 81% of women in Pachuca, 80% in Ixmiquilpan and 48% in Huejutla, corresponding to 69.7% of the total sample. The thyroid was enlarged but not visible (Grade 1) in 19% of women in Pachuca, 20% in Ixmiquilpan and 44% in Huejutla, corresponding to 27.7% of the total sample. An enlarged and visible thyroid (Grade 2) was found only in Huejutla, in 8% of pregnant women, which corresponded to 2.6% of the total sample. The median value of urinary iodine excretion was 146 μg I/L (range 4–150 μg I/L); 31% of women had less than 100 μg I/L: 21% had a mild iodine deficiency, 7% a moderate deficiency and 3% a severe iodine deficiency.

Table 3 shows the values of the diagnostic tests for the whole sample. The simplified classification of goiter in three degrees had a sensitivity of 94% (95% CI 89–99%) and a specificity of 80% (95% CI 75–85%), with a likelihood ratio of 4.7 and an efficiency of 80%. The positive post-test probability was 36.5% and the negative post-test probability was 99%. At lower prevalences, such as those found in Pachuca and Ixmiquilpan, the positive post-test probabilities were 15% and 10%, respectively, while the negative post-test probability was 100% at both sites. In Huejutla, with a higher prevalence of goiter, the positive post-test probability was 53% and the negative post-test probability was 95%. The prevalence of low urinary iodine excretion was 31%; this test had a sensitivity of 41% (95% CI 36–46%), a specificity of 70% (95% CI 65–75%), an efficiency of 70%, a likelihood ratio of 1.3, a positive post-test probability of 13.9% and a negative post-test probability of 89%. When palpation of thyroid size and urinary iodine excretion were used in parallel (i.e. a case was positive when either or both tests were positive), the sensitivity to identify goiter increased to 97% (95% CI 92–100%) while the specificity decreased to 56% (95% CI 51–61%), with a likelihood ratio of 3.44, an efficiency of 61%, a positive post-test probability of 68.3% and a negative post-test probability of 95.9%.

Table 3.  Comparison of the simplified classification of thyroid size and urinary iodine excretion used as diagnostic tests to identify goiter by ultrasound in 300 pregnant women. Values are given in %, sensitivity (95% CI) and specificity (95% CI).

Values of diagnostic tests                                              %
Prevalence of goiter by ultrasound (gold standard)                      10
Prevalence of goiter by clinical exam (a)                               30.3
  Sensitivity (95% CI)                                                  94 (89–99)
  Specificity (95% CI)                                                  80 (75–85)
  Efficiency                                                            80
  Likelihood ratio                                                      4.7
  Positive post-test probability                                        36.5
  Negative post-test probability                                        99
Prevalence of low urinary iodine excretion (<100 μg/L)                  31.0
  Sensitivity (95% CI)                                                  41 (36–46)
  Specificity (95% CI)                                                  70 (65–75)
  Efficiency                                                            70
  Likelihood ratio                                                      1.3
  Positive post-test probability                                        13.9
  Negative post-test probability                                        89
Prevalence of goiter by clinical exam and urinary iodine excretion
combined in parallel                                                    42.6
  Sensitivity (95% CI)                                                  97 (92–100)
  Specificity (95% CI)                                                  56 (51–61)
  Efficiency                                                            61
  Likelihood ratio                                                      3.44
  Positive post-test probability                                        68.3
  Negative post-test probability                                        95.9

  (a) WHO/UNICEF/ICCIDD, 1995.
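
The relationship between sensitivity, specificity, likelihood ratio and post-test probability can be sketched as follows. This is a generic textbook calculation, not the authors' SPSS procedure; the function names are assumptions, and the inputs are the rounded palpation values from Table 3. The paper's "negative post-test probability" is treated here as the probability of no goiter after a negative test, which appears to be the intended meaning.

```python
# Sketch: likelihood ratios and post-test probabilities from sensitivity,
# specificity and prevalence (pre-test probability), via the odds form of
# Bayes' theorem.


def positive_likelihood_ratio(sensitivity, specificity):
    return sensitivity / (1 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    return (1 - sensitivity) / specificity


def post_test_probability(prevalence, likelihood_ratio):
    """Probability of disease after a test result with the given LR."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Rounded palpation values from Table 3 (whole sample).
sens, spec, prev = 0.94, 0.80, 0.10

lr_pos = positive_likelihood_ratio(sens, spec)
lr_neg = negative_likelihood_ratio(sens, spec)

# Probability of goiter given a positive palpation result.
p_goiter_if_positive = post_test_probability(prev, lr_pos)
# Probability of NO goiter given a negative palpation result.
p_no_goiter_if_negative = 1 - post_test_probability(prev, lr_neg)

print(f"LR+ = {lr_pos:.1f}")
print(f"positive post-test probability = {p_goiter_if_positive:.1%}")
print(f"negative post-test probability = {p_no_goiter_if_negative:.1%}")
```

With these rounded inputs the sketch gives a positive likelihood ratio of 4.7, as in Table 3, and post-test probabilities of roughly 34% and 99%. The published positive value of 36.5% differs slightly, presumably because the authors worked from unrounded counts. The same functions show why the positive post-test probability falls at sites with lower prevalence.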

The receiver operating characteristic (ROC) curve shown in Fig. 1 identified 120 μg I/L as the cut off point with the highest sensitivity, which was chosen to identify iodine deficiency in view of the requirements of a screening test. The sensitivity of this cut off was 57% (95% CI 53–63%) and the specificity was 70% (95% CI 65–75%), with a likelihood ratio of 1.4 and an efficiency of 70%. The area under the ROC curve was 0.563.

Figure 1.

Receiver operating characteristic (ROC) curve showing the sensitivity and specificity of the traditional cut off (100 μg/L) and the proposed cut off (120 μg/L) for urinary iodine excretion in pregnant women.
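
Comparing candidate cut offs, as the curve does, amounts to recomputing sensitivity and specificity at each threshold. The sketch below uses invented urinary iodine values for twelve hypothetical women, not the study data, and the function name is an assumption.

```python
# Sketch: sensitivity and specificity of "urinary iodine below cut off"
# as a test for goiter, evaluated at two candidate cut offs.


def sens_spec_at_cutoff(iodine_values, has_goiter, cutoff):
    """The test is positive when urinary iodine (ug/L) is below the cut off."""
    tp = fp = fn = tn = 0
    for value, goiter in zip(iodine_values, has_goiter):
        positive = value < cutoff
        if positive and goiter:
            tp += 1
        elif positive and not goiter:
            fp += 1
        elif not positive and goiter:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented data: urinary iodine (ug/L) and goiter status by ultrasound.
iodine = [40, 90, 110, 115, 130, 95, 105, 125, 140, 145, 150, 135]
goiter = [True, True, True, True] + [False] * 8

for cutoff in (100, 120):
    sensitivity, specificity = sens_spec_at_cutoff(iodine, goiter, cutoff)
    print(f"cut off {cutoff}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
```

Raising the cut off can only keep or increase sensitivity and keep or decrease specificity; plotting these pairs over all thresholds traces the ROC curve.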

Discussion

Iodine deficiency is the most common preventable cause of irreversible brain damage [19]. According to UNICEF, there are 43 million people in the world with cerebral lesions due to iodine deficiency, while about 760 million people have goiter [6]. Other common consequences of iodine deficiency include low birthweight, abortions, stillbirths and various degrees of hypothyroidism [20–22]. All of these consequences are preventable by an adequate iodine supply to the target populations [23,24]. This is why it is important to have screening tests that may reliably identify individuals or populations at risk (i.e. with high sensitivity).

Pregnant women constitute one such population, as adequate iodine nutritional status prior to and during pregnancy is reflected in adequate intrauterine growth and development of the fetus, specifically of the fetal nervous system, as has been widely demonstrated experimentally, clinically and epidemiologically [14,25]. Another important reason for screening for iodine deficiency disorders in pregnant women is that they are easily accessible through prenatal clinics, pregnant women's clubs or midwives. However, there had been no previous validation of common field tests to identify iodine deficiency disorders in this high risk group.

Despite its low sensitivity, urinary iodine excretion has been the most widely used screening test to identify populations with iodine deficiency disorders. This is why we decided to evaluate this test, alone and in combination with thyroid size by clinical exam. The sensitivity of urinary iodine excretion used by itself was low, as has been previously reported [4–6]. Because glomerular filtration is increased during pregnancy, thus increasing the clearance of urinary iodine [26], we evaluated the commonly proposed cut off point of 100 μg I/L. The analysis of the receiver operating characteristic (ROC) curve allowed us to identify a better cut off at 120 μg I/L, which maximised the sensitivity of the test (57% vs 41% shown by 100 μg I/L) without affecting the specificity (70% for either cut off). Likewise, the likelihood ratio showed that a woman with a urinary iodine less than 120 μg I/L was 1.4 times more likely to have goiter than a woman with a urinary iodine above this cut off. Global efficiency of the test for this cut off was 70%, higher than the 60% of the 100 μg/L cut off.

From an epidemiologic point of view, screening tests for iodine deficiency disorders need a high sensitivity, even at the cost of a lower specificity. In iodine deficiency disorders, it is necessary to identify the probable cases, even if they prove to be false positives, in order to implement suitable public health care measures, which are likely to provide more benefit than harm and may contribute to modifying the natural history of these disorders [27,28]. Effective iodination of salt is such a public health measure, as it does not show toxic effects even when consumed by non-iodine-deficient populations. Another, more targeted intervention, such as a dose of oral iodine (Lipiodol), should be administered to deficient populations identified by an appropriate test.

On the other hand, the thyroid gland of women chronically exposed to iodine deficiency is more responsive to the action of thyroid stimulating hormone (TSH). This response is increased due to the presence of TSH-like hormones secreted by the placenta [29]. While this contributes to a physiologically enlarged thyroid gland during pregnancy, the actual volume of the gland depends on its capacity to compensate for the chronic iodine deficiency [30–32]. In view of these physiologic responses, we considered it important to evaluate the recognition of thyroid enlargement, using the new classification of thyroid size proposed by WHO/UNICEF/ICCIDD. To increase our sample size, as well as to capture the full spectrum of the morbid condition, we considered it appropriate to combine the samples from the three study sites. In other words, we were actually looking for a population that was not homogeneous, so that we could study different manifestations of iodine deficiency. While the sensitivity and specificity of a test do not depend on the prevalence, other characteristics of the diagnostic test do. Thus, if we had concentrated on a single population, the different prevalence of goiter and low urinary iodine excretion would have affected the post-test probability of identifying these deficiencies. In the whole sample, the positive post-test probability was 36.5% and the negative post-test probability was 99%. For example, in Pachuca and Ixmiquilpan, with the lower prevalence of goiter by clinical exam (19% and 20%, respectively), the positive post-test probabilities were lower and the negative post-test probability was higher than in the whole sample, while in Huejutla, with the higher prevalence of goiter (45%), the positive post-test probability was higher than in the whole sample, but the negative post-test probability was lower.

Identification of thyroid size by palpation has been shown to have a high intra- and inter-observer variability, hence its low sensitivity (as low as 30% in some field tests) [4,5]. This may be due, at least in part, to the difficulty of standardising field personnel in the use of the five-degree classification [33], which resulted in under-estimating small thyroid enlargement [34]. The new classification of goiter in only three degrees of severity may overcome this difficulty. In our study, palpation identified 94% of the women whose thyroid enlargement was confirmed by ultrasound. Likewise, 85% of women without thyroid enlargement were classified as such by the clinical examination. The efficiency of the test, which was 80% in our study, may be due to a greater ease of identifying thyroid size in the adult (when compared with small children), as well as to the high inter-observer concordance (kappa > 70%). The likelihood ratio meant that it was 4.7 times more likely that a woman with an enlarged thyroid that is palpable and visible would actually have endemic goiter.

When both field tests were combined in parallel, identifying a case as positive when either of the two or both of them were positive, 100% of women with goiter were properly identified, as well as 56% of women with normal thyroid volume. In this case, the likelihood ratio showed that a woman was 3.44 times more likely to have endemic goiter when either test was positive than when both tests were negative.
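
As a rough cross-check of the parallel combination, the standard formulas for two tests whose errors are independent can be applied to the single-test values reported in the Results. Independence is an assumption made here for illustration only; the paper does not state how the combined values were obtained, and the function names are hypothetical.

```python
# Sketch: expected sensitivity and specificity of two tests combined in
# parallel (positive if either is positive), ASSUMING the tests err
# independently of each other given the true thyroid status.


def parallel_sensitivity(sens_a, sens_b):
    # A true case is missed only if both tests miss it.
    return 1 - (1 - sens_a) * (1 - sens_b)


def parallel_specificity(spec_a, spec_b):
    # A non-case is correctly negative only if both tests are negative.
    return spec_a * spec_b


# Single-test values from the Results: palpation and urinary iodine
# at the 100 ug/L cut off.
combined_sens = parallel_sensitivity(0.94, 0.41)
combined_spec = parallel_specificity(0.80, 0.70)

print(f"combined sensitivity = {combined_sens:.3f}")
print(f"combined specificity = {combined_spec:.3f}")
```

Under this assumption the combined sensitivity is about 0.96 and the combined specificity 0.56, close to the 97% and 56% reported for the parallel combination in Table 3.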

Where there is concern about the training of field enumerators in the identification of thyroid size, or where inter-observer agreement may be low, it would be advisable to include the determination of urinary iodine excretion to identify the severity of the deficiency. On the other hand, if the training of field workers is likely to provide good identification of thyroid enlargement, the extra cost and difficulties of adding a second test would not be necessary. We would not, however, recommend the use of urinary iodine excretion alone.

In summary, proper identification of populations of pregnant women at risk of iodine deficiency will allow proper public health measures to be implemented. In several countries, salt iodination has proved to be the most effective intervention to lower iodine deficiency disorders [24,35]. Parenteral or oral administration of iodised oil has also been shown to lower clinical manifestations of iodine deficiency, to lower morbidity and to increase survival of the infant [10,11,36–38]. Other public health care measures, such as nutrition education and feeding of iodised salt to animals, will also help to lower prevalence of goiter in endemic areas.

Conclusion

In summary, identification of thyroid enlargement with the new three-degree classification has a higher sensitivity than the old classification using five degrees of enlargement (94% vs 38%, respectively), provided there is good training and acceptable inter-observer agreement. The high sensitivity and specificity of the new classification of goiter in three degrees may also enable its use in clinical studies to identify individuals at risk of thyroid enlargement. On the other hand, the high specificity of urinary iodine excretion (even when used by itself) may be important in areas where there is low or an unknown prevalence of goiter. Our results suggest that the commonly used cut off point of 100 μg I/L may under-estimate the prevalence of iodine deficiency disorders when used during pregnancy. Our sample distribution suggests that 120 μg I/L may be a more appropriate cut off for pregnant women. While urinary iodine excretion should not be used at an individual level to identify iodine deficient subjects, it may be useful to classify the degree of severity of iodine deficiency found at the population level. When both tests are combined, sensitivity increased up to 100%.

Acknowledgements

The authors thank the Program Against Micronutrient Malnutrition (PAMM) for their participation, and the Department of International Health, Rollins School of Public Health, Emory University, Atlanta, Georgia, for the loan of the ultrasound machine used in this study, as well as the required materials. The authors also thank Dr Iván Mendoza Perdomo (CeSSIAM, Guatemala), who trained two of the authors in the use of the ultrasound machine.

This study was supported by grants from CONACYT (National Science and Technology Council, Mexico) and received ethical approval from the Instituto Mexicano del Seguro Social Institutional Review Board Committee in Hidalgo.
