Improving population‐level refractive error monitoring via mixture distributions

Sampling and describing the distribution of refractive error in populations is critical to understanding eye care needs, refractive differences between groups and factors affecting refractive development. We investigated the ability of mixture models to describe refractive error distributions.


INTRODUC TION
Understanding the distribution of refractive error within a population promotes optimal eye care planning and delivery. Description of the refractive error distribution within population sub-groups additionally enables explorations of the refractive development effects of inherent variables such as age or sex, exposures such as screen time or sleep patterns and interventions such as mandated time spent outdoors or topical low-dose atropine. Description of the entire refractive error spectrum in a defined population can improve our understanding of the recent and dramatic increases in myopia prevalence seen in some communities 1 and enable work towards the World Health Organization's recommendation for integrated people-centred eye care. 2 While the population distribution of neonatal refractive error is essentially normal, 3 it is widely accepted that the distribution of refractive error develops a marked kurtosis giving an excess of emmetropia and skewness toward myopia. 4,5 The emergence of both kurtosis and skewness in development can be explained by emmetropisation, in which ocular component growth is modulated by visual experience. 4,5 The non-normality raises a question regarding the best way to describe the distribution. It has been standard to simply report refractive error prevalence at specific cutpoints -for example, prevalence of myopia ≤ −0.50 D was 25%. Summary statistics are also common -for example, the mean spherical equivalent refraction (SER) was +0.12 D with a standard deviation of 1.93. Neither of these standard methods deals with the significant deviation from normality, potentially introducing inaccuracy and bias into any related analysis -for example, analysis of the influence of gender on myopia development. Efforts to understand the genetic and environmental mechanisms that determine refractive error may be obscured by the standard assumption of a normal distribution of refractive error in populations.
Single asymmetric distributions are difficult to apply to refractive error data, create difficulties comparing subgroups using standard statistics and do not fit mechanistic concepts of refractive development. 5 Mixture models, where component distributions are weighted and added to match the overall observed distribution, have been used in other fields and have potential to overcome these issues. 6 Three studies have proposed mixture models for refractive data, 5,7,8 but the proposals have not been widely adopted. Thorn 7 and Flitcroft 5 both focussed on the underlying mechanisms that might explain the success of describing refractive error distributions by adding two Gaussian (normal) components. Rozema et al. 8 tested mixtures of up to six Gaussian components; however, their use of clinical data potentially introduces selection bias that clouds the generalisability of the model conclusions.
No further investigation of the statistical accuracy, appropriateness or even use of mixture modelling to describe refractive distributions has been found. We explored mixture modelling options for describing population-based refractive distributions and compare the accuracy of the models identified.

METHODS
We identified publicly available raw datasets of refractive error in defined populations via literature search, web search and key informant advice. Our key informants were a range of experts with knowledge of refractive error epidemiology work around the world. We also identified published descriptions of refractive error distributions via a systematic search strategy. 9 The published descriptions were most commonly graphs of density or frequency versus refractive error, which we digitised into binned datasets using the juicr package for R (rdrr.io/github/mjlajeunesse/ juicr/). Our systematic search strategy was built on key concepts of "refractive error", "epidemiology" and "population". 9 Inclusion criteria were population-based sampling representative of clearly defined communities that quantified distance refractive error, clearly described the use (or not) of cycloplegic agents and reported sample size and participation rate. Exclusion criteria were self-reported refractive error and data from populations defined by disease state or condition. 9 K E Y W O R D S epidemiology, hyperopia, myopia, population distribution, refractive error, statistical models

Key points
• Efforts to understand refractive error development and improve eye care access are hampered by the current standard of describing refractive error distributions by summary statistics such as mean, variance and/or prevalence. • Two-and three-Gaussian-component mixture models can accurately describe accurately a wide variety of population-based refractive error distributions, and the ex-Gaussian additive model is a potentially useful alternative. • Our recommended hierarchy for reporting refractive error population data: (i) repository of de-identified raw data or described by (ii) two-or three-component Gaussian mixture models, (iii) high-quality frequency distribution or (iv) mean, standard deviation, skewness and kurtosis.
Mixture models combine component distributions via weighting to describe the overall distribution observed in epidemiologic research. Component distributions can take any form, but Gaussian (normal) components are appealing in the refractive case as they are the easiest to relate to underlying processes. 5,7 A Gaussian mixture of Κ components has the form: where  , 2 is a normal distribution with a mean of and a variance of 2 , i is the mixing weight for component i such that and is a vector of parameters to be estimated. The expectation-maximization (EM) algorithm is often used to find parameter estimates for the Κ mean/ variance pairs i , 2 i and K−1 mixing weights. A threecomponent Gaussian mixture, for example, has eight parameters that need to be estimated.
Additionally, we noted that adding an exponential component to a Gaussian component -that is, using the ex-Gaussian additive distribution -would generate the skewness and kurtosis common in population distributions of refractive error, potentially providing an accurate model. 6 In its natural form, the ex-Gaussian additive distribution generates a positive skewness between zero and two. This level of positive skewness is commonly seen in populations with low levels of myopia, such as preschoolage Europeans. To generate the negative skewness common in populations from school age onwards, the sign of all SER details was reversed to enable the exponential component in the model to match the observed skewness.
The ex-Gaussian distribution is the sum of independent normal and exponential random variables, Z = X + Y , where X is a normal distribution  , 2 with mean of and variance of 2 and Y is an exponential distribution with mean λ. The three parameters , 2 and can be estimated from the sample mean, variance and skewness of the observed data. 6 When skewness of the observed distribution was negative, myopia was denoted as "+" while hyperopia was denoted as "−" to enable the model to match the observed skewness.
We worked in the R environment (r-proje ct.org, and rstud io.com), which provides adaptable options for fitting mixture distributions via numerous freely available packages including gamlss, MixTools, FlexMix, FPC, Mclust, MixReg, MixDist and MixR. 10,11 MixR (github.com/GaryB AYLOR/ mixR) was our first preference for fitting Gaussiancomponent mixture distributions because it is able to fit both raw and binned data and provides a wide range of fitting accuracy estimates, using the EM algorithm. 12 The Generalized Additive Models for Location, Scale and Shape (GAMLSS, gamlss.com) and e1071 (cran.r-proje ct.org/web/ packa ges/e1071) packages were used to fit ex-Gaussian additive distributions.
The fitting algorithms and comparison statistics are more powerful when modelling raw data compared with binned data. So, the raw, all-ages datasets were modelled to compare fitting accuracy by visual observation, then by Bayesian information criterion (BIC). BIC is based on the likelihood function and is closely related to the Akaike Information Criterion but aims to prevent overfitting by penalising additional parameters in the model. 13 The lowest BIC indicates the simplest, most accurate model.
The three models considered most useful by a combination of BIC and adaptability were then used to estimate the prevalence of refractive error at four representative refractive cut-points. The prevalence estimates from these three models, and from a single-Gaussian distribution as the accepted standard, were compared with observed data. This modelling and comparison were repeated with the all-ages raw data, the raw data divided into age decades and all binned datasets identified by our systematic search strategy. The four representative refractive cut-points were high myopia (SER ≤ −5.00 D), low myopia (SER ≤ −0.50 D), low hyperopia (SER ≥ +0.50 D) and high hyperopia (SER ≥ +4.00 D) when modelling the raw data. We matched these cut-points as closely as possible when modelling the binned data but adjusted each as required to maximise accuracy in response to the bin boundaries provided by the publications. That is, when the published bin boundaries did not match our representative cutpoints, we used cut-points at the bin boundary closest to our representative cut-points. The combination of raw and binned datasets enabled assessment of the effect of data scarcity and diversity.

Identification of refractive error datasets
The National Health and Nutrition Surveys (NHaNES) of the United States and South Korea were the two publicly available repositories of raw refractive error data found. 14,15 Each is population-based, sampling multiple thousands of participants from across the respective countries, and each has been repeated multiple times. US NHaNES samples all ages, with vision data available for those aged ≥12-years from the 1971-1974, 1999-2000, 2001-2002, 2003-2004, 2005-2006  While the raw data are not nationally representative, nationally representative prevalence data can be derived via various adjustments based on demographic information such as age, sex, location, ethnicity and education. However, by their size, diversity and systematic sampling and testing, both NHaNES datasets provide useful opportunities for testing refractive distribution modelling. We used data An additional benefit of these datasets is the refractive diversity between South Korea and the United States. For example, myopia prevalence (≤−0.50 D) in 20-to 29-year-old participants is 80.7% in South Korea compared with 47.7% in the United States. The diversity is a useful challenge for the models. Another benefit of NHaNES data is that participant numbers are large enough to be analysed by age decade, which reveals further diversity of refractive distributions due to age and/ or cohort effects.
The raw refractive data obtained from both US (right eye data from 6034 participants) and Korean (right eye data from 4473 participants) NHaNES were in sphero-cylindrical form. We converted this to SER by adding the sphere and half the cylinder power.
Our systematic search strategy identified 665 published papers with population-based refractive error data from clearly defined communities. All 21 Global Burden of Disease study regions were represented by at least one study, demonstrating geographic spread. 16 The highest number of studies were from the East Asia, Western Europe and North Africa-Middle East regions. Data collection occurred between 1917 and 2020, although most occurred between 1999 and 2019. Single datasets are sometimes used by multiple papers and this duplication has not been removed. With this caveat in mind, there were over 6000 reports of refractive error prevalence, with many papers reporting for multiple sub-groups within their sample and at several refractive cut-points.
From the 665 papers publishing population-based refractive error data, there were 168 quantifications of the refractive error spectrum providing significantly greater information than simply refractive error prevalence at one or a few refractive cut-points. Fifty-three of these provided density or frequency distributions of refractive error that could be arranged into ≥15 refractive bins with clear definitions and method descriptions. The average number of bins was 32, with a standard deviation of 21. There was no dataset duplication in the binned refractive distribution subset, so the duplication within the broader search strategy did not affect our analysis of refractive distribution fitting model accuracy. NHaNES data published as binned datasets were excluded from this analysis to avoid replication with our analysis of the NHaNES raw data. The binned refractive data included were from 12 Global Burden of Disease regions: East Africa, 17 North Africa-Middle East, 18,19 South Asia, [20][21][22] East Asia, [23][24][25][26][27] South East Asia, 28 Oceania, 29 Tropical Latin America, 30 North America High Income, [31][32][33] Asia Pacific High Income, [34][35][36][37][38] Australasia, 4,39 Western Europe [40][41][42][43][44][45][46][47][48] and Eastern Europe. 46 All were published in SER form. Some publications grouped participants by features such as age, sex or ethnicity and provided refractive distributions for each group; we accepted each as they posed differing challenges in terms of mean, variance, skew and kurtosis.

Model comparison part 1 -visual observation and fitting statistics
Multiple mixed Gaussian fits and an ex-Gaussian fit of the all-ages US and Korean NHaNES data are shown in Figure 1. By visual observation, mixture distributions of two-to six-Gaussian components and the ex-Gaussian addition distribution all appear to describe the data better than the single-Gaussian model.
The BIC of each Gaussian mixture model fit of a refractive error distribution was recorded for the all-ages US NHaNES data, the all-ages Korean NHaNES data, seven separate age decade groups of the US NHaNES data and seven separate age decade groups of the Korean NHaNES data. The BIC consistently and significantly reduced, suggesting a better fit, from a single-Gaussian-component model to a two or more Gaussian-component mixture model. However, there were minimal differences between twoand six-component mixtures. Across the 16 data groupings, the BIC means and standard deviations relative to 1 for the single-Gaussian model were 0.89 ± 0.05, 0.88 ± 0.06, 0.89 ± 0.06, 0.89 ± 0.06 and 0.90 ± 0.06 for two-, three-, four-, five-and six-Gaussian-component models, respectively. Remembering that BIC penalises model complexity, these results do not necessarily mean that the three-Gaussiancomponent models achieved a more accurate fit than the more complex models. However, balancing accuracy and simplicity, and avoiding overfitting, the two-or three-Gaussian-component models are preferred.

Model comparison part 2 -accuracy of refractive error prevalence estimates
Another way to compare the accuracy of refractive error distribution models is to test their ability to estimate the prevalence of refractive errors at particular cut-points. Based on the BIC results for model-fitting of the US and Korean raw data across all ages and within specific decades, we selected the two-and three-Gaussian-component mixture models to compare with a single-Gaussian model as the accepted standard, and the ex-Gaussian additive model as an interesting alternative. The "observed" prevalence at each cut-point was given by the raw data count, while "modelled" prevalence was the estimate derived from a model. We plotted the difference between the observed and modelled prevalence, against the observed prevalence, at each cut-point for the all-ages distributions and each age decade distribution of both the US and Korean NHaNES raw datasets. The results for each model are shown in Figure 2.
From Figure 2, means ± standard deviations for observed prevalence minus estimated prevalence were −3.0% ± 6.3% for the single-Gaussian model, 0.5% ± 1.9% for two-Gaussian, 0.6% ± 1.5% for three-Gaussian and −1.8% ± 4.0% for ex-Gaussian model. Two-and three-Gaussian-component mixture models give modelestimated prevalence closer to the observed prevalence compared with the single-Gaussian or ex-Gaussian models. The ex-Gaussian model appears to have greater difficulty predicting the prevalence of low hyperopia than the prevalence of low myopia. However, it is worth noting that the ex-Gaussian is significantly more accurate than a single-Gaussian and can be used with less data input than multiple-Gaussian mixture models.
While Figure 2 shows that the US and Korean NHaNES datasets, grouped as all-ages and by age decade, provide a wide range of prevalence (0.1% to 81%) and skewness (−3.6 to −0.2) to challenge the models, it is still valuable to test model performance across a wider range of data types. To do this, we next used the high-quality binned refractive distributions in published papers identified by our systematic search strategy to estimate refractive prevalence at four refractive cut-points. These data varied greatly in participant age, ethnicity and location, as well as the number of bins into which the refractive distributions were divided. Figure 3 shows the accuracy of the single-Gaussian model was worse than each of the other models: means ± standard deviations for each model were −2.3% ± 7.3% for single-Gaussian, 0.4% ± 3.5% for two-Gaussian, 0.4% ± 3.3% for three-Gaussian and −1.2% ± 5.4% for ex-Gaussian.

DISCUSSION
While there has been some debate between active versus passive modulation of refractive growth, 49 strong support has emerged over time for active emmetropisation driven by visual experience. 4,5,[50][51][52] After the major period of growth, development and emmetropisation has occurred, the complex population distribution of refractive errors that emerges is partly due to identifiable heterogeneity such as age, period, cohort, ethnicity, sex, education and urbanisation. However, even when participants are sub-grouped to isolate these factors, the distribution of refractive error in population-based samples remains nonnormal, presumably due to variations in refractive genetics and visual experience. Perhaps the clearest published examples of this are from military conscription surveys. 34,36 Our analysis demonstrates the ability of multiple-Gaussian mixture models to describe the refractive development heterogeneity within a wide variety of population samples. It is tempting to imply biological meaning in the relationship -for example, that each Gaussian component of a mixture model represents a specific variation in refractive genetics or aligns with clinically meaningful groups like myopia, emmetropia and hyperopia, with the variance of each component representing the effects of the range of F I G U R E 1 All-ages 2007-2008 US (left) and 2008 Korean (right) NHaNES data in white columns fitted with a one-to six-Gaussian mix (a-f, respectively, with components (Comp) as shaded areas and the mixture distribution shown as the black line) and an ex-Gaussian mix (g, with the shaded area showing the addition of the exponential and Gaussian components). visual experience. However, our analysis does not provide any proof of the underlying mechanism, only that mixed-Gaussian models are a reasonably accurate way to describe observed distributions of refractive error. Even so, the existence of a plausible underlying mechanism is reassuring when considering the use of mixture models in measuring and understanding distributions of refractive error. "All models are wrong" is a common statistics aphorism 53 but can less nihilistically be expanded to "All models are wrong, but some are useful". We have argued that there is a need to better understand the distribution of refractive errors in populations, and we have shown that mixture modelling is, while not perfect, a substantial improvement on the currently accepted standard of a single-Gaussian model. Although multiple-Gaussian mixtures outperform the ex-Gaussian additive model when raw data are available, the ex-Gaussian model still appears useful where there is limited binned distribution data and particularly when only summary statistics are available for a particular population. It is important to understand the imperfections of the models but that each can be useful for predicting the prevalence of refractive error at standardised refractive cut-points.
A major strength of our analysis is that all the refractive data, both raw and binned, were population-based. This overcomes the possible bias of clinical data such as that used by Rozema et al. 8 Even so, it is worth noting that Rozema et al. 8 came to similar conclusions from their analysis of clinical refractive data. Similarly, our findings appear consistently robust across the widely diverse range of population-based data we obtained. One limitation in our analysis is that by analysing SER, we have averaged out some additional complexities of refractive error such as astigmatism. A clear understanding of astigmatism is critical in planning and delivering integrated people-centred eye care, so methods of taking it into account in modelling are also needed.
The ability to analyse and compare data across different studies accurately will facilitate progress in understanding  refractive error development and improving eye care access and delivery. The systematic search strategy used here suggests most publications simply report refractive error prevalence at non-standardised cut-points, which are difficult to compare. We have also shown that it is inaccurate, and potentially biased and misleading, to assume normality in grouped comparisons. As such, it seems worth considering a hierarchy of data sharing and description that would maximise the power and potential of meta-analyses, while recognising that the first priority is not always possible. Our recommended hierarchy would be to publish: A high-quality frequency distribution, or binned datasets of 25 bins or more, describing the refractive error distribution/s. These generally need to be modelled (e.g., by multiple-Gaussian or ex-Gaussian mixture models) to be used accurately in meta-analyses. 4. The mean, standard deviation, skewness and kurtosis of the refractive error distribution/s, which would enable modelling using the ex-Gaussian additive distribution.

FU N D I N G I N FO R M AT I O N
This work was supported by an Australian Government Research Training Program scholarship and a public health grant from the Brien Holden Vision Institute (BHVI), Sydney, Australia.

CO N F L I C T O F I N T E R E S T S TAT E M E N T
No conflict of interest is declared.