Feasibility and repeatability of ocular biometry measured with IOLMaster 700 in a large population‐based study

To evaluate the feasibility and repeatability of IOLMaster 700 biometry measurements in an adult population. Furthermore, to assess the value of the Quality Indicators (QIs) provided by the device.

Corneal thickness plays a role in intraocular pressure evaluation in glaucoma. Basic research studies also depend on accurate and reliable ocular biometry measurements, for example to study the growth of the crystalline lens throughout life.
Various methods are currently available for the measurement of ocular biometry, specifically optical low-coherence reflectometry, swept-source optical coherence tomography (SS-OCT) and Scheimpflug technology. The repeatability of ocular biometry has been reported for a variety of devices which use these technologies. The following summary is based on data extracted from 27 publications, which express the repeatability of ocular biometry as within-subject standard deviation (SD). This within-subject SD is derived from a one-way analysis of variance (ANOVA) as the square root of the within-groups mean square. 1 Repeatability for axial length (AL) ranged from 4 to 50 μm with a median value of 12 μm; for central corneal thickness (CCT), it ranged from 0.7 to 20.3 μm with a median of 2.8 μm; for anterior chamber depth/aqueous depth (ACD/AQD), it ranged from 3 to 134 μm with a median of 20 μm; and for keratometry (mean K), it ranged from 0.01 dioptres (D) to 0.21 D with a median of 0.09 D. These investigations studied healthy subjects (from 19 to 91 years old) or those prior to cataract surgery, comprised between 29 and 119 subjects per study and included one meta-analysis. 28 Additionally, we refer to a recent review of ocular biometry using SS-OCT by Montés-Micó. 29 The spread of repeatability results in the above-mentioned publications is substantial. Furthermore, there is a need for studies with larger sample sizes obtained from a randomly selected population-based sample. Therefore, our aim was to evaluate the repeatability of ocular biometry measurements using the IOLMaster 700 carried out as part of the Leipzig Research Centre for Civilization Diseases (LIFE) Adult-Study. 30,31 A further aim was to provide reference values for the repeatability of the IOLMaster 700 in a normal population.
As part of the data analysis, unreliable data can be identified as outliers by comparison of the SD of the repeated measurements for an individual subject with the repeatability limits obtained here. We refer to Rauscher et al. 32 as an example of such an application of repeatability limits.
Prior to investigating repeatability, it is important that the device being evaluated has the capability to measure the quantity being examined. Therefore, an additional aim of the current analysis was investigation of the feasibility of the biometry measurement. Feasibility in this context means the measurement success. For this purpose, we used the Quality Indicator (QI) provided by the IOLMaster 700 as a quality check and guide for the operator. This QI can also give a warning with a recommendation to repeat the measurement. Hence, a further goal was to investigate how taking these recommendations by the QI into consideration could improve the repeatability of the findings.

Device
Ocular biometry was measured with the IOLMaster 700 (Carl Zeiss Meditec, zeiss.de/meditec/produkte.html). This device employs swept-source frequency domain optical low-coherence tomography with a tuneable laser having its wavelength centred at 1055 nm. It allows a scan depth of 44 mm and tissue resolution of 22 μm. 4 This technology provides the following parameters: CCT; ACD, defined as the distance between the front surface of the cornea and the front surface of the lens; aqueous depth (AQD), defined as the distance between the back surface of the cornea and the front surface of the lens; and lens thickness (LT). Keratometry (K) is assessed with 18 concentric points projected onto the central cornea, covering three zones with 1.5, 2.5 and 3.5 mm diameter. Output is given as the flat anterior corneal radius of curvature (R1, mm), flat anterior keratometry (K1 in D), steep anterior corneal radius of curvature (R2, mm) and steep anterior keratometry (K2 in D).
The device acquires six meridional OCT scans that are repeated internally three times, and their mean is shown as the output on the screen, the printout and in the data export. The K element acquires 15 images of the concentric points on the cornea and exports their mean. All measured outputs are accompanied by their SD and a signal QI symbol, which we will refer to as the QI. This QI has 3 levels: QI = 1 'Signal quality OK' for a successful measurement, QI = 2 'Signal quality marginal' for a successful measurement with a warning (operator decides if measurement is used/included) and QI = 3 'Measurement failed' for a failed measurement (with no measurement output by the device). Level 1 shows a checkmark on a green circular background, level 2 shows an exclamation mark on a yellow circle and level 3 shows the letter x on a red circle.
Room lights were switched off during all measurements. While the light level was the same for each participant, no quantitative control of the illumination level was implemented. Measurements were obtained sequentially at one sitting, with a brief moment between each one to save the measurement, during which the participants were allowed to sit back from the instrument. Participants remained seated during all repeated measurements and the head and chin were re-positioned before each measurement. Participants placed their chin on a chin rest and pressed their forehead against a forehead strap. Before the measurement, participants were asked to blink to spread an optically smooth tear film over the cornea. The participants were asked to fixate on the internal target of the device. The device was aligned according to the guidelines described in the instructions provided, which essentially means that measurements were taken when the device software indicated correct alignment. This occurred when a yellow point appeared in the green horizontal and vertical adjustment rectangles and the user interface showed a green light on a traffic light icon. The measurement was performed in the manual mode by pressing the joystick button. The IOLMaster 700 prompts a mandatory calibration check daily at startup, which is executed semiautomatically. Software version 1.90.12.05 was used. The IOLMaster 700 also provides pupil diameter, the coordinates of the pupil centre and the white-to-white distance, which were not studied in the present work.

Key points
• The IOLMaster 700 showed good feasibility and repeatability to measure ocular biometry in our population-based sample.
• The Quality Indicators of the IOLMaster 700 should be considered to identify less optimal measurements, which should be repeated to achieve improved results.
• Our estimated repeatability limits for ocular biometry can be used objectively to identify outliers before subsequent data analysis.

Participants
The baseline examination of the population-based LIFE-Adult-Study was conducted at Leipzig University, Germany, between August 2011 and November 2014. 30,31 The LIFE-Adult-Study recruited 10,000 randomly selected participants from the population registry of Leipzig, a city in central Germany with more than half a million inhabitants, in an age- and sex-stratified fashion. The study was approved by the Ethical Committee at the Medical Faculty of Leipzig University (approval number: 263-2009-14122009) and adhered to the Declaration of Helsinki as well as all federal and state laws. Prior to inclusion, informed written consent was obtained from all participants. The current study analysed repeated measurements carried out as part of the 6-year follow-up examination of the LIFE-Adult-Study, which constituted the baseline visit for ocular biometry due to the inclusion of these measurements in the study protocol. The objective of the LIFE-Adult-Study was to investigate the prevalence, early onset markers, genetic predispositions and role of lifestyle factors of major civilization diseases, with primary focus on metabolic and vascular diseases, heart function, cognitive impairment, brain function, depression, sleep disorders and vigilance dysregulation, retinal and optic nerve degeneration and allergies. See Loeffler et al. and Engel et al. for details. 30,31 The LIFE-Adult-Study design comprised an age- and sex-stratified random sample of residents from the city of Leipzig, 40-79 years of age. A subset of 400 participants aged 18-39 years was also recruited. In total, 10,000 participants were planned as the target population. Address lists of randomly sampled citizens were provided by the residents' registration office of the city of Leipzig. 30,31 A random subsample having the same age distribution as in the main sample of 10,000 participants was drawn for the current study of ocular biometry, resulting in 1767 participants.
The age range of this subsample was 26-85 years, with 53% of participants above 70 years and having a median age of 71 years (more details are shown in Table 1). The optometric examinations were carried out between June 2018 and December 2020.
Participants were divided into those who were phakic or pseudophakic. Analysis for phakic lens state was also split into three groups based on the level of visual acuity (VA) and into three age groups. The three VA groups were <−0.10 logMAR, between −0.10 and 0.30 logMAR and >0.30 logMAR. This 0.30 logMAR limit was chosen based on the WHO criteria for mild visual impairment. Age groups were <55 years, between 55 and 70 years and >70 years of age. This 55-year age criterion was chosen to include a group of participants mostly without cataracts.

Inclusion/exclusion criteria
All participants with at least two repeated measurements for at least one variable were included. This resulted in a main group of 1767 subjects with two repeated measurements and a sub-group of 1331 subjects with three planned repeated measurements.
For the Feasibility and QI evaluations the following cases were excluded: wrong lens status entered by the operator (n = 15), previous corneal or vitreous surgery (n = 6), aphakia (n = 1) and phakic IOL (n = 1). For the Repeatability evaluations, we excluded all cases mentioned above for feasibility as well as cases with internally incorrect detection of the crystalline lens or IOL back surface (n = 12). These exclusion criteria were used for both the main group with two repeated measurements and the sub-group with three planned repeated measurements.
Only measurements with a QI score of 1 were included in the repeatability evaluations. All four QI from ocular biometry by swept-source OCT measurements (AL, ACD, LT, CCT) needed to be graded as such per eye for the results to be included in the present analysis. The QI for the keratometry measurements also needed to be at this level for all results of a subject to be included.
Applying the above inclusion and exclusion criteria, we had a total of 1744 right eyes which were analysed for feasibility (1450 phakic and 294 pseudophakic). A total of 1588 right eyes were analysed for repeatability (1365 phakic and 223 pseudophakic). All sample sizes are given in the flow chart (Table A1).

Statistics
Data from the right eye of each participant were used for all analyses. The variables AL, ACD, AQD, LT, CCT, R1 and R2 were taken from the IOLMaster 700. We calculated mean K, delta K, J0 and J45 as follows: K1 (dioptres) = 337.5/R1 (mm); K2 = 337.5/R2; delta K = K2 − K1. The power vectors J0 and J45 were calculated according to Thibos. 33 After statistical analysis of the individual power vectors, the effective astigmatic cylinder power was calculated from the J0 and J45 results. 33

The results are given for each variable as the percentage for each QI level, compared with the total number in the group. This was evaluated for the first, second and third measurements (if available). A measurement success rate is given, which is the sum of successful (QI = 1) and successful with warning (QI = 2) values obtained relative to all measurements.
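The keratometric conversions described in this section can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The 337.5 constant and delta K = K2 − K1 are taken from the text; the arithmetic mean for mean K, the `flat_axis_deg` input and the treatment of corneal astigmatism as a plus cylinder of magnitude delta K with its axis along the flat meridian are assumptions made for illustration, since the text does not state how the axis entered the power-vector calculation.

```python
import math

KERATOMETRIC_CONSTANT = 337.5  # from the text: K (D) = 337.5 / R (mm)


def radius_to_power(radius_mm):
    """Convert an anterior corneal radius of curvature (mm) to keratometric power (D)."""
    return KERATOMETRIC_CONSTANT / radius_mm


def keratometry_summary(r1_mm, r2_mm, flat_axis_deg):
    """Return K1, K2, mean K, delta K and the power vectors J0 and J45.

    Assumptions (not stated in the source): mean K is the arithmetic mean of
    K1 and K2, and astigmatism is a plus cylinder of magnitude delta K whose
    axis lies along the flat meridian (flat_axis_deg).
    """
    k1 = radius_to_power(r1_mm)  # flat meridian
    k2 = radius_to_power(r2_mm)  # steep meridian
    delta_k = k2 - k1
    mean_k = (k1 + k2) / 2.0
    theta = math.radians(flat_axis_deg)
    # Power-vector components in the form J0 = -(C/2)cos(2a), J45 = -(C/2)sin(2a)
    j0 = -(delta_k / 2.0) * math.cos(2.0 * theta)
    j45 = -(delta_k / 2.0) * math.sin(2.0 * theta)
    return {"K1": k1, "K2": k2, "mean_K": mean_k, "delta_K": delta_k,
            "J0": j0, "J45": j45}


def effective_cylinder(j0, j45):
    """Magnitude of the astigmatic cylinder recovered from J0 and J45."""
    return 2.0 * math.hypot(j0, j45)


result = keratometry_summary(r1_mm=7.5, r2_mm=6.75, flat_axis_deg=0.0)
print(result["K1"], result["K2"], effective_cylinder(result["J0"], result["J45"]))
```

Averaging J0 and J45 separately and converting back to a cylinder afterwards avoids the problem that cylinder axes cannot be averaged directly.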
When a measurement was accompanied by a warning by the QI, the device manual recommends that the operator should exclude this measurement and carry out an additional measurement. For a failed measurement, the device does not output any results, and thus, a replacement measurement needs to be performed. The QI evaluations for this investigation were only carried out for the phakic group and two repeated measurements. We compared two groups; group one included the first two measurements with both QI = 1 ('Both QI = 1') and group two included those with at least one of the first two measurements with QI = 2 ('One or two measurements with QI = 2'). First, the mean SD of both measurements was evaluated for each variable; the individual SD was then averaged across all participants. Next, the maximum difference between the two measurements was calculated for mean K and delta K. Here the frequency of cases with values >0.25 and >0.50 D was compared.
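The two comparison metrics described above (mean SD of the paired measurements and the frequency of large differences) can be sketched as follows. This is a hedged illustration only; the function names and the example values are hypothetical and are not taken from the study data.

```python
import statistics


def mean_pairwise_sd(pairs):
    """Average, across participants, of the SD of each participant's two measurements."""
    return statistics.mean(statistics.stdev(pair) for pair in pairs)


def fraction_exceeding(pairs, threshold):
    """Share of participants whose two measurements differ by more than `threshold`."""
    exceeding = sum(1 for first, second in pairs if abs(first - second) > threshold)
    return exceeding / len(pairs)


# Hypothetical mean K values (D) for two repeated measurements per participant.
both_qi_1 = [(43.00, 43.05), (44.10, 44.00)]
with_warning = [(43.00, 43.40), (44.00, 44.60)]

for label, group in (("Both QI = 1", both_qi_1), ("QI = 2 present", with_warning)):
    print(label,
          round(mean_pairwise_sd(group), 3),
          fraction_exceeding(group, 0.25),
          fraction_exceeding(group, 0.50))
```

For two measurements, the sample SD equals the absolute difference divided by √2, so both metrics describe the same spread on different scales.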
The repeatability evaluations were made with the participants divided into phakic and pseudophakic groups and only included QI = 1 measurements. This was performed for the main group with two repeated measurements and additionally for the sub-group where three repeated measurements were available.
Repeatability of the two and three measurements was expressed as the within-subject SD (S_w), which was calculated from a one-way ANOVA as the square root of the mean square within groups following the guidelines of McAlinden. 1 The use of the within-subject SD for more than two measurements was proposed by Bland and Altman. 34 We calculated a repeatability limit (r_w) as follows: r_w = 1.96 × S_w. This repeatability limit gives the interval within which 95% of measurements occurred. It can then be used to identify outliers in future studies. The method by McAlinden 1 focuses on the difference in measurements as for the original Bland-Altman approach 35,36 and includes a factor of √2. Our computation of the repeatability limit r_w omits the √2 factor as we employ the Pythagorean mean of SDs across participants. We prefer the concept of a 95% reference interval for the measurements themselves, as described in the International Organization for Standardization (ISO) 5725-2, 37 because we are not comparing two different measurement technologies. We also include the repeatability S_w, so that the reference interval can be computed in different ways, including using the √2 factor. 38

Proportions of feasibility were compared with the Pearson chi-squared test. Median VA and age were compared with the non-parametric Mann-Whitney U test. SDs and variances were compared using F-tests. The significance limit was set to 0.05, and for repeated tests, Bonferroni's correction was considered.
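As an illustration of the repeatability statistics described above, the following minimal Python sketch computes the within-subject SD as the square root of the within-groups mean square of a one-way ANOVA (subjects as groups) and the repeatability limit as 1.96 times that value. The function names and example data are hypothetical; this is a sketch under those assumptions and not the authors' code.

```python
import math


def within_subject_sd(measurements):
    """Within-subject SD (S_w) from repeated measurements.

    `measurements` is a list with one inner list of repeated values per subject.
    S_w is the square root of the within-groups mean square:
    sum of squared deviations from each subject's own mean, divided by the
    pooled degrees of freedom sum(n_i - 1).
    """
    ss_within = 0.0
    df_within = 0
    for repeats in measurements:
        if len(repeats) < 2:
            continue  # a single measurement carries no within-subject information
        subject_mean = sum(repeats) / len(repeats)
        ss_within += sum((value - subject_mean) ** 2 for value in repeats)
        df_within += len(repeats) - 1
    if df_within == 0:
        raise ValueError("at least one subject needs two or more measurements")
    return math.sqrt(ss_within / df_within)


def repeatability_limit(s_w, factor=1.96):
    """Repeatability limit r_w = factor * S_w (1.96 as used in this study)."""
    return factor * s_w


# Hypothetical axial length data in micrometres, two repeats per subject.
data = [[23450.0, 23456.0], [24100.0, 24092.0], [22980.0, 22981.0]]
s_w = within_subject_sd(data)
print(round(s_w, 2), round(repeatability_limit(s_w), 2))
```

Passing `factor=1.96 * math.sqrt(2)` gives the difference-based limit that includes the √2 factor, so both conventions mentioned in the text can be derived from the same S_w.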

RESULTS

Feasibility
We found very good measurement feasibility for phakic eyes, with first-measurement success rates of 99.7% for AL, 99.9% for CCT and ACD, 98.5% for LT and 97.6% for K. Feasibility for pseudophakic eyes was somewhat lower, with success rates of 99.3% for AL and CCT, 95.2% for ACD, 94.6% for LT and 96.6% for K. The percentage of measurements with warnings was low for AL, CCT, ACD and LT (between 1.7% and 3.2%) and higher for K (16.4%) in phakic cases (Figure 1, Table A2). Pseudophakic cases showed a significantly higher rate of measurements with warnings for ACD, LT and K (Table A2). The percentages provided above for the first measurement were very similar for the second and third repeated measurements (Table A2). When we split the phakic cases into three VA groups, we observed an increase in the percentage of cases with QI = 2 as VA declined, which was significant for AL, CCT and K. For K only, there was an increase in the percentage of failed measurements as VA worsened. Dividing the sample into three age groups had less impact on the distribution of QI. Only for K was a substantial and significant increase in the percentage of cases with warnings (QI = 2) observed with increasing age (Table A3).
Participants displaying measurements with warnings (QI = 2) had poorer median VA, compared with participants with QI = 1 measurements. This difference was significant for AL and K. Older age was also associated with more failed measurements (QI = 3). The median age was significantly higher with failed measurements for AL, LT and K.
For instance, successful first measurements of AL were obtained on all participants below 76 years of age (Table A4).
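The feasibility percentages reported above follow from simple counts of QI levels, with the success rate defined as the share of QI = 1 plus QI = 2 measurements. A minimal sketch, using hypothetical counts rather than study data, is:

```python
from collections import Counter


def qi_summary(qi_levels):
    """Percentage of measurements at each QI level plus the success rate.

    `qi_levels` holds one QI value per measurement:
    1 = successful, 2 = successful with warning, 3 = failed.
    """
    counts = Counter(qi_levels)
    total = len(qi_levels)
    summary = {level: 100.0 * counts.get(level, 0) / total for level in (1, 2, 3)}
    # Success rate as defined in the Statistics section: QI = 1 plus QI = 2.
    summary["success_rate"] = summary[1] + summary[2]
    return summary


# Hypothetical example: 8 successful, 1 warning, 1 failed measurement.
print(qi_summary([1] * 8 + [2] + [3]))
```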

Considering the QI
We evaluated the possible improvement in measurement variability if the operator considered the warnings (QI = 2) given by the IOLMaster 700 software and excluded these measurements. The following results include phakic eyes with two repeated measurements. Considering distance estimates in μm, the largest improvement was seen for AL with a mean SD reduction from 48 to 4 μm, followed by LT (from 17 to 5 μm), ACD (from 14 to 6 μm) and CCT (from 2.8 to 1.7 μm). Mean SD for mean K and delta K was reduced from 0.08 to 0.04 D and from 0.13 to 0.09 D, respectively (Figure 2 and Table A5). All comparisons in Table A5 were significant.
An additional method of evaluating the effect of the QI warnings is to calculate the maximum difference between two repeated measurements. For mean K, 9.4% of all phakic cases showed a difference ≥0.25 D when both QI = 1 and QI = 2 measurements were included, whereas this was only true for 1.9% of the cases when only QI = 1 measurements were included. The reduction in delta K was from 26% to 13%. Similar reductions, albeit of a lower magnitude, were achieved when taking a threshold of 0.50 D for the mean K and delta K results (Figure 3).

Repeatability
Repeatability was evaluated for the main group of cases with two repeated measurements and for the sub-group with three repeated measurements. Repeatability was very good for phakic eyes with two measurements and QI = 1, with values around 8 μm for AL, ACD, AQD and LT and 2.3 μm for CCT. Repeatability for the flat and steep anterior corneal radius of curvature showed somewhat higher values, that is, ≈16 μm for R1 and R2. This resulted in a repeatability of 0.07 and 0.12 D for mean K and delta K, respectively (Figure 4 and Table A6). Repeatability after two and three measurements was not significantly different for AL and CCT. In contrast, it was significantly different for ACD and LT. When considering Bonferroni's correction for a repeated test, the repeatability after two and three measurements was significantly different for R1, but not for R2 (Table A6).

FIGURE 1 Feasibility of the first measurement based on the Quality Indicator (QI) as percentage of all cases for axial length (AL), central corneal thickness (CCT), anterior chamber depth (ACD), lens thickness (LT) and keratometry (K). Data for phakic eyes only. Numeric values are given in Table A2.
In pseudophakic eyes, repeatability was similar to phakic eyes for AL, CCT, R1, R2, mean K and delta K. It was significantly larger for ACD and AQD (72 μm) and LT (55 μm). The sub-group with three repeated measurements also showed very similar results compared with those having two measurements, except for ACD, AQD and LT in pseudophakic eyes (Table A6). Grouping for VA revealed an increase only for the repeatability of AL, from 5.02 μm (VA < −0.10 logMAR) to 10.53 μm (>0.30 logMAR). Across age groups, repeatability values were higher (that is, measurements were more variable) in the middle age group (55-70 years) for AL, ACD/AQD and LT (Table A7). Additionally, the results for the repeatability limit are proportional to the repeatability in Figure 4 and Table A6 and are summarised for the main variables in Table 2.

DISCUSSION

Feasibility
Measurement feasibility was very good in phakic eyes, with 97%-99% of measurements successful. These results are based on a sample of the normal aging population, which included eyes with incipient cataracts. Dense cataracts were absent, as medical care provided the option for surgery in this population. In pseudophakic eyes, it is more difficult to detect the front and back surface of the IOL during ocular biometry, especially in cases with posterior capsule opacification. The anterior, or more often the posterior, lens capsule could be opaque or there might be irregular remnants of the posterior capsule after YAG capsulotomy. Therefore, the feasibility for ACD/AQD and LT in pseudophakic eyes was lower, including more failures (Table A2). This is in line with previous research on pseudophakic eyes which showed that very clear surfaces of the IOL were sometimes not detected by the device, introducing measurement error for the LT variable. 39 Non-detection of the anterior lens surface leads to unsuccessful measurement data for ACD/AQD and LT, whereas non-detection of the posterior lens surface most likely leads to missing values for LT alone. Hui and co-workers 40 reported that 38 of 160 eyes in cataract patients could not be measured with the Lenstar LS900 in comparison with 28 eyes using the IOLMaster 500. In addition to dense cataracts, poor fixation and difficulty co-operating with the examination were also causes for failed measurements. 40 Hirnschall et al. 41 found a success rate of 99.5% with the IOLMaster 700 in patients scheduled for cataract surgery when counted on a per-eye basis. This compares well with the 99.7% success rate found here for AL, although this was counted on a per-subject basis in a normal aging population. Measurement feasibility for keratometry was lower in both phakic and pseudophakic eyes as compared with the results for the OCT-based measurements (Table A2).
Any alteration of the ocular surface conditions, especially the tear film, may make obtaining these measurements more difficult. Therefore, dry eye might be a cause of a reduced QI score for keratometry readings, especially in pseudophakic eyes after cataract surgery.
Measurement feasibility of AL, CCT and LT was not affected by age. Participants with failed measurements (QI = 3) had somewhat worse VA and may have experienced difficulties seeing the fixation target inside the IOLMaster 700. More frequent dry eye symptoms in elderly participants could have been responsible for reduced feasibility of their keratometry measurements, which depend on good reflectivity from the anterior surface of the cornea.

Considering the QI
Our evaluations with respect to the QI showed that removing measurements with warning (QI = 2) improved the repeatability of the biometric result substantially. This was particularly notable for AL with a 92% reduction in the mean SD, but also for ACD and LT (58% and 73% reduction of mean SD, respectively; Figure 2, Table A5). This is of clinical relevance and illustrates the importance of considering the quality warnings given by the device.
Combining measurements with warnings (QI = 2) and failed measurements (QI = 3) for keratometry comprised 18.8% of the findings (see Table A2; 16.4% and 2.4% were QI = 2 and QI = 3, respectively), which means that an extra measurement would be needed in only about one in every five eyes. When this extra measurement is taken, these cases would yield the excellent repeatability results for 'only QI = 1 included' (Table A6). With a small extra workload in 18.8% of cases, keratometry measurements could be improved by reducing measurement uncertainty (mean SD) by between 35% and 48% (Table A5). The percentage of cases with differences between the measurements of mean K and delta K of 0.25 D or 0.50 D could be reduced by between 50% and 85% when the QI is considered (Figure 3).

TABLE 2 Repeatability limit expressed as 1.96 times within-subject standard deviation for phakic eyes.

Repeatability
We have provided results for those undergoing two repeated measurements, as well as for a sub-group with three repeated measurements, and did not find significant differences for AL or CCT (Table A6). If the IOLMaster 700 evaluations are only intended to measure AL and CCT, then taking two rather than three repeated measurements will reduce the overall time for the procedure. As with feasibility, repeatability for ACD/AQD and LT in pseudophakic eyes was somewhat compromised. Therefore, the operator needs to take extra care when examining the OCT B-scan after each measurement to check if the optical surfaces, especially the IOL, have been identified correctly in these cases.
Visual acuity >0.30 logMAR reduced repeatability of the AL measurements. This might be due to fixation difficulties or disease-related changes in the retina. However, the repeatability values remain very low: 5 μm for VA <−0.10 logMAR and 10 μm for VA >0.30 logMAR (Table A7). The somewhat higher values of repeatability in the middle-aged group might be caused by more incipient cataracts in this age range. The younger age group are unlikely to have developed cataracts, while the older participants will either have undergone cataract surgery or retain relatively clear lenses not requiring surgery.
These results for repeatability are in line with previous reports using the IOLMaster 700. For example, regarding AL and ACD/AQD, our value of 8 μm compares closely with previously reported findings between 5 and 10 μm. 4,8,9,11,13-15,17,25 It is also consistent with median values of 12 μm (AL) and 20 μm (ACD/AQD) found across different devices in the 27 studies cited in the Introduction. For mean K, our value of 0.07 D compares well with values between 0.02 and 0.07 D from previous investigations using the IOLMaster 700, 9,13,15,25 and with a median value of 0.09 D found across different devices in the studies mentioned in the Introduction.
A literature review showed that the median repeatability for AL in healthy eyes and those with cataract was 10 and 16 μm, respectively, while for ACD/AQD the respective values were 20 μm and 19 μm. For mean K the median reported repeatability was 0.09 D in both healthy and cataractous eyes (data obtained from the 27 studies cited in the Introduction). The mean repeatability of AL in our group of participants <55 years of age (no cataract) was 5 μm, which is lower than the median value of 9 μm reported in four studies using the IOLMaster 700 prior to cataract surgery. 4,9,11,14 This indicates a difference in repeatability in IOLMaster 700 measurements in eyes with and without cataract; however, both values were very low for AL measurements. For future studies in adults <55 years, we suggest that the lower repeatability limit of (1.96 × 5) μm should be applied to filter out outliers. For cataract patients, the larger limit of (1.96 × 9) μm may be used.
Errors in the refractive outcome of cataract surgery are mostly influenced by variations in AL and anterior keratometry (mean K). These relationships depend upon the magnitude of AL and the applied IOL formula. As a rough estimate, taking our repeatability limit for AL of 16.35 μm (Table 2) and the value of 2.67 D/mm for vitreous chamber depth from an error propagation analysis by Ribeiro et al. 42 will result in an error of 0.04 D for refraction after cataract surgery. Variations in keratometry will also have a direct effect on the refractive error outcome; in our case 0.14 D was the repeatability limit for mean K (Table 2).
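The rough estimate above is a unit conversion followed by a multiplication, which can be reproduced with the short sketch below. The two input values are taken from the text; the function name is hypothetical, and the linear scaling is the same simplification used in the estimate itself.

```python
AL_REPEATABILITY_LIMIT_UM = 16.35  # repeatability limit for AL (Table 2), micrometres
REFRACTION_PER_MM = 2.67           # D per mm, value attributed to Ribeiro et al. in the text


def refractive_error_from_al(al_error_um, dioptres_per_mm=REFRACTION_PER_MM):
    """Approximate refractive error (D) caused by an axial length error (micrometres)."""
    al_error_mm = al_error_um / 1000.0  # convert micrometres to millimetres
    return al_error_mm * dioptres_per_mm


error_d = refractive_error_from_al(AL_REPEATABILITY_LIMIT_UM)
print(f"{error_d:.2f} D")  # prints "0.04 D"
```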
For quality assurance of the obtained results, evaluation of measurement performance is important. As first suggested by McAlinden and co-workers 1 for biometry and emphasised for other data on children's eyes, 43 the current study presents repeatability limits to enable future research employing such boundaries as a first step of analysis for quality assurance. Our calculated repeatability limit provides the ranges within which 95% of measurements lie (Table 2). Following this, we assume inconsistencies in the finding if the value lies outside the 95% repeatability limit. We suggest this rule be applied ex post to the measurement data for subsequent analyses with the IOLMaster 700. Thus, data are only used for analysis if their measurement repeatability lies within the established limit for this variable. In this way, the repeatability limit can be used as a quality cut-off to accept a measurement for subsequent evaluation. Future studies or clinical measurements can implement repeatability limits based on the current large dataset. Such reference limits will help to reject inconsistent results before subsequent analysis or in clinical practice to indicate if measurements should be repeated. 38
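One possible way to apply such a quality cut-off ex post is sketched below: the SD of each subject's repeated measurements is compared with the repeatability limit for that variable, and subjects exceeding it are flagged. The function name, the subject identifiers and the example values are hypothetical; only the AL limit of 16.35 μm comes from the text.

```python
import statistics


def flag_outliers(repeats_by_subject, limit):
    """Flag subjects whose SD of repeated measurements exceeds the repeatability limit.

    `repeats_by_subject` maps a subject identifier to its repeated measurements,
    given in the same unit as `limit`. Subjects with fewer than two measurements
    are skipped because no SD can be computed for them.
    """
    flags = {}
    for subject_id, repeats in repeats_by_subject.items():
        if len(repeats) < 2:
            continue
        flags[subject_id] = statistics.stdev(repeats) > limit
    return flags


AL_LIMIT_UM = 16.35  # repeatability limit for AL from Table 2, micrometres

# Hypothetical axial length measurements in micrometres.
axial_lengths = {
    "subject_a": [23450.0, 23455.0],  # SD of about 3.5 um, within the limit
    "subject_b": [24100.0, 24200.0],  # SD of about 70.7 um, outside the limit
    "subject_c": [22980.0],           # single measurement, skipped
}
print(flag_outliers(axial_lengths, AL_LIMIT_UM))
```

Flagged cases could then be excluded from analysis or, in a clinical setting, prompt a repeated measurement.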

CONCLUSIONS
In our population-based sample, the IOLMaster 700 was able to collect data for AL, CCT, ACD, LT and keratometry from the vast majority of the eyes examined. Considering the built-in QI improved measurement variability substantially. Repeatability measurements indicate that clinically meaningful changes can be detected reliably with this instrument. When comparing two to three repeated measurements, the level of repeatability for AL and CCT was the same.

Abbreviations: ACD, anterior chamber depth; AL, axial length; AQD, aqueous depth; CCT, central corneal thickness; D, dioptres; K1, flat anterior keratometry; K2, steep anterior keratometry; LT, lens thickness; R1, flat anterior corneal radius of curvature; R2, steep anterior corneal radius of curvature; S_w, within-subject standard deviation.

TABLE A6
Repeatability expressed as the within-subject standard deviation for phakic and pseudophakic eyes. Note: Results for two and three repeated measurements, all with Quality Indicator = 1. F-test compares the variance of repeatability between three and two measurements in phakic eyes and variance of repeatability of two measurements between phakic eyes and pseudophakic eyes. A significance limit of 0.025 needs to be considered because of a repeated F-test. Tests were only done for primary variables R1 and R2 and not for derived keratometry variables to avoid additional repeated tests.