Heterogeneity in children's reading comprehension difficulties: A latent class approach

Abstract
Background. Poor comprehenders are traditionally identified as having below-average reading comprehension, average-range word reading, and a discrepancy between the two. While oral language tends to be low in poor comprehenders, reading is a complex trait and heterogeneity may go undetected by group-level comparisons.
Methods. We took a preregistered data-driven approach to identify poor comprehenders and examine whether multiple distinct cognitive profiles underlie their difficulties. Latent mixture modelling identified reading profiles in 6846 children from the Avon Longitudinal Study of Parents and Children, based on reading and listening comprehension assessments at 8-9 years. A second mixture model examined variation in the cognitive profiles of weak comprehenders, using measures of reading, language, working memory, nonverbal ability, and inattention.
Results. A poor comprehender profile was not identified by the preregistered model. However, by additionally controlling for overall ability, a 6-class model emerged that incorporated a profile with relatively weak comprehension (N = 947, 13.83%). Most of these children had weak reading comprehension in the context of good passage reading, accompanied by weaknesses in vocabulary and nonverbal ability. A small subgroup showed more severe comprehension difficulties in the context of additional cognitive impairments.
Conclusions. Isolated impairments in specific components of reading are rare, yet a data-driven approach can be used to identify children with relatively weak comprehension. Vocabulary and nonverbal ability were most consistently weak within this group, with broader cognitive difficulties also apparent for a subset of children. These findings suggest that poor comprehension is best characterised along a continuum, and considered in light of multiple risks that influence severity.


INTRODUCTION
Poor comprehenders have difficulties with reading comprehension alongside relative strengths in reading accuracy. Their weaknesses extend beyond the written domain to listening comprehension, in line with the Simple View of Reading, which sees variation in both reading accuracy (often summarized as 'decoding') and listening comprehension as contributing to reading comprehension (Hoover & Gough, 1990). Consistent evidence points to underlying impairments in aspects of oral language (Landi & Ryherd, 2017). However, reading comprehension is the product of many cognitive operations (Castles et al., 2018), meaning there may be several routes to comprehension failure. Large-scale data-driven approaches are needed to identify and understand heterogeneity.

Oral language as a core deficit
Several studies converge on the finding that poor comprehenders have poor vocabulary relative to control children (e.g., Cain & Oakhill, 2006; Nation et al., 2004), and differences in grammatical processing and listening comprehension (Elwér et al., 2015) are indicative of oral language weaknesses more broadly. Given reciprocal influences across development (Verhoeven et al., 2011), low language may be a cause or consequence of poor reading comprehension. Retrospective longitudinal studies show that language differences are apparent prior to the onset of reading (Catts et al., 2006; Justice et al., 2013; Nation et al., 2010), and some causal role is further supported by the success of oral language interventions in improving reading comprehension in poor comprehenders (Clarke et al., 2010). In light of these findings, it is tempting to conclude that poor comprehenders have core deficits in oral language that lead to their difficulties with reading comprehension.

Beyond oral language
Despite group-level differences, not all poor comprehenders show vocabulary impairments (Colenbrander et al., 2016), and some studies find only weak transfer effects from oral language intervention to improvements in reading comprehension (Melby-Lervåg & Lervåg, 2014). These findings align with the more general conclusion that single deficit models of developmental disorders rarely hold up to the variability observed at an individual level (Astle & Fletcher-Watson, 2020; Pennington, 2006). Relatedly, the apparent specificity of language deficits might result from the tightly controlled group-match design used in experimental research. Typically, small groups of children matched closely on age, reading accuracy, and often nonverbal ability (restricted to be within the normal range) are compared in attempts to identify cognitive differences that characterise poor comprehenders. By minimising sources of variability within and between the groups under investigation, however, we likely miss the breadth of weaknesses that might accompany comprehension difficulties on an individual basis. Further, such recruitment constraints typically leave small numbers of participants for the comparisons of interest (often no more than 10-20 per group), resulting in low statistical power for detecting smaller effects. Thus, while language may be the most substantial difficulty experienced by the majority of poor comprehenders, milder or less consistent cognitive weaknesses may go undetected.
The proposal that poor comprehenders experience broader cognitive deficits is not novel. Reading comprehension places significant demands on other cognitive abilities, including attention (Cain & Bignell, 2014), working memory (Carretti et al., 2009), and reasoning skills, all of which may act as "pressure points" in the reading system (Logan & LARRC, 2017; Perfetti et al., 2014). While it is difficult to separate some of these effects from low language (Pimperton & Nation, 2010), documenting and understanding broader weaknesses is necessary for remediation as well as for theory. For example, mutualistic relationships between cognitive domains across development may leave long-lasting weaknesses that are not addressed by targeting the original "cause" of the disorder in intervention (Kievit, 2020). Conversely, co-occurring strengths in other domains may act as protective factors in minimising the severity of comprehension problems. Thus, there is a clear need for large-scale studies that capture relative strengths and weaknesses across several dimensions, and how they might differ within a heterogeneous population (Lervåg, 2021).

Research questions
We adopted a data-driven approach to identify and understand the nature of children's reading comprehension difficulties, using data from the Avon Longitudinal Study of Parents and Children (ALSPAC), a UK birth cohort study that began in the early 1990s. Participants (n = 6846) were assessed at 8-9 years on measures of reading-related skills: decoding, reading comprehension, and listening comprehension (hereafter referred to as "reading skills" for brevity).
We first asked whether profiles of reading skills in this cohort reflect the Simple View of Reading. We used latent profile analysis (LPA) to extract different classes of readers without imposing arbitrary thresholds, to better capture the dimensionality of reading difficulties.

Key points
• Poor comprehenders show poor reading comprehension in the context of adequate decoding. While oral language tends to be low in poor comprehenders, reading is a complex trait and heterogeneity may go undetected by group-level comparisons.
• Using a large sample and a preregistered data-driven approach, reading comprehension difficulties were best conceptualised along a dimension of overall reading skill, rather than as a distinct subgroup.
• The majority of weak comprehenders showed weaknesses in oral language and nonverbal ability. A minority had broader cognitive weaknesses and more severe comprehension problems.
• In line with a multiple risk framework, researchers and practitioners should look beyond language to identify the broader cognitive strengths and weaknesses of poor comprehenders.
If reading comprehension is the product of variation in reading accuracy and language comprehension, we anticipated at least four classes to emerge, reflecting relative strengths and weaknesses in these domains. Any additional classes were expected to further differentiate by levels of ability (as in Torppa et al., 2007).
Using the poor comprehenders identified by the first analysis, we fitted a second model that incorporated additional cognitive and behavioural measures (reading rate, vocabulary, working memory, nonverbal ability, inattention; referred to collectively as "cognitive skills") to test whether there are multiple distinct profiles within the group. We predicted that most would have language weaknesses, and that some would show additional weaknesses in nonverbal ability, working memory, and/or attention. We had no strong predictions over the extent to which these difficulties would co-occur or reflect distinct cognitive profiles.

Sample
The Avon Longitudinal Study of Parents and Children recruited 15,454 pregnancies in the former Avon area (UK) between April 1991 and December 1992, from whom 13,988 offspring were alive at one year. Later recruitment of eligible children at age 7 increased this total sample size to 14,901. The offspring have been studied ever since via a wide range of questionnaires and clinic assessments (Boyd et al., 2013; Fraser et al., 2013). The study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/).
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.
Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.
We report data from participants who completed the Neale Analysis of Reading Ability (NARA-II; Neale, 1997) during a clinic visit at age 9.5 years (n = 6935). Approximately 61% of eligible participants attended the clinic, which was influenced by sociodemographic factors: attendees were more likely to have mothers who were older, more highly educated, and homeowners, relative to eligible participants who did not attend. A higher proportion of eligible females versus males and white versus non-white children attended the clinic. To address non-independence, one twin from each pair (n = 89) was selected at random. Our final sample comprised 6846 children.
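The twin-selection step can be sketched as follows, purely for illustration. The record layout (`child_id`, `pair_id`) and the fixed seed are assumptions introduced here; they are not details of the ALSPAC files or of the study's scripts.

```python
import random

def keep_one_twin_per_pair(records, seed=0):
    """Keep every singleton and exactly one randomly chosen child per twin pair.

    `records` is a list of dicts with a hypothetical `child_id` and a
    `pair_id` that is None for singletons (assumed layout, for illustration).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    singletons = [r for r in records if r["pair_id"] is None]
    pairs = {}
    for r in records:
        if r["pair_id"] is not None:
            pairs.setdefault(r["pair_id"], []).append(r)
    # Draw one member from each pair at random.
    chosen = [rng.choice(members) for members in pairs.values()]
    return singletons + chosen

# Toy data (made up): 3 singletons and 2 twin pairs, i.e. 7 records.
toy = [
    {"child_id": 1, "pair_id": None},
    {"child_id": 2, "pair_id": None},
    {"child_id": 3, "pair_id": None},
    {"child_id": 4, "pair_id": "A"},
    {"child_id": 5, "pair_id": "A"},
    {"child_id": 6, "pair_id": "B"},
    {"child_id": 7, "pair_id": "B"},
]
kept = keep_one_twin_per_pair(toy)

# Sample-size arithmetic reported in the text: 6935 assessed, 89 twins dropped.
final_n = 6935 - 89
```

In the toy data, five records remain (three singletons plus one child from each pair), and the reported counts give the final analytic sample of 6846.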

Measures
Reading assessments were administered during a clinic visit at age 9.5 years, with listening comprehension and all other cognitive assessments at age 8.5. We describe manual-reported reliability for the full subtests for the relevant age. Analyses were based on raw scores, with age (months) included as a covariate unless otherwise stated.
Summary statistics are presented in Table 1, alongside correlations between measures in Table 2.

Decoding
Item accuracy. Children read 10 "made-up" words and 10 real words aloud, selected from Nunes et al. (2003). The two lists have test-retest reliabilities of 0.8 and 0.73, respectively. We intended to analyse words and nonwords as separate measures, but high word reading caused problems in higher-class models. Data were therefore summed into a single score (/20).
Passage accuracy. In the NARA-II, children read aloud passages of increasing difficulty. They are instructed to attempt difficult words (and errors are corrected by the administrator), and to read carefully despite the time being recorded. The number of errors is deducted from a manual-specified threshold value for each passage, and the resulting passage scores are summed to produce an overall score in line with standard scoring practices (Form 2; reliability = 0.87).

Comprehension
Reading. Children are told that they will be asked questions after reading each passage in the NARA-II. Questions are asked orally by the administrator, and children are permitted to look back at the text when making their responses. A high proportion of the questions required inferences from the text (86%; see Bowyer-Crane & …).
TABLE 1 Summary statistics for all reading and cognitive variables (across whole analytic sample; n = 6846).
Listening. This comprised alternate questions from the listening comprehension subtest of the Wechsler Objective Language Dimensions (WOLD; Rust, 1996). The tester read aloud paragraphs and children responded verbally to comprehension questions (/16; full subtest reliability = 0.84).

Analyses
Mixture models incorporate a categorical latent variable to identify subpopulations (latent classes or profiles) in the data. Analyses were conducted using Mplus v8.5 (Muthén & Muthén, 1998-2017) and the MplusAutomation package in R (Hallquist & Wiley, 2018). Plans were preregistered (https://osf.io/4zahf) following the template by van den Akker et al. (2019), with deviations noted throughout. All models used robust estimation to account for any non-normality in the data.
Missingness was predicted by variables already included in the planned analyses (Appendix S1, Table S1), and was dealt with using full information maximum likelihood. Scripts and annotated outputs are available (https://osf.io/zvjw4/); direct links to the final model details are provided in Appendix S4. Note that our analyses do not account for clustering within schools, and while we do not have individual-level data on the reading instruction that they received, the national curriculum at that time specified phonics instruction alongside other literacy activities.
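As a rough illustration of how a categorical latent variable assigns individuals to classes, the sketch below computes posterior class probabilities for a single score under a univariate Gaussian mixture with known parameters. This is a simplification: the models reported here are multivariate and were estimated in Mplus, and the class weights, means, and standard deviations below are invented for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_class_probs(x, weights, means, sds):
    """Posterior probability of each latent class given one observed score.

    Applies Bayes' rule: P(class k | x) is proportional to
    weight_k * density_k(x), normalised over all classes.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class model on a z-scored measure (all values made up).
weights = [0.3, 0.7]   # class proportions
means = [-1.0, 0.5]    # class-specific means
sds = [1.0, 1.0]       # class-specific standard deviations

probs = posterior_class_probs(-1.5, weights, means, sds)
# Modal assignment: the class with the highest posterior probability.
assigned = max(range(len(probs)), key=probs.__getitem__)
```

A low score such as -1.5 is assigned to the first (lower-scoring) class here, even though that class is less common, because its density at that score is much higher.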

Preregistered analyses
We used split-half cross-validation to develop the model on an "exploratory" half of the sample (n = 3423), with the confirmatory validation detailed below. An initial confirmatory factor analysis was used to inform our mixture model approach: we examined model fit for a single- versus two-factor model (decoding, comprehension). The latter demonstrated that the correlation between factors was high (0.77), and listening comprehension was only weakly loaded (0.38) on the comprehension factor alongside reading comprehension. Thus, we proceeded to conduct an LPA directly using the observed measures (with age included as a covariate for each), rather than incorporate continuous latent factors for decoding and comprehension in a factor mixture model (FMM).
Following the class enumeration procedure set out by Masyn (2013), we fitted a series of models with increasing k-classes under five model specifications that differed in whether the variances and/or covariances were estimated separately across classes (Masyn, 2013; Pastor et al., 2007) (Weller et al., 2020).
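A random split into exploratory and confirmatory halves can be sketched as below. The participant identifiers and the seed are placeholders chosen for the example; the text does not describe how the split was actually generated.

```python
import random

def split_half(ids, seed=2021):
    """Randomly divide identifiers into two halves of (near-)equal size."""
    rng = random.Random(seed)  # seed is arbitrary; fixed for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Hypothetical identifiers 1..6846 standing in for the analytic sample.
ids = range(1, 6847)
exploratory, confirmatory = split_half(ids)
```

With 6846 participants, each half contains 3423 children, matching the exploratory sample size given above.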

Results
Models that estimated covariances between measures showed better fit to the data than those that did not (Table S2). This specification also has theoretical support, given we incorporated multiple measures of decoding and comprehension that are expected to correlate.
For each of these unrestricted specifications, models with 3-5 classes emerged as the best candidates according to model fit and interpretability (Figure S1). The 4-class model that allowed variances and covariances to vary between classes (Figure 1A) was selected as the best model with reasonable discrimination between classes. For brevity, only the k-class models from this unrestricted model specification are presented in Table 3, allowing comparisons with the subsequent exploratory analyses.
The selected model showed low but acceptable entropy (0.70), with average latent class probabilities ranging from 0.78 to 0.89. The four classes showed differentiation by overall performance level across tasks, representing high ability (24.38%), high-average ability (39.75%), and two lower ability classes (6.85% and 29.02%). This differentiation by level was common to all model specifications and candidate k-class models considered.
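For readers unfamiliar with the entropy statistic, the sketch below computes the relative entropy commonly reported for mixture models, which equals 1 when every individual is classified with certainty and 0 when posterior probabilities are uniform. It assumes the usual normalised form, 1 - [sum over individuals and classes of -p ln p] / (n ln K); the posterior probabilities in the example are invented.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy of classification for n individuals and K >= 2 classes.

    `posteriors` is a list of rows, one per individual, each holding the
    posterior probabilities of membership in the K classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for three children and two classes.
example = [[0.95, 0.05], [0.10, 0.90], [0.60, 0.40]]
entropy = relative_entropy(example)
```

The third child, with probabilities of 0.60 and 0.40, contributes most of the classification uncertainty in this toy example.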

Interim summary
The preregistered analysis did not succeed in finding latent classes of readers. Our hypothesised profile of readers with good decoding but poor comprehension skills was not identified within the dataset.
Rather, reading difficulties most commonly spanned both decoding and comprehension components, leaving only ordered profiles of ability. These profiles are of limited use beyond a continuous approach to modelling reading-related skills and draw into question the view of reading difficulties as strongly categorical in nature. That is, based on the measures in our analysis, we did not find distinct groups of readers with isolated impairments in either decoding or comprehension.
An alternative approach emphasises this continuous variation in reading, and a number of studies have considered reading comprehension difficulties that are weak relative to decoding across the spectrum of reading ability (e.g., Tong et al., 2011; Wagner et al., 2021). Understanding the cognitive difficulties of children who show such uneven profiles of reading remains important, as do their implications for educational outcomes. Thus, to further inform discussions on conceptualising reading comprehension difficulties, we explored two additional model specifications designed to isolate qualitatively distinct profiles (i.e., profiles of varying strengths and weaknesses across tasks) in the context of strong differences in overall ability, as proposed by Morin and Marsh (2015).

Method
We repeated the class enumeration sequence for two alternative model specifications. The first was an FMM that incorporated an additional continuous latent variable formed from all measures (Figure 1B), allowing for the estimation of qualitatively different profiles alongside differences in overall ability estimated by the continuous factor. The second was an LPA model that incorporated the higher-order factor score as a control measure for each indicator variable (Figure 1C), allowing for the estimation of qualitatively different profiles beyond differences in overall ability. We selected the best candidate model from each specification (including from Analysis 1A), and then compared the model fit, stability, and theoretical relevance to select a final model from all specifications considered. The selected model was further scrutinised through cross-validation in the second half of the sample.

Results
The best-fitting FMM with continuous latent variable had 4 classes, with good entropy (0.85) and average latent class probabilities of 0.85-0.94. This model also produced ordered profiles, namely a majority high-average ability group (72.99%), two smaller classes of low-average ability (6.08% and 4.48%), and a lower ability group (16.46%).
However, the modified LPA with covariate did identify reading profiles with different shapes beyond overall ability in models above 5 classes, including a profile with relatively strong decoding and relatively weak comprehension that remained stable across 5-6 class models. The main difference between these two models was the addition of a profile with relatively good comprehension. We favoured the 6-class model as this profile conforms theoretically to previous research.

Further, it was the only model that provided classes of theoretical and practical relevance to our goal of identifying individuals with relatively weak comprehension.

Cross-validation
Full details are reported in Appendix S2. Our selected model showed poorer fit than a freely estimated model on the second half of the data (p < 0.001), largely driven by slight variations in class intercepts.
The overall shape of the profiles remained similar. We repeated the class enumeration process for the second half of the sample. This confirmed that the 6-class model showed the best profile stability across both halves of the sample. We describe these profiles below, re-fitted to the full dataset.

Class labels
As summarised in Figure 2 and

Interim summary
Our initial models in Analysis 1A produced ordered profiles indicating strong differences in performance spanning decoding and comprehension. Controlling for overall ability in Analysis 1B, however, revealed qualitatively different reading profiles with varied strengths and weaknesses, including a group of individuals with weak reading comprehension relative to their decoding ability. This "weak comprehender" group differs from the traditionally identified poor comprehenders in the sense that not all individuals would be considered impaired below a threshold score, but aligns more closely with the "unexpected" poor comprehenders selected by regression approaches (e.g., MacKay et al., 2017; Tong et al., 2011). To better understand how this group aligns with prior research, and consider potential heterogeneity, we next examined whether there are multiple cognitively distinct profiles within this group who have weak comprehension relative to their decoding skills.
FIGURE 2 Mean performance on the reading skill measures for each reading profile (Analysis 1B). These reading profiles reflect those extracted from Analysis 1B: Exploratory LPA with overall ability covariate. Descriptive statistics are also provided in Table 4.

Sample
This comprised the weak comprehender group identified in Analysis 1B (n = 947).

Measures
Adding to the reading measures in Analysis 1, we included the following measures from the same clinic visits. Distributional statistics and missingness are detailed in Table 1.

Language
Vocabulary: Picture naming. Children named a subset of 10 pictures (WOLD expressive vocabulary task; Rust, 1996). Correct answers were summed (/10). The split-half reliability for the full oral expression subtest is 0.91.
Vocabulary: Definitions. Alternate items were administered from the vocabulary subtest of the Wechsler Intelligence Scale for Children (WISC-III; Wechsler et al., 1992). Children named depicted objects in early items; later items required word definitions to be supplied (scored 0-2). Scores were summed and doubled (for comparability to the full test; reliability = 0.88).
We intended to incorporate separate variables for comprehension and vocabulary. As the two constructs were highly correlated, they were collapsed into a single latent factor labelled language.

Reading rate
The NARA-II measures rate based on the average number of words in connected text read per minute (reliability across forms = 0.71). While reading rate is often considered a marker of decoding efficiency (particularly in more transparent orthographies than English), no studies to our knowledge have used the NARA-II rate score to select poor comprehenders and therefore it was not included in Analysis 1. It is a multifaceted measure with many potential influencing factors (e.g., decoding errors, attention, articulation, anxiety), and thus we included it in Analysis 2 to explore its utility as a co-occurring marker of different types of reading comprehension difficulty.

Nonverbal ability
We planned to include five WISC-III Performance Intelligence Quotient (IQ) subtests (Picture Completion, Coding, Picture Arrangement, Block Design, Object Assembly) to indicate nonverbal ability, but these were collapsed to a single score to reduce model complexity. Reliability for the full Performance IQ scale is 0.90.
Each subtest was z-scored on the sample, and an averaged composite z-score returned.
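The composite described above can be sketched as follows. The subtest names used as dictionary keys, the toy scores, and the use of the sample (n - 1) standard deviation are assumptions for illustration; the text does not specify which standard deviation was used or how missing subtests were handled.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise raw scores against the sample mean and SD."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1); an assumption, see lead-in
    return [(v - m) / sd for v in values]

def composite(subtests):
    """Average the z-scores across subtests, child by child.

    `subtests` maps a subtest name to a list of raw scores, with children
    in the same order in every list.
    """
    standardised = [z_scores(scores) for scores in subtests.values()]
    return [mean(child) for child in zip(*standardised)]

# Toy raw scores for four children on two subtests (made-up values).
raw = {
    "block_design": [10, 20, 30, 40],
    "coding": [12, 14, 18, 16],
}
nonverbal = composite(raw)
```

Because each subtest is standardised before averaging, subtests with different raw-score ranges contribute equally to the composite.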

Working memory
In the WISC-III Backward Digit Span, children repeat back increasingly long lists of digits in the reverse order. The number of digit strings reported correctly was treated as categorical and extreme values collapsed to form four ordered categories. Split-half reliability is 0.84 for the full Digit Span subtest (forward and backward).

Inattention
The Strengths and Difficulties Questionnaire hyperactivity subscale (Goodman, 1997) was completed by parents when children were age 9. Goodman (2001) reports a Cronbach's α of 0.77 for this subscale.
As this scale is discrete and highly skewed (≥50% of children scoring 0-2/10), it was analysed as categorical, with the highest two scores collapsed to meet the software constraint of 10 categories. No age covariate was included.
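The recoding of the inattention scores can be illustrated with a short sketch. It assumes integer subscale scores from 0 to 10 (11 possible values), so that merging the top two scores leaves 10 ordered categories; the function name is hypothetical.

```python
def collapse_top_categories(score, max_category=9):
    """Map an integer 0-10 score onto 0-9 by merging scores of 9 and 10."""
    if not 0 <= score <= 10:
        raise ValueError("expected a score between 0 and 10")
    return min(score, max_category)

# Every possible raw score and its recoded category.
collapsed = [collapse_top_categories(s) for s in range(11)]
```

Scores of 0 to 8 are unchanged, while 9 and 10 share the highest category.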

Excluded measures
The structure of the final model deviates from that specified in the preregistration (https://osf.io/4zahf), as detailed in Appendix S3. Of note, we were unable to incorporate both Backward Digit Span and Counting Span working memory measures due to the resultant number of empty cells in each class; we favoured Digit Span as it was administered at the same time as the other variables. Three cognitive measures of attention were also excluded, due to poor distributions and poor loadings regardless of factor structure.

Factor mixture model
Figure 3 shows the final model specification. We followed the same class enumeration procedure as above for two model specifications that differed in whether factor loadings varied between classes (akin to models 3 and 4 from Clark et al., 2013). However, models with class-specific factor loadings were too complex for the data, leading to errors in parameter estimation that could not be resolved without substantial simplification. We thus focussed on the model with constrained factor loadings across classes, allowing variances and covariances to be estimated separately for each class. As above, missingness was random conditional on variables already included in the model, and was dealt with using full information maximum likelihood (see Appendix S1, Table S1).
We inspected model fit using AIC, BIC, aBIC and adjusted LRTs, and considered the utility of classification in deciding on the best k-class model. The computational load was deemed too high to compute the planned bootstrapped estimates for standard errors, but the large sample and the clear distinction between profiles give confidence that any adjustments would not affect interpretation of the results.
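As a reminder of how these indices relate to the log likelihood, the sketch below uses the standard definitions: AIC = -2LL + 2p, BIC = -2LL + p ln(n), and the sample-size adjusted BIC, which replaces n with (n + 2)/24. The log likelihood is the value reported for the 3-class model in Analysis 2 (n = 947). The number of free parameters (121) is not stated in the text; it is inferred here from the reported AIC and should be treated as an assumption.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, BIC and sample-size adjusted BIC from a model's log likelihood."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Adjusted BIC substitutes (n + 2) / 24 for n in the penalty term.
        "aBIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }

# Reported LL for the 3-class model; 121 parameters is an inferred assumption.
fit_3class = information_criteria(-19621.32, 121, 947)
```

Under these assumptions the three indices agree with the reported values to within rounding, and lower values indicate better fit after penalising model complexity.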

Results
Only the 2- and 3-class models were supported by the data. Adding a fourth class resulted in a model that was too complex to fit successfully. The remaining 81.1% of poor comprehenders (unexpected weak comprehenders) showed largely typical performance across tasks: that is, although comprehension was substantially below decoding, performance across both domains tended to fall within average range.
Decoding and rate were above-average on age-standardised scores; comprehension, vocabulary (definitions), and nonverbal ability were below-average (Figure 4A). This group showed comparable, if not slightly superior, working memory and attention to the remainder of the sample not classed as weak comprehenders (n = 5899; Figure 4B & C). Effect sizes are presented in Table 6.

Discussion
Using a data-driven approach, our first analysis demonstrated that the traditional poor comprehender profile (as defined by reading comprehension impairments that are substantially below average alongside good decoding skills) is not reflected in the data. However, it is possible to examine unexpectedly weak comprehension relative to decoding skills, adopting a dimensional approach to reading weaknesses. By accounting for general reading ability in an exploratory model, we were able to identify 947 children with weak comprehension relative to decoding. Focussing on this subgroup, comprehension difficulties were most consistently accompanied by weaknesses in oral language and nonverbal ability. More severe comprehension problems were associated with the additional presence of broader cognitive difficulties.

Dimensionality of reading difficulties
The traditional poor comprehender profile described in experimental literature was not well-supported by the data: decoding and comprehension were highly correlated, and our initial LPA extracted profiles that varied only in overall ability. It is important to note that the absence of qualitatively different profiles in this sample is likely exacerbated by the measures available: the decoding and comprehension measures from the NARA-II are highly correlated (0.82), and are also the measures with the most variance. Thus, it remains plausible that the hypothesised profiles would have emerged if entirely separate assessments of decoding and comprehension were used. However, these component skills are highly correlated in developing readers of this age (see García & Cain, 2014, for a meta-analysis), and a recent study of Finnish readers also failed to identify discrepant profiles despite using measures from different tasks (Psyridou et al., 2021). These results contrast with those of Torppa et al. (2007), who observed a poor comprehender group in a different sample of Finnish readers during the first two years of schooling (aged 7-9 years). Although the evidence is mixed (Psyridou et al., 2021), we tentatively consider that literacy development may also be important to understanding these conflicting findings. Foorman et al. (2017) found evidence of qualitatively distinct literacy profiles in early schooling, yet only ordered profiles emerged beyond 5th grade (aged 10-11 years). This latter group better aligns with our sample, who were also in their fourth or fifth year of formal schooling when they completed the reading assessments. Thus, an important conclusion from the present study is that, at least later in development, reading difficulties are not strongly categorical and most commonly span problems with both decoding and comprehension.
Including overall reading ability as a covariate, we could extract qualitatively distinct profiles in the context of strong quantitative differences in overall ability. Thus, children with relatively weak comprehension compared to decoding were identified in a data-driven way, in line with the view that reading comprehension difficulties are dimensional in nature (Wagner et al., 2021). This approach is somewhat similar to regression-based analyses that identify children whose comprehension is weaker than predicted by age and decoding (e.g., Tong et al., 2011). However, our data-driven approach does not require arbitrary decisions (e.g., the size of the comprehension gap). Inspection of the alternative models considered for identifying reading profiles indicated that this approach was the best fit in accounting for the data, and cross-validated well across different halves of the sample. Whether these profiles of uneven reading skills make useful predictions about children's outcomes beyond the severity of overall reading difficulties is a key question for future research.
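For contrast, the regression-based alternative mentioned above can be sketched as follows. This is a simplified illustration rather than the procedure of any cited study: it uses decoding as the only predictor (age is omitted), the scores are invented, and the cutoff of -1 standardised residual is exactly the kind of arbitrary decision that the mixture approach avoids.

```python
from statistics import mean, pstdev

def residuals_from_simple_regression(x, y):
    """Residuals of y after ordinary least-squares regression on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def flag_unexpected(x, y, cutoff=-1.0):
    """Flag cases whose standardised residual falls below `cutoff`."""
    res = residuals_from_simple_regression(x, y)
    sd = pstdev(res)
    return [r / sd < cutoff for r in res]

# Toy scores (made up): the fourth child comprehends far less well
# than their decoding would predict.
decoding = [10, 12, 14, 16, 18, 20]
comprehension = [11, 13, 15, 9, 19, 21]
flags = flag_unexpected(decoding, comprehension)
```

Only the fourth child is flagged in this toy example; moving the cutoff changes who counts as an "unexpected" poor comprehender, which is the sensitivity the data-driven approach is intended to sidestep.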

Cognitive profile(s) of weak comprehenders
Consistent with previous research that has documented oral language weaknesses in poor comprehenders (e.g., Nation et al., 2004), the weak comprehenders (n = 947) selected by our dimensional approach had weak vocabulary, particularly for definitions, a task …
TABLE 6 Effect sizes for differences in reading and cognitive skills between the two cognitive profiles of weak comprehenders (Low ability vs. Unexpected; Analysis 2) and compared to the remaining sample ("Other").

TABLE 4 Descriptive statistics of reading skills for each reading profile (Analysis 1B).

The 3-class model (Log Likelihood [LL] = −19,621.32, AIC = 39,484.64, BIC = 40,071.89, aBIC = 39,687.60) showed slightly better fit than the 2-class model (LL = −19,781.25, AIC = 39,738.50, BIC = 40,165.60, aBIC = 39,886.11), but this difference was not significant according to the (V)LMR tests (p = 0.12). Further, adding a third class introduced only a small profile (3.93%) that was not theoretically meaningful. The 2-class model was therefore selected as most parsimonious. The 2-class model was a significantly better fit than a 1-class model (p = 0.01). It showed good entropy (0.88) and classification probabilities of 0.90 and 0.98. The two classes were dissociated across all measures. The smaller class (low ability group; 18.9% of poor comprehenders) showed relatively severe impairments across the board (Figure 4A & B), alongside increased hyperactivity/inattention (Figure 4C). Although their difficulties were most severe for reading comprehension and item accuracy (see also Figure S2), performance was close to or below −1 SD relative to age-expected levels across the board.

FIGURE 3 Final factor mixture model (FMM) specification for identifying the cognitive profiles of weak comprehenders (Analysis 2). Age (months) was incorporated as a covariate for all observed variables, except inattention. The four left-most observed variables were also included as measures of reading skill in Analysis 1.
FIGURE 4 Descriptive statistics of reading and cognitive skills for each cognitive profile of weak comprehenders (Analysis 2). Assessment performance for Class 1 (Low ability) and 2 (Unexpected weak comprehenders); (A) Sample-standardised scores of reading and cognitive skills (error bars denote 95% confidence intervals; the four left-most measures were also included as measures of reading skill in Analysis 1); (B) distribution of working memory scores; and (C) distribution of inattention scores. For reference, the remainder of the sample (i.e., those not identified as weak comprehenders; n = 5899) are marked by the grey dotted line.
TABLE 2 Correlations between all reading and cognitive variables.
Note: The top right quadrant reflects correlations between measures in Analysis 1, with the whole sample of 6846 participants. The bottom left quadrant reflects correlations between measures in Analysis 2, with the subsample of 947 weak comprehenders.

Table 4
these differences were close to average and less extreme than in Profile 1. Profile 5 (good comprehenders; 18.43%) showed close-to-average decoding and were the highest performers on the two comprehension tasks. Profile 6 (consistent readers; 5.97%) performed close to average on all tasks. This small group was older at the time of the reading assessments (M = 123 vs. 118 months), and so the profile reflects superior performance in absolute terms but in line with age. Given that the weak comprehender group is of primary interest for Analysis 2, Table 5 presents the Cohen's d effect sizes for this group relative to each other profile, on all reading measures.
TABLE 3 Fit indices for preregistered and exploratory mixture models used to identify reading profiles (Analysis 1).
Note: Rows in bold represent the best k-class model selected for each model specification. Abbreviations: k, number of classes; AWE, Approximate Weight of Evidence criterion; BIC, Bayesian Information Criterion; CAIC, Consistent Akaike's Information Criterion; LL, log likelihood; LMR, Lo-Mendell-Rubin.

Comparisons: Low ability versus Unexpected; Low ability versus Other; Unexpected versus Other.
The Low Ability and Unexpected weak comprehender groups reflect the cognitive profiles extracted from Analysis 2. The "Other" group reflects the remainder of the sample (i.e., those not identified as weak comprehenders in Analysis 1; n = 5899), presented for reference. Effect sizes are computed as Cohen's d, for continuous variables only.
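For reference, a minimal sketch of a Cohen's d calculation is given below. It assumes the common pooled-standard-deviation form for two independent groups; the text does not state which variant was used, and the scores are invented.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups using the pooled SD.

    Positive values mean group_a scored higher than group_b on average.
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Toy standardised scores (made up) for two hypothetical groups.
low_ability = [1, 2, 3, 4, 5]
unexpected = [3, 4, 5, 6, 7]
d = cohens_d(low_ability, unexpected)
```

A negative d here indicates that the first group scored lower than the second, by a little over one pooled standard deviation in this toy example.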