Background: Analyses from the Jyväskylä Longitudinal Study of Dyslexia project show that the key childhood predictors (phonological awareness, short-term memory, rapid naming, expressive vocabulary, pseudoword repetition, and letter naming) of dyslexia differentiate the group with reading disability (n = 46) and the group without reading problems (n = 152) at the end of the 2nd grade. These measures were employed at the ages of 3.5, 4.5 and 5.5 years and information regarding the familial risk of dyslexia was used to find the most sensitive indices of an individual child's risk for reading disability.
Methods: Age-specific and across-age logistic regression models were constructed to produce the risk indices. The predictive ability of the risk indices was explored using the ROC (receiver operating characteristic) plot. Information from the logistic models was further utilised in illustrating the risk with probability curve presentations.
Results: The logistic regression models with familial risk, letter knowledge, phonological awareness and RAN provided a prediction probability above .80 (area under ROC).
Conclusions: The models including familial risk status and the three above-mentioned measures offer a rough screening procedure for estimating an individual child's risk for reading disability at the age of 3.5 years. Probability curves are presented as a method of illustrating the risk.
A key issue for clinical screening practice and for this study is whether it is possible to determine the critical ages and measures for identification of an individual child's risk for reading disability (RD). Findings of meta-analyses (Scarborough, 1998, 2001) and recent familial dyslexia follow-up studies (Carroll & Snowling, 2004; De Jong & Van der Leij, 2003; Elbro, Borstrom, & Petersen, 1998; Gallagher, Frith, & Snowling, 2000; Pennington & Lefly, 2001; Snowling, Gallagher, & Frith, 2003) as well as longitudinal data from the Jyväskylä Longitudinal Study of Dyslexia (Lyytinen et al., 2001, 2004, 2006) show that the best predictors of preschoolers’ and kindergarteners’ later reading achievement are the measures which require the processing of print followed by oral language proficiency measures as well as performance-IQ and familial history of dyslexia. The present study concerns prediction of 2nd grade reading disability (RD) and non-RD outcomes in children with and without familial risk for dyslexia. A battery of key dyslexia predictors (i.e., phonological awareness, short-term memory, rapid naming, expressive vocabulary, pseudoword repetition, letter naming) at three age points (3.5, 4.5 and 5.5 years) was used.
Earlier studies have suggested that individual screening has usually not been as successful when preschool-age predictors rather than measures derived from an age closer to school entry have been employed. Some researchers have combined several kindergarten variables with the guidance of statistical procedures and thus improved predictability impressively, e.g., recently Pennington and Lefly (2001) used discriminant function analysis and Elbro et al. (1998) employed a logistic regression analysis procedure. In the recent past, only one reading study (Catts, Fey, Zhang, & Tomblin, 2001) has tried to implement the findings from logistic regression modelling into clinical screening practice. Catts et al. found that the performance of the children (with early language problems) at the age of 5 years in letter knowledge, sentence imitation, phoneme/syllable deletion and rapid naming, together with mother's education, made a significant contribution to predicting the risk of reading comprehension difficulties in the 2nd grade. They also presented practical suggestions on the interpretation of results and on implementing the logistic equation models by calculating the exact individual probability scores for the purpose of clinical use.
The reports in the literature typically focus on the group-level differences in the predictors using either risk/control groups or RD/nRD samples. It is often the case that although a significant difference in group means emerges on a measure, it does not necessarily discriminate and predict skills at the individual level. The goodness of the predictor is determined by its ability to ‘catch’ the true positive cases (TP; i.e., those who are predicted to show RD and who turn out to be RD at school age) and to ‘avoid’ the false positive cases (FP; i.e., those who are predicted to show RD but who do not at school age). These two aims pull in opposite directions: a more lenient cutoff point catches more TP cases but also admits more FP cases, so changing the cutoff point of the predictor shifts the rates of both. This is utilised in the ROC (receiver operating characteristic) analyses. The ROC curve is a plot of the TP-rate (sensitivity) against the FP-rate (1–specificity) for different cutoff points of a predictor. The method is often used in medical research to explore a measure's ability to discern individuals who have a disorder from those who do not have it (Greiner, Pfeiffer, & Smith, 2000; Grunkemeier & Jin, 2001; Obuchowski, 2003). The ROC scores (area under ROC) can be interpreted to express the measure's overall prediction probability of a disorder.
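The construction of the ROC curve described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the convention that a higher score means a stronger prediction of RD, and the toy data are all hypothetical and are not taken from the JLD analyses (which were run in SPSS).

```python
def roc_points(scores, labels):
    """Return (FP-rate, TP-rate) pairs, from the strictest to the most lenient cutoff.

    scores: predictor values; ASSUMPTION: higher value = stronger prediction of RD.
    labels: 1 = RD at school age, 0 = no RD.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nobody is flagged
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_roc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data, not JLD data.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
auc = area_under_roc(roc_points(scores, labels))
```

An area of .50 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.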
In the present study, a large battery of the key behavioural level dyslexia predictors was assessed during three successive years starting at the age of 3.5 years. The measures of phonological awareness, short-term memory, rapid naming of objects, expressive vocabulary, pseudoword repetition and letter naming as well as performance IQ and the familial risk of dyslexia were examined in the prediction of a specific reading disability.
The challenging goal of the present study is to be able to present a clinically usable and parsimonious procedure for evaluating an individual child's risk for RD. We first employed the logistic regression modelling approach to explore what combinations of measures are the most sensitive and specific in predicting an individual child's risk for RD at the different age phases and across ages, and then utilised the ROC analyses in the estimation of the achieved prediction probabilities from the age of 3.5 to 5.5 years. Information from the logistic models was further utilised in building the probability curve presentations which offer a powerful way to illustrate an individual child's risk of RD. Some suggestions and guidelines for screening are also offered.
The present data were drawn from the follow-up project of Jyväskylä Longitudinal Study of Dyslexia (JLD), which has been exploring early language development and the precursors of reading skills (for a review of sample and earlier results see Lyytinen et al., 2001, 2004, 2006 and background of parents in Leinonen et al., 2001). This data is drawn from 198 children belonging to JLD of whom 106 had a familial risk background (at-risk sample) and 92 had no familial risk (control sample). RD and nRD groups were formed on the basis of reading outcomes reported below. The backgrounds of parents and children are presented in Table 1.
Table 1. Statistical background of children and parents
Columns: At-risk (n = 106); Control (n = 92); RD (n = 46); nRD (n = 152)
Rows: Performance IQ age 5; Performance IQ age 8 (cell values not recovered)
*Group difference between RD and nRD emerged for performance IQ and mother's education; p < .05.
Predictive measures prior to school age
The early predictors of reading and spelling were derived from the individual assessments at the age phases of 3.5 (M = 3.53, SD = 18 days), 4.5 (M = 4.51, SD = 11 days) and 5.5 years (M = 5.50, SD = 11 days). The predictive measures tap the following skill areas.
Phonological Awareness (PA) tasks. Both tasks embedded in a computer animation program called Heps-Kups Land (the program was created especially for this purpose, for details see Puolakanaho, Poikkeus, Ahonen, Tolvanen, & Lyytinen, 2004) and more traditional PA tasks were employed. The age-specific composites were formed on the basis of the following subtasks:
1. Word-level segment identification (8 items at the age of 3.5 years). In this task the child was presented with 3 pictures of objects on the screen, immediately followed by the name of each object (all compound words) and asked to identify the picture containing a specified part of the compound (e.g., ‘lentokone’ (aeroplane); ‘soutuvene’ (rowing boat); ‘polkupyörä’ (bicycle) – in which picture can you hear the sound ‘kone’ (plane)?).
2. Syllable-level segment identification (8 items at the age of 3.5, 16 at the age of 4.5 and 22 at the age of 5.5 years). The task was the same as above but with the requirement to identify the sub-word-level units (syllables or phonemes) within the target (e.g., the ‘koi’ in the word ‘koira’ (dog)).
3. Synthesis of phonological units (12 items at the age of 3.5 years). Segments (syllables or phonemes) were presented to the child, each separated by 750 msec, and the child was asked to blend the segments to produce the resulting word (e.g., per-ho-nen (butterfly) or m-u-n-a (egg)). Only a response containing the right assembled form was coded as correct.
4. Continuation of phonological units (8 items at the age of 3.5 and 12 items at the age of 4.5 years). The child was presented with the beginning of a ‘secret’ word and asked to guess how the word would continue (e.g., ‘mu-?’). Only continuations that were meaningful words were coded as correct.
5. Initial phoneme identification (9 items at the age of 4.5 and 5.5 years). This task entailed the child being shown four pictures of objects with the simultaneous presentation of the object name. The child was then required to select the correct picture on the basis of the oral presentation of a subsequent initial phoneme relating to one target (e.g., ‘In the beginning of which word do you hear ____?’).
6. Production of the first phoneme (8 items at the age of 4.5 and 5.5 years). The experimenter showed a picture to the child and asked what she/he saw in the picture. After that, the child was asked to listen to the word and then to articulate the first sound (phoneme or letter name) of the object. The sum of the correct phonemes or initial letter answers formed the score of the task.
The PA score at 3.5 years was formed by summing subtasks 1, 2, 3 and 4; the PA score at 4.5 years was a composite of subtasks 2, 4, 5 and 6; and the PA score at 5.5 years was a composite of subtasks 2, 5 and 6.
Rapid naming. The rapid serial naming (RAN) of objects task was assessed at the ages of 3.5 and 5.5 years using the standard procedure (see Denckla & Rudel, 1976) within a reduced (30-item; 5 stimuli by 6 times random presentation) matrix. Total matrix completion time (seconds) was used as the measure.
Short-term memory. The digit span subtest was assessed at 3.5 years and 5.0 years of age (the latter score was included in the measures of the 4.5-year age phase) using the typical procedure described in the literature (e.g., Gathercole & Adams, 1994). The score used in the analyses was the number of correctly repeated lists. The memory for names task was administered at the age of 5.5 years in association with the Developmental Neuropsychological Assessment (NEPSY; Korkman, Kirk, & Kemp, 1998). In this test, the child is required to recall names which are read aloud by the examiner. One point was awarded for each correct repetition.
Expressive vocabulary. The Boston Naming Test (BNT; Kaplan, Goodglass, & Weintraub, 1983) was used to obtain a measure of productive vocabulary at 3.5 and 5.5 years. The score is based on summing the number of items (maximum of 60) that the child spontaneously names correctly and the number of items named correctly following a semantic stimulus cue. The Vocabulary subtest of the WPPSI-R (Wechsler Preschool and Primary Scale of Intelligence-R; Wechsler, 1989) was administered at 5.0 years of age to assess vocabulary development. In modelling, the Vocabulary subtest was included in the measures of the 4.5-year age phase.
Pseudoword repetition. In the pseudoword repetition task (18 partly different items administered at the age of 3.5 and 4.5 years), items were embedded in a computer animation story and after the child had repeated the pseudoword a hidden animal would appear on the screen. At 5.5 years, the Nonword Repetition task of the Finnish version of the NEPSY (Korkman et al., 1998) was presented. The stimuli were arranged in series of increasing complexity and length, from monosyllabic items (e.g., nas) to polysyllabic items (e.g., plotsiskäntsigis). One point was awarded for each correct repetition.
Letter naming. In the letter naming task the child was asked to name letters which were written in large capitals and presented one at a time, each on its own page. At the age of 3.5 years the child was presented with 16 letters organised in three sets (6 + 6 + 4 letters), and at 4.5 and 5.5 years the child was presented with 23 letters which were arranged in six-item sets. The child received one point for each correct response (use of a phoneme or a letter name were both coded as correct responses). The testing always began by presenting the child with the letter that was expected to be most familiar to the child: the first letter of his or her own first name.
Performance IQ. A short form of the WPPSI-R (Wechsler Preschool and Primary Scale of Intelligence-R; Wechsler, 1989) was administered at 5.0 years of age, and three performance quotient subtests (Block Design, Object Assembly, and Picture Completion) comprised the performance IQ measure. The WISC-III (Wechsler Intelligence Scale for Children-III; Wechsler, 1991) was administered at the age of 8.0 years, and four performance quotient subtests were used to form the performance IQ measure. The scores were estimated based on these subtests, according to the standard guidelines outlined in the manuals.
Parental education. Parental education was classified using a 7-point scale. This scale was constructed by combining the information that the parents had given concerning their general education and their upper secondary vocational education and tertiary education.
Classification of the children into RD and nRD groups
Measures of reading and spelling (five tasks providing eight measures) assessed individually at the end of the second grade (at the mean age of 8.9 years, SD = .5) were used in the classification of the participating children into those with and without a specific reading disability. The factor analytical results indicated that the measures cluster into two components, one representing the accuracy of reading/writing (four measures) and the other representing the fluency of reading (four measures). The accuracy and fluency scores were derived from the following tasks.
Reading words and nonwords. Altogether, 40 items (four sets of 10 items each, representing three-syllable and four-syllable words and nonwords) were presented separately via computer. Accuracy: The number of correctly read items. Fluency: The mean of the response times (reaction time + response duration) to the correctly read items (each presented separately).
Spelling words and nonwords. Altogether, 18 four-syllable items (6 words and two sets of nonwords with 6 items in each) were presented separately via headphones by using the sound files accessed through the computer, and the children were asked to write them by hand. Accuracy: The number of items with correct spelling.
Reading text. The child was presented with a passage of text, a narrative fiction story from an outdated first-grade reader, which was composed of 124 words. Accuracy: The percentage of correctly read words. Fluency: The rate of words per minute, which was calculated by dividing the number of words read by the time spent reading.
Reading nonword text. The child was presented with a passage of text composed of 19 nonwords but maintaining a structure like that of a normal text. The nonwords were built from real words by replacing consonants or vowels with different ones, e.g., instead of the word ‘matkoja’ (adventures) the nonword ‘tenkoja’ was used in the original text. Accuracy: The percentage of items in the text read correctly. Fluency: The rate of nonwords per minute, which was calculated by dividing the number of nonwords that the child read by the reading time.
Standardised reading achievement test. A speeded word list reading test from Lukilasse (Häyrinen, Serenius-Sirve, & Korkman, 1999) was used in which children read aloud a list of words where the items gradually became longer and more difficult. Fluency: The number of correctly read words within the two-minute time limit was transformed into a standard score according to the test manual's guidelines.
The procedure leading to classification was the following:
1. An exclusion criterion of the standard score of 80 in either performance or verbal IQ (assessed with WISC-R at 8 years, i.e., second grade) was applied, but all participants had scores above the criterion.
2. A cutoff point was calculated using the 10th percentile of the JLD control group's performance in each of the eight outcome measures. A child was considered to have deficient skills in a given task if his or her score fell at or below the 10th percentile.
3. To be classified with RD, the child's skills had to be at or below the 10th percentile either a) in at least three out of the four accuracy measures or at least three out of the four fluency measures, or b) in two accuracy measures and in two fluency measures.
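The decision rule in step 3 can be written compactly as follows. This is a minimal sketch: the function name and the representation of each child as two lists of four boolean flags (one flag per outcome measure, True when the score is at or below the control group's 10th percentile) are assumptions made for illustration.

```python
def classify_rd(accuracy_deficits, fluency_deficits):
    """Apply the RD rule to per-measure deficit flags.

    accuracy_deficits, fluency_deficits: four booleans each (hypothetical
    input format); True means the child scored at or below the 10th
    percentile of the control group on that measure.
    """
    n_accuracy = sum(accuracy_deficits)
    n_fluency = sum(fluency_deficits)
    # a) at least three of four accuracy OR at least three of four fluency measures
    rule_a = n_accuracy >= 3 or n_fluency >= 3
    # b) two accuracy measures AND two fluency measures
    rule_b = n_accuracy >= 2 and n_fluency >= 2
    return rule_a or rule_b


# Hypothetical children, not JLD cases.
fluency_only = classify_rd([False, False, False, False], [True, True, True, False])
mixed = classify_rd([True, True, False, False], [True, True, False, False])
below_threshold = classify_rd([True, True, False, False], [True, False, False, False])
```

Under this rule a child can qualify through accuracy alone, fluency alone, or a combination of both.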
Using these criteria, 38 children (35.8%) from the at-risk group and 9 children (9.8%) from the control group were classified with the specific reading disability (RD). Examination of the children identified with RD revealed that for 7 children only the accuracy measures, and for 21 children only the fluency measures, contributed to their being identified with RD, and for 19 children both types of the measures contributed to it. One child from the at-risk group was excluded from the analyses due to missing data at the early assessment phases; thus, the total number of children classified with a specific reading disability was 37 at-risk-group children and 9 control-group children. In the following sections we will be using the acronym RD for children classified with the specific reading disability, i.e., dyslexia (n = 46) and the acronym nRD for children who were not classified with a specific reading disability (n = 152).
The distributions of the 3.5-year letter naming and reading accuracy measures were skewed but for other variables neither ceiling nor floor effects were discovered. The distributions were normal or close to normal. Very few outliers were found, and they were relocated to the tails of the distributions before the analyses. Missing values of predictors (mean of 4.4%) were imputed at the item level by using an EM algorithm in SPSS (version 12.0.1 for Windows).
Gender differences in the predictive and reading-related measures were explored using the two-tailed t-test. In the memory for names task, girls outperformed boys (girls: M = 11.3, SD = 5.1; boys: M = 8.3, SD = 4.4; t(196) = 4.3, p < .001). Since no other differences were found, girls and boys were combined in the subsequent analyses.
Background comparisons between the at-risk and control groups as well as between the RD and nRD groups are presented in Table 1. Statistically significant differences were found between the RD and nRD group in the mean scores of all the phonological and language skills at 3.5, 4.5 and 5.5 years (all differences except for 4.5-year and 5.5-year pseudoword repetition were at least at the level p < .01).
The logistic models predicting reading disability
All measures which had shown group-level differences were included in the logistic regression analyses, along with the categorical variable indicating familial risk for dyslexia. Since performance IQ failed to show group differences it was dropped from the analyses. Four logistic regression analyses (one model for each age phase and a fourth model across ages) were carried out using the Forward Wald procedure.¹ Table 2 presents the results of the logistic regression analyses.
Table 2. The three age-specific logistic regression models and an across-age logistic regression model for predicting reading disability
Rows surviving for the across-age model: Letter naming 3.5 years; RAN 3.5 years; Phonological awareness 4.5 years; RAN 5.5 years (cell values not recovered)
Note: To produce the logistic regression models for 3.5-, 4.5- and 5.5-year-old children, the age-specific measures of phonological awareness, short-term memory, RAN, vocabulary, pseudoword repetition and letter naming were entered using the Forward Wald procedure after entering the group variable in the first step. The RAN scores are reversed. In the across-age model the same procedure was applied with all the predictive measures included.
The familial risk status was a statistically significant predictor of RD in all models. At each age phase two other statistically significant variables were found to predict reading disability: letter knowledge and RAN emerged as significant predictors at 3.5 and 5.5 years, and letter knowledge and PA at the age of 4.5 years. It should be noted, however, that RAN might also have emerged as a significant predictor at 4.5 years (instead of PA) if it had been available at that time point. The following significant predictors emerged in the across-age model: familial risk status, 3.5-year letter knowledge, 4.5-year PA, and 5.5-year RAN. The Nagelkerke R² values can be interpreted to express the percentages of explanation. They were nearly identical in the three age-specific models (32–35%). However, in the across-age model, explanatory power was slightly higher (up to 39%). The classification outcomes based on the three age-specific models, with different cutoff values of prediction probability scores, are presented in Table 3. The cutoff values of .50 and .25 were selected because they are the values most typically presented in the literature. The third level of probability (.14 in the case of the 3.5- and 4.5-year models and .17 for the 5.5-year model) was selected because it represents the level of 90% sensitivity, a rate considered acceptable for clinical decision making. At this level 90% of the RD children were correctly identified (i.e., the rate of true positives was high), but specificity did not suffer too much. The same cutoff point representing 90% sensitivity is also applied in the probability curve presentation (Figure 1).
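Selecting the probability cutoff that corresponds to a target sensitivity can be illustrated with the short sketch below. The function names, the rule of taking the highest cutoff that still reaches the target, and the toy probabilities are assumptions for illustration; they are not the JLD model outputs.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Sensitivity and specificity when children with prob >= cutoff are flagged."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def cutoff_for_sensitivity(probs, labels, target=0.90):
    """Highest observed probability cutoff whose sensitivity reaches the target."""
    for cutoff in sorted(set(probs), reverse=True):
        sensitivity, _ = sensitivity_specificity(probs, labels, cutoff)
        if sensitivity >= target:
            return cutoff
    return min(probs)  # only reached if no cutoff meets the target


# Hypothetical predicted probabilities: ten RD (1) and ten nRD (0) children.
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.35, 0.3, 0.1,
         0.55, 0.33, 0.25, 0.2, 0.15, 0.12, 0.08, 0.05, 0.04, 0.02]
labels = [1] * 10 + [0] * 10
cutoff = cutoff_for_sensitivity(probs, labels, target=0.90)
sensitivity, specificity = sensitivity_specificity(probs, labels, cutoff)
```

Choosing the highest qualifying cutoff keeps specificity as large as possible for the required sensitivity.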
Table 3. Classification accuracy at the different probability cutoff levels using the age-specific logistic models. Analyses of RD (n = 46) and nRD (n = 152) children
Rows: Classification correct, %; True positive cases; False positive cases; Group membership of positive cases (cell values not recovered)
Note: Any cutoff level could be chosen, but .50 and .25 are the most commonly presented in the literature. However, the more useful cutoff level (the lowest score in each age-specific column) represents a 90% sensitivity level in the JLD sample. *The results of the weighted score analyses are also presented. For the logistic procedure used, see the note in Table 2.
Since the proportion of children with a familial risk of dyslexia is much higher in the JLD sample than in the general population, we also report in Table 3 the findings of the classification accuracy analyses in which the weighting procedure was used and either the lowest cutoff level (probability level of .14 or .17) or the cutoff yielding a sensitivity of around 90% (always the lowest row in a column) was applied. In the weighting procedure we used an estimate which was based on the expectation that in the general population the prevalence of familial risk of dyslexia is around 6% (based on recently presented prevalence rates, e.g., Vellutino, Fletcher, Snowling, & Scanlon, 2004, and genetic studies, e.g., Grigorenko, 2005). The findings with weighted scores did not differ dramatically from the findings without the weighting procedure. Using the lowest cutoff level, specificity rates improved somewhat (which means that the model's ability to detect true negative cases would be better in a general population with a smaller proportion of children at risk). However, this change occurred at the cost of reduced sensitivity at 4.5 and 5.5 years (meaning that the model's ability to detect true positive cases would be somewhat poorer in a sample with a smaller proportion of children at risk). With the 90% sensitivity cutoff levels and weighted scores, no significant changes in the prediction appeared.
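The exact weighting computation is not spelled out in the text, so the following is only one plausible sketch of such a procedure: each child receives a weight that rescales the share of at-risk children in the sample (106 of 198) to the assumed population share of 6%, and sensitivity and specificity are then computed from weighted counts. The function names and the record format are hypothetical.

```python
def group_weights(n_risk, n_control, population_risk_rate):
    """Case weights that rescale the sample's at-risk share to a population share."""
    n_total = n_risk + n_control
    w_risk = population_risk_rate / (n_risk / n_total)
    w_control = (1 - population_risk_rate) / (n_control / n_total)
    return w_risk, w_control


def weighted_rates(records, w_risk, w_control, cutoff):
    """Weighted (sensitivity, specificity).

    records: (probability, has_rd, at_risk) tuples; hypothetical format.
    """
    tp = fn = tn = fp = 0.0
    for prob, has_rd, at_risk in records:
        weight = w_risk if at_risk else w_control
        flagged = prob >= cutoff
        if has_rd and flagged:
            tp += weight
        elif has_rd:
            fn += weight
        elif flagged:
            fp += weight
        else:
            tn += weight
    return tp / (tp + fn), tn / (tn + fp)


# Group sizes from the JLD sample; 6% is the population estimate used in the text.
w_risk, w_control = group_weights(106, 92, 0.06)
weighted_risk_share = 106 * w_risk / (106 * w_risk + 92 * w_control)
```

After weighting, at-risk children make up 6% of the weighted sample, so control-group children count for more in the classification rates.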
The ROC analysis
The scores from the regression models (based on the results in Table 2) as well as the scores of the early measures were entered into the ROC analyses (SPSS program, version 12.0.1 for Windows) in order to compare the scores’ ability to predict RD status. The ROC scores (area under ROC) were .81 (ages 3.5 and 4.5), .84 (age 5.5) and .85 (across-age model) when the regression-model-based scores were entered. The scores of single measures varied from a minimum of .61 (digit span at age 5.0, and memory for names at age 5.5) to .77 (letter knowledge at age 5.5). The ROC scores for letter knowledge from age 4.5 onwards, PA from age 3.5 onwards and RAN at the age of 5.5 were at least .70, indicating a moderate power to predict RD.
Illustrating individual risk for reading disability
The probability curve presentation (shown in Figure 1) was employed to illustrate the estimated probability for RD. The results of the previously presented age-specific logistic regression models (seen in Table 2) are expressed in the form of curves. For simplicity, letter naming is placed on the x-axis and the curves represent a high and low mastery (i.e., +2 and −2 z-score level) of the PA and RAN skills. To make the presentation more informative, it was divided into areas by shadow-coding representing the different levels of risk for RD: the high, moderate and low risk areas. The areas also include false positive cases (i.e., 5.6–24% in the high and 11–27% in the moderate risk areas). The probability of RD can be roughly determined for each individual child by utilising the probability curve presentation.²
Estimating individual risk for reading disability
The aim of the present study was to find a clinically usable and parsimonious procedure for evaluating an individual child's risk for RD. Phonological awareness, short-term memory, rapid naming of objects, expressive vocabulary, pseudoword repetition, and letter naming as well as familial risk status were used as the predictors of RD. Our analyses indicated that sensitive indices of an individual child's risk for reading disability can be built based on logistic regression modelling at the age of 3.5 years, i.e., five years before the assessment of decoding and recoding skills. The prediction probability of the 3.5-year and the 4.5-year model (area under ROC) was .81 and that of the 5.5-year model .84. It is notable that the across-age model with the ROC score of .85 was only slightly better in prediction than the age-specific models. The prediction probability of performance in single tasks was moderate, i.e., the predictions which were successful in avoiding false negative cases always included a substantial amount of false positive cases.
Reading disability could be predicted with three measures at each age phase: At the ages of 3.5 and 5.5 years the best measures were familial risk status, RAN and letter knowledge. At 4.5 years the combination was slightly different as phonological awareness, instead of RAN, was the best complementing predicting measure. However, this could have been due to the fact that we did not have a measure of RAN in the task battery for 4.5-year-old children. Our results are in line with recent findings showing that early RAN, letter naming and phonological awareness are the most powerful predictors of dyslexia in the second grade (De Jong & Van der Leij, 2003; Catts et al., 2001; Pennington & Lefly, 2001).
Earlier studies have usually reported prediction rates for children of 5–6 years using the logistic regression procedure with a probability score cutoff level of .50. The classification accuracy for the familial risk sample in the study by Pennington and Lefly (2001) was 69% for the overall model, 69% for sensitivity, and 76% for specificity. If the sensitivity level in our sample is fixed at the same (i.e., 69%) level, specificity would be 81% and overall correctness 78% (corresponding to a cutoff level of .31 in our sample). Elbro et al. (1998) as well as Catts et al. (2001) reported classification results with nearly identical percentages compared to Pennington and Lefly's and our findings. Thus, the key predictors from very different language environments interestingly produce fairly similar classification rates for dyslexia. It is interesting that the prediction rates and key predictors that emerge in the analyses are highly similar to those reported in the earlier literature (letter knowledge, RAN and phonological awareness capture the main variance, and other measures, such as expressive language, add little to them), although in the highly regular Finnish language reading fluency (i.e., deficiencies in fluent reading of text and words) contributes more heavily to the classification of RD.
With the aid of the probability curve presentation each individual child's risk of RD can be calculated easily just by entering the relevant scores. The curves also illustrate other interesting points. First, having a familial risk for dyslexia increases the probability of RD dramatically. Second, good early development of letter naming skills decreases the probability of RD dramatically. If a 3.5-year-old child has a letter naming Z-score of .50 or higher (i.e., the child names 4 or more letters correctly), for instance, he or she has a very low probability of RD (always less than 10%). The curves for 4.5- and 5.5-year-old children suggest that for a child with low letter knowledge the probability of RD is lower if he or she has good skills in either RAN or phonological awareness. Thus the combination of the two skills together with familial status for dyslexia contributes to the probability of RD.
Our study suggests that those who probably have a high risk of RD can be identified from the age of 3.5 years onwards, i.e., about 5 years before their dyslexia status can be reliably assessed. A child who has a familial history of dyslexia will face a reading disability approximately four times more often than a child without such a family background. The logistic-model-based mathematical equations, including familial risk status and the three key measures (i.e., letter naming, rapid naming of objects, phonological awareness), offer a rough screening procedure for evaluating an individual child's risk for reading disability. With the aid of the probability curve presentation the risk models are easy to implement in clinical practice. However, the results of the Jyväskylä Longitudinal Study of Dyslexia sample need to be compared against a normative sample before applying the procedure in clinical practice.
The most important issue in determining the usability of these prediction values in clinical practice is the prevalence of dyslexia in the sample. In our sample, incidence of dyslexia within the familial risk group is around 36%. Using a sensitivity score of 89% (i.e., the logistic model at 5.5 years with a cutoff level where 90% of true positive cases are identified, see Table 3), the positive prediction value (PPV) is .60, which indicates that 60% of those who are estimated as having a risk of dyslexia will actually face it. Correspondingly, using the specificity score (i.e., 67%) of the same model and cutoff level, the negative prediction value (NPV) is .92, which tells us that 92% of those who are identified as having no risk for dyslexia will actually not have a reading disability. If these scores are applied to the sample of 1,000 children at familial risk for dyslexia, 531 children will be ‘under suspicion’ of a future severe reading disability. Of these children, 319 (i.e., 60%) will actually face it. In addition, 469 will be predicted to be free of future reading problems, although 38 of them will meet reading disability in the future. On the other hand, our cutoff criterion for classifying RD picks up around 12% of children. Using this prevalence, one ends up with a PPV value of .27 and NPV value of .98. If the latter case is applied to the sample of 1,000 children, 397 will be ‘under suspicion’ of future severe reading disability. However, only 107 (i.e., 27%) will face it. In addition, the latter model predicts that 603 will be free of future reading problems, although 13 will meet reading disability in the future.
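The dependence of PPV and NPV on prevalence follows directly from sensitivity and specificity. The sketch below reproduces the reported values using the rounded inputs given in the text (sensitivity .89, specificity .67, prevalence .36 or .12); the function name is hypothetical, and small differences in the per-1,000 counts can arise from rounding.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Familial risk group: prevalence around 36% (rounded value from the text).
ppv_risk, npv_risk = predictive_values(0.89, 0.67, 0.36)
# Prevalence of around 12%, as picked up by the RD criterion.
ppv_pop, npv_pop = predictive_values(0.89, 0.67, 0.12)

print(round(ppv_risk, 2), round(npv_risk, 2))  # 0.6 0.92
print(round(ppv_pop, 2), round(npv_pop, 2))    # 0.27 0.98
```

With sensitivity and specificity held fixed, lowering the prevalence from 36% to 12% reduces the PPV sharply while the NPV rises, which is the pattern described above.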
The predictive models also maintained their discriminative ability when the classification analyses were conducted with a weighting procedure which took into account that the proportion of children at risk is smaller in the general population than in the present sample. In general, the specificity rates improve somewhat, but at the cost of reduced sensitivity. For example, at 5.5 years the predictive model with weighted scores would pick up a number of positive cases (i.e., those suspected of RD), of whom 54% will face RD, and a number of negative cases (i.e., those not suspected of RD), of whom 95% will not face RD, assuming that the prevalence of dyslexia is around 12% in the general population and that around half of individuals with dyslexia have a familial risk background.
Thus, although the sensitivity and specificity indices and the total prediction probability rates imply that the logistic models have clinically useful discrimination power, the predictions identify not only the true positives (i.e., children with dyslexia) but also some children who later do not manifest severe reading disability (‘false alarms’). The proportion of false alarms depends on the dyslexia criterion used, the probability cutoff level set, and the population to which the prediction models are applied. The models are always more accurate in identifying children who will not eventually have a reading disability than those who end up with a reading disability.
In the preliminary analyses, the logistic regression models using the Forced Choice procedure (i.e., using the Enter method) indicated that all the predictors contributed to the outcomes, but the models offered only slightly higher explanation values (1.5 to 2.1 percentage points at each age) than the reported Forward Wald procedure.
A hypothetical example of determining an individual child's risk for RD. For a 4.5-year-old child with a familial risk for RD who has a standard score of −1.17 in letter naming and −2.0 in phonological awareness, the probability of RD can be read from the probability curve presentation (see Figure 1) in the following way. Locate the letter naming score of −1.17 on the x-axis and move up to the curve representing phonological awareness at −2.0 (the curve labelled R, PA −2.0); the y-value at this intersection gives a probability score of .70 or more (indicating at least a 70% probability of RD). The score of the example case is thus located within the high risk area. Another way of determining a child's individual probability is to compute the exact probability score (based on the Beta weights in Table 2) with the equation $1/[1 + e^{-(-2.681 + 1.250 - .987 \times (-1.17) - .406 \times (-2.0))}] = .788$.
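The equation has the general form of a logistic transformation: a constant plus a weighted sum of the predictor scores is converted into a probability between 0 and 1. The sketch below illustrates that general form only. The function name `rd_probability` and all numeric weights in it are hypothetical placeholders chosen for readability; they are not the Beta weights of Table 2, which should be substituted when an actual risk estimate is needed.

```python
import math


def rd_probability(intercept, weights, scores):
    """Logistic transformation: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        weights[name] * scores[name] for name in weights
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# HYPOTHETICAL placeholder weights (NOT the Table 2 values).
intercept = -1.0
weights = {
    "familial_risk": 1.0,            # coded 1 = at risk, 0 = no risk
    "letter_naming": -1.0,           # standard score (z)
    "phonological_awareness": -0.5,  # standard score (z)
}

# A child at familial risk scoring below average on both measures.
at_risk_child = {
    "familial_risk": 1,
    "letter_naming": -1.0,
    "phonological_awareness": -2.0,
}
# The same scores for a child without familial risk.
control_child = dict(at_risk_child, familial_risk=0)

p_at_risk = rd_probability(intercept, weights, at_risk_child)
p_control = rd_probability(intercept, weights, control_child)
print(f"at risk: {p_at_risk:.3f}, no familial risk: {p_control:.3f}")
```

Negative weights on the skill measures mean that lower standard scores raise the estimated probability, and a positive familial risk weight shifts the whole curve upwards, which is the pattern the probability curves in Figure 1 display.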
We are grateful to all the families and children as well as to their teachers for their long-lasting cooperation with the JLD project. We would also like to offer special thanks to the whole inspirational research team of JLD and the Niilo Mäki Institute, and especially to Pekka Räsänen, who tutored us in the relevant analyses. The authors also wish to thank Pirkko Rytinki for her comments on language. The Jyväskylä Longitudinal Study of Dyslexia (JLD) was part of the Finnish Center of Excellence Program (2000–2005) and was supported by the Academy of Finland. This research was also supported by The Mannerheim League for Child Welfare and the Department of Psychology, Jyväskylä.