Remapping the cognitive and neural profiles of children who struggle at school

Our understanding of learning difficulties largely comes from children with specific diagnoses or individuals selected from community/clinical samples according to strict inclusion criteria. Applying strict exclusionary criteria overemphasizes within-group homogeneity and between-group differences, and fails to capture comorbidity. Here, we identify cognitive profiles in a large heterogeneous sample of struggling learners, using unsupervised machine learning in the form of an artificial neural network. Children were referred to the Centre for Attention Learning and Memory (CALM) by health and education professionals, irrespective of diagnosis or comorbidity, for problems in attention, memory, language, or poor school progress (n = 530). Children completed a battery of cognitive and learning assessments, underwent a structural MRI scan, and their parents completed behavior questionnaires. Within the network we could identify four groups of children: (a) children with broad cognitive difficulties, and severe reading, spelling and maths problems; (b) children with age-typical cognitive abilities and learning profiles; (c) children with working memory problems; and (d) children with phonological difficulties. Despite their contrasting cognitive profiles, the learning profiles for the latter two groups did not differ: both were around 1 SD below age-expected levels on all learning measures. Importantly, a child's cognitive profile was not predicted by diagnosis or referral reason. We also constructed whole-brain structural connectomes for children from these four groupings (n = 184), alongside an additional group of typically developing children (n = 36), and identified distinct patterns of brain organization for each group. This study represents a novel move toward identifying data-driven neurocognitive dimensions underlying learning-related difficulties in a representative sample of poor learners.

maths at age 11 (Department for Education United Kingdom, 2017).
Our understanding of the causes of learning difficulties comes largely from studying children with a specific diagnosis (e.g., ADHD or SLI) or those selected from community or clinical samples on the basis of strict inclusion criteria (e.g., children with poor reading skills, but age-typical IQ and maths abilities). Most studies recruit children with "pure" problems (e.g., children with ADHD without comorbid dyslexia, or children with maths problems in the absence of reading problems or low IQ). There are practical advantages to this approach: it outlines clear criteria to inform practitioner decision-making about primary areas of weakness that can be used to identify intervention options.
Using strict exclusionary criteria also overemphasizes similarities within groups, and the distinctiveness between groups (Coghill & Sonuga-Barke, 2012; Kotov et al., 2017). It is widely documented that symptoms vary between children with the same diagnosis. For example, performance on cognitive tasks within ADHD groups is notoriously variable (Castellanos et al., 2005; Nigg, Willcutt, Doyle, & Sonuga-Barke, 2005). Symptoms also co-occur across groups. For example, symptoms of inattention are common in children with poor literacy and maths skills (Hart et al., 2010; Loe & Feldman, 2007; Zentall, 2007), ADHD, autism spectrum disorder (ASD; Rommelse, Geurts, Franke, Buitelaar, & Hartman, 2011), SLI (Duinmeijer, de Jong, & Scheper, 2012), and dyslexia (Germano, Gagliano, & Curatolo, 2010; Willcutt & Pennington, 2000). Finally, this approach of selectively grouping children does not capture the majority of struggling learners: they often do not receive a diagnosis, or are characterized by complex and comorbid difficulties that would rule them out of studies with strict inclusion criteria.
For these reasons a number of researchers have advocated empirically based quantitative classification systems (Archibald, Cardy, Joanisse, & Ansari, 2013; Coghill & Sonuga-Barke, 2012), although few studies have done this. The aim of this approach is to move away from identifying highly selective discrete groups and instead to identify continuous dimensions that distinguish individuals and can be used as potential targets for intervention. Dimensions are derived through data-driven exploration, with no a priori assumptions about group membership. For example, factor analysis, a statistical method that groups variables based on shared variance, is used most commonly to derive underlying dimensions from sets of symptoms or measures (e.g., Kotov et al., 2017). This technique has been used to identify dimensions of phonological and nonphonological skills in children with diagnosed SLI and dyslexia, and separate latent constructs for inattention and hyperactivity in children with ADHD (Martel, Von Eye, & Nigg, 2010). An alternative approach, as yet rarely used, is to cluster children together according to shared profiles based on empirical data. In turn this can be used to inform classification systems, and consequently treatment approaches.
Clustering algorithms have been used to identify groups of children with distinct learning (Archibald et al., 2013) and behavioral profiles (Bathelt, Holmes, the CALM Team, & Astle, in press).
In this study, we use a different data-driven approach: machine learning. Machine learning methods have rarely been applied to understanding developmental disorders (e.g., Fair, Bathula, Nikolas, & Nigg, 2012). Typical applications use supervised machine learning (Peng, Lin, Zhang, & Wang, 2013), in which the algorithm attempts to learn about predefined categories of children. Here, we use an unsupervised learning approach, whereby the algorithm attempts to learn the structure of the data itself rather than which data correspond to predefined groups. Specifically, we used Self-Organising Maps (SOMs; Kohonen, 1989), a type of artificial neural network.
Due to their efficacy in visualizing multidimensional data, SOMs have been successfully applied to a variety of tasks including textual information retrieval (Lin, Soergel, & Marchionini, 1991), the interpretation of gene expression data (Tamayo et al., 1999), and ecological community modeling (Giraudel & Lek, 2001). SOMs use an algorithm that projects the original data from a multidimensional input space onto a two-dimensional grid of nodes called a "map", while preserving topographical information. This produces an intervariable representational space, wherein the geometric distance between nodes corresponds to the degree of similarity in the input data. In the current context, the input data are the individual children in our sample. The map represents the cognitive profiles of the children; the closer two children are represented within the map, the more similar their cognitive profiles. In this way, SOMs enable us to map the multidimensional space of our sample: the map represents how different children group together because of their similar profiles, and in doing so it also learns about the dimensions that most reliably distinguish children.
RESEARCH HIGHLIGHTS
• First study to apply machine learning to understand heterogeneity in struggling learners.
• Large sample of struggling learners that includes children with multiple difficulties.
• Rich phenotyping with detailed behavioral, cognitive, and neuroimaging assessments.

We applied this technique to a large heterogeneous sample of struggling learners. Children were referred to a research clinic, the Centre for Attention Learning and Memory (CALM), by health and education professionals for ongoing problems in attention, memory, language, or poor school progress in reading and/or maths.
Recruitment was deliberately broad to capture the wide range of poor learners in the school population. Children were accepted into the study irrespective of diagnosis or comorbidity: only non-native English speakers and those with uncorrected sight or hearing problems were excluded. Our first aim was to test whether the multidimensional structure learnt by the map is reflected in different sample characteristics, such as the primary reason for referral to the research clinic (e.g., problems in attention, learning, memory, or language).
A second aim of the current study was to use the information from the SOM to identify data-driven groups within the sample.
Even though it is likely that the dimensions that distinguish children are continuous, there may be important reasons to group children according to their shared cognitive profile: (a) to identify shared etiological mechanisms, which will be easier with data-driven homogeneous groups; and (b) to identify groups for a particular intervention. To do this the SOM was combined with another form of machine learning, k-means clustering (Lloyd, 1982). This combination identified groups of children with similar cognitive profiles. Having grouped the children with the cognitive data, we then explored the learning and behavioral profiles of these groups. We also explored differences in white-matter connectivity between the data-driven groups. White matter maturation is a crucial process of brain development that extends into the third decade of life (Lebel, Treit, & Beaulieu, 2017) and relates closely to cognitive development (Clayden et al., 2012; Stevens, Skudlarski, Pearlson, & Calhoun, 2009). The brain can be modeled as a network of brain regions connected by white matter, commonly referred to as a connectome (Hagmann et al., 2008). We derived whole-brain connectomes and compared them across the groups produced by the machine learning. In short, our second aim was to use machine learning to identify groups of children with shared cognitive profiles, and then test whether these groups differ on learning and behavioral measures, and in terms of brain organization.
This mapping process is intentionally exploratory, and given this novel application of the analytical approach alongside a unique sample, it is difficult to make clear predictions about what the algorithm will learn. The children attending the clinic completed assessments of the cognitive skills known to be impaired in children with learning-related problems, including measures of phonological processing, short-term and working memory, attention, and fluid reasoning (nonverbal IQ). Children with deficits in reading or language, or associated diagnoses of dyslexia or SLI, often have phonological processing problems (Bishop & Snowling, 2004; Joanisse, Manis, Keating, & Seidenberg, 2000; Ramus et al., 2010; Vellutino, Fletcher, Snowling, & Scanlon, 2004). In contrast, those with specific problems in maths or diagnosed dyscalculia are typically characterized by more severe deficits in spatial short-term and working memory (Geary, 2004; Holmes, Adams, & Hamilton, 2008; McKenzie, Bull, & Gray, 2003; McLean & Hitch, 1999; Rasmussen & Bisanz, 2005; Simmons, Singleton, & Horne, 2008; Swanson & Sachse-Lee, 2001) and broader executive functions (Bull, Espy, & Wiebe, 2008; Bull, Espy, Wiebe, Sheffield, & Nelson, 2011; Szucs, Devine, Soltesz, Nobes, & Gabriel, 2013; Van der Ven, Kroesbergen, Boom, & Leseman, 2012). So, a reasonable prediction is that our large sample of struggling learners will include subgroups of children with either phonological problems or spatial short-term/working memory difficulties, and that these children will predominantly struggle with reading or maths, respectively. Below-average nonverbal reasoning is common among individuals with reading (Duranovic, Tinjak, & Turbic-Hadzagic, 2014; Gathercole et al., 2016; Pointus, 1981; Winner et al., 2001) and maths problems (Gathercole et al., 2016; Swanson & Beebe-Frankenberger, 2004; Fuchs et al., 2005; Cirino et al., 2015), as well as those with ADHD (Holmes et al., 2013).
So, another reasonable prediction is that our sample of struggling learners will include a subgroup of children with low fluid reasoning skills, and this will be associated with problems in both reading and maths.

| Participants
Children were referred by practitioners working in educational or clinical services to the Centre for Attention Learning and Memory (CALM), a research clinic at the MRC Cognition and Brain Sciences Unit, University of Cambridge. Referrers were asked to identify the primary reason for referral, which could include ongoing problems in "attention", "learning", "memory", or "poor school progress". The only exclusion criteria were uncorrected problems in vision or hearing and English as a second language.
The initial sample consisted of 550 children. Twenty children (3.6%) were subsequently removed because of missing data on any one of the seven tasks used for the machine learning. All subsequent details refer to the remaining 530 children (see Figure 1 for recruitment). Thirty-three percent were referred for problems with attention, 11% for language difficulties, 10% for memory problems, and 43% for problems with poor school progress (for 3% of children referrers did not provide a primary referral reason). The final sample (mean age = 111 months, range = 65-215 months) contained 366 boys (69%). A high proportion of boys is consistent with prevalence estimates for different developmental disorders within cohort studies (e.g., Russell, Rodgers, Ukomunne, & Ford, 2014).
Children were recruited with single, multiple or no diagnosis.
The majority did not have a diagnosis (340, 64%). The prevalence of each diagnosis was: ASD = 6%; dyslexia = 6%; obsessive compulsive disorder (OCD) = 2%. Twenty-two percent of the sample had a diagnosis of ADD or ADHD, and a further 11% were under assessment for ADHD (on an ADHD clinic waiting list for a likely diagnosis of ADD or ADHD). Finally, 19% of the sample had received support from a Speech and Language Therapist (SLT) within the past 2 years, but did not typically have a diagnosis of SLI. Written parental consent was obtained and children provided verbal assent.

| Behavior
Parents/carers completed the Behavioural Rating Inventory of Executive Function (BRIEF; Gioia, Isquith, Guy, & Kenworthy, 2000). This is designed to assess behavioral skills associated with executive function on eight scales, including planning, working memory, inhibition, impulse control, and emotional regulation. Complete data were available for 99% of our 530 children.
The Children's Communication Checklist (CCC-2; Bishop, 2003) was also administered. This consists of eight scales assessing a child's structural language (e.g., speech, syntax, semantics), pragmatic communication skills (e.g., turn taking, initiation, and use of context), and two additional scales to assess ASD-related dimensions (social relations and interests). Complete CCC-2 data were available for 99% of the sample.

| Statistical methods
A SOM consists of a predefined number of nodes laid out on a two-dimensional grid plane; each node corresponds to a "node-weight vector" with the same dimensionality as the input data. In our case, each node will have seven weights associated with it (one for each cognitive task). A rule of thumb for determining map size is to use a number of nodes equal to around 5 times the square root of the number of observations (Tian, Azarian, & Pecht, 2014). In this case, we used a 10 by 10 grid of nodes.
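As a quick illustration of this rule of thumb (a sketch, not the authors' code), the suggested node count for the present sample can be computed directly:

```python
import math

def som_grid_size(n_observations: int) -> int:
    """Rule-of-thumb node count: ~5 * sqrt(number of observations)
    (Tian, Azarian, & Pecht, 2014)."""
    return round(5 * math.sqrt(n_observations))

nodes = som_grid_size(530)      # 530 children in the final sample
side = round(math.sqrt(nodes))  # side of the nearest square grid
print(nodes, side)              # suggests ~115 nodes, roughly an 11 x 11 grid
```

The 10 by 10 grid used in the study (100 nodes) is close to this suggested value.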

| Training the map
SOMs were trained using the neural network toolbox in MATLAB v2017a (MathWorks Inc., Natick, MA). SOMs consist of a predefined number of nodes laid out on a two-dimensional grid plane.
Each node corresponds to a weight vector with the same dimensionality as the input data. We initialized the node weight vectors using linear combinations of the first two principal components of the input data. SOMs were then trained using a batch implementation, in which each node i is associated with a model m_i and a "buffer memory". One cycle of the batch algorithm can be broken down into the following steps. Each input vector x(t) is mapped onto the node with which it shares the least Euclidean distance at time t; this node is known as its Best Matching Unit (BMU). Each buffer sums the values of all input vectors x(t) in the neighborhood set belonging to node i and divides this by the total number of these input vectors to derive a mean value. All m_i are then updated concurrently according to these values. In this way, neighboring nodes become more similar to one another. This cycle is repeated, clearing all the buffers on each cycle and distributing new copies of the input vectors into them. The neighborhood size (ND) decreases as a function of t over n steps in an "ordering" phase, from the initial neighborhood size (INS) down to 1 (Equation 1). In the "fine-tuning" phase the neighborhood size is fixed at <1, meaning that the node weights are updated according only to the input vectors for which they are the BMU. This node adjustment process is the mechanism by which the SOM learns about the input data. In the current training process, we used 5 "ordering" runs and a single final fine-tuning run.
At the end of the training process: (a) the weight vector for each individual node reflects the scores of the children for whom that node was the BMU; (b) neighboring nodes have similar weights, such that children with similar cognitive profiles are allocated to nodes that are near each other. In essence, the machine learning process generates a model of the multidimensional cognitive data set on which the SOM was trained.
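The batch procedure described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the MATLAB toolbox code used in the study; the default grid size, radius schedule, and step-function neighborhood are simplifying assumptions.

```python
import numpy as np

def train_batch_som(X, grid=(10, 10), orderings=5, init_radius=5.0):
    """Illustrative batch SOM: PCA initialization, shrinking step-function
    neighborhood, and a final fine-tuning pass (radius < 1)."""
    rows, cols = grid
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    # Initialize node weights from linear combinations of the first two
    # principal components, as described in the paper.
    Xc = X - X.mean(0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    g = (coords - coords.mean(0)) / coords.std(0)
    W = X.mean(0) + g @ (np.diag(s[:2] / np.sqrt(len(X))) @ Vt[:2])
    # "Ordering" passes with a shrinking neighborhood, then one
    # "fine-tuning" pass in which only each BMU itself is updated.
    for radius in list(np.linspace(init_radius, 1.0, orderings)) + [0.5]:
        d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)
        bmu = d.argmin(1)                          # Best Matching Unit per child
        grid_d = ((coords[:, None] - coords[None]) ** 2).sum(-1)
        h = (grid_d <= radius ** 2).astype(float)  # neighborhood membership
        num = h[:, bmu] @ X                        # "buffer" sums per node
        den = h[:, bmu].sum(1, keepdims=True)      # samples per neighborhood
        W = np.where(den > 0, num / np.maximum(den, 1), W)
    return W, coords
```

After training, children with similar profiles have BMUs on neighboring nodes, and each node's weight vector approximates the profiles of the children mapped onto it.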

| Exploring the distributions of different groups of children
Once the map had been trained we tested whether different groups of children cluster together. For example, if a child's diagnosis predicts their cognitive profile, then children with the same diagnosis ought to cluster together. That is, they ought to sit on nodes that are near one another. However, if there is no systematic relationship between this characteristic and a child's cognitive profile then this group will be randomly scattered across the map. We tested this both for diagnosis (ASD, dyslexia and ADHD) and the referrer's primary reason for sending the child to the CALM clinic (problems with attention, language, memory, or poor school progress).
To do this, the BMUs of the members of each group were identified, and their topographical distribution was tested statistically using a version of the Kolmogorov-Smirnov test adapted for two-dimensional data from two samples (Peacock, 1983). The statistic (D) tests whether the two samples are drawn from the same or different two-dimensional distributions. In each case we compared the distribution of members of a particular category (e.g., referred for language problems) with that of nonmembers (e.g., those not referred for language problems). A significant statistic indicates that the two distributions are not drawn from the same underlying population, i.e., that this particular way of categorizing children is significantly predictive of their cognitive profile. Conversely, a nonsignificant result indicates that the category's members are equally likely to appear anywhere within the map.
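To make the statistic concrete, here is a hedged sketch of a two-sample two-dimensional KS statistic (a simplified Fasano-Franceschini-style variant of Peacock's test; the published method assesses significance via tabulated approximations or permutation, omitted here):

```python
import numpy as np

def ks2d_statistic(a, b):
    """Two-sample 2-D KS statistic D: the largest difference between the
    two samples' fractions falling in any quadrant anchored at any
    observed data point (sketch; significance testing omitted)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    d_max = 0.0
    for origin in np.vstack([a, b]):
        for sx in (np.less, np.greater_equal):      # left/right of origin
            for sy in (np.less, np.greater_equal):  # below/above origin
                fa = np.mean(sx(a[:, 0], origin[0]) & sy(a[:, 1], origin[1]))
                fb = np.mean(sx(b[:, 0], origin[0]) & sy(b[:, 1], origin[1]))
                d_max = max(d_max, abs(fa - fb))
    return d_max
```

Applied to the map, `a` would hold the grid coordinates of a category's members and `b` those of nonmembers.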

| Data driven subgrouping
The artificial neural network maps cognitive profiles onto a continuous two-dimensional plane of nodes, where distance indicates similarity. We carved the map into sections and grouped the children who fell within each section, thereby clustering children with similar cognitive profiles. Clustering children who sit close together ought to yield groups with relatively homogeneous cognitive profiles that are necessarily distinct from those of children in other clusters.
There is no clear theoretical rationale for how many clusters the map should be carved into. By definition, the map is fully continuous, without clear boundaries. One way to validate the clusters is to test whether they generalize to data not included in the initial machine learning: this could be other cognitive data, learning measures, behavioral questionnaires, or brain data. For example, if clusters cannot be distinguished with unseen data, then this suggests that the machine learning is over-fitting the data and/or the number of clusters is too high. In this case, the maps would need to be trained with fewer repetitions, a reduced set of nodes, or, most likely, a reduced number of clusters. To foreshadow our results, in the current sample we can identify four clusters of children. This is the maximum number of clusters that yields generalizable, unique profiles.
The Supplementary Materials includes a five cluster solution, which replicates the clusters from the four cluster solution, and a statistical comparison between the two. The Supplementary Materials also includes an alternative means of grouping children that is not reliant on machine learning-community detection via a network analysis (e.g., Bathelt et al. 2018).
To identify data-driven clusters, the node weight values from the SOM were submitted to k-means clustering. Once the nodes were grouped according to the similarity of their weights, we identified the children assigned to each group of nodes. This provided us with clusters of children based on the nodes they were assigned to in the original mapping. This process was repeated 1,000 times, with the map retrained on every iteration and the k-means clustering recalculated, to check that the clusters were robust. Inevitably some children sit on the arbitrary cluster boundary within the map and thus fall inconsistently into multiple different clusters across iterations. Across the 1,000 iterations we identified each child's modal cluster, which was used for subsequent analyses. There was a clear modal cluster for 529 of the 530 children (chi-squared tests, ps < 0.05). To check the clustering, each cluster distribution was plotted on the original map. If the process had worked, then all cluster members ought to sit on neighboring nodes within the original map.
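A minimal sketch of this two-stage step, assuming NumPy (plain Lloyd's algorithm stands in for the toolbox routine, with a deterministic farthest-point initialization; the label matching needed to align clusters across the 1,000 retrained maps is glossed over here):

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Lloyd's algorithm (Lloyd, 1982), here applied to SOM node-weight
    vectors. Farthest-point initialization keeps the sketch deterministic."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Next center: the point farthest from all chosen centers.
        d = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, float)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def modal_cluster(labels_per_run):
    """Modal cluster per child across repeated runs, given a
    (runs, children) array of labels already aligned across runs."""
    L = np.asarray(labels_per_run)
    return np.array([np.bincount(col).argmax() for col in L.T])
```

Each child inherits the label of the node group containing their BMU; repeating the whole pipeline and taking the mode guards against boundary children flipping between clusters.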
The cognitive profiles of the clusters were compared to identify the ways in which they differ (it is necessarily the case that they will differ). Importantly, the groups were then compared on other measures not included in the machine learning, namely learning and behavioral assessments and brain organization. For all of our assessments we corrected for multiple comparisons using a Bonferroni correction within each data type (i.e., cognition, learning, and behavioral measures). The ratios of the SOM-defined groups in the imaging subsample did not differ from the behavioral sample (Cluster 1: n = 48, Cluster 2: n = 44, Cluster 3: n = 51, Cluster 4: n = 41, χ² = 0.01, p > 0.999). There were no significant differences between the groups in residual movement (see Table 1).

| MRI data acquisition
Magnetic resonance imaging data were acquired at the MRC Cognition and Brain Sciences Unit in Cambridge, on the Siemens 3 T Tim Trio system (Siemens Healthcare) using a 32-channel quadrature head coil. T1-weighted volume scans were acquired using a whole brain coverage 3D Magnetization Prepared Rapid Acquisition Gradient Echo (MP RAGE) sequence with 1 mm isotropic image resolution. Echo time was 2.98 ms, and repetition time was 2,250 ms. Diffusion scans were acquired using echo-planar diffusion-weighted images with an isotropic set of 60 noncollinear directions, using a weighting factor of b = 1,000 s·mm⁻², interleaved with a T2-weighted (b = 0) volume. Whole brain coverage was obtained with 60 contiguous axial slices and 2 mm isotropic image resolution. Echo time was 90 ms and repetition time was 8,400 ms.

| Structural connectome construction and comparison
First, MRI scans were converted from the native DICOM to compressed NIfTI-1 format. Next, correction for motion, eddy currents, and field inhomogeneities was applied using FSL eddy (see Figure 2 for an overview of processing steps). We then applied nonlocal means denoising (Manjon, Coupe, Marti-Bonmati, Collins, & Robles, 2009) using DiPy v0.11 (Garyfallidis et al., 2014) to boost the signal-to-noise ratio. The diffusion tensor model was fitted to derive maps of fractional anisotropy (FA) using dtifit in FSL v5.0.6 (Behrens et al., 2003). A constant solid angle (CSA) model was fitted to the 60-gradient-direction diffusion-weighted images with a maximum harmonic order of 8, using DiPy. Whole-brain probabilistic tractography was performed with 8 seeds in any voxel with a generalized FA value higher than 0.1. The step size was set to 0.5 and the maximum number of crossing fibers per voxel to 2.
For ROI definition, T1-weighted images were submitted to nonlocal means denoising in DiPy and robust brain extraction using ANTs v1.9 (Avants et al., 2011). To investigate regional differences, we calculated the sum of all connections per region within the connectome. Regions that showed a significant difference between a deficit group (C1, C2, C4) and the age-appropriate performance group (C3) were selected (t-test: p uncorrected < 0.05) and further tested against the external comparison group (method adapted from Shen et al., 2017).
Only regions that displayed a significant difference relative to the external comparison sample were included (FDR-corrected p < 0.05).
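The regional measure used here, the sum of all connections per region, is simply the row sum of the (symmetric) connectivity matrix; a minimal sketch:

```python
import numpy as np

def regional_strength(connectome):
    """Sum of connection weights per region, i.e. the weighted node
    degree used for the regional group comparisons."""
    A = np.asarray(connectome, float)
    assert A.shape[0] == A.shape[1], "connectome must be a square matrix"
    return A.sum(axis=1)
```

These per-region values are what the deficit groups and comparison groups are contrasted on, region by region.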

| Comparison of the weight matrices
A good way to demonstrate how the SOM represents the cognitive data is to plot the values for each weight vector (i.e., the weights that correspond to each individual task) across the grid of nodes. This can be seen in Figure 3.

FIGURE 2 Overview of processing steps to reconstruct a white matter connectome from diffusion-weighted and T1-weighted MRI data

FIGURE 3 Weight distributions from the self-organizing map, split by task. For each task the map depicts high weights (i.e., good performance) as yellow squares and low weights (i.e., poor performance) as black squares. The Pearson correlations between the weight distributions can be seen in the bottom-right matrix

| Exploring distributions of different categories of children
To explore whether different sample characteristics (diagnostic status, referral reason) are reflected within the map, the best matching nodes for different groups of children were identified. If category membership significantly predicts a child's cognitive profile, then these children should sit together in the map. Conversely, if membership is not predictive, then the distribution of members should not differ significantly from that of nonmembers. Figure 4 shows the distribution of all children within our network, then for each category of primary referral reason, and then for each of the major diagnoses. The statistics are shown under each topography. None are significant.
That is, children are evenly scattered regardless of the primary reason for referral or diagnosis; each of these characteristics provides no information about a child's cognitive profile on our measures.

| Common cognitive profiles
To identify children with common cognitive profiles, the map was carved into four clusters. The first cluster has difficulties across all of the cognitive measures; this group is called the "Broad Deficits" group. The third cluster performs at age-expected levels across the cognitive measures; this group is called the "Age Appropriate" group. The second cluster has difficulties on the spatial STM, and verbal and spatial WM measures; this group is called the "Working Memory Deficits" group. The fourth cluster has difficulties on tasks with a verbal component: vocabulary, phonological awareness, verbal STM, and verbal WM; this cluster is called the "Phonological Deficits" group.
The profiles of the four clusters can be seen in Figure 5, with scores and group comparisons presented in

| Learning and behavioral profiles of the data-driven groups
The four clusters also differ in important ways on other measures not included in the machine learning. The subscale scores for both the BRIEF and CCC-2 questionnaires, split by group, can be seen in Table 2. Correlation matrices for both the BRIEF and CCC-2 can be found in Tables S1 and S2. Before comparing the groups, a PCA was conducted separately on the subscales of each questionnaire to reduce the number of comparisons.
These analyses identified two factors in the BRIEF, which together explained 76.1% of the variance. The rotated factor solution and scale loadings can be found in Table S3. The first factor captured the working memory, initiate, planning, organization, and monitor subscales. The emotional control, shift, inhibit, and monitor subscales loaded most highly on the second factor. The first factor therefore corresponds to "cold" cognitive aspects of executive function, while the second corresponds more closely to "hot" executive functions associated with behavioral and emotional regulation. Factor scores were saved and compared across groups: there were no significant differences in behavior across the clusters (all ps > 0.05).
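The subscale-reduction step can be sketched with a plain PCA (an illustration assuming standardized subscales; the paper reports a rotated factor solution, and the rotation step, e.g. varimax, is omitted here):

```python
import numpy as np

def pca_factor_scores(X, n_factors=2):
    """Reduce questionnaire subscales to a few components via PCA.
    X: (n_children, n_subscales). Returns component scores, subscale
    loadings, and the proportion of variance explained."""
    Z = (X - X.mean(0)) / X.std(0, ddof=1)     # standardize each subscale
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2 / (len(X) - 1)                # variance per component
    explained = var[:n_factors].sum() / var.sum()
    scores = U[:, :n_factors] * s[:n_factors]  # per-child component scores
    loadings = Vt[:n_factors].T                # per-subscale loadings
    return scores, loadings, explained
```

The saved scores are then what gets compared across the four clusters in place of the individual subscales.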
There were also two factors within the CCC-2, explaining 74% of the variance. The rotated factor solution can be found in

| White matter differences between the data-driven groups
Differences in white matter connections between the SOM-defined groups were investigated to uncover the neurobiological correlates of the grouping. Each of the deficit groups (Clusters 1, 2, and 4) was compared to the Age Appropriate group (Cluster 3) and to an independent sample of typically developing children (TD). Statistical comparison of connection strengths by region indicated significantly lower connection strengths for frontal, temporal, parietal, and subcortical connections in Cluster 1 compared to Cluster 3 and TD (see Table 3 and Figure 6). There was no significant difference in regional connection strength between Cluster 2 and Cluster 3, or between Cluster 2 and TD. The comparison of Cluster 4 and Cluster 3 indicated significantly lower strength of parietal connections, and the comparison with TD indicated significantly different frontal connections.

Regional comparison indicated a significant reduction for Cluster 1 (Broad Deficits) compared to both comparison groups in the right inferior frontal gyrus (see Figure 6).

| DISCUSSION
We used machine learning to identify the cognitive profiles within a large heterogeneous sample of children with learning-related problems. These profiles were represented as topographical maps. None of the known characteristics of the children (e.g., diagnosis or referral route) were predictive of the cognitive profiles identified by the machine learning. To highlight the cognitive profiles that exist within the dataset, we subsequently carved the topographical maps into four sections. The children that correspond to these four sections will necessarily have distinct cognitive profiles, but they could also be distinguished in terms of learning and behavioral scores, and patterns of brain organization. The four groups cut across any traditional diagnostic groups that existed within the data.
More than half of the sample fell into two extreme groups, one with age-appropriate cognitive abilities and the other with widespread cognitive deficits that were at least one standard deviation below age-typical levels across all tasks.

FIGURE 6 Regions with consistent significant differences in node degree between Cluster 1 and the control groups (blue) and Cluster 4 and the control groups (red)

There was no evidence that the implicated regions were selectively associated with single cognitive skills. For example, the right inferior frontal gyrus is implicated in multiple different executive functions, most commonly measures of inhibitory control (Aron, Robbins, & Poldrack, 2014); the lateral occipital cortex has been found to be modulated by visual attention (Sprague & Serences, 2013); left premotor areas have been linked to language-related difficulties in both children and adults (Mayes, Reilly, & Morgan, 2015; Scott, McGettigan, & Eisner, 2009); and the fusiform gyrus has been suggested as a locus of immature processing of word forms in dyslexia (Tamboer, Vorst, Ghebreab, & Scholte, 2016). These general struggling learners are rarely studied, but our data suggest that they are common amongst those coming to the attention of children's specialist services. Their relative underrepresentation in studies of learning-related problems means that we have little understanding of the key underlying deficits, mechanisms, or potential routes to effective intervention. It is also interesting to note that girls were disproportionately common in this group, relative to the sample as a whole or indeed relative to most studies of learning difficulties. Conversely, very few girls appeared in the age-appropriate cognitive profile group. In short, the girls referred to the study tended to have more severe cognitive and learning difficulties.
One possibility is that there is a gender bias in the reason for children coming to the attention of children's specialist services, with boys being identified more commonly for behavioral difficulties (which may be less closely tied to cognitive and learning profiles), whereas more severe cognitive or learning difficulties are needed for girls to come to the attention of specialists.
Two intermediate groups, both with fluid reasoning scores in the low-average range, were also identified. One intermediate group was characterized by problems on tasks requiring phonological processing, with performance around three quarters of a standard deviation below age-expected levels on measures of phonological awareness and verbal short-term and working memory. These children had significant problems with structural aspects of communication, mirroring the well-documented link between phonological processing difficulties and specific difficulties with language (Bishop & Norbury, 2002; Bishop & Snowling, 2004). However, the learning profile demonstrates equivalent and large deficits across measures of reading, spelling, and mathematics. Poor phonological processing is associated with poor reading (Carroll & Snowling, 2004; Snowling, 1995; Wagner & Torgesen, 1987), and a consistent finding within the field of learning difficulties is that phonological problems are linked selectively with reading. The majority of these findings come from studies that select poor readers, but this is not the same as demonstrating that phonological impairments will always result in selective reading difficulties. Our data suggest that children selected on the basis of phonological difficulties will actually have more widespread learning problems. Membership of the phonological deficit group was associated with reduced structural connectivity in the left precentral gyrus and rostral anterior cingulate, relative to both comparison groups. The precentral gyrus has been implicated in language processing and is thought to be involved in speech production, and in decoding via articulatory simulation (Scott et al., 2009). This area has also been implicated in selective language impairment (Mayes et al., 2015).
Furthermore, tracts of the perisylvian language network that connect temporal and frontal language areas pass close to the precentral gyrus and may contribute substantially to the connectomic differences. Differences in the white matter properties of these tracts have been repeatedly implicated in language deficits (Rimrodt, Peterson, Denckla, Kaufmann, & Cutting, 2010; Roberts et al., 2014). This would also mirror the structural communication difficulties that these children demonstrate. Indeed, this is the only behavioral measure that aligns well with the cognitive profiles: children who perform poorly on phonological tasks are also rated as having significant structural language problems by their parents. Other behavioral measures of executive control do not align well with the cognitive profiles.
The fourth group had a somewhat contrasting profile of cognitive deficit to the phonological deficit group. They were characterized by similar fluid IQ scores but had more pronounced difficulties in working memory. Their spatial short-term memory scores were over a standard deviation below age-expected levels, and their verbal and spatial working memory scores were around half a standard deviation below.
Their phonological abilities were less impaired, they were not rated as having the structural language difficulties reported for the phonological deficit group, and their neural profile was less homogeneous.
One possibility is that multiple different etiological routes can result in this profile of difficulties.
Despite contrasting cognitive and neural profiles, the learning profiles of the working memory and phonological deficit groups were nearly identical. This diverges strongly from a preceding literature that emphasizes a marked association between phonological difficulties and problems with literacy (Lyytinen et al., 2004; Snowling, Bishop, & Stothard, 2000; Tanaka et al., 2011), and an emerging literature that suggests strong associations between spatial short-term and working memory problems and numeracy difficulties (Bull et al., 2008; Raghubar, Barnes, & Hecht, 2010; Szucs et al., 2013). These previous studies all recruited on the basis of highly selective learning profiles (e.g., maths problems in the absence of reading difficulties) or diagnostic group, which will have overestimated the distinctiveness of these impairments within the general population of struggling learners.
Despite their utility, machine learning approaches to exploring cognitive profiles have limitations. The current combination of a multidimensional mapping method with a data-driven clustering algorithm suffers from the drawback that the number of groups within the data is underspecified. The mapping is continuous, with no obvious boundaries, making it difficult to justify any particular partitioning into groups, and inevitably some children will sit close to a group boundary within the map. Our approach was to add clusters until newly formed clusters no longer differed on measures not included in the machine learning; this is how we arrived at four clusters. It is a relatively conservative approach, since different cognitive profiles could exist that genuinely have identical learning, behavioral, and neural correlates. Furthermore, we suspect that datasets with higher dimensionality, stemming from a more comprehensive battery of measures, could have greater success in identifying more widely differing cognitive profiles.
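The stopping rule described above can be sketched in a few lines. This is an illustrative reconstruction, not the study's code: the toy 1D k-means, the single held-out "learning" measure, and the 0.5 SD group-difference criterion are all assumptions made for the example.

```python
import random

random.seed(1)

# Toy sample: two latent groups with distinct cognitive AND learning
# scores (all values are illustrative).
cognitive = ([random.gauss(-1.0, 0.2) for _ in range(30)] +
             [random.gauss(0.5, 0.2) for _ in range(30)])
learning = [c + random.gauss(0, 0.1) for c in cognitive]

def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's algorithm on 1D data."""
    centers = random.sample(xs, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centers[j]))
                  for x in xs]
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def clusters_differ(labels, external, min_gap=0.5):
    """Do all pairs of clusters differ on the held-out measure?"""
    groups = {}
    for l, e in zip(labels, external):
        groups.setdefault(l, []).append(e)
    ms = [sum(v) / len(v) for v in groups.values()]
    return all(abs(a - b) >= min_gap
               for i, a in enumerate(ms) for b in ms[i + 1:])

# Add clusters until a new split no longer separates the clusters on
# the measure that was held out of the clustering itself.
k = 1
while k < 6:  # cap for the toy example
    labels = kmeans_1d(cognitive, k + 1)
    if not clusters_differ(labels, learning):
        break
    k += 1
print(k)
```

With two well-separated latent groups, the loop accepts the split into two clusters (their held-out learning means differ substantially) and rejects further splits, which only subdivide genuinely similar children.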
An alternative to machine learning is to use a network analysis with a community detection algorithm (e.g., Bathelt et al., 2018; Fair et al., 2012); an example of this approach applied to our data can be found in the Supplementary Materials. This approach represents the children as nodes and the correlations between their profiles as edges. It can then identify communities of children that maximize the correlations within each cluster and the distinctiveness across clusters. This iterative process includes a quality-of-separation metric (Q) that the clustering algorithm is designed to maximize. A major advantage of this approach is that no a priori assumptions about the number of clusters need to be made. However, there are also drawbacks. The primary limitation is that a network analysis clusters children on the basis of a correlation matrix, and as such it is blind to overall severity. The current sample contains a large number of children with relatively consistent poor scores across all cognitive measures and many children with stable age-appropriate scores. A network analysis cannot distinguish these two groups because the two profiles are highly correlated (this is indeed the case; see Supplementary Materials). The SOM uses Euclidean distance as its primary means of locating children within the 2D topographical space, and as such is able to represent both selective cognitive impairments and overall differences in severity.

A further limitation is sample size. Whilst we included 530 children in the topographical mapping process, only 220 children were used in the structural neuroimaging comparison. This likely means that we only have sufficient power to detect the largest and most consistent group differences. More diffuse but equally important differences in whole-brain connectome organization might exist, but a larger sample would be needed to identify them.
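The severity-blindness of correlation-based clustering is easy to demonstrate. In this sketch (with made-up standardized scores), two profiles share the same shape but sit one standard deviation apart: their correlation is perfect, so a correlation-based method treats them as identical, whereas Euclidean distance, as used by the SOM, separates them.

```python
import math

# Illustrative standardized scores on four cognitive measures.
shape = [0.3, -0.2, 0.1, -0.2]
typical = shape                        # age-appropriate profile
struggling = [s - 1.0 for s in shape]  # same shape, 1 SD lower everywhere

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(pearson(typical, struggling))    # ~1.0: correlation ignores the offset
print(euclidean(typical, struggling))  # ~2.0: distance reflects severity
```

Because the correlation is essentially 1 while the Euclidean distance is large, only the distance-based representation can keep the generally struggling and age-appropriate groups apart.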
In summary, we used a machine learning approach that represents high-dimensional data as a 2D topography, to map the profiles of struggling learners. We combined this with a clustering algorithm to identify particular cognitive profiles represented within the map. Specifically, four profiles could be identified that comprise children with: (a) general and severe deficits, (b) age-appropriate performance, (c) working memory deficits, and (d) phonological deficits.
Furthermore, these data-driven groups are likely to align closely with underlying etiological mechanisms, as evidenced by differences in brain organization across two of the deficit groups, and provide the opportunity to devise interventions that more specifically target the cognitive difficulties faced by individuals with particular profiles.

ACKNOWLEDGMENTS
The