Information properties of morphologically complex words modulate brain activity during word reading

Abstract Neuroimaging studies of the reading process point to functionally distinct stages in word recognition. Yet, current understanding of the operations linked to those various stages is mainly descriptive in nature. Approaches developed in the field of computational linguistics may offer a more quantitative approach for understanding brain dynamics. Our aim was to evaluate whether a statistical model of morphology, with well‐defined computational principles, can capture the neural dynamics of reading, using the concept of surprisal from information theory as the common measure. The Morfessor model, created for unsupervised discovery of morphemes, is based on the minimum description length principle and attempts to find optimal units of representation for complex words. In a word recognition task, we correlated brain responses to word surprisal values derived from Morfessor and from other psycholinguistic variables that have been linked with various levels of linguistic abstraction. The magnetoencephalography data analysis focused on spatially, temporally and functionally distinct components of cortical activation observed in reading tasks. The early occipital and occipito‐temporal responses were correlated with parameters relating to visual complexity and orthographic properties, whereas the later bilateral superior temporal activation was correlated with whole‐word based and morphological models. The results show that the word processing costs estimated by the statistical Morfessor model are relevant for brain dynamics of reading during late processing stages.

been suggested to guide the neural learning process and organization of brain functions (Friston, 2010). Assuming that efficiency serves as a guiding principle also in the implementation of neural computations involved in reading, NLP models may be able to predict neuroimaging data and provide useful descriptions of the underlying processes.
Moreover, describing behavioral and neural data by measures derived from statistical properties can complement description based solely on formal linguistic rules.
In the present study, we used magnetoencephalography (MEG) to quantify the millisecond-scale neural dynamics of visual word recognition, and relate these results to word measures from an NLP model that optimizes the representation of words. Recognition of morphologically complex words is an important aspect of reading that could be addressed by models emerging from recent efforts in NLP.
A complex word, for example "builders", consists of multiple morphemes. A morpheme is defined as the smallest meaningful unit of language, which can either stand alone in the form of a monomorphemic word (e.g., "build") or be bound to the root (e.g., "-er" and "-s"). Alongside the NLP model we used more traditional psycholinguistic variables that have been related to visual, orthographic, morphological and lexical processing, thereby linking the present results to previous psycholinguistic literature.
A central question in the recognition of complex words is whether the brain decomposes the word into morphological constituents and then recombines the morphemes into a unified semantic meaning (Fruchter & Marantz, 2015; Taft, 2004; Taft & Forster, 1975), or whether most or all words are represented as whole forms in the mental lexicon (Butterworth, 1983). It seems that at least some mechanism to process words as a compilation of separable parts is required because in highly synthetic languages, such as Finnish, a single root word can have 150 different paradigmatic forms and the total number of possible words is counted in millions (Karlsson, 1983). In these types of languages, storing every possible whole word form in the mental lexicon seems like an uneconomical strategy for the neurocognitive system. However, the decomposition and recombination of morphological constituents may also pose additional processing costs, as suggested by longer reaction times and fixation durations for morphologically complex words than for frequency-matched monomorphemic words (Hyönä, Laine, & Niemi, 1995; Hyönä, Bertram, & Pollatsek, 2005; Lehtonen & Laine, 2003; Soveri, Lehtonen, & Laine, 2007). An optimized model for human word recognition may therefore call for a combination of decomposed and full-form representations.
NLP algorithms that employ statistical machine learning have shown that morpheme-like units of representation may emerge from requirements of efficiency in information processing. Morfessor is a data-driven NLP model that has been successful in inducing morphology from raw text data without a priori linguistic knowledge (Creutz & Lagus, 2007). The model utilizes general learning principles instead of explicit linguistic rules. It is based on the minimum description length (MDL) principle (Rissanen, 1978), and is essentially a packing algorithm that seeks to build an optimally compact and descriptive lexicon of units, called morphs, for describing the training corpus. The Morfessor model thus represents a compromise between views arguing for word representation in full word forms and those that suggest mandatory word decomposition: a particular word can be represented as a full form or decomposed into morphemes depending on which representation optimizes the overall storage and processing efficiency. The morphs discovered by Morfessor can be whole words or sometimes resemble linguistic morphemes, but they are not determined by explicit linguistic rules.
In order to relate predictions of the Morfessor model to brain imaging data, we assume that the brain of an experienced reader has adapted to the statistical regularities of written language, and that the neural activation reflects this adaptation. We can examine this idea by employing tools from the mathematical theory of communication (Shannon, 1948). In information theory, surprisal (also known as self-information) is defined as an aspect of a probabilistic event that measures the minimum effort needed to communicate the occurrence of that event, and it is quantified by the negative log probability. When the communication is optimized, commonly occurring events require less computational capacity than rare or surprising events that are associated with high information content and high processing requirements.
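As a minimal illustration of the definition above, the following sketch computes surprisal as the negative base-2 logarithm of a probability, so that the result is in bits. The function name and the example probabilities are hypothetical and are not taken from the study.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Return the surprisal -log2(p), in bits, of an event with probability p."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical probabilities, chosen only to illustrate the scale.
common_event = surprisal_bits(0.5)       # a frequent event carries little information
rare_event = surprisal_bits(1 / 1024)    # a rare event carries more information
print(common_event, rare_event)
```

The rarer event receives the larger value, which is the property the hypothesis about MEG amplitudes relies on.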
The reading process may be viewed as an optimized communication channel from text to the brain, and the surprisal of a written word is thus related to the minimum processing requirements. This assumption relates to the Bayesian brain hypothesis which proposes that the brain can minimize the free energy, and thus the effort, by representing sensory inputs in an optimal Bayesian fashion, that is, the neural system is organized based on an internal model of the world that is constantly optimized to minimize the long-term average of surprisal (Friston, 2010;Friston, 2012). Surprisal can be related to the minimum neural activation strength needed to encode the information, as postulated by the efficient coding hypothesis (Barlow, 1961;Linsker, 1990). Given that increases in the MEG signal likely reflect increased neuronal processing, higher surprisal values should be linked with enhanced MEG amplitudes.
The Morfessor model defines a word's surprisal as the sum of the surprisal of its constituent morphs, and can be seen as an estimate of the minimum processing requirement needed in the brain if words are represented as independent morpheme-like units. The Morfessor values have been shown to correlate with reaction times (RTs) in a lexical decision task better than simple psycholinguistic parameters such as word length or word frequency (Virpioja, Lehtonen, Hultén, Salmelin, & Lagus, 2011; Virpioja et al., 2017). It is, however, still unclear whether the predictive power of Morfessor in RTs is linked to a particular stage of the word recognition process.
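The additive definition can be sketched as follows. This is only an illustration of summing morph surprisals: the morph probabilities and the segmentation of "builders" are invented for the example, whereas in the study they would come from the trained Morfessor model.

```python
import math

# Hypothetical morph probabilities; a trained model would supply these values.
MORPH_PROB = {"build": 0.001, "er": 0.02, "s": 0.05}


def word_surprisal(morphs, morph_prob):
    """Sum of -log2 P(morph) over the morphs of one segmented word, in bits."""
    return sum(-math.log2(morph_prob[m]) for m in morphs)


# Hypothetical segmentation of the example word "builders".
decomposed = word_surprisal(["build", "er", "s"], MORPH_PROB)
print(round(decomposed, 2))
```

Summing the morph surprisals is equivalent to taking the negative logarithm of the product of the morph probabilities, which is why the measure corresponds to treating the morphs as independent units.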
In line with the information processing framework, we hypothesize that the salient word-evoked activations that appear at different time windows and cortical areas correspond to different aspects of information representation and optimization, and the strength of the activation is proportional to the amount of information. For example, the low-level visual features or orthography can be very similar between two words (e.g., "current" and "currant") but their corpus-based word frequencies differ by orders of magnitude (121 vs. 1 per million; Davies, 2010). If these types of qualitatively different information properties are linked to spatio-temporally distinct brain activations, one can compare the predictive power of models that capture different aspects of stimulus-related information, and thereby approximate what type of model is most similar to the internal model operating at the neuronal population level.
Activity in the inferior temporal cortex and fusiform area between 150 and 200 ms has also been reported to be sensitive to orthographic and morphological properties of letter strings, supporting the so-called morpho-orthographic segmentation hypothesis in which decomposition is based on visual word forms and takes place prior to lexical access (Solomyak & Marantz, 2010;Zweig & Pylkkänen, 2009). After 250 ms, the left superior temporal cortex shows a sustained response that usually reaches the maximum at around 400 ms (often referred to as N400m). This activation has been linked to lexical, semantic, phonological and morphosyntactic analysis in word processing (Halgren et al., 2002;Helenius, Salmelin, & Connolly, 1998;Salmelin, 2007;Service, Helenius, Maury, & Salmelin, 2007). Several studies have also found the earliest evidence of morphological processing in this cortical area (Cavalli et al., 2016;Fruchter & Marantz, 2015;Vartiainen et al., 2009;Whiting, Shtyrov, & Marslen-Wilson, 2015).
In the present study, we extract these well-established patterns of brain activation during a lexical decision task and compare them to the Morfessor estimate as well as to psycholinguistic variables that have been linked to visual word processing (Hauk, Davis, Ford, Pulvermüller, & Marslen-Wilson, 2006; Pylkkänen & Marantz, 2003; Wydell et al., 2003). These variables seek to estimate aspects of low-level visual (image complexity, word length), orthographic (bigram frequency), morphological (Morfessor, lemma frequency, lemma transition probability) or lexical (surface frequency) processing. To assess how Morfessor estimates and/or each of the psycholinguistic variables are related to the brain activity, we employ item-level correlation and multiple regression analysis. As each item is presented only once to avoid confounding effects of item repetition in the lexical decision task, the signal-to-noise ratio of the item-level responses is enhanced by averaging the single trials for each word across the participants.
The interpretation of the results on words is aided by a comparison to the results of a similar analysis on the pseudowords that were presented in the lexical decision task. Non-lexical variables, such as those related to visual features and orthography should be comparable for real words and pseudowords. However, neural processing related to any form of meaning should dissociate between real words and pseudowords.
We predict that the early stages of neural activity during reading will correlate best with surprisal values in the visual or orthographic measures, whereas the later activation is better captured by morphological and lexical variables, with higher neural activation associated with higher surprisal values. Of specific interest is whether the letter-string response will be better explained by orthographic or morphological variables and to what degree the sustained left temporal response will capture both morphological and lexical information measures.
Moreover, any unique predictive power of the Morfessor measure would suggest that a particular brain response is linked to processing of morpheme-like representations and that it is possible, at least to some extent, to find such units by requiring compactness of representation using the minimum description length principle.

| Participants
A total of 23 Finnish-speaking participants were recruited for the experiment. Three participants were excluded due to a low number (less than 290/360) of artifact-free trials with a correct response. Data from 20 participants were thus included in the analysis: 11 female, age 20-37 (mean 24.4, SD = 6.4), all right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971), with no reported neurological problems. The participants gave their informed consent, and were reimbursed for their time. The study was approved by the ethics committee of the Hospital District of Helsinki and Uusimaa.

| Experimental design
The experimental setup was a visual lexical decision task. The stimuli were 1,440 unique items from four categories: words, pseudowords, symbol strings as well as words and pseudowords masked with Gaussian noise (for examples, see Figure 1). The pseudowords, symbol strings and noisy stimuli were used for the functional localization step of the study. The word set consisted of 360 Finnish nouns taken from the Morpho Challenge 2007 corpus consisting of 55 million word tokens (2.2 million unique), which is part of the Wortschatz collection (Quasthoff, Richter, & Biemann, 2006). The set included monomorphemic, as well as inflected and derived multimorphemic words. It also had a high variance across several psycholinguistic variables to allow correlational analyses. The word length varied from 4 to 16 letters (mean 10.3, SD 2.8), frequency of word occurrence was 0.018-127 per million (mean 1.87, SD 10), and the number of linguistic morphemes was 1-5 (mean 2.8, SD 1.1; note that the root word also counts as a morpheme).

FIGURE 1 Experimental stimuli. Examples of the four functionally distinct stimulus categories: words, pseudowords, symbol strings, and (pseudo)words embedded in Gaussian random noise. Each trial consisted of a fixation cross that appeared for 500 ms, followed by a single stimulus that was displayed for 1,500 ms

The pseudowords consisted of 360 letter strings generated randomly using a probabilistic n-gram model trained on a Finnish text corpus.
The pseudowords followed the phonotactic rules of the Finnish language, that is, they were pronounceable and resembled real words but carried no meaning. The length distribution of the pseudoword set matched that of the word set.
The noise-embedded items consisted of 60 real words and 60 pseudowords that were masked by a rectangular patch of Gaussian random noise. The level of noise was such that the word was just barely readable. The symbols were 120 letter strings in the Phoenician alphabet, with a length distribution matching that of the word set. The characters had visual qualities akin to letters, but were not easily confused with Finnish alphabets in typical fonts. None of the participants reported familiarity with ancient Phoenician writing systems. Moreover, 120 random filler words from the corpus were added to counterbalance for the 120 symbol strings in order to equalize the number of word and non-word items in the experiment.
The stimuli were projected on a screen placed at a distance of 140 cm from the participant's eyes. The items were presented in black font (lower case Courier New monospaced) on a gray background. The visual angle per letter was 0.41°. Each trial consisted of a centered fixation cross, displayed for 500 ms, followed by a stimulus item displayed for 1,500 ms. The participant's task was to identify whether the item was a real Finnish word or not, as fast and accurately as possible.
Responses were given via an optical response device that reacted to index finger lift. The "yes"/"no" responses were randomly assigned to the left/right index finger, balanced across the participants. The responses did not affect the course of the experiment, nor was feedback provided during the task. If the correct response was not given within the 1,500 ms period when the word was displayed, or if the response was given accidentally before 350 ms (median RT minus three times the median absolute deviation), the trial was rejected. Each item was shown only once to each participant. The RTs from the correct responses were collected and used for a behavioral assessment.
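The trial rejection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the median absolute deviation is taken unscaled, the example RTs are invented, and the function names are hypothetical. The source reports only the resulting 350 ms cutoff, not how the RTs were pooled to obtain it.

```python
import statistics


def lower_rt_cutoff(rts_ms, n_mads=3.0):
    """Median RT minus n_mads times the (unscaled) median absolute deviation."""
    med = statistics.median(rts_ms)
    mad = statistics.median([abs(rt - med) for rt in rts_ms])
    return med - n_mads * mad


def keep_trial(rt_ms, correct, cutoff_ms, window_ms=1500.0):
    """Keep a trial only if it was answered correctly within the allowed RT range."""
    return bool(correct) and rt_ms is not None and cutoff_ms <= rt_ms <= window_ms


# Hypothetical RTs (ms), used only to show the cutoff computation.
example_rts = [700, 800, 850, 900, 1000]
print(lower_rt_cutoff(example_rts))

# Applying the 350 ms cutoff and the 1,500 ms display window reported in the text.
print(keep_trial(852, True, cutoff_ms=350.0))   # plausible correct response
print(keep_trial(300, True, cutoff_ms=350.0))   # too fast, treated as accidental
```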
The stimulus order was randomized and the presentation divided into six blocks, lasting around 7 min each, with short resting breaks between the blocks. The order of the blocks was balanced across the participants using the Latin square design.
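One simple way to obtain a Latin square ordering of six blocks is a cyclic construction, sketched below. The source does not state which Latin square was used, so the cyclic form, the function names, and the mapping from participant index to row are all assumptions made for illustration.

```python
def cyclic_latin_square(n):
    """Return an n x n Latin square in which row r is the sequence 0..n-1 rotated by r."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def block_order(participant_index, n_blocks=6):
    """Assign each participant one row of the square, cycling through the rows."""
    return cyclic_latin_square(n_blocks)[participant_index % n_blocks]


for participant in range(3):
    print(participant, block_order(participant))
```

In such a square every block appears exactly once in each serial position across any six consecutive participants, which is the balancing property the design aims for.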

| Measurements
Cortical activity during task performance was recorded with a Vectorview whole-head MEG system (Elekta Ltd., Helsinki, Finland) at the MEG Core, Aalto NeuroImaging. The system employs a total of 306 sensors at 102 locations, with each location equipped with two planar gradiometers in an orthogonal configuration and one magnetometer.
The MEG data was band-pass filtered at 0.03-200 Hz and sampled at 1,000 Hz.
Four electrodes were attached next to the eyes to record vertical and horizontal electro-oculograms (EOG) for detection of blinks and eye movements. Head position was measured with the help of indicator coils placed on the scalp and their locations determined with respect to predefined fiducial points to allow MEG co-registration with participant's anatomical magnetic resonance images (MRIs).
The anatomical MRIs, with 1 × 1 × 1 mm³ resolution, were obtained on a separate occasion using the T1 MPRAGE sequence on the Siemens Skyra 3T MRI scanner of the Advanced Magnetic Imaging Centre, Aalto NeuroImaging.

| MEG data analysis
As each stimulus word was shown only once during the experiment to avoid priming effects, and the signal-to-noise ratio thus could not be enhanced by averaging within each participant, the responses were instead averaged across participants. For this reason, we first located functionally, temporally and spatially corresponding responses of written word processing (at about 100, 150, 400 ms) in individual participants. Each of these responses was then averaged over participants, separately for each word.
The continuous raw data was cleaned from external interference with the spatiotemporal Signal Space Separation method (Taulu & Simola, 2006) and low-pass filtered to 40 Hz. Epochs were extracted using a time window from -200 to 800 ms with respect to the stimulus onset. Trials contaminated by blink or muscle artifacts were excluded. Rejection criteria were >150 µV for the EOG electrodes and >3,000 fT/cm for the MEG gradiometers. Only trials where the participant responded correctly were included in the further analysis.
The sensor-level summation of electromagnetic fields was disentangled into its underlying source-level components using the equivalent current dipole model (ECD; Hämäläinen, Hari, Ilmoniemi, Knuutila, & Lounasmaa, 1993;Salmelin, 2010). The aim was to isolate at least the four well-established components consistently identified in visual word recognition: the occipital response at around 100 ms, the left occipito-temporal response at 150 ms, as well as the sustained left temporal response reaching the maximum at around 400 ms after stimulus onset, together with its right-hemisphere counterpart.
The ECD models were first constructed separately for each participant based on the averaged field pattern of all word items, following the standard procedure (Salmelin, 2010). The field patterns were calculated using the data from planar gradiometers that are more sensitive to the cortical currents near the sensors and less sensitive to distant noise sources than the magnetometers.
Each source-level model consisted of 4-9 ECDs that adequately reproduced the observed whole-head field patterns with goodness-of-fit >80%. The four ECDs of interest were identified based on their location in the individual anatomical MRI, peak time, and functional behavior with respect to the different stimulus categories using the criteria presented in Tarkiainen et al. (1999). The occipital activity peaked at around 100 ms (for individual peak latencies, range 90-131 ms, mean 104 ms, SD 11 ms). This ECD was found in 17/20 participants, and was stronger for noisy than noiseless stimuli. An ECD in the left-hemispheric occipitotemporal area reached the maximum at around 150 ms (range 140-189 ms, mean 154 ms, SD 12 ms) and exhibited a stronger response to letter than symbol strings (identified in 15/20 participants). A left temporal component with sustained activation peaking around 400 ms (range 314-478 ms, mean 389 ms, SD 52 ms) was found in 19/20 participants. A functionally similar component was also found in the right hemisphere at around 400 ms (range 331-519 ms, mean 411 ms, SD 63 ms) in the same participants. These temporal cortex sources differentiated words from pseudowords and their activation was diminished for non-word symbol strings.
Subsequently, in order to maximize the comparability of the source model across the participants, an averaged multi-ECD model was constructed: The coordinates of all identified occipital, occipito-temporal, and bilateral temporal components in each participant were projected from individual MRIs to the Freesurfer "fsaverage" average brain template (Dale, Fischl, & Sereno, 1999; Dale et al., 2000; Fischl, Liu, & Dale, 2001). To verify that this process retained the response functionality of the original individual ECD models that included the corresponding components, the source activation strengths for each stimulus category were averaged over participants (Figure 2b). In agreement with the representative individual models, the averaged occipital amplitude peaked at 100 ms and was strongly increased for noisy stimuli. The occipitotemporal peak at 150 ms was strongest for text items and attenuated for symbols. This attenuation, depicted in the Figure 2b inset, is slight but significant (t test p < .001). The effect is somewhat smaller than previously reported (Tarkiainen et al., 1999), which may be due to our choice of using Phoenician characters as symbols. They are more similar to letters than the geometric shapes used in previous studies.
For the bilateral temporal sources, the activation diminished rapidly for symbols after 350 ms and was stronger and longer-lasting for pseudowords than words.
The amplitude values for each response type in the corresponding time-window were normalized within each participant by taking the zscore. For each word, the z-scores were then averaged across the participants. The z-scores were used instead of absolute values of source amplitudes in order to attenuate inter-individual variation of a more general nature. These single-word level measures of brain activation were brought into the linear regression model and evaluated against the values from language models.
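The two-step normalization described above (z-scoring within each participant, then averaging per word across participants) can be sketched as follows. The data layout, the function names, and the use of the population standard deviation are assumptions made for illustration, and the amplitudes are invented.

```python
import statistics


def zscore(values):
    """Standardize a list of values to zero mean and unit (population) SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def item_level_means(amplitudes):
    """amplitudes: participant -> {word: amplitude}. Returns word -> mean z-score.

    Words missing for a participant (e.g. rejected trials) are simply skipped.
    """
    per_word = {}
    for word_amps in amplitudes.values():
        words = list(word_amps)
        for word, z in zip(words, zscore([word_amps[w] for w in words])):
            per_word.setdefault(word, []).append(z)
    return {word: statistics.fmean(zs) for word, zs in per_word.items()}


# Hypothetical source amplitudes; the second participant has a tenfold larger scale.
example = {
    "p1": {"a": 1.0, "b": 2.0, "c": 3.0},
    "p2": {"a": 10.0, "b": 20.0, "c": 30.0},
}
item_means = item_level_means(example)
print(item_means)
```

Because each participant is standardized separately, the overall difference in amplitude scale between the two hypothetical participants does not affect the item-level values.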

| Psycholinguistic and NLP variables
In correlating language models with brain activation and RTs, we employ Shannon's surprisal measure (Shannon, 1948), which is defined as

$$I(x) = -\log P(x)$$

where $P(x)$ is the probability of the event. It is expressed in units of information, e.g., bits when the logarithm is taken to base 2. The surprisal measure can be obtained from any language model that quantifies the probability of a word in some way.
A model that resembles the brain's internal model should correlate better with neural activity than a model that does not. We consider several ways to quantify the word information content with variables that prisal values of its constituent morphs. As the morph dictionary is quite extensive, it is also possible to express pseudowords using the same model. The pseudowords in our study are generated using n-gram models that reflect statistical properties of real Finnish words and, consequently, they contain letter snippets that coincide with Morfessor morphs. The pseudowords thus have surprisal values that reflect the degree to which they have common elements with real words.
In order to relate the Morfessor model to other measures of morphology, we also examine lemma frequency which is the total frequency of words sharing a given root. The word lemma should be activated following successful morphological parsing. In addition, we examine TPL, which is defined as surface frequency divided by lemma frequency, that is, it expresses the conditional probability of encountering the whole word form, given the stem (Solomyak & Marantz, 2010).
Finally, surface frequency corresponds to the idea of a mental lexicon that stores whole word forms. This measure can be used as an index of full-form lexical access. The whole word's surprisal is given by the negative logarithm of surface frequency. The inter-correlations between these variables are shown in Table 1.
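The frequency-based measures above can be related to one another with a short sketch. The counts are hypothetical, the corpus size of 55 million tokens is the figure reported for the corpus in the text, and expressing TPL as a surprisal is an assumption made here for illustration rather than a statement of how the study transformed it.

```python
import math

CORPUS_TOKENS = 55_000_000  # corpus size reported in the text


def tpl(surface_count, lemma_count):
    """Surface frequency divided by lemma frequency: P(word form | stem)."""
    return surface_count / lemma_count


def frequency_surprisal(count, corpus_tokens=CORPUS_TOKENS):
    """Surprisal in bits of an item observed `count` times in the corpus."""
    return -math.log2(count / corpus_tokens)


# Hypothetical counts for one inflected form and for all forms sharing its root.
surface_count, lemma_count = 100, 400
print(tpl(surface_count, lemma_count))
print(frequency_surprisal(surface_count), frequency_surprisal(lemma_count))
```

Under these definitions the surface surprisal equals the lemma surprisal plus the surprisal of the TPL, which makes explicit why the three measures are inter-correlated.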
The correlation of the different models to brain activity and RTs (the mean value to each word across individuals) was performed using linear regression. For an initial overview, correlation coefficients of individual predictors to each cortical component were computed with simple linear regression. Next, multiple linear regressions were employed to assess the contribution of the different predictors in conjunction, that is, the source amplitudes and RTs were predicted using a linear combination of variables. Significance of a given variable in the multiple regression model indicates that the variable has unique predictive power that is not explained by its correlation with the other variables.
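The distinction between a simple correlation and unique predictive power can be illustrated with a small sketch. Instead of a full multiple regression with significance tests, it uses a partial correlation computed by residualization, which for two predictors is zero exactly when the corresponding multiple regression coefficient is zero. The data are invented and constructed so that the response depends on word length only; all names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def partial_r(y, x, control):
    """Correlation between y and x after removing the linear effect of control."""
    return pearson_r(residuals(y, control), residuals(x, control))


# Hypothetical item-level data: the second predictor tracks length closely.
length = [4, 6, 8, 10, 12, 14]
morph_surprisal = [9, 11, 16, 20, 23, 29]
amplitude = [2.5, 3.5, 3.0, 4.0, 6.5, 7.5]

print(pearson_r(amplitude, morph_surprisal))          # high simple correlation
print(partial_r(amplitude, morph_surprisal, length))  # vanishes once length is controlled
print(partial_r(amplitude, length, morph_surprisal))  # length keeps a unique contribution
```

In this constructed example the second predictor correlates strongly with the response only because it is correlated with length, which is the situation the multiple regression is meant to detect.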
In order to visualize the brain-level results, grand-average waveforms were computed with respect to the predictor variable with highest correlation for a given cortical source. The stimulus words were divided into three bins corresponding to the highest third, lowest third and average values of the predictor, and the grand average responses were plotted over subjects and word bins for each source component.
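The binning used for visualization can be sketched as follows. The split into thirds by rank, the handling of ties by sort order, and the data layout are assumptions for illustration, and the waveforms are invented two-sample examples.

```python
def tertile_labels(predictor):
    """Map each word to 'low', 'mid' or 'high' according to its rank on the predictor."""
    ranked = sorted(predictor, key=predictor.get)
    n = len(ranked)
    names = ("low", "mid", "high")
    return {word: names[3 * rank // n] for rank, word in enumerate(ranked)}


def bin_average(waveforms, labels, bin_name):
    """Average the waveforms of all words in one bin, sample by sample."""
    selected = [waveforms[w] for w, label in labels.items() if label == bin_name]
    return [sum(samples) / len(samples) for samples in zip(*selected)]


# Hypothetical predictor values (e.g. word length) and item-level waveforms.
predictor = {"a": 4, "b": 6, "c": 8, "d": 10, "e": 12, "f": 14}
waveforms = {
    "a": [1.0, 2.0], "b": [3.0, 4.0],
    "c": [5.0, 6.0], "d": [7.0, 8.0],
    "e": [9.0, 10.0], "f": [11.0, 12.0],
}
labels = tertile_labels(predictor)
print(bin_average(waveforms, labels, "low"))
print(bin_average(waveforms, labels, "high"))
```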

| RESULTS
All 20 participants included in the analysis performed the task at an acceptable level (at least 80% correct). Mean accuracy was 92% (SD 4%), indicating that the participants complied well with the task instructions. The average RT for words was 852 ms (SD = 95 ms) and for pseudowords 950 ms (SD = 94 ms). Reaction times were significantly correlated with each of the tested variables (F(1,358), p < .001), with the exception of the TPL, in a simple correlation analysis (  Table 2, Table 3 and Figure 3. The amplitude of the occipital response at 80-120 ms after word onset was best correlated with word length (r = .31, p < .001). Similar results were obtained for pseudowords ( Table 2) (Table 3). For pseudowords, the correlations were significant with respect to image complexity (r = -.14), word length (r = -0.16) and Morfessor (r = .14). The bigram frequency did not reach significance. In multiple regression only length remained significant.

The visualization based on bins of the lowest, middle and highest values for each predictor showed that the amplitude of the occipital response increased with increasing string length (Figure 3a). For the occipito-temporal source, the effect was visible in the descending slope after the peak (Figure 3b), with increasing letter-string length associated with a steeper descent, resulting in a negative correlation coefficient in the overall time window.

| DISCUSSION
We investigated the utility of an NLP model for morphological segmentation in predicting cortical activation patterns and RTs during visual word recognition. Models derived from an NLP framework based on optimization principles were hypothesized to mirror efficiency in neural processing. The Morfessor model induces morphology from raw text data instead of relying on predetermined linguistically defined morphs. The surprisal values derived from Morfessor reflect an optimized morphbased representation of a word and may, as such, correlate with the brain responses related to morphological processing. In comparison, the variables related to image complexity, word length, and bigram frequency should be related to processing of lower-level visual and orthographic processing. Surface frequency, when the lower-level effects are factored out, should index activation related to lexical or semantic processing.
We hypothesized that the early occipital activation could be best predicted by word length and image complexity that approximates the surprisal relating to overall visual processing, whereas the later occipitotemporal and superior temporal activations would likely be better accounted for by models using higher levels of abstraction, reflecting processes related to language processing per se.
The behavioral RT results showed that word length, bigram frequency, Morfessor, lemma frequency and surface frequency each pro- Morfessor values contain morphemes or text strings that are common in real words. It is likely that the longer reaction times for these pseudowords reflect greater difficulties to reject a pseudoword that has more in common with real words. This conclusion is consistent with the result that pseudowords with a higher number of lexical neighbors elicit longer RTs (Carreiras, Perea, & Grainger, 1997;Holcomb, Grainger, & O'Rourke, 2002). The Morfessor measure could, therefore, be considered a measure of "word-likeness" when applied to pseudowords.
At the level of the brain, the response amplitude of the 80-120 ms occipital activation increased with increasing number of letters, in line with earlier studies (Assadollahi & Pulvermüller, 2003; Wydell et al., 2003). This amplitude was also correlated with image complexity measured by the gif-index, as well as surface frequency and the Morfessor measure. All of these four measures give high surprisal values for long words. However, only one of these predictors was significant in the multiple regression model, suggesting that a single underlying factor is responsible for the correlation. The lowest-level common denominator here is the overall image complexity. This interpretation is further supported by the fact that the response is highly sensitive to addition of noise in the stimuli. Higher visual complexity is related to longer word length, which in turn is correlated with the other variables. The subsequent 140-200 ms activation in the left inferior occipito-temporal cortex was identified by its differentiation of alphabetic strings from graphical symbol strings. The letter-string effect points to a neural population trained by repeated exposure to written text and acting as a bridge or filter between visual and more abstract language processing (Tarkiainen et al., 1999). Part of the present research question was therefore to test to what degree this response is sensitive to orthographic and morphological properties. We found that the response strength was best predicted by word length which was interchangeable with effects of image complexity and Morfessor measures, similarly to the earlier occipital activation. However, these correlations were rather weak. In addition, the letter bigram frequencies provided unique predictive power in the multiple regression model, suggesting that the response is indeed related to abstract orthographic properties rather than low-level visual features alone.
The effect seemed to stem from the descending slope following the peak of the evoked response. A similar result has previously been observed when contrasting consonant strings and (pseudo)words (Whiting et al., 2015).
In that study, significant effects emerged between 155 and 230 ms, following the peak response centered at 150 ms.
In the present study, the amplitude of the occipito-temporal response was negatively correlated with both word length and bigram frequency (the longer the word or lower the mean bigram frequency, the lower the amplitude), which seems to contradict our hypothesis that higher information content leads to more neural activation. These results might be related to activation of specialized "bigram cells", postulated by Dehaene et al. (2005), located in the left occipitotemporal sulcus. The bigram cells are thought to be active when the stimulus contains common letter bigrams for which these neurons are tuned. Hence, low bigram frequency results in low activation. Indeed, the response was also found to be reduced when adjacent letters were vertically shifted with respect to each other, which breaks the bigram form (Cornelissen et al., 2003).
We found no independent effects related to the word frequency measures, TPL, or the Morfessor model in the occipito-temporal response.
Previous studies on English words have found support for automatic form-based decomposition (Fruchter, Stockall, & Marantz, 2013), morphological decomposition indexed by TPL (Solomyak & Marantz, 2010), as well as an effect of morphological complexity in a corresponding right-hemispheric response (Zweig & Pylkkänen, 2009). [...] the first would be associated with orthographic processing and the second with more abstract lexical processing (Gwilliams et al., 2016). In the present study, to ensure robust across-participants matching, the letter-string response was modeled as a single source in the more traditional fashion (Tarkiainen et al., 1999), and it may thus not capture a possible second component in a more anterior region that might be linked to lexicality or morphology.
In the bilateral superior temporal cortices, the activation reached its maximum at around 400 ms after stimulus onset. Both hemispheres differentiated between all stimulus types at 300-700 ms, and the response was characterized as an N400m-type response (Salmelin, 2007). In the left temporal cortex, all frequency measures and the Morfessor model were positively correlated with the activation strength: activation increased with increasing surprisal values. The type of morphological processing that is described in the Morfessor model thus seems to be reflected in this cortical response. Temporal activation in this time window has previously been linked to a wide variety of linguistic and nonlinguistic manipulations (Kutas & Federmeier, 2011; Salmelin, Kujala, & Liljeström, in press), later-stage word recognition processes (Halgren et al., 2002), and access to semantic-syntactic representations of morphemes or their recombination into a meaningful whole (Fruchter & Marantz, 2015; Vartiainen et al., 2009). In addition, surprisal, when derived from a sentence context, has been shown to be a good predictor of the N400 amplitude (Frank, Otten, Galli, & Vigliocco, 2015). The present study shows that surprisal is a relevant measure also for predicting the response to isolated words without a surrounding sentence context.
We observed that the overall prediction accuracy for the left temporal responses was improved when using both surface frequency and the Morfessor measure together. This may imply that access to full word forms occurs in parallel with processing of the word's morphological constituents, or that some subset of words is better described by one model.
Indeed, many of our low-frequency words occur only once in the entire corpus and are, therefore, poorly differentiated based on surface frequency but have a smooth distribution on the Morfessor value scale.
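The point about words that occur only once can be illustrated with a minimal sketch of surface-frequency surprisal, taken here as the negative base-2 logarithm of a word's relative corpus frequency. The function name `surface_surprisal` and the toy counts are hypothetical and are not taken from the corpus used in the study; the sketch covers only the whole-word side of the comparison and does not reproduce the Morfessor cost.

```python
import math
from collections import Counter


def surface_surprisal(counts, word):
    """Surprisal in bits, -log2 p(word), with p estimated as relative frequency."""
    total = sum(counts.values())
    return -math.log2(counts[word] / total)


# Hypothetical toy counts for three Finnish word forms (illustration only).
toy_counts = Counter({"talo": 6, "talossa": 1, "taloissammekin": 1})

for word in toy_counts:
    print(word, round(surface_surprisal(toy_counts, word), 3))
```

In this toy corpus the two forms that occur once receive exactly the same surprisal despite differing in length and morphological complexity, which is the sense in which surface frequency cannot differentiate them.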
In addition, we found that word length and image complexity were good predictors of the left temporal response in the case of pseudowords but performed worse for real words. This suggests that the processing of pseudowords may be primarily linked to letter-by-letter or phonological representations, which are roughly indexed by word length in the orthographically transparent Finnish language, whereas the representation of real words is related to more abstract linguistic or semantic properties. The Morfessor model also proved to be a good predictor of the temporal responses in the case of pseudowords, but the multiple regression model did not determine whether this result was truly independent of the word length effect.
In the right-hemispheric N400m-type response, reaching its maximum at around 400 ms, the Morfessor model, word length, image complexity, and the frequency measures were all positively correlated with the response amplitude. However, in the multiple regression analysis whole-word frequency became redundant. This result suggests that the right hemisphere is also actively involved in morphological processing. Although the right hemisphere has received somewhat less attention in language studies, there have been documented cases where lesions to the right hemisphere have resulted in a specific inability to produce derivational morphology (Marangolo et al., 2003). An effect of inflectional morphology has also been observed on right-sided EEG responses (Leinonen et al., 2009). More generally, the right hemisphere has been proposed to become involved when processing requires additional effort (Kircher, Brammer, Tous Andreu, Williams, & McGuire, 2001; Monetta, Ouellet-Plamondon, & Joanette, 2006; Van Ettinger-Veenstra, Ragnehed, McAllister, Lundberg, & Engström, 2012) or when semantic complexity increases (Tremblay, Monetta, & Joanette, 2009). In line with this view, the right-hemisphere activation in the present study could be related to the morphological complexity of words, which poses particular demands on semantic integration of the constituents.
Multiple regression analysis of single-item MEG responses enabled assessment of several predictor variables simultaneously, without the prior assumptions that are needed in a univariate approach. Related analysis approaches have been successfully employed before (e.g., Hauk et al., 2006; Solomyak & Marantz, 2009). In the present study, our solution for improving the signal-to-noise ratio of the single-item responses was to average them across the participants. In order to achieve reliable responses by this approach, the variation between individuals, both in amplitude and in spatial location, needs to be accounted for. Here we normalized the amplitude for each individual before averaging and sought to equate the spatial location by means of functional localizers. It is also worth noting that the amount of explained variance in the MEG responses (R² ≈ .1) was substantially lower than in RTs (R² ≈ .5), despite the fact that MEG presumably captures the subprocesses involved in reading more directly.
While MEG provides a more detailed description of when and where different aspects of text are processed in the brain, its signal-to-noise ratio is lower than that of the RTs. The reading process in the brain may also entail aspects that are not captured by the evoked responses (i.e., signals phase-locked to the stimulus timing) but which are included in the all-encompassing RT measure.
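The normalization and averaging step described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline of the study: the text does not specify how amplitudes were normalized, so z-scoring within each participant is assumed here, and a single-predictor least-squares fit stands in for the multiple regression. All function names and data values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one participant's item amplitudes (assumed normalization)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def average_across_participants(amplitudes):
    """amplitudes[p][i] is the response amplitude of participant p to item i.

    Each participant is normalized separately, then items are averaged
    across participants to improve the single-item signal-to-noise ratio.
    """
    normalized = [zscore(row) for row in amplitudes]
    n_items = len(amplitudes[0])
    return [mean(row[i] for row in normalized) for i in range(n_items)]


def r_squared(x, y):
    """Explained variance of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: two participants with different amplitude scales, four items.
amplitudes = [[1.0, 2.5, 2.0, 4.0], [12.0, 18.0, 31.0, 35.0]]
surprisal = [8.0, 10.0, 12.0, 15.0]  # hypothetical per-item predictor values

item_means = average_across_participants(amplitudes)
print(r_squared(surprisal, item_means))
```

Normalizing before averaging keeps a participant with large raw amplitudes from dominating the item means, which is the motivation given in the text for accounting for between-individual variation.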
To conclude, the present study offers a methodological example of how a modern NLP model may be used to address questions of language processing in the human brain. The Morfessor model seems to account for brain activation during reading: the observed good predictive power of Morfessor in lexical decision RT (Virpioja et al., 2017; Virpioja et al., 2011) seems to be related to late-stage morphological processing that is reflected in bilateral temporal cortices from about 300 ms onwards. This suggests that the type of computational properties that are expressed in Morfessor, that is, morpheme-like units derived via optimization, are also reflected in neural processing. Neural processing thus likely follows some form of optimization, in line with the information-theory based principle of minimization of effort. Our findings support the view that the brain of an experienced reader has adapted to the statistical regularities of written language, and that the neural activation reflects this adaptation. In the future, Morfessor or similar models could be used, for example, to model how the statistics of native-language vocabulary can influence the learning and representation of word forms in a new language.

ACKNOWLEDGMENTS
This work was supported by the Academy of Finland (LASTU program 256887 to R. S., 259934 to K. L.; personal grants 255349, 256459, and 283071 to R. S., 288880 to M. L., and 287474 to A. H.) and the Sigrid Jusélius Foundation (to R. S.).