Original Article

Word frequency and readability: Predicting the text‐level readability with a lexical‐level attribute

Xiaobin Chen

Corresponding Author

E-mail address: xiaobin.chen@uni-tuebingen.de

LEAD Graduate School and Research Network, Seminar für Sprachwissenschaft, Universität Tübingen, Tübingen, Germany

Address for correspondence: Xiaobin Chen, LEAD Graduate School and Research Network, Universität Tübingen, Gartenstraße 29a, 72074 Tübingen, Germany. E-mail: xiaobin.chen@uni-tuebingen.de

Detmar Meurers

LEAD Graduate School and Research Network, Seminar für Sprachwissenschaft, Universität Tübingen, Tübingen, Germany

First published: 19 July 2017

Abstract

Assessment of text readability is important for assigning texts at the appropriate level to readers at different proficiency levels. The present research approached readability assessment from the lexical perspective of word frequencies derived from corpora assumed to reflect typical language experience. Three studies were conducted to test how the word‐level feature of word frequency can be aggregated to characterize text‐level readability. The results show that an effective use of word frequency for text readability assessment should take a range of characteristics of the distribution of word frequencies into account. For characterizing text readability, taking into account the standard deviation in addition to the mean of the word frequencies already significantly improves results. The best results are obtained using the mean frequencies of the words in language frequency bands or in bands obtained by agglomerative clustering of the word frequencies in the documents, though a comparison of within‐corpus and cross‐corpus results shows the limited generalizability of using high numbers of fine‐grained frequency bands. Overall, the study advances our understanding of the relationship between word frequency and text readability and provides concrete options for more effectively making use of lexical frequency information in practice.
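
As a rough illustration of the kind of aggregation the abstract describes, the following Python sketch turns word-level frequencies into a few text-level features: the mean, the standard deviation, and per-band means. It is a minimal sketch under stated assumptions, not the authors' implementation. The toy frequency table, the log10 transform, the floor value for unknown words, the band edges, and all function names are hypothetical choices made for the example, and the clustering-based bands mentioned in the abstract are not covered.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical frequency list (occurrences per million words). A real study
# would load such values from a reference corpus.
TOY_FREQ_PER_MILLION = {
    "the": 50000.0, "cat": 60.0, "sat": 40.0, "on": 6000.0, "mat": 5.0,
}


def log_frequencies(text, freq_table, floor=0.1):
    """Return the log10 frequency of every word token in `text`.

    Words missing from `freq_table` receive the assumed `floor` frequency.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [math.log10(freq_table.get(tok, floor)) for tok in tokens]


def mean_sd_features(log_freqs):
    """Summarize the frequency distribution by its mean and standard deviation."""
    return {"mean": mean(log_freqs), "sd": pstdev(log_freqs)}


def band_mean_features(log_freqs, band_edges):
    """Mean log frequency of the tokens in each band [edge_i, edge_{i+1}).

    Empty bands are reported as 0.0, which is an arbitrary assumption.
    """
    features = []
    for low, high in zip(band_edges, band_edges[1:]):
        in_band = [f for f in log_freqs if low <= f < high]
        features.append(mean(in_band) if in_band else 0.0)
    return features


if __name__ == "__main__":
    lf = log_frequencies("The cat sat on the mat", TOY_FREQ_PER_MILLION)
    print(mean_sd_features(lf))
    print(band_mean_features(lf, [-1.0, 1.0, 3.0, 5.0]))
```

The resulting feature vectors could then be passed to any regression or classification model that predicts a readability level. Using more bands gives a finer description of the distribution, at the cost of the limited cross-corpus generalizability that the abstract notes for high numbers of fine-grained bands.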
