Recent years have seen a surge in accounts motivated by information theory that consider language production to be partially driven by a preference for communicative efficiency. Evidence from discourse production (i.e., production beyond the sentence level) has been argued to suggest that speakers distribute information across discourse so as to hold the conditional per-word entropy constant, which would facilitate efficient information transfer (Genzel & Charniak, 2002). This hypothesis implies that the conditional (contextualized) probabilities of linguistic units affect speakers' preferences during production. Here, we extend this work in two ways. First, we explore how preceding cues are integrated into contextualized probabilities, a question that has so far received little to no attention. Specifically, we investigate how a cue's maximal informativity about upcoming words (the cue's effectiveness) decays as a function of the cue's recency. Based on properties of linguistic discourses as well as properties of human memory, we analytically derive a model of cue effectiveness decay and evaluate it against cross-linguistic data from 12 languages. Second, we relate the information-theoretic accounts of discourse production to well-established mechanistic (activation-based) accounts: we relate contextualized probability distributions over words to their relative activation in a lexical network given preceding discourse.