Inference in gene–environment studies can sometimes exploit the Mendelian randomization assumption that genotype and environmental exposure are independent in the population under study. Moreover, in some such problems it is reasonable to assume that disease risk among subjects without the environmental exposure does not vary with genotype. When both assumptions can be invoked, we consider the prospects for inferring the dependence of disease risk on genotype and environmental exposure (and particularly the extent of any gene–environment interaction) without detailed data on environmental exposure. The data structure envisioned comprises joint data on disease and genotype, but only external information about the distribution of the environmental exposure in the population. This setting is relevant because, for many environmental exposures, individual-level measurements are costly, highly error-prone, or both. Working in the setting where all relevant variables are binary, we examine the extent to which such data are informative about the interaction by determining the large-sample limit of the posterior distribution. The ideas are illustrated using data from a case–control study of bladder cancer involving smoking behaviour and the NAT2 genotype. Copyright © 2011 John Wiley & Sons, Ltd.
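
As a rough illustration of why the two assumptions make genotype and disease data informative about the interaction, the following sketch works through the all-binary case. The notation (G, E, D, q, p_{ge}, r_g) is introduced here for illustration and is not taken from the paper. The sketch also assumes prospective (cohort-type) sampling and an additive risk scale, which may differ from the parameterization and the case–control sampling used in the paper itself.

```latex
% Hypothetical notation (not from the source):
%   G, E, D \in \{0,1\}: genotype, exposure, disease indicators
%   q = P(E = 1): exposure prevalence, taken as known from external information
%   p_{ge} = P(D = 1 \mid G = g, E = e): disease risk
%   r_g = P(D = 1 \mid G = g): genotype-specific risk, estimable without data on E
%
% Assumption 1 (gene-environment independence): P(E = 1 \mid G = g) = q for g = 0, 1.
% Assumption 2 (no genetic effect among the unexposed): p_{00} = p_{10} = p_0.

\begin{align*}
r_g &= (1 - q)\, p_{g0} + q\, p_{g1}
     = (1 - q)\, p_0 + q\, p_{g1}, \qquad g = 0, 1, \\
r_1 - r_0 &= q\,(p_{11} - p_{01}), \\
(p_{11} - p_{10}) - (p_{01} - p_{00}) &= p_{11} - p_{01} = \frac{r_1 - r_0}{q}
     \qquad (q > 0).
\end{align*}
```

Under these assumptions the additive interaction contrast is determined by the two genotype-specific risks and q, even though E is never observed at the individual level. The individual risks are not all determined, however: two observable quantities (r_0, r_1) cannot fix the three unknowns p_0, p_{01} and p_{11}. Interaction measured on other scales would therefore, in general, be only partially identified in this sketch.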