Forecasts of ozone (O3) and particulate matter (diameter less than 2.5 μm, PM2.5) from seven air quality forecast models (AQFMs) are statistically evaluated against observations collected during August and September of 2006 (49 days) through the Aerometric Information Retrieval Now (AIRNow) network throughout eastern Texas and adjoining states. Ensemble O3 and PM2.5 forecasts, created by combining the seven separate forecasts with equal weighting, and simple bias-corrected forecasts are also evaluated in terms of standard statistical measures, threshold statistics, and variance analysis. For O3, the models and ensemble generally show statistical skill relative to persistence for the entire region but fail to predict high-O3 events in the Houston region. For PM2.5, neither the individual models nor the ensemble shows statistical skill, and all but one model have a significant low bias. Comprehensive comparisons with the full suite of chemical and aerosol measurements collected aboard the NOAA WP-3 aircraft during the summer 2006 Second Texas Air Quality Study and the Gulf of Mexico Atmospheric Composition and Climate Study (TexAQS II/GoMACCS) field study are performed to help diagnose sources of model bias at the surface. Aircraft flights specifically designed for sampling of Houston and Dallas urban plumes are used to determine model and observed upwind or background biases, and downwind excess concentrations that are used to infer relative emission rates. Relative emissions from version 3 of the U.S. Environmental Protection Agency 1999 National Emission Inventory (NEI-99), which is used in two of the model forecasts, are evaluated on the basis of comparisons between observed and model concentration difference ratios. Model comparisons demonstrate that concentration difference ratios yield a reasonably accurate measure (within 25%) of relative input emissions.
Boundary layer height and wind data are combined with the observed upwind and downwind concentration differences to estimate absolute emissions. When the NEI-99 inventory is modified to include observed NOy emissions from continuous monitors and expected NOx decreases from mobile sources between 1999 and 2006, the modified inventory emissions agree well with those derived from the observations for both Houston and Dallas. However, the emission inventories consistently overpredict the ratio of CO to NOy. The ratios of ethylene and aromatics to NOy are reasonably consistent with observations over Dallas but are significantly underpredicted for Houston. Excess ratios of PM2.5 to NOy reasonably match observations for most models, but the organic carbon fraction of PM2.5 is significantly underpredicted, pointing to compensating errors between secondary organic aerosol (SOA) formation and primary emissions within the models' photochemistry and emissions. Rapid SOA formation associated with both Houston and Dallas is inferred to occur within 1 to 3 h downwind of the urban centers, and none of the models reproduces this feature.
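
As an illustration of the two plume diagnostics described above, the following minimal Python sketch computes a concentration difference ratio (downwind excess of one species divided by the downwind excess of another) and a simplified mass-balance emission estimate from boundary layer height, wind speed, and excess concentration along a crosswind transect. It is not the procedure used in the study: the function names, the use of simple transect means, the constant background value, and the assumptions of a well-mixed boundary layer with uniform wind perpendicular to the transect are all hypothetical simplifications made for this example.

```python
from statistics import mean


def excess(downwind, upwind):
    """Downwind mean minus upwind (background) mean for one species."""
    return mean(downwind) - mean(upwind)


def difference_ratio(x_down, x_up, y_down, y_up):
    """Ratio of excess X to excess Y (for example CO to NOy).

    Assumes both species were sampled on the same upwind and downwind
    transects and that the excess of Y is nonzero.
    """
    return excess(x_down, x_up) / excess(y_down, y_up)


def mass_balance_emission(c_down, c_background, dy_m, wind_ms, pbl_height_m):
    """Simplified mass-balance emission rate.

    c_down        : downwind concentrations along the transect (ug m^-3)
    c_background  : single upwind/background concentration (ug m^-3)
    dy_m          : crosswind spacing between samples (m)
    wind_ms       : wind speed perpendicular to the transect (m s^-1)
    pbl_height_m  : boundary layer height (m), assumed well mixed

    Returns an emission rate in ug s^-1.
    """
    # Crosswind integral of the excess concentration (ug m^-2).
    crosswind_integral = sum((c - c_background) * dy_m for c in c_down)
    return wind_ms * pbl_height_m * crosswind_integral


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    ratio = difference_ratio([300, 320], [100, 100], [30, 32], [10, 10])
    rate = mass_balance_emission([12.0, 14.0, 12.0], 10.0, 1000.0, 5.0, 1500.0)
    print(f"excess ratio: {ratio}")
    print(f"emission rate (ug/s): {rate}")
```

Comparing the ratio computed from observations with the same ratio computed from model output along the flight track gives the kind of relative-emission check described in the text, while the mass-balance function shows why boundary layer height and wind are both required for an absolute estimate.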