• design (process simulation);
• optimization;
• machine learning

A central problem in modeling, namely that of learning an algebraic model from data obtained from simulations or experiments, is addressed. A methodology is proposed that uses a small number of simulations or experiments to learn models that are as accurate and as simple as possible. The approach begins by building a low-complexity surrogate model using a best subset technique, which leverages an integer programming formulation to consider a large number of possible functional components efficiently. The model is then improved systematically by using derivative-free optimization solvers to adaptively sample new simulation or experimental points. Automated learning of algebraic models for optimization (ALAMO), the computational implementation of the proposed methodology, is described, along with examples and extensive computational comparisons between ALAMO and a variety of machine learning techniques, including Latin hypercube sampling, simple least-squares regression, and the lasso. © 2014 American Institute of Chemical Engineers AIChE J, 60: 2211–2227, 2014
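To make the best subset idea concrete, the following is a minimal sketch and not ALAMO's implementation. It assumes a small, hypothetical pool of candidate basis functions (`BASIS`), replaces the integer programming formulation with plain exhaustive enumeration, and assumes the Bayesian information criterion (BIC) as the trade-off between accuracy and simplicity. The data are synthetic and noise-free, generated from a known two-term model.

```python
import itertools
import numpy as np

# Hypothetical pool of candidate basis functions (illustrative only).
BASIS = {
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "x^3": lambda x: x**3,
    "exp(x)": np.exp,
    "log(x)": np.log,
}

def best_subset(x, y, max_terms=3):
    """Enumerate subsets of BASIS (plus an intercept) and keep the lowest-BIC fit.

    Exhaustive enumeration stands in for an integer programming
    formulation, which would scale to far larger candidate pools.
    """
    n = len(y)
    best = None
    for k in range(1, max_terms + 1):
        for names in itertools.combinations(BASIS, k):
            # Design matrix: intercept column plus the chosen basis columns.
            A = np.column_stack([np.ones(n)] + [BASIS[name](x) for name in names])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((A @ coef - y) ** 2))
            # Floor the residual so the logarithm stays finite on exact fits.
            bic = n * np.log(max(rss, 1e-12) / n) + (k + 1) * np.log(n)
            if best is None or bic < best[0]:
                best = (bic, names, coef)
    return best[1], best[2]

# Synthetic, noise-free samples from y = 2 x^2 - 1.5 log(x).
x = np.linspace(0.5, 3.0, 30)
y = 2.0 * x**2 - 1.5 * np.log(x)

names, coef = best_subset(x, y)
print(names, np.round(coef, 4))
```

Exhaustive search is only practical for a handful of candidates, since the number of subsets grows combinatorially with the pool size; that growth is the motivation for an optimization-based formulation of the selection step.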