• semisupervised modeling;
• probabilistic principal component regression;
• soft sensor;
• mixture probabilistic modeling

Traditionally, data-based soft sensors are built on a labeled historical dataset that contains equal numbers of input and output samples. In a chemical process, input variables such as temperature, pressure, and flow rate are easy to measure, whereas the output variables, which correspond to quality or key property variables, are much more difficult to obtain. As a result, only a small number of output samples may be available, alongside many more input samples. In this article, a mixture form of the semisupervised probabilistic principal component regression model is proposed for soft sensor applications; it can efficiently incorporate the information in unlabeled data from different operation modes. Compared to the fully supervised method, both modeling efficiency and soft sensing performance are improved by including the additional unlabeled data samples. Two case studies are provided to evaluate the feasibility and efficiency of the new method. © 2013 American Institute of Chemical Engineers AIChE J 60: 533–545, 2014