• Bayesian information criterion;
• Model selection;
• Non-negative matrix decomposition;
• Reversible jump;
• Surface-enhanced resonance Raman scattering;
• Thermodynamic integration

Summary. Recent advances in technology based on Raman scattering as a chemical analytical technique have made it possible to detect spectral mixtures of multiple DNA sequences quantitatively. However, to exploit these techniques fully, inferential methodologies are required that can deconvolute the observed mixture and infer the composition of distinct DNA sequences in the overall composite. Inferring the component spectra is posed as a model selection problem for a bilinear statistical model, and the required Markov chain Monte Carlo inferential methodology is developed. In particular, a Gibbs sampler and reversible jump Markov chain Monte Carlo methods are presented, along with techniques based on estimation of the marginal likelihood. The results reported are encouraging: they show that, for multiplexed Raman spectra, the composition of the original sequences in the mixture can be inferred to acceptable levels of accuracy. This statistical methodology makes the exploitation of multiplexed surface-enhanced resonance Raman scattering spectra in disease identification a reality. A Web site containing supplementary material, the spectral data that are used in the paper and MATLAB scripts implementing the proposed statistical methods is available at