• methods: statistical
• pulsars: general
• gamma-rays: stars


Machine learning, a family of algorithms designed to extract empirical knowledge from data, can be used for data classification, one of the most common tasks in observational astronomy. In this paper, we focus on Bayesian data classification algorithms using the Gaussian mixture model and show two applications in pulsar astronomy. After reviewing the Gaussian mixture model and the related expectation–maximization algorithm, we present a data classification method using the Neyman–Pearson test. To demonstrate the method, we apply the algorithm to two classification problems. First, it is applied to the well-known period–period derivative diagram, where we find that the pulsar distribution can be modelled with six Gaussian clusters: two for millisecond pulsars (recycled pulsars) and four for normal pulsars. From this distribution, we derive an empirical definition for millisecond pulsars as inline image. The two millisecond pulsar clusters may have different evolutionary origins, since the companion stars of the pulsars in the two clusters show different chemical compositions. Possible implications of the four normal-pulsar clusters are also discussed. Our second example is to calculate the likelihood that each unidentified Fermi point source is a pulsar and to rank the sources accordingly. In the ranked point-source list, the top 5 per cent of sources contain 50 per cent of the known pulsars, the top 50 per cent contain 99 per cent of the known pulsars, and no known active galaxy (the other major population) appears in the top 6 per cent. Such a ranked list can be used to guide future follow-up observations aimed at finding pulsars among unidentified Fermi point sources.
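As an illustration of the general approach described above, the following minimal sketch fits a two-component Gaussian mixture with expectation–maximization and then classifies points with a likelihood-ratio (Neyman–Pearson) test at a fixed false-alarm rate. It is not the implementation used in this paper: the synthetic two-dimensional data, the farthest-point initialization, the Monte Carlo threshold estimate, and all function names (`log_gauss`, `init_means`, `fit_gmm`, `np_threshold`) are assumptions made for the example.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of a multivariate normal at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)       # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)              # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def init_means(X, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    idx = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    while len(idx) < k:
        dist = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1)
        idx.append(int(np.argmax(dist)))
    return X[idx].copy()

def fit_gmm(X, k, n_iter=200, reg=1e-6):
    """Fit a k-component Gaussian mixture with a fixed number of EM iterations."""
    n, d = X.shape
    means = init_means(X, k)
    covs = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        log_p = np.stack([np.log(weights[j]) + log_gauss(X, means[j], covs[j])
                          for j in range(k)], axis=1)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: update weights, means and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

def log_lr(X, mean0, cov0, mean1, cov1):
    """Log-likelihood ratio log p(x|H1) - log p(x|H0)."""
    return log_gauss(X, mean1, cov1) - log_gauss(X, mean0, cov0)

def np_threshold(mean0, cov0, mean1, cov1, alpha, n_mc=20000, seed=1):
    """Threshold giving false-alarm rate alpha under H0, estimated by Monte Carlo."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean0, cov0, size=n_mc)
    return np.quantile(log_lr(samples, mean0, cov0, mean1, cov1), 1.0 - alpha)

# Synthetic data: two well-separated clusters standing in for two populations.
rng = np.random.default_rng(42)
A = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=300)   # null population
B = rng.multivariate_normal([5.0, 5.0], np.eye(2), size=300)   # target population
weights, means, covs = fit_gmm(np.vstack([A, B]), k=2)

# Take the component with the smaller first coordinate as H0, the other as H1.
h0, h1 = np.argsort(means[:, 0])
thr = np_threshold(means[h0], covs[h0], means[h1], covs[h1], alpha=0.01)

false_alarm = np.mean(log_lr(A, means[h0], covs[h0], means[h1], covs[h1]) > thr)
detection = np.mean(log_lr(B, means[h0], covs[h0], means[h1], covs[h1]) > thr)
print("false-alarm rate:", false_alarm, "detection rate:", detection)
```

Sorting sources by the log-likelihood ratio, rather than applying a single threshold, yields a ranked candidate list of the kind described for the Fermi point sources.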