Kernel regression for determining photometric redshifts from Sloan broad-band photometry




We present a new approach, kernel regression, to determine photometric redshifts for 399 929 galaxies in the Fifth Data Release of the Sloan Digital Sky Survey (SDSS). Kernel regression estimates the redshift of a query point as a weighted average of the spectroscopic redshifts of its neighbours, with higher weights assigned to points that are closer to the query point. One important design decision when using kernel regression is the choice of bandwidth. We apply 10-fold cross-validation to choose the optimal bandwidth, taken as the value at which the cross-validation error reaches its minimum. The results show that the optimal bandwidth differs between input patterns: the lowest rms error of photometric redshift estimation is 0.019 using colour+eClass as the inputs, 0.020 using ugriz+eClass, and 0.021 using colour+r, where eClass is a galaxy spectral type. Thus, in addition to parameters such as magnitude and colour, eClass is a valid parameter with which to predict photometric redshifts. Moreover, the results suggest that the accuracy of estimating photometric redshifts improves when the sample is divided into early-type and late-type galaxies; in particular, for early-type galaxies, the rms scatter is 0.016 with colour+eClass as the inputs. In addition, kernel regression achieves high accuracy when predicting the photometric eClass (rms = 0.034) using colour+r as the input pattern. For kernel regression, the accuracy of the photometric redshifts does not always increase with the number of parameters considered, but is satisfactory only when appropriate parameters are chosen. Kernel regression is a comprehensible and accurate regression method. Experiments reveal the superiority of kernel regression over other empirical training approaches.
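The weighted-average estimator and the cross-validated bandwidth choice described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian kernel, the Euclidean distance, the candidate bandwidth grid, and the function names (`kernel_regression`, `cv_rms`, `select_bandwidth`) are assumptions made for illustration, and the data are synthetic stand-ins rather than SDSS photometry.

```python
import numpy as np

def kernel_regression(X_train, z_train, X_query, bandwidth):
    """Kernel-weighted average of training redshifts for each query point."""
    # Squared Euclidean distance between every query and every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Shift each row so its nearest neighbour has distance 0. This rescales all
    # weights in the row by the same factor, so the weighted average is
    # unchanged, but it prevents every weight from underflowing to zero.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    # Gaussian kernel (an assumption): closer neighbours get larger weights.
    w = np.exp(-0.5 * d2 / bandwidth**2)
    return (w @ z_train) / w.sum(axis=1)

def cv_rms(X, z, bandwidth, n_folds=10, seed=0):
    """rms prediction error of one bandwidth under n-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    sq_err = 0.0
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False  # hold this fold out of the training set
        pred = kernel_regression(X[train], z[train], X[test_idx], bandwidth)
        sq_err += ((pred - z[test_idx]) ** 2).sum()
    return float(np.sqrt(sq_err / len(X)))

def select_bandwidth(X, z, candidates, n_folds=10):
    """Return the candidate bandwidth with the lowest cross-validation rms."""
    errors = [cv_rms(X, z, h, n_folds) for h in candidates]
    return candidates[int(np.argmin(errors))], errors

# Synthetic stand-in data: two made-up input features and a smooth "redshift".
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 2))
z = 0.1 + 0.2 * X[:, 0] + 0.05 * np.sin(6 * X[:, 1]) + rng.normal(0, 0.01, 300)

candidates = [0.01, 0.03, 0.1, 0.3, 1.0]  # hypothetical bandwidth grid
best_h, errors = select_bandwidth(X, z, candidates)
print("best bandwidth:", best_h)
```

A very small bandwidth makes the estimate follow the nearest neighbour and its noise, while a very large one averages over nearly the whole sample, which is why the cross-validation error is expected to pass through a minimum at an intermediate value.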