This paper reports how the number of samples used to develop farm-scale visible and near infrared (vis-NIR) spectroscopy calibration models for moisture content (MC), total nitrogen (TN) and organic carbon (OC) influences the prediction error, expressed as the root mean square error of prediction (RMSEP). Fresh (wet) soil samples collected from four farms in the Czech Republic, Germany, Denmark and the UK were scanned with a fibre-type vis-NIR AgroSpec spectrophotometer with a spectral range of 305–2200 nm. Spectra were divided into calibration (two thirds) and prediction (one third) sets, and the calibration spectra were subjected to partial least squares regression (PLSR) with leave-one-out cross-validation using Unscrambler 7.8 software. The RMSEP values of models developed with a large number of samples (46–84 samples from each farm) were compared with those of models developed with a small number of samples (25 samples selected from the large sample set of each farm) covering the same variation range. For each property, the large-set and small-set models were validated with the same prediction set. Further PLSR analysis was carried out on samples from the German farm, with calibration sets of 25, 50, 75 and 100 samples. The large-set models gave smaller RMSEP values than the small-set models for all the soil properties studied. RMSEP also decreased almost linearly as the number of calibration samples increased, although the largest decrease occurred between 25 and 50 samples. It is therefore recommended that the number of samples be chosen according to the accuracy required; in this study, 50 soil samples is considered appropriate for establishing calibration models of TN, OC and MC with smaller expected prediction errors than those obtained with fewer samples.
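As a minimal illustration of two quantities named above, the sketch below computes RMSEP using its standard definition (the square root of the mean squared difference between measured and predicted values) and divides a set of samples into two thirds for calibration and one third for prediction. It is not the study's procedure: the analysis itself was run in Unscrambler, the abstract does not state how samples were allocated to each set, so the random shuffle is an assumption, and the function names `rmsep` and `split_calibration_prediction` as well as all numbers are hypothetical.

```python
import math
import random


def rmsep(measured, predicted):
    """Root mean square error of prediction over paired values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def split_calibration_prediction(sample_ids, seed=0):
    """Shuffle sample ids, then split them two thirds / one third.

    Random allocation is an assumption made for this sketch.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_cal = round(len(ids) * 2 / 3)
    return ids[:n_cal], ids[n_cal:]


# Hypothetical example: 75 sample ids, not data from the study.
calibration, prediction = split_calibration_prediction(range(75))
print(len(calibration), len(prediction))  # 50 25

# Made-up measured and predicted values, for illustration only.
print(rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Comparing calibration sets of different sizes against one fixed prediction set, as described above, would amount to calling `rmsep` once per model on that same prediction set.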