Abstract. We investigate resampling methodologies for testing the null hypothesis that two samples of labelled landmark data in three dimensions come from populations with a common mean reflection shape or mean reflection size-and-shape. The investigation includes comparisons between (i) two different test statistics that are functions of the projection of the data onto tangent space, namely the James statistic and an empirical likelihood statistic; (ii) bootstrap and permutation procedures; and (iii) three methods for resampling under the null hypothesis, namely translating in tangent space, resampling using weights determined by empirical likelihood, and using a novel method to transform the original sample entirely within reflection shape space. We present results of extensive numerical simulations, on the basis of which we recommend a bootstrap test procedure that we expect will work well in practice. We demonstrate the procedure using a data set of human faces, testing whether humans in different age groups have a common mean face shape.