Visual processing takes place in both retinotopic and spatiotopic frames of reference. Whereas visual perceptual learning is usually specific to the trained retinotopic location, our recent study has shown spatiotopic specificity of learning in motion direction discrimination. To explore the mechanisms underlying spatiotopic processing and learning, and to examine whether similar mechanisms also exist in visual form processing, we trained human subjects to discriminate an orientation difference between two successively displayed stimuli, with a gaze shift in between to manipulate their positional relation in the spatiotopic frame of reference without changing their retinal locations. Training resulted in better orientation discriminability for the trained than for the untrained spatial relation of the two stimuli. This learning-induced spatiotopic preference was seen only at the trained retinal location and orientation, suggesting experience-dependent spatiotopic form processing directly based on a retinotopic map. Moreover, a similar but weaker learning-induced spatiotopic preference was still present even if the first stimulus was rendered irrelevant to the orientation discrimination task by having the subjects judge the orientation of the second stimulus relative to its mean orientation in a block of trials. However, if the first stimulus was absent, and thus no attention was captured before the gaze shift, the learning produced no significant spatiotopic preference, suggesting an important role of attentional remapping in spatiotopic processing and learning. Taken together, our results suggest that spatiotopic visual representation can be mediated by interactions between retinotopic processing and attentional remapping, and can be modified by perceptual training.
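The gaze-shift manipulation described above rests on a simple coordinate relation: a stimulus's spatiotopic (screen) position is the fixation position plus its retinal offset. The following is a minimal sketch of that relation, not the study's actual stimulus code. All names (`Point`, `screen_position`, `spatiotopic_relation`) and all coordinate values are hypothetical and chosen only for illustration. It shows how changing the direction of the gaze shift can change whether two successive stimuli occupy the same or different spatiotopic locations while their retinal offsets stay fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A 2-D position in degrees of visual angle (hypothetical units)."""
    x: float
    y: float

    def __add__(self, other: "Point") -> "Point":
        return Point(self.x + other.x, self.y + other.y)


def screen_position(fixation: Point, retinal_offset: Point) -> Point:
    """Spatiotopic (screen) position = fixation position + retinal offset."""
    return fixation + retinal_offset


def spatiotopic_relation(fix1: Point, fix2: Point,
                         ret1: Point, ret2: Point,
                         tol: float = 1e-9) -> str:
    """Return 'same' if the two stimuli land on the same screen location."""
    p1 = screen_position(fix1, ret1)
    p2 = screen_position(fix2, ret2)
    same = abs(p1.x - p2.x) <= tol and abs(p1.y - p2.y) <= tol
    return "same" if same else "different"


# Illustrative values only (assumed, not taken from the study).
# The retinal offsets are identical in both conditions.
RET_FIRST = Point(5.0, 0.0)    # first stimulus: 5 deg right of fixation
RET_SECOND = Point(-5.0, 0.0)  # second stimulus: 5 deg left of fixation
FIX_START = Point(0.0, 0.0)    # fixation before the gaze shift

# Condition A: gaze shifts 10 deg rightward.
relation_a = spatiotopic_relation(FIX_START, Point(10.0, 0.0),
                                  RET_FIRST, RET_SECOND)

# Condition B: gaze shifts 10 deg leftward.
relation_b = spatiotopic_relation(FIX_START, Point(-10.0, 0.0),
                                  RET_FIRST, RET_SECOND)

print(relation_a, relation_b)
```

In this toy configuration only the fixation position after the gaze shift differs between the two conditions, so any difference between them is in the spatiotopic relation of the stimuli and not in their retinal stimulation.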