Semi-quantitative peak-intensity data from infrared (IR) spectra of humic acids (HAs) from semiarid soils under contrasting environmental conditions (vegetation type, geological substrate and local climate) were analysed by multivariate data treatments. Resolution-enhanced IR spectra (obtained by applying a second derivative-based subtractive operator) showed a typical lignin pattern, which was coded to obtain an index used to classify the degree of diagenetic alteration of the lignin moiety in the HA fraction. Partial least squares (PLS) regression was used in the exploratory screening, both for supervised data reduction prior to other multivariate data treatments and to identify IR peaks responsive to soil-dependent variables. Regression models and multi-dimensional scaling (MDS) were applied to classify individual IR peaks, or sets of peaks, that are associated with the degree of diagenetic alteration of organic matter or that inform on the soils' potential for carbon (C) accumulation. Soil properties co-varying with the intensities of these peaks were mainly related to soil texture and, consequently, to water-holding capacity at different pressures. Principal component analysis (PCA) based on the IR peaks selected in the previous PLS treatments maximized differentiation in terms of the impact of environmental factors on HA characteristics: (i) vegetation type (angiosperms or gymnosperms), (ii) the effect of the geological substrate (granite or limestone) on soil organic matter dynamics and (iii) soil taxonomic differences reflected by independent clusters. The successful forecasting of several factors related to soil C sequestration indicated the validity of the semi-quantitative information extracted from the IR spectra of the HAs and the potential of these multivariate data treatments for identifying biogeochemical proxies of soil organic matter stabilization processes.
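
The resolution enhancement mentioned above can be illustrated with a minimal sketch. It assumes the operator subtracts a scaled discrete second derivative from the raw spectrum, which is one common way to sharpen overlapping bands. The function name `enhance_resolution`, the `weight` parameter and the sample intensities are hypothetical and are not taken from the study, which does not state the exact operator or its settings.

```python
def enhance_resolution(intensities, weight=1.0):
    """Subtract a scaled discrete second derivative from a spectrum.

    `weight` is a hypothetical tuning parameter; the source gives no value.
    Assumes evenly spaced wavenumbers, so the spacing is folded into `weight`.
    """
    n = len(intensities)
    if n < 3:
        # Too short for a central difference; return an unchanged copy.
        return list(intensities)
    enhanced = list(intensities)  # the two endpoints are left unchanged
    for i in range(1, n - 1):
        # Central-difference estimate of the second derivative at point i.
        second_derivative = (
            intensities[i - 1] - 2 * intensities[i] + intensities[i + 1]
        )
        # The second derivative is negative at a band maximum, so subtracting
        # it raises the maximum and lowers the flanks, narrowing the band.
        enhanced[i] = intensities[i] - weight * second_derivative
    return enhanced


# Single synthetic band (made-up numbers, not measured data).
spectrum = [0.0, 1.0, 4.0, 1.0, 0.0]
sharpened = enhance_resolution(spectrum, weight=0.5)
print(sharpened)
```

In this toy case the band maximum rises while its flanks fall, which is the narrowing effect that makes overlapping peaks easier to tell apart. Real spectra are noisy, so a smoothing step before differentiation would normally be needed.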