We describe the application of non-negative matrix factorization to generate compact reconstructions of quasar spectra from the Sloan Digital Sky Survey (SDSS), with particular reference to broad absorption line quasars (BALQSOs). BAL properties are measured for Si IV λ1400, C IV λ1550, Al III λ1860 and Mg II λ2800, resulting in a catalogue of 3547 BALQSOs. Two corrections, based on extensive testing of synthetic BALQSO spectra, are applied in order to estimate the intrinsic fraction of C IV BALQSOs. First, the probability of an observed BALQSO spectrum being identified as such by our algorithm is calculated as a function of redshift, signal-to-noise ratio and BAL properties. Secondly, the difference in completeness of the SDSS target selection algorithm between BALQSOs and non-BAL quasars is quantified. Combining the detection probabilities with an intrinsic E(B−V) distribution capable of reproducing the observed increase in mean E(B−V) with increasing redshift, we find an intrinsic C IV BALQSO fraction of 41 ± 5 per cent. Our analysis of the selection effects allows us to measure the dependence of the intrinsic C IV BALQSO fraction on luminosity and redshift. We find a factor of 3.5 ± 0.4 decrease in the intrinsic fraction from the highest redshifts, z ≃ 4.0, down to z ≃ 2.0. The redshift dependence implies that an orientation effect alone is not sufficient to explain the presence of BAL troughs in some but not all quasar spectra. Our results are consistent with the intrinsic BALQSO fraction having no strong luminosity dependence, although with 3σ limits on the rate of change of the intrinsic fraction with luminosity of −6.9 and 7.0 per cent dex⁻¹ we are unable to rule out such a dependence.
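As an illustration of the kind of decomposition referred to above, the following is a minimal sketch of non-negative matrix factorization using the standard multiplicative update rules for the Frobenius-norm objective. It is not the pipeline used in this work: the function name `nmf`, the number of components, the iteration count and the synthetic "spectra" are all assumptions chosen for the example, and a real application would also need to handle masked pixels and flux uncertainties.

```python
import numpy as np


def nmf(V, n_components, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (n_spectra x n_pixels) as W @ H.

    Hypothetical helper for illustration only. Uses multiplicative
    updates, which keep W and H non-negative at every iteration.
    """
    rng = np.random.default_rng(seed)
    n_spectra, n_pixels = V.shape
    # Random non-negative starting point (an assumption; other
    # initializations are possible).
    W = rng.random((n_spectra, n_components)) + eps
    H = rng.random((n_components, n_pixels)) + eps
    for _ in range(n_iter):
        # Update the components (rows of H), then the weights (rows of W).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic stand-in for a set of spectra: 50 "spectra" of 200 "pixels",
# built from 3 non-negative components. These are not SDSS data.
rng = np.random.default_rng(1)
true_W = rng.random((50, 3))
true_H = rng.random((3, 200))
V = true_W @ true_H

W, H = nmf(V, n_components=3)
reconstruction = W @ H
rel_err = np.linalg.norm(V - reconstruction) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3g}")
```

In this sketch each row of `H` plays the role of a component spectrum and each row of `W` holds the non-negative weights that reconstruct one input spectrum, so a spectrum is summarized by `n_components` numbers rather than by its full pixel array.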