Because only a small proportion of phosphorylation sites can be identified in any single phosphoproteomic profiling experiment, statistical approaches have been widely adopted to test whether phosphoproteins are over-represented or under-represented in distinct pathways, or whether they are significantly modified by specific kinases. These approaches treat phosphoproteomic profiling as ‘near-random sampling’. Under this assumption, a pathway with more actual phosphorylation sites will be represented by more identified phosphopeptides than a pathway with fewer phosphorylation sites (Fig. 2A). Likewise, a kinase that modifies a greater number of sites will be identified more frequently, i.e. with more hits, than a kinase that modifies fewer sites (Fig. 2A). Although currently available techniques have a number of intrinsic biases, this framework is nevertheless fundamental to the statistical analysis.
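Under the near-random sampling assumption, testing a single pathway for over- or under-representation reduces to a hypergeometric test: the identified phosphoproteins are treated as draws without replacement from a background proteome. The Python sketch below assumes that a background set (for example, all proteins detectable in the experiment) and a pathway gene set are already available. The function name and protein identifiers are hypothetical, and the code is not taken from any of the studies cited here.

```python
from math import comb


def enrichment_test(identified: set[str], pathway: set[str], background: set[str]) -> dict:
    """Hypergeometric over-/under-representation test for one pathway.

    Hypothetical helper: treats the identified phosphoproteins as a random
    sample (without replacement) from `background`.
    """
    # Restrict both sets to the background so the counts are consistent.
    pathway = pathway & background
    identified = identified & background
    N, K, n = len(background), len(pathway), len(identified)
    k = len(identified & pathway)  # observed pathway "hits"

    # Number of ways to draw exactly i pathway members, for i = 0..min(K, n).
    ways = [comb(K, i) * comb(N - K, n - i) for i in range(min(K, n) + 1)]
    total = comb(N, n)
    return {
        "hits": k,
        "expected": n * K / N,                  # mean of the hypergeometric distribution
        "p_over": sum(ways[k:]) / total,        # P(X >= k): over-representation
        "p_under": sum(ways[: k + 1]) / total,  # P(X <= k): under-representation
    }


# Toy example (hypothetical identifiers): 1000 background proteins, a
# 50-protein pathway, and 100 identified phosphoproteins, 15 of them in the pathway.
background = {f"P{i}" for i in range(1000)}
pathway = {f"P{i}" for i in range(50)}
identified = {f"P{i}" for i in range(15)} | {f"P{i}" for i in range(100, 185)}
result = enrichment_test(identified, pathway, background)
print(result["hits"], result["expected"], result["p_over"] < 0.01)
```

In practice a p-value would be computed for every pathway and then corrected for multiple testing (for example, with the Benjamini–Hochberg procedure); that step is omitted here.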
Enrichment analysis of phosphorylation-associated pathways provides efficient and robust network phospho-signatures. For example, by comparing the phosphoproteomic data sets of ESCs and induced pluripotent stem cells (iPSCs), Phanstiel et al. observed that a number of somatic-cell-related processes are significantly over-represented in iPSCs, a result that was consistent with additional analyses at both the transcript and protein levels. They therefore concluded that somatic cell programs are incompletely silenced in iPSCs. In addition, individual molecular biomarkers can be predicted from pathways or sub-networks when some of their neighbors are known disease genes (Fig. 2B). For example, Matsuoka et al. found that DDR-associated phosphorylated proteins are significantly enriched in the AKT-insulin pathway, and on this basis they confirmed that an insulin-responsive site in 4E-BP1, Ser111, is phosphorylated by ATM.
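The neighbor-based prediction in Fig. 2B is often framed as ‘guilt by association’: a candidate is prioritized when a large fraction of its network neighbors are known disease genes. The Python sketch below is one minimal way to express this idea, assuming an undirected interaction network supplied as an edge list. The gene names are placeholders, and the scoring rule (fraction of disease-gene neighbors, ties broken by the absolute count) is an illustrative choice, not the method used in the studies discussed above.

```python
from collections import defaultdict


def neighbor_scores(edges, disease_genes):
    """Rank non-disease genes by the fraction of their neighbors that are known disease genes."""
    neighbors = defaultdict(set)
    for a, b in edges:  # undirected network: record both directions
        neighbors[a].add(b)
        neighbors[b].add(a)

    scores = {}
    for gene, partners in neighbors.items():
        if gene in disease_genes:
            continue  # only score candidates, not genes already known
        hits = len(partners & disease_genes)
        scores[gene] = (hits, hits / len(partners))

    # Highest fraction first, then most disease neighbors, then name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1][1], -item[1][0], item[0]))


# Placeholder network: DIS1 and DIS2 stand for "known disease genes".
edges = [
    ("DIS1", "GENE_X"), ("DIS2", "GENE_X"), ("GENE_X", "GENE_Y"),
    ("DIS1", "GENE_Y"), ("GENE_Y", "GENE_Z"), ("GENE_Z", "GENE_W"),
]
ranking = neighbor_scores(edges, {"DIS1", "DIS2"})
for gene, (hits, fraction) in ranking:
    print(gene, hits, round(fraction, 2))
```

A raw fraction favors low-degree genes; a degree-aware statistic, such as a hypergeometric test on each candidate's neighborhood, would be a natural refinement.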
The available evidence suggests that phosphoproteomic data faithfully reflect the dynamics of kinase activity in vivo. For example, by comparing the phosphoproteome in the presence or absence of Plk1 activity, three independent studies identified 390, 1071 and 752 differentially regulated phosphorylation sites, respectively. Of the 1979 non-redundant sites identified in total, only 26 (~1.3%) were found in all three experiments. Nevertheless, the Plk1 consensus sequence N/D/E-pS was significantly over-represented in all three studies [19, 20, 48]. Thus, although the overlap between phosphoproteomic studies is low, kinase activity analysis can generate consistent results that serve as a robust phospho-signature (Fig. 2C). Based on the hypothesis that a kinase with more modified sites has higher activity, we systematically analyzed the human liver phosphoproteome and showed that the activities of 60 kinases were significantly upregulated (i.e. more sites modified) and those of 67 kinases were significantly downregulated (i.e. fewer sites modified). At least for the upregulated kinases, these results are highly consistent with previously reported data. Bennetzen et al. used two autophagy inducers, resveratrol and spermidine, to quantitatively identify the phosphoproteome regulated in the autophagic response. Using NetworKIN [35, 36] and motif-x, a highly effective tool for phosphorylation motif discovery, they detected two enriched motifs, S/T-P and RXXS, which are recognized by CDK2 and by PAK4/PAK7/DMPK/CLK1, respectively. More recently, Casado et al. formally described a kinase–substrate enrichment analysis (KSEA) approach for predicting activated kinases in acute myeloid leukemia (AML) cells by comparing the phosphoproteome-based kinase–substrate networks obtained from control and test samples.
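Motif over-representation, as with the Plk1 consensus and the S/T-P and RXXS motifs, can be scored by asking how often a pattern occurs among regulated sites compared with its frequency in a background set. motif-x builds motifs iteratively using binomial statistics; the Python sketch below performs only a single binomial test for one fixed pattern and is not motif-x itself. It assumes that each site is represented as an odd-length sequence window centered on the phosphorylated residue; the windows and helper names are synthetic and hypothetical.

```python
from math import comb


def is_sp_tp(window: str) -> bool:
    """S/T-P motif: central residue is S or T and the +1 residue is proline."""
    c = len(window) // 2
    return window[c] in "ST" and c + 1 < len(window) and window[c + 1] == "P"


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment(foreground: list[str], background: list[str], motif) -> dict:
    """One-step binomial test of a fixed motif in regulated vs background sites."""
    k = sum(motif(w) for w in foreground)  # regulated sites matching the motif
    n = len(foreground)
    p_bg = sum(motif(w) for w in background) / len(background)  # background frequency
    return {
        "matches": k,
        "fold": (k / n) / p_bg if p_bg > 0 else float("inf"),
        "p_value": binomial_sf(k, n, p_bg),
    }


# Synthetic 7-residue windows (phosphosite at index 3).
foreground = ["AAASPAA"] * 3 + ["GGGTPGG"] * 3 + ["AAASAAA"] * 4
background = ["KKKSPKK"] * 10 + ["KKKSLKK"] * 90
print(motif_enrichment(foreground, background, is_sp_tp))
```

Testing many candidate patterns this way would again require multiple-testing correction, which motif-x handles through a stringent significance threshold.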
The predictions were successfully validated in cell lines by western blot analysis of activity-correlated autophosphorylation sites in the predicted kinases. Using this method, they also found that certain kinases, such as CDC7, PDK1 and ERK, are more active in drug-resistant primary AML cells, whereas Abl, Lck, Src and CDK1 are more active in drug-sensitive cells. The same strategy has also been used to analyze phosphoproteomic dynamics during mouse skin carcinogenesis, in which the deregulated activities of PAK4, PKC and SRC were identified as major drivers of malignancy.
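The kinase–substrate enrichment idea can be summarized as a per-kinase z-score: the mean log2 fold change of a kinase's annotated substrate sites is compared with the mean over all quantified sites, scaled by the square root of the number of substrate sites m and divided by the overall standard deviation, z = (mean_substrates − mean_all) × √m / SD_all. The Python sketch below follows this commonly cited formulation under stated assumptions: the kinase–substrate annotation is supplied as input (in practice it would come from a curated database or a prediction tool), kinases with fewer than three quantified substrate sites are skipped, and all kinase and site names are placeholders. It illustrates the scoring step only and is not the published implementation by Casado et al.

```python
from math import sqrt
from statistics import mean, stdev


def ksea_scores(site_log2fc: dict[str, float],
                kinase_substrates: dict[str, set[str]],
                min_sites: int = 3) -> dict[str, float]:
    """Per-kinase z-scores from phosphosite log2 fold changes (test vs control)."""
    values = list(site_log2fc.values())
    mu, sd = mean(values), stdev(values)  # computed over all quantified sites
    scores = {}
    for kinase, sites in kinase_substrates.items():
        # Only substrate sites that were actually quantified contribute.
        sub = [site_log2fc[s] for s in sites if s in site_log2fc]
        if len(sub) < min_sites or sd == 0:
            continue
        scores[kinase] = (mean(sub) - mu) * sqrt(len(sub)) / sd
    return scores


# Placeholder data: log2(test/control) for eight sites.
site_log2fc = {"s1": 2.0, "s2": 2.0, "s3": 2.0, "s4": -2.0,
               "s5": -2.0, "s6": -2.0, "s7": 0.0, "s8": 0.0}
kinase_substrates = {
    "KIN_UP": {"s1", "s2", "s3"},
    "KIN_DOWN": {"s4", "s5", "s6"},
    "KIN_SPARSE": {"s7", "s_missing"},  # too few quantified sites: skipped
}
scores = ksea_scores(site_log2fc, kinase_substrates)
print({k: round(v, 2) for k, v in scores.items()})
```

A positive score suggests that a kinase's substrates are, on average, more phosphorylated in the test sample than in the control; the z-score could then be converted to a p-value with the standard normal distribution, a step this sketch leaves out.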