Computational tools for inferring transcription factor activity

Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act cell type‐specific and each TF has its own characteristics, untangling their regulatory interactions from an experimental point of view is laborious and convoluted. Thus, there is an ongoing development of computational tools that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome‐wide, gene‐unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. We want to give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.


INTRODUCTION
Transcription factors (TFs) are essential proteins that regulate gene expression by binding to specific DNA sequences in the promoter or enhancer regions of genes [1,2].They exert their regulatory activity via a diverse range of mechanisms, such as recruiting cofactors, remodeling of the chromatin state, altering epigenetic modifications, or interacting directly with the transcription machinery [3][4][5].TFs are crucial actors in many cellular processes, including development, differentiation, and response to environmental stimuli [6][7][8].It is believed that nearly half of all known TFs are expressed in any cell type, although only a small number of them are thought to be sufficient for establishing the cell type-defining gene expression programs [9,10].Due to their role in central biological processes, their dysregulation is observed in various diseases [11][12][13].Consequently, it is of great interest to gain insights into transcription factor activity (TFA).We define TFA as the regulatory impact that a TF exerts on the expression of each of its target genes, which includes any form of regulation, may it be activation or repression or other effects on transcription like alternative splicing.Instead of retrieving TFA for each individual gene, it is often summarized as a TF's influence on a set of genes, or more generally as a TF's importance for a cell state or a certain condition.TFA can be influenced by various mechanisms, including epigenetic modifications, post-transcriptional regulation, post-translational modifications, protein-protein interactions, presence of cofactors, localization, or DNA structural changes [14][15][16] (Figure 1).Investigation of TFA on all regulatory levels represents a significant research challenge, as TFs can have numerous target genes, each controlled by potentially various enhancer regions in a cell type-specific manner [15,17].For many TFs, there is limited information about which genes or processes they influence and whether they act as repressors or activators.
Given the prohibitive cost and time required to investigate all possible combinations of regulatory players experimentally, many computational tools have been proposed to analyze TFA.We aim to provide a comprehensive overview of these existing computational approaches.
This review first gives a brief outline of experimental protocols and data modalities for TFA inference.We then present the various computational tools partitioned into two main categories (Figure 2, Table 1).Methods in the first category, referred to as gene regulatory network (GRN)-based methods, rely on a TF to gene mapping in order to estimate TFA.They either use a pre-built network or create a network de novo based on tool-specific data modalities.TFA is then typically inferred for a TF and its target genes, which together form a so-called regulon [18].Genome occupancy-based tools, on the other hand, assess TFA by the genome-wide binding behavior of a TF, independent of individual target genes.The TFA inferred by these tools can be seen as a higher level estimate of TFA that summarizes the effects on individual genes.We highlight the strengths and limitations of each approach and give a comprehensive overview of the available methods, including a decision tree separating the tools by the data they use (Figure 3).We further describe experimental setups for a more direct TFA readout, discuss prevalent validation endeavors with their shortcomings, present example applications of TFA inference, and point to key research gaps and opportunities for future developments.F I G U R E 2 Overview of the general setup of computational tools to infer transcription factor activity (TFA).Among the most frequently used data types are gene expression, information on transcription factor (TF) binding, and open chromatin using restriction enzyme accessibility.Some approaches also incorporate perturbation data or use the DNA sequence for deep learning.Regulatory network-based methods depend on linking TFs to potential target genes, often by creating, including, or refining a gene-regulatory network.Conversely, genome occupancy-based approaches follow a target gene-agnostic paradigm that utilizes genome-wide signals, for example, TF footprints.Created with BioRender.com.

Data modalities in TFA inference
There is a multitude of different modalities coming from various biological assays that are used as input for computational tools to infer TFA, either individually or in an integrated fashion.The most prominent modality is the transcriptome, with RNA-seq as standard assay to quantify the expression of genes.While measurements of the transcript levels can help in finding TFs which are present in a cell type, it is also frequently used to correlate TF expression with the overall gene expression across samples or to find differential expression between conditions.
Another major type of data is information on TF binding sites (TFBSs) occupancy.TFBS occupancy can either be measured experimentally, for example, with ChIP-seq [19,20], or predicted by quantifying the agreement of a TF binding motif to the DNA sequence.
The binding motif of a TF is a representation of the preferred binding sequence, commonly encoded in a position weight matrix (PWM).
PWMs are the probabilistic quantification of the nucleotide frequencies observed at a TF's binding sites [15].Since the genome contains magnitudes more potential binding motifs than are actually bound by the cognate TFs, alternative modalities are often included to yield a more accurate TFBS prediction.It was found that TF binding correlates with multiple epigenetic data types, such as measurements of chromatin accessibility, specific histone modifications, or the presence of other cofactors [21,22].In practice, a common approach in addressing the rate of false positive motif-based TFBS predictions is to limit the search space to regions harboring such epigenetic marks, usually via peak calling.Prominent assays in this context are DNase-seq and ATAC-seq for finding regions with open chromatin, both working with enzymes that cleave accessible DNA [23,24].Those two assays can additionally be used for footprint identification.Footprints are regions within open chromatin where the binding of a TF prevents the cleavage enzymes to cut, which leads to a characteristic drop in read coverage, and thus can increase accuracy of TFBS predictions [19,[25][26][27].
Notably, other modalities that contribute to TFA, such as the quantity of proteins and their post-translational modifications, [15,16,28], are not yet widely utilized in TFA inference.Handling such modalities is challenging, due to the scarcity of data, missing knowledge on their precise biological role, and how to properly incorporate them in computational tools.Therefore, we do not further detail them here.

TA B L E 1
Overview of computational tools for TFA inference.Significance analysis shows whether a p-value is provided.We consider a software to provide a Package if it is either installable via common package management systems such as bioconda, or via container platforms like Docker.If such an environment is not provided, but the code comes with an integrated documentation (e.g., .Rd-files for R), it is labeled with a checkmark in parentheses.For programming languages, we list the three most used ones.CNN, convolutional neural network; CNV, copy number variation; 5mC, cytosine methylation; HMMs, Hidden Markov model; TF, transcription factor; TFA, transcription factor activity.

F I G U R E 3
Decision tree representation of input data and methods.The nodes represent the input data, while the leaves correspond to methods with available packages.Each method is color-coded based on its corresponding programming language.

Regulatory network-based approaches
A plethora of tools define TFA via the interaction of TFs with putative target genes and construct TF-or gene-specific regulons or try to assemble whole regulatory networks.We only consider GRNs as estimate of TFA, if the strengths of the TF-gene interactions are quantified, that is, as edge weights in a graph.
The first methods approximating TFA emerged in the early 2000s and were based on linear regression.With the limited amount of data available at that time, studies were mainly focused on yeast strains, which contain a much lower number of genes and TFs than mammalian cells [29].In general, the methods searched the upstream region of differentially expressed genes for shared sequence motifs and tried to explain expression log-ratios based on motif score and occurrence.
The coefficients of a motif that implicitly represented a TF acted as a proxy for a TF's activity [30][31][32][33].With the rising availability of ChIP data subsequent studies integrated the ChIP signal into the model [34].A popular method that was developed in 2003 was the network component analysis (NCA) technique [35].NCA generates a matrix of log-ratios of expression values for multiple samples as the product of a matrix of control strength and a matrix of TFAs.Each row in the control strength matrix represents the potential influence of TFs on a gene, while a column in the TFA matrix represents the activity of all TFs per sample.Prior information derived from ChIP experiments was incorporated into the control strength matrix where TF-gene interactions without evidence were set to zero.To be able to uniquely decompose the expression matrix, a set of constraints was imposed onto the TFA and control strength matrices.In its original implementation, NCA had several limitations which were tackled by subsequent methods.FastNCA provided an implementation with an improved computational complexity and run time [36].gNCA allowed to incorporate prior information from knockout experiments [37].ROBNCA improved the robustness of the algorithms by explicitly integrating noise and outliers of the expression data into the model [38].As NCA only checks upon initialization if all constraints are satisfied, gfNCA ensures that no violations occur during the iteration steps [39].sparseNCA extended NCA to be able to handle the incompleteness of the prior information [40].LNCA was adapted to cope with expression data sets that show high heterogeneity such as cancer samples or samples covering different cell states.In order to accomplish this, LNCA creates local expression profiles with their corresponding control strength matrices by partitioning the expression data using the k-nearest neighbor algorithm and then finds an optimal global solution [41].
Similarly, Ma and Brent also use a control strength and TFA matrix and describe TF regulation as the product of these matrices with a bilinear model [42].They further specify positive control strengths to indicate an activating effect and negative values as a repressive effect, while the TFA matrix is constrained to positive values.Their model performed best when the signs of the control strength were predetermined using TF perturbation data.Initializing the control strength matrix with ChIP-seq data led to a drop in performance, which dropped even further when using motif-based TFBS prediction.
TFA analysis in heterogeneous samples, such as those derived from cancer patients, is complicated by confounding factors.For instance, copy number variations can lead to differential gene expression even when the activity of the regulatory factor remains constant between two samples.To overcome this issue, RACER was developed to estimate TFAs in such conditions [43].The authors use a regularized linear regression model in which gene expression depends on DNA methylation levels, copy number variations, miRNA levels, and the product of TFA and TF binding strength.A feature selection procedure provides a further metric to assess the importance of the TFs by comparing the model's performance when a TF is left out.
Similarly, Jiang et al. control for the influence of copy number, DNA methylation, and TF somatic mutation in their regression framework called RABIT [44].However, their method focuses on differences in TFAs between samples by using differential gene expression as the dependent variable, and the regulatory activity of a TF is based on a significance test of the regression coefficients.
biRte is another method that is able to utilize data beyond transcriptome information [45].By integrating differential gene expression data and a regulatory network into a joint probabilistic framework, the model uncovers TFs that drive differences between two groups.mRNA expression levels are modeled using a sparse Bayesian linear regression and Markov-Chain Monte-Carlo sampling infers the activity states of the regulatory factors.Additionally, biRte can incorporate further information like prior knowledge of TFAs, TF-TF interactions, and miRNA expression data into the model.
SEPIRA relies on a large collection of expression data from public compendia, such as GTEx [46], to construct sample-specific TF-gene networks [47].These networks are formed by highly expressed TFs and their target genes, in which interactions are inferred via co-expression and encoded as activating, repressive or non-interacting.Then, a linear regression is used to predict the expression profile from the ternary interaction matrix (genes x TFs), and the t-statistic of that regression represents the TFA.Instead of predicting gene expression, Chen et al. also tested to estimate the average methylation level of promoters.
The authors of VIPER extend the idea of TFA to any type of protein, allowing for the identification of indirect regulators of expression, such as signaling proteins [18].VIPER works in a two-step fashion by first using an extension of the ARACNe [48] algorithm for network construction and then applying analytic rank-based enrichment analysis (aREA) to infer the activity of a protein.aREA checks for the enrichment of a regulon within genes that are differentially expressed.Each gene in a regulon is weighted based on the confidence of the regulator-gene interaction and its mode of action.To infer the mode and strength of action, Alvarez et al. modeled the Spearman's correlation coefficient density between each regulator and its target as a three-Gaussian mixture, which also allows each regulator-target pair to be represented by a continuous value.
NetProphet 3 predicts the probability of functional TF binding events in a gene's promoter by combining multiple weighted networks derived from expression data and sequence information with regularized gradient boosting [49].Compared to its predecessor NetProphet 2.0 [50], NetProphet 3 was developed to be more flexible and allows users to incorporate any number and type of evidence scores for TF-gene interactions.However, its performance relies heavily on the usage of TF perturbation data.The authors showed that without the perturbation data, the model was not able to outperform even simpler regression-based approaches.
ISMARA is a webserver version of Balwierz et al.'s tool MARA [51], which aims to identify key regulators by predicting either gene expression or chromatin states across samples with a linear model using a Bayesian procedure [52].TF information is provided via a collection of precalculated TFBS in promoters, and is used to find informative TF motifs, as well as to identify TF-gene interactions that explain the changes across samples.TF-TF interactions are also considered by looking at which motifs are found in TF promoters.
FindIT2 combines multiple tools into an R package, among which is the aforementioned MARA [53].The TF-related functions include linking ChIP-seq peaks to genes, which is either based on closest distance or on a defined window, and allows one to quantify the correlation of peak accessibility with gene expression or promoter accessibility.
FindIT2 also calculates the regulatory potential [54]  PageRank algorithm to rank TFs by their importance in a network [56].
The tool utilizes epigenome data to link active regulatory regions to their putative target genes by applying the method EpiTensor [57].A cell type-specific GRN is then generated based on the TFBS predictions in the regulatory regions of each gene.The node weight of a TF in its GRN corresponds to the number of differentially expressed genes regulated by the TF, while the edge weight is proportional to a TF's expression level.Applying the PageRank algorithm on the network gives the overall importance of a TF.In an extension called Taiji-reprogram the method predicts the top TFs whose differential activity explains the transcriptional differences between two conditions [58].
TEPIC also uses TF motifs in regulatory regions of genes, but in a HINT-ATAC [27] (see also Section 2.2).TEPIC then calculates TF affinity scores, followed by DYNAMITE [60] to employ a logistic regression model for each condition or time point, including empirical p-values based on a background distribution of scores.

Network-based methods in single cell
The previously listed methods were developed for the use of bulk data and were hence limited to investigating tissues, cell lines, or FACSsorted cells.However, the emergence of single-cell technologies has enabled the study of individual cells, resulting in the creation of many new computational tools, in particular for GRN inference (recently reviewed in Ref. [61]).
Behjati Ardakani et al. proposed TRIANGULATE, an extension of TEPIC to scRNA-seq where the TFAs of individual cells can be inferred using a tree-guided multitask regression model [62].TRIANGULATE takes as input the gene expression measurements in single cells along with the per gene TFAs generated by TEPIC.The TRIANGULATE setup is designed in such a way that the single cells appearing in the same lineage tree-inferred from the single cell expression data-are penalized similarly.The coefficients of this multitask learning model are then used to deduce the relevance of TFs in regulating each individual cell.
Similarly, Teschendorff and Wang built a new version of SEPIRA [47], called SCIRA [63].Regulons are still inferred from a collection of bulk RNA samples, but SCIRA allows one to successively estimate TFA in single cells.
VIPER also received a successor version, called metaVIPER, which aims to overcome the requirement for a large number of gene expression data sets from the same tissue.It tries to solve this problem by constructing tissue-specific networks from a set of heterogeneous samples and thus, allows its usage on single cell data [64].The approach is based on the assumption that regulator-target interactions may be partially conserved even across distinct lineages.Hence, given a sufficient number of different tissue-specific networks, there is a high probability that a protein shares the same targets in a subset of the networks.Further, the algorithm assumes that only the context-specific regulons will show a significant enrichment score when comparing genes that are differentially expressed in the tissue of interest.The method BITFAM applies Bayesian factor analysis to infer TFA from scRNA-seq data by decomposing an expression matrix into a weight and TFA matrix [65].While the TFA matrix represents the TFA in each individual cell, each column in the weight matrix represents the potential targets of a TF.To derive the posterior distributions of the matrices, the method leverages a collection of non-tissue-specific ChIP-seq data which is used to incorporate prior probabilities into the weight matrix.
The learned TFA values can be further used for downstream analysis tasks such as cell clustering or trajectory inference.
SCENIC is an approach that infers GRNs from scRNA-seq data and characterizes regulons in a single cell as active or not [66].To achieve its goal, the method first uses GENIE3 [67] to find genes that are coexpressed with TFs across a large number of single cells.To minimize false positives, SCENIC then checks the putative genes of a regulon for motif enrichment of their regulator.Finally, SCENIC's enrichment algorithm, called AUCell, classifies the regulons in each cell as active or not.
In its latest version, named SCENIC+, the method tries to leverage epigenome data by taking both scRNA-seq and scATAC-seq data as input [68].Using topic modeling on co-accessible regions, they identify candidate enhancers in specific cell types and states.For each TF, SCENIC+ then infers all its target genes and the cis-regulatory regions through which it exerts its effects by using Pearson correlation and gradient boosting machines.Based on the calculated GRN, SCENIC+ can also perform in silico TF perturbations and identify the most influential TF for each cell state.
Another tool for constructing cell type-specific GRNs from scRNAseq data is Inferelator 3.0 [69].It requires a prior network of TF-gene interactions to calculate TFA from the expression of a TF's target genes, and infers new GRNs via a selection of regularized regression models that estimate expression from the TFA.The prior network can be built from existing databases or with their tool inferelator-prior that predicts TFBS with PWMs in a gene's regulatory regions.
Similar to Ma and Brent's approach, TIGER is based on matrix factorization and incorporates a sign constraint on the regulatory network, as well as restricts the TFA matrix to non-negative values [70].In contrast to Ma and Brent's method, TIGER can use scRNA-seq data and uses a Bayesian approach which relies on a literature-curated network to impose prior distributions on the variables.

Genome occupancy-based approaches
As a second category of tools for TFA inference, we want to summarize those that do not require any mapping of TFs to genes.They quantify the genome-wide binding behavior of TFs and estimate their importance for the sample at hand or try to identify those responsible for changes between conditions.Usually, a TF's binding behavior is assessed by the chromatin accessibility at the binding sites.One class of such methods focuses on a specific shape of the accessibility profile, the so-called footprints.Footprint calling is often used to narrow down TFBS and can be performed on DNase-seq as well as ATAC-seq data.Both of these sequencing methods introduce their own biases which need to be corrected for, but the detectable footprints are largely shared [71].To map footprints to TFs one can either first find TF motifs and look for footprint signatures around those, or perform a posterior TF motif search in already identified footprints' sequences.
One example of such a footprinting tool is HINT, which uses hidden Markov models to identify TF footprints by using strand-specific open chromatin signals with correction for protocol-specific biases [26,27].It was adapted to work both with DNase-seq, as well as ATAC-seq data.HINT employs position dependency models for the correction of cleavage bias, which was shown to be crucial for its performance.TFA is quantified by averaging the depth of a TF's footprints and the number of reads in its flanking regions, and can be differentially compared between samples.
Similarly, TOBIAS enables genome-wide investigation of TF binding dynamics via footprint calling from ATAC-seq and offers additional analyses and visualization tools [25].Bias correction is done by calculating a dinucleotide weight matrix of the cleavage enzyme to then estimate the enzyme's expected influence and subtract it from the measured signal.Footprint scores are generated using a scoring function that considers the accessibility and depth of the local footprint.
This score is then correlated with the presence of TFBS, and a threshold is set to distinguish between bound and unbound sites.Moreover, TOBIAS allows contrasting footprints across conditions, comparison of binding specificity between individual TFs, and TF network prediction.
BaGFoot also utilizes chromatin accessibility data (DNase-seq or ATAC-seq) for TF footprinting and aims to detect changes between conditions [72].The approach focuses on the footprint depth, as well as the accessibility of the flanking region at all motif occurrences of a TF.
Using these two metrics at all motif sites allows for measuring changes or TFs that do not show a measurable footprint pattern, meaning TFs where the footprint signature is too variable to confidently determine footprints.
Similarly, diffTF also aims to estimate changes in TFA between two conditions via accessibility changes at potential TFBS [73].The algorithm scans the genome for TF binding motifs that overlap with a consensus peak set of all samples called on ATAC-seq data.For peaks containing multiple motifs of the same TF, the binding site with the highest read count across all samples is chosen as a representative.
While controlling for GC content, log2 fold-changes for all peaks of a TF are calculated and compared to a background distribution.TFA is then represented as the mean difference to the background.If additional expression data is provided, diffTF classifies TFs into activators and repressors, based on the Pearson correlation between the RNAseq counts of a TF and the ATAC-seq signal of all its putative binding sites.

Sequence-based deep learning models
There has been a tremendous advancement in deep learning models that predict epigenetic signals or gene expression from DNA sequence alone.While these models do not initially use TF information, they can successively be interrogated to find which motifs drive the predictions, or, be fed with artificial sequences.Their use in practice is hampered by the required computational resources, the cell type-specificity of the predictions, and that they have difficulties to predict the impact of the genomic environment [78].Nevertheless, these methods offer complementary insights into the syntax of the regulatory code.
Hammelman and Gifford created such a deep learning approach and used it for the identification of cell state-specific TFs in chromatin accessibility data [79,80].Their method is part of the frame- Worth mentioning is also BPNet which does not predict accessibility, but the binding profile of specific TFs from CHIP-exo data [82].Although it needs to be trained per TF, BPNet allows one to examine TF-specific motif syntax rules, specifically how the presence of other motifs and their spacing affects the predicted TF binding profile.

APPLICATIONS
TFA inference methods have been utilized in diverse settings to advance our comprehension of fundamental biological mechanisms.
Here, we showcase the application of TFA methods across various contexts, with a focus on examples from development and differentiation, cancer and aging.This is by all means not a complete list, but supposed to illustrate potential applications of TFA inference.

Development and cell differentiation
TFs are essential regulators during development for sustaining cellular potency, as well as for establishing specific cell lineages.As an example application in this context, Kamimoto et al. used their in silico perturbation approach to predict TFs that are important for axial mesoderm development in zebrafish, followed by experimental validation [83].

Cancer
Aberrant activity of TFs has been shown to contribute to cancer initiation, maintenance, progression, and drug resistance in various ways [91].Among the most prominent examples of cancer-driving TFs is the oncoprotein c-myc, whose activation increases overall transcription elongation [92].Many other classes of TFs have members that contribute to malignancies, such as forkhead box proteins [93] or the ETS family [94].The change in TFA in a cancer setting can be caused by direct effects on the TF itself, but also by indirect effects like mutations in TFBSs or altered levels of cofactors and miRNAs.The complexity and multitude of different drivers make it particularly challenging to infer altered activity of a TF.Nonetheless, TFA analyses have been used to estimate the impact of somatic mutations [18], to find interactors of genes promoting tumorigenesis [95], to serve as prognostic markers in association with survival rates [96], and to predict drug response [97].

Aging
Aging is the most significant risk factor for a wide range of diseases and is highly correlated with morbidity and mortality.Compared to their younger counterparts, aged cells exhibit changes at the transcriptional level, including a loss of cell type-specific profiles and dysregulation of developmental genes [98], as well as increased cell-to-cell variation in their expression profiles [99].TFs play a central role in the mechanisms underlying these changes [100] and are known to contribute to the initiation and progression of age-related diseases [101].Inference of TFA patterns can, therefore, provide valuable insights into the mechanisms of aging.Maity et al. used their SCIRA algorithm to analyze scRNA-seq data from a public murine aging atlas and identified TFs with differential activity levels in aging, which were linked to the dysregulation of the circadian rhythm.Further, they found TFs that could explain different macrophage subtype ratios observed in aging and potential contributors to leukemia [102].In a similar fashion, Karakaslar et al. examined the effects of aging in peripheral blood leukocytes and splenic cells and compared the transcriptome and epigenome between young and old mice [103].They applied footprinting analysis, including HINT [27], to describe TFs associated with increased inflammation upon aging.

EXPERIMENTAL MEASUREMENT OF TFA VIA PERTURBATION
TF perturbation promises a more direct readout of TFA and often serves as a resource for validation data.Hence, we want to give an overview of the huge range of experimental studies and their approaches for analysis.The list of presented works here focuses more on larger scaled setups and is not exhaustive, but is supposed to illustrate the variety of methods and designs.The role of this type of data in TFA validation will be discussed later in a separate section.
A common procedure is the perturbation of a TF via knockout, knockdown or overexpression, followed by a readout of the caused changes-mostly by transcriptome measurements.Dixit  in combination with single cell transcriptomics in mouse embryonic stem cells.Their goal was to find TFs whose inductions promote a cell state expected in zygotic genome activation [106].Using MOFA+ [107] allowed them to jointly analyze the expression of genes and of repeat elements to derive latent factors that explain variation across cells with different sgRNAs.Nakatake et al. induced hundreds of genes, including 481 TFs, in hESCs followed by transcriptomic readout after 48 h [108].
On top of measuring expression changes, they recorded microscopic images which enabled them to link induced genes to morphological changes.To identify TFs which drive differentiation into certain cell types, they correlated the perturbed transcriptome to public transcriptome data, assuming that a high similarity indicates a TF's capability to differentiate hESC into that cell type.Similarly, Joung et al. created a TF Atlas of expression profiles of hESCs overexpressing all annotated human TF isoforms (>3500) via a barcoded ORF library, coupled with scRNA-seq [109].They found drastic differences in the differentiation potential between splice isoforms for many TFs.Other works do not focus on expression changes upon TF perturbation, like Rubin et al., who combined CRISPR interference with ATAC-seq into Perturb-ATAC.They observe changes in the accessibility of chromatin in single cells and in particular the accessibility at TFBS [110].Another important resource informing on a TF's activity are cell viability screens, which frequently include loss-of-function of TFs, and can thus provide an estimate of the importance of a TF in a cell line [111,112].
Beside methods interfering with a TF's gene itself, TFA can be assessed by using a reporter gene which holds a respective TFBS in its promoter.Massively parallel reporter assays (MPRAs) measure the regulatory activity of sequences of candidate regions like enhancers, and give information on TFA via the TFBS included in the sequence.
They exists in a variety of designs, but have the downside of testing regions outside of their native chromatin context and do not allow to identify endogenous target genes of a TF [14,113].But since the reporter is usually not functional, MPRAs promise to reduce indirect effects of perturbation [114].Abe and Abe aimed to measure endogenous TFA via a viral-vector-based TF reporter battery using a bipromoter containing a reference gene and a TFA reporter gene which holds the binding motif of the TF [115].The reference gene was used to correct for transfection efficiency.They tested their constructs in human and murine cell cultures, as well as in vivo in the mouse brain.
Additionally, they measured different environmental conditions and stimuli to visualize dynamic changes with their so-called TFA profile.
Another example is work from Kreimer et al., who conducted lentiviralbased MPRAs during neural differentiation and tested selected TF motifs in regulatory sequences [114,116].The sequence with the motif is placed in front of a transcribed barcode, so that the ratio of barcode to the number of coding sequences can be interpreted as activity of the sequence.

LIMITATIONS OF TFA VALIDATION
As of now, there is no way to directly measure the activity of a TF.Every quantification is only an estimate capturing a certain modality, like the protein level, mRNA level, or availability of binding sites.TF perturbation is the most direct readout, but converts cells into an artificial state, accompanied by various confounding mechanisms which are discussed later on.An aggravating factor is the limited availability of perturbation data.Thus, it is also not possible to directly evaluate the quality of a TFA estimate, which has led to a variety of indirect approaches trying to describe the plausibility of results or to compare different methods.Here, we want to discuss those validation practices, potential issues, and point to aspects of TF regulation that are underrepresented in computational tools of TFA inference.We defined TFA as a TF's regulatory impact on its target genes, which is often summarized for a cell state or condition, and, hence, we focus primarily on validating TFgene relationships.These relationships serve as the basis for-or are the result of-tools we categorized as regulatory network-based tools (Section 2.1).
A common approach is to search for support of TF-gene interactions, which has sparked efforts in assembling databases like HTRIdb [117], IntAct [118], or TRRUST [119], which gather TF interactions via text mining, manual curation or integration of other databases and resources.Due to their nature, those databases are biased toward well-studied TFs and collect data with variable level of evidence [16,120].Others try to support the importance of their identified TFs and inferred target genes via GO enrichment [49,121], eQTLs in binding sites [122], or via measured TF binding [49,104,122].Con-cerning changes in TF binding behavior, condition-specific TF binding experiments can be used for evaluation and further substantiated by measurements of chromatin accessibility or other assays for regulatory activity.Frequently referred to as the gold standard, however, is TF perturbation data.Numerous publications validate their findings on public data or perform their own experiments [42, 72, 87-89, 119, 123-125].Others assembled databases, like KnockTF [126].Although those different approaches for validation are commonly used, there appears to be little agreement between them.In a benchmark study, Garcia-Alonso et al. gathered TF-gene interactions from four types of approaches (literature-based databases, ChIP-seq data, TFBS calling based on PWMs and inference from expression data) into a joint database called DoRothEA.They found the vast majority of TF-gene interactions to be supported by only one approach (96.3%) [120].
Even across literature-based databases there was little overlap.Benchmarked against three collected TF perturbation data sets, the different types of resources varied heavily in their accuracy.
A substantial discrepancy exists between TF ChIP-seq and TF perturbation data.A common assumption is that TF binding in the promoter is required for a functional TF-gene interaction, meaning that the TF is important for the gene's expression.It is the basis for many network-driven tools (Section 2.1).However, this notion that every binding event of a TF in a promoter leads to a transcriptional response was challenged by a TF perturbation screen in yeast, where only 3% of genes with a measured TFBS in their promoter were affected by the TF's knockout [127], and reproduced in more recent perturbation studies also in human and mice with varying but still small fractions of overlap [104,105,128,129].Conversely, the majority of responsive genes were not bound by the perturbed TF, posing a predicament for occupancy-based approaches (Section 2.2), since TF binding at sites of known regulatory importance appeared mostly ineffective.Multiple components have to be considered when it comes to the, seemingly, lack of regulatory function of TF binding.Aspects on the level of TFBS, the TFs themselves, as well as the target genes could influence the response to perturbation and contribute to this gap between bound and responsive genes.With regard to TFBS, low-affinity binding sites are frequently neglected, due to the difficulty in annotating them, although they can actually be informative and functional [15,17,130].They could account for a portion of responsive genes where no strong binding sites were found and thus were overlooked for being regulated by a TF.Also rarely considered are more distal TFBS outside the promoter.On the other side of the scale, high-affinity binding sites and regions with more bound TFs appear to be more sensitive to perturbations [128].Another layer of complexity is added by the redundancy of binding motifs.Compared to the number of TFs the amount of different DNA-binding domains is small, and binding specificity is additionally driven by mechanisms like combination of multiple DNA-binding domains, interaction with other proteins, DNA shape, epigenetic modifications at the target site, or compartmentalization [3,[14][15][16][17].Despite these specificity mechanisms, TF binding and function can still be redundant, and thus, the regulatory importance of a TF might only become evident if any compensatory mechanisms are abolished [131].For instance, Gitter et al. found a fourfold higher agreement between bound and responsive genes upon knockout when excluding TFs that had a redundant paralog [132].On top, they could show an increase of responsive genes when a potentially compensating TF was removed in double knockout experiments.Others also found compensation for TFs which were co-expressed with other TFs, or shared functional annotation terms [133,134].Such buffering mechanisms of TFs point to an aspect of transcriptional regulation which is heavily neglected in computational approaches: combinatorial action of TFs.Most prominently represented by the formation of multimers, TFs interact and depend on each other, as well as on other cofactors [14,15].Although some tools allow for the quantification of TF co-occurrence [135,136] perturbation experiments and built gradient boosted trees to predict which genes will change expression [138].Gene expression and gene expression variation were the most informative features.Similarly, but in a broader scope and not focused on TF perturbations, Sigalova et al. found that expression variation was predictive for differential expression between conditions, independent of the experimental design [139].Taken together, inherent properties of genes may explain non-responsiveness to perturbation of bound TFs, as well as responsiveness despite lack of TF binding.
Even across perturbation experiments it can be expected to detect different effects dependent on their design.Knockdown and overexpression studies can be variable in their efficiency of changing a TF's availability, while a full knockout consistently removes a TF.However, knockout experiments have been repeatedly found to elicit less profound changes than knockdown, as the loss-of-function was accompanied by compensatory transcription of related genes [140][141][142][143].This compensation can also serve as explanation as to why many healthy humans exhibit several loss-of-function mutations.Mechanistically it is described to take place on a transcriptional level, independent of functional compensation on protein level.Different studies suggest the key player to be premature termination codons that are generated from the mutated transgene sequence [140][141][142][143].For gain-of-function protocols it is also argued that they can cause more transcriptomic changes than loss-of-function [105,108,144].Another aspect potentially affecting the detectable effects is the timepoint of measurement after perturbation, which differs heavily between studies.While some works aim to capture more direct effects and measure shortly after perturbation (e.g., 5 min [105]), others wait longer to focus more on differentiation effects (e.g., 7 days [109]).
All in all, TF perturbation experiments are still the most evident and insightful source of TFA validation.It should be kept in mind however, that experimental design, compensatory mechanisms, gene states, and the chromatin environment impact their outcome.Particular care should be taken, when interpreting non-responsiveness of genes despite TF binding as non-functional.In other words, if perturbing a TF does not change the expression of its bound target genes, it does not necessarily mean that it is not important for the genes' regulation in homeostasis, but that its role might be taken over by other TFs, or that other compensatory mechanisms mask its function.

DISCUSSION
In this review, we presented an overview of available computational A recurring assumption in models is to expect a high TF expression to indicate regulatory importance.This could not be confirmed by perturbation studies [108,114], and neglects the discrepancy between RNA and protein levels and the impact of post-translational modifications [16,147,148].Analogously, the majority of models were not yet transferred to more general TFA tools.The dependency on other non-TF factors is also not well understood and is lacking in models.
Chromatin compartmentalization and phase-separated condensates are another relevant factor in gene regulation, as they create microenvironments and transcriptional hubs with specific conditions.
While TFs are thought to be important for the formation of such compartments, their function is also likely heavily affected by them, due to localized concentration of TFs and cofactors [15].How exactly condensates form and how they are composed is subject of ongoing research, and thus, TFA inference still assumes a uniform availability of TFs across the genome.
Incorporation of more data modalities is a central challenge for TFA inference.Currently, tools focus on few data types-most prominently on transcriptome and chromatin accessibility-and thus, can only capture a small fraction of all mechanisms that affect gene regulation.DNA methylation, post-transcriptional and -translational modifications, protein levels and their stability and localization, hold valuable information, but are currently underrepresented, due to lacking availability and missing knowledge of usability.
Although TFA inference tools vary in the data they use and albeit the difficult validation (Section 5), there have been efforts to benchmark their performance, especially for methods quantifying or identifying TF-gene interactions.Although unweighted GRNs do not represent TFA by our definition, their comparison can still be insightful, since they form the basis for a large fraction of tools.Some works provide dedicated frameworks for benchmarking, such as BEELINE [149] ornot limited to TFA-decoupleR [55].While the scope of tested data and acquisition of the ground truth between benchmarks differ, a common finding is that tools perform moderately at best and sometimes worse than random [55,120,[149][150][151][152][153].Variable efforts for parameter optimization could contribute to the modest performance.Some studies showed that accuracy can be increased by jointly integrating the output of multiple tools [55,150].Further, the intersection of highly ranked TFs or predicted regulons across tools is often low.The low performance and similarity emphasize the necessity for standard benchmarks and a critical view on the generation of simulation data and the accompanied limitations and assumptions.It would be insightful to further investigate how general strategies and principles, for example, linear versus non-linear models, affect the performance.Nonetheless, the results of computational TFA tools are frequently backed up by findings from the literature, and top-ranked TFs are often in line with their proposed role in the condition at hand [43-45, 52, 83, 152].Also, even if the highly ranked TFs contain false positives, it restricts the number of potential candidates and facilitates the prioritization TFs for experimental validation.
One of the key advancements in the field is the ongoing develop- Sequence-based deep learning models also hold great potential for shaping the field, as they provide the possibility to identify TF motif syntax rules.However, such knowledge has yet to be transferred to more generalizable tools, that do not require high computational power or need to be trained on the sample at hand.

F I G U R E 1
The complexity of transcription factor activity (TFA).The activity of a transcription factor (TF) can be influenced by numerous, potentially interacting factors: (A) different isoforms of the same TF can have different functions.(B) Chromatin accessibility, and thus the ability of a TF to bind its target, can be limited by DNA methylation or chromatin structure.(C) TFA can be influenced by post-translational modifications.(D) Many different interaction partners are involved in transcription, for example, other TFs, mediator proteins, and cofactors.All of which can potentially affect TFA.Created with BioRender.com.
Optional inputs in Other resource are enclosed by parentheses.If a tool requires a prior network, that network can be constructed from any source of data, such as literature-based databases or ChIP-seq catalogs.Mode of action means that tools predict whether a TF has an activating or repressive influence.Quantitative score indicates whether a tool returns a numeric TFA.
Tools examining genome-wide occupancy of TFs are also increasingly developed specifically for single cell data, or designed to work with bulk as well as single cell.For example, Schep et al. created a method to calculate accessibility deviations for peaks that share the same motif in single cell (or sparse) chromatin accessibility data[74].Their method, chromVar, analyzes the gain or loss of accessibility of motifs within peaks by calculating a z-score of the number of fragments that map to a motif in a cell, minus the expected number of fragments based on all cells.The mean and standard deviation for the scaling procedure are based on a background peak set that matches the GC content and accessibility, thus controlling for technical biases introduced by PCR amplification or variable transposase tagmentation conditions.Like previous methods, chromVar suffers from two major limitations:an open TF motif does not necessarily represent a binding event, and the same motifs can be shared by many different TFs.Argelaguet et al.attempt to tackle these problems by introducing a modified version of chromVar, called chromVAR-Multiome[75].In this method, binding sites are based on an in silico binding score that incorporates information from single cell RNA expression.For a motif to be considered, the correlation coefficient between its accessibility and the gene expression of its corresponding TF must pass a threshold.deBoer and Regev developed an R package, BROCKMAN, to unravel the dependencies between TF binding and chromatin accessibility or chromatin marks using gapped k-mer frequencies across multiple samples or single cells[76].They first construct a k-mer x samples matrix that contains the frequency of a particular k-mer associated with an open chromatin region or chromatin mark measured in each sam-ple.Next, using principal component analysis (PCA), they decompose this matrix into two other matrices: (1) k-mer x principal components (PCs)-indicating the contribution of k-mers to each PC and (2) PCsx samples-projection of samples into PCs, where the number of PCs is determined through a permutation test.In this manner, the k-mers represent a group of co-varying TFs that are identified through those recognizing multiple related k-mers.Finally, to infer the differential TFA, they examine the significant PCs associated with k-mers that were classified into "bound" or "unbound" for each TF.Applying a hypergeometric test helps assess the enriched or depleted status of a bound k-mer to a particular TF, enabling the identification of differential TFs between various conditions or cell types.deBoer and Regev also predict TF-TF interactions by identifying TFs that show covariation on the same PC.scFANuses deep learning on scATAC data and estimates TF binding in single cells[77].The convolutional neural network is fist trained on bulk ATAC-seq and ChIP-seq data to then predict TF binding in open chromatin regions of individual cells using their continuous scATACseq profile as input.The ATAC signal of similar cells is aggregated to reduce sparsity.TFA per cell is then quantified by summarizing the predicted occurrence of each TF across all scATAC-peaks.

work
DeepAccess and is based on an ensemble of convolutional neural networks.Upon training with cell type-specific open chromatin regions and randomly sampled closed DNA sequences, DeepAccess predicts whether a sequence is accessible.For estimating the influence of TF motifs, the predicted accessibility of a sequence set with a TF motif is compared to the same sequences without the motif.Applying a signed-rank test statistic on the predicted difference gives the socalled expected pattern effect.This metric can also be calculated for the difference between conditions, which the authors term differential expected pattern effect.The flexibility of the approach allows scientists to investigate combinations of TF motifs and varying spacing between motifs.Yuan and Kelley also model chromatin accessibility from sequence with convolutional neural networks, but in single cell data, specifically scATAC-seq[81].Similar to Behjati Ardakani et al., their tool scBasset treats single cells as the tasks in their multitask setting.The final layer of their network represents a latent cell embedding which is then combined with a linear transformation matrix to predict accessibility.In order to derive the single cell TFA, perturbed DNA sequences-with or without the TF motif of interest-are given to the model.The predicted activity delivered by the output layer of scBasset allows the user to investigate the role of the TF in each single cell.Meaning, if the TF was positively involved in regulating a cell, the sequence with the motif is expected to return an increased accessibility.
et al. published Perturb-seq, which combines scRNA-seq with pooled CRISPRbased perturbation, and applied it in murine immune cells and human cell lines to knock out TFs and other regulators [104].Their method uses lentiviral vectors to deliver the sgRNA together with an expressed guide barcode for identification.Their work comes with its own computational tool MIMOSCA to estimate the effect of sgRNAs on gene expression with a regularized linear model, also allowing to account for covariates like the number of transcripts in a cell or the cell state.Hackett et al. present an atlas of gene expression dynamics (IDEA: Induction Dynamics gene Expression Atlas), that provides data on the induction of more than 200 TFs via a synthetic promoter in yeast [105].They measured over multiple time points with the aim to increase the detection of direct regulation with rapid changes in expression, as opposed to indirect effects supposedly taking place at later time points.It also enabled examination of the dynamics of expression changes.Another example comes from Alda-Catalinas et al., who used CRISPR activation tools to estimate TFA, described examples for their usage, and discussed their validation and the concomitant shortcomings.Here, we want to further detail limitations of frequently used data and assumptions, underrepresented data, shortly summarize findings from benchmark studies, and give a perspective of how the field might develop in the future.Plenty of methods rely on experimentally measured or predicted TF binding information, either hit-based or non-hit-based (e.g., via motif enrichment).As consequence, they are restricted to TFs where such data is obtainable.Current assays for annotating TFBSs like ChIPseq, ChIP-exo, or CUT&RUN are limited to TFs where an antibody with high affinity is available.They require relatively large amounts of homogeneous cells, which makes it particularly challenging for tissues with a high diversity of cell types or cell states.In addition, only one TF can be measured at a time, hampering the acquisition of a complete TF binding annotation[19,20].Motif-based TFBS prediction, on the other hand, does not require data in the cell type of interest, but already defined motifs which were identified so far only for a fraction of all TFs.Using motifs on the DNA sequence alone comes with the downside of being prone to false-positives, which can be mitigated by the usage of chromatin accessibility data, but in turn requires an additional cell type-specific data modality[22].Furthermore, TFs show specific binding patterns.This led to classification systems like the distinction between pioneers (can bind closed chromatin and reshape chromatin), settlers (bind majorly to their motifs in open chromatin), and migrants (bind only a fraction of their accessible motifs)[145,146], but these classes are rarely considered for TFA inference methods.Redundancy of binding motifs of different TFs further hamper an accurate motif-based TFBS prediction.
assume a linear influence of TFA on gene regulation, which might be insufficient to describe biological complexity.Barely included in any model are specific characteristics of TFs, such as DNA-binding domains, other protein domains, their structure, or subcellular localization.Despite growing knowledge on such kind of information, TFs are treated as uniform features.On top, different isoforms of TFs are rarely considered, although they apparently differ in their regulatory action[109].Another aspect is the cooperative action of TFs, which is inherently complicated to capture in a model.The majority of tools define TFs as independent or assume an additive effect.While sequence-based deep learning models start to give insights into the motif syntax of TFs, including how motif spacing and co-occurrence could affect a region's activity or a target gene's expression, their computational cost and the laborious investigation of the syntax rules ment and availability of single cell technologies, accompanied by tools that analyze such data[61].While there are challenges regarding the sparsity and integration of multiple modalities, single cell resolution promises to overcome the inaccuracy of bulk data and to give insights into cell-specific mechanisms.It has the convenient advantage of providing a high number of samples for models to train on, given that the sparsity does not require a high level aggregation of individual cells.Another positive development is the increased feasibility and availability of large-scale TF perturbation data.While their interpretation and analysis is complicated by multiple aspects, as discussed in Section 5, they are still providing the most direct data for validation of TFA inference tools.
While many of the established approaches are built on the identification of TFs that show differential activity between two types of cells [80, 87-89], others are specifically tailored to infer new reprogramming strategies, like Taiji-reprogram [58].Hammelman et al. provide a comparative benchmark on tools for ranking reprogramming factors [86].In an applied example, Patel et al. measured transcriptomic and chromatin accessibility changes in the lifetime of murine spinal motor [84]bvre et al. leveraged TFA inference to identify a set of TFs that promote the transition from naive B cells to centroblasts in the germinal center[84].Similarly, Liu et al. used TFA analysis to identify TFs that promote the differentiation of T cells into effector and memory cells plete, only achieving partial characteristics of the target cell type.The process itself is highly time-consuming as the possible combinations to test are enormous.Hence, developing computational methods capable of accurately predicting effective TF sets is crucial for advancing this field.