Bias in data‐driven artificial intelligence systems—An introductory survey

Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training, and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multidisciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions, as well as to suggest new research directions towards approaches well‐grounded in a legal frame. In this survey, we focus on data‐driven AI, as a large part of AI is powered nowadays by (big) data and powerful machine learning algorithms. Unless otherwise specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the basis of demographic features such as race, sex, and so forth.

The accompanying figure provides a visual map of the topics discussed in this survey. This paper complements existing surveys that either have a strong focus on machine ethics, such as Yu et al. (2018), study a specific subproblem, such as explaining black box models (Atzmueller, 2017; Guidotti et al., 2019), or focus on specific contexts, such as the Web (Baeza-Yates, 2018), by providing a broad categorization of the technical challenges and solutions and a comprehensive coverage of the different lines of research as well as their legal grounds.
We are aware that the problems of bias and discrimination are not limited to AI and that the technology can be deployed (consciously or unconsciously) in ways that reflect, amplify, or distort real-world perceptions and the status quo. Therefore, as the roots of these problems are not only technological, it is also naive to believe that technological solutions will suffice. Rather, more than technical solutions are required, including socially acceptable definitions of fairness and meaningful interventions to ensure the long-term well-being of all groups. These challenges require multidisciplinary perspectives and a constant dialogue with society, as bias and fairness are multifaceted and volatile. Nevertheless, as AI technology penetrates our lives, it is extremely important for technology creators to be aware of bias and discrimination and to ensure responsible usage of the technology, keeping in mind that a technological approach on its own is not a panacea for all sorts of bias and AI problems.

| UNDERSTANDING BIAS
Bias is an old concept in machine learning (ML), traditionally referring to the assumptions made by a specific model (inductive bias) (Mitchell, 1997). A classical example is the Occam's razor preference for the simplest hypothesis. With respect to human bias, its many facets have been studied by many disciplines including psychology, ethnography, law, and so forth. In this survey, we consider as bias the inclination or prejudice of a decision made by an AI system which is for or against one person or group, especially in a way considered to be unfair. Given this definition, we focus on how bias enters AI systems and how it is manifested in the data comprising the input to AI algorithms. Tackling bias entails answering the question of how to define fairness such that it can be considered in AI systems; we discuss different fairness notions employed by existing solutions. Finally, we close this section with legal implications of data collection and bias definitions.

| Socio-technical causes of bias
AI relies heavily on data generated by humans (e.g., user-generated content) or collected via systems created by humans. Therefore, whatever biases exist in humans enter our systems and, even worse, they are amplified by complex sociotechnical systems such as the Web. As a result, algorithms may reproduce (or even increase) existing inequalities or discriminations (Karimi, Génois, Wagner, Singer, & Strohmaier, 2018). Within societies, certain social groups may be disadvantaged, which usually results in "institutional bias," where there is a tendency for the procedures and practices of particular institutions to operate in ways in which some social groups are advantaged and others disadvantaged. This need not be the result of conscious discrimination but rather of the majority following existing norms. Institutional racism and sexism are common examples (Chandler & Munday, 2011). Algorithms are part of existing (biased) institutions and structures, but they may also amplify or introduce bias as they favor those phenomena and aspects of human behavior that are easily quantifiable over those which are hard or even impossible to measure. This problem is exacerbated by the fact that certain data may be easier to access and analyze than others, which has caused, for example, the role of Twitter for various societal phenomena to be overemphasized (Tufekci, 2014). Once introduced, algorithmic systems encourage the creation of very specific data collection infrastructures and policies (Introna & Wood, 2004), which then change or amplify power relations. Algorithms thus shape societal institutions and potential interventions, and vice versa. It is currently not entirely clear how this complex interaction between algorithms and structures plays out in our societies. Scholars have thus called for "algorithmic accountability" to improve understanding of the power structures, biases, and influences that algorithms exercise in society (Diakopoulos, 2015).

| How is bias manifested in data?
Bias can be manifested in (multimodal) data through sensitive features and their causal influences, or through under- or over-representation of certain groups.

| Sensitive features and causal influences
Data encode a number of characteristics of people in the form of feature values. Sensitive characteristics that identify grounds of discrimination or bias may be present or not. Removing or ignoring such sensitive features does not prevent learning biased models, because other correlated features (also known as redundant encodings) may be used as proxies for them. For example, neighborhoods in U.S. cities are highly correlated with race, and this fact has been used for systematic denial of services such as bank loans or same-day purchase delivery. Rather, including sensitive features in data may be beneficial in the design of fair models (Zliobaite & Custers, 2016). Sensitive features may also be correlated with the target feature that classification models want to predict. For example, a minority's preference for red cars may induce bias against the minority in predicting accident rate if red cars are also preferred by aggressive drivers. Higher insurance premiums may then be set for red car owners, which disproportionately impacts minority members. Simple correlation between apparently neutral features can then lead to biased decisions. Discovering and understanding causal influences among variables is a fundamental tool for dealing with bias, as recognized in legal circles (Foster, 2004) and in medical research (Grimes & Schulz, 2002). The interested reader is referred to the recent survey on causal approaches to fairness in classification models (Loftus, Russell, Kusner, & Silva, 2018).

| Representativeness of data
Statistical (including ML) inferences require that the data from which the model was learned be representative of the data on which it is applied. However, data collection often suffers from biases that lead to the over- or under-representation of certain groups, especially in big data, where many data sets have not been created with the rigor of a statistical study but are the by-product of other activities with different, often operational, goals (Barocas & Selbst, 2016). Frequently occurring biases include selection bias (certain individuals are more likely to be selected for study), often as self-selection bias, and the reverse exclusion bias; reporting bias (observations of a certain kind are more likely to be reported, which leads to a sort of selection bias on observations); and detection bias (a phenomenon is more likely to be observed for a particular set of subjects). Analogous biases can lead to under- or over-representation of properties of individuals (see, for example, Boyd & Crawford, 2012). If the mis-represented groups coincide with social groups against which there already exists social bias such as prejudice or discrimination, even "unbiased computational processes can lead to discriminative decision procedures" (Calders & Zliobaite, 2013). Mis-representation in the data can lead to vicious cycles that perpetuate discrimination and disadvantage (Barocas & Selbst, 2016). Such "pernicious feedback loops" (O'Neil, 2016) can occur both with under-representation of historically disadvantaged groups, for example, women and people of color in IT developer communities and image datasets (Buolamwini & Gebru, 2018), and with over-representation, for example, black people in drug-related arrests (Lum & Isaac, 2016).

| Data modalities and bias
Data come in different modalities (numerical, textual, images, etc.) as well as in multimodal representations (e.g., audio-visual content). Most of the fairness-aware ML approaches refer to structured data represented in some fixed feature space. Data modality-specific approaches also exist, especially for textual data and images. Bias in language has attracted a lot of recent interest, with many studies exposing a large number of offensive associations related to gender and race in publicly available word embeddings (Bolukbasi, Chang, Zou, Saligrama, & Kalai, 2016) as well as how these associations have evolved over time (Kutuzov, Øvrelid, Szymanski, & Velldal, 2018). The situation is similar in the computer vision community, where standard image collections like MNIST are exploited for training, or off-the-shelf pretrained models are used as feature extractors, on the assumption that the collections comprise representative samples of the real world. In reality, though, the collections can be biased, as many recent studies have indicated. For instance, Buolamwini and Gebru (2018) found that commercial facial recognition services perform much better on lighter-skinned male subjects than on darker-skinned female ones. Overall, the additional layer of feature extraction that is typically used within AI-based multimodal analysis systems makes it even more challenging to trace the source of bias in such systems.

| How is fairness defined?
More than 20 different definitions of fairness have appeared thus far in the computer science literature (Verma & Rubin, 2018; Zliobaite, 2017), and some of these definitions and others were proposed and investigated in work on formalizing fairness in other disciplines, such as education, over the past 50 years (Hutchinson & Mitchell, 2019). Existing fairness definitions can be categorized into: (a) "predicted outcome," (b) "predicted and actual outcome," (c) "predicted probabilities and actual outcome," (d) "similarity based," and (e) "causal reasoning" (Verma & Rubin, 2018). "Predicted outcome" definitions solely rely on a model's predictions (e.g., demographic parity checks the percentage of protected and non-protected groups in the positive class). "Predicted and actual outcome" definitions combine a model's predictions with the true labels (e.g., equalized odds requires false positive and false negative rates to be similar among protected and non-protected groups). "Predicted probabilities and actual outcome" definitions employ the predicted probabilities instead of the predicted outcomes (e.g., good calibration requires the true positive probabilities between protected and non-protected groups to be the same). Contrary to definitions (a)-(c) that only consider the sensitive attribute, "similarity based" definitions also employ non-sensitive attributes (e.g., fairness through awareness states that similar individuals must be treated equally). Finally, "causal reasoning" definitions are based on directed acyclic graphs that capture relations between features and their impact on the outcomes by structural equations (e.g., counterfactual fairness (Kusner, Loftus, Russell, & Silva, 2017) constructs a graph that verifies whether the attributes defining the outcome are correlated to the sensitive attribute).
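To make categories (a) and (b) concrete, the following minimal Python sketch computes a demographic parity gap and an equalized odds gap on a toy example. The data, the group labels "p" (protected) and "u" (non-protected), and all function names are hypothetical and chosen for illustration only; the gap formulas are one common way of operationalizing these notions, not the only one.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, group):
    """Per-group positive rate, false positive rate, and false negative rate."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "fn": 0, "neg": 0, "fp": 0})
    for yt, yp, g in zip(y_true, y_pred, group):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += yp
        if yt == 1:
            c["pos"] += 1
            c["fn"] += 1 - yp   # actual positive predicted as negative
        else:
            c["neg"] += 1
            c["fp"] += yp       # actual negative predicted as positive
    return {
        g: {
            "positive_rate": c["pred_pos"] / c["n"],
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
            "fnr": c["fn"] / c["pos"] if c["pos"] else float("nan"),
        }
        for g, c in counts.items()
    }

def demographic_parity_gap(rates, a, b):
    """Difference in the share of positive predictions between two groups."""
    return abs(rates[a]["positive_rate"] - rates[b]["positive_rate"])

def equalized_odds_gap(rates, a, b):
    """Largest difference in false positive or false negative rate."""
    return max(abs(rates[a]["fpr"] - rates[b]["fpr"]),
               abs(rates[a]["fnr"] - rates[b]["fnr"]))

# Hypothetical toy data: true labels, predictions, and group membership.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["p", "p", "p", "p", "u", "u", "u", "u"]

rates = group_rates(y_true, y_pred, group)
dp_gap = demographic_parity_gap(rates, "p", "u")
eo_gap = equalized_odds_gap(rates, "p", "u")
print(dp_gap, eo_gap)
```

A gap of zero would indicate that the respective notion is satisfied exactly on this sample; in practice a tolerance is usually chosen, and the two notions can disagree on the same predictions.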
Despite the many formal, mathematical definitions of fairness proposed in recent years, the problem of formalizing fairness remains open, and a discussion of the merits and demerits of the different measures is still missing. Corbett-Davies and Goel (2018) show the statistical limitations of prevailing mathematical definitions of fairness and the (negative) effect of enforcing such fairness measures on group well-being, and urge the community to explicitly focus on the consequences of potential interventions.

| Legal issues of bias and fairness in AI
Taking into account the variety of bias creation in AI systems and its impact on society, the question arises whether the law should provide regulations for non-discriminatory AI-based decision making. Generally speaking, existing EU regulation comes into play when (discriminatory) decisions have been taken, while provisions tackling the quality of selected data are rare. For the former, the control of discriminatory decisions, the principle of equality and the prohibition of discrimination (Art. 20, 21 EU Charter of Fundamental Rights, Art. 4 Directive 2004/113 and other directives) apply. However, these provisions only address discrimination on the basis of specific criteria and require prima facie evidence of a less favorable treatment on grounds of a prohibited criterion, which will often be difficult to establish (Hacker, 2018). For the latter, the control of the quality of the selected data, with respect to "personal data" Art. 5(1) GDPR stipulates "the principle of data accuracy" which, however, does not hinder wrongful or disproportionate selection. With respect to automated decision-making (Art. 22 GDPR), recital 71 only points out that appropriate mathematical or statistical procedures shall be used and that discriminatory effects shall be prevented. While the effectiveness of Art. 22 GDPR is uncertain (Zuiderveen Borgesius, 2018), it provides some safeguards, such as restrictions on the use of automated decision-making and, where it is used, a right to transparency, to obtain human intervention, and to contest the decision. Finally, some provisions in area-specific legislation can be found, for example, Art. 12 Regulation (EC) No 223/2009 for European statistics.

| MITIGATING BIAS
Approaches for bias mitigation can be categorized into: (a) preprocessing methods focusing on the data, (b) in-processing methods focusing on the ML algorithm, and (c) post-processing methods focusing on the ML model. We conclude the section with a discussion on the legal issues of bias mitigation.

| Preprocessing approaches
Approaches in this category focus on the data, the primary source of bias, aiming to produce a "balanced" dataset that can then be fed into any learning algorithm. The intuition behind these approaches is that the fairer the training data is, the less discriminative the resulting model will be. Such methods modify the original data distribution by altering class labels of carefully selected instances close to the decision boundary or in local neighborhoods (Luong, Ruggieri, & Turini, 2011), by assigning different weights to instances based on their group membership (Calders, Kamiran, & Pechenizkiy, 2009), or by carefully sampling from each group. These methods use heuristics aiming to balance the protected and unprotected groups in the training set; however, their impact is not well controlled, even though they aim for minimal data interventions. Recently, Calmon, Wei, Vinzamuri, Ramamurthy, and Varshney (2017) proposed a probabilistic fairness-aware framework that alters the data distribution towards fairness while controlling the per-instance distortion and preserving data utility for learning.
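As an illustration of the instance-weighting idea, the sketch below assigns each training instance the weight P(group) · P(label) / P(group, label), a scheme often referred to as reweighing. Whether this matches the cited work in every detail is an assumption; the toy data and the function names are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each instance by P(group) * P(label) / P(group, label).

    Under these weights, group membership and class label are statistically
    independent in the weighted training set.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, target_group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == target_group)
    positive = sum(w for g, y, w in zip(groups, labels, weights)
                   if g == target_group and y == 1)
    return positive / total

# Hypothetical toy data: the protected group "p" receives fewer positive labels.
groups = ["p"] * 4 + ["u"] * 6
labels = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0]

weights = reweighing_weights(groups, labels)
rate_p = weighted_positive_rate(groups, labels, weights, "p")
rate_u = weighted_positive_rate(groups, labels, weights, "u")
print(rate_p, rate_u)
```

After weighting, both groups exhibit the same weighted positive rate, which equals the overall share of positive labels; a learner that accepts instance weights can then be trained on the unchanged feature values.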

| In-processing approaches
In-processing approaches reformulate the classification problem by explicitly incorporating the model's discrimination behavior in the objective function through regularization or constraints, or by training on latent target labels. For example, Kamiran, Calders, and Pechenizkiy (2010) modify the splitting criterion of decision trees to also consider the impact of the split w.r.t. the protected attribute. Kamishima, Akaho, Asoh, and Sakuma (2012) integrate a regularizer to reduce the effect of "indirect prejudice" (mutual information between the sensitive features and class labels). Dwork, Hardt, Pitassi, Reingold, and Zemel (2012) redefine the classification problem by minimizing an arbitrary loss function subject to the individual fairness constraint (similar individuals are treated similarly). Zafar, Valera, Gomez-Rodriguez, and Gummadi (2017) propose a constraint-based approach for disparate mistreatment (defined in terms of misclassification rates) which can be incorporated into logistic regression and SVMs. In a different direction, Krasanakis, Xioufis, Papadopoulos, and Kompatsiaris (2018) assume the existence of latent fair classes and propose an iterative training approach towards those classes which alters the in-training weights of the instances. Iosifidis and Ntoutsi (2019) propose a sequential fair ensemble, AdaFair, that extends the weighted distribution approach of AdaBoost by also considering the cumulative fairness of the learner up to the current boosting round; moreover, it optimizes for balanced error instead of overall error to account for class imbalance.
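The general idea of adding a fairness term to the objective can be sketched as follows. The example trains a logistic regression by gradient descent and penalizes the squared covariance between the sensitive attribute and the decision score. This is a generic illustration and not the method of any of the works cited above; the synthetic data, the penalty strength `lam`, and all names are assumptions made for this sketch.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, s, lam, lr=0.2, epochs=500):
    """Logistic regression with the penalty lam * cov(s, score)^2.

    Returns the weights, the bias, and the final covariance between the
    sensitive attribute s and the linear decision score.
    """
    n, d = len(X), len(X[0])
    s_mean = sum(s) / n
    # c[j]: covariance between the sensitive attribute and feature j, so that
    # cov(s, score) = sum_j w[j] * c[j] (the bias term cancels out).
    c = [sum((s[i] - s_mean) * X[i][j] for i in range(n)) / n for j in range(d)]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        cov = sum(wj * cj for wj, cj in zip(w, c))
        for j in range(d):
            # Gradient of the log-loss plus gradient of the fairness penalty.
            w[j] -= lr * (grad_w[j] + 2.0 * lam * cov * c[j])
        b -= lr * grad_b
    return w, b, sum(wj * cj for wj, cj in zip(w, c))

# Synthetic data (an assumption): the first feature acts as a proxy for s.
rng = random.Random(0)
s = [i % 2 for i in range(200)]
X = [[si + rng.gauss(0.0, 0.5), rng.gauss(0.0, 1.0)] for si in s]
y = [1 if xi[0] + 0.5 * xi[1] + rng.gauss(0.0, 0.5) > 0.5 else 0 for xi in X]

w_plain, b_plain, cov_plain = train(X, y, s, lam=0.0)
w_fair, b_fair, cov_fair = train(X, y, s, lam=10.0)
print(abs(cov_plain), abs(cov_fair))
```

Larger values of `lam` trade predictive fit for a weaker dependence of the score on the sensitive attribute; the sensitive attribute itself is used only in the penalty and never as an input feature.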
While most of the in-processing approaches refer to classification, approaches for the unsupervised case have also emerged recently, for example, the fair-PCA approach of Samadi, Tantipongpipat, Morgenstern, Singh, and Vempala (2018) that forces equal reconstruction errors for both protected and unprotected groups. Chierichetti, Kumar, Lattanzi, and Vassilvitskii (2017) formulate the problem of fair clustering as having approximately equal representation for each protected group in every cluster and define fair-variants of classical k-means and k-medoids algorithms.
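One way to quantify "approximately equal representation" in a clustering is a balance score: for each cluster, take the smaller of the two ratios between the group counts, and report the minimum over all clusters. The sketch below assumes exactly two protected groups; the toy assignment and the function name are hypothetical.

```python
from collections import Counter, defaultdict

def clustering_balance(assignments, colors):
    """Minimum over clusters of min(#a / #b, #b / #a) for two groups a and b.

    A value of 1.0 means both groups are equally represented in every
    cluster; 0.0 means some cluster contains only one group.
    """
    group_a, group_b = sorted(set(colors))  # assumes exactly two groups
    per_cluster = defaultdict(Counter)
    for cluster, color in zip(assignments, colors):
        per_cluster[cluster][color] += 1
    balances = []
    for counts in per_cluster.values():
        a, b = counts[group_a], counts[group_b]
        balances.append(min(a / b, b / a) if a and b else 0.0)
    return min(balances)

# Hypothetical toy clustering of eight points into two clusters.
assignments = [0, 0, 0, 0, 1, 1, 1, 1]
colors = ["a", "a", "b", "b", "a", "b", "b", "b"]

balance = clustering_balance(assignments, colors)
print(balance)
```

Here the first cluster is perfectly balanced while the second contains one member of group "a" and three of group "b", so the overall score is determined by the second cluster.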

| Post-processing approaches
The third strategy is to postprocess the classification model once it has been learned from data. This consists of altering the model's internals (white-box approaches) or its predictions (black-box approaches). Examples of the white-box approach include correcting the confidence of CPAR classification rules (Pedreschi, Ruggieri, & Turini, 2009), probabilities in Naïve Bayes models (Calders & Verwer, 2010), or the class label at leaves of decision trees (Kamiran et al., 2010). White-box approaches have not been further developed in recent years, being superseded by in-processing methods. Examples of the black-box approach aim at keeping proportionality of decisions among protected versus unprotected groups by promoting or demoting predictions close to the decision boundary (Kamiran, Mansha, Karim, & Zhang, 2018), by differentiating the decision boundary itself over groups (Hardt, Price, & Srebro, 2016), or by wrapping a fair classifier on top of a black-box base classifier (Agarwal, Beygelzimer, Dudík, Langford, & Wallach, 2018). An analysis of how to postprocess group-wise calibrated classifiers under fairness constraints is given by Canetti et al. (2019). While the majority of approaches are concerned with classification models, bias post-processing has also been deemed relevant when interpreting clustering models (Lorimer, Held, & Stoop, 2017).
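The black-box idea of differentiating the decision boundary over groups can be illustrated with group-specific thresholds on the scores of an already trained model. The sketch below picks, for each group, the threshold that yields the same share of positive decisions. This is a deliberately simplified stand-in that equalizes positive rates only; it does not reproduce the cited methods, which may also take the true labels into account. The scores, the target rate, and the function names are hypothetical.

```python
def threshold_for_rate(scores, target_rate):
    """Threshold at which roughly target_rate of the scores are accepted."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]

def positive_rate(scores, threshold):
    """Share of scores that are at or above the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)

# Hypothetical scores of a black-box classifier for two groups.
scores_by_group = {
    "p": [0.9, 0.8, 0.55, 0.4, 0.3],
    "u": [0.6, 0.45, 0.35, 0.2, 0.1],
}

# A single global threshold of 0.5 treats the groups very differently.
global_rates = {g: positive_rate(sc, 0.5) for g, sc in scores_by_group.items()}

# Group-specific thresholds equalize the share of positive decisions.
target_rate = 0.4
thresholds = {g: threshold_for_rate(sc, target_rate)
              for g, sc in scores_by_group.items()}
adjusted_rates = {g: positive_rate(sc, thresholds[g])
                  for g, sc in scores_by_group.items()}
print(global_rates, thresholds, adjusted_rates)
```

The underlying model is left untouched; only the mapping from scores to decisions changes, which is what makes this family of approaches applicable to models whose internals are not accessible.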

| Legal issues of mitigating bias
Pertinent legal questions involve whether modifications of data as envisaged by the pre- and in-processing approaches, as well as altering the model in the post-processing approach, could be considered lawful. Besides intellectual property issues that might occur, there is no general legal provision dealing with the way data is collected, selected, or (even) modified. Provisions are in place mainly if such training data would (still) be personal data. Modifications (as well as any other processing) would need a legal basis. However, legitimation could derive from informed consent (provided that specific safeguards are met), or could rely on contract or legitimate interest. Besides, data quality could be relevant in terms of warranties if a data provider sells data. A specific issue arises when "debiasing" involves sensitive data, as under Art. 9 GDPR special category data such as ethnicity often requires explicit consent (Kilbertus et al., 2018). A possible solution could be Art. 9(2)(g) GDPR, which permits processing for reasons of substantial public interest, which arguably could be seen in "debiasing." The same grounds of legitimation apply when altering the model. However, contrary to data modification, data protection law would arguably not be applicable here, as the model would not contain personal data, unless the model is vulnerable to confidentiality attacks such as model inversion and membership inference (Veale, Binns, & Edwards, 2018).

| ACCOUNTING FOR BIAS
Algorithmic accountability refers to the assignment of responsibility for how an algorithm is created and its impact on society (Kaplan, Donovan, Hanson, & Matthews, 2019). In the case of AI algorithms, the problem is aggravated because the solution is not codified explicitly; rather, it is inferred via ML algorithms and complex data. AI accountability has many facets; we focus below on the most prominent ones that account for bias either proactively, via bias-aware data collection, or retroactively, by explaining AI decisions in human terms. Furthermore, we discuss the importance of describing and documenting bias by means of formalisms like ontologies.

| Proactively: bias-aware data collection
A variety of methods are adopted for data acquisition to serve diverse needs; these may be prone to introducing bias at the data collection stage itself (see, for example, Morstatter, Pfeffer, & Liu, 2014). Proposals have been made for a structured approach to bias elicitation in evidence synthesis, including bias checklists and elicitation tasks that can be performed by individual assessors with mathematical pooling, by group elicitation and consensus building, or by hybrid approaches (Turner, Spiegelhalter, Smith, & Thompson, 2009). However, bias elicitations have themselves been found to be biased even when high-quality assessors are involved, and remedies have been proposed (Manzi & Forster, 2019).
Among other methods, crowdsourcing is a popular approach that relies on large-scale acquisition of human input for dealing with data and label scarcity in ML. Crowdsourced data and labels may be subject to bias at different stages of the process: task design and experimental setup, task decomposition and result aggregation, selection of workers, and the entailing human factors (Hube, Fetahu, & Gadiraju, 2019; Kamar, Kapoor, & Horvitz, 2015; Karger, Oh, & Shah, 2011). Mitigating biases in crowdsourced data becomes harder in subjective tasks, where the presence of varying ideological and cultural backgrounds of workers means that it is possible to observe biased labels with complete agreement among the workers.

| Describing and modeling bias using ontologies
Accounting for bias not only requires understanding of the different sources, that is, data, knowledge bases, and algorithms, but more importantly, it demands the interpretation and description of the meaning, potential side effects, provenance, and context of bias. Usually, unbalanced categories are understood as bias and considered as sources of negative side effects. Nevertheless, skewed distributions may simply hide features or domain characteristics that, if removed, would hinder the discovery of relevant insights. This situation can be observed, for instance, in populations of lung cancer patients. As highlighted in diverse scientific reports, for example, Garrido et al. (2019), lung cancer in women and men shows significant differences in aspects such as etiology, pathophysiology, histology, and risk factors, which may affect cancer occurrence, treatment outcomes, and survival. Furthermore, there are specific organizations that collaborate in lung cancer prevention and in the battle against smoking; some of these campaigns are oriented to particular focus groups, and the effects of these initiatives are observed in certain populations. All these facts affect the gender distribution of the population and could be interpreted as bias. However, in this context, imbalance reveals domain-specific facts that need to be preserved in the population, and a formal description of these uneven distributions should be provided to avoid misinterpretation. Moreover, like any type of data source, knowledge bases and ontologies can also suffer from various types of bias or knowledge imbalance. For example, the description of the existing mutations of a gene in a knowledge base like COSMIC, or the properties associated with a gene in the Gene Ontology, may be biased by the amount of research that has been conducted on the diseases associated with these genes.
Expressive formal models are needed to describe and explain the characteristics of a data source and the conditions or context under which the data source is biased.
Formalisms like description and causal logics, for example, (Besnard, Cordier, & Moinard, 2014; Dehaspe & Raedt, 1996; Krötzsch, Marx, Ozaki, & Thost, 2018; LeBlanc, Balduccini, & Vennekens, 2019), allow for measuring and detecting bias in data collections of diverse types, for example, online data sets and recommendation systems (Serbos, Qi, Mamoulis, Pitoura, & Tsaparas, 2017). They also enable the annotation of statements with trustworthiness (Son, Pontelli, Gelfond, & Balduccini, 2016) and temporality (Ozaki, Krötzsch, & Rudolph, 2019), as well as causation relationships between them (LeBlanc et al., 2019). Ontologies also play a relevant role as knowledge representation models for describing universes of discourse in terms of concepts such as classes, properties, and subsumption relationships, as well as contextual statements of these concepts. NdFluents (Giménez-García, Zimmermann, & Maret, 2017) and the Context Ontology Language (CoOL) (Strang, Linnhoff-Popien, & Frank, 2003) are exemplary formal ontology models able to express and combine diverse contextual dimensions and interrelations (e.g., locality and vicinity). Albeit expressive, existing logic-based and ontological formalisms are not tailored for representing contextual bias or differentiating unbalanced categories that consistently correspond to instances of a real-world domain. Therefore, expressive ontological formalisms are needed to represent the contextual dimensions of various types of sources, for example, data collections, knowledge bases, or ontologies, as well as annotations denoting causality and provenance of the represented knowledge. These formalisms will equip bias detection algorithms with reasoning mechanisms that not only enhance accuracy but also enable explainability of the meaning, conditions, origin, and context of bias. Thus, domain modeling using ontologies will support context-aware bias description and interpretability.

| Retroactively: explaining AI decisions
Explainability refers to the extent to which the internal mechanics of a learning model can be explained in human terms. It is often used interchangeably with interpretability, although the latter refers to whether one can predict what will happen given a change in the model input or parameters. Although attempts to tackle interpretable ML have existed for some time (Hoffman & Klein, 2017), there has been an exceptional growth of research literature in recent years with emerging keywords such as explainable AI (Adadi & Berrada, 2018) and black box explanation. Many papers propose approaches for understanding the global logic of a model by building an interpretable classifier able to mimic the obscure decision system. Generally, these methods are designed for explaining specific models, for example, deep neural networks (Montavon, Samek, & Müller, 2018). Only a few are agnostic to the black box model (Henelius, Puolamäki, Boström, Asker, & Papapetrou, 2014). The difficulties in explaining black boxes and complex models ex post have motivated proposals of transparent classifiers which are interpretable on their own and exhibit predictive accuracy close to that of obscure models. These include Bayesian models (Li & Huan, 2017), generalized additive models (Lou, Caruana, Gehrke, & Hooker, 2013), supersparse linear models (Ustun & Rudin, 2016), rule-based decision sets (Lakkaraju, Bach, & Leskovec, 2016), optimal classification trees (Bertsimas & Dunn, 2017), model trees (Broelemann & Kasneci, 2019), and neural networks with interpretable layers (Zhang, Wu, & Zhu, 2018).
A different stream of approaches focuses on the local behavior of a model, searching for an explanation of the decision made for a specific instance. Such approaches are either model-dependent, for example, Taylor approximations (Kasneci & Gottron, 2016), saliency masks (the image regions that are mainly responsible for the decision) for neural network decisions (Ma, Yu, & Yue, 2015), and attention models for recurrent networks (Choi et al., 2016), or model-agnostic, such as those started by the LIME method (Ribeiro, Singh, & Guestrin, 2016). The main idea is to derive a local explanation for a decision outcome on a specific instance by learning an interpretable model from a randomly generated neighborhood of the instance. A third stream aims at bridging the local and the global ones by defining a strategy for combining local models in an incremental way. More recent work has asked the fundamental question "What is an explanation?" (Mittelstadt, Russell, & Wachter, 2019) and rejects such usage of the term "explanation," criticizing that it might be appropriate for a modeling expert, but not for a layperson, and that, for example, the humanities or philosophy have an entirely different understanding of what explanations are.
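The model-agnostic local idea can be sketched in a few lines: sample a random neighborhood around the instance, query the black box, weight the samples by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. The sketch below illustrates this general principle only and is not the LIME implementation; the Gaussian sampling, the exponential kernel, the parameter defaults, and the toy black box are assumptions.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def local_surrogate(black_box, x0, n_samples=500, scale=0.5, width=1.0, seed=0):
    """Fit a proximity-weighted linear model to black_box around x0.

    Returns (intercept, coefficients); the coefficients indicate how strongly
    each feature influences the black-box output in the vicinity of x0.
    """
    rng = random.Random(seed)
    d = len(x0)
    A = [[0.0] * (d + 1) for _ in range(d + 1)]
    b = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xj + rng.gauss(0.0, scale) for xj in x0]      # random neighbor
        dist2 = sum((zj - xj) ** 2 for zj, xj in zip(z, x0))
        weight = math.exp(-dist2 / width ** 2)             # closer = heavier
        row = [1.0] + z
        target = black_box(z)
        # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
        for i in range(d + 1):
            b[i] += weight * row[i] * target
            for j in range(d + 1):
                A[i][j] += weight * row[i] * row[j]
    beta = solve(A, b)
    return beta[0], beta[1:]

# Hypothetical black box: nonlinear in the first feature, linear in the second.
def black_box(z):
    return z[0] ** 2 + z[1]

intercept, coefficients = local_surrogate(black_box, [1.0, 2.0])
print(coefficients)
```

The fitted coefficients describe the black box only near the chosen instance; a different instance, neighborhood size, or kernel width can yield a different explanation, which is one reason such local explanations should be read with care.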
We speculate that there are computational methods that will allow us to find some middle ground. For instance, some approaches in ML, statistical relational learning in particular (Raedt, Kersting, Natarajan, & Poole, 2016), take the perspective of knowledge representation and reasoning into account when developing ML models on more formal logical and statistical grounds. AI knowledge representation has been developing a rich theory of argumentation over the last 25 years (Dung, 1995), which recent approaches (Cocarascu & Toni, 2016) try to leverage for generalizing the reasoning aspect of ML towards the use of computational models of argumentation. The outcomes are models of arguments and counterarguments towards certain classifications that can be inspected by a human user and might be used as formal grounds for explanations in the manner that Mittelstadt et al. (2019) called for.

| Legal issues of accounting for bias
While data protection rules affect both the input (data) and the output (automated decision) level of AI decision-making, anti-discrimination laws, as well as consumer and competition rules, address discriminatory policies primarily from the perspective of the (automated) decision and the actions based on it. However, the application of these rules to AI-based decisions is largely unclear. Under present law and the principle of private autonomy, decisions by private parties normally do not have to include reasons or explanations. Therefore, a first issue will be how existing rules can be applied to algorithmic decision-making. Given that a decision will often not be reasoned (hence the reasons will be unknown), it will be difficult to establish that it was made on the basis of a biased decision-making process (Mittelstadt et al., 2019).
Even if bias can be proven, a second issue is the limited scope of anti-discrimination law. Under present law, only certain transactions between private parties fall under the EU anti-discrimination directives (Liddell & O'Flaherty, 2018). Moreover, in most cases AI decision-making instruments will not directly use an unlawful criterion (e.g., gender) as a basis for their decision, but rather a "neutral" one (e.g., residence) which in practice leads to a less favorable treatment of certain groups. This raises the difficult concept of indirect discrimination, that is, a scenario where an "apparently neutral rule disadvantages a person or a group sharing the same characteristics" (Liddell & O'Flaherty, 2018). Finally, most forms of differential treatment can be justified where the treatment pursues a legitimate aim and where the means to pursue that aim are appropriate and necessary. It is unclear whether the argument that AI-based decision-making systems produce decisions which are economically sound can be sufficient as justification.
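The mechanism behind indirect discrimination can be illustrated with a small sketch. The applicant records below are invented toy data, and the rule `decide` and the helper `acceptance_rate_by_group` are hypothetical names introduced only for this illustration. The rule never sees gender, yet because residence is unevenly distributed across genders in the toy data, the acceptance rates differ by group.

```python
from collections import defaultdict

# Hypothetical applicants as (gender, residence) pairs. Toy data, not real records.
applicants = (
    [("f", "north")] * 30 + [("f", "south")] * 70
    + [("m", "north")] * 70 + [("m", "south")] * 30
)


def decide(residence):
    """An apparently neutral rule: it only looks at residence."""
    return residence == "north"


def acceptance_rate_by_group(records):
    """Share of positive decisions for each value of the protected attribute."""
    accepted = defaultdict(int)
    total = defaultdict(int)
    for gender, residence in records:
        total[gender] += 1
        accepted[gender] += int(decide(residence))
    return {g: accepted[g] / total[g] for g in total}


rates = acceptance_rate_by_group(applicants)
print(rates)
```

Whether such a disparity amounts to unlawful indirect discrimination is a legal question that depends on scope and justification, as discussed above; the sketch only shows how a neutral criterion can act as a proxy.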

| FUTURE DIRECTIONS AND CONCLUSIONS
There are several directions that can impact this field going forward. First, despite the large number of methods for mitigating bias, there are still no conclusive results regarding what the state-of-the-art method for each category is, which of the fairness-related interventions perform best, or whether category-specific interventions perform better compared to holistic approaches that tackle bias at all stages of the analysis process. We believe that a systematic evaluation of the existing approaches is necessary to understand their capabilities and limitations, and that it is also a vital part of proposing new solutions. The difficulty of the evaluation lies in the fact that different methods work with different fairness notions and are applicable to different AI models. To this end, benchmark datasets should be made available that cover different application areas and manifest real-world challenges. Finally, standard evaluation procedures and measures covering both model performance and fairness-related aspects should be followed, in accordance with international standards like the IEEE-ALGB-WG-Algorithmic Bias Working Group.
Second, we recognize that "fairness cannot be reduced to a simple self-contained mathematical definition," "fairness is dynamic and social and not a statistical issue." Also, "fair is not fair everywhere" (Schäfer, Haun, & Tomasello, 2015), meaning that the notion of fairness varies across countries, cultures, and application domains. Therefore, it is important to have realistic and applicable fairness definitions for different contexts as well as domain-specific datasets for method development and evaluation. Moreover, it is important to move beyond the typical training-test evaluation setup and to consider the consequences of potential fairness-related interventions to ensure the long-term well-being of different groups. Finally, given the temporal changes of fairness perception, the question of whether one can train models on historical data and use them for current fairness-related problems becomes increasingly pressing.
Third, the related work thus far focuses mainly on supervised learning. In many cases, however, direct feedback on the data (i.e., in the form of labels) is not available. Therefore, alternative learning tasks should be considered, like unsupervised learning or reinforcement learning (RL), where only intermediate feedback is provided to the model. Recent works have emerged in this direction; for example, Jabbari, Joseph, Kearns, Morgenstern, and Roth (2017) examine fairness in the RL context, where one needs to reconsider the effects of short-term actions on long-term rewards.
Fourth, there is a recent trend in the ML community towards generating plausible data from existing data using Generative Adversarial Networks in an attempt to cover the high data demand of modern methods, especially DNNs. Recently, such approaches have also been used in the context of fairness (Xu, Yuan, Zhang, & Wu, 2018), that is, to generate synthetic fair data that are similar to the real data. However, the problem of representativeness of the training data and its impact on the representativeness of the generated data might aggravate issues of fairness and discrimination. On a related topic, recent work revealed that DNNs are vulnerable to adversarial attacks, that is, intentional perturbations of the input examples, and therefore there is a need for methods to enhance their resilience (Song et al., 2018).
Fifth, AI scientists and everyone involved in the decision-making process should be aware of bias-related issues and the effect of their design choices and assumptions. For instance, studies show that representation-related biases creep into development processes because the development teams are not aware of the importance of distinguishing between certain categories (Buolamwini & Gebru, 2018). Members of a privileged group may not even be aware of the existence of (e.g., racial) categories in the sense that they often perceive themselves as "just people," and the interpretation of this as an unconscious default requires the voice of individuals from underprivileged groups, who persistently perceive themselves as being "different." Two strategies appear promising for addressing this cognitive bias: improving diversity in development teams, and subjecting algorithms to outside and as-open-as-possible scrutiny, for example by permitting certain forms of reverse engineering for algorithmic accountability.
Finally, from a legal point of view, apart from data protection law, general provisions with respect to data quality or selection are still missing. Recently, an ISO standard on data quality (ISO 8000) was published, though it is not binding and does not address decision-making techniques. Moreover, first important steps have been made, for example, the Draft Ethics Guidelines for Trustworthy AI from the European Commission's High-Level Expert Group on AI or the European Parliament resolution containing recommendations to the Commission on Civil Law Rules on Robotics. However, these documents are still generic. Further interdisciplinary research is needed to define specifically what is required to strike a balance between protecting the fundamental rights and freedoms of citizens by mitigating bias and accommodating the technical challenges and economic needs. Therefore, any legislative procedures will require a close collaboration of legal and technical experts. As already mentioned, the legal discussion in this paper refers to the EU where, despite the many recent efforts, there is still no consensus on algorithmic fairness regulations across its countries. Therefore, there is still a lot of work to be done on analyzing the legal standards and regulations at a national and international level to support AI designs that are lawful globally.
To conclude, the problem of bias and discrimination in AI-based decision-making systems has recently attracted a lot of attention from science, industry, society, and policy makers, and there is an ongoing debate on the opportunities and risks of AI for our lives and our civilization. This paper surveys technical challenges and solutions as well as their legal grounds in order to advance this field in a direction that exploits the tremendous power of AI for solving real-world problems but also considers the societal implications of these solutions. As a final note, we want to stress again that biases are deeply embedded in our societies and it is an illusion to believe that the problem of bias in AI will be eliminated by technical solutions alone. Nevertheless, as the technology reflects and projects our biases into the future, it is a key responsibility of technology creators to understand its limits and to propose safeguards to avoid pitfalls. It is equally important for technology creators to realize that technical solutions without any social and legal grounding cannot thrive, and that multidisciplinary approaches are therefore required.