The challenges of integrating explainable artificial intelligence into GeoAI

Although explainable artificial intelligence (XAI) promises considerable progress in glassboxing deep learning models, there are challenges in applying XAI to geospatial artificial intelligence (GeoAI), specifically geospatial deep neural networks (DNNs). We summarize these as three major challenges, related to XAI computation, to GeoAI and geographic data handling, and to geosocial issues. Challenges in XAI computation include the difficulty of selecting reference data/models and the shortcomings of attributing explanatory power to gradients. Challenges from GeoAI and geographic data handling include the difficulty of accommodating geographic scale, geovisualization, and underlying geographic data structures. Geosocial challenges encompass the limitations of knowledge scope—semantics and ontologies—in the explanation of GeoAI as well as the lack of integration of non-technical aspects in XAI, including processes that are not amenable to XAI. We illustrate these issues with a land use classification case study.


XAI increasingly is applied to geospatial artificial intelligence (GeoAI) (cf., Cheng et al., 2020). Our concern is that a credulous application of XAI to GeoAI may generate misleading interpretations. A healthy skepticism of an easy integration of XAI and GeoAI drives our research.
For a working definition, we characterize geographic explainable AI (GeoXAI) as: a set of approaches (from plain text and visualizations to algorithms) that integrate geographic structures and knowledge to improve the understanding of the geographic implications of AI for various audiences.
We use the term geographic structures to encompass geometric, topological, scalar and ontological attributes (Germanaitė et al., 2018). For knowledge, we foreground the importance of findings across the field of geography.
Contrary to Janowicz et al. (2022), we argue that we cannot ignore elements of geographic knowledge that are not easily transferred into AI or GeoXAI. Although our focus is on DNNs, we also refer to advances from work in knowledge graphs (KGs) (e.g., Mai et al., 2020; Rajabi & Etminani, 2022). We envision numerous potential challenges in realizing GeoXAI. These include the difficulty of selecting reference data/models, the shortcomings of gradients as explanation, the challenges of accommodating geographic scale, the inability to easily integrate topology and geometry into the explanatory process of GeoAI, the incompatibility of geography in XAI visualization, the missed opportunities of integrating geospatial semantics and ontologies, and the lack of acknowledgment of social and ethical aspects in XAI.
First, we review literature related to the use of XAI in geographic modeling, data analysis, and cartographic representation. The point of the article is not to provide a primer on GeoAI or XAI but to briefly describe core concepts and classify popular XAI approaches. Second, we develop a case study of land use classification to illustrate issues of applying XAI geographically. Then we discuss three general categories of challenges, focusing on DNNs: from XAI computation, from GeoAI, and from geosocial considerations like geospatial semantics and ontologies. Last, we summarize our findings and consider a future for achieving a GeoXAI.
XAI taxonomies also are studied for specific research domains, such as medical image analytics (Muddamsetty et al., 2021). Another way to taxonomize XAI is by function and medium (e.g., algorithms, visualization, audio, KGs, and plain text) (e.g., Rawal et al., 2021). There is even a taxonomy of taxonomies (Speith, 2022). Table 1 extends these works: we classify widely used XAI approaches (Column 1), with descriptions in Column 2.
Taxonomies highlight a challenge in how we bound the concept of explainability of AI. AI explainability can include any method to describe the model, input, or outcome. XAI often concerns purely computational approaches, but explainability can range from writing algorithms for XAI, to publishing and describing a model in an algorithmic registry, to stress-testing a model with different data as seen in AI auditing (Costanza-Chock et al., 2022; Gilpin et al., 2018). AI itself can be viewed as a kind of explanation; dimension reduction algorithms (e.g., linear discriminant analysis) can suggest the importance of certain input variables since they can collapse the number of variables while preserving their "essence" (Belaid et al., 2022). This extension of explanation blurs the boundary between explainability and interpretability within XAI (Gunning et al., 2019). Explainability indicates the quality of explanation that is associated with the AI algorithm. Interpretability often is associated with prediction and is the level to which the AI algorithm (including the training process and outputs) is understandable by humans, focusing on end goals like trustworthiness and transparency (Kaur et al., 2022). Further characterizations of explainability can include a legal right to a plain-text explanation understandable by a lay person (Edwards & Veale, 2017). Indeed, XAI is user dependent: adequate XAI results for one user may be opaque to another, even to computer scientists (Speith, 2022). Langer et al. (2021) reviewed XAI approaches with respect to different stakeholders and emphasized the heterogeneity of social context in AI explanations. Similar research in GeoAI has yet to be conducted.

TABLE 1 The most popular algorithms and approaches for XAI and their utility in spatially explicit models. [Table body not reproduced; one recovered row: Rule-based explanations | Embedding domain knowledge to assess the difference between given advice and a novel situation | Chen et al. (2021).]
Before we consider specific cases of integrating XAI into GeoAI, a brief explanation of core concepts in XAI is warranted. Local XAI approaches tend to illustrate the outcome generation process given individual input features or variables; global XAI methods provide an overall explanation of an AI system's behavior (e.g., rules, inputs, and features leading to outcomes). Model-specific XAI is designed for a single model or group of models (e.g., Decision Tree). Model-agnostic approaches do not depend on the structure of the model itself but base explanations on the inputs and outputs of the AI model (e.g., SHAP and LIME). Then there is the question of when and where one applies XAI: ante hoc XAI (attribution to inputs/outputs in the pre-modeling phase) and post hoc XAI (post-modeling phase, e.g., LIME).
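To make these distinctions concrete, the following minimal sketch hand-codes the local surrogate idea at the core of LIME: a model-agnostic (only the black box's inputs and outputs are used), post hoc, local explanation. The black-box function, perturbation scale, and kernel width here are illustrative assumptions rather than settings from any study we cite.

```python
# A minimal, self-contained sketch of the local surrogate idea behind LIME.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in for an opaque model (e.g., a DNN); returns P(class = 1).
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))

x0 = np.array([0.5, -0.2])                        # instance to explain

Z = x0 + rng.normal(scale=0.3, size=(500, 2))     # perturb the neighborhood of x0
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.5)  # weight samples by proximity to x0

# Fit an interpretable (linear) surrogate to the black box's local behavior.
surrogate = Ridge(alpha=1.0).fit(Z, black_box(Z), sample_weight=w)
print("local attributions:", surrogate.coef_)
```

The surrogate's coefficients serve as the local, per-feature attributions for the instance being explained.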
Researchers also distinguish between white- and black-box methods; the former concerns AI models that are simple and self-explanatory and the latter considers opaque models and data handling prevalent in DNNs (Dwivedi et al., 2023). All are used in GeoAI, as will be seen below.
The use of XAI within GeoAI is still quite young (e.g., Xing & Sieber, 2021). Two examples of applying XAI, specifically LIME, are Amiri et al. (2021) and Parmar et al. (2021). LIME approximates the predictions of the black box model using a local interpretable model, such as a linear regression, with specific input variable(s). Parmar et al. (2021) explained how land use patterns could predict parking duration. Amiri et al. (2021) demonstrated how XAI could be used to retune the model. They used LIME for hyperparameter tuning to increase the prediction accuracy of household transportation energy consumption. Although they considered geographic issues, their explanation approaches did not account for geographic structures like topology. This represents a significant challenge to integrating XAI and GeoAI, which we explore in Section 5.2.

XAI is frequently visualized as maps, whether salience¹ maps (Zhou et al., 2021) or heatmaps (Mamalakis et al., 2022). XAI visualizations attempt to glassbox the hidden layers, in which the attribution of certain features towards final results is depicted as intensities or colored regions in Cartesian space (Das & Rad, 2020).

KGs warrant special attention since they are important to GeoAI, although not currently to GeoXAI. Here a knowledge graph refers to a semantic network of, for example, geographic objects, spatial events, or places, and the labeled relationships among them (Ma, 2022). Challenges and approaches of integrating KGs and XAI techniques have been summarized by Rajabi and Etminani (2022) and Lecue (2020); the authors argued that KGs can improve post hoc explanations by representing them as a computational ontology of classes. Ganesan et al. (2020) tailored GNNExplainer to interpret graph link predictions in social media. These hierarchical graphs were considered preferable to heatmaps or plain text. Research into embedding geospatial knowledge within KGs for ante hoc explanations has been proposed (Karalis et al., 2019) but, as will be seen in Section 6, this problem remains challenging.
Finally, a broad range of XAI approaches have yet to be explored in or are problematic for GeoAI, which is why Columns 3 and 4 may be blank. For example, the XAI could be anchored to specific locations (e.g., O'Hare Airport near Chicago) to geographically ground the explanation. Some are not geofriendly, even if they are applied to geographic problems. Permutation explanations can distort the geometry and topology of the input data.
Gradient-based explanation approaches can violate geographic boundaries, which can be seen from our land use classification case study below. We will discuss specific cases of explanation references and gradients in Section 4. Our point, suggested by our critiques of some existing applications, is that GeoXAI is more than simply applying a common XAI method to a GeoAI study.

3 | A CASE STUDY OF XAI IN GIScience
To illustrate the challenges of integrating XAI and GeoAI, we conducted a case study on land use classification.
Land use classification is considered a typical GIScience/remote sensing application and has been studied extensively using GeoAI (e.g., Vali et al., 2020). Land use classification faces common geographic challenges, such as sensitivity to the resolution and spatial extent of imagery, and the detection and labeling of features. These all offer opportunities to investigate the demands land use classification places on GeoXAI. Land use classification studies often require place-based context, which distinguishes them from general computer vision research. Classifications therefore comply with Janowicz et al.'s (2020) argument that GeoAI models should be spatially explicit and should change if the studied phenomena change location. We performed our land use classification with a convolutional neural network (CNN), a type of DNN, similar to other GeoAI studies (e.g., Abdollahi & Pradhan, 2021; Guo et al., 2021; Vali et al., 2020).
We selected SHAP because it is a widely used XAI algorithm in GeoAI (e.g., Abdollahi & Pradhan, 2021; Li, 2022; Simini et al., 2021; Yang et al., 2021). SHAP quantitatively measures the attribution of features (e.g., image patches or text tokens) by assessing what would be required to nudge original pixel values toward the results (e.g., classification labels or prediction values) (Lundberg & Lee, 2017).

For illustration purposes, we chose one correctly classified example, a runway, and one misclassified example, an overpass labeled a mobilehomepark. We examined two layers instead of all layers to simplify the discussion below and because most layers will show no activation. We chose Layers 2 and 9, which showed the most obvious activation and which aided our discussion of visualization (Section 5.3). Two layers also allowed us to illustrate granularity and extent issues (Section 5.1). Figure 1 shows visualized SHAP values at Layer 2 and Layer 9 of VGG16 for the top five classification labels. As one can see, the range of SHAP values varies considerably. At initial convolution layers, SHAP values tend to be low; they tend to increase with the number of convolutions. The very small range in Figure 1a differs little from the example in the official repository of SHAP (https://github.com/slundberg/shap/blob/master/notebooks/image_examples/image_classification/Multi-input%20Gradient%20Explainer%20MNIST%20Example.ipynb).
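For readers who want to reproduce this kind of layer-wise output, the following condensed sketch follows the intermediate-layer pattern of the gradient explainer notebooks in the SHAP repository linked above. The random stand-in images, the layer index, and the TensorFlow 2.x adaptation of the notebook's map2layer helper are our assumptions; in the case study, the inputs would be the land use scenes.

```python
# A condensed sketch, adapted from shap's intermediate-layer gradient
# explainer notebooks, of how layer-wise SHAP values like those in Figure 1
# can be produced.
import numpy as np
import shap
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")
# Random stand-ins for preprocessed 224x224 scenes.
X = preprocess_input(np.random.rand(12, 224, 224, 3).astype(np.float32) * 255.0)

def map2layer(x, layer_idx):
    # Push images forward to the input tensor of the chosen layer.
    sub = tf.keras.Model(model.inputs, model.layers[layer_idx].input)
    return sub.predict(x)

layer = 9  # an intermediate convolutional layer, cf. Figure 1
explainer = shap.GradientExplainer(
    (model.layers[layer].input, model.layers[-1].output),
    map2layer(X, layer),  # background sample, i.e., the reference (Section 4.1)
)
# SHAP values for the top five labels of the first two images, as in Figure 1.
shap_values, indexes = explainer.shap_values(map2layer(X[:2], layer), ranked_outputs=5)
shap.image_plot(shap_values, X[:2])
```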
In Figure 1a, the SHAP value for the runway-runway comparison shows that the image patches extracted by the CNN contributed positively to the correct label. SHAP rejected the river classification label (runway-river), showing negative contributions of patches to the river label. Figure 1a shows that, for the second convolutional layer of VGG16, there was no predictive value identified by SHAP for other categories (e.g., runway-overpass). Examination of the underlying feature patches in our case suggests the difficulty GeoAI has in inferring complex land use patterns and the way in which VGG16 was likely misled by the co-occurrence of green space and cement.
The overpass in Figure 1b emphasizes the necessity of inserting geographic knowledge into the explanation as well as the AI. For example, one could explain parts of an overpass as a runway. Indeed, there are airports with aircraft bridges that bring aircraft traffic over highways, such as at Chicago's O'Hare Airport.
Our case study can help us begin to understand the challenges of integrating GeoAI and various XAI approaches, whether SHAP or other methods. We divide the critiques into three main areas: computational challenges that are agnostic to geography; specific problems induced by GeoAI; and, finally, the semantic and societal challenges of trying to explain GeoAI.

4 | CHALLENGES INTRODUCED BY XAI
As illustrated in our land use classification case study, XAI algorithms could provide insights into the inputs, outcomes and model activity to increase the understandability, transparency, traceability, causality, and trust in AI outcomes (e.g., Holzinger et al., 2020). XAI can reveal which variables or features (e.g., patches in Figure 1) are important and at which stage of the analysis they are important, with respect to the intermediate/final results of DNN. It could highlight biases in input data, processing, performance metrics, and post-processing (Arrieta et al., 2020). Irrespective of the domain, common computational challenges in XAI should be discussed first. As will be seen, GeoAI adds complexity to these existing challenges.
Computational challenges tend to be model-agnostic. They may be applied to a geographic problem and best explained by geographic data (input, output), but they treat the model or algorithm as a blackbox. That blackbox will likely have no understanding of geographic structures. We identify the two most common XAI computational challenges: the explanation reference and the gradient calculation.

4.1 | Challenges in explanations that require references
SHAP provides a snapshot of the challenges introduced by a facile integration of XAI and GeoAI, but it is far from the only XAI approach that relies on an appropriate reference against which to understand what is activated within a DNN (Samek et al., 2021). The need for a reference appears in local surrogate explanations as well (and possibly for some layer-based, neuron-based, and adversarial explanations). References could be datasets, feature patches, or individual points. Most XAI algorithms require reference points to function as a baseline of model explanation (Xu et al., 2019). Serving as a kind of null hypothesis, reference data points or datasets are the elements selected to measure a neutral contribution of neurons or layers to the output (Samek et al., 2021). A good reference should shift the explanation towards neither correct classification nor misclassification.
In the land use classification case study, we used the unconvoluted image dataset as the reference for explanations. Figure 2 shows examples of popular methods to generate neutral references. The original image (Figure 2a) and the black uniform image (Figure 2b) are considered de facto baselines (Sundararajan et al., 2017). A blurred baseline image (Figure 2c) is often used to enhance the interpretation of higher-level feature attributions but generates distortions at the edges of objects (Fong & Vedaldi, 2017). A baseline image generated via Gaussian noise (Figure 2d) is often used in Generative Adversarial Networks and can provide better explanation for low-frequency features (Sturmfels et al., 2020). It should be noted that the addition of Gaussian noise can alter the geometric structures of existing objects needed in the process of explanation (e.g., the distortion of the broken center line in Figure 2d).
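The four baselines in Figure 2 are straightforward to generate. Below is a minimal sketch, assuming the scene is held in a NumPy array with values in [0, 1]; the blur and noise parameters are illustrative.

```python
# A minimal sketch of the four reference baselines in Figure 2.
import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(256, 256, 3)          # stand-in for the runway scene

ref_original = image.copy()                  # (a) the image itself
ref_black = np.zeros_like(image)             # (b) uniform black
ref_blur = gaussian_filter(image, sigma=(5, 5, 0))  # (c) blurred; distorts object edges
ref_noise = np.clip(np.random.normal(0.5, 0.25, image.shape), 0, 1)
# (d) Gaussian noise; note it can break geometric structures such as the
#     broken center line discussed in the text.
```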
Neutral references in XAI are not designed to account for geographic structures like coordinates, distances, and projections and we have yet to see them employed for GeoAI. Geographic attributes can be neglected if the sole goal is higher accuracy in DNN based on knowledge gained from XAI (Goodfellow et al., 2015). If we use a blurred reference (Figure 2c), we likely will lose the bright center line as a positive contribution towards classifying the runway (seen in Figure 2a). In this way, GeoAI can insufficiently distinguish itself from computer vision without acknowledging "spatial is special." A potential solution would be to synthetically create a neutral reference that preserved the original coordinates and geometry, such as another linear feature. Considering that many linear features might partially activate as a runway (e.g., overpass, canal), the reference would need to be assessed for its lack of activation.

4.2 | Gradient-based XAI challenges
We consider gradients since they are widely used XAI approaches for GeoAI. More broadly, gradients are a prevalent optimization approach in training DNNs (i.e., parameter optimization) and are also employed for feature-to-feature attribution comparisons (Bengio, 2012). Additionally, XAI approaches utilize gradients to assess the impact of the number of layers on accuracy, as increasing the number of layers in a DNN can improve model performance. Increasing the number of layers, however, can create shattered gradients, differentials among gradients that decay as the number of layers increases.
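To ground the discussion, the following minimal sketch computes the raw gradient attribution on which gradient-based XAI methods such as saliency maps, Grad-CAM, and integrated gradients build: the gradient of a class score with respect to the input pixels. It assumes TensorFlow 2.x, and the random image is a stand-in for a land use scene.

```python
# A minimal sketch of raw gradient attribution for the top-ranked class.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")
image = tf.convert_to_tensor(np.random.rand(1, 224, 224, 3).astype(np.float32) * 255.0)

with tf.GradientTape() as tape:
    tape.watch(image)                      # treat pixels as differentiable inputs
    preds = model(preprocess_input(image))
    top = int(tf.argmax(preds[0]))         # index of the top-ranked class
    score = preds[:, top]

grads = tape.gradient(score, image)        # d(score)/d(pixel)
salience = tf.reduce_max(tf.abs(grads), axis=-1)[0]  # collapse RGB for mapping
```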

5 | CHALLENGES INTRODUCED BY GeoAI
Here we consider the difficulties that arise when AI cannot accommodate geographic data structures, difficulties that can be compounded by XAI. As a result, the explanation may be insufficient for GeoAI. This section includes accommodating scale, handling topology and geometry, and reconciling different conceptions of mapping. These could be a mixture of model-agnostic and model-specific XAIs. As GeoXAI relies on geographic structure and knowledge in the explanation, we argue that these additions will inevitably shift usage from model-agnostic XAI approaches towards model-specific, or even location-specific, model explanations.

5.1 | Challenges in accommodating spatial scale
Scale is an innate concept in GIScience and plays a pivotal role in GeoAI (Kang et al., 2019; Liu & Biljecki, 2022). The challenge is whether we can introduce multiscale analysis into XAI. Two aspects of scale can be considered: granularity and scope (Goodchild, 2011). Granularity refers to the level of detail about geographic information from the input and, through the layers, to the output. Scope represents the spatial extent of study areas or data analytic windows that, in DNN layers, could be considered the size of receptive fields (Jacobsen et al., 2016). Liu and Biljecki (2022) pointed out how variations in scale could heavily impact model performance in GeoAI. They further explored the Modifiable Areal Unit Problem (MAUP), which is considered a geocoding issue in GeoAI but could influence the effectiveness of GeoXAI as well.
In our case study, we coded the convolutions to reduce granularity and increase the spatial extent for feature extraction at each layer, and we visualized two layers to illustrate the granularity challenge in XAI (see Figure 1). Convolutional layers as granularity reflect a typical means of handling spatial scale in GeoAI applications using CNNs (e.g., Zhao et al., 2017). In Figure 1b, the four large red patches centered in the SHAP result covered runway areas but shifted the classification results toward "mobilehomepark." With Figure 1b, we also observed the potential impact of the MAUP in terms of zonal aggregation. Slight changes in zonal aggregations (areas adjacent to the object) can affect not only classification but also explainability (e.g., Ryo et al., 2021). Others have reported the effect of scope in gradient variants, where the greater spatial extent utilized in Grad-CAM++ improves explainability (Chattopadhay et al., 2018).
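A toy sketch, unrelated to the case study data, illustrates how zonal aggregation alone can change which zone appears most important:

```python
# A toy illustration of MAUP for explanation: the same surface aggregated
# under two zoning schemes can flag different zones as most important.
# The random raster stands in for a SHAP/attribution surface.
import numpy as np

rng = np.random.default_rng(1)
raster = rng.random((8, 8))

def zonal_means(r, size):
    h, w = r.shape
    return r.reshape(h // size, size, w // size, size).mean(axis=(1, 3))

fine = zonal_means(raster, 2)     # 4 x 4 zones
coarse = zonal_means(raster, 4)   # 2 x 2 zones
print(np.unravel_index(fine.argmax(), fine.shape),
      np.unravel_index(coarse.argmax(), coarse.shape))
```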
Accommodating scale reflects the confusion in the terminology of words like granularity, scope, features, local, and global in geography as compared to computer science. For example, XAI has been employed for spatial clustering studies (Cilli et al., 2022), and spatial clustering researchers recently have begun merging local XAI outputs into global explanations (Gobbo et al., 2022; Peng, Li, et al., 2022). However, their use of global and local in XAI is a-spatial. A search of the literature on XAI finds scale described as a quality of explanation, that is, the level of explainability for a given audience (e.g., Gerlings et al., 2022). Scale can also be described as an explainability scope, represented in terms of global explanations (XAI for the whole model) and local explanations (explanations for a given portion or an instance of the input data) (Das & Rad, 2020). Care must be taken as one integrates these computational, cognitive science, or GIScience concepts of scale in XAI.
To integrate scale into XAI, we can utilize model-agnostic or model-specific approaches. We can, for example, add input data at different resolutions to the GeoAI or, for GeoXAI, vary the granularity of the reference data. Model-agnostic approaches will likely challenge interpretation because some features might vanish at coarse resolutions in terms of explaining a specific activation. Given scaling operations in DNNs, a differently scaled reference (see Section 4.1) may be preferred, but we will lose some SHAP patches with upscaling. He et al. (2015) approached granularity heterogeneity with a method called spatial pyramid pooling. They added a pyramid pooling layer atop the last convolution layer, which essentially shifted the DNN towards the model-specific. Zheng and Ding (2020) argued for combining spatial pyramid pooling and an XAI at each layer; only then would the XAI offer a fulsome explanation. In some sense, XAI amplifies all the scale-based challenges inherent in any spatial analysis. We argue that a GeoXAI needs to modify the internal model structure, not just the input data.
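As a minimal sketch of He et al.'s (2015) idea, the following NumPy function max-pools a feature map over several grid sizes (the levels chosen here are illustrative) so that inputs of any spatial extent yield a fixed-length descriptor:

```python
# A minimal NumPy sketch of the pooling step in spatial pyramid pooling.
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    h, w = feature_map.shape
    pooled = []
    for n in levels:
        ys = np.linspace(0, h, n + 1).astype(int)   # n x n pooling grid
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                pooled.append(feature_map[ys[i]:ys[i+1], xs[j]:xs[j+1]].max())
    return np.array(pooled)

print(spp(np.random.rand(13, 17)).shape)   # (21,) regardless of input extent
```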

5.2 | Handling topology and geometry in GeoAI computation
We use topology as a shorthand for the positions of spatial features in relation to each other. By geometry, we refer to size, shape, position, and dimensions. We acknowledge that there are problems in AI handling of spatial data. But, we argue, little attention has been paid to XAI methods that consider topology and geometry in a way that satisfies GeoAI (cf., Dombrowski et al., 2019).

Explanation may require substantial neural network restructuring and retraining of the underlying DNN. A gradient-based XAI method like CAM can force layer modification (i.e., the output of the final convolutional layer is fed into an additional global average pooling layer followed by a linear dense layer for class activation calculation). The retraining of SHAP is based on sampling/resampling of training data to approximate conditional expectations with certain features held out. Consequently, neighboring pixels/features may be missing, and thus a "contained within" relation would become "adjacent" (Samek et al., 2021). For XAI, shapes and distances can be distorted and the explanation will usually be deemed sufficient. Shape and distance could be preserved as input variables to a GeoAI, but that limits their utility as explainers, especially if a more robust GeoXAI is model-specific.
By perturbing values of the input (e.g., pixels), LIME seeks to explain the behavior of a classifier "around" the object being predicted. Guidotti et al. (2018) stretched the concept of around. They proposed modifications to LIME that imposed regular spatial extents for explanation output generation using a grid-based tessellation technique. Their LIME regularized the perturbations (or absences) as a form of explanation, paving a path for a geometry-aware XAI.
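A minimal sketch of that grid-tessellation idea follows; the cell size is an assumption, and the resulting label array could be supplied to a LIME-style image explainer in place of irregular superpixels (e.g., via lime_image's segmentation_fn hook).

```python
# A sketch of a regular grid tessellation for LIME-style image perturbation.
import numpy as np

def grid_segments(image, cell=32):
    # Assign every pixel the id of its grid cell; cells, not superpixels,
    # then become the units that are perturbed or masked out.
    h, w = image.shape[:2]
    n_cols = int(np.ceil(w / cell))
    rows = np.arange(h) // cell
    cols = np.arange(w) // cell
    return rows[:, None] * n_cols + cols[None, :]   # (h, w) array of cell ids

segments = grid_segments(np.zeros((224, 224, 3)))
print(np.unique(segments).size)   # 49 cells for a 224x224 image with 32px cells
```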

5.3 | Challenges of using geovisualization as an explanation
As suggested above, a fundamental challenge of geovisualizing XAI results lies in the fact that XAI methods are often feature-based (where features are activated elements) and not location-based as in GIScience. Well-cited XAI visualization tools (e.g., https://github.com/yosinski/deep-visualization-toolbox) display images as ideal types but lose geographic context like georeferences. In Section 2, we briefly described Matin and Pradhan's (2021) research on XAI to explain earthquake damage. XAI and DNN both extract common features (here, image patches) of all earthquake-damaged buildings at any given layer. XAI techniques lose the georeferences that could allow for simultaneous interpretation of location and patches.
Sometimes mapping is considered an XAI, a mode of explanation, and sometimes the mapping is done to visualize another XAI output. Our case study used salience maps as an example of the latter. In Figure 1, red and blue colors depicted regions of convolution operations. We can only present positive and negative contributions at given regions while having no idea of how they interact with their neighborhoods hierarchically across DNN layers. It is important to note that most people will work with pre-existing geovisualization libraries. Our salience maps were part of the same library from which we obtained the SHAP algorithm. The lack of customization as well as the lack of holistic visualizations are general problems in XAI visualizations like salience maps.
To combine cartographic elements in the delivery of XAI results, one could, for example, incorporate text, cartographic symbology, and contour lines to further enhance the geographic explanation within salience maps. Similar to spatial pyramid pooling, for CNNs we could add a label layer atop the last convolution. GeoAI has been applied to cartographic style transformation and optimization (Ganguli et al., 2019). Knowledge from spatial cognition research (Freundschuh & Egenhofer, 1997) could become key to capturing the unique geovisualizations provided as explanations in XAI.
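As a sketch of this direction, contour lines and coordinate axes can be layered onto a salience map with standard plotting tools; the salience raster and the lon/lat extent below are hypothetical stand-ins for a georeferenced scene and its SHAP output.

```python
# A sketch of layering cartographic elements onto a salience map.
import numpy as np
import matplotlib.pyplot as plt

salience = np.random.rand(224, 224)        # stand-in for a SHAP/salience raster
extent = (-88.00, -87.90, 41.95, 42.05)    # (west, east, south, north), hypothetical

fig, ax = plt.subplots()
ax.imshow(salience, cmap="RdBu_r", extent=extent)
cs = ax.contour(salience, levels=5, colors="black", linewidths=0.5,
                extent=extent, origin="upper")   # contour lines as symbology
ax.clabel(cs, inline=True, fontsize=6)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
plt.savefig("salience_with_contours.png", dpi=300)
```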

6 | CHALLENGES INTRODUCED BY GEOSOCIAL KNOWLEDGE
GeoXAI is social as well as technical. In addition to semantics, GeoXAI also has ethical connotations, varying by different types of audience, who may have every reason to question the trustworthiness of an explanation.

6.1 | Geographic semantics and ontology challenges
How those interested in geography understand a road, either partonomically or semantically, tends not to be picked up in a DNN nor expressed in an XAI. The knowledge required for GeoAI can be broader and more complex than prosaic AI tasks. DNNs have largely been applied to specific tasks, such as cat/dog classification from images or videos in computer vision, or natural language processing over large corpora. Many tasks are location independent, and the larger corresponding geographic context may be unnecessary. GeoAI tasks usually require explicit consideration of locations, for example, relationships among neighborhoods and neighborhood change over time (Hu et al., 2019). In remote sensing-based land use change detection, decisions are not only associated with pixel value differences among images acquired at different times but also with topological and semantic definitions regarding land use changes (Xing & Sieber, 2016). Autonomous driving systems not only depend on current traffic conditions but also are subject to local transportation regulations (Atakishiyev et al., 2021; Garcia Cuenca et al., 2019). Before we even consider specific challenges in GeoXAI, we recognize the challenges of integrating geographic concepts into DNNs, including ontologies around physical geographic features as well as semantics in time and mobility, topology, geometry, and cultural context.
To highlight the semantics challenges within GeoXAI, we revisit our case study of land use classification. The UCMLU dataset contains only 21 class labels. Harkening back to Arrieta et al. (2020), the interpretation of concepts such as grassland and cement pavement will differ by audience. A group of trees can be classified under green space, park, or even forest, whereas a small water area can be labeled as pond, lake, or even meander. Li et al. (2021) argued that appropriate labels should originate from geography and not from feature similarity to existing labeled data in computer vision tasks and training datasets. Current XAI techniques can provide a fitness score for each individual classification label, but XAI cannot easily suggest whether labels are appropriate for a given explanation or whether additional labels are required in the training and application of given GeoAI methods.
In the case study, SHAP values show the attribution of features towards the classification results and not the likelihood of spatiotemporal-specific variations (e.g., different viewing angles, seasonal conditions, and culturally contingent building practices). It could be argued that these variations represent a problem of insufficient training data or categories. But why are we looking for more data to fix this problem as opposed to enacting rules that could reduce the training dataset and the constant need to retrain? We could include a partonomy ontology in which an overpass would be composed of pavement, edges, and centerlines, along with similar perpendicular or parallel composites (e.g., ramps). That way we do not delegate to the algorithm the task of inductively inferring an ontology and interpreting the object.
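A toy sketch of such a partonomy rule, with illustrative part lists rather than a published ontology, might look like the following:

```python
# A toy sketch of partonomy-based label checking: a label is only plausible
# when enough of its required parts co-occur in the detected features.
PARTONOMY = {
    "overpass": {"pavement", "edges", "centerline", "ramp"},
    "runway":   {"pavement", "centerline", "threshold_markings"},
}

def plausible_labels(detected_parts, min_overlap=0.75):
    found = set(detected_parts)
    return [label for label, parts in PARTONOMY.items()
            if len(parts & found) / len(parts) >= min_overlap]

print(plausible_labels({"pavement", "edges", "centerline", "ramp"}))  # ['overpass']
```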
The misclassification in Figure 1b may be acceptable, as SHAP values are based on corresponding local features. However, we could not attach labels, such as overpass or runway, to the extracted patches because intermediate image features in the VGG16 layers were computationally but not geographically meaningful. A possible improvement on SHAP is causal SHAP (e.g., Banerjee et al., 2021), which resembles a hierarchical graph-based approach: it captures correlations among features in attribution analytics and then orders them in a hierarchy of importance, which could better explain right or wrong decision making.
A KG can function as an explanation in GeoAI for results like misclassification or false positives (Li, 2020). In certain instances, the KG functions as the GeoAI, so instead of using XAI forensically to interpret the outcomes of algorithms like DNNs, the algorithm and explanation could be fused. Ma (2022) suggested that KGs in AI could help explainability by generating a quick overview of the major entities, relationships, and structures of the geospatial topics. The author recommended adding an explicit semantic layer, essentially a KG, to traditional DNNs. Rožanec et al. (2022) proposed a graph-based XAI technique to explain the demand for autonomous vehicles in Europe. Their ontology-based KG abstracted market features as linked events to which, they argued, features could easily be added. Futia and Vetrò (2020) proposed a neural-symbolic integration method to improve KGs for knowledge matching (e.g., mapping from neuron behavior in a DNN to entities in the corresponding KG). Although Futia and Vetrò (2020) argued that injecting a KG into a DNN and mapping DNN functionalities to the KG (or hierarchical KGs) could achieve a KG-based XAI, constructing a complex semantic mapping or an ontology is time-consuming.
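A minimal sketch of this kind of knowledge matching, with illustrative entities and relations rather than a constructed geospatial ontology, might link activated DNN features to candidate KG entities:

```python
# A minimal sketch of matching DNN-derived features against a small KG.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Overpass", "Pavement", relation="hasPart")
kg.add_edge("Overpass", "Ramp", relation="hasPart")
kg.add_edge("Overpass", "Highway", relation="crosses")
kg.add_edge("Runway", "Pavement", relation="hasPart")
kg.add_edge("Runway", "ThresholdMarkings", relation="hasPart")

# Hypothetical entities matched to activated patches in an image like Figure 1b.
activated = {"Pavement", "Ramp"}

# Entities all of whose parts are activated become candidate explanations.
candidates = [
    n for n in kg.nodes
    if (parts := {v for _, v, d in kg.out_edges(n, data=True)
                  if d["relation"] == "hasPart"})
    and parts <= activated
]
print(candidates)   # ['Overpass']
```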
Investing the time to integrate geographic semantics into XAI could shift model-agnostic approaches towards model-specific explanations. For instance, we could develop additional rules in which XAI constrains layers (e.g., the semantic input sampling offered by Sattarzadeh et al., 2021). In their survey paper on KGs, Majic et al. (2017) illustrated the complexity of determining semantic similarities of OpenStreetMap tags. They pointed out the potential non-transferability of semantics across geographic regions and the difficulty in inferring intent if the documentation is not in computer-analyzable form. This implies a level of non-automated subjectivity intrinsic to AI, GeoAI, and GeoXAI that researchers may be reluctant to acknowledge. Even computational approaches like that of Dodge et al. (2018) relied on interviews as a base for the XAI, suggesting a stronger involvement of qualitative content and social scientists in XAI research.

6.2 | Social challenges
DNNs are increasingly embedded in almost all aspects of our lives, whether global climate change, national security and policing, or personal details like our employment, credit card access or medical diagnoses. The social implications of DNNs are dramatic. Many of the examples of egregious effects of AI are geographic (e.g., mapping of hotspots from predictive policing). Location (e.g., areas, communities) is strongly collinear with race and ethnicity. This means that GeoAI can affect racial and income-based discrimination, violate privacy, and enable geosurveillance (Sieber, 2022). Consequently, the burden placed on GeoXAI to interpret, explain, and ensure trust and transparency is high.
The first phrase in Arrieta et al.'s (2020, p. 85) definition of XAI is "given an audience," emphasizing the target audience-dependent nature of XAI. Whether plain text or computational, explanation rests on assumptions regarding literacy, cognitive load, and access to information; explainability overall reveals an ontological disconnect between the subjects of an algorithm and the developers of an algorithm (Edwards & Veale, 2017). GeoXAI presumes a uniformity in understanding (e.g., an undifferentiated public) instead of variances by geographic and cultural communities. The problem is that XAI methods are only increasing in complexity, many times to optimize the DNN. Speith (2022, p. 2240) points out that "since it is commonly assumed that ante-hoc explainable models do not achieve satisfying performance, opaque models are frequently used. These models are so complex that they are black boxes for humans, even eluding the understanding of experts." Characterizations of explainability may conflict with competing demands, for instance of accuracy and privacy (Kazim & Koshiyama, 2021). To be explainable, we may need to reveal information about marginalized groups that could then be used to identify individuals and violate privacy (Royal Society, 2019, p. 21). XAI is promoted in GIScience as a way to improve trust, sometimes called trustworthy AI (Kaur et al., 2022; Solís et al., 2020). For instance, Selvaraju et al. (2017) hoped that the visualization component of Grad-CAM could increase trust in the model. In turn, trustworthy AI becomes the vehicle to ensure public and end-user acceptance of AI. We know from critiques of GeoAI (Sieber, 2022) that many legitimate reasons exist for marginalized communities to distrust AI.
Marginalized communities have been disproportionately subjected to surveillance by the public and private sector, for example, geosurveillance afforded by facial recognition technology, doorbell and other camera systems, and targeted mapping produced by predictive policing. AI Ethics scholars are increasingly skeptical of computational approaches, for example computational debiasing of input and training data (Balayn & Gürses, 2021). These concerns cannot necessarily be ameliorated by computational XAI.
GeoXAI covers not only comprehension in an explainability sense but also attribution in an interpretation sense. XAI can be used to explore the role of variables in the training and testing of data. It can reveal the image segments, patches, and features that are used to arrive at a given outcome. XAI has become a prime method to reveal discrimination, for example, racial discrimination that (mis)identifies location as important for predicting crime. Not only can DNNs discriminate, but GeoXAI can also fail to capture that discrimination because it does not explain whether the predictions were followed or not (police are notorious for ignoring police technologies) (Lally, 2021). As systems become more blackboxed, a specific XAI method could be manipulated to make a DNN appear unbiased when it is not (Hutson, 2021).
Recent research on XAI is transitioning from a purely quantitative and statistical approach towards a human-centric explanation (Ehsan et al., 2021). Even human-centric approaches can be highly computational, for example, focusing on improving the UI/UX (user interface/user experience) (Lepri et al., 2021). The discipline of geography has a long tradition of human involvement in geospatial technologies via approaches like Public Participation/Participatory Geographic Information Systems (P/PGIS). P/PGIS lays out various approaches to engage marginalized peoples and can be viewed as a step towards co-producing explainability as well as trust (Sieber, 2006). Integrating XAI into GeoAI presumably would incorporate the lessons learned from subfields like P/PGIS and also borrow from emergent participatory methods in AI Ethics (Balaram et al., 2018).

7 | CONCLUSION
GeoXAI provides a potential set of approaches to glassbox ML and improve both the performance and explanatory power of GeoAI. We argue, however, that GeoXAI cannot be achieved by simply submitting GeoAI results to an XAI method. At minimum, the XAI will likely only address performance and not offer geographically informed explainability. We presented three categories of issues that should be addressed to achieve explainability from a geographic perspective: challenges introduced by the XAI, by the GeoAI, and by semantics and society. Given the sheer breadth of explainability, some of these challenges are more abstract than others.
We already see progress in the more technical part of GeoXAI; accommodating the social aspects of GeoXAI is less certain. Cartographic elements, such as symbology, color ramps, and contour lines, could improve the interpretability and explainability of XAI output. Geographic structures and knowledge are beginning to be integrated into the design and development of XAI techniques. We can, for example, use the Wikipedia KG, which contains ontologies (e.g., road lanes and sidewalks), to georegister SHAP results (Sarker et al., 2020). This may improve the explanation of semantics in the classification of overpasses in our case study, which could be visualized via a graph visualization platform like Gephi for a better understanding of the topology among various image patches (Bănică & Croitoru, 2020).
There also are opportunities in GeoXAI to adopt ensemble approaches, which might enhance the interpretability and understandability of XAI (Calegari et al., 2020). Innovations in GeoXAI methods need not be mutually exclusive, for example, focusing on visualization to the exclusion of gradients or KGs. Some XAI methods could be merged into existing geospatial structures, or several GeoXAI KGs could be merged into a hierarchical geographical KG for enhanced explanation.
Beyond these challenges, we must ask what constitutes explainability as it spans transparency demands of plain text as a form of explanation to computational remedies for training data, hyperparameter tuning and layer restructuring. We considered how XAI could bypass an explanation of DNN outcomes to directly provide insights about the data (Belaid et al., 2022). We discussed the need for semantic understanding, not only to introduce greater geographic semantics and ontologies into the underlying DNN but also to account for a lack of definitional overlap between the fields of geography and AI. Whether the users are GIScientists or impacted communities, if they are not actively participating in the generation of explanations, in "explaining the explanations," then it becomes difficult to guarantee the effectiveness of XAI even if it is technically correct. We hope that insights and knowledge from GeoXAI can feed back into the design and development of GeoAI.

ACKNOWLEDGMENTS
Part of the funding for this research came from the Social Sciences and Humanities Research Council of Canada grant, SSHRC 430-2020-00564, AI for the rest of us.

CONFLICT OF INTEREST STATEMENT
There is no conflict of interest.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.

ENDNOTE
1 The literature uses salience, saliency, and salient interchangeably. We use salience throughout.