A convolutional neural network approach to predict non‐permissive environments from moderate‐resolution imagery

Convolutional neural networks (CNNs) trained with satellite imagery have been successfully used to generate measures of development indicators, such as poverty, in developing nations. This article explores a CNN‐based approach leveraging Landsat 8 imagery to predict locations of conflict‐related deaths. Using Nigeria as a case study, we use the Armed Conflict Location & Event Data (ACLED) dataset to identify locations of conflict events that did or did not result in a death. Imagery for each location is used as an input to train a CNN to distinguish fatal from non‐fatal events. Using 2014 imagery, we are able to predict the result of conflict events in the following year (2015) with 80% accuracy. While our approach does not replace the need for causal studies into the drivers of conflict death, it provides a low‐cost solution to prediction that requires only publicly available imagery to implement. Findings suggest that the information contained in moderate‐resolution imagery can be used to predict the likelihood of a death due to conflict at a given location in Nigeria the following year, and that CNN‐based methods of estimating development‐related indicators may be effective in applications beyond those explored in the literature.


| INTRODUCTION
Measuring factors surrounding human development around the world is a cornerstone of research and policy within the international development community, as highlighted by the targets and indicators of the United Nations' Sustainable Development Goals (Colglazier, 2015). Household surveys and other local data collection methods have historically served as the primary means of gathering data on key indicators associated with human development. Surveys collect data on indicators such as household consumption, assets, health, conflict events, and other factors, which can be used to target resources, inform policies, and evaluate progress (Dolan et al., 2019; Karlsson, Kim, Joe, & Subramanian, 2020; Sato, 2019).
A major shortcoming of such surveys and measures is the inherent difficulty, and cost, associated with collecting them (Demombynes & Sandefur, 2014; Jerven, 2014). In some cases, survey-based data collection can be impractical due to constraints such as cost and conflict (e.g., Axinn, Ghimire, & Williams, 2012; Coghlan et al., 2006), and it is often not possible to collect data with sufficient coverage at the intervals needed. Multiple groups have noted that low-income countries suffer from a substantial data gap because surveys are conducted far less frequently than needed, even for information on basic needs (Chandy & Zhang, 2015).
To overcome data gaps, researchers and policy-makers are increasingly utilizing remotely sensed geospatial data to generate estimates of traditional socioeconomic outcomes and indicators (see Chen & Nordhaus, 2011;Henderson, Storeygard, & Weil, 2012;Noor, Alegana, Gething, Tatem, & Snow, 2008). One of the most prominent examples of this is the use of nighttime lights (NTL) as an indicator of economic activity (Chen & Nordhaus, 2011;Henderson et al., 2012;Noor et al., 2008). While the use of NTL has been shown to correlate with various measures of economic output or wealth in a range of contexts, it is fundamentally limited by the ability to detect low levels of light in poor rural areas (Jean et al., 2016), the relatively coarse resolution of the data,1 and other issues such as light saturation and bloom (Elvidge, Baugh, Zhizhin, & Hsu, 2013).
Recent literature has identified machine learning methods to further improve our ability to detect the geospatial location of impoverished communities. Jean et al. (2016) used a combination of satellite imagery and machine learning to estimate consumption and assets in multiple African countries. This method was based around convolutional neural networks (CNNs), which have been shown to be highly effective with image-based applications (He, Zhang, Ren, & Sun, 2016;Razavian, Azizpour, Sullivan, & Carlsson, 2014). CNN-based methods of estimating poverty using satellite data have now been shown to outperform methods which used NTL alone (Jean et al., 2016). Authors (see, for example, Babenko, Hersh, Newhouse, Ramakrishnan, & Swartz, 2017;Head, Manguin, Tran, & Blumenstock, 2017;Perez et al., 2017) working with this class of methods have begun exploring applications in additional countries, using varying sources of satellite imagery, and addressing potential limitations.
The range of data sources, CNN architectures, and sampling schemes possible for CNN-based approaches to estimation presents a number of novel challenges. Additionally, little work has been done to explore the use of these methods to estimate or predict other outcome measures, such as conflict, or to assess their effectiveness when using temporally restricted data. This article engages with a subset of these challenges, focusing on the research question of whether CNNs can be used to predict likely locations of conflict-related deaths in Nigeria based on satellite imagery. We engage with this as follows. In Section 2 we present background information on conflict in Nigeria, CNNs, and a brief review of the satellite imagery used in this article. In Section 3 we discuss the methods used to prepare the data and implement the CNNs.
Section 4 presents the results. Finally, in Section 5, we discuss limitations of the methods and avenues for future work.

| Conflict and instability in Nigeria
Levels of conflict present in Nigeria are the result of a range of complicated political, economic, humanitarian, and ethnic factors, some recent, others rooted in Nigeria's past. Among the most substantial sources of conflict and instability today are the Boko Haram insurgency; violence between farmers and herdsmen, often driven into each other's paths by a landscape shifting due to climate change, drought (Akinwotu, 2017), and environmental degradation, along with changing laws; ethnic and religious violence; government and police forces killing civilians; and general corruption (Herbert & Husaini, 2018).
Conflict events in Nigeria, particularly fatal events, are largely tied to the Northern regions where Boko Haram is most active and, according to the Armed Conflict Location & Event Data (ACLED) data set, has been responsible for nearly 11,000 fatalities since 2010 (ACLED, 2019). Another major source of conflict is between herdsmen and farmers. As herdsmen and farmers are forced to move across the land due to environmental factors or grazing laws (International Crisis Group, 2018), conflict in defense of their livelihoods has arisen, with ethnic and religious differences contributing to an already tenuous situation (CFR, 2019; Peace-Insight, 2017). Among the many results of these and other sources of conflict and instability in Nigeria, and further factors in future conflict, are internally displaced persons (IDPs) and food insecurity. According to the Famine Early Warning System Network (FEWS Net), there are now over 2.0 million IDPs in the Northeast alone, leading to increased food assistance needs (FEWS, 2019).
Several programs exist to track conflict events in Nigeria with varying mechanisms and scopes. Two of the most commonly used data sets are the ACLED project (Raleigh, Linke, Hegre, & Karlsen, 2010) and the Uppsala Conflict Data Program Georeferenced Event Database (UCDP GED; Sundberg & Melander, 2013). The UCDP data set provides data on organized events of lethal violence, while the ACLED data set provides data on political violence and protest around the world. A key distinguishing factor is that while ACLED can suffer from data quality issues (Eck, 2012; Raleigh et al., 2010), it provides non-lethal conflict event data not available from UCDP (Figure 1). Events included in the UCDP GED are also restricted to those in which at least 25 deaths occurred (Sundberg & Melander, 2013).
A range of organizations utilize these and other data sources to track conflict and instability in Nigeria. Despite the increasing availability and use of georeferenced subnational conflict data, however, the quality, coverage, and detail of the data collected are still fundamentally limited by available reporting and data collection mechanisms (Eck, 2012). In addition to data sets and trackers of historic events, a range of methods have been explored to predict future conflict events, including some utilizing machine learning (Cederman & Weidmann, 2017). However, these methods have been limited in their temporal and spatial precision, as well as their reliability (Bazzi et al., 2019).
Recent works which have shown the viability of using machine learning methods based on convolutional neural networks to estimate poverty (Head et al., 2017; Jean et al., 2016) have been expanded by researchers to additional applications, such as road network quality assessments (Oshri et al., 2018) and crop yield estimates (You, Li, Low, Lobell, & Ermon, 2017), with some success. Adapting these methods to fill in spatial and temporal gaps in conflict data, as well as for generating future predictions, could aid the efforts of researchers and actors in the international development community who are currently limited by what data are available.

| Convolutional neural networks
Convolutional neural networks have proven highly effective in image-based tasks (Krizhevsky, Sutskever, & Hinton, 2012; Simonyan & Zisserman, 2014). Inspired by biological structures associated with visual processing, CNNs utilize a unique combination of alternating convolutional and pooling layers. This structure takes advantage of the spatial structure of images (i.e., dimensions) to detect features more efficiently than traditional neural networks (Albelwi & Mahmood, 2017). An example of the structure of a CNN can be seen in Figure 3.
CNNs can be implemented with a variety of architectures which can significantly impact the effectiveness of the CNN, depending on the class of problem being solved (Albelwi & Mahmood, 2017). Classes of CNN architecture that have been developed include LeNet (LeCun, Bottou, Bengio, & Haffner, 1998), AlexNet (Krizhevsky et al., 2012), GoogLeNet (Szegedy et al., 2015), and VGG-Net (Simonyan & Zisserman, 2014). Residual networks, or ResNets, are one of the more recent types of CNNs developed and have been shown to be highly effective in image tasks such as image classification, detection, localization, and segmentation (He et al., 2016). The ability to perform well on traditional image tasks makes ResNets, and CNNs in general, an excellent candidate for utilization in applications based on satellite imagery. To date, CNN-based methods have been used in combination with various sources of satellite imagery to produce estimates of poverty (Perez et al., 2017), crop yields (You et al., 2017), and infrastructure quality (Oshri et al., 2018). While use of such methods is not a guarantee of success (Head et al., 2017), the diversity of promising results from early literature using these methods suggests that other areas and applications may benefit from these methods.

| Satellite imagery
A key component of the CNN-based approach introduced by Jean et al. (2016) is the source of satellite data used to train the CNNs. Over the past decade a wide range of satellite imagery has become available from private companies such as DigitalGlobe (DigitalGlobe, 2018) and Planet (Boshuizen, Mason, Klupar, & Spanhake, 2014), as well as from free and publicly available sources such as the Landsat (Irons, Dwyer, & Barsi, 2012; USGS, 2018a) and Sentinel (ESA, 2018) programs. Existing work in the development community using CNNs has incorporated data from many satellite imagery sources, including the Google Static Maps API (Head et al., 2017; Jean et al., 2016; Xie, Jean, Burke, Lobell, & Ermon, 2015), Planet (Babenko et al., 2017), DigitalGlobe (GeoEye-1 and QuickBird-2) (Babenko et al., 2017; Engstrom, Hersh, & Newhouse, 2017), Landsat 7 (Perez et al., 2017), Landsat 8 (Oshri et al., 2018), and Sentinel-1 (Oshri et al., 2018). Each satellite has specific sensor capabilities and orbital characteristics associated with its intended purpose. Factors such as return time and spectral resolution, which impact the data a satellite collects, are essential considerations when determining an appropriate source of imagery for geospatial applications. Some critical characteristics to consider include:
• spatial coverage, the regions of the world which the satellite covers;
• spatial resolution, the pixel size, or how fine a feature the satellite can detect;
• temporal coverage, the time range over which the satellite has been operating;
• temporal resolution, the frequency at which the satellite revisits a location; and
• spectral resolution, the number and range of bands (ranges of light) that the sensor can pick up.
When using satellite imagery to estimate metrics of human development, the above factors will influence the accuracy and suitability of different CNN architectures. Spatial resolution impacts the ability of the CNN to detect appropriate features, and ultimately determines the resolution or precision at which the final estimates can be generated (Blaschke, 2010). For example, imagery with a resolution of 1 km would have limited use if the goal was to detect the roof material of individual houses in a rural area. Finally, satellite platforms can contain a range of sensors with the ability to detect varying spectral characteristics. For example, if an application depends on the detection of features only visible using infrared imagery, then a satellite which can only capture visible wavelengths (e.g., red, green, and blue bands) would not be suitable.

| Landsat imagery
The Landsat program includes a temporally continuous series of satellites offering a combination of spatial, temporal, and spectral resolution and coverage that makes it a powerful tool for regularly monitoring changes of relevance to policy-makers. Landsat 8, launched in 2013, is the most recent active satellite in the Landsat program, and its data are publicly available. Landsat 8 captures data for the entire planet, revisiting each location at 16-day intervals. The sensors on board Landsat 8 acquire imagery at a 30 m resolution in the visible, near infrared, and shortwave infrared range using the Operational Land Imager, as well as thermal bands from the Thermal Infrared Sensor. These bands are detailed in Table 1.

TABLE 1 Landsat 8 Operational Land Imager and Thermal Infrared Sensor bands

Landsat 7, the predecessor to Landsat 8, remains active until 2021 (USGS, 2017a) and offers significant temporal coverage from 1999 to the present. Landsat 7 also shares many of the attributes that make Landsat 8 useful (USGS, 2018b). When combined with broad temporal coverage, these attributes have made Landsat 7 useful for a wide range of historic and contemporary applications focused on regions around the world (Demirkesen, Evrendilek, Berberoglu, & Kilic, 2007; Jensen & Cowen, 1999; USGS, 2018a; Yang, Huang, Homer, Wylie, & Coan, 2003). A major drawback to using Landsat 7 is the scan line corrector (SLC) failure which occurred aboard the satellite in 2003 (USGS, 2018d).
The SLC failure impacted the satellite's ability to correct for forward movement, leaving gaps in the resulting images, as seen in Figure 4. A range of methods exist to fill in the gaps created by the SLC failure (Maxwell, 2004;Scaramuzza, Micijevic, & Chander, 2004;Zeng, Shen, & Zhang, 2013;Zhu, Liu, & Chen, 2012) and allow Landsat 7 data to continue to be used in some applications. For applications which do not require a historic record, Landsat 8 offers very similar data without the need for additional gap filling. In addition to the availability of historic data from Landsat 7 and earlier satellites in the program (USGS, 2019), long-term support for the Landsat series is already planned, with the launch of Landsat 9 scheduled for late 2020.
Several alternatives to Landsat exist, and often offer improvements along certain dimensions. For example, significantly finer spatial resolution imagery is available from Digital Globe or Planet (sub-meter resolution).
However, these products are not freely available and do not have the spectral or temporal coverage available from Landsat (Boshuizen et al., 2014; DigitalGlobe, 2018). One of the most competitive alternatives to Landsat is the Sentinel program. The Sentinel-1 and Sentinel-2 satellites have offered publicly available data since 2014 and 2015, respectively. Sentinel-2 has a resolution of 10-20 m and lacks only thermal bands when compared to Landsat 8 (Drusch et al., 2012), while Sentinel-1 utilizes a C-band synthetic-aperture radar instrument for data collection (Geudtner, Torres, Snoeij, Davidson, & Rommen, 2014). Initial work has shown success in utilizing Sentinel-1 data in satellite-imagery-based machine learning applications (Oshri et al., 2018), and the potential for exploring the use of Sentinel-2 will be discussed in Section 5.
Despite the relatively coarse resolution of Landsat imagery, existing work has shown that, when used with CNN-based methods, such imagery can be effective for estimating development-related indicators (Oshri et al., 2018; Perez et al., 2017).

| Data preparation
The primary source of input data for this application is daytime satellite imagery from Landsat 8. Imagery was requested and downloaded using the U.S. Geological Survey's freely available EarthExplorer (USGS, 2017b) and Bulk Download tools (USGS, 2017c). High-quality imagery suitable for time-series analysis2 was downloaded for Nigeria for all of 2014. Sixty scenes3 of imagery were acquired from each full revisit cycle to achieve complete coverage of Nigeria. Due to variability in the quality of the raw data collected by the sensors, not all scenes captured at each revisit are suitable for use in analysis. Each individual scene contains ten bands of data at 30 m resolution, corresponding to the wavelengths detailed in Table 1.
The analysis in this article leverages yearly time-steps, necessitating additional processing. Yearly composites are created by taking, for each pixel, the mean value across all scenes available for the year. Prior to aggregation, each individual scene is masked for pixels associated with cloud cover.
These pixels are defined by pixel quality assurance data which accompany each scene and are provided by USGS (2018c). All pixels with values indicating high or medium confidence of cloud coverage are masked during processing. Once all available scenes captured over the year for individual locations are masked and aggregated, the resulting set of scenes is then combined into a single mosaic of images covering the entire country.
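The masking and aggregation steps above can be sketched with NumPy. This is a simplified illustration, not the authors' code: real scenes overlap and carry bit-packed QA flags, which are reduced here to a pre-computed boolean cloud mask per scene.

```python
import numpy as np

def yearly_composite(scenes, cloud_masks):
    """Per-pixel yearly mean of a stack of scenes, excluding cloudy pixels.

    scenes      : list of (bands, height, width) arrays, one per revisit
    cloud_masks : list of (height, width) boolean arrays, True where the
                  QA band reports medium/high cloud confidence
    Pixels masked in every scene come out as NaN.
    """
    stack = np.stack(scenes)                              # (time, bands, h, w)
    mask = np.broadcast_to(np.stack(cloud_masks)[:, None, :, :], stack.shape)
    masked = np.ma.masked_array(stack, mask=mask.copy())  # copy: writable mask
    return masked.mean(axis=0).filled(np.nan)
```

The masked mean means a pixel's composite value draws only on the revisits where that pixel was cloud-free, rather than discarding whole scenes.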
Conflict data were acquired from the ACLED database (Raleigh et al., 2010). The location of events in ACLED is defined by longitude and latitude, and is used to define the satellite imagery used for training and validation data. ACLED data also provide the number of known fatalities for each conflict event, which is used to create a binary classification of either no fatalities (0) or one or more fatalities (1). In addition to utilizing each conflict location for a single sample, two additional sampling methods are implemented for comparison. The primary comparison method attempts to determine whether nearby spatial features in the imagery improve the model's performance. For each original sample location, this method generates nine additional samples based on points distributed in a regular grid around the original location within approximately 1,000 m. To ensure that the increased sample size, as opposed to the content of the additional samples, is not responsible for improved accuracy, one additional comparison method is tested. This secondary comparison method involves duplicating each original sample nine times and serves as a balanced sample size comparison for the previous method.4
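The two comparison sampling methods can be sketched as follows. This is an illustrative reconstruction: the grid here is a 3 × 3 block that includes the original point, with a hypothetical 500 m spacing, whereas the paper only specifies that the additional points fall within roughly 1,000 m of the original location.

```python
import itertools
import math

def grid_fill(lon, lat, spacing_m=500.0):
    """Nine points in a regular 3x3 grid centred on a conflict location.

    Metre offsets are converted to degrees with a simple equirectangular
    approximation, which is adequate at Nigeria's latitudes.
    """
    deg_per_m_lat = 1.0 / 111_320.0
    deg_per_m_lon = 1.0 / (111_320.0 * math.cos(math.radians(lat)))
    offsets = (-spacing_m, 0.0, spacing_m)
    return [(lon + dx * deg_per_m_lon, lat + dy * deg_per_m_lat)
            for dy, dx in itertools.product(offsets, offsets)]

def duplicate_fill(lon, lat, n=9):
    """The secondary comparison method: the same location repeated n times."""
    return [(lon, lat)] * n
```

Both methods expand each conflict event into nine samples, so any accuracy difference between them isolates the contribution of the surrounding imagery rather than the larger sample size.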

| CNN implementation
The CNNs tested in this article were implemented using PyTorch, an open source Python package, and run using Nvidia Tesla GPUs.5 The specific class of CNN implemented is a residual network (ResNet), which has been developed with varying architectures that are primarily defined by the depth of the network (e.g., ResNet-18 has 18 layers). Using PyTorch, ResNets can be initialized using pre-trained weights based on training performed using ImageNet. ImageNet is a collection of approximately 14 million images from over 20,000 categories, including cars, dogs, and cats (Deng et al., 2009). Utilizing pre-trained networks enables a network to leverage information learned by training on a far larger data set, which can then be incorporated and refined by an application using a smaller, limited, data set. This process is known as transfer learning.
The fundamental principle of transfer learning is that training on a separate yet similar data set will allow the network to learn basic, generalizable features that are useful for any image classification task. Existing work has shown that using transfer learning to initialize a network, even when the original training task is extremely different, can still outperform traditional weight initialization techniques (Pan & Yang, 2010). Once the network is initialized with the pre-trained weights, the data for the new task can be used to update the network. A common procedure for updating the network involves "freezing" the earlier layers, where basic features are learned, and only fine-tuning deeper layers which capture more complicated, application-specific features (Yosinski et al., 2014). The final fully connected layer, which is responsible for performing the actual classification based on the output from earlier layers, is then modified and trained to reflect the current task.
A critical difference between the images from ImageNet used for pre-training and satellite imagery is the inclusion of additional bands of data. Traditional images contain only red, green, and blue (RGB) bands reflecting the visible wavelengths, whereas Landsat 8 includes a total of 10 bands covering a broader portion of the spectrum. Due to this difference, the CNN must be adapted to consider the extra bands of data. To utilize the additional bands available with Landsat 8, a custom data loader was created6 and the initial convolutional layer of the CNN was modified based on the number of input bands. Additionally, each new band must have its weights initialized. Following the implementation strategy of previous work (Perez et al., 2017), the average of the pre-trained weights for the RGB bands was used to initialize the new bands.

| Optimization
A range of different CNN architectures and parameters are tested to determine the optimal combination for accurately predicting conflict fatalities. A grid search is used to iterate over a defined set of parameters for each variable associated with the CNN. The variables explored include:
• network architecture, the different versions of ResNet based on varying network depth (number of layers);
• learning rate, a factor influencing the impact of training data on the network, or how fast the network learns from data;
• momentum, a factor which helps to overcome issues associated with local minima;
• step size, the number of iterations at which the learning rate is decayed;
• gamma, the amount by which the learning rate is decayed at the specified step size; and
• optimization function, the type of optimization algorithm used.
For each unique combination of these variables, the accuracy of a CNN is estimated based on the percentage of ACLED locations it successfully classifies as either a fatal (1) or non-fatal (0) conflict event. This is done by progressively training the CNN over multiple iterations, or epochs, for each set of parameters. Each epoch consists of passing the entirety of the training data through the CNN, allowing it to learn, and then using the validation data to assess the accuracy of the CNN for that epoch. Each subsequent epoch builds on the learned behavior of the previous epochs, assessing accuracy until the defined number of epochs has been reached.7 The accuracy of a given CNN, consisting of one set of parameters from the grid search, is defined as the maximum epoch accuracy achieved. As each epoch is progressively run, it is not always the case that the learned information improves the network's accuracy. In order to determine the most effective network, the state of the network and its accuracy are recorded after each epoch. Once all epochs have been completed, the network state with the best accuracy is saved.
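The epoch loop with best-state checkpointing described above can be sketched as follows, with the training and evaluation steps abstracted into callables; the function and its signature are illustrative, not the authors' code.

```python
import copy

def train_with_checkpointing(model_state, train_epoch, evaluate, n_epochs):
    """Keep the network state from whichever epoch validates best.

    train_epoch(state) -> state : one pass over the training data
    evaluate(state)    -> float : validation accuracy for that state
    Accuracy is not guaranteed to improve monotonically across epochs,
    so the best-performing state is snapshotted after every epoch.
    """
    best_state, best_acc = copy.deepcopy(model_state), float("-inf")
    state = model_state
    for _ in range(n_epochs):
        state = train_epoch(state)
        acc = evaluate(state)
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(state)
    return best_state, best_acc
```

The deep copy matters: without it, later epochs would mutate the supposedly saved best state in place.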

| RE SULTS
To assess the ability of a convolutional neural network to predict conflict fatalities, pre-trained ResNet CNNs were implemented in PyTorch as detailed in Section 3.2. Multi-spectral Landsat 8 imagery for 2014 and ACLED conflict event data for 2015 were prepared based on the methods in Section 3.1. The subsequent year (2015) of conflict data was selected in order to determine the potential effectiveness of these methods for predicting future conflict fatalities based on contemporary satellite imagery. Restricting data within temporal bounds is a key feature that distinguishes this application from earlier work (e.g., Jean et al., 2016) and provides insights into the feasibility of this approach for time-series predictions. Using the locations and fatality counts of conflict events, training and validation data sets were labeled using a binary indicator of whether conflict events did or did not have any fatalities associated with them. The sets of samples for each binary class in the training and validation data were balanced to ensure no bias during training. Two sets of tests were performed, designed to identify the most effective sampling method for training data as well as the optimal combination from a range of network parameters.
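The labeling and class-balancing step can be sketched as below. This is an illustration under assumptions: the `fatalities` field name mirrors ACLED's column, and balancing is done here by randomly downsampling the larger class, one common way to equalize class sizes.

```python
import random

def label_and_balance(events, seed=0):
    """Binary-label events (1 if any fatalities) and equalize class sizes.

    events: iterable of dicts carrying a 'fatalities' count per event.
    Returns a shuffled list of (event, label) pairs with equal numbers
    of fatal (1) and non-fatal (0) samples.
    """
    labelled = [(e, 1 if e["fatalities"] > 0 else 0) for e in events]
    fatal = [x for x in labelled if x[1] == 1]
    non_fatal = [x for x in labelled if x[1] == 0]
    n = min(len(fatal), len(non_fatal))   # downsample the majority class
    rng = random.Random(seed)
    balanced = rng.sample(fatal, n) + rng.sample(non_fatal, n)
    rng.shuffle(balanced)
    return balanced
```

Balancing before training means a classifier that simply predicts the majority class can do no better than 50%, which is why the roughly 80% accuracies reported below are informative.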
The three sampling strategies tested were based on a combination of machine learning and spatial theory, and determined the satellite image(s) that were input into the network for each observation. Method 1 (no fill) used a single 244 × 244 pixel window (for each band) around the latitude-longitude location of the conflict event, and is designed to serve as a baseline for comparison. Method 2 (grid fill) used nine equally spaced samples in a grid around the original sample latitude and longitude, to test the theory that an expanded geographic region of information could provide improved predictions. Method 3 (duplicates) duplicated the original location nine times, and served as a comparison with a similar sample size to method 2 in order to avoid bias due to sample size.
During the first set of tests, each sampling method was tested using 64 different parameter combinations over a training and validation cycle of 30 epochs, detailed in Table 2. Based on these tests, both methods which expanded the size of the training data set resulted in a clear improvement in accuracy over the baseline method, as seen in Figure 5. Methods 2 and 3 have similar minima during this initial set of tests, but method 3 achieved higher median and maximum accuracy values. Individual parameter variations across the sampling methods generally did not show significant performance trends (Figure 6); the only notable finding was slightly better performance when using the smaller learning rate.
The second set of tests expanded the range of parameters explored (Table 2) while using method 3, for a total of 432 parameter combinations. Method 3 was chosen as the focus for the second set of tests due to its strong performance in the first set of tests. For the second set of tests, the number of epochs run was increased to 60.
More variation was found across the expanded batch of tests, with a minimum accuracy of 59.8% and a maximum of 80.9%. However, no clear patterns emerged among the parameter combinations aside from those associated with learning rate. Smaller learning rates outperformed larger learning rates, as they did in the initial batch of tests. The best overall accuracy was achieved using a learning rate of 0.0001, but overall trends showed no significant performance improvement over a learning rate of 0.001. As expected given the balanced class sizes during training, predictions for neither class skewed the overall accuracy. Using the best-performing parameters (overall accuracy of 80.9%), the accuracy for predicting conflict events with fatalities (true positives) was 79.4%, and the accuracy for predicting conflict events with no fatalities (true negatives) was 82.4%.

FIGURE 5 Boxplot of sample type

FIGURE 6 Boxplot of parameters, first set of tests
Overall, the grid search of parameters conducted across the two batches of tests showed minimal impact from parameter adjustments within commonly accepted ranges (see Figure 7). The only notable exception to this was large (i.e. multiple orders of magnitude) changes in learning rate. The sampling method and the resulting data used for training proved to be a much more significant factor in the overall ability of a network to accurately identify the likelihood of future conflict in a region based on satellite imagery. Additionally, these results were compared to CNNs which were trained using randomly labeled data. The accuracy of random data networks after multiple iterations was found to be significantly lower (maximum of approximately 50%) when compared to the true network (maximum of over 80%).

| Convolutional neural network findings
The work presented in this article provides evidence that CNNs applied to moderate-resolution satellite imagery can be used to predict the likelihood of conflict-related deaths in Nigeria. We specifically illustrate that if a conflict event occurs in 2015, multi-spectral imagery of the physical features on the ground at that location in 2014 can predict with 80% accuracy whether or not the conflict event will result in any fatalities. This finding illustrates the value of spatial information for the prediction of development indicators, especially when processed with a CNN-based approach. While the use of the methods presented in this article for predicting non-permissive environments and related policy applications requires additional research, our results show the potential for CNN-based methods in applications beyond those explored in the existing literature.

FIGURE 7 Boxplot of parameters, second set of tests
One of the most notable findings of this work is the limited impact of varying network parameters when compared to the impact of modifying the sampling scheme for training data. Given the impact of the sampling method on overall network performance, expanding the scope of the tests comparing these methods may lead to significant improvements. Utilizing duplicates of training data was identified as the most effective sampling method, yet this duplication could simply offer performance gains similar to a proportional increase in the number of training epochs. Comparing duplication at varying levels with proportional increases in the number of epochs could provide insight into the impact of each strategy on performance. Further, while the grid-based approach to creating additional training samples underperformed relative to the simple duplication baseline, additional work exploring CNN accuracy when varying the distance from the original sample location used to generate the grid, as well as the number of additional points in the grid, could prove more effective and provide additional insight into the spatial correlation of satellite imagery and conflict fatalities.
Considering the significance of training data, comparing the currently used data from Landsat 8 and ACLED with other sources of satellite imagery and conflict data would also be valuable. Of other available sources of satellite imagery, discussed in Section 2.3, the Sentinel-2 platform offers the most compelling case for comparison.
Sentinel-2 is publicly available, has similar spectral coverage to Landsat 8, and offers finer spatial resolution. Other sources of conflict and event data, such as the Social Conflict Analysis Database (SCAD; Salehyan et al., 2012) and the Global Database of Events, Language, and Tone (GDELT; Leetaru & Schrodt, 2013), could be used as alternatives or supplements to ACLED data.
Another limiting factor of our approach is that the training data were restricted to conflict events and classified by whether each event did or did not have fatalities associated with it. An alternative approach, which could be achieved using random locations or non-conflict event locations, would be to compare conflict events with control locations at which no conflict occurred. Additional methods for testing the robustness of our approach to predicting conflict include expanding the temporal and spatial coverage using historical data. The work presented in this article used only 2014 imagery and 2015 conflict data, while data for both exist through 2019. Additionally, both imagery and ACLED data exist for other countries which could be explored independently or as part of a comprehensive model. Testing across countries would also contribute to exploring the impact of spatial scale of training on network performance. Models could also be tested which restrict data to subnational regions by way of comparison.
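Sampling control locations at which no conflict occurred, as suggested above, could be done by drawing random points within the study area and rejecting any that fall too close to a known event. The bounding box below roughly covers Nigeria and, like the rejection distance, is our own illustrative choice:

```python
import random

def sample_controls(events, n, bbox=(2.7, 4.2, 14.7, 13.9),
                    min_dist_deg=0.05, seed=0):
    """Draw n control points inside bbox = (min_lon, min_lat, max_lon, max_lat),
    rejecting any point within min_dist_deg (Chebyshev distance, in degrees)
    of a known conflict event given as a (lon, lat) tuple."""
    rng = random.Random(seed)
    controls = []
    while len(controls) < n:
        lon = rng.uniform(bbox[0], bbox[2])
        lat = rng.uniform(bbox[1], bbox[3])
        if all(max(abs(lon - e[0]), abs(lat - e[1])) >= min_dist_deg
               for e in events):
            controls.append((lon, lat))
    return controls

controls = sample_controls([(7.49, 9.06)], n=5)
```

These control points would receive a "no conflict" label, allowing the classifier to be retrained as conflict versus non-conflict rather than fatal versus non-fatal.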

| Scope and extent
This article introduces a purely predictive CNN-based approach for identifying whether a conflict event is likely to result in a death: we make no causal inferential claims, nor does the current CNN approach support the examination of causality. The causes, factors, and defining characteristics of one conflict death will frequently differ from those of another, and will be associated with different features found in satellite imagery.
As such, CNN-based predictive models trained solely on data from Nigeria may not perform effectively in other countries without including those countries in the training.
Given that geographic extent could reasonably be a limiting factor in effectiveness of this approach, we explored the spatial distribution of both classes of conflict events, as well as the spatial distribution of locations which were accurately (and inaccurately) predicted in Nigeria. In all cases, no clear pattern was found, indicating it is unlikely that our model was skewed toward only predicting accurately in certain portions of the country with homogeneous spatial features. While this does not indicate that this specific model could be applied to other countries, it is indicative of good overall performance across varying geographic contexts for conflict in Nigeria.

| Intractability
As we consider that the landscape and features associated with conflict may vary across geography, it is important to recognize that the CNN approach detailed here does not attempt to define the specific features being detected.
Thus, we are limited to general statements about the type of features likely to be detected given the moderate 30 m resolution of the imagery. Small features such as broken windows or burned cars will not be detectable with 30 m resolution data. Rather, the CNN is only capable of detecting large landscape trends such as urban areas, drought-stricken areas, or other features which may be well-established correlates of conflict from the literature.
While CNN methods may have the potential to help identify novel indicators of conflict, identifying what specifically the CNN is detecting is beyond the scope of this article and would be a valuable focus of future research.
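To put the 30 m resolution in perspective, the ground footprint of an image tile follows directly from the pixel size. The tile dimension below (224 pixels, a common CNN input size) is an illustrative assumption, not the article's configuration:

```python
def tile_footprint_km(n_pixels, resolution_m=30):
    """Side length, in kilometres, of a square tile of n_pixels x n_pixels
    at the given ground resolution (metres per pixel)."""
    return n_pixels * resolution_m / 1000.0

side = tile_footprint_km(224)  # 224 px * 30 m = 6.72 km per side
```

At this scale a single pixel covers 900 square metres, so only landscape-level patterns, not individual objects, can plausibly drive the classification.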
More broadly, despite the promising results presented here, these methods are not intended to supplant existing data sources or methods used for assessing conflict. While these methods may serve as a useful supplement within a broader ecosystem of conflict tools and methodologies, the goal at this stage is to serve as a spur for additional work on the applications of CNN approaches to predicting conflict as well as other development indicators.

| Conclusions
This paper sought to answer the research question of whether convolutional neural networks can be used to predict likely locations of conflict-related deaths in Nigeria based on satellite imagery. The approach presented shows promising results, achieving up to 80% accuracy in binary classification of death versus no-death conflict events in the year after model calibration. Despite the significance of this result, we highlight the many limitations of the CNN-based approach to conflict prediction accomplished to date, most notably the lack of insight into which physical features in the imagery the CNN identifies as related to conflict deaths. Future research should engage with many of these challenges, including how to identify the driving image features detected by a convolutional neural network, and how to better understand the spatial scopes across which a given model calibration might be most effective. Despite these limitations, the public availability of the data and tools used, together with the high temporal frequency at which satellite imagery is available, allows for a broad range of potential applications and adaptation to various use cases. Examples include the use of predictive maps, generated using these methods, as supplemental aids in decision-making processes for security specialists implementing aid projects in developing countries.

ACKNOWLEDGEMENTS
This study was funded by a cooperative agreement (AID-OAA-A-12-00096) between USAID's Global Development

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.

ENDNOTES
2 Imagery scenes with the highest data quality, and suitable for time-series analysis, are labeled as Tier 1. This includes precision and terrain correct data that are intercalibrated across Landsat sensors (USGS, 2018c).