Bioclimatic and altitudinal variables influence the potential distribution of canine parvovirus type 2 worldwide

Abstract Canine parvovirus type 2 (CPV‐2) is extremely contagious and causes high morbidity in many wild carnivores. It has three variants (CPV‐2a, CPV‐2b, and CPV‐2c) that are distributed worldwide with different frequencies and levels of genetic and antigenic variability. The disease threatens the healthy survival and reproduction of wildlife, yet research on the relationship between CPV‐2 epidemics and environmental variables is lacking. To fill this gap, we used the maximum entropy (MaxEnt) approach together with principal component analysis (PCA) to evaluate the relationship between CPV‐2 and environmental variables and to create a world risk map for this disease. Based on the PCA results, 18 of 68 environmental variables were selected for subsequent analyses. MaxEnt showed that annual mean temperature, isothermality, altitude, November precipitation, maximum temperature of the warmest month, and precipitation of the warmest quarter were the six most important variables associated with CPV‐2 distribution, with a combined contribution of 77.7%. The risk of this disease was high between 18°N and 47°N, especially in eastern China and the United States. These results support further prediction of risk factors for this virus and can help secure the health and sustainable survival of wild carnivores.

Infections are normally acquired through contact with feces, vomit, or saliva from infected dogs, or with contaminated water or food.
Upon entering the body, the virus replicates in the oropharynx during the first 2 days and then spreads to other organs via the bloodstream; viremia appears after 3-5 days. The virus then reaches the lymphoid tissues, intestinal epithelium, and bone marrow, as well as the heart in neonatal pups, targeting mitotically active tissues. After an incubation period of 3-7 days, the disease takes either an enteric or a myocardial form. In the intestine, viral replication kills the germinal epithelial cells of the intestinal glands (crypts), leading to epithelial shedding and shortened villi. Typical signs of the enteric form are vomiting, hemorrhagic diarrhea, fever, dehydration, marked depression, shock, and even death. Myocarditis is most commonly seen in 4- to 6-week-old puppies, often without prodromal signs or with only mild diarrhea, moaning, mucosal cyanosis, difficulty breathing, and a fast, weak pulse; sudden death often follows within a few hours, probably due to acute respiratory depression. The disease has a very short clinical course, and in unprotected hosts death often occurs 2 or 3 days after the onset of signs (Carman & Povey, 1985). CPV-2 can affect dogs of any age, but puppies between 6 weeks and 6 months of age have the highest risk of developing severe disease (Houston, Ribble, & Head, 1996). Dogs older than 1 year remain highly susceptible to CPV-2 infection, but they usually develop a milder form of the disease with lower mortality, owing to shedding part of the virus in their feces (Wilson et al., 2014). CPV-2 can also be transmitted directly to wild carnivores through close contact with domestic cats and dogs or via the prey species of smaller carnivores (Miranda & Thompson, 2016).
Thus, CPV-2 is not only one of the most significant enteric pathogens of domestic dogs and cats but has also been detected in at least seven related families of wild carnivores (Decaro & Buonavoglia, 2012; Steinel, Parrish, Bloom, & Truyen, 2001), including the grey wolf (Canis lupus; Allison et al., 2013), red fox (Vulpes vulpes; Filipov et al., 2016), Siberian tiger (Panthera tigris altaica; Steinel, Munson, van Vuuren, & Truyen, 2000), masked palm civet (Paguma larvata; Chen et al., 2011), red panda (Ailurus fulgens), and giant panda (Ailuropoda melanoleuca; Mainka, Qiu, He, & Appel, 1994). Although the disease poses a potential threat to the survival and reproduction of wild carnivores in many countries, it remains unclear which environmental conditions influence its incidence in the wild. Epidemiological surveillance and risk prediction therefore require additional insight into the environmental factors that determine the geographic distribution of the disease.
Studies have shown that environmental factors such as geography, climate, and weather significantly influence the geographic distribution of animal viruses. For example, adenovirus (Fagbo et al., 2016), rotavirus (Das et al., 2017), respiratory syncytial virus (Nenna et al., 2017), hantavirus (Prist, Uriarte, Fernandes, & Metzger, 2017; Tian et al., 2017), and avian influenza virus (Tian et al., 2015) are closely related to temperature, precipitation, and humidity, which may vary locally and seasonally (Lujan, Greenberg, Hung, Dimenna, & Hofkin, 2014). Indeed, CPV-2 shows local and seasonal patterns (Schoeman, Goddard, & Leisewitz, 2013; Zhao et al., 2016), but how environmental conditions relate to CPV-2 incidence remains uncertain. Here, I report an analysis that links environmental characteristics with CPV-2 incidence across the globe. I applied maximum entropy (MaxEnt) analysis, a form of ecological niche modeling (Phillips et al., 2009), to assess the association between geospatial variation in environmental factors and geospatial patterns of CPV-2 incidence. The study aims to provide a basis for early warning predictions of when and where canine parvovirus is likely to emerge.

| MATERIALS AND METHODS
I used MaxEnt to model the association between CPV-2 distribution and environmental variables. MaxEnt is a machine learning method that estimates species distributions by finding the probability distribution of maximum entropy, subject to constraints on the expected values of the environmental predictors (Phillips, Anderson, & Schapire, 2006). It requires only presence records of the species and remains effective with small sample sizes (Padalia, Srivastava, & Kushwaha, 2015). The method combines species occurrence data with spatial environmental variables to produce an index of relative suitability ranging from 0 (unsuitable, or most dissimilar to presence locations) to 1 (most suitable, or most similar to presence locations; Kumar et al., 2015). To reduce collinearity among environmental variables, I eliminated highly correlated variables using principal component analysis (PCA; Freeman, Kleypas, & Miller, 2013).
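The maximum entropy idea can be illustrated with a small sketch. It assumes the study area is discretized into background cells, each described by a vector of already-scaled environmental features, and fits a Gibbs distribution p(cell) ∝ exp(λ·f(cell)) whose feature expectations match the feature means at presence cells. For simplicity it uses plain gradient ascent with an L2 penalty; the MaxEnt software itself uses L1 regularization, richer feature classes, and its own output transforms. The function names and the 0-1 min-max rescaling are illustrative assumptions, not MaxEnt's logistic output.

```python
import math


def gibbs_probabilities(background, weights):
    """Normalized p(cell) proportional to exp(weights . features(cell))."""
    scores = [sum(w * f for w, f in zip(weights, cell)) for cell in background]
    top = max(scores)  # subtract the maximum score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_maxent(background, presence_idx, iterations=2000, learning_rate=0.1, l2=0.01):
    """Fit feature weights so that model expectations match presence-cell means.

    background: list of feature vectors, one per grid cell (hypothetical input).
    presence_idx: indices of the background cells that hold presence records.
    """
    n_features = len(background[0])
    weights = [0.0] * n_features
    # Constraint targets: empirical mean of each feature over presence cells.
    target = [sum(background[i][j] for i in presence_idx) / len(presence_idx)
              for j in range(n_features)]
    for _ in range(iterations):
        probs = gibbs_probabilities(background, weights)
        # Expected value of each feature under the current distribution.
        expected = [sum(p * cell[j] for p, cell in zip(probs, background))
                    for j in range(n_features)]
        # Gradient of the L2-penalized mean log-likelihood of the presences.
        for j in range(n_features):
            weights[j] += learning_rate * (target[j] - expected[j] - l2 * weights[j])
    return weights, gibbs_probabilities(background, weights)


def relative_suitability(probs):
    """Rescale cell probabilities to a 0-1 index (illustrative min-max scaling)."""
    lo, hi = min(probs), max(probs)
    return [(p - lo) / (hi - lo) if hi > lo else 0.0 for p in probs]


# Usage: one standardized temperature feature over five cells; presences in the two warmest.
background = [[x] for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
weights, probs = fit_maxent(background, presence_idx=[3, 4])
print(weights, relative_suitability(probs))
```

Because the presences sit in the warmest cells, the fitted weight on temperature is positive and suitability increases monotonically with temperature in this toy example.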

| Canine parvovirus 2 data collection
Geographical coordinates of known CPV-2 records were obtained by searching GenBank (https://www.ncbi.nlm.nih.gov/genbank/) with "canine parvovirus type 2" as the search term and downloading the global CPV-2 gene sequence information. Further literature searches were made in PubMed using the title or PubMed Unique Identifier (PMID) given in the supplementary information of each gene sequence record.
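The text does not say how the GenBank search was run. As one possible way to script it, NCBI's E-utilities ESearch endpoint returns matching record IDs as XML; the sketch below only builds the query URL and parses a response, so no network access is needed. The helper names and the sample response IDs are hypothetical, and fetching the full records (for example with EFetch) and extracting locality information from them is not shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=100, retstart=0):
    """Build an ESearch query URL for the given Entrez database and search term."""
    params = {"db": db, "term": term, "retmax": retmax, "retstart": retstart}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_esearch_ids(xml_text):
    """Return (total hit count, list of record IDs) from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    ids = [element.text for element in root.iterfind("IdList/Id")]
    return count, ids


url = build_esearch_url("canine parvovirus type 2")
# Hypothetical response body with two made-up IDs.
sample = ("<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>"
          "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>")
print(url)
print(parse_esearch_ids(sample))  # (2, ['111', '222'])
```

Paging through large result sets would repeat the query with increasing `retstart` values.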
Whenever exact geographical coordinates were not provided, Google Earth software was used to obtain the coordinates of the city, town, or village in which CPV-2 was reported (Miller et al., 2012). Studies for which such information could not be obtained were excluded. This produced 549 geographical coordinates from GenBank and 285 from PubMed, for a total of 834 geospatial locations.
To eliminate duplicate or very close record points, I overlaid a regular grid of 1 km × 1 km cells in ArcGIS 10.2 and retained at most one distribution point in each grid cell. This excluded 228 CPV-2 geographical coordinates, leaving 606 as inputs for MaxEnt. All data were entered into a single spreadsheet file and saved in ".csv" format.
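The one-record-per-cell thinning can be sketched outside ArcGIS as follows. As an assumption, the grid uses 30 arc-second cells aligned with the WorldClim raster, which are about 1 km north-south but narrower east-west away from the equator, so this is not an exact equal-area 1 km grid. The output uses the species,longitude,latitude column layout commonly used for MaxEnt sample files; the species label and example coordinates are hypothetical.

```python
import csv
import io
import math

# Assumption: 30 arc-second cells, matching the WorldClim raster resolution.
CELL_DEG = 30 / 3600


def grid_cell(lon, lat, cell=CELL_DEG):
    """Integer (column, row) index of the grid cell containing a point."""
    return (math.floor(lon / cell), math.floor(lat / cell))


def thin_records(records, cell=CELL_DEG):
    """Keep the first record in each grid cell and drop the rest."""
    seen = set()
    kept = []
    for lon, lat in records:
        key = grid_cell(lon, lat, cell)
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat))
    return kept


def to_maxent_csv(records, species="CPV-2"):
    """Write records as species,longitude,latitude rows for a MaxEnt samples file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"])
    for lon, lat in records:
        writer.writerow([species, lon, lat])
    return buffer.getvalue()


# Hypothetical records: the first two fall in the same cell.
records = [(116.403, 39.903), (116.404, 39.904), (-74.0031, 40.7031)]
thinned = thin_records(records)
print(len(thinned))  # 2
print(to_maxent_csv(thinned))
```

Keeping the first record per cell is arbitrary; any rule that leaves one point per cell serves the same purpose of removing near-duplicates.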

| Environmental variables data
Environmental data were obtained from WorldClim Global Climate Data (http://www.worldclim.org/). The WorldClim variables are derived from weather-station records averaged over a 50-year period (1950 to 2000) and are provided at a spatial resolution of 30 arc-seconds (~1 km). I converted these data to the ".asc" format required by the MaxEnt software (Syfert, Smith, & Coomes, 2013). Of the 68 variables considered, 48 were monthly climate variables (total precipitation and average, minimum, and maximum temperature for each of the 12 months); the remaining 20 were the 19 bioclimatic variables and altitude.
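The ".asc" files are ESRI ASCII grids: a short header (grid size, lower-left corner, cell size, NODATA value) followed by rows of values from north to south. The sketch below writes such a grid, reads it back, and looks up the value of the cell containing a given coordinate, which is the same idea as extracting environmental values at record points. It assumes the six-line header it writes itself (real files may use center-based corners or omit NODATA_value), and the function names are hypothetical.

```python
import math


def write_asc(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Serialize a row-major grid (first row = northernmost) as ESRI ASCII text."""
    header = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(nodata if v is None else v) for v in row) for row in grid]
    return "\n".join(header + body) + "\n"


def read_asc(text):
    """Parse ESRI ASCII text, assuming the six-line header written above."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    rows = [[float(v) for v in line.split()] for line in lines[6:]]
    return header, rows


def value_at(header, rows, lon, lat):
    """Value of the cell containing (lon, lat), or None if outside or NODATA."""
    col = math.floor((lon - header["xllcorner"]) / header["cellsize"])
    row_from_bottom = math.floor((lat - header["yllcorner"]) / header["cellsize"])
    row = int(header["nrows"]) - 1 - row_from_bottom  # rows are stored north to south
    if not (0 <= row < len(rows) and 0 <= col < len(rows[0])):
        return None
    value = rows[row][col]
    return None if value == header["nodata_value"] else value


# Hypothetical 2 x 2 grid of 1-degree cells with its lower-left corner at (0, 0).
text = write_asc([[1, 2], [3, None]], xllcorner=0, yllcorner=0, cellsize=1)
header, rows = read_asc(text)
print(value_at(header, rows, 0.5, 1.5))  # 1.0 (north-west cell)
print(value_at(header, rows, 1.5, 0.5))  # None (NODATA cell)
```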

| Statistical analysis
Multivariate statistical analyses require explanatory variables that are not closely correlated (Syfert et al., 2013; Zuur, Ieno, & Elphick, 2010). I therefore used PCA to remove potential collinearity, so that the correlation coefficients among the variables retained for analysis were <0.80 (Freeman et al., 2013; Kumar et al., 2015). In ArcGIS 10.2, I converted the CPV-2 point data to a shapefile and extracted the values of the 68 environmental variables at the 606 record points using the Extraction tools of Spatial Analyst. I then exported the attribute values to a ".txt" text file, converted it to a spreadsheet file, and imported the data for analysis. All calculations were made in SPSS 22.
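The study screened variables with PCA and then checked that the retained variables had pairwise correlations below 0.80. As a simpler stand-in for that screening (not the PCA-based grouping itself), the sketch below greedily keeps variables in a chosen order of preference and drops any variable whose absolute Pearson correlation with an already-kept variable reaches the threshold. Variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails for constant (zero-variance) columns


def drop_correlated(columns, threshold=0.8):
    """Greedily keep variables whose |r| with every kept variable is below threshold.

    columns: dict mapping variable name -> values at the occurrence points,
    listed in order of preference (dict insertion order is preserved).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical values at five occurrence points.
columns = {
    "bio1": [1, 2, 3, 4, 5],
    "tmin1": [2, 4, 6, 8, 10.5],  # nearly collinear with bio1
    "alt": [5, 1, 4, 2, 3],
}
print(drop_correlated(columns))  # ['bio1', 'alt']
```

The result depends on the preference order, which plays the role of choosing one representative variable from each correlated group.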
Occurrence points were divided randomly into training and test data: 75% were used as training data for model prediction and 25% as test data for model testing and independent validation. I used a jackknife procedure to assess the contribution of each variable to model prediction. The modeling was replicated 10 times (Johnson et al., 2016; Miller et al., 2012), and the best-fit model was judged using the area under the receiver operating characteristic (ROC) curve (AUC; Jiang et al., 2016).
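The replicated 75/25 partition can be sketched as repeated random shuffles of the occurrence points. The rounding of the test-set size and the fixed seed are assumptions for illustration; MaxEnt's own replicate handling may differ in detail.

```python
import random


def replicate_splits(points, replicates=10, test_fraction=0.25, seed=0):
    """Return (train, test) lists for each replicate, re-shuffling every time."""
    rng = random.Random(seed)  # fixed seed so the partitions are reproducible
    n_test = round(len(points) * test_fraction)
    splits = []
    for _ in range(replicates):
        shuffled = list(points)
        rng.shuffle(shuffled)
        splits.append((shuffled[n_test:], shuffled[:n_test]))
    return splits


splits = replicate_splits(list(range(606)))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 454 152
```

With 606 points, 25% is 151.5, which Python's `round` turns into 152 test points, leaving 454 for training.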
ROC curves plot the true-positive rate against the false-positive rate. The AUC value ranges between 0 and 1, and higher values indicate better model performance (Fourcade, Engler, Rödder, & Secondi, 2014). A model with an AUC of 0.5 or below performs no better than random, and model performance is generally considered high when AUC values exceed 0.9.
FIGURE 1 Plot of PC-1 and PC-2 scores of environmental variables of CPV-2. The 68 environmental variables initially considered in this study, projected into principal component space. Each vector group is detailed in Table S2
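The AUC has a convenient equivalent form: it is the probability that a randomly chosen presence point receives a higher predicted score than a randomly chosen absence point, with ties counted as one half. In presence-only settings such as MaxEnt, the "absences" are typically background points. The sketch below computes it by direct pairwise comparison, which is clear but quadratic in the number of points.

```python
def auc(presence_scores, absence_scores):
    """Probability that a random presence scores above a random absence.

    Equivalent to the area under the ROC curve; ties count as one half.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


print(auc([0.9, 0.4], [0.5, 0.1]))  # 0.75: three of the four pairs are ordered correctly
```

A model that assigns every point the same score gets exactly 0.5, which is why 0.5 marks random performance.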

| Determination of environmental variables
Principal component analysis revealed that much of the variation in the environmental variables could be explained by the first and second principal components (PC1 and PC2), which together explained about 71.5% of the total variance (Figure 1). PC1 accounted for 51.3% of the variance and consisted of temperature variables, so it can be interpreted as a temperature factor. PC2, accounting for 20.2%, combined temperature and precipitation variables. Overall, PCA identified 18 groups of highly correlated variables, each containing more than one variable. I therefore selected only one variable from each group for the final calculations, reducing the number of explanatory environmental variables from 68 to 18 (Table 1).
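The "percentage of variance explained" by a component is its eigenvalue divided by the total variance, which for a correlation matrix equals the number of variables (its trace); here 51.3% + 20.2% = 71.5%. The calculations in the study were made in SPSS; as a swapped-in illustration only, the sketch below finds the leading eigenvalues of a correlation matrix by power iteration with deflation and converts them to explained-variance shares. The example matrix is hypothetical.

```python
import math


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def leading_eigenpairs(matrix, k=2, iterations=1000):
    """Top-k eigenvalues/vectors of a symmetric PSD matrix by power iteration with deflation."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deliberately non-uniform start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                a[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def explained_variance_ratios(corr, k=2):
    """Share of total variance (trace of the correlation matrix) per component."""
    trace = sum(corr[i][i] for i in range(len(corr)))
    return [value / trace for value, _ in leading_eigenpairs(corr, k)]


# Hypothetical correlation matrix of two strongly correlated variables.
print(explained_variance_ratios([[1.0, 0.8], [0.8, 1.0]]))  # approximately [0.9, 0.1]
```

Power iteration is used here only because it needs no external libraries; a production analysis would normally rely on a linear-algebra library's symmetric eigensolver.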
The 18 environmental variables represent three categories: temperature-related, precipitation-related, and terrain-related variables (Table 1). All pairwise correlation coefficients among them were less than 0.8 (Freeman et al., 2013; Kumar et al., 2015; Table S1).

| Model evaluation and environmental variable importance
The 18 candidate environmental variables (Table 1) were used as input for the MaxEnt model. The mean test AUC of the 10 replicate CPV-2 models was 0.949 (standard deviation 0.007), with low omission rates and p-values (Figure 2), indicating that the MaxEnt model had high accuracy.
The relative contribution of the environmental variables to the predictive distribution model was evaluated using the jackknife test in MaxEnt. It indicated that annual mean temperature (Bio1), isothermality (Bio3), altitude (Alt), November precipitation (Prec11), maximum temperature of the warmest month (Bio5), and precipitation of the warmest quarter (Bio18) were the most important environmental variables associated with CPV-2 distribution, with a combined contribution of 77.7%.
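The 77.7% figure is a sum of per-variable percent contributions. Assuming the heuristic described in the MaxEnt documentation (each training step's increase in regularized gain is credited to the variable whose coefficient changed, and the credits are then normalized to 100%), the bookkeeping can be sketched as below. The training trace is hypothetical and does not reproduce values from this study; the jackknife test is a separate diagnostic that refits the model with each variable alone and with each variable omitted.

```python
from collections import defaultdict


def percent_contribution(gain_steps):
    """Credit each step's gain increase to its variable, then convert to percentages.

    gain_steps: iterable of (variable_name, gain_increase) pairs, one per
    training step (hypothetical input; MaxEnt keeps this bookkeeping internally).
    """
    totals = defaultdict(float)
    for variable, gain in gain_steps:
        totals[variable] += gain
    grand_total = sum(totals.values())
    return {variable: 100.0 * gain / grand_total for variable, gain in totals.items()}


def combined_share(contributions, variables):
    """Summed percent contribution of a subset of variables."""
    return sum(contributions[v] for v in variables)


# Hypothetical training trace, not values from this study.
steps = [("bio1", 0.30), ("alt", 0.10), ("bio1", 0.10), ("prec11", 0.10)]
shares = percent_contribution(steps)
print(shares)
print(combined_share(shares, ["bio1", "alt"]))
```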
Among them, annual mean temperature was the most important predictor, contributing 21.8%, and it had the most information not present in the other variables. Moreover, 28 environmental variables were highly correlated with annual mean temperature, including the minimum temperature of the coldest month, mean temperature of the driest quarter, mean temperature of the coldest quarter, the monthly maximum, minimum, and average temperatures from January to May and from October to December, and the September average temperature (Table S2). Table S2 shows that CPV-2 presence was higher at lower monthly and seasonal temperatures, suggesting that CPV-2 distribution is more closely related to temperature than to the other variables.
Abbreviations: IEV, initial environmental variables; Min., minimum; Max., maximum; SD, standard deviation; C·V, coefficient of variation; OR, optimum range. Bioclimatic variables are computed from temperatures (T), from precipitation sums (P), or from both (T + P).

| Spatial distribution of CPV-2 risk
The

| DISCUSSION
CPV-2 emerged about 30 years ago, but the disease it causes was not recognized until serious or fatal illness had affected large numbers of dogs and other canids.
What remains uncertain is which environmental factors determine its global distribution. My analysis revealed that temperature, precipitation, and altitude influence the distribution of CPV-2; more specifically, annual mean temperature, isothermality, altitude, November precipitation, maximum temperature of the warmest month, and precipitation of the warmest quarter were the most important environmental variables for CPV-2.
With regard to terrain, CPV-2 cases are expected to occur mainly in low-altitude areas below 300 m (Table 2; Figures 3c and 4c). This expectation is based on the fact that 69.8% of the records used in the analysis came from altitudes below 300 m, with 51.8% from below 100 m.
Moreover, CPV-2 incidence has been reported to be significantly higher in seasons with large temperature differences (Fu, Pei, Wang, & Yin, 2012). Approximately 48.5% of the 606 records had an annual mean temperature of 8.5-16°C, and 55% had an isothermality of 20-36 (Table 2). The hosts of CPV-2 are widely distributed across very different terrain, so the altitude of the records varied greatly (Table 1).
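Percentages such as "69.8% below 300 m" or "48.5% between 8.5 and 16°C" are simple shares of records falling in an interval. A minimal sketch, assuming half-open intervals (lower bound inclusive, upper bound exclusive), since the text does not state how boundary values were treated; the altitudes below are hypothetical.

```python
def share_in_range(values, low=None, high=None):
    """Percentage of values with low <= value < high; a None bound is open."""
    inside = [v for v in values
              if (low is None or v >= low) and (high is None or v < high)]
    return 100.0 * len(inside) / len(values)


# Hypothetical altitudes (m) of eight records.
altitudes = [20, 80, 150, 250, 400, 1200, 50, 2900]
print(share_in_range(altitudes, high=300))  # 62.5
print(share_in_range(altitudes, high=100))  # 37.5
```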
Globally, the predicted high risk of CPV-2 lies mainly in eastern and northern Asia between 20°N and 45°N, which is consistent with the actual occurrence of cases. In China, high risk is concentrated in the central and eastern coastal areas. procyonoides, and Procyon lotor, so that they are most likely to be infected with each other through close contact or fomites.
The red panda in Yunnan Province, China, and the grey wolf in the northern United States may also face the threat of CPV-2. It is therefore essential to pay close attention to high-risk areas of CPV-2, especially wildlife reserves in various countries, and to monitor both climate data and contact with domestic animals in these regions.
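Turning a continuous suitability surface into the high, average, and low risk classes of a risk map requires cut-off values, which the text does not report. The sketch below uses placeholder cut-offs of 0.3 and 0.6 (hypothetical) and lists the high-risk cells inside a latitude band, defaulting to the 18°N to 47°N band mentioned in the abstract; the cell values are made up.

```python
def risk_class(suitability, low_cut=0.3, high_cut=0.6):
    """Map a 0-1 suitability value to a risk class (cut-offs are hypothetical)."""
    if not 0.0 <= suitability <= 1.0:
        raise ValueError("suitability must lie in [0, 1]")
    if suitability >= high_cut:
        return "high"
    if suitability >= low_cut:
        return "average"
    return "low"


def high_risk_in_band(cells, south=18.0, north=47.0, low_cut=0.3, high_cut=0.6):
    """(lat, lon) of high-risk cells inside a latitude band; cells are (lat, lon, suitability)."""
    return [(lat, lon) for lat, lon, s in cells
            if south <= lat <= north and risk_class(s, low_cut, high_cut) == "high"]


cells = [(30.0, 120.0, 0.9), (60.0, 100.0, 0.9), (35.0, -80.0, 0.2)]
print(high_risk_in_band(cells))  # [(30.0, 120.0)]
```

Different cut-offs change the extent of the high-risk area, so reporting the chosen thresholds alongside a risk map makes it reproducible.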

CONFLICT OF INTEREST
None declared.
FIGURE 5 Predicted potential geographic distribution of CPV-2 worldwide. The color scale indicates the predicted risk level for CPV-2: red = high-risk probability, green = average-risk probability, blue = low-risk probability