Spatial familial networks to infer demographic structure of wild populations

Abstract In social species, reproductive success and rates of dispersal vary among individuals resulting in spatially structured populations. Network analyses of familial relationships may provide insights on how these parameters influence population‐level demographic patterns. These methods, however, have rarely been applied to genetically derived pedigree data from wild populations. Here, we use parent–offspring relationships to construct familial networks from polygamous boreal woodland caribou (Rangifer tarandus caribou) in Saskatchewan, Canada, to inform recovery efforts. We collected samples from 933 individuals at 15 variable microsatellite loci along with caribou‐specific primers for sex identification. Using network measures, we assess the contribution of individual caribou to the population with several centrality measures and then determine which measures are best suited to inform on the population demographic structure. We investigate the centrality of individuals from eighteen different local areas, along with the entire population. We found substantial differences in centrality of individuals in different local areas, that in turn contributed differently to the full network, highlighting the importance of analyzing networks at different scales. The full network revealed that boreal caribou in Saskatchewan form a complex, interconnected familial network, as the removal of edges with high betweenness did not result in distinct subgroups. Alpha, betweenness, and eccentricity centrality were the most informative measures to characterize the population demographic structure and for spatially identifying areas of highest fitness levels and family cohesion across the range. We found varied levels of dispersal, fitness, and cohesion in family groups. Synthesis and applications: Our results demonstrate the value of different network measures in assessing genetically derived familial networks. The spatial application of the familial networks identified individuals presenting different fitness levels, short‐ and long‐distance dispersing ability across the range in support of population monitoring and recovery efforts.


| INTRODUC TI ON
Population genetic analyses are used to inform on the genetic composition of a population and the forces that explain the changes to that composition (Griffiths et al., 2000). A larger number of analytical approaches have been developed to delineate populations and assess the extent and patterns of gene flow and dispersal (e.g., Galpern et al., 2014;Jombart et al., 2008;Pritchard, Stephens, & Donnelly, 2000). More recently, graph-theoretic approach has been used to assess population genetic structure (Dyer & Nason, 2004), investigate sex-specific dispersal processes and network structures in landscape genetics (Bertrand et al., 2017), and analyze spatial patterns of genetic variation across a species' range (Fortuna et al., 2009). In parallel, pedigree reconstructions have been done to inform on demographic parameters (Creel et al., 2003;Gobush et al., 2009;Lucena-Perez et al., 2018;McFarlane et al., 2018), yet network analyses and genetically derived pedigrees have been used as two separate methodological frameworks. Here, we suggest that the combination of these methods may highlight the interconnectedness between individuals (Escoda et al., 2019;Morrison, 2016), differences in reproductive success (McFarlane et al., 2018), and ultimately inform on the demographic structure of a population.
Reconstructing a reasonably complete and accurate familial network from pedigree data is especially relevant for endangered species, providing information on mating patterns and reproductive success (Lucena-Perez et al., 2018;Manlik et al., 2016). However, collecting reliable parentage information for cryptic and elusive species is difficult or directly unfeasible; pedigree information obtained through direct field observations is often limited to females and may consistently overlook cryptic mating (Coltman et al., 1999;Gottelli et al., 2007). Molecular markers, such as microsatellites, have been used to infer parentage and familial relationships in wild populations (Pemberton, 2008) and assess individual heterogeneity in survival and reproduction (Bolnick et al., 2011;Hamel et al., 2009;Kendall et al., 2011). Such heterogeneity can be the result of a number of common processes, such as persistent social rank (e.g., von Holst et al., 2002;Stockley & Bro-Jørgensen, 2011), unequal allocation during parental care (e.g., Johnstone, 2004;Manser & Avey, 2000), fine-scale spatial habitat heterogeneity (Bollinger & Gavin, 2004;Franklin et al., 2000;Manolis et al., 2002), and genetics (Meyers & Bull, 2002;Nussey, 2005).
Graph theory (Harary, 1969) is widely used in ecology to assess functional and structural connectivity (Fall et al., 2007;Urban & Keitt, 2001;Wagner & Fortin, 2005). Graphs are represented as a network of nodes and edges, where edges imply a level of connection between the nodes (Urban & Keitt, 2001). Several networkbased measures are commonly used to quantify indirect connections between nodes (e.g., individuals, habitat patches; Table 1). Each measure captures a distinct aspect of the network. Alpha centrality is a generalization of eigenvector centrality given to directed graphs; while eigenvector centrality is a measure of the influence of a node in a network, alpha centrality allows nodes to have external sources of influence that does not depend on that node's connection to other nodes (Bonacich & Lloyd, 2001). Betweenness centrality indicates how central a node is in a network, based on the number of shortest paths between pairs of nodes that pass through that node (Freeman, 1977). Closeness centrality measures how fast information can spread from a given node to all other reachable nodes in K E Y W O R D S boreal caribou, dispersal, familial network, fitness, network analysis, pedigree reconstruction,

Metric Type Definition
Alpha centrality Indirect Alpha centrality of all vertices. A generalization of eigenvector centrality to directed graphs. Alpha centrality indicates the overall connectivity of a node, both direct and indirect connections (Bonanich and Lloyd 2001).
Betweenness centrality Indirect Quantifies the number of times a node lies along the shortest path between two other nodes in the network (Freeman, 1977).

Closeness centrality Indirect
A centrality measure based on the shortest path length between a node and other nodes in the network. The Latora closeness centrality is used in networks with disconnected components (Latora and Marchiori 2001).

Degree centrality Direct
The number of edges connected to a node (Harary, 1969).

Eccentricity centrality Indirect
The maximum noninfinite length of a shortest path between n and another node in the network (Hage & Harary, 1995).

TA B L E 1
Node-based measures of connectivity a network, and the Latora closeness centrality is used in networks with disconnected components (Latora and Marchiori, 2001).
Degree centrality represents the number of edges connected to a node; in directed graphs, in-degree counts the number of edges directed toward the node, and out-degree counts the number of edges that leaves the node toward other nodes (Harary, 1969). Eccentricity centrality is the maximum distance from a node to any other node, representing the importance of a node within a network, determining the influence of a particular node within a network (Hage & Harary, 1995). A priori selection of network measures is important to avoid including several spuriously correlated measures (Webber et al., 2020). Although some network-based centrality measures may overlap, each measure captures a distinct aspect of the network; nodes with high scores for one measure may not necessarily have a high score in other measures.
Here, we infer population demographic structure by assessing different node-based measures of centrality obtained from a familial pedigree network. First, we use microsatellite data to identify parent-offspring relationships and construct a spatial familial network from all relationships (familial pedigree) of boreal caribou in Saskatchewan, Canada. Then, we create a spatial familial network to identify local area networks with varying distributions of centrality measures, determining whether high centrality measures and edge-to-node ratios at the fine scale correspond to high centrality in the full network. Spatially analyzing familial networks is inherently difficult due to the presence of inferred individuals, whose spatial locations are unknown. By using the centrality measures from the aspatial network in the spatial network of individuals, the network connections to the inferred individuals can be brought into a spatial framework. We also assess the structure and cohesiveness within the full network using edge removal to identify boundaries that run between subgroups (Girvan & Newman, 2002;Lusseau & Newman, 2004;Newman & Girvan, 2004), with a particular focus on parts of the range presenting different levels of anthropogenic disturbance. Our findings allow us to discuss how different measures of network centrality can be used to spatially identify areas of highest fitness levels, dispersal and reproductive skew across the landscape in support of population monitoring and recovery efforts. increasingly fragmented and reduced in area (Arsenault, 2003;Rock, 1992). Further studies have shown reduced movement of female caribou and low adult survival in the Boreal Plains (Arsenault & Manseau, 2011). Boreal caribou in Saskatchewan maintain a natural clinal pattern of genetic structure, with isolation by distance and isolation by resistance shaping spatial patterns of genetic variation Priadka et al., 2018). More information on Saskatchewan's boreal caribou habitat can be found in Appendix 1.

| Fecal pellet collection and genetic analysis
We used samples from across the boreal caribou range in Microsatellite alleles were scored with the program GeneMarker® (SoftGenetics, State College, PA) and followed a protocol documented in Flasko et al. (2017). Unique individuals were identified using the program ALLELEMATCH (Galpern, Manseau, Hettinga, et al., 2012). We retained samples that amplified at ≥ 5 loci and reamplified apparent unique genetic profiles represented by a single sample using two independent scorers to confirm unique individual identities . The rate of allelic dropouts (amplifications of only one of the two alleles for heterozygous individuals, producing false homozygotes; Taberlet et al. 1996) and false alleles (false allele amplifications; Bonin et al. 2004) were calculated using these re-amplification results.
Input parameters were set to allow for female and male polygynous mating systems without inbreeding avoidance, and the probability of mothers or fathers being present in the sampled dataset was set to 50% in the absence of other prior information. All sampled females were set as possible mothers, and all sampled males were set as possible fathers. COLONY infers the parental genotypes for each individual; inferred parents are genotypes that are not included in the candidate parent samples, either through that individual's genotype not being captured during sampling, or that parent is no longer living, resulting in a family network with more individuals than were sampled. Finally, individual fitness was calculated with the number of offspring each individual produced.

| Modeling the demographic structure of the population
Identifying parts of the network that are highly connected and those individuals that are less connected to the network can help define the local and global structure of the familial network. We used the r package CINNA (Ashtiani et al., 2018) to calculate individual nodebased measures of network centrality. Nodes represent individuals, and edges represent parent-offspring relationships, with directionality from parent to offspring. We calculated five direct and indirect node-based measures of centrality for each individual to quantify distinct aspects of centrality: alpha, betweenness, closeness, degree, and eccentricity centrality (Table 1). We calculated correlation coefficients between measures to only select statistically independent aspects of centrality. We used principal component analysis (PCA) to collapse variance among any dependent centrality measures, as suggested by Brent (2015), and to identify the most important centrality types based on our network structure. We used the r package FactoMineR (Lê et al., 2008) to run the PCA, and package factoextra (Kassambara & Mundt, 2020) to visualize PCA results.

| Network analysis
As boreal caribou mating system is polygamous, with individuals having multiple mating partners, a dense and complicated network is created; visually analyzing the aspatial network along with the nodebased measures of network centrality allows for easier identification of patterns and trends within the network. We used Cytoscape v3.7.2 (Shannon et al., 2003) for the nonspatial analyses of the local and full familial networks. We created the familial network from the reconstructed parent-offspring relationships identified by COLONY.
As each individual has their parents identified by COLONY, as well their offspring, a network can be created from the multigenerational relationships among individuals.
To assess network cohesiveness within the full network, we used the Girvan-Newman algorithm to look for boundaries that run between family groups to find natural divisions within the network by removing edges with the highest betweenness scores TA B L E 2 Sampling data  (Girvan & Newman, 2002;Lusseau & Newman, 2004;Newman & Girvan, 2004). We used an edge betweenness centrality measure (Freeman, 1977)

| RE SULTS
A total of 2,198 samples were collected (2,099 fecal and 99 blood blot   Table 3).

| Local area networks
We identified 18 local area networks in order to determine the cohesiveness and centrality of individuals. The local areas with the lowest edge-to-node ratios were all located in the northern part of the Boreal Shield, with the high edge-to-node ratio areas found further south in the western part of the Boreal Plains and southern part of the Boreal Shield ( Figure 2). We found differences between the distribution of centrality measures between high and low edgeto-node ratio local areas ( Figure 3). The largest edge-to-node ratio was Canoe Lake in the western Boreal Plains (ratio of 15; Table S2.1, Figure S2.3). We identified three other local areas with similarly high edge-to-node ratios ( Figure S2.4, Figure S2.5, Figure S2.6, Table S2.1). The smallest edge-to-node ratio (Central SK Shield) had zero parent-offspring relationships (Table S2.1; Figure S2.7). We identified two other local areas with similarly low edge-to-node ratios, with very few parent-offspring relationships occurring within these local areas ( Figure S2.8- Figure S2.9, Table S2.1), indicating that Boreal Shield individuals are not presenting the same proximity to related individuals as observed in the Boreal Plains. Overall, edge-tonode ratios correlated positively to closeness ( Figure S2.2a), alpha ( Figure S2.2c), betweenness ( Figure S2.2d), and degree centrality ( Figure S2.2e). However, edge-to-node ratios decreased with eccentricity centrality ( Figure S2.2b), meaning areas with lower edge-tonode ratios were less central to the overall network.
When bringing in the first neighbors of all individuals within a local area, the high edge-to-node ratio areas formed a tighter cluster of individuals than in the low edge-to-node ratio areas. Including first neighbors in the area with the highest edge-to-node ratio (Canoe Lake) increased the ratio to 1.14 and connected 73.6% of individuals into one cluster ( Figure S2.

3). A large proportion of each
high edge-to-node ratio local area became connected into one or two large clusters with the inclusion of first neighbors ( Figure S2.4, Figure S2.5, Figure S2.6). In comparison, including first neighbors in the lowest edge-to-node ratio local area (Central SK Shield)

| Full network
Individuals from high edge-to-node ratio local areas were located more centrally within the full family network and clustered with other individuals from the same local area. Individuals from low edgeto-node ratio local areas were dispersed throughout the network and primarily found on the outer edges of the network (Figure 4).
Although all local areas were of similar geographic size (Figure 2), individuals from low edge-to-node ratio local areas were not closely connected to each other in the network. Individuals from these local areas were not found within a few edges of other individuals from the same local area, indicating that individuals encountered in each low edge-to-node ratio local area are from different familial lines, or are dispersers that were sampled in that local area ( Figure 4); as the edges in the familial network represent parent-offspring relationships, these individuals are not highly related to one another and do not form a cohesive group. In contrast, individuals from high edgeto-node local areas were highly connected to one another within the full network, indicating they are closely related, with a high density of familial ties (parent-offspring relationships).
Removal of edges with high betweenness did not alter the overall network structure ( Figure S2.10). Most edges within the network had low betweenness centrality (score of 1% -81.5% of edges; Table 4). Only 2.97% of edges were removed after sequentially removing edges with the highest edge betweenness score until only edges with an edge betweenness > 4 remained (Table 4). While edge removal did not lead to separated subnetworks, the high edge-tonode local areas from the Boreal Plains remained central and clustered within the edge removal network ( Figure S2.10). Individuals from Trade Lake maintained a high level of clustering, but became F I G U R E 2 Locations of local areas. High edge-to-node ratio (pink) and low edge-to-node (green) local areas within the spatial familial network. Lines represent parent-offspring relationships separated from the main network, forming a separate subgroup ( Figure S2.10). Removal of high betweenness edges did not result on individuals from low edge-to-node ratio areas becoming separate subgroups; individuals remained dispersed throughout the network ( Figure S2.10).

| D ISCUSS I ON
Network analyses have been used in biological and ecological studies to quantify and explore the structure of populations across numerous taxa (Bertrand et al., 2017;Dyer & Nason, 2004;Fortuna et al., 2009), but to our knowledge, this is the first to combine genetically derived pedigree data with network analysis to infer familial structure of wild populations. Network analyses are powerful and flexible methods for investigating the complex networks of interconnections between individuals within and between populations (Wasserman & Faust, 1994). With a large interconnected network of 1,562 nodes (individuals) and 1,866 edges (parent-offspring relationships) between individuals, it can be difficult to identify significant differences within the network. By bringing the familial network into a spatial framework and incorporating aspatial node-based centrality measures, we were able to identify groups presenting different levels of cohesion within the network, with some local areas composed of clustered family groups and others presenting lower fitness or being more dispersed over the range. Comparing local area networks allowed us to identify areas of higher and lower fitness and connectivity in the overall boreal caribou familial network.
By identifying local areas within the network, we were able to gain a better understanding of which areas contributed most to the familial network. We found significant differences in central- avoidance. Maternal social rank influenced reproductive success in reindeer (R. tarandus), with higher fitness females having higher fecundity and earlier offspring date of birth than lower fitness females (Holand et al., 2004). We found that local areas with high edge-tonode ratios had a wider distribution of alpha and degree centrality, indicating that more higher fitness individuals are found in these

TA B L E 3 Correlation coefficients between node-based measures of network connectivity
F I G U R E 4 Boreal caribou familial network in Saskatchewan, Canada. Node size indicates alpha centrality score. Node colour represents both local area and edge-to-node ratios. All pink nodes represent individuals from local areas with high edge-to-node ratios, and green nodes represent individuals from local areas with low edge-to-node ratios local areas than in low edge-to-node local areas ( Table S1.1). Very few parent-offspring relationships occurred within or between the northern Boreal Shield local areas ( Figure 2). This suggests that individuals in the Boreal Shield are not central to the familial network and have lower individual fitness, not producing many offspring that survive until fall (low degree centrality). Individuals in low edge-to-node local areas are not from the same familial lines and are not highly related to any other individuals in the network. The removal of high betweenness edges led to some individuals becoming disconnected from the full network, but these disconnected individuals were not from one local area, instead located throughout both ecozones, again highlighting the interconnectedness of the familial network.
In most animal network studies, nodes represent observed individuals, with relationships between pairs of individuals (dyads) defined by an association index (the time the pair of individuals spent together), with edges representing observed relationships, forming an interaction network (Morrison, 2016;Whitehead & Dufault, 1999). For many species, it is not possible or feasible to directly observe rare and elusive species, and therefore, association information cannot be obtained. Pedigree reconstruction can give direct information about dyads between closely related individuals (parent-offspring and full siblings), with these relationships forming the basis of the familial network. In comparison with association networks, in familial networks, only the sampled individuals are known or observed, and the edges between individuals and the unsampled individuals (parents) are inferred by the data analysis (Morrison, 2016 Network analyses are powerful methods to assist in wildlife conservation (Bertrand et al., 2017;Dyer & Nason, 2004;Fortuna et al., 2009), but most wild populations cannot be directly observed, and demographic networks cannot be constructed. By constructing a familial network based on genetically derived parent-offspring relationships, we calculated informative measures to draw a much finer picture of their individual fitness levels, pattern of demographic structure, and relative contribution of local areas to the larger population. The spatial application of the familial network allowed us to identify areas with individuals of higher fitness levels, short-and long-distance dispersal ability across the range in support of population monitoring and recovery efforts.

ACK N OWLED G M ENTS
We would like to thank Bridget Redquest, Jill Lalor, and Austin Camargo for their helpful comments on the manuscript.

CO N FLI C T O F I NTE R E S T
None declared.