A wireless sensor network for underground passages: Remote sensing and wildlife monitoring

This article presents a Wireless Sensor Network (WSN) system developed to monitor concealed box-shaped underground passages in order to protect wildlife. The system's purpose is to recognize the species of passers (ie, human, canis, flying species, and so on) without video sensors (image recognition), using only data from sensors that measure distance and weight. The information provided by the WSN was analyzed with a variety of methods, including a neural pattern recognition network and clustering algorithms, which recognized the species of passers with certainty scores of over 90%. The main concern for future study is the evaluation of these passages, which are frequently located along main highways, with respect to their effectiveness, that is, whether they are frequently utilized by animals, using data fusion over low-cost and low-power microsensors.


INTRODUCTION
In Greece, until 2016, there was no central database in which deaths of wildlife passers due to traffic accidents could be systematically registered. The only exception was a small database maintained by Egnatia Odos, the operator of Egnatia Highway, concerning this specific highway. Most of the data was provided by Greek ecological groups, such as Callisto, 1 as well as by the personal records of concerned researchers. The main proposals included the construction of fences and the installation of nets on each side of the highway. However, these barriers did not have the expected impact, due to the diversity of the animals' shape and size. For instance, a bear would be blocked by a fence, whereas a dog or a fox could easily bypass it by sliding under it. On the other hand, a lightweight net could hinder a dog or a fox, but it would not prevent the passage of a bear (due to its weight and claws).
As a solution to this problem, recent studies suggested the construction of several mitigating structures so as to surpass all available obstacles. Due to the overall high cost of this proposal and the amount of time required to build these passages along several hundred key crossing points of the road, the construction of tunnels was deemed a suitable alternative. Subsequently, underground, box-shaped tunnels were constructed along the Egnatia motorway, especially in regions of high bear population distribution, as shown in Figure 1. These passages were designed to be used by animals, and several measures were taken to that end. First, the fences and protective barriers leading to the highway were enhanced, so that animals would select other routes. Second, tunnel locations were studied and planned beforehand in order to ensure that these passages would be used by animals. The main concern for the future is the evaluation of these passages with respect to their effectiveness, that is, whether they are utilized by animals and thus act as a way of preserving the natural habitat and enhancing the animals' threatened ecosystem.
FIGURE 1 Egnatia motorway in regard to bear distribution areas where the disturbance of the natural habitat occurs
Over the past decade, advances in the semiconductor industry, wireless communication, sensor design, and energy storage technologies have helped realize the concept of a truly pervasive wireless sensor network (WSN). 2 In addition, WSN technology has been utilized during the last years for wildlife management, monitoring and conservation of the natural habitat, since traditional ways of wildlife tracking were severely limited. Specifically, the main issue faced was that the systems developed could not function well and were error-prone due to weather conditions, labor-intensive tasks, and their limited monitoring scope. 3 Integrated microsensors, less than a few millimeters in size, with onboard processing and wireless data transfer capability, are the basic components of the new WSN nodes offering optimal identification, tracking, and monitoring.
Nowadays, the WSN applications developed for monitoring wildlife are usually expensive to build and maintain. Most solutions resort purely to video surveillance or thermal cameras, which have produced systems able to analyze and predict animals' movement patterns with high accuracy. 4 The latest research uses the same techniques but also incorporates drones, which can produce real-time data without disturbing the natural habitat. 5 WSNs organize nodes in a network topology that enables data to be collected, aggregated online, and ultimately forwarded to a designated data sink. 6,7 The main benefit of applying WSNs in the wildlife protection domain is the generation of accurate, real-time information about passage usage at high granularity and low cost. This information can be further analyzed to provide the necessary insights about the effectiveness of such mitigation structures. Apart from wildlife monitoring, WSNs are used in the field of ecological informatics, as evident in AL-Dhief et al, 8 who present current WSN-based solutions for forest fire surveillance using concurrent detection systems. This article presents a novel approach regarding remote sensor communication and the rationale for developing a Wireless box-shaped underground passages (WBSUP) network, that is, a specialized form of WSN, which mostly uses sensors close to the soil's surface. We introduce the term WBSUP to denote WSNs that have various characteristics (eg, multiple shapes, structures and geometries, different wall/ceiling/floor materials, and so on), do not use image trap solutions (visual WSNs), and may be used for applications requiring low-power, low-cost, and less resource-intensive data fusion techniques on the sensor nodes. More specifically, these nodes fuse data and typically consist of sensors (transducers), microcontrollers, and radio transceivers. 
In most applications, 9 the nodes are attached to subsurface underground sensors and communicate using electromagnetic waves. The subsurface nodes act as a receiver, whereas those buried (weight sensors) act as a sender, conveying the necessary signals (usually data samples) to aboveground sinks or relays.

AIMS AND OBJECTIVES
This article aims to address the evaluation of these underground passages by examining whether they have served their purpose, that is, whether they are used by different wildlife species. In order to achieve this, a pilot WSN was developed with the following aims: 1. The design of a WSN for (a) detecting the kind of each passer and (b) automatically counting the number of crossings of each species. 2. The development of an optimal configuration of the WSN topology and sampling, in accordance with the dimensions of the possible passers, in order to increase its detection and measurement accuracy. 3. The data modeling of information, the mining of raw data and elimination of inaccurate and duplicate entries among the sensors, the determination of the data structure (with regard to the real-time model simulation, timing errors, potential occurrence of outliers, and data dispatches from nodes), and the development of relevant information systems. 4. The implementation and evaluation of state-of-the-art data fusion methods using clustering models for supervised and unsupervised learning (K-means and X-means algorithms) and nonlinear regression models, in order to assess the output of the pattern recognition neural networks (NNs).
This work has the following objectives: 1. The measurement of how frequently the passages are used by wildlife (especially brown bear and Canis lupus, as well as other species). 2. The identification of the optimal method to evaluate the contribution of the passage towards (a) the geographical and demographical connectivity among individuals of the above species, and (b) motorway safety with respect to accidents involving those individuals.
Based on the above aims and objectives, the contribution of this article is 3-fold: 1. The development of a custom, low-power (most applications based on visual WSNs need additional power to operate and computation resources to analyze videos and images) WSN consisting of seven wireless sensor nodes (each corresponding to a specific sensor), which may collectively measure the height, width, and weight of a random crosser, aka passer, when combined. 2. The development of state-of-the-art data fusion methods that are able to classify reliably the above measurements into passer classes, that is, brown bear, wolf, bird, human, dog. 3. The configuration and deployment of the combined system to a real-life scenario and its evaluation with real data.
An important feature of this study is that underpinning the WSN is a custom remote data acquisition system that uses 3G technology to communicate measured data to a central repository, from where it is queried and analyzed. The WSN was deployed at passage K81 along the VA-45 motorway section. It should be noted that the selection of the specific passage was based on the Hot Spot Analysis-Getis-Ord G* technique. 10 This map was derived via this method's z-p scores values and was selected in regard to the context of neighboring features. The passage selected is located in the upper left side of the map, presented in Figure 1.

RELATED WORK
Recent advances in microelectromechanical systems and in low-power wireless network technology have created the technical conditions to build multifunctional tiny sensor devices, which can be used to observe and react to physical phenomena of their surrounding environment. 11 Wireless sensor nodes are low-power devices equipped with processor, storage, a power supply, a transceiver, one or more sensors and, in some cases, an actuator. Several types of sensors can be attached to wireless sensor nodes, such as chemical, optical, thermal, and biological. These wireless sensor devices are small and cheap, and they can automatically organize themselves to form an ad hoc multihop network. Widespread networks of inexpensive wireless sensor devices offer a substantial opportunity to monitor the surrounding physical phenomena more accurately, in comparison to traditional sensing methods. 12 Some challenges, which we need to comprehend, are cited in Reference 13. Several approaches in this area target WSN life span, 14,15 energy consumption, 16,17 and security applications. 18 This is the reason why WSNs have been increasingly entailed and developed, primarily targeting real time applications, 19 such as those supporting the general public in urban settings 20 and environmental monitoring applications, both indoor 21 and outdoor, 22 that is, water quality, 23 habitat monitoring, 24 traffic monitoring, 25 earthquake detection, 26 volcano eruption, 27 agriculture, 28 and weather forecasting. 29 A specific focus of environmental monitoring is wildlife monitoring. In recent decades, wildlife has been thoroughly examined, in order to shed some light into various everyday aspects, such as climate effects, the reduction of the planet's biodiversity, natural and habitat pollution and in general the overall fallout, in the years to come, on the variety of life in all its forms.
In terms of applications, most case studies mainly analyzed the construction of animal tracking systems 30 in order to study the migration paths of wild animals, 31 their natural habitat, 32 their movement process, 33 and their behavior. 34,35 Sheep flocks were tracked in Reference 36, in order to optimize farming. In most cases, animal monitoring locations suffer from intermittent or nonexistent connectivity and rely on opportunistic techniques for communication. The most important projects on which we relied were ZebraNet, 37 GreatDuckIsland, 38 and Wildsensing, 39 which constitute the initial WSNs created for monitoring wildlife. 40 In terms of WSN infrastructure developed for the study of fauna, the following experimental approaches were utilized:
• By placing a static monitoring device on the sample tested (eg, collar) and collecting the necessary data via Very High Frequency (VHF) mobile readers. 41 Zebras are used as data mules, that is, mobile nodes for collecting and physically routing sensor measurements, while mobile VHF readers store the collected data to a local laptop, which uploads them to a server later. 42
• By installing a fixed infrastructure of wireless sensors in a preselected location. 43
• By installing a hybrid scheme where fixed infrastructure elements, that is, radio frequency identification readers, interact with tags placed on animal collars, collect data, and upload it via 3G to a server. 44
Throughout this work, we primarily focused on the development of a WSN with MANET 45 features. Our approach is similar in concept to both the second and the third case above, consisting of high-power and low-power WSN nodes: the former is used for untagged animals, while the latter is used for tagged animals. High-power nodes are equipped with VHF readers and the WSN architecture discussed in this article, and are installed at wildlife passages with electrical supply. 
Low-power nodes are lightweight versions, installed in remote passages without power connection. The Hot Spot Analysis-Getis-Ord G* technique 46 calculates the most probable locations where bears may try to cross the Egnatia Motorway, that is, hot spots, based on an analysis of satellite telemetry data, by means of animal radio collars as well as human autopsy related data along the motorway. More specifically, this method calculates the statistical value Getis-Ord G 10,47 for each crossing point, an indication of the degree of accumulation of high value data points inside the study area.
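To make the hot spot statistic concrete, the following is a minimal sketch of the standard Getis-Ord Gi* z-score, assuming a binary neighborhood weight matrix that includes each point itself. The function name, the example values, and the weights are illustrative assumptions; the text does not state which weighting scheme was used for the Egnatia analysis.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every location.

    values:  list of n observations (e.g. incident counts per crossing point).
    weights: n x n matrix; weights[i][j] is the spatial weight between i and j,
             with weights[i][i] included (that is what the star denotes).
    """
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - x_bar ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - x_bar * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical counts at five consecutive crossing points along a road.
counts = [1, 1, 2, 9, 10]
# Assumed binary weights: each point neighbors itself and adjacent points.
w = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
z = getis_ord_gi_star(counts, w)
```

Large positive scores mark clusters of high values (hot spots), while large negative scores mark clusters of low values.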
Special attention should be given to fence trespasses (383 incidents), motorway crossings by radio tagged animals (36 incidents), and road accidents (21 incidents). The used WSN nodes interact both with tagged and untagged animals. Tagged animals are bears equipped with either commercial or custom radio-collars, developed in the ALPINE project. 48 As the radio-collar transmits a unique identifier, both the individual and its species can be recognized. Contrary to this, untagged animals bear no technological elements and constitute the majority of passers. They are classified only in respect to their species using the work described in this article. It is noted that this study takes into account recent studies regarding large passage monitoring, 49 animal tracking, 50 long-term environmental monitoring, 51 and behavioral inference. 52,53 As far as we are aware, the only alternative approach to species recognition of untagged animals is using image processing, for example, References 54,55; however, this is usually performed online, using static images taken as snapshots, and it is error-prone. Finally, in terms of mathematical analysis, we studied the usage and effect of pattern classification of NNs 56 and clustering algorithms. 57,58

SYSTEM ARCHITECTURE AND MATERIALS
In the following section, we present the basic structure of the concepts and methodologies this work is based on. These include (a) the custom WSN architecture and (b) a knowledge representation model consisting of the dimensions of passers and the mathematical equations used to calculate these from sensor measurements.
The system architecture consists of seven WSN nodes, one gateway, a minicomputer equipped with a 3G router and AC/DC converter. The most important component of the WSN node is the data collector module, which is responsible for: (a) collecting the data from the sensor interface, (b) transmitting the data wirelessly to the gateway, using the ZigBee technology, and (c) acting as a router among other data collectors in the same network when a remote gateway is used.
Depending on the sensor type used, a different sensor interface is required on the collector side. Two sensor interfaces were implemented, one for distance sensors (Eco Prisma MultiSense Dist) and one for weight sensors (Eco Prisma MultiSense Load). The Eco Prisma MultiSense Dist is a serial protocol interface, which utilizes the collector's serial port. The Eco Prisma MultiSense Load is based on voltage measurement using a Wheatstone bridge, whose output is amplified and then connected to the collector's analog input. The layout and schematics for these sensors are shown in Figure 2.
A gateway device, which uses the ZigBee protocol for data transport, is required so that the WSN nodes, which act as collectors, can communicate with the minicomputer. The gateway uses the same Processing Unit (MSP430). The minicomputer aggregates the sensor measurements from the gateway and forwards them via 3G to the central repository for further processing. The use of 3G was deemed acceptable, due to the lack of 4G cellular data coverage in the remote locations along the highway where the majority of passages were located. Moreover, the bandwidth required by the sensors' data was fairly low (ie, an XML file for a sensor's measurement is less than 1 KB), thus no problems were observed in the flow of raw data. For the minicomputer role, an Intel NUC DN2820FYKH was selected, due to its minimal size. Furthermore, given the lack of both wired and wireless local area networks, a 3G/GPRS router was acquired. This leverages the provided cellular network to create a local wireless network, in order to forward data to the repository as described above. The sensors are presented in the following sections.

Sensor properties and positions
As previously stated, seven sensors S1… S7 measuring distance and weight have been installed in K81, the case study passage of this article: four distance sensors and three weight sensors, described and presented in Figure 3. Distance sensors are generally categorized into sonar and beam models. For this specific passage, due to its morphology, we selected sonar sensors to ensure better coverage of the individual passer, which is required, for instance, when only part of the passer's body (for example, its torso or its legs) is observed by the sensor due to the passer's mobility. With respect to weight sensors, S Type Load Cell sensors were selected, with a maximum weight capacity of 500 kg and an approximate overload of 700 kg. Throughout their installation, special attention was paid to several factors, such as calibration and pressure. To simplify the 3D visualization shown in Figure 3, Table 1 presents the sensors used, their type, and the place where they were installed. The distance sensors selected were weather-resistant, high-performance, precision ultrasonic components, and the weight sensors were compression/tension load cells, designed to measure a specific type of force and to ignore other applied forces. All information regarding these types of sensors (range, power consumption, voltage supply, dimensions) can be found in the datasheets for the distance 59 and weight 60 sensors, respectively. The main objective when placing the sensors was to extract valuable information regarding the crossing of a passer through the tunnel. To measure the passer's height, a distance sensor was placed at the ceiling of the passage, near its entrance, so as to ensure its instant activation. Three weight sensors were installed before the entrance of the tunnel, in order to measure weight. 
It should be noted that the specific number of sensors placed in the tunnel was calculated with respect to the width of the passage's narrow entrance and each sensor's dimension, so as to cover the tunnel's length and width and to ensure that the passer will step on at least one weight sensor, generating a value. Furthermore, width was measured by two distance sensors, diametrically placed on the walls, at equal distances from the entrance and simultaneously providing values. The measurement regarding width and height is shown in Figure 4. Finally, the measurement of length was acquired by placing an additional distance sensor on the left hand-side wall, further along the passage at the same height as the opposing width sensors. A graphic display of the sensors' positions in 3D appears in Figure 3.

Sensor actual values
First and foremost, our task was to select an optimal sample rate. To improve the efficiency of our dataset, the sampling needed to be frequent enough for the system to detect a passer with a high walking speed, without inserting multiple duplicate values. For example, consider the case of a male wolf that paces at 5 km/h (the reference point is its front legs) and therefore traverses the tunnel at approximately 4968 m/h (1.38 m/s), as shown in Figure 5. It is assumed that the point of entrance is shown by the arrow and that the wolf will keep walking straight on. Furthermore, its distance from the passage walls is 62.5. Based on its speed, it would take the wolf 1 second to cross 1.38 m and 3 seconds to leave the sensor measurement operating area, as shown in Table 2. Each row corresponds to a sensor measurement at a given time. For example, when the passer crosses the tunnel (t = 1 second), the first set of sensors triggered are the weight sensors. Due to its size, only the left and middle weight sensors are triggered (Sensor ID: 5,6). Later on, due to its walking speed, the passer has crossed the weight detection area and has entered the line of sight of the distance sensors, thus triggering the sensors measuring width (Sensor ID: 1,2) as well as the height sensor placed on the ceiling (Sensor ID: 4). For this step, the mathematical formulas used to calculate the actual values are shown in Figure 4. Finally, in the next sampling, the wolf will have crossed the entrance and will trigger the length sensor (Sensor ID: 3). From this case it is evident that it is necessary to group our data into tuples of passer dimensions in order to accelerate the subsequent data processing. Several experimental procedures have been conducted, involving various animals moving at different speeds, with different points of entrance and random mobility patterns. 
This has led to the following finding: the optimal sampling interval should be set at 0.5 second. However, a sampling interval of 1 second was selected in the end, because it was short enough to capture high speeds and to ensure precision in our measurements, without filling up our database entirely (or the intermediate storage buffer inside the sensor node). It should be noted that this choice favored a low sampling rate that still yields several data samples per crossing over maximal accuracy of the data. Specifically, this decision balanced the need to collect an adequate number of data samples for evaluation against the need to avoid heavy, resource-intensive tasks that would "stress" the minicomputer used to aggregate the sensors' data.
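The arithmetic behind the wolf example can be checked with a short sketch. The zone length of 4.2 m is an assumption (roughly the 3 × 1.38 m implied by the text), and the function names are illustrative rather than part of the deployed system.

```python
import math

def kmh_to_ms(speed_kmh: float) -> float:
    """Convert a speed from km/h to m/s."""
    return speed_kmh * 1000.0 / 3600.0

def samples_in_zone(speed_kmh: float, zone_length_m: float, interval_s: float) -> int:
    """Count the full sampling intervals that elapse while a passer
    walking at constant speed is inside the sensing zone."""
    time_in_zone_s = zone_length_m / kmh_to_ms(speed_kmh)
    return math.floor(time_in_zone_s / interval_s)

ZONE_LENGTH_M = 4.2  # assumed length of the sensor operating area

wolf_ms = kmh_to_ms(5.0)                                # about 1.39 m/s
at_one_second = samples_in_zone(5.0, ZONE_LENGTH_M, 1.0)
at_half_second = samples_in_zone(5.0, ZONE_LENGTH_M, 0.5)
fast_passer = samples_in_zone(15.0, ZONE_LENGTH_M, 1.0)  # hypothetical faster animal
```

Under these assumptions a 1 second interval still yields a few samples for a walking wolf, while halving the interval doubles the number of rows stored per crossing.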

Pilot application
A WSN that complies with the above design was implemented and deployed in Passage K81, which is in an area covered with trees and bushes, around a sand path with rocks. Sensors were enclosed in protective waterproof cases, so as to avoid vandalism or damage by natural causes. Although the sensors used were low-power components that did not need a large electrical supply, they were powered by the high voltage load line already installed in that section of the Egnatia Highway. The system was used for a duration of 1.5 months, from October 26, 2015 at 11:40:29 to December 9, 2015 at 18:02:49, resulting in a real dataset. In this article, we shall refer to this dataset as the 7-Input Dataset, as it consists of the values of all the sensors presented in Table 1. A set of data points from this dataset that correspond to a human passer is presented in Table 3. From these values, three-input tuples of the format <width; height; weight> can be generated, as shown in Table 4.

TABLE 3 The raw data provided for a human passer by the Wireless Sensors Network for the seven sensors of weight

It should be noted that, during WSN deployment, certain corrections were applied to the equations so as to reflect real conditions. For example, the passer's height should be computed as the difference between the height of the passage (350 cm) and the sum of the observation of the height sensor S4 (value I, time) (the distance between the height sensor and the human head), the height of the weight sensor box on which the person applies pressure, and the height of the sensor box installed on the ceiling. The height variable for flying species has not been included, because, if we incorporate it in our system, chaotic results are noted, due to the installation of a sole height sensor and outliers in its value (near zero when flying close to the ceiling and near the maximum value when flying closer to the ground). The height of flying species varies, depending on several factors, for example, wind and the animal's age. This inconsistency is crucial, since a flying species' height may be associated with other animals. Finally, we separate species by using clustering algorithms and comparing these results with the NNs, and the attribute of height is a decisive factor in predicting flying units; however, a precise value is not necessary. As a result, the ceiling is set as a starting point, in order to calculate the initial value of each measurement. This technique results in a variety in our data mining algorithms, in accordance with the height of the flying object.
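The height correction described above can be written as a small function. This is a sketch only: the 350 cm passage height comes from the text, whereas the two box heights and the example S4 reading are hypothetical values chosen for illustration.

```python
PASSAGE_HEIGHT_CM = 350.0  # passage height stated in the text

def passer_height_cm(s4_reading_cm: float,
                     weight_box_cm: float,
                     ceiling_box_cm: float,
                     passage_height_cm: float = PASSAGE_HEIGHT_CM) -> float:
    """Passer height = passage height minus the sum of
    (S4 distance to the passer's head + weight sensor box height
     + ceiling sensor box height)."""
    height = passage_height_cm - (s4_reading_cm + weight_box_cm + ceiling_box_cm)
    if height < 0:
        raise ValueError("inconsistent reading: computed height is negative")
    return height

# Hypothetical inputs: S4 reports 160 cm to the head, the weight sensor box
# is 5 cm high and the ceiling sensor box is 10 cm high.
human_height = passer_height_cm(160.0, 5.0, 10.0)
```

With these illustrative numbers the computed height is 175 cm; a negative result signals readings that cannot belong to a real passer.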

DATA FUSION
In this section, we present basic notions regarding the storage of information as well as the structure of the XML files that provide the sensors' measurements. In addition, we demonstrate the pattern NN (PNN) as well as the clustering results that were extracted alongside it, for the same data group, for the evaluation and verification of the PNN. The equations used to fuse all measurements into one mathematical model, and more specifically to go from raw sensor data to the kind of passer, are briefly presented in the following sections. During database construction, three tables were created for the storage of information. The first, Sensors, lists the seven sensors of Table 2 and provides a description of their type (weight/distance), name (Sensor 1, 2, and so on), position (longitude and latitude), and manufacturer. The second table, Units, contains a description of the units of the variables measured by Sensors (eg, kg, cm). Last but not least, the table Sensor Measurements contains raw sensor data and has the fields date-stamp, time-stamp, and value. The three tables are linked to each other with foreign keys, so that the units and the type of each sensor can be queried along with the sensor measurements.
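The three-table layout can be sketched as follows. The sketch uses Python's built-in sqlite3 module in memory as a stand-in for the MySQL database of the actual system, and all column names and example rows are assumptions derived from the field descriptions above.

```python
import sqlite3

# Assumed column names; the article only names the tables and their fields.
SCHEMA = """
CREATE TABLE units (
    unit_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL                 -- eg, 'kg' or 'cm'
);
CREATE TABLE sensors (
    sensor_id    INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,           -- eg, 'Sensor 1'
    type         TEXT NOT NULL CHECK (type IN ('weight', 'distance')),
    longitude    REAL,
    latitude     REAL,
    manufacturer TEXT,
    unit_id      INTEGER NOT NULL REFERENCES units(unit_id)
);
CREATE TABLE sensor_measurements (
    measurement_id INTEGER PRIMARY KEY,
    sensor_id      INTEGER NOT NULL REFERENCES sensors(sensor_id),
    date_stamp     TEXT NOT NULL,
    time_stamp_ms  INTEGER NOT NULL,
    value          REAL NOT NULL
);
"""

def build_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

conn = build_db()
conn.execute("INSERT INTO units VALUES (1, 'cm')")
conn.execute("INSERT INTO sensors VALUES (4, 'Sensor 4', 'distance', NULL, NULL, NULL, 1)")
conn.execute("INSERT INTO sensor_measurements VALUES (NULL, 4, '2015-10-26', 42029000, 160.0)")

# Query a measurement together with its sensor type and unit via the foreign keys.
rows = conn.execute(
    """SELECT s.name, s.type, m.value, u.symbol
       FROM sensor_measurements AS m
       JOIN sensors AS s ON s.sensor_id = m.sensor_id
       JOIN units   AS u ON u.unit_id = s.unit_id"""
).fetchall()
```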
The communication between our system and the database was established via a local network, created with the tool XAMPP. 61 MySQL was chosen for the database implementation, the Transmission Control Protocol for connecting to it, and UTF-8 Unicode (utf8) as the database character set. Furthermore, in the Sensor Measurements table, all values are floating-point numbers and the time-stamp is measured in msec. Originally, raw sensor data were provided by the WSN technology owner in XML files. In Table 5, we present the basic structure of one of the 455 XML files used during our experiments. The XML files were mined using Java and, in particular, the XStream library, 62 which is mostly utilized for the serialization and deserialization of objects. A class was defined to hold the parsed values of the XML files, and a Java object was then generated for each XML field, for example, node, sensor, data register. Using SQL statements, the derived objects were stored in the appropriate tables, thus populating the database.
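The parsing step can be illustrated with a short sketch. The original pipeline used Java with XStream; the version below swaps in Python's standard xml.etree.ElementTree module. Apart from the TIME element, the XML layout shown here (the node, sensor, data, and VALUE tags and their attributes) is a hypothetical stand-in, since the real file structure is given only in Table 5.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical file layout; only <TIME> is confirmed by the article.
SAMPLE_XML = """
<node id="K81">
  <sensor id="4">
    <data><TIME>42029000</TIME><VALUE>160.0</VALUE></data>
  </sensor>
  <sensor id="5">
    <data><TIME>42028000</TIME><VALUE>78.5</VALUE></data>
  </sensor>
</node>
"""

@dataclass
class Measurement:
    node_id: str
    sensor_id: int
    time_ms: int
    value: float

def parse_measurements(xml_text: str) -> list:
    """Turn one XML document into a flat list of Measurement objects."""
    root = ET.fromstring(xml_text.strip())
    measurements = []
    for sensor in root.iter("sensor"):
        for data in sensor.iter("data"):
            measurements.append(
                Measurement(
                    node_id=root.get("id"),
                    sensor_id=int(sensor.get("id")),
                    time_ms=int(data.findtext("TIME")),
                    value=float(data.findtext("VALUE")),
                )
            )
    return measurements

parsed = parse_measurements(SAMPLE_XML)
```

Each resulting object maps directly onto one row of the Sensor Measurements table.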

DATA PREPROCESSING
After the storage of all available data, a view was constructed in order to retrieve the complete dataset. Each row in the query output corresponded to a unique measurement and was organized into seven columns, displaying the value of each sensor. Next, the dataset needed to be divided into "commutes", so that the data for each commute could be used for model training. However, this was not straightforward, as the number of data points for each commute was not fixed. This occurred because distance sensors were triggered several times during each passing, for as long as the passer was moving within their activation zone. Moreover, time is not a safe criterion, because a passer may have an extremely high or low walking speed, thus remaining in the sensors' line of sight for a short or an extended period of time. In the first case, fewer data points than necessary were produced, while in the second case multiple observations occurred per sensor. Weight sensors were only activated at the beginning of each measurement, thus providing a point of reference, except for the slim possibility that a passer might step over these sensors, due to its stride and pace. This is the main reason why it was necessary to preprocess the raw data from the sensors. More specifically, three weight sensors were placed at the passage entrance, in order to cover its full width. The following scenarios were examined: (a) none of the sensors was pressed, (b) one sensor was pressed, and (c) one sensor was partly pressed. A simple way to check these cases was to preprocess these values based on real case scenarios, taking into account the dimensions provided by the environmental nongovernmental organization. 1 An example of an animal's dimensions is presented in Figure 6. In addition, another decisive factor for recognizing a passer is mining the raw data to produce the number of commutes. 
In order to achieve that, we based our data processing on timestamp and length references. The first provided the necessary information to group a set of data in a bundle and the second confirmed this assumption. We also examined different scenarios, such as an animal staying at the passage's entrance for a long time or traversing at high speed, the possibility of data loss (ie, an XML file not sent/received), and false data caused either by voltage drops or by random small objects (eg, leaves), given that the region's climate is windy. We created a dataset from all the provided data, thus including duplicate entries; however, we excluded wrong values, for example, negative values for length. For instance, animals' height varies, as it is measured from head to tail when entering or exiting a passage; thus, we decided to calculate the values from the first 2 seconds, that is, from their head and torso.
Based on the above, in order to delineate the data into commutes, our data were afterwards sorted in descending order, based on their date-stamp. A problem faced during the examination of the sensors' data was false triggering. For instance, a common case that created noise in the data samples extracted from the sensors was when an animal entered the line of sight but did not pass through the underground passage. This triggered some of the system's sensors. This error would be identified by the lack of length values (Sensor 3 would not be triggered) and by inconsistencies in the height and weight values, which would help the NN recognize a false trigger or the existence of insufficient data for an accurate prediction. It is noted that pattern recognition NNs have been tested and trained to group height and weight values as a vector of continuous values in order to track each animal from head to tail. More specifically, in order to deal with these scenarios, we calculated the mean of all the values per sensor that corresponded to a similar period of time (contiguous timestamps), using the weight measurements (sensors 5,6,7) as a reference point. These calculations were saved in a new table consisting of unique rows and seven columns. It should be noted that the timestamp was extracted from the data samples, as evident from Table 5 (line: <TIME>value</TIME>). Even when we noted a zero value from the weight sensors within a group, we stored it in our final dataset so as to construct a highly comprehensive and analytical model. When the weight was not recorded at all, we delimited each passer commute by a specific, shorter period of time, which we deemed necessary, during which all other sensors would be activated. 
FIGURE 6 An example of an animal's dimensions (a brown bear) provided by the environmental nongovernmental organization 1
Finally, we cross-checked each row of the final table against the full date stamp of these values, ensuring that the values corresponded to the correct month, day, and hour. To illustrate the above, with a theoretical fixed number of seven sensor measurements per commute, our dataset of 455 raw data files would correspond to a minimum of 65 passers. However, the above preprocessing generated 76 different commutes for further evaluation.
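The grouping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reading format `(timestamp, sensor_id, value)`, the `MAX_GAP` threshold, and the function name `group_into_commutes` are assumptions made only for illustration, and a sensor that never fired within a bundle is assumed to contribute a zero.

```python
from statistics import mean

# Hypothetical reading format: (timestamp_in_seconds, sensor_id, value).
MAX_GAP = 2      # assumed gap (seconds) that separates two commutes
N_SENSORS = 7    # seven sensors, hence seven columns per row

def group_into_commutes(readings, max_gap=MAX_GAP):
    """Bundle readings with contiguous timestamps, then average per sensor."""
    # Drop clearly wrong values (e.g. negative lengths) and order by time.
    valid = sorted((r for r in readings if r[2] >= 0), key=lambda r: r[0])
    bundles, current = [], []
    for reading in valid:
        # A gap larger than max_gap seconds starts a new commute.
        if current and reading[0] - current[-1][0] > max_gap:
            bundles.append(current)
            current = []
        current.append(reading)
    if current:
        bundles.append(current)

    rows = []
    for bundle in bundles:
        row = []
        for sensor in range(1, N_SENSORS + 1):
            values = [v for _, s, v in bundle if s == sensor]
            # Assumption: a sensor with no reading in the bundle is stored as zero.
            row.append(mean(values) if values else 0.0)
        rows.append(row)
    return rows

# Illustrative readings only; these are not values from the study.
example = [(0, 1, 50.0), (1, 1, 52.0), (1, 5, 40.0), (10, 1, 30.0), (10, 3, -5.0)]
for row in group_into_commutes(example):
    print(row)
```

Each returned row corresponds to one commute with one mean value per sensor, mirroring the seven-column table mentioned in the text.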

RESULTS AND DISCUSSION
Our first approach and analysis were based on raw data, namely the 7-Input Dataset. At first, the data were clustered based solely on their units (cm, kg), which was accurate in the sense that distance measurements (denoted by cm) and weight measurements (denoted by kg) were clustered together. However, this method was not successful, because neither algorithm could comprehend the connection between the sensors (ie, S5, S6, and S7 as one value) that was derived from our mathematical model, with the exception of the opposing ones (S1 and S2, regarding width), which reported very similar values and timestamps. Consequently, we decided to exclude the units of measurement and recluster the data. During our tests, we did not exclude any outliers from the inputs of our system. Primarily, an open source suite of machine learning software written in Java 63 was utilized to verify the tuple for the 3-input passer dataset, for which we know the "ground truth," and the NN's final recognition result set, for which we do not. It should be noted that the latter refers to processing real data. The clustering results are presented in the following manner: instances of the training set (samples) are shown on the Real axis and the variable of interest (classification variable) on the Imaginary axis. As we know the "ground truth" of our datasets and the prediction of our NN, by associating the sample id with the type of passer (eg, samples 0-21 human, 22-64 bear, and so on) we were able to identify the passer which we were trying to cluster and thus verify the validity of the clustering's centroid.
Each centroid consists of three attributes: width, height, and weight, for a dataset of 549 instances. For example, the average bear's dimensions in our examined dataset are: <50.55 cm, 89.51 cm, 130.17 kg>. In order to examine the validity of these results, we implemented the K-means and X-means algorithms for both raw and processed data. The purpose of conducting these tests was to extend a previously proposed NN, which introduced a novel modeling technique for technical BSUP, and to validate its results via supervised and unsupervised learning. This is the main reason why we validated the results by first providing the exact number of passengers (clusters) from the NN and then repeating the test without providing additional information.
For example, when the NN indicated two different species, we conducted an experiment requesting two clusters to group our data (K-means), then repeated the test without providing the number of groups into which the data should be organized. The data processed focused on bears and wolves; however, our study case and dataset, which consist of humans, bears, C lupus, and flying species, were created to train, test, and validate the NN. The procedure followed (database to NN for validation of ground truth) is presented as a flow chart in Figure 7.
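The first step of this validation, clustering with a number of clusters fixed to the NN's species count, can be sketched in Python as follows. This is an illustrative, standard-library version of Lloyd's K-means with Euclidean distance, not the Java suite used in the study; the farthest-first seeding and the sample `<width, height, weight>` tuples are assumptions chosen to keep the example deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain K-means (Lloyd's algorithm) with Euclidean distance."""
    # Assumed seeding: the first point, then the point farthest from all seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Illustrative <width cm, height cm, weight kg> samples, not data from the study.
samples = [
    (50.0, 90.0, 130.0), (52.0, 88.0, 135.0), (49.0, 91.0, 128.0),  # bear-like
    (25.0, 70.0, 40.0), (24.0, 72.0, 38.0), (26.0, 69.0, 42.0),     # wolf-like
]
labels, centroids = kmeans(samples, k=2)  # k taken from the NN's species count
print(labels)
print(centroids)
```

Comparing each centroid with the known average dimensions of a species is the kind of check the text describes for verifying a cluster against the ground truth.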
Moreover, based on Reference 42, we extended the publication with several test stages in order to test extreme cases. First, we performed clustering among flying species (birds and bats) and bears. We underline that this comparison includes two clusters, where both zero and high values are observed in both the weight and height variables. In particular, we cannot measure the weight of flying units, since they do not apply pressure to the weight sensors while flying, hence generating zero weight values (whereas bears often weigh more than 180 kg, leading to high weight values). In addition, we are not able to determine a fixed value for the flying units' distance from the ground. As birds fly at different heights, the values of the ceiling sensor vary: small distances are measured as they get closer to the ceiling and vice versa, whereas the observed values of a bear's height are more standardized, depending on whether it walks across the passage on two legs or on all four. In Figure 8, we present the clustering results using Euclidean distance over 63 data points, where instances 0-20 were flying species, with respect to height. As evident from this figure, instances 0-20 are characterized by either maximum (ceiling height: 340 cm) or minimum (ground level, close to 0) values, thus the clustering algorithm correctly segregates them based on the Height attribute. Due to the accuracy of this prediction, it may be utilized when there is a shortage of data, for example, in cases of animals walking on four legs. In practice, the following questions are raised: (1) In case of a malfunction, how would a four-legged animal be recognized without the weight value? (2) If the first two legs of a male bear (whose length typically exceeds 2 m) press the weight sensors, but it stops moving and its torso is within the range of the sensors, would this produce multiple data samples?
For this purpose, height measurements and a clustering analysis regarding this attribute are presented. Similar tests were conducted regarding the Greek shepherd and the Grey wolf. We executed the X-means algorithm on 63 samples, consisting of 21 Greek shepherds and 42 wolves, using the Euclidean distance. Three clusters were generated, one representing Greek shepherds, one male wolves, and one female wolves, thus correctly distinguishing the different species. However, although our dataset included both male and female Greek shepherds, these were not segregated, because most females were mistakenly classified as wolves. It should be clarified that Shepherds are a dog breed, while wolves are subspecies of C lupus. 64 In Figure 9, we present the clustering results, based on weight.
To cluster raw data (7-Input Dataset), each sample is characterized by seven values, which represent a data sample from each sensor (see Table 1 and Figure 3). We had to exclude the measurement units (ie, cm and kg) from each value, because measuring units act as a binary identifier, which greatly impacts raw data and alters the prediction. For example, in Figure 10, if we study instances 0-64 and the first cluster (in blue), the clustering method does not clearly segregate the samples and yields only an approximate centroid value to be taken into consideration. The clusters do not reveal the difference between humans and animals, and no logical conclusion could be reached by inspecting their centroids either.
Finally, we exclusively presented figures derived from the X-means algorithm. Similar results could be achieved with the K-means algorithm, since we already knew the number of clusters from the NN's prediction. X-means accurately clusters the data in the majority of cases.

FIGURE 9 Clustering results of the X-means algorithm using Euclidean distance, where orange represents Greek shepherd, green male wolves, and blue female wolves samples, in accordance with the variable of weight
FIGURE 10 Clustering results of the X-means algorithm using Euclidean distance, where no obvious conclusion regarding the samples' clustering can be made, in accordance with the variable of height

CONCLUSIONS
An artificial NN system consisting of two hidden layers was developed. In each case scenario, training required approximately 30-65 iterations, while the maximum number was set at 100, with six validation checks, random data division, scaled conjugate gradient training, and cross-entropy performance, all in accordance with Matlab's standards so as to ensure the quality of results. In addition, for each test the application generated the confusion matrix, the receiver operating characteristic, the training performance, the training state, and the error histogram plot, which were used to assess the validity of the configuration and to ensure that the produced outcome is optimal. A crucial question arises regarding the percentage of the datasets allocated to training, validation, and testing. The testing included a real-time dataset of commutes through the passage, divided into two subsets (in both cases, there were five output classes). The first experiment consists of a dataset of three inputs for each passer (ie, width, height, weight), and the NN results are presented in Table 6. The second experiment contains seven inputs for each passer (ie, the sensors' values), and the NN results are presented in Table 7. These tables were designed to display both the overall behavior and efficiency of each case and the specific parameters. For example, since our system was created to recognize the kind of each passer, the validation rate had to be high in order to ensure that the system produced an accurate forecast. As evident from the previous section, the system developed managed to predict the passers accurately in most cases. It is noted that, even though the results are positive, the above-mentioned system needs to be tested in extreme case scenarios (ie, hot/cold weather, higher percentage of moisture), and the sensors' hardware capabilities need to be tested and expanded in general.
For example, even though we have managed to simulate predictions close to those of camera-vision systems, it is unclear whether the precision and accuracy of the measurements will be maintained in the long run. During our testing over a time span of 2 months, problems occurred with the measurements, specifically with the distance sensors, which were exposed to natural conditions (air, animals touching them, and so on). In the future, an automated control system should be proposed that would monitor the system itself and test the instruments for errors and calibration issues.
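The random data division and the confusion matrix mentioned at the start of this section can be sketched in Python as follows. The 70/15/15 training/validation/test ratios are an assumption (a commonly used default), not figures reported in the text, and the function names `divide_random`, `confusion_matrix`, and `accuracy` are hypothetical helpers introduced only for illustration.

```python
import random

def divide_random(n_samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly split sample indices into training, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],  # the remainder becomes the test set
    )

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly recognized passers."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

train_idx, val_idx, test_idx = divide_random(100)
print(len(train_idx), len(val_idx), len(test_idx))

# Illustrative class ids only (e.g. 0 = human, 1 = bear, 2 = wolf).
cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)
print(cm, accuracy(cm))
```

Changing `ratios` is one simple way to explore the question raised above about how much of the dataset to allocate to each subset.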
After the creation of a PNN, we conducted experiments and simulations using both supervised and unsupervised learning techniques. More specifically, we used Simple K-means, a clustering algorithm which assigns the data points provided to the nearest centroid, for a fixed number K of clusters (ie, kinds of passenger), until the system converges or a given number of iterations, which varied from 100 to 1000, is reached. In addition, an extension of this algorithm called X-means was used in order to validate the results of our initial approach. The term X (contrary to K) denotes that the clusters are not predefined but are produced at run time, according to the data provided. In many cases, it could cluster the allocations more efficiently. 57 It should be noted that the system proposed is not resource-intensive. The methods to recognize the kind of passer are based solely on data derived from low-power and low-cost sensors. The same methodology can be implemented and simulated using cheaper sensors or microcomputers, such as Raspberry Pi. This is very important, given that the use of low-cost, small-size, and low-power sensors in tunnels is not currently promoted as a durable solution. The majority of recent literature focuses on implementing image-based solutions for tracking animals 65 and passers. 66 There are few existing alternative solutions 67 which provide accurate results. However, they present limitations due to their high cost. Compared to a WSN, a thermal/night vision camera or a visual sensor network requires more electric power for its operation and additional computation power to process and store the images, even when they are compressed.
While future solutions focus on real-time data streaming via distributed interconnected networks of remote cameras, 68 this article suggests the use of simple WSN approaches, especially in small or medium sized mitigating structures. Besides installing the actual sensors, most of the methods used (ie, data preprocessing, a simple, small-size NN, and simple clustering algorithms) are not resource-intensive. This system can be expanded in order to monitor wildlife activity in other box-shaped underground areas like caves, tunnels, and so on. Finally, the authors suggest that this system could be used either as a standalone proposal for monitoring animals' mitigation flows or, owing to its low energy consumption and hardware cost, as a means of confirming direct observation of empirical evidence. It is noted that it can be used either as an early and quick solution before setting up an image-trap based sensor network or to reduce the total number of cameras. More specifically, a WSN could be used to collect and categorize data samples. However, in order to assess the validity of our predictions, we could alternatively check the ground truth (ie, the clustering algorithms validating the NN's prediction) via cameras. This would be an interesting future work, as it would not only validate the predictions made, but could also be used to train a dataset and future NNs to make more complex predictions, beyond monitoring wildlife and the natural habitat. For example, crucial factors that could be determined include the local abundance of endangered animals in an area, whether the greenhouse effect has altered the patterns of bears' hibernation, and so on.
Finally, future system expansions should focus on distinguishing the number of passers simultaneously passing through the tunnel, by further developing the proposed solution or adopting a new approach for counting the flocks of animals passing through. The system proposed does not require high bandwidth to transmit the sensors' data. It should be noted that a project currently under development focuses on reducing the cellular data transmission load. Since most tunnels are located under main highways, above which public buses cross daily at specific times, said project will utilize their open Wi-Fi connections. This approach presents challenges, such as the high velocity of the vehicles, which does not allow the transfer of big chunks of data; nevertheless, interconnectivity between sensor nodes can be used in Internet of Things applications. In addition, the authors suggest that a study of bears' behavior when subjected to artificial or natural light (eg, mirror reflection from the surroundings), with respect to the attractiveness of using a tunnel, would be of great interest. Another future project that is already in place is the study of their behavior and specifically their biological impulses (ie, their primal instinct for mating, food), so as to determine whether we can correlate their feelings and needs with our data and the patterns extracted from the passage usage.
The above-mentioned algorithms, techniques, and rationale should be incorporated into an application programming interface (API) in order to assist future researchers. This can be further enhanced and simplified by adopting a cloud-based solution. A container shall be used ad hoc to provide the database schema, an API to call the NN and the clustering algorithms, as well as the database needed to locally create a visualization of the sensors' samples. This tool would be useful to researchers, environmental nongovernmental organizations, and enterprises interested in using a full software-as-a-service solution.

APPENDIX A.
First, as Figures 8, 9, and 10 show, the clustering representations in this publication mainly focus on the X-means algorithm. In order to explain our results, we present further test cases that analyze raw sensor data using a similar algorithm, K-means. The aim of these tests was to present the clustering rationale for raw data samples from all the sensors, as explained in Table 1 and presented in Figure 3, based on the timestamp of the sensor measurements (measurement unit: seconds).
The first clustering results of the K-means algorithm are presented in Figure A1, where we selected two clusters to be produced: cluster 0 (blue) and cluster 1 (red). In cluster 0, the algorithm grouped all measurements deriving from sensors 1, 2, and 3, whereas cluster 1 contains data samples from sensors 4, 5, 6, and 7. Based on these results, we may validate the data provided by the sensors; in Figure 5, it is evident that S1 and S2, which are opposing sensors, have similar values, while the values of S3 are similar to those of S1, because passers usually walk in a straight line. Furthermore, cluster 1 grouped all other measurements together, since they do not have similar attributes.
Afterwards, we repeated this execution, having set the number of clusters to three and then four in order to group the data based on width, height, and weight. It should be noted that, even though the illustrations were different, the logic applied did not change. These test cases are presented in Figures A2 and A3, respectively. Finally, we conducted the same test case using the X-means algorithm and presented the results in Figure A4. Contrary to K-means, X-means receives the minimum and maximum number of clusters and selects the optimal number of clusters to be used to group the data samples. Specifically, we executed the algorithm with the same data samples and provided a minimum of 2 and a maximum of 4 clusters in order to compare our findings with the previous tests. The algorithm provided correct predictions, since it grouped the data into two groups, where cluster 0 (blue) presents the weight sensors' values (A1, A2, A3, and A4) and cluster 1 (red) shows the distance sensors' (A1, A2, A3, and A4) values.
FIGURE A1 Clustering results of the K-means algorithm regarding two clusters, for raw data from all the sensors
FIGURE A2 Clustering results of the K-means algorithm regarding three clusters, for raw data from all the sensors
FIGURE A3 Clustering results of the K-means algorithm regarding four clusters, for raw data from all the sensors
FIGURE A4 Clustering results of the X-means algorithm regarding two clusters, for raw data from all the sensors
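The idea of supplying a minimum and a maximum number of clusters and letting the algorithm pick the best one can be illustrated with the simplified Python sketch below. It is explicitly not X-means: instead of X-means' criterion-driven centroid splitting, it reruns plain K-means for every k in the range and keeps the k with the highest mean silhouette score. The seeding strategy, the function names, and the sample values are assumptions for illustration only.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain K-means with Euclidean distance; returns one label per point."""
    # Assumed seeding: the first point, then the point farthest from all seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette score: higher means tighter, better separated clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton or single cluster scores 0 by convention
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, k_min=2, k_max=4):
    """Stand-in for X-means' model selection: best silhouette within [k_min, k_max]."""
    return max(range(k_min, k_max + 1),
               key=lambda k: silhouette(points, kmeans(points, k)))

# Illustrative one-dimensional readings, not data from the study.
points = [(50.0,), (52.0,), (51.0,), (49.0,), (130.0,), (128.0,), (131.0,), (129.0,)]
print(choose_k(points, k_min=2, k_max=4))
```

With two well-separated groups of values, as in this toy input, the selection settles on two clusters, which is analogous to the behavior reported for the range of 2 to 4 clusters.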
Finally, briefly, after reading this publication, the reader shall understand the rationale behind:
• Remote sensing application for underground box-shaped passages (tunnels)
• Prediction and validation of passers from data derived by a wireless sensor network
• Wildlife monitoring for endangered species (brown bears, wolves, and so on)