Blockchain-enabled secure crowdsensing for trackside infrastructure information collection and validation in railway signalling data preparation

Rail operators around the world are adopting advanced control systems for railway signalling and train protection to improve their safety performance. In these re-signalling projects, information about the already installed signals, point machines, track circuits etc. along the trackside infrastructure is essential for the configuration of advanced signalling equipment. The conventional information collection method, which is based on unmanned aerial vehicles, helicopters etc., has access limitations in terrains such as tunnels, hilly regions, river bridges, dense forests etc. To overcome these challenges, this paper proposes a railway vehicular crowdsensing method in which one group of trains participates in the information collection and another group of trains confirms the validity of the collected information. For information exchange between the trains and the centralized server, this study chooses a permissioned blockchain-based information transaction mechanism to ensure trustworthiness. Furthermore, railway markup language (railML)-based blockchain databases are included for information immutability, crowd information integration, and the automatic execution of data preparation signalling rules. Finally, case studies are carried out to analyse the adequacy of the proposed combination of crowdsensing-based secure trackside infrastructure information collection and validation, permissioned blockchain-enabled information transaction, and blockchain databases in the proposed signalling data preparation process.


FIGURE 1
Typical railway signalling data preparation process
Data modelling of railway infrastructure using railML is demonstrated in projects such as 'Intelligent integration of railway systems' (InteGRail) and 'Automated and cost-effective railway infrastructure maintenance' (ACEM-Rail) [6].
Similarly, in the case of interlocking [7] configuration data preparation, specific signalling rules are applied over the railML information. Overall, railML has been proven to solve the problem of signalling data modelling [7,6]. In the context of the proposed data preparation process, this study also chooses the railML database as the base. Section 2 introduces the background of the conventional data preparation process.

BACKGROUND
The steps involved in a typical railway signalling data preparation process [3] are (a) trackside infrastructure information collection, (b) trackside signalling data design, which includes the application of signalling principles over the collected information in the form of railML, (c) data validation, and (d) data verification. For this data preparation process, there exist, in general, two approaches [3], namely, the single-chain and double-chain processes. The single-chain process mandates the formal method approach [8]. There are two possible variants of the double-chain process [3]. In variant-1, the two chains, i.e. chain-1 and chain-2, use the same specification to generate the same signalling data. Finally, the generated data outputs are compared using another tool. In variant-2, chain-1 generates the output data and chain-2 regenerates the input information from the chain-1 output data. In the end, the input and regenerated input files are compared using another tool. This paper chooses the double-chain variant-1 process because it is simple to realize with common specifications.
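The comparison step of the variant-1 double-chain process can be illustrated with a minimal sketch. It assumes that each chain delivers its signalling data as a nested dictionary; the function names, the field names and the use of a SHA-256 digest for the comparison are illustrative assumptions and are not taken from [3].

```python
import hashlib
import json


def canonical_digest(signalling_data: dict) -> str:
    """Hash a chain's output so that key order does not affect the result."""
    payload = json.dumps(signalling_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_chains(chain1_output: dict, chain2_output: dict) -> bool:
    """Return True when both chains produced the same signalling data."""
    return canonical_digest(chain1_output) == canonical_digest(chain2_output)


# Hypothetical outputs of chain-1 and chain-2 for the same specification.
chain1 = {"signal": {"id": "i_01", "track": "tr_01", "position_m": 10.21}}
chain2 = {"signal": {"track": "tr_01", "position_m": 10.21, "id": "i_01"}}
chain2_defect = {"signal": {"id": "i_01", "track": "tr_01", "position_m": 10.12}}

print(compare_chains(chain1, chain2))         # same content, different key order
print(compare_chains(chain1, chain2_defect))  # position differs between chains
```

A mismatch only shows that the two chains disagree; locating the defective chain still requires the later validation and verification steps.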
As shown in Figure 1, [3], the variant-1 double chain process is realized in four layers as (a) the information collection layer, (b) optional intermediate data verification layer, (c) data design and validation layer, and (d) data verification layer. Arrows in Figure 1 represent sequential information flow from one layer to another layer from bottom to top. Several step numbers are indicated inside the 'yellow' circle.
At first (Step 1 of Figure 1), the information about the trackside signalling infrastructure is collected either manually or automatically. For manual information collection, the data design chain-1 and chain-2 teams (Figure 1) refer to the existing track and signalling plan drawings that are provided by the railway operators. The automatic information collection is carried out using helicopters [9] (Figure 1), Unmanned Aerial Vehicles (UAVs) [10], drones [11] (Figure 1), or airborne laser scanning [12]. The major drawback of these airborne vehicles is their limited access across terrains such as tunnels, hilly regions, river bridges, dense forests etc. Hence, railway survey systems such as GPSinfradat use the Global Navigation Satellite System (GNSS) along with inertial sensors, radar, and high-speed cameras [13]. The collected information is stored in the form of railML along with camera data [13]. Similarly, Fugro's train-mounted survey system [11] measures the absolute track position and collects laser point cloud information of the entire rail corridor for the generation of topographical survey information.
Secondly (Step 2 of Figure 1), the optional intermediate information verification rules layer checks the correctness of the collected information instantaneously. This check can be done either manually by applying information verification rules or automatically by using tools [8]. Once the information from both chains is consistent, it is declared valid for further usage. Otherwise, the information collection layer is alerted for a possible re-collection. Thirdly (Step 3 of Figure 1), the data design is a key step in which the dedicated ERTMS or PTC signalling principles/rules are applied over the collected information. This layer also includes a data validation to perform tests on the designed data.
Finally (Step 4 of Figure 1), the formal data verification is completed using the dedicated tools, for example, [8]. This step helps to ensure whether the designed data satisfies the properties of the system under deployment for a specific project.
Though the existing data preparation method caters to the current needs, there is still scope for improvement. The main drawbacks are: (a) mostly manual execution, (b) sequential execution (bottom-up), (c) each step is distinct, (d) mostly offline, (e) time-inefficient, as a defect is detected only in later stages, which may escalate the cost, and (f) lack of provision for concurrent information integration even when the signalling upgrade is executed by multiple contractors over different railway lines across the country.
With the major drawbacks on one hand and the recent technological evolutions, on the other hand, this paper proposes a novel railway signalling data preparation method. The purpose and significance of adopting specific technologies into the conventional process are detailed in Section 3.

PURPOSE AND SIGNIFICANCE
To overcome the drawbacks that exist in the conventional data preparation process, this paper proposes, for the first time, a novel data preparation process that uses the combination of real-time (a) decentralized information collection, (b) secured information transaction, (c) centralized information processing and integration into railML immutable databases, (d) decentralized guided information validation, and (e) the automatic application of conventional data design, validation, and verification rules.
In the context of decentralized real-time information collection (a) and guided validation (d), the crowdsensing [14][15][16][17] can be the real enabling technology. In general, crowdsensing [14] refers to a technique where 'a large group of individuals with sensing and computing devices collectively share data and extract information to measure and map phenomena of common interest'.
The crowdsensing can be broadly classified into two categories, namely, mobile crowdsensing [14] and vehicular crowdsensing [15] based on how the information is collected.
The current generation of smartphones has evolved from a conventional communication device into an edge computing machine in the context of mobile crowdsensing of environmental, infrastructure, and social application data [14]. Due to the wide and cheap availability of smart mobile phones, mobile crowdsensing has almost negligible deployment costs [18]. Even though the applications of mobile crowdsensing range widely from healthcare, social, e-governance, education, and environmental monitoring to transportation, the challenges are also huge [19]. Addressing these challenges, several frameworks, architectures, and design strategies were proposed [20,21] for mobile crowdsensing. In the other category, vehicular crowdsensing was demonstrated in road vehicles for non-safety-critical applications such as parking navigation, road surface monitoring, traffic collision reconstruction etc. [15]. Compared to mobile crowdsensing, railway vehicular crowdsensing may be the better candidate for the information collection, considering the limited access to or awareness of railway infrastructure among typical mobile phone users.
The future of rail data collection [11] at Network Rail involves complex technologies such as inertial navigation systems, satellite positioning, photogrammetry, mobile data connections and the latest high-speed laser scanning systems. This automation of data collection not only provides added value but can also have a big impact on railway surveying [11]. At the same time, the major problems associated with geospatial engineering's fast-changing software applications, data storage and data security also need to be addressed [11]. Network Rail collects detailed information about its track and the surrounding features, such as bridges and tunnels [22]. The information is then analysed to assess clearances between trains and the infrastructure around them, which is vital to safety.
Though major advances have been made in these crowdsensing methods, their adoption in railway safety-critical application data preparation is still impracticable in its current form due to issues of crowd trustworthiness, information trustworthiness, transaction security, integrity, storage security etc. [23][24][25][11]. However, these typical problems can be addressed through the use of blockchain technology [23,26]. By definition [23], 'blockchain is the mechanism that allows transactions to be verified by a group of unreliable actors. It provides a distributed, immutable, transparent, secure, and auditable ledger'.
By considering the possible improvements in the traditional data preparation process and the concerns in crowdsensing, this paper proposes a novel railway vehicular crowdsensing that incorporates the blockchain-based transaction (b) for decentralized real-time information collection and validation.
Depending on the access rights, the blockchains can be broadly categorised into public or permissionless blockchain, private or permissioned blockchain and hybrid type [27]. The hybrid blockchain is a combination of the public and private blockchain. In the case of permissionless blockchain, any participant can read, write and participate in transactions and in the consensus without the need for special authentication or authorization. The examples of permissionless blockchains are Bitcoin and Ethereum [27]. The major drawback is that it is more prone to attacks such that the anonymous participants can have several identities to influence the consensus.
In the case of permissioned blockchains, authentication is mandatory for any participant to gain access and to use the system resources; examples include Multichain, Hyperledger Fabric, Parity, BigChainDB, InterPlanetary, Corda and Quorum [27]. Permissioned blockchains have a dedicated hardware module to manage the identity, privacy, confidentiality and auditability within the system [27]. In the context of this study, a permissioned blockchain is suitable [28] for two reasons. (a) The participation of the trains in the crowdsensing is by invitation only, i.e. only the registered trains can participate in the proposed crowdsensing method. Thus, each train is identified using a unique pair of private and public keys [28] in the permissioned blockchain. (b) As permissioned blockchains need their own nodes to validate transactions, the proposed work selects a small crowd, i.e. a few trains, to participate in the validation of information transactions.
For information transactions, the permissioned blockchain relies heavily on cryptography, similar to Public Key Infrastructure (PKI), in terms of the usage of private/public key pairs [28]. In comparison, PKI focuses only on how the key is managed and how data is signed rather than on how the data is stored and shared, which can also be handled by a permissioned blockchain. In ERTMS [29], the Key Management Centre (KMC) and PKI are used for the key management and data signature creation. The KMC can play the role of a Certification Authority (CA) in PKI [30]. The KMC generates the valid certificates that guarantee the identity of the ERTMS entities. However, the option of using the KMC as a CA is not available during the information collection stage prior to ERTMS or PTC deployments. Hence, it is required to incorporate the self-generation of keys rather than the use of CAs. Unlike PKI, the major advantage of a permissioned blockchain is that it does not need a CA such as the KMC to issue certificates [31]. Hence, in this study, the equivalent certificates are self-generated by the crowd themselves, and the trust in the collected data is derived from a network of validating crowd.
In the case of centralized information integration (c), the information immutability has to be satisfied, i.e. once the train stores the information into a railML database that should not be altered by other trains. The adoption of blockchain technology into the railML centralized database also can be an answer to this requirement. Hence, the blockchain hash is included in railML [32,33] database to lock the stored information from modification by the trains.
In summary, the five major contributions of this paper to the novel data preparation process are:
1. Railway vehicular crowdsensing-based decentralized trackside information collection to enable multi-train information collection, which can emulate Fugro's train-mounted survey system [11] if it is installed in several trains for simultaneous information collection and validation.
2. Permissioned blockchain-enabled information transactions between the train and the trackside centralized server and vice versa to ensure the data security requirements [11].
3. Blockchain-integrated immutable railML databases at a centralized server for enabling data storage with security [11].
4. Information fusion of railML blockchain raw data using nearest neighbour and probabilistic data association algorithms for enabling the fusion of collected multi-train information.
5. Railway vehicular crowdsensing-based decentralized trackside information validation so as to be compliant with the transaction validation of permissioned blockchains such as Multichain [34].
The rest of this paper is organized as follows: Section 4 describes the architecture of the vehicular crowdsensing enabled, blockchain-enabled trackside railway signalling data preparation process. The various components used in the proposed architecture are also detailed in this section. The case studies along with simulation results are discussed in Section 5. Finally, Section 6 concludes the summary of the study. In the rest of the sections, the term 'crowdsensing' is used in place of a 'railway vehicular crowdsensing'.

ARCHITECTURE OF BLOCKCHAIN-ENABLED RAILWAY SIGNALING DATA PREPARATION
As seen in Figure 2, the proposed signalling data preparation process comprises the following layers:
• Crowd layer

Crowd layer
Crowd layer consists of two groups of trains, i.e. one group of trains acts as the crowd for the trackside infrastructure information collection and another group of trains performs the information validation to realize the permissioned blockchain. This crowd can be a loco-pilot or co-pilot of the train. Furthermore, the train [35] or railway vehicle trolley [36] with automatic cameras and trackside infrastructure object recognition capability [37] can also be a crowd. These rail vehicles will be equipped with the odometer, accelerometer, gyroscope along with GNSS etc. to perform crowdsensing functionalities. The crowd layer is marked in the Steps 1, 8 in Figures 2 and 3.

Crowdsensing information collection layer with passive crowdsensing
As shown in Figures 2 and 3(a), in this layer, the trains (1 to n) collect trackside infrastructure information (Step 2) and send it to the trackside centralized server by participating in passive crowdsensing. In the case of passive crowdsensing [38], 'the crowd shares the information on their own to extract and integrate relevant information regarding a specific task'. As the crowd involved is continually required for signalling information collection and validation, the participatory sensing mode [39] is required. Participatory crowdsensing [14] requires 'the active involvement of individual train to contribute to the reporting of trackside infrastructure information related to a data preparation process'.
In this passive participatory crowdsensing, the crowd collects the information, for example, as mentioned in Table 1 [40, 38]. The registered loco pilot or co-pilot can visualize and identify the trackside infrastructure information via the driver machine interface (DMI) to collect or validate the specific information. In another possibility, DMI of the train is programmed to take pictures using dedicated cameras and confirm the trackside infrastructure objects using advanced algorithms [41][42][43]. The neural network [41] or deep learning [42,43] based algorithms at edge device can be used for the classification of specific railway trackside infrastructure objects such as signal etc.
The DMI, sensors, and local processing unit together form an "edge device" as represented in Figure 2(b). In general, these edge devices are responsible for 'information gathering, processing, and filtering of sensor data, as well as data aggregation' [44].

Permissioned blockchain layer with elliptic curve digital signature algorithm
Once the information is collected, it is sent to the centralized server over the permissioned blockchain layer (Step 3 of Figures 2, 3(a)). For this blockchain layer, this paper chooses the Elliptic Curve Digital Signature Algorithm (ECDSA) [45], which is commonly adopted in permissioned blockchains such as R3 Corda [46] and Multichain [34]. In general, ECDSA is a cryptographic algorithm used by permissioned blockchains such as Multichain [34] to ensure that funds can only be spent by their rightful owners. It uses cryptographic elliptic curves (EC) to generate a key pair consisting of a Private key (PrK) and a Shared key (ShK) [47]. Each train (crowd) generates its PrK, which is a secret number that the train keeps confidential. The PrK is a 32-byte (256-bit) unsigned integer which is generated using any "random number generator" within the range of 1 to $2^{256}$. The ShK is a number that is calculated from the PrK, i.e. ShK = ECDSA(PrK), but the reverse is not possible, i.e. PrK ≠ ECDSA(ShK). The ShK has a size of 64 bytes or 512 bits.
The collected information is signed using PrK before it is sent to the centralized server. At first, the double-hash of the information is created by calculating the SHA-256 (Secure Hash Algorithm-256) of SHA-256 of the information, i.e.
Hash of information = SHA-256(SHA-256(information))
Then, the hash of the information is signed using the PrK of the crowd, i.e.
Signature = Sign(Hash of information, PrK)
Once the information is received, the centralized server uses the ShK to verify whether the signature is valid or not, without knowing the PrK (Step 4). The verification is done in two substeps. In the first substep, the hash of the received information is recreated with the same double SHA-256. In the second substep, the signature is checked against this regenerated hash using the ShK; if they match, the information is declared as correct. With the help of signature and verification using ECDSA, the security and integrity issues are resolved. In this way, only the registered crowd can participate in crowdsensing.
Similarly, when the centralized server allocates the information validation task to the individual crowd, it signs the information with its private key. The registered individual crowd, which has the shared key of the centralized server only can participate in this validation task.
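The key generation, double hashing, signing and verification described in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the secp256k1 curve, the function names and the sample message are assumptions, and the random per-signature nonce is a simplification. A deployed system would rely on a vetted cryptographic library with constant-time arithmetic.

```python
import hashlib
import secrets

# secp256k1 domain parameters (assumed curve, not stated in this section).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication k * point."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def generate_key_pair():
    """Return (PrK, ShK): a random secret scalar and the matching curve point."""
    prk = secrets.randbelow(N - 1) + 1  # uniform in 1 .. N-1
    return prk, scalar_mult(prk, G)


def double_sha256(information: str) -> int:
    """SHA-256(SHA-256(information)) interpreted as an integer."""
    inner = hashlib.sha256(information.encode("utf-8")).digest()
    return int.from_bytes(hashlib.sha256(inner).digest(), "big")


def sign(information: str, prk: int):
    """Sign the double hash of the information with the private key."""
    z = double_sha256(information)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce per signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * prk) % N
        if r and s:
            return r, s


def verify(information: str, signature, shk) -> bool:
    """Recreate the double hash and check the signature with the shared key."""
    r, s = signature
    if not (0 < r < N and 0 < s < N):
        return False
    z = double_sha256(information)
    w = pow(s, -1, N)
    point = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, shk))
    return point is not None and point[0] % N == r


prk, shk = generate_key_pair()
message = "Signal|i_01|tr_01|10.21|up"  # sample pipe-separated message
signature = sign(message, prk)
print(verify(message, signature, shk))                       # genuine message
print(verify("Signal|i_01|tr_01|99.99|up", signature, shk))  # tampered message
```

The verifier needs only the message, the signature and the ShK, which matches the requirement that the centralized server never learns the PrK of a train.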

Data design layer with blockchain databases
The major activities performed in this layer (Step 4 of Figures 2, 3(a)) are (a) to push the collected information into the railML blockchain raw database, (b) to evaluate the raw data for outlier detection and retain only the non-deviant information, (c) to use one of the fusion algorithms, namely, single nearest neighbour (NN), k-nearest neighbour (kNN) or probabilistic data association (PDA), to combine the non-deviant information and to upload it into the railML blockchain database, and (d) finally, to apply the ERTMS or PTC specific signalling principles/rules automatically to transform the railML blockchain information into specific configuration data.
A typical structure of the railML blockchain database is described in Appendix-A.1. The railML blockchain raw database is the subset of the railML blockchain database which stores the collected information from the crowd. As seen in Figure 3(a), the railML blockchain database stores the fused information. As the entire collected information is located on a centralized server in the form of a railML blockchain database, it is easier to apply signalling rules and also perform the data validation and verification.
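The immutability idea behind the railML blockchain raw database can be illustrated with a hash-chained record store. The sketch below is a simplified stand-in, not the structure of Appendix-A.1: the record fields, the class names, the sample timestamps and the choice of SHA-256 for chaining are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, replace

GENESIS_HASH = "0" * 64  # assumed placeholder for the first record


@dataclass(frozen=True)
class RawRecord:
    train_id: str
    information: str
    received_at: str
    previous_hash: str
    record_hash: str


def compute_hash(train_id, information, received_at, previous_hash) -> str:
    """Hash the record content together with the hash of its predecessor."""
    payload = json.dumps([train_id, information, received_at, previous_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RawDatabase:
    """Append-only list of records, each locked to the one before it."""

    def __init__(self):
        self.records = []

    def append(self, train_id: str, information: str, received_at: str) -> RawRecord:
        previous_hash = self.records[-1].record_hash if self.records else GENESIS_HASH
        record_hash = compute_hash(train_id, information, received_at, previous_hash)
        record = RawRecord(train_id, information, received_at, previous_hash, record_hash)
        self.records.append(record)
        return record

    def is_intact(self) -> bool:
        """Recompute every hash; any altered record breaks the chain."""
        previous_hash = GENESIS_HASH
        for record in self.records:
            expected = compute_hash(
                record.train_id, record.information, record.received_at, previous_hash
            )
            if record.previous_hash != previous_hash or record.record_hash != expected:
                return False
            previous_hash = record.record_hash
        return True


db = RawDatabase()
db.append("train-1", "Signal|i_01|tr_01|10.21|up", "2021-01-01T10:00:00Z")
db.append("train-2", "Signal|i_01|tr_01|10.35|up", "2021-01-01T10:05:00Z")
print(db.is_intact())

# Simulate a later modification of the first stored record.
db.records[0] = replace(db.records[0], information="Signal|i_01|tr_01|99.99|up")
print(db.is_intact())
```

Because each hash covers the previous hash, altering one stored record invalidates that record and the consistency check for the whole chain, which is the property the fused railML blockchain database relies on.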

Crowdsensing information validation layer with active crowdsensing
As shown in Steps 5 to 11 of Figures 2, 3(b), the information validation is executed as the 'guided validation', i.e. active crowdsensing. This step is compliant with the transaction validation of permissioned blockchains such as Multichain [34]. In the case of active crowdsensing [38], 'the crowd generates the information based on requests to actively design and collect data for a specific task and then integrate the information' (Step 5). In this paper, a centralized server guides the crowd (train n+1 to 2n of Figure 3(b)) with the information to be validated, i.e. it transmits the upcoming trackside infrastructure information, for example, as per the list in Table 1, over the blockchain layer (Step 6). For each information validation task, the centralized server signs the information to be verified with its private key. The registered individual validation crowd uses the shared key of the centralized server to verify whether the information is intended for it (Step 7). The crowd confirms the trackside information (Step 8) using the button interface provided in the DMI application, either manually or automatically [41][42][43].
Finally, the steps involved in Section 4.3 are repeated by the crowd (Steps 9, 10) to send the "validation" status message, i.e. Pass (1) or Fail (0). The centralized server updates the railML blockchain database with the validation status as Pass or Fail (Step 11). When the number of valid information collections reaches the designed limit (design choice), the validation is confirmed as completed and the corresponding flag is set in the database. This concludes the transformation of the collected information into reliable data. Then, the further steps, namely data validation and data verification, can be executed directly, or the specific signalling rules can be applied before the data validation and data verification steps.
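The completion rule for the guided validation can be sketched as a simple counter of Pass (1) and Fail (0) messages. The class name, the information identifier and the limit of three Pass messages are hypothetical choices; the designed limit itself is a design choice in this study.

```python
from collections import defaultdict


class ValidationTracker:
    """Count Pass/Fail messages per information item until the limit is met."""

    def __init__(self, required_passes: int):
        self.required_passes = required_passes
        self.statuses = defaultdict(list)

    def record(self, information_id: str, status: int) -> bool:
        """Store one status message and report whether validation is complete."""
        if status not in (0, 1):
            raise ValueError("status must be 1 (Pass) or 0 (Fail)")
        self.statuses[information_id].append(status)
        return self.is_completed(information_id)

    def is_completed(self, information_id: str) -> bool:
        """The flag is set once enough Pass messages have been received."""
        return sum(self.statuses[information_id]) >= self.required_passes


tracker = ValidationTracker(required_passes=3)
flags = [tracker.record("Signal|i_01|tr_01", status) for status in (1, 0, 1, 1)]
print(flags)
```

In this sketch a Fail message does not cancel earlier Pass messages; a stricter policy, for example a required Pass ratio, would be an equally valid design choice.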

Data validation and verification rules layers
These layers are the same as in the conventional data preparation process except that all the rules are executed automatically (Steps 12, 13 of Figure 2). The data validation layer performs the automated testing of the reliable data with generic and project-specific rules. The data verification layer further verifies whether the final data adheres to the expected signalling principles for the specific project configuration. These layers are not detailed further as they are almost the same as in the conventional signalling data preparation process except for the automatic execution of rules.

5.1
Case-1: Permissioned blockchain-enabled information transaction and processing to update railML blockchain raw database
The objective of this case study is to numerically demonstrate the substeps involved in permissioned blockchain-enabled information transactions, information processing at the centralized server, and finally, the storage of the properly received information in the railML blockchain raw database as described in Figure 3(a). The following assumptions are made in this case study: (a) Each train which is part of the identified crowd collects the trackside infrastructure information. (b) Each train is fitted with cameras at the top, side, and bottom of the first vehicle for imaging the trackside infrastructure as per Table 1. (c) Each train is equipped with a DMI as an edge device. (d) For each information collection, the convergence is reached by comparing the actual camera image with the reference image [41][42][43]. (e) The image identification, classification, and confirmation are not detailed in this study as they are well studied in the literature, for example, refs. [41][42][43].
Each train sends the collected information in the form of a blockchain transaction by signing it with its PrK (Step 3). For this action, each train uses ECDSA along with cryptographic elliptic curves (EC) to generate the key pair of PrK and ShK, similar to permissioned blockchains such as Multichain [47]. For example, train-1 generates a random confidential private key (256 bits, 64 hexadecimal digits) and the corresponding shared key (512 bits, 128 hexadecimal digits plus a '04' prefix), which is derived from the PrK. In this study, the information for a signal in the 'up' direction on track id = tr_01 of infrastructure id = i_01 at a distance of 10.21 m is represented as,
Train-1 Information for Signal = 'Signal|i_01|tr_01|10.21|up'
In its generic form, the signal information uses the separator (|) between the fields (Infrastructure Type | Infrastructure Id | Track Id | Position | Direction).
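The pipe-separated information format can be handled at the centralized server with a small parser. The sketch below is illustrative only: the class and field names are assumptions derived from the generic form (Infrastructure Type | Infrastructure Id | Track Id | position | direction), and stray whitespace around a field is tolerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracksideInfo:
    infrastructure_type: str
    infrastructure_id: str
    track_id: str
    position_m: float
    direction: str


def parse_information(message: str) -> TracksideInfo:
    """Split a '|'-separated message into its five named fields."""
    fields = [field.strip() for field in message.split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    kind, infrastructure_id, track_id, position, direction = fields
    return TracksideInfo(kind, infrastructure_id, track_id, float(position), direction)


info = parse_information("Signal|i_01|tr_01|10.21| up ")
print(info)
```

Note that the signature covers the exact transmitted string, so any whitespace normalisation of this kind must happen after the signature check, not before it.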
Train-1 calculates the hash of the information using the double SHA-256 algorithm and signs it using its PrK as,
Signature = ECDSA_Sign(SHA-256(SHA-256(Information)), PrK)
The SHA-256 of the train-1 information is formulated as,
SHA-256(Data) = D65F300A396A672FE11FF9310DAE0B904B966C900A7480C18EA970DCD503B7C9
The signature of the train-1 information is then computed and transmitted in the encoded form as,
Encoded Signature with Base64 = MEQCIBeqC6KNDVPwFzPgd34KepLzrIcA4sNGDaf0sl9Spw+EAiA0KIwfxmA9ZRdZSCfgTQ7DC0YsG48rCYyFcMSdQ5p8kg==
Similarly, the encoded form of the ShK is sent to the centralized server as,
Encoded ShK with Base64 = MFYwEAYHKoZIzj0CAQYFK4EEAAoDQgAE9hCSsQuAGFtOplKomxG+auzKja3SfI7zCah
The centralized server uses this ShK to verify the signature and to confirm the blockchain transaction as the valid one. If the information is invalid or cannot be verified with any crowd's public key, then the collected information is rejected, and further analysis is done to check whether any malicious information attack was attempted.
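The Base64-encoded signature above can be unpacked with the Python standard library to recover the two ECDSA integers. The sketch assumes that the encoded bytes follow the common DER layout (a SEQUENCE of two INTEGERs, usually named r and s) with short-form lengths; the helper name is hypothetical.

```python
import base64

# Encoded signature from the case study, with the line-break spaces removed.
ENCODED_SIGNATURE = (
    "MEQCIBeqC6KNDVPwFz"
    "Pgd34KepLzrIcA4sNGDaf0sl9Spw+EAiA0KIwfxmA9ZRdZSCfg"
    "TQ7DC0YsG48rCYyFcMSdQ5p8kg"
    "=="
)


def parse_der_signature(der: bytes):
    """Extract (r, s) from a DER SEQUENCE of two short-form INTEGERs."""
    if der[0] != 0x30 or der[1] != len(der) - 2:
        raise ValueError("not a DER SEQUENCE of the expected length")

    def read_integer(offset: int):
        if der[offset] != 0x02:
            raise ValueError("expected a DER INTEGER tag")
        length = der[offset + 1]
        start = offset + 2
        return int.from_bytes(der[start:start + length], "big"), start + length

    r, offset = read_integer(2)
    s, offset = read_integer(offset)
    if offset != len(der):
        raise ValueError("trailing bytes after the signature")
    return r, s


der = base64.b64decode(ENCODED_SIGNATURE)
r, s = parse_der_signature(der)
print(len(der))  # number of bytes in the DER-encoded signature
print(f"r = {r:064x}")
print(f"s = {s:064x}")
```

Both integers fit in 256 bits, which is consistent with a signature produced over a 256-bit curve.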
The hash is calculated using the "SHA-1" algorithm. The actual time of reception of the collected information is also added while writing it into the railML blockchain raw database. Similarly, each train updates its information as provided in Figure 4. Once the information is integrated into this raw database, then, the next step is information fusion.

Case-2: Information fusion
The objective of this case study is to demonstrate numerically the substeps involved in the integration of information from a group of trains (Step 4). The information fusion is carried out in two substeps. The first substep involves the computation of the information centroid. In general, the centroid is the arithmetic mean position of all the information samples. Secondly, the fusion algorithm is applied over the railML blockchain raw data to reject the outlier information using the computed centroid and to combine only the non-deviant information. The following assumptions are made in this case study: (a) One group of 10 trains (crowd) participates in the information collection. (b) Each train is equipped with onboard sensors such as GNSS, gyroscope, accelerometer etc. as mentioned in [48]. (c) As per [48], a 3σ location accuracy of 1.5 m is achievable for the information from each train.

Information centroid computation
The centroid calculation is initiated only when the number of collected information samples reaches the designed limit. For example, with 100 collected samples per train, the total number of measurements, N = 1000, has to be reached. Consider the signal at the absolute position of 10 m, which is marked with the legend 'Expected Centroid' in Figure 5 and Table 2. To inject the location uncertainty (3σ) around the signal location information, three simulation runs are carried out with the different seed numbers of 0.9, 183, and 3235 to generate the random numbers. The 'Actual Centroid' is calculated by taking the mean value of all N samples and is marked at 9.9837 for Seed (Sd) = 0.9, at 10.0176 for Sd = 183 and at 9.9562 for Sd = 3235 in Figure 5(a, b and c) respectively; these values are listed in Table 2, row N-Mean ('Actual Centroid'). The 'Actual measurements' are also shown in Figure 5 along with their measurement uncertainties (10 ± 1.5 m), which are marked as 'Expected Measurement Uncertainty lines' for the 'Expected Centroid line' and 'Actual Measurement Uncertainty lines' for the 'Actual Centroid'.
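The centroid computation can be reproduced in outline with the Python standard library. The sketch assumes Gaussian measurement noise with σ = 1.5/3 = 0.5 m around the expected position; because the random number generator differs from the one used in this study, the resulting centroid will not match the values in Table 2. The function name and parameters are illustrative.

```python
import random
import statistics


def simulate_measurements(expected_position_m, three_sigma_m, n_samples, seed):
    """Draw noisy position samples around the expected signal position."""
    rng = random.Random(seed)
    sigma = three_sigma_m / 3.0  # 3-sigma accuracy converted to one sigma
    return [rng.gauss(expected_position_m, sigma) for _ in range(n_samples)]


# 10 trains x 100 samples each gives N = 1000 measurements.
samples = simulate_measurements(10.0, 1.5, n_samples=1000, seed=183)

actual_centroid = statistics.fmean(samples)  # the 'N-Mean' of all samples
centroid_error = actual_centroid - 10.0

print(f"Actual Centroid = {actual_centroid:.4f} m")
print(f"Error vs Expected Centroid = {centroid_error:+.4f} m")
```

With σ = 0.5 m and N = 1000, the standard error of the mean is about 0.5/√1000 ≈ 0.016 m, which is the same order as the centroid deviations reported above.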

Fusion algorithms for removal of outlier information
In this step, the outlier samples are eliminated based on the calculated distance between 'Actual Centroid' and collected information. Based on sorted distance (minimum to maximum values), the k-number of nearest neighbours (kNN) to 'Actual Centroid' can be chosen with 'k' as the design choice.
This simulation is carried out with k = 10, 25, 50, 100, 250, 500, and 750 for all three seed numbers such as Sd = 0.9, Sd = 183, and Sd = 3235. However, Figure 5 shows only for k = 100 and the entire simulation summary is included in Table 2. The number of valid measurements per train is marked just below the 'Actual Centroid' line in Figure 5. The number of possibly chosen kNN measurements for k = 100 based 'Expected Centroid' (ideal value = 10) is shown just above the 'Expected Centroid' line.
In Table 2, 'Single NN' denotes the k = 1 case. The 'kNN-Mean' row of Table 2 is computed as the mean value of all kNN samples. In the case of the 'kNN-Prob' values, all kNN samples are fused using their distances from the centroid as the probability. The probability of each sample is calculated in such a way that a smaller distance is given more weightage and vice versa [48].
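The two fusion variants can be sketched as follows. The 'kNN-Mean' function follows the description directly; for 'kNN-Prob', normalised inverse-distance weights are used as one possible way to give nearer samples more weight, which is an assumption since the exact weighting of [48] is not reproduced here. The function names and sample values are illustrative.

```python
import statistics


def k_nearest(samples, centroid, k):
    """Return the k samples closest to the centroid (outliers are dropped)."""
    return sorted(samples, key=lambda x: abs(x - centroid))[:k]


def knn_mean(samples, k):
    """'kNN-Mean': plain average of the k samples nearest to the centroid."""
    centroid = statistics.fmean(samples)
    return statistics.fmean(k_nearest(samples, centroid, k))


def knn_prob(samples, k, eps=1e-9):
    """'kNN-Prob': distance-weighted average; nearer samples weigh more."""
    centroid = statistics.fmean(samples)
    nearest = k_nearest(samples, centroid, k)
    weights = [1.0 / (abs(x - centroid) + eps) for x in nearest]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, nearest)) / total


# Hypothetical measurements of a signal near 10 m, with one outlier at 14 m.
measurements = [9.0, 10.0, 10.2, 9.9, 14.0]

print(knn_mean(measurements, k=1))  # 'Single NN' case
print(knn_mean(measurements, k=3))
print(knn_prob(measurements, k=3))
```

In this toy data set the outlier pulls the centroid to about 10.62 m, yet with k = 3 both variants discard it and fuse only the three samples nearest to the centroid.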
As shown in Figure 5(a), for Sd = 0.9, train-5 and train-1 produce the greatest number of closest measurements around the 'Actual Centroid' line with the count of 15 and 13 respectively. Similarly, for Sd = 183, train-1 and train-6, according to Figure 5(b), and with Sd = 3235, train-10 and train-11, as seen in Figure 5(c), produce the greatest number of closest measurements around the 'Actual Centroid' line.
From Table 2, it is observed that the best fusion accuracy, with the least error of −0.0063 m, is achieved by the 'kNN-Mean' algorithm with k = 500 for Sd = 0.9. Similarly, the 'kNN-Mean' algorithm produces the least error of 0.0106 m with k = 750 for Sd = 183 and −0.0399 m with k = 250/50 for Sd = 3235. It is evident that 'kNN-Mean' with k > 250 and k < 750 shows better results compared to 'N-Mean'. The results of 'kNN-Prob' are inferior to those of 'kNN-Mean'. Interestingly, only with k = 10 are the results of 'kNN-Prob' close to "N-Mean". Overall, the simulation results confirm that a fusion accuracy of 1-4 cm is achievable for the collected information.
Hence, depending on the number of samples available and accuracy needs, the choice of 'kNN-Prob' or 'kNN-Mean' can be made in the context of information fusion. From this case study, it can be concluded that for the least value of 'k', the 'kNN-Prob' algorithm is enough. With the larger value of 'k', it is essential to use the 'kNN-Mean' algorithm to achieve the best accuracy.

Case-4: Permissioned blockchain-enabled information validation
The objective of this case study is to demonstrate numerically the substeps involved in (a) the transaction of the collected information to be validated from the centralized server to another group of trains (train n+1 to 2n), and (b) the transaction of the validation results from that group of trains back to the centralized server, in compliance with the permissioned blockchain. As illustrated in Figures 2 and 3(b) (downward arrow), the centralized server first signs the information to be validated using its PrK and broadcasts it (Step 5). This information is extracted from the railML blockchain database as shown in Figures 3(b), 6, and A.2. The other group of trains receives this blockchain transaction (Step 6). Each train then verifies whether the received information is authentic by using the public key of the centralized server (Step 7).
The steps involved in this case study are similar to those of Section 5.1, except that the PrK and ShK of the centralized server are used. The blockchain transaction is performed between the centralized server and the specific train. Finally, the steps of Section 5.1 are repeated by the train to send the "validation" status message. The details are included in Appendix-A.3.
Once the collected information validation is confirmed through crowdsensing, it can be declared that the signalling information is transformed into reliable data successfully. Then, the further steps on data design, data validation, and data verification can be executed (Steps 12, 13).

CONCLUSION
A novel combination of railway vehicular crowdsensing-based trackside information collection and validation, information transaction over the permissioned blockchain layer between the train and the trackside centralized server, railML blockchain raw database, information fusion, and railML blockchain database is studied in the context of railway signalling data preparation process for the first time. Instead of manual, sequential, conventional signalling data preparation, real-time, automated, concurrent data preparation is proposed to overcome challenges.
To enable concurrent information integration from multiple in-service trains and to ensure information storage security, blockchain-enabled railML databases are included in this paper. For the fusion of multi-train collected information, nearest-neighbour and probabilistic data association-based algorithms are examined. To ensure crowd and information trustworthiness and information transaction security, a permissioned blockchain-enabled transaction mechanism is used for crowdsensing information exchange between the train and the centralized server. Finally, case studies are carried out to confirm the feasibility of implementing the proposed method within reasonable accuracy. The simulation results confirm the probable adaptation of the proposed method in performing secured trackside infrastructure information collection in the context of future ERTMS or PTC solution deployments over tens of thousands of kilometres of railway lines around the world. As a future direction, the addition of more elements and attributes to the railML blockchain databases can be considered. Also, several other information fusion methods can be evaluated in place of the discussed algorithms.

Description of railML blockchain database
The railML blockchain database stores trackside railway signalling information and configuration data for each railway infrastructure area. As detailed in Figure A.1, the railML blockchain XML database has "railML" as its root node, which contains the key element named "railML.infrastructure". The element node "railML.infrastructure" acts as the Merkle root node of a railML file and contains a number of "tracks.track" leaf nodes. Each "railML.infrastructure.tracks.track" element contains a number of elements such as "track.trackTopology", "track.trackElements", and "track.ocsElements". Each "railML.infrastructure.
The child elements of "track.trackTopology", "track.trackElements", and "track.ocsElements" contain by default four attributes: @hash, @id, @pos, and @absPos. The attribute @hash stores the calculated SHA-1 value as per the comment "<!--Hash for" inside the railML file, as shown in Figures 6 and A.2. The attribute @id is used for the unique identification of any railML trackside infrastructure object. The attributes @pos and @absPos represent the relative distance from the beginning of the same track and the absolute location in the track, respectively.
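As an illustration of how the @hash attribute could protect an element, the following sketch computes a SHA-1 value over the other three default attributes and stores it in @hash. The exact byte string that the railML blockchain database hashes is not specified here, so the "id|pos|absPos" concatenation, the element name "signal", the id "sig_01", and the helper names are all assumptions made for illustration only.

```python
import hashlib
import xml.etree.ElementTree as ET


def element_hash(elem):
    """SHA-1 over 'id|pos|absPos' (assumed payload, not the paper's exact rule)."""
    payload = "|".join(elem.get(name, "") for name in ("id", "pos", "absPos"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest().upper()


def stamp_hash(elem):
    """Write the computed digest into the element's hash attribute."""
    elem.set("hash", element_hash(elem))
    return elem


def hash_is_intact(elem):
    """True if the stored hash still matches the element's attributes."""
    return elem.get("hash") == element_hash(elem)


# Hypothetical trackside object with the default attributes.
signal = ET.fromstring('<signal id="sig_01" pos="10.08" absPos="10.08"/>')
stamp_hash(signal)
```

Any later change to @id, @pos, or @absPos without recomputing @hash makes hash_is_intact return False, which is the tamper-evidence property the attribute is meant to provide.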
The element "track.trackTopology" defines the key information which is related to the beginning and end of the track via "track.trackBegin" and "track.trackEnd" elements. This information is either calculated at the first step before the start of actual information collection or fed manually using some tools or collected manually through the crowdsensing technique.

FIGURE A.2
Typical railML blockchain database: (a) "railML.infrastructure.tracks.track.trackElements" element; (b) "railML.infrastructure.tracks.track.ocsElements" element

A.3
Permissioned blockchain-enabled information validation
For information validation also, the Elliptic Curve Digital Signature Algorithm (ECDSA) is used by the centralized server (Step 6), as in permissioned blockchains such as Multichain [47]. For example, the PrK (256 bits, 64 hexadecimal digits) of the centralized server is assumed as,
The ShK (512 bits, 128 hexadecimal digits + '04' prefix) of the centralized server is derived from PrK as,
ShK = 04C18D5EF93C6165C44973F0ED8570F53691C3D47E1025726D8EAC5BE4560E66F88DA4BE6C9C5281442596008B800F2A2DCD5CB96868CF1107B2C8B99AA4699C90
As an example, the fused information about the signal in the 'up' direction on the track id = tr_01 of infrastructure id = i_01 at the distance of 10.08 m is assumed as,
Information for Signal = 'Signal|i_01|tr_01|10.08|up'
For the blockchain transaction, the hash of the signal information is first calculated using the double SHA-256 algorithm and then signed using the PrK of the centralized server as,
Signature = ECDSA_Sign(SHA-256(SHA-256(Information)), PrK)
The SHA-256 of the signal-1 information from the centralized server is determined as,
SHA-256(Data) = 57C179266146634BB64C79806347C77821E3478A932150F3B30EF2FA95E1ACBC
The signature of the centralized server information is formulated using "SHA256withECDSA" as,
Signature = 3046022100B0075C33159416446207A5053AD0EF6CBC788E92B35D5357494FB8779696CFDB022100B577A4FD3E03AF34358BB63A53762D7B9F45BECF3F4537ABA130903B111E3B22
This signature can also be transmitted in the encoded form.
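The double SHA-256 hashing substep can be sketched with the Python standard library as follows. The UTF-8 encoding of the message and the absence of spaces around the "|" separators are assumptions, so the digest produced by this sketch is not claimed to match the hexadecimal value printed above. The ECDSA signing itself is not shown, since it requires a cryptographic library outside the standard library.

```python
import hashlib


def double_sha256(message: str) -> str:
    """Return SHA-256(SHA-256(message)) as an upper-case hex string.

    The message is encoded as UTF-8 (an assumption) before hashing; the
    second hash is taken over the raw 32-byte digest of the first.
    """
    first = hashlib.sha256(message.encode("utf-8")).digest()
    return hashlib.sha256(first).hexdigest().upper()


# Signal information from the example, assumed to contain no spaces.
information = "Signal|i_01|tr_01|10.08|up"
digest = double_sha256(information)
```

The resulting 64-digit value is what would be passed to the signing and verification routines as the message hash.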
The verification of proper reception using the public key (Step 7) is also carried out in two substeps: First, the hash of the information is recreated using SHA-256. Then, the signature verification is performed using the ShK of the centralized server as,
Verification = ECDSA_Verify(SHA-256(SHA-256(Information)), Signature, ShK)
If the verification result is 'true', then the blockchain transaction is declared as 'valid'. Then, the information is split using the "|" separator, and the infrastructure type is identified as 'Signal'. In the end, the infrastructure id and track id are extracted. Each train will look for the @pos attribute value of the signal to confirm its physical presence on the track.
Finally, the steps involved in Section 5.1 are repeated by the train to send the "validation" status message, i.e. Pass (1) or Fail (0), as indicated in Figure 3(b) (upward arrow, Steps 9, 10, 11).
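The splitting and position check performed by each validating train can be sketched as below. This is only an illustration under stated assumptions: the signature is taken to have been verified already, the train's own observation is given as a lookup keyed by infrastructure id and track id, and the 0.05 m tolerance is a hypothetical value, not one taken from the case studies. All names in the sketch are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class SignalInfo(NamedTuple):
    kind: str
    infra_id: str
    track_id: str
    pos: float
    direction: str


def parse_information(message: str) -> SignalInfo:
    """Split 'type|infrastructure id|track id|pos|direction' on '|'."""
    parts = [p.strip() for p in message.split("|")]
    if len(parts) != 5:
        raise ValueError("expected five '|'-separated fields")
    kind, infra_id, track_id, pos, direction = parts
    return SignalInfo(kind, infra_id, track_id, float(pos), direction)


def validate(info: SignalInfo,
             observed_pos: Dict[Tuple[str, str], float],
             tolerance_m: float = 0.05) -> int:
    """Return 1 (Pass) if the train observed the object near info.pos, else 0 (Fail)."""
    seen = observed_pos.get((info.infra_id, info.track_id))
    if seen is None:
        return 0
    return 1 if abs(seen - info.pos) <= tolerance_m else 0


info = parse_information("Signal|i_01|tr_01|10.08|up")
status = validate(info, {("i_01", "tr_01"): 10.10})
```

The returned status is the value that the train would place in its "validation" message before signing and sending it to the centralized server.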