Federated learning–based global road damage detection

Deep learning is widely used for road damage detection, but it requires extensive, diverse, and well-labeled data. Centralized model training can be difficult due to large data transfers, storage needs, and computational resources. Data privacy concerns can also hinder data sharing among clients, leaving them to train models on their own data, which leads to less robust models. Federated learning (FL) addresses these problems by training models without data sharing; only model parameters are exchanged between the clients and the server. This study deploys FL along with YOLOv5l to generate models for single- and multi-country applications. These models achieved 21%–25% lower mean average precision (mAP) than centralized models but outperformed local client models by 1.33%–163% on the global test data.

These methods typically involve training a centralized model on a data set of road damage images. However, there are challenges in developing a robust road damage detection model using this approach. First, training a centralized model for road damage detection requires a large quantity of good-quality and diversified labeled data. The entire data are stored on a central server, requiring a large amount of data transfer, storage space, and computational resources. Second, different municipalities or countries may choose not to share their road damage data due to data privacy regulations such as the General Data Protection Regulation (GDPR) or technical constraints. Consequently, every municipality or country might need to develop its own model by training it on a locally curated data set. This can lead to models that are not generalizable to other areas. For example, a model trained on a road damage data set of Japan may not be able to accurately detect road damage in images from India (Arya et al., 2021). Finally, the data used to train a model may not be evenly distributed across different types of road damage (Arya, Maeda, Ghosh, Toshniwal, & Sekimoto, 2022). This can lead to biased models that cannot accurately detect all types of road damage. These challenges can restrict the development of a more robust road damage detection model, as pointed out by several researchers participating in recent road damage detection challenges (Arya et al., 2020; Arya, Maeda, Ghosh, Toshniwal, Omata, et al., 2022; Behzadian et al., 2022) working toward global road damage detection.
To address similar challenges related to large data transfer, data sharing, and privacy, federated learning (FL; Konečný et al., 2016) is being utilized in various industries. This approach allows multiple clients to collaboratively train a model without sharing their data. Instead of sharing data, only the model parameters are exchanged between the clients and the central server. FL was applied for the first time in a commercial setting by Hard et al. (2018). They used it for mobile keyboard prediction, where FL offered security and privacy advantages for users by training across a population of highly distributed computing devices while simultaneously improving language model quality. Further, in the healthcare industry, where patient data are highly sensitive, Brisimi et al. (2018) used it to develop predictive models from federated electronic health records.
Although FL has the potential to train models without violating data privacy and without the need to transfer data to a central server, there is limited research available on its application in road damage detection. Research is limited to continual learning of potholes on roads (Rahman et al., 2021) and detection of the health of roads (Yuan et al., 2021). To the best of our knowledge, no literature is available that identifies and classifies different types of road damage using FL, or that performs road damage detection for single or multicountry use using collaborative training.
Our research aims to address these existing gaps in the field by making the following contributions:
1. Utilizes the road damage data set 2022 (RDD2022; Arya, Maeda, Ghosh, Toshniwal, Omata, et al., 2022; Arya, Maeda, Ghosh, Toshniwal, & Sekimoto, 2022) to prepare road damage data sets with different data distributions for the deployment of FL for two cases, namely, a single country and multiple countries. Different countries in RDD2022 exhibit different data distributions among their road damage types, and this distinct data distribution was not artificially created.
2. Develops road damage detection models for different areas of a country and for multiple countries of the world by both centralized training and FL approaches.
3. Analyzes and compares the performance of road damage detection models developed using centralized training and FL methods.
4. Identifies the applications and use cases of road damage detection models using FL.
5. Identifies different parameters that affect the training and performance of road damage detection models developed using FL, and studies their effects.
The rest of the paper is organized as follows. In Section 2, object detection methods and FL are discussed. The methodology used in the research is explained in Section 3. Experimental details along with the results are explained in Section 4. Discussions are provided in Section 5. Section 6 finally concludes the paper.

LITERATURE REVIEW
This section provides an overview of the latest object detection algorithms, followed by a review of FL and its related terms and trending use cases. It then discusses the application of FL in road damage detection.

Object detection architectures
An object detection model feeds an image to the network and predicts bounding boxes around detected objects along with confidence scores. It simultaneously localizes and classifies the objects in the image. Object detection architectures can be divided into two main categories: two-stage detectors and one-stage detectors. A two-stage detector first generates a set of candidate object regions and then classifies each region as an object or background based on features extracted from these regions, while regressing the coordinates of rectangular bounding boxes; examples include Faster Region-based Convolutional Neural Networks (R-CNN) (Ren et al., 2015) and Mask R-CNN (He et al., 2017). A one-stage detector directly predicts the bounding boxes and class labels of objects in an image without the need for any region proposals. Examples of one-stage detectors are Single Shot Detectors (SSD) (Liu et al., 2016), RetinaNet (Lin et al., 2017), and You Only Look Once (YOLO) (Redmon et al., 2016) and its variants. The detection speed of two-stage detectors is typically lower than that of one-stage detectors. The YOLO model can achieve a mean average precision (mAP) comparable to other object detection models when tested on the Microsoft Common Objects in Context (COCO) data set (Lin et al., 2014). However, the YOLO model has the added advantage of being faster in detecting objects than other models.

FL: Background
Google introduced FL in 2015 (Konečný et al., 2016). FL is a decentralized method for training machine learning models. It does not require the transfer of data from client devices to global servers. Instead, the model is trained locally using raw data on edge devices, which enhances data privacy. The final model is developed in a shared manner by aggregating the local updates after every round until the model converges.
The concept of FL lies at the intersection of two approaches to training, namely, machine learning and edge computing (Banabilah et al., 2022). Edge computing is a distributed computing model that brings computing processes and resources closer to edge devices, which are the data sources. This can help to address privacy and security concerns, as well as reduce costs and improve operational efficiency. There is no requirement to transfer data to the central server or allocate storage if data are processed at the edge devices. FL is especially useful in edge computing, where data are often stored on thousands of distributed devices, such as smartphones and Internet of Things (IoT) sensors. By leveraging FL, a single model can be trained on the data stored across these devices without ever needing to move the data themselves.
FL is different from transfer learning (TL). TL is used when labeled data are limited or scarce in the target domain but abundant in a related domain. FL, by contrast, is particularly useful when data privacy is a concern, as it allows for model training with decentralized data. TL focuses on knowledge transfer between domains, while FL enables collaborative learning across decentralized data sources.
Different terminologies related to FL are described as follows:
Client: A device or user that participates in FL. Clients typically have their own data that they want to contribute to the training of a model, but they do not want to share their data with anyone else, such as other clients or the central server.
Server: The central server that coordinates the FL process. The server does not have access to the clients' data, but it aggregates and exchanges model updates with the clients while training is being done.
Aggregation: The process of combining the updates from the clients to form a global model. The server is responsible for aggregation, and it ensures that the global model is updated in a privacy-preserving way.
Server round: A single iteration of the FL process. In each communication round, the clients locally train on their own data, send model updates to the server, and receive the updated global model from the server.
FL differs from centralized training of models in the following ways:
(1) Privacy: FL allows training to take place locally on the edge devices, rather than sending data to a central server for processing. This helps to prevent potential data breaches and enhances data privacy.
(2) Data security: In FL, only the encrypted model updates are shared with the central server, ensuring data security. Secure aggregation techniques allow the decryption of only aggregated results.
(3) Access to heterogeneous data: FL provides a way to access data that are distributed across multiple devices, locations, and organizations. It enables the training of models on sensitive data while maintaining security and privacy. Further, access to large and diversified data makes the model more generalizable.
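The averaging step performed by the server can be sketched in a few lines. The following is an illustrative stand-in for FedAvg-style aggregation (McMahan et al., 2017), not the implementation used in the study: each client submits its locally trained weights together with its number of training examples, and the server returns the example-weighted average.

```python
# Minimal sketch of FedAvg-style server aggregation.
# client_updates: list of (weights, num_examples) pairs, where weights
# is a flat list of floats standing in for a model's parameter vector.

def fed_avg(client_updates):
    total_examples = sum(n for _, n in client_updates)
    num_params = len(client_updates[0][0])
    global_weights = [0.0] * num_params
    for weights, n in client_updates:
        share = n / total_examples  # weight clients by data volume
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights

# Two clients with unequal data volumes: the client with more
# examples pulls the average toward its own weights.
updates = [([1.0, 2.0], 300), ([3.0, 4.0], 100)]
print(fed_avg(updates))  # [1.5, 2.5]
```

In a real deployment the weights are full tensors and the exchange is secured, but the weighting by client data volume is the essential idea.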
TensorFlow Federated (TFF) was developed by Google to expand existing TensorFlow or Keras models to a federated setting. It supports TensorBoard and allows for customized aggregation algorithms. The building blocks provided by TFF can also be used to implement nonlearning computations, such as federated analytics.
Flower is an open-source FL framework that is flexible, user-friendly, and easy to understand. It is framework-agnostic and provides comprehensive documentation and useful examples for various machine learning frameworks. It was originally developed as part of a research project at the University of Oxford. Flower is highly customizable, allowing users to easily modify, extend, or override its components to suit their needs, such as changing the FL strategy, aggregation algorithms, or the number of clients. It is compatible with multiple operating systems, can run on a wide range of devices, and can scale to support millions of clients.
PySyft is an open-source FL framework created by the OpenMined project. In addition to being an FL framework, it also serves as a remote data science platform that enables experiments to be conducted on protected data while maintaining privacy through the use of differential privacy and secure multiparty computation. PySyft is not framework-agnostic and only supports the PyTorch and TensorFlow deep learning libraries.
PySyft previously had extensive documentation, including video tutorials, but some of the tutorials may not reflect the most recent version of the framework, making it challenging for new users to use effectively.

Applications and trending use cases of FL
FL has been applied in a variety of public-facing applications, such as Gboard, Android's Google keyboard, which uses it for predictive text (Hard et al., 2018). FL has also been used to enable secure cross-data analysis across healthcare institutions by leveraging data islands. Brisimi et al. (2018) used FL to solve a binary supervised classification problem to predict hospitalizations for cardiac events using electronic health records. Additionally, FL is used in a wide range of applications, such as credit risk assessment (Kawa et al., 2019) in the banking sector and vehicular management (Tan et al., 2020). In the context of autonomous vehicles, the data generated by each self-driving car can be large. This can potentially lead to communication delays and slow response times when transmitting data over the network. In such situations, FL can enable each client to locally train a model and then transfer a compressed version of new parameters or updates to the global model at the server end (Lu et al., 2020).

Road damage detection using FL
The current application of FL in road damage detection is still in the preliminary phase, but several research initiatives are investigating its potential. Some of the research projects examining the use of FL for road damage detection include the following:
(1) Pothole detection using FL: Rahman et al. (2021) have proposed an FL-based pothole detection system that uses accelerometer and GPS data from smartphones to constantly update pothole information by recording information about repaired potholes. The goal is to gather data on potholes in unknown locations on the road, track whether preexisting potholes have been fixed, and warn the driver before approaching a pothole while driving.
(2) FedRD: FedRD (Yuan et al., 2021) is a privacy-preserving adaptive FL framework for intelligent hazardous road damage detection and warning. In this research, hazardous road damage information from a wide area is aggregated into a global map using a new map construction approach. The global map is hundreds or thousands of times wider than those of existing edge-based systems. It detects images with hazardous road damage and then classifies them into three danger levels based on visual severity: low-, middle-, and high-level damage.
Further investigation is required in the road damage detection domain to allow different regions or countries to collaboratively train a road damage detection model without data sharing or data transfer to a central server. Furthermore, such a model should be capable of not only detecting but also classifying different types of road damage.

METHODOLOGY
The methodology consists of five steps: selection and preparation of data sets, selection of FL framework, object detection using FL on Flower, training of road damage detection models, and performance evaluation of FL models.

Selection and preparation of data sets
This research considered two scenarios for road damage detection: road damage detection for a single country and road damage detection for multiple countries. A part of RDD2022 (Arya, Maeda, Ghosh, Toshniwal, & Sekimoto, 2022; Arya et al., 2022) was used to prepare data sets for both scenarios. RDD2022 is a crowdsourced (Arya, Maeda, Ghosh, Toshniwal, Omata, et al., 2022) road damage data set from six different countries, namely, China, India, Japan, the Czech Republic, the United States, and Norway. In the road images, annotations are provided for four types of road damage: longitudinal cracks, transverse cracks, alligator cracks, and potholes. The data set has annotations in a format suitable for training a YOLO model.
For single-country road damage detection, the road damage data set of RDD2022 for Japan was considered. This data set was named the Japan road damage data set (JRDD). For multicountry road damage detection, a part of RDD2022 comprising the road damage data sets of Japan, India, and the United States was considered. This data set was named the multicountry road damage data set (MRDD). While using the RDD2022 data set, only positive images (Saha & Sekimoto, 2022) were considered, that is, only those images that had at least one instance of a road damage type in them. The details about the data sets used are provided in Table 1.
After selecting the data sets for road damage detection, the data sets were prepared for the deployment of FL. To do this, the data sets were analyzed to determine whether they could be split to exhibit independent and identically distributed (IID) and non-IID behavior. For single-country road damage detection, JRDD was used. Since the images of Japan available in the RDD2022 data set are labeled only with the country name and without specifying individual locations, JRDD was randomly divided into equal parts, resulting in IID behavior. For multicountry road damage detection, MRDD was split by country (Japan, India, and the United States). Each country's road damage data set had a different data distribution, resulting in non-IID behavior for the data set. Additionally, MRDD was randomly split into three equal parts such that each country's data were equally represented in each part, giving IID behavior to the data.
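The two partitioning schemes described above can be sketched as follows. This is a minimal illustration, not the paper's preprocessing code; the image records here are hypothetical (country, damage) pairs standing in for annotated road images. The IID split shuffles the pooled images and deals them into equal parts, so every client sees a similar label distribution; the non-IID split keeps each country's images together, so distributions differ across clients.

```python
import random

def split_iid(images, num_clients, seed=0):
    # Shuffle the pooled images and deal them round-robin into equal parts.
    rng = random.Random(seed)
    shuffled = images[:]
    rng.shuffle(shuffled)
    return [shuffled[i::num_clients] for i in range(num_clients)]

def split_non_iid(images):
    # One part per country: each client keeps its own country's images.
    by_country = {}
    for img in images:
        by_country.setdefault(img["country"], []).append(img)
    return list(by_country.values())

# Hypothetical pooled data set: six images per country.
images = [{"country": c, "damage": "crack"}
          for c in ["Japan", "India", "US"] for _ in range(6)]
iid_parts = split_iid(images, 3)
non_iid_parts = split_non_iid(images)
print([len(p) for p in iid_parts])      # [6, 6, 6]
print([len(p) for p in non_iid_parts])  # [6, 6, 6]
```

Both splits yield equally sized parts, but only the non-IID parts are each confined to a single country, mirroring the real-world distribution of RDD2022.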

Selection of FL framework
Different frameworks, such as TFF, Flower, PySyft, and IBM Federated Learning, are available for the deployment of FL. Since the research required performing object detection using YOLOv5l (Jocher et al., 2022), whose open-source code is available in PyTorch, and TFF is limited to applications using TensorFlow and Keras, TFF was not used.
Thereafter, PySyft and Flower, both of which could be used with PyTorch, were shortlisted.Flower was finally used due to its version compatibility with the adopted deep learning model.

Object detection using FL on Flower
The goal of the road damage detection models was to identify and classify the four types of road damage. Centralized models were trained using YOLOv5l as the object detection architecture. Its pretrained weights, trained on the COCO (Lin et al., 2014) data set, were utilized to train the requisite road damage detection models using TL. The hyperparameters and network training parameters are the same as the YOLOv5l defaults. However, vertical flipping was not used, as vertically flipped roads do not appear in real-world data captured by dashboard cameras on vehicles. The loss function used in YOLOv5l is generalized intersection over union (GIoU) and the optimizer is stochastic gradient descent (SGD). YOLOv5l was used because of its successful integration with PyTorch and the FL code of Flower. YOLOv5l was used with Flower for building models using FL, with FedAvg as the server aggregation method. If FL is run for a given number of rounds, the total data exchanged by the server grows with the number of rounds, the number of participating clients, and the model size, since in every round the server sends the global model to each client and receives an updated model back. For centralized training of the model, if it is assumed that the data sources send their data to the central server and the central server sends the predictions back to the data sources, the total data exchanged by the server in centralized learning (CL) is determined by D, the total data from all the data sources.
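The communication-cost reasoning above can be made concrete with a back-of-the-envelope calculation. The 185 MB YOLOv5l model size is taken from the paper; the client and round counts below are illustrative assumptions, and the formulas are a simplified sketch rather than the paper's exact equations.

```python
# Rough server traffic for FL vs. centralized training.

def fl_server_traffic_mb(model_size_mb, num_clients, num_rounds):
    # In each round the model goes down to every client and an
    # updated model comes back up, hence the factor of 2.
    return 2 * model_size_mb * num_clients * num_rounds

def centralized_server_traffic_mb(total_data_mb, predictions_mb):
    # Raw data travels up to the server once; predictions travel back.
    return total_data_mb + predictions_mb

# Illustrative setting: a 185 MB model, 3 clients, 60 server rounds.
fl = fl_server_traffic_mb(model_size_mb=185, num_clients=3, num_rounds=60)
print(f"FL server traffic: {fl} MB")  # FL server traffic: 66600 MB
```

The comparison highlights the trade-off: FL traffic scales with rounds and clients but never includes the raw images, whereas centralized traffic scales with the total data volume D.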

Training of road damage detection models
To ensure a fair comparison between centralized models and FL models, parameters such as image size, batch size, and number of epochs were kept the same for both. Within FL models, different models were trained using different versions of the data set (IID and non-IID) and different combinations of parameters, such as server rounds and local epochs. For single-country road damage detection, different FL models were trained using the IID version of JRDD by varying the number of server rounds and local epochs. For multicountry road damage detection, different FL models were trained using both the IID and non-IID versions of MRDD, again varying the number of server rounds and local epochs. The models with the best mAP were considered for comparison.
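One way to organize such a sweep is to hold the product of server rounds and local epochs near the centralized epoch budget (60 epochs in this study), so that total local computation stays comparable across configurations. The specific combinations below are illustrative assumptions, not the exact grid from the paper.

```python
from itertools import product

# Hypothetical experiment grid: data-set versions x (rounds, epochs)
# pairs whose product matches the 60-epoch centralized budget.
datasets = ["MRDD-IID", "MRDD-nonIID"]
combos = [(r, e) for r, e in product([15, 30, 60], [1, 2, 4]) if r * e == 60]
runs = [(d, r, e) for d in datasets for r, e in combos]
for d, r, e in runs:
    print(f"{d}: {r} server rounds x {e} local epochs")
```

More rounds with fewer local epochs means more communication but more frequent aggregation; fewer rounds with more local epochs saves traffic but lets clients drift further apart between aggregations.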

Performance evaluation of FL models
To compare the performance of centralized and FL models for road damage detection, the main metric used was mAP@IoU = 0.5. This is the metric used to compare different object detection models using YOLOv5l; the higher the value of mAP, the better the model performance (Arya et al., 2021). Various factors, such as data distribution, data volume, server rounds, and local epochs, were considered to evaluate different FL models. FL models were evaluated on the respective test data sets and their performance was compared with that of the clients' models. The volume of data transferred, the number of trainable parameters exchanged during federated training, the training time, and the trade-off between model performance and computation cost were also analyzed.
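Under mAP@IoU = 0.5, a predicted box counts as a true positive when its intersection over union with a ground-truth box of the same damage class is at least 0.5. A minimal IoU computation for axis-aligned boxes in (x1, y1, x2, y2) form is sketched below; the full mAP additionally ranks predictions by confidence and averages precision over recall levels and classes.

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 100, 100), (0, 0, 100, 100)))          # 1.0
# A box shifted by a quarter of its width still clears the 0.5 threshold.
print(iou((0, 0, 100, 100), (25, 0, 125, 100)) >= 0.5)  # True
```

The 0.5 threshold is therefore fairly forgiving of localization error, which suits road damage, where the damage extent itself is often fuzzy.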

EXPERIMENTS
This section provides experimental details and results on the development of FL models for road damage detection in a single country and in multiple countries, and further discusses the various parameters that can impact the performance of FL models. In this research, the centralized and FL models were trained on an Amazon Web Services (AWS) g4dn.4xlarge instance running the Ubuntu 18.04 operating system. It had one NVIDIA T4 GPU with 16 GiB of GPU memory, 16 vCPUs, and 64 GiB of memory.

Deployment of FL on JRDD
As shown in Figure 2, JRDD was randomly divided into three equal parts: J1, J2, and J3. The data of each of the three parts were randomly split into training and testing sets in the ratio of 90:10. For example, a part Ji was divided into a training data set called Ji,Train and a test data set called Ji,Test. The training data sets of all three parts were combined to make a training data set for JRDD called JRDDTrain. This was used to train a centralized model on JRDD. Similarly, the test data sets of all three parts were combined to make a test data set for JRDD called JRDDTest. This was used to test centralized and FL models on JRDD.
Since the data distribution is random and identical, it gives IID behavior to the data set.
For training the object detection model, YOLOv5l was run with an image size of 640 × 640, a batch size of 4, and 60 epochs. For centralized training, two kinds of road damage detection models were trained, as follows:
(1) Centralized model on Japan. This involves training one model with JRDDTrain as the training data set and JRDDTest as the test data set.
(2) Centralized models on individual clients of Japan. This involves three models, as there are three clients. Each client i trains one centralized model on its local data, with Ji,Train as the training data set, and evaluates on Ji,Test as the test data set.
For federated training, an FL model on Japan is trained on the three clients of Japan, that is, on J1,Train, J2,Train, and J3,Train, and evaluated on JRDDTest. Training is done with the same image size and batch size, server rounds set to 60, local epochs set to 1, and FedAvg as the server aggregation method. A comparison between the centralized and FL approaches was performed by comparing the centralized model on Japan and the FL model on Japan. Figure 3a shows that the mAP obtained by FL on JRDD is 21.3% lower than the mAP obtained by the centralized model on JRDD. Thereafter, the performance of the centralized and FL approaches was compared for individual clients in Japan.
This was done by comparing the performance of the centralized models on individual clients of Japan with the performance of the FL model on Japan evaluated on each client's test data, Ji,Test; the results are shown for one client in Figure 3b. The mAP of both models was found to be comparable.
Figure 6a compares the centralized model on MRDD with the FL models trained on the IID and non-IID versions of MRDD. This also illustrates the effect of data distribution, that is, IID and non-IID data, on the models trained by FL. The mAP obtained by the centralized model on MRDD is higher than the mAP obtained by the FL model on the IID version of MRDD, which in turn is higher than the mAP obtained by the FL model on the non-IID version of MRDD. Thereafter, a comparison of the centralized and FL approaches on individual-country model performance was performed by comparing the centralized models on each country of MRDD, the FL model on the IID version of MRDD, and the FL model on the non-IID version of MRDD, each evaluated on the individual country's test data, DTest. The results are shown in Figure 6b, c, and d. When evaluated on their respective countries, the local country models generally obtained higher mAP than both FL models. For the IID version of MRDD, the mAP of the FL model is lower than that of the local country model by 10% in Japan and 15% in India; in the United States, it is slightly better than the local country model, by 2.72%. For the non-IID version of MRDD, the mAP of the FL model is lower than the mAP of the local country model by 12% in Japan, 11% in the United States, and 51% in India.

Deployment of FL on MRDD
The non-IID version of MRDD reflects the real-world data distribution among the countries. To study the feasibility of developing a multicountry road damage detection model, the performance of the FL model on the non-IID version of MRDD and of the centralized models on individual countries of MRDD was evaluated on the MRDDTest data set. The FL model trained on non-IID data performed better than all the local country models, as shown in Figure 7. The gain in mAP by the FL model differs across countries, ranging from 1.33% to 163%. Comparable performance was observed for Japan, while smaller and larger differences in mAP between the federated model and the local country models were observed for the United States and India, respectively.
To study how a country with no road damage data set of its own can benefit from FL models trained on a group of countries, a specific use case of FL was studied. Under this, the performance of the FL model on the non-IID version of MRDD was analyzed on a country that did not participate in the training process. The Czech Republic data of RDD2022 were considered for this. When the centralized model on MRDD and the FL model trained on the non-IID version of MRDD were evaluated on the Czech Republic, they were found to have similar performance. When the evaluation was done on the test data set of the Czech Republic, a local country model for the Czech Republic gave an mAP of 0.26, while the FL model trained on the non-IID version of MRDD gave an mAP of 0.21.

Effect of parameters on FL models
(1) Data distribution: FL models on the IID and non-IID versions of MRDD were trained for different combinations of server rounds and local epochs. These were compared among themselves and with the centralized model on MRDD. It was found that the FL models trained on the IID version achieved higher mAP than those trained on the non-IID version, with both remaining below the centralized model.

DISCUSSIONS
In this study, in addition to preparing data sets with IID behavior, a road damage data set using real-world data from multiple countries was used that exhibited non-IID behavior. The mAP of FL models was found to be lower than the mAP of centralized models, with the difference dependent on whether the data were IID or non-IID. This is consistent with existing findings (Bochie, 2021; Luo et al., 2019; Yu & Liu, 2019; Lai et al., 2022). Data diversity was also found to affect the performance of FL models. In the case of JRDD, the performance of the local client models and the FL model was comparable because the three clients of Japan had similar data volume and distribution. In contrast, for MRDD, variation in mAP could be attributed to factors such as data diversity, data imbalance among classes, and data quality. Japan's data are balanced, while the data from India and the United States are highly imbalanced. The similarity in performance obtained for Japan and the United States could be because both the Japan and US data sets consist of good-quality data. However, the images in the India data set are not of good quality (Saha & Sekimoto, 2022) because of poor image capture due to dust, sun glare, motion blur, and pollution.
For the non-IID distribution of MRDD, it was observed that local country models had better mAP than FL models when evaluated on the test data of their respective countries. However, when evaluated on MRDDTest, all three local country models performed worse than the FL model. This is because the centralized models on individual countries learned features specific to that country but could not generalize well on global test data, as they were not trained on data from different countries. This demonstrates that an FL model trained on data from multiple countries can produce a more robust and generalizable model than a local country model trained on its own data.
This finding is further supported by the performance of models on the Czech Republic. The local country model had an mAP of 0.26 on its own country but only 0.06 on MRDDTest. In contrast, an FL model trained on MRDD had an mAP of 0.21 on the Czech Republic's data, providing satisfactory predictions despite not being trained on its data. Both the centralized and federated models trained on MRDD had similar performance on the Czech Republic, with the federated model performing slightly better. This shows that an FL model can provide useful predictions even for countries that did not participate in its training.

CONCLUSION AND FUTURE SCOPE
This research has demonstrated the potential of FL for collaboratively training road damage detection models across multiple municipalities or countries without the need to share data with a central server. This approach is beneficial in situations where privacy is a concern or where data sharing or data transfer is not possible due to legal or technical reasons. The results showed that when both FL and centralized training are feasible on a given data set, FL models had 21%-25% lower mAP than the centralized models. Within FL models, a model trained on the non-IID version of the data had 16% lower mAP than a model trained on the IID version.
While this research has demonstrated the potential of FL for training road damage detection models across multiple municipalities or countries, there are still some limitations that can be addressed in future work. Due to hardware constraints, the number of clients used in the FL experiments was limited. In real-world deployments, however, a much larger number of clients would be involved. Further experiments with a larger number of clients could provide more insights into the performance of FL for road damage detection. Additionally, including more data from a wider range of countries could improve data diversity and potentially enhance the performance of FL models. Exploring different server aggregation methods, object detection algorithms, and sophisticated supervised machine learning and classification algorithms, such as the Neural Dynamic Classification algorithm (Rafiei & Adeli, 2017a), the Dynamic Ensemble Learning Algorithm (Alam et al., 2020), and the Finite Element Machine for fast learning (Pereira et al., 2020), could also lead to improved performance. Future research may also include a detailed visual investigation of successful and unsuccessful detection cases to analyze the localization problem, as done in Arya et al. (2021) for several cases. When large-scale deployment of this research is conducted, the statistical significance of the performance differences may also be studied. Overall, this research provides a strong foundation for further exploration of FL for road damage detection while preserving user privacy.
Figure A1: Sample of images used for training.

Figure 1: Schematic diagram of federated learning.
Local epoch: The number of times a client trains its model on its local data before sending the updated model to the server for aggregation.
Non-independent and identically distributed (non-IID) data: Data whose label distribution on a client or local data set does not match the global distribution.

Figure 1 represents a schematic diagram of training an FL model. A baseline model is stored on a central server, and copies of this model are distributed to client devices. These devices then train the models using their locally generated data. Over time, the models on each device become personalized, providing a better user experience. The updates (model parameters) from the locally trained models are then securely shared with the main model on the central server using techniques such as FedSGD (McMahan et al., 2017), FedAvg (McMahan et al., 2017), and FedDyn (Jin et al., 2023). This model combines and averages the different inputs to generate new learning, and since the data come from diverse sources, the model has a greater potential to become generalizable. After the central model has been retrained with the new parameters, it is shared with the client devices again for the next iteration. With each cycle, the model gathers more diverse information and improves further without compromising privacy. FL thus differs from centralized training in what is communicated: raw data never leave the clients; only model parameters are exchanged. FedAvg works by training local models on individual devices' data, then exchanging and averaging these models across devices, repeating this process until convergence. During the training of an FL model, in each round, the server sends the model weights, w, to the selected group of clients. Once local training is completed on the clients, the server receives the updated model weights from those clients. Therefore, the model weights are exchanged twice with each client in every round. YOLOv5l has 46,154,449 model parameters, which is equivalent to approximately 185 MB when stored as float32 numbers. Therefore, for each round of FL running YOLOv5l, the number of model parameters and the data volume exchanged by the server with the clients are given by Equations (2) and (3), respectively:

Number of model parameters = 2 N_C × 46,154,449 (2)

Data volume exchanged = 2 N_C × 185 MB (3)

where N_C is the number of clients participating in the round.

TABLE 1 Road damage data sets (derived from road damage data set 2022 [RDD2022]; Arya, Maeda, Ghosh, Toshniwal, Omata, et al., 2022) considered in the study.
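The exchange pattern behind Equations (2) and (3) can be sketched in a few lines. The averaging below is the standard FedAvg weighted mean of client weights; the function names are our own, not from the paper:

```python
import numpy as np

N_PARAMS = 46_154_449   # YOLOv5l parameter count used in Equation (2)
BYTES_PER_PARAM = 4     # float32

def fedavg(client_weights, client_sizes):
    """FedAvg: average client weight tensors, weighted by local data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

def round_params_exchanged(n_clients, n_params=N_PARAMS):
    """Equation (2): each client receives and returns the full weight vector."""
    return 2 * n_clients * n_params

def round_data_volume_mb(n_clients, n_params=N_PARAMS):
    """Equation (3): the same exchange expressed in megabytes (~185 MB per copy)."""
    return 2 * n_clients * n_params * BYTES_PER_PARAM / 1e6

# Three clients, as in the experiments reported here:
print(round_params_exchanged(3))       # 276926694 parameters per round
print(round(round_data_volume_mb(3)))  # ~1108 MB per round
```

Note that 46,154,449 × 4 bytes ≈ 184.6 MB, which the text rounds to 185 MB per weight copy.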

FIGURE 2 Independent and identically distributed (IID) version of Japan road damage data set (JRDD). (1) For centralized training, JRDD_Train is the train data set and JRDD_Test is the test data set. (2) For federated learning (FL), J_1,Train, J_2,Train, and J_3,Train are the three clients, and JRDD_Test is the test data set.
Two versions of MRDD, namely, IID and non-IID versions of the data, were created. To make the IID version of MRDD as shown in Figure 4, each country data set, say D, was randomly divided into three training data sets and one test data set. Each training data set had 30% of the country data, and the test data set had 10% of the country data. The training data sets are represented as D_Train,1, D_Train,2, and D_Train,3, and the test data set is represented as D_Test. All D_Train,1 from the three countries were combined to make the first client, called Client_1,Train. Similarly, the other two clients, namely, Client_2,Train and Client_3,Train, were prepared. Thereafter, all three Client_i,Train were combined to make the training data set called MRDD_Train, and all three D_Test were combined to make the test data set called MRDD_Test. For centralized training on the IID version of MRDD, MRDD_Train was used as the training data set and MRDD_Test was used as the test data set. For FL on MRDD, Client_1,Train, Client_2,Train, and Client_3,Train were the three clients for training, and MRDD_Test was used as the test data set.

To make the non-IID version of MRDD as shown in Figure 5, each country data set, say D, was randomly divided into a training data set D_Train and a test data set D_Test in the ratio of 90:10. The training data from the three countries were combined to make a training data set for MRDD called MRDD_Train, and the test data from the three countries were combined to make a test data set for MRDD called MRDD_Test. For centralized training on the non-IID version of MRDD, MRDD_Train was used as the training data set and MRDD_Test was used as the test data set. For FL on MRDD, J_Train, I_Train, and USA_Train were the three clients for training, and MRDD_Test was used as the test data set. It is important to note that MRDD_Train and MRDD_Test are the same in the IID and non-IID versions of the data. The only difference is that, for FL, the data distribution varies among the clients considered. For example, for India, I_Train of the non-IID version of MRDD comprises I_Train,1, I_Train,2, and I_Train,3 of the IID version of MRDD. I_Test is the same in both the IID and non-IID versions.

FIGURE 3 Results of federated learning on Japan road damage data set (JRDD).

FIGURE 4 Independent and identically distributed (IID) version of multicountry road damage data set (MRDD). (1) For centralized training, MRDD_Train is the training data set and MRDD_Test is the test data set. (2) For federated learning (FL), Client_1,Train, Client_2,Train, and Client_3,Train are the three clients, and MRDD_Test is the test data set.

FIGURE 5 Non-independent and identically distributed (IID) version of multicountry road damage data set (MRDD). (1) For centralized training, MRDD_Train is the training data set and MRDD_Test is the test data set. (2) For federated learning (FL), J_Train, I_Train, and USA_Train are the three clients, and MRDD_Test is the test data set.

For training the object detection model, YOLOv5l was run with an image size of 640 × 640, a batch size of 4, and 45 epochs. For centralized training, two kinds of road damage detection models were trained: (1) Centralized model on MRDD. This involves training one model with the training data set, MRDD_Train, and the test data set, MRDD_Test. (2) Centralized models on individual countries of MRDD. This involves three models, as there are three countries. Each country, say D, trains one centralized model on its local training data, D_Train, and test data, D_Test. For federated training, two types of models were trained: one on the IID version of MRDD and another on the non-IID version of MRDD. The training was done with the same image size and batch size and with different combinations of server rounds and local epochs. FedAvg was used as the server aggregation method. Evaluation was done against the test data set, MRDD_Test. FL models for both the IID and non-IID versions of MRDD were trained for different combinations of server rounds and local epochs. A comparison between the centralized and FL approaches was performed by comparing the centralized model on MRDD, the FL model on the IID version of MRDD, and the FL model on the non-IID version of MRDD; the results are shown in Figure 6.
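The IID splitting procedure described above can be sketched as follows. This is a minimal illustration over toy image identifiers; the 30/30/30/10 proportions follow the construction in the text, while the function names are our own:

```python
import random

def split_country_iid(images, seed=0):
    """Split one country data set D into three 30% training shards and a 10% test set."""
    images = list(images)
    random.Random(seed).shuffle(images)
    n_test = len(images) // 10                      # D_Test: 10% of the country data
    test, rest = images[:n_test], images[n_test:]
    shard = len(rest) // 3                          # D_Train,1..3: 30% each
    trains = [rest[i * shard:(i + 1) * shard] for i in range(3)]
    return trains, test

def make_iid_clients(country_splits):
    """Pool shard i from every country to form Client_i,Train (IID version of MRDD)."""
    clients = [[], [], []]
    for trains, _test in country_splits:
        for i in range(3):
            clients[i] += trains[i]
    return clients

# Toy data: 100 hypothetical image ids per country.
splits = [split_country_iid(range(c * 100, c * 100 + 100)) for c in range(3)]
clients = make_iid_clients(splits)
print([len(c) for c in clients])  # [90, 90, 90]: each client mixes all three countries
```

The non-IID version simply skips the pooling step: each country's full 90% training split becomes one client, so each client sees only one country's label distribution.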

FIGURE 6 Results of federated learning on multicountry road damage data set (MRDD).

FIGURE 7 Development of a multicountry model using federated learning (FL).

The FL model trained on the non-IID version of MRDD gave an mAP of 0.21, and the ensemble of both models gave an mAP of 0.27. When the evaluation was done on MRDD_Test, the local country model gave a very low mAP of 0.06. The results are shown in Figure 8. The overall result of the experiments comparing the performance of global models obtained by both centralized training and FL on JRDD and MRDD is shown in Figure 9. The FL models have 21%-25% lesser mAP than the centralized models. Within FL models, the data distribution significantly influences the mAP values. In the case of MRDD, the FL model trained on the non-IID version of the data has 16% lesser mAP than the FL model trained on the IID version of the data.

FIGURE 8 Performance of centralized and federated learning (FL) on the Czech Republic.

FIGURE 9 Performance of global models obtained by centralized training and federated learning on different data sets.

FIGURE 10 Effect of number of clients on the performance of the federated learning (FL) model.

(1) Data distribution: The FL model trained on the non-IID version of MRDD gave 16% lesser mAP than the FL model trained on the IID version of MRDD. This can be seen in Figure 6a. (2) Number of clients: FL models were trained by varying the number of clients using the FL strategy. A greater value of mAP was obtained with all three clients rather than with only two clients available for training an FL model. The result is shown in Figure 10. (3) Server rounds and local epochs: Keeping the effective number of epochs constant at 60, FL models were trained on JRDD for different combinations of server rounds and local epochs. The mAP and training time were recorded for each case. Both training time and mAP were found to decrease when the number of local epochs at the clients was increased. The results are shown in Figure 11. (4) Data transfer volume and parameters: These were calculated empirically as explained in Section 3.3 and verified during the training of models. During one communication round of FL training using YOLOv5l, the total number of model parameters exchanged was 2 N_C × 46,154,449 and the total data volume exchanged with the central server was 2 N_C × 185 MB. This theoretical value was verified during the training. For centralized training, the data transfer (twice the data set size) was 2 × 602.3 = 1204.6 MB in JRDD and 2 × 1167.7 = 2335.4 MB in MRDD. For FL with three clients and four server rounds, the data transfer in both JRDD and MRDD was 2 × 3 × 4 × 185 = 4440 MB.
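The arithmetic in item (4) can be verified back-of-envelope (all values in MB; the factor of 2 for centralized training follows the text's assumption that the image data set is transferred twice):

```python
MODEL_MB = 185.0   # YOLOv5l weights stored as float32

def centralized_transfer_mb(dataset_mb):
    # The text counts the image data set as transferred twice.
    return 2 * dataset_mb

def fl_transfer_mb(n_clients, n_rounds, model_mb=MODEL_MB):
    # Each round, the server sends weights to and receives weights from every client.
    return 2 * n_clients * n_rounds * model_mb

print(centralized_transfer_mb(602.3))    # JRDD: 1204.6 MB
print(centralized_transfer_mb(1167.7))   # MRDD: 2335.4 MB
print(fl_transfer_mb(3, 4))              # FL, 3 clients x 4 rounds: 4440.0 MB
```

This makes the trade-off explicit: FL data transfer grows with the number of clients and rounds, while centralized transfer grows with the raw data set size, so either method can be cheaper depending on those quantities.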

FIGURE 11 Effect of local epochs on training and performance of the federated learning (FL) model.

An FL model trained on MRDD can be used to develop a more robust road damage detection model without accessing data from individual countries. Furthermore, a country without its own road damage data set can leverage an FL model trained on multiple countries to perform road damage detection without accessing road damage data from those countries. It is important to consider the impact of the number of clients participating in the training process on the performance of the FL models. Generally, the performance of the FL model improves if more clients can participate in the training process. However, there is a trade-off between the number of clients and communication costs, as increasing the number of clients also increases communication costs. The amount of data transferred between the server and clients after each round of FL training depends on the selected model, with heavier model weights resulting in more data being transferred per round. Therefore, it is important to carefully select a suitable model to ensure that communication costs do not outweigh the benefits of training a federated model. In our research, the data transfer for FL was more than that for centralized training. Centralized training can involve significant data transfer, especially when dealing with large data sets. Ignoring any other overhead costs, for a specific experiment involving a certain number of clients and a model, the amount of image data used for centralized training and the number of rounds in FL will determine which method transfers more data. Increasing the number of local epochs at the client can decrease training time by reducing the number of server rounds and communication with the server. However, this can also decrease the mAP of the model, which is undesirable. This decrease in mAP may be due to local weights becoming more biased toward local data with more local training at the client (Díaz & García, 2023), resulting in a decrease in the performance of the global server model when these biased local weights are aggregated. To find an optimal number of local epochs for training, it may be a good idea to start with a small number and gradually increase it until the model's performance stops improving. Training time is also influenced by the training time of individual clients. For example, when training an FL model on the MRDD, it was found that one FL round was completed only after local training on Japan was completed, because Japan had the largest data size among the three countries and took the longest time to finish local training. Since server aggregation is performed only after locally trained weights are received from individual clients, the training time of each round is directly proportional to the training time of the slowest client.
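The synchronous-round behavior described above amounts to taking a maximum over client training times (the per-client timings below are invented for illustration):

```python
# Hypothetical per-round local training times (minutes) for the three clients.
client_minutes = {"Japan": 42.0, "India": 18.5, "USA": 11.0}

def fl_round_minutes(times):
    """Synchronous FedAvg aggregates only after every client reports,
    so a round lasts as long as its slowest client."""
    return max(times.values())

print(fl_round_minutes(client_minutes))  # 42.0 -> Japan, the largest data set, dominates
```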
It was observed that the FL model trained on the non-IID version of the multicountry data outperformed local country models when evaluated on global test data, thereby giving a more robust and generalizable road damage detection model. The gain in performance varied with the data distribution and quality of each country, and the gain in mAP ranged from 1.33% to 163%. This approach can allow countries without their own road damage data sets to deploy road damage detection without accessing data from other countries. The performance of FL models was found to be significantly affected by factors such as data distribution and diversity. More local epochs at the client can speed up training by reducing communication with the server, but this can decrease model performance. Selecting a suitable model is important to ensure that the communication costs do not outweigh the benefits of training an FL model.