XML‐LightGBMDroid: A self‐driven interactive mobile application utilizing explainable machine learning for breast cancer diagnosis

Nowadays, breast cancer detection and diagnosis are increasingly performed with machine learning algorithms, which can enhance cancer understanding and help in treatment selection and diagnosis. However, many reliable decision-assistance systems have been developed as "black boxes," that is, systems that conceal their internal workings from the user. Their outputs are difficult to understand, which makes it difficult for doctors to use them. This study uses explainable machine learning to investigate a technique for predicting breast cancer more promptly and accurately. Data obtained from Kaggle are used to build machine learning (ML) models that forecast the occurrence of breast cancer, and Shapley Additive exPlanations (SHAP) are used to interpret the models' forecasts. To forecast the development of this disease, explainable machine learning (XML) models based on the gradient boosting machine (GBM), extreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM) are built. The investigation's findings show that LightGBM achieves a maximum accuracy of 99%. The explainable ML demonstrated here can produce an explicit understanding of how models generate their predictions, which is critical in boosting the confidence and acceptance of cutting-edge ML methods in oncology and healthcare in general. Finally, a mobile app integrating the best model is also developed.

Breast cancer is currently the most widespread cancer worldwide, especially in developed nations. However, it is now rapidly spreading to middle-income and low-income countries as well, with more than 2.3 million women receiving a diagnosis in 2020 and 685,000 of them dying.4 According to the World Health Organization, the worldwide mortality rate could rise to 2.9 million.5 There are two different kinds of breast tumors: benign and malignant. Benign tumors are slow-growing and have clear borders that never damage the surrounding tissue. Malignant tumors, on the other hand, rapidly invade other bodily regions.6 Breast cancer generally develops in and around the milk ducts and glands of the breast tissue. Traditional methods for detecting breast cancer, such as mammography, have limitations, including low sensitivity and specificity, leading to missed diagnoses and unnecessary biopsies. Given the volume of medical imaging data, manual diagnosis is frequently time-consuming and vulnerable to observer variability, and expert doctors are often rare. Diagnosing breast cancer is also challenging because these tumors normally do not show any symptoms in their early stages, and the required medical tests are very expensive. Yet, if the cancer is found in its early stages, patients can avoid unnecessary therapies.
Researchers have demonstrated the ability of machine learning methods to predict oncological outcomes;8,9 however, there are still a few barriers to their general clinical application. One of these barriers is a lack of faith in the models, which reduces patients' and clinicians' confidence in the predictions the models make. ML tools are frequently described as black boxes. If judgments must be made based on predictions provided by ML algorithms, users must be able to comprehend how and why the system arrived at a conclusion.10 ML explainability has therefore drawn a lot of attention during the past few years. Recently, two techniques have been presented and implemented to increase the interpretability of ML models: local interpretable model-agnostic explanations (LIME)11 and Shapley Additive exPlanations.12 The goal of machine-learning explainability techniques is to give a clear explanation of how an ML model arrived at its output.14-16 Explainability addresses the inadequacies of the most modern machine-learning techniques and is a very active area of research right now. "Explainability" broadly refers to any method that aids the user or model creator in understanding why ML models behave in the manner that they do.17 Explaining symptoms that were suggestive of a specific diagnosis to patients18 or assisting factory workers in identifying inefficiencies in a production pipeline19 are just a few examples of how explanations can take many different forms. Additional advantages of explainable machine learning include model debugging, model monitoring, model transparency, fair and ethical decision-making, accountability, model auditing, and making adjustments. It also has the ability to reduce the cost of mistakes, diminish the consequences of model bias, boost model performance, increase confidence and compliance, and promote thoughtful decision-making. Shapley additive explanation (SHAP) values, which represent a unified method for evaluating predictions provided by sophisticated machine learning models, are the main topic of this article. SHAP is a technique used to make machine learning models more transparent and understandable; it is based on cooperative game theory.20 It provides a breakdown of variable contributions, so that each data point is represented by each variable's contribution to the prediction made by the trained machine learning model, alongside the input properties. Explainable machine learning models are increasingly being used in a wide range of daily-life contexts, such as finance,21 social media,22 and healthcare.23 Healthcare organizations need ML explainability more than any other sector because, if decisions are to be based on predictions generated by ML algorithms, users must be able to understand how the algorithm came to a result in order to believe and, more crucially, employ the model.24 Because of these benefits, this approach is now increasingly often used among academics. In this work, in order to increase the explainability of breast cancer prediction25 and broaden human understanding without sacrificing predictive accuracy, machine learning models are constructed using XGBoost, GBM, and LightGBM, and SHAP values are utilized to provide insight into how well the classifiers function.
Our contributions to the proposed research on breast cancer are as follows:

1. This study achieves a high accuracy of 99% with LGBM. By using explainable machine learning, clinicians and researchers can understand how the model arrived at its decision, allowing for more accurate and reliable diagnoses.

2. Another major contribution of this research is transparent and interpretable explanations for its predictions, owing to the explainable machine learning approach. This can help build trust and confidence in the model's results, as well as facilitate communication between clinicians and patients.

3. Shapley additive explanation values are used to make the output of the gradient boosting process intelligible to people. By using explainable machine learning to analyze patient data, clinicians can identify individualized risk factors and develop personalized treatment plans, which can lead to more effective and targeted treatments that improve patient outcomes.

4. Feature scaling, performance metrics, and hyperparameter tuning are used to achieve the optimum outcome.

5. Based on this research, a user-friendly smartphone application that uses real-time inputs to calculate the outcome has been developed.
The remainder of this paper is divided into four sections. Section 2 outlines the related work. Section 3 illustrates the proposed materials and methods of this research for predicting breast cancer using XML; it is separated into four subsections: dataset description and data pre-processing, hyperparameter tuning, machine learning classifiers, and explainability. Section 4 gives a thorough description of the results and discussion; it comprises five subsections: the environment setup, classification accuracy, model evaluation, SHAP result analysis, and implementation of the mobile app. Section 5 concludes the complete work of this study.

LITERATURE SURVEY
Millions of people throughout the world are affected by breast cancer. The death rate is still high, despite substantial advancements in early detection and therapy, and there are still many unsolved questions. Explainable AI (XAI) is a young field that aims to create AI models and systems that can give visible, comprehensible justifications for their judgments and forecasts. In the case of breast cancer, XAI can assist physicians and researchers in comprehending the elements that contribute to the onset and spread of the condition, as well as in identifying potential treatments and interventions.
Some recent studies on breast cancer using interpretable and non-interpretable machine-learning approaches are surveyed here. Four machine learning classifiers (XGBoost, Random Forest, Logistic Regression, and Univariate Logistic Regression) were employed in a study by Vrdoljak et al.26 to predict breast cancer lymph node metastases in patients who were eligible for neoadjuvant systemic therapy (NST). All Croatian hospitals provided data for this study; the dataset includes the 719 patients who were determined to be NST-eligible out of 8381 breast cancer patients. Using Shapley values, the authors computed the explainability of the model. The results of this study demonstrate that patient age, tumor size, and Ki-67 are the three main indicators of breast cancer risk. In this experiment, XGBoost, with a mean AUC of 0.762 (95% CI: 0.726-0.794), had the best performance.
The authors of another study27 developed an explainable machine-learning pipeline for breast cancer diagnosis using ultrasound images. They extracted first- and second-order texture features from the ultrasound images and used them to create a probabilistic ensemble of decision tree classifiers. They made use of a public dataset of breast ultrasound scans containing a total of 780 images from 600 female patients. The most crucial texture characteristics for the model were determined by quantifying feature importance with SHAP values. According to this experiment, the LightGBM method, with 500 iterations, performs best, with 0.91 accuracy, 0.94 precision, 0.93 recall, and a 0.93 F1 score.
Several machine learning classifiers, including random forest (RF), neural networks (NN), and ensembles of neural networks (ENN), were utilized in a further investigation of breast cancer by Karatza et al.28 A variety of interpretability techniques were used to explain the models' predictions, including the global surrogate model, individual conditional expectation (ICE) plots, and Shapley values. For testing and training, the 569-sample WDBC dataset, of which 357 samples are benign and 212 are malignant, was employed. The results of this investigation demonstrate that the ENN has the highest accuracy, with its predictions explained by ICE plots.

In a different study, Arturo et al.25 used the conventional Cox proportional hazards (CPH) model and three distinct machine learning classifiers, namely Extreme Gradient Boosting, Random Survival Forests, and Survival Support Vector Machines, to predict breast cancer. Data were gathered for this experiment from the Netherlands Cancer Registry (NCR).
To determine the best model, they compared each model's performance using Harrell's concordance index, and they employed SHAP values to explain the predictions of the top-performing model. In this experiment, XGB outperformed the other models with a c-index of 0.73.
In their study on breast cancer, Jansen et al.29 employed a variety of machine learning techniques, including Logistic Regression, Naive Bayes, Extreme Gradient Boosting (XGB), KNN, and Random Forest. For this investigation, information was gathered from the Netherlands Cancer Registry (NCR). They used the area under the curve (AUC) to assess performance, and XGB produced the highest value, 0.78, of all the models. Then, to better comprehend the model's forecasts, they applied LIME and SHAP.
Uddin et al.30 recently demonstrated that machine learning can significantly lower the death rate for breast cancer. They classified breast tumors as benign or malignant using 11 classifiers: support vector machine (SVM), Random Forest (RF), K-nearest neighbors (K-NN), Decision Tree (DT), Naive Bayes (NB), Logistic Regression (LR), AdaBoost (AB), Gradient Boosting (GB), Multi-layer Perceptron (MLP), Nearest Cluster Classifier (NCC), and a voting classifier (VC). The studies made use of the Wisconsin Breast Cancer Dataset. To obtain a promising outcome, they used a variety of machine learning approaches, including feature scaling, principal component analysis (PCA), and hyperparameter tuning. Finally, they compared their results based on error rate, F1 score, accuracy, recall, and precision. The experiment revealed that the voting classifier has the best accuracy (98.77%). Ultimately, they created a website that incorporates the most accurate model for predicting breast cancer.
One study by Mangukiya et al.1 proposed a mechanism for detecting breast cancer. They evaluated the performance and data visualization of different machine learning methods, including Support Vector Machine (SVM), Decision Tree, Naive Bayes (NB), K-Nearest Neighbors (k-NN), AdaBoost, XGBoost, and Random Forest, using the Wisconsin Breast Cancer Dataset. This paper's major goal was to explore different methods for machine learning-based early, effective, and accurate detection. The study's results show that XGBoost has the best accuracy, at 98.24%.
Another study31 by Nanda Gopal et al. suggested a way of performing early diagnosis of breast cancer utilizing the Internet of Things (IoT) and machine learning. This study's main goal was to investigate how IoT-based machine-learning tools might be used to predict breast cancer. The experiment made use of three machine learning classifiers: LR, MLP, and RF. The dataset was compiled from the 569 cases and 32 attributes of the Wisconsin Breast Cancer Dataset (WBCD). Here, the dataset's dimensionality was reduced using principal component analysis (PCA), which picks out the elements crucial for accurate diagnosis. The MLP classifier achieved the highest accuracy, precision, recall, and F-measure, at 98%, 98%, 97%, and 96%, respectively.
Recently, work on breast cancer prediction has drawn a lot of scientific attention, and many studies have been conducted in this area. In spite of so much research, however, many technical gaps remain. Several researchers make use of modest datasets that do not adequately represent the population as a whole, which can lead to biased results. In addition, some studies only focus on specific populations or types of breast cancer, which may limit the generalizability of the models to other populations or types of breast cancer. Instead of concentrating on clinical value, some studies aim to increase model accuracy by minimizing false positives or detecting particular subtypes of breast cancer. As this survey shows, researchers have conducted different types of experiments, used various machine learning algorithms, and reported different results for detecting breast cancer, but the majority of these experiments are difficult to trust. Explainable machine learning can bring openness and interpretability to the decision-making process, which is crucial for fostering confidence in the models being used and for making sure they produce accurate and unbiased predictions. That is why we have been motivated to work on this topic and to overcome these limitations.

MATERIALS AND METHODS
This section outlines the suggested strategy of this experiment and the methods used to identify breast cancer. An overview of the proposed XML-LightGBMDroid is shown in Figure 1. As the figure shows, the dataset's CSV file is loaded first, and the data are then prepared. After experimenting with various machine learning techniques, SHAP, an explainable machine learning approach, is applied. Performance metrics are then employed to determine which model works best for this experiment. The best model is finally deployed with Flask to create a smartphone app that allows users to forecast cancer. The best accuracy of 99% is obtained with LightGBM after employing several machine learning approaches, such as feature scaling, hyperparameter optimization, and performance metrics. Once all models have been trained and accuracy calculations have been completed, a mobile app is constructed that allows a user to input data to determine whether they will develop breast cancer or not.

Figure 2 describes the proposal's application workflow. It shows that users must provide some inputs in order to receive the predicted outcome. As data are provided through the mobile app, processing begins on the backend, where the explainable machine learning model receives a request from the Flask server and responds with the projected outcome. The mobile app then receives this result from the Flask server and displays it for the user. The proposed method uses feature scaling with a standard scaler, so that each feature's values have zero mean and unit variance; hyperparameter tuning with grid search to select the best model; and performance metrics to gauge the effectiveness of each classification model. To determine breast cancer with the best level of accuracy, these features are supplied to the suggested classifiers. This experiment revealed a promising outcome for predicting breast cancer with all machine learning classifiers.

Figure 3 displays the workflow of the suggested XML-LightGBMDroid. As the figure shows, we first preprocessed the data, which involves cleaning the data, handling missing values, and transforming the data into the required format.
We then used the feature scaling technique to standardize the range of the features (variables) in the dataset. It helps to ensure that the features are on the same scale and thus can be compared on an equal footing, which can lead to better performance and more accurate predictions. The next step is to select the machine-learning model that can most accurately predict breast cancer. After selecting the model, its performance is evaluated using metrics such as accuracy, precision, recall, and F1-score. Explainable machine learning is then used to provide insights into the features that influence the model's decisions. This is done using SHAP (Shapley Additive exPlanations) values: we show feature importance, waterfall, and beeswarm plots to visualize and interpret the contribution of each feature to the model output. The final step of our work is to deploy the model in a production environment as a mobile application.
In comparison to existing methodologies, a variety of strategies, such as label encoding, feature scaling, and hyperparameter tuning, are used, which provides more promising accuracy. Also, we use the full set of SHAP visualizations (feature importance, beeswarm, and waterfall plots), which had not all been used together in earlier research. For ease of comprehension, the algorithm for this experiment is given in Algorithm 1, which represents the entire operational process. A few crucial modules are imported, including SHAP, and the dataset is loaded. A function called DO_EXPLAINABLE is defined and invoked at the end of the algorithm; it uses SHAP's built-in shap.Explainer to compute SHAP values, and the SHAP library's pre-existing functions bar(), beeswarm(), and waterfall() to generate the bar, beeswarm, and waterfall plots. The algorithm then checks whether any data values are NaN or empty; such values are replaced, and label encoding is carried out as a preliminary pre-processing step. After splitting the dataset into x (features) and y (labels), the feature scaling method is applied. Hyperparameter tuning (HPT) is then performed for each of the candidate models. Finally, performance metrics are computed to find the best model for this experiment.
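As a companion to Algorithm 1, a minimal Python sketch of the overall procedure is given below. It is illustrative only: for self-containment it loads the WBCD data from scikit-learn rather than from the CSV file used in the study, the random seed is arbitrary, and the small hyperparameter grids are placeholders rather than the grids actually searched.

```python
# Minimal sketch of Algorithm 1 (illustrative; the hyperparameter grids are
# placeholders, and the data are loaded from scikit-learn for self-containment
# instead of from the study's CSV file).
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

def do_explainable(model, X):
    """DO_EXPLAINABLE: compute SHAP values and draw the three plots."""
    explainer = shap.Explainer(model, X)
    shap_values = explainer(X)
    shap.plots.bar(shap_values)           # global feature importance
    shap.plots.beeswarm(shap_values)      # summary of feature effects
    shap.plots.waterfall(shap_values[0])  # local explanation of one case

data = load_breast_cancer()                      # WBCD: 569 instances, 30 features
X = StandardScaler().fit_transform(data.data)    # feature scaling
y = data.target                                  # 0 = malignant, 1 = benign
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.65, random_state=1)

candidates = {
    "GBM":  (GradientBoostingClassifier(), {"n_estimators": [100, 300]}),
    "XGB":  (XGBClassifier(),              {"n_estimators": [100, 300]}),
    "LGBM": (LGBMClassifier(),             {"n_estimators": [100, 300]}),
}
best_model, best_acc = None, 0.0
for name, (clf, grid) in candidates.items():     # HPT for each model
    search = GridSearchCV(clf, grid, cv=10).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, search.predict(X_te))
    print(f"{name}: accuracy = {acc:.3f}")
    if acc > best_acc:
        best_model, best_acc = search.best_estimator_, acc

do_explainable(best_model, X_te)                 # explain the best model
```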

Dataset description and data pre-processing
Modern medical diagnostics makes extensive use of machine learning classifiers. In the experiments, the openly available Wisconsin Breast Cancer Dataset, taken from the UCI repository, is used. This dataset has no missing values and is composed of 569 instances and 30 features. Figure 4 illustrates the distribution of breast cancer types in this dataset, which includes 357 benign (non-cancerous) and 212 malignant (cancerous) cases.
The goal of data pre-processing is to produce the best outcome possible from the dataset. The accuracy of the results may be lowered by noise, missing values, and imbalance in the dataset.32 As a result, undesirable items must be removed from the dataset before running the machine learning models. In this dataset, the diagnosis values M and B stand for malignant and benign, respectively. These string data are transformed into the numbers 0 and 1, where 0 denotes malignant and 1 denotes benign. Additionally, the independent features present in the data are standardized to a set range using the feature scaling technique. This is carried out as part of the data pre-processing to deal with drastically different magnitudes, values, or units.
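A short sketch of this pre-processing step is shown below, assuming the CSV layout described above; the file name and column name are hypothetical.

```python
# Sketch of the pre-processing step (file and column names are hypothetical).
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("wdbc.csv")
# Label encoding: map the diagnosis strings to numbers as described above.
df["diagnosis"] = df["diagnosis"].map({"M": 0, "B": 1})  # 0 = malignant, 1 = benign
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]
# Feature scaling: standardize each independent feature to zero mean and
# unit variance so that differing magnitudes and units become comparable.
X_scaled = StandardScaler().fit_transform(X)
```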

Hyperparameter tuning
A hyperparameter is a machine learning parameter whose value is selected prior to training the algorithm. Hyperparameter tuning entails determining the optimal hyperparameter settings for a learning algorithm and then using the optimized algorithm on any dataset. By minimizing a predetermined loss function, that set of hyperparameters maximizes the model's performance and yields better outcomes with fewer errors; in this experiment, it decreases the models' errors. Hyperparameter tuning is therefore combined with GridSearchCV. To discover the ideal values of the desired hyperparameters, the GridSearchCV technique performs an exhaustive cross-validated search:33 it evaluates and validates the model for each individual combination of values from a parameter dictionary.34 The model with the highest accuracy over all hyperparameter combinations is then selected.

Machine learning models
Three ensemble-based machine learning models, Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and LightGBM, are used for data modeling to predict breast cancer in this experiment.
FIGURE 4 Class distribution of breast cancer.

Extreme gradient boosting (XGBoost)
The Extreme Gradient Boosting model used here is one of the most well-known and fast machine learning algorithms illustrating this methodology without sacrificing generality.35 XGBoost is an effective open-source implementation of the gradient-boosted tree technique. Gradient boosting is an optimization technique that can support several learning tasks, such as classification, ranking, and prediction.36 The success of XGBoost is primarily due to its scalability in all situations, which is made possible by several significant systems and algorithmic improvements.37 The idea behind it is to use an iterative weak-classifier calculation to obtain accurate classification effects.37 The primary distinction between XGBoost and other gradient-boosting algorithms is the adoption of a novel regularization strategy to curb overfitting: this classifier adds the tree model complexity as a regularization term to the objective function. The XGBoost algorithm's objective function is

$$\mathcal{L} = \sum_{i} l\left(\hat{y}_i, y_i\right) + \sum_{k} \Omega\left(f_k\right), \tag{1}$$

where $l$ is the loss function measuring the difference between the prediction $\hat{y}_i$ and the target $y_i$, and $\Omega(f_k)$ penalizes the complexity of the $k$-th tree. Consequently, XGBoost is quicker and more reliable throughout model tuning.
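As an illustration of how this regularization surfaces in practice, the snippet below sets the corresponding XGBoost parameters; the particular values are arbitrary examples, not the tuned values from this study.

```python
# Illustrative only: reg_lambda, gamma, and max_depth control the complexity
# penalty Ω(f) in Equation (1); the values shown are arbitrary examples.
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=300,   # number of boosted trees
    max_depth=4,        # limits the depth, and hence complexity, of each tree
    gamma=1.0,          # minimum loss reduction required to make a split
    reg_lambda=1.0,     # L2 penalty on leaf weights
)
```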

Gradient boosting machine (GBM)
GBM is a learning procedure that successively fits new models to provide a more accurate estimate of the response variable.38 The fundamental concept of this technique is to construct each new base learner to be maximally correlated with the negative gradient of the loss function over the whole ensemble.31 If the error function is the traditional squared-error loss, the learning process produces consecutive error-fitting. However, the loss function can be chosen freely in order to provide greater intuition. The gradient descent step is determined by the following equation:39,40

$$g_t(x) = \mathbb{E}_y\!\left[\frac{\partial \Psi\big(y, f(x)\big)}{\partial f(x)} \,\middle|\, x\right]_{f(x) = \hat{f}^{(t-1)}(x)}. \tag{2}$$

Light gradient boosting machine (LightGBM)
The light gradient boosting machine (LightGBM) is open source and was created by a Microsoft team in April 2017 to shorten implementation time.41 Its parallel voting decision tree technique, which uses a histogram-based algorithm42,43 to optimize parallel learning, speeds up the training process, uses less memory, and incorporates improved network communication. The training data are divided across several machines, and in each iteration a local voting decision chooses the top k features on each machine, while a global voting decision selects the top 2k features. The major steps of the LightGBM technique are initializing n decision trees, where each training sample has weight 1/n, training a weak classifier f(x), calculating the strength of the weak classifier, updating the weights, and producing the final classifier. Equation (3) represents the LightGBM approach:44

$$f_n(x) = \alpha_0 q_0(x) + \alpha_1 q_1(x) + \cdots + \alpha_n q_n(x). \tag{3}$$
The leaf-wise approach is used by LightGBM to identify the leaf with the greatest splitting gain. Several hyperparameters are available to fine-tune the method, and to find the optimum ones for each model, 10-fold cross-validation is used to thoroughly search all potential values. The search was conducted with GridSearchCV from the sklearn package. In this study, a variety of machine-learning algorithms were tested for their ability to predict the development of breast cancer, with LightGBM outperforming the other classifiers.
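A sketch of this tuning step is shown below; the grid values (num_leaves, learning_rate, n_estimators) are illustrative assumptions, not the grid reported in the study.

```python
# Illustrative 10-fold grid search for LightGBM; the grid values are
# assumptions, not the exact grid used in the study.
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "num_leaves": [15, 31, 63],        # controls leaf-wise tree growth
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300, 500],
}
search = GridSearchCV(LGBMClassifier(), param_grid, cv=10, scoring="accuracy")
# search.fit(X_train, y_train) exhaustively evaluates every combination and
# keeps the model with the best cross-validated accuracy.
```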

Explainability
Finally, we were eager to comprehend how the models generated their forecasts. Clinicians may find it challenging to understand how predictions are created with the XGBoost, GBM, and LGBM models. Explainability is related to the idea of explanation as a communication channel between people and a decision-maker that is both understandable to people and an accurate representation of the decision-maker.45 Therefore, to increase the explainability of such models, Shapley Additive exPlanations (SHAP), a game-theoretic post-hoc interpretation method46 that is model-independent, is used. Several features of this method make it useful for this research. SHAP values are not model-specific: they are not restricted to a specific kind of model, which was important to our investigation. Additionally, SHAP values exhibit local accuracy, missingness, and consistency, features that are not simultaneously present in other approaches.47 Another justification is that the implementation is simple to use, well documented, and has active community support.
The aim of SHAP is to clarify the prediction of an instance by calculating each feature's contribution to that prediction. The SHAP explanation approach derives Shapley values from coalitional game theory: the feature values of a data instance act as players in a coalition. This amounts to assigning each feature a (quantitative) relevance value in a model prediction task based on how much it contributes to the prediction. A Shapley value is thus described, in our context, as the average marginal contribution of a feature value over all potential feature coalitions. According to SHAP, the explanation is

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j, \tag{4}$$

where the explanation model is represented by $g$, $z' \in \{0,1\}^M$ is the coalition vector, $\phi_j \in \mathbb{R}$ is the feature attribution for a feature $j$, and $M$ is the maximum size of a coalition. The Shapley value for a specific feature value can be read as the difference between the actual forecast and the average prediction for the entire dataset. Additionally, SHAP values employ the Shapley interaction index from game theory, which enables rewards to be distributed across all pairs of participants in addition to the individual players. In this manner, SHAP values are able to account for local interaction effects that would otherwise go unreported. This quality is crucial because it makes it possible to offer fresh perspectives on the variables in the model and their relationships.

The reasons for choosing SHAP over other explainability algorithms are as follows. The first is global interpretability: the totals of the SHAP values show how much each predictor influences the target variable, whether positively or negatively. This is comparable to the variable importance plot, but with the ability to show the direction of each variable's relationship to the target. The second benefit is local interpretability: each observation is given its own set of SHAP values, which considerably increases transparency. We can explain both an individual case's prediction and the contributions of its predictors, whereas traditional variable importance algorithms only offer results for the entire population, not for each specific case. As a result of the local interpretability, the effects of the components can be separated and contrasted easily. Third, unlike earlier methods that used surrogate models such as logistic or linear regression, SHAP values can be calculated for any tree-based model.
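For reference, the classical Shapley value underlying Equation (4) is the standard definition from cooperative game theory (quoted here from the SHAP literature cited above):

$$\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,\big(|F| - |S| - 1\big)!}{|F|!} \left[ f_{S \cup \{j\}}\big(x_{S \cup \{j\}}\big) - f_S\big(x_S\big) \right],$$

where $F$ is the set of all features, $S$ ranges over the coalitions not containing feature $j$, and $f_S$ is the model evaluated on the features in $S$. The bracketed term is precisely the marginal contribution of feature $j$ to coalition $S$, and the factorial weight averages it over all orderings in which coalitions can form.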

RESULT AND DISCUSSION
The outcome of the developed system is described in this section. In this study, 65% of the data are used for training and 35% for testing. Various machine learning methods, together with feature scaling, hyperparameter tuning, and performance metrics, are used to obtain the highest level of accuracy.

Environmental setup
This work uses explainable machine learning to diagnose breast cancer utilizing a combination of high-performance computing tools, programming languages, and software. Table 1 displays the environment setup for this study's model development.

FIGURE 5 Accuracy for the GBM, XGB, and LGBM.

Classification accuracy
We employed accuracy, precision, recall, F1-score, and AUC as performance measures to assess and compare the models after applying the machine learning algorithms to the dataset. The main purpose of these performance metrics is to evaluate the precision, effectiveness, and quality of the predictions provided by a model. They offer quantitative measurements that enable us to gauge how well a prediction model is operating and accomplishing its stated goals. Table 2 presents the performance of GBM, XGBoost, and LightGBM in terms of these metrics.
This table demonstrates that, of all these classifiers, the LGBM classifier has the highest accuracy, with an error rate of only 0.010. Its top performance is demonstrated not just by the accuracy metric but also by precision, recall, and F-measure. XGB has the second-best accuracy, while GBM performed worst, with the maximum error rate of 0.025. Accuracy has been utilized as the major performance parameter for the classification algorithms. Figure 5 shows the study's accuracy results.

Model evaluation
A confusion matrix is an additional helpful tool for assessing how well a classification model performs. It is used to calculate a number of performance indicators, including F1-score, recall, accuracy, and precision. These metrics give us information about the model's effectiveness and help us identify potential areas for improvement. True positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are the four categories that make up a confusion matrix. The actual classes are typically listed in the rows of the confusion matrix, while the predicted classes are listed in the columns. The examples that were correctly classified are represented by the matrix's diagonal, whereas the incorrectly classified ones appear off the diagonal. The confusion matrix for each classifier is displayed in Figure 6.
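A brief sketch of how such a matrix can be computed and displayed with scikit-learn is given below; the label vectors are placeholders, not results from this study.

```python
# Sketch: confusion matrix for one classifier (label vectors are placeholders).
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

y_true = [0, 1, 1, 0, 1, 1]            # actual classes (0 = malignant, 1 = benign)
y_pred = [0, 1, 1, 1, 1, 0]            # predicted classes
cm = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
ConfusionMatrixDisplay(cm, display_labels=["malignant", "benign"]).plot()
plt.show()
```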
In addition, the AUC-ROC curve metric is also used to assess the effectiveness and quality of the models. These performance indicators enable us to evaluate how well the models handled the supplied data, and the models perform better once the hyperparameters are adjusted. The following performance metrics, estimated using 10-fold cross-validation, are used to gauge each classifier's performance:

$$\text{Accuracy (\%)} = \frac{TP + TN}{TP + TN + FP + FN} \times 100,$$

where TP, TN, FP, and FN stand for the corresponding numbers of true positives, true negatives, false positives, and false negatives. Accuracy reflects a thorough assessment of a classifier's entire performance.35

$$\text{Recall (\%)} = \frac{TP}{TP + FN} \times 100.$$
The capacity of a classification model to predict positive cases is known as recall.48 In the prediction of breast cancer recurrence, increasing the ability to identify positive instances increases patient survival rates by enabling patients to receive the best care available. Recall is thus one of the key metrics used to assess the effectiveness of classification model performance.
The F-score, also known as the F1 score, is a statistic used to assess a binary classification model based on predictions made for the positive class. It is computed from precision and recall, combining them into a single score. The F1 score is the harmonic mean of precision and recall, giving each variable equal weight.
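Written out, with $P$ for precision and $R$ for recall, this harmonic mean is the standard definition:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}.$$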
In addition, the AUC-ROC curve is employed here to display the performance of the classification models graphically. It is a popular and significant statistic for assessing how well a classification model performs. The ROC curve is a graph that displays how well a classification model performs at various threshold values, and the area under it is the most often employed ROC summary metric.49 The effectiveness of a classifier increases with the AUC value. The curve is drawn between the following two parameters:

$$\text{True positive rate (\%)} = \frac{TP}{TP + FN} \times 100, \tag{10}$$

$$\text{False positive rate (\%)} = \frac{FP}{FP + TN} \times 100. \tag{11}$$

Figure 7 depicts the receiver operating characteristic (ROC) curve, which plots the false positive rate (FPR) on the x-axis against the true positive rate (TPR) on the y-axis for a variety of threshold values (usually percentile values).19 By examining the AUC-ROC curve, it is determined how well each machine learning classifier is performing. The ROC curve is typically summarized by the area under it, a value between 0 and 1 known as the area under the ROC curve (AUROC). The higher the AUROC, the better the model performs: a model that predicts 100% incorrectly will have an AUC of 0.0, whereas a model that predicts 100% correctly will have an AUC of 1.0.
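A small sketch of how the ROC curve and AUROC can be obtained is shown below; the score vectors are placeholders, not results from this study.

```python
# Sketch: ROC curve and AUROC with scikit-learn (score vectors are placeholders).
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

y_true  = [0, 0, 1, 1, 1, 0, 1]                 # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.3, 0.6]  # predicted probabilities
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUROC =", roc_auc_score(y_true, y_score))
plt.plot(fpr, tpr)                              # FPR on x-axis, TPR on y-axis
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.show()
```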

SHAP result analysis
This section presents the Shapley value interpretations of the cases in the test set, utilizing the values of their explanatory variables. Model explainability is the degree to which a model's predictions can be accurately anticipated and understood by a person; in machine learning, the better a model's explanation, the simpler it is to understand and interpret its predictions. As a result, SHAP, which computes a value for each input feature of every sample in the dataset, is employed here to explain the model outputs and to ascertain the degree to which a particular characteristic contributes to the outcome of a specific event. This investigation led us to the feature importance strategy, also referred to as feature attributions or feature-level interpretations, which is utilized in a variety of fields, including banking, healthcare, facial identification, and content moderation. It is the most popular and well-established explainability method.50,53 Figure 8 presents the SHAP feature importance for predicting breast cancer. Permutation feature importance is a substitute for SHAP feature importance, but there is a significant difference between the two measures: permutation feature relevance is determined by the decline in model performance, whereas SHAP is based on the magnitude of feature attributions. The graph demonstrates that area_se has the greatest contribution to the prediction, at 25%, followed by texture_worst at 23%. The lowest contributions, at 1%, are from radius_mean, perimeter_mean, and compactness_mean.
When a matrix of SHAP values is passed to the bar plot function, a global feature importance plot is produced, where the global importance of each feature is taken to be the mean absolute SHAP value of that feature over all provided samples. Figure 9A shows that concavity_worst, whose mean absolute SHAP value is much greater than that of any other feature, is the most important factor for GBM.
Similarly, Figure 9B shows that concave_points_worst is the most important component for XGBoost, at +1.03. Additionally, the LGBM is most impacted by perimeter_worst, followed by concave_points_mean and concave_points_worst.
Additionally, the beeswarm plot is a different type of plotting technique designed to display an overview of a dataset's key properties and how they influence the model's output in a way that is both information-dense and simple to comprehend. This summary demonstrates the association between a variable and the breast cancer prognosis. Positive SHAP values indicate a higher likelihood of breast cancer, whereas negative SHAP values indicate a lower likelihood of developing breast cancer. Higher feature values are displayed in red and lower values in blue, as represented by the color bar. This demonstrates, for instance, that positive differences are linked to the presence of cancer, whereas negative differences are linked to its absence.
According to Figure 10, it is evident that the concavity_worst variable has a high positive contribution for (A) GBM when its values are large and a very low negative contribution when its values are small. The remaining 21 features together contribute relatively little to the forecast. Concave_points_worst contributes the most for (B) XGBoost, while perimeter_worst contributes the most for (C) LGBM. In the waterfall plots, the most significant variable is displayed at the top and the least significant at the bottom, and the length of each bar indicates the SHAP value of that feature for the chosen observation. In Figure 11, for (A) GBM, compactness_se has a SHAP value of -3.86, concavity_worst has a SHAP value of +2.19, and so on; the total of all SHAP values equals f(x) - E[f(x)]. Compactness_se contributed the most to the forecast, followed by fractal_dimension_se, compactness_mean, and texture_worst for (A) GBM.
For (B) XGBoost, compactness_se made the biggest contribution, whereas texture_worst made the least contribution to the prediction. Similarly, for (C) LGBM, fractal_dimension_se is the characteristic with the lowest contribution to the prediction, and compactness_se has the highest.
It is evident from this section that LGBM is highly effective at detecting breast cancer. The suggested method uses explainable machine-learning techniques to present an innovative and efficient way of detecting breast cancer. The method displayed great accuracy, precision, recall, and F1 score, showing its potential to enhance clinical decision-making and to reveal trends in patient groups that could guide public health initiatives. Further research into the prognosis of breast cancer using XAI should, however, concentrate on enhancing the models' accuracy, interpretability, and fairness, as well as confirming their utility in clinical settings.

Implementation of mobile app
Finally, the best model is integrated into a mobile app made with React Native. The app has a user input form that gathers values from users and predicts breast cancer. A .pkl file is first created once the model has been trained in a Jupyter notebook, and an API is then created around it with Flask. The Android app consumes this API to use the machine learning model, and the results are displayed as seen in Figure 12.
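A minimal sketch of such a Flask endpoint is shown below; the file name model.pkl, the route name, and the JSON field names are assumptions for illustration, not the app's actual API.

```python
# Minimal Flask API sketch (file name, route, and JSON fields are assumptions
# for illustration, not the app's actual API).
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("model.pkl", "rb") as f:        # model trained and pickled earlier
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # list of scaled feature values
    label = int(model.predict([features])[0])   # 0 = malignant, 1 = benign
    return jsonify({"prediction": label})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The React Native front end can then POST the user's inputs to this endpoint and render the returned prediction.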
Figure 12A shows the page where anyone can provide the inputs for forecasting the outcome, and Figure 12B,C show the output for those inputs. The prediction results stored for all users of this software are shown in Figure 12D.

CONCLUSION
Numerous existing studies on medical decision support systems have the propensity to ignore the influence of system users in favor of improving accuracy, and there is currently no clear consensus on how to assign malpractice blame in the context of medical decision support systems. It is frequently exceedingly challenging for doctors to accept and believe in a system's output when it is offered solely in the form of predictive outcomes. To address this issue, a decision support system is created for breast cancer prediction. The system uses XGBoost, LGBM, and GBM classifiers to estimate the likelihood of breast cancer. In this experiment, LGBM outperformed the other classifiers in terms of accuracy, achieving 99%. SHAP is then employed for interpreting the machine learning models, which enables us to pinpoint the characteristics or variables that influence a model's predictions: it explains the machine learning model's output in terms of the significance of each individual feature and quantifies how much each feature contributes to the predicted result. The insights provided by SHAP are used here to aid physicians and researchers in better comprehending the underlying causes of breast cancer and in developing more precise and dependable models for predicting breast cancer risk and diagnosis. Finally, a user-friendly mobile app is developed from the best model for better presentation of results. In future work, another research study using image processing and deep learning techniques will be conducted in order to develop trustworthy results, and we will try to build a larger dataset with more information to conduct future research on this subject. We are optimistic that this investigation will aid in the effective treatment of breast cancer.

FIGURE 1 Overview of the proposed XML-LightGBMDroid.

FIGURE 2 Working flow of the mobile application.

FIGURE 8 Feature importance determined by SHAP values. The mean absolute Shapley value is used to assess each feature's importance. Area_se, the most important variable, changes the predicted absolute chance of breast cancer by an average of 25 points (25 on the x-axis).

FIGURE 9 Global feature importance based on SHAP values for (A) GBM, (B) XGBoost, and (C) LGBM. The global feature relevance is illustrated using the mean absolute SHAP values.

FIGURE 11 SHAP waterfall plot for (A) GBM, (B) XGB, and (C) LGBM. The target (dependent) variable's values are represented on the x-axis; x is the selected observation, f(x) is the model's predicted value given input x, and E[f(x)] is the target variable's expected value, that is, the mean of all forecasts. The absolute SHAP value illustrates how much a single feature influences the prediction.

FIGURE 12 Android application: input field (A), result fields (B, C), all users' prediction results (D).
ALGORITHM 1 Working procedure of XML breast cancer prediction.

TABLE 1 Environment setup of the proposed system.

TABLE 2 Evaluation of explainable machine learning methods.