Deep Learning‐Based Skin Diseases Classification using Smartphones

Skin disease recognition is one of the essential topics in the medical industry. Detecting skin disease from appearance can be difficult due to the similar appearance of skin lesions. In some cases, such as the monkeypox virus, the illness must be quickly determined, and the patients must be isolated to reduce the spreading of the disease. This study aims to create a deep learning‐based automated intelligent mobile application to detect skin disease. First, different small‐size pretrained networks are trained for skin lesion image classification. Then, the most suitable network from the viewpoint of both performance and mobile compatibility is transformed into the TensorFlow Lite format. Finally, a mobile application is created on the Android platform that utilizes the smartphone's camera to obtain images and uses TensorFlow Lite to make predictions. The proposed system produces 74.27% classification accuracy for seven classes on a combined dataset. It produces comparable/better results compared to the literature. Owing to the proposed system, the patients can make a preliminary diagnosis of their lesions using their smartphones. Thus, risky patients can be encouraged to visit the hospital for a definitive diagnosis. In addition, the mobile application can avoid undue stress and false alarms.


DOI: 10.1002/aisy.202300211
The datasets that have been combined in this step are PAD-UFES-20 [19,20] and the Monkeypox Skin Lesion Dataset (MSLD). [21,22] The newly formed dataset contains images of various skin diseases, including monkeypox. Second, data balancing has been applied to prevent the model from becoming biased toward any particular class. Then, different pretrained convolutional neural network (CNN) models were trained with the newly created dataset, and the network results were compared. To utilize the deep learning models on mobile devices, TensorFlow [23,24] has been used. TensorFlow is a machine learning (ML) library that helps to create ML models. In this study, the most suitable network from the viewpoint of both performance and mobile compatibility has been created in TensorFlow and transformed into a TensorFlow Lite model [25] to be used in the mobile system. TensorFlow Lite is a mobile framework used to run ML models on portable systems such as smartphones and Internet of Things (IoT) devices. Finally, a mobile application that runs on Android smartphones has been developed in this study. The mobile application gathers skin images through the Android smartphone's camera and performs classification using the TensorFlow Lite framework.
The contributions of this study are given in the following items: 1) Contribution in terms of healthcare: An essential, affordable, and noninvasive disease diagnostic mobile application has been created. Owing to the proposed system, contagious diseases that require urgent detection, such as monkeypox, can be prediagnosed by patients using their mobile phones. Patients can then be motivated to seek an actual diagnosis from an expert. Therefore, the transmission rate of the disease can be reduced with the help of the proposed system; 2) Contribution in terms of computer science: A slightly modified ResNet-18 model has been introduced. The model was trained to classify skin lesion images and transformed for mobile applications; and 3) Contribution in terms of data science: Two different skin lesion datasets have been combined, and the resulting dataset, which includes monkeypox images, has been used for a seven-class classification problem.

Skin Disease Image Analysis
Skin disease image analysis is a challenging and popular task in the computer vision field. Nida et al. [26] studied melanoma lesion detection and segmentation using a deep region-based convolutional neural network and fuzzy C-means clustering. Karthik et al. [27] classified skin images as acne, actinic keratosis, melanoma, and psoriasis using a CNN-based approach. In the study of Anand et al., [4] a transfer learning-based model was developed with the help of a pretrained Xception model. In the study of Zheng et al., [28] skin disease detection was performed for college students. Three skin lesions were identified and classified: actinic keratosis, melanocytic nevus, and vascular lesions. Srinivasu et al. [29] proposed a method that classifies skin diseases using MobileNet V2 and long short-term memory. They used the HAM10000 dataset and compared the performance of their method against state-of-the-art deep learning models. Their results show that the proposed method is faster than the conventional MobileNet. They also developed a mobile detection application. Medhat et al. and Alyami et al. [30,31] selected two cancer types from the PAD-UFES-20 dataset (from which six classes are used in this study) and conducted a comparative study. Chen et al. [32] used clinical data and visual data to classify skin lesions.
Due to the monkeypox pandemic, monkeypox detection studies have recently been popular in the literature. In the study of Ahsan et al., [33] monkeypox-infected skin images were collected from Google and analyzed using deep learning approaches. Ali et al. [22] created a human monkeypox image dataset. Then, they classified the images using VGG16, ResNet50, InceptionV3, and Ensemble networks. Sahin et al. [34] performed a binary image classification as monkeypox versus nonmonkeypox.

Mobile Applications for Healthcare Improvement
Owing to recent technological developments, smartphones can be used to monitor healthcare. Therefore, people can rapidly learn about their health conditions and be monitored by their doctors without additional device costs.
Mamoun et al. [35] developed a healthcare mobile application prototype. The application allows diseases to be diagnosed online by medical specialists. Moreover, it simplifies the ordering of medicines using online payment. Cho et al. [36] proposed a mobile healthcare application for investigating the effectiveness of oral and pharyngolaryngeal strength training on voice in older women. Watts et al. [37] developed the Facial Remote Activity Monitoring Eyewear (FRAME) mobile application. This application provides nearly real-time, safe remote access for therapists monitoring their patients. Berger-Groch et al. [38] reviewed mobile applications used for diagnosing and treating tumors in orthopedic oncology.

Methodology
The proposed study focuses on classifying skin diseases using smartphones. The proposed system concentrates on distinguishing several skin lesions, including monkeypox. Figure 1 shows the general system pipeline. As seen in the figure, first, a merged skin lesion image dataset has been created and some data augmentation techniques have been applied to these images. Then, using the transfer learning approach, different small-size pretrained networks were trained with these images. The pretrained network models have been compared, and the most suitable network from the viewpoint of both performance and mobile compatibility has been converted to the TensorFlow Lite model. Finally, a mobile application has been developed for the skin lesion classification task.

Deep Transfer Learning for Skin Image Classification
Advances in deep learning started to enable accurate image analysis. [41,42] CNN is a deep learning model that includes multiple layers to extract higher-level features from the raw input.
CNNs are created with different combinations of convolution, pooling, activation, dropout, fully connected, and some other layers. The last layer is the classification layer for image classification problems. In the convolution layers, filters are convolved with the input data, and features are extracted from training samples. The mathematical definition of convolution is shown in Equation (1). [43]

$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i - m, j - n)\, K(m, n) \qquad (1)$$

In the equation, I is the input image with two dimensions, K is the two-dimensional kernel, and S is the two-dimensional output after the convolution process. (i, j) represent the matrix indexes, and (m, n) range over the filter size.
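As a minimal sketch of the convolution operation described above, the following pure-Python function evaluates the standard discrete form S(i, j) = Σ_m Σ_n I(i − m, j − n) K(m, n) for every output position (a "full" convolution). The function name `conv2d_full` and the toy matrices are illustrative assumptions, not part of the proposed system; deep learning libraries usually implement an optimized, unflipped variant (cross-correlation) instead.

```python
def conv2d_full(image, kernel):
    """Full 2D discrete convolution of two nested-list matrices."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih + kh - 1, iw + kw - 1
    output = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = 0.0
            # (m, n) range over the kernel; out-of-range image cells count as zero.
            for m in range(kh):
                for n in range(kw):
                    r, c = i - m, j - n
                    if 0 <= r < ih and 0 <= c < iw:
                        total += image[r][c] * kernel[m][n]
            output[i][j] = total
    return output


toy_image = [[1, 2], [3, 4]]      # hypothetical 2x2 input
toy_kernel = [[1, 0], [0, -1]]    # hypothetical 2x2 filter
print(conv2d_full(toy_image, toy_kernel))
```

The quadruple loop is intentionally naive so that each term of the sum is visible; it is not meant for real image sizes.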
Pooling operations transform multiple cells into one cell. The pooling layer is used to reduce the width and height of its input. The rectified linear unit (ReLU) is the most commonly used activation function. It sets negative values to zero, so the corresponding units are not activated; the resulting sparse activations have a regularizing effect on the CNN.
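The two operations described above can be illustrated with a short pure-Python sketch. The 2×2 window with stride 2 and the use of max pooling are assumptions chosen for illustration; the text does not state which pooling variant or window size the networks use.

```python
def relu(feature_map):
    """Replace every negative value with zero, leaving other values unchanged."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Reduce width and height by taking the maximum of each 2x2 block."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


toy_map = [            # hypothetical 4x4 feature map
    [-1, 2, 0, -3],
    [4, -5, 6, 1],
    [0, 0, -2, -1],
    [-7, 3, 2, 8],
]
print(max_pool_2x2(relu(toy_map)))
```

Here four cells are merged into one, so a 4×4 map becomes a 2×2 map.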
Dropout randomly removes some output features of a layer during training. Thus, it can prevent the model from overfitting. The fully connected layer connects to all nodes in the previous layer. It is used just before the classification layer.
Training data for specific problems such as medical image analysis is often limited because of costly acquisition and limited accessibility. Also, the training operation usually requires high computing costs. A common solution to these problems is using transfer learning during CNN model training. Transfer learning carries features learned on one problem to a new problem. [44] In this study, different pretrained networks have been used for the skin lesion classification task. The list of the networks and their characteristics can be seen in Table 1. These networks have been selected because of their small size, since the developed application will be used on mobile devices with resource constraints. The sizes reported in the table are approximate values. They may vary depending on the model format, such as ONNX, tf, and tflite.
A new fully connected layer with seven outputs has been created as a substitute for the fully connected layers of the pretrained networks. Thus, the networks have been made ready for the seven-class skin disease classification.
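To make the role of the seven-output layer concrete, the sketch below applies a fully connected layer followed by a softmax to a feature vector. It is a conceptual illustration only: the four-element feature vector and the weight values are hypothetical toy numbers (the backbone of a real network produces a much longer vector, and the weights are learned during training), and the softmax step is an assumption about how class scores are turned into probabilities.

```python
import math

NUM_CLASSES = 7  # seven skin lesion classes


def dense(features, weights, bias):
    """Fully connected layer: one weighted sum of all inputs per output node."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical backbone output and hypothetical (untrained) layer parameters.
features = [0.5, -1.0, 2.0, 0.25]
weights = [
    [0.1 * (k + 1) if d == k % 4 else 0.0 for d in range(len(features))]
    for k in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

probabilities = softmax(dense(features, weights, bias))
print(len(probabilities), round(sum(probabilities), 6))
```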

Mobile Application for Skin Lesion Detection
To utilize smartphones for skin lesion detection, first, a mobile platform should be chosen. In this first step, the authors decided to use the Android operating system (OS). [45] Android OS supports a wide range of device types such as tablets, smartphones, smartwatches, and TVs.
An Android application can be developed using different approaches. The first approach is to create a native Android application that compiles to and runs only on Android OS. This approach can be applied using the mobile application tools provided in the Android ecosystem. [46] The second approach is using cross-platform/multiplatform frameworks like Flutter, [47] Kotlin Multiplatform Mobile, [48] and React Native. [49] The main aim of these frameworks is to provide a platform to create an application using a single code base that can run on multiple computing platforms. The last approach is utilizing Progressive Web Applications (PWAs). [50] PWAs are web applications that can be installed on devices and work offline. In this study, the authors decided to use the first approach. In line with this approach, Android Studio has been chosen. The Android 12 software development kit (SDK), and hence Android application programming interface (API) level 31, was used with Android Studio. The development was performed using the Kotlin programming language. Skin lesion detection using a smartphone requires gathering skin images using the mobile device's camera. Currently, there are two different camera APIs in the Android platform. The first one is the Camera2 API, and the second one is the CameraX API. In this study, the CameraX API was used.
An ML framework supporting mobile systems is needed to classify skin lesion images with smartphones. For this need, TensorFlow Lite has been chosen. With the help of TensorFlow Lite, ML models can be run on portable devices like smartphones and edge devices. One of the main aspects of this framework is that it is optimized to perform ML tasks on mobile devices. It provides high performance. It can use hardware accelerators like graphics processing units (GPUs) and digital signal processors (DSPs). It can also use optimized models.
Executing the models on the TensorFlow Lite run-time and making predictions is called inference. TensorFlow Lite offers different ways to run inference, such as the TensorFlow Lite Support Library, the TensorFlow Lite Task Library, and the TensorFlow Lite Interpreter API. The authors have chosen the TensorFlow Lite Interpreter API. It is a low-level API that can be used on multiple platforms and languages.
The technologies used during the mobile system development are given in Table 2.
TensorFlow Lite has its own model format called the TensorFlow Lite model. Therefore, the network training and the model creation have been done using TensorFlow. Before deploying to the Android application, the TensorFlow model was transformed into the TensorFlow Lite format.
A typical process of machine learning tasks on Android smartphones using TensorFlow Lite is illustrated in Figure 2. The input data are first obtained using the sensors of the device. In this study, the input data are the image frames from the camera. In the second step, the input data are converted to a tensor. Tensors are the inputs of the TensorFlow Lite run-time. After getting the input, the TensorFlow Lite run-time runs inference (makes a prediction) on the data and outputs its result again as a tensor. In the last step, output tensors are interpreted, and the prediction is gathered.
The CameraX API provides four use cases: 1) the preview use case, which helps display the camera stream on the smartphone screen; 2) the image analysis use case, which helps the processing of the image frames; 3) the image capture use case, which enables capturing photos; and 4) the video capture use case, which enables capturing video and audio. In this study, the preview and the image analysis use cases were used. In light of this information, the workflow of the developed application is shown in Figure 3 and is explained in detail below.
In the first step, with the help of the preview use case (Preview object), the camera preview stream is connected to the user interface (UI) surface of the application for displaying the stream on the screen of the device. In the second step, with the help of the image analysis use case (ImageAnalysis object), the image frames of the device's camera are delivered to the application for processing and prediction purposes.
In this study, to use the ImageAnalysis object, three parameters needed to be set: operating mode, image format, and aspect ratio. The configured parameters are given in Table 3 and explained here. The ImageAnalysis object has two operating modes: nonblocking and blocking. The application uses the nonblocking mode. In this mode, during the analysis of an image frame, only the newest arriving image is cached in the image buffer. This mode is enabled by setting the backpressure strategy to the STRATEGY_KEEP_ONLY_LATEST option. Because the RGB image format is needed for ML tasks, the RGBA image format was set as the output color space with the OUTPUT_IMAGE_FORMAT_RGBA_8888 option. In addition, the 4:3 aspect ratio was set with the RATIO_4_3 option.
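The behavior of the nonblocking mode described above can be illustrated with a small simulation. This is a conceptual Python sketch, not CameraX code: the class `LatestFrameBuffer` and its methods are hypothetical names introduced here, and the single-slot buffer is a simplified reading of "only the newest arriving image is cached."

```python
class LatestFrameBuffer:
    """Single-slot buffer: a newly arriving frame replaces any frame still waiting."""

    def __init__(self):
        self._slot = None
        self.dropped = 0  # frames overwritten before they could be analyzed

    def offer(self, frame):
        """Called by the camera side whenever a new frame arrives."""
        if self._slot is not None:
            self.dropped += 1
        self._slot = frame

    def take(self):
        """Called by the analyzer when it is ready for the next frame."""
        frame, self._slot = self._slot, None
        return frame


buffer = LatestFrameBuffer()
buffer.offer("frame-1")
current = buffer.take()                 # the analyzer starts working on frame-1
for name in ("frame-2", "frame-3", "frame-4"):
    buffer.offer(name)                  # these arrive while the analyzer is busy
upcoming = buffer.take()                # only the newest frame survives
print(current, upcoming, buffer.dropped)
```

In this simulation, two of the three frames that arrive during analysis are discarded, which is why a slow analyzer lowers the analyzed frame rate without stalling the camera preview.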
After getting the image frames with the help of the ImageAnalysis object, in the third step, the RGB bits of the image are copied to a bitmap buffer. In the fourth step, the image is processed with the help of the ImageProcessor object of TensorFlow Lite. During this phase, the image is cropped, resized, rotated, and normalized. Then, a TensorImage is created, which is the input of the TensorFlow Lite run-time. In the fifth step, the TensorImage is given to the TensorFlow Lite Interpreter object as input, and inference is run. TensorFlow Lite returns the classification result as an output tensor. In the last step, the prediction value of each class is extracted from the output tensor, and the maximum prediction value is shown to the user as a percentage value together with the corresponding class name.
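The last step, interpreting the output tensor, can be sketched as follows. The application itself is written in Kotlin; this Python version only illustrates the logic. The class order in `CLASS_NAMES` (alphabetical) and the example scores are assumptions made for illustration and are not taken from the trained model.

```python
# Assumed (alphabetical) ordering of the seven class abbreviations.
CLASS_NAMES = ["ACK", "BCC", "MEL", "MPX", "NEV", "SCC", "SEK"]


def top_prediction(scores, class_names=CLASS_NAMES):
    """Return the class name with the highest score and that score as a percentage."""
    if len(scores) != len(class_names):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda index: scores[index])
    return class_names[best], round(scores[best] * 100.0, 2)


# Hypothetical output tensor holding one probability per class.
example_scores = [0.02, 0.05, 0.03, 0.81, 0.04, 0.03, 0.02]
label, percent = top_prediction(example_scores)
print(f"{label}: {percent}%")
```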

Dataset
The dataset used in this study includes seven classes. It has been created using two skin lesion datasets: PAD-UFES-20 [19,20] and MSLD. [21,22] The PAD-UFES-20 dataset includes six lesion classes. These are actinic keratosis (ACK), basal cell carcinoma (BCC), melanoma (MEL), nevus (NEV), squamous cell carcinoma (SCC), and seborrheic keratosis (SEK). All of these lesions and images have been included in the combined dataset. MSLD has two classes: monkeypox (MPX) and others. The label "others" covers different skin lesions other than monkeypox without a specific disease label. Therefore, the "others" class has been excluded from this study. Because of the monkeypox pandemic, [51] fast and correct classification of monkeypox images has been crucial recently. Therefore, the authors included the monkeypox (MPX) class in the newly formed dataset. As a result, 2,298 images, which include 1,641 skin lesions from 1,373 patients, have been used from PAD-UFES-20, and 102 images have been used from MSLD. Table 4 shows the details of the datasets.
After that, the merged dataset was divided into two subsets: the training set (80%) and the testing set (20%). Also, to ensure a balanced dataset, the training subset has been augmented using reflection, rotation, scale, and translation approaches. The imbalanced and balanced versions of the dataset are shown in Figure 4. The values on the y-axis represent the number of samples.
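The split and balancing steps can be sketched in a few lines. The sketch below makes several assumptions that the text does not confirm: a simple random (non-stratified) split, balancing every class up to the size of the largest class, and the helper names `split_train_test`, `augmentation_plan`, and `reflect_horizontal`. The per-class counts in the usage example are hypothetical; only the total of 2,298 + 102 = 2,400 images comes from the dataset description.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle the items reproducibly and cut them into training and testing subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def augmentation_plan(class_counts):
    """Number of augmented samples each class needs to match the largest class."""
    target = max(class_counts.values())
    return {label: target - count for label, count in class_counts.items()}


def reflect_horizontal(image):
    """Reflection augmentation: mirror each pixel row of a nested-list image."""
    return [list(reversed(row)) for row in image]


train_ids, test_ids = split_train_test(range(2400))
print(len(train_ids), len(test_ids))
print(augmentation_plan({"A": 100, "B": 40, "C": 10}))  # hypothetical class counts
print(reflect_horizontal([[1, 2, 3], [4, 5, 6]]))
```

Only the training subset would be passed through the augmentation plan, so that the test images remain unmodified.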

Experiments on Skin Lesion Image Classifications
In this study, small-sized networks are especially preferred in terms of integration into mobile applications. Thus, six small-sized pretrained networks have been utilized. These networks had originally been trained using the ImageNet dataset [52] and can classify images into 1,000 categories. The fully connected layers of the networks were modified to classify skin lesion images into seven classes. The parameters of the training can be found in Table 5.
The classification results using the combined dataset are given in Table 6. As seen in the table, data balancing improved the result for all networks, and the best accuracy has been obtained from EfficientNetb3 with balanced data. EfficientNetb3 was also trained and tested in this study because, as shown in Table 7, this network produced successful results in another study. However, we decided to use the ResNet-18 model for our application because of its mobile applicability from resource usage and performance viewpoints. The file sizes of our EfficientNetb3 and ResNet-18 TensorFlow Lite models are approximately 42.8 and 8.9 MB, respectively. These values differ from the sizes mentioned in Table 1 because TensorFlow Lite converts the models into a smaller, more efficient ML model format. [53] In our preliminary mobile experiments, using our mobile application with our models, we noticed a significant performance difference between the EfficientNetb3 and ResNet-18 models regarding inference times and frames per second (FPS) values. For example, on one of our devices, with no acceleration setting, we observed 401.49 and 91.63 ms inference times for the EfficientNetb3 and ResNet-18 models, respectively. We also observed a 2.44 FPS value for the EfficientNetb3 model and a 10.16 FPS value for the ResNet-18 model. In addition, we did not notice any performance improvement with the XNNPack library acceleration setting for the EfficientNetb3 model. Lastly, we got an error and could not run our application when we used NNAPI delegate acceleration with EfficientNetb3.
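The relation between the reported inference times and FPS values can be checked with a short calculation. The sketch assumes that frames are analyzed one after another, so the inference time alone sets an upper bound of 1000 / t frames per second for an inference time of t milliseconds; the function name `fps_upper_bound` is introduced here for illustration.

```python
def fps_upper_bound(inference_ms):
    """Maximum frames per second if inference were the only per-frame cost."""
    return 1000.0 / inference_ms


for model, inference_ms, observed_fps in (
    ("EfficientNetb3", 401.49, 2.44),
    ("ResNet-18", 91.63, 10.16),
):
    bound = fps_upper_bound(inference_ms)
    print(f"{model}: bound {bound:.2f} FPS, observed {observed_fps} FPS")
```

The observed values (2.44 and 10.16 FPS) lie slightly below the bounds of roughly 2.49 and 10.91 FPS, which is consistent with additional per-frame work such as image preprocessing.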
Figure 5 shows the confusion matrix of the ResNet-18. The system successfully distinguishes diseases in general. Also, the performance rate of the monkeypox class (indicated as 4 in the matrix) is quite high when compared to other classes. In the figure, the white squares represent 0, meaning no false prediction exists in the relevant cells. Figure 6 shows some visual results on the confusion matrix of the ResNet-18.
The proposed system has also been compared to other studies that used the same PAD-UFES-20 dataset. Table 7 shows the comparison results in terms of precision, recall, F1-score, Jaccard index, and accuracy. For a fairer comparison, ResNet-18 has also been trained and tested using only the six classes of the PAD-UFES-20 dataset. For six-class classification, the proposed system showed better results than Chen et al. and Haritha et al. [32,54] in terms of accuracy. It showed a comparable result with Pacheco et al. and Khan et al. [55,56] In the seven-class setting, which unlike the other studies includes monkeypox images, it also produced better or comparable results. ResNet-18, ResNet-50, and DenseNet-121 are examples of deep networks. ResNet-18 has fewer parameters, while ResNet-50 and DenseNet-121 are deeper networks with more parameters. Having more parameters generally makes the model more complex and flexible. However, it also has disadvantages such as higher memory usage, slower training, and larger data requirements. Therefore, a scenario where ResNet-18 outperforms ResNet-50 and DenseNet-121 might depend on the complexity and size of the dataset. Although ResNet-18 has fewer parameters, it may have enough capacity to generalize better over the dataset. The information in Table 7 is an example of this situation.
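The metrics named above can all be derived from a confusion matrix. The following sketch computes them per class, assuming that rows hold the true classes and columns hold the predicted classes; how the per-class values are averaged for Table 7 is not stated, so no averaging is shown. The 3×3 matrix is a hypothetical toy example, not the matrix in Figure 5.

```python
def per_class_metrics(confusion, k):
    """Precision, recall, F1-score, and Jaccard index for class k."""
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                    # class k samples predicted as another class
    fp = sum(row[k] for row in confusion) - tp     # other samples predicted as class k
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


def accuracy(confusion):
    """Share of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


toy_confusion = [      # hypothetical counts, rows = true class, columns = predicted class
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
print(per_class_metrics(toy_confusion, 0))
print(round(accuracy(toy_confusion), 4))
```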
In the literature, there are some other studies, such as Medhat et al. and Alyami et al., [30,31] which used the same dataset. However, these studies selected only some classes of the dataset and therefore worked with fewer classes; others [32] merged visual data with clinical data. To ensure a fair comparison, these are not included in the comparison table.

Experiments with Android Application
The application has been delivered to four Android smartphones for the on-device experiments. The devices' information is shown in Table 8. The application has been successfully run on each device. Some sample screenshots of the application are shown in Figure 7. The class of the skin lesion with the highest probability is shown at the top left corner of the screen, together with the probability percentage value. Below that information, the inference time and the FPS values are shown. Inference time is the time spent predicting a single image frame. FPS represents the number of image frames processed and predicted in a second.
For the quantitative evaluation of the mobile application, inference time and FPS performances have been observed. The performances were measured using different accelerators. The TensorFlow Lite framework provides two main mechanisms for acceleration. These are the XNNPack library and delegates.
The XNNPack library provides optimized implementations for floating-point neural network operators and, therefore, utilizes the CPUs of smartphones for better performance. For example, it includes optimized convolution and fully connected layer operators for ARM processors that support the ARM NEON extension. It also provides an operator fusion feature that combines possible operators into a single operator, resulting in improved performance. Delegates allow the usage of hardware accelerators of smartphones during ML operations. TensorFlow Lite currently provides four delegates: 1) GPU delegate for Android and iOS platforms; 2) Hexagon delegate for Android platform; 3) Android neural networks API (NNAPI) delegate for Android platform; and 4) Core ML delegate for iOS platform.
The GPU delegate and the Hexagon delegate allow the utilization of the GPU and the Qualcomm Hexagon DSP, respectively. The NNAPI delegate enables the usage of the device's GPU, DSP, and neural processing unit (NPU) for acceleration on Android platforms. The Core ML delegate utilizes the neural engine of iOS platforms.
During the experiments, both the inference time and FPS performances have been observed with the no acceleration, XNNPack library acceleration, and NNAPI delegate acceleration options on all smartphones. The inference times have been measured to assess the single-frame prediction performances. They are shared in Table 9 in milliseconds (ms). During the experiments, 1,000 inferences for each acceleration option and device have been measured, and then the average inference time has been calculated. In the table, NA means not applicable, which states that the accelerator cannot be used for that platform. According to these results, for Device-1, XNNPack provided a 16.71% and NNAPI provided a 40.48% increase in performance compared to no acceleration. NNAPI provided a 28.54% increase in performance compared to XNNPack. For Device-2, Device-3, and Device-4, XNNPack provided a 26.88%, 7.44%, and 23.77% increase in performance, respectively, compared to no acceleration.
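The percentages above can be reproduced under one interpretation: an "increase in performance" is read here as the relative reduction in mean inference time. That reading is an assumption, but it is consistent with the three Device-1 figures, as the sketch shows. The helper names and the 100 ms and 80 ms values in the usage example are hypothetical; the measured times themselves are in Table 9.

```python
def relative_reduction(baseline_ms, accelerated_ms):
    """Percentage by which the mean inference time drops relative to the baseline."""
    return (baseline_ms - accelerated_ms) / baseline_ms * 100.0


def chained_reduction(first_pct, second_pct):
    """Reduction of the second setting relative to the first one, when both
    percentages are expressed against the same baseline."""
    remaining_first = 1.0 - first_pct / 100.0
    remaining_second = 1.0 - second_pct / 100.0
    return (1.0 - remaining_second / remaining_first) * 100.0


print(relative_reduction(100.0, 80.0))              # hypothetical times
print(round(chained_reduction(16.71, 40.48), 2))    # NNAPI relative to XNNPack on Device-1
```

With the reported 16.71% (XNNPack) and 40.48% (NNAPI) reductions, the second function yields about 28.54%, matching the stated NNAPI versus XNNPack figure.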
The FPS values have been observed to get an idea of the application performance from the viewpoint of real-life experience. During the experiments, the time spent for 1,000 frames for each setting and device was measured, and then the average FPS was calculated. The FPS values are shared in Table 10. There are two types of FPS values: provided FPS and analyzed FPS. Provided FPS represents the number of frames delivered to the image analysis use case per second by the platform. Analyzed FPS represents the number of image frames processed and predicted per second. NA means not applicable, which states that the accelerator cannot be used for that platform. For Device-1, the provided FPS value and the analyzed FPS values for each acceleration option are approximately 29. No improvement is seen between different acceleration settings. This is because all of the provided image frames are processed and predicted even without any acceleration. The application can keep up with the provided image frames. For Device-2, the analyzed FPS values are lower than the provided FPS values. The application cannot keep up with the provided image frames. When no acceleration option is selected, 48.97% of the image frames are dropped, and when XNNPack acceleration is set, 33.25% of the image frames are dropped. XNNPack resulted in a 30.80% increase in FPS performance compared to no acceleration. For Device-3, the analyzed FPS values are approximately 75.42% lower than the provided FPS values. Analyzed FPS values are approximately 7 with both the no acceleration and XNNPack acceleration options. Recall that XNNPack provided a 7.44% increase in inference time performance for this device. This increase did not result in a significant improvement in FPS performance in our application. For Device-4, the analyzed FPS values are again lower than those provided. When no acceleration option is selected, 85.12% of the image frames are dropped, and when XNNPack acceleration is set, 69.86% of the image frames are dropped. XNNPack resulted in a 102.56% increase in FPS performance compared to no acceleration.
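The drop rates and FPS gains reported above are linked by simple arithmetic, sketched below. The sketch assumes that the provided FPS is the same under both acceleration settings of a device, so that the analyzed FPS is proportional to the share of frames kept; the function names and the 30 and 15 FPS values in the first call are hypothetical.

```python
def dropped_percent(provided_fps, analyzed_fps):
    """Share of provided frames that were never analyzed, in percent."""
    return (provided_fps - analyzed_fps) / provided_fps * 100.0


def fps_gain_from_drop_rates(drop_before_pct, drop_after_pct):
    """Relative gain in analyzed FPS implied by two drop rates, in percent."""
    kept_before = 1.0 - drop_before_pct / 100.0
    kept_after = 1.0 - drop_after_pct / 100.0
    return (kept_after / kept_before - 1.0) * 100.0


print(dropped_percent(30.0, 15.0))                          # hypothetical FPS values
print(round(fps_gain_from_drop_rates(48.97, 33.25), 2))     # Device-2 drop rates
print(round(fps_gain_from_drop_rates(85.12, 69.86), 2))     # Device-4 drop rates
```

The results come out within rounding distance of the reported 30.80% and 102.56% gains; the small differences arise because the drop rates themselves are rounded to two decimals.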
These results show that the accelerators of TensorFlow Lite improve inference time performance, which may also result in improved FPS performance. In addition, real-time prediction performance can be obtained with modern, powerful smartphones.
To examine the effect of the optimization approaches on accuracy, the samples in the test set were tested with all three optimization techniques. When the results were examined, the same labels were obtained with the same probabilities for many images in all three approaches. In a few examples, although the labels were the same, a 1%-2% difference was observed between the probability values; in these cases, the accuracy value still did not change. The results of some samples from the test set for the three approaches are given in Table 11.

Conclusion
This article presents a mobile system for skin disease classification. One of the focuses of the study is to distinguish monkeypox lesions from other lesions and to identify them quickly. For this purpose, a combined dataset has been created using two different datasets. The combined dataset consists of seven types of skin lesions, including monkeypox. Then, using this dataset, different pretrained networks were trained based on the transfer learning approach. In this step, small-size pretrained networks have been chosen to better suit mobile applications. Then the most suitable network from the viewpoint of both performance and mobile compatibility has been transformed into the TensorFlow Lite format. Finally, a mobile application for Android platforms has been developed that uses the TensorFlow Lite ML library to classify skin lesion images. The development has been carried out using Kotlin. The CameraX API was used to gather the images through the smartphone's camera. The mobile application has been run successfully on different smartphones, and the performance results have been collected. Improved performance in inference times has been observed when accelerators are used. The proposed system allows quick preliminary diagnosis of the lesions of patients using their smartphones. The system produced 74.27% classification accuracy for seven classes. In addition, the same system has also been trained and tested with only PAD-UFES-20 dataset images for a fairer comparison with the literature. The system produced 74.62% accuracy for the six-class PAD-UFES-20 dataset images. It yielded comparable/better results compared to the literature.
Supplementing the testing with real clinical data collected from hospitals is a goal of future work. The authors also aim to cooperate with dermatologists and radiologists to receive feedback on both testing with new patient data and increasing the adoption and usefulness of the system. In addition, the authors are investigating multiplatform/cross-platform mobile development technologies including PWAs, especially from the viewpoint of ML framework integration. In the future, the aim is to create mobile AI solutions that span a wide range of platforms using a single code base with the help of these technologies.

Figure 2. The typical process of machine learning tasks on Android smartphones using TensorFlow Lite.

Figure 3. The workflow of the developed smartphone application.

Figure 4. Comparison of the number of balanced and imbalanced data.

Figure 6. Some visual examples on the confusion matrix of the ResNet-18 that produced the best performance.

Figure 7. Some sample screenshots from the application.
*NasnetMobile does not have a linear sequence.

Table 2. Technologies used during the development of the Android application.

Table 3. The image analysis use case settings used in the application.

Table 4. Details of the datasets.

Table 5. The parameters used in the training phase.

Table 6. Performances of the pretrained networks for skin lesion classification.

Table 7. Comparative results for skin lesion classification task.
*Only monkeypox-labeled images have been used from MSLD.

Table 11. The effect of optimization techniques on accuracy.