Predicting Mobile Money Transaction Fraud using Machine Learning Algorithms

The ease with which mobile money is used to facilitate cross-border payments presents a global threat to law enforcement in the fight against money laundering and terrorist financing. This paper uses machine learning classifiers to predict transactions flagged as fraud in mobile money transfers. Data for this paper came from real-time transactions that simulate a well-known mobile transfer fraud scheme. Logistic regression serves as the baseline model and is compared with ensemble and gradient descent models. The results indicate that the baseline logistic regression model did not perform too poorly compared to the other models. The random forest classifier had the best performance across all measures. The amount of money transferred was the top feature for predicting money laundering transactions in mobile money transfers. These findings suggest that more research is needed to improve the logistic regression model and that the random forest classifier should be further explored as a potential tool for law enforcement and financial institutions to detect money laundering activities in mobile money transfers.


Introduction
With the increasing popularity of mobile money services, there has been a corresponding increase in fraud and money laundering cases. Mobile money providers must therefore be vigilant in combating such activity. One way to combat fraud is to require users to provide additional information when making transactions, such as a PIN or biometric data. Money laundering cases can be more difficult to detect, but mobile money providers can look for patterns of suspicious activity, such as unusually large or frequent transactions. The Financial Action Task Force (FATF) noted that mobile money payment presents a global money laundering and terrorist financing threat because it can be used to facilitate cross-border payments without the need for a bank account. In response, the FATF released guidance on risk-based approaches to countering the threat. The guidance sets out recommendations for identifying and managing risk, including the use of computational technology.
Machine learning (ML) and artificial intelligence (AI) are increasingly seen as ways to combat mobile money fraud and support anti-money laundering (AML) compliance. Computational technology has always played a role in the fight against financial crime, but the rise of ML and AI is giving law enforcement a powerful new tool in the battle against mobile money fraud. AI can help financial institutions identify and flag suspicious behaviour, such as large or unusual transactions, and better understand their customers' needs and risk profiles. By harnessing AI, financial institutions can significantly improve their ability to combat mobile money fraud and address money laundering threats. This paper aims to use ML algorithms to build a fraud detection model that detects red flags of fraud and money laundering in mobile money transactions. More specifically, this paper uses a set of risk-based indicators to predict how likely a transaction is to be fraudulent.
This study makes several contributions to the existing body of research on methods for detecting suspicious transactions in mobile money transfers. In theory, machine learning algorithms can circumvent the challenges that arise when illegal transactions are identified with the more conventional rule-based benchmark methodology. In the classic rule-based benchmark technique, illegal transactions are identified using predefined criteria based on mathematical conditions. The rule-based approach is time-consuming and costly and has a high rate of false-positive results. ML addresses these issues by enabling computers to learn from the data and make predictions. When applied to mobile money, ML can enable automated detection of potentially fraudulent transactions. An example would be training an ML algorithm on a dataset containing transactions that are known to be fraudulent. Based on this learned experience, the algorithm can be tuned to find future fraudulent transactions by looking for patterns similar to those found in the training data.
Practically, we propose a novel data-driven method of fraud detection that has been tuned to the distinctive features of mobile money transactions. This strategy uses ML to automatically identify suspicious transactions in real time, eliminating the need for extensive human involvement. Because it analyzes transactions in real time, the ML approach can quickly spot transactions that could be fraudulent and stop them from going through, thereby reducing the number of fraudulent transactions.
The remainder of this paper is structured as follows. Section two reviews the literature on ML and mobile money transfers concerning fraud and money laundering. Section three discusses the methodology and algorithms considered for the ML models. Section four examines and analyzes the results. Section five concludes with limitations in ML for fraud research and identifies opportunities for further study.

Literature Review on Mobile Money Fraud and Money Laundering
Mobile Money Services (MMS) or Mobile Money Transfer Services (MMTS) are financial services for the unbanked that operate primarily through smartphone apps supported by mobile operators or banking institutions and are frequently referred to as branchless banking services 1,2. They facilitate the transfer of electronic cash using the users' mobile phones without involving any bank account in the process 3,4. Common examples of MMS or MMT services are Tigopesa, M-Pesa, Simbanking, and NMB Mobile, offered by Tigo Tanzania Ltd, Vodacom Tanzania, CRDB Bank, and National Microfinance Bank, respectively 5(p4). Ideally, MMT enables person-to-person (P2P) payments for customers, and the services supported by the mobile money system involve various stakeholders, including mobile users, regulators, mobile network operators, telecom retailers, agents, and financial institutions 6. The mobile users act as the customers for the MMT services, while mobile network operators (MNOs) facilitate the MMS ecosystem in conjunction with telecom retailers and agents, who are responsible for opening customer accounts, conducting customer due diligence, and carrying out other compliance activities such as Know Your Customer (KYC) checks. Financial institutions and regulators assist MNOs in establishing financial inclusion and risk management mechanisms, whereas MNOs limit banks to processing payment delivery, clearing, and settlement 6. A bank may or may not be involved in the MMS, depending on the adopted MMT model 7. These players collectively enable MNOs to implement the new P2P payment facility for unbanked users.
The number of users using mobile money for small or large transactions has increased drastically in the last decade 1. Research estimates that this number will continue to rise with the increasing dependency on and usage of mobile phones 8,9. Due to their success and popularity, mobile money systems are set to attract the attention of fraudsters interested in laundering the proceeds of crime 10. Fraudsters can launder money by capturing the details of mobile money transfers during transmission or creation, or by stealing data stored on the server through phishing attacks or viruses; these details can then be misused to launder illicit funds 8,11. Similarly, unscrupulous end users of MMS can launder dirty money through this system by smurfing, that is, splitting a large sum of illegitimate income into many small mobile money transactions across multiple accounts and phones so that the activity does not appear suspicious 3,6. Indeed, some speculate that this system could be used to fund terrorist activities, though there is evidence that launderers have used mobile transfers to launder funds for terrorist financing 12. These findings have created the need for more advanced technology to identify and control the risks associated with the mobile money system.

Detecting Mobile Money Fraud and Money Laundering Using Computational Technology
Technological innovation can be useful in mitigating various risks associated with the MMS. Improving technological surveillance by increasing the security, resilience, and scalability of the MNO networks used in MMT can reduce risks associated with mobile money fraud (MMF) to some extent at the security and procedural levels 13. Implementing a two-factor authentication model for securing SMS communications in MMT has proven very effective 14. Beyond new developments in security information and event management, the most important contribution technology can make is the design of MMF and money laundering prediction or detection tools for the MMS. In the following paragraphs, we focus on using ML algorithms and artificial intelligence to mitigate the risk of MMF and money laundering in mobile money transactions.

Machine Learning and Artificially Intelligent Algorithms
Technology is pivotal in investigating and detecting fraudulent or laundered mobile money transactions 15. ML, AI, and data mining have proven effective in detecting MMF and money laundering activities in the MMS 16,17. More specifically, ML algorithms enable computers to learn patterns of human behaviour from data 17,18. Supervised ML algorithms such as logistic regression, decision tree, gradient descent, and random forest have all been successfully used to detect financial fraud from labelled data 16,18-22. The following sections review the literature on these algorithms.

Logistic Regression
Logistic regression is used as the baseline algorithm against which the other models are compared. Logistic regression uses a linear combination of input variables (x) to predict an output variable (y) 20. The output variable is usually 0 or 1, representing the two possible outcomes of a binary classification task (e.g., fraud or not fraud). The coefficients of the input variables (β) are estimated using maximum likelihood estimation. The sigmoid function is the foundation of logistic regression: it takes a real number and maps it to a value between 0 and 1. This mapping is important for ML classification tasks because it allows the algorithm to separate data points into different classes 23. The sigmoid function is given in equation 1.
$$f(x) = \frac{1}{1 + e^{-x}} \qquad \text{(eq. 1)}$$
where f(x) is the output, bounded between 0 and 1, x is the input to the sigmoid function, and e is the mathematical constant (Euler's number). The output of the sigmoid function can be interpreted as a probability. For example, if the output of the sigmoid function is 0.8, this can be interpreted as an 80% chance that the data point belongs to one class and a 20% chance that it belongs to the other class 24,25. Using the sigmoid function, a logistic regression model can be trained to predict the class to which a new data point belongs. The logistic regression classifier uses the sigmoid function to estimate the probability that y = 1, given the value of x. Equation 2 denotes the logistic regression model.
$$\Pr(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \qquad \text{(eq. 2)}$$
where Pr(Y = 1 | X) takes values between 0 and 1, X represents the independent features, and different values of the coefficients β0 and β1 give different estimates of the probability. Logistic regression is a valuable technique for fraud classification tasks 20,24,25. Research has shown that logistic regression performs relatively well and, in some cases, outperforms other classifiers in fraud classification tasks 20,23,25,26. Logistic regression has been used in a variety of domains to predict fraud. For example, it has been used in the financial sector to detect credit card and insurance fraud 23,27. Others have used logistic regression to predict medical billing fraud with reliable results 28,29. Logistic regression models have several advantages over traditional fraud detection methods. First, they are highly scalable and can be applied to large data sets 29. Second, they are highly effective at detecting fraud, with a success rate that is generally much higher than that of traditional methods 23. In addition, logistic regression models are relatively easy to interpret, which makes them valuable tools for fraud analysts 20,25. Finally, they are relatively easy to deploy and use in production systems 20,23. However, logistic regression is not without drawbacks; in particular, it can be susceptible to overfitting if the data is not carefully preprocessed 24,27. Even though overfitting is a concern, logistic regression is an excellent way to build a fraud detection model that can be used as a benchmark for comparison with other classifiers.
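The relationship between the sigmoid function and the logistic regression model can be illustrated with a minimal Python sketch. The coefficient values and the single scaled feature below are hypothetical and were chosen only for illustration; they are not the fitted values from this study.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fraud_probability(x: float, beta0: float, beta1: float) -> float:
    """Pr(y = 1 | x) for a one-feature logistic regression model."""
    return sigmoid(beta0 + beta1 * x)

# Hypothetical coefficients (assumption, not estimated from the paper's data).
beta0, beta1 = -4.0, 6.0

# Probability of fraud for a transaction whose scaled amount is 0.9.
p = fraud_probability(0.9, beta0, beta1)

# A common convention is to predict fraud when the probability reaches 0.5.
label = 1 if p >= 0.5 else 0
print(f"Pr(fraud) = {p:.3f}, predicted label = {label}")
```

With these assumed coefficients the probability is close to the 0.8 used in the example above, so the transaction would be assigned to the fraud class.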

Decision Tree
Another useful machine learning algorithm for fraud detection is the decision tree classifier 19. A decision tree uses a tree structure for decision making: the root represents the initial decision, internal nodes test attributes selected on the basis of information gain or the Gini Index, edges represent the outcomes of those tests, and leaves carry the class labels that convey the final decision 18,30. Typically considered a weak learner, a decision tree's classification ability can be boosted with the gradient boosting technique 31(p8). This project employs the Gini Index as the splitting criterion. The Gini Index is shown in equation 3:
$$G = \sum_{k} f_k (1 - f_k) = 1 - \sum_{k} f_k^2 \qquad \text{(eq. 3)}$$
where f_k is the fraction of items labeled with k in the set and \sum_k f_k = 1.
In fraud detection, a decision tree is used to build a model that predicts whether an observation is legitimate or not 32. The decision tree model is based on a series of yes-or-no questions, each narrowing down the possible outcomes (i.e., fraud or no fraud) 33. For example, a decision tree for fraud detection might ask whether the transaction is consistent with the customer's past behaviour. If the answer is no, the transaction could be flagged as potentially fraudulent. Once the model is built, it can be used to classify new data points as either fraudulent or non-fraudulent. Decision tree algorithms are highly effective in identifying fraud and are often used alongside other methods, such as rule-based systems and other ML algorithms 18,19,33. Classification algorithms based on decision trees are a powerful way to find fraud because they can help uncover even complex kinds of fraud.
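To show how the Gini Index guides the choice of a yes-or-no question, the following Python sketch computes the impurity of a set of labels and the weighted impurity of a candidate split. The label lists are hypothetical toy data, and the helper names are introduced here only for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini Index of a set: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def split_gini(left, right):
    """Size-weighted Gini Index of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Hypothetical node with four legitimate (0) and two fraudulent (1) transactions.
parent = [0, 0, 0, 0, 1, 1]

# Two candidate questions: one separates the classes perfectly, one does not.
perfect_split = split_gini([0, 0, 0, 0], [1, 1])
poor_split = split_gini([0, 0, 1], [0, 0, 1])

print(gini_impurity(parent), perfect_split, poor_split)
```

A tree-building algorithm would prefer the first question because it yields the lower weighted impurity; the second leaves the impurity unchanged relative to the parent node.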

Gradient Descent
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To locate a function's local minimum, one takes steps proportional to the negative of the function's gradient (or approximate gradient) at the current point 18. If one instead takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; this procedure is known as gradient ascent 34. In machine learning, gradient descent is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (c). The cost function measures how far the predicted values are from the actual values. The algorithm iteratively adjusts the coefficients until it converges on a set of coefficients that minimizes the cost function 34,35.
The algorithm is represented by a probabilistic formula in which the likelihood function p(x; β0, β1) predicts the probability of a binary outcome given a set of independent variables. In this case, the algorithm is trained to predict whether an instance belongs to class 0 or 1, represented by the labels y = 0 and y = 1. The coefficients β0 and β1 determine the probability that the output y is 1 or 0 given x; more precisely, β0 + β1x is the log odds of the output being 1 given x 18,34. The gradient descent algorithm is popular for machine learning applications, particularly in fraud detection, because it can learn from data quickly and effectively. Additionally, the gradient descent algorithm can handle very large datasets, making it well suited to fraud detection applications requiring high accuracy. Numerous studies have been conducted on the efficacy of the gradient descent algorithm for fraud detection, and most have found that the algorithm effectively detects fraudulent activities 36,37. Recent studies have found that the algorithm was very effective in detecting known fraud cases in a dataset of credit card transactions 36,38. Another study found that the algorithm could identify fraudulent insurance claims with high accuracy 37. Furthermore, the gradient descent algorithm is relatively easy to implement, which makes it a good choice for organizations that do not have extensive resources or expertise in fraud detection methods.
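The iterative coefficient updates described above can be sketched as batch gradient descent on the logistic (log-loss) cost for a single feature. This is an illustrative sketch, not the implementation used in this study: the learning rate, the number of steps, and the toy data are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(xs, ys, b0, b1):
    """Mean negative log-likelihood: the cost function being minimized."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def gradient_descent(xs, ys, learning_rate=0.5, steps=2000):
    """Repeatedly step against the gradient of the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # derivative of the loss w.r.t. the linear term
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1

# Hypothetical scaled transaction amounts and fraud labels.
xs = [0.05, 0.10, 0.20, 0.30, 0.70, 0.80, 0.90, 0.95]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = gradient_descent(xs, ys)
print(f"cost before: {log_loss(xs, ys, 0.0, 0.0):.3f}, cost after: {log_loss(xs, ys, b0, b1):.3f}")
```

The cost after training is lower than the cost at the starting point, which is the behaviour the text describes: each step moves the coefficients in the direction that reduces the gap between predicted and actual labels.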

Random Forest
Random forest is a machine learning algorithm for classification and regression tasks that is based on the decision tree concept 22,23,39. The random forest algorithm generates a series of decision trees, each created using a randomly chosen subset of the training data 40,41. The predictions of the individual trees are then combined to produce the final prediction. Although random forest can be used to solve both linear and nonlinear problems, it is especially useful for nonlinear data 16,39. The random forest algorithm is effective because it reduces the prediction variance while maintaining the model's accuracy 41,42. Additionally, the random forest algorithm, when pruned, is resistant to overfitting, which means it can generalize well to new data, and it can handle large datasets. Because of these advantages, the random forest algorithm is a powerful tool for machine learning applications.
The advantage of the random forest algorithm for classification tasks is that it can help reduce the number of false positives generated by other AI methods, such as neural networks 40. In addition, the random forest technique is not difficult to construct, and the algorithm can be run on large datasets with accurate performance 42. These two features combine to make the algorithm a powerful instrument for discovering patterns in data. In particular, the random forest classifier is well suited to detecting fraud because it can often identify unusual patterns in the data 19,41. For example, fraudsters might create multiple accounts with different email addresses and use them to make small purchases to avoid detection. Alternatively, they might try to return items they never bought to receive a refund. By looking for these and other unusual patterns, the random forest algorithm can help to detect fraud before it results in significant losses. The random forest classifier has also proven to be a versatile algorithm in other domains, such as loan default, credit risk, image recognition, and medical diagnosis 19,23,39,41.
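The two ideas at the core of random forest, training each tree on a random resample of the data and combining the trees by majority vote, can be sketched in a few lines of Python. This is a deliberately simplified illustration: each "tree" is a one-split stump on a single feature, and the random selection of features used by a full random forest is omitted. The data and function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-split tree: predict fraud (1) when x exceeds the returned threshold."""
    xs = sorted({x for x, _ in sample})
    midpoints = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    candidates = [float("-inf")] + midpoints + [float("inf")]

    def errors(threshold):
        return sum((1 if x > threshold else 0) != y for x, y in sample)

    # Keep the threshold that misclassifies the fewest sampled transactions.
    return min(candidates, key=errors)

def bootstrap(data, rng):
    """Draw a resample of the same size as the data, with replacement."""
    return [rng.choice(data) for _ in data]

def fit_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    return [fit_stump(bootstrap(data, rng)) for _ in range(n_trees)]

def predict(forest, x):
    """Combine the individual trees by majority vote."""
    votes = Counter(1 if x > threshold else 0 for threshold in forest)
    return votes.most_common(1)[0][0]

# Hypothetical (scaled amount, fraud label) pairs.
data = [(0.10, 0), (0.15, 0), (0.20, 0), (0.25, 0), (0.30, 0),
        (0.70, 1), (0.75, 1), (0.80, 1), (0.85, 1), (0.90, 1)]

forest = fit_forest(data)
print(predict(forest, 0.2), predict(forest, 0.8))
```

Because each stump sees a slightly different resample, the thresholds differ from tree to tree; averaging their votes is what reduces the variance of the final prediction.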

Research Design and Experimental Setting

Data Generation and Simulation
Currently, there is a lack of publicly available data for fraud and money laundering detection 43. One of the reasons cited is the confidentiality and sensitivity of the data. To address this problem, researchers have developed simulators that use algorithms to generate synthetic data from real-time observations. Two of the most prominent simulators used by researchers are the Mobile Money Simulator (PaySim) and the Retail Store Simulator (RetSim) 43,44. These simulators allow researchers to generate synthetic transactional data that contains both legitimate and fraudulent transactions. Using Agent-Based Simulation (ABS) and Multi Agent-Based Simulation (MABS), the studies in 43 and 44 demonstrated that synthetic transactional data generated by PaySim and RetSim are as useful as real transaction data for detecting MMF and money laundering activities while retaining the reliability and confidentiality of the actual transaction data.
The data for this project came from a MABS calibrated with real-time transactions. The data came from the work of Lopez-Rojas and his colleagues, who use MABS to develop agents representing clients and merchants in PaySim and customers and salesmen in RetSim 43,44. The data is simulated and uses a real-world scenario based on a well-known fraud scheme to demonstrate the superiority of simulated data over real-world data when establishing adequate controls for fraud detection 45. Normal behaviour was derived from the behaviour observed in the collected data. This behaviour is encoded in the agents' rules governing the transactions and interactions between customers and salespeople or between clients and merchants. Based on patterns of actual fraud 43, some of these agents were set up to commit fraud.

Data Description and Variables
PaySim is used to simulate the mobile money transactions in this dataset. The simulations are based on a sample of actual mobile money transactions taken from one month of financial logs generated by a mobile money service deployed in an African nation. The original logs were provided by a global firm that supplies the mobile financial service, which presently operates in more than 14 countries. In total, 1,048,575 rows of data were collected, comprising nine independent features. Table 1 depicts the features and target variable that represent the dataset. The dependent variable is fraud. "Fraud," in this context, refers to the transactions carried out by fraudulent agents inside the simulation. More specifically, the fraudulent agents try to profit by seizing control of client accounts and laundering the funds by moving them to another system. The funds are then cashed out of the system. Fraud was coded as 1 = fraud and 0 = no fraud and is represented in equation 1.

Data Cleaning and Preprocessing
Some features were redundant and were dropped before model building. The features "nameOrig" and "nameDest" were not relevant to the model and were removed, since no longitude or latitude was associated with these features to locate the destinations. The feature "isFlaggedFraud" showed no variation and was also dropped from the model.
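The cleaning step can be sketched as follows: drop the identifier columns, then drop any column that takes a single value across all rows. The three rows below are hypothetical stand-ins for the dataset, and the helper names are introduced here only for illustration; they are not taken from the original analysis.

```python
# Hypothetical rows using the column names mentioned in the text.
rows = [
    {"amount": 9839.64, "nameOrig": "C001", "nameDest": "M001", "isFlaggedFraud": 0, "isFraud": 0},
    {"amount": 181.00, "nameOrig": "C002", "nameDest": "C003", "isFlaggedFraud": 0, "isFraud": 1},
    {"amount": 229133.94, "nameOrig": "C004", "nameDest": "C005", "isFlaggedFraud": 0, "isFraud": 0},
]

ID_COLUMNS = {"nameOrig", "nameDest"}

def constant_columns(rows):
    """Columns with a single distinct value carry no information for a classifier."""
    return {column for column in rows[0] if len({row[column] for row in rows}) == 1}

def drop_columns(rows, to_drop):
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

to_drop = ID_COLUMNS | constant_columns(rows)
cleaned = drop_columns(rows, to_drop)
print(sorted(to_drop), list(cleaned[0]))
```

Checking for constant columns programmatically, rather than by inspection, makes the step repeatable if the simulation is regenerated with different settings.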

Feature Scaling
One of the most important aspects of preprocessing data for ML is feature scaling. Scaling is especially necessary when the features are on different scales and span a wide range of values. Many ML models are sensitive to features with different scales; if this is not handled properly, it can lead to sub-optimal performance or even incorrect predictions. There are several ways to scale features, but one of the most common is min-max scaling, which rescales all values to lie between 0 and 1. Another method is standardization, which rescales values so that they have a mean of 0 and a standard deviation of 1.
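The data for this project contains features with different scales and ranges. Given that the data was not normally distributed, min-max normalization with MinMaxScaler was used. The formula used to normalize the data is shown in equation 5:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad \text{(eq. 5)}$$
where x is the original value of a feature and x_min and x_max are the minimum and maximum values of that feature.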
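As a minimal illustration of min-max normalization, the sketch below rescales one feature by hand instead of calling MinMaxScaler. The transaction amounts are hypothetical values chosen to span several orders of magnitude.

```python
def min_max_scale(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]

# Hypothetical transaction amounts (assumption, not taken from the dataset).
amounts = [100.0, 2_500.0, 76_000.0, 10_000_000.0]
scaled = min_max_scale(amounts)
print(scaled)
```

Note that a single very large amount compresses the remaining values toward 0, which is one reason the choice of scaling method matters for skewed features.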

SMOTE-ENN for Imbalanced Data
The dataset used for this study was highly imbalanced: about 99% of the transactions were not fraud and 1% were fraud. One common approach when working with imbalanced datasets is to upsample the minority class. Upsampling can be done in various ways, but one popular method is the Synthetic Minority Oversampling Technique (SMOTE) combined with Edited Nearest Neighbour (ENN). The SMOTE-ENN method combines the SMOTE and ENN algorithms to improve the performance of ML classifiers 46,47. SMOTE creates synthetic minority examples by interpolating between existing minority examples 48. ENN then cleans the resulting oversampled data set by removing noisy observations, that is, observations whose class disagrees with that of most of their nearest neighbours 49. The SMOTE-ENN method has been shown to be more effective than either algorithm alone 46,47,49. SMOTE-ENN is particularly effective at handling imbalanced data sets, which are common in real-world applications. As a result, the SMOTE-ENN method is often used in fields such as credit scoring and fraud detection 47,49.
Figure 1 shows the data before SMOTE-ENN resampling. Because of the imbalanced nature of the data, the no-fraud observations are scattered along the dotted red line. Training ML classifiers on imbalanced data leads to biased results, with the algorithms learning mostly from the no-fraud observations 19,40. Oversampling with SMOTE-ENN creates new, synthetic data points similar to the existing minority class. Figure 2 shows the data after SMOTE-ENN oversampling. Note that the data on the red line is more densely packed and now moves in the same direction as the blue line. This symmetry of the lines is caused by the synthetic data points generated by SMOTE-ENN. These data points are not identical to the actual data points but are close enough to be used to train the model 46,47. The result is a more accurate model that is better able to classify new data points. By artificially generating additional data points, SMOTE-ENN helps the model learn from the data and generalize to new inputs (Chawla et al., 2002). In this way, SMOTE-ENN can help to improve the performance of ML models on imbalanced datasets 46,47.
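The two stages of the resampling procedure can be sketched in plain Python. This is a simplified illustration rather than the library implementation used in practice: the SMOTE step interpolates between a minority point and one of its nearest minority neighbours, and the ENN step here keeps a point only when most of its k nearest neighbours share its label (one common variant of the rule). All data points, parameter values, and function names are assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

def enn(points, labels, k=3):
    """Keep only points whose label matches the majority of their k nearest neighbours."""
    kept_points, kept_labels = [], []
    for i, (point, label) in enumerate(zip(points, labels)):
        others = [(math.dist(point, q), l)
                  for j, (q, l) in enumerate(zip(points, labels)) if j != i]
        nearest_labels = [l for _, l in sorted(others)[:k]]
        if nearest_labels.count(label) * 2 > k:
            kept_points.append(point)
            kept_labels.append(label)
    return kept_points, kept_labels

# Hypothetical 2-D data: the last majority point sits inside the minority cluster (noise).
majority = [(0.10, 0.10), (0.15, 0.20), (0.20, 0.15), (0.25, 0.25),
            (0.30, 0.10), (0.20, 0.30), (0.85, 0.88)]
minority = [(0.80, 0.80), (0.90, 0.85), (0.85, 0.95)]

synthetic = smote(minority, n_new=4)
points = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
kept_points, kept_labels = enn(points, labels)
print(kept_labels.count(0), kept_labels.count(1))
```

In this toy example the oversampling step balances the classes, and the cleaning step removes the single majority point that lies inside the minority cluster, which sharpens the boundary between the two classes.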

Fraud Model Performance Measures
The results were analyzed using the standard confusion matrix. A confusion matrix is a table that helps calculate the accuracy of a classification model as well as its precision, recall, and F1-score. The table is made up of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values 15. For a 2x2 binary classification, the confusion matrix as it pertains to this project can be read as follows:

True Positive (TP):
The algorithm predicts fraud, and the outcome is fraud.

True Negative (TN):
The algorithm predicts no fraud, and there was no fraud.

False Positive (FP):
The algorithm predicts fraud, but there was no fraud (Type I Error).

False Negative (FN):
The algorithm predicts no fraud, but there was fraud (Type II Error).

Table 2 presents the performance measures used in the models for this paper. Accuracy is the proportion of correct predictions over all predictions. However, accuracy is not the best metric for an imbalanced dataset. An improved single measure is the Matthews Correlation Coefficient (MCC) 50. The MCC is a measure of the quality of binary classification. It takes true and false positives and negatives into account and is widely regarded as a balanced measure that can be applied even when the classes are of very different sizes 51. The MCC lies in the range [-1, 1]. A coefficient of +1 represents a perfect prediction, a coefficient of 0 represents a prediction no better than random, and a coefficient of -1 represents an inverse prediction 52. The MCC has some valuable properties that make it more suitable than other measures for some purposes, most notably its ability to work well even when one of the two classes is much more frequent than the other 50,51. The formulas for both accuracy and the MCC are shown in Table 2.
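The contrast between accuracy and the MCC on imbalanced data can be illustrated with a short Python sketch. The labels below are hypothetical: 2 fraud cases among 100 transactions, scored against a trivial classifier that always predicts no fraud. Returning 0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with fraud = 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def mcc(tp, fp, fn, tn):
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

# Hypothetical imbalanced sample and a classifier that never predicts fraud.
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), mcc(*counts))
```

The trivial classifier reaches 98% accuracy while detecting no fraud at all, whereas its MCC is 0, which is why the MCC is the more informative single measure for this kind of dataset.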

Practitioner Measures
To understand how well a classification model is performing, we need to look beyond accuracy. The accuracy may be high, but this could be due to the model predicting the most frequent class all the time, which produces inconsistent and unreliable results for an imbalanced dataset. To get a better idea of model performance, the confusion matrix can be used to calculate other metrics.
From the confusion matrix, we can calculate the precision, recall, and F1-score. Precision is the number of correct fraud predictions divided by the total number of fraud predictions, TP / (TP + FP). Recall is the number of correct fraud predictions divided by the total number of actual fraud cases, TP / (TP + FN). Generally, a classifier with higher precision but lower recall will miss some fraudulent items but will not incorrectly predict too many items as fraud. A classifier with higher recall but lower precision will correctly identify more of the fraud items but will also incorrectly predict more items as fraud. The ideal classifier would have perfect precision and recall, but this is usually impossible in practice. Instead, the goal is usually to find a balance between precision and recall that gives the best overall results. The F1-score, the harmonic mean of precision and recall, captures this balance. A good classification model will have high precision, recall, and F1-score 19.
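A short worked example makes the trade-off concrete. The confusion-matrix counts below are hypothetical and are used only to show how the three measures are computed.

```python
def precision(tp, fp):
    """Share of predicted fraud cases that really were fraud."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual fraud cases that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical counts: 80 frauds caught, 20 false alarms, 40 frauds missed.
tp, fp, fn, tn = 80, 20, 40, 860

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(f"precision = {p:.3f}, recall = {r:.3f}, F1 = {f1:.3f}")
```

Here the model is right 80% of the time when it raises an alarm but catches only two thirds of the fraud, and the F1-score falls between the two values, closer to the lower one.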
A more robust measure is the Receiver Operating Characteristic (ROC) curve. The ROC curve is a graphical tool used to evaluate the performance of a binary classifier. The curve is generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold values. The area under the ROC curve (AUROC) is a metric that can be used to compare different classifiers. A classifier with a higher AUROC has better discrimination, meaning it can better distinguish between positive and negative observations. A perfect classifier would have a TPR of 1 and an FPR of 0, resulting in a point in the upper left corner of the ROC plot. Generally, the closer the ROC curve is to this corner, the better the classifier performs. Classifiers that perform similarly to random guessing will have an ROC curve close to the diagonal line 18,19.
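The construction of the ROC curve and its area can be sketched directly from the definitions. The function below sweeps the decision threshold over the predicted scores and integrates the resulting curve with the trapezoidal rule; the labels and scores are hypothetical, and the sketch assumes that both classes are present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auroc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels (1 = fraud) and predicted fraud probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
print(points, auroc(points))
```

In this example one legitimate transaction receives a higher score than one fraudulent transaction, so the curve falls short of the upper left corner and the area is below 1.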

Descriptive Results
Table 3 summarizes the numerical features for money laundering transactions. The average amount of money transferred was $159,000, while the 50th percentile (median) of laundered transactions was about $76,000. The highest amount laundered was $10 million. The average initial balance before the transaction was $875,000, while the average new balance after the transaction was $895,000. Note also that the maximum initial balance before the transfer and the maximum new balance after the transfer were both $38 million. The average initial balance of the recipient before the transaction was $978,000, while the average new balance after the transaction was higher at $1.1 million. Once again, it is worth noting that the maximum initial balance and the maximum new balance of the recipient after the transfer were both over $40 million. These findings suggest that money laundering is a significant problem, with large sums of money being transferred illegally 11,40.
The findings of this study have implications for both the government and the public. First, the findings indicate that much illegal activity is not being detected or stopped 40. Second, large amounts of money are not being taxed, which means the government is losing a lot of revenue 53. Third, the findings point to an underground economy that operates outside the legal system 53,54. A large underground economy can have several negative consequences, such as making it more difficult for law-abiding businesses to compete and increasing the chances of crime and corruption 55. Regulators need to take action to address the problem of the underground economy or risk people doubting the government's ability to regulate the financial system.
As can be seen in Figure 3, there are five unique types of transactions: cash-out, cash-in, payment, transfer, and debit. Cash-out is the most frequent money laundering transaction, accounting for nearly 36% of all transactions. Cash-out is followed by payment (34%), cash-in (22%), transfer (8%), and debit (1%). As expected, most transactions are in the form of cash (70%), with the remainder made up of payments, transfers, and debit cards. As noted above, the average amount laundered per transaction is $159,000, with a standard deviation of $265,000. These results show that money laundering is usually done with large sums of money, though the amount laundered in each transaction varies greatly 6,40.

Baseline Logistic Regression
In this study, we sought to compare the performance of several classifiers to determine which would be best suited for predicting financial transaction fraud. Logistic regression was used as the baseline model, and its results were then compared with those of the other classifiers. As shown in Figure 4, all the features except the cash-in, debit, and payment types are statistically significant at the 0.05 level (95% confidence). Because some features are not significant in the baseline, other classifiers may be more accurate in predicting fraud than the logistic regression model 25. More research is needed, but the logistic regression results suggest that other classifiers should be considered when predicting financial transaction fraud. To build the different classifiers and to assess them using the classification metrics precision, recall, and the F1-score, the features with a p-value > 0.05 were dropped from the dataset.

Figure 4: Logistic Regression Results
The coefficients of the logistic regression model are expressed as log-odds. The log-odds in Table 4 are hard to interpret because they do not directly give the odds of an event occurring. Instead, they show the direction and relative strength of each feature's contribution to the model. Based on the logistic regression model, the amount involved in the transfer is the most important feature in detecting fraud. The features that positively affect fraud detection are amount, cash-out, and transfer. The features that negatively affect fraud prediction are the initial balance of the recipient before the transaction (i.e., oldbalanceDest) and the new balance of the recipient account after the transaction (i.e., newbalanceDest). To find the odds, we must take the exponential of the coefficients. For example, if we take the exponential of -0.693, we get 0.5, which means that for every unit increase in X1, the odds of y = 1 are multiplied by 0.5. In other words, holding all other variables constant, a one-unit increase in X1 leads to a 50% decrease in the odds of y = 1.
Table 5 provides an overview of the odds of the features and how they affect fraud. Note that the initial balance of the recipient before the transaction and the transfer type have the highest odds of predicting fraud. Therefore, these two features are important when trying to predict fraud. However, we should also consider other factors, such as the amount of money being transferred, the country where the recipient is located, and whether or not the recipient has a bank account. These features can provide additional insights into whether or not a transaction is fraudulent 40. For instance, if the amount of money being transferred is very large, the transaction is more likely to be fraudulent. Similarly, a recipient located in a country with a high risk of fraud is another red flag.
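The conversion from log-odds to odds described above can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative, and the coefficient -0.693 is the worked example from the text, not a value taken from Table 4.

```python
import math

def odds_ratio(log_odds_coefficient):
    """Convert a logistic regression coefficient (log-odds) into an odds ratio."""
    return math.exp(log_odds_coefficient)

def percent_change_in_odds(log_odds_coefficient):
    """Percentage change in the odds of y = 1 for a one-unit increase in the feature."""
    return (odds_ratio(log_odds_coefficient) - 1.0) * 100.0

# Illustrative coefficient from the worked example in the text (not from Table 4).
coefficient = -0.693
print(round(odds_ratio(coefficient), 2))           # odds are multiplied by about 0.5
print(round(percent_change_in_odds(coefficient)))  # about a 50% decrease in the odds
```

A positive coefficient gives an odds ratio above 1 (the odds increase), and a coefficient of zero gives exactly 1 (no change).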

Performance Accuracy and Matthew Correlation Coefficient
The models were analyzed using the scikit-learn library after dropping the features with p-values greater than 0.05. Table 6 presents the performance accuracy and the MCC results for the different models tested.
Compared to the gradient descent and the ensemble classifiers, the benchmark logistic regression model did not fare badly. The ensemble classifier, a combination of multiple models, showed the best performance in terms of accuracy and MCC. The superior performance of the ensemble classifiers arises because they can capture different patterns in the data and weigh them appropriately, resulting in more accurate predictions 41,56. In terms of computational cost, gradient descent was the most expensive model to run, while logistic regression was the least expensive 34. Therefore, if computational cost is a concern, logistic regression may be a better option despite its slightly lower accuracy. Overall, all the models performed reasonably well, with ensembles being the best performers in terms of accuracy and MCC.
As shown in Table 6, the MCC values were lower overall than the accuracy values. The random forest model had the highest performance accuracy (.89) and MCC (.78) on the test set, although it showed signs of overfitting. The next best-performing classifier was the decision tree classifier, with a performance accuracy of .86 and an MCC of .75. The gradient descent classifier had a similar accuracy (.82) but a lower MCC (.67). These results suggest that while the random forest model is more likely to overfit the data, it may still be the best option for this dataset due to its higher accuracy. However, further tuning of hyperparameters may be necessary to improve its performance. These results also suggest that, while gradient descent may be a computationally efficient method, it is not as effective as other methods in predictive power. In conclusion, the random forest model is the best-performing model in terms of accuracy and MCC.
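To illustrate why the MCC can come out lower than accuracy for the same predictions, the sketch below computes both from the four cells of a confusion matrix using the standard definitions. The counts are hypothetical and are not taken from Table 6.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical confusion matrix (illustrative counts only).
tp, tn, fp, fn = 80, 90, 10, 20
print(round(accuracy(tp, tn, fp, fn), 2))
print(round(mcc(tp, tn, fp, fn), 2))
```

For these counts the accuracy is 0.85 while the MCC is roughly 0.70, because the MCC penalizes both false positives and false negatives relative to the size of each class.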

Classification Measures
When dealing with imbalanced datasets, it is important to use measures of performance that are more robust than simple accuracy 37,40. Accuracy reports only the overall share of correct predictions; it does not show how the errors are split between FP and FN and therefore does not provide enough information to properly assess the model's performance. In addition, we need to consider the TN and TP rates because these measures provide a more complete picture of the model's performance. For example, if we have a high TN rate, our model correctly identifies most negative instances. On the other hand, if we have a low TN rate, our model incorrectly flags many negative instances as positive. As a result, the TN and TP rates are significant for assessing performance on imbalanced datasets 15.
When choosing a machine learning model for fraud detection, it is important to consider how well the model will perform in terms of precision, recall, and the F1-score 40,49. These measures are better indicators of how well a model predicts laundered transactions than accuracy alone. However, it is important to remember that all of these measures can be affected by the choice of threshold. A high threshold will result in fewer FPs and TPs, while a low threshold will have the opposite effect. Therefore, tuning the threshold according to the application's needs is important. In some cases, it may even be necessary to use multiple thresholds to achieve the desired level of performance 20,42. To offset the trade-off between precision and recall, the threshold was tuned to ensure that it was equal before running the classifiers.
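The effect of the threshold on precision, recall, and the F1-score can be sketched as follows. The labels and scores are hypothetical, and the helper names are illustrative; this is not the procedure used to produce Table 7.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are predicted as fraud (1)."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(y_true, scores, threshold):
    """Return (precision, recall, F1) at the given decision threshold."""
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical labels (1 = fraud) and predicted fraud probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.15, 0.10, 0.05]

for threshold in (0.25, 0.50, 0.75):
    p, r, f1 = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 moves precision up and recall down, which is the trade-off described above.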
Table 7 shows the results of the classifiers' performance. Note that the decision tree classifier had the highest precision score (1.0), followed by the random forest model (.96). In contrast, the logistic regression and gradient descent classifiers had the highest recall scores (.96). The random forest (.87), followed by logistic regression (.85), had the highest F1-scores. The gradient descent and decision tree classifiers had the lowest F1-scores (both .84). These results indicate that the random forest is the best-performing classifier overall because the model achieved high scores across all three measures. Contextually, these results can be interpreted to mean that the random forest classifier correctly identifies a high proportion of positives and negatives and achieves a high degree of overall accuracy. The logistic regression and gradient descent classifiers also performed well, achieving high scores in recall and F1. However, they did not achieve the same high precision score as the random forest classifier. These results suggest that the logistic regression and gradient descent classifiers may misclassify more legitimate transactions as fraudulent than the random forest classifier.
As shown in Table 7, the decision tree classifier had the highest precision score, which means that the transactions it flagged as fraudulent were almost always truly fraudulent. Note from Figure 5 that the type of transfer was the feature selected to split the tree. If the amount laundered is ≤ 0.5, the tree splits at the amount that was cashed out, and if the amount is > 0.5, the tree splits at the amount the recipient had after the laundered transaction (newbalanceDest). The 'type of transfer' is an important feature in classifying a money laundering transaction because it can show how sophisticated or structured the laundering process is. For example, a cash-out is a transfer with an influx of money into one account and then an immediate withdrawal of those funds. This type of laundering is generally associated with low-level or first-time offenders.
In contrast, a structured deposit is when funds are gradually deposited into an account over time before being withdrawn. This type of laundering is generally associated with more knowledgeable or experienced offenders with access to multiple accounts. The decision tree correctly classified 84% of all cash-outs as money laundering transactions and 100% of all structured deposits as money laundering transactions. These results indicate that the decision tree is an effective classifier for identifying money laundering activities.
Although the accuracy scores for both cash-outs and structured deposits are high, the decision tree could be improved further by adding more information, such as the customer's location or account history.

The ROC Curve
The ROC curve is a commonly used tool for evaluating the performance of classification models, particularly when dealing with imbalanced datasets 49. As seen in Figure 6, the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold values and provides a visual representation of the model's accuracy; the area under the curve (AUC) summarizes it in a single score. The decision tree model was the best-performing classifier (AUC of .95), followed by the logistic regression model (.93). The random forest and gradient descent classifiers had the lowest AUC score of .92. These findings suggest that, while all four models are effective at predicting laundered transactions, the decision tree model is the most accurate on this measure. Furthermore, the logistic regression model is a close second, making it a good choice for businesses that require a more interpretable model. Finally, the random forest and gradient descent models are still effective predictors of fraud, but they are not as accurate as the other two models on this measure.

Feature Importance
The features with the lowest p-values in the baseline logistic regression model were amount, oldbalanceDest, newbalanceDest, cash-out, and transfer. These features were compared with the features from the ensemble classifiers to examine which ones contributed more to predicting laundered transactions. As shown in Figure 7, the amount involved in the transfer was again the top feature to predict suspicious transactions in mobile money transfers. These findings suggest that the amount of a transaction is a good predictor of whether a transaction is suspicious or not. The other features, with p-values > 0.05, did not contribute much to predicting suspicious transactions. Altogether, mobile money transfer providers need to be aware of transfers involving large amounts of money and those made frequently or without a specific destination 30. By considering these factors, mobile money providers can detect and prevent suspicious behaviour on their platforms.
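To make the ROC analysis in this section more concrete, the following sketch builds the (FPR, TPR) points by sweeping the threshold over every distinct score and then estimates the AUC with the trapezoidal rule. The labels and scores are hypothetical, the function names are illustrative, and the code assumes both classes are present in the labels.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold score by score.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical labels (1 = fraud) and predicted fraud probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.15, 0.10, 0.05]

print(round(auc(roc_points(y_true, scores)), 3))
```

An AUC of 1.0 corresponds to a classifier that ranks every fraudulent transaction above every legitimate one, while 0.5 corresponds to random ranking.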

Discussion and Conclusion
The popularity of mobile money transfer services has grown rapidly in recent years, driven by the increasing penetration of mobile devices and the widespread adoption of mobile banking services. However, using mobile money transfer services has also created new opportunities for criminals to launder money. To provide more insights into this problem, this study employs ML classifiers to predict mobile money transfer laundering transactions 32,40. While all the classifiers were useful in predicting suspicious transactions, the random forest model was the most consistent and best-performing model among them 41. The random forest model is a powerful ML technique for fraud detection because it can effectively learn from data with many features and can be tuned to achieve high precision (.96), recall (.80), and F1-score (.87). The findings from this study provide useful intelligence for mobile money service providers and law enforcement agencies in the fight against money laundering.
Given the current global landscape, it is not surprising that the use of mobile money has increased dramatically in the past few years 53. For consumers and businesses alike, the convenience and flexibility of mobile wallets have made them a popular choice 16. However, the rise of mobile money has also attracted the attention of criminals looking for new ways to launder their ill-gotten gains 1,18,32. In many cases, mobile money services are being used to facilitate money laundering by allowing criminals to quickly and easily move large amounts of cash without raising suspicion. For example, recent studies found that a significant proportion of mobile money users have been involved in a transaction that could be considered suspicious 8,14,44,45. Considering these findings, more must be done to prevent mobile money from being used for illegal purposes. While mobile money has brought many benefits to the global economy, it is important to remember that those with criminal intent can also exploit its services.
Financial institutions have long been under pressure to prevent money laundering and terrorist financing by implementing stringent compliance measures 12,53. The mobile money ecosystem has only added to this pressure, as financial institutions must now also contend with the challenges posed by digital transactions. In response, there has been a shift in emphasis from traditional, deterministic rules-based methodologies toward more sophisticated computational techniques. This shift is primarily because the sheer volume of transaction data makes it difficult to flag and detect suspicious activities using rules-based approaches. Computational techniques in the form of ML offer a more effective way to monitor suspicious behaviour, as they can consider a broader range of factors and larger datasets. Adopting ML techniques for money laundering detection is a promising development in the fight against illicit activities. These techniques could make it much easier to catch people trying to launder money through criminal networks. They could also help mobile money service providers and law enforcement agencies stay one step ahead of individuals trying to launder illicit funds.

Limitations and Future Research
While ML is a valuable tool for fraud detection, several limitations must be considered when conducting research in this area. First, the data sets used to train the algorithms may not represent the population of interest, leading to inaccurate predictions. Second, ML models can inherit bias when they build on previous studies that used flawed methodology. Third, it is important to remember that ML is only one tool for detecting fraud; other methods, such as human intelligence and expert analysis, may be more effective in certain situations. Despite these limitations, ML is a powerful tool with a significant impact on fraud research. With continued advances in this area, even more progress will likely be made in using ML algorithms to fight fraud. Future research can employ ML to identify behaviour patterns that may indicate fraudulent activity, while human intelligence can provide context and insights that an automated system may miss. By combining these two approaches, future research can significantly improve the fight against fraud.

Figure 3: Types of Transfer Analytical Results

Figure 6: Results from the ROC Curve

Table 1: Independent Features and Description

Table 2: Performance Metrics and Formulae

Table 3: Summary Statistics

Table 4: Odds of Logistic Regression

Table 5: Odds of the Features Occurring

Table 6: Algorithms Performance

Table 7: Algorithms Classification Metrics