T-SNERF: A novel high-accuracy machine learning approach for Intrusion Detection Systems

In the last few decades, Intrusion Detection Systems (IDSs), in particular machine learning-based anomaly detection, have gained importance over Signature Detection Systems (SDSs) for detecting novel attacks. Herein, a novel approach called T-Distributed Stochastic Neighbour Embedding and Random Forest Algorithm (T-SNERF) is presented for the classification of cyber-attacks. The approach consists of three steps. First, feature correlations are examined. Second, the T-Distributed Stochastic Neighbour Embedding (T-SNE) dimensionality reduction technique is applied. Third, the Random Forest (RF) technique is used for classification, and the resulting accuracy and False-Positive Rate (FPR) are evaluated. The proposed approach has been tested on several well-known datasets, namely UNSW-NB15, CICIDS-2017, and Phishing. It achieved significant results compared with existing approaches: 100% accuracy and 0% FPR on the UNSW-NB15 dataset, and accuracy of up to 99.7878% and 99.7044% on the CICIDS-2017 and Phishing datasets, respectively.


| INTRODUCTION
Network security is gaining importance as computer networks need protection against attacks. Many organisations use traditional security tools such as firewalls, anti-spam techniques, and antivirus software. However, these tools cannot identify more complex attacks. Network Intrusion Detection Systems (NIDSs) emerged as a second line of defence that tracks network activity and detects disruptive events. NIDSs are now seen as effective defensive mechanisms that can defend against most disruptive threats and attacks [1].
This research aimed to study network anomaly detection systems using multiple machine learning techniques. The two main problems with existing state-of-the-art techniques are reducing false-negative and false-positive rates, and processing traffic in real time to detect network intrusion. Some studies focussed on rule-based identification of complex network attacks, where pre-designed processing rules are used for attack detection [2]; however, rule-based expert approaches are less effective for very large datasets. In addition, a recent post-Internet network architecture called the wireless sensor network (WSN) is increasingly utilised. Its addition to conventional and mobile networks, together with the diversification of intrusion methods available to malicious agents, complicates the technical requirements of intrusion detection [3]. WSNs are more easily exposed to security attacks than wired networks because of their distributed deployment, multi-hop communication, and limited bandwidth and battery power. Therefore, it is very important to design an efficient IDS for WSNs. New attacks (e.g., black hole, sink hole, abnormal transmission, and packet dropping attacks) have developed in this new environment. By mechanism, attack detection is classified into traditional signature-based detection, behaviour-based anomaly detection, and hybrid detection. WSN protection mostly relies on behaviour-based anomaly detection [4].
However, existing intrusion detection is inadequate for protection against every attack, for several reasons. First, its learning capacity is limited: it is based on summarising features from raw data and then transforming them into vectors that serve as inputs to the classifier, so when the complexity of the network structure increases, learning performance declines. Second, an intrusion detection method that uses only one or two levels of information is not enough to recognise additional attack types. Third, intrusion traffic in real network datasets resembles normal traffic, which limits the ability of classifiers to tell them apart. Fourth, the unpredictability of intrusion behaviour causes costly mistakes in IDSs. Therefore, finding an efficient intrusion detection method has become a necessity [5]. Finally, traditional approaches to intrusion detection become impractical because of the large-scale, high-dimensional data generated by the variety of network types [6].
To determine the aspects of the analysis, a part of the UNSW-NB15 dataset is split into training and testing sets, and the complexity of these sets is evaluated from three aspects. First, the Kolmogorov-Smirnov test [7] is applied to define and compare the distributions of the training and testing sets; the asymmetry of the features is measured using skewness [8]; and the flatness of the features is estimated using kurtosis [8]. If these statistics are approximately similar for the features of the training and testing sets, the results can be considered reliable. Second, feature correlations are calculated from two perspectives: (1) Pearson's Correlation Coefficient (PCC) [9], and (2) the Gain Ratio (GR) method [10]. Third, to measure accuracy and False Alarm Rates (FARs), five techniques, namely Naïve Bayes (NB) [11], Decision Tree (DT) [12], Artificial Neural Network (ANN) [12,13], Logistic Regression (LR) [13], and Expectation-Maximisation (EM) Clustering [14], are applied to the training and testing sets. Furthermore, the results on the UNSW-NB15 dataset are compared with those on the KDD99 dataset [15] to assess the capability of UNSW-NB15 for evaluating old and recent classifiers.
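As an illustration only, the distribution checks named above (skewness, kurtosis, the two-sample Kolmogorov-Smirnov statistic, and Pearson's correlation) can be sketched in plain Python. The function names and the toy samples are assumptions made for this example; they are not taken from the cited analysis.

```python
import math
from bisect import bisect_right

def mean(xs):
    return sum(xs) / len(xs)

def central_moment(xs, order):
    m = mean(xs)
    return sum((x - m) ** order for x in xs) / len(xs)

def skewness(xs):
    # Asymmetry: third central moment scaled by the cubed standard deviation.
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    # Flatness: fourth central moment scaled by the squared variance
    # (a normal distribution gives 3).
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

def ks_statistic(a, b):
    # Two-sample KS statistic: largest gap between the two empirical CDFs.
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def pearson(xs, ys):
    # Pearson's Correlation Coefficient between two equally long samples.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values of one feature in the training and testing sets.
train_feature = [1.0, 2.0, 2.0, 3.0, 10.0]
test_feature = [1.0, 2.0, 3.0, 3.0, 9.0]
print("skewness:", skewness(train_feature), skewness(test_feature))
print("kurtosis:", kurtosis(train_feature), kurtosis(test_feature))
print("KS statistic:", ks_statistic(train_feature, test_feature))
```

A small KS statistic together with similar skewness and kurtosis values would suggest that the two sets follow comparable distributions for that feature.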

| IDS and Limitations of traditional antivirus
The commonly used signature-based firewalls and antivirus products are reactive and cannot recognise attacks seen for the first time. A signature, usually a hash, is generated manually by malware analysts to detect a specific piece of malware and is saved in the malware database. During any new scan, the antivirus software consults this signature database. In the beginning, this method was effective because attacks (viruses, trojans, and worms) were simple, but with the appearance of automated malware polymorphism and obfuscation it has become inadequate. Zero-day attacks (malicious files targeting previously undisclosed vulnerabilities) cannot be recognised by signature-based detection. Developing alternatives to supplement traditional signature-based detection has therefore become a necessity for creating a robust antivirus product.
Intrusion detection is the active process of detecting unauthorized activity in a system or network. An IDS can be hardware, software, or a combination of both that monitors a system or network of systems for suspicious activity, with the aim of catching perpetrators in the act before resources are damaged. An IDS thus helps to protect the system from attacks. The main responsibilities of an IDS are monitoring network activity, auditing network and system configurations for vulnerabilities, and analysing data. It is an indispensable component of the security toolbox and provides three functions: monitoring hosts, detecting behaviours, and generating alerts. An IDS is sometimes regarded as firewall functionality, although the two differ. A firewall safeguards the information flow and prevents intrusions, whereas an IDS detects whether the network is under attack and determines whether the firewall's security has worked well. Combining firewalls and IDSs improves the security of the network.
The act of ensuring that no malicious activity occurs in a network system is called 'network security'. It deals with different types of network attacks, such as spoofing and denial of service, as well as with intrusion detection. It is also called 'communication security' and also covers the protection of information in transmission [16]. Several tools are available for network security, such as Wireshark [17], Snort [18], and TippingPoint [19]. It is highly preferable that hosts participating in a computer network also have host security, and that the applications running on them are equipped with application security.
Most existing security systems require a network administrator to monitor one or more of the tools listed above. In case of a security breach, the network administrator takes the required measures with the help of other analytical tools. Rule-based network security tools include Snort, TippingPoint, and their variations. These tools come with built-in security rules. Adding more rules is the responsibility of the network administrator, although manufacturers can also supply new rules via updates.
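The weakness of hash signatures can be made concrete with a minimal sketch. It assumes a signature database that stores SHA-256 digests of known samples; the payload bytes and the names `signature_db` and `is_flagged` are hypothetical and do not describe any particular antivirus product.

```python
import hashlib

def sha256_hex(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()

# Hypothetical signature database built from one known malicious sample.
known_malware = b"MALICIOUS-PAYLOAD-v1"
signature_db = {sha256_hex(known_malware)}

def is_flagged(payload: bytes) -> bool:
    # Signature-based detection: exact match against stored hashes.
    return sha256_hex(payload) in signature_db

# A trivially modified variant (one extra byte) stands in for a
# polymorphic or obfuscated copy of the same malware.
variant = known_malware + b"\x00"

print(is_flagged(known_malware))  # the stored sample matches its own hash
print(is_flagged(variant))        # the variant has a different hash
```

Because any change to the file changes its hash, the variant passes unnoticed until an analyst adds a new signature, which is the reactive behaviour described above.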

| Machine learning-based IDSs
Machine learning is a data-dependent process in which the first step is to understand the data. This part introduces several ways of applying machine learning to IDSs using multiple datasets. These datasets cover different types of attacks and include both host and network behaviours. System logs reflect the behaviour of the host, and network traffic reflects the behaviour of the network. There are several types of attacks, each with a particular pattern. Thus, it is important to select data sources that suit the characteristics of the attack to be detected. For example, sending many packets in a very short time is one of the key features of a Denial of Service (DoS) attack; flow data are therefore convenient for detecting DoS attacks.

| Flow-based attack detection
The most common source of data for IDSs is flow data, that is, packets grouped over a period of time. Flow-based attack detection has two advantages: (1) Flows reflect the entire network environment and allow the detection of most attacks, in particular DoS and Probe. (2) Flows are easy to pre-process, with no need to parse packets or restructure sessions. However, flows ignore packet content, which is why flow-based detection gives unsatisfactory results for User-to-Root (U2R) and Remote-to-Local (R2L) attacks. When flow features are extracted, packets must be cached; thus, extraction involves some hysteresis. Flow-based attack detection includes feature engineering and deep learning methods. However, strongly heterogeneous flows can weaken detection. Usually, traffic grouping is used to solve this issue.
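The idea of grouping packets into flows can be sketched as follows. The packet tuple layout, the one-second window, and the packet-count threshold are assumptions chosen for this illustration; real flow exporters use richer keys and timeouts.

```python
from collections import defaultdict

def aggregate_flows(packets, window=1.0):
    """Group packets by (src, dst, dport, proto) within fixed time windows."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for ts, src, dst, dport, proto, size in packets:
        key = (src, dst, dport, proto, int(ts // window))
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)

def suspicious_flows(flows, max_packets_per_window=100):
    # A very high packet count in one window is a simple DoS-like indicator.
    return [key for key, stats in flows.items()
            if stats["packets"] > max_packets_per_window]

# Hypothetical traffic: 500 small packets from one source within half a
# second, plus a few ordinary packets from another host.
packets = [(i / 1000, "10.0.0.9", "10.0.0.1", 80, "tcp", 60) for i in range(500)]
packets += [(0.2, "10.0.0.5", "10.0.0.1", 443, "tcp", 1200),
            (0.7, "10.0.0.5", "10.0.0.1", 443, "tcp", 900),
            (1.4, "10.0.0.5", "10.0.0.1", 443, "tcp", 1500)]

flows = aggregate_flows(packets)
flagged = suspicious_flows(flows)
print(flagged)
```

Note that the sketch never inspects packet payloads, which mirrors the limitation mentioned above: attacks that differ from normal traffic only in content are invisible at the flow level.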

| Contribution to the field
The contribution herein is to develop a high-accuracy machine learning model for intrusion detection using the latest network traffic datasets, to implement different machine learning techniques that detect the most advanced and rare network attacks, and to apply a dimensionality reduction technique that increases the classification accuracy for individual network attacks. The proposed approach uses only the relevant features in the datasets while maintaining low dimensionality, which in turn can reduce the training time. The work also studies network anomaly detection systems using multiple machine learning techniques, summarised by the following steps: selection of the most important subset of features using Correlation-based Feature Selection (CFS), dimensionality reduction using the T-SNE algorithm, and finally classification using Random Forest. Weka [20] and the R programming language are used for this work.

| Paper structure
Herein, Section 2 presents related work; Section 3 provides a concise overview of the three datasets used; Section 4 describes the methodology background; Section 5 presents an overview of the proposed process and describes how the algorithm works; Section 6 covers the evaluation criteria, the experimental results, and their discussion; Section 7 lists some of the challenges of using machine learning algorithms in IDSs; Section 8 concludes the work and presents future work.

| RELATED WORK
Our work addresses the accuracy of current intrusion detection techniques and proposes a novel technique that uses multiple machine learning techniques to enhance accuracy and the detection of rare attacks, while minimising resource consumption by reducing the feature set and applying a dimensionality reduction technique to shorten the training and testing time of the machine learning model. The proposed technique achieves higher detection rates than the existing models reviewed in this section.
In [21], a fusion model that integrates rank-based chi-square feature selection with a multi-class SVM optimised by kernel scale achieved an accuracy of 97.44% on the NSL-KDD dataset. Another method, a fusion of PCA and an optimised SVM, was proposed in [22]; it obtained an accuracy of 99.78% on the KDDCup99 dataset.
Multiple machine learning techniques are used herein to classify the UNSW-NB15 dataset [42]. The UNSW-NB15 dataset, developed using the IXIA PerfectStorm tool, is a robust network dataset that represents recent network traffic scenarios and several low-footprint intrusions [43]. Recent studies have shown that traditional datasets should be replaced by new benchmark datasets, because the older ones no longer represent present-day network traffic [43,44]. These older datasets include KDD98 [45], KDDCUP99 [15], and NSL-KDD [46]; hence, the UNSW-NB15, CICIDS-2017 [47], and Phishing datasets are used herein.
Deep learning is a branch of machine learning that has recently been used to detect network intrusion. In previous IDS research, many deep learning algorithms were used for unsupervised feature learning, such as Deep Belief Networks (DBNs), restricted Boltzmann machines (RBMs), auto-encoders, and deep neural networks (DNNs). For example, Erfani et al. [48] proposed combining DBNs with a linear one-class SVM for intrusion detection and applied it to multiple benchmark datasets. Likewise, Fiore et al. [49] introduced a discriminative RBM (DRBM) method that learns compressed features from a specific set of features that do not exist in the packet payloads; the compressed features are then fed to a soft-max classifier to classify behaviours. Javaid et al. [50] introduced a DNN-based deep learning method for anomaly detection. The results showed the efficiency of a deep learning model for detecting flow-based anomalies in software-defined networks (SDNs). For the NSL-KDD dataset, Tang et al. [51] proposed a deep learning model that uses Self-Taught Learning (STL) to build a network IDS. The results demonstrated that deep learning outperforms past research studies in performance and accuracy. Wang [51] introduced a deep learning method based on a stacked autoencoder that detects network traffic from raw data; it achieved remarkably high performance. Yin et al. [52] proposed a deep learning method built on recurrent neural networks (RNNs) for intrusion detection and applied it to the NSL-KDD dataset. It showed that deep learning methods are more efficient than traditional machine learning classification algorithms for IDSs. Alrawashdeh and Purdy [53] proposed a deep learning method with four hidden layers, based on RBM and DBN, to reduce the number of features. The DBN weights are updated in a fine-tuning phase, and Logistic Regression is used for classification. Applied to the KDD99 dataset, the model reaches an accuracy of 97.9% and a false alarm rate of 0.5%. This accuracy is insufficient to build a robust model for detecting network intrusion [54].
A deep learning approach based on a non-symmetric deep auto-encoder (NDAE) is presented by Shone et al. [55] for intrusion detection. It was applied to the KDD99 dataset, using RF for classification, and achieved an accuracy of 97.85%. However, this method is inefficient for detecting complicated attacks because of its high false alarm rate of 2.15%. More recently, Nguyen et al. [56] proposed a model using PCA and a Gaussian-binary restricted Boltzmann machine (GRBM) for the detection of cyberattacks in a mobile cloud environment. However, the unclear testing process of this method does not allow comparative benchmarking.
Few works study the application of classification techniques to UNSW-NB15. Moustafa and Slay [28] presented a statistical analysis of the observations and attributes in UNSW-NB15 [42], using five different classifiers to calculate accuracy and FARs. Moustafa et al. [57] proposed an AdaBoost ensemble technique for intrusion detection that uses the DT, NB, and ANN machine learning techniques on the UNSW-NB15 and NIMS botnet datasets. The ensemble technique provides a high accuracy of 99.54% and a low false-positive rate of 1.38%. Machine learning techniques are also applied to UNSW-NB15 using flow identifiers [42] for the efficient detection of botnets and their tracks.
The UNSW-NB15 dataset is used for NIDSs by Mogal et al. [58]. In that work, Central Points of attribute values and the a priori algorithm are used for pre-processing, and NB and logistic regression are used as machine learning classifiers. The results improve after pre-processing. The research of Moustafa and Slay [28] concentrated on classifying the different types of attacks captured in the UNSW-NB15 dataset [42]. It focussed on identifying the important features in the UNSW-NB15 dataset by using multiple machine learning techniques such as NB, EM, and association rule mining. However, the accuracy of these techniques was not high for the rare attacks (e.g., 20% for BackDoor). Cannady [59,60] worked on network classification and showed that neural networks are suitable solutions for a specific problem when they are trained on selected subsets of the training dataset. Since the model cannot learn from continuous data, the system must be taken off-line whenever the model needs to be retrained on an updated set of selected data.
A classification method is proposed by Hansman and Hunt [61]. It comprises four distinct dimensions. Their classification scheme as a whole covers different types of breaches and aids protection by maintaining clarity in the language that defines the various types of attacks. The scheme is strengthened by describing the distinct attack types in detail. The first dimension helps the administrator to categorise the breach, the second describes the target of the breach, and the third describes the mechanism, reflecting the various stages of vulnerability. The final dimension defines the possible impacts obtained prior to the final act.
Mayhew et al. [62] proposed a packet detection system based on SVM and K-means. They collected packets from a real business network and parsed them with Bro. First, the packets were grouped by protocol type. Then, the data in each protocol dataset were clustered with the K-means++ algorithm, so that the original dataset was grouped into several clusters in which the data are homogeneous. Next, features were extracted from the packets, and an SVM model was trained on each cluster. The detection accuracies for E-mail, Wiki, TCP, Twitter, and HTTP were 93%, 99%, 92.9%, 96%, and 99.6%, respectively.
Goeschel et al. [63] proposed a hybrid algorithm that combines the SVM, decision tree, and NB algorithms. First, an SVM model divides the data into normal and abnormal samples. A decision tree model then determines the attack type of the abnormal samples: known attacks are identified by the decision tree, whereas unknown attack types are identified by the NB algorithm. Using these three algorithms, the hybrid technique achieved a detection rate of 99.62% and an FPR of 1.57% on the KDD99 dataset.
A classification method based on spectral clustering and DNN is introduced by Ma et al. [64]. Heterogeneous flows cause low accuracy; hence, the original dataset was first divided into six homogeneous subsets. Then, a DNN was trained separately on each subset. The accuracy obtained on the NSL-KDD dataset is 92.1%.
Processing raw data directly in deep learning methods allows features to be learned and classification to be performed at the same time. Potluri et al. [65] proposed a CNN-based detection method. Experiments were conducted on the UNSW-NB15 and NSL-KDD datasets, in which the data take the form of feature vectors. Because Convolutional Neural Networks (CNNs) are well suited to 2-dimensional (2D) data, the feature vectors were converted into images. Nominal features were one-hot encoded, which increased the feature dimensionality from 41 to 464. Each pixel was then represented by 8 bytes, and zeros were used as padding for blank pixels, so that each feature vector was transformed into an image of 8*8 pixels. Finally, a three-layer CNN was constructed to classify the attacks. The proposed CNN outperformed other DNNs (GoogLeNet and ResNet 50), with an accuracy of 94.9% on UNSW-NB15 and 91.14% on NSL-KDD.
The literature reviewed shows a clear lack of very high accuracy machine learning models that detect rare attacks using only the relevant features in the datasets while maintaining low dimensionality, which in turn can reduce the training time. We therefore see a need to combine CFS and dimensionality reduction techniques with a powerful RF classifier to build our model.

| BENCHMARK DATASETS IN IDS
Machine learning extracts useful information from data; hence, its success relies on the quality of the input data, and the methodology is centred on understanding the data. For IDSs, network and host behaviours should be captured correctly and the data should be accessible. The data sources used in IDSs include packets, sessions, flows, and logs. Creating a dataset is a difficult and time-consuming process, but once a benchmark dataset is created it can be re-used repeatedly by many researchers. In addition to convenience, the use of benchmark datasets offers two other benefits. (1) Benchmark datasets are authoritative, which makes the findings of studies more compelling. (2) Many publications have used common benchmark datasets, which allows the results of new studies to be compared with those of previous studies. The benchmark datasets used herein to implement various machine learning algorithms are UNSW-NB15, CICIDS-2017, and Phishing.

| UNSW-NB15
Three virtual machines were configured to capture the network traffic, from which 47 features and two class labels were extracted. The environment was set up at the University of New South Wales [43]. UNSW-NB15 was created recently to address two issues with older benchmark datasets such as NSL-KDD and KDD99: (1) they contain fewer, traditional types of attacks, and (2) they lack normal traffic scenarios. As a result, UNSW-NB15 is more complex than those datasets. It is composed of nine modern attack types and new normal traffic patterns, with 49 attributes in total (the 47 features and two class labels) that cover the data flow between hosts and the inspection of network packets, in order to distinguish between normal and attack observations.

| Attack types
There are nine types of attacks captured in the UNSW-NB15 dataset [43], described by Moustafa and Slay [28,66] as follows:
1. Fuzzers: An attack that uses large quantities of random data, called 'fuzz', to cause a network outage or to crash servers across the network.
2. Analysis: This class includes attacks formed from spam files, footprinting, vulnerability scans, and port scans. It is often referred to as active reconnaissance, in which the network is scanned without being exploited.
3. Backdoors: This family uses a technique by which attackers use a legitimate portal of the system to gain unlawful access. Malicious software is used as part of an exploit to install itself on a device and give cyber-attackers remote access.
4. Denial of service (DoS): A popular cyber-attack in which the attacker tries to flood a computer with many unauthorized communication requests to make the network resources temporarily or permanently inaccessible to their intended users. These attacks can be hard to differentiate from legitimate network activity; however, some indicators exist for detecting such ongoing disruptive activities.
5. Exploits: In general, exploit attacks target known vulnerabilities in operating systems. Exploit tools are used to automate such attacks once a possible weakness in a network is discovered.
6. Generic: A cipher-based attack; it is a type of collision attack on a generated secret key. This type of attack mainly targets message authentication, block ciphers, and stream ciphers. It relies on the greater likelihood of collisions between attempted random attacks.
7. Reconnaissance: Details about a public network or target host are gathered and then used, through manipulation techniques, against the targeted networks or individual hosts. This class uses freely available public information such as the 'Whois' service, Shodan, and ARIN records. Searches on social media also help in this type of attack. It can be called passive reconnaissance.
8. Shellcode: It can be considered a sub-type of exploit attacks. This attack uses a tiny piece of code as the payload of an exploit. To gain remote access to a device, malicious code is injected into an active application. The attacker can then control the compromised machine through a command shell.
9. Worms: A 'worm' is a malicious attack that spreads by propagating through the network, so a large network can be infected quickly. When a worm infects computers, it transforms them into zombies or bots for use in distributed attacks.
Table 1 shows the two types of sets in the UNSW-NB15 dataset, namely the training and testing sets; the records have been split into training and testing sets with an approximate ratio of 60%:40%, respectively.
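A split with roughly this 60%:40% ratio can be sketched as below. The source does not state how the records were assigned to each set, so the per-class (stratified) shuffling, the seed, and the toy records are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(records, label_of, train_fraction=0.6, seed=0):
    """Split records so each class keeps about the same train/test ratio."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical records: (record id, label).
records = [(i, "normal") for i in range(80)] + [(i, "attack") for i in range(80, 100)]
train, test = stratified_split(records, label_of=lambda r: r[1])
print(len(train), len(test))
```

Splitting per class keeps rare attack categories represented in both sets, which matters when the goal is to measure detection of rare attacks.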

| CICIDS-2017
The CICIDS-2017 dataset was developed by the Canadian Institute for Cybersecurity. It contains 5 days of network traffic activity. CICIDS-2017 covers seven types of network attacks: Infiltration, Brute Force, DoS, DDoS, Botnet, Heartbleed, and Web attacks. Table 2 shows the types of sets in the CICIDS-2017 dataset. The dataset has 79 features and one label and is recorded in CSV format, so it is easy to use with machine learning algorithms [47].

| Phishing
There are 10 features in the Phishing dataset, mainly related to transitional payment systems such as online transactions, electronic payments, and e-commerce [67]. Other attributes are associated with phishing and trusted websites from many website sources. The Phishing dataset gathers 1353 different websites, of which 805 are identified as phishing attacks and the others as legitimate websites.

| METHODOLOGY BACKGROUND
The method consists of five parts. The first is data collection from the network; during this phase, network data were taken from the benchmark UNSW-NB15 dataset. The second stage is attribute selection, in which the CFS technique is applied for feature reduction and ranking of the attributes for classification purposes. The third stage is data dimensionality reduction, in which the T-SNE algorithm reduces the dimensionality of the data by representing high-dimensional data in two dimensions, which also allows visualization using scatter plots. The fourth step is classification, in which the Random Forest (RF) technique is used to construct the classification model. The last step is evaluation: accuracy, recall, precision, F-measure, and FPR are used to evaluate the efficiency of the detection models. Once the model has been trained, these measures are reported.
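The evaluation measures listed in the last step follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from this work.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary detection measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "fpr": fpr}

# Hypothetical counts: 90 attacks detected, 10 missed,
# 95 normal records passed, 5 normal records flagged by mistake.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting FPR alongside accuracy matters for IDSs because a model can reach high accuracy on imbalanced traffic while still raising many false alarms.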

| Feature selection
Machine learning processing can run faster and more accurately when only specific relevant features are used [68]. Many feature selection methods and algorithms have been created, such as the Gini index, Information Gain, correlation coefficients, and uncertainty [69]. One of the fastest feature selection algorithms is CFS, which uses a heuristic evaluation function [70] to rank features by correlation. CFS is a scheme-independent attribute subset evaluator that takes into account the predictive value of every attribute and the degree of inter-redundancy. The two main criteria when selecting the subset of attributes are (a) a strong relation to the class attribute and (b) no strong correlation with the other selected attributes. Attributes that are highly correlated with each other are not selected, to reduce redundancy. Irrelevant attributes are also not selected because they have no impact on the class attribute. The evaluation function of CFS is [70]:

$$\mathrm{Merit}_{S_k} = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

where $\mathrm{Merit}_{S_k}$ is the heuristic merit of a feature subset $S$ that contains $k$ features, $\overline{r_{cf}}$ is the mean feature-class correlation ($f \in S$), and $\overline{r_{ff}}$ is the average feature-feature inter-correlation. The CFS criterion is defined as follows:

$$\mathrm{CFS} = \max_{S_k}\left[\frac{r_{cf_1} + r_{cf_2} + \cdots + r_{cf_k}}{\sqrt{k + 2\left(r_{f_1 f_2} + \cdots + r_{f_i f_j} + \cdots + r_{f_k f_{k-1}}\right)}}\right]$$

where the variables $r_{cf_i}$ and $r_{f_i f_j}$ represent the correlations.
Herein, the aim is to use machine learning algorithms and a feature selection method to analyse the features included in all three datasets, with the main target of increasing the accuracy of the IDS in any network system. This is why duplicated and irrelevant features are discarded from each dataset, which gives much faster training and testing of the model with lower resource utilization and, at the same time, higher attack detection rates.
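The CFS merit described above, in its standard form k·r_cf / sqrt(k + k(k−1)·r_ff) from [70], can be evaluated with a few lines of Python. The sketch below pairs it with a simple greedy forward search, which is an assumption made for brevity (other search strategies can be used with the same merit); the feature names and correlation values are hypothetical and are given as non-negative magnitudes.

```python
import math

def cfs_merit(feature_class_corr, feature_feature_corr, subset):
    """Merit = k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr[f] for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(feature_feature_corr[frozenset(p)] for p in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(feature_class_corr, feature_feature_corr):
    """Add the feature that raises the merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = set(feature_class_corr)
    while remaining:
        candidate, merit = max(
            ((f, cfs_merit(feature_class_corr, feature_feature_corr, selected + [f]))
             for f in sorted(remaining)),
            key=lambda item: item[1])
        if merit <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = merit
    return selected, best

# Hypothetical correlations: f1 and f2 are both predictive but nearly
# redundant with each other; f3 is weaker but independent.
feature_class = {"f1": 0.8, "f2": 0.6, "f3": 0.5}
feature_feature = {
    frozenset(("f1", "f2")): 0.95,
    frozenset(("f1", "f3")): 0.10,
    frozenset(("f2", "f3")): 0.10,
}
selected, merit = greedy_cfs(feature_class, feature_feature)
print(selected, round(merit, 4))
```

In this toy case the redundant feature f2 is left out even though it correlates more strongly with the class than f3, which illustrates criterion (b) above.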

| T-SNE
The T-SNE algorithm was developed by Laurens van der Maaten and Geoffrey Hinton [71]. It is a nonlinear dimensionality reduction algorithm, popular for representing high-dimensional data in two or three dimensions to allow visualization using scatter plots.
Visual representation is one of the main components of data analysis, as it enables hypotheses and intuitions to be formed about the processes producing the data. Visual analytics builds methods for achieving such an understanding from complex data and seeks to give analysts ways to examine the underlying mechanisms of the data [72]. Many non-parametric dimensionality reduction algorithms are used to visualise datasets, such as classical scaling [73], which is closely related to Sammon mapping [74], PCA [75,76], Locally Linear Embedding [77], and Isomap [77].
T-SNE is one of the most commonly used dimensionality reduction methods for data visualization [78]. T-SNE is much easier to optimise and provides substantially improved visualizations by reducing the tendency to crowd points together in the centre of the map. T-SNE is the best option when creating a single map that reveals structure at many scales [71].
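The reduced crowding comes from the heavy-tailed Student-t kernel that T-SNE uses for similarities in the low-dimensional map [71]. The sketch below shows only that ingredient, namely the map affinities q_ij = (1 + ||y_i − y_j||²)⁻¹ normalised over all pairs, compared with a Gaussian kernel; it is not a full T-SNE implementation, and the example points are hypothetical.

```python
import math

def student_t_kernel(distance):
    # Heavy-tailed kernel used for map similarities in T-SNE.
    return 1.0 / (1.0 + distance * distance)

def gaussian_kernel(distance):
    # Light-tailed kernel, shown for comparison.
    return math.exp(-distance * distance)

def low_dim_affinities(points):
    """q_ij = (1 + ||y_i - y_j||^2)^-1, normalised over all pairs i != j."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = 1.0 / (1.0 + d2)
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]

# Hypothetical 2D map positions: two close points and one distant point.
embedding = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
q = low_dim_affinities(embedding)

for d in (0.5, 1.0, 3.0):
    print(d, student_t_kernel(d), gaussian_kernel(d))
```

At larger distances the Student-t kernel decays far more slowly than the Gaussian, so moderately dissimilar points can sit further apart in the map without a large penalty, which leaves room in the centre.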

| Random forest
The random decision forest is an ensemble machine learning method that builds many decision trees during training and outputs the class chosen by the individual trees (classification) or their combined prediction (regression). Random decision forests correct the tendency of decision trees to overfit their training set.
T-SNE and random forest are combined to obtain a good model for classifying the UNSW-NB15, CICIDS-2017, and phishing datasets. The next section describes our approach and presents the algorithm and its detailed diagram.
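The ensemble idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Weka implementation used in this work: each "tree" is a one-split stump, and the data, function names, and parameter values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(n_features), max(1, int(n_features ** 0.5)))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical two-feature records labelled normal or attack.
X = [(0.1, 1.0), (0.2, 1.2), (0.3, 0.9), (0.9, 3.0), (1.0, 3.2), (1.1, 2.9)]
y = ["normal"] * 3 + ["attack"] * 3
forest = fit_forest(X, y)
```

Because every stump sees a different bootstrap sample and feature subset, individual mistakes tend to be outvoted, which is the mechanism by which the ensemble reduces overfitting.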

| THE PROPOSED APPROACH OF T-SNERF
Many classification models, such as SVM, RF, NB, BPNN, J48, ANN, and DNN, perform well on complex problems, in particular intrusion detection problems. However, their performance becomes poor when they face certain issues in the intrusion detection field, such as data randomness. Therefore, the next section proposes an approach to address this issue.

| T-SNERF
In our algorithm, the train and test sets of the UNSW-NB15, CICIDS-2017, and phishing datasets are used. First, feature-class correlations are used to extract the important features. Second, the T-SNE algorithm is implemented in R to reduce the dimensionality of the data. Third, the random forest algorithm is implemented with the Weka machine learning application: it starts by selecting random samples from the given dataset, then creates a decision tree for every sample, and finally obtains a prediction result from every decision tree. The proposed T-SNERF approach is shown in Figure 1. It uses multiple machine learning techniques to enhance the detection rate of rare attacks using only the relevant features in the datasets while maintaining low dimensionality, which in turn reduces the training time.

| The T-SNERF algorithm
T-SNERF combines two main algorithms (T-Distributed Stochastic Neighbour Embedding and random forest), which are used together to predict/classify the category of network attack. The T-SNERF algorithm is illustrated in Algorithm 1.

| EXPERIMENTAL RESULTS
This experiment examines and compares T-SNERF with other IDS machine learning models: SVM, J48, RF, NB, ANN, K-NN, and other hybrid techniques. All datasets' classification results are compared against other machine learning algorithms and approaches using the accuracy of attack detection and FPR.

| Performance metrics
Accuracy, precision, recall, F-measure, FPR, and specificity are used in this research to evaluate the performance of the detection models. These measures are calculated as follows [79]:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

$$\text{False Positive Rate (FPR)} = \frac{FP}{FP + TN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

A true positive (TP) occurs when an attack is correctly identified as an attack; a true negative (TN) when normal traffic is correctly identified as normal; a false negative (FN) when an attack is not identified; and a false positive (FP) when normal traffic is identified as an attack. Accuracy is the percentage of all records that are classified correctly, recall is the proportion of actual attacks that are correctly detected, and specificity is the percentage of normal traffic that is properly classified. Higher accuracy and recall with a low FPR indicate a good classifier.
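As a worked illustration, the Python sketch below derives the six measures from the four confusion-matrix counts using their standard definitions. The function name `ids_metrics` and the counts are hypothetical, and the sketch assumes that no denominator is zero.

```python
def ids_metrics(tp, tn, fp, fn):
    """Standard detection measures from confusion-matrix counts (non-zero denominators assumed)."""
    precision = tp / (tp + fp)  # predicted attacks that really are attacks
    recall = tp / (tp + fn)     # actual attacks that were detected
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),          # normal records flagged as attacks
        "specificity": tn / (tn + fp),  # normal records correctly passed
    }


# Hypothetical counts: 100 attack records and 100 normal records.
m = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```

With these counts the accuracy is 0.925, the recall is 0.90, and the FPR is 0.05; specificity and FPR always sum to one.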

| Results, comparisons of the accuracy, and discussion
In our approach, the R programming language was used to implement the T-SNE algorithm, which finds the similarities between data points and represents them in a low-dimensional space to reduce the dimensionality of the data. The algorithm added two new columns to the original datasets: the map coordinates produced by T-SNE, which represent the x and y axes. The Random Forest algorithm was then trained on these enriched datasets. Figures 2-4 present the target distribution of the attack records for the UNSW-NB15, CICIDS-2017, and Phishing datasets, respectively. Two classes are defined in the datasets, normal and attack, which are represented by the grey scale of the points. The separation between the points is not clear because of the high number of instances; however, it is clear that the data points are mostly close to each other in all three datasets. Some of the data points were not correctly classified by the RF in the final stage of T-SNERF because they are difficult to identify.
Naive Bayes, Random Forest, and J48 classifiers were also implemented on the UNSW-NB15 dataset after using CFS only, without the T-SNE algorithm. The results show that the RF and J48 algorithms performed best, with accuracies of 97.59% and 93.78%, respectively, which is why the Random Forest algorithm was selected for the T-SNERF approach. Table 3 shows the accuracy and FPR obtained for the implemented classification techniques.
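The enrichment step, in which the two T-SNE map coordinates are appended to each record before the classifier is trained, might look like the following Python sketch. The helper name `enrich_with_map`, the column names `tsne_x` and `tsne_y`, and the sample values are hypothetical; the actual work used R and Weka.

```python
def enrich_with_map(rows, coords, names=("tsne_x", "tsne_y")):
    """Return copies of the records with the two map coordinates added as new columns."""
    if len(rows) != len(coords):
        raise ValueError("need exactly one coordinate pair per record")
    return [
        dict(row, **{names[0]: x, names[1]: y})  # copy the record, then add two columns
        for row, (x, y) in zip(rows, coords)
    ]


# Hypothetical records (selected features plus label) and their 2-D map coordinates.
records = [
    {"f1": 0.2, "f2": 1.5, "label": "normal"},
    {"f1": 0.9, "f2": 3.1, "label": "attack"},
]
map_coords = [(-4.2, 1.1), (6.3, -0.7)]
enriched = enrich_with_map(records, map_coords)
```

The original feature columns are kept unchanged, so the classifier sees both the selected features and the low-dimensional coordinates.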

TABLE 4 Evaluation measures of T-SNERF algorithm for benchmark datasets
Three combined machine learning algorithms: CFS, T-SNE, and RF have been implemented together to produce these remarkable results. Table 4 shows the results for T-SNERF using multiple benchmark datasets used in machine learning-based IDS.
Experiment 1: UNSW-NB15 dataset. Table 5 presents the accuracy obtained by implementing the proposed T-SNERF algorithm. With the CFS technique, random forest obtained 97.60% (Table 3), which is 7.46 percentage points higher than the 90.14% reported in [80]. Implementing the T-SNE algorithm prior to the classification increased the random forest accuracy by a further 2.4 percentage points, achieving 100% accuracy and 0% FPR using nine features selected using CFS.
Experiment 2: CICIDS-2017 dataset. Table 6 compares the T-SNERF model with recent work on the CICIDS-2017 dataset. The comparison is based on the classifier used in each study, the accuracy, and the FAR. The proposed novel approach achieved significant results compared to existing approaches, achieving 99.7878% accuracy and 0.003% FPR for the CICIDS-2017 dataset using nine features selected using CFS.
Experiment 3: Phishing dataset. Table 7 shows that the proposed novel approach achieved significant results compared to existing approaches, achieving 99.7044% accuracy and 0.003% FPR for the Phishing dataset using five features selected using CFS.
According to the work done in [81], feature selection used with an NB classifier achieved high accuracy. The results showed very high accuracy for rare types of attacks, as well as a good improvement in the FPR when NB is combined with feature selection algorithms.
The feature selection process in [81], however, has almost no effect on the J48 algorithm: the accuracy of J48 classification was not affected by feature selection. We conclude from this that the UNSW-NB15 dataset has many redundant features.
Receiver operating characteristic (ROC) curves: The ROC curve is commonly used to demonstrate the discriminative potential of a machine learning algorithm. In Figure 5, the performance of the proposed T-SNERF algorithm is presented by the grey dotted curve, which is generated by plotting the recall from Equation 5 on the y coordinate against the FPR from Equation 7 on the x coordinate. The area under the curve (AUC), obtained using Equation 8, is represented by the black point on the curve. As shown in the curve, the AUC is equal to 1, which is the maximum possible value and reflects the high accuracy of the T-SNERF algorithm.
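For readers who want to reproduce this kind of curve, the following Python sketch builds ROC points by sweeping a decision threshold over classifier scores and integrates them with the trapezoidal rule. It is an illustration under stated assumptions, not the procedure used to produce Figure 5: the function names, scores, and labels are hypothetical, and label 1 denotes an attack.

```python
def roc_points(scores, labels):
    """(FPR, recall) pairs obtained by lowering the threshold over sorted scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
auc_perfect = auc(roc_points(scores, [1, 1, 1, 0, 0, 0]))  # every attack outranks every normal record
auc_mixed = auc(roc_points(scores, [1, 0, 1, 0, 1, 0]))    # attacks and normal records interleaved
```

An AUC of 1 arises only when every attack record is scored above every normal record, as in the first toy case; the interleaved case gives an AUC of about 0.67.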

| CHALLENGES
Although machine learning methods have good achievements for intrusion detection, they still face some challenges.
1. Difficulty in getting datasets that simulate the latest types of attacks. Currently, the most common dataset is KDD99, which has several issues, so more recent datasets are needed. Building recent datasets, however, relies on expert knowledge and costly labour time. The lack of datasets is intensified by the variability of the Internet environment: many new forms of attack are not reflected in current datasets. In addition, available datasets should be representative, balanced, and low in redundancy. The systematic construction of datasets and learning may be solutions to this problem.
2. Lower accuracy of detection in real conditions. Machine learning approaches have the potential to detect intrusions, but they still struggle to perform well on completely unknown data. Most recent research used labelled datasets. As a result, without covering all samples in the real world, there is no guarantee of good results in actual environments, even if the accuracy is high on test sets.
3. Low efficiency. Most research focuses on the detection results; thus, the complicated models and extensive data pre-processing methods that are typically employed lead to low efficiency. However, detecting attacks in real time is a necessity for an IDS to mitigate the harm as much as possible. A trade-off therefore exists between effectiveness and efficiency.

| CONCLUSION
Herein, a novel high-accuracy machine learning algorithm, T-SNERF, has been introduced to solve the problem of network intrusion detection. Experiments were carried out on public datasets, in particular the benchmark UNSW-NB15, CICIDS-2017, and Phishing datasets. T-SNERF achieved excellent results, with an accuracy of 100% and a zero FPR for the UNSW-NB15 dataset, and very high accuracies of 99.7878% and 99.7044% for the CICIDS-2017 and Phishing datasets, respectively. The dimensionality reduction in T-SNERF was exploited for the generation of random forests to improve the accuracy of the classifier. Also, its low running time makes it suitable for future deployment in real-time intrusion detection tasks.
FIGURE 5 AUC (Area under the ROC Curve) of normal and attack classification of T-SNERF
Our work can be extended in the following directions. First, we plan to integrate our machine learning algorithm with recent reinforcement learning algorithms [87] to optimise our system for detecting network intrusion. Additionally, an intrusion recovery programme can be introduced to repair the misuses and anomalies caused by intrusions in a device, so that after an attack, abuse, or anomaly has been detected, a procedure known as patching fixes the programme, software, or operating system related to the detected anomaly.