• Open Access

Multivariate correlation analysis and geometric linear similarity for real-time intrusion detection systems

Authors

  • Abdelouahid Derhab (corresponding author)
    Center of Excellence in Information Assurance (COEIA), King Saud University, Riyadh, Kingdom of Saudi Arabia
    E-mail: abderhab@ksu.edu.sa
  • Abdelghani Bouras
    Industrial Engineering Department, College of Engineering, King Saud University, Riyadh, Kingdom of Saudi Arabia

Abstract

In this paper, we propose an intrusion detection system (IDS) based on four approaches: (i) a statistical-based IDS to reduce detection time; (ii) intertwining the data acquisition phase and the data preprocessing phase to ensure real-time detection; (iii) a geometric linear similarity measure that improves detection accuracy compared with existing measures; and (iv) multivariate correlation analysis that extracts a subset of strongly correlated features to construct a normal behavioral graph. Based on this graph, we derive the normal profile, which is composed of high-level features. We use the NSL-KDD dataset to analyze and evaluate the efficiency of the proposed IDS at detecting denial-of-service (DOS) attacks. Experimental results show that the proposed IDS achieves good results in terms of detection rate and false positive rate. For some DOS attacks, a 100% detection rate is achieved with a 1.55% false positive rate. We also use the KDD99 dataset to compare the proposed IDS with two statistical-based methods and some data mining and machine learning-based methods. The comparison study shows that the proposed IDS achieves the best tradeoff between detection rate (99.76%) and false positive rate (0.6%). It also requires just a few microseconds to classify a connection as normal or attack, with low CPU usage and low memory consumption. Copyright © 2014 John Wiley & Sons, Ltd.

1 Introduction

Intrusion detection systems (IDSs) are one of the essential pillars of computer and network security architectures. Depending on the detection technique performed, IDSs are classified as anomaly based or misuse based. A misuse-based (or signature-based) IDS compares ongoing observations (e.g., network traffic) with the patterns of known attacks, and observations that match are labeled as intrusive. The misuse-based detection technique is showing its limits, as new vulnerabilities and attacks with unknown patterns continuously emerge. An anomaly-based (or behavior-based) IDS consists of the following phases: (i) a training phase and (ii) a detection (or testing) phase. A profile of normal behavior is built during the training phase [1-5]. In the detection phase, the observed profile is compared with the normal one. If the distance between the two profiles is significant, an anomaly is raised (detected). Compared with the misuse-based approach, anomaly-based detection has the advantage of detecting unknown attacks. In the detection phase, a typical anomaly-based IDS consists of two steps: (i) data (or flow) acquisition and storage and (ii) data preprocessing and analysis [6]. In the data acquisition step, data packets are captured and exported to a data store. In the next step, statistics about network flows are calculated, and features representing traffic behaviors are extracted. Then, analysis is performed to distinguish the normal network traffic from the anomalous one.

Network traffic features can be classified as low-level and high-level features [7]. Low-level features are extracted directly from the captured traffic, that is, from the fields of the packet header, whereas high-level features are deduced from the captured traffic. Network IDSs (NIDSs) consider the traffic as a collection of network flows. As the speed of network links and the number of flows increase, keeping per-flow state becomes too costly in terms of processing time and storage. High-speed networks pose serious challenges to resource-constrained NIDSs [8-14]. Nowadays, network speeds have reached 40 Gbps [14], and real-time NIDSs that are capable of dealing with the continuous increase in network speed have become a necessity. To improve the real-time capability of NIDSs, dedicated hardware platforms are deployed to run the anomaly-based detection algorithms more efficiently. It has been shown in [14] that the performance of NIDSs can be improved during the data acquisition step by using a powerful hardware platform. Typical packet capture libraries like pcap [15] and ipfirewall [16] spend a high percentage of CPU time on packet processing [17], and hence achieve only suboptimal performance under high-speed networks. To avoid this problem, some techniques [17-19] handle multiple packets simultaneously and use receive-side scaling [20]. The latter assigns each RX queue to a CPU core, which allows packets of the same flow to be assigned to the same network interface card (NIC) queue.

In this paper, we aim to ensure a real-time detection while incurring good performance in terms of detection efficiency (i.e., high detection rate and low false positive rate). This aim is achieved by adopting four approaches:

  • Low detection time anomaly-based method: We adopt the multivariate statistical-based approach as the detection method. The literature review [7] shows that statistical-based approaches are more suitable for real-time detection than data mining-based and machine learning-based approaches. The multivariate model considers the correlation between two or more features and is shown to lead to better detection accuracy.
  • Intertwined data acquisition and data preprocessing phases: Instead of performing the data acquisition phase and the data preprocessing phase one after another, we propose to intertwine the two phases to ensure real-time detection.
  • Optimal feature selection: The optimal selection is achieved by using a reduced number of features. This reduces the processing overhead and speeds up the detection process. Also, selecting only relevant features improves the detection efficiency with respect to detection rate and false positive rate.
  • Accurate similarity measure: The key concept used in anomaly detection is the similarity (resp., dissimilarity) measure between two profiles, that is, to what extent the two profiles are similar or different. An accurate similarity measure is likely to enhance the detection efficiency.

In this paper, we propose a real-time, statistical-based IDS based on the geometric linear similarity measure and multivariate correlation analysis, named normal behavioral graph (NBG). The major contributions of this paper are the following. First, we show that the geometric linear similarity (resp., dissimilarity) measure, which was proposed in [21], is more accurate than the measures proposed in the literature. Second, we compute the multivariate correlations between the low-level features of the training dataset, which consists of normal and attack traffic. Then, we generate two graphs of features: one is derived from the strong correlations in the training normal dataset, and the other is derived from the strong correlations in the training attack dataset. By performing a composition operation on these two graphs, we derive the NBG, which consists of strongly correlated features. Third, based on the NBG, we derive the normal profile, which is composed of high-level features. Fourth, to the best of our knowledge, this work is the first multivariate IDS tested on the NSL-KDD dataset [22]. Fifth, we compare the proposed IDS with two statistical-based methods and other data mining-based and machine learning-based methods. The comparison study shows that the proposed IDS achieves the best tradeoff between detection rate and false positive rate. It also incurs low detection time, low CPU usage, and low memory consumption. Sixth, we describe how the real-time detection property is ensured using the intertwined data acquisition and data preprocessing approach.

The rest of the paper is organized as follows. Section 2 provides preliminaries on IDS. Section 3 presents related work. Section 4 presents the geometric linear similarity measure. In Section 5, we present the multivariate correlation analysis, which generates the NBG and the normal profile. Section 6 presents the experimental results. Section 7 describes the real-time NBG. Finally, Section 8 concludes the paper.

2 Preliminaries

Anomaly-based intrusion detection systems have been extensively studied in the research literature [7, 23-26]. They can be divided into four categories: (i) statistical based; (ii) knowledge based; (iii) data mining based; and (iv) machine learning based. Figure 1 shows the taxonomy of anomaly-based IDSs.

Figure 1.

Taxonomy of anomaly-based intrusion detection systems.

2.1 Statistical methods

Statistical-based anomaly detection methods [27-29] compute statistical measures of normal observations, such as the mean and variance, to build the normal profile, and they check whether the test observations deviate significantly from the normal one. The main advantage of statistical-based approaches is that no knowledge of attacks is needed beforehand. It has been shown that most statistical-based approaches can ensure real-time detection [7]. However, the hypothesis of a quasi-stationary process, which is adopted by statistical-based methods, does not always hold true. In addition, setting a deviation threshold that does not incur high false alarms or a low detection rate is a difficult task. Statistical-based methods can further be classified into the following approaches:

  1. Univariate model: It considers the features as independent Gaussian random variables.
  2. Multivariate model: It considers the correlations between two or more features. This model is useful when two or more features are related.
  3. Time series model: It identifies abnormal activities by considering their time occurrences. An activity is considered abnormal if the probability of its occurrence is low.
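To make the univariate model concrete, the following sketch builds a per-feature normal profile from the mean and standard deviation and flags observations whose z-score exceeds a threshold. The values and the threshold are illustrative only and are not taken from the paper.

```python
import math

def train_profile(samples):
    """Estimate mean and standard deviation of a single feature
    from normal training observations (univariate Gaussian model)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)

def is_anomalous(value, mean, std, threshold=3.0):
    """Flag an observation whose z-score exceeds the threshold."""
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold

# Illustrative values: packets-per-second of normal connections.
mean, std = train_profile([100, 98, 103, 101, 99, 102, 97, 100])
print(is_anomalous(101, mean, std))   # typical observation
print(is_anomalous(500, mean, std))   # large deviation
```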

2.2 Knowledge-based methods

The knowledge-based methods [30] use a set of rules, manually created by a human expert, to define the normal behavior of the system. The specification rules can also be derived using formal tools like automata, expert systems, and Petri net approaches. However, this approach requires a long time to construct complex knowledge.

2.3 Data mining-based methods

Data mining-based methods attempt to derive a reduced amount of data for comparison with test data [7]. Data mining-based methods can further be divided into clustering and classification approaches.

2.3.1 Clustering-based approach

Clustering [31] is an unsupervised approach that groups data into clusters with respect to a given similarity measure. A representative point (e.g., a centroid) is selected for each cluster. Then, each new data point is assigned to a given cluster if it is close to the cluster's centroid. Data that are not assigned to any cluster are called anomalies. The main advantage of the clustering approach is that it can operate in unsupervised mode. In addition, as the number of clusters is small, every test instance needs to be compared with only a limited number of centroids, and hence, it is fast during the testing phase. However, it is effective only when the anomalies do not form clusters of their own.

2.3.2 Classification-based approach

Classification [32, 33] is a supervised approach that operates in two phases: a training phase, which builds a classifier from a set of labeled data instances, and a testing phase, in which a test instance is classified as normal or anomalous using the classifier. Because each test instance is evaluated against a pre-computed model, the testing phase is considered fast.

2.4 Machine learning-based methods

Machine learning methods build a model that adapts its performance based on previous observations. They can further be divided into the following approaches:

  1. Bayesian networks [34];
  2. Neural networks [35-39];
  3. Fuzzy logic [40-42];
  4. Genetic algorithms [43];
  5. Support vector machine (SVM) [44].

The main characteristic of these approaches is that they label data in order to learn the behavioral model. However, this operation incurs high resource consumption.

2.5 Discussion

Based on some comparative studies, Table 1 summarizes the main advantages and disadvantages of the anomaly-based IDS approaches presented earlier. The approaches are also compared in terms of usage frequency, which is the frequency of performing the detection process. Here, we distinguish two types: batch (periodic analysis) and real time (continuous analysis).

Table 1. Comparison of anomaly-based intrusion detection approaches.
Anomaly-based method   | Advantages                      | Disadvantages                        | Usage frequency
Statistical based      | No prior knowledge on attacks   | Unrealistic quasi-stationary process | Real time
Knowledge based        | Robust, flexible, and scalable  | Model generation is difficult        | Batch
Data mining based      | Fast during the detection phase | High computational cost              | Batch
Machine learning based | Adaptive to changes             | High computational cost              | Batch

The main advantage of knowledge-based IDS is that they are robust, flexible, and scalable [23]. Their main drawback is that the generation of knowledge-based model is difficult and time consuming. In addition, expert systems require intensive computations when the size of the rule set increases, and hence, the IDS becomes slower and eventually becomes unsuitable for real-time detection.

Data mining-based IDSs are fast during the detection phase. They (and also machine learning-based IDSs) have the ability to adapt to changes as new information is acquired. Because this adaptation incurs a high computational cost, these methods are considered batch detection methods.

Statistical-based IDSs require no prior knowledge on attacks, and hence, they are appropriate for real-time detection. Some approaches [45, 46] have shown acceptable performance in real traffic scenarios. However, the unrealistic hypothesis of a quasi-stationary network flow prevents the statistical-based IDSs from adapting to normal changes. In this category, the multivariate models are a good choice because they can achieve a better detection rate with a lower false-alarm rate compared with other statistical-based models. Experiments have shown that better results can be produced by combining features rather than considering them individually. For these reasons, we adopt in this paper the multivariate model for the anomaly-based IDS.

3 Related work

In the literature, there have been many techniques related to the detection of denial-of-service (DOS) attacks, such as k-nearest neighbor [5], SVM [47, 48], decision trees [49], naive Bayesian (NB) [50], artificial immune systems [51], neural networks (NN) [35], fuzzy logic [40], and linear genetic programing [52].

Some recent approaches adopted the hybridization of several data mining and machine learning classifiers to improve the detection accuracy. However, these approaches might incur higher computational complexity compared with the ones based on a single machine learning classifier. The major challenge here is how to find a subset of features that reduces the computational burden while keeping a good detection accuracy. Khor et al. [53] proposed a hybrid classifier by cascading different machine learning techniques such as Bayesian networks and the C4.5 decision tree. Guo et al. [54] proposed a hybrid learning method, named distance sum-based SVM (DSSVM). In DSSVM, a distance sum is defined as a correlation between each data sample and the cluster centers. The dataset is first converted into distance sums, which are further used as input for the training of an SVM classifier. Experimental results on the NSL-KDD dataset demonstrated that DSSVM with five features can achieve better results than SVM in terms of detection time and training time. Elngar et al. [55] proposed a PSO-Discritize-HNB IDS, which combines particle swarm optimization (PSO) for feature selection and information entropy minimization discretization with a hidden NB (HNB) classifier. PSO-Discritize-HNB was compared on KDD99 with (i) IG-Discritize-HNB, which uses the information gain method to select features, and (ii) HNB. PSO-Discritize-HNB outperforms both methods in terms of detection rate and false positive rate. Also, it achieves a low training time, as it reduces the number of features from 41 to 11. In [56], a misuse detection and an anomaly detection technique are combined into one hybrid framework. The output of the misuse detection technique, which is a random forests classification algorithm, is used as input for the anomaly-based weighted k-means clustering algorithm.
In this hybrid approach, some categorical features of the KDD99 dataset, such as protocol_type, which defines the protocol of the connection (e.g., TCP, UDP, or ICMP), are encoded into binary-valued features. This operation increases the number of features from 41 to 95, which further increases the computational complexity of this approach.

Some fuzzy techniques have also been used to tackle the intrusion issue. Boughaci et al. [57] proposed a fuzzy PSO algorithm (FPSO), which combines fuzzy logic and PSO. The same authors also proposed a fuzzy genetic algorithm (FGA) [58], which improves the fuzzy rules by adding a genetic algorithm. In [59], the fuzzy rules are combined with a stochastic local search classifier to build fuzzy stochastic local search (FSLS). The authors compared FGA, FSLS, and FPSO with other intrusion detection methods such as hybrid encrypting file system [60], C4.5 [61], 5-NN [62], evolving fuzzy rules for intrusion detection (EFRID) [63], and NB [50]. The comparison study on KDD99 showed the superiority of FPSO.

In [64], an NB classifier was extended to a multi-layer Bayesian classifier. Although this method achieves an acceptable detection rate (i.e., 96.85%), it incurs a high false positive rate (i.e., 32.67%). In [65], SVM and clustering based on a self-organized ant colony network (CSOACN) are combined to form a new method called combining support vectors with ant colony (CSVAC). CSVAC outperforms SVM and CSOACN in terms of detection rate and running time.

Wei et al. [66] improved the k-means clustering algorithm by avoiding the need to initialize the number of clusters k and the k initial cluster centers. Instead, the clustering is performed according to the characteristics of the dataset. Alsharafat [67] designed an IDS that is executed in two phases. In the first phase, an artificial neural network is applied to select the best set of features for each type of network attack. In the second phase, an extended classifier system [68] based on modified genetic algorithm operators is applied.

In [69], the authors identified 12 features of network traffic to detect attacks, using information gain as the feature selection criterion. Based on this criterion, they compared various machine learning algorithms: decision tree, Ripper rule, back-propagation neural network, radial basis function neural network, Bayesian network, and NB. The comparison study showed that the decision tree has higher total detection rates than the other algorithms. The decision tree is then used as the basis of a proposed real-time IDS. The latter is shown to be efficient in terms of CPU and memory consumption and can detect malicious data packets within 2 to 3 s. In [70, 71], the authors developed a real-time IDS that combines an empirical kernel map for online feature extraction with the least squares SVM (LS-SVM) for classification. Experimental results show that LS-SVM exhibits good results in terms of detection efficiency, training time, and detection time.

Some intrusion detection methods do not take into consideration the dependencies between features when building the normal behavior, which results in high false positive rates. Jin et al. [72] constructed a covariance feature space (CFS) in which the correlation differences among sequential samples are evaluated. They utilized two statistical supervised learning approaches: the first is a threshold-based detection approach, and the second is the traditional decision tree approach. The experimental results show that the threshold-based approach achieves a better detection rate than the decision tree, that is, 99.95%, with a false positive rate of 10%. Tan et al. [73] proposed a multivariate correlation analysis based on the Euclidean distance map (EDM). The aim of this analysis is to identify the relations that exist among features; significant changes in these relations indicate the existence of attacks. This analysis also achieves a high detection rate while keeping a low false positive rate.

4 Geometric linear similarity/dissimilarity measure

4.1 Training dataset model

Let P be a set of object profiles. These profiles are defined by n features (s1, s2, …, sn). The training dataset D is defined as a set of N profiles, denoted by Zj (j = 1, …, N), collected during the training phase. The training dataset is represented by a matrix A = [aij], where aij is the number of times that feature si occurs in profile Zj.

As in [74], an information retrieval (IR) vector is used to represent the set of profiles. A profile is represented in IR as a binary (0-1) vector: the value 1 means that the feature occurs in the profile, and 0 means that the feature never occurs in it. We thus define a matrix B = [bij], where bij = 1 if the ith feature si is present in the jth profile Zj, and bij = 0 otherwise.
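The construction of the count matrix A and its binary indicator B can be sketched as follows; the profile counts are illustrative only.

```python
# Sketch of the dataset matrices: A holds feature occurrence counts per
# profile, and B is its binary (0-1) presence indicator, as defined above.
# The two profiles below are illustrative only.
profiles = [
    [3, 0, 1],   # Z1: occurrence counts of features s1..s3
    [0, 2, 5],   # Z2
]

# A[i][j] = number of times feature s_i occurs in profile Z_j
A = [[profiles[j][i] for j in range(len(profiles))]
     for i in range(len(profiles[0]))]

# B[i][j] = 1 if s_i is present in Z_j, else 0
B = [[1 if a != 0 else 0 for a in row] for row in A]

print(A)  # [[3, 0], [0, 2], [1, 5]]
print(B)  # [[1, 0], [0, 1], [1, 1]]
```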

4.2 Background on similarity and dissimilarity measures

A similarity measure [75], denoted by Sim, is a function Sim: P × P → ℝ+ that satisfies the following properties:

  • (P1) ∀x,y ∈ P: Sim(x,y) ≥ 0
  • (P2) ∀x,y ∈ P: Sim(x,x) = Sim(y,y) ≥ Sim(x,y)
  • (P3) ∀x,y ∈ P: Sim(x,y) = Sim(y,x)

Likewise, a dissimilarity measure [75] (also known as distance), denoted by Dis, is a function Dis: P × P → ℝ+ that satisfies the following properties:

  • (P4) ∀x,y ∈ P: Dis(x,y) ≥ 0
  • (P5) ∀x,y ∈ P: Dis(x,x) = 0
  • (P6) ∀x,y ∈ P: Dis(x,y) = Dis(y,x)

Typically, a similarity measure can be converted to serve as the dissimilarity measure and vice versa. In general, any monotonically decreasing function can be applied to convert similarity measures into dissimilarity measures, and any monotonically increasing function can be applied to convert the measures the other way around. The similarity (dissimilarity) measures can be classified as feature based and distance based.

In the feature-based category, we find the binary similarity (also known as the Jaccard similarity) [75]. The binary similarity measure between two profiles, Zj and Zk, is calculated as follows:

μ(Zj,Zk) = Σi=1..n (bij ∧ bik) / Σi=1..n (bij ∨ bik)

It can also be written as μ(Zj,Zk) = a / (a + b + c), where

  • a: the number of shared features between Zj and Zk;
  • b: the number of features belonging to Zj and not to Zk;
  • c: the number of features belonging to Zk and not to Zj.

As 0 ≤ μ(Zj,Zk) ≤ 1, the Jaccard dissimilarity, denoted by μd(Zj,Zk), is obtained by subtracting the Jaccard similarity from 1. Formally, μd(Zj,Zk) = 1 − μ(Zj,Zk).
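The binary similarity can be sketched as follows, using the two example user profiles discussed later in Section 4.5.1.

```python
def jaccard(bj, bk):
    """Binary (Jaccard) similarity between two 0-1 profile vectors:
    shared features divided by features present in at least one profile."""
    a = sum(1 for x, y in zip(bj, bk) if x == 1 and y == 1)  # shared
    b = sum(1 for x, y in zip(bj, bk) if x == 1 and y == 0)  # only in Zj
    c = sum(1 for x, y in zip(bj, bk) if x == 0 and y == 1)  # only in Zk
    return a / (a + b + c)

x = (1, 1, 1, 0, 1, 1, 0)
y = (1, 0, 1, 0, 1, 0, 1)
mu = jaccard(x, y)       # a=3, b=2, c=1 -> 0.5
print(mu, 1 - mu)        # Jaccard similarity and dissimilarity
```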

In the distance-based category, we find the following dissimilarity measures:

  1. Euclidean distance:
     Dis(Zj,Zk) = ( Σi=1..n (aij − aik)² )^(1/2)
  2. Minkowski distance:
     Dis(Zj,Zk) = ( Σi=1..n |aij − aik|^p )^(1/p)
  3. Chebychev distance:
     Dis(Zj,Zk) = maxi |aij − aik|
  4. Manhattan distance:
     Dis(Zj,Zk) = Σi=1..n |aij − aik|

The previous distances and their corresponding similarity measures can be related via an exponential function [76, 77]. Formally, Sim(a,b) = e^(−Dis(a,b)).
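These distance measures, and the exponential conversion to a similarity, can be sketched as follows. The negative exponent makes the conversion monotonically decreasing, which is consistent with the Es column of Table 2 below.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def minkowski(u, v, p):
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def chebychev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def to_similarity(dis):
    """Convert a distance into a similarity in ]0, 1] via the
    monotonically decreasing mapping Sim = e^(-Dis)."""
    return math.exp(-dis)

u, v = (0, 0, 0, 0), (1, 1, 1, 1)
print(euclidean(u, v))                 # 2.0
print(to_similarity(euclidean(u, v)))  # e^(-2) ~ 0.135
```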

4.3 Basic idea

The basic idea of our new similarity measure is that the two profiles are similar with respect to feature C if

  1. They both have C in common, or
  2. Neither of them has C.

Also, the two profiles are dissimilar with respect to a feature C if only one of them has C.

4.4 Problem formulation

We formally define the binary atomic similarity (resp., dissimilarity) measure between two profiles, Zj and Zk, with respect to a feature si by using the XNOR (resp., XOR) logic operation as follows:

simi(Zj,Zk) = ¬(bij ⊕ bik)   and   disi(Zj,Zk) = bij ⊕ bik

It is obvious that the binary atomic similarity is property based, and there is a need to represent the value associated with the feature, that is, aij. Also, the XOR operation has its limitations, as it cannot be used to measure the dissimilarity between two profiles, especially in the case when bij = bik = 1 but aij ≠ aik. To address this issue, we propose a new similarity measure, named quantitative atomic similarity (resp., dissimilarity), denoted by Sim (resp., Dis), in which the elements of matrix A are used and the XOR operation is replaced by the OR operation.

Formally, the similarity (resp., dissimilarity) measure between the two profiles is as follows:

Sim(Zj,Zk) = Σi=1..n simi(Zj,Zk), where

simi(Zj,Zk) = min(aij,aik)/max(aij,aik) if bij ∨ bik = 1, and simi(Zj,Zk) = 1 if bij ∨ bik = 0

Dis(Zj,Zk) = Σi=1..n disi(Zj,Zk), where

disi(Zj,Zk) = 1 − min(aij,aik)/max(aij,aik) if bij ∨ bik = 1, and disi(Zj,Zk) = 0 if bij ∨ bik = 0

It is easy to show that our proposed similarity (resp., dissimilarity) measure satisfies properties (P1), (P2), and (P3) (resp., (P4), (P5), and (P6)).

It is obvious from the definitions of Sim and Dis that Sim(x,y) ≥ 0 and Dis(x,y) ≥ 0. As all the elements of matrix A are nonnegative, (P1) and (P4) hold true.

Property (P2) is satisfied as Sim(x,x) = Σi simi(x,x) = n ≥ Sim(x,y).

Property (P5) is satisfied as Dis(x,x) = Σi disi(x,x) = 0.

As min(a,b) = min(b,a), max(a,b) = max(b,a), and ⊕ and ∨ are symmetric operations, (P3) and (P6) hold true.

We prove that the similarity and dissimilarity measures can be combined to yield the following theorem:

Theorem 1. Sim(Zj,Zk) + Dis(Zj,Zk) = n

Proof.

For each feature si, the atomic measures satisfy simi(Zj,Zk) + disi(Zj,Zk) = 1. Indeed, if bij ∨ bik = 1, then simi(Zj,Zk) + disi(Zj,Zk) = min(aij,aik)/max(aij,aik) + (1 − min(aij,aik)/max(aij,aik)) = 1; if bij ∨ bik = 0, then simi(Zj,Zk) + disi(Zj,Zk) = 1 + 0 = 1.

Summing over the n features, we obtain

Sim(Zj,Zk) + Dis(Zj,Zk) = Σi=1..n (simi(Zj,Zk) + disi(Zj,Zk)) = n

Thus, Theorem 1 is proven. From Properties (P1), (P4), and Theorem 1, it is obvious that Sim(a,b) ≤ n and Dis(a,b) ≤ n and that both are related via a geometric linear function. Figure 2 shows the graphic representation of the relation between the similarity and dissimilarity measures. The two measures between any two profiles form a two-dimensional point that lies on the line y = n − x.
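A minimal sketch of the geometric linear measures follows, assuming a per-feature reading in which a feature absent from both profiles contributes 1 and a feature present in either contributes the ratio of the smaller to the larger count. Under this assumption the sketch reproduces the Ls and Ld columns of Table 2 below and satisfies Theorem 1.

```python
def atomic_sim(aij, aik):
    """Per-feature similarity in [0, 1]: 1 when the feature is absent
    from both profiles; otherwise the ratio of the smaller to the larger
    occurrence count. (An assumed reading of the per-feature definitions.)"""
    if aij == 0 and aik == 0:
        return 1.0
    return min(aij, aik) / max(aij, aik)

def geometric_linear(zj, zk):
    """Profile-level geometric linear similarity and dissimilarity.
    By construction, Sim + Dis = n (Theorem 1)."""
    n = len(zj)
    sim = sum(atomic_sim(a, b) for a, b in zip(zj, zk))
    return sim, n - sim

# Rows of Table 2: P1 = (0,0,0,0) against increasingly different P2
# vectors yields Ls = 4, 3, 2, 1, 0 and Ld = 0, 1, 2, 3, 4.
p1 = (0, 0, 0, 0)
for p2 in [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1),
           (0, 1, 1, 1), (1, 1, 1, 1)]:
    print(geometric_linear(p1, p2))
```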

Figure 2.

Relation between similarity and dissimilarity.

4.5 Comparison of similarity measures

In the following, we compare the geometric linear similarity (resp., dissimilarity) measure with the other measures proposed in the literature.

4.5.1 Geometric linear similarity versus binary similarity

The main drawback of the binary similarity is that it ignores the features that are absent from both profiles Zj and Zk. Let us consider a network with two users. Each user is represented by a corresponding profile: x = (1,1,1,0,1,1,0) and y = (1,0,1,0,1,0,1). So, we have a = 3, b = 2, c = 1, and μ(x,y) = 3/(3 + 2 + 1) = 0.5. In this example, we can notice that (a + b + c) ≠ 7, which is the number of features in a profile. Also, the fourth element in both vectors, which is 0, is not considered. In fact, the similarity measure should be 4/7, because the fourth feature, absent from both profiles, also contributes to their similarity. Therefore, the binary similarity measure cannot faithfully represent an accurate similarity between the two profiles. Also, other similarity measures like binary weighted cosine (BWC), binary weighted radial basis function (BWRBF), and smooth binary weighted radial basis function (SBWRBF) [74] are based on the binary similarity, and therefore, they cannot accurately declare whether a profile is normal or anomalous.

4.5.2 Geometric linear similarity versus distance based

The geometric linear similarity and dissimilarity measures both take values in the interval [0,n]. The distance-based measure is defined on the interval [0,+∞[, and its corresponding similarity measure, which is an exponential function, lies in the interval ]0,1]. Table 2 shows the different values of similarity and dissimilarity when applying the geometric linear and the geometric exponential concepts. Let P1 and P2 be two vector profiles. Ls (resp., Ld) denotes the geometric linear similarity (resp., dissimilarity) function. Es denotes the geometric exponential similarity, and Ed denotes its corresponding Euclidean distance. In each row of the table, we modify one element of the P2 vector in order to increase the dissimilarity between the two vectors. We can observe that Ls equals the upper bound of the interval [0,4] when the two vectors are totally similar. When they are totally different, the similarity between them is 0. This is not the case for the function Es: when the two vectors are completely different from each other, Es is different from 0. This observation shows that the geometric linear similarity defines the proximity between two profiles more accurately than the distance-based measures.

Table 2. Comparison between geometric linear and geometric exponential similarity.
P1        | P2        | Ls(P1,P2) | Ld(P1,P2) | Es(P1,P2) | Ed(P1,P2)
(0,0,0,0) | (0,0,0,0) | 4         | 0         | 1         | 0
(0,0,0,0) | (0,0,0,1) | 3         | 1         | 0.36      | 1
(0,0,0,0) | (0,0,1,1) | 2         | 2         | 0.25      | 1.41
(0,0,0,0) | (0,1,1,1) | 1         | 3         | 0.17      | 1.73
(0,0,0,0) | (1,1,1,1) | 0         | 4         | 0.13      | 2

4.5.3 Geometric linear similarity versus symmetric radial basis function

The kernel similarity measure smooth radial basis function (SRBF) is defined as follows:

display math

where ||.|| is the Euclidean norm. The first problem with SRBF is that it cannot measure the similarity between vectors composed only of null values. In addition, it suffers from the same issue as the distance-based measures, that is, the similarity between two vectors that are totally different from each other is different from 0. The example in Table 3 clearly illustrates this issue.

Table 3. Comparison between geometric linear similarity and symmetric radial basis function.
P1        | P2        | Ls(P1,P2) | SRBF
(1,1,1,1) | (1,1,1,1) | 4         | 1
(1,1,1,1) | (1,1,1,0) | 3         | 0.84
(1,1,1,1) | (1,1,0,0) | 2         | 0.71
(1,1,1,1) | (1,0,0,0) | 1         | 0.60
(1,1,1,1) | (0,0,0,0) | 0         | 0.51

5 Normal behavioral graph-based multivariate correlation analysis

5.1 Framework of normal behavioral graph model

The framework of the NBG approach, as illustrated in Figure 3, is composed of two phases: an offline training phase and an online detection phase. In the offline phase, a labeled training dataset is partitioned into normal and attack datasets. From each resulting dataset, we compute its corresponding correlation matrix, which is used to generate a graph of correlated low-level features. The two resulting graphs are combined to generate the NBG. In the online detection phase, the data packets captured during each interval of x seconds are combined into one data record. From the data record, we extract only the low-level features that compose the NBG. Then, we compute the high-level features that are used as input for the NBG, which examines whether the input values correspond to a normal connection or an attack.

Figure 3.

Framework of normal behavioral graph-based model.

5.2 Training phase: generation of normal behavioral graph

In the training phase, we investigate the associations between features. We are interested in identifying the degree of correlation between these features, that is, the correlation coefficient. First, we show how a correlation matrix is constructed from a dataset. Because we consider all observations from the training dataset, we refer to Pearson's correlation coefficient applied to a population, that is, the population correlation coefficient. We use the formula:

ρ(X,Y) = cov(X,Y) / (σX σY) = E[(X − μX)(Y − μY)] / (σX σY)    (1)

If ρ(X,Y) = 1, then X and Y have a linear correlation. If 0.7≤ρ(X,Y) < 1, then X and Y have a strong linear correlation. If 0.5≤ρ(X,Y) < 0.7, then X and Y have a modest linear correlation. Finally, with 0≤ρ(X,Y) < 0.5, X and Y are said to have a weak linear correlation.
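The population correlation coefficient and the strength classes above can be sketched as follows; the feature observations are illustrative and are not drawn from NSL-KDD.

```python
import math

def pearson(xs, ys):
    """Population Pearson correlation coefficient between two features."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def strength(rho):
    """Map |rho| to the correlation classes described above."""
    r = abs(rho)
    if r == 1:
        return "linear"
    if r >= 0.7:
        return "strong"
    if r >= 0.5:
        return "modest"
    return "weak"

f1 = [1, 2, 3, 4, 5]
f2 = [2, 4, 6, 8, 10]           # f2 = 2 * f1
rho = pearson(f1, f2)
print(round(rho, 6))             # 1.0
print(strength(round(rho, 6)))   # linear
```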

In our notation, we consider the set of all features F = {F1, F2, …, Fn}. For each pair of features (Fi,Fj), we calculate ρ(Fi,Fj). Then, we construct an n × n correlation matrix Ω, where Ωij = ρ(Fi,Fj):

        | 1    Ω12  ⋯  Ω1n |
    Ω = | Ω21  1    ⋯  Ω2n |
        | ⋮    ⋮    ⋱  ⋮   |
        | Ωn1  Ωn2  ⋯  1   |

and −1 ≤ Ωij ≤ +1.

In addition, we select only significant correlations, that is, those associations of features with a p-value < 0.05. The p-value is a probability used to decide whether a correlation exists between two features. Accordingly, an entry of the matrix is kept only when its correlation is significant:

Ωij = ρ(Fi,Fj) if the p-value < 0.05, and Ωij = 0 otherwise.

Then, we derive a graph G = (V,E,w) from matrix Ω, defined as

  • V = {v1, …, vn}, the set of vertices (features), where |V| = n;
  • E = {(vi,vj) : Ωij ≠ 0}, the set of edges, with weights wij = Ωij and |E| = m.
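The graph derivation above can be sketched as follows. Thresholding on |ρ| stands in for the p-value significance test (which in practice could use, e.g., scipy.stats.pearsonr); the 0.9 threshold mirrors the quasi-linear correlation criterion mentioned later in Section 6, and `graph_from_correlation` is a hypothetical helper name:

```python
import numpy as np

def graph_from_correlation(Omega, threshold=0.9):
    """Derive G = (V, E, w) from a correlation matrix Omega.
    An edge (i, j) is kept only for significantly correlated pairs;
    here |rho| >= threshold stands in for the paper's p-value < 0.05
    test (an illustrative simplification)."""
    n = Omega.shape[0]
    V = list(range(n))
    E, w = [], {}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(Omega[i, j]) >= threshold:
                E.append((i, j))
                w[(i, j)] = Omega[i, j]   # w_ij = Omega_ij
    return V, E, w

Omega = np.array([[1.0, 0.95, 0.1],
                  [0.95, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
V, E, w = graph_from_correlation(Omega)
print(E)   # [(0, 1)]
```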

We consider that the dataset is composed of normal and abnormal connections, and hence, the dataset is partitioned into two partitions (or clusters): normal dataset and abnormal dataset. We apply the previous calculations on the two datasets separately to generate two matrices: Ω[N] and Ω[A]. We also generate their corresponding graphs G[N]=(V,E[N],w[N]) and G[A]=(V,E[A],w[A]).

We define a composition operation ⊙ that creates a new graph G[C] from the two graphs. Formally, G[C] = G[N] ⊙ G[A] is defined as follows:

  • E[C] = {(x,y) ∈ E[N] ∪ E[A] : |w(x,y)| > 0.9}, that is, only the very strongly correlated pairs are kept;
  • V[C] = {x ∈ V : ∃ y ∈ V, (x,y) ∈ E[C]};
  • w[C](x,y) = w[N](x,y) if (x,y) ∈ E[N], and w[C](x,y) = w[A](x,y) otherwise.

G[C] is composed of strongly connected subgraphs, in the sense that if a subset of features is strongly correlated, then the subgraph associated with this subset of nodes will be strongly connected. A clique, or complete graph, is formed when each pair of features in this subset forms a strong association.
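The composition operation can be sketched as below. The tie rule for an edge present in both graphs (keep the weight of larger magnitude) is an assumption, as this excerpt does not specify it, and `compose` is a hypothetical helper name:

```python
def compose(E_N, w_N, E_A, w_A):
    """Sketch of the composition G[C] = G[N] (.) G[A]: merge the edge
    sets of the normal and attack graphs. When both graphs contain an
    edge, keep the weight of larger magnitude (assumption: the paper
    does not spell out this tie rule in the present excerpt)."""
    E_C = sorted(set(E_N) | set(E_A))
    w_C = {}
    for e in E_C:
        candidates = [w for w in (w_N.get(e), w_A.get(e)) if w is not None]
        w_C[e] = max(candidates, key=abs)
    # V[C] keeps only vertices that are endpoints of some kept edge
    V_C = sorted({v for e in E_C for v in e})
    return V_C, E_C, w_C

E_N, w_N = [(0, 1)], {(0, 1): 0.95}
E_A, w_A = [(1, 2), (0, 1)], {(1, 2): -0.92, (0, 1): 0.91}
V_C, E_C, w_C = compose(E_N, w_N, E_A, w_A)
print(V_C, E_C)   # [0, 1, 2] [(0, 1), (1, 2)]
```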

For example, suppose that we have four network features, namely F1, F2, F3, and F4, with correlation coefficient matrices Ω[N] and Ω[A]. According to these matrices, we generate the graph G[C], as shown in Figure 4.

Figure 4.

Graph-based normal behavioral model.

Then, for each edge (Fi,Fj)∈G[C], we compute a high-level feature using a function Hij. Formally, HFk = Hij(Fi,Fj), where k ∈ {1, …, M} and M = |E[C]|. The methodology to choose Hij is described as follows:

  • If Fi and Fj are originally taken from G[N], we choose a function Hij in a way that maximizes (resp., minimizes) HFk when a normal (resp., abnormal) connection occurs.
  • If Fi and Fj are originally taken from G[A], we choose a function Hij in a way that maximizes (resp., minimizes) HFk when an abnormal (resp., normal) connection occurs.

Then, we define a reference vector HFref that represents the normal behavior as follows:

HFref = (HF1, HF2, …, HFM),

where each component is the value of the corresponding high-level feature under normal traffic.

5.3 Detection phase

To check whether a profile Z is normal or anomalous, we first convert Z of n features into a vector ZHF of M elements, ZHF = (HF1, HF2, …, HFM), such that each element is a high-level feature computed from the low-level features of Z. Then, we apply the pseudo-code depicted in Algorithm 1.

Algorithm 1.
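The classification step can be sketched as follows. The paper's geometric linear similarity measure (Section 4) is not reproduced in this excerpt, so a plain Euclidean distance to the reference vector stands in for it, and the threshold value is illustrative:

```python
import math

def classify(Z_HF, HF_ref, threshold):
    """Sketch of the detection step: compare the high-level feature
    vector of a connection against the normal reference vector HF_ref.
    Euclidean distance stands in for the paper's geometric linear
    similarity measure (an assumption for illustration)."""
    d = math.sqrt(sum((z - r) ** 2 for z, r in zip(Z_HF, HF_ref)))
    return "normal" if d <= threshold else "attack"

print(classify([1.0, 0.9], [1.0, 1.0], threshold=0.5))   # normal
print(classify([0.1, 0.2], [1.0, 1.0], threshold=0.5))   # attack
```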

6 Experimental results

In this section, we aim at evaluating the performance of our normal behavior graph-based multivariate IDS (NBG) using NSL-KDD dataset [22].

6.1 Discussion of dataset accuracy

To assess the efficiency of an IDS, it is required to test it with a dataset. There are two ways to obtain a labeled dataset: (i) use the available public labeled datasets like DARPA [78], KDD99 [79], and NSL-KDD [22] or (ii) build a new dataset from real-network traffic.

Collecting data from real networks raises concerns about its accurate classification and labeling. Anomaly-based IDSs generally need training with attack-free data. Nevertheless, the data collected from real networks might not satisfy this requirement. Shafi and Abbas [80] have described a methodology to generate a customized labeled dataset, composed of background traffic obtained from a real network and attack traffic obtained through simulation. To label the background traffic, the Snort IDS is executed. The authors have acknowledged that this methodology might falsely label a normal connection as an attack. Spathoulas et al. [81] have also criticized the use of datasets obtained from real-network traffic, arguing that it creates considerable ambiguity because normal and attack traffic cannot be distinguished with certainty. As attacks become increasingly sophisticated and stealthy, labeling real network traffic can never be fully accurate. We can therefore conclude that no methodology can accurately label network connections as legitimate or intrusive; if such a methodology existed, it would itself be an IDS.

On the other hand, the available public labeled datasets like DARPA, KDD99, and NSL-KDD have been widely used by the research community [82, 83]. However, some studies have pointed out drawbacks of DARPA and KDD99. One drawback is the presence of simulation artifacts in the DARPA dataset; these artifacts could exist in the KDD99 dataset as well. Despite these criticisms, Thomas et al. [84] have given the following reasons supporting the use of the DARPA dataset:

  • Brugger and Chow [85], while analyzing DARPA 1998, concluded that a sophisticated IDS can achieve a good false positive rate on the DARPA IDS dataset.
  • On the other hand, Mahoney and Chan [83] argued that there is a sort of 'positive relationship' between the performance of an IDS on the DARPA dataset and its performance on real data: any IDS not performing well on the DARPA dataset would not perform well on real data with regard to attack detection.
  • Labeling the network connections with raw data is not 100% accurate.

Researchers have widely used KDD99 [79], a labeled dataset, to assess new methods for anomaly detection. This dataset is a subset of the DARPA98 IDS evaluation program [78]. NSL-KDD is a reduced and cleansed version of the original KDD99 dataset [79]. NSL-KDD solves the following problems of the original KDD99 dataset:

  • It eliminates redundancies from the original data in order to avoid any possibility of bias.
  • The test dataset does not contain duplicate records. Hence, the performance of the IDS is not biased toward redundant records.

Ibrahim et al. [86] have evaluated the performance of an IDS using KDD99 and NSL-KDD. Their results show a detection rate of 92.37% for KDD99 and 75.49% for NSL-KDD, indicating that KDD99, with its redundant records, produces questionable and inaccurate results.

For these reasons, NSL-KDD is used in this work. To the best of our knowledge, our work is the first multivariate IDS tested under NSL-KDD. In NSL-KDD, there are 41 features for each connection. A connection is a group of sequential packets collected over a time window of 2 s. A detailed description of these features can be found in [87].

In this work, we aim at detecting DOS attacks. The DOS attacks, as defined in the dataset, are divided into six types: Teardrop, Smurf, Pod, Neptune, Land, and Back. Table 4 shows the number of normal and DOS records in the training dataset.

Table 4. Number of normal and denial-of-service records in the training dataset.

Normal    Back   Land   Neptune   Smurf   Teardrop   Pod
67 343    956    18     41 214    2646    892        201

According to the normal and the abnormal datasets in NSL-KDD, we generate G[C] as explained in section 5. To increase the detection accuracy for a specific DOS attack, for example, Teardrop or Smurf, we perform the following operations:

  1. Extract the corresponding attack (e.g., Teardrop) from the abnormal dataset.
  2. Calculate its corresponding correlation matrix Ω[T].
  3. Calculate its corresponding graph G[T].
  4. Calculate G[NT] = G[N] ⊙ G[T].
  5. Add G[NT] to G[C].

The graph G[C] resulting from the previous operations is depicted in Figure 5. It is composed of 15 low-level features, which are used to generate 20 high-level features.

Figure 5.

Graph-based normal behavioral model from NSL-KDD.

6.2 Accuracy evaluation

We assess the accuracy of NBG under NSL-KDD, using the following two metrics:

  • Detection rate: The ratio, as a percentage, of attack connections detected by the IDS to the total number of attack connections.
  • False positive rate: The ratio, as a percentage, of normal connections misclassified as attacks to the total number of normal connections.
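The two metrics can be computed as follows. The attack total matches the DOS record count in Table 4 (956 + 18 + 41 214 + 2646 + 892 + 201 = 45 927); the detected and misclassified counts are illustrative, not results reported by the paper:

```python
def detection_rate(true_positives, total_attacks):
    # percentage of attack connections flagged by the IDS
    return 100.0 * true_positives / total_attacks

def false_positive_rate(false_positives, total_normals):
    # percentage of normal connections misclassified as attacks
    return 100.0 * false_positives / total_normals

# Illustrative counts: 45 510 of 45 927 DOS records detected,
# 1000 of 67 343 normal records flagged as attacks.
print(round(detection_rate(45510, 45927), 2))        # ~99.09
print(round(false_positive_rate(1000, 67343), 2))    # ~1.48
```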

Table 5 shows the performance of our multivariate IDS in terms of the detection rate of attacks and normal connections under different threshold values. We vary the threshold from 1 to 2.5 with an increment of 0.5. The overall detection rate of all DOS attacks ranges between 99.08% and 83.53%. The detection rate of normal connections rises sharply from 63.53% to 98.45% as the threshold increases. The Land and Pod attacks are completely detected in all cases. As for Neptune, our IDS achieves a detection rate very close to 100% (between 99.69% and 100%). Our IDS also suffers a slight decline in detecting Back and Teardrop attacks when the threshold reaches 2, but it still manages to detect approximately 80% of the Back attacks and 83% of the Teardrop attacks. As for Smurf, there is a drastic decrease from 100% to 27.36% when the threshold is 2. This can be explained by the fact that the features in G[C] are mostly dominated by the Neptune attack, which constitutes 89.73% of the total DOS records in the training dataset.

Table 5. Detection rate for normal connections and denial-of-service attacks.

Type of record   Threshold = 1 (%)   1.5 (%)   2 (%)    2.5 (%)
Normal           63.53               63.53     98.45    98.45
DOS              99.08               99.02     84.63    83.53
Back             98.32               97.21     82.45    79.10
Land             100                 100       100      100
Neptune          100                 100       99.93    99.69
Smurf            100                 100       27.36    26.91
Teardrop         100                 100       83.33    83.33
Pod              100                 100       100      100

DOS, denial of service.

To better understand the tradeoff offered by the proposed IDS, the results given in Table 5 are plotted as receiver operating characteristic (ROC) curves in Figures 6 and 7. An ROC curve is obtained by plotting pairs of coordinates (X,Y), where X is the false positive rate and Y is the detection rate; the two metrics change in relation to each other as the threshold varies. A better ROC curve is one whose y-values grow faster than its x-values, that is, one that approaches the upper left-hand corner (the point of perfection at (0,1)); the further to the top left, the better. Figure 6 shows the ROC curve for all the DOS attacks, and Figure 7 shows the ROC curve for each attack separately. To detect all DOS attacks, the best tradeoff between detection rate and false positive rate ensured by our multivariate IDS is obtained when the threshold is 2. We can draw the same observation for the Back and Teardrop attacks. As for the Land, Pod, and Neptune attacks, the best configuration is also a threshold of 2, as a detection rate of (or very close to) 100% is achieved while keeping the lowest false positive rate. As for Smurf, we have to tolerate a relatively high false positive rate of 36.47% if a high detection rate is a requirement of high priority.

Figure 6.

ROC curve for all attacks.

Figure 7.

ROC curve for each DOS attack.

To further assess the accuracy of our IDS, we perform two comparison studies: one compares NBG with two statistical-based IDSs, namely the EDM multivariate correlation analysis-based IDS and the CFS (covariance feature space) IDS; the other compares NBG with the data mining and machine learning-based IDSs presented in section 3.

6.2.1 Comparison with statistical-based intrusion detection systems

As EDM and CFS have been tested under KDD99, and for the purpose of a fair comparison, we have also tested our IDS under KDD99. The accuracy of the IDSs at detecting DOS attacks is shown in Table 6. The IDSs are compared in terms of detection rate, false positive rate, and ROC distance. The ROC distance is defined as √((100 − DR)² + FPR²), where DR and FPR denote the detection rate and the false positive rate, respectively. Note that the lower the ROC distance, the more efficient the IDS. The first observation that we can draw from the table is that statistical-based IDSs generally incur relatively high false positive rates. For instance, CFS incurs a false positive rate of 10.33%. This is because a statistical-based IDS assumes a quasi-stationary process, and this is the price to pay to ensure real-time detection. EDM succeeds in achieving a lower false positive rate (i.e., 2.08%) as it uses correlation between features to describe the normal behavior, making it difficult for an attacker to mimic this behavior. NBG, on the other hand, makes mimicking the normal behavior even more difficult as the normal behavioral graph is only composed of edges with very strongly correlated features (quasi-linear correlation, i.e., ρ(X,Y) > 0.9). This explains the low false positive rates incurred by this method, that is, 1.55% under NSL-KDD and 0.6% under KDD99. From Table 6, we can also notice that NBG makes the best tradeoff between detection rate and false positive rate, as it incurs the lowest ROC distance under KDD99.
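The ROC distance is a one-line computation, shown below with the detection rates and false positive rates of Table 6; the computed values match the tabulated ones to within rounding:

```python
import math

def roc_distance(dr, fpr):
    # distance from the operating point (FPR, DR) to the
    # perfect classifier at (0, 100)
    return math.sqrt((100.0 - dr) ** 2 + fpr ** 2)

for name, dr, fpr in [("EDM (KDD99)", 99.96, 2.08),
                      ("CFS (KDD99)", 99.95, 10.33),
                      ("NBG (KDD99)", 99.76, 0.6),
                      ("NBG (NSL-KDD)", 84.63, 1.55)]:
    print(name, round(roc_distance(dr, fpr), 2))
```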

Table 6. Accuracy comparison of statistical-based approaches.

                      EDM-based multivariate      Covariance feature   NBG-based multivariate   NBG-based multivariate
                      correlation [73] (KDD99)    space [72] (KDD99)   correlation (KDD99)      correlation (NSL-KDD)
Detection rate        99.96%                      99.95%               99.76%                   84.63%
False positive rate   2.08%                       10.33%               0.6%                     1.55%
ROC distance          2.08                        10.33                0.64                     15.44

EDM, Euclidean distance map; NBG, normal behavioral graph; ROC, receiver operating characteristic.

6.2.2 Comparison with data mining and machine learning-based intrusion detection systems

The accuracy of the data mining and machine learning-based methods at detecting DOS attacks is shown in Table 7. In the table, besides the name of each method, there is a reference that indicates where the corresponding result values are published. The values were obtained under different datasets, such as KDD99, NSL-KDD, and the Reliability Lab Data 2009 dataset [69]. Except for NB [57], the data mining and machine learning-based methods show good detection accuracy. Many hybrid classifiers succeed at achieving a high detection rate while keeping a low false positive rate, and their ROC distances confirm that they ensure a good tradeoff between the two metrics. For the methods tested under KDD99, the ROC distances range between 0.71 and 9.74, which is less optimal than the 0.64 achieved by NBG. We also observe that NBG shows a lower detection rate compared with HNB, IG-Discritize-HNB, and PSO-Discritize-HNB [55], which are tested under NSL-KDD. On the other hand, NBG outperforms these methods in terms of false positive rate when tested under the same dataset.

Table 7. Accuracy comparison of other intrusion detection systems against denial-of-service attacks.

                              DR (%)   FPR (%)   ROC distance   Dataset
NB [54]                       79.2     6         21.65          KDD99
NB [57]                       79.4     5.8       21.4           KDD99
NB [69]                       78.2     1.3       21.84          Subset of RLD09
C4.5 [54]                     96.8     3         4.39           KDD99
C4.5 [57]                     97.1     4.1       5.02           KDD99
C4.5 [69]                     99.4     0.3       0.67           Subset of RLD09
RT-IDS with C4.5 [69]         99.17    0.57      1.01           RLD09
C4.5-BN [53]                  97.8     2.6       3.41           KDD99
k-NN [54]                     97       1.7       3.45           KDD99
5-NN [57]                     96.7     3.7       4.96           KDD99
SVM (41 features) [54]        96.8     2.3       3.94           KDD99
DSSVM (5 features) [54]       97.2     1.6       3.22           KDD99
HNB [55]                      99.1     3.5       3.61           NSL-KDD
IG-Discritize-HNB [55]        99       3.3       3.45           NSL-KDD
PSO-Discritize-HNB [55]       99.3     2         2.12           NSL-KDD
FPSO [57]                     97.22    9.34      9.74           KDD99
FGA [57]                      99.99    7.5       7.5            KDD99
Hybrid EFS [57]               98.5     1.5       2.12           KDD99
EFRID [57]                    98.91    7.22      7.3            KDD99
k-means [88]                  100      0.864     0.86           KDD99
Improved k-means [88]         100      0.712     0.71           KDD99
XCS-ANN [67]                  98.8     0.9       1.5            KDD99
Batch LS-SVM [70]             98.56    1.24      1.9            KDD99
Online LS-SVM [71]            98.48    1.33      2.02           KDD99
Back-propagation NN [69]      98.6     12        12.08          Subset of RLD09
Ripper rule [69]              99.4     1.2       1.34           Subset of RLD09
Proposed NBG                  99.76    0.6       0.64           KDD99
                              84.63    1.55      15.44          NSL-KDD

DR, detection rate; FPR, false positive rate; ROC, receiver operating characteristic; NB, naive Bayesian; k-NN, k-nearest neighbor; NN, neural networks; BN, Bayesian networks; SVM, support vector machine; DSSVM, distance sum-based support vector machine; HNB, hidden naive Bayesian; PSO, particle swarm optimization; FPSO, fuzzy particle swarm optimization algorithm; FGA, fuzzy genetic algorithm; EFS, evolutionary fuzzy system; ANN, artificial neural network; XCS, extended classifier system; RT-IDS, real-time intrusion detection system; LS-SVM, least squares support vector machine; RLD09, Reliability Lab Data 2009; NBG, normal behavioral graph.

6.3 Running time evaluation

In this section, the IDSs are evaluated in terms of the following two metrics:

  • Training time: The time required to train the classifier or build the normal profile model.
  • Detection time: The time required to decide whether a connection is normal or anomalous. Some parameters used to compute the detection time under online detection are not considered under an offline dataset. For instance, there is no need to compute the time of data acquisition (i.e., collecting raw data packets over a time window) or the time of preprocessing (i.e., converting the raw data connections into feature vectors), as the offline dataset is already formatted as feature vectors.

It is not easy to compare the IDSs fairly in terms of running time, as the latter is not usually reported in research studies. Moreover, the experimental settings under which the tests are carried out are often unknown, and in some cases there is considerable divergence from one test to another. The obvious case is SVM, which incurred a training time of 267.22 s in [54] and 4.231 s in [65]. Therefore, the comparison in Table 8 is provided just for reference. As for NBG, it was implemented on a 2.83 GHz Intel Pentium Core 2 Quad 9550 processor with 4 GB RAM. NBG requires 25.14 s under KDD99 to build the graph of strongly correlated features; this value is reduced to 3.34 s under NSL-KDD, which is smaller than KDD99. NBG requires just a few microseconds to classify a connection as normal or anomalous. This result is expected because, during the testing phase, NBG performs the following operations for each test connection Z:

  1. It extracts 15 low-level features from the connection Z (the features are already available in the dataset).

  2. It uses simple arithmetic operations to convert 15 low-level features into 20 high-level ones; that is, it converts Z to ZHF.
  3. It computes the geometric linear similarity between ZHF and the reference vector HFref.
  4. It executes a conditional instruction to decide the type of the connection.
Table 8. Training and detection time of intrusion detection systems.

                              Training time (s)   Detection time (s)   Dataset    Experimental settings
optimum-path forest [89]      377.4065            324.8682             NSL-KDD    N/A
Bayes [89]                    246.3297            592.2913             NSL-KDD    N/A
SVM-RBF: radial basis
  function [89]               750.4116            627.7814             NSL-KDD    N/A
self-organizing maps [89]     1674.5582           2692.3606            NSL-KDD    N/A
SVM (41 features) [54]        267.22              581.02               KDD99      Core dual 2.66 GHz and 3 GB RAM
SVM (8 features) [54]         22.09               21.22                KDD99      Core dual 2.66 GHz and 3 GB RAM
SVM [65]                      4.231               N/A                  KDD99      N/A
DSSVM (5 features) [54]       209.92              483.39               KDD99      Core dual 2.66 GHz and 3 GB RAM
CSOACN [65]                   5.645               N/A                  KDD99      N/A
CSVAC [65]                    3.388               N/A                  KDD99      N/A
Batch LS-SVM [70]             Normal: 5.25        Normal: 1.42         KDD99      N/A
                              DOS: 15.92          DOS: 1.48            KDD99
Online LS-SVM [71]            Normal: 3.12        Normal: 0.9          KDD99      N/A
                              DOS: 10.99          DOS: 1.10            KDD99      N/A
HNB [55]                      2.47                N/A                  NSL-KDD    N/A
IG-Discritize-HNB [55]        1.09                N/A                  NSL-KDD    N/A
PSO-Discritize-HNB [55]       0.18                N/A                  NSL-KDD    N/A
Back-propagation NN [69]      16.92               N/A                  RLD09      2.83 GHz Intel Pentium Core 2 Quad 9550 processor with 4 GB RAM and 100 Mbps LAN
Ripper rule [69]              1.5                 N/A                  RLD09      Same settings as above
RT-IDS with C4.5 [69]         0.28                Few milliseconds     RLD09      Same settings as above
Proposed NBG                  25.14               5.5 μs               KDD99      2.83 GHz Intel Pentium Core 2 Quad 9550 processor with 4 GB RAM
                              3.34                5.5 μs               NSL-KDD    Same settings as above

SVM, support vector machine; NN, neural networks; DSSVM, distance sum-based support vector machine; CSOACN, clustering based on self-organized ant colony network; CSVAC, combining support vectors with ant colony; RT-IDS, real-time intrusion detection system; LS-SVM, least squares support vector machine; DOS, denial of service; HNB, hidden naive Bayesian; PSO, particle swarm optimization; NBG, normal behavioral graph; RLD09, Reliability Lab Data 2009; N/A, not available.

6.4 Resource consumption evaluation

While NBG was running, we used the Windows task manager to measure the CPU and memory consumption, as shown in Figures 8 and 9. Under the KDD99 dataset, NBG uses 26% of the CPU while consuming 85 MB of memory. Under NSL-KDD, the same CPU usage is observed with lower memory consumption, 41 MB. We also observe that the CPU occupation lasts longer in the case of KDD99 because of its larger size compared with NSL-KDD.

Figure 8.

CPU consumption of NBG.

Figure 9.

Memory consumption of NBG.

7 Real-time normal behavioral graph

Real-time intrusion detection is defined as the ability to report detection results while the incoming data packets are captured online [90]. However, this definition does not specify the maximum delay allowed between the occurrence of an attack and its detection for this property to hold. In the literature, many IDSs [70, 71, 91, 92] claim to offer real-time detection, but no proof is provided, and it is not clear by which mechanism this property is ensured. The optimal selection of features can play an important role in reducing detection time. However, even the IDSs that reported low training and detection times cannot guarantee real-time detection, as they were tested under offline datasets. Some phases that consume long running times, such as data acquisition and preprocessing of captured data, are not accounted for in the case of an offline dataset.

We consider that a continuous stream of raw data arrives at the network interface card (NIC) of the computer. We define a data record as the set of packets collected within a time window of T seconds. The kth observed time window is denoted by Wk. Thanks to the direct memory access (DMA) feature of network cards, the CPU can perform other tasks during the T seconds while the NIC is receiving data frames from the network and the DMA is transferring them from the NIC's buffer to the computer's memory. The CPU receives an interrupt from the DMA controller when the transfer operation is completed. Therefore, DMA allows computation and data transfer to proceed in parallel: the data acquisition process can be executed by DMA while the data preprocessing, feature extraction, and detection processes are executed by the CPU. Figure 10 shows how the different components of the IDS are executed on a time axis. At the end of each time interval Wk, the data acquisition process has finished collecting the data traffic corresponding to T seconds. This interval is immediately followed by a processing time interval, denoted by Pk, of duration U seconds. During each U seconds, the IDS has to execute the following tasks:

  1. Preprocess the data collected during T seconds to extract the corresponding feature vectors.
  2. Use the feature vectors as an input for the detection process (i.e., classifier or normal behavioral model), and decide whether the input values correspond to a normal connection or an attack.
Figure 10.

Time diagram of IDS execution.

We say that an IDS ensures the real-time property iff each time interval Pk is contained in Wk + 1; that is, Pk and Wk + 1 have the same starting point, and Pk must end before the beginning of Wk + 2 (formally, U < T). In general, the detection process does not incur a long time, but it is difficult to guarantee that the preprocessing phase will be contained in the observed time window because it depends on the size of the raw data collected, which in turn depends on the rate at which incoming packets arrive at the NIC and on the duration of the time window. As network speeds continuously increase and have reached 40 Gbps [14], the IDS has to process a huge volume of data, and U may exceed T. In this case, let us consider U = T + Δ, with T = 2 s and Δ = 1 ms. Then, the data collected during Wk will be tested by the IDS after k × T + k × Δ instead of k × T + Δ. The detection will be delayed by 1 s after each receipt of 1000 data records, that is, it will be delayed by 1 s after the expiry of each 33.33 min. Therefore, the real-time detection property is not ensured in this case.
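The delay arithmetic above can be checked directly: with U = T + Δ, each window adds Δ of backlog, so the backlog reaches 1 s after 1/Δ records:

```python
# Check of the delay accumulation: each window of length T overruns
# its processing budget by delta, so after 1/delta windows the
# accumulated delay reaches 1 s.
T = 2.0          # window length (s)
delta = 0.001    # processing overrun per window (s)

records_for_1s_delay = 1.0 / delta               # 1000 records
minutes_elapsed = records_for_1s_delay * T / 60  # 33.33 min
print(records_for_1s_delay, round(minutes_elapsed, 2))
```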

To deal with this issue, the IDS in [90] extracts packet information from the IP, TCP, UDP, and ICMP headers of each packet. Then, it aggregates the information extracted during each 2 s, which means that the data acquisition and preprocessing phases are executed simultaneously.

To ensure the real-time property, we propose a mechanism that intertwines the data acquisition phase and the preprocessing phase, as shown in Figure 11. The IDS allocates memory space for the 15 low-level features obtained during the offline training phase. Upon the transfer of each data frame by the DMA to the memory buffer, the IDS extracts the header information of the frame and updates some or all of the 15 low-level features. Hence, when the T seconds expire, the final update of the low-level features corresponding to that window has already been performed. At the end of every T seconds, the IDS computes the high-level features based on the low-level ones to decide whether an attack has occurred during the last T seconds, and then resets all 15 low-level features to 0. If we consider that T = 2 s, as in the KDD99 and NSL-KDD datasets, a DOS attack can be detected within 2 s and 5.5 μs of its occurrence.
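The intertwined mechanism can be sketched as follows. The feature names and the `IntertwinedExtractor` class are illustrative (they are not the paper's 15 low-level features); the point is that each frame updates running accumulators, so no separate preprocessing pass remains when the window expires:

```python
class IntertwinedExtractor:
    """Sketch of intertwined acquisition/preprocessing: low-level
    features are updated per frame as the DMA delivers it, and the
    window expiry only snapshots and resets the accumulators.
    Feature names are illustrative assumptions."""

    def __init__(self):
        self.reset()

    def reset(self):
        # running low-level feature accumulators for one T-second window
        self.features = {"count": 0, "src_bytes": 0, "syn_flags": 0}

    def on_frame(self, length, syn=False):
        # called once per captured frame, inside the T-second window
        self.features["count"] += 1
        self.features["src_bytes"] += length
        self.features["syn_flags"] += int(syn)

    def on_window_expiry(self):
        # hand the finished snapshot to the detector, then reset to 0
        snapshot = dict(self.features)
        self.reset()
        return snapshot

ext = IntertwinedExtractor()
ext.on_frame(60, syn=True)
ext.on_frame(1500)
print(ext.on_window_expiry())   # {'count': 2, 'src_bytes': 1560, 'syn_flags': 1}
```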

Figure 11.

Intertwined data acquisition and preprocessing in NBG.

8 Conclusion

In this paper, we have proposed an IDS, named NBG-based multivariate correlation analysis (NBG), based on four approaches: (i) a statistical-based IDS to reduce detection time; (ii) intertwining the data acquisition and data preprocessing phases to ensure real-time detection; (iii) a geometric linear similarity measure; and (iv) multivariate correlation analysis. We have shown that the geometric linear similarity measure offers better detection accuracy than the measures proposed in the literature. Using multivariate correlation analysis, we have reduced the dimensionality of the feature space processed by the IDS: a subset of strongly correlated features is extracted from a training dataset, and the NBG is constructed. Based on this graph, the normal profile, which is composed of high-level features, is derived. The experimental results show that the proposed IDS achieves good results in terms of detection rate and false positive rate. For some DOS attacks, a 100% detection rate is achieved with 1.55% false positives. The comparison study performed under KDD99 and NSL-KDD has shown encouraging results. NBG ensures the best tradeoff between detection rate (99.76%) and false positive rate (0.6%) while ensuring real-time detection. NBG shows a lower detection rate compared with some methods tested under NSL-KDD, but it outperforms these methods in terms of false positive rate. It also incurs low detection time while consuming little CPU and memory. Therefore, NBG can be considered a competitive candidate for intrusion detection.

Acknowledgment

The authors extend their appreciation to the Deanship of Scientific Research Center of the College of Engineering at King Saud University for supporting this work.
