CICIDS-2017 Dataset Feature Analysis With Information Gain for Anomaly Detection

Stiawan, Deris (2020) CICIDS-2017 Dataset Feature Analysis With Information Gain for Anomaly Detection. IEEE.



Feature selection (FS) is one of the important tasks of data preprocessing in data analytics. Data with a large number of features raises computational complexity and greatly increases the resource usage and time needed for analysis. The objective of this study is to identify the relevant and significant features of large-scale network traffic in order to improve the accuracy of traffic anomaly detection and to decrease its execution time. Information Gain is the most widely used feature selection technique in Intrusion Detection System (IDS) research. This study uses Information Gain, ranking and grouping the features according to minimum weight values to select relevant and significant features, and then applies the Random Forest (RF), Bayes Net (BN), Random Tree (RT), Naive Bayes (NB) and J48 classifier algorithms in experiments on the CICIDS-2017 dataset. The experimental results show that the number of relevant and significant features selected by Information Gain significantly affects both detection accuracy and execution time. Specifically, the Random Forest algorithm reaches an accuracy of 99.86% using 22 relevant selected features, whereas the J48 classifier algorithm reaches an accuracy of 99.87% using 52 relevant selected features, at the cost of a longer execution time.
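
The selection step described in the abstract (score each feature by Information Gain, rank the scores, and keep the features at or above a minimum weight) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the threshold of 0.5 are hypothetical, and the sketch assumes discrete feature values, so continuous CICIDS-2017 flow features would first need to be discretized.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for one discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Entropy of the labels within each feature value, weighted by frequency.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, min_weight):
    """Rank features by information gain; keep those with gain >= min_weight."""
    scores = {name: information_gain(values, labels)
              for name, values in columns.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= min_weight]


# Toy data, purely illustrative (not taken from CICIDS-2017).
labels = ["attack", "attack", "benign", "benign"]
columns = {
    "flag_set": [1, 1, 0, 0],  # separates the two classes perfectly
    "noise": [0, 1, 0, 1],     # independent of the class label
}
selected = select_features(columns, labels, min_weight=0.5)
print(selected)
```

Raising or lowering `min_weight` changes how many features survive, which is the trade-off the abstract reports between the 22-feature and 52-feature groups.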

Item Type: Other
Subjects: T Technology > T Technology (General) > T1-995 Technology (General)
Divisions: 09-Faculty of Computer Science > 56201-Computer Systems (S1)
Depositing User: Dr. Deris Stiawan
Date Deposited: 11 Sep 2020 10:16
Last Modified: 11 Sep 2020 10:16
