Supervised machine learning approaches require highly customized and
fine-tuned methodologies to deliver outstanding performance. This paper
presents a dataset-driven design and performance evaluation of a machine
learning classifier for the network intrusion dataset UNSW-NB15. Analysis of
the dataset suggests that it suffers from class representation imbalance and
class overlap in the feature space. We employed ensemble methods using Balanced
Bagging (BB), eXtreme Gradient Boosting (XGBoost), and Random Forest empowered
by Hellinger Distance Decision Tree (RF-HDDT). BB and XGBoost are tuned to
handle the imbalanced data, and the Random Forest (RF) classifier is supplemented
by the Hellinger metric to address the imbalance issue. Two new algorithms are
proposed to address the class overlap issue in the dataset. These two
algorithms are leveraged to improve performance on the testing dataset
by modifying the final classification decisions made by the three base classifiers
within the ensemble classifier, which employs a majority vote combiner. The
proposed design is evaluated for both binary and multi-category classification.
A comparison of the proposed model with those reported on the same dataset in
the literature demonstrates that the proposed model outperforms them by a
significant margin for both binary and multi-category classification.
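
As a rough illustration of the majority vote combiner mentioned above, the following minimal Python sketch combines the per-sample predictions of three base classifiers. It is not the paper's implementation: the function name `majority_vote`, the example labels, and the tie-breaking rule (prefer the earliest classifier among the tied labels) are assumptions made for illustration, and the two proposed class-overlap algorithms are not reproduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    predictions: list of equal-length label lists, one per base classifier.
    Tie-breaking (an assumption, not taken from the paper): when several
    labels share the top count, keep the vote of the earliest classifier
    whose label is among the tied labels.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all prediction lists must have the same length")

    combined = []
    for votes in zip(*predictions):  # votes for one sample, one per classifier
        counts = Counter(votes)
        top_count = max(counts.values())
        # First vote (in classifier order) whose label reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top_count))
    return combined


# Hypothetical predictions from the three base classifiers for four samples.
bb_pred = ["normal", "attack", "attack", "dos"]
xgb_pred = ["normal", "normal", "attack", "exploits"]
rf_hddt_pred = ["attack", "normal", "attack", "fuzzers"]

final_pred = majority_vote([bb_pred, xgb_pred, rf_hddt_pred])
print(final_pred)
```

With three voters, a tie can only arise in the multi-category case when all three classifiers disagree, which is where a post-hoc correction of the final decision would have the most room to act.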