This paper presents a visual analysis of the UNSW-NB15 computer network
security (intrusion detection) dataset, with the aim of identifying issues
inherent to the dataset that researchers may need to address before using it
for data-driven model development, such as training a machine learning
classifier. Several preprocessing steps are first applied to the raw data to
address common issues: elimination of redundant features, conversion of
nominal features into numerical format, and feature scaling. PCA, t-SNE and
K-means clustering are then used to generate the plots for visualization. The
resulting visual analysis identifies and illustrates two major problems with
this dataset: class imbalance and class overlap. We conclude that both
problems must be addressed before this dataset is used to develop any
classifier model.
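As a minimal sketch of the three preprocessing steps named in the abstract, the code below assumes a small in-memory NumPy array of numeric features plus separate lists of nominal values (UNSW-NB15 includes categorical columns such as `proto`, `service` and `state`). It treats "redundant" as constant or exactly duplicated columns, uses one-hot encoding for nominal features, and applies min-max scaling. These specific choices, and the helper names `drop_redundant_columns`, `one_hot`, `min_max_scale` and `preprocess`, are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical helpers sketching the preprocessing steps; "redundant" is
# assumed here to mean constant or exactly duplicated columns.

def drop_redundant_columns(X):
    """Return X without constant columns or exact duplicates of earlier columns."""
    keep, seen = [], set()
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.all(col == col[0]):
            continue  # constant column: carries no information
        key = col.tobytes()
        if key in seen:
            continue  # identical to a column already kept
        seen.add(key)
        keep.append(j)
    return X[:, keep]

def one_hot(values):
    """Encode a list of nominal values (e.g. protocol names) as 0/1 columns."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

def min_max_scale(X):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero
    return (X - lo) / span

def preprocess(numeric, nominal_columns):
    """Encode nominal columns, drop redundant columns, then scale."""
    parts = [np.asarray(numeric, dtype=float)]
    parts += [one_hot(col) for col in nominal_columns]
    X = np.hstack(parts)
    return min_max_scale(drop_redundant_columns(X))

# Toy example: column 1 is constant and column 2 duplicates column 0.
numeric = np.array([[1.0, 5.0, 1.0], [2.0, 5.0, 2.0], [3.0, 5.0, 3.0]])
X = preprocess(numeric, [["tcp", "udp", "tcp"]])
print(X.shape)  # (3, 3)
```

On the full dataset, a library encoder and scaler would normally replace these hand-written helpers; they are spelled out here only to make each step explicit.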
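The two problems reported by the visual analysis can also be checked numerically. The hypothetical helpers below are a sketch assuming preprocessed NumPy features and a list of class labels. They project the data onto the first two principal components via an SVD (t-SNE and K-means are not shown), compute the ratio of the largest to the smallest class count as an imbalance measure, and use the fraction of points whose nearest neighbour carries a different label as a rough proxy for class overlap. These measures are swapped-in illustrations, not metrics taken from the paper.

```python
import numpy as np
from collections import Counter

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                      # PCA works on centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                         # PC1 and PC2 scores

def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def neighbour_disagreement(X, labels):
    """Fraction of points whose nearest other point has a different label.

    Values near 0 suggest well-separated classes; larger values suggest overlap.
    Builds an n x n distance matrix, so use a subsample on large data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(labels)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    np.fill_diagonal(d, np.inf)                              # ignore self-matches
    nearest = d.argmin(axis=1)
    return float(np.mean(y[nearest] != y))

labels = ["normal"] * 9 + ["attack"]
print(imbalance_ratio(labels))  # 9.0
print(neighbour_disagreement([[0.0], [0.1], [10.0], [10.1]], ["a", "a", "b", "b"]))  # 0.0
```

Because `neighbour_disagreement` materialises an n-by-n matrix, it is only practical on a random subsample of a dataset of this size.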