A two-stage intrusion detection method based on light gradient boosting machine and autoencoder

Hao Zhang; Lina Ge; Guifen Zhang; Jingwei Fan; Denghui Li; Chenyang Xu

doi:10.3934/mbe.2023301

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 4: 6966-6992. doi: 10.3934/mbe.2023301

Research article

1. School of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China
2. Key Laboratory of Network Communication Engineering, Guangxi Minzu University, Nanning 530006, China
3. Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Nanning 530006, China
4. College of Electronic Information, Guangxi Minzu University, Nanning 530006, China

Academic Editor: Víctor Leiva

Received: 28 September 2022; Revised: 18 January 2023; Accepted: 28 January 2023; Published: 09 February 2023

Intrusion detection systems can detect potential attacks and raise timely alerts. However, the curse of dimensionality and zero-day attacks pose challenges to these systems. From a data perspective, the curse of dimensionality reduces the efficiency of intrusion detection. From an attack perspective, the increasing number of zero-day attacks overwhelms intrusion detection systems. To address these problems, this paper proposes a novel detection framework based on the light gradient boosting machine (LightGBM) and an autoencoder. In this framework, recursive feature elimination (RFE) is first used for dimensionality reduction. A focal loss (FL) function is then introduced into the LightGBM classifier to strengthen learning on difficult samples. Finally, a two-stage prediction step with LightGBM and the autoencoder is performed. In the first stage, LightGBM makes a preliminary decision. In the second stage, the autoencoder residual is used to make a secondary decision for samples that the first stage classified as normal. Experiments were performed on the NSL-KDD and UNSW-NB15 datasets and compared with classical methods; the proposed method outperformed them and reduced the time overhead. The method was also compared with existing advanced methods, and the results show that it achieves above 90% accuracy, recall, and F1 score on both datasets, indicating that it remains effective relative to those techniques.
Keywords: cybersecurity, feature selection, focal loss, intrusion detection systems, machine learning
Citation: Hao Zhang, Lina Ge, Guifen Zhang, Jingwei Fan, Denghui Li, Chenyang Xu. A two-stage intrusion detection method based on light gradient boosting machine and autoencoder[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6966-6992. doi: 10.3934/mbe.2023301
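
The abstract names two ideas that are easy to illustrate in isolation: a focal loss that down-weights easy samples, and a second-stage check that re-examines samples the first stage labeled normal. The following is a minimal sketch under stated assumptions, not the authors' implementation. The focal loss follows the standard binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the default `alpha` and `gamma` values, the label convention (1 for attack, 0 for normal), the use of mean squared reconstruction error as the residual, and the `threshold` parameter are all illustrative assumptions, and the function names are hypothetical.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for a single sample.

    p: predicted probability of the positive (attack) class.
    y: true label, assumed 1 for attack and 0 for normal.
    alpha, gamma: illustrative defaults, not the paper's settings.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def reconstruction_residual(x, x_hat):
    """Mean squared error between a sample and its reconstruction
    (one assumed definition of the autoencoder residual)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def two_stage_predict(stage1_label, residual, threshold):
    """Combine both stages.

    A stage-1 attack label is final. A stage-1 normal label is
    overturned only when the residual exceeds the (hypothetical) threshold.
    """
    if stage1_label == 1:
        return 1
    return 1 if residual > threshold else 0


if __name__ == "__main__":
    # A confidently correct sample contributes much less loss than a hard one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
    # A sample passed as normal by stage 1 but poorly reconstructed is re-flagged.
    r = reconstruction_residual([0.0, 1.0, 0.5], [0.9, 0.1, 0.5])
    print(two_stage_predict(0, r, threshold=0.1))
```

In a full pipeline, `p` would come from the trained classifier and `x_hat` from an autoencoder, presumably one fitted to normal traffic so that unfamiliar attack patterns reconstruct poorly; the threshold would need to be tuned on validation data.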

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)