Identifying VoIP traffic in VPN tunnel via Flow Spatio-Temporal Features

Faiz Ul Islam; Guangjie Liu; Weiwei Liu

doi:10.3934/mbe.2020260

Mathematical Biosciences and Engineering

2020, Volume 17, Issue 5: 4747-4772. doi: 10.3934/mbe.2020260

Research article

1. School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
2. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China

Received: 02 April 2020 Accepted: 27 June 2020 Published: 09 July 2020

The persistent emergence of new network applications, along with encrypted network communication, has made traffic analysis a challenging issue in network management and cyberspace security. Currently, virtual private networks (VPNs) have become one of the most popular encrypted communication services for bypassing censorship and guaranteeing remote access to geographically locked services. In this paper, a novel scheme for identifying VoIP traffic tunneled through a VPN is proposed. We applied a set of Flow Spatio-Temporal Features (FSTF) to six well-known classifiers, including decision trees, K-Nearest Neighbor (KNN), Bagging and Boosting via C4.5, and Multi-Layer Perceptron (MLP). The overall accuracy, precision, sensitivity, and F-measure verify that the proposed scheme can effectively distinguish between VoIP flows and non-VoIP flows in VPN traffic.
Keywords: encrypted traffic; Flow Spatio-Temporal Features; machine learning; virtual private network (VPN); Voice over IP (VoIP)
Citation: Faiz Ul Islam, Guangjie Liu, Weiwei Liu. Identifying VoIP traffic in VPN tunnel via Flow Spatio-Temporal Features[J]. Mathematical Biosciences and Engineering, 2020, 17(5): 4747-4772. doi: 10.3934/mbe.2020260
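
The abstract reports four evaluation measures for the VoIP versus non-VoIP decision: accuracy, precision, sensitivity, and F-measure. As a minimal sketch of how these are conventionally derived from a binary confusion matrix, the Python below uses the standard textbook definitions. The function name `evaluate`, the `BinaryMetrics` container, the label strings, and the toy label lists are all hypothetical and chosen only for illustration; they are not taken from the paper, and the toy numbers are not the paper's results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    sensitivity: float  # also called recall or true-positive rate
    f_measure: float    # harmonic mean of precision and sensitivity


def evaluate(y_true, y_pred, positive="VoIP"):
    """Compute the four measures from paired true and predicted labels.

    Assumption: any label other than `positive` counts as the negative
    class (here, non-VoIP flows).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, sensitivity, f_measure)


# Toy labels for eight flows (illustrative only, not data from the paper).
y_true = ["VoIP", "VoIP", "VoIP", "Non-VoIP",
          "Non-VoIP", "Non-VoIP", "Non-VoIP", "VoIP"]
y_pred = ["VoIP", "VoIP", "Non-VoIP", "Non-VoIP",
          "VoIP", "Non-VoIP", "Non-VoIP", "VoIP"]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and precision alongside accuracy matters when VoIP flows are a small share of the VPN traffic, since a classifier that labels every flow as non-VoIP could still score a high accuracy.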


© 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)