Weapon operating pose detection and suspicious human activity classification using skeleton graphs

Anant Bhatt; Amit Ganatra

doi:10.3934/mbe.2023125

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 2: 2669-2690. doi: 10.3934/mbe.2023125


Research article


Devang Patel Institute of Advance Technology and Research (DEPSTAR), Charotar University of Science and Technology (CHARUSAT), Nadiad Petlad Road, Changa, Gujarat-388421, India

Academic Editor: Biswajeet Pradhan

Received: 14 July 2022 Revised: 24 October 2022 Accepted: 31 October 2022 Published: 28 November 2022

A sharp rise in violent protests and armed conflicts in populous civilian areas has raised serious concern worldwide. Law enforcement agencies focus on limiting the impact of such violent events, and increased surveillance through widespread camera networks helps state actors maintain vigilance. However, close, simultaneous manual monitoring of numerous surveillance feeds is labor-intensive, subjective, and inefficient. Advances in Machine Learning (ML) show potential for building precise models that detect suspicious activities in a crowd, but existing pose estimation techniques fall short in detecting weapon operation activity. This paper proposes a comprehensive, customized human activity recognition approach based on human body skeleton graphs. A VGG-19 backbone extracted 6600 body coordinates from the customized dataset. The methodology categorizes human activities into eight classes observed during violent clashes. It triggers an alarm for specific activities, i.e., stone pelting or weapon handling, while walking, standing, and kneeling are considered regular activities. The end-to-end pipeline provides a robust model for tracking multiple people, mapping a skeleton graph to each person across consecutive surveillance video frames, and improves the categorization of suspicious human activities to support effective crowd management. An LSTM-RNN network, trained on the customized dataset and combined with a Kalman filter, attained 89.09% accuracy for real-time pose identification.
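Two pieces of the pipeline described above lend themselves to a small illustration: smoothing a tracked joint with a Kalman filter, and turning a predicted activity label into an alarm decision. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class `JointKalman`, the function `should_alarm`, the label strings, the constant-velocity motion model, and the noise parameters `q` and `r` are all hypothetical choices made for illustration. Only the five activities named in the abstract are listed; the remaining classes of the eight are not specified here and are therefore left out.

```python
import numpy as np

# Labels named in the abstract. The paper uses eight classes in total;
# the others are not listed in the abstract, so these sets are incomplete.
SUSPICIOUS = {"stone_pelting", "weapon_handling"}
REGULAR = {"walking", "standing", "kneeling"}


def should_alarm(label: str) -> bool:
    """Return True when a predicted activity label should raise an alarm."""
    return label in SUSPICIOUS


class JointKalman:
    """Constant-velocity Kalman filter for one 2D joint (hypothetical helper)."""

    def __init__(self, x, y, dt=1.0, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0], dtype=float)  # state: x, y, vx, vy
        self.P = np.eye(4) * 10.0                # initial state covariance (assumed)
        self.F = np.eye(4)                       # state transition matrix
        self.F[0, 2] = self.F[1, 3] = dt         # position += velocity * dt
        self.H = np.eye(2, 4)                    # only x and y are observed
        self.Q = np.eye(4) * q                   # process noise covariance (assumed)
        self.R = np.eye(2) * r                   # measurement noise covariance (assumed)

    def step(self, z):
        """Predict one frame ahead, correct with measurement z, return (x, y)."""
        # Prediction step
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update step
        innovation = np.asarray(z, dtype=float) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2].copy()
```

In use, one filter instance would be kept per joint per tracked person: it is initialized from the first detected coordinate and `step` is called with each later detection. The smoothed coordinates could then be fed to a sequence classifier, and `should_alarm` applied to its predicted label.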
Citation: Anant Bhatt, Amit Ganatra. Weapon operating pose detection and suspicious human activity classification using skeleton graphs[J]. Mathematical Biosciences and Engineering, 2023, 20(2): 2669-2690. doi: 10.3934/mbe.2023125



© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)