Apply computer vision in GUI automation for industrial applications

Department of Computer Science and Information Engineering, National Central University, Zhongli District, Taoyuan City 32001, Taiwan

Special Issues: Internet of Things (IoT)-Based Environmental Intelligence

Technology has reshaped the workplace, and rapid improvements have transformed how we work today. In the pursuit of Industry 4.0, we build smart machines and robots to replace manual labor. As machines take over manual labor, however, many workers are turned into desktop software users. Jobs such as testing, quality inspection, data monitoring, data entry, and routine editing still have to be done by humans in front of desktop computers. In principle, operating a software application reduces to understanding the screen output and issuing mouse and keyboard operations. When these jobs are repetitive, tedious, and monotonous, they can be replaced by GUI automation techniques. GUI automation can be built on different underlying technologies, each with its own pros and cons. In this paper, we describe a tool, Korat, which uses computer vision to achieve maximum cross-platform capability for industrial applications, including test automation and robotic process automation. Although Korat has been successfully adopted by several industrial customers, difficult problems remain to be addressed. This paper discusses and studies the problems and difficulties of applying computer vision to GUI automation, particularly our experience applying open-source OCR to color screenshots. By introducing critical pre-processing stages and algorithms, we significantly increase the recognition rate and make OCR feasible for practical use.
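The abstract does not spell out Korat's pre-processing stages. As a hedged illustration only, the sketch below shows one common way to prepare a color GUI screenshot for an open-source OCR engine such as Tesseract: convert to grayscale, binarize with Otsu's method, normalize polarity so text comes out dark on a light background, and upscale small screen text. It assumes the background is the majority pixel class, and the names `to_gray`, `otsu_threshold`, `binarize_for_ocr` and the `scale` parameter are hypothetical; this is not the authors' pipeline. Images are plain lists of RGB tuples so the example needs no imaging library.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(img: List[List[RGB]]) -> List[List[int]]:
    """Convert RGB pixels to 8-bit luminance using ITU-R BT.601 weights."""
    return [[min(255, int(round(0.299 * r + 0.587 * g + 0.114 * b)))
             for (r, g, b) in row] for row in img]


def otsu_threshold(gray: List[List[int]]) -> int:
    """Return the threshold t that maximizes between-class variance (class 0: v <= t)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_for_ocr(img: List[List[RGB]], scale: int = 3) -> List[List[int]]:
    """Binarize a color screenshot to black text (0) on white (255), then upscale.

    Assumption: the background is the majority class after thresholding, which
    handles both light-on-dark and dark-on-light GUI text.
    """
    gray = to_gray(img)
    t = otsu_threshold(gray)
    bright = sum(v > t for row in gray for v in row)
    dark = sum(len(row) for row in gray) - bright
    text_is_bright = bright < dark
    out: List[List[int]] = []
    for row in gray:
        bin_row = [0 if (v > t) == text_is_bright else 255 for v in row]
        # Nearest-neighbour upscaling by an integer factor enlarges small glyphs.
        scaled_row = [p for p in bin_row for _ in range(scale)]
        out.extend(list(scaled_row) for _ in range(scale))
    return out
```

Normalizing polarity before OCR matters because GUI text is often light on a dark or colored background, while engines such as Tesseract are generally tuned for dark text on a light page; the majority-class rule is a simple heuristic and can fail on text-dense regions.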
Keywords: GUI automation; computer vision; test automation; optical character recognition; image analysis

Citation: Yung-Pin Cheng, Ching-Wei Li, Yi-Cheng Chen. Apply computer vision in GUI automation for industrial applications. Mathematical Biosciences and Engineering, 2019, 16(6): 7526-7545. doi: 10.3934/mbe.2019378






© 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

