Apply computer vision in GUI automation for industrial applications

Yung-Pin Cheng; Ching-Wei Li; Yi-Cheng Chen; Yung-Pin Cheng; Ching-Wei Li; Yi-Cheng Chen

doi:10.3934/mbe.2019378

Mathematical Biosciences and Engineering

2019, Volume 16, Issue 6: 7526-7545. doi: 10.3934/mbe.2019378

Previous Article Next Article

Research article Special Issues

Apply computer vision in GUI automation for industrial applications

Department of Computer Science and Information Engineering, National Central University, Zhongli District, Taoyuan City 32001, Taiwan

Received: 21 May 2019 Accepted: 29 July 2019 Published: 19 August 2019

Technology has reshaped the workplace and the rapid improvements have transformed how we work nowadays. In the pursuit of industry 4.0, we build smart machines and robots to replace manual labor. While the manual labor is replaced by machines, in many cases, humans are trans-formed into desktop software users. Jobs such as testing, quality inspection, data monitoring, data entry, and routine editing remain to be done by humans in front of desktop computers. The operations to software applications in principle can be reduced to screen output understanding and mouse and keyboard operations. When the characteristics of these jobs are repetitive, tedious, and monotonous, they can be replaced by GUI automation techniques. GUI automation can be achieved by different un-derlying technologies, each has its pros and cons. In this paper, we describe a tool-Korat, which uses computer-vision to achieve maximum cross-platform capability for industrial applications, including test automation and robotic process automation. Although Korat has been successfully adopted by several industrial customers, difficult problems remain to be addressed. The problems and difficulties in applying computer vision for GUI automation are discussed and studied in this paper, particularly the experiences of applying open source OCR to GUI automation over color screenshots. By intro-ducing critical pre-processing stages and algorithms, the recognition rate is significantly increased and becomes feasible for practical usage.

Keywords:

Citation: Yung-Pin Cheng, Ching-Wei Li, Yi-Cheng Chen. Apply computer vision in GUI automation for industrial applications[J]. Mathematical Biosciences and Engineering, 2019, 16(6): 7526-7545. doi: 10.3934/mbe.2019378

Related Papers:

[1]	Nilkanth Mukund Deshpande, Shilpa Gite, Biswajeet Pradhan, Ketan Kotecha, Abdullah Alamri . Improved Otsu and Kapur approach for white blood cells segmentation based on LebTLBO optimization for the detection of Leukemia. Mathematical Biosciences and Engineering, 2022, 19(2): 1970-2001. doi: 10.3934/mbe.2022093
[2]	Jianzhong Peng, Wei Zhu, Qiaokang Liang, Zhengwei Li, Maoying Lu, Wei Sun, Yaonan Wang . Defect detection in code characters with complex backgrounds based on BBE. Mathematical Biosciences and Engineering, 2021, 18(4): 3755-3780. doi: 10.3934/mbe.2021189
[3]	Delong Cui, Hong Huang, Zhiping Peng, Qirui Li, Jieguang He, Jinbo Qiu, Xinlong Luo, Jiangtao Ou, Chengyuan Fan . Next-generation 5G fusion-based intelligent health-monitoring platform for ethylene cracking furnace tube. Mathematical Biosciences and Engineering, 2022, 19(9): 9168-9199. doi: 10.3934/mbe.2022426
[4]	Shuai Cao, Biao Song . Visual attentional-driven deep learning method for flower recognition. Mathematical Biosciences and Engineering, 2021, 18(3): 1981-1991. doi: 10.3934/mbe.2021103
[5]	Thaweesak Trongtirakul, Sos Agaian, Adel Oulefki . Automated tumor segmentation in thermographic breast images. Mathematical Biosciences and Engineering, 2023, 20(9): 16786-16806. doi: 10.3934/mbe.2023748
[6]	Yong Tian, Tian Zhang, Qingchao Zhang, Yong Li, Zhaodong Wang . Feature fusion–based preprocessing for steel plate surface defect recognition. Mathematical Biosciences and Engineering, 2020, 17(5): 5672-5685. doi: 10.3934/mbe.2020305
[7]	Xin Shu, Xin Cheng, Shubin Xu, Yunfang Chen, Tinghuai Ma, Wei Zhang . How to construct low-altitude aerial image datasets for deep learning. Mathematical Biosciences and Engineering, 2021, 18(2): 986-999. doi: 10.3934/mbe.2021053
[8]	Rafsanjany Kushol, Md. Hasanul Kabir, M. Abdullah-Al-Wadud, Md Saiful Islam . Retinal blood vessel segmentation from fundus image using an efficient multiscale directional representation technique Bendlets. Mathematical Biosciences and Engineering, 2020, 17(6): 7751-7771. doi: 10.3934/mbe.2020394
[9]	Wu Zeng . Image data augmentation techniques based on deep learning: A survey. Mathematical Biosciences and Engineering, 2024, 21(6): 6190-6224. doi: 10.3934/mbe.2024272
[10]	Amsa Shabbir, Aqsa Rasheed, Huma Shehraz, Aliya Saleem, Bushra Zafar, Muhammad Sajid, Nouman Ali, Saadat Hanif Dar, Tehmina Shehryar . Detection of glaucoma using retinal fundus images: A comprehensive review. Mathematical Biosciences and Engineering, 2021, 18(3): 2033-2076. doi: 10.3934/mbe.2021106

Abstract

1. Introduction

In the areas of software engineering or industrial applications, which involves hardware and software integration, testing is often performed by testers or programmers using keyboards and mouses. When graphical user interface (GUI) is involved, a tester may use a keyboard and a mouse to drive a test run and then monitor the system behaviors at the same time to assert the correctness of a test run. Since the steps of a test run are often repetitive and monotonous, testers often try to find a suitable testing tool to automate the task. The basic features of a software testing tool are reproducing the keyboard and mouse events that drive a test on a system under test (SUT) reliably at good timing and then asserting the correctness of the system under test in the run.

Technically speaking, what these testing tools perform is actually regression testing. The goal of a test is trying to expose the bugs hidden in the test run but a regression test is a repeated test to make sure system features that functioned correctly remains unaffected by the changes of the code due to bug fixes, feature extension, refactoring, etc. True GUI test automation where test cases are generated and tested automatically remains to be too hard to be practical in foreseeable future ^[1]. On the other hand, regression testing is practically doable, necessary, and indispensable for industrial applications. So, in software industry or other industrial areas nowadays, testers remain to be responsible for designing the test cases and preparing the test data to test the code of a SUT.

The underlying technology of test automation is GUI automation. Test Automation is one of the well-known industrial applications of GUI automation. Other applications are robotic proccess automation (RPA) or grinding in video gaming, where grinding refers to the playing time spent doing repetitive tasks within a game to unlock a particular item or to build the experiences needed to progress. GUI automation, at first glance, should be matured and commercialized to some extent already and applicable to practical industrial and personal needs. However, GUI automation has limited its usage in a few narrow domains for several reasons.

First, automating the keyboard and mouse events can be straightforward if operating systems or the platforms (e.g., web browser) under test supports GUI component accessing and invocation APIs. Popular commercial or open source testing tools typically adopt such an approach. To click a GUI button, these approaches actually invoke its click() method instead of moving mouse cursor onto the button and then performing a click action. We call such an approach platform-dependent (PD) GUI automation. For instance, popular open-source testing tool, Selenium ^[2], relies on the testing support from Firefox browser in its earlier versions. In Windows, commercial testing tools ^[3,4,5,6] and robotic process automation (RPA) tools ^[7] mainly rely on some Windows APIs from.Net framework.

Techniques that adopt such an approach, inevitably limit their use to a specific platform but not all the operating systems or platforms provide needed API support for GUI automation. The problems of relying on such an approach are:

● backward compatibility, for example, not applicable to old applications which were built by the early platform libraries,

● inapplicable to cross platform testing or automation, and

● unpredictable time delay to keep up with new emerging technology advances.

On the other hand, instead of adopting these platform dependent approaches, it is intuitive to approach the problem by mimicking humans. Humans process the screen output and then control the mouse and keyboard to drive the GUI automation. We call these approaches, computer-vision (CV) based GUI automation. For example, test automation tools adopting these approaches include Sikuli ^[8], T-Planrobot ^[9], Egg-plant^[10], and Korat ^[11]. The merits of this approach are:

1. not sensitive to platform support and technology advances,

2. applicable to several platforms such as Windows, Linux, and Web,

3. capable of verifying the program's GUI errors, such as layouts, and

4. capable of verifying program output that are rendered in image pixels.

However, the drawbacks of CV approach are:

1. GUI component properties (numeric values or strings) are not accessible in programming,

2. GUI component event triggering is not guaranteed,

3. GUI structured information, such as the elements in a list box, is not available in programming, and

4. performance can be slower.

Performance in item 4 in general is not a critical issue in practice. First, reliability of a black-box test is often more important than speed. Second, the bottlenecks of testing performance are often or mainly attributed to the response time of systems under test.

Technically speaking, CV approaches in several cases make the GUI automation problem more difficult to deal with, i.e., CV approaches turn a programming problem into an image processing problem. Particularly, programming is still an indispensable part in applications such as test automation and RPA. So, it is not surprising that these applications favor PD approach.

However, through years of development and experiences in applying and promoting Korat in several industrial applications, we now believe CV approach can be a competitive solution in many scenarios. We have encountered several industrial customers who tried to adopt PD-based solutions but eventually gave up for different reasons, such as backward compatibility and cross-platform capability. Many industrial applications have legacy and cross-platform issues, which makes PD approaches inapplicable or failed from time to time.

In this paper, we will describe the computer vision problems and how we address these problems practically in Korat. One of the most critical computer vision problem and challenges for CV GUI automation is optical character recognition (OCR). OCR is in general remains to be a difficult problem in practice. Commercial product such as ABBYY ^[12] or Google Vision ^[13] have already dominated the market in various domains. However, although these services have been available for some time and they provide adequate accuracy in general, cost and their technical constraints are always problems. For example, applying Google Vision cloud service requires internet connections, but in our applications to industry, most Korat users are not allowed to have internet connections in their factories. In addition, Korat's test cases may run in 24 hours. Cost becomes a real issue and could not be ignored. So, it is inevitable for us to seek out open-source solutions as an alternative. In this paper, we describe how we improve and enhance an open source OCR tool-Tesseract ^[14] for the character recognition on the screenshots.

This paper is organized as follows. In Section 2, we compare the techniques of test automation. Section 3 gives an overview of Korat, which adopts computer vision in GUI automation for testing. The major computer vision problems in GUI automation are discussed in Section 4. The improvements and evaluations are at the end of Section 4. Section 5 ends the paper with conclusion remarks.

2. Related work

Technically speaking, GUI automation is only an underlying technology to reproduce UI events (such as mouse and keyboard events) so that a GUI desktop software can be automated without human intervene. Among its applications, regression test automation is the most well-known one. Regression test automation can be done by programming, tools, or a combination of them. Systems with sophisticated GUI are often difficult to automate. Besides, system testing must wait until an executable program is built. So, writing xUnit test code has become a popular approach to support regression test automation at the programming level. However, xUnit testing requires programming skills and considerable code maintenance efforts ^[15]. For several reasons, unit tests can never replace system tests.

Instead of programming test cases, Capture/Replay (CR) is an advanced feature in software regression testing. Its basic idea is to record the operations of users, particularly in a GUI, into a test script, mainly consisting of keyboard and mouse events. When a system under test (SUT) is modified, the test script is replayed to see if the SUT is damaged by the change. In the replay, the previously saved keyboard and mouse events are sent to SUTs to emulate the testing behaviors of a tester. If a test can be successfully replayed, the test run is passed. Since Capture/Replay is a very practical approach, many commercial CR tools have been built. Examples of some CR tools are: T-Plan Robot ^[9], HP QuickTest ^[3], Rational Robot ^[16], etc..

Capture/Replay approaches may sound like a straightforward method for software regression testing but in practice they are often complicated by a lot of problems ^[17]. First, an SUT's execution time for the same test can be different in two separate runs. The timing to trigger a series of events may not always be the same between two different runs. So, a straightforward replay is often infeasible. Second, to intercept the mouse/keyboard events, CR tools often cause performance interference to the SUTs. There are a few ways to intercept mouse events. If monitored events are not local to the applications, a global hook is often used to intercept the mouse events from O.S. Unfortunately, hooks tend to slow down the system because they increase the amount of processing the system must perform for each message. The interference can be so significant that the temporal synchronization between a capture run and a replay run becomes difficult. Mouse dragging events are the most common GUI events that often fail to synchronize replayed runs in a precise manner ^[17]. Automating the CR tests for 3D games, for example, is even more challenging. Since no GUI components can be captured in a 3D scene, an intrusive CR approach was proposed ^[18].

In most software applications, human testing behaviors are mainly keyboard/mouse operations guided by human sight. A tester not only work as a test driver but also a test oracle to monitor and assert the correctness of a test run. So, in a human test, the human brain, eyes, and hands combine to play the roles of test driver and test oracle. So, to emulate what humans do, one intuitive CR approach is to analyze screen output (image pixels) and guide the mouse and keyboard to repeat a capture run. This CR approach is CV-based in this sense. The representative image-analysis CR tool is Sikuli ^[8]. Sikuli proposes a GUI scripting language called Sikuli, in which the sequence of operations of a test, called visual workflow, is recorded. The advantage of Sikuli script is its visual and easy-to-understand characteristics. In principle, limited programming skills are required for using Sikuli for software testing. One major drawback of Sikuli is the performance interference caused by intercepting mouse events with global hooks. Another drawback is that Sikuli's test scripts are often sensitive to the failures of image analysis and recognition. The major image recognition approach provided by Sikuli is template matching from OpenCV, which can fail by false positive cases (i.e., objects are incorrectly identified), and false negative cases (i.e., objects are incorrectly rejected) in practice.

Korat's initial work was first published in ^[11]. In this previous work, the prototype of Korat was presented to address the BIOS testing scenarios in industrial personal computer manufacturing, where a test case is across from BIOS, OS boot up, to Windows/Linux GUI systems. Note that in the stages of BIOS and OS boot up, there are no commercial operating systems to support testing need. Korat introduced a two-machine testing architecture to address the problem and achieved test scenarios which are capable of crossing platforms. In the best of our knowledge, Korat is unique in such a domain.

Another work that is closely related to Korat is T-Plan Robot ^[9]. Unlike Sikuli, T-Plan Robot is run at a separate machine from SUT, which is the same approach as Korat. Running CR tools on a separate machine can guarantee no performance interference on the execution of SUT. However, the problem is how to intercept mouse/keyboard events from SUTs. The answer of T-Plan Robot is to run a VNC (Virtual Network Computing) server on the SUT machine. VNC is a common tool for users to remotely control another computer. It transmits the keyboard and mouse events from one computer to another and graphical screen updates back in the other direction over a network. So, T-Plan robot can be platform independent wherever a VNC server can be installed. Compared to Korat's approach, a VNC server can still cause performance interference to SUTs. Besides, VNC servers are only allowed in platforms which often require a general purpose O.S. For example, it is inapplicable to BIOS testing and many other embedded devices. The platform independence is considered more limited than Korat.

3. Maximize cross-platform capability by CV GUI automation

Mouse devices, keyboards, and touch screens are now the major input devices for most software products. These input devices typically work with a GUI system. As a result, most system test automation tools aim for GUI automation driven by keyboards and mouse devices. One critical fact is that most software products, even mobile applications, are developed in major desktop environments, such as Windows, Macs, or Linux. These environments are mainly operated by a mouse device and a keyboard.

For the purpose of explaining GUI automation approaches, it is convenient to use an example to help explain the technical details behind it. This example is to move the cursor to click a button. When a GUI automation technique tries to reproduce the example move, it is possible to make a real mouse click, an emulated mouse click, or a fake mouse click. Making a fake mouse click is to invoke a GUI component's event handler such as its click() method without any real/emulated mouse events. This approach has been widely discussed in previous sections as the platform-dependent approaches.

Making a real mouse click can be done by producing a series of mouse USB signals without a real mouse device but let the SUT recognize it as a mouse device. Tools which adopt this approach are capable of being independent of operating systems and platforms. If USB signals are keyboard signals, it is even applicable to non-GUI environment such as BIOS and DOS. Let the approach be called $\textsf{Method-}$ USB. The major challenge of such an approach is tracking the mouse cursor in an SUT and determining when and where to perform a click. Korat ^[11] supports such an approach.

Making an emulated mouse click is to insert mouse actions into a GUI system's input queue. For example, Windows is an operating system bundled with a GUI system. It provides APIs to allow processes to insert mouse actions into system input queue to control mouse cursors. Typically, these APIs allow you to move a mouse cursor to a specific screen coordinate and perform mouse button actions. Let the approach be called $\textsf{Method-OS}$ . Tools relying on such an approach inevitably depend on the operating system. Building a tool by $\textsf{Method-OS}$ is comparatively less difficult than $\textsf{Method-}$ USB because you can precisely navigate the mouse cursor. However, the problem of determining when and where to perform a click is the same as $\textsf{Method-}$ USB. Tools that adopt this approach include Korat, Sikuli ^[8]. Sikuli and Korat are capable of driving SUTs under the same operating system. Basically, this approach needs image analysis and computer vision methods to determine where to click.

3.1. Modes of cross-platform

In , the architecture of Korat in $\textsf{Method-}$ USB mode is illustrated. As shown in the figure, Korat is run at a separate machine from SUT but the SUT's video output is connected to a video image capture card installed in the Korat machine. In this mode, Korat is a two-system testing tool but it is completely non-intrusive. An ARM-cortex M4 development board is used as a USB emulator to intercept and send keyboard/mouse events to SUT's USB ports.

Figure 1. Korat architecture in

$\textsf{Method-}$ USB mode.

	Terrseract	Otsu Bin	FBCITextBin	Google
Google Login page (A)	87.97	87.34	98.12	100.0
Twitter login page (B)	96.51	98.25	100.0	100.0
stack overflow (C)	23.33	34.56	81.38	98.67
Facebook (D)	91.72	91.37	91.95	98.62
Yahoo (E)	22.09	31.97	85.46	97.67
Average	64.32	68.69	91.38	98.99

	Tesseract (Trained)	Otsu Bin (trained)	FBCITextBin (trained)	Google
Google Login page (A)	89.24	87.97	99.36	100.0
Twitter login page (B)	97.67	98.25	100.0	100.0
stack overflow (C)	23.33	34.56	80.79	98.67
Facebook (D)	92.52	90.57	91.83	98.62
Yahoo (E)	22.25	31.97	85.88	97.67
Average	65.00	68.66	91.57	98.99

[1]	B. N. Nguyen and A. M. Memon, An observe-model-exercise* paradigm to test event-driven sys-tems with undetermined input spaces, IEEE T. Software Eng., 40 (2014), 216–234.
[2]	Selenium Projects, Seleniumhq, browser automation. Available from http://www.seleniumhq. org/, July 2019.
[3]	Hewlett-Packard Inc., Hp quicktest professional software. Available from https://download.cnet.com/HP-QuickTest-Professional/3000-2383_4-10969380.html.
[4]	Microsoft, Codeduitest, microsoftuiautomation(uia).Availablefromhttps://en.wikipedia.org/wiki/Microsoft_UI_Automation.
[5]	EOSS Group, Ranorex, test automation for everyone. Available from https://www.ranorex.com/.
[6]	Telerik, Test studio, automated testing made easy. Available from http://www.telerik.com/teststudio.
[7]	UIPath Ltd., UIPath , Accelerate Human Achievement . Available from https://www.uipath. com.
[8]	T. Yeh, T. Chang and R. C. Miller, Sikuli: using GUI screenshots for search and automation, in Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology, Victoria, BC, Canada, October 4-7, 2009 (A. D. Wilson and F. Guimbretière, eds.), ACM, (2009), 183–192 .
[9]	T-Plan, T-plan robot. Available from http://www.t-plan.com/robot/.
[10]	TestPlant, eggplant test automation tools. Available from https://www.testplant.com/eggplant/testing-tools/.
[11]	Y. Cheng, J. W. Kuo, B. Cheng, et al., platform-independent capture/replay test automation sys-tem," in 17th IEEE International Conference on High Performance Computing and Communica-tions, HPCC 2015, 7th IEEE International Symposium on Cyberspace Safety and Security, CSS 2015, and 12th IEEE International Conference on Embedded Software and Systems, ICESS 2015, New York, NY, USA, IEEE, (2015), 1122–1127.
[12]	ABBYY Co. Ltd., ABBYY FindReader. Available from https://www.abbyy.com/en-eu/.
[13]	Google Inc., Cloud vision. Available from https://cloud.google.com/vision/.
[14]	HP labs, Tesseract-ocr. Available from https://code.google.com/p/tesseract-ocr/.
[15]	G. Neszaros, xUnit Test Patterns: Refactoring Test Code. Addison-Wesley, 2007.
[16]	IBM Inc., Rational robot. Available from http://www-01.ibm.com/software/awdtools/tester/robot/index.html.
[17]	H. Zhu, W. K. Chan, C. J. Budnik, et al., The 5th Workshop on Automation of Software Test, AST 2010, May 3-4, 2010, Cape Town, South Africa, ACM, (2010).
[18]	C. Hsueh, Y. Cheng and W. Pan, Intrusive test automation with failed test case clustering, in 18th Asia Pacific Software Engineering Conference, APSEC 2011, Ho Chi Minh, Vietnam, December 5-8, 2011 (T. D. Thu and K. R. P. H. Leung, eds.), IEEE Computer Society, (2011), 89–96.
[19]	A. Mesbah and M. R. Prasad, Automated cross-browser compatibility testing, in Proceedings of the 33rd International Conference on Software Engineering, ICSE '11, (New York, NY, USA), ACM, (2011), 561–570.
[20]	Software Testing Help, Top 10 cross browser testing tools in 2019 (latest ranking). Available from https://www.softwaretestinghelp.com/best-cross-browser-testing-tools-to-ease-your-browser-compatibility-testing-efforts/.
[21]	R. Smith, An overview of the tesseract ocr engine, in Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 2 (2007), 629–633.
[22]	C. Song and V. Shmatikov, Fooling OCR systems with adversarial text images, Computing Re- search Repository, abs/1802.05385, 2018.
[23]	J. Zhang and J. Hu, Image segmentation based on 2d otsu method with histogram analysis, in 2008 International Conference on Computer Science and Software Engineering, 6 (2009), 105–108.
[24]	T. Kasar, J. Kumar and A. G. Ramakrishnan, Font and background color independent text bina-rization, in In Proceedings of 2nd International Workshop on Camera Based Document Analysis and Recognition, (2007), 3–9.

1.	Dejan Viduka, Vladimir Kraguljac, Comparison of human computer interaction on Windows and Linux operations systems, 2019, 7, 2334-816X, 95, 10.5937/trendpos1902095V
2.	Yi-Chun Chen, Bo-Huei He, Shih-Sung Lin, Jonathan Hans Soeseno, Daniel Stanley Tan, Trista Pei-Chun Chen, Wei-Chao Chen, Demystifying data and AI for manufacturing: case studies from a major computer maker, 2021, 10, 2048-7703, 10.1017/ATSIP.2021.3
3.	Yan Wang, Yuchen Zhang, LinJun Shen, ShuMing Wang, Bai Yuan Ding, Analysis and Research on Human Movement in Sports Scene, 2021, 2021, 1687-5273, 1, 10.1155/2021/2376601
4.	Sergio Borraz‐Martínez, Joan Simó, Anna Gras, Mariàngela Mestre, Ricard Boqué, Francesc Tarrés, Combining computer vision and deep learning to classify varieties of Prunus dulcis for the nursery plant industry , 2022, 36, 0886-9383, 10.1002/cem.3388
5.	Netzah Calamaro, Michael Levy, Ran Ben-Melech, Doron Shmilovitz, TNT Loss: A Technical and Nontechnical Generative Cooperative Energy Loss Detection System, 2022, 22, 1424-8220, 7003, 10.3390/s22187003
6.	Michel Nass, Emil Alégroth, Robert Feldt, Why many challenges with GUI test automation (will) remain, 2021, 138, 09505849, 106625, 10.1016/j.infsof.2021.106625
7.	Zhenchang Xu, Kuirong Liu, Bill Gu, Luchun Yan, Xiaolu Pang, Kewei Gao, Image recognition model of pipeline magnetic flux leakage detection based on deep learning, 2023, 41, 2191-0316, 689, 10.1515/corrrev-2023-0027
8.	Lewin Schaudt, Dennis Schlegel, 2023, Chapter 5, 978-3-031-36285-9, 83, 10.1007/978-3-031-36286-6_5

Mathematical Biosciences and Engineering

Apply computer vision in GUI automation for industrial applications

Related Papers:

Abstract

1. Introduction

2. Related work

3. Maximize cross-platform capability by CV GUI automation

3.1. Modes of cross-platform

3.2. Combinations of image recognition methods

3.3. Assertions

3.4. Applications

4. Apply computer-vision in GUI automation

4.1. Image recognition

4.1.1. Foreground mask for template matching

4.1.2. Background color switch

4.1.3. Anti-aliasing effect

4.2. Image understanding

4.3. Improve tesseract OCR for GUI automation

4.3.1. Tesseract to colorful images

4.3.2. Text binarization with training

5. Conclusions

Acknowledgments

Conflict of interest

References

This article has been cited by:

Reader Comments

通讯作者: 陈斌, bchen63@163.com

Metrics

Figures and Tables

Other Articles By Authors

Related pages

Tools

Export File

Citation

Format

Content

Catalog