Abbot (with support from Costello)

Applying what I learned with JUnit to writing GUI tests, and building a better 'bot.

The Pain (Problems and Goals)

On the subject of why testing is so important, I won't repeat here what has been aptly described in Test Infected. If you haven't read that document, please do so before proceeding here.

Often a developer will want to make some optimizations to a piece of code on which a lot of user-interface behavior depends. Sure, the new table sorter will be ten times faster and more memory efficient, but are you sure the changes won't affect the report generator? Run the tests and find out. A developer is much more likely to run a test that has encapsulated how to set up and test a report than he is to launch the application by hand and try to track down who knows how to do that particular test.

To test a GUI, you need a reliable method of finding components of interest, clicking buttons, selecting cells in a table, dragging things about. If you've ever used java.awt.Robot, you know that you need A Better 'Bot in order to perform any user-level actions. The events the Robot provides are like assembly language for GUI testing. To facilitate and encourage testing, you need a higher-level representation.

Describing expected behavior before writing the code can clarify the developer's goals and avoid overbuilding useless feature sets. This principle applies to GUI component development as well as composite GUI development, since the process of writing tests against the API can elucidate and clarify what is required from the client/user point of view.

GUI Design

First of all, any GUI component should provide a public API which can be invoked in the same manner via a system user event or programmatically. Keep this in mind when writing new components. In the case of Java's Swing components, the event handling is mixed up with complex component behavior triggers in the Look and Feel code. It's not possible to execute the same code paths without triggering the originating events. A better design would be to have the Look and Feel code simply translate arbitrary, platform-specific event sequences into public API calls provided by the underlying component. Such a design enables the component to be operated equally well by code, whether for accessibility or testing purposes.

GUI Testing

GUIs need testing. Contrary to some opinion, the problem is not always (or even commonly) solvable by making the GUI as stupid as possible. GUIs that are sufficiently simple to not require testing are also uninteresting, so they do not play into this discussion. Any GUI of sufficient utility will have some level of complexity, and even if that complexity is limited to listeners (code responding to GUI changes) and updates (GUI listening to code state changes), those hookups need to be verified.

Getting developers to test is not easy, especially if the testing itself requires additional learning. Developers will not want to learn the details of specific application procedures when those details have no bearing on their immediate work, nor should they have to. If I'm working on a pie chart graph, I don't really want to know the details of connecting to the database and making queries simply to get an environment set up for testing. So the framework for testing GUIs should require no more special knowledge than you might need to use the GUI manually. That means:

Look up a component, usually by some obvious attribute like its label.
Perform some user action on it, e.g. "click" or "select row".
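Those two steps can be sketched in plain Swing. This is only an illustration, not Abbot's API: the class FindAndClick, the helper findButton, and the little "Save"/"Cancel" form are all made up for the example. It looks up a button by its label and then invokes doClick on it.

```java
import java.awt.Component;
import java.awt.Container;
import javax.swing.AbstractButton;
import javax.swing.JButton;
import javax.swing.JPanel;

public class FindAndClick {
    /** Depth-first search for a button whose text equals the given label. */
    static AbstractButton findButton(Container root, String label) {
        for (Component c : root.getComponents()) {
            if (c instanceof AbstractButton
                    && label.equals(((AbstractButton) c).getText())) {
                return (AbstractButton) c;
            }
            if (c instanceof Container) {
                AbstractButton found = findButton((Container) c, label);
                if (found != null) {
                    return found;
                }
            }
        }
        return null; // no button with that label under this root
    }

    /** Builds a small hypothetical form; saveClicks[0] counts clicks on "Save". */
    static JPanel buildForm(int[] saveClicks) {
        JPanel form = new JPanel();
        JPanel buttons = new JPanel();
        JButton save = new JButton("Save");
        save.addActionListener(e -> saveClicks[0]++);
        buttons.add(new JButton("Cancel"));
        buttons.add(save);
        form.add(buttons); // the buttons are nested one level down
        return form;
    }

    public static void main(String[] args) {
        int[] saveClicks = {0};
        JPanel form = buildForm(saveClicks);
        // Step 1: look up the component by its label.
        AbstractButton save = findButton(form, "Save");
        // Step 2: perform the user action on it.
        save.doClick();
        System.out.println("Save clicked " + saveClicks[0] + " time(s)");
    }
}
```

Note that the caller never needs to know how the form was assembled or where the button sits in the hierarchy, only what the button is called on screen.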

Scripts vs compiled code

How can I test a GUI prior to writing code? One alternative (and useful in certain cases) is developing a mockup in a RAD tool. Unfortunately, the usefulness is relatively short-lived; it's not really possible (at this point in time) to automatically generate a very interesting interface. If your entire interface consists of buttons and forms, you may not really need a GUI tester anyway. Mockups don't convert well to tests for the actual developed code, and RAD tool output usually requires some hand modification afterwards.

What if a test script could plainly describe the GUI components of interest, and simply describe the actions to take on those components? Provided you know the basic building blocks, you can edit the scripts by hand; if you don't know the building blocks, you can use a script editor. No compilation is necessary, which speeds development and maintenance of tests.

I wanted scripts to be hand-editable, with no separate compilation step. I wanted to be able to drop scripts into directory hierarchies as needed, and have the test framework pick them up automatically, similar to how JUnit auto-generates test cases.
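As a rough sketch of the "drop a script in a directory and have it picked up" idea, the following walks a directory tree and collects every script file it finds. The class name ScriptFinder, the ".script" extension, and the sample file names are assumptions made for the example; they are not taken from Abbot.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ScriptFinder {
    /** Returns every regular file under root whose name ends with extension. */
    static List<Path> findScripts(Path root, String extension) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(extension))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a throwaway tree with two scripts and one unrelated file. */
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("gui-tests");
            Path reports = Files.createDirectories(root.resolve("reports"));
            Files.createFile(root.resolve("login.script"));
            Files.createFile(reports.resolve("sort-table.script"));
            Files.createFile(reports.resolve("notes.txt")); // ignored: wrong extension
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createSampleTree();
        for (Path script : findScripts(root, ".script")) {
            // A real framework would wrap each script in a test case here.
            System.out.println("found " + root.relativize(script));
        }
    }
}
```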

How to Test

One issue in defining a GUI test is that you need to map from a semantic action (select the second item in the records table) onto a programmatic action (myTable.setSelectedIndex(1)). This comprises two separate problems. First, the target of the action, "records table", must be somehow translated into an actual code object. Second, the semantic event "select the second item" must be translated into a programmatic action on that code object.

Tracking components

There are many methods of identifying a component in a GUI hierarchy. We want the tests to be flexible enough to not break simply because another button or panel was added to the GUI layout.

Component names: Java provides for naming components, but since no one ever names them, this method of identification is mostly useless. Worse, some auto-generated components (frames for otherwise frameless windows, windows for popup menus, and most dialog instantiations) have auto-generated names, which aren't particularly helpful, and are downright misleading if components get created in a different order than expected.
Position in hierarchy: This method guarantees a unique match, but also guarantees your script will break when that hierarchy changes, even for otherwise trivial modifications (like inserting a scrollpane). Each component would need to store its parent reference and index within that parent. Note that this implies each parent reference might need to store the same information for itself.
Parent window title: This is useful as a first-pass discriminator in multi-window applications.
Component class: This helps narrow down the list of available components, but is not likely sufficient on its own to identify a component.
Custom tags: Here's where the real component resolution is done. Many component classes will have some aspect or property that can be used to uniquely identify them, typically a label (the text of a button or menu, a labelFor component, a window's title), but potentially any identifying element may be used. Abbot uses custom component testers to get the unique tag for any given component. These testers are dynamically loaded, so we don't have to know a priori how to get a tag for a given component class. Combined with all the previous attributes, Abbot does a pretty good job of tracking components.
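To make the combination of attributes concrete, here is a minimal sketch that identifies a component by class plus tag and ignores its position in the hierarchy. The ComponentReference class below and its tagOf helper are invented for illustration; they are not Abbot's implementation, and a fuller version would also consult the component name and parent window title.

```java
import java.awt.Component;
import java.awt.Container;
import java.util.ArrayList;
import java.util.List;
import javax.swing.AbstractButton;
import javax.swing.JButton;
import javax.swing.JLabel;
import javax.swing.JPanel;

public class ComponentReference {
    private final Class<? extends Component> type;
    private final String tag;

    ComponentReference(Class<? extends Component> type, String tag) {
        this.type = type;
        this.tag = tag;
    }

    /** Hypothetical tag extraction; only buttons and labels are handled here. */
    static String tagOf(Component c) {
        if (c instanceof AbstractButton) return ((AbstractButton) c).getText();
        if (c instanceof JLabel) return ((JLabel) c).getText();
        return null;
    }

    boolean matches(Component c) {
        return type.isInstance(c) && tag.equals(tagOf(c));
    }

    /** Collects every component under root that matches class and tag. */
    List<Component> resolve(Container root) {
        List<Component> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private void collect(Container parent, List<Component> out) {
        for (Component c : parent.getComponents()) {
            if (matches(c)) out.add(c);
            if (c instanceof Container) collect((Container) c, out);
        }
    }

    public static void main(String[] args) {
        JPanel inner = new JPanel();
        inner.add(new JLabel("OK")); // same tag, different class
        JButton ok = new JButton("OK");
        inner.add(ok);

        ComponentReference ref = new ComponentReference(JButton.class, "OK");
        System.out.println("before: " + (ref.resolve(inner).get(0) == ok));

        // Re-parenting the layout does not break the reference.
        JPanel outer = new JPanel();
        outer.add(inner);
        System.out.println("after:  " + (ref.resolve(outer).get(0) == ok));
    }
}
```

Because the class narrows the candidates and the tag picks among them, wrapping the panel in another container leaves the lookup intact, which is exactly what a position-based reference cannot offer.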

Recording events

Ideally, there would be a programmatic function on every component that performed a given user action, similar to the doClick method on AbstractButton. Given that this is not the case, we want to provide for both implementing such functions and loading them dynamically, and constructing semantic events from low-level events when corresponding functions are not available. Having dedicated semantic functions makes writing scripts easier, but supporting the low-level events means they are not absolutely required.
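One way the dynamic loading might work is to derive a tester class name from the component class and walk up the superclass chain until a tester is found. The sketch below assumes a made-up naming convention (component class name plus "Tester", nested inside a TesterLookup class); the source does not say how Abbot actually locates its testers.

```java
import java.awt.Component;
import javax.swing.AbstractButton;
import javax.swing.JButton;
import javax.swing.JLabel;

public class TesterLookup {
    /** Fallback tester: knows nothing better than the component name. */
    public static class ComponentTester {
        public String tagFor(Component c) {
            return c.getName();
        }
    }

    /** Tester for anything derived from AbstractButton: the tag is its text. */
    public static class AbstractButtonTester extends ComponentTester {
        @Override
        public String tagFor(Component c) {
            return ((AbstractButton) c).getText();
        }
    }

    /** Finds the most specific tester by trying each superclass in turn. */
    static ComponentTester testerFor(Class<?> componentClass) {
        for (Class<?> k = componentClass; k != null; k = k.getSuperclass()) {
            // Assumed convention: <ComponentSimpleName>Tester, nested in this class.
            String name = TesterLookup.class.getName() + "$" + k.getSimpleName() + "Tester";
            try {
                return (ComponentTester) Class.forName(name)
                        .getDeclaredConstructor().newInstance();
            } catch (ClassNotFoundException e) {
                // No tester for this class; try its superclass.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + name, e);
            }
        }
        return new ComponentTester();
    }

    public static void main(String[] args) {
        JButton go = new JButton("Go");
        // JButton has no tester of its own, so AbstractButtonTester is used.
        ComponentTester tester = testerFor(go.getClass());
        System.out.println(tester.getClass().getSimpleName() + " -> " + tester.tagFor(go));
        // JLabel falls all the way back to the generic ComponentTester.
        System.out.println(testerFor(JLabel.class).getClass().getSimpleName());
    }
}
```

With a scheme like this, supporting a new component class only requires dropping a new tester on the class path; nothing in the lookup code has to change.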

The most basic events to support are those that correspond to low-level OS events for user input, namely

Pointer motion
Button press/release
Key press/release

In addition, the most common semantic events that affect user input are windows opening and closing, so we throw in support for those as well. (Alternatively, the time delta between events could be preserved for replay, but doing so has not yet proved useful in very many cases.)

Wait for window open/closed

The easiest way to write a script is to record a series of actions for later replay, adding checkpoints or assertions along the way to verify correctness. Eventually, you might want to add heuristics that strip out meaningless events, but keeping track of a few basic events is sufficient for good functionality.

Over and above event stream support, it's useful to have custom, class-based component functions that provide higher-level semantic events. For example, a custom table might export a select(int row, int col) action to select a cell within the table, or sortByColumn(int col) to invoke a click on the column header which would cause a sort in the table. The custom actions use existing building blocks (low-level events or existing semantic actions) to construct the higher-level semantic event.
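Here is a rough sketch of how such custom actions could be layered on a low-level building block. Everything named here (TableActions, the Pointer interface, RecordingPointer) is hypothetical. A real Pointer would move and click the mouse, for instance through java.awt.Robot; the recording version only notes where the click would land, so the example runs without a display.

```java
import java.awt.Component;
import java.awt.Point;
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JTable;
import javax.swing.table.JTableHeader;

public class TableActions {
    /** Low-level building block: click at (x, y) within the target component. */
    interface Pointer {
        void click(Component target, int x, int y);
    }

    /** Stand-in that records clicks instead of generating real input events. */
    static class RecordingPointer implements Pointer {
        final List<Component> targets = new ArrayList<>();
        final List<Point> points = new ArrayList<>();

        @Override
        public void click(Component target, int x, int y) {
            targets.add(target);
            points.add(new Point(x, y));
        }
    }

    private final Pointer pointer;

    TableActions(Pointer pointer) {
        this.pointer = pointer;
    }

    /** Semantic action: select a cell by clicking the center of its rectangle. */
    void select(JTable table, int row, int col) {
        Rectangle cell = table.getCellRect(row, col, false);
        pointer.click(table, cell.x + cell.width / 2, cell.y + cell.height / 2);
    }

    /** Semantic action: click the header of the given column. */
    void sortByColumn(JTable table, int col) {
        JTableHeader header = table.getTableHeader();
        Rectangle r = header.getHeaderRect(col);
        pointer.click(header, r.x + r.width / 2, r.y + r.height / 2);
    }

    public static void main(String[] args) {
        JTable table = new JTable(3, 2); // 3 rows, 2 columns, default sizes
        RecordingPointer pointer = new RecordingPointer();
        TableActions actions = new TableActions(pointer);

        actions.select(table, 2, 1);
        Point p = pointer.points.get(0);
        System.out.println("click lands on row " + table.rowAtPoint(p)
                + ", column " + table.columnAtPoint(p));

        actions.sortByColumn(table, 0);
        System.out.println("header click on column "
                + table.getTableHeader().columnAtPoint(pointer.points.get(1)));
    }
}
```

The script author only ever writes "select row 2, column 1"; translating that into pixel coordinates stays inside the class-specific action.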