By Patrick Lightbody
A few months ago the good people at Sonatype asked if I could lend some help in testing the user interface for Nexus, a full-featured Maven repository that happens to have a UI built entirely using ExtJS and a lot of AJAX. Besides having great respect for Sonatype, Maven, and Nexus (the Selenium team actually uses Nexus to host and manage all our artifacts), I was additionally drawn to the project because of two key technical challenges.
The first challenge was the UI architecture. Nexus is designed as a true "single page" application. That means the browser makes one request for the base page, and all subsequent UI changes, such as dialog boxes, tab selections, and modal dialogs, are done using a combination of client-side AJAX techniques and a server-side REST API serving back JSON content. This type of UI is extremely rich in interactivity and fully exercises the capabilities of Selenium. In addition, the UI framework itself is ExtJS, which presents some unique challenges when writing Selenium-based tests.
The second challenge is the application state itself. Nexus is designed to deal with remote files and repositories, many of which go through various ephemeral states. For example, the UI might display something very different for a Nexus repository that is temporarily unable to sync with a remote repository (due to a network failure) than it would display when the synchronization is working fine. In order to test these different UI behaviors, you must either recreate these different states, which is hard, or you must mock out the server state and behavior and trick the UI into operating in a certain way.
Given these two challenges, I was eager to pitch in and see what we could accomplish. Although I've been heavily involved with Selenium for many years, I find that I'm learning something new about UI testing almost every day, and this project was no different. In fact, the end result was far more surprising than I expected, and it might surprise you too!
Testing ExtJS with Selenium
The biggest challenge when writing Selenium tests for AJAX-rich applications like Nexus is making sure your tests aren't brittle, but instead can withstand the constant refactoring and other changes an application goes through during its lifetime. After all, what use is an automated test if you have to rewrite it or significantly edit it every time your application code changes? That would defeat the point of automation and potentially make manual testing a more palatable alternative.
Unfortunately, brittle tests all too often end up being the norm, ultimately resulting in the automation effort being abandoned or at least costing much more, in terms of time and manpower, than originally anticipated. By far, the two most common causes of brittle tests are timing issues when interacting with AJAX interfaces and use of Selenium "locators" that are not future-proof.
Timing Issues and AJAX
Let's start with the timing issue: suppose you had a "Login" link, as does Nexus, that, using AJAX techniques, dynamically pops up a dialog box with a username and password field. When first recording this test using Selenium IDE, you might find that the recorded commands would be:
Action                | Target     | Value
click                 | link=Login |
type                  | username   | bob@example.com
type                  | password   | superSecret!
The problem is that when you play the test back at high speed, it fails on the first "type" command, claiming that the username field does not exist. But by the time you get around to inspecting the state of the browser with Firebug, you can plainly see that the username field is right there. So what's going on?
It turns out that Selenium, by default, tries to run your test at extremely high speeds - faster than your eye can probably keep up with. As such, as soon as it clicked the "Login" link, it immediately tried to type into the username field. But if that popup dialog for login is displayed using AJAX or an internal iframe (as is the case with Nexus), that element isn't available immediately after the link is clicked.
Instead, it may take a fraction of a second for the elements to appear on the screen. And that fraction of a second is all it takes for Selenium to move on to the next action ("type") and find that it can't succeed. Often test writers attempt to solve this timing issue by slowing down the test, through commands such as setSpeed (which controls a built-in pause between each Selenium command) or pause (which instructs Selenium to stop processing for an exact amount of time). A passing test using this approach might look like:
Action                | Target     | Value
click                 | link=Login |
pause                 | 1000       |
type                  | username   | bob@example.com
type                  | password   | superSecret!
By pausing for 1000 milliseconds, we give the application enough time to render the login dialog box, which contains the username field, and the test passes. But suppose that one day the dialog box takes 1.5 seconds to load - now the test fails despite the fact that the functionality is still there, just slightly slower.
This could be resolved by changing the time from 1000 ms to 5000 ms, but that really only papers over the problem. It also needlessly wastes test execution time (critical for any Continuous Integration environment) on runs where the dialog box appears in only half a second: the remaining 4.5 seconds are spent effectively "doing nothing".
The solution to this is to tell Selenium not to pause for a fixed amount of time, but to instead wait for a specific event to happen, such as an element finally appearing on the page:
Action                | Target     | Value
click                 | link=Login |
waitForElementPresent | username   |
type                  | username   | bob@example.com
type                  | password   | superSecret!
Now the script will be properly "synchronized", waiting as long or as little as necessary for the required element to be present before it interacts with it. In fact, this approach is so common with AJAX applications that in Selenium 2.0 we will likely see the behavior change: interaction with any element will, by default, include a built-in component that will wait for some amount of time for the element to appear on the page if it isn't there already.
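If you're driving Selenium from the Java client rather than from Selenese tables, the same synchronization can be written as a small polling helper. Here's a minimal sketch, assuming the standard Selenium RC Java client; the helper name, 30-second ceiling, and 250 ms polling interval are illustrative choices, not values from any particular framework:

import com.thoughtworks.selenium.Selenium;

public class WaitHelper {
    // Poll until the locator matches something on the page, or give up.
    public static void waitForElementPresent(Selenium selenium, String locator) {
        long deadline = System.currentTimeMillis() + 30000; // illustrative timeout
        while (!selenium.isElementPresent(locator)) {
            if (System.currentTimeMillis() > deadline) {
                throw new RuntimeException("Timed out waiting for element: " + locator);
            }
            try {
                Thread.sleep(250); // poll a few times per second
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }
}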
Future-Proofing Brittle Locators
As I mentioned, the second cause of brittle tests is the "locator" not being future-proof. In Selenium, a locator is the string of text used to identify a field that you intend to interact with, such as "link=Login", "username", or "password". Selenium supports locators that reference by ID, name, CSS rule, XPath, link, or even JavaScript expression (e.g., document.forms[0].elements[1]).
It is relatively easy to write a locator that works for your application right now, but it takes some care to write one that will work in the future, even after major changes to the UI. Consider how Selenium IDE works when recording a script: it needs to generate a locator for you based on the element you clicked on or typed into, and it follows these general guidelines:
- If the element has an id attribute, that will be used. The logic is that if you took the time to assign an ID to an element, it likely won't change in the future, so we should prefer that first.
- If there is no ID and the element is a basic <a> tag with plain text inside of it, it'll use the link= locator, such as link=Login.
- If the element is a form element and it has a name attribute that is unique on the page, it will use that as the locator.
- If the element is a form element and does not have a unique name attribute, a DOM expression of the form document.forms[...].elements[...] will be used.
- Otherwise, an XPath expression will be utilized, such as /html/body/div[1]/div[2]/img[3].
As the recorder tries different locators, it moves from a best practice (using IDs) to a bad practice (using poorly-written XPath expressions). Generally, the best way to future-proof your tests is to add IDs to the elements you interact with - though we'll see in a moment that that isn't always the case.
If adding an ID isn't practical (which can happen for a variety of reasons), then the other approach is to use good XPath rather than bad XPath. For example, /html/body/div[1]/div[2]/img[3] is very specific. If a new image or div is added to the page, it could break the entire XPath and is therefore considered brittle. However, //img[@alt = 'Signup Button'] or //img[contains(@src, 'signup.png')] is probably a better XPath expression, since it doesn't rely on the page structure and instead will only break if the alt attribute or the image source file name itself changes.
Specific Challenges with Nexus and ExtJS
When applying these rules specifically to Nexus - and its underlying UI framework, ExtJS - there are a few issues that actually work against the best practices when creating Selenium locators. For example, ExtJS automatically generates ID attributes for the underlying HTML elements that it creates when using the various UI components such as a popup dialog box, form container, or sliding "drawer" which is present on the left hand side of Nexus.
These generated IDs look like ext-419, where 419 is a somewhat unpredictable number based on browser timing and page construction events that may change from page load to page load. That means that if you record a test, you might find these IDs referenced in your script, but when you play the test back it might not work the next time around. This is clearly a problem - our "best practice" is no longer working for us!
The other issue we encountered was dealing with components that are hidden or otherwise not visible. ExtJS has a nice design that only generates the HTML elements for a component when it needs to be displayed. This "lazy loading" approach improves performance in the browser, but it also makes your Selenium tests a little harder to write: it's sometimes difficult to tell whether you should use an "element present" check or an "element visible" check, and you often end up needing both to properly synchronize your tests.
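In practice, this means guarding interactions with both checks. Here's a rough sketch of the idea using the standard isElementPresent() and isVisible() commands; the helper name and timeout are illustrative:

import com.thoughtworks.selenium.Selenium;

public class VisibilityHelper {
    // Wait until the element both exists in the DOM and is actually rendered
    // visible - with lazily constructed ExtJS components, one check without
    // the other often isn't enough.
    public static void waitForVisible(Selenium selenium, String locator) {
        long deadline = System.currentTimeMillis() + 30000; // illustrative timeout
        while (!selenium.isElementPresent(locator) || !selenium.isVisible(locator)) {
            if (System.currentTimeMillis() > deadline) {
                throw new RuntimeException("Timed out waiting for visible element: " + locator);
            }
            try {
                Thread.sleep(250);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }
}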
Working with ExtJS and the Page Object Pattern
So given these problems with ExtJS, what can be done to make test creation easier? Fortunately, there's a great article that discusses the very approach we used when developing the Nexus test case system. In short, the article suggests creating Java classes that represent the logical page-level components (e.g., ChangePasswordWindow) and generate dynamic locators based on the ExtJS API.
For example, in Nexus the window for changing your password is given an ID of "change-password-window", but the individual form fields are not given any ID that we could easily use in our locators. Fortunately, ExtJS provides a very nice API for first locating the window and then locating the text field relative to the window:
var cpWindow = Ext.getCmp('change-password-window');
var textField = cpWindow.findBy(function(c) {
    return c.fieldLabel == 'Current Password';
})[0];
This code uses the Ext.getCmp() function, which returns a component located somewhere on the page by ID. From there, we then use the findBy() function, which takes in an anonymous function that we use to filter down all components in the cpWindow and extract out only the one we're interested in.
What this means is that we could type into that field using the following Selenium locator expression:
document=Ext.getCmp('change-password-window').findBy(function(c) { return c.fieldLabel == 'Current Password' })[0]
While that would work, it's clearly a lot of text and wouldn't be very maintainable. What we can do instead is extract these various bits into Java classes that are smart enough to abstract all this complexity away. Our Selenium test, now written in Java, ends up looking very simple:
public class ChangePasswordTest extends SeleniumTest {
    @Test
    public void changePasswordSuccess() {
        main.clickLogin()
            .populate(User.ADMIN)
            .loginExpectingSuccess();

        ChangePasswordWindow window = main.securityPanel().clickChangePassword();

        // ...

        PasswordChangedWindow passwordChangedWindow = window
            .populate("password", "newPassword", "newPassword")
            .changePasswordExpectingSuccess();

        passwordChangedWindow.clickOk();
    }
}
What we've done is abstract out and conceptualize all the major components of the Nexus UI into simple, reusable Java classes. First, we can see that ChangePasswordTest extends SeleniumTest. SeleniumTest is a class we created that automatically sets up Selenium according to a few different infrastructure requirements and settings (more on that later). Most importantly, it creates a protected "main" variable that we reference at the start of this test.
The main variable is a MainPage object, which is designed to represent all the top-level interactions a user can do with Nexus, such as click the "login" link in the upper right-hand corner. This has been abstracted out so you can simply call clickLogin(), which returns a LoginWindow object, which in turn can be populated with the login credentials for the admin, and then finally logged in and told to expect success. If the login fails, the LoginWindow code is designed to throw an exception and fail the test.
Next, we navigate to the security panel on the left-hand side of the Nexus UI with securityPanel() and then click the "change password" link with clickChangePassword(), which returns a ChangePasswordWindow. With a handle to this window, we can populate the embedded form with the populate() function and then finally click the button that saves the changes.
The end result is a very clean test, but unless you've seen an example of one of these underlying objects, it may look like a bunch of magic. If you're curious, you can examine the entire source here. Let's take a look at the ChangePasswordWindow.java source:
public class ChangePasswordWindow extends Window {
    private TextField currentPassword;
    private TextField newPassword;
    private TextField confirmPassword;
    private Button button;

    public ChangePasswordWindow(Selenium selenium) {
        super(selenium, "window.Ext.getCmp('change-password-window')");

        currentPassword = new TextField(this, ".findBy(function(c) { return c.fieldLabel == 'Current Password' })[0]");
        newPassword = new TextField(this, ".findBy(function(c) { return c.fieldLabel == 'New Password' })[0]");
        confirmPassword = new TextField(this, ".findBy(function(c) { return c.fieldLabel == 'Confirm Password' })[0]");
        button = new Button(selenium, "window.Ext.getCmp('change-password-button')");
    }

    public ChangePasswordWindow populate(String current, String newPass, String confirm) {
        currentPassword.type(current);
        newPassword.type(newPass);
        confirmPassword.type(confirm);
        return this; // enables the fluent chaining used in the test above
    }

    public PasswordChangedWindow changePasswordExpectingSuccess() {
        button.click();
        waitForHidden();
        return new PasswordChangedWindow(selenium);
    }
}
A few important notes: ChangePasswordWindow extends Window, another class we've created that provides access to standard capabilities that ExtJS exposes in any window component. Window itself extends a Component class, which also exposes generic functionality that any ExtJS component can provide, such as waiting for the component to be hidden with the waitForHidden() method.
In the constructor you can see that we not only define the JavaScript expression that gets a handle to the window itself, but also create four additional objects that represent the critical form elements we want to interact with. The key thing to note is that some of these components take in another component (the ChangePasswordWindow) while others take in the Selenium object itself.
The difference here is that components that take in another component can have a partial locator expression, since it will be strung together using the parent component's locator string. Alternatively, those that are given a Selenium object directly must be given a component locator that is fully qualified and standalone. In this example, you can see uses of both, since the button had an ID that we could reference but the text fields did not.
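To make that chaining concrete, here's a rough sketch of how such a Component base class could string the expressions together. The constructor shapes here are my own; the real classes live in the Nexus test source:

import com.thoughtworks.selenium.Selenium;

public abstract class Component {
    protected final Selenium selenium;
    protected final String expression; // JavaScript that evaluates to the ExtJS component

    // Fully qualified locator, e.g. "window.Ext.getCmp('change-password-button')"
    protected Component(Selenium selenium, String expression) {
        this.selenium = selenium;
        this.expression = expression;
    }

    // Partial locator, appended to the parent's, e.g. ".findBy(...)[0]"
    protected Component(Component parent, String partialExpression) {
        this(parent.selenium, parent.expression + partialExpression);
    }
}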
This approach is commonly referred to as the Page Object Pattern and is becoming increasingly popular among the Selenium community as a way to write tests that will withstand the test of time. But even a well designed test is only half the battle: if the underlying application state isn't reliably recreated each time the test runs, the tests will likely have difficulty passing consistently.
Mocking Out Complex Application State
All web applications have some concept of state. Usually that state is what is contained in the database and possibly a caching layer. But in the case of Nexus, there are additional application states that are especially hard to reproduce. For example, recreating the state necessary to produce a dialog box warning that a file download attempt from the central repository failed is not trivial to do. Doing so would require the central repository to be temporarily taken offline - something that wouldn't make the millions of Maven users very happy!
The Nexus UI Architecture
So clearly there are some things we need to mock out. But the central repository example is just one thing. How many more behaviors must we also mock out? At this point, it is worth looking at the Nexus UI architecture to see if there is perhaps a single entry point that we can mock out.
Nexus uses a "single page" architecture. What I mean by that is there is conceptually only one page for the entire UI. The rest of the interaction and UI complexity is created dynamically using AJAX. Think about how Google Maps works: the location bar in your browser never really changes despite very big changes within the page. Nexus works exactly the same way.
The UI components themselves are generated using ExtJS and a lot of JavaScript. But the data that controls how and when those components get rendered comes from the Nexus web server via a REST API. So if we could mock out that API, we'd only need to create one mock, rather than trying to mock all the edge cases where the application state is hard to reliably reproduce.
You might be asking yourself, "If you mock out the REST interface, are you even testing Nexus at this point?" While it's true that we would no longer be exercising any of the server-side application, since everything would be intercepted with mocks, it's important to note that this is not the goal. Nexus already has a very large suite of integration tests designed to test exactly that. These Selenium tests, on the other hand, were specifically designed for UI testing only. Because our goal isn't to do an end-to-end test, we can mock out the REST API without any problem.
Setting Mock Expectations in the Selenium Tests
Recall the ChangePasswordTest we saw earlier - you may have noticed there was a "..." comment in the middle where we cut out part of the test. Let's now look at the test in its entirety:
public class ChangePasswordTest extends SeleniumTest {
    @Test
    public void changePasswordSuccess() {
        main.clickLogin()
            .populate(User.ADMIN)
            .loginExpectingSuccess();

        ChangePasswordWindow window = main.securityPanel().clickChangePassword();

        MockHelper.expect("/users_changepw", new MockResponse(Status.SUCCESS_NO_CONTENT, null) {
            @Override
            public void setPayload(Object payload) throws AssertionFailedError {
                UserChangePasswordRequest r = (UserChangePasswordRequest) payload;
                assertEquals("password", r.getData().getOldPassword());
                assertEquals("newPassword", r.getData().getNewPassword());
            }
        });

        PasswordChangedWindow passwordChangedWindow = window
            .populate("password", "newPassword", "newPassword")
            .changePasswordExpectingSuccess();

        passwordChangedWindow.clickOk();
    }
}
What we've done is make a call to MockHelper to tell it that the next call to /users_changepw (one of the many REST APIs) should return a mock response with no data, and that we want to examine the data submitted to the REST API and confirm it matches what was entered into the change password window.
We can use this technique to effectively stub out unique logic that is hard to reproduce. Even better, because Nexus uses a Plexus-based REST framework, we can work with these stubs using Java and they will be automatically marshaled to a format that the Nexus UI can understand.
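While the real MockHelper lives in the Nexus test source, the core idea can be sketched as a simple registry keyed by service URI. The map and the consume() method here are my guesses at the shape, not the actual implementation:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MockHelper {
    private static final Map<String, MockResponse> expectations =
            new ConcurrentHashMap<String, MockResponse>();

    // Called by tests to stage the next response for a given REST URI.
    public static void expect(String uri, MockResponse response) {
        expectations.put(uri, response);
    }

    // Called by the mock REST framework when the browser hits that URI;
    // the expectation is removed so it only fires once.
    static MockResponse consume(String uri) {
        return expectations.remove(uri);
    }
}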
Runtime Considerations
The key thing that enables this simple test design is that the mock web server (not a fully working Nexus instance) and the test that controls Selenium both run within the same JVM. Recall that ChangePasswordTest extends SeleniumTest. It turns out SeleniumTest extends NexusTestCase, which is responsible for spinning up the mock web server that hosts the Nexus UI (HTML, CSS, JS, etc.) and the mock REST API framework.
Once both the test and the mock server are running inside the same runtime environment, it's easy for the test to quickly set expectations that relate to the Selenium code that is before and after the MockHelper callout.
One thing we didn't tackle with this approach was concurrent or parallel test execution. Modern unit test frameworks, such as JUnit 4 and TestNG, now allow test cases to run in parallel, which can significantly speed up a build. But given the way MockHelper is currently used, it would be impossible to know or guarantee whether a call to /users_changepw came from test X's browser session or test Y's.
Fortunately, this isn't a terribly difficult problem to overcome. With a little work in SeleniumTest, one could easily assign a random string to a ThreadLocal and to a cookie in the browser session. You could then use that string to uniquely associate mock calls with browser sessions, along with a server-side map of expected responses to REST API calls. The tests themselves would look the same, but now they could run in parallel on a Selenium Grid.
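A sketch of that suggested fix might look like the following; nothing like this class exists in the project yet, and the cookie name is made up:

import java.util.UUID;
import com.thoughtworks.selenium.Selenium;

public class MockSession {
    private static final ThreadLocal<String> TOKEN = new ThreadLocal<String>();

    // Called once per test: remember the token on this thread and hand the
    // same token to the browser, so the mock server can correlate the two.
    public static void bind(Selenium selenium) {
        String token = UUID.randomUUID().toString();
        TOKEN.set(token);
        selenium.createCookie("mockSession=" + token, "path=/");
    }

    public static String current() {
        return TOKEN.get();
    }
}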
Additional Tuning for Nexus + Selenium
There are a few other miscellaneous tricks we used on this project that are worth sharing, most of which are visible in the SeleniumTest source code and the corresponding SeleniumJUnitRunner, which is used by any test case extending SeleniumTest. In there we do a few things:
- Check for any mock assertion failures whenever a Selenium call is made.
- Capture an automatic screenshot of the browser upon any test failure (great for debugging).
- Capture the log of all network calls made during the test, including HTTP headers (also great for debugging).
- Utilize the grid.sonatype.org build farm for launching browsers remotely.
Checking Mock Assertions
Recall that in ChangePasswordTest we set mock expectations with the following code:
MockHelper.expect("/users_changepw", new MockResponse(Status.SUCCESS_NO_CONTENT, null) {
    @Override
    public void setPayload(Object payload) throws AssertionFailedError {
        UserChangePasswordRequest r = (UserChangePasswordRequest) payload;
        assertEquals("password", r.getData().getOldPassword());
        assertEquals("newPassword", r.getData().getNewPassword());
    }
});
What we didn't explain is that the MockResponse's setPayload() method was not being called by the test case itself, but rather via a callback managed by an entirely separate thread within the mock framework. This thread is the one that picks up the HTTP request from the browser and maps it to the mock response.
If an assertEquals() call fails within that thread, it will throw an AssertionFailedError, which is standard JUnit behavior. But because the error is being thrown outside of the scope of JUnit, JUnit won't know about it and therefore won't be able to mark the test as a failure.
To compensate for this, we needed the framework to capture assertion failures that happened in the mock environment, regularly check whether one was reported, and, if so, report it as a failure to JUnit. We did this by creating a dynamic proxy around the Selenium object that checks for captured assertions and throws any that were found before every Selenium call:
final Selenium original = new DefaultSelenium(...);

selenium = (Selenium) Proxy.newProxyInstance(..., new InvocationHandler() {
    @Override
    public Object invoke(Object p, Method m, Object[] args) throws Throwable {
        // check assertions on every remote call we do!
        MockHelper.checkAssertions();
        return m.invoke(original, args);
    }
});
This means that when we use the Page Object Pattern to indirectly interact with Selenium, each interaction quietly checks whether there were any assertion failures. If there were, the failure is re-thrown and reported in a way that JUnit can catch and report on.
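Extending the earlier MockHelper sketch, the hand-off between the two threads can be imagined as a single shared slot: the mock thread records the first failure, and checkAssertions() re-throws it on the test thread. The field and recordFailure() method are assumptions; only checkAssertions() is named in the real code:

import java.util.concurrent.atomic.AtomicReference;
import junit.framework.AssertionFailedError;

public class MockHelper {
    private static final AtomicReference<AssertionFailedError> failure =
            new AtomicReference<AssertionFailedError>();

    // Called on the mock framework's worker thread when setPayload() fails.
    static void recordFailure(AssertionFailedError e) {
        failure.compareAndSet(null, e); // keep only the first failure
    }

    // Called by the dynamic proxy before every Selenium command.
    public static void checkAssertions() {
        AssertionFailedError e = failure.getAndSet(null);
        if (e != null) {
            throw e;
        }
    }
}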
Capturing Screenshots and Network Logs Automatically
Another thing we did was make it easier for developers to debug what went wrong if a test failed by capturing a screenshot of the browser upon failure and by always capturing the network traffic between the browser and the mock Nexus web server.
This is where the SeleniumJUnitRunner class comes into play. In the world of JUnit, a "runner" is the thing responsible for executing test cases. If your class doesn't specify a runner (and most do not), the default BlockJUnit4ClassRunner is used, which looks for methods with the @Test annotation you're likely already familiar with. Our custom runner extends this class:
public class SeleniumJUnitRunner extends BlockJUnit4ClassRunner {
    public SeleniumJUnitRunner(Class<?> c) throws InitializationError {
        super(c);
    }

    @Override
    protected Statement methodInvoker(FrameworkMethod m, Object test) {
        if (!(test instanceof SeleniumTest)) {
            throw new RuntimeException("Only works with SeleniumTest");
        }

        final SeleniumTest stc = (SeleniumTest) test;
        stc.setDescription(describeChild(m));

        return new InvokeMethod(m, test) {
            @Override
            public void evaluate() throws Throwable {
                try {
                    super.evaluate();
                } catch (Throwable throwable) {
                    // grab a screenshot before letting the failure propagate
                    stc.takeScreenshot("FAILURE");
                    throw throwable;
                } finally {
                    // always capture the network log, pass or fail
                    stc.captureNetworkTraffic();
                }
            }
        };
    }
}
What is happening here is that we're overriding the way JUnit invokes a method in our test case. We're still letting it go through, but we're also catching and re-throwing any exception, with the addition of capturing a screenshot in between. We're also capturing the network traffic that ran from the browser.
Both of these methods are part of the SeleniumTest class that our test cases extend, and both use standard commands built into Selenium. Screenshots are taken with the selenium.captureScreenshotToString() command, which returns the screenshot as a Base64-encoded string. The network log is retrieved with the selenium.captureNetworkTraffic() command, which returns the traffic in XML, JSON, or plain text format. Note: for the captureNetworkTraffic() command to work, you must start Selenium like so:
selenium.start("captureNetworkTraffic=true");
This tells Selenium to route all browser traffic through its local proxy, which allows it to see the network traffic and capture it for retrieval later.
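For reference, here's roughly how those two commands might be used inside takeScreenshot() and captureNetworkTraffic(); the file name and the commons-codec Base64 helper are illustrative, not necessarily what the project does:

import java.io.FileOutputStream;
import org.apache.commons.codec.binary.Base64;

// Decode the Base64 screenshot string into a PNG on disk.
String encoded = selenium.captureScreenshotToString();
FileOutputStream out = new FileOutputStream("failure.png");
out.write(Base64.decodeBase64(encoded.getBytes()));
out.close();

// Retrieve the proxy's network log; "xml" and "json" are also accepted.
String traffic = selenium.captureNetworkTraffic("plain");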
Launching Browsers from the Sonatype Grid
The last thing we did was some work to make it easier to get started with the test framework. Normally, Selenium Remote Control requires that you run a "Selenium Server" either locally or on another computer:
java -jar selenium-server.jar
While this overhead is relatively small, it'd be nice if we could avoid it entirely. Another problem the Nexus developers have is that they are like most web developers today: they work on a Mac and test/develop locally with Firefox. As such, they don't have an easy way to launch tests that automate IE.
Fortunately, Sonatype has a large grid of machines running various operating systems with many different browsers installed. It was already being used to run Hudson build jobs, but it was clearly also perfectly capable of serving as a remote farm of browsers. So we decided to make test runs, from developer laptops to Hudson build jobs, all use the same set of browsers on the Sonatype grid.
We did this by dynamically opening up an SSH tunnel and port forwarding the necessary ports to talk to the Selenium Server that was already running on the remote machine, as well as to let the browser on the remote machine talk back to the hosted mock Nexus web server.
One big gotcha with this approach: if two separate developers run tests at the same time, they can't both open a remote port forward (pointing back to the mock Nexus web server) on the same port. To solve that, we used the port-allocator-maven-plugin developed by Sonatype, which finds a random, unused port to use with the SSH connection.
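The tunneling itself can be done with an SSH library such as JSch. A minimal sketch of the idea follows; the user name, host, ports, and key path are illustrative, not the project's actual configuration:

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

JSch jsch = new JSch();
jsch.addIdentity("/path/to/ssh_key"); // shared or personal private key
Session session = jsch.getSession("builder", "grid.sonatype.org", 22);
session.setConfig("StrictHostKeyChecking", "no");
session.connect();

// Local forward: talk to the Selenium Server already running on the grid node.
session.setPortForwardingL(4444, "localhost", 4444);

// Remote forward: let the remote browser reach our local mock Nexus server.
// The port would come from something like the port-allocator-maven-plugin.
int allocatedPort = Integer.getInteger("mock.port", 8081);
session.setPortForwardingR(allocatedPort, "localhost", 8081);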
One requirement of this model is that you must be connected to the internet to run tests. While we provide some command-line switches that let you run your tests against a local browser, the default assumes you're always online. Also, because we're making an SSH connection, some form of credentials is required: we went with a shared (private) SSH key, plus the option to use your own personal SSH key and a supplied password for that key. Check out the openTunnel() and seleniumSetup() methods in the SeleniumTest source code to see how it all works.
The result is that we now have a farm of machines that can easily be used by developer builds and Hudson builds, all without needing to set up local browsers or Selenium Servers (except for those in the Sonatype Grid itself). Developers can now write a test in their IDE on OS X, right click on it and select "Run Test", and have it drive an IE browser on a remote machine.
Conclusion
When I started this project and the Sonatype team suggested we mock out the entire UI, I was skeptical. I felt there were already more than enough challenges with building a Selenium framework for an application built on top of ExtJS and that trying to mock out all these RESTful calls would only make the project more complex. Fortunately, I was wrong!
Because the Nexus team had put so much effort into their headless integration tests, there was little need to test the backend through a user interface test. As such, we were given a rare opportunity to truly focus on the UI alone. This project is a testament to the value of writing unit tests, integration tests, and functional UI tests without necessarily embracing the overhead and complexity of trying to test them all at once.
I hope that this article gives you some ideas for testing your project with Selenium, whether it's the use of the Page Object Pattern or coming up with creative ways to focus your testing efforts on the things Selenium does well (UI testing) while using other techniques for the things it isn't necessarily best at (functional and unit testing).