By Simon Stewart

It's a complaint that I've heard too many times to ignore: "My Selenium tests are unstable!" The tests are flaky. Sometimes they work, sometimes they don't. How deeply, deeply frustrating! Once the tests have been like this for a while, people start to ignore them, and all the hard work and potential value they could offer a team in catching bugs and regressions is lost. It's a shameful waste, but it doesn't have to be.

Firstly, let's state clearly: Selenium is not unstable, and your Selenium tests don't need to be flaky. The same applies to your WebDriver tests.

Of course, this raises the obvious question of why so many Selenium tests fail to do what you intended them to. There are a number of common causes of flaky tests that I've observed, and I'd like to share them with you. If your (least) favourite bugbear isn't here, please tell us about it, and how you would approach fixing it, in a comment on this post!

Problem: Poor test isolation.
Example: Tests log in as the same user and make use of the same fixed set of data.
Symptom: The tests work fine when run alone, but fail "randomly" during the build.
Solution: Isolate resources as much as makes sense. Set up data within the tests to avoid relying on a "set up the datastores" step in your build (possibly using the Builder pattern). You may want to think about setting up a database per developer (or using something like Hypersonic or SQLite as a lightweight, in-memory, private database). If your application requires users to log in, create several user accounts that are reserved just for your tests, and provide a locking mechanism to ensure that only one test at a time is using a particular user account.
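To make the locking idea concrete, here's a minimal Java sketch. The TestAccountPool class and the account names are invented for illustration, and a plain in-memory queue like this only works when all the tests run in a single JVM; a build that spreads tests across processes or machines would need a shared lock instead, such as a database row or a file lock.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool of user accounts reserved exclusively for the tests.
// Each test checks an account out, uses it, and hands it back, so no two
// tests are ever logged in as the same user at the same time.
public class TestAccountPool {
  private final BlockingQueue<String> accounts =
      new ArrayBlockingQueue<String>(3);

  public TestAccountPool() {
    accounts.add("selenium-test-user-1");
    accounts.add("selenium-test-user-2");
    accounts.add("selenium-test-user-3");
  }

  // Blocks until an account is free, guaranteeing exclusive use.
  public String acquire() throws InterruptedException {
    return accounts.take();
  }

  // Called from the test's tearDown to return the account to the pool.
  public void release(String account) {
    accounts.add(account);
  }
}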

Problem: Relying on flaky external services.
Example: Using production backends, or relying on infrastructure outside of your team's control.
Symptom: All tests fail due to the same underlying cause.
Solution: Don't rely on external services that your team don't control. This may be easier said than done, because of the risk of blowing out build times and the difficulty of setting up an environment that models reality closely enough to make the tests worthwhile. Sometimes it makes sense to start servers in-process, using something like Jetty in the Java world, or webrick in Ruby.
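To show the shape of the in-process approach without pulling in a full web server, here's a rough Java sketch that uses the JDK's built-in HttpServer as a stand-in for the external service; the /api/status path and the response body are made up, and Jetty or webrick would be used the same way in practice.

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Start a throwaway HTTP stub inside the test process so the tests never
// depend on infrastructure outside the team's control.
public class InProcessStubService {
  public static void main(String[] args) throws Exception {
    // Port 0 asks the OS for any free port, so parallel builds don't collide.
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
    server.createContext("/api/status", exchange -> {
      byte[] body = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      OutputStream out = exchange.getResponseBody();
      out.write(body);
      out.close();
    });
    server.start();

    String baseUrl = "http://localhost:" + server.getAddress().getPort();
    System.out.println("Point the application under test at " + baseUrl);

    // ... run the tests against baseUrl, then shut the stub down:
    server.stop(0);
  }
}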

Watching the tests run is a great way to spot these external services. For example, on one project the tests were periodically timing out, even though the content was being served to the browser. Watching the tests run showed the underlying problem: we were serving "fluff", extra content from an external service in an iframe. This content sometimes failed to load in time, and even though it wasn't needed for our tests, the fact that it hadn't finished loading was causing the timeouts. The solution was simply to block the unnecessary fluff by modifying the firewall rules on the Continuous Build machine. Suddenly, everything ran that little bit more smoothly!

Another way to minimize the flakiness of these tests is to perform a "health check" before running the tests. Are all the services your tests rely on running properly? Given that end-to-end tests tend to run for a long time, and may place an unusual load on a system, this isn't a foolproof approach, but it's better not to run the tests at all than to give a team "false negatives".
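A health check doesn't need to be clever; something like the following Java sketch, run before the suite starts, is often enough. The URLs here are placeholders for whatever your own tests depend on.

import java.net.HttpURLConnection;
import java.net.URL;

// Pre-flight check: confirm each dependency answers at all before starting a
// long end-to-end run, rather than letting every test time out one by one.
public class HealthCheck {
  static boolean isHealthy(String url) {
    try {
      HttpURLConnection connection =
          (HttpURLConnection) new URL(url).openConnection();
      connection.setConnectTimeout(5000);
      connection.setReadTimeout(5000);
      return connection.getResponseCode() == 200;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Placeholder URLs: substitute the services your tests actually rely on.
    String[] dependencies = {
        "http://localhost:8080/healthz",
        "http://auth.example.test/ping"
    };
    for (String url : dependencies) {
      if (!isHealthy(url)) {
        throw new IllegalStateException(
            "Dependency is not healthy, skipping the test run: " + url);
      }
    }
  }
}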

Problem: Timeouts are not long enough.
Example: You wait 10 seconds for an AJAX request that takes 15 to complete.
Symptom: Most of the time the tests run fine, but under load or exceptional circumstances they fail.
Solution: The underlying problem here is that we're trying to predict how long a non-deterministic operation will take, and it's just not possible to know this in advance. The most sensible thing to do is not to use timeouts. Or rather, do use them, but set them generously and combine them with a notification from the UI under test that actions have finished, so that the test can continue as soon as possible.

For example, it's not hard to change the production code to set a flag on the global JavaScript "window" object when an XMLHttpRequest returns, and that flag can form the basis of a simple latch. Rather than polling the UI, you can then just wait for the flag to be set. Alternatively, if your UI gives an unambiguous "I'm done" signal, poll for that. Frameworks such as Selenium RC and WebDriver provide helper classes that make this significantly easier.
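As a rough illustration, assuming the application sets a (hypothetical) window.ajaxComplete flag when its XMLHttpRequest callback fires, a WebDriver test could wait on it like this, using the Duration-based WebDriverWait constructor from recent Selenium releases. The generous 60-second ceiling only matters when something has genuinely gone wrong, because the wait returns the moment the flag appears.

import java.time.Duration;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.WebDriverWait;

public class AjaxLatch {
  // Wait for the flag the production code sets when its AJAX call returns.
  // "window.ajaxComplete" is an invented name; use whatever signal your
  // application actually exposes.
  public static void waitForAjax(WebDriver driver) {
    new WebDriverWait(driver, Duration.ofSeconds(60)).until(d ->
        Boolean.TRUE.equals(((JavascriptExecutor) d)
            .executeScript("return window.ajaxComplete === true;")));
  }
}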

Problem: Timeouts are too long.
Example: Waiting for a page to load by polling for a piece of text, only to have the server throw an exception and return a 500 or 404 error, so the text never appears.
Symptom: Your tests keep timing out, probably taking your Continuous Build with them.
Solution: Don't just poll for your desired end-condition; also poll for well-known error conditions, and fail the test with an informative error message when you see one. WebDriver's SlowLoadableComponent has an "isError" method for exactly this reason. You can push the same additional checks into a normal Wait for Selenium RC too.
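Here's one rough way to do that by hand in Java. The locators for the success message and the error markers are invented for the example; the same structure works wherever you can cheaply check for the presence of an element.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class PollForSuccessOrError {
  // Poll for the desired end-condition, but bail out immediately with an
  // informative message if a well-known error marker shows up instead.
  public static void waitForWelcomePage(WebDriver driver, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (!driver.findElements(By.id("welcome-message")).isEmpty()) {
        return;  // The page we wanted has arrived.
      }
      if (!driver.findElements(By.cssSelector(".error-page, #stack-trace")).isEmpty()) {
        throw new AssertionError(
            "Server returned an error page while waiting for the welcome message: "
                + driver.getTitle());
      }
      Thread.sleep(500);
    }
    throw new AssertionError("Timed out waiting for the welcome message");
  }
}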

The underlying message: when your tests are flaky, do some root cause analysis to understand why they're flaky. It's very seldom because you've uncovered a bug in the test framework. For this sort of analysis and test-stability work to be done effectively, you may well need support and help from your team. If you're working on your own, or in a small team, that may not be too hard to get. On a large project, it may be harder. I've had some success when one or two people are set aside from delivering functionality to work on making the tests more stable. The short-term pain of not having that extra pair of hands writing production code is more than made up for by the long-term benefit of a stable and effective suite of end-to-end tests that only fail when there's a real issue to be addressed.