My Selenium Tests Aren't Stable!
Tuesday, June 02, 2009
By Simon Stewart
It's a complaint that I've heard too many times to ignore: "My Selenium tests are unstable!" The tests are flaky. Sometimes they work, sometimes they don't. How deeply, deeply frustrating! After the tests have been like this for a while, people start to ignore them, meaning all the hard work and potential value that they could offer a team in catching bugs and regressions is lost. It's a shameful waste, but it doesn't have to be.
Firstly, let's state clearly: Selenium is not unstable, and your Selenium tests don't need to be flaky. The same applies for your WebDriver tests.
Of course, this raises the obvious question as to why so many Selenium tests fail to do what you intended them to. There are a number of common causes for flaky tests that I've observed, and I'd like to share these with you. If your (least) favourite bugbear isn't here, please tell us about it, and how you would like to approach fixing it, in a comment to this post!
Problem: Poor test isolation.
Example: Tests log in as the same user and make use of the same fixed set of data.
Symptoms: The tests work fine when run alone, but fail "randomly" during the build.
Solution: Isolate resources as much as makes sense. Set up data within the tests to avoid relying on a "set up the datastores" step in your build (possibly using the Builder pattern). You may want to think about setting up a database per developer (or using something like Hypersonic or SQLite as a lightweight, in-memory, private database). If your application requires users to log in, create several user accounts that are reserved for just your tests, and provide a locking mechanism to ensure that only one test at a time is using a particular user account.
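For the locking mechanism, something as simple as a shared pool of reserved accounts is often enough. Here's a minimal sketch in Java, assuming all the tests run in a single JVM; the class and the account names are made up for illustration, and tests spread across machines would need a shared lock (a database row, say) instead.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical pool of accounts reserved exclusively for the test suite. */
public class TestAccounts {
    private static final BlockingQueue<String> POOL = new LinkedBlockingQueue<String>();

    static {
        POOL.add("selenium-test-user-1");
        POOL.add("selenium-test-user-2");
        POOL.add("selenium-test-user-3");
    }

    /** Blocks until an account is free, so only one test uses it at a time. */
    public static String acquire() throws InterruptedException {
        return POOL.take();
    }

    /** Returns the account to the pool, typically from a tearDown method. */
    public static void release(String account) {
        POOL.add(account);
    }
}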
Problem: Relying on flaky external services.
Example: Using production backends, or relying on infrastructure outside of your team's control
Symptom: All tests fail due to the same underlying cause.
Solution: Don't rely on external services that your team don't control. This may be easier said than done, because of the risk of blowing out build times and the difficulty of setting up an environment that models reality closely enough to make the tests worthwhile. Sometimes it makes sense to start servers in-process, using something like Jetty in the Java world, or WEBrick in Ruby.
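As a rough illustration, here is a minimal sketch of starting the application under test in an embedded Jetty server. The classes are from Jetty's embedded API, though package names and constructors vary between Jetty versions, and the WAR path and port are made up.

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.webapp.WebAppContext;

public class InProcessServer {
    private final Server server = new Server(8080); // port is arbitrary

    public void start() throws Exception {
        WebAppContext app = new WebAppContext();
        app.setWar("target/my-app.war"); // hypothetical path to the app under test
        app.setContextPath("/");
        server.setHandler(app);
        server.start(); // tests can now point at http://localhost:8080/
    }

    public void stop() throws Exception {
        server.stop();
    }
}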
Watching the tests run is a great way to spot these external services. For example, on one project the tests were periodically timing out, even though the content was being served to the browser. Watching the tests run showed the underlying problem: we were serving "fluff", extra content from an external service in an iframe. This content was sometimes not loading in time, and even though it wasn't necessary for our tests, the fact that it hadn't finished loading was causing the problem. The solution was simply to block the unnecessary fluff by modifying the firewall rules on the Continuous Build machine. Suddenly, everything ran that little bit more smoothly!
Another way to minimize the flakiness of these tests is to perform a "health check" before running the tests. Are all the services your tests rely on running properly? Given that end-to-end tests tend to run for a long time, and may place an unusual load on a system, this isn't a fool-proof approach, but it's better to not run the tests at all rather than give a team "false negatives".
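A health check needn't be elaborate. Here is a minimal sketch that pings each dependent service over HTTP before the suite starts; the service URLs are hypothetical, and a failed check aborts the run with a clear message rather than letting dozens of tests time out.

import java.net.HttpURLConnection;
import java.net.URL;

public class HealthCheck {
    /** Returns true if the service answers an HTTP HEAD with a non-error status. */
    static boolean isHealthy(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode() < 400;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical services the tests depend on.
        String[] services = {"http://auth.example.test/status", "http://search.example.test/status"};
        for (String service : services) {
            if (!isHealthy(service)) {
                System.err.println("Not running tests: " + service + " is not responding");
                System.exit(1);
            }
        }
    }
}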
Problem: Timeouts are not long enough
Example: You wait 10 seconds for an AJAX request that takes 15 to complete
Symptom: Most of the time the tests run fine, but under load or exceptional circumstances they fail.
Solution: The underlying problem here is that we're attempting to determine how long something that lasts a non-deterministic amount of time will take. It's just not possible to know this in advance. The most sensible thing to do is not to use timeouts. Or rather, do use them, but set them generously and use them in conjunction with a notification from the UI under test that actions have finished so that the test can continue as soon as possible.
For example, it's not hard to change the production code to set a flag on the global JavaScript "window" object when an XMLHttpRequest returns, and that could form the basis of a simple latch. Rather than polling the UI, you can then just wait for the flag to be set. Alternatively, if your UI gives an unambiguous "I'm done" signal, poll for that. Frameworks such as Selenium RC and WebDriver provide helper classes that make this significantly easier.
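As an illustration, here is a minimal sketch of the waiting side using the Java bindings, assuming the production JavaScript sets window.ajaxComplete = true when its XMLHttpRequest callback fires. The flag name is made up, and WebDriverWait's constructor differs between Selenium versions.

import java.time.Duration;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.WebDriverWait;

public class AjaxWaits {
    /** Generous ceiling, but returns as soon as the page says it is done. */
    public static void waitForAjax(WebDriver driver) {
        new WebDriverWait(driver, Duration.ofSeconds(30)).until(d ->
            (Boolean) ((JavascriptExecutor) d)
                .executeScript("return window.ajaxComplete === true;"));
    }
}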
Problem: Timeouts are too long
Example: Waiting for a page to load by polling for a piece of text, only to have the server throw an exception and give a 500 or 404 error and for the text to never appear.
Symptom: Your tests keep timing out, probably taking your Continuous Build with them.
Solution: Don't just poll for your desired end condition; also poll for well-known error conditions. Fail the test with an informative error message when you see the error condition. WebDriver's SlowLoadableComponent has an "isError" method for exactly this reason. You can push the additional checks into a normal Wait for Selenium RC too.
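In the Java bindings, that dual check can be as simple as the following sketch, which polls for the results while failing fast on a well-known error page; the locators are hypothetical.

import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.WebDriverWait;

public class WaitOrFail {
    public static void waitForResults(WebDriver driver) {
        new WebDriverWait(driver, Duration.ofSeconds(60)).until(d -> {
            // Fail immediately with a useful message if the error page appears.
            if (!d.findElements(By.id("error-page")).isEmpty()) {
                throw new AssertionError("Server returned an error page instead of results");
            }
            // Otherwise keep polling until the results element shows up.
            return !d.findElements(By.id("results")).isEmpty();
        });
    }
}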
The underlying message: When your tests are flaky, do some root cause analysis to understand why they're flaky. It's very seldom because you've uncovered a bug in the test framework. In order for this sort of analysis and test-stability improvement work to be done effectively, you may well need support and help from your team. If you're working on your own, or in a small team, this may not be too hard. On a large project, it may be harder. I've had some success when a person or two is set aside from delivering functionality to work on making the tests more stable. The short-term pain of not having that extra pair of hands focusing on writing production code is more than made up for by the long-term benefit of a stable and effective suite of end-to-end tests that only fail when there's a real issue to be addressed.
I have seen a Selenium test suite (for a very AJAXy application) that had random, unexplainable, nondeterministic failures. Actions like "wait for element with id=foobar to appear" followed by the immediate next action "click on element with id=foobar" would suddenly say it couldn't find the element any more, but only sometimes and not always.
People spent hours trying to find the root cause, without any luck.
With this type of error, particularly with an AJAXy application, you want to use 'wait for element to be clickable' rather than 'wait for element to appear'.
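In the WebDriver Java bindings this roughly corresponds to waiting on ExpectedConditions.elementToBeClickable, as in this minimal sketch; the timeout is arbitrary and WebDriverWait's constructor varies between Selenium versions.

import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class ClickWhenClickable {
    public static void click(WebDriver driver, By locator) {
        WebElement element = new WebDriverWait(driver, Duration.ofSeconds(30))
            .until(ExpectedConditions.elementToBeClickable(locator));
        element.click(); // act on the element the wait returned
    }
}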
You missed XPath brittleness when the UI changes with each dev build.
How do I ensure that with each DEV build, though my XPath changes, I can successfully run my tests?
Do you have any solution?
"It's very seldom because you're uncovered a bug in the test framework."
That may be so, but browsing through their list of bugs turns up no small amount of unresolved "flakiness" issues.
Selenium is a great tool, but it's ironic that one of my testing tools should be the buggiest of all the dev tools I use...
While refactoring the automation for my project, I found that adhering to good practices resolves most automation issues:
1. XPath:
If IDs are available, use them. Otherwise, instead of relying on XPath constructed from the raw HTML structure:
static final String DISPLAY_NAME = "//div[@id='body']/div[4]/div[1]/div/div[1]/h4";
static final String EMAIL_ADDRESS = "//html/body/div/div/div/div/span";
The following is more resilient to minor changes in the GUI:
static final String DISPLAY_NAME = "//div[@id='body']//div[@class='innerleft']/h4";
static final String EMAIL_ADDRESS = "//div[@id='global-info']/span";
Use @class, text(), contains(), etc. to create more stable locators.
2. Instead of "flaky", I prefer the term "maintainability". Sometimes the debug information is sparse or confusing and does not lead us to the actual issue quickly.
Ex:
20:04:58.539 INFO - Command request: isElementPresent[//a[text()='Remove image'], ] on session ece3c0c368e74594ab788751e13cea96
If the command fails to identify the element (timeout), there is no clue as to which page the above element belongs to.
Along with this, I suggest that we can have a high-level log as below:
[UI Flow] Is Element Present >> EditProfilePage.RemoveImageLink
20:04:58.539 INFO - Command request: isElementPresent[//a[text()='Remove image'], ] on session ece3c0c368e74594ab788751e13cea96
3. There is always a temptation to add 'sleep(...)' in the code to get the automation running locally. I suggest that any test code with a sleep should not be allowed to be checked in. Instead, use 'waitForElementsToLoad(...)' to wait for the actual element to be loaded before continuing with the automation.
Another practice which I found cleaner and also resolved issues is to keep the test scenario code away from the logic of waiting for a GUI element to load. This needs to be handled within the abstraction of the page itself.
Ex:
(1)
public void clickBloggerBuzz() {
    click(Ui.BUZZ_LINK);
    // Add the waitForElementsToLoad() here
}
(2) Page constructors by default should wait for key element to be loaded before it returns.
public DashboardPage(Selenium selenium) throws MalformedURLException {
    super(selenium);
    waitForElementsToLoad(Ui.EDIT_PHOTO_LINK);
}
I love Selenium!
Wish I could get my head around this BDD though :(
I wrote an article on how we’ve been testing an ExtJS UI in Java at:
http://www.xeolabs.com/portal/articles/selenium-and-extjs
Primarily, it describes how we got around having to assign hand-crafted IDs to all the Ext components (taking care of the fragile XPath problem V. Narayan Raman mentioned) and how we synchronise the tests with AJAX.
Hope it helps somebody,
Lindsay
I have been working with Selenium for at least 3 years, and I have to admit it is quite buggy, forcing me to submit patches from time to time.
One of the problems with Selenium is that bugs are fixed very slowly, even when you submit a patch.
Another problem is that the code is not checked with an automated tool like JSLint.
The most annoying problem is that Selenium randomly detects the loading of iframes.
Like Marius, I have a test suite that works fine on 2 computers but fails on 2 others, and all the computers are quite different.
I'm still searching for the bug after having already spent a lot of time on it...
I also found some problems related to Firefox, like the fact that retrieving window.document.body sometimes generates an exception!
Great post Simon. I think you hit the nail on the head. I will add to the list of poor test isolation with a few (fairly obvious) examples:
a) sharing data between tests
b) running tests without proper cleanup
c) running tests in parallel where current actions in one test might create unexpected outcomes in another
The writer is glossing over how buggy and unstable Selenium can be, especially in an automated build environment. As another commenter stated, simple calls to access elements will succeed one time but fail at others.
Amen. I've run into any number of scenarios where the Selenium WebDriver just ... stops. Everything looks fine, the browser is doing what it's supposed to be doing, but Selenium has somehow stopped interacting with the test runner. Running the whole thing again, the test that failed last time will work this time, but some other test will fail (or maybe the whole batch will pass). Maybe it's my code, but I've never been able to isolate it. (And yeah, I'm following all the best practices he describes.)
DeleteAs another commentor stated, simple calls to access elements will succeed one time but fail others.
Cannot agree more! Selenium creates more problems and produces more false alerts that I have to deal with than the actual coding does.
Great Post.
At the beginning of reading this post, I had hoped that I would find some suggestion/feedback/solution which would make my test suite stable.
But still I’m finding myself in same point as I was before reading this post.
I agree with jcmeyrignac's and Eric Jain's comments.
Here is one of the latest bugs from the Selenium guys: http://code.google.com/p/chromedriver/issues/detail?id=121. This clearly shows that there is no backward-compatibility check before releasing a Selenium version.
If things don't get stable, I guess there will soon be another open source testing tool to compete with Selenium.
I agree with Marius Gedminas, Larry Port, & Xun. My primary cause of flakiness is waitForElement (or a similar wait) succeeding and then the next action I take on the element failing to locate it, even if I try to ensure it is present & visible.
Did you ever find a solution to this?
Same here. I can run a test and it fails to find an element, yet it really does find it and the test succeeds in the end, but it always has those failures.
Is there any way to tell Selenium to ignore those failures when running as an htmlSuite?
Did you by chance find a solution to this? We are experiencing something similar and just cannot resolve it...
Facing the same problem here too... Many tests fail when we run them in Jenkins, but almost all tests pass when we run them locally. No idea why it happens like that.
ReplyDeleteHi All,
I am facing an issue while running Selenium Java code using WebDriver. I tried on both Linux and Windows and both throw the same error. I tried to search for a solution on the internet, but most of the suggestions were to upgrade the Selenium files. I am using the latest version of Selenium and Firefox. Below is the error I am facing. Please help me resolve the issue.
org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales":["sv-SE"],"name":"Ubuntu Modifications","description":"Ubuntu-paket för Firefox.","creator":"Canonical Ltd.","homepageURL":null},{"locales"