Exploratory Testing: More than Superficial Bug Hunting

Sometimes people define exploratory testing (ET) quite narrowly, such as a short-term bug hunt in a finished application. I don’t define ET that narrowly; in fact, I do ET during development whether I have a user interface to use or not. I often ask for, and get, some sort of testing interface that I can use to design and execute tests before a UI appears. I’ll also execute traditional black-box testing through a UI as a product is being developed, and at the end, when development feels it is “code complete”. I’m not alone. Cem Kaner mentioned this on the context-driven testing mailing list, which prompted this blog post.
Cem wrote:

To people like James and Jon Bach and me and Scott Barber and Mike Kelly and Jon Kohl, I think the idea is that if you want useful exploratory testing that goes beyond the superficial bugs and the ones that show up in the routine quicktests (James Whittaker’s attacks are examples of quicktests), then you want the tester to spend time finding out more about the product than its surface and thinking about how to fruitfully set up complex tests. The most effective exploratory testing that I have ever seen was done by David Farmer at WordStar. He spent up to a week thinking about, researching, and creating a single test, which then found a single showstopper bug. On this project, David developed exploratory scenario tests for a database application for several months, finding critical problems that no one else on the project had a clue how to find.

In many cases, when I am working on a software development project, a good deal of analysis and planning go into my exploratory testing efforts. The strategies I outline for exploratory testing reflect this. Not only can they be used as thinking strategies in the moment, at the keyboard, testing software, but they can also guide my preparation work prior to exploratory testing sessions. Sometimes, I put a considerable amount of thought and effort into modeling the system, identifying potential risk areas, and designing tests that yield useful results.

In one case, a strange production bug occurred in an enterprise data aggregation system every few weeks. It would last for several days, and then disappear. I spent several days researching the problem, and learned that the testing team had only load tested the application through the GUI, while the real load came through the various aggregation points feeding data in. I had a hunch that several factors were at work, and it took time to analyze them. It took several more days working with a customer support representative who had worked on the system for years before I had enough information to work with the rest of the team on test design. We needed to simulate not only the load on the system, but the amount of data that might be processed and stored over a period of weeks. I worked with the lead developer and the lead system administrator to create a home-grown load-generation tool we could run indefinitely to simulate production events, along with the related network infrastructure.

While the lead developer was programming the custom tool and the system administrator was finding old equipment to set up a testing environment, I created test scenarios against the well-defined, public Web Services API, and used a web browser library that I could run in a loop to generate additional light load.
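The browser-driven load loop was simple in structure. Here is a minimal sketch of the idea; the class and method names are hypothetical, and the page fetch is stubbed out (the real loop drove a web browser library against the public Web Services API):

```java
// Hypothetical sketch of a light-load loop; not the actual tool.
public class LightLoadLoop {

    // Stub standing in for the browser-library page fetch.
    static int fetchPage(String url) {
        // A real implementation would issue an HTTP GET and return the status code.
        return 200;
    }

    // Run a fixed number of passes, counting successful fetches.
    static int run(String url, int iterations) {
        int ok = 0;
        for (int i = 0; i < iterations; i++) {
            if (fetchPage(url) == 200) {
                ok++;
            }
            // In a real run, a short sleep here would pace the load.
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(run("http://example.test/api", 50));
    }
}
```

The point is not the code, which is trivial, but that even a crude loop like this generates steady background traffic the GUI testing never produced.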

Once we had completed all of these tasks, started the new simulation system, and waited for the data and traffic statistics to reach the levels I wanted, I began testing. After executing our first exploratory test, the system fell over, and it took several days for the programmers to create a fix. During this time, I did more analysis and we tweaked our simulation environment. I repeated this with the help of my team for several weeks, and we found close to a dozen show-stopping bugs. When we were finished, we had an enhanced, reusable simulation environment we could use for all sorts of exploratory testing. We also figured out how to generate the required load in hours rather than days with our home-grown tools.

I also did this kind of thing with an embedded device that was under development. I asked the lead programmer to add a testable interface into the new device he was creating firmware for, so he added a telnet library for me. I used a Java library to connect to the device over telnet, copied all the machine commands out of the API spec, and wrapped them in JUnit tests in a loop. I then created code to allow for interactive testing against the API. The first time I ran a test with a string of commands in succession in the IDE, the device failed because it was writing to the input and reading from the output. This caused the programmer to scratch his head, chuckle, and say: “so that’s how to repeat that behavior…”
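The command-loop harness can be sketched roughly like this. The telnet connection is stubbed out here, and the command strings and `sendCommand` method are hypothetical stand-ins; the real harness used a Java telnet library and ran each command as a JUnit test:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommandLoopSketch {

    // Stub standing in for the telnet connection; the real harness used a
    // Java telnet library to talk to the device firmware.
    static String sendCommand(String cmd) {
        // Hypothetical echo response; real firmware returned status lines.
        return "OK " + cmd;
    }

    // Run machine commands (copied from the API spec) in succession,
    // the way the JUnit loop did, collecting each response for inspection.
    static List<String> runAll(List<String> commands) {
        List<String> responses = new ArrayList<>();
        for (String cmd : commands) {
            responses.add(sendCommand(cmd));
        }
        return responses;
    }

    public static void main(String[] args) {
        List<String> commands = Arrays.asList("STATUS", "RESET", "READ");
        runAll(commands).forEach(System.out::println);
    }
}
```

Running commands back-to-back like this, rather than one at a time by hand, is exactly what surfaced the input/output confusion in the firmware.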

It took several design sessions with the programmer, and a couple of days of my time, to set up an environment for exploratory testing against a non-GUI interface using Eclipse, a custom Java class, and JUnit. Once that was completed, the other testers used it interactively within Eclipse as well. We also used a simulator that a test toolsmith had created for us to great effect, and were able to do tests we just couldn’t do manually.

We also spent about a week creating test data that we piped in from real-life scenarios (which also took a lot of effort to create, but was well worth it). The test data creation taught us a good deal about the domain the device would work in.

Recently, I had a similar experience: I was working with a programmer who was porting a system to a Java Enterprise Edition stack and adding a messaging service (JMS). I had been advocating testability (visibility and control; thanks, James) in the design meetings I had with the programmer. As a result, he decided to use a JMS topic instead of a queue so that we could see what was going on more easily, and added support for the testers to see what the Object-Relational Mapping tool (JPA) was automatically generating, both mappings and SQL, at run-time. (By default, all you see is the connection information and annotations in Java code, which doesn’t help much when there is a problem.)
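The visibility argument for a topic over a queue is easy to show in miniature. This is not the JMS API; it is a hypothetical in-memory publish/subscribe sketch illustrating that a passive test subscriber sees every message without stealing it from the application consumer, which a queue’s point-to-point delivery would not allow:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory pub/sub, NOT the JMS API: every subscriber
// receives every published message.
class Topic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();
    void subscribe(Consumer<String> c) { subscribers.add(c); }
    void publish(String msg) { subscribers.forEach(s -> s.accept(msg)); }
}

public class TopicVisibilitySketch {

    // Returns what the app consumer and the test observer each saw.
    static String[] demo() {
        Topic topic = new Topic();
        List<String> seenByApp = new ArrayList<>();
        List<String> seenByTester = new ArrayList<>();
        topic.subscribe(seenByApp::add);     // the application consumer
        topic.subscribe(seenByTester::add);  // a passive test observer
        topic.publish("order-created");
        // With a queue's point-to-point delivery, only one of the two
        // consumers would have received the message.
        return new String[] { seenByApp.get(0), seenByTester.get(0) };
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println(seen[0] + " " + seen[1]);
    }
}
```

With a queue, attaching a test consumer would compete with the application for messages and change the system’s behavior; with a topic, the testers get a window into the traffic for free.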

He also created a special testing interface for us, and provided me with a simple URL that passes arguments to begin exercising it. For my first test, I used JMeter to send messages to it asynchronously, and the system crashed. This API was so far below the UI that it would have been difficult to do much more than scratch the surface of the system by testing through the GUI alone. With this testable interface, I could use several testing tools as simulators to help drive my ET sessions and those of the other testers. Without the preparation through design sessions, we’d be trying to test this through the UI, and wouldn’t have nearly the power or flexibility in our testing.
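That first crash came from concurrency the UI could never generate. A rough sketch of the kind of asynchronous hammering involved; the endpoint here is a stub, and in the real test JMeter’s thread groups sent the messages to the testing URL:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncLoadSketch {

    // Stub for the testing interface; the real target was a URL that
    // passed its arguments straight through to the messaging layer.
    static final AtomicInteger handled = new AtomicInteger();

    static void handleRequest(String payload) {
        handled.incrementAndGet();
    }

    // Fire n requests concurrently, the way JMeter thread groups do,
    // so messages arrive asynchronously and out of order.
    static int fire(int n) {
        handled.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < n; i++) {
            final int id = i;
            pool.submit(() -> handleRequest("msg-" + id));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(fire(100));
    }
}
```

A single tester clicking through a GUI serializes everything; eight threads firing at once exercise the locking and ordering paths that matter in production.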

Some people complain that exploratory testing only seems to focus on the user interface. That isn’t the case. In some of my roles, early in my career, I was designated the “back end” tester because I had basic programming and design skills. The less technical testers, who had more knowledge of the business, tested through the UI. I had to get creative in asking for and using testable interfaces for ET. I found a place in the middle that facilitated integration tests while simulating a production environment, which was much faster than trying to do all the testing through the UI.

I often end up working with programmers to get some combination of tools to simulate the kinds of conditions I’m thinking about for exploratory testing sessions, with the added benefit of hitting some sort of component in isolation. These testing APIs allow me to do integration tests in a production-like environment, which complements the unit testing the programmers are doing, and the GUI-level testing the black box testers are doing. In most cases, the programmers also adopt the tools and use them to stress their components in isolation, or, as I often do, to quickly generate test data through a non-user interface while still exercising the path the data will follow in production. This is a great way to smoke test minor database changes, or database driver or other related tool upgrades. Testing something like this through the UI alone can take forever, and many of the problems that are obvious at the API level seem intermittent through the UI.
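Generating test data through the non-UI path can be as simple as looping over the same API the production traffic uses. A hypothetical sketch, where `createRecord` stands in for whatever call the testing API exposes:

```java
import java.util.ArrayList;
import java.util.List;

public class ApiDataGenerator {

    // Stub for the testing API call; in a real setup this would follow
    // the same path the data takes in production, down to the database.
    static String createRecord(int id) {
        return "record-" + id;
    }

    // Quickly generate a batch of records without touching the UI.
    static List<String> generate(int count) {
        List<String> created = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            created.add(createRecord(i));
        }
        return created;
    }

    public static void main(String[] args) {
        System.out.println(generate(5));
    }
}
```

A few hundred records this way takes seconds; entering the same data through forms would take hours, and still might miss the code path a driver upgrade actually changed.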

Exploratory testing is not limited to quick, superficial bug hunts. Learning, analyzing, test idea generation, and test execution are parallel activities, but sometimes we need to focus harder on the learning and analyzing. I frequently spend time with programmers helping them design testable interfaces to support exploratory testing at a layer behind the GUI. This takes preparation work, including analysis, design, and testing of the interface itself, all of which feeds into my learning about the system and into the test ideas I may generate. I don’t do all of my test idea generation on the fly, in front of the keyboard.

In other cases, I have tested software that was developed for very specialized use. In one case, the software was developed by scientists to be used by scientists. It took months to learn how to do the most basic things the software supported. I found some bugs in that period of learning, but I was able to find much more important bugs after I had a basic grasp of the fundamentals of the domain the software operated in. Jared Quinert has also had this kind of experience: “I’ve had systems where it took 6 months of learning before I could do ‘real’ testing.”