Category Archives: test automation

Testing Debt

When I’m working on an agile project, (or any process using an iterative lifecycle), an interesting phenomenon occurs. I’ve been struggling to come up with a name for it, and conversations with Colin Kershaw have helped me settle on “testing debt”. (Note: Johanna Rothman has touched on this before, she considers it to be part of technical debt.) Here’s how it works:

  • in iteration one, we test all the stories as they are developed, and are in synch with development
  • in iteration two, we remain in synch testing stories, but when we integrate what has been developed in iteration one with the new code, we now have more to test than just the stories developed in that iteration
  • in iteration three, we have the stories to test in that iteration, plus the integration of the features developed in iterations that came before

As you can see, integration testing piles up. Eventually, we have so much integration testing to do as well as story testing, we have to sacrifice one or the other because we are running out of time. To end the iteration (often two to four weeks in length) some sort of testing needs to be cut in this iteration to be looked at later. I prefer keeping in synch with development, so I consciously incur “integration testing debt”, and we schedule time at the end of development to test a completed system.

Colin and I talked about this, and we explored other kinds of testing we could be doing. Once we had a sufficiently large list of testing (unit testing, “ility” testing, etc.), it became clear that the “testing debt” was more appropriate than “integration testing debt”.

Why do we want to test that much? As I’ve noted before, we can do testing in three broad contexts: the code context (addressed through TDD), the system context and the social context. The social context is usually the domain of conventional software testers, and tends to rely on testing through a user interface. At this level, the application becomes much more complex, greater than the sum of its parts. As a result, we have a lot of opportunity for testing techniques to satisfy coverage. We can get pretty good coverage at the code level, but we end up with more test possibilities as we move towards the user interface.

I’m not talking about what is frequently called “iteration slop” or “trailer-hitched QA” here. Those occur when development is done, and testing starts at the end of an iteration. The separate QA department or testing group then takes the product and deems it worthy of passing the iteration after they have done their testing in isolation. This is really still doing development and testing in silos, but within an iterative lifecycle.

I’m talking about doing the following within an iteration, alongside development:

  • work as a sounding board with development on emerging designs
  • help generate test ideas prior to story development (generative TDD)
  • help generate test ideas during story development (elaborative TDD)
  • provide initial feedback on a story under development
  • test a story that has completed development
  • integration test the product developed to date

Of note, when we are testing alongside development, we can actually engage in more testing activities than when working in phases (or in a “testing” phase near the end). We are able to complete more testing, but that can require that we use more testers to still meet our timelines. As we incur more testing debt throughout a project, we have some options for dealing with it. One is to leave off story testing in favour of integration testing. I don’t really like this option; I prefer keeping the feedback loop as tight as we can on what is being developed now. Another is to schedule a testing phase at the end of the development cycle to do all the integration, “ility”, system testing etc. Again I find this can cause a huge lag in the feedback loop.

I prefer a trade-off. We have as tight a feedback loop on testing stories that are being developed so we stay in synch with the developers. We do as much integration, system, “ility” testing as we can in each iteration, but when we are running out of time, we incur some testing debt in these areas. As the product is developed more (and there is now much more potential for testing), we bring in more testers to help address the testing debt, and bring on the maximum number we can near the end. We schedule a testing iteration at the end to catch up on the testing debt that we determine will help us mitigate project risk.

There are several kinds of testing debt we can incur:

  • integration testing
  • system testing
  • security testing
  • usability testing
  • performance testing
  • some unit testing

And the list goes on.

This idea is very much a work-in-progress. Colin and I have both noticed that on the development side, we are also incurring testing debt. Testing is an area with enormous potential, as Cem Kaner has pointed out in “The Impossibility of Complete Testing” (Presentation) (Article).

Much like technical debt we can incur it unknowingly. Unlike refactoring, I don’t know of a way to repay this other than to strategically add more testers, and to schedule time to pay it back when we are dealing with contexts other than program code. Even in the code context, we still may incur testing debt that refactoring doesn’t completely pay down.

How have you dealt with testing debt? Did you realize you were incurring this debt, and if so, how did you deal with it? Please drop me a line and share your ideas.

Test Automation is Software Development

This is a concept I can’t stress enough: test automation is software development. There really is no getting around it. Even if we use a record/playback testing tool, some sort of code is generated behind the scenes. This is nothing new as people like James Bach and Bret Pettichord have reminded us for years. Attempts to automate software development have been around for a while. Here’s a quote that Daniel Gackle sent to me in “Facts and Fallacies of Software Engineering” by Robert Glass:

Through the years, a controversy has raged about whether software work is trivial and can be automated, or whether it is in fact the most complex task ever undertaken by humanity. In the trivial/automated camp are noted authors of books like “Programming without Programmers” and “CASE — The Automation of Software” and researchers who have attempted or claim to have achieved the automation of the generation of code from specification. In the “most complex” camp are noted software engineers like Fred Brooks and David Parnas.

Software testing is also a non-trivial, complex task. Dan Gackle commented on why what Glass calls the “trivial/automated camp” still has such currency in the testing world, and has less support in the development world:

It’s a lot easier to deceive yourself into buying test automation than programming automation because test automation can be seen to produce some results (bad results though they may be), whereas attempts to automate the act of programming are a patently laughable fiasco.”

I agree with Dan, and take this one step further: attempting to automate the act of software testing is also a fiasco. (It would be laughable if it weren’t for all the damage it has caused the testing world.) It just doesn’t get noticed as quickly.

If we want to automate a task such as testing, first of all, we need to ask the question: “What is software testing?” Once we know what it is, we are now ready to ask the question: “Can we automate software testing?”

Here is a definition I’m comfortable with of software testing activities (I got this from James Bach):

  • Assessing product risks
  • Engaging in testing activities
  • Asking questions of the product to evaluate it. We do this by gathering information using testing techniques and tools.
  • Using a mechanism by which we can recognize a problem (an oracle)
  • Being governed by a notion of test coverage

What we call “test automation” really falls under the tools and techniques section. It does not encapsulate software testing. “Test automation” is a valuable tool we can use in our tester’s toolbox to help us do more effective testing. It does not and can not replace a human tester, particularly at the end-user level. It is a sharp tool though, and we can easily cut ourselves with it. Most test automation efforts fail because they don’t take software development architecture into account, they don’t plan for maintenance, and they tend to be understaffed, and are often staffed by non-programmers.

Test automation efforts suffer from poor architecture, bugs (which can cause false positives in test results), high maintenance costs, and ultimately unhappy customers. Sound familiar? Regular software development suffers from these problems as well, but we get faster and louder feedback from paying customers when we get it wrong in a product. When we get it wrong in test automation, it is more insidious; it may take a long time to realize a problem is there. By that time, it might be too late. Customers are quietly moving on to competitors, talented testers are frustrated and leaving your company to work for others. The list goes on.

This attitude of a silver bullet solution to our problems of “test automation” contributes to the false reputation of testing as a trivial task, and testers are blamed for the ultimate poor results. “Our testers didn’t do their jobs. We had this expensive tool that came with such great recommendations, but our testers couldn’t get it to work properly. If we can hire an expert in “Test Company X’s Capture/Replay Tool”, we’ll be fine.” So instead of facing up to the fact that test automation is a very difficult task that requires skill, resources, good people, design, etc. we hire one guy to do it all with our magic tool. And the vicious circle continues.

The root of the problem is that we have trivialized the skill in software testing, and we should have hired skilled testers to begin with. When we trivialize the skill, we are now open to the great claims of snake-oil salesmen who promise the world, and underdeliver. Once we have sunk a lot of money into a tool that doesn’t meet our needs, will we admit it publicly? (In many cases, the test tool vendors forbid you from doing this anyway in their license agreements. One vendor forbids you from talking at all about their product when you buy it.)

In fact, I believe so strongly that “test automation” is not software testing, I agree with Cem Kaner that “test automation” is in most contexts (particularly when applied to a user interface) a complete misnomer. I prefer the more correct term “Computer Assisted Testing”. Until computers are intelligent, we can’t automate testing, we can only automate some tasks that are related to testing. The inquiry, analysis, testing skill etc. is not something a machine can do. Cem Kaner has written at length about this in: Architectures of Test Automation. In software develpment, we benefit greatly from the automation of many tasks that are related to, but not directly attempting to automate software development itself. The same is true of testing. Testing is a skilled activity.

Anyone who claims they can do software test automation without programming is either very naive themselves, or they think you are naive and are trying to sell you something.

Jerry Weinberg on Test Automation

I was recently part of a discussion on SHAPE about automated unit and functional testing. I was talking about buggy test code, and how frustrating (and sadly common) it can be to have automated test code that is unreliable. As a result, I try to keep test case code as simple and short as possible in an attempt to reduce bugs I might inadvertantly create. I do what I can to test the test code, but it can get out of control easily. It’s hard to unit test test code, and then I wonder why do I need tests for my test code? Isn’t that a smell that my test code is too complex?

I was musing that there is nothing more frustrating than buggy test code and buggy test harnesses, and Jerry agreed. He then pointed out something interesting. Jerry said:

There’s a kind of symmetry to all this, so that someone from Mars might not know, without being told, which was the product code and which was the test code. Each tests the other.

I had never thought of test code and its relationship to production code in this way before. This is a powerful and profound way of thinking about how we can “test our tests”, and is especially true with automated xUnit tests. It also strengthens the idea that automated test code needs to be treated as seriously as production code, and deserves the same kind of design, planning, resources and people behind it. Without symmetry, problems inevitably arise in test code especially.

Five Dimensions of Exploratory Testing

(Edit: was Four Dimensions of Exploratory Testing. Reader feedback has shown that I missed a dimension.)

I’ve been working on analyzing what I do when Exploratory Testing. I’ve found that describing what I actually do when testing can be difficult, but I’m doing my best to attempt to describe what I do when solving problems. One area that I’ve been particularly interested in lately is intermittent failures. These often get relegated to bug purgatory by being marked as “Non-reproducible” or “unrepeatable bugs”. I’m amazed at how quickly we throw up our hands and “monitor” the bugs, allowing them to languish sometimes for years. In the meantime, a customer somewhere is howling, or quietly moving on to a competitor’s product because they are tired of dealing with it. If only “one or two” customers complain, I know that means there are many others who are not speaking up and possibly quietly moving to a competitor’s product.

So what do I do when Exploratory Testing when I track down an intermittent defect? I started out trying to describe what I do with an article for Better Software called Repeating the Unrepeatable Bug, and I’ve been mulling over concepts and ideas. Fortunately, James Bach has just blogged about How to Investigate Intermittent Problems which is an excellent, thorough post describing ideas on how to make an intermittent bug a bug that can be repeated regularly.

When I do Exploratory Testing, it is frequently to track down problems after the fact. When I test software, I look at risk, I draw from my own experience testing certain kinds of technology solutions, and I use the software. As I use the software, I inevitably discover bugs, or potential problem areas, and often follow hunches. I constantly go through a conjecture and refutation loop where I almost subconciously posit an idea about an aspect of the software, and design a test to decide whether my conjecture is falsifiable or not. (See the work of Karl Popper for more on conjectures and refutations and falsifiability.) It seems that I do this so much, I rarely think about it. Other times, I very consciously follow the scientific method, and design an experiment with controlled variables, a manipulated variable, and observe the responding variables.

When I spot an intermittent bug, I begin to build a theory about it. My initial theory is usually wrong, but I keep gathering data, altering it, and following the conjecture/refutation model. I draw information from others as I gather information and build the theory. I run the theory by experts in particular areas of the software or system to get more insight.

When I do Exploratory Testing to track down intermittent failures, these are five dimensions that I consider:

  • Product
  • Environment
  • Patterns
  • People
  • Tools & Techniques

Product

This means having the right build, installed and configured properly. This is usually a controlled variable. This must be right, as a failure may occur at a different rate depending on the build. I record what builds I have been using, and the frequency of the failure on a particular build.

Environment

Taking the environment into account is a big deal. Installing the same build on slightly different environments can have an impact on how the software responds. This is another controlled variable that can be a challenge to maintain, especially if the test environment is used by a lot of people. Failures can manifest themselves differently depending on the context where they are found. For example, if one test machine has less memory than another, it might exacerbate the underlying problem. Sometimes knowing this information is helpful for tracking it down, so I don’t hesitate to change environments if an intermittent problem occurs more frequently in one than another, using the environment a manipulated variable.

Patterns

When we start learning to track down bugs when we find them in a product, we learn to repeat exactly what we were doing prior to the bug occurring. We repeat the steps, repeat the failure, and then weed out extraneous information to have a concise bug report. With intermittent bugs, these details may not be important. In many cases I’ve seen defect reports for the same bug logged as several separate bugs in a defect database. Some of them have gone back for two or three years. We seldom look for patterns, instead, we focus on actions. With intermittent bugs, it is important to weed out the details, and apply an underlying pattern to the emerging theory.

For example, if a web app is crashing at a certain point, and we see SQL or database connection information in a failure log, a conjecture might be: “Could it be a database synchronization issue?” Through collaboration with others, and using tools, I could find information on where else in the application the same kind of call to a database is made, and test each scenario that makes the same kind of call to try to refute that conjecture. Note that this conjecture is based on the information we have available at the time, and is drawn from inference. It isn’t blind guesswork. The conjecture can be based on inference to the best explanation of what we are observing, or “abductive inference”.

A pattern will emerge over time as this is repeated, and more information is drawn in from outside sources. That conjecture might be false, so I adjust and retest and record the resulting information. Once a pattern is found, the details can be filled in once the bug is repeatable. This is difficult to do, and requires patience and introspection as well as collaboration with others. This introspection is something I call “after the fact pattern analysis”. How do I figure out what was going on in the application when the bug occured, and how do I find a pattern to explain what happened? This emerges over time, and may change directions as more information is gathered from various sources. In some cases, my original hunch was right, but getting a repeatable case involved investigating the other possibilities and ruling them out. Aspects from each of these experiments shed new light on an emerging pattern. In other cases, a pattern was discovered by a process of elimination where I moved from one wrong theory to the next in a similar fashion.

The different patterns that I apply are the manipulated variables in the experiment, and the resulting behavior is the responding variable. Once I can repeat the responding variable on command, it is time to focus on the details and work with a developer on getting a fix.

Update:
Patterns are probably the most important dimension, and reader feedback shows I didn’t go into enough detail in this section. I’ll work on the patterns dimension and explain it more in another post.

People

When we focus on technical details, we frequently forget about people. I’ve posted before about creating a user profile, and creating a model of the user’s environment. James Bach pointed me to the work of John Musa who has done work in software reliability engineering. The combination of the user’s profile and their environment I was describing is called an “operational profile”.

I also rely heavily on collaboration when working on intermittent bugs. Many of these problems would have been impossible for me to figure out without the help and opinions of other testers, developers, operations people, technical writers, customers, etc. I recently described this process of drawing in information at the right time from different specialists to some executives. They commented that it reminded them of medical work done on a patient. One person doesn’t do it all, and certain health problems can only be diagnosed with the right combination of information from specialists applied at just the right time. I like the analogy.

Tools & Techniques

When Exploratory Testing, I am not only manually testing, but also use whatever tools and techniques help me build a model to describe the problem I’m trying to solve. Information from automated tests, log analyzers, looking at the source code, the system details, anything might be relevant to help me build a model on what might be causing the defect. As James Bach and Cem Kaner say, ET isn’t a technique, it’s a way of thinking about testing. Exploratory Testers use diverse techniques to help gather information and test out theories.

I refer to using many automated or diagnostic testing tools to a term I got from Cem Kaner: “Computer Assisted Testing.” Automated test results might provide me with information, while other automated tests might help me repeat an intermittent defect more frequently than manual testing alone. I sometimes automate certain features in an application that I run while I do manual work as well which I’ve found to be a powerful combination for repeating certain kinds of intermittent problems. I prefer the term Computer Assisted Testing over “automated tests” because it doesn’t imply that the computer takes the place of a human. Automated tests still require a human brain behind them and to analyze their results. They are a tool, not a replacement for human thinking and testing.

Next time you see a bug get assigned to an “unrepeatable” state, review James’ post. Be patient, and don’t be afraid to stand up to adversity to get to the cause. Together we can wipe out the term “unrepeatable bug”.

Unhealthy Goals

Chris Morris has blogged recently on a topic that is frequently misunderstood. There is often an attitude that we should shoot as high as we can when setting project goals. “Why not attempt to achieve perfection? Even if we don’t get there, we’ll get better results than if we set a lower goal.” is the line of thinking. Unfortunately on software projects (and probably other projects), this line of thinking can have the opposite effect, actually causing harm to a project.

Here are some project goals from my experience that have hindered the quality of a product, and have been detrimental to the development team:

  1. 100% test coverage
  2. Zero defects
  3. 100% test automation

Goal 1, “100% test coverage” is easily refuted as shown in this paper by Cem Kaner: The Impossibility of Complete Testing. What’s wrong with having this as a goal? An experienced tester realizes that it is an impossible task, and will probably feel like they are now on a death march project. At best, they might feel demoralized and not be able to finish an impossible task. At worst, they might feel pressured to falsify test results to please management.

An inexperienced tester might get complacent, and stop thinking about testing, instead rubber stamping the software with a suite of regression tests. Why should we challenge our ideas about testing the software when we already have 100% coverage? Every time I have seen a product go out the door with “100% Coverage”, a bug was found in the field. This ruins the credibility of the project team, especially if these numbers are used to measure performance, or market some sort of quality.

Goal 2, “Zero Defects” is also an impossible goal. If we can’t test every possible permutation and combination that the software might be exercised with ever, how can we guarantee that our software has zero defects? But this is a good goal you say, even if it is unachievable. Not in my experience. Every zero defect attempt project I have seen has caused defect reporting to become politicized. After the initial exhuberance wears off and testers and developers realize they are finding a lot of defects, strange things happen. Terminology changes, so certain classes of defects are called “issues” or “variances” or “thingies” so that they aren’t measured anymore. Developers (and managers) pressure testers and other developers to not log defects. Defects are closed without being fixed. A “shadow process” emerges where the defects that really need to be fixed aren’t logged through formal channels. Instead, they are logged and fixed away from the eyes and ears of other teammates and management. Even more defects can be injected into the code because of the resulting lack of communication and collaboration.

This is important to note, as Joel writes: “Fixing bugs is only important when the value of having the bug fixed exceeds the cost of the fixing it.” These are businesses we are running and working for, and everything we do needs to make sense for the financial health of the company.

Goal 3, “100% test automation” has been recently re-popularized by the agile development community. The problem with this goal, is that there are certain tests that we aren’t able to automate. In fact, there is little about testing that we can automate, especially if the end user of the software is a human. An automated test script is a very rough approximation of what a human does because computers are not intelligent. Entire classes of tests are ignored or not thought of because they do not fit this testing paradigm, especially exploratory testing. Once automated test suites become sufficiently large, maintenance becomes an issue. There is pressure to not add or execute new test cases on software because there are too many automated test cases to worry about.

Thorough, accurate, meaningful testing can be sacrificed when it is more important to automate tests to reach this goal. Relying too much on these automated tests often allows bugs to be released that a human would have caught instantly. Less rich, manual testing is completed as those cycles are taken up with maintenance.

Measuring individuals and teams by these sorts of standards is almost always counter-productive. SMART goals and other devices used on performance appraisals map nicely to numbers and percentages, but there is little in life we do that can be accurately mapped to a two-variable graph. There are lots of other factors beyond our control which are known as “spurious variables”. At some point, people who are measured against something that is always just beyond their reach will cause them to behave in unintended ways. There are even Dilbert cartoons about rewarding developers for the amount of bugs fixed, and measuring testers on bugs found is equally counter-productive. Both can lead to a breakdown within a team and product quality suffering.
When I talk to the people who decide on these goals, in most cases the actual goal they set isn’t the intended result they are looking for. If a senior manager says to me that they have a goal for zero defects, I ask why.

Usually there is a quality issue that is angering customers with the software being delivered, and they desperately want it fixed. If the issue is reframed to: “Would it be ok if the software you delivered was reliable and robust enough so that your customers are happy?” “Why not make having happy customers who can rely on our software as our goal?” Often, that is a satisfactory goal to the manager, and is something that is reasonable to shoot for. It is also helpful to look at goals over the long term, and have a goal of consistent attempts at improvement.

When I hear things like “100% test automation”, and I question further, it is almost always about efficiency. If the issue is reframed to: “We need to look for ways to be more efficient in our testing. Why don’t we analyze our testing processes (both manual and automated), and choose the most efficient methods we can, and keep working for more efficiency. Why not make efficiency our goal?” In some cases, strategic manual testing may be more efficient and cost effective than automated testing. In many projects, especially in the beginning of a test automation effort, a little test automation can go a long way.

The true motivation behind these goals is important to understand. In many cases I’ve seen the numbers line up nicely with the goals, only to have the intended but unexpressed goal fail. “That’s great that you have zero defects when shipping, but our customers are unhappy!” an executive might say. Mary Poppendieck says: “Measure Up!” This means measure what is really important. Look at the big picture. If you measure details too much, you may miss the big picture. If you do set details-oriented goals, carefully analyze resources, schedules and the people on the project instead of just picking a number to shoot for. Be sure to measure how the goals feed the big picture. If they aren’t helping contribute to the big picture (the bottom line, happy customers, a happy, healthy, productive team), drop them. It might be surprising under scrutiny to see how many of these unrealistic goals are barriers to a good bottom line, happy customers and a happy, healthy, productive team.

Many process certified projects with wonderful charts and graphs and SMART goals all over the place release crummy products that customers quietly stop using. After all the cheering over process numbers fades and the company finds it can’t sell products like it used to, people wonder why. If we measure product success instead of adherence to a process, and measure how the project feeds the bottom line instead of things like “Zero Defects”, we might end up with better results.

Testing an Application in Layers

There is often debate about test automation versus manual testing. When I think about testing, I look at an application in 3 broad layers: the code (on the machine side), the system (where the finished software lives), and the visible layer, or how the software is used from an end user’s perspective. I often call this visible layer the social context because of the environment much end-user software is used in. When we spend a lot of time in one context, testing starts to specialize because we concentrate on part of the picture.

When we view an application from the source code view, the testing is dominated by automation. When we look at the system context (as some of my operations friends do), testing involves integration in a system, and testing hardware, firmware, drivers, etc. to make sure the software gets served up correctly. Automated testing tends to get more complex the more we move from the code to the user interface. Attempting to emulate user actions is difficult, and high-volume automated functional tests can involve massive amounts of automated test code. This can be problematic to maintain. Sometimes functional tests become so complex they involve as much test code as the software they are testing has to serve up a component, plus test data generation code, as well as the code that attempts to emulate user actions.

Personally, I agree with Cem Kaner and call automated testing “computer-assisted testing”. The computer is a tool I use in conjunction with good manual testing. Until machines are intelligent, we can’t really automate testing. We can automate some aspects of testing to help maximize problem-solving efficiency.

Traditionally, software testers tend to have a handle on the social context, or how the software is used in a business context. As a result, much conventional testing is focused on the visible layer of the application. I tend to prefer testing at various layers in an application. I value testing components in isolation as well as testing software within a system. There are advantages and drawbacks to both. While I value isolation, testing, particularly brain-engaged manual testing at a visible UI layer has merit. Sometimes testers focus a great deal on testing in this context when component isolation might be more efficient. Often, traditional testers hope for an automated testing tool that can do the work of a tester. I’ve yet to see this occur successfully, but there are still tasks that can be automated to help testing efforts that are a big help.

Frequently there are bugs that are difficult to track down that crop up in the visible layer of the application. These are due to the visible application at runtime becoming greater than the sum of its parts. There is a kind of chaos theory situation that occurs due to the application being used in a way that the underlying code may not be designed to handle. By the time a minor fault at the code level bubbles up to the UI, it may have rippled through the application causing a catasrophic failure. Unfortunately, these kinds of usage-driven faults are problems automated tests at various layers do not tend to catch. Often it is some sort of strange timing issue as I note in this article. Other times, it’s due to actions undertaken in a social environment by an unpredictable, cognitive human. These variable actions motivated by inductive reasoning, driven by tacit knowledge are difficult to repeat by others.

Focusing too much on one testable interface in an application can skew our view. If we view the application by the code most of the time, we have a much different picture than if we view it through the UI. Try flipping the model of an application on it’s side instead of viewing it bottom up(from the code), or top down(from the ui). You may discover new areas to test that require different techniques, some of which are great candidates for automation, while others require manual, human testing.

TDD in Test Automation Projects

I’ve written before about pairing with developers during Test-Driven Development(TDD), and I’ve been fortunate to work with very talented TDD developers who are apt to teach. I’ve learned a lot, and decided to try TDD in a programming role. Recently, I’ve taken off my tester hat and started doing TDD myself with test automation projects. I’m not completely there yet – I often need to do an architectural spike first when I’m developing something new. Once I have figured out a general design, or have learned how a particular library works, I throw away the spike code and start off development by writing a test. I then write enough code to get the test to pass, write a new test, add new code and repeat until the design is where I need it to be.

So what does this gain in test automation projects? I’m loathe to have test cases that are so complex that they themselves require testing. If our test cases are so complex that they are causing problems themselves, that’s a test design smell. However, there are other kinds of software in our automation projects than just the test cases. In automation frameworks, we need special libraries, or adaptors, or a way to access an application we need to test, and all sorts of utilities that help us with automation. Since these utilities are still software, they are subject to the same problems that any other software development effort is. Sometimes there is nothing more frustrating than buggy test code, so we need to do what we can to make it as reliable as possible.

In my own development, I’m finding a lot of benefits doing TDD. My designs improve, because if they aren’t testable, I know there is a problem. When I make my code testable, it suddenly becomes more usable and more reliable. Too often testers aren’t given time to refactor test code. I’ve found that refactoring usually starts when technical debt in a test harness and custom test library are starting to interfere with productivity. When there are no unit tests for custom test library code, it can involve a few minutes to change the code, and several hours testing it. Having a safety net of unit tests helps immensely with refactoring. You can refactor your test code with greater confidence, and when it’s done consistently with automated unit tests, with much greater speed. It just becomes a normal part of development.

Recently, I’ve found several bugs in my test library code in the elaborative phase of TDD that I didn’t find testing manually. The opposite is also true, I tested libraries I was developing manually and found a couple of bugs that my unit tests didn’t uncover. The TDD-discovered bugs required design changes that helped my design immensely. The tests guided the design into something different (and better) than what I had in my head when I started. However, after a couple of days of only running the unit tests to satisfy the “green bar”, I found a big hole that was only uncovered by using the library the way an end user would. The balance of testing techniques is helpful. I have also adopted a practice of pairing a positive test with a negative test, a technique I learned from John Kordyback. If I do an assert_equal for example, I also do an assert_not_equal (using test::unit style). This has really come in handy at times where one assertion would work, but the other would fail.

TDD may not be for everyone, but I find it is a nice complement to other kinds of testing we can do. For my own work, it seems to suit the way I think about development. Even if the test cases I develop while programming are trivial and small, there is strength in numbers, and writing and using them helps assuage that tester voice in the back of my head that comes out when programming. I encourage other conventional testers who work on automation projects to give it a try. You may find that your designs improve, and you have a safety net of automated tests to give you more confidence in your automation code, especially when you need to enhance it down the road. At the very least, it helps you gain an appreciation and helps you communicate when working with TDD folks on a project.

Fast Failures

I was talking with Mark McSweeny the other day about a test design I was thinking about. It involved attempting to automate a process that currently relies on a lot of manual testing involving visual inspection. The tricky part of automating this kind of testing is the potential for variation in the items under test. Variation is hard for a computer to handle, but a human who understands the context can instantly spot the variation and see whether it is permissible, or whether it constitutes a test failure. I was asking Mark’s advice on how to deal with the variation, and mentioned the tools I was thinking of using. Mark pointed out that I was overcomplicating the test design by thinking about what data structures, xUnit tools and libraries to use, and not thinking enough about the human tester.

He described “fast failure” tests he writes that are pure change detectors, designed to run quickly. While they can provide feedback very quickly, they don’t provide a lot of information other than “Pass” or “Fail”. When a test fails, the human steps in and does the manual inspection. In many cases, the human can tell at a glance if the failure is a bug or not. If the change is ok, and recurring, the test gets changed. If it isn’t, the human recognizes the bug, and logs it, or just adds a new unit test and fixes the problem in their own code.

Fast failure tests have a lot of potential as a testing tool. What Mark described is something I’ve written about before: Computer Assisted Testing. After all, I’m part of a school that believes testing is an intellectual craft, and skilled tester activities cannot be automated. An issue many have heard me rant about before is that until we can program inference, and have some sort of intelligent, thinking bots doing testing, we can’t automate all the tests a human tester can do. Humans handle variation easily, and respond to change. Fast failure tests yield the best of both worlds. The computer does what it is good at, and the human exploits the computer to help them concentrate on what they are good at. “So why didn’t you think of this solution Jonathan?” you might ask. I guess I got caught up in the technology instead of thinking about a solution that harnesses both the computer and the skills of a human. I forgot to practice what I preach. This also demonstrates how brilliant people like Mark design simple, elegant solutions, and teach me something every time I talk to them.

There are some potential benefits for fast failure tests, even if they don’t completely emulate what a human does. For example, say a tester must manually inspect ten items every build. We develop some fast failure tests that do a rough approximation of that inspection, and now run these automated tests every build. With the fast failure tests, say that two of those ten items under test report failures because of a variation. The tester inspects the two files manually, and using their judgement and skill realizes that these are legal variations. Also, say that in one of five builds, the fast failure tests fail on one of the items under test because of a bug that can be reported. Instead of the burden being completely on the tester to inspect each item every build, they now have to inspect far fewer items after each build. Even though they may get a couple of red herrings each build due to variation, the tests prove their worth by helping the tester quickly identify potential problems. This test design shows how a computer helps the tester work more efficiently.

If there is a need to have a more complex test that can provide more information about the failures, we can develop it over time. In the mean time, we have a solution that is good enough, and we can use it as a base line for the test under development. However, a complex test automation solution can be dangerous. As Mark warned, test automation is software development, so it is just as prone to bugs, design problems, maintenance issues etc. as any other software. A simple solution that requires some human intervention may be more efficient than a complex one that requires a lot of time spent in the automation code to keep it working.

I’m glad I have smart people like Mark around to talk to, who challenge my ideas and let me know when I’m off the mark.

Watir Release

The Web Testing with Ruby group released version 1.0 of the Watir tool today. Check it out here: Get Watir

Download the latest release which is a zip file at the top of the watir package list. (At the time of this posting, the latest version is 1.0.2.) Open the User Guide for installation instructions, and check out the examples to see how you can use the tool.

You will need Ruby if you don’t have it installed. Version 1.8.2-14 Final is a good candidate to use, or 1.8.1-12 if you prefer to be more on the trailing edge.

For those who may not be familiar with the tool, it allows for testing web applications at the web browser level. The project utilizes testable interfaces in web browsers to facilitate test automation. Currently it only supports Internet Explorer, but plans are underway to support other browsers. It is also only available on Windows.