John Kordyback on Testing

Testing is an exercise of discovering interfaces.

If this interests you, think about the above statement, and share your conclusions. I’ll post my own thoughts shortly.

(edit – I’ve added a couple of my own thoughts)

From a development perspective, driving out interfaces and making the code more testable improves the overall design when following test-driven development. The developers drive out the design with unit tests using a tool like JUnit, which helps them create interfaces. But JUnit itself uses an interface to exercise areas of the code. Now things get interesting…

From a tester’s perspective, this concept became clear to me when exploring areas of the application behind the GUI layer to write automated tests against. For example, if a developer makes a particular part of the code testable, we can use some sort of tool to write tests and exercise areas of the code. In this case, we find an area we’d like to test, and the developers create an interface for us to test against. If we create fixtures with FIT, we are essentially creating another testable interface. When we use WTR Ruby scripts to drive the browser via the DOM, we are using the DOM interface to test the application. Screen-scraping automated functional testing tools create their own testable interface.

From this perspective, the difference between JUnit or HttpUnit tests, FIT or WTR tests, or some other testing tool is the type of interface each one uses.
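
To make the idea concrete, here is a rough sketch of what testing through the DOM interface can look like with a Ruby script in the WTR/Watir style. The URL, field names and expected text are invented for illustration, and the calls follow the Watir-style API rather than any particular version of the WTR Controller.

    # A hedged sketch: drive a hypothetical login page through the browser's
    # DOM using a Watir-style Ruby API. The URL, field names and expected
    # text are invented for illustration.
    require 'watir'
    require 'test/unit'

    class LoginThroughDomTest < Test::Unit::TestCase
      def test_valid_login_shows_welcome_page
        browser = Watir::IE.new                      # attach to the browser
        browser.goto('http://localhost:8080/login')  # hypothetical application URL

        # The DOM is the testable interface here: we locate elements by name
        # and manipulate them much as a user would.
        browser.text_field(:name, 'username').set('testuser')
        browser.text_field(:name, 'password').set('secret')
        browser.button(:name, 'login').click

        assert(browser.text.include?('Welcome'), 'Expected the welcome page after login')
        browser.close
      end
    end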

An interesting side-effect of seeking testable interfaces in a product from the code level up: the more we learn about the interfaces, the more we explore the product and the more we learn about it. The more we learn, the more information we gather about the application’s strengths and weaknesses. It kind of sounds like testing software, doesn’t it?

We can take a cue from our test-driven developer colleagues and how they use interfaces to design better code, and seek to drive out interfaces to help us test the product more thoroughly. This is another view of testing, one that takes us away from a strict black-box perspective. Think of peeling back the GUI layer (with web applications it can be simple to peel back the browser, look at the HTTP layer, and work down into the code) and looking for testable interfaces, or areas that could use them. You might be surprised at what you find, and a new world of testing may be waiting to be discovered.

Test Automation as a Testing Tool

When we think of GUI-level test automation, we usually think of taking some sort of test case, designing a script with a language like Perl or Ruby, or with a vendor testing tool, and developing the test case programmatically from beginning to end. The prevailing view is often that there is some sort of manual test case that we need to repeat, so we automate it in its entirety with a tool. This is a shallow view of test automation, as James Bach, Bret Pettichord, Cem Kaner, Brian Marick and others have pointed out. I prefer the term “Computer Assisted Testing” (I believe coined by Cem Kaner) over “test automation” for this reason.

While developing automated tests, I noticed an interesting side effect. While I was debugging script code, I would run a portion of a test case many times. This would be a sequence of events, not an entire test case. I started to notice behavior changes from build to build when I watched a series of steps play back on my screen. When I investigated, I found bugs that I might not have discovered doing pure manual testing or running unattended automated tests. I started keeping snippets of scripts around to aid in exploratory testing activities, and found them very useful as another testing tool.

There are some benefits to automating a sequence of steps and blending this type of testing with manual testing. For example, if a test case has a large number of steps required to get to the area we want to focus testing on, using a script to automate that process helps us get there faster, frees us from distractions and helps us focus on the feature or unit under test. Humans get tired repeating tasks over and over and are prone to error. If precision is needed, computers can be programmed to help us. We can also test the software in a different way using a tool. We can easily control and vary test inputs, and measure what was different from previous test runs when bugs are discovered.
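
As a sketch of how that blending can work, the hypothetical helper below automates the repetitive setup steps (logging in and creating some test data) and then hands the live browser session back to the tester for exploratory work. All URLs, field names and data are made up; the calls follow the Watir-style Ruby API, not any specific application.

    # Hedged sketch of "Computer Assisted Testing": automate the tedious
    # setup steps precisely and repeatably, then leave the browser open so
    # the tester can explore the resulting state by hand.
    require 'watir'

    def open_order_screen_with_test_data(order_count)
      browser = Watir::IE.new
      browser.goto('http://localhost:8080/login')        # hypothetical URLs throughout
      browser.text_field(:name, 'username').set('testuser')
      browser.text_field(:name, 'password').set('secret')
      browser.button(:name, 'login').click

      # Vary the inputs precisely; this is the kind of repetition humans
      # tire of and start getting wrong after a few dozen iterations.
      order_count.times do |i|
        browser.goto('http://localhost:8080/orders/new')
        browser.text_field(:name, 'item').set("widget-#{i}")
        browser.button(:name, 'save').click
      end

      browser  # hand the prepared session back to the human tester
    end

    # Example usage from irb: set up 25 orders, then explore manually.
    # browser = open_order_screen_with_test_data(25)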

There are certain tasks that a human can do much better than a computer when it comes to testing. As James Bach says, testing is an interactive, cognitive process. Human reasoning and inference simply cannot be programmed into a test script, so the “test automation” notion will not replace a tester. Blending the investigative skills and natural curiosity of a tester with a tool that helps them discover bugs is a great way to focus some test automation efforts.

If you think of your automation efforts as “Computer Assisted Testing”, many more testing possibilities will come to mind. You just might harness technology in a more effective way, and it will show in your testing efforts.

I currently use Ruby as my Computer Assisted Testing tool, with the WTR Controller as my web application testing tool of choice. (cf. A Testing Win Using Ruby)

Iterative Testing (intro)

Edit April 11/2006 – This is a topic I’ll be talking about more in the near future. In the meantime, check out this test debt post, which describes an iterative testing challenge.

A few people have asked me recently about how I test on Agile projects, or on iterative projects in general. I’ll post some of my experiences here with the disclaimer that these practices are very much a work-in-progress.

To begin this discussion, I’ll point readers back to this post, which is about describing testing activities to business stakeholders. Testing activities are broken up into Business Facing and Technology Facing activities (terms coined by Brian Marick), not just over a release but through each iteration. These two sets of activities guide not only test execution but test planning as well. Furthermore, each area of testing activity has collaborative components.

In the coming days, I’ll share some of the techniques that I have been working on.

James Bach on Test Automation

James Bach has an excellent post on his blog about test automation with developer and tester collaboration. Be sure to check out his presentation on Agile Test Automation. It is well worth the read. A collaborative approach in test development is important if the tests are to be useful to the entire team. Tests should not just be useful to the testing team, or specialists who know how to use a proprietary testing tool.

Borrowing from Bret Pettichord’s article “Testers and Developers Think Differently”, pairing good developers who are effective problem solvers and software creators with testers who are effective problem presenters and test idea generators can be a powerful combination. In my own experience working with developers in this way, solutions that the testers need can be quickly developed to meet the unique needs of a project or testing department.

Javan Gargus on Underdetermination

Javan Gargus writes:

I was a bit taken aback by your assertion that the testing team may not have done anything wrong by missing a large defect that was found by a customer. Then, I actually thought about it for a bit. I think I was falling into the trap of considering Testing and Quality Assurance to be the same thing (that is a tricky mindset to avoid!! New testers should have to recite “Testing is not QA” every morning.). Obviously, the testers are no more culpable than the developers (after all, they wrote the code, so blaming the testers is just passing the buck). But similarly, it isn’t fair to blame the developers either (or even the developer who wrote the module), simply because trying to find blame itself is wrongheaded. It was a failure of the whole team. It could be the result of an architecture problem that wasn’t found, or something that passed a code review, after all.

Clearly, there is still something to learn from this situation – there may be a whole category of defect that you aren’t testing for, as you mention. However, this review of process should be performed by the entire team, not just the testing team, since everyone missed it.

Javan raises some good points here, and I think his initial reaction is a common one. The key for me is that people should be blamed last – the first thing to evaluate is the process. I think Javan is right on the money when he says that reviews should be performed by the entire team. After all, as Deming said, quality is everyone’s responsibility. What the development team (testers, developers and other stakeholders) should strive to do is become what I’ve read James Bach call a “self-critical community”. This is what has served the Open Source world so well over the years. The people are self-critical in a constructive sense, and the process they follow flows from how they interact and create working software.

Underdetermination

How many testing projects have you been on that seemed to be successful, only to have a high-impact bug discovered by a customer once the software is in production? Where did the testing team go wrong? I would argue that the testing team didn’t necessarily do anything wrong.

First of all, a good tester knows (as Cem Kaner points out) that it is impossible to completely test a program. Another reason we get surprised is underdetermination. The knowledge about the entire system is gathered by testers throughout the life of the project. It is not complete when the requirements are written, and it probably isn’t complete when the project ships. The knowledge can be difficult to obtain, and it depends on many factors, including access to subject-matter experts, the skills of the testers involved and their ability to extract the right information on an ongoing basis. Realizing that you are dealing with a situation where you probably do not have all the information is key. This helps guide your activities and helps you keep an open mind about what you might be missing.

Underdetermination is usually used to describe how scientific theories go far beyond empirical evidence (what we can physically observe and measure), yet are surprisingly accurate. One example of underdetermination is described by Noam Chomsky. He states that the examples of language that a child has in their environment underdetermine the actual language that they learn to speak. Languages have rule sets and many subtleties that are not accurately represented by the common usage from which the child learns.

Testers regularly face problems of underdetermination. The test plan document underdetermines the actual testing strategies and techniques that will be employed. The testers’ knowledge of the system underdetermines what the actual system looks like. Often, key facts about the system come in very late in the testing process, which can send testing efforts into a tailspin.

Just knowing that the testing activities on a project underdetermine what could possibly be tested is a good start. Test coverage metrics are at best a very blunt measurement. Slavishly sticking to these kinds of numbers, or signing off that testing is complete only when a certain percentage of coverage is reached, can be misleading at best and dangerous at worst.

If testing efforts do fail to catch high impact bugs, there are a couple of things to remember:

  1. It wasn’t necessarily a failure – it is impossible to test everything
  2. Testing knowledge at any given point in a project is underdetermined by what could be tested

If this happens to your testing team, instead of just incorporating this problem as a test case to ensure the bug doesn’t occur again, evaluate *why* it happened. What information were the testers missing, and why were they missing it? How could they have gotten this information when testing? The chances of this particular bug cropping up again are pretty slim, but the chances of one like it popping up in another area of the program are probably much greater than one might initially think.

Instead of evaluating a project solely on coverage percentages, be self-critical of your testing techniques. Realize that the coverage percentages do not really give you much information. They tell you nothing about the tests you haven’t thought to run – and those tests could be significant in number as well as in potential impact. Evaluate what and how you are testing throughout the project, and periodically call in experts from other parts of the system to help you evaluate what you are doing. Think about what you could be missing, and realize that you can do a very good job even without all the information.

The scientific community does quite well even though it frequently works with only a small part of the whole picture. Testers should be able to as well. One interesting side note is that many significant discoveries come about by accident. Use these “testing accidents” to learn more about the system you are testing, the processes you are using and, more importantly, what they tell you about *you*, your testing and what you can learn from it.

TDD Pairing – What I Missed

Since this is a training exercise, and I am working with a senior developer who is a good teacher, we spent some time reviewing our first pairing session. To teach me while we paired, the developer had created a situation to see if I could spot a problem. I, of course, missed it. During our TDD session, my primary focus was on thinking of testing ideas. The developer, however, was simultaneously thinking about making the code testable, improving the software design and continuously improving the testability of the code. These three activities, he says, are the hallmarks of a good design.

Leading me down the garden path in the hopes of teaching me something, he deliberately introduced a bad code smell and tried to guide me into seeing it. I was so focused on generating testing ideas that I missed it. He deliberately made the unit tests awkward and difficult to implement. I trusted his design and took that for granted as a technical issue that I didn’t understand. I didn’t realize that tests that were onerous and difficult to set up and code were themselves a bad test smell.

The lesson that I learned is that if we can add tests simply, it’s a sign of a good code design. Since the tests were awkward and I was completely dependent on the developer to add them, I needed to be concerned. As a tester, part of my job when pairing in a test-driven development situation is to watch for bad testing smells. Those bad smells in the tests are symptoms that something is wrong with the code. The developer pointed out that when it’s hard to test, it’s time to improve the code. When testability is improved, a byproduct is a better design.

At the end of the day, I was thinking about more test ideas and felt we needed to add much more to the existing design. The developer, however, realized we were in trouble and needed to refactor the existing code to make it more testable. Lesson learned – I need to watch that we can add tests easily. That is a sign of a good design. It doesn’t take a lot of programming skill to realize that some unit tests are awkward while others are simple and elegant. I’m also fairly confident that testers who don’t program would be able to learn to see the difference quite quickly after spending time with a developer who can demonstrate good and poor unit tests.
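
To make the difference between awkward and simple tests concrete, here is a hypothetical contrast, sketched in Ruby with Test::Unit rather than the Java and JUnit we actually used. The classes and the discount rule are invented; the point is that the first test needs a pile of setup ceremony to check one small rule, while the second checks the same rule through a small, clean interface.

    require 'test/unit'

    # Hypothetical code standing in for a real application; prices are in
    # cents and the business rule (10% off a dozen or more items) is invented.
    class DiscountCalculator
      def total_in_cents(quantity, unit_price_in_cents)
        total = quantity * unit_price_in_cents
        quantity >= 12 ? total - total / 10 : total
      end
    end

    # A test with a bad smell: fake persistence, fixture building and data
    # digging just to exercise one small rule. If every new test needs this
    # much ceremony, the design is telling us something.
    class AwkwardDiscountTest < Test::Unit::TestCase
      def setup
        @db = {}                                   # pretend persistence layer
        @db[:customer] = { name: 'Acme Corp' }
        @db[:order]    = { lines: Array.new(15) { { item: 'widget', cents: 1000 } } }
      end

      def test_bulk_discount
        quantity   = @db[:order][:lines].size
        unit_price = @db[:order][:lines].first[:cents]
        assert_equal(13_500, DiscountCalculator.new.total_in_cents(quantity, unit_price))
      end
    end

    # The same rule tested against a small, clean interface: easy to write,
    # easy to read, and a sign the design is in better shape.
    class SimpleDiscountTest < Test::Unit::TestCase
      def test_bulk_discount
        assert_equal(13_500, DiscountCalculator.new.total_in_cents(15, 1000))
      end
    end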

Pairing in Test Driven Development: Day One Report

An area of Agile Development where testers are usually absent is Test-Driven Development and other developer testing activities. Since I like to collaborate with developers as much as I can, I asked them what other areas I could support them in. (Brian Marick calls these types of activities Technology-facing programmer support.) They told me that one area where they would like to see testers work with them was unit test development, especially when using Test-Driven Development. While I have pair tested with developers to help generate unit testing ideas, I haven’t actually worked with them during development in a pair programming kind of role. They felt that pairing with a tester would help them generate test ideas, and that it would be a good fit: the developer is thinking about programming most of the time, while the tester is thinking about tests most of the time. I encourage developers to use test-infected development techniques, so I decided to stop theorizing and actually give it a try. I can’t very well answer the question of how a tester can add value to Test-Driven Development unless I’ve tried it.

Yesterday I paired with a senior developer who was developing an application in Java using the IntelliJ IDEA IDE, which has JUnit integration. He kindly agreed to take me through the paces, but I have to admit I was a bit nervous. While I have basic Java programming skills, I am not a great coder, and I usually work with scripting languages when I develop automated test cases. I wasn’t sure if I would be able to add any value to a programming activity or not.

After we walked through the business problem the day’s coding effort was to address, and some of the code that was already in place, we began looking at the test framework. In this case, the developer had already written the first test, and an implementation that worked well enough to get that test to pass. We picked up at this point, looked at the business rules and designed a new test. When we ran JUnit, the test failed, so that told us the implementation needed some work. The developer added some more logic to get that test to pass, and then we added another test case. We continued on, adding test cases that would initially fail, and the developer would work on the implementation to get them to pass. At a certain point, he felt that we had a good basic set of test cases that covered enough of the business logic.
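
That rhythm is easy to sketch. The toy example below uses Ruby’s Test::Unit and an invented overdue-fee rule rather than the Java, JUnit and business logic from our actual session: the first test already passed against the existing code, the second was the new one we would add next, and it fails until the implementation grows the extra rule.

    require 'test/unit'

    # Invented example of the red/green rhythm, with amounts in cents.
    class Invoice
      def initialize(amount_in_cents, days_overdue = 0)
        @amount = amount_in_cents
        @days_overdue = days_overdue
      end

      def total_in_cents
        # This branch is added only after the new test below fails:
        # a flat 1500-cent late fee once an invoice is over 30 days overdue.
        @days_overdue > 30 ? @amount + 1500 : @amount
      end
    end

    class InvoiceTest < Test::Unit::TestCase
      def test_total_is_the_amount_when_paid_on_time
        assert_equal(10_000, Invoice.new(10_000).total_in_cents)
      end

      # The "new" test: written first, watched failing, then made to pass.
      def test_total_includes_late_fee_when_overdue
        assert_equal(11_500, Invoice.new(10_000, 45).total_in_cents)
      end
    end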

At the end of our session, we had an implementation that satisfied some basic test cases. I suggested test ideas, but was completely dependent on the developer to write the JUnit tests. I was absorbed in thoughts about more test ideas and what would seem feasible. I suggested a lot of test ideas that we could tackle the next day, and we discussed which tests we could justify doing. My initial response was to test as much as possible, while the developer was thinking about the big picture and how much time we could spend on testing. I realized this would be a trade-off; testing everything as robustly as possible would cause a lot of duplicated effort.

I didn’t feel like I added a lot of value to this session. Granted, I was getting trained in how Test-Driven Development works. I caught some minor syntax errors as we paired; the developer caught a few in his own work as well. We both missed some coding errors, but the compiler and the unit tests caught those.

My thoughts at the end of the day were that I had learned a lot but added little value. Clearly, I need to learn more about the process and practice it more if I hope to add value for a developer. The basic test ideas that I had generated were already ideas that the developer had thought of. I then started to think of more complex test ideas, and felt that we had very little coverage. The developer agreed that the coverage was light and that we needed to work on more tests the next day. I left at the end of the day thinking about further tests we could write – I could add value there.

Describing Software Testing Using Inference Theories

I am re-reading Peter Lipton’s Inference To The Best Explanation which I first encountered in an Inductive Logic class I took in University. Lipton explores this model to help shed some light on how humans observe phenomena, explain what has been observed, and come to conclusions (make an inference) about what they have observed. Lipton says on p. 1:

We are forever inferring and explaining, forming new beliefs about the way things are and explaining why things are as we have found them to be. These two activities are central to our cognitive lives, and we usually perform them remarkably well. But it is one thing to be good at doing something, quite another to understand how it is done or why it is done so well. It’s easy to ride a bicycle, but very hard to describe how to do it. In the cases of inference and explanation, the contrast between what we can do and what we can describe is stark, for we are remarkably bad at principled description. We seem to have been designed to perform the activities, but not to analyze or defend them.

I had studied Deductive Logic and worked very hard trying to master various techniques in previous courses. I was taken aback in the first lecture on Inductive Logic when the professor told us that humans are terrible at Deductive Logic, and instead rely much more on Inductive Logic when making decisions. Deductive Logic is structured, has a nice set of rules, is measurable and can be readily explained. Inductive Logic is difficult to put parameters around, and inductive activities are usually explained in terms of themselves. The result of explaining inductive reasoning is often a circular argument. For this reason, David Hume argued in the 18th century that induction cannot be rationally justified, and attempts through the years to counter Hume rarely get much further than he did.

This all sounds familiar from a software testing perspective. Describing software testing projects in terms of a formalized theory is much easier than trying to describe what people actually do on testing projects most of the time. It’s nice to put parameters around testing projects and use a set of formal processes to justify the conclusions, but are the formalized policies an accurate portrayal of what actually goes on? My belief is that software testing owes much more to inference than to deduction, and attempts to formalize testing into a nice set of instructions or policies do not reflect what good testing actually is.

What constitutes good software testing is very difficult to describe. I’m going to go out on a limb and use some ideas from Inductive Logic and see how they match software testing activities from my own experiences. Feel free to challenge my conclusions regarding inference and testing as I post them here.