
Testing and Numbers

Michael Bolton said this about numbers on testing projects:

In my experience, the trusted people don’t earn their trust by presenting numbers; they earn trust by preventing and mitigating bad outcomes, and by creating and maintaining good ones.

I agree. Lately, I’ve been thinking about how we report numbers on testing projects. I was recently in a meeting of Software Quality Assurance professionals, and a phrase kept coming up that bothered me: “What percent complete are you on your project?” I think they mean the percentage of test cases that have been exercised on the development project. I don’t believe I can know what “percent complete” I am on a project’s test cases, so I’m uncomfortable giving out a number such as “90% complete.” How can I know what 100% of all test cases on a project would even be? Cem Kaner’s paper The Impossibility of Complete Testing shows us how vast the set of possible tests that can be run on a project is.

Every project I’ve been on that claimed 100% test coverage had a major bug discovered by a customer in the field that required a patch. At best, we had obviously covered all of the possible test cases but one. The test case that would have exposed the bug the customer found was not in our set of test cases, so how could we claim 100% completion? How many more are we missing?

Reporting numbers like this is dangerous in that it can create a false sense of security for testers and project stakeholders alike. If we as testers make measurement claims without examining the complexity of measurement, we had better be prepared to lose credibility when bugs are found after we report a high “percent complete” number prior to shipping. Worse still, if testers feel that they have completed 90% of all tests on a project, the relentless pursuit of knowledge and test idea generation is easily replaced by apathy.

Cem Kaner points out many variables in measuring testing efforts in this paper: Measurement Issues and Software Testing. Accurate, meaningful measurement of testing activities is not a simple thing, so why the propensity for providing simple numbers?

I look at a testing project as a statistical problem. How many test cases could there be in this project if I knew its bounds? Since I don’t usually know the bounds of the entire project, it is difficult to do an accurate statistical analysis using formulas. Instead, I can estimate based on what I know about the project now, and use heuristics to help deal with the vast number of possible tests that would need to be covered to get a meaningful percentage. As the project progresses, I learn more about it, and I use risk-based techniques to try to mitigate the risk to the customer. I can’t know all the possible test cases at any given time. I may have a number at a particular point in the project, so of the test cases that I know of right now, we may have a percentage of completion. However, there may be many important ones that I, or the testing team as a whole, haven’t thought of. That is why I don’t like to quote “percent complete” numbers without providing a context, and even then I don’t present just numbers.
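
To make that point concrete, here is a minimal sketch (the numbers and the helper function are invented for illustration) of how a “percent complete” figure shifts as soon as the pool of known test cases grows:

```python
# Illustrative sketch only: the numbers here are invented. The point is that
# "percent complete" is always relative to the test cases we currently know about.

def percent_complete(executed: int, known: int) -> float:
    """Percentage of *known* test cases executed; says nothing about unknown ones."""
    return 100.0 * executed / known

# Early in the project: 200 known test cases, 180 of them executed.
print(percent_complete(180, 200))   # 90.0 -- looks nearly done

# A week later, a subject matter expert points out 150 more test ideas.
# The same 180 executed tests now produce a very different number.
print(percent_complete(180, 350))   # ~51.4 -- same work, different "completion"
```

The work done hasn’t changed between the two reports; only our knowledge of the project has.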

The Software Quality Assurance school of thought seems to be numbers-obsessed these days. I am interested in accurate numbers, but I don’t think we have enough information on testing projects to justify many of the numbers we have conditioned project stakeholders to rely on. Numbers are only part of the picture – we need to realize that positive project outcomes are what really matter to project stakeholders.

Numbers without a context and careful analysis and thought can give project stakeholders a false sense of security. This brings me back to Michael Bolton’s thought: what is more important to a stakeholder? A number, or the opinion of a competent professional? In my experience, the latter outweighs the former. Trust is built by delivering on your word, and helping stakeholders realize the results they need. Numbers may be useful when helping provide information to project stakeholders, but we need to be careful how we use them.

Underdetermination

How many testing projects have you been on that seemed successful, only to have a high-impact bug discovered by a customer once the software was in production? Where did the testing team go wrong? I would argue that the testing team didn’t necessarily do anything wrong.

First of all, a good tester knows (as Cem Kaner points out) that it is impossible to completely test a program. Another reason we get surprised is underdetermination. Testers gather knowledge about the entire system throughout the life of the project. That knowledge is not complete when the requirements are written, and it probably isn’t complete when the project ships. It can be difficult to obtain, and it depends on many factors, including access to subject matter experts, the skills of the testers involved, and their ability to extract the right information on an ongoing basis. Realizing that you are dealing with a situation where you probably do not have all the information is key. It helps guide your activities and helps you keep an open mind about what you might be missing.

Underdetermination is usually used to describe how scientific theories go far beyond the empirical evidence (what we can physically observe and measure), yet are surprisingly accurate. One example of underdetermination is described by Noam Chomsky: the examples of language a child hears in their environment underdetermine the language they actually learn to speak. Languages have rule sets and many subtleties that are not accurately represented by the common usage the child learns from.

Testers regularly face problems of underdetermination. The test plan document underdetermines the actual testing strategies and techniques that will be employed. The testers’ knowledge of the system underdetermines what the actual system looks like. Often, key facts about the system arrive very late in the testing process, which can send testing efforts into a tailspin.

Just knowing that the testing activities on a project underdetermine what could possibly be tested is a good start. Test coverage metrics are, at best, a very blunt measurement. Slavishly sticking to these kinds of numbers, or signing off that testing is complete only once a certain coverage percentage has been reached, is misleading at best and dangerous at worst.
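
As a hedged illustration of how blunt those numbers can be, here is a hypothetical function and test (the function, its spec, and the tests are all invented for this sketch) that reach 100% statement and branch coverage while a boundary bug sails through untested:

```python
# Hypothetical sketch: the function, its spec, and the tests are invented here.
# The tests below achieve 100% statement and branch coverage of shipping_cost,
# yet a boundary bug slips through untouched.

def shipping_cost(weight_kg: float) -> float:
    """Intended spec: parcels of 10 kg and over pay the 20.00 surcharge."""
    if weight_kg > 10:        # Bug: should be >= 10 according to the intended spec
        return 20.00
    return 5.00

def test_shipping_cost():
    assert shipping_cost(2) == 5.00     # exercises the light-parcel branch
    assert shipping_cost(25) == 20.00   # exercises the heavy-parcel branch
    # Every statement and branch has now run, so a coverage report says 100%.
    # Nobody thought to test exactly 10 kg -- which is where the bug lives.
```

The coverage tool is reporting honestly on the tests that exist; it simply has nothing to say about the test nobody wrote.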

If testing efforts do fail to catch high impact bugs, there are a couple of things to remember:

  1. It wasn’t necessarily a failure – it is impossible to test everything
  2. Testing knowledge at any given point in a project is underdetermined by what could be tested

If this happens to your testing team, instead of merely adding the problem as a test case to ensure the bug doesn’t occur again, evaluate *why* it happened. What information were the testers missing, and why were they missing it? How could they have gotten this information while testing? The chances of this particular bug cropping up again are pretty slim, but the chances of one like it popping up in another area of the program are probably much greater than one might initially think.

Instead of evaluating a project solely on coverage percentages, be self-critical of your testing techniques. Realize that coverage percentages do not really give you much information: they tell you nothing about the tests you haven’t thought to run, and those tests could be significant in number as well as in potential impact. Evaluate what and how you are testing throughout the project, and periodically call in experts from other parts of the system to help you evaluate what you are doing. Think about what you could be missing, and realize that you can do a very good job even without all the information.

The scientific community does quite well even though it frequently works with only a small part of the whole picture. Testers should be able to as well. One interesting side note is that many significant discoveries come about by accident. Use these “testing accidents” to learn more about the system you are testing, the processes you are using, and, more importantly, what they tell you about *you*, your testing, and what you can learn from it.