January 16, 2006

RE: Model Myopia in Testing

Chris McMahon asks: "How do I find a useful model when unit tests are passing and show 100% code coverage?" Chris asked me this a bit tongue-in-cheek because he has heard me rant about these sorts of things before. I'm glad he asked the question though, because it brings up a different kind of model myopia. In yesterday's post, we have modeling myopia with regards to the application we are testing. This question brings up model myopia regarding our potential testing contexts.

This is a common question, particularly on agile teams that utilize automated unit testing who are struggling with a tricky bug. Let's examine this situation in more detail. What does the claim in the question tell us about testing?

This claim provides information about a certain testing context, the code context. We know that within this testing context, we have a measure of coverage that claims 100%, and that the tests that achieve this coverage claim pass. Is this enough information? We have a quantitative claim, but this doesn't tell us much from a qualitative perspective. Why does this matter?

One project I saw had 300 automated unit tests that achieved a high level of code coverage on a small application. When we did testing in another context (the user context, or through a Graphical User Interface), the application couldn't pass basic tests. On further investigation, we found that most of the unit tests were programmed to pass, no matter what the result of the test was. The developers were measured on the amount of automated unit tests they wrote, on the code coverage attained as reported by a coverage tool, and they could only check in code when the tests passed. As a measurement maxim says: show me how people are measured, and I'll show you how they behave.

This is an extreme example, but underlines how dangerous numbers can be without a context. Sure, we had a number of passing tests with a high degree of code coverage on a simple application, but the numbers were almost meaningless from a management perspective because the tests were of poor quality, and many of them didn't really test anything. I've seen reams of automated test code not test anything too many times to take a quantitative claim on its own. (I've also seen a lot of manual tests that didn't test anything much either.)

Now what happens if we agree that the quality of the unit tests are sufficiently good? We don't have ridiculous measures like SMART goals where we "pay for performance" on automated unit tests, code coverage, bug counts, etc., so we don't have this "meet the numbers" mentality that frustrates our goals. We know that our tests are well vetted, thought out and have helped steer our design. Our developers are intrinsically motivated to write solid tests, and they are talented and trustworthy. What now? All the tests passed. Where can the problem be?

It's important to understand that "all the tests we could think of at a particular time passed", not that "all the possible tests we could ever think of, express in program code and run have passed". This is an important distinction. Brian Marick has good stories about faults of ommission, and reminds us that a test is only as good as what it tests. It doesn't test what we didn't write in the first place. Narrow modeling is usually a culprit, and once we learn more about the problem we can write a test for it. It's hard to try to write a test for a particular problem that has stumped us and we don't have enough information yet, or to predict all possible problems up front when doing TDD.

In other cases, I have seen tests pass in an automated framework that were due to using the framework incorrectly which caused false positives. This is rare, but I have seen it happen.

Michael Bolton has summed up automated unit testing assumptions eloquently:

    When someone says "all the unit tests passed", we should hear "the unit tests that we thought of, for theories of error of which we were aware, and that we considered important enough to write given the time we had, and that we presume to have run on the units of the product that we assume to have been built, and that we presume actually test something, and that we presume provide a trustworthy result 100% of the time, passed."

Testing Contexts
Another important distinction to make is that we know this information comes from one testing context, the code context. We know there are others. I split an application with a user interface into three broad contexts: the code, the system the application is installed in, and the user interface. We know at the user interface layer, the application is greater than the sum of its parts. It will often react differently when tested from this context than it will from the code context. There are many potential testable interfaces within each context that can provide different information when testing. What interfaces, and in what contexts are we gathering test-generated information from?

When I see a claim like this: "we have 100% of our automated unit tests passing with 100% code coverage", that only tells me about one of three possible model categories for testing. If we haven't done testing in other contexts, we are only seeing part of the picture. We might be falling into a different type of modeling myopia - looking too narrowly at one testing execution model.

It's common to only view the application through the context we are used to. I tend to use the term "interface within a context" rather than "black box" and "white box" after my experience with TDD. The interface we interact with the software with the most can cause us to narrow our focus on testing. We then model the testing we can do with a preference to this interface. Mike Kelly has a blog post that deals with bounded awareness and inattentional blindness, and how they can affect our decisions.

I see this frequently with developers who are working predominantly in the code context, and testers who are testing only in the user interface context. Frequently, each treat testing in the other context with some disdain. Often, testing diversity pays off, and when we work together bringing expertise and ideas from any testable interface we were using, we can combat model myopia.

Testing on one context does not mean testing in another context is duplicated effort or unnecessary. Often, testing using a different model reveals a bug that isn't obvious, or easily reproducible in one context.

Not too long ago, I had two meetings. One was with two developers who asked why I bothered to test against the UI when we had so many other interfaces that were easier to test against behind the GUI. In another meeting, a tester asked why I bothered doing any testing behind the GUI when the end user only uses the GUI anyway. In both cases, my answer was the same: "When I stop finding bugs that you appreciate and find useful in one interface that I couldn't find more easily in the other, I'll stop testing against that interface."

When testing in a context fails to provide us with useful information about the product we are testing, we can drop that model in its importance to our testing. Knowing about a testing model and choosing not to use it is different than thinking we have covered all ourtesting models, and still be stumped as to why we have problems.

Posted by jonathankohl at 10:06 PM

January 15, 2006

Combating Model Myopia in Testing

Part 1

In my last post, I posed a testing problem. This post begins to address how I view the problem, and my motivation behind posing it.

Models
In software testing, a "model" is a representation of something we are trying to understand. It might be a representation in our mind (an idea), a diagram (like an Entity-Relationship diagram), or it might be a physical representation. A model helps us think of a system or concept in a certain way, and as a representation is not going to be perfect or complete. We can have many models for the same thing - our only limitation is our imagination.

Here's an example:
How many ways can you model a web page? I can think of at least three categories:
1) as an end-user, or consumer view where I am using a web page for some purpose. I see the webpage as having text, colors, pictures and hyperlinks. I visualize that web page, served up in a web browser when I think about a particular page I find useful, and want to go to it. I interface with it and relate to it as it is shown through a web browser.
2) as a web developer. I tend to develop web pages using a text editor and do all of the HTML, CSS, etc. by hand. I interface with the web page with a text editor, and I visualize it and relate to it by the HTML view which is made up of markup tags. I interface with the web page primarily through a text editor.
3) as a web designer. I either have a visual picture in my mind of the layout and design of a web site I want to create and I follow that, or sometimes I draw it out using a tool. Even a rough sketch of a general layout on a piece of paper or on a whiteboard will convey that idea. I visualize the web page according to that layout.

These are three simple examples of modeling a web page. I added a couple of dimensions to help describe the models - a user, and an interface. Each one is different, but I may switch between all three as I'm developing a web page, or when I'm testing.

Since I often test web applications, I think of these different views a lot. I think of the first two very often, especially since I did a lot of web development in the mid-late '90s. When I started writing web applications, I also learned to model from a programming context. My HTML tags, page layout, and content would be generated according to program logic, so I had a fourth model category derived from my role as a web application programmer. Understanding all the components used in rendering a web page is important to me when testing a web application. Everything from the web browser to the application server, web server, database if one is used, any drivers that are used, as well as the program code design is important information that can help with testing. I actually visualize what is happening across the entire application path when data is input into a web page, sent to the web server, processed, and the page I am on is updated. Even a simple HTML page has an interesting path. From the web server we have a text document that contains HTML markup, with certain tags that denote it as such. When a web browser asks for that page, the web server takes the tags, processes them and sends them via HTTP to the machine that is asking for it. The web browser takes that HTTP response, interprets it, and if it comes through without an error, it processes the HTML and displays it in a web browser.

Why is this important?

Modeling applications like this helps me imagine areas of potential problems based on experience and technical knowledge of the various technologies used, and also helps me narrow down or get repeatable cases for bugs I find in the application. It also helps me generate different testing ideas instead of one "happy path" based off of requirements documents, I can think of other potential usage of the software that might have the potential for problems. When I work with software users in their own environment, I am always amazed at how they use the software in unintended ways which they are perfectly happy doing.

What other models can you think of when testing web applications? Who are the users? Not all of them will be human. What are the interfaces? Not all of them will involve a Graphical User Interface like most modern, familiar web browsers.

But I've Run Out of Ideas!

The problem I described was something I deliberately chose as a modeling problem. Notice that in the energy industry, when I got a failure on a search attempt that contained the word "well", I immediately had a suspicion of what the problem might be. But if I was in another industry, "well" may not be in my common vocabulary as a noun, but primarily an adverb. (Note that language is important when testing.) In my description here, we have at least two modeling categories of the software. One as "enterprise search", and one as "my energy company search". The ideas that each model represent are going to be subtly different. That subtle shift between models can be the difference between tracking down the source of a bug or not. But if I didn't have energy experience, how can I find that model that will help me track down the bug?

I call this so-called running out of ideas "testing model myopia". If we were omniscient, we would know all the possible ways to model the software, and we'd know where all the bugs were. If we were semi-omniscient and merely knew all the possible ways to model our software, we'd have an easier time. Since we aren't, we have to use our skill as testing thinkers to break out of mental blocks. This can be very difficult if the entire team has testing model myopia as I described in the problem. When under project stress, with constraints on a development team such as time and resources, a tester may have to build a case in order to be able to pursue a testing idea. The predominant model of the software as shared by the team can be very entrenched, and it can be threatening to have that model questioned or analyzed. Nevertheless, as a tester, we are what Jerry Weinberg has described: "someone who can imagine that there might be another way."

One of the first places I would start to tackle this problem is to imagine other models of the software that we aren't addressing when testing.

Stay tuned for Part 2

Posted by jonathankohl at 09:01 PM