Category Archives: unrepeatable bugs

Who Do We Serve? – Intermittent Problems Revisited

I seeded a problem for readers to spot in my recent post “Who Do We Serve?“. George Dinwiddie was the only reader who not only spotted the problem, he emailed me. George says:

…there’s [a] red flag that occurs early in your story that I think deserves more attention. “Sometimes the tests fail sporadically for no apparent reason, but over time the failures appear less frequently.” As Bob Pease (an analog EE) used to say in Electronic Design and EDN magazines, “If you notice anything funny, record the amount of funny.” When something fails intermittently or for an unknown reason, it’s time to sit up and take notice. The system should not be doing something the designers don’t understand.

George spotted the problem that I introduced, but didn’t expand on, and as I hoped a reader would do, emailed me to point out my omission. Good job George, and thanks for emailing me when you spotted it. Your thoughts are absolutely in line with my experience.

One area of expertise I have developed is the ability to track down repeatable cases for intermittent bugs. These are difficult problems for any team to deal with. They don’t fit the regular model, often requiring off-the-wall experimentation, lots of thinking time, tons of creativity, and a good deal of patience. One needs to look at the potentially enormous amount of data in front of them, and figure out how to weed out the things that aren’t important. There is trial and error involved as well, and a good deal of exploration. Naturally, exploratory testing is a fit when tracking intermittent problems.

On Agile teams, I have found we can be especially prone to ignoring intermittent problems. The first time I discovered an intermittent problem on a team using XP and Scrum, we did exactly as Ron Jeffries has often recommended: we came to a complete stop, and the whole team jumped in to work on the problem. We worked on it for a day straight, but people got tired. We couldn’t track down a repeatable case, and we couldn’t figure out the problem from the code. After the second day of investigation, the team decided to move on. The automated tests were all passing, and running through the manual acceptance tests didn’t reveal the problem. In fact, the only person who regularly saw the problem was me, the pesky tester, who shouldn’t have been on an XP team to begin with.

I objected to letting it go, but the team had several reasons, and they had more experience on Agile projects than I did. Their chief concerns were that tracking down this problem was dragging velocity down. They wanted to get working on the new story cards so we could meet our deadline, or finish early and impress the customer. Their other concern was that we weren’t following the daily XP processes when we were investigating. They said it would be better to get back into the process. Furthermore, the automated tests were passing, and the customer had no problem with the functionality. I stood down, and let it go, but made sure to watch for the problem with extra vigilance, and took it upon myself to find a repeatable case.

Our ScrumMaster talked to me and said I had a valid concern, but the code base was changing so quickly that we had probably fixed the bug already. This was a big red flag to me, but I submitted to the wishes of the team and helped test newly developed stories and tried to give the programmers feedback on their work as quickly as I could. That involved a lot of exploratory testing, and within days, my exploratory testing revealed that the intermittent bug had not gone away. We were at the end of a sprint, and it was almost time for a demo for all the business stakeholders. The team asked me to test the demo, run the acceptance tests, and support them in the demo. We would look at the intermittent bug as soon as I had a repeatable case for it.

Our demo was going well. The business stakeholders were blown away by our progress. They thought it was some sort of trick – it was hard for them to understand that in so short a time we had developed working software that was running in a production-like environment. Then the demo hit a snag. Guess what happened? The intermittent bug appeared during the demo, on the huge projector screen in front of managers, executives, decision makers and end users. It was still early in the project, so no one got upset, but one executive stood up and said: “I knew you guys didn’t have it all together yet.” He chuckled and walked out. The sponsor for the project told us that the problem better be fixed as soon as possible.

I don’t think management was upset about the error, but our pride was hurt. We also knew we had a problem. They were expecting us to fix it, but we didn’t know how. Suddenly the developers went from saying: “That works on my machine.” to: “Jonathan, how can I help you track down the cause?” In the end, I used a GUI automated testing tool as a simulator, and based on behavior I had seen in the past, and patterns I documented from the demo, I came up with a theory, and using exploratory testing, I designed and executed an experiment. Using a GUI test automation tool as a simulator while running manual test scenarios helped me find a cause quite quickly.

My experiment failed in that it showed my theory was wrong, but a happy side effect was that I found the pattern that was causing the bug. It turned out that it was an intermittent bug in the web application framework we were using. The programmers wrote tests for that scenario, wrote code to bypass the bug in the framework, and logged a bug with the framework project. At the next demo, we ran manual tests and automated tests showing what the problem was, and how we had fixed it. The client was pleased, and the team learned a valuable lesson. We learned that an XP team is not immune to general software problems, and as George said we practiced this in the future:

When something fails intermittently or for an unknown reason, it’s time to sit up and take notice.

Other Agile teams I’ve been on have struggled with intermittent errors, particularly teams that had high velocity and productivity. Regular xUnit tools and acceptance tests aren’t necessarily that helpful, unless they are used as part of a larger test experiment. Running them until they turn green, and then checking code in without understanding the problem and fixing it will not cause it to go away. An unfixed intermittent bug sits in the shadows, ticking away like a time bomb, waiting for a customer to find it.

Intermittent problems are difficult to deal with, and there isn’t a formula to follow that guarantees success. I’ve written about them before: Repeating the Unrepeatable Bug, and James Bach has the most thorough set of ideas on dealing with intermittent bugs here: How to Investigate Intermittent Problems. I haven’t seen intermittent bugs just disappear on their own, so as George says, take the time to look into the issue. The system should not be doing something the designers don’t understand.

Tracking Intermittent Bugs

Recognizing Patterns of Behavior

In my last post, the term “patterns” caused strong responses from some readers. When I use the term “pattern,” I do not mean a design pattern, or a rule to apply when testing. For my purposes, patterns are the rhythms of behavior in a system.

When we start learning to track down bugs, we learn to repeat exactly what we were doing prior to the bug occurring. We repeat the steps, repeat the failure, and then weed out extraneous details to create a concise bug report. However, with an intermittent bug, the sequence of steps may vary greatly, even to the point where they seem unrelated. In many cases, I’ve seen the same bug logged in separate defect reports, sometimes spanning two or three years. But there may be a pattern to the bug’s behavior that we are missing when we are so close to the operation of the program. This is when we need to take a step back and see if there is a pattern that occurs in the system as a whole.

Intermittent or “unrepeatable” bugs come into my testing world when:

  1. Someone tells me about an intermittent problem they have observed and needs help.
  2. I observe an intermittent problem when I’m testing an application.

How do I know when a bug is intermittent? In some cases, repeating the exact sequence of actions that caused the problem in the first place doesn’t cause the failure to reoccur. Later on, I run into the problem again, but again am frustrated in attempts to repeat it. In other cases, the problem occurs in a seemingly random fashion. As I am doing other kinds of testing, failures might occur in different areas of the application. Sometimes I see error messaging that is very similar; for example, a stack trace from a web server may be identical with several errors that come from different areas of the application. In still other cases my confidence that several bugs that were reported in isolation is the same bug is based on inference – I have a gut feeling based on experience (called abductive inference).

When I looked back at how I have tracked down intermittent bugs, I noticed that I moved out to view the system from a different perspective. I took a “big picture” view instead of focusing on the details. To help frame my thinking, I sometimes visualize what something in the physical world looks like from high above. If I am in traffic, driving from traffic light to traffic light, I don’t see the behavior of the traffic itself. I go through some intersections, wait at others, and can only see the next few lights ahead. But if I were to observe the same traffic from a tall building, patterns would begin to emerge in the traffic’s flow that I couldn’t possibly see from below. One system behavior that I see with intermittent bugs is that the problem is seemingly resistant to fixes that are completed by the team. The original fault doesn’t occur after a fix, but it pops up in another scenario not described in the test case. When several bugs are not getting fixed or are occurring in different places it is a sign to me that there is possibly one intermittent bug behind them. Sometimes a fault is reported by customers, but is not something I can repeat in the test lab. Sometimes different errors occur over time, but a common thread appears: similar error messages, similar back end processing, etc.

Once I observe a pattern that I am suspicious about, I work on test ideas. Sometimes it can be difficult to convince the team that it is a single, intermittent bug instead of a several similar bugs. With one application, several testers were seeing a bug we couldn’t repeat occur infrequently in a certain part of the application we were testing. At first we were frustrated with not being able to reproduce it, but I started to take notes and save error information whenever I stumbled upon it. I also talked to others about their experiences when they saw the intermittent failure. Once I had saved enough information from error logs, the developer felt he had a fix. He applied a fix, and I tested it and didn’t see the error again. We shipped the product thinking we had solved the intermittent problem. But we hadn’t. To my shock and dismay, I stumbled across it again in a slightly different area than we had been testing after the release.

It took a while to realize that the bug only occurred after working in one area of the application for a period of time. I kept notes of my actions prior to the failure, but I couldn’t reliably repeat it. I talked to the lead developer on the project, and he noticed a pattern in my failure notes. He told me that the program was using a third-party tool through an API in that area of the application. The error logs I had saved pointed to problems with memory allocation, so he had a hunch that the API was running out of allocated space and not handling the error condition gracefully. We had several other bug reports that were related to actions in that area of the application, and a sales partner who kept calling to complain about the application crashing after using it for an hour. When I asked the sales partner what area of the application they were using when it crashed, and it turned out to be the same problem area.

The team was convinced we had several intermittent bugs in that area of the application, based on their experience and bug reports. But the developer and I were suspicious it was one bug that could be triggered by any number of actions showing up in slightly different ways. I did more testing, and discovered that it didn’t matter what exactly you were doing with the application, it had to do with how the application was handling memory in one particular area. Our theory was that the failures occurring after a period of time passing while using the application had to do with memory allocation filling up, causing the application to get into an unstable state. To prove our theory, we had to step back and not focus on the details of each individual case. Instead, we quickly filled up the memory by doing actions that were memory intensive. Then, we could demonstrate to others on the team that various errors could occur using different types of test data and inputs within one area of the application. Once I recorded the detailed steps required to reproduce the bug, other testers and developers could consistently repeat it as well. Once we fixed that one bug, the other, supposedly-unrelated, intermittent errors went away as well.

I am sometimes told by testers that my thinking is “backwards” because I fill in the details of exact steps to repeat the bug only after I have a repeatable case. Until then, the details can distract me from the real bug.

Five Dimensions of Exploratory Testing

(Edit: was Four Dimensions of Exploratory Testing. Reader feedback has shown that I missed a dimension.)

I’ve been working on analyzing what I do when Exploratory Testing. I’ve found that describing what I actually do when testing can be difficult, but I’m doing my best to attempt to describe what I do when solving problems. One area that I’ve been particularly interested in lately is intermittent failures. These often get relegated to bug purgatory by being marked as “Non-reproducible” or “unrepeatable bugs”. I’m amazed at how quickly we throw up our hands and “monitor” the bugs, allowing them to languish sometimes for years. In the meantime, a customer somewhere is howling, or quietly moving on to a competitor’s product because they are tired of dealing with it. If only “one or two” customers complain, I know that means there are many others who are not speaking up and possibly quietly moving to a competitor’s product.

So what do I do when Exploratory Testing when I track down an intermittent defect? I started out trying to describe what I do with an article for Better Software called Repeating the Unrepeatable Bug, and I’ve been mulling over concepts and ideas. Fortunately, James Bach has just blogged about How to Investigate Intermittent Problems which is an excellent, thorough post describing ideas on how to make an intermittent bug a bug that can be repeated regularly.

When I do Exploratory Testing, it is frequently to track down problems after the fact. When I test software, I look at risk, I draw from my own experience testing certain kinds of technology solutions, and I use the software. As I use the software, I inevitably discover bugs, or potential problem areas, and often follow hunches. I constantly go through a conjecture and refutation loop where I almost subconciously posit an idea about an aspect of the software, and design a test to decide whether my conjecture is falsifiable or not. (See the work of Karl Popper for more on conjectures and refutations and falsifiability.) It seems that I do this so much, I rarely think about it. Other times, I very consciously follow the scientific method, and design an experiment with controlled variables, a manipulated variable, and observe the responding variables.

When I spot an intermittent bug, I begin to build a theory about it. My initial theory is usually wrong, but I keep gathering data, altering it, and following the conjecture/refutation model. I draw information from others as I gather information and build the theory. I run the theory by experts in particular areas of the software or system to get more insight.

When I do Exploratory Testing to track down intermittent failures, these are five dimensions that I consider:

  • Product
  • Environment
  • Patterns
  • People
  • Tools & Techniques

Product

This means having the right build, installed and configured properly. This is usually a controlled variable. This must be right, as a failure may occur at a different rate depending on the build. I record what builds I have been using, and the frequency of the failure on a particular build.

Environment

Taking the environment into account is a big deal. Installing the same build on slightly different environments can have an impact on how the software responds. This is another controlled variable that can be a challenge to maintain, especially if the test environment is used by a lot of people. Failures can manifest themselves differently depending on the context where they are found. For example, if one test machine has less memory than another, it might exacerbate the underlying problem. Sometimes knowing this information is helpful for tracking it down, so I don’t hesitate to change environments if an intermittent problem occurs more frequently in one than another, using the environment a manipulated variable.

Patterns

When we start learning to track down bugs when we find them in a product, we learn to repeat exactly what we were doing prior to the bug occurring. We repeat the steps, repeat the failure, and then weed out extraneous information to have a concise bug report. With intermittent bugs, these details may not be important. In many cases I’ve seen defect reports for the same bug logged as several separate bugs in a defect database. Some of them have gone back for two or three years. We seldom look for patterns, instead, we focus on actions. With intermittent bugs, it is important to weed out the details, and apply an underlying pattern to the emerging theory.

For example, if a web app is crashing at a certain point, and we see SQL or database connection information in a failure log, a conjecture might be: “Could it be a database synchronization issue?” Through collaboration with others, and using tools, I could find information on where else in the application the same kind of call to a database is made, and test each scenario that makes the same kind of call to try to refute that conjecture. Note that this conjecture is based on the information we have available at the time, and is drawn from inference. It isn’t blind guesswork. The conjecture can be based on inference to the best explanation of what we are observing, or “abductive inference”.

A pattern will emerge over time as this is repeated, and more information is drawn in from outside sources. That conjecture might be false, so I adjust and retest and record the resulting information. Once a pattern is found, the details can be filled in once the bug is repeatable. This is difficult to do, and requires patience and introspection as well as collaboration with others. This introspection is something I call “after the fact pattern analysis”. How do I figure out what was going on in the application when the bug occured, and how do I find a pattern to explain what happened? This emerges over time, and may change directions as more information is gathered from various sources. In some cases, my original hunch was right, but getting a repeatable case involved investigating the other possibilities and ruling them out. Aspects from each of these experiments shed new light on an emerging pattern. In other cases, a pattern was discovered by a process of elimination where I moved from one wrong theory to the next in a similar fashion.

The different patterns that I apply are the manipulated variables in the experiment, and the resulting behavior is the responding variable. Once I can repeat the responding variable on command, it is time to focus on the details and work with a developer on getting a fix.

Update:
Patterns are probably the most important dimension, and reader feedback shows I didn’t go into enough detail in this section. I’ll work on the patterns dimension and explain it more in another post.

People

When we focus on technical details, we frequently forget about people. I’ve posted before about creating a user profile, and creating a model of the user’s environment. James Bach pointed me to the work of John Musa who has done work in software reliability engineering. The combination of the user’s profile and their environment I was describing is called an “operational profile”.

I also rely heavily on collaboration when working on intermittent bugs. Many of these problems would have been impossible for me to figure out without the help and opinions of other testers, developers, operations people, technical writers, customers, etc. I recently described this process of drawing in information at the right time from different specialists to some executives. They commented that it reminded them of medical work done on a patient. One person doesn’t do it all, and certain health problems can only be diagnosed with the right combination of information from specialists applied at just the right time. I like the analogy.

Tools & Techniques

When Exploratory Testing, I am not only manually testing, but also use whatever tools and techniques help me build a model to describe the problem I’m trying to solve. Information from automated tests, log analyzers, looking at the source code, the system details, anything might be relevant to help me build a model on what might be causing the defect. As James Bach and Cem Kaner say, ET isn’t a technique, it’s a way of thinking about testing. Exploratory Testers use diverse techniques to help gather information and test out theories.

I refer to using many automated or diagnostic testing tools to a term I got from Cem Kaner: “Computer Assisted Testing.” Automated test results might provide me with information, while other automated tests might help me repeat an intermittent defect more frequently than manual testing alone. I sometimes automate certain features in an application that I run while I do manual work as well which I’ve found to be a powerful combination for repeating certain kinds of intermittent problems. I prefer the term Computer Assisted Testing over “automated tests” because it doesn’t imply that the computer takes the place of a human. Automated tests still require a human brain behind them and to analyze their results. They are a tool, not a replacement for human thinking and testing.

Next time you see a bug get assigned to an “unrepeatable” state, review James’ post. Be patient, and don’t be afraid to stand up to adversity to get to the cause. Together we can wipe out the term “unrepeatable bug”.

User Profiles and Exploratory Testing

Knowing the User and Their Unique Environment

As I was working on an article for Better Software magazine, I found consistent patterns in cases where I have found a repeatable case to a so-called “unrepeatable bug”. One pattern that surprised me was how often I do user profiling. Often, one tester or end-user sees a so-called unrepeatable bug more frequently than others. A lot of my investigative work in these cases involves trying to get inside an end-user’s head (often a tester) to emulate their actions. I have learned to spend time with the person to get a better perspective on not only their actions and environment, but their ideas and motivations. The resulting user profiles fuel ideas for exploratory testing sessions to track down difficult bugs.

Recently I was assigned the task of tracking down a so-called unrepeatable bug. Several people with different skill levels had worked on it with no success. With a little time and work, I was able to get a repeatable case. Afterwards, when I did a personal retrospective on the assignment, I realized that I was creating a profile of the tester who had come across the “unrepeatable” cases that the rest of the dev team did not see. Until that point, I hadn’t realized to what extent I was modeling the tester/user when I was working on repeating “unrepeatable” bugs. My exploratory testing for this task went something like this.

I developed a model of the tester’s behaviour through observation and some pair testing sessions. Then, I started working on the problem and could see the failure very sporadically. One thing I noticed was that this tester did installations differently than others. I also noticed what builds they were using, and that there was more of a time delay between their actions than with other testers (they often left tasks mid-stream to go to meetings or work on other tasks). Knowing this, I used the same builds and the same installation steps as the tester; I figured out that part of the problem had to do with a Greenwich Mean Time (GMT) offset that was set incorrectly in the embedded device we were testing. Upon installation, the system time was set behind our Mountain Time offset, so the system time was back in time. This caused the system to reboot in order to reset the time (known behavior, working properly). But, as the resulting error message told me, there was also a kernel panic in the device. With this knowledge, I could repeat the bug about every two out of five times, but it still wasn’t consistent.

I spent time in that tester’s work environment to see if there was something else I was missing. I discovered that their test device had connections that weren’t fully seated, and that they had stacked the embedded device on both a router and a power supply. This caused the device to rock gently back and forth when you typed. So, I went back to my desk, unseated the cables so they barely made a connection, and—while installing a new firmware build—tapped my desk with my knee to simulate the rocking. Presto! Every time I did this with a same build that this tester had been using, the bug appeared.

Next, I collaborated with a developer. He went from, “that can’t happen,” to “uh oh, I didn’t test if the system time is back in time, *and* that the connection to the device is down during installation to trap the error.” The time offset and the flakey connection were causing two related “unrepeatable” bugs. This sounds like a simple correlation from the user’s perspective, but it wasn’t from a code perspective. These areas of code were completely unrelated and weren’t obvious when testing at the code level.

The developer thought I was insane when he saw me rocking my desk with my knee while typing to repeat the bug. But when I repeated the bugs every time, and explained my rationale, he chuckled and said it now made perfect sense. I walked him through my detective work, how I saw the device rocking out of the corner of my eye when I typed at the other tester’s desk. I went through the classic conjecture/refutation model of testing where I observed the behavior, set up an experiment to emulate the conditions, and tried to refute my proposition. When the evidence supported my proposition, I was able to get something tangible for the developer to repeat the bug himself. We moved forward, and were able to get a fix in place.

Sometimes we look to the code for sources of bugs and forget about the user. When one user out of many finds a problem, and that problem isn’t obvious in the source code, we dismiss it as user error. Sometimes my job as an exploratory tester is to track down the idiosyncrasies of a particular user who has uncovered something the rest of us can’t repeat. Often, there is a kind of chaos-theory effect that happens at the user interface, that only a particular user has the right unique recipe to cause a failure. Repeating the failure accurately not only requires having the right version of the source code and having the test system deployed in the right way, it also requires that the tester knows what a that particular user was doing at that particular time. In this case, I had all three, but emulating an environment I assumed was the same as mine was still tricky. The small differences in test environments, when coupled with slightly different usage by the tester, made all the difference between repeating the bug and not being able to repeat it. The details were subtle on their own, but each nuance, when put together, amplified each other until the application had something it couldn’t handle. Simply testing the same way we had been in the tester’s environment didn’t help us. Putting all the pieces together yielded the result we needed.

Note: Thanks to this blog post by Pragmatic Dave Thomas, this has become known as the “Knee Testing” story.


Superbugs

My wife and I have friends and family in the health-care profession who tell us about “superbugs” – bacteria which are resistant to antibiotics. In spite of all the precautions, new technology and the enormous efforts of health care professionals, bugs still manage to mutate and respond to the environment they are in and still pose a threat to human health. In software development projects, I have encountered bugs that at least on the surface appear to exhibit this “superbug” behavior.

Development environments that utilize test-driven development, automated unit testing tools and other test-infected development techniques, in my experience, tend to generate very robust applications. When I see how much the developers are testing, and how good the tests are, I wonder if I’ll be able to find any bugs in their code at all. I do find bugs (sometimes to my surprise), but it can be much harder than in traditional development environments. Gone are the easy bugs an experienced tester can find in minutes in a newly developed application component. These include bounds conditions tests, integration tests and others that may not be what first come to mind to a developer testing their own code. However, in a test-infected development environment, most of these have already been thought of, and tested for by developers. As a tester, I have to get creative and inventive to find bugs in code that has already been thoroughly tested by the developers.

In some cases, I have collaborated with the developer to help in their unit test development efforts. <shameless plug> I talk about this more in next month’s edition of Better Software. </shameless plug> The resulting code is very hard for me to find bugs in. Sometimes to find any bugs at all, I have to collaborate with the developer to generate new testing ideas based on their knowledge of interactions in the code itself. The bugs that are found in these efforts are often tricky, time consuming and difficult to replicate. Nailing down the cause of these bugs often requires testers and developers pair testing. These bugs are not only hard to find, they are often difficult to fix, and seem to be resistant to the development efforts which are so successful in catching many bugs during the coding process. I’ve started calling these bugs “superbugs”.

It may be the case that certain bugs are resistant to the developer testing techniques, but I’m not sure if this is the case or not. I’ve thought until recently that these bugs also exist in traditionally developed code, but since testers spend so much time dealing with the bugs that test-infected development techniques tend to catch, they don’t have the time in the life of the project to find these types of bugs as frequently. Similarily, since they are difficult to replicate, they may not get reported as much by actual users, or several users may report the same problem in the form of several “unrepeatable” bugs.

Another reason *I* find them difficult to find might be due to my own testing habits and rules of thumb, particularly if the developer and I are working together quite closely. When we test together, I teach the developer some of my techniques, and they teach me theirs. When I finally test the code, both of our usual techniques have been tested quite well in development. Now I’m left with usability problems, some integration bugs that the unit testing doesn’t catch, and these so-called “superbugs”. Maybe the superbugs aren’t superbugs at all. Another tester might think of them as regular bugs, and may find them much more easily than I can because of their own toolkit of testing techniques and rules of thumb.

This behavior intrigues me none the less. Are we now able to find bugs that we didn’t have the time to find before, or are we now having to work harder as testers and push the bounds of our knowledge to find bugs in thoroughly developer-tested code? Or is it possible that our test-infected development efforts have resulted in a new strain of bugs?