
Load Testing Your Web Infrastructure: Please Be Careful. Part 1

Now that I am on the product management side of software projects, I don’t deal with testing approaches in my day-to-day work very much. I get information about product quality criteria, quality goals and metrics, testing status, and show stoppers that require attention. Unless I want to dig deeper, I don’t hear much about the actual testing work. Once in a while though, something big pops up onto my radar, usually because there is a threat to a product release, or a political issue at play. In those moments, my background as a software tester comes in handy.

Recently, my testing experience was called into action, because of project controversy about load testing.

There were some problems with a retail system in production, and poor performance was blamed. The tech team did not have the expertise or budget for load testing, and were instead pushing the sales team to take responsibility for that testing. The sales team didn’t have any technically minded people on their team, so they approached marketing. The marketing team has people with more technical skills, so a manager decided to take on that responsibility. They asked the team for volunteers to research load testing, try it out, and report back to the technical team. I happened to overhear this, and began waving my arms like the famous robot from Lost in Space who would warn about impending danger by saying: “Danger, Will Robinson!” This is out of character for me, since I prefer to let the team make technical decisions, and rarely weigh in, so people were shocked by my reaction. I will relay to you what I said to them.

Load testing is an important testing technique, but it needs to be done by people with specialized skills who know exactly what they are doing. It also needs to have test environments, accounts, permissions and third party relationships taken into account.

Load testing is a great way not only to find performance issues with your website or backend servers, but also to make intermittent bugs pop up with greater frequency. Problems you might miss with regular use will suddenly appear under load, due to the high volume of tests run in a short period of time. High volume automated testing is extremely effective, and one of my favorite approaches to test automation. Doing it correctly, and getting real utility from it, requires work, environment setup, knowledge and skill. Done well, performance bottlenecks are identified and addressed, intermittent bugs are found and fixed, and a good test environment and test suite help mitigate risk going forward when there are pushes to production. Done poorly, however, load testing can have dangerous results. Here are some cautionary stories.

The simplest load testing tools involve setting up a recorder on your device to capture the traffic to and from the website you are testing. You start the recorder, execute a workflow test, turn off the recorder, and then use that recorded session to create load. The load testing tool generates a certain number of unique sessions and replays the test at the transport layer. In other words, it generates multiple tests, simulating many simultaneous users of the website. However, many systems get suspicious of a flood of hits coming from a single device, and protect against it. Furthermore, internal networks aren’t designed for one machine to broadcast a huge volume of data. If you are working from home, your ISP may get suspicious if you are doing this from your account, fearing that your devices are being used for a Denial of Service attack. Payment processors are especially wary of large amounts of traffic as well. So if you use this method, you need to completely understand the system and the environments where you are performing the tests.
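To make the replay mechanics concrete, here is a minimal sketch in Python, assuming a hypothetical test host (test.example.com), a hand-coded recorded workflow, and the requests library; real load testing tools layer ramp-up schedules, think times, unique test data and reporting on top of this idea.

```python
# A minimal sketch of transport-level load generation against a hypothetical
# test environment. Real tools add ramp-up, think times, unique test data
# and result reporting on top of this.
from concurrent.futures import ThreadPoolExecutor

import requests

RECORDED_WORKFLOW = [
    ("GET", "https://test.example.com/catalog"),
    ("POST", "https://test.example.com/cart"),
    ("POST", "https://test.example.com/checkout"),
]

def run_session(user_id):
    """Replay the recorded workflow as one simulated user and collect status codes."""
    statuses = []
    with requests.Session() as session:
        for method, url in RECORDED_WORKFLOW:
            response = session.request(method, url, timeout=10)
            statuses.append(response.status_code)
    return statuses

if __name__ == "__main__":
    simulated_users = 25  # keep this small unless the environment is built for it
    with ThreadPoolExecutor(max_workers=simulated_users) as pool:
        results = list(pool.map(run_session, range(simulated_users)))
    print(f"completed {len(results)} simulated user sessions")
```

Even a toy script like this can trip intrusion or DDoS protection, or burn through limited test inventory, which is exactly why the environment has to be understood first.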

Part 1: Expensive Meaningless Tests

Early in my career, I was working with a popular ecommerce system. They were successful at managing load, but felt their approach was too reactive and possibly a bit expensive. If they could do load and performance testing within the organization rather than deal with complaints and outages, they could also improve customer experience. I was busy with other projects, and I had never worked with load testing tools before. Since I was a senior tester, I was asked to oversee the work of a consultant, a well-known specialist who also worked for a vendor that sold load and performance testing tools. To be completely honest, I was busy, I trusted his expertise, and I didn’t pay a lot of attention to what he was doing. One day, he scheduled a meeting with me and provided an overview. It all looked impressive: there were charts and graphs, and the consultant had a flashy presentation. He then showed me his load tests, and highlighted that they had found “tons of errors”. He said that his two weeks of work had demonstrated that we clearly needed to buy the tool he was selling. “Look at all the important errors it revealed!”

My heart sank. All he had done was record one scenario on the ecommerce system, and then play it back with various numbers of simultaneous users. He was wise enough not to saturate the local network, so he kept the numbers small, but his tests were all useless because he had no idea of, or curiosity about, how the system actually worked. The first problem was that retail systems don’t have an endless supply of goods. Setting up test environments means you set up fake goods, or copies of production inventories that don’t actually result in a real-life sale. To make them realistic, you don’t have an infinite number of widgets, unless you need that for a particular test. These tests didn’t take that into account, and the “important errors” his hard work with the tool had revealed were just standard errors about missing inventory. In other words, there were ten test books for sale, and he was trying to buy the 11th, 12th and 13th books. If he had been a real user on the website, the unavailable inventory messages would have been displayed more clearly. Because he was getting errors at the protocol level, they weren’t as pretty. A two minute chat with an IT person or programmer would have set him straight, but he didn’t look into it. He copied the messages and put them in his report, treating them as bugs rather than evidence of the system working just fine in spite of his error.

Next, he was using a test credit card number that was provided to us by the payment processor. There are lots of rules around the usage of these test numbers, and he was completely oblivious to them. In his days of so-called analysis of our system, he had not explored this at all. That meant our test credit card numbers were getting rejected. This was the source of some of the other “important errors” he had found but not investigated. This was so egregious to me that I had to stop the meeting and talk to our IT accountant who managed our test credit card. My fears were confirmed – the load tests had gotten our test credit card numbers flagged for suspicious activity. That meant none of us could test using the credit card, and we had to have a meeting explaining ourselves and apologizing to get them reinstated.

I got dragged into developing my own load and performance testing skills because of this. The consultant went back to his office, and I inherited these terrible tests. What I found was that while the load testing tool looked impressive, it had a terrible proprietary programming language that created unmaintainable code. While it had impressive charts and graphs, they were extremely basic and could actually mask important problems. Recording HTTP(S) traffic and playing it back can be fraught with peril, because the recorder is going to pick up ALL the HTTP traffic on your machine, including your instant messages, webmail, other websites that are open, and third-party services such as a weather plugin or stock ticker. Also, you need a protected test network that prevents you from causing problems and interfering with everyone else’s work. Then, you need to look at your backend and see what is possible. In my case, I worked with the team to create new load test products on the website, but the backend retail system only allowed a maximum quantity of 9999, since it stored quantities as a four-digit integer. We also had to create a system to simulate credit card processing, since the payment processor wasn’t going to allow thousands of test purchases to hit their machines. Furthermore, our servers had DDoS protection and would flag machines that were hitting them with lots of simultaneous requests and deny them access, so we had to distribute tests across multiple machines. (These issues were all a bit more technical than I am recording here, but this should give you an idea.)
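One small, practical step that helps with the “recorder captures everything” problem is scrubbing the recorded session before replay, so only traffic to the system under test gets replayed. This is only a sketch; the recorded_requests structure and host names are hypothetical stand-ins for whatever your recorder produces.

```python
# A hedged sketch of scrubbing a recorded session before replay: keep only
# traffic for the system under test, since recorders capture everything on
# the machine (webmail, chat, stock tickers and other third-party calls).
from urllib.parse import urlparse

SYSTEM_UNDER_TEST_HOSTS = {"shop.test.example.com"}  # hypothetical host

recorded_requests = [
    {"method": "GET", "url": "https://shop.test.example.com/catalog"},
    {"method": "GET", "url": "https://weather-widget.example.net/forecast"},
    {"method": "POST", "url": "https://shop.test.example.com/cart"},
]

replayable = [
    request for request in recorded_requests
    if urlparse(request["url"]).hostname in SYSTEM_UNDER_TEST_HOSTS
]

print(f"keeping {len(replayable)} of {len(recorded_requests)} recorded requests")
```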

How much time do you think it took to create the environment for load tests, and then to create good load tests that would actually work?

If you answered: “weeks” with several people working on the testing project, then you are in the ballpark.

We also abandoned the expensive load testing tool, mostly because it used a vendorscript instead of a real programming language. We switched to a tool based on the same language the development team used, so I would have support and other people could maintain the tests over time. It was a bit rudimentary, but we were able to identify problem areas for performance and address them in production. A happy side effect was that the load tests turned intermittent issues we had missed before into repeatable cases that could be fixed. It was a lot of work, but it was the start of something useful. The tests were useful, the results were helpful, and we had tests that could be understood, maintained and run by multiple people in the organization.

I was fortunate in this case to be able to work with a great team that was finally empowered to do the right thing for the organization. We were also fortunate in our software architecture and design. We spent the time early on to create something maintainable, with simple tests. As a result, our testing framework was used for years before it required major updates.

Click here for Part 2 of the series.

Test Automation Games

As I mentioned in a prior post, Software Testing is a Game, the two dominant manual approaches to the software testing game are scripted and exploratory testing. In the test automation space, we have other approaches. I look at three main contexts for test automation:

  1. Code context – eg. unit testing
  2. System context – eg. protocol or message level testing
  3. Social context – eg. GUI testing

In each context, the automation approach, tools and styles differ. (Note: I first introduced this idea publicly in my keynote “Test Automation: Why Context Matters” at the Alberta Workshop on Software Testing, May 2005)

In the code context, we are dominated now by automated unit tests written in some sort of xUnit framework. This type of test automation is usually carried out by programmers who write tests to check their code as they develop products, and to provide a safety net to detect changes and failures that might get introduced as the code base changes over the course of a release. We’re concerned that our code works sufficiently well in this context. These kinds of tests are less about being rewarded for finding bugs – “Cool! Discovery!” and more about providing a safety net for coding, which is a different high value activity that can hold our interest.
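As a concrete reference point, here is a minimal xUnit-style test in Python’s unittest; the shipping_cost function is a hypothetical stand-in for real production code.

```python
# A minimal xUnit-style example using Python's unittest; shipping_cost is a
# hypothetical function standing in for real production code.
import unittest

def shipping_cost(order_total):
    """Flat-rate shipping with free shipping over a threshold (example code)."""
    if order_total < 0:
        raise ValueError("order_total cannot be negative")
    return 0.0 if order_total >= 50.00 else 5.99

class ShippingCostTest(unittest.TestCase):
    def test_orders_over_fifty_ship_free(self):
        self.assertEqual(shipping_cost(60.00), 0.0)

    def test_small_orders_pay_flat_rate(self):
        self.assertEqual(shipping_cost(10.00), 5.99)

    def test_negative_totals_are_rejected(self):
        with self.assertRaises(ValueError):
            shipping_cost(-1.00)

if __name__ == "__main__":
    unittest.main()
```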

In the social context, we are concerned with automating the software from a user’s perspective, which usually means creating tests with libraries that drive a user interface, or GUI. This approach to testing is usually dominated by regression testing. People would rather have a tool deal with the repetition inherent in regression testing than do it themselves; in other words, regression testing is often outsourced to a tool. In this context, we are concerned that the software works reasonably well for end users in the places they use it, which are social situations. At this level the software has emergent properties that come from combining code, the system, and user expectations and needs. We frequently look to automate away the repetition of manual testing. In video game design terms, we might call repetition that isn’t very engaging “grinding“. (David McFadzean introduced this idea to me during a design session.)
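For illustration, here is a hedged sketch of what a GUI-level regression check can look like using Selenium WebDriver in Python; the storefront URL and page elements are hypothetical, and real suites wrap this in explicit waits, page objects and reporting.

```python
# A hedged sketch of a GUI-level regression check using Selenium WebDriver
# against a hypothetical storefront (assumes a local chromedriver setup).
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://shop.test.example.com/")
    search_box = driver.find_element(By.NAME, "q")  # hypothetical search field
    search_box.send_keys("test book")
    search_box.submit()
    # A simple regression assertion: the results page renders and mentions the query.
    assert "test book" in driver.page_source.lower()
finally:
    driver.quit()
```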

The system context is a bit rarer: here we test machine-to-machine interaction, or simulate messaging or user traffic by sending messages to machines without using the GUI. There are integration paths and emergent properties we can catch at this level that we would miss with unit testing, and by stripping the UI away, we can create tests that run faster, and track down intermittent or other bugs that might be masked by the GUI. In video or online games, some people use tools to enhance their game play at this level, sometimes circumventing rules. In the software testing world, we don’t have explicit rules against testing at this level, but we aren’t often rewarded for it either. People often prefer we look at the GUI, or the code level of automation. However, you can get a lot of efficiency by testing at this level and cutting out the slow GUI, and we can explore emergent properties of the system that we don’t see at the unit level.
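A minimal sketch of a system-context check might look like the following, driving a hypothetical HTTP API directly with the requests library rather than going through the GUI.

```python
# A minimal sketch of system-context testing: exercise the service's HTTP API
# directly, skipping the GUI. The endpoint and payload are hypothetical.
import requests

BASE_URL = "https://api.test.example.com"

def test_create_order_returns_confirmation():
    payload = {"sku": "TEST-BOOK-001", "quantity": 1}
    response = requests.post(f"{BASE_URL}/orders", json=payload, timeout=10)
    assert response.status_code == 201
    assert "order_id" in response.json()

if __name__ == "__main__":
    test_create_order_returns_confirmation()
    print("order API check passed")
```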

We also have other types of automation to consider.

Load and performance testing is a fascinating approach to test automation. As performance thought leaders like Scott Barber will tell you, performance testing is roughly 20% automation code development and load generation work, and 80% interpreting results and finding problem areas to address. It’s a fascinating puzzle to solve – we simulate real-world or error conditions, look at the data, find anomalies and investigate the root cause. We combine a quest with discovery and puzzle-solving game styles.

If we look at Test-Driven Development with xUnit tools, we even get an explicit game metaphor: The “red bar/green bar game.” TDD practitioners I have worked with have used this to describe the red bar (test failed), green bar (test passed) and refactor (improve the design of existing code, using the automated tests as a safety net.) I was first introduced to the idea of TDD being a game by John Kordyback. Some people argue that TDD is primarily a design activity, but it also has interesting testing implications, which I wrote about here: Test-Driven Development from a Conventional Software Testing Perspective Part 1, here: Test-Driven Development from a Conventional Software Testing Perspective Part 2, and here: Test-Driven Development from a Conventional Software Testing Perspective Part 3.
(As an aside, the Session Tester tool was inspired by the fun that programmers express while coding in this style.)

Cem Kaner often talks about high volume test automation, which is another approach to automation. If you automate a particular set of steps, or a path through a system, and run it many times, you will discover information you might otherwise miss. In game design, one way to deal with the tedium of grinding is to add surprises or rewarding behavior when people repeat things, which keeps the repetition from getting boring. In automation terms, high volume test automation is an incredibly powerful tool for discovering important information. I’ve used it particularly in systems that do a lot of transactions. We may run a manual test several dozen times, and maybe an automated test several hundred or a thousand times in a release. With high volume test automation, I will run a test thousands of times in a day or overnight. This greatly increases my chance of finding problems that only appear in very rare events, and forces seemingly intermittent problems to show themselves in a pattern. I’ve enhanced this approach by mutating messages in a system using fuzzing tools, which greatly extends my reach as a tester over both manual testing and conventional user or GUI-based regression automation.
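As a rough illustration of the approach, here is a hedged sketch that replays one hypothetical transaction path thousands of times, lightly fuzzing one field per run and recording failures for later investigation; the endpoint and payload are stand-ins, not a real system.

```python
# A hedged sketch of high-volume automated testing: replay one transaction
# path many times, lightly fuzzing one field each run and logging failures
# for later investigation. The endpoint and payload are hypothetical.
import random
import string

import requests

def random_note(max_len=40):
    """Generate a lightly fuzzed text field value."""
    alphabet = string.ascii_letters + string.digits + " !@#$%^&*()"
    return "".join(random.choice(alphabet) for _ in range(random.randint(0, max_len)))

def submit_transaction():
    payload = {"sku": "TEST-BOOK-001", "quantity": 1, "note": random_note()}
    response = requests.post("https://api.test.example.com/orders",
                             json=payload, timeout=10)
    response.raise_for_status()

if __name__ == "__main__":
    failures = []
    for run_number in range(10_000):  # far more repetitions than a manual pass
        try:
            submit_transaction()
        except Exception as error:  # record and keep going; the pattern matters
            failures.append((run_number, repr(error)))
    print(f"{len(failures)} failures out of 10000 runs")
```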

Similarly, creating simulators or emulators to generate real-world or error conditions that are impossible to create manually is a powerful way to enhance our testing game play. In fact, several of the other approaches I have written about are about enhancing our manual testing game play. I wrote about “interactive automated testing” in my “Man and Machine” article and in Chapter 19 of the book “Experiences of Test Automation“. This was inspired by looking at alternatives to regression testing that could help testers be more effective in their work.

In many cases, we attempt to automate what the manual testers do, and we fail because the tests are much richer when exercised by the humans who wrote them. Instead of getting the computer to do things that we are poor at executing (lots of simple repetition, lots of math, asynchronous actions, etc.), we try to do an approximation of what the humans do. Also, since the humans interact with a constantly changing interface, the dependency of our automation code on a changing product creates a maintenance nightmare. This is all familiar, and I looked at other options. However, another inspiration was code one of my friends wrote to help him play an online game more effectively. He created code to help him do better in gaming activities, and later used that algorithm to create an incredibly powerful Business Intelligence engine. The code he wrote to enhance his manual game play was so effective in a gaming context that, when he applied it to a business context, it was just as powerful.

Software test automation has a couple of gaming aspects:

  1. Automating parts of the manual software testing game we don’t enjoy
  2. Its own software testing game, based on our perceived benefits and rewards

In number 1 above, it’s interesting to analyze why we are automating something. Is it to help our team and system quality goals, or are we merely trying to outsource something we don’t like to a tool, rather than look at what alternatives fit our problem best? In number 2, if we look at how we reward people who do automation, and map automation styles and approaches to our quality goals or quality criteria, not to mention helping our teams work with more efficiency, discover more important information, and make the lives of our testers better, there are a lot of fascinating areas to explore.

Experiences of Test Automation

I recently received my copy of Experiences of Test Automation: Case Studies of Software Test Automation. I contributed chapter 19: There’s More to Automation than Regression Testing: Thinking Outside the Box. I’m recommending this book not only because I am a contributor, but because I have enjoyed the raw honesty about test automation from experienced practitioners. Finally, we have a book that provides balanced, realistic, experience-based content.

For years, it seems, test automation writing has been dominated by cheerleading, tool flogging, hype and hyperbole. (There are some exceptions, but I still run into exaggerations, and claims that automation is an unquestioned good, far too often.) The division between the promoters of the practice (i.e. those who make a lot of money from it), the decision makers they convince, and the technical practitioners is often deep. It can be galling to constantly see outright claptrap about how automation is a cure for all ills, or views that only talk about benefits without also pointing out drawbacks and limitations. It’s really difficult to implement something worthwhile in a world of hype and misinformation that skews implementation ideas and expected results. This book is refreshingly different.

I agreed to contribute after talking to Dot Graham – she wanted content that was relevant, real, and honest. She said their goal for the book was a balanced view from real practitioners on the ground who would talk about good points, but we also needed to be honest about bad points and challenges we had to overcome. Dot liked my Man and Machine work and asked me to expand on that concept.

Now that I have a copy of the book, I find myself smiling at the honesty and reality described within. Did I really just read that? Where else will you find an admission of: “We tried tool____, and it was a complete disaster”?

If you’re serious about automation, consider buying this book. It is chock full of real-world experience, and you are bound to find at least one lesson that you can apply directly to your automation effort. That is worth the cost alone, especially when we are constantly bombarded with distorted ideals and hype. You won’t agree with everything, and we all have preferences and biases, but the real-world honesty is a constant theme, and a breath of fresh air.

Content Pointer: Automation Politics 101

I wrote an article for Automated Software Testing magazine on test automation politics for the November issue. A PDF copy of the article is available here: Test Automation Politics 101.

Article back story: I had a nice dinner with Dot Graham and Dion Johnson this spring. As we were sharing automation stories, Dion asked if I would be interested in writing an article on politics and test automation. Since most automation writing focuses on tools, results and ideals, there isn’t a lot out there on the social aspects of automation. I thought I’d provide my take on some social challenges I learned about the hard way. I was inspired by Bach’s Test Automation Snake Oil. Bach doesn’t pull punches, and I tried not to either. I’ve often thought that it would have been nice to have been advised on potential controversies when I was starting out in test automation, so we took a handbook approach with the article.

I hope you enjoy it, and if you’re starting out in test automation, you find the article informative. If you’re currently facing any of the resistance I outlined, I hope you find the content encouraging.

I’m Speaking in Toronto this Summer

I’ll be traveling to Toronto twice this summer for testing presentations. The first is for TASSQ, on June 24, where I’ll be presenting my “Man and Machine” talk on interactive automated testing. This is a different approach to automation that seems like common sense to some, and is deeply controversial to others. (Usually tool vendors and “automate all tests” types.)

July 15, I’ll be co-presenting with Michael Bolton at CAST 2008. We’ll be talking about parallels in testing and music, a topic I’ve written about before.

Michael has a nice blog post describing CAST that is worth a read.

I hope to see you in Toronto this summer, and finally meet some of you face-to-face. If you’re from the area, I hope you’ll come.

Update: New Article Published: Man and Machine

My first cover article for Better Software magazine is out this month: Man and Machine: Combining the Power of the Human Mind with Automation. This is an article describing how I bridge the gap between automated software testing and manual software testing. I rarely see software testing as purely scripted vs. purely exploratory, or purely manual vs. purely automated. I see these as a matter of degrees. I might think: “To what degree is this test automated?” or “To what degree will automation help us create a simulation to enhance our exploratory testing work?”

This isn’t a new concept, and I was influenced by Cem’s Architectures of Test Automation and James’ Tool-Supported Testing ideas. Aaron West just sent me this link describing a related concept at Toyota: “autonomation”, a machine that is used to help employees in their work, as opposed to a fully automated robot.

I hope this article helps bridge a divide that doesn’t need to be there. Automation and manual testing are both important and can co-exist for the benefit of our testing efforts.

Update – Feedback:
Software developer Rob Shaw says:

Good stuff. I hadn’t thought of automation in this way. As a programmer with previous unit testing experience, I was definitely more in the ‘unsupervised automation camp’ but your transportation analogy shows the power of IAT (interactive automated testing).

Employee Empowerment and How I Became a Contextualist

Back when I was in college, I had an instructor who was a big fan of Total Quality Management (TQM). TQM was quite popular, but in our business school setting, it wasn’t taught that much. We learned about Juran, Feigenbaum and Total Quality Control, Deming and a lot of Quality theory. We learned about kaizen, quality circles, flat organizations and employee empowerment. When our instructor had us memorize W. Edwards Deming’s 14 Points, I was hooked. Most of the Quality theory just made sense to me. Why do business any other way?

I particularly liked the idea of flat organizations with empowered employees who were in charge of their own destiny. I had worked part-time with various organizations for several years, and had already seen how much waste can occur while executing business tasks. Often, the people doing the work had a much better idea of how to be more efficient, but layers of management seemed to get in the way. I often thought about Quality theory in these cases, particularly about using employee empowerment. It seemed to be the answer to the inefficiency issues I was seeing, and it might help employee morale as well.

Later on, I learned an effective heuristic for business school case studies:

If I use Quality theory with a good dose of Deming as a solution for any problem, I’ll get a good grade.

So I often used Quality theory for case study assignments. It worked quite well. It seemed that few of my classmates used these kinds of ideas, so I probably got marks for being unique. Then I had Dr. John Rutland as a professor, and things changed. I found out that the heuristic didn’t always work, and that it was a poor one at that. Dr. Rutland figured out quite quickly that Quality theory was my silver-bullet, and deftly pricked the balloon of my pomposity.

After working on a business case study, he asked me for my ideas. I began reciting my typical answer:

  1. empower the employees.
  2. flatten the hierarchy.
  3. form Quality Circles to work on solutions to problems.
  4. etc.

I had barely uttered something about “empowering employees” when he cut me off.

“Jonathan. Have you talked to the individual employees yet?”

“No.” I answered.

“Do you know what makes them tick? Do you know what their motivations are? What their likes and dislikes are? Do you know not only their frustrations but their sources of pleasure?”

“No,” I stammered. “Then why did you suggest it?” he asked. I choked out something about employees having better job satisfaction if they feel they have a say.

“You mean to tell me that you think an employee who works in the auto industry whose job is to turn a bolt on an engine block a quarter turn all day, five days a week doesn’t have job satisfaction? You may not enjoy that job, but other people might. They may be happy with the way things are, and miserable if they are ’empowered’. How can you be so arrogant to think that without finding out what the problems are, or what the motivations of your employees are, you can just go ahead and impose a system without taking them into account?”

Wow. I was shocked. He was right. I wasn’t empowering anyone; I was doing what Jerry Weinberg describes as “inflicting help.” I felt ashamed of my ignorance. Dr. Rutland went on: “Jonathan, people are cognitive beings. They are all unique. It’s your job as a manager to find out what makes them tick. It’s your job to help them reach their goals and remove obstacles from their way. The right solution is the one that works in their environment, taking their needs and talents into account. You can’t just impose the way you think is the best way for all environments.” I was stunned, but after I swallowed my pride, I decided I should learn everything I could from him. The lesson that day (thanks to me) turned out to be about context, and about not settling on one solution and then using that one solution for every problem. That was at best naive, at worst lazy and negligently incompetent. “Use the solutions that work right now, given the team, environment and culture you have,” Dr. Rutland said. “You can learn a lot of different techniques, but in the real world, things have a habit of not going according to plan. A good manager will learn as much about these tools as possible, but will adapt and adjust accordingly, using the tools that work for each unique situation.

“Furthermore, there are cases where Quality theory has failed spectacularly, in real, live organizations. The real world has a habit of yanking the rug out from under us.”

I have never forgotten that lesson on how the context for what we do is extremely important. Years later, some people echoed similar wisdom, calling themselves the Context-Driven Testing School. Their ideas resonated with me, thanks in no small part to what I learned from Dr. Rutland, who had first taught me to be a contextualist.

Furthermore, I’ve been in situations where a flat, employee-empowered organizational structure was a complete disaster. Sadly, despite being prepared for it, I was shocked. I worked for an organization that had been held up in several business school case studies as a model of an employee-driven organization. Once I was inside, I found it was a terrible place to work. I’d never before seen such bitter political infighting, deliberate project sabotage, posturing projects, coup attempts, inefficiency, waste and poor morale. When I moved on, I was relieved to get out of that organization, with its confusing reporting structure, paralyzing indecision (from so many competing employee committees), and general toxicity. It was a breath of fresh air to go into an organization that many would decry as “Taylorist” or “Command and Control”. At least most of the decision making was in fact decisive, and I knew who had the decision-making power based on the organizational chart.

I told Brian Lawrence a bit about this experience, and he simply said: “The hierarchy went underground.” I was pleasantly surprised by this bit of wisdom. In a top-down organization, at least you know the hierarchy more or less. The unpleasant “Taylorist” and “command and control” behavior did not really go away, even in a company that was founded with a flat hierarchy and text-book employee empowerment programs. It just went underground, and it got very ugly, having the opposite effect that the founders intended. I was reminded of Weinberg’s “Rule of Three”, which goes something like this: if you can’t think of three ways a potential solution can fail, you don’t completely understand the problem space.

I’ve learned from wise teachers, and I’ve been fortunate to learn from people like Dr. Rutland and Brian Lawrence. My learning has shown me that any kind of process, structure or tool has the potential to fail. The context is important, and it is foolhardy to charge ahead with a potential solution without fully exploring the problem space. Life is full of tradeoffs: any potential solution we suggest in an organization has the potential to fail. It’s okay to still go forward with a potential solution even if we can imagine it might fail. (I don’t even care if it is the “right” one, as long as we are prepared to adapt when needed.) To believe a practice, tool, process or organizational structure is unquestioningly right, in any context, will eventually lead to disappointment. When I read back over some of my TQM notes, I find that there are some great ideas there, but I don’t treat them as my hammer and every problem I see as a nail anymore. I hope I manage to continue doing that with other processes and tools.

So what do I do now that I’ve learned something? I:

  • resist jumping to a solution too quickly;
  • learn about the context;
  • talk to the people doing the work, and do a lot of listening;
  • do careful analysis before suggesting solutions; and
  • use Rule of Three to do contingency planning and raise awareness of the potential for problems to come up in the future.

In my line of work, it is common for people to immediately ask me if I’m going to do test automation once they learn I’ll be on a project. My answer tends to be something like this:

If, after careful analysis, we find that test automation is a good fit to solve the problem we need to solve, we’ll explore test automation.

That answer seems to surprise people, but I’ve learned my “test automation as problem solution” lesson several times over. I have seen so many test automation efforts fail because “test automation” was thought of as a solution to all testing woes. I’ve seen enough failures to know that test automation is a tool that requires careful thought and planning. As Tim Beck would say, “Are we solving the real problem?” My job is to work with a team to find out what the problem is, and work with them on a solution to that problem. Test automation, like employee empowerment, may or may not help us solve that problem. In some cases it may just compound our existing problems with new and exotic ones.

Reckless Test Automation

The Agile movement has brought some positive practices to software development processes. I am a huge fan of frequent communication, of delivering working software iteratively, and strong customer involvement. Of course, before “Agile” became a movement, a self-congratulating community, and a fashionable term, there were companies following “small-a agile” practices. Years ago in the ’90s I worked for a startup with a CEO who was obsessed with iterative development, frequent communication and customer involvement. The Open Source movement was an influence on us at that time, and today we have the Agile movement helping create a shared language and a community of practice. We certainly could have used principles from Scrum and XP back then, but we were effective with what we had.

This software business involves trade-offs though, and for all the good we can get from Agile methods, vocal members of the Agile community have done testing a great disservice by emphasizing some old testing folklore. One of these concepts is “automate all tests”. (Some self-described agilists have the misguided gall to claim that manual testing is harmful to a project. Since when did humans stop using software?) Slavishly trying to reach this ideal often results in what I call Reckless Test Automation. Mandates of “all”, “everything” and other universal qualifiers are ideals, and without careful, skillful implementation they can promote thoughtless behavior that hinders goals and needlessly costs a lot of money.

To be fair, the Agile movement says nothing officially about test automation to my knowledge, and I am a supporter of the points of the Agile Manifesto. However, the “automate all tests” idea has been repeated so often and so loudly in the Agile community, I am starting to hear it being equated with so-called “Agile-Testing” as I work in industry. In fact, I am now starting to do work to help companies undo problems associated with over-automation. They find they are unhappy with results over time while trying to follow what they interpret as an “Agile Testing” ideal of “100% test automation”. Instead of an automation utopia, they find themselves stuck in a maintenance quagmire of automation code and tools, and the product quality suffers.

The problems, like the positives of the Agile movement, aren’t really new. Before Agile was all the rage, I helped a company that had spent six years developing automated tests. They had bought the lie that vendors and consultants spouted: “automate all tests, and all your quality problems will be solved”. In the end, they had developed three test cases, averaging 18,000 lines of code each; no one knew what their intended purpose was or what they were supposed to be testing, only that it was very bad if they failed. Trouble was, they failed a lot, and it took testers anywhere from three to five days to hand-trace the code to track down failures. Excessive use of unrecorded random data sometimes made this impossible. (Note: random data generation can be an incredibly useful tool for testing, but like anything else, it should be applied with thoughtfulness.) I talked with decision makers and executives, and the whole point of buying a tool and implementing it had been to shorten the feedback loop. In the end, the tool greatly lengthened the testing feedback loop, and worse, the testers spent all of their time babysitting and maintaining a brittle, unreliable tool instead of doing real, valuable testing.
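As an aside on the random data point: one simple way to keep generated data traceable is to choose a seed, log it with the run, and reuse it to replay a failure. This is only a sketch, and the customer-name generator is a hypothetical example.

```python
# A hedged sketch of keeping random test data reproducible: pick a seed,
# log it with the run, and reuse it later to replay a failure exactly.
import random
import time

def generate_customer_name(rng):
    """Build a fake customer name from a seeded random generator (example code)."""
    first = rng.choice(["Ada", "Grace", "Alan", "Edsger"])
    last = rng.choice(["Lovelace", "Hopper", "Turing", "Dijkstra"])
    return f"{first} {last}"

seed = int(time.time())           # or a seed copied from a failing run's log
print(f"test data seed: {seed}")  # recorded so the failure can be replayed
rng = random.Random(seed)

test_customers = [generate_customer_name(rng) for _ in range(5)]
print(test_customers)
```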

How did I help them address the slow testing feedback loop problem? Number one, I de-emphasized relying completely on test automation, and encouraged more manual, systematic exploratory testing that was risk-based, and speedy. This helped tighten up the feedback loop, and now that we had intelligence behind the tests, bug report numbers went through the roof. Next, we reduced the automation stack, and implemented new tests that were designed for quick feedback and lower maintenance. We used the tool to complement what the skilled human testers were doing. We were very strict about just what we automated. We asked a question: “What do we potentially gain by automating this test? And, more importantly, what do we lose?” The results? Feedback on builds was reduced from days to hours, and we had same-day reporting. We also had much better bug reports, and frankly, much better overall testing.

Fast-forward to the present. I am still seeing thoughtless test automation, but this time under the “Agile Testing” banner. When I see reckless test automation on Agile teams, the behavior is the same; only the tools and ideals have changed. My suggestions for working towards solutions are the same: de-emphasize thoughtless test automation in favor of intelligent manual testing, and be smart about what we try to automate. Can a computer do this task better than a human? Can a human do it with results we are happier with? How can we harness the power of test automation to complement intelligent humans doing testing? Can we get test automation to help us meet overall goals instead of thoughtlessly trying to fulfill something a pundit says in a book, a presentation or a mailing list? Are our test automation efforts saving us time and providing the team the feedback they need, or are they hindering us? We need to constantly measure the effectiveness of our automated tests against team and business goals, not “percentage of tests automated”.

In one “Agile Testing” case, a testing team spent almost all of their time working on an automation effort. An Agile Testing consultant had told them that if they automated all their tests, it would free up their manual testers to do more important testing work. They had automated user acceptance tests, and were trying to automate all the manual regression tests to speed up releases. One release went out after the automated tests all passed, but it had a show-stopping, high profile bug that was an embarrassment to the company. The automated tests passed, but they couldn’t spot something suspicious and explore the behavior of the application. In this case, the bug was so obvious that a half-way decent manual tester would have spotted it almost immediately. Getting a computer to spot the problem through investigation would have required Artificial Intelligence, or a very complex fuzzy logic algorithm in the test automation suite, to replace one quick, simple, inexpensive, adaptive, yet powerful human test. The automation wasn’t freeing up time for testers; it had become a massive maintenance burden over time, so there was little human testing going on, other than superficial reviews by the customer after sprint demos. Automation was king, so human testing was de-emphasized and even looked on as inferior.

In another case, developers were so confident in their TDD-derived automated unit tests that they had literally gone for months without any functional testing, other than occasional acceptance tests by a customer representative. When I started working with them, they first challenged me (in a joking way) to find problems, and then were completely flabbergasted when my manual exploratory testing did find problems. They would point wide-eyed to the green bar in their IDE signifying that all their unit tests had passed. They were shocked that simple manual test scenarios could bring the application to its knees, and it took quite a while to get them to do some manual functional testing as well as their automated testing. It took them a while to set their automation dogma aside, become more pragmatic, and then figure out how to incorporate important issues like state into their test efforts. When they did, I saw a marked improvement in the code they delivered once stories were completed.

In another “Agile Testing” case, the testing team had put enormous effort into automating regression tests and user acceptance tests. Before they were through, they had more lines of code in the test automation stack than in the product it was supposed to be testing. Guess what happened? The automation stack became buggy, unwieldy, unreliable, and displayed the same problems that any software development project suffers from. In this case, the automation was done by the least skilled programmers, with a much smaller staff than the development team. To counter this, we did more well-thought-out and carefully planned manual exploratory testing, and threw out buggy automation code that was regression-test focused. A lot of those tests should never have been automated in that context, because a human is much faster and far superior at many kinds of tests. Furthermore, we found that the entire test environment had been optimized for the automated tests. They had tried to factor out the inherent system variability the computers couldn’t handle (but humans could!), not to mention quick visual checks (which computers can’t do well). We did not have a system in place that was anything close to what any of our customers used, but the automation worked (somewhat). Scary.

After some rework on the testing process, we found it cheaper, faster and more effective to have humans do those tests, and we focused more on leveraging the tool to help achieve the goals of the team. Instead of trying to automate the manual regression tests that were originally written for human testers, we relied on test automation to provide simulation. Running simulators and manual testing at the same time was a powerful investigative tool. Combining simulation with observant manual testing revealed false positives in some of the automated tests, which had unwittingly been released to production in the past. We even extended our automation to include high volume test automation, and we were able to greatly increase our test effectiveness by really taking advantage of the power of tools. Instead of trying to replicate human activities, we automated things that computers are superior at.

Don’t get me wrong – I’m a practitioner and supporter of test automation, but I am frustrated by reckless test automation. As Donald Norman reminds us, we can automate some human tasks with technology, but we lose something when we do. In the case of test automation, we lose thoughtful, flexible, adaptable, “agile” testing. In some tasks, the computer is a clear winner over manual testing. (Remember that the original “computers” were humans doing math – specifically calculations. Technology was used to automate computation because it is a task we weren’t doing so well at. We created a machine to overcome our mistakes, but that machine is still not intelligent.)

Here’s an example. On one application I worked on, it took testers close to two weeks to do manual credit card validation. This work was error prone (we aren’t that great at number crunching, and we tire of repetitive tasks). We wrote a simple automated test suite to do the validation, and it took about half an hour to run. We then complemented the automated test suite with thoughtful manual testing. After an hour and a half of both automated testing (pure number crunching) and manual testing (usually scenario testing), we had a lot of confidence in what we were doing. We found this combination much more powerful than pure manual testing or pure automated testing. And it was faster than the old way as well.
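For a sense of what that kind of number crunching looks like when automated, here is a hedged sketch of a Luhn checksum run over a batch of numbers; the samples are commonly published test numbers, not real cards, and a real suite would also cover card type, expiry and processor responses.

```python
# A hedged sketch of the sort of number-crunching that is worth automating:
# a Luhn check run over a batch of card numbers. The sample numbers are
# commonly published test numbers, not real cards.
def luhn_valid(card_number):
    """Return True if the digits pass the Luhn checksum."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:          # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

if __name__ == "__main__":
    batch = ["4111111111111111", "4111111111111112"]
    for number in batch:
        print(number, "valid" if luhn_valid(number) else "invalid")
```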

When automating, look at what you gain and what you lose by automating a test. Remember, until computers become intelligent, we can’t automate testing, only tasks related to testing. Also, as we move further away from the code context, it usually becomes more difficult to automate tests, and the trade-offs have greater implications. It’s important to design automated tests to meet team goals, and to be aware of the potential for enormous maintenance costs in the long term.

Please don’t become reckless trying to fulfill an ideal of “100% test automation”. Instead, find out what the goals of the company and the team are, and see how all the tools at your disposal, including test automation can be harnessed to help meet those goals. “Test automation” is not a solution, but one of many tools we can use to help meet team goals. In the end, reckless test automation leads to feckless testing.

Update: I talk more about alternative automation options in my Man and Machine article, and in chapter 19 in the book: Experiences of Test Automation.

RE: Model Myopia in Testing

Chris McMahon asks:

How do I find a useful model when unit tests are passing and show 100% code coverage?

Chris asked me this a bit tongue-in-cheek because he has heard me rant about these sorts of things before. I’m glad he asked the question though, because it brings up a different kind of model myopia. In yesterday’s post, we had model myopia with regard to the application we are testing. This question brings up model myopia regarding our potential testing contexts.

This is a common question, particularly on agile teams that use automated unit testing and are struggling with a tricky bug. Let’s examine this situation in more detail. What does the claim in the question tell us about testing?

This claim provides information about a certain testing context, the code context. We know that within this testing context, we have a measure of coverage that claims 100%, and that the tests that achieve this coverage claim pass. Is this enough information? We have a quantitative claim, but this doesn’t tell us much from a qualitative perspective.

Why does this matter?

One project I saw had 300 automated unit tests that achieved a high level of code coverage on a small application. When we did testing in another context (the user context, or through a Graphical User Interface), the application couldn’t pass basic tests. On further investigation, we found that most of the unit tests were programmed to pass, no matter what the result of the test was. The developers were measured on the amount of automated unit tests they wrote, on the code coverage attained as reported by a coverage tool, and they could only check in code when the tests passed. As a measurement maxim says: show me how people are measured, and I’ll show you how they behave.

This is an extreme example, but it underlines how dangerous numbers can be without a context. Sure, we had a number of passing tests with a high degree of code coverage on a simple application, but the numbers were almost meaningless from a management perspective because the tests were of poor quality, and many of them didn’t really test anything. I’ve seen reams of automated test code that didn’t test anything too many times to take a quantitative claim on its own. (I’ve also seen a lot of manual tests that didn’t test much either.)
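To make that concrete, here is a hedged illustration of the difference between a test that passes no matter what and one that actually checks a result; apply_discount is a hypothetical function under test.

```python
# A hedged illustration of a test that "passes" no matter what versus one
# that actually checks behavior; apply_discount is a hypothetical function.
import unittest

def apply_discount(price, percent):
    return round(price * (1 - percent / 100), 2)

class DiscountTests(unittest.TestCase):
    def test_that_tests_nothing(self):
        apply_discount(100.00, 10)   # result ignored
        self.assertTrue(True)        # always green, still inflates coverage numbers

    def test_that_actually_checks_the_result(self):
        self.assertEqual(apply_discount(100.00, 10), 90.00)

if __name__ == "__main__":
    unittest.main()
```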

Now what happens if we agree that the quality of the unit tests is sufficiently good? We don’t have ridiculous measures like SMART goals where we “pay for performance” on automated unit tests, code coverage, bug counts, etc., so we don’t have the “meet the numbers” mentality that frustrates our goals. We know that our tests are well vetted, thought out, and have helped steer our design. Our developers are intrinsically motivated to write solid tests, and they are talented and trustworthy. What now? All the tests passed. Where can the problem be?

It’s important to understand that “all the tests we could think of at a particular time passed”, not that “all the possible tests we could ever think of, express in program code and run have passed”. This is an important distinction. Brian Marick has good stories about faults of omission, and reminds us that a test is only as good as what it tests. It doesn’t test what we didn’t write in the first place. Narrow modeling is usually the culprit, and once we learn more about the problem, we can write a test for it. It’s hard to write a test for a particular problem that has stumped us when we don’t have enough information yet, or to predict all possible problems up front when doing TDD.

In other cases, I have seen tests pass in an automated framework because the framework was being used incorrectly, which caused false positives. This is rare, but I have seen it happen.

Michael Bolton has summed up automated unit testing assumptions eloquently:

When someone says “all the unit tests passed”, we should hear “the unit tests that we thought of, for theories of error of which we were aware, and that we considered important enough to write given the time we had, and that we presume to have run on the units of the product that we assume to have been built, and that we presume actually test something, and that we presume provide a trustworthy result 100% of the time, passed.”

Testing Contexts

Another important distinction to make is that we know this information comes from one testing context, the code context. We know there are others. I split an application with a user interface into three broad contexts: the code, the system the application is installed in, and the user interface. We know at the user interface layer, the application is greater than the sum of its parts. It will often react differently when tested from this context than it will from the code context. There are many potential testable interfaces within each context that can provide different information when testing. What interfaces, and in what contexts are we gathering test-generated information from?

When I see a claim like this: “we have 100% of our automated unit tests passing with 100% code coverage”, that only tells me about one of three possible model categories for testing. If we haven’t done testing in other contexts, we are only seeing part of the picture. We might be falling into a different type of modeling myopia – looking too narrowly at one testing execution model.

It’s common to view the application only through the context we are used to. I tend to use the term “interface within a context” rather than “black box” and “white box” after my experience with TDD. The interface we use most to interact with the software can narrow our focus in testing. We then model the testing we can do with a preference for that interface. Mike Kelly has a blog post that deals with bounded awareness and inattentional blindness, and how they can affect our decisions.

I see this frequently with developers who work predominantly in the code context, and testers who test only in the user interface context. Frequently, each treats testing in the other context with some disdain. Often, testing diversity pays off: when we work together, bringing expertise and ideas from whatever testable interfaces we are using, we can combat model myopia.

Testing in one context does not mean testing in another context is duplicated effort or unnecessary. Often, testing using a different model reveals a bug that isn’t obvious, or isn’t easily reproducible, in one context.

Not too long ago, I had two meetings. One was with two developers who asked why I bothered to test against the UI when we had so many other interfaces that were easier to test against behind the GUI. In another meeting, a tester asked why I bothered doing any testing behind the GUI when the end user only uses the GUI anyway. In both cases, my answer was the same: “When I stop finding bugs that you appreciate and find useful in one interface that I couldn’t find more easily in the other, I’ll stop testing against that interface.”

When testing in a context fails to provide us with useful information about the product we are testing, we can downgrade that model’s importance to our testing. Knowing about a testing model and choosing not to use it is different from thinking we have covered all our testing models and still being stumped as to why we have problems.