Category Archives: software testing

Load Testing Your Web Infrastructure: Please Be Careful. Part 3

In the Part 2 story, we saw what a load testing tool can do when it is used by someone who doesn’t have the right knowledge of, and skill with, the tool and the underlying systems. However, you also need to understand the environment where you will use the tool. Creating and using test environments that are optimized for load and performance testing is a must. If you use these tools on a regular network, you will likely disrupt everyone else in the office, causing lost productivity and extra work for IT staff. The last thing you want to do is try them out at home and end up blacklisted by your ISP (internet service provider).

Bye Bye Network!

After a while, I was an old hand at load and performance testing. To bolster my hands-on experience, I attended workshops on how to overcome technical restrictions, how to analyze the data accurately and find problems others would miss, and how to write reports that describe risks and problems. I was also adept with a handful of tools. I started to get hired for performance and load testing gigs, and under the right circumstances, I had some rewarding and fun projects. I worked with a lot of talented people with vastly different skills, and learned from each of them.

Since I had a lot of retail and telco experience, a work friend asked me to come in and help him with a large retail system that was going through an upgrade. One of my tasks was to help with load testing, since they were upgrading all the software and hardware for their back end system. I was given a lot of freedom to choose the tools, to research and design exactly what they needed, and to interview everyone I could about backend system issues, how to simulate credit card processing, and so on. However, I was not given a test network to run the tests on, so I never ran them under load. I only verified that my load tests worked with a single user.

To find potential areas of concern, we set up monitoring at several key points in the system, and I had the test results output in a format we could use with statistical analysis software. We also monitored server utilization, and recommended moving some processes around to make better use of the system. We learned a lot, but I wasn’t ready to unleash full load testing capabilities without a dedicated test network. There was no way I wanted to run the tests on the corporate network, even though we knew they would only run against our internal test system. I knew from experience that we could overload the internal network and cause problems for others. My friend, the dev manager, ignored my concerns. He was confident that the internal network would handle the extra traffic, since the IT admins had shown him that it was perpetually under-utilized.

Despite my objections, the dev manager insisted I run the load tests on the regular internal network. To start, he wanted to run the tests with 1000 simultaneous users, but I suggested we try something smaller. I wanted to try 10; he insisted on 100. Still objecting, I hit the “Enter” key on my machine to start the tests. Immediately, a collective howl started to swell across the entire floor of the office. Then people started calling out that they had no network access. The dev manager and the IT manager ran to the server room, and when they unlocked it, all we could see in the dark room was a sea of blinking red and yellow lights. Clearly, my load tests had overwhelmed the entire network, and every piece of hardware was in an error state. No one in the office was able to work until all of the equipment was restarted. It took about half an hour to get the network up and running again, and the first thing my friend said was: “TRY IT AGAIN!!!!” He insisted the network outage was coincidental.

I refused to run the tests again, and made him press the key on my machine himself. No sooner had his hand lifted from my keyboard than the collective howl swelled again. The IT admin opened the server room door, and again, it was all blinky lights and no network access for the company. It was remarkable how quickly the network was being overwhelmed. The dev manager and IT team still felt it was technically impossible, but they agreed not to run the tests again until we had investigated the source of the problem. Furthermore, stakeholders immediately approved permission and a budget for a test network specifically for load and performance testing.

It turned out that an extraordinary event had caused the outage, but it was something that would have happened in production if we hadn’t caught it internally first. In simple terms, the network cards on the new servers had a default setting that made them broadcast to each other when under load, in an attempt to balance that load. This was a new feature that looked good on paper. However, there was already a load balancing system in place, so this was redundant and harmful. In effect, the servers spammed each other because they were all under load, and the traffic increased exponentially. Machine one would find itself under too much load, so it would message machine two to get it to process the excess. Unfortunately, machine two was also under extreme load and was also messaging machine one, which was messaging machine two for help, as were machines three and four, messaging each other over and over and over with more and more messages.

To visualize what they were trying to process and the traffic they created themselves, imagine a geometric or hockey stick curve on a graph, or an infinite series in mathematics. The load tests were already creating a huge amount of traffic, but the servers themselves were generating more network traffic at an exponential rate. This traffic generation behavior instantly overwhelmed every component in the corporate network. We quickly turned off that setting in the network cards of the test servers, and then waited for a test network we could safely run the tests on.
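To make the hockey stick shape concrete, here is a toy Python model of that feedback loop. It assumes, purely for illustration, that every overloaded server first asks each peer for help once, and that every help request arriving at an already-overloaded server makes it send one new request to each of its peers. The real network card feature was not documented in that much detail, so treat this as a sketch of geometric growth, not a reconstruction of what the hardware actually did.

```python
def broadcast_storm(servers: int, rounds: int) -> list[int]:
    """Toy model: number of help messages in flight per round.

    Hypothetical assumption: every overloaded server first asks each peer for
    help once, and every help request it receives triggers one new request to
    each of its peers.
    """
    if servers < 2:
        raise ValueError("need at least two servers to form a feedback loop")
    peers = servers - 1
    in_flight = servers * peers  # round 0: each server asks every peer once
    history = [in_flight]
    for _ in range(rounds):
        in_flight *= peers  # each delivered message spawns `peers` new messages
        history.append(in_flight)
    return history


if __name__ == "__main__":
    for rnd, msgs in enumerate(broadcast_storm(servers=4, rounds=6)):
        print(f"round {rnd}: {msgs} messages")
```

With four servers, the count triples every round (12, 36, 108, 324, ...), and all of it is on top of the traffic the load tests themselves generate.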

The next time we ran the tests, I had several managers breathing down my neck. The tests still brought down servers, but those server outages did not cause any network outages. There was no collective howl, no server room full of blinky error lights. We all breathed a sigh of relief, and went into a find-and-fix cycle for a few weeks to get the systems ready for a production launch. We were able to ship with a lot of confidence due to this work, and the load tests were part of pre-production testing for years after that launch.

This was a relatively small company, and the impact was fairly low. The entire development team and IT team sat together, and the infrastructure was in a server room on the same floor as the office. We were able to deal with the outages quickly, and the incident became a part of office lore, brought up when a laugh was needed. It wasn’t without political fallout, though, since it was disruptive and problematic. Now imagine if this were a larger company, with IT departments in another location and servers at a hosting provider or in the cloud. There could be considerable downtime, and increased costs with hosting providers. While this situation was more lighthearted due to friendships and a tight-knit office environment, it could have been extremely serious.

Stay tuned for part 4…

Load Testing Your Web Infrastructure: Please Be Careful. Part 2

In the Part 1 story, time, money and effort were wasted. This story is much more serious. Load and performance testing tools can be simple to get started with, but that simplicity belies a good deal of complexity. In other words, a little knowledge can be a dangerous thing. While a tool may look simple, as if there isn’t a lot going on, it has a lot of power and can unleash mayhem on a system. To simulate adequate load, these tools generate a lot of traffic, which can have unintended consequences unless you know what you’re doing. Record/playback can be handy in the hands of someone with skill and understanding, but in the hands of someone unskilled, it can unleash absolute misery. Just because you can use a tool and generate load doesn’t mean that you should.

A Complete Clusterfuck

A year after the Part 1 story, I was brought in to work with some Agile teams that were helping an overwhelmed IT department. Load and performance testing were brought up, but since I had been down that road before, I explained the work and potential pitfalls to stakeholders. They agreed we should treat it as a separate project, and use a cross-functional team. However, a high powered consultancy had brought in a team who were desperate to show their mettle. They were skilled, they had a great reputation for turning projects around, but they were extremely arrogant. I was pulled into a meeting with sneering programmers who mocked my experience and my concerns about load testing without analysis and careful planning. After my treatment in the meeting, my manager told me to decline further invitations, and let them “sink or swim.”

I didn’t hear much about what they were doing for a few weeks, but then one day a concerned executive assistant called the CTO. The CTO called the IT manager, who in turn called the people on my team. I was on a small cross-functional team that worked on development projects, but we would get pulled into helping fix any difficult production issues. The problem was that the CEO couldn’t access their work email. After rolling our eyes and asking if they had forgotten their password, we realized that webmail access for the entire company was down. The lead IT admin and I sat next to each other, and he gave me a play-by-play of what he was doing. He found that the webmail service was hanging, so he restarted it. Webmail briefly came up again, but the service started to hang again. Then more reports came in of poor performance on the corporate network, and of some services becoming unavailable. He had to restart the mail servers, which in a large organization is not a simple task. It requires communicating with all staff, sending timed warnings over a few minutes, doing the restart, then communicating and monitoring. Meanwhile, certain areas of the network seemed to be under some sort of attack. Was it a security breach? Did someone have a virus or trojan horse?

Eventually, we tracked the excess traffic down to a particular machine, and it belonged to one of the staff consultants from the arrogant consultancy. The IT admin blocked his IP from the network, and we went to management to figure out what to do next. We wandered over and initiated a chat with a now angry group of consultants who were furious that one of their team members had lost network access. After a brief explanation, and a query as to why they were nuking our network, they admitted they had tasked one of their junior consultants with researching load testing tools. He had downloaded an open source tool, recorded HTTP traffic, played it back, and then kept adding more simultaneous users. There were several problems here, and senior management were furious. The consultant was kicked off the project and escorted out the door, and the consultancy was warned that they were in breach of contract. They had ignored several directives that they had pledged to follow when they signed the contract. As time went on, problems beyond the CEO’s webmail outage started to emerge.

Internally, there were formal complaints to the IT team about lack of access and downtime. IT was in violation of their commitments for network and tool availability, and management had to spend time mollifying angry managers in other groups. Imagine what can happen in an internal network when someone starts generating hundreds of simultaneous requests over and over. Devices get saturated and stop functioning, others go into error mode, and everything slows to a crawl. IT technicians need to identify areas of the network that need intervention, and try to restart services remotely. In some cases, they had to physically go and restart network infrastructure by hand. This resulted in thousands of dollars’ worth of lost time that day.

Remember when I said that if you record traffic for a load testing scenario, it will capture ALL the protocol level traffic on your machine? It turns out that this programmer didn’t know or think of that. Later that day, the consultancy found out that they were locked out of their corporate messaging system. This is a core tool for a company that has most of its employees distributed across various customer sites. The load test against our system included all the instant message traffic that occurred while he was recording the scenario, so every simulated user also replayed those messages against the messaging service. They were without their system for days, while they negotiated with the vendor and tried to explain why one of their employees had essentially executed a denial of service attack. They were able to reinstate their corporate account, but that employee was banned from using it.
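One way to keep that kind of collateral traffic out of a load test is to filter a recording down to the system under test before it is ever replayed. The Python sketch below assumes the capture has already been exported as a list of (method, URL) pairs, and keeps only requests whose host is on an explicit allowlist. The host names and the `filter_capture` helper are hypothetical, made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical hosts that belong to the system under test.
ALLOWED_HOSTS = {"shop.test.internal", "api.test.internal"}


def filter_capture(
    capture: list[tuple[str, str]],
    allowed_hosts: set[str] = ALLOWED_HOSTS,
) -> tuple[list[tuple[str, str]], list[str]]:
    """Split a recorded capture into requests to keep and hosts that were dropped.

    Anything not explicitly allowed (webmail, chat, plugins, other open tabs)
    is removed, and the dropped hosts are returned so someone can review them.
    """
    kept: list[tuple[str, str]] = []
    dropped: set[str] = set()
    for method, url in capture:
        host = urlsplit(url).hostname or ""
        if host in allowed_hosts:
            kept.append((method, url))
        else:
            dropped.add(host)
    return kept, sorted(dropped)


if __name__ == "__main__":
    capture = [
        ("GET", "https://shop.test.internal/cart"),
        ("POST", "https://chat.example.com/api/messages"),  # instant message traffic
        ("POST", "https://api.test.internal/checkout"),
        ("GET", "https://weather.example.com/widget"),  # browser plugin
    ]
    kept, dropped = filter_capture(capture)
    print("replaying:", kept)
    print("dropped hosts:", dropped)
```

An allowlist is used rather than a blocklist because you cannot predict every background service that might be talking while the recorder runs.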

A few weeks went by, and an IT manager came storming into our development area with a credit card bill. There were several thousand dollars’ worth of mystery expenses on it. It turned out that on the day of the tests, he had given the consultancy his corporate credit card number “to run a few tests,” assuming that they would let him know what they had done so he could call to cancel the test charges. On the day of the load test disaster, the credit card company called to let him know they had frozen his account, but he assured them it was ok, since people were running a few tests. By the time he approached the staff consultants, the load testing had been stopped. Unfortunately, no one thought to connect the dots and tell him how his corporate card had been used. Thankfully, the credit card company found the problem and shut down his card, but the damage was done. He had to get a new corporate card, and disputing the payments and getting them refunded took time and energy. In the meantime, other managers had to use their cards on his behalf.

In the end, the consultancy lost their master services agreement (MSA) with the company, and they lost credibility because one person ruined it for everyone else. Unfortunately, a consultancy with less skilled people was hired instead, though they were much nicer to deal with. Internally in IT, we had hoped the prior consultancy would work out, because they had the skills and experience to deliver. Due to their arrogance, we all lost out. Furthermore, IT lost credibility with the business for allowing a consultant to wreak that much havoc. Because of the sudden, repeated excess traffic from that location, even our corporate ISP had flagged us, and that required finessing and promises that it would not happen again. If we suggested a vendor, stories about this ridiculous situation would be recalled, and we would get stuck with less ideal providers that other groups chose for us. All of this, plus thousands of dollars in costs, not to mention all the staff work to clean up the mess, happened because someone without the knowledge and skills ran a tool they didn’t understand on our network. Depending on who retells this story, it can even sound amusing, but it was extremely serious. This person downloaded an unauthorized tool in violation of the client’s corporate policy, recorded some HTTP traffic, then ran it over and over with various sizes of payloads. A few hours of playing around with something they didn’t understand had extremely serious effects.

Click here for Part 3 of the series.

Load Testing Your Web Infrastructure: Please Be Careful. Part 1

Now that I am on the product management side of software projects, I don’t deal with testing approaches in my day-to-day work very much. I get info about product quality criteria, quality goals and metrics, testing status, or show stoppers that require attention. Unless I want to dig deeper, I don’t hear much about the actual testing work. Once in a while though, something big pops up onto my radar, usually because there is a threat to a product release, or there is a political issue at play. In those moments, my background as a software tester comes in handy.

Recently, my testing experience was called into action because of a project controversy about load testing.

There were some problems with a retail system in production, and poor performance was blamed. The tech team did not have the expertise or budget for load testing, and instead pushed the sales team to take responsibility for that testing. The sales team didn’t have any technically minded people, so they approached marketing. The marketing team had people with more technical skills, so a manager decided to take on that responsibility. They asked the team for volunteers to research load testing, try it out, and report back to the technical team. I happened to overhear this, and began waving my arms like the famous robot from Lost in Space, who would warn of impending danger by saying: “Danger, Will Robinson!” This is out of character for me, since I prefer to let the team make technical decisions and rarely weigh in, so people were shocked by my reaction. I will relay to you what I said to them.

Load testing is an important testing technique, but it needs to be done by people with specialized skills who know exactly what they are doing. It also needs to have test environments, accounts, permissions and third party relationships taken into account.

Load testing is a great way not only to find performance issues with your website or backend servers, but also to make intermittent bugs pop up with greater frequency. Problems you might miss with regular use will suddenly appear under load, due to the high volume of tests run over a short period of time. High volume automated testing is extremely effective, and one of my favorite approaches to test automation. Doing it correctly and getting useful results requires work and environment setup, as well as knowledge and skill. Done well, performance bottlenecks are identified and addressed, intermittent bugs are found and fixed, and a good test environment and test suite help mitigate risks going forward when changes are pushed to production. However, when done poorly, load testing can have dangerous results. Here are some cautionary stories.

The simplest load testing tools involve setting up a recorder on your device to capture the traffic to and from the website you are testing. You start the recorder, execute a workflow test, turn off the recorder, and then use that recorded session to create load. The load testing tool generates a certain number of unique sessions and replays that test at the protocol level (for a website, the HTTP requests) rather than through the user interface. In other words, it generates multiple tests, simulating several simultaneous users of the website. However, lots of systems get suspicious of a lot of hits coming from a single device, and protect against that. Furthermore, internal networks aren’t designed for one machine to send out a huge volume of data. If you are working from home, your ISP will get suspicious if you do this from your account, fearing that your devices are being used for a denial of service attack. Payment processors are especially wary of large amounts of traffic as well. So if you use this method, you need to completely understand the system and the environments where you are performing the tests.
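Stripped of the charts, the core of a record/playback tool is small. The Python sketch below replays one recorded session once per simulated user, in parallel threads. The `send` callable is a hypothetical stand-in for the tool’s HTTP transport, so the sketch can be exercised against a fake handler instead of a real network; pointing something like this at a real system carries all the risks described in this series.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecordedRequest:
    method: str
    url: str
    body: bytes = b""


# Hypothetical transport: takes (user_id, request) and returns an HTTP status code.
Send = Callable[[int, RecordedRequest], int]


def replay(session: list[RecordedRequest], users: int, send: Send) -> list[int]:
    """Replay the same recorded session once per simulated user, concurrently."""
    if users < 1:
        raise ValueError("users must be at least 1")

    def run_user(user_id: int) -> list[int]:
        # Each simulated user walks through the whole recorded workflow in order.
        return [send(user_id, request) for request in session]

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(run_user, range(users)))
    return [status for statuses in per_user for status in statuses]


if __name__ == "__main__":
    session = [
        RecordedRequest("GET", "https://shop.test.internal/product/42"),
        RecordedRequest("POST", "https://shop.test.internal/cart", b"sku=42&qty=1"),
        RecordedRequest("POST", "https://shop.test.internal/checkout"),
    ]
    # Fake handler that always answers 200, so nothing leaves this machine.
    statuses = replay(session, users=10, send=lambda user, req: 200)
    print(len(statuses), "requests replayed")
```

Notice that every simulated user sends exactly the same requests: the same product, the same cart, the same payment details. That is why naive replays collide with finite inventory, payment processor rules and per-machine traffic limits, as the story that follows shows.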

Part 1: Expensive Meaningless Tests

Early in my career, I was working on a popular ecommerce system. The company managed load successfully, but felt its approach was too reactive and possibly a bit expensive. If they could do load and performance testing within the organization rather than deal with complaints and outages, they could also improve customer experience. I was busy with other projects, and I had never worked with load testing tools before. Since I was a senior tester, I was asked to oversee the work of a consultant who was a well known specialist, and who also worked for a vendor that sold load and performance testing tools. To be completely honest, I was busy, I trusted their expertise, and I didn’t pay a lot of attention to what they were doing. One day, they scheduled a meeting with me and provided an overview. It all looked impressive: there were charts and graphs, and the consultant had a flashy presentation. They then showed me their load tests, and highlighted that they had found “tons of errors”. He said that his two weeks of work had demonstrated that we clearly needed to buy the tool he was selling. “Look at all the important errors it revealed!”

My heart sank. All they had done was record one scenario on the ecommerce system, and then play it back with various numbers of simultaneous users. They were wise enough not to saturate the local network, so they kept the numbers small, but their tests were all useless because they had no idea of, or curiosity about, how the system actually worked. The first problem was that retail systems don’t have an endless supply of goods. Setting up a test environment means you set up fake goods, or copies of production inventories that don’t result in a real-life sale. To keep them realistic, you don’t have an infinite number of widgets, unless you need that for a particular test. These tests didn’t take that into account, and the “important errors” his hard work had revealed were just standard errors about missing inventory. In other words, there were ten test books for sale, and he was trying to buy the 11th, 12th and 13th books. If he had been a real user on the website, the out-of-stock messages would have been displayed more clearly. Because he was getting the errors at the protocol level, they weren’t as pretty. A two-minute chat with an IT person or programmer would have set him straight, but he didn’t look into it. He copied the messages into his report and treated them as bugs, when the system was actually working just fine and the errors came from his own mistake.
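A few lines of Python are enough to show why those “errors” were expected. This toy model uses a hypothetical `Inventory` class as a stand-in for the test catalogue and sells ten test books to thirteen concurrent buyers. The out-of-stock responses are the system enforcing its business rules, not defects.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Inventory:
    """Hypothetical stand-in for a test catalogue with a fixed number of units."""

    def __init__(self, units: int) -> None:
        self._units = units
        self._lock = threading.Lock()

    def buy(self) -> str:
        with self._lock:  # serialize purchases so stock never goes negative
            if self._units == 0:
                return "OUT_OF_STOCK"  # expected business rule, not a bug
            self._units -= 1
            return "OK"


def simulate(stock: int, buyers: int) -> Counter:
    """Run `buyers` concurrent purchase attempts against `stock` units."""
    inventory = Inventory(stock)
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda _: inventory.buy(), range(buyers)))
    return Counter(results)


if __name__ == "__main__":
    print(simulate(stock=10, buyers=13))
```

In a real load test you would either size the test inventory to the number of simulated purchases, or count out-of-stock responses as an expected result rather than reporting them as bugs.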

Next, they were using a test credit card number that had been provided to us by the payment processor. There are lots of rules around the use of these test numbers, and he was completely oblivious to them. In his days of so-called analysis of our system, he had not explored this at all. As a result, our test credit card numbers were getting rejected. This was the source of some of the other “important errors” he had found but not investigated. This was so egregious that I had to stop the meeting and talk to our IT accountant, who managed our test credit card. My fears were confirmed: these load tests had resulted in our test credit card numbers getting flagged for suspicious activity. That meant none of us could test with the credit card, and we had to have a meeting explaining ourselves and apologizing to get the numbers reinstated.
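The longer-term fix, as the next paragraph explains, included a system to simulate credit card processing, so load tests never touched the real processor. Here is a rough Python sketch of that idea: a hypothetical in-house stub that approves cards in a made-up test range and declines everything else. The card prefix, the default delay and the response fields are all assumptions for illustration; real processors publish their own test numbers and usage rules.

```python
import time
import uuid
from dataclasses import dataclass

# Hypothetical convention for this sketch: only cards with this prefix count as test cards.
TEST_CARD_PREFIX = "4000"


@dataclass(frozen=True)
class Authorization:
    approved: bool
    reference: str
    reason: str = ""


class StubPaymentProcessor:
    """In-house stand-in for a payment processor, for the load test environment only."""

    def __init__(self, latency_seconds: float = 0.05) -> None:
        self.latency_seconds = latency_seconds  # simulated remote-call delay (assumed value)

    def authorize(self, card_number: str, amount_cents: int) -> Authorization:
        time.sleep(self.latency_seconds)
        if not card_number.startswith(TEST_CARD_PREFIX):
            return Authorization(False, "", "not a test card")
        if amount_cents <= 0:
            return Authorization(False, "", "invalid amount")
        # Random reference so each simulated purchase is distinguishable in logs.
        return Authorization(True, uuid.uuid4().hex)


if __name__ == "__main__":
    processor = StubPaymentProcessor(latency_seconds=0)
    print(processor.authorize("4000123412341234", 1999))
    print(processor.authorize("5555444433332222", 1999))
```

Because the stub never contacts the real processor, thousands of simulated purchases cannot get the shared test card flagged, and the latency setting lets you approximate the delay of a remote call.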

I got dragged into developing my own load and performance testing skills because of this. The consultant went back to the office, and I inherited these terrible tests. What I found was that while the load testing tool looked impressive, it had a terrible proprietary programming language that produced unmaintainable code. While it had impressive charts and graphs, they were extremely basic and could actually mask important problems. Recording HTTP(S) traffic and playing it back can be fraught with peril, because the recorder picks up ALL the HTTP traffic on your machine, including your instant messages, webmail, other websites that are open, and 3rd party services such as a weather plugin or stock ticker. You also need a protected test network that prevents you from causing problems and interfering with everyone else’s work. Then, you need to look at your backend and see what is possible. In my case, I worked with the team to create new load test products on the website, but the backend retail system only allowed a maximum of 9999, since that value was limited to a 4-digit integer. We also had to create a system to simulate credit card processing, since the payment processor wasn’t going to allow thousands of test purchases hitting their machines. Furthermore, our servers had DDoS protection that would flag machines hitting them with lots of simultaneous requests and deny them access, so we had to distribute the tests across multiple machines. (These issues were all a bit more technical than I am recording here, but this should give you an idea.)
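The DDoS protection problem is easier to see with numbers. The Python sketch below is a deliberately simplified, hypothetical per-source limiter (a fixed one-second window with a made-up threshold of 25 requests), not the protection those servers actually ran. It shows why 100 requests in a second from one load generator gets that machine blocked, while the same 100 requests spread across five machines stays under the per-source threshold.

```python
from collections import defaultdict


class PerSourceLimiter:
    """Simplified fixed-window limiter: block a source that exceeds `limit` requests per window."""

    def __init__(self, limit: int, window_seconds: float = 1.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts: dict[tuple[str, int], int] = defaultdict(int)
        self.blocked: set[str] = set()

    def allow(self, source_ip: str, timestamp: float) -> bool:
        if source_ip in self.blocked:
            return False  # once flagged, the source stays denied
        window = int(timestamp // self.window_seconds)
        self.counts[(source_ip, window)] += 1
        if self.counts[(source_ip, window)] > self.limit:
            self.blocked.add(source_ip)
            return False
        return True


def run(sources: list[str], total_requests: int, limit: int = 25) -> int:
    """Send `total_requests` within one second, round-robin across `sources`.

    Returns how many requests the limiter allowed through.
    """
    limiter = PerSourceLimiter(limit)
    allowed = 0
    for i in range(total_requests):
        source = sources[i % len(sources)]
        if limiter.allow(source, timestamp=i / total_requests):
            allowed += 1
    return allowed


if __name__ == "__main__":
    print("one generator:", run(["10.0.0.1"], 100))
    print("five generators:", run([f"10.0.0.{n}" for n in range(1, 6)], 100))
```

Real protection products use more sophisticated detection, so the window and threshold here are only placeholders; the point is that the limit applies per source, not to the total.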

How much time do you think it took to create the environment for load tests, and then to create good load tests that would actually work?

If you answered: “weeks” with several people working on the testing project, then you are in the ballpark.

We also abandoned the expensive load testing tool, mostly because it used a proprietary vendor scripting language instead of a real programming language. We used one based on the same language the development team used, so I would have support, and other people could maintain the tests over time. It was a bit rudimentary, but we were able to identify problem areas for performance, and address those in production. A happy side effect was that the load tests turned intermittent issues we had missed before into repeatable cases that could be fixed. It was a lot of work, but it was the start of something useful. The tests were useful, the results were helpful, and we had tests that could be understood, maintained and run by multiple people in the organization.

I was fortunate in this case to be able to work with a great team that was finally empowered to do the right thing for the organization. We were also fortunate in our software architecture and design. We spent the time early on to create something maintainable, with simple tests. As a result, our testing framework was used for years before it required major updates.

Click here for Part 2 of the series.

Creating Great Storytelling to Enhance Software Testing Scenarios

Recently, I wrote about Using Storytelling Games in Software Testing, and pointed you to a paper by Martin Jansson and Greger Nolmark. Now I want to give you some tips on creating great storytelling for your testing projects.

First of all, check out Cem Kaner’s work on Scenario Testing: An Introduction to Scenario Testing. I want you to pay special attention to the CHAT (cultural-historical activity theory) model that he talks about. For more on CHAT and testing, read this paper: Putting the Context in Context-Driven Testing (an Application of Cultural Historical Activity Theory). Pay special attention to the descriptions of networks of activity, and tensions. These are vital to help construct variations and different forces within our storytelling. Both of these pieces are foundational and worth the effort to dig into.

Now, I want you to read Hans Buwalda’s article on Soap Opera Testing. This is a nice variation on scenario testing. Buwalda uses television soap operas as inspiration for story arcs, structure, and variation. Remember, there are lots of variations on a theme in testing, as well as in real life! Further to that, look into testing tours. Cem Kaner has a blog post with a link or two to help get some background info: Testing tours: Research for Best Practices?.

Soap Opera tests, Testing Tours and Test Scenarios are a great place to start creating good testing stories.

Next, read up on personas in user experience work. Jenny Cham has a really nice description of creating personas, with lots of helpful links, here: Creating design personas. Remember to explore the links in her blog; she has great advice there. Years ago, I wrote a position paper about using UX personas in testing, which I mentioned in this blog post (the link there is dead, so I will have to dig the paper up). Elisabeth Hendrickson introduced me to this idea, but she recommended using extreme personas such as cartoon characters. I prefer the standard UX methods pioneered by people like Alan Cooper, but cartoon or other characters are a great place to start, especially if you feel stuck. Personas are a great way to start developing relevant characters for your story. What are their motivations when they use our software? What are their fears? What are their cares and worries and distractions?

Next, I want you to read this piece on telling a great story by a famous author: Kurt Vonnegut at the Blackboard. (I am getting to the gamification side of this project, and I asked Andrzej Marczewski for good references on storytelling in games, and this was the first link he sent me. Thanks Andrzej!) Notice the different options for structuring a good story. In testing, we can use different ones for the same scenario, if we think about activity patterns, tensions, characters, and variations during real life product use. Several versions of one story will yield different kinds of important information and observations. Vonnegut provides a simple framework for story creation that we can easily adapt and apply.

Finally, I want you to look at storytelling in games. Andrzej talks about it here: I want to experience games not just play them. Notice that in a well-designed game, he has a sense of cause and effect: decisions made here can impact things in other areas of the game. That’s just like real life, and it is important to add these dimensions to the stories we create for testing. Variation and dimensions have different effects in a system, and they are rewarding to exercise. Now read this piece on Gamasutra, The Designer’s Notebook: Three Problems for Interactive Storytellers, Resolved by Ernest Adams. The points about character amnesia, internal consistency and narrative flow are pure gold for testers. We often arrive at a system without really knowing what is going on, especially at first. Our customers are also starting from scratch when they use our app for the first time. These problems are areas we should also address when creating stories to test around.

There is also a lot of really useful information in Environmental Storytelling: Creating Immersive 3D Worlds Using Lessons Learned from the Theme Park Industry by Don Carson, particularly about how important it is to incorporate environmental conditions (especially for you mobile testers!), and the idea of an all-encompassing world rather than one linear story.

Andrzej also recommends reading Uncle Computer, Tell Me A Story, and Story Structure 104: The Juicy Details.

As testers, we can incorporate more than a linear scenario into our work. We can add so much more depth to our test approach using stories and worlds. Story development in games is incredibly similar to the storytelling we need to do in testing. There is a lot to be learned about creating virtual worlds and stories within them to help change our perspective, explore variations and make important discoveries about the software and systems we test. We can leverage these works that have been provided to us to create something new and powerful.

Some final points to put this all together:

  • Combine the elements from each of the areas I asked you to study above to create a great story, or even better, sets of stories
  • Use structure to create real-life conditions: different people, motivations, different environmental conditions, and change.
  • Add plot twists, surprises and ulterior motives, and look for unintended consequences in systems and people
  • Don’t stop at one scenario – create variations on a theme, and change the setting, or the entire world you have created to help change your perspective
  • Introduce different characters – are they interrupting? Helping?
  • Create a beginning, middle and an end
  • Move beyond all happy endings – also try to leave things unresolved, or end on a bad note

I have compiled several foundational concepts to help inform your storytelling. How you combine them to create something useful is up to you and your team. You have an opportunity to create rich perspectives to kickstart your testing efforts.

Happy storytelling!

Test Quests – Gamification Applied to Software Test Execution

I decided to analyze a game feature, the “quest”, which is used in popular video games, particularly MMORPGs. Quests have some compelling aspects for structuring testing activities. Jane McGonigal’s book “Reality is Broken” provided me with a solid analysis of quests, and how they can be adapted to real-life activities. Working from her example of a quest (ch. 3, p. 56), I created a basic test quest format:

  1. Goal statement (what we intend to accomplish with our testing work)
  2. Why the goal matters (why are we testing this?)
  3. Where to go in the application (what technique or approach are we using to test?)
  4. Guidance (not detailed steps, but enough to help. Bonus points for using video or other rich media examples.)
  5. Proof of completion (how do you know when you are finished?)
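If your team tracks quests in a spreadsheet, wiki or small script, the five-part format above maps naturally onto a simple record. This is a minimal sketch assuming Python 3.7+ dataclasses; the class name TestQuest, its field names, and the checkout example are all hypothetical illustrations, not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TestQuest:
    """Hypothetical record mirroring the five-part test quest format."""
    goal: str                 # 1. what we intend to accomplish with our testing work
    why_it_matters: str       # 2. why are we testing this?
    where_to_go: str          # 3. area of the application, technique or approach
    guidance: List[str] = field(default_factory=list)  # 4. hints or media links, not detailed steps
    proof_of_completion: str = ""  # 5. how do you know when you are finished?

    def is_well_formed(self) -> bool:
        # A usable quest needs a goal, a reason, a place to go and a way to know it is done.
        required = (self.goal, self.why_it_matters, self.where_to_go, self.proof_of_completion)
        return all(text.strip() for text in required)


# Hypothetical example quest for a mobile checkout flow.
quest = TestQuest(
    goal="Learn how checkout behaves when the network drops mid-purchase",
    why_it_matters="Mobile shoppers lose connectivity often, and lost orders cost revenue",
    where_to_go="Checkout flow, using user-scenario coverage",
    guidance=["Toggle airplane mode at each checkout step", "Record a short video of anything odd"],
    proof_of_completion="Session sheets covering every checkout step, with bugs filed for failures",
)
print(quest.is_well_formed())  # True
```

Guidance is deliberately a free-form list rather than numbered steps, to keep the quest from turning back into a scripted test case.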

A quest is larger than a single testing mission (or a test case), but smaller than a test plan. It’s a way to organize testing tasks that provides a sense of completion and interest, in areas that require exploration and creativity. Just like in a video game, there are multiple ways to satisfy a quest. Once we have fulfilled a quest, which might take hours or days depending on how it is created, we can move on to another one. It’s another way of organizing people, with the added bonus of leveraging years of game design success. Furthermore, modern technology involves a lot of collaboration between people in different locations, using different technology to reach a common goal, and we need to adapt testing to meet that. Testing a mobile app in your lab, one tester at a time, won’t provide much useful information about an app that requires real-time communication and collaboration between people all over the world. MMOs do a fabulous job of getting people to work hard and coordinate activities in a virtual world, and people have fun doing it. I decided to apply that to testing.

Where do quests fit? Think in terms of a hierarchy of activities:

  • test strategy and plan
  • risks that are mitigated through testing
  • different models of coverage that map to risk mitigation
  • test quests
  • sessions, tours, tasks
  • feedback and reporting

A good test approach will have more than one model of coverage (see I SLICED UP FUN for 12 mobile coverage models), and under each model of coverage, there will be multiple quests. Some quests will be repeated when regression testing is required.

So why add this structure?

One area I have worked on over the years is using structure and guidance to help manage exploratory testing efforts. In the past, test case management systems provided some measure of coverage and oversight, but they offer little in the way of intrinsic value for testers. People get tired of repeating the same tests over and over, but management loves the metrics they provide, even though those metrics are incredibly easy to cheat. Furthermore, from a tester’s perspective there is an extrinsic reward inherent in the design of the tools, and they are easy to use. There is also a sense of completion: once I have run through X number of test cases, I feel like I have accomplished something.

With exploratory testing, the rewards are more intrinsic. The approach can be more fulfilling; I personally feel like I am approaching testing in a more effective way, and I can spend my time on high value activities. However, it is harder to measure coverage, and it is more difficult to direct people in areas where coverage is required without adding some guidance. There have been a lot of different approaches to adding structure to exploratory testing over the years to find a balance. Test quests are another approach to adding structure and finding that balance between the intrinsic rewards of pure exploratory testing, and the extrinsic rewards of scripted testing. This is an idea to provide a blend.

As many of you have heard me argue over the years, test cases and test case management systems are merely one form of guidance; there are others. In the exploratory testing community, you will see coverage outlines, checklists, mind maps, charter lists, session sheets, media such as video demonstrations, and all sorts of alternatives. When it comes to managing exploratory testing, one of the first places we start is session-based test management (SBTM). This approach helps us focus testing on particular areas, and provides a reviewable result, which makes our auditors and stakeholders happy. I’ve used it a lot over the years.

I’ve also used Bach’s General Functionality and Stability Procedure for over a decade to help organize exploratory testing. Through experience, unique projects and contexts, I have adapted my approach and moved away from the orthodoxy where I saw fit. Still, when I started analyzing why people on my teams have fun with testing, SBTM and Bach’s General Functionality and Stability Procedure were big reasons why. Even though I often use a much more lightweight version of SBTM than the original, people appreciate the structure. The General Functionality and Stability Procedure is a great example of guidance for analysis, exploration, and great things to do as testers.

The other side of fun on the teams I work on is related to humour, collaboration and technology. We often come up with nicknames, divide testing up into teams and hold contests. Who can come up with the best test approach? Who recorded the best bug report video? Who found the most difficult-to-find bug last week? What team has the most pop culture references in their work? Testing is filled with laughter, excitement and learning, and some good, plain, old-fashioned silly fun. We communicate constantly using technology to stay up to speed on changes and progress, and often other team members want to get in on the action. Sometimes, it’s hard to get the coders to code, the product owners to product own, and the managers to manage, because everyone wants in on the fun. In the midst of this fun is incredibly valuable testing. Stakeholders are blown away by the productivity of testing, the volume of useful information produced, the quality of bugs, and the detailed, useful information in everything from bug reports to status reports and quality criteria. While there is laughter and fun, there is hard work going on. I learned why this is so effective by reading Jane McGonigal’s work.

In Reality is Broken, Jane McGonigal describes Augmented Reality Games (ARGs). These are real-life activities that are gamified – they have a game-like structure applied to them. She mentions Chore Wars, and how gamifying something as mundane as household chores can turn it into a fun activity. She notes that since cleaning the bathroom is a high-value activity in the game, she and her husband have to work hard to clean it before the other does. McGonigal explains that since there is a choice, and meaning attached to the task, people choose to do it under the mechanism of the game. It is no longer the unpleasant chore no one wants to do; framed within a game, it becomes a highly sought-after quest or task to complete. You get points in the game, you get bragging rights, you get intrinsic rewards as well as the extrinsic clean bathroom. Amazing.

If we apply that to testing, how about using lessons from ARGs to gamify things like regression testing, or test data creation, or other maintenance tasks we don’t like doing? One way we can do this is to sprinkle these tasks within quests. You can only complete the quest by finishing up one of these less desirable tasks.

In Reality is Broken, McGonigal defines a game as having four traits: a goal, rules, a feedback system, and voluntary participation (p. 21). Working backwards, in exploratory testing, a lot of what we do is voluntary because testers have some degree of freedom to decide what they are going to test, even if it is within narrow parameters of coverage. Furthermore, we can choose a different model of coverage to reach a goal. For example, I was working with an e-commerce testing team who were bored to death of testing the purchasing engine because they were following the same set of functional test scripts. To help them be more effective and to enjoy what they were doing, I introduced a new model of coverage to test the purchasing engine: user scenarios. Suddenly, they were engaged and interested, and found bugs they had previously missed. I then helped them develop more models of coverage so that they could change their perspective and test the same thing, but with enough variation to keep them engaged and interested while still satisfying coverage requirements. As humans, we need to mix things up. Previously, they had no choice – they were told to execute the tests in the test case management system, and that was the end of it.

Feedback systems are often linked to bug reporting systems in testing. But I like to go beyond that. Bring in other people to test with you in pairs, trios or whatever combination to bring more ideas to the table. This isn’t duplicated testing, but a redoubling of brain power and effort. I also utilize instant messaging, IRC, and big visible charts to help encourage feedback across functional areas of teams.

Rules in testing are often related to what is dictated to us by managers, developers, and tradition. It boggles my mind how many so-called Agile programmers will demand that their testers work in un-Agile ways, expecting them to create test plans and test cases and to use test case management systems. When I ask the programmers if they would like to work that way, they usually say no. Well guess what, not many other Homo sapiens like to work that way either. I prefer to have rules around approach. We identify risks and models of coverage to mitigate those risks, and we use people, tools and automation to help us reach our goals. Rather than count test cases and bugs, we rate our team on our ability to get great coverage and information that helps stakeholders make quality-related decisions.

Finally, a goal in testing needs to be project-specific. If you want to fail, just copy what you did last time on your test project. The problem is that you will be unaware of any new risks or changes, and you’ll likely be blind to them. Every project needs a goal, and a way to measure whether we did the right sort of work to help reach it. Rather than “run the regression tests, automate as many as possible, and if there is time, do other testing”, we have something specific that helps ensure we aren’t doing busy work, but creating value.

When it comes to quests, they can have this format as well. A goal, a feedback system, rules or parameters on where to test, and voluntary participation. As long as all the quests are fulfilled for a project, it doesn’t matter who did them.
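Two of the rules described here can be sketched as a tiny quest board: a quest with a less desirable task sprinkled into it can only be completed once that task is done, and the project’s quests are fulfilled when every quest has been completed by someone, regardless of who. This is a minimal illustration assuming Python 3.7+; the Quest class, its fields, project_quests_done, and the tester names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quest:
    """Hypothetical minimal quest record, for illustration only."""
    name: str
    chore: Optional[str] = None          # less desirable task sprinkled into the quest
    chore_done: bool = False
    fulfilled_by: Optional[str] = None   # any volunteer may fulfil the quest

    def complete(self, tester: str) -> bool:
        # Refuse completion until the embedded chore (if any) has been finished.
        if self.chore is not None and not self.chore_done:
            return False
        self.fulfilled_by = tester
        return True


def project_quests_done(quests: List[Quest]) -> bool:
    # Only fulfilment matters, not who fulfilled each quest.
    return all(q.fulfilled_by is not None for q in quests)


quests = [
    Quest("Explore checkout with user scenarios"),
    Quest("Refresh purchase test data", chore="rebuild test accounts"),
]
quests[0].complete("Ana")
print(project_quests_done(quests))  # False: the second quest is blocked on its chore
quests[1].chore_done = True
quests[1].complete("Raj")
print(project_quests_done(quests))  # True
```

Keeping the assignee as plain data, rather than a precondition, preserves voluntary participation: anyone can pick up any open quest.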

It turns out that with my application of SBTM and Bach’s General Functionality and Stability Procedure, plus some zany fun and technology to help socialize, report and record information, I was right next door to gamification. Using gamification as a guide, I hope to provide tools for others who also want to make testing effective and fun. A test quest is one option to try. Consider using avatars, fun names and anything that resonates with your team members to help make the activity more fun. Also consider rewards for difficult quests and tasks, such as a free meal, public kudos, or time off in lieu. Get creative and use as much or as little from the video game world as you like.

Some of my goals with test quests are:

  • Enough structure to provide guidance to testers so they know where to focus efforts
  • Not so much structure (like scripted test cases) that personal choice, creativity and exploration are discouraged or forbidden
  • Guidance and structure are lightweight so that they don’t become a maintenance burden the way scripted regression test cases (both manual and automated) do
  • Testers get a sense of purpose and meaning in their work, and a sense of completion from finishing the set of tasks in a quest
  • Utilize tools (automated tests, automated tasks, simulators, high volume test automation, monitoring and reporting) to help boost the power of the testers and be more efficient and effective, and to do things no human could do on their own
  • Encourage collaboration and sharing information so that testers can provide feedback to other project team members on the quality of the products, but also get feedback on their own work and approaches
  • Encourage test teams to use multiple models of coverage (changing perspectives, using different testing techniques and tools) on a project instead of thinking of coverage as a singular thing
  • Utilize an effective gaming structure to augment reality and encourage people to have fun working hard at testing activities

I am encouraging testing teams to use this as a structure for organizing test execution to help make testing more engaging and fun. Feel free to add as many (or few) elements from video game quests as you see fit, and alter to match the unique personalities and goals of the people on your team. Or, study them and analyze how you organize your testing work for you and your teams. Does your structure encourage people to have fun and work hard at accomplishing something great? If not, you might learn something from how others have managed to get people to work hard in games.

Happy questing!

Applying Gamification to Software Testing

I wrote an article for Better Software magazine this month called “Software Testing is a Game”, available here in PDF format. I wrote about using gamification as an approach to analyze and help make software testing more engaging. I encouraged readers to apply some ideas from gamification to their own testing efforts. Now, why would I do a thing like that? And what do I mean by using game mechanics when we are testing? Games are all well and good, and I may enjoy them, but we are talking about serious work here. Why would we make it look like a game?

Let me give you a bit of background information.

I was working with my friends Monroe Thomas and David McFadzean on product strategy when they started bringing up my gamification design ideas. I use gamification in mobile app design to help make apps more engaging for users. That doesn’t mean that I make an app look like a game; it means I use ideas from games to make the app more interesting and easier to use. However, we weren’t talking about mobile apps, so I was a bit surprised. They pointed out that the same concepts that make gamification work in mobile apps apply to other apps; after all, David and I even wrote an article about using gaming when creating software processes. Why couldn’t I use those ideas in a product strategy meeting for something else?

Good point.

In fact, they even urged me to look at some of my other prior app designs; they felt I would find gamification-style aspects in those as well, because I always worry about making apps more engaging. Once I started thinking about the implications of what they were saying, an entirely new world of possibility opened up. I felt like they had just kicked open a big door of perception for me.

But wait a minute. What is this business about games? Well, the thing with gamification is that when I use those tools correctly in an app, you don’t know they are there. For example, I don’t put childish badges and leaderboards in a productivity app and then say: “Look! Gamification at work!” Andrzej Marczewski describes gamification mechanics in terms we can relate to in his blog post Game Mechanics in Gamification: Desired Behavior, Motivation and Supporters.

Andrzej uses a game format to illustrate his point, but it should be obvious that these three themes are not limited to games. Where game designers shine, and where policy wonks and enterprise or productivity designers tend to fail, is in the structure around desired behavior. Too often, we just expect people to excel in a workplace environment with little support. Games, on the other hand, tickle our emotions, captivate us, and encourage us to work hard at solving problems and reaching goals.

Framing something like software testing in terms of gaming, borrowing some of its ideas and mechanics, applying them and experimenting can be incredibly worthwhile. After all, as I state in the article, it is difficult to get people involved in software testing, and as technology becomes more pervasive and more enmeshed in our everyday lives, it has more potential to do harm. We need new people, new ideas and new approaches, and I want to figure out how to make testing more engaging for people. Why can’t effective testing be fun?

It can.

If you work on a team with me, you will notice that there is a lot of laughter, a lot of collaboration, a lot of discovery and learning. And everyone tests from time to time. Sometimes, it can be difficult to get the coders to code, the designers to design and the managers to manage, because everyone wants to test. Why is that? Well, gamification can help provide a structure to analyze what we do and learn why some things are fun and help us work hard, while others cause us to avoid them.

Speaking of analyzing something from a gamification perspective, remember how, in the Better Software article, I described several aspects of gaming and asked you to apply them to your testing work? Prior to writing the article, I did exactly that with a product I designed called Session Tester. Aaron West and I developed the tool to help testers capture information while using an approach called Session-Based Testing. We had high hopes for the project, but after several setbacks, it’s now dormant. However, a back-of-the-napkin analysis of the tool using a gamification approach was incredibly useful. This is what we came up with, using game concepts from Michael Wilson’s “Gamification: You’re Doing it Wrong!” presentation:

  1. Guidelines and Behaviors:
    Context and rules around the tool were hit and miss. The tool enforces the basic form of session-based testing, which helps people learn how to approach testing from this perspective. People are required to fill in the minimum information to create a session sheet. Strategy ideas are readily at hand, and elements are easily added using tags. The tool was helpful for teaching beginners the basic form of SBT, but we didn’t enforce the original SBTM rules as set out by James and Jon Bach. This hurt the tool’s effectiveness. While we value the ability for people to modify and adapt, we should have started with the known rules and then provided the ability to adapt, rather than designing from an adapted view. This caused confusion and controversy.
  2. Strategies and Tasks:
    Elisabeth Hendrickson’s ET Heuristics Cheatsheet is provided in the tool to help people think about strategy, and there are oblique strategies to help create test ideas using the Prime Me! button. There could be more resources added to help with strategy, and in fact a lot of the strategy work can be done outside of the tool. We could have done more feature-wise to help with strategy. Tasks can be pre-planned outside of the tool, or done on the fly and recorded with the @tasks tag, which is saved in session sheets. We could also have done more to support tasks.
  3. Risks and Rewards:
    There is a risk that you don’t have a productive session, or your session sheet is woefully inadequate. The timer was a good motivator since you run the risk of running out of time, so there was a bit of a game there with trying to beat the clock and have a focused, productive session. I designed that to be analogous to the “red bar green bar game” used in Test Driven Development tools. There is a reward inherent in getting your mission completed and having a good session sheet you can be proud to share, but it is completely intrinsic. You are also rewarded a bit with the Prime Me! button to help you get a new idea, or break a creativity log jam. We could have done a lot more to help people plan and manage risks, and add features to reward testers for using a good assortment of tags, or a peer-reference or reward system for great testing. The full bar showing once time has run out helps tickle an intrinsic reward of completion. As a tester, I did all I could in that session, and now I can move on to other things.
  4. Skill and Chance Events:
    Skilled testers often like to record what they discover, to have the freedom to investigate areas of high value, and to take pride in having a varied approach to their testing. However, there is no extrinsic reward for completing session sheets. Giving sheets with more tags a higher score might have been a good option to add, to help people learn how to improve what they record. Outside of discovering bugs, chance events are brought in by the Prime Me! button. Like rolling dice, people can click the button until an oblique strategy jiggles their brain in a different direction. The Prime Me! button is the most popular feature of the tool and is still demonstrated at testing conferences by people like Jon Bach. People find it fun and useful.
  5. Cheating and Compliance:
    Cheating: Test case management systems see a high degree of cheating. People get tired of the regression tests they run over and over, and start clicking pass or fail to show progress. Those systems are very easy to cheat, but a session-based approach is much more difficult to cheat, because you have to show a description of a testing session. However, there is nothing to prevent people from saving an empty session sheet. I have seen this happen on overworked teams, and it wasn’t discovered for weeks. We could have flagged incomplete or blank session sheets in the system so there is visibility on them prior to an audit, or encouraged people to do something about them within the tool. Compliance was a big miss because we altered the original SBTM rules, which caused a lot of controversy and prevented more widespread adoption. We should have enforced the original rules by supporting the Bach SBTM format first, then added the ability to adapt it, instead of approaching it from the other direction.
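Two of the missed opportunities above, scoring sheets by how many tags they use and flagging blank sheets before an audit, are simple enough to sketch. This is a hypothetical illustration assuming Python 3.7+; the SessionSheet shape and the function names are invented for this example and are not Session Tester’s real file format or code. Only the @tasks tag comes from the tool; @bug is an assumed tag name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionSheet:
    """Hypothetical session sheet shape; not Session Tester's actual format."""
    tester: str
    mission: str
    notes: str = ""
    tags: Dict[str, List[str]] = field(default_factory=dict)  # e.g. {"@tasks": [...]}


def is_blank(sheet: SessionSheet) -> bool:
    # Blank means nothing beyond the mission was recorded: no notes, no tag entries.
    has_tag_entries = any(entries for entries in sheet.tags.values())
    return not sheet.notes.strip() and not has_tag_entries


def tag_variety_score(sheet: SessionSheet) -> int:
    # Illustrative scoring rule: one point per distinct tag with at least one entry.
    return sum(1 for entries in sheet.tags.values() if entries)


def flag_for_review(sheets: List[SessionSheet]) -> List[str]:
    # Surface blank sheets as they are saved, rather than weeks later in an audit.
    return [f"{s.tester}: {s.mission}" for s in sheets if is_blank(s)]


sheets = [
    SessionSheet("Ana", "Explore refunds", notes="Refund to an expired card hangs",
                 tags={"@bug": ["refund hang"], "@tasks": ["retest on Android"]}),
    SessionSheet("Raj", "Explore coupons"),
]
print(flag_for_review(sheets))                  # ['Raj: Explore coupons']
print([tag_variety_score(s) for s in sheets])   # [2, 0]
```

Counting distinct tags, rather than total entries, rewards a varied approach instead of padding one tag with repetitive notes.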

It’s interesting to note that the aspects that made this tool popular and engaging can also be viewed in terms of gaming mechanics. A couple of them were there by design, but the others were just there because I was trying to make the app more engaging. However, if we had used this gamification structure during design of the tool, we would have had different results, and arguably a better tool, because it provides a more thorough structure. Areas of fun such as the Prime Me! button, and trying to automate some of the processes of SBTM helped make the experience more enjoyable for our users.

However, if you didn’t look at the tool from a gaming perspective, you wouldn’t notice that there are game mechanics at play within it. This is an example of using a gamification approach that goes beyond superficial leaderboards and rewards, and I encourage you to try it not only with your testing tools, but your processes and practices in testing. Use it as a system to analyze: What is working well? Where are you lacking? It’s a useful, systematic approach.

That analysis doesn’t look like a childish game, does it? Bottom line: if you aren’t a gamer, you probably won’t notice the gaming aspects I bring into testing processes and tools. If you are a gamer, you’ll notice the parallels right away, and will hopefully appreciate them. For both groups, hopefully gamification will be one tool we can use to help make testing more engaging and fun.

Software Testing Training and Gaming

If you spend time at conferences, or hire a well-known testing consultant to provide some training for your company, it’s likely that one or more of them have used game mechanics as teaching tools. In fact, they probably used them on you. You may not be aware that they did, but they used gaming mechanics to help you learn something important.

James Bach is famous for using magic tricks and puzzle solving as teaching tools. When I spent time with James learning how to be a more effective trainer, he told me that magic tricks are great teaching tools because we all love to be fooled. When we are fooled by something, we are entertained, and our mind is primed for learning about what we missed during the trick. That is an ideal state for introducing new ideas. If you spend any time with James or any of his adherents at a conference or peer workshop, you will likely be inundated with puzzles to solve. There is always a testing lesson to be learned at the end, and it is a novel way of helping people learn by solving a tangible problem. If you love solving puzzles and learning about testing, you’ll enjoy these experiences.

Dorothy Graham has a board game that she developed for testing tutorials. It’s a traditional style game that she created as a training aid, and Dot loves to deliver this course. The tutorial attendees have a lot of fun, and they learn some important lessons, but Dot admits she may even have more fun than they do. Dot loves training, and the game takes the entertainment value of learning up a few notches. I’ve taught next door to Dot and heard attendees as they play the game and learn with her, and I’ve seen their smiling faces during breaks and after the course. There is something inherently positive about using a real, physical game, designed for a specific purpose (and fun) in this way.

Fiona Charles and Michael Bolton also created a board game for a software development game workshop they facilitated in 2006. Fiona says: “Our experience with the game highlighted the power of games and simulations in teaching: their ability to teach the participants (and the teachers) more than was consciously intended.”

Ben Simo uses a variation on a board game. I’m not going to give it away, since it’s highly effective, but he used it on me when I was moving from a dabbler in performance and load testing to working on some serious projects. Ben is an experienced and talented performance tester, and he has taught a lot of people how to do the job well. Ben spent hours with me using pieces from a board game, and posing problems for me and having me work on solving them. It was highly interactive, was chock full of performance testing analysis lessons, and we enjoyed working together on it. He would set up the scenario, enhanced by the board game, and I would work on approaches to solve it. I had about 15 pages of notes from this game play activity to take back and apply to my work on Monday. After playing this training game with Ben, I had much more confidence and I was able to spot far more performance anomaly patterns than I had prior to working with him. (We worked through this in a hotel lounge, and we got a lot of weird looks. We didn’t care, we were having fun! Besides, channeling Ralph Wiggum: I was “learnding”!)

James Lyndsay developed a fascinating course on exploratory testing, and with it, simple “black box test machines” that he developed in Flash to aid in experiential learning. These machines have no text on them, and they are difficult to start using, because there are no outward signs of what they are for. This is done on purpose, and each machine helps each class participant experience the lesson through their own exploration and discovery. This is one of my favorite game-like experiences in a testing training course. The machine exercises remind me of a puzzle adventure game, and one of my favorites of that type is Myst. You have to explore and work from your observations and clues to figure out what to do, and the possibilities for application and experience are wide open. James managed to create four incredibly simple programs that replicate this sort of game experience during training. Simply brilliant.

Those of you who follow Jerry Weinberg, or the many consultants who have been influenced by him, have likely worked through simulations during a workshop or tutorial. Much like an RPG (role-playing game), attendees are organized around different goals, roles, activities and tasks to create an improvised simulation of a real-life problem. This involves drawing on improvisation and your “pretending” skills, and applying your problem-solving techniques in a different context than a work context. Many people report very positive experiences and “aha!” moments when learning from these sorts of activities.

Another theme in Jerry’s work with people is physical activity. Jerry gets people to move around, and he can influence the mood of the room by adding physical activity to a workshop. In the book The Gift of Time, Fiona Charles shares a poignant story about Jerry using a movement activity to calm down a room full of people during a workshop when they first learned about the events of September 11. Michael Bolton has told me several stories of how Jerry changes the learning dynamic by getting people to move and work in different parts of the room, or grouping people and having them move and work with others in creative combinations. Movement is a huge part of many games, especially sports and outdoor activities, and it gets different parts of our brain working. If you couple movement with learning concepts, it brings together more of your senses to help with concept retention. It is also associated with good health, a sense of well-being and fun.

(Speaking of experiential learning, pretty much everyone I have mentioned here, including me (and a lot more trainers you have heard of), has been influenced either directly or indirectly by Jerry Weinberg’s work on experiential learning. He even has a series of books on the topic on Leanpub: the first is Experiential Learning: Beginning, the second Experiential Learning: Inventing, and the third Experiential Learning: Simulation.)

There are other examples of trainers using game structures in software testing, and I’ve probably missed some obvious ones. (I haven’t even told you about the ones I use, but that doesn’t matter.) These are some good examples off the top of my head that demonstrate the use of game mechanics in teaching.

I wanted to point out that each of them uses game mechanics to teach serious lessons. While people may have fun, they come away with real-world skills that they can apply to their work as soon as they are back in the office.

Don’t be turned off by the term “game” when it comes to serious business – if you look at gaming with an open mind, you’ll see that it is all around us, being used in effective ways.

Did I miss a good software testing training gaming example? Please add them in the comments.

Edit: I just discovered an interesting post on games and learning on the testinggeek.com blog: Software Testing Games – Do They Help?