Category Archives: systems

Load Testing Your Web Infrastructure: Please Be Careful. Part 3

In Part 2, we saw what a load testing tool can do in the hands of someone who lacks the right knowledge and skill with the tool and the underlying systems. But knowing the tool isn't enough: you also need to understand the environment where you will run it. Creating and using test environments that are optimized for load and performance testing is a must. If you use these tools on a regular network, you will likely disrupt everyone else at the office, causing lost productivity and extra work for IT staff. And the last thing you want is to try them out at home and end up blacklisted by your ISP (internet service provider).

Bye Bye Network!

After a while, I was an old hand at load and performance testing. To bolster my hands-on experience, I attended workshops on how to overcome technical restrictions, how to accurately analyze the data and find problems others would miss, and how to write reports describing risk and problems; I was also adept with a handful of tools. I started to get hired for performance and load testing gigs, and under the right circumstances, I had some rewarding and fun projects. I worked with a lot of talented people with vastly different skills, and learned from each of them.

Since I had a lot of retail and telco experience, a work friend asked me to come in and help him with a large retail system that was going through an upgrade. One of my tasks was to provide load testing help, since they were upgrading all the software and hardware for their back end system. I was given a lot of freedom to choose the tools, interview everyone I could about backend system issues, figure out how to simulate credit card processing, and generally research and design exactly what they needed. However, I was not given a test network to run the tests on, so I never applied any actual load; I only verified that my load tests would work with a single user.

To find potential areas of concern, we set up monitoring at several key points in the system, and I had the test results output in a format we could feed into statistical analysis software. We also monitored server utilization, and recommended moving some processes around to make better use of the system. We learned a lot, but I wasn't ready to unleash full load testing without a dedicated test network. There was no way I wanted to run real load on the corporate network, even though we knew the tests would only run against our internal test system. I knew from experience that we could overload the internal network and cause problems for others. My friend, the dev manager, ignored my concerns. He was confident that the internal network would handle the extra traffic, since the IT admins had shown him that it was perpetually under-utilized.

Despite my objections, the dev manager insisted I run the load tests on the regular internal network. To start, he wanted to run the tests with 1000 simultaneous users, but I suggested we try something smaller. I wanted to try 10; he insisted we try 100. Still objecting, I hit the “Enter” key on my machine to start the tests. Immediately, a collective howl started to swell across the entire floor of the office. Then people started calling out that they had no network access. The dev manager and the IT manager ran to the server room, and when they unlocked it, all we could see in the dark room was a sea of blinking red and yellow lights. Clearly, my load tests had overwhelmed the entire network, and every piece of hardware was in an error state. No one in the office could work until all of the equipment was restarted. It took about half an hour to get the network up and running again, and the first thing my friend said was: “TRY IT AGAIN!!!!” He insisted the network outage was coincidental.

I refused to run the tests again, and made him tap the key on my machine himself. No sooner had his hand lifted from my keyboard than the collective howl swelled again. The IT admin opened the server room door, and again, it was all blinky lights and no network access for the company. It was remarkable how quickly the network was getting overwhelmed. The dev manager and IT team felt it was technically impossible, but they agreed not to run the tests again until we had investigated the source of the problem. Furthermore, permission and a budget for a test network dedicated to load and performance testing were immediately approved by stakeholders.

It turned out that an extraordinary event caused the outage, but it was one that would have happened in production had we not caught it internally first. In simple terms, the network cards on the new servers defaulted to broadcasting to each other when under load, to try to load balance. This was a new feature that looked good on paper. However, there was already a load balancing system in place, so this was redundant, and harmful. In effect, the servers spammed each other because they were all under load, and the traffic increased exponentially. Machine one would find itself under too much load, so it would message machine two to get it to process the excess. Unfortunately, machine two was also under extreme load, so it was messaging machine one for help, as were machines three and four, all messaging each other over and over with more and more messages.

To visualize what they were trying to process and the traffic they created themselves, imagine a geometric or hockey-stick curve on a graph, or a runaway series in mathematics. The load tests were already creating a huge amount of traffic, but the servers themselves were generating even more network traffic at an exponential rate. This self-generated traffic instantly overwhelmed every component in the corporate network. We turned off that setting on the test servers' network cards, and then waited for a test network where we could safely run the tests.
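To make the runaway growth concrete, here is a toy simulation of that failure mode in Python. It is an illustration only: the server count, round count, and rebroadcast rule are invented for the sketch, not taken from the actual NIC behavior. With four servers, each round multiplies the message count by three, so traffic grows roughly as 3^n.

```python
# Toy simulation of the "overloaded peers asking overloaded peers for help"
# feedback loop. Every server is under load, so each round it rebroadcasts
# every help request it received in the previous round to all of its peers.

SERVERS = 4   # machines with the rebroadcast feature enabled (assumed)
ROUNDS = 10   # message-passing rounds to simulate (assumed)

pending = {s: 1 for s in range(SERVERS)}  # round 0: one help request each

total = 0
for r in range(1, ROUNDS + 1):
    new_pending = {s: 0 for s in range(SERVERS)}
    for server, requests in pending.items():
        for peer in range(SERVERS):
            if peer != server:
                new_pending[peer] += requests  # each request is rebroadcast
    pending = new_pending
    round_msgs = sum(pending.values())
    total += round_msgs
    print(f"round {r}: {round_msgs} messages on the wire (total {total})")
```

Run it and the per-round message count goes 12, 36, 108, ... and passes a quarter of a million by round ten, which is the infinite-series picture above in miniature.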

The next time we ran the tests, I had several managers breathing down my neck, but the failures the tests caused were confined to the servers under test; there were no network outages. There was no collective howl, no server room full of blinky error lights. We all breathed a sigh of relief, and then went through a find-and-fix cycle for a few weeks to get the systems ready for a production launch. We were able to ship with a lot of confidence due to this work, and the load tests remained part of pre-production testing for years after that launch.

This was a relatively small company, and the impact was fairly low. The entire development team and IT team sat together, and the infrastructure was in a server room on the same floor as the office. We were able to deal with the outages quickly, and the incident became part of office lore, brought up whenever a laugh was needed. It wasn't without political fallout, though, since it was disruptive and problematic. Now imagine if this had been a larger company, with IT departments in another location and servers at a hosting provider or in the cloud. There could have been considerable downtime, and increased costs with hosting providers. While this situation stayed lighthearted thanks to friendships and a tight-knit office environment, it could have been extremely serious.

Stay tuned for part 4…

Load Testing Your Web Infrastructure: Please Be Careful. Part 1

Now that I am on the product management side of software projects, I don't deal with testing approaches much in my day-to-day work. I get information about product quality criteria, quality goals and metrics, testing status, and showstoppers that require attention. Unless I want to dig deeper, I don't hear much about the actual testing work. Once in a while though, something big pops up on my radar, usually because there is a threat to a product release, or a political issue at play. In those moments, my background as a software tester comes in handy.

Recently, my testing experience was called into action because of a project controversy about load testing.

There were some problems with a retail system in production, and poor performance was blamed. The tech team did not have the expertise or budget for load testing, and were instead pushing the sales team to take responsibility for that testing. The sales team didn't have any technically minded people, so they approached marketing. The marketing team had people with more technical skills, so a manager decided to take on that responsibility. They asked the team for volunteers to research load testing, try it out, and report back to the technical team. I happened to overhear this, and began waving my arms like the famous robot from Lost in Space who would warn of impending danger by saying: “Danger, Will Robinson!” This is out of character for me, since I prefer to let the team make technical decisions and rarely weigh in, so people were shocked by my reaction. I will relay to you what I said to them.

Load testing is an important testing technique, but it needs to be done by people with specialized skills who know exactly what they are doing. It also requires taking test environments, accounts, permissions, and third-party relationships into account.

Load testing is a great way not only to find performance issues with your website or backend servers, but also to make intermittent bugs pop up with greater frequency. Problems you might miss during regular use will suddenly appear under load, due to the high volume of tests run in a short period of time. High-volume automated testing is extremely effective, and one of my favorite approaches to test automation. But doing it correctly, and getting real utility from it, requires work, environment setup, knowledge, and skill. Done well, performance bottlenecks are identified and addressed, intermittent bugs are found and fixed, and a good test environment and test suite help mitigate risk going forward whenever there are pushes to production. Done poorly, load testing can have dangerous results. Here are some cautionary stories.

The simplest load testing tools involve setting up a recorder on your device to capture the traffic to and from the website you are testing. You start the recorder, execute a workflow test, turn off the recorder, and then use that recorded session to create load. The load testing tool generates a certain number of unique sessions and replays the test at the transport layer. In other words, it generates multiple tests, simulating several simultaneous users on the website. However, many systems get suspicious of a flood of hits coming from a single device, and protect against it. Furthermore, internal networks aren't designed for one machine to broadcast a huge volume of data. If you are working from home, your ISP will get suspicious if you do this from your account, fearing that your devices are being used for a denial-of-service attack. Payment processors are especially wary of large amounts of traffic as well. So if you use this method, you need to completely understand the system and the environments where you are performing the tests.
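To make the record-and-replay idea concrete, here is a minimal sketch of the replay half in Python. It is illustrative only: the target URL, header name, and user count are placeholders, and a real recorded session would replay a whole multi-request workflow with per-user session cookies, not a single request. Do not point anything like this at a host or network you don't control.

```python
# Replay one captured request as N simultaneous "users".
import concurrent.futures
import urllib.error
import urllib.request

TARGET = "http://test-env.example.internal/checkout"  # hypothetical test-only host
SIMULATED_USERS = 10  # start small: this number is exactly where things go wrong

def replay_session(user_id: int) -> int:
    """Replay the recorded request as a single virtual user; return HTTP status."""
    req = urllib.request.Request(TARGET, headers={"X-Virtual-User": str(user_id)})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses are data too when you're under load

with concurrent.futures.ThreadPoolExecutor(max_workers=SIMULATED_USERS) as pool:
    statuses = list(pool.map(replay_session, range(SIMULATED_USERS)))

# Summarize what the virtual users saw, e.g. {200: 8, 503: 2}
print({code: statuses.count(code) for code in set(statuses)})
```

Even this toy shows the hazards described above: all the traffic originates from one machine, so it is exactly the pattern that DDoS protection, ISPs, and payment processors are built to flag.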

Part 1: Expensive Meaningless Tests

Early in my career, I was working with a popular ecommerce system. They were successful at managing load, but felt their approach was too reactive and possibly a bit expensive. If they could do load and performance testing within the organization, rather than deal with complaints and outages, they could also improve the customer experience. I was busy with other projects, and I had never worked with load testing tools before. Since I was a senior tester, I was asked to oversee the work of a consultant, a well-known specialist who also worked for a vendor selling load and performance testing tools. To be completely honest, I was busy, I trusted his expertise, and I didn't pay a lot of attention to what he was doing. One day, he scheduled a meeting with me and provided an overview. It all looked impressive; there were charts and graphs, and the consultant had a flashy presentation. He then showed me his load tests, and highlighted that they had found “tons of errors”. He said that his two weeks of work had demonstrated that we clearly needed to buy the tool he was selling. “Look at all the important errors it revealed!”

My heart sank. All he had done was record one scenario on the ecommerce system and play it back with varying numbers of simultaneous users. He was wise enough not to saturate the local network, so he kept the numbers small, but his tests were all useless because he had no idea, or curiosity, about how the system actually worked. The first problem was that retail systems don't have an endless supply of goods. Setting up a test environment means creating fake goods, or copies of production inventories that don't actually result in a real-life sale. To make them realistic, you don't give yourself an infinite number of widgets unless a particular test needs that. His tests didn't take this into account, and the “important errors” his hard work had revealed with the tool were just standard errors about missing inventory. In other words, there were ten test books for sale, and he was trying to buy the 11th, 12th, and 13th books. If he had been a real user on the website, the unavailable-inventory messages would have been displayed more clearly; because he was getting them at the protocol level, they weren't as pretty. A two-minute chat with an IT person or programmer would have set him straight, but he didn't look into it. He copied the messages into his report and treated them as bugs, when in fact the system was working just fine and the error was his.

Next, he was using a test credit card number that had been provided to us by the payment processor. There are lots of rules around the usage of these test numbers, and he was completely oblivious to them. In his days of so-called analysis of our system, he had not explored this at all. That meant our test credit card numbers were getting rejected, which was the source of some of the other “important errors” he had found but not investigated. This was so egregious that I had to stop the meeting and talk to the IT accountant who managed our test credit card. My fears were confirmed: the load tests had caused our test credit card numbers to be flagged for suspicious activity. That meant none of us could test using the credit card, and we had to have a meeting explaining ourselves and apologizing to get the numbers reinstated.

I got dragged into developing my own load and performance testing skills because of this. The consultant went back to his office, and I inherited these terrible tests. What I found was that while the load testing tool looked impressive, it had a terrible proprietary programming language that created unmaintainable code. And while it had impressive charts and graphs, they were extremely basic and could actually mask important problems. Recording HTTP(S) traffic and playing it back can be fraught with peril, because the recorder picks up ALL the HTTP traffic on your machine, including your instant messages, webmail, other websites that are open, and third-party services such as a weather plugin or stock ticker. You also need a protected test network that prevents you from causing problems and interfering with everyone else's work. Then you need to look at your backend and see what is possible. In my case, I worked with the team to create new load-test products on the website, but the backend retail system only allowed a maximum of 9999 of any item, since it stored quantities in a four-digit field. We also had to create a system to simulate credit card processing, since the payment processor wasn't going to allow thousands of test purchases to hit their machines. Furthermore, our servers had DDoS protection and would flag and deny machines that hit them with lots of simultaneous requests, so we had to distribute the tests across multiple machines. (These issues were all a bit more technical than I am recording here, but this should give you an idea.)
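To give a flavor of what "simulate credit card processing" can mean in practice, here is a minimal sketch of a fake authorization endpoint in Python. The port, field names, and response format are inventions for illustration, not what we actually ran; a fuller stand-in would also simulate declines, latency, and the occasional timeout so the tests exercise error handling.

```python
# A stand-in payment authorizer so load tests never touch the real processor.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeAuthHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        charge = json.loads(body or b"{}")
        # Always approve; field names here are hypothetical.
        reply = json.dumps({
            "approved": True,
            "auth_code": f"TEST-{charge.get('order_id', 'unknown')}",
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the console quiet during high-volume runs

if __name__ == "__main__":
    # Point the test system's payment URL at this host and port instead of
    # the real processor's endpoint.
    HTTPServer(("0.0.0.0", 8099), FakeAuthHandler).serve_forever()
```

The point of a stub like this is isolation: the load tests can drive thousands of purchases through the full checkout flow without any traffic ever leaving the test network.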

How much time do you think it took to create the environment for load tests, and then to create good load tests that would actually work?

If you answered: “weeks” with several people working on the testing project, then you are in the ballpark.

We also abandoned the expensive load testing tool, mostly because it used a vendorscript instead of a real programming language. We chose one based on the same language the development team used, so I would have support and other people could maintain the tests over time. It was a bit rudimentary, but we were able to identify problem areas for performance and address them in production. A happy side effect was that, under load, intermittent issues we had missed before became repeatable cases that could be fixed. It was a lot of work, but it was the start of something useful. The tests were useful, the results were helpful, and we had tests that could be understood, maintained, and run by multiple people in the organization.

I was fortunate in this case to be able to work with a great team that was finally empowered to do the right thing for the organization. We were also fortunate in our software architecture and design. We spent the time early on to create something maintainable, with simple tests. As a result, our testing framework was used for years before it required major updates.

Click here for Part 2 of the series.

Designing a Better Life

Often, my public work lags behind my current interests and passions. That's OK; it usually catches up in time. But I wanted to talk about my current focus and passion right now: designing systems for a better life. If you read my blog regularly, you will notice a shift towards design, user engagement and related topics. I wanted to explain why.

This has been a tough year. I've been on the road a lot, and I have met a lot of fantastic people and worked with some amazing organizations. However, I have been away from my family and friends here at home, and I have missed out on local life. The Alberta flood disaster forced me to look at my local, real life. This spring we found ourselves evacuated from our home, staying with friends, wondering if the flood would wipe out our house and property. Unlike many others, we were very fortunate and came home to no damage, but it changed our perspective. A few days earlier, we had taken for granted that we had a safe, dry, secure home to use as a refuge no matter what happened in our work or public lives. We came home and celebrated with our neighbors that we were all OK, and then we did what we could to help each other. I realized I need to do more to contribute to my local community as well as to virtual communities.

The Alberta floods, like so many natural disasters, brought out the best in people. Organizers were turning away volunteers because they had too many, and entrepreneurial types turned their energies towards creating systems to harness that energy and willingness to help others. I was amazed at how people used social media to mobilize others to work for a common goal and help each other out. Mobile technology wasn't just about screen sizes and sensors and wireless conditions, or merely staying informed about the emergency going on around you (which was incredibly useful and important). What was more interesting was that the technology was helping people help others, and to mobilize and collaborate together. This is incredibly powerful. The technology enabled people to do something in real life. It wasn't just about sharing pictures of food and videos of cats on social media, or whiling away hours playing Candy Crush or Angry Birds. This technology was exploited to make all of our lives a bit better as we lived through a natural disaster together. Those who were unaffected and wanted to help just had to grab their mobile device and use social media to find out what they could do. Those who were affected could get informed, ask for help, or just read messages of encouragement.

Mobilization and collaboration, working together to help others or to solve problems, is an important area that I am exploring through human and technology systems.

Mobilization can be harnessed to help organizations and groups of people solve really hard problems. Distributed computing can be combined with crowdsourcing to spread problem solving across the most powerful tool at our disposal: the human brain. Projects like fold.it present problems in a gaming context to generate vital information for researchers who are combating disease or building health care technology to improve our lives. These are enormous problems that have an impact on all of us. On a smaller scale, we can focus our energy and mobilize the people in our social circles to help us achieve health goals or recover from injury with the SuperBetter game created by Jane McGonigal. These are two powerful examples of how we can use technology and humanity together to solve problems.

Those of you who follow my writing know that this is an area that is important to me, even in simple tasks like test automation, where I prefer human involvement in the computing work (see Man and Machine: Combining the Power of the Human Mind with Automation for more). In the past, we tried to outsource difficult problems to machines; now we are learning better ways of getting the best of both worlds, where computing systems support us, do what they do well, and enable us to really take advantage of collective wisdom and interests. I think we are just scratching the surface of this space.

Distributed collaboration to solve really hard problems is an area I am looking into more.

I've done a lot of work with mobile applications, and many of you are familiar with my book and course on testing mobile applications. I have trained hundreds of people, and many more have read my ideas about testing mobile apps and web experiences, but that is only part of the picture, and I do a lot more in this space than my public work suggests. To create a great mobile experience, we need vision from business leaders on how they want to use the technology: are they merely supporting it, or are they embracing mobile technology to transform their interactions with the people they are trying to help? Are they looking at mobile as something they are forced into, or as a new area to help increase revenue and loyalty? If business leaders are reluctant, that vision (or lack of vision) will make its way all the way down through the project, and ultimately result in a poor customer experience. On the other hand, a great mobile vision is only as good as the technology that was chosen, the design of the application, and the quality of the customer experience. I have been helping organizations create great mobile experiences in each of these areas.

A quality mobile experience requires great vision, careful choice of technology, a design that engages customers, and reliability for people who are on the move in the real world. That reliability also depends on great design, programming, and testing. A quality experience can't be tested in at the end, so many organizations are asking for my help in other areas: mobile strategy at an executive level, how to choose the mobile technology that best fits that vision, what to address in mobile design, and then quality practices in programming and testing. This is a fascinating area to work in, because there are many more areas to be aware of than we are used to in software development.

A fantastic mobile experience from project vision, design and execution on down to you, the person holding the device, can make your life easier, but a poor quality experience can ruin your day. I am learning how to improve this experience and I want to show you how you can too.

Some of you have wondered why I am talking about things like gamification. I am less concerned with the gaming aspect and more concerned with what lessons we can learn from this field about collaboration and finding meaning in what we do. Modern knowledge work can be difficult to deal with over time: if the power goes out, all our work disappears, and many struggle to find motivation and meaning in their work and careers.

To me, gamification is just one of several potential models of engagement, and we can use it in different ways. If you are in a job that is difficult and you are losing hope, don't be threatened if I talk about gamification. If making your work more like a game fits your context and your personality, as well as the people working with you, then yes, we might look at creating some sort of Alternate Reality Game (ARG). Know that I would never force that on you if you weren't interested, or if it wasn't appropriate. However, I may use mechanisms I have learned from game designers to help with areas of work that are difficult, feel hopeless, or lack meaning. If I do it correctly, you won't recognize it as a game: I won't just put up superficial gold stars and leaderboards, or worse, trivialize the important work that you do. I may, however, collaborate with you to create something that helps you get more meaning from what you do, using engagement or other concepts I have learned from games.

That kind of engagement is vital in the human and software systems that people work with. Can we make an activity or program engaging enough that people want to use it more? Can we design a system that not only solves an organization's problems, but also helps reinforce meaning in what people do? Gamification is an interesting and powerful area of research, with a lot of potential for good, but it can also cause harm. I am carefully researching how to use it in my own work, because it is one mechanism I can see for doing something more for us.

Studying engagement models, and finding and experiencing meaning in the things we spend our days working at, is important, and I am spending more time looking at how the intersection of software and people systems can help.

Design principles are another area of research and problem-solving for me, often under the umbrella of UX (User Experience). Creating great software experiences can really help us, since software either touches us directly or affects us indirectly in everything that we do. A better software or computer system experience has an enormous impact on our lives. When these systems go wrong, they can really cause problems, but a simple, elegant solution can bring joy. User experience and design are hard enough in an era where wireless and sensor technology is common and touch and gesture interaction spans different devices with different screens. What do we do when nanotechnology and other distributed or pervasive systems become much more common? I love the research and work in this space, and it is a part of what I do on projects.

The challenges we have are fascinating, so product management and product design are areas of project work for me, and ones I increasingly explore in my spare time.

Some of you have heard me talk about health projects. One of the most rewarding projects of my career was working on a medical program for mobile devices. It was great to break new ground with new technology, and to figure out how we could make health-care professionals' lives easier and enable them to provide better patient care. My mother still works as a medical professional; it is a calling, and we tease her that if she refuses to retire, she'll pass on “in harness”. She is absolutely fine with that. She is committed to her work and her patients, and takes courses every year on areas that interest her and on how to better use technology in her own work. She has passed that down to me, and finally, as a professional, I have had some chances to help create better software for medical professionals. I enjoy working on medical software because I can see how we are contributing to actually making people's lives better. When we do it right, we enable others to do great work, solve difficult problems, and help real people. It's easy to find meaning when your work has an impact on others, and we can do so much better with technology and health than we have been.

Systems that help us live healthier lives are an area of keen interest for me, and I am interested in mobile, games for health, distributed computing, crowdsourcing, and all sorts of things in that space. Healthcare professionals like Anna Sort inspire me with the creative and innovative ideas they turn into action, and programs like Strokelink, which helps stroke patients using mobile technology, are great.

I'm also interested in how we can create software for health professionals that is easier to use, more reliable, and lets them focus on patients instead of fighting with systems that fail to take into account who they are, the unique context of their work, and the environment they work in.

Finding ways to use software and related technology in health care and health research is another area of huge interest for me.

So there you have it. Watch this space for more on the topics above, and on how we can explore the intersection of people and technology to design better lives for ourselves.