Performance Testing

Introduction

There are various things in the development world that sound easy but are hard. Logging is one, and Performance Testing is another.

I mean, how hard can it be to throw a load of requests at a web application and find out how it behaves?

As it happens, very hard.

It is not a Friday afternoon plaything like throwing together a PoC for React or Vue.js might be; it takes much longer. In fact, it won't be too long before a growing company needs people working full-time on nothing but performance measurement and improvement.

If you are not convinced, let us start with why performance measurement is hard and then we will look at how to approach it methodically.

Why is Performance Testing Hard?

What actually is acceptable performance? Most people would answer something like, "if a request takes longer than about 1 second, it is not acceptable". That is true, but it doesn't answer the question: if you define acceptable performance as anything less than unacceptable performance, you are not really getting to the bottom of things.

The answer is really, "a level of performance I can reasonably expect". Most web applications should be returning a web page to a user in under 500ms, but less than 100ms is not actually that hard in many cases, so that would be the target performance, but....

What if my web app is completely unoptimised and is not this fast? What if I have a load of marketing rubbish scripts that slow down my otherwise fast page by a factor of 10?

How much is performance even worth? If my web app is already slow when I start looking at performance, how much work will it take to improve? Is it worth it? If the first 10% is cheap and easy, is it worth doing just for a 10% performance improvement?

Is throwing more web servers at the problem a cheaper and easier solution than refactoring the app to work more efficiently?

If someone makes a change that slows something down by 5%, is that acceptable for the functionality? Would you even know it had happened?

Hopefully, you are starting to get an uncomfortable feeling in your stomach and realising that even asking the right questions is hard, before you have done anything at all.

Setting a Baseline

The best way to start is from a clean sheet. Create a simple page on the same tech stack as your main app, but ideally in a completely new test application. Do something relatively typical for your app in your new demo app. For example, if your app always requires a login, add the authentication system and then make a call or three to a database.

Add some performance monitoring to your test app using something like Glimpse or MiniProfiler in .NET (or anything similar in other frameworks). Ideally, this should be able to show you exactly where your time is taken up during a request.
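As a rough illustration, here is what the instrumented page in the demo app might look like with MiniProfiler in ASP.NET Core (a minimal sketch; the controller name, step names and queries are made up for the example):

```csharp
using Microsoft.AspNetCore.Mvc;
using StackExchange.Profiling;

public class BaselineController : Controller
{
    // Hypothetical demo page: authenticate, hit the database, render.
    public IActionResult Index()
    {
        var profiler = MiniProfiler.Current; // null if profiling is disabled

        using (profiler.Step("Check session/authentication"))
        {
            // your normal auth/session lookup goes here
        }

        using (profiler.Step("Query database"))
        {
            // a typical query for your app, e.g. via Dapper or EF Core
        }

        return View(); // the MiniProfiler widget then shows where the time went
    }
}
```

Each Step shows up as a separate timing in the profiler output, which is exactly the breakdown you want for a baseline.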

Run this on your development machine and access the page several times to get some fairly consistent results (the code might need to compile the first time the page is accessed, etc.). Make sure you are not caching any objects!
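One quick way to do this (a sketch, assuming a .NET 6+ console app and a made-up local URL) is a small program that hits the page repeatedly and prints the timings so you can watch them settle:

```csharp
using System.Diagnostics;

// Hypothetical baseline page on the local dev server.
var url = "https://localhost:5001/baseline";
using var client = new HttpClient();
var sw = new Stopwatch();

for (var i = 0; i < 10; i++)
{
    sw.Restart();
    var response = await client.GetAsync(url);
    sw.Stop();

    // The first request usually includes compile/startup cost; ignore it.
    var note = i == 0 ? " (warm-up, discard)" : "";
    Console.WriteLine($"{(int)response.StatusCode} in {sw.ElapsedMilliseconds} ms{note}");
}
```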

Do the same on the production system, which might be considerably different in performance specs and/or network topology.

We do both so that we know, for example, that the DB takes 100ms locally but 10ms in production; if I am testing locally, I need to test against the local baseline.

The reason we don't just use a blank page is that it is not representative of real life. There is no point learning that your stack can return a blank page in 5ms if every page on your real app is checking the session and calling the database!

This gives you a starting point. Ideally, your standalone requests should take no longer than perhaps 100ms, depending on how many database calls you are making; in most cases, they should be sub-30ms.

Debugging the Stack

If your baseline measurements are not down in the single or low double digits of milliseconds, you probably have some work to do debugging things like the network, database connection settings, choice of language, language settings and anti-virus programs: anything that is slowing things down. Hopefully the rough location is obvious from your performance traces.

Do some research if you need to narrow it down. How long should a DB call take for your DB engine? 100ms might not sound like much, but we usually get single-digit millisecond performance, although that requires a fast LAN, a high-performance DB server and correctly set up connection pooling.
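If you want to sanity-check the raw database round trip in isolation, something like this works (a sketch, assuming SQL Server, the Microsoft.Data.SqlClient package and a placeholder connection string; pooling is on by default in ADO.NET):

```csharp
using System.Diagnostics;
using Microsoft.Data.SqlClient;

// Placeholder connection string; pooling is enabled unless you turn it off.
var connectionString = "Server=db-server;Database=MyDb;Integrated Security=true;";
var sw = new Stopwatch();

for (var i = 0; i < 5; i++)
{
    sw.Restart();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT 1;", conn))
    {
        conn.Open();          // the first iteration pays the connection cost,
        cmd.ExecuteScalar();  // later ones should come from the pool
    }
    sw.Stop();
    Console.WriteLine($"Round trip {i}: {sw.ElapsedMilliseconds} ms");
}
```

Once the pool is warm, the later iterations should be in single digits on a healthy setup; if they aren't, you have found a place to dig.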

If you are running your baseline on your normal production servers, also check that you are not testing while the system is under heavy load, which will slow down your tests. Remember, the baseline should give you a best case.

Looking at the Normal Metrics

Now that you have a baseline, you are in a position to start comparing your production metrics against it, remembering that your production system probably has a load more junk in its pages. Imagine that your baseline is 100ms per page and your real metrics are something like 150-200ms; then you are probably not seeing any significant problems under normal loads. If you are seeing production metrics at 500ms+ or response times that vary significantly, you will need to do some more work.

One of the hard parts of performance monitoring is finding out what causes the slow parts. Sometimes adding a script might cause an additional X ms of load, but more often it is a combination of CPU, network, disk access, connection pooling, etc., and the symptom is not always directly linked to the cause. A slow request might be caused by a lack of available database connections, but this is rarely visible.

A full Application Performance Management package is often the best way to keep tabs on what is happening, but they come at a price (not all pricing models are the same) and they still can't see everything. They might see slow disk access but not know that it is caused by Windows Updates or an anti-virus program going nuts.

With all of these variables (another reason why this is hard), we need a methodical approach. Are speeds variable depending on time of day or number of requests? Are speeds randomly variable? Depending on what is happening on your graphs, you will need to monitor different things. We use Application Insights on Azure for most of our apps; it is a good general measure of performance over time and can show some things, but not, for example, exactly which database query is being called. It can show things like CPU versus response times to see if something in the app is causing a spike based on volume or specific pages.
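For reference, wiring Application Insights into an ASP.NET Core app is a one-liner plus a connection string (a minimal sketch, assuming the Microsoft.ApplicationInsights.AspNetCore package):

```csharp
var builder = WebApplication.CreateBuilder(args);

// Picks up the Application Insights connection string from configuration
// (e.g. appsettings.json or an environment variable) and starts collecting
// request, dependency and performance telemetry automatically.
builder.Services.AddApplicationInsightsTelemetry();

var app = builder.Build();
app.MapGet("/", () => "Hello");
app.Run();
```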

If your general response times are stable but just generally high, it is usually relatively easy to add timing scripts (like Application Insights) which will break down the page load into separate pieces. Maybe you are loading scripts synchronously that could be loaded asynchronously. Maybe you are loading too many scripts. Maybe the DOM load is very slow because you have a very complex JavaScript-based page.

If, on the other hand, your normal response times are close to the baseline, then you are ready to start looking at Load Testing, another level of hard.

What is Load Testing?

Every software system will start to slow down under load, and Load Testing is designed to find out how much capacity we have to start with, at what point things start to slow down, and at what point the system becomes unstable and people start seeing errors and/or very slow responses. Most systems have stable response times up to a point, then a gradual slowdown to another point, then a short region where slowdowns are more noticeable but errors are largely avoided, and finally an avalanche where the requested workload is greater than the resources available and the only possibility is for requests to slow, get dropped, error, time out, etc.

We need to take a baseline again, but this is complicated if you already use auto-scaling, which is designed to cope with load but which might make load testing much slower and more expensive.

If you are only using a single web server (you shouldn't be!) or have a fixed number of web servers, you can do this relatively easily, but if you can test on a separate (but identical) system, then great. If not, schedule it outside of normal hours, tell your customers and let your service provider know so that they can disable DoS protection, etc. Also note that in general you cannot load test from a single client, which will generally limit the maximum number of concurrent connections to a single destination.

You have a choice of tools, including some open-source ones like Apache Bench, JMeter and Bees with Machine Guns, but most of these have a learning curve and in most cases will require some low-level configuration to get them working. There are also services like Azure and BlazeMeter which provide a higher-level service to those with a bit more money than time to do it themselves. Most will use cloud servers provisioned on demand and can even run from multiple cloud locations to provide timings for users in remote locations.

The tricky bit about setting these up is asking what a typical request looks like. You might be able to determine this from your Application Insights or Google Analytics data, but the closer you can get the test to a real sample, the more useful your results will be. These requests are recorded in different ways depending on the platform; JMeter, for example, allows you to record via a browser proxy, which is quite easy.

When you run the tests, you then have various options, but often you will start with a number of concurrent requests and either ramp up slowly towards a target number, or until the response time goes above a set amount (e.g. 1 second), or until the number of errors goes above 1. You can also step up the number of users to allow the system time to respond to changes, e.g. increase users by 100 every 10 seconds until one of the failure conditions occurs.
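To make the ramp-up idea concrete, here is a crude version in C# (a sketch only; the URL, step sizes and thresholds are illustrative, and a real tool like JMeter will do this far better than a single client can):

```csharp
using System.Diagnostics;

var url = "https://staging.example.com/"; // hypothetical target system
using var client = new HttpClient();

// Time a single request; -1 signals an error response or exception.
async Task<long> TimedGet()
{
    var sw = Stopwatch.StartNew();
    try
    {
        using var response = await client.GetAsync(url);
        return response.IsSuccessStatusCode ? sw.ElapsedMilliseconds : -1;
    }
    catch (HttpRequestException)
    {
        return -1;
    }
}

// Step up concurrency until responses slow past 1 second or errors appear.
for (var users = 100; users <= 1000; users += 100)
{
    var results = await Task.WhenAll(Enumerable.Range(0, users).Select(_ => TimedGet()));
    var errors = results.Count(t => t < 0);
    var avgMs = results.Where(t => t >= 0).DefaultIfEmpty(0).Average();

    Console.WriteLine($"{users} concurrent users: avg {avgMs:F0} ms, {errors} errors");
    if (errors > 0 || avgMs > 1000) break; // a failure condition was hit

    await Task.Delay(TimeSpan.FromSeconds(10)); // let the system settle between steps
}
```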

Once you run these tests, you get a graph that shows the response time and/or error level for each level of concurrent users. Again, this can be hard to correlate to what is actually happening, which is why you generally need to watch CPU/network/RAM etc. while testing to see which resource is hitting a limit and causing the problems.
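If you have nothing better to hand, even a crude sampler running on the web server can show which resource is saturating during a test run (a Windows-only sketch, assuming the System.Diagnostics.PerformanceCounter package; a proper APM tool or a perfmon session is the real answer):

```csharp
using System.Diagnostics;

// Windows-only: sample total CPU and available memory once a second.
using var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
using var mem = new PerformanceCounter("Memory", "Available MBytes");

cpu.NextValue(); // the first CPU reading is always 0; prime the counter

while (true)
{
    Thread.Sleep(1000);
    Console.WriteLine($"CPU {cpu.NextValue():F0}%  Free RAM {mem.NextValue():F0} MB");
}
```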

The Database Limit

It is relatively easy to create a system that allows an arbitrary increase in web servers to handle the load while still sharing the database. Scaling databases out is relatively difficult unless already built-in to the architecture but finding a resource limit for the database can also be hard. This might require a load test that temporarily spins up 100s or 1000s of web servers just to get the database to the point where it is struggling. Good metrics will highlight which specific methods are slow on the database allowing you to concentrate on these but for most people, if you have design the database mainly to deal with data and query language, you shouldn't have a problem here until you have reached a very high throughput on the web server.