Most people have heard of computer benchmarks (Phoronix being a good example), but not everyone is familiar with microbenchmarks.

While benchmarks are designed to accurately simulate real-life workloads, a microbenchmark is a program designed to test a small piece of a larger system; microbenchmarks are always artificial and are not intended to represent normal use.

When would you use a microbenchmark?


Both types are used to compare things such as computer hardware and different software versions, and there's no hard-and-fast rule for distinguishing between the two. A guiding principle, though, is that benchmarks test entire systems while microbenchmarks test a single component or feature of those systems.

Because they focus on a single part of a larger system, microbenchmarks are often used to analyse the performance of critical code. They give developers answers to questions like "How will that locking scheme scale?", "Is memory bandwidth a bottleneck for this workload?", and "How long does the most frequent operation take to execute?"

A simple microbenchmark might measure Linux system call latency -- how long the write() system call takes to execute, for instance. For a hypothetical web stack, it could measure the time HTTP GET requests take to execute, from receiving the request on the server to sending a response to the client (note how this is done exclusively on the server, not the client, because otherwise you would also be measuring network latency, not just HTTP GET processing).
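To make the first example concrete, here is a minimal sketch of how write() latency might be measured in C. The target file (/dev/null) and the iteration count are arbitrary choices for illustration; a serious microbenchmark would also pin the process to a CPU, discard warm-up iterations, and report percentiles rather than just a mean.

/*
 * Minimal write() latency sketch: time a tight loop of one-byte
 * writes to /dev/null and report the average cost per call.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 1000000

static long long elapsed_ns(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1000000000LL +
           (end.tv_nsec - start.tv_nsec);
}

int main(void)
{
    struct timespec start, end;
    char buf = 0;
    int fd;

    fd = open("/dev/null", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < ITERATIONS; i++) {
        if (write(fd, &buf, 1) != 1) {
            perror("write");
            return EXIT_FAILURE;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    printf("average write() latency: %lld ns\n",
           elapsed_ns(start, end) / ITERATIONS);

    close(fd);
    return EXIT_SUCCESS;
}

Compiled with something like gcc -O2, a toy program like this is enough to compare runs across kernel versions or configuration changes, which is exactly the kind of question a microbenchmark is good at answering.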

On the kernel performance team at SUSE, we use a number of microbenchmarks to check for performance regressions between releases as we're developing the latest one. Not only do we compare historical data between releases to track and fix regressions, but we also sometimes use that data to write patches that improve the performance of the Linux kernel.

Somewhat related to writing patches to improve microbenchmark scores is testing hypotheses, e.g. changing the values of OS configuration parameters to see whether scores improve. The runtime of a microbenchmark is also usually (though not always) shorter than that of a full benchmark, which helps in quickly getting answers to your questions.

If you're a developer, writing your own microbenchmark is a great way to really understand how your code performs, especially if the OS and software stack don't provide any other way to measure things outside of your code. This was partly why I wrote the adrestia microbenchmark suite to measure the Linux scheduler load balancer.

When would you not use a microbenchmark?


Of course, like everything, microbenchmarks have their limitations, and the results they produce are not always useful. Because the way they exercise the system isn't based on a realistic workload, it can be difficult to justify changing the software or hardware configuration purely to improve microbenchmark results, which are, by definition, artificial.

Likewise, if you're using them to aid debugging, you need to be careful: if the artificial and real-world workloads do not behave the same way under some conditions, you can end up solving the wrong problem.

You will quickly run into trouble if you take a shotgun approach to performance analysis, running every microbenchmark you can find to help diagnose your issue. Microbenchmarks are best used after you've first identified the important parts of your system; pretty much any microbenchmark will find a bottleneck, so you need to make sure it's a bottleneck you care about.

It's always a good idea to use popular microbenchmarks, because the internet is filled with anecdotal evidence of badly designed ones that do not test what the author intended. Stick to standard tools, or ones that you've verified work correctly.

But used carefully, microbenchmarks allow you to zoom in on a part of your system, quickly understand its behaviour, and make improvements. They should be a part of every developer's toolkit.