perf-probe provides a way to instrument the Linux kernel at runtime with your very own dynamic tracepoints: you can create a tracepoint for any function or source line in the kernel. While the kernel does provide a whole bunch of tracepoints out of the box (I count 1126 on this Haswell machine), they don’t cover everything. Sometimes you need to roll your own.

Like recently, when I was looking at the following kvm_stat -d output while comparing the performance of the same workload on bare metal and in KVM:

All of these events incur a VM exit, which is costly. Hundreds of cycles costly.

Worse, if you’re comparing against bare metal like I was, it’s overhead that simply doesn’t exist in the baseline because there is no need to transfer control to the VM monitor. The following diagram from the Intel Architecture Software Developer’s Manual (SDM) Vol. 3C illustrates the transition from guests to host.

Any chance to eliminate VM exits can potentially be a major performance win.

With that in mind, I paused when I read the io_exits event in the list above, having no idea what an io_exit event was. Reading the kernel source showed that the event happens in response to the KVM guest executing an I/O instruction (inb, outb, etc.), at which point I’m thinking, “Why on earth is KVM executing I/O instructions? That’s gotta be emulation for some legacy peripheral.”

The problem was: I didn’t immediately know how to verify that.

The kvm:kvm_exit tracepoint gives the address of the I/O instruction that caused the VM exit, so it’s possible to find all instruction addresses, sorted by number of occurrences, by doing:
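A minimal sketch of that counting step in Python, assuming the trace was captured on the host with something like `perf record -e kvm:kvm_exit -a` and dumped as text with `perf script`. The line format matched here (`reason IO_INSTRUCTION rip 0x...`) is an assumption about how the tracepoint is printed and may differ between kernel versions; the function name `count_io_exit_rips` is made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a kvm:kvm_exit line in `perf script` output:
#   ... kvm:kvm_exit: reason IO_INSTRUCTION rip 0xffffffff813267ea info ...
# Adjust the pattern if your kernel prints the tracepoint differently.
IO_EXIT = re.compile(r"reason\s+IO_INSTRUCTION\s+rip\s+(0x[0-9a-fA-F]+)")


def count_io_exit_rips(lines):
    """Count how often each guest instruction address caused an I/O exit."""
    counts = Counter()
    for line in lines:
        match = IO_EXIT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def print_sorted(counts):
    """Print addresses from most to least frequent, with their share of exits."""
    total = sum(counts.values())
    for rip, n in counts.most_common():
        print(f"{n:8d}  {100.0 * n / total:5.1f}%  {rip}")
```

To use it, feed the text output of `perf script` to `count_io_exit_rips` (for example by iterating over `sys.stdin`) and pass the result to `print_sorted`. The address at the top of the list is the one worth looking up in the guest kernel.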

Looking up the function in the guest kernel with addr2line -e vmlinux -f 0xffffffff813267ea revealed that the iowrite16 function was the cause of 99.7% of the io_exits events.

And this is where perf-probe comes into its own. From within the guest, I can create my own dynamic tracepoint on iowrite16() and gather a cpu-cycles profile with call stacks to show which code paths lead to it.
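The workflow can be sketched as a small Python helper that builds the perf commands to run inside the guest. This is an illustration under assumptions, not the exact invocation used here: it records the probe event itself with call graphs (`-g`) system-wide for a fixed duration, and the helper names `probe_workflow` and `run_workflow` are hypothetical. The event name `probe:iowrite16` relies on perf placing new probes in the `probe` group by default.

```python
import shlex
import subprocess


def probe_workflow(function="iowrite16", seconds=10):
    """Return the perf commands (as argv lists) for probing one kernel function."""
    event = f"probe:{function}"  # new probes land in the "probe" group by default
    return [
        # 1. Create the dynamic tracepoint on the function entry.
        ["perf", "probe", "--add", function],
        # 2. Record every hit system-wide, with call stacks, for a while.
        ["perf", "record", "-e", event, "-a", "-g", "--", "sleep", str(seconds)],
        # 3. Show which call paths led to the probed function.
        ["perf", "report", "--stdio"],
        # 4. Remove the tracepoint again.
        ["perf", "probe", "--del", function],
    ]


def run_workflow(commands):
    """Run each command in order; needs root and perf installed in the guest."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Print the commands rather than running them, so they can be reviewed first.
for argv in probe_workflow():
    print(shlex.join(argv))
```

Calling `run_workflow(probe_workflow())` as root would execute the steps. Printing first is a deliberate choice, since adding probes changes kernel state and the recording duration is something you will want to tune to the workload.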

There you have it: the majority of the io_exits are caused by the virtio_blk and virtio-rng drivers. Who knew?