perf-probe provides a way for you to instrument the Linux kernel at runtime with your very own dynamic tracepoints – you can create a tracepoint for any function or source line in the kernel. While the kernel does provide a whole bunch of tracepoints out of the box (I count 1126 on this Haswell machine), they don’t cover everything. Sometimes you need to roll your own.
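If you’re curious how many your own machine has, perf list labels each one, so a quick count is something like this (the exact label text may differ slightly between perf versions):

    # count the static tracepoints perf knows about
    perf list 2>/dev/null | grep -c 'Tracepoint event'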
Like recently, when I was looking at the following
output while comparing the performance of the same workload on bare
metal and in KVM:
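A breakdown like this typically comes from KVM’s per-exit-reason counters on the host, which live in debugfs and are what the kvm_stat tool (tools/kvm/kvm_stat in the kernel tree) watches live. A minimal way to snapshot them by hand, assuming the flat-file layout used by kernels of this era (run as root; newer kernels organise the stats differently):

    # snapshot KVM's exit counters on the host
    grep . /sys/kernel/debug/kvm/*exits*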
All of these events incur a VM exit, which is costly. Hundreds of cycles costly.
Worse, if you’re comparing against bare metal like I was, it’s overhead that simply doesn’t exist in the baseline because there is no need to transfer control to the VM monitor. The following diagram from the Intel Architecture Software Developer’s Manual (SDM) Vol. 3C illustrates the transition from guests to host.
Any chance to eliminate VM exits is potentially a major performance win.
With that in mind, I paused when I read the io_exits event in the list above, having no idea what an io_exit event was. Reading the kernel source showed that the event happens in response to the KVM guest executing an I/O instruction (outb, etc.), at which point I’m thinking, “Why on earth is KVM executing I/O instructions? That’s gotta be emulation for some legacy device.”
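If you want to chase that down yourself, the counter name greps straight to the handler in the KVM sources (the exact files vary between kernel versions):

    # in a kernel source tree: find where io_exits is declared and bumped
    grep -rn io_exits arch/x86/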
The problem was: I didn’t immediately know how to verify that.
The kvm:kvm_exit tracepoint gives the address of the I/O instruction that caused the VM exit, so it’s possible to find all such instruction addresses, sorted by number of occurrences, by doing:
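Roughly, on the host, something along these lines works; the awk extraction assumes the tracepoint’s perf script output contains a “rip 0x...” field pair, so adjust it to match your kernel’s format:

    # record every VM exit on the host for a while
    perf record -e kvm:kvm_exit -a -- sleep 30

    # keep the I/O-instruction exits and tally the guest instruction pointers
    perf script | grep IO_INSTRUCTION | \
        awk '{ for (i = 1; i < NF; i++) if ($i == "rip") print $(i+1) }' | \
        sort | uniq -c | sort -rn | head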
Looking up the function in the guest kernel with addr2line -e vmlinux -f 0xffffffff813267ea revealed that the iowrite16 function was the cause of 99.7% of the exits.
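With -f, addr2line prints the function name followed by its file:line; the source location shown below is only where iowrite16 usually lives, so treat it as illustrative of the output shape rather than exact:

    $ addr2line -e vmlinux -f 0xffffffff813267ea
    iowrite16
    lib/iomap.c:...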
And this is where perf-probe comes into its own. From within the guest, I can create my own dynamic tracepoint on iowrite16 and gather a cpu-cycles profile with call stacks to show which code paths lead to it.
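A minimal sketch of that, from inside the guest (the recording window and the choice to record cycles alongside the probe event are just one reasonable setup):

    # create a dynamic tracepoint on iowrite16 in the guest kernel
    perf probe --add iowrite16

    # record cpu-cycles and the new probe event, with call graphs,
    # system-wide for a short window
    perf record -g -e cycles -e probe:iowrite16 -a -- sleep 10
    perf report

    # remove the probe when finished
    perf probe --del iowrite16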