isolcpus is deprecated, kinda
A problem that a lot of sysadmins and developers face is: how do you run a single task on a CPU without it being interrupted? It’s a common scenario for real-time and virtualised workloads, where any interruption to your task could cause unacceptable latency.
For example, let’s say you’ve got a virtual machine running with 4 vCPUs, and you want to make sure those vCPU tasks don’t get preempted by other tasks since that would introduce delays into your audio transcoding app.
Running each of those vCPU tasks on its own host CPU seems like the way to go. All you need to do is choose 4 host CPUs and make sure no other tasks run on them.
How do you do that?
I’ve seen many people turn to the kernel’s isolcpus parameter for this. This kernel command-line option allows you to run tasks on CPUs without interruption from a) other tasks and b) kernel threads.
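Since it’s a boot parameter, you set it in your bootloader configuration. On a GRUB-based distro that looks something like this (the file location and the config-regeneration command vary by distro, so treat this as a sketch):

In /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="... isolcpus=1-4"

Then regenerate the GRUB config (update-grub on Debian-style distros, grub2-mkconfig elsewhere) and reboot:

$ update-grub
$ reboot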
But isolcpus is almost never the thing you want, and you should absolutely not use it apart from one specific case that I’ll get to at the end of this article.
So what’s the problem with isolcpus?
1. Tasks are not load balanced on isolated CPUs
When you isolate CPUs with isolcpus you prevent all kernel tasks from running there and, crucially, you prevent the Linux scheduler load balancer from placing tasks on those CPUs too. The only way to get tasks onto the isolated CPUs is to set their affinity explicitly, e.g. with taskset. The isolated CPUs are effectively invisible to the scheduler.
Continuing with our audio transcoding app running on 4 vCPUs from the example above, let’s say you’ve booted with the following kernel command line:

isolcpus=1-4

and you use taskset to place your four vCPU tasks onto those isolated CPUs like so:

$ taskset -c 1-4 -p <vCPU task pid>
The thing that always catches people out is that it’s easy to end up with all of your vCPU tasks running on the same CPU!
$ ps -aLo comm,psr | grep qemu
qemu-system-x86 1
qemu-system-x86 1
qemu-system-x86 1
qemu-system-x86 1
Why? Because isolcpus disables the scheduler load balancer for CPUs 1-4, which means the kernel will not balance those tasks equally among all the CPUs in the affinity mask. You can work around this by manually placing each task onto a single CPU by adjusting its affinity.
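For example, using the four vCPU thread PIDs that appear in the cset example later in this article (substitute your own):

$ taskset -c 1 -p 17063
$ taskset -c 2 -p 17064
$ taskset -c 3 -p 17065
$ taskset -c 4 -p 17066

With each thread pinned to exactly one isolated CPU, the lack of load balancing no longer matters.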
2. The list of isolated CPUs is static
A second problem with isolcpus is that the list of CPUs is configured statically at boot time. Once you’ve booted, you’re out of luck if you want to add or remove CPUs from the isolated list. The only way to change it is by rebooting with a different isolcpus value.
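You can confirm the boot-time list via sysfs; the file is read-only, which is rather the point:

$ cat /sys/devices/system/cpu/isolated
1-4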
cset to the rescue
My recommended way to run tasks on CPUs without interruption is to isolate them from the rest of the system with the cpuset cgroup subsystem, via the cset shield command, e.g.
$ cset shield --cpu 1-4 --kthread=on
cset: --> shielding modified with:
cset: kthread shield activated, moving 34 tasks into system cpuset...
[==================================================]%
cset: **> 34 tasks are not movable, impossible to move
cset: "system" cpuset of CPUSPEC(0,3) with 1694 tasks running
cset: "user" cpuset of CPUSPEC(1-2) with 0 tasks running
$ cset shield --shield --pid <vCPU task pid 1>,<vCPU task pid 2>,<vCPU task pid 3>,<vCPU task pid 4>
cset: --> shielding following pidspec: 17063,17064,17065,17066
cset: done
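If you re-run the earlier ps check, you should now see something like the following. Unlike with isolcpus, the scheduler load balancer is still active inside the shield, so the vCPU threads get spread across CPUs 1-4 automatically:

$ ps -aLo comm,psr | grep qemu
qemu-system-x86 1
qemu-system-x86 2
qemu-system-x86 3
qemu-system-x86 4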
With cset you can modify the list of CPUs included in the cgroup dynamically at runtime. It is a much more flexible solution for most users.
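For example, to grow the shield by one CPU at runtime you can update the two cpusets with the cset set subcommand (a sketch, assuming the 8-CPU host from the output above; check cset set --help for the exact syntax shipped with your version):

$ cset set --cpu 1-5 user
$ cset set --cpu 0,6-7 system

And when you no longer need the shield at all, tear it down with:

$ cset shield --reset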
Sometimes you really do want isolcpus
OK, I admit there are times when you really do want to use isolcpus. For those scenarios where you really cannot afford to have your tasks interrupted, not even by the scheduler tick (which, even on a tickless kernel, still fires once a second), you should turn to isolcpus and manually spread tasks over the CPU list with taskset, exactly as in the workaround shown earlier.
But for most uses, cset shield is by far the best option, and the one least likely to catch you by surprise.