Trying to diagnose problems in the early asm instructions of the x86 Linux kernel is just about the most cumbersome task you can perform in the kernel. There are still no good methods of debugging very early booting issues. For problems that occur before the serial console and EFI framebuffer are initialized, the only solution is to force your machine to reboot or hang at strategic locations in the kernel to home in on the root cause.
If your machine is misbehaving during boot there are two symptoms that you will want to debug: either the machine unexpectedly hangs, or unexpectedly resets.
Below are some tricks that I rely on every time someone comes to me with an early EFI booting problem. I've also used them when writing the EFI boot stub and EFI mixed mode kernel patches. These techniques are not pretty, but they get the job done when you're out of other options.
Debugging a mysterious reset with an infinite loop
The strategy in this scenario is to force your machine to hang at a point you choose. This case is slightly easier to debug because one trick works on every code path: hang the machine.
The usual idiom is this,
1:
hlt
jmp 1b
which halts the machine as soon as execution reaches it; if an interrupt wakes the CPU, the jmp sends it straight back to the hlt. For example, assume that there's a bug in the EFI boot stub such that hdr.code32_start isn't initialized correctly. The buggy code would look like this (modified from the original),
Assuming we're jumping through an invalid pointer in %rax, executing the jmp instruction will cause a reset. But suppose we didn't know that already. Instead, we'd have to gradually modify the code as we got closer and closer to the root of the issue. The first time it might look like this,
And the machine hangs. OK, good. We know everything up to and including efi_main() is working fine. We'll quickly realise that a modification like this returns us to the resetting problem,
Bingo. The problem is obviously a bogus %rax value. Of course, things get substantially easier once you get to the C code in x86_64_start_kernel() and can use the old reliable idiom,
while(1);
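To make the narrowing-down procedure concrete, here is a toy C model of it. Nothing in it touches real hardware: boot_once() is a hypothetical stand-in for "rebuild the kernel with the hang loop moved, boot, and watch what happens", and numbering the early boot code as steps 0..n-1 is an assumption made purely for illustration. It also swaps the gradual walk described above for a binary search, which keeps the number of reboots down when there are many candidate locations.

```c
#include <assert.h>
#include <stddef.h>

enum outcome { HANGS, RESETS };

/*
 * Hypothetical stand-in for one rebuild-and-boot cycle. The hang loop is
 * placed just before step `hang_before`, so steps 0..hang_before-1 run.
 * `bad_step` is the (normally unknown) step that resets the machine.
 */
static enum outcome boot_once(size_t hang_before, size_t bad_step)
{
	return bad_step < hang_before ? RESETS : HANGS;
}

/*
 * Binary search for the resetting step among steps 0..n-1.
 * Invariant: a hang loop before `lo` hangs, one before `hi` resets.
 */
static size_t find_reset_step(size_t n, size_t bad_step)
{
	size_t lo = 0, hi = n;

	assert(n > 0 && bad_step < n);	/* the model assumes one bad step */

	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (boot_once(mid, bad_step) == RESETS)
			hi = mid;	/* bad step already ran: it lies before mid */
		else
			lo = mid;	/* reached the hang: bad step is at mid or later */
	}
	return lo;
}
```

With eight candidate steps and the fault at step 3, find_reset_step(8, 3) settles on step 3 after three simulated boots. On a real machine each call to boot_once() is a manual edit, rebuild and reboot, so the saving matters.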
Debugging a hang by triggering a reset
There are a few different tricks to attempt when diagnosing a hang in the early kernel code, depending on where you suspect the hang is occurring. All of the tricks cause the machine to reset.
If you're debugging a hang before the interrupt handlers have been set up in x86_64_start_kernel() then you can simply do this,
xorq %rax, %rax
jmp *%rax
or you can do something similar from C,
void (*foobar)(void);
foobar = NULL;
foobar();
Basically, jump through a NULL pointer. However, once you get past x86_64_start_kernel(), simply jumping through a NULL pointer isn't going to cause the machine to reset; it will trigger the page fault interrupt handler instead. So you need to load an empty interrupt descriptor table (IDT) and then jump through a NULL pointer,
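A minimal sketch of that sequence in C, assuming an x86-64 kernel built with GCC or Clang. The struct and function names are hypothetical and not taken from the kernel tree, and the code only makes sense in ring 0, since lidt is a privileged instruction.

```c
#include <stdint.h>

/* Memory operand that lidt expects in 64-bit mode: 16-bit limit, 64-bit base. */
struct idt_ptr {
	uint16_t limit;
	uint64_t base;
} __attribute__((packed));

#if defined(__x86_64__)
/*
 * Hypothetical helper, not a kernel API. With a zero-limit IDT installed,
 * the fault raised by the NULL jump cannot be delivered, which escalates
 * to a triple fault and resets the machine. Ring 0 only.
 */
static inline void reset_via_empty_idt(void)
{
	static const struct idt_ptr empty_idt = { 0, 0 };

	__asm__ __volatile__(
		"lidt %0\n\t"		/* load the empty IDT */
		"xorq %%rax, %%rax\n\t"	/* %rax = NULL */
		"jmp *%%rax"		/* jump through it */
		: : "m" (empty_idt) : "rax", "memory");
}
#endif
```

Drop a call to the helper at the point you suspect execution never reaches: if the machine resets, the code got that far and the hang is further along.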