Nested-Virtualization - IA32_GS_BASE / IA32_KERNEL_GS_BASE remark


Background:
   While debugging a nested VMM, the system hangs when a nested #DB is injected: an infinite chain of #PF occurs and the system freezes.


 Phenomenon:

After the emulated VMExit (L0 VMRESUMEs into L1), a VMExit with reason 28 (CR access) occurs, and after that the kernel GS base has changed.
Intercepting WRMSR to find the instruction that modifies IA32_KERNEL_GS_BASE gives, as in the following screen capture, 0xFFFFF80002ADB369, which lies within nt!SwapContext. This matches the VMExit reason.


PS. The screen captures were taken at different times, so the addresses are not the same: 0xFFFFF80002ADB369 above appears here as 0xFFFFF80002A9C369.
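
To make the WRMSR interception above concrete, here is a minimal sketch of how a write intercept for IA32_KERNEL_GS_BASE can be set in a VMX MSR bitmap. The bitmap layout follows the Intel SDM; the type and function names (msr_bitmap_t, msr_bitmap_intercept_write, msr_bitmap_write_is_intercepted) are made up for illustration and are not taken from kHypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IA32_GS_BASE        0xC0000101u
#define IA32_KERNEL_GS_BASE 0xC0000102u

/* One 4 KiB MSR bitmap page, laid out as described in the Intel SDM:
 *   bytes    0..1023  read  bitmap, MSRs 0x00000000..0x00001FFF
 *   bytes 1024..2047  read  bitmap, MSRs 0xC0000000..0xC0001FFF
 *   bytes 2048..3071  write bitmap, MSRs 0x00000000..0x00001FFF
 *   bytes 3072..4095  write bitmap, MSRs 0xC0000000..0xC0001FFF
 * (hypothetical type name, for illustration only) */
typedef struct { uint8_t bytes[4096]; } msr_bitmap_t;

/* Locate the write-bitmap bit for an MSR. Returns 0 on success,
 * -1 if the MSR lies outside both ranges covered by the bitmap. */
static int msr_write_bit(uint32_t msr, uint32_t *byte, uint32_t *bit)
{
    uint32_t base, index;
    if (msr <= 0x1FFFu) {
        base = 2048; index = msr;
    } else if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        base = 3072; index = msr - 0xC0000000u;
    } else {
        return -1;
    }
    *byte = base + index / 8;
    *bit  = index % 8;
    return 0;
}

/* Request a VMExit on WRMSR to the given MSR. */
static int msr_bitmap_intercept_write(msr_bitmap_t *bm, uint32_t msr)
{
    uint32_t byte, bit;
    assert(bm != NULL);
    if (msr_write_bit(msr, &byte, &bit) != 0)
        return -1;
    bm->bytes[byte] |= (uint8_t)(1u << bit);
    return 0;
}

/* 1 if WRMSR to this MSR exits. MSRs outside the bitmap ranges
 * always exit when MSR bitmaps are in use. */
static int msr_bitmap_write_is_intercepted(const msr_bitmap_t *bm, uint32_t msr)
{
    uint32_t byte, bit;
    assert(bm != NULL);
    if (msr_write_bit(msr, &byte, &bit) != 0)
        return 1;
    return (bm->bytes[byte] >> bit) & 1;
}
```

With a zeroed bitmap, calling msr_bitmap_intercept_write(&bm, IA32_KERNEL_GS_BASE) sets bit 2 of byte 3104, so only writes to 0C0000102H exit while writes to IA32_GS_BASE still pass through. The exit handler can then log the guest RIP, which is how an address such as the one in nt!SwapContext can be found.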


Analyze:
-    Take a look at the red box in the following screen capture: the function directly executes WRMSR to MSR 0C0000102H (IA32_KERNEL_GS_BASE) with the value of qword ptr [r8+80h]. r8 holds the address of the current thread’s User Mode Scheduling control block (UMS_CONTROL_BLOCK), and offset 80h corresponds to the current thread's TEB (the RSI register holds the KTHREAD of the current thread).

-      As a result, MSR 0C0000102H now holds the TEB of the current thread, or its UMS TEB.


So, what is the problem?



Analyze:
-   User-mode INT 3 causes a VMExit
-> L0 saves the guest context (for example the GS base in MSR 0C0000101H, usually the TEB) into VMCS12’s guest GS base, for use by the VMRESUME emulation (the current GS base MSR is the TEB, so the kernel GS base is supposed to be the KPCR)
-> Emulated VMExit (L0 VMRESUMEs into L1's VMExit handler, which actually runs in guest mode)
-> While L1 handles its VMExit, a context switch may occur, which executes WRMSR to IA32_KERNEL_GS_BASE with a TEB
-> Finally, L1 executes VMRESUME and L0 starts its emulation: it resumes L2 and loads VMCS12’s guest GS base into the current GS base MSR. Now the current GS base is a TEB, and the kernel GS base is a TEB as well
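
The sequence above can be replayed on a small model of the two MSRs. This is only a sketch: the addresses are made-up placeholders, the struct and function names are hypothetical, and the host_gs_base field (the GS base L1 runs with inside its handler) is an assumption added so the model is complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder addresses, chosen only to tell the values apart. */
#define TEB_THREAD_A 0x000007FFFFFDE000ull  /* thread that hit INT 3 */
#define TEB_THREAD_B 0x000007FFFFFDC000ull  /* thread switched in under L1 */
#define KPCR_ADDR    0xFFFFF80002C0AD00ull

typedef struct {
    uint64_t gs_base;         /* IA32_GS_BASE        (0C0000101H) */
    uint64_t kernel_gs_base;  /* IA32_KERNEL_GS_BASE (0C0000102H) */
} cpu_msrs_t;

typedef struct {
    uint64_t guest_gs_base;   /* L2's GS base, saved by L0 */
    uint64_t host_gs_base;    /* assumption: GS base used by L1's handler */
} vmcs12_t;

/* L0 emulates the VMExit to L1. The kernel GS base is not part of the
 * saved guest state here, so it is left untouched. */
static void l0_emulate_vmexit(cpu_msrs_t *cpu, vmcs12_t *vmcs12)
{
    assert(cpu != NULL && vmcs12 != NULL);
    vmcs12->guest_gs_base = cpu->gs_base;
    cpu->gs_base = vmcs12->host_gs_base;
}

/* A context switch while L1 handles its VMExit: the WRMSR seen in
 * nt!SwapContext stores the next thread's TEB in the kernel GS base. */
static void l1_context_switch(cpu_msrs_t *cpu, uint64_t next_teb)
{
    assert(cpu != NULL);
    cpu->kernel_gs_base = next_teb;
}

/* L0 emulates L1's VMRESUME back into L2. */
static void l0_emulate_vmentry(cpu_msrs_t *cpu, const vmcs12_t *vmcs12)
{
    assert(cpu != NULL && vmcs12 != NULL);
    cpu->gs_base = vmcs12->guest_gs_base;
}

/* Replays the whole flow and returns the MSR state L2 resumes with. */
static cpu_msrs_t run_flow(int context_switch_in_l1)
{
    cpu_msrs_t cpu = { TEB_THREAD_A, KPCR_ADDR };  /* L2 in user mode */
    vmcs12_t vmcs12 = { 0, KPCR_ADDR };

    l0_emulate_vmexit(&cpu, &vmcs12);
    if (context_switch_in_l1)
        l1_context_switch(&cpu, TEB_THREAD_B);
    l0_emulate_vmentry(&cpu, &vmcs12);
    return cpu;
}
```

In this model run_flow(0) returns GS base = TEB and kernel GS base = KPCR, the healthy user-mode pairing, while run_flow(1) returns a TEB in both MSRs, which is the broken state described in the last step.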



Problem:
- If the guest OS now runs anything that depends on SWAPGS (for example, handling the injected #DB), it breaks: a user-mode #DB has to execute SWAPGS, but the ISR then cannot get the correct KPCR. The resulting #PF depends on SWAPGS too, so #PF repeats in an infinite loop and the system freezes.
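
Why SWAPGS fails in this state can be shown with a tiny model. This is a sketch under simple assumptions: SWAPGS is modeled as a plain exchange of the two MSR values, and the function names and addresses used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t gs_base;         /* IA32_GS_BASE        (0C0000101H) */
    uint64_t kernel_gs_base;  /* IA32_KERNEL_GS_BASE (0C0000102H) */
} cpu_msrs_t;

/* SWAPGS exchanges the GS base with IA32_KERNEL_GS_BASE. */
static void swapgs(cpu_msrs_t *cpu)
{
    uint64_t tmp;
    assert(cpu != NULL);
    tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* Models the start of an exception handler entered from user mode:
 * it executes SWAPGS and then expects GS to point at the KPCR.
 * Returns 1 if that expectation holds, 0 otherwise. */
static int isr_entry_sees_kpcr(cpu_msrs_t *cpu, uint64_t kpcr)
{
    swapgs(cpu);
    return cpu->gs_base == kpcr;
}
```

Starting from {GS base = TEB, kernel GS base = KPCR}, isr_entry_sees_kpcr returns 1. Starting from {TEB, TEB}, it returns 0: the handler dereferences a TEB as if it were the KPCR, and every fault raised while doing so enters the same broken path again.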


The whole process is shown in the following flow chart:



Root Cause:
     After the emulated VMExit, the IRQL is at APC level (until L1's VMExit handler raises it to DPC level), so a thread switch is possible, and other code can still change the MSR without causing an exit. Even if it does exit, we still can’t stop it.
Solution:
      Therefore, we should make sure the kernel GS base MSR stays consistent after VMExit and before VMEntry.
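
One possible reading of this solution, as a sketch only: L0 records L2's kernel GS base when it emulates the VMExit and writes it back when it emulates the VMEntry. The structure and function names below are hypothetical and the MSRs are modeled as plain fields; the actual kHypervisor code may do this differently, for example with real RDMSR/WRMSR or the VMCS MSR-store/MSR-load areas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t gs_base;         /* IA32_GS_BASE        (0C0000101H) */
    uint64_t kernel_gs_base;  /* IA32_KERNEL_GS_BASE (0C0000102H) */
} cpu_msrs_t;

/* Hypothetical per-vCPU state kept by L0 for the nested guest (L2). */
typedef struct {
    uint64_t guest_gs_base;         /* as stored in VMCS12 */
    uint64_t guest_kernel_gs_base;  /* extra shadow slot for the fix */
} nested_state_t;

/* Emulated VMExit: remember both of L2's GS-related MSR values
 * before L1's handler gets to run. */
static void l0_emulate_vmexit_fixed(cpu_msrs_t *cpu, nested_state_t *st,
                                    uint64_t l1_gs_base)
{
    assert(cpu != NULL && st != NULL);
    st->guest_gs_base = cpu->gs_base;
    st->guest_kernel_gs_base = cpu->kernel_gs_base;
    cpu->gs_base = l1_gs_base;
}

/* Emulated VMEntry: restore both values, discarding whatever a
 * context switch under L1 wrote into the kernel GS base. */
static void l0_emulate_vmentry_fixed(cpu_msrs_t *cpu, const nested_state_t *st)
{
    assert(cpu != NULL && st != NULL);
    cpu->gs_base = st->guest_gs_base;
    cpu->kernel_gs_base = st->guest_kernel_gs_base;
}
```

In this model, even if a context switch overwrites kernel_gs_base with another thread's TEB between the two calls, L2 resumes with the same GS base / kernel GS base pair it had when it exited, so a later SWAPGS in L2 finds the KPCR again.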
Source:
       https://github.com/Kelvinhack/kHypervisor
