Processor micro-architecture internals (branch prediction, branch predictor and indirect branch)

Introduction

Recently, our security research has looked at using the Performance Monitoring Unit (PMU) as a technique for monitoring function calls and control-flow integrity.

We used the following perf events and ran into an interesting problem:

Figure[1]


One of these events is almost always counted by the performance counter:
BR_MISP_EXEC.TAKEN_INDIRECT_NEAR_CALL
The interesting question is why such an instruction is almost always mispredicted. There are a couple of things we need to clarify and dive into.

Indirect Branch

jmp rax ; Indirect jmp
call  rax ; Indirect call
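
As a rough illustration of where such instructions come from, the sketch below calls a function through a pointer loaded from a table. The names (handler_fn, dispatch_table, dispatch) are hypothetical and not taken from any real dispatcher. On x86-64 a compiler will typically emit an indirect call for the last statement of dispatch(), although an optimizer may choose a different code sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type: any function taking and returning an int. */
typedef int (*handler_fn)(int);

static int handler_add_one(int x) { return x + 1; }
static int handler_double(int x)  { return x * 2; }

/* Hypothetical dispatch table, loosely in the spirit of a vtable or a
   service table: the call target is only known once the entry is read. */
static const handler_fn dispatch_table[] = { handler_add_one, handler_double };

#define TABLE_SIZE (sizeof dispatch_table / sizeof dispatch_table[0])

int dispatch(size_t index, int arg)
{
    assert(index < TABLE_SIZE);

    /* The target is loaded from memory at run time ... */
    handler_fn target = dispatch_table[index];

    /* ... so this is an indirect call (for example `call rax`). */
    return target(arg);
}
```

Calling dispatch(0, x) and dispatch(1, x) in turn makes the same call instruction jump to two different targets, which is the situation the rest of this post is about.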

Branch Target Buffer

The Branch Target Buffer (BTB) is a table inside the processor that speeds up branch decisions (taken or not taken). It is indexed by the current RIP (instruction pointer), and the value of each entry is a branch target address. The BTB's structure is shown in the following figure:




Figure[2]
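
To make the "indexed by RIP, value is the target" idea concrete, here is a minimal software model of a direct-mapped BTB. It is only a sketch under stated assumptions: the entry count, the modulo indexing, and the names (btb, btb_lookup, btb_update) are invented for illustration, and real hardware BTBs are larger, set-associative, and not documented at this level of detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 16u  /* assumed size, chosen only to keep the model small */

struct btb_entry {
    bool     valid;
    uint64_t tag;     /* address of the branch instruction (RIP) */
    uint64_t target;  /* branch target recorded for that RIP */
};

struct btb {
    struct btb_entry entries[BTB_ENTRIES];
};

/* Assumed indexing scheme: low bits of RIP select the entry. */
static unsigned btb_index(uint64_t rip)
{
    return (unsigned)(rip % BTB_ENTRIES);
}

/* Returns true and stores the predicted target if an entry exists for rip. */
bool btb_lookup(const struct btb *b, uint64_t rip, uint64_t *target)
{
    assert(b && target);
    const struct btb_entry *e = &b->entries[btb_index(rip)];
    if (e->valid && e->tag == rip) {
        *target = e->target;
        return true;
    }
    return false;
}

/* Records (or overwrites) the target for the branch at rip. */
void btb_update(struct btb *b, uint64_t rip, uint64_t target)
{
    assert(b);
    struct btb_entry *e = &b->entries[btb_index(rip)];
    e->valid  = true;
    e->tag    = rip;
    e->target = target;
}
```

A lookup on an empty table misses; after btb_update() for the same RIP, the lookup hits and returns the recorded target.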

Branch Predictor

The branch predictor uses the BTB in the following prediction process. Because an indirect branch always executes unconditionally, the branch predictor always predicts it as taken and creates the BTB entry when the instruction is first executed.

However, there is one main reason for the heavy misprediction. Although an indirect branch is unconditional, its target address remains unknown until the branch instruction actually executes, so the BTB entry can easily be wrong if the register or memory content has changed since it was recorded (for example, the Windows SSDT dispatcher, whose target is almost always different on each execution).



Figure[3]


That means that when an indirect branch is frequently executed with different target addresses (for example, a C++ vtable function call or the Windows SSDT dispatcher), the prediction can easily be wrong. This is not because the branch should not have been taken. Instead, the BTB holds the wrong (old) target, that value is compared with the new branch target address, and the mismatch counts as a misprediction (see Figure[4]).

Finally, the processor flushes the mispredicted instructions, deletes the BTB entry, and restarts the instruction. As a result, this kind of indirect branch suffers from an infinite misprediction loop. :)


Figure[4]
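
The effect can be sketched with a tiny model that predicts "same target as last time" for a single indirect branch and counts how often that guess is wrong. This is an assumption-laden simplification: count_mispredictions is a hypothetical helper, and real predictors are more elaborate than a single remembered target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts mispredictions for one indirect branch whose successive
   targets are given in targets[0..n-1], assuming the predictor only
   remembers the most recent target (a simplification). */
size_t count_mispredictions(const uint64_t *targets, size_t n)
{
    assert(targets != NULL || n == 0);

    uint64_t predicted = 0;  /* remembered target, valid if have_entry */
    int have_entry = 0;      /* no entry before the first execution */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        if (!have_entry || predicted != targets[i]) {
            /* Stale or missing entry: count it, then record the new target. */
            misses++;
            predicted = targets[i];
            have_entry = 1;
        }
    }
    return misses;
}
```

In this model a branch that always goes to the same address misses only on its first execution, while a branch that alternates between two addresses misses every time, which matches the behaviour described above for dispatcher-style calls.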
