Retpoline

vengy · February 28, 2018, 04:00:07 AM

I was wondering whether the code below is truly optimized.
I'm thinking there might be some opcode hacks that could reduce its size or improve its speed.

For indirect calls and jumps, here's the code I'm using, based on this kernel patch: https://patchwork.kernel.org/patch/10143779/

NOSPEC_JMP MACRO target:REQ
PUSH target
JMP x86_indirect_thunk
ENDM

NOSPEC_CALL MACRO target:REQ
LOCAL nospec_call_start
LOCAL nospec_call_end

JMP nospec_call_end

nospec_call_start:
PUSH target
JMP x86_indirect_thunk

nospec_call_end:
CALL nospec_call_start
ENDM

.CODE

;; This special sequence keeps the CPU from speculatively executing a mispredicted target
;; for indirect calls and jumps: speculation is trapped in the capture_speculation loop.

x86_indirect_thunk:
CALL retpoline_call_target

capture_speculation:
PAUSE
JMP capture_speculation

retpoline_call_target:
IFDEF WIN64
LEA RSP,[RSP+8]
ELSE
LEA ESP,[ESP+4]
ENDIF

RET
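
To make the control flow easier to follow, here is a rough trace of what NOSPEC_CALL RAX does with the macros and thunk above, written as a small 64-bit usage sketch. It assumes WIN64 is defined and that the file is assembled with ml64; my_handler and caller are hypothetical names used only for illustration and are not part of the code above.

```asm
; Usage sketch only. my_handler and caller are hypothetical names.
; Assumes the NOSPEC_CALL macro and x86_indirect_thunk above, with WIN64 defined.

my_handler PROC
    MOV EAX, 42                ; leaf routine, touches no stack
    RET                        ; returns to the instruction after NOSPEC_CALL
my_handler ENDP

caller PROC
    LEA RAX, my_handler        ; RAX = indirect branch target
    NOSPEC_CALL RAX            ; used in place of CALL RAX
    ; Order of execution inside the expansion:
    ;  1. JMP  nospec_call_end
    ;  2. CALL nospec_call_start      pushes the address of the RET below
    ;  3. PUSH RAX                    pushes the target
    ;  4. JMP  x86_indirect_thunk
    ;  5. CALL retpoline_call_target  pushes the address of capture_speculation
    ;  6. LEA  RSP,[RSP+8]            discards that address
    ;  7. RET                         pops the target and jumps to my_handler
    RET                        ; my_handler's RET lands here
caller ENDP
```

The return predictor guesses that the RET in step 7 goes back to capture_speculation, so any speculative execution spins in the PAUSE loop until the real target is resolved. NOSPEC_JMP RAX is steps 3 to 7 only, so nothing is left on the stack for a return.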

aw27 · February 28, 2018, 04:59:06 AM

It is interesting, particularly this part

capture_speculation:
PAUSE
JMP capture_speculation

which is never actually executed.

It could (possibly?) be adapted for speed tests within tight loops, to clear out the branch predictions.

The MASM Forum
