Retpoline

Started by vengy, February 28, 2018, 04:00:07 AM

vengy

I was wondering whether the code below is truly optimized.
I'm thinking there might be some opcode hacks that could reduce its size or improve its speed.

For indirect calls/jmps, here's the code that I'm using, based on this: https://patchwork.kernel.org/patch/10143779/

NOSPEC_JMP MACRO target:REQ
                PUSH            target
                JMP             x86_indirect_thunk
ENDM


NOSPEC_CALL MACRO target:REQ
                LOCAL           nospec_call_start
                LOCAL           nospec_call_end

                JMP             nospec_call_end

nospec_call_start:
                PUSH            target
                JMP             x86_indirect_thunk

nospec_call_end:
                CALL            nospec_call_start
ENDM


.CODE

;; This is a special sequence that keeps the CPU from speculating to a predicted target on indirect calls and jumps.

x86_indirect_thunk:
                CALL            retpoline_call_target

capture_speculation:
                PAUSE
                JMP             capture_speculation

retpoline_call_target:
                IFDEF WIN64
                LEA             RSP,[RSP+8]
                ELSE
                LEA             ESP,[ESP+4]
                ENDIF

                RET
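
For reference, a minimal usage sketch of the two macros above. It assumes a 64-bit build with WIN64 defined, and my_handler is a hypothetical procedure name used only for illustration. The comments trace the stack to show why the final RET lands on the real target while a predicted return would land in capture_speculation.

```asm
; Hypothetical usage; my_handler is an assumed name, not part of the code above.

                LEA             RAX, my_handler         ; RAX = address of the target
                NOSPEC_CALL     RAX                     ; used in place of: CALL RAX
                ; execution resumes here once my_handler returns

                LEA             RAX, my_handler
                NOSPEC_JMP      RAX                     ; used in place of: JMP RAX

; Stack contents during NOSPEC_CALL RAX (top of stack listed first):
;   CALL nospec_call_start      -> [return address after the macro]
;   PUSH RAX                    -> [target] [return address after the macro]
;   CALL retpoline_call_target  -> [capture_speculation] [target] [return address after the macro]
;   LEA  RSP,[RSP+8]            -> [target] [return address after the macro]
;   RET                         -> pops [target] and continues there
;
; The return predictor still remembers capture_speculation from the inner CALL,
; so any speculative execution of the RET spins in the PAUSE/JMP loop.
```

On entry to the target the stack holds only the return address after the macro, the same layout a plain CALL RAX would leave.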

aw27

It is interesting, particularly this part:

capture_speculation:
                PAUSE
                JMP             capture_speculation

which is never actually executed architecturally; only a mispredicted (speculative) return can land there.

It could (possibly?) be adapted for some speed tests within tight loops, to clear out the branch predictions.