Processor differences with different instructions.

Started by hutch--, February 12, 2018, 11:34:52 AM

hutch--

One of the sobering facts of writing code for x86 hardware is that the performance of any given procedure varies from one processor to another. On the Haswell I am using, I just ran some speed tests on a simple byte copy, and plain "rep movsb" is the fastest on small memory copy tasks. On very large data the SSE and AVX versions are faster but, interestingly enough, the historical hybrid of "rep movsd" and "rep movsb" is actually slower on all data sizes.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

bcopy proc

  ; rcx = src
  ; rdx = dst
  ; r8  = count

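    mov r11, rsi        ; preserve non-volatile rsi in volatile r11
    mov r10, rdi        ; preserve non-volatile rdi in volatile r10

    mov rsi, rcx        ; rsi = source address
    mov rdi, rdx        ; rdi = destination address
    mov rcx, r8         ; rcx = byte count for REP

    rep movsb           ; copy rcx bytes from [rsi] to [rdi]

    mov rsi, r11        ; restore rsi
    mov rdi, r10        ; restore rdi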

    ret

bcopy endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
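
For comparison, the "rep movsd" plus "rep movsb" hybrid mentioned above is not shown in the thread. A minimal sketch of how such a hybrid is usually written, assuming the same register interface as bcopy and a clear direction flag on entry (the name dcopy_sketch is hypothetical and this is not necessarily the code that was timed):

```
NOSTACKFRAME

dcopy_sketch proc

  ; rcx = src
  ; rdx = dst
  ; r8  = count

    mov r11, rsi        ; preserve non-volatile rsi
    mov r10, rdi        ; preserve non-volatile rdi

    mov rsi, rcx
    mov rdi, rdx

    mov rcx, r8
    shr rcx, 2          ; number of whole DWORDs
    rep movsd           ; copy 4 bytes per iteration

    mov rcx, r8
    and rcx, 3          ; remaining 0 to 3 bytes
    rep movsb           ; copy the tail byte by byte

    mov rsi, r11
    mov rdi, r10

    ret

dcopy_sketch endp

STACKFRAME
```

The extra setup for the second REP is one plausible reason a hybrid like this does not beat plain "rep movsb" on processors that optimise the byte form internally, but that needs to be measured on each machine.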

jj2007


hutch--

It's worth a try; I will have to do one at some stage. Long ago Intel made special provisions for the old REP MOVS instructions, and they have always been reasonably fast, but this stuff tends to vary from processor to processor. These are the timings I am getting with the different instructions.

bcopy 7141
mcopy 7281
xmmcopya 5375
ymmcopya 5407
Press any key to continue...
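
The source for xmmcopya is not posted in the thread. As a rough idea of what an aligned SSE copy loop can look like, here is a minimal sketch, assuming both addresses are 16-byte aligned and the count is a multiple of 16 (the name xmmcopy_sketch and these restrictions are assumptions, not the tested routine):

```
NOSTACKFRAME

xmmcopy_sketch proc

  ; rcx = src   (assumed 16-byte aligned)
  ; rdx = dst   (assumed 16-byte aligned)
  ; r8  = count (assumed a multiple of 16)

    test r8, r8
    jz done                         ; nothing to copy

  lp:
    movdqa xmm0, xmmword ptr [rcx]  ; aligned 16-byte load
    movdqa xmmword ptr [rdx], xmm0  ; aligned 16-byte store
    add rcx, 16
    add rdx, 16
    sub r8, 16
    jnz lp

  done:
    ret

xmmcopy_sketch endp

STACKFRAME
```

Only volatile registers (rcx, rdx, r8, xmm0) are used, so nothing has to be saved. A production version would normally unroll the loop and handle unaligned or odd-sized buffers.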