Author Topic: do you not approve of gotoswhy?do you instead like jumptables?  (Read 2383 times)

nidud

  • Member
  • *****
  • Posts: 1800
    • https://github.com/nidud/asmc
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #75 on: October 31, 2019, 11:42:07 PM »
or is there easier way ?

One of the big advantages of using assembler is that you can branch, or conditionally execute code, directly on flag conditions, which is hard to achieve in an HLL. This also applies to function calls, which may return a flag condition for direct branching:

    foo(ecx)
    je L0
    ja L1

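PASCAL and STDCALL preserve the flags, because the callee removes its own arguments with RET n, but C and other caller-cleanup calling conventions adjust the stack with ADD after the call returns, which overwrites them. Asmc has a command-line switch (/pf) that uses LEA, which leaves the flags untouched, for this purpose.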

However, the intrinsic functions (especially SIMD) optimize very well in modern compilers so the best approach will probably be to just write it out as you normally do and leave the optimization to the compiler.

AW

  • Member
  • *****
  • Posts: 2442
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #76 on: November 06, 2019, 01:10:57 AM »
This is a very suboptimal AVX-512 memmove based on nidud's previous AVX-64 memmove. It has been tested and works correctly, but slowly, as mentioned.
I wrote it in UASM because ASMC currently has a problem with AVX-512 instructions.

Code: [Select]
.xmm

OPTION EVEX:1
option SWITCHSTYLE : ASMSTYLE
option  win64 : 6


option casemap:none

.code

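switchAVX512_64 proc
    ; Win64 arguments: rcx = destination, rdx = source, r8 = byte count
    mov r10,rcx             ; keep destination in r10; rcx becomes scratch

    .if r8 <= 128
        .switch r8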
         .case 0
            ret

          .case 1
            mov cl,[rdx]
            mov [r10],cl
            ret

          .case 2,3,4
            mov cx,[rdx]
            mov dx,[rdx+r8-2]
            mov [r10+r8-2],dx
            mov [r10],cx
            ret

          .case 5,6,7,8
            mov ecx,[rdx]
            mov edx,[rdx+r8-4]
            mov [r10+r8-4],edx
            mov [r10],ecx
            ret

          .case 9,10,11,12,13,14,15,16
            mov rcx,[rdx]
            mov rdx,[rdx+r8-8]
            mov [r10],rcx
            mov [r10+r8-8],rdx
            ret
         .case 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
            vmovdqu xmm0,[rdx]
            vmovdqu xmm1,[rdx+r8-16]
            vmovups [r10],xmm0
            vmovups [r10+r8-16],xmm1
            ret
          .case 33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,\
                49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64
            vmovdqu ymm0,[rdx]
            vmovdqu ymm1,[rdx+r8-32]
            vmovups [r10],ymm0
            vmovups [r10+r8-32],ymm1
            ret
          .case 65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,\
                88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,\
                110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128
            vmovdqu8 zmm0,[rdx]
            vmovdqu8 zmm1,[rdx+r8-64]
            vmovups [r10],zmm0
            vmovups [r10+r8-64],zmm1
            ret
        .endswitch
    .endif
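    ; More than 128 bytes: load the first and last 128 bytes up front so
    ; they can be stored last, after any overlapping data has been read.
    vmovdqu8 zmm2,[rdx]
    vmovdqu8 zmm3,[rdx+64]
    vmovdqu8 zmm4,[rdx+r8-64]
    vmovdqu8 zmm5,[rdx+r8-128]
    .if r8 > 256
        mov ecx,r10d
        neg ecx
        and ecx,128-1       ; rcx = bytes up to the next 128-byte boundary of the destination
        add rdx,rcx         ; advance the source by the same amount
        mov r9,r8
        sub r9,rcx
        add rcx,r10         ; rcx = aligned destination
        and r9b,-128        ; r9 = remaining count rounded down to a multiple of 128
        .if rcx > rdx       ; destination above source: copy from the top down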
            .while 1
                sub r9,128
                vmovdqu8 zmm0,[rdx+r9]
                vmovdqu8 zmm1,[rdx+r9+64]
                vmovdqu8 [rcx+r9],zmm0
                vmovdqu8 [rcx+r9+64],zmm1
                .if ZERO?
                    .break
                .endif
            .endw
            vmovdqu8 [r10],zmm2
            vmovdqu8 [r10+64],zmm3
            vmovdqu8 [r10+r8-64],zmm4
            vmovdqu8 [r10+r8-128],zmm5
            ret
            ;db 13 dup(0x90)
        .endif

        lea rcx,[rcx+r9]
        lea rdx,[rdx+r9]
        neg r9
        .while 1
            vmovdqu8 zmm0,[rdx+r9]
            vmovdqu8 zmm1,[rdx+r9+64]
            vmovdqu8 [rcx+r9],zmm0
            vmovdqu8 [rcx+r9+64],zmm1
            add r9,128
            .if ZERO?
                .break
            .endif
        .endw
    .endif
    vmovdqu8 [r10],zmm2
    vmovdqu8 [r10+64],zmm3
    vmovdqu8 [r10+r8-64],zmm4
    vmovdqu8 [r10+r8-128],zmm5
    ret
switchAVX512_64 endp

end

I attach the test program (no source code at this time). The program runs only the tests that are compatible with the computer: machines without AVX run 5 tests, machines with AVX run 7, and machines with AVX-512 run 8.

There is a lot to say, but only a few notes for now:
- I added a rep movsb variant modified to support overlapping buffers. The modification makes it uncompetitive against the others.
- The AVX memmoves from nidud are normally faster.
- Agner Fog's memmove supports AVX and AVX-512 and has the advantage of falling back when the system has no AVX or above.
- memmove() has decent performance and does not use AVX, only SSE.
- This test also includes apex-memmove, written in C, which claims to be "the fastest memcpy/memmove on x86/x64 ... EVER":
https://www.codeproject.com/Articles/1110153/Apex-memmove-the-fastest-memcpy-memmove-on-x-x-EVE
It does not shine and does not live up to its claims.

I also attach the results of the test performed on the Xeon with AVX-512. The times are slow due to the conditions under which the test was performed, so what is meaningful is how the various tests stack up against one another.



daydreamer

  • Member
  • ****
  • Posts: 943
  • watch Chebyshev on the backside of the Moon
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #77 on: November 06, 2019, 08:00:06 AM »
thanks nidud and AW :thumbsup:

A good question is how memmove speed holds up on laptops with shared memory while pixel shaders are running on the GPU at the same time. Is there a terrible slowdown compared to a desktop with separate system RAM and VRAM? On the other hand, the system RAM -> VRAM direction is streamlined. Will a screen-capture program suffer from the usual very slow VRAM reads on that kind of system, or is reading screen memory hardly noticeable with shared memory?

Code: [Select]
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz

(AVX-512 not supported, will not run the AVX-512 test.)
Testing: rep movsb, memmove(), switch 32 SSE, Agner Fog, Apexmemmove, switch 32 AVX, switch AVX64

Filling random data source array for all experiments. Number of elements: 100000000
*** data size to move: 1 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00551 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00107 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00228 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00299 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00109 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0011 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00108 miliseconds

        Sorted:
        memmove()       Average elapsed time: 0.00107 miliseconds
        switch AVX64    Average elapsed time: 0.00108 miliseconds
        Apexmemmove     Average elapsed time: 0.00109 miliseconds
        switch 32 AVX   Average elapsed time: 0.0011 miliseconds
        switch 32 SSE   Average elapsed time: 0.00228 miliseconds
        Agner Fog       Average elapsed time: 0.00299 miliseconds
        rep movsb       Average elapsed time: 0.00551 miliseconds

*** data size to move: 7 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00328 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00162 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00109 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00138 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00125 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00111 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00185 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00109 miliseconds
        switch 32 AVX   Average elapsed time: 0.00111 miliseconds
        Apexmemmove     Average elapsed time: 0.00125 miliseconds
        Agner Fog       Average elapsed time: 0.00138 miliseconds
        memmove()       Average elapsed time: 0.00162 miliseconds
        switch AVX64    Average elapsed time: 0.00185 miliseconds
        rep movsb       Average elapsed time: 0.00328 miliseconds

*** data size to move: 16 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00365 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00179 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00152 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00124 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00157 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00138 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0018 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.00124 miliseconds
        switch 32 AVX   Average elapsed time: 0.00138 miliseconds
        switch 32 SSE   Average elapsed time: 0.00152 miliseconds
        Apexmemmove     Average elapsed time: 0.00157 miliseconds
        memmove()       Average elapsed time: 0.00179 miliseconds
        switch AVX64    Average elapsed time: 0.0018 miliseconds
        rep movsb       Average elapsed time: 0.00365 miliseconds

*** data size to move: 46 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00385 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0016 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00136 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00143 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00167 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00169 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00119 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00119 miliseconds
        switch 32 SSE   Average elapsed time: 0.00136 miliseconds
        Agner Fog       Average elapsed time: 0.00143 miliseconds
        memmove()       Average elapsed time: 0.0016 miliseconds
        Apexmemmove     Average elapsed time: 0.00167 miliseconds
        switch 32 AVX   Average elapsed time: 0.00169 miliseconds
        rep movsb       Average elapsed time: 0.00385 miliseconds

*** data size to move: 128 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00424 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0022 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00141 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00196 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00162 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00128 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00122 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00122 miliseconds
        switch 32 AVX   Average elapsed time: 0.00128 miliseconds
        switch 32 SSE   Average elapsed time: 0.00141 miliseconds
        Apexmemmove     Average elapsed time: 0.00162 miliseconds
        Agner Fog       Average elapsed time: 0.00196 miliseconds
        memmove()       Average elapsed time: 0.0022 miliseconds
        rep movsb       Average elapsed time: 0.00424 miliseconds

*** data size to move: 511 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.01241 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0031 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0022 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00248 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00201 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00204 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00219 miliseconds

        Sorted:
        Apexmemmove     Average elapsed time: 0.00201 miliseconds
        switch 32 AVX   Average elapsed time: 0.00204 miliseconds
        switch AVX64    Average elapsed time: 0.00219 miliseconds
        switch 32 SSE   Average elapsed time: 0.0022 miliseconds
        Agner Fog       Average elapsed time: 0.00248 miliseconds
        memmove()       Average elapsed time: 0.0031 miliseconds
        rep movsb       Average elapsed time: 0.01241 miliseconds

*** data size to move: 4192 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.04663 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00591 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00895 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00887 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00815 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00583 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00466 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00466 miliseconds
        switch 32 AVX   Average elapsed time: 0.00583 miliseconds
        memmove()       Average elapsed time: 0.00591 miliseconds
        Apexmemmove     Average elapsed time: 0.00815 miliseconds
        Agner Fog       Average elapsed time: 0.00887 miliseconds
        switch 32 SSE   Average elapsed time: 0.00895 miliseconds
        rep movsb       Average elapsed time: 0.04663 miliseconds

*** data size to move: 8100 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.08604 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.01023 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.01185 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00777 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.01502 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.01117 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00869 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.00777 miliseconds
        switch AVX64    Average elapsed time: 0.00869 miliseconds
        memmove()       Average elapsed time: 0.01023 miliseconds
        switch 32 AVX   Average elapsed time: 0.01117 miliseconds
        switch 32 SSE   Average elapsed time: 0.01185 miliseconds
        Apexmemmove     Average elapsed time: 0.01502 miliseconds
        rep movsb       Average elapsed time: 0.08604 miliseconds

*** data size to move: 15000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.15626 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.01744 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.02019 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.01225 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.02498 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.01553 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0149 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.01225 miliseconds
        switch AVX64    Average elapsed time: 0.0149 miliseconds
        switch 32 AVX   Average elapsed time: 0.01553 miliseconds
        memmove()       Average elapsed time: 0.01744 miliseconds
        switch 32 SSE   Average elapsed time: 0.02019 miliseconds
        Apexmemmove     Average elapsed time: 0.02498 miliseconds
        rep movsb       Average elapsed time: 0.15626 miliseconds

*** data size to move: 65000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.7477 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.092 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.10118 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.06766 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.12553 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.07437 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.06671 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.06671 miliseconds
        Agner Fog       Average elapsed time: 0.06766 miliseconds
        switch 32 AVX   Average elapsed time: 0.07437 miliseconds
        memmove()       Average elapsed time: 0.092 miliseconds
        switch 32 SSE   Average elapsed time: 0.10118 miliseconds
        Apexmemmove     Average elapsed time: 0.12553 miliseconds
        rep movsb       Average elapsed time: 0.7477 miliseconds

*** data size to move: 127000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 1.40537 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.19132 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.22892 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.15608 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.25693 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.17683 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.16972 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.15608 miliseconds
        switch AVX64    Average elapsed time: 0.16972 miliseconds
        switch 32 AVX   Average elapsed time: 0.17683 miliseconds
        memmove()       Average elapsed time: 0.19132 miliseconds
        switch 32 SSE   Average elapsed time: 0.22892 miliseconds
        Apexmemmove     Average elapsed time: 0.25693 miliseconds
        rep movsb       Average elapsed time: 1.40537 miliseconds

*** data size to move: 255000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 2.86089 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.43914 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.47659 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.40101 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.52976 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.41554 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.4243 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.40101 miliseconds
        switch 32 AVX   Average elapsed time: 0.41554 miliseconds
        switch AVX64    Average elapsed time: 0.4243 miliseconds
        memmove()       Average elapsed time: 0.43914 miliseconds
        switch 32 SSE   Average elapsed time: 0.47659 miliseconds
        Apexmemmove     Average elapsed time: 0.52976 miliseconds
        rep movsb       Average elapsed time: 2.86089 miliseconds

*** data size to move: 10000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 133.026 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 52.5734 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 54.9919 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 67.5378 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 61.6622 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 54.2498 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 54.7909 miliseconds

        Sorted:
        memmove()       Average elapsed time: 52.5734 miliseconds
        switch 32 AVX   Average elapsed time: 54.2498 miliseconds
        switch AVX64    Average elapsed time: 54.7909 miliseconds
        switch 32 SSE   Average elapsed time: 54.9919 miliseconds
        Apexmemmove     Average elapsed time: 61.6622 miliseconds
        Agner Fog       Average elapsed time: 67.5378 miliseconds
        rep movsb       Average elapsed time: 133.026 miliseconds

*** data size to move: 50000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 680.076 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 311.9 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 317.202 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 368.016 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 326.971 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 319.564 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 323.725 miliseconds

        Sorted:
        memmove()       Average elapsed time: 311.9 miliseconds
        switch 32 SSE   Average elapsed time: 317.202 miliseconds
        switch 32 AVX   Average elapsed time: 319.564 miliseconds
        switch AVX64    Average elapsed time: 323.725 miliseconds
        Apexmemmove     Average elapsed time: 326.971 miliseconds
        Agner Fog       Average elapsed time: 368.016 miliseconds
        rep movsb       Average elapsed time: 680.076 miliseconds

Test completed. Press any key to exit...

AW

  • Member
  • *****
  • Posts: 2442
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #78 on: November 06, 2019, 04:31:14 PM »
Thank you for testing, daydreamer. The theoretical aspects of your question will have to be answered by Agner Fog or some other expert, but your results are consistent with others I have seen. In particular, the small functions first implemented by nidud tend to perform very well. Even the suboptimal AVX-512 version performs better than the KNNSpeed AVX memmove, as I found yesterday. This is interesting: all the run-time effort those routines spend selecting the best approach for each particular case wastes a lot of CPU cycles, and even a suboptimal, straightforward function outperforms them (at least under these test conditions).

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 6768
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #79 on: November 06, 2019, 04:55:36 PM »
The speed test, Haswell E/EP

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

(AVX-512 not supported, will not run the AVX-512 test.)
Testing: rep movsb, memmove(), switch 32 SSE, Agner Fog, Apexmemmove, switch 32 AVX, switch AVX64

Filling random data source array for all experiments. Number of elements: 100000000
*** data size to move: 1 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.016353 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0023583 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00496485 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.009216 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00257551 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00238933 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00232727 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00232727 miliseconds
        memmove()       Average elapsed time: 0.0023583 miliseconds
        switch 32 AVX   Average elapsed time: 0.00238933 miliseconds
        Apexmemmove     Average elapsed time: 0.00257551 miliseconds
        switch 32 SSE   Average elapsed time: 0.00496485 miliseconds
        Agner Fog       Average elapsed time: 0.009216 miliseconds
        rep movsb       Average elapsed time: 0.016353 miliseconds

*** data size to move: 7 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00958836 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.002048 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00189285 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00266861 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00325818 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00211006 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00462351 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00189285 miliseconds
        memmove()       Average elapsed time: 0.002048 miliseconds
        switch 32 AVX   Average elapsed time: 0.00211006 miliseconds
        Agner Fog       Average elapsed time: 0.00266861 miliseconds
        Apexmemmove     Average elapsed time: 0.00325818 miliseconds
        switch AVX64    Average elapsed time: 0.00462351 miliseconds
        rep movsb       Average elapsed time: 0.00958836 miliseconds

*** data size to move: 16 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00651636 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00198594 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00195491 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00229624 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00183079 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00238933 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00229624 miliseconds

        Sorted:
        Apexmemmove     Average elapsed time: 0.00183079 miliseconds
        switch 32 SSE   Average elapsed time: 0.00195491 miliseconds
        memmove()       Average elapsed time: 0.00198594 miliseconds
        Agner Fog       Average elapsed time: 0.00229624 miliseconds
        switch AVX64    Average elapsed time: 0.00229624 miliseconds
        switch 32 AVX   Average elapsed time: 0.00238933 miliseconds
        rep movsb       Average elapsed time: 0.00651636 miliseconds

*** data size to move: 46 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00859539 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00297891 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00223418 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00251345 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00248242 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0068577 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00738521 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00223418 miliseconds
        Apexmemmove     Average elapsed time: 0.00248242 miliseconds
        Agner Fog       Average elapsed time: 0.00251345 miliseconds
        memmove()       Average elapsed time: 0.00297891 miliseconds
        switch 32 AVX   Average elapsed time: 0.0068577 miliseconds
        switch AVX64    Average elapsed time: 0.00738521 miliseconds
        rep movsb       Average elapsed time: 0.00859539 miliseconds

*** data size to move: 128 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.0106124 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00282376 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00279273 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00415806 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00319612 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00555442 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00521309 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00279273 miliseconds
        memmove()       Average elapsed time: 0.00282376 miliseconds
        Apexmemmove     Average elapsed time: 0.00319612 miliseconds
        Agner Fog       Average elapsed time: 0.00415806 miliseconds
        switch AVX64    Average elapsed time: 0.00521309 miliseconds
        switch 32 AVX   Average elapsed time: 0.00555442 miliseconds
        rep movsb       Average elapsed time: 0.0106124 miliseconds

*** data size to move: 511 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.0198904 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00577164 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00536824 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00422012 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00375467 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00775757 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00760242 miliseconds

        Sorted:
        Apexmemmove     Average elapsed time: 0.00375467 miliseconds
        Agner Fog       Average elapsed time: 0.00422012 miliseconds
        switch 32 SSE   Average elapsed time: 0.00536824 miliseconds
        memmove()       Average elapsed time: 0.00577164 miliseconds
        switch AVX64    Average elapsed time: 0.00760242 miliseconds
        switch 32 AVX   Average elapsed time: 0.00775757 miliseconds
        rep movsb       Average elapsed time: 0.0198904 miliseconds

*** data size to move: 4192 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.118163 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0144911 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0160737 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.0123811 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.019487 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.01536 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0143981 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.0123811 miliseconds
        switch AVX64    Average elapsed time: 0.0143981 miliseconds
        memmove()       Average elapsed time: 0.0144911 miliseconds
        switch 32 AVX   Average elapsed time: 0.01536 miliseconds
        switch 32 SSE   Average elapsed time: 0.0160737 miliseconds
        Apexmemmove     Average elapsed time: 0.019487 miliseconds
        rep movsb       Average elapsed time: 0.118163 miliseconds

*** data size to move: 8100 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.227297 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00965042 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0110158 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00670254 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.0343505 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0088126 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00822303 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.00670254 miliseconds
        switch AVX64    Average elapsed time: 0.00822303 miliseconds
        switch 32 AVX   Average elapsed time: 0.0088126 miliseconds
        memmove()       Average elapsed time: 0.00965042 miliseconds
        switch 32 SSE   Average elapsed time: 0.0110158 miliseconds
        Apexmemmove     Average elapsed time: 0.0343505 miliseconds
        rep movsb       Average elapsed time: 0.227297 miliseconds

*** data size to move: 15000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.146711 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.024576 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0202938 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.0140878 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.0307821 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0145842 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0138705 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.0138705 miliseconds
        Agner Fog       Average elapsed time: 0.0140878 miliseconds
        switch 32 AVX   Average elapsed time: 0.0145842 miliseconds
        switch 32 SSE   Average elapsed time: 0.0202938 miliseconds
        memmove()       Average elapsed time: 0.024576 miliseconds
        Apexmemmove     Average elapsed time: 0.0307821 miliseconds
        rep movsb       Average elapsed time: 0.146711 miliseconds

*** data size to move: 65000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.665631 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.109785 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.117698 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.101779 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.126014 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0988315 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.100073 miliseconds

        Sorted:
        switch 32 AVX   Average elapsed time: 0.0988315 miliseconds
        switch AVX64    Average elapsed time: 0.100073 miliseconds
        Agner Fog       Average elapsed time: 0.101779 miliseconds
        memmove()       Average elapsed time: 0.109785 miliseconds
        switch 32 SSE   Average elapsed time: 0.117698 miliseconds
        Apexmemmove     Average elapsed time: 0.126014 miliseconds
        rep movsb       Average elapsed time: 0.665631 miliseconds

*** data size to move: 127000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 1.30126 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.241695 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.251594 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.234093 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.271174 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.233658 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.233317 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.233317 miliseconds
        switch 32 AVX   Average elapsed time: 0.233658 miliseconds
        Agner Fog       Average elapsed time: 0.234093 miliseconds
        memmove()       Average elapsed time: 0.241695 miliseconds
        switch 32 SSE   Average elapsed time: 0.251594 miliseconds
        Apexmemmove     Average elapsed time: 0.271174 miliseconds
        rep movsb       Average elapsed time: 1.30126 miliseconds

*** data size to move: 255000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 2.65002 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.605091 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.629046 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.587093 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.649309 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.587745 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.587248 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.587093 miliseconds
        switch AVX64    Average elapsed time: 0.587248 miliseconds
        switch 32 AVX   Average elapsed time: 0.587745 miliseconds
        memmove()       Average elapsed time: 0.605091 miliseconds
        switch 32 SSE   Average elapsed time: 0.629046 miliseconds
        Apexmemmove     Average elapsed time: 0.649309 miliseconds
        rep movsb       Average elapsed time: 2.65002 miliseconds

*** data size to move: 10000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 109.512 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 36.0363 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 36.9078 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 82.6607 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 59.343 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 39.8034 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 37.8208 miliseconds

        Sorted:
        memmove()       Average elapsed time: 36.0363 miliseconds
        switch 32 SSE   Average elapsed time: 36.9078 miliseconds
        switch AVX64    Average elapsed time: 37.8208 miliseconds
        switch 32 AVX   Average elapsed time: 39.8034 miliseconds
        Apexmemmove     Average elapsed time: 59.343 miliseconds
        Agner Fog       Average elapsed time: 82.6607 miliseconds
        rep movsb       Average elapsed time: 109.512 miliseconds

*** data size to move: 50000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 568.017 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 254.294 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 254.692 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 440.9 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 296.629 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 266.094 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 264.82 miliseconds

        Sorted:
        memmove()       Average elapsed time: 254.294 miliseconds
        switch 32 SSE   Average elapsed time: 254.692 miliseconds
        switch AVX64    Average elapsed time: 264.82 miliseconds
        switch 32 AVX   Average elapsed time: 266.094 miliseconds
        Apexmemmove     Average elapsed time: 296.629 miliseconds
        Agner Fog       Average elapsed time: 440.9 miliseconds
        rep movsb       Average elapsed time: 568.017 miliseconds

Test completed. Press any key to exit...
hutch at movsd dot com
http://www.masm32.com

AW

  • Member
  • *****
  • Posts: 2442
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #80 on: November 07, 2019, 06:31:43 AM »
For people who program only in MASM, other assemblers, or non-Microsoft HLLs, all 16 XMM/YMM registers (or all 32 ZMM registers) can be used for data moves. Only programs built with MSVC need to preserve, in 64-bit, XMM6-XMM15 (that is, the low 128 bits of YMM6-YMM15 or ZMM6-ZMM15). In 32-bit there is no need to preserve anything, but there are only 8 XMM/YMM registers.
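
As a rough illustration of the 64-bit case, here is a minimal sketch of a routine that borrows XMM6 and XMM7 and restores them before returning. The name copy128 and the fixed 128-byte size are made up for this example and do not come from the code posted earlier in the thread. It assumes the Windows x64 convention (rcx = destination, rdx = source) and plain MASM/UASM syntax with no automatic prologue generation.

```asm
.code

; Hypothetical example: copy 128 bytes from [rdx] to [rcx] using XMM0-XMM7.
; XMM0-XMM5 are volatile. XMM6 and XMM7 are saved and restored because an
; MSVC-compiled caller may be keeping live values in them.
copy128 proc
    sub     rsp, 40                 ; room for two 16-byte saves; rsp becomes 16-byte aligned
    movups  [rsp], xmm6             ; save the nonvolatile registers we are about to use
    movups  [rsp+16], xmm7

    movups  xmm0, [rdx]             ; load all 128 bytes before storing anything,
    movups  xmm1, [rdx+16]          ; so overlapping source and destination are safe
    movups  xmm2, [rdx+32]
    movups  xmm3, [rdx+48]
    movups  xmm4, [rdx+64]
    movups  xmm5, [rdx+80]
    movups  xmm6, [rdx+96]
    movups  xmm7, [rdx+112]

    movups  [rcx], xmm0             ; store the 128 bytes to the destination
    movups  [rcx+16], xmm1
    movups  [rcx+32], xmm2
    movups  [rcx+48], xmm3
    movups  [rcx+64], xmm4
    movups  [rcx+80], xmm5
    movups  [rcx+96], xmm6
    movups  [rcx+112], xmm7

    movups  xmm6, [rsp]             ; restore the caller's XMM6 and XMM7
    movups  xmm7, [rsp+16]
    add     rsp, 40
    mov     rax, rcx                ; return the destination pointer, as memmove does
    ret
copy128 endp

end
```

If the routine is only ever called from your own assembly code, the save and restore can be dropped. A routine meant to be called from MSVC-built code would normally also be declared with PROC FRAME and the matching unwind directives (.allocstack, .savexmm128, .endprolog), which are left out here to keep the sketch short.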