Author Topic: do you not approve of gotoswhy?do you instead like jumptables?  (Read 21336 times)

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #75 on: October 31, 2019, 11:42:07 PM »
deleted
« Last Edit: February 26, 2022, 03:21:28 AM by nidud »

aw27

  • Guest
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #76 on: November 06, 2019, 01:10:57 AM »
This is a very suboptimal AVX-512 memmove based on nidud's previous AVX-64 memmove. It has been tested and works correctly, but it is slow, as mentioned.
I wrote it in UASM because ASMC currently has a problem with AVX-512 instructions.

Code: [Select]
.xmm

OPTION EVEX:1
option SWITCHSTYLE : ASMSTYLE
option  win64 : 6


option casemap:none

.code

switchAVX512_64 proc
    ; Win64 register arguments: rcx = destination, rdx = source, r8 = byte count
    mov r10,rcx

    .if r8 <= 128
        ; Small sizes: load the first and the last chunk, then store both.
        ; Every load happens before any store, so overlapping buffers are safe.
        .switch r8
          .case 0
            ret

          .case 1
            mov cl,[rdx]
            mov [r10],cl
            ret

          .case 2,3,4
            mov cx,[rdx]
            mov dx,[rdx+r8-2]
            mov [r10+r8-2],dx
            mov [r10],cx
            ret

          .case 5,6,7,8
            mov ecx,[rdx]
            mov edx,[rdx+r8-4]
            mov [r10+r8-4],edx
            mov [r10],ecx
            ret

          .case 9,10,11,12,13,14,15,16
            mov rcx,[rdx]
            mov rdx,[rdx+r8-8]
            mov [r10],rcx
            mov [r10+r8-8],rdx
            ret

          .case 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
            vmovdqu xmm0,[rdx]
            vmovdqu xmm1,[rdx+r8-16]
            vmovups [r10],xmm0
            vmovups [r10+r8-16],xmm1
            ret

          .case 33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,\
                49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64
            vmovdqu ymm0,[rdx]
            vmovdqu ymm1,[rdx+r8-32]
            vmovups [r10],ymm0
            vmovups [r10+r8-32],ymm1
            ret

          .case 65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,\
                88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,\
                110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128
            vmovdqu8 zmm0,[rdx]
            vmovdqu8 zmm1,[rdx+r8-64]
            vmovups [r10],zmm0
            vmovups [r10+r8-64],zmm1
            ret
        .endswitch
    .endif

    ; More than 128 bytes: preload the first and the last 128 bytes of the source
    vmovdqu8 zmm2,[rdx]
    vmovdqu8 zmm3,[rdx+64]
    vmovdqu8 zmm4,[rdx+r8-64]
    vmovdqu8 zmm5,[rdx+r8-128]
    .if r8 > 256
        ; rcx = bytes needed to bring the destination up to a 128-byte boundary
        mov ecx,r10d
        neg ecx
        and ecx,128-1
        add rdx,rcx             ; advance the source by the same amount
        mov r9,r8
        sub r9,rcx
        add rcx,r10             ; rcx = aligned destination
        and r9b,-128            ; r9 = remaining count rounded down to a multiple of 128
        .if rcx > rdx
            ; Destination above source: copy 128-byte blocks from the end downwards
            .while 1
                sub r9,128
                vmovdqu8 zmm0,[rdx+r9]
                vmovdqu8 zmm1,[rdx+r9+64]
                vmovdqu8 [rcx+r9],zmm0
                vmovdqu8 [rcx+r9+64],zmm1
                .if ZERO?
                    .break
                .endif
            .endw
            vmovdqu8 [r10],zmm2
            vmovdqu8 [r10+64],zmm3
            vmovdqu8 [r10+r8-64],zmm4
            vmovdqu8 [r10+r8-128],zmm5
            ret
            ;db 13 dup(0x90)
        .endif

        ; Otherwise copy 128-byte blocks upwards, with r9 counting from -count to 0
        lea rcx,[rcx+r9]
        lea rdx,[rdx+r9]
        neg r9
        .while 1
            vmovdqu8 zmm0,[rdx+r9]
            vmovdqu8 zmm1,[rdx+r9+64]
            vmovdqu8 [rcx+r9],zmm0
            vmovdqu8 [rcx+r9+64],zmm1
            add r9,128
            .if ZERO?
                .break
            .endif
        .endw
    .endif
    ; Store the preloaded head and tail; they cover the unaligned edges
    vmovdqu8 [r10],zmm2
    vmovdqu8 [r10+64],zmm3
    vmovdqu8 [r10+r8-64],zmm4
    vmovdqu8 [r10+r8-128],zmm5
    ret
switchAVX512_64 endp

end
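
For readers who do not follow the .switch syntax, the idea behind the small-size cases can be sketched in C. This is only an illustration, not a translation of the routine: small_move is a hypothetical name, the sketch stops at 16 bytes, and memcpy into fixed-width integers stands in for the unaligned register loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the routine above.
   Moves n bytes (0 <= n <= 16) with at most two loads and two stores:
   one chunk anchored at the start of the range and one at its end.
   The two chunks may overlap each other; both loads are done before
   either store, so overlapping source and destination are handled. */
static void small_move(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n <= 16);
    if (n == 0)
        return;
    if (n == 1) {
        d[0] = s[0];
        return;
    }
    if (n <= 4) {                       /* 2..4 bytes: two 2-byte chunks */
        uint16_t head, tail;
        memcpy(&head, s, 2);
        memcpy(&tail, s + n - 2, 2);
        memcpy(d, &head, 2);
        memcpy(d + n - 2, &tail, 2);
    } else if (n <= 8) {                /* 5..8 bytes: two 4-byte chunks */
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else {                            /* 9..16 bytes: two 8-byte chunks */
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    }
}
```

The same pattern extends to 16, 32 and 64-byte chunks, which is what the xmm, ymm and zmm cases do.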

I attach the test program (no source code at this time). The program runs only the tests that are compatible with the computer: machines without AVX run 5 tests, machines with AVX run 7 tests, and machines with AVX-512 run 8 tests.

There is a lot to say, but only a few notes for now:
- I added a rep movsb version modified to support overlapping buffers. The modification makes it uncompetitive with the others.
- The AVX memmoves from nidud are normally faster.
- Agner Fog's memmove supports AVX and AVX-512 and has the advantage of falling back when the system has no AVX or above.
- memmove() has decent performance and does not use AVX, only SSE.
- This test also includes apex-memmove, written in C, which claims to be the fastest memcpy/memmove on x86/x64 .. EVER.
https://www.codeproject.com/Articles/1110153/Apex-memmove-the-fastest-memcpy-memmove-on-x-x-EVE
It does not shine and does not live up to its claims.
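
The modified rep movsb is not shown here. A common way to make a simple copy loop safe for overlapping buffers is to choose the copy direction from the relative position of the two buffers. A minimal C sketch of that idea, assuming a plain byte loop in place of rep movsb (move_bytes is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: overlap-safe byte copy.
   If the destination starts inside the source range, a forward copy
   would overwrite source bytes before they are read, so copy backwards. */
static void move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Unsigned difference: wraps to a huge value when d is below s,
       so this single test is true unless s <= d < s + n. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)      /* forward copy */
            d[i] = s[i];
    } else {
        for (size_t i = n; i-- > 0; )       /* backward copy */
            d[i] = s[i];
    }
}
```

The extra compare and branch are cheap next to the copy itself, so any slowdown in an assembly version is more likely to come from how the backward direction is implemented than from the direction test.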

I also attach the results of the test performed on the Xeon with AVX-512. The times are slow because of the conditions under which the test was run, so what is meaningful is how the various tests stack up against one another.



daydreamer

  • Member
  • *****
  • Posts: 2397
  • my kind of REAL10 Blonde
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #77 on: November 06, 2019, 08:00:06 AM »
Thanks, nidud and AW :thumbsup:

Good question: how does memmove speed hold up on laptops with shared memory while pixel shaders are running on the GPU at the same time? Is there a terrible slowdown compared to a desktop with separate system RAM and VRAM? On the other hand, with the system RAM -> VRAM direction streamlined, will a screen capture program still suffer from the usual very slow VRAM reads on that kind of system, or will reading screen memory from shared memory hardly be noticeable?

Code: [Select]
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz

(AVX-512 not supported, will not run the AVX-512 test.)
Testing: rep movsb, memmove(), switch 32 SSE, Agner Fog, Apexmemmove, switch 32 AVX, switch AVX64

Filling random data source array for all experiments. Number of elements: 100000000
*** data size to move: 1 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00551 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00107 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00228 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00299 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00109 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0011 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00108 miliseconds

        Sorted:
        memmove()       Average elapsed time: 0.00107 miliseconds
        switch AVX64    Average elapsed time: 0.00108 miliseconds
        Apexmemmove     Average elapsed time: 0.00109 miliseconds
        switch 32 AVX   Average elapsed time: 0.0011 miliseconds
        switch 32 SSE   Average elapsed time: 0.00228 miliseconds
        Agner Fog       Average elapsed time: 0.00299 miliseconds
        rep movsb       Average elapsed time: 0.00551 miliseconds

*** data size to move: 7 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00328 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00162 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00109 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00138 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00125 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00111 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00185 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00109 miliseconds
        switch 32 AVX   Average elapsed time: 0.00111 miliseconds
        Apexmemmove     Average elapsed time: 0.00125 miliseconds
        Agner Fog       Average elapsed time: 0.00138 miliseconds
        memmove()       Average elapsed time: 0.00162 miliseconds
        switch AVX64    Average elapsed time: 0.00185 miliseconds
        rep movsb       Average elapsed time: 0.00328 miliseconds

*** data size to move: 16 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00365 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00179 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00152 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00124 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00157 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00138 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0018 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.00124 miliseconds
        switch 32 AVX   Average elapsed time: 0.00138 miliseconds
        switch 32 SSE   Average elapsed time: 0.00152 miliseconds
        Apexmemmove     Average elapsed time: 0.00157 miliseconds
        memmove()       Average elapsed time: 0.00179 miliseconds
        switch AVX64    Average elapsed time: 0.0018 miliseconds
        rep movsb       Average elapsed time: 0.00365 miliseconds

*** data size to move: 46 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00385 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0016 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00136 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00143 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00167 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00169 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00119 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00119 miliseconds
        switch 32 SSE   Average elapsed time: 0.00136 miliseconds
        Agner Fog       Average elapsed time: 0.00143 miliseconds
        memmove()       Average elapsed time: 0.0016 miliseconds
        Apexmemmove     Average elapsed time: 0.00167 miliseconds
        switch 32 AVX   Average elapsed time: 0.00169 miliseconds
        rep movsb       Average elapsed time: 0.00385 miliseconds

*** data size to move: 128 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00424 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0022 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00141 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00196 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00162 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00128 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00122 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00122 miliseconds
        switch 32 AVX   Average elapsed time: 0.00128 miliseconds
        switch 32 SSE   Average elapsed time: 0.00141 miliseconds
        Apexmemmove     Average elapsed time: 0.00162 miliseconds
        Agner Fog       Average elapsed time: 0.00196 miliseconds
        memmove()       Average elapsed time: 0.0022 miliseconds
        rep movsb       Average elapsed time: 0.00424 miliseconds

*** data size to move: 511 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.01241 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0031 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0022 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00248 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00201 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00204 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00219 miliseconds

        Sorted:
        Apexmemmove     Average elapsed time: 0.00201 miliseconds
        switch 32 AVX   Average elapsed time: 0.00204 miliseconds
        switch AVX64    Average elapsed time: 0.00219 miliseconds
        switch 32 SSE   Average elapsed time: 0.0022 miliseconds
        Agner Fog       Average elapsed time: 0.00248 miliseconds
        memmove()       Average elapsed time: 0.0031 miliseconds
        rep movsb       Average elapsed time: 0.01241 miliseconds

*** data size to move: 4192 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.04663 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00591 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00895 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00887 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00815 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00583 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00466 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00466 miliseconds
        switch 32 AVX   Average elapsed time: 0.00583 miliseconds
        memmove()       Average elapsed time: 0.00591 miliseconds
        Apexmemmove     Average elapsed time: 0.00815 miliseconds
        Agner Fog       Average elapsed time: 0.00887 miliseconds
        switch 32 SSE   Average elapsed time: 0.00895 miliseconds
        rep movsb       Average elapsed time: 0.04663 miliseconds

*** data size to move: 8100 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.08604 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.01023 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.01185 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00777 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.01502 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.01117 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00869 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.00777 miliseconds
        switch AVX64    Average elapsed time: 0.00869 miliseconds
        memmove()       Average elapsed time: 0.01023 miliseconds
        switch 32 AVX   Average elapsed time: 0.01117 miliseconds
        switch 32 SSE   Average elapsed time: 0.01185 miliseconds
        Apexmemmove     Average elapsed time: 0.01502 miliseconds
        rep movsb       Average elapsed time: 0.08604 miliseconds

*** data size to move: 15000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.15626 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.01744 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.02019 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.01225 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.02498 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.01553 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0149 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.01225 miliseconds
        switch AVX64    Average elapsed time: 0.0149 miliseconds
        switch 32 AVX   Average elapsed time: 0.01553 miliseconds
        memmove()       Average elapsed time: 0.01744 miliseconds
        switch 32 SSE   Average elapsed time: 0.02019 miliseconds
        Apexmemmove     Average elapsed time: 0.02498 miliseconds
        rep movsb       Average elapsed time: 0.15626 miliseconds

*** data size to move: 65000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.7477 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.092 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.10118 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.06766 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.12553 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.07437 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.06671 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.06671 miliseconds
        Agner Fog       Average elapsed time: 0.06766 miliseconds
        switch 32 AVX   Average elapsed time: 0.07437 miliseconds
        memmove()       Average elapsed time: 0.092 miliseconds
        switch 32 SSE   Average elapsed time: 0.10118 miliseconds
        Apexmemmove     Average elapsed time: 0.12553 miliseconds
        rep movsb       Average elapsed time: 0.7477 miliseconds

*** data size to move: 127000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 1.40537 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.19132 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.22892 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.15608 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.25693 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.17683 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.16972 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.15608 miliseconds
        switch AVX64    Average elapsed time: 0.16972 miliseconds
        switch 32 AVX   Average elapsed time: 0.17683 miliseconds
        memmove()       Average elapsed time: 0.19132 miliseconds
        switch 32 SSE   Average elapsed time: 0.22892 miliseconds
        Apexmemmove     Average elapsed time: 0.25693 miliseconds
        rep movsb       Average elapsed time: 1.40537 miliseconds

*** data size to move: 255000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 2.86089 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.43914 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.47659 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.40101 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.52976 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.41554 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.4243 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.40101 miliseconds
        switch 32 AVX   Average elapsed time: 0.41554 miliseconds
        switch AVX64    Average elapsed time: 0.4243 miliseconds
        memmove()       Average elapsed time: 0.43914 miliseconds
        switch 32 SSE   Average elapsed time: 0.47659 miliseconds
        Apexmemmove     Average elapsed time: 0.52976 miliseconds
        rep movsb       Average elapsed time: 2.86089 miliseconds

*** data size to move: 10000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 133.026 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 52.5734 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 54.9919 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 67.5378 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 61.6622 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 54.2498 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 54.7909 miliseconds

        Sorted:
        memmove()       Average elapsed time: 52.5734 miliseconds
        switch 32 AVX   Average elapsed time: 54.2498 miliseconds
        switch AVX64    Average elapsed time: 54.7909 miliseconds
        switch 32 SSE   Average elapsed time: 54.9919 miliseconds
        Apexmemmove     Average elapsed time: 61.6622 miliseconds
        Agner Fog       Average elapsed time: 67.5378 miliseconds
        rep movsb       Average elapsed time: 133.026 miliseconds

*** data size to move: 50000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 680.076 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 311.9 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 317.202 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 368.016 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 326.971 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 319.564 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 323.725 miliseconds

        Sorted:
        memmove()       Average elapsed time: 311.9 miliseconds
        switch 32 SSE   Average elapsed time: 317.202 miliseconds
        switch 32 AVX   Average elapsed time: 319.564 miliseconds
        switch AVX64    Average elapsed time: 323.725 miliseconds
        Apexmemmove     Average elapsed time: 326.971 miliseconds
        Agner Fog       Average elapsed time: 368.016 miliseconds
        rep movsb       Average elapsed time: 680.076 miliseconds

Test completed. Press any key to exit...

aw27

  • Guest
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #78 on: November 06, 2019, 04:31:14 PM »
Thank you for testing, daydreamer. The theoretical aspects of your question will have to be answered by Agner Fog or some other expert. But your results are consistent with others I have found. In particular, the small functions first implemented by nidud tend to perform very well. Even the suboptimal AVX-512 version performs better than the KNNSpeed AVX Memmove, as I found yesterday. This is interesting: all the runtime effort those routines spend selecting the best approach for each particular case wastes a lot of CPU cycles, and even a suboptimal, straightforward function outperforms them (at least under these test conditions).
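
Since absolute times differ a lot between machines, one way to compare runs is to sort each group and express every time relative to the fastest entry. A minimal C sketch of that bookkeeping; struct result and rank_results are hypothetical names, not part of the test program:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One line of the "Sorted:" table: routine name and average time in ms. */
struct result {
    const char *name;
    double ms;
};

/* qsort comparator: ascending by time. */
static int by_time(const void *a, const void *b)
{
    double x = ((const struct result *)a)->ms;
    double y = ((const struct result *)b)->ms;
    return (x > y) - (x < y);
}

/* Sorts r[0..n-1] fastest first and prints each time together with
   its ratio to the fastest entry (1.00 for the winner). */
static void rank_results(struct result *r, size_t n)
{
    assert(r != NULL && n > 0);
    qsort(r, n, sizeof r[0], by_time);
    for (size_t i = 0; i < n; i++)
        printf("%-14s %.5f ms  x%.2f\n", r[i].name, r[i].ms, r[i].ms / r[0].ms);
}
```

Fed with the 4192-byte figures from the i5-7200U run above, this puts switch AVX64 first and rep movsb last, roughly ten times slower than the winner.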

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #79 on: November 06, 2019, 04:55:36 PM »
The speed test, Haswell E/EP

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

(AVX-512 not supported, will not run the AVX-512 test.)
Testing: rep movsb, memmove(), switch 32 SSE, Agner Fog, Apexmemmove, switch 32 AVX, switch AVX64

Filling random data source array for all experiments. Number of elements: 100000000
*** data size to move: 1 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.016353 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0023583 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00496485 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.009216 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00257551 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00238933 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00232727 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.00232727 miliseconds
        memmove()       Average elapsed time: 0.0023583 miliseconds
        switch 32 AVX   Average elapsed time: 0.00238933 miliseconds
        Apexmemmove     Average elapsed time: 0.00257551 miliseconds
        switch 32 SSE   Average elapsed time: 0.00496485 miliseconds
        Agner Fog       Average elapsed time: 0.009216 miliseconds
        rep movsb       Average elapsed time: 0.016353 miliseconds

*** data size to move: 7 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00958836 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.002048 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00189285 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00266861 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00325818 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00211006 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00462351 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00189285 miliseconds
        memmove()       Average elapsed time: 0.002048 miliseconds
        switch 32 AVX   Average elapsed time: 0.00211006 miliseconds
        Agner Fog       Average elapsed time: 0.00266861 miliseconds
        Apexmemmove     Average elapsed time: 0.00325818 miliseconds
        switch AVX64    Average elapsed time: 0.00462351 miliseconds
        rep movsb       Average elapsed time: 0.00958836 miliseconds

*** data size to move: 16 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00651636 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00198594 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00195491 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00229624 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00183079 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00238933 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00229624 miliseconds

        Sorted:
        Apexmemmove     Average elapsed time: 0.00183079 miliseconds
        switch 32 SSE   Average elapsed time: 0.00195491 miliseconds
        memmove()       Average elapsed time: 0.00198594 miliseconds
        Agner Fog       Average elapsed time: 0.00229624 miliseconds
        switch AVX64    Average elapsed time: 0.00229624 miliseconds
        switch 32 AVX   Average elapsed time: 0.00238933 miliseconds
        rep movsb       Average elapsed time: 0.00651636 miliseconds

*** data size to move: 46 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.00859539 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00297891 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00223418 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00251345 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00248242 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0068577 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00738521 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00223418 miliseconds
        Apexmemmove     Average elapsed time: 0.00248242 miliseconds
        Agner Fog       Average elapsed time: 0.00251345 miliseconds
        memmove()       Average elapsed time: 0.00297891 miliseconds
        switch 32 AVX   Average elapsed time: 0.0068577 miliseconds
        switch AVX64    Average elapsed time: 0.00738521 miliseconds
        rep movsb       Average elapsed time: 0.00859539 miliseconds

*** data size to move: 128 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.0106124 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00282376 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00279273 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00415806 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00319612 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00555442 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00521309 miliseconds

        Sorted:
        switch 32 SSE   Average elapsed time: 0.00279273 miliseconds
        memmove()       Average elapsed time: 0.00282376 miliseconds
        Apexmemmove     Average elapsed time: 0.00319612 miliseconds
        Agner Fog       Average elapsed time: 0.00415806 miliseconds
        switch AVX64    Average elapsed time: 0.00521309 miliseconds
        switch 32 AVX   Average elapsed time: 0.00555442 miliseconds
        rep movsb       Average elapsed time: 0.0106124 miliseconds

*** data size to move: 511 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.0198904 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00577164 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.00536824 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00422012 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.00375467 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.00775757 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00760242 miliseconds

        Sorted:
        Apexmemmove     Average elapsed time: 0.00375467 miliseconds
        Agner Fog       Average elapsed time: 0.00422012 miliseconds
        switch 32 SSE   Average elapsed time: 0.00536824 miliseconds
        memmove()       Average elapsed time: 0.00577164 miliseconds
        switch AVX64    Average elapsed time: 0.00760242 miliseconds
        switch 32 AVX   Average elapsed time: 0.00775757 miliseconds
        rep movsb       Average elapsed time: 0.0198904 miliseconds

*** data size to move: 4192 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.118163 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.0144911 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0160737 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.0123811 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.019487 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.01536 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0143981 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.0123811 miliseconds
        switch AVX64    Average elapsed time: 0.0143981 miliseconds
        memmove()       Average elapsed time: 0.0144911 miliseconds
        switch 32 AVX   Average elapsed time: 0.01536 miliseconds
        switch 32 SSE   Average elapsed time: 0.0160737 miliseconds
        Apexmemmove     Average elapsed time: 0.019487 miliseconds
        rep movsb       Average elapsed time: 0.118163 miliseconds

*** data size to move: 8100 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.227297 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.00965042 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0110158 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.00670254 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.0343505 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0088126 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.00822303 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.00670254 miliseconds
        switch AVX64    Average elapsed time: 0.00822303 miliseconds
        switch 32 AVX   Average elapsed time: 0.0088126 miliseconds
        memmove()       Average elapsed time: 0.00965042 miliseconds
        switch 32 SSE   Average elapsed time: 0.0110158 miliseconds
        Apexmemmove     Average elapsed time: 0.0343505 miliseconds
        rep movsb       Average elapsed time: 0.227297 miliseconds

*** data size to move: 15000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.146711 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.024576 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.0202938 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.0140878 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.0307821 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0145842 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.0138705 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.0138705 miliseconds
        Agner Fog       Average elapsed time: 0.0140878 miliseconds
        switch 32 AVX   Average elapsed time: 0.0145842 miliseconds
        switch 32 SSE   Average elapsed time: 0.0202938 miliseconds
        memmove()       Average elapsed time: 0.024576 miliseconds
        Apexmemmove     Average elapsed time: 0.0307821 miliseconds
        rep movsb       Average elapsed time: 0.146711 miliseconds

*** data size to move: 65000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 0.665631 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.109785 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.117698 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.101779 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.126014 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.0988315 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.100073 miliseconds

        Sorted:
        switch 32 AVX   Average elapsed time: 0.0988315 miliseconds
        switch AVX64    Average elapsed time: 0.100073 miliseconds
        Agner Fog       Average elapsed time: 0.101779 miliseconds
        memmove()       Average elapsed time: 0.109785 miliseconds
        switch 32 SSE   Average elapsed time: 0.117698 miliseconds
        Apexmemmove     Average elapsed time: 0.126014 miliseconds
        rep movsb       Average elapsed time: 0.665631 miliseconds

*** data size to move: 127000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 1.30126 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.241695 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.251594 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.234093 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.271174 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.233658 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.233317 miliseconds

        Sorted:
        switch AVX64    Average elapsed time: 0.233317 miliseconds
        switch 32 AVX   Average elapsed time: 0.233658 miliseconds
        Agner Fog       Average elapsed time: 0.234093 miliseconds
        memmove()       Average elapsed time: 0.241695 miliseconds
        switch 32 SSE   Average elapsed time: 0.251594 miliseconds
        Apexmemmove     Average elapsed time: 0.271174 miliseconds
        rep movsb       Average elapsed time: 1.30126 miliseconds

*** data size to move: 255000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 2.65002 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 0.605091 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 0.629046 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 0.587093 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 0.649309 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 0.587745 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 0.587248 miliseconds

        Sorted:
        Agner Fog       Average elapsed time: 0.587093 miliseconds
        switch AVX64    Average elapsed time: 0.587248 miliseconds
        switch 32 AVX   Average elapsed time: 0.587745 miliseconds
        memmove()       Average elapsed time: 0.605091 miliseconds
        switch 32 SSE   Average elapsed time: 0.629046 miliseconds
        Apexmemmove     Average elapsed time: 0.649309 miliseconds
        rep movsb       Average elapsed time: 2.65002 miliseconds

*** data size to move: 10000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 109.512 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 36.0363 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 36.9078 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 82.6607 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 59.343 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 39.8034 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 37.8208 miliseconds

        Sorted:
        memmove()       Average elapsed time: 36.0363 miliseconds
        switch 32 SSE   Average elapsed time: 36.9078 miliseconds
        switch AVX64    Average elapsed time: 37.8208 miliseconds
        switch 32 AVX   Average elapsed time: 39.8034 miliseconds
        Apexmemmove     Average elapsed time: 59.343 miliseconds
        Agner Fog       Average elapsed time: 82.6607 miliseconds
        rep movsb       Average elapsed time: 109.512 miliseconds

*** data size to move: 50000000 bytes ***

        Testing rep movsb
        Average elapsed time per repetition: 568.017 miliseconds
        Testing memmove()
        Average elapsed time per repetition: 254.294 miliseconds
        Testing switch 32 SSE
        Average elapsed time per repetition: 254.692 miliseconds
        Testing Agner Fog
        Average elapsed time per repetition: 440.9 miliseconds
        Testing Apexmemmove
        Average elapsed time per repetition: 296.629 miliseconds
        Testing switch 32 AVX
        Average elapsed time per repetition: 266.094 miliseconds
        Testing switch AVX64
        Average elapsed time per repetition: 264.82 miliseconds

        Sorted:
        memmove()       Average elapsed time: 254.294 miliseconds
        switch 32 SSE   Average elapsed time: 254.692 miliseconds
        switch AVX64    Average elapsed time: 264.82 miliseconds
        switch 32 AVX   Average elapsed time: 266.094 miliseconds
        Apexmemmove     Average elapsed time: 296.629 miliseconds
        Agner Fog       Average elapsed time: 440.9 miliseconds
        rep movsb       Average elapsed time: 568.017 miliseconds

Test completed. Press any key to exit...
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

aw27

  • Guest
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #80 on: November 07, 2019, 06:31:43 AM »
For people who program only in MASM, other assemblers, or non-Microsoft HLLs, it is possible to use all 16 XMM/YMM registers (or all 32 ZMM registers) for data moves. Only programs built with MSVC need to preserve (in 64-bit) the XMM6-XMM15 registers, that is, the low 128 bits of YMM6-YMM15 or ZMM6-ZMM15. In 32-bit there is no need to preserve anything, but in 32-bit there are only 8 XMM/YMM registers.
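
As a rough illustration of what that preservation costs when a routine does have to be callable from MSVC-built code: the Windows x64 calling convention treats XMM6-XMM15 as nonvolatile, so a routine that wants them must save and restore them. The sketch below is in plain ml64 syntax; the name copy32_nv and the choice of XMM6/XMM7 are made up for the example, and it is not taken from any of the routines benchmarked above.

```asm
.code

; Hypothetical example: copy 32 bytes from [rdx] to [rcx] using the
; nonvolatile registers xmm6/xmm7, saving and restoring them so the
; routine stays safe to call from MSVC-compiled code.
copy32_nv proc frame
    sub     rsp, 40                     ; 2 x 16-byte save slots + 8 so rsp is 16-byte aligned
    .allocstack 40
    movdqa  xmmword ptr [rsp], xmm6     ; save caller's xmm6
    .savexmm128 xmm6, 0
    movdqa  xmmword ptr [rsp+16], xmm7  ; save caller's xmm7
    .savexmm128 xmm7, 16
    .endprolog

    movdqu  xmm6, xmmword ptr [rdx]     ; load both halves before storing,
    movdqu  xmm7, xmmword ptr [rdx+16]  ; so overlapping buffers are handled
    movdqu  xmmword ptr [rcx], xmm6
    movdqu  xmmword ptr [rcx+16], xmm7

    movdqa  xmm6, xmmword ptr [rsp]     ; restore caller's registers
    movdqa  xmm7, xmmword ptr [rsp+16]
    add     rsp, 40
    ret
copy32_nv endp

end
```

In a pure assembly program where every caller is your own code, the save/restore and the unwind directives can be dropped and XMM6-XMM15 used freely, which is the point made above.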