Topic: do you not approve of gotos? why? do you instead like jumptables?  (Read 2203 times)

nidud

  • Member
  • *****
  • Posts: 1795
    • https://github.com/nidud/asmc
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #30 on: October 24, 2019, 10:24:52 PM »
Quote
Let's visualize the jump table scenario. To ensure reasonable performance, what conditions need to be satisfied? (please lambaste & educate)

I thought we had beaten that to death in the High Level Language thread, 3 years ago :rolleyes:

Been some changes of late on how to handle tables in 64-bit to make it position-independent. This is achieved by calculating the distance of each case from a fixed point rather than using the actual address.

Example:
https://github.com/nidud/asmc/blob/master/source/lib64/string/memcpy.asm

        .switch jmp r8

          .case 0
            ret

          .case 1
            mov cl,[rdx]
            mov [rax],cl
            ret

          .case 2,3,4
            ...

The fixed point here is the exit label of the switch (@C0003). Each table entry stores exit - case, so the target address is computed as exit minus the stored entry.

*   lea r11,@C0003
*   sub r11,[r8*8+r11-(MIN@C0002*8)+(@C0002-@C0003)]
*   jmp r11
...
*             .case 1
*   @C0006:
*               mov cl,[rdx]
*               mov [rax],cl
*               ret
*             .case 2,3,4
*   .case 2
*   @C0007:
*   .case 3
*   @C0008:
...
*           .endsw
*   jmp @C0003
*   ALIGN 8
*   @C0002:
*   MIN@C0002 equ 0
*   dq @C0003-@C0005
*   dq @C0003-@C0006
...
*   @C0003:
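For readers who think in C, the same idea can be sketched with the GCC/Clang "labels as values" extension (not ISO C), whose manual shows label-difference tables for position-independent code. This is a hypothetical illustration, not asmc's output: `small_copy` and its labels are made-up names, each table entry stores `exit - case` like the `dq @C0003-@C000x` lines above, and the jump target is recovered as `exit - entry`.

```c
#include <assert.h>
#include <stddef.h>

/* Labels as values and void* arithmetic are GNU extensions, not ISO C. */
#if defined(__clang__)
#define NOINLINE_NOCLONE __attribute__((noinline))
#else
#define NOINLINE_NOCLONE __attribute__((noinline, noclone))
#endif

/* Hypothetical helper: copy n bytes (n <= 4) from src to dst by dispatching
   through a table of offsets measured from the exit label, so the table
   itself holds no absolute addresses. */
NOINLINE_NOCLONE
static void small_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Each entry is (exit - case), like the "dq @C0003-@C000x" lines. */
    static const int offs[] = {
        &&done - &&case0,
        &&done - &&case1,
        &&done - &&case234,  /* n == 2 */
        &&done - &&case234,  /* n == 3 */
        &&done - &&case234,  /* n == 4 */
    };

    assert(n == 0 || (dst != NULL && src != NULL));
    if (n >= sizeof offs / sizeof offs[0])  /* range test of a normal switch */
        return;
    goto *(&&done - offs[n]);               /* exit - (exit - case) == case */

case0:
    goto done;
case1:
    dst[0] = src[0];
    goto done;
case234: {
    /* Load head and tail pairs before storing: two possibly overlapping
       2-byte copies cover every length from 2 to 4. */
    unsigned char h0 = src[0], h1 = src[1];
    unsigned char t0 = src[n - 2], t1 = src[n - 1];
    dst[0] = h0;
    dst[1] = h1;
    dst[n - 2] = t0;
    dst[n - 1] = t1;
    goto done;
}
done:
    return;
}
```

Because every entry is a difference between two labels in the same function, the table needs no relocations when the code is moved. The `noinline`/`noclone` attributes are there because GCC's manual warns that label addresses may differ between inlined or cloned copies of a function.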

The whole jump table then becomes position-independent, as shown in the benchmark test (which, admittedly, was one of the reasons for this approach):
https://github.com/nidud/asmc/tree/master/source/test/benchmark/x64

The results illustrate the benefit as the table size increases. The normal switch (2.asm: table) adds a range test at the top but is otherwise similar to 3.asm (notest). Without a table (1.asm: notable), each case is tested but with no jumps, as done in an elseif chain.

total [0 .. 4], 1++
    30365 cycles 3.asm: notest
    33243 cycles 1.asm: notable
    37697 cycles 2.asm: table
    44360 cycles 0.asm: elseif
total [5 .. 9], 1++
    30957 cycles 3.asm: notest
    41205 cycles 2.asm: table
    47558 cycles 1.asm: notable
    99094 cycles 0.asm: elseif
total [90 .. 92], 1++
    17361 cycles 3.asm: notest
    22373 cycles 2.asm: table
   122452 cycles 1.asm: notable
   440351 cycles 0.asm: elseif

daydreamer

  • Member
  • ****
  • Posts: 939
  • watch Chebyshev on the backside of the Moon
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #31 on: October 25, 2019, 02:07:56 AM »
Quote
That's what benchmarks are for.  :biggrin:

But given that higher level languages do use virtual tables for inheritance of classes, and that most people use higher level languages, I wouldn't be surprised if the impact is not as bad as most would think.

As always, you are the craftsman. It's up to us to know what tool to use and when to use it. There is no silver bullet. Jump tables are just another tool, one that every Win32 program uses: when you call a DLL function, you go through a table of function pointers (the import address table) to do so.
I have been through C++ tutorials and doubled their worth by also doing them the asm way: a calculator à la C switch/case, where the asm approach is a jump table.
So I also get experience with the pros and cons.
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D

daydreamer

  • Member
  • ****
  • Posts: 939
  • watch Chebyshev on the backside of the Moon
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #32 on: October 25, 2019, 02:14:53 AM »
Interesting code, nidud.
Quote
total [90 .. 92], 1++
    17361 cycles 3.asm: notest
    22373 cycles 2.asm: table
   122452 cycles 1.asm: notable
   440351 cycles 0.asm: elseif
Impressive. The elseif chain probably suffers from branch prediction misses.
Thanks, nidud.
« Last Edit: November 01, 2019, 03:56:32 AM by daydreamer »

AW

  • Member
  • *****
  • Posts: 2431
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #33 on: October 26, 2019, 06:01:51 PM »
Quote
Been some changes of late on how to handle tables in 64-bit to make it position-independent. This is achieved by calculating the distance of each case from a fixed point rather than using the actual address.

Example:
https://github.com/nidud/asmc/blob/master/source/lib64/string/memcpy.asm

@nidud,

I liked this improvement.
But I believe this example is actually a memmove, not a memcpy, because it appears to handle overlap.

nidud

  • Member
  • *****
  • Posts: 1795
    • https://github.com/nidud/asmc
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #34 on: October 26, 2019, 10:29:30 PM »
All memcpy routines handle overlaps these days, so it's basically the same function:

    .code

memcpy::
memmove::

    mov rax,rcx

    .if r8 <= CHUNK
    ...

AW

  • Member
  • *****
  • Posts: 2431
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #35 on: October 27, 2019, 12:22:26 AM »
Quote
All memcpy routines handle overlaps these days, so it's basically the same function:

 :biggrin:
I don't think so.

It appears that Microsoft has a fail-safe to memmove despite not mentioning it; the documentation simply mentions that memcpy is banned.
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/memcpy-wmemcpy?view=vs-2019
Just in case you forgot:
"memcpy copies count bytes from src to dest; wmemcpy copies count wide characters (two bytes). If the source and destination overlap, the behavior of memcpy is undefined. Use memmove to handle overlapping regions."

Other compilers may have fail-safe mechanisms as well, but I have NEVER seen any documentation mention that memmove==memcpy.
Probably you did  :rolleyes:

nidud

  • Member
  • *****
  • Posts: 1795
    • https://github.com/nidud/asmc
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #36 on: October 27, 2019, 01:03:30 AM »
The documentation you refer to is a relic from the 16-bit era, but from 32-bit onwards they all handle overlaps. This has to do with sloppy programmers failing to apply each of them correctly, resulting in a pile-up of bug reports, hence the fail-safe and other implementations.

So the documentation is still correct but in reality they are the same.

;***
;memmove.asm -
;
;       Copyright (c) Microsoft Corporation.  All rights reserved.
;
;Purpose:
;       memmove() copies a source memory buffer to a destination buffer.
;       Overlapping buffers are treated specially, to avoid propogation.
;
;       NOTE:  This stub module scheme is compatible with NT build
;       procedure.
;
;*******************************************************************************

    end

;memcpy.asm - contains memcpy and memmove routines
...
        public memmove

        LEAF_ENTRY_ARG3 memcpy, _TEXT, dst:ptr byte, src:ptr byte, count:dword

        OPTION PROLOGUE:NONE, EPILOGUE:NONE

        memmove = memcpy

        mov     r11, rcx                ; save destination address

AW

  • Member
  • *****
  • Posts: 2431
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #37 on: October 27, 2019, 01:21:46 AM »
Quote
The documentation you refer to is a relic from the 16-bit era
Yeah, the documentation for VS 2019 is a relic from the past.  :sad:

But you showed how Microsoft currently handles it, so no doubts now.  :thumbsup:

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 6749
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #38 on: October 27, 2019, 02:15:52 AM »
 :biggrin:

What have I missed here? Memory copy usually meant copying source to destination; "move" in MS-DOS meant renaming a file; apart from that, memory move meant memory copy THEN delete the source. The sense of "overlap" seems to be something like a binary patch: go to offset (whatever), write bytes to that offset. What have I missed?
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

AW

  • Member
  • *****
  • Posts: 2431
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #39 on: October 27, 2019, 02:40:30 AM »
This is just an attempt to give an idea:

src
xxxxxxxxxxxxxxxxxx

dest
   xxxxxxxxxxxxxxxxxx
   
You have a number of bytes to be copied from src to dest.   
Since the memory ranges of src and dest overlap, if you start copying from the beginning of src, then by the time you reach src+3 (where dest begins), that byte is no longer the original one: it has already been overwritten with the 1st byte of src. So everything from that point on will be corrupted.
One possible strategy is to start copying from the end of src, whenever the address of src is below the address of dest and there is an overlap.
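AW's strategy can be sketched in C as a byte-at-a-time move. This is a minimal illustration assuming a flat address space where comparing the two pointers as `uintptr_t` is meaningful; `byte_move` is a hypothetical name, and real routines such as nidud's copy in 16/32/64-byte chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise move: copy from the end when dest lies above src
   and the ranges overlap, otherwise copy from the start. */
static void *byte_move(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(n == 0 || (dest != NULL && src != NULL));
    if ((uintptr_t)d > (uintptr_t)s && (uintptr_t)d - (uintptr_t)s < n) {
        /* dest overlaps the tail of src: go backwards so every source
           byte is read before it is overwritten. */
        while (n--)
            d[n] = s[n];
    } else {
        /* No harmful overlap: a plain forward copy is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dest;
}
```

With `char buf[] = "abcdefgh"` and `byte_move(buf + 3, buf, 5)`, the result is "abcabcde"; a naive forward loop over the same overlap would produce "abcabcab", which is the corruption described above.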

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 6749
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #40 on: October 27, 2019, 11:37:26 AM »
Yes, that makes sense: don't overwrite your source with your destination. You can solve that problem by using an intermediate memory buffer, but if you can find a copy order that works without the extra memory, it will be faster. You can routinely write back to a single buffer if your task is subtractive, e.g. stripping byte ranges from a string or deleting pointers from an array.
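A subtractive task of the kind hutch describes can be sketched in C as an in-place filter. `strip_char` is a hypothetical helper used only for illustration; the point is that the write index never gets ahead of the read index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: remove every occurrence of ch from the
   NUL-terminated string s, in place, and return the new length.
   Since w <= r at all times, a single forward pass over one buffer
   never overwrites input that has not been read yet. */
static size_t strip_char(char *s, char ch)
{
    size_t w = 0;

    assert(s != NULL);
    for (size_t r = 0; s[r] != '\0'; r++) {
        if (s[r] != ch)
            s[w++] = s[r];
    }
    s[w] = '\0';
    return w;
}
```

For example, `strip_char` turns "a,b,,c" into "abc" and returns 3, with no temporary buffer.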

AW

  • Member
  • *****
  • Posts: 2431
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #41 on: October 27, 2019, 08:46:17 PM »
32-bit version of nidud's memcpy/memmove (file 3.asm). It works perfectly, although I have no idea what the db 13 dup(0x90) is for  :sad: (removed)

Code:
.686
.XMM

.MODEL FLAT, C
option casemap:none

.code

fastmemmove32 proc private uses ebx esi dest, src, _size
mov ecx, src
mov eax, dest
mov esi, _size

.if esi<= 64
.switch jmp esi
  .case 0
jmp @exit

  .case 1
mov dl,[ecx]
mov [eax],dl
jmp @exit
.case 2,3,4
mov dx,[ecx]
mov cx,[ecx+esi-2]
mov [eax+esi-2],cx
mov [eax],dx
jmp @exit

  .case 5,6,7,8
mov edx,[ecx]
mov ecx,[ecx+esi-4]
mov [eax+esi-4],ecx
mov [eax],edx
jmp @exit

  .case 9,10,11,12
mov edx,[ecx]
mov ebx,[ecx+esi-4]
mov ecx,[ecx+esi-8]
mov [eax],edx
mov [eax+esi-4], ebx
mov [eax+esi-8], ecx
jmp @exit

  .case 13,14,15,16
push edi
mov edx,[ecx]
mov ebx,[ecx+esi-4]
mov edi,[ecx+esi-8]
mov ecx,[ecx+esi-12]

mov [eax],edx
mov [eax+esi-4], ebx
mov [eax+esi-8], edi
mov [eax+esi-12], ecx
pop edi
jmp @exit

  .case 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
movdqu xmm0,[ecx]
movdqu xmm1,[ecx+esi-16]
movups [eax],xmm0
movups [eax+esi-16],xmm1
jmp @exit

.case 33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,\
49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64
vmovdqu ymm0,[ecx]
vmovdqu ymm1,[ecx+esi-32]
vmovups [eax],ymm0
vmovups [eax+esi-32],ymm1
jmp @exit
.endsw
.endif
vmovdqu ymm2,[ecx]
vmovdqu ymm3,[ecx+32]
vmovdqu ymm4,[ecx+esi-32]
vmovdqu ymm5,[ecx+esi-64]
.if esi > 128
mov edx,eax
neg edx
and edx,64-1
add ecx,edx
mov ebx,esi
sub ebx,edx
add edx,eax
and bl,-64
.if edx > ecx
align 16    ; pad before the loop label so the NOPs run once, not every iteration
   .repeat
sub ebx,64
vmovdqu ymm0,[ecx+ebx]
vmovdqu ymm1,[ecx+ebx+32]
vmovdqu [edx+ebx],ymm0
vmovdqu [edx+ebx+32],ymm1
.untilz
vmovdqu [eax],ymm2
vmovdqu [eax+32],ymm3
vmovdqu [eax+esi-32],ymm4
vmovdqu [eax+esi-64],ymm5
jmp @exit
.endif
lea edx,[edx+ebx]
lea ecx,[ecx+ebx]
neg ebx
align 16    ; same here: keep the padding outside the loop
.repeat
vmovdqu ymm0,[ecx+ebx]
vmovdqu ymm1,[ecx+ebx+32]
vmovdqu [edx+ebx],ymm0
vmovdqu [edx+ebx+32],ymm1
add ebx,64
.untilz
.endif
vmovdqu [eax],ymm2
vmovdqu [eax+32],ymm3
vmovdqu [eax+esi-32],ymm4
vmovdqu [eax+esi-64],ymm5
@exit:
ret
fastmemmove32 endp

end

If anyone is up for developing a version using AVX-512, I will be able to test it for the next few weeks.


« Last Edit: October 28, 2019, 05:36:49 AM by AW »

nidud

  • Member
  • *****
  • Posts: 1795
    • https://github.com/nidud/asmc
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #42 on: October 27, 2019, 11:36:23 PM »
Quote
32-bit version of nidud's memcpy/memmove (file 3.asm). It works perfectly, although I have no idea what the db 13 dup(0x90) is for  :sad:

The fill is used to align the loop below to a 16-byte boundary; it sits after the ret, so it is never executed. The first loop should also be aligned above.

You may use the list file for the offsets:

000002DB                    .if rcx > rdx
000002DB  483BCA            *   cmp rcx , rdx
000002DE  7645              *   jna @C0047
000002E0                    .repeat
000002E0                    *   @C0048:
000002E0  4983E940              sub r9,64
000002E4  C4C17E6F0411          vmovdqu ymm0,[rdx+r9]
000002EA  C4C17E6F4C1120        vmovdqu ymm1,[rdx+r9+32]
000002F1  C4C17D7F0409          vmovdqa [rcx+r9],ymm0
000002F7  C4C17D7F4C0920        vmovdqa [rcx+r9+32],ymm1
000002FE                    .untilz
000002FE  75E0              *   jne @C0048
00000300  C5FE7F10              vmovdqu [rax],ymm2
00000304  C5FE7F5820            vmovdqu [rax+32],ymm3
00000309  C4C17E7F6400E0        vmovdqu [rax+r8-32],ymm4
00000310  C4C17E7F6C00C0        vmovdqu [rax+r8-64],ymm5
00000317  C3                    ret
00000318  909090909090909090    db 13 dup(0x90)
00000325                    .endif
00000325                    *   @C0047:
00000325  498D0C09              lea rcx,[rcx+r9]
00000329  498D1411              lea rdx,[rdx+r9]
0000032D  49F7D9                neg r9
00000330                    .repeat
00000330                    *   @C0049:
00000330  C4C17E6F0411          vmovdqu ymm0,[rdx+r9]
00000336  C4C17E6F4C1120        vmovdqu ymm1,[rdx+r9+32]
0000033D  C4C17D7F0409          vmovdqa [rcx+r9],ymm0
00000343  C4C17D7F4C0920        vmovdqa [rcx+r9+32],ymm1
0000034A  4983C140              add r9,64
0000034E                    .untilz
0000034E  75E0              *   jne @C0049
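The 13 filler bytes can be checked against the listing: the fill starts at 0x318 and is followed by `lea`, `lea` and `neg` (4 + 4 + 3 = 11 bytes) before the loop label. A small C sketch of that arithmetic, with `pad_needed` as a hypothetical helper:

```c
#include <assert.h>

/* Hypothetical helper: number of filler bytes to emit at offset pad_at so
   that the label reached after `between` further bytes of code lands on an
   `align`-byte boundary (align must be a power of two). */
static unsigned pad_needed(unsigned pad_at, unsigned between, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - ((pad_at + between) & (align - 1))) & (align - 1);
}
```

`pad_needed(0x318, 11, 16)` gives 13, which places the second loop at 0x330; the first loop at 0x2E0 is already on a 16-byte boundary, so it needs no fill.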

AW

  • Member
  • *****
  • Posts: 2431
  • Let's Make ASM Great Again!
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #43 on: October 28, 2019, 04:26:00 AM »
I see ( :rolleyes:), but I think that inserting "align 16" is more reliable, or not?

Vortex

  • Member
  • *****
  • Posts: 2030
Re: do you not approve of gotoswhy?do you instead like jumptables?
« Reply #44 on: October 28, 2019, 05:17:22 AM »
Hi AW,

The Align statement inside the code can produce the following error message:

Code:
error A2189: invalid combination with segment alignment : 16