OPTION STACKBASE fails in x86

Started by aw27, March 20, 2017, 06:15:33 PM


johnsa

lol

Well, with a pinch of salt: stackbase:rsp is a lot more useful because the x64 fastcall ABI is a lot more lenient in that regard and more powerful; it avoids push/pop, liberally aligns the stack, etc. So yeah, for 64-bit code I'd definitely say STACKBASE:RSP is the way to go.

The x64 stack unwind works with it, so it'll work with VEH etc., and it's nice and fast and frees up RBP.

STACKBASE:RSP remains and always will, it's just the 32-bit ESP version that has gone to its grave :)

I also feel less bad about having the HJWasm 64-bit stuff diverge from MASM, because MASM + 64-bit is crippled anyway and they've always been very different assemblers for x64. For x86 they were 99% compatible and I see no reason not to leave it like that. I never had any complaints with 32-bit MASM; it was my assembler of choice in the pre-64-bit days.

aw27

Quote from: johnsa on March 24, 2017, 01:42:58 AM
lol
well with a pinch of salt, stackbase:rsp is a lot more useful as the x64 fastcall ABI
Last time I checked there were serious issues when using stackbase:rsp; I don't know if they are already solved.
On the other hand, I am not yet convinced it will be advantageous, because the instructions are longer and execution appears slower. Of course, it will release the rbp register, which might be useful.

johnsa

I've not got any examples of stackbase:rsp being an issue, as I said I use it in +- 500k's worth of code.
The only issue I had found from your previous post after digging through and re-checking everything was just the omission of the FRAME attribute on the PROC.

And yes, you free up RBP and the prologue/epilogue are shorter (which == faster) and the stack isn't constantly fiddled with like other modes so that should improve cache use.

aw27

Quote from: johnsa on March 24, 2017, 02:13:28 AM
I've not got any examples of stackbase:rsp being an issue

You just forgot.


option casemap:none
option frame:auto
OPTION STACKBASE:RSP
option win64:11

.code

sub1 proc private dest:ptr, src:ptr, val1 : qword, val2:qword
mov dest, rcx
mov src, rdx
mov val1, r8
mov val2, r9
mov rax, qword ptr [rdx]
add rax, val1
add rax, val2
mov qword ptr [rcx], rax

ret
sub1 endp

getSum proc public dest:ptr, src:ptr, val1 : qword, val2:qword
INVOKE sub1, dest, rdx, r8, r9
ret
getSum endp
end


sub1 will be invoked with the stack not aligned (release HJWasm 2.21, not checked yet on latest).
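
For readers following the alignment argument: the Microsoft x64 calling convention requires RSP to be 16-byte aligned at every `call`, so on entry to a procedure RSP sits 8 bytes off a 16-byte boundary (the return address has just been pushed). A minimal Python sketch of that arithmetic follows. The function names and the sample frame sizes are illustrative assumptions, not output from HJWasm.

```python
def rsp_mod16_at_call(stack_bytes_reserved):
    """RSP mod 16 at a call site inside a procedure.

    Assumes the caller obeyed the ABI, so RSP is 8 mod 16 on entry.
    stack_bytes_reserved is everything the procedure has moved RSP down by
    (pushes plus any `sub rsp, N`) before the nested call.
    """
    entry_offset = -8  # return address pushed onto a 16-byte aligned stack
    return (entry_offset - stack_bytes_reserved) % 16


def call_is_aligned(stack_bytes_reserved):
    """True if a nested call made at this point sees an aligned RSP."""
    return rsp_mod16_at_call(stack_bytes_reserved) == 0


if __name__ == "__main__":
    # Hypothetical frame sizes, chosen only to show the pattern.
    for reserved in (0x20, 0x28, 0x30, 0x38):
        print(hex(reserved), "aligned" if call_is_aligned(reserved) else "misaligned")
```

The pattern is that the total adjustment has to be an odd multiple of 8 for the nested call to be aligned, which is why a missing or miscounted 8 bytes in the prologue shows up as the problem reported here.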

johnsa

you need to put FRAME on the proc decoration

sub1 proc private FRAME dest:ptr, src:ptr, val1 : qword, val2:qword

and

getSum proc public FRAME dest:ptr, src:ptr, val1 : qword, val2:qword

(I've attached the link to the C/C++ project with the updated asm in the other thread; it all runs through perfectly. That's just for reference, you can simply add these two on your side.)

aw27

Quote from: johnsa on March 24, 2017, 02:53:23 AM
you need to put FRAME on the proc decoration

Now, imagine I want to use the good old standard RBP base frame. That possibility appears to have vanished completely.

option casemap:none
option frame:auto
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; OPTION STACKBASE:RSP
option win64:11
option ARCH:SSE

.code

sub1 proc private FRAME dest:ptr, src:ptr, val1 : qword, val2:qword
mov rax, qword ptr [rdx]
add rax, val1
add rax, val2
mov qword ptr [rcx], rax

ret
sub1 endp

getSum proc public FRAME uses xmm6 xmm7 dest:ptr, src:ptr, val1 : qword, val2:qword
LOCAL myVar1 : qword
mov rax, rdx
mov myVar1, rax
mov rdx, myVar1
INVOKE sub1, dest, rdx, val1, r9
ret
getSum endp

end

Disassembles to:
getSum:
000000013FDA181B  mov         qword ptr [rsp+8],rcx 
000000013FDA1820  mov         qword ptr [rsp+18h],r8 
000000013FDA1825  sub         rsp,50h 
000000013FDA1829  movdqa      xmmword ptr [rsp+20h],xmm6 
000000013FDA182F  movdqa      xmmword ptr [rsp+30h],xmm7 
000000013FDA1835  mov         rax,rdx 
000000013FDA1838  mov         qword ptr [rsp+40h],rax 
000000013FDA183D  mov         rdx,qword ptr [rsp+40h] 
000000013FDA1842  mov         rcx,qword ptr [rsp+58h] 
000000013FDA1847  mov         r8,qword ptr [rsp+68h] 
000000013FDA184C  call        000000013FDA1800 
000000013FDA1851  movdqa      xmm6,xmmword ptr [rsp+20h] 
000000013FDA1857  movdqa      xmm7,xmmword ptr [rsp+30h] 
000000013FDA185D  add         rsp,50h 
000000013FDA1861  ret



Where is my RBP base frame? Please help!  :dazzled:

And we have the stack not aligned again! OMG!  :dazzled:
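
The misalignment claimed here can be checked by hand from the listing: `sub rsp,50h` is the only stack adjustment before the `call`, and `movdqa` needs a 16-byte aligned memory operand. A rough Python sketch of that check, assuming the usual Win64 entry state (RSP is 8 mod 16); the helper names are made up for illustration and the "next aligning size" is plain arithmetic, not a statement about what HJWasm should emit:

```python
ENTRY_RSP_MOD16 = 8             # assumption: caller followed the Win64 ABI
FRAME_SIZE = 0x50               # from `sub rsp,50h` in the listing
XMM_SAVE_OFFSETS = (0x20, 0x30) # from the two `movdqa [rsp+..]` stores


def mod16_after_sub(frame_size, entry_mod16=ENTRY_RSP_MOD16):
    """RSP mod 16 once `sub rsp, frame_size` has executed."""
    return (entry_mod16 - frame_size) % 16


def next_aligning_frame_size(minimum):
    """Smallest frame size >= minimum (in 8-byte steps) that aligns RSP."""
    size = minimum
    while mod16_after_sub(size) != 0:
        size += 8
    return size


if __name__ == "__main__":
    rsp_mod16 = mod16_after_sub(FRAME_SIZE)
    print("RSP mod 16 at the call:", rsp_mod16)
    for offset in XMM_SAVE_OFFSETS:
        # A nonzero result means the movdqa operand is not 16-byte aligned.
        print("movdqa operand [rsp+%Xh] mod 16:" % offset, (rsp_mod16 + offset) % 16)
    print("next frame size that would align RSP:", hex(next_aligning_frame_size(FRAME_SIZE)))
```

Under that entry assumption, both the `call` and the two `movdqa` stores see an address that is 8 mod 16.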

hutch--

aw27,

Spare us the wise cracks, these guys are doing a lot of work and don't need nonsense. If you can help with the BETA testing, well and good but no further nonsense.

johnsa


remove option win64:11 AND stackbase:RSP
and it reverts to working the old-fashioned JWASM and ML style way, which produces this (and works on that same test project I linked to):




getSum proc public frame dest:ptr, src : ptr, val1 : qword, val2 : qword
000000013FEA1834 55                   push        rbp 
000000013FEA1835 48 8B EC             mov         rbp,rsp 
mov dest, rcx
000000013FEA1838 48 89 4D 10          mov         qword ptr [dest],rcx 
mov src, rdx
000000013FEA183C 48 89 55 18          mov         qword ptr [src],rdx 
mov val1, r8
000000013FEA1840 4C 89 45 20          mov         qword ptr [val1],r8 
mov val2, r9
000000013FEA1844 4C 89 4D 28          mov         qword ptr [val2],r9 
INVOKE sub1, dest, src, val1, val2
000000013FEA1848 48 83 EC 20          sub         rsp,20h 
000000013FEA184C 48 8B 4D 10          mov         rcx,qword ptr [dest] 
000000013FEA1850 48 8B 55 18          mov         rdx,qword ptr [src] 
000000013FEA1854 4C 8B 45 20          mov         r8,qword ptr [val1] 
000000013FEA1858 4C 8B 4D 28          mov         r9,qword ptr [val2] 
000000013FEA185C E8 AF FF FF FF       call        sub1 (013FEA1810h) 
000000013FEA1861 48 83 C4 20          add         rsp,20h 
ret
000000013FEA1865 5D                   pop         rbp 
000000013FEA1866 C3                   ret 
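
As a cross-check of the listing above, the stack-moving instructions can be traced one by one. This is only a sketch in Python: it assumes RSP is 8 mod 16 on entry, and the instruction tuples are transcribed by hand from the disassembly rather than parsed from it.

```python
def trace_rsp_mod16(instructions, entry_mod16=8):
    """Return (mnemonic, RSP mod 16 after it) for each stack-related instruction.

    `call` is recorded without changing RSP, so its entry shows the
    alignment the callee's caller-side RSP has at the call site.
    """
    rsp = entry_mod16
    states = []
    for mnemonic, amount in instructions:
        if mnemonic in ("push", "sub"):
            rsp -= amount
        elif mnemonic in ("pop", "add"):
            rsp += amount
        elif mnemonic != "call":
            raise ValueError("unexpected instruction: " + mnemonic)
        rsp %= 16
        states.append((mnemonic, rsp))
    return states


# push rbp / sub rsp,20h / call sub1 / add rsp,20h / pop rbp
RBP_FRAME_LISTING = [("push", 8), ("sub", 0x20), ("call", 0), ("add", 0x20), ("pop", 8)]

if __name__ == "__main__":
    for mnemonic, rsp_mod16 in trace_rsp_mod16(RBP_FRAME_LISTING):
        print(mnemonic, rsp_mod16)
```

In this trace `push rbp` supplies the odd 8 bytes, so RSP is 0 mod 16 at the `call` and back to 8 mod 16 just before `ret`.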



aw27

Quote from: johnsa on March 24, 2017, 04:21:20 AM
remove option win64:11 AND stackbase:RSP
and it reverts to working the old-fashioned JWASM and ML style way, which produces this (and works on that same test project I linked to):

But why does the stack become misaligned in my previous example?

johnsa

This comes back to what I was saying about all these modes and options: it all really boils down to a few "supportable" and sensible options.

Do pretty much nothing special, RBP frame pointer style / aligned.

option frame:auto


Or the "do everything optimally" way

option frame:auto
option stackbase:rsp
option win64:11


Or possibly the only OTHER option being

option frame:auto
option win64:1


So basically we have 3 modes:
totally dumb, smart, mostly dumb

So when I was saying earlier about removing all those modes, perhaps we just replace all of the above complex combinatorial stuff with 2 simple directives:

OPTION WIN64:SIMPLE
OPTION WIN64:AUTO

or something like that.

Here is an example of why I think a lot of these modes are irrelevant :




option casemap : none
option frame : auto
option win64 : 11
OPTION STACKBASE : RSP

OptimalProc PROTO aVar : QWORD, bVar : DWORD
AutoProc    PROTO aVar : QWORD, bVar : DWORD
AutoProc2   PROTO aVar : QWORD, bVar : DWORD

.code

sub1 proc private frame dest : ptr, src : ptr, val1 : qword, val2 : qword
mov dest, rcx
mov src, rdx
mov val1, r8
mov val2, r9
mov rax, qword ptr[rdx]
add rax, val1
add rax, val2
mov qword ptr[rcx], rax

ret
sub1 endp

getSum proc public frame dest : ptr, src : ptr, val1 : qword, val2 : qword
mov dest, rcx
mov src, rdx
mov val1, r8
mov val2, r9
INVOKE sub1, dest, src, val1, val2

INVOKE AutoProc, 10, 20
INVOKE AutoProc2, 10, 20
INVOKE OptimalProc, 10, 20
ret
getSum endp

; Or using AUTO mode
AutoProc PROC FRAME aVar : QWORD, bVar : DWORD

mov eax, edx
mov rdx, rcx

ret
AutoProc ENDP

; Or using AUTO mode
AutoProc2 PROC FRAME aVar : QWORD, bVar : DWORD

mov eax, edx
mov rdx, aVar

ret
AutoProc2 ENDP

; You might find people doing this to create a "bare" zero - overhead procedure.
OPTION PROLOGUE : NONE
OPTION EPILOGUE : NONE

OptimalProc PROC aVar : QWORD, bVar : DWORD

mov eax, edx ; EAX = bVar
mov rdx, rcx ; RDX = aVar

ret
OptimalProc ENDP

OPTION PROLOGUE:DEFAULTPROLOGUE
OPTION EPILOGUE:DEFAULTEPILOGUE

end



We have 3 procs: OptimalProc, which is coded the way some might write it to ensure minimal overhead (i.e. an optimal call), and two versions of the same proc using the win64:11 / RSP combination.



AutoProc:
8B C2                mov         eax,edx 
48 8B D1             mov         rdx,rcx 
C3                   ret 

AutoProc2:
48 89 4C 24 08       mov         qword ptr [aVar],rcx 
8B C2                mov         eax,edx 
48 8B 54 24 08       mov         rdx,qword ptr [aVar] 
C3                   ret 

OptimalProc:
8B C2                mov         eax,edx 
48 8B D1             mov         rdx,rcx 
C3                   ret 



As you can see, there is no benefit: AutoProc is just as efficient as the zero-overhead one, and in the case of AutoProc2, where we only reference ONE of the parameters by name, only that one is copied to shadow space.

So without all the options, you still have full control inside the proc as to how efficient you want it to be.
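
The size comparison can be made concrete by counting the opcode bytes shown in the listing above. A small Python sketch; the byte strings are copied by hand from that disassembly and the dictionary and function names are only for illustration:

```python
# Opcode bytes per instruction, transcribed from the three listings.
LISTINGS = {
    "AutoProc": ["8B C2", "48 8B D1", "C3"],
    "AutoProc2": ["48 89 4C 24 08", "8B C2", "48 8B 54 24 08", "C3"],
    "OptimalProc": ["8B C2", "48 8B D1", "C3"],
}


def code_size(instruction_bytes):
    """Total encoded size in bytes of a list of hex-encoded instructions."""
    return sum(len(bytes.fromhex(encoded)) for encoded in instruction_bytes)


if __name__ == "__main__":
    for name, encoded in LISTINGS.items():
        print(name, code_size(encoded), "bytes")
```

AutoProc and OptimalProc encode to the same bytes; the extra size of AutoProc2 comes entirely from the store to and reload from shadow space triggered by naming `aVar`.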


johnsa

Quote from: aw27 on March 24, 2017, 04:38:28 AM
Quote from: johnsa on March 24, 2017, 04:21:20 AM
remove option win64:11 AND stackbase:RSP
and it reverts to working the old-fashioned JWASM and ML style way, which produces this (and works on that same test project I linked to):

But why does the stack become misaligned in my previous example?

I think it's really that win64:11 can only work with stackbase:rsp.. the two are "married" :)

All these options don't give you any flexibility, but they open up the potential for errors/problems when the wrong combinations are used.

Could we go through every single combination and make it work in some way?
Probably, but based on my previous example I'd be hard pushed to go to that amount of effort to fix things that shouldn't even be there in the first place :)

There are so many permutations to handle that it makes it a maintenance nightmare too.

My current vote is to remove them all and replace them with 2 choices, SIMPLE / AUTO (or something along those lines). You then still have the option to remove the default prologue/epilogue as per my example, or to use a raw label (old-school style). That should give you every combination you need, and it doesn't make getting into 64-bit asm coding a horrible prospect. It keeps it as simple as it was to get into x86.

hutch--

John,

What you are suggesting here is the right move: an automated stack frame for high-level code, something in between if you can be bothered, and no stack frame at all for people who know what they are doing. I think more than 4 arguments needs some form of automation, but if you compare the stack overhead of loading the shadow space and then writing to the stack with the duration of the vast majority of high-level API and similar code, the lead and tail procedural code is trivial.

I am of the view that the pile of messy stack options you inherited from JWASM were mainly experiments that were useless, the faster you get rid of them, the more time you will have to write useful stuff.

johnsa

Agreed.

Right now my biggest challenge is trying to decide on a name for the options

OPTION WIN64:COOL
OPTION WIN64:SUPERCOOL

;)

aw27

Quote from: johnsa on March 24, 2017, 06:02:17 AM
I think it's really that win64:11 can only work with stackbase:rsp.. the two are "married" :)

I am sure you did not even notice that stackbase:rsp makes the code bigger and slower. Just test to see. Before appearing in this forum, I had tested the OPTION STACKBASE:RSP on my library with JWASM and abandoned it for such reasons.
This is point 1.

The second point is that you are breaking backward compatibility with something that works. This is never a good idea, but I can't stop you.

The third point is: the most important feature I was looking for was the capability to use XMM registers with the INVOKE statement. I thought the feature was not present in JWASM, but it was there, although not documented. Instead of confirming that, you boasted of having included the feature in HJWASM. Please, give me a break, lol. :badgrin:

The 4th point is that I am never convinced by arguments such as: "There are so many permutations to handle that it makes it a maintenance nightmare too". For me, you simply lost control, OPTION WIN64 always worked fine with JWASM.

So, I am going to drop out before the administrator decides to expel me; he is infuriated and keeps repeating that you guys are doing a great job. Good luck then!











jj2007

Quote from: aw27 on March 24, 2017, 01:50:15 PM
I am sure you did not even notice that stackbase:rsp makes the code bigger and slower.

José,

You are judging hard-working people here - that is not helpful. Automating the X64 ABI is pretty complex, and the HJWasm team and Hutch as well are doing their best to go beyond what the market offers. And they are doing it for free, as a hobby. Your contributions to solving the problem are certainly appreciated, but your judgments are unfair and unnecessary.