The MASM Forum

64 bit assembler => UASM Assembler Development => Topic started by: JK on April 05, 2021, 09:46:30 PM

Title: Stackframes
Post by: JK on April 05, 2021, 09:46:30 PM
I fear this is going to be a somewhat lengthy story, so i split it into digestible parts ...


I would like to have what i would call a "clean" stackframe in 32 and 64 bit, where "clean" means:
1.) E/RBP based stackframe
2.) argument and locals work by name
3.) top of locals is at or has a constant (the same in every procedure) offset to E/RBP
4.) order of locals is not changed, i.e. locals appear on the stack in the same order as they were defined in code
5.) no need to keep the stack balanced before leaving (epilogue restores E/RSP automatically)


Why?

1.) E/RSP is "free" for me to use, e.g. create temp storage on the stack on the fly
2.) makes coding so much easier, everything is relative to E/RBP and the assembler
     calculates the displacement for me
3.) i can rely on the fact, that my locals are at a distinct place relative to E/RBP,
     which allows e.g. for easily setting all or certain parts of them to zero or similar
4.) same as 3.)
5.) a mov E/RBP, E/RSP, mov E/RSP, E/RBP pair reliably restores the stack before leaving,
     meaning i can do all kinds of "things" in my procedure with the stack pointer
     (exception: in 64 bit it must be 16-byte aligned before calls) and i´m not obliged to keep
     track and clean up.

I know playing with the stack must follow rules, i cannot do literally everything, but as long as i play by the rules, i can do everything (even in 64 bit).


Basically such a stackframe would look like this:

; bottom of stack
;------------------ <- R/ESP points here  |  sub R/ESP, # of local bytes (+ alignment in 64 bit)
; local ...                               |
;------------------                       |
; local 2                                 |
;------------------                       |
; local 1 (DWORD)                         |                    -> local 1 = EBP - 4 / RBP - 4
;------------------ <- R/EBP points here  |  mov R/EBP, R/ESP  -> arg 1   = EBP + 8 / RBP + 10h
; old R/EBP                               |
;------------------                       |  push R/EBP
; return address                          |
;------------------                       |  call...
; arg 1                                   |
;------------------                       |
; arg 2                                   |
;------------------                       |
; arg 3                                   |
;------------------                       |
; arg 4                                   |
;------------------                       |
; arg 5                                   |
;------------------                       |
; arg 6                                   |
;------------------                       |
; ...                                     |
;------------------
; top of stack


The basic layout could be the same in 32 and 64 bit. In 64 bit the space for arg 1 to arg 4 must always be there, even if there are no arguments. This space (shadow space) can be used by the called procedure to save arg 1 to arg 4, which in 64 bit are passed by register.

One thing is still missing: registers to be saved by the callee must be placed on the stack somewhere.
There are 3 places which fit:
- before "old R/EBP"
- after "old R/EBP"
- after locals
In every case it is possible to automatically restore E/RSP to a correct value before leaving.
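
A minimal sketch of such a frame in 64 bit, using the third placement (saved registers after the locals). It is written by hand rather than generated by a PROLOGUE macro, and the names MyProc, LOCALS_SIZE and SAVED_SIZE as well as the choice of RBX/RSI are assumptions made only for illustration:

```asm
option prologue:none            ; suppress the assembler's own frame code
option epilogue:none

LOCALS_SIZE equ 20h             ; assumed: bytes of locals, kept a multiple of 16
SAVED_SIZE  equ 2*8             ; two callee-saved registers pushed below the locals

.code
MyProc proc
    push rbp                    ; old RBP
    mov  rbp, rsp               ; [rbp+8] = return address, [rbp+10h] = home slot of arg 1
    sub  rsp, LOCALS_SIZE       ; locals live in [rbp-LOCALS_SIZE] .. [rbp-1]
    push rbx                    ; saved registers go after the locals
    push rsi

    ; ... body: RSP may move here (push/pop, temporary allocations) ...

    lea  rsp, [rbp - LOCALS_SIZE - SAVED_SIZE]   ; back to the saved registers, wherever RSP was
    pop  rsi
    pop  rbx
    mov  rsp, rbp               ; drop the locals
    pop  rbp
    ret
MyProc endp
end
```

With these sizes RSP is 16-byte aligned again after the last push, because RSP is 8 modulo 16 on entry to a procedure.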


Question: is there any reason why this wouldn´t work? I know that this approach is not optimized, but i want it stable and reliable in the first place. If i want it better, faster, smaller, whatever ... i can always do so by hand or by using special option settings as available.


to be continued ...


Title: Re: Stackframes
Post by: JK on April 05, 2021, 10:04:09 PM
As usual, i started testing with UASM. Unfortunately i cannot get exactly what i want (see preceding post) using the available options. So i tried to write my own PROLOGUE/EPILOGUE. I´m a novice at this, so i might have made mistakes, but i think there is an error in UASM too.


I used the attached code (i left comments with my thoughts and results, which hopefully makes it easier to understand what i did and tried). To make things easier, i didn´t specify registers to save ("uses ...").


Now it´s getting complicated and i hope i can explain my findings in a comprehensible way.


Looking at the generated code for "testit proc", i see that arguments are not referenced correctly (the RBP displacement is off). This could be a failure in my PROLOGUE/EPILOGUE code, but it isn´t off in a consistent manner either. Comparing this with what is generated without my custom PROLOGUE/EPILOGUE, i can see the displacements are different. But the difference between displacements is different as well, which should not be the case, even if the basic error is in my code.

with my custom PROLOGUE/EPILOGUE (RBP consistently points to top of locals, order of locals is kept):

000000000124101B | 48:894C24 08             | mov qword ptr ss:[rsp+8],rcx            |
0000000001241020 | 66:0FD64C24 10           | movq qword ptr ss:[rsp+10],xmm1         |
0000000001241026 | 66:0FD65424 18           | movq qword ptr ss:[rsp+18],xmm2         |
000000000124102C | 4C:894C24 20             | mov qword ptr ss:[rsp+20],r9            |
0000000001241031 | 55                       | push rbp                                |
0000000001241032 | 48:8BEC                  | mov rbp,rsp                             |
0000000001241035 | 48:83EC 20               | sub rsp,20                              |
0000000001241039 | 8B45 40                  | mov eax,dword ptr ss:[rbp+40]           | x  +40 arg5
000000000124103C | 8B45 20                  | mov eax,dword ptr ss:[rbp+20]           | d4 +20 arg2: diff = 3 -> 3 x 8h = 18h -> not ok, is 20h
000000000124103F | 8B45 10                  | mov eax,dword ptr ss:[rbp+10]           |
0000000001241042 | 48:83EC 20               | sub rsp,20                              |
0000000001241046 | 48:8D4D E8               | lea rcx,qword ptr ss:[rbp-18]           |
000000000124104A | E8 B1FFFFFF              | call uasm_test_64.1241000               |
000000000124104F | 48:83C4 20               | add rsp,20                              |
0000000001241053 | 48:8D45 40               | lea rax,qword ptr ss:[rbp+40]           |
0000000001241057 | 48:8D55 20               | lea rdx,qword ptr ss:[rbp+20]           |
000000000124105B | 48:8D75 28               | lea rsi,qword ptr ss:[rbp+28]           |
000000000124105F | 48:8D7D 30               | lea rdi,qword ptr ss:[rbp+30]           |
0000000001241063 | 48:8D45 FC               | lea rax,qword ptr ss:[rbp-4]            |
0000000001241067 | 48:8D5D E8               | lea rbx,qword ptr ss:[rbp-18]           |
000000000124106B | 48:8D4D E4               | lea rcx,qword ptr ss:[rbp-1C]           |
000000000124106F | 48:8BE5                  | mov rsp,rbp                             |
0000000001241072 | 5D                       | pop rbp                                 |
0000000001241073 | C3                       | ret                                     |


without a custom PROLOGUE/EPILOGUE (caveat here is: locals are changed in order and locals don´t start at a constant offset from RBP, offset is different for each procedure):

00000000013C1024 | 48:894C24 08             | mov qword ptr ss:[rsp+8],rcx            |
00000000013C1029 | F3:0F114C24 10           | movss dword ptr ss:[rsp+10],xmm1        |
00000000013C102F | F2:0F115424 18           | movsd qword ptr ss:[rsp+18],xmm2        |
00000000013C1035 | 4C:894C24 20             | mov qword ptr ss:[rsp+20],r9            |
00000000013C103A | 48:55                    | push rbp                                |
00000000013C103C | 48:83EC 20               | sub rsp,20                              |
00000000013C1040 | 48:8D6C24 10             | lea rbp,qword ptr ss:[rsp+10]           |
00000000013C1045 | 8B45 40                  | mov eax,dword ptr ss:[rbp+40]           | x  +40, arg5                                     
00000000013C1048 | 8B45 28                  | mov eax,dword ptr ss:[rbp+28]           | d4 +28, arg2 diff = 3 -> 3 x 8h = 18h -> ok       
00000000013C104B | 8B45 20                  | mov eax,dword ptr ss:[rbp+20]           |
00000000013C104E | 48:83EC 20               | sub rsp,20                              |
00000000013C1052 | 48:8D4D 00               | lea rcx,qword ptr ss:[rbp]              |
00000000013C1056 | E8 A5FFFFFF              | call uasm_test_64.13C1000               |
00000000013C105B | 48:83C4 20               | add rsp,20                              |
00000000013C105F | 48:8D45 40               | lea rax,qword ptr ss:[rbp+40]           |
00000000013C1063 | 48:8D55 28               | lea rdx,qword ptr ss:[rbp+28]           |
00000000013C1067 | 48:8D75 30               | lea rsi,qword ptr ss:[rbp+30]           |
00000000013C106B | 48:8D7D 38               | lea rdi,qword ptr ss:[rbp+38]           |
00000000013C106F | 48:8D45 FC               | lea rax,qword ptr ss:[rbp-4]            |
00000000013C1073 | 48:8D5D 00               | lea rbx,qword ptr ss:[rbp]              |
00000000013C1077 | 48:8D4D F8               | lea rcx,qword ptr ss:[rbp-8]            |
00000000013C107B | 48:8D65 10               | lea rsp,qword ptr ss:[rbp+10]           |
00000000013C107F | 5D                       | pop rbp                                 |
00000000013C1080 | C3                       | ret                                     |


So while the offset in the RBP displacement for arguments could be caused by an error in my PROLOGUE code, the distance between arguments should be the same in both cases, but it isn´t!


testit proc (default PROLOGUE, OPTION STACKBASE RBP, OPTION WIN64:5):
RBP = RSP + 10h after PROLOGUE -> arguments are resolved correctly

argument  x:  RBP+40, (arg # 5)
argument d4:  RBP+28, (arg # 2) diff = 3 -> 3 x 8h = 18h -> ok (expected)


testit proc (custom PROLOGUE, OPTION STACKBASE RBP, OPTION WIN64:5):
RBP = RSP + 20h after PROLOGUE -> arguments are 10h off

argument  x:  RBP+40, (arg # 5)
argument d4:  RBP+20, (arg # 2) diff = 3 -> 3 x 8h = 18h -> in fact it is 20h, which is wrong


additional question: the value returned by the PROLOGUE doesn´t seem to have any effect on the generated code,
e.g. returning different numbers or <0> doesn´t change anything. OTOH returning nothing throws an error. So what is this return value for?


JK
Title: Re: Stackframes
Post by: johnsa on April 05, 2021, 11:52:29 PM
I think there are a number of issues here...

1. if you use stackbase:RSP mode, then that gives you RBP free for use (if you really need an extra register).
2. the automatic prologue/epi. does a lot more than just deal with params and locals, there is alignment to consider as well as pre-allocation for the largest contained invoke within the proc
.. i.e. if an invoke needs to reserve 128 bytes of stack for its params, the prologue in the parent proc already handles this, which I don't think you can replicate via a custom prologue macro.
3. The locals are re-arranged to ensure that they can be packed and aligned efficiently.
4. The home-space slots are only filled if the parameter is actually used (a minor optimisation to avoid copying the reg params if they're unused, or used directly via register)

The zeroing of locals as a batch is an issue. I removed option zerolocals as it wasn't fully implemented, and it's not optimal as you frequently don't need to zero all of them, but only specific ones.
I was considering adding a new directive, e.g. LOCALZ, which would produce the code to zero just the specific local, be it a primitive, struct, array etc.

Assuming you use stackbase RSP and had LOCALZ directive, that should cover all your requirements?

Title: Re: Stackframes
Post by: JK on April 06, 2021, 12:59:30 AM
Thanks for your reply!

Quote: 1. if you use stackbase:RSP mode, then that gives you RBP free for use (if you really need an extra register).

I want RSP to be free in a sense so that i can do pushes and pops in my procedures. It´s not about gaining an extra register, there are more than enough in 64 bit.

Quote: 2. the automatic prologue/epi. does a lot more than just deal with params and locals, there is alignment to consider as well as pre-allocation for the largest contained invoke within the proc
.. i.e. if an invoke needs to reserve 128 bytes of stack for its params, the prologue in the parent proc already handles this, which I don't think you can replicate via a custom prologue macro.

I have seen that, but this kind of stackframe layout (with pre-allocation for the largest contained invoke within the proc) makes it impossible to do pushes and pops inside the current procedure, because depending on the space needed the pushed data might be overwritten by the next stackframe (64 bit). This is why i would like to have an RBP based stackframe (which does sub RSP, offset before the call and add RSP, offset after the call (essentially what WIN64:1 to 7 does, but without the "downsides" (my personal view) i mentioned)). The price is less optimized code, but i´m willing to pay it.


Quote: 3. The locals are re-arranged to ensure that they can be packed and aligned efficiently
Yes, but this way you must fill each local you want to zero one by one, which is far from efficient. By keeping the order, space is wasted, that´s true, but OTOH in 64 bit there is more than enough space - much more than we ever had in 32 bit.


Quote: 4. The home-space slots are only filled if the parameter is actually used (a minor optimisation to avoid copying the reg params if they're unused, or used directly via register)
This is a quite useful feature IMHO, because it happens automatically. In my code i must do it by hand "<rxxr>", but it´s still possible.


There is no need to always zero all locals, but sometimes it would make things easier. My idea was writing a macro (zerolocals) taking none, one or two parameters.
- if no parameter is given, all locals are set to zero
- if one parameter is given, all locals from start to (and including) this local are set to zero
- if two parameters are given, all locals starting with the first given local up to the second  are set to zero

By arranging my locals in an appropriate order i can ensure that this can be done very effectively. I can optimize this process inside my macro (mov a few bytes vs. rep stosb), i can even repeat this inside my procedure with different parameters for different locals in different places.
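
A rough sketch of the no-parameter case of such a macro, for 64 bit. The macro name ZeroAllLocals and its size parameter are hypothetical; the sketch assumes an RBP based frame in which all locals form one contiguous block directly below RBP, that the direction flag is clear, and that RSP is free to move (it preserves RDI with push/pop):

```asm
; Hypothetical macro: zero nbytes of locals located directly below RBP.
ZeroAllLocals macro nbytes
    push rdi                    ; RDI is callee-saved in the Win64 ABI
    lea  rdi, [rbp - (nbytes)]  ; lowest address of the block of locals
    mov  ecx, nbytes            ; byte count (writing ECX zero-extends into RCX)
    xor  eax, eax               ; AL = 0 is the fill value
    rep  stosb                  ; store RCX zero bytes upwards from [RDI]
    pop  rdi
endm

; usage inside a procedure whose frame reserves 40h bytes of locals:
;     ZeroAllLocals 40h
```

RAX and RCX are clobbered. The one- and two-parameter variants would only need a different start address and byte count, which works as long as the locals keep their declared order.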


Please don´t get me wrong, i do not expect you to code this for me, just because i want to have my way! I´m willing to do it myself, but currently i cannot, because:
- possibly something goes wrong inside UASM implementing a custom PROLOGUE/EPILOGUE, see my example above
- i´m doing something wrong, so would you please help me do it right
- both


This is not an easy matter, i know and it gives room for discussions, but please hang on - thanks,


JK
Title: Re: Stackframes
Post by: johnsa on April 06, 2021, 01:51:20 AM
Ok, well I'd strongly advise against doing pushes and pops in 64bit, or any sort of manual stack manipulation, it's likely to be error prone and give you hard to find bugs and I can't think of a good reason to do it.
That said, feel free to do whatever you want :) The point of assembler is not to enforce controls, so in that spirit I'll check out the issue with the custom prologue and see if I can help there.

option zerolocals may still be useful too, perhaps we have both LOCALZ and the OPTION, although these days I'm tending to avoid adding more directive complexity than required. Happy to hear votes on the subject as to which is preferred.

I will have a look at the re-ordering of locals again too.
Title: Re: Stackframes
Post by: jj2007 on April 06, 2021, 02:30:56 AM
Quote from: johnsa on April 06, 2021, 01:51:20 AM
Ok, well I'd strongly advise against doing pushes and pops in 64bit, or any sort of manual stack manipulation, it's likely to be error prone and give you hard to find bugs and I can't think of a good reason to do it.

The "good reason" is that saving regs to global variables bloats the exe enormously. The only reason against pushing is that it can't be done (really, it's forbidden!) if there is any call in the proc.

The bloat argument applies also to locals if their total size exceeds 128 bytes, as discussed in ZeroLocals (http://masm32.com/board/index.php?topic=9258.msg101786#msg101786).

The same logic applies to the ordering of locals: if you put the "big" locals first, such as buffer[128]:byte, then all other variables use the long encodings, which are 3 bytes longer: bloat. Hutch has a different opinion...
Quote from: hutch-- on April 01, 2021, 10:10:16 PM
Better bloat than broken. If you are dealing with instructions that need alignment you have no choice.
... but I won't change my coding style: size matters, because the code cache is limited. That's also an argument to use rbp instead of rsp for the frame: all [rsp+x] instructions are one byte longer than [rbp+x].

Since I care for compatibility between UAsm, AsmC and MASM, I will not suggest to introduce an "align 16 once you encounter the first variable that needs it" :cool:
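
For illustration, the size differences mentioned above look like this in the encodings (64-bit mode, instruction bytes given in the comments):

```asm
    mov eax, dword ptr [rbp+8]      ; 8B 45 08             3 bytes, 8-bit displacement
    mov eax, dword ptr [rsp+8]      ; 8B 44 24 08          4 bytes, RSP as base needs a SIB byte
    mov eax, dword ptr [rbp-4]      ; 8B 45 FC             3 bytes, displacement fits in -128..127
    mov eax, dword ptr [rbp-84h]    ; 8B 85 7C FF FF FF    6 bytes, 32-bit displacement
```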
Title: Re: Stackframes
Post by: JK on April 06, 2021, 02:38:32 AM
Quote: Happy to hear votes on the subject as to which is preferred

If LOCALZ is meant to work like LOCAL but to simultaneously zero the listed variables, i would prefer it over an OPTION:ZEROLOCALS, because it gives more control.


What i would prefer most, is having E/RBP pointing to a fixed location inside the stack frame (preferably to the top of locals, but by all means having a constant offset to the top of locals and to the procedure´s arguments) after the PROLOGUE with E/RBP based stackframes. This makes debugging so much easier, if you cannot have symbols.

Currently RBP´s offset to the top of locals (i think i remember ESP being stable in this respect) is different in different procedures. This makes debugging harder than necessary, because for each and every procedure, i must look where locals and arguments start in relation to RBP. It would be so much easier, if i simply could rely on:
- locals start at RBP - some fixed offset
- arguments start at RBP + some other fixed offset

This is exactly what my proposed stack frame layout ensures. I think you had a hard time developing WIN64:15, a lot of calculations must be done for optimizing out RBP at all - and you made it work. It is great to have such an option! I´m not against optimization options, but (at least for me) in the development phase, it´s a nightmare debugging such code.

In general my coding plan is: first make it work, keep it simple, don´t make it overly complicated, be sure you can debug it. And if it works, make it better, faster, smaller, whatever ...


Thanks for your help!


JK 
Title: Re: Stackframes
Post by: jj2007 on April 06, 2021, 02:52:05 AM
Quote: It would be so much easier, if i simply could rely on:
- locals start at RBP - some fixed offset
- arguments start at RBP + some other fixed offset

That's what you get with (for example) the JBasic prolog macro: [rbp-4] is the first local, [rbp+10h] the first argument.
Title: Re: Stackframes
Post by: nidud on April 06, 2021, 03:06:35 AM
deleted
Title: Re: Stackframes
Post by: JK on April 06, 2021, 03:21:02 AM
@jj,

Quote: The "good reason" is that saving regs to global variables bloats the exe enormously
i agree on this and the next paragraphs, but i disagree on this:
Quote: The only reason against pushing is that it can't be done (really, it's forbidden!) if there is any call in the proc.
Maybe i´m wrong, but IMHO it depends on how you build a 64 bit stack frame.

- i agree, an RSP based stack frame will obviously not work.
- but an RBP based stack frame, which makes enough room (sub RSP, ...) for the shadow space and arguments (either push arg 5 and higher, or sub RSP, ... + mov [RSP+...], ...) before the call and corrects the stack afterwards (add RSP, ...) doesn´t have these restrictions to my understanding.

You must make RSP align 16, before a call, because you cannot know, if the called (external) procedure uses locals, which actually need alignment. If such a procedure is called with wrong RSP alignment, the alignment of these locals will now become wrong as well. So depending on the procedure you might get away with RSP align 8 before a call or not. But if the stack is built like i just described, you will always get away, if you make sure RSP is aligned 16 before a call, regardless how many pushes and pops there were in between (of course you must not pop more than you pushed)
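
A sketch of one way to guarantee this alignment regardless of earlier pushes. SomeProc is a hypothetical external procedure, and RBX is assumed to have been saved by the current procedure's PROLOGUE so that it may be used as scratch here:

```asm
    push rsi                    ; some temporary data; RSP alignment is now unknown
    mov  rbx, rsp               ; remember RSP (RBX is callee-saved, so it survives the call)
    and  rsp, -16               ; round RSP down to a multiple of 16
    sub  rsp, 20h               ; shadow space for the callee, below the pushed data
    mov  ecx, 100               ; arg 1
    call SomeProc               ; hypothetical external procedure
    mov  rsp, rbx               ; undo shadow space and alignment in one step
    pop  rsi
```

Because the shadow space is allocated below the pushed data, a callee that writes to its home slots does not overwrite what was pushed.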


JK

Title: Re: Stackframes
Post by: daydreamer on April 06, 2021, 04:10:47 AM
Quote from: jj2007 on April 06, 2021, 02:30:56 AM
Quote from: johnsa on April 06, 2021, 01:51:20 AM
Ok, well I'd strongly advise against doing pushes and pops in 64bit, or any sort of manual stack manipulation, it's likely to be error prone and give you hard to find bugs and I can't think of a good reason to do it.

The "good reason" is that saving regs to global variables bloats the exe enormously. The only reason against pushing is that it can't be done (really, it's forbidden!) if there is any call in the proc.

The bloat argument applies also to locals if their total size exceeds 128 bytes, as discussed in ZeroLocals (http://masm32.com/board/index.php?topic=9258.msg101786#msg101786).

The same logic applies to the ordering of locals: if you put the "big" locals first, such as buffer[128]:byte, then all other variables use the long encodings, which are 3 bytes longer: bloat. Hutch has a different opinion...
Quote from: hutch-- on April 01, 2021, 10:10:16 PM
Better bloat than broken. If you are dealing with instructions that need alignment you have no choice.
... but I won't change my coding style: size matters, because the code cache is limited. That's also an argument to use rbp instead of rsp for the frame: all [rsp+x] instructions are one byte longer than [rbp+x].

Since I care for compatibility between UAsm, AsmC and MASM, I will not suggest to introduce an "align 16 once you encounter the first variable that needs it" :cool:

But pushes and pops in 32 bit are one-byte instructions, unlike most other instructions. It´s not only size that matters, speed is affected indirectly as well.
With the movsb/stosb snippet for copying or zeroing a local array, you can directly reuse rdi as a pointer afterwards.



 
Title: Re: Stackframes
Post by: jj2007 on April 06, 2021, 06:03:39 AM
Quote from: nidud on April 06, 2021, 03:06:35 AM
You may get the size of this using the @ReservedStack variable.

Ups... error A2006:undefined symbol : @ReservedStack
Title: Re: Stackframes
Post by: LiaoMi on April 06, 2021, 06:54:01 AM
Quote from: jj2007 on April 06, 2021, 06:03:39 AM
Quote from: nidud on April 06, 2021, 03:06:35 AM
You may get the size of this using the @ReservedStack variable.

Ups... error A2006:undefined symbol : @ReservedStack

https://www.japheth.de/JWasm/Manual.html

3.9 Directive OPTION WIN64
Directive OPTION WIN64 allows to set parameters for the Win64 output format if this format (see -win64 cmdline option) is selected. For other output formats, this option has no effect. The syntax for the directive is:
OPTION WIN64: switches
accepted values for switches are:
Store Register Arguments [ bit 0 ]:
- 0: the "home locations" (also sometimes called "shadow space") of the first 4 register parameters are uninitialized. This is the default setting.
- 1: register contents of the PROC's first 4 parameters (RCX, RDX, R8 and R9 ) will be copied to the "home locations" within a PROC's prologue.
INVOKE Stack Space Reservation [bit 1]:
- 0: for each INVOKE the stack is adjusted to reserve space for the parameters required for the call. After the call, the space is released again. This is the default setting.
- 1: the maximum stack space required by all INVOKEs inside a procedure is computed by the assembler and reserved once on the procedure's entry. It's released when the procedure is exited. If INVOKEs are to be used outside of procedures, the stack space has to be reserved manually!
Note: an assembly time variable, @ReservedStack, is created internally when this option is set. It will reflect the value computed by the assembler. It should also be mentioned that when this option is on, and a procedure contains no INVOKEs at all, then nevertheless the minimal amount of 4*8 bytes is reserved on the stack.
Warning: You should have understood exactly what this option does BEFORE you're using it. Using PUSH/POP instruction pairs to "save" values across an INVOKE is VERBOTEN if this option is on.

https://github.com/Terraspace/UASM/blob/master/procJWasm.c
/* v2.11: use @ReservedStack only if option win64:2 is set */
Title: Re: Stackframes
Post by: JK on April 06, 2021, 08:29:34 AM
Quote: 1: the maximum stack space required by all INVOKEs inside a procedure is computed by the assembler and reserved once on the procedure's entry. It's released when the procedure is exited.
Thanks LiaoMi for clarifying it - this is how i interpret this option as well. Using this option saves you some otherwise necessary sub/add RSP,... , but requires RSP to remain unchanged inside a procedure (after the PROLOGUE, before the EPILOGUE). This is one way of managing stack frames in 64 bit, but it is not the only way to do it and it´s not the way i want it.

Quote: Warning: You should have understood exactly what this option does BEFORE you're using it. Using PUSH/POP instruction pairs to "save" values across an INVOKE is VERBOTEN if this option is on
i absolutely agree, this is why i used WIN64:5 (bit 1 = 0, meaning this option is off) in my code example and this is why i´m trying to implement my custom PROLOGUE and EPILOGUE. Among other things i want to be able to change RSP inside a procedure (i know i must keep 16-byte alignment before calls).


Basically you can build your (RBP based) 64 bit stack frames just like you do it in 32 bit:
- push arguments (+ shadow space in 64 bit for argument 1 to 4),
- the call pushes the return address
     BTW, this is what INVOKE does for me up to here anyway. At this point the assembler could precalculate the space needed for locals of this procedure
     and for the highest number of arguments of all INVOKEs in this procedure and set RSP accordingly (including 16-byte alignment), but it isn´t obliged to do so.
     Advantage: no further stack adjustment needed for all INVOKEs in this procedure
     Disadvantage: RSP MUST NOT be changed, no further local allocations or push/pop possible

- save registers to shadow space (optional)
- push RBP
- mov RBP, RSP
- make room for locals (sub RSP, ...)
- push non-volatile registers (this could be done before or after "push RBP" as well - at any rate correct alignment of RSP must be ensured as a result).

Now RSP is at the bottom of all data which must not be overwritten. It can be freely used for whatever i want (as long as 16-byte alignment is ensured before calls). According to what i read about the 64 bit ABI, such a layout is not forbidden. No one forces you to build a stack frame in a way that you must not change RSP inside procedures; this is a decision taken for optimisation reasons, but it´s not a must IMHO.
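
The caller side of this scheme (stack adjusted per call, as with WIN64 bit 1 = 0) could be sketched as follows for a call with five arguments. FiveArgs is a hypothetical callee and the argument values are placeholders:

```asm
    ; RSP is assumed to be 16-byte aligned at this point
    sub  rsp, 30h               ; 20h shadow space + 8 for arg 5 + 8 padding to keep alignment
    mov  ecx, 1                 ; arg 1
    mov  edx, 2                 ; arg 2
    mov  r8d, 3                 ; arg 3
    mov  r9d, 4                 ; arg 4
    mov  qword ptr [rsp+20h], 5 ; arg 5 sits directly above the shadow space
    call FiveArgs               ; hypothetical callee
    add  rsp, 30h               ; release the space again
```

Anything pushed before the sub stays above the reserved area, so the callee cannot overwrite it through its home slots.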


JK 
Title: Re: Stackframes
Post by: nidud on April 06, 2021, 08:48:56 AM
deleted
Title: Re: Stackframes
Post by: jj2007 on April 06, 2021, 09:20:01 AM
Quote from: JK on April 06, 2021, 08:29:34 AM
Now RSP is at the bottom of all data, which must not be overwritten. It can be freely used for whatever i want (as long as 16 alignment is ensured before calls). According to what i read about the 64 bit ABI, such a layout is not forbidden. No one forces you to build a stack frame in a way, that you must not change RSP inside procedures, this is a decision taken for optimisation reasons, but it´s not a must IMHO.

Be careful. Use the FillShadowSpace macro (http://masm32.com/board/index.php?topic=9270.msg101791#msg101791) to see what happens to your shadow space if you call one of the rare WinAPI functions (Sleep, for example) that actually use it :cool:
  FillShadowSpace
   int 3
   push rsi
   push rdi
  jinvoke Sleep, 100
  pop rdi
  pop rsi
Title: Re: Stackframes
Post by: hutch-- on April 06, 2021, 01:08:31 PM
John has given the same warning that I have: DON'T alter the stack with PUSH/POP instructions. If you really have to preserve data in that manner, use a LOCAL and MOV the data to it in the normal manner:

mov reg, data
mov localname, reg

Unless you enjoy p*ssing around trying to find why that app will not start, don't make a mess of the stack.

A couple of basic things here, creative genius and personal preference certainly have their place but that place is not how the mechanics of the operating system work. Get the mechanics of the OS as they are designed to work THEN apply creative genius and personal preference and you will get what you are after.
Title: Re: Stackframes
Post by: jj2007 on April 06, 2021, 06:10:32 PM
Quote from: hutch-- on April 06, 2021, 01:08:31 PM
Unless you enjoy p*ssing around trying to find why that app will not start, don't make a mess of the stack

Practically all apps will start, unfortunately. The problem will bite you much later, as most WinAPI calls don't use the shadow space. Until now, I found only Sleep() being a shadow space user.

This is Win7-64. I wonder how the situation is on Win10, built with more recent compilers... any evidence from the UAsm/AsmC/ML64 developers?
Title: Re: Stackframes
Post by: hutch-- on April 06, 2021, 08:47:59 PM
With about 4 years of practice, I have yet to see misaligned procedures do anything else than not start. No doubt the OS loader will do something, but the notion of "start" is the app appearing on the screen, and with misaligned code via stack blunders, you try and run it and nothing happens and you don't get told anything either.

Effectively the OS loader spits the dummy and the app will not run.

This is on Win 10 64 bit which I have been using for the last 5 years.
Title: Re: Stackframes
Post by: jj2007 on April 06, 2021, 09:16:29 PM
Quote from: hutch-- on April 06, 2021, 08:47:59 PM
I have yet to see misaligned procedures do anything else than not start.

Most developers grasp quickly the concept of align 16. The real issue are the subtle problems that may arise when you push+pop pairwise. It looks good because the stack remains aligned for use with xmm regs, and most of the time nothing bad happens, but...
Title: Re: Stackframes
Post by: KradMoonRa on April 06, 2021, 09:55:22 PM
Still learning the bad way...

Part of the assembly that I have to code in order to research platform-specific SIMD performance benefits in UASM.

Not only do I have to align the stack before a call to a procedure, I also have to reserve stack space for all the default arguments, and only the default arguments, of the platform convention: 4*8+1*8 on Win, 6*8+1*8 on Lin.

Sometimes using sp has benefits.
Sometimes abusing sp causes segmentation faults.

This runs perfectly....


; Constructor
procstart _uasm_CPUFeatures_Init, callconv, void, < >, < >, infolevel:dword
    ifdef __x32__
        ifdef __windows__
            mov     __uasm_dt_CPUFeatures_infolevel,       dp0()
            xor             dp0(),                    dp0()
        endif
        ifdef __unix__
            mov     __uasm_dt_CPUFeatures_infolevel,       infolevel
            ;mov             [dp0()+4],                null
        endif
    endif ;__x32__
    ifdef __x64__
            mov     __uasm_dt_CPUFeatures_infolevel,       dp0()
            xor             dp0(),                    dp0()
    endif ;__x64__

    ifdef __x32__
            push                ebx
                                                                            ; detect if cpuid instruction supported by microprocessor:
            pushfd
            pop                 eax
            btc                 eax,                    21                  ; check if cpuid bit can toggle
            push                eax
            popfd
            pushfd
            pop                 ebx
            xor                 ebx,                    eax
            bt                  ebx,                    21
            jc                  CPUInitNoID                                 ; cpuid not supported
            xor                 eax,                    eax                 ; 0
            ; /* %eax=00H, %ecx %ebx */
            mov     __uasm_dt_CPUFeatures_CPUID,             true

            cpuid                                                           ; get number of cpuid functions
            test                eax,                    eax
            jnz                 CPUInitIdentificable                        ; taken when function 1 is supported
    CPUInitNoID:
            .if (__uasm_dt_CPUFeatures_infolevel >= 1) ;infolevel >= 1
            push                edi
            ; processor has no CPUID
            mov      dword ptr [edi],                   '8038'              ; Write text '80386 or 80486'
            mov      dword ptr [edi+4],                 '6 or'
            mov      dword ptr [edi+8],                 ' 804'
            mov      dword ptr [edi+12],                '86'                ; End with 0

            mov     __uasm_dt_CPUFeatures_ProcessorName,     edi            ; Pointer to result
            pop                 edi
            .endif
            pop                 ebx
            jmp                 CPUInitEND
    endif ;__x32__

    ifdef __x32__
            pop                 ebx
    CPUInitIdentificable:
            push                ebp
            mov                 ebp,                esp
            sub                 esp,                16          ; 3*4=12+4 Align 8
            ;mov                [esp],               esp
            mov                [esp],               ebx
            mov                [esp+4],             esi
            mov                [esp+8],             edi
            ;push                esp
            ;push                ebp
            ;push                ebx
            ;push                esi
            ;push                edi
    endif ;__x32__
    ifdef __x64__
            push                rbp
            mov                 rbp,                rsp
        ifdef __windows__
            sub                 rsp,                64          ; 7*8=56+8 Align 16
        else
            sub                 rsp,                48          ; 5*8=40+8 Align 16
        endif
            ;mov                [rsp],               rsp
            mov                [rsp],               rbx
        ifdef __windows__
            mov                [rsp+8],             rsi
            mov                [rsp+16],            rdi
            mov                [rsp+24],            r11
            mov                [rsp+32],            r12
            mov                [rsp+40],            r14
            mov                [rsp+48],            r15
        else
            mov                [rsp+8],             r11
            mov                [rsp+16],            r12
            mov                [rsp+24],            r14
            mov                [rsp+32],            r15
        endif
            ;push                rbx
        ;ifdef __windows__
            ;push                rsi
            ;push                rdi
        ;endif
            ;push                rsp
            ;push                rbp
            ;push                r11
            ;push                r12
            ;push                r14
            ;push                r15
    endif ;__x64__
    ;.........................blablablablablas------------------
    ;.........................blablablablablas------------------
    ;.........................blablablablablas------------------
    ;.........................blablablablablas------------------
    ;.........................blablablablablas------------------

not_supported:
    ifdef __x32__
            mov                 edi,               [esp+8]
            mov                 esi,               [esp+4]
            mov                 ebx,               [esp]
            ;mov                 esp,               [esp]
            add                 esp,                16
            mov                 esp,                ebp
            pop                 ebp
            ;pop                edi
            ;pop                esi
            ;pop                ebx
            ;pop                ebp
            ;pop                esp
    endif ;__x32__
    ifdef __x64__
        ifdef __windows__
            mov                 r15,               [rsp+48]
            mov                 r14,               [rsp+40]
            mov                 r12,               [rsp+32]
            mov                 r11,               [rsp+24]
            mov                 rdi,               [rsp+16]
            mov                 rsi,               [rsp+8]
        else
            mov                 r15,               [rsp+32]
            mov                 r14,               [rsp+24]
            mov                 r12,               [rsp+16]
            mov                 r11,               [rsp+8]
        endif
            mov                 rbx,               [rsp]
            ;mov                 rsp,               [rsp]
        ifdef __windows__
            add                 rsp,                64
        else
            add                 rsp,                48
        endif
            mov                 rsp,                rbp
            pop                 rbp
            ;pop                r15
            ;pop                r14
            ;pop                r12
            ;pop                r11
        ;ifdef __windows__
            ;pop                rdi
            ;pop                rsi
        ;endif
            ;pop                rbx
            ;pop                rbp
            ;pop                rsp
    endif ;__x64__
    CPUInitEND:  ; finished
            ret
procend



public main
main proc (dword) argc:dword, argv:ptr ptr byte, envp:ptr ptr byte
    ; space for 4 arguments + 16byte aligned stack
    sub             rsp,            28h
    call            _uasm_CPUFeatures_Init
    call            _uasm_CPUFeatures_ProcessorName
    mov             rp0(),          rret()
    call            printf
    mov             rp0(),          cstr(stringwith, " With caches sizes:"," L1= ","%I64d"," bytes, L2= ", "%I64d"," bytes, L3= ","%I64d"," bytes.")
    mov             rp1(),          __uasm_dt_CPUFeatures_DataCacheSizeL1
    mov             rp2(),          __uasm_dt_CPUFeatures_DataCacheSizeL2
    mov             rp3(),          __uasm_dt_CPUFeatures_DataCacheSizeL3
    call            printf
    mov             rp0(),          cstr(datawith, 10,"With:",10,0)
    call            printf
    call            _uasm_CPUFeatures_Fin
    xor             eax,            eax
    xor             ecx,            ecx
    call            exit
    add             rsp,            28h
    ret
main endp
Title: Re: Stackframes
Post by: JK on April 07, 2021, 12:10:25 AM
@jj (re post #15)

i'll give another example; consider this code:
;option stackbase:RBP
;option win64:7

include windows.inc
includelib kernel32.lib
includelib user32.lib                                 ;DrawTextEx


.code


testit proc uses rbx rsi rdi, x:dword, f4:real4, z:qword, f8:real4, n:qword
;*************************************************************************************
; proc
;*************************************************************************************
local tx :qword                                       ;-8h     size 8
LOCAL ty :qword                                       ;-10h    size 8
                                                      ;total size of locals = 10h

  nop
int 3

  lea rax, x                                          ;address of first argument
  lea rax, tx                                         ;address of first local


mov rax, 0ABCDEFh
push rax
push rax
push rax
push rax
push rax
push rax

  invoke Sleep, 10
;  invoke DrawTextEx, 0, 0, 0, 0, 0, 0                 ;sub rsp,38 -> sub rsp,48 (6 arguments)

pop RAX
pop RAX
pop RAX
pop RAX
pop RAX
pop RAX


ret


testit endp


;*************************************************************************************


start proc uses rbx rsi rdi ;r15
;*************************************************************************************
; main proc
;*************************************************************************************
local x  :dword                                       ;-4h      size 4
local f4 :real4                                       ;-8h      size 4
local z  :Qword                                       ;-10h     size 8
local f8 :REAL8                                       ;-18h     size 8
local n  :qword                                       ;-20h     size 8
local r  :RECT                                        ;-30h     size 10h


  nop                                                 ;procedure code starts here
  lea rax, x                                          ;address of first local

  nop                                                 ;invoke starts here
  invoke testit, x, 1.4, z, f8, n                     ;5 arguments
  invoke ExitProcess, 0
  ret


start endp


end start


please compile it once like it is and once with the options set (uncomment the first two lines); the resulting code will be fundamentally different in how the stack pointer moves.

Version 1:
;*************************************************************************************
; without any options set -> RBP points to top of locals, 1st arg is at RBP + 10h
;*************************************************************************************
;00000000012C1000 | 55                       | push rbp                                |
;00000000012C1001 | 48:8BEC                  | mov rbp,rsp                             |
;00000000012C1004 | 48:83C4 F0               | add rsp,FFFFFFFFFFFFFFF0                | 10h for locals
;00000000012C1008 | 53                       | push rbx                                |
;00000000012C1009 | 56                       | push rsi                                |
;00000000012C100A | 57                       | push rdi                                |
;00000000012C100B | 90                       | nop                                     |
;00000000012C100C | CC                       | int3                                    |
;00000000012C100D | 48:8D45 10               | lea rax,qword ptr ss:[rbp+10]           | 1. argument
;00000000012C1011 | 48:8D45 F8               | lea rax,qword ptr ss:[rbp-8]            | 1. local
;00000000012C1015 | 48:C7C0 EFCDAB00         | mov rax,ABCDEF                          |
;00000000012C101C | 50                       | push rax                                |
;00000000012C101D | 50                       | push rax                                |
;00000000012C101E | 50                       | push rax                                |
;00000000012C101F | 50                       | push rax                                |
;00000000012C1020 | 50                       | push rax                                |
;00000000012C1021 | 50                       | push rax                                |
;00000000012C1022 | 48:83EC 20               | sub rsp,20                              | make room for 4 arguments
;00000000012C1026 | B9 0A000000              | mov ecx,A                               | arg 1
;00000000012C102B | FF15 CF0F0000            | call qword ptr ds:[<&Sleep>]            |
;00000000012C1031 | 48:83C4 20               | add rsp,20                              | correct stack
;00000000012C1035 | 58                       | pop rax                                 |
;00000000012C1036 | 58                       | pop rax                                 |
;00000000012C1037 | 58                       | pop rax                                 |
;00000000012C1038 | 58                       | pop rax                                 |
;00000000012C1039 | 58                       | pop rax                                 |
;00000000012C103A | 58                       | pop rax                                 |
;00000000012C103B | 5F                       | pop rdi                                 |
;00000000012C103C | 5E                       | pop rsi                                 |
;00000000012C103D | 5B                       | pop rbx                                 |
;00000000012C103E | C9                       | leave                                   |
;00000000012C103F | C3                       | ret                                     |
;00000000012C1040 | 55                       | push rbp                                |
;00000000012C1041 | 48:8BEC                  | mov rbp,rsp                             |
;00000000012C1044 | 48:83C4 D0               | add rsp,FFFFFFFFFFFFFFD0                |
;00000000012C1048 | 53                       | push rbx                                |
;00000000012C1049 | 56                       | push rsi                                |
;00000000012C104A | 57                       | push rdi                                |
;00000000012C104B | 90                       | nop                                     |
;00000000012C104C | 48:8D45 FC               | lea rax,qword ptr ss:[rbp-4]            | 1. local (DWORD)
;00000000012C1050 | 90                       | nop                                     | invoke starts here
;00000000012C1051 | 48:83EC 30               | sub rsp,30                              | make room for 5 arguments
;00000000012C1055 | 8B4D FC                  | mov ecx,dword ptr ss:[rbp-4]            | arg 1
;00000000012C1058 | B8 3333B33F              | mov eax,3FB33333                        |
;00000000012C105D | 66:0F6EC8                | movd xmm1,eax                           | arg 2
;00000000012C1061 | 4C:8B45 F0               | mov r8,qword ptr ss:[rbp-10]            | arg 3
;00000000012C1065 | 66:0F6E5D E8             | movd xmm3,dword ptr ss:[rbp-18]         | arg 4
;00000000012C106A | 48:8B45 E0               | mov rax,qword ptr ss:[rbp-20]           |
;00000000012C106E | 48:894424 20             | mov qword ptr ss:[rsp+20],rax           | arg 5
;00000000012C1073 | E8 88FFFFFF              | call uasm_stack_test_64.12C1000         |
;00000000012C1078 | 48:83C4 30               | add rsp,30                              | correct stack
;00000000012C107C | 48:83EC 20               | sub rsp,20                              | make room for 4 arguments
;00000000012C1080 | 33C9                     | xor ecx,ecx                             |
;00000000012C1082 | FF15 800F0000            | call qword ptr ds:[<&RtlExitUserProcess |
;00000000012C1088 | 48:83C4 20               | add rsp,20                              |
;00000000012C108C | 5F                       | pop rdi                                 |
;00000000012C108D | 5E                       | pop rsi                                 |
;00000000012C108E | 5B                       | pop rbx                                 |
;00000000012C108F | C9                       | leave                                   |
;00000000012C1090 | C3                       | ret                                     |


Version 2:
;*************************************************************************************
; Option stackbase:rbp + option win64:7
;*************************************************************************************
;00000000013A1000 | 894C24 08                | mov dword ptr ss:[rsp+8],ecx            | copy arg 1 to shadow space
;00000000013A1004 | 48:55                    | push rbp                                |
;00000000013A1006 | 53                       | push rbx                                |
;00000000013A1007 | 56                       | push rsi                                |
;00000000013A1008 | 57                       | push rdi                                |
;00000000013A1009 | 48:83EC 38               | sub rsp,38                              | make room for locals (10h)
;                                                                                      | + next call (20h)
;                                                                                      | + 8 bytes stack alignment = 38h
;00000000013A100D | 48:8D6C24 30             | lea rbp,qword ptr ss:[rsp+30]           |
;00000000013A1012 | 90                       | nop                                     |
;00000000013A1013 | CC                       | int3                                    |
;00000000013A1014 | 48:8D45 30               | lea rax,qword ptr ss:[rbp+30]           | 1. argument
;00000000013A1018 | 48:8D45 F8               | lea rax,qword ptr ss:[rbp-8]            | 1. local
;00000000013A101C | 48:C7C0 EFCDAB00         | mov rax,ABCDEF                          |
;00000000013A1023 | 50                       | push rax                                |
;00000000013A1024 | 50                       | push rax                                |
;00000000013A1025 | 50                       | push rax                                |
;00000000013A1026 | 50                       | push rax                                |
;00000000013A1027 | 50                       | push rax                                |
;00000000013A1028 | 50                       | push rax                                |
;00000000013A1029 | B9 0A000000              | mov ecx,A                               | arg 1
;00000000013A102E | FF15 CC0F0000            | call qword ptr ds:[<&Sleep>]            |
;00000000013A1034 | 58                       | pop rax                                 |
;00000000013A1035 | 58                       | pop rax                                 |
;00000000013A1036 | 58                       | pop rax                                 |
;00000000013A1037 | 58                       | pop rax                                 |
;00000000013A1038 | 58                       | pop rax                                 |
;00000000013A1039 | 58                       | pop rax                                 |
;00000000013A103A | 48:8D65 08               | lea rsp,qword ptr ss:[rbp+8]            |
;00000000013A103E | 5F                       | pop rdi                                 |
;00000000013A103F | 5E                       | pop rsi                                 |
;00000000013A1040 | 5B                       | pop rbx                                 |
;00000000013A1041 | 5D                       | pop rbp                                 |
;00000000013A1042 | C3                       | ret                                     |
;00000000013A1043 | 48:55                    | push rbp                                |
;00000000013A1045 | 53                       | push rbx                                |
;00000000013A1046 | 56                       | push rsi                                |
;00000000013A1047 | 57                       | push rdi                                |
;00000000013A1048 | 48:83EC 68               | sub rsp,68                              | make room for locals + next call
;00000000013A104C | 48:8D6C24 50             | lea rbp,qword ptr ss:[rsp+50]           |
;00000000013A1051 | 90                       | nop                                     |
;00000000013A1052 | 48:8D45 E4               | lea rax,qword ptr ss:[rbp-1C]           | 1. local (DWORD)
;00000000013A1056 | 90                       | nop                                     | invoke starts here
;00000000013A1057 | 8B4D E4                  | mov ecx,dword ptr ss:[rbp-1C]           | arg 1
;00000000013A105A | B8 3333B33F              | mov eax,3FB33333                        |
;00000000013A105F | 66:0F6EC8                | movd xmm1,eax                           | arg 2
;00000000013A1063 | 4C:8B45 F8               | mov r8,qword ptr ss:[rbp-8]             | arg 3
;00000000013A1067 | 66:0F6E5D F0             | movd xmm3,dword ptr ss:[rbp-10]         | arg 4
;00000000013A106C | 48:8B45 E8               | mov rax,qword ptr ss:[rbp-18]           |
;00000000013A1070 | 48:894424 20             | mov qword ptr ss:[rsp+20],rax           | arg 5
;00000000013A1075 | E8 86FFFFFF              | call uasm_stack_test_64.13A1000         |
;00000000013A107A | 33C9                     | xor ecx,ecx                             |
;00000000013A107C | FF15 860F0000            | call qword ptr ds:[<&RtlExitUserProcess |
;00000000013A1082 | 48:8D65 18               | lea rsp,qword ptr ss:[rbp+18]           |
;00000000013A1086 | 5F                       | pop rdi                                 |
;00000000013A1087 | 5E                       | pop rsi                                 |
;00000000013A1088 | 5B                       | pop rbx                                 |
;00000000013A1089 | 5D                       | pop rbp                                 |
;00000000013A108A | C3                       | ret                                     |


When stepping through the code you will see that in the first version all pushes in "testit" are left as they are, while in the second version 4 of them (Sleep's shadow space) are overwritten by zero!

The first version makes room for the arguments of a new call before each call and restores the stack pointer to where it was after the call. Therefore the called procedure can do whatever it pleases with its arguments and its shadow space, without affecting what has been pushed before.

The second version sets RSP once (and for all) at procedure entry to a value which will be suitable for all calls in this procedure. That is, it pre-calculates the largest space needed for the arguments of the coming calls (see what happens to "sub rsp,38" if you uncomment the DrawTextEx line: it turns into "sub rsp,48"), then adds the required space for the locals of this procedure and finally aligns RSP to 16 bytes. This saves all the "sub RSP, ... - add RSP, ..." pairs enclosing calls in the first version. It is more efficient in this respect!
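
The arithmetic behind "sub rsp,38" and "sub rsp,48" in "testit" can be sketched with assemble-time constants. This is only a model of the numbers visible in the listing above, not UASM's actual algorithm; all constant names are made up for illustration, and it assumes RSP is 8 mod 16 at entry (right after the CALL) and four 8-byte pushes (rbp, rbx, rsi, rdi).

```asm
; Sketch only: frame-size arithmetic for "testit" in version 2.
; All names are hypothetical, they are not UASM internals.
RETADDR     = 8                     ; return address pushed by CALL
PUSHED      = 4 * 8                 ; push rbp, rbx, rsi, rdi
LOCALS_SZ   = 10h                   ; tx + ty
ARGS_SLEEP  = 4 * 8                 ; Sleep: shadow space only (20h)
ARGS_DRAW   = 6 * 8                 ; DrawTextEx: 6 arguments (30h)

; padding that makes RETADDR + PUSHED + frame a multiple of 16
PAD_SLEEP   = (16 - ((RETADDR + PUSHED + LOCALS_SZ + ARGS_SLEEP) and 15)) and 15
PAD_DRAW    = (16 - ((RETADDR + PUSHED + LOCALS_SZ + ARGS_DRAW)  and 15)) and 15

FRAME_SLEEP = LOCALS_SZ + ARGS_SLEEP + PAD_SLEEP    ; 10h + 20h + 8 = 38h
FRAME_DRAW  = LOCALS_SZ + ARGS_DRAW  + PAD_DRAW     ; 10h + 30h + 8 = 48h

end
```

The largest argument area of any call in the procedure decides the frame size, which is why uncommenting a single six-argument call changes the prologue.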

But with this approach pushes before calls are not possible, because the called procedure might overwrite what has been pushed (your example).
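
A minimal sketch of the difference, assuming the Win64 calling convention. The names "callee", "caller_v1" and "caller_v2" are hypothetical, and the two option lines assume that automatic prologue and epilogue generation should be switched off so that RSP is handled entirely by hand. The callee only does what any Win64 procedure is allowed to do: spill its register arguments into the 32 bytes above its return address.

```asm
; Sketch only, Win64 ABI. Step through it in a debugger and watch [rsp].
option prologue:none                ; assumption: no automatic prologue
option epilogue:none                ; and no automatic epilogue

.code

callee proc
    mov     [rsp+8],  rcx           ; a callee may spill its register
    mov     [rsp+16], rdx           ; arguments into the 32 bytes above
    mov     [rsp+24], r8            ; its return address (shadow space)
    mov     [rsp+32], r9
    ret
callee endp

; version 1 style: shadow space is reserved right before each call
caller_v1 proc
    sub     rsp, 8                  ; entry RSP is 8 mod 16 -> align to 16
    mov     rax, 0ABCDEFh
    push    rax
    push    rax                     ; two pushes keep the alignment
    sub     rsp, 20h                ; fresh shadow space below the pushes
    xor     ecx, ecx
    call    callee                  ; spills land in the fresh 20h bytes
    add     rsp, 20h
    pop     rax                     ; still 0ABCDEFh
    pop     rax                     ; still 0ABCDEFh
    add     rsp, 8
    ret
caller_v1 endp

; version 2 style: one reservation at entry, nothing around the call
caller_v2 proc
    sub     rsp, 28h                ; shadow space + alignment, once
    mov     rax, 0ABCDEFh
    push    rax
    push    rax                     ; the pushes now sit exactly where
    xor     ecx, ecx                ; the callee expects its shadow space
    call    callee                  ; overwrites both pushed values
    pop     rax                     ; the rcx spill (0), not 0ABCDEFh
    pop     rax                     ; whatever rdx held, not 0ABCDEFh
    add     rsp, 28h
    ret
caller_v2 endp

end
```

In "caller_v2" the stack stays balanced and aligned the whole time, so nothing crashes; the pushed data is simply gone after the call.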

So my claim is: with stack pointer handling like in version 1, pushes and pops aren't a problem; with version 2, pushes and pops will definitely cause problems, as you (and hutch and others) pointed out.


JK
Title: Re: Stackframes
Post by: jj2007 on April 07, 2021, 12:15:26 AM
Quote from: JK on April 07, 2021, 12:10:25 AM
please compile it once like it is

Sorry, I can't compile it. My environment variables are not set to any path, so "include windows.inc" will not work. Besides, there are three or four competing 64-bit SDKs around (plus my own), and the info on how to use them is scattered all over the place. Afaik none of them has an installer comparable to the Masm32 SDK, so I watch with awe what all of you are doing, but if I assemble 64-bit code it's with JBasic only...

Quote from: jj2007 on March 30, 2021, 02:08:44 AM
Attached the installer of the JBasic library
Title: Re: Stackframes
Post by: hutch-- on April 07, 2021, 12:52:08 AM
I am still fascinated at how you guys are trapped with STDCALL from Win32 in 64 bit. push/call notation belongs to a bygone era. If you need to save and restore registers, just create a LOCAL and MOV the register into the LOCAL.

; pseudo code
LOCAL .rax :QWORD
; .....
mov .rax, rax
; on exit
mov rax, .rax
Title: Re: Stackframes
Post by: JK on April 07, 2021, 01:02:23 AM
@jj,

sorry, i cannot supply an installer! This is what i use in a batch file:

Assembler: UASM V2.52: (...\uasm64.exe /c -win64 -Zp8 /win64 /D_WIN64 /Cp /W2 /I ...)
Linker: MS´s link.exe V 14.20.27508.1: (...\LINK.EXE /LARGEADDRESSAWARE:NO /SUBSYSTEM:CONSOLE /RELEASE /VERSION:4.0 /MACHINE:X64 /LIBPATH: ...)
Include files: http://www.terraspace.co.uk/WinInc209.zip
Lib files: MASM64 SDK


JK
Title: Re: Stackframes
Post by: jj2007 on April 07, 2021, 01:04:22 AM
Quote from: JK on April 07, 2021, 01:02:23 AM
Lib files: MASM64 SDK

:rolleyes:
Title: Re: Stackframes
Post by: JK on April 07, 2021, 02:10:53 AM
MASM64 SDK: http://www.masm32.com/download/install64.zip
Title: Re: Stackframes
Post by: jj2007 on April 07, 2021, 02:45:14 AM
Quote from: JK on April 07, 2021, 02:10:53 AM
MASM64 SDK: http://www.masm32.com/download/install64.zip

Over two years old, so it can't be the current version. Do you think the \Masm32\install64\m64lib stuff will still work?
Title: Re: Stackframes
Post by: JK on April 07, 2021, 03:08:25 AM
Come on, jj ...
Title: Re: Stackframes
Post by: johnsa on April 07, 2021, 04:30:53 AM
I wouldn't link with /LARGEADDRESSAWARE:NO; that shouldn't be necessary for a 64-bit image. You're basically telling the OS that the 64-bit exe can't deal with addresses above 2 GB, which will come back and bite you later if you want to do anything with large memory allocs, file mapping, etc.

I see JK's idea: he wants to be able to do whatever he feels like inside his proc; if that means pushing/popping, so be it.
I personally don't see the point, I just want to get stuff done and care not about the stack frame or alignment etc.. most of those nano level optimisations will have no positive benefit (but this is just me). Even with the clever prologue/epilogue optimisations an assembler is still no match for a compiler that can auto-inline, tail-call eliminations etc. If that level of optimisation is a concern then in some regards C is actually a better general purpose bet. (I hate to say it as a die-hard assembler fanatic).

If you really want to abuse yourself, you could forego using PROC at all and go oldskool, or make a custom invoke/proc macro.
IE...  XPROC myFunction, arg1, DWORD, arg2, QWORD and then generate a basic prologue with no automatic reservation.

Just a thought :)
Title: Re: Stackframes
Post by: hutch-- on April 07, 2021, 03:54:09 PM
 :biggrin:

> Do you think the \Masm32\install64\m64lib stuff will still work?

You must have confused this with the risky junk you keep posting with manually tweaked stack frames and pushes and pops. That library, even being out of date, still built super reliable executables that still run perfectly.

Also note that the masm64 library is not redistributable. It is copyright freeware, not open sauce.

I extend to the Watcom derivatives the level of support I get from them, nothing.
Title: Re: Stackframes
Post by: jj2007 on April 07, 2021, 08:01:34 PM
Quote from: hutch-- on April 07, 2021, 03:54:09 PM
the risky junk you keep posting with manually tweaked stack frames and pushes and pops.

You are not talking to me, right?  :biggrin:
Title: Re: Stackframes
Post by: hutch-- on April 08, 2021, 12:18:56 AM
 :biggrin:

Would I tell a lie ? After your years long crusade against 64 bit and MASM in particular with dodgy code and unreliable technical data, why would I lie about it ?  :tongue:
Title: Re: Stackframes
Post by: jj2007 on April 08, 2021, 12:24:36 AM
Any examples for "dodgy code and unreliable technical data"? I'm curious.
Title: Re: Stackframes
Post by: hutch-- on April 08, 2021, 08:34:54 AM
 :biggrin:

Yeah, look at your last few months of postings about twiddling the stack with push and pop.  :tongue:
Title: Re: Stackframes
Post by: jj2007 on April 08, 2021, 08:43:14 AM
Quote from: hutch-- on April 08, 2021, 08:34:54 AM
:biggrin:

Yeah, look at your last few months of postings about twiddling the stack with push and pop.  :tongue:

Forum search for push pop and jj2007 does not find posts where I argued for pushing & popping in 64-bit code. Probably you are confusing me with another member :cool:
Title: Re: Stackframes
Post by: hutch-- on April 08, 2021, 12:30:16 PM
 :biggrin:

You may need different search criteria.