Testing Stack Alignment Technique

Started by rrr314159, January 06, 2015, 02:25:40 PM

rrr314159

To handle the 64-bit stack alignment and parameter-space issues, we can align the stack at the beginning of the routine, reserve the necessary stack space before calling a Windows function, then clean up when it returns. For our own routines nothing special need be done, unless of course you want to use the Windows calling convention in your own code. This works fine in simple examples, but I was worried that something might go wrong in real code. So, before writing wrappers for all my necessary Windows calls, I decided to stress-test it a bit. The technique works fine, calling down as deep as 100 levels, and throwing in extra calls and pushes; why shouldn't it? The attached routine, TestStackAlignment.asm, uses recursion to stress the stack; it may be of interest to someone, so here it is. Admittedly, it's primitive; next step, I'll macro-ize the wrapper technique.

Please let me know if you have a better way, or see a mistake. Possibly no one will even look at it, since it's quite useless, so as a "teaser" I offer the following challenge: can you figure out what the output will be without running it?

;; TestStackAlignment.asm, by rrr314159 1/5/15
; \bin\JWasm -win64 TestStackAlignment.asm
; \bin\link /subsystem:console TestStackAlignment.obj

; The purpose of this routine is to stress-test the stack alignment technique,   
; making sure it works correctly at all levels of the calling stack, with
; random extra calls and pushes inserted. 4 and 5 Parameters are sent to Windows
; routines, demonstrating that it works when parameter space exceeds 20h.

    sprintf proto :ptr SBYTE, :ptr SBYTE, :VARARG
    printf proto :ptr SBYTE, :VARARG
    ExitProcess proto :DWORD
    includelib \lib\kernel32.lib
    includelib \lib\msvcrt.lib
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
    StringBuffer db 2560 dup (0)
    teststring db "Hi, I'm a test string", 0
    fmtstrpiiff db "RSP %x, level %i, %.4g, %.9g", 10, 0
    fmtstrpiiS db "RSP %x, level %i, %s", 10,0
    fmtstriiff db "rsp %x, level %i, %.4g, %.9g", 10, 0
    fmtstriiS db "rsp %x, level %i, %s", 10,0
    arealvalue REAL8 -130.3
    arealvalpi REAL8 3.14159265
    toggle1 dd 0
    toggle2 dd 0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; »»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
start:
    and rsp, -10h
    lea rcx, StringBuffer   ; point rcx at begin of String Buffer
    mov r15, 22             ; go down 21 levels
    CALL RecursiveTest      ; launch the test routine
    lea rcx, StringBuffer   ; print out the String Buffer
    call printf
    xor ecx, ecx
    call ExitProcess
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
RecursiveTest:  ; Recursive test of Stack Alignment technique
            ; call the 4 (s)print routines more-or-less randomly,
            ; with various calls and pushes between them, to
            ; demonstrate robustness of Stack Alignment technique
    dec r15
    je donerecurse
        cmp r15, 2      ; these randomly selected levels will
        je @F           ; call piiS and spiiS, toggling between them
        cmp r15, 10
        je @F
        cmp r15, 13
        je @F
        cmp r15, 20
        je @F
        cmp r15, 21
        je @F
        jmp continue
        @@:
            cmp toggle1, 0  ; toggle piiS and spiiS
            je call1
                call piiS
                jmp done1
            call1:
                call spiiS
            done1:
            sub toggle1, 1
            neg toggle1
            jmp callRecursiveTest
        continue:
        cmp r15, 4      ; these randomly selected levels skip printing;
        je @F           ; those left will call piiff and spiiff, toggling
        cmp r15, 7
        je @F
        cmp r15, 14
        je @F
        cmp r15, 17
        je @F
        cmp r15, 18
        je @F
            cmp toggle2, 0  ; toggle piiff and spiiff
            je call2
                call piiff
                jmp done2
            call2:
                call spiiff 
            done2:
            sub toggle2, 1
            neg toggle2
        @@:
    callRecursiveTest:
        CALL RecursiveTest  ; call "this" routine recursively
    donerecurse:
ret
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
piiS:   ; Print rsp, level (r15) and a test string
    push rcx
    lea r9, teststring
    mov r8, r15
    mov rdx, rsp
    lea rcx, fmtstrpiiS
    sub rsp, 20h
    call printf
    add rsp, 20h
    pop rcx
ret
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
spiiS:  ; Use sprintf to put formatted string with rsp,
        ; level (r15) and a test string in buffer for later printing
    lea rsi, teststring
    push rsi
    mov r9, r15
    mov r8, rsp
    lea rdx, fmtstriiS
    sub rsp, 20h
    call sprintf
    add rsp, 28h
ret
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
piiff:   ; Print rsp, level (r15) and two real numbers
    push rcx
    lea rsi, arealvalpi
    push REAL8 PTR [rsi]
    lea rsi, arealvalue
    mov r9, REAL8 PTR [rsi]
    mov r8, r15
    mov rdx, rsp
    lea rcx, fmtstrpiiff
    sub rsp, 20h
    call printf
    add rsp, 28h
    pop rcx
ret
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
spiiff: ; Use sprintf to put formatted string with rsp,
        ; level (r15) and two reals in buffer for later printing
    lea rsi, arealvalpi
    push REAL8 PTR [rsi]
    lea rsi, arealvalue
    push REAL8 PTR [rsi]
    mov r9, r15
    mov r8, rsp
    lea rdx, fmtstriiff
    sub rsp, 20h
    call sprintf
    add rsp, 30h
ret
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
I am NaN ;)

habran

Hi rrr314159
There is no need to adjust the stack for each call in the function.
The x64 stack is well explained here: http://www.codemachine.com/article_x64deepdive.html
If you use my version of JWasm with:

option casemap : none
option win64 : 11
option frame : auto
option stackbase : rsp

you will not need to worry about aligning the stack.
Cod-Father

rrr314159

I could be wrong but I think my example is a bit different. I'm calling Windows functions (printf and sprintf), all at different levels of the stack.

Habran: ... There is no need to adjust the stack for each call in the function ...

- that's true when the functions are at the same level. If there are a number of functions called one after the other, of course the shadow space can be reserved once at the beginning and will then be available for all of them. For instance when launching a window all of these are often called sequentially:

LoadIconA
LoadCursorA
RegisterClassExA
CreateWindowExA
ShowWindow
GetMessageA
TranslateMessage
DispatchMessageA

Since CreateWindowExA requires 60h we "sub rsp, 60h" once at the beginning and add it back when leaving the routine. The space is reused by each in turn. (Of course you know this, I only go thru it to show I know it too).
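
For what it's worth, here is a minimal sketch of that reserve-once pattern. It is not part of the attached tester: the labels msg1 and msg2 are made up for illustration, printf stands in for the window calls (so 20h is enough here, where a sequence including CreateWindowExA would need 60h), and the build lines are assumed to match the ones used for TestStackAlignment.asm.

```asm
; Sketch only: reserve the shadow space once, reuse it for sequential calls.
;   \bin\JWasm -win64 ReserveOnce.asm
;   \bin\link /subsystem:console ReserveOnce.obj
    printf proto :ptr SBYTE, :VARARG
    ExitProcess proto :DWORD
    includelib \lib\kernel32.lib
    includelib \lib\msvcrt.lib
.data
    msg1 db "first call,  rsp %p", 10, 0    ; hypothetical test strings
    msg2 db "second call, rsp %p", 10, 0
.code
start:
    and rsp, -10h           ; force 16-byte alignment at the entry point
    sub rsp, 20h            ; one 20h area; rsp stays 16-byte aligned
    lea rcx, msg1
    mov rdx, rsp
    call printf             ; uses the reserved area
    lea rcx, msg2
    mov rdx, rsp
    call printf             ; reuses the same area, rsp unchanged
    xor ecx, ecx
    call ExitProcess        ; never returns, so no matching add rsp
end start
```

Both lines should report the same rsp, since nothing is reserved or released between the two calls.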

However my tester never does that. I'm making 14 calls to the print routines, every one on a different level. It looks like this:

start:
...
call printf (reserving 20h, or more)
call myroutine
       ...
       call sprintf (20h or more again)
       call myroutine
            ...
                 call printf (20h or more again)
                 call myroutine
                       etc..................
                          etc..................

...down 21 levels. Actually it's not that simple, I'm using recursion and mixing in extra calls and push/pop's, but this shows the idea. Now, surely every one of these calls requires its own new shadow/parameter space! The previously reserved spaces would be far above on the stack, even if I had left them reserved (i.e., didn't add 20h back to rsp), and there would be numerous return addresses between that space and the new, lower level invocation. Unless I'm really missing something.
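
The nested case could be sketched like this, assuming each routine is entered with rsp 8 bytes off a 16-byte boundary (which is what a call from an aligned caller leaves behind). level1 and level2 are hypothetical names, not routines from the tester; each one reserves and releases its own 28h (20h of shadow space plus 8 to restore alignment).

```asm
; Sketch only: every call level reserves its own shadow space.
;   \bin\JWasm -win64 NestedLevels.asm
;   \bin\link /subsystem:console NestedLevels.obj
    printf proto :ptr SBYTE, :VARARG
    ExitProcess proto :DWORD
    includelib \lib\kernel32.lib
    includelib \lib\msvcrt.lib
.data
    fmt db "level %i, rsp %p", 10, 0
.code
start:
    and rsp, -10h       ; align once at the entry point
    sub rsp, 20h        ; shadow space for the ExitProcess call
    call level1
    xor ecx, ecx
    call ExitProcess
level1:
    sub rsp, 28h        ; own 20h shadow space + 8 to get back to 16-byte alignment
    lea rcx, fmt
    mov edx, 1
    mov r8, rsp
    call printf
    call level2         ; its return address lands below level1's space
    add rsp, 28h
    ret
level2:
    sub rsp, 28h        ; level1's space lies above the return address, so reserve again
    lea rcx, fmt
    mov edx, 2
    mov r8, rsp
    call printf
    add rsp, 28h
    ret
end start
```

The rsp printed at level 2 should be 30h lower than at level 1: 28h for the new block plus 8 for the return address.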

Also (just to dot the i's and cross the t's) only one of these print calls occurs in a leaf function (where 60h is always automatically available).

Indeed that was the whole point of my test routine, to call these functions at different levels and with the stack being used for other things to make sure nothing would go wrong. Since stack alignment was mentioned often as an issue, I feared I was missing something. So I reread everything available, and wrote this tester.

I checked out your JWasm version and am intending to use it. I see you automatically reserve parameter space when a proc is called (and take care of other housekeeping details). Very importantly (for my needs), you also check if more parameter space is needed by a later invoke and reserve that at the beginning. From the examples it's not clear you do this for external (e.g. Windows) routines, but no doubt that's the case. For that reason I stopped writing a set of macro wrappers for the ones I use, so you've already saved me some tedious work. Also I need AVX2, hadn't yet realized regular JWasm doesn't support it.

BTW I appreciate what you're doing here. You're taking the best assembler available and trying to make it so far ahead of the pack that the whole community will rally 'round it. By reducing fragmentation we (the x64 assembler community) will be much more effective and can hope to see assembler become, once again, a standard part of every professional's arsenal. It really hurts the cause to have half-a-dozen assemblers competing for everyone's attention.

However for the moment I'm doing nothing but macros! In the past I've used them only superficially, and as I delved into 64-bit issues I began to realize that I was, metaphorically speaking, working with one hand tied behind my back. The power of advanced macro capability ("advanced" to me, that is, probably not you or anyone else here) is far more important (IMHO) than 64 bits vs. 32. I'll be up to speed soon and can get back to "HJWasm".

Well, thanks very much for your reply! Judging by your impression that I was just calling functions on the same level and, ridiculously, subtracting and adding the same 20h+ over and over again, others may have thot the same; so, thanks for this opportunity to clarify.

I am NaN ;)

habran

Hi rrr314159
Thank you for the kind words; I am glad that you are clever enough to spot good things.
If you use option stackbase : rsp, you no longer need to worry about how many levels deep you go, because the same space is reused.
However, you are not allowed to use push and pop; use spare registers or locals to keep the data instead.
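
A hand-written sketch of that last point, assuming plain JWasm syntax and none of the options above: the value that would otherwise be pushed is parked in a slot inside the routine's own reserved block. keepRcx and the [rsp+20h] slot are illustrative choices, not something the assembler generates.

```asm
; Sketch only: keep a value in a reserved stack slot instead of push/pop.
    printf proto :ptr SBYTE, :VARARG
    ExitProcess proto :DWORD
    includelib \lib\kernel32.lib
    includelib \lib\msvcrt.lib
.data
    fmt db "value %i", 10, 0
.code
start:
    and rsp, -10h           ; align once at the entry point
    sub rsp, 20h            ; shadow space for the calls made from start
    mov ecx, 1234
    call keepRcx            ; rcx still holds 1234 on return
    mov edx, ecx
    lea rcx, fmt
    call printf
    xor ecx, ecx
    call ExitProcess
keepRcx:
    sub rsp, 28h            ; 20h shadow space + one 8-byte slot; also realigns rsp
    mov [rsp+20h], rcx      ; the slot sits just above the shadow space
    lea rcx, fmt
    xor edx, edx
    call printf             ; free to clobber rcx
    mov rcx, [rsp+20h]      ; restore from the slot, no pop needed
    add rsp, 28h
    ret
end start
```

Because rsp never moves between the sub and the add, every slot keeps a fixed offset from rsp for the whole routine, which is the property an rsp-based frame relies on.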
I appreciate your logic and approach to do things :biggrin:
When you write something, don't write it for yourself but for all of us.
Write it so that your code is easy to follow and understand.
I believe in you and am looking forward to your contributions to this community 8)

BTW Marcus Aurelius was maybe right for his time, but he did not have the internet.
I feel sorry for people without a sense of humor; their life must be unbearable.
Cod-Father

rrr314159

Thanks habran. I am still a bit uncertain on this issue and was not aware of that option; I will look it up. When I post code for others to read in the future it will be better commented and less convoluted ... undoubtedly that approach will make it easier for me to comprehend too. Marcus was not, actually, right for his own times either.
I am NaN ;)

habran

Quote: Marcus was not, actually, right for his own times either.
I am glad to hear that. :biggrin:

Take your time, experiment, and look inside the code with a debugger (MSVC or WinDbg).
You can also compile some C code with the assembly listing option:
Properties->Configuration Properties->C/C++->Output Files->Assembler Output->Assembly With Source Code (/FAs)

Cod-Father

dedndave

option casemap : none
option win64 : 11
option frame : auto
option stackbase : rsp


i think it's good to understand what goes on, under the hood
i am not learning anything from the options - lol

habran

Quote:
enum win64_flag_values {
    W64F_SAVEREGPARAMS = 0x01, /* 1=save register params in shadow space on proc entry */
    W64F_AUTOSTACKSP   = 0x02, /* 1=calculate required stack space for arguments of INVOKE */
    W64F_STACKALIGN16  = 0x04, /* 1=stack variables are 16-byte aligned; added in v2.12 */
    W64F_SMART         = 0x08, /* 1=takes care of everything; added by habran */
    W64F_HABRAN        = W64F_SAVEREGPARAMS | W64F_AUTOSTACKSP | W64F_SMART,
    W64F_ALL           = W64F_SAVEREGPARAMS | W64F_AUTOSTACKSP | W64F_STACKALIGN16 | W64F_SMART, /* all valid flags */
};
In my version there is no need for W64F_STACKALIGN16 because it takes care of proper alignment of vars.

option win64 : 11 :     
           0001   save register params in shadow space on proc entry
           0010   calculate required stack space for arguments of INVOKE
           1000   takes care of everything
           ------
           1011  binary = 11 dec or 0Bh

option frame : auto  creates .PROLOGUE and .EPILOGUE for you

option stackbase : rsp  means that local vars are based on RSP instead of RBP so RBP is free to use as normal nonvolatile register


Cod-Father

rrr314159

Hello,

dedndave: ... i think it's good to understand what goes on under the hood ...

- agree, except in my case it's not "good" but "necessary". Unfortunately it takes a lot longer to learn things from the ground up, but I really can't function any other way; can't remember disconnected facts until tied together in a logical structure.

habran, thanks for the info, but not necessary to go to the trouble, it's in the manual. BTW my JWasm.chm doesn't work, TOC comes up but no text. I've had nothing but trouble with chm files, FWIW I recommend dropping them and sticking with .html or, even better, plain text. I found JWasm help text at

http://www.masmforum.com/board/index.php?PHPSESSID=8d46cd4ecb1688be429ab49694ec53e6&topic=15666.0;wap2


If anyone wants a copy I could post it here, but are there copyright issues involved? Anyway it covers the options, but I think what dedndave meant was it's good to know the mechanism behind "save register params in shadow space on proc entry". Fortunately we can find the exact details in your posts on JWasm forum! We can even get the source code if we ever get around to it - compared to MS this is heaven!

Now, it turns out my approach to printing, in the above program, is a bit primitive (as everyone but me knew). 30 confusing lines like "push REAL8 PTR [rsi]" can be replaced by 4 lines like this one:

invoke printf, chr$("RSP %x, level %i, %s", 10), rsp, r15, OFFSET teststring

chr$ - A great discovery! There are many more goodies in Masm32's macros.asm. The only names I find are Michael Webster, "huh" from New Zealand, and Greg Lyon; no doubt others were involved; they all have my gratitude, FWIW. Minor updates for 64-bit are sometimes needed, but it's not so much the macros themselves that are valuable, as the techniques. At the bottom qWord's name also appears; I don't thank him, yet, since for the most part I have no idea what he's doing, or why he's doing it. I'm sure he had good reasons.

Apologies for posting such an ugly mess as those print routines above, but in my defense, they have nothing to do with the point of the program. I won't bother to update it; let it stand as an example of what not to do!

Now, a dumb question. Invoke doesn't work in ML64. Should one therefore avoid using it when posting in this 64-bit forum, for compatibility? Or does everyone have no problem using JWasm when appropriate? Similar q for 64-bit: does one avoid posting 64-specific code in other forums, when the point of the code has nothing to do with 64 bits?

Another d.q.: why did MS drop invoke in ML64?

Not burning issues, I'm just curious.
I am NaN ;)