The MASM Forum

64 bit assembler => 64 Bit Assembler => Topic started by: markallyn on November 28, 2017, 08:04:12 AM

Title: PROC and prolog/epilog
Post by: markallyn on November 28, 2017, 08:04:12 AM
Hello everyone,

I would like to verify (or not) how the PROC directive works as far as creating prologs and epilogs.  Let us suppose that I have two PROCs, one of which calls the other.  Let us further assume that I do not use the OPTION PROLOGUE:NONE and OPTION EPILOGUE:NONE options.  Then using the PROC keyword in the first function will automatically generate a PROLOG and an EPILOG too.  But, using the PROC keyword in the callee will NOT generate a PROLOG/EPILOG pair if the function is a leaf function.

Is this true?

Thanks,
Mark Allyn
Title: Re: PROC and prolog/epilog
Post by: nidud on November 28, 2017, 09:29:12 AM
Stack frame generation depends on allocation of local memory and the caller creates the frame for arguments:

Code:
.code

a   proc
    ret
a   endp

b   proc
    call a
    ret
b   endp

    end

Code:
a       PROC
        ret             ; 0000 _ C3
a       ENDP
b       PROC
        call    a       ; 0001 _ E8, FFFFFFFA
        ret             ; 0006 _ C3
b       ENDP
Title: Re: PROC and prolog/epilog
Post by: hutch-- on November 28, 2017, 09:38:42 AM
Mark,

The way I would test that with ML64 is by creating a default proc then having a look at it with a disassembler. The problem is this: you need to at least have the entry point of the executable correctly set up and aligned or the app will not start. I bothered to write the macros necessary to do this, and while you can turn it off AFTER the app has started properly for simple leaf procedures then turn it back on after the procedure has exited, you must get the initial alignment right or the app exits before it even runs.
Title: Re: PROC and prolog/epilog
Post by: markallyn on November 28, 2017, 10:47:26 AM
Hutch and Nidud, good evening.

Nidud-  Using your code, what I'm saying is that in your "a" function, using the PROC directive does NOT cause prologue and epilogue code to be emitted in "a".  But, using PROC in "b" DOES cause prologue and epilogue code to be emitted, at least if "a" is a simple leaf function.  I have not found anything in msft documentation that indicates that PROC doesn't cause leaf function prologues and epilogues, but it ALWAYS causes prologues and epilogues to be generated if the function is a frame function, i.e. it calls another function.

Hutch-  Yes I actually created two functions as test functions, the one calling the other and looked them over with x64dbg.  That is how I "discovered" this apparently undocumented behavior of PROC when used with a leaf function.  I haven't yet tested what happens when I make the callee a frame function, but will do so very soon.  I'm guessing that in that case, PROC in the callee will emit prologue and epilogue code too.  If you think it's helpful I will post the test code here.

One other, related question, albeit trivial.  When PROC is used with a frame function and prologue code gets emitted, ml64 automatically uses the instruction "enter 80,0".  Why 80?

Thanks to both of you.

Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on November 28, 2017, 06:29:31 PM
Mark,

It is adjustable, just look at the STACKFRAME macro and how it uses the UseStackFrame and EndStackFrame macros.

STACKFRAME MACRO dflt:=<96>,dynm:=<128>,algn:=<16>

The alignment equate, which has a default of 16, can be changed, which allows you to align procedures so that the locals match the largest size of the data types. If you want to align at 64, 1024, or page align at 4096, it is easy to do, which means you can use SSE, AVX and AVX2 locals if you place them first before smaller sized data types.

Rather than repeat it, the reference material for how the stackframe is built and used is in the MASM64.chm help file and the actual pre-processor code is in the main macro file. As you are probably aware by now, external documentation is appalling, incomplete and often wrong, I built this system by exhaustive testing of code against the Microsoft ABI using API functions and local function, both with and without stackframes.

Now with the value 80 used with the ENTER mnemonic, you will do better by reading the Intel instruction manual. MASM has used LEAVE over many years while ENTER makes the proc simple,  clean and very reliable and the method is free of the messy and unreliable RSP twiddling that many have messed around with. It is not a fast mnemonic but on procedures that need a stack frame so they can call other high level procedures and API or external library functions, stack entry speed does not matter.

Now with the entry point being correctly set up the stack is aligned correctly so when you call a leaf procedure with no stack frame, the stack is aligned and you can avoid the tiny overhead as long as you don't mess the stack up and use a simple RET to exit the proc. Generally you don't use PUSH / POP like you did in win32, you make locals and use MOV to load the registers that need to be protected and restore them before proc exit.

LOCAL myreg :QWORD

mov myreg, rsi

; write the source code

mov rsi, myreg

ret

Title: Re: PROC and prolog/epilog
Post by: markallyn on November 29, 2017, 02:37:00 AM
Good morning/evening Hutch,

I am still reading and re-reading your last post.  As always, it's compact and loaded with information and takes more than a single pass to digest.

Meantime, I wanted to report on the results of inserting a simple invoke command into what was a leaf function to see what ml64 would do with the PROC directive under these circumstances.  You will recall previously that I had found that using PROC in a leaf function resulted in NO prologue or epilogue code being emitted.  Well, when I converted the callee into a "frame function" with the invoke macro (all it did was to call printf), then PROC puts the prologue and epilogue code in--including a gratuitous allocation of 96 bytes for locals that I'm not using.  Very annoying.  Annoying because it messes up the frame pointer to 6 parameters I had passed in from the calling function.

The one conclusive take-away from this escapade is that novices like me cannot write even the simplest masm code without a debugger, and a good one at that!

Thanks for your help and counseling.

Mark
Title: Re: PROC and prolog/epilog
Post by: nidud on November 29, 2017, 03:11:06 AM
Quote
Nidud-  Using your code, what I'm saying is that in your "a" function, using the PROC directive does NOT cause prologue and epilogue code to be emitted in "a".  But, using PROC in "b" DOES cause prologue and epilogue code to be emitted, at least if "a" is a simple leaf function.

There is no stack frame added in procedure b, so this assumption is wrong.

Quote
I have not found anything in msft documentation that indicates that PROC doesn't cause leaf function prologues and epilogues, but it ALWAYS causes prologues and epilogues to be generated if the function is a frame function, i.e. it calls another function.

The PROC directive does not cause prologues and epilogues to be generated based on these conditions. When you call another function you have to manually add a stack frame for the call, regardless of whether the call is inside a PROC or not.
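
A minimal sketch of that point, assuming plain ml64 with its default prologue/epilogue handling and the two procedures from the earlier example. Because b has no parameters, LOCALs or USES, nothing extra is emitted, so the space for the call is reserved by hand. The 28h figure (32 bytes of shadow space plus 8 bytes to realign) is the usual Win64 value, not something taken from the listing above.

```asm
.code

a   proc                ; leaf: no calls, no stack use, nothing to set up
    ret
a   endp

b   proc                ; no params/LOCALs/USES, so no prologue is generated
    sub rsp, 28h        ; 32 bytes shadow space + 8 to realign the stack to 16
    call a
    add rsp, 28h        ; undo the manual frame before returning
    ret
b   endp

    end
```

Disassembling the object file should show exactly these instructions and nothing more.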

It may be simpler to use a disassembler instead of a debugger for testing this.
 
http://www.agner.org/optimize/#objconv
Title: Re: PROC and prolog/epilog
Post by: AW on November 29, 2017, 03:25:51 AM
There are only 3 cases to consider, when we are not talking about exception handling, and using the default ML64 prolog/epilog

1- Leaf functions
Align the stack on entry, restore the stack on exit

2-Functions with parameters passed in registers, or in the registers and stack, or LOCAL variables, or have USES clause as well, or all that, or part thereof, and may also call other functions
MASM automatically builds an rbp based stack frame and uses leave on the epilog.
ALL you have to do is ALIGN the stack after providing shadow space + space for the parameters after the 4th of the function(s) to be called, if there are functions to be called!

3- Functions that simply call other functions, but have no LOCALS, or parameters or USES.
Subtract from rsp the amount of shadow space plus parameters after the 4th and align the stack. Restore the stack on exit because there is no leave.

Title: Re: PROC and prolog/epilog
Post by: markallyn on November 29, 2017, 03:48:28 AM
Good morning, Nidud,

I tested the code you wrote and you're right--no prologue/epilogue pair is emitted even though procedure b has a call to function a.   So, I am very puzzled.

More experimentation is in order ....

BTW, I'm using x64dbg which has a nice disassembler with it.

Regards,
Mark
Title: Re: PROC and prolog/epilog
Post by: nidud on November 29, 2017, 04:34:57 AM
Quote
BTW, I'm using x64dbg which has a nice disassembler with it.

Yes, I'm using that too  :t

In 32-bit the stack frame for a call is created basically by pushing the arguments and the return address onto the stack and jumping:
Code:
a proto :ptr, :ptr, :ptr, :ptr, :ptr

.code
    push 5
    push 4
    push 3
    push 2
    push 1
    push @F
    jmp a
@@:

In 64-bit the principle is the same with some modifications.
Code:
    mov rcx,1
    mov rdx,2
    mov r8,3
    mov r9,4

    push 5
    push r9
    push r8
    push rdx
    push rcx
    push @F
    jmp a
@@:

However, you skip the pushing: the first four arguments stay in registers and only the fifth argument is assigned to the frame:
Code:
    sub rsp,5*8 ; create a frame for 5 args
    mov rcx,1   ; first 4
    mov rdx,2
    mov r8,3
    mov r9,4
    mov qword ptr [rsp+4*8],5
    call a
    add rsp,5*8 ; restore stack

In addition to this (and the very reason it's done this way) the stack has to be aligned 16. This means the stack is aligned 16 - 8 (return address) on proc-entry.
Title: Re: PROC and prolog/epilog
Post by: Vortex on November 29, 2017, 04:40:08 AM
Hi markallyn,

I use Agner Fog's objconv to disassemble object files. Studying the output is useful for me :

http://agner.org/optimize/
Title: Re: PROC and prolog/epilog
Post by: markallyn on November 29, 2017, 06:34:01 AM
Nidud, aw27, and Vortex,

Nidud:  Could you explain a bit more what you mean by:

"This means the stack is aligned 16 - 8 (return address) on proc-entry."

I follow everything else.  Very clear.

aw27:
I'm missing something very basic in what you wrote.  Namely, are you saying that these three conditions REQUIRE prologues and epilogues (whether built-in by ml or hand-written) OR are you saying the opposite, that the three conditions DO NOT require prologues and epilogues? 

Vortex:
Yes, I'm familiar with objconv by A. Fog.  It's been a couple of years since I played with it.  I'll try it again.  Thanks for reminding me about its existence.

Regards,
Mark
Title: Re: PROC and prolog/epilog
Post by: nidud on November 29, 2017, 07:06:30 AM
Quote
Nidud:  Could you explain a bit more what you mean by:

"This means the stack is aligned 16 - 8 (return address) on proc-entry."

I follow everything else.  Very clear.

Well, the stack is (or should always be) aligned on call:
Code:
    ; the stack is aligned 16 here..
    call a

However, the call itself pushes the return address onto the stack so the stack becomes (always) 8 bytes off on entry:
Code:
a   proc
    ; the stack is aligned 8 here..
    push rbp   
    mov rbp,rsp
    ; stack now aligned 16

This means that the frame created for arguments must be an even number of 8-byte slots if RBP is used as above, or an odd number if not, to ensure alignment on call:
Code:
    ;sub rsp,5*8 ; 5 args only: leaves the stack 8 off
    sub rsp,6*8 ; create a frame for 5 args + 8 bytes padding
    ; stack aligned 16
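
A sketch of the even/odd rule with both variants side by side, assuming plain ml64 and procedures without parameters or LOCALs (so the assembler adds no prologue of its own). The names callee, with_rbp and without_rbp are hypothetical.

```asm
.code

callee proc             ; hypothetical leaf used as a call target
    ret
callee endp

; Variant 1: RBP frame. Entry is 8 off, push rbp realigns to 16,
; so the argument frame must be an even number of 8-byte slots.
with_rbp proc
    push rbp
    mov rbp, rsp
    sub rsp, 4*8        ; 4 slots (shadow space), stack stays aligned 16
    call callee
    mov rsp, rbp        ; drop the frame
    pop rbp
    ret
with_rbp endp

; Variant 2: no RBP. Entry is still 8 off, so an odd number of slots
; is needed to land on a 16-byte boundary.
without_rbp proc
    sub rsp, 5*8        ; 4 shadow slots + 1 slot of padding
    call callee
    add rsp, 5*8
    ret
without_rbp endp

    end
```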
Title: Re: PROC and prolog/epilog
Post by: AW on November 29, 2017, 08:44:22 AM
Please refer to the 3 cases above:

Code:
includelib \masm32\lib64\kernel32.lib
ExitProcess PROTO :dword

.code

;CASE 1
p1   proc ; leaf
sub rsp, 8 ; align stack

; ... do our things

add rsp, 8
ret
p1   endp

;Case 2
p2 proc parm1:dword
    ;and rsp, -16 ;no need here, but will not hurt if used, because push rbp will align

; ... do our things
ret
p2 endp

;Case 2
p3 proc
LOCAL myvar:dword
and rsp, -16 ; align

; do our things
ret
p3 endp

;Case 2
p4_1 proc uses rbx rdi rsi par1:qword, par2:qword, par3:qword, par4:qword, par5:qword
and rsp, -16 ; align

; do our things
ret
p4_1 endp

; Case 3
p4 proc
sub rsp, 28h ; shadow space+space for 5th parameter.
       ;and rsp, -16 ;no need here, but will not hurt if used, because already aligned
mov rcx,1
mov rdx,2
mov r8,3
mov r9,4
mov rax, 5
mov [rsp+20h],rax
call p4_1
add rsp, 28h
ret
p4 endp

; Case 3, but without need for epilog because ExitProcess fixes everything
main   proc
sub rsp, 28h ; shadow space + align

call p1
mov rcx, 1
call p2
call p3
call p4


;add rsp, 28h
;ret
mov ecx,0
call ExitProcess

main   endp

end
Title: Re: PROC and prolog/epilog
Post by: hutch-- on November 29, 2017, 01:24:06 PM
I did not want to double post this example of recursion so I put it here,

http://masm32.com/board/index.php?topic=6720.0

No stack frame twiddling trying to get it to work, a prologue/epilogue that constructs a minimum stack frame that automates stack frame creation in a context where it is the only way to do it. (Iteration procs are not recursive.)
Title: Re: PROC and prolog/epilog
Post by: markallyn on November 30, 2017, 02:25:09 AM
Hutch, Nidud, and aw27,

Thanks very much indeed for your contributions. 

Nidud-  Very clear explanation.  Using x64dbg I was able to verify what you wrote.

aw27:  Wow!  Very detailed and coherent.  I need to study the options more, but looking them over I think they are pretty self-explanatory.  I will certainly get back to you after completing the necessary study. 

Hutch:  I haven't yet been able to look over your attached file, but will get to it later today.  As with aw27, I will respond to what you sent.

Now, as for the issue that kicked off this series, I went back to the original Programmer's Guide for version 6.1 of masm--way back to the 1992 edition.  Here is a direct quotation from the Guide on page 198 under the heading "Generating Prologue and Epilogue Code":

Quote
When you use the PROC directive with its extended syntax and argument list the assembler automatically generates the prologue and epilogue code in your procedure.

This is true.  I tested it using a slightly expanded version of Nidud's (see above) streamlined code.  If I add to his "a PROC" a couple of arguments of the form :QWORD, :QWORD, etc.  then sure enough ml64 will emit prologue and epilogue code.  If there is no parameter list (as there isn't in Nidud's example) then indeed no prologue/epilogue pair shows up.  Of course, it matters because--as Nidud points out clearly--one must adjust the stack search by 8 bytes to allow for the return address or one will not recover a fifth or greater passed parameter.
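
A sketch of that experiment, assuming plain ml64 with no macro file included and the default OPTION PROLOGUE/EPILOGUE settings. The procedure names a0 and a2 are hypothetical, and the comments describe the output one would expect from the behaviour reported in this thread; it is worth confirming with a disassembler.

```asm
.code

; No argument list: assembled exactly as written.
a0  proc
    ret                 ; plain C3, no prologue or epilogue
a0  endp

; Extended syntax with an argument list: ml64 adds an RBP frame.
a2  proc x:QWORD, y:QWORD
                        ; expected prologue: push rbp / mov rbp, rsp
    mov x, rcx          ; spill the register arguments into the caller's
    mov y, rdx          ; shadow space, addressed as [rbp+16] and [rbp+24]
    mov rax, x
    add rax, y
    ret                 ; expected epilogue: leave / ret
a2  endp

    end
```

Note that the named arguments only refer to valid memory if the caller reserved shadow space before the call.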

Thanks again to all of you, and I will reply shortly to aw27 and Hutch after doing justice to their efforts.

Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 01, 2017, 05:44:57 AM
Hello aw27,

Since you sent your very detailed examples of the three situations I have copied, assembled, and linked all of them.  I am still studying the results with x64dbg.  But, I do have one preliminary question.

Namely, in those cases where you have declared LOCAL variables I had predicted that the assembler would use the ENTER XX,0 instruction, but that never happened.  Do you have any explanation as to why ml64 persisted in using the usual individual components of the prologue?

I'll have more questions shortly.  But, thank you for your exhaustive demonstration.

Mark
Title: Re: PROC and prolog/epilog
Post by: AW on December 01, 2017, 06:00:16 AM
Quote
Namely, in those cases where you have declared LOCAL variables I had predicted that the assembler would use the ENTER XX,0 instruction, but that never happened.  Do you have any explanation as to why ml64 persisted in using the usual individual components of the prologue?

There is no obligation to use ENTER with LEAVE. ENTER is considered a slow instruction and is not very popular these days.
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 01, 2017, 07:15:07 AM
 :biggrin:

The catch here is that you don't put ENTER in a loop but then the instruction is not designed to be used in a loop so the reference to speed in this context is irrelevant. Outside of that the only gain of constructing a stack frame in an unreliable manner is that you may have the fastest MessageBoxA on the planet by saving a few picoseconds. LEAVE was always fast enough.
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 01, 2017, 07:33:16 AM
aw27,

Quote
There is no obligation to use ENTER with LEAVE. ENTER is considered a slow instruction and is not very popular these days.

I was aware of this.  But, nevertheless, for whatever reason ml64 has been ignoring this in some of the code I write and uses ENTER 80,0.  If ml64 consistently avoided this usage, I would understand why, but it doesn't.  In all of the cases you created ENTER never appears, but I can show you more than one instance in my stuff where it does.  In fact, if you look at the link that Hutch sent in connection with recursion, you will see that his code has ENTER in it too.

I'm still plowing through your 3 cases of code.  I have another question I will reserve for a follow-on post regarding stack aligning. 

Thanks for your assistance.  It is warmly welcomed.

Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 01, 2017, 08:07:40 AM
Mark,

Have a look at the 64 bit MACRO file to see how the stackframe is constructed. Search for "UseStackFrame" to find it. There are 3 arguments you can pass to the STACKFRAME macro, the third being alignment which must be a power of 2. I have done a number of extras to handle a stack aligned for AVX and AVX2 locals. You can adjust the 3 arguments if you want to reduce the stack overhead with nested procedure calls, the recursion test piece can be used to test this. In most instances trimming the stack overhead does not matter as it is pre-allocated memory but if you are writing recursive code that has a very large count of recursion depth, you can trim it down carefully or increase the linker settings or both. You will know if you have trimmed off too much as the app will not start OR it will stop once the stack memory is exhausted.
Title: Re: PROC and prolog/epilog
Post by: AW on December 01, 2017, 08:57:41 AM
There is no great damage in using ENTER, but no advantage either.  :biggrin:
ENTER was designed for languages that use nested procedures, like Pascal and Delphi. However, they don't use it.  :biggrin:
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 01, 2017, 10:55:15 AM
Good evening Hutch, aw27:

Thanks.  I'll check out the macro.  By the way, I'm perfectly content to use ENTER just about always (except for recursions--which I don't write in any case), I just don't understand what circumstances cause ml to generate it, and when it does, why it picks the size of stack frame that it picks.  It seems to default to 80h--why this value is a mystery.  For a bit I thought that ml would do ENTERs whenever there were LOCALs defined.  But, testing this with one or two of aw27's small programs indicates that this isn't the case.  It stays with the conventional push rbp;  mov rbp, rsp.

Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 01, 2017, 11:47:50 AM
Mark,

The values associated with the stackframe are those in the macro I have referred you to. The default values are aimed at safety but they are also modifiable which allows you to change alignment and tweak the arguments pointed at the stackframe macro to optimise the memory usage if you are running recursion to any large depth.
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 02, 2017, 01:33:05 AM
Good morning/evening Hutch:

Ah, mystery solved!  I know there will be at least one more question from me winging its way more or less in your direction, but this is for my small brain a major breakthrough.

Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 02, 2017, 05:40:51 AM
Good afternoon, evening Hutch,

I'm looking at the usestackframe macro.  I'll try invoking it, but I'm unclear what the "flag" parameter is about.  I can't see it in any of the comments.

Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 02, 2017, 07:47:42 AM
Hello Hutch,

Actually, if you could direct me to an example invocation of UseStackFrame it would be most helpful.  I tried a number of times to get the parameters right but haven't yet figured out how.  I googled on UseStackFrame and found two references to it, but no code, just discussion.

Thanks,
Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 02, 2017, 08:45:04 AM
 :biggrin:

Mark,

Look in the "macros64" directory for the file "macros64.inc" and you will find the source of all the macros I wrote for 64 bit MASM. There you will find the macro "UseStackFrame" and its matching "EndStackFrame" and the two macros are designed to be called by a number of wrapper macros.

STACKFRAME = the default stackframe with a 16 byte alignment.
NOSTACKFRAME turns the stackframe off.

Then there are a number of alternative forms that only differ in their alignment.

YMMSTACK = 32 byte alignment for AVX instructions.
ZMMSTACK = 64 byte alignment for AVX2 instructions.
CUSTOMSTACK = roll your own.

They differ only in the equates passed to the "UseStackFrame" macro.

      stackframe_default equ <dflt>     ;; set default stack
      stackframe_dynamic equ <dynm>     ;; set byte count for ENTER mnemonic
      stackframe_align   equ <algn>     ;; align the stack by an interval of 16

The documentation for the first 2 arguments (dflt and dynm) is available in the Intel manuals under the ENTER mnemonic, the third argument "algn" has to be a power of 2 byte alignment.
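
A small sketch of how the wrapper macros might be used around a leaf procedure, assuming the masm64rt.inc include file (path as used elsewhere in this thread) and the macro names listed above. The procedure add2 is hypothetical, and this is only a fragment of a module, not a complete program with an entry point.

```asm
include \masm32\include64\masm64rt.inc   ; assumed install path

.code

NOSTACKFRAME                ; stop generating the automatic stack frame

add2 proc                   ; hypothetical leaf: args stay in RCX and RDX
    lea rax, [rcx+rdx]
    ret                     ; plain RET, nothing to unwind
add2 endp

STACKFRAME                  ; back to the default frame (16 byte alignment)

; procedures defined from here on get the automatic frame again
```

The idea is the one described earlier in the thread: turn the frame off only for simple leaf procedures, then turn it back on.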

With the "masm64" help file which is incomplete on many of the library and macro code you need to read the data in these categories.

 Simplified Introduction
 A basic explanation of the stackframe and invoke notations.

 Design Criteria

 Calling Convention
 How the Win 64 calling convention works.

 Stack frame reference
 How the MASM64 stackframe works.


In particular, you need to understand how the Win64 ABI works and why you must fully comply with it or your application will not start. Get it wrong and the app will just exit telling you nothing. Win64 FASTCALL calling convention is a lot more complex than how Win32 worked, rather than LIFO stack arguments, all arguments must be aligned according to the ABI with the first 4 arguments being in RCX, RDX, R8 and R9 and the stack has what is called "shadow space" that allows the register contents to be written to the stack for code design that required repeated access to the arguments (recursion being one example).

The system I have built is designed to look much like 32 bit code so you don't have to keep twiddling the stack to write reliable code, if you need to know how it works you need to read the documentation AND the main macro file. I warn anyone who wants to write 64 bit MASM that it is an advanced topic with little useful data available, lousy and often inaccurate documentation and very few people who understand how it works.
 



Title: Re: PROC and prolog/epilog
Post by: markallyn on December 02, 2017, 09:51:57 AM
Good evening/morning Hutch,

Thank you so much for your very generous assistance on this thorny business.  I spent my afternoon here in southeast Pennsylvania, USA wrestling with the ABI -- which as you can easily see and have seen -- is what I'm attempting to grasp.  Still a long way to go.

What I completely missed is that the program defaults automatically to your macro "dynamic version" without any active invocation on my part.  That much I finally comprehended, although it took longer than it should have.  I tumbled to your NOSTACKFRAME macro and that was the clue that finally drove it through my thick skull.  What was decisive was when I didn't include the masm64rt.inc file the prologue and epilogue no longer showed up in the disassembly.

So, where I am now as far as the usestackframe macro is concerned is that I DON'T invoke it, but I can use an alternative form as you show in the macros64.inc file and also in this post.

Quote
  I warn anyone who wants to write 64 bit MASM that it is an advanced topic with little useful data available, lousy and often inaccurate documentation and very few people who understand how it works.

Yes, I have been fairly warned, but I am like one of Dante's poor souls condemned to enter a 64 bit labyrinth from which no return is possible.  It's what happens when you turn 75!

Regards as always,
Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 02, 2017, 01:06:34 PM
Mark,

> It's what happens when you turn 75!

You can't use that as an excuse, I am not far behind you as I turn 70 in the middle of next year. You know the old rule, use it or lose it.
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 02, 2017, 08:28:56 PM
Mark,

Give this section of the help file a good read as this is where the action is in understanding the Microsoft application binary interface (ABI).

The Win 64 Calling Convention, How Does It Work ?

The first four stack addresses are [rsp], [rsp+8], [rsp+16] and [rsp+24] which are left empty. Argument 5 and upwards are written to the RSP relative address [rsp+32] and upwards with an increase in displacement of 8 bytes for each argument.

A typical procedure call with 6 arguments will look like this.

mov rcx, arg1
mov rdx, arg2
mov r8, arg3
mov r9, arg4
mov QWORD PTR [rsp+32], arg5
mov QWORD PTR [rsp+40], arg6
call FunctionName

Now the interesting part is you can pass a BYTE, WORD, DWORD and QWORD at the same stack address and the ABI is designed this way. The stack is always aligned even with different data sizes being passed. With a MASM procedure that has a stack frame and argument list, the arguments written in the procedure call arrive at known locations on the stack and are accessible by their name from the procedure argument list.
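
For comparison, a sketch of the same kind of six-argument call written by hand with no stack frame macro, assuming plain ml64. Here the caller reserves the space itself with an explicit sub/add pair. FunctionName and caller are hypothetical procedures, and the literal values 1 to 6 stand in for the arguments.

```asm
.code

FunctionName proc           ; hypothetical callee: returns arg5 + arg6
    mov rax, [rsp+40]       ; arg5: return address at [rsp], shadow at +8..+39
    add rax, [rsp+48]       ; arg6
    ret
FunctionName endp

caller proc
    sub rsp, 38h            ; 6 slots * 8 = 48 bytes, + 8 so RSP is 16-aligned
    mov rcx, 1              ; args 1-4 travel in registers
    mov rdx, 2
    mov r8, 3
    mov r9, 4
    mov qword ptr [rsp+32], 5   ; arg5 sits just above the 32-byte shadow space
    mov qword ptr [rsp+40], 6   ; arg6
    call FunctionName
    add rsp, 38h            ; release the frame
    ret
caller endp

    end
```

The offsets differ by 8 between caller and callee because the call instruction pushes the return address.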

 
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 03, 2017, 05:21:56 AM
Good afternoon/evening Hutch,

I am really glad you sent me exactly this passage from that document because I have been wanting to ask you about it for several weeks--ever since I realized I had no idea what was going on with parameter passing in x64.  My question is:  where are you creating "shadow space/spill space"?  I coded this passage myself and discovered that it "works", but in most postings by various authors there is usually a "sub rsp/add rsp" pairing with the called function sandwiched between.

The document from which you extracted the passage, is, by the way, very well done.  Clear and simple.

Regards,
Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 03, 2017, 05:56:09 AM
Hutch,

I'm a firm believer in the truth and wisdom of this adage.  My wife, however, who is after all the ultimate arbiter of my hours, firmly believes that getting lost in assembly language is not the most efficient means of implementing it.

Regards,
Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 03, 2017, 12:01:23 PM
Mark,

You use shadow space where you need it. For a leaf procedure or a procedure where you know exactly how all of the registers are being used and which has 4 arguments or fewer, you can directly pass the arguments in the first 4 registers and bypass the need for shadow space. Where you need a stack frame for more than 4 arguments and LOCAL variables you copy the first 4 registers into the start of the stack address, always at 8 byte spacing, OFFSETs 0, 8, 16, 24 then after that you copy any other arguments to following 8 byte OFFSETs but with a quirk, last arg next, second last arg after that etc .... This was tested against a multitude of API functions, C runtime functions and conventional assembler procedures and it works correctly as it is constructed according to the Microsoft ABI.

You stay away from stack manipulation as you risk messing up the stack alignment, you can still get away with using PUSH/POP but you need to exercise considerable care as you can kill the app stone dead by getting it wrong. For slightly more typing, allocate a LOCAL for each register and use MOV in and back out at the end of the proc.

MyProc proc

    LOCAL myreg :QWORD

    mov myreg, r15

  ; write your code using the reg

    mov r15, myreg

    ret

MyProc endp


As far as your better half, try selling her the "Use it or lose it" (do you want to care for a vegetable) view and if you survive, you can keep up your code development.  :P
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 04, 2017, 05:13:04 AM
Hutch,
I've been playing around with a technique for correctly computing the stack space required and how to push additional parameters onto the stack prior to a procedure call that uses the parameters.  Here are the results so far.  In this case I'm using six parameters in a call to "mult6ints".  Please critique what I'm doing, if you have the time.

Quote
include \masm32\include64\masm64rt.inc

OPTION CASEMAP:NONE

printf      PROTO :QWORD, :VARARG
mult6ints   PROTO :QWORD, :QWORD, :QWORD, :QWORD, :QWORD, :QWORD

NOSTACKFRAME

.const               
NUM_PUSHREG equ  6         ;;number of variables in call
STK_LOCAL   equ  8         ;;some space for a local qword
STK_PAD       equ ((NUM_PUSHREG and 1) xor 1) * 8
STK_TOTAL   equ STK_LOCAL + STK_PAD
RBP_RA       equ NUM_PUSHREG*8 + STK_LOCAL + STK_PAD


.data
frmt1   BYTE   "Done with stack test", 13,10,0
frmt2   BYTE   "STK_PAD is %d", 13,10,0
frmt3   BYTE   "STK_TOTAL is %d",13,10,0
frmt4   BYTE   "RBP_RA is %d", 13,10,0
frmt5   BYTE   "The result of the multiplication is %d",13,10,0

.code
main      PROC
push   rbp            ;;create stack frame
mov   rbp, rsp
sub   rsp, RBP_RA
mov   rdx, STK_PAD
invoke  printf, ADDR frmt2, rdx      ;;print STK_PAD bytes
mov   rdx, STK_TOTAL
invoke   printf, ADDR frmt3, rdx      ;;print STK_TOTAL bytes
mov   rdx, RBP_RA
invoke  printf, ADDR frmt4, rdx      ;;print rbp to return address bytes
sub   rsp, RBP_RA         ;;create spill space
mov   rcx, 1
mov   rdx, 2
mov   r8, 3
mov   r9, 4
mov   qword ptr[rsp+RBP_RA], 5
mov   qword ptr[rsp+RBP_RA+8], 6
call   mult6ints         ;;call mult6ints
mov   rdx, rax
invoke   printf, ADDR frmt5, rdx      ;;print result
invoke  printf, ADDR frmt1      ;;say goodbye
      ;;undo spill space
waitkey
add   rsp, RBP_RA   
mov   rsp, rbp         ;;epilogue
pop   rbp
ret
main   ENDP
END


Mult6ints is not included.  But, the gist of it is that mult6ints uses a macro similar to the computations in the .const section to compute RBP_RA.  Then it adds 16 bytes to this figure in order to locate the two parameters passed on the stack.  The whole thing looks kind of ugly to me and I'm sure a much more skilled programmer could work out a more elegant solution.

As I say, if you have some time to analyze and criticize I would be grateful.

Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 04, 2017, 05:17:50 AM
Hutch and everyone,

I should give credit to Daniel Kusswurm ("Modern X86 Assembly Language Programming") for this approach to calculating stack size and correct pointer location.  I've modified his work, but the gist of it belongs to him.

Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 04, 2017, 04:39:48 PM
Mark,

You are not going to get many takers if you don't post a complete working example that can be built.
Title: Re: PROC and prolog/epilog
Post by: AW on December 05, 2017, 12:31:17 AM
I don't think the technique of Daniel Kusswurm (as reported by you, of course) is good. If, for instance, you set "STK_LOCAL   equ  16", then the stack will not be aligned after leaving the prolog  :( . BTW, EQU is a directive, so there is no need to place it in the .const segment.
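
The arithmetic behind that remark, worked through with the equates from the posted program. The last two equates (RAW_SIZE and FRAME_SIZE) are hypothetical names showing one possible way to keep the total a multiple of 16; they are an assumption for illustration, not something from the book or from the posted code.

```asm
; Frame size as computed in the posted program, subtracted after
; push rbp / mov rbp, rsp, i.e. starting from a 16-byte aligned RSP.
NUM_PUSHREG equ 6
STK_LOCAL   equ 16                                  ; was 8 in the post
STK_PAD     equ ((NUM_PUSHREG and 1) xor 1) * 8     ; = 8
RBP_RA      equ NUM_PUSHREG*8 + STK_LOCAL + STK_PAD ; = 48+16+8 = 72
; 72 is not a multiple of 16, so "sub rsp, RBP_RA" leaves RSP 8 off.

; One possible alternative: round the raw size up to the next
; multiple of 16 instead of computing the padding separately.
RAW_SIZE    equ NUM_PUSHREG*8 + STK_LOCAL           ; = 64
FRAME_SIZE  equ (RAW_SIZE + 15) and (-16)           ; = 64, always 16-aligned
```

With the original STK_LOCAL of 8 the rounded size is also 64, which matches the RBP_RA value the posted equates produce.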

There are many errors in your program, but anyway, you are progressing.  :t
Title: Re: PROC and prolog/epilog
Post by: nidud on December 05, 2017, 02:09:43 AM
There are a few MASM-compatible assemblers (three of them actually hosted on this forum) that handle the stack automatically. What they do is basically calculate the maximum number of arguments used within a PROC/ENDP frame and create a common stack frame for all calls. This simplifies things somewhat.

Example:
Code: [Select]
include stdio.inc

.code

main proc

    printf("2 args: %d\n", 1)
    printf("5 args: %d,%d,%d,%d\n", 1, 2, 3, 4)
    xor eax,eax
    ret

main endp

    end main

Result:
Code: [Select]
main    PROC
        sub     rsp, 56                                 ; 0000 _ 48: 83. EC, 38
        mov     edx, 1                                  ; 0004 _ BA, 00000001
        lea     rcx, [DS0000]                           ; 0009 _ 48: 8D. 0D, 00000000(rel)
        call    printf                                  ; 0010 _ E8, 00000000(rel)
        mov     dword ptr [rsp+20H], 4                  ; 0015 _ C7. 44 24, 20, 00000004
        mov     r9d, 3                                  ; 001D _ 41: B9, 00000003
        mov     r8d, 2                                  ; 0023 _ 41: B8, 00000002
        mov     edx, 1                                  ; 0029 _ BA, 00000001
        lea     rcx, [DS0001]                           ; 002E _ 48: 8D. 0D, 00000000(rel)
        call    printf                                  ; 0035 _ E8, 00000000(rel)
        xor     eax, eax                                ; 003A _ 33. C0
        add     rsp, 56                                 ; 003C _ 48: 83. C4, 38
        ret                                             ; 0040 _ C3
main    ENDP

However, to get an understanding of how the calling convention works you need to do some testing.
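
The 56 in the listing satisfies two constraints at once: it covers the largest call (five 8-byte slots, 40 bytes), and together with the 8-byte return address it leaves rsp on a 16-byte boundary. A hand-written sketch for plain ML64 follows. The formula for CALL_AREA is one that happens to reproduce 56; it is an assumption, not taken from any assembler's source, and the names CALL_ARGS_MAX, CALL_AREA, fmt2 and fmt5 are made up for the example. It assumes linking against a C runtime that exports printf.

```asm
extern printf:proc

; Hypothetical constants: largest argument count used by any call in main.
CALL_ARGS_MAX equ 5
CALL_AREA     equ ((CALL_ARGS_MAX * 8 + 15) / 16) * 16 + 8   ; 48 + 8 = 56

.data
fmt2 db "2 args: %d", 10, 0
fmt5 db "5 args: %d,%d,%d,%d", 10, 0

.code
main proc
    sub  rsp, CALL_AREA          ; one shared area for every call below
    lea  rcx, fmt2
    mov  edx, 1
    call printf
    lea  rcx, fmt5
    mov  edx, 1
    mov  r8d, 2
    mov  r9d, 3
    mov  dword ptr [rsp+20h], 4  ; fifth argument goes just above the 32-byte home space
    call printf
    xor  eax, eax
    add  rsp, CALL_AREA
    ret
main endp
end
```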
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 05, 2017, 03:54:44 AM
Hutch, Nidud, aw27:

Recognizing the truth of Hutch's response, I will post the callee later this afternoon (EST).  I have modified the caller (which I had posted) so that the user can interact with it and specify the number of passed parameters and also the amount of local stack required.  This has proven a bit more challenging -- not surprising, given my novice status -- but it doesn't change anything much about the ABI calling, so the existing post is pretty close to the final product I have in mind.

In the callee that I post you will see that I have converted the code in the .const section of the caller into a macro.

Regards to all and thanks for your wisdom.

Mark

Title: Re: PROC and prolog/epilog
Post by: markallyn on December 05, 2017, 06:34:10 AM
Hello everyone,

As I promised earlier, here is a copy of "arith2callee.asm" which contains the macro and associated mult6ints PROC. 

Quote
include \masm32\include64\masm64rt.inc

calcstack MACRO numregs:REQ, loc:REQ                
LOCAL   STK_PAD, STK_TOTAL
         
STK_PAD       equ ((numregs and 1) xor 1) * 8
STK_TOTAL   equ loc + STK_PAD
RDP_RA       equ numregs*8 + loc + STK_PAD
   ENDM
.data

.code
mult6ints   PROC
calcstack  6, 8
imul   rcx, rdx
imul   rcx, r8
imul   rcx, r9
imul   rcx, [rbp + RDP_RA + 16]
imul   rcx, [rbp + RDP_RA + 24]
mov   rax, rcx
mov   r12, rax
;invoke   printf, ADDR frmt1
mov   rax, r12
ret
mult6ints   ENDP
END


As originally written I had been using printf as a debugging aid.  Printf of course clobbers the volatile rax register, so I was resorting to the trick of "pushing" it into the non-volatile r12.  I left the old code in, and that is why you see the strange locution just before the ret.
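
A hedged sketch of the same trick done within the calling convention: r12 is callee-saved on Windows x64, so a routine that borrows it has to save and restore it. The names traced_mul and msg are hypothetical, and the sketch assumes plain ML64 linked against a C runtime that exports printf.

```asm
extern printf:proc

.data
msg db "debug", 10, 0

.code
; Returns rcx * rdx and prints a debug line without losing the result.
traced_mul proc
    push r12            ; save the caller's r12; rsp goes from 16n+8 to 16n
    sub  rsp, 20h       ; home space for the printf call, rsp still aligned
    mov  rax, rcx
    imul rax, rdx
    mov  r12, rax       ; park the result where printf cannot touch it
    lea  rcx, msg
    call printf         ; rax, rcx, rdx, r8-r11 may be destroyed here
    mov  rax, r12       ; recover the result
    add  rsp, 20h
    pop  r12            ; restore the caller's r12
    ret
traced_mul endp
end
```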

I'm still working on the caller to make it interactive.  I haven't figured out how to get the stack info from the keyboard into the .const section (or a macro which will replace it) in such a fashion that the called program can get the necessary stack info without re-entering it from the keyboard -- somehow passing it from the caller to the callee.  Interesting problem that no doubt some expert has solved long ago.

Regards,
Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 08, 2017, 02:38:58 AM
Hello everyone,

OK, my last words on this subject unless someone chimes in.  After playing around I decided that the simplest, brute-force way to collect the number of parameters and the necessary space for locals was to get keyboard input from the user.  The following is the caller code.  It's ugly, needs error checking, and no doubt one of you would do a better job of it.
Quote
include \masm32\include64\masm64rt.inc

OPTION CASEMAP:NONE


atoi      PROTO :QWORD
printf      PROTO :QWORD, :VARARG
mult6ints   PROTO :QWORD, :QWORD, :QWORD, :QWORD, :QWORD, :QWORD

NOSTACKFRAME

.data
frmt1   BYTE   "Done with stack test", 13,10,0
frmt2   BYTE   "STK_PAD is %d", 13,10,0
frmt3   BYTE   "STK_TOTAL is %d",13,10,0
frmt4   BYTE   "RBP_RA is %d", 13,10,0
frmt5   BYTE   "The result of the multiplication is %d",13,10,0
frmt6   BYTE   "Enter number of parms",13,10,0
frmt7   BYTE   "Enter number of local bytes needed",13,10,0
reserved QWORD   NULL
sze   QWORD   8

.data?
loc   QWORD   ?
numregs   QWORD   ?
buffer   BYTE   8 dup(?)
STK_PAD QWORD   ?
STK_TOTAL QWORD   ?
RBP_RA   QWORD   ?
hFile   HANDLE   ?
ccRead   DWORD   ?

.code
main      PROC
push      rbp   ;;create stack frame
mov      rbp, rsp

         ;;get number of params from the keyboard
sub      rsp, 30h
invoke      printf, ADDR frmt6
invoke      GetStdHandle, STD_INPUT_HANDLE
mov      rcx, rax
invoke      ReadConsole, rcx, ADDR buffer, 8, ADDR ccRead, ADDR reserved
lea      rcx, buffer
call      atoi
mov      numregs, rax

         ;;get number of bytes of locals
invoke      printf, ADDR frmt7
invoke      GetStdHandle, STD_INPUT_HANDLE
mov      rcx, rax
invoke      ReadConsole, rcx, ADDR buffer, 8, ADDR ccRead, ADDR reserved
lea      rcx, buffer
call      atoi
mov      loc, rax
add      rsp, 30h

         ;;compute bytes of padding
mov      r12, numregs
and      r12, 1
xor      r12, 1
imul      r12, 8
mov      rax, r12
mov      STK_PAD, rax

         ;;compute total stack required
add      rax, loc
mov      STK_TOTAL, rax
mov      r12, numregs
imul      r12, 8
mov      rax, loc
add      rax, r12
add      rax, STK_PAD

         ;;save RBP_RA       
mov      RBP_RA, rax

      ;;print values of STK_PAD STK_TOTAL and RBP_RA
sub      rsp, RBP_RA
mov      rdx, STK_PAD
invoke     printf, ADDR frmt2, rdx      ;;print STK_PAD bytes
mov      rdx, STK_TOTAL
invoke      printf, ADDR frmt3, rdx      ;;print STK_TOTAL bytes
mov      rdx, RBP_RA
invoke     printf, ADDR frmt4, rdx      ;;print rbp to return address bytes

         ;;set up call to mult6ints      
mov      rcx, 1
mov      rdx, 2
mov      r8, 3
mov      r9, 4
mov      qword ptr[rsp], 5
mov      qword ptr[rsp+8], 6
call      mult6ints         ;;call mult6ints
mov      rdx, rax
invoke      printf, ADDR frmt5, rdx      ;;print result
invoke     printf, ADDR frmt1      ;;say goodbye
         
         ;;undo spill space
add      rsp, RBP_RA   
mov      rsp, rbp         ;;epilogue
pop      rbp

waitkey
      ret
main   ENDP
END

Here is the callee, mult6ints:
Quote
include \masm32\include64\masm64rt.inc

NOSTACKFRAME

.data

.code
mult6ints   PROC
push   rbp
mov   rbp, rsp
imul   rcx, rdx
imul   rcx, r8
imul   rcx, r9
imul   rcx, [rbp+16]
imul   rcx, [rbp+24]
mov   rax, rcx
mov   rsp, rbp
pop   rbp
ret
mult6ints   ENDP
END

As I say, it works, it's ugly.  I'm sure aw27 is correct that there are other versions of this sort of thing out there.  But, it helped teach me how the ABI works.
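
In the Windows x64 convention the first 32 bytes at the caller's rsp are home space for the four register arguments, so the fifth and sixth arguments sit at [rsp+20h] and [rsp+28h] at the moment of the call. The code above stores them at [rsp] and [rsp+8] and reads them back at [rbp+16] and [rbp+24], which is consistent between those two routines but would not match a callee produced by a compiler. A minimal sketch of the conventional layout, assuming plain ML64 with no prologue macros; call_mult6 is a hypothetical stand-in for the caller:

```asm
.code

; rcx, rdx, r8, r9 hold arguments 1-4; arguments 5 and 6 are on the stack.
mult6ints proc
    push rbp
    mov  rbp, rsp
    mov  rax, rcx
    imul rax, rdx
    imul rax, r8
    imul rax, r9
    imul rax, qword ptr [rbp+30h]   ; arg 5: saved rbp (8) + return address (8) + home space (32)
    imul rax, qword ptr [rbp+38h]   ; arg 6
    pop  rbp
    ret
mult6ints endp

call_mult6 proc
    sub  rsp, 38h                   ; 32 home space + 16 for args 5 and 6 + 8 to realign
    mov  ecx, 1
    mov  edx, 2
    mov  r8d, 3
    mov  r9d, 4
    mov  qword ptr [rsp+20h], 5     ; fifth argument
    mov  qword ptr [rsp+28h], 6     ; sixth argument
    call mult6ints                  ; rax = 1*2*3*4*5*6 = 720
    add  rsp, 38h
    ret
call_mult6 endp
end
```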

Regards,
Mark
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 08, 2017, 11:44:30 PM
Mark,

Put it in a ZIP file so it works, otherwise anyone who wants to look at it has to construct the rest to see what it does.
Title: Re: PROC and prolog/epilog
Post by: AW on December 09, 2017, 12:24:10 AM
Kusswurm's examples, and the way they show how to produce a prologue, refer to "PROC FRAME", while you, and in general everybody here, including myself (most of the time), don't care about exception handling.
What I mean is that you should not transplant a kidney to do the job of a different organ.
Also, Kusswurm talks about "NUM_PUSHREG = number of prolog non-volatile register pushes" and you are using that concept for the number of parameters of the function.
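
For readers who have not met it, this is roughly what a PROC FRAME prologue looks like in ML64. It is a minimal sketch using the documented unwind directives, not a copy of any example from the book; the name framed and the choice of rbx are arbitrary.

```asm
.code
framed proc frame
    push rbx            ; one non-volatile register push
    .pushreg rbx        ; record the push in the unwind data
    sub  rsp, 20h       ; entry 16n+8, after the push 16n, still 16n after this
    .allocstack 20h     ; record the allocation in the unwind data
    .endprolog          ; end of the prologue as far as the unwinder is concerned

    mov  rbx, rcx       ; the body may now use rbx freely
    mov  rax, rbx

    add  rsp, 20h       ; epilogue mirrors the prologue
    pop  rbx
    ret
framed endp
end
```

Here a single push leaves rsp aligned without padding; with an even number of pushes an extra 8 bytes would be needed, which is the job of the padding term in the book's constants.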
Title: Re: PROC and prolog/epilog
Post by: Jokaste on December 09, 2017, 02:33:01 AM
I have been following this thread every day, but I must say that I don't understand.
What is the goal?
Is it to create a stack frame like JwAsm does?
With PoAsm there is a 'PARMAREA' (PARMAREA=5*QWORD).


Could you explain it for me please? :dazzled:
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 09, 2017, 10:50:26 AM
aw27 and Jokaste and Hutch,

aw27. Yes, absolutely Kusswarm was after SEH type code and that's why he does what he does.  And, as you say, apparently no one on the forum cares about SEH.  And yes, you are also right that Kusswarm's NUM_PUSHREG refers to pushed registers and not parameters.  I debated whether to change the name to PUSH_PARMS, but let it stand as is.  Why bother with what I did?  This goes to Jokaste's query.

Jokaste and aw27:  The goal originally was to understand what PROC actually does... and I gradually drifted away from this and towards the code you see before you.  As I discovered, if one includes \masm32\include64\masm64rt.inc then one causes the STACKFRAME macro to be run, and this generates the ENTER/LEAVE pair.  If NOSTACKFRAME is run, then one gets a "bare bones" PROC and one must create a frame oneself if it's needed.  But I drifted past this "discovery" and began to wonder if it was possible to interactively define a suitable stack frame for the 64-bit ABI.  After considerable mucking about, the above code resulted, mostly thanks to reading Kusswarm's code late in his book.  Primarily, this whole business was me learning how the blasted ABI actually works if there are more than 4 parameters.
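
As a reminder of what an ENTER/LEAVE pair amounts to, here is a small sketch in plain ML64. The allocation size of 32 is an arbitrary choice for the example, not the value the masm64rt.inc macros emit, and the procedure names are hypothetical.

```asm
.code
with_enter proc
    enter 32, 0         ; same effect as: push rbp / mov rbp, rsp / sub rsp, 32
    mov   eax, 1
    leave               ; same effect as: mov rsp, rbp / pop rbp
    ret
with_enter endp

by_hand proc
    push  rbp           ; the frame a "bare bones" PROC has to build itself
    mov   rbp, rsp
    sub   rsp, 32
    mov   eax, 1
    mov   rsp, rbp
    pop   rbp
    ret
by_hand endp
end
```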

Hutch:  I will post a .zip file.  Should have thought about this.  Apologies to all.

Mark
Title: Re: PROC and prolog/epilog
Post by: jj2007 on December 09, 2017, 11:14:16 AM
Quote
Kusswarm was after SEH type code and that's why he does what he does.  And, as you say, apparently no one on the forum cares about SEH.

Yes, we love to see code crash :greenclp:

Seriously: José has done a great job exploring SEH in 64-bit land, but many assembler programmers believe in code that either works or crashes - no half-baked compromises.

Btw the guy's name is Kusswurm. The origin is German, "Kusswarm" would mean "as warm as a kiss", Kusswurm is something like a kissing worm - I sincerely hope he got used to it. He offers a freely downloadable guide to using Masm in Visual Studio (https://raw.githubusercontent.com/Apress/modern-x86-assembly-language-programming/master/9781484200650_AppA.pdf). By reading only 28 pages, you will be able to build a Hello World project in Micros**t's flagship IDE. Hurry up and get it, as it will not be compatible with the coming version of Visual Crap 8)

Title: Re: PROC and prolog/epilog
Post by: felipe on December 09, 2017, 12:14:33 PM
 :P
Title: Re: PROC and prolog/epilog
Post by: hutch-- on December 09, 2017, 01:00:52 PM
While I confess to being a dinosaur in coding style where you get it right or it explodes in your face and makes you look like a jerk, SEH does have its place, even in properly written error free code with hardware based tasks where control of the required capacity cannot be done in software. Outside of specific hardware related issues, a "no error handler" approach makes your debugging a lot simpler and your code a lot more reliable. Get it exactly right and it works correctly without hand holding, make a mess of it and it very clearly tells you that it did not work.
Title: Re: PROC and prolog/epilog
Post by: AW on December 09, 2017, 02:43:34 PM
Quote
I debated whether to change the name to PUSH_PARMS, but let it stand as is.
The called function doesn't have to care about the influence of the function parameters on the alignment. This is done by the caller.
Title: Re: PROC and prolog/epilog
Post by: AW on December 09, 2017, 02:50:40 PM
Quote
but many assembler programmers believe in code that either works or crashes

That is not really the reason. The reason is that if you don't use SEH you will have problems integrating the ASM with a high-level language like C or C++ without disabling SEH for the whole application. And 99% of the people who use ASM in the real world use it in this fashion.
Title: Re: PROC and prolog/epilog
Post by: jj2007 on December 09, 2017, 07:48:40 PM
Quote
The reason is that if you don't use SEH you will have problems integrating the ASM with a high-level language like C or C++ without disabling SEH for the whole application.

Will the C application that loads an asm dll or links to an asm object file notice that there is no SEH?
Title: Re: PROC and prolog/epilog
Post by: AW on December 09, 2017, 09:09:18 PM
Quote
Will the C application that loads an asm dll or links to an asm object file notice that there is no SEH?
When building in release mode, Visual Studio usually notices. This may not apply to other tools, and I don't think it applies to asm DLLs either.
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 10, 2017, 05:29:22 AM
Hello everyone,

In response to Hutch's suggestion yesterday I have attached a .zip file containing the caller and callee asm sources.

Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 10, 2017, 05:34:27 AM
Let me offer sincerest apologies to Daniel Kusswurm for mangling his name.  I suppose the mistake crept in because I thought a warm kiss much more appealing than kissing worms.

Mark
Title: Re: PROC and prolog/epilog
Post by: markallyn on December 10, 2017, 06:06:17 AM
aw27:

With respect to your statement:

Quote
The called function doesn't have to care about the influence of the function parameters on the alignment. This is done by the caller.

Exactly so.  It took me doing this bizarre program tediously over several days of dead-ends to discover this very basic fact.

Mark
Title: Re: PROC and prolog/epilog
Post by: AW on December 10, 2017, 06:31:18 PM
Quote
It took me doing this bizarre program tediously over several days of dead-ends to discover this very basic fact.
Not bad, some people take years and others haven't got it yet.