Hello everyone,
I would like to verify (or not) how the PROC directive works as far as creating prologues and epilogues. Let us suppose that I have two PROCs, one of which calls the other. Let us further assume that I do not use the OPTION PROLOGUE:NONE and OPTION EPILOGUE:NONE options. Then using the PROC keyword in the first function will automatically generate a prologue and an epilogue. But using the PROC keyword in the callee will NOT generate a prologue/epilogue pair if the function is a leaf function.
Is this true?
Thanks,
Mark Allyn
deleted
Mark,
The way I would test that with ML64 is by creating a default proc and then having a look at it with a disassembler. The problem is this: you need to at least have the entry point of the executable correctly set up and aligned or the app will not start. I bothered to write the macros necessary to do this. While you can turn the stack frame off AFTER the app has started properly for simple leaf procedures, then turn it back on after the procedure has exited, you must get the initial alignment right or the app exits before it even runs.
Hutch and Nidud, good evening.
Nidud- Using your code, what I'm saying is that in your "a" function, using the PROC directive does NOT cause prologue and epilogue code to be emitted in "a". But, using PROC in "b" DOES cause prologue and epilogue code to be emitted, at least if "a" is a simple leaf function. I have not found anything in msft documentation that indicates that PROC doesn't cause leaf function prologues and epilogues, but it ALWAYS causes prologues and epilogues to be generated if the function is a frame function, i.e. it calls another function.
Hutch- Yes I actually created two functions as test functions, the one calling the other and looked them over with x64dbg. That is how I "discovered" this apparently undocumented behavior of PROC when used with a leaf function. I haven't yet tested what happens when I make the callee a frame function, but will do so very soon. I'm guessing that in that case, PROC in the callee will emit prologue and epilogue code too. If you think it's helpful I will post the test code here.
One other related question, albeit trivial. When PROC is used with a frame function and prologue code gets emitted, ml64 automatically uses the instruction "enter 80,0". Why 80?
Thanks to both of you.
Mark
Mark,
It is adjustable, just look at the STACKFRAME macro and how it uses the UseStackFrame and EndStackFrame macros.
STACKFRAME MACRO dflt:=<96>,dynm:=<128>,algn:=<16>
The alignment equate has a default of 16, which you can change. That allows you to align procedures so that the locals match the largest of the data types used, so if you want to align at 64 or 1024, or page align at 4096, it is easy to do. It means you can use SSE, AVX and AVX2 locals if you place them first, before smaller sized data types.
Rather than repeat it, the reference material for how the stackframe is built and used is in the MASM64.chm help file and the actual pre-processor code is in the main macro file. As you are probably aware by now, external documentation is appalling, incomplete and often wrong. I built this system by exhaustive testing of code against the Microsoft ABI using API functions and local functions, both with and without stackframes.
Now with the value 80 used with the ENTER mnemonic, you will do better by reading the Intel instruction manual. MASM has used LEAVE over many years while ENTER makes the proc simple, clean and very reliable and the method is free of the messy and unreliable RSP twiddling that many have messed around with. It is not a fast mnemonic but on procedures that need a stack frame so they can call other high level procedures and API or external library functions, stack entry speed does not matter.
Now with the entry point being correctly set up the stack is aligned correctly so when you call a leaf procedure with no stack frame, the stack is aligned and you can avoid the tiny overhead as long as you don't mess the stack up and use a simple RET to exit the proc. Generally you don't use PUSH / POP like you did in win32, you make locals and use MOV to load the registers that need to be protected and restore them before proc exit.
LOCAL myreg :QWORD
mov myreg, rsi
; write the source code
mov rsi, myreg
ret
Good morning/evening Hutch,
I am still reading and re-reading your last post. As always, it's compact and loaded with information and takes more than a single pass to digest.
Meantime, I wanted to report on the results of inserting a simple invoke command into what was a leaf function to see what ml64 would do with the PROC directive under these circumstances. You will recall previously that I had found that using PROC in a leaf function resulted in NO prologue or epilogue code being emitted. Well, when I converted the callee into a "frame function" with the invoke macro (all it did was to call printf), then PROC puts the prologue and epilogue code in--including a gratuitous allocation of 96 bytes for locals that I'm not using. Very annoying. Annoying because it messes up the frame pointer to 6 parameters I had passed in from the calling function.
The one conclusive take-away from this escapade is that novices like me cannot write even the simplest masm code without a debugger, and a good one at that!
Thanks for your help and counseling.
Mark
deleted
There are only 3 cases to consider, when we are not talking about exception handling, and using the default ML64 prolog/epilog
1- Leaf functions
Align the stack on entry, restore the stack on exit
2- Functions that have parameters (passed in registers, or in registers and on the stack), LOCAL variables or a USES clause, in any combination, and that may also call other functions
MASM automatically builds an rbp based stack frame and uses leave on the epilog.
ALL you have to do is ALIGN the stack after providing shadow space + space for the parameters after the 4th of the function(s) to be called, if there are functions to be called!
3- Functions that simply call other functions, but have no LOCALS, or parameters or USES.
Subtract from rsp the amount of shadow space plus parameters after the 4th and align the stack. Restore the stack on exit because there is no leave.
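The arithmetic behind cases 2 and 3 can be sketched in a few lines of Python, used here purely as a calculator. The helper name frame_size and its parameters are illustrative assumptions, not anything ml64 provides. The sketch assumes the usual Win64 facts: RSP is 16-byte aligned just before a CALL, so it sits at 16n+8 on entry to a procedure, and every PUSH moves it by 8.

```python
SHADOW_SPACE = 32  # four 8-byte home slots for RCX, RDX, R8, R9


def frame_size(max_call_args, pushes=0, local_bytes=0):
    """Smallest SUB RSP amount that leaves RSP 16-byte aligned for calls.

    max_call_args -- largest argument count of any function being called
    pushes        -- 8-byte pushes already done in the prologue (e.g. push rbp)
    local_bytes   -- extra bytes wanted for locals
    """
    needed = SHADOW_SPACE + 8 * max(0, max_call_args - 4) + local_bytes
    # On entry RSP = 16n + 8; after the pushes, the SUB must bring RSP to 16m.
    residue = (8 - 8 * pushes) % 16
    return needed + (residue - needed) % 16


print(hex(frame_size(5)))            # 0x28: shadow space + slot for a 5th argument
print(hex(frame_size(4)))            # 0x28: 32 bytes of shadow space + 8 to align
print(hex(frame_size(4, pushes=1)))  # 0x20: push rbp has already realigned RSP
```

The first two results correspond to the familiar "sub rsp, 28h" in a frameless procedure that calls other functions.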
Good morning, Nidud,
I tested the code you wrote and you're right--no prologue/epilogue pair is emitted even though procedure b has a call to function a. So, I am very puzzled.
More experimentation is in order ....
BTW, I'm using x64dbg, which comes with a nice disassembler.
Regards,
Mark
deleted
Hi markallyn,
I use Agner Fog's objconv to disassemble object files. Studying the output is useful for me :
http://agner.org/optimize/
Nidud, aw27, and Vortex,
Nidud: Could you explain a bit more what you mean by:
"This means the stack is aligned 16 - 8 (return address) on proc-entry."
I follow everything else. Very clear.
aw27:
I'm missing something very basic in what you wrote. Namely, are you saying that these three conditions REQUIRE prologues and epilogues (whether built-in by ml or hand-written) OR are you saying the opposite, that the three conditions DO NOT require prologues and epilogues?
Vortex:
Yes, I'm familiar with objconv by A. Fog. It's been a couple of years since I played with it. I'll try it again. Thanks for reminding me about its existence.
Regards,
Mark
deleted
Please refer to the 3 cases above:
includelib \masm32\lib64\kernel32.lib
ExitProcess PROTO :dword
.code
;CASE 1
p1 proc ; leaf
sub rsp, 8 ; align stack
; ... do our things
add rsp, 8
ret
p1 endp
;Case 2
p2 proc parm1:dword
;and rsp, -16 ;no need here, but will not hurt if used, because push rbp will align
; ... do our things
ret
p2 endp
;Case 2
p3 proc
LOCAL myvar:dword
and rsp, -16 ; align
; do our things
ret
p3 endp
;Case 2
p4_1 proc uses rbx rdi rsi par1:qword, par2:qword, par3:qword, par4:qword, par5:qword
and rsp, -16 ; align
; do our things
ret
p4_1 endp
; Case 3
p4 proc
sub rsp, 28h ; shadow space+space for 5th parameter.
;and rsp, -16 ;no need here, but will not hurt if used, because already aligned
mov rcx,1
mov rdx,2
mov r8,3
mov r9,4
mov rax, 5
mov [rsp+20h],rax
call p4_1
add rsp, 28h
ret
p4 endp
; Case 3, but without need for epilog because ExitProcess fixes everything
main proc
sub rsp, 28h ; shadow space + align
call p1
mov rcx, 1
call p2
call p3
call p4
;add rsp, 28h
;ret
mov ecx,0
call ExitProcess
main endp
end
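To see why the "sub rsp, 28h" and "and rsp, -16" lines in the listing keep things aligned, the path main -> p4 -> p4_1 can be traced with a small Python model. This is only a sketch: the entry value of RSP is a made-up number of the form 16n+8, and it assumes the prologue generated for p4_1 is push rbp / mov rbp, rsp followed by one push per USES register.

```python
def trace_rsp(entry_rsp):
    """Follow RSP through main -> p4 -> p4_1 and return (label, rsp) pairs."""
    steps = [
        ("main: sub rsp, 28h", -0x28),
        ("call p4 (pushes return address)", -8),
        ("p4: sub rsp, 28h", -0x28),
        ("call p4_1 (pushes return address)", -8),
        ("p4_1: push rbp (generated prologue)", -8),
        ("p4_1: push rbx, rdi, rsi (USES)", -24),
    ]
    rsp = entry_rsp
    out = [("entry to main", rsp)]
    for label, delta in steps:
        rsp += delta
        out.append((label, rsp))
    rsp &= -16  # the explicit "and rsp, -16" in p4_1
    out.append(("p4_1: and rsp, -16", rsp))
    return out


# 0x14FF08 is a hypothetical starting value, chosen only because it is 16n + 8.
for label, value in trace_rsp(0x14FF08):
    print(f"{label:40s} rsp % 16 = {value % 16}")
```

Every CALL in the trace is issued with RSP at a multiple of 16, and every procedure is entered with RSP at 16n+8.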
I did not want to double post this example of recursion so I put it here,
http://masm32.com/board/index.php?topic=6720.0
No stack frame twiddling trying to get it to work, a prologue/epilogue that constructs a minimum stack frame that automates stack frame creation in a context where it is the only way to do it. (Iteration procs are not recursive.)
Hutch, Nidud, and aw27,
Thanks very much indeed for your contributions.
Nidud- Very clear explanation. Using x64dbg I was able to verify what you wrote.
aw27: Wow! Very detailed and coherent. I need to study the options more, but looking them over I think they are pretty self-explanatory. I will certainly get back to you after completing the necessary study.
Hutch: I haven't yet been able to look over your attached file, but will get to it later today. As with aw27, I will respond to what you sent.
Now, as for the issue that kicked off this series, I went back to the original Programmer's Guide for version 6.1 of masm--way back to the 1992 edition. Here is a direct quotation from the Guide on page 198 under the heading "Generating Prologue and Epilogue Code":
Quote
When you use the PROC directive with its extended syntax and argument list the assembler automatically generates the prologue and epilogue code in your procedure.
This is true. I tested it using a slightly expanded version of Nidud's (see above) streamlined code. If I add to his "a PROC" a couple of arguments of the form :QWORD, :QWORD, etc. then sure enough ml64 will emit prologue and epilogue code. If there is no parameter list (as there isn't in Nidud's example) then indeed no prologue/epilogue pair shows up. Of course, it matters because--as Nidud points out clearly--one must adjust the stack search by 8 bytes to allow for the return address or one will not recover a fifth or greater passed parameter.
Thanks again to all of you, and I will reply shortly to aw27 and Hutch after doing justice to their efforts.
Mark
Hello aw27,
Since you sent your very detailed examples of the three situations I have copied, assembled, and linked all of them. I am still studying the results with x64dbg. But, I do have one preliminary question.
Namely, in those cases where you have declared LOCAL variables I had predicted that the assembler would use the ENTER XX,0 instruction, but that never happened. Do you have any explanation as to why ml64 persisted in using the usual individual components of the prologue?
I'll have more questions shortly. But, thank you for your exhaustive demonstration.
Mark
Quote from: markallyn on December 01, 2017, 05:44:57 AM
Namely, in those cases where you have declared LOCAL variables I had predicted that the assembler would use the ENTER XX,0 instruction, but that never happened. Do you have any explanation as to why ml64 persisted in using the usual individual components of the prologue?
There is no obligation to use ENTER with LEAVE. ENTER is considered a slow instruction and is not very popular these days.
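For reference, with a nesting level of 0, ENTER n,0 has the same effect as push rbp / mov rbp, rsp / sub rsp, n, and LEAVE has the same effect as mov rsp, rbp / pop rbp. A minimal Python model of that equivalence (the dictionary standing in for stack memory and the register values are arbitrary illustrative choices):

```python
def enter(rsp, rbp, size, stack):
    """Model ENTER size,0 in 64-bit mode; returns the new (rsp, rbp)."""
    rsp -= 8
    stack[rsp] = rbp      # push rbp
    rbp = rsp             # mov rbp, rsp
    rsp -= size           # sub rsp, size
    return rsp, rbp


def leave(rbp, stack):
    """Model LEAVE; returns the restored (rsp, rbp)."""
    rsp = rbp             # mov rsp, rbp
    rbp = stack[rsp]      # pop rbp
    rsp += 8
    return rsp, rbp


stack = {}
rsp, rbp = enter(0x1008, 0x2000, 0x80, stack)
print(hex(rsp), hex(rbp))
rsp, rbp = leave(rbp, stack)
print(hex(rsp), hex(rbp))
```

So a prologue built from the individual instructions can be paired with LEAVE just as well as one built with ENTER.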
:biggrin:
The catch here is that you don't put ENTER in a loop but then the instruction is not designed to be used in a loop so the reference to speed in this context is irrelevant. Outside of that the only gain of constructing a stack frame in an unreliable manner is that you may have the fastest MessageBoxA on the planet by saving a few picoseconds. LEAVE was always fast enough.
aw27,
Quote
There is no obligation to use ENTER with LEAVE. ENTER is considered a slow instruction and is not very popular these days.
I was aware of this. But, nevertheless, for whatever reason ml64 has been ignoring this in some of the code I write and uses ENTER 80,0. If ml64 consistently avoided this usage, I would understand why, but it doesn't. In all of the cases you created ENTER never appears, but I can show you more than one instance in my stuff where it does. In fact, if you look at the link that Hutch sent in connection with recursion, you will see that his code has ENTER in it too.
I'm still plowing through your 3 cases of code. I have another question I will reserve for a follow-on post regarding stack aligning.
Thanks for your assistance. It is warmly welcomed.
Mark
Mark,
Have a look at the 64 bit MACRO file to see how the stackframe is constructed. Search for "UseStackFrame" to find it. There are 3 arguments you can pass to the STACKFRAME macro, the third being alignment, which must be a power of 2. I have done a number of extras to handle a stack aligned for AVX and AVX2 locals. You can adjust the 3 arguments if you want to reduce the stack overhead with nested procedure calls; the recursion test piece can be used to test this. In most instances trimming the stack overhead does not matter as it is pre-allocated memory, but if you are writing recursive code that has a very large recursion depth, you can trim it down carefully or increase the linker settings or both. You will know if you have trimmed off too much as the app will not start OR it will stop once the stack memory is exhausted.
There is no great damage in using ENTER, but no advantage either. :biggrin:
ENTER was intended for languages that use nested procedures, like Pascal and Delphi. However, they don't use it. :biggrin:
Good evening Hutch, aw27:
Thanks. I'll check out the macro. By the way, I'm perfectly content to use ENTER just about always (except for recursions--which I don't write in any case), I just don't understand what circumstances cause ml to generate it, and when it does, why it picks the size of stack frame that it picks. It seems to default to 80h--why this value is a mystery. For a bit I thought that ml would do ENTERs whenever there were LOCALs defined. But, testing this with one or two of aw27's small programs indicates that this isn't the case. It stays with the conventional push rbp; mov rbp, rsp.
Mark
Mark,
The values associated with the stackframe are those in the macro I have referred you to. The default values are aimed at safety but they are also modifiable, which allows you to change alignment and tweak the arguments passed to the stackframe macro to optimise the memory usage if you are running recursion to any large depth.
Good morning/evening Hutch:
Ah, mystery solved! I know there will be at least one more question from me winging its way more or less in your direction, but this is for my small brain a major breakthrough.
Mark
Good afternoon/evening Hutch,
I'm looking at the usestackframe macro. I'll try invoking it, but I'm unclear what the "flag" parameter is about. I can't see it in any of the comments.
Mark
Hello Hutch,
Actually, if you could direct me to an example invocation of UseStackFrame it would be most helpful. I tried a number of times to get the parameters right but haven't yet figured out how. I googled on UseStackFrame and found two references to it, but no code, just discussion.
Thanks,
Mark
:biggrin:
Mark,
Look in the "macros64" directory for the file "macros64.inc" and you will find the source of all the macros I wrote for 64 bit MASM. There you will find the macro "UseStackFrame" and its matching "EndStackFrame" and the two macros are designed to be called by a number of wrapper macros.
STACKFRAME = the default stackframe with a 16 byte alignment.
NOSTACKFRAME turns the stackframe off.
Then there are a number of alternative forms that only differ in their alignment.
YMMSTACK = 32 byte alignment for AVX and AVX2 (YMM) instructions.
ZMMSTACK = 64 byte alignment for AVX-512 (ZMM) instructions.
CUSTOMSTACK = roll your own.
They differ only in the equates passed to the "UseStackFrame" macro.
stackframe_default equ <dflt> ;; set default stack
stackframe_dynamic equ <dynm> ;; set byte count for ENTER mnemonic
stackframe_align equ <algn> ;; align the stack by an interval of 16
The documentation for the first 2 arguments (dflt and dynm) is available in the Intel manuals under the ENTER mnemonic, the third argument "algn" has to be a power of 2 byte alignment.
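The power-of-2 requirement on "algn" follows from how alignment is normally done with a mask, as in the "and rsp, -16" seen earlier in this thread. The Python sketch below shows that idiom only; whether UseStackFrame uses exactly this instruction sequence is an assumption, and align_down is a hypothetical helper name.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def align_down(rsp, algn=16):
    """Round rsp down to a multiple of algn, like `and rsp, -algn`."""
    if not is_power_of_two(algn):
        # For other values -algn is not a clean bit mask, so the AND
        # would clear the wrong bits instead of rounding down.
        raise ValueError("algn must be a power of 2")
    return rsp & -algn


# 0x14FEB8 is an arbitrary example address.
for algn in (16, 32, 64, 4096):
    print(algn, hex(align_down(0x14FEB8, algn)))
```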
The "masm64" help file is incomplete for much of the library and macro code, so you need to read the material in these categories.
Simplified Introduction
A basic explanation of the stackframe and invoke notations.
Design Criteria
Calling Convention
How the Win 64 calling convention works.
Stack frame reference
How the MASM64 stackframe works.
In particular, you need to understand how the Win64 ABI works and why you must fully comply with it or your application will not start. Get it wrong and the app will just exit, telling you nothing. The Win64 FASTCALL calling convention is a lot more complex than how Win32 worked. Rather than LIFO stack arguments, all arguments must be aligned according to the ABI, with the first 4 arguments being in RCX, RDX, R8 and R9, and the stack has what is called "shadow space" that allows the register contents to be written to the stack for code designs that require repeated access to the arguments (recursion being one example).
The system I have built is designed to look much like 32 bit code so you don't have to keep twiddling the stack to write reliable code. If you need to know how it works you need to read the documentation AND the main macro file. I warn anyone who wants to write 64 bit MASM that it is an advanced topic with little useful data available, lousy and often inaccurate documentation and very few people who understand how it works.
Good evening/morning Hutch,
Thank you so much for your very generous assistance on this thorny business. I spent my afternoon here in south-east Pennsylvania, USA wrestling with the ABI -- which as you can easily see and have seen -- is what I'm attempting to grasp. Still a long way to go.
What I completely missed is that the program defaults automatically to your macro "dynamic version" without any active invocation on my part. That much I finally comprehended, although it took longer than it should have. I tumbled to your NOSTACKFRAME macro and that was the clue that finally drove it through my thick skull. What was decisive was that when I didn't include the masm64rt.inc file, the prologue and epilogue no longer showed up in the disassembly.
So, where I am now as far as the UseStackFrame macro is concerned is that I DON'T invoke it, but I can use an alternative form as you show in the macros64.inc file and also in this post.
Quote
I warn anyone who wants to write 64 bit MASM that it is an advanced topic with little useful data available, lousy and often inaccurate documentation and very few people who understand how it works.
Yes, I have been fairly warned, but I am like one of Dante's poor souls condemned to enter a 64 bit labyrinth from which no return is possible. It's what happens when you turn 75!
Regards as always,
Mark
Mark,
> It's what happens when you turn 75!
You can't use that as an excuse, I am not far behind you as I turn 70 in the middle of next year. You know the old rule, use it or lose it.
Mark,
Give this section of the help file a good read as this is where the action is in understanding the Microsoft application binary interface (ABI).
The Win 64 Calling Convention, How Does It Work ?
The first four stack addresses are [rsp], [rsp+8], [rsp+16] and [rsp+24] which are left empty. Argument 5 and upwards are written to the RSP relative address [rsp+32] and upwards with an increase in displacement of 8 bytes for each argument.
A typical procedure call with 6 arguments will look like this.
mov rcx, arg1
mov rdx, arg2
mov r8, arg3
mov r9, arg4
mov QWORD PTR [rsp+32], arg5
mov QWORD PTR [rsp+40], arg6
call FunctionName
Now the interesting part is you can pass a BYTE, WORD, DWORD and QWORD at the same stack address and the ABI is designed this way. The stack is always aligned even with different data sizes being passed. With a MASM procedure that has a stack frame and argument list, the arguments written in the procedure call arrive at known locations on the stack and are accessible by their name from the procedure argument list.
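The placement rule in that passage is regular enough to write down as a formula. A small Python sketch (arg_location is a hypothetical helper, and it covers integer and pointer arguments only; floating-point arguments travel in XMM registers and are not modelled here):

```python
REGISTERS = ("rcx", "rdx", "r8", "r9")


def arg_location(n):
    """Where the caller places integer/pointer argument n (1-based)."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= 4:
        return REGISTERS[n - 1]
    # [rsp] .. [rsp+24] are left empty; argument 5 starts at [rsp+32].
    return f"[rsp+{32 + 8 * (n - 5)}]"


for n in range(1, 7):
    print(n, arg_location(n))
```

For the six-argument call shown above this reproduces rcx, rdx, r8, r9, [rsp+32] and [rsp+40].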
Good afternoon/evening Hutch,
I am really glad you sent me exactly this passage from that document because I have been wanting to ask you about it for several weeks--ever since I realized I had no idea what was going on with parameter passing in x64. My question is: where are you creating "shadow space/spill space"? I coded this passage myself and discovered that it "works", but in most postings by various authors there is usually a "sub rsp/add rsp" pairing with the called function sandwiched between.
The document from which you extracted the passage, is, by the way, very well done. Clear and simple.
Regards,
Mark
Hutch,
I'm a firm believer in the truth and wisdom of this adage. My wife, however, who is after all the ultimate arbiter of my hours, firmly believes that getting lost in assembly language is not the most efficient means of implementing it.
Regards,
Mark
Mark,
You use shadow space where you need it. For a leaf procedure, or a procedure where you know exactly how all of the registers are being used and which has 4 arguments or fewer, you can directly pass the arguments in the first 4 registers and bypass the need for shadow space. Where you need a stack frame for more than 4 arguments and LOCAL variables, you copy the first 4 registers into the start of the stack address, always at 8 byte spacing, OFFSETs 0, 8, 16, 24, then after that you copy any other arguments to the following 8 byte OFFSETs but with a quirk: last arg next, second last arg after that etc. This was tested against a multitude of API functions, C runtime functions and conventional assembler procedures and it works correctly as it is constructed according to the Microsoft ABI.
You stay away from stack manipulation as you risk messing up the stack alignment, you can still get away with using PUSH/POP but you need to exercise considerable care as you can kill the app stone dead by getting it wrong. For slightly more typing, allocate a LOCAL for each register and use MOV in and back out at the end of the proc.
MyProc proc
LOCAL myreg :QWORD
mov myreg, r15
; write your code using the reg
mov r15, myreg
ret
MyProc endp
As far as your better half, try selling her the "Use it or lose it" (do you want to care for a vegetable) view and if you survive, you can keep up your code development. :P
Hutch,
I've been playing around with a technique for correctly computing the stack space required and for pushing additional parameters onto the stack prior to a call to the procedure that uses them. Here are the results so far. In this case I'm using six parameters in a call to "mult6ints". Please critique what I'm doing, if you have the time.
Quote
include \masm32\include64\masm64rt.inc
OPTION CASEMAP:NONE
printf PROTO :QWORD, :VARARG
mult6ints PROTO :QWORD, :QWORD, :QWORD, :QWORD, :QWORD, :QWORD
NOSTACKFRAME
.const
NUM_PUSHREG equ 6 ;;number of variables in call
STK_LOCAL equ 8 ;;some space for a local qword
STK_PAD equ ((NUM_PUSHREG and 1) xor 1) * 8
STK_TOTAL equ STK_LOCAL + STK_PAD
RBP_RA equ NUM_PUSHREG*8 + STK_LOCAL + STK_PAD
.data
frmt1 BYTE "Done with stack test", 13,10,0
frmt2 BYTE "STK_PAD is %d", 13,10,0
frmt3 BYTE "STK_TOTAL is %d",13,10,0
frmt4 BYTE "RBP_RA is %d", 13,10,0
frmt5 BYTE "The result of the multiplication is %d",13,10,0
.code
main PROC
push rbp ;;create stack frame
mov rbp, rsp
sub rsp, RBP_RA
mov rdx, STK_PAD
invoke printf, ADDR frmt2, rdx ;;print STK_PAD bytes
mov rdx, STK_TOTAL
invoke printf, ADDR frmt3, rdx ;;print STK_TOTAL bytes
mov rdx, RBP_RA
invoke printf, ADDR frmt4, rdx ;;print rbp to return address bytes
sub rsp, RBP_RA ;;create spill space
mov rcx, 1
mov rdx, 2
mov r8, 3
mov r9, 4
mov qword ptr[rsp+RBP_RA], 5
mov qword ptr[rsp+RBP_RA+8], 6
call mult6ints ;;call mult6ints
mov rdx, rax
invoke printf, ADDR frmt5, rdx ;;print result
invoke printf, ADDR frmt1 ;;say goodbye
;;undo spill space
waitkey
add rsp, RBP_RA
mov rsp, rbp ;;epilogue
pop rbp
ret
main ENDP
END
Mult6ints is not included. But, the gist of it is that mult6ints uses a macro similar to the computations in the .const section to compute RBP_RA. Then it adds 16 bytes to this figure in order to locate the two parameters passed on the stack. The whole thing looks kind of ugly to me and I'm sure a much more skilled programmer could work out a more elegant solution.
As I say, if you have some time to analyze and criticize I would be grateful.
Mark
Hutch and everyone,
I should give credit to Daniel Kusswarm ("Modern X86 Assembly Language Programming") for this approach to calculating stack size and correct pointer location. I've modified his work, but the gist of it belongs to him.
Mark
Mark,
You are not going to get many takers if you don't post a complete working example that can be built.
I don't think the technique of Daniel Kusswarm (as reported by you, of course) is good. If, for instance, you set "STK_LOCAL equ 16", then the stack will not be aligned after leaving the prolog :( . BTW, EQU is a directive, so there is no particular reason to place it in the .const segment.
There are many errors in your program, but anyway, you are progressing. :t
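The alignment objection can be checked by evaluating the posted equates by hand. Below is a Python transcription of the formulas from the .const section, under the assumption that RSP is 16n+8 on entry to main and therefore 16-byte aligned right after "push rbp", so "sub rsp, RBP_RA" keeps it aligned only if RBP_RA is a multiple of 16. The function names are illustrative.

```python
def stk_pad(num_pushreg):
    # ((NUM_PUSHREG and 1) xor 1) * 8
    return ((num_pushreg & 1) ^ 1) * 8


def rbp_ra(num_pushreg, stk_local):
    # NUM_PUSHREG*8 + STK_LOCAL + STK_PAD
    return num_pushreg * 8 + stk_local + stk_pad(num_pushreg)


def aligned_after_prolog(num_pushreg, stk_local):
    """True if `sub rsp, RBP_RA` after `push rbp` leaves RSP on a 16-byte boundary."""
    return rbp_ra(num_pushreg, stk_local) % 16 == 0


for local in (8, 16):
    print(local, rbp_ra(6, local), aligned_after_prolog(6, local))
```

With NUM_PUSHREG = 6, STK_LOCAL = 8 gives 64 (aligned), while STK_LOCAL = 16 gives 72 (misaligned by 8), because the padding term depends only on the parity of NUM_PUSHREG and ignores the size of the locals.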
deleted
Hutch, Nidud, aw27:
Recognizing the truth of Hutch's response, I will post the callee later this afternoon (EST). I have modified the caller (which I had posted) so that the user can interact with it and specify number of pass parameters and also the amount of local stack required. This has proven a bit more challenging -- not surprising, given my novice status -- but it doesn't change anything much about the ABI calling so that the existing post represents pretty close to the final product I have in mind.
In the callee that I post you will see that I have converted the code in the .const section of the caller into a macro.
Regards to all and thanks for your wisdom.
Mark
Hello everyone,
As I promised earlier, here is a copy of "arith2callee.asm" which contains the macro and associated mult6ints PROC.
Quote
include \masm32\include64\masm64rt.inc
calcstack MACRO numregs:REQ, loc:REQ
LOCAL STK_PAD, STK_TOTAL
STK_PAD equ ((numregs and 1) xor 1) * 8
STK_TOTAL equ STK_LOCAL + STK_PAD
RDP_RA equ numregs*8 + loc + STK_PAD
ENDM
.data
.code
mult6ints PROC
calcstack 6, 8
imul rcx, rdx
imul rcx, r8
imul rcx, r9
imul rcx, [rbp + RDP_RA + 16]
imul rcx, [rbp + RDP_RA + 24]
mov rax, rcx
mov r12, rax
;invoke printf, ADDR frmt1
mov rax, r12
ret
mult6ints ENDP
END
As originally written I had been using printf as a debugging assistant. Printf of course messes up the volatile rax register so I was resorting to the trick of "pushing" it into the r12 non-volatile. I left the old code in and that is why you see the strange locution just before the ret.
I'm still working on the caller to make it interactive. I haven't figured out how to get the stack info from the keyboard into the .const section (or a macro which will replace it) in such fashion that the called program can get the necessary stack info without re-entering it from the keyboard -- somehow passing it from the calling program to the callee. Interesting problem that no doubt some expert has solved long ago.
Regards,
Mark
Hello everyone,
OK, my last words on this subject unless someone chimes in. After playing around I decided that the simplest, brute-force approach to collecting the number of parameters and the necessary space for locals was to get keyboard input from the user. The following is the caller code. It's ugly, needs error checking, and no doubt one of you would do a better job of it.
Quote
include \masm32\include64\masm64rt.inc
OPTION CASEMAP:NONE
atoi PROTO :QWORD
printf PROTO :QWORD, :VARARG
mult6ints PROTO :QWORD, :QWORD, :QWORD, :QWORD, :QWORD, :QWORD
NOSTACKFRAME
.data
frmt1 BYTE "Done with stack test", 13,10,0
frmt2 BYTE "STK_PAD is %d", 13,10,0
frmt3 BYTE "STK_TOTAL is %d",13,10,0
frmt4 BYTE "RBP_RA is %d", 13,10,0
frmt5 BYTE "The result of the multiplication is %d",13,10,0
frmt6 BYTE "Enter number of parms",13,10,0
frmt7 BYTE "Enter number of local bytes needed",13,10,0
reserved QWORD NULL
sze QWORD 8
.data?
loc QWORD ?
numregs QWORD ?
buffer BYTE 8 dup(?)
STK_PAD QWORD ?
STK_TOTAL QWORD ?
RBP_RA QWORD ?
hFile HANDLE ?
ccRead DWORD ?
.code
main PROC
push rbp ;;create stack frame
mov rbp, rsp
;;get number of params from the keyboard
sub rsp, 30h
invoke printf, ADDR frmt6
invoke GetStdHandle, STD_INPUT_HANDLE
mov rcx, rax
invoke ReadConsole, rcx, ADDR buffer, 8, ADDR ccRead, ADDR reserved
lea rcx, buffer
call atoi
mov numregs, rax
;;get number of bytes of locals
invoke printf, ADDR frmt7
invoke GetStdHandle, STD_INPUT_HANDLE
mov rcx, rax
invoke ReadConsole, rcx, ADDR buffer, 8, ADDR ccRead, ADDR reserved
lea rcx, buffer
call atoi
mov loc, rax
add rsp, 30h
;;compute bytes of padding
mov r12, numregs
and r12, 1
xor r12, 1
imul r12, 8
mov rax, r12
mov STK_PAD, rax
;;compute total stack required
add rax, loc
mov STK_TOTAL, rax
mov r12, numregs
imul r12, 8
mov rax, loc
add rax, r12
add rax, STK_PAD
;;save RBP_RA
mov RBP_RA, rax
;;print values of STK_PAD STK_TOTAL and RBP_RA
sub rsp, RBP_RA
mov rdx, STK_PAD
invoke printf, ADDR frmt2, rdx ;;print STK_PAD bytes
mov rdx, STK_TOTAL
invoke printf, ADDR frmt3, rdx ;;print STK_TOTAL bytes
mov rdx, RBP_RA
invoke printf, ADDR frmt4, rdx ;;print rbp to return address bytes
;;set up call to mult6ints
mov rcx, 1
mov rdx, 2
mov r8, 3
mov r9, 4
mov qword ptr[rsp], 5
mov qword ptr[rsp+8], 6
call mult6ints ;;call mult6ints
mov rdx, rax
invoke printf, ADDR frmt5, rdx ;;print result
invoke printf, ADDR frmt1 ;;say goodbye
;;undo spill space
add rsp, RBP_RA
mov rsp, rbp ;;epilogue
pop rbp
waitkey
ret
main ENDP
END
Here is the callee, mult6ints:
Quote
include \masm32\include64\masm64rt.inc
NOSTACKFRAME
.data
.code
mult6ints PROC
push rbp
mov rbp, rsp
imul rcx, rdx
imul rcx, r8
imul rcx, r9
imul rcx, [rbp+16]
imul rcx, [rbp+24]
mov rax, rcx
mov rsp, rbp
pop rbp
ret
mult6ints ENDP
END
As I say, it works, it's ugly. I'm sure aw27 is correct that there are other versions of this sort of thing out there. But, it helped teach me how the ABI works.
Regards,
Mark
Mark,
Put it in a ZIP file so it works, otherwise anyone who wants to look at it has to construct the rest to see what it does.
The examples of Kusswurm and the way it shows how to produce a prologue refers to "Proc Frame" while you, and in general everybody here, including myself (most times), don't care about exception handling.
What I mean is that you should not transplant a kidney to make the job of a different organ.
Also, Kusswurm talks about "NUM_PUSHREG = number of prolog non-volatile register pushes" and you are using that concept for the number of parameters of the function.
I have been following this thread every day, but I must say that I don't understand.
What is the goal?
Is it to create a stack frame like JwAsm does?
With PoAsm there is a 'PARAMAREA' (PARMAREA=5*QWORD).
Could you explain it for me please? :dazzled:
aw27 and Jokaste and Hutch,
aw27. Yes, absolutely Kusswarm was after SEH type code and that's why he does what he does. And, as you say, apparently no one on the forum cares about SEH. And, yes, you are also right that Kusswarm's NUM_PUSHREG refers to pushed registers and not parameters. I debated whether to change the name to PUSH_PARMS, but let it stand as is. Why bother with what I did? This goes to Jokaste's query.
Jokaste and aw27: The goal originally was to understand what PROC actually does...and I gradually drifted off this and towards the code you see before you. As I discovered, if one includes \masm32\include64\masm64rt.inc then one causes the STACKFRAME macro to be run and this generates the ENTER/LEAVE pair. If NOSTACKFRAME is run, then one gets a "bare bones" PROC and one must create a frame if it's needed. But I drifted past this "discovery" and began to wonder if it was possible to interactively define a suitable stack frame for the 64-bit ABI. After considerable mucking about the above code resulted, mostly thanks to reading Kusswarm's code late in his book. Primarily, this whole business was me learning how the blasted ABI actually works if there are more than 4 parameters.
Hutch: I will post a .zip file. Should have thought about this. Apologies to all.
Mark
Quote from: markallyn on December 09, 2017, 10:50:26 AM
Kusswarm was after SEH type code and that's why he does what he does. And, as you say, apparently no one on the forum cares about SEH.
Yes, we love to see code crash :greenclp:
Seriously: José has done a great job exploring SEH in 64-bit land, but many assembler programmers believe in code that either works or crashes - no half-baked compromises.
Btw the guy's name is Kusswurm. The origin is German, "Kusswarm" would mean "as warm as a kiss", Kusswurm is something like a kissing worm - I sincerely hope he got used to it. He offers a freely downloadable guide to using Masm in Visual Studio (https://raw.githubusercontent.com/Apress/modern-x86-assembly-language-programming/master/9781484200650_AppA.pdf). By reading only 28 pages, you will be able to build a Hello World project in Micros**t's flagship IDE. Hurry up and get it, as it will not be compatible with the coming version of Visual Crap 8)
:P
While I confess to being a dinosaur in coding style where you get it right or it explodes in your face and makes you look like a jerk, SEH does have its place, even in properly written error free code with hardware based tasks where control of the required capacity cannot be done in software. Outside of specific hardware related issues, a "no error handler" approach makes your debugging a lot simpler and your code a lot more reliable. Get it exactly right and it works correctly without hand holding, make a mess of it and it very clearly tells you that it did not work.
Quote
I debated whether to change the name to PUSH_PARMS, but let it stand as is.
The called function doesn't have to care about the influence of the function parameters on the alignment. This is done by the caller.
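In other words, a callee only needs fixed offsets. A Python sketch of where a callee finds its stack arguments after "push rbp / mov rbp, rsp", assuming the caller followed the standard layout with 32 bytes of shadow space (callee_arg_offset is a hypothetical helper). The caller posted earlier in this thread stores arguments 5 and 6 at [rsp] and [rsp+8] with no shadow space below them, which corresponds to shadow_bytes=0 here and explains why its callee reads [rbp+16] and [rbp+24].

```python
def callee_arg_offset(n, shadow_bytes=32):
    """Offset from RBP of stack argument n (n >= 5) after push rbp / mov rbp, rsp."""
    if n < 5:
        raise ValueError("arguments 1-4 arrive in rcx, rdx, r8, r9")
    saved_rbp = 8        # pushed by the callee's own prologue
    return_address = 8   # pushed by the CALL instruction
    return saved_rbp + return_address + shadow_bytes + 8 * (n - 5)


print(callee_arg_offset(5), callee_arg_offset(6))
print(callee_arg_offset(5, shadow_bytes=0), callee_arg_offset(6, shadow_bytes=0))
```

Nothing in this calculation depends on how many parameters the caller pushed or how it padded its own frame, which is the point being made above.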
Quote
but many assembler programmers believe in code that either works or crashes
That is not really the reason. The reason is that if you don't use SEH you will have problems integrating the ASM with a high-level language like C or C++ without disabling SEH for the whole application. And 99% of people that use ASM in the real World use it in this fashion.
Quote from: aw27 on December 09, 2017, 02:50:40 PM
The reason is that if you don't use SEH you will have problems integrating the ASM with a high-level language like C or C++ without disabling SEH for the whole application.
Will the C application that loads an asm dll or links to an asm object file notice that there is no SEH?
Quote from: jj2007 on December 09, 2017, 07:48:40 PM
Will the C application that loads an asm dll or links to an asm object file notice that there is no SEH?
When building in release mode, Visual Studio usually notices. This may not apply to other tools. I don't think it applies to asm dlls either.
Hello everyone,
In response to Hutch's suggestion yesterday I have attached a .zip file containing the caller and callee asm sources.
Mark
Let me offer sincerest apologies to Daniel Kusswurm for mangling his name. I suppose the mistake crept in because I thought a warm kiss much more appealing than kissing worms.
Mark
aw27:
With respect to your statement:
Quote
The called function doesn't have to care about the influence of the function parameters on the alignment. This is done by the caller.
Exactly so. It took me doing this bizarre program tediously over several days of dead-ends to discover this very basic fact.
Mark
Quote from: markallyn on December 10, 2017, 06:06:17 AM
It took me doing this bizarre program tediously over several days of dead-ends to discover this very basic fact.
Not bad, some people take years and others haven't got it yet.