PROC and prolog/epilog

Started by markallyn, November 28, 2017, 08:04:12 AM

hutch--

Mark,

Give this section of the help file a good read, as this is where the action is in understanding the Microsoft application binary interface (ABI).

The Win64 Calling Convention: How Does It Work?

The first four arguments are passed in RCX, RDX, R8 and R9, and the first four stack addresses that correspond to them, [rsp], [rsp+8], [rsp+16] and [rsp+24], are left empty. Argument 5 and upwards are written to the RSP relative address [rsp+32] and upwards, with the displacement increasing by 8 bytes for each argument.

A typical procedure call with 6 arguments will look like this.

mov rcx, arg1
mov rdx, arg2
mov r8, arg3
mov r9, arg4
mov QWORD PTR [rsp+32], arg5
mov QWORD PTR [rsp+40], arg6
call FunctionName
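To make the layout concrete, here is a small C model of where each argument of that call goes at the moment of the CALL. The helper names are hypothetical, and the sketch only covers integer and pointer arguments; floating-point arguments travel in XMM0 to XMM3 instead and are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Registers carrying integer/pointer arguments 1-4. */
static const char *const win64_arg_regs[4] = { "rcx", "rdx", "r8", "r9" };

/* Register for argument n (1-based), or NULL if argument n goes on the stack. */
const char *arg_register(int n)
{
    assert(n >= 1);
    return n <= 4 ? win64_arg_regs[n - 1] : NULL;
}

/* RSP-relative offset of argument n's 8-byte slot at the CALL.
   For n <= 4 this is the empty "home" (shadow) slot mirroring the register;
   for n >= 5 the caller writes the value itself, e.g. arg5 -> [rsp+32]. */
int arg_stack_offset(int n)
{
    assert(n >= 1);
    return 8 * (n - 1);
}
```

For the six-argument call above, arg_register(1) to arg_register(4) give rcx, rdx, r8 and r9, while arg_stack_offset(5) and arg_stack_offset(6) give 32 and 40, matching the two QWORD PTR writes.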

Now the interesting part is that you can pass a BYTE, WORD, DWORD or QWORD in the same 8-byte stack slot, and the ABI is designed this way: every argument takes a full 8 bytes, so the stack stays aligned even when different data sizes are passed. With a MASM procedure that has a stack frame and an argument list, the arguments written in the procedure call arrive at known locations on the stack and are accessible by their names from the procedure's argument list.
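The "known locations" can be sketched from the callee's side. This assumes a conventional `push rbp / mov rbp, rsp` prologue (not necessarily the exact prologue MASM or the SDK macros generate) and uses a hypothetical helper name. Each argument's offset depends only on its position, never on the sizes of the arguments before it; a smaller argument simply occupies the low bytes of its slot. For arguments 1 to 4 the slot holds the value only if the register has been spilled there.

```c
#include <assert.h>
#include <stddef.h>

/* Callee frame after "push rbp / mov rbp, rsp":
     [rbp]      caller's RBP saved by the prologue
     [rbp+8]    return address pushed by CALL
     [rbp+16]   slot of argument 1 (home of RCX), then 8 bytes per argument
   sizes[i] is the size in bytes of argument i+1 (1, 2, 4 or 8). */
void named_arg_offsets(const size_t *sizes, size_t count, size_t *offsets)
{
    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] == 1 || sizes[i] == 2 || sizes[i] == 4 || sizes[i] == 8);
        offsets[i] = 16 + 8 * i;   /* same offset whatever the size */
    }
}
```

With argument sizes BYTE, WORD, DWORD, QWORD, BYTE, WORD, argument 5 still lands at [rbp+48] and argument 6 at [rbp+56].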



markallyn

Good afternoon/evening Hutch,

I am really glad you sent me exactly this passage from that document, because I have been wanting to ask you about it for several weeks, ever since I realized I had no idea what was going on with parameter passing in x64.  My question is:  where are you creating the "shadow space/spill space"?  I coded this passage myself and found that it "works", but in most postings by other authors there is a "sub rsp/add rsp" pair with the call sandwiched between them.

The document from which you extracted the passage is, by the way, very well done.  Clear and simple.

Regards,
Mark

markallyn

Hutch,

I'm a firm believer in the truth and wisdom of this adage.  My wife, however, who is after all the ultimate arbiter of my hours, firmly believes that getting lost in assembly language is not the most efficient means of implementing it.

Regards,
Mark

hutch--

Mark,

You use shadow space where you need it. For a leaf procedure, or a procedure where you know exactly how all of the registers are being used and that takes 4 arguments or fewer, you can pass the arguments directly in the first 4 registers and bypass the need for shadow space. Where you need a stack frame for more than 4 arguments and LOCAL variables, you copy the first 4 registers into the start of the stack area, always at 8 byte spacing (OFFSETs 0, 8, 16, 24), then copy any other arguments to the following 8 byte OFFSETs, but with a quirk: last arg next, second last arg after that, etc. This was tested against a multitude of API functions, C runtime functions and conventional assembler procedures, and it works correctly because it is constructed according to the Microsoft ABI.
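When the callee is an API or C runtime function that may use its home slots, the caller has to reserve the outgoing area itself, which is what the "sub rsp/add rsp" pairs mentioned above do. The sketch below computes a size for that `sub rsp`, assuming two ABI rules: the area always covers at least the four home slots (32 bytes), and RSP must be 16-byte aligned at the CALL. The function names are hypothetical.

```c
#include <assert.h>

/* Bytes reserved for the outgoing argument area when calling a function
   that follows the Microsoft x64 ABI: one 8-byte slot per argument, but
   never fewer than the four home (shadow) slots. */
int outgoing_area(int nargs)
{
    assert(nargs >= 0);
    return 8 * (nargs > 4 ? nargs : 4);
}

/* Size for a "sub rsp, N" that reserves that area and leaves RSP 16-byte
   aligned at the CALL. rsp_mod16 is RSP mod 16 just before the sub:
   8 on entry to a procedure (CALL pushed the return address), 0 after
   one further PUSH such as "push rbp". */
int sub_rsp_size(int nargs, int rsp_mod16)
{
    assert(rsp_mod16 == 0 || rsp_mod16 == 8);
    int n = outgoing_area(nargs);
    if (n % 16 != rsp_mod16)   /* RSP - n must be a multiple of 16 */
        n += 8;                /* add 8 bytes of padding */
    return n;
}
```

For example, the familiar `sub rsp, 28h` at the top of a procedure that makes calls with up to four arguments is sub_rsp_size(4, 8), i.e. 40 bytes; after `push rbp` the same calls need only 32.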

Stay away from stack manipulation, as you risk messing up the stack alignment. You can still get away with using PUSH/POP, but you need to exercise considerable care, as you can kill the app stone dead by getting it wrong. For slightly more typing, allocate a LOCAL for each register, MOV the register into it at the start and back out at the end of the proc.

MyProc proc

    LOCAL myreg :QWORD

    mov myreg, r15

  ; write your code using the reg

    mov r15, myreg

    ret

MyProc endp
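The alignment risk with PUSH/POP comes from each PUSH moving RSP by 8 bytes: every extra PUSH flips RSP between 0 and 8 mod 16, so the following `sub rsp` has to grow or shrink by 8 to compensate. A LOCAL slot, by contrast, is reserved together with the rest of the frame. This hypothetical C sketch tracks RSP mod 16 through a prologue.

```c
#include <assert.h>
#include <stdbool.h>

/* RSP mod 16 after `pushes` PUSH instructions followed by
   "sub rsp, sub_bytes". RSP is 8 mod 16 on entry because CALL pushed
   an 8-byte return address. */
int rsp_mod16_after(int pushes, int sub_bytes)
{
    assert(pushes >= 0 && sub_bytes >= 0 && sub_bytes % 8 == 0);
    int r = (8 - 8 * pushes - sub_bytes) % 16;   /* may be negative in C */
    return r < 0 ? r + 16 : r;
}

/* A CALL issued at this point needs RSP to be a multiple of 16. */
bool ready_to_call(int pushes, int sub_bytes)
{
    return rsp_mod16_after(pushes, sub_bytes) == 0;
}
```

So `push rbp` followed by `sub rsp, 32` is ready to call, but saving R15 with one more PUSH breaks alignment until the sub becomes 40.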


As for your better half, try selling her the "Use it or lose it" (do you want to care for a vegetable?) view, and if you survive, you can keep up your code development.  :P

markallyn

Hutch,
I've been playing around with a technique for correctly computing the required stack space and for placing additional parameters on the stack before calling a procedure that uses them.  Here's the result so far.  In this case I'm passing six parameters in a call to "mult6ints".  Please critique what I'm doing, if you have the time.

Quote
include \masm32\include64\masm64rt.inc

OPTION CASEMAP:NONE

printf      PROTO :QWORD, :VARARG
mult6ints   PROTO :QWORD, :QWORD, :QWORD, :QWORD, :QWORD, :QWORD

NOSTACKFRAME

.const               
NUM_PUSHREG equ  6         ;;number of variables in call
STK_LOCAL   equ  8         ;;some space for a local qword
STK_PAD       equ ((NUM_PUSHREG and 1) xor 1) * 8
STK_TOTAL   equ STK_LOCAL + STK_PAD
RBP_RA       equ NUM_PUSHREG*8 + STK_LOCAL + STK_PAD


.data
frmt1   BYTE   "Done with stack test", 13,10,0
frmt2   BYTE   "STK_PAD is %d", 13,10,0
frmt3   BYTE   "STK_TOTAL is %d",13,10,0
frmt4   BYTE   "RBP_RA is %d", 13,10,0
frmt5   BYTE   "The result of the multiplication is %d",13,10,0

.code
main      PROC
push   rbp            ;;create stack frame
mov   rbp, rsp
sub   rsp, RBP_RA
mov   rdx, STK_PAD
invoke  printf, ADDR frmt2, rdx      ;;print STK_PAD bytes
mov   rdx, STK_TOTAL
invoke   printf, ADDR frmt3, rdx      ;;print STK_TOTAL bytes
mov   rdx, RBP_RA
invoke  printf, ADDR frmt4, rdx      ;;print rbp to return address bytes
sub   rsp, RBP_RA         ;;create spill space
mov   rcx, 1
mov   rdx, 2
mov   r8, 3
mov   r9, 4
mov   qword ptr[rsp+RBP_RA], 5
mov   qword ptr[rsp+RBP_RA+8], 6
call   mult6ints         ;;call mult6ints
mov   rdx, rax
invoke   printf, ADDR frmt5, rdx      ;;print result
invoke  printf, ADDR frmt1      ;;say goodbye
waitkey
      ;;undo spill space
add   rsp, RBP_RA   
mov   rsp, rbp         ;;epilogue
pop   rbp
ret
main   ENDP
END


Mult6ints is not included, but the gist of it is that mult6ints uses a macro similar to the computations in the .const section to compute RBP_RA.  Then it adds 16 bytes to this figure in order to locate the two parameters passed on the stack.  The whole thing looks kind of ugly to me, and I'm sure a more skilled programmer could work out a more elegant solution.

As I say, if you have some time to analyze and criticize I would be grateful.

Mark

markallyn

Hutch and everyone,

I should give credit to Daniel Kusswurm ("Modern X86 Assembly Language Programming") for this approach to calculating the stack size and the correct pointer locations.  I've modified his work, but the gist of it belongs to him.

Mark

hutch--

Mark,

You are not going to get many takers if you don't post a complete working example that can be built.

aw27

I don't think the technique of Daniel Kusswurm (as reported by you, of course) is good. If, for instance, you set "STK_LOCAL   equ  16", the stack will not be aligned after the prolog  :( . BTW, EQU is a directive, so there is no reason to place it in the .const segment.
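The objection can be checked with plain arithmetic. The sketch transcribes the STK_PAD and RBP_RA equates from the first caller into C (the function names are lowercase copies, chosen for illustration) and assumes, as in that caller, that `push rbp` is the only push in main's prolog.

```c
#include <assert.h>

/* STK_PAD equ ((NUM_PUSHREG and 1) xor 1) * 8 */
int stk_pad(int num_pushreg)
{
    assert(num_pushreg >= 0);
    return ((num_pushreg & 1) ^ 1) * 8;
}

/* RBP_RA equ NUM_PUSHREG*8 + STK_LOCAL + STK_PAD */
int rbp_ra(int num_pushreg, int stk_local)
{
    return num_pushreg * 8 + stk_local + stk_pad(num_pushreg);
}

/* Prolog: "push rbp / mov rbp, rsp / sub rsp, RBP_RA". Entry RSP is
   8 mod 16 and the single push makes it 0 mod 16, so the sub keeps RSP
   16-byte aligned only if RBP_RA is a multiple of 16. */
int aligned_after_prolog(int num_pushreg, int stk_local)
{
    return rbp_ra(num_pushreg, stk_local) % 16 == 0;
}
```

With the posted values (NUM_PUSHREG 6, STK_LOCAL 8) RBP_RA is 64 and alignment holds; with STK_LOCAL set to 16 it becomes 72, which leaves RSP 8 bytes off.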

There are many errors in your program, but anyway, you are progressing.  :t

nidud

#38
deleted

markallyn

Hutch, Nidud, aw27:

Recognizing the truth of Hutch's response, I will post the callee later this afternoon (EST).  I have modified the caller (which I had posted) so that the user can interact with it and specify the number of parameters passed and the amount of local stack required.  This has proven a bit more challenging (not surprising, given my novice status), but it doesn't change much about the ABI calling, so the existing post is pretty close to the final product I have in mind. 

In the callee that I post you will see that I have converted the code in the .const section of the caller into a macro.

Regards to all and thanks for your wisdom.

Mark


markallyn

Hello everyone,

As I promised earlier, here is a copy of "arith2callee.asm" which contains the macro and associated mult6ints PROC. 

Quote
include \masm32\include64\masm64rt.inc

calcstack MACRO numregs:REQ, loc:REQ                
LOCAL   STK_PAD, STK_TOTAL
         
STK_PAD       equ ((numregs and 1) xor 1) * 8
STK_TOTAL   equ loc + STK_PAD
RDP_RA       equ numregs*8 + loc + STK_PAD
   ENDM
.data

.code
mult6ints   PROC
calcstack  6, 8
imul   rcx, rdx
imul   rcx, r8
imul   rcx, r9
imul   rcx, [rbp + RDP_RA + 16]
imul   rcx, [rbp + RDP_RA + 24]
mov   rax, rcx
mov   r12, rax
;invoke   printf, ADDR frmt1
mov   rax, r12
ret
mult6ints   ENDP
END


As originally written, I had been using printf as a debugging aid.  Printf of course clobbers the volatile rax register, so I resorted to the trick of "pushing" it into the non-volatile r12.  I left the old code in, which is why you see the strange locution just before the ret.
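Because R12 is non-volatile, keeping the result in it across printf is safe for the caller of printf, but the ABI then requires mult6ints itself to restore R12 before returning, for example with the LOCAL save/restore pattern hutch showed earlier. A minimal lookup of the general-purpose registers the Microsoft x64 ABI treats as non-volatile, with a hypothetical helper name and the XMM registers left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Callee-saved (non-volatile) general-purpose registers.
   RAX, RCX, RDX and R8-R11 are volatile. */
static const char *const nonvolatile_gprs[] = {
    "rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"
};

bool is_nonvolatile(const char *reg)
{
    assert(reg != NULL);
    for (size_t i = 0; i < sizeof nonvolatile_gprs / sizeof nonvolatile_gprs[0]; i++)
        if (strcmp(reg, nonvolatile_gprs[i]) == 0)
            return true;
    return false;
}
```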

I'm still working on the caller to make it interactive.  I haven't figured out how to get the stack info from the keyboard into the .const section (or a macro that will replace it) in such a way that the callee can get the necessary stack info without re-entering it from the keyboard, somehow passing it from the calling program to the callee.  An interesting problem that no doubt some expert solved long ago.

Regards,
Mark

markallyn

Hello everyone,

OK, my last words on this subject unless someone chimes in.  After playing around, I decided that the simplest, brute-force way to collect the number of parameters and the space needed for locals was to get keyboard input from the user.  The following is the caller code.  It's ugly, needs error checking, and no doubt one of you would do a better job of it.
Quote
include \masm32\include64\masm64rt.inc

OPTION CASEMAP:NONE


atoi      PROTO :QWORD
printf      PROTO :QWORD, :VARARG
mult6ints   PROTO :QWORD, :QWORD, :QWORD, :QWORD, :QWORD, :QWORD

NOSTACKFRAME

.data
frmt1   BYTE   "Done with stack test", 13,10,0
frmt2   BYTE   "STK_PAD is %d", 13,10,0
frmt3   BYTE   "STK_TOTAL is %d",13,10,0
frmt4   BYTE   "RBP_RA is %d", 13,10,0
frmt5   BYTE   "The result of the multiplication is %d",13,10,0
frmt6   BYTE   "Enter number of parms",13,10,0
frmt7   BYTE   "Enter number of local bytes needed",13,10,0
reserved QWORD   NULL
sze   QWORD   8

.data?
loc   QWORD   ?
numregs   QWORD   ?
buffer   BYTE   8 dup(?)
STK_PAD QWORD   ?
STK_TOTAL QWORD   ?
RBP_RA   QWORD   ?
hFile   HANDLE   ?
ccRead   DWORD   ?

.code
main      PROC
push      rbp   ;;create stack frame
mov      rbp, rsp

         ;;get number of params from the keyboard
sub      rsp, 30h
invoke      printf, ADDR frmt6
invoke      GetStdHandle, STD_INPUT_HANDLE
mov      rcx, rax
invoke      ReadConsole, rcx, ADDR buffer, 8, ADDR ccRead, ADDR reserved
lea      rcx, buffer
call      atoi
mov      numregs, rax

         ;;get number of bytes of locals
invoke      printf, ADDR frmt7
invoke      GetStdHandle, STD_INPUT_HANDLE
mov      rcx, rax
invoke      ReadConsole, rcx, ADDR buffer, 8, ADDR ccRead, ADDR reserved
lea      rcx, buffer
call      atoi
mov      loc, rax
add      rsp, 30h

         ;;compute bytes of padding
mov      r12, numregs
and      r12, 1
xor      r12, 1
imul      r12, 8
mov      rax, r12
mov      STK_PAD, rax

         ;;compute total stack required
add      rax, loc
mov      STK_TOTAL, rax
mov      r12, numregs
imul      r12, 8
mov      rax, loc
add      rax, r12
add      rax, STK_PAD

         ;;save RBP_RA       
mov      RBP_RA, rax

      ;;print values of STK_PAD STK_TOTAL and RBP_RA
sub      rsp, RBP_RA
mov      rdx, STK_PAD
invoke     printf, ADDR frmt2, rdx      ;;print STK_PAD bytes
mov      rdx, STK_TOTAL
invoke      printf, ADDR frmt3, rdx      ;;print STK_TOTAL bytes
mov      rdx, RBP_RA
invoke     printf, ADDR frmt4, rdx      ;;print rbp to return address bytes

         ;;set up call to mult6ints      
mov      rcx, 1
mov      rdx, 2
mov      r8, 3
mov      r9, 4
mov      qword ptr[rsp], 5
mov      qword ptr[rsp+8], 6
call      mult6ints         ;;call mult6ints
mov      rdx, rax
invoke      printf, ADDR frmt5, rdx      ;;print result
invoke     printf, ADDR frmt1      ;;say goodbye
         
waitkey                  ;;call it while the stack frame is still intact

         ;;undo spill space
add      rsp, RBP_RA   
mov      rsp, rbp         ;;epilogue
pop      rbp
      ret
main   ENDP
END

Here is the callee, mult6ints:
Quote
include \masm32\include64\masm64rt.inc

NOSTACKFRAME

.data

.code
mult6ints   PROC
push   rbp
mov   rbp, rsp
imul   rcx, rdx
imul   rcx, r8
imul   rcx, r9
imul   rcx, [rbp+16]
imul   rcx, [rbp+24]
mov   rax, rcx
mov   rsp, rbp
pop   rbp
ret
mult6ints   ENDP
END

As I say, it works, it's ugly.  I'm sure aw27 is correct that there are other versions of this sort of thing out there.  But, it helped teach me how the ABI works.

Regards,
Mark

hutch--

Mark,

Put it in a ZIP file so it works, otherwise anyone who wants to look at it has to construct the rest to see what it does.

aw27

Kusswurm's examples, and the way he shows how to produce a prologue, refer to "PROC FRAME", while you, and in general everybody here, including myself (most times), don't care about exception handling.
What I mean is that you should not transplant a kidney to do the job of a different organ.
Also, Kusswurm talks about "NUM_PUSHREG = number of prolog non-volatile register pushes", and you are using that concept for the number of parameters of the function.
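Read with the meaning quoted here, the same pad formula does keep the stack aligned, provided NUM_PUSHREG counts the registers actually PUSHed in the prolog and the local area is a multiple of 16 bytes. The following is a sketch of that arithmetic with hypothetical function names, not code taken from Kusswurm's book.

```c
#include <assert.h>
#include <stdbool.h>

/* Same pad formula, but num_pushreg now means the number of registers
   actually PUSHed in the prolog (RBP included). */
int pad_for_pushes(int num_pushreg)
{
    assert(num_pushreg >= 0);
    return ((num_pushreg & 1) ^ 1) * 8;
}

/* Prolog modelled: num_pushreg PUSHes, then "sub rsp, stk_local + pad".
   RSP is 8 mod 16 on entry, so the final RSP is aligned when the total
   number of bytes moved is 8 mod 16. */
bool prolog_aligned(int num_pushreg, int stk_local)
{
    assert(stk_local >= 0 && stk_local % 8 == 0);
    int moved = 8 * num_pushreg + stk_local + pad_for_pushes(num_pushreg);
    return moved % 16 == 8;
}
```

Two pushes with 16 bytes of locals, or three pushes with 32, come out aligned; 8 bytes of locals misaligns the stack whatever the push count, which is the same failure the STK_LOCAL objection above points to from the other direction.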

Jokaste

I have been following this thread every day, but I must say that I don't understand it.
What is the goal?
Is it to create a stack frame like JWasm does?
With PoAsm there is a 'PARMAREA' (PARMAREA=5*QWORD).


Could you explain it for me please? :dazzled:
Kenavo
---------------------------
Grincheux / Jokaste