First question by new masm32.com user; also a first-time user of UASM.
I assembled a one-instruction 32-bit fastcall routine that simply loads its first argument into eax, and I get what I'd expect: a "mov eax,ecx" instruction. Note that the source line was "mov eax,arg1"; the listing file shows it as mov eax,ecx (as if arg1 were a text equate for "ecx").
UASM v2.49, Jun 21 2019, Masm-compatible assembler.
.586
.model flat, fastcall
.code
.listall
00000000 xyz proc arg1:ptr
00000000 8BC1 mov eax, ecx
00000002 ret
00000002 C3 * retn
00000003 xyz endp
end
00000003 * _TEXT ends
But a similar routine in 64-bit generates an rbp stack frame and then attempts to load the first argument from its homing area (which arg1 has not yet been stored to). Here the source line was "mov rax,arg1", and arg1 was NOT changed to rcx (as was done in the 32-bit case).
UASM v2.49, Jun 21 2019, Masm-compatible assembler.
.x64
.model flat, fastcall
.code
.listall
00000000 xyz proc arg1:ptr
00000000 55 * push rbp
00000001 488BEC * mov rbp, rsp
00000004 488B4510 mov rax, arg1
00000008 ret
00000008 C9 * leave
00000009 C3 * retn
0000000A xyz endp
end
0000000A * _TEXT ends
I would have hoped for just a "mov rax,rcx" instruction in the 64-bit case. I don't see how the 64-bit code can even work, as arg1 has never been stored to the stack. How can I get the 64-bit case to generate only a "mov rax,rcx" instruction, as was done in the 32-bit case?
Thanks!
Ml64 has a different calling convention than ml. There is no proc. Read the masm64 help file. It has all the information you need.
@asmguru
The default is to do it like MASM. This will do what you want:
; uasm64 -c /Flfile.lst -win64 -Zp8 test.asm
; link /ENTRY:xyz /SUBSYSTEM:console /MACHINE:X64 test.obj
option win64:7
option frame:noauto
.code
.listall
xyz proc arg1:ptr
mov rax, arg1
ret
xyz endp
end
AW's suggestion produced for me:
UASM v2.49, Jun 21 2019, Masm-compatible assembler.
option win64:7
option frame:noauto
.code
.listall
00000000 xyz proc arg1:ptr
00000000 48894C2408 * mov [rsp+8], rcx
00000005 55 * push rbp
00000006 488BEC * mov rbp, rsp
00000009 488B4510 mov rax, arg1
0000000D ret
0000000D C9 * leave
0000000E C3 * retn
0000000F xyz endp
end
0000000F * _TEXT ends
That is working code :thumbsup:
But alas not what I want. I hoped for a simple mov rax,rcx as per the Win64 ABI. I did not want setup/teardown of an rbp local frame; I did not want arg1 saved to its homing area above the return address (64-bit offers lots of registers! that's why the ABI uses them, keeping them out of memory for speed), nor do I need para stack alignment (I am dealing with a leaf function).
I had hoped for what I'd see from a C compiler, given such a leaf function. I've continued to read for days on this, and I've concluded that although 32-bit mode assembles fine (Freudian slip... it does what I want), in 64-bit mode the assembler requires arguments coming in registers to be homed before they can be accessed by name.
Sure, bypassing the smarts of the assembler via macros (epilogue/prologue, or ML64-style "roll-your-own" functions) is possible. But I wished to use a modern assembler, smarter than clueless ML64, that could generate code with low size and overhead while allowing me to use a standard fastcall proc and argument names. Again, basically exactly how the register-passed fastcall args are handled in 32-bit mode, where they never leave a register.
My functions are obviously more complex (more arguments, more logic...) than the above example I simplified to illustrate my confusion.
Thank you for the answers, they are helpful!
Have you tried simply using a label plus ret, instead of proc?
@asmguru,
It is not risk free for an assembler to decide that it can safely replace mov rax, arg1 with mov rax,rcx.
You need to use
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
to roll your own way of doing things.
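A minimal sketch of that approach for the xyz example, assuming UASM in -win64 mode (the command line is modeled on the one earlier in the thread; this has not been assembled here). With the automatic prologue switched off nothing is homed and rbp is not set up, so the body reads the register directly instead of referring to arg1 by name:

```asm
; uasm64 -c -win64 test.asm
option prologue:none            ; no automatic push rbp / mov rbp, rsp, no homing
option epilogue:none            ; ret stays a plain ret, no leave
.code
xyz proc arg1:ptr               ; arg1 is kept only for documentation / invoke
    mov rax, rcx                ; first argument arrives in rcx under the Win64 ABI
    ret
xyz endp
option prologue:prologuedef     ; restore the defaults for any procs that follow
option epilogue:epiloguedef
end
```

The trade-off is that the name arg1 no longer does anything useful inside the body: keeping track of which register holds which argument becomes the programmer's job.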
I don't think a compiler will do it either, except in very basic cases, and there the optimizer is likely to simply strip out the function and present the result directly to the caller. What we sometimes see in compilers is that they save rcx into another register instead of onto the stack.
Assembling with Poasm Version 9 :
.code
xyz PROC arg1:PTR
mov rax,arg1
ret
xyz ENDP
END
Disassembling the object module file :
public xyz
_text SEGMENT PARA 'CODE'
xyz PROC
mov rax, rcx
ret
xyz ENDP
_text ENDS
END
Poasm assumes that users never do things like this:
.code
xyz proc arg1:ptr
mov rcx, 10
mov rax, arg1
add rax, rcx
ret
xyz endp
end
Well, the user has to be careful as rcx is a member of the fastcall convention. That's the trick.
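To make that concrete, here is a sketch of two ways to stay safe when the argument lives only in rcx. It assumes the prologue and epilogue are switched off so that nothing is homed; the proc names safe1 and safe2 are made up for the example.

```asm
option prologue:none
option epilogue:none
.code
; Variant 1: consume the argument before rcx is reused.
safe1 proc arg1:ptr
    mov rax, rcx            ; arg1 is still intact in rcx here
    mov rcx, 10             ; from this point rcx may be overwritten
    add rax, rcx
    ret
safe1 endp

; Variant 2: leave rcx alone and use another volatile register as scratch.
safe2 proc arg1:ptr
    mov rdx, 10             ; rdx is free because there is only one argument
    mov rax, rcx
    add rax, rdx
    ret
safe2 endp
end
```

Both return arg1 + 10. The version quoted above would return 20 under a straight arg1-to-rcx substitution, because rcx is overwritten before it is read.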
AW,
My latest C project (Microsoft 64-bit C) is a couple dozen functions. None of the functions home the arguments or set up an rbp stack frame (rbp is too valuable to lose as a general-purpose register). Functions access the incoming arguments from registers (or load them via [rsp+xxx] for the 5th and beyond). An example prologue/epilogue:
; 552 : wchar_t *GetAt(ARGS *pA, wchar_t *pT) {
$LN27:
00000 40 53 push rbx
00002 55 push rbp
00003 56 push rsi
00004 57 push rdi
00005 48 81 ec 38 02
00 00 sub rsp, 568 ; 00000238H
; 553 : enum { fn_elem = 260 };
; 554 : wchar_t fn[fn_elem];
; 555 : FILE *fh = NULL;
0000c 33 ed xor ebp, ebp
... 150 lines removed ...
000b1 48 81 c4 38 02
00 00 add rsp, 568 ; 00000238H
000b8 5f pop rdi
000b9 5e pop rsi
000ba 5d pop rbp
000bb 5b pop rbx
000bc c3 ret 0
GetAt ENDP
Take a look at the output of any MSVC 64-bit compiler and you'll see the same thing: use of registers for incoming args and no rbp stack frame. Which makes sense to me. If the 64-bit ABI wanted arguments forced to the stack, they would have left the calling convention stdcall.
I know there are good reasons things work the way they do, and that a lot of code depends on it working that way. Also, a lot of smart people have worked very hard and I respect what's been accomplished, and it's up to me to figure out a way to use what's offered.
What I'd like to see someday as an enhancement to 64-bit mode: a switch/mode (perhaps a Win64 flag) that causes named args to be loaded from their incoming regs (or via [rsp] for the 5th and beyond), and that does not require use of rbp for a stack frame (perhaps option stackbase:rsp already accomplishes this). Sure, one would have to ensure the arguments are handled properly; nothing is risk-free in assembly. We have to ensure our pushes and pops match!
It would appear that you need to learn how the Win64 ABI works, 4 registers matched by 4 locations on the stack (shadow space) and stack address locations above that shadow space for more than 4 arguments. You certainly CAN write RBP stack frames if you need them which is mainly for LOCAL storage, otherwise you can create procedures with no stack frame for pure mnemonic code or stack adjusted procedures for high level procedure calls. Using PUSH / POP in the same manner as 32 bit STDCALL is a failure to understand that win64 is different.
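As a sketch of that layout from the callee's side, here is a frameless leaf that takes five arguments and returns the first plus the fifth. It is not from the thread's code: the proc name add15 is made up, and it assumes UASM in -win64 mode with the automatic prologue and epilogue switched off.

```asm
option prologue:none
option epilogue:none
.code
; Stack as seen at entry to the callee:
;   [rsp]      return address
;   [rsp+8]    shadow slot for rcx (a1)
;   [rsp+16]   shadow slot for rdx (a2)
;   [rsp+24]   shadow slot for r8  (a3)
;   [rsp+32]   shadow slot for r9  (a4)
;   [rsp+40]   a5, the first argument actually passed on the stack
add15 proc a1:qword, a2:qword, a3:qword, a4:qword, a5:qword
    mov rax, rcx            ; a1 arrives in rcx
    add rax, [rsp+40]       ; a5 = return address (8) + shadow space (32)
    ret
add15 endp
end
```

The shadow slots are reserved by the caller whether or not the callee writes to them, which is why a leaf like this can skip homing entirely.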
Modern releases of MSVC do not use RBP-based stack frames. RSP-based stack frames save 2 instructions and allow the use of RBP for other purposes. That is fine, although in practice it does not contribute much to performance. However, compilers use many other tricks, so it is not easy for an ASM programmer to beat a C/C++ compiler, except with SIMD instructions, where compilers are not particularly smart. However, learning ASM is not just about getting more speed, but that is another discussion.
Now, MASM does not provide any support for RSP based frame. UASM does provide support for RSP based stack frames. Personally, I prefer RBP because I find it easier to debug on RBP.
I am not very concerned when assemblers do not provide many features. People who like to work on auto-pilot or have their nice little hand guided all the time ((C) Hutch) should consider using only an HLL.
Yep,
Like AW explains,
This:
.code
xyz proc arg1:ptr
mov rax, rcx
ret
xyz endp
end
Can be this:
.code
xyz proc arg1:ptr
mov rax, rcx
mov rax, [rcx]
movd xmm0, [rcx]
movd xmm0, rcx
movdqu xmm0, [rcx]
movdqa xmm0, [rcx]
ret
xyz endp
end
I mean that if the assembler, in 64-bit mode, treated procedure arguments as general registers holding pointers and always handled them in one standard way, that would probably not be what you want in some cases, and you would lose the other options shown above.
For your purposes, this needs careful thought.
uasm -win64 -Zp8 -Sg -nologo -c -Sa -FlMain64.lst -FoMain64.obj Main64.asm
option win64:11 ;Get us on the stack optimization point
option frame:auto ;Leave for the optimizer get our result
.code
.listall
xyz proc arg1:ptr
mov rax, rcx
ret
xyz endp
end
option win64:11 ;Get us on the stack optimization point
option frame:auto ;Leave for the optimizer get our result
;win32
IF @Platform LT 1
RRET TEXTEQU <EAX>
RPARAM0 TEXTEQU <ECX>
RPARAM1 TEXTEQU <EDX>
RPARAM2 TEXTEQU <[ESP+4]> ;at proc entry [ESP] is the return address; ECX/EDX take no stack slots
RPARAM3 TEXTEQU <[ESP+8]>
ENDIF
;win64
IF @Platform EQ 1
RRET TEXTEQU <RAX>
RPARAM0 TEXTEQU <RCX>
RPARAM1 TEXTEQU <RDX>
RPARAM2 TEXTEQU <R8>
RPARAM3 TEXTEQU <R9>
ENDIF
.code
.listall
xyz proc arg1:ptr
;Best to forget about named arguments: they are treated as locals in 64-bit mode.
;To keep your code portable between 32-bit and 64-bit Windows, a simple parameter macro helps along the way.
mov RRET, RPARAM0
ret
xyz endp
end
Quote from: asmguru on September 22, 2019, 03:34:29 AM... set up an rbp stack frame (rbp is too valuable to lose as a general-purpose register).
On extremely rare occasions I use ebp as a register to get a speed advantage. That is in 32-bit land. Claiming that you don't have enough registers in 64-bit land is courageous. Show us one proc of yours that needs rbp for something other than a stack frame.
Re calling conventions etc: The simplest solution is to do it "by hand":
include \Masm32\MasmBasic\Res\JBasic.inc ; ## builds & runs in 32- or 64-bit mode with UAsm and ML ##
.code
Just_a_label:
mText equ <rcx>
mTitle equ <rdx>
push rax ; align 16 - important for Windows, no idea about Linux
jinvoke MessageBox, 0, mText, mTitle, MB_OK or MB_SETFOREGROUND
pop rdx
ret
Init
PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
mov rdx, Chr$("Hello")
mov rcx, Chr$("I am a message box")
call Just_a_label ; order of args: rcx rdx r8 r9 pushed5 pushed6 etc
Inkey "ok?"
EndOfCode ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
I do understand the Win64 ABI well. Yes there are lots of registers in 64-bit mode.
I should have said in the original question the concern was code space, but I didn't want to muddle the issue of not understanding what UASM was doing in the 64-bit prologue. I was a long time ML guy brand new to UASM when the question was written.
If one is compiling a small routine (in my case many of them, into a library), then homing the first four arguments and fetching them back from there, setting up an rbp frame, and para-aligning the stack can double the size of the routine.
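For the one-instruction xyz example, the sizes can be read off the end offsets in the listings earlier in the thread. The frameless encoding in the last two lines is worked out by hand rather than taken from a listing:

```asm
; Code size of xyz, from the listings above:
;   32-bit fastcall              3 bytes   (ends at 00000003)
;   64-bit, default prologue    10 bytes   (ends at 0000000A)
;   64-bit, option win64:7      15 bytes   (ends at 0000000F)
; The frameless 64-bit form would be 4 bytes:
    mov rax, rcx            ; 48 8B C1
    ret                     ; C3
```

For a routine this small the overhead is more than double; the ratio shrinks as the body grows.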
Yes... if it's a small routine, who cares about doubling its size.
Yes... none of this matters a smidgen from a performance standpoint.
I've shut off the prologue/epilogue.
Sorry for beating a dead horse. Thanks again for all the feedback.
BTW I just saw the UASM COMDAT support -- this is a wonderful feature and it works great.
Hi Jose,
> People who like to work on auto-pilot or have their nice little hand guided all the time ((C) Hutch) should consider using only an HLL.
I must have missed something here. With 64-bit MASM I have the options of no stack frame, an RBP stack frame using either ENTER/LEAVE or combinations of RSP/RBP, and yet another option of just aligning the stack via RSP so HLL API functions can be called. Then there is any valid procedure alignment for any data size, including ones that have not been created yet.
Have I missed something here?
Here is a short example of the flexibility of MASM with code generation. ZIP file attached. MASM is a MACRO assembler and it is not trying to be a C compiler, CL does that just fine.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.data
MyString db "Once upon a time, the birds chewed lime and monkeys chewed tobacco",0
pStr dq MyString
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
USING r12,r13
LOCAL lstr :QWORD
SaveRegs
rcall getlen,pStr
mov lstr, rax
mov rcx, rax
rcall show1,rcx
rcall show2,lstr
rcall show3,lstr
RestoreRegs
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
PROCALIGN
show1 proc
rcall MessageBox,0,str$(rcx),"PROCALIGN",MB_OK
ret
show1 endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
show2 proc
rcall MessageBox,0,str$(rcx),"STACKFRAME",MB_OK
ret
show2 endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
ALTSTACKFRAME 128, 64
show3 proc
rcall MessageBox,0,str$(rcx),"ALTSTACKFRAME",MB_OK
ret
show3 endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
getlen proc
mov rax, rcx
sub rax, 1
lbl:
REPEAT 3
add rax, 1
movzx r10, BYTE PTR [rax]
test r10, r10
jz lbl1
ENDM
add rax, 1
movzx r10, BYTE PTR [rax]
test r10, r10
jnz lbl
lbl1:
sub rax, rcx
ret
getlen endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
; .text:0000000140001000 C8800000 enter 0x80, 0x0
; .text:0000000140001004 4881EC80000000 sub rsp, 0x80
; .text:000000014000100b 4C896588 mov qword ptr [rbp-0x78], r12
; .text:000000014000100f 4C896D90 mov qword ptr [rbp-0x70], r13
; .text:0000000140001013 488B0D85100000 mov rcx, qword ptr [0x14000209f]
; .text:000000014000101a E8F5000000 call sub_140001114
; .text:000000014000101a
; .text:000000014000101f 48894580 mov qword ptr [rbp-0x80], rax
; .text:0000000140001023 488BC8 mov rcx, rax
; .text:0000000140001026 488BC9 mov rcx, rcx
; .text:0000000140001029 E827000000 call sub_140001055
; .text:0000000140001029
; .text:000000014000102e 488B4D80 mov rcx, qword ptr [rbp-0x80]
; .text:0000000140001032 E85C000000 call sub_140001093
; .text:0000000140001032
; .text:0000000140001037 488B4D80 mov rcx, qword ptr [rbp-0x80]
; .text:000000014000103b E892000000 call sub_1400010d2
; .text:000000014000103b
; .text:0000000140001040 4C8B6588 mov r12, qword ptr [rbp-0x78]
; .text:0000000140001044 4C8B6D90 mov r13, qword ptr [rbp-0x70]
; .text:0000000140001048 48C7C100000000 mov rcx, 0x0
; .text:000000014000104f FF1533110000 call qword ptr [ExitProcess]
; .text:000000014000104f
; ; --------------------------------------------------------------------------
; ; sub_140001055
; ; --------------------------------------------------------------------------
; sub_140001055 proc
; .text:0000000140001055 4883EC08 sub rsp, 8
; .text:0000000140001059 488BC9 mov rcx, rcx
; .text:000000014000105c 488B1544100000 mov rdx, qword ptr [0x1400020a7]
; .text:0000000140001063 49C7C00A000000 mov r8, 0xa
; .text:000000014000106a FF1538110000 call qword ptr [_i64toa]
; .text:000000014000106a
; .text:0000000140001070 49C7C100000000 mov r9, 0x0
; .text:0000000140001077 4C8B053C100000 mov r8, qword ptr [0x1400020ba]
; .text:000000014000107e 488B1522100000 mov rdx, qword ptr [0x1400020a7]
; .text:0000000140001085 4833C9 xor rcx, rcx
; .text:0000000140001088 FF150A110000 call qword ptr [MessageBoxA]
; .text:0000000140001088
; .text:000000014000108e 4883C408 add rsp, 8
; .text:0000000140001092 C3 ret
; sub_140001055 endp
;
; ; --------------------------------------------------------------------------
; ; sub_140001093
; ; --------------------------------------------------------------------------
; sub_140001093 proc
; .text:0000000140001093 C8800000 enter 0x80, 0x0
; .text:0000000140001097 4883EC60 sub rsp, 0x60
; .text:000000014000109b 488BC9 mov rcx, rcx
; .text:000000014000109e 488B151D100000 mov rdx, qword ptr [0x1400020c2]
; .text:00000001400010a5 49C7C00A000000 mov r8, 0xa
; .text:00000001400010ac FF15F6100000 call qword ptr [_i64toa]
; .text:00000001400010ac
; .text:00000001400010b2 49C7C100000000 mov r9, 0x0
; .text:00000001400010b9 4C8B051B100000 mov r8, qword ptr [0x1400020db]
; .text:00000001400010c0 488B15FB0F0000 mov rdx, qword ptr [0x1400020c2]
; .text:00000001400010c7 4833C9 xor rcx, rcx
; .text:00000001400010ca FF15C8100000 call qword ptr [MessageBoxA]
; .text:00000001400010ca
; .text:00000001400010d0 C9 leave
; .text:00000001400010d1 C3 ret
; sub_140001093 endp
;
; ; --------------------------------------------------------------------------
; ; sub_1400010d2
; ; --------------------------------------------------------------------------
; sub_1400010d2 proc
; .text:00000001400010d2 55 push rbp
; .text:00000001400010d3 488BEC mov rbp, rsp
; .text:00000001400010d6 4881EC80000000 sub rsp, 0x80
; .text:00000001400010dd 488BC9 mov rcx, rcx
; .text:00000001400010e0 488B15FC0F0000 mov rdx, qword ptr [0x1400020e3]
; .text:00000001400010e7 49C7C00A000000 mov r8, 0xa
; .text:00000001400010ee FF15B4100000 call qword ptr [_i64toa]
; .text:00000001400010ee
; .text:00000001400010f4 49C7C100000000 mov r9, 0x0
; .text:00000001400010fb 4C8B05FC0F0000 mov r8, qword ptr [0x1400020fe]
; .text:0000000140001102 488B15DA0F0000 mov rdx, qword ptr [0x1400020e3]
; .text:0000000140001109 4833C9 xor rcx, rcx
; .text:000000014000110c FF1586100000 call qword ptr [MessageBoxA]
; .text:000000014000110c
; .text:0000000140001112 C9 leave
; .text:0000000140001113 C3 ret
; sub_1400010d2 endp
Quote from: asmguru on September 22, 2019, 09:20:46 AM
BTW I just saw the UASM COMDAT support -- this is a wonderful feature and it works great.
ml64 has -Gy for that, from version 12.
Hi Hutch,
You are doing fantastic work on 64-bit MASM, which was delivered to the masses as a diamond in the rough.
A lot of things can be done with macros (but others are impossible, such as producing the vectorcall calling convention), but in particular the asmguru requirement for direct loading of parameters into registers is a piece of cake with macros.
My point was actually that there is no justification for demanding that an assembler do everything ex-factory. Although MASM is too brute and UASM is a little more polished, we can always work through macros or by hand, like an artisan, when what we are looking for is not delivered. People who want everything at their disposal, without further trouble, had better look for an HLL.
Quote from: TimoVJL on September 22, 2019, 05:07:03 PM
ml64 has -Gy for that, from version 12.
Thank you, I didn't know that.
Hi,
If you want "naked" procedures I'd recommend just using the registers directly.
So, if you use:
option stackbase:rsp
option win64:15
then, for example, rbp is freed up and stack frames are based on RSP instead. In addition, you can declare a proc with named arguments (for type checking, invoke, etc.):
MyProc PROC FRAME aPtr:QWORD, aVal:DWORD
ret
MyProc ENDP
Now you have several options:
1) Use the arguments by name. This will trigger the allocation and setup of the homing area for the parameter, and it will be accessed from its stack location.
2) Ignore the argument name and just use the register directly, in which case UASM optimises out the setup of the stack frame (as it only does that for parameters that are actually referenced by name in the proc).
3) Use option proc:none or option prologue/epilogue to set up a completely manual procedure with no automatic intervention by the assembler.
I normally stick to (1), and for leaf functions where you're trying to save every last cycle I'd opt for (2), as it's still clean with less potential for mistakes.
If you're trying to keep those leaf functions more readable (which is why you'd still want to refer to named parameters), you could just set up some equates:
MyProc PROC FRAME aPtr:QWORD, aVal:DWORD
_aPtr EQU <rcx>
_aVal EQU <edx>
mov rsi,_aPtr ; note: rsi is non-volatile under the Win64 ABI, so real code must save and restore it
ret
MyProc ENDP
Something like that, which would also give you the ability to swap the text equate between the parameter name and the register.
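One way that swap could look, as an untested sketch: USE_REGS is a made-up switch, the options are the ones shown earlier in this post, and the behaviour in the named-parameter branch (homing and frame setup) is assumed to be as described in option (1) above.

```asm
option stackbase:rsp
option win64:15

USE_REGS equ 1                  ; hypothetical switch: 1 = registers, 0 = named parameters

.code
MyProc PROC FRAME aPtr:QWORD, aVal:DWORD
IF USE_REGS
  _aPtr TEXTEQU <rcx>           ; read the incoming registers directly
  _aVal TEXTEQU <edx>
ELSE
  _aPtr TEXTEQU <aPtr>          ; fall back to the named (homed) parameters
  _aVal TEXTEQU <aVal>
ENDIF
    mov eax, _aVal              ; writing eax zero-extends into rax
    add rax, _aPtr              ; returns aPtr + aVal
    ret
MyProc ENDP
end
```

The body is written once against _aPtr and _aVal, so flipping USE_REGS is enough to compare the two code paths in the listing.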