64-bit assembly starter kit

Started by jj2007, July 16, 2016, 11:48:29 AM

jj2007

See Dual 64/32-bit assembly in the MasmBasic thread. On my machine, both the console and the GUI example assemble fine with ML64, AsmC, HJWasm and JWasm. Work in progress, though - it's more a proof of concept than anything else.

Note that the new jinvoke macro does indeed check the number and type of parameters, even with Microsoft's poor crippled 64-bit version of the once powerful MASM assembler. This is my version of progress 8)

jj2007

I had to update the library once more, sorry :(

Based on the sources that appear when clicking File/New Masm source in RichMasm as "Dual 32/64 bit console/GUI", I get (see attachment) the following code sizes (in bytes):
2048 console 32-bit
2560 console 64-bit
6144 windows 32-bit
8192 windows 64-bit


So 64-bit code is definitely longer (+25% for the console version, +33% for the GUI version), although the effect is not dramatic 8)
(Windows6 by Hutch is a bit longer, but that is because of the bigger icon)

P.S.: The GUI versions feature a simple window with an edit control and a menu, with WM_ messages shown in the console. The more boring ones like WM_MOUSEMOVE are filtered out. So far I haven't seen message differences between 64- and 32-bit code.

sinsi

Without source code it's hard to tell, but if you are using "mov ebx,OFFSET var" that is longer than "lea ebx,[var]" (4 bytes extra?).
With ML64, lea uses RIP-relative addressing; the only catch is that var must be within ±2 GB of the instruction.
This allows for proper position independent code too.
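A minimal C sketch of the arithmetic behind this, assuming the 64-bit register forms `mov rbx,OFFSET var` (encoded as mov r64,imm64 with an 8-byte absolute address) and `lea rbx,[var]` (RIP-relative, 4-byte displacement). Under that assumption the saving is 3 bytes (10 vs 7). The helper `rip_disp32` and the example addresses are hypothetical; the check shows where the ±2 GB limit comes from: the displacement is a signed 32-bit offset from the end of the instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* mov r64, imm64: REX.W + opcode (B8+r) + 8-byte absolute address */
enum { MOV_R64_IMM64_LEN = 1 + 1 + 8 };
/* lea r64, [rip+disp32]: REX.W + 8D + ModRM + 4-byte displacement */
enum { LEA_R64_RIPREL_LEN = 1 + 1 + 1 + 4 };

/* Hypothetical helper: computes the disp32 for a RIP-relative operand.
   RIP points at the NEXT instruction, so the base is insn_addr + insn_len.
   Returns false when the target is outside the signed 32-bit (+-2 GB) range. */
static bool rip_disp32(uint64_t insn_addr, size_t insn_len,
                       uint64_t target, int32_t *disp)
{
    int64_t d = (int64_t)(target - (insn_addr + insn_len));
    if (d < INT32_MIN || d > INT32_MAX)
        return false;
    *disp = (int32_t)d;
    return true;
}

/* Sanity checks with made-up addresses in a typical image range. */
static void rip_disp32_examples(void)
{
    int32_t d;
    assert(MOV_R64_IMM64_LEN - LEA_R64_RIPREL_LEN == 3);
    assert(rip_disp32(0x140001000ULL, LEA_R64_RIPREL_LEN, 0x140003000ULL, &d) && d == 0x1FF9);
    /* exactly 4 GB past the next instruction: not encodable */
    assert(!rip_disp32(0x140001000ULL, LEA_R64_RIPREL_LEN, 0x240001007ULL, &d));
}
```

The same relative encoding is what makes the code position independent: the displacement stays valid wherever the image is loaded, as long as code and data move together.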

Another cause of bloat is adjusting the stack for each API call, can end up as code like

sub rsp,20h
call API_1
add rsp,20h
sub rsp,20h
call API_2
add rsp,20h
sub rsp,20h
call API_3
add rsp,20h
...

If you write your own prologue you can pass "maximum param bytes" in the PROC declaration and adjust the stack once.
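A rough C sketch of the arithmetic behind "adjust the stack once", assuming the Windows x64 convention: every call needs at least 32 bytes (4 × 8) of shadow space plus 8 bytes per additional stack argument, and RSP must be 16-byte aligned at each call. It also assumes a prologue with no pushes, so RSP is 8 mod 16 on entry (the return address) and the single `sub rsp,N` has to restore 16-byte alignment. The function name is hypothetical, and real prologue macros may lay out the frame differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size N for a single "sub rsp,N" in a procedure that
   calls functions taking at most max_args arguments and needs local_bytes
   of locals. Assumes no pushes in the prologue: on entry RSP = 16k + 8. */
static size_t frame_size(unsigned max_args, size_t local_bytes)
{
    /* Shadow space is always 4 slots, even for calls with fewer args. */
    size_t arg_bytes = (size_t)(max_args < 4 ? 4 : max_args) * 8;
    size_t n = local_bytes + arg_bytes;
    /* Round up to the smallest value with n % 16 == 8, so that
       (entry RSP - n) is a multiple of 16 at every call site. */
    return (n + 8 + 15) / 16 * 16 - 8;
}

static void frame_size_examples(void)
{
    assert(frame_size(4, 0) == 0x28);  /* the familiar "sub rsp,28h" */
    assert(frame_size(6, 0) == 0x38);  /* two extra stack args, re-aligned */
}
```

With one allocation like this, the repeated `sub rsp,20h` / `add rsp,20h` pairs around each call reduce to a single `sub` after the prologue and a single `add` before `ret`.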

One big gotcha is what happens to the upper 32 bits of a register when you manipulate the lower 32 bits.
"sub eax,eax" will zero the top 32 bits of rax. So "sub eax,eax" is the same as "sub rax,rax" except one byte smaller (no rex prefix).
A common way to get two 16-bit numbers into a 32-bit register, extended to 32 and 64

mov ax,high16
shl eax,16
mov ax,dx
;
mov eax,high32
shl rax,32
mov eax,edx ;oops, high32 of rax now 0
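To make the rule concrete, here is a small C model (not real assembly) of how writes to the sub-registers of RAX behave on x86-64: a 32-bit write zero-extends into all 64 bits, while a 16-bit write leaves bits 16..63 untouched. The helper names are made up for illustration. `pack16` replays the first sequence, `pack32_buggy` the second, and `pack32_fixed` shows one possible workaround, assuming the low half arrives in EDX: clear the upper half of RDX with `mov edx,edx` (itself a zero-extending 32-bit write) and combine with `or rax,rdx`.

```c
#include <assert.h>
#include <stdint.h>

/* Model of x86-64 sub-register writes, only as far as needed here. */
static uint64_t write32(uint32_t v) { return v; }  /* mov eax,...: old value irrelevant, bits 32..63 become 0 */
static uint64_t write16(uint64_t reg, uint16_t v)  /* mov ax,...: bits 16..63 are kept */
{
    return (reg & ~0xFFFFull) | v;
}

/* mov ax,high16 / shl eax,16 / mov ax,dx */
static uint32_t pack16(uint16_t hi, uint16_t lo)
{
    uint64_t rax = 0xDEADBEEFCAFEF00Dull;          /* arbitrary old contents */
    rax = write16(rax, hi);
    rax = write32((uint32_t)rax << 16);            /* 32-bit shl also clears bits 32..63 */
    rax = write16(rax, lo);
    return (uint32_t)rax;
}

/* mov eax,high32 / shl rax,32 / mov eax,edx   (the "oops" version) */
static uint64_t pack32_buggy(uint32_t hi, uint32_t lo)
{
    uint64_t rax = write32(hi);
    rax <<= 32;
    rax = write32(lo);                             /* zero-extension wipes the high half */
    return rax;
}

/* mov eax,high32 / shl rax,32 / mov edx,edx / or rax,rdx */
static uint64_t pack32_fixed(uint32_t hi, uint32_t lo)
{
    uint64_t rax = write32(hi);
    uint64_t rdx = write32(lo);                    /* bits 32..63 of rdx are now 0 */
    rax <<= 32;
    return rax | rdx;
}

static void pack_examples(void)
{
    assert(pack16(0x1234, 0x5678) == 0x12345678u);
    assert(pack32_buggy(0x11223344u, 0x55667788u) == 0x55667788ull);
    assert(pack32_fixed(0x11223344u, 0x55667788u) == 0x1122334455667788ull);
}
```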


jj2007

Quote from: sinsi on July 16, 2016, 07:18:38 PM
if you are using "mov ebx,OFFSET var" that is longer than "lea ebx,[var]" (4 bytes extra?).

Good point, already corrected and uploaded (there were only a few instances where it mattered, though) :t

Quote
Another cause of bloat is adjusting the stack for each API call, can end up as code like
Hmmmmmmm...

sub rsp,20h  ; once on top
push rcx   ; save a reg
call API_1
pop rcx   ; pop saved reg????

push rcx   ; save a reg
call API_2
pop rcx   ; pop saved reg????

add rsp,20h  ; once at endp


Quote
A common way to get two 16-bit numbers into a 32-bit register, extended to 32 and 64
Valid point, thanks. Keep in mind that movzx works with 16-bit operands:
48 0F B7 55 2A    movzx rdx, word ptr ss:[rbp+2A]

sinsi

Why?

push rcx   ; save a reg


Same thing, 48=REX prefix (register extension?)

seg000:0000000000000000 48 0F B7 55 2A                          movzx   rdx, word ptr [rbp+2Ah]
seg000:0000000000000005 0F B7 55 2A                             movzx   edx, word ptr [rbp+2Ah]
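The two listings differ only in the leading 48h, a REX prefix with its W bit set (64-bit operand size). Because writing EDX already zero-fills bits 32..63 of RDX, both instructions leave the same value in RDX, so the shorter form is the one to prefer. A minimal C sketch of the encoding, restricted to `[rbp+disp8]` and the eight legacy registers (no REX.R/REX.B handling); `encode_movzx_rbp_disp8` is a hypothetical helper, not part of any assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes "movzx reg, word ptr [rbp+disp8]" for reg 0..7
   (eax..edi, or rax..rdi when rex_w is set). Returns the instruction length. */
static size_t encode_movzx_rbp_disp8(unsigned reg, int8_t disp, int rex_w, uint8_t out[5])
{
    size_t n = 0;
    if (rex_w)
        out[n++] = 0x48;                             /* REX.W: 64-bit destination */
    out[n++] = 0x0F;                                 /* two-byte opcode escape */
    out[n++] = 0xB7;                                 /* MOVZX r, r/m16 */
    out[n++] = (uint8_t)(0x40 | ((reg & 7) << 3) | 5); /* mod=01 (disp8), reg, rm=101 (rbp) */
    out[n++] = (uint8_t)disp;
    return n;
}

static void movzx_examples(void)
{
    uint8_t b[5];
    /* 48 0F B7 55 2A   movzx rdx, word ptr [rbp+2Ah] */
    assert(encode_movzx_rbp_disp8(2, 0x2A, 1, b) == 5 && b[0] == 0x48 && b[3] == 0x55);
    /*    0F B7 55 2A   movzx edx, word ptr [rbp+2Ah]  (same value ends up in rdx) */
    assert(encode_movzx_rbp_disp8(2, 0x2A, 0, b) == 4 && b[0] == 0x0F && b[2] == 0x55);
}
```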

jj2007

Quote from: sinsi on July 16, 2016, 08:29:00 PM
Why?

push rcx   ; save a reg

Why not? I always save values with a push/pop pair 8)

Quote
Same thing, 48=REX prefix (register extension?)

seg000:0000000000000000 48 0F B7 55 2A                          movzx   rdx, word ptr [rbp+2Ah]
seg000:0000000000000005 0F B7 55 2A                             movzx   edx, word ptr [rbp+2Ah]
Good catch indeed, thanks.

sinsi

Save a reg, unalign the stack...
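Whether that push breaks the 16-byte alignment required at a call depends on what has already been pushed since entry. A small C sketch that tracks RSP modulo 16, assuming RSP is 8 mod 16 right after the caller's `call` (the return address) and listing each stack operation as its change to RSP; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: RSP modulo 16 after a sequence of stack adjustments,
   starting from procedure entry. Each delta is the change to RSP:
   push -> -8, pop -> +8, sub rsp,N -> -N, add rsp,N -> +N. */
static int rsp_mod16(const long *deltas, size_t n)
{
    long rsp = 8;                         /* entry: caller's CALL pushed 8 bytes */
    for (size_t i = 0; i < n; i++)
        rsp += deltas[i];
    return (int)(((rsp % 16) + 16) % 16); /* non-negative remainder */
}

static void push_examples(void)
{
    /* push rbp / sub rsp,20h / push rcx  -> misaligned at the call */
    const long with_frame[] = { -8, -0x20, -8 };
    /* sub rsp,20h / push rcx (no frame pointer) -> aligned at the call */
    const long no_frame[] = { -0x20, -8 };
    assert(rsp_mod16(with_frame, 3) == 8);
    assert(rsp_mod16(no_frame, 2) == 0);
}
```

In other words, a single push between the stack allocation and the call flips the alignment; a second push, or 8 more bytes in the `sub`, flips it back.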

jj2007

MSDN:
Quote
INVOKE
Visual Studio 2015

Calls the procedure at the address given by expression, passing the arguments on the stack or in registers

Did I miss something? My ML claims to be Microsoft (R) Macro Assembler (x64) Version 10.00.30319.01, and it considers invoke a syntax error...

habran

ML64 is dumb, it doesn't understand any HLL :icon13:

jj2007

Quote from: habran on July 18, 2016, 08:57:37 AM
ML64 is dumb, it doesn't understand any HLL :icon13:

So why do they mention INVOKE in the VS 2015 docs??

rrr314159

VS 2015 is actually a 32-bit application. It can produce 64-bit code, but (I suppose) you can still use it with the 32-bit ML.

hutch--

 :biggrin:

Some of the things you guys say make me laugh. When Iczelion and I started on ML.EXE in 1997, it had almost no documentation except a 16-bit help file called "alang.hlp"; it was sh*t canned by everyone as useless, out of date and not as good as TASM, and it could not be written by programmers. Some months later the idiots shoved their foot back in their mouth when we got it up and going. ML.EXE was a bad mannered old pig even back then, but it kicked arse big time when it came to results. I confess I had a lot of fun with a toy I wrote years ago called "thegun.exe", at a disgusting 6k up and running.

Neither ML.EXE nor ML64.EXE is consumer software: they don't hold your hot little hand, and if you make a PHUKUP they will bite you. They are industrial tools for creating object modules for executable and DLL files. Now, as everyone is playing catch-up by emulating MASM macros, ML64 has no problems emulating "invoke", ".if" and the rest of the control flow options. ".switch" came easily, multiple "invoke" variations were trivial and library modules are routine. I am not a fan of the assembler wars, having spent some years brawling there, but when push comes to shove, MASM has been in development since 1982 and is still going with a 64 bit version that is at least as bad mannered as any of the earlier versions.  :P

Now think of the days in 1990 when you could write 16 bit assembler that looked like a CodeView debug session, you could tell the men from the boys by the paper they used. (large sheet of 0000 sandpaper).  :biggrin:

jj2007

Quote from: hutch-- on July 18, 2016, 12:44:48 PM
with a 64 bit version that is at least as bad mannered as any of the earlier versions

Apart from being CrippleWare, ML64 complains randomly about "invalid character in file". On second attempt with identical files, it usually works. Probably they haven't understood the X64 ABI in Redmond 8)

Stupid noob question: Do all local variables need to be individually aligned 16? I ran into trouble with a "misaligned" PAINTSTRUCT 8)

If that is the case, it would imply that all QWORD locals need to waste an extra 8 bytes ::)
And when reading what Hutch coded as Local64, this really seems to be the requirement... what a mess!

qWord

Quote from: jj2007 on July 18, 2016, 06:56:11 PM
Quote from: hutch-- on July 18, 2016, 12:44:48 PM
with a 64 bit version that is at least as bad mannered as any of the earlier versions
Stupid noob question: Do all local variables need to be individually aligned 16?
The actual required alignment depends on the concrete structure: it is determined by the member (or member of a nested structure) with the largest alignment constraint. For the WinAPI the default structure member alignment is 8, which means that the alignment never gets larger than 8, but might be smaller.

Quote from: Aggregates and Unions
The alignment of the beginning of a structure or a union is the maximum alignment of any individual member. Each member within the structure or union must be placed at its proper alignment as defined in the previous table, which may require implicit internal padding, depending on the previous member.

For PAINTSTRUCT 8 is required (because of HDC) and, e.g., for RECT 4 is sufficient.

EDIT: forgot to mention the default structure alignment
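The rule quoted above can be sketched in C: place each member at the next offset that is a multiple of its own alignment, take the largest member alignment as the structure's alignment, and round the total size up to it. The member tables are assumptions that mirror PAINTSTRUCT and RECT on Win64 (an 8-byte HDC, 4-byte BOOLs and LONGs, a 32-byte reserved array); `struct_layout` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t size, align; } member;

/* Computes member offsets, the structure's alignment and its padded size. */
static size_t struct_layout(const member *m, size_t n, size_t *offsets, size_t *align_out)
{
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        off = (off + m[i].align - 1) / m[i].align * m[i].align; /* implicit padding */
        offsets[i] = off;
        off += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    *align_out = max_align;
    return (off + max_align - 1) / max_align * max_align;       /* tail padding */
}

/* hdc, fErase, rcPaint (RECT), fRestore, fIncUpdate, rgbReserved[32] */
static const member paintstruct[] = { {8, 8}, {4, 4}, {16, 4}, {4, 4}, {4, 4}, {32, 1} };
/* left, top, right, bottom */
static const member rect[] = { {4, 4}, {4, 4}, {4, 4}, {4, 4} };

static void layout_examples(void)
{
    size_t off[6], align;
    assert(struct_layout(paintstruct, 6, off, &align) == 72 && align == 8);
    assert(off[2] == 12);                 /* rcPaint only needs 4-byte alignment */
    assert(struct_layout(rect, 4, off, &align) == 16 && align == 4);
}
```

This reproduces the figures above: 8 for PAINTSTRUCT (because of the HDC) and 4 for RECT.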

jj2007

Thanks, qWord. I guess even ML64 can align structures, but what about the local variables? Hutch has chosen this approach:

  ; LOCAL64 macro is to maintain stack alignment of locals.
  ; each macro adds a dummy local after the named LOCAL to add
  ; an extra 8 bytes to the stack.


I interpret this to mean that the start address (rbp+x) of every ps and rc must be aligned to 16 bytes, for use with movaps and friends. Or am I wrong?
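To put a number on the "waste", here is a rough C sketch that places locals below RBP in declaration order, assuming RBP itself is 16-byte aligned (as it is after `push rbp` / `mov rbp,rsp` on entry). It compares each local at its natural alignment with forcing every local onto a 16-byte boundary, which is what `movaps` on the local would need. The locals (a QWORD, a 72-byte PAINTSTRUCT, a RECT) and the helper name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t size, align; } local_var;

/* Hypothetical helper: gives each local a slot at [rbp - offsets[i]], going
   downwards. If force16 is nonzero every local gets a 16-byte boundary.
   Returns the total bytes used below RBP. Assumes RBP is 16-byte aligned. */
static size_t place_locals(const local_var *v, size_t n, int force16, size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t a = force16 ? 16 : v[i].align;
        off = (off + v[i].size + a - 1) / a * a;  /* rbp - off is a multiple of a */
        offsets[i] = off;
    }
    return off;
}

static void locals_examples(void)
{
    /* QWORD, PAINTSTRUCT, RECT */
    const local_var v[] = { {8, 8}, {72, 8}, {16, 4} };
    size_t off[3];
    assert(place_locals(v, 3, 0, off) == 96);
    assert(place_locals(v, 3, 1, off) == 112);    /* 16 bytes more */
}
```

The total still has to be rounded so that RSP is 16-aligned at each call, which is a separate requirement from the alignment of the individual locals.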