64-bit assembly starter kit

jj2007 · July 16, 2016, 11:48:29 AM

See Dual 64/32-bit assembly in the MasmBasic thread. On my machine, both the console and the GUI example assemble fine with ML64, AsmC, HJWasm and JWasm. Work in progress, though - it's more a proof of concept than anything else.

Note that the new jinvoke macro does indeed check the number and type of parameters, even with Microsoft's poor crippled 64-bit version of the once powerful MASM assembler. This is my version of progress 8)

jj2007 · July 16, 2016, 06:46:51 PM

I had to update the library once more, sorry :(

Based on the sources that appear when clicking File/New Masm source in RichMasm as "Dual 32/64 bit console/GUI", I get (see attachment) the following code sizes:

2048 console 32-bit
2560 console 64-bit
6144 windows 32-bit
8192 windows 64-bit

So 64-bit code is definitely longer, although the effect is not dramatic 8)
(Windows6 by Hutch is a bit longer, but that is because of the bigger icon)

P.S.: The GUI versions feature a simple window with an edit control and a menu, with WM_ messages shown in the console. The more boring ones like WM_MOUSEMOVE are filtered out. So far I haven't seen message differences between 64- and 32-bit code.

sinsi · July 16, 2016, 07:18:38 PM

Without source code it's hard to tell, but if you are using "mov ebx,OFFSET var" that is longer than "lea ebx,[var]" (4 bytes extra?).
With ML64, lea uses rip-relative addressing; the only catch is that var needs to be within +-2GB of the instruction.
This allows for proper position independent code too.
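
To make the +-2GB catch concrete: a rip-relative operand stores a signed 32-bit displacement, measured from the address of the next instruction. What follows is only a sketch of that range check in C; fits_rip_relative and the example addresses are made up for illustration and are not part of ML64 or any linker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `target` is reachable through a
   RIP-relative operand of an instruction whose successor starts at
   `next_ip`. The encoded displacement is a signed 32-bit value. */
bool fits_rip_relative(uint64_t next_ip, uint64_t target)
{
    /* Two's-complement difference, read back as a signed distance. */
    int64_t disp = (int64_t)(target - next_ip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Usage example with an arbitrary, made-up code address. */
void rip_relative_examples(void)
{
    const uint64_t next_ip = 0x140001000ull;

    assert(fits_rip_relative(next_ip, next_ip + 0x7FFFFFFFull));  /* furthest forward  */
    assert(!fits_rip_relative(next_ip, next_ip + 0x80000000ull)); /* one byte too far  */
    assert(fits_rip_relative(next_ip, next_ip - 0x80000000ull));  /* furthest backward */
}
```

Inside a single EXE or DLL image this limit is rarely a problem in practice, since code and data normally sit close together.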

Another cause of bloat is adjusting the stack for each API call, can end up as code like

sub rsp,20h
call API_1
add rsp,20h
sub rsp,20h
call API_2
add rsp,20h
sub rsp,20h
call API_3
add rsp,20h
...

If you write your own prologue you can pass "maximum param bytes" in the PROC declaration and adjust the stack once.
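
The arithmetic behind such a "maximum param bytes" value is simple enough to sketch. The C below is an illustration only, assuming the Windows x64 convention (8 bytes per argument, at least 32 bytes of shadow space per call); max_param_bytes and call_area_bytes are invented names, not part of any macro set mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a single call needs at the bottom of the frame: one 8-byte slot
   per argument, but never fewer than the four shadow-space slots. */
static size_t call_area_bytes(size_t arg_count)
{
    size_t slots = arg_count < 4 ? 4 : arg_count;
    return slots * 8;
}

/* Hypothetical "maximum param bytes" for a whole procedure: the largest
   per-call area, rounded up to a multiple of 16. Subtracting a multiple
   of 16 preserves whatever 16-byte alignment RSP had before. */
size_t max_param_bytes(const size_t *arg_counts, size_t n_calls)
{
    assert(arg_counts != NULL || n_calls == 0);

    size_t max_bytes = 0;
    for (size_t i = 0; i < n_calls; ++i) {
        size_t bytes = call_area_bytes(arg_counts[i]);
        if (bytes > max_bytes)
            max_bytes = bytes;
    }
    return (max_bytes + 15) & ~(size_t)15;
}
```

For a procedure that calls APIs with 2, 5 and 4 arguments this gives 48 bytes, reserved once in the prologue instead of a sub/add pair around every call.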

One big gotcha is what happens to the upper 32 bits of a register when you manipulate the lower 32 bits.
"sub eax,eax" will zero the top 32 bits of rax. So "sub eax,eax" is the same as "sub rax,rax" except one byte smaller (no rex prefix).
A common way to get two 16-bit numbers into a 32-bit register, and the same idea extended to 32 and 64 bits:

mov ax,high16
shl eax,16
mov ax,dx
;
mov eax,high32
shl rax,32
mov eax,edx ;oops, high32 of rax now 0
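
The same gotcha can be modelled outside an assembler. This is a small C sketch, not anything from the attached sources: write16 and write32 are invented stand-ins for a 16-bit and a 32-bit register write, and pack_fixed shows just one possible way around the problem.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 16-bit write such as "mov ax, value": bits 16..63 survive. */
uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* Model of a 32-bit write such as "mov eax, value": the result is
   zero-extended, so the old upper 32 bits are lost. */
uint64_t write32(uint64_t reg, uint32_t value)
{
    (void)reg;
    return (uint64_t)value;
}

/* The classic 16-bit idiom: works because "mov ax" leaves the rest alone. */
uint32_t pack16(uint16_t high16, uint16_t low16)
{
    uint64_t rax = write16(0, high16);           /* mov ax, high16 */
    rax = write32(rax, (uint32_t)rax << 16);     /* shl eax, 16    */
    rax = write16(rax, low16);                   /* mov ax, dx     */
    return (uint32_t)rax;
}

/* The same idiom carried over literally to 32/64 bits. */
uint64_t pack32_naive(uint32_t high32, uint32_t low32)
{
    uint64_t rax = write32(0, high32);           /* mov eax, high32 */
    rax <<= 32;                                  /* shl rax, 32     */
    rax = write32(rax, low32);                   /* mov eax, edx: upper half cleared */
    return rax;
}

/* One possible fix: merge the low half with a full 64-bit OR. */
uint64_t pack32_fixed(uint32_t high32, uint32_t low32)
{
    uint64_t rax = write32(0, high32);           /* mov eax, high32 */
    rax <<= 32;                                  /* shl rax, 32     */
    rax |= (uint64_t)low32;                      /* or rax, rdx (rdx = zero-extended low32) */
    return rax;
}

/* Usage example. */
void pack_examples(void)
{
    assert(pack16(0x1234, 0x5678) == 0x12345678u);
    assert(pack32_naive(0x11223344u, 0x55667788u) == 0x55667788ull);
    assert(pack32_fixed(0x11223344u, 0x55667788u) == 0x1122334455667788ull);
}
```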

jj2007 · July 16, 2016, 08:04:26 PM

Quote from: sinsi on July 16, 2016, 07:18:38 PM
if you are using "mov ebx,OFFSET var" that is longer than "lea ebx,[var]" (4 bytes extra?).

Good point, already corrected and uploaded (there were only a few instances where it mattered, though) :t

Quote
Another cause of bloat is adjusting the stack for each API call, can end up as code like

Hmmmmmmm...

sub rsp,20h  ; once on top
push rcx   ; save a reg
call API_1
pop rcx   ; pop saved reg????

push rcx   ; save a reg
call API_2
pop rcx   ; pop saved reg????

add rsp,20h  ; once at endp

Quote
A common way to get two 16-bit numbers into a 32-bit register, extended to 32 and 64

Valid point, thanks. Keep in mind that movzx works with 16-bit operands:

48 0F B7 55 2A movzx rdx, word ptr ss:[rbp+2A]

sinsi · July 16, 2016, 08:29:00 PM

Why?

push rcx   ; save a reg

Same thing, 48=REX prefix (register extension?)

seg000:0000000000000000 48 0F B7 55 2A                          movzx   rdx, word ptr [rbp+2Ah]
seg000:0000000000000005 0F B7 55 2A                             movzx   edx, word ptr [rbp+2Ah]

jj2007 · July 16, 2016, 08:33:27 PM

Quote from: sinsi on July 16, 2016, 08:29:00 PM
Why?
push rcx ; save a reg

Why not? I always save values with a push/pop pair 8)

Quote
Same thing, 48=REX prefix (register extension?)
seg000:0000000000000000 48 0F B7 55 2A movzx rdx, word ptr [rbp+2Ah]
seg000:0000000000000005 0F B7 55 2A movzx edx, word ptr [rbp+2Ah]

Good catch indeed, thanks.

sinsi · July 16, 2016, 09:16:28 PM

Save a reg, unalign the stack...
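
For anyone wondering what the objection amounts to: under the Windows x64 convention RSP has to be 16-byte aligned at every CALL, so right after the CALL it is 8 off, and every PUSH moves it by another 8. Below is a rough C model of that bookkeeping, assuming the caller of the procedure followed the rule; aligned_at_call is a made-up helper, not something from the thread's sources or from any toolchain.

```c
#include <assert.h>
#include <stdbool.h>

/* RSP modulo 16 on entry to a procedure: the CALL that got us here
   pushed an 8-byte return address onto an aligned stack. */
enum { RSP_MOD16_AT_ENTRY = 8 };

/* Hypothetical helper: is RSP 16-byte aligned at a nested CALL, given the
   bytes subtracted from RSP so far and the number of 8-byte PUSHes still
   on the stack (including a "push rbp" in the prologue, if any)? */
bool aligned_at_call(unsigned sub_bytes, unsigned pushes)
{
    assert(sub_bytes % 8 == 0);          /* model only handles 8-byte steps */

    unsigned moved = sub_bytes + 8u * pushes;
    /* RSP moved down by `moved` from an address that was 8 mod 16, so it
       is aligned exactly when `moved` is itself 8 mod 16. */
    return (RSP_MOD16_AT_ENTRY + moved) % 16u == 0;
}
```

In this model aligned_at_call(0x20, 1) is true but aligned_at_call(0x20, 2) is false, so whether a lone push rcx breaks things depends on what the prologue already did: if RSP was aligned before the push, it is misaligned at the call. Note also that the callee's 32-byte home area starts at [rsp] at the time of the call, so a value pushed after the sub rsp,20h sits inside that area and the callee is allowed to overwrite it.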

jj2007 · July 18, 2016, 07:27:08 AM

MSDN:

Quote
INVOKE
Visual Studio 2015

Calls the procedure at the address given by expression, passing the arguments on the stack or in registers

Did I miss something? My ML claims to be Microsoft (R) Macro Assembler (x64) Version 10.00.30319.01, and it considers invoke a syntax error...

habran · July 18, 2016, 08:57:37 AM

ML64 is dumb, it doesn't understand any HLL :icon13:

jj2007 · July 18, 2016, 11:33:01 AM

Quote from: habran on July 18, 2016, 08:57:37 AM
ML64 is dumb, it doesn't understand any HLL :icon13:

So why do they mention INVOKE in the VS 2015 docs??

rrr314159 · July 18, 2016, 11:55:08 AM

VS 2015 is actually a 32-bit application. It can make 64-bit code but (I suppose) you can still use it with the 32-bit ML.

hutch-- · July 18, 2016, 12:44:48 PM

Some of the things you guys say make me laugh. When Iczelion and I started on ML.EXE in 1997 it had almost no documentation except a 16 bit help file called "alang.hlp". It was sh*t canned by everyone as useless, out of date and not as good as TASM and it could not be written by programmers. Some months later the idiots shoved their foot back in their mouth when we got it up and going. ML.EXE was a bad mannered old pig even back then but it kicked arse big time when it came to results. I confess I had a lot of fun with a toy I wrote years ago called "thegun.exe" at a disgusting 6k up and running.

Neither ML.EXE nor ML64.EXE is consumer software; they don't hold your hot little hand and if you make a PHUKUP it will bite you. They are industrial tools for creating object modules for executable and DLL files. Now as everyone is playing catchup by emulating MASM macros, ML64 has no problems emulating "invoke", ".if" and the rest of the control flow options. ".switch" came easily, multiple "invoke" variations were trivial and library modules are routine. I am not a fan of the assembler wars having spent some years brawling there but when push comes to shove, MASM has been in development since 1982 and is still going with a 64 bit version that is at least as bad mannered as any of the earlier versions. :P

Now think of the days in 1990 when you could write 16 bit assembler that looked like a CodeView debug session, you could tell the men from the boys by the paper they used. (large sheet of 0000 sandpaper).

jj2007 · July 18, 2016, 06:56:11 PM

Quote from: hutch-- on July 18, 2016, 12:44:48 PM
with a 64 bit version that is at least as bad mannered as any of the earlier versions

Apart from being CrippleWare, ML64 complains randomly about "invalid character in file". On second attempt with identical files, it usually works. Probably they haven't understood the X64 ABI in Redmond 8)

Stupid noob question: Do all local variables need to be individually aligned 16? I ran into trouble with a "misaligned" PAINTSTRUCT 8)

If that is the case, it would imply that all QWORD locals need to waste an extra 8 bytes ::)
And when reading what Hutch coded as Local64, this really seems to be the requirement... what a mess!

qWord · July 18, 2016, 11:15:40 PM

Quote from: jj2007 on July 18, 2016, 06:56:11 PM
Quote from: hutch-- on July 18, 2016, 12:44:48 PM
with a 64 bit version that is at least as bad mannered as any of the earlier versions
Stupid noob question: Do all local variables need to be individually aligned 16?

The alignment actually needed depends on the concrete structure and is determined by the structure member (or member of a sub-structure) with the largest alignment constraint. For the WinAPI the default structure member alignment is set to 8, which means that the alignment never gets larger than 8, but might be smaller.

Quote from: Aggregates and Unions
The alignment of the beginning of a structure or a union is the maximum alignment of any individual member. Each member within the structure or union must be placed at its proper alignment as defined in the previous table, which may require implicit internal padding, depending on the previous member.

For PAINTSTRUCT 8 is required (because of HDC) and, e.g., for RECT 4 is sufficient.
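
The rule can be checked with a C compiler. The sketch below uses stand-in types that only mimic the member layout of RECT and PAINTSTRUCT (the real definitions live in the Windows headers, which are deliberately not included here), and is_aligned is an invented helper for testing a candidate address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RECT: four 32-bit members, so 4-byte alignment suffices. */
typedef struct {
    int32_t left, top, right, bottom;
} RectLike;

/* Stand-in for PAINTSTRUCT: the pointer-sized HDC member dominates. */
typedef struct {
    void    *hdc;              /* HDC is pointer-sized            */
    int32_t  fErase;           /* BOOL                            */
    RectLike rcPaint;
    int32_t  fRestore;         /* BOOL                            */
    int32_t  fIncUpdate;       /* BOOL                            */
    uint8_t  rgbReserved[32];
} PaintStructLike;

/* A structure's alignment is that of its most demanding member. */
_Static_assert(_Alignof(RectLike) == _Alignof(int32_t),
               "RectLike aligns like its 32-bit members");
_Static_assert(_Alignof(PaintStructLike) == _Alignof(void *),
               "PaintStructLike aligns like its pointer member");

/* Hypothetical helper: does `addr` satisfy `align` (a power of two)? */
bool is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

On a 64-bit target the pointer member makes the second structure 8-byte aligned, so an address ending in 8 is fine for it and an address ending in 4 is not; nothing in this rule asks for 16.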

EDIT: forgot to mention the default structure alignment

jj2007 · July 19, 2016, 02:47:52 AM

Thanks, qWord. I guess even ML64 can align structures, but what about the local variables? Hutch has chosen this approach:

  ; LOCAL64 macro is to maintain stack alignment of locals.
  ; each macro adds a dummy local after the named LOCAL to add
  ; an extra 8 bytes to the stack.

Which I interpret to mean that the start address (rbp+x) of every ps and rc must be aligned to 16 bytes, for use with movaps and friends. Or am I wrong?
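
Going by qWord's explanation, each local would only need its own natural alignment, with 16 reserved for data that is accessed with movaps and similar instructions. The following C sketch lays out a list of locals that way. It is not how LOCAL64 works; LocalVar, align_up and layout_locals are invented for illustration, and the sketch assumes the base of the local area is itself 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical description of one local: its size and required alignment. */
typedef struct {
    size_t size;
    size_t align;
} LocalVar;

/* Give every local an offset that respects its own alignment and return
   the total size of the local area, rounded up to 16 so that RSP stays
   aligned. offsets[i] receives the offset assigned to locals[i]. */
size_t layout_locals(const LocalVar *locals, size_t n, size_t *offsets)
{
    assert((locals != NULL && offsets != NULL) || n == 0);

    size_t offset = 0;
    for (size_t i = 0; i < n; ++i) {
        offset = align_up(offset, locals[i].align);
        offsets[i] = offset;
        offset += locals[i].size;
    }
    return align_up(offset, 16);
}
```

As an example, take two QWORDs, a 72-byte structure aligned to 8, a 16-byte structure aligned to 4 and one 16-byte value aligned to 16. They land at offsets 0, 8, 16, 88 and 112 and need 128 bytes in total, whereas padding each one to a multiple of 16 would need 144.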
