64-bit assembly starter kit

hutch-- · July 19, 2016, 04:53:33 AM

Any struct that I have written in win64 is done like so.


strname STRUCT QWORD
  ; members dtype ? etc ....
strname ENDS

With the macros "LOCAL64" and "STRUCT64", they are written in the uninitialised data section, so the data is independent of the stack. I can use them anywhere in the code section without having to put them at the top like normal LOCAL variables, and you can also use them in procedures with no stack frame.

qWord · July 19, 2016, 04:59:16 AM

The problem is the default prologue/epilogue, which respects neither the alignment constraints for fastcall nor the structure member alignment of 8 (Zp8 or direct). Conclusion => set up your own prologue/epilogue macro (straightforward). The second step is to write a LOCAL replacement that adds padding to 8 bytes for structure member alignment (or, like Hutch, 16 if wished).

BTW: I'm curious what all these 64-suffixes are good for:-D

hutch-- · July 19, 2016, 05:15:17 AM

> BTW: I'm curious what all these 64-suffixes are good for:-D

It was just a naming convenience so I knew what they were for. Now, interestingly enough, if you write the "proc" line in the same way as 32-bit ML, with specified data types ("item:QWORD"), the first 4 arguments, which the 64-bit calling convention passes in registers, come up blank, while from argument 5 onwards the arguments contain the right data. So I have used the following:

Code Select


    mov QWORD PTR [rbp+10h], rcx
    mov QWORD PTR [rbp+18h], rdx
    mov QWORD PTR [rbp+20h], r8
    mov QWORD PTR [rbp+28h], r9

This preserves the registers so they are not overwritten by any other procedure call, and you can use the names from the "proc" line once the data is copied to the stack. It also means that you have rax, rcx, rdx, r8, r9, r10 and r11 as volatile registers to use in the algo if needed, and that is before preserving any of the non-volatile registers, so if you need it you can have genuine PHUN in algo design.
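A minimal sketch of this register spill as a complete ml64 program. It assumes a hand-written frame (push rbp / mov rbp, rsp) rather than any macro from the starter kit; AddFour, main and the build line are hypothetical and only serve the illustration.

```asm
; Assumed build line: ml64 spill.asm /link /subsystem:console /entry:main kernel32.lib
extrn ExitProcess:proc

.code

; Hypothetical procedure: returns the sum of its four register arguments.
; No arguments are declared on the PROC line, so ml64 emits no prologue of its own.
AddFour proc
    push rbp                        ; rsp was ???8 on entry, now ???0
    mov  rbp, rsp
    mov  QWORD PTR [rbp+10h], rcx   ; [rbp+0] = saved rbp, [rbp+8] = return address,
    mov  QWORD PTR [rbp+18h], rdx   ; [rbp+10h] onwards = shadow space reserved by the caller
    mov  QWORD PTR [rbp+20h], r8
    mov  QWORD PTR [rbp+28h], r9
    sub  rsp, 20h                   ; shadow space for anything AddFour might call

    mov  rax, QWORD PTR [rbp+10h]   ; rcx, rdx, r8, r9 are free for other use from here on
    add  rax, QWORD PTR [rbp+18h]
    add  rax, QWORD PTR [rbp+20h]
    add  rax, QWORD PTR [rbp+28h]

    leave                           ; mov rsp, rbp / pop rbp
    ret
AddFour endp

main proc
    sub  rsp, 28h                   ; 20h shadow space + 8 so rsp is ???0 at each call
    mov  ecx, 1
    mov  edx, 2
    mov  r8d, 3
    mov  r9d, 4
    call AddFour                    ; rax = 1+2+3+4
    mov  ecx, eax                   ; use the sum as the process exit code
    call ExitProcess
main endp

end
```

The offsets start at 10h because the saved rbp and the return address occupy the first two qwords above rbp; the four slots after them are the shadow space that the caller is required to reserve.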

jj2007 · July 19, 2016, 06:02:48 AM

Quote from: qWord on July 19, 2016, 04:59:16 AM
The second step is to write LOCAL-replacement that does add padding to 8 bytes

This is the part where I fail to see the logic. The x64 ABI makes a big fuss about the stack being align 16, and I always thought that's because SIMD instructions are faster with, or require, such alignment. But why then align 8 for local variables? It doesn't make sense 8)

qWord · July 19, 2016, 06:30:20 AM

Quote from: jj2007 on July 19, 2016, 06:02:48 AM
But why then align 8 for local variables? It doesn't make sense 8)

To save memory, of course. Even for compilers it is no problem to keep track of the alignment and reorder locals to get the best results. Actually, this problem is specific to ml64 and its broken HLL capabilities**.
Note that the structure alignment applies to all segments and not just the stack!

** When doing low-level assembler programming it is no problem, of course.

mineiro · July 19, 2016, 06:52:27 AM

You can see this problem better if you think in terms of odd, even, odd, even, ... (or, at some points, even, odd, even, odd, ...).
At the entry point of our Windows program, rsp=???8.

Before we do a 'call', the stack should be 16-byte aligned, so we subtract 8 bytes from rsp to get rsp=???0 and then do the call; on procedure entry, rsp is back at rsp=???8.

Suppose we call a function that has only 4 parameters, with the configuration below:

4*8 because rcx, rdx, r8, r9 will be saved on the stack, like function parameters.
7*8 for the locals.
3*8 because I'm supposing that the function with the most parameters called inside this procedure has 7 parameters; the first 4 go into registers, so we need 3 on the stack. This space is reused by all other calls with fewer parameters, so we set up for the function with the most parameters.

So our prologue could be
sub rsp, (4*8+7*8+3*8)
but 4+7+3 == 14 qwords, and an even count is not OK because rsp=???8 (odd) on entry. We need to insert one dummy qword (1*8) into this prologue to get the stack aligned.
This way you can free rbp.
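The odd/even count above can be sketched as a small ml64 program. The constant names, the reading of the first 4*8 as shadow space for callees, and the build line are assumptions made for the illustration; the program only reports whether rsp ends up 16-byte aligned after the prologue.

```asm
; Assumed build line: ml64 frame.asm /link /subsystem:console /entry:main kernel32.lib
extrn ExitProcess:proc

SHADOW_QWORDS   equ 4    ; home space for rcx, rdx, r8, r9 of any callee
STACKARG_QWORDS equ 3    ; arguments 5..7 of the largest call made from here
LOCAL_QWORDS    equ 7    ; seven qword locals
PAD_QWORDS      equ 1    ; dummy qword that makes the total odd (15)
FRAME_BYTES     equ (SHADOW_QWORDS + STACKARG_QWORDS + LOCAL_QWORDS + PAD_QWORDS) * 8

.code
main proc
    sub  rsp, FRAME_BYTES   ; rsp=???8 on entry; 15*8 = 78h, so rsp=???0 now
    mov  rax, rsp
    and  eax, 0Fh           ; low 4 bits of rsp: 0 when 16-byte aligned
    mov  ecx, eax           ; exit code 0 signals an aligned stack
    call ExitProcess
main endp
end
```

With PAD_QWORDS set to 0 the frame is 14 qwords (70h) and the low bits come out as 8 instead, which is the misaligned case described above.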

The other way is:
at the entry point of our program rsp=???8, but if we just do a call to a procedure without parameters, like 'call main', the pushed return address aligns the stack to a multiple of 16 bytes.
This is why I talk about odds and evens, and in my opinion this is the real challenge of coding for an x86-64 O.S. (the same problem exists on the Linux side, except that the entry point is aligned to 16 bytes instead of 8, and instead of 4 register arguments you have 6; both are even). Note that I have not counted pushes of non-volatile registers here; with those, it is odds and evens again.

Dword locals are pseudo-promoted to qwords; this is why it is a qword, as has been said.

A simple test: at the entry point, call the printf function with 7 parameters, then with 8. One of these calls will be invalid.
Now do the same as before but with the stack aligned; again one of them will be invalid.

jj2007 · July 19, 2016, 10:19:21 AM

Quote from: qWord on July 19, 2016, 06:30:20 AM
Quote from: jj2007 on July 19, 2016, 06:02:48 AM
But why then align 8 for local variables? It doesn't make sense 8)
To save memory of course.

I meant the opposite: If they want the stack align 16, why only align for a RECT? For a movaps xmm0, rc, you really need align 16 8)

> Alignment
> Most structures are aligned to their natural alignment. The primary exceptions are the stack pointer and malloc or alloca memory, which are aligned to 16 byte

"Most" is nice

Besides, that stack alignment story should be investigated. Insert the equivalent of PrintLine Hex$(rsp) in your WndProc and see something amazing...

qWord · July 19, 2016, 12:38:06 PM

Quote from: jj2007 on July 19, 2016, 10:19:21 AM
I meant the opposite: If they want the stack align 16, why only align for a RECT? For a movaps xmm0, rc, you really need align 16 8)

there is no conflict: you have a defined alignment on entry thus you can align the locals as you like.

jj2007 · July 19, 2016, 11:06:29 PM

Quote from: qWord on July 19, 2016, 12:38:06 PM
there is no conflict: you have a defined alignment on entry thus you can align the locals as you like.

It doesn't seem so straightforward. In my tests, the WM_PAINT handler crashes if the PAINTSTRUCT is local, it works with a global ps. Padding with an extra qword left or right won't help. Apparently I've got the spill space wrong, BeginPaint writes into my locals :(

Besides, on entry into the WndProc, the stack is aligned to 8 bytes, not to 16. Windows doesn't follow the 16-byte alignment rule 8)

jj2007 · July 22, 2016, 10:16:38 AM

include \Masm32\MasmBasic\Res\JBasic.inc ; click to download version 22 July 2016
; ### simple console demo, assembles in 32- or 64-bit mode with ML, AsmC, JWasm, HJWasm ###
j@start ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format") ; OPT_Assembler ML ; make sure you got the right versions
Print Str$(" \ntest with multiple tabbed args:\n\t%i\t%i\t%i\n\n", 12, 13, 14)
Print "Type a number: "
Print Str$("The value of your number is %i\n", Val(Input$()))
Print Chr$(jbit$, "-bit assembly is easy, it seems...")
j@end

Output:

Code Select

This code was assembled with ml64 in 64-bit format

test with multiple tabbed args:
        12      13      14

Type a number: 123.456
The value of your number is 123
64-bit assembly is easy, it seems...

No 64-bit libraries required, a standard Masm32 installation plus MasmBasic are sufficient. Code sizes are slightly higher for 64-bit code, although the snippet above builds at 2048 bytes for both the 64- and 32-bit versions.

Attached a simple window with a RichEdit control. Open in RichMasm and hit F6.
EDIT: Removed. Use the new one further down, or go to menu File/New Masm source in RichMasm.

jj2007 · July 28, 2016, 06:23:58 PM

Quote from: sinsi on July 16, 2016, 07:18:38 PM
If you write your own prologue you can pass "maximum param bytes" in the PROC declaration and adjust the stack once.

Implemented in the latest MasmBasic release.

> One big gotcha is what happens to the upper 32 bits of a register when you manipulate the lower 32 bits.
> "sub eax,eax" will zero the top 32 bits of rax. So "sub eax,eax" is the same as "sub rax,rax" except one byte smaller (no rex prefix).

Just stumbled over this one:

Code Select

| 48 31 C0                  | xor rax, rax                      |
| 83 C8 FF                  | or eax, FFFFFFFF                  |
| 48 0D FF FF FF FF         | or rax, FFFFFFFFFFFFFFFF          |
| 31 C0                     | xor eax, eax                      |
| 48 31 C0                  | xor rax, rax                      |

Looks harmless, but beware: one of the instructions does not behave as you might expect ::)

Mikl__ · July 28, 2016, 10:51:27 PM

Ciao, jj2007!

Code Select

xor eax, eax => movsx rax,eax => mov rax,0000000000000000h
or eax, 0FFFFFFFFh => movsx rax,eax => mov rax,0FFFFFFFFFFFFFFFFh

jj2007 · July 28, 2016, 11:17:45 PM

Yes, exactly, that's the one:
   int 3
   xor rax, rax
   or eax, -1 ; no movsx rax,eax!!!
   or rax, -1
   xor eax, eax
   xor rax, rax

But sinsi wrote indeed "zero the top 32 bits of rax". Would movzx rax, eax be the correct rule in all cases?
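As a hedged illustration of the rule in question: on x86-64, any instruction that writes a 32-bit general-purpose register zero-extends the result into the upper 32 bits, while 8- and 16-bit writes leave the upper bits untouched. There is no movzx rax, eax encoding; mov eax, eax is the usual way to zero-extend explicitly. The minimal ml64 sketch below traces the values; the entry label main and the build line are assumptions, and the register contents in the comments are best checked in a debugger.

```asm
; Assumed build line: ml64 zx.asm /link /subsystem:console /entry:main kernel32.lib
extrn ExitProcess:proc

.code
main proc
    sub  rsp, 28h      ; 20h shadow space + 8 to bring rsp to ???0

    mov  rax, -1       ; rax = 0FFFFFFFFFFFFFFFFh
    xor  eax, eax      ; 32-bit write: rax = 0000000000000000h
    or   eax, -1       ; 32-bit write: rax = 00000000FFFFFFFFh (zero-extended, not sign-extended)
    or   rax, -1       ; 64-bit write: rax = 0FFFFFFFFFFFFFFFFh
    mov  eax, eax      ; explicit zero-extension: rax = 00000000FFFFFFFFh
    mov  ax, 0         ; 16-bit write, upper 48 bits kept: rax = 00000000FFFF0000h

    xor  ecx, ecx      ; exit code 0
    call ExitProcess
main endp
end
```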

jj2007 · August 06, 2016, 10:37:03 PM

For the fans of qEditor:
- install the latest MasmBasic package
- run the 64-bit example once, i.e. click on (similar as 64-bit code) in line 12 of MbGuide.rtf and hit F6
- add a line in \Masm32\menus.ini e.g. under "console build all":
Build+Run &64,\MASM32\MasmBasic\Res\Bldallc64.bat "{b}"
- move the attached batch file to its location
- open one of the attached examples in qEditor and go to the Project menu

jj2007 · August 30, 2016, 11:47:13 AM

Update 30 August 2016 (download):

During installation, you will see the MbGuide file (greenish background). Select Init in the (lower) 64-bit example on the first page, then hit F6.

Or, better, click on menu File/New Masm source, then on Dual 32/64 bit console/GUI templates that compile both as 64 and 32-bit applications

Feedback needed, this is not yet a stable release but for me it works fine. However, I wasted several hours chasing mysterious ML64 bugs, including "internal error" with hex dumps staring at me, for a source that had absolutely no problem with JWasm, HJWasm and AsmC. Besides, ML complains every now and then about "invalid characters" in the source; when retrying with the same identical file, it doesn't find the problem any more. Time to ditch this beast in favour of better and faster assemblers (where are you, Habran, Johnsa and Nidud??).

The current versions of the console and GUI templates (attached), though, still compile with all assemblers tested, in 64- and 32-bit mode. All you need is the current Masm32 installation plus MasmBasic - no other libs required. Btw, even with ML64, the jinvoke macro counts the CreateWindowEx parameters and barks at you if there aren't exactly 12 of them 8)

P.S. for the developers: have a look at \Masm32\MasmBasic\Res\DualWin.inc (search for SIZE_P)

The MASM Forum
