I finally got my favorite project ("MathMovie") converted to 64-bit. It took so long because a lot of MM had to be rewritten with "advanced" (to me) techniques I learned here. Originally used 8088-era techniques - it worked but was a horrible mess (still is, actually). It was about 17,000 lines, 30 of them macros; now it's about 13,000, 600 of them in macros. One of the most useful is my invoke macro (nvk).
You're wondering: JWasm has invoke, so who wants yet another invoke macro? For one thing, I need to assemble under ML64 also (various good reasons). More important, JWasm invoke doesn't do what I want (see below).
Many issues are involved in 64-bit conversion, the major one being stack alignment / calling convention (handled by nvk). This huge issue has consumed many person-years of expert coders across the globe - so why am I able to get past it so easily? Not because I'm smarter; au contraire, I'm dumber than most of them. It's because the issue has only to do with Windows interfacing. The hardware doesn't force you to align the stack, or pass parameters in rcx, rdx, r8, r9; nor does it care about all the other arcana of Windows calling conventions: reals in xmm's (unless vararg etc), prologues, epilogues, stack frame pointers, SEH etc etc. If you simply want to get 32-bit code working in 64-bits, you can (almost) ignore all that stuff. Of course you lose a lot: no codeview, no symbolic debugging, can't create Windows-called routines, etc.
I still have to interface with many Windows functions; nvk takes care of that. It's tested thoroughly with 100 functions, the ones I need. Below is a partial list (leaves out minor ones like strlen, etc). Perhaps the trickiest involve threading, but the most demanding was good old MessageBox (probably because it's the only one that pops up its own window). There are some Windows functions nvk won't support, but I don't happen to know what they are.

Some major Windows functions tested with the nvk invoke macro, in no particular order:
GetModuleHandleA proto :LPSTR
GetCommandLineA proto
ExitProcess proto :DWORD
LoadIconA proto :HINSTANCE, :LPSTR
LoadCursorA proto :HINSTANCE, :LPSTR
RegisterClassExA proto :ptr WNDCLASSEXA
CreateWindowExA proto :DWORD, :LPSTR, :LPSTR, :DWORD, :SDWORD, :SDWORD, :SDWORD, :SDWORD, :HWND, :HMENU, :HINSTANCE, :LPVOID
ShowWindow proto :HWND, :SDWORD
UpdateWindow proto :HWND
GetMessageA proto :ptr MSG, :HWND, :SDWORD, :SDWORD
TranslateMessage proto :ptr MSG
DispatchMessageA proto :ptr MSG
PostQuitMessage proto :SDWORD
DefWindowProcA proto :HWND, :UINT, :WPARAM, :LPARAM
MoveWindow proto :HWND, :DWORD, :DWORD, :DWORD, :DWORD, :BOOL
SetWindowTextA proto :HWND, :LPSTR
InvalidateRect proto :HWND, :ptr RECT, :BOOL
BeginPaint proto :HWND, :LPPAINTSTRUCT
GetClientRect proto :HWND, :LPRECT
DrawTextA proto :HDC, :LPSTR, :DWORD, :LPRECT, :DWORD
EndPaint proto :HWND, :ptr PAINTSTRUCT
PostMessageA proto :HWND, :DWORD, :WPARAM, :LPARAM
CreateFileA proto :LPSTR, :DWORD, :DWORD, :LPSECURITY_ATTRIBUTES, :DWORD, :DWORD, :HANDLE
WriteFile proto :HANDLE, :LPCVOID, :DWORD, :LPDWORD, :LPOVERLAPPED
ReadFile proto :HANDLE, :LPVOID, :DWORD, :LPDWORD, :LPOVERLAPPED
CloseHandle proto :HANDLE
CreateThread proto :LPSECURITY_ATTRIBUTES, :SIZE_T, :LPTHREAD_START_ROUTINE, :LPVOID, :DWORD, :LPDWORD
ExitThread proto :DWORD
BitBlt proto :HDC, :DWORD, :DWORD, :DWORD, :DWORD, :HDC, :DWORD, :DWORD, :DWORD
StretchBlt proto :HDC, :DWORD, :DWORD, :DWORD, :DWORD, :HDC, :DWORD, :DWORD, :DWORD, :DWORD, :DWORD
printf proto :ptr SBYTE, :VARARG
sprintf proto :ptr SBYTE, :ptr SBYTE, :VARARG
sscanf proto :ptr SBYTE, :ptr SBYTE, :VARARG
CreateCompatibleDC proto :HDC
GetDC proto :HWND
CreateDIBSection proto :HDC, :ptr BITMAPINFO, :DWORD, :ptr ptr, :HANDLE, :DWORD
SelectObject proto :HDC, :HGDIOBJ
DeleteObject proto :HGDIOBJ
PeekMessageA proto :LPMSG, :HWND, :DWORD, :DWORD, :DWORD
DeleteDC proto :HDC
GetCurrentProcess proto ; all these in winbase.inc
SetProcessAffinityMask proto :HANDLE, :DWORD_PTR
GetPriorityClass proto :HANDLE
SetPriorityClass proto :HANDLE, :DWORD
GetCurrentThread proto
GetThreadPriority proto :HANDLE
SetThreadPriority proto :HANDLE, :DWORD
SetThreadAffinityMask proto :HANDLE, :DWORD_PTR
Sleep proto :DWORD ; this one works ; in winbase
Beep proto :DWORD, :DWORD ; winbase
QueryPerformanceCounter proto :ptr LARGE_INTEGER
QueryPerformanceFrequency proto :ptr LARGE_INTEGER
GetStdHandle proto :DWORD
WriteConsoleA proto :HANDLE, :ptr , :DWORD, :LPDWORD, :LPVOID
MessageBoxA proto :HWND, :LPSTR, :LPSTR, :DWORD
DestroyWindow proto :HWND
IsZoomed proto :HWND
LoadMenuA proto :HINSTANCE, :LPSTR
GetMenu proto :HWND
SetMenu proto :HWND, :HMENU
GetSubMenu proto :HMENU, :DWORD
CheckMenuItem proto :HMENU, :DWORD, :DWORD
CheckMenuRadioItem proto :HMENU, :DWORD, :DWORD, :DWORD, :DWORD
SetMenuItemInfoA proto :HMENU, :DWORD, :BOOL, :LPCMENUITEMINFOA
TrackPopupMenu proto :HMENU, :DWORD, :DWORD, :DWORD, :DWORD, :HWND, :ptr RECT
GetWindowRect proto :HWND, :LPRECT
SetCursor proto :HCURSOR
GetWindowLongA proto :HWND, :DWORD
SetWindowLongA proto :HWND, :DWORD, :SDWORD
SendMessageA proto :HWND, :DWORD, :WPARAM, :LPARAM
SetFocus proto :HWND
GetCurrentProcessId proto
GetProcessAffinityMask proto :HANDLE, :ptr DWORD_PTR, :ptr DWORD_PTR
GetCurrentThreadId proto
lstrcatA proto :LPSTR, :LPSTR ; and other minor ones like this
The other reason nvk can be n00b-written is that it's not very efficient: it follows the KISS principle (keep it simple, sailor). My project has two major loops, running at approximately 30 iterations/second and 60 million iterations/second. Windows is never invoked from the inner loop, since there are only a few hundred instruction cycles to play with (per core, of course). The 30-ips loop uses about 10 million cycles per iteration and averages fewer than 1000 Windows invocations, so speed simply isn't an issue.
nvk "features" include:
- Forget about stack alignment. No "and rsp, -16"; you can push/pop across invocations (64 or 16 bit). For testing, I even adjust the stack randomly by odd numbers throughout my code (sub rsp, 4321, followed later, after a dozen invocations, by add rsp, 4321). JWasm can't do that.
- nvk never complains "register value overwritten" like JWasm. (It only takes a few extra instructions to avoid this.) In fact, it has no error messages at all; when fed bad args it just blows up all over the place.
- it handles "aDdR", and real4, real10, bytes/words/dwords, structures, all odd-sized arguments correctly. Some of these JWasm invoke doesn't do right (altho undoubtedly there are some types I haven't run into that nvk doesn't handle, but JWasm does).
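To make the bullets above concrete, here is a minimal usage sketch. It assumes the nvk macros from this post live in an include file (nvk.inc is a hypothetical name), that the program is linked against the usual kernel32/user32 import libraries, and that the entry point is set with a linker option. The names main, msgText and msgTitle are made up for illustration.

```asm
; Sketch only, not from the original zip.
include nvk.inc                  ; hypothetical file holding the macros from this post

extern MessageBoxA:proc
extern ExitProcess:proc

.data
msgText  db "Hello from nvk", 0  ; hypothetical strings
msgTitle db "nvk test", 0

.code
main proc
    push rbx                     ; no care taken over stack alignment here
    sub  rsp, 4321               ; deliberately odd adjustment, as in the test described above
    nvk  MessageBoxA, 0, addr msgText, addr msgTitle, 0   ; final 0 is MB_OK
    add  rsp, 4321               ; undo the odd adjustment
    pop  rbx
    nvk  ExitProcess, 0          ; does not return
main endp
end
```

The point of the sketch is that the caller never touches alignment or shadow space itself; nvk is expected to sort both out around each call.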
This post is written for people, like me, who just want to get their 32-bit code up and running, and will worry about SEH (etc etc) later; not for experts, who of course already know this stuff. To them: please let me know what I'm doing wrong, if you've got nothing better to do at the moment. My macro technique is primitive, any tips to clean it up wld be welcome. Nvk is full of unknown (to me) bugs, so if you notice one pls inform.
There are comments in the code, but here's a brief description. When called with a function and arguments, nvk first aligns the stack to 16 bytes and stores the original rsp on the stack for later recovery. All registers, including rbp, are preserved during this setup. Then the args are counted; the count (at least four, to cover the shadow space) is rounded up to an even number, which (times eight) is sub'ed from rsp. The arguments are put on the stack in reverse order, including the first four (always spilled to shadow space). Then the first (up to) four are read off the stack into rcx .. r9; those registers can of course appear as arguments themselves. Finally the function is called; upon return, the stack is restored.
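As a worked example of that description, here is roughly what a five-argument call such as `nvk WriteFile, hFile, addr buf, 12, addr written, 0` should expand to. It is written out by hand from the macro source, not copied from an assembler listing, so treat it as a sketch. The names demo, hFile, buf and written are hypothetical, and hFile is assumed to already hold an open handle.

```asm
; Hand expansion (sketch) of: nvk WriteFile, hFile, addr buf, 12, addr written, 0
extern WriteFile:proc

.data
hFile   dq 0                     ; hypothetical: assumed to hold an open handle
buf     db "hello, world"        ; 12 bytes
written dd 0

.code
demo proc
    ; --- nvk: align the stack, remembering the entering rsp ---
    push rbp
    lea  rbp, [rsp+8]            ; rbp = rsp as it was on entry
    sub  rsp, 8                  ; room for the saved rsp
    and  rsp, -10h               ; align to 16 bytes
    mov  [rsp], rbp              ; store entering rsp
    mov  rbp, [rbp-8]            ; restore caller's rbp
    ; --- nvk_noalign: 5 args, rounded up to 6 slots = 48 bytes ---
    sub  rsp, 48
    mov  [rsp], rdx              ; save rdx; it is the scratch register
    ; --- nvk_loadstack: last argument first ---
    mov  rdx, [rsp]
    mov  rdx, 0
    mov  [rsp+32], rdx           ; arg 5 -> slot 4 (above the shadow space)
    mov  rdx, [rsp]
    lea  rdx, written
    mov  [rsp+24], rdx           ; arg 4 -> slot 3
    mov  rdx, [rsp]
    mov  rdx, 12
    mov  [rsp+16], rdx           ; arg 3 -> slot 2
    mov  rdx, [rsp]
    lea  rdx, buf
    mov  [rsp+8], rdx            ; arg 2 -> slot 1
    mov  rdx, [rsp]
    mov  rdx, hFile
    mov  [rsp], rdx              ; arg 1 -> slot 0 (overwrites the saved rdx, last use)
    ; --- first four slots into the register arguments ---
    mov  rcx, [rsp]
    mov  rdx, [rsp+8]
    mov  r8,  [rsp+10h]
    mov  r9,  [rsp+18h]
    call WriteFile
    add  rsp, 48                 ; drop the argument area
    pop  rsp                     ; back to the entering rsp
    ret
demo endp
end
```

Since the aligned rsp has an even number of 8-byte slots subtracted from it, rsp is still 16-byte aligned at the call, which is the whole purpose of rounding the count up.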
The zip includes the JWasm sample Win64_2.asm, with minimal mods for ML64 compatibility, plus makeJ.bat and makeM.bat. Should be self-explanatory.

;***********************
; INVOKE MACROS for ML64 and JWasm (if you want to use it there), by rrr314159 2015/1/29
;***********************
IFNDEF __JWASM__ ;; actually I prefer nvk in JWasm as well
invoke equ nvk
ENDIF
;***********************
;***********************
nvk MACRO thefun:REQ, args:VARARG ; "invoke"
;;***********************
;; ALIGN stack to 16 bytes, call nvk_noalign, then restore rsp
push rbp
lea rbp, [rsp][8] ;; save entering rsp first into rbp then onto stack
sub rsp, 8 ;; make room for entering rsp that was saved in rbp
and rsp, -10h ;; align 16
mov [rsp], rbp ;; put entering rsp onto stack
mov rbp, [rbp-8] ;; restore rbp
nvk_noalign thefun, args
pop rsp ;; right back where we started from
ENDM
;***********************
nvk_noalign MACRO thefun:REQ, args:VARARG ;; "invoke" without aligning
;;***********************
LOCAL txt, cnt, stackadjust, cnttopass
;; Prepare stack for arguments, load stack, call function, restore rsp
;; called from nvk; also can call directly if u know stack is aligned
;; Count the arguments, prepare reversed arg list
cnt = 0
IFNB <args> ;; if args blank skip most of the work
txt equ <>
% FOR arg, <args>
txt CATSTR <arg> , <,>, txt
cnt = cnt + 1
ENDM
txt SUBSTR txt, 1 ;; force expression evaluation
txt SUBSTR txt, 1, @SizeStr( %txt ) - 1
;; Adjust stack for args, rounded up to a multiple of 16 bytes (necessary for some funs)
IF cnt GT 4
stackadjust = cnt
ELSE
stackadjust = 4
ENDIF
stackadjust = ((stackadjust+1)/2)*2 ; round up to an even count (multiple of 16 bytes)
sub rsp, stackadjust * 8
;; Load stack, saving rdx in home space to be restored after each arg
mov [rsp], rdx ;; could also spill rcx, r8, r9 if needed, or whatever
cnttopass = cnt ;; pass by ref, so don't send cnt (gets clobbered)
nvk_loadstack txt, cnttopass
;; Load four regs from prepared args loaded on stack
mov rcx, [rsp]
mov rdx, [rsp+8]
mov r8, [rsp+10h]
mov r9, [rsp+18h]
;; Adjust by 20h, no other work needed, if arglist was blank
ELSE
sub rsp, 20h
ENDIF
;; Call the function (finally), afterwards restore stack
call thefun
IF cnt GT 0
add rsp, stackadjust * 8
ELSE
add rsp, 20h
ENDIF
ENDM
;***********************
nvk_loadstack MACRO args, posonstack
;;***********************
local leacmd
;; Load args on stack in reverse, restore rdx after each (it may be in the arg list)
% FOR arg, <args>
posonstack = posonstack - 1
mov rdx, [rsp] ;; rdx gets orig value each time
;; Check for "aDdR", if present prepare lea instruction and execute it
leacmd equ <@afteraddr(arg)> ;; if addr, returns after-addr text
IFNB leacmd
leacmd CATSTR <lea rdx, >, leacmd
&leacmd ;; execute lea instruction
mov [rsp + posonstack*8], rdx
ELSE
;; Convert types as necessary; if real/integer not 8 bytes, convert to real8/qword
IF TYPE(arg) EQ REAL4 OR TYPE(arg) EQ REAL10
fld arg
fstp REAL8 PTR [rsp + posonstack*8]
ELSE
IF TYPE(arg) EQ 1 OR TYPE(arg) EQ 2
movsx edx, arg
ELSEIF TYPE(arg) EQ 4
mov edx, arg
ELSE
mov rdx, arg
ENDIF
mov [rsp + posonstack*8], rdx
ENDIF
ENDIF
ENDM
ENDM
;***********************
@afteraddr MACRO thetxt:=<>
;;***********************
LOCAL char, answer, iswhite, numchars
;; If argument starts with addr return rest of string, else blank
answer equ <>
numchars = 0
FORC char, <&thetxt>
IF numchars EQ 0
iswhite INSTR 1,< >,<&char> ;; trim leading spaces or tabs
IFE iswhite
answer CATSTR answer,<&char>
numchars = 1
ENDIF
ELSEIF numchars LT 4
answer CATSTR answer,<&char>
numchars = numchars + 1
ELSEIF numchars EQ 4
;; "answer" now holds first 4 chars after whitespace, is it "aDdR"?
IFIDNI <addr>, answer
answer equ <> ;; says addr; now get the latter part of arg
numchars = 5 ;; anything > 4 will do; no longer counting
ELSE
EXITM <> ;; not addr, return blank
ENDIF
ELSE ;; numchars > 4 means get rest of string
answer CATSTR answer,<&char>
ENDIF
ENDM
;; If, after trimming, arg was too short, it couldn't be addr
IF numchars LT 5
answer equ <>
ENDIF
EXITM answer
ENDM
;***********************