The MASM Forum

64 Bit Assembler => Topic started by: rrr314159 on January 29, 2015, 07:44:42 PM

Title: Yet Another Invoke Macro ...
Post by: rrr314159 on January 29, 2015, 07:44:42 PM
I finally got my favorite project ("MathMovie") converted to 64-bit. It took so long because a lot of MM had to be rewritten with "advanced" (to me) techniques I learned here. Originally used 8088-era techniques - it worked but was a horrible mess (still is, actually). It was about 17,000 lines, 30 of them macros; now it's about 13,000, 600 of them in macros. One of the most useful is my invoke macro (nvk).

You're wondering: JWasm has invoke, so who wants yet another invoke macro? For one thing, I need to assemble under ML64 also (various good reasons). More important, JWasm invoke doesn't do what I want (see below).

Many issues are involved in 64-bit conversion, the major one being stack alignment / calling convention (handled by nvk). This huge issue has consumed many person-years of expert coders across the globe - so why am I able to get past it so easily? Not because I'm smarter; au contraire, I'm dumber than most of them. It's because the issue has only to do with Windows interfacing. The hardware doesn't force you to align the stack, or pass parameters in rcx, rdx, r8, r9; nor does it care about all the other arcana of Windows calling conventions: reals in xmm's (unless vararg etc), prologues, epilogues, stack frame pointers, SEH etc etc. If you simply want to get 32-bit code working in 64-bits, you can (almost) ignore all that stuff. Of course you lose a lot: no codeview, no symbolic debugging, can't create Windows-called routines, etc.

I still have to interface with many Windows functions; nvk takes care of that. It's tested thoroughly with 100 functions, the ones I need. Below is a partial list (leaves out minor ones like strlen, etc). Perhaps the trickiest involve threading, but the most demanding was good old MessageBox (probably because it's the only one that pops up its own window). There are some Windows functions nvk won't support, but I don't happen to know what they are.
Code:
; some major Windows functions tested with nvk invoke macro, in no particular order

GetModuleHandleA proto :LPSTR
GetCommandLineA  proto
ExitProcess      proto :DWORD
LoadIconA        proto :HINSTANCE, :LPSTR
LoadCursorA      proto :HINSTANCE, :LPSTR
RegisterClassExA proto :ptr WNDCLASSEXA
CreateWindowExA  proto :DWORD, :LPSTR, :LPSTR, :DWORD, :SDWORD, :SDWORD, :SDWORD, :SDWORD, :HWND, :HMENU, :HINSTANCE, :LPVOID
ShowWindow       proto :HWND, :SDWORD
UpdateWindow     proto :HWND
GetMessageA      proto :ptr MSG, :HWND, :SDWORD, :SDWORD
TranslateMessage proto :ptr MSG
DispatchMessageA proto :ptr MSG
PostQuitMessage  proto :SDWORD
DefWindowProcA   proto :HWND, :UINT, :WPARAM, :LPARAM
MoveWindow proto :HWND, :DWORD, :DWORD, :DWORD, :DWORD, :BOOL
SetWindowTextA proto :HWND, :LPSTR
InvalidateRect proto :HWND, :ptr RECT, :BOOL
BeginPaint proto :HWND, :LPPAINTSTRUCT
GetClientRect proto :HWND, :LPRECT
DrawTextA proto :HDC, :LPSTR, :DWORD, :LPRECT, :DWORD
EndPaint proto :HWND, :ptr PAINTSTRUCT
PostMessageA proto :HWND, :DWORD, :WPARAM, :LPARAM
CreateFileA proto :LPSTR, :DWORD, :DWORD, :LPSECURITY_ATTRIBUTES, :DWORD, :DWORD, :HANDLE
WriteFile proto :HANDLE, :LPCVOID, :DWORD, :LPDWORD, :LPOVERLAPPED
ReadFile proto :HANDLE, :LPVOID, :DWORD, :LPDWORD, :LPOVERLAPPED
CloseHandle proto :HANDLE
CreateThread proto :LPSECURITY_ATTRIBUTES, :SIZE_T, :LPTHREAD_START_ROUTINE, :LPVOID, :DWORD, :LPDWORD
ExitThread proto :DWORD
BitBlt proto ;:HDC, :DWORD, :DWORD, :DWORD, :DWORD, :HDC, :DWORD, :DWORD, :DWORD
StretchBlt proto :HDC, :DWORD, :DWORD, :DWORD, :DWORD, :HDC, :DWORD, :DWORD, :DWORD, :DWORD, :DWORD
printf proto :ptr SBYTE, :VARARG
sprintf proto :ptr SBYTE, :ptr SBYTE, :VARARG
sscanf proto :ptr SBYTE, :ptr SBYTE, :VARARG
CreateCompatibleDC proto :HDC
GetDC proto :HWND
CreateDIBSection proto :HDC, :ptr BITMAPINFO, :DWORD, :ptr ptr, :HANDLE, :DWORD
SelectObject proto :HDC, :HGDIOBJ
DeleteObject proto :HGDIOBJ
PeekMessageA proto :LPMSG, :HWND, :DWORD, :DWORD, :DWORD
DeleteDC proto :HDC
GetCurrentProcess proto ; all these in winbase.inc
SetProcessAffinityMask proto :HANDLE, :DWORD_PTR
GetPriorityClass proto :HANDLE
SetPriorityClass proto :HANDLE, :DWORD
GetCurrentThread proto
GetThreadPriority proto :HANDLE
SetThreadPriority proto :HANDLE, :DWORD
SetThreadAffinityMask proto :HANDLE, :DWORD_PTR
Sleep proto :DWORD ; this one works  ; in winbase
Beep proto :DWORD, :DWORD  ; winbase
QueryPerformanceCounter proto :ptr writeanythinghereLARGE_INTEGER
QueryPerformanceFrequency proto :ptr LARGE_INTEGER
GetStdHandle proto :DWORD
WriteConsoleA proto :HANDLE, :ptr , :DWORD, :LPDWORD, :LPVOID
MessageBoxA proto :HWND, :LPSTR, :LPSTR, :DWORD
DestroyWindow proto :HWND
IsZoomed proto :HWND
LoadMenuA proto :HINSTANCE, :LPSTR
GetMenu proto :HWND
SetMenu proto :HWND, :HMENU
GetSubMenu proto :HMENU, :DWORD
CheckMenuItem proto :HMENU, :DWORD, :DWORD
CheckMenuRadioItem proto :HMENU, :DWORD, :DWORD, :DWORD, :DWORD
SetMenuItemInfoA proto :HMENU, :DWORD, :BOOL, :LPCMENUITEMINFOA
TrackPopupMenu proto :HMENU, :DWORD, :DWORD, :DWORD, :DWORD, :HWND, :ptr RECT
GetWindowRect proto :HWND, :LPRECT
SetCursor proto :HCURSOR
GetWindowLongA proto :HWND, :DWORD
SetWindowLongA proto :HWND, :DWORD, :SDWORD
SendMessageA proto :HWND, :DWORD, :WPARAM, :LPARAM
SetFocus proto :HWND
GetCurrentProcessId proto
GetProcessAffinityMask proto :HANDLE, :ptr DWORD_PTR, :ptr DWORD_PTR
GetCurrentThreadId proto
lstrcatA proto :LPSTR, :LPSTR ; and other minor ones like this

The other reason nvk can be n00b-written is that it's not very efficient: it follows the KISS principle (keep it simple, sailor). My project has two major loops, running at approximately 30 iterations/second and 60 million iterations/second. Windows is never invoked from the inner loop, since there are only a few hundred instruction cycles to play with (per core, of course). The 30-ips loop uses about 10 million cycles per iteration and averages fewer than 1000 Windows invocations, so speed simply isn't an issue.

nvk "features" include:

- Forget about stack alignment. No "and rsp, -16"; you can push/pop across invocations (64 or 16 bit). For testing, I even adjust the stack randomly by odd numbers throughout my code (sub rsp, 4321, followed later, after a dozen invocations, by add rsp, 4321). JWasm can't do that.
- nvk never complains "register value overwritten" like JWasm. (It only takes a few extra instructions to avoid this.) In fact, it has no error messages at all; when fed bad args it just blows up all over the place.
- it handles "aDdR", and real4, real10, bytes/words/dwords, structures, all odd-sized arguments correctly. Some of these JWasm invoke doesn't do right (altho undoubtedly there are some types I haven't run into that nvk doesn't handle, but JWasm does).

This post is written for people, like me, who just want to get their 32-bit code up and running, and will worry about SEH (etc etc) later; not for experts, who of course already know this stuff. To them: please let me know what I'm doing wrong, if you've got nothing better to do at the moment. My macro technique is primitive, any tips to clean it up wld be welcome. Nvk is full of unknown (to me) bugs, so if you notice one pls inform.

There are comments in the code; here's a brief description. When called with a function and arguments, nvk first aligns rsp to 16 bytes and stores the original rsp on the stack for later recovery. Register values, including rbp, are preserved while the arguments are being stored, so any register can be passed as an argument. Then the args are counted (minimum four, for the shadow space) and rounded up to an even number, which (times eight) is subtracted from rsp. The arguments are put on the stack in reverse order, including the first four (always spilled to shadow space). Then the first four stack slots are loaded into rcx, rdx, r8 and r9; since every value was already copied to the stack, it doesn't matter if those registers appear as arguments. Finally the function is called; upon return, the stack is restored.
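The rsp arithmetic described above can be sanity-checked in plain C. This is a hypothetical model (the function names are made up for illustration and are not part of nvk); it only tracks rsp values, assuming the instruction sequence in the nvk macro below: push rbp, sub rsp 8, and rsp -10h, then subtracting an even number of 8-byte slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of nvk's rsp bookkeeping; it tracks values only and
   never touches a real stack. */

/* Address of the slot that holds the entering rsp: after "push rbp" and
   "sub rsp, 8", rsp is rounded down to 16 by "and rsp, -10h". */
static uint64_t nvk_saved_rsp_slot(uint64_t entering_rsp)
{
    uint64_t rsp = entering_rsp - 8;   /* push rbp */
    rsp -= 8;                          /* sub rsp, 8 */
    return rsp & ~(uint64_t)0xF;       /* and rsp, -10h */
}

/* rsp at the moment of "call thefun": the aligned slot minus the argument
   area, whose slot count nvk_noalign has already rounded up to even. */
static uint64_t nvk_rsp_at_call(uint64_t entering_rsp, uint64_t slots)
{
    assert(slots >= 4 && slots % 2 == 0);   /* 4 slots = shadow space minimum */
    return nvk_saved_rsp_slot(entering_rsp) - slots * 8;
}
```

Because the entering rsp is stored in that slot and "pop rsp" reads it back, any starting offset works, including the odd "sub rsp, 4321" test mentioned at the top: nvk_rsp_at_call(0x13f6f8 - 4321, 6) is still a multiple of 16.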

The zip includes the JWasm sample Win64_2.asm, with minimal mods for ML64 compatibility, plus makeJ.bat and makeM.bat. Should be self-explanatory.
Code:
;***********************
; INVOKE MACROS for ML64 and JWasm (if you want to use it there), by rrr314159 2015/1/29
;***********************
IFNDEF __JWASM__                                    ;; actually I prefer nvk in JWasm as well
    invoke equ nvk
ENDIF
;***********************
;***********************
nvk MACRO thefun:REQ, args:VARARG  ; "invoke"
;;***********************
;; ALIGN stack to 16 bytes, call nvk_noalign, then restore rsp

    push rbp           
    lea  rbp, [rsp][8]                              ;; save entering rsp first into rbp then onto stack
    sub rsp, 8                                      ;; make room for entering rsp that was saved in rbp
    and rsp, -10h                                   ;; align 16
    mov [rsp], rbp                                  ;; put entering rsp onto stack
    mov rbp, [rbp-8]                                ;; restore rbp

    nvk_noalign thefun, args
   
    pop rsp                                         ;; right back where we started from
ENDM

;***********************
nvk_noalign MACRO thefun:REQ, args:VARARG           ;; "invoke" without aligning
;;***********************
LOCAL txt, cnt, stackadjust, cnttopass
;; Prepare stack for arguments, load stack, call function, restore rsp
;; called from nvk; also can call directly if u know stack is aligned

;; Count the arguments, prepare reversed arg list

    cnt = 0
    IFNB <args>                                     ;; if args blank skip most of the work
        txt equ <>
%       FOR arg, <args>
            txt CATSTR <arg> , <,>, txt
            cnt = cnt + 1
        ENDM
        txt SUBSTR txt, 1                           ;; force expression evaluation
        txt SUBSTR txt, 1, @SizeStr( %txt ) - 1

;; Adjust stack for args, rounded up to 16 bytes (necessary for some funs)

        IF cnt GT 4
            stackadjust = cnt
        ELSE
            stackadjust = 4
        ENDIF
        stackadjust = ((stackadjust+1)/2)*2 ; round up to even (multiple of 16 bytes)
        sub rsp, stackadjust * 8

;; Load stack, saving rdx in home space to be restored after each arg

        mov [rsp], rdx                              ;; could also spill rcx, r8, r9 if needed, or whatever
        cnttopass = cnt                             ;; pass by ref, so don't send cnt (gets clobbered)
        nvk_loadstack txt, cnttopass
       
;; Load four regs from prepared args loaded on stack

        mov rcx, [rsp]
        mov rdx, [rsp+8]
        mov r8, [rsp+10h]
        mov r9, [rsp+18h]

;; Adjust by 20h, no other work needed, if arglist was blank

    ELSE
        sub rsp, 20h
    ENDIF

;; Call the function (finally), afterwards restore stack

    call thefun             
    IF cnt GT 0
        add rsp, stackadjust * 8
    ELSE
        add rsp, 20h
    ENDIF
ENDM

;***********************
nvk_loadstack MACRO args, posonstack
;;***********************
local leacmd
;; Load args on stack in reverse, restore rdx after each (it may be in the arg list)

%   FOR arg, <args>
        posonstack = posonstack - 1
        mov rdx, [rsp]                                  ;; rdx gets orig value each time

;; Check for "aDdR", if present prepare lea instruction and execute it

        leacmd equ <@afteraddr(arg)>                    ;; if addr, returns after-addr text
        IFNB leacmd
            leacmd CATSTR <lea rdx, >, leacmd
            &leacmd                                     ;; execute lea instruction
            mov [rsp + posonstack*8], rdx
        ELSE

;; Convert types as necessary; if real/integer not 8 bytes, convert to real8/qword

            IF TYPE(arg) EQ REAL4 OR TYPE(arg) EQ REAL10
                fld arg
                fstp REAL8 PTR [rsp + posonstack*8]
            ELSE
                IF TYPE(arg) EQ 1 OR TYPE(arg) EQ 2
                    movsx edx, arg
                ELSEIF TYPE(arg) EQ 4
                    mov edx, arg
                ELSE
                    mov rdx, arg
                ENDIF
                mov [rsp + posonstack*8], rdx
            ENDIF
        ENDIF
    ENDM
ENDM

;***********************
@afteraddr MACRO thetxt:=<>
;;***********************
LOCAL char, answer, iswhite, numchars

;; If argument starts with addr return rest of string, else blank

    answer equ <>
    numchars = 0
    FORC char, <&thetxt>
        IF numchars EQ 0
            iswhite INSTR 1,< >,<&char>       ;; trim leading spaces or tabs
            IFE iswhite
                answer CATSTR answer,<&char>
                numchars = 1
            ENDIF
        ELSEIF numchars LT 4
                answer CATSTR answer,<&char>
                numchars = numchars + 1
        ELSEIF numchars EQ 4
       
;; "answer" now holds first 4 chars after whitespace, is it "aDdR"?

            IFIDNI <addr>, answer
                answer equ <>                       ;; says addr; now get the latter part of arg
                numchars = 5                        ;; anything > 4 will do; no longer counting
            ELSE
                EXITM <>                            ;; not addr, return blank
            ENDIF
        ELSE                                        ;; numchars > 4 means get rest of string
            answer CATSTR answer,<&char>
        ENDIF
    ENDM

;; If, after trimming, arg was too short, it couldn't be addr

    IF numchars LT 5                           
        answer equ <>
    ENDIF       
EXITM answer
ENDM

;***********************
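For readers who don't speak the MASM macro language, the text scan done by @afteraddr can be sketched in C. This is a hypothetical re-expression (after_addr is a made-up name), not a drop-in replacement. One deliberate difference: the macro as written appears to drop the fifth character and accept any argument whose first four letters are "addr", so a hypothetical variable named addressTable would become "lea rdx, ssTable"; this sketch instead requires a blank after the keyword.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* If arg starts with "addr" (any case, after leading blanks), followed by at
   least one blank and a non-empty operand, return a pointer to the operand;
   otherwise return NULL. */
static const char *after_addr(const char *arg)
{
    static const char keyword[] = "addr";
    assert(arg != NULL);
    while (is_blank(*arg))
        arg++;                                   /* trim leading blanks */
    for (size_t i = 0; i < 4; i++)
        if (tolower((unsigned char)arg[i]) != keyword[i])
            return NULL;                         /* not "addr", or too short */
    arg += 4;
    if (!is_blank(*arg))
        return NULL;                             /* e.g. "addressTable" */
    while (is_blank(*arg))
        arg++;
    return *arg ? arg : NULL;                    /* "addr" with nothing after */
}
```

A non-NULL result is what the macro splices into "lea rdx, <operand>".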

Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on January 29, 2015, 09:31:08 PM
Hi rrr,
Interesting set of macros! Currently I'm testing my FCALL macro (i.e. 'FASTCALL' for JWASM on Linux). Thinking about the best way to align the stack to 16 bytes, I've looked through your code and have one question:
from your "Adjust stack for args" routine:
Quote
stackadjust = ((stackadjust+1)/2)*2 ; round up to 16
 
is ((stackadjust+1)/2)*2 expression equal to (stackadjust+1) ?
Do we need to perform ODD-EVEN check on number of stack args here ? What if stackadjust=6 ,or say 8?

EDIT: I get it now - those expressions are not equal for the assembler, because / is integer division.
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on January 30, 2015, 02:19:08 AM
No doubt u figured it out, Vertograd: this statement increases an odd number to the next even number. E.g. if you have 7 arguments this puts it up to 8. That way, when it's multiplied by 8 bytes, it's a multiple of 16, so the stack remains aligned to 16 bytes. There are a couple of important points to know. This is necessary for Windows functions, NOT for the hardware (the Intel chip). So you need to consider how Linux does it - it may not be necessary there. The same goes for the other things I'm doing: they're not required by the hardware, so they may not be necessary in Linux - dunno, haven't studied it. If you want I could look at it. The other point is: these adjustments are NOT required for all Windows functions! Some are perfectly happy with an unadjusted stack. So if you (or anyone) test whether it's necessary, you may decide I'm wrong if you only look at a few functions. If anyone's interested I could discuss which functions are particularly picky - some are picky about one adjustment but not others, so it's complicated. Good luck with Linux, it's got to make more sense than Windows!
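The trick is truncating integer division: (n+1)/2 drops any remainder, so multiplying by 2 again gives n when n is even and n+1 when n is odd. A small C sketch of the same computation nvk_noalign does (the function name is hypothetical), including its 4-slot minimum for shadow space:

```c
#include <assert.h>

/* Number of 8-byte slots nvk_noalign reserves for cnt arguments: at least
   4 (shadow space), odd counts rounded up to the next even number. */
static unsigned stackadjust(unsigned cnt)
{
    unsigned n = cnt > 4 ? cnt : 4;
    unsigned slots = ((n + 1) / 2) * 2;   /* integer division truncates */
    assert((slots * 8) % 16 == 0);        /* keeps rsp 16-byte aligned */
    return slots;
}
```

So 6 stays 6 and 8 stays 8; only odd counts such as 5 and 7 move up, to 6 and 8.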
Title: Re: Yet Another Invoke Macro ...
Post by: sinsi on January 30, 2015, 06:12:07 AM
Simple function MessageBox called when unaligned :(
Code:
(13d4.e58): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\system32\LPK.dll -
LPK!LpkDrawTextEx+0x315:
000007fe`ff611775 440f29842450010000 movaps xmmword ptr [rsp+150h],xmm8 ss:00000000`0006f708=000007fefef5a8e40000000001c81320
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on January 30, 2015, 06:33:31 AM
@sinsi: Interesting that the stack is not aligned to 16 bytes in the MessageBox sample in the Introduction to x64 Assembly (https://software.intel.com/en-us/articles/introduction-to-x64-assembly?page=1):
Code:
; Sample x64 Assembly Program
; Chris Lomont 2009 www.lomont.org
extrn ExitProcess: PROC   ; external functions in system libraries
extrn MessageBoxA: PROC
.data
caption db '64-bit hello!', 0
message db 'Hello World!', 0
.code
Start PROC
  sub    rsp,28h      ; shadow space, aligns stack
  mov    rcx, 0       ; hWnd = HWND_DESKTOP
  lea    rdx, message ; LPCSTR lpText
  lea    r8,  caption ; LPCSTR lpCaption
  mov    r9d, 0       ; uType = MB_OK
  call   MessageBoxA  ; call MessageBox API function
  mov    ecx, eax     ; uExitCode = MessageBox(...)
  call ExitProcess
Start ENDP
End
It worked on Windows 8 ... not sure what version

@rrr:
  After some reading I clearly understand that I'll have to re-write my FCALL macro from scratch.
As of now it doesn't handle floating point arguments and doesn't align the stack to 16 bytes, which seems to be necessary on Linux too (at least in some cases).
I'm thinking about creating thin Platform Abstraction Layer for JWASM  - set of macros to make my self-educational programming process more comfortable on both Windows and Linux computers ... if only my laziness will allow me to do it  :biggrin:

Title: Re: Yet Another Invoke Macro ...
Post by: sinsi on January 30, 2015, 07:44:41 AM
Code:
  sub    rsp,28h      ; shadow space, aligns stack
The stack is aligned: on entry the stack is always unaligned by 8, and the sub rsp,28h aligns it and allows 32 bytes for the spill.
Try it with that line commented out.
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on January 30, 2015, 01:36:31 PM
Sinsi, of course you're right the stack (on the main or "start" program entry) is unaligned by 8 but there are potential gotchas for beginners, especially with Windows programs (i.e. subsystem:windows in the linker). You can see confused posters, going back for years, fall for these.

For one, on entry into the Windows callback function (usually called WndProc) the stack is aligned to 16 (I'm talking about 64-bit of course). I haven't actually read this anywhere but that's what I've found. WndProc is the main "entry" into a typical window program, so it's easy to get confused. Similarly when you call "WinMain" in a typical program it's aligned to 16, because it's called right after program entry, adding 8 to the initially unaligned stack. It's sometimes considered the "C" entry point, while the real "start" is the "Masm" entry point; which can also be called WinMainCRTStartup ... So it's easy for an assembler beginner, or a C programmer no matter how advanced, to be unsure what "on entry" really means. Typically both these other Windows "entry points" are aligned to 16, not unaligned by 8. The names "start" and "main" are used promiscuously, and they're affected by the linker settings SUBSYSTEM and ENTRY.

MessageBox is very picky; most other Windows calls don't care about the alignment, at least if it's 8 off (printf family generally will work when it's unaligned by other numbers, in fact). So you can go along happily thinking you've got it figured out; even MessageBox will work half the time (on average); but sooner or later it will explode.

Another gotcha: people put "and rsp, -10h" at the top of their program, thinking they're covered; but now they've changed the alignment. If you're unaligned by 8 you MUST use 28h with messagebox, but if you're aligned you MUST use 20h with MB (and some other picky functions).
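The arithmetic behind the 28h vs 20h rule can be checked in a few lines of C. This assumes, as sinsi describes, that a function is entered with rsp = 16k+8 (the debugger dump later in this thread shows rsp=13f6f8 at WndProc entry); the helper name is hypothetical and only computes rsp mod 16 at the point of the call.

```c
#include <assert.h>
#include <stdint.h>

/* rsp mod 16 right before a "call", given rsp before "sub rsp, amount".
   0 means the callee gets the 16-byte alignment Windows x64 expects. */
static unsigned misalign_at_call(uint64_t rsp, uint64_t amount)
{
    assert(amount % 8 == 0);                 /* stack adjustments are qwords */
    return (unsigned)((rsp - amount) & 0xF);
}
```

In other words, the extra 8 in 28h is only right while rsp is still 8 off; after "and rsp, -10h" the same 28h puts it 8 off again, and 20h is what's needed.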

Then there's an odd number of args over 4, which normally should be rounded up to keep a 16-byte-aligned offset; BUT only if the function was called aligned. Many functions work unaligned, but then if you make sure their args offset is an even multiple of 16, and they call a more picky function - boom.

JWasm invoke always adds 8 on a call, and so gives the right result when you follow the rule you're referring to: always align b4 a call, and always expect unaligned by 8 on entry. But you can still fall for the odd arg list gotcha; the rule doesn't work with WndProc; and if you're not very careful other things can go wrong.

I'm forgetting a couple other interesting gotchas, shld refer to my notes, but ... just use nvk and you're covered!

It's really amusing to read old postings and see people going around and around on these issues; they think they've got it nailed, then suddenly MB (or others) blow up. I might be in the same boat as those poor guys, but don't know it yet!
Title: Re: Yet Another Invoke Macro ...
Post by: sinsi on January 30, 2015, 03:15:39 PM
Entry point is where Windows jumps to after loading your program, the very first instruction of your code. Always aligned 8, not 16.
The window procedure, called whatever you want, is called by Windows during message processing. In all of my programs, is also always aligned 8, not 16.

The Windows ABI pretty much demands that on entry to a fastcall function the stack will be aligned 8, never 16, due to the call return address.
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on January 31, 2015, 04:33:06 AM
To determine that on entry into the Windows callback function (typically called WndProc - whatever) rsp is aligned to 16, I used this snippet. Hopefully it's enough; I can provide a complete sample prog if desired. For instance I just ran it and got 6ff1b0; it always ends in 0h. Probably doing something stupid - experience shows that happens at least a dozen times a day - but, what is it?

Code:
; »»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
.data
    saveinitrsp dq 0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM

    mov r11, rsp

cmp saveinitrsp, 0  ; can do this only once or else no window
jg @F
    mov saveinitrsp, r11
    invoke printf, cfm$("RSP coming in to WndProc was %x\n"), saveinitrsp
@@:

    cmp edx, WM_COMMAND
    jne @F

; etc, etc ...

BTW, in my above post the word "you" doesn't mean "you", you understand, rather it means "one". It only means "you" one time, the first use. "One", OTOH, always means "1". It sounds a bit like I'm giving you advice, but no, that's "one" I'm advising. I hope that's clear.

Anyway, regardless of rsp's alignment status on WndProc entry or anywhere else, if one uses nvk, one doesn't have to worry about it!
Title: Re: Yet Another Invoke Macro ...
Post by: sinsi on January 31, 2015, 06:12:05 AM
That's odd.
Code:
wndproc:    cmp edx,WM_CREATE

Code:
rax=000000000013f760 rbx=0000000000000000 rcx=00000000000b02ca
rdx=0000000000000024 rsi=0000000000000001 rdi=0000000000000000
rip=000000013fc910a8 rsp=000000000013f6f8 rbp=0000000000000000
 r8=0000000000000000  r9=000000000013f8f0 r10=00000000000b02ca
r11=0000000000000000 r12=0000000000000000 r13=0000000000000024
r14=0000000000000000 r15=00000000000b02ca
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
image00000001_3fc90000+0x10a8:
00000001`3fc910a8 83fa01          cmp     edx,1
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on January 31, 2015, 06:56:02 AM
Ok - if WndProc is just a label, or you use proc with no arguments, it's aligned to 8. In other words you're right, on entry it's 8. But when you use proc with the standard four arguments, JWasm or ML64 builds a frame in such a way that rsp gets bumped by an odd number of 8's, thus aligning it to 16. So I was wrong, but I wouldn't call it a stupid mistake, particularly for a beginner, to make. Thanks! Yet another gotcha ...
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 01, 2015, 12:22:33 AM
Hi rrr,
Can your nvk macro work with arguments passed in XMM registers?
For the second day I'm fighting to death with  printf  function trying to convince her to print out the value of XMM0 register but all I get is  mysterious 7FFFFFE2 :icon_confused:

Title: Re: Yet Another Invoke Macro ...
Post by: jj2007 on February 01, 2015, 01:48:18 AM
Quote
Can your nvk macro work with arguments passed in XMM registers?

deb (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1019) prints xmm args just fine, as decimal, hex or binary, but that is 32-bit code.

It is not difficult to implement, the only tricky point is that opattr returns "it's a register", as if it was eax. Below a testbed showing workarounds. Note that ML 6.14 and 6.15 use the string XMM(0), therefore the somewhat clumsy version with ifidni (I know earlier ML versions are not relevant for 64-bit code).

Code:
include \masm32\include\masm32rt.inc
.686p
.xmm

GetType MACRO arg
LOCAL tmp$, opa, is
  opa = (opattr arg) AND 127
  tmp$ CATSTR <Myarg=>, <arg>, < with opattr=>, %opa
  % echo tmp$
  tmp$ CATSTR <arg>, <  > ; two blanks to make sure there are at least three chars
  tmp$ SUBSTR tmp$, 1, 3
  ifidni tmp$, <xmm>
echo ### xmm found ###
  else
echo ### something else...
  endif
  is INSTR <arg>, <xmm>
  if is eq 1
echo @@@ xmm found @@@
  else
echo @@@ arg = something else...
  endif

; all: Myarg=eax with opattr=48
; MLv10: Myarg=xmm0 with opattr=48
; JWasm: Myarg=xmm0 with opattr=48
; MLv615: Myarg=XMM(0) with opattr=48

ENDM

.code
x1 dd 123
start:
GetType x1 ; name is only 2 chars long
GetType eax
GetType xmm0
exit
.err ; don't build, just show the echos
end start
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 01, 2015, 04:06:41 AM
@jj2007,

You don't have to go to all that trouble, the type function distinguishes between xmm0 and rax: type(xmm0) = 10h. Old ML versions were broken - type returned 8 for xmm0, like rax, but it's been fixed since ver 8. BTW ML64 gives type(ymm0) as 20h, but Jwasm incorrectly says it's 8.

@vertograd,

the good news is, you don't need xmm0 for printf (or sprintf or any of that family). Instead, pass reals in the GPR's. The bad news is you just wasted x hours trying to get printf to read xmm0, which (AFAIK) it doesn't do.

ps. I think I'll look into converting deb to 64 bits,  I need (at least some of) that capability.
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 01, 2015, 04:58:46 AM
Thanks , Jochen . GetType macro runs on Linux too  :t

@rrr:
       Well, thank you for the information . Honestly I've already started to suspect that something is wrong with that function itself . Now I must look for another function that can take the arguments from XMM registers ...
You said  that  I 'wasted x hours' but I confess the few hours don't matter at all when I successfully wasted the best years of my life , that's about a half of my lifetime ...
BTW I have good news for you too : my version of JWASM reports the correct value for XMM registers -   10h. What version do you use?
Mine:
Quote
JWasm v2.11, Oct 20 2013, Masm-compatible assembler.


Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 01, 2015, 05:31:11 AM
Well, it's not that there's anything wrong with printf. There's a lot of misinformation around - you can read that real arguments are always passed in xmm0..xmm3 (under the new Windows x64 calling conventions) but that's not the case. VARARG functions, like printf, use the GPR's instead, also some others - go to the source (MSDN) for the correct info.
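What a Windows varargs callee like printf reads for a real argument is just the 64-bit pattern of the REAL8, taken from the integer register (Microsoft's x64 calling-convention docs say that for varargs, floating-point values go in both the integer and the XMM register, in case the callee expects them in the integer registers). Note this is the Windows rule; the System V ABI used on Linux differs, passing varargs doubles in XMM registers with AL holding the number of vector registers used. A C sketch of the reinterpretation (double_bits is a hypothetical helper; memcpy avoids type-punning problems):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw 64-bit pattern of a double, i.e. what ends up in a general-purpose
   register or home slot when a REAL8 is passed to a Windows varargs call. */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);   /* REAL8 and qword are both 8 bytes */
    memcpy(&bits, &x, sizeof bits);    /* well-defined reinterpretation */
    return bits;
}
```

For PI this gives 400921FB54442D18h (4614256656552045848 decimal), the same values jj2007's deb output shows further down as x:xmm0 and d:xmm0.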

Yes, JWasm does xmm registers correctly - but you're reading too fast! I said "Ymm" type is incorrect in JWasm 2.11.

FWIW actually I don't think time is wasted beating one's head against code - that's what it's all about - in coding, persistence is much more important than perspicuity  :biggrin:
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 01, 2015, 06:15:37 AM
Quote
...
Yes, JWasm does xmm registers correctly - but you're reading too fast! I said "Ymm" type is incorrect in JWasm 2.11.
...
oops. sorry ... where is my glasses :icon_eek:?
Title: Re: Yet Another Invoke Macro ...
Post by: jj2007 on February 01, 2015, 07:08:09 AM
Quote
ps. I think I'll look into converting deb to 64 bits,  I need (at least some of) that capability.

Lines 9533ff in \Masm32\MasmBasic\MasmBasic.inc - it wasn't meant for open source teamwork, though :bgrin:
PM me if you need details.
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 01, 2015, 07:45:28 AM
Hmmmm ... beginning to sound like work! I just want to get some of the printing routines, didn't realize it was part of such a large package. Probably calls other routines, that call other routines, that call ... Well, I'll probably just borrow some techniques, as I did with qword's so-called "Simple" Math.  You guys have churned out a lot of lines!
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 02, 2015, 03:26:04 AM
@rrr:
      check this thread (http://masm32.com/board/index.php?topic=1892.0) for some open source printing routines.
I wonder if storing the contents of XMM register in double QWORD at memory location and printing it out as 2 QWORDs sequentially is the only way to dump XMM to console. 
Title: Re: Yet Another Invoke Macro ...
Post by: jj2007 on February 02, 2015, 05:52:19 AM
Quote
I wonder if storing the contents of XMM register in double QWORD at memory location and printing it out as 2 QWORDs sequentially is the only way to dump XMM to console.

Depends on what you want to know:

Code:
include \masm32\MasmBasic\MasmBasic.inc
  Init
  sub esp, OWORD      ; create a slot
  fldpi
  fld st
  fstp REAL8 ptr [esp]
  fstp REAL8 ptr [esp+8]
  movups xmm0, [esp]
  deb 4, "2*PI", f:xmm0, d:xmm0, x:xmm0, b:xmm0
  add esp, OWORD
  Exit
end start


Output:

2*PI
f:xmm0          3.141592653589793  <<< lower qword as REAL8 aka double
d:xmm0          4614256656552045848  <<< same as integer
x:xmm0          400921FB 54442D18 400921FB 54442D18  <<< full 128 bits as hex, as in Olly
b:xmm0          01010100010001000010110100011000  <<< 32 bits
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 02, 2015, 07:03:24 AM
Quote
...
Depends on what you want to know:
...
  I want to know how to print the content of XMM register in several ways.
At the moment I know this:
XMM ->double QWORD variable->GPR-> print(f)
Had no luck with this:
XMM->print(f)
and this:
XMM->stack->print(f)
I'm sure that another way of doing this is possible and it doesn't depend on anything (sort of "Ding an sich")


 
Title: Re: Yet Another Invoke Macro ...
Post by: jj2007 on February 02, 2015, 07:42:25 AM
Quote
XMM->stack->print(f)

At least this one is simple:
Code:
include \masm32\include\masm32rt.inc
.686p
.xmm

.code
o1 OWORD 12345678abcdef0112345678abcdef02h
start:
  movups xmm0, o1
  pshufd xmm0, xmm0, 00011011b
  sub esp, OWORD
  movups [esp], xmm0
  REPEAT 4
pop eax
print hex$(eax), " "
  ENDM
  exit
end start

Output: 12345678 ABCDEF01 12345678 ABCDEF02
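The pshufd/pop sequence simply emits the four dwords from most to least significant. The same ordering can be sketched in C, taking the OWORD as two qwords (low qword first, as it sits in memory on x86). The function name is hypothetical, and shifts are used instead of a memory cast so the result doesn't depend on host byte order:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* oword[0] is the low qword, oword[1] the high qword. Writes the four
   dwords, most significant first, separated by spaces. */
static void format_dwords_hi_to_lo(const uint64_t oword[2], char *out, size_t outsz)
{
    assert(out != NULL && outsz >= 36);   /* 4*8 hex digits + 3 spaces + NUL */
    uint32_t d[4] = {
        (uint32_t)oword[0], (uint32_t)(oword[0] >> 32),   /* dwords 0, 1 */
        (uint32_t)oword[1], (uint32_t)(oword[1] >> 32),   /* dwords 2, 3 */
    };
    snprintf(out, outsz, "%08" PRIX32 " %08" PRIX32 " %08" PRIX32 " %08" PRIX32,
             d[3], d[2], d[1], d[0]);
}
```

Fed o1 from the post as { 0x12345678ABCDEF02, 0x12345678ABCDEF01 }, the buffer compares equal (strcmp) to the output shown above.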
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 02, 2015, 08:41:19 AM
Thanks Jochen  :t
It works on Linux in the following modification:
Code:
        INCLUDE fc.asm  ; FCALLTEST macros set
.data
    frm db "%x",0
    o1 OWORD 12345678abcdef0012345678abcdef00h
.code
_start:
        movups xmm0, o1
        sub rsp, OWORD
        movups [rsp], xmm0
        mov r15,3
        .REPEAT
       FCALLTEST printf,offset frm,[rsp+4*r15]
       dec r15
        .UNTIL r15==-1
        FCALLTEST exit,0     
end _start

OUTPUT:
Quote
12345678abcdef0012345678abcdef00

XMM->stack->print(f) variant is done!
Title: Re: Yet Another Invoke Macro ...
Post by: jj2007 on February 02, 2015, 08:45:40 AM
- you can probably use .UNTIL Sign? instead of .UNTIL r15==-1
- don't forget add rsp, OWORD (my version does 4 pops, so no need for that)
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 02, 2015, 08:53:29 AM
Yes, you're right: .UNTIL SIGN? is much better, and now the stack is restored:
Code: [Select]
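        INCLUDE fc.asm  ; FCALLTEST macros set
.data
    frm db "%x",0
    o1 OWORD 12345678abcdef0012345678abcdef00h
.code
_start:
        movups xmm0, o1
        sub rsp, OWORD              ; 16 bytes, so rsp stays 16-byte aligned
        movups [rsp], xmm0
        mov r15, 3                  ; dword 3 = most significant; r15 is callee-saved
        .REPEAT
            FCALLTEST printf, offset frm, [rsp+4*r15]
            dec r15
        .UNTIL SIGN?                ; stop once r15 goes negative

        add rsp, OWORD              ; restore the stack

        FCALLTEST exit, 0
end _start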
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 02, 2015, 09:58:30 AM
One question  that I'm asking myself:
Why can't I print the double-QWORD value in 2 iterations instead of 4?
At the start the stack is aligned to 16 bytes. Maybe something in the FCALLTEST macro? :icon_confused:


Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 02, 2015, 10:16:56 AM
Why not this?
Code: [Select]
include \myinc\inc64.inc
.data
    o1 OWORD 12335678aacdff0112344678abbeef02h
.code

start:
mov r15, 3
@@:
    mov eax, DWORD PTR o1[r15*4]
    prnt "%x ", eax
    dec r15
    jge @B
prnt "\n"

; or, if you wish, a second suggestion:
prnt "%x %x %x %x\n", DWORD PTR o1[12], DWORD PTR o1[8], DWORD PTR o1[4], DWORD PTR o1

ret
end start
Notes:
- I'm using my inc64.inc with my "prnt" macro, equiv to masm32rt.inc print or FC printf.
- Have to use "DWORD PTR" - but surely that's simpler than using both xmm0 AND rsp?
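- requires the /LARGEADDRESSAWARE:NO linker switch, because o1[r15*4] combines a 32-bit absolute address with an index register, which x64 can't encode RIP-relative.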

Just as I was about to post, u asked about printing in 2 iterations instead of 4. U know, you could do it in one line with the 2nd suggestion (using my prnt function; I'm sure FC can do something similar).

[edit] Whoops, I should have read the posts above more carefully. I see you want to print xmm0 directly, NOT o1; that's just a value to initialize xmm0 with. Sorry, where are my glasses?? :icon_eek:
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 02, 2015, 11:36:21 AM
Well, how about this?
Code: [Select]
include \myinc\inc64.inc
.data
    o1 OWORD 12335678aacdff0112344678abbeef02h
.code

start:
mov r15, 1
@@:
    mov rax, qword ptr o1[r15*8]
    prnt "%llx ", rax
    dec r15
    jge @B
prnt "\n"
ret
end start

Code: [Select]
prntxmm.asm: 16 lines, 2 passes, 0 ms, 0 warnings, 0 errors
12335678aacdff01 12344678abbeef02

Forgot u wanted it on the stack (I'm in a hurry):
Code: [Select]
include \myinc\inc64.inc
.data
    o1 OWORD 12335678aacdff0112344678abbeef02h
.code

start:
    movups xmm0, o1
    movups [rsp-16], xmm0   ; below rsp: Win64 has no red zone, so read it back before any call
    mov r15, [rsp-8]        ; high qword
    mov r14, [rsp-16]       ; low qword
    prnt "%llx ", QWORD PTR r14
    prnt "%llx \n", QWORD PTR r15
ret
end start
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 03, 2015, 12:04:33 AM
@rrr:
      Thanks for posting your examples. Your "2-iterations" and "stack" routines work nicely here, but I especially appreciate your "second suggestion" to do it in one line  :icon14:
Yes, my macro can do that too:
Code: [Select]
FCALLTEST printf, offset frm1, qword ptr o1[12], qword ptr o1[8], qword ptr o1[4], qword ptr o1
Moreover, it can take the values from the stack:
Code: [Select]
FCALLTEST printf, offset frm1, qword ptr [rsp+12], qword ptr [rsp+8], qword ptr [rsp+4], qword ptr [rsp]
I'm going to write a PRINT macro to avoid such long lines of code:
Code: [Select]
PRINT XMM0
looks better



Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 03, 2015, 05:44:05 AM
Hi vertograd,

Your post took me aback for a moment. printf can't take a value from the stack that way, because the call sequence adjusts rsp for its own purposes (alignment, stack arguments, the return address). By the time printf runs, [rsp+4] no longer points where you meant. Then I noticed "FCALLTEST". I reckon you're doing the sensible thing: getting the value off the stack and putting it in rdx, r8, r9, etc. before calling printf. I'll probably do that in my "prnt" routine too, but I was thinking about substituting some other register for rsp, such as rbp. That works when done "by hand", but getting it right in all cases looks difficult and/or a waste of time. There are other ways too, but they all seem worse than the first one.

Some would say we're wasting effort implementing 2 similar print macros, but that's wrong. It's like saying that if we both go for a bicycle ride, one of us is duplicating effort and should stay home and watch TV! Like bicycling, coding is enjoyable and good exercise. Of course too often it's like a bicycle ride where a tire goes flat, the chain breaks, u get caught in a tornado, hit over the head by bandits, then arrested on the way home for littering the path with blood and broken bicycle parts ...  ;)
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 03, 2015, 06:38:54 AM
BLANK
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 03, 2015, 08:28:42 AM
First draft of the PRINT macro is done. No FPU register support yet.
Code: [Select]
movups XMM0, o1
  PRINT  RSP
  PRINT  XMM0
  PRINT  RSP

OUTPUT:
Quote
RSP    = a6254f80
XMM0 = 12345678 abcdef00 12345678 abcdef00
RSP    = a6254f80

[EDIT]: BINGO! XMM->print(f) is done! I knew it was possible:
Code: [Select]
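.data
      frm   db "%f",10,0          ; "%f\n"
      r  REAL8 123.456789

.code
      _start:
               movsd    XMM0, r          ; 1st floating-point arg -> xmm0 (System V AMD64 ABI)
               mov rdi, offset frm       ; 1st integer/pointer arg -> rdi
               mov rax, 1                ; al = number of vector registers used (variadic call)
               call printf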
OUTPUT:
Quote
123.456789

Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 04, 2015, 10:26:43 AM
@vertograd,

We got another foot of snow; been digging out all morning.

I have been considering what you did with "PRINT" for, e.g., [rsp+4], and I might have an improvement that lets nvk do something similar without slowing it down much (just 2 extra instructions!). It would have been done (or discarded as unworkable) by now but for the snow.

However, I'm confused by your last edit, BINGO XMM->print(f). The code doesn't work for me; nothing is output. Why put the format string in rdi (normally it goes in rcx), and what's rax got to do with it (to paraphrase Tina Turner)? Of course you have a very good reason: "it works", but not for me (yet, anyway). R u sure you posted it correctly? Normally args go in rcx, rdx, r8, r9. I'm missing something, and it just occurred to me: is this Linux code?
Title: Re: Yet Another Invoke Macro ...
Post by: jj2007 on February 04, 2015, 10:48:27 AM
too often it's like a bicycle ride where a tire goes flat, the chain breaks, u get caught in a tornado, hit over the head by bandits, then arrested on the way home for littering the path with blood and broken bicycle parts ...  ;)

Sounds like ordinary Windows coding :lol:
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 04, 2015, 12:05:39 PM
OK ... I smell a rat!  :eusa_naughty:

And now, back to our regularly scheduled programming...  :icon_cool:
Title: Re: Yet Another Invoke Macro ...
Post by: GoneFishing on February 04, 2015, 08:36:04 PM
 BLANK
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on February 16, 2015, 02:01:58 PM
Hi vertograd,

Been working on other things (math algos using SSE/AVX) and ran into the Linux printf function, which reminded me of this post. As everyone else already knew, your BINGO example is of course legit Linux; I thought u were just pulling my leg!

More snow here yesterday and more on the way. Cat is going stir crazy, wants to be outside ... she wishes there were a rat in the house, give her something to do!

One of these days I'll get back to these printing issues (I need to upgrade nvk to do it right), but math routines are (to me) much more important. Should have some interesting results to post soon; it's amazing how much power is hidden in modern CPUs. Seems no one is really tapping their full potential.

c u later, good luck with your coding, jogging and beer!
Title: Re: Yet Another Invoke Macro ...
Post by: meneghini on May 03, 2015, 09:40:47 PM
Hello Buddy!
I'm just here to say thanks; it worked perfectly! I'm a beginner and I had to port some x86 code to x64, and since I'm using masm64, this helped a lot. Thanks.
Title: Re: Yet Another Invoke Macro ...
Post by: rrr314159 on May 05, 2015, 10:54:57 AM
Hi meneghini,

Glad it helped. The only problem I've noticed is that it can make the .exe quite a bit larger, about 20%. Often u can replace it by just putting the arguments in rcx, rdx, etc. and then calling the function. What I do: always use it when developing, then in the final product I might try replacing it like that. More than half the time the stack is already aligned, etc., and nvk isn't necessary. Replacing it particularly helps inside a macro; if the macro gets instantiated 20 times, that saves quite a few bytes. But chances are you don't care; masm produces such small .exe's anyway that it's not a big deal.

thanks for the thanks, I appreciate it!