I finally got my favorite project ("MathMovie") converted to 64-bit. It took so long because a lot of MM had to be rewritten with "advanced" (to me) techniques I learned here. Originally used 8088-era techniques - it worked but was a horrible mess (still is, actually). It was about 17,000 lines, 30 of them macros; now it's about 13,000, 600 of them in macros. One of the most useful is my invoke macro (nvk).
You're wondering: JWasm has invoke, so who wants yet another invoke macro? For one thing, I need to assemble under ML64 also (various good reasons). More important, JWasm invoke doesn't do what I want (see below).
Many issues are involved in 64-bit conversion, the major one being stack alignment / calling convention (handled by nvk). This huge issue has consumed many person-years of expert coders across the globe - so why am I able to get past it so easily? Not because I'm smarter; au contraire, I'm dumber than most of them. It's because the issue has only to do with Windows interfacing. The hardware doesn't force you to align the stack, or pass parameters in rcx, rdx, r8, r9; nor does it care about all the other arcana of Windows calling conventions: reals in xmm's (unless vararg etc), prologues, epilogues, stack frame pointers, SEH etc etc. If you simply want to get 32-bit code working in 64-bits, you can (almost) ignore all that stuff. Of course you lose a lot: no codeview, no symbolic debugging, can't create Windows-called routines, etc.
I still have to interface with many Windows functions; nvk takes care of that. It's tested thoroughly with 100 functions, the ones I need. Below is a partial list (leaves out minor ones like strlen, etc). Perhaps the trickiest involve threading, but the most demanding was good old MessageBox (probably because it's the only one that pops up its own window). There are some Windows functions nvk won't support, but I don't happen to know what they are.
Some major Windows functions tested with the nvk invoke macro, in no particular order:
GetModuleHandleA proto :LPSTR
GetCommandLineA proto
ExitProcess proto :DWORD
LoadIconA proto :HINSTANCE, :LPSTR
LoadCursorA proto :HINSTANCE, :LPSTR
RegisterClassExA proto :ptr WNDCLASSEXA
CreateWindowExA proto :DWORD, :LPSTR, :LPSTR, :DWORD, :SDWORD, :SDWORD, :SDWORD, :SDWORD, :HWND, :HMENU, :HINSTANCE, :LPVOID
ShowWindow proto :HWND, :SDWORD
UpdateWindow proto :HWND
GetMessageA proto :ptr MSG, :HWND, :SDWORD, :SDWORD
TranslateMessage proto :ptr MSG
DispatchMessageA proto :ptr MSG
PostQuitMessage proto :SDWORD
DefWindowProcA proto :HWND, :UINT, :WPARAM, :LPARAM
MoveWindow proto :HWND, :DWORD, :DWORD, :DWORD, :DWORD, :BOOL
SetWindowTextA proto :HWND, :LPSTR
InvalidateRect proto :HWND, :ptr RECT, :BOOL
BeginPaint proto :HWND, :LPPAINTSTRUCT
GetClientRect proto :HWND, :LPRECT
DrawTextA proto :HDC, :LPSTR, :DWORD, :LPRECT, :DWORD
EndPaint proto :HWND, :ptr PAINTSTRUCT
PostMessageA proto :HWND, :DWORD, :WPARAM, :LPARAM
CreateFileA proto :LPSTR, :DWORD, :DWORD, :LPSECURITY_ATTRIBUTES, :DWORD, :DWORD, :HANDLE
WriteFile proto :HANDLE, :LPCVOID, :DWORD, :LPDWORD, :LPOVERLAPPED
ReadFile proto :HANDLE, :LPVOID, :DWORD, :LPDWORD, :LPOVERLAPPED
CloseHandle proto :HANDLE
CreateThread proto :LPSECURITY_ATTRIBUTES, :SIZE_T, :LPTHREAD_START_ROUTINE, :LPVOID, :DWORD, :LPDWORD
ExitThread proto :DWORD
BitBlt proto ;:HDC, :DWORD, :DWORD, :DWORD, :DWORD, :HDC, :DWORD, :DWORD, :DWORD
StretchBlt proto :HDC, :DWORD, :DWORD, :DWORD, :DWORD, :HDC, :DWORD, :DWORD, :DWORD, :DWORD, :DWORD
printf proto :ptr SBYTE, :VARARG
sprintf proto :ptr SBYTE, :ptr SBYTE, :VARARG
sscanf proto :ptr SBYTE, :ptr SBYTE, :VARARG
CreateCompatibleDC proto :HDC
GetDC proto :HWND
CreateDIBSection proto :HDC, :ptr BITMAPINFO, :DWORD, :ptr ptr, :HANDLE, :DWORD
SelectObject proto :HDC, :HGDIOBJ
DeleteObject proto :HGDIOBJ
PeekMessageA proto :LPMSG, :HWND, :DWORD, :DWORD, :DWORD
DeleteDC proto :HDC
GetCurrentProcess proto ; all these in winbase.inc
SetProcessAffinityMask proto :HANDLE, :DWORD_PTR
GetPriorityClass proto :HANDLE
SetPriorityClass proto :HANDLE, :DWORD
GetCurrentThread proto
GetThreadPriority proto :HANDLE
SetThreadPriority proto :HANDLE, :DWORD
SetThreadAffinityMask proto :HANDLE, :DWORD_PTR
Sleep proto :DWORD ; this one works ; in winbase
Beep proto :DWORD, :DWORD ; winbase
QueryPerformanceCounter proto :ptr LARGE_INTEGER
QueryPerformanceFrequency proto :ptr LARGE_INTEGER
GetStdHandle proto :DWORD
WriteConsoleA proto :HANDLE, :ptr , :DWORD, :LPDWORD, :LPVOID
MessageBoxA proto :HWND, :LPSTR, :LPSTR, :DWORD
DestroyWindow proto :HWND
IsZoomed proto :HWND
LoadMenuA proto :HINSTANCE, :LPSTR
GetMenu proto :HWND
SetMenu proto :HWND, :HMENU
GetSubMenu proto :HMENU, :DWORD
CheckMenuItem proto :HMENU, :DWORD, :DWORD
CheckMenuRadioItem proto :HMENU, :DWORD, :DWORD, :DWORD, :DWORD
SetMenuItemInfoA proto :HMENU, :DWORD, :BOOL, :LPCMENUITEMINFOA
TrackPopupMenu proto :HMENU, :DWORD, :DWORD, :DWORD, :DWORD, :HWND, :ptr RECT
GetClientRect proto :HWND, :LPRECT
GetWindowRect proto :HWND, :LPRECT
SetCursor proto :HCURSOR
GetWindowLongA proto :HWND, :DWORD
SetWindowLongA proto :HWND, :DWORD, :SDWORD
SendMessageA proto :HWND, :DWORD, :WPARAM, :LPARAM
SetFocus proto :HWND
SetProcessAffinityMask proto :HANDLE, :DWORD_PTR
GetCurrentProcessId proto
GetProcessAffinityMask proto :HANDLE, :ptr DWORD_PTR, :ptr DWORD_PTR
GetCurrentThread proto
GetCurrentThreadId proto
SetThreadAffinityMask proto :HANDLE, :DWORD_PTR
lstrcatA proto :LPSTR, :LPSTR ; and other minor ones like this
The other reason nvk can be n00b-written is, it's not very efficient: follows the KISS principle (keep it simple, sailor). My project has two major loops running at approximately 30 iterations /second and 60 million iterations / second. Windows is never invoked from the inner loop, since there are only a few hundred instruction cycles to play with (per core, of course). The 30-ips loop uses about 10 million cycles per iteration, and averages less than 1000 Windows invocations: so speed simply isn't an issue.
nvk "features" include:
- Forget about stack alignment. No "and rsp, -16"; you can push/pop across invocations (64 or 16 bit). For testing, I even adjust the stack randomly by odd numbers throughout my code (sub rsp, 4321, followed later, after a dozen invocations, by add rsp, 4321). JWasm can't do that.
- nvk never complains "register value overwritten" like JWasm. (It only takes a few extra instructions to avoid this.) In fact, it has no error messages at all; when fed bad args it just blows up all over the place.
- it handles "aDdR", and real4, real10, bytes/words/dwords, structures, all odd-sized arguments correctly. Some of these JWasm invoke doesn't do right (altho undoubtedly there are some types I haven't run into that nvk doesn't handle, but JWasm does).
This post is written for people, like me, who just want to get their 32-bit code up and running, and will worry about SEH (etc etc) later; not for experts, who of course already know this stuff. To them: please let me know what I'm doing wrong, if you've got nothing better to do at the moment. My macro technique is primitive, any tips to clean it up wld be welcome. Nvk is full of unknown (to me) bugs, so if you notice one pls inform.
There are comments in the code; here's a brief description. When called with a function and arguments, nvk first aligns rsp to 16 bytes and stores the original rsp on the stack for later recovery. All registers, including rbp, are preserved. Then the args are counted and rounded up to an even number (with a minimum of four, for the shadow space), which (times eight) is sub'ed from rsp. The arguments are put on the stack in reverse order, including the first four (always spilled to shadow space). Then the first (up to) four are read off the stack into rcx .. r9, which can of course appear as arguments. Finally the function is called; upon return, the stack is restored.
The zip includes the JWasm sample Win64_2.asm, with minimal mods for ML64 compatibility, plus makeJ.bat and makeM.bat. Should be self-explanatory.
;***********************
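The rsp bookkeeping described above can be modelled in a few lines. This is only a sketch in Python of the arithmetic, not part of nvk; the function name `nvk_stack_trace` is made up for illustration, and it tracks nothing but the value of rsp (no registers, no memory contents).

```python
def nvk_stack_trace(rsp_in, nargs):
    """Model the rsp arithmetic of nvk / nvk_noalign (hypothetical helper)."""
    rsp = rsp_in - 8                 # push rbp
    saved_rsp = rsp + 8              # lea rbp, [rsp][8]  -> the entering rsp
    rsp -= 8                         # sub rsp, 8   (room for the saved rsp)
    rsp &= ~0xF                      # and rsp, -10h (align to 16 bytes)
    slot = rsp                       # mov [rsp], rbp stores saved_rsp here
    slots = max(nargs, 4)            # never less than the 4 shadow slots
    slots = ((slots + 1) // 2) * 2   # round an odd count up to even
    rsp -= slots * 8                 # sub rsp, stackadjust * 8
    rsp_at_call = rsp                # rsp when "call thefun" executes
    rsp += slots * 8                 # add rsp, stackadjust * 8
    assert rsp == slot               # back at the slot holding saved_rsp
    rsp = saved_rsp                  # pop rsp
    return rsp_at_call, rsp

# Try aligned, off-by-8 and odd entry values with 0..12 arguments.
for rsp_in in (0x6FF1B0, 0x6FF1B8, 0x6FF1B8 - 4321):
    for nargs in range(13):
        at_call, rsp_out = nvk_stack_trace(rsp_in, nargs)
        assert at_call % 16 == 0     # aligned at the call, whatever came in
        assert rsp_out == rsp_in     # and rsp is restored exactly
```

The odd entry value mirrors the "sub rsp, 4321" test mentioned earlier: in this model the alignment at the call does not depend on how rsp looked on the way in.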
; INVOKE MACROS for ML64 and JWasm (if you want to use it there), by rrr314159 2015/1/29
;***********************
IFNDEF __JWASM__ ;; actually I prefer nvk in JWasm as well
invoke equ nvk
ENDIF
;***********************
;***********************
nvk MACRO thefun:REQ, args:VARARG ; "invoke"
;;***********************
;; ALIGN stack to 16 bytes, call nvk_noalign, then restore rsp
push rbp
lea rbp, [rsp][8] ;; save entering rsp first into rbp then onto stack
sub rsp, 8 ;; make room for entering rsp that was saved in rbp
and rsp, -10h ;; align 16
mov [rsp], rbp ;; put entering rsp onto stack
mov rbp, [rbp-8] ;; restore rbp
nvk_noalign thefun, args
pop rsp ;; right back where we started from
ENDM
;***********************
nvk_noalign MACRO thefun:REQ, args:VARARG ;; "invoke" without aligning
;;***********************
LOCAL txt, cnt, stackadjust, cnttopass
;; Prepare stack for arguments, load stack, call function, restore rsp
;; called from nvk; also can call directly if u know stack is aligned
;; Count the arguments, prepare reversed arg list
cnt = 0
IFNB <args> ;; if args blank skip most of the work
txt equ <>
% FOR arg, <args>
txt CATSTR <arg> , <,>, txt
cnt = cnt + 1
ENDM
txt SUBSTR txt, 1 ;; force expression evaluation
txt SUBSTR txt, 1, @SizeStr( %txt ) - 1
;; Adjust stack for args, rounded up to a multiple of 16 bytes (necessary for some funs)
IF cnt GT 4
stackadjust = cnt
ELSE
stackadjust = 4
ENDIF
stackadjust = ((stackadjust+1)/2)*2 ; round up to an even count (multiple of 16 bytes)
sub rsp, stackadjust * 8
;; Load stack, saving rdx in home space to be restored after each arg
mov [rsp], rdx ;; could also spill rcx,r8,r9 if needed, or whatever
cnttopass = cnt ;; pass by ref, so don't send cnt (gets clobbered)
nvk_loadstack txt, cnttopass
;; Load four regs from prepared args loaded on stack
mov rcx, [rsp]
mov rdx, [rsp+8]
mov r8, [rsp+10h]
mov r9, [rsp+18h]
;; Adjust by 20h, no other work needed, if arglist was blank
ELSE
sub rsp, 20h
ENDIF
;; Call the function (finally), afterwards restore stack
call thefun
IF cnt GT 0
add rsp, stackadjust * 8
ELSE
add rsp, 20h
ENDIF
ENDM
;***********************
nvk_loadstack MACRO args, posonstack
;;***********************
local leacmd
;; Load args on stack in reverse, restore rdx after each (it may be in the arg list)
% FOR arg, <args>
posonstack = posonstack - 1
mov rdx, [rsp] ;; rdx gets orig value each time
;; Check for "aDdR", if present prepare lea instruction and execute it
leacmd equ <@afteraddr(arg)> ;; if addr, returns after-addr text
IFNB leacmd
leacmd CATSTR <lea rdx, >, leacmd
&leacmd ;; execute lea instruction
mov [rsp + posonstack*8], rdx
ELSE
;; Convert types as necessary; if real/integer not 8 bytes, convert to real8/qword
IF TYPE(arg) EQ REAL4 OR TYPE(arg) EQ REAL10
fld arg
fstp REAL8 PTR [rsp + posonstack*8]
ELSE
IF TYPE(arg) EQ 1 OR TYPE(arg) EQ 2
movsx edx, arg
ELSEIF TYPE(arg) EQ 4
mov edx, arg
ELSE
mov rdx, arg
ENDIF
mov [rsp + posonstack*8], rdx
ENDIF
ENDIF
ENDM
ENDM
;***********************
@afteraddr MACRO thetxt:=<>
;;***********************
LOCAL char, answer, iswhite, numchars
;; If argument starts with addr return rest of string, else blank
answer equ <>
numchars = 0
FORC char, <&thetxt>
IF numchars EQ 0
iswhite INSTR 1,< >,<&char> ;; trim leading spaces or tabs
IFE iswhite
answer CATSTR answer,<&char>
numchars = 1
ENDIF
ELSEIF numchars LT 4
answer CATSTR answer,<&char>
numchars = numchars + 1
ELSEIF numchars EQ 4
;; "answer" now holds first 4 chars after whitespace, is it "aDdR"?
IFIDNI <addr>, answer
answer equ <> ;; says addr; now get the latter part of arg
numchars = 5 ;; anything > 4 will do; no longer counting
ELSE
EXITM <> ;; not addr, return blank
ENDIF
ELSE ;; numchars > 4 means get rest of string
answer CATSTR answer,<&char>
ENDIF
ENDM
;; If, after trimming, arg was too short, it couldn't be addr
IF numchars LT 5
answer equ <>
ENDIF
EXITM answer
ENDM
;***********************
Hi rrr,
Interesting set of macros! Currently I'm testing my FCALL macro (i.e. 'FASTCALL' for JWASM on Linux). Thinking about the best way to align the stack to 16 bytes, I've looked through your code and have one question:
from your "Adjust stack for args" routine:
stackadjust = ((stackadjust+1)/2)*2 ; round up to 16
is ((stackadjust+1)/2)*2 expression equal to (stackadjust+1) ?
Do we need to perform ODD-EVEN check on number of stack args here ? What if stackadjust=6 ,or say 8?
EDIT: I got it now - those expressions are not equal for the assembler (the division is an integer division)
No doubt u figured it out, Vertograd: this statement increases an odd number to the next even number. E.g. if you have 7 arguments this puts it up to 8. That way, when it's multiplied by 8 bytes it's a multiple of 16, so the stack remains aligned to 16. There are a couple important points to know. This is necessary for Windows functions, NOT for the hardware (the Intel chip). So you need to consider how Linux does it - it may not be necessary there. The same goes for the other things I'm doing, they're not required by the hardware so may not be necessary in Linux - dunno, haven't studied it. If you want I could look at it. The other point is: these adjustments are NOT required for all Windows functions! Some are perfectly happy with an un-adjusted stack. So if you (or anyone) tests whether it's necessary you may decide I'm wrong if you only look at a few functions. If anyone's interested I could discuss which functions are particularly picky - some are picky about one adjustment but not others, so it's complicated. Good luck with Linux, it's got to make more sense than Windows!
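To make the odd/even question concrete, here is a tiny Python sketch of the same expression. It assumes only that the assembler's `/` in a constant expression is integer division (which is what the EDIT above concluded); `round_up_even` is an illustrative name, not something from the macro.

```python
def round_up_even(n):
    # Same shape as the macro's ((stackadjust+1)/2)*2, with integer division
    return ((n + 1) // 2) * 2

for n in range(4, 10):
    slots = round_up_even(n)
    # columns: arg count, naive n+1, rounded slot count, bytes reserved
    print(n, n + 1, slots, slots * 8)
    assert slots % 2 == 0 and slots >= n
    assert (slots * 8) % 16 == 0     # the reservation keeps 16-byte alignment
```

So 6 stays 6 and 8 stays 8; only odd counts are bumped, which is why the expression is not the same as stackadjust+1.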
Simple function MessageBox called when unaligned :(
(13d4.e58): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\system32\LPK.dll -
LPK!LpkDrawTextEx+0x315:
000007fe`ff611775 440f29842450010000 movaps xmmword ptr [rsp+150h],xmm8 ss:00000000`0006f708=000007fefef5a8e40000000001c81320
@sinsi: Interesting that the stack is not aligned to 16 bytes in the MessageBox sample in the Introduction to x64 Assembly (https://software.intel.com/en-us/articles/introduction-to-x64-assembly?page=1):
; Sample x64 Assembly Program
; Chris Lomont 2009 www.lomont.org
extrn ExitProcess: PROC ; external functions in system libraries
extrn MessageBoxA: PROC
.data
caption db '64-bit hello!', 0
message db 'Hello World!', 0
.code
Start PROC
sub rsp,28h ; shadow space, aligns stack
mov rcx, 0 ; hWnd = HWND_DESKTOP
lea rdx, message ; LPCSTR lpText
lea r8, caption ; LPCSTR lpCaption
mov r9d, 0 ; uType = MB_OK
call MessageBoxA ; call MessageBox API function
mov ecx, eax ; uExitCode = MessageBox(...)
call ExitProcess
Start ENDP
End
It worked on Windows 8 ... not sure what version
@rrr:
After some reading I clearly understand that I'll have to re-write my FCALL macro from scratch.
As of now it doesn't handle floating point arguments and doesn't align the stack to 16 bytes, which seems to be necessary on Linux too (at least in some cases).
I'm thinking about creating thin Platform Abstraction Layer for JWASM - set of macros to make my self-educational programming process more comfortable on both Windows and Linux computers ... if only my laziness will allow me to do it :biggrin:
sub rsp,28h ; shadow space, aligns stack
The stack is aligned: on entry it is always off by 8, and the sub rsp,28h realigns it to 16 while leaving 32 bytes for the spill area.
Try it with that line commented out.
Sinsi, of course you're right the stack (on the main or "start" program entry) is unaligned by 8 but there are potential gotchas for beginners, especially with Windows programs (i.e. subsystem:windows in the linker). You can see confused posters, going back for years, fall for these.
For one, on entry into the Windows callback function (usually called WndProc) the stack is aligned to 16 (I'm talking about 64-bit of course). I haven't actually read this anywhere but that's what I've found. WndProc is the main "entry" into a typical window program, so it's easy to get confused. Similarly when you call "WinMain" in a typical program it's aligned to 16, because it's called right after program entry, adding 8 to the initially unaligned stack. It's sometimes considered the "C" entry point, while the real "start" is the "Masm" entry point; which can also be called WinMainCRTStartup ... So it's easy for an assembler beginner, or a C programmer no matter how advanced, to be unsure what "on entry" really means. Typically both these other Windows "entry points" are aligned to 16, not unaligned by 8. The names "start" and "main" are used promiscuously, and they're affected by the linker settings SUBSYSTEM and ENTRY.
MessageBox is very picky; most other Windows calls don't care about the alignment, at least if it's 8 off (printf family generally will work when it's unaligned by other numbers, in fact). So you can go along happily thinking you've got it figured out; even MessageBox will work half the time (on average); but sooner or later it will explode.
Another gotcha: people put "and rsp, -10h" at the top of their program, thinking they're covered; but now they've changed the alignment. If you're unaligned by 8 you MUST use 28h with messagebox, but if you're aligned you MUST use 20h with MB (and some other picky functions).
Then there's an odd number of args over 4, which normally should be rounded up to provide a 16-byte offset; BUT only if the function was called aligned. Many functions work unaligned, but then if you make sure their arg offsets are an even multiple of 16, and they call a more picky function - boom.
JWasm invoke always adds 8 on a call, and so gives the right result when you follow the rule you're referring to: always align b4 a call, and always expect unaligned by 8 on entry. But you can still fall for the odd arg list gotcha; the rule doesn't work with WndProc; and if you're not very careful other things can go wrong.
I'm forgetting a couple other interesting gotchas, shld refer to my notes, but ... just use nvk and you're covered!
It's really amusing to read old postings and see people going around and around on these issues; they think they've got it nailed, then suddenly MB (or others) blow up. I might be in the same boat as those poor guys, but don't know it yet!
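The 20h-versus-28h rule and the odd-argument gotcha from this post can be written as one small calculation. This is a hedged sketch in Python, not anything from nvk or JWasm; `frame_bytes` is a hypothetical helper, and it assumes rsp is at least 8-byte aligned before the subtraction.

```python
def frame_bytes(rsp_mod16, stack_args=0):
    """Bytes to subtract from rsp so that rsp % 16 == 0 at the call.

    rsp_mod16  -- rsp % 16 before the subtraction: 8 right after function
                  entry, 0 after something like "and rsp, -10h"
    stack_args -- number of arguments beyond the fourth
    """
    need = 32 + 8 * stack_args       # shadow space plus stack-passed args
    pad = (rsp_mod16 - need) % 16    # filler to land on a 16-byte boundary
    return need + pad

print(hex(frame_bytes(8)))      # off by 8 on entry, up to 4 args
print(hex(frame_bytes(0)))      # already aligned, up to 4 args
print(hex(frame_bytes(8, 1)))   # off by 8, five args
print(hex(frame_bytes(0, 1)))   # aligned, five args: odd count needs padding
```

The first two calls give 28h and 20h, matching the MessageBox rule above; the last two show that whether an odd argument count needs an extra slot depends on the alignment you started from.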
Entry point is where Windows jumps to after loading your program, the very first instruction of your code. Always aligned 8, not 16.
The window procedure, called whatever you want, is called by Windows during message processing. In all of my programs, is also always aligned 8, not 16.
The Windows ABI pretty much demands that on entry to a fastcall function the stack will be aligned 8, never 16, due to the call return address.
To determine that on entry into the Windows callback function (typically called WndProc - whatever) rsp is aligned to 16, I used this snippet. Hopefully it's enough; I can provide a complete sample prog if desired. For instance I just ran it and got 6ff1b0; always end with 0h. Probably doing something stupid - experience shows that happens at least a dozen times a day - but, what is it?
; »»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
.data
saveinitrsp dq 0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
mov r11, rsp
cmp saveinitrsp, 0 ; can do this only once or else no window
jg @F
mov saveinitrsp, r11
invoke printf, cfm$("RSP coming in to WndProc was %x\n"), saveinitrsp
@@:
cmp edx, WM_COMMAND
jne @F
; etc, etc ...
BTW, in my above post the word "you" doesn't mean "you", you understand, rather it means "one". It only means "you" one time, the first use. "One", OTOH, always means "1". It sounds a bit like I'm giving you advice, but no, that's "one" I'm advising. I hope that's clear.
Anyway, regardless of rsp's alignment status on WndProc entry or anywhere else, if one uses nvk, one doesn't have to worry about it!
That's odd.
wndproc: cmp edx,WM_CREATE
rax=000000000013f760 rbx=0000000000000000 rcx=00000000000b02ca
rdx=0000000000000024 rsi=0000000000000001 rdi=0000000000000000
rip=000000013fc910a8 rsp=000000000013f6f8 rbp=0000000000000000
r8=0000000000000000 r9=000000000013f8f0 r10=00000000000b02ca
r11=0000000000000000 r12=0000000000000000 r13=0000000000000024
r14=0000000000000000 r15=00000000000b02ca
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
image00000001_3fc90000+0x10a8:
00000001`3fc910a8 83fa01 cmp edx,1
Ok - if WndProc is just a label, or you use proc with no arguments, it's aligned to 8. In other words you're right, on entry it's 8. But when you use proc with the standard four arguments JWasm, or ML64, builds a frame, in such a way that rsp gets bumped by an odd number of 8's, thus aligning it to 0. So I was wrong but wouldn't call it a stupid mistake, particularly for a beginner, to make. Thanks! Yet another gotcha ...
Hi rrr,
Can your nvk macro work with arguments passed in XMM registers?
For the second day I'm fighting to death with printf function trying to convince her to print out the value of XMM0 register but all I get is mysterious 7FFFFFE2 :icon_confused:
Quote from: vertograd on February 01, 2015, 12:22:33 AM
Can your nvk macro work with arguments passed in XMM registers?
deb (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1019) prints xmm args just fine, as decimal, hex or binary, but that is 32-bit code.
It is not difficult to implement, the only tricky point is that opattr returns "it's a register", as if it was eax. Below a testbed showing workarounds. Note that ML 6.14 and 6.15 use the string XMM(0), therefore the somewhat clumsy version with ifidni (I know earlier ML versions are not relevant for 64-bit code).
include \masm32\include\masm32rt.inc
.686p
.xmm
GetType MACRO arg
LOCAL tmp$, opa, is
opa = (opattr arg) AND 127
tmp$ CATSTR <Myarg=>, <arg>, < with opattr=>, %opa
% echo tmp$
tmp$ CATSTR <arg>, < > ; two blanks to make sure there are at least three chars
tmp$ SUBSTR tmp$, 1, 3
ifidni tmp$, <xmm>
echo ### xmm found ###
else
echo ### something else...
endif
is INSTR <arg>, <xmm>
if is eq 1
echo @@@ xmm found @@@
else
echo @@@ arg = something else...
endif
; all: Myarg=eax with opattr=48
; MLv10: Myarg=xmm0 with opattr=48
; JWasm: Myarg=xmm0 with opattr=48
; MLv615: Myarg=XMM(0) with opattr=48
ENDM
.code
x1 dd 123
start:
GetType x1 ; name is only 2 chars long
GetType eax
GetType xmm0
exit
.err ; don't build, just show the echos
end start
@jj2007,
You don't have to go to all that trouble, the type function distinguishes between xmm0 and rax: type(xmm0) = 10h. Old ML versions were broken - type returned 8 for xmm0, like rax, but it's been fixed since ver 8. BTW ML64 gives type(ymm0) as 20h, but Jwasm incorrectly says it's 8.
@vertograd,
the good news is, you don't need xmm0 for printf (or sprintf or any of that family). Instead, pass reals in the GPR's. The bad news is you just wasted x hours trying to get printf to read xmm0, which (AFAIK) it doesn't do.
ps. I think I'll look into converting deb to 64 bits, I need (at least some of) that capability.
Thanks , Jochen . GetType macro runs on Linux too :t
@rrr:
Well, thank you for the information . Honestly I've already started to suspect that something is wrong with that function itself . Now I must look for another function that can take the arguments from XMM registers ...
You said that I 'wasted x hours' but I confess the few hours don't matter at all when I successfully wasted the best years of my life , that's about a half of my lifetime ...
BTW I have good news for you too : my version of JWASM reports the correct value for XMM registers - 10h. What version do you use?
Mine:
JWasm v2.11, Oct 20 2013, Masm-compatible assembler.
Well, it's not that there's anything wrong with printf. There's a lot of misinformation around - you can read that real arguments are always passed in xmm0..xmm3 (under the new Windows x64 calling conventions) but that's not the case. VARARG functions, like printf, use the GPR's instead, also some others - go to the source (MSDN) for the correct info.
Yes, JWasm does xmm registers correctly - but you're reading too fast! I said "Ymm" type is incorrect in JWasm 2.11.
FWIW actually I don't think time is wasted beating one's head against code - that's what it's all about - in coding, persistence is much more important than perspicuity :biggrin:
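For readers who want the register rules in one place, here is a rough Python model of where the first arguments go under the Windows x64 convention, including the vararg case discussed above. It reflects my reading of Microsoft's x64 calling-convention documentation (for varargs, a floating-point value in the first four positions goes in the integer register and is duplicated in the matching XMM register); `arg_locations` and its string output format are made up for illustration.

```python
INT_REGS = ["rcx", "rdx", "r8", "r9"]
FP_REGS = ["xmm0", "xmm1", "xmm2", "xmm3"]

def arg_locations(kinds, vararg=False):
    """kinds: list of 'int' or 'fp', one entry per argument, in order."""
    out = []
    for i, kind in enumerate(kinds):
        if i >= 4:
            # fifth and later args: on the stack, offset as seen at the call
            out.append(f"[rsp+{8 * i:#x}]")
        elif kind == "fp" and not vararg:
            out.append(FP_REGS[i])               # prototyped: XMM by position
        elif kind == "fp":
            out.append(INT_REGS[i] + "+" + FP_REGS[i])  # vararg: GPR and XMM
        else:
            out.append(INT_REGS[i])
    return out

# printf(fmt, 123.456): the double must be visible in rdx
print(arg_locations(["int", "fp"], vararg=True))
# a prototyped function mixing ints and doubles
print(arg_locations(["int", "fp", "int", "fp", "int"]))
```

Note the registers are assigned by position, so a double in the second slot uses xmm1 (or rdx), never xmm0.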
Quote from: rrr314159 on February 01, 2015, 05:31:11 AM
...
Yes, JWasm does xmm registers correctly - but you're reading too fast! I said "Ymm" type is incorrect in JWasm 2.11.
...
oops. sorry ... where is my glasses :icon_eek:?
Quote from: rrr314159 on February 01, 2015, 04:06:41 AM
ps. I think I'll look into converting deb to 64 bits, I need (at least some of) that capability.
Lines 9533ff in \Masm32\MasmBasic\MasmBasic.inc - it wasn't meant for open source teamwork, though :bgrin:
PM me if you need details.
Hmmmm ... beginning to sound like work! I just want to get some of the printing routines, didn't realize it was part of such a large package. Probably calls other routines, that call other routines, that call ... Well, I'll probably just borrow some techniques, as I did with qword's so-called "Simple" Math. You guys have churned out a lot of lines!
@rrr:
check this thread (http://masm32.com/board/index.php?topic=1892.0) for some open source printing routines.
I wonder if storing the contents of XMM register in double QWORD at memory location and printing it out as 2 QWORDs sequentially is the only way to dump XMM to console.
Quote from: vertograd on February 02, 2015, 03:26:04 AM
I wonder if storing the contents of XMM register in double QWORD at memory location and printing it out as 2 QWORDs sequentially is the only way to dump XMM to console.
Depends on what you want to know:
include \masm32\MasmBasic\MasmBasic.inc
Init
sub esp, OWORD ; create a slot
fldpi
fld st
fstp REAL8 ptr [esp]
fstp REAL8 ptr [esp+8]
movups xmm0, [esp]
deb 4, "2*PI", f:xmm0, d:xmm0, x:xmm0, b:xmm0
add esp, OWORD
Exit
end start
Output:
2*PI
f:xmm0 3.141592653589793 <<< lower qword as REAL8 aka double
d:xmm0 4614256656552045848 <<< same as integer
x:xmm0 400921FB 54442D18 400921FB 54442D18 <<< full 128 bits as hex, as in Olly
b:xmm0 01010100010001000010110100011000 <<< 32 bits
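The four views in that output are all the same 64 bits (the REAL8 value of pi in the low qword) read in different ways. A quick cross-check in Python, using only the standard `struct` and `math` modules; this is just an illustration and has nothing to do with how deb itself is implemented.

```python
import math
import struct

# Reinterpret the 8 bytes of the double pi as an unsigned 64-bit integer
bits = struct.unpack("<Q", struct.pack("<d", math.pi))[0]

print(f"f: {math.pi!r}")                                   # as REAL8
print(f"d: {bits}")                                        # same bits as integer
print(f"x: {bits >> 32:08X} {bits & 0xFFFFFFFF:08X}")      # one qword as hex
print(f"b: {bits & 0xFFFFFFFF:032b}")                      # low 32 bits, binary
```

The hex line shows one qword; the deb output shows it twice because both halves of xmm0 were loaded with pi.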
Quote from: jj2007 on February 02, 2015, 05:52:19 AM
...
Depends on what you want to know:
...
I want to know how to print the content of XMM register in several ways.
At the moment I know this:
XMM ->double QWORD variable->GPR-> print(f)
Had no luck with this:
XMM->print(f)
and this:
XMM->stack->print(f)
I'm sure that another way of doing this is possible and it doesn't depend on anything (sort of
"Ding an sich")
Quote from: vertograd on February 02, 2015, 07:03:24 AM
XMM->stack->print(f)
At least this one is simple:
include \masm32\include\masm32rt.inc
.686p
.xmm
.code
o1 OWORD 12345678abcdef0112345678abcdef02h
start:
movups xmm0, o1
pshufd xmm0, xmm0, 00011011b
sub esp, OWORD
movups [esp], xmm0
REPEAT 4
pop eax
print hex$(eax), " "
ENDM
exit
end start
Output: 12345678 ABCDEF01 12345678 ABCDEF02
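Why the pshufd is there: the four pops read dwords from the lowest address upward, so without the shuffle the value would print low dword first. A small Python model of the shuffle (a sketch of the instruction's documented selector behaviour; the helper name `pshufd` and the list representation are just for illustration):

```python
def pshufd(src, imm8):
    """src: four dwords, index 0 = lowest address. Each 2-bit field of
    imm8 selects which source dword lands in that destination position."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

value = 0x12345678abcdef0112345678abcdef02
# little-endian memory order: lowest dword first
dwords = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(4)]
shuffled = pshufd(dwords, 0b00011011)    # reverses the dword order

# the four "pop eax" read from the lowest address upward
print(" ".join(f"{d:08X}" for d in shuffled))
```

With the selector 00011011b the order is fully reversed, so the pops deliver the dwords from most significant to least significant.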
Thanks Jochen :t
It works on Linux in the following modification:
INCLUDE fc.asm ; FCALLTEST macros set
.data
frm db "%x",0
o1 OWORD 12345678abcdef0012345678abcdef00h
.code
_start:
movups xmm0, o1
sub rsp, OWORD
movups [rsp], xmm0
mov r15,3
.REPEAT
FCALLTEST printf,offset frm,[rsp+4*r15]
dec r15
.UNTIL r15==-1
FCALLTEST exit,0
end _start
OUTPUT:
12345678abcdef0012345678abcdef00
XMM->stack->print(f) variant is done!
- you can probably use .UNTIL Sign? instead of .UNTIL r15==-1
- don't forget add rsp, OWORD (my version does 4 pops, so no need for that)
Yes,you're right , UNTIL SIGN? is much better and stack is restored:
INCLUDE fc.asm ; FCALLTEST macros set
.data
frm db "%x",0
o1 OWORD 12345678abcdef0012345678abcdef00h
.code
_start:
movups xmm0, o1
sub rsp, OWORD
movups [rsp], xmm0
mov r15,3
.REPEAT
FCALLTEST printf,offset frm,[rsp+4*r15]
dec r15
.UNTIL SIGN?
add rsp, OWORD
FCALLTEST exit,0
end _start
One question that I'm asking myself:
Why can't I print the double QWORD value in 2 iterations instead of 4?
At the start the stack is aligned to 16 bytes. Maybe something in the FCALLTEST macro? :icon_confused:
Why not this?:
include \myinc\inc64.inc
.data
o1 OWORD 12335678aacdff0112344678abbeef02h
.code
start:
mov r15, 3
@@:
mov eax, DWORD PTR o1[r15*4]
prnt "%x ", eax
dec r15
jge @B
prnt "\n"
; or, if you wish, a second suggestion:
prnt "%x %x %x %x\n", DWORD PTR o1[12], DWORD PTR o1[8], DWORD PTR o1[4], DWORD PTR o1
ret
end start
Notes:
- I'm using my inc64.inc with my "prnt" macro, equiv to masm32rt.inc print or FC printf.
- Have to use "DWORD PTR" - but surely that's simpler than using both xmm0 AND rsp?
- requires /LARGEADDRESSAWARE:NO linker switch.
As I was about to post u asked to print in 2 iterations instead of 4. U know, you could use (with my prnt function, I'm sure FC can do similar) the 2nd suggestion, do it in one line.
[edit] woops, read above posts more carefully. I see you want to print out xmm0 directly, NOT o1 - that's just a value to init xmm0 with. Sorry - where is my glasses ?? :icon_eek:
Well, how about this?
include \myinc\inc64.inc
.data
o1 OWORD 12335678aacdff0112344678abbeef02h
.code
start:
mov r15, 1
@@:
mov rax, qword ptr o1[r15*8]
prnt "%llx ", rax
dec r15
jge @B
prnt "\n"
ret
end start
prntxmm.asm: 16 lines, 2 passes, 0 ms, 0 warnings, 0 errors
12335678aacdff01 12344678abbeef02
Forgot u wanted it on the stack (I'm in a hurry):
include \myinc\inc64.inc
.data
o1 OWORD 12335678aacdff0112344678abbeef02h
.code
start:
movups xmm0, o1
movups [rsp-16], xmm0
mov r15, [rsp-8]
mov r14, [rsp-16]
prnt "%llx ", QWORD PTR r14
prnt "%llx \n", QWORD PTR r15
ret
end start
@rrr:
Thanks for posting your examples. Your "2-iterations" and "stack" routines work nicely here but especially I appreciate your "second suggestion" to do it in one line :icon14:
Yes, my macro can do that too:
FCALLTEST printf, offset frm1, qword ptr o1[12], qword ptr o1[8], qword ptr o1[4], qword ptr o1
moreover it can take the values from the stack:
FCALLTEST printf, offset frm1, qword ptr [rsp+12], qword ptr [rsp+8], qword ptr [rsp+4], qword ptr [rsp]
I'm going to write PRINT macro to avoid such a long line of code:
PRINT XMM0
looks better
Hi vertograd,
Your post took me aback for a moment because printf won't take a value from the stack, since of course it uses rsp for its own purposes, but then I noticed "FCALLTEST". I reckon you're doing the sensible thing, getting the value off the stack and putting it in rdx, r8, r9, or etc, b4 calling printf. I'll probably do that also in my "prnt" routine, but I was thinking about substituting some other register for rsp, such as rbp. That works when done "by hand" but to get it right in all cases appears difficult and/or time-wasting. There are other ways also but all seem worse than the first one.
Some would say we're wasting effort, implementing 2 similar print macros; but that's wrong. It's like saying, if we go for a bicycle ride, one of us is duplicating effort, should stay at home and watch TV! Like bicycling coding is enjoyable and good exercise. Of course too often it's like a bicycle ride where a tire goes flat, the chain breaks, u get caught in a tornado, hit over the head by bandits, then arrested on the way home for littering the path with blood and broken bicycle parts ... ;)
First draft of PRINT macro is done. No FPU registers support yet.
movups XMM0, o1
PRINT RSP
PRINT XMM0
PRINT RSP
OUTPUT:
RSP = a6254f80
XMM0 = 12345678 abcdef00 12345678 abcdef00
RSP = a6254f80
[EDIT] :BINGO ! XMM->print(f) is done! I knew it's possible:
.data
frm db "%f",10,0
r REAL8 123.456789
.code
_start:
movsd XMM0, r
mov rdi, offset frm
mov rax, 1
call printf
OUTPUT:
123.456789
@vertograd,
we got another foot of snow, been digging out all morning.
I have been considering what you did with "PRINT" for, e.g, [rsp+4], and I might have an improvement for nvk to do similar, without slowing it down much (just 2 extra instructions!) It would have been done (or, discarded as unworkable) by now but for the snow.
However, I'm confused by your last edit, BINGO XMM=>print(f). The code doesn't work for me - nothing is output. Why put the format string in rdi (normally it goes in rcx) and what's rax got to do with it (to paraphrase Tina Turner)? Of course you have a very good reason: "it works" - but not for me (yet, anyway). R u sure you've posted it correctly? Normally args go in rcx, rdx, r8, r9. I'm missing something, and it just occurred to me - is this Linux code?
Quote from: rrr314159 on February 03, 2015, 05:44:05 AM
too often it's like a bicycle ride where a tire goes flat, the chain breaks, u get caught in a tornado, hit over the head by bandits, then arrested on the way home for littering the path with blood and broken bicycle parts ... ;)
Sounds like ordinary Windows coding :lol:
OK ... I smell a rat! :eusa_naughty:
And now, back to our regularly scheduled programming... :icon_cool:
Hi vertograd,
Been working on other things (math algo's using SSE/AVX) and ran into Linux printf function, was reminded of this post. As everyone else already knew, of course your BINGO example is legit Linux - thot u were just pulling my leg!
More snow here yesterday and more on the way. Cat is going stir crazy, wants to be outside ... she wishes there were a rat in the house, give her something to do!
One of these days I'll get back to these printing issues - need to upgrade nvk to do it right - but math routines are (to me) much more important. Should have some interesting results to post soon - amazing how much power is hidden in modern CPU's. Seems no-one is really tapping the full potential.
c u later, good luck with your coding, jogging and beer!
Hello Buddy!
I'm here just to say thanks, it worked perfectly! I'm just a beginner and I had to port some x86 code to an x64 version; as I'm using masm64, this helped a lot. Thanks.
Hi meneghini,
glad it helped. Only problem I've noticed, it can make the .exe quite a bit larger, 20%. Often u can replace it by just putting the arguments in rcx,rdx etc then calling the function. What I do, always use it when developing, then in the final product I might try replacing it like that. More than half the time stack is already aligned, etc and nvk's not necessary. Particularly helps when in a macro; if it gets re-instantiated 20 times, saves quite a few bytes to replace it. But good chance you don't care, masm produces such small .exe's anyway it's not a big deal.
thanks for the thanks, I appreciate it!