Hi, this is my first post. Nice to meet you.
Finally got a minimal "Hello World" working as a console program with ML64.exe and GoLink. You have no idea how happy I am to get it to work. :greenclp:
64-bit is my entry point into MASM though; I have never tried 32-bit MASM. Here's my initiation code:
;-----------------------------------
;Descr : My First 64-bit MASM program
;ml64 prog.asm /c
;golink /console /entry main prog.obj msvcrt.dll, or
;gcc -m64 prog.obj -o prog.exe (main/ret). This needs 64-bit GCC.
;-----------------------------------
extrn printf:proc
extrn exit:proc
.data
hello db 'Hello 64-bit world!',0ah,0
.code
main proc
mov rcx,offset hello
sub rsp,20h
call printf
add rsp,20h
mov rcx,0
call exit
main endp
end
There are more questions I'd like to ask, but I just wanted to share the joy here first. Maybe you can add some advice and tricks.
Thanks
My next questions;
1. Is this the correct thread to post something noob like this, or should I move to another thread?
2. ML64 did not complain about the use of "extern" and "extrn". Should I be concerned about this? Are they interchangeable in this particular setting?
3. Don't I need some kind of alignment somewhere? This alignment stuff scares me a little bit though.
Thanks
I can sympathise with anyone who has had to twiddle the stack manually, but the simple answer is that you need a stack frame for any high level procedure, including any of the C runtime functions. You basically have 2 types of procedures: pure mnemonic procedures, where you pass arguments through registers to a limit of 4 registers, and higher level procedures, where you set up a stack frame. The recommended method is a pair of prologue / epilogue macros for managing the stack frame so you can turn the stackframe on and off.
Having suffered the same learning curve as yourself, I wrote a pair of macros (among many others) that do both the stackframe and a reasonable simulation of the old 32 bit MASM "invoke". If you have a look in the ML64 subforum you will find a reasonable amount of stuff already up and working correctly, API libraries including MSVCRT, the masm64 macro file and a collection of examples. The macros for the stackframe are complicated but they will show you exactly how they work. The most recent help file explains how the prologue and invoke code works.
Quote from: hutch-- on January 10, 2017, 04:19:32 AM
Thanks for the reply. I've looked into your 64-bit project and can't praise you enough for what you have been doing. Lots of future 64-bit younglings will rely heavily on it for sure.
As for the alignment, I think you are right. A friend of mine also told me that the safest bet for a beginner like me wading through the 64-bit WIN APIs is to use the old school prolog/epilog setup until I can figure out the different alignment / frame setup requirements for different things. So for the above code, I should have set up my main function like this:
main proc
push rbp
mov rbp,rsp
mov rcx,offset hello
sub rsp,20h
call printf
add rsp,20h
mov rsp,rbp
pop rbp
ret ;for GCC
main endp
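A side note on the arithmetic, as a hedged sketch and not anything the tools enforce: under the Windows x64 convention RSP is expected to be 16-byte aligned at every CALL, so on entry to main (after the return address has been pushed) RSP mod 16 is 8. The framed version above lands on an aligned RSP because push rbp adds the missing 8 bytes before sub rsp,20h. The first version did not, since 8 + 20h is still 8 mod 16. A frameless variant can get the same effect by reserving 28h, assuming the same printf / exit imports as the original program:

```asm
; Sketch only: frameless variant of the first program.
; Assumes RSP mod 16 = 8 on entry to main (Windows x64 convention).
extrn printf:proc
extrn exit:proc
.data
hello db 'Hello 64-bit world!',0ah,0
.code
main proc
    sub rsp,28h          ; 20h shadow space + 8 so RSP is 16-byte aligned
    lea rcx,hello        ; first argument: address of the string
    call printf
    xor ecx,ecx          ; exit code 0
    call exit            ; does not return, so no add rsp / ret follows
main endp
end
```

The build lines from the first post should apply unchanged, since only the body of main differs.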
I am still confused about PROC/ENDP though. What does a "PROC / ENDP" pair do exactly at the assembly level? What does it hide or do silently in a 64-bit environment, where MS employs only a single fastcall convention? For example, does it automatically set up the shadow space / stack frame for my routine, or is it just a simple label...ret scope definition?
Regards
You control the "proc" "endp" pair with the prologue / epilogue macro if the stack frame is turned on. If the stack frame is turned off it is just the start and end of a pure mnemonic procedure. With ML64 you only use RET, not RETN. The return value is in RAX, but you can use any of the other unprotected registers as well to return more than one value. RAX RCX RDX R8 R9 R10 R11 are the unprotected or transient registers. The numbered registers can specify sizes as well: R8 R8d R8w R8b gives you the four sizes (64, 32, 16 and 8 bits). The high byte registers AH BH CH DH cannot be used in any instruction that needs a REX prefix, for example one that also touches R8 to R15 or SIL/DIL. It's not a problem as you have enough registers to handle it.
Something I found incredibly useful was a disassembler/debugger called ArkDasm. It allowed me to have a look at exactly the questions you are asking and it was one of the main tools I used while developing the prologue and invoke style macros. The stackframe is pretty low overhead because it uses ENTER and LEAVE. The stackframe macro loads the first 4 register arguments into the shadow space which is necessary in a high level procedure so you don't overwrite the register values.
With the stack frame you can write high level code without the melodrama of twiddling the stack and when you need real speed and low overhead with a pure mnemonic leaf procedure you can turn it off.
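To make the shadow space point concrete, here is a hand-written sketch of roughly what such a prologue does. It is not the masm64 macro itself, and the procedure name addfour is hypothetical. It assumes a caller that has reserved the usual 20h bytes of shadow space before the CALL, which places the four home slots directly above the return address.

```asm
; Sketch only: manual "home the register arguments" prologue.
; addfour is a made-up example; it returns rcx+rdx+r8+r9 in rax.
.code
addfour proc
    mov [rsp+8],  rcx    ; at entry [rsp] holds the return address,
    mov [rsp+16], rdx    ; so the caller's shadow space starts at
    mov [rsp+24], r8     ; [rsp+8] and has one 8-byte slot for each
    mov [rsp+32], r9     ; of the four register arguments
    push rbp
    mov rbp, rsp         ; after the push the slots sit at [rbp+16]..[rbp+40]
    mov rax, [rbp+16]    ; arguments can now be read back from memory,
    add rax, [rbp+24]    ; leaving rcx, rdx, r8 and r9 free for other use
    add rax, [rbp+32]
    add rax, [rbp+40]
    pop rbp
    ret
addfour endp
end
```

A procedure like this makes no calls of its own, so it does not need to reserve further stack. One that does call other functions would also subtract its own shadow space and alignment padding after the mov rbp, rsp.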
Great explanations. Thanks.
That means I can go bare-metal with MASM without any obvious limitations as far as the OS is concerned.
One more thing, try and avoid using PUSH POP as they change the RSP register. Here is one of the procs so far for the 64 bit library. This one uses other registers but if you want to write a pure mnemonic procedure with many registers including the ones that must not change across procedure calls, you can use ".DATA?" memory.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
mcopy proc
; rcx = source address
; rdx = destination address
; r8 = byte count
; --------------
; save rsi & rdi
; --------------
mov r11, rsi
mov r10, rdi
cld
mov rsi, rcx
mov rdi, rdx
mov rcx, r8
shr rcx, 3
rep movsq
mov rcx, r8
and rcx, 7
rep movsb
; -----------------
; restore rsi & rdi
; -----------------
mov rdi, r10
mov rsi, r11
ret
mcopy endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Nice! If I read it correctly, it's a memory copy function?
A few questions Hutch;
1. Why use R11? R15/R14 are the loneliest registers on earth and nobody seems to be using them. AFAIK, R11 returns the RFLAGS status to some interested parties. This is true in Linux too.
2. Do you use SSE instructions / registers in your 64-bit libraries or just standard x64 instructions? Love your mcopy routine. But if you could shr it by 4, it opens up to a whole new opportunity to use SSE string instructions and registers.
Love what you're doing Hutch. Keep it up and God bless you.
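For illustration only, the "shr by 4" idea could look something like the sketch below. The name xmmcopy is hypothetical and this is not part of Hutch's library. It assumes the same argument registers as mcopy and uses the unaligned movdqu, so it puts no alignment requirement on either buffer.

```asm
; Sketch only: SSE2 copy in 16-byte blocks, then a byte tail.
; rcx = source address, rdx = destination address, r8 = byte count
.code
xmmcopy proc
    xor r10, r10         ; r10 = running byte index
    mov r11, r8
    shr r11, 4           ; number of whole 16-byte blocks
    jz rest              ; fewer than 16 bytes: go straight to the tail
copy16:
    movdqu xmm0, xmmword ptr [rcx+r10]
    movdqu xmmword ptr [rdx+r10], xmm0
    add r10, 16
    sub r11, 1
    jnz copy16
rest:
    and r8, 15           ; leftover bytes, 0..15
    jz fin
rest1:
    mov al, [rcx+r10]    ; copy the tail one byte at a time
    mov [rdx+r10], al
    add r10, 1
    sub r8, 1
    jnz rest1
fin:
    ret
xmmcopy endp
end
```

Only transient registers (rax, r8, r10, r11, xmm0) are modified, so nothing has to be saved or restored.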
There is no problem using SSE and AVX instructions, you just have to look up the instructions in the Intel manuals. Sad to say I have to do a lot more of the hacky stuff before I can get into the really fast stuff.
Now as far as register usage goes, I learnt long ago to fully comply with whatever the appropriate ABI happens to be, and you get reliable code that works across different versions of Windows. From memory Linux has a slightly different ABI but it's the same mentality. C compilers generally use the full spread of registers and use them according to the OS register convention, so it's worth playing safe here.
This algo below is an AVX memory copy procedure and it is faster than the legacy version and the SSE version. Note that both addresses must be 256 bit (32 byte) aligned, and that the 256 bit form of vmovntdqa needs AVX2.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
ymmcopya proc
; rcx = source address
; rdx = destination address
; r8 = byte count
xor r10, r10 ; zero r10 to use as index
mov r11, r8
shr r11, 5 ; div by 32 for loop count
jz rmdr ; fewer than 32 bytes, skip the block loop
lpst:
vmovntdqa ymm0, YMMWORD PTR [rcx+r10]
vmovntdq YMMWORD PTR [rdx+r10], ymm0
add r10, 32
sub r11, 1
jnz lpst
rmdr:
mov rax, r8 ; calculate remainder if any
and rax, 31
test rax, rax
jnz @F
ret
@@:
mov r9b, [rcx+r10] ; copy any remainder
mov [rdx+r10], r9b
add r10, 1
sub rax, 1
jnz @B
ret
ymmcopya endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Hi,
Quote from: coder on January 10, 2017, 01:46:43 AM
2. ML64 did not complain about the use of "extern" and "extrn". Should I be concerned about this? Are they interchangeable in this particular setting?
The oldest versions of MASM used "EXTRN" only. Somewhere along the way "EXTERN" was added. They are the same directive, fully interchangeable.
Regards,
Steve N.
Quote from: hutch-- on January 10, 2017, 09:21:10 AM
That's very impressive Hutch. Thanks for the code :greenclp:
Quote from: FORTRANS on January 10, 2017, 09:48:55 AM
Got it.
deleted
nidud is right here. When I first started on Win64 I was using the simple EXTERN but was getting massive object modules from it. One of our members, "qword", said to use EXTERNDEF and they dropped to normal size. Here are the first few prototypes from kernel32.inc for ML64.
externdef __imp_ActivateActCtx:PPROC
ActivateActCtx equ <__imp_ActivateActCtx>
externdef __imp_AddAtomA:PPROC
AddAtomA equ <__imp_AddAtomA>
IFNDEF __UNICODE__
AddAtom equ <__imp_AddAtomA>
ENDIF
externdef __imp_AddAtomW:PPROC
AddAtomW equ <__imp_AddAtomW>
IFDEF __UNICODE__
AddAtom equ <__imp_AddAtomW>
ENDIF
externdef __imp_AddConsoleAliasA:PPROC
AddConsoleAliasA equ <__imp_AddConsoleAliasA>
IFNDEF __UNICODE__
AddConsoleAlias equ <__imp_AddConsoleAliasA>
ENDIF
externdef __imp_AddConsoleAliasW:PPROC
AddConsoleAliasW equ <__imp_AddConsoleAliasW>
IFDEF __UNICODE__
AddConsoleAlias equ <__imp_AddConsoleAliasW>
ENDIF
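As a rough sketch of how one of these EXTERNDEF / EQU pairs gets used, the fragment below declares a single import by hand and calls it. It assumes linking against kernel32.lib, and it declares the import slot as a plain qword as a stand-in for the PPROC type used in the include file excerpt above, whose definition is not shown in this thread.

```asm
; Sketch only: one hand-written prototype in the style of the excerpt.
; Assumption: qword is used here in place of the include file's PPROC type.
externdef __imp_ExitProcess:qword    ; import table slot holding the function address
ExitProcess equ <__imp_ExitProcess>  ; text alias so the plain API name can be used
.code
entry_point proc
    sub rsp, 28h         ; shadow space plus 8 bytes to keep RSP 16-byte aligned
    xor ecx, ecx         ; exit code 0
    call ExitProcess     ; expands to an indirect call through __imp_ExitProcess
entry_point endp
end
```

Because EXTERNDEF only emits an external reference for symbols that are actually used, a large include file full of these declarations should not bloat the object module the way plain EXTERN does.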
Thanks nidud and Hutch for that important pointer on EXTERNDEF. One more future hiccup to avoid.
include win64a.inc
includelib msvcrt.lib
include msvcrt.inc
.code
WinMain proc
mov ecx,offset hello
call printf
xor ecx,ecx
call exit
WinMain endp
hello db 'Hello 64-bit world!',0ah,0
end
try MasmBasic (http://masm32.com/board/index.php?topic=94.0):
include \Masm32\MasmBasic\Res\JBasic.inc
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
Inkey Chr$("Hello World: This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
EndOfCode
Hello World: This code was assembled with HJWasm32 in 64-bit format
Another example :
include HelloX64.inc
.data
msg db 'Hello world!',0
.code
start:
sub rsp,4*8+8
invoke StdOut,ADDR msg
invoke ExitProcess,0
StdOut PROC lpszText:QWORD
LOCAL hOutPut:QWORD
LOCAL bWritten:QWORD
LOCAL sl:QWORD
LOCAL _lpszText:QWORD
mov _lpszText,rcx
invoke GetStdHandle,STD_OUTPUT_HANDLE
mov hOutPut,rax
invoke lstrlen,_lpszText
mov sl,rax
invoke WriteFile,hOutPut,_lpszText,sl,ADDR bWritten,NULL
mov rax,bWritten
ret
StdOut ENDP
END
Thanks for the extra code, guys. The more, the merrier.
Here is another one using the MASM64 project I am working on.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
conout "Howdy, your new console template here.",lf,lf
waitkey
invoke ExitProcess,0
ret
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
Hutch, you are breaking the Initiation Ritual for beginners. It should be "Hello World" or something :lol:
In my OZ dialect it is "Urrrrgh, G'day". :biggrin:
Haha. You have no idea how scary that is to asm beginners coming from HLLs who are hoping to see something like hello world during their first days.
It's like a "hello world or never" thing.