So as mentioned in another thread, HJWasm 2.22+ features a built-in macro library which automatically adapts to the selected OPTION ARCH:<SSE|AVX> settings etc.
If you have any ideas for macros that should be built into HJWasm (custom invokes, prologues, helper functions, etc.), post them here.
For example we might add a DELPHI32_INVOKE, and DELPHI32_PROLOGUE/DELPHI32_EPILOGUE as built in macros to enable that form of ABI.
Quote from: johnsa on March 31, 2017, 08:00:16 AM
For example we might add a DELPHI32_INVOKE, and DELPHI32_PROLOGUE/DELPHI32_EPILOGUE as built in macros to enable that form of ABI.
That will be amazing. :t
Hi johnsa,
A relaxed invoke macro option for calling through registers and variables without prototyping:
include MsgBoxTimeout.inc

.data
user32  db 'user32.dll',0
text    db 'This message box will destroy itself after 4000 milliseconds',0
caption db 'Self-destroying message box',0
func    db 'MessageBoxTimeoutA',0

.data?
hModule dd ?

.code
start:
    invoke  LoadLibrary,ADDR user32
    mov     hModule,eax
    invoke  GetProcAddress,eax,ADDR func
    test    eax,eax                 ; zero means the export was not found
    jz      _exit
    ; call through the register, no prototype needed
    _invoke eax,0,ADDR text,ADDR caption,\
            MB_ICONWARNING,LANG_NEUTRAL,TIMEOUT
_exit:
    invoke  FreeLibrary,hModule
    invoke  ExitProcess,0
END start
A return keyword on prototypes, so that an invoke can be used as a source operand:
mov myvar,invoke function,par1,par2,par3
From what I have seen on MS-DOS, Linux and Windows (32 or 64 bit), most functions return their value in the ax/eax/rax register, and if a second return value exists it is in dx/edx/rdx. Some functions, however, return status in the flags. This could be extended to xmm registers, and so on.
If the function above has a void return type, an error message should inform the user.
An enum macro would be useful too.
A syscall-like invoke (with prototype checking):
__NR_exit equ 60 ; <--- an enum
syscall __NR_exit,0 ; <---
I cannot create prototypes for the syscall instruction, just as I cannot create them for the 'int' instruction (int 80h, int 21h, int 2Fh...).
syscall eax,rdi,rsi,rdx,r10,r8,r9 ; eax holds the function number (the enum), the other registers are the parameters; used on Linux x86-64.
int 80h,eax,ebx,ecx,edx,esi,edi,ebp ; eax holds the function number; used on 32-bit Linux for native system calls.
Please check the ABI to be sure the sequences above are right.
So you would need a new name for this (it cannot be invoke, because 'call' and 'syscall' can be mixed in the same source code). I have no suggestion for the name. The program would then look like ideal mode and be portable: we could port 32-bit Linux code to 64 bits this way. Only the enumerations change: "__NR_exit EQU 1" on 32 bits and "__NR_exit EQU 60" on x86-64.
I am not sure whether 32-bit Linux has two different calling methods; maybe kernels below 2.2 use one and kernels from 2.2 on use the one listed above.
Porting from Linux to BSD could be done the same way (BSD uses a different ABI or calling convention, I don't know its exact name).
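To make the portability idea above concrete, here is a minimal C sketch of the data such a syscall-style invoke would need per target: the number of the exit call and the register that receives each operand. The names `target`, `reg`, `nr_exit` and `syscall_reg` are made up for illustration and are not part of HJWasm. The register orders are the ones listed above, with rax written out for the 64-bit call number (a write to eax zero-extends into rax).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target selector, not an HJWasm option. */
enum target { LINUX_X86_32, LINUX_X86_64 };

/* Registers involved in a Linux system call on either target. */
enum reg {
    R_EAX, R_EBX, R_ECX, R_EDX, R_ESI, R_EDI, R_EBP,   /* int 80h */
    R_RAX, R_RDI, R_RSI, R_RDX, R_R10, R_R8, R_R9      /* syscall */
};

#define SYSCALL_OPERANDS 7   /* call number plus up to six arguments */

static const enum reg regs32[SYSCALL_OPERANDS] =
    { R_EAX, R_EBX, R_ECX, R_EDX, R_ESI, R_EDI, R_EBP };
static const enum reg regs64[SYSCALL_OPERANDS] =
    { R_RAX, R_RDI, R_RSI, R_RDX, R_R10, R_R8, R_R9 };

/* Number of the exit system call: only this enumeration changes per target. */
int nr_exit(enum target t)
{
    return t == LINUX_X86_64 ? 60 : 1;
}

/* Register for operand `index`; index 0 is the call number itself. */
enum reg syscall_reg(enum target t, size_t index)
{
    assert(index < SYSCALL_OPERANDS);
    return t == LINUX_X86_64 ? regs64[index] : regs32[index];
}
```

With a table like this, the same source line could expand to `int 80h` with ebx/ecx/... on 32 bits and to `syscall` with rdi/rsi/... on 64 bits, which is the whole point of giving the macro its own name.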
Option to Align XMM Local variables to 16 bytes under x86.
Quote from: aw27 on April 04, 2017, 10:42:33 PM
Option to Align XMM Local variables to 16 bytes under x86.
+1 :)
Quote from: aw27 on April 04, 2017, 10:42:33 PM
Option to Align XMM Local variables to 16 bytes under x86.
Could be easily done in the PROLOG macro, with the advantage that the source would still assemble with ML. There is a huge 32-bit codebase...
Quote from: jj2007 on April 05, 2017, 01:46:20 AM
Quote from: aw27 on April 04, 2017, 10:42:33 PM
Option to Align XMM Local variables to 16 bytes under x86.
Could be easily done in the PROLOG macro, with the advantage that the source would still assemble with ML. There is a huge 32-bit codebase...
I guess so, Jochen, but ML has not used a macro for the prolog since the pre-6.0 days, and I am not the man to develop one for this job.
I am ready to sacrifice ML, which I am using now for x86, for the Option I requested.
In 64 bit MASM a 32 byte aligned stack frame is this easy. Tweak is done in the stackframe macro.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
tstproc proc

    LOCAL .ymm12 :YMMWORD
    LOCAL .ymm13 :YMMWORD
    LOCAL .ymm14 :YMMWORD
    LOCAL .ymm15 :YMMWORD

    vmovntdq .ymm12, ymm12
    vmovntdq .ymm13, ymm13
    vmovntdq .ymm14, ymm14
    vmovntdq .ymm15, ymm15

    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop

    vmovntdqa ymm12, .ymm12
    vmovntdqa ymm13, .ymm13
    vmovntdqa ymm14, .ymm14
    vmovntdqa ymm15, .ymm15

    ret

tstproc endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
The biggest problem with a 32-byte aligned stack is that the OS doesn't guarantee it for you on entry in the first place, so you'd need to start with a manual adjustment of RSP to get it 32-byte aligned, and then have the right prologues/epilogues to keep it that way.
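A small C sketch of the arithmetic behind that manual adjustment, assuming the usual `and rsp, -32` approach. The function names are hypothetical and the stack pointer is modelled as a plain integer, so this only illustrates the rounding, not a real prologue.

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a multiple of `alignment` (a power of two).
 * For alignment == 32 this is what `and rsp, -32` computes. */
uint64_t align_stack_down(uint64_t sp, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & ~(alignment - 1);
}

/* Number of bytes skipped by the adjustment. It depends on the
 * entry value of sp, so it is not known at assembly time. */
uint64_t align_padding(uint64_t sp, uint64_t alignment)
{
    return sp - align_stack_down(sp, alignment);
}
```

For example, an entry value ending in F8 moves down by 24 bytes to E0. Because the padding varies from call to call, the original RSP has to be saved (typically in RBP) so the epilogue can restore it.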
Quote from: hutch-- on April 05, 2017, 03:49:23 AM
In 64 bit MASM a 32 byte aligned stack frame is this easy. Tweak is done in the stackframe macro.
You will need to place all the 16-byte variables together at the beginning otherwise they will not remain aligned. If you intersperse variable types you will have an alignment problem.
On the other hand, the stack is always aligned to 16 bytes at the beginning, after you push rbp, so why bother to align again? After your tweak, which I have not seen, it will be 32 bytes, but the conclusion is the same.
This is the way I read the Prolog macro, but something may have escaped me.
I think built-in macros should only be used to replace built-in assembler instructions (otherwise it could turn into AsmC), namely those that can bog the system down when repeated many times (especially the cheap ones), such as xlat, scasX, stosX, lodsX, movsX, cld/std (which might need to be empty), movsx/movzx (and not used in invoke), enter/leave. As far as I can tell, compiler authors avoid them as well.
> If you intersperse variable types you will have an alignment problem.
This is correct and it involves the discipline of stacking LOCAL variables in descending order of size, YMM then XMM then 64 bit and so on down to BYTE variables. The default prologue macro I use is 16 byte aligned which works fine with XMM, all it needs to handle YMM registers is 32 byte alignment. What I am inclined to do with MASM is just add the 32 byte version to the macro options as the main one works well and I don't want to further complicate the macro call with another option.
The reason for making this suggestion is that at a design level of an assembler, it would be very easy to test the data sizes while parsing the source code to ensure that a top down ordering is done and create an alignment error if the locals are out of order.
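A rough C sketch of the check suggested above, under two stated assumptions: every LOCAL has a power-of-two size, and the frame base is aligned to the largest of them. Both function names are hypothetical; this is not how any existing assembler does it.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first local that is larger than its predecessor,
 * or -1 if the sizes are in descending (non-increasing) order. */
int first_out_of_order(const size_t *sizes, size_t count)
{
    for (size_t i = 1; i < count; i++)
        if (sizes[i] > sizes[i - 1])
            return (int)i;
    return -1;
}

/* Lay the locals out downward from an aligned frame base and return
 * the index of the first one that is not aligned to its own size,
 * or -1 if all of them are. */
int first_misaligned(const size_t *sizes, size_t count)
{
    size_t offset = 0;   /* bytes below the frame base */
    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] != 0 && (sizes[i] & (sizes[i] - 1)) == 0);
        offset += sizes[i];
        if (offset % sizes[i] != 0)
            return (int)i;
    }
    return -1;
}
```

With sizes 32, 16, 8, 4, 1 both checks pass. With 16, 4, 16 the third local lands 36 bytes below the base and both checks flag it. The ordering test is the simpler and stricter of the two: 4, 4, 8 is out of order but happens to stay aligned.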
Quote from: hutch-- on April 05, 2017, 11:39:07 AM
> If you intersperse variable types you will have an alignment problem.
This is correct and it involves the discipline of stacking LOCAL variables in descending order of size,
That works if we assume all big structure and union variables have their fields aligned to at least 16 bytes, which can be a waste of memory when they don't contain SIMD instructions.
Quote from: hutch-- on April 05, 2017, 11:39:07 AM
ensure that a top down ordering is done and create an alignment error if the locals are out of order.
John has remarked that ordering can be problematic if the coder accesses more than one variable at once, e.g. with a movups for 4 dwords. But an error (or a warning?) for unaligned locals would be a great solution, as it forces the coder to observe the "big ones first" logic. Better than chasing mysterious bugs.
> which can be a waste of memory when they don't contain SIMD instructions
This is not the case as stack memory is ALREADY allocated and all you are doing for the duration of the procedure and any further nested procedures is offsetting the stack usage until the instruction sequence returns to the caller. Now it could be a problem if you were writing a highly recursive algorithm that progressively used a large amount of stack address space but the solution to that problem would be to either set a large stack in the linker OR set a recursion depth limiter OR both.
The simple answer is you cannot waste what has already been wasted.
Quote from: hutch-- on April 05, 2017, 05:54:09 PM
> which can be a waste of memory when they don't contain SIMD instructions
This is not the case as stack memory is ALREADY allocated and all you are doing for the duration of the procedure and any further nested procedures is offsetting the stack usage until the instruction sequence returns to the caller. Now it could be a problem if you were writing a highly recursive algorithm that progressively used a large amount of stack address space but the solution to that problem would be to either set a large stack in the linker OR set a recursion depth limiter OR both.
The simple answer is you cannot waste what has already been wasted.
For ML64 you have no better alternative, you really have to do it with macros.
:biggrin:
That's the price of writing in a MACRO assembler: you can. :P
Quote from: hutch-- on April 05, 2017, 05:54:09 PM
it could be a problem if you were writing a highly recursive algorithm
Right, although rarely relevant. Another argument is cache use, of course.
Not all XMMWORDs need align 16. On modern CPUs, movups and movaps are equally fast. Perhaps the coder could decide (with an option) if a misaligned XMMWORD should issue a warning, or throw an error.
The case is more that movups will be "equally" fast if the data item happens to be aligned, but slower if not.
so movups gives you something as performant as movaps when the data is aligned, but doesn't explode in a heap when un-aligned..
basically it works the way the damn thing should have in the first place and there should never have been an aligned/unaligned variant :) imho
Quote from: johnsa on April 05, 2017, 08:53:53 PM
basically it works the way the damn thing should have in the first place and there should never have been an aligned/unaligned variant :) imho
It gave jobs to C++ programmers at Microsoft who invented data types to pass data already aligned.
Yes, my wording was sloppy - and I fully agree with both of you :bgrin:
Anyway, a warning or an error for misaligned local xmmwords should be an option. While movups can replace movaps with no price to pay, there are many SIMD instructions that blow up when you read from or write to unaligned memory.
Here is my variant for increasing macrolib flexibility and capabilities. Only basic improvements are implemented so far: Win16/32/64 universality and more classes of macros, which can be turned on and off one by one, and even by the name of each macro, using the mlib and nomlib command line options :eusa_boohoo:
And rather than creating substitutes for the macrolib names, it may be better to change the code that searches for symbols:
cmdline.c :
line 42 :
#include "macrolib.h"
line 410 :
static void OPTQUAL Set_NOMLIB(void)
{
#if defined ML_SWN
    if (*OptName) noAutoMacrosAdd(OptName + 1);
    else
#endif
        Options.nomlib = TRUE;
}
static void OPTQUAL Set_MLIB(void)
{
#if defined ML_SWN
    if (*OptName) inAutoMacrosAdd(OptName + 1);
    else
#endif
        Options.nomlib = FALSE;
}
line 611 :
{ "nomlib=@", 0, Set_NOMLIB },
{ "mlib=@", 0, Set_MLIB },
expans.c
line 1164 : so that macro names are matched regardless of letter case
if( tokenarray[i].token == T_ID || tokenarray[i].token == T_INSTRUCTION) {
line 1168 : so that macro names are matched regardless of letter case
sym = SymFindDeclare(tokenarray[i].string_ptr);
else
#ifdef __SW_BD
sym = SymFindToken(tokenarray[i].string_ptr, tokenarray[i].token);
#else
sym = SymSearch( tokenarray[i].string_ptr );
#endif
symbols.c
line 309 : so that macro names are matched regardless of letter case
#ifdef __SW_BD
struct asym *SymFindToken( const char *name, int token)
/**************************************/
/* find a symbol in the local/global symbol table,
 * used to replace instructions by macros:
 * for T_INSTRUCTION tokens, macro names are compared case-insensitively.
 * return ptr to next free entry in global table if not found.
 * Note: lsym must be global, thus if the symbol isn't
 * found and is to be added to the local table, there's no
 * second scan necessary.
 */
{
    int i;
    int len;

    len = strlen( name );
    i = hashpjw( name );

    /* inside a procedure, the local table is searched first */
    if ( CurrProc ) {
        for( lsym = &lsym_table[ i % LHASH_TABLE_SIZE ]; *lsym; lsym = &((*lsym)->nextitem ) ) {
            if ( len == (*lsym)->name_size && SYMCMP( name, (*lsym)->name, len ) == 0 ) {
                DebugMsg1(("SymFind(%s): found in local table, state=%u, local=%u\n", name, (*lsym)->state, (*lsym)->scoped ));
                (*lsym)->used = TRUE;
                return( *lsym );
            }
        }
    }

    for( gsym = &gsym_table[ i % GHASH_TABLE_SIZE ]; *gsym; gsym = &((*gsym)->nextitem ) ) {
        /* an instruction token matches a macro of the same name in any case */
        if ( len == (*gsym)->name_size && ((token == T_INSTRUCTION && (*gsym)->state == SYM_MACRO) ? (_memicmp(name, (*gsym)->name, len) == 0) : (SYMCMP(name, (*gsym)->name, len) == 0)) ) {
            DebugMsg1(("SymFind(%s): found, state=%u memtype=%X lang=%u\n", name, (*gsym)->state, (*gsym)->mem_type, (*gsym)->langtype ));
            return( *gsym );
        }
    }

    return( NULL );
}
#endif
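The matching rule in that patch can be isolated in a few lines of standalone C. This is only a sketch: `struct sym`, the two enums and the function names are simplified stand-ins invented here, not the assembler's real types. It assumes that SYMCMP behaves like an exact memcmp, and it uses a portable loop in place of `_memicmp`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tok   { TOK_ID, TOK_INSTRUCTION };   /* stand-ins for T_ID / T_INSTRUCTION */
enum state { ST_MACRO, ST_OTHER };        /* stand-in for SYM_MACRO and the rest */

/* Simplified symbol record, hypothetical. */
struct sym { const char *name; size_t len; enum state state; };

/* 1 if the two buffers are equal ignoring ASCII letter case, else 0. */
int equal_nocase(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* An instruction token matches a macro symbol in any letter case;
 * every other combination needs an exact match. */
int sym_matches(const struct sym *s, const char *name, size_t len, enum tok token)
{
    assert(s != NULL && name != NULL);
    if (len != s->len)
        return 0;
    if (token == TOK_INSTRUCTION && s->state == ST_MACRO)
        return equal_nocase(name, s->name, len);
    return memcmp(name, s->name, len) == 0;
}
```

So a macro named STOSB would take over a `stosb` written in the source, while an ordinary identifier spelled differently from the macro would still not match.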