The MASM Forum

64 bit assembler => UASM Assembler Development => Topic started by: johnsa on March 31, 2017, 08:00:16 AM

Title: HJWasm Macro Library Suggestions
Post by: johnsa on March 31, 2017, 08:00:16 AM
So as mentioned in another thread, HJWasm 2.22+ features a built-in macro library which automatically adapts to the selected OPTION ARCH:<SSE|AVX> settings etc.

If you have any ideas for macros that should be built into HJWasm (custom invokes, prologues, helper functions, etc.), post them here.

For example we might add a DELPHI32_INVOKE, and DELPHI32_PROLOGUE/DELPHI32_EPILOGUE as built in macros to enable that form of ABI.
Title: Re: HJWasm Macro Library Suggestions
Post by: aw27 on March 31, 2017, 02:43:35 PM
Quote from: johnsa on March 31, 2017, 08:00:16 AM
For example we might add a DELPHI32_INVOKE, and DELPHI32_PROLOGUE/DELPHI32_EPILOGUE as built in macros to enable that form of ABI.
That will be amazing.  :t
Title: Re: HJWasm Macro Library Suggestions
Post by: Vortex on April 01, 2017, 07:00:26 AM
Hi johnsa,

A relaxed invoke macro option that can call through registers and variables without a prototype:

include     MsgBoxTimeout.inc

.data

user32      db 'user32.dll',0
text        db 'This message box will destroy itself after 4000 milliseconds',0
caption     db 'Self-destroying message box',0
func        db 'MessageBoxTimeoutA',0

.data?

hModule     dd ?

.code

start:

    invoke  LoadLibrary,ADDR user32
    mov     hModule,eax

    invoke  GetProcAddress,eax,ADDR func
    test    eax,eax
    jz      _exit

    _invoke  eax,0,ADDR text,ADDR caption,\
             MB_ICONWARNING,LANG_NEUTRAL,TIMEOUT

_exit:

    invoke  FreeLibrary,hModule

    invoke  ExitProcess,0

END start
Title: Re: HJWasm Macro Library Suggestions
Post by: mineiro on April 01, 2017, 12:42:24 PM
A return keyword on prototypes, so that an invoke can be used as a source operand:
mov myvar,invoke function,par1,par2,par3
From what I have seen, on MS-DOS, Linux and Windows (32 or 64 bit), most functions return values in the ax/eax/rax register, and if a second return value exists it is in dx/edx/rdx. Some functions also return results in the flags. This could be extended to xmm registers, etc.
If the function has a void return type, an error message should inform the user.

An enum macro could be useful too.

A syscall-like invoke (with prototype checking):
__NR_exit equ 60 <--- an enum
syscall __NR_exit,0 <---
I cannot create prototypes for the syscall instruction, just as I can't for the 'int' instruction (int 80h, int 21h, int 2Fh...).

syscall eax,rdi,rsi,rdx,r10,r8,r9   <-- eax holds the function number (enum), the other registers are the parameters; used on Linux x86-64.
int 80h,eax,ebx,ecx,edx,esi,edi,ebp  <-- eax holds the function number (enum); used on 32-bit Linux to call native kernel functions (system calls).
Please check the ABI to be sure the sequences above are right.
So this needs a new name (it cannot be invoke, because 'call' and 'syscall' can be mixed in the same source code); I don't have a suggestion. Programs would then be written in an almost ideally portable form: we could port 32-bit Linux code to 64 bits this way, and only the enumerations would change. On 32 bits "__NR_exit EQU 1", and on x86-64 "__NR_exit EQU 60".

I'm not sure whether 32-bit Linux has two different calling methods; maybe kernels below 2.2 use one, and 2.2 and above use the one listed above.
Porting from Linux to BSD could be done the same way (BSD uses a different ABI or calling convention; I don't know the exact name).
Title: Re: HJWasm Macro Library Suggestions
Post by: aw27 on April 04, 2017, 10:42:33 PM
Option to Align XMM Local variables to 16 bytes under x86.
Title: Re: HJWasm Macro Library Suggestions
Post by: johnsa on April 04, 2017, 11:14:45 PM
Quote from: aw27 on April 04, 2017, 10:42:33 PM
Option to Align XMM Local variables to 16 bytes under x86.

+1 :)
Title: Re: HJWasm Macro Library Suggestions
Post by: jj2007 on April 05, 2017, 01:46:20 AM
Quote from: aw27 on April 04, 2017, 10:42:33 PM
Option to Align XMM Local variables to 16 bytes under x86.

Could be easily done in the PROLOG macro, with the advantage that the source would still assemble with ML. There is a huge 32-bit codebase...
Title: Re: HJWasm Macro Library Suggestions
Post by: aw27 on April 05, 2017, 02:48:57 AM
Quote from: jj2007 on April 05, 2017, 01:46:20 AM
Quote from: aw27 on April 04, 2017, 10:42:33 PM
Option to Align XMM Local variables to 16 bytes under x86.

Could be easily done in the PROLOG macro, with the advantage that the source would still assemble with ML. There is a huge 32-bit codebase...

I guess so, Jochen, but ML has not used a macro for the prolog since pre-6.0 days, and I am not the man to develop one to do this job.
I am ready to sacrifice ML, which I am using now for x86, for the Option I requested.
Title: Re: HJWasm Macro Library Suggestions
Post by: hutch-- on April 05, 2017, 03:49:23 AM
In 64 bit MASM a 32 byte aligned stack frame is this easy. Tweak is done in the stackframe macro.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

tstproc proc

    LOCAL .ymm12 :YMMWORD
    LOCAL .ymm13 :YMMWORD
    LOCAL .ymm14 :YMMWORD
    LOCAL .ymm15 :YMMWORD

    vmovntdq .ymm12, ymm12
    vmovntdq .ymm13, ymm13
    vmovntdq .ymm14, ymm14
    vmovntdq .ymm15, ymm15

    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop

    vmovntdqa ymm12, .ymm12
    vmovntdqa ymm13, .ymm13
    vmovntdqa ymm14, .ymm14
    vmovntdqa ymm15, .ymm15

    ret

tstproc endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: HJWasm Macro Library Suggestions
Post by: johnsa on April 05, 2017, 04:25:33 AM
The biggest problem with a 32-byte aligned stack is that the OS doesn't guarantee that alignment on entry in the first place, so you'd need to start with a manual adjustment to RSP to get it 32-byte aligned before the right prologue/epilogue can keep it that way.
Title: Re: HJWasm Macro Library Suggestions
Post by: aw27 on April 05, 2017, 04:44:33 AM
Quote from: hutch-- on April 05, 2017, 03:49:23 AM
In 64 bit MASM a 32 byte aligned stack frame is this easy. Tweak is done in the stackframe macro.

You will need to place all the 16-byte variables together at the beginning, otherwise they will not remain aligned. If you intersperse variable types you will have an alignment problem.
On the other hand, the stack is always aligned to 16 bytes at the beginning, after you push rbp, so why bother to align again? After your tweak, which I have not seen, it will be 32 bytes, but the conclusion is the same.
This is the way I read the Prolog macro, but something may have escaped me.
Title: Re: HJWasm Macro Library Suggestions
Post by: Adamanteus on April 05, 2017, 06:02:25 AM
I think built-in macros should only replace built-in assembler instructions (otherwise it could become AsmC), namely those that can bog the system down when repeated many times (especially the cheap-looking ones): xlat, scasX, stosX, lodsX, movsX, cld/std (may need to expand to nothing), movsx/movzx (and not used in invoke), enter/leave. As far as I can tell, compiler authors also avoid them.
Title: Re: HJWasm Macro Library Suggestions
Post by: hutch-- on April 05, 2017, 11:39:07 AM
> If you intersperse variable types you will have an alignment problem.

This is correct and it involves the discipline of stacking LOCAL variables in descending order of size, YMM then XMM then 64 bit and so on down to BYTE variables. The default prologue macro I use is 16 byte aligned which works fine with XMM, all it needs to handle YMM registers is 32 byte alignment. What I am inclined to do with MASM is just add the 32 byte version to the macro options as the main one works well and I don't want to further complicate the macro call with another option.

The reason for making this suggestion is that at a design level of an assembler, it would be very easy to test the data sizes while parsing the source code to ensure that a top down ordering is done and create an alignment error if the locals are out of order.
Title: Re: HJWasm Macro Library Suggestions
Post by: aw27 on April 05, 2017, 03:34:39 PM
Quote from: hutch-- on April 05, 2017, 11:39:07 AM
> If you intersperse variable types you will have an alignment problem.
This is correct and it involves the discipline of stacking LOCAL variables in descending order of size,
Only if we assume that all big structure and union variables will have fields aligned to at least 16 bytes, which can be a waste of memory when they don't contain SIMD instructions.
Title: Re: HJWasm Macro Library Suggestions
Post by: jj2007 on April 05, 2017, 04:40:05 PM
Quote from: hutch-- on April 05, 2017, 11:39:07 AM
ensure that a top down ordering is done and create an alignment error if the locals are out of order.

John has remarked that ordering can be problematic if the coder accesses more than one variable at once, e.g. with a movups over 4 dwords. But an error (or a warning?) for unaligned locals would be a great solution, as it forces the coder to observe the "big ones first" logic. Better than chasing mysterious bugs.
Title: Re: HJWasm Macro Library Suggestions
Post by: hutch-- on April 05, 2017, 05:54:09 PM
> which can be a waste of memory when they don't contain SIMD instructions

This is not the case as stack memory is ALREADY allocated and all you are doing for the duration of the procedure and any further nested procedures is offsetting the stack usage until the instruction sequence returns to the caller. Now it could be a problem if you were writing a highly recursive algorithm that progressively used a large amount of stack address space but the solution to that problem would be to either set a large stack in the linker OR set a recursion depth limiter OR both.

The simple answer is you cannot waste what has already been wasted.
Title: Re: HJWasm Macro Library Suggestions
Post by: aw27 on April 05, 2017, 06:36:11 PM
Quote from: hutch-- on April 05, 2017, 05:54:09 PM
> which can be a waste of memory when they don't contain SIMD instructions

This is not the case as stack memory is ALREADY allocated and all you are doing for the duration of the procedure and any further nested procedures is offsetting the stack usage until the instruction sequence returns to the caller. Now it could be a problem if you were writing a highly recursive algorithm that progressively used a large amount of stack address space but the solution to that problem would be to either set a large stack in the linker OR set a recursion depth limiter OR both.

The simple answer is you cannot waste what has already been wasted.
For ML64 you have no better alternative, you really have to do it with macros.
Title: Re: HJWasm Macro Library Suggestions
Post by: hutch-- on April 05, 2017, 08:04:34 PM
 :biggrin:

That's the price of writing in a MACRO assembler: you can.  :P
Title: Re: HJWasm Macro Library Suggestions
Post by: jj2007 on April 05, 2017, 08:46:53 PM
Quote from: hutch-- on April 05, 2017, 05:54:09 PM
it could be a problem if you were writing a highly recursive algorithm

Right, although rarely relevant. Another argument is cache use, of course.

Not all XMMWORDs need 16-byte alignment. On modern CPUs, movups and movaps are equally fast. Perhaps the coder could decide (with an option) whether a misaligned XMMWORD should issue a warning or throw an error.
Title: Re: HJWasm Macro Library Suggestions
Post by: johnsa on April 05, 2017, 08:53:53 PM
The case is more that movups will be "equally" fast if the data item happens to be aligned, but slower if not.

So movups gives you something as performant as movaps when the data is aligned, but doesn't explode in a heap when unaligned.

basically it works the way the damn thing should have in the first place and there should never have been an aligned/unaligned variant :) imho
Title: Re: HJWasm Macro Library Suggestions
Post by: aw27 on April 05, 2017, 09:09:30 PM
Quote from: johnsa on April 05, 2017, 08:53:53 PM
basically it works the way the damn thing should have in the first place and there should never have been an aligned/unaligned variant :) imho
It gave jobs to C++ programmers at Microsoft who invented data types to pass data already aligned.
Title: Re: HJWasm Macro Library Suggestions
Post by: jj2007 on April 05, 2017, 09:26:30 PM
Yes, my wording was sloppy - and I fully agree with both of you :bgrin:

Anyway, warning or error for misaligned local xmmwords should be an option. While movups can replace movaps with no price to pay, there are many SIMD instructions that blow up when you write to/from unaligned memory.
Title: Re: HJWasm Macro Library Suggestions
Post by: Adamanteus on May 17, 2017, 12:30:41 AM
 My variant for increasing the macro library's flexibility and abilities, with only basic improvements implemented: Win16/32/64 universality and more macro classes, which can be turned on and off one by one, and even by the name of each macro, using the mlib and nomlib command-line options :eusa_boohoo:
And rather than making substitutes for macro library names, it may be better to change the symbol-search code:

cmdline.c :

line 42 :

#include "macrolib.h"

line 410 :

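/* -nomlib=name disables one built-in macro; plain -nomlib disables the library */
static void OPTQUAL Set_NOMLIB( void )
{
#if defined ML_SWN
    if ( *OptName )
        noAutoMacrosAdd( OptName + 1 );
    else
#endif
    Options.nomlib = TRUE;
}

/* -mlib=name enables one built-in macro; plain -mlib enables the library */
static void OPTQUAL Set_MLIB( void )
{
#if defined ML_SWN
    if ( *OptName )
        inAutoMacrosAdd( OptName + 1 );
    else
#endif
    Options.nomlib = FALSE;
}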

line 611 :

{ "nomlib=@", 0,      Set_NOMLIB },
{ "mlib=@", 0,        Set_MLIB },


expans.c

line 1164 : for case-insensitive macro names

      if( tokenarray[i].token == T_ID || tokenarray[i].token == T_INSTRUCTION) {


line 1168 : for case-insensitive macro names

sym = SymFindDeclare(tokenarray[i].string_ptr);
else
#ifdef __SW_BD
sym = SymFindToken(tokenarray[i].string_ptr, tokenarray[i].token);
#else
sym = SymSearch( tokenarray[i].string_ptr );
#endif


symbols.c

line 309 : for case-insensitive macro names

#ifdef __SW_BD
struct asym *SymFindToken( const char *name, int token )
/******************************************************/
/* find a symbol in the local/global symbol table,
 * for replacing instructions with macros: an instruction token
 * matches a macro name regardless of case.
 * return ptr to next free entry in global table if not found.
 * Note: lsym must be global, thus if the symbol isn't
 * found and is to be added to the local table, there's no
 * second scan necessary.
 */
{
    int i;
    int len;

    len = strlen( name );
    i = hashpjw( name );

    if ( CurrProc ) {
        for( lsym = &lsym_table[ i % LHASH_TABLE_SIZE ]; *lsym; lsym = &((*lsym)->nextitem ) ) {
            if ( len == (*lsym)->name_size && SYMCMP( name, (*lsym)->name, len ) == 0 ) {
                DebugMsg1(("SymFindToken(%s): found in local table, state=%u, local=%u\n", name, (*lsym)->state, (*lsym)->scoped ));
                (*lsym)->used = TRUE;
                return( *lsym );
            }
        }
    }

    for( gsym = &gsym_table[ i % GHASH_TABLE_SIZE ]; *gsym; gsym = &((*gsym)->nextitem ) ) {
        if ( len == (*gsym)->name_size &&
            ( ( token == T_INSTRUCTION && (*gsym)->state == SYM_MACRO ) ?
              ( _memicmp( name, (*gsym)->name, len ) == 0 ) :
              ( SYMCMP( name, (*gsym)->name, len ) == 0 ) ) ) {
            DebugMsg1(("SymFindToken(%s): found, state=%u memtype=%X lang=%u\n", name, (*gsym)->state, (*gsym)->mem_type, (*gsym)->langtype ));
            return( *gsym );
        }
    }

    return( NULL );
}
#endif