
Zero locals

Started by JK, March 27, 2021, 05:18:13 AM


Vortex

Hello,

Here is an example for 64-bit MASM:

include \masm32\include64\masm64rt.inc

.data

string1 db 'y = %I64d , x = %I64d , &rc = %p',0   ; %I64d for the QWORD locals, %p for the full 64-bit address


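; INITLOC: reset the running byte count of the locals declared with LOCALX
INITLOC MACRO

    VarSize=0

ENDM


; LOCALX: declare a local, add its size to VarSize and remember its name
; (the last one declared is expected to sit at the lowest stack address)
LOCALX MACRO _name,_type

    VarSize=VarSize + SIZEOF(_type)

    LastVar TEXTEQU <_name>

    LOCAL _name : _type

ENDM


; ENDLOC: zero VarSize bytes upward from the last local declared.
; This only works if the assembler keeps the locals contiguous,
; in declaration order, with no padding between them.
ENDLOC MACRO

; % echo    LastVar      ; uncomment to display LastVar at assembly time
    lea     rcx,LastVar
    invoke  vc_memset,rcx,0,VarSize

ENDM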


.code

start PROC

    call    main
    invoke  ExitProcess,0

start ENDP


main PROC

INITLOC

LOCALX .rsi,QWORD
LOCALX .rdi,QWORD
LOCALX .rbx,QWORD

LOCALX rc,RECT
LOCALX x,QWORD
LOCALX y,QWORD

ENDLOC

    mov     .rsi,rsi
    mov     .rdi,rdi
    mov     .rbx,rbx

    lea     r9,rc

    invoke  vc_printf,\
            ADDR string1,\
            y,x,r9

    mov     rsi,.rsi
    mov     rdi,.rdi
    mov     rbx,.rbx
    ret

main ENDP

END

Vortex

Another version, for 32-bit MASM:

include     \masm32\include\masm32rt.inc


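; INITLOC: reset the running byte count of the locals declared with @p
INITLOC MACRO

    VarSize=0

ENDM


; @p: macro function used inside LOCAL; adds the size of the local to
; VarSize, remembers its name and returns "name : type"
@p MACRO _name,_type

    VarSize=VarSize + SIZEOF(_type)

    LastVar TEXTEQU <_name>

    EXITM <_name : _type >

ENDM


; ENDLOC: zero VarSize bytes starting at the last local declared
; (memfill is called as address, byte count, fill value); this assumes
; the locals are contiguous and in declaration order
ENDLOC MACRO

    lea     eax,LastVar
    invoke  memfill,eax,VarSize,0

ENDM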


.code

start:

    call    main
    invoke  ExitProcess,0

main PROC USES esi edi ebx

INITLOC

LOCAL @p(rc,RECT)
LOCAL @p(x,DWORD)
LOCAL @p(y,DWORD)

ENDLOC

    invoke  crt_printf,\
            CTXT("y = %d , x = %d"),\
            y,x

    ret

main ENDP

END start


JK

Thanks Vortex,

In my initial post I supplied code that essentially does the same as your macros. All of this works only if the assembler doesn't change the order of locals. My first try was UASM, which does change (optimize?) the order of locals on the stack. In the meantime I played a bit with UASM options and with ml/ml64 and found that this is not always the case, so avoiding certain options basically solves my problem.

I read about rep stosb vs memfill and ran my own timing tests. To my surprise, rep stosb (at least on my fairly old machine) not only keeps up with memfill but is slightly faster in the size range you would expect for the total size of a procedure's locals.


JK

hutch--

Hi JK,

The instruction pair REP MOVSB/STOSB is still very competitive as it has special case circuitry as long as you use REP with them.

Now with both ML and ML64, they explicitly maintain the written order of your locals, and one of the ways to keep everything aligned properly is to start with the biggest data types first and place the rest in descending order. This way everything is correctly aligned.

jj2007

Quote from: hutch-- on April 01, 2021, 11:28:05 AM
one of the ways to keep everything aligned properly is to start with the biggest data types first and place the rest in descending order. This way everything is correctly aligned.

That's correct, but it creates bloat:
Local buffer[124]:BYTE, v1, v2, v3, v4, v5, v6, v7, rc:RECT
  int 3
  mov eax, v1     ; still short: rbp-80h fits a 1-byte displacement
  mov ecx, v2     ; v2...rc are more than 128 bytes below rbp: long instructions, 3 bytes more for each mov, inc, add etc
  lea rdx, buffer ; short instruction

00000001400011D7  | CC                               | int3                              |
00000001400011D8  | 8B 45 80                         | mov eax,dword ptr ss:[rbp-80]     |
00000001400011DB  | 8B 8D 7C FF FF FF                | mov ecx,dword ptr ss:[rbp-84]     |
00000001400011E1  | 48 8D 55 84                      | lea rdx,qword ptr ss:[rbp-7C]     |

hutch--

 :biggrin:

Better bloat than broken. If you are dealing with instructions that need alignment you have no choice. You may in some circumstances get misaligned data to work but why waste the performance when you just need to do it correctly in the first place.

With x64 and Win64 you will always have more memory than on older OS versions, and you need to use it in a way that is compatible with the OS specs; fiddling a few bytes here and there simply does not matter.

In order: align the proc if it needs more than the default 64-bit alignment, then add any locals in descending order of size. MASM can do it, so with any of the Watcom derivatives, look for an option that will do this for you.

daydreamer

Is it possible to use rep movsd to fill a LOCAL array, then xchg esp,edi and start popping data from the array in a loop, and afterwards xchg esp back?
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

JK

@hutch

About data alignment: why must data be aligned in 64-bit code? Is it a matter of performance, or a matter of crashing or not?

Regarding locals: is it sufficient to properly align only the first local in a procedure, or must every local be aligned according to its size?

This is still confusing to me; the latter seems to be the case when stepping through 64-bit code with a debugger. Even if I don't care about sorting locals by size, the assembler seems to align them for me. I see only addresses ending in 0 or 8 (except for byte, word and dword locals, which are aligned on 1-, 2- or 4-byte boundaries).

So what is the advantage of sorting locals by size? Did I miss something? I want to make it stable first; I don't want to over-optimize things and risk hard-to-find bugs.


Thanks


JK

hutch--

 :biggrin:

It's simple enough: Win64 is not tolerant; misalign data and code and the app may not start. Usually the default alignment for a procedure is 64 bit, so if you put 64-bit locals first, then 32-bit locals, then 16-bit, then 8-bit in descending order, they are all aligned correctly. Doing this avoids hard-to-track problems in 64 bit.

Now if you need to use 128 or 256 bit data types, you need to align the procedure to the largest data type, then put any others in descending order. MASM has facilities for creating larger alignments, so you will have to look through UASM's capabilities to see if you can do the same.

JK

Thanks hutch,

I'm not an exclusive UASM user. UASM seemed easier to start with in 64 bit because of some built-in features, but I'm also looking for ways to do things with ml/ml64. ml/ml64 has been around for a long time and I think it will still be around in the future; I'm not sure that clones like UASM, ASMC and others will be alive then as well.

Currently I'm trying to learn new things in the 64-bit assembler world, filling knowledge gaps at all knowledge levels. So one time I may ask about very basic things and the next time about very specialized stuff.

jj2007

Quote from: hutch-- on April 01, 2021, 10:10:16 PM
:biggrin:

Better bloat than broken. If you are dealing with instructions that need alignment you have no choice. You may in some circumstances get misaligned data to work but why waste the performance when you just need to do it correctly in the first place.

As an old friend of mine used to say: "Bottom line is REAL MEN[tm] write their loop code in Intel mnemonics, stack frames by hand, not everyone wants a compiler writer to hold their hot little hand." :biggrin:

hutch--

 :biggrin:

Well, there is nothing wrong with writing your loop code in Intel mnemonics, particularly if you want it fast. In 32 bit it was easy to write manual stack frames (or do without one), and you have to be careful about letting compiler writers hold your hot little hand; they may lead you down the garden path to a visual garbage generator.

Walk the straight and narrow and you will learn to write missiles that make the script kiddies sob into their chardonnay. (perhaps lemonade)  :tongue:

daydreamer

Quote from: hutch-- on April 02, 2021, 05:30:46 AM
Its simple enough, win64 is not tolerant, misalign data and code and the app may not start. Usually the default alignment for a procedure is 64 bit so if you put 64 bit locals first, then 32 bit locals then 16 bit then 8 bit in decending order, they are all aligned correctly. Doing this avoids hard to track problems in 64 bit
Example
Local a,b,c,D:dword
Local array[256]:dword
;copy data to local array
Mov saveesp,esp
So in which order do I put the array and the other locals to start popping 256 dwords in a loop?

jj2007

Quote from: daydreamer on April 03, 2021, 06:03:32 PM
pop 256 dwords in a loop?

Can you post complete code for that, please?

hutch--

Magnus,

Quote
to start pop 256 dwords in a loop

In Win64 you don't pop anything. What you have said did not make sense.