The MASM Forum

General => The Campus => Topic started by: JK on March 27, 2021, 05:18:13 AM

Title: Zero locals
Post by: JK on March 27, 2021, 05:18:13 AM
In 32 bit i can zero all locals quite easily whenever i use an EBP based stack frame. Everything between ESP and EBP is "local" right after the prologue. Moreover, in 32 bit locals are located on the stack in the same order they are defined in code.

Running the same code in 64 bit, i see that locals don't always start at RBP and i cannot rely on the defined order of locals being the same on the stack. E.g. in code i have 3 locals: a DWORD, a structure and another DWORD; on the stack i find 2 DWORDs one right after the other, followed by the structure.

The basic idea in 64 bit for the following code:
local x :dword
local y :rect
local z: dword

was:
cld
xor rax, rax
lea rcx, x
add rcx, sizeof(x)
lea rdi, z
sub rcx, rdi
rep stosb

(get the address of the first local, add its size, get the address of the last local to zero out, fill everything in between with zero)

But obviously this doesn't work! What else could i do? I want to reliably zero out all or certain portions of my locals in a procedure. Of course i could explicitly zero out each local (mov rax, 0 -> mov ..., rax), but there must be a much more elegant and faster way - how?
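For comparison, the 32 bit case described at the top of this post can be sketched as below. This is only a sketch under assumptions that are not part of the original post: the MASM32 SDK (masm32rt.inc, crt_printf, CTXT), the default MASM prologue, and a procedure without a USES clause, so that nothing but the locals lies between ESP and EBP at that point.

```asm
include \masm32\include\masm32rt.inc

.code

start:
    call    main
    invoke  ExitProcess,0

main PROC                   ; assumption: no USES clause, so only locals sit between ESP and EBP

    LOCAL x  : DWORD
    LOCAL rc : RECT
    LOCAL y  : DWORD

    mov     edx,edi         ; EDI is callee-saved, keep a copy in a volatile register
    mov     edi,esp         ; lowest address of the locals block
    mov     ecx,ebp
    sub     ecx,esp         ; size of the locals block in bytes
    xor     eax,eax         ; AL = 0 is the fill byte
    cld
    rep     stosb           ; clear everything from ESP up to EBP
    mov     edi,edx         ; restore EDI

    invoke  crt_printf,CTXT("x = %d , y = %d"),x,y
    ret

main ENDP

END start
```

The byte count is taken from the frame itself (EBP minus ESP), so it does not depend on the order of the locals, only on the prologue leaving nothing else in that range.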


Thanks


JK
Title: Re: Zero locals
Post by: jj2007 on March 27, 2021, 06:05:00 AM
Quote from: JK on March 27, 2021, 05:18:13 AM
in 64 bit, i see that locals don't always start at RBP and i cannot rely on the defined order of locals being the same on the stack. E.g. in code i have 3 locals: a DWORD, a structure and another DWORD; on the stack i find 2 DWORDs one right after the other, followed by the structure.

With which framework/assembler does this happen? I use my own PROLOGUE and EPILOGUE, which assign LOCALs exactly where the programmer wants them to be; and the "ZeroLocals" is built into the PROLOGUE macro:

SayHi proc <cb> arg:SIZE_P ; <cb> indicates ok for usage as a callback function; the arg is a pointer
Local v1, v2, v3, v4, rc:RECT
  Print Str$(" \nLocals v1...v4: %i %i %i %i", v1, v2, v3, v4)
  jinvoke MessageBox, 0, arg, Chr$("Hi"), MB_OK or MB_SETFOREGROUND
  ret
SayHi endp

0000000140001002  | 55                      | push rbp                        |
0000000140001003  | 48 C7 C5 E0 FF FF FF    | mov rbp,FFFFFFFFFFFFFFE0        | clear locals
000000014000100A  | 83 24 2C 00             | and dword ptr ss:[rsp+rbp],0    |
000000014000100E  | 48 83 C5 04             | add rbp,4                       |
0000000140001012  | 78 F6                   | js 14000100A                    |
0000000140001014  | 48 8B EC                | mov rbp,rsp                     |
0000000140001017  | 48 89 4D 10             | mov qword ptr ss:[rbp+10],rcx   | load arguments (option <cb>)
000000014000101B  | 48 89 55 18             | mov qword ptr ss:[rbp+18],rdx   |
000000014000101F  | 4C 89 45 20             | mov qword ptr ss:[rbp+20],r8    |
0000000140001023  | 4C 89 4D 28             | mov qword ptr ss:[rbp+28],r9    |
Title: Re: Zero locals
Post by: JK on March 27, 2021, 08:00:07 AM
Thanks JJ, i'm using UASM with option win64:15 and option stackbase:rbp. But maybe i should run my own PROLOGUE and EPILOGUE like you do. I suspect that some of UASM's "optimizations" make a procedure look different depending on the number and type of parameters, the number and type of locals, whether it's a leaf procedure or not, and maybe some other stuff. This makes it difficult to find a generic approach.

Could you please expand a bit on how you do it (PROLOGUE and EPILOGUE).

I'm very much interested in not always zeroing all locals, but sometimes only some of them. Something like this:
local a ...
local b ...
local c ...
zero_until_here (or zero_from_here)
local x ...
local y
 

This would require a custom PROLOGUE and an additional macro for setting a list of locals to zero. If i knew the start or end address of the locals block and could rely on locals appearing on the stack in the order they were defined in code, i could specify one local as a start/end point for a loop setting everything in between to zero. Maybe better like this:
local a ...
local b ...
local c ...
local x ...
local y
zero_until(c)
...
 

would zero a,b and c, and:
local a ...
local b ...
local c ...
local x ...
local y
zero_until(y)
...
   
would zero out all locals.
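
A minimal sketch of the zero_until idea for the 32 bit case, assuming an assembler that keeps locals in declaration order below EBP (as ml does with its default prologue) and the MASM32 SDK (masm32rt.inc, crt_printf, CTXT). ZERO_UNTIL is a hypothetical macro name, and the locals are called loc_a ... loc_y here instead of a ... y to stay clear of reserved words. The macro clobbers EAX, ECX and EDI.

```asm
include \masm32\include\masm32rt.inc

; Hypothetical helper: clears every local from the frame pointer down to
; (and including) lastvar. Only valid if locals keep their declaration order.
ZERO_UNTIL MACRO lastvar
    cld
    xor     eax,eax         ; AL = 0 is the fill byte
    lea     edi,lastvar     ; lowest address to clear
    mov     ecx,ebp
    sub     ecx,edi         ; bytes from lastvar up to EBP
    rep     stosb
ENDM

.code

start:
    call    main
    invoke  ExitProcess,0

main PROC USES edi          ; EDI is callee-saved and used by the macro

    LOCAL loc_a : DWORD
    LOCAL loc_b : RECT
    LOCAL loc_c : DWORD
    LOCAL loc_x : DWORD     ; not cleared
    LOCAL loc_y : DWORD     ; not cleared

    ZERO_UNTIL loc_c        ; clears loc_a, loc_b and loc_c only

    invoke  crt_printf,CTXT("a = %d , c = %d"),loc_a,loc_c
    ret

main ENDP

END start
```

ZERO_UNTIL loc_y would clear the whole block. The registers pushed by USES lie below the locals, so they are not touched.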

How to do this, preferably for both 32 and 64 bit ?


JK
Title: Re: Zero locals
Post by: jj2007 on March 27, 2021, 08:20:05 AM
Quote from: JK on March 27, 2021, 08:00:07 AM
I'm very much interested in not always zeroing all locals, but sometimes only some of them

Known problem: you have a dozen local dwords and structures, plus an 8 kB buffer that doesn't need zeroing.
In JBasic (the dual 64/32 bit demo that comes with MasmBasic), with e.g. useClv=120 you can limit the ClearLocals to 120 bytes.

How I do it, that is a long story. The file JBasic.inc comes with the MasmBasic package (http://masm32.com/board/index.php?topic=94.0), but explaining the PROLOGUE macro would require more time than I currently have, sorry...
Title: Re: Zero locals
Post by: JK on March 27, 2021, 09:03:05 AM
Thanks anyway - i found JBasic.inc, now i can study how you do it!

One more question - is this your generic PROLOGUE and EPILOGUE, or are there more variations to be found in MasmBasic?


JK
Title: Re: Zero locals
Post by: jj2007 on March 27, 2021, 02:41:52 PM
It's the generic one, set in j@start
Title: Re: Zero locals
Post by: jj2007 on March 27, 2021, 09:44:46 PM
Btw there was a long PROC and prolog/epilog (http://masm32.com/board/index.php?topic=6717.0) thread three years ago. Sooner or later you will stumble over "undocumented MASM features" when testing your stuff with Watcom vs ML64 assemblers.
Title: Re: Zero locals
Post by: hutch-- on March 27, 2021, 11:02:50 PM
While the guts of UASM are not my concern, if you need to be able to set local values in your prologue, Bob Zale did it in PB, so if you disassemble a PB executable, you will see his technique. It means you must be able to set dynamic code to do this after the entry to the proc.

If I had a reason to, I think it can be done in MASM so if UASM has properly duplicated the MASM pre-processor, it should be able to insert dynamic code after the procedure entry.
Title: Re: Zero locals
Post by: jj2007 on March 27, 2021, 11:26:05 PM
Quote from: hutch-- on March 27, 2021, 11:02:50 PM
I think it can be done in MASM

Indeed, it can be done. I added the "clear the locals" feature to JBasic (my dual 64-/32-bit library) half a year ago.

if useClv lt localbytes and useClv ge 4 ; with e.g. useClv=120, you can limit the ClearLocals to 120 bytes
  lea rbp, [rsp-useClv]
else
  lea rbp, [rsp-localbytes-locBytesOff] ; ML64 and AsmC address locals differently, and create a 16-byte unused zone near the stack pointer
endif
ClearLoc:  and dword ptr [rbp], 0
  add rbp, 4
  cmp rbp, rsp
  js ClearLoc
Title: Re: Zero locals
Post by: JK on March 28, 2021, 08:04:11 AM
@hutch

I know how PB does it, a disassembly reveals this. Basically i know and understand what i must do. But i hoped to get away without a custom PROLOGUE/EPILOGUE. UASM does a good job optimizing code and stack space usage, but the downside is that, at least in 64 bit, i cannot reliably calculate the stack position of locals from RBP and RSP because of these optimizations. I cannot even rely on locals appearing in the order they are defined. UASM puts larger locals at the start.

@JJ

In the meantime i had a chance to study your code and i understand what you are doing. As said above, i hoped to be able to use what's built into UASM and to add some code clearing the locals i want to be cleared. But maybe the only way to achieve what i want is doing it the hard way by writing my own custom PROLOGUE/EPILOGUE. I will be able to learn from your code and extract what i need!

One more question: the return value of a custom PROLOGUE tells the assembler where the locals are located in relation to EBP/RBP - right ?


JK
Title: Re: Zero locals
Post by: Vortex on March 28, 2021, 11:37:43 PM
Here is another attempt to initialize the local variables. This can be combined with the custom PROLOGUE \ EPILOGUE method :

include     \masm32\include\masm32rt.inc


INITLOC MACRO registers

    VarSize=0

    StackPos=0

    IFNB <registers>   

       StackPos=4*registers

    ENDIF

ENDM


LOCALX MACRO _name,_type

    VarSize=VarSize + SIZEOF(_type)

    LOCAL _name : _type

ENDM


ENDLOC MACRO

    lea     eax,[esp+StackPos]
    invoke  memfill,eax,VarSize,0

ENDM


.code

start:

    call    main
    invoke  ExitProcess,0

main PROC USES esi edi ebx

INITLOC 3 ; number of preserved registers (esi, edi, ebx are pushed below the locals)

LOCALX rc,RECT
LOCALX x,DWORD
LOCALX y,DWORD

ENDLOC

    lea     eax,rc

    invoke  crt_printf,\
            CTXT("y = %d , x = %d , &rc = %X"),\
            y,x,eax
    ret

main ENDP

END start

Title: Re: Zero locals
Post by: rsala on March 28, 2021, 11:39:31 PM
Hi,

UASM now handles primitives first then structs/arrays, so do not expect them to be stored in the same order they are declared.
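
When the assembler is free to reorder locals, one order-independent option is to clear each local by name and let SIZEOF supply the length. The following is a sketch only, written for 32 bit with the MASM32 SDK (masm32rt.inc, crt_printf, CTXT) as in the examples above; ZERO_VARS is a hypothetical macro name, not something UASM or MASM32 provides. It clobbers EAX, ECX and EDI.

```asm
include \masm32\include\masm32rt.inc

; Hypothetical helper: clears each listed local individually, so the
; result does not depend on where the assembler places them.
ZERO_VARS MACRO vars:VARARG
    cld
    xor     eax,eax             ; AL = 0 is the fill byte
    FOR var,<vars>
        lea     edi,var         ; address of this local
        mov     ecx,SIZEOF var  ; its size in bytes
        rep     stosb
    ENDM
ENDM

.code

start:
    call    main
    invoke  ExitProcess,0

main PROC USES edi              ; EDI is callee-saved and used by the macro

    LOCAL x  : DWORD
    LOCAL rc : RECT
    LOCAL y  : DWORD

    ZERO_VARS x,rc,y

    invoke  crt_printf,CTXT("x = %d , y = %d , rc.left = %d"),x,y,rc.left
    ret

main ENDP

END start
```

This emits one short fill sequence per local, so it trades code size for independence from the stack layout. For a 64 bit build the same idea should carry over with RDI in place of EDI, which is also a non-volatile register there.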
Title: Re: Zero locals
Post by: nidud on March 29, 2021, 12:11:56 AM
deleted
Title: Re: Zero locals
Post by: nidud on March 29, 2021, 12:25:57 AM
deleted
Title: Re: Zero locals
Post by: jj2007 on March 29, 2021, 01:04:31 AM
Quote from: JK on March 28, 2021, 08:04:11 AM
One more question: the return value of a custom PROLOGUE tells the assembler where the locals are located in relation to EBP/RBP - right ?

I'm afraid it doesn't, that would make the task easier.

The JBasic library installer has moved here (http://masm32.com/board/index.php?topic=9266.0).

I attach a shortened version of the JBasic library. Extract all files to \Masm32\MasmBasic\Res, then drag an asm file over Buildme.bat

(it needs \Masm32\bin\UAsm64.exe, all the rest should be available).
Title: Re: Zero locals
Post by: Vortex on March 29, 2021, 03:07:33 AM
Hello,

Here is an example for Masm 64-bit :

include \masm32\include64\masm64rt.inc

.data

string1 db 'y = %d , x = %d , &rc = %X',0


INITLOC MACRO

    VarSize=0

ENDM


LOCALX MACRO _name,_type

    VarSize=VarSize + SIZEOF(_type)

    LastVar TEXTEQU <_name>

    LOCAL _name : _type

ENDM


ENDLOC MACRO

; % echo    LastVar
    lea     rcx,LastVar
    invoke  vc_memset,rcx,0,VarSize

ENDM


.code

start PROC

    call    main
    invoke  ExitProcess,0

start ENDP


main PROC

INITLOC

LOCALX .rsi,QWORD
LOCALX .rdi,QWORD
LOCALX .rbx,QWORD

LOCALX rc,RECT
LOCALX x,QWORD
LOCALX y,QWORD

ENDLOC

    mov     .rsi,rsi
    mov     .rdi,rdi
    mov     .rbx,rbx

    lea     r9,rc

    invoke  vc_printf,\
            ADDR string1,\
            y,x,r9

    mov     rsi,.rsi
    mov     rdi,.rdi
    mov     rbx,.rbx
    ret

main ENDP

END
Title: Re: Zero locals
Post by: Vortex on April 01, 2021, 05:59:35 AM
Another version :

include     \masm32\include\masm32rt.inc


INITLOC MACRO

    VarSize=0

ENDM


@p MACRO _name,_type

    VarSize=VarSize + SIZEOF(_type)

    LastVar TEXTEQU <_name>

    EXITM <_name : _type >

ENDM


ENDLOC MACRO

    lea     eax,LastVar
    invoke  memfill,eax,VarSize,0

ENDM


.code

start:

    call    main
    invoke  ExitProcess,0

main PROC USES esi edi ebx

INITLOC

LOCAL @p(rc,RECT)
LOCAL @p(x,DWORD)
LOCAL @p(y,DWORD)

ENDLOC

    invoke  crt_printf,\
            CTXT("y = %d , x = %d"),\
            y,x

    ret

main ENDP

END start

Title: Re: Zero locals
Post by: JK on April 01, 2021, 06:38:16 AM
Thanks Vortex,

in my initial post i supplied code which essentially does the same as your macros do. All of this works only if the order of locals isn't changed by the assembler. My first try was UASM, which does change (optimize?) the order of locals on the stack. In the meantime i played a bit with UASM options and with ml/ml64 and found that this is not always the case. So avoiding certain options basically solves my problem.

I read about rep stosb vs memfill and made my own timing tests. To my surprise rep stosb - at least on my (fairly old) machine - can not only keep up with memfill but is slightly faster in the range where you would expect the average total size of locals.


JK
Title: Re: Zero locals
Post by: hutch-- on April 01, 2021, 11:28:05 AM
Hi JK,

The instruction pair REP MOVSB/STOSB is still very competitive as it has special case circuitry as long as you use REP with them.

Now with both ML and ML64 they explicitly maintain the written order of your locals and one of the ways to keep everything aligned properly is to start with the biggest data types first and place the rest in descending order. This way everything is correctly aligned.
Title: Re: Zero locals
Post by: jj2007 on April 01, 2021, 02:59:51 PM
Quote from: hutch-- on April 01, 2021, 11:28:05 AM
one of the ways to keep everything aligned properly is to start with the biggest data types first and place the rest in descending order. This way everything is correctly aligned.

That's correct, but it creates bloat:
Local buffer[124]:BYTE, v1, v2, v3, v4, v5, v6, v7, rc:RECT
  int 3
  mov eax, v1     ; still short
  mov ecx, v2     ; bloated long instructions for v2...rc, 3 bytes more for each mov, inc, add etc
  lea rdx, buffer ; short instruction

00000001400011D7  | CC                               | int3                              |
00000001400011D8  | 8B 45 80                         | mov eax,dword ptr ss:[rbp-80]     |
00000001400011DB  | 8B 8D 7C FF FF FF                | mov ecx,dword ptr ss:[rbp-84]     |
00000001400011E1  | 48 8D 55 84                      | lea rdx,qword ptr ss:[rbp-7C]     |
Title: Re: Zero locals
Post by: hutch-- on April 01, 2021, 10:10:16 PM
 :biggrin:

Better bloat than broken. If you are dealing with instructions that need alignment you have no choice. You may in some circumstances get misaligned data to work but why waste the performance when you just need to do it correctly in the first place.

With x64 and Win64, you will always have more memory than older OS versions and you need to use it in a compatible way with the OS specs, fiddling a few bytes here and there simply does not matter.

In order, align the proc if it needs to be larger than the default 64 bit, then add any locals in descending order of size. MASM can do it, so with any of the Watcom derivatives, look for an option that will do this for you.
Title: Re: Zero locals
Post by: daydreamer on April 01, 2021, 11:14:14 PM
Is it possible to use rep movsd for a LOCAL array, then xchg esp,edi and start to pop data from the array in a loop,
and xchg esp back afterwards?
Title: Re: Zero locals
Post by: JK on April 02, 2021, 12:54:29 AM
@hutch

about data alignment: why must data be aligned in 64 bit? is it a matter of performance, or is it a matter of crash or not?

regarding locals: is it sufficient to properly align only the first local in a procedure, or must every local be aligned according to its size?

This is still confusing to me; the latter seems to be the case when stepping through 64 bit code with a debugger. Even if i don't care about sorting locals according to their size, the assembler seems to align them for me. I see only addresses ending with 0 or 8 (except for byte, word and dword locals, which are aligned at 1, 2 or 4 byte boundaries).

So what is the advantage of sorting locals by size, did i miss something? I want to make it stable in the first place, i don't want to over-optimize things and risk hard to find bugs.


Thanks


JK
Title: Re: Zero locals
Post by: hutch-- on April 02, 2021, 05:30:46 AM
 :biggrin:

It's simple enough, win64 is not tolerant, misalign data and code and the app may not start. Usually the default alignment for a procedure is 64 bit so if you put 64 bit locals first, then 32 bit locals then 16 bit then 8 bit in descending order, they are all aligned correctly. Doing this avoids hard to track problems in 64 bit.

Now if you need to use 128 or 256 bit data types, you need to align the procedure to the largest data type then put any other in descending order. In MASM there are facilities to create larger alignments so you will have to look through the UASM capacity to see if you can do the same.
Title: Re: Zero locals
Post by: JK on April 02, 2021, 06:09:00 AM
Thanks hutch,

i'm not an exclusive UASM user. UASM seemed easier to start with in 64 bit because of some already built-in features, but i'm also looking for ways to do things with ml/ml64. Ml/ml64 has been around for a long time and i think it will be in the future. I'm not sure if clones like UASM, ASMC and others will be alive then as well.

Currently i try to learn new things in the 64 bit assembler world, filling knowledge gaps at all knowledge levels. Therefore i may ask for very basic things and next time i may ask for very special stuff.
Title: Re: Zero locals
Post by: jj2007 on April 02, 2021, 06:28:30 AM
Quote from: hutch-- on April 01, 2021, 10:10:16 PM
:biggrin:

Better bloat than broken. If you are dealing with instructions that need alignment you have no choice. You may in some circumstances get misaligned data to work but why waste the performance when you just need to do it correctly in the first place.

As an old friend of mine used to say: "Bottom line is REAL MEN[tm] write their loop code in Intel mnemonics, stack frames by hand, not everyone wants a compiler writer to hold their hot little hand." :biggrin:
Title: Re: Zero locals
Post by: hutch-- on April 02, 2021, 07:20:52 AM
 :biggrin:

Well, there is nothing wrong with writing your loop code in Intel mnemonics, particularly if you want it fast, in 32 bit it was easy to write manual stack frames or lack of one and you have to be careful about letting compiler writers hold your hot little hand, they may lead you down the garden path to a visual garbage generator.

Walk the straight and narrow and you will learn to write missiles that make the script kiddies sob into their chardonnay. (perhaps lemonade)  :tongue:
Title: Re: Zero locals
Post by: daydreamer on April 03, 2021, 06:03:32 PM
Quote from: hutch-- on April 02, 2021, 05:30:46 AM
It's simple enough, win64 is not tolerant, misalign data and code and the app may not start. Usually the default alignment for a procedure is 64 bit so if you put 64 bit locals first, then 32 bit locals then 16 bit then 8 bit in descending order, they are all aligned correctly. Doing this avoids hard to track problems in 64 bit
Example
Local a,b,c,D:dword
Local array[256]:dword
;copy data to local array
Mov saveesp,esp
So in which order do I put array and other locals,to start pop 256 dwords in a loop?



Title: Re: Zero locals
Post by: jj2007 on April 03, 2021, 07:55:08 PM
Quote from: daydreamer on April 03, 2021, 06:03:32 PM
pop 256 dwords in a loop?

Can you post complete code for that, please?
Title: Re: Zero locals
Post by: hutch-- on April 03, 2021, 11:13:04 PM
Magnus,

Quote
to start pop 256 dwords in a loop

In Win64 you don't pop anything. What you have said did not make sense.
Title: Re: Zero locals
Post by: Vortex on April 04, 2021, 02:52:31 AM
Converting the LOCVAR macro to Poasm is easy :

LOCVAR MACRO _name,_type

    VarSize=VarSize + SIZEOF(_type)

    LastVar TEXTEQU _name ; Masm equivalent = LastVar TEXTEQU <_name>

    LOCAL _name : _type

ENDM


The ENDLOC macro remains the same.