More Advanced Prologue and Epilogue MACROses for PROCs

Started by Antariy, June 07, 2013, 11:00:04 PM

Antariy

Here is the macro I mentioned earlier.

It:
• Probes the stack if the total size of the locals is greater than the default x86 page size (4096 bytes), which prevents silent crashes if a proc uses a lot of space for locals. It touches the stack repeatedly in page-sized steps so that the system commits the pages before the proc's code uses them. If the size of the locals does not exceed the page size, the probing code is neither generated nor called. When the macro does generate probing code, it prints a notice to the console. To suppress these notices, define "AxProcProl_NoWarnChkStk".
• Zeroes the locals at the start of the proc, if required, so they are all 0 from the first instruction of your code in the proc. Set "AxProcProl_ZeroSpace = 1" to turn this feature on.
• Optimizes the epilogue :greensml: If you set "AxProcProl_OptimizeSpeed" to 1, it produces "mov esp,ebp \ pop ebp"; if you set it to 0, it produces "leave" :lol: The default is 0.
• Important feature: it checks the stack balance at return and, at the same time, checks for buffer overflows (also called buffer overruns). If a check variable placed on the stack is found corrupted when the proc returns, then either the stack is imbalanced or some code has overwritten the value, in which case the return address may point somewhere other than where the proc was called from (a security threat). To turn this feature on, whether to test your code and find imbalance bugs or to harden a release build, set "AxProcProl_CheckBuffersOverflow" to 1; otherwise set it to 0.

You may also want to randomize the "salt" that the code uses to check the stack state. To do so, modify the variable "AxProcProl_CheckingForOverflowDWord": XOR it or change it however you like.
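To make the "salt" idea concrete, here is a minimal hand-written sketch of such a check. It is not the macro's actual code: the names CANARY and Checked, the salt value and the frame layout are assumptions for illustration only. The salt is stored between the saved EBP and the locals, a deliberately buggy loop overruns a 16-byte buffer, and the check at exit should then take the "corrupted" branch.

```asm
include \masm32\include\masm32rt.inc

CANARY equ 0A55A5AA5h               ; hypothetical salt value, not the macro's

.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

Checked proc
    push ebp
    mov  ebp, esp
    push CANARY                     ; [ebp-4] = salt, between saved EBP and the locals
    sub  esp, 16                    ; 16-byte buffer at [ebp-20]..[ebp-5]

    lea  edx, [ebp-20]              ; start of the buffer
    mov  ecx, 20                    ; deliberate bug: 20 bytes written into 16
@@:
    mov  byte ptr [edx], 'A'
    inc  edx
    dec  ecx
    jnz  @B

    cmp  dword ptr [ebp-4], CANARY  ; is the salt still intact?
    jne  corrupted
    mov  esp, ebp
    pop  ebp
    ret
corrupted:
    invoke MessageBox, 0, chr$("Stack problem detected"), chr$("Error"), MB_OK
    invoke ExitProcess, 1
Checked endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    call Checked
    inkey "ok"
    exit
end start
```

Changing the loop count to 16 keeps the write inside the buffer, so the proc should return normally. A fixed salt like this one is easy to guess, which is the reason for randomizing it.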

Also remember that you can change these settings anywhere in the source, so you can turn any feature or set of features on or off for a specific proc.
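For the zeroing feature in the list above, this is roughly what such a prologue has to do, written out by hand. It is only a sketch under assumptions: the proc name ZeroDemo and the sizes are made up, and the macro itself may do it differently. The last declared local has the lowest address, so filling upward from there covers all the locals.

```asm
include \masm32\include\masm32rt.inc

.code
ZeroDemo proc uses edi
    LOCAL buf[64]:BYTE, counter:DWORD
    lea  edi, counter           ; last declared local = lowest address ([ebp-68])
    xor  eax, eax
    mov  ecx, (64 + 4) / 4      ; total size of the locals, in dwords
    rep  stosd                  ; assumes DF = 0, as is normal on entry under Windows
    mov  eax, counter           ; the local is 0 before any of "our" code ran
    ret
ZeroDemo endp

start:
    call ZeroDemo
    print str$(eax), 13, 10     ; should print 0
    inkey
    exit
end start
```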



As for the stack check: if corruption is detected, a message box is displayed when the proc returns. It looks like this:
---------------------------
Error
---------------------------
Warning: in the proc 'start' there is the problem with the stack!   
This may bring to undesired results!   
Continue Executing? (NO - is recommended)
---------------------------
Yes   No   
---------------------------

So the user can choose to close the program or to continue (not recommended, but in some circumstances it may be important).


It may be a bit crude, so thoughts and suggestions are welcome :t
A simple test program is also included, with an example of a proc that leaves the stack imbalanced.

qWord

Quote from: Antariy on June 07, 2013, 11:00:04 PM
It:
• Probes the stack if the total size of the locals is greater than the default x86 page size (4096 bytes), which prevents silent crashes if a proc uses a lot of space for locals.

That's new to me  :icon_confused:
(Or are you talking about 64-bit processes?)
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

Quote from: Antariy on June 07, 2013, 11:00:04 PM
Here is the macro I mentioned earlier.
...
It may be a bit crude, so thoughts and suggestions are welcome :t
A simple test program is also included, with an example of a proc that leaves the stack imbalanced.

Great work, Alex :t


> thats new to me

For playing with the limits. 3930 is fine on Win7-32.

include \masm32\include\masm32rt.inc

crash=3930      ; try 3940

.code
Testme3 proc arg
LOCAL abc, buffer[4096+crash]:BYTE
  mov eax, arg
  mov buffer[0], al
  ret
Testme3 endp

Testme2 proc arg
LOCAL abc, buffer[4096]:BYTE
  invoke Testme3, arg
  ret
Testme2 endp

Testme1 proc arg
LOCAL abc, buffer[4096]:BYTE
  invoke Testme2, arg
  ret
Testme1 endp

start:   invoke Testme1, 123
   inkey "ok"
   exit

end start


Antariy

A strange error occurred when I posted: half of the message was lost, so I changed it a bit and explained things in more detail ::)

Antariy

Quote from: qWord on June 07, 2013, 11:18:19 PM
Quote from: Antariy on June 07, 2013, 11:00:04 PM
It:
• Probes the stack if the total size of the locals is greater than the default x86 page size (4096 bytes), which prevents silent crashes if a proc uses a lot of space for locals.
That's new to me  :icon_confused:
(Or are you talking about 64-bit processes?)

The default (ok, "default") page size for a 32-bit x86 CPU is 4096 bytes (1000h).


Thanks, Jochen :biggrin:

Quote from: jj2007 on June 07, 2013, 11:34:25 PM
For playing with the limits. 3930 is fine on Win7-32.

You get a crash only when accessing an uncommitted stack page, because the stack is laid out like this:


=== top of stack (address, in pages, let's say, 10) ===

some data in the stack

=== esp (address 5.5) ===

=== guard page (address 5) ===
=== no access space (address 4) ===


So you need to allocate between one and two pages on the stack and access the lowest variable to get a guaranteed crash. The exact size depends only on where ESP sits within its page at runtime and on the lowest committed page. The probing code touches the stack in page-sized steps, so it eventually hits the guard page, which lies below the current ESP and above the memory that is not committed yet. The system intercepts the guard page hit, understands that the stack needs to grow, and commits memory: the guard page becomes a committed page, the page below it becomes the new guard page instead of no-access, and so on.

So if the stack requirement grows sequentially and in steps that are not too big (no more than 1-2 pages), the system extends the stack fine. If the code allocates too much, so that the locals lie below the guard page, accessing that location causes an exception that the system treats as critical, because it accepts only guard page hits as legitimate stack growth. This is simply how the system is designed: it does not commit the full stack size for a thread until it is really required, which saves memory, but it needs a way to know when the code needs more stack, and this "committed / guard / no-access" technique is how it finds out.

This is nothing fancy: MS's compilers have this functionality built in. They probe the stack if a function has big locals (the programmer can turn this off, but the results become unpredictable: on some systems a program with, say, 4200 bytes of locals will work fine, on others it will crash, and if a function has more than two pages of locals it will crash for sure on every system). This macro set just implements, in a more or less similar manner, techniques that are industry standard on Windows ::) (stack probing and overflow checking).
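The page-step probing described above can be sketched by hand as follows. This is not the code from the attached macro, just an illustration under assumptions: the names PAGE_SIZE, LOCAL_BYTES and BigLocals are made up, and the loop assumes the locals span at least one full page.

```asm
include \masm32\include\masm32rt.inc

PAGE_SIZE   equ 4096
LOCAL_BYTES equ 3*PAGE_SIZE + 512       ; hypothetical: a bit over three pages of locals

.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

BigLocals proc
    push ebp
    mov  ebp, esp
    mov  ecx, LOCAL_BYTES / PAGE_SIZE   ; whole pages to step through (must be >= 1)
@@:
    sub  esp, PAGE_SIZE                 ; move down exactly one page
    test dword ptr [esp], eax           ; read-only touch; a guard page hit grows the stack
    dec  ecx
    jnz  @B
    sub  esp, LOCAL_BYTES mod PAGE_SIZE ; the remainder, less than one page
    mov  byte ptr [esp], 0              ; lowest local byte, safe to access now
    mov  esp, ebp
    pop  ebp
    ret
BigLocals endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    call BigLocals
    inkey "ok"
    exit
end start
```

Because no step is larger than one page, every access lands either in committed memory or in the current guard page, never in the no-access area below it.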

dedndave

Quote from: Antariy on June 07, 2013, 11:58:59 PM
(the programmer can turn this off, but the results become unpredictable: on some systems a program with, say, 4200 bytes of locals will work fine, on others it will crash, and if a function has more than two pages of locals it will crash for sure on every system)

yah, with a compiler, the programmer really doesn't know how much space is used in locals - lol
with assembly language, you do know   :t

for probe code, i would suggest something like this
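        ASSUME  FS:Nothing

    push    eax
    push    ecx
    mov     eax,esp
    mov     ecx,esp                         ;remember ESP so it can be restored
    sub     eax,<(size of locals) AND -4>   ;EAX = lowest address the locals will need
    .repeat
        push    ecx                         ;touch the dword just below ESP
        mov     esp,fs:[8]                  ;ESP = current stack limit from the TEB
    .until eax>=esp                         ;done once the limit is at or below EAX
    mov     esp,ecx
    pop     ecx
    pop     eax

        ASSUME  FS:ERROR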


on another note.....
one of the things that bothers me about the assembler prologue/epilogue is the order registers are push/pop'ed
MyFunc PROC USES EBX ESI EDI dwParm:DWORD

    mov     eax,dwParm
    ret

MyFunc ENDP

will generate code that looks like this
MyFunc PROC dwParm:DWORD

    push    ebp
    mov     ebp,esp
    push    ebx
    push    esi
    push    edi
    mov     eax,[ebp+8]
    pop     edi
    pop     esi
    pop     ebx
    leave
    ret     4

MyFunc ENDP


it would be better to do it this way
LEAVE is executed first (balance the stack), then pop the USES registers
MyFunc PROC dwParm:DWORD

    push    ebx
    push    esi
    push    edi
    push    ebp
    mov     ebp,esp
    mov     eax,[ebp+20]
    leave
    pop     edi
    pop     esi
    pop     ebx
    ret     4

MyFunc ENDP


that way, you can use whatever stack you like without balancing it
when the routine exits, the stack is balanced before you restore the USES registers
you simply have to adjust EBP offsets by the space used in USES
i don't know if you can do that with a macro   :P
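Done by hand rather than by a macro, the offset adjustment above can be sketched like this. The names NUSES and _dwParm are made up for the illustration; with the registers pushed before EBP, parameter number i (counting from 0) sits at [ebp+8+4*NUSES+4*i].

```asm
include \masm32\include\masm32rt.inc

NUSES   equ 3                                   ; registers pushed before EBP
_dwParm TEXTEQU <dword ptr [ebp+8+4*NUSES]>     ; first parameter, here [ebp+20]

.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

MyFunc proc dwParm:DWORD
    push ebx
    push esi
    push edi
    push ebp
    mov  ebp, esp
    push 1                      ; leave the stack deliberately unbalanced
    push 2
    mov  eax, _dwParm           ; not dwParm, which would still mean [ebp+8]
    leave                       ; ESP is rebalanced here, before the pops
    pop  edi
    pop  esi
    pop  ebx
    ret  4
MyFunc endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    invoke MyFunc, 1234
    print str$(eax), 13, 10     ; should print 1234
    inkey
    exit
end start
```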

qWord

Quote from: Antariy on June 07, 2013, 11:58:59 PM
If the code allocates too much, so that the locals lie below the guard page, accessing that location causes an exception that the system treats as critical, because it accepts only guard page hits as legitimate stack growth.
that makes sense - learning never stops.

Antariy

Quote from: dedndave on June 08, 2013, 12:32:32 AM
Quote from: Antariy on June 07, 2013, 11:58:59 PM
(the programmer can turn this off, but the results become unpredictable: on some systems a program with, say, 4200 bytes of locals will work fine, on others it will crash, and if a function has more than two pages of locals it will crash for sure on every system)

yah, with a compiler, the programmer really doesn't know how much space is used in locals - lol
with assembly language, you do know   :t

for probe code, i would suggest something like this
...
on another note.....
one of the things that bothers me about the assembler prologue/epilogue is the order registers are push/pop'ed
...
that way, you can use whatever stack you like without balancing it
when the routine exits, the stack is balanced before you restore the USES registers
you simply have to adjust EBP offsets by the space used in USES
i don't know if you can do that with a macro   :P


As for the runtime TEB values of the stack top and bottom: yes, I know about them. In 2010, in the old forum, I posted a program in the recursion/stack discussion thread; I don't remember what the attachment was called (the file is TestStack.asm).
Here is the code:

include \masm32\include\masm32rt.inc

.data?
startFlag dd ?
numThreads dd ?
.code

AxGetStackBottom MACRO thereg:REQ, theallocationsize
ASSUME fs:NOTHING
mov thereg, fs:[4]
ifdif <theallocationsize>,<>
sub thereg,theallocationsize
add thereg,1024*32
else
sub thereg,(1024*1024-(1024*32))
endif
ASSUME fs:ERROR
EXITM<thereg>
ENDM

; First parameter is pointer to a structure:
; sampleControl struct
; aThreadNumber dd ?
; dwStackSizeEntered dd ?
; sampleControl ends
; but in main function this structure created with simple pushs - to be local for each thread,
; and for simpleness
; Second parameter is a pointer to a DWORD, which is incremented with each successful recursive
; calling to the next level

RecursiveFunction proc dwParam1:DWORD, dwParam2:DWORD
LOCAL bigbuffer[4000]:BYTE

mov ecx,dwParam1 ; get pointer to 2 DWORD - first is thread num, second is its stack size
mov ecx,[ecx+4]

cmp esp,AxGetStackBottom(eax,ecx)
jbe @tooLowStackLevel

mov ecx,dwParam2
inc dword ptr [ecx]
invoke RecursiveFunction,dwParam1,dwParam2
@tooLowStackLevel:
ret

RecursiveFunction endp

TheThread proc lpVoid:DWORD
lock inc dword ptr numThreads ; lock for case if you too fastly enter values :P

@@:
invoke Sleep,100
cmp startFlag,0
jz @B

mov ecx,lpVoid
mov ecx,[ecx]
invoke crt_printf,CTXT("Hi, this is thread #%u, starting recursive function...",10),ecx

push 0
invoke RecursiveFunction,lpVoid,esp

mov eax,lpVoid
mov eax,[eax]
invoke crt_printf,CTXT("The thread #%u do %u recursive calls (limited to the stack size only)",10),eax
lock dec dword ptr numThreads ; ...the same :P
pop edx
ret


TheThread endp


start:



invoke crt_printf,CTXT("Hi, this is testing program",10,"Enter a number of threads to test: ")
push eax
invoke crt_scanf,CTXT("%u"),esp
pop ebx
mov esi,1

push eax
@@:
invoke crt_printf,CTXT("Enter a stack size for the thread #%u: "),esi
push eax
invoke crt_scanf,CTXT("%u"),esp
mov ecx,[esp] ; thread stack size a second parameter of the structure
push esi ; thread number is a first parameter of the structure
mov edx,esp
push eax
invoke CreateThread,0,ecx,offset TheThread,edx,0,esp
pop eax
inc esi
cmp esi,ebx
jbe @B


or startFlag,1

@@:
invoke Sleep,100
cmp numThreads,0
jnz @B

lea esp,[esp+ebx*8+4]

invoke crt_printf,CTXT("All threads are finished, find and press [Any] key to exit..."),esi
invoke crt__getch

ret
end start


But I actually find the way I used simpler. Your way is good, too :t Maybe it's worth changing to it (that's why the probing code is a separate proc rather than inlined code).

As for the pushes/pops: I'm afraid it is not possible. At least as I understand it, the assembler only supports a "displacement" for values between ebp and the locals. It's the same thing I used to position the "salt": it lies between ebp and the locals, so the first local dword, say, is referred to not as [ebp-4] but as [ebp-8]. That displacement is the value the prologue MACRO returns as the result of the macro function (localbytes+4). But there are no documented values that shift things in the "up" direction (i.e. [ebp+12] instead of [ebp+8], for example). I consider this an omission in the design, too: it would be very useful to be able to change the placement of the USES pushes and the locals.

But we can of course do it another way. I thought about it, too (this macro set was even unfinished until today), but the other question is: if the code goes so wrong that it really does crazy things like messing up registers, is it really necessary to make things "look like" all is OK and restore the execution state? Isn't it better to warn and/or terminate? Code that has gone crazy may already be unpredictable, the data may be corrupted, and so on. And the main point: in a well designed and tested program there is no possibility of a stack imbalance. That may be an accidental thing, and it can be caught at the testing stage (using these or similar prologue macros); in a "release build", a mess like trashed regs is not as frequent or as important as buffer overruns are, for instance. The point is: if something is wrong with the code and it has already been flagged as buggy, the reg values that were saved above the locals (or below) may be trashed as well. We cannot be sure, and that is the dangerous thing: silently restoring the state with values we cannot even be sure are proper, and continuing execution.

An additional note: if we really need to preserve the non-volatile regs across a call, and if we assume EBP is the one reg that will not be trashed in the called code (we assume so because we need a proper EBP in the buggy called code anyway to restore ESP and, with it, the execution state), then we can simply save the regs in the locals of the calling proc :biggrin: and restore them after the call (and check, and warn, and anything else).
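A minimal sketch of that idea, for one register only. The procs Suspect and Caller are hypothetical and only stand in for a buggy callee and a careful caller:

```asm
include \masm32\include\masm32rt.inc

.code
Suspect proc                    ; hypothetical callee that trashes EBX and never restores it
    mov  ebx, 0DEADBEEFh
    ret
Suspect endp

Caller proc
    LOCAL savedEbx:DWORD
    mov  savedEbx, ebx          ; keep a copy in our own frame
    call Suspect
    cmp  ebx, savedEbx          ; check ...
    je   @F
    print "EBX was trashed by the callee, restoring", 13, 10   ; ... warn ...
    mov  ebx, savedEbx          ; ... and restore
@@:
    ret
Caller endp

start:
    mov  ebx, 33333333h
    call Caller
    print hex$(ebx), 9, "ebx", 13, 10   ; should show 33333333 again
    inkey
    exit
end start
```

The copy lives in the caller's frame, so it survives as long as the callee leaves EBP and the caller's locals alone, which is exactly the assumption stated above.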

I.e. the synopsis is: if something really wants to change the behaviour of our code or crash it, it will crash it; and if there is nothing like that, what are such complicated restoration methods needed for? The restored state is not 100% guaranteed to be the same as it was before. Something may even change the data in the .data section, may overwrite data on the stack above the locals of the current proc, and so on. If we assume the code has gone crazy, the regs were not trashed for no reason: something intentionally or accidentally made a mess of our code/data execution state, maybe even an external function from a DLL. And an improperly functioning program is worse than a fully non-functioning (crashed, warned, terminated) one.


What is your suggestion, Dave? It may be possible to duplicate the USES regs between ebp and the locals, but that would not be a beautiful solution, since the assembler gives no other way to do something like this, and, again, buffer overruns would trash the preserved values. It is probably better to declare the LOCALs as usual but preserve the regs with manual pushes instead of USES: that would at least put the reg values below the locals (though this does not help much if a huge buffer overrun has occurred).


Quote from: qWord on June 08, 2013, 12:55:15 AM
that makes sense - learning never stops.

You are perfectly right, qWord :t

dedndave

i really don't have a suggestion, Alex
i am not much of a macro guy - lol

i quite often write my own stack frame so i can get what i want,
then i alias the names with TEXTEQU, using an underscore character
        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

GetFirstControlPoints PROC uKnotQty:UINT,lpKnotArray:LPVOID,lpResArray:LPVOID

;-----------------------------------------

_lpResArray   TEXTEQU <dword ptr [ebp+20]>  ;pointer to Res (result) array
_lpKnotArray  TEXTEQU <dword ptr [ebp+16]>  ;pointer to Knot array
_uKnotQty     TEXTEQU <dword ptr [ebp+12]>  ;knot point qty
;                                [ebp+8]    ;RETurn address
;                                [ebp+4]    ;saved EBX contents
;                                [ebp]      ;saved EBP contents

;-----------------------------------------

        push    ebx
        push    ebp
        mov     ebp,esp


i can leave the stack in whatever unbalanced state i like, then...
        leave
        pop     ebx
        ret     12

GetFirstControlPoints ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

jj2007

Here is a similar option. It uses a standard stack frame but provides an extra large buffer:

include \masm32\MasmBasic\MasmBasic.inc        ; download

.code
MyProc proc uses esi edi ebx arg        ; ### StackBuffer example ###
LOCAL pBuffer, whatever
  ClearLocals
  mov pBuffer, StackBuffer(100000)        ; 16-byte aligned & probed
  Print "pBuffer=", Hex$(pBuffer)
  StackBuffer()  ; release buffer
  ret
MyProc endp

        Init
        mov ecx, esp
        mov esi, 111111111
        mov ebx, 222222222
        invoke MyProc, 12345h
        sub ecx, esp
        deb 4, "Out", esi, ebx, ecx
        Inkey
        Exit
end start


Output:
pBuffer=001178B0
Out
esi             111111111
ebx             222222222
ecx             0


However, it's not yet the solution that Dave has in mind, i.e. leaving without caring for the stack...


jj2007

Excellent links, Dave :t
And it works:

include \masm32\include\masm32rt.inc
include JJLogue.inc

.code
mytest proc uses esi edi ebx arg1, arg2
LOCAL abc, rc:RECT, def, ghi, jkl
  push 123                ; let's play foul...
  push 456
  push 789
  mov abc, 12345678h
  print hex$(arg1), 9, "arg1", 13, 10
  print hex$(arg2), 9, "arg2", 13, 10
  print hex$(abc), 9, "abc", 13, 10, 10
  push 123                ; let's play foul...
  push 456
  push 789
  ret
mytest endp

start:
        mov esi, 11111111h
        mov edi, 22222222h
        mov ebx, 33333333h
        invoke mytest, 123h, 456h
        print hex$(esi), 9, "esi", 13, 10
        print hex$(edi), 9, "edi", 13, 10
        print hex$(ebx), 9, "ebx", 13, 10
        inkey
        exit
end start


Output:
00000123        arg1
00000456        arg2
12345678        abc

11111111        esi
22222222        edi
33333333        ebx

:biggrin:

Antariy

Quote from: dedndave on June 08, 2013, 04:32:10 AM
these might be of help

it may be possible to do USES push first   :P

http://msdn.microsoft.com/en-us/library/4zc781yh%28v=vs.80%29.aspx
http://read.pudn.com/downloads149/doc/642485/MASM613/INCLUDE/PROLOGUE.INC__.htm

Yes, we can do the pushes before or after the stack allocation, but if we do them before the stack frame setup, the assembler still has no way of being told that the function's params now lie higher than usual.
Thank you for the links, Dave :t It seems I missed the FORCEFRAME option.

But the way Jochen used works :t Even if we cannot make the assembler refer to "higher ebp" values, we can still use an inter-macro (non-local) variable: a nice and simple solution :biggrin:

ToutEnMasm

I don't see what the problem is with a proc that uses more than one page of stack memory.
Perhaps someone could enlighten me or give a sample that has a real problem with that?

sample:

;################################################################
Big_stack_proc PROC
Local bigone[1000h]:DWORD
Local  retour:DWORD
         mov retour,1            ;access violation without correction

FindeBig_stack_proc:
         mov eax,retour
         ret
Big_stack_proc endp


Solution: add this to your code

option dotname
.drectve  segment info
    db "-stack:0x100000,0x5000 "
.drectve ends

and the problem is gone.
.drectve is a special object-file section used to pass directives to the linker.
The same thing can be done with "link /STACK:0x100000,0x5000"
Fa is a musical note to play with CL

MichaelW

Quote from: ToutEnMasm on June 08, 2013, 03:01:50 PM
I don't see what the problem is with a proc that uses more than one page of stack memory.
Perhaps someone could enlighten me or give a sample that has a real problem with that?

It depends on the value in ESP at procedure entry. This is obviously a contrived example, but the problem can occur when ESP is anywhere in the bottom page of the stack.

;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
.data
.code
;==============================================================================
Proc1 proc
    LOCAL array[4096]:BYTE
    mov al, array[0]
    ret
Proc1 endp
;==============================================================================
start:
;==============================================================================
    assume fs:NOTHING
    mov ebx, fs:[8]
    printf("Current bottom of stack: %Xh\n", ebx)
    add ebx, 4
    mov esp, ebx
    call Proc1
    mov ebx, fs:[8]
    printf("Current bottom of stack: %Xh\n\n", ebx)
    inkey
    exit
;==============================================================================
end start


From the listing, with comments added:

Proc1 proc
    LOCAL array[4096]:BYTE
    push ebp                ; access into guard page
    mov ebp, esp
    add esp, 0FFFFF000h
    mov al, array[0]        ; access into new guard page
    leave
    ret
Proc1 endp


Current bottom of stack: 12E000h
Current bottom of stack: 12C000h


If I change the LOCAL allocation to 4096*2 then the app crashes because the second access skips over the new guard page and accesses uncommitted stack space.
Well Microsoft, here's another nice mess you've gotten us into.