Reviving two old threads (zerolocals (http://www.masmforum.com/board/index.php?topic=12914.0) and "About LOCAL string" (http://www.masmforum.com/board/index.php?topic=12306.msg94399#msg94399)): Can I have some timings please?
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
clearing 8000 local bytes ...
1020 cycles for ZEROLOCALES, eax+ecx+edx trashed
6106 cycles for cll, eax trashed
4023 cycles for ClearLocVars macro, eax trashed
4039 cycles for call ClearLocalsB, eax trashed
4023 cycles for call ClearLocals, ALL regs preserved (old MB)
4042 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
clearing 8000 local bytes ...
821 cycles for ZEROLOCALES, eax+ecx+edx trashed
2095 cycles for cll, eax trashed
3976 cycles for ClearLocVars macro, eax trashed
4006 cycles for call ClearLocalsB, eax trashed
3953 cycles for call ClearLocals, ALL regs preserved (old MB)
4364 cycles for ClearLocalVariables, ALL regs preserved (new MB)
778 cycles for ZEROLOCALES, eax+ecx+edx trashed
2062 cycles for cll, eax trashed
3975 cycles for ClearLocVars macro, eax trashed
3981 cycles for call ClearLocalsB, eax trashed
3981 cycles for call ClearLocals, ALL regs preserved (old MB)
4044 cycles for ClearLocalVariables, ALL regs preserved (new MB)
795 cycles for ZEROLOCALES, eax+ecx+edx trashed
2245 cycles for cll, eax trashed
3954 cycles for ClearLocVars macro, eax trashed
3973 cycles for call ClearLocalsB, eax trashed
3968 cycles for call ClearLocals, ALL regs preserved (old MB)
4249 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Jochen,
your timings:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
clearing 8000 local bytes ...
876 cycles for ZEROLOCALES, eax+ecx+edx trashed
1837 cycles for cll, eax trashed
3576 cycles for ClearLocVars macro, eax trashed
3588 cycles for call ClearLocalsB, eax trashed
3535 cycles for call ClearLocals, ALL regs preserved (old MB)
3544 cycles for ClearLocalVariables, ALL regs preserved (new MB)
697 cycles for ZEROLOCALES, eax+ecx+edx trashed
1807 cycles for cll, eax trashed
3525 cycles for ClearLocVars macro, eax trashed
3556 cycles for call ClearLocalsB, eax trashed
3538 cycles for call ClearLocals, ALL regs preserved (old MB)
3559 cycles for ClearLocalVariables, ALL regs preserved (new MB)
709 cycles for ZEROLOCALES, eax+ecx+edx trashed
1814 cycles for cll, eax trashed
3573 cycles for ClearLocVars macro, eax trashed
3575 cycles for call ClearLocalsB, eax trashed
3564 cycles for call ClearLocals, ALL regs preserved (old MB)
3564 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Gunther
Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4) (note: Overclocked to 4.2Ghz)
clearing 8000 local bytes ...
668 cycles for ZEROLOCALES, eax+ecx+edx trashed
1686 cycles for cll, eax trashed
3294 cycles for ClearLocVars macro, eax trashed
3308 cycles for call ClearLocalsB, eax trashed
3285 cycles for call ClearLocals, ALL regs preserved (old MB)
3302 cycles for ClearLocalVariables, ALL regs preserved (new MB)
646 cycles for ZEROLOCALES, eax+ecx+edx trashed
1680 cycles for cll, eax trashed
3275 cycles for ClearLocVars macro, eax trashed
3301 cycles for call ClearLocalsB, eax trashed
3277 cycles for call ClearLocals, ALL regs preserved (old MB)
3300 cycles for ClearLocalVariables, ALL regs preserved (new MB)
648 cycles for ZEROLOCALES, eax+ecx+edx trashed
1678 cycles for cll, eax trashed
3280 cycles for ClearLocVars macro, eax trashed
3296 cycles for call ClearLocalsB, eax trashed
3279 cycles for call ClearLocals, ALL regs preserved (old MB)
3295 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Thanks, Sinsi & Gunther & fearless. I have changed a few bytes in version 2, increasing the number of bytes to be cleared to 80k, and observed that for a buffer that large
mov dword ptr [ebp+4*ecx], 0
is 10% faster than
and dword ptr [ebp+4*ecx], 0
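For readers who have not opened the attachment, here is a minimal sketch of the kind of loop being compared. It is not the testbed code: ClearDemo and BufBytes are made-up names, a standard Masm32 SDK install is assumed, and the buffer is kept at 400 bytes so the demo stays inside the already committed stack page. One plausible reason for the difference (an assumption, not something measured in this thread) is that "and" has to read the dword before writing it back, while "mov" only writes; on the other hand the "mov" form is 3 bytes longer (8 vs. 5 bytes).

```asm
include \masm32\include\masm32rt.inc    ; assumes a standard Masm32 SDK install

BufBytes = 400                          ; hypothetical size, small on purpose

.code
ClearDemo proc
LOCAL buffer[BufBytes]:BYTE             ; locals occupy [ebp-BufBytes, ebp)
    mov ecx, -(BufBytes/4)              ; negative dword index, counts up to zero
@@: mov dword ptr [ebp+4*ecx], 0        ; swap in "and dword ptr [ebp+4*ecx], 0" to compare
    inc ecx
    jnz @B                              ; last store goes to [ebp-4]
    ret
ClearDemo endp

start:
    call ClearDemo
    exit
end start
```

Only ecx is changed by the loop; the index runs from -(BufBytes/4) up to -1, so nothing at or above ebp is touched.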
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
clearing 81920 local bytes ...
23384 cycles for ZEROLOCALES, eax+ecx+edx trashed
41014 cycles for cll, eax trashed
61512 cycles for ClearLocVars macro, eax trashed
41110 cycles for call ClearLocalsB, eax trashed
41063 cycles for call ClearLocals, ALL regs preserved (old MB)
41073 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4)
clearing 81920 local bytes ...
6977 cycles for ZEROLOCALES, eax+ecx+edx trashed
18441 cycles for cll, eax trashed
50324 cycles for ClearLocVars macro, eax trashed
33505 cycles for call ClearLocalsB, eax trashed
33308 cycles for call ClearLocals, ALL regs preserved (old MB)
17753 cycles for ClearLocalVariables, ALL regs preserved (new MB)
6605 cycles for ZEROLOCALES, eax+ecx+edx trashed
17721 cycles for cll, eax trashed
50412 cycles for ClearLocVars macro, eax trashed
33346 cycles for call ClearLocalsB, eax trashed
33358 cycles for call ClearLocals, ALL regs preserved (old MB)
17735 cycles for ClearLocalVariables, ALL regs preserved (new MB)
6609 cycles for ZEROLOCALES, eax+ecx+edx trashed
17728 cycles for cll, eax trashed
50162 cycles for ClearLocVars macro, eax trashed
33556 cycles for call ClearLocalsB, eax trashed
33300 cycles for call ClearLocals, ALL regs preserved (old MB)
17731 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Code sizes:
MyTest1 = 24
MyTest2 = 21
MyTest3 = 13
MyTest4 = 8
MyTest5 = 8
MyTest6 = 8
plus once for procs (call ClearLoc*):
ClearLocalsB = 17
ClearLocVars = 64
ClearLocals = 19
Hi,
Measurements from three older processors. Interesting that
the timings vary less than 0.5% for the second one.
pre-P4 (SSE1)
clearing 81920 local bytes ...
31197 cycles for ZEROLOCALES, eax+ecx+edx trashed
41399 cycles for cll, eax trashed
62224 cycles for ClearLocVars macro, eax trashed
41436 cycles for call ClearLocalsB, eax trashed
41474 cycles for call ClearLocals, ALL regs preserved (old MB)
41589 cycles for ClearLocalVariables, ALL regs preserved (new MB)
31155 cycles for ZEROLOCALES, eax+ecx+edx trashed
41436 cycles for cll, eax trashed
62100 cycles for ClearLocVars macro, eax trashed
41535 cycles for call ClearLocalsB, eax trashed
41452 cycles for call ClearLocals, ALL regs preserved (old MB)
41455 cycles for ClearLocalVariables, ALL regs preserved (new MB)
31256 cycles for ZEROLOCALES, eax+ecx+edx trashed
41477 cycles for cll, eax trashed
62134 cycles for ClearLocVars macro, eax trashed
41688 cycles for call ClearLocalsB, eax trashed
41778 cycles for call ClearLocals, ALL regs preserved (old MB)
41769 cycles for ClearLocalVariables, ALL regs preserved (new MB)
pre-P4
clearing 81920 local bytes ...
260560 cycles for ZEROLOCALES, eax+ecx+edx trashed
259603 cycles for cll, eax trashed
259593 cycles for ClearLocVars macro, eax trashed
259489 cycles for call ClearLocalsB, eax trashed
259370 cycles for call ClearLocals, ALL regs preserved (old MB)
259537 cycles for ClearLocalVariables, ALL regs preserved (new MB)
259917 cycles for ZEROLOCALES, eax+ecx+edx trashed
259498 cycles for cll, eax trashed
259613 cycles for ClearLocVars macro, eax trashed
259542 cycles for call ClearLocalsB, eax trashed
259394 cycles for call ClearLocals, ALL regs preserved (old MB)
260441 cycles for ClearLocalVariables, ALL regs preserved (new MB)
259784 cycles for ZEROLOCALES, eax+ecx+edx trashed
259419 cycles for cll, eax trashed
259618 cycles for ClearLocVars macro, eax trashed
259628 cycles for call ClearLocalsB, eax trashed
260104 cycles for call ClearLocals, ALL regs preserved (old MB)
259610 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
clearing 81920 local bytes ...
20912 cycles for ZEROLOCALES, eax+ecx+edx trashed
41490 cycles for cll, eax trashed
41619 cycles for ClearLocVars macro, eax trashed
41554 cycles for call ClearLocalsB, eax trashed
41613 cycles for call ClearLocals, ALL regs preserved (old MB)
41542 cycles for ClearLocalVariables, ALL regs preserved (new MB)
20843 cycles for ZEROLOCALES, eax+ecx+edx trashed
41566 cycles for cll, eax trashed
41487 cycles for ClearLocVars macro, eax trashed
41507 cycles for call ClearLocalsB, eax trashed
41530 cycles for call ClearLocals, ALL regs preserved (old MB)
41524 cycles for ClearLocalVariables, ALL regs preserved (new MB)
20849 cycles for ZEROLOCALES, eax+ecx+edx trashed
41524 cycles for cll, eax trashed
41519 cycles for ClearLocVars macro, eax trashed
41546 cycles for call ClearLocalsB, eax trashed
41551 cycles for call ClearLocals, ALL regs preserved (old MB)
41546 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Regards,
Steve N.
What version of Windows are we all running? Timings for similar CPUs are strange.
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
clearing 81920 local bytes ...
8057 cycles for ZEROLOCALES, eax+ecx+edx trashed
20137 cycles for cll, eax trashed
56446 cycles for ClearLocVars macro, eax trashed
37108 cycles for call ClearLocalsB, eax trashed
37045 cycles for call ClearLocals, ALL regs preserved (old MB)
19741 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Jochen,
version B timings:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
clearing 81920 local bytes ...
7765 cycles for ZEROLOCALES, eax+ecx+edx trashed
19350 cycles for cll, eax trashed
54183 cycles for ClearLocVars macro, eax trashed
36050 cycles for call ClearLocalsB, eax trashed
36034 cycles for call ClearLocals, ALL regs preserved (old MB)
19148 cycles for ClearLocalVariables, ALL regs preserved (new MB)
7476 cycles for ZEROLOCALES, eax+ecx+edx trashed
19529 cycles for cll, eax trashed
54103 cycles for ClearLocVars macro, eax trashed
35980 cycles for call ClearLocalsB, eax trashed
36065 cycles for call ClearLocals, ALL regs preserved (old MB)
19185 cycles for ClearLocalVariables, ALL regs preserved (new MB)
7248 cycles for ZEROLOCALES, eax+ecx+edx trashed
19233 cycles for cll, eax trashed
54129 cycles for ClearLocVars macro, eax trashed
35991 cycles for call ClearLocalsB, eax trashed
36227 cycles for call ClearLocals, ALL regs preserved (old MB)
19101 cycles for ClearLocalVariables, ALL regs preserved (new MB)
Code sizes:
MyTest1 = 24
MyTest2 = 21
MyTest3 = 13
MyTest4 = 8
MyTest5 = 8
MyTest6 = 8
plus once for procs (call ClearLoc*):
ClearLocalsB = 17
ClearLocVars = 64
ClearLocals = 19
Japheth would say: Aha, the cycle counter brigade. :lol:
Gunther
Quote from: Gunther on July 22, 2013, 11:51:01 PM
Japheth would say: Aha, the cycle counter brigade. :lol:
Pssst...
@sinsi: I've added GetVersionEx;
6.1. is Win7 (32-bit)
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
6.1. Service Pack 1
clearing 81920 local bytes ...
23375 cycles for ZEROLOCALES, eax+ecx+edx trashed
61559 cycles for cll, eax trashed
41031 cycles for ClearLocVars macro, eax trashed
41058 cycles for call ClearLocalsB, eax trashed
41062 cycles for call ClearLocals, ALL regs preserved (old MB)
41073 cycles for ClearLocalVariables, ALL regs preserved (new MB)
23428 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
23319 cycles for ZEROLOCALES, eax+ecx+edx trashed
61735 cycles for cll, eax trashed
41053 cycles for ClearLocVars macro, eax trashed
41054 cycles for call ClearLocalsB, eax trashed
41031 cycles for call ClearLocals, ALL regs preserved (old MB)
41073 cycles for ClearLocalVariables, ALL regs preserved (new MB)
23549 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
clearing 100 local bytes ...
31 cycles for ZEROLOCALES, eax+ecx+edx trashed
92 cycles for cll, eax trashed
93 cycles for ClearLocVars macro, eax trashed
86 cycles for call ClearLocalsB, eax trashed
71 cycles for call ClearLocals, ALL regs preserved (old MB)
92 cycles for ClearLocalVariables, ALL regs preserved (new MB)
51 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
31 cycles for ZEROLOCALES, eax+ecx+edx trashed
92 cycles for cll, eax trashed
93 cycles for ClearLocVars macro, eax trashed
86 cycles for call ClearLocalsB, eax trashed
71 cycles for call ClearLocals, ALL regs preserved (old MB)
92 cycles for ClearLocalVariables, ALL regs preserved (new MB)
50 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3) 5.1. Service Pack 3
clearing 81920 local bytes ...
14373 cycles for ZEROLOCALES, eax+ecx+edx trashed
41079 cycles for cll, eax trashed
41043 cycles for ClearLocVars macro, eax trashed
41061 cycles for call ClearLocalsB, eax trashed
41203 cycles for call ClearLocals, ALL regs preserved (old MB)
41103 cycles for ClearLocalVariables, ALL regs preserved (new MB)
14344 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
14277 cycles for ZEROLOCALES, eax+ecx+edx trashed
41024 cycles for cll, eax trashed
41087 cycles for ClearLocVars macro, eax trashed
41182 cycles for call ClearLocalsB, eax trashed
41051 cycles for call ClearLocals, ALL regs preserved (old MB)
41143 cycles for ClearLocalVariables, ALL regs preserved (new MB)
14373 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
clearing 100 local bytes ...
74 cycles for ZEROLOCALES, eax+ecx+edx trashed
80 cycles for cll, eax trashed
105 cycles for ClearLocVars macro, eax trashed
73 cycles for call ClearLocalsB, eax trashed
64 cycles for call ClearLocals, ALL regs preserved (old MB)
112 cycles for ClearLocalVariables, ALL regs preserved (new MB)
93 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
74 cycles for ZEROLOCALES, eax+ecx+edx trashed
80 cycles for cll, eax trashed
105 cycles for ClearLocVars macro, eax trashed
76 cycles for call ClearLocalsB, eax trashed
64 cycles for call ClearLocals, ALL regs preserved (old MB)
112 cycles for ClearLocalVariables, ALL regs preserved (new MB)
93 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
Another run:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4) 6.1. Service Pack 1
clearing 81920 local bytes ...
10300 cycles for ZEROLOCALES, eax+ecx+edx trashed
19305 cycles for cll, eax trashed
36224 cycles for ClearLocVars macro, eax trashed
36317 cycles for call ClearLocalsB, eax trashed
35946 cycles for call ClearLocals, ALL regs preserved (old MB)
19543 cycles for ClearLocalVariables, ALL regs preserved (new MB)
7278 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
7207 cycles for ZEROLOCALES, eax+ecx+edx trashed
19267 cycles for cll, eax trashed
35998 cycles for ClearLocVars macro, eax trashed
35925 cycles for call ClearLocalsB, eax trashed
36074 cycles for call ClearLocals, ALL regs preserved (old MB)
19142 cycles for ClearLocalVariables, ALL regs preserved (new MB)
7183 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
clearing 100 local bytes ...
38 cycles for ZEROLOCALES, eax+ecx+edx trashed
43 cycles for cll, eax trashed
66 cycles for ClearLocVars macro, eax trashed
71 cycles for call ClearLocalsB, eax trashed
52 cycles for call ClearLocals, ALL regs preserved (old MB)
64 cycles for ClearLocalVariables, ALL regs preserved (new MB)
57 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
43 cycles for ZEROLOCALES, eax+ecx+edx trashed
49 cycles for cll, eax trashed
73 cycles for ClearLocVars macro, eax trashed
71 cycles for call ClearLocalsB, eax trashed
54 cycles for call ClearLocals, ALL regs preserved (old MB)
65 cycles for ClearLocalVariables, ALL regs preserved (new MB)
57 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
Gunther
Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4) 6.1. Service Pack 1 (Win7 64bit CPU OC'd to 4.2Ghz)
clearing 81920 local bytes ...
7388 cycles for ZEROLOCALES, eax+ecx+edx trashed
18261 cycles for cll, eax trashed
34105 cycles for ClearLocVars macro, eax trashed
34369 cycles for call ClearLocalsB, eax trashed
34281 cycles for call ClearLocals, ALL regs preserved (old MB)
18126 cycles for ClearLocalVariables, ALL regs preserved (new MB)
6764 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
6772 cycles for ZEROLOCALES, eax+ecx+edx trashed
18056 cycles for cll, eax trashed
33844 cycles for ClearLocVars macro, eax trashed
33946 cycles for call ClearLocalsB, eax trashed
33878 cycles for call ClearLocals, ALL regs preserved (old MB)
18044 cycles for ClearLocalVariables, ALL regs preserved (new MB)
6760 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
clearing 100 local bytes ...
37 cycles for ZEROLOCALES, eax+ecx+edx trashed
43 cycles for cll, eax trashed
67 cycles for ClearLocVars macro, eax trashed
62 cycles for call ClearLocalsB, eax trashed
46 cycles for call ClearLocals, ALL regs preserved (old MB)
65 cycles for ClearLocalVariables, ALL regs preserved (new MB)
51 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
38 cycles for ZEROLOCALES, eax+ecx+edx trashed
49 cycles for cll, eax trashed
64 cycles for ClearLocVars macro, eax trashed
67 cycles for call ClearLocalsB, eax trashed
45 cycles for call ClearLocals, ALL regs preserved (old MB)
55 cycles for ClearLocalVariables, ALL regs preserved (new MB)
49 cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
I have nothing to add to the speed contest here; I would just like to post what I am using for clearing locals: :idea:
; *********************************************************
; ZEROLOCALS NameofProc
; *********************************************************
ZEROLOCALS MACRO Subroutine
mov ecx, ebp
lea eax, [esp+(($-Subroutine)-8)*4]
sub ecx, eax
invokeA RtlZeroMemory,eax,ecx
ENDM
And a proc if size matters more than speed (it handles not only MASM's add esp, -n but also sub esp, +n):
;ZeroSubVars(pProc:POINTER)
mov eax, [esp+4]
mov ecx, dword ptr [eax+5]
.if (byte ptr [eax+4] == 0C4h)
neg ecx
.endif
.if (byte ptr [eax+3] == 83h)
movzx ecx, cl
.endif
mov edx, edi
mov edi, ecx
sub edi, ebp
neg edi
shr ecx, 2
xor eax, eax
rep stosd
mov edi, edx
ret 4
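To make the byte tests in the proc above easier to follow, here is a small sketch that runs the same decoding on two hand-assembled prologues instead of on live code. GetLocalSize, PrologML and PrologJW are hypothetical names, the byte sequences are the standard encodings of push ebp / mov ebp, esp followed by add esp, -8 (83 C4 F8) or sub esp, 8 (83 EC 08), and a standard Masm32 SDK install is assumed.

```asm
include \masm32\include\masm32rt.inc    ; assumes a standard Masm32 SDK install

.data
; offset:      0    1    2     3    4     5     padding so the dword read stays in bounds
PrologML  db 55h, 8Bh, 0ECh, 83h, 0C4h, 0F8h, 0, 0, 0   ; ... add esp, -8 (short form)
PrologJW  db 55h, 8Bh, 0ECh, 83h, 0ECh, 08h,  0, 0, 0   ; ... sub esp, 8  (short form)

.code
; in:  eax -> first byte of a prologue; out: ecx = size of the locals in bytes
GetLocalSize proc
    mov ecx, dword ptr [eax+5]          ; immediate (only the low byte counts for the short form)
    .if byte ptr [eax+4] == 0C4h        ; C4 = add esp, so the immediate is negative
        neg ecx
    .endif
    .if byte ptr [eax+3] == 83h         ; 83 = imm8 form, keep just the low byte
        movzx ecx, cl
    .endif
    ret
GetLocalSize endp

start:
    mov eax, offset PrologML
    call GetLocalSize
    print str$(ecx), 13, 10             ; expected: 8
    mov eax, offset PrologJW
    call GetLocalSize
    print str$(ecx), 13, 10             ; expected: 8
    exit
end start
```

With opcode 81h at offset 3 (the long form) the full dword at offset 5 is used, which is why the proc reads a dword first and narrows it afterwards.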
Hi Ficko,
Looks remarkably close to my version (see attachment above, line 385ff):
align 16
ClearLocVars2 proc ; ### 22.7.13, 62 bytes ###
push eax
push edi
push ecx
mov eax, [esp+12] ; get EIP
sub eax, 9 ; start before sub/add esp and the call; no add/sub->int 77
lea ecx, [eax-127] ; set a limit
@@:
dec eax
cmp eax, ecx ; -17 works for 4*uses and long sub/add esp
js cLerr
cmp word ptr [eax], 0EC8Bh ; mov ebp, esp - PoAsm uses 0E589h and sub, nn
jne @B
cmp byte ptr [eax+2], 81h ; short add/sub esp?
mov ecx, [eax+4] ; long
je @F
; cmp byte ptr [eax+ecx+2], 83h ; can only occur if there are no locals
; jne cLerr
movsx ecx, cl ; short, sign-extend
@@:
mov edi, ecx
sar ecx, 2 ; stack is always dword-aligned, so divide by 4 is OK
js @F
neg ecx ; JWasm uses sub esp, nn
neg edi ; ML uses add esp, -nn
@@:
xor eax, eax
neg ecx
add edi, ebp
rep stosd
pop ecx
pop edi
pop eax
retn
cLerr:
int 77h ; set a really strange marker in case of crash
ClearLocVars2 endp
Your version needs a
push offset whateverproc ; 5 bytes extra
call ZeroSubVars
How do you deal with uses esi edi ebx?
Quote from: jj2007 on July 23, 2013, 04:39:39 AM
Hi Ficko,
Looks remarkably close to my version (see attachment above, line 385ff):
Indeed. :shock:
Quote
How do you deal with uses esi edi ebx?
I generally write my programs in high-level languages and use static libraries with procs written in JWASM.
So I have to adhere to certain conventions, which means I do not care about preserving eax, ecx, edx the way you do.
My macro above should of course be the first item after the declarations, so preserving edi, esi, ebx happens naturally with "uses".
Test proc public uses edi esi ebx param00:dword
LOCAL .......
ZEROLOCALS Test
...
..
- If that is what you were asking for. -
The "ZeroSubVars" proc trashes only "edi", which is saved into "edx" temporarily.
Quote from: Ficko on July 23, 2013, 09:18:07 PM
The "ZeroSubVars" proc trashes only "edi", which is saved into "edx" temporarily.
I'm trying to test your version:
push offset MyProc
call ZeroSubVars
But mov eax, [esp+4] yields 401005, which happens to be
00401005 . /E9 53000000 jmp MyProc
It works for
MyProc proc private uses ... args
but not for public (the default).
Will keep trying ;-)
I am not sure what you are doing jj. :icon_eek:
If you call a proc with a parameter then [esp+4] is the parameter you pushed.
Therefore, if you push the offset of a PROC, then you get the offset of a PROC.
Ok, I have a thought.
If you did this:
ZeroSubVars proc public pProc:DWORD
....
It will not work.
It has to be a frameless proc.
ZeroSubVars:mov eax, [esp+4]
Quote from: Ficko on July 23, 2013, 11:39:51 PM
I am not sure what you are doing jj. :icon_eek:
If you call a proc with a parameter then [esp+4] is the parameter you pushed.
Therefore, if you push the offset of a PROC, then you get the offset of a PROC.
That sounds entirely plausible. Nonetheless, the bloody beast pushes the jump table, not the actual proc start - unless I declare it private. And of course, ZeroSubVars has option prologue:none etc.
::)
Now the good news is that I found the culprit. Testbed attached - try with and without /debug in the linker's commandline, and prepare for a fat surprise :icon_mrgreen:
P.S.: Don't use polink, use the old Masm32 default linker.
Here is my test code (tested with JWASM and ML; the "bloody beast" must be "GoASM" :P):
.686p
.model flat,stdcall
option casemap :none ; case sensitive
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
; *********************************************************
; turn stackframe off and on for low overhead procedures
; *********************************************************
Stackframe MACRO arg
IFIDNI <on>,<arg>
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
ELSEIFIDNI <off>,<arg>
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
ELSE
echo -----------------------------------
echo ERROR IN "stackframe" MACRO
echo Incorrect Argument Supplied
echo Options
echo 1. off Turn default stack frame off
echo 2. on Restore stack frame defaults
echo SYNTAX : frame on/off
echo -----------------------------------
.err
ENDIF
ENDM
; ---------------------------------------------------------
Pilot PROTO
.data
CommandLine dd 0
hInstance dd 0
.code
start:
invoke GetModuleHandle, NULL ; provides the instance handle
mov hInstance, eax
invoke GetCommandLine ; provides the command line address
mov CommandLine, eax
invoke Pilot
invoke ExitProcess,eax
Stackframe off
ZeroSubVars proc public
mov eax, [esp+4]
mov ecx, dword ptr [eax+5]
.if (byte ptr [eax+4] == 0C4h)
neg ecx
.endif
.if (byte ptr [eax+3] == 83h)
movzx ecx, cl
.endif
mov edx, edi
mov edi, ecx
sub edi, ebp
neg edi
shr ecx, 2
xor eax, eax
rep stosd
mov edi, edx
ret 4
ZeroSubVars endp
Stackframe on
Pilot proc public
LOCAL MyVar :QWORD
push offset Pilot
call ZeroSubVars
ret ; Pilot takes no arguments
Pilot endp
end start
Hi Ficko,
It's the linker. Depending on version and debug settings, it sometimes pushes the real offset, sometimes the offset of the jump table. The snippet below shows this:
00401005 offset MyPublic
00402000 offset MyPrivate
Same with polink:
00401FEE offset MyPublic
00401FF0 offset MyPrivate
include \masm32\include\masm32rt.inc
.code
start:
push offset MyPublic
pop ecx
print hex$(ecx), 9, "offset MyPublic", 13, 10
push offset MyPrivate
pop ecx
inkey hex$(ecx), 9, "offset MyPrivate", 13, 10
exit
nops 1000h-72h ; to make it clearer
MyPublic proc public
nop
ret
MyPublic endp
MyPrivate proc private
nop
ret
MyPrivate endp
end start
OPT_Linker link ; polink works, 6.14 sometimes, 10.0 chokes if /debug is set
OPT_DebugA /Zi
OPT_DebugL /debug /DYNAMICBASE:NO
OPT_Assembler mlv10
That's really strange; I have no explanation. :icon_eek:
Hi Jochen :t
When you're using the /debug switch with the linker and do not want those "jump-to" wrappers, add the /INCREMENTAL:NO switch to the linker. (Those jumps serve two purposes: a fast linking scheme (incremental build), and/or help with debugging, which was especially important for breakpoints on APIs on Win9x systems.)
Thanks, Alex, that solves the mystery :t
We had a long thread about the IAT jump tables in the old forum (http://www.masmforum.com/board/index.php?topic=11541.0). For my "zero locals" macro, the problem is not relevant because I chose to go back and look for mov ebp, esp, but Ficko's version relies on direct calls, so the linker behaviour needs to be watched.
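For code that does rely on "push offset SomeProc", one possible workaround besides /INCREMENTAL:NO is to follow the thunk by hand. The sketch below is only an illustration under assumptions: SkipThunk and FakeThunk are made-up names, the thunk is assumed to be a plain jmp rel32 (opcode E9, as in the 00401005 E9 53000000 line quoted earlier), and a proc that genuinely starts with a jmp would be misread. The demo uses a fake thunk in .data so that the result does not depend on linker settings.

```asm
include \masm32\include\masm32rt.inc    ; assumes a standard Masm32 SDK install

.data
FakeThunk db 0E9h                       ; jmp rel32, same bytes as the thunk shown in the thread
          dd 53h                        ; rel32 = 53h

.code
; in:  eax = address obtained from "push offset SomeProc"
; out: eax = jump target if eax pointed at a jmp rel32, otherwise unchanged
SkipThunk proc
    .if byte ptr [eax] == 0E9h
        add eax, dword ptr [eax+1]      ; rel32 is relative to the end of the jmp ...
        add eax, 5                      ; ... and the jmp itself is 5 bytes long
    .endif
    ret
SkipThunk endp

start:
    mov eax, offset FakeThunk
    call SkipThunk
    sub eax, offset FakeThunk           ; distance from thunk to target: 53h + 5 = 58h
    print hex$(eax), 13, 10
    exit
end start
```

Applied to the real thunk at 00401005, the same arithmetic gives 00401005 + 5 + 53h = 0040105Dh as the actual start of the proc.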
Yes, IAT jump thunks are not exactly the same as the "incremental/debug-mode" ones, but they are very similar.
Interestingly, a search for /INCREMENTAL also turned up this thread on the old forum:
http://www.masmforum.com/board/index.php?topic=795.msg5491#msg5491
Thanks to everybody :icon14:
The fastest algo is now part of the MasmBasic library (more (http://masm32.com/board/index.php?topic=94.msg22404#msg22404)).