The MASM Forum

General => The Laboratory => Topic started by: jj2007 on July 22, 2013, 08:25:30 PM

Title: ClearLocalVariables timings
Post by: jj2007 on July 22, 2013, 08:25:30 PM
Reviving two old threads (zerolocals (http://www.masmforum.com/board/index.php?topic=12914.0) and "About LOCAL string (http://www.masmforum.com/board/index.php?topic=12306.msg94399#msg94399)"): Can I have some timings please?

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
clearing 8000 local bytes ...
1020    cycles for ZEROLOCALES, eax+ecx+edx trashed
6106    cycles for cll, eax trashed
4023    cycles for ClearLocVars macro, eax trashed
4039    cycles for call ClearLocalsB, eax trashed
4023    cycles for call ClearLocals, ALL regs preserved (old MB)
4042    cycles for ClearLocalVariables, ALL regs preserved (new MB)
Title: Re: ClearLocalVariables timings
Post by: sinsi on July 22, 2013, 08:43:51 PM

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
clearing 8000 local bytes ...
821     cycles for ZEROLOCALES, eax+ecx+edx trashed
2095    cycles for cll, eax trashed
3976    cycles for ClearLocVars macro, eax trashed
4006    cycles for call ClearLocalsB, eax trashed
3953    cycles for call ClearLocals, ALL regs preserved (old MB)
4364    cycles for ClearLocalVariables, ALL regs preserved (new MB)

778     cycles for ZEROLOCALES, eax+ecx+edx trashed
2062    cycles for cll, eax trashed
3975    cycles for ClearLocVars macro, eax trashed
3981    cycles for call ClearLocalsB, eax trashed
3981    cycles for call ClearLocals, ALL regs preserved (old MB)
4044    cycles for ClearLocalVariables, ALL regs preserved (new MB)

795     cycles for ZEROLOCALES, eax+ecx+edx trashed
2245    cycles for cll, eax trashed
3954    cycles for ClearLocVars macro, eax trashed
3973    cycles for call ClearLocalsB, eax trashed
3968    cycles for call ClearLocals, ALL regs preserved (old MB)
4249    cycles for ClearLocalVariables, ALL regs preserved (new MB)
Title: Re: ClearLocalVariables timings
Post by: Gunther on July 22, 2013, 09:40:06 PM
Jochen,

your timings:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
clearing 8000 local bytes ...
876     cycles for ZEROLOCALES, eax+ecx+edx trashed
1837    cycles for cll, eax trashed
3576    cycles for ClearLocVars macro, eax trashed
3588    cycles for call ClearLocalsB, eax trashed
3535    cycles for call ClearLocals, ALL regs preserved (old MB)
3544    cycles for ClearLocalVariables, ALL regs preserved (new MB)

697     cycles for ZEROLOCALES, eax+ecx+edx trashed
1807    cycles for cll, eax trashed
3525    cycles for ClearLocVars macro, eax trashed
3556    cycles for call ClearLocalsB, eax trashed
3538    cycles for call ClearLocals, ALL regs preserved (old MB)
3559    cycles for ClearLocalVariables, ALL regs preserved (new MB)

709     cycles for ZEROLOCALES, eax+ecx+edx trashed
1814    cycles for cll, eax trashed
3573    cycles for ClearLocVars macro, eax trashed
3575    cycles for call ClearLocalsB, eax trashed
3564    cycles for call ClearLocals, ALL regs preserved (old MB)
3564    cycles for ClearLocalVariables, ALL regs preserved (new MB)


Gunther
Title: Re: ClearLocalVariables timings
Post by: fearless on July 22, 2013, 09:42:47 PM
Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4) (note: Overclocked to 4.2Ghz)
clearing 8000 local bytes ...
668     cycles for ZEROLOCALES, eax+ecx+edx trashed
1686    cycles for cll, eax trashed
3294    cycles for ClearLocVars macro, eax trashed
3308    cycles for call ClearLocalsB, eax trashed
3285    cycles for call ClearLocals, ALL regs preserved (old MB)
3302    cycles for ClearLocalVariables, ALL regs preserved (new MB)

646     cycles for ZEROLOCALES, eax+ecx+edx trashed
1680    cycles for cll, eax trashed
3275    cycles for ClearLocVars macro, eax trashed
3301    cycles for call ClearLocalsB, eax trashed
3277    cycles for call ClearLocals, ALL regs preserved (old MB)
3300    cycles for ClearLocalVariables, ALL regs preserved (new MB)

648     cycles for ZEROLOCALES, eax+ecx+edx trashed
1678    cycles for cll, eax trashed
3280    cycles for ClearLocVars macro, eax trashed
3296    cycles for call ClearLocalsB, eax trashed
3279    cycles for call ClearLocals, ALL regs preserved (old MB)
3295    cycles for ClearLocalVariables, ALL regs preserved (new MB)
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 22, 2013, 09:54:29 PM
Thanks, Sinsi & Gunther & fearless. I have changed a few bytes in version 2, increasing the number of bytes to be cleared to 80k, and observed that for a buffer that large

  mov dword ptr [ebp+4*ecx], 0

is 10% faster than

  and dword ptr [ebp+4*ecx], 0

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
clearing 81920 local bytes ...
23384   cycles for ZEROLOCALES, eax+ecx+edx trashed
41014   cycles for cll, eax trashed
61512   cycles for ClearLocVars macro, eax trashed
41110   cycles for call ClearLocalsB, eax trashed
41063   cycles for call ClearLocals, ALL regs preserved (old MB)
41073   cycles for ClearLocalVariables, ALL regs preserved (new MB)
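
For anyone who wants to try the mov/and comparison in isolation, here is a minimal sketch of such a clearing loop. It is not the code from the attachment: the proc name ClearDemo, the constant LocalBytes and the buffer name are made up for illustration, and the MASM32 SDK (masm32rt.inc) is assumed. The and form is shorter because its immediate fits into one sign-extended byte, but it is a read-modify-write instruction, while mov only writes.

```masm
include \masm32\include\masm32rt.inc

LocalBytes = 400                  ; hypothetical size, multiple of 4; kept small
                                  ; so the stack guard page is not skipped
.code
ClearDemo proc
LOCAL buffer[LocalBytes]:BYTE     ; occupies ebp-LocalBytes ... ebp-1
  mov ecx, -(LocalBytes/4)        ; negative dword index, counts up to zero
@@:
  mov dword ptr [ebp+4*ecx], 0    ; swap in "and dword ptr [ebp+4*ecx], 0" to compare
  inc ecx
  jnz @B                          ; last store is at ebp-4, then ecx reaches 0
  ret
ClearDemo endp

start:
  call ClearDemo
  exit
end start
```

Only ecx is changed by the loop; the buffer is cleared from its lowest address upwards.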
Title: Re: ClearLocalVariables timings
Post by: fearless on July 22, 2013, 10:07:35 PM
Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4)
clearing 81920 local bytes ...
6977    cycles for ZEROLOCALES, eax+ecx+edx trashed
18441   cycles for cll, eax trashed
50324   cycles for ClearLocVars macro, eax trashed
33505   cycles for call ClearLocalsB, eax trashed
33308   cycles for call ClearLocals, ALL regs preserved (old MB)
17753   cycles for ClearLocalVariables, ALL regs preserved (new MB)

6605    cycles for ZEROLOCALES, eax+ecx+edx trashed
17721   cycles for cll, eax trashed
50412   cycles for ClearLocVars macro, eax trashed
33346   cycles for call ClearLocalsB, eax trashed
33358   cycles for call ClearLocals, ALL regs preserved (old MB)
17735   cycles for ClearLocalVariables, ALL regs preserved (new MB)

6609    cycles for ZEROLOCALES, eax+ecx+edx trashed
17728   cycles for cll, eax trashed
50162   cycles for ClearLocVars macro, eax trashed
33556   cycles for call ClearLocalsB, eax trashed
33300   cycles for call ClearLocals, ALL regs preserved (old MB)
17731   cycles for ClearLocalVariables, ALL regs preserved (new MB)


Code sizes:
MyTest1      = 24
MyTest2      = 21
MyTest3      = 13
MyTest4      = 8
MyTest5      = 8
MyTest6      = 8

plus once for procs (call ClearLoc*):
ClearLocalsB = 17
ClearLocVars = 64
ClearLocals  = 19
Title: Re: ClearLocalVariables timings
Post by: FORTRANS on July 22, 2013, 11:16:21 PM
Hi,

   Measurements from three older processors.  Interesting that
the timings vary less than 0.5% for the second one.

pre-P4 (SSE1)
clearing 81920 local bytes ...
31197 cycles for ZEROLOCALES, eax+ecx+edx trashed
41399 cycles for cll, eax trashed
62224 cycles for ClearLocVars macro, eax trashed
41436 cycles for call ClearLocalsB, eax trashed
41474 cycles for call ClearLocals, ALL regs preserved (old MB)
41589 cycles for ClearLocalVariables, ALL regs preserved (new MB)

31155 cycles for ZEROLOCALES, eax+ecx+edx trashed
41436 cycles for cll, eax trashed
62100 cycles for ClearLocVars macro, eax trashed
41535 cycles for call ClearLocalsB, eax trashed
41452 cycles for call ClearLocals, ALL regs preserved (old MB)
41455 cycles for ClearLocalVariables, ALL regs preserved (new MB)

31256 cycles for ZEROLOCALES, eax+ecx+edx trashed
41477 cycles for cll, eax trashed
62134 cycles for ClearLocVars macro, eax trashed
41688 cycles for call ClearLocalsB, eax trashed
41778 cycles for call ClearLocals, ALL regs preserved (old MB)
41769 cycles for ClearLocalVariables, ALL regs preserved (new MB)


pre-P4
clearing 81920 local bytes ...
260560 cycles for ZEROLOCALES, eax+ecx+edx trashed
259603 cycles for cll, eax trashed
259593 cycles for ClearLocVars macro, eax trashed
259489 cycles for call ClearLocalsB, eax trashed
259370 cycles for call ClearLocals, ALL regs preserved (old MB)
259537 cycles for ClearLocalVariables, ALL regs preserved (new MB)

259917 cycles for ZEROLOCALES, eax+ecx+edx trashed
259498 cycles for cll, eax trashed
259613 cycles for ClearLocVars macro, eax trashed
259542 cycles for call ClearLocalsB, eax trashed
259394 cycles for call ClearLocals, ALL regs preserved (old MB)
260441 cycles for ClearLocalVariables, ALL regs preserved (new MB)

259784 cycles for ZEROLOCALES, eax+ecx+edx trashed
259419 cycles for cll, eax trashed
259618 cycles for ClearLocVars macro, eax trashed
259628 cycles for call ClearLocalsB, eax trashed
260104 cycles for call ClearLocals, ALL regs preserved (old MB)
259610 cycles for ClearLocalVariables, ALL regs preserved (new MB)

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
clearing 81920 local bytes ...
20912 cycles for ZEROLOCALES, eax+ecx+edx trashed
41490 cycles for cll, eax trashed
41619 cycles for ClearLocVars macro, eax trashed
41554 cycles for call ClearLocalsB, eax trashed
41613 cycles for call ClearLocals, ALL regs preserved (old MB)
41542 cycles for ClearLocalVariables, ALL regs preserved (new MB)

20843 cycles for ZEROLOCALES, eax+ecx+edx trashed
41566 cycles for cll, eax trashed
41487 cycles for ClearLocVars macro, eax trashed
41507 cycles for call ClearLocalsB, eax trashed
41530 cycles for call ClearLocals, ALL regs preserved (old MB)
41524 cycles for ClearLocalVariables, ALL regs preserved (new MB)

20849 cycles for ZEROLOCALES, eax+ecx+edx trashed
41524 cycles for cll, eax trashed
41519 cycles for ClearLocVars macro, eax trashed
41546 cycles for call ClearLocalsB, eax trashed
41551 cycles for call ClearLocals, ALL regs preserved (old MB)
41546 cycles for ClearLocalVariables, ALL regs preserved (new MB)

Regards,

Steve N.
Title: Re: ClearLocalVariables timings
Post by: sinsi on July 22, 2013, 11:17:32 PM
What version of Windows are we all running? Timings for similar CPUs are strange.
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
clearing 81920 local bytes ...
8057    cycles for ZEROLOCALES, eax+ecx+edx trashed
20137   cycles for cll, eax trashed
56446   cycles for ClearLocVars macro, eax trashed
37108   cycles for call ClearLocalsB, eax trashed
37045   cycles for call ClearLocals, ALL regs preserved (old MB)
19741   cycles for ClearLocalVariables, ALL regs preserved (new MB)
Title: Re: ClearLocalVariables timings
Post by: Gunther on July 22, 2013, 11:51:01 PM
Jochen,

version B timings:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
clearing 81920 local bytes ...
7765    cycles for ZEROLOCALES, eax+ecx+edx trashed
19350   cycles for cll, eax trashed
54183   cycles for ClearLocVars macro, eax trashed
36050   cycles for call ClearLocalsB, eax trashed
36034   cycles for call ClearLocals, ALL regs preserved (old MB)
19148   cycles for ClearLocalVariables, ALL regs preserved (new MB)

7476    cycles for ZEROLOCALES, eax+ecx+edx trashed
19529   cycles for cll, eax trashed
54103   cycles for ClearLocVars macro, eax trashed
35980   cycles for call ClearLocalsB, eax trashed
36065   cycles for call ClearLocals, ALL regs preserved (old MB)
19185   cycles for ClearLocalVariables, ALL regs preserved (new MB)

7248    cycles for ZEROLOCALES, eax+ecx+edx trashed
19233   cycles for cll, eax trashed
54129   cycles for ClearLocVars macro, eax trashed
35991   cycles for call ClearLocalsB, eax trashed
36227   cycles for call ClearLocals, ALL regs preserved (old MB)
19101   cycles for ClearLocalVariables, ALL regs preserved (new MB)


Code sizes:
MyTest1      = 24
MyTest2      = 21
MyTest3      = 13
MyTest4      = 8
MyTest5      = 8
MyTest6      = 8

plus once for procs (call ClearLoc*):
ClearLocalsB = 17
ClearLocVars = 64
ClearLocals  = 19


Japheth would say: Aha, the cycle counter brigade.  :lol:

Gunther
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 23, 2013, 12:06:40 AM
Quote from: Gunther on July 22, 2013, 11:51:01 PM
Japheth would say: Aha, the cycle counter brigade.  :lol:
Pssst...

@sinsi: I've added GetVersionEx; 6.1. is Win7 (32-bit)
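
A minimal sketch of how the version numbers can be read, assuming the MASM32 SDK's masm32rt.inc; the variable name osvi is made up and this is not the code from the attachment. 5.1 would be Windows XP, 6.1 Windows 7.

```masm
include \masm32\include\masm32rt.inc

.data?
osvi OSVERSIONINFO <>             ; hypothetical variable name

.code
start:
  mov osvi.dwOSVersionInfoSize, sizeof OSVERSIONINFO  ; must be set before the call
  invoke GetVersionEx, addr osvi
  print str$(osvi.dwMajorVersion), "."                ; e.g. 6
  print str$(osvi.dwMinorVersion), 13, 10             ; e.g. 1
  exit
end start
```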

AMD Athlon(tm) Dual Core Processor 4450B (SSE3) 6.1. Service Pack 1
clearing 81920 local bytes ...
23375   cycles for ZEROLOCALES, eax+ecx+edx trashed
61559   cycles for cll, eax trashed
41031   cycles for ClearLocVars macro, eax trashed
41058   cycles for call ClearLocalsB, eax trashed
41062   cycles for call ClearLocals, ALL regs preserved (old MB)
41073   cycles for ClearLocalVariables, ALL regs preserved (new MB)
23428   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

23319   cycles for ZEROLOCALES, eax+ecx+edx trashed
61735   cycles for cll, eax trashed
41053   cycles for ClearLocVars macro, eax trashed
41054   cycles for call ClearLocalsB, eax trashed
41031   cycles for call ClearLocals, ALL regs preserved (old MB)
41073   cycles for ClearLocalVariables, ALL regs preserved (new MB)
23549   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
31      cycles for ZEROLOCALES, eax+ecx+edx trashed
92      cycles for cll, eax trashed
93      cycles for ClearLocVars macro, eax trashed
86      cycles for call ClearLocalsB, eax trashed
71      cycles for call ClearLocals, ALL regs preserved (old MB)
92      cycles for ClearLocalVariables, ALL regs preserved (new MB)
51      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

31      cycles for ZEROLOCALES, eax+ecx+edx trashed
92      cycles for cll, eax trashed
93      cycles for ClearLocVars macro, eax trashed
86      cycles for call ClearLocalsB, eax trashed
71      cycles for call ClearLocals, ALL regs preserved (old MB)
92      cycles for ClearLocalVariables, ALL regs preserved (new MB)
50      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3) 5.1. Service Pack 3
clearing 81920 local bytes ...
14373   cycles for ZEROLOCALES, eax+ecx+edx trashed
41079   cycles for cll, eax trashed
41043   cycles for ClearLocVars macro, eax trashed
41061   cycles for call ClearLocalsB, eax trashed
41203   cycles for call ClearLocals, ALL regs preserved (old MB)
41103   cycles for ClearLocalVariables, ALL regs preserved (new MB)
14344   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

14277   cycles for ZEROLOCALES, eax+ecx+edx trashed
41024   cycles for cll, eax trashed
41087   cycles for ClearLocVars macro, eax trashed
41182   cycles for call ClearLocalsB, eax trashed
41051   cycles for call ClearLocals, ALL regs preserved (old MB)
41143   cycles for ClearLocalVariables, ALL regs preserved (new MB)
14373   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
74      cycles for ZEROLOCALES, eax+ecx+edx trashed
80      cycles for cll, eax trashed
105     cycles for ClearLocVars macro, eax trashed
73      cycles for call ClearLocalsB, eax trashed
64      cycles for call ClearLocals, ALL regs preserved (old MB)
112     cycles for ClearLocalVariables, ALL regs preserved (new MB)
93      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

74      cycles for ZEROLOCALES, eax+ecx+edx trashed
80      cycles for cll, eax trashed
105     cycles for ClearLocVars macro, eax trashed
76      cycles for call ClearLocalsB, eax trashed
64      cycles for call ClearLocals, ALL regs preserved (old MB)
112     cycles for ClearLocalVariables, ALL regs preserved (new MB)
93      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
Title: Re: ClearLocalVariables timings
Post by: Gunther on July 23, 2013, 01:02:31 AM
Another run:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4) 6.1. Service Pack 1
clearing 81920 local bytes ...
10300   cycles for ZEROLOCALES, eax+ecx+edx trashed
19305   cycles for cll, eax trashed
36224   cycles for ClearLocVars macro, eax trashed
36317   cycles for call ClearLocalsB, eax trashed
35946   cycles for call ClearLocals, ALL regs preserved (old MB)
19543   cycles for ClearLocalVariables, ALL regs preserved (new MB)
7278    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

7207    cycles for ZEROLOCALES, eax+ecx+edx trashed
19267   cycles for cll, eax trashed
35998   cycles for ClearLocVars macro, eax trashed
35925   cycles for call ClearLocalsB, eax trashed
36074   cycles for call ClearLocals, ALL regs preserved (old MB)
19142   cycles for ClearLocalVariables, ALL regs preserved (new MB)
7183    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
38      cycles for ZEROLOCALES, eax+ecx+edx trashed
43      cycles for cll, eax trashed
66      cycles for ClearLocVars macro, eax trashed
71      cycles for call ClearLocalsB, eax trashed
52      cycles for call ClearLocals, ALL regs preserved (old MB)
64      cycles for ClearLocalVariables, ALL regs preserved (new MB)
57      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

43      cycles for ZEROLOCALES, eax+ecx+edx trashed
49      cycles for cll, eax trashed
73      cycles for ClearLocVars macro, eax trashed
71      cycles for call ClearLocalsB, eax trashed
54      cycles for call ClearLocals, ALL regs preserved (old MB)
65      cycles for ClearLocalVariables, ALL regs preserved (new MB)
57      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)


Gunther
Title: Re: ClearLocalVariables timings
Post by: fearless on July 23, 2013, 03:31:28 AM
Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4) 6.1. Service Pack 1 (Win7 64bit CPU OC'd to 4.2Ghz)
clearing 81920 local bytes ...
7388    cycles for ZEROLOCALES, eax+ecx+edx trashed
18261   cycles for cll, eax trashed
34105   cycles for ClearLocVars macro, eax trashed
34369   cycles for call ClearLocalsB, eax trashed
34281   cycles for call ClearLocals, ALL regs preserved (old MB)
18126   cycles for ClearLocalVariables, ALL regs preserved (new MB)
6764    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

6772    cycles for ZEROLOCALES, eax+ecx+edx trashed
18056   cycles for cll, eax trashed
33844   cycles for ClearLocVars macro, eax trashed
33946   cycles for call ClearLocalsB, eax trashed
33878   cycles for call ClearLocals, ALL regs preserved (old MB)
18044   cycles for ClearLocalVariables, ALL regs preserved (new MB)
6760    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
37      cycles for ZEROLOCALES, eax+ecx+edx trashed
43      cycles for cll, eax trashed
67      cycles for ClearLocVars macro, eax trashed
62      cycles for call ClearLocalsB, eax trashed
46      cycles for call ClearLocals, ALL regs preserved (old MB)
65      cycles for ClearLocalVariables, ALL regs preserved (new MB)
51      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

38      cycles for ZEROLOCALES, eax+ecx+edx trashed
49      cycles for cll, eax trashed
64      cycles for ClearLocVars macro, eax trashed
67      cycles for call ClearLocalsB, eax trashed
45      cycles for call ClearLocals, ALL regs preserved (old MB)
55      cycles for ClearLocalVariables, ALL regs preserved (new MB)
49      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)
Title: Re: ClearLocalVariables timings
Post by: Ficko on July 23, 2013, 04:17:01 AM
I have nothing to add to the speed contest here; I would just like to post what I use for clearing locals: :idea:


; *********************************************************
; ZEROLOCALS NameofProc
; *********************************************************
ZEROLOCALS MACRO Subroutine
mov ecx, ebp
lea eax, [esp+(($-Subroutine)-8)*4]
sub ecx, eax
invokeA RtlZeroMemory,eax,ecx
ENDM


And a proc if size matters more than speed (it handles not only MASM's add esp, -n but sub esp, +n as well):


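;ZeroSubVars(pProc:POINTER)
mov eax, [esp+4]                ; pProc: start of the caller's prologue
mov ecx, dword ptr [eax+5]      ; immediate of add/sub esp, read as a dword
.if (byte ptr [eax+4] == 0C4h)  ; ModRM C4h: add esp, -n
neg ecx                         ; make the size positive
.endif
.if (byte ptr [eax+3] == 83h)   ; opcode 83h: short form with imm8
movzx ecx, cl                   ; keep only the low byte
.endif
mov edx, edi                    ; save edi in edx
mov edi, ecx
sub edi, ebp
neg edi                         ; edi = ebp - size, the lowest local
shr ecx, 2                      ; bytes to dwords
xor eax, eax
rep stosd                       ; zero the locals
mov edi, edx                    ; restore edi
ret 4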
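
For reference, the offsets +3, +4 and +5 tested in the proc follow from the standard prologue encodings. The listing below is a sketch under the assumption that the prologue starts with push ebp / mov ebp, esp (encoded as 8B EC) and that the stack adjustment follows immediately; the local sizes are made-up examples.

```masm
; offset  bytes               instruction
;   +0    55                  push ebp
;   +1    8B EC               mov ebp, esp
;   +3    83 C4 F8            add esp, -8       ; short: opcode 83h, ModRM C4h, imm8
;   +3    81 C4 00 E0 FF FF   add esp, -2000h   ; long:  opcode 81h, ModRM C4h, imm32
;   +3    83 EC 08            sub esp, 8        ; short: opcode 83h, ModRM ECh, imm8
;   +3    81 EC 00 20 00 00   sub esp, 2000h    ; long:  opcode 81h, ModRM ECh, imm32
```

In every case the immediate starts at offset +5, which is why a single dword read at [eax+5] is enough and only the sign and the width need fixing up afterwards.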
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 23, 2013, 04:39:39 AM
Hi Ficko,
Looks remarkably close to my version (see attachment above, line 385ff):
align 16
ClearLocVars2 proc        ; ### 22.7.13, 62 bytes ###
  push eax
  push edi
  push ecx
  mov eax, [esp+12]                ; get EIP
  sub eax, 9                        ; start before sub/add esp and the call; no add/sub->int 77
  lea ecx, [eax-127]                ; set a limit
@@:
  dec eax
  cmp eax, ecx                        ; -17 works for 4*uses and long sub/add esp
  js cLerr
  cmp word ptr [eax], 0EC8Bh        ; mov ebp, esp - PoAsm uses 0E589h and sub, nn
  jne @B
  cmp byte ptr [eax+2], 81h        ; short add/sub esp?
  mov ecx, [eax+4]                ; long
  je @F
; cmp byte ptr [eax+ecx+2], 83h        ; can only occur if there are no locals
; jne cLerr
  movsx ecx, cl                ; short, sign-extend
@@:
  mov edi, ecx
  sar ecx, 2                      ; stack is always dword-aligned, so divide by 4 is OK
  js @F
  neg ecx                        ; JWasm uses sub esp, nn
  neg edi                        ; ML uses add esp, -nn
@@:
  xor eax, eax
  neg ecx
  add edi, ebp
  rep stosd
  pop ecx
  pop edi
  pop eax
  retn
cLerr:
  int 77h              ; set a really strange marker in case of crash
ClearLocVars2 endp


Your version needs a
  push offset whateverproc  ; 5 bytes extra
  call ZeroSubVars


How do you deal with uses esi edi ebx?
Title: Re: ClearLocalVariables timings
Post by: Ficko on July 23, 2013, 09:18:07 PM
Quote from: jj2007 on July 23, 2013, 04:39:39 AM
Hi Ficko,
Looks remarkably close to my version (see attachment above, line 385ff):

Indeed. :shock:

Quote
How do you deal with uses esi edi ebx?

I generally write my programs in high-level languages and use static libraries with procs written in JWASM.
So I have to adhere to certain conventions,
like not caring about preserving eax, ecx, edx the way you do.

My macro above should of course be the first item after the declarations, so preserving edi, esi, ebx happens naturally with "uses".

Test proc public uses edi esi ebx param00:dword
LOCAL .......
ZEROLOCALS Test
...
..

- If that is what you were asking for. -

The "ZeroSubVars" proc trashes only "edi", which is saved in "edx" temporarily.
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 23, 2013, 10:37:41 PM
Quote from: Ficko on July 23, 2013, 09:18:07 PM
The "ZeroSubVars" proc trashes only "edi", which is saved in "edx" temporarily.

I'm trying to test your version:
   push offset MyProc
   call ZeroSubVars

But mov eax, [esp+4] yields 401005, which happens to be
00401005   . /E9 53000000   jmp MyProc


It works for MyProc proc private uses ... args but not for public (the default).

Will keep trying ;-)
Title: Re: ClearLocalVariables timings
Post by: Ficko on July 23, 2013, 11:39:51 PM
I am not sure what you are doing jj. :icon_eek:
If you call a proc with a parameter then [esp+4] is the parameter you pushed.
Therefore, if you push the offset of a PROC, then you get the offset of a PROC.

Title: Re: ClearLocalVariables timings
Post by: Ficko on July 23, 2013, 11:50:52 PM
Ok, I have a thought.

If you did this:


ZeroSubVars proc public pProc:DWORD
....


It will not work.
It has to be a frameless proc.


ZeroSubVars:mov eax, [esp+4]
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 24, 2013, 12:13:32 AM
Quote from: Ficko on July 23, 2013, 11:39:51 PM
I am not sure what you are doing jj. :icon_eek:
If you call a proc with a parameter then [esp+4] is the parameter you pushed.
Therefore, if you push the offset of a PROC, then you get the offset of a PROC.

That sounds entirely plausible. Nonetheless, the bloody beast pushes the jump table, not the actual proc start - unless I declare it private. And of course, ZeroSubVars has option prologue:none etc.
::)

Now the good news is that I found the culprit. Testbed attached - try with and without /debug in the linker's commandline, and prepare for a fat surprise :icon_mrgreen:

P.S.: Don't use polink, use the old Masm32 default linker.
Title: Re: ClearLocalVariables timings
Post by: Ficko on July 24, 2013, 01:01:35 AM
Here is my test code (tested with JWASM and ML; the "bloody beast" must be "GoASM" :P):


.686p
.model flat,stdcall
option casemap :none ; case sensitive
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
; *********************************************************
; turn stackframe off and on for low overhead procedures
; *********************************************************
Stackframe MACRO arg
  IFIDNI <on>,<arg>
    OPTION PROLOGUE:PrologueDef
    OPTION EPILOGUE:EpilogueDef
  ELSEIFIDNI <off>,<arg>
    OPTION PROLOGUE:NONE
    OPTION EPILOGUE:NONE
  ELSE
    echo -----------------------------------
    echo ERROR IN "stackframe" MACRO
    echo Incorrect Argument Supplied
    echo Options
    echo 1. off Turn default stack frame off
    echo 2. on  Restore stack frame defaults
    echo SYNTAX : Stackframe on/off
    echo -----------------------------------
    .err
  ENDIF
ENDM     
; ---------------------------------------------------------
Pilot PROTO
.data
    CommandLine   dd 0
    hInstance     dd 0
.code
start:
    invoke GetModuleHandle, NULL ; provides the instance handle
    mov hInstance, eax
    invoke GetCommandLine        ; provides the command line address
    mov CommandLine, eax
     invoke Pilot
    invoke ExitProcess,eax

Stackframe off
ZeroSubVars proc public
mov eax, [esp+4]
mov ecx, dword ptr [eax+5]
.if (byte ptr [eax+4] == 0C4h)
neg ecx
.endif
.if (byte ptr [eax+3] == 83h)
movzx ecx, cl
.endif
mov edx, edi
mov edi, ecx
sub edi, ebp
neg edi
shr ecx, 2
xor eax, eax
rep stosd
mov edi, edx
ret 4
ZeroSubVars endp

Stackframe on
Pilot proc public
LOCAL MyVar :QWORD
push offset Pilot
call ZeroSubVars
ret        ; Pilot takes no parameters, so there is nothing to pop
Pilot endp
end start
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 24, 2013, 01:19:07 AM
Hi Ficko,
It's the linker. Depending on version and debug settings, offset sometimes resolves to the real address of the proc and sometimes to its entry in the jump table. The snippet below shows this:
00401005        offset MyPublic
00402000        offset MyPrivate

Same with polink:
00401FEE        offset MyPublic
00401FF0        offset MyPrivate


include \masm32\include\masm32rt.inc

.code
start:
   push offset MyPublic
   pop ecx
   print hex$(ecx), 9, "offset MyPublic", 13, 10

   push offset MyPrivate
   pop ecx
   inkey hex$(ecx), 9, "offset MyPrivate", 13, 10

   exit

nops 1000h-72h   ; to make it clearer

MyPublic proc public
  nop
  ret
MyPublic endp

MyPrivate proc private
  nop
  ret
MyPrivate endp

end start

OPT_Linker   link   ; polink works, 6.14 sometimes, 10.0 chokes if /debug is set
OPT_DebugA   /Zi
OPT_DebugL   /debug /DYNAMICBASE:NO
OPT_Assembler   mlv10
Title: Re: ClearLocalVariables timings
Post by: Ficko on July 24, 2013, 01:21:18 AM
That's really strange; I have no explanation. :icon_eek:
Title: Re: ClearLocalVariables timings
Post by: Antariy on July 24, 2013, 01:01:17 PM
Hi Jochen :t

When you use the /debug switch with the linker and do not want those "jump-to" wrappers, add the /INCREMENTAL:NO switch. (Those jumps serve two purposes: they enable the fast linking scheme (incremental build), and they help with debugging, which was especially important for breakpoints on APIs on Win9x systems.)
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 24, 2013, 03:44:09 PM
Thanks, Alex, that solves the mystery :t

We had a long thread about the IAT jump tables in the old forum (http://www.masmforum.com/board/index.php?topic=11541.0). For my "zero locals" macro, the problem is not relevant because I chose to go back and look for mov ebp, esp, but Ficko's version relies on direct calls, so the linker behaviour needs to be watched.
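
If changing the linker switches is not an option, a routine that receives a proc address could follow the thunk itself. The following is only a sketch, assuming the MASM32 SDK and assuming the thunk is a plain jmp rel32 (opcode E9h) as in the disassembly quoted earlier in the thread; the proc name MyProc is made up, and this is not part of Ficko's or my library code.

```masm
include \masm32\include\masm32rt.inc

.code
MyProc proc
  nop
  ret
MyProc endp

start:
  mov eax, offset MyProc        ; may be the address of a "jmp MyProc" thunk
  cmp byte ptr [eax], 0E9h      ; E9h = jmp rel32
  jne @F
  mov edx, [eax+1]              ; rel32, relative to the end of the 5-byte jmp
  lea eax, [eax+edx+5]          ; target = thunk address + 5 + rel32
@@:
  print hex$(eax), 9, "real start of MyProc", 13, 10
  exit
end start
```

With the values quoted earlier, 00401005h + 5 + 53h gives 0040105Dh. The check would misfire for a proc whose own first instruction is a jmp rel32, which a proc with a standard stack frame never is, since it starts with push ebp (55h).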
Title: Re: ClearLocalVariables timings
Post by: Antariy on July 24, 2013, 10:40:33 PM
Yes, IAT jump thunks are not exactly the same as the "incremental/debug-mode" ones, but they are very similar.

Interestingly, a search for /INCREMENTAL also turned up this thread on the old forum:
http://www.masmforum.com/board/index.php?topic=795.msg5491#msg5491
Title: Re: ClearLocalVariables timings
Post by: jj2007 on July 26, 2013, 08:07:57 AM
Thanks to everybody :icon14:

The fastest algo is now part of the MasmBasic library (more (http://masm32.com/board/index.php?topic=94.msg22404#msg22404)).