ClearLocalVariables timings

Started by jj2007, July 22, 2013, 08:25:30 PM


jj2007

Reviving two old threads (zerolocals and "About LOCAL string"): Can I have some timings please?

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
clearing 8000 local bytes ...
1020    cycles for ZEROLOCALES, eax+ecx+edx trashed
6106    cycles for cll, eax trashed
4023    cycles for ClearLocVars macro, eax trashed
4039    cycles for call ClearLocalsB, eax trashed
4023    cycles for call ClearLocals, ALL regs preserved (old MB)
4042    cycles for ClearLocalVariables, ALL regs preserved (new MB)

sinsi


Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
clearing 8000 local bytes ...
821     cycles for ZEROLOCALES, eax+ecx+edx trashed
2095    cycles for cll, eax trashed
3976    cycles for ClearLocVars macro, eax trashed
4006    cycles for call ClearLocalsB, eax trashed
3953    cycles for call ClearLocals, ALL regs preserved (old MB)
4364    cycles for ClearLocalVariables, ALL regs preserved (new MB)

778     cycles for ZEROLOCALES, eax+ecx+edx trashed
2062    cycles for cll, eax trashed
3975    cycles for ClearLocVars macro, eax trashed
3981    cycles for call ClearLocalsB, eax trashed
3981    cycles for call ClearLocals, ALL regs preserved (old MB)
4044    cycles for ClearLocalVariables, ALL regs preserved (new MB)

795     cycles for ZEROLOCALES, eax+ecx+edx trashed
2245    cycles for cll, eax trashed
3954    cycles for ClearLocVars macro, eax trashed
3973    cycles for call ClearLocalsB, eax trashed
3968    cycles for call ClearLocals, ALL regs preserved (old MB)
4249    cycles for ClearLocalVariables, ALL regs preserved (new MB)

Gunther

Jochen,

your timings:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
clearing 8000 local bytes ...
876     cycles for ZEROLOCALES, eax+ecx+edx trashed
1837    cycles for cll, eax trashed
3576    cycles for ClearLocVars macro, eax trashed
3588    cycles for call ClearLocalsB, eax trashed
3535    cycles for call ClearLocals, ALL regs preserved (old MB)
3544    cycles for ClearLocalVariables, ALL regs preserved (new MB)

697     cycles for ZEROLOCALES, eax+ecx+edx trashed
1807    cycles for cll, eax trashed
3525    cycles for ClearLocVars macro, eax trashed
3556    cycles for call ClearLocalsB, eax trashed
3538    cycles for call ClearLocals, ALL regs preserved (old MB)
3559    cycles for ClearLocalVariables, ALL regs preserved (new MB)

709     cycles for ZEROLOCALES, eax+ecx+edx trashed
1814    cycles for cll, eax trashed
3573    cycles for ClearLocVars macro, eax trashed
3575    cycles for call ClearLocalsB, eax trashed
3564    cycles for call ClearLocals, ALL regs preserved (old MB)
3564    cycles for ClearLocalVariables, ALL regs preserved (new MB)


Gunther
You have to know the facts before you can distort them.

fearless

Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4) (note: overclocked to 4.2 GHz)
clearing 8000 local bytes ...
668     cycles for ZEROLOCALES, eax+ecx+edx trashed
1686    cycles for cll, eax trashed
3294    cycles for ClearLocVars macro, eax trashed
3308    cycles for call ClearLocalsB, eax trashed
3285    cycles for call ClearLocals, ALL regs preserved (old MB)
3302    cycles for ClearLocalVariables, ALL regs preserved (new MB)

646     cycles for ZEROLOCALES, eax+ecx+edx trashed
1680    cycles for cll, eax trashed
3275    cycles for ClearLocVars macro, eax trashed
3301    cycles for call ClearLocalsB, eax trashed
3277    cycles for call ClearLocals, ALL regs preserved (old MB)
3300    cycles for ClearLocalVariables, ALL regs preserved (new MB)

648     cycles for ZEROLOCALES, eax+ecx+edx trashed
1678    cycles for cll, eax trashed
3280    cycles for ClearLocVars macro, eax trashed
3296    cycles for call ClearLocalsB, eax trashed
3279    cycles for call ClearLocals, ALL regs preserved (old MB)
3295    cycles for ClearLocalVariables, ALL regs preserved (new MB)

jj2007

Thanks, sinsi, Gunther and fearless. In version 2 I changed a few bytes, increasing the amount to be cleared to 80k, and observed that for a buffer that large

  mov dword ptr [ebp+4*ecx], 0

is 10% faster than

  and dword ptr [ebp+4*ecx], 0
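To spell out the difference, here is a small C sketch (C rather than MASM only so it compiles anywhere; the function names are hypothetical). `and dword ptr [mem], 0` is a read-modify-write: the CPU loads the old dword, ANDs it with 0 and stores the result back, whereas `mov dword ptr [mem], 0` is a plain store. That extra load per dword is one plausible reason for the gap on a large buffer, though the timings in this thread remain the real evidence. `volatile` is used only to stop a C compiler from turning the `&= 0` loop into plain stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Counterpart of: mov dword ptr [ebp+4*ecx], 0  (pure store, no read) */
void clear_with_store(volatile uint32_t *buf, size_t count) {
    for (size_t i = 0; i < count; i++)
        buf[i] = 0;
}

/* Counterpart of: and dword ptr [ebp+4*ecx], 0  (load, AND with 0, store back).
   volatile forces the compiler to keep the read of each element. */
void clear_with_and(volatile uint32_t *buf, size_t count) {
    for (size_t i = 0; i < count; i++)
        buf[i] &= 0;
}
```

Both leave the buffer zeroed; only the memory traffic differs.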

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
clearing 81920 local bytes ...
23384   cycles for ZEROLOCALES, eax+ecx+edx trashed
41014   cycles for cll, eax trashed
61512   cycles for ClearLocVars macro, eax trashed
41110   cycles for call ClearLocalsB, eax trashed
41063   cycles for call ClearLocals, ALL regs preserved (old MB)
41073   cycles for ClearLocalVariables, ALL regs preserved (new MB)

fearless

Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4)
clearing 81920 local bytes ...
6977    cycles for ZEROLOCALES, eax+ecx+edx trashed
18441   cycles for cll, eax trashed
50324   cycles for ClearLocVars macro, eax trashed
33505   cycles for call ClearLocalsB, eax trashed
33308   cycles for call ClearLocals, ALL regs preserved (old MB)
17753   cycles for ClearLocalVariables, ALL regs preserved (new MB)

6605    cycles for ZEROLOCALES, eax+ecx+edx trashed
17721   cycles for cll, eax trashed
50412   cycles for ClearLocVars macro, eax trashed
33346   cycles for call ClearLocalsB, eax trashed
33358   cycles for call ClearLocals, ALL regs preserved (old MB)
17735   cycles for ClearLocalVariables, ALL regs preserved (new MB)

6609    cycles for ZEROLOCALES, eax+ecx+edx trashed
17728   cycles for cll, eax trashed
50162   cycles for ClearLocVars macro, eax trashed
33556   cycles for call ClearLocalsB, eax trashed
33300   cycles for call ClearLocals, ALL regs preserved (old MB)
17731   cycles for ClearLocalVariables, ALL regs preserved (new MB)


Code sizes:
MyTest1      = 24
MyTest2      = 21
MyTest3      = 13
MyTest4      = 8
MyTest5      = 8
MyTest6      = 8

plus once for procs (call ClearLoc*):
ClearLocalsB = 17
ClearLocVars = 64
ClearLocals  = 19

FORTRANS

Hi,

   Measurements from three older processors. Interestingly, the timings
for the second one vary by less than 0.5%.

pre-P4 (SSE1)
clearing 81920 local bytes ...
31197 cycles for ZEROLOCALES, eax+ecx+edx trashed
41399 cycles for cll, eax trashed
62224 cycles for ClearLocVars macro, eax trashed
41436 cycles for call ClearLocalsB, eax trashed
41474 cycles for call ClearLocals, ALL regs preserved (old MB)
41589 cycles for ClearLocalVariables, ALL regs preserved (new MB)

31155 cycles for ZEROLOCALES, eax+ecx+edx trashed
41436 cycles for cll, eax trashed
62100 cycles for ClearLocVars macro, eax trashed
41535 cycles for call ClearLocalsB, eax trashed
41452 cycles for call ClearLocals, ALL regs preserved (old MB)
41455 cycles for ClearLocalVariables, ALL regs preserved (new MB)

31256 cycles for ZEROLOCALES, eax+ecx+edx trashed
41477 cycles for cll, eax trashed
62134 cycles for ClearLocVars macro, eax trashed
41688 cycles for call ClearLocalsB, eax trashed
41778 cycles for call ClearLocals, ALL regs preserved (old MB)
41769 cycles for ClearLocalVariables, ALL regs preserved (new MB)


pre-P4
clearing 81920 local bytes ...
260560 cycles for ZEROLOCALES, eax+ecx+edx trashed
259603 cycles for cll, eax trashed
259593 cycles for ClearLocVars macro, eax trashed
259489 cycles for call ClearLocalsB, eax trashed
259370 cycles for call ClearLocals, ALL regs preserved (old MB)
259537 cycles for ClearLocalVariables, ALL regs preserved (new MB)

259917 cycles for ZEROLOCALES, eax+ecx+edx trashed
259498 cycles for cll, eax trashed
259613 cycles for ClearLocVars macro, eax trashed
259542 cycles for call ClearLocalsB, eax trashed
259394 cycles for call ClearLocals, ALL regs preserved (old MB)
260441 cycles for ClearLocalVariables, ALL regs preserved (new MB)

259784 cycles for ZEROLOCALES, eax+ecx+edx trashed
259419 cycles for cll, eax trashed
259618 cycles for ClearLocVars macro, eax trashed
259628 cycles for call ClearLocalsB, eax trashed
260104 cycles for call ClearLocals, ALL regs preserved (old MB)
259610 cycles for ClearLocalVariables, ALL regs preserved (new MB)

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
clearing 81920 local bytes ...
20912 cycles for ZEROLOCALES, eax+ecx+edx trashed
41490 cycles for cll, eax trashed
41619 cycles for ClearLocVars macro, eax trashed
41554 cycles for call ClearLocalsB, eax trashed
41613 cycles for call ClearLocals, ALL regs preserved (old MB)
41542 cycles for ClearLocalVariables, ALL regs preserved (new MB)

20843 cycles for ZEROLOCALES, eax+ecx+edx trashed
41566 cycles for cll, eax trashed
41487 cycles for ClearLocVars macro, eax trashed
41507 cycles for call ClearLocalsB, eax trashed
41530 cycles for call ClearLocals, ALL regs preserved (old MB)
41524 cycles for ClearLocalVariables, ALL regs preserved (new MB)

20849 cycles for ZEROLOCALES, eax+ecx+edx trashed
41524 cycles for cll, eax trashed
41519 cycles for ClearLocVars macro, eax trashed
41546 cycles for call ClearLocalsB, eax trashed
41551 cycles for call ClearLocals, ALL regs preserved (old MB)
41546 cycles for ClearLocalVariables, ALL regs preserved (new MB)

Regards,

Steve N.
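A quick arithmetic check of the "less than 0.5%" remark, as a C sketch (the helper name is hypothetical): taking all 18 figures from the second pre-P4 block, the spread (max - min) / min is (260560 - 259370) / 259370 = 1190 / 259370, about 0.46%.

```c
#include <stddef.h>

/* Relative spread (max - min) / min of a set of cycle counts. */
double relative_spread(const long *v, size_t n) {
    long lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    return (double)(hi - lo) / (double)lo;
}

/* All 18 figures from the second pre-P4 block (3 runs x 6 methods). */
static const long second_cpu[18] = {
    260560, 259603, 259593, 259489, 259370, 259537,
    259917, 259498, 259613, 259542, 259394, 260441,
    259784, 259419, 259618, 259628, 260104, 259610,
};
```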

sinsi

What version of Windows are we all running? Timings for similar CPUs are strange.
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
clearing 81920 local bytes ...
8057    cycles for ZEROLOCALES, eax+ecx+edx trashed
20137   cycles for cll, eax trashed
56446   cycles for ClearLocVars macro, eax trashed
37108   cycles for call ClearLocalsB, eax trashed
37045   cycles for call ClearLocals, ALL regs preserved (old MB)
19741   cycles for ClearLocalVariables, ALL regs preserved (new MB)

Gunther

Jochen,

version B timings:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
clearing 81920 local bytes ...
7765    cycles for ZEROLOCALES, eax+ecx+edx trashed
19350   cycles for cll, eax trashed
54183   cycles for ClearLocVars macro, eax trashed
36050   cycles for call ClearLocalsB, eax trashed
36034   cycles for call ClearLocals, ALL regs preserved (old MB)
19148   cycles for ClearLocalVariables, ALL regs preserved (new MB)

7476    cycles for ZEROLOCALES, eax+ecx+edx trashed
19529   cycles for cll, eax trashed
54103   cycles for ClearLocVars macro, eax trashed
35980   cycles for call ClearLocalsB, eax trashed
36065   cycles for call ClearLocals, ALL regs preserved (old MB)
19185   cycles for ClearLocalVariables, ALL regs preserved (new MB)

7248    cycles for ZEROLOCALES, eax+ecx+edx trashed
19233   cycles for cll, eax trashed
54129   cycles for ClearLocVars macro, eax trashed
35991   cycles for call ClearLocalsB, eax trashed
36227   cycles for call ClearLocals, ALL regs preserved (old MB)
19101   cycles for ClearLocalVariables, ALL regs preserved (new MB)


Code sizes:
MyTest1      = 24
MyTest2      = 21
MyTest3      = 13
MyTest4      = 8
MyTest5      = 8
MyTest6      = 8

plus once for procs (call ClearLoc*):
ClearLocalsB = 17
ClearLocVars = 64
ClearLocals  = 19


Japheth would say: Aha, the cycle counter brigade.  :lol:


jj2007

Quote from: Gunther on July 22, 2013, 11:51:01 PM
Japheth would say: Aha, the cycle counter brigade.  :lol:
Pssst...

@sinsi: I've added GetVersionEx; 6.1 is Win7 (mine is 32-bit).

AMD Athlon(tm) Dual Core Processor 4450B (SSE3) 6.1. Service Pack 1
clearing 81920 local bytes ...
23375   cycles for ZEROLOCALES, eax+ecx+edx trashed
61559   cycles for cll, eax trashed
41031   cycles for ClearLocVars macro, eax trashed
41058   cycles for call ClearLocalsB, eax trashed
41062   cycles for call ClearLocals, ALL regs preserved (old MB)
41073   cycles for ClearLocalVariables, ALL regs preserved (new MB)
23428   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

23319   cycles for ZEROLOCALES, eax+ecx+edx trashed
61735   cycles for cll, eax trashed
41053   cycles for ClearLocVars macro, eax trashed
41054   cycles for call ClearLocalsB, eax trashed
41031   cycles for call ClearLocals, ALL regs preserved (old MB)
41073   cycles for ClearLocalVariables, ALL regs preserved (new MB)
23549   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
31      cycles for ZEROLOCALES, eax+ecx+edx trashed
92      cycles for cll, eax trashed
93      cycles for ClearLocVars macro, eax trashed
86      cycles for call ClearLocalsB, eax trashed
71      cycles for call ClearLocals, ALL regs preserved (old MB)
92      cycles for ClearLocalVariables, ALL regs preserved (new MB)
51      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

31      cycles for ZEROLOCALES, eax+ecx+edx trashed
92      cycles for cll, eax trashed
93      cycles for ClearLocVars macro, eax trashed
86      cycles for call ClearLocalsB, eax trashed
71      cycles for call ClearLocals, ALL regs preserved (old MB)
92      cycles for ClearLocalVariables, ALL regs preserved (new MB)
50      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3) 5.1. Service Pack 3
clearing 81920 local bytes ...
14373   cycles for ZEROLOCALES, eax+ecx+edx trashed
41079   cycles for cll, eax trashed
41043   cycles for ClearLocVars macro, eax trashed
41061   cycles for call ClearLocalsB, eax trashed
41203   cycles for call ClearLocals, ALL regs preserved (old MB)
41103   cycles for ClearLocalVariables, ALL regs preserved (new MB)
14344   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

14277   cycles for ZEROLOCALES, eax+ecx+edx trashed
41024   cycles for cll, eax trashed
41087   cycles for ClearLocVars macro, eax trashed
41182   cycles for call ClearLocalsB, eax trashed
41051   cycles for call ClearLocals, ALL regs preserved (old MB)
41143   cycles for ClearLocalVariables, ALL regs preserved (new MB)
14373   cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
74      cycles for ZEROLOCALES, eax+ecx+edx trashed
80      cycles for cll, eax trashed
105     cycles for ClearLocVars macro, eax trashed
73      cycles for call ClearLocalsB, eax trashed
64      cycles for call ClearLocals, ALL regs preserved (old MB)
112     cycles for ClearLocalVariables, ALL regs preserved (new MB)
93      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

74      cycles for ZEROLOCALES, eax+ecx+edx trashed
80      cycles for cll, eax trashed
105     cycles for ClearLocVars macro, eax trashed
76      cycles for call ClearLocalsB, eax trashed
64      cycles for call ClearLocals, ALL regs preserved (old MB)
112     cycles for ClearLocalVariables, ALL regs preserved (new MB)
93      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

Gunther

Another run:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4) 6.1. Service Pack 1
clearing 81920 local bytes ...
10300   cycles for ZEROLOCALES, eax+ecx+edx trashed
19305   cycles for cll, eax trashed
36224   cycles for ClearLocVars macro, eax trashed
36317   cycles for call ClearLocalsB, eax trashed
35946   cycles for call ClearLocals, ALL regs preserved (old MB)
19543   cycles for ClearLocalVariables, ALL regs preserved (new MB)
7278    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

7207    cycles for ZEROLOCALES, eax+ecx+edx trashed
19267   cycles for cll, eax trashed
35998   cycles for ClearLocVars macro, eax trashed
35925   cycles for call ClearLocalsB, eax trashed
36074   cycles for call ClearLocals, ALL regs preserved (old MB)
19142   cycles for ClearLocalVariables, ALL regs preserved (new MB)
7183    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
38      cycles for ZEROLOCALES, eax+ecx+edx trashed
43      cycles for cll, eax trashed
66      cycles for ClearLocVars macro, eax trashed
71      cycles for call ClearLocalsB, eax trashed
52      cycles for call ClearLocals, ALL regs preserved (old MB)
64      cycles for ClearLocalVariables, ALL regs preserved (new MB)
57      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

43      cycles for ZEROLOCALES, eax+ecx+edx trashed
49      cycles for cll, eax trashed
73      cycles for ClearLocVars macro, eax trashed
71      cycles for call ClearLocalsB, eax trashed
54      cycles for call ClearLocals, ALL regs preserved (old MB)
65      cycles for ClearLocalVariables, ALL regs preserved (new MB)
57      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)



fearless

Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz (SSE4) 6.1. Service Pack 1 (Win7 64-bit, CPU OC'd to 4.2 GHz)
clearing 81920 local bytes ...
7388    cycles for ZEROLOCALES, eax+ecx+edx trashed
18261   cycles for cll, eax trashed
34105   cycles for ClearLocVars macro, eax trashed
34369   cycles for call ClearLocalsB, eax trashed
34281   cycles for call ClearLocals, ALL regs preserved (old MB)
18126   cycles for ClearLocalVariables, ALL regs preserved (new MB)
6764    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

6772    cycles for ZEROLOCALES, eax+ecx+edx trashed
18056   cycles for cll, eax trashed
33844   cycles for ClearLocVars macro, eax trashed
33946   cycles for call ClearLocalsB, eax trashed
33878   cycles for call ClearLocals, ALL regs preserved (old MB)
18044   cycles for ClearLocalVariables, ALL regs preserved (new MB)
6760    cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

clearing 100 local bytes ...
37      cycles for ZEROLOCALES, eax+ecx+edx trashed
43      cycles for cll, eax trashed
67      cycles for ClearLocVars macro, eax trashed
62      cycles for call ClearLocalsB, eax trashed
46      cycles for call ClearLocals, ALL regs preserved (old MB)
65      cycles for ClearLocalVariables, ALL regs preserved (new MB)
51      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

38      cycles for ZEROLOCALES, eax+ecx+edx trashed
49      cycles for cll, eax trashed
64      cycles for ClearLocVars macro, eax trashed
67      cycles for call ClearLocalsB, eax trashed
45      cycles for call ClearLocals, ALL regs preserved (old MB)
55      cycles for ClearLocalVariables, ALL regs preserved (new MB)
49      cycles for ClearLocalVariables, ALL regs preserved (new MB, rep stosd)

Ficko

I have nothing to add to the speed contest; I'd just like to post what I use for clearing locals: :idea:


; *********************************************************
; ZEROLOCALS NameofProc
; *********************************************************
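ZEROLOCALS MACRO Subroutine
mov ecx, ebp                             ; ecx = top of the LOCAL area
lea eax, [esp+(($-Subroutine)-8)*4]      ; eax = esp, skipping the registers pushed by USES
sub ecx, eax                             ; ecx = number of bytes to clear
invokeA RtlZeroMemory,eax,ecx            ; zero ecx bytes starting at eax
ENDM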
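The `lea` line is the subtle part of this macro. One reading of it, sketched in C under stated assumptions (the function names are hypothetical; the encoding lengths are standard x86): with ML's prologue for a small frame, the bytes before the `lea` are `push ebp` (1), `mov ebp, esp` (2), `add esp, -n` in its 3-byte imm8 form, one 1-byte `push` per USES register, and the macro's own `mov ecx, ebp` (2). So `$-Subroutine` at the `lea` is 8 plus the number of pushed registers, and `(($-Subroutine)-8)*4` skips exactly those saved registers. By the same arithmetic, ML's 6-byte `add esp, imm32` form (larger frames) would shift the start up by 12 bytes.

```c
#include <stddef.h>

/* Encoded sizes (bytes) of the instructions preceding the macro's lea,
   assuming ML's short prologue (add esp, imm8). */
enum {
    PUSH_EBP_LEN     = 1, /* 55        push ebp        */
    MOV_EBP_ESP_LEN  = 2, /* 8B EC     mov ebp, esp    */
    ADD_ESP_IMM8_LEN = 3, /* 83 C4 xx  add esp, -n     */
    PUSH_REG_LEN     = 1, /* 5x        push reg (USES) */
    MOV_ECX_EBP_LEN  = 2  /* 8B CD     mov ecx, ebp    */
};

/* Value of ($ - Subroutine) at the lea for a given number of USES registers. */
size_t offset_at_lea(size_t uses_regs) {
    return PUSH_EBP_LEN + MOV_EBP_ESP_LEN + ADD_ESP_IMM8_LEN
         + uses_regs * PUSH_REG_LEN + MOV_ECX_EBP_LEN;
}

/* Displacement the macro adds to esp: (($-Subroutine)-8)*4 bytes. */
size_t esp_skip_bytes(size_t uses_regs) {
    return (offset_at_lea(uses_regs) - 8) * 4;
}
```

With `uses edi esi ebx` this gives 12, i.e. the three saved registers stay untouched.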


And a proc for when size matters more than speed (it handles not only MASM's add esp, -n but also sub esp, +n):


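; ZeroSubVars(pProc:POINTER) - zeroes the LOCALs of the proc whose start address is passed
; usage (inside that proc, after the prologue): push offset SomeProc / call ZeroSubVars
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
ZeroSubVars proc pProc:DWORD
mov eax, [esp+4]                  ; eax = start address of the proc
mov ecx, dword ptr [eax+5]        ; immediate operand of add/sub esp, n
.if (byte ptr [eax+4] == 0C4h)    ; ModRM C4h: add esp, -n (ML style)
neg ecx                           ; make the size positive
.endif
.if (byte ptr [eax+3] == 83h)     ; opcode 83h: short form, 8-bit immediate
movzx ecx, cl
.endif
mov edx, edi                      ; save edi
mov edi, ecx
sub edi, ebp
neg edi                           ; edi = ebp - size = lowest LOCAL
shr ecx, 2                        ; size in dwords
xor eax, eax
rep stosd                         ; zero the LOCALs
mov edi, edx                      ; restore edi
ret 4                             ; pop pProc
ZeroSubVars endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef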
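For readers less fluent in opcode bytes, a C sketch of the decoding ZeroSubVars performs (the name `frame_size` and the byte-array framing are hypothetical; the offsets follow the proc above). It assumes the proc starts with `55` (push ebp) and `8B EC` (mov ebp, esp), so byte 3 is the opcode (83h = imm8 form, 81h = imm32 form), byte 4 the ModRM (C4h = add esp, ECh = sub esp) and bytes 5 onwards the immediate.

```c
#include <stdint.h>

/* Returns the LOCAL size in bytes encoded in the prologue at p.
   p must provide at least 9 readable bytes, like the asm's dword read at +5. */
uint32_t frame_size(const uint8_t *p) {
    /* mov ecx, dword ptr [eax+5]: little-endian read, as on x86 */
    uint32_t n = (uint32_t)p[5] | ((uint32_t)p[6] << 8)
               | ((uint32_t)p[7] << 16) | ((uint32_t)p[8] << 24);
    if (p[4] == 0xC4)      /* ModRM C4h: add esp, -n (ML), so negate */
        n = 0u - n;        /* neg ecx */
    if (p[3] == 0x83)      /* opcode 83h: imm8 form, only the low byte counts */
        n &= 0xFFu;        /* movzx ecx, cl */
    return n;
}
```

Negating before masking is safe for the short form because the low byte of a two's-complement negation depends only on the low byte of the input.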

jj2007

Hi Ficko,
Looks remarkably close to my version (see attachment above, line 385ff):
align 16
ClearLocVars2 proc        ; ### 22.7.13, 62 bytes ###
  push eax
  push edi
  push ecx
  mov eax, [esp+12]                ; get EIP
  sub eax, 9                        ; start before sub/add esp and the call; no add/sub->int 77
  lea ecx, [eax-127]                ; set a limit
@@:
  dec eax
  cmp eax, ecx                        ; -17 works for 4*uses and long sub/add esp
  js cLerr
  cmp word ptr [eax], 0EC8Bh        ; mov ebp, esp - PoAsm uses 0E589h and sub, nn
  jne @B
  cmp byte ptr [eax+2], 81h        ; short add/sub esp?
  mov ecx, [eax+4]                ; long
  je @F
; cmp byte ptr [eax+ecx+2], 83h        ; can only occur if there are no locals
; jne cLerr
  movsx ecx, cl                ; short, sign-extend
@@:
  mov edi, ecx
  sar ecx, 2                      ; stack is always dword-aligned, so divide by 4 is OK
  js @F
  neg ecx                        ; JWasm uses sub esp, nn
  neg edi                        ; ML uses add esp, -nn
@@:
  xor eax, eax
  neg ecx
  add edi, ebp
  rep stosd
  pop ecx
  pop edi
  pop eax
  retn
cLerr:
  int 77h              ; set a really strange marker in case of crash
ClearLocVars2 endp
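ClearLocVars2 avoids the extra `push offset` by searching backwards from its return address for the `mov ebp, esp` bytes (8B EC) within a 127-byte window. A C sketch of that search over a byte buffer (names hypothetical; -1 stands in for the `int 77h` error path; `ret` is the index just past the `call`, mirroring the return address):

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the LOCAL size in bytes, or -1 if 8B EC is not found.
   ret must not exceed the buffer length. */
int64_t scan_frame_size(const uint8_t *code, size_t ret) {
    if (ret < 10)
        return -1;
    size_t start = ret - 9;                          /* sub eax, 9 */
    size_t limit = start >= 127 ? start - 127 : 0;   /* lea ecx, [eax-127] */
    for (size_t i = start; i-- > limit; ) {          /* dec eax, then compare */
        if (code[i] != 0x8B || code[i + 1] != 0xEC)
            continue;
        int64_t v;
        if (code[i + 2] == 0x81) {                   /* long form: imm32 at +4 */
            uint32_t u = (uint32_t)code[i + 4] | ((uint32_t)code[i + 5] << 8)
                       | ((uint32_t)code[i + 6] << 16) | ((uint32_t)code[i + 7] << 24);
            v = u >= 0x80000000u ? (int64_t)u - 0x100000000LL : (int64_t)u;
        } else {                                     /* short form: imm8, sign-extended */
            v = code[i + 4] >= 0x80 ? (int64_t)code[i + 4] - 256 : (int64_t)code[i + 4];
        }
        return v < 0 ? -v : v;   /* ML: add esp, -nn; JWasm: sub esp, nn */
    }
    return -1;
}
```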


Your version needs a
  push offset whateverproc  ; 5 bytes extra
  call ZeroSubVars


How do you deal with uses esi edi ebx?

Ficko

Quote from: jj2007 on July 23, 2013, 04:39:39 AM
Hi Ficko,
Looks remarkably close to my version (see attachment above, line 385ff):

Indeed. :shock:

Quote
How do you deal with uses esi edi ebx?

I generally write my programs in high-level languages and use static libraries with procs written in JWasm.
So I have to adhere to certain conventions, e.g. I don't need to preserve eax, ecx and edx the way you do.

My macro above should of course be the first statement after the declarations, so preserving edi, esi and ebx happens naturally with "uses".

TestProc proc public uses edi esi ebx, param00:dword
LOCAL .......
ZEROLOCALS TestProc
...
..

(If that's what you were asking.)

The "ZeroSubVars" proc uses only "edi" among the registers that must be preserved, and it saves it temporarily in "edx" (eax, ecx and edx are trashed).