Topic: Zero a stack buffer (and probe it)

jj2007

  • Member
  • *****
  • Posts: 11772
  • Assembler is fun ;-)
    • MasmBasic
Zero a stack buffer (and probe it)
« on: October 25, 2013, 07:31:54 PM »
Spin-off from MemStrategy:

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 268/100 cycles

3778    kCycles for 100 * rep stosd
4905    kCycles for 100 * push 0
4890    kCycles for 100 * push edx
3343    kCycles for 100 * movups xmm0
3319    kCycles for 100 * movaps xmm0

3785    kCycles for 100 * rep stosd
4891    kCycles for 100 * push 0
4894    kCycles for 100 * push edx
3457    kCycles for 100 * movups xmm0
3319    kCycles for 100 * movaps xmm0

3785    kCycles for 100 * rep stosd
4891    kCycles for 100 * push 0
4896    kCycles for 100 * push edx
3342    kCycles for 100 * movups xmm0
3320    kCycles for 100 * movaps xmm0

18      bytes for rep stosd
17      bytes for push 0
16      bytes for push edx
22      bytes for movups xmm0
25      bytes for movaps xmm0

Siekmanski

  • Member
  • *****
  • Posts: 2442
Re: Zero a stack buffer (and probe it)
« Reply #1 on: October 25, 2013, 07:58:50 PM »
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
loop overhead is approx. 579/100 cycles

2700    kCycles for 100 * rep stosd
5467    kCycles for 100 * push 0
4888    kCycles for 100 * push edx
4266    kCycles for 100 * movups xmm0
1411    kCycles for 100 * movaps xmm0

2752    kCycles for 100 * rep stosd
4887    kCycles for 100 * push 0
5651    kCycles for 100 * push edx
4262    kCycles for 100 * movups xmm0
1030    kCycles for 100 * movaps xmm0

2699    kCycles for 100 * rep stosd
4892    kCycles for 100 * push 0
4888    kCycles for 100 * push edx
4263    kCycles for 100 * movups xmm0
1744    kCycles for 100 * movaps xmm0

18      bytes for rep stosd
17      bytes for push 0
16      bytes for push edx
22      bytes for movups xmm0
25      bytes for movaps xmm0

sinsi

  • Guest
Re: Zero a stack buffer (and probe it)
« Reply #2 on: October 25, 2013, 08:40:15 PM »
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
loop overhead is approx. 310/100 cycles

2385    kCycles for 100 * rep stosd
4530    kCycles for 100 * push 0
4508    kCycles for 100 * push edx
3932    kCycles for 100 * movups xmm0
871     kCycles for 100 * movaps xmm0

TWell

  • Member
  • ****
  • Posts: 748
Re: Zero a stack buffer (and probe it)
« Reply #3 on: October 25, 2013, 09:05:27 PM »
AMD Athlon(tm) II X2 220 Processor (SSE3) 2.80 GHz
loop overhead is approx. 239/100 cycles

2621    kCycles for 100 * rep stosd
4891    kCycles for 100 * push 0
4895    kCycles for 100 * push edx
1666    kCycles for 100 * movups xmm0
1605    kCycles for 100 * movaps xmm0

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Zero a stack buffer (and probe it)
« Reply #4 on: October 25, 2013, 10:09:55 PM »
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 248/100 cycles

4986    kCycles for 100 * rep stosd
4827    kCycles for 100 * push 0
4991    kCycles for 100 * push edx
6187    kCycles for 100 * movups xmm0
2767    kCycles for 100 * movaps xmm0

5023    kCycles for 100 * rep stosd
4857    kCycles for 100 * push 0
4935    kCycles for 100 * push edx
6207    kCycles for 100 * movups xmm0
2766    kCycles for 100 * movaps xmm0

5023    kCycles for 100 * rep stosd
4855    kCycles for 100 * push 0
4990    kCycles for 100 * push edx
6225    kCycles for 100 * movups xmm0
2765    kCycles for 100 * movaps xmm0

nidud

  • Member
  • *****
  • Posts: 2302
    • https://github.com/nidud/asmc
Re: Zero a stack buffer (and probe it)
« Reply #5 on: October 25, 2013, 10:18:42 PM »
I cleaned up the rep stosd function a bit
Code:
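mov edx,edi             ; save edi, it must be preserved
lea edi,[esp-bufsize]   ; start at the lowest address of the buffer
mov ecx,bufsize/4       ; number of dwords to clear
xor eax,eax             ; fill value: zero
rep stosd               ; fill upward, no std/cld needed
mov edi,edx             ; restore edi
dec ebx                 ; presumably the loop counter of the test harness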

AMD Athlon(tm) II X2 245 Processor (SSE3)
loop overhead is approx. 239/100 cycles

2623    kCycles for 100 * rep stosd
4900    kCycles for 100 * push 0
4901    kCycles for 100 * push edx
1592    kCycles for 100 * movups xmm0
1597    kCycles for 100 * movaps xmm0
1955    kCycles for 100 * rep stosd

dedndave

Re: Zero a stack buffer (and probe it)
« Reply #6 on: October 25, 2013, 10:34:57 PM »
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 245/100 cycles

5107    kCycles for 100 * rep stosd
4844    kCycles for 100 * push 0
4902    kCycles for 100 * push edx
6153    kCycles for 100 * movups xmm0
2827    kCycles for 100 * movaps xmm0
2815    kCycles for 100 * rep stosd

5111    kCycles for 100 * rep stosd
4873    kCycles for 100 * push 0
4887    kCycles for 100 * push edx
6150    kCycles for 100 * movups xmm0
2795    kCycles for 100 * movaps xmm0
2782    kCycles for 100 * rep stosd

5053    kCycles for 100 * rep stosd
4892    kCycles for 100 * push 0
4850    kCycles for 100 * push edx
6179    kCycles for 100 * movups xmm0
2767    kCycles for 100 * movaps xmm0
2827    kCycles for 100 * rep stosd

Gunther

  • Member
  • *****
  • Posts: 3809
  • Forgive your enemies, but never forget their names
Re: Zero a stack buffer (and probe it)
« Reply #7 on: October 25, 2013, 11:22:20 PM »
Jochen,

Here are the results from an old computer (located in a university laboratory). The results from my machine at home will follow this evening.

Code:
AMD Athlon(tm) Dual Core Processor 5000B (SSE3)
loop overhead is approx. 239/100 cycles

3779    kCycles for 100 * rep stosd
4897    kCycles for 100 * push 0
4901    kCycles for 100 * push edx
3344    kCycles for 100 * movups xmm0
3347    kCycles for 100 * movaps xmm0

3774    kCycles for 100 * rep stosd
4897    kCycles for 100 * push 0
4899    kCycles for 100 * push edx
3343    kCycles for 100 * movups xmm0
3341    kCycles for 100 * movaps xmm0

3778    kCycles for 100 * rep stosd
4897    kCycles for 100 * push 0
4901    kCycles for 100 * push edx
3344    kCycles for 100 * movups xmm0
3331    kCycles for 100 * movaps xmm0

18      bytes for rep stosd
17      bytes for push 0
16      bytes for push edx
22      bytes for movups xmm0
25      bytes for movaps xmm0

--- ok ---

Gunther

nidud

Re: Zero a stack buffer (and probe it)
« Reply #8 on: October 25, 2013, 11:35:41 PM »
Quote from: dedndave
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 245/100 cycles

5107    kCycles for 100 * rep stosd
4844    kCycles for 100 * push 0
4902    kCycles for 100 * push edx
6153    kCycles for 100 * movups xmm0
2827    kCycles for 100 * movaps xmm0
2815    kCycles for 100 * rep stosd

Manipulating the (direction) flag again  :biggrin:
Dropping it shaves off some cycles on AMD, but more on Intel.

FORTRANS

  • Member
  • *****
  • Posts: 1115
Re: Zero a stack buffer (and probe it)
« Reply #9 on: October 26, 2013, 12:05:07 AM »
Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
loop overhead is approx. 211/100 cycles

7356    kCycles for 100 * rep stosd
4902    kCycles for 100 * push 0
4902    kCycles for 100 * push edx
3059    kCycles for 100 * movups xmm0
2312    kCycles for 100 * movaps xmm0
2207    kCycles for 100 * rep stosd

7358    kCycles for 100 * rep stosd
4905    kCycles for 100 * push 0
4897    kCycles for 100 * push edx
3064    kCycles for 100 * movups xmm0
2303    kCycles for 100 * movaps xmm0
2212    kCycles for 100 * rep stosd

7372    kCycles for 100 * rep stosd
4913    kCycles for 100 * push 0
4901    kCycles for 100 * push edx
3063    kCycles for 100 * movups xmm0
2303    kCycles for 100 * movaps xmm0
2214    kCycles for 100 * rep stosd

18      bytes for rep stosd
17      bytes for push 0
16      bytes for push edx
22      bytes for movups xmm0
25      bytes for movaps xmm0
17      bytes for rep stosd


--- ok ---

jj2007

Re: Zero a stack buffer (and probe it)
« Reply #10 on: October 26, 2013, 12:52:01 AM »
Thanks to everybody :icon14:

Quote from: nidud
I cleaned up the rep stosd function a bit

I appreciate your good intentions, Nidud. Put it under TestA, just for fun ;)
(hint: look at this thread's title)
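
A note on the hint, for readers who skipped the title: the buffer has to be probed as well as zeroed. On Windows the stack is committed one page at a time through a guard page, so a buffer larger than one page must be touched from the top down. A forward fill that starts at esp-bufsize may land below the guard page and fault. Below is a minimal sketch of probing first and zeroing afterwards. It assumes MASM32 in its default install path, a 4096-byte page and a made-up bufsize of 64 KB; none of it is taken from the testbed source.

```asm
; Sketch only: bufsize, the page size and the include paths are assumptions
.686
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

bufsize = 64*1024                   ; hypothetical size, a multiple of 4096

.code
start:
    mov     eax, esp                ; eax walks down from the current stack top
    mov     ecx, bufsize/4096       ; number of pages to touch
@@: sub     eax, 4096               ; step one page down
    test    dword ptr [eax], eax    ; read one dword: hitting the guard page commits it
    dec     ecx
    jnz     @B

    mov     esp, eax                ; claim the probed region as the buffer
    mov     edx, edi                ; edi must be preserved, park it in edx
    mov     edi, esp                ; lowest address of the buffer
    mov     ecx, bufsize/4          ; number of dwords
    xor     eax, eax                ; fill value: zero
    rep     stosd                   ; forward fill is safe now, all pages are committed
    mov     edi, edx                ; restore edi

    add     esp, bufsize            ; release the buffer
    invoke  ExitProcess, 0
end start
```

The probe costs one read per page, so the zeroing itself can still use whichever fill method wins in the timings above.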

MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1204
Re: Zero a stack buffer (and probe it)
« Reply #11 on: October 26, 2013, 01:43:49 AM »
Northwood w/htt
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE2)
++18 of 20 tests valid, loop overhead is approx. 309/100 cycles

4910    kCycles for 100 * rep stosd
4902    kCycles for 100 * push 0
4904    kCycles for 100 * push edx
4904    kCycles for 100 * movups xmm0
2144    kCycles for 100 * movaps xmm0

4910    kCycles for 100 * rep stosd
5130    kCycles for 100 * push 0
4901    kCycles for 100 * push edx
4893    kCycles for 100 * movups xmm0
2140    kCycles for 100 * movaps xmm0

4911    kCycles for 100 * rep stosd
4909    kCycles for 100 * push 0
4903    kCycles for 100 * push edx
4895    kCycles for 100 * movups xmm0
2150    kCycles for 100 * movaps xmm0

18      bytes for rep stosd
17      bytes for push 0
16      bytes for push edx
22      bytes for movups xmm0
25      bytes for movaps xmm0

nidud

Re: Zero a stack buffer (and probe it)
« Reply #12 on: October 26, 2013, 01:47:07 AM »
Quote from: jj2007
Put it under TestA, just for fun ;)

The intention is educational, at best  :P

Having both versions in the test illustrates the penalty of manipulating the flags on different CPUs.
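
To make the flag remark concrete: the testbed source is not posted in this thread, so the following is only a guess at the two variants being compared. It assumes the original rep stosd test filled the buffer downward between std and cld, and it uses a made-up bufsize below one page so that no probing is needed. MASM32 in its default install path is assumed as well.

```asm
; Sketch only: bufsize and the shape of the "original" variant are assumptions
.686
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

bufsize = 2048                      ; hypothetical size, kept below one page

.code
start:
    sub     esp, bufsize            ; reserve the buffer
    mov     edx, edi                ; edi must be preserved, park it in edx

    ; variant 1: fill downward, needs std before and cld after
    lea     edi, [esp+bufsize-4]    ; last dword of the buffer
    mov     ecx, bufsize/4          ; number of dwords
    xor     eax, eax                ; fill value: zero
    std                             ; direction flag set: edi decreases
    rep     stosd                   ; store at [edi], then edi = edi - 4
    cld                             ; restore the cleared flag that other code expects

    ; variant 2: fill upward, no flag changes at all
    mov     edi, esp                ; first dword of the buffer
    mov     ecx, bufsize/4
    rep     stosd                   ; eax is still zero

    mov     edi, edx                ; restore edi
    add     esp, bufsize            ; release the buffer
    invoke  ExitProcess, 0
end start
```

Timing the two variants separately would show how much of the difference in the results above comes from std/cld on a given CPU.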

dedndave

Re: Zero a stack buffer (and probe it)
« Reply #13 on: October 26, 2013, 01:50:10 AM »
i try to avoid STD's   :lol:

in fact, i have gotten to where i don't use them at all
if i have to move things in that direction, i write a discrete loop

in this case, you could probe, then clear, one page at a time
something like this...
Code:
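    ASSUME  FS:Nothing

    mov     edx,esp
    mov     fs:[700h],edi       ; park edi in a TEB slot (offset as posted)
    xor     eax,eax             ; fill value: zero
    sub     edx,<NumberOfBytesRequiredPlus3Mod4>    ; edx = final esp (buffer start)
    .repeat
        push    eax             ; probe: touch the next page below the committed stack
        mov     ecx,esp
        mov     esp,fs:[8]      ; esp = StackLimit (lowest committed stack address)
        sub     ecx,esp         ; bytes between StackLimit and the dword just pushed
        shr     ecx,2           ; bytes to dwords, sets ZF if there is nothing to clear
        .if !ZERO?
            mov     edi,esp
            rep     stosd       ; clear from StackLimit up to the pushed dword
        .endif
    .until edx>=esp             ; done once StackLimit is at or below the target
    mov     edi,fs:[700h]       ; restore edi
    mov     esp,edx             ; esp = buffer start

    ASSUME  FS:ERROR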

dedndave

Re: Zero a stack buffer (and probe it)
« Reply #14 on: October 26, 2013, 02:28:02 AM »
this is a simpler version...
Code:
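    ASSUME  FS:Nothing

    mov     edx,esp
    mov     ecx,esp             ; remember the original esp for the byte count
    sub     edx,<NumberOfBytesRequiredPlus3Mod4>    ; edx = buffer start
    .repeat
        push    eax             ; probe: touch the page, the value does not matter
        mov     esp,fs:[8]      ; move esp down to StackLimit
    .until edx>=esp             ; done once StackLimit is at or below the target
    sub     ecx,edx             ; ecx = buffer size in bytes
    xchg    edx,edi             ; edi = buffer start, edx = saved edi
    shr     ecx,2               ; bytes to dwords
    xor     eax,eax             ; fill value: zero
    mov     esp,edi             ; esp = buffer start
    rep     stosd               ; clear the whole buffer in one go
    mov     edi,edx             ; restore edi

    ASSUME  FS:ERROR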