Test of timing code

Started by Antariy, May 08, 2013, 01:11:33 AM

Antariy

Hi all, can I please have some timings for the attached testbed?

It's not a test of the algo but rather a test of a timing method, so it will be very interesting to see how it performs on a wide variety of hardware.

In this code I tried to eliminate the influence of the OS and of the CPU's shortcuts in tests with high loop counts.

Please run it several times and post all results here.

Timings:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

Integer looped code test

Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)


SSE code test

Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
--- ok ---


This method is just a rough idea, so I don't know how good it is.

The timing method:


.data?
overhead    dd  ?
.code


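    mov edi,0               ; edi = accepted threshold, starts at zero
    align 16
    @@:
    xor eax,eax
    cpuid                   ; serialize before reading the TSC
    rdtsc
    push edx                ; save the start time (high dword first)
    push eax
    xor eax,eax
    cpuid                   ; serialize again
    rdtsc
    pop ecx
    sub eax,ecx             ; edx:eax = end - start
    pop ecx
    sbb edx,ecx
    inc edi                 ; raise the threshold by one on every pass
    cmp eax,edi             ; minimal cycle count; if the result is higher, the test got interrupted
    ja @B

    mov overhead,eax        ; cycle count of the empty timing loop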

print "Integer looped code test",13,10,13,10

REPEAT 10


    mov edi,overhead   ; put here any starting value that you think the timings cannot be smaller than
    align 16
    @@:
    xor eax,eax
    cpuid
    rdtsc
    push edx
    push eax
    invoke Axhex2dw1,offset hexstring   ; the code under test
    xor eax,eax
    cpuid
    rdtsc
    pop ecx
    sub eax,ecx
    pop ecx
    sbb edx,ecx
    inc edi
    cmp eax,edi  ; minimal cycle count; if the result is higher, the test got interrupted
    ja @B

    sub eax,overhead   ; subtract the overhead of the timing loop itself

    invoke crt_printf,CTXT("Cycles count: %d (minimal was: %d, overhead: %d)",13,10),eax,edi,overhead

ENDM


The timing code calculates the smallest cycle count that its own loop logic takes and saves it as overhead.
Then it runs the tested piece in the same loop that was used for the overhead calculation, but the starting value of the counter the timing is checked against (edi) is set to the minimum, i.e. the overhead. So the code runs as few loops as possible before it passes the comparison with the minimum. This should probably avoid the influence of the CPU's prediction logic and of background OS work (handling hardware interrupts) as much as possible (probably, probably... ::))
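
As a rough sketch of the loop logic described above, here is a small C model of it. It is not part of the testbed: the names timed_loop, fake_clock, fake_read and fake_body are hypothetical, and the timer is a scripted fake instead of CPUID+RDTSC so that the behaviour is repeatable. On real hardware the read_ticks callback would wrap the serialized RDTSC read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types: read_ticks stands in for CPUID+RDTSC,
   body stands in for the code under test (NULL = empty loop). */
typedef uint64_t (*tick_source)(void *ctx);
typedef void (*timed_code)(void *ctx);

/* Model of the loop from the post: take one sample, raise the accepted
   threshold by one, and retry while the sample is still above it. */
uint32_t timed_loop(tick_source read_ticks, timed_code body, void *ctx,
                    uint32_t start_threshold, uint32_t *threshold_out)
{
    assert(read_ticks != NULL);
    uint32_t threshold = start_threshold;    /* edi */
    uint32_t elapsed;                        /* eax */
    do {
        uint64_t t0 = read_ticks(ctx);
        if (body != NULL)
            body(ctx);
        uint64_t t1 = read_ticks(ctx);
        elapsed = (uint32_t)(t1 - t0);       /* low dword of the difference */
        threshold++;                         /* inc edi */
    } while (elapsed > threshold);           /* cmp eax,edi / ja @B */
    if (threshold_out != NULL)
        *threshold_out = threshold;
    return elapsed;
}

/* Scripted clock (an assumption for illustration): every timer read
   costs read_cost ticks, every run of the body costs body_cost ticks. */
struct fake_clock {
    uint64_t now;
    uint64_t read_cost;
    uint64_t body_cost;
};

uint64_t fake_read(void *ctx)
{
    struct fake_clock *c = ctx;
    c->now += c->read_cost;
    return c->now;
}

void fake_body(void *ctx)
{
    struct fake_clock *c = ctx;
    c->now += c->body_cost;
}
```

With a fake clock where each timer read costs 464 ticks and the body costs 56, calling timed_loop with a NULL body and a start threshold of 0 returns 464 (the overhead). Calling it again with fake_body and a start threshold of 464 returns 520 with a final threshold of 520, which matches the first line of the Celeron output above (56 = 520 - 464).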

Well, this is probably a crazy idea and the timings will be unstable, but it is interesting to see how it works :biggrin:
Tests and thoughts are welcome.

dedndave

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

Integer looped code test

Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 37 (minimal was: 519, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 37 (minimal was: 517, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)


SSE code test

Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 499, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 497, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 501, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)

Antariy

Thank you Dave for a very fast reply :t

dedndave

well - they are all over the place
that's because each test is so short   :P
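
One way to read this remark: when the measured code takes only a few dozen cycles, a jitter of a handful of cycles is a large fraction of the result. A minimal C sketch that puts a number on it follows. The helper name cycle_spread and the struct are hypothetical; the input is simply the "Cycles count" column copied from a run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct spread {
    uint32_t min;
    uint32_t max;
    double relative;    /* (max - min) / min */
};

/* Smallest and largest reported cycle count, plus the spread
   relative to the smallest one. */
struct spread cycle_spread(const uint32_t *counts, size_t n)
{
    assert(counts != NULL && n > 0);
    struct spread s = { counts[0], counts[0], 0.0 };
    for (size_t i = 1; i < n; i++) {
        if (counts[i] < s.min) s.min = counts[i];
        if (counts[i] > s.max) s.max = counts[i];
    }
    if (s.min > 0)
        s.relative = (double)(s.max - s.min) / (double)s.min;
    return s;
}
```

For the Celeron integer run in the first post (56 and 64 cycles) this gives a spread of 8 cycles, about 14% of the minimum. For the Pentium 4 integer run above (37 to 45 cycles) the same 8 cycles are about 22%.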

Antariy

My idea is that this method may give more or less correct results, especially on modern CPUs, which with the current testing method tend to "run" tests ~20 instructions long in just a couple of cycles. That doesn't look possible from a technical point of view. This code tries to eliminate as much as possible of the influence of the CPU's prediction logic and the OS's background work. Interesting: your CPU runs the SSE code more stably than mine. It will be interesting to see how this method works on very modern CPUs (if the cycle count for the SSE code is not something like 2 cycles, then it probably works :lol:)

anta40


Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 39 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)


SSE code test

Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 103, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
--- ok ---

Antariy

Incredible! :biggrin:

Thank you, anta40 :t

FORTRANS

Hi Alex,

   P-III, Windows 2000; P-MMX, Windows 98; and Pentium M, WinXP.

Regards,

Steve N.


G:\WORK\TEMP>35hex2dw
???? (SSE1)

Integer looped code test

Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)


SSE code test

Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
--- ok ---

A:\>35hex2dw

Integer looped code test

Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)


SSE code test

A:\>35hex2dw
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

Integer looped code test

Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)


SSE code test

Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
--- ok ---

jj2007

Looks very promising, Alex :t
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)

Integer looped code test

Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)


SSE code test

Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)

hutch--



Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 259, overhead: 225)
Cycles count: 27 (minimal was: 254, overhead: 225)
Cycles count: 27 (minimal was: 257, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 254, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 27 (minimal was: 252, overhead: 225)


SSE code test

Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 235, overhead: 225)
--- ok ---

TouEnMasm

Intel(R) Celeron(R) CPU 2.80GHz (SSE3)

Integer looped code test

Cycles count: 53 (minimal was: 515, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 52 (minimal was: 522, overhead: 462)
Cycles count: 52 (minimal was: 524, overhead: 462)
Cycles count: 52 (minimal was: 518, overhead: 462)
Cycles count: 52 (minimal was: 520, overhead: 462)
Cycles count: 52 (minimal was: 516, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)


SSE code test

Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 484, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
--- ok ---

Fa is a musical note to play with CL

jj2007

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Integer looped code test

Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)


SSE code test

Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)

Siekmanski

Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 306, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 45 (minimal was: 322, overhead: 243)
Cycles count: 36 (minimal was: 280, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 311, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 300, overhead: 243)


SSE code test

Cycles count: 27 (minimal was: 320, overhead: 243)
Cycles count: 18 (minimal was: 264, overhead: 243)
Cycles count: 18 (minimal was: 351, overhead: 243)
Cycles count: 18 (minimal was: 262, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 18 (minimal was: 264, overhead: 243)
Cycles count: 18 (minimal was: 261, overhead: 243)
Cycles count: 18 (minimal was: 261, overhead: 243)
--- ok ---
Creative coders use backward thinking techniques as a strategy.

habran

Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)

Integer looped code test

Cycles count: 46 (minimal was: 108, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 107, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 46 (minimal was: 104, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 46 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 107, overhead: 58)


SSE code test

Cycles count: 36 (minimal was: 94, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
--- ok ---
Cod-Father

Magnum


Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 90 (minimal was: 783, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 90 (minimal was: 785, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)


SSE code test

Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 703, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
--- ok ---
Take care,
                   Andy

Ubuntu-mate-18.04-desktop-amd64

http://www.goodnewsnetwork.org