The MASM Forum

General => The Laboratory => Topic started by: Antariy on May 08, 2013, 01:11:33 AM

Title: Test of timing code
Post by: Antariy on May 08, 2013, 01:11:33 AM
Hi all, can I please have some timings for the attached testbed?

It's not test for the algo but rather a test of a timing method - so it will be very interesting to see how it performs on wide variety of hardware.

In this code I tried to eliminate the influence of the OS and the CPU's short-cuts for high loop counted tests.

Please, run it several times and post all results here.

Timings:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

Integer looped code test

Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)


SSE code test

Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
--- ok ---


This method is just a rough idea so I don't know how good it is.

The timing method:


.data?
overhead    dd  ?
.code


    mov edi,0
    align 16
    @@:
    xor eax,eax
    cpuid
    rdtsc
    push edx
    push eax
    xor eax,eax
    cpuid
    rdtsc
    pop ecx
    sub eax,ecx
    pop ecx
    sbb edx,ecx
    inc edi
    cmp eax,edi  ; minimal cycles count, if the result is higher - the test has got interrupted
    ja @B
   
mov overhead,eax

print "Integer looped code test",13,10,13,10

REPEAT 10


    mov edi,overhead   ; put here any starting value you think the timings may not be smaller than that
    align 16
    @@:
    xor eax,eax
    cpuid
    rdtsc
    push edx
    push eax
    invoke Axhex2dw1,offset hexstring
    xor eax,eax
    cpuid
    rdtsc
    pop ecx
    sub eax,ecx
    pop ecx
    sbb edx,ecx
    inc edi
    cmp eax,edi  ; minimal cycles count, if the result is higher - the test has got interrupted
    ja @B

    sub eax,overhead

    invoke crt_printf,CTXT("Cycles count: %d (minimal was: %d, overhead: %d)",13,10),eax,edi,overhead

ENDM


The timing code calculates the smallest cycles count that its loop logic takes, and saves it as overhead.
Then timing code runs the testing piece in the same loop as it was for overhead calculation, but the starting counter to check the timing (edi) is set to a minumum - overhead, so, the code runs as less loops as possible before it gets passed the comparsion with a minimum - this should probably avoid influence of the CPU prediction logic and background OS work (managing hardware interrupts) as much as possible (probably, probably... ::))

Well, this is probably crazy idea and the timings will be unstable but it is interesting how it works :biggrin:
Tests and thoughts are welcome.
Title: Re: Test of timing code
Post by: dedndave on May 08, 2013, 01:20:25 AM
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

Integer looped code test

Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 37 (minimal was: 519, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 37 (minimal was: 517, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)


SSE code test

Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 499, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 497, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 501, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 01:22:15 AM
Thank you Dave for a very fast reply :t
Title: Re: Test of timing code
Post by: dedndave on May 08, 2013, 01:23:05 AM
well - they are all over the place
that's because each test is so short   :P
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 01:29:46 AM
My idea is that this method may give more or less right results, especially on modern CPUs which have tendency to "run" tests ~20 instructions long for just couple of cycles with current testing method. This doesn't look as a possible thing just from tech point of view. This code tries to eliminate as much possible influence of CPU's prediction logic and OS' background work. Interesting - your CPU runs SSE code more stable than mine. Interesting how this method will work on very modern CPUs (if cycles count for SSE code will not be something like 2 cycles, then it probably works :lol:)
Title: Re: Test of timing code
Post by: anta40 on May 08, 2013, 01:32:55 AM

Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 39 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)


SSE code test

Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 103, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
--- ok ---
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 01:36:36 AM
Incredible! :biggrin:

Thank you, anta40 :t
Title: Re: Test of timing code
Post by: FORTRANS on May 08, 2013, 01:47:23 AM
Hi Alex,

   P-III, Windows 2000; P-MMX, Windows 98; and Pentium M, WinXP.

Regards,

Steve N.


G:\WORK\TEMP>35hex2dw
???? (SSE1)

Integer looped code test

Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)


SSE code test

Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
--- ok ---

A:\>35hex2dw

Integer looped code test

Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)


SSE code test

A:\>35hex2dw
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

Integer looped code test

Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)


SSE code test

Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
--- ok ---
Title: Re: Test of timing code
Post by: jj2007 on May 08, 2013, 01:53:25 AM
Looks very promising, Alex :t
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)

Integer looped code test

Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)


SSE code test

Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Title: Re: Test of timing code
Post by: hutch-- on May 08, 2013, 02:01:14 AM


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 259, overhead: 225)
Cycles count: 27 (minimal was: 254, overhead: 225)
Cycles count: 27 (minimal was: 257, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 254, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 27 (minimal was: 252, overhead: 225)


SSE code test

Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 235, overhead: 225)
--- ok ---
Title: Re: Test of timing code
Post by: TouEnMasm on May 08, 2013, 02:48:56 AM
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)

Integer looped code test

Cycles count: 53 (minimal was: 515, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 52 (minimal was: 522, overhead: 462)
Cycles count: 52 (minimal was: 524, overhead: 462)
Cycles count: 52 (minimal was: 518, overhead: 462)
Cycles count: 52 (minimal was: 520, overhead: 462)
Cycles count: 52 (minimal was: 516, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)


SSE code test

Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 484, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
--- ok ---

Title: Re: Test of timing code
Post by: jj2007 on May 08, 2013, 03:19:34 AM
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Integer looped code test

Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)


SSE code test

Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Title: Re: Test of timing code
Post by: Siekmanski on May 08, 2013, 05:10:35 AM
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 306, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 45 (minimal was: 322, overhead: 243)
Cycles count: 36 (minimal was: 280, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 311, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 300, overhead: 243)


SSE code test

Cycles count: 27 (minimal was: 320, overhead: 243)
Cycles count: 18 (minimal was: 264, overhead: 243)
Cycles count: 18 (minimal was: 351, overhead: 243)
Cycles count: 18 (minimal was: 262, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 18 (minimal was: 264, overhead: 243)
Cycles count: 18 (minimal was: 261, overhead: 243)
Cycles count: 18 (minimal was: 261, overhead: 243)
--- ok ---
Title: Re: Test of timing code
Post by: habran on May 08, 2013, 06:21:37 AM
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)

Integer looped code test

Cycles count: 46 (minimal was: 108, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 107, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 46 (minimal was: 104, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 46 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 107, overhead: 58)


SSE code test

Cycles count: 36 (minimal was: 94, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
--- ok ---
Title: Re: Test of timing code
Post by: Magnum on May 08, 2013, 06:38:30 AM

Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 90 (minimal was: 783, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 90 (minimal was: 785, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)


SSE code test

Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 703, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
--- ok ---
Title: Re: Test of timing code
Post by: Gunther on May 08, 2013, 07:32:34 AM
Hi Alex,

here are my results:

Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

Integer looped code test

Cycles count: 68 (minimal was: 314, overhead: 246)
Cycles count: 70 (minimal was: 319, overhead: 246)
Cycles count: 76 (minimal was: 322, overhead: 246)
Cycles count: 68 (minimal was: 317, overhead: 246)
Cycles count: 76 (minimal was: 322, overhead: 246)
Cycles count: 70 (minimal was: 317, overhead: 246)
Cycles count: 68 (minimal was: 315, overhead: 246)
Cycles count: 70 (minimal was: 317, overhead: 246)
Cycles count: 68 (minimal was: 322, overhead: 246)
Cycles count: 76 (minimal was: 322, overhead: 246)


SSE code test

Cycles count: 18 (minimal was: 265, overhead: 246)
Cycles count: 16 (minimal was: 263, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 16 (minimal was: 266, overhead: 246)
Cycles count: 16 (minimal was: 264, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 16 (minimal was: 266, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 16 (minimal was: 262, overhead: 246)
--- ok ---

I hope that helps a bit.

Gunther
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 10:06:51 AM
Thank you all very much for your tests :biggrin: Thank you, thank you! :biggrin:


We all of course noticed that on more or less modern CPUs there is something strange with timings: they are obviously inadequate - can you imagine that the loop of code that consists 10+ instructions, and it gets 8 iterations, plus prologue of the function, epilogue (pops/ret etc) call to it, all this takes ~20 cycles? Or even for SSE code - the timings are unbelievable small - well, yes, SSE is a fast beast, but just think on it: how can ~20 instructions take only 2 cycles to be called, processed, returned? It is just impossible technically.

Well, my guess is that every CPU which has out of order possibility and the prediction logic, i.e. PPro and above, just "plays" a short-cut, some kind of history - when we set the high loop count to get average, we just force CPU to be smart and after, let's say, couple of thousands of repeats it just show us how good is its prediction logic in the particular generation. So, there we see such things that PIII have pretty good stability of timings, PIV - not so good, Core family does have increasing inadequateness of reported timings with every new generation.

Just imagine such an algorighm of short-cut: some code gets called, CPU fethed it, decoded it, executed it. This first time run will be the slowest, but now CPU have the code ready to be run just in microops. Also CPU tracks the history - where the code has accessed the memory to, so, next time the CPU checks - if the code accesses the same memory location again, then if that region is not "dirty" i.e. it's not marked as changed, the CPU knows that the data is the same, and if it knows (and it knows, of course :biggrin: especially if the testing data is smaller than one page) that the code will not access beyond the same area as it was before, so the code will produce the same results as before, and there is no reason to execute it again - it just returns the previously tracked result. So, instead of execution time we have the time that CPU's logic has checked the execution context for being the same as previously used, and returned the predicted result.

This is rough example, probably, but you know what I mean.


There I tried to solve the main problem that we have with the timings measurement: the influence of OS' background work if loop counter specified for timing macro is small, the influence of advanced CPUs logic when the loop counter specified for timing macro is high.

If we use high loop counter, we get "better" average, but counting a lot of superfluous stuff like background drivers execution for every hardware interrupt (every HDD I/O operation - the entire virtual memory subsystem based on HDD operations, so it can have influence even if there is at first look nothing to cause an I/O operations; mouse movements, network adapters and so on). With high loop counts this stuff gets divided between loops, so, we have more or less right timings for not out-of-order / prediction logic CPUs. For more or less modern CPUs we have inadequate timings - just because it's not execution time but rather prediction time - as I guessed above. I.e., the timings macro work well and does its job, but the CPU became to play smart games, so, the measurement that made with timing code is not that we want to measure.

If we use small loop counter - we get inadequate timings just because if with high counts those background OS work was divided by high counter to get average, with small counts we have hundreds or thousands of background (mostly driver's) cycles divided by small value, so, the timings do not show any useful info at all.

So, we need use as small loop count as possible, but at the same time avoid influence of multitasking environment.

The code below tries to use the smallest possible loop count just by getting the smallest time that tested code got executed. The logic is: we run the code inside timing frame, then we check total timing with a value that we think is the minimal possible loop count for the union of timing code and tested code. If the current timing measurement is bigger than that minimal, then it means that the minimal is too small, or the OS has interrupted our tested code somewhere in the testing time. First issue we fix with consequent increasing of the minimal value we check with, second issue we fix just by re-executing timing frame again. Statistically there should be a much timespace when we will run the test without any iterrupts, and this method catches that timespace. As you noticed it doesn't bother much even with dropping a timeslice after a test/printing results - and still it works, because, for instance, "20 Cycles" here meant that the frame timing code + tested code have got ran just 20 loops: starting from an minimal (for testing frame it is overhead) to the minimal + loops count. I.e. not hundrends, not thousands, not millions of loops that force CPU to show us its advanced techniques of the out of order logic, but just 20.
The minimal - the overhead - is computed just by execution "empty" timing loop and starting to check it with zero minimal. Since the timing loops consists mostly from not OoO code (it has cpuid), the value we measure will be pretty stable thing and will be used as the minimal at the further measurements to avoid excessive measurement loops (so eleminate CPU's short-cuts as much as possible).


This meant that for the timing frame, let's say, 100 clocks long (actual number is the "minimal was: XXX" printed in this testbed), we take only 2000 of CPU cycles for entire process of a timing measurement, that with modern CPUs meant no more than one microsecond - the OS' timislice is much bigger, so, we obviously will find a "free time" when nothing is interrupting our test. At the same time - 20...100...(needs checking) loops is not so much to get CPU's advanced logic involved, at least not so much as with high loop counts.




Now the drawbacks. If the tested code runs longer that some maximal count of cycles, then the test will be interrupted by the OS, and having small loop count of measurements, we will probably have inadequate results.
If we will guess a timeslice for user code as 250 milliseconds, so, the rough maximum for the example above will be ~5000 clock cycles, i.e., if tested code runs near to or longer than 5000 cycles, the timings will not be stable.

So, this is not absolutely universal timing code, it's rather the solution for the measurement under modern CPUs and/or for very fast algos that just may not be adequately measured with the timing method we are currently wide using in our testbeds. So, it's something like "quantum measurement" for lighthing fast algos :biggrin:


I think the tests proved that this method is good, so I probably will try to make these as a macroses.

Many thanks to Michael (MichaelW) for his timings macro! And to Jochen (jj2007), had been talking with him I thought out this idea, now it needs wide testing in format of simplified usage as a macro :biggrin:
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 10:32:44 AM
And here we can see that CPU's prediction logic works best of all with SSE code - well, this seems pretty logically - SSE does not bother with flags nor is used for anything else "unpredictable-driven" things like unpredictable runtime changements are. The GPR code like MOV EAX,[somewhere] / ... / MOV ECX,[EAX + EDX] is hard thing to predict, such things are rare and almost unused with SSE code, there we rather have consequent access to a memory block and simple math with the XMM regs that doesn't relate to execution path at all, so, if CPU knows that the data is not changed (the cache line / page is not marked as a dirty) from previous call, it just plays short-cut and returns predicted result - that's the reason for "proud" 2 clock cycles timings for a complex code :biggrin:

This timing method shows that even on very power and modern CPUs real performace of SSE is far from "near zero" cycles for a bunch of a depending instructions like tested code is. And this obviously should be so.

Hm... Note for Jochen: how do you think, will you-know-who register here to sprinkle all of us with his hydrophobic saliva, now, seeing these results, for instance:

Quote from: habran on May 08, 2013, 06:21:37 AM
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)

Integer looped code test

Cycles count: 46 (minimal was: 108, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)

...

SSE code test

Cycles count: 36 (minimal was: 94, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
...

--- ok ---


:greensml:

After reading such an impolite note it will be hard question for him: to register, or not to register, that's the question :greensml:

(note for those who don't know what I'm talking about to Jochen: there is someone who will maybe disappointed to see that his SSE code is not way faster than some old-fashioned i386 code, both being running on very modern model of CPU, at least no one test shows it running at "zero" clock cycles :lol:)
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 11:43:23 AM
BTW, im my big post two posts above here:

> If we will guess a timeslice for user code as 250 milliseconds

It is mistypo: not miiliseconds but microseconds I meant - the other numbers are right since they were computed for microseconds, that's just a mistypo.


Also, by results of testing you may see interesting details, for instance:

Quote from: habran on May 08, 2013, 06:21:37 AM
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)

Integer looped code test

Cycles count: 46 (minimal was: 108, overhead: 58) 1)
Cycles count: 48 (minimal was: 106, overhead: 58) 2)

...

SSE code test

Cycles count: 36 (minimal was: 94, overhead: 58) 3)
Cycles count: 34 (minimal was: 93, overhead: 58) 4)
...

--- ok ---



In a test cases 1) and 4) you see that overhead + cycles count is smaller than the minimal. For instance for 1) it is 46+58=104 clock cycles took the measurement process, but the minimal was 108 cycles. What this meant? This probably meant that the test was interrupted and/or was slow down at the measurement loops with 103 <= minimal < 108, and timings were too big, so, before the minimal=104 the timings did not suit because code was slower than that, but after minimal got value 103 or 104 there was a slowdown from the OS side, right after that all have got to a "calm", the first value was grabbed that passed the comparsion with minimal to be smaller than it.

This is representative image of how timing method filters and eliminates the multitasking environment issues.
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 03:01:37 PM
Interesting how small penalties does have P-MMX which is not a true OoO CPU so don't have deep pipelines which cause these penalties when CPUID is executed (overhead value is very small):

Quote from: FORTRANS on May 08, 2013, 01:47:23 AM
A:\>35hex2dw

Integer looped code test

Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)


The data collected in this thread is very informative, thank you all again!

Thesis
With the numbers collected here we can guess that CPUs with high overhead value have deeper pipeline and/or have higher penalties for mispredictions, but for the code that is well predictable (for example is well serializable) the deep pipeline has advantages of OoO and prediction logic.

Just an analysis of the data collected:

Quote from: habran on May 08, 2013, 06:21:37 AM
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)

Integer looped code test

Cycles count: 46 (minimal was: 108, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)

SSE code test

Cycles count: 36 (minimal was: 94, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)


Not deep pipeline:
• CPU fast recovers from mispredictions;
• Very predictable and serialized code like SSE one is does not have much advantages in front of looped GPR code - the timings factor SSE vs GPR is 1,35.

Also, just a thought: maybe this is because this mobile model of the CPU does have different circuit design for the predictor logic - to be less power consuming it has smaller number of logic schemes => smaller circuit => smaller TDP than Gunther's one:




Quote from: Gunther on May 08, 2013, 07:32:34 AM

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

Integer looped code test

Cycles count: 68 (minimal was: 314, overhead: 246)
Cycles count: 70 (minimal was: 319, overhead: 246)

SSE code test

Cycles count: 18 (minimal was: 265, overhead: 246)
Cycles count: 16 (minimal was: 263, overhead: 246)



Much deeper pipeline (this paragraph goes here just because it relates to previous CPU with the same architecture, actually here would have been next paragraph about other CPU):
• CPU has bigger penalties from mispredictions;
• Well serializable code like tested SSE one is have more advantage: SSE:GPR factor is 4,25 (!).

So, it looks like this CPU should have more compex circuit => higher power consumption.

OK, that was an guess... will check it now... Here the numbers are: habran's CPU TDP is 45W, Gunther's CPU TDP is 77W - just like was guessed :biggrin: So, now we see that from timings results we can even guess some details of particular CPU's model design :biggrin:

It will be very good if John (sinsi) will kindly give us timings for his desktop CPU model :biggrin:




Quote from: anta40 on May 08, 2013, 01:32:55 AM

Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)

SSE code test

Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)


Deeper pipeline (overhead is 87 clocks instead of 58):
• As have already been said, CPU has bigger penalties for mispredictions and faults of OoO logic;
• Well serializable code runs better: SSE:GPR ratio is 2,4.




Quote from: hutch-- on May 08, 2013, 02:01:14 AM

Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 259, overhead: 225)

SSE code test

Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)


Deeper pipeline: SSE:GPR ration is 3.




Here you should look to a Gunther's CPU data - it fits here in this series of consequentally growing overhead = pipeline = prediction possibilities. As we remember SSE:GPR ratio is 4,25.




Quote from: Magnum on May 08, 2013, 06:38:30 AM

Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz (SSE4)

Integer looped code test

Cycles count: 90 (minimal was: 783, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)

SSE code test

Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)


Deepest pipeline in the series of test: SSE:GPR factor is 5.



So, as you can see, the collected experimental data prove the thesis from the start of this message: higher overhead value shows more complex prediction logic of CPU.
Here I used only data of Core family CPUs, but you all know that Netburst has a deep pipeline and it very badly behaves when the code is not well predictable and when it does a fault prediction, so, PIV and Celerons here show the same behaviour in the series of tests, as well as one AMD - it has small overhead (63), and SSE:GPR ratio is small, too (1,31) :biggrin:

Probably we have a good additional measurement instrument :biggrin:
Title: Re: Test of timing code
Post by: sinsi on May 08, 2013, 03:13:33 PM
Your wish is my command  :biggrin:

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)

Integer looped code test

Cycles count: 53 (minimal was: 141, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 53 (minimal was: 141, overhead: 87)
Cycles count: 53 (minimal was: 143, overhead: 87)
Cycles count: 53 (minimal was: 143, overhead: 87)
Cycles count: 53 (minimal was: 143, overhead: 87)


SSE code test

Cycles count: 9 (minimal was: 96, overhead: 87)
Cycles count: 6 (minimal was: 93, overhead: 87)
Cycles count: 9 (minimal was: 116, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 35 (minimal was: 123, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 9 (minimal was: 98, overhead: 87)
Cycles count: 35 (minimal was: 123, overhead: 87)
Cycles count: 35 (minimal was: 123, overhead: 87)
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 03:33:26 PM
Thank you very much, John! :biggrin: Well, I'm crazy already, bored to type-type-type tonight... :greensml: Anyone can explain its behaviour if wants, but I'm going offline. At least, half of the time its SSE:GPR factor is 1,51 that perfectly fits into thesis, but sometimes there is something weird, probably desktop CPU models have so powerful prediction logic that even 9 loops is enough to get it to use short-cut, and the same is applied for the overhead computation loop, so... We strongly need more experimental data! :lol:
BTW, did you run it from an WinRAR, didn't you?

My next thought was a "threshold" value to add into testing method, but it's already for macroses...
Please, John, test the archive I posted now, it has hard-coded threshold for both algos for specifically your results. We may actually automate such things and get the timing scheme where the code get called just couple of times - this will be really quantum measurement :lol:

My results:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

Integer looped code test

Cycles count: 56 (minimal was: 528, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 523, overhead: 472)
Cycles count: 48 (minimal was: 523, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)


SSE code test

Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
--- ok ---


Please all - test it, but if you have problems like unstable timings - remember it is because threshold is hardcoded and is not known for your particular system. It should actually make timings more stable, like it done this for my CPU, but "hardcoded" value is a hardcoded...
Title: Re: Test of timing code
Post by: sinsi on May 08, 2013, 03:58:36 PM

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)

Integer looped code test

Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 144, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 144, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 144, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)


SSE code test

Cycles count: 9 (minimal was: 119, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 6 (minimal was: 120, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 9 (minimal was: 119, overhead: 87)
Cycles count: 9 (minimal was: 119, overhead: 87)
Cycles count: 6 (minimal was: 122, overhead: 87)
Cycles count: 9 (minimal was: 121, overhead: 87)
Cycles count: 38 (minimal was: 125, overhead: 87)

BTW, for the first test I ran it from WinRAR, the first two sse tests were -23 which I ignored.
Title: Re: Test of timing code
Post by: Antariy on May 08, 2013, 04:07:58 PM
Thank you, John :biggrin: At least for GPR code the timings now are stable, still it seems something funny with highly predictable SSE code... well, it's improper thing to choose threshold manually, so this obviously needs some sharpening and automation :lol: Maybe even recalculate overhead before every measurement just like Michael's macroses do.

So, thoughts are welcome :t
Title: Re: Test of timing code
Post by: habran on May 08, 2013, 05:39:05 PM
me know nothing...  :icon_redface:
you :t
:biggrin: :biggrin: :biggrin:
Title: Re: Test of timing code
Post by: FORTRANS on May 08, 2013, 10:56:21 PM
Quote from: Antariy on May 08, 2013, 03:33:26 PM
Please all - test it,

Hi Alex,

   Ran it on the same computers as in Reply #7, and the results
look to be unchanged.

Regards,

Steve
Title: Re: Test of timing code
Post by: Antariy on May 09, 2013, 01:44:14 AM
Quote from: habran on May 08, 2013, 05:39:05 PM
me know nothing...  :icon_redface:
you :t
:biggrin: :biggrin: :biggrin:

:biggrin: :t (no, I really don't know either :icon_redface:)

Quote from: FORTRANS on May 08, 2013, 10:56:21 PM
Hi Alex,

   Ran it on the same computers as in Reply #7, and the results
look to be unchanged.

Thank you very much, Steve!


So it seems that the reason is in the John's CPU :biggrin:
Title: Re: Test of timing code
Post by: Gunther on May 09, 2013, 02:35:44 AM
Alex,

here again the results with the program from test2.zip:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

Integer looped code test

Cycles count: 36 (minimal was: 189, overhead: 136)
Cycles count: 48 (minimal was: 186, overhead: 136)
Cycles count: 40 (minimal was: 187, overhead: 136)
Cycles count: 44 (minimal was: 186, overhead: 136)
Cycles count: 44 (minimal was: 187, overhead: 136)
Cycles count: 40 (minimal was: 186, overhead: 136)
Cycles count: 40 (minimal was: 187, overhead: 136)
Cycles count: 56 (minimal was: 193, overhead: 136)
Cycles count: 52 (minimal was: 188, overhead: 136)
Cycles count: 28 (minimal was: 186, overhead: 136)


SSE code test

Cycles count: 8 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 8 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 8 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
--- ok ---

Gunther
Title: Re: Test of timing code
Post by: Antariy on May 09, 2013, 10:56:04 AM
Thank you, Gunther! :t
Title: Re: Test of timing code
Post by: habran on May 09, 2013, 11:37:10 AM
Hey Antariy,
I appreciate your enthusiasm :t
I guess you must be a young man according to your energy level :biggrin:
   
Title: Re: Test of timing code
Post by: Antariy on May 09, 2013, 03:01:16 PM
Quote from: habran on May 09, 2013, 11:37:10 AM
Hey Antariy,
I appreciate your enthusiasm :t
I guess you must be a young man according to your energy level :biggrin:
   

Thanks :biggrin:
It's an Inspiration rather than enthusiasm :biggrin: We had told about unstable timings when it comes to very small count of cycles couple years ago, too. So, it's not a new topic here.

As for this method, there is the problem in that the CPUID instruction timings are unstable, this may sounds strange, but they are. If the CPUID in the timing frame gets executed faster than it was executed when computing an overhead, this makes too small or even negative values when computing the clock cycles. It seem possible to avoid influence of the OS and to get the code executed not too big number of times, but it is still the problems in the instability of CPU's behaviour, depending on the previous and the next code, relative to the currently executed one.
Title: Re: Test of timing code
Post by: habran on May 09, 2013, 03:22:53 PM
maybe we should look at the average result rather than one execution result??? :idea:
it is frustrating that we can not force machine to do what we desire although we fuel it with ASM :icon13:


Title: Re: Test of timing code
Post by: Antariy on May 09, 2013, 03:32:07 PM
...and it makes to feel crazy when the machine shows that it is "too smart" :biggrin:

Yes, I think about range of results, not one average number, but something close to it by sense: two (or more) number which will show the probably range of the code speed, let's say: from 20 to 25 clocks at speed. It seem that getting exact number is not possible, and this becoming an issue when it comes to small cycles count (the light travelling half of a meter at the time that modern CPUs execute these 5 clocks :biggrin: :biggrin:). But it's better to have a range than nothing :lol:
Title: Re: Test of timing code
Post by: habran on May 09, 2013, 03:58:07 PM
except if we want EVERYTHING OR NOTHING 8)
Title: Re: Test of timing code
Post by: Antariy on May 11, 2013, 02:02:40 PM
Hi all :biggrin:

Here is one more test, can I ask for the timings, please? Every hardware is welcome, since timings for now will be maybe a bit other as well as representation method is more comprehensive, but modern CPUs preferable, PIV and above, since there there is bigger instability of the timings.

Code now shows the "probability" values that mean the more big is the value the more big is probability of this clocks speed. If two values are near and are bigger that other ones, then this means that the peak of the graph is inbetween, atm the info is present in the "raw" format, actually it's the question how to implement the displayment, or to leave it as it is?

You may see sometimes unbelievable values, yes, there is even "0" clocks entries, but it's OK since the code just shows everything that is statistically possible, from 0 to maximally possible clocks for the algo. But, again, one value that is much larger that other ones is the peak which shows actual performance that was in that timing frame, or two near values that larger that other ones are the two points with the peak in the graph inbetween.

The point of this raw data is that being a humans we can easily drop the unbelievable timings, and choose ones which are suitable. I.e., if you see that the biggest probability is in zero cycles entry, or near that, with the algo that far from being just couple instructions long, then you will for sure find other one or two values that are large and much larger that other ones - those values are true, so, just drop the timing that was unbelievable. Remember, the code just shows everything, it does not bothers with checking how long the tested code is etc. As a proving of this - you may see that even if there are sometimes timings with too small clock number, the other, true, values are always the next largest ones, independ of the length of the shown table (i.e. number of entries for every result).


Well, there are much complications in the timing code isolation and still it is unstable, you may see that, for example, for my CPU, timing frame because of cpuid instruction has a wide bias from 456 clocks to the 536 clocks, and from the results you can see that it obviously has step of 8 clocks. As you see, there is 80 clocks difference (and that's too much for the algos that run not too big number of clocks), so it's no wonder that there are many false results - again, that's just everything that code founds "not bigger that it should be", next step - choose what is proper, is for the human.


One may play with a LOOP_COUNT equation in the start of listing, but increasing it well lead just "smoothing" the timings - too much loops obviously make CPU to become smart and we will again get something unbelievable results like ">20 instructions long, >12 of whose are non independed, algo of 2 clocks speed".

At least, for LOOP_COUNT = 3 it seem to work (probably ::)), maybe it a bit decreases actual timings, but with LOOP_COUNT = 1 the timings are a mess - because when LOOP_COUNT = 1 there are no passes done with the minimal set to closest to true speed, but if LOOP_COUNT set to too high value - there are too much loops, because CPU's logic comes to play. Maybe LOOP_COUNT = 2 is good, too, one may need to play with it for the particular machine.

Here are the results for the source in archive:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)

Min: 456, Max: 536, Ave: 530

Integer looped code test

RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16
RAW value: 512
        Probability of clock speed:
                0 clocks        59
                8 clocks        1
                16 clocks       11
                24 clocks       161
                32 clocks       5505
                40 clocks       1719
                48 clocks       1
                56 clocks       16


SSE code test

RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 504
        Probability of clock speed:
                0 clocks        1
                8 clocks        11
                16 clocks       161
                24 clocks       5505
                32 clocks       1719
                40 clocks       1
                48 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16


Almost empty test

RAW value: 480
        Probability of clock speed:
                0 clocks        5505
                8 clocks        1719
                16 clocks       1
                24 clocks       16
RAW value: 480
        Probability of clock speed:
                0 clocks        5505
                8 clocks        1719
                16 clocks       1
                24 clocks       16
RAW value: 480
        Probability of clock speed:
                0 clocks        5505
                8 clocks        1719
                16 clocks       1
                24 clocks       16
RAW value: 464
        Probability of clock speed:
                0 clocks        1
                8 clocks        16
RAW value: 480
        Probability of clock speed:
                0 clocks        5505
                8 clocks        1719
                16 clocks       1
                24 clocks       16
RAW value: 464
        Probability of clock speed:
                0 clocks        1
                8 clocks        16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 488
        Probability of clock speed:
                0 clocks        161
                8 clocks        5505
                16 clocks       1719
                24 clocks       1
                32 clocks       16
RAW value: 464
        Probability of clock speed:
                0 clocks        1
                8 clocks        16


Empty test

RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
RAW value: 472
        Probability of clock speed:
                0 clocks        1719
                8 clocks        1
                16 clocks       16
--- ok ---



This is a bit uncleaned version, just wanted to see how it behaves.
Testing and thoughts are welcome.

Now it should probably work as expected.

Many, many thanks to John who helped me a lot with testing this boring thing! :biggrin:
Title: Re: Test of timing code
Post by: habran on May 11, 2013, 03:54:07 PM
                10 clocks       78723
                13 clocks       36931
                15 clocks       35993
                34 clocks       7844
                36 clocks       13347
RAW value: 96
        Probability of clock speed:
                0 clocks        32
                2 clocks        18
                4 clocks        221144
                6 clocks        374794
                8 clocks        231121
                10 clocks       78723
                13 clocks       36931
                15 clocks       35993
                34 clocks       7844
                36 clocks       13347
RAW value: 96
        Probability of clock speed:
                0 clocks        32
                2 clocks        18
                4 clocks        221144
                6 clocks        374794
                8 clocks        231121
                10 clocks       78723
                13 clocks       36931
                15 clocks       35993
                34 clocks       7844
                36 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 96
        Probability of clock speed:
                0 clocks        32
                2 clocks        18
                4 clocks        221144
                6 clocks        374794
                8 clocks        231121
                10 clocks       78723
                13 clocks       36931
                15 clocks       35993
                34 clocks       7844
                36 clocks       13347


Almost empty test

RAW value: 98
        Probability of clock speed:
                0 clocks        10
                2 clocks        32
                4 clocks        18
                6 clocks        221144
                8 clocks        374794
                10 clocks       231121
                12 clocks       78723
                15 clocks       36931
                17 clocks       35993
                36 clocks       7844
                38 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 98
        Probability of clock speed:
                0 clocks        10
                2 clocks        32
                4 clocks        18
                6 clocks        221144
                8 clocks        374794
                10 clocks       231121
                12 clocks       78723
                15 clocks       36931
                17 clocks       35993
                36 clocks       7844
                38 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347
RAW value: 100
        Probability of clock speed:
                0 clocks        6
                2 clocks        10
                4 clocks        32
                6 clocks        18
                8 clocks        221144
                10 clocks       374794
                12 clocks       231121
                14 clocks       78723
                17 clocks       36931
                19 clocks       35993
                38 clocks       7844
                40 clocks       13347


Empty test

RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
RAW value: 90
        Probability of clock speed:
                0 clocks        374794
                2 clocks        231121
                4 clocks        78723
                7 clocks        36931
                9 clocks        35993
                28 clocks       7844
                30 clocks       13347
RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
RAW value: 90
        Probability of clock speed:
                0 clocks        374794
                2 clocks        231121
                4 clocks        78723
                7 clocks        36931
                9 clocks        35993
                28 clocks       7844
                30 clocks       13347
RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
RAW value: 92
        Probability of clock speed:
                0 clocks        221144
                2 clocks        374794
                4 clocks        231121
                6 clocks        78723
                9 clocks        36931
                11 clocks       35993
                30 clocks       7844
                32 clocks       13347
--- ok ---
Title: Re: Test of timing code
Post by: jj2007 on May 11, 2013, 04:30:47 PM
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Min: 264, Max: 264, Ave: 264

Integer looped code test

RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000
RAW value: 324
        Probability of clock speed:
                60 clocks       1000000


SSE code test

RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000


Almost empty test

RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000
RAW value: 288
        Probability of clock speed:
                24 clocks       1000000


Empty test

RAW value: 264
        Probability of clock speed:
                0 clocks        1000000
RAW value: 264
        Probability of clock speed:
                0 clocks        1000000
RAW value: 276
        Probability of clock speed:
                12 clocks       1000000
RAW value: 264
        Probability of clock speed:
                0 clocks        1000000
RAW value: 264
        Probability of clock speed:
                0 clocks        1000000
RAW value: 276
        Probability of clock speed:
                12 clocks       1000000
RAW value: 264
        Probability of clock speed:
                0 clocks        1000000
RAW value: 264
        Probability of clock speed:
                0 clocks        1000000
RAW value: 276
        Probability of clock speed:
                12 clocks       1000000
RAW value: 276
        Probability of clock speed:
                12 clocks       1000000
Title: Re: Test of timing code
Post by: hutch-- on May 11, 2013, 04:39:27 PM
 :biggrin:

Now do you understand why I only ever clock an algo in real time ?  :P
Title: Re: Test of timing code
Post by: MichaelW on May 11, 2013, 05:22:38 PM
The test procedures that Agner Fog provides  here (http://www.agner.org/optimize/#testp), include some DOS apps that switch to 32-bit protected mode to do cycle-count tests, then back to 16-bit real mode to display the results. Running under DOS has the advantages that you avoid the effects of multitasking and you can disable hardware interrupts, so at least in theory your test app can have exclusive use of the CPU. So hoping to get consistent and meaningful cycle counts in a single execution of the test code, I created a DOS test app. To simplify the task I used a FreeBASIC DOS app. FreeBASIC DOS creates 32-bit protected mode DOS apps that require a DOS Extender, and for this I used Japheth's  HX DOS Extender (http://www.japheth.de/HX.html). And I ran the test from an MS-DOS 6.22 boot diskette (if anybody here remembers what a diskette is).


#include "counter.bas"

dim as double d = 123

sleep 5000

asm cli

for i as integer = 1 to 40

    COUNTER_BEGIN( 1 )
        asm
            mov ebx, 1
            mov ecx, 8
            imul ebx, ecx
        end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    COUNTER_BEGIN( 1 )
        asm
            mov ebx, 1
            mov ecx, 8
            imul ebx, ecx, 1
        end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    COUNTER_BEGIN( 1 )
        asm
            mov eax, 1
            mov ecx, 8
            mul ecx
        end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    COUNTER_BEGIN( 1 )
        asm
            mov eax, 123
            cdq
            mov ecx, 8
            div ecx
        end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    COUNTER_BEGIN( 1 )
        asm
            fldpi
            fmul QWORD PTR [d]
            fstp QWORD PTR [d]
        end asm
    COUNTER_END()
    print counter_cycles;chr(9);

next

asm sti

sleep


As you can see in the above code the macro loop count is 1 so the macros are timing only a single pass through the test code.

Results running on my P3:

-52 1   4   36  21  4   4   4   47  26
4   4   -6  47  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   5   40  26  4   4   5   40  26
4   4   8   29  26  4   4   8   40  26
4   4   8   29  26  4   4   8   40  26
4   4   8   29  26  4   4   8   40  26
4   4   8   29  26  4   4   8   40  26
4   4   8   29  26  4   4   8   40  26
4   4   8   29  26  4   4   8   40  26


Running on my P4 Northwood:

12  12  12  76  1048 12 12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120
12  12  12  76  120 12  12  12  76  120


I don't know how to explain the variations at the start, or the other variations in the P3 results. And note in the P4 results that the counts are all a multiple of 4. I recall reading somewhere that for the NetBurst processors the TSC counts at the "quad-pumped" speed but updates at the FSB speed.

And note that the app will run fine under the Windows XP NTVDM, but on my P3 I get results like this:

260     129     53      67      180     15     -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      22
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -8      -6       40      21
4      -9      -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      22      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       40      21
4      -10     -6       40      47      4      -10     -6       40      21
4      -10     -6       40      21      4      -10     -6       47      21
4      -10     -6       40      21      4      -10     -6       40      21





Title: Re: Test of timing code
Post by: Antariy on May 11, 2013, 06:16:22 PM
Thank you for test, habran and Jochen! :biggrin:


Quote from: hutch-- on May 11, 2013, 04:39:27 PM
:biggrin:

Now do you understand why I only ever clock an algo in real time ?  :P

Hi Hutch :biggrin:
Nothing wrong with this method - it works OK. The results are not "strange" but are just like scientifical - in the world of nanoseconds under OS there is no code which will measure clocks one by one, you know :biggrin: These results are strong statistics with as less possible loops under multitasking environment, and no any "guessing" or "fancy stuff".
So, real world performance will be something inbetween of the worst and the best values we get.

Test it, please, give one minute :t

And, yes, I vote for real time tests, too :t Timings are just thing to roughly see how good code performs, and especially when the code is not intended for processing big amounts of data. And this timing was specially intented to avoid the average method with many-many loops over the same data piece, when too much things depends on the CPU's prediction logic (BTW, this question was touched in the libraries vs WinAPI thread, too).


Michael, thank you for info and the code (had removed Win98 couple years ago, so obviously there is need in a boot diskette :biggrin:).
Title: Re: Test of timing code
Post by: Antariy on May 11, 2013, 06:21:40 PM
Jochen, if you decrease LOOP_COUNT to 2, or even 1, what results do you get? Your CPU seems to be very stable with no bias in the timings of the timing code, so you obviously may try to decrease that equate, maybe this can give couple of cycles more precision :biggrin:
Title: Re: Test of timing code
Post by: Antariy on May 11, 2013, 06:41:30 PM
Results for the Michael's EXE runned under XPSP2 NTVDM:

32      32     -576     56      1112    24      24      8       40      160
24      24      24      88      160     24      24      24      88      168
24      24      24      88      160     24      24      24      88      160
24      24      24      88      160     24      24      8       88      168
24      24      24      88      160     24      24      24      88      3640
24     -24      8       88      160     24      24      8       88      160
24      24      24      88      160     24      24      24      88      168
24      24      24      88      168     24      24      24      88      168
24      24      24      88      168     24      24      24      88      160
24      24      8       88      160     24      24      8       88      160
24      24      8       88      168     24      24      24      88      168
24      24      8       88      160     24     -24      24      88      168
24      24      8       88      168     24      24      24      88      160
24      24      8       88      168     24      24      24      88      168
24      24      24      88      168     24      24      8       88      168
24      24      24      88      160     24      24      24      88      168
24      24      8       88      168     24      24      24      88      160
24      24      24      88      168     24      24      24      88      168
24     -24      8       88      168     24      24      8       88      160
24      24      24      88      160     24      24      24      88      160

Title: Re: Test of timing code
Post by: Gunther on May 11, 2013, 08:55:08 PM
Alex,

here are my results:


                26 clocks       3
                27 clocks       95389
                29 clocks       2
                30 clocks       117145
                31 clocks       1
                32 clocks       6
                33 clocks       351619
                34 clocks       3
                35 clocks       10
                36 clocks       199661
                37 clocks       13
                38 clocks       15
                40 clocks       13
                41 clocks       16704
                43 clocks       29
                44 clocks       23260
                46 clocks       36
                49 clocks       74
                52 clocks       102520
                55 clocks       80465
RAW value: 127
        Probability of clock speed:
                0 clocks        208
                3 clocks        185
                5 clocks        1
                6 clocks        261
                8 clocks        3
                9 clocks        95389
                11 clocks       2
                12 clocks       117145
                13 clocks       1
                14 clocks       6
                15 clocks       351619
                16 clocks       3
                17 clocks       10
                18 clocks       199661
                19 clocks       13
                20 clocks       15
                22 clocks       13
                23 clocks       16704
                25 clocks       29
                26 clocks       23260
                28 clocks       36
                31 clocks       74
                34 clocks       102520
                37 clocks       80465
RAW value: 145
        Probability of clock speed:
                0 clocks        18
                3 clocks        50
                6 clocks        88
                9 clocks        148
                12 clocks       157
                15 clocks       185
                18 clocks       208
                21 clocks       185
                23 clocks       1
                24 clocks       261
                26 clocks       3
                27 clocks       95389
                29 clocks       2
                30 clocks       117145
                31 clocks       1
                32 clocks       6
                33 clocks       351619
                34 clocks       3
                35 clocks       10
                36 clocks       199661
                37 clocks       13
                38 clocks       15
                40 clocks       13
                41 clocks       16704
                43 clocks       29
                44 clocks       23260
                46 clocks       36
                49 clocks       74
                52 clocks       102520
                55 clocks       80465
RAW value: 145
        Probability of clock speed:
                0 clocks        18
                3 clocks        50
                6 clocks        88
                9 clocks        148
                12 clocks       157
                15 clocks       185
                18 clocks       208
                21 clocks       185
                23 clocks       1
                24 clocks       261
                26 clocks       3
                27 clocks       95389
                29 clocks       2
                30 clocks       117145
                31 clocks       1
                32 clocks       6
                33 clocks       351619
                34 clocks       3
                35 clocks       10
                36 clocks       199661
                37 clocks       13
                38 clocks       15
                40 clocks       13
                41 clocks       16704
                43 clocks       29
                44 clocks       23260
                46 clocks       36
                49 clocks       74
                52 clocks       102520
                55 clocks       80465


Empty test

RAW value: 115
        Probability of clock speed:
                0 clocks        117145
                1 clocks        1
                2 clocks        6
                3 clocks        351619
                4 clocks        3
                5 clocks        10
                6 clocks        199661
                7 clocks        13
                8 clocks        15
                10 clocks       13
                11 clocks       16704
                13 clocks       29
                14 clocks       23260
                16 clocks       36
                19 clocks       74
                22 clocks       102520
                25 clocks       80465
RAW value: 118
        Probability of clock speed:
                0 clocks        95389
                2 clocks        2
                3 clocks        117145
                4 clocks        1
                5 clocks        6
                6 clocks        351619
                7 clocks        3
                8 clocks        10
                9 clocks        199661
                10 clocks       13
                11 clocks       15
                13 clocks       13
                14 clocks       16704
                16 clocks       29
                17 clocks       23260
                19 clocks       36
                22 clocks       74
                25 clocks       102520
                28 clocks       80465
RAW value: 104
        Probability of clock speed:
                0 clocks        16704
                2 clocks        29
                3 clocks        23260
                5 clocks        36
                8 clocks        74
                11 clocks       102520
                14 clocks       80465
RAW value: 118
        Probability of clock speed:
                0 clocks        95389
                2 clocks        2
                3 clocks        117145
                4 clocks        1
                5 clocks        6
                6 clocks        351619
                7 clocks        3
                8 clocks        10
                9 clocks        199661
                10 clocks       13
                11 clocks       15
                13 clocks       13
                14 clocks       16704
                16 clocks       29
                17 clocks       23260
                19 clocks       36
                22 clocks       74
                25 clocks       102520
                28 clocks       80465
RAW value: 115
        Probability of clock speed:
                0 clocks        117145
                1 clocks        1
                2 clocks        6
                3 clocks        351619
                4 clocks        3
                5 clocks        10
                6 clocks        199661
                7 clocks        13
                8 clocks        15
                10 clocks       13
                11 clocks       16704
                13 clocks       29
                14 clocks       23260
                16 clocks       36
                19 clocks       74
                22 clocks       102520
                25 clocks       80465
RAW value: 115
        Probability of clock speed:
                0 clocks        117145
                1 clocks        1
                2 clocks        6
                3 clocks        351619
                4 clocks        3
                5 clocks        10
                6 clocks        199661
                7 clocks        13
                8 clocks        15
                10 clocks       13
                11 clocks       16704
                13 clocks       29
                14 clocks       23260
                16 clocks       36
                19 clocks       74
                22 clocks       102520
                25 clocks       80465
RAW value: 118
        Probability of clock speed:
                0 clocks        95389
                2 clocks        2
                3 clocks        117145
                4 clocks        1
                5 clocks        6
                6 clocks        351619
                7 clocks        3
                8 clocks        10
                9 clocks        199661
                10 clocks       13
                11 clocks       15
                13 clocks       13
                14 clocks       16704
                16 clocks       29
                17 clocks       23260
                19 clocks       36
                22 clocks       74
                25 clocks       102520
                28 clocks       80465
RAW value: 115
        Probability of clock speed:
                0 clocks        117145
                1 clocks        1
                2 clocks        6
                3 clocks        351619
                4 clocks        3
                5 clocks        10
                6 clocks        199661
                7 clocks        13
                8 clocks        15
                10 clocks       13
                11 clocks       16704
                13 clocks       29
                14 clocks       23260
                16 clocks       36
                19 clocks       74
                22 clocks       102520
                25 clocks       80465
RAW value: 112
        Probability of clock speed:
                0 clocks        351619
                1 clocks        3
                2 clocks        10
                3 clocks        199661
                4 clocks        13
                5 clocks        15
                7 clocks        13
                8 clocks        16704
                10 clocks       29
                11 clocks       23260
                13 clocks       36
                16 clocks       74
                19 clocks       102520
                22 clocks       80465
RAW value: 118
        Probability of clock speed:
                0 clocks        95389
                2 clocks        2
                3 clocks        117145
                4 clocks        1
                5 clocks        6
                6 clocks        351619
                7 clocks        3
                8 clocks        10
                9 clocks        199661
                10 clocks       13
                11 clocks       15
                13 clocks       13
                14 clocks       16704
                16 clocks       29
                17 clocks       23260
                19 clocks       36
                22 clocks       74
                25 clocks       102520
                28 clocks       80465
--- ok ---


Gunther
Title: Re: Test of timing code
Post by: Antariy on May 11, 2013, 11:22:29 PM
Thank you, Gunther! :biggrin:
Title: Re: Test of timing code
Post by: Gunther on May 11, 2013, 11:24:16 PM
Hi Alex,

you're welcome, my friend.

Gunther