Hi all, can I please have some timings for the attached testbed?
It's not a test of the algo but rather a test of a timing method, so it will be very interesting to see how it performs on a wide variety of hardware.
In this code I tried to eliminate the influence of the OS and of the CPU's shortcuts that show up in tests with high loop counts.
Please run it several times and post all the results here.
Timings:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
Integer looped code test
Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 56 (minimal was: 520, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
Cycles count: 64 (minimal was: 528, overhead: 464)
SSE code test
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 16 (minimal was: 480, overhead: 464)
Cycles count: 24 (minimal was: 488, overhead: 464)
--- ok ---
This method is just a rough idea, so I don't know how good it is.
The timing method:
.data?
overhead dd ?
.code
mov edi,0
align 16
@@:
xor eax,eax
cpuid
rdtsc
push edx
push eax
xor eax,eax
cpuid
rdtsc
pop ecx
sub eax,ecx
pop ecx
sbb edx,ecx
inc edi
cmp eax,edi ; minimal cycle count; if the result is higher, the test got interrupted
ja @B
mov overhead,eax
print "Integer looped code test",13,10,13,10
REPEAT 10
mov edi,overhead ; put here any starting value that you think the timings cannot be smaller than
align 16
@@:
xor eax,eax
cpuid
rdtsc
push edx
push eax
invoke Axhex2dw1,offset hexstring
xor eax,eax
cpuid
rdtsc
pop ecx
sub eax,ecx
pop ecx
sbb edx,ecx
inc edi
cmp eax,edi ; minimal cycle count; if the result is higher, the test got interrupted
ja @B
sub eax,overhead
invoke crt_printf,CTXT("Cycles count: %d (minimal was: %d, overhead: %d)",13,10),eax,edi,overhead
ENDM
The timing code calculates the smallest cycle count that its own loop logic takes and saves it as the overhead.
Then the timing code runs the tested piece in the same loop as was used for the overhead calculation, but the starting counter used to check the timing (edi) is set to a minimum - the overhead - so the code runs as few loops as possible before it passes the comparison with that minimum. This should avoid the influence of the CPU prediction logic and the background OS work (handling hardware interrupts) as much as possible (probably, probably... ::))
Well, this is probably a crazy idea and the timings may be unstable, but it is interesting to see how it works :biggrin:
Tests and thoughts are welcome.
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Integer looped code test
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 37 (minimal was: 519, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 37 (minimal was: 517, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
Cycles count: 45 (minimal was: 525, overhead: 480)
SSE code test
Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 499, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 497, overhead: 480)
Cycles count: 15 (minimal was: 496, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Cycles count: 15 (minimal was: 501, overhead: 480)
Cycles count: 15 (minimal was: 495, overhead: 480)
Thank you Dave for a very fast reply :t
well - they are all over the place
that's because each test is so short :P
My idea is that this method may give more or less correct results, especially on modern CPUs, which with the current testing method tend to "run" tests ~20 instructions long in just a couple of cycles. That doesn't look technically possible. This code tries to eliminate as much of the influence of the CPU's prediction logic and the OS's background work as possible. Interesting - your CPU runs the SSE code more stably than mine. It will be interesting to see how this method works on very modern CPUs (if the cycle count for the SSE code is not something like 2 cycles, then it probably works :lol:)
Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)
Integer looped code test
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 39 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 126, overhead: 87)
Cycles count: 36 (minimal was: 124, overhead: 87)
SSE code test
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 103, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
--- ok ---
Incredible! :biggrin:
Thank you, anta40 :t
Hi Alex,
P-III, Windows 2000; P-MMX, Windows 98; and Pentium M, WinXP.
Regards,
Steve N.
G:\WORK\TEMP>35hex2dw
???? (SSE1)
Integer looped code test
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
Cycles count: 62 (minimal was: 185, overhead: 123)
SSE code test
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
Cycles count: 29 (minimal was: 152, overhead: 123)
--- ok ---
A:\>35hex2dw
Integer looped code test
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
SSE code test
A:\>35hex2dw
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
Integer looped code test
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
Cycles count: 59 (minimal was: 227, overhead: 168)
SSE code test
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
Cycles count: 27 (minimal was: 195, overhead: 168)
--- ok ---
Looks very promising, Alex :t
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
Integer looped code test
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
Cycles count: 46 (minimal was: 109, overhead: 63)
SSE code test
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Cycles count: 35 (minimal was: 98, overhead: 63)
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
Integer looped code test
Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 259, overhead: 225)
Cycles count: 27 (minimal was: 254, overhead: 225)
Cycles count: 27 (minimal was: 257, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 254, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 27 (minimal was: 253, overhead: 225)
Cycles count: 27 (minimal was: 252, overhead: 225)
SSE code test
Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Cycles count: 9 (minimal was: 236, overhead: 225)
Cycles count: 9 (minimal was: 235, overhead: 225)
--- ok ---
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
Integer looped code test
Cycles count: 53 (minimal was: 515, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 52 (minimal was: 522, overhead: 462)
Cycles count: 52 (minimal was: 524, overhead: 462)
Cycles count: 52 (minimal was: 518, overhead: 462)
Cycles count: 52 (minimal was: 520, overhead: 462)
Cycles count: 52 (minimal was: 516, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
Cycles count: 63 (minimal was: 525, overhead: 462)
SSE code test
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 484, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
Cycles count: 21 (minimal was: 483, overhead: 462)
--- ok ---
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Integer looped code test
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
Cycles count: 60 (minimal was: 312, overhead: 252)
SSE code test
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Cycles count: 24 (minimal was: 276, overhead: 252)
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (SSE4)
Integer looped code test
Cycles count: 36 (minimal was: 306, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 45 (minimal was: 322, overhead: 243)
Cycles count: 36 (minimal was: 280, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 311, overhead: 243)
Cycles count: 36 (minimal was: 279, overhead: 243)
Cycles count: 36 (minimal was: 300, overhead: 243)
SSE code test
Cycles count: 27 (minimal was: 320, overhead: 243)
Cycles count: 18 (minimal was: 264, overhead: 243)
Cycles count: 18 (minimal was: 351, overhead: 243)
Cycles count: 18 (minimal was: 262, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 153 (minimal was: 396, overhead: 243)
Cycles count: 18 (minimal was: 264, overhead: 243)
Cycles count: 18 (minimal was: 261, overhead: 243)
Cycles count: 18 (minimal was: 261, overhead: 243)
--- ok ---
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
Integer looped code test
Cycles count: 46 (minimal was: 108, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 107, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 46 (minimal was: 104, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
Cycles count: 46 (minimal was: 106, overhead: 58)
Cycles count: 48 (minimal was: 107, overhead: 58)
SSE code test
Cycles count: 36 (minimal was: 94, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 92, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
--- ok ---
Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz (SSE4)
Integer looped code test
Cycles count: 90 (minimal was: 783, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 90 (minimal was: 785, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
Cycles count: 108 (minimal was: 792, overhead: 684)
SSE code test
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 704, overhead: 684)
Cycles count: 18 (minimal was: 703, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
--- ok ---
Hi Alex,
here are my results:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
Integer looped code test
Cycles count: 68 (minimal was: 314, overhead: 246)
Cycles count: 70 (minimal was: 319, overhead: 246)
Cycles count: 76 (minimal was: 322, overhead: 246)
Cycles count: 68 (minimal was: 317, overhead: 246)
Cycles count: 76 (minimal was: 322, overhead: 246)
Cycles count: 70 (minimal was: 317, overhead: 246)
Cycles count: 68 (minimal was: 315, overhead: 246)
Cycles count: 70 (minimal was: 317, overhead: 246)
Cycles count: 68 (minimal was: 322, overhead: 246)
Cycles count: 76 (minimal was: 322, overhead: 246)
SSE code test
Cycles count: 18 (minimal was: 265, overhead: 246)
Cycles count: 16 (minimal was: 263, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 16 (minimal was: 266, overhead: 246)
Cycles count: 16 (minimal was: 264, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 16 (minimal was: 266, overhead: 246)
Cycles count: 18 (minimal was: 264, overhead: 246)
Cycles count: 16 (minimal was: 262, overhead: 246)
--- ok ---
I hope that helps a bit.
Gunther
Thank you all very much for your tests :biggrin: Thank you, thank you! :biggrin:
We have all of course noticed that on more or less modern CPUs there is something strange with the timings: they are obviously inadequate. Can you imagine that a loop of code consisting of 10+ instructions, running 8 iterations, plus the prologue of the function, the epilogue (pops/ret etc.) and the call to it, all takes ~20 cycles? Or even for SSE code - the timings are unbelievably small. Well, yes, SSE is a fast beast, but just think about it: how can ~20 instructions take only 2 cycles to be called, processed and returned from? It is just technically impossible.
Well, my guess is that every CPU with out-of-order execution and prediction logic, i.e. PPro and above, just "plays" a shortcut, keeping some kind of history: when we set a high loop count to get an average, we just force the CPU to be smart, and after, let's say, a couple of thousand repeats it just shows us how good its prediction logic is in that particular generation. So we see things like: the PIII has pretty good timing stability, the PIV not so good, and the Core family reports increasingly inadequate timings with every new generation.
Just imagine such a shortcut algorithm: some code gets called; the CPU fetches it, decodes it, executes it. This first run will be the slowest, but now the CPU has the code ready to run as micro-ops. The CPU also tracks the history of where the code accessed memory, so the next time it checks: if the code accesses the same memory locations again, and that region is not "dirty", i.e. not marked as changed, the CPU knows the data is the same; and if it knows (and it knows, of course :biggrin: especially if the testing data is smaller than one page) that the code will not access anything beyond the same area as before, then the code will produce the same results as before, and there is no reason to execute it again - it just returns the previously tracked result. So, instead of the execution time we get the time the CPU's logic spent checking that the execution context is the same as the one previously used, before returning the predicted result.
This is a rough example, probably, but you know what I mean.
Here I tried to solve the main problems we have with timing measurement: the influence of the OS's background work when the loop counter specified for the timing macro is small, and the influence of the advanced CPU logic when that loop counter is high.
If we use a high loop counter, we get a "better" average, but we also count a lot of superfluous stuff like background driver execution for every hardware interrupt (every HDD I/O operation - the entire virtual memory subsystem is based on HDD operations, so it can have an influence even when at first glance there is nothing to cause an I/O operation; mouse movements, network adapters and so on). With high loop counts this stuff gets divided between the loops, so we get more or less correct timings for CPUs without out-of-order / prediction logic. For more or less modern CPUs we get inadequate timings - simply because it's not execution time but rather prediction time, as I guessed above. I.e., the timing macro works well and does its job, but the CPU has started to play smart games, so what the timing code measures is not what we wanted to measure.
If we use a small loop counter, we get inadequate timings for the opposite reason: where with high counts the background OS work got divided by a high counter to get the average, with small counts we have hundreds or thousands of background (mostly driver) cycles divided by a small value, so the timings don't show any useful information at all.
So, we need to use as small a loop count as possible, but at the same time avoid the influence of the multitasking environment.
The testbed code tries to use the smallest possible loop count by finding the smallest time in which the tested code got executed. The logic is: we run the code inside the timing frame, then we check the total timing against a value that we think is the minimal possible for the combination of the timing code and the tested code. If the current measurement is bigger than that minimum, it means either that the minimum is too small, or that the OS interrupted our tested code somewhere during the test. The first issue we fix by consecutively increasing the minimal value we check against; the second issue we fix simply by re-executing the timing frame. Statistically there should be plenty of timespace when the test can run without any interrupts, and this method catches that timespace. As you noticed, it doesn't even bother with dropping a timeslice after a test / printing the results - and still it works, because, for instance, "20 cycles" here means that the timing frame + tested code got run just 20 loops: starting from the minimum (for the empty frame that is the overhead) up to the minimum + loop count. I.e. not hundreds, not thousands, not millions of loops that force the CPU to show us its advanced out-of-order techniques, but just 20.
The minimum - the overhead - is computed just by executing the "empty" timing loop, starting the check with a minimum of zero. Since the timing loop consists mostly of non-OoO code (it has cpuid), the value we measure is pretty stable, and it is used as the starting minimum in the further measurements to avoid excessive measurement loops (and so eliminate the CPU's shortcuts as much as possible).
This means that for a timing frame of, let's say, 100 clocks (the actual number is the "minimal was: XXX" printed by this testbed), we take only ~2000 CPU cycles for the entire timing measurement, which on modern CPUs means no more than a microsecond - the OS timeslice is much bigger, so we will almost certainly find some "free time" when nothing interrupts our test. At the same time, 20...100... (needs checking) loops is not enough to get the CPU's advanced logic heavily involved, at least not as much as with high loop counts.
Now the drawbacks. If the tested code runs longer than some maximal number of cycles, the test will be interrupted by the OS, and, having only a small number of measurement loops, we will probably get inadequate results.
If we guess the timeslice for user code to be 250 milliseconds, the rough maximum for the example above will be ~5000 clock cycles, i.e., if the tested code runs close to or longer than 5000 cycles, the timings will not be stable.
So, this is not an absolutely universal timing method; it's rather a solution for measurements on modern CPUs and/or for very fast algos that just cannot be adequately measured with the timing method we currently use widely in our testbeds. So, it's something like a "quantum measurement" for lightning-fast algos :biggrin:
I think the tests prove that this method is good, so I will probably try to turn it into macros.
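Roughly, such a macro pair could look like this - just a sketch of the very same frame as above, nothing more; the names MINCYCLES_BEGIN / MINCYCLES_END are made up here, and the overhead variable is assumed to have been computed once by the "empty" loop from the first listing:
MINCYCLES_BEGIN MACRO threshold
    mov edi, overhead       ; start the "minimal" from the empty-frame cost
  IFNB <threshold>
    add edi, threshold      ; optionally start even higher, as the "put here any starting value" comment above suggests
  ENDIF
    align 16
@@:
    xor eax, eax
    cpuid                   ; serialize before reading the TSC
    rdtsc
    push edx
    push eax
ENDM
MINCYCLES_END MACRO
    xor eax, eax
    cpuid                   ; serialize again after the tested code
    rdtsc
    pop ecx
    sub eax, ecx            ; 64-bit subtraction of the two TSC readings
    pop ecx
    sbb edx, ecx
    inc edi                 ; raise the assumed minimal by one
    cmp eax, edi            ; longer than the minimal? the run was interrupted
    ja @B                   ;   (or the minimal is still too small) - run again
    sub eax, overhead       ; eax = cycles attributed to the tested code alone
ENDM
; usage - the tested code must preserve EDI and must not define its own @@ label,
; otherwise the backward jump in MINCYCLES_END binds to the wrong place:
    MINCYCLES_BEGIN
    invoke Axhex2dw1, offset hexstring
    MINCYCLES_END
    invoke crt_printf, CTXT("Cycles count: %d (minimal was: %d, overhead: %d)",13,10), eax, edi, overhead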
Many thanks to Michael (MichaelW) for his timing macros! And to Jochen (jj2007) - while talking with him I thought up this idea; now it needs wide testing in the form of simplified usage as a macro :biggrin:
And here we can see that the CPU's prediction logic works best of all with SSE code - which seems pretty logical: SSE doesn't bother with flags, nor is it used for "unpredictably-driven" things like unpredictable runtime control-flow changes. GPR code like MOV EAX,[somewhere] / ... / MOV ECX,[EAX + EDX] is a hard thing to predict; such constructs are rare and almost unused in SSE code, where we rather have sequential access to a memory block and simple math on the XMM regs that doesn't relate to the execution path at all. So, if the CPU knows that the data has not changed since the previous call (the cache line / page is not marked dirty), it just plays the shortcut and returns the predicted result - that's the reason for the "proud" 2 clock cycle timings for complex code :biggrin:
This timing method shows that even on very powerful and modern CPUs the real performance of SSE is far from "near zero" cycles for a bunch of dependent instructions like the tested code. And this obviously should be so.
Hm... A note for Jochen: what do you think - will you-know-who register here to sprinkle all of us with his hydrophobic saliva now, seeing results like these, for instance:
Quote from: habran on May 08, 2013, 06:21:37 AM
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
Integer looped code test
Cycles count: 46 (minimal was: 108, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
...
SSE code test
Cycles count: 36 (minimal was: 94, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
...
--- ok ---
:greensml:
After reading such an impolite note it will be a hard question for him: to register, or not to register, that is the question :greensml:
(a note for those who don't know what I'm talking about with Jochen: there is someone who will maybe be disappointed to see that his SSE code is not way faster than some old-fashioned i386 code, both running on a very modern CPU model - at least no test shows it running at "zero" clock cycles :lol:)
BTW, in my big post two posts above:
> If we guess the timeslice for user code to be 250 milliseconds
It is a typo: I meant microseconds, not milliseconds - the other numbers are right, since they were computed for microseconds; it's just a typo.
Also, from the test results you may see interesting details, for instance:
Quote from: habran on May 08, 2013, 06:21:37 AM
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
Integer looped code test
Cycles count: 46 (minimal was: 108, overhead: 58) 1)
Cycles count: 48 (minimal was: 106, overhead: 58) 2)
...
SSE code test
Cycles count: 36 (minimal was: 94, overhead: 58) 3)
Cycles count: 34 (minimal was: 93, overhead: 58) 4)
...
--- ok ---
In test cases 1) and 4) you see that overhead + cycle count is smaller than the minimal. For instance, for 1) the measurement took 46+58=104 clock cycles, but the minimal was 108 cycles. What does this mean? It probably means that the test was interrupted and/or slowed down during the measurement loops with 103 <= minimal < 108, so the timings there were too big; before minimal=104 the timings did not fit because the code really was slower than that, but just after the minimal reached 103 or 104 there was a slowdown on the OS side, right after which everything got "calm" again, and the first value that passed the comparison (being smaller than the minimal) was grabbed.
This is a representative picture of how the timing method filters out multitasking environment issues.
It is interesting how small the penalties are on the P-MMX, which is not a true OoO CPU and doesn't have the deep pipeline that causes these penalties when CPUID is executed (the overhead value is very small):
Quote from: FORTRANS on May 08, 2013, 01:47:23 AM
A:\>35hex2dw
Integer looped code test
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 116 (minimal was: 145, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
Cycles count: 121 (minimal was: 150, overhead: 29)
The data collected in this thread is very informative, thank you all again!
Thesis: With the numbers collected here we can guess that CPUs with a high overhead value have a deeper pipeline and/or higher penalties for mispredictions, but for code that is well predictable (for example, well serializable) the deep pipeline brings the advantages of the OoO and prediction logic.
Just an analysis of the data collected:
Quote from: habran on May 08, 2013, 06:21:37 AM
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
Integer looped code test
Cycles count: 46 (minimal was: 108, overhead: 58)
Cycles count: 48 (minimal was: 106, overhead: 58)
SSE code test
Cycles count: 36 (minimal was: 94, overhead: 58)
Cycles count: 34 (minimal was: 93, overhead: 58)
Not a deep pipeline:
• The CPU recovers quickly from mispredictions;
• Very predictable, serialized code like the SSE test doesn't have much advantage over the looped GPR code - the SSE vs GPR timing factor is 1.35.
Also, just a thought: maybe this is because this mobile CPU model has a different circuit design for the predictor logic - to consume less power it has fewer logic gates => a smaller circuit => a smaller TDP than Gunther's:
Quote from: Gunther on May 08, 2013, 07:32:34 AM
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
Integer looped code test
Cycles count: 68 (minimal was: 314, overhead: 246)
Cycles count: 70 (minimal was: 319, overhead: 246)
SSE code test
Cycles count: 18 (minimal was: 265, overhead: 246)
Cycles count: 16 (minimal was: 263, overhead: 246)
Much deeper pipeline (this paragraph goes here just because it relates to the previous CPU with the same architecture; otherwise the next paragraph, about another CPU, would have come here):
• The CPU has bigger penalties from mispredictions;
• Well serializable code like the tested SSE one gains more of an advantage: the SSE:GPR factor is 4.25 (!).
So, it looks like this CPU should have a more complex circuit => higher power consumption.
OK, that was a guess... will check it now... Here are the numbers: habran's CPU TDP is 45W, Gunther's CPU TDP is 77W - just as guessed :biggrin: So now we see that from the timing results we can even guess some details of a particular CPU model's design :biggrin:
It would be very good if John (sinsi) will kindly give us timings for his desktop CPU model :biggrin:
Quote from: anta40 on May 08, 2013, 01:32:55 AM
Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)
Integer looped code test
Cycles count: 36 (minimal was: 124, overhead: 87)
Cycles count: 36 (minimal was: 125, overhead: 87)
SSE code test
Cycles count: 15 (minimal was: 102, overhead: 87)
Cycles count: 15 (minimal was: 102, overhead: 87)
Deeper pipeline (the overhead is 87 clocks instead of 58):
• As already said, the CPU has bigger penalties for mispredictions and OoO logic faults;
• Well serializable code runs better: the SSE:GPR ratio is 2.4.
Quote from: hutch-- on May 08, 2013, 02:01:14 AM
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
Integer looped code test
Cycles count: 36 (minimal was: 261, overhead: 225)
Cycles count: 27 (minimal was: 259, overhead: 225)
SSE code test
Cycles count: 9 (minimal was: 235, overhead: 225)
Cycles count: 9 (minimal was: 234, overhead: 225)
Deeper pipeline: the SSE:GPR ratio is 3.
Here you should also keep Gunther's CPU data in mind - it fits into this series of consecutively growing overhead = pipeline = prediction capability. As we remember, its SSE:GPR ratio is 4.25.
Quote from: Magnum on May 08, 2013, 06:38:30 AM
Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz (SSE4)
Integer looped code test
Cycles count: 90 (minimal was: 783, overhead: 684)
Cycles count: 90 (minimal was: 774, overhead: 684)
SSE code test
Cycles count: 18 (minimal was: 702, overhead: 684)
Cycles count: 18 (minimal was: 702, overhead: 684)
Deepest pipeline in this series of tests: the SSE:GPR factor is 5.
So, as you can see, the collected experimental data support the thesis from the start of this message: a higher overhead value indicates more complex CPU prediction logic.
Here I used only the data of the Core family CPUs, but you all know that NetBurst has a deep pipeline and behaves very badly when the code is not well predictable or when it mispredicts; the PIV and Celerons here show the same behaviour in this series of tests, as does the one AMD - it has a small overhead (63), and its SSE:GPR ratio is small too (1.31) :biggrin:
Probably we have a good additional measurement instrument :biggrin:
Your wish is my command :biggrin:
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
Integer looped code test
Cycles count: 53 (minimal was: 141, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 53 (minimal was: 141, overhead: 87)
Cycles count: 53 (minimal was: 143, overhead: 87)
Cycles count: 53 (minimal was: 143, overhead: 87)
Cycles count: 53 (minimal was: 143, overhead: 87)
SSE code test
Cycles count: 9 (minimal was: 96, overhead: 87)
Cycles count: 6 (minimal was: 93, overhead: 87)
Cycles count: 9 (minimal was: 116, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 35 (minimal was: 123, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 9 (minimal was: 98, overhead: 87)
Cycles count: 35 (minimal was: 123, overhead: 87)
Cycles count: 35 (minimal was: 123, overhead: 87)
Thank you very much, John! :biggrin: Well, I'm already going crazy, tired of typing tonight... :greensml: Anyone who wants to can explain its behaviour, but I'm going offline. At least half of the time its SSE:GPR factor is 1.51, which fits the thesis perfectly, but sometimes there is something weird - probably the desktop CPU models have such powerful prediction logic that even 9 loops is enough to get them to use the shortcut, and the same applies to the overhead computation loop, so... We strongly need more experimental data! :lol:
BTW, you ran it from WinRAR, didn't you?
My next thought was a "threshold" value to add to the testing method, but that's already a job for the macros...
Please, John, test the archive I posted now; it has a hard-coded threshold for both algos, chosen specifically from your results. We may actually automate such things and get a timing scheme where the code gets called just a couple of times - that would be a really "quantum" measurement :lol:
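(If the hard-coded threshold is simply a higher starting "minimal", as the "put here any starting value" comment in the first post suggests - I'm only guessing here, the archive itself isn't quoted - then with the macro sketch from a few posts above it would just be a call like this, where 40 is a made-up value close to the expected cost:)
    MINCYCLES_BEGIN 40               ; start checking from overhead + 40 clocks
    invoke Axhex2dw1, offset hexstring
    MINCYCLES_END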
My results:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
Integer looped code test
Cycles count: 56 (minimal was: 528, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
Cycles count: 48 (minimal was: 523, overhead: 472)
Cycles count: 48 (minimal was: 523, overhead: 472)
Cycles count: 48 (minimal was: 522, overhead: 472)
SSE code test
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
Cycles count: 16 (minimal was: 504, overhead: 472)
--- ok ---
Please all - test it, but if you have problems like unstable timings, remember it is because the threshold is hard-coded and is not tuned for your particular system. It should actually make the timings more stable, as it did for my CPU, but a hard-coded value is still a hard-coded value...
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
Integer looped code test
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 144, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 144, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 144, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
Cycles count: 56 (minimal was: 143, overhead: 87)
SSE code test
Cycles count: 9 (minimal was: 119, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 6 (minimal was: 120, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 35 (minimal was: 122, overhead: 87)
Cycles count: 9 (minimal was: 119, overhead: 87)
Cycles count: 9 (minimal was: 119, overhead: 87)
Cycles count: 6 (minimal was: 122, overhead: 87)
Cycles count: 9 (minimal was: 121, overhead: 87)
Cycles count: 38 (minimal was: 125, overhead: 87)
BTW, for the first test I ran it from WinRAR; the first two SSE tests were -23, which I ignored.
Thank you, John :biggrin: At least the timings for the GPR code are now stable; there still seems to be something funny with the highly predictable SSE code... Well, choosing the threshold manually is improper anyway, so this obviously needs some sharpening and automation :lol: Maybe even recalculate the overhead before every measurement, just like Michael's macros do.
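Roughly, that would mean moving the "empty" overhead loop inside the REPEAT, right before each measurement - a sketch only, reusing the macro names from my earlier sketch; whether it actually helps against the CPUID instability needs testing:
REPEAT 10
    ; recompute the overhead from scratch right before this measurement
    xor edi, edi
    align 16
@@:
    xor eax, eax
    cpuid
    rdtsc
    push edx
    push eax
    xor eax, eax
    cpuid
    rdtsc
    pop ecx
    sub eax, ecx
    pop ecx
    sbb edx, ecx
    inc edi
    cmp eax, edi
    ja @B
    mov overhead, eax
    ; then measure the tested code against this fresh overhead
    MINCYCLES_BEGIN
    invoke Axhex2dw1, offset hexstring
    MINCYCLES_END
    invoke crt_printf, CTXT("Cycles count: %d (minimal was: %d, overhead: %d)",13,10), eax, edi, overhead
ENDM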
So, thoughts are welcome :t
me know nothing... :icon_redface:
you :t
:biggrin: :biggrin: :biggrin:
Quote from: Antariy on May 08, 2013, 03:33:26 PM
Please all - test it,
Hi Alex,
Ran it on the same computers as in Reply #7, and the results
look to be unchanged.
Regards,
Steve
Quote from: habran on May 08, 2013, 05:39:05 PM
me know nothing... :icon_redface:
you :t
:biggrin: :biggrin: :biggrin:
:biggrin: :t (no, I really don't know either :icon_redface:)
Quote from: FORTRANS on May 08, 2013, 10:56:21 PM
Hi Alex,
Ran it on the same computers as in Reply #7, and the results
look to be unchanged.
Thank you very much,
Steve!
So it seems that the reason is in John's CPU :biggrin:
Alex,
here are the results again, with the program from test2.zip:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
Integer looped code test
Cycles count: 36 (minimal was: 189, overhead: 136)
Cycles count: 48 (minimal was: 186, overhead: 136)
Cycles count: 40 (minimal was: 187, overhead: 136)
Cycles count: 44 (minimal was: 186, overhead: 136)
Cycles count: 44 (minimal was: 187, overhead: 136)
Cycles count: 40 (minimal was: 186, overhead: 136)
Cycles count: 40 (minimal was: 187, overhead: 136)
Cycles count: 56 (minimal was: 193, overhead: 136)
Cycles count: 52 (minimal was: 188, overhead: 136)
Cycles count: 28 (minimal was: 186, overhead: 136)
SSE code test
Cycles count: 8 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 8 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
Cycles count: 8 (minimal was: 168, overhead: 136)
Cycles count: 12 (minimal was: 168, overhead: 136)
--- ok ---
Gunther
Thank you, Gunther! :t
Hey Antariy,
I appreciate your enthusiasm :t
I guess you must be a young man, judging by your energy level :biggrin:
Quote from: habran on May 09, 2013, 11:37:10 AM
Hey Antariy,
I appreciate your enthusiasm :t
I guess you must be a young man, judging by your energy level :biggrin:
Thanks :biggrin:
It's inspiration rather than enthusiasm :biggrin: We talked about timings being unstable at very small cycle counts a couple of years ago, too. So it's not a new topic here.
As for this method, the problem is that the CPUID instruction timings themselves are unstable - this may sound strange, but they are. If the CPUID in the timing frame gets executed faster than it was when computing the overhead, we get too-small or even negative values when computing the clock cycles. It seems possible to avoid the influence of the OS and to get the code executed only a small number of times, but the instability of the CPU's behaviour remains, and it depends on the code before and after the piece currently being executed.
maybe we should look at the average result rather than one execution result??? :idea:
it is frustrating that we cannot force the machine to do what we desire even though we fuel it with ASM :icon13:
...and it makes you feel crazy when the machine shows that it is "too smart" :biggrin:
Yes, I'm thinking about a range of results - not one average number, but something close to it in spirit: two (or more) numbers which show the probable range of the code's speed, let's say from 20 to 25 clocks. It seems that getting an exact number is not possible, and this becomes an issue when it comes to small cycle counts (light travels half a meter in the time a modern CPU executes these 5 clocks :biggrin: :biggrin:). But it's better to have a range than nothing :lol:
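Just to make the idea concrete, a rough sketch (again reusing the macro names from my earlier sketch; range_lo, range_hi and RANGE_RUNS are made-up names, and the tested proc is assumed to preserve ESI/EDI as usual): run the whole frame a handful of times and keep only the smallest and the largest result:
RANGE_RUNS equ 10
.data?
range_lo dd ?
range_hi dd ?
.code
    mov range_lo, -1             ; start with an "infinitely" high lower bound
    mov range_hi, 0
    mov esi, RANGE_RUNS
range_next:
    MINCYCLES_BEGIN
    invoke Axhex2dw1, offset hexstring
    MINCYCLES_END                ; eax = cycle count of this run
    cmp eax, range_lo
    jae @F
    mov range_lo, eax            ; new smallest timing seen
@@:
    cmp eax, range_hi
    jbe @F
    mov range_hi, eax            ; new largest timing seen
@@:
    dec esi
    jnz range_next
    invoke crt_printf, CTXT("Speed range: %d ... %d clocks",13,10), range_lo, range_hi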
except if we want EVERYTHING OR NOTHING 8)
Hi all :biggrin:
Here is one more test; can I ask for the timings, please? Any hardware is welcome, since the timings will now be a bit different and the representation method is more comprehensive, but modern CPUs are preferable, PIV and above, since the timing instability is bigger there.
The code now shows "probability" values: the bigger the value, the higher the probability of that clock speed. If two values are close to each other and bigger than the others, it means the peak of the graph is in between. At the moment the info is presented in a "raw" format; it's actually a question how to implement the display, or whether to leave it as it is.
You may sometimes see unbelievable values - yes, there are even "0 clocks" entries - but that's OK, since the code just shows everything that is statistically possible, from 0 to the maximum possible clocks for the algo. But, again, one value that is much larger than the others is the peak that shows the actual performance in that timing frame, or two neighbouring values that are larger than the others are the two points with the graph's peak in between.
The point of this raw data is that, being humans, we can easily drop the unbelievable timings and choose the ones which are plausible. I.e., if you see that the biggest probability is in the zero-cycles entry, or near it, for an algo that is far from being just a couple of instructions long, then you will for sure find one or two other values that are large and much larger than the rest - those values are true, so just drop the unbelievable timing. Remember, the code just shows everything; it does not bother checking how long the tested code is, etc. As proof of this, you may see that even when there are timings with a too-small clock number, the other, true, values are always the next largest ones, regardless of the length of the shown table (i.e. the number of entries for every result).
Well, there are many complications in isolating the timing code and it is still unstable. You may see, for example, that on my CPU the timing frame, because of the cpuid instruction, has a wide spread from 456 clocks to 536 clocks, and from the results you can see that it obviously moves in steps of 8 clocks. As you see, that is an 80-clock difference (which is too much for algos that run for a small number of clocks), so it's no wonder that there are many false results - again, the code just reports everything it finds that is "not bigger than it should be"; the next step - choosing what is proper - is for the human.
One may play with the LOOP_COUNT equate at the start of the listing, but increasing it will just "smooth" the timings - too many loops obviously make the CPU become smart and we will again get unbelievable results like "a >20 instruction algo, >12 of which are dependent, at 2 clocks".
At least for LOOP_COUNT = 3 it seems to work (probably ::)); maybe it decreases the actual timings a bit, but with LOOP_COUNT = 1 the timings are a mess - because with LOOP_COUNT = 1 no passes are done with the minimal set closest to the true speed - and if LOOP_COUNT is set too high, there are too many loops and the CPU's logic comes into play. Maybe LOOP_COUNT = 2 is good too; one may need to play with it for the particular machine.
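For those curious how such a table can be produced at all: the attached source isn't quoted here, so this is only my reading of the output, but the "probability" numbers look like a histogram of the empty timing frame's own cost collected over many calibration runs; a measured RAW value is then matched against every frame cost that was seen, each one giving a candidate "RAW minus cost" clocks with that cost's count as its weight. A rough sketch of that idea in the same masm32 environment as the testbed (hist, raw_value, CALIB_RUNS, HIST_SLOTS and the 8-clock step are assumptions for the example):
CALIB_RUNS equ 1000000
HIST_SLOTS equ 256                  ; one slot per 8 clocks covers frame costs up to 2047
.data
hist      dd HIST_SLOTS dup(0)      ; hist[c/8] = how many calibration runs cost c clocks
raw_value dd 512                    ; an example RAW value as printed by the testbed
.code
    ; calibration: time the empty frame CALIB_RUNS times and count each cost
    mov esi, CALIB_RUNS
calib:
    xor eax, eax
    cpuid                           ; serialize
    rdtsc
    push eax                        ; low dword of the start TSC is enough for an empty frame
    xor eax, eax
    cpuid
    rdtsc
    pop ecx
    sub eax, ecx                    ; eax = cost of the empty frame this run
    shr eax, 3                      ; one slot per 8 clocks (the step seen on this CPU)
    cmp eax, HIST_SLOTS
    jb @F
    mov eax, HIST_SLOTS-1           ; clamp interrupted runs into the last slot
@@:
    inc dword ptr hist[eax*4]
    dec esi
    jnz calib
    ; reporting: every frame cost seen gives one candidate timing for raw_value
    mov ebx, raw_value
    shr ebx, 3                      ; RAW in slot units
    xor esi, esi
report:
    cmp dword ptr hist[esi*4], 0
    je nextslot                     ; this frame cost never occurred
    cmp esi, ebx
    ja nextslot                     ; frame cost bigger than RAW - impossible candidate
    mov ecx, ebx
    sub ecx, esi
    shl ecx, 3                      ; candidate "algo clocks" = RAW - frame cost
    invoke crt_printf, CTXT("%d clocks %d",13,10), ecx, hist[esi*4]
nextslot:
    inc esi
    cmp esi, HIST_SLOTS
    jb report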
Here are the results for the source in archive:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
Min: 456, Max: 536, Ave: 530
Integer looped code test
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
SSE code test
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
Almost empty test
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 464
Probability of clock speed:
0 clocks 1
8 clocks 16
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 464
Probability of clock speed:
0 clocks 1
8 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 488
Probability of clock speed:
0 clocks 161
8 clocks 5505
16 clocks 1719
24 clocks 1
32 clocks 16
RAW value: 464
Probability of clock speed:
0 clocks 1
8 clocks 16
Empty test
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
--- ok ---
This is a bit of an uncleaned version; I just wanted to see how it behaves.
Testing and thoughts are welcome.
Now it should probably work as expected.
Many, many thanks to John who helped me a lot with testing this boring thing! :biggrin:
10 clocks 78723
13 clocks 36931
15 clocks 35993
34 clocks 7844
36 clocks 13347
RAW value: 96
Probability of clock speed:
0 clocks 32
2 clocks 18
4 clocks 221144
6 clocks 374794
8 clocks 231121
10 clocks 78723
13 clocks 36931
15 clocks 35993
34 clocks 7844
36 clocks 13347
RAW value: 96
Probability of clock speed:
0 clocks 32
2 clocks 18
4 clocks 221144
6 clocks 374794
8 clocks 231121
10 clocks 78723
13 clocks 36931
15 clocks 35993
34 clocks 7844
36 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 96
Probability of clock speed:
0 clocks 32
2 clocks 18
4 clocks 221144
6 clocks 374794
8 clocks 231121
10 clocks 78723
13 clocks 36931
15 clocks 35993
34 clocks 7844
36 clocks 13347
Almost empty test
RAW value: 98
Probability of clock speed:
0 clocks 10
2 clocks 32
4 clocks 18
6 clocks 221144
8 clocks 374794
10 clocks 231121
12 clocks 78723
15 clocks 36931
17 clocks 35993
36 clocks 7844
38 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 98
Probability of clock speed:
0 clocks 10
2 clocks 32
4 clocks 18
6 clocks 221144
8 clocks 374794
10 clocks 231121
12 clocks 78723
15 clocks 36931
17 clocks 35993
36 clocks 7844
38 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
RAW value: 100
Probability of clock speed:
0 clocks 6
2 clocks 10
4 clocks 32
6 clocks 18
8 clocks 221144
10 clocks 374794
12 clocks 231121
14 clocks 78723
17 clocks 36931
19 clocks 35993
38 clocks 7844
40 clocks 13347
Empty test
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
RAW value: 90
Probability of clock speed:
0 clocks 374794
2 clocks 231121
4 clocks 78723
7 clocks 36931
9 clocks 35993
28 clocks 7844
30 clocks 13347
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
RAW value: 90
Probability of clock speed:
0 clocks 374794
2 clocks 231121
4 clocks 78723
7 clocks 36931
9 clocks 35993
28 clocks 7844
30 clocks 13347
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
RAW value: 92
Probability of clock speed:
0 clocks 221144
2 clocks 374794
4 clocks 231121
6 clocks 78723
9 clocks 36931
11 clocks 35993
30 clocks 7844
32 clocks 13347
--- ok ---
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Min: 264, Max: 264, Ave: 264
Integer looped code test
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
RAW value: 324
Probability of clock speed:
60 clocks 1000000
SSE code test
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
Almost empty test
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
RAW value: 288
Probability of clock speed:
24 clocks 1000000
Empty test
RAW value: 264
Probability of clock speed:
0 clocks 1000000
RAW value: 264
Probability of clock speed:
0 clocks 1000000
RAW value: 276
Probability of clock speed:
12 clocks 1000000
RAW value: 264
Probability of clock speed:
0 clocks 1000000
RAW value: 264
Probability of clock speed:
0 clocks 1000000
RAW value: 276
Probability of clock speed:
12 clocks 1000000
RAW value: 264
Probability of clock speed:
0 clocks 1000000
RAW value: 264
Probability of clock speed:
0 clocks 1000000
RAW value: 276
Probability of clock speed:
12 clocks 1000000
RAW value: 276
Probability of clock speed:
12 clocks 1000000
:biggrin:
Now do you understand why I only ever clock an algo in real time ? :P
The test procedures that Agner Fog provides here (http://www.agner.org/optimize/#testp) include some DOS apps that switch to 32-bit protected mode to do cycle-count tests, then back to 16-bit real mode to display the results. Running under DOS has the advantage that you avoid the effects of multitasking and can disable hardware interrupts, so at least in theory your test app can have exclusive use of the CPU. So, hoping to get consistent and meaningful cycle counts in a single execution of the test code, I created a DOS test app. To simplify the task I used a FreeBASIC DOS app. The FreeBASIC DOS target creates 32-bit protected-mode DOS apps that require a DOS extender, and for this I used Japheth's HX DOS Extender (http://www.japheth.de/HX.html). And I ran the test from an MS-DOS 6.22 boot diskette (if anybody here remembers what a diskette is).
#include "counter.bas"

dim as double d = 123

sleep 5000   ' pause ~5 seconds before starting
asm cli      ' disable hardware interrupts for the duration of the tests

for i as integer = 1 to 40

    ' imul reg32, reg32
    COUNTER_BEGIN( 1 )
    asm
        mov ebx, 1
        mov ecx, 8
        imul ebx, ecx
    end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    ' imul reg32, reg32, imm
    COUNTER_BEGIN( 1 )
    asm
        mov ebx, 1
        mov ecx, 8
        imul ebx, ecx, 1
    end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    ' mul reg32 (unsigned multiply, edx:eax = eax * ecx)
    COUNTER_BEGIN( 1 )
    asm
        mov eax, 1
        mov ecx, 8
        mul ecx
    end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    ' div reg32 (edx:eax / ecx; cdq clears edx since eax is positive)
    COUNTER_BEGIN( 1 )
    asm
        mov eax, 123
        cdq
        mov ecx, 8
        div ecx
    end asm
    COUNTER_END()
    print counter_cycles;chr(9);

    ' x87: d = pi * d
    COUNTER_BEGIN( 1 )
    asm
        fldpi
        fmul QWORD PTR [d]
        fstp QWORD PTR [d]
    end asm
    COUNTER_END()
    print counter_cycles;chr(9);

next

asm sti      ' re-enable interrupts
sleep        ' wait for a keypress before exiting
As you can see in the above code, the macro loop count is 1, so the macros are timing only a single pass through the test code.
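(For reference, counter.bas itself is not listed above - it contains MichaelW's counter macros. The sketch below is only my rough guess at the kind of serialized RDTSC pair such macros wrap; the names, the single-pass simplification and the 32-bit-only delta are my assumptions, and the real macros presumably also use the loops argument and correct for the measurement overhead.)

' Rough sketch only - NOT the real counter.bas.
' Serialize with CPUID, read the TSC before and after the code under
' test, and report the difference in counter_cycles.
dim shared as uinteger tsc_before, tsc_after
dim shared as integer counter_cycles

#macro COUNTER_BEGIN( loops )
    asm
        xor eax, eax
        cpuid                 ' serializing instruction
        rdtsc
        mov [tsc_before], eax ' low 32 bits are enough for short tests
    end asm
#endmacro

#macro COUNTER_END()
    asm
        xor eax, eax
        cpuid                 ' wait for the tested code to retire
        rdtsc
        mov [tsc_after], eax
    end asm
    counter_cycles = tsc_after - tsc_before
#endmacro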
Results running on my P3:
-52 1 4 36 21 4 4 4 47 26
4 4 -6 47 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 5 40 26 4 4 5 40 26
4 4 8 29 26 4 4 8 40 26
4 4 8 29 26 4 4 8 40 26
4 4 8 29 26 4 4 8 40 26
4 4 8 29 26 4 4 8 40 26
4 4 8 29 26 4 4 8 40 26
4 4 8 29 26 4 4 8 40 26
Running on my P4 Northwood:
12 12 12 76 1048 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
12 12 12 76 120 12 12 12 76 120
I don't know how to explain the variations at the start, or the other variations in the P3 results. And note in the P4 results that the counts are all a multiple of 4. I recall reading somewhere that for the NetBurst processors the TSC counts at the "quad-pumped" speed but updates at the FSB speed.
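A quick way to check that (my own sketch, not part of Michael's test app) is to take back-to-back RDTSC readings and keep the smallest non-zero delta; on a NetBurst core the expectation would be that every delta comes out as a multiple of 4:

' Sketch (my addition): find the smallest step by which the TSC advances.
dim as uinteger t1, t2, smallest = &hFFFFFFFF

for i as integer = 1 to 1000
    asm
        rdtsc
        mov [t1], eax ' first reading, low 32 bits
        rdtsc
        mov [t2], eax ' second reading, taken immediately after
    end asm
    if t2 - t1 > 0 andalso t2 - t1 < smallest then smallest = t2 - t1
next

print "smallest non-zero RDTSC delta: "; smallest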
And note that the app will run fine under the Windows XP NTVDM, but on my P3 I get results like this:
260 129 53 67 180 15 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 22
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -8 -6 40 21
4 -9 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 22 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 40 21
4 -10 -6 40 47 4 -10 -6 40 21
4 -10 -6 40 21 4 -10 -6 47 21
4 -10 -6 40 21 4 -10 -6 40 21
Thank you for the tests,
habran and
Jochen! :biggrin:
Quote from: hutch-- on May 11, 2013, 04:39:27 PM
:biggrin:
Now do you understand why I only ever clock an algo in real time ? :P
Hi
Hutch :biggrin:
Nothing wrong with this method - it works OK. The results are not "strange", they are simply statistical - in the world of nanoseconds under an OS there is no code that can measure clocks one by one, you know :biggrin: These results are solid statistics gathered with as few loops as possible in a multitasking environment, with no "guessing" or "fancy stuff".
So real-world performance will be somewhere in between the worst and the best values we get.
Please test it - it only takes a minute :t
And yes, I vote for real-time tests, too :t Timings are just a way to see roughly how well code performs, especially when the code is not intended for processing big amounts of data. And this timing method was specifically intended to avoid the averaging approach with many, many loops over the same piece of data, where too much depends on the CPU's prediction logic (BTW, this question came up in the libraries vs WinAPI thread, too).
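To make that concrete, here is a minimal sketch (my own illustration, not part of the testbed) of how a per-clock histogram like the ones posted above can be boiled down to one figure: take the bucket that was hit most often as the most probable cycle count, with the lowest and highest buckets as the optimistic and pessimistic bounds. The sample values are loosely copied from one of the histograms earlier in the thread.

' Minimal sketch (not from the testbed): reduce a per-clock histogram
' to the most probable cycle count plus best/worst bounds.
dim as integer clocks(0 to 5) = { 0, 2, 4, 6, 8, 10 }
dim as integer hits(0 to 5)   = { 6, 10, 32, 18, 221144, 374794 }

dim as integer best = clocks(0), worst = clocks(0), mode = clocks(0), top = hits(0)
for i as integer = 1 to ubound(clocks)
    if clocks(i) < best  then best  = clocks(i)
    if clocks(i) > worst then worst = clocks(i)
    if hits(i) > top then top = hits(i) : mode = clocks(i)
next

print "most probable:"; mode; "  best:"; best; "  worst:"; worst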
Michael, thank you for the info and the code (I removed Win98 a couple of years ago, so obviously I do need a boot diskette :biggrin:).
Jochen, if you decrease LOOP_COUNT to 2, or even 1, what results do you get? Your CPU seems to be very stable, with no bias in the timings of the timing code, so you could try decreasing that equate - maybe it will give a couple of cycles more precision :biggrin:
Results for Michael's EXE run under the XP SP2 NTVDM:
32 32 -576 56 1112 24 24 8 40 160
24 24 24 88 160 24 24 24 88 168
24 24 24 88 160 24 24 24 88 160
24 24 24 88 160 24 24 8 88 168
24 24 24 88 160 24 24 24 88 3640
24 -24 8 88 160 24 24 8 88 160
24 24 24 88 160 24 24 24 88 168
24 24 24 88 168 24 24 24 88 168
24 24 24 88 168 24 24 24 88 160
24 24 8 88 160 24 24 8 88 160
24 24 8 88 168 24 24 24 88 168
24 24 8 88 160 24 -24 24 88 168
24 24 8 88 168 24 24 24 88 160
24 24 8 88 168 24 24 24 88 168
24 24 24 88 168 24 24 8 88 168
24 24 24 88 160 24 24 24 88 168
24 24 8 88 168 24 24 24 88 160
24 24 24 88 168 24 24 24 88 168
24 -24 8 88 168 24 24 8 88 160
24 24 24 88 160 24 24 24 88 160
Alex,
here are my results:
26 clocks 3
27 clocks 95389
29 clocks 2
30 clocks 117145
31 clocks 1
32 clocks 6
33 clocks 351619
34 clocks 3
35 clocks 10
36 clocks 199661
37 clocks 13
38 clocks 15
40 clocks 13
41 clocks 16704
43 clocks 29
44 clocks 23260
46 clocks 36
49 clocks 74
52 clocks 102520
55 clocks 80465
RAW value: 127
Probability of clock speed:
0 clocks 208
3 clocks 185
5 clocks 1
6 clocks 261
8 clocks 3
9 clocks 95389
11 clocks 2
12 clocks 117145
13 clocks 1
14 clocks 6
15 clocks 351619
16 clocks 3
17 clocks 10
18 clocks 199661
19 clocks 13
20 clocks 15
22 clocks 13
23 clocks 16704
25 clocks 29
26 clocks 23260
28 clocks 36
31 clocks 74
34 clocks 102520
37 clocks 80465
RAW value: 145
Probability of clock speed:
0 clocks 18
3 clocks 50
6 clocks 88
9 clocks 148
12 clocks 157
15 clocks 185
18 clocks 208
21 clocks 185
23 clocks 1
24 clocks 261
26 clocks 3
27 clocks 95389
29 clocks 2
30 clocks 117145
31 clocks 1
32 clocks 6
33 clocks 351619
34 clocks 3
35 clocks 10
36 clocks 199661
37 clocks 13
38 clocks 15
40 clocks 13
41 clocks 16704
43 clocks 29
44 clocks 23260
46 clocks 36
49 clocks 74
52 clocks 102520
55 clocks 80465
RAW value: 145
Probability of clock speed:
0 clocks 18
3 clocks 50
6 clocks 88
9 clocks 148
12 clocks 157
15 clocks 185
18 clocks 208
21 clocks 185
23 clocks 1
24 clocks 261
26 clocks 3
27 clocks 95389
29 clocks 2
30 clocks 117145
31 clocks 1
32 clocks 6
33 clocks 351619
34 clocks 3
35 clocks 10
36 clocks 199661
37 clocks 13
38 clocks 15
40 clocks 13
41 clocks 16704
43 clocks 29
44 clocks 23260
46 clocks 36
49 clocks 74
52 clocks 102520
55 clocks 80465
Empty test
RAW value: 115
Probability of clock speed:
0 clocks 117145
1 clocks 1
2 clocks 6
3 clocks 351619
4 clocks 3
5 clocks 10
6 clocks 199661
7 clocks 13
8 clocks 15
10 clocks 13
11 clocks 16704
13 clocks 29
14 clocks 23260
16 clocks 36
19 clocks 74
22 clocks 102520
25 clocks 80465
RAW value: 118
Probability of clock speed:
0 clocks 95389
2 clocks 2
3 clocks 117145
4 clocks 1
5 clocks 6
6 clocks 351619
7 clocks 3
8 clocks 10
9 clocks 199661
10 clocks 13
11 clocks 15
13 clocks 13
14 clocks 16704
16 clocks 29
17 clocks 23260
19 clocks 36
22 clocks 74
25 clocks 102520
28 clocks 80465
RAW value: 104
Probability of clock speed:
0 clocks 16704
2 clocks 29
3 clocks 23260
5 clocks 36
8 clocks 74
11 clocks 102520
14 clocks 80465
RAW value: 118
Probability of clock speed:
0 clocks 95389
2 clocks 2
3 clocks 117145
4 clocks 1
5 clocks 6
6 clocks 351619
7 clocks 3
8 clocks 10
9 clocks 199661
10 clocks 13
11 clocks 15
13 clocks 13
14 clocks 16704
16 clocks 29
17 clocks 23260
19 clocks 36
22 clocks 74
25 clocks 102520
28 clocks 80465
RAW value: 115
Probability of clock speed:
0 clocks 117145
1 clocks 1
2 clocks 6
3 clocks 351619
4 clocks 3
5 clocks 10
6 clocks 199661
7 clocks 13
8 clocks 15
10 clocks 13
11 clocks 16704
13 clocks 29
14 clocks 23260
16 clocks 36
19 clocks 74
22 clocks 102520
25 clocks 80465
RAW value: 115
Probability of clock speed:
0 clocks 117145
1 clocks 1
2 clocks 6
3 clocks 351619
4 clocks 3
5 clocks 10
6 clocks 199661
7 clocks 13
8 clocks 15
10 clocks 13
11 clocks 16704
13 clocks 29
14 clocks 23260
16 clocks 36
19 clocks 74
22 clocks 102520
25 clocks 80465
RAW value: 118
Probability of clock speed:
0 clocks 95389
2 clocks 2
3 clocks 117145
4 clocks 1
5 clocks 6
6 clocks 351619
7 clocks 3
8 clocks 10
9 clocks 199661
10 clocks 13
11 clocks 15
13 clocks 13
14 clocks 16704
16 clocks 29
17 clocks 23260
19 clocks 36
22 clocks 74
25 clocks 102520
28 clocks 80465
RAW value: 115
Probability of clock speed:
0 clocks 117145
1 clocks 1
2 clocks 6
3 clocks 351619
4 clocks 3
5 clocks 10
6 clocks 199661
7 clocks 13
8 clocks 15
10 clocks 13
11 clocks 16704
13 clocks 29
14 clocks 23260
16 clocks 36
19 clocks 74
22 clocks 102520
25 clocks 80465
RAW value: 112
Probability of clock speed:
0 clocks 351619
1 clocks 3
2 clocks 10
3 clocks 199661
4 clocks 13
5 clocks 15
7 clocks 13
8 clocks 16704
10 clocks 29
11 clocks 23260
13 clocks 36
16 clocks 74
19 clocks 102520
22 clocks 80465
RAW value: 118
Probability of clock speed:
0 clocks 95389
2 clocks 2
3 clocks 117145
4 clocks 1
5 clocks 6
6 clocks 351619
7 clocks 3
8 clocks 10
9 clocks 199661
10 clocks 13
11 clocks 15
13 clocks 13
14 clocks 16704
16 clocks 29
17 clocks 23260
19 clocks 36
22 clocks 74
25 clocks 102520
28 clocks 80465
--- ok ---
Gunther
Thank you, Gunther! :biggrin:
Hi Alex,
you're welcome, my friend.
Gunther