Hi all

Here is one more test; may I ask for your timings, please? Any hardware is welcome, since the timings will probably differ a bit now and the way the results are presented is more comprehensive, but modern CPUs (P4 and above) are preferable, since that is where the timings are most unstable.
The code now shows "probability" values: the bigger the value, the more probable that clock count is. If two neighbouring values are close to each other and bigger than the rest, the peak of the graph lies in between them. At the moment the info is presented in "raw" format; the open question is how to implement the display, or whether to leave it as it is.
You may sometimes see unbelievable values; yes, there are even "0 clocks" entries. That is OK, since the code just shows everything that is statistically possible, from 0 up to the maximum possible clocks for the algo. But, again, one value that is much larger than the others is the peak, which shows the actual performance in that timing frame; alternatively, two neighbouring values that are larger than the others are the two points with the peak of the graph in between.
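To make the "peak in between" reading concrete, here is a minimal Python sketch. It is not part of the test program. The function name estimate_peak and the NEAR_RATIO threshold (how large the second value must be relative to the largest to count as "near") are assumptions chosen purely for illustration.

```python
# Hypothetical helper: read the peak off one "probability" table.
NEAR_RATIO = 0.25  # assumed threshold, not taken from the test program

def estimate_peak(table):
    """table: list of (clocks, count) pairs with at least two entries."""
    table = sorted(table)  # order by clock count
    # Smallest spacing between neighbouring entries (8 clocks in the posted data).
    step = min(b[0] - a[0] for a, b in zip(table, table[1:]))
    ranked = sorted(table, key=lambda entry: entry[1], reverse=True)
    (c1, n1), (c2, n2) = ranked[0], ranked[1]
    # Two large neighbouring values: place the peak between them,
    # weighted by how often each one was seen.
    if abs(c1 - c2) == step and n2 >= NEAR_RATIO * n1:
        return (c1 * n1 + c2 * n2) / (n1 + n2)
    # Otherwise the single largest value is the peak.
    return float(c1)

# The RAW 512 table from the results below.
integer_test = [(0, 59), (8, 1), (16, 11), (24, 161),
                (32, 5505), (40, 1719), (48, 1), (56, 16)]
print(round(estimate_peak(integer_test), 1))  # about 33.9, between 32 and 40
```

A weighted average is only one possible way to place the peak; a plain midpoint (36 here) would follow the same idea.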
The point of this raw data is that, being humans, we can easily drop the unbelievable timings and choose the suitable ones. I.e., if you see that the biggest probability is in the zero-cycles entry, or near it, for an algo that is far from being just a couple of instructions long, then you will surely find one or two other values that are large and much larger than the rest. Those values are the true ones, so just drop the unbelievable timing. Remember, the code just shows everything; it does not bother checking how long the tested code is, etc. As proof of this, you may see that even when there are timings with too small a clock count, the other, true values are always the next largest ones, independent of the length of the shown table (i.e. the number of entries for each result).
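The manual filtering described above could be sketched roughly like this in Python. The function plausible_peak and its min_clocks parameter are hypothetical: min_clocks stands for the reader's own judgement of the smallest believable clock count for the tested algo, which the test program itself does not know.

```python
# Hypothetical helper: drop unbelievable entries, then take the largest remaining one.
def plausible_peak(table, min_clocks):
    """table: list of (clocks, count); min_clocks: smallest believable timing."""
    candidates = [(clocks, count) for clocks, count in table if clocks >= min_clocks]
    if not candidates:
        return None  # nothing believable is left in this table
    # The largest remaining probability marks the believable timing.
    return max(candidates, key=lambda entry: entry[1])[0]

# The RAW 512 table from the results below, with the 0-clock entry
# judged unbelievable for a looped integer algo.
integer_test = [(0, 59), (8, 1), (16, 11), (24, 161),
                (32, 5505), (40, 1719), (48, 1), (56, 16)]
print(plausible_peak(integer_test, min_clocks=8))  # 32
```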
Well, there are many complications in isolating the timing code, and it is still unstable. You may see that, for example, on my CPU the timing frame, because of the cpuid instruction, varies widely from 456 clocks to 536 clocks, and from the results you can see that it obviously has a step of 8 clocks. As you see, that is an 80-clock difference (too much for algos that run for only a small number of clocks), so it is no wonder that there are many false results. Again, that is just everything the code finds to be "not bigger than it should be"; the next step, choosing what is proper, is for the human.
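For readers wondering where the tables come from: the results below look consistent with subtracting every observed timing-frame value from the RAW value and carrying over how often that frame value was seen. The Python sketch below reproduces that reading. It is inferred from the posted numbers only, not from the source in the archive; the frame counts are reconstructed from the RAW 512 table, and counts for frame values above 512 are not visible in the output, so they are left out.

```python
# Assumed reconstruction: how often each timing-frame value (in clocks) was seen.
FRAME_HISTOGRAM = {456: 16, 464: 1, 472: 1719, 480: 5505,
                   488: 161, 496: 11, 504: 1, 512: 59}

def probability_table(raw, frame_histogram):
    """Return (clocks, count) rows for every frame value not bigger than raw."""
    rows = [(raw - frame, count)
            for frame, count in frame_histogram.items()
            if frame <= raw]  # "not bigger than it should be"
    return sorted(rows)

for clocks, count in probability_table(504, FRAME_HISTOGRAM):
    print(clocks, "clocks", count)
```

With raw = 504 this prints the same rows as the SSE test table below; the step of 8 in the frame values is also why every table advances in steps of 8 clocks.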
One may play with the LOOP_COUNT equate at the start of the listing, but increasing it will just lead to "smoothing" of the timings: too many loops obviously let the CPU become smart, and we again get unbelievable results like "an algo >20 instructions long, >12 of which are dependent on each other, running at 2 clocks".
At least with LOOP_COUNT = 3 it seems to work (probably ::)); maybe it decreases the actual timings a bit. With LOOP_COUNT = 1 the timings are a mess, because there are no extra passes from which the minimum, the value closest to the true speed, can be taken; but if LOOP_COUNT is set too high there are too many loops and the CPU's logic comes into play. Maybe LOOP_COUNT = 2 is good, too; one may need to play with it for the particular machine.
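If LOOP_COUNT is read as the number of passes from which the smallest measurement is kept (an assumption, since the listing itself is not shown in this post), the idea can be sketched in Python as follows. The names best_of and measure are hypothetical; measure stands for one timed pass and is replaced here by canned numbers so that the example is repeatable.

```python
# Hypothetical sketch: keep the minimum over loop_count timed passes.
def best_of(measure, loop_count):
    """Call measure() loop_count times and return the smallest result."""
    if loop_count < 1:
        raise ValueError("loop_count must be at least 1")
    return min(measure() for _ in range(loop_count))

# Canned measurements standing in for real cycle counts.
samples = iter([512, 504, 536])
print(best_of(lambda: next(samples), 3))  # 504
```

With loop_count = 1 the first sample (512 here) is returned as is, which matches the remark that a single pass leaves nothing to take a minimum over.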
Here are the results for the source in archive:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
Min: 456, Max: 536, Ave: 530
Integer looped code test
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
RAW value: 512
Probability of clock speed:
0 clocks 59
8 clocks 1
16 clocks 11
24 clocks 161
32 clocks 5505
40 clocks 1719
48 clocks 1
56 clocks 16
SSE code test
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 504
Probability of clock speed:
0 clocks 1
8 clocks 11
16 clocks 161
24 clocks 5505
32 clocks 1719
40 clocks 1
48 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
Almost empty test
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 464
Probability of clock speed:
0 clocks 1
8 clocks 16
RAW value: 480
Probability of clock speed:
0 clocks 5505
8 clocks 1719
16 clocks 1
24 clocks 16
RAW value: 464
Probability of clock speed:
0 clocks 1
8 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 488
Probability of clock speed:
0 clocks 161
8 clocks 5505
16 clocks 1719
24 clocks 1
32 clocks 16
RAW value: 464
Probability of clock speed:
0 clocks 1
8 clocks 16
Empty test
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
RAW value: 472
Probability of clock speed:
0 clocks 1719
8 clocks 1
16 clocks 16
--- ok ---
This is a somewhat uncleaned version; I just wanted to see how it behaves.
Testing and thoughts are welcome.
Now it should probably work as expected.
Many, many thanks to John, who helped me a lot with testing this boring thing!
