You'll probably just have to live with the 'outliers'... 
...
Yeah, the cycle count shouldn't change, I would think.
The point is that I don't live with the outliers, because I do eliminate the slowest 40%. And yet there are runs that are systematically a few % slower. Same code, same position, so there is absolutely no reason why a fild [esp] ... fistp myvar sequence (for example) should take 80x 77 cycles in the first run, 80x 83 in the second and 80x 77 again in the third run. In all three runs there are a few outliers with 99 or even 200 cycles, but they are thrown out by sorting and eliminating the slower ones.
It's not logical.
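For anyone who wants to reproduce the filtering outside the attached program, here is a minimal sketch in C of the "sort, then drop the slowest 40%" step described above. It is not the code from the attachment: the names trimmed_average, timings and drop_percent are made up for illustration, and I am assuming the reported figure is a plain average of the samples that are kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: ascending order for unsigned 64-bit cycle counts */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts the timings in place, discards the slowest
   drop_percent of the samples and returns the integer average of the rest.
   Assumption: the kept samples are simply averaged. */
uint64_t trimmed_average(uint64_t *timings, size_t count, unsigned drop_percent)
{
    assert(count > 0 && drop_percent < 100);

    qsort(timings, count, sizeof timings[0], cmp_u64);

    /* number of samples that survive the cut; always at least 1 here */
    size_t keep = count - (count * drop_percent) / 100;

    uint64_t sum = 0;
    for (size_t i = 0; i < keep; i++)
        sum += timings[i];

    return sum / keep;
}
```

With the five samples 200, 77, 99, 83, 77 and drop_percent = 40, the two slowest values (99 and 200) are discarded and the result is (77 + 77 + 83) / 3 = 79. After the call, timings[0] holds the fastest sample, so the minimum can be compared directly against the trimmed average.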
I attach a new version that lets you export the sorted arrays of timings (by pressing y at the end). Have a look at the "tails", but also at the start of each array.
In theory, the fastest should be the "true" result, because a CPU can't just drop some cycles when it happens to be in the right mood, right?
Sample top results (the 0 0 is a bug that I still have to fix):
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2494306
2 2518611
3 2526306
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2471897
2 2532000
3 2538188
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2514656
2 2552965
3 2554821
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2515663
2 2536230
3 2539026
Sorted timings for finding 'Duplicate' with crt strstr
0 0
1 2520319
2 2551161
3 2557537
Corresponding 'tails' (note #4, 10189333!):
198 4024483
199 4029707
200 4168568
2567 kCycles for finding 'Duplicate' with crt strstr
198 4039726
199 4358796
200 4849407
2568 kCycles for finding 'Duplicate' with crt strstr
198 3059861
199 3060097
200 3106652
2563 kCycles for finding 'Duplicate' with crt strstr
198 6868355
199 10076800
200 10189333
2588 kCycles for finding 'Duplicate' with crt strstr
198 4001836
199 4147888
200 4317080
2567 kCycles for finding 'Duplicate' with crt strstr