News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Instr timings: Boyer-Moore not working

Started by jj2007, March 07, 2014, 02:21:09 PM

Previous topic - Next topic

jj2007

I've done some timings for pattern find algos, and stumbled over a problem with BMBinSearch:

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
41877   kCycles for 10 * MB Instr_
58159   kCycles for 10 * InString
57885   kCycles for 10 * crt_strstr
27089   kCycles for 10 * BMBinSearch
30342   kCycles for 10 * BMHBinsearch
14482   kCycles for 10 * InstrJJ

41882   kCycles for 10 * MB Instr_
58191   kCycles for 10 * InString
57842   kCycles for 10 * crt_strstr
27063   kCycles for 10 * BMBinSearch
30348   kCycles for 10 * BMHBinsearch
14515   kCycles for 10 * InstrJJ

41941   kCycles for 10 * MB Instr_
58122   kCycles for 10 * InString
57875   kCycles for 10 * crt_strstr
27137   kCycles for 10 * BMBinSearch
30288   kCycles for 10 * BMHBinsearch
14887   kCycles for 10 * InstrJJ

2110    kCycles for 10000 * MB Instr_
2540    kCycles for 10000 * InString
2224    kCycles for 10000 * crt_strstr
4144    kCycles for 10000 * BMBinSearch
3784    kCycles for 10000 * BMHBinsearch
788     kCycles for 10000 * InstrJJ

2110    kCycles for 10000 * MB Instr_
2546    kCycles for 10000 * InString
2218    kCycles for 10000 * crt_strstr
4245    kCycles for 10000 * BMBinSearch
3794    kCycles for 10000 * BMHBinsearch
789     kCycles for 10000 * InstrJJ

2188    kCycles for 10000 * MB Instr_
2541    kCycles for 10000 * InString
2218    kCycles for 10000 * crt_strstr
4133    kCycles for 10000 * BMBinSearch
3787    kCycles for 10000 * BMHBinsearch
795     kCycles for 10000 * InstrJJ

13      bytes for MB Instr_
10      bytes for InString
15      bytes for crt_strstr
19      bytes for BMBinSearch
19      bytes for BMHBinsearch
311     bytes for InstrJJ

39      = eax MB Instr_
39      = eax InString
39      = eax crt_strstr
-1      = eax BMBinSearch
38      = eax BMHBinsearch
39      = eax InstrJJ


Problem:

39      = eax crt_strstr
-1      = eax BMBinSearch
38      = eax BMHBinsearch


What am I doing wrong here?
   invoke BMBinSearch,0, SearchString, Len(SearchString), chr$(Pattern), 5

The Boyer-Moore-Horspool variant returns a valid match, using the same invoke line.

hutch--

You may not be doing anything wrong, the BM variants are very hard to exhaustively verify and this is after testing them on hundreds of million of string variations.

jj2007

#2
OK. I added an old algo of mine above, a bit bloated but the timings look competitive ;-)

EDIT: The attached new version yields some surprising results... ::)

Inter alia, in round 2 ("contains TXFS_") all algos return the correct result, but InString and the two Boyer-Moore variants need a lot of time for doing so... a factor 25 for Masm32 InString vs CRT strstr!

I have no clue why that is the case. Any ideas, or different results?

The comment * -= is Windows.inc and WinExtra.inc combined - more than 2 MB to mitigate cache effects.

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_DO_NOT_RESET_RM_AT_NE
XT_START
31716   kCycles for 3 * MB Instr_
46678   kCycles for 3 * InString
45015   kCycles for 3 * crt_strstr
15563   kCycles for 3 * BMBinSearch
15980   kCycles for 3 * BMHBinsearch
12553   kCycles for 3 * InstrJJ

31322   kCycles for 3 * MB Instr_
46642   kCycles for 3 * InString
45696   kCycles for 3 * crt_strstr
15170   kCycles for 3 * BMBinSearch
15942   kCycles for 3 * BMHBinsearch
12535   kCycles for 3 * InstrJJ

31228   kCycles for 3 * MB Instr_
46267   kCycles for 3 * InString
45374   kCycles for 3 * crt_strstr
15143   kCycles for 3 * BMBinSearch
16418   kCycles for 3 * BMHBinsearch
12569   kCycles for 3 * InstrJJ

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
251     kCycles for 3 * MB Instr_
9125    kCycles for 3 * InString
372     kCycles for 3 * crt_strstr
6391    kCycles for 3 * BMBinSearch
6416    kCycles for 3 * BMHBinsearch
74      kCycles for 3 * InstrJJ

251     kCycles for 3 * MB Instr_
8678    kCycles for 3 * InString
776     kCycles for 3 * crt_strstr
6404    kCycles for 3 * BMBinSearch
6408    kCycles for 3 * BMHBinsearch
74      kCycles for 3 * InstrJJ

252     kCycles for 3 * MB Instr_
8648    kCycles for 3 * InString
373     kCycles for 3 * crt_strstr
6412    kCycles for 3 * BMBinSearch
6483    kCycles for 3 * BMHBinsearch
76      kCycles for 3 * InstrJJ

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMBinSearch
18057   = eax BMHBinsearch
18057   = eax InstrJJ

Testing if [This is a simple string which has toward...] contains Dupli
3497    kCycles for 10000 * MB Instr_
4322    kCycles for 10000 * InString
3980    kCycles for 10000 * crt_strstr
4953    kCycles for 10000 * BMBinSearch
5665    kCycles for 10000 * BMHBinsearch
1030    kCycles for 10000 * InstrJJ

3500    kCycles for 10000 * MB Instr_
4328    kCycles for 10000 * InString
3986    kCycles for 10000 * crt_strstr
5050    kCycles for 10000 * BMBinSearch
5379    kCycles for 10000 * BMHBinsearch
1033    kCycles for 10000 * InstrJJ

3501    kCycles for 10000 * MB Instr_
4323    kCycles for 10000 * InString
3980    kCycles for 10000 * crt_strstr
4949    kCycles for 10000 * BMBinSearch
5529    kCycles for 10000 * BMHBinsearch
1036    kCycles for 10000 * InstrJJ

75      = eax MB Instr_
75      = eax InString
75      = eax crt_strstr
0       = eax BMBinSearch
75      = eax BMHBinsearch
75      = eax InstrJJ


Same phenomenon on my Celeron M:

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
245     kCycles for 3 * MB Instr_
8814    kCycles for 3 * InString
237     kCycles for 3 * crt_strstr
4764    kCycles for 3 * BMBinSearch
4783    kCycles for 3 * BMHBinsearch
58      kCycles for 3 * InstrJJ


EDIT(2): Here is the culprit for InString (\Masm32\m32lib\instring.asm, line 45):

if 0
   mov sLen, Len(lpSource)   ; 4830 cycles for the TXFS_ case (MasmBasic Len is a bit faster than StrLen...)
else
   invoke StrLen,lpSource   ; 8690 cycles for the TXFS_ case
   mov sLen, eax           ; source length
endif


The other algos don't check the source len beforehand, and therefore are much faster if the source string is long and there is an early match.

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++++++++++++++++++2 of 20 tests valid, loop overhead is approx. 8/3 cycles

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
39731   kCycles for 3 * MB Instr_
56594   kCycles for 3 * InString
45181   kCycles for 3 * crt_strstr
12963   kCycles for 3 * BMBinSearch
17715   kCycles for 3 * BMHBinsearch
22831   kCycles for 3 * InstrJJ

39550   kCycles for 3 * MB Instr_
56437   kCycles for 3 * InString
45368   kCycles for 3 * crt_strstr
12850   kCycles for 3 * BMBinSearch
16990   kCycles for 3 * BMHBinsearch
23049   kCycles for 3 * InstrJJ

39448   kCycles for 3 * MB Instr_
56528   kCycles for 3 * InString
45606   kCycles for 3 * crt_strstr
12935   kCycles for 3 * BMBinSearch
17348   kCycles for 3 * BMHBinsearch
23027   kCycles for 3 * InstrJJ

2026771 = eax MB Instr_
2026771 = eax InString
2026771 = eax crt_strstr
2026771 = eax BMBinSearch
2026771 = eax BMHBinsearch
2026771 = eax InstrJJ

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
338     kCycles for 3 * MB Instr_
12471   kCycles for 3 * InString
340     kCycles for 3 * crt_strstr
5087    kCycles for 3 * BMBinSearch
5153    kCycles for 3 * BMHBinsearch
144     kCycles for 3 * InstrJJ

335     kCycles for 3 * MB Instr_
12728   kCycles for 3 * InString
341     kCycles for 3 * crt_strstr
5137    kCycles for 3 * BMBinSearch
5283    kCycles for 3 * BMHBinsearch
150     kCycles for 3 * InstrJJ

341     kCycles for 3 * MB Instr_
12273   kCycles for 3 * InString
340     kCycles for 3 * crt_strstr
5117    kCycles for 3 * BMBinSearch
5492    kCycles for 3 * BMHBinsearch
146     kCycles for 3 * InstrJJ

17893   = eax MB Instr_
17893   = eax InString
17893   = eax crt_strstr
17893   = eax BMBinSearch
17893   = eax BMHBinsearch
17893   = eax InstrJJ

Testing if [This is a simple string which has toward...] contains Dupli
4225    kCycles for 10000 * MB Instr_
5270    kCycles for 10000 * InString
3536    kCycles for 10000 * crt_strstr
7601    kCycles for 10000 * BMBinSearch
8514    kCycles for 10000 * BMHBinsearch
1461    kCycles for 10000 * InstrJJ

4261    kCycles for 10000 * MB Instr_
5135    kCycles for 10000 * InString
3627    kCycles for 10000 * crt_strstr
7884    kCycles for 10000 * BMBinSearch
8420    kCycles for 10000 * BMHBinsearch
1457    kCycles for 10000 * InstrJJ

4224    kCycles for 10000 * MB Instr_
5104    kCycles for 10000 * InString
3529    kCycles for 10000 * crt_strstr
7891    kCycles for 10000 * BMBinSearch
8378    kCycles for 10000 * BMHBinsearch
1461    kCycles for 10000 * InstrJJ

75      = eax MB Instr_
75      = eax InString
75      = eax crt_strstr
0       = eax BMBinSearch
75      = eax BMHBinsearch
75      = eax InstrJJ

Gunther

Jochen,


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
14679   kCycles for 3 * MB Instr_
25238   kCycles for 3 * InString
19752   kCycles for 3 * crt_strstr
4576    kCycles for 3 * BMBinSearch
7510    kCycles for 3 * BMHBinsearch
6096    kCycles for 3 * InstrJJ

14331   kCycles for 3 * MB Instr_
27113   kCycles for 3 * InString
22980   kCycles for 3 * crt_strstr
4748    kCycles for 3 * BMBinSearch
7450    kCycles for 3 * BMHBinsearch
6046    kCycles for 3 * InstrJJ

14210   kCycles for 3 * MB Instr_
25347   kCycles for 3 * InString
22804   kCycles for 3 * crt_strstr
4637    kCycles for 3 * BMBinSearch
7607    kCycles for 3 * BMHBinsearch
6078    kCycles for 3 * InstrJJ

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
124     kCycles for 3 * MB Instr_
4489    kCycles for 3 * InString
173     kCycles for 3 * crt_strstr
1103    kCycles for 3 * BMBinSearch
1092    kCycles for 3 * BMHBinsearch
16      kCycles for 3 * InstrJJ

127     kCycles for 3 * MB Instr_
6381    kCycles for 3 * InString
170     kCycles for 3 * crt_strstr
1162    kCycles for 3 * BMBinSearch
1162    kCycles for 3 * BMHBinsearch
16      kCycles for 3 * InstrJJ

124     kCycles for 3 * MB Instr_
7405    kCycles for 3 * InString
152     kCycles for 3 * crt_strstr
1141    kCycles for 3 * BMBinSearch
1155    kCycles for 3 * BMHBinsearch
16      kCycles for 3 * InstrJJ

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMBinSearch
18057   = eax BMHBinsearch
18057   = eax InstrJJ

Testing if [This is a simple string which has toward...] contains Dupli
1716    kCycles for 10000 * MB Instr_
2546    kCycles for 10000 * InString
1804    kCycles for 10000 * crt_strstr
2392    kCycles for 10000 * BMBinSearch
2447    kCycles for 10000 * BMHBinsearch
458     kCycles for 10000 * InstrJJ

1742    kCycles for 10000 * MB Instr_
2535    kCycles for 10000 * InString
1700    kCycles for 10000 * crt_strstr
2352    kCycles for 10000 * BMBinSearch
2425    kCycles for 10000 * BMHBinsearch
1111    kCycles for 10000 * InstrJJ

1656    kCycles for 10000 * MB Instr_
2525    kCycles for 10000 * InString
1854    kCycles for 10000 * crt_strstr
2295    kCycles for 10000 * BMBinSearch
2444    kCycles for 10000 * BMHBinsearch
1119    kCycles for 10000 * InstrJJ

75      = eax MB Instr_
75      = eax InString
75      = eax crt_strstr
0       = eax BMBinSearch
75      = eax BMHBinsearch
75      = eax InstrJJ

--- ok ---


Gunther
You have to know the facts before you can distort them.

Siekmanski

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
+++++++++++++++++++1 of 20 tests valid, loop overhead is approx. 13/3 cycles

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
18560   kCycles for 3 * MB Instr_
33473   kCycles for 3 * InString
26370   kCycles for 3 * crt_strstr
6243    kCycles for 3 * BMBinSearch
9480    kCycles for 3 * BMHBinsearch
7954    kCycles for 3 * InstrJJ

18421   kCycles for 3 * MB Instr_
33493   kCycles for 3 * InString
25844   kCycles for 3 * crt_strstr
6192    kCycles for 3 * BMBinSearch
9686    kCycles for 3 * BMHBinsearch
8163    kCycles for 3 * InstrJJ

18932   kCycles for 3 * MB Instr_
33374   kCycles for 3 * InString
26216   kCycles for 3 * crt_strstr
6265    kCycles for 3 * BMBinSearch
9498    kCycles for 3 * BMHBinsearch
8013    kCycles for 3 * InstrJJ

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
143     kCycles for 3 * MB Instr_
5745    kCycles for 3 * InString
312     kCycles for 3 * crt_strstr
1798    kCycles for 3 * BMBinSearch
1400    kCycles for 3 * BMHBinsearch
20      kCycles for 3 * InstrJJ

143     kCycles for 3 * MB Instr_
6027    kCycles for 3 * InString
181     kCycles for 3 * crt_strstr
1561    kCycles for 3 * BMBinSearch
1504    kCycles for 3 * BMHBinsearch
20      kCycles for 3 * InstrJJ

143     kCycles for 3 * MB Instr_
6113    kCycles for 3 * InString
175     kCycles for 3 * crt_strstr
1692    kCycles for 3 * BMBinSearch
1504    kCycles for 3 * BMHBinsearch
19      kCycles for 3 * InstrJJ

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMBinSearch
18057   = eax BMHBinsearch
18057   = eax InstrJJ

Testing if [This is a simple string which has toward...] contains Dupli
2166    kCycles for 10000 * MB Instr_
3191    kCycles for 10000 * InString
2224    kCycles for 10000 * crt_strstr
2962    kCycles for 10000 * BMBinSearch
3092    kCycles for 10000 * BMHBinsearch
787     kCycles for 10000 * InstrJJ

2109    kCycles for 10000 * MB Instr_
3189    kCycles for 10000 * InString
2233    kCycles for 10000 * crt_strstr
3004    kCycles for 10000 * BMBinSearch
3064    kCycles for 10000 * BMHBinsearch
585     kCycles for 10000 * InstrJJ

2194    kCycles for 10000 * MB Instr_
3162    kCycles for 10000 * InString
2214    kCycles for 10000 * crt_strstr
2927    kCycles for 10000 * BMBinSearch
3095    kCycles for 10000 * BMHBinsearch
547     kCycles for 10000 * InstrJJ

75      = eax MB Instr_
75      = eax InString
75      = eax crt_strstr
0       = eax BMBinSearch
75      = eax BMHBinsearch
75      = eax InstrJJ

--- ok ---
Creative coders use backward thinking techniques as a strategy.

Ficko

I just see no clock rate, why ?  ::)


AMD FX(tm)-8320 Eight-Core Processor            (SSE4)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
22894   kCycles for 3 * MB Instr_
42856   kCycles for 3 * InString
33888   kCycles for 3 * crt_strstr
5857    kCycles for 3 * BMBinSearch
9078    kCycles for 3 * BMHBinsearch
9763    kCycles for 3 * InstrJJ

23025   kCycles for 3 * MB Instr_
42396   kCycles for 3 * InString
33900   kCycles for 3 * crt_strstr
5982    kCycles for 3 * BMBinSearch
9084    kCycles for 3 * BMHBinsearch
9871    kCycles for 3 * InstrJJ

24064   kCycles for 3 * MB Instr_
42467   kCycles for 3 * InString
33944   kCycles for 3 * crt_strstr
5732    kCycles for 3 * BMBinSearch
8951    kCycles for 3 * BMHBinsearch
9755    kCycles for 3 * InstrJJ

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
198     kCycles for 3 * MB Instr_
13271   kCycles for 3 * InString
295     kCycles for 3 * crt_strstr
1845    kCycles for 3 * BMBinSearch
1883    kCycles for 3 * BMHBinsearch
80      kCycles for 3 * InstrJJ

200     kCycles for 3 * MB Instr_
8780    kCycles for 3 * InString
296     kCycles for 3 * crt_strstr
4574    kCycles for 3 * BMBinSearch
1896    kCycles for 3 * BMHBinsearch
70      kCycles for 3 * InstrJJ

196     kCycles for 3 * MB Instr_
8768    kCycles for 3 * InString
296     kCycles for 3 * crt_strstr
5267    kCycles for 3 * BMBinSearch
1788    kCycles for 3 * BMHBinsearch
79      kCycles for 3 * InstrJJ

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMBinSearch
18057   = eax BMHBinsearch
18057   = eax InstrJJ

Testing if [This is a simple string which has toward...] contains Dupli
2243    kCycles for 10000 * MB Instr_
3933    kCycles for 10000 * InString
2940    kCycles for 10000 * crt_strstr
3951    kCycles for 10000 * BMBinSearch
4177    kCycles for 10000 * BMHBinsearch
749     kCycles for 10000 * InstrJJ

2274    kCycles for 10000 * MB Instr_
3898    kCycles for 10000 * InString
2911    kCycles for 10000 * crt_strstr
4014    kCycles for 10000 * BMBinSearch
4104    kCycles for 10000 * BMHBinsearch
741     kCycles for 10000 * InstrJJ

2570    kCycles for 10000 * MB Instr_
3881    kCycles for 10000 * InString
2887    kCycles for 10000 * crt_strstr
3903    kCycles for 10000 * BMBinSearch
4143    kCycles for 10000 * BMHBinsearch
913     kCycles for 10000 * InstrJJ

75      = eax MB Instr_
75      = eax InString
75      = eax crt_strstr
0       = eax BMBinSearch
75      = eax BMHBinsearch
75      = eax InstrJJ

Ficko

I do not know exactly why the CPU clocking is not displayed but I do see a big bug in logic for timing multi core AMD cpus.

Quoteinvoke SetProcessAffinityMask, -1, 1   ; restrict to one core

Modern AMD multi core CPUs are running in very low clocking if the average of all cores are not above a certain % - I do not know the exact threshold but ~30% -

Mine is running at 1400Mhz but can go up to 3600Mhz on demand.

So to force the CPUs be clocked at max all cores have to be make working.
1 core 100% will not make the clocking go up.

KeepingRealBusy

That is interesting, but not a happy interesting. On a 4 core AMD, to get good timings, you have to kick off 3 cores, locked to the cores, each in a dummy loop burning 30% cpu time, then kick off your timing test locked on the the 4th core? What a bummer. What does that do to the CPU temp?

Dave.

jj2007

Quote from: Ficko on March 09, 2014, 02:24:48 AMSo to force the CPUs be clocked at max all cores have to be make working.

That's correct, but the purpose of lab benchmarks is to test relative performance of algos, not of CPUs.

Thanks to all for the interesting results. I am tempted to add InstrJJ to the MB lib, but with source & pattern len passed as parameter, i.e. not limited to zero-delimited strings. For parsing files, the source len is known, and it seems that without the test for zero bytes the algo could be 15-20% faster.

Ficko

Yes, depending what you wanna time.

If you wanna compare 2 or more algos execution time but not interested how it compares to other CPUs than the above timing can work under certain circumstances just fine.

Like in my case with 8 cores the clock will not likely to change during the measurement.
But having 4 cores only making 1 core running on 100% would perhaps force clocking fluctuation.

I am not an AMD expert but may it is possible to force programmatically the CPU to run on a certain clock or not to change its clocking.

KeepingRealBusy

Ficko,

I'll check to see what AMD says. I certainly want to run my programs as fast as I can (take as short a time as possible to run), and my algos are usually not easily able to be split into multi core threading algos (they all use huge memory buffers and if I used multi threading, I'd get killed with memory paging).

Dave.

Ficko


KeepingRealBusy


Ficko

I have added a combined "Instr/RevInstr" algo more size as speed optimized but performs well on random lenght string/pattern since it will quit if LEN(pattern) > LEN(string).


AMD FX(tm)-8320 Eight-Core Processor            (SSE4)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
22823   kCycles for 3 * MB Instr_
42185   kCycles for 3 * InString
33973   kCycles for 3 * crt_strstr
5473    kCycles for 3 * BMBinSearch
8673    kCycles for 3 * BMHBinsearch
9792    kCycles for 3 * InstrJJ
40499   kCycles for 3 * InstrFicko
1760    kCycles for 3 * RevInstrFicko

22453   kCycles for 3 * MB Instr_
42327   kCycles for 3 * InString
33933   kCycles for 3 * crt_strstr
5614    kCycles for 3 * BMBinSearch
8806    kCycles for 3 * BMHBinsearch
9729    kCycles for 3 * InstrJJ
40208   kCycles for 3 * InstrFicko
2333    kCycles for 3 * RevInstrFicko

22461   kCycles for 3 * MB Instr_
42165   kCycles for 3 * InString
33840   kCycles for 3 * crt_strstr
5849    kCycles for 3 * BMBinSearch
8807    kCycles for 3 * BMHBinsearch
9910    kCycles for 3 * InstrJJ
39540   kCycles for 3 * InstrFicko
1867    kCycles for 3 * RevInstrFicko

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
192     kCycles for 3 * MB Instr_
12205   kCycles for 3 * InString
300     kCycles for 3 * crt_strstr
1485    kCycles for 3 * BMBinSearch
1634    kCycles for 3 * BMHBinsearch
61      kCycles for 3 * InstrJJ
1426    kCycles for 3 * InstrFicko
1580    kCycles for 3 * RevInstrFicko

194     kCycles for 3 * MB Instr_
12415   kCycles for 3 * InString
293     kCycles for 3 * crt_strstr
1469    kCycles for 3 * BMBinSearch
1605    kCycles for 3 * BMHBinsearch
79      kCycles for 3 * InstrJJ
2699    kCycles for 3 * InstrFicko
1472    kCycles for 3 * RevInstrFicko

195     kCycles for 3 * MB Instr_
11464   kCycles for 3 * InString
295     kCycles for 3 * crt_strstr
1472    kCycles for 3 * BMBinSearch
1538    kCycles for 3 * BMHBinsearch
81      kCycles for 3 * InstrJJ
1616    kCycles for 3 * InstrFicko
1586    kCycles for 3 * RevInstrFicko

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMBinSearch
18057   = eax BMHBinsearch
18057   = eax InstrJJ
18057   = eax InstrFicko
2028751 = eax RevInstrFicko

Testing if [This is a simple string which has toward...] contains Dupli
2256    kCycles for 10000 * MB Instr_
4103    kCycles for 10000 * InString
2918    kCycles for 10000 * crt_strstr
4292    kCycles for 10000 * BMBinSearch
4371    kCycles for 10000 * BMHBinsearch
766     kCycles for 10000 * InstrJJ
3641    kCycles for 10000 * InstrFicko
2847    kCycles for 10000 * RevInstrFicko

2294    kCycles for 10000 * MB Instr_
4264    kCycles for 10000 * InString
2919    kCycles for 10000 * crt_strstr
4260    kCycles for 10000 * BMBinSearch
4542    kCycles for 10000 * BMHBinsearch
1072    kCycles for 10000 * InstrJJ
3689    kCycles for 10000 * InstrFicko
2833    kCycles for 10000 * RevInstrFicko

2236    kCycles for 10000 * MB Instr_
4074    kCycles for 10000 * InString
3002    kCycles for 10000 * crt_strstr
4284    kCycles for 10000 * BMBinSearch
4103    kCycles for 10000 * BMHBinsearch
806     kCycles for 10000 * InstrJJ
3623    kCycles for 10000 * InstrFicko
2912    kCycles for 10000 * RevInstrFicko

75      = eax MB Instr_
75      = eax InString
75      = eax crt_strstr
0       = eax BMBinSearch
75      = eax BMHBinsearch
75      = eax InstrJJ
75      = eax InstrFicko
75      = eax RevInstrFicko
--- ok ---