News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Instr timings: Boyer-Moore not working

Started by jj2007, March 07, 2014, 02:21:09 PM

Previous topic - Next topic

jj2007

Thanks a lot, Ficko. I have added your algo to the original testbed, attached below, and here are my results:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
+++++++13 of 20 tests valid, loop overhead is approx. 10/3 cycles

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_DO_NOT_RESET_RM
_AT_NEXT_START
28115   kCycles for 3 * MB Instr_
32154   kCycles for 3 * InString
29185   kCycles for 3 * crt_strstr
18713   kCycles for 3 * BMHBinSearch, using len
7549    kCycles for 3 * BMHBinsearch, known length
9636    kCycles for 3 * InstrJJ
60151   kCycles for 3 * InstrFicko
5172    kCycles for 3 * RevInstrFicko
4894    kCycles for 3 * MB Rinstr

28012   kCycles for 3 * MB Instr_
32150   kCycles for 3 * InString
29138   kCycles for 3 * crt_strstr
18742   kCycles for 3 * BMHBinSearch, using len
7538    kCycles for 3 * BMHBinsearch, known length
9577    kCycles for 3 * InstrJJ
60022   kCycles for 3 * InstrFicko
5156    kCycles for 3 * RevInstrFicko
4900    kCycles for 3 * MB Rinstr

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMHBinSearch, using len
2026935 = eax BMHBinsearch, known length
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko
2026935 = eax MB Rinstr

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
245     kCycles for 3 * MB Instr_
8609    kCycles for 3 * InString
237     kCycles for 3 * crt_strstr
11329   kCycles for 3 * BMHBinSearch, using len
120     kCycles for 3 * BMHBinsearch, known length
57      kCycles for 3 * InstrJJ
5011    kCycles for 3 * InstrFicko
5172    kCycles for 3 * RevInstrFicko
4875    kCycles for 3 * MB Rinstr

245     kCycles for 3 * MB Instr_
8611    kCycles for 3 * InString
237     kCycles for 3 * crt_strstr
11332   kCycles for 3 * BMHBinSearch, using len
118     kCycles for 3 * BMHBinsearch, known length
58      kCycles for 3 * InstrJJ
5011    kCycles for 3 * InstrFicko
5186    kCycles for 3 * RevInstrFicko
4935    kCycles for 3 * MB Rinstr

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMHBinSearch, using len
18057   = eax BMHBinsearch, known length
18057   = eax InstrJJ
18057   = eax InstrFicko
2028751 = eax RevInstrFicko
2028751 = eax MB Rinstr

Testing if [This is a simple string which has at the...] contains Dupli
2869    kCycles for 10000 * MB Instr_
3168    kCycles for 10000 * InString
2433    kCycles for 10000 * crt_strstr
6243    kCycles for 10000 * BMHBinSearch, using len
5298    kCycles for 10000 * BMHBinsearch, known length
869     kCycles for 10000 * InstrJJ
4091    kCycles for 10000 * InstrFicko
1974    kCycles for 10000 * RevInstrFicko
1171    kCycles for 10000 * MB Rinstr

2881    kCycles for 10000 * MB Instr_
3177    kCycles for 10000 * InString
2433    kCycles for 10000 * crt_strstr
6366    kCycles for 10000 * BMHBinSearch, using len
5329    kCycles for 10000 * BMHBinsearch, known length
870     kCycles for 10000 * InstrJJ
4085    kCycles for 10000 * InstrFicko
1974    kCycles for 10000 * RevInstrFicko
1171    kCycles for 10000 * MB Rinstr


Note that I replaced the not-so-reliable standard Boyer-Moore algo with its Horspool variant, in two flavours, one using len() to determine the size of the source, the other using a precalculated source length.

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++++++++++++++++4 of 20 tests valid, loop overhead is approx. 11/3 cycles

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
39387   kCycles for 3 * MB Instr_
57139   kCycles for 3 * InString
45535   kCycles for 3 * crt_strstr
31791   kCycles for 3 * BMHBinSearch, using len
12079   kCycles for 3 * BMHBinsearch, known length
23119   kCycles for 3 * InstrJJ
85090   kCycles for 3 * InstrFicko
6215    kCycles for 3 * RevInstrFicko
5637    kCycles for 3 * MB Rinstr

39401   kCycles for 3 * MB Instr_
56685   kCycles for 3 * InString
45497   kCycles for 3 * crt_strstr
31761   kCycles for 3 * BMHBinSearch, using len
11845   kCycles for 3 * BMHBinsearch, known length
23208   kCycles for 3 * InstrJJ
85101   kCycles for 3 * InstrFicko
6475    kCycles for 3 * RevInstrFicko
5527    kCycles for 3 * MB Rinstr

2026771 = eax MB Instr_
2026771 = eax InString
2026771 = eax crt_strstr
2026771 = eax BMHBinSearch, using len
2026771 = eax BMHBinsearch, known length
2026771 = eax InstrJJ
2026771 = eax InstrFicko
2026771 = eax RevInstrFicko
2026771 = eax MB Rinstr

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
336     kCycles for 3 * MB Instr_
12614   kCycles for 3 * InString
340     kCycles for 3 * crt_strstr
19855   kCycles for 3 * BMHBinSearch, using len
172     kCycles for 3 * BMHBinsearch, known length
142     kCycles for 3 * InstrJJ
5976    kCycles for 3 * InstrFicko
6208    kCycles for 3 * RevInstrFicko
5294    kCycles for 3 * MB Rinstr

335     kCycles for 3 * MB Instr_
12458   kCycles for 3 * InString
370     kCycles for 3 * crt_strstr
19813   kCycles for 3 * BMHBinSearch, using len
170     kCycles for 3 * BMHBinsearch, known length
142     kCycles for 3 * InstrJJ
5924    kCycles for 3 * InstrFicko
6269    kCycles for 3 * RevInstrFicko
5520    kCycles for 3 * MB Rinstr

17893   = eax MB Instr_
17893   = eax InString
17893   = eax crt_strstr
17893   = eax BMHBinSearch, using len
17893   = eax BMHBinsearch, known length
17893   = eax InstrJJ
17893   = eax InstrFicko
2028587 = eax RevInstrFicko
2028587 = eax MB Rinstr

Testing if [This is a simple string which has at the...] contains Dupli
4017    kCycles for 10000 * MB Instr_
4751    kCycles for 10000 * InString
3348    kCycles for 10000 * crt_strstr
9705    kCycles for 10000 * BMHBinSearch, using len
7654    kCycles for 10000 * BMHBinsearch, known length
1389    kCycles for 10000 * InstrJJ
5823    kCycles for 10000 * InstrFicko
5583    kCycles for 10000 * RevInstrFicko
1948    kCycles for 10000 * MB Rinstr

4044    kCycles for 10000 * MB Instr_
4726    kCycles for 10000 * InString
3368    kCycles for 10000 * crt_strstr
9642    kCycles for 10000 * BMHBinSearch, using len
7655    kCycles for 10000 * BMHBinsearch, known length
1390    kCycles for 10000 * InstrJJ
5974    kCycles for 10000 * InstrFicko
5585    kCycles for 10000 * RevInstrFicko
1944    kCycles for 10000 * MB Rinstr

70      = eax MB Instr_
70      = eax InString
70      = eax crt_strstr
70      = eax BMHBinSearch, using len
70      = eax BMHBinsearch, known length
70      = eax InstrJJ
70      = eax InstrFicko
70      = eax RevInstrFicko
70      = eax MB Rinstr

Siekmanski

[codeIntel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
+++++++++++++++++++1 of 20 tests valid, loop overhead is approx. 15/3 cycles

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
15985   kCycles for 3 * MB Instr_
27849   kCycles for 3 * InString
22156   kCycles for 3 * crt_strstr
14938   kCycles for 3 * BMHBinSearch, using len
6891    kCycles for 3 * BMHBinsearch, known length
6804    kCycles for 3 * InstrJJ
43312   kCycles for 3 * InstrFicko
1695    kCycles for 3 * RevInstrFicko
1368    kCycles for 3 * MB Rinstr

15967   kCycles for 3 * MB Instr_
27955   kCycles for 3 * InString
22148   kCycles for 3 * crt_strstr
14943   kCycles for 3 * BMHBinSearch, using len
6900    kCycles for 3 * BMHBinsearch, known length
6802    kCycles for 3 * InstrJJ
43323   kCycles for 3 * InstrFicko
1691    kCycles for 3 * RevInstrFicko
1371    kCycles for 3 * MB Rinstr

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMHBinSearch, using len
2026935 = eax BMHBinsearch, known length
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko
2026935 = eax MB Rinstr

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
145     kCycles for 3 * MB Instr_
8892    kCycles for 3 * InString
171     kCycles for 3 * crt_strstr
8151    kCycles for 3 * BMHBinSearch, using len
78      kCycles for 3 * BMHBinsearch, known length
21      kCycles for 3 * InstrJJ
4158    kCycles for 3 * InstrFicko
2351    kCycles for 3 * RevInstrFicko
1844    kCycles for 3 * MB Rinstr

143     kCycles for 3 * MB Instr_
5014    kCycles for 3 * InString
167     kCycles for 3 * crt_strstr
8134    kCycles for 3 * BMHBinSearch, using len
74      kCycles for 3 * BMHBinsearch, known length
19      kCycles for 3 * InstrJJ
1592    kCycles for 3 * InstrFicko
1730    kCycles for 3 * RevInstrFicko
1357    kCycles for 3 * MB Rinstr

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMHBinSearch, using len
18057   = eax BMHBinsearch, known length
18057   = eax InstrJJ
18057   = eax InstrFicko
2028751 = eax RevInstrFicko
2028751 = eax MB Rinstr

Testing if [This is a simple string which has at the...] contains Dupli
1825    kCycles for 10000 * MB Instr_
2745    kCycles for 10000 * InString
1525    kCycles for 10000 * crt_strstr
3327    kCycles for 10000 * BMHBinSearch, using len
2402    kCycles for 10000 * BMHBinsearch, known length
520     kCycles for 10000 * InstrJJ
2895    kCycles for 10000 * InstrFicko
2707    kCycles for 10000 * RevInstrFicko
828     kCycles for 10000 * MB Rinstr

1842    kCycles for 10000 * MB Instr_
2746    kCycles for 10000 * InString
1532    kCycles for 10000 * crt_strstr
3354    kCycles for 10000 * BMHBinSearch, using len
2409    kCycles for 10000 * BMHBinsearch, known length
523     kCycles for 10000 * InstrJJ
2897    kCycles for 10000 * InstrFicko
2705    kCycles for 10000 * RevInstrFicko
835     kCycles for 10000 * MB Rinstr

70      = eax MB Instr_
70      = eax InString
70      = eax crt_strstr
70      = eax BMHBinSearch, using len
70      = eax BMHBinsearch, known length
70      = eax InstrJJ
70      = eax InstrFicko
70      = eax RevInstrFicko
70      = eax MB Rinstr

--- ok ---]
Creative coders use backward thinking techniques as a strategy.

sinsi

Maybe you should test a deliberate failure? I would imagine that testing strings would give lots of fails i.e. substring not found.

jj2007

Quote from: sinsi on March 09, 2014, 07:37:25 PM
Maybe you should test a deliberate failure? I would imagine that testing strings would give lots of fails i.e. substring not found.
Good idea - "No such string" in Windows.inc:

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains No such string
28235   kCycles for 3 * MB Instr_
30764   kCycles for 3 * InString
28484   kCycles for 3 * crt_strstr
30098   kCycles for 3 * BMHBinSearch, using len
18890   kCycles for 3 * BMHBinsearch, known length
8827    kCycles for 3 * InstrJJ
54616   kCycles for 3 * InstrFicko
36272   kCycles for 3 * RevInstrFicko
30595   kCycles for 3 * MB Rinstr

28191   kCycles for 3 * MB Instr_
30792   kCycles for 3 * InString
28334   kCycles for 3 * crt_strstr
30071   kCycles for 3 * BMHBinSearch, using len
19009   kCycles for 3 * BMHBinsearch, known length
8826    kCycles for 3 * InstrJJ
54628   kCycles for 3 * InstrFicko
36266   kCycles for 3 * RevInstrFicko
30599   kCycles for 3 * MB Rinstr

Gunther

Jochen,

the timings from 0803:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++++++++++++1 of 20 tests valid, loop overhead is approx. 6/3 cycles

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
14064   kCycles for 3 * MB Instr_
24877   kCycles for 3 * InString
19507   kCycles for 3 * crt_strstr
13258   kCycles for 3 * BMHBinSearch, using len
6090    kCycles for 3 * BMHBinsearch, known length
6102    kCycles for 3 * InstrJJ
38735   kCycles for 3 * InstrFicko
3651    kCycles for 3 * RevInstrFicko
1460    kCycles for 3 * MB Rinstr

15733   kCycles for 3 * MB Instr_
24878   kCycles for 3 * InString
23220   kCycles for 3 * crt_strstr
13145   kCycles for 3 * BMHBinSearch, using len
6102    kCycles for 3 * BMHBinsearch, known length
6154    kCycles for 3 * InstrJJ
38324   kCycles for 3 * InstrFicko
1499    kCycles for 3 * RevInstrFicko
1164    kCycles for 3 * MB Rinstr

2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMHBinSearch, using len
2026935 = eax BMHBinsearch, known length
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko
2026935 = eax MB Rinstr

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
124     kCycles for 3 * MB Instr_
4392    kCycles for 3 * InString
147     kCycles for 3 * crt_strstr
7152    kCycles for 3 * BMHBinSearch, using len
68      kCycles for 3 * BMHBinsearch, known length
42      kCycles for 3 * InstrJJ
3423    kCycles for 3 * InstrFicko
2005    kCycles for 3 * RevInstrFicko
1501    kCycles for 3 * MB Rinstr

125     kCycles for 3 * MB Instr_
7601    kCycles for 3 * InString
148     kCycles for 3 * crt_strstr
10462   kCycles for 3 * BMHBinSearch, using len
89      kCycles for 3 * BMHBinsearch, known length
41      kCycles for 3 * InstrJJ
3503    kCycles for 3 * InstrFicko
3667    kCycles for 3 * RevInstrFicko
1454    kCycles for 3 * MB Rinstr

18057   = eax MB Instr_
18057   = eax InString
18057   = eax crt_strstr
18057   = eax BMHBinSearch, using len
18057   = eax BMHBinsearch, known length
18057   = eax InstrJJ
18057   = eax InstrFicko
2028751 = eax RevInstrFicko
2028751 = eax MB Rinstr

Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains No such string

14289   kCycles for 3 * MB Instr_
23227   kCycles for 3 * InString
21122   kCycles for 3 * crt_strstr
21344   kCycles for 3 * BMHBinSearch, using len
17299   kCycles for 3 * BMHBinsearch, known length
5317    kCycles for 3 * InstrJJ
35784   kCycles for 3 * InstrFicko
16121   kCycles for 3 * RevInstrFicko
15147   kCycles for 3 * MB Rinstr

14124   kCycles for 3 * MB Instr_
23151   kCycles for 3 * InString
18370   kCycles for 3 * crt_strstr
21374   kCycles for 3 * BMHBinSearch, using len
14369   kCycles for 3 * BMHBinsearch, known length
5282    kCycles for 3 * InstrJJ
33821   kCycles for 3 * InstrFicko
16071   kCycles for 3 * RevInstrFicko
18405   kCycles for 3 * MB Rinstr

0       = eax MB Instr_
0       = eax InString
0       = eax crt_strstr
0       = eax BMHBinSearch, using len
0       = eax BMHBinsearch, known length
0       = eax InstrJJ
0       = eax InstrFicko
0       = eax RevInstrFicko
0       = eax MB Rinstr

Testing if [This is a simple string which has at the...] contains Dupli
1567    kCycles for 10000 * MB Instr_
2370    kCycles for 10000 * InString
1332    kCycles for 10000 * crt_strstr
2814    kCycles for 10000 * BMHBinSearch, using len
2039    kCycles for 10000 * BMHBinsearch, known length
1180    kCycles for 10000 * InstrJJ
6366    kCycles for 10000 * InstrFicko
2789    kCycles for 10000 * RevInstrFicko
698     kCycles for 10000 * MB Rinstr

1571    kCycles for 10000 * MB Instr_
2389    kCycles for 10000 * InString
1319    kCycles for 10000 * crt_strstr
2804    kCycles for 10000 * BMHBinSearch, using len
4874    kCycles for 10000 * BMHBinsearch, known length
481     kCycles for 10000 * InstrJJ
2591    kCycles for 10000 * InstrFicko
2420    kCycles for 10000 * RevInstrFicko
747     kCycles for 10000 * MB Rinstr

70      = eax MB Instr_
70      = eax InString
70      = eax crt_strstr
70      = eax BMHBinSearch, using len
70      = eax BMHBinsearch, known length
70      = eax InstrJJ
70      = eax InstrFicko
70      = eax RevInstrFicko
70      = eax MB Rinstr

--- ok ---


Gunther
You have to know the facts before you can distort them.

jj2007

Thanks, Gunther. Your results show clearly that getting the source len is not a realistic option for the Boyer-Moore algos:

7152    kCycles for 3 * BMHBinSearch, using len
68      kCycles for 3 * BMHBinsearch, known length
42      kCycles for 3 * InstrJJ

It gives me some ideas  :P

Gunther

Jochen,

Quote from: jj2007 on March 09, 2014, 11:38:59 PM
It gives me some ideas  :P

for MB or assembly language in general?

Gunther
You have to know the facts before you can distort them.

MichaelW

Quote from: Ficko on March 09, 2014, 03:35:35 AM
Like in my case with 8 cores the clock will not likely to change during the measurement.
But having 4 cores only making 1 core running on 100% would perhaps force clocking fluctuation.

I am not an AMD expert but may it is possible to force programmatically the CPU to run on a certain clock or not to change its clocking.

The timing in clock cycles is independent of the clock speed.
Well Microsoft, here's another nice mess you've gotten us into.