I've done some timings for pattern find algos, and stumbled over a problem with BMBinSearch:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
41877 kCycles for 10 * MB Instr_
58159 kCycles for 10 * InString
57885 kCycles for 10 * crt_strstr
27089 kCycles for 10 * BMBinSearch
30342 kCycles for 10 * BMHBinsearch
14482 kCycles for 10 * InstrJJ
41882 kCycles for 10 * MB Instr_
58191 kCycles for 10 * InString
57842 kCycles for 10 * crt_strstr
27063 kCycles for 10 * BMBinSearch
30348 kCycles for 10 * BMHBinsearch
14515 kCycles for 10 * InstrJJ
41941 kCycles for 10 * MB Instr_
58122 kCycles for 10 * InString
57875 kCycles for 10 * crt_strstr
27137 kCycles for 10 * BMBinSearch
30288 kCycles for 10 * BMHBinsearch
14887 kCycles for 10 * InstrJJ
2110 kCycles for 10000 * MB Instr_
2540 kCycles for 10000 * InString
2224 kCycles for 10000 * crt_strstr
4144 kCycles for 10000 * BMBinSearch
3784 kCycles for 10000 * BMHBinsearch
788 kCycles for 10000 * InstrJJ
2110 kCycles for 10000 * MB Instr_
2546 kCycles for 10000 * InString
2218 kCycles for 10000 * crt_strstr
4245 kCycles for 10000 * BMBinSearch
3794 kCycles for 10000 * BMHBinsearch
789 kCycles for 10000 * InstrJJ
2188 kCycles for 10000 * MB Instr_
2541 kCycles for 10000 * InString
2218 kCycles for 10000 * crt_strstr
4133 kCycles for 10000 * BMBinSearch
3787 kCycles for 10000 * BMHBinsearch
795 kCycles for 10000 * InstrJJ
13 bytes for MB Instr_
10 bytes for InString
15 bytes for crt_strstr
19 bytes for BMBinSearch
19 bytes for BMHBinsearch
311 bytes for InstrJJ
39 = eax MB Instr_
39 = eax InString
39 = eax crt_strstr
-1 = eax BMBinSearch
38 = eax BMHBinsearch
39 = eax InstrJJ
Problem:
39 = eax crt_strstr
-1 = eax BMBinSearch
38 = eax BMHBinsearch
What am I doing wrong here?
invoke BMBinSearch,0, SearchString, Len(SearchString), chr$(Pattern), 5
The Boyer-Moore-Horspool variant returns a valid match, using the same invoke line.
You may not be doing anything wrong, the BM variants are very hard to exhaustively verify and this is after testing them on hundreds of million of string variations.
OK. I added an old algo of mine above, a bit bloated but the timings look competitive ;-)
EDIT: The attached new version yields some surprising results... ::)
Inter alia, in round 2 ("contains TXFS_") all algos return the correct result, but InString and the two Boyer-Moore variants need a lot of time for doing so... a factor 25 for Masm32 InString vs CRT strstr!
I have no clue why that is the case. Any ideas, or different results?
The comment * -= is Windows.inc and WinExtra.inc combined - more than 2 MB to mitigate cache effects.
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_DO_NOT_RESET_RM_AT_NE
XT_START
31716 kCycles for 3 * MB Instr_
46678 kCycles for 3 * InString
45015 kCycles for 3 * crt_strstr
15563 kCycles for 3 * BMBinSearch
15980 kCycles for 3 * BMHBinsearch
12553 kCycles for 3 * InstrJJ
31322 kCycles for 3 * MB Instr_
46642 kCycles for 3 * InString
45696 kCycles for 3 * crt_strstr
15170 kCycles for 3 * BMBinSearch
15942 kCycles for 3 * BMHBinsearch
12535 kCycles for 3 * InstrJJ
31228 kCycles for 3 * MB Instr_
46267 kCycles for 3 * InString
45374 kCycles for 3 * crt_strstr
15143 kCycles for 3 * BMBinSearch
16418 kCycles for 3 * BMHBinsearch
12569 kCycles for 3 * InstrJJ
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
251 kCycles for 3 * MB Instr_
9125 kCycles for 3 * InString
372 kCycles for 3 * crt_strstr
6391 kCycles for 3 * BMBinSearch
6416 kCycles for 3 * BMHBinsearch
74 kCycles for 3 * InstrJJ
251 kCycles for 3 * MB Instr_
8678 kCycles for 3 * InString
776 kCycles for 3 * crt_strstr
6404 kCycles for 3 * BMBinSearch
6408 kCycles for 3 * BMHBinsearch
74 kCycles for 3 * InstrJJ
252 kCycles for 3 * MB Instr_
8648 kCycles for 3 * InString
373 kCycles for 3 * crt_strstr
6412 kCycles for 3 * BMBinSearch
6483 kCycles for 3 * BMHBinsearch
76 kCycles for 3 * InstrJJ
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMBinSearch
18057 = eax BMHBinsearch
18057 = eax InstrJJ
Testing if [This is a simple string which has toward...] contains Dupli
3497 kCycles for 10000 * MB Instr_
4322 kCycles for 10000 * InString
3980 kCycles for 10000 * crt_strstr
4953 kCycles for 10000 * BMBinSearch
5665 kCycles for 10000 * BMHBinsearch
1030 kCycles for 10000 * InstrJJ
3500 kCycles for 10000 * MB Instr_
4328 kCycles for 10000 * InString
3986 kCycles for 10000 * crt_strstr
5050 kCycles for 10000 * BMBinSearch
5379 kCycles for 10000 * BMHBinsearch
1033 kCycles for 10000 * InstrJJ
3501 kCycles for 10000 * MB Instr_
4323 kCycles for 10000 * InString
3980 kCycles for 10000 * crt_strstr
4949 kCycles for 10000 * BMBinSearch
5529 kCycles for 10000 * BMHBinsearch
1036 kCycles for 10000 * InstrJJ
75 = eax MB Instr_
75 = eax InString
75 = eax crt_strstr
0 = eax BMBinSearch
75 = eax BMHBinsearch
75 = eax InstrJJ
Same phenomenon on my Celeron M:
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
245 kCycles for 3 * MB Instr_
8814 kCycles for 3 * InString
237 kCycles for 3 * crt_strstr
4764 kCycles for 3 * BMBinSearch
4783 kCycles for 3 * BMHBinsearch
58 kCycles for 3 * InstrJJ
EDIT(2): Here is the culprit for InString (\Masm32\m32lib\instring.asm, line 45):
if 0
mov sLen, Len(lpSource) ; 4830 cycles for the TXFS_ case (MasmBasic Len is a bit faster than StrLen...)
else
invoke StrLen,lpSource ; 8690 cycles for the TXFS_ case
mov sLen, eax ; source length
endif
The other algos don't check the source len beforehand, and therefore are much faster if the source string is long and there is an early match.
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++++++++++++++++++2 of 20 tests valid, loop overhead is approx. 8/3 cycles
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
39731 kCycles for 3 * MB Instr_
56594 kCycles for 3 * InString
45181 kCycles for 3 * crt_strstr
12963 kCycles for 3 * BMBinSearch
17715 kCycles for 3 * BMHBinsearch
22831 kCycles for 3 * InstrJJ
39550 kCycles for 3 * MB Instr_
56437 kCycles for 3 * InString
45368 kCycles for 3 * crt_strstr
12850 kCycles for 3 * BMBinSearch
16990 kCycles for 3 * BMHBinsearch
23049 kCycles for 3 * InstrJJ
39448 kCycles for 3 * MB Instr_
56528 kCycles for 3 * InString
45606 kCycles for 3 * crt_strstr
12935 kCycles for 3 * BMBinSearch
17348 kCycles for 3 * BMHBinsearch
23027 kCycles for 3 * InstrJJ
2026771 = eax MB Instr_
2026771 = eax InString
2026771 = eax crt_strstr
2026771 = eax BMBinSearch
2026771 = eax BMHBinsearch
2026771 = eax InstrJJ
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
338 kCycles for 3 * MB Instr_
12471 kCycles for 3 * InString
340 kCycles for 3 * crt_strstr
5087 kCycles for 3 * BMBinSearch
5153 kCycles for 3 * BMHBinsearch
144 kCycles for 3 * InstrJJ
335 kCycles for 3 * MB Instr_
12728 kCycles for 3 * InString
341 kCycles for 3 * crt_strstr
5137 kCycles for 3 * BMBinSearch
5283 kCycles for 3 * BMHBinsearch
150 kCycles for 3 * InstrJJ
341 kCycles for 3 * MB Instr_
12273 kCycles for 3 * InString
340 kCycles for 3 * crt_strstr
5117 kCycles for 3 * BMBinSearch
5492 kCycles for 3 * BMHBinsearch
146 kCycles for 3 * InstrJJ
17893 = eax MB Instr_
17893 = eax InString
17893 = eax crt_strstr
17893 = eax BMBinSearch
17893 = eax BMHBinsearch
17893 = eax InstrJJ
Testing if [This is a simple string which has toward...] contains Dupli
4225 kCycles for 10000 * MB Instr_
5270 kCycles for 10000 * InString
3536 kCycles for 10000 * crt_strstr
7601 kCycles for 10000 * BMBinSearch
8514 kCycles for 10000 * BMHBinsearch
1461 kCycles for 10000 * InstrJJ
4261 kCycles for 10000 * MB Instr_
5135 kCycles for 10000 * InString
3627 kCycles for 10000 * crt_strstr
7884 kCycles for 10000 * BMBinSearch
8420 kCycles for 10000 * BMHBinsearch
1457 kCycles for 10000 * InstrJJ
4224 kCycles for 10000 * MB Instr_
5104 kCycles for 10000 * InString
3529 kCycles for 10000 * crt_strstr
7891 kCycles for 10000 * BMBinSearch
8378 kCycles for 10000 * BMHBinsearch
1461 kCycles for 10000 * InstrJJ
75 = eax MB Instr_
75 = eax InString
75 = eax crt_strstr
0 = eax BMBinSearch
75 = eax BMHBinsearch
75 = eax InstrJJ
Jochen,
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
14679 kCycles for 3 * MB Instr_
25238 kCycles for 3 * InString
19752 kCycles for 3 * crt_strstr
4576 kCycles for 3 * BMBinSearch
7510 kCycles for 3 * BMHBinsearch
6096 kCycles for 3 * InstrJJ
14331 kCycles for 3 * MB Instr_
27113 kCycles for 3 * InString
22980 kCycles for 3 * crt_strstr
4748 kCycles for 3 * BMBinSearch
7450 kCycles for 3 * BMHBinsearch
6046 kCycles for 3 * InstrJJ
14210 kCycles for 3 * MB Instr_
25347 kCycles for 3 * InString
22804 kCycles for 3 * crt_strstr
4637 kCycles for 3 * BMBinSearch
7607 kCycles for 3 * BMHBinsearch
6078 kCycles for 3 * InstrJJ
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
124 kCycles for 3 * MB Instr_
4489 kCycles for 3 * InString
173 kCycles for 3 * crt_strstr
1103 kCycles for 3 * BMBinSearch
1092 kCycles for 3 * BMHBinsearch
16 kCycles for 3 * InstrJJ
127 kCycles for 3 * MB Instr_
6381 kCycles for 3 * InString
170 kCycles for 3 * crt_strstr
1162 kCycles for 3 * BMBinSearch
1162 kCycles for 3 * BMHBinsearch
16 kCycles for 3 * InstrJJ
124 kCycles for 3 * MB Instr_
7405 kCycles for 3 * InString
152 kCycles for 3 * crt_strstr
1141 kCycles for 3 * BMBinSearch
1155 kCycles for 3 * BMHBinsearch
16 kCycles for 3 * InstrJJ
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMBinSearch
18057 = eax BMHBinsearch
18057 = eax InstrJJ
Testing if [This is a simple string which has toward...] contains Dupli
1716 kCycles for 10000 * MB Instr_
2546 kCycles for 10000 * InString
1804 kCycles for 10000 * crt_strstr
2392 kCycles for 10000 * BMBinSearch
2447 kCycles for 10000 * BMHBinsearch
458 kCycles for 10000 * InstrJJ
1742 kCycles for 10000 * MB Instr_
2535 kCycles for 10000 * InString
1700 kCycles for 10000 * crt_strstr
2352 kCycles for 10000 * BMBinSearch
2425 kCycles for 10000 * BMHBinsearch
1111 kCycles for 10000 * InstrJJ
1656 kCycles for 10000 * MB Instr_
2525 kCycles for 10000 * InString
1854 kCycles for 10000 * crt_strstr
2295 kCycles for 10000 * BMBinSearch
2444 kCycles for 10000 * BMHBinsearch
1119 kCycles for 10000 * InstrJJ
75 = eax MB Instr_
75 = eax InString
75 = eax crt_strstr
0 = eax BMBinSearch
75 = eax BMHBinsearch
75 = eax InstrJJ
--- ok ---
Gunther
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
+++++++++++++++++++1 of 20 tests valid, loop overhead is approx. 13/3 cycles
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
18560 kCycles for 3 * MB Instr_
33473 kCycles for 3 * InString
26370 kCycles for 3 * crt_strstr
6243 kCycles for 3 * BMBinSearch
9480 kCycles for 3 * BMHBinsearch
7954 kCycles for 3 * InstrJJ
18421 kCycles for 3 * MB Instr_
33493 kCycles for 3 * InString
25844 kCycles for 3 * crt_strstr
6192 kCycles for 3 * BMBinSearch
9686 kCycles for 3 * BMHBinsearch
8163 kCycles for 3 * InstrJJ
18932 kCycles for 3 * MB Instr_
33374 kCycles for 3 * InString
26216 kCycles for 3 * crt_strstr
6265 kCycles for 3 * BMBinSearch
9498 kCycles for 3 * BMHBinsearch
8013 kCycles for 3 * InstrJJ
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
143 kCycles for 3 * MB Instr_
5745 kCycles for 3 * InString
312 kCycles for 3 * crt_strstr
1798 kCycles for 3 * BMBinSearch
1400 kCycles for 3 * BMHBinsearch
20 kCycles for 3 * InstrJJ
143 kCycles for 3 * MB Instr_
6027 kCycles for 3 * InString
181 kCycles for 3 * crt_strstr
1561 kCycles for 3 * BMBinSearch
1504 kCycles for 3 * BMHBinsearch
20 kCycles for 3 * InstrJJ
143 kCycles for 3 * MB Instr_
6113 kCycles for 3 * InString
175 kCycles for 3 * crt_strstr
1692 kCycles for 3 * BMBinSearch
1504 kCycles for 3 * BMHBinsearch
19 kCycles for 3 * InstrJJ
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMBinSearch
18057 = eax BMHBinsearch
18057 = eax InstrJJ
Testing if [This is a simple string which has toward...] contains Dupli
2166 kCycles for 10000 * MB Instr_
3191 kCycles for 10000 * InString
2224 kCycles for 10000 * crt_strstr
2962 kCycles for 10000 * BMBinSearch
3092 kCycles for 10000 * BMHBinsearch
787 kCycles for 10000 * InstrJJ
2109 kCycles for 10000 * MB Instr_
3189 kCycles for 10000 * InString
2233 kCycles for 10000 * crt_strstr
3004 kCycles for 10000 * BMBinSearch
3064 kCycles for 10000 * BMHBinsearch
585 kCycles for 10000 * InstrJJ
2194 kCycles for 10000 * MB Instr_
3162 kCycles for 10000 * InString
2214 kCycles for 10000 * crt_strstr
2927 kCycles for 10000 * BMBinSearch
3095 kCycles for 10000 * BMHBinsearch
547 kCycles for 10000 * InstrJJ
75 = eax MB Instr_
75 = eax InString
75 = eax crt_strstr
0 = eax BMBinSearch
75 = eax BMHBinsearch
75 = eax InstrJJ
--- ok ---
I just see no clock rate, why ? ::)
AMD FX(tm)-8320 Eight-Core Processor (SSE4)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
22894 kCycles for 3 * MB Instr_
42856 kCycles for 3 * InString
33888 kCycles for 3 * crt_strstr
5857 kCycles for 3 * BMBinSearch
9078 kCycles for 3 * BMHBinsearch
9763 kCycles for 3 * InstrJJ
23025 kCycles for 3 * MB Instr_
42396 kCycles for 3 * InString
33900 kCycles for 3 * crt_strstr
5982 kCycles for 3 * BMBinSearch
9084 kCycles for 3 * BMHBinsearch
9871 kCycles for 3 * InstrJJ
24064 kCycles for 3 * MB Instr_
42467 kCycles for 3 * InString
33944 kCycles for 3 * crt_strstr
5732 kCycles for 3 * BMBinSearch
8951 kCycles for 3 * BMHBinsearch
9755 kCycles for 3 * InstrJJ
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
198 kCycles for 3 * MB Instr_
13271 kCycles for 3 * InString
295 kCycles for 3 * crt_strstr
1845 kCycles for 3 * BMBinSearch
1883 kCycles for 3 * BMHBinsearch
80 kCycles for 3 * InstrJJ
200 kCycles for 3 * MB Instr_
8780 kCycles for 3 * InString
296 kCycles for 3 * crt_strstr
4574 kCycles for 3 * BMBinSearch
1896 kCycles for 3 * BMHBinsearch
70 kCycles for 3 * InstrJJ
196 kCycles for 3 * MB Instr_
8768 kCycles for 3 * InString
296 kCycles for 3 * crt_strstr
5267 kCycles for 3 * BMBinSearch
1788 kCycles for 3 * BMHBinsearch
79 kCycles for 3 * InstrJJ
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMBinSearch
18057 = eax BMHBinsearch
18057 = eax InstrJJ
Testing if [This is a simple string which has toward...] contains Dupli
2243 kCycles for 10000 * MB Instr_
3933 kCycles for 10000 * InString
2940 kCycles for 10000 * crt_strstr
3951 kCycles for 10000 * BMBinSearch
4177 kCycles for 10000 * BMHBinsearch
749 kCycles for 10000 * InstrJJ
2274 kCycles for 10000 * MB Instr_
3898 kCycles for 10000 * InString
2911 kCycles for 10000 * crt_strstr
4014 kCycles for 10000 * BMBinSearch
4104 kCycles for 10000 * BMHBinsearch
741 kCycles for 10000 * InstrJJ
2570 kCycles for 10000 * MB Instr_
3881 kCycles for 10000 * InString
2887 kCycles for 10000 * crt_strstr
3903 kCycles for 10000 * BMBinSearch
4143 kCycles for 10000 * BMHBinsearch
913 kCycles for 10000 * InstrJJ
75 = eax MB Instr_
75 = eax InString
75 = eax crt_strstr
0 = eax BMBinSearch
75 = eax BMHBinsearch
75 = eax InstrJJ
I do not know exactly why the CPU clocking is not displayed but I do see a big bug in logic for timing multi core AMD cpus.
Quoteinvoke SetProcessAffinityMask, -1, 1 ; restrict to one core
Modern AMD multi core CPUs are running in very low clocking if the average of all cores are not above a certain % - I do not know the exact threshold but ~30% -
Mine is running at 1400Mhz but can go up to 3600Mhz on demand.
So to force the CPUs be clocked at max all cores have to be make working.
1 core 100% will not make the clocking go up.
That is interesting, but not a happy interesting. On a 4 core AMD, to get good timings, you have to kick off 3 cores, locked to the cores, each in a dummy loop burning 30% cpu time, then kick off your timing test locked on the the 4th core? What a bummer. What does that do to the CPU temp?
Dave.
Quote from: Ficko on March 09, 2014, 02:24:48 AMSo to force the CPUs be clocked at max all cores have to be make working.
That's correct, but the purpose of lab benchmarks is to test relative performance of algos, not of CPUs.
Thanks to all for the interesting results. I am tempted to add InstrJJ to the MB lib, but with source & pattern len passed as parameter, i.e. not limited to zero-delimited strings. For parsing files, the source len is known, and it seems that without the test for zero bytes the algo could be 15-20% faster.
Yes, depending what you wanna time.
If you wanna compare 2 or more algos execution time but not interested how it compares to other CPUs than the above timing can work under certain circumstances just fine.
Like in my case with 8 cores the clock will not likely to change during the measurement.
But having 4 cores only making 1 core running on 100% would perhaps force clocking fluctuation.
I am not an AMD expert but may it is possible to force programmatically the CPU to run on a certain clock or not to change its clocking.
Ficko,
I'll check to see what AMD says. I certainly want to run my programs as fast as I can (take as short a time as possible to run), and my algos are usually not easily able to be split into multi core threading algos (they all use huge memory buffers and if I used multi threading, I'd get killed with memory paging).
Dave.
Hi Dave,
Have found something maybe interessting little OS patching to squeeze out more of "Bulldozers".
https://forums.geforce.com/default/topic/548205/pc-components/amd-cpu-clock-guide-comparison-bulldozer-vs-piledriver-/ (https://forums.geforce.com/default/topic/548205/pc-components/amd-cpu-clock-guide-comparison-bulldozer-vs-piledriver-/)
Thank you for the link.
Dave.
I have added a combined "Instr/RevInstr" algo more size as speed optimized but performs well on random lenght string/pattern since it will quit if LEN(pattern) > LEN(string).
AMD FX(tm)-8320 Eight-Core Processor (SSE4)
++++++++++++++++++++
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
22823 kCycles for 3 * MB Instr_
42185 kCycles for 3 * InString
33973 kCycles for 3 * crt_strstr
5473 kCycles for 3 * BMBinSearch
8673 kCycles for 3 * BMHBinsearch
9792 kCycles for 3 * InstrJJ
40499 kCycles for 3 * InstrFicko
1760 kCycles for 3 * RevInstrFicko
22453 kCycles for 3 * MB Instr_
42327 kCycles for 3 * InString
33933 kCycles for 3 * crt_strstr
5614 kCycles for 3 * BMBinSearch
8806 kCycles for 3 * BMHBinsearch
9729 kCycles for 3 * InstrJJ
40208 kCycles for 3 * InstrFicko
2333 kCycles for 3 * RevInstrFicko
22461 kCycles for 3 * MB Instr_
42165 kCycles for 3 * InString
33840 kCycles for 3 * crt_strstr
5849 kCycles for 3 * BMBinSearch
8807 kCycles for 3 * BMHBinsearch
9910 kCycles for 3 * InstrJJ
39540 kCycles for 3 * InstrFicko
1867 kCycles for 3 * RevInstrFicko
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMBinSearch
2026935 = eax BMHBinsearch
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
192 kCycles for 3 * MB Instr_
12205 kCycles for 3 * InString
300 kCycles for 3 * crt_strstr
1485 kCycles for 3 * BMBinSearch
1634 kCycles for 3 * BMHBinsearch
61 kCycles for 3 * InstrJJ
1426 kCycles for 3 * InstrFicko
1580 kCycles for 3 * RevInstrFicko
194 kCycles for 3 * MB Instr_
12415 kCycles for 3 * InString
293 kCycles for 3 * crt_strstr
1469 kCycles for 3 * BMBinSearch
1605 kCycles for 3 * BMHBinsearch
79 kCycles for 3 * InstrJJ
2699 kCycles for 3 * InstrFicko
1472 kCycles for 3 * RevInstrFicko
195 kCycles for 3 * MB Instr_
11464 kCycles for 3 * InString
295 kCycles for 3 * crt_strstr
1472 kCycles for 3 * BMBinSearch
1538 kCycles for 3 * BMHBinsearch
81 kCycles for 3 * InstrJJ
1616 kCycles for 3 * InstrFicko
1586 kCycles for 3 * RevInstrFicko
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMBinSearch
18057 = eax BMHBinsearch
18057 = eax InstrJJ
18057 = eax InstrFicko
2028751 = eax RevInstrFicko
Testing if [This is a simple string which has toward...] contains Dupli
2256 kCycles for 10000 * MB Instr_
4103 kCycles for 10000 * InString
2918 kCycles for 10000 * crt_strstr
4292 kCycles for 10000 * BMBinSearch
4371 kCycles for 10000 * BMHBinsearch
766 kCycles for 10000 * InstrJJ
3641 kCycles for 10000 * InstrFicko
2847 kCycles for 10000 * RevInstrFicko
2294 kCycles for 10000 * MB Instr_
4264 kCycles for 10000 * InString
2919 kCycles for 10000 * crt_strstr
4260 kCycles for 10000 * BMBinSearch
4542 kCycles for 10000 * BMHBinsearch
1072 kCycles for 10000 * InstrJJ
3689 kCycles for 10000 * InstrFicko
2833 kCycles for 10000 * RevInstrFicko
2236 kCycles for 10000 * MB Instr_
4074 kCycles for 10000 * InString
3002 kCycles for 10000 * crt_strstr
4284 kCycles for 10000 * BMBinSearch
4103 kCycles for 10000 * BMHBinsearch
806 kCycles for 10000 * InstrJJ
3623 kCycles for 10000 * InstrFicko
2912 kCycles for 10000 * RevInstrFicko
75 = eax MB Instr_
75 = eax InString
75 = eax crt_strstr
0 = eax BMBinSearch
75 = eax BMHBinsearch
75 = eax InstrJJ
75 = eax InstrFicko
75 = eax RevInstrFicko
--- ok ---
Thanks a lot, Ficko. I have added your algo to the original testbed, attached below, and here are my results:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
+++++++13 of 20 tests valid, loop overhead is approx. 10/3 cycles
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_DO_NOT_RESET_RM
_AT_NEXT_START
28115 kCycles for 3 * MB Instr_
32154 kCycles for 3 * InString
29185 kCycles for 3 * crt_strstr
18713 kCycles for 3 * BMHBinSearch, using len
7549 kCycles for 3 * BMHBinsearch, known length
9636 kCycles for 3 * InstrJJ
60151 kCycles for 3 * InstrFicko
5172 kCycles for 3 * RevInstrFicko
4894 kCycles for 3 * MB Rinstr
28012 kCycles for 3 * MB Instr_
32150 kCycles for 3 * InString
29138 kCycles for 3 * crt_strstr
18742 kCycles for 3 * BMHBinSearch, using len
7538 kCycles for 3 * BMHBinsearch, known length
9577 kCycles for 3 * InstrJJ
60022 kCycles for 3 * InstrFicko
5156 kCycles for 3 * RevInstrFicko
4900 kCycles for 3 * MB Rinstr
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMHBinSearch, using len
2026935 = eax BMHBinsearch, known length
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko
2026935 = eax MB Rinstr
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
245 kCycles for 3 * MB Instr_
8609 kCycles for 3 * InString
237 kCycles for 3 * crt_strstr
11329 kCycles for 3 * BMHBinSearch, using len
120 kCycles for 3 * BMHBinsearch, known length
57 kCycles for 3 * InstrJJ
5011 kCycles for 3 * InstrFicko
5172 kCycles for 3 * RevInstrFicko
4875 kCycles for 3 * MB Rinstr
245 kCycles for 3 * MB Instr_
8611 kCycles for 3 * InString
237 kCycles for 3 * crt_strstr
11332 kCycles for 3 * BMHBinSearch, using len
118 kCycles for 3 * BMHBinsearch, known length
58 kCycles for 3 * InstrJJ
5011 kCycles for 3 * InstrFicko
5186 kCycles for 3 * RevInstrFicko
4935 kCycles for 3 * MB Rinstr
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMHBinSearch, using len
18057 = eax BMHBinsearch, known length
18057 = eax InstrJJ
18057 = eax InstrFicko
2028751 = eax RevInstrFicko
2028751 = eax MB Rinstr
Testing if [This is a simple string which has at the...] contains Dupli
2869 kCycles for 10000 * MB Instr_
3168 kCycles for 10000 * InString
2433 kCycles for 10000 * crt_strstr
6243 kCycles for 10000 * BMHBinSearch, using len
5298 kCycles for 10000 * BMHBinsearch, known length
869 kCycles for 10000 * InstrJJ
4091 kCycles for 10000 * InstrFicko
1974 kCycles for 10000 * RevInstrFicko
1171 kCycles for 10000 * MB Rinstr
2881 kCycles for 10000 * MB Instr_
3177 kCycles for 10000 * InString
2433 kCycles for 10000 * crt_strstr
6366 kCycles for 10000 * BMHBinSearch, using len
5329 kCycles for 10000 * BMHBinsearch, known length
870 kCycles for 10000 * InstrJJ
4085 kCycles for 10000 * InstrFicko
1974 kCycles for 10000 * RevInstrFicko
1171 kCycles for 10000 * MB Rinstr
Note that I replaced the not-so-reliable standard Boyer-Moore algo with its Horspool variant, in two flavours, one using len() to determine the size of the source, the other using a precalculated source length.
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++++++++++++++++4 of 20 tests valid, loop overhead is approx. 11/3 cycles
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
39387 kCycles for 3 * MB Instr_
57139 kCycles for 3 * InString
45535 kCycles for 3 * crt_strstr
31791 kCycles for 3 * BMHBinSearch, using len
12079 kCycles for 3 * BMHBinsearch, known length
23119 kCycles for 3 * InstrJJ
85090 kCycles for 3 * InstrFicko
6215 kCycles for 3 * RevInstrFicko
5637 kCycles for 3 * MB Rinstr
39401 kCycles for 3 * MB Instr_
56685 kCycles for 3 * InString
45497 kCycles for 3 * crt_strstr
31761 kCycles for 3 * BMHBinSearch, using len
11845 kCycles for 3 * BMHBinsearch, known length
23208 kCycles for 3 * InstrJJ
85101 kCycles for 3 * InstrFicko
6475 kCycles for 3 * RevInstrFicko
5527 kCycles for 3 * MB Rinstr
2026771 = eax MB Instr_
2026771 = eax InString
2026771 = eax crt_strstr
2026771 = eax BMHBinSearch, using len
2026771 = eax BMHBinsearch, known length
2026771 = eax InstrJJ
2026771 = eax InstrFicko
2026771 = eax RevInstrFicko
2026771 = eax MB Rinstr
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
336 kCycles for 3 * MB Instr_
12614 kCycles for 3 * InString
340 kCycles for 3 * crt_strstr
19855 kCycles for 3 * BMHBinSearch, using len
172 kCycles for 3 * BMHBinsearch, known length
142 kCycles for 3 * InstrJJ
5976 kCycles for 3 * InstrFicko
6208 kCycles for 3 * RevInstrFicko
5294 kCycles for 3 * MB Rinstr
335 kCycles for 3 * MB Instr_
12458 kCycles for 3 * InString
370 kCycles for 3 * crt_strstr
19813 kCycles for 3 * BMHBinSearch, using len
170 kCycles for 3 * BMHBinsearch, known length
142 kCycles for 3 * InstrJJ
5924 kCycles for 3 * InstrFicko
6269 kCycles for 3 * RevInstrFicko
5520 kCycles for 3 * MB Rinstr
17893 = eax MB Instr_
17893 = eax InString
17893 = eax crt_strstr
17893 = eax BMHBinSearch, using len
17893 = eax BMHBinsearch, known length
17893 = eax InstrJJ
17893 = eax InstrFicko
2028587 = eax RevInstrFicko
2028587 = eax MB Rinstr
Testing if [This is a simple string which has at the...] contains Dupli
4017 kCycles for 10000 * MB Instr_
4751 kCycles for 10000 * InString
3348 kCycles for 10000 * crt_strstr
9705 kCycles for 10000 * BMHBinSearch, using len
7654 kCycles for 10000 * BMHBinsearch, known length
1389 kCycles for 10000 * InstrJJ
5823 kCycles for 10000 * InstrFicko
5583 kCycles for 10000 * RevInstrFicko
1948 kCycles for 10000 * MB Rinstr
4044 kCycles for 10000 * MB Instr_
4726 kCycles for 10000 * InString
3368 kCycles for 10000 * crt_strstr
9642 kCycles for 10000 * BMHBinSearch, using len
7655 kCycles for 10000 * BMHBinsearch, known length
1390 kCycles for 10000 * InstrJJ
5974 kCycles for 10000 * InstrFicko
5585 kCycles for 10000 * RevInstrFicko
1944 kCycles for 10000 * MB Rinstr
70 = eax MB Instr_
70 = eax InString
70 = eax crt_strstr
70 = eax BMHBinSearch, using len
70 = eax BMHBinsearch, known length
70 = eax InstrJJ
70 = eax InstrFicko
70 = eax RevInstrFicko
70 = eax MB Rinstr
[codeIntel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
+++++++++++++++++++1 of 20 tests valid, loop overhead is approx. 15/3 cycles
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
15985 kCycles for 3 * MB Instr_
27849 kCycles for 3 * InString
22156 kCycles for 3 * crt_strstr
14938 kCycles for 3 * BMHBinSearch, using len
6891 kCycles for 3 * BMHBinsearch, known length
6804 kCycles for 3 * InstrJJ
43312 kCycles for 3 * InstrFicko
1695 kCycles for 3 * RevInstrFicko
1368 kCycles for 3 * MB Rinstr
15967 kCycles for 3 * MB Instr_
27955 kCycles for 3 * InString
22148 kCycles for 3 * crt_strstr
14943 kCycles for 3 * BMHBinSearch, using len
6900 kCycles for 3 * BMHBinsearch, known length
6802 kCycles for 3 * InstrJJ
43323 kCycles for 3 * InstrFicko
1691 kCycles for 3 * RevInstrFicko
1371 kCycles for 3 * MB Rinstr
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMHBinSearch, using len
2026935 = eax BMHBinsearch, known length
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko
2026935 = eax MB Rinstr
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
145 kCycles for 3 * MB Instr_
8892 kCycles for 3 * InString
171 kCycles for 3 * crt_strstr
8151 kCycles for 3 * BMHBinSearch, using len
78 kCycles for 3 * BMHBinsearch, known length
21 kCycles for 3 * InstrJJ
4158 kCycles for 3 * InstrFicko
2351 kCycles for 3 * RevInstrFicko
1844 kCycles for 3 * MB Rinstr
143 kCycles for 3 * MB Instr_
5014 kCycles for 3 * InString
167 kCycles for 3 * crt_strstr
8134 kCycles for 3 * BMHBinSearch, using len
74 kCycles for 3 * BMHBinsearch, known length
19 kCycles for 3 * InstrJJ
1592 kCycles for 3 * InstrFicko
1730 kCycles for 3 * RevInstrFicko
1357 kCycles for 3 * MB Rinstr
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMHBinSearch, using len
18057 = eax BMHBinsearch, known length
18057 = eax InstrJJ
18057 = eax InstrFicko
2028751 = eax RevInstrFicko
2028751 = eax MB Rinstr
Testing if [This is a simple string which has at the...] contains Dupli
1825 kCycles for 10000 * MB Instr_
2745 kCycles for 10000 * InString
1525 kCycles for 10000 * crt_strstr
3327 kCycles for 10000 * BMHBinSearch, using len
2402 kCycles for 10000 * BMHBinsearch, known length
520 kCycles for 10000 * InstrJJ
2895 kCycles for 10000 * InstrFicko
2707 kCycles for 10000 * RevInstrFicko
828 kCycles for 10000 * MB Rinstr
1842 kCycles for 10000 * MB Instr_
2746 kCycles for 10000 * InString
1532 kCycles for 10000 * crt_strstr
3354 kCycles for 10000 * BMHBinSearch, using len
2409 kCycles for 10000 * BMHBinsearch, known length
523 kCycles for 10000 * InstrJJ
2897 kCycles for 10000 * InstrFicko
2705 kCycles for 10000 * RevInstrFicko
835 kCycles for 10000 * MB Rinstr
70 = eax MB Instr_
70 = eax InString
70 = eax crt_strstr
70 = eax BMHBinSearch, using len
70 = eax BMHBinsearch, known length
70 = eax InstrJJ
70 = eax InstrFicko
70 = eax RevInstrFicko
70 = eax MB Rinstr
--- ok ---]
Maybe you should test a deliberate failure? I would imagine that testing strings would give lots of fails i.e. substring not found.
Quote from: sinsi on March 09, 2014, 07:37:25 PM
Maybe you should test a deliberate failure? I would imagine that testing strings would give lots of fails i.e. substring not found.
Good idea - "No such string" in Windows.inc:
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains No such string
28235 kCycles for 3 * MB Instr_
30764 kCycles for 3 * InString
28484 kCycles for 3 * crt_strstr
30098 kCycles for 3 * BMHBinSearch, using len
18890 kCycles for 3 * BMHBinsearch, known length
8827 kCycles for 3 * InstrJJ
54616 kCycles for 3 * InstrFicko
36272 kCycles for 3 * RevInstrFicko
30595 kCycles for 3 * MB Rinstr
28191 kCycles for 3 * MB Instr_
30792 kCycles for 3 * InString
28334 kCycles for 3 * crt_strstr
30071 kCycles for 3 * BMHBinSearch, using len
19009 kCycles for 3 * BMHBinsearch, known length
8826 kCycles for 3 * InstrJJ
54628 kCycles for 3 * InstrFicko
36266 kCycles for 3 * RevInstrFicko
30599 kCycles for 3 * MB Rinstr
Jochen,
the timings from 0803:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++++++++++++1 of 20 tests valid, loop overhead is approx. 6/3 cycles
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_RM_FLAG_D
O_NOT_RESET_RM_AT_NEXT_START
14064 kCycles for 3 * MB Instr_
24877 kCycles for 3 * InString
19507 kCycles for 3 * crt_strstr
13258 kCycles for 3 * BMHBinSearch, using len
6090 kCycles for 3 * BMHBinsearch, known length
6102 kCycles for 3 * InstrJJ
38735 kCycles for 3 * InstrFicko
3651 kCycles for 3 * RevInstrFicko
1460 kCycles for 3 * MB Rinstr
15733 kCycles for 3 * MB Instr_
24878 kCycles for 3 * InString
23220 kCycles for 3 * crt_strstr
13145 kCycles for 3 * BMHBinSearch, using len
6102 kCycles for 3 * BMHBinsearch, known length
6154 kCycles for 3 * InstrJJ
38324 kCycles for 3 * InstrFicko
1499 kCycles for 3 * RevInstrFicko
1164 kCycles for 3 * MB Rinstr
2026935 = eax MB Instr_
2026935 = eax InString
2026935 = eax crt_strstr
2026935 = eax BMHBinSearch, using len
2026935 = eax BMHBinsearch, known length
2026935 = eax InstrJJ
2026935 = eax InstrFicko
2026935 = eax RevInstrFicko
2026935 = eax MB Rinstr
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains TXFS_
124 kCycles for 3 * MB Instr_
4392 kCycles for 3 * InString
147 kCycles for 3 * crt_strstr
7152 kCycles for 3 * BMHBinSearch, using len
68 kCycles for 3 * BMHBinsearch, known length
42 kCycles for 3 * InstrJJ
3423 kCycles for 3 * InstrFicko
2005 kCycles for 3 * RevInstrFicko
1501 kCycles for 3 * MB Rinstr
125 kCycles for 3 * MB Instr_
7601 kCycles for 3 * InString
148 kCycles for 3 * crt_strstr
10462 kCycles for 3 * BMHBinSearch, using len
89 kCycles for 3 * BMHBinsearch, known length
41 kCycles for 3 * InstrJJ
3503 kCycles for 3 * InstrFicko
3667 kCycles for 3 * RevInstrFicko
1454 kCycles for 3 * MB Rinstr
18057 = eax MB Instr_
18057 = eax InString
18057 = eax crt_strstr
18057 = eax BMHBinSearch, using len
18057 = eax BMHBinsearch, known length
18057 = eax InstrJJ
18057 = eax InstrFicko
2028751 = eax RevInstrFicko
2028751 = eax MB Rinstr
Testing if [comment * -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=...] contains No such string
14289 kCycles for 3 * MB Instr_
23227 kCycles for 3 * InString
21122 kCycles for 3 * crt_strstr
21344 kCycles for 3 * BMHBinSearch, using len
17299 kCycles for 3 * BMHBinsearch, known length
5317 kCycles for 3 * InstrJJ
35784 kCycles for 3 * InstrFicko
16121 kCycles for 3 * RevInstrFicko
15147 kCycles for 3 * MB Rinstr
14124 kCycles for 3 * MB Instr_
23151 kCycles for 3 * InString
18370 kCycles for 3 * crt_strstr
21374 kCycles for 3 * BMHBinSearch, using len
14369 kCycles for 3 * BMHBinsearch, known length
5282 kCycles for 3 * InstrJJ
33821 kCycles for 3 * InstrFicko
16071 kCycles for 3 * RevInstrFicko
18405 kCycles for 3 * MB Rinstr
0 = eax MB Instr_
0 = eax InString
0 = eax crt_strstr
0 = eax BMHBinSearch, using len
0 = eax BMHBinsearch, known length
0 = eax InstrJJ
0 = eax InstrFicko
0 = eax RevInstrFicko
0 = eax MB Rinstr
Testing if [This is a simple string which has at the...] contains Dupli
1567 kCycles for 10000 * MB Instr_
2370 kCycles for 10000 * InString
1332 kCycles for 10000 * crt_strstr
2814 kCycles for 10000 * BMHBinSearch, using len
2039 kCycles for 10000 * BMHBinsearch, known length
1180 kCycles for 10000 * InstrJJ
6366 kCycles for 10000 * InstrFicko
2789 kCycles for 10000 * RevInstrFicko
698 kCycles for 10000 * MB Rinstr
1571 kCycles for 10000 * MB Instr_
2389 kCycles for 10000 * InString
1319 kCycles for 10000 * crt_strstr
2804 kCycles for 10000 * BMHBinSearch, using len
4874 kCycles for 10000 * BMHBinsearch, known length
481 kCycles for 10000 * InstrJJ
2591 kCycles for 10000 * InstrFicko
2420 kCycles for 10000 * RevInstrFicko
747 kCycles for 10000 * MB Rinstr
70 = eax MB Instr_
70 = eax InString
70 = eax crt_strstr
70 = eax BMHBinSearch, using len
70 = eax BMHBinsearch, known length
70 = eax InstrJJ
70 = eax InstrFicko
70 = eax RevInstrFicko
70 = eax MB Rinstr
--- ok ---
Gunther
Thanks, Gunther. Your results show clearly that getting the source len is not a realistic option for the Boyer-Moore algos:
7152 kCycles for 3 * BMHBinSearch, using len
68 kCycles for 3 * BMHBinsearch, known length
42 kCycles for 3 * InstrJJ
It gives me some ideas :P
Jochen,
Quote from: jj2007 on March 09, 2014, 11:38:59 PM
It gives me some ideas :P
for MB or assembly language in general?
Gunther
Quote from: Ficko on March 09, 2014, 03:35:35 AM
Like in my case with 8 cores the clock will not likely to change during the measurement.
But having 4 cores only making 1 core running on 100% would perhaps force clocking fluctuation.
I am not an AMD expert but may it is possible to force programmatically the CPU to run on a certain clock or not to change its clocking.
The timing in clock cycles is independent of the clock speed.