Hi,
Can I have some timings please? Thanks :t
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
40057 cycles for 100 * MbInstr 0
40767 cycles for 100 * MbInstr 1
40017 cycles for 100 * MbInstr 2
40965 cycles for 100 * MbInstr 4
51485 cycles for 100 * crt_strstr
52835 cycles for 100 * M32 find$
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
42200 cycles for 100 * MbInstr 0
42483 cycles for 100 * MbInstr 1
42815 cycles for 100 * MbInstr 2
43101 cycles for 100 * MbInstr 4
33541 cycles for 100 * crt_strstr
38404 cycles for 100 * M32 find$
42136 cycles for 100 * MbInstr 0
42894 cycles for 100 * MbInstr 1
42309 cycles for 100 * MbInstr 2
42928 cycles for 100 * MbInstr 4
33481 cycles for 100 * crt_strstr
38438 cycles for 100 * M32 find$
42199 cycles for 100 * MbInstr 0
42577 cycles for 100 * MbInstr 1
42297 cycles for 100 * MbInstr 2
43530 cycles for 100 * MbInstr 4
33490 cycles for 100 * crt_strstr
38416 cycles for 100 * M32 find$
18 bytes for MbInstr 0
18 bytes for MbInstr 1
18 bytes for MbInstr 2
18 bytes for MbInstr 4
22 bytes for crt_strstr
15 bytes for M32 find$
97 = eax MbInstr 0
97 = eax MbInstr 1
97 = eax MbInstr 2
97 = eax MbInstr 4
97 = eax crt_strstr
97 = eax M32 find$
--- ok ---
Jochen,
your timings:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
22112 cycles for 100 * MbInstr 0
22187 cycles for 100 * MbInstr 1
22319 cycles for 100 * MbInstr 2
22243 cycles for 100 * MbInstr 4
28535 cycles for 100 * crt_strstr
30875 cycles for 100 * M32 find$
22176 cycles for 100 * MbInstr 0
22185 cycles for 100 * MbInstr 1
22194 cycles for 100 * MbInstr 2
22304 cycles for 100 * MbInstr 4
28572 cycles for 100 * crt_strstr
30766 cycles for 100 * M32 find$
21962 cycles for 100 * MbInstr 0
22149 cycles for 100 * MbInstr 1
22309 cycles for 100 * MbInstr 2
22278 cycles for 100 * MbInstr 4
28534 cycles for 100 * crt_strstr
30747 cycles for 100 * M32 find$
18 bytes for MbInstr 0
18 bytes for MbInstr 1
18 bytes for MbInstr 2
18 bytes for MbInstr 4
22 bytes for crt_strstr
15 bytes for M32 find$
97 = eax MbInstr 0
97 = eax MbInstr 1
97 = eax MbInstr 2
97 = eax MbInstr 4
97 = eax crt_strstr
97 = eax M32 find$
--- ok ---
Gunther
Quote from: Gunther on July 14, 2014, 10:45:22 PM
Jochen,
your timings:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
22112 cycles for 100 * MbInstr 0
28535 cycles for 100 * crt_strstr
30875 cycles for 100 * M32 find$
Gunther,
I love your CPU :greensml:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
41492 cycles for 100 * MbInstr 0
41718 cycles for 100 * MbInstr 1
41553 cycles for 100 * MbInstr 2
42373 cycles for 100 * MbInstr 4
32945 cycles for 100 * crt_strstr
37785 cycles for 100 * M32 find$
Jochen,
Quote from: jj2007 on July 14, 2014, 11:23:32 PM
Gunther,
I love your CPU :greensml:
me too. :lol: :lol: :lol:
Gunther
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
54210 cycles for 100 * MbInstr 0
53810 cycles for 100 * MbInstr 1
54892 cycles for 100 * MbInstr 2
55200 cycles for 100 * MbInstr 4
42894 cycles for 100 * crt_strstr
59477 cycles for 100 * M32 find$
53732 cycles for 100 * MbInstr 0
54695 cycles for 100 * MbInstr 1
54596 cycles for 100 * MbInstr 2
55538 cycles for 100 * MbInstr 4
44184 cycles for 100 * crt_strstr
57831 cycles for 100 * M32 find$
54744 cycles for 100 * MbInstr 0
54076 cycles for 100 * MbInstr 1
54848 cycles for 100 * MbInstr 2
55803 cycles for 100 * MbInstr 4
43372 cycles for 100 * crt_strstr
57899 cycles for 100 * M32 find$
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
23621 cycles for 100 * MbInstr 0
23818 cycles for 100 * MbInstr 1
23339 cycles for 100 * MbInstr 2
23404 cycles for 100 * MbInstr 4
22305 cycles for 100 * crt_strstr
31867 cycles for 100 * M32 find$
AMD A10-7850K APU with Radeon(TM) R7 Graphics (SSE4)
35325 cycles for 100 * MbInstr 0
35340 cycles for 100 * MbInstr 1
35409 cycles for 100 * MbInstr 2
37523 cycles for 100 * MbInstr 4
37294 cycles for 100 * crt_strstr
42007 cycles for 100 * M32 find$
Interesting:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4) - Gunther
22112 cycles for 100 * MbInstr 0
28535 cycles for 100 * crt_strstr
30875 cycles for 100 * M32 find$
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4) - Sinsi
23621 cycles for 100 * MbInstr 0
22305 cycles for 100 * crt_strstr
31867 cycles for 100 * M32 find$
I don't have to remind you how many different versions of MSVCRT there are :P
I am a little surprised you compare them
Would be nice to see where they differ :biggrin:
New test:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
3734 cycles for 10 * MbInstr 0
3302 cycles for 10 * crt_strstr
3780 cycles for 10 * M32 find$
4237 cycles for 10 * MB Instr old
3734 cycles for 10 * MbInstr 0
3290 cycles for 10 * crt_strstr
3792 cycles for 10 * M32 find$
4232 cycles for 10 * MB Instr old
3735 cycles for 10 * MbInstr 0
3292 cycles for 10 * crt_strstr
3785 cycles for 10 * M32 find$
4230 cycles for 10 * MB Instr old
C:\Windows\SysWOW64\msvcrt.dll 7.0.9600.16384
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
2547 cycles for 10 * MbInstr 0
2169 cycles for 10 * crt_strstr
3159 cycles for 10 * M32 find$
2195 cycles for 10 * MB Instr old
2552 cycles for 10 * MbInstr 0
2220 cycles for 10 * crt_strstr
3141 cycles for 10 * M32 find$
2204 cycles for 10 * MB Instr old
2564 cycles for 10 * MbInstr 0
2190 cycles for 10 * crt_strstr
3169 cycles for 10 * M32 find$
2175 cycles for 10 * MB Instr old
AMD Athlon(tm) II X2 250 Processor (SSE3)
++++++++++++++++++++
4947 cycles for 10 * MbInstr 0
5109 cycles for 10 * crt_strstr
5292 cycles for 10 * M32 find$
3894 cycles for 10 * MB Instr old
4828 cycles for 10 * MbInstr 0
5109 cycles for 10 * crt_strstr
5298 cycles for 10 * M32 find$
3895 cycles for 10 * MB Instr old
4827 cycles for 10 * MbInstr 0
5106 cycles for 10 * crt_strstr
5353 cycles for 10 * M32 find$
3881 cycles for 10 * MB Instr old
Thanxalot to everybody :icon14:
Won't be easy to reconcile all CPUs.
Background to this exercise: a real-life application in which I tried to search a 250 MB text file (Thunderbird inbox...) for pattern A near pattern B, where "near" means +/-500 bytes. If pattern A is frequent and pattern B is only present towards the end of the file, the search becomes incredibly slow.
So I wrote a new version of Instr_() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1153) that takes a search limit as an additional parameter, in this case 2*500 bytes. And voilà, searching the inbox is roughly 20 times faster. But the additional parameter slows down the simple search a little, and this thread aims to investigate that problem.
As a side effect, it will be possible to search non-text files (i.e. with embedded zeros), provided the length is known.
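To make the two ideas above concrete, here is a minimal C sketch (my own illustration, not the MasmBasic implementation; the names find_bounded and find_near and the naive byte loop are assumptions). The search takes an explicit length instead of stopping at a zero byte, so binary data works, and the "near" search only scans a window of radius bytes on either side of each hit of pattern A, so a frequent A costs one bounded scan per hit instead of a scan all the way to the end of the file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical naive length-bounded search: offset of the first occurrence
   of needle in hay[0..hay_len), or -1. Never looks for a terminating zero,
   so it also works on binary data with embedded zeros. */
static long find_bounded(const unsigned char *hay, size_t hay_len,
                         const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return needle_len == 0 ? 0 : -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (hay[i] == needle[0] && memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}

/* Hypothetical "A near B": for each occurrence of a, look for b only inside
   [pos(a) - radius, pos(a) + len(a) + radius), clipped to the buffer.
   Returns the offset of the first b found that way, or -1. */
static long find_near(const unsigned char *hay, size_t hay_len,
                      const unsigned char *a, size_t a_len,
                      const unsigned char *b, size_t b_len,
                      size_t radius)
{
    size_t start = 0;
    while (start < hay_len) {
        long pa = find_bounded(hay + start, hay_len - start, a, a_len);
        if (pa < 0)
            return -1;                      /* no more hits of a */
        size_t abs_a = start + (size_t)pa;
        size_t lo = abs_a > radius ? abs_a - radius : 0;
        size_t hi = abs_a + a_len + radius;
        if (hi > hay_len)
            hi = hay_len;
        long pb = find_bounded(hay + lo, hi - lo, b, b_len);
        if (pb >= 0)
            return (long)(lo + (size_t)pb);
        start = abs_a + 1;                  /* continue after this hit of a */
    }
    return -1;
}
```

With radius = 500 this corresponds to the 2*500-byte limit described above.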
Jochen,
your timings:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
2449 cycles for 10 * MbInstr 0
4062 cycles for 10 * crt_strstr
3051 cycles for 10 * M32 find$
2235 cycles for 10 * MB Instr old
2451 cycles for 10 * MbInstr 0
2822 cycles for 10 * crt_strstr
3059 cycles for 10 * M32 find$
2232 cycles for 10 * MB Instr old
2448 cycles for 10 * MbInstr 0
4063 cycles for 10 * crt_strstr
4311 cycles for 10 * M32 find$
3503 cycles for 10 * MB Instr old
Gunther
Quite efficient :t
However,
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
3296 cycles for 10 * MbInstr 0 (A)
4000 cycles for 10 * MbInstr 0 (B)
5169 cycles for 10 * crt_strstr
5519 cycles for 10 * M32 find$
but
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
3540 cycles for 10 * MbInstr 0 (A)
3999 cycles for 10 * MbInstr 0 (B)
5159 cycles for 10 * crt_strstr
5335 cycles for 10 * M32 find$
3247 cycles for 10 * strstr_nidud
What did you smuggle in that destroys the performance of my algo??? :eusa_naughty:
What is even more worrying is the bad performance on my Celeron :eusa_boohoo:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
4094 cycles for 10 * MbInstr 0 (A)
3291 cycles for 10 * crt_strstr
3778 cycles for 10 * M32 find$
3347 cycles for 10 * strstr_nidud
put them in separate programs - run a batch file :P
Quote from: dedndave on July 16, 2014, 05:17:21 AM
put them in separate programs - run a batch file :P
Not a bad idea.
Gunther
Here is Instr() with another setting: find "echo WARNING" in WinExtra.inc
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
33284 kCycles for 10 * MbInstr 0 (zero-delimited)
30100 kCycles for 10 * MbInstr 0 (file size)
34983 kCycles for 10 * crt_strstr
37981 kCycles for 10 * M32 find$
38525 kCycles for 10 * strstr_nidud
The second entry (file size) refers to the additional parameter mentioned above: the function knows how many bytes are available in the source string. The difference is surprisingly small, though; I had expected a stronger influence of the data cache.
Quote from: nidud on July 16, 2014, 06:20:58 AM
function to test:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
10568 cycles for 100 * proc_4
3939 cycles for 100 * Len
13860 cycles for 100 * len
Jochen,
that's what InstrTimingsNew did:
(http://ibunker.us/photos/20140716140546290309048.jpg)
Here is the output of proc4:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
6860 cycles for 100 * proc_4
1978 cycles for 100 * Len
9348 cycles for 100 * len
6205 cycles for 100 * proc_4
1957 cycles for 100 * Len
9312 cycles for 100 * len
6224 cycles for 100 * proc_4
1955 cycles for 100 * Len
9924 cycles for 100 * len
100 = eax proc_4
100 = eax Len
100 = eax len
--- ok ---
Gunther
Quote from: Gunther on July 16, 2014, 08:24:24 AM
that's what InstrTimingsNew did:
Gunther,
Either you have no \Masm32\include\winextra.inc (unlikely), or you launched the exe from a different drive than your Masm32 drive.
Jochen,
I have winextra.inc, but I launched the application from a different drive. Here is the new output:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++11 of 20 tests valid, loop overhead is approx. 43/10 cycles
20184 kCycles for 10 * MbInstr 0 (zero-delimited)
18778 kCycles for 10 * MbInstr 0 (file size)
22744 kCycles for 10 * crt_strstr
28534 kCycles for 10 * M32 find$
30970 kCycles for 10 * strstr_nidud
19942 kCycles for 10 * MbInstr 0 (zero-delimited)
18751 kCycles for 10 * MbInstr 0 (file size)
22837 kCycles for 10 * crt_strstr
28442 kCycles for 10 * M32 find$
30908 kCycles for 10 * strstr_nidud
20094 kCycles for 10 * MbInstr 0 (zero-delimited)
18727 kCycles for 10 * MbInstr 0 (file size)
22739 kCycles for 10 * crt_strstr
28537 kCycles for 10 * M32 find$
30943 kCycles for 10 * strstr_nidud
1068448 = eax MbInstr 0 (zero-delimited)
1068448 = eax MbInstr 0 (file size)
1068448 = eax crt_strstr
1068448 = eax M32 find$
1068448 = eax strstr_nidud
Gunther
Following Dave's suggestion, try this before timing each algo in each separate test piece. Set the priority class high enough to keep the timings from wandering, and see if this helps to stabilise the results.
cpuid ; serialising instruction for wider separation
pause ; spinlock delay instruction
invoke SleepEx,10,0 ; sleep ~10 ms, not alertable
cpuid ; serialising instruction for wider separation
pause ; spinlock delay instruction
I have found that some algos are much more sensitive to code location than others, typically those doing intensive BYTE operations; working with larger data types reduces the variation.
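The cpuid/pause/SleepEx fragment is MASM and Windows-specific. As a rough portable analogue in C (POSIX clock_gettime and nanosleep; the priority-class part of the advice, SetPriorityClass on Windows, is left out), a hypothetical harness can sleep briefly before each sample, repeat the measurement and keep the fastest sample, which is less affected by interrupts and task switches than an average. It reports nanoseconds, not cycles, and run_strstr is only an example workload.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*bench_fn)(void *ctx);

/* Example workload: one strstr call on a (haystack, needle) pair. */
struct search_job { const char *hay; const char *needle; const char *result; };

static void run_strstr(void *ctx)
{
    struct search_job *job = ctx;
    job->result = strstr(job->hay, job->needle);
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Runs fn(ctx) `reps` times per sample, takes `samples` samples, sleeps
   ~10 ms before each one (the SleepEx,10,0 idea) and returns the fastest
   sample in ns per call. */
static double bench_min_ns(bench_fn fn, void *ctx, int samples, int reps)
{
    const struct timespec pause_10ms = { 0, 10 * 1000 * 1000 };
    double best = -1.0;
    for (int s = 0; s < samples; s++) {
        nanosleep(&pause_10ms, NULL);
        double t0 = now_ns();
        for (int r = 0; r < reps; r++)
            fn(ctx);
        double per_call = (now_ns() - t0) / reps;
        if (best < 0.0 || per_call < best)
            best = per_call;            /* keep the least disturbed sample */
    }
    return best;
}
```

Running each algo from its own program, as Dave suggested, still helps on top of this, since it also removes code-location effects between the algos.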
I've added a fast variant of Instr() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1153). At least on my CPUs, it looks competitive:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
33220 kCycles for 10 * MbInstr 0 (zero-delimited)
30094 kCycles for 10 * MbInstr 0 (file size)
8362 kCycles for 10 * MbInstr FAST
34858 kCycles for 10 * crt_strstr
38010 kCycles for 10 * M32 find$
38399 kCycles for 10 * strstr_nidud
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
32413 kCycles for 10 * MbInstr 0 (zero-delimited)
24107 kCycles for 10 * MbInstr 0 (file size)
13446 kCycles for 10 * MbInstr FAST
57954 kCycles for 10 * crt_strstr
58467 kCycles for 10 * M32 find$
38112 kCycles for 10 * strstr_nidud
Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz (SSE4)
22035 kCycles for 10 * MbInstr 0 (zero-delimited)
19469 kCycles for 10 * MbInstr 0 (file size)
5340 kCycles for 10 * MbInstr FAST
27871 kCycles for 10 * crt_strstr
28522 kCycles for 10 * M32 find$
24944 kCycles for 10 * strstr_nidud
To assemble the source, you will need today's MasmBasic (24 July). (http://masm32.com/board/index.php?topic=94.0)
Usage: Instr_(1, "Test", "Te", FAST) ; 4 args, last one is uppercase FAST
This is always case-sensitive (same for find$, strstr etc).
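For readers more at home in C, a hypothetical instr() wrapper around strstr shows the calling convention I assume Instr_ follows, by analogy with BASIC's INSTR: a 1-based start position, a 1-based result, 0 when nothing is found, and a case-sensitive comparison. It models only the interface, not the FAST algorithm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical BASIC-style INSTR: 1-based start position in, 1-based match
   position out, 0 if not found. Case-sensitive, like strstr and find$. */
static size_t instr(size_t start, const char *src, const char *pattern)
{
    size_t len = strlen(src);
    if (start == 0 || start > len + 1)
        return 0;                       /* start outside the string */
    const char *hit = strstr(src + start - 1, pattern);
    return hit ? (size_t)(hit - src) + 1 : 0;
}
```

So instr(1, "Test", "Te") gives 1, while instr(1, "Test", "te") gives 0.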
Timings from me...
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
++++++++12 of 20 tests valid, loop overhead is approx. 46/10 cycles
16446 kCycles for 10 * MbInstr 0 (zero-delimited)
15438 kCycles for 10 * MbInstr 0 (file size)
3842 kCycles for 10 * MbInstr FAST
26039 kCycles for 10 * crt_strstr
23588 kCycles for 10 * M32 find$
20650 kCycles for 10 * strstr_nidud
16556 kCycles for 10 * MbInstr 0 (zero-delimited)
15557 kCycles for 10 * MbInstr 0 (file size)
3839 kCycles for 10 * MbInstr FAST
25890 kCycles for 10 * crt_strstr
23566 kCycles for 10 * M32 find$
20681 kCycles for 10 * strstr_nidud
16534 kCycles for 10 * MbInstr 0 (zero-delimited)
15786 kCycles for 10 * MbInstr 0 (file size)
3788 kCycles for 10 * MbInstr FAST
25781 kCycles for 10 * crt_strstr
23581 kCycles for 10 * M32 find$
20714 kCycles for 10 * strstr_nidud
1068448 = eax MbInstr 0 (zero-delimited)
1068448 = eax MbInstr 0 (file size)
1068448 = eax MbInstr FAST
1068448 = eax crt_strstr
1068448 = eax M32 find$
1068448 = eax strstr_nidud
I'm at work at the moment so I don't have much time to test, what are the parameters for the test:
Search string (needle)
Search body (haystack)
and number of timing iterations?
ie: find "te" in "test" 1 million times?
I've got my own instr algo but it's 64bit so i can't put it into your testbench, would be interesting to see the comparison (I'd have to report ms not cycles for now).
Quote from: johnsa on July 25, 2014, 12:03:51 AM
Search string (needle)
Search body (haystack)
and number of timing iterations?
Yes: the body is \Masm32\include\winextra.inc, and the string is "echo WARNING".
Thanks for your timings :icon14:
Jochen,
InstrTimings5 brings:
c:\yasm\work>InstrTimingsNew.exe
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++13 of 20 tests valid, loop overhead is approx. 36/10 cycles
20406 kCycles for 10 * MbInstr 0 (zero-delimited)
18924 kCycles for 10 * MbInstr 0 (file size)
7679 kCycles for 10 * MbInstr FAST
23118 kCycles for 10 * crt_strstr
31910 kCycles for 10 * M32 find$
25510 kCycles for 10 * strstr_nidud
20353 kCycles for 10 * MbInstr 0 (zero-delimited)
21436 kCycles for 10 * MbInstr 0 (file size)
4674 kCycles for 10 * MbInstr FAST
22646 kCycles for 10 * crt_strstr
28390 kCycles for 10 * M32 find$
24980 kCycles for 10 * strstr_nidud
20306 kCycles for 10 * MbInstr 0 (zero-delimited)
18900 kCycles for 10 * MbInstr 0 (file size)
4582 kCycles for 10 * MbInstr FAST
22577 kCycles for 10 * crt_strstr
28547 kCycles for 10 * M32 find$
24806 kCycles for 10 * strstr_nidud
1068448 = eax MbInstr 0 (zero-delimited)
1068448 = eax MbInstr 0 (file size)
1068448 = eax MbInstr FAST
1068448 = eax crt_strstr
1068448 = eax M32 find$
1068448 = eax strstr_nidud
The environment is Windows XP under Virtual PC.
Gunther
Quote from: Gunther on July 25, 2014, 04:04:31 AM
4674 kCycles for 10 * MbInstr FAST
22646 kCycles for 10 * crt_strstr
Nice :biggrin:
It gets even worse with a string like "vc2010" (attached): the speed depends strongly on the frequency of the first pattern byte, and "e", as in "echo WARNING", is pretty frequent.
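Why the first byte matters can be seen in the classic first-byte filter: jump to the next occurrence of the pattern's first byte (memchr), then verify the rest (memcmp). Every occurrence of that byte costs a verification, so a frequent "e" produces many more candidates than a rare "v". This is a generic sketch with a hypothetical candidate counter; I don't know which strategy each of the tested functions uses internally.

```c
#include <stddef.h>
#include <string.h>

/* First-byte-filter search. *candidates counts how many positions had to be
   verified with memcmp, i.e. how often the needle's first byte was seen. */
static const char *find_first_byte(const char *hay, size_t hay_len,
                                   const char *needle, size_t needle_len,
                                   size_t *candidates)
{
    *candidates = 0;
    if (needle_len == 0)
        return hay;
    const char *p = hay;
    const char *end = hay + hay_len;
    while ((size_t)(end - p) >= needle_len) {
        /* only positions where the whole needle still fits are scanned */
        p = memchr(p, needle[0], (size_t)(end - p) - needle_len + 1);
        if (p == NULL)
            return NULL;
        ++*candidates;
        if (memcmp(p, needle, needle_len) == 0)
            return p;
        p++;
    }
    return NULL;
}
```

In "eeeeecho", finding "echo" takes five verifications; in "eeeevc2010", finding "vc2010" takes one.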
Jochen,
InstrTimings5a (same environment):
c:\yasm\work>InstrTimingsNew.exe
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++++++7 of 20 tests valid, loop overhead is approx. 44/10 cycles
18711 kCycles for 10 * MbInstr 0 (zero-delimited)
20339 kCycles for 10 * MbInstr 0 (file size)
1794 kCycles for 10 * MbInstr FAST
17515 kCycles for 10 * crt_strstr
22578 kCycles for 10 * M32 find$
17415 kCycles for 10 * strstr_nidud
18347 kCycles for 10 * MbInstr 0 (zero-delimited)
17085 kCycles for 10 * MbInstr 0 (file size)
4439 kCycles for 10 * MbInstr FAST
17287 kCycles for 10 * crt_strstr
22561 kCycles for 10 * M32 find$
17376 kCycles for 10 * strstr_nidud
18455 kCycles for 10 * MbInstr 0 (zero-delimited)
17043 kCycles for 10 * MbInstr 0 (file size)
1773 kCycles for 10 * MbInstr FAST
17269 kCycles for 10 * crt_strstr
22668 kCycles for 10 * M32 find$
20817 kCycles for 10 * strstr_nidud
995374 = eax MbInstr 0 (zero-delimited)
995374 = eax MbInstr 0 (file size)
995374 = eax MbInstr FAST
995374 = eax crt_strstr
995374 = eax M32 find$
995374 = eax strstr_nidud
Gunther
prescott w/htt xp sp3 msvcrt 7.0.2600.5701
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++18 of 20 tests valid, loop overhead is approx. 30/10 cycles
54049 kCycles for 10 * MbInstr 0 (zero-delimited)
47497 kCycles for 10 * MbInstr 0 (file size)
7165 kCycles for 10 * MbInstr FAST
44331 kCycles for 10 * crt_strstr
43292 kCycles for 10 * M32 find$
40909 kCycles for 10 * strstr_nidud
54272 kCycles for 10 * MbInstr 0 (zero-delimited)
47732 kCycles for 10 * MbInstr 0 (file size)
7299 kCycles for 10 * MbInstr FAST
43498 kCycles for 10 * crt_strstr
43493 kCycles for 10 * M32 find$
41158 kCycles for 10 * strstr_nidud
53552 kCycles for 10 * MbInstr 0 (zero-delimited)
47853 kCycles for 10 * MbInstr 0 (file size)
7176 kCycles for 10 * MbInstr FAST
44348 kCycles for 10 * crt_strstr
42919 kCycles for 10 * M32 find$
41179 kCycles for 10 * strstr_nidud
995374 = eax MbInstr 0 (zero-delimited)
995374 = eax MbInstr 0 (file size)
995374 = eax MbInstr FAST
995374 = eax crt_strstr
995374 = eax M32 find$
995374 = eax strstr_nidud
Quote from: Gunther on July 25, 2014, 08:20:08 AM
1794 kCycles for 10 * MbInstr FAST
17515 kCycles for 10 * crt_strstr
Almost 10:1 against CRT is really nice; that's ammunition against those who claim that C compilers are better than assembler :greensml:
Even with Dave's trusty P4 it's 6 x CRT. But again, the test with a pattern that starts with "v" is a little bit unfair ;-)
So Intel themselves use assembler to code the CRT... ::)
Somebody will probably argue now that strstr would be much faster had they only used their compiler instead :badgrin:
Jochen,
Quote from: jj2007 on July 26, 2014, 06:27:43 AM
Somebody will probably argue now that strstr would be much faster had they only used their compiler instead :badgrin:
but that would be a bad joke. No one will believe that.
Gunther
http://volnitsky.com/project/str_search/
Quote
Described new online substring search algorithm which allows faster string traversal. Presented here implementation is substantially faster than any other online substring search algorithms for average case.
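A simplified C sketch of the idea as I read it on that page: record every 2-byte word of the needle together with its offset, then step through the haystack m - W + 1 bytes at a time (m = needle length, W = 2), look up the word found there and verify each candidate start with memcmp. The direct 65536-entry table (instead of a real hash), the naive fallback for short needles and the function name are my assumptions; this is not Volnitsky's reference code, and I make no performance claims for it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the first occurrence of needle in hay, or NULL (allocation
   failure is also reported as NULL, good enough for a sketch). */
static const unsigned char *word_stride_search(const unsigned char *hay, size_t n,
                                               const unsigned char *needle, size_t m)
{
    enum { W = 2 };
    if (m == 0)
        return hay;
    if (m > n)
        return NULL;
    if (m < 2 * W) {                       /* too short to gain anything: naive */
        for (size_t i = 0; i + m <= n; i++)
            if (memcmp(hay + i, needle, m) == 0)
                return hay + i;
        return NULL;
    }
    /* head[word] = largest needle offset holding that word; next[] chains
       the smaller offsets with the same word; -1 ends a chain. */
    long *head = malloc(65536 * sizeof *head);
    long *next = malloc((m - W + 1) * sizeof *next);
    if (!head || !next) { free(head); free(next); return NULL; }
    for (size_t w = 0; w < 65536; w++)
        head[w] = -1;
    for (size_t o = 0; o + W <= m; o++) {
        unsigned w = needle[o] | (unsigned)needle[o + 1] << 8;
        next[o] = head[w];
        head[w] = (long)o;
    }
    const size_t step = m - W + 1;
    const unsigned char *found = NULL;
    for (size_t p = m - W; p + W <= n && !found; p += step) {
        unsigned w = hay[p] | (unsigned)hay[p + 1] << 8;
        /* every match containing this word slot starts in [p-(m-W), p];
           keep the leftmost one that verifies */
        for (long o = head[w]; o >= 0; o = next[o]) {
            if ((size_t)o > p)
                continue;                  /* start would be before hay */
            size_t s = p - (size_t)o;
            if (s + m <= n && memcmp(hay + s, needle, m) == 0 &&
                (!found || hay + s < found))
                found = hay + s;
        }
    }
    free(head);
    free(next);
    return found;
}
```

Every match overlaps exactly one sampled word position, and matches seen at a later slot always start further right, so stopping at the first slot that yields a verified start returns the first occurrence.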
Is it possible to do a backward search? Let me explain:
1 - I search for "jpg"
2 - I search for the first '"' BEFORE "jpg"
That would simplify the program.
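Standard C has no backward memchr (memrchr is a GNU extension), but the two steps above are easy to sketch: find "jpg" with strstr, then walk backwards from that position to the nearest '"'. The helpers find_char_before and extract_jpg are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Scans backwards from limit (exclusive) towards begin and returns the
   first ch found, i.e. the last ch in [begin, limit), or NULL. */
static const char *find_char_before(const char *begin, const char *limit, char ch)
{
    while (limit > begin) {
        --limit;
        if (*limit == ch)
            return limit;
    }
    return NULL;
}

/* Step 1: locate "jpg"; step 2: locate the '"' before it. Copies the text
   between the quote and the end of "jpg" into out; returns 1 on success. */
static int extract_jpg(const char *html, char *out, size_t out_size)
{
    const char *ext = strstr(html, "jpg");
    if (!ext)
        return 0;
    const char *quote = find_char_before(html, ext, '"');
    if (!quote)
        return 0;
    size_t len = (size_t)(ext + 3 - (quote + 1));
    if (len + 1 > out_size)
        return 0;                       /* output buffer too small */
    memcpy(out, quote + 1, len);
    out[len] = '\0';
    return 1;
}
```

For <img src="pics/cat.jpg" alt="x">, extract_jpg yields pics/cat.jpg.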
Quote
C:\Users\Grincheux\Downloads\InstrTimings5a>InstrTimingsNew.exe
AMD Athlon(tm) II X2 250 Processor (SSE3)
++++++++++++++++++++
22129 kCycles for 10 * MbInstr 0 (zero-delimited)
19881 kCycles for 10 * MbInstr 0 (file size)
3080 kCycles for 10 * MbInstr FAST
49397 kCycles for 10 * crt_strstr
46292 kCycles for 10 * M32 find$
20135 kCycles for 10 * strstr_nidud
22177 kCycles for 10 * MbInstr 0 (zero-delimited)
19904 kCycles for 10 * MbInstr 0 (file size)
3100 kCycles for 10 * MbInstr FAST
49343 kCycles for 10 * crt_strstr
45889 kCycles for 10 * M32 find$
20143 kCycles for 10 * strstr_nidud
22259 kCycles for 10 * MbInstr 0 (zero-delimited)
19881 kCycles for 10 * MbInstr 0 (file size)
3048 kCycles for 10 * MbInstr FAST
49342 kCycles for 10 * crt_strstr
45947 kCycles for 10 * M32 find$
20126 kCycles for 10 * strstr_nidud
995374 = eax MbInstr 0 (zero-delimited)
995374 = eax MbInstr 0 (file size)
995374 = eax MbInstr FAST
995374 = eax crt_strstr
995374 = eax M32 find$
995374 = eax strstr_nidud
C:\Users\Grincheux\Downloads\InstrTimings5a>
I think that the best is MbInstr. I include MasmBasic.inc and MasmBasic.lib. Is that all I have to do?
I have installed JJ2007's "InstrJJ" function.
It contains a small part of Agner Fog's code (strlen) and another part from JJ2007.
I forgot: a small part from Hutch for the memory handling, and from Fearless for the interface.
I dropped VirtualAlloc and replaced it with a big buffer in the data segment.
It works fine.
I still don't understand why files are not downloaded correctly; some of them are good, but a large part are bad.
I'll keep searching before uploading this new version.
Quote
0D 0A 0D 0A 0D 0A 0D 0A 3C 21 44 4F 43 54 59 50 45 20 68 74 6D 6C 3E 0D 0A 0D 0A 3C 21 2D 2D 5B
........<!DOCTYPE html>....<!--[
I thought that "<!DOCTYPE html>" was always on the first line, so my test checking whether the file is an HTML file was wrong. I have corrected it.
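A more tolerant check skips leading whitespace (the 0D 0A pairs in the dump above) before comparing, and compares case-insensitively, since the HTML doctype is matched without regard to ASCII case. A minimal sketch in C; the function name is hypothetical, and a file starting with a UTF-8 byte-order mark (EF BB BF) would still need extra handling.

```c
#include <stddef.h>

/* Returns 1 if buf starts with "<!DOCTYPE html" after optional leading
   spaces, tabs, CRs and LFs, comparing ASCII case-insensitively. */
static int looks_like_html(const unsigned char *buf, size_t len)
{
    static const char tag[] = "<!doctype html";
    const size_t tag_len = sizeof tag - 1;
    size_t i = 0;
    while (i < len && (buf[i] == ' ' || buf[i] == '\t' ||
                       buf[i] == '\r' || buf[i] == '\n'))
        i++;                                  /* skip leading whitespace */
    if (len - i < tag_len)
        return 0;
    for (size_t k = 0; k < tag_len; k++) {
        unsigned char c = buf[i + k];
        if (c >= 'A' && c <= 'Z')
            c = (unsigned char)(c - 'A' + 'a');   /* ASCII tolower */
        if (c != (unsigned char)tag[k])
            return 0;
    }
    return 1;
}
```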
Yup a little late, but I wanted to test the performance of my new toy...
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)
27111 cycles for 100 * MbInstr 0
24259 cycles for 100 * MbInstr 1
26501 cycles for 100 * MbInstr 2
23610 cycles for 100 * MbInstr 4
24436 cycles for 100 * crt_strstr
27061 cycles for 100 * M32 find$
26900 cycles for 100 * MbInstr 0
29255 cycles for 100 * MbInstr 1
29007 cycles for 100 * MbInstr 2
27196 cycles for 100 * MbInstr 4
28573 cycles for 100 * crt_strstr
26529 cycles for 100 * M32 find$
29880 cycles for 100 * MbInstr 0
24030 cycles for 100 * MbInstr 1
23902 cycles for 100 * MbInstr 2
25233 cycles for 100 * MbInstr 4
22454 cycles for 100 * crt_strstr
26943 cycles for 100 * M32 find$
18 bytes for MbInstr 0
18 bytes for MbInstr 1
18 bytes for MbInstr 2
18 bytes for MbInstr 4
22 bytes for crt_strstr
15 bytes for M32 find$
97 = eax MbInstr 0
97 = eax MbInstr 1
97 = eax MbInstr 2
97 = eax MbInstr 4
97 = eax crt_strstr
97 = eax M32 find$
:biggrin: :P