News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Instr, strstr, find$

Started by jj2007, July 14, 2014, 09:20:15 PM

Previous topic - Next topic

johnsa

I'm at work at the moment so I don't have much time to test, what are the parameters for the test:

Search string (needle)
Search body (haystack)

and number of timing iterations?

ie: find "te" in "test" 1 million times?
I've got my own instr algo but it's 64bit so i can't put it into your testbench, would be interesting to see the comparison (I'd have to report ms not cycles for now).

jj2007

Quote from: johnsa on July 25, 2014, 12:03:51 AM
Search string (needle)
Search body (haystack)

and number of timing iterations?

Yes, body is \Masm32\include\winextra.inc, and string is echo WARNING

Thanks for your timings :icon14:

Gunther

Jochen,

InstrTimings5 brings:


c:\yasm\work>InstrTimingsNew.exe
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++13 of 20 tests valid, loop overhead is approx. 36/10 cycles

20406   kCycles for 10 * MbInstr 0 (zero-delimited)
18924   kCycles for 10 * MbInstr 0 (file size)
7679    kCycles for 10 * MbInstr FAST
23118   kCycles for 10 * crt_strstr
31910   kCycles for 10 * M32 find$
25510   kCycles for 10 * strstr_nidud

20353   kCycles for 10 * MbInstr 0 (zero-delimited)
21436   kCycles for 10 * MbInstr 0 (file size)
4674    kCycles for 10 * MbInstr FAST
22646   kCycles for 10 * crt_strstr
28390   kCycles for 10 * M32 find$
24980   kCycles for 10 * strstr_nidud

20306   kCycles for 10 * MbInstr 0 (zero-delimited)
18900   kCycles for 10 * MbInstr 0 (file size)
4582    kCycles for 10 * MbInstr FAST
22577   kCycles for 10 * crt_strstr
28547   kCycles for 10 * M32 find$
24806   kCycles for 10 * strstr_nidud

1068448 = eax MbInstr 0 (zero-delimited)
1068448 = eax MbInstr 0 (file size)
1068448 = eax MbInstr FAST
1068448 = eax crt_strstr
1068448 = eax M32 find$
1068448 = eax strstr_nidud


The environment is Windows XP under Virtual PC.

Gunther
You have to know the facts before you can distort them.

jj2007

Quote from: Gunther on July 25, 2014, 04:04:31 AM
4674    kCycles for 10 * MbInstr FAST
22646   kCycles for 10 * crt_strstr

Nice  :biggrin:

It gets even worse with a string like vc2010 (attached) - the speed depends strongly on the frequency of the first pattern byte, and e as in echo WARNING is pretty frequent.

Gunther

Jochen,

InstrTimings5a (same environment):


c:\yasm\work>InstrTimingsNew.exe
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++++++7 of 20 tests valid, loop overhead is approx. 44/10 cycles

18711   kCycles for 10 * MbInstr 0 (zero-delimited)
20339   kCycles for 10 * MbInstr 0 (file size)
1794    kCycles for 10 * MbInstr FAST
17515   kCycles for 10 * crt_strstr
22578   kCycles for 10 * M32 find$
17415   kCycles for 10 * strstr_nidud

18347   kCycles for 10 * MbInstr 0 (zero-delimited)
17085   kCycles for 10 * MbInstr 0 (file size)
4439    kCycles for 10 * MbInstr FAST
17287   kCycles for 10 * crt_strstr
22561   kCycles for 10 * M32 find$
17376   kCycles for 10 * strstr_nidud

18455   kCycles for 10 * MbInstr 0 (zero-delimited)
17043   kCycles for 10 * MbInstr 0 (file size)
1773    kCycles for 10 * MbInstr FAST
17269   kCycles for 10 * crt_strstr
22668   kCycles for 10 * M32 find$
20817   kCycles for 10 * strstr_nidud

995374  = eax MbInstr 0 (zero-delimited)
995374  = eax MbInstr 0 (file size)
995374  = eax MbInstr FAST
995374  = eax crt_strstr
995374  = eax M32 find$
995374  = eax strstr_nidud


Gunther
You have to know the facts before you can distort them.

dedndave

prescott w/htt xp sp3 msvcrt 7.0.2600.5701
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++18 of 20 tests valid, loop overhead is approx. 30/10 cycles

54049   kCycles for 10 * MbInstr 0 (zero-delimited)
47497   kCycles for 10 * MbInstr 0 (file size)
7165    kCycles for 10 * MbInstr FAST
44331   kCycles for 10 * crt_strstr
43292   kCycles for 10 * M32 find$
40909   kCycles for 10 * strstr_nidud

54272   kCycles for 10 * MbInstr 0 (zero-delimited)
47732   kCycles for 10 * MbInstr 0 (file size)
7299    kCycles for 10 * MbInstr FAST
43498   kCycles for 10 * crt_strstr
43493   kCycles for 10 * M32 find$
41158   kCycles for 10 * strstr_nidud

53552   kCycles for 10 * MbInstr 0 (zero-delimited)
47853   kCycles for 10 * MbInstr 0 (file size)
7176    kCycles for 10 * MbInstr FAST
44348   kCycles for 10 * crt_strstr
42919   kCycles for 10 * M32 find$
41179   kCycles for 10 * strstr_nidud

995374  = eax MbInstr 0 (zero-delimited)
995374  = eax MbInstr 0 (file size)
995374  = eax MbInstr FAST
995374  = eax crt_strstr
995374  = eax M32 find$
995374  = eax strstr_nidud

jj2007

Quote from: Gunther on July 25, 2014, 08:20:08 AM
1794    kCycles for 10 * MbInstr FAST
17515   kCycles for 10 * crt_strstr

Almost 10:1 against CRT is really nice, that's ammunition against those who claim that C compilers are better than assembler :greensml:

Even with Dave's trusty P4 it's 6 x CRT. But again, the test with a pattern that starts with "v" is a little bit unfair ;-)

nidud

#37
deleted

jj2007

So Intel themselves use assembler to code the CRT... ::)
Somebody will probably argue now that strstr would be much faster had they only used their compiler instead :badgrin:

Gunther

Jochen,

Quote from: jj2007 on July 26, 2014, 06:27:43 AM
Somebody will probably argue now that strstr would be much faster had they only used their compiler instead :badgrin:

but that would be a bad joke. No one will believe that.

Gunther
You have to know the facts before you can distort them.

jj2007

http://volnitsky.com/project/str_search/
QuoteDescribed new online substring search algorithm which allows faster string traversal. Presented here implementation is substantially faster than any other online substring search algorithms for average case.

Grincheux

Is possible to make a backward seach. I explain :
1 - I search for "jpg"
2 - I search the first '"' BEFORE "jpg"

That would simplify the program.

Grincheux

Quote
C:\Users\Grincheux\Downloads\InstrTimings5a>InstrTimingsNew.exe
AMD Athlon(tm) II X2 250 Processor (SSE3)
++++++++++++++++++++
22129   kCycles for 10 * MbInstr 0 (zero-delimited)
19881   kCycles for 10 * MbInstr 0 (file size)
3080    kCycles for 10 * MbInstr FAST
49397   kCycles for 10 * crt_strstr
46292   kCycles for 10 * M32 find$
20135   kCycles for 10 * strstr_nidud

22177   kCycles for 10 * MbInstr 0 (zero-delimited)
19904   kCycles for 10 * MbInstr 0 (file size)
3100    kCycles for 10 * MbInstr FAST
49343   kCycles for 10 * crt_strstr
45889   kCycles for 10 * M32 find$
20143   kCycles for 10 * strstr_nidud

22259   kCycles for 10 * MbInstr 0 (zero-delimited)
19881   kCycles for 10 * MbInstr 0 (file size)
3048    kCycles for 10 * MbInstr FAST
49342   kCycles for 10 * crt_strstr
45947   kCycles for 10 * M32 find$
20126   kCycles for 10 * strstr_nidud

995374  = eax MbInstr 0 (zero-delimited)
995374  = eax MbInstr 0 (file size)
995374  = eax MbInstr FAST
995374  = eax crt_strstr
995374  = eax M32 find$
995374  = eax strstr_nidud

C:\Users\Grincheux\Downloads\InstrTimings5a>

I think that the best is MbInstr

I include MasmBasic.inc and MasmBasic.lib. That'sll I have to do?

Grincheux

I have installed the JJ2007's "InstrJJ" function.
A small part of AgnerAfrog (strlen) and an other part of JJ2007.
I forgot, a small part of Hutch for the memory, and Fearless for the interface.
I dropped VirtualAlloc and replaced it with a big buffer into the data segment.
It works fine.
I don't understand again files are not well downloaded, some of them are good but a big part are bad.
I continue searching before uploading this new version

Grincheux

Quote
0D 0A 0D 0A 0D 0A 0D 0A 3C 21 44 4F 43 54 59 50 45 20 68 74 6D 6C 3E 0D 0A 0D 0A 3C 21 2D 2D 5B
........<!DOCTYPE html>....<!--[

I thought that "<!DOCTYPE html>" always was on the first line, so my test verifying if the file is an html file was wrong. I made correction.