Instr, strstr, find$

Started by jj2007, July 14, 2014, 09:20:15 PM


jj2007

Hi,
Can I have some timings please? Thanks :t

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
40057   cycles for 100 * MbInstr 0
40767   cycles for 100 * MbInstr 1
40017   cycles for 100 * MbInstr 2
40965   cycles for 100 * MbInstr 4
51485   cycles for 100 * crt_strstr
52835   cycles for 100 * M32 find$

nidud

#1
deleted

FORTRANS

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

42200   cycles for 100 * MbInstr 0
42483   cycles for 100 * MbInstr 1
42815   cycles for 100 * MbInstr 2
43101   cycles for 100 * MbInstr 4
33541   cycles for 100 * crt_strstr
38404   cycles for 100 * M32 find$

42136   cycles for 100 * MbInstr 0
42894   cycles for 100 * MbInstr 1
42309   cycles for 100 * MbInstr 2
42928   cycles for 100 * MbInstr 4
33481   cycles for 100 * crt_strstr
38438   cycles for 100 * M32 find$

42199   cycles for 100 * MbInstr 0
42577   cycles for 100 * MbInstr 1
42297   cycles for 100 * MbInstr 2
43530   cycles for 100 * MbInstr 4
33490   cycles for 100 * crt_strstr
38416   cycles for 100 * M32 find$

18   bytes for MbInstr 0
18   bytes for MbInstr 1
18   bytes for MbInstr 2
18   bytes for MbInstr 4
22   bytes for crt_strstr
15   bytes for M32 find$

97   = eax MbInstr 0
97   = eax MbInstr 1
97   = eax MbInstr 2
97   = eax MbInstr 4
97   = eax crt_strstr
97   = eax M32 find$

--- ok ---

Gunther

Jochen,

your timings:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

22112   cycles for 100 * MbInstr 0
22187   cycles for 100 * MbInstr 1
22319   cycles for 100 * MbInstr 2
22243   cycles for 100 * MbInstr 4
28535   cycles for 100 * crt_strstr
30875   cycles for 100 * M32 find$

22176   cycles for 100 * MbInstr 0
22185   cycles for 100 * MbInstr 1
22194   cycles for 100 * MbInstr 2
22304   cycles for 100 * MbInstr 4
28572   cycles for 100 * crt_strstr
30766   cycles for 100 * M32 find$

21962   cycles for 100 * MbInstr 0
22149   cycles for 100 * MbInstr 1
22309   cycles for 100 * MbInstr 2
22278   cycles for 100 * MbInstr 4
28534   cycles for 100 * crt_strstr
30747   cycles for 100 * M32 find$

18      bytes for MbInstr 0
18      bytes for MbInstr 1
18      bytes for MbInstr 2
18      bytes for MbInstr 4
22      bytes for crt_strstr
15      bytes for M32 find$

97      = eax MbInstr 0
97      = eax MbInstr 1
97      = eax MbInstr 2
97      = eax MbInstr 4
97      = eax crt_strstr
97      = eax M32 find$

--- ok ---


Gunther
You have to know the facts before you can distort them.

jj2007

Quote from: Gunther on July 14, 2014, 10:45:22 PM
Jochen,

your timings:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
22112   cycles for 100 * MbInstr 0
28535   cycles for 100 * crt_strstr
30875   cycles for 100 * M32 find$

Gunther,
I love your CPU :greensml:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

41492   cycles for 100 * MbInstr 0
41718   cycles for 100 * MbInstr 1
41553   cycles for 100 * MbInstr 2
42373   cycles for 100 * MbInstr 4
32945   cycles for 100 * crt_strstr
37785   cycles for 100 * M32 find$

Gunther

Jochen,

Quote from: jj2007 on July 14, 2014, 11:23:32 PM
Gunther,
I love your CPU :greensml:

me too.  :lol: :lol: :lol:


dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

54210   cycles for 100 * MbInstr 0
53810   cycles for 100 * MbInstr 1
54892   cycles for 100 * MbInstr 2
55200   cycles for 100 * MbInstr 4
42894   cycles for 100 * crt_strstr
59477   cycles for 100 * M32 find$

53732   cycles for 100 * MbInstr 0
54695   cycles for 100 * MbInstr 1
54596   cycles for 100 * MbInstr 2
55538   cycles for 100 * MbInstr 4
44184   cycles for 100 * crt_strstr
57831   cycles for 100 * M32 find$

54744   cycles for 100 * MbInstr 0
54076   cycles for 100 * MbInstr 1
54848   cycles for 100 * MbInstr 2
55803   cycles for 100 * MbInstr 4
43372   cycles for 100 * crt_strstr
57899   cycles for 100 * M32 find$

sinsi

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
23621   cycles for 100 * MbInstr 0
23818   cycles for 100 * MbInstr 1
23339   cycles for 100 * MbInstr 2
23404   cycles for 100 * MbInstr 4
22305   cycles for 100 * crt_strstr
31867   cycles for 100 * M32 find$


AMD A10-7850K APU with Radeon(TM) R7 Graphics   (SSE4)
35325   cycles for 100 * MbInstr 0
35340   cycles for 100 * MbInstr 1
35409   cycles for 100 * MbInstr 2
37523   cycles for 100 * MbInstr 4
37294   cycles for 100 * crt_strstr
42007   cycles for 100 * M32 find$

🍺🍺🍺

jj2007

Interesting:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4) - Gunther
22112   cycles for 100 * MbInstr 0
28535   cycles for 100 * crt_strstr
30875   cycles for 100 * M32 find$

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4) - Sinsi
23621   cycles for 100 * MbInstr 0
22305   cycles for 100 * crt_strstr
31867   cycles for 100 * M32 find$

dedndave

i don't have to remind you how many different versions of MSVCRT there are   :P
i am a little surprised you compare them

jj2007

Would be nice to see where they differ :biggrin:

New test:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

3734    cycles for 10 * MbInstr 0
3302    cycles for 10 * crt_strstr
3780    cycles for 10 * M32 find$
4237    cycles for 10 * MB Instr old

3734    cycles for 10 * MbInstr 0
3290    cycles for 10 * crt_strstr
3792    cycles for 10 * M32 find$
4232    cycles for 10 * MB Instr old

3735    cycles for 10 * MbInstr 0
3292    cycles for 10 * crt_strstr
3785    cycles for 10 * M32 find$
4230    cycles for 10 * MB Instr old

sinsi

C:\Windows\SysWOW64\msvcrt.dll  7.0.9600.16384

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)

2547    cycles for 10 * MbInstr 0
2169    cycles for 10 * crt_strstr
3159    cycles for 10 * M32 find$
2195    cycles for 10 * MB Instr old

2552    cycles for 10 * MbInstr 0
2220    cycles for 10 * crt_strstr
3141    cycles for 10 * M32 find$
2204    cycles for 10 * MB Instr old

2564    cycles for 10 * MbInstr 0
2190    cycles for 10 * crt_strstr
3169    cycles for 10 * M32 find$
2175    cycles for 10 * MB Instr old

🍺🍺🍺

jcfuller

AMD Athlon(tm) II X2 250 Processor (SSE3)
++++++++++++++++++++
4947    cycles for 10 * MbInstr 0
5109    cycles for 10 * crt_strstr
5292    cycles for 10 * M32 find$
3894    cycles for 10 * MB Instr old

4828    cycles for 10 * MbInstr 0
5109    cycles for 10 * crt_strstr
5298    cycles for 10 * M32 find$
3895    cycles for 10 * MB Instr old

4827    cycles for 10 * MbInstr 0
5106    cycles for 10 * crt_strstr
5353    cycles for 10 * M32 find$
3881    cycles for 10 * MB Instr old


jj2007

Thanxalot to everybody :icon14:

Won't be easy to reconcile all CPUs.
Background to this exercise: a real-life application where I tried to search a 250 MB text file (Thunderbird inbox...) for pattern A near pattern B, where "near" means ±500 bytes. If pattern A is frequent and pattern B occurs only towards the end of the file, the exercise gets incredibly slow, because every hit on A starts a search for B that runs through most of the file.

So I wrote a new version of Instr_() that takes a search limit as an additional parameter, in this case 2*500 bytes. And voilà, searching the inbox is a factor 20 or so faster. But the additional parameter slows down the simple search a little bit, and this thread aims to investigate that problem.

As a side effect, it will be possible to search non-text files (i.e. files with embedded zeros), provided the length is known.
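
For readers who want to see the idea in code, here is a minimal C sketch of a length-limited search and of the "A near B" lookup built on top of it. It is not the Instr_() source (that is assembler, and none of it is shown here). The names find_limited and find_near, their parameter order, and the exact window (radius bytes on either side of the hit, roughly the 2*500 bytes mentioned above) are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not MasmBasic's Instr_(): look for pat (pat_len bytes)
   in the first `limit` bytes of hay (hay_len bytes). A match must lie entirely
   inside that window. Lengths are explicit, so embedded zeros are fine. */
static const char *find_limited(const char *hay, size_t hay_len,
                                const char *pat, size_t pat_len,
                                size_t limit)
{
    assert(hay != NULL && pat != NULL);
    if (limit < hay_len)
        hay_len = limit;                 /* clamp the search window */
    if (pat_len == 0 || pat_len > hay_len)
        return NULL;

    const char *p = hay;
    size_t remaining = hay_len - pat_len + 1;   /* candidate start positions */
    while (remaining > 0) {
        /* jump to the next occurrence of the first pattern byte */
        const char *q = memchr(p, pat[0], remaining);
        if (q == NULL)
            return NULL;
        if (memcmp(q, pat, pat_len) == 0)
            return q;
        remaining -= (size_t)(q - p) + 1;
        p = q + 1;
    }
    return NULL;
}

/* Hypothetical helper: return the first occurrence of a that has an
   occurrence of b within `radius` bytes before or after it. The search
   for b never leaves that window, however rare b is in the buffer. */
static const char *find_near(const char *buf, size_t len,
                             const char *a, size_t a_len,
                             const char *b, size_t b_len,
                             size_t radius)
{
    const char *end = buf + len;
    const char *pos = buf;
    while (pos < end) {
        size_t rest = (size_t)(end - pos);
        const char *hit = find_limited(pos, rest, a, a_len, rest);
        if (hit == NULL)
            return NULL;
        /* window starts radius bytes before the hit, clamped to the buffer */
        const char *win = (size_t)(hit - buf) > radius ? hit - radius : buf;
        size_t win_len = (size_t)(hit - win) + a_len + radius;
        if (find_limited(win, (size_t)(end - win), b, b_len, win_len) != NULL)
            return hit;
        pos = hit + 1;                   /* try the next occurrence of a */
    }
    return NULL;
}
```

With a limit, each hit on A costs at most one window scan, instead of a scan to the first B somewhere near the end of the file. The price is the extra clamp and length bookkeeping on every call, which would be in line with the small slowdown of the simple search seen in the timings, although the sketch says nothing about how the assembler version behaves.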

Gunther

Jochen,

your timings:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

2449    cycles for 10 * MbInstr 0
4062    cycles for 10 * crt_strstr
3051    cycles for 10 * M32 find$
2235    cycles for 10 * MB Instr old

2451    cycles for 10 * MbInstr 0
2822    cycles for 10 * crt_strstr
3059    cycles for 10 * M32 find$
2232    cycles for 10 * MB Instr old

2448    cycles for 10 * MbInstr 0
4063    cycles for 10 * crt_strstr
4311    cycles for 10 * M32 find$
3503    cycles for 10 * MB Instr old

