The MASM Forum

General => The Laboratory => Topic started by: jj2007 on July 14, 2014, 09:20:15 PM

Title: Instr, strstr, find$
Post by: jj2007 on July 14, 2014, 09:20:15 PM
Hi,
Can I have some timings please? Thanks :t

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
40057   cycles for 100 * MbInstr 0
40767   cycles for 100 * MbInstr 1
40017   cycles for 100 * MbInstr 2
40965   cycles for 100 * MbInstr 4
51485   cycles for 100 * crt_strstr
52835   cycles for 100 * M32 find$
Title: Re: Instr, strstr, find$
Post by: nidud on July 14, 2014, 09:57:25 PM
AMD Athlon(tm) II X2 245 Processor (SSE3)

38870   cycles for 100 * MbInstr 0
40286   cycles for 100 * MbInstr 1
38311   cycles for 100 * MbInstr 2
39355   cycles for 100 * MbInstr 4
23442   cycles for 100 * crt_strstr
51846   cycles for 100 * M32 find$

38482   cycles for 100 * MbInstr 0
40338   cycles for 100 * MbInstr 1
38316   cycles for 100 * MbInstr 2
39653   cycles for 100 * MbInstr 4
23432   cycles for 100 * crt_strstr
52037   cycles for 100 * M32 find$

38801   cycles for 100 * MbInstr 0
40392   cycles for 100 * MbInstr 1
39449   cycles for 100 * MbInstr 2
39780   cycles for 100 * MbInstr 4
23438   cycles for 100 * crt_strstr
52430   cycles for 100 * M32 find$
Title: Re: Instr, strstr, find$
Post by: FORTRANS on July 14, 2014, 10:44:49 PM
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

42200   cycles for 100 * MbInstr 0
42483   cycles for 100 * MbInstr 1
42815   cycles for 100 * MbInstr 2
43101   cycles for 100 * MbInstr 4
33541   cycles for 100 * crt_strstr
38404   cycles for 100 * M32 find$

42136   cycles for 100 * MbInstr 0
42894   cycles for 100 * MbInstr 1
42309   cycles for 100 * MbInstr 2
42928   cycles for 100 * MbInstr 4
33481   cycles for 100 * crt_strstr
38438   cycles for 100 * M32 find$

42199   cycles for 100 * MbInstr 0
42577   cycles for 100 * MbInstr 1
42297   cycles for 100 * MbInstr 2
43530   cycles for 100 * MbInstr 4
33490   cycles for 100 * crt_strstr
38416   cycles for 100 * M32 find$

18   bytes for MbInstr 0
18   bytes for MbInstr 1
18   bytes for MbInstr 2
18   bytes for MbInstr 4
22   bytes for crt_strstr
15   bytes for M32 find$

97   = eax MbInstr 0
97   = eax MbInstr 1
97   = eax MbInstr 2
97   = eax MbInstr 4
97   = eax crt_strstr
97   = eax M32 find$

--- ok ---
Title: Re: Instr, strstr, find$
Post by: Gunther on July 14, 2014, 10:45:22 PM
Jochen,

your timings:
Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

22112   cycles for 100 * MbInstr 0
22187   cycles for 100 * MbInstr 1
22319   cycles for 100 * MbInstr 2
22243   cycles for 100 * MbInstr 4
28535   cycles for 100 * crt_strstr
30875   cycles for 100 * M32 find$

22176   cycles for 100 * MbInstr 0
22185   cycles for 100 * MbInstr 1
22194   cycles for 100 * MbInstr 2
22304   cycles for 100 * MbInstr 4
28572   cycles for 100 * crt_strstr
30766   cycles for 100 * M32 find$

21962   cycles for 100 * MbInstr 0
22149   cycles for 100 * MbInstr 1
22309   cycles for 100 * MbInstr 2
22278   cycles for 100 * MbInstr 4
28534   cycles for 100 * crt_strstr
30747   cycles for 100 * M32 find$

18      bytes for MbInstr 0
18      bytes for MbInstr 1
18      bytes for MbInstr 2
18      bytes for MbInstr 4
22      bytes for crt_strstr
15      bytes for M32 find$

97      = eax MbInstr 0
97      = eax MbInstr 1
97      = eax MbInstr 2
97      = eax MbInstr 4
97      = eax crt_strstr
97      = eax M32 find$

--- ok ---

Gunther
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 14, 2014, 11:23:32 PM
Quote
Jochen,

your timings:
Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
22112   cycles for 100 * MbInstr 0
28535   cycles for 100 * crt_strstr
30875   cycles for 100 * M32 find$

Gunther,
I love your CPU :greensml:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

41492   cycles for 100 * MbInstr 0
41718   cycles for 100 * MbInstr 1
41553   cycles for 100 * MbInstr 2
42373   cycles for 100 * MbInstr 4
32945   cycles for 100 * crt_strstr
37785   cycles for 100 * M32 find$
Title: Re: Instr, strstr, find$
Post by: Gunther on July 15, 2014, 02:05:30 AM
Jochen,

Quote
Gunther,
I love your CPU :greensml:

me too.  :lol: :lol: :lol:

Gunther
Title: Re: Instr, strstr, find$
Post by: dedndave on July 15, 2014, 02:06:30 AM
prescott w/htt
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

54210   cycles for 100 * MbInstr 0
53810   cycles for 100 * MbInstr 1
54892   cycles for 100 * MbInstr 2
55200   cycles for 100 * MbInstr 4
42894   cycles for 100 * crt_strstr
59477   cycles for 100 * M32 find$

53732   cycles for 100 * MbInstr 0
54695   cycles for 100 * MbInstr 1
54596   cycles for 100 * MbInstr 2
55538   cycles for 100 * MbInstr 4
44184   cycles for 100 * crt_strstr
57831   cycles for 100 * M32 find$

54744   cycles for 100 * MbInstr 0
54076   cycles for 100 * MbInstr 1
54848   cycles for 100 * MbInstr 2
55803   cycles for 100 * MbInstr 4
43372   cycles for 100 * crt_strstr
57899   cycles for 100 * M32 find$
Title: Re: Instr, strstr, find$
Post by: sinsi on July 15, 2014, 07:17:48 AM
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
23621   cycles for 100 * MbInstr 0
23818   cycles for 100 * MbInstr 1
23339   cycles for 100 * MbInstr 2
23404   cycles for 100 * MbInstr 4
22305   cycles for 100 * crt_strstr
31867   cycles for 100 * M32 find$


AMD A10-7850K APU with Radeon(TM) R7 Graphics   (SSE4)
35325   cycles for 100 * MbInstr 0
35340   cycles for 100 * MbInstr 1
35409   cycles for 100 * MbInstr 2
37523   cycles for 100 * MbInstr 4
37294   cycles for 100 * crt_strstr
42007   cycles for 100 * M32 find$

Title: Re: Instr, strstr, find$
Post by: jj2007 on July 15, 2014, 11:55:39 AM
Interesting:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4) - Gunther
22112   cycles for 100 * MbInstr 0
28535   cycles for 100 * crt_strstr
30875   cycles for 100 * M32 find$

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4) - Sinsi
23621   cycles for 100 * MbInstr 0
22305   cycles for 100 * crt_strstr
31867   cycles for 100 * M32 find$
Title: Re: Instr, strstr, find$
Post by: dedndave on July 15, 2014, 12:22:47 PM
i don't have to remind you how many different versions of MSVCRT there are   :P
i am a little surprised you compare them
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 15, 2014, 12:31:44 PM
Would be nice to see where they differ :biggrin:

New test:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

3734    cycles for 10 * MbInstr 0
3302    cycles for 10 * crt_strstr
3780    cycles for 10 * M32 find$
4237    cycles for 10 * MB Instr old

3734    cycles for 10 * MbInstr 0
3290    cycles for 10 * crt_strstr
3792    cycles for 10 * M32 find$
4232    cycles for 10 * MB Instr old

3735    cycles for 10 * MbInstr 0
3292    cycles for 10 * crt_strstr
3785    cycles for 10 * M32 find$
4230    cycles for 10 * MB Instr old
Title: Re: Instr, strstr, find$
Post by: sinsi on July 15, 2014, 12:56:14 PM
C:\Windows\SysWOW64\msvcrt.dll  7.0.9600.16384

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)

2547    cycles for 10 * MbInstr 0
2169    cycles for 10 * crt_strstr
3159    cycles for 10 * M32 find$
2195    cycles for 10 * MB Instr old

2552    cycles for 10 * MbInstr 0
2220    cycles for 10 * crt_strstr
3141    cycles for 10 * M32 find$
2204    cycles for 10 * MB Instr old

2564    cycles for 10 * MbInstr 0
2190    cycles for 10 * crt_strstr
3169    cycles for 10 * M32 find$
2175    cycles for 10 * MB Instr old

Title: Re: Instr, strstr, find$
Post by: jcfuller on July 15, 2014, 07:36:28 PM
AMD Athlon(tm) II X2 250 Processor (SSE3)
++++++++++++++++++++
4947    cycles for 10 * MbInstr 0
5109    cycles for 10 * crt_strstr
5292    cycles for 10 * M32 find$
3894    cycles for 10 * MB Instr old

4828    cycles for 10 * MbInstr 0
5109    cycles for 10 * crt_strstr
5298    cycles for 10 * M32 find$
3895    cycles for 10 * MB Instr old

4827    cycles for 10 * MbInstr 0
5106    cycles for 10 * crt_strstr
5353    cycles for 10 * M32 find$
3881    cycles for 10 * MB Instr old

Title: Re: Instr, strstr, find$
Post by: jj2007 on July 15, 2014, 08:37:50 PM
Thanxalot to everybody :icon14:

Won't be easy to reconcile all CPUs.
Background to this exercise: a real-life application where I tried to search a 250 MB text file (Thunderbird inbox...) for pattern A near pattern B, where "near" means ±500 bytes. If pattern A is frequent, and pattern B is only present towards the end of the file, the exercise gets incredibly slow.

So I wrote a new version of Instr_() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1153) that takes a search limit, in this case 2*500 bytes, as an additional parameter. And voilà, searching the inbox is faster by a factor of 20 or so. But the additional parameter slows down the simple search a little bit, and this thread aims to investigate that problem.

As a side effect, it will be possible to search non-text files (i.e. with embedded zeros) if the length is known.
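
For readers who prefer C, here is a minimal sketch of the idea, not the MasmBasic code: a search that takes explicit lengths (so embedded zeros are harmless) and a "near" check built on top of it. The names find_bounded and near_match are hypothetical, and the window handling is only one plausible reading of "±500 bytes".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not MasmBasic's Instr_(): find needle in the first
   hay_len bytes of hay. Lengths are explicit, so zero bytes are ordinary data. */
const char *find_bounded(const char *hay, size_t hay_len,
                         const char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;
    const char *last = hay + (hay_len - needle_len);   /* last valid start */
    for (const char *p = hay; p <= last; ++p) {
        /* jump to the next occurrence of the first needle byte */
        p = memchr(p, needle[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, needle, needle_len) == 0)
            return p;
    }
    return NULL;
}

/* Hypothetical: does pattern b occur within radius bytes before or after
   the match of pattern A located at address a inside buf? */
int near_match(const char *buf, size_t buf_len, const char *a,
               const char *b, size_t b_len, size_t radius)
{
    size_t pos = (size_t)(a - buf);
    size_t start = pos > radius ? pos - radius : 0;
    size_t end = pos + radius < buf_len ? pos + radius : buf_len;
    return find_bounded(buf + start, end - start, b, b_len) != NULL;
}
```

The point of the limit is that each check for B touches at most 2*radius bytes, instead of running on towards the end of the file.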
Title: Re: Instr, strstr, find$
Post by: Gunther on July 15, 2014, 08:46:59 PM
Jochen,

your timings:
Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

2449    cycles for 10 * MbInstr 0
4062    cycles for 10 * crt_strstr
3051    cycles for 10 * M32 find$
2235    cycles for 10 * MB Instr old

2451    cycles for 10 * MbInstr 0
2822    cycles for 10 * crt_strstr
3059    cycles for 10 * M32 find$
2232    cycles for 10 * MB Instr old

2448    cycles for 10 * MbInstr 0
4063    cycles for 10 * crt_strstr
4311    cycles for 10 * M32 find$
3503    cycles for 10 * MB Instr old

Gunther
Title: Re: Instr, strstr, find$
Post by: nidud on July 15, 2014, 09:22:15 PM
here is a simple one, 44 bytes
Code: [Select]
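strstr  proc dst, src           ; dst = string to search, src = pattern
@1:     mov     ecx,src
        mov     edx,dst
@2:     xor     eax,eax
        xor     al,[edx]        ; al = next byte of dst
        jz      @5              ; end of dst: return 0
        inc     edx
        sub     al,[ecx]        ; compare with the first pattern byte
        jnz     @2
        mov     dst,edx         ; remember where to resume after a partial match
        inc     ecx
@3:     xor     al,[ecx]        ; al is 0 here, so al = next pattern byte
        jz      @4              ; end of pattern: found
        sub     al,[edx]
        jnz     @1              ; mismatch: restart at the saved position
        inc     ecx
        inc     edx
        jmp     @3
@4:     mov     eax,dst
        dec     eax             ; eax = address of the match
@5:     ret
strstr  endp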
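
For comparison, a rough C rendering of the same byte-by-byte approach. This is a sketch under assumptions, not a translation checked against the assembled code; the name strstr_simple is made up, and the empty-pattern behaviour is chosen to mirror the asm rather than the CRT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of the simple asm strstr above. */
const char *strstr_simple(const char *hay, const char *needle)
{
    assert(hay != NULL && needle != NULL);
    if (*needle == '\0')
        return NULL;            /* mirrors the asm; the CRT strstr returns hay here */
    for (; *hay != '\0'; ++hay) {
        const char *h = hay;
        const char *n = needle;
        while (*n != '\0' && *h == *n) {    /* compare until mismatch or pattern end */
            ++h;
            ++n;
        }
        if (*n == '\0')
            return hay;         /* whole pattern matched at this position */
    }
    return NULL;
}
```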
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 16, 2014, 03:31:34 AM
Quite efficient :t
However,

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
3296    cycles for 10 * MbInstr 0 (A)
4000    cycles for 10 * MbInstr 0 (B)
5169    cycles for 10 * crt_strstr
5519    cycles for 10 * M32 find$

but

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
3540    cycles for 10 * MbInstr 0 (A)
3999    cycles for 10 * MbInstr 0 (B)
5159    cycles for 10 * crt_strstr
5335    cycles for 10 * M32 find$
3247    cycles for 10 * strstr_nidud


What did you smuggle in that destroys the performance of my algo??? :eusa_naughty:

What is even more worrying is the bad performance on my Celeron :eusa_boohoo:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
4094    cycles for 10 * MbInstr 0 (A)
3291    cycles for 10 * crt_strstr
3778    cycles for 10 * M32 find$
3347    cycles for 10 * strstr_nidud
Title: Re: Instr, strstr, find$
Post by: nidud on July 16, 2014, 04:51:11 AM
Quote
What did you smuggle in that destroys the performance of my algo??? :eusa_naughty:

 :biggrin:

I think the new function is "stealing" (allocating) cache from the other functions, so the problem is that you got a fake (good) result before. The more similar functions you have in the binary, the faster it gets.

498415  cycles - 50 (62) a 1: strstr
478468  cycles - 50 (66) a 3: x
428413  cycles - 50 (54) a 6: x
428425  cycles - 50 (48) a 7: x

if other functions are added (and not used):

481651  cycles - 50 (62) a 1: strstr
465114  cycles - 50 (66) a 3: x
632872  cycles - 50 (54) a 6: x
629542  cycles - 50 (48) a 7: x


so it’s difficult now to know which one to choose...
Title: Re: Instr, strstr, find$
Post by: dedndave on July 16, 2014, 05:17:21 AM
put them in separate programs - run a batch file   :P
Title: Re: Instr, strstr, find$
Post by: Gunther on July 16, 2014, 06:00:48 AM
Quote
put them in separate programs - run a batch file   :P

Not a bad idea.

Gunther
Title: Re: Instr, strstr, find$
Post by: nidud on July 16, 2014, 06:20:58 AM
function to test:
Code: [Select]
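.486
.model flat,stdcall
.code
proc_4  proc string
        mov     eax,string
        sub     eax,4
@@:     add     eax,4           ; advance to the next dword
        mov     edx,[eax]       ; load 4 bytes (may read up to 3 bytes past the terminator)
        lea     ecx,[edx-01010101H]
        not     edx
        and     ecx,edx
        and     ecx,80808080H   ; nonzero if any of the 4 bytes is zero
        jz      @B
        bsf     ecx,ecx         ; lowest set bit marks the first zero byte
        shr     ecx,3           ; bit index -> byte index
        sub     eax,string
        add     eax,ecx         ; eax = string length
        mov     ecx,eax
        ret
proc_4  endp

end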

jwasm -bin proc_4.asm
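
The zero-byte test used in proc_4 can be sketched in C roughly as follows. This is an illustration only: has_zero_byte and strlen_dword are hypothetical names, and the final byte loop stands in for the bsf/shr pair so the sketch does not depend on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in v is zero. */
static uint32_t has_zero_byte(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Hypothetical C counterpart of proc_4. Like the asm, it reads whole dwords,
   so the buffer must stay readable up to 3 bytes past the terminator. */
size_t strlen_dword(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    for (;;) {
        uint32_t v;
        memcpy(&v, s + i, sizeof v);    /* 4-byte load that is safe when unaligned */
        if (has_zero_byte(v))
            break;
        i += 4;
    }
    while (s[i] != '\0')                /* locate the zero inside that dword */
        ++i;
    return i;
}
```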

Code: [Select]
flag_0  equ 0
flag_1  equ 0
flag_2  equ 0
flag_3  equ 1
flag_4  equ 1
flag_5  equ 1
flag_6  equ 1

proc_0  equ crt_strlen
proc_1  equ szLen
proc_2  equ MbStrLen
proc_3  db "strlen\proc_3.bin",0
proc_4  db "strlen\proc_4.bin",0
proc_5  db "strlen\proc_5.bin",0
proc_6  db "strlen\proc_6.bin",0

info_0  db "crt_strlen",0
info_1  db "MASM32 - szLen() ",0
info_2  db "MasmBasic MbStrLen Len() - SSE",0
...

pushargs macro
push str_x
endm

; get the cycle count for each algo

test_algo macro x, loopcount
if flag_&x&
lea edx,proc_&x&
else
mov edx,proc_&x&
endif
invoke timeit,edx,0,flag_&x&,loopcount,x,addr info_&x&
endm

.code

readit proc uses ebx edi fname
sub edi,edi
invoke CreateFile,fname,80000000h,0,0,3,0,0
.if eax != -1
    mov ebx,eax
    push 0
    mov edx,esp ; lpNumberOfBytesRead
    invoke ReadFile,ebx,addr proc_x,4096,edx,0
    test eax,eax
    pop edi
    .if ZERO?
sub edi,edi
    .endif
    invoke CloseHandle,ebx
.endif
mov eax,edi
ret
readit endp

timeit proc uses ebx esi edi p,plen,ptype,count,id,info
.if ptype
    invoke readit,p
    mov plen,eax
    lea esi,proc_x
    test eax,eax
    jz error1
.else
     mov esi,p
.endif
counter_begin 1000, HIGH_PRIORITY_CLASS
mov edi,count
mov ebx,esp
.while edi
    pushargs
    call esi
    mov esp,ebx
    dec edi
.endw
counter_end
printf("%d\tcycles - %d (%3d) %d: %s\n",eax,count,plen,id,info)
toend:
ret
error1:
printf("error reading %s\n",info)
jmp toend
timeit endp

151336  cycles - 50 (  0) 0: crt_strlen
244234  cycles - 50 (  0) 1: MASM32 - szLen()
47350   cycles - 50 (  0) 2: MasmBasic MbStrLen Len() - SSE
157571  cycles - 50 ( 86) 3: AgnerFog
140117  cycles - 50 ( 49) 4: AgnerFog unaligned
340870  cycles - 50 ( 94) 5: Dave
38961   cycles - 50 ( 45) 6: strlen SSE
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 16, 2014, 06:29:03 AM
Here is Instr() with another setting: find "echo WARNING" in WinExtra.inc

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
33284   kCycles for 10 * MbInstr 0 (zero-delimited)
30100   kCycles for 10 * MbInstr 0 (file size)
34983   kCycles for 10 * crt_strstr
37981   kCycles for 10 * M32 find$
38525   kCycles for 10 * strstr_nidud


The second entry (file size) refers to the additional parameter mentioned above: The function knows how many bytes are available in the source string. The difference is surprisingly low, though - I had expected a stronger influence of the data cache.


Quote
function to test:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
10568   cycles for 100 * proc_4
3939    cycles for 100 * Len
13860   cycles for 100 * len
Title: Re: Instr, strstr, find$
Post by: Gunther on July 16, 2014, 08:24:24 AM
Jochen,

that's what InstrTimingsNew did:
(http://ibunker.us/photos/20140716140546290309048.jpg)

Here is the output of proc4:
Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

6860    cycles for 100 * proc_4
1978    cycles for 100 * Len
9348    cycles for 100 * len

6205    cycles for 100 * proc_4
1957    cycles for 100 * Len
9312    cycles for 100 * len

6224    cycles for 100 * proc_4
1955    cycles for 100 * Len
9924    cycles for 100 * len

100     = eax proc_4
100     = eax Len
100     = eax len

--- ok ---

Gunther
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 16, 2014, 08:41:00 AM
Quote
that's what InstrTimingsNew did:

Gunther,
Either you have no \Masm32\include\winextra.inc (unlikely), or you launched the exe from a different drive than your Masm32 drive.
Title: Re: Instr, strstr, find$
Post by: nidud on July 16, 2014, 08:53:39 AM
here is the aligned version
Code: [Select]
        align 16
strstr  proc dst, src
        nop
@1:     mov     ecx,src         ; 04h
        mov     edx,dst
        xor     eax,eax         ; fill code..
        cmp     [ecx],al
        je      @5
@2:     xor     eax,eax         ; 10h
        xor     al,[edx]
        jz      @5
        lea     edx,[edx+1]
        sub     al,[ecx]
        jnz     @2
        mov     dst,edx
        lea     ecx,[ecx+1]
        nop
@3:     xor     al,[ecx]        ; 24h
        jz      @4
        sub     al,[edx]
        jnz     @1
        inc     ecx
        inc     edx
        jmp     @3
@4:     mov     eax,dst         ; 30h
        dec     eax
@5:     ret                     ; 34h
strstr  endp
Title: Re: Instr, strstr, find$
Post by: Gunther on July 16, 2014, 08:58:45 AM
Jochen,

I have winextra.inc, but I launched the application from a different drive. Here is the new output:

Code: [Select]
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++11 of 20 tests valid, loop overhead is approx. 43/10 cycles

20184   kCycles for 10 * MbInstr 0 (zero-delimited)
18778   kCycles for 10 * MbInstr 0 (file size)
22744   kCycles for 10 * crt_strstr
28534   kCycles for 10 * M32 find$
30970   kCycles for 10 * strstr_nidud

19942   kCycles for 10 * MbInstr 0 (zero-delimited)
18751   kCycles for 10 * MbInstr 0 (file size)
22837   kCycles for 10 * crt_strstr
28442   kCycles for 10 * M32 find$
30908   kCycles for 10 * strstr_nidud

20094   kCycles for 10 * MbInstr 0 (zero-delimited)
18727   kCycles for 10 * MbInstr 0 (file size)
22739   kCycles for 10 * crt_strstr
28537   kCycles for 10 * M32 find$
30943   kCycles for 10 * strstr_nidud

1068448 = eax MbInstr 0 (zero-delimited)
1068448 = eax MbInstr 0 (file size)
1068448 = eax crt_strstr
1068448 = eax M32 find$
1068448 = eax strstr_nidud

Gunther
Title: Re: Instr, strstr, find$
Post by: nidud on July 16, 2014, 09:14:31 PM
here is the strlen test from the new template
the unaligned version (4) was faster in the old test

AMD Athlon(tm) II X2 245 Processor (SSE3)
------------------------------------------------------
151616  cycles - 50 (  0) 0: crt_strlen
244109  cycles - 50 (  0) 1: MASM32 - szLen()
47645   cycles - 50 (  0) 2: MasmBasic MbStrLen Len() - SSE
130962  cycles - 50 ( 86) 3: AgnerFog
156628  cycles - 50 ( 49) 4: AgnerFog unaligned
340432  cycles - 50 ( 94) 5: Dave
40714   cycles - 50 ( 45) 6: strlen SSE


here is the strstr test

24255   kCycles for 10 * MbInstr 0 (zero-delimited)
21810   kCycles for 10 * MbInstr 0 (file size)
27030   kCycles for 10 * crt_strstr
54318   kCycles for 10 * M32 find$
36887   kCycles for 10 * strstr_nidud


and with the aligned version

23684   kCycles for 10 * MbInstr 0 (zero-delimited)
21400   kCycles for 10 * MbInstr 0 (file size)
26562   kCycles for 10 * crt_strstr
52755   kCycles for 10 * M32 find$
27872   kCycles for 10 * strstr_nidud


after the changes I now get this result

513228  cycles - 50 ( 67) 1: strstr
465545  cycles - 50 ( 71) 2: x
426742  cycles - 50 ( 57) 4: x - new
628714  cycles - 50 ( 54) 5: x - old
628309  cycles - 50 ( 50) 7: x


compare to the (fake) first test

428413  cycles - 50 ( 54) 5: x
428425  cycles - 50 ( 50) 7: x
Title: Re: Instr, strstr, find$
Post by: hutch-- on July 18, 2014, 03:05:45 PM
With Dave's suggestion, try this before the timing of each algo in each separate test piece. Set the priority class high enough to avoid the wanders and see if this helps to stabilise the results.

Code: [Select]
    cpuid                           ; serialising instruction for wider separation
    pause                           ; spinlock delay instruction

    invoke SleepEx,10,0

    cpuid                           ; serialising instruction for wider separation
    pause                           ; spinlock delay instruction

I have usually found that some algos are much more sensitive to code location than others, typically those doing intensive BYTE operations; working with larger data types reduces the variation.
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 24, 2014, 10:17:31 AM
I've added a fast variant of Instr() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1153). At least on my CPUs, it looks competitive:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
33220   kCycles for 10 * MbInstr 0 (zero-delimited)
30094   kCycles for 10 * MbInstr 0 (file size)
8362    kCycles for 10 * MbInstr FAST
34858   kCycles for 10 * crt_strstr
38010   kCycles for 10 * M32 find$
38399   kCycles for 10 * strstr_nidud

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
32413   kCycles for 10 * MbInstr 0 (zero-delimited)
24107   kCycles for 10 * MbInstr 0 (file size)
13446   kCycles for 10 * MbInstr FAST
57954   kCycles for 10 * crt_strstr
58467   kCycles for 10 * M32 find$
38112   kCycles for 10 * strstr_nidud

Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz (SSE4)
22035   kCycles for 10 * MbInstr 0 (zero-delimited)
19469   kCycles for 10 * MbInstr 0 (file size)
5340    kCycles for 10 * MbInstr FAST
27871   kCycles for 10 * crt_strstr
28522   kCycles for 10 * M32 find$
24944   kCycles for 10 * strstr_nidud


To assemble the source, you will need MasmBasic of today, 24 July. (http://masm32.com/board/index.php?topic=94.0)

Usage: Instr_(1, "Test", "Te", FAST)   ; 4 args, last one is uppercase FAST
This is always case-sensitive (same for find$, strstr etc).
Title: Re: Instr, strstr, find$
Post by: johnsa on July 24, 2014, 11:50:58 PM

Timings from me...

Code: [Select]

Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
++++++++12 of 20 tests valid, loop overhead is approx. 46/10 cycles

16446   kCycles for 10 * MbInstr 0 (zero-delimited)
15438   kCycles for 10 * MbInstr 0 (file size)
3842    kCycles for 10 * MbInstr FAST
26039   kCycles for 10 * crt_strstr
23588   kCycles for 10 * M32 find$
20650   kCycles for 10 * strstr_nidud

16556   kCycles for 10 * MbInstr 0 (zero-delimited)
15557   kCycles for 10 * MbInstr 0 (file size)
3839    kCycles for 10 * MbInstr FAST
25890   kCycles for 10 * crt_strstr
23566   kCycles for 10 * M32 find$
20681   kCycles for 10 * strstr_nidud

16534   kCycles for 10 * MbInstr 0 (zero-delimited)
15786   kCycles for 10 * MbInstr 0 (file size)
3788    kCycles for 10 * MbInstr FAST
25781   kCycles for 10 * crt_strstr
23581   kCycles for 10 * M32 find$
20714   kCycles for 10 * strstr_nidud

1068448 = eax MbInstr 0 (zero-delimited)
1068448 = eax MbInstr 0 (file size)
1068448 = eax MbInstr FAST
1068448 = eax crt_strstr
1068448 = eax M32 find$
1068448 = eax strstr_nidud

Title: Re: Instr, strstr, find$
Post by: johnsa on July 25, 2014, 12:03:51 AM
I'm at work at the moment, so I don't have much time to test. What are the parameters for the test:

Search string (needle)
Search body (haystack)

and number of timing iterations?

i.e. find "te" in "test" 1 million times?
I've got my own instr algo, but it's 64-bit, so I can't put it into your testbench. It would be interesting to see the comparison (I'd have to report ms, not cycles, for now).
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 25, 2014, 02:04:25 AM
Quote
Search string (needle)
Search body (haystack)

and number of timing iterations?

Yes, body is \Masm32\include\winextra.inc, and string is "echo WARNING"

Thanks for your timings :icon14:
Title: Re: Instr, strstr, find$
Post by: Gunther on July 25, 2014, 04:04:31 AM
Jochen,

InstrTimings5 brings:
Code: [Select]

c:\yasm\work>InstrTimingsNew.exe
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++13 of 20 tests valid, loop overhead is approx. 36/10 cycles

20406   kCycles for 10 * MbInstr 0 (zero-delimited)
18924   kCycles for 10 * MbInstr 0 (file size)
7679    kCycles for 10 * MbInstr FAST
23118   kCycles for 10 * crt_strstr
31910   kCycles for 10 * M32 find$
25510   kCycles for 10 * strstr_nidud

20353   kCycles for 10 * MbInstr 0 (zero-delimited)
21436   kCycles for 10 * MbInstr 0 (file size)
4674    kCycles for 10 * MbInstr FAST
22646   kCycles for 10 * crt_strstr
28390   kCycles for 10 * M32 find$
24980   kCycles for 10 * strstr_nidud

20306   kCycles for 10 * MbInstr 0 (zero-delimited)
18900   kCycles for 10 * MbInstr 0 (file size)
4582    kCycles for 10 * MbInstr FAST
22577   kCycles for 10 * crt_strstr
28547   kCycles for 10 * M32 find$
24806   kCycles for 10 * strstr_nidud

1068448 = eax MbInstr 0 (zero-delimited)
1068448 = eax MbInstr 0 (file size)
1068448 = eax MbInstr FAST
1068448 = eax crt_strstr
1068448 = eax M32 find$
1068448 = eax strstr_nidud

The environment is Windows XP under Virtual PC.

Gunther
Title: Re: Instr, strstr, find$
Post by: jj2007 on July 25, 2014, 05:30:22 AM
Quote
4674    kCycles for 10 * MbInstr FAST
22646   kCycles for 10 * crt_strstr

Nice  :biggrin:

It gets even worse with a string like "vc2010" (attached) - the speed depends strongly on the frequency of the first pattern byte, and "e" as in "echo WARNING" is pretty frequent.
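
To make the first-byte effect concrete, here is a small C sketch (not taken from any of the tested routines) that counts how often a search would have to start a full compare because the first pattern byte matches. The name count_candidates is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of positions in hay where the first
   pattern byte occurs, i.e. where a full compare would be attempted. */
size_t count_candidates(const char *hay, size_t hay_len, char first)
{
    assert(hay != NULL);
    size_t n = 0;
    const char *p = hay;
    const char *end = hay + hay_len;
    while (p < end && (p = memchr(p, first, (size_t)(end - p))) != NULL) {
        ++n;        /* one more candidate start */
        ++p;        /* continue after this occurrence */
    }
    return n;
}
```

Run over the same file with 'e' and then with 'v', this should show far fewer candidate positions for 'v', which is presumably why a pattern starting with a rare byte favours routines that scan quickly for the first byte.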
Title: Re: Instr, strstr, find$
Post by: Gunther on July 25, 2014, 08:20:08 AM
Jochen,

InstrTimings5a (same environment):
Code: [Select]

c:\yasm\work>InstrTimingsNew.exe
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+++++++++++++7 of 20 tests valid, loop overhead is approx. 44/10 cycles

18711   kCycles for 10 * MbInstr 0 (zero-delimited)
20339   kCycles for 10 * MbInstr 0 (file size)
1794    kCycles for 10 * MbInstr FAST
17515   kCycles for 10 * crt_strstr
22578   kCycles for 10 * M32 find$
17415   kCycles for 10 * strstr_nidud

18347   kCycles for 10 * MbInstr 0 (zero-delimited)
17085   kCycles for 10 * MbInstr 0 (file size)
4439    kCycles for 10 * MbInstr FAST
17287   kCycles for 10 * crt_strstr
22561   kCycles for 10 * M32 find$
17376   kCycles for 10 * strstr_nidud

18455   kCycles for 10 * MbInstr 0 (zero-delimited)
17043   kCycles for 10 * MbInstr 0 (file size)
1773    kCycles for 10 * MbInstr FAST
17269   kCycles for 10 * crt_strstr
22668   kCycles for 10 * M32 find$
20817   kCycles for 10 * strstr_nidud

995374  = eax MbInstr 0 (zero-delimited)
995374  = eax MbInstr 0 (file size)
995374  = eax MbInstr FAST
995374  = eax crt_strstr
995374  = eax M32 find$
995374  = eax strstr_nidud

Gunther
Title: Re: Instr, strstr, find$
Post by: dedndave on July 26, 2014, 02:30:55 AM
prescott w/htt xp sp3 msvcrt 7.0.2600.5701
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++18 of 20 tests valid, loop overhead is approx. 30/10 cycles

54049   kCycles for 10 * MbInstr 0 (zero-delimited)
47497   kCycles for 10 * MbInstr 0 (file size)
7165    kCycles for 10 * MbInstr FAST
44331   kCycles for 10 * crt_strstr
43292   kCycles for 10 * M32 find$
40909   kCycles for 10 * strstr_nidud

54272   kCycles for 10 * MbInstr 0 (zero-delimited)
47732   kCycles for 10 * MbInstr 0 (file size)
7299    kCycles for 10 * MbInstr FAST
43498   kCycles for 10 * crt_strstr
43493   kCycles for 10 * M32 find$
41158   kCycles for 10 * strstr_nidud

53552   kCycles for 10 * MbInstr 0 (zero-delimited)
47853   kCycles for 10 * MbInstr 0 (file size)
7176    kCycles for 10 * MbInstr FAST
44348   kCycles for 10 * crt_strstr
42919   kCycles for 10 * M32 find$
41179   kCycles for 10 * strstr_nidud

995374  = eax MbInstr 0 (zero-delimited)
995374  = eax MbInstr 0 (file size)
995374  = eax MbInstr FAST
995374  = eax crt_strstr
995374  = eax M32 find$
995374  = eax strstr_nidud
Title: Assembler beats C hands down
Post by: jj2007 on July 26, 2014, 04:11:17 AM
Quote
1794    kCycles for 10 * MbInstr FAST
17515   kCycles for 10 * crt_strstr

Almost 10:1 against CRT is really nice, that's ammunition against those who claim that C compilers are better than assembler :greensml:

Even with Dave's trusty P4 it's 6 x CRT. But again, the test with a pattern that starts with "v" is a little bit unfair ;-)
Title: Re: Instr, strstr, find$
Post by: nidud on July 26, 2014, 05:12:15 AM
Quote
Assembler beats C hands down
Title: Re: Assembler beats C hands down
Post by: jj2007 on July 26, 2014, 06:27:43 AM
So Intel themselves use assembler to code the CRT... ::)
Somebody will probably argue now that strstr would be much faster had they only used their compiler instead :badgrin:
Title: Re: Instr, strstr, find$
Post by: Gunther on July 26, 2014, 08:40:44 AM
Jochen,

Quote
Somebody will probably argue now that strstr would be much faster had they only used their compiler instead :badgrin:

but that would be a bad joke. No one will believe that.

Gunther
Title: Volnitsky substring search algorithm
Post by: jj2007 on July 30, 2015, 07:24:13 AM
http://volnitsky.com/project/str_search/
Quote
Described new online substring search algorithm which allows faster string traversal. Presented here implementation is substantially faster than any other online substring search algorithms for average case.
Title: Re: Instr, strstr, find$
Post by: Grincheux on January 04, 2016, 11:49:45 PM
Is it possible to do a backward search? Let me explain:
1 - I search for "jpg"
2 - I search the first '"' BEFORE "jpg"

That would simplify the program.
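
In plain C, the two steps could look roughly like this. It is only a sketch using the CRT strstr for the forward part; quote_before is a hypothetical name and not a MasmBasic function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find pattern in text, then walk backwards from the
   hit to the nearest '"' in front of it. Returns NULL if either is missing. */
const char *quote_before(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);
    const char *hit = strstr(text, pattern);    /* step 1: forward search */
    if (hit == NULL)
        return NULL;
    for (const char *p = hit; p > text; ) {     /* step 2: backward scan */
        --p;
        if (*p == '"')
            return p;
    }
    return NULL;
}
```

The text between the returned quote and the hit would then be the start of the file name or URL.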
Title: Re: Instr, strstr, find$
Post by: Grincheux on January 04, 2016, 11:59:06 PM
Quote
C:\Users\Grincheux\Downloads\InstrTimings5a>InstrTimingsNew.exe
AMD Athlon(tm) II X2 250 Processor (SSE3)
++++++++++++++++++++
22129   kCycles for 10 * MbInstr 0 (zero-delimited)
19881   kCycles for 10 * MbInstr 0 (file size)
3080    kCycles for 10 * MbInstr FAST
49397   kCycles for 10 * crt_strstr
46292   kCycles for 10 * M32 find$
20135   kCycles for 10 * strstr_nidud

22177   kCycles for 10 * MbInstr 0 (zero-delimited)
19904   kCycles for 10 * MbInstr 0 (file size)
3100    kCycles for 10 * MbInstr FAST
49343   kCycles for 10 * crt_strstr
45889   kCycles for 10 * M32 find$
20143   kCycles for 10 * strstr_nidud

22259   kCycles for 10 * MbInstr 0 (zero-delimited)
19881   kCycles for 10 * MbInstr 0 (file size)
3048    kCycles for 10 * MbInstr FAST
49342   kCycles for 10 * crt_strstr
45947   kCycles for 10 * M32 find$
20126   kCycles for 10 * strstr_nidud

995374  = eax MbInstr 0 (zero-delimited)
995374  = eax MbInstr 0 (file size)
995374  = eax MbInstr FAST
995374  = eax crt_strstr
995374  = eax M32 find$
995374  = eax strstr_nidud

C:\Users\Grincheux\Downloads\InstrTimings5a>

I think that the best is MbInstr.

I include MasmBasic.inc and MasmBasic.lib. Is that all I have to do?
Title: Re: Instr, strstr, find$
Post by: Grincheux on January 05, 2016, 02:01:22 AM
I have installed jj2007's "InstrJJ" function.
A small part is from Agner Fog (strlen) and another part from jj2007.
I forgot: a small part is from Hutch for the memory, and from Fearless for the interface.
I dropped VirtualAlloc and replaced it with a big buffer in the data segment.
It works fine.
I still don't understand why files are not downloaded correctly: some of them are good, but a large part are bad.
I will keep searching before uploading this new version.
Title: Re: Instr, strstr, find$
Post by: Grincheux on January 05, 2016, 04:20:53 AM
Quote
0D 0A 0D 0A 0D 0A 0D 0A 3C 21 44 4F 43 54 59 50 45 20 68 74 6D 6C 3E 0D 0A 0D 0A 3C 21 2D 2D 5B
........<!DOCTYPE html>....<!--[

I thought that "<!DOCTYPE html>" was always on the first line, so my test for verifying whether a file is an HTML file was wrong. I have corrected it.
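
One way such a check could be written in C, assuming it is enough to skip leading whitespace (the 0D 0A pairs in the dump) and then compare the tag without regard to case. The name looks_like_html is hypothetical, and this is not the code from the program discussed here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical check: after leading CR/LF/space/tab bytes, does the buffer
   start with "<!DOCTYPE html" in any letter case? */
int looks_like_html(const char *buf, size_t len)
{
    static const char tag[] = "<!doctype html";
    size_t i = 0;
    assert(buf != NULL);
    while (i < len && isspace((unsigned char)buf[i]))
        ++i;                                    /* skip the empty lines */
    for (size_t k = 0; tag[k] != '\0'; ++k, ++i) {
        if (i >= len || tolower((unsigned char)buf[i]) != tag[k])
            return 0;
    }
    return 1;
}
```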
Title: A little late, but....
Post by: 0000 on May 15, 2018, 11:04:41 AM
Yup a little late, but I wanted to test the performance of my new toy...

Code: [Select]
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

27111   cycles for 100 * MbInstr 0
24259   cycles for 100 * MbInstr 1
26501   cycles for 100 * MbInstr 2
23610   cycles for 100 * MbInstr 4
24436   cycles for 100 * crt_strstr
27061   cycles for 100 * M32 find$

26900   cycles for 100 * MbInstr 0
29255   cycles for 100 * MbInstr 1
29007   cycles for 100 * MbInstr 2
27196   cycles for 100 * MbInstr 4
28573   cycles for 100 * crt_strstr
26529   cycles for 100 * M32 find$

29880   cycles for 100 * MbInstr 0
24030   cycles for 100 * MbInstr 1
23902   cycles for 100 * MbInstr 2
25233   cycles for 100 * MbInstr 4
22454   cycles for 100 * crt_strstr
26943   cycles for 100 * M32 find$

18      bytes for MbInstr 0
18      bytes for MbInstr 1
18      bytes for MbInstr 2
18      bytes for MbInstr 4
22      bytes for crt_strstr
15      bytes for M32 find$

97      = eax MbInstr 0
97      = eax MbInstr 1
97      = eax MbInstr 2
97      = eax MbInstr 4
97      = eax crt_strstr
97      = eax M32 find$


 :biggrin:   :P