## News:

Message to All Guests

## String Compare - Updated

Started by guga, January 28, 2013, 12:12:44 PM

#### guga

Hi guys

can someone test this string comparation function i made ? I have no idea how fast/slow is it.

In theory it should be twice as fast as strcmp from msvcrt, but i´m unable to benchmark it.

UPDATED VERSION 29/01/2013

Proc FastStringCompare:
Arguments @String1, @String2
Uses esi, ecx, ebx, edx

mov esi D@String1
mov ecx 0
mov eax 0
mov ebx D@String2

mov edx D\$esi
cmp edx D\$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

mov ecx 3
shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
inc ecx | jmp L1> ; else, ecx = 4

L0:
test dh dh | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
lea ecx D\$ecx+4 ; compute the len of the string1
mov edx D\$esi+ecx
cmp D\$ebx+ecx edx
jz L0<

jmp L3>> ; strings definitelly dont match

L1:

lea ebx D\$ebx+ecx | mov edx D\$ebx-3
lea esi D\$esi+ecx
cmp D\$esi-3 edx; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
jnz L3>>
; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
; On the dword, we need to check the byte after the dx like: 00XX_0000
If B\$ebx-1 = 0
inc eax
End_If

Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
and both are exactly the same.
;;

; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
inc eax

L3:

EndP

Tested strings:

call FastStringCompare {"1", 0}, {"g", 0}
call FastStringCompare {"g", 0}, {"g", 0}
call FastStringCompare {"gu", 0}, {"gu", 0}
call FastStringCompare {"gug", 0}, {"gug", 0}
call FastStringCompare {"guga", 0}, {"guga", 0}
call FastStringCompare {"guga1", 0}, {"guga1", 0}
call FastStringCompare {"guga12", 0}, {"guga12", 0}
call FastStringCompare {"guga123", 0}, {"guga123", 0}
call FastStringCompare {"guga1234", 0}, {"guga1234", 0}
call FastStringCompare {"guga12345", 0}, {"guga12345", 0}
call FastStringCompare {"guga123456", 0}, {"guga123456", 0}

call FastStringCompare {"1234", 0}, {"5678", 0}
call FastStringCompare {"1", 0}, {"g", 0}
call FastStringCompare {"gustavo trigu", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"g", 0}, {"gu", 0}
call FastStringCompare {"gu", 0}, {"gu", 0}
call FastStringCompare {"gu", 0}, {"guu", 0}
call FastStringCompare {"guuu", 0}, {"guu", 0}
call FastStringCompare {"guu", 0}, {"guuu", 0}
call FastStringCompare {"gug", 0}, {"guga", 0}
call FastStringCompare {"g44564", 0}, {"112345g", 0}
call FastStringCompare {"gu", 0}, {"g", 0}
call FastStringCompare {"gu", 0}, {"gu", 0}
call FastStringCompare {"g", 0}, {"gu", 0}
call FastStringCompare {"g", 0}, {"g", 0}
call FastStringCompare {"1", 0}, {"g", 0}

call FastStringCompare {"g1stavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gu1tavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gus1avo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gust1vo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gusta1o trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustav1 trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo1trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo 1rigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e guilherm1", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e guilher1e", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e guilhe1me", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e guilh1rme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e guil1erme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e gui1herme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e gu1lherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e g1ilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e 1uilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e1guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros 1 guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros1e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiro1 e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueir1s e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
call FastStringCompare {"gustavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}

call FastStringCompare {"guga", 0}, {"guga1", 0}
call FastStringCompare {"guga1", 0}, {"guga", 0}

________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________

OLDER VERSION 29/01/2013 - Aligned data. It is fast, but the above one is faster.

Proc FastStringCompare:
Arguments @String1, @String2
Uses esi, ecx, ebx, edx

mov esi D@String1
mov ecx 0
mov eax 0
mov ebx D@String2

mov edx D\$esi
cmp edx D\$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

mov ecx 3
shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
inc ecx | jmp L1> ; else, ecx = 4

L0:
cmp dh 0 | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
lea ecx D\$ecx+4 ; compute the len of the string1
mov edx D\$esi+ecx
cmp D\$ebx+ecx edx
jz L0<

jmp L3>> ; strings definitelly dont match

L1:

lea ebx D\$ebx+ecx | mov edx D\$ebx-3
lea esi D\$esi+ecx
cmp edx D\$esi-3 ; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
jnz L3>>
; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
; On the dword, we need to check the byte after the dx like: 00XX_0000
If B\$ebx-1 = 0
inc eax
End_If

Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
and both are exactly the same.
;;

; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
inc eax

L3:

EndP

OLDER VERSION 28/01/2013 - Unaligned data and some mistake when the string is only 1 byte len

; Return values:
; Eax = 0. Strings don´t match
; Eax = 1. Strings matches

Proc FastStringCompare:
Arguments @String1, @String2
Uses esi, ecx, ebx, edx

mov esi D@String1
xor ecx ecx
xor eax eax
xor edx edx
mov ebx D@String2

mov ax W\$ebx
mov dx W\$esi
xor ax dx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.
L0:
or dh 0 | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
lea ecx D\$ecx+2 ; compute the len of the string1
mov dx W\$esi+ecx
cmp W\$ebx+ecx dx
jz L0<

L1:

lea ebx D\$ebx+ecx | mov ax W\$ebx-1
lea esi D\$esi+ecx | mov dx W\$esi-1
xor ax dx ; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes

jz L1> | L2: | xor eax eax | ExitP | L1:
; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
L1:

; do again for the last bytes
; so far all bytes are equal

mov eax &TRUE
.If ecx = 0 ; when len (ecx) is zero it means that we the string is smaller then 2 bytes (including null terminated)
If B\$ebx+1 <> 0 ; so all we need is analyse if the last byte if the string2 is zero (Because the 1st one of both matches)
xor eax eax
End_If
.Else_If B\$ebx <> 0
xor eax eax
.End_If

EndP

Code examples

call FastStringCompare {"g", 0}, {"gu", 0}
call FastStringCompare {"gu", 0}, {"gu", 0}
call FastStringCompare {"gu", 0}, {"guu", 0}
call FastStringCompare {"guuu", 0}, {"guu", 0}
call FastStringCompare {"guu", 0}, {"guuu", 0}
call FastStringCompare {"gug", 0}, {"guga", 0}
call FastStringCompare {"g44564", 0}, {"112345g", 0}
call FastStringCompare {"gu", 0}, {"g", 0}
call FastStringCompare {"gu", 0}, {"gu", 0}
call FastStringCompare {"g", 0}, {"gu", 0}
call FastStringCompare {"g", 0}, {"g", 0}
mov ecx 0-1
Do
call FastStringCompare {"gustavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
dec ecx
Loop_Until ecx = 0
call FastStringCompare {"guga", 0}, {"guga1", 0}
call FastStringCompare {"guga1", 0}, {"guga", 0}
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### MichaelW

Judging from the function test results, my translation to MASM code is apparently correct.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
str1 db "The MASM32 SDK has a large body of example code that deals",0
str2 db "The MASM32 SDK has a large body of example code that deal",0
.code
;==============================================================================
FastStringCompare proc uses esi ecx ebx edx string1:DWORD, string2:DWORD

mov esi, string1
xor ecx, ecx
xor eax, eax
xor edx, edx
mov ebx, string2

mov ax, [ebx]
mov dx, [esi]
xor ax, dx          ; just to force the zero flags.
; if we have a match zero flag is settled.
; But we need to check if the last byte at
; string1 is zero.
L0:
or dh, 0
jz L1               ; the last byte of the 1st string is zero.
; We reached the end of the search.
; Contunue computing the remaining bytes at "L1"
lea ecx, [ecx+2]    ; compute the len of the string1
mov dx, [esi+ecx]
cmp [ebx+ecx], dx
jz L0

L1:
lea ebx, [ebx+ecx]
mov ax, [ebx-1]
lea esi, [esi+ecx]
mov dx, [esi-1]
xor ax, dx          ; if both words on string1 and string2 don't
; match jmp over. Otherwise, continue analysing
;the remaining bytes
jz L2
xor eax, eax
ret

L2:                   ; We are almost there. If ecx = 0, means that the
; string1 have only 2 byte len. So let´t search
; for the 2nd byte of string2.

; do again for the last bytes
; so far all bytes are equal

mov eax, TRUE
.if ecx == 0        ; when len (ecx) is zero it means that we the
; string is smaller then 2 bytes (including null
; terminated).
.if byte ptr [ebx+1]
; so all we need is analyse if the last byte if
; the string2 is zero (Because the 1st one of both
; matches)
xor eax, eax
.endif
.elseif byte ptr [ebx]
xor eax, eax
.endif

ret

FastStringCompare endp

;==============================================================================
start:
;==============================================================================

invoke FastStringCompare, chr\$("g"), chr\$("gu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("gu")
printf("(should be 1)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("guu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("guuu"), chr\$("guu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("guu"), chr\$("guuu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gug"), chr\$("guga")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("g44564"), chr\$("112345g")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("g")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("gu")
printf("(should be 1)\t%d\n", eax)
invoke FastStringCompare, chr\$("g"), chr\$("gu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("g"), chr\$("g")
printf("(should be 1)\t%d\n\n", eax)

invoke GetCurrentProcess
invoke SetProcessAffinityMask, eax, 1

invoke Sleep, 5000

REPEAT 3

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, crt_strcmp str1, str1\n", eax)

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, FastStringCompare str1, str1\n", eax)

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, crt_strcmp str1, str2\n", eax)

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, FastStringCompare str1, str2\n\n", eax)

ENDM

inkey
exit
;==============================================================================
end start

Running on a P3:

321 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
308 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
308 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2

Running on a P4 Northwood:

199 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
197 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2

195 cycles, crt_strcmp str1, str1
272 cycles, FastStringCompare str1, str1
194 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2

199 cycles, crt_strcmp str1, str1
272 cycles, FastStringCompare str1, str1
185 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2

I think eliminating the partial-register (and misaligned) accesses should improve the speed significantly.
Well Microsoft, here's another nice mess you've gotten us into.

#### Ficko

Hi Guga,

For some times ago I was playing with string compare and tried aligning the strings.

I have no Idea how fast or "slow" :P is this algo just thought to post it for interest.

; STRCMP\$(lpString1:STRING,lpString2:STRING),INT
;lpString1 < lpString2 EAX = -n : lpString1 > lpString2 EAX = +n : lpString1 = lpString2 EAX = 0
;Flags are set as well
; =============== S U B R O U T I N E =======================================
align 4
STRCMP\$ proc public uses edi esi ebx String1:DWORD, String2:DWORD
mov esi, String1
mov edi, String2
cmp edi, esi
jz Exit_I
xor edx, edx
xor eax, eax
xor ecx, ecx
mov dl, 3
@@: test esi, edx
jz aligned
lodsb
mov cl, [edi]
sub eax, ecx
jnz Exit
jecxz Exit
inc edi
jmp @B
notequ: xchg eax, edx
@@: movzx eax, dl
movzx ecx, bl
sub eax, ecx
jnz Exit
jecxz Exit
shr edx, 8
shr ebx, 8
jmp @B
aligned:mov ecx, 80808080h ;ECX = 0x80808080
mov edx, 1010101h    ;EDX = 0x1010101
@@: mov eax, [esi]
mov ebx, [edi]
cmp eax, ebx
jnz notequ
sub eax, edx
and eax, ecx
je @B
not ebx
and eax, ebx
je @B
Exit_I: xor eax, eax
Exit: ret 8
STRCMP\$ endp

#### dedndave

the problem with string comparison is one string may be aligned when the other is not

if you want maximum speed, i would think you would access both strings as aligned dwords
you may wind up writing 4 different loops to do the grunt-work

some testing may be required to find out if the in-register re-alignment
effort is actually faster than un-aligned accesses (for one of the strings)

#### MichaelW

My first test had a problem in that I failed to DWORD align my two test strings in the data section. Aligning them dropped the FastStringCompare (P3) cycle counts from ~274 and ~150 to ~154 and ~149. Then I determined that these two strings had to be the same length to get the correct comparison result. To fix this I simply made the strings the same length, but there is some problem either in the original code or in my translation. I was able to eliminate a few of the partial-register accesses, but not all of them.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
align 4
str1 db "The MASM32 SDK has a large body of example code that deals",0
align 4
str2 db "The MASM32 SDK has a large body of example code that deal ",0
.code
;==============================================================================
FastStringCompare proc uses esi ecx ebx edx string1:DWORD, string2:DWORD

mov esi, string1
xor ecx, ecx
xor eax, eax
xor edx, edx
mov ebx, string2

;; ELIMINATED PARTIAL REGISTER ACCESSES HERE

movzx eax, WORD PTR [ebx]
movzx edx, WORD PTR [esi]
xor eax, edx        ; just to force the zero flags.
; if we have a match zero flag is settled.
; But we need to check if the last byte at
; string1 is zero.
L0:

or dh, 0
jz L1               ; the last byte of the 1st string is zero.
; We reached the end of the search.
; Contunue computing the remaining bytes at "L1"
lea ecx, [ecx+2]    ; compute the len of the string1

mov dx, [esi+ecx]
cmp [ebx+ecx], dx
jz L0

L1:
lea ebx, [ebx+ecx]
mov ax, [ebx-1]
lea esi, [esi+ecx]
mov dx, [esi-1]
xor ax, dx          ; if both words on string1 and string2 don't
; match jmp over. Otherwise, continue analysing
; the remaining bytes
jz L2
xor eax, eax
ret

L2:                   ; We are almost there. If ecx = 0, means that the
; string1 have only 2 byte len. So let´t search
; for the 2nd byte of string2.

; do again for the last bytes
; so far all bytes are equal

mov eax, TRUE
.if ecx == 0        ; when len (ecx) is zero it means that we the
; string is smaller then 2 bytes (including null
; terminated).
.if byte ptr [ebx+1]
; so all we need is analyse if the last byte if
; the string2 is zero (Because the 1st one of both
; matches)
xor eax, eax
.endif
.elseif byte ptr [ebx]
xor eax, eax
.endif

ret

FastStringCompare endp

;==============================================================================
start:
;==============================================================================

invoke FastStringCompare, chr\$("g"), chr\$("gu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("gu")
printf("(should be 1)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("guu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("guuu"), chr\$("guu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("guu"), chr\$("guuu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gug"), chr\$("guga")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("g44564"), chr\$("112345g")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("g")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("gu"), chr\$("gu")
printf("(should be 1)\t%d\n", eax)
invoke FastStringCompare, chr\$("g"), chr\$("gu")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("g"), chr\$("g")
printf("(should be 1)\t%d\n\n", eax)

invoke FastStringCompare, chr\$("gustavo trigueiros e guilherme"),
chr\$("gustavo trigueiros e guilherme")
printf("(should be 1)\t%d\n\n", eax)

invoke FastStringCompare, chr\$("guga"), chr\$("guga1")
printf("(should be 0)\t%d\n", eax)
invoke FastStringCompare, chr\$("guga1"), chr\$("guga")
printf("(should be 0)\t%d\n\n", eax)

printf("(should be 1)\t%d\n", eax)
printf("(should be 1)\t%d\n", eax)
printf("(should be 0)\t%d\n", eax)
printf("(should be 0)\t%d\n\n", eax)

invoke GetCurrentProcess
invoke SetProcessAffinityMask, eax, 1

invoke Sleep, 5000

REPEAT 3

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, crt_strcmp str1, str1\n", eax)

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, FastStringCompare str1, str1\n", eax)

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, crt_strcmp str1, str2\n", eax)

counter_begin 10000000, HIGH_PRIORITY_CLASS
counter_end
printf("%d cycles, FastStringCompare str1, str2\n\n", eax)

ENDM

inkey
exit
;==============================================================================
end start

P3:

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
124 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
123 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
124 cycles, FastStringCompare str1, str2

P4 Northwood:

193 cycles, crt_strcmp str1, str1
151 cycles, FastStringCompare str1, str1
194 cycles, crt_strcmp str1, str2
147 cycles, FastStringCompare str1, str2

196 cycles, crt_strcmp str1, str1
152 cycles, FastStringCompare str1, str1
196 cycles, crt_strcmp str1, str2
152 cycles, FastStringCompare str1, str2

196 cycles, crt_strcmp str1, str1
151 cycles, FastStringCompare str1, str1
196 cycles, crt_strcmp str1, str2
149 cycles, FastStringCompare str1, str2

Well Microsoft, here's another nice mess you've gotten us into.

#### jj2007

FastStringCompare looks competitive, but is the return value correct?

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 238/100 cycles

6929    cycles for 100 * crt_strcmp
7926    cycles for 100 * FastStringCompare
5713    cycles for 100 * MasmBasic StringsDiffer
6034    cycles for 100 * STRCMP\$

6936    cycles for 100 * crt_strcmp
7925    cycles for 100 * FastStringCompare
5714    cycles for 100 * MasmBasic StringsDiffer
6033    cycles for 100 * STRCMP\$

6930    cycles for 100 * crt_strcmp
7920    cycles for 100 * FastStringCompare
5720    cycles for 100 * MasmBasic StringsDiffer
6033    cycles for 100 * STRCMP\$

6928    cycles for 100 * crt_strcmp
7924    cycles for 100 * FastStringCompare
5716    cycles for 100 * MasmBasic StringsDiffer
6034    cycles for 100 * STRCMP\$

6933    cycles for 100 * crt_strcmp
7985    cycles for 100 * FastStringCompare
5717    cycles for 100 * MasmBasic StringsDiffer
6048    cycles for 100 * STRCMP\$

-1      = eax crt_strcmp
0       = eax FastStringCompare
-23     = eax MasmBasic StringsDiffer
-23     = eax STRCMP\$

#### Ficko

Mine above rocks despite using some slow OP codes like "jecxz".

Quote
130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP\$ str1, str1
128 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
51 cycles, STRCMP\$ str1, str2

130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP\$ str1, str1
127 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
51 cycles, STRCMP\$ str1, str2

130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP\$ str1, str1
127 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
50 cycles, STRCMP\$ str1, str2

#### MichaelW

The CRT strcmp function returns 0 if the strings match, < 0 if string1 < string2, or > 0 if string1 is > string2, allowing it to be used directly as a comparison function for qsort and bsearch.
Well Microsoft, here's another nice mess you've gotten us into.

#### jj2007

Quote from: Ficko on January 29, 2013, 01:46:36 AM
Mine above rocks despite using some slow OP codes like "jecxz".

It's pretty good for a non-SSE2 algo - I added it above :t

@Michael: I meant FastStringCompare, it returns zero for some reason - see above.

#### Ficko

Quote from: MichaelW on January 29, 2013, 02:02:34 AM
The CRT strcmp function returns 0 if the strings match, < 0 if string1 < string2, or > 0 if string1 is > string2, allowing it to be used directly as a comparison function for qsort and bsearch.

STRCMP\$ mimics this behavior as well - additionaly it set's the flags too so "eax" do not have to be checked. -

I designed it for my QSORT algo indeed.  ;)

#### MichaelW

FastStringCompare is returning false because the strings do not match.
Well Microsoft, here's another nice mess you've gotten us into.

#### jj2007

Quote from: MichaelW on January 29, 2013, 02:27:52 AM
FastStringCompare is returning false because the strings do not match.

Make them match and see what it returns.

#### MichaelW

Quote from: jj2007 on January 29, 2013, 02:49:46 AM
Quote from: MichaelW on January 29, 2013, 02:27:52 AM
FastStringCompare is returning false because the strings do not match.

Make them match and see what it returns.

In my tests the return value is correct only if the strings are aligned (I used a 4-byte alignment and did not investigate other alignments). I intended to add this information to my second post, but forgot to do so.
Well Microsoft, here's another nice mess you've gotten us into.

#### jj2007

Strange - I get this:

align 4  ; not equal
Str1   db "gustavo trigueiros e guilhermeA", 0
align 4
Str2   db "gustavo trigueiros e guilhermeX", 0

6930    cycles for 100 * crt_strcmp
8307    cycles for 100 * FastStringCompare
4738    cycles for 100 * MasmBasic StringsDiffer
6028    cycles for 100 * STRCMP\$

-1      = eax crt_strcmp
0       = eax FastStringCompare
-23     = eax MasmBasic StringsDiffer
-23     = eax STRCMP\$

7910    cycles for 100 * crt_strcmp
8122    cycles for 100 * FastStringCompare
4741    cycles for 100 * MasmBasic StringsDiffer
3921    cycles for 100 * STRCMP\$

align 4  ; EQUAL
Str1   db "gustavo trigueiros e guilhermeX", 0
align 4
Str2   db "gustavo trigueiros e guilhermeX", 0
0       = eax crt_strcmp
0       = eax FastStringCompare
0       = eax MasmBasic StringsDiffer
0       = eax STRCMP\$

#### guga

Hi guys, many tks for testing :)

Note: The return value for this function is only &TRUE or &FALSE

• If the strings matched it returns &TRUE
• If it fails (no matter the order of the previously good match) it always returns &FALSE

I didn´t coded it to return 0-1 (The cases that we have a partial match) because i wasn´t thinking on the qsort and bsearch functions. (I´ll see if i can implement this return, if it don´t slow down the resultant timmings)

About the alignment of the strings, indeed, i agree, it would (in theory) speed up the scanning process in 2x, approximately , but, how compilers align their strings (in 2 byte or 4) ? This is because i thought in 2 byte alignment/scanning because the minimum null terminated string to be compared is always a char+zero (Ex: "a", 0). But, what happens when we are dealing with a string inside a structure for example ? Or inside another chain of bytes (Like an array, etc), or even if this strings is in coded in the .text section in the middle of a function (Like in 3dframes.exe from masm package, for example) ?

Or even worst, what if the string ends on the end of a memory chunck ? I mean, if we have a string the looks like this
.data section --->
Guga db "Hello", 0
Virtual Memory: B\$ ? ? ?
(...)
.idata section.

If the function starts comparing in a 4 byte aligned when it reach the 5th byte "o, 0", wouldn´t cause the app to crash ? (Sorry for the dumb question, but i´m not sure if all compilers are 4 byte aligned)

Have anyone knows if all compilers align their strings (or others data) always in a 4 byte boundary ?

If it is always aligned on a 4 byte, i can try to implement a alignment strings for the comparison and see the results.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com