The MASM Forum

Projects => Rarely Used Projects => RosAsm => Topic started by: guga on January 28, 2013, 12:12:44 PM

Title: String Compare - Updated
Post by: guga on January 28, 2013, 12:12:44 PM
Hi guys

can someone test this string comparation function i made ? I have no idea how fast/slow is it.

In theory it should be twice as fast as strcmp from msvcrt, but i´m unable to benchmark it.

UPDATED VERSION 29/01/2013



Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    mov ecx 0
    mov eax 0
    mov ebx D@String2

    mov edx D$esi
    cmp edx D$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

        mov ecx 3
        shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
        inc ecx | jmp L1> ; else, ecx = 4

    L0:
        test dh dh | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+4 ; compute the len of the string1
        mov edx D$esi+ecx
        cmp D$ebx+ecx edx
        jz L0<

        jmp L3>> ; strings definitelly dont match

L1:

        lea ebx D$ebx+ecx | mov edx D$ebx-3
        lea esi D$esi+ecx
        cmp D$esi-3 edx; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
        jnz L3>>
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
    ; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
    ; On the dword, we need to check the byte after the dx like: 00XX_0000
    If B$ebx-1 = 0
        inc eax
    End_If

    Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
    and both are exactly the same.
;;

    ; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
    inc eax


L3:

EndP


Tested strings:


    call FastStringCompare {"1", 0}, {"g", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gug", 0}, {"gug", 0}
    call FastStringCompare {"guga", 0}, {"guga", 0}
    call FastStringCompare {"guga1", 0}, {"guga1", 0}
    call FastStringCompare {"guga12", 0}, {"guga12", 0}
    call FastStringCompare {"guga123", 0}, {"guga123", 0}
    call FastStringCompare {"guga1234", 0}, {"guga1234", 0}
    call FastStringCompare {"guga12345", 0}, {"guga12345", 0}
    call FastStringCompare {"guga123456", 0}, {"guga123456", 0}
   
    call FastStringCompare {"1234", 0}, {"5678", 0}
    call FastStringCompare {"1", 0}, {"g", 0}
    call FastStringCompare {"gustavo trigu", 0}, {"gustavo trigueiros e guilherme", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"guu", 0}
    call FastStringCompare {"guuu", 0}, {"guu", 0}
    call FastStringCompare {"guu", 0}, {"guuu", 0}
    call FastStringCompare {"gug", 0}, {"guga", 0}
    call FastStringCompare {"g44564", 0}, {"112345g", 0}
    call FastStringCompare {"gu", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    call FastStringCompare {"1", 0}, {"g", 0}

        call FastStringCompare {"g1stavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gu1tavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gus1avo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gust1vo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gusta1o trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustav1 trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo1trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo 1rigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilherm1", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilher1e", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilhe1me", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilh1rme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guil1erme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e gui1herme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e gu1lherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e g1ilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e 1uilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e1guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros 1 guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros1e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiro1 e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueir1s e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}

        call FastStringCompare {"guga", 0}, {"guga1", 0}
        call FastStringCompare {"guga1", 0}, {"guga", 0}



________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________

OLDER VERSION 29/01/2013 - Aligned data. It is fast, but the above one is faster.

Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    mov ecx 0
    mov eax 0
    mov ebx D@String2

    mov edx D$esi
    cmp edx D$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

        mov ecx 3
        shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
        inc ecx | jmp L1> ; else, ecx = 4

    L0:
        cmp dh 0 | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+4 ; compute the len of the string1
        mov edx D$esi+ecx
        cmp D$ebx+ecx edx
        jz L0<

        jmp L3>> ; strings definitelly dont match

L1:

        lea ebx D$ebx+ecx | mov edx D$ebx-3
        lea esi D$esi+ecx
        cmp edx D$esi-3 ; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
        jnz L3>>
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
    ; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
    ; On the dword, we need to check the byte after the dx like: 00XX_0000
    If B$ebx-1 = 0
        inc eax
    End_If

    Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
    and both are exactly the same.
;;

    ; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
    inc eax

L3:

EndP



OLDER VERSION 28/01/2013 - Unaligned data and some mistake when the string is only 1 byte len


; Return values:
; Eax = 0. Strings don´t match
; Eax = 1. Strings matches

Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    xor ecx ecx
    xor eax eax
    xor edx edx
    mov ebx D@String2

    mov ax W$ebx
    mov dx W$esi
    xor ax dx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.
    L0:
        or dh 0 | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+2 ; compute the len of the string1
        mov dx W$esi+ecx
        cmp W$ebx+ecx dx
        jz L0<

L1:

        lea ebx D$ebx+ecx | mov ax W$ebx-1
        lea esi D$esi+ecx | mov dx W$esi-1
        xor ax dx ; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes

        jz L1> | L2: | xor eax eax | ExitP | L1:
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
L1:

    ; do again for the last bytes
    ; so far all bytes are equal

    mov eax &TRUE
    .If ecx = 0 ; when len (ecx) is zero it means that we the string is smaller then 2 bytes (including null terminated)
        If B$ebx+1 <> 0 ; so all we need is analyse if the last byte if the string2 is zero (Because the 1st one of both matches)
            xor eax eax
        End_If
    .Else_If B$ebx <> 0
        xor eax eax
    .End_If

EndP


Code examples

    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"guu", 0}
    call FastStringCompare {"guuu", 0}, {"guu", 0}
    call FastStringCompare {"guu", 0}, {"guuu", 0}
    call FastStringCompare {"gug", 0}, {"guga", 0}
    call FastStringCompare {"g44564", 0}, {"112345g", 0}
    call FastStringCompare {"gu", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    mov ecx 0-1
    Do
        call FastStringCompare {"gustavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        dec ecx
    Loop_Until ecx = 0
    call FastStringCompare {"guga", 0}, {"guga1", 0}
    call FastStringCompare {"guga1", 0}, {"guga", 0}
Title: Re: String Compare
Post by: MichaelW on January 28, 2013, 02:29:07 PM
Judging from the function test results, my translation to MASM code is apparently correct.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
    str1 db "The MASM32 SDK has a large body of example code that deals",0
    str2 db "The MASM32 SDK has a large body of example code that deal",0
.code
;==============================================================================
FastStringCompare proc uses esi ecx ebx edx string1:DWORD, string2:DWORD

    mov esi, string1
    xor ecx, ecx
    xor eax, eax
    xor edx, edx
    mov ebx, string2

    mov ax, [ebx]
    mov dx, [esi]
    xor ax, dx          ; just to force the zero flags.
                        ; if we have a match zero flag is settled.
                        ; But we need to check if the last byte at
                        ; string1 is zero.
  L0:
    or dh, 0
    jz L1               ; the last byte of the 1st string is zero.
                        ; We reached the end of the search.
                        ; Contunue computing the remaining bytes at "L1"
    lea ecx, [ecx+2]    ; compute the len of the string1
    mov dx, [esi+ecx]
    cmp [ebx+ecx], dx
    jz L0

  L1:
    lea ebx, [ebx+ecx]
    mov ax, [ebx-1]
    lea esi, [esi+ecx]
    mov dx, [esi-1]
    xor ax, dx          ; if both words on string1 and string2 don't
                        ; match jmp over. Otherwise, continue analysing
                        ;the remaining bytes
    jz L2
    xor eax, eax
    ret

  L2:                   ; We are almost there. If ecx = 0, means that the
                        ; string1 have only 2 byte len. So let´t search
                        ; for the 2nd byte of string2.

    ; do again for the last bytes
    ; so far all bytes are equal

    mov eax, TRUE
    .if ecx == 0        ; when len (ecx) is zero it means that we the
                        ; string is smaller then 2 bytes (including null
                        ; terminated).
        .if byte ptr [ebx+1]
                        ; so all we need is analyse if the last byte if
                        ; the string2 is zero (Because the 1st one of both
                        ; matches)
            xor eax, eax
        .endif
    .elseif byte ptr [ebx]
        xor eax, eax
    .endif

    ret

FastStringCompare endp

;==============================================================================
start:
;==============================================================================

    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guuu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guu"), chr$("guuu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gug"), chr$("guga")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g44564"), chr$("112345g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("g")
    printf("(should be 1)\t%d\n\n", eax)

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    invoke Sleep, 5000

    REPEAT 3

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, crt_strcmp str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, FastStringCompare str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, crt_strcmp str1, str2\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, FastStringCompare str1, str2\n\n", eax)

    ENDM

    inkey
    exit
;==============================================================================
end start


Running on a P3:

321 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
308 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
308 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2


Running on a P4 Northwood:

199 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
197 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2

195 cycles, crt_strcmp str1, str1
272 cycles, FastStringCompare str1, str1
194 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2

199 cycles, crt_strcmp str1, str1
272 cycles, FastStringCompare str1, str1
185 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2


I think eliminating the partial-register (and misaligned) accesses should improve the speed significantly.
Title: Re: String Compare
Post by: Ficko on January 28, 2013, 09:56:50 PM
Hi Guga,

For some times ago I was playing with string compare and tried aligning the strings.

I have no Idea how fast or "slow" :P is this algo just thought to post it for interest.


; STRCMP$(lpString1:STRING,lpString2:STRING),INT
;lpString1 < lpString2 EAX = -n : lpString1 > lpString2 EAX = +n : lpString1 = lpString2 EAX = 0
;Flags are set as well
; =============== S U B R O U T I N E =======================================
align 4
STRCMP$ proc public uses edi esi ebx String1:DWORD, String2:DWORD
mov esi, String1
mov edi, String2
cmp edi, esi
jz Exit_I
xor edx, edx
xor eax, eax
xor ecx, ecx
mov dl, 3
@@: test esi, edx
jz aligned
lodsb
mov cl, [edi]
sub eax, ecx
jnz Exit
jecxz Exit
inc edi
jmp @B
notequ: xchg eax, edx
@@: movzx eax, dl
movzx ecx, bl
sub eax, ecx
jnz Exit
jecxz Exit
shr edx, 8
shr ebx, 8
jmp @B
aligned:mov ecx, 80808080h ;ECX = 0x80808080
mov edx, 1010101h    ;EDX = 0x1010101
@@: mov eax, [esi]
mov ebx, [edi]
add esi, 4
cmp eax, ebx
jnz notequ
add edi, 4
sub eax, edx
and eax, ecx
je @B
not ebx
and eax, ebx
je @B
Exit_I: xor eax, eax
Exit: ret 8
STRCMP$ endp
Title: Re: String Compare
Post by: dedndave on January 29, 2013, 12:26:16 AM
the problem with string comparison is one string may be aligned when the other is not

if you want maximum speed, i would think you would access both strings as aligned dwords
you may wind up writing 4 different loops to do the grunt-work

some testing may be required to find out if the in-register re-alignment
effort is actually faster than un-aligned accesses (for one of the strings)
Title: Re: String Compare
Post by: MichaelW on January 29, 2013, 12:56:44 AM
My first test had a problem in that I failed to DWORD align my two test strings in the data section. Aligning them dropped the FastStringCompare (P3) cycle counts from ~274 and ~150 to ~154 and ~149. Then I determined that these two strings had to be the same length to get the correct comparison result. To fix this I simply made the strings the same length, but there is some problem either in the original code or in my translation. I was able to eliminate a few of the partial-register accesses, but not all of them.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
    align 4
    str1 db "The MASM32 SDK has a large body of example code that deals",0
    align 4
    str2 db "The MASM32 SDK has a large body of example code that deal ",0
.code
;==============================================================================
FastStringCompare proc uses esi ecx ebx edx string1:DWORD, string2:DWORD

    mov esi, string1
    xor ecx, ecx
    xor eax, eax
    xor edx, edx
    mov ebx, string2

    ;; ELIMINATED PARTIAL REGISTER ACCESSES HERE

    movzx eax, WORD PTR [ebx]
    movzx edx, WORD PTR [esi]
    xor eax, edx        ; just to force the zero flags.
                        ; if we have a match zero flag is settled.
                        ; But we need to check if the last byte at
                        ; string1 is zero.
  L0:
 
    or dh, 0
    jz L1               ; the last byte of the 1st string is zero.
                        ; We reached the end of the search.
                        ; Contunue computing the remaining bytes at "L1"
    lea ecx, [ecx+2]    ; compute the len of the string1

    mov dx, [esi+ecx]
    cmp [ebx+ecx], dx
    jz L0

  L1:
    lea ebx, [ebx+ecx]
    mov ax, [ebx-1]
    lea esi, [esi+ecx]
    mov dx, [esi-1]
    xor ax, dx          ; if both words on string1 and string2 don't
                        ; match jmp over. Otherwise, continue analysing
                        ; the remaining bytes
    jz L2
    xor eax, eax
    ret

  L2:                   ; We are almost there. If ecx = 0, means that the
                        ; string1 have only 2 byte len. So let´t search
                        ; for the 2nd byte of string2.

    ; do again for the last bytes
    ; so far all bytes are equal

    mov eax, TRUE
    .if ecx == 0        ; when len (ecx) is zero it means that we the
                        ; string is smaller then 2 bytes (including null
                        ; terminated).
        .if byte ptr [ebx+1]
                        ; so all we need is analyse if the last byte if
                        ; the string2 is zero (Because the 1st one of both
                        ; matches)
            xor eax, eax
        .endif
    .elseif byte ptr [ebx]
        xor eax, eax
    .endif

    ret

FastStringCompare endp

;==============================================================================
start:
;==============================================================================

    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guuu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guu"), chr$("guuu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gug"), chr$("guga")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g44564"), chr$("112345g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("g")
    printf("(should be 1)\t%d\n\n", eax)

    invoke FastStringCompare, chr$("gustavo trigueiros e guilherme"),
                              chr$("gustavo trigueiros e guilherme")
    printf("(should be 1)\t%d\n\n", eax)

    invoke FastStringCompare, chr$("guga"), chr$("guga1")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guga1"), chr$("guga")
    printf("(should be 0)\t%d\n\n", eax)

    invoke FastStringCompare, ADDR str1, ADDR str1
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, ADDR str2, ADDR str2
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, ADDR str1, ADDR str2
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, ADDR str2, ADDR str1
    printf("(should be 0)\t%d\n\n", eax)

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    invoke Sleep, 5000

    REPEAT 3

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, crt_strcmp str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, FastStringCompare str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, crt_strcmp str1, str2\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, FastStringCompare str1, str2\n\n", eax)

    ENDM

    inkey
    exit
;==============================================================================
end start


P3:

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
124 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
123 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
124 cycles, FastStringCompare str1, str2


P4 Northwood:

193 cycles, crt_strcmp str1, str1
151 cycles, FastStringCompare str1, str1
194 cycles, crt_strcmp str1, str2
147 cycles, FastStringCompare str1, str2

196 cycles, crt_strcmp str1, str1
152 cycles, FastStringCompare str1, str1
196 cycles, crt_strcmp str1, str2
152 cycles, FastStringCompare str1, str2

196 cycles, crt_strcmp str1, str1
151 cycles, FastStringCompare str1, str1
196 cycles, crt_strcmp str1, str2
149 cycles, FastStringCompare str1, str2

Title: Re: String Compare
Post by: jj2007 on January 29, 2013, 01:40:43 AM
FastStringCompare looks competitive, but is the return value correct?

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 238/100 cycles


6929    cycles for 100 * crt_strcmp
7926    cycles for 100 * FastStringCompare
5713    cycles for 100 * MasmBasic StringsDiffer
6034    cycles for 100 * STRCMP$

6936    cycles for 100 * crt_strcmp
7925    cycles for 100 * FastStringCompare
5714    cycles for 100 * MasmBasic StringsDiffer
6033    cycles for 100 * STRCMP$

6930    cycles for 100 * crt_strcmp
7920    cycles for 100 * FastStringCompare
5720    cycles for 100 * MasmBasic StringsDiffer
6033    cycles for 100 * STRCMP$

6928    cycles for 100 * crt_strcmp
7924    cycles for 100 * FastStringCompare
5716    cycles for 100 * MasmBasic StringsDiffer
6034    cycles for 100 * STRCMP$

6933    cycles for 100 * crt_strcmp
7985    cycles for 100 * FastStringCompare
5717    cycles for 100 * MasmBasic StringsDiffer
6048    cycles for 100 * STRCMP$

-1      = eax crt_strcmp
0       = eax FastStringCompare
-23     = eax MasmBasic StringsDiffer
-23     = eax STRCMP$
Title: Re: String Compare
Post by: Ficko on January 29, 2013, 01:46:36 AM
Mine above rocks despite using some slow OP codes like "jecxz". :bgrin:

Quote
130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP$ str1, str1
128 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
51 cycles, STRCMP$ str1, str2

130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP$ str1, str1
127 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
51 cycles, STRCMP$ str1, str2

130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP$ str1, str1
127 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
50 cycles, STRCMP$ str1, str2

Title: Re: String Compare
Post by: MichaelW on January 29, 2013, 02:02:34 AM
The CRT strcmp function returns 0 if the strings match, < 0 if string1 < string2, or > 0 if string1 is > string2, allowing it to be used directly as a comparison function for qsort and bsearch.
Title: Re: String Compare
Post by: jj2007 on January 29, 2013, 02:03:32 AM
Quote from: Ficko on January 29, 2013, 01:46:36 AM
Mine above rocks despite using some slow OP codes like "jecxz". :bgrin:

It's pretty good for a non-SSE2 algo - I added it above :t

@Michael: I meant FastStringCompare, it returns zero for some reason - see above.
Title: Re: String Compare
Post by: Ficko on January 29, 2013, 02:14:27 AM
Quote from: MichaelW on January 29, 2013, 02:02:34 AM
The CRT strcmp function returns 0 if the strings match, < 0 if string1 < string2, or > 0 if string1 is > string2, allowing it to be used directly as a comparison function for qsort and bsearch.

STRCMP$ mimics this behavior as well - additionaly it set's the flags too so "eax" do not have to be checked. -

I designed it for my QSORT algo indeed.  ;)
Title: Re: String Compare
Post by: MichaelW on January 29, 2013, 02:27:52 AM
FastStringCompare is returning false because the strings do not match.
Title: Re: String Compare
Post by: jj2007 on January 29, 2013, 02:49:46 AM
Quote from: MichaelW on January 29, 2013, 02:27:52 AM
FastStringCompare is returning false because the strings do not match.

Make them match and see what it returns.
Title: Re: String Compare
Post by: MichaelW on January 29, 2013, 03:05:41 AM
Quote from: jj2007 on January 29, 2013, 02:49:46 AM
Quote from: MichaelW on January 29, 2013, 02:27:52 AM
FastStringCompare is returning false because the strings do not match.

Make them match and see what it returns.

In my tests the return value is correct only if the strings are aligned (I used a 4-byte alignment and did not investigate other alignments). I intended to add this information to my second post, but forgot to do so.
Title: Re: String Compare
Post by: jj2007 on January 29, 2013, 03:19:29 AM
Strange - I get this:

align 4  ; not equal
Str1   db "gustavo trigueiros e guilhermeA", 0
align 4
Str2   db "gustavo trigueiros e guilhermeX", 0

6930    cycles for 100 * crt_strcmp
8307    cycles for 100 * FastStringCompare
4738    cycles for 100 * MasmBasic StringsDiffer
6028    cycles for 100 * STRCMP$

-1      = eax crt_strcmp
0       = eax FastStringCompare
-23     = eax MasmBasic StringsDiffer
-23     = eax STRCMP$


7910    cycles for 100 * crt_strcmp
8122    cycles for 100 * FastStringCompare
4741    cycles for 100 * MasmBasic StringsDiffer
3921    cycles for 100 * STRCMP$

align 4  ; EQUAL
Str1   db "gustavo trigueiros e guilhermeX", 0
align 4
Str2   db "gustavo trigueiros e guilhermeX", 0
0       = eax crt_strcmp
0       = eax FastStringCompare
0       = eax MasmBasic StringsDiffer
0       = eax STRCMP$
Title: Re: String Compare
Post by: guga on January 29, 2013, 09:19:10 AM
Hi guys, many tks for testing :)

Note: The return value for this function is only &TRUE or &FALSE


I didn´t coded it to return 0-1 (The cases that we have a partial match) because i wasn´t thinking on the qsort and bsearch functions. (I´ll see if i can implement this return, if it don´t slow down the resultant timmings)

About the alignment of the strings, indeed, i agree, it would (in theory) speed up the scanning process in 2x, approximately , but, how compilers align their strings (in 2 byte or 4) ? This is because i thought in 2 byte alignment/scanning because the minimum null terminated string to be compared is always a char+zero (Ex: "a", 0). But, what happens when we are dealing with a string inside a structure for example ? Or inside another chain of bytes (Like an array, etc), or even if this strings is in coded in the .text section in the middle of a function (Like in 3dframes.exe from masm package, for example) ?

Or even worst, what if the string ends on the end of a memory chunck ? I mean, if we have a string the looks like this
.data section --->
Guga db "Hello", 0
Virtual Memory: B$ ? ? ?
(...)
.idata section.

If the function starts comparing in a 4 byte aligned when it reach the 5th byte "o, 0", wouldn´t cause the app to crash ? (Sorry for the dumb question, but i´m not sure if all compilers are 4 byte aligned)

Have anyone knows if all compilers align their strings (or others data) always in a 4 byte boundary ?

If it is always aligned on a 4 byte, i can try to implement a alignment strings for the comparison and see the results.
Title: Re: String Compare
Post by: MichaelW on January 29, 2013, 11:16:06 AM
Quote from: jj2007 on January 29, 2013, 03:19:29 AM
Strange - I get this...

Yes, if I substitute your strings in my test app, then both of these comparisons return zero:

    invoke FastStringCompare, ADDR str1, ADDR str1
    invoke FastStringCompare, ADDR str2, ADDR str2


But if I add a space to the end of both strings, then both comparisons return 1, so the problem appears to be with the length of the strings.

And if I then comment out the align directives, both comparisons return zero. So the code is sensitive to the length and the alignment.

The above applies for my P3 and my P4.
Title: Re: String Compare
Post by: MichaelW on January 29, 2013, 11:59:49 AM
Quote from: guga on January 29, 2013, 09:19:10 AM
Have anyone knows if all compilers align their strings (or others data) always in a 4 byte boundary?

I would guess that mainstream compilers do. The Microsoft compilers appear to maintain a minimum 4-byte alignment for any string longer than the null terminator, but I'm not sure that my test data layout is correct for what I'm trying to determine.

#include <conio.h>
#include <stdio.h>
#include <stdlib.h>

int alignment( void *address )
{
    int rval;
    __asm
    {
      xor eax, eax
      mov ecx, address
      bsf ecx, ecx
      jz done
      mov eax, 1
      Shl eax, cl
    done:
      mov rval, eax
    }
    return rval;
}

char x6[6],x5[5],x4[4],x3[3],x2[2],x1[1],y1[1],y2[2],y3[3],y4[4],y5[5],y6[6];

void main()
{
  char i6[6],i5[5],i4[4],i3[3],i2[2],i1[1],j1[1],j2[2],j3[3],j4[4],j5[5],j6[6];

  printf("%d\n",alignment(i6));
  printf("%d\n",alignment(i5));
  printf("%d\n",alignment(i4));
  printf("%d\n",alignment(i3));
  printf("%d\n",alignment(i2));
  printf("%d\n",alignment(i1));
  printf("%d\n",alignment(j1));
  printf("%d\n",alignment(j2));
  printf("%d\n",alignment(j3));
  printf("%d\n",alignment(j4));
  printf("%d\n",alignment(j5));
  printf("%d\n\n",alignment(j6));

  printf("%d\n",alignment(x6));
  printf("%d\n",alignment(x5));
  printf("%d\n",alignment(x4));
  printf("%d\n",alignment(x3));
  printf("%d\n",alignment(x2));
  printf("%d\n",alignment(x1));
  printf("%d\n",alignment(y1));
  printf("%d\n",alignment(y2));
  printf("%d\n",alignment(y3));
  printf("%d\n",alignment(y4));
  printf("%d\n",alignment(y5));
  printf("%d\n\n",alignment(y6));

  getch();
}


8
64
16
8
4
1
1
4
8
32
4
8

16
32
4
128
4
1
2
8
4
8
8
8


; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.3077

TITLE test.c
.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
$$SYMBOLS SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif

INCLUDELIB LIBC
INCLUDELIB OLDNAMES

_DATA SEGMENT
COMM _x6:BYTE:06H
COMM _x5:BYTE:05H
COMM _x4:BYTE:04H
COMM _x3:BYTE:03H
COMM _x2:BYTE:02H
COMM _x1:BYTE
COMM _y1:BYTE
COMM _y2:BYTE:02H
COMM _y3:BYTE:03H
COMM _y4:BYTE:04H
COMM _y5:BYTE:05H
COMM _y6:BYTE:06H
$SG1250 DB '%d', 0aH, 00H
$SG1251 DB '%d', 0aH, 00H
$SG1252 DB '%d', 0aH, 00H
$SG1253 DB '%d', 0aH, 00H
$SG1254 DB '%d', 0aH, 00H
$SG1255 DB '%d', 0aH, 00H
$SG1256 DB '%d', 0aH, 00H
$SG1257 DB '%d', 0aH, 00H
$SG1258 DB '%d', 0aH, 00H
$SG1259 DB '%d', 0aH, 00H
$SG1260 DB '%d', 0aH, 00H
$SG1261 DB '%d', 0aH, 0aH, 00H
ORG $+3
$SG1262 DB '%d', 0aH, 00H
$SG1263 DB '%d', 0aH, 00H
$SG1264 DB '%d', 0aH, 00H
$SG1265 DB '%d', 0aH, 00H
$SG1266 DB '%d', 0aH, 00H
$SG1267 DB '%d', 0aH, 00H
$SG1268 DB '%d', 0aH, 00H
$SG1269 DB '%d', 0aH, 00H
$SG1270 DB '%d', 0aH, 00H
$SG1271 DB '%d', 0aH, 00H
$SG1272 DB '%d', 0aH, 00H
$SG1273 DB '%d', 0aH, 0aH, 00H
_DATA ENDS
PUBLIC _alignment
; Function compile flags: /Odt
_TEXT SEGMENT
_rval$ = -4 ; size = 4
_address$ = 8 ; size = 4
_alignment PROC NEAR
; File c:\program files\microsoft visual c++ toolkit 2003\my\alignment\test.c
; Line 6
push ebp
mov ebp, esp
push ecx
; Line 10
xor eax, eax
; Line 11
mov ecx, DWORD PTR _address$[ebp]
; Line 12
bsf ecx, ecx
; Line 13
je SHORT $done$1223
; Line 14
mov eax, 1
; Line 15
shl eax, cl
$done$1223:
; Line 17
mov DWORD PTR _rval$[ebp], eax
; Line 19
mov eax, DWORD PTR _rval$[ebp]
; Line 20
mov esp, ebp
pop ebp
ret 0
_alignment ENDP
_TEXT ENDS
PUBLIC _main
EXTRN _getch:NEAR
EXTRN _printf:NEAR
; Function compile flags: /Odt
_TEXT SEGMENT
_i2$ = -64 ; size = 2
_j6$ = -60 ; size = 6
_i4$ = -52 ; size = 4
_j1$ = -45 ; size = 1
_i3$ = -44 ; size = 3
_j2$ = -40 ; size = 2
_i5$ = -36 ; size = 5
_j3$ = -28 ; size = 3
_j5$ = -24 ; size = 5
_i1$ = -13 ; size = 1
_i6$ = -12 ; size = 6
_j4$ = -4 ; size = 4
_main PROC NEAR
; Line 25
push ebp
mov ebp, esp
sub esp, 64 ; 00000040H
; Line 28
lea eax, DWORD PTR _i6$[ebp]
push eax
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1250
call _printf
add esp, 8
; Line 29
lea ecx, DWORD PTR _i5$[ebp]
push ecx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1251
call _printf
add esp, 8
; Line 30
lea edx, DWORD PTR _i4$[ebp]
push edx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1252
call _printf
add esp, 8
; Line 31
lea eax, DWORD PTR _i3$[ebp]
push eax
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1253
call _printf
add esp, 8
; Line 32
lea ecx, DWORD PTR _i2$[ebp]
push ecx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1254
call _printf
add esp, 8
; Line 33
lea edx, DWORD PTR _i1$[ebp]
push edx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1255
call _printf
add esp, 8
; Line 34
lea eax, DWORD PTR _j1$[ebp]
push eax
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1256
call _printf
add esp, 8
; Line 35
lea ecx, DWORD PTR _j2$[ebp]
push ecx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1257
call _printf
add esp, 8
; Line 36
lea edx, DWORD PTR _j3$[ebp]
push edx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1258
call _printf
add esp, 8
; Line 37
lea eax, DWORD PTR _j4$[ebp]
push eax
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1259
call _printf
add esp, 8
; Line 38
lea ecx, DWORD PTR _j5$[ebp]
push ecx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1260
call _printf
add esp, 8
; Line 39
lea edx, DWORD PTR _j6$[ebp]
push edx
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1261
call _printf
add esp, 8
; Line 42
push OFFSET FLAT:_x6
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1262
call _printf
add esp, 8
; Line 43
push OFFSET FLAT:_x5
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1263
call _printf
add esp, 8
; Line 44
push OFFSET FLAT:_x4
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1264
call _printf
add esp, 8
; Line 45
push OFFSET FLAT:_x3
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1265
call _printf
add esp, 8
; Line 46
push OFFSET FLAT:_x2
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1266
call _printf
add esp, 8
; Line 47
push OFFSET FLAT:_x1
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1267
call _printf
add esp, 8
; Line 48
push OFFSET FLAT:_y1
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1268
call _printf
add esp, 8
; Line 49
push OFFSET FLAT:_y2
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1269
call _printf
add esp, 8
; Line 50
push OFFSET FLAT:_y3
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1270
call _printf
add esp, 8
; Line 51
push OFFSET FLAT:_y4
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1271
call _printf
add esp, 8
; Line 52
push OFFSET FLAT:_y5
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1272
call _printf
add esp, 8
; Line 53
push OFFSET FLAT:_y6
call _alignment
add esp, 4
push eax
push OFFSET FLAT:$SG1273
call _printf
add esp, 8
; Line 55
call _getch
; Line 56
xor eax, eax
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END


Edit:

Analyzing this further, and noting in the listing that the compiler had changed the order of the local strings in main, but not the order of the COMM strings in the DATA section, I added this code:

  printf("%d\n",&x6);
  printf("%d\n",&x5);
  printf("%d\n",&x4);
  printf("%d\n",&x3);
  printf("%d\n",&x2);
  printf("%d\n",&x1);
  printf("%d\n",&y1);
  printf("%d\n",&y2);
  printf("%d\n",&y3);
  printf("%d\n",&y4);
  printf("%d\n",&y5);
  printf("%d\n\n",&y6);


And got this output:

4233648
4233632
4233628
4233664
4233668
4233677
4233662
4233624
4233644
4233640
4233672
4233656


While I cannot readily equate the layout of the addresses to the compiler's layout of the local strings, the data has obviously been shuffled. So the question is, who did the shuffling? And the answer from the MASM 6.0 Programmer's Guide, under Selecting Data-Sharing Methods, is:
Quote
As an alternative, you can use the COMM directive instead of PUBLIC and EXTERN. However, communal variables have some limitations. You cannot depend on their location in memory because they are allocated by the linker, and they cannot be initialized.
Title: Re: String Compare
Post by: guga on January 29, 2013, 05:10:41 PM
Ok, guys, i guess i finished with string alignment. Didn´t tested the performance. So feel free to test it.

On the strings i tested the result is accurate. (0 if don´t match, 1 if match) - If this new version is fast i´ll implement the return 0-1 on it.


Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    mov ecx 0
    mov eax 0
    mov ebx D@String2

    mov edx D$esi
    cmp edx D$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

        mov ecx 3
        shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
        inc ecx | jmp L1> ; else, ecx = 4

    L0:
        cmp dh 0 | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+4 ; compute the len of the string1
        mov edx D$esi+ecx
        cmp D$ebx+ecx edx
        jz L0<

        jmp L3>> ; strings definitelly dont match

L1:

        lea ebx D$ebx+ecx | mov edx D$ebx-3
        lea esi D$esi+ecx
        cmp edx D$esi-3 ; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
        jnz L3>>
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
    ; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
    ; On the dword, we need to check the byte after the dx like: 00XX_0000
    If B$ebx-1 = 0
        inc eax
    End_If

    Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
    and both are exactly the same.
;;

    ; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
    inc eax

L3:

EndP


Tested strings:


    call FastStringCompare {"1", 0}, {"g", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gug", 0}, {"gug", 0}
    call FastStringCompare {"guga", 0}, {"guga", 0}
    call FastStringCompare {"guga1", 0}, {"guga1", 0}
    call FastStringCompare {"guga12", 0}, {"guga12", 0}
    call FastStringCompare {"guga123", 0}, {"guga123", 0}
    call FastStringCompare {"guga1234", 0}, {"guga1234", 0}
    call FastStringCompare {"guga12345", 0}, {"guga12345", 0}
    call FastStringCompare {"guga123456", 0}, {"guga123456", 0}
   
    call FastStringCompare {"1234", 0}, {"5678", 0}
    call FastStringCompare {"1", 0}, {"g", 0}
    call FastStringCompare {"gustavo trigu", 0}, {"gustavo trigueiros e guilherme", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"guu", 0}
    call FastStringCompare {"guuu", 0}, {"guu", 0}
    call FastStringCompare {"guu", 0}, {"guuu", 0}
    call FastStringCompare {"gug", 0}, {"guga", 0}
    call FastStringCompare {"g44564", 0}, {"112345g", 0}
    call FastStringCompare {"gu", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    call FastStringCompare {"1", 0}, {"g", 0}

        call FastStringCompare {"g1stavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gu1tavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gus1avo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gust1vo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gusta1o trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustav1 trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo1trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo 1rigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilherm1", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilher1e", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilhe1me", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilh1rme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guil1erme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e gui1herme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e gu1lherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e g1ilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e 1uilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e1guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros 1 guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros1e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiro1 e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueir1s e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}

        call FastStringCompare {"guga", 0}, {"guga1", 0}
        call FastStringCompare {"guga1", 0}, {"guga", 0}
Title: Re: String Compare - Updated
Post by: guga on January 29, 2013, 06:10:01 PM
The preliminary results i´ve got is
Not sure if the result is accurated because i wasn´t be able to assemble the masm app from JJ2007, so i just disassembled it, inserted my code and reassembled it back. But comparing the older results i guess it is faster then my previous version.

Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (SE4)
loop overhead is approx. 189/100 cycles


7185    cycles for 100 * crt_strcmp
2060    cycles for 100 * FastStringCompare
2058    cycles for 100 * MasmBasic StringsDiffer
3325    cycles for 100 * STRCMP$

7211    cycles for 100 * crt_strcmp
2057    cycles for 100 * FastStringCompare
2059    cycles for 100 * MasmBasic StringsDiffer
3216    cycles for 100 * STRCMP$

7177    cycles for 100 * crt_strcmp
2058    cycles for 100 * FastStringCompare
2062    cycles for 100 * MasmBasic StringsDiffer
3219    cycles for 100 * STRCMP$

7178    cycles for 100 * crt_strcmp
2057    cycles for 100 * FastStringCompare
2059    cycles for 100 * MasmBasic StringsDiffer
3309    cycles for 100 * STRCMP$

7179    cycles for 100 * crt_strcmp
2208    cycles for 100 * FastStringCompare
2073    cycles for 100 * MasmBasic StringsDiffer
3222    cycles for 100 * STRCMP$

-1      = eax crt_strcmp
0       = eax FastStringCompare
-23     = eax MasmBasic StringsDiffer
-23     = eax STRCMP$

--- ok ---
Title: Re: String Compare
Post by: jj2007 on January 29, 2013, 06:28:00 PM
Quote from: guga on January 29, 2013, 09:19:10 AMDoes anyone know if all compilers align their strings (or other data) always on a 4 byte boundary ?

Freshly created strings are likely to be 8 byte aligned, due to HeapAlloc. But a library function must be able to handle any alignment, e.g. when you pass a pointer to a substring. There shouldn't be any problem, though, unless you use SSE2. The MasmBasic StringsDiffer() does indeed use SSE2 and has some overhead, which makes it relatively slow for very short strings.

Here are two examples for a somewhat longer string, showing the influence of alignment:

Str1   db "gustavo trigueiros e guilherme alias 'guga' is today the brain behind RosAsm, one of the most powerful assemblers ever developedA", 0
Str2   db "gustavo ...X", 0


align 4 for both strings:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
2894    cycles for 10 * crt_strcmp
2315    cycles for 10 * FastStringCompare
1186    cycles for 10 * MasmBasic StringsDiffer
1964    cycles for 10 * STRCMP$


align 16 for both strings:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
2892    cycles for 10 * crt_strcmp
2309    cycles for 10 * FastStringCompare
969     cycles for 10 * MasmBasic StringsDiffer
1954    cycles for 10 * STRCMP$

Title: Re: String Compare - Updated
Post by: guga on January 29, 2013, 06:58:55 PM
Hi JJ, thanks for the nice words :)

I built a new version (using alignment) and just made a new improve:

Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    mov ecx 0
    mov eax 0
    mov ebx D@String2

    mov edx D$esi
    cmp edx D$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

        mov ecx 3
        shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
        inc ecx | jmp L1> ; else, ecx = 4

    L0:
        test dh dh | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+4 ; compute the len of the string1
        mov edx D$esi+ecx
        cmp D$ebx+ecx edx
        jz L0<

        jmp L3>> ; strings definitelly dont match

L1:

        lea ebx D$ebx+ecx | mov edx D$ebx-3
        lea esi D$esi+ecx
        cmp D$esi-3 edx; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
        jnz L3>>
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
    ; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
    ; On the dword, we need to check the byte after the dx like: 00XX_0000
    If B$ebx-1 = 0
        inc eax
    End_If

    Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
    and both are exactly the same.
;;

    ; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
    inc eax


L3:

EndP


Using the same strings as the post above i got this results:
Str1   db "gustavo trigueiros e guilherme alias 'guga' is today the brain behind RosAsm, one of the most powerful assemblers ever developedA", 0
Str2   db "gustavo ...X", 0

Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (£E4)
loop overhead is approx. 181/100 cycles

2103 cycles for 100 * crt_strcmp
1087 cycles for 100 * FastStringCompare
1774 cycles for 100 * MasmBasic StringsDiffer
1599 cycles for 100 * STRCMP$

2100 cycles for 100 * crt_strcmp
1091 cycles for 100 * FastStringCompare
1768 cycles for 100 * MasmBasic StringsDiffer
1593 cycles for 100 * STRCMP$

2479 cycles for 100 * crt_strcmp
1559 cycles for 100 * FastStringCompare
1849 cycles for 100 * MasmBasic StringsDiffer
1698 cycles for 100 * STRCMP$

2182 cycles for 100 * crt_strcmp
1087 cycles for 100 * FastStringCompare
1781 cycles for 100 * MasmBasic StringsDiffer
1607 cycles for 100 * STRCMP$

2109 cycles for 100 * crt_strcmp
1090 cycles for 100 * FastStringCompare
1774 cycles for 100 * MasmBasic StringsDiffer
1592 cycles for 100 * STRCMP$

1 = eax crt_strcmp
0 = eax FastStringCompare
70 = eax MasmBasic StringsDiffer
70 = eax STRCMP$

--- ok ---


And this results using
Str1   db "gustavo trigueiros e guilhermeA", 0
Str2   db "gustavo trigueiros e guilhermeX", 0


Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (SE4)
loop overhead is approx. 181/100 cycles


7380 cycles for 100 * crt_strcmp
2048 cycles for 100 * FastStringCompare
2142 cycles for 100 * MasmBasic StringsDiffer
3322 cycles for 100 * STRCMP$

7889 cycles for 100 * crt_strcmp
2052 cycles for 100 * FastStringCompare
2136 cycles for 100 * MasmBasic StringsDiffer
3326 cycles for 100 * STRCMP$

7370 cycles for 100 * crt_strcmp
2361 cycles for 100 * FastStringCompare
2128 cycles for 100 * MasmBasic StringsDiffer
3334 cycles for 100 * STRCMP$

7378 cycles for 100 * crt_strcmp
2052 cycles for 100 * FastStringCompare
2132 cycles for 100 * MasmBasic StringsDiffer
3708 cycles for 100 * STRCMP$

7392 cycles for 100 * crt_strcmp
2061 cycles for 100 * FastStringCompare
2106 cycles for 100 * MasmBasic StringsDiffer
3306 cycles for 100 * STRCMP$

-1 = eax crt_strcmp
0 = eax FastStringCompare
-23 = eax MasmBasic StringsDiffer
-23 = eax STRCMP$

--- ok ---


I´m not sure if it can be even faster, but i´ll give it a try. The important (at least for me) is not only speed, but accuracy in the results.
Title: Re: String Compare - Updated
Post by: Ficko on January 29, 2013, 07:07:02 PM
Quote
Freshly created strings are likely to be 8 byte aligned, due to HeapAlloc. But a library function must be able to handle any alignment, e.g. when you pass a pointer to a substring.

That's correct. Handling unaligned data is very important.

From Guga:
Quote
Or even worst, what if the string ends on the end of a memory chunck ?

That's right though very unlikely it can happen.
If you request 1 byte trough "HeapAlloc" you will get at least 4 bytes back - if not 8 - but it is not documented by MS as such.
So what I am doing is to wrap the "HeapAlloc" API and add 3 bytes to any heap request if using 32-bit memory access or add 16 bytes if using SSE.

It is a good practice if you use SSE a lot.
Like some of us do.
I use almost exclusivelly SSE for strlen and strcpy because they are so much faster.

Like Dave sad above somewhere it is not allways possible to have both string compared as aligned.
What I am doing it to access at least one string as aligned and the other one by chance. So I get at least 50% alignment on both strings - worst case scenario -


Title: Re: String Compare - Updated
Post by: Ficko on January 29, 2013, 07:13:43 PM
And BTW the above test would be more accurate if it would generate a table of random lenght and aligned strings and use that as input. IMO
Title: Re: String Compare - Updated
Post by: dedndave on January 29, 2013, 09:01:19 PM
yah - you can't depend on the strings having any particular alignment
probably the best approach is to test all the alignment combinations in each pass of the timing test
then divide the result by the number of tests
that way, you optimize your code for overall performance

        ALIGN   4
szTest0 db 'String',0
szTestA db 'Strinx',0

        ALIGN   4
        db 0
szTest1 db 'String',0
szTestB db 'Strinx',0

        ALIGN   4
        db 0,0
szTest2 db 'String',0
szTestC db 'Strinx',0

        ALIGN   4
        db 0,0,0
szTest3 db 'String',0
szTestD db 'Strinx',0


compilers may try to align strings
but, they often wind up being partials of some previous string
or in a buffer that is not aligned, etc
or - perhaps you are comparing strings, starting at some unaligned offset

strings are byte entities in a dword world   :P
because aligned SSE is probably the fastest way, it's a 16-byte world
Title: Re: String Compare - Updated
Post by: guga on January 30, 2013, 02:08:48 AM
On the updated version, the alignment really dont seems to be any problem. I tried to updated it to handle both aligned and unaligned strings, without having to use test, cmp mnemonics for each different alignment.

Also, i just did a small improve on the code and aligned the code itself (lea and some cmp mnemonics) and the result is really impressive. It seems to be faster then JJ2007 SSE version.

Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    mov ebx D@String2
    mov eax 0
    mov ecx 4
    mov edx D$esi
    cmp edx D$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.
jz L0> ; The 1st 4 bytes are not zero, jmp over

        mov ecx 3
        shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
        inc ecx | jmp L1> ; else, ecx = 4

    L0:
        test dh dh | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+4 ; compute the len of the string1
Align 4 ; trying to align the code to avoid pipeline/stall problems, and get maximum results. In this case, it don´t insert any nop operation. This line can be commented out

        mov edx D$esi+ecx
        cmp D$ebx+ecx edx
        jz L0<

        jmp L3>> ; strings definitelly dont match

L1:
        lea ebx D$ebx+ecx
        Align 4 ; trying to align the code to avoid pipeline/stall problems, and get maximum results. In this case, it don´t insert any nop operation. This line can be commented out
        mov edx D$ebx-3
        lea esi D$esi+ecx
        Align 4 ; trying to align the code to avoid pipeline/stall problems, and get maximum results. In this case, it insert 2 nops operations
        cmp D$esi-3 edx; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
         Align 4; trying to align the code to avoid pipeline/stall problems, and get maximum results. In this case, it insert 1 nop operations
        jnz L3>>

    ; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
    inc eax


L3:

EndP




Results

Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (SE4)
loop overhead is approx. 189/100 cycles


7327 cycles for 100 * crt_strcmp
1885 cycles for 100 * FastStringCompare
2461 cycles for 100 * MasmBasic StringsDiffer
3291 cycles for 100 * STRCMP$

8410 cycles for 100 * crt_strcmp
1903 cycles for 100 * FastStringCompare
2127 cycles for 100 * MasmBasic StringsDiffer
3279 cycles for 100 * STRCMP$

7286 cycles for 100 * crt_strcmp
1847 cycles for 100 * FastStringCompare
2115 cycles for 100 * MasmBasic StringsDiffer
3278 cycles for 100 * STRCMP$

7304 cycles for 100 * crt_strcmp
1847 cycles for 100 * FastStringCompare
2094 cycles for 100 * MasmBasic StringsDiffer
3302 cycles for 100 * STRCMP$




At night i´ll try to see if this can be improved a little bit more.
Title: Re: String Compare - Updated
Post by: jj2007 on January 30, 2013, 03:54:06 AM
Interesting, guga. Can you post the exe (and source), so that we can test it?
Title: Re: String Compare - Updated
Post by: guga on January 30, 2013, 11:08:44 AM
Hi JJ

OK, i´ll post in a couple of minutes (or 1 or 2 hours) before i finish my testings for speed and reliability. So far i fixed a small mistake i made that was generating the previous report. On unaligned strings it was missing 1 single byte during the check causing the report to decrease a few ms, and the code miss 1 byte unaligned strings.

Now, it seems to be ok, i modified the code and it seems to handle all strings (aligned and unaligned). So far, the result is:


Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (SE4)
loop overhead is approx. 208/100 cycles


7589    cycles for 100 * crt_strcmp
2208    cycles for 100 * FastStringCompare
2539    cycles for 100 * MasmBasic StringsDiffer
2950    cycles for 100 * STRCMP$

7593    cycles for 100 * crt_strcmp
2223    cycles for 100 * FastStringCompare
2544    cycles for 100 * MasmBasic StringsDiffer
2946    cycles for 100 * STRCMP$

7578    cycles for 100 * crt_strcmp
2206    cycles for 100 * FastStringCompare
2545    cycles for 100 * MasmBasic StringsDiffer
2947    cycles for 100 * STRCMP$

7575    cycles for 100 * crt_strcmp
2209    cycles for 100 * FastStringCompare
2545    cycles for 100 * MasmBasic StringsDiffer
2947    cycles for 100 * STRCMP$

7658    cycles for 100 * crt_strcmp
2214    cycles for 100 * FastStringCompare
2544    cycles for 100 * MasmBasic StringsDiffer
2947    cycles for 100 * STRCMP$

0       = eax crt_strcmp
1       = eax FastStringCompare
0       = eax MasmBasic StringsDiffer
0       = eax STRCMP$

--- ok ---


And on my tests it didn´t missed unaligned bytes anylonger. But, i´ll make a couple of more tests before i post the source,k since i´m focusing in accuracy and then speed.

I´ll upload it as soon as possible. Hold on :)