News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

String Compare - Updated

Started by guga, January 28, 2013, 12:12:44 PM

Previous topic - Next topic

guga

Hi guys

can someone test this string comparation function i made ? I have no idea how fast/slow is it.

In theory it should be twice as fast as strcmp from msvcrt, but i´m unable to benchmark it.

UPDATED VERSION 29/01/2013



Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    mov ecx 0
    mov eax 0
    mov ebx D@String2

    mov edx D$esi
    cmp edx D$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

        mov ecx 3
        shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
        inc ecx | jmp L1> ; else, ecx = 4

    L0:
        test dh dh | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+4 ; compute the len of the string1
        mov edx D$esi+ecx
        cmp D$ebx+ecx edx
        jz L0<

        jmp L3>> ; strings definitelly dont match

L1:

        lea ebx D$ebx+ecx | mov edx D$ebx-3
        lea esi D$esi+ecx
        cmp D$esi-3 edx; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
        jnz L3>>
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
    ; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
    ; On the dword, we need to check the byte after the dx like: 00XX_0000
    If B$ebx-1 = 0
        inc eax
    End_If

    Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
    and both are exactly the same.
;;

    ; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
    inc eax


L3:

EndP


Tested strings:


    call FastStringCompare {"1", 0}, {"g", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gug", 0}, {"gug", 0}
    call FastStringCompare {"guga", 0}, {"guga", 0}
    call FastStringCompare {"guga1", 0}, {"guga1", 0}
    call FastStringCompare {"guga12", 0}, {"guga12", 0}
    call FastStringCompare {"guga123", 0}, {"guga123", 0}
    call FastStringCompare {"guga1234", 0}, {"guga1234", 0}
    call FastStringCompare {"guga12345", 0}, {"guga12345", 0}
    call FastStringCompare {"guga123456", 0}, {"guga123456", 0}
   
    call FastStringCompare {"1234", 0}, {"5678", 0}
    call FastStringCompare {"1", 0}, {"g", 0}
    call FastStringCompare {"gustavo trigu", 0}, {"gustavo trigueiros e guilherme", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"guu", 0}
    call FastStringCompare {"guuu", 0}, {"guu", 0}
    call FastStringCompare {"guu", 0}, {"guuu", 0}
    call FastStringCompare {"gug", 0}, {"guga", 0}
    call FastStringCompare {"g44564", 0}, {"112345g", 0}
    call FastStringCompare {"gu", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    call FastStringCompare {"1", 0}, {"g", 0}

        call FastStringCompare {"g1stavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gu1tavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gus1avo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gust1vo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gusta1o trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustav1 trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo1trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo 1rigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilherm1", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilher1e", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilhe1me", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilh1rme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guil1erme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e gui1herme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e gu1lherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e g1ilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e 1uilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e1guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros 1 guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros1e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiro1 e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueir1s e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        call FastStringCompare {"gustavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}

        call FastStringCompare {"guga", 0}, {"guga1", 0}
        call FastStringCompare {"guga1", 0}, {"guga", 0}



________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________

OLDER VERSION 29/01/2013 - Aligned data. It is fast, but the above one is faster.

Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    mov ecx 0
    mov eax 0
    mov ebx D@String2

    mov edx D$esi
    cmp edx D$ebx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.

jz L0> ; The 1st 4 bytes are not zero, jmp over

        mov ecx 3
        shr edx 8 | jz L1> ; If string is only 1 byte len. ecx = 3
        inc ecx | jmp L1> ; else, ecx = 4

    L0:
        cmp dh 0 | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+4 ; compute the len of the string1
        mov edx D$esi+ecx
        cmp D$ebx+ecx edx
        jz L0<

        jmp L3>> ; strings definitelly dont match

L1:

        lea ebx D$ebx+ecx | mov edx D$ebx-3
        lea esi D$esi+ecx
        cmp edx D$esi-3 ; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes
        jnz L3>>
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
;;
    ; so far all bytes are equal. Let´t see if teh string2 is null terminated as it is on string1. Ex: str1) "guga1" | str2) "guga1" Or if it is str1) "guga1" | str2) "gugaM"...
    ; On the dword, we need to check the byte after the dx like: 00XX_0000
    If B$ebx-1 = 0
        inc eax
    End_If

    Note for me, i commented it out, because the above assumption is wrong. In fact, when we reach here we have analysed all the string
    and both are exactly the same.
;;

    ; In mine tests no matter what bytes gets here, because it always means that we fully analysed the strings, and they always matches
    inc eax

L3:

EndP



OLDER VERSION 28/01/2013 - Unaligned data and some mistake when the string is only 1 byte len


; Return values:
; Eax = 0. Strings don´t match
; Eax = 1. Strings matches

Proc FastStringCompare:
    Arguments @String1, @String2
    Uses esi, ecx, ebx, edx

    mov esi D@String1
    xor ecx ecx
    xor eax eax
    xor edx edx
    mov ebx D@String2

    mov ax W$ebx
    mov dx W$esi
    xor ax dx   ; just to force the zero flags. if we have a match zero flag is settled. But we need to check if the last byte at string1 is zero.
    L0:
        or dh 0 | jz L1> ; the last byte of the 1st string is zero. We reached the end of the search. Contunue computing the remaining bytes at "L1"
        lea ecx D$ecx+2 ; compute the len of the string1
        mov dx W$esi+ecx
        cmp W$ebx+ecx dx
        jz L0<

L1:

        lea ebx D$ebx+ecx | mov ax W$ebx-1
        lea esi D$esi+ecx | mov dx W$esi-1
        xor ax dx ; if both words on string1 and string2 don´ match jmp over. Otherwise, continue analysing the remaining bytes

        jz L1> | L2: | xor eax eax | ExitP | L1:
        ; We are almost there. If ecx = 0, means that the string1 have only 2 byte len. So let´t search for the 2nd byte of string2
L1:

    ; do again for the last bytes
    ; so far all bytes are equal

    mov eax &TRUE
    .If ecx = 0 ; when len (ecx) is zero it means that we the string is smaller then 2 bytes (including null terminated)
        If B$ebx+1 <> 0 ; so all we need is analyse if the last byte if the string2 is zero (Because the 1st one of both matches)
            xor eax eax
        End_If
    .Else_If B$ebx <> 0
        xor eax eax
    .End_If

EndP


Code examples

    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"gu", 0}, {"guu", 0}
    call FastStringCompare {"guuu", 0}, {"guu", 0}
    call FastStringCompare {"guu", 0}, {"guuu", 0}
    call FastStringCompare {"gug", 0}, {"guga", 0}
    call FastStringCompare {"g44564", 0}, {"112345g", 0}
    call FastStringCompare {"gu", 0}, {"g", 0}
    call FastStringCompare {"gu", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"gu", 0}
    call FastStringCompare {"g", 0}, {"g", 0}
    mov ecx 0-1
    Do
        call FastStringCompare {"gustavo trigueiros e guilherme", 0}, {"gustavo trigueiros e guilherme", 0}
        dec ecx
    Loop_Until ecx = 0
    call FastStringCompare {"guga", 0}, {"guga1", 0}
    call FastStringCompare {"guga1", 0}, {"guga", 0}
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

MichaelW

Judging from the function test results, my translation to MASM code is apparently correct.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
    str1 db "The MASM32 SDK has a large body of example code that deals",0
    str2 db "The MASM32 SDK has a large body of example code that deal",0
.code
;==============================================================================
FastStringCompare proc uses esi ecx ebx edx string1:DWORD, string2:DWORD

    mov esi, string1
    xor ecx, ecx
    xor eax, eax
    xor edx, edx
    mov ebx, string2

    mov ax, [ebx]
    mov dx, [esi]
    xor ax, dx          ; just to force the zero flags.
                        ; if we have a match zero flag is settled.
                        ; But we need to check if the last byte at
                        ; string1 is zero.
  L0:
    or dh, 0
    jz L1               ; the last byte of the 1st string is zero.
                        ; We reached the end of the search.
                        ; Contunue computing the remaining bytes at "L1"
    lea ecx, [ecx+2]    ; compute the len of the string1
    mov dx, [esi+ecx]
    cmp [ebx+ecx], dx
    jz L0

  L1:
    lea ebx, [ebx+ecx]
    mov ax, [ebx-1]
    lea esi, [esi+ecx]
    mov dx, [esi-1]
    xor ax, dx          ; if both words on string1 and string2 don't
                        ; match jmp over. Otherwise, continue analysing
                        ;the remaining bytes
    jz L2
    xor eax, eax
    ret

  L2:                   ; We are almost there. If ecx = 0, means that the
                        ; string1 have only 2 byte len. So let´t search
                        ; for the 2nd byte of string2.

    ; do again for the last bytes
    ; so far all bytes are equal

    mov eax, TRUE
    .if ecx == 0        ; when len (ecx) is zero it means that we the
                        ; string is smaller then 2 bytes (including null
                        ; terminated).
        .if byte ptr [ebx+1]
                        ; so all we need is analyse if the last byte if
                        ; the string2 is zero (Because the 1st one of both
                        ; matches)
            xor eax, eax
        .endif
    .elseif byte ptr [ebx]
        xor eax, eax
    .endif

    ret

FastStringCompare endp

;==============================================================================
start:
;==============================================================================

    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guuu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guu"), chr$("guuu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gug"), chr$("guga")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g44564"), chr$("112345g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("g")
    printf("(should be 1)\t%d\n\n", eax)

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    invoke Sleep, 5000

    REPEAT 3

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, crt_strcmp str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, FastStringCompare str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, crt_strcmp str1, str2\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, FastStringCompare str1, str2\n\n", eax)

    ENDM

    inkey
    exit
;==============================================================================
end start


Running on a P3:

321 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
308 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
308 cycles, crt_strcmp str1, str2
151 cycles, FastStringCompare str1, str2


Running on a P4 Northwood:

199 cycles, crt_strcmp str1, str1
274 cycles, FastStringCompare str1, str1
197 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2

195 cycles, crt_strcmp str1, str1
272 cycles, FastStringCompare str1, str1
194 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2

199 cycles, crt_strcmp str1, str1
272 cycles, FastStringCompare str1, str1
185 cycles, crt_strcmp str1, str2
205 cycles, FastStringCompare str1, str2


I think eliminating the partial-register (and misaligned) accesses should improve the speed significantly.
Well Microsoft, here's another nice mess you've gotten us into.

Ficko

Hi Guga,

For some times ago I was playing with string compare and tried aligning the strings.

I have no Idea how fast or "slow" :P is this algo just thought to post it for interest.


; STRCMP$(lpString1:STRING,lpString2:STRING),INT
;lpString1 < lpString2 EAX = -n : lpString1 > lpString2 EAX = +n : lpString1 = lpString2 EAX = 0
;Flags are set as well
; =============== S U B R O U T I N E =======================================
align 4
STRCMP$ proc public uses edi esi ebx String1:DWORD, String2:DWORD
mov esi, String1
mov edi, String2
cmp edi, esi
jz Exit_I
xor edx, edx
xor eax, eax
xor ecx, ecx
mov dl, 3
@@: test esi, edx
jz aligned
lodsb
mov cl, [edi]
sub eax, ecx
jnz Exit
jecxz Exit
inc edi
jmp @B
notequ: xchg eax, edx
@@: movzx eax, dl
movzx ecx, bl
sub eax, ecx
jnz Exit
jecxz Exit
shr edx, 8
shr ebx, 8
jmp @B
aligned:mov ecx, 80808080h ;ECX = 0x80808080
mov edx, 1010101h    ;EDX = 0x1010101
@@: mov eax, [esi]
mov ebx, [edi]
add esi, 4
cmp eax, ebx
jnz notequ
add edi, 4
sub eax, edx
and eax, ecx
je @B
not ebx
and eax, ebx
je @B
Exit_I: xor eax, eax
Exit: ret 8
STRCMP$ endp

dedndave

the problem with string comparison is one string may be aligned when the other is not

if you want maximum speed, i would think you would access both strings as aligned dwords
you may wind up writing 4 different loops to do the grunt-work

some testing may be required to find out if the in-register re-alignment
effort is actually faster than un-aligned accesses (for one of the strings)

MichaelW

My first test had a problem in that I failed to DWORD align my two test strings in the data section. Aligning them dropped the FastStringCompare (P3) cycle counts from ~274 and ~150 to ~154 and ~149. Then I determined that these two strings had to be the same length to get the correct comparison result. To fix this I simply made the strings the same length, but there is some problem either in the original code or in my translation. I was able to eliminate a few of the partial-register accesses, but not all of them.

;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
    align 4
    str1 db "The MASM32 SDK has a large body of example code that deals",0
    align 4
    str2 db "The MASM32 SDK has a large body of example code that deal ",0
.code
;==============================================================================
FastStringCompare proc uses esi ecx ebx edx string1:DWORD, string2:DWORD

    mov esi, string1
    xor ecx, ecx
    xor eax, eax
    xor edx, edx
    mov ebx, string2

    ;; ELIMINATED PARTIAL REGISTER ACCESSES HERE

    movzx eax, WORD PTR [ebx]
    movzx edx, WORD PTR [esi]
    xor eax, edx        ; just to force the zero flags.
                        ; if we have a match zero flag is settled.
                        ; But we need to check if the last byte at
                        ; string1 is zero.
  L0:
 
    or dh, 0
    jz L1               ; the last byte of the 1st string is zero.
                        ; We reached the end of the search.
                        ; Contunue computing the remaining bytes at "L1"
    lea ecx, [ecx+2]    ; compute the len of the string1

    mov dx, [esi+ecx]
    cmp [ebx+ecx], dx
    jz L0

  L1:
    lea ebx, [ebx+ecx]
    mov ax, [ebx-1]
    lea esi, [esi+ecx]
    mov dx, [esi-1]
    xor ax, dx          ; if both words on string1 and string2 don't
                        ; match jmp over. Otherwise, continue analysing
                        ; the remaining bytes
    jz L2
    xor eax, eax
    ret

  L2:                   ; We are almost there. If ecx = 0, means that the
                        ; string1 have only 2 byte len. So let´t search
                        ; for the 2nd byte of string2.

    ; do again for the last bytes
    ; so far all bytes are equal

    mov eax, TRUE
    .if ecx == 0        ; when len (ecx) is zero it means that we the
                        ; string is smaller then 2 bytes (including null
                        ; terminated).
        .if byte ptr [ebx+1]
                        ; so all we need is analyse if the last byte if
                        ; the string2 is zero (Because the 1st one of both
                        ; matches)
            xor eax, eax
        .endif
    .elseif byte ptr [ebx]
        xor eax, eax
    .endif

    ret

FastStringCompare endp

;==============================================================================
start:
;==============================================================================

    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guuu"), chr$("guu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guu"), chr$("guuu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gug"), chr$("guga")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g44564"), chr$("112345g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("g")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("gu"), chr$("gu")
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("gu")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("g"), chr$("g")
    printf("(should be 1)\t%d\n\n", eax)

    invoke FastStringCompare, chr$("gustavo trigueiros e guilherme"),
                              chr$("gustavo trigueiros e guilherme")
    printf("(should be 1)\t%d\n\n", eax)

    invoke FastStringCompare, chr$("guga"), chr$("guga1")
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, chr$("guga1"), chr$("guga")
    printf("(should be 0)\t%d\n\n", eax)

    invoke FastStringCompare, ADDR str1, ADDR str1
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, ADDR str2, ADDR str2
    printf("(should be 1)\t%d\n", eax)
    invoke FastStringCompare, ADDR str1, ADDR str2
    printf("(should be 0)\t%d\n", eax)
    invoke FastStringCompare, ADDR str2, ADDR str1
    printf("(should be 0)\t%d\n\n", eax)

    invoke GetCurrentProcess
    invoke SetProcessAffinityMask, eax, 1

    invoke Sleep, 5000

    REPEAT 3

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, crt_strcmp str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str1
    counter_end
    printf("%d cycles, FastStringCompare str1, str1\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke crt_strcmp, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, crt_strcmp str1, str2\n", eax)

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        invoke FastStringCompare, ADDR str1, ADDR str2
    counter_end
    printf("%d cycles, FastStringCompare str1, str2\n\n", eax)

    ENDM

    inkey
    exit
;==============================================================================
end start


P3:

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
124 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
123 cycles, FastStringCompare str1, str2

320 cycles, crt_strcmp str1, str1
129 cycles, FastStringCompare str1, str1
307 cycles, crt_strcmp str1, str2
124 cycles, FastStringCompare str1, str2


P4 Northwood:

193 cycles, crt_strcmp str1, str1
151 cycles, FastStringCompare str1, str1
194 cycles, crt_strcmp str1, str2
147 cycles, FastStringCompare str1, str2

196 cycles, crt_strcmp str1, str1
152 cycles, FastStringCompare str1, str1
196 cycles, crt_strcmp str1, str2
152 cycles, FastStringCompare str1, str2

196 cycles, crt_strcmp str1, str1
151 cycles, FastStringCompare str1, str1
196 cycles, crt_strcmp str1, str2
149 cycles, FastStringCompare str1, str2

Well Microsoft, here's another nice mess you've gotten us into.

jj2007

FastStringCompare looks competitive, but is the return value correct?

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 238/100 cycles


6929    cycles for 100 * crt_strcmp
7926    cycles for 100 * FastStringCompare
5713    cycles for 100 * MasmBasic StringsDiffer
6034    cycles for 100 * STRCMP$

6936    cycles for 100 * crt_strcmp
7925    cycles for 100 * FastStringCompare
5714    cycles for 100 * MasmBasic StringsDiffer
6033    cycles for 100 * STRCMP$

6930    cycles for 100 * crt_strcmp
7920    cycles for 100 * FastStringCompare
5720    cycles for 100 * MasmBasic StringsDiffer
6033    cycles for 100 * STRCMP$

6928    cycles for 100 * crt_strcmp
7924    cycles for 100 * FastStringCompare
5716    cycles for 100 * MasmBasic StringsDiffer
6034    cycles for 100 * STRCMP$

6933    cycles for 100 * crt_strcmp
7985    cycles for 100 * FastStringCompare
5717    cycles for 100 * MasmBasic StringsDiffer
6048    cycles for 100 * STRCMP$

-1      = eax crt_strcmp
0       = eax FastStringCompare
-23     = eax MasmBasic StringsDiffer
-23     = eax STRCMP$

Ficko

Mine above rocks despite using some slow OP codes like "jecxz". :bgrin:

Quote
130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP$ str1, str1
128 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
51 cycles, STRCMP$ str1, str2

130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP$ str1, str1
127 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
51 cycles, STRCMP$ str1, str2

130 cycles, crt_strcmp str1, str1
116 cycles, FastStringCompare str1, str1
1 cycles, STRCMP$ str1, str1
127 cycles, crt_strcmp str1, str2
111 cycles, FastStringCompare str1, str2
50 cycles, STRCMP$ str1, str2


MichaelW

The CRT strcmp function returns 0 if the strings match, < 0 if string1 < string2, or > 0 if string1 is > string2, allowing it to be used directly as a comparison function for qsort and bsearch.
Well Microsoft, here's another nice mess you've gotten us into.

jj2007

Quote from: Ficko on January 29, 2013, 01:46:36 AM
Mine above rocks despite using some slow OP codes like "jecxz". :bgrin:

It's pretty good for a non-SSE2 algo - I added it above :t

@Michael: I meant FastStringCompare, it returns zero for some reason - see above.

Ficko

Quote from: MichaelW on January 29, 2013, 02:02:34 AM
The CRT strcmp function returns 0 if the strings match, < 0 if string1 < string2, or > 0 if string1 is > string2, allowing it to be used directly as a comparison function for qsort and bsearch.

STRCMP$ mimics this behavior as well - additionaly it set's the flags too so "eax" do not have to be checked. -

I designed it for my QSORT algo indeed.  ;)

MichaelW

FastStringCompare is returning false because the strings do not match.
Well Microsoft, here's another nice mess you've gotten us into.

jj2007

Quote from: MichaelW on January 29, 2013, 02:27:52 AM
FastStringCompare is returning false because the strings do not match.

Make them match and see what it returns.

MichaelW

Quote from: jj2007 on January 29, 2013, 02:49:46 AM
Quote from: MichaelW on January 29, 2013, 02:27:52 AM
FastStringCompare is returning false because the strings do not match.

Make them match and see what it returns.

In my tests the return value is correct only if the strings are aligned (I used a 4-byte alignment and did not investigate other alignments). I intended to add this information to my second post, but forgot to do so.
Well Microsoft, here's another nice mess you've gotten us into.

jj2007

Strange - I get this:

align 4  ; not equal
Str1   db "gustavo trigueiros e guilhermeA", 0
align 4
Str2   db "gustavo trigueiros e guilhermeX", 0

6930    cycles for 100 * crt_strcmp
8307    cycles for 100 * FastStringCompare
4738    cycles for 100 * MasmBasic StringsDiffer
6028    cycles for 100 * STRCMP$

-1      = eax crt_strcmp
0       = eax FastStringCompare
-23     = eax MasmBasic StringsDiffer
-23     = eax STRCMP$


7910    cycles for 100 * crt_strcmp
8122    cycles for 100 * FastStringCompare
4741    cycles for 100 * MasmBasic StringsDiffer
3921    cycles for 100 * STRCMP$

align 4  ; EQUAL
Str1   db "gustavo trigueiros e guilhermeX", 0
align 4
Str2   db "gustavo trigueiros e guilhermeX", 0
0       = eax crt_strcmp
0       = eax FastStringCompare
0       = eax MasmBasic StringsDiffer
0       = eax STRCMP$

guga

Hi guys, many tks for testing :)

Note: The return value for this function is only &TRUE or &FALSE


  • If the strings matched it returns &TRUE
  • If it fails (no matter the order of the previously good match) it always returns &FALSE

I didn´t coded it to return 0-1 (The cases that we have a partial match) because i wasn´t thinking on the qsort and bsearch functions. (I´ll see if i can implement this return, if it don´t slow down the resultant timmings)

About the alignment of the strings, indeed, i agree, it would (in theory) speed up the scanning process in 2x, approximately , but, how compilers align their strings (in 2 byte or 4) ? This is because i thought in 2 byte alignment/scanning because the minimum null terminated string to be compared is always a char+zero (Ex: "a", 0). But, what happens when we are dealing with a string inside a structure for example ? Or inside another chain of bytes (Like an array, etc), or even if this strings is in coded in the .text section in the middle of a function (Like in 3dframes.exe from masm package, for example) ?

Or even worst, what if the string ends on the end of a memory chunck ? I mean, if we have a string the looks like this
.data section --->
Guga db "Hello", 0
Virtual Memory: B$ ? ? ?
(...)
.idata section.

If the function starts comparing in a 4 byte aligned when it reach the 5th byte "o, 0", wouldn´t cause the app to crash ? (Sorry for the dumb question, but i´m not sure if all compilers are 4 byte aligned)

Have anyone knows if all compilers align their strings (or others data) always in a 4 byte boundary ?

If it is always aligned on a 4 byte, i can try to implement a alignment strings for the comparison and see the results.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com