byte ptr comparison

Started by Ryan, June 08, 2012, 10:43:10 PM

Ryan

I'm still working on my search/find routine.  I want to do a byte by byte comparison.  If I can't do ".if byte ptr [eax]==byte ptr [edi]", what is the best alternative?  I already have a workaround by just moving [edi] to dl.  Is there an easy way to eliminate the step?  I'd also like to keep the edx register open for other things.  How does cmpsb do it?

BogdanOntanu

mov al,[esi]
.if al == [edi]

.endif

cmpsb does essentially the same thing, but it is slow...

You need to explain better what exactly you want to compare.

Two strings? one substring inside a string? A char inside a string?

Ryan

If cmpsb moves [esi] to al, then I guess there's no way around it.  No big deal.  I was just curious.  Thanks.

jj2007

No, cmpsb doesn't use al, but it needs both esi and edi, of course. As for "slow": that may depend on your CPU. Test it yourself 8)

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
235     cycles for cmpsb
518     cycles for cmp [esi+ecx]
313     cycles for cmp [esi]

235     cycles for cmpsb
517     cycles for cmp [esi+ecx]
314     cycles for cmp [esi]

xandaz

    I'm not the clock-counting kind and wasn't aware it was slow. I've written some compare routines more or less along these lines:
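Compare PROC uses esi edi lpSource:DWORD,cbSource:DWORD,lpCompare:DWORD
mov esi,lpSource
mov edi,lpCompare
mov ecx,cbSource ; the size of the source string
jecxz match      ; nothing to compare: treated as a match here
cld              ; make cmpsb step forward
compare:
cmpsb            ; compare [esi] with [edi], then advance both pointers
jnz no_match
dec ecx
jnz compare      ; loop on all of ECX (jcxz would test only the 16-bit CX)
match:
mov eax,TRUE
ret
no_match:
mov eax,FALSE
ret
Compare ENDP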

hutch--

Ryan,

Bogdan is right here: the old string instructions are generally slow, and they also have the irritation of being locked into specific registers, which may not fit the rest of the code you need to write. The mechanics of string comparison vary with what you are doing. For a normal search, you load the start character into a register, then compare it against a memory operand addressed with BYTE PTR.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    .data?
      value dd ?

    .data
      txt db "one two three four five six seven", 0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    mov eax, "f"            ; the start character to find

    mov edx, OFFSET txt     ; the address of the text to find it in
    sub edx, 1


  lbl0:
    add edx, 1
    cmp BYTE PTR [edx], 0   ; is it the zero terminator ?
    je outa_here
    cmp BYTE PTR [edx], al  ; compare against the BYTE sized part of EAX
    jne lbl0

  ; in a search algo you would branch here to compare the
  ; rest of the search word to the current text location
  ; then return to lbl0 if it does not match

  outa_here:

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
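
The branch that the comments in the listing above leave open could be filled in roughly as follows. This is only a sketch: FindSub and its labels are hypothetical names, not part of the MASM32 library, and it assumes both strings are zero terminated and that masm32rt.inc has been included.

```asm
; Hypothetical helper, not a MASM32 library routine.
; Returns in EAX the address of the first occurrence of the pattern
; in the text, or 0 if there is none. Both strings are zero terminated.
FindSub proc uses esi edi ebx lpText:DWORD, lpPattern:DWORD

    mov edx, lpText
    mov ebx, lpPattern
    mov al, [ebx]               ; the start character to find
    test al, al
    jz is_match                 ; empty pattern: match at the start of the text
    sub edx, 1

  scan_next:
    add edx, 1
    cmp BYTE PTR [edx], 0       ; is it the zero terminator ?
    je no_match
    cmp BYTE PTR [edx], al      ; compare against the start character
    jne scan_next

    mov esi, edx                ; start character matched, check the rest
    mov edi, ebx
  cmp_rest:
    add esi, 1
    add edi, 1
    mov cl, [edi]               ; next pattern character
    test cl, cl
    jz is_match                 ; end of pattern reached: full match at EDX
    cmp cl, [esi]
    je cmp_rest
    jmp scan_next               ; mismatch (or end of text), resume the scan

  is_match:
    mov eax, edx
    ret
  no_match:
    xor eax, eax
    ret

FindSub endp
```

With the data from the listing above, a call such as "invoke FindSub, OFFSET txt, OFFSET pat" (pat being an assumed zero-terminated pattern) would need a matching PROTO if the procedure is defined below the call site.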

dedndave

REPZ CMPSB might not be as bad as you think   :P
we should probably do some comparisons
at any rate, have a look at Hutch's szCmp and szCmpi routines in the \masm32\m32lib folder
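
For the comparison, a length-based REPE CMPSB (same opcode as REPZ CMPSB) routine might look like the sketch below. MemEqual is a hypothetical name and this is not how szCmp is written; it only assumes a 32-bit flat model with the usual masm32rt.inc setup.

```asm
; Hypothetical routine, not taken from \masm32\m32lib.
; Returns 1 in EAX if the first cbLen bytes at lpA and lpB are equal, else 0.
MemEqual proc uses esi edi lpA:DWORD, lpB:DWORD, cbLen:DWORD

    mov esi, lpA
    mov edi, lpB
    mov ecx, cbLen
    cld                     ; make the string instruction step forward
    cmp ecx, ecx            ; force ZF=1 so a zero length counts as equal
    repe cmpsb              ; stops at the first mismatch or when ECX reaches 0
    mov eax, 0              ; MOV leaves the flags alone (XOR would not)
    sete al                 ; AL = 1 if the last byte comparison was equal
    ret

MemEqual endp
```

The CMP ECX, ECX line matters because REPE CMPSB executes zero times when ECX is 0 and would otherwise leave whatever flags were set before it.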

jj2007


dedndave

yah - that is what i might expect
if we were doing REPZ CMPSD on aligned DWORD's, things might be different

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
100     cmpsb
100     [esi+ecx]
100     [esi]

500     cycles for cmpsb
448     cycles for cmp [esi+ecx]
544     cycles for cmp [esi]

496     cycles for cmpsb
448     cycles for cmp [esi+ecx]
541     cycles for cmp [esi]


string length may play an important role
although, we rarely compare strings longer than, say, 100 bytes or so

Ryan

If cmpsb doesn't use al, how does it do the comparison?  I've tried "cmp byte ptr [eax], byte ptr [edx]", but it doesn't work.  I assume it must be using some other method to do the comparison.

hutch--

Ryan,

That will not work because x86 has no form of CMP that compares memory to memory directly; you must load at least one of the operands into a register first, for example CMP BYTE PTR [ESI], AL.

dedndave

i suppose CMPSB loads a byte from [ESI] into a temporary register

Ryan

I'm not sure what to think.  jj says cmpsb doesn't use al, but Hutch and Dave allude to the possibility that it might?

dedndave

no - the CMPS instructions do not use AL/AX/EAX   :biggrin:

they perform the equivalent of
        cmp     [esi],[edi]
in byte, word, or dword form, as applicable
then, they adjust the index registers ESI and EDI

i mentioned a temporary register
that is a register (or plural) that the CPU uses for certain operations

the reason i say that is - i don't think there is any way for the hardware DMA to compare values at two different addresses - it's not part of the design
i.e., the CPU has to get one of the values into an internal register to make the comparison
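
As a rough illustration of the description above, one CMPSB step with the direction flag clear can be imitated with ordinary instructions. CmpsbEquiv is a hypothetical macro name, and AL merely stands in for the internal temporary; the real instruction changes no general register other than ESI and EDI.

```asm
; Hypothetical macro: approximately what one CMPSB does when DF = 0.
; Unlike the real instruction, it clobbers AL.
CmpsbEquiv MACRO
    mov al, [esi]           ; fetch the first operand into a register
    cmp al, [edi]           ; flags = [esi] - [edi], as CMPSB sets them
    lea esi, [esi+1]        ; advance both pointers; LEA leaves the flags intact
    lea edi, [edi+1]
ENDM
```

LEA is used for the pointer updates because ADD or INC would overwrite the flags produced by the comparison. In a flat 32-bit model the ES segment that CMPSB uses for [EDI] is the same as DS, so the sketch ignores it.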

jj2007

One more test with extra long strings:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

40124   cycles for cmpsb
30076   cycles for cmp [esi+ecx]
31215   cycles for cmp [esi]

40127   cycles for cmpsb
30055   cycles for cmp [esi+ecx]
31208   cycles for cmp [esi]


On the AMD, cmpsb was clearly faster, but on this Intel CPU it is about 33% slower than cmp [esi+ecx] (roughly 40100 vs. 30100 cycles).