byte ptr comparison

Ryan · June 08, 2012, 10:43:10 PM

I'm still working on my search/find routine. I want to do a byte by byte comparison. If I can't do ".if byte ptr [eax]==byte ptr [edi]", what is the best alternative? I already have a workaround by just moving [edi] to dl. Is there an easy way to eliminate the step? I'd also like to keep the edx register open for other things. How does cmpsb do it?

BogdanOntanu · June 08, 2012, 11:08:05 PM

Quote from: Ryan on June 08, 2012, 10:43:10 PM
I'm still working on my search/find routine. I want to do a byte by byte comparison. If I can't do ".if byte ptr [eax]==byte ptr [edi]", what is the best alternative? I already have a workaround by just moving [edi] to dl. Is there an easy way to eliminate the step? I'd also like to keep the edx register open for other things. How does cmpsb do it?

mov al,[esi]
.if al == [edi]

.endif

cmpsb does essentially the same thing, but it is slow...

You need to explain in more detail: what exactly do you want to compare?

Two strings? One substring inside a string? A char inside a string?
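Bogdan's two-line snippet can be grown into a complete byte-by-byte comparison of two zero-terminated strings. The sketch below assumes the MASM32 SDK installed at \masm32 (as in hutch's template later in the thread); the proc name StrEqual and the data labels are hypothetical. Only AL is used as the scratch register, so EDX stays free, which is what Ryan asked for.

```asm
include \masm32\include\masm32rt.inc

.data
txt1 db "byte ptr", 0
txt2 db "byte ptr", 0
txt3 db "byte PTR", 0

.code

; Hypothetical helper: returns TRUE (1) if two zero-terminated strings are
; identical, otherwise FALSE (0). Only AL is used as scratch; EDX is untouched.
StrEqual proc uses esi edi pStr1:DWORD, pStr2:DWORD
    mov esi, pStr1
    mov edi, pStr2
  next_byte:
    mov al, [esi]            ; one operand must be loaded into a register...
    cmp al, [edi]            ; ...the other can stay in memory
    jne different
    test al, al              ; bytes are equal: was it the terminator?
    jz same
    add esi, 1
    add edi, 1
    jmp next_byte
  same:
    mov eax, TRUE
    ret
  different:
    xor eax, eax             ; FALSE
    ret
StrEqual endp

start:
    invoke StrEqual, OFFSET txt1, OFFSET txt2
    print str$(eax), 13, 10  ; 1 = identical
    invoke StrEqual, OFFSET txt1, OFFSET txt3
    print str$(eax), 13, 10  ; 0 = different (case differs)
    inkey
    exit

end start
```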

Ryan · June 08, 2012, 11:17:13 PM

If cmpsb moves [esi] to al, then I guess there's no way around it. No big deal. I was just curious. Thanks.

jj2007 · June 09, 2012, 12:22:36 AM

No, cmpsb doesn't use al, but it needs both esi and edi, of course. As regards "slow": that may depend on your CPU. Test it yourself 8)


AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
235     cycles for cmpsb
518     cycles for cmp [esi+ecx]
313     cycles for cmp [esi]

235     cycles for cmpsb
517     cycles for cmp [esi+ecx]
314     cycles for cmp [esi]

xandaz · June 09, 2012, 12:25:43 AM

I'm not one for counting clocks and wasn't aware it was slow. I've written some compare routines more or less along these lines:



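Compare PROC uses esi edi lpSource:DWORD,cbSource:DWORD,lpCompare:DWORD
mov esi,lpSource
mov edi,lpCompare
mov ecx,cbSource ; the size of the source string
test ecx,ecx
jz match ; nothing to compare: treat as a match
compare:
cmpsb ; compare [esi] with [edi], then advance both (DF = 0)
jnz no_match
dec ecx ; DEC sets ZF, so JCXZ (which only tests CX) is not needed
jnz compare
match:
mov eax,TRUE
ret
no_match:
mov eax,FALSE
ret
Compare ENDP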

hutch-- · June 09, 2012, 12:28:12 AM

Ryan,

Bogdan is right here: the old string instructions are generally slow, and they also have the irritation of being locked into specific registers, which may not fit the rest of the code you need to write. The mechanics of a string comparison vary with what you are doing; for a normal search you load the start character into one register, then compare it to a memory operand addressed with BYTE PTR.

IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

include \masm32\include\masm32rt.inc

.data?
value dd ?

.data
txt db "one two three four five six seven", 0

.code

start:

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

call main
inkey
exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

mov eax, "f" ; the start character to find

mov edx, OFFSET txt ; the address of the text to find it in
sub edx, 1

lbl0:
add edx, 1
cmp BYTE PTR [edx], 0 ; is it the zero terminator ?
je outa_here
cmp BYTE PTR [edx], al ; compare against the BYTE sized part of EAX
jne lbl0

; in a search algo you would branch here to compare the
; rest of the search word to the current text location
; then return to lbl0 if it does not match

outa_here:

ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
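The comment inside main describes the next step: once the start character matches, compare the rest of the search word and resume scanning on a mismatch. A minimal naive-search sketch of that idea follows; the proc name FindWord, its return convention (zero-based offset, or -1 if not found) and the test data are hypothetical, and EBX is assumed to be available as a second scratch register so that AL can keep the start character.

```asm
include \masm32\include\masm32rt.inc

.data
txt  db "one two three four five six seven", 0
wrd1 db "four", 0
wrd2 db "fix", 0

.code

; Hypothetical helper: returns the zero-based offset of the first occurrence
; of the zero-terminated pWord in the zero-terminated pText, or -1.
FindWord proc uses ebx esi pText:DWORD, pWord:DWORD
    mov esi, pWord
    movzx eax, BYTE PTR [esi]       ; the start character to find
    test eax, eax
    jz empty_word                   ; an empty word matches at offset 0
    mov edx, pText
    sub edx, 1
  lbl0:
    add edx, 1
    cmp BYTE PTR [edx], 0           ; is it the zero terminator ?
    je not_found
    cmp BYTE PTR [edx], al          ; start character ?
    jne lbl0
    mov ecx, 1                      ; start char matched: check the rest
  lbl1:
    movzx ebx, BYTE PTR [esi+ecx]   ; next character of the search word
    test ebx, ebx
    jz found                        ; end of the word reached: match
    cmp BYTE PTR [edx+ecx], bl      ; a text terminator here also mismatches
    jne lbl0                        ; mismatch: resume scanning the text
    add ecx, 1
    jmp lbl1
  found:
    mov eax, edx
    sub eax, pText                  ; address -> offset
    ret
  empty_word:
    xor eax, eax
    ret
  not_found:
    mov eax, -1
    ret
FindWord endp

start:
    invoke FindWord, OFFSET txt, OFFSET wrd1
    print str$(eax), 13, 10         ; "four" starts at offset 14
    invoke FindWord, OFFSET txt, OFFSET wrd2
    print str$(eax), 13, 10         ; -1 = not found
    inkey
    exit

end start
```

This is the simple O(text length × word length) approach; it stops at the first mismatching byte, so it never reads past either terminator.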

dedndave · June 09, 2012, 12:30:08 AM

REPZ CMPSB might not be as bad as you think :P
we should probably do some comparisons
at any rate, have a look at Hutch's szCmp and szCmpi routines in the \masm32\m32lib folder
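For comparison with xandaz's loop, here is a minimal sketch of the same length-counted test using REPZ (REPE) CMPSB. The proc name MemEqual and the data are hypothetical. The point worth noting is that the result must be read from ZF, not from ECX: a mismatch on the very last byte also leaves ECX = 0.

```asm
include \masm32\include\masm32rt.inc

.data
bufA db "abcdef", 0
bufB db "abcdeX", 0

.code

; Hypothetical helper: TRUE if the first cb bytes at pA and pB are identical.
MemEqual proc uses esi edi pA:DWORD, pB:DWORD, cb:DWORD
    mov esi, pA
    mov edi, pB
    mov ecx, cb
    cld                      ; ESI/EDI move forward
    xor eax, eax             ; assume FALSE; also sets ZF = 1 for cb = 0
    repe cmpsb               ; compare while equal and ECX != 0
    jne done                 ; ZF = 0: stopped on a mismatch
    mov eax, TRUE            ; every byte matched (or cb was 0)
  done:
    ret
MemEqual endp

start:
    invoke MemEqual, OFFSET bufA, OFFSET bufB, 5
    print str$(eax), 13, 10  ; 1: the first 5 bytes match
    invoke MemEqual, OFFSET bufA, OFFSET bufB, 6
    print str$(eax), 13, 10  ; 0: the 6th byte differs
    inkey
    exit

end start
```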

jj2007 · June 09, 2012, 12:42:22 AM

See above, Reply #3

dedndave · June 09, 2012, 12:48:53 AM

yah - that is what i might expect
if we were doing REPZ CMPSD on aligned DWORD's, things might be different

prescott w/htt


Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
100     cmpsb
100     [esi+ecx]
100     [esi]

500     cycles for cmpsb
448     cycles for cmp [esi+ecx]
544     cycles for cmp [esi]

496     cycles for cmpsb
448     cycles for cmp [esi+ecx]
541     cycles for cmp [esi]

string length may play an important role
although, we rarely compare strings longer than, say, 100 bytes or so
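The REPZ CMPSD idea can be sketched as: compare whole DWORDs first, then finish the 0 to 3 leftover bytes with REPZ CMPSB. This is only an equality test; because x86 is little-endian, a DWORD mismatch does not tell you which string sorts first. The proc name MemEqualD and the buffers are hypothetical, the buffers need not be aligned (though aligned data is what dedndave had in mind), and whether it beats the byte version will depend on the CPU, as the timings in this thread show.

```asm
include \masm32\include\masm32rt.inc

.data
bufA db "0123456789"
bufB db "0123456789"
bufC db "012345678X"

.code

; Hypothetical helper: TRUE if the first cb bytes at pA and pB are identical.
MemEqualD proc uses esi edi pA:DWORD, pB:DWORD, cb:DWORD
    mov esi, pA
    mov edi, pB
    mov ecx, cb
    shr ecx, 2               ; number of whole DWORDs
    cld
    xor eax, eax             ; FALSE; ZF = 1 in case there are no DWORDs
    repe cmpsd               ; 4 bytes per step
    jne done
    mov ecx, cb
    and ecx, 3               ; 0..3 leftover bytes (AND changes ZF...)
    xor eax, eax             ; ...so set ZF = 1 again for an empty tail
    repe cmpsb
    jne done
    mov eax, TRUE
  done:
    ret
MemEqualD endp

start:
    invoke MemEqualD, OFFSET bufA, OFFSET bufB, 10
    print str$(eax), 13, 10  ; 1: identical
    invoke MemEqualD, OFFSET bufA, OFFSET bufC, 10
    print str$(eax), 13, 10  ; 0: last byte differs
    invoke MemEqualD, OFFSET bufA, OFFSET bufC, 9
    print str$(eax), 13, 10  ; 1: first 9 bytes match
    inkey
    exit

end start
```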

Ryan · June 09, 2012, 12:54:25 AM

If cmpsb doesn't use al, how does it do the comparison? I've tried "cmp byte ptr [eax], byte ptr [edx]", but it doesn't work. I assume it must be using some other method to do the comparison.

hutch-- · June 09, 2012, 01:01:38 AM

Ryan,

That will not work because no form of CMP on an x86 processor compares memory directly to memory (only the string instruction CMPS does that, with its fixed ESI/EDI operands); you must load at least one operand into a register: CMP BYTE PTR [ESI], AL

dedndave · June 09, 2012, 01:05:04 AM

i suppose CMPSB loads a byte from [ESI] into a temporary register

Ryan · June 09, 2012, 01:18:30 AM

I'm not sure what to think. jj says cmpsb doesn't use al, but Hutch and Dave allude to the possibility that it might?

dedndave · June 09, 2012, 01:47:01 AM

no - the CMPS instructions do not use AL/AX/EAX

they perform the equivalent of

cmp [esi],[edi]

(not a legal instruction, just a description), in byte, word, or dword form, as applicable
then, they adjust the index registers ESI and EDI (incremented when the direction flag is clear, decremented when it is set)

i mentioned a temporary register
that is an internal register (or several) that the CPU uses for certain operations; it is not visible to the programmer

the reason i say that is - i don't think the hardware has any way to compare values at two different memory addresses directly - it's not part of the design
i.e., the CPU has to get one of the values into an internal register to make the comparison
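To make the "temporary register" idea concrete, here is a rough sketch that runs one real CMPSB and then an ordinary-instruction equivalent, assuming DF = 0. CL stands in for the CPU's hidden register; that choice is only for illustration (the real CMPSB leaves CL alone), and the data labels are hypothetical. LEA is used to advance the pointers because, unlike ADD or INC, it does not disturb the flags set by the compare.

```asm
include \masm32\include\masm32rt.inc

.data
byteA db "B"
byteB db "C"

.code

start:
    ; the real string instruction
    mov esi, OFFSET byteA
    mov edi, OFFSET byteB
    cld                      ; DF = 0: CMPSB increments ESI and EDI
    cmpsb                    ; flags from [esi] - [edi], then esi += 1, edi += 1
    setb bl                  ; BL = 1 if "B" < "C" (unsigned)

    ; a hand-written equivalent, with CL as the stand-in temporary register
    mov esi, OFFSET byteA
    mov edi, OFFSET byteB
    mov cl, [esi]            ; one operand has to be loaded first
    cmp cl, [edi]            ; same flags as CMPSB: [esi] - [edi]
    lea esi, [esi+1]         ; advance the pointers without touching the flags
    lea edi, [edi+1]
    setb bh                  ; BH = 1 if "B" < "C"

    movzx eax, bl
    print str$(eax), " "     ; result from CMPSB
    movzx eax, bh
    print str$(eax), 13, 10  ; result from the equivalent sequence
    inkey
    exit

end start
```

Both lines of output should agree ("1 1" for this data), since both sequences compute the same [esi] - [edi] subtraction for the flags.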

jj2007 · June 09, 2012, 02:22:49 AM

One more test with extra long strings:


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

40124   cycles for cmpsb
30076   cycles for cmp [esi+ecx]
31215   cycles for cmp [esi]

40127   cycles for cmpsb
30055   cycles for cmp [esi+ecx]
31208   cycles for cmp [esi]

On the AMD, cmpsb was clearly faster, but on this Intel CPU it is about 33% slower.

The MASM Forum
