The MASM Forum

General => The Campus => Topic started by: Ryan on June 08, 2012, 10:43:10 PM

Title: byte ptr comparison
Post by: Ryan on June 08, 2012, 10:43:10 PM
I'm still working on my search/find routine.  I want to do a byte by byte comparison.  If I can't do ".if byte ptr [eax]==byte ptr [edi]", what is the best alternative?  I already have a workaround by just moving [edi] to dl.  Is there an easy way to eliminate the step?  I'd also like to keep the edx register open for other things.  How does cmpsb do it?
Title: Re: byte ptr comparison
Post by: BogdanOntanu on June 08, 2012, 11:08:05 PM
Quote from: Ryan on June 08, 2012, 10:43:10 PM
I'm still working on my search/find routine.  I want to do a byte by byte comparison.  If I can't do ".if byte ptr [eax]==byte ptr [edi]", what is the best alternative?  I already have a workaround by just moving [edi] to dl.  Is there an easy way to eliminate the step?  I'd also like to keep the edx register open for other things.  How does cmpsb do it?

mov al,[esi]
.if al == [edi]

.endif

cmpsb does mainly  the same thing but it is slow...

You need to explain more / better what exactly do you want to compare?

Two strings? one substring inside a string? A char inside a string?
Title: Re: byte ptr comparison
Post by: Ryan on June 08, 2012, 11:17:13 PM
If cmpsb moves [esi] to al, then I guess there's no way around it.  No big deal.  I was just curious.  Thanks.
Title: Re: byte ptr comparison
Post by: jj2007 on June 09, 2012, 12:22:36 AM
No cmpsb doesn't use al, but it needs both esi and edi, of course. As regards "slow": That may depend on your CPU. Test yourself 8)

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
235     cycles for cmpsb
518     cycles for cmp [esi+ecx]
313     cycles for cmp [esi]

235     cycles for cmpsb
517     cycles for cmp [esi+ecx]
314     cycles for cmp [esi]
Title: Re: byte ptr comparison
Post by: xandaz on June 09, 2012, 12:25:43 AM
    I'm not the kind of counting clocks and wasnt aware it was slow. I've done some compare routines more or less on these lines:
Compare PROC lpSource:DWORD,cbSource:DWORD,lpCompare:DWORD
mov esi,lpSource
mov edi,lpCompare
mov ecx,cbSource ; the size of the source string
compare:
cmpsb
jnz no_match
dec ecx
jcxz  match
jmp compare
match:
mov eax,TRUE
ret
no_match:
mov eax,FALSE
ret
Title: Re: byte ptr comparison
Post by: hutch-- on June 09, 2012, 12:28:12 AM
Ryan,

Bogdan is right here, the old string instructions are generally slow but they also have the irritation of being locked into specific registers which may not fit into the rest of the code you need to write. The mechanics of doing string comparisons vary on what you are doing, for a normal search you load the start character in one register then compare it to a memory operand pointed to by BYTE PTR.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    .data?
      value dd ?

    .data
      txt db "one two three four five six seven", 0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    mov eax, "f"            ; the start character to find

    mov edx, OFFSET txt     ; the address of the text to find it in
    sub edx, 1


  lbl0:
    add edx, 1
    cmp BYTE PTR [edx], 0   ; is it the zero terminator ?
    je outa_here
    cmp BYTE PTR [edx], al  ; compare against the BYTE sized part of EAX
    jne lbl0

  ; in a search algo you would branch here to compare the
  ; rest of the search word to the current text location
  ; then return to lbl0 if it does not match

  outa_here:

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: byte ptr comparison
Post by: dedndave on June 09, 2012, 12:30:08 AM
REPZ CMPSB might not be as bad as you think   :P
we should probably do some comparisons
at any rate, have a look at Hutch's szCmp and szCmpi routines in the \masm32\m32lib folder
Title: Re: byte ptr comparison
Post by: jj2007 on June 09, 2012, 12:42:22 AM
See above, Reply #3 :biggrin:
Title: Re: byte ptr comparison
Post by: dedndave on June 09, 2012, 12:48:53 AM
yah - that is what i might expect
if we were doing REPZ CMPSD on aligned DWORD's, things might be different

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
100     cmpsb
100     [esi+ecx]
100     [esi]

500     cycles for cmpsb
448     cycles for cmp [esi+ecx]
544     cycles for cmp [esi]

496     cycles for cmpsb
448     cycles for cmp [esi+ecx]
541     cycles for cmp [esi]


string length may play an important role
although, we rarely compare strings longer than, say, 100 bytes or so
Title: Re: byte ptr comparison
Post by: Ryan on June 09, 2012, 12:54:25 AM
If cmpsb doesn't use al, how does it do the comparison?  I've tried "cmp byte ptr [eax], byte ptr [edx]", but it doesn't work.  I assume it must be using some other method to do the comparison.
Title: Re: byte ptr comparison
Post by: hutch-- on June 09, 2012, 01:01:38 AM
Ryan,

That will not work because there is no mnemonic in an x86 processor that will directly compare memory to memory, you must load at least one into a register. CMP BYTE PTR [ESI], AL
Title: Re: byte ptr comparison
Post by: dedndave on June 09, 2012, 01:05:04 AM
i suppose CMPSB loads a byte from [ESI] into a temporary register
Title: Re: byte ptr comparison
Post by: Ryan on June 09, 2012, 01:18:30 AM
I'm not sure what to think.  jj says cmpsb doesn't use al, but Hutch and Dave allude to the possibility that it might?
Title: Re: byte ptr comparison
Post by: dedndave on June 09, 2012, 01:47:01 AM
no - the CMPS instructions do not use AL/AX/EAX   :biggrin:

they perform the equivalent of
        cmp     [esi],[edi]
in byte, word, or dword form, as applicable
then, they adjust the index registers ESI and EDI

i mentioned a temporary register
that is a register (or plural) that the CPU uses for certain operations

the reason i say that is - i don't think there is any way for the hardware DMA to compare values at two different addresses - it's not part of the design
i.e., the CPU has to get one of the values into an internal register to make the comparison
Title: Re: byte ptr comparison
Post by: jj2007 on June 09, 2012, 02:22:49 AM
One more test with extra long strings:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

40124   cycles for cmpsb
30076   cycles for cmp [esi+ecx]
31215   cycles for cmp [esi]

40127   cycles for cmpsb
30055   cycles for cmp [esi+ecx]
31208   cycles for cmp [esi]


On the AMD, cmpsb was clearly faster, but on Intel it is exactly 33% slower.
Title: Re: byte ptr comparison
Post by: dedndave on June 09, 2012, 02:53:24 AM
must be "opposite day"   :biggrin:

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

45595   cycles for cmpsb
53307   cycles for cmp [esi+ecx]
60248   cycles for cmp [esi]

45436   cycles for cmpsb
53530   cycles for cmp [esi+ecx]
57437   cycles for cmp [esi]
Title: Re: byte ptr comparison
Post by: Ryan on June 09, 2012, 08:03:56 AM
Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
10000    cmpsb
10000    [esi+ecx]
10000    [esi]

40343    cycles for cmpsb
20082    cycles for cmp [esi+ecx]
20089    cycles for cmp [esi]

40151    cycles for cmpsb
20159    cycles for cmp [esi+ecx]
20140    cycles for cmp [esi]
Title: Re: byte ptr comparison
Post by: FORTRANS on June 09, 2012, 08:33:50 AM
Hi,

   Here are two more.  Tried a third machine, but that hasn't
worked yet.
pre-P4 (SSE1)
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

45371   cycles for cmpsb
32006   cycles for cmp [esi+ecx]
40431   cycles for cmp [esi]

45366   cycles for cmpsb
32014   cycles for cmp [esi+ecx]
40435   cycles for cmp [esi]


--- ok ---

pre-P410000   cmpsb
10000   [esi+ecx]
10000   [esi]

48511   cycles for cmpsb
58212   cycles for cmp [esi+ecx]
58046   cycles for cmp [esi]

48616   cycles for cmpsb
58051   cycles for cmp [esi+ecx]
58045   cycles for cmp [esi]


--- ok ---

Regards,

Steve N.
Title: Re: byte ptr comparison
Post by: hutch-- on June 09, 2012, 08:52:26 AM
Ryan,

> I'm not sure what to think.  jj says cmpsb doesn't use al, but Hutch and Dave allude to the possibility that it might?

Don't confuse CMP with CMPSB or similar, CMPSB is one of the antique string instructions where CMP is a fast integer operation. CMP can use any integer register where CMPSB is locked into specific registers. Its basically old DOS junk that is left in Intel processors for backwards compatibility.

There is special case circuitry in most later Intel processors to speed up some of the old string instructions MOVS CMPS LODS etc ... but it is specific to the use of the REP(?) prefix family with them and it locks you into specific register usage, the reason why I recommend that you code your comparisons using instructions like CMP instead.

Title: Re: byte ptr comparison
Post by: Ryan on June 09, 2012, 09:40:33 AM
Thanks Hutch.

I already took your suggestion in the other thread and used straight comparisons.  I also want my search to be case insensitive, so it's easier for me to check for mixed case one by one.

My question in this thread was about the possibility of not having to use a register when comparing byte to byte.  It appears the answer is it is not.  I already know the solution; I'll just have to push/pop an additional register to make it happen.
Title: Re: byte ptr comparison
Post by: dedndave on June 09, 2012, 12:42:04 PM
Hutch has a function named szCmpi that is case insensitive   :t
it's probably pretty fast, too
Title: Re: byte ptr comparison
Post by: jj2007 on June 09, 2012, 05:58:18 PM
Quote from: hutch-- on June 09, 2012, 08:52:26 AMCMPSB is locked into specific registers. Its basically old DOS junk

Wait a second: Take the worst case above, 4 cycles per byte of comparison. With a 2GHz CPU, analysis of a 500MB file takes 500000000*4/2000000000 = 1 second.

It's good to be dogmatic when writing libraries, but for a real life app check first if the simplest instruction is really too "slow" :biggrin:
Title: Re: byte ptr comparison
Post by: hutch-- on June 09, 2012, 06:10:39 PM
Pick a processor, pick a result, I used to own a 550 meg AMD years ago that was really fast on the old string instructions but very ordinary on the lower level integer instructions, since then you had at least 2 families of PIIIs, 3 families of PIVs, Core series duos and quads and the current i3/5/7s and this is only with Intel processors. Over a wide average the old string instructions live in microcode, not in the fast intrinsic instructions. Then you have an antique architecture locked into the source and destination index ALA 16 bit 8088 DOS code.

The only place for the old string instructions is in compatibility mode where special case circuitry make REP(?) prefixed version fast AFTER a specific byte count. The price of locking yourself into old DOS junk is buying its ancient architecture, something like trying to tune the last couple of horsepower out of a T model Ford.
Title: Re: byte ptr comparison
Post by: dedndave on June 10, 2012, 12:45:30 AM
just so you know...
not everyone feels that way - lol
in my opinion, string instructions have their place
it's one of the many tools in the toolbox
try to use the proper tool for the job
Title: Re: byte ptr comparison
Post by: hutch-- on June 10, 2012, 01:06:05 AM
 :biggrin:

> in my opinion, string instructions have their place

The verb tense is wrong "had" reflects their real value, 16 bit DOS on an 8088 or 8086 running at 5 mhz. DOS only died 20 years ago.
Title: Re: byte ptr comparison
Post by: jj2007 on June 10, 2012, 02:57:06 AM
Quote from: hutch-- on June 10, 2012, 01:06:05 AM
The verb tense is wrong "had" reflects their real value, 16 bit DOS on an 8088 or 8086 running at 5 mhz. DOS only died 20 years ago.

So true :biggrin:
7C8023BC     ³.  8D7D CC             lea edi, [ebp-34]
7C8023BF     ³.  AB                  stosd
Title: Re: byte ptr comparison
Post by: dedndave on June 10, 2012, 07:07:18 AM
 :shock:

i can't believe you guys don't agree with me   :P
Title: Re: byte ptr comparison
Post by: jj2007 on June 10, 2012, 07:22:20 AM
Quote from: dedndave on June 10, 2012, 07:07:18 AM
i can't believe you guys don't agree with me   :P

Well, partly I do. I should remember that I have to check on Monday if the Win7 Kernel still has that 20 year old DOSD stosd ;)
Title: Re: byte ptr comparison
Post by: BogdanOntanu on June 10, 2012, 08:30:42 AM
STOSD, STOSB, MOVSB, MOVSD and friends are NOT slow.

But the thread was about comparing and CMPSB is slow ;)
Title: Re: byte ptr comparison
Post by: dedndave on June 10, 2012, 09:58:09 AM
we timed it earlier in the thread, boggie
it's not all that bad
very machine-dependant results
Title: Re: byte ptr comparison
Post by: Ryan on June 10, 2012, 09:58:35 AM
Um... the thread was an inquiry about how cmpsb compares two byte pointers.  I wanted to know if I could use similar logic to keep a register open for other use.
Title: Re: byte ptr comparison
Post by: hutch-- on June 10, 2012, 12:56:26 PM
Ryan,

If you use CMPS you must use the registers it requires, if you perform the comparison using lower level mnemonics you have more freedom to choose the registers you have available. If you look up the Intel manual you will see that you must use the source index ESI and the destination index EDI to perform the comparison and used in conjunction with the REP prefix. Byte counts MUST be done in ECX.

From the manual,

Quote
The CF, OF, SF, ZF, AF, and PF flags are set according to the temporary result of the comparison.

I have tried to warn you that using CMPS is buying into old DOS junk and an antique architecture. In the 8088 - 8086 days you had no choice, with the advent of 32 bit on 486 and later you are free of these restrictions.

What you write has some to do with what you are after, if you are performing a normal scan for text, you normally load the 1st character of the text into a register then scan through the text data you are searching to find a single character match. When you find a character match you branch to another loop to compare the rest of the characters in the text being searched for. If it matches the search text you exit with the start offset of the result, else you return to the main loop and keep scanning the text being searched.

If you are performing a text comparison, you load both addresses and compare them byte by byte.