byte ptr comparison

Started by Ryan, June 08, 2012, 10:43:10 PM

dedndave

must be "opposite day"   :biggrin:

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

45595   cycles for cmpsb
53307   cycles for cmp [esi+ecx]
60248   cycles for cmp [esi]

45436   cycles for cmpsb
53530   cycles for cmp [esi+ecx]
57437   cycles for cmp [esi]

Ryan

Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
10000    cmpsb
10000    [esi+ecx]
10000    [esi]

40343    cycles for cmpsb
20082    cycles for cmp [esi+ecx]
20089    cycles for cmp [esi]

40151    cycles for cmpsb
20159    cycles for cmp [esi+ecx]
20140    cycles for cmp [esi]

FORTRANS

Hi,

   Here are two more.  Tried a third machine, but that hasn't
worked yet.
pre-P4 (SSE1)
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

45371   cycles for cmpsb
32006   cycles for cmp [esi+ecx]
40431   cycles for cmp [esi]

45366   cycles for cmpsb
32014   cycles for cmp [esi+ecx]
40435   cycles for cmp [esi]


--- ok ---

pre-P4
10000   cmpsb
10000   [esi+ecx]
10000   [esi]

48511   cycles for cmpsb
58212   cycles for cmp [esi+ecx]
58046   cycles for cmp [esi]

48616   cycles for cmpsb
58051   cycles for cmp [esi+ecx]
58045   cycles for cmp [esi]


--- ok ---

Regards,

Steve N.

hutch--

Ryan,

> I'm not sure what to think.  jj says cmpsb doesn't use al, but Hutch and Dave allude to the possibility that it might?

Don't confuse CMP with CMPSB or similar. CMPSB is one of the antique string instructions, whereas CMP is a fast integer operation. CMP can use any integer register, whereas CMPSB is locked into specific registers. It's basically old DOS junk that is left in Intel processors for backwards compatibility.

There is special case circuitry in most later Intel processors to speed up some of the old string instructions (MOVS, CMPS, LODS etc.), but it is specific to the use of the REP(?) prefix family with them and it locks you into specific register usage, which is why I recommend that you code your comparisons using instructions like CMP instead.


Ryan

Thanks Hutch.

I already took your suggestion in the other thread and used straight comparisons.  I also want my search to be case insensitive, so it's easier for me to check for mixed case one by one.

My question in this thread was whether it is possible to compare byte to byte without using a register.  It appears the answer is that it is not.  I already know the solution; I'll just have to push/pop an additional register to make it happen.
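
For reference, here is a minimal C sketch of the byte-by-byte, case-insensitive comparison described above. The names `fold_ascii` and `bytes_equal_nocase` are hypothetical (they are not from the Masm32 library), and the sketch assumes plain ASCII text. Each byte is loaded into a local before the comparison, which mirrors the register load needed in assembly, since x86 `cmp` cannot take two memory operands.

```c
#include <stddef.h>

/* Hypothetical helper, ASCII only: fold 'A'..'Z' to lower case. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + 32) : c;
}

/* Returns 1 if the first n bytes of a and b match when ASCII case
   is ignored, otherwise 0. */
int bytes_equal_nocase(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char x = fold_ascii(a[i]);  /* load and fold one byte */
        unsigned char y = fold_ascii(b[i]);  /* load and fold the other */
        if (x != y)
            return 0;                        /* first mismatch ends the scan */
    }
    return 1;
}
```

Folding both bytes before the compare is only one way to do it; checking each character against its upper and lower case forms one by one, as mentioned in the post, works as well.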

dedndave

Hutch has a function named szCmpi that is case insensitive   :t
it's probably pretty fast, too

jj2007

Quote from: hutch-- on June 09, 2012, 08:52:26 AM
CMPSB is locked into specific registers. Its basically old DOS junk

Wait a second: Take the worst case above, 4 cycles per byte of comparison. With a 2GHz CPU, analysis of a 500MB file takes 500000000*4/2000000000 = 1 second.

It's good to be dogmatic when writing libraries, but for a real life app check first if the simplest instruction is really too "slow" :biggrin:
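
The back-of-the-envelope estimate above can be written as a small C sketch. The function name `scan_seconds` is hypothetical, and the figures (4 cycles per byte, a 2 GHz clock, a 500 MB file) are the ones assumed in the post, not new measurements.

```c
#include <stdint.h>

/* Estimated scan time in seconds: bytes * cycles_per_byte / clock_hz.
   Ignores disk I/O and cache effects; it is only a rough CPU-bound bound. */
double scan_seconds(uint64_t bytes, double cycles_per_byte, double clock_hz)
{
    return (double)bytes * cycles_per_byte / clock_hz;
}
```

With the numbers from the post, `scan_seconds(500000000, 4.0, 2e9)` evaluates to 1.0, i.e. about one second for the whole file.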

hutch--

Pick a processor, pick a result. I used to own a 550 meg AMD years ago that was really fast on the old string instructions but very ordinary on the lower level integer instructions. Since then you have had at least 2 families of PIIIs, 3 families of PIVs, Core series duos and quads and the current i3/5/7s, and this is only with Intel processors. Over a wide average the old string instructions live in microcode, not in the fast intrinsic instructions. Then you have an antique architecture locked into the source and destination index à la 16 bit 8088 DOS code.

The only place for the old string instructions is in compatibility mode, where special case circuitry makes the REP(?) prefixed versions fast AFTER a specific byte count. The price of locking yourself into old DOS junk is buying its ancient architecture, something like trying to tune the last couple of horsepower out of a T model Ford.

dedndave

just so you know...
not everyone feels that way - lol
in my opinion, string instructions have their place
it's one of the many tools in the toolbox
try to use the proper tool for the job

hutch--

 :biggrin:

> in my opinion, string instructions have their place

The verb tense is wrong: "had" reflects their real value, 16 bit DOS on an 8088 or 8086 running at 5 MHz. DOS only died 20 years ago.

jj2007

Quote from: hutch-- on June 10, 2012, 01:06:05 AM
The verb tense is wrong "had" reflects their real value, 16 bit DOS on an 8088 or 8086 running at 5 mhz. DOS only died 20 years ago.

So true :biggrin:
7C8023BC     │.  8D7D CC             lea edi, [ebp-34]
7C8023BF     │.  AB                  stosd

dedndave

 :shock:

i can't believe you guys don't agree with me   :P

jj2007

Quote from: dedndave on June 10, 2012, 07:07:18 AM
i can't believe you guys don't agree with me   :P

Well, partly I do. I should remember that I have to check on Monday if the Win7 Kernel still has that 20 year old DOSD stosd ;)

BogdanOntanu

STOSD, STOSB, MOVSB, MOVSD and friends are NOT slow.

But the thread was about comparing and CMPSB is slow ;)

dedndave

we timed it earlier in the thread, boggie
it's not all that bad
very machine-dependent results