Author Topic: Comparing 128-bit numbers aka OWORDs

nidud

  • Member
  • *****
  • Posts: 2056
    • https://github.com/nidud/asmc
Re: Comparing 128-bit numbers aka OWORDs
« Reply #285 on: September 02, 2013, 08:34:08 AM »
And yes, it's my fault because I erroneously used Qcmp in the timings. I was qonfused ::)

well, it's difficult to read your "code", but I think it will be like this:
Code: [Select]
movups xmm0,A[0]
movups xmm1,B[0]
movaps xmm2,xmm0 ; copy for pcmpeqb
pcmpeqb xmm2,xmm1
pmovmskb edx,xmm2 ; show in dx where xt0 differs to xt1
not dx
and dh,07Fh
bsr edx,edx
push ecx
movzx ecx,byte ptr A[15]
bswap ecx
mov cl,A[edx]
movzx edx,byte ptr B[edx]
bswap edx
mov dl,B[15]
bswap edx
cmp ecx,edx
pop ecx

which could be reduced to this:
Code: [Select]
movups xmm0,A[0]
movups xmm1,B[0]
pcmpeqb xmm0,xmm1
pmovmskb eax,xmm0
not eax
and eax,07FFFh
bsr eax,eax
mov dh,A[15]
mov dl,A[eax]
mov al,B[eax]
mov ah,B[15]
cmp dx,ax

KeepingRealBusy

  • Member
  • ***
  • Posts: 426
Re: Comparing 128-bit numbers aka OWORDs
« Reply #286 on: September 02, 2013, 08:52:39 AM »

Here is my contribution (from reply 284):

Code: [Select]

AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
1358    kCycles [x][x][x] - Cmp128Dave
1206    kCycles [x][x][x] - Cmp128Nidud
975     kCycles [x][x][x] - Cmp128NidudSSE
1171    kCycles [x][x][ ] - Cmp128Alex
1766    kCycles [x][x][x] - MasmBasic Ocmp
1424    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1040    kCycles [x][x][x] - Cmp128JJAlexSSE_2
721     kCycles [x][x][x] - Cmp128JJAlexSSE_3
535     kCycles [x][x][ ] - AxCMP128bitProc3
519     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Dave AKA KRB

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #287 on: September 02, 2013, 12:35:42 PM »
Jochen, did you time the version of the macro I posted a couple of pages back?
Here it is:

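Cmp128JJAlexSSE_1 MACRO ow0:REQ, ow1:REQ
LOCAL @l1, @l2
   movups xmm0,[ow0]
   movups xmm1,[ow1]
   pcmpeqb xmm0,xmm1            ; 0FFh in every byte where ow0 and ow1 are equal
   pmovmskb ecx,xmm0            ; one bit per byte: 1 = equal
   xor ecx,0FFFFh               ; invert: 1 = bytes differ
   jz @l2                       ; all 16 bytes equal: ZF is already set
   and ecx,7FFFh                ; byte 15 is always compared, so drop its bit
   bsr ecx,ecx                  ; index of the highest differing byte in 0..14, if any
   mov ah,byte ptr [ow0+15]     ; most significant bytes go into the high halves
   mov dh,byte ptr [ow1+15]
   mov al,byte ptr [ow0+ecx]    ; highest differing bytes go into the low halves
   mov dl,byte ptr [ow1+ecx]
   cmp ax,dx                    ; flags as for a 128-bit cmp (JB/JA, JL/JG)
   @l2:
ENDM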


For me it is faster than the original "_1" macro. You can also try changing it like this:

   movzx eax,word ptr [ow0+14]
   movzx edx,word ptr [ow1+14]


but for me that is slower than the version above.
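As a cross-check of the idea (this is not Antariy's code), the macro can be modeled in plain C. It is only a minimal sketch under assumptions: cmp128_model is a hypothetical name, the operands are little-endian 16-byte arrays, and a bsr on a zero mask is modeled as index 0, which the instruction itself does not guarantee.

```c
#include <stdint.h>

/* Hypothetical model of the macro. Returns -1, 0 or 1 for the signed
   ordering (JL/JG); *unsigned_result gets the unsigned ordering (JB/JA). */
static int cmp128_model(const uint8_t a[16], const uint8_t b[16],
                        int *unsigned_result)
{
    unsigned diff = 0;                      /* bit i set = byte i differs */
    for (int i = 0; i < 16; i++)            /* pcmpeqb + pmovmskb + xor */
        if (a[i] != b[i])
            diff |= 1u << i;

    if (diff == 0) {                        /* jz @l2: operands are equal */
        *unsigned_result = 0;
        return 0;
    }

    diff &= 0x7FFFu;                        /* and ecx,7FFFh */
    int idx = 0;                            /* assumption: bsr on 0 acts like index 0 */
    for (int i = 14; i >= 0; i--)           /* bsr: highest set bit */
        if (diff & (1u << i)) { idx = i; break; }

    /* ah/dh = most significant byte, al/dl = highest differing byte */
    unsigned ax = ((unsigned)a[15] << 8) | a[idx];
    unsigned dx = ((unsigned)b[15] << 8) | b[idx];

    *unsigned_result = (ax > dx) - (ax < dx);

    /* reinterpret the 16-bit values as signed, as cmp ax,dx does for JL/JG */
    int sax = ax >= 0x8000u ? (int)ax - 0x10000 : (int)ax;
    int sdx = dx >= 0x8000u ? (int)dx - 0x10000 : (int)dx;
    return (sax > sdx) - (sax < sdx);
}
```

The point of putting byte 15 into the high half is that the sign of the 16-bit value matches the sign of the OWORD, so a single cmp ax,dx can serve both the signed and the unsigned jumps.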

Timings for it (your old macro is still in there; my testbed is a bit outdated):
Code: [Select]

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2189320 cycles [x][x][x] - Cmp128Nidud
2295837 cycles [x][x][x] - Cmp128NidudSSE
2773387 cycles [x][x][x] - Cmp128Dave
4033478 cycles [x][x][x] - Cmp128Dave2
1597228 cycles [x][x][x] - Cmp128JJAlexSSE_1
1622741 cycles [x][x][x] - Cmp128JJAlexSSE_2
1905774 cycles [x][x][x] - Cmp128JJAlexSSE_3
993931  cycles [x][x][ ] - Cmp128Alex
1859714 cycles [x][x][x] - Cmp128Alex_2
1901902 cycles [x][x][x] - Cmp128Alex_3
1994856 cycles [x][x][ ] - Cmp128JJSSE
1346269 cycles [x][x][ ] - AxCMP128bitProc3
1311894 cycles [x][x][ ] - AxCMP128bitProc3c (cmov)
741050  cycles [x][ ][ ] - Cmp128DaveU
770599  cycles [x][ ][ ] - Cmp128NidudU

--- ok ---


Timings for Cmp128_timingsOQ
Code: [Select]

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2696    kCycles [x][x][x] - Cmp128Dave
2713    kCycles [x][x][x] - Cmp128Nidud
3125    kCycles [x][x][x] - Cmp128NidudSSE
945     kCycles [x][x][ ] - Cmp128Alex
1932    kCycles [x][x][x] - MasmBasic Ocmp
1485    kCycles [x][x][x] - MasmBasic Qcmp
1639    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1604    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1595    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1360    kCycles [x][x][ ] - AxCMP128bitProc3
1274    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---


Timings for Cmp128_timingsO
Code: [Select]

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2856    kCycles [x][x][x] - Cmp128Dave
2752    kCycles [x][x][x] - Cmp128Nidud
3128    kCycles [x][x][x] - Cmp128NidudSSE
956     kCycles [x][x][ ] - Cmp128Alex
1928    kCycles [x][x][x] - MasmBasic Ocmp
1641    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1601    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1592    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1361    kCycles [x][x][ ] - AxCMP128bitProc3
1272    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Antariy

  • Member
  • ****
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #288 on: September 02, 2013, 12:47:09 PM »
Hi Dave :t


Quote
Here is my contribution (from reply 284)

Code: [Select]

AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
1358    kCycles [x][x][x] - Cmp128Dave
1206    kCycles [x][x][x] - Cmp128Nidud
975     kCycles [x][x][x] - Cmp128NidudSSE
1171    kCycles [x][x][ ] - Cmp128Alex
1766    kCycles [x][x][x] - MasmBasic Ocmp
1424    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1040    kCycles [x][x][x] - Cmp128JJAlexSSE_2
721     kCycles [x][x][x] - Cmp128JJAlexSSE_3
535     kCycles [x][x][ ] - AxCMP128bitProc3
519     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Dave AKA KRB

There is an incredible difference between the algos that use full-size and half-size regs. Your AMD seems to work very well with "partial" regs, unlike the Intels, which handle them badly.
Cmp128JJAlexSSE_3 differs from Cmp128JJAlexSSE_1
only in this:

   xor cx,0FFFFh
   jz @l2
   and cx,7FFFh
   bsr cx,cx


jj2007

  • Member
  • *****
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #289 on: September 02, 2013, 04:28:55 PM »
Quote
Jochen, did you time the version of the macro I posted a couple of pages back?

Here it comes:
Code: [Select]
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
945     kCycles [x][x][x] - Cmp128Dave
916     kCycles [x][x][x] - Cmp128Nidud
1017    kCycles [x][x][x] - Cmp128NidudSSE
689     kCycles [x][x][ ] - Cmp128Alex
1013    kCycles [x][x][x] - MasmBasic Ocmp
815     kCycles [x][x][x] - Cmp128JJAlexSSE_1
854     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
925     kCycles [x][x][x] - Cmp128JJAlexSSE_2
926     kCycles [x][x][x] - Cmp128JJAlexSSE_3
858     kCycles [x][x][ ] - AxCMP128bitProc3
870     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
653     kCycles [x][x][x] - Cmp128Dave
608     kCycles [x][x][x] - Cmp128Nidud
806     kCycles [x][x][x] - Cmp128NidudSSE
434     kCycles [x][x][ ] - Cmp128Alex
386     kCycles [x][x][x] - MasmBasic Ocmp
315     kCycles [x][x][x] - Cmp128JJAlexSSE_1
366     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
355     kCycles [x][x][x] - Cmp128JJAlexSSE_2
316     kCycles [x][x][x] - Cmp128JJAlexSSE_3
455     kCycles [x][x][ ] - AxCMP128bitProc3
439     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Quote
well, it's difficult to read your "code", but I think...
You should learn Masm, it's a fascinating language :t

(and I'm afraid your interpretation is not correct - you might launch Olly to see what it really does).

sinsi

  • Guest
Re: Comparing 128-bit numbers aka OWORDs
« Reply #290 on: September 02, 2013, 06:06:46 PM »
jj's latest
Code: [Select]
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
----------------------------------------------------
695     kCycles [x][x][x] - Cmp128Dave
564     kCycles [x][x][x] - Cmp128Nidud
652     kCycles [x][x][x] - Cmp128NidudSSE
396     kCycles [x][x][ ] - Cmp128Alex
316     kCycles [x][x][x] - MasmBasic Ocmp
268     kCycles [x][x][x] - Cmp128JJAlexSSE_1
321     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
312     kCycles [x][x][x] - Cmp128JJAlexSSE_2
271     kCycles [x][x][x] - Cmp128JJAlexSSE_3
403     kCycles [x][x][ ] - AxCMP128bitProc3
378     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
----------------------------------------------------
748     kCycles [x][x][x] - Cmp128Dave
615     kCycles [x][x][x] - Cmp128Nidud
714     kCycles [x][x][x] - Cmp128NidudSSE
433     kCycles [x][x][ ] - Cmp128Alex
348     kCycles [x][x][x] - MasmBasic Ocmp
296     kCycles [x][x][x] - Cmp128JJAlexSSE_1
353     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
344     kCycles [x][x][x] - Cmp128JJAlexSSE_2
298     kCycles [x][x][x] - Cmp128JJAlexSSE_3
442     kCycles [x][x][ ] - AxCMP128bitProc3
416     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #291 on: September 02, 2013, 07:01:01 PM »
your code is hard to read, Jochen - lol
i dread if i have to add a routine   :P

jj2007

  • Member
  • *****
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #292 on: September 02, 2013, 08:07:38 PM »
Quote
your code is hard to read, Jochen - lol

Come on, it's ultra simple...
  pmovmskb edx, xt2   ; show in dx where xt0 differs to xt1
  if MbcmpO eq QWORD
   not dl
   and edx, 07fh
  else          ; don't duplicate MSB
   if 1
      xor edx, -1
      and edx, 07fffh
   else
      not dx
      and dh, 07fh
   endif
  endif
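To make the two OWORD branches easier to compare, here is a small C model of them. It is a sketch, not part of the macro: mask_full, mask_partial and masks_agree are hypothetical names, and it relies on pmovmskb from an XMM register only ever setting the low 16 bits of edx.

```c
#include <stdint.h>

/* "if 1" branch: xor edx,-1 / and edx,07fffh on the full 32-bit register */
static uint32_t mask_full(uint32_t edx)
{
    return (edx ^ 0xFFFFFFFFu) & 0x7FFFu;
}

/* "else" branch: not dx / and dh,07fh only touch the low 16 bits */
static uint32_t mask_partial(uint32_t edx)
{
    uint32_t dx = (~edx) & 0xFFFFu;         /* not dx */
    dx &= 0x7FFFu;                          /* and dh,07fh clears bit 15 only */
    return (edx & 0xFFFF0000u) | dx;        /* upper half of edx is untouched */
}

/* Returns 1 if both branches agree for every possible pmovmskb result. */
static int masks_agree(void)
{
    for (uint32_t m = 0; m <= 0xFFFFu; m++)
        if (mask_full(m) != mask_partial(m))
            return 0;
    return 1;
}
```

Both branches produce the same mask for any 16-bit input, so the choice between them is about register usage (full edx versus dx/dh), not about the result.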

nidud

  • Member
  • *****
  • Posts: 2056
    • https://github.com/nidud/asmc
Re: Comparing 128-bit numbers aka OWORDs
« Reply #293 on: September 02, 2013, 10:57:58 PM »
I guess there are different views about writing code,
but one could consider (at least I do) the golden rule:
Quote
Keep it simple stupid!

At least if you write code for other people, which is often the case in forums and in projects involving other people. A comment should explain what the code actually does, not how it does it, which means that the code should basically explain itself.

Quote
You should learn Masm, it's a fascinating language :t
Well, if you follow these simple rules you may write your own compiler, as I have done in collaboration with other people using a common language. That simplifies the process since everybody is able to read and understand what other people do.

Quote
I'm afraid your interpretation is not correct
How do you know?

Quote
you might launch Olly to see what it really does
Don't you think that this is a bit too much to ask, or at least a bit complicated, to use a debugger to see what it actually does?
Code: [Select]
CPU Disasm
Address   Hex dump          Command                                  Comments
0040111C  |.  0F1006        MOVUPS XMM0,DQWORD PTR DS:[ESI]          ; FLOAT 0.0, 0.0, 0.0, 0.0
0040111F  |.  0F100F        MOVUPS XMM1,DQWORD PTR DS:[EDI]
00401122  |.  0F28D0        MOVAPS XMM2,XMM0
00401125  |.  660F74D1      PCMPEQB XMM2,XMM1
00401129  |.  660FD7D2      PMOVMSKB EDX,XMM2
0040112D  |.  66:F7D2       NOT DX
00401130  |.  80E6 7F       AND DH,7F
00401133  |.  0FBDD2        BSR EDX,EDX
00401136  |.  51            PUSH ECX
00401137  |.  0FB64E 0F     MOVZX ECX,BYTE PTR DS:[ESI+0F]
0040113B  |.  0FC9          BSWAP ECX
0040113D  |.  8A0C32        MOV CL,BYTE PTR DS:[ESI+EDX]
00401140  |.  0FB6143A      MOVZX EDX,BYTE PTR DS:[EDI+EDX]
00401144  |.  0FCA          BSWAP EDX
00401146  |.  8A57 0F       MOV DL,BYTE PTR DS:[EDI+0F]
00401149  |.  0FCA          BSWAP EDX
0040114B  |.  3BCA          CMP ECX,EDX
0040114D  |.  59            POP ECX

jj2007

  • Member
  • *****
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #294 on: September 02, 2013, 11:27:09 PM »
Quote
well, it's difficult to read your "code"
Quote
I guess there are different views about writing code
Yes, certainly. But I would never call your code "code", or refer to you as a "coder" instead of a coder. It requires a certain level of arrogance to dismiss somebody else's code as "code".

Quote
Quote
I'm afraid your interpretation is not correct
How do you know?

Quote
you might launch Olly to see what it really does
Don't you think that this is a bit too much to ask, or at least a bit complicated, to use a debugger to see what it actually does?

Normally, I would not ask, but since you had difficulties de-coding my macro, I thought Olly would be a reliable way to check. What you show above, by the way, is old code - the version of oqCmp.asm that I posted 15 hours ago already contained:
   if 1
      xor edx, -1
      and edx, 07fffh
   else
      not dx
      and dh, 07fh
   endif

The if 1 is conditional assembly and means "use this branch, not the other one".

Congrats, by the way - on the AMD your code is faster than mine:
Code: [Select]
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
----------------------------------------------------
843     kCycles [x][x][x] - Cmp128Dave
847     kCycles [x][x][x] - Cmp128Nidud
917     kCycles [x][x][x] - Cmp128NidudSSE
643     kCycles [x][x][ ] - Cmp128Alex
1578    kCycles [x][x][x] - MasmBasic Ocmp
1469    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1531    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1466    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1466    kCycles [x][x][x] - Cmp128JJAlexSSE_3
803     kCycles [x][x][ ] - AxCMP128bitProc3
771     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

FORTRANS

  • Member
  • *****
  • Posts: 1085
Re: Comparing 128-bit numbers aka OWORDs
« Reply #295 on: September 03, 2013, 12:49:43 AM »
From Reply #289.
Code: [Select]

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
1022    kCycles [x][x][x] - Cmp128Dave
917     kCycles [x][x][x] - Cmp128Nidud
1022    kCycles [x][x][x] - Cmp128NidudSSE
817     kCycles [x][x][ ] - Cmp128Alex
1561    kCycles [x][x][x] - MasmBasic Ocmp
1422    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1471    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1668    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1677    kCycles [x][x][x] - Cmp128JJAlexSSE_3
937     kCycles [x][x][ ] - AxCMP128bitProc3
985     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #296 on: September 03, 2013, 12:54:43 AM »
Jochen,
it's just the text format
we each have our own style and it can be hard to get used to someone else's   :P

Gunther

  • Member
  • *****
  • Posts: 3709
  • Forgive your enemies, but never forget their names
Re: Comparing 128-bit numbers aka OWORDs
« Reply #297 on: September 03, 2013, 02:02:26 AM »
The timings for Jochen's latest version:

Code: [Select]

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
736     kCycles [x][x][x] - Cmp128Dave
629     kCycles [x][x][x] - Cmp128Nidud
696     kCycles [x][x][x] - Cmp128NidudSSE
442     kCycles [x][x][ ] - Cmp128Alex
367     kCycles [x][x][x] - MasmBasic Ocmp
321     kCycles [x][x][x] - Cmp128JJAlexSSE_1
371     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
364     kCycles [x][x][x] - Cmp128JJAlexSSE_2
352     kCycles [x][x][x] - Cmp128JJAlexSSE_3
447     kCycles [x][x][ ] - AxCMP128bitProc3
427     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Gunther
Get your facts first, and then you can distort them.

jj2007

  • Member
  • *****
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #298 on: September 03, 2013, 02:18:54 AM »
Thanxalot :icon14:

Attached one more, inter alia with a modification of the test_start macro:

test_start macro useit:=<1>
usethismacro=useit
  if usethismacro
   push 50000000
   .Repeat
      dec dword ptr [esp]   ; heat up the CPU
   .Until Sign?
   add esp, 4
   invoke Sleep, 0
   counter_begin 1000, HIGH_PRIORITY_CLASS
  endif
endm


On some machines the timings were very volatile; the small mod above seems to help.
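The same warm-up idea can be sketched in portable C. This is only an approximation under assumptions: warm_up, timed_run and sample_work are hypothetical names, clock() stands in for the cycle counter, and the Sleep and priority-class steps of the macro are not modeled.

```c
#include <time.h>

/* Hypothetical helper: spin through `iterations` decrements so the CPU can
   leave its low-power state before the timed section starts. */
static unsigned long warm_up(unsigned long iterations)
{
    volatile unsigned long n = iterations;  /* volatile keeps the loop alive */
    while (n > 0)
        n--;
    return n;                               /* always 0 once the loop is done */
}

/* Placeholder workload, standing in for the code under test. */
static void sample_work(void)
{
    volatile unsigned sum = 0;
    for (unsigned i = 0; i < 1000; i++)
        sum += i;
}

/* Warms up, then times one call of fn; returns elapsed clock() ticks. */
static clock_t timed_run(void (*fn)(void), unsigned long warm_iterations)
{
    warm_up(warm_iterations);
    clock_t start = clock();
    fn();
    return clock() - start;
}
```

A call such as timed_run(sample_work, 50000000UL) would roughly mirror the iteration count pushed in the macro; whether that count is enough to stabilise timings depends on the machine.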

Code: [Select]
Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
986     kCycles [x][x][x] - Cmp128Dave
946     kCycles [x][x][x] - Cmp128Nidud
818     kCycles [x][x][x] - Cmp128NidudSSE
575     kCycles [x][x][ ] - Cmp128Alex
564     kCycles [x][x][x] - MasmBasic Ocmp.1
517     kCycles [x][x][x] - MasmBasic Ocmp.0
549     kCycles [x][x][x] - MasmBasic Ocmp.1
513     kCycles [x][x][x] - MasmBasic Ocmp.0
472     kCycles [x][x][x] - Cmp128JJAlexSSE_1
476     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
618     kCycles [x][x][x] - Cmp128JJAlexSSE_2
614     kCycles [x][x][x] - Cmp128JJAlexSSE_3
747     kCycles [x][x][ ] - AxCMP128bitProc3
772     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
-----------------------------------------------------
843     kCycles [x][x][x] - Cmp128Dave
844     kCycles [x][x][x] - Cmp128Nidud
919     kCycles [x][x][x] - Cmp128NidudSSE
641     kCycles [x][x][ ] - Cmp128Alex
1588    kCycles [x][x][x] - MasmBasic Ocmp.1
1584    kCycles [x][x][x] - MasmBasic Ocmp.0
1586    kCycles [x][x][x] - MasmBasic Ocmp.1
1578    kCycles [x][x][x] - MasmBasic Ocmp.0
1467    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1532    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1471    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1468    kCycles [x][x][x] - Cmp128JJAlexSSE_3
801     kCycles [x][x][ ] - AxCMP128bitProc3
771     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Gunther

  • Member
  • *****
  • Posts: 3709
  • Forgive your enemies, but never forget their names
Re: Comparing 128-bit numbers aka OWORDs
« Reply #299 on: September 03, 2013, 03:13:23 AM »
Jochen,

the new timings. I hope that helps:

Code: [Select]

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
660     kCycles [x][x][x] - Cmp128Dave
538     kCycles [x][x][x] - Cmp128Nidud
603     kCycles [x][x][x] - Cmp128NidudSSE
371     kCycles [x][x][ ] - Cmp128Alex
316     kCycles [x][x][x] - MasmBasic Ocmp.1
307     kCycles [x][x][x] - MasmBasic Ocmp.0
314     kCycles [x][x][x] - MasmBasic Ocmp.1
306     kCycles [x][x][x] - MasmBasic Ocmp.0
259     kCycles [x][x][x] - Cmp128JJAlexSSE_1
308     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
302     kCycles [x][x][x] - Cmp128JJAlexSSE_2
263     kCycles [x][x][x] - Cmp128JJAlexSSE_3
391     kCycles [x][x][ ] - AxCMP128bitProc3
363     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Gunther
Get your facts first, and then you can distort them.