Topic: Comparing 128-bit numbers aka OWORDs  (Read 119038 times)

jj2007

  • Member
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #300 on: September 03, 2013, 03:44:45 AM »
Thanks, Gunther. And here is the Celeron M:
Code:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
955     kCycles [x][x][x] - Cmp128Dave
923     kCycles [x][x][x] - Cmp128Nidud
1012    kCycles [x][x][x] - Cmp128NidudSSE
676     kCycles [x][x][ ] - Cmp128Alex
1081    kCycles [x][x][x] - MasmBasic Ocmp.1
1010    kCycles [x][x][x] - MasmBasic Ocmp.0
1080    kCycles [x][x][x] - MasmBasic Ocmp.1
1011    kCycles [x][x][x] - MasmBasic Ocmp.0
814     kCycles [x][x][x] - Cmp128JJAlexSSE_1
853     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
926     kCycles [x][x][x] - Cmp128JJAlexSSE_2
927     kCycles [x][x][x] - Cmp128JJAlexSSE_3
869     kCycles [x][x][ ] - AxCMP128bitProc3
867     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

nidud

  • Member
  • Posts: 2056
    • https://github.com/nidud/asmc
Re: Comparing 128-bit numbers aka OWORDs
« Reply #301 on: September 03, 2013, 03:49:40 AM »
Oh, here we go again  :lol:

You're an excellent "coder", JJ, but sometimes your "code" is a bit too intelligent for a simple mind like me, so I tried to simplify it a bit.

You are a bit sensitive me think  :P

On reflection, from my simplified view, I now see what Alex was doing here.

I added my modified version to the test:
Code:
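movups   xmm0,A[0]     ; load both 128-bit operands
movups   xmm1,B[0]
mov      edx,B[12]     ; bytes 12..15 of each operand (byte 15 holds the sign)
mov      eax,A[12]
pcmpeqb  xmm0,xmm1     ; FFh in every byte position where A and B are equal
pmovmskb ecx,xmm0      ; bit i of ecx = 1 when byte i is equal
not      ecx           ; bit i = 1 when byte i differs
and      ecx,07FFFh    ; drop byte 15, which already sits in the top of eax/edx
bsr      ecx,ecx       ; highest differing byte below 15 (undefined if ecx is 0)
mov      dl,B[ecx]     ; put that byte into the low byte of edx/eax
mov      al,A[ecx]
cmp      eax,edx       ; set flags for the 128-bit comparison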
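For readers following the trick, here is a rough C model of the snippet above. It is a sketch, not nidud's code: it assumes SSE2 intrinsics from `<emmintrin.h>` and little-endian 16-byte operands, and `cmp128`, `highest_bit` and `make128` are hypothetical names. Unlike the snippet, it ORs in bit 0 before the bit scan (as jj2007's variant A later does), because Intel documents the BSR result as undefined for a zero input.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_cmpeq_epi8, _mm_movemask_epi8 */
#include <stdint.h>

/* Index of the highest set bit, i.e. what BSR returns for a nonzero input. */
static int highest_bit(unsigned v)
{
    int i = 31;
    assert(v != 0); /* BSR result is undefined for 0 */
    while (!(v >> i))
        --i;
    return i;
}

/* Compare two little-endian 128-bit integers; returns <0, 0 or >0.
   signed_cmp != 0 gives JL/JG semantics, otherwise JB/JA. */
static int cmp128(const uint8_t a[16], const uint8_t b[16], int signed_cmp)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* movups */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pcmpeqb + pmovmskb + not: bit i set when byte i differs */
    unsigned diff = ~(unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    diff &= 0x7FFFu; /* byte 15 is already part of the top dword below */
    diff |= 1u;      /* guard: never feed 0 to the bit scan */
    int idx = highest_bit(diff);

    /* mov eax,A[12] / mov edx,B[12]: bytes 12..15 */
    uint32_t ha = a[12] | (uint32_t)a[13] << 8 | (uint32_t)a[14] << 16 | (uint32_t)a[15] << 24;
    uint32_t hb = b[12] | (uint32_t)b[13] << 8 | (uint32_t)b[14] << 16 | (uint32_t)b[15] << 24;
    /* mov al,A[ecx] / mov dl,B[ecx]: replace the low byte */
    ha = (ha & 0xFFFFFF00u) | a[idx];
    hb = (hb & 0xFFFFFF00u) | b[idx];

    if (signed_cmp) { /* flip the sign bit so an unsigned compare acts signed */
        ha ^= 0x80000000u;
        hb ^= 0x80000000u;
    }
    return (ha > hb) - (ha < hb);
}

/* Hypothetical helper: store hi:lo as 16 little-endian bytes. */
static void make128(uint8_t out[16], uint64_t lo, uint64_t hi)
{
    for (int i = 0; i < 8; ++i) {
        out[i] = (uint8_t)(lo >> (8 * i));
        out[8 + i] = (uint8_t)(hi >> (8 * i));
    }
}
```

Running `cmp128` on the same pair in both modes shows the difference between the JB/JA and JL/JG columns: a 128-bit -1 compares above 1 when unsigned and below it when signed.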

Results:
Code:
Intel(R) Core(TM) i3-2367M CPU @ 1.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
781     kCycles [x][x][x] - Cmp128Dave
663     kCycles [x][x][x] - Cmp128Nidud
693     kCycles [x][x][x] - Cmp128NidudSSE
428     kCycles [x][x][ ] - Cmp128Alex
389     kCycles [x][x][x] - MasmBasic Ocmp.1
409     kCycles [x][x][x] - MasmBasic Ocmp.0
406     kCycles [x][x][x] - MasmBasic Ocmp.1
379     kCycles [x][x][x] - MasmBasic Ocmp.0
315     kCycles [x][x][x] - Cmp128JJAlexSSE_1
366     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
287     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
355     kCycles [x][x][x] - Cmp128JJAlexSSE_2
315     kCycles [x][x][x] - Cmp128JJAlexSSE_3
463     kCycles [x][x][ ] - AxCMP128bitProc3
437     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

AMD Athlon(tm) II X2 245 Processor (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
848 kCycles [x][x][x] - Cmp128Dave
777 kCycles [x][x][x] - Cmp128Nidud
692 kCycles [x][x][x] - Cmp128NidudSSE
599 kCycles [x][x][ ] - Cmp128Alex
1153 kCycles [x][x][x] - MasmBasic Ocmp.1
1085 kCycles [x][x][x] - MasmBasic Ocmp.0
1153 kCycles [x][x][x] - MasmBasic Ocmp.1
1086 kCycles [x][x][x] - MasmBasic Ocmp.0
1044 kCycles [x][x][x] - Cmp128JJAlexSSE_1
1042 kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1029 kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
1048 kCycles [x][x][x] - Cmp128JJAlexSSE_2
1048 kCycles [x][x][x] - Cmp128JJAlexSSE_3
695 kCycles [x][x][ ] - AxCMP128bitProc3
676 kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Antariy

  • Member
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #302 on: September 03, 2013, 07:43:59 AM »
Jochen's latest archive:
Code:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2713    kCycles [x][x][x] - Cmp128Dave
2716    kCycles [x][x][x] - Cmp128Nidud
3065    kCycles [x][x][x] - Cmp128NidudSSE
918     kCycles [x][x][ ] - Cmp128Alex
2044    kCycles [x][x][x] - MasmBasic Ocmp.1
1900    kCycles [x][x][x] - MasmBasic Ocmp.0
2046    kCycles [x][x][x] - MasmBasic Ocmp.1
1890    kCycles [x][x][x] - MasmBasic Ocmp.0
1605    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1574    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1614    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1571    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1363    kCycles [x][x][ ] - AxCMP128bitProc3
1256    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

nidud's latest archive:
Code:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2708    kCycles [x][x][x] - Cmp128Dave
2749    kCycles [x][x][x] - Cmp128Nidud
3076    kCycles [x][x][x] - Cmp128NidudSSE
909     kCycles [x][x][ ] - Cmp128Alex
2044    kCycles [x][x][x] - MasmBasic Ocmp.1
1895    kCycles [x][x][x] - MasmBasic Ocmp.0
2046    kCycles [x][x][x] - MasmBasic Ocmp.1
1906    kCycles [x][x][x] - MasmBasic Ocmp.0
1604    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1543    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1351    kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
1576    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1578    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1340    kCycles [x][x][ ] - AxCMP128bitProc3
1267    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---



:biggrin:

jj2007

  • Member
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #303 on: September 03, 2013, 08:18:12 AM »
Code:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
957     kCycles [x][x][x] - Cmp128Dave
926     kCycles [x][x][x] - Cmp128Nidud
1016    kCycles [x][x][x] - Cmp128NidudSSE
680     kCycles [x][x][ ] - Cmp128Alex
1039    kCycles [x][x][x] - MasmBasic Ocmp.1
1039    kCycles [x][x][x] - MasmBasic Ocmp.0
818     kCycles [x][x][x] - Cmp128JJAlexSSE_1
857     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
785     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
832     kCycles [x][x][x] - Cmp128AxelNidudJJ_A
853     kCycles [x][x][x] - Cmp128AxelNidudJJ_B
930     kCycles [x][x][x] - Cmp128JJAlexSSE_2
930     kCycles [x][x][x] - Cmp128JJAlexSSE_3
866     kCycles [x][x][ ] - AxCMP128bitProc3
871     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Cmp128AxelNidudJJ MACRO A:REQ, B:REQ
   movups   xmm0,A[0]
   movups   xmm1,B[0]
   push ecx      ; do not trash ecx
;    mov   eax,A[12]
;    mov   edx,B[12]
   pcmpeqb   xmm0,xmm1
   pmovmskb ecx,xmm0
   if ANJ_A
      xor ecx, -1
      and   ecx, 07FFFh
      or ecx, 1      ; make sure there is no zero input
      bsr   ecx, ecx
      mov eax,A[12]
      mov edx,B[12]
      mov dl,B[ecx]
      mov al,A[ecx]
      cmp eax,edx
   else
      xor ecx, 0FFFFh
      .if !Zero?      ; make sure there is no zero input
         and   ecx, 07FFFh   ; can be 0 here if only byte 15 differs (BSR of 0 is undefined per Intel)
         bsr   ecx, ecx
         mov eax,A[12]
         mov edx,B[12]
         mov dl,B[ecx]
         mov al,A[ecx]
         cmp eax,edx
      .endif
   endif
   pop ecx
ENDM
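Variant B's early exit only covers the case where all 16 bytes are equal. If byte 15 is the only byte that differs, `xor ecx,0FFFFh` leaves 8000h, so the `.if` block runs, and `and ecx,07FFFh` then hands 0 to BSR, whose destination Intel documents as undefined in that case. Variant A's `or ecx,1` rules that out. The small C sketch below models only the ECX arithmetic of the two variants, starting from the 16-bit PMOVMSKB mask; the function names are hypothetical.

```c
#include <assert.h>

/* eqmask: PMOVMSKB result, bit i = 1 when byte i of A and B is equal. */

/* Variant A: the value that reaches BSR (never 0). */
static unsigned bsr_input_a(unsigned eqmask)
{
    assert(eqmask <= 0xFFFFu);           /* PMOVMSKB of an xmm register gives 16 bits */
    unsigned ecx = eqmask ^ 0xFFFFFFFFu; /* xor ecx,-1 */
    ecx &= 0x7FFFu;                      /* and ecx,07FFFh */
    ecx |= 1u;                           /* or ecx,1 */
    return ecx;
}

/* Variant B: the value that reaches BSR, or -1 when the .if block is skipped. */
static long bsr_input_b(unsigned eqmask)
{
    assert(eqmask <= 0xFFFFu);
    unsigned ecx = eqmask ^ 0xFFFFu;     /* xor ecx,0FFFFh */
    if (ecx == 0)                        /* all 16 bytes equal: ZF=1, block skipped */
        return -1;
    return (long)(ecx & 0x7FFFu);        /* and ecx,07FFFh: may be 0 */
}
```

If BSR leaves ECX unchanged for a zero source, as AMD documents and Intel parts appear to do in practice, ECX stays 0 and the result is still right because byte 15 decides the compare; `or ecx,1` simply removes the reliance on that behaviour.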

Siekmanski

  • Member
  • Posts: 2357
Re: Comparing 128-bit numbers aka OWORDs
« Reply #304 on: September 03, 2013, 08:26:59 AM »
Hi Jochen,

Reply #298:

Code:
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
963     kCycles [x][x][x] - Cmp128Dave
901     kCycles [x][x][x] - Cmp128Nidud
1000    kCycles [x][x][x] - Cmp128NidudSSE
662     kCycles [x][x][ ] - Cmp128Alex
991     kCycles [x][x][x] - MasmBasic Ocmp.1
941     kCycles [x][x][x] - MasmBasic Ocmp.0
991     kCycles [x][x][x] - MasmBasic Ocmp.1
941     kCycles [x][x][x] - MasmBasic Ocmp.0
765     kCycles [x][x][x] - Cmp128JJAlexSSE_1
826     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
870     kCycles [x][x][x] - Cmp128JJAlexSSE_2
873     kCycles [x][x][x] - Cmp128JJAlexSSE_3
862     kCycles [x][x][ ] - AxCMP128bitProc3
888     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Hi nidud,

Reply #301:

Code:
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
961     kCycles [x][x][x] - Cmp128Dave
903     kCycles [x][x][x] - Cmp128Nidud
999     kCycles [x][x][x] - Cmp128NidudSSE
663     kCycles [x][x][ ] - Cmp128Alex
990     kCycles [x][x][x] - MasmBasic Ocmp.1
942     kCycles [x][x][x] - MasmBasic Ocmp.0
990     kCycles [x][x][x] - MasmBasic Ocmp.1
942     kCycles [x][x][x] - MasmBasic Ocmp.0
763     kCycles [x][x][x] - Cmp128JJAlexSSE_1
827     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
718     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
871     kCycles [x][x][x] - Cmp128JJAlexSSE_2
874     kCycles [x][x][x] - Cmp128JJAlexSSE_3
855     kCycles [x][x][ ] - AxCMP128bitProc3
887     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

Jochen, Reply #303:

Code:
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
964     kCycles [x][x][x] - Cmp128Dave
901     kCycles [x][x][x] - Cmp128Nidud
1000    kCycles [x][x][x] - Cmp128NidudSSE
662     kCycles [x][x][ ] - Cmp128Alex
970     kCycles [x][x][x] - MasmBasic Ocmp.1
968     kCycles [x][x][x] - MasmBasic Ocmp.0
764     kCycles [x][x][x] - Cmp128JJAlexSSE_1
825     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
720     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
717     kCycles [x][x][x] - Cmp128AxelNidudJJ
870     kCycles [x][x][x] - Cmp128JJAlexSSE_2
872     kCycles [x][x][x] - Cmp128JJAlexSSE_3
862     kCycles [x][x][ ] - AxCMP128bitProc3
886     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #305 on: September 03, 2013, 08:30:53 AM »
Thanks, Marinus - I like it :biggrin:

KeepingRealBusy

  • Member
  • Posts: 426
Re: Comparing 128-bit numbers aka OWORDs
« Reply #306 on: September 03, 2013, 08:48:02 AM »
JJ's latest:

Code:

AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
1448    kCycles [x][x][x] - Cmp128Dave
1146    kCycles [x][x][x] - Cmp128Nidud
846     kCycles [x][x][x] - Cmp128NidudSSE
496     kCycles [x][x][ ] - Cmp128Alex
2043    kCycles [x][x][x] - MasmBasic Ocmp.1
2153    kCycles [x][x][x] - MasmBasic Ocmp.0
1849    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1996    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1799    kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
1884    kCycles [x][x][x] - Cmp128AxelNidudJJ
1956    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1994    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1344    kCycles [x][x][ ] - AxCMP128bitProc3
1263    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Dave.

jj2007

  • Member
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #307 on: September 03, 2013, 08:54:39 AM »
Dave & Marinus,

Thanks, but I am afraid the difference between Cmp128JJAlexSSE_1new1 and Cmp128AlexNidudJJ just reflected the volatility of the timings: I changed the description but forgot to change the macro call itself :redface:

Scroll back three posts to get the good version... sorry ;-)

KeepingRealBusy

  • Member
  • Posts: 426
Re: Comparing 128-bit numbers aka OWORDs
« Reply #308 on: September 03, 2013, 09:06:41 AM »
From #303. I thought something had happened: the timings were way too high with the newest version.

Code:

AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
1391    kCycles [x][x][x] - Cmp128Dave
1065    kCycles [x][x][x] - Cmp128Nidud
884     kCycles [x][x][x] - Cmp128NidudSSE
685     kCycles [x][x][ ] - Cmp128Alex
949     kCycles [x][x][x] - MasmBasic Ocmp.1
953     kCycles [x][x][x] - MasmBasic Ocmp.0
714     kCycles [x][x][x] - Cmp128JJAlexSSE_1
703     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
700     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
740     kCycles [x][x][x] - Cmp128AxelNidudJJ_A
739     kCycles [x][x][x] - Cmp128AxelNidudJJ_B
710     kCycles [x][x][x] - Cmp128JJAlexSSE_2
697     kCycles [x][x][x] - Cmp128JJAlexSSE_3
516     kCycles [x][x][ ] - AxCMP128bitProc3
1331    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Dave.

Antariy

  • Member
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #309 on: September 03, 2013, 10:43:04 AM »
Code:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2713    kCycles [x][x][x] - Cmp128Dave
2729    kCycles [x][x][x] - Cmp128Nidud
3105    kCycles [x][x][x] - Cmp128NidudSSE
921     kCycles [x][x][ ] - Cmp128Alex
1979    kCycles [x][x][x] - MasmBasic Ocmp.1
1979    kCycles [x][x][x] - MasmBasic Ocmp.0
1607    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1540    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1354    kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
1549    kCycles [x][x][x] - Cmp128AxelNidudJJ_A
1589    kCycles [x][x][x] - Cmp128AxelNidudJJ_B
1574    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1577    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1333    kCycles [x][x][ ] - AxCMP128bitProc3
1251    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---

Antariy

  • Member
  • Posts: 551
Re: Comparing 128-bit numbers aka OWORDs
« Reply #310 on: September 03, 2013, 10:54:24 AM »
Cmp128AxelNidudJJ MACRO A:REQ, B:REQ

:biggrin:

japheth

  • Guest
Re: Comparing 128-bit numbers aka OWORDs
« Reply #311 on: September 03, 2013, 05:22:18 PM »
You are a bit sensitive me think  :P

Looks that way... it's probably a standard feature of hyperactive forum members... I could contribute an experience or two of my own there as well.  :icon_mrgreen:


jj2007

  • Member
  • *****
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: Comparing 128-bit numbers aka OWORDs
« Reply #312 on: September 03, 2013, 11:03:31 PM »
Cmp128AxelNidudJJ MACRO A:REQ, B:REQ

Oops - my apologies, Alex :redface:


FORTRANS

  • Member
  • Posts: 1085
Re: Comparing 128-bit numbers aka OWORDs
« Reply #313 on: September 03, 2013, 11:36:20 PM »
Hi,

   Well, I put versions of a compare routine using CMPSB and
CMPSW into the timing suite.  If I did it correctly, someone at
Intel really hates string instructions.  And going from bytes to
words only helped by about 5-15%, which means I probably need to
check for gross errors.  Oh well, maybe small code size counts
for something.

Cheers,

Steve N.

dedndave

  • Member
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: Comparing 128-bit numbers aka OWORDs
« Reply #314 on: September 04, 2013, 12:04:31 AM »
I am not sure that the string method would pass all the tests, Steve,
at least not without some extra support code   :P
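For readers wondering what a string-instruction compare has to get right, here is a C model of one plausible setup: scan from the most significant byte (offset 15) down to byte 0, as a `repe cmpsb` with the direction flag set would, and stop at the first difference. We don't know how Steve's routine is actually written, and `cmp128_bytes` is a hypothetical name. A forward scan (like `memcmp`) would start at the least significant byte of a little-endian number, which is the wrong end. The byte scan alone gives unsigned (JB/JA) results; the signed case needs the top byte treated as signed, which is one example of the kind of extra support code dedndave mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Compare two little-endian 128-bit integers byte by byte, starting at
   the most significant byte. Returns <0, 0 or >0. */
static int cmp128_bytes(const uint8_t a[16], const uint8_t b[16], int signed_cmp)
{
    assert(a && b);
    /* Support step for signed compares: byte 15 holds the sign, so compare
       it with its top bit flipped (equivalent to a signed byte compare). */
    if (signed_cmp && a[15] != b[15])
        return (a[15] ^ 0x80) < (b[15] ^ 0x80) ? -1 : 1;

    int i = 15;
    while (i >= 0 && a[i] == b[i]) /* the repe cmpsb part */
        --i;
    if (i < 0)
        return 0;                  /* all 16 bytes equal */
    return a[i] < b[i] ? -1 : 1;   /* unsigned byte compare decides */
}
```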

RE: "Axel"

Axel is a good name; let's call him that from now on   :lol: