Comparing 128-bit numbers aka OWORDs

Started by jj2007, August 12, 2013, 08:25:24 PM

jj2007

Thanks, Gunther. And here is the Celeron M:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
955     kCycles [x][x][x] - Cmp128Dave
923     kCycles [x][x][x] - Cmp128Nidud
1012    kCycles [x][x][x] - Cmp128NidudSSE
676     kCycles [x][x][ ] - Cmp128Alex
1081    kCycles [x][x][x] - MasmBasic Ocmp.1
1010    kCycles [x][x][x] - MasmBasic Ocmp.0
1080    kCycles [x][x][x] - MasmBasic Ocmp.1
1011    kCycles [x][x][x] - MasmBasic Ocmp.0
814     kCycles [x][x][x] - Cmp128JJAlexSSE_1
853     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
926     kCycles [x][x][x] - Cmp128JJAlexSSE_2
927     kCycles [x][x][x] - Cmp128JJAlexSSE_3
869     kCycles [x][x][ ] - AxCMP128bitProc3
867     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)
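For readers new to the thread, the three bracketed columns appear to record whether a routine leaves the flags in a state that unsigned jumps (JB/JA), signed jumps (JL/JG) and overflow/sign jumps (JO/JS) can use, as a real 128-bit CMP would. That reading is an interpretation of the table header, not something the posts spell out. Below is a minimal reference model in Python (not the thread's test harness; all function names are hypothetical) of the flags such a CMP would produce:

```python
MASK128 = (1 << 128) - 1


def sign_bit(value):
    """Return bit 127 of a 128-bit pattern."""
    return (value >> 127) & 1


def cmp128_flags(a, b):
    """Model CF, ZF, SF and OF for a hypothetical 128-bit `cmp a, b`.

    a and b are raw 128-bit patterns (0 <= value < 2**128).
    """
    a &= MASK128
    b &= MASK128
    diff = (a - b) & MASK128
    cf = int(a < b)              # borrow out of the subtraction
    zf = int(diff == 0)
    sf = sign_bit(diff)
    # Signed overflow on subtraction: the operands have different signs
    # and the result's sign differs from the sign of a.
    of = int(sign_bit(a) != sign_bit(b) and sign_bit(diff) != sign_bit(a))
    return {"CF": cf, "ZF": zf, "SF": sf, "OF": of}


# Jump conditions as defined for the x86 conditional jumps.
def jb(f):
    return f["CF"] == 1                      # unsigned below


def ja(f):
    return f["CF"] == 0 and f["ZF"] == 0     # unsigned above


def jl(f):
    return f["SF"] != f["OF"]                # signed less


def jg(f):
    return f["ZF"] == 0 and f["SF"] == f["OF"]   # signed greater


if __name__ == "__main__":
    # 2**127 is the most negative signed value but a large unsigned one.
    flags = cmp128_flags(1 << 127, 1)
    print(flags, "JA:", ja(flags), "JL:", jl(flags))
```

Under this reading, a routine marked [x][x][ ] would order the operands correctly for signed and unsigned jumps but would not reproduce OF and SF individually.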

nidud

#301
deleted

Antariy

Jochen's latest archive:


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2713    kCycles [x][x][x] - Cmp128Dave
2716    kCycles [x][x][x] - Cmp128Nidud
3065    kCycles [x][x][x] - Cmp128NidudSSE
918     kCycles [x][x][ ] - Cmp128Alex
2044    kCycles [x][x][x] - MasmBasic Ocmp.1
1900    kCycles [x][x][x] - MasmBasic Ocmp.0
2046    kCycles [x][x][x] - MasmBasic Ocmp.1
1890    kCycles [x][x][x] - MasmBasic Ocmp.0
1605    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1574    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1614    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1571    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1363    kCycles [x][x][ ] - AxCMP128bitProc3
1256    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---


nidud's latest archive:


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2708    kCycles [x][x][x] - Cmp128Dave
2749    kCycles [x][x][x] - Cmp128Nidud
3076    kCycles [x][x][x] - Cmp128NidudSSE
909     kCycles [x][x][ ] - Cmp128Alex
2044    kCycles [x][x][x] - MasmBasic Ocmp.1
1895    kCycles [x][x][x] - MasmBasic Ocmp.0
2046    kCycles [x][x][x] - MasmBasic Ocmp.1
1906    kCycles [x][x][x] - MasmBasic Ocmp.0
1604    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1543    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1351    kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
1576    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1578    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1340    kCycles [x][x][ ] - AxCMP128bitProc3
1267    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---




:biggrin:

jj2007

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
957     kCycles [x][x][x] - Cmp128Dave
926     kCycles [x][x][x] - Cmp128Nidud
1016    kCycles [x][x][x] - Cmp128NidudSSE
680     kCycles [x][x][ ] - Cmp128Alex
1039    kCycles [x][x][x] - MasmBasic Ocmp.1
1039    kCycles [x][x][x] - MasmBasic Ocmp.0
818     kCycles [x][x][x] - Cmp128JJAlexSSE_1
857     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
785     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
832     kCycles [x][x][x] - Cmp128AxelNidudJJ_A
853     kCycles [x][x][x] - Cmp128AxelNidudJJ_B
930     kCycles [x][x][x] - Cmp128JJAlexSSE_2
930     kCycles [x][x][x] - Cmp128JJAlexSSE_3
866     kCycles [x][x][ ] - AxCMP128bitProc3
871     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)


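Cmp128AxelNidudJJ MACRO A:REQ, B:REQ
   movups   xmm0, A[0]      ; load both 128-bit operands (unaligned)
   movups   xmm1, B[0]
   push ecx                 ; do not trash ecx
;   mov   eax, A[12]
;   mov   edx, B[12]
   pcmpeqb  xmm0, xmm1      ; matching bytes become 0FFh
   pmovmskb ecx, xmm0       ; bit n of ecx is set if byte n matches
   if ANJ_A
      xor ecx, -1           ; invert: set bits now mark differing bytes
      and ecx, 07FFFh       ; drop byte 15, the dword compare below covers it
      or  ecx, 1            ; make sure there is no zero input
      bsr ecx, ecx          ; index of the highest differing byte in 0..14
      mov eax, A[12]        ; top dwords of both operands
      mov edx, B[12]
      mov dl, B[ecx]        ; put the deciding byte into the low byte
      mov al, A[ecx]
      cmp eax, edx          ; sets the flags for the caller
   else
      xor ecx, 0FFFFh       ; invert the 16 mask bits; Zero? if all bytes match
      .if !Zero?            ; make sure there is no zero input
         and ecx, 07FFFh    ; ecx = 0 here if only byte 15 differs
         bsr ecx, ecx       ; (Intel documents the bsr result as undefined for a zero source)
         mov eax, A[12]
         mov edx, B[12]
         mov dl, B[ecx]
         mov al, A[ecx]
         cmp eax, edx
      .endif
   endif
   pop ecx                  ; pop leaves the flags untouched
ENDM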
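To see why the final `cmp eax,edx` in the ANJ_A branch can stand in for a full 128-bit comparison, here is a small Python model of that branch. It is a sketch for checking the idea, not part of the timing suite; the function names are hypothetical, and it only checks signed and unsigned ordering, not the individual OF/SF flags.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a two's complement number."""
    return value - (1 << bits) if value >> (bits - 1) else value


def cmp128_axel_model(a, b):
    """Return the (eax, edx) pair that the final `cmp eax, edx` would see.

    a and b are 128-bit patterns given as non-negative Python ints.
    """
    a_bytes = a.to_bytes(16, "little")
    b_bytes = b.to_bytes(16, "little")
    # pcmpeqb + pmovmskb: bit i is set where byte i is equal.
    eq_mask = sum(1 << i for i in range(16) if a_bytes[i] == b_bytes[i])
    diff_mask = ~eq_mask & 0x7FFF        # xor ecx,-1 / and ecx,07FFFh
    diff_mask |= 1                       # or ecx,1 (avoid zero input to bsr)
    index = diff_mask.bit_length() - 1   # bsr ecx,ecx
    eax = int.from_bytes(a_bytes[12:16], "little")   # mov eax,A[12]
    edx = int.from_bytes(b_bytes[12:16], "little")   # mov edx,B[12]
    edx = (edx & 0xFFFFFF00) | b_bytes[index]        # mov dl,B[ecx]
    eax = (eax & 0xFFFFFF00) | a_bytes[index]        # mov al,A[ecx]
    return eax, edx


def order(x, y):
    """Return -1, 0 or 1 for x < y, x == y, x > y."""
    return (x > y) - (x < y)


def agrees_with_full_compare(a, b):
    """True if the 32-bit compare orders a and b like a 128-bit compare."""
    eax, edx = cmp128_axel_model(a, b)
    unsigned_ok = order(eax, edx) == order(a, b)
    signed_ok = (order(to_signed(eax, 32), to_signed(edx, 32))
                 == order(to_signed(a, 128), to_signed(b, 128)))
    return unsigned_ok and signed_ok


if __name__ == "__main__":
    print(agrees_with_full_compare(0x80, 0x01))
    print(agrees_with_full_compare(1 << 127, 1))
```

The trick works because a difference in bytes 12 to 15 is already visible in the top dwords, while a difference further down only matters when the top dwords are equal, in which case the deciding byte takes the place of the (equal) byte 12.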

Siekmanski

Hi Jochen,

Reply #298:

Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
963     kCycles [x][x][x] - Cmp128Dave
901     kCycles [x][x][x] - Cmp128Nidud
1000    kCycles [x][x][x] - Cmp128NidudSSE
662     kCycles [x][x][ ] - Cmp128Alex
991     kCycles [x][x][x] - MasmBasic Ocmp.1
941     kCycles [x][x][x] - MasmBasic Ocmp.0
991     kCycles [x][x][x] - MasmBasic Ocmp.1
941     kCycles [x][x][x] - MasmBasic Ocmp.0
765     kCycles [x][x][x] - Cmp128JJAlexSSE_1
826     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
870     kCycles [x][x][x] - Cmp128JJAlexSSE_2
873     kCycles [x][x][x] - Cmp128JJAlexSSE_3
862     kCycles [x][x][ ] - AxCMP128bitProc3
888     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)


Hi nidud,

Reply #301:

Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
961     kCycles [x][x][x] - Cmp128Dave
903     kCycles [x][x][x] - Cmp128Nidud
999     kCycles [x][x][x] - Cmp128NidudSSE
663     kCycles [x][x][ ] - Cmp128Alex
990     kCycles [x][x][x] - MasmBasic Ocmp.1
942     kCycles [x][x][x] - MasmBasic Ocmp.0
990     kCycles [x][x][x] - MasmBasic Ocmp.1
942     kCycles [x][x][x] - MasmBasic Ocmp.0
763     kCycles [x][x][x] - Cmp128JJAlexSSE_1
827     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
718     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
871     kCycles [x][x][x] - Cmp128JJAlexSSE_2
874     kCycles [x][x][x] - Cmp128JJAlexSSE_3
855     kCycles [x][x][ ] - AxCMP128bitProc3
887     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)


Jochen,  Reply #303:

Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
964     kCycles [x][x][x] - Cmp128Dave
901     kCycles [x][x][x] - Cmp128Nidud
1000    kCycles [x][x][x] - Cmp128NidudSSE
662     kCycles [x][x][ ] - Cmp128Alex
970     kCycles [x][x][x] - MasmBasic Ocmp.1
968     kCycles [x][x][x] - MasmBasic Ocmp.0
764     kCycles [x][x][x] - Cmp128JJAlexSSE_1
825     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
720     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
717     kCycles [x][x][x] - Cmp128AxelNidudJJ
870     kCycles [x][x][x] - Cmp128JJAlexSSE_2
872     kCycles [x][x][x] - Cmp128JJAlexSSE_3
862     kCycles [x][x][ ] - AxCMP128bitProc3
886     kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

KeepingRealBusy

JJ's latest



AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
1448    kCycles [x][x][x] - Cmp128Dave
1146    kCycles [x][x][x] - Cmp128Nidud
846     kCycles [x][x][x] - Cmp128NidudSSE
496     kCycles [x][x][ ] - Cmp128Alex
2043    kCycles [x][x][x] - MasmBasic Ocmp.1
2153    kCycles [x][x][x] - MasmBasic Ocmp.0
1849    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1996    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1799    kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
1884    kCycles [x][x][x] - Cmp128AxelNidudJJ
1956    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1994    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1344    kCycles [x][x][ ] - AxCMP128bitProc3
1263    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---


Dave.

jj2007

Dave & Marinus,

Thanks, but I am afraid the difference between Cmp128JJAlexSSE_1new1 and Cmp128AlexNidudJJ just reflected the volatility of the timings: I changed the description but forgot to change the macro call itself :redface:

Scroll back three posts to get the good version... sorry ;-)

KeepingRealBusy

From 303 - I thought something had happened - timings were way high with the newest version.



AMD A8-3520M APU with Radeon(tm) HD Graphics (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
1391    kCycles [x][x][x] - Cmp128Dave
1065    kCycles [x][x][x] - Cmp128Nidud
884     kCycles [x][x][x] - Cmp128NidudSSE
685     kCycles [x][x][ ] - Cmp128Alex
949     kCycles [x][x][x] - MasmBasic Ocmp.1
953     kCycles [x][x][x] - MasmBasic Ocmp.0
714     kCycles [x][x][x] - Cmp128JJAlexSSE_1
703     kCycles [x][x][x] - Cmp128JJAlexSSE_1new
700     kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
740     kCycles [x][x][x] - Cmp128AxelNidudJJ_A
739     kCycles [x][x][x] - Cmp128AxelNidudJJ_B
710     kCycles [x][x][x] - Cmp128JJAlexSSE_2
697     kCycles [x][x][x] - Cmp128JJAlexSSE_3
516     kCycles [x][x][ ] - AxCMP128bitProc3
1331    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---


Dave.

Antariy



Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
CMP emulation: [JB/JA] [JL/JG] [JO/JS]
------------------------------------------------------
2713    kCycles [x][x][x] - Cmp128Dave
2729    kCycles [x][x][x] - Cmp128Nidud
3105    kCycles [x][x][x] - Cmp128NidudSSE
921     kCycles [x][x][ ] - Cmp128Alex
1979    kCycles [x][x][x] - MasmBasic Ocmp.1
1979    kCycles [x][x][x] - MasmBasic Ocmp.0
1607    kCycles [x][x][x] - Cmp128JJAlexSSE_1
1540    kCycles [x][x][x] - Cmp128JJAlexSSE_1new
1354    kCycles [x][x][x] - Cmp128JJAlexSSE_1new1
1549    kCycles [x][x][x] - Cmp128AxelNidudJJ_A
1589    kCycles [x][x][x] - Cmp128AxelNidudJJ_B
1574    kCycles [x][x][x] - Cmp128JJAlexSSE_2
1577    kCycles [x][x][x] - Cmp128JJAlexSSE_3
1333    kCycles [x][x][ ] - AxCMP128bitProc3
1251    kCycles [x][x][ ] - AxCMP128bitProc3c (cmov)

--- ok ---


japheth

Quote from: nidud on September 03, 2013, 03:49:40 AM
You are a bit sensitive me think  :P

Looks like it ... probably a standard feature of hyperactive forum members ... I could contribute an experience or two of my own there.  :icon_mrgreen:


FORTRANS

Hi,

   Well, I put versions of a compare routine using CMPSB and
CMPSW into the timing suite.  If I did it correctly, someone in
Intel really hates string instructions.  And going from bytes to
words only helped ~5 - 15%.  Which means I probably need to
check for gross errors.  Oh well, maybe small code size counts
for something.

Cheers,

Steve N.

dedndave

i am not sure that the string method would pass all the tests, Steve
at least, not without some extra support code   :P
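One plausible reason for that doubt (an assumption, since Steve's code is not shown in the thread): a CMPSB scan from the most significant byte down stops at the first mismatch and leaves the flags of a byte compare. Those flags order the operands correctly for unsigned jumps, but for signed jumps they are only right when the mismatch is in byte 15. A small Python model with hypothetical names illustrates this:

```python
def signed8(value):
    """Reinterpret a byte (0..255) as a signed 8-bit number."""
    return value - 256 if value & 0x80 else value


def first_mismatch_from_top(a, b):
    """Scan two 128-bit patterns from byte 15 down to byte 0.

    Returns (index, byte_of_a, byte_of_b) for the first mismatch,
    or None if the values are equal.
    """
    a_bytes = a.to_bytes(16, "little")
    b_bytes = b.to_bytes(16, "little")
    for i in range(15, -1, -1):
        if a_bytes[i] != b_bytes[i]:
            return i, a_bytes[i], b_bytes[i]
    return None


def byte_scan_verdicts(a, b):
    """Return (unsigned_less, signed_less) as a byte compare would report.

    This mimics reading JB and JL directly after the scan stops.
    """
    hit = first_mismatch_from_top(a, b)
    if hit is None:
        return False, False
    _, x, y = hit
    return x < y, signed8(x) < signed8(y)


if __name__ == "__main__":
    # Both values are small positive numbers, so 0x80 > 0x01 either way,
    # yet the byte-level signed verdict claims "less".
    print(byte_scan_verdicts(0x80, 0x01))
```

If this is indeed the issue, the extra support code would have to treat byte 15 as signed and all lower bytes as unsigned.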

RE: "Axel"

Axel is a good name - let's call him that, from now on   :lol: