Comparing 128-bit numbers aka OWORDs

Started by jj2007, August 12, 2013, 08:25:24 PM

jj2007

Quote from: nidud on August 24, 2013, 04:05:01 AM
The result of a full test for Cmp128JJ

cmp FFFFFFFF_FFFFFFFF_00000001_FFFFFFFF , FFFFFFFF_00000001_FFFFFFFF_00000001
AX:DX 0001FFFF  was: NO NS NZ CY should be: NO NS NZ NC

216 Failures

That one was marked as "tinkering with"; you can take it out. I was talking about the MasmBasic algo (Cmp128JJSEE; what is SEE? SSE?), which, AFAIK, sets the zero and carry flags exactly like a cmp eax, edx. It does produce different results for SF/OF, but in a way that does not alter the jl/jg jumps (jl requires SF != OF, jg requires ZF = 0 and SF == OF). Which means 0 failures, right?

Besides, as shown above, your test occasionally produces wrong results. The deb macro's "czso" output means none of the four flags (carry, zero, sign, overflow) is set, whereas your test reports that the carry was set. Both Olly and deb say the carry is clear.

jj2007

Quote from: nidud on August 24, 2013, 05:15:19 AM
The result of a full test
150 Failures

Yes, and all 150 produce correct jumps because SF and OF are swapped. So Ocmp and Qcmp work correctly: no failures. I guess the same holds true for Alex's version, although I haven't had time to test it.

Antariy

Can I please have timings for the attached prog?


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
---------------------------------------------------------
5934340 cycles for Cmp128Dave
9986238 cycles for Cmp128Dave2
6086064 cycles for Cmp128Nidud
2989932 cycles for Cmp128NidudSEE (xor)
2520135 cycles for Cmp128NidudSEEU (unsigned)
933977  cycles for Cmp128Alex (xor)
991343  cycles for Cmp128DaveU (unsigned)
926699  cycles for Cmp128NidudU (unsigned)
3613971 cycles for JJAxCMP128bit (SSE)
5933804 cycles for Ocmp2 - JJ's (SSE)
2965817 cycles for AxCMP128bitProc3
3446805 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------
6019773 cycles for Cmp128Dave
9803217 cycles for Cmp128Dave2
6065554 cycles for Cmp128Nidud
2770437 cycles for Cmp128NidudSEE (xor)
2689950 cycles for Cmp128NidudSEEU (unsigned)
945193  cycles for Cmp128Alex (xor)
950871  cycles for Cmp128DaveU (unsigned)
923201  cycles for Cmp128NidudU (unsigned)
3674979 cycles for JJAxCMP128bit (SSE)
5935439 cycles for Ocmp2 - JJ's (SSE)
2902697 cycles for AxCMP128bitProc3
3456920 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------

--- ok ---

dedndave

Alex CMP128bit
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
---------------------------------------------------------
5817682 cycles for Cmp128Dave
9943469 cycles for Cmp128Dave2
6048901 cycles for Cmp128Nidud
2573234 cycles for Cmp128NidudSEE (xor)
2341353 cycles for Cmp128NidudSEEU (unsigned)
1022981 cycles for Cmp128Alex (xor)
919308  cycles for Cmp128DaveU (unsigned)
844475  cycles for Cmp128NidudU (unsigned)
3445214 cycles for JJAxCMP128bit (SSE)
5638061 cycles for Ocmp2 - JJ's (SSE)
2800548 cycles for AxCMP128bitProc3
3475108 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------
5776971 cycles for Cmp128Dave
9150470 cycles for Cmp128Dave2
5760939 cycles for Cmp128Nidud
2597086 cycles for Cmp128NidudSEE (xor)
2326359 cycles for Cmp128NidudSEEU (unsigned)
1154350 cycles for Cmp128Alex (xor)
880936  cycles for Cmp128DaveU (unsigned)
896082  cycles for Cmp128NidudU (unsigned)
3442676 cycles for JJAxCMP128bit (SSE)
5767909 cycles for Ocmp2 - JJ's (SSE)
2825598 cycles for AxCMP128bitProc3
3242168 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------

dedndave

nidud cmp128
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
---------------------------------------------------------
5657695 cycles for Cmp128Dave
8870013 cycles for Cmp128Dave2
5700804 cycles for Cmp128Nidud
1115699 cycles for Cmp128Alex (xor)
895891  cycles for Cmp128DaveU (unsigned)
1045671 cycles for Cmp128NidudU (unsigned)
5584412 cycles for Cmp128JJSSE (xor)
3425749 cycles for Cmp128JJAlexSSE (xor)
6307931 cycles for Cmp128NidudSSE
2788370 cycles for AxCMP128bitProc3
3278983 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------
5626412 cycles for Cmp128Dave
8946119 cycles for Cmp128Dave2
5902719 cycles for Cmp128Nidud
1007638 cycles for Cmp128Alex (xor)
911179  cycles for Cmp128DaveU (unsigned)
977962  cycles for Cmp128NidudU (unsigned)
5649649 cycles for Cmp128JJSSE (xor)
3396764 cycles for Cmp128JJAlexSSE (xor)
6278291 cycles for Cmp128NidudSSE
2784183 cycles for AxCMP128bitProc3
3333723 cycles for AxCMP128bitProc3c (cmov)

---------------------------------------------------------

Antariy

The surprising thing is that there are some "rumours", like "CMOV is preferable to jump + mov" or "string instructions with a REP(E) prefix are the fastest possible", but in tests these rumours are often not borne out :icon_eek:

nidud's cmp128.zip

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
---------------------------------------------------------
6909610 cycles for Cmp128Dave
9987059 cycles for Cmp128Dave2
5967230 cycles for Cmp128Nidud
948941  cycles for Cmp128Alex (xor)
969717  cycles for Cmp128DaveU (unsigned)
1015841 cycles for Cmp128NidudU (unsigned)
5981220 cycles for Cmp128JJSSE (xor)
3506260 cycles for Cmp128JJAlexSSE (xor)
6612580 cycles for Cmp128NidudSSE
2884301 cycles for AxCMP128bitProc3
3538700 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------
6021956 cycles for Cmp128Dave
9902589 cycles for Cmp128Dave2
5931336 cycles for Cmp128Nidud
982830  cycles for Cmp128Alex (xor)
928048  cycles for Cmp128DaveU (unsigned)
933436  cycles for Cmp128NidudU (unsigned)
5800983 cycles for Cmp128JJSSE (xor)
3515863 cycles for Cmp128JJAlexSSE (xor)
6532618 cycles for Cmp128NidudSSE
2878085 cycles for AxCMP128bitProc3
3399463 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------

--- ok ---

Antariy

Quote from: nidud on August 24, 2013, 10:45:31 AM
Alex,
I converted the functions to macros; it's a bit faster :biggrin:

Thanks :biggrin:

It looks like the prologue and epilogue take up much of the time.


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
---------------------------------------------------------
6111257 cycles for Cmp128Dave
9430653 cycles for Cmp128Dave2
6087648 cycles for Cmp128Nidud
977496  cycles for Cmp128Alex (xor)
912537  cycles for Cmp128DaveU (unsigned)
897094  cycles for Cmp128NidudU (unsigned)
5813163 cycles for Cmp128JJSSE (xor)
3502859 cycles for Cmp128JJAlexSSE (xor)
6542713 cycles for Cmp128NidudSSE
2027954 cycles for AxCMP128bitProc3
2026933 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------
5771320 cycles for Cmp128Dave
9534868 cycles for Cmp128Dave2
5882258 cycles for Cmp128Nidud
999464  cycles for Cmp128Alex (xor)
911686  cycles for Cmp128DaveU (unsigned)
970983  cycles for Cmp128NidudU (unsigned)
5801006 cycles for Cmp128JJSSE (xor)
3547188 cycles for Cmp128JJAlexSSE (xor)
6496130 cycles for Cmp128NidudSSE
2012868 cycles for AxCMP128bitProc3
2010059 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------

--- ok ---


Gunther

Hi nidud,

timings:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 2360/1000 cycles

1771    cycles for 1000 * Ocmp (JJ)
1757    cycles for 1000 * Ocmp2 (JJ)
1445    cycles for 1000 * cmp128n (nidud)
3829    cycles for 1000 * cmp128 qWord
3146    cycles for 1000 * AxCMP128bit

1843    cycles for 1000 * Ocmp (JJ)
1846    cycles for 1000 * Ocmp2 (JJ)
1612    cycles for 1000 * cmp128n (nidud)
3992    cycles for 1000 * cmp128 qWord
3144    cycles for 1000 * AxCMP128bit

1709    cycles for 1000 * Ocmp (JJ)
1746    cycles for 1000 * Ocmp2 (JJ)
1506    cycles for 1000 * cmp128n (nidud)
3848    cycles for 1000 * cmp128 qWord
3270    cycles for 1000 * AxCMP128bit

--- ok ---


six_L

Intel(R) Core(TM) i3 CPU M 370 @ 2.40GHz (SSE4)
---------------------------------------------------------
2639323 cycles for Cmp128Dave
8595376 cycles for Cmp128Dave2
3084573 cycles for Cmp128Nidud
1718384 cycles for Cmp128Alex (xor)
742712  cycles for Cmp128DaveU (unsigned)
621263  cycles for Cmp128NidudU (unsigned)
2229787 cycles for Cmp128JJSSE (xor)
988796  cycles for Cmp128JJAlexSSE (xor)
1853176 cycles for Cmp128NidudSSE
1302968 cycles for AxCMP128bitProc3
1286231 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------
1768975 cycles for Cmp128Dave
7299673 cycles for Cmp128Dave2
2650490 cycles for Cmp128Nidud
1824726 cycles for Cmp128Alex (xor)
1732493 cycles for Cmp128DaveU (unsigned)
1461457 cycles for Cmp128NidudU (unsigned)
3480289 cycles for Cmp128JJSSE (xor)
1887690 cycles for Cmp128JJAlexSSE (xor)
2749835 cycles for Cmp128NidudSSE
2177752 cycles for AxCMP128bitProc3
2126370 cycles for AxCMP128bitProc3c (cmov)
---------------------------------------------------------

--- ok ---
