128-bit comparison "instruction" (CMP for 128-bit-wide numbers)

Started by Antariy, August 14, 2013, 02:31:50 PM

Antariy

It's just like a CMP instruction, but for 128 bits instead of 32 :biggrin: So you can use constructs like:
AxCMP128bit thenumber1,thenumber2
jge @Signed_jumpIfTheNumber1IsGreaterThanTheNumber2OrEqualToIt


The only difference is that the code does not (always) set the SF flag; all other conditions behave like the "standard" CMP. The sign can still be checked simply by testing the highest-order bit of the OWORD, though.
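For readers who want the semantics spelled out, here is a minimal C sketch of what one 128-bit compare has to decide. It is not the macro's actual code (the MASM source is only in the attachments). The type `u128_t` and the function names are hypothetical, chosen for illustration. The unsigned result corresponds to the JB/JA family of jumps, the signed result to JL/JG, and the sign test is the "highest-order bit of the OWORD" check mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit halves. */
typedef struct { uint64_t lo; uint64_t hi; } u128_t;

#define SIGN_BIT 0x8000000000000000ULL

/* Unsigned ordering: -1 (below), 0 (equal), 1 (above). */
static int cmp128_unsigned(u128_t a, u128_t b) {
    if (a.hi != b.hi) return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo) return a.lo < b.lo ? -1 : 1;
    return 0;
}

/* Signed ordering: -1 (less), 0 (equal), 1 (greater).
   Only the high half carries the sign; flipping its top bit turns
   the signed order of the high halves into an unsigned order. */
static int cmp128_signed(u128_t a, u128_t b) {
    uint64_t ah = a.hi ^ SIGN_BIT;
    uint64_t bh = b.hi ^ SIGN_BIT;
    if (ah != bh) return ah < bh ? -1 : 1;
    if (a.lo != b.lo) return a.lo < b.lo ? -1 : 1;
    return 0;
}

/* Sign test: highest-order bit of the 128-bit value. */
static int is_negative128(u128_t a) {
    return (int)(a.hi >> 63);
}

/* Illustrative values, not necessarily the thread's test data. */
static void cmp128_self_check(void) {
    u128_t max_pos = { UINT64_MAX, 0x7FFFFFFFFFFFFFFFULL }; /* 7FFF...FFFF */
    u128_t minus_1 = { UINT64_MAX, UINT64_MAX };            /* FFFF...FFFF */
    assert(cmp128_signed(max_pos, minus_1) > 0);   /* signed: greater */
    assert(cmp128_unsigned(max_pos, minus_1) < 0); /* unsigned: below */
    assert(cmp128_signed(minus_1, minus_1) == 0);  /* equal */
    assert(is_negative128(minus_1) && !is_negative128(max_pos));
}
```

The same pair of operands can be "greater" as signed and "below" as unsigned, which is why the output below lists both JG and JB as passed for some comparisons.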


65 cycles for tn7F00-tn7FFF
42 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...


The results are exact for signed/unsigned/mixed comparisons.

Since the code is SSE1-capable and uses an interesting, tricky replacement for PCMPEQD (SSE2) (my own, though someone on the planet has surely used it somewhere, too), it's interesting to see how the code works on a wide range of SSE1+ capable hardware, so I'm asking for a test in this new thread.

Antariy

An update - jump table instead of JECXZ :biggrin:

41 cycles for tn7F00-tn7FFF
29 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...

sinsi

Here you go
4 cycles for tn7F00-tn7FFF
2 cycles for short cut

Everything passed.

Siekmanski

12 cycles for tn7F00-tn7FFF
8 cycles for short cut

They all pass
Creative coders use backward thinking techniques as a strategy.

dedndave

p4 prescott w/htt
70 cycles for tn7F00-tn7FFF
58 cycles for short cut


everything passed   :t


jj2007

Hi Alex,

Now I solved my problems with the flags. The new version produces exactly the same passes as yours (60). The timings looked a bit inconsistent, so I added an invoke Sleep, 50.

35 cycles for tn7F00-tn7FFF (Alex)
33 cycles for short cut - press any key

23 cycles for tn7F00-tn7FFF (JJ)
23 cycles for short cut - press any key


Modified source & exe attached.

sinsi

i3-3100M
15 cycles for tn7F00-tn7FFF (Alex)
6 cycles for short cut

1 cycles for tn7F00-tn7FFF (JJ)
4 cycles for short cut


i7-3770K
5 cycles for tn7F00-tn7FFF (Alex)
3 cycles for short cut

2 cycles for tn7F00-tn7FFF (JJ)
1 cycles for short cut


FORTRANS

Hi,

   No failure messages so don't know if the final message means
anything.  P-III.

Regards,

Steve


>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60      # passed

30 cycles for tn7F00-tn7FFF (JJ)
30 cycles for short cut - press any key

...

34      # passed

bye

jj2007

Quote from: FORTRANS on August 14, 2013, 09:52:51 PM
   No failure messages so don't know if the final message means
anything.  P-III.

34      # passed

Since it's less than 60, it means the P-III doesn't know about SSE2 ;-)

sinsi

Heh, it's good to see a PIII as a sanity check, I have worked on quite a few in the last couple of years (even a PII with NT 3.51).
All used as servers (NT4 Server, 2000 Server) and so entrenched they can't be touched.
/offtopic

Antariy

Hi Jochen :t

Here are results for your code.


45 cycles for tn7F00-tn7FFF (Alex)
40 cycles for short cut - press any key

Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60      # passed

32 cycles for tn7F00-tn7FFF (JJ)
32 cycles for short cut - press any key

Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60      # passed



Hi Steve :t

Quote from: FORTRANS on August 14, 2013, 09:52:51 PM
   No failure messages so don't know if the final message means
anything.  P-III.


>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60      # passed


But my SSE1 version passed the test on PIII :biggrin: Thank you for the test on it :t
The tricky thing with that code is that I use a pure SSE FLOAT comparison instruction to "compare" integers :biggrin: - that's why it's interesting to see its behaviour on different machines.
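As background on why such a trick needs care: a packed float compare such as CMPPS treats each dword lane as an IEEE 754 single, so a naive "equal as floats means equal as integers" test goes wrong for at least two kinds of bit patterns. The C sketch below only illustrates that naive idea for one lane. It is not the code from the attachment, and the names `as_float` and `naive_float_eq` are hypothetical. How the posted code avoids these cases is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit integer pattern as a float, the way a packed
   float compare sees one dword lane. */
static float as_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Naive "integer equality via float equality" for a single lane. */
static int naive_float_eq(uint32_t a, uint32_t b) {
    return as_float(a) == as_float(b);
}

static void float_eq_self_check(void) {
    /* Ordinary patterns behave as expected. */
    assert(naive_float_eq(0x3F800000u, 0x3F800000u));  /* 1.0 vs 1.0 */
    assert(!naive_float_eq(0x3F800000u, 0x40000000u)); /* 1.0 vs 2.0 */

    /* Hazard 1: a NaN pattern never compares equal, even to itself. */
    assert(!naive_float_eq(0x7FC00000u, 0x7FC00000u));

    /* Hazard 2: +0.0 and -0.0 compare equal although the bits differ. */
    assert(naive_float_eq(0x00000000u, 0x80000000u));
}
```

Any integer compare built on float instructions has to handle both cases, for example by arranging that the lanes it compares can never hold such patterns.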

jj2007

Quote from: Antariy on August 15, 2013, 04:03:04 AM
But my SSE1 version passed the test on PIII :biggrin: Thank you for the test on it :t
The tricky thing with that code is that I use a pure SSE FLOAT comparison instruction to "compare" integers :biggrin: - that's why it's interesting to see its behaviour on different machines.

Yeah, it looks tricky indeed, but it seems to work just fine :t
I'll stick to my algo because MasmBasic doesn't run anyway with less than SSE2, so it won't make a difference. Thanks for all the help, Alex :icon14:

Antariy

Hi Jochen :t
Quote from: jj2007 on August 15, 2013, 04:31:52 AM
Yeah, it looks tricky indeed, but it seems to work just fine :t

I created a thread where I explain why this trick is pretty robust and legal :biggrin:

Quote from: jj2007 on August 15, 2013, 04:31:52 AM
I'll stick to my algo because MasmBasic doesn't run anyway with less than SSE2, so it won't make a difference. Thanks for all the help, Alex :icon14:

And your algo is faster :biggrin: :icon14: