The MASM Forum

General => The Laboratory => Topic started by: Antariy on August 14, 2013, 02:31:50 PM

Title: 128 bit comparison "instruction" (CMP for 128 bits-wide numbers)
Post by: Antariy on August 14, 2013, 02:31:50 PM
It works just like the CMP instruction, but on 128-bit operands instead of 32-bit ones :biggrin: So you can use constructs like:
AxCMP128bit thenumber1,thenumber2
jge @Signed_jumpIfTheNumber1IsGreaterThanTheNumber2OrEqualToIt


The only difference is that the code does not always set the SF flag; all other conditions follow the standard CMP behaviour. The sign can simply be checked by testing the highest-order bit of the OWORD, though.
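As a rough model of what such a 128-bit CMP has to decide, here is a Python sketch (not Antariy's code, which isn't shown in this post). It treats an OWORD as two qwords: if the high qwords differ, they settle both the unsigned order (JB/JA) and the signed order (JL/JG); otherwise the low qwords settle both, compared unsigned. The names cmp128 and sign128 are hypothetical. sign128 is the top-bit test mentioned above.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def cmp128(a, b):
    """Compare two 128-bit OWORDs given as ints in 0 .. 2**128 - 1.

    Returns (equal, below, less):
      equal -> what JE/JZ tests
      below -> unsigned a < b, what JB tests
      less  -> two's-complement signed a < b, what JL tests
    """
    a_hi, a_lo = a >> 64, a & MASK64
    b_hi, b_lo = b >> 64, b & MASK64
    if a_hi != b_hi:
        # The high qwords differ, so they alone decide the order.
        below = a_hi < b_hi
        # Flipping the top bit turns an unsigned compare into a signed one.
        less = (a_hi ^ SIGN64) < (b_hi ^ SIGN64)
        return False, below, less
    # Equal high qwords (hence equal signs): the low qwords decide,
    # and they are compared unsigned for both orders.
    below = a_lo < b_lo
    return a_lo == b_lo, below, below


def sign128(a):
    """The sign of an OWORD is simply its highest-order bit (bit 127)."""
    return a >> 127


# Usage: a large positive value against -1 (all bits set).
pos = 0x7FFFFFFFFFFFFFFF << 64
neg1 = (1 << 128) - 1
print(cmp128(pos, neg1))  # (False, True, False): greater signed, below unsigned
print(sign128(neg1))      # 1
```

The pos/neg1 pair is the kind of case where JG and JB both pass in the test output below, assuming tnN1 stands for a negative value such as -1.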


65 cycles for tn7F00-tn7FFF
42 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...


The results are exact for signed, unsigned and mixed comparisons.

Since the code is SSE1-capable and uses an interesting, tricky replacement for PCMPEQD (an SSE2 instruction) that I came up with (though someone on the planet has surely used it somewhere, too), it's interesting to see how the code behaves on a wide range of SSE1+ capable hardware, so I'm asking for tests in this new thread.
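The post doesn't show how the float-compare replacement for PCMPEQD works, and the follow-up thread linked later in this topic explains why it is robust. Purely as background, this Python sketch shows the two edge cases where IEEE single-precision equality on 32-bit integer patterns differs from PCMPEQD's bitwise equality, i.e. what a float-based equality test has to account for. It models only the compare semantics; MXCSR settings such as denormals-are-zero are not modeled, and the function names are hypothetical.

```python
import struct


def as_float32(bits):
    """Reinterpret a 32-bit integer pattern as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_eq(x, y):
    """Equality as an ordered float compare (CMPEQPS-style) would report it."""
    return as_float32(x) == as_float32(y)


# Different bits that compare equal as floats (+0.0 == -0.0):
print(float_eq(0x00000000, 0x80000000))  # True
# Identical bits that compare unequal as floats (a NaN never equals itself):
print(float_eq(0x7FC00000, 0x7FC00000))  # False
# Ordinary values behave like integer equality:
print(float_eq(0x3F800000, 0x3F800000))  # True (1.0 == 1.0)
```

On real hardware CMPEQPS yields an all-ones or all-zeros mask per lane rather than a bool, but the equal/unequal decision per lane is the same.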
Post by: Antariy on August 14, 2013, 02:58:40 PM
An update: a jump table instead of JECXZ :biggrin:
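The post doesn't say what the table is indexed by. One common shape, sketched here in Python under that assumption, is to build a small index from the equality results of the two qword halves and dispatch through a table of handlers, so the path is chosen by one indexed jump instead of a conditional branch. The index layout and handler names are hypothetical, and only the unsigned order is shown.

```python
MASK64 = (1 << 64) - 1


def _sign(x, y):
    # -1, 0 or 1, like the outcome of an unsigned CMP.
    return (x > y) - (x < y)


def by_high(a, b):
    # High qwords differ, so they alone decide the order.
    return _sign(a >> 64, b >> 64)


def by_low(a, b):
    # High qwords are equal: the low qwords decide.
    return _sign(a & MASK64, b & MASK64)


def both_equal(a, b):
    return 0


# Hypothetical index: bit 0 = high qwords equal, bit 1 = low qwords equal.
# Index 2 (low equal, high different) still needs the high qwords.
JUMP_TABLE = (by_high, by_low, by_high, both_equal)


def ucmp128(a, b):
    hi_eq = (a >> 64) == (b >> 64)
    lo_eq = (a & MASK64) == (b & MASK64)
    index = int(hi_eq) | (int(lo_eq) << 1)
    return JUMP_TABLE[index](a, b)


print(ucmp128(1 << 64, 5))  # 1: high qwords differ
print(ucmp128(7, 9))        # -1: high qwords equal, low qwords decide
print(ucmp128(42, 42))      # 0
```

In MASM the dispatch would be an indexed indirect jump along the lines of jmp dword ptr [table+eax*4], with eax holding an index derived, for example, from a MOVMSKPS mask; again, that is an assumption, not the posted code.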

41 cycles for tn7F00-tn7FFF
29 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...
Post by: sinsi on August 14, 2013, 03:13:35 PM
Here you go
4 cycles for tn7F00-tn7FFF
2 cycles for short cut

Everything passed.
Post by: Siekmanski on August 14, 2013, 05:51:56 PM
12 cycles for tn7F00-tn7FFF
8 cycles for short cut

They all pass
Post by: dedndave on August 14, 2013, 06:48:21 PM
p4 prescott w/htt
70 cycles for tn7F00-tn7FFF
58 cycles for short cut


everything passed   :t
Post by: Antariy on August 14, 2013, 07:08:10 PM
Thank you all! :biggrin:
Post by: jj2007 on August 14, 2013, 08:56:06 PM
Hi Alex,

I've now solved my problems with the flags. The new version produces exactly the same passes as yours (60). The timings looked a bit inconsistent, so I added an invoke Sleep, 50.

35 cycles for tn7F00-tn7FFF (Alex)
33 cycles for short cut - press any key

23 cycles for tn7F00-tn7FFF (JJ)
23 cycles for short cut - press any key


Modified source & exe attached.
Post by: sinsi on August 14, 2013, 09:47:01 PM
i3-3100M
15 cycles for tn7F00-tn7FFF (Alex)
6 cycles for short cut

1 cycles for tn7F00-tn7FFF (JJ)
4 cycles for short cut


i7-3770K
5 cycles for tn7F00-tn7FFF (Alex)
3 cycles for short cut

2 cycles for tn7F00-tn7FFF (JJ)
1 cycles for short cut

Post by: FORTRANS on August 14, 2013, 09:52:51 PM
Hi,

   No failure messages, so I don't know whether the final message
means anything. P-III.

Regards,

Steve


>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60      # passed

30 cycles for tn7F00-tn7FFF (JJ)
30 cycles for short cut - press any key

...

34      # passed

bye
Post by: jj2007 on August 14, 2013, 10:06:45 PM
Quote from: FORTRANS on August 14, 2013, 09:52:51 PM
   No failure messages, so I don't know whether the final message
means anything. P-III.

34      # passed

Since it's less than 60, it means the P-III doesn't know about SSE2 ;-)
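The harness evidently skips its SSE2-dependent checks on a CPU without SSE2, which is why the P-III run reports 34 passes instead of 60. How it detects this isn't shown. A typical check reads CPUID leaf 1, where EDX bit 25 flags SSE and bit 26 flags SSE2. The Python sketch below only decodes an EDX value obtained elsewhere (for example from a cpuid instruction in assembly), since Python itself cannot execute CPUID; the function name is hypothetical.

```python
# CPUID leaf 1 (EAX=1) reports these feature flags in EDX.
CPUID1_EDX_SSE = 1 << 25
CPUID1_EDX_SSE2 = 1 << 26


def sse_level(edx):
    """Return the highest of SSE/SSE2 advertised in CPUID.1:EDX."""
    if edx & CPUID1_EDX_SSE2:
        return "SSE2"
    if edx & CPUID1_EDX_SSE:
        return "SSE"
    return "none"


print(sse_level(CPUID1_EDX_SSE))                    # SSE  (P-III-like)
print(sse_level(CPUID1_EDX_SSE | CPUID1_EDX_SSE2))  # SSE2 (P4 and later)
```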
Post by: sinsi on August 14, 2013, 10:35:50 PM
Heh, it's good to see a PIII as a sanity check; I have worked on quite a few in the last couple of years (even a PII with NT 3.51).
All used as servers (NT4 Server, 2000 Server) and so entrenched they can't be touched.
/offtopic
Post by: Antariy on August 15, 2013, 04:03:04 AM
Hi Jochen :t

Here are the results for your code.


45 cycles for tn7F00-tn7FFF (Alex)
40 cycles for short cut - press any key

Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60      # passed

32 cycles for tn7F00-tn7FFF (JJ)
32 cycles for short cut - press any key

Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60      # passed



Hi Steve :t

Quote from: FORTRANS on August 14, 2013, 09:52:51 PM
   No failure messages, so I don't know whether the final message
means anything. P-III.


>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60      # passed


But my SSE1 version passed the test on the PIII :biggrin: Thank you for testing it there :t
The tricky thing about that code is that I use a pure SSE floating-point comparison instruction to "compare" integers :biggrin:, which is why it's interesting to see how it behaves on different machines.
Post by: jj2007 on August 15, 2013, 04:31:52 AM
Quote from: Antariy on August 15, 2013, 04:03:04 AM
But my SSE1 version passed the test on the PIII :biggrin: Thank you for testing it there :t
The tricky thing about that code is that I use a pure SSE floating-point comparison instruction to "compare" integers :biggrin:, which is why it's interesting to see how it behaves on different machines.

Yeah, it looks tricky indeed, but it seems to work just fine :t
I'll stick to my algo, because MasmBasic doesn't run on anything below SSE2 anyway, so it won't make a difference. Thanks for all the help, Alex :icon14:
Post by: Antariy on August 15, 2013, 11:18:57 PM
Hi Jochen :t
Quote from: jj2007 on August 15, 2013, 04:31:52 AM
Yeah, it looks tricky indeed, but it seems to work just fine :t

I created a thread (http://masm32.com/board/index.php?topic=2242.msg23112#msg23112) where I explain why this trick is pretty robust and legal :biggrin:

Quote from: jj2007 on August 15, 2013, 04:31:52 AM
I'll stick to my algo, because MasmBasic doesn't run on anything below SSE2 anyway, so it won't make a difference. Thanks for all the help, Alex :icon14:

And your algo is faster :biggrin: :icon14: