Topic: 128-bit comparison "instruction" (CMP for 128-bit-wide numbers)

Antariy

It's just like a CMP instruction, but for 128 bits instead of 32 :biggrin: So you can use constructs like:
AxCMP128bit thenumber1,thenumber2
jge @Signed_jumpIfTheNumber1IsGreaterThanTheNumber2OrEqualToIt


The only difference is that the code does not always set the SF flag; all other conditions behave as with the standard CMP. The sign can still be checked by testing the highest-order bit of the OWORD, though.
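
For readers who want a reference model of what the macro is expected to report, here is a minimal C sketch. It is not the code from this thread: the `u128_t` layout, the function names and the sample values are all assumptions made for illustration. It only models the ordering that the unsigned jumps (JB/JA) and signed jumps (JL/JG) should observe, plus the sign check on the highest-order bit mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical OWORD layout: two 64-bit halves (not taken from the thread's source). */
typedef struct { uint64_t lo, hi; } u128_t;

static u128_t u128_make(uint64_t hi, uint64_t lo) {
    u128_t v = { lo, hi };
    return v;
}

/* Unsigned order, the relation JB/JBE/JA/JAE test: returns -1, 0 or 1. */
int cmp128_unsigned(u128_t a, u128_t b) {
    if (a.hi != b.hi) return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo) return a.lo < b.lo ? -1 : 1;
    return 0;
}

/* Signed order, the relation JL/JLE/JG/JGE test. Only the top half is
   treated as signed; the low half is always compared unsigned. */
int cmp128_signed(u128_t a, u128_t b) {
    int64_t ah = (int64_t)a.hi, bh = (int64_t)b.hi; /* assumes two's complement */
    if (ah != bh) return ah < bh ? -1 : 1;
    if (a.lo != b.lo) return a.lo < b.lo ? -1 : 1;
    return 0;
}

/* The sign is the highest-order bit of the OWORD. */
int is_negative128(u128_t a) { return (int)(a.hi >> 63); }

/* A few pairs in the spirit of the test log; the values are made up here. */
void cmp128_self_check(void) {
    u128_t zero = u128_make(0, 0);
    u128_t minus1 = u128_make(UINT64_MAX, UINT64_MAX);
    u128_t big = u128_make(0x7F00, 0);

    assert(cmp128_unsigned(minus1, big) > 0);   /* "JA": -1 is the largest unsigned value */
    assert(cmp128_signed(minus1, big) < 0);     /* "JL": but it is negative when signed */
    assert(cmp128_signed(zero, big) < 0 && cmp128_unsigned(zero, big) < 0);
    assert(cmp128_signed(minus1, minus1) == 0); /* "JZ" */
    assert(is_negative128(minus1) && !is_negative128(big));
}
```

Calling `cmp128_self_check()` from a test program runs the sample pairs. The point of the split is that a value such as -1 is above a positive number in the unsigned order and below it in the signed order.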

Code:
65 cycles for tn7F00-tn7FFF
42 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...

The results are exact for signed/unsigned/mixed comparisons.

Since the code needs only SSE1 and uses an interesting, tricky replacement for PCMPEQD (an SSE2 instruction), my own idea, though surely someone on the planet has used it somewhere too, it would be interesting to see how it behaves on a wide range of SSE1+ capable hardware. So I'm asking for tests in this new thread.

Antariy

« Reply #1 on: August 14, 2013, 02:58:40 PM »
An update - jump table instead of JECXZ :biggrin:

Code:
41 cycles for tn7F00-tn7FFF
29 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...

sinsi

« Reply #2 on: August 14, 2013, 03:13:35 PM »
Here you go
4 cycles for tn7F00-tn7FFF
2 cycles for short cut

Everything passed.
I can walk on water but stagger on beer.

Siekmanski

« Reply #3 on: August 14, 2013, 05:51:56 PM »
12 cycles for tn7F00-tn7FFF
8 cycles for short cut

They all pass

dedndave

« Reply #4 on: August 14, 2013, 06:48:21 PM »
p4 prescott w/htt
Code:
70 cycles for tn7F00-tn7FFF
58 cycles for short cut

everything passed   :t

Antariy

« Reply #5 on: August 14, 2013, 07:08:10 PM »
Thank you all! :biggrin:

jj2007

« Reply #6 on: August 14, 2013, 08:56:06 PM »
Hi Alex,

I have now solved my problems with the flags. The new version produces exactly the same number of passes as yours (60). The timings looked a bit inconsistent, so I added an invoke Sleep, 50.

35 cycles for tn7F00-tn7FFF (Alex)
33 cycles for short cut - press any key

23 cycles for tn7F00-tn7FFF (JJ)
23 cycles for short cut - press any key


Modified source & exe attached.

sinsi

« Reply #7 on: August 14, 2013, 09:47:01 PM »
i3-3100M
15 cycles for tn7F00-tn7FFF (Alex)
6 cycles for short cut

1 cycles for tn7F00-tn7FFF (JJ)
4 cycles for short cut


i7-3770K
5 cycles for tn7F00-tn7FFF (Alex)
3 cycles for short cut

2 cycles for tn7F00-tn7FFF (JJ)
1 cycles for short cut


FORTRANS

« Reply #8 on: August 14, 2013, 09:52:51 PM »
Hi,

   No failure messages so don't know if the final message means
anything.  P-III.

Regards,

Steve


>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60      # passed

30 cycles for tn7F00-tn7FFF (JJ)
30 cycles for short cut - press any key

...

34      # passed

bye

jj2007

« Reply #9 on: August 14, 2013, 10:06:45 PM »
Quote from: FORTRANS
   No failure messages so don't know if the final message means
anything.  P-III.

34      # passed

Since it's less than 60, it means the P-III doesn't know about SSE2 ;-)

sinsi

« Reply #10 on: August 14, 2013, 10:35:50 PM »
Heh, it's good to see a PIII as a sanity check, I have worked on quite a few in the last couple of years (even a PII with NT 3.51).
All used as servers (NT4 Server, 2000 Server) and so entrenched they can't be touched.
/offtopic

Antariy

« Reply #11 on: August 15, 2013, 04:03:04 AM »
Hi Jochen :t

Here are results for your code.

Code:
45 cycles for tn7F00-tn7FFF (Alex)
40 cycles for short cut - press any key

Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60      # passed

32 cycles for tn7F00-tn7FFF (JJ)
32 cycles for short cut - press any key

Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60      # passed


Hi Steve :t

Quote from: FORTRANS
   No failure messages so don't know if the final message means
anything.  P-III.

>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60      # passed


But my SSE1 version passed the test on PIII :biggrin: Thank you for the test on it :t
The tricky thing with that code is that I use a pure SSE FLOAT comparison instruction to "compare" integers :biggrin: - that's why it's interesting to see its behaviour on different machines.
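
To show why comparing integers with a float instruction counts as tricky, here is a small C sketch. It is not Alex's routine, and the thread does not show how his code deals with these cases. It only demonstrates, assuming `float` is a 32-bit IEEE-754 single, two kinds of bit patterns where a float equality test and an integer equality test (what PCMPEQD gives per dword) disagree. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit integer pattern as a float, the way a float
   compare instruction would see the same bits. */
static float bits_as_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f); /* assumes sizeof(float) == 4 */
    return f;
}

/* Equality as a float comparison sees the two patterns. */
int float_view_equal(uint32_t a, uint32_t b) {
    return bits_as_float(a) == bits_as_float(b);
}

/* Equality as an integer comparison sees them. */
int int_view_equal(uint32_t a, uint32_t b) {
    return a == b;
}

void float_view_self_check(void) {
    /* Ordinary patterns: both views agree. */
    assert(float_view_equal(0x3F800000u, 0x3F800000u));
    assert(int_view_equal(0x3F800000u, 0x3F800000u));

    /* +0.0 and -0.0 are equal as floats but differ as integers. */
    assert(float_view_equal(0x00000000u, 0x80000000u));
    assert(!int_view_equal(0x00000000u, 0x80000000u));

    /* A NaN pattern never equals itself as a float, but does as an integer. */
    assert(!float_view_equal(0x7FC00000u, 0x7FC00000u));
    assert(int_view_equal(0x7FC00000u, 0x7FC00000u));
}
```

Any integer compare built on float instructions has to account for signed zeros and NaN patterns in some way, which is presumably why testing on different machines is useful.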

jj2007

« Reply #12 on: August 15, 2013, 04:31:52 AM »
Quote from: Antariy
But my SSE1 version passed the test on PIII :biggrin: Thank you for the test on it :t
The tricky thing with that code is that I use a pure SSE FLOAT comparison instruction to "compare" integers :biggrin: - that's why it's interesting to see its behaviour on different machines.

Yeah, it looks tricky indeed, but it seems to work just fine :t
I'll stick to my algo because MasmBasic doesn't run anyway with less than SSE2, so it won't make a difference. Thanks for all the help, Alex :icon14:

Antariy

« Reply #13 on: August 15, 2013, 11:18:57 PM »
Hi Jochen :t
Quote from: jj2007
Yeah, it looks tricky indeed, but it seems to work just fine :t

I created a thread where I explain why this trick is pretty robust and legal :biggrin:

Quote from: jj2007
I'll stick to my algo because MasmBasic doesn't run anyway with less than SSE2, so it won't make a difference. Thanks for all the help, Alex :icon14:

And your algo is faster :biggrin: :icon14: