It's just like a CMP instruction, but for 128 bits instead of 32 :biggrin: So you can use constructs like:
AxCMP128bit thenumber1,thenumber2
jge @Signed_jumpIfTheNumber1IsGreaterThanTheNumber2OrEqualToIt
The only difference is that the code does not (always) set the SF flag; all other conditions behave like the "standard" CMP. The sign can still be checked by testing the highest-order bit of the OWORD, though.
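For readers who want to check the expected outcomes by hand, here is a minimal reference model in C of what a 128-bit compare should report. It is not the macro's implementation, which is not shown here. The type `oword_t`, the helper `oword_make` and the sample values are assumptions made for illustration only. The sign test is the "highest-order bit of the OWORD" check mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an OWORD: d[0] is the lowest DWORD, d[3] the highest. */
typedef struct { uint32_t d[4]; } oword_t;

/* Build an OWORD from four DWORDs, highest first (illustrative helper). */
oword_t oword_make(uint32_t d3, uint32_t d2, uint32_t d1, uint32_t d0) {
    oword_t v = { { d0, d1, d2, d3 } };
    return v;
}

/* Sign test: the highest-order bit of the OWORD. */
int oword_is_negative(oword_t v) {
    return (int)(v.d[3] >> 31);
}

/* Unsigned order, i.e. what JA/JAE/JB/JBE test: returns -1, 0 or +1. */
int oword_cmp_unsigned(oword_t a, oword_t b) {
    for (int i = 3; i >= 0; --i) {
        if (a.d[i] != b.d[i])
            return a.d[i] < b.d[i] ? -1 : 1;
    }
    return 0;
}

/* Signed order, i.e. what JG/JGE/JL/JLE test: returns -1, 0 or +1.
   With different signs the negative value is smaller; with equal signs
   two's complement order matches unsigned order. */
int oword_cmp_signed(oword_t a, oword_t b) {
    int na = oword_is_negative(a), nb = oword_is_negative(b);
    if (na != nb)
        return na ? -1 : 1;
    return oword_cmp_unsigned(a, b);
}

/* Small self-check with assumed sample values (-1 versus a large positive). */
void oword_self_check(void) {
    oword_t minus1 = oword_make(0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu);
    oword_t big    = oword_make(0x7F000000u, 0, 0, 0);
    assert(oword_cmp_unsigned(minus1, big) > 0);  /* JA would be taken */
    assert(oword_cmp_signed(minus1, big) < 0);    /* JL would be taken */
}
```

A mixed case such as -1 versus a large positive value is "above" in the unsigned sense and "less" in the signed sense, so both a JA and a JL test can pass for the same pair of operands.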
65 cycles for tn7F00-tn7FFF
42 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...
The results are exact for signed/unsigned/mixed comparisons.
Since the code is SSE1-capable and uses an interesting, tricky replacement for PCMPEQD (SSE2) (my own, though surely someone on the planet has used it somewhere too), it would be interesting to see how the code works on a wide range of SSE1+ capable hardware, so I'm asking for tests in this new thread.
An update - jump table instead of JECXZ :biggrin:
41 cycles for tn7F00-tn7FFF
29 cycles for short cut
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
Press any key to continue ...
Here you go
4 cycles for tn7F00-tn7FFF
2 cycles for short cut
Everything passed.
12 cycles for tn7F00-tn7FFF
8 cycles for short cut
They all pass
p4 prescott w/htt
70 cycles for tn7F00-tn7FFF
58 cycles for short cut
everything passed :t
Thank you all! :biggrin:
Hi Alex,
I have now solved my problems with the flags. The new version produces exactly the same number of passes as yours (60). The timings looked a bit inconsistent, so I added an invoke Sleep, 50.
35 cycles for tn7F00-tn7FFF (Alex)
33 cycles for short cut - press any key
23 cycles for tn7F00-tn7FFF (JJ)
23 cycles for short cut - press any key
Modified source & exe attached.
i3-3100M
15 cycles for tn7F00-tn7FFF (Alex)
6 cycles for short cut
1 cycles for tn7F00-tn7FFF (JJ)
4 cycles for short cut
i7-3770K
5 cycles for tn7F00-tn7FFF (Alex)
3 cycles for short cut
2 cycles for tn7F00-tn7FFF (JJ)
1 cycles for short cut
Hi,
No failure messages so don't know if the final message means
anything. P-III.
Regards,
Steve
>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60 # passed
30 cycles for tn7F00-tn7FFF (JJ)
30 cycles for short cut - press any key
...
34 # passed
bye
Quote from: FORTRANS on August 14, 2013, 09:52:51 PM
No failure messages so don't know if the final message means
anything. P-III.
34 # passed
Since it's less than 60, it means the P-III doesn't know about SSE2 ;-)
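For anyone who wants to gate the SSE2 path explicitly, the feature bits come from CPUID leaf 1, register EDX: bit 25 is SSE and bit 26 is SSE2. Below is a minimal sketch in C of the bit test only. The function names are made up for illustration, and obtaining the EDX value (the CPUID instruction itself) is left out, so the values in the self-check are synthetic and not measured on any machine from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits in EDX after CPUID with EAX = 1. */
#define CPUID1_EDX_SSE  (1u << 25)
#define CPUID1_EDX_SSE2 (1u << 26)

/* Hypothetical helpers: decode an EDX value that the caller already
   obtained from CPUID leaf 1. */
int has_sse(uint32_t edx)  { return (edx & CPUID1_EDX_SSE)  != 0; }
int has_sse2(uint32_t edx) { return (edx & CPUID1_EDX_SSE2) != 0; }

/* Self-check with synthetic EDX values: one "SSE only" pattern and one
   "SSE + SSE2" pattern. */
void cpu_feature_self_check(void) {
    uint32_t sse_only = CPUID1_EDX_SSE;
    uint32_t sse_both = CPUID1_EDX_SSE | CPUID1_EDX_SSE2;
    assert(has_sse(sse_only) && !has_sse2(sse_only));
    assert(has_sse(sse_both) && has_sse2(sse_both));
}
```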
Heh, it's good to see a PIII as a sanity check, I have worked on quite a few in the last couple of years (even a PII with NT 3.51).
All used as servers (NT4 Server, 2000 Server) and so entrenched they can't be touched.
/offtopic
Hi
Jochen :t
Here are results for your code.
45 cycles for tn7F00-tn7FFF (Alex)
40 cycles for short cut - press any key
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60 # passed
32 cycles for tn7F00-tn7FFF (JJ)
32 cycles for short cut - press any key
Compare tn7F00 with tn7FFF
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tn7F00
JAE passed
JA passed
JGE passed
JG passed
Compare tn7FFF with tn0
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn7F00
JBE passed
JB passed
JLE passed
JL passed
Compare tn7FFF with tnN1
JGE passed
JG passed
JBE passed
JB passed
Compare tnN1 with tn7F00
JAE passed
JA passed
JLE passed
JL passed
Compare tnN1 with tnN1
JZ passed
Compare tnN1 with tnN2
JAE passed
JA passed
JGE passed
JG passed
Compare tnN2 with tnN1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tn0
JAE passed
JA passed
JLE passed
JL passed
Compare tn0 with tnx1y1
JGE passed
JG passed
JBE passed
JB passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y1
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y2 with tnNx2y1
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx2y1 with tnNx1y2
JBE passed
JB passed
JLE passed
JL passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tnNx1y2 with tnNx1y2
JZ passed
Compare tnNx1y1 with tnNx1y2
JAE passed
JA passed
JGE passed
JG passed
Compare tn0 with tn0
JZ passed
Compare tn7F00 with tn7F00
JZ passed
60 # passed
Hi
Steve :t
Quote from: FORTRANS on August 14, 2013, 09:52:51 PM
No failure messages so don't know if the final message means
anything. P-III.
>axjj_cmp
40 cycles for tn7F00-tn7FFF (Alex)
28 cycles for short cut - press any key
...
60 # passed
But my SSE1 version passed the test on PIII :biggrin: Thank you for the test on it :t
The tricky thing with that code is that I use a pure SSE FLOAT comparison instruction to "compare" integers :biggrin: - that's why it's interesting to see its behaviour on different machines.
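To see why comparing integer bit patterns with a float compare needs care, here is a small sketch in C. It does not reproduce the code from this thread. It only models one 32-bit lane with a scalar C compare, which follows the same IEEE 754 equality rules as a packed float equality compare, and shows the two well-known bit patterns where float equality and integer equality disagree. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit integer pattern as an IEEE 754 single,
   the way one lane of an XMM register would be seen by a float compare. */
float bits_as_float(uint32_t bits) {
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Equality as a float compare sees it (models one lane of a float "equal" compare). */
int float_lane_equal(uint32_t a, uint32_t b) {
    return bits_as_float(a) == bits_as_float(b);
}

/* Equality as an integer compare sees it (models one lane of PCMPEQD). */
int int_lane_equal(uint32_t a, uint32_t b) {
    return a == b;
}
```

With these helpers, `float_lane_equal(0x7FC00000, 0x7FC00000)` is 0 because that pattern is a NaN and a NaN never compares equal to itself, while `float_lane_equal(0x80000000, 0x00000000)` is 1 because -0.0 equals +0.0. An integer compare gives the opposite answer in both cases, so any float-based replacement has to handle or avoid such patterns.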
Quote from: Antariy on August 15, 2013, 04:03:04 AM
But my SSE1 version passed the test on PIII :biggrin: Thank you for the test on it :t
The tricky thing with that code is that I use a pure SSE FLOAT comparison instruction to "compare" integers :biggrin: - that's why it's interesting to see its behaviour on different machines.
Yeah, it looks tricky indeed, but it seems to work just fine :t
I'll stick to my algo because MasmBasic doesn't run anyway with less than SSE2, so it won't make a difference. Thanks for all the help, Alex :icon14:
Hi
Jochen :t
Quote from: jj2007 on August 15, 2013, 04:31:52 AM
Yeah, it looks tricky indeed, but it seems to work just fine :t
I created a thread (http://masm32.com/board/index.php?topic=2242.msg23112#msg23112) where I explain why this trick is pretty robust and legal :biggrin:
Quote from: jj2007 on August 15, 2013, 04:31:52 AM
I'll stick to my algo because MasmBasic doesn't run anyway with less than SSE2, so it won't make a difference. Thanks for all the help, Alex :icon14:
And your algo is faster :biggrin: :icon14: