[SSE2] Make all bytes positive

jj2007 · December 23, 2012, 03:21:37 AM

Quote from: dedndave on December 23, 2012, 03:15:58 AM
oh - didn't see the question

That was more a rhetorical question (and your answer is a lil' bit misleading, too - the whole thread is on signed bytes...)

dedndave · December 23, 2012, 03:23:55 AM

anyways, this seems to work ok...

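        movups  xmm0,oword ptr oData   ; load 16 signed bytes
        xorps   xmm1,xmm1              ; xmm1 = 0
        pcmpgtb xmm1,xmm0              ; xmm1 = FFh in lanes where 0 > byte (negative), else 00h
        xorps   xmm0,xmm1              ; flip all bits of the negative bytes (one's complement)
        psubb   xmm0,xmm1              ; subtract FFh (-1), i.e. add 1: two's complement negation
        movups  oword ptr oData,xmm0   ; store; 80h (-128) stays 80h, which is 128 read as unsigned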

dedndave · December 23, 2012, 03:26:28 AM

Quote from: jj2007 on December 23, 2012, 03:21:37 AM
and your answer is a lil' bit misleading, too - the whole thread is on signed bytes...

they are no longer signed if you take the absolute value, eh?
besides - you cannot have +128 in the world of signed bytes
so, you must consider them to be unsigned

jj2007 · December 23, 2012, 03:30:21 AM

Yes, but Farabi wants positive bytes. 128 (80h) is not a positive signed byte, so my code converts it to +127...
(did you know that around Christmas people get nervous and stressed and start wars for virtually no reason? :icon_mrgreen:)
;)

dedndave · December 23, 2012, 03:39:18 AM

128 is positive if you regard it as unsigned

range for unsigned bytes: 0 to 255
range for signed bytes: -128 to +127

qWord · December 23, 2012, 05:08:38 AM

MOVUPS, XORPS (single-precision FP instructions)... and that with byte data

jj2007 · December 23, 2012, 05:42:37 AM

Quote from: dedndave on December 23, 2012, 03:39:18 AM
128 is positive if you regard it as unsigned

That's actually true! So we can help Farabi with much shorter code, since 129-255 are also positive when regarded as unsigned:
nop

@qWord: XORPS--Bitwise Logical XOR
... but I would be grateful for a link to some Intel or AMD source that explains in more detail what are the risks of using movups/movaps for integers. Agner Fog's microarchitecture manual, page 88, offers a fascinating lecture in this respect - see the part on latency & throughput.

dedndave · December 23, 2012, 05:58:28 AM

yah - but the difference is
we do not have -129 to -255 as possible input values
we DO have -128 as a possible input value

this is the nature of two's complement
i know you have been down this road before - lol

jj2007 · December 23, 2012, 06:04:11 AM

Quote from: dedndave on December 23, 2012, 05:58:28 AM
we do not have -129 to -255 as possible input values
we DO have -128 as a possible input value

And I thought the whole point of this thread was to turn negative signed bytes into positive signed bytes. Now, is 128 aka 80h a signed positive byte? What does
mov byte ptr [esi], 128
movsx eax, byte ptr [esi]
return?

qWord · December 23, 2012, 07:52:03 AM

Quote from: jj2007 on December 23, 2012, 05:42:37 AM
@qWord: XORPS--Bitwise Logical XOR
... but I would be grateful for a link to some Intel or AMD source that explains in more detail what are the risks of using movups/movaps for integers.

Why do you think they introduced different instructions that seem to do the same thing? For fun?
Your own tests have shown several times (sorry, I forgot the topics, but one was about your macros) that your habit of using wrongly typed instructions causes speed issues on recent processors.

jj2007 · December 23, 2012, 07:59:35 AM

How boring. Bring evidence.

Quote:
The important conclusion here is that there is a penalty in terms of latency to using an XMM instruction of the wrong type on the Nehalem. On previous Intel processors there is no penalty for using move and shuffle instructions on other types of operands than they are intended for.

The bypass delay is important in long dependency chains where latency is a bottleneck, but not where it is throughput rather than latency that matters. In fact, the throughput may actually be improved by using the integer vector versions of the move and Boolean instructions.

qWord · December 23, 2012, 08:16:18 AM

Quote from: Intel® 64 and IA-32 Architectures Optimization Reference Manual, 3.5.1.9 Mixing SIMD Data Types
Previous microarchitectures (before Intel Core microarchitecture) do not have explicit restrictions on mixing integer and floating-point (FP) operations on XMM registers. For Intel Core microarchitecture, mixing integer and floating-point operations on the content of an XMM register can degrade performance. Software should avoid mixed-use of integer/FP operation on XMM registers. Specifically,

- Use SIMD integer operations to feed SIMD integer operations. Use PXOR for idiom.
- Use SIMD floating point operations to feed SIMD floating point operations. Use XORPS for idiom.
- When floating point operations are bitwise equivalent, use PS data type instead of PD data type. MOVAPS and MOVAPD do the same thing, but MOVAPS takes one less byte to encode the instruction.

jj2007 · December 23, 2012, 08:32:05 AM

Intel recommendations are one thing; evidence is another. The latter is a testbed showing whether using movups instead of movdqu does degrade performance (not "can" degrade performance). Go ahead, set up a testbed, and let's have some fun in the lab :icon14:

Farabi · December 23, 2012, 08:42:28 AM

Thanks for the trouble.

I see that it is difficult to determine whether -1 and 255 are the same or not. So I don't think this should be done after the subtraction, and it cannot be done after all the subtractions are done. But judging that no subtraction would yield a result of 255, we can assume 255 never exists and treat it as -1.

qWord · December 23, 2012, 09:37:41 AM

Quote from: jj2007 on December 23, 2012, 08:32:05 AM
... "can" degrade performance). Go ahead, set up a testbed, and let's have some fun in the lab :icon14:

that is boring ;)

The MASM Forum
