[SSE2]Make all bytes positive

Started by Farabi, December 21, 2012, 07:58:19 PM


jj2007

Quote from: dedndave on December 23, 2012, 03:15:58 AM
oh - didn't see the question

That was more a rhetorical question (and your answer is a lil' bit misleading, too - the whole thread is on signed bytes...)

dedndave

anyways, this seems to work ok...

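        movups  xmm0,oword ptr oData    ; load 16 bytes (unaligned)
        xorps   xmm1,xmm1               ; xmm1 = 0
        pcmpgtb xmm1,xmm0               ; xmm1 = FFh in each lane where the byte is negative
        xorps   xmm0,xmm1               ; invert the negative bytes
        psubb   xmm0,xmm1               ; subtract -1 (add 1) to finish the negation
        movups  oword ptr oData,xmm0    ; store the result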
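
For readers who want to check the arithmetic without an assembler, here is a minimal scalar sketch in C of what one byte lane goes through in the sequence above. The function names `abs_lane` and `abs_bytes` are made up for this illustration and are not part of the posted code. It also shows the corner case discussed later in the thread: 80h (-128) comes out unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one byte lane of the pcmpgtb / xor / psubb sequence.
   abs_lane is a hypothetical helper name used only for this sketch. */
static uint8_t abs_lane(uint8_t x)
{
    /* pcmpgtb against zero: all ones when the lane is negative as a signed byte */
    uint8_t mask = (x & 0x80u) ? 0xFFu : 0x00u;
    /* xor inverts negative lanes; subtracting the mask (-1) then adds 1 */
    return (uint8_t)((uint8_t)(x ^ mask) - mask);
}

/* Apply the lane model to a buffer (16 bytes would be one XMM register). */
static void abs_bytes(uint8_t *data, size_t n)
{
    assert(data != NULL);
    for (size_t i = 0; i < n; ++i)
        data[i] = abs_lane(data[i]);
}
```

With this model, FFh (-1) becomes 01h and 81h (-127) becomes 7Fh, while 80h stays 80h, which reads as 128 only if the result is taken as unsigned.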

dedndave

Quote from: jj2007 on December 23, 2012, 03:21:37 AM
and your answer is a lil' bit misleading, too - the whole thread is on signed bytes...

they are no longer signed if you take the absolute value, eh ?
besides - you cannot have +128 in the world of signed bytes
so, you must consider them to be unsigned

jj2007

Yes, but Farabi wants positive bytes. 128 is not a positive byte, so my code converts it to +127...
(did you know that around Christmas people get nervous and stressed and start wars for virtually no reason?  :icon_mrgreen:)
;)
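
jj2007's code is not shown in this part of the thread, so the following is only a guess at the behaviour described here: take the absolute value as before, then clamp the result to 127 so that -128 maps to +127. The name `abs_saturated` is hypothetical. In SSE2 the clamp could presumably be a `pminub` against sixteen bytes of 7Fh.

```c
#include <stdint.h>

/* Absolute value of a signed byte pattern, clamped to +127.
   This is an assumed reading of "converts it to +127", not the posted code. */
static uint8_t abs_saturated(uint8_t x)
{
    uint8_t mask = (x & 0x80u) ? 0xFFu : 0x00u;        /* FFh for negative bytes */
    uint8_t a = (uint8_t)((uint8_t)(x ^ mask) - mask); /* negate negative bytes  */
    return (a > 127u) ? 127u : a;                      /* unsigned minimum with 7Fh */
}
```

Only the input 80h is affected by the clamp; every other byte gives the same result as the plain absolute value.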

dedndave

128 is positive if you regard it as unsigned

range for unsigned bytes: 0 to 255
range for signed bytes: -128 to +127

qWord

MOVUPS, XORPS ... and that with byte data   :eusa_naughty:
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

Quote from: dedndave on December 23, 2012, 03:39:18 AM
128 is positive if you regard it as unsigned

That's actually true! So we can help Farabi with much shorter code, since 129-255 are also positive when regarded as unsigned:
  nop

@qWord: XORPS--Bitwise Logical XOR
... but I would be grateful for a link to some Intel or AMD source that explains in more detail what are the risks of using movups/movaps for integers. Agner Fog's microarchitecture, page 88, offers a fascinating lecture in this respect - see the part on latency & throughput.

dedndave

yah - but the difference is
we do not have -129 to -255 as possible input values
we DO have -128 as a possible input value

this is the nature of two's complement
i know you have been down this road before - lol

jj2007

Quote from: dedndave on December 23, 2012, 05:58:28 AM
we do not have -129 to -255 as possible input values
we DO have -128 as a possible input value

And I thought the whole point of this thread was to turn negative signed bytes into positive signed bytes. Now, is 128 aka 80h a signed positive byte? What does
mov byte ptr [esi], 128
movsx eax, byte ptr [esi]
return?
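
The two instructions above can be modelled in C as a sketch. The helper name `sign_extend8` is invented for this example, and the sign extension is written out arithmetically so that it does not rely on implementation-defined casts.

```c
#include <stdint.h>

/* Sign-extend an 8-bit pattern to 32 bits, which is what movsx does
   with a byte source. sign_extend8 is a hypothetical helper name. */
static int32_t sign_extend8(uint8_t b)
{
    /* Flip the sign bit, then subtract its weight: 80h..FFh map to -128..-1 */
    return (int32_t)(b ^ 0x80) - 0x80;
}
```

For the byte 128 (80h) this gives -128, so eax would hold 0FFFFFF80h: read as a signed byte, the pattern 80h is negative.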

qWord

Quote from: jj2007 on December 23, 2012, 05:42:37 AM
@qWord: XORPS--Bitwise Logical XOR
... but I would be grateful for a link to some Intel or AMD source that explains in more detail what are the risks of using movups/movaps for integers.
Why do you think they introduced different instructions that seem to do the same thing? For fun?
Besides, your own tests have shown several times (sorry, I forgot the topics, but one was about your macros) that your habit of using wrongly typed instructions causes speed issues on recent processors.

jj2007

How boring. Bring evidence.

Quote
The important conclusion here is that there is a penalty in terms of latency to using an XMM
instruction of the wrong type on the Nehalem. On previous Intel processors there is no
penalty for using move and shuffle instructions on other types of operands than they are
intended for. 

The bypass delay is important in long dependency chains where latency is a bottleneck, but 
not where it is throughput rather than latency that matters. In fact, the throughput may
actually be improved by using the integer vector versions of the move and Boolean
instructions

qWord

Quote from: Intel® 64 and IA-32 Architectures Optimization Reference Manual
3.5.1.9 Mixing SIMD Data Types
Previous microarchitectures (before Intel Core microarchitecture) do not have
explicit restrictions on mixing integer and floating-point (FP) operations on XMM
registers. For Intel Core microarchitecture, mixing integer and floating-point
operations on the content of an XMM register can degrade performance. Software should
avoid mixed-use of integer/FP operation on XMM registers. Specifically,

  • Use SIMD integer operations to feed SIMD integer operations. Use PXOR for
    idiom.
  • Use SIMD floating point operations to feed SIMD floating point operations. Use
    XORPS for idiom.
  • When floating point operations are bitwise equivalent, use PS data type instead
    of PD data type. MOVAPS and MOVAPD do the same thing, but MOVAPS takes one
    less byte to encode the instruction.
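
As an illustration of the first recommendation, here is a sketch of the same absolute-value sequence written with SSE2 integer intrinsics in C. The function name `abs16_integer_domain` is hypothetical. Compilers usually map these intrinsics to `movdqu`, `pxor`, `pcmpgtb` and `psubb`, but the exact instruction selection is up to the compiler, so treat the comments as the expected mapping rather than a guarantee. A scalar fallback is included for targets without SSE2.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Absolute value of 16 signed bytes, staying in the integer domain.
   As in the original sequence, -128 (80h) is left unchanged. */
static void abs16_integer_domain(uint8_t data[16])
{
    assert(data != NULL);
#if HAVE_SSE2
    __m128i x    = _mm_loadu_si128((const __m128i *)data); /* expected: movdqu  */
    __m128i zero = _mm_setzero_si128();                     /* expected: pxor    */
    __m128i mask = _mm_cmpgt_epi8(zero, x);                 /* pcmpgtb: 0 > x    */
    x = _mm_xor_si128(x, mask);                             /* pxor              */
    x = _mm_sub_epi8(x, mask);                              /* psubb             */
    _mm_storeu_si128((__m128i *)data, x);                   /* expected: movdqu  */
#else
    /* Portable fallback with the same per-byte result */
    for (int i = 0; i < 16; ++i) {
        uint8_t m = (data[i] & 0x80u) ? 0xFFu : 0x00u;
        data[i] = (uint8_t)((uint8_t)(data[i] ^ m) - m);
    }
#endif
}
```

Whether this version is actually faster than the `movups`/`xorps` one depends on the processor, which is the point under dispute in this thread.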

jj2007

Intel recommendations are one thing, evidence is a different one. The latter is a testbed showing whether using movups instead of movdqu does degrade performance (not "can" degrade performance). Go ahead, set up a testbed, and let's have some fun in the lab :icon14:

Farabi

Thanks for the trouble.

I see that it is difficult to determine whether -1 and 255 are the same or not. So I don't think this should be done after each subtraction, and it cannot be done after all the subtractions have been done. But judging that no subtraction would yield a result of 255, we can assume 255 never occurs and treat it as -1.
http://farabidatacenter.url.ph/MySoftware/
My 3D Game Engine Demo.

Contact me at Whatsapp: 6283818314165

qWord

Quote from: jj2007 on December 23, 2012, 08:32:05 AM
(not "can" degrade performance). Go ahead, set up a testbed, and let's have some fun in the lab :icon14:
that is boring   ;)