[SSE2]Make all bytes positive

Started by Farabi, December 21, 2012, 07:58:19 PM

dedndave

Quote from: Farabi on December 23, 2012, 08:42:28 AM
Thanks for the trouble.

I see that it is difficult to determine whether -1 and 255 are the same or not. So I don't think this should be done after the subtraction, and it cannot be done after all the subtractions are done. But judging that no subtraction would yield a result of 255, we can assume 255 never occurs and treat it as -1.

no, that's not the issue

the issue is: what to do with the value -128
and, are the resulting bytes signed or unsigned

range for unsigned bytes: 0 to 255
range for signed bytes: -128 to +127

so, when you convert -128 to a positive value, it exceeds the range of signed bytes

it boils down to: do you expect all resulting bytes to be representable with 7-bits

the other issue is: which sse mov instruction to use   :P
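
To make the -128 problem concrete, here is a minimal C sketch of what a two's-complement negate does to a single byte. The helper names `negate_byte` and `as_signed` are made up for illustration; they model one byte lane and are not part of any code posted in this thread.

```c
#include <stdint.h>

/* Two's-complement negate of one byte, as NEG or a byte-wise subtract from zero would do it. */
static uint8_t negate_byte(uint8_t b)
{
    return (uint8_t)(0u - b);
}

/* Interpret a raw byte as a signed value (-128..127) without implementation-defined casts. */
static int as_signed(uint8_t b)
{
    return b < 128 ? (int)b : (int)b - 256;
}
```

Negating the bit pattern 80h gives 80h again: read as unsigned it is 128, but read as signed it is still -128, which is why the result only makes sense if the output bytes are treated as unsigned.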

Farabi

Quote from: dedndave on December 24, 2012, 06:39:45 AM

Yes, you're right. I had better convert the bytes to words and then do the subtraction so the result fits in the bits. I just want to subtract a pixel from its neighbour and then check whether the result is below ten: if it is, then it is a different pixel; if it is not, then it is the same pixel, just with a different intensity.
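
As a rough illustration of the widening idea, the C sketch below subtracts two unsigned colour bytes as 16-bit values. The function names and the `threshold` parameter are assumptions made for this example only.

```c
#include <stdint.h>

/* Widen two unsigned colour bytes to 16 bits, subtract, and take the magnitude. */
static int16_t abs_diff_wide(uint8_t a, uint8_t b)
{
    int16_t d = (int16_t)((int16_t)a - (int16_t)b);  /* range -255..255 always fits a word */
    return (int16_t)(d < 0 ? -d : d);
}

/* Nonzero when the magnitude of the difference is below the threshold. */
static int below_threshold(uint8_t a, uint8_t b, int16_t threshold)
{
    return abs_diff_wide(a, b) < threshold;
}
```

With words there is no ambiguity between -1 and 255, at the cost of processing only half as many values per register.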
http://farabidatacenter.url.ph/MySoftware/
My 3D Game Engine Demo.

Contact me at Whatsapp: 6283818314165

dedndave

then, this is good code - no need for words

        movups  xmm0,oword ptr oData
        xorps   xmm1,xmm1
        pcmpgtb xmm1,xmm0
        xorps   xmm0,xmm1
        psubb   xmm0,xmm1
        movups  oword ptr oData,xmm0


i am just not sure if i am using MOVUPS correctly - there may be a better instruction for that
qWord and Jochen were discussing it - then they went to discussing 64-bit moves   :dazzled:
i don't understand the outcome - lol

according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that
some testing may be needed

Farabi

Hi

        movups  xmm0,oword ptr oData
        xorps   xmm1,xmm1
        pcmpgtb xmm1,xmm0
        xorps   xmm0,xmm1
        psubb   xmm0,xmm1
        movups  oword ptr oData,xmm0


Can you please tell me what this code does? I don't understand the xorps part.

dedndave

it does this, but SSE on 16 bytes at once
mov al,n
cbw
xor al,ah
sub al,ah


if n is negative, then CBW makes AH = 0FFh
if n is positive, then CBW makes AH = 0

if n is negative, then XOR inverts all the bits
if n is positive, then XOR does nothing to AL

if n is negative, then SUB AL,AH adds one to AL (subtracts -1)
if n is positive, then SUB AL,AH does nothing (AH = 0)

the idea is this:
one way to negate a value is to invert all the bits, then add 1
(you could also subtract 1, then invert all the bits)
for absolute value, if the initial value is positive, we do not want to negate it
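
The same steps can be sketched in C for a single byte lane. This is only a scalar model of the CBW/XOR/SUB idea (and of what the SSE version does to each of its 16 bytes); `abs_byte` and `abs_bytes16` are hypothetical names, not code from the thread.

```c
#include <stdint.h>

/* Scalar model of one byte lane of the mask / XOR / subtract sequence. */
static uint8_t abs_byte(uint8_t b)
{
    uint8_t mask = (b & 0x80) ? 0xFF : 0x00;  /* like AH after CBW: 0FFh if negative, else 0 */
    uint8_t flipped = (uint8_t)(b ^ mask);    /* XOR: inverts all bits only when negative */
    return (uint8_t)(flipped - mask);         /* SUB: subtracting 0FFh (-1) adds one */
}

/* Apply the same step to 16 bytes, which the SSE2 version does in one pass. */
static void abs_bytes16(uint8_t data[16])
{
    for (int i = 0; i < 16; ++i)
        data[i] = abs_byte(data[i]);
}
```

Note that 80h (-128) comes back as 80h, so the result has to be read as unsigned for that one input.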

frktons

Quote
XORPS—Bitwise Logical XOR for Single-Precision Floating-Point Values

Description
------------------------
Performs a bitwise logical exclusive-OR of the four packed single-precision floating-point values from the source
operand (second operand) and the destination operand (first operand), and stores the result in the destination
operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an
XMM register.

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

Farabi

Quote from: dedndave on December 24, 2012, 03:05:36 PM

Great idea. :U

Farabi

#52
Here is my code so far


SSECmpPixel proc uses esi edi pix1:dword,pix2:dword,tol:dword
LOCAL buff[256]:dword

lea esi,buff
movd xmm0,pix1
movd xmm1,pix2
psubb xmm0,xmm1
movd [esi],xmm0

test byte ptr[esi],10000000b
.if !ZERO?
neg byte ptr[esi]
.endif

test byte ptr[esi+1],10000000b
.if !ZERO?
neg byte ptr[esi+1]
.endif

test byte ptr[esi+2],10000000b
.if !ZERO?
neg byte ptr[esi+2]
.endif

test byte ptr[esi+3],10000000b
.if !ZERO?
neg byte ptr[esi+3]
.endif

movd xmm0,[esi]
movd xmm1,tol
pcmpgtb xmm0,xmm1
movd eax,xmm0


ret
SSECmpPixel endp

Farabi

This code does the same, but without SSE, and it is faster.


CmpPix proc uses esi edi ecx  pix1:dword,pix2:dword,tol:dword
LOCAL r3,g3,b3,r4,g4,b4:dword
LOCAL rr,rg,rb:dword

mov eax,pix1
mov edx,pix2

movzx ecx,al
mov r3,ecx
shr eax,8
movzx ecx,al
mov g3,ecx
shr eax,8
movzx ecx,al
mov b3,ecx

movzx ecx,dl
mov r4,ecx
shr edx,8
movzx ecx,dl
mov g4,ecx
shr edx,8
movzx ecx,dl
mov b4,ecx

mov eax,r3
sub eax,r4
cmp eax,0
jg @f
neg eax
@@:
cmp eax,tol
jle @f
xor eax,eax
mov ecx,0
ret
@@:
mov rr,eax

mov eax,g3
sub eax,g4
cmp eax,0
jg @f
neg eax
@@:
cmp eax,tol
jle @f
xor eax,eax
mov ecx,1
ret
@@:
mov rg,eax

mov eax,b3
sub eax,b4
cmp eax,0
jg @f
neg eax
@@:
cmp eax,tol
jle @f
xor eax,eax
mov ecx,2
ret
@@:


xor eax,eax
inc eax
mov ecx,3

ret
CmpPix endp


I don't think SSE is that great.

qWord

Quote from: Farabi on December 24, 2012, 04:47:03 PM
I don't think SSE is that great.

because you use the wrong approach.

.data
    align 16
msk1 LABEL OWORD
db 16 dup (1)
.code

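movdqa xmm0,16 bytes    ; placeholder: load from a 16-byte aligned memory operand
pcmpeqb xmm3,xmm3       ; xmm3 = all ones
pxor xmm1,xmm1          ; xmm1 = 0
pcmpgtb xmm1,xmm0       ; xmm1 = 0FFh in every negative byte lane, else 0
movdqa xmm2,xmm1
pandn xmm2,xmm0         ; xmm2 = the non-negative bytes, 0 elsewhere
pand xmm1,xmm0          ; xmm1 = the negative bytes, 0 elsewhere
pandn xmm1,xmm3         ; xmm1 = NOT xmm1
paddb xmm1,msk1         ; +1 completes the negate; untouched lanes wrap 0FFh -> 0
por xmm1,xmm2           ; merge both halves
; xmm1 = abs(xmm0)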


EDIT: dave's solution is of course much better :t
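
For comparison, the mask-and-merge variant above can be modelled per byte in C roughly as follows. `abs_byte_select` is a hypothetical name; the comments map each line to the instruction it stands in for.

```c
#include <stdint.h>

/* Scalar model of one byte lane of the mask-and-merge sequence. */
static uint8_t abs_byte_select(uint8_t x)
{
    uint8_t mask = (x & 0x80) ? 0xFF : 0x00;  /* PCMPGTB: all ones if negative */
    uint8_t pos  = (uint8_t)(~mask & x);      /* PANDN: keeps the non-negative byte */
    uint8_t neg  = (uint8_t)(mask & x);       /* PAND: keeps the negative byte */
    uint8_t t    = (uint8_t)~neg;             /* PANDN with all ones: bitwise NOT */
    t = (uint8_t)(t + 1);                     /* PADDB msk1: finishes the negate, 0FFh wraps to 0 */
    return (uint8_t)(t | pos);                /* POR: merge both halves */
}
```

It produces the same result as the XOR/subtract form but needs more instructions and an extra constant, which fits the remark that the shorter version is preferable.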
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

Quote from: dedndave on December 24, 2012, 02:55:23 PM
according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that
some testing may be needed

Xmas present for you :biggrin:

  m2m ecx, 7
  LoopAlign        ; same for all algos
.Repeat
        movups xmm0, OWORD PTR [esi]
        ?xor? xmm1, xmm1
        pcmpgtb xmm1, xmm0
        ?xor? xmm0, xmm1
        psubb xmm0, xmm1
        movups OWORD PTR [edi], xmm0
        dec ecx
  .Until Sign?

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
Testing with 10000000 loops
453 ms for psubb with pxor
453 ms for psubb with xorpd
454 ms for psubb with xorps

453 ms for psubb with pxor
454 ms for psubb with xorpd
453 ms for psubb with xorps

453 ms for psubb with pxor
452 ms for psubb with xorpd
454 ms for psubb with xorps

27      bytes for psubb with pxor
27      bytes for psubb with xorpd
25      bytes for psubb with xorps

qWord

Quote from: dedndave on December 24, 2012, 02:55:23 PM
according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that

Intel's optimization manual says: "Use SIMD integer operations to feed SIMD integer operations. Use PXOR for idiom"

dedndave

 :t

thanks qWord

Jochen - didn't run the test
i know you don't want to see P4 results - lol
what you want is newer CPU results

hool

#58
should work

CmpPix proc uses esi edi ecx  pix1:dword,pix2:dword,tol:dword

        movd    xmm0, pix1
        movd    xmm1, pix2
        movd    xmm5, tol       ; for the sake of simplicity every byte of "tol" has same value

        ; difference between 2 values (not specifically between 0 and a value)
        pxor    xmm4, xmm4
        movdqa  xmm3, xmm1
        psubusb xmm1, xmm0
        pcmpeqb xmm4, xmm1
        pand    xmm0, xmm4
        psubusb xmm0, xmm3
        por     xmm0, xmm1      ; xmm0[7:0], xmm0[15:8], xmm0[23:16] = difference between colors

        pxor    xmm4, xmm4
        psubusb xmm0, xmm5
        pcmpeqb xmm0, xmm4
        pmovmskb eax, xmm0      ; low 3 bits indicate if color component was different or not

        ; optional
        not      eax
        and      eax, 0FFFFh
        bsf      ecx, eax       ; ecx = 1st color component that was different
        ;jz       all_identical
       
        ret
CmpPix endp   
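
The saturating-subtract idea can be sketched per byte in C. This is a scalar model under stated assumptions, not a translation of the procedure above: the function names are invented, only the three low colour bytes are checked, and a single tolerance value is used for every channel.

```c
#include <stdint.h>

/* Unsigned saturating subtract for one byte, like one lane of PSUBUSB. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* |a - b| for unsigned bytes: one of the two saturated differences is always 0. */
static uint8_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(sub_sat_u8(a, b) | sub_sat_u8(b, a));
}

/* Index (0..2) of the first colour byte whose difference exceeds tol,
   or -1 if all three are within tolerance. */
static int first_different_channel(uint32_t pix1, uint32_t pix2, uint8_t tol)
{
    for (int ch = 0; ch < 3; ++ch) {
        uint8_t a = (uint8_t)(pix1 >> (8 * ch));
        uint8_t b = (uint8_t)(pix2 >> (8 * ch));
        if (sub_sat_u8(abs_diff_u8(a, b), tol) != 0)  /* nonzero means diff > tol */
            return ch;
    }
    return -1;
}
```

Because everything stays unsigned, there is no -128 corner case and no need to widen to words.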

Farabi

Quote from: hool on December 25, 2012, 03:24:03 AM

:t Thanks, great job.