[SSE2]Make all bytes positive

dedndave · December 24, 2012, 06:39:45 AM

Quote from: Farabi on December 23, 2012, 08:42:28 AM
Thanks for the trouble.

I see that it is difficult to determine whether -1 and 255 are the same or not. So I don't think this should be done after the subtraction, and it cannot be done after all the subtractions are done. But judging that no subtraction would yield a result of 255, we can assume 255 never exists and treat it as -1.

no, that's not the issue

the issue is: what to do with the value -128
and, are the resulting bytes signed or unsigned

range for unsigned bytes: 0 to 255
range for signed bytes: -128 to +127

so, when you convert -128 to a positive value, it exceeds the range of signed bytes

it boils down to: do you expect all resulting bytes to be representable with 7 bits

the other issue is: which sse mov instruction to use :P
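
For anyone following along, here is a small Python model of the -128 problem. It is only a sketch and not code from this thread; the helper names `neg8` and `to_signed8` are made up for illustration.

```python
def to_signed8(b):
    """Interpret the low 8 bits of b as a signed byte (-128..127)."""
    b &= 0xFF
    return b - 256 if b & 0x80 else b


def neg8(b):
    """Two's-complement negate within 8 bits, like a byte-wide NEG."""
    return (-b) & 0xFF


# Negating -128 gives the bit pattern 80h again: 128 if the result is
# read as unsigned, but still -128 if it is read as signed.
for value in (-1, -127, -128):
    raw = neg8(value & 0xFF)
    print(value, "->", raw, "unsigned,", to_signed8(raw), "signed")
```

So the result is only "positive" for every input if the output bytes are treated as unsigned.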

Farabi · December 24, 2012, 02:40:13 PM

Yes, you're right. I had better convert the bytes to words and then do the subtraction so the result fits in the bits. I just want to subtract a pixel from its neighbour and then check whether the difference is below ten: if it is, then it is the same pixel at a different intensity; if it is not, then it is a different pixel.
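
A rough Python model of the -1 versus 255 ambiguity, and of why widening to words removes it. This is a sketch only; `sub_wrap8` and `sub_wide16` are hypothetical helper names, not anything from the thread.

```python
def sub_wrap8(a, b):
    """Byte subtraction that wraps around, like PSUBB on one byte."""
    return (a - b) & 0xFF


def sub_wide16(a, b):
    """Subtract two unsigned bytes after widening to 16 bits.

    The result is returned as a signed value in -255..255.
    """
    r = (a - b) & 0xFFFF
    return r - 0x10000 if r & 0x8000 else r


# 0 - 1 and 255 - 0 give the same byte, so the sign is lost in 8 bits.
print(sub_wrap8(0, 1), sub_wrap8(255, 0))
# In 16 bits the two cases stay distinct.
print(sub_wide16(0, 1), sub_wide16(255, 0))
```

The difference of two unsigned bytes needs 9 bits, which is why it fits in a word but not in a byte.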

dedndave · December 24, 2012, 02:55:23 PM

then, this is good code - no need for words

Code Select

        movups  xmm0,oword ptr oData
        xorps   xmm1,xmm1
        pcmpgtb xmm1,xmm0
        xorps   xmm0,xmm1
        psubb   xmm0,xmm1
        movups  oword ptr oData,xmm0

i am just not sure if i am using MOVUPS correctly - there may be a better instruction for that
qWord and Jochen were discussing it - then they went to discussing 64-bit moves

i don't understand the outcome - lol

according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that
some testing may be needed

Farabi · December 24, 2012, 03:00:58 PM

Hi

Code Select


        movups  xmm0,oword ptr oData
        xorps   xmm1,xmm1
        pcmpgtb xmm1,xmm0
        xorps   xmm0,xmm1
        psubb   xmm0,xmm1
        movups  oword ptr oData,xmm0

Can you please tell me what this code does? I don't understand the xorps part.

dedndave · December 24, 2012, 03:05:36 PM

it does this, but SSE on 16 bytes at once

Code Select

mov al,n
cbw
xor al,ah
sub al,ah

if n is negative, then CBW makes AH = 0FFh
if n is positive, then CBW makes AH = 0

if n is negative, then XOR inverts all the bits
if n is positive, then XOR does nothing to AL

if n is negative, then SUB AL,AH adds one to AL (subtracts -1)
if n is positive, then SUB AL,AH does nothing (AH = 0)

the idea is this:
one way to negate a value is to invert all the bits, then add 1
(you could also subtract 1, then invert all the bits)
for absolute value, if the initial value is positive, we do not want to negate it
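
The same steps can be modelled in Python to check them against every byte value. This is a sketch under the assumption that the result is read as an unsigned byte; the function names are made up for illustration.

```python
def abs8_branchless(al):
    """Model of MOV AL,n / CBW / XOR AL,AH / SUB AL,AH on one byte."""
    al &= 0xFF
    ah = 0xFF if al & 0x80 else 0x00  # CBW: AH = 0FFh when AL is negative
    al ^= ah                          # XOR AL,AH: invert bits if negative
    al = (al - ah) & 0xFF             # SUB AL,AH: subtract -1, i.e. add 1
    return al


def abs_bytes16(data):
    """Apply the same steps to 16 bytes, as the SSE2 version does at once."""
    assert len(data) == 16
    return bytes(abs8_branchless(b) for b in data)


# Exhaustive check over all signed byte values; -128 comes out as 128.
for n in range(-128, 128):
    assert abs8_branchless(n & 0xFF) == abs(n)

print(list(abs_bytes16(bytes([0xFF, 0x80, 0x7F, 0x01] * 4))))
```

In the SSE2 code, PCMPGTB plays the role of CBW: it builds the 0FFh/0 mask for each of the 16 bytes.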

frktons · December 24, 2012, 03:07:27 PM

Quote
XORPS—Bitwise Logical XOR for Single-Precision Floating-Point Values

Description
------------------------
Performs a bitwise logical exclusive-OR of the four packed single-precision floating-point values from the source
operand (second operand) and the destination operand (first operand), and stores the result in the destination
operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an
XMM register.

Farabi · December 24, 2012, 03:08:20 PM

Quote from: dedndave on December 24, 2012, 03:05:36 PM
it does this, but SSE on 16 bytes at once

Great idea. :U

Farabi · December 24, 2012, 04:32:12 PM

Here is my code so far

Code Select


SSECmpPixel proc uses esi edi pix1:dword,pix2:dword,tol:dword
	LOCAL buff[256]:dword
	
	lea esi,buff
	movd xmm0,pix1
	movd xmm1,pix2
	psubb xmm0,xmm1
	movd [esi],xmm0
	
	test byte ptr[esi],10000000b
	.if !ZERO?
		neg byte ptr[esi]
	.endif
	
	test byte ptr[esi+1],10000000b
	.if !ZERO?
		neg byte ptr[esi+1]
	.endif
	
	test byte ptr[esi+2],10000000b
	.if !ZERO?
		neg byte ptr[esi+2]
	.endif

	test byte ptr[esi+3],10000000b
	.if !ZERO?
		neg byte ptr[esi+3]
	.endif
	
	movd xmm0,[esi]
	movd xmm1,tol
	pcmpgtb xmm0,xmm1
	movd eax,xmm0
	
	
	ret
SSECmpPixel endp

Farabi · December 24, 2012, 04:47:03 PM

This code does the same, but without SSE, and it is faster.

Code Select


CmpPix proc uses esi edi pix1:dword,pix2:dword,tol:dword	; ecx carries a result, so it must not be in USES (it would be restored on ret)
	LOCAL r3,g3,b3,r4,g4,b4:dword
	LOCAL rr,rg,rb:dword
	
	mov eax,pix1
	mov edx,pix2
	
	movzx ecx,al
	mov r3,ecx
	shr eax,8
	movzx ecx,al
	mov g3,ecx
	shr eax,8
	movzx ecx,al
	mov b3,ecx
	
	movzx ecx,dl
	mov r4,ecx
	shr edx,8
	movzx ecx,dl
	mov g4,ecx
	shr edx,8
	movzx ecx,dl
	mov b4,ecx
	
	mov eax,r3
	sub eax,r4
	cmp eax,0
	jg @f
		neg eax
	@@:
	cmp eax,tol
	jle @f
		xor eax,eax
		mov ecx,0
		ret
	@@:
	mov rr,eax

	mov eax,g3
	sub eax,g4
	cmp eax,0
	jg @f
		neg eax
	@@:
	cmp eax,tol
	jle @f
		xor eax,eax
		mov ecx,1
		ret
	@@:
	mov rg,eax
	
	mov eax,b3
	sub eax,b4
	cmp eax,0
	jg @f
		neg eax
	@@:
	cmp eax,tol
	jle @f
		xor eax,eax
		mov ecx,2
		ret
	@@:

	
	xor eax,eax
	inc eax
	mov ecx,3
	
	ret
CmpPix endp

I don't think SSE is that great.

qWord · December 24, 2012, 06:18:54 PM

Quote from: Farabi on December 24, 2012, 04:47:03 PM
I don't think SSE is that great.

because you use the wrong approach.

Code Select

.data
    align 16
	msk1 LABEL OWORD
			db 16 dup (1)
.code

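movdqa xmm0,oword ptr oData ; 16 signed input bytes (oData as in dave's code, must be 16-byte aligned)
pcmpeqb xmm3,xmm3   ; xmm3 = all ones
pxor xmm1,xmm1      ; xmm1 = 0
pcmpgtb xmm1,xmm0   ; xmm1 = 0FFh in every byte where 0 > x (x is negative)
movdqa xmm2,xmm1
pandn xmm2,xmm0     ; xmm2 = the non-negative bytes, 0 elsewhere
pand xmm1,xmm0      ; xmm1 = the negative bytes, 0 elsewhere
pandn xmm1,xmm3     ; xmm1 = NOT xmm1
paddb xmm1,msk1     ; +1: gives -x for negative bytes, 0FFh+1 = 0 for the rest
por xmm1,xmm2       ; merge both halves
; xmm1 = abs(xmm0)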

EDIT: dave's solution is of course much better :t
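
A byte-wise Python model of the select-and-merge sequence above, for checking it against all inputs. It is only a sketch; `abs8_select` is a made-up name and the result is assumed to be read as an unsigned byte.

```python
def abs8_select(x):
    """Model of the pcmpgtb / pandn / pand / paddb / por sequence on one byte."""
    x &= 0xFF
    mask = 0xFF if x & 0x80 else 0x00  # pcmpgtb: 0FFh where 0 > x
    keep = (~mask & 0xFF) & x          # pandn: x where non-negative, else 0
    neg = mask & x                     # pand: x where negative, else 0
    neg = ~neg & 0xFF                  # pandn with all ones: bitwise NOT
    neg = (neg + 1) & 0xFF             # paddb with 1: -x, or 0FFh+1 = 0
    return neg | keep                  # por: merge the two halves


# Exhaustive check over all signed byte values.
for n in range(-128, 128):
    assert abs8_select(n & 0xFF) == abs(n)
```

It gives the same result as the shorter xor/sub version, just with more instructions and an extra constant in memory.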

jj2007 · December 24, 2012, 06:25:34 PM

Quote from: dedndave on December 24, 2012, 02:55:23 PM
according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that
some testing may be needed

Xmas present for you

m2m ecx, 7
LoopAlign ; same for all algos
.Repeat
movups xmm0, OWORD PTR [esi]
?xor? xmm1, xmm1
pcmpgtb xmm1, xmm0
?xor? xmm0, xmm1
psubb xmm0, xmm1
movups OWORD PTR [edi], xmm0
dec ecx
.Until Sign?

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Testing with 10000000 loops
453 ms for psubb with pxor
453 ms for psubb with xorpd
454 ms for psubb with xorps

453 ms for psubb with pxor
454 ms for psubb with xorpd
453 ms for psubb with xorps

453 ms for psubb with pxor
452 ms for psubb with xorpd
454 ms for psubb with xorps

27 bytes for psubb with pxor
27 bytes for psubb with xorpd
25 bytes for psubb with xorps

qWord · December 24, 2012, 06:38:20 PM

Quote from: dedndave on December 24, 2012, 02:55:23 PM
according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that

Intel's optimization manual says: "Use SIMD integer operations to feed SIMD integer operations. Use PXOR for idiom"

dedndave · December 25, 2012, 02:04:56 AM

:t

thanks qWord

Jochen - didn't run the test
i know you don't want to see P4 results - lol
what you want is newer CPU results

hool · December 25, 2012, 03:24:03 AM

should work

Code Select

CmpPix proc uses esi edi pix1:dword,pix2:dword,tol:dword   ; ecx is an output here, so it must not be in USES

        movd    xmm0, pix1
        movd    xmm1, pix2
        movd    xmm5, tol       ; for the sake of simplicity every byte of "tol" has same value

        ; difference between 2 values (not specifically between 0 and a value)
        pxor    xmm4, xmm4
        movdqa  xmm3, xmm1
        psubusb xmm1, xmm0
        pcmpeqb xmm4, xmm1
        pand    xmm0, xmm4
        psubusb xmm0, xmm3
        por     xmm0, xmm1      ; xmm0[7:0], xmm0[15:8], xmm0[23:16] = diff betw colors

        pxor    xmm4, xmm4
        psubusb xmm0, xmm5
        pcmpeqb xmm0, xmm4
        pmovmskb eax, xmm0      ; low 3 bits indicate if color component was different or not

        ; optional
        not      eax
        and      eax, 0FFFFh    ; MASM hex literal (0xffff is not MASM syntax)
        bsf      ecx, eax       ; ecx = 1st color component that was different
        ;jz       all_identical
        
        ret
CmpPix endp
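
A Python sketch of the idea behind this routine, for checking it on paper. It uses the slightly simpler form "saturated a-b OR saturated b-a" rather than the exact instruction sequence above, and it takes `tol` as a single byte value instead of a dword with the tolerance repeated in every byte. All function names are made up for illustration.

```python
def sub_sat_u8(a, b):
    """PSUBUSB on one byte: unsigned subtract, clamped at 0."""
    return max(a - b, 0)


def abs_diff_u8(a, b):
    """Absolute difference of two unsigned bytes.

    One of the two saturated differences is always 0, so OR merges them.
    """
    return sub_sat_u8(a, b) | sub_sat_u8(b, a)


def within_tolerance_mask(pix1, pix2, tol):
    """Bit i is set when byte i of the two pixels differs by at most tol.

    This mirrors the PMOVMSKB result before the optional NOT.
    """
    mask = 0
    for i in range(4):
        a = (pix1 >> (8 * i)) & 0xFF
        b = (pix2 >> (8 * i)) & 0xFF
        if sub_sat_u8(abs_diff_u8(a, b), tol) == 0:
            mask |= 1 << i
    return mask


# Lowest byte differs by 48, the others by 0, 5 and 0.
print(bin(within_tolerance_mask(0x00102030, 0x00152060, 10)))
```

Because the bytes are treated as unsigned and the subtraction saturates, there is no -128 or wrap-around case to worry about here.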

Farabi · December 25, 2012, 12:03:19 PM

Quote from: hool on December 25, 2012, 03:24:03 AM
should work


:t Thanks, great job.
