I'm subtracting 16 8-bit integers and want all the results to be positive. How do I do that?
Something like this?
.code
whatever db 123, -127, 99, 255, 128, 127, 0, 100, 123, -127, 99, 255, 128, 127, 0, 100
start:
mov eax, 7f7f7f7fh
movd xmm1, eax
pshufd xmm1, xmm1, 0
movups xmm0, oword ptr whatever
andps xmm0, xmm1
Quote from: jj2007 on December 21, 2012, 08:26:06 PM
Something like this?
.code
whatever db 123, -127, 99, 255, 128, 127, 0, 100, 123, -127, 99, 255, 128, 127, 0, 100
start:
mov eax, 7f7f7f7fh
movd xmm1, eax
pshufd xmm1, xmm1, 0
movups xmm0, oword ptr whatever
pandn xmm0, xmm1
:t Well, I don't get how it works yet, but yes, that is what I want. I thought there was a single instruction for it, but this should be sufficient, thanks.
It seems andps is the one to choose, not pandn - see corrected code above.
Can I know what this code does?
pshufd xmm1, xmm1, 0
It doesn't seem to do anything.
Farabi, do you want to have the absolute value of the difference? (abs(b-a))
Quote from: Farabi on December 21, 2012, 08:52:09 PM
Can I know what this code does?
pshufd xmm1, xmm1, 0
It doesn't seem to do anything.
It propagates the lowest dword to the other three dwords of the xmm reg, so that xmm1 contains
7F7F7F7F 7F7F7F7F 7F7F7F7F 7F7F7F7F (all 16 bytes are 7Fh)
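To see the broadcast outside a debugger, here is a small Python model of what pshufd does with an immediate of 0. It is only a sketch: the function name pshufd and the list-of-four-dwords representation (index 0 = lowest dword) are assumptions made for illustration, not anything from the thread.

```python
def pshufd(src_dwords, imm8):
    """Model of PSHUFD: destination dword i takes the source dword
    selected by bits [2*i+1 : 2*i] of the immediate (index 0 = lowest)."""
    return [src_dwords[(imm8 >> (2 * i)) & 3] for i in range(4)]

# movd xmm1, eax: low dword = eax, the upper three dwords are zeroed
xmm1 = [0x7F7F7F7F, 0, 0, 0]

# pshufd xmm1, xmm1, 0: every selector is 0, so the low dword is copied everywhere
xmm1 = pshufd(xmm1, 0)
print([hex(d) for d in xmm1])
```

With any other immediate the same function reorders the dwords, for example 1Bh reverses them.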
Quote from: qWord on December 21, 2012, 09:07:03 PM
Farabi, do you want to have the absolute value of the difference? (abs(b-a))
I think integer will always be absolute.
Quote from: Farabi on December 22, 2012, 03:48:19 AM
Quote from: qWord on December 21, 2012, 09:07:03 PM
Farabi, do you want to have the absolute value of the difference? (abs(b-a))
I think integer will always be absolute.
That doesn't answer the question, Onan. Can you post a real example what you want to see?
Quote from: jj2007 on December 22, 2012, 04:08:20 AM
Quote from: Farabi on December 22, 2012, 03:48:19 AM
Quote from: qWord on December 21, 2012, 09:07:03 PM
Farabi, do you want to have the absolute value of the difference? (abs(b-a))
I think integer will always be absolute.
That doesn't answer the question, Onan. Can you post a real example what you want to see?
Sorry, maybe I misunderstood the term "absolute".
I need to subtract a pixel and check whether the result is less than ten, and to check that I need all results to be positive; if a result is negative, I need another cycle to check whether it is negative or not.
mov edx,bm.bmBits
movd xmm1,[edx]
pxor xmm0,xmm0
psubb xmm0,xmm1
lea edx,buff
movd [edx],xmm0
You want me to post the whole project?
I cant use "pshufd" why?
perhaps it's the addressing mode ?
it's SSE2 - i figure your CPU does that
you must be using .686/.MMX/.XMM or the other instructions wouldn't work :P
Quote from: dedndave on December 22, 2012, 01:06:53 PM
perhaps it's the addressing mode ?
it's SSE2 - i figure your CPU does that
you must be using .686/.MMX/.XMM or the other instructions wouldn't work :P
I can use "psubb" which is SSE2 instruction but not "pshufd" why?
Quote from: Farabi on December 22, 2012, 01:21:41 PM
Quote from: dedndave on December 22, 2012, 01:06:53 PM
perhaps it's the addressing mode ?
it's SSE2 - i figure your CPU does that
you must be using .686/.MMX/.XMM or the other instructions wouldn't work :P
I can use "psubb" which is SSE2 instruction but not "pshufd" why?
error message?
Also note the instruction pcmpGTb (and pcmpEQb)
Quote from: qWord on December 22, 2012, 01:32:43 PM
Quote from: Farabi on December 22, 2012, 01:21:41 PM
Quote from: dedndave on December 22, 2012, 01:06:53 PM
perhaps it's the addressing mode ?
it's SSE2 - i figure your CPU does that
you must be using .686/.MMX/.XMM or the other instructions wouldn't work :P
I can use "psubb" which is SSE2 instruction but not "pshufd" why?
error message?
Also remarks the instruction pcmpGTb (and pcmpEQb)
Here is the error message "error A2008: syntax error : xmm"
I Used JWAsm and it worked.
Quote from: jj2007 on December 21, 2012, 08:26:06 PM
Something like this?
.code
whatever db 123, -127, 99, 255, 128, 127, 0, 100, 123, -127, 99, 255, 128, 127, 0, 100
start:
mov eax, 7f7f7f7fh
movd xmm1, eax
pshufd xmm1, xmm1, 0
movups xmm0, oword ptr whatever
andps xmm0, xmm1
JJ I think you just remove the positive sign bit, not change the value to a correct positive value.
maybe you can make an SSE equiv based on this concept
mov eax,n
cdq
xor eax,edx
sub eax,edx
the byte version would be
mov al,n
cbw
xor al,ah
sub al,ah
Quote from: Farabi on December 22, 2012, 03:00:56 PM
JJ I think you just remove the positive sign bit,
Yes
Quote
not change the value to a correct positive value.
Until this point, you have not explained what the "correct positive value" would be. If your negative input byte is -123, is the "correct positive value" +123, or zero, or what? GIVE US A RULE.
We need more context. For example, how often will the negative value happen? You can detect it with a member of the pcmpGTb family (as mentioned by qWord), and then manipulate the bytes accordingly, either with "normal" or SSE instructions.
i think what he's saying is...
if you remove the sign bit from a positive number, no adjustment is necessary (nothing happens)
if you remove the sign bit from a negative number, it needs adjustment
11111111 is = -1
01111111 is not = +1
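A quick way to check that point is to compare "clear the sign bit" with a real absolute value on a few bytes. The Python below is just a model with made-up helper names (to_signed, clear_sign_bit, abs_byte); bytes are held as integers 0..255.

```python
def to_signed(b):
    """Interpret a byte value 0..255 as a signed 8-bit number."""
    return b - 256 if b >= 128 else b

def clear_sign_bit(b):
    """What AND with 7Fh does to one byte."""
    return b & 0x7F

def abs_byte(b):
    """Two's complement absolute value of one byte, kept in 8 bits."""
    return (-to_signed(b)) & 0xFF if b >= 128 else b

for v in (1, -1, 123, -123):
    b = v & 0xFF
    print(v, format(b, "08b"), clear_sign_bit(b), abs_byte(b))
```

For positive inputs both columns agree; for -1 the masked value is 127 and for -123 it is 5, so the mask alone does not give the magnitude.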
Dave, you can't remove the sign bit from a positive number :eusa_naughty:
Jokes apart, let's wait if Farabi can formulate a rule...
I want -123 to become 123, and -1 to become 1. That is it.
I can simply use neg, but it still needs another cycle to check each byte for a negative value. I thought there was a single instruction for this.
Ok, now I understand. My idea was to go this road:
.data
whatever db 123, -127, 99, 255, 128, 127, 0, 100, 123, -127, 99, 255, 128, 127, 0, 100
.code
movups xmm0, oword ptr whatever
movups xmm1, oword ptr whatever
mov eax, 7f7f7f7fh
movd xmm2, eax
pshufd xmm2, xmm2, 0
int 3
pcmpgtb xmm1, xmm2
pmovmskb eax, xmm1 ; set byte mask in eax
Status after pcmpgtb:
XMM0 64007F80 FF63817B 64007F80 FF63817B
XMM1 00000000 00000000 00000000 00000000
XMM2 7F7F7F7F 7F7F7F7F 7F7F7F7F 7F7F7F7F
Bad luck, I expected some bytes in Xmm1 set to FF. Right now I have no time to investigate further. Launch Olly and try your luck :icon14:
MMX and SSE are not exactly my forte'
but - i wouldn't mind getting my feet wet :P
not sure what the best way is to set an XMM register to all 0's
but, let's say XMM1 is all 0's
pcmpgtb xmm1,oword ptr whatever
this does the CBW for us (equivalent in this case - doesn't actually make them words)
for each byte in "whatever", the corresponding byte is XMM1 is all 1's if the source byte is negative
from there, you can do the same thing as i showed you earlier
you get the "whatever" bytes into XMM2 (again, not sure what the best way is)
then XOR xmm2,xmm1 (not sure what the instruction is)
then SUB (bytes) xmm2,xmm1 (not sure what the instruction is)
mov al,n
cbw
xor al,ah
sub al,ah
I got it!
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
.data
whatever db 123, 99, 127, -127, 255, 128, 0, 100, 123, -127, 255, 128, 99, 127, 0, 100
Init
movups xmm0, oword ptr whatever
movups xmm1, oword ptr whatever
or eax, -1
movd xmm2, eax
pshufd xmm2, xmm2, 0
pcmpgtb xmm1, xmm2
pmovmskb eax, xmm1 ; set byte mask in eax
Inkey Right$(Bin$(eax), 16)
Exit
end start
Output (read from right to left):
1111000111000111
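For anyone who wants to check that bit string without running the exe, here is a Python model of the pcmpgtb / pmovmskb pair on the same 16 bytes. The function names simply mirror the instructions and are not a real library; lane 0 is the first byte in memory.

```python
def to_signed(b):
    """Interpret a byte value 0..255 as a signed 8-bit number."""
    return b - 256 if b >= 128 else b

def pcmpgtb(dst, src):
    """Per byte: 0FFh where dst > src (signed compare), else 0."""
    return [0xFF if to_signed(d) > to_signed(s) else 0x00 for d, s in zip(dst, src)]

def pmovmskb(x):
    """Collect the top bit of each byte; byte 0 ends up in bit 0."""
    return sum(((b >> 7) & 1) << i for i, b in enumerate(x))

whatever = [v & 0xFF for v in (123, 99, 127, -127, 255, 128, 0, 100,
                               123, -127, 255, 128, 99, 127, 0, 100)]
minus_one = [0xFF] * 16          # or eax, -1 / movd / pshufd

mask = pmovmskb(pcmpgtb(whatever, minus_one))
print(format(mask, "016b"))      # prints 1111000111000111
```

A set bit marks a byte that is greater than -1, i.e. not negative, which is why the string has to be read from right to left against the data.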
i was close :P
my first whack at SSE
i don't know why this doesn't do anything - lol
;###############################################################################################
.XCRef
.NoList
INCLUDE \Masm32\Include\Masm32rt.inc
.686p
.MMX
.XMM
INCLUDE \Masm32\Macros\Timers.asm
.List
;###############################################################################################
.DATA
oData db 123,99,127,-127,-1,-128,0,100,123,-127,-1,-128,99,127,0,100
;###############################################################################################
.CODE
;***********************************************************************************************
_main PROC
call show16s
movups xmm0,oword ptr oData
xorps xmm1,xmm1
pcmpgtb xmm1,xmm0
xorps xmm0,xmm1
psubb xmm0,xmm1
movups oword ptr oData,xmm0
call show16u
inkey
exit
_main ENDP
;***********************************************************************************************
show16s PROC
mov esi,offset oData
mov ebx,16
sh16s0: movsx eax,byte ptr [esi]
print str$(eax),44,32
inc esi
dec ebx
jnz sh16s0
print chr$(13,10)
ret
show16s ENDP
;***********************************************************************************************
show16u PROC
mov esi,offset oData
mov ebx,16
sh16u0: movzx eax,byte ptr [esi]
print ustr$(eax),44,32
inc esi
dec ebx
jnz sh16u0
print chr$(13,10)
ret
show16u ENDP
;###############################################################################################
END _main
Try my version :biggrin:
-123 +99 +127 -127 -1 -128 +0 +100 +123 -127 -1 -128 +99 +127 +0 -100
+123 +99 +127 +127 +1 +127 +0 +100 +123 +127 +1 +127 +99 +127 +0 +100
Besides, your code seems to produce exactly the same result :t
123, 99, 127, -127, -1, -128, 0, 100, 123, -127, -1, -128, 99, 127, 0, 100,
123, 99, 127, 127, 1, 128, 0, 100, 123, 127, 1, 128, 99, 127, 0, 100,
Well, almost. Is byte 128 positive?
;-)
oops
i was using ML version 6.14
the PCMPGTB and PSUBB instructions were using MMX registers, not XMM registers :biggrin:
that's my first SSE code :eusa_dance:
what ? - you haven't compared timing ?
is that someone else using Jochen's ID ????? :lol:
oh - didn't see the question
the bit pattern 10000000 is -128 as a signed byte, +128 as an unsigned byte
it is a special case (like 0) because, to negate it, you do nothing :P
of course, +128 is not in the range of signed byte values
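The special case is easy to see in a tiny Python model of 8-bit neg and of movsx (neg8 and movsx8 are names invented for this sketch):

```python
def neg8(b):
    """Two's complement negation of one byte, truncated to 8 bits."""
    return (-b) & 0xFF

def movsx8(b):
    """Sign-extend a byte the way movsx eax, byte ptr [...] does."""
    return b - 256 if b & 0x80 else b

b = 0x80
print(hex(neg8(b)))      # the bit pattern is unchanged by neg
print(movsx8(neg8(b)))   # read as signed it is still -128
print(neg8(b))           # read as unsigned it is 128
```

So after any "negate if negative" step the byte 80h stays 80h, and whether that counts as +128 or -128 depends only on how the result is read afterwards.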
Quote from: dedndave on December 23, 2012, 03:15:58 AM
oh - didn't see the question
That was more a rhetorical question (and your answer is a lil' bit misleading, too - the whole thread is on signed bytes...)
anyways, this seems to work ok...
movups xmm0,oword ptr oData
xorps xmm1,xmm1
pcmpgtb xmm1,xmm0
xorps xmm0,xmm1
psubb xmm0,xmm1
movups oword ptr oData,xmm0
Quote from: jj2007 on December 23, 2012, 03:21:37 AM
and your answer is a lil' bit misleading, too - the whole thread is on signed bytes...
they are no longer signed if you take the absolute value, eh ?
besides - you cannot have +128 in the world of signed bytes
so, you must consider them to be unsigned
Yes, but Farabi wants positive bytes. 128 is not a positive byte, so my code converts it to +127...
(did you know that around Christmas people get nervous and stressed and start wars for virtually no reason? :icon_mrgreen:)
;)
128 is positive if you regard it as unsigned
range for unsigned bytes: 0 to 255
range for signed bytes: -128 to +127
MOVUPS, XORPS ... and that with byte data :eusa_naughty:
Quote from: dedndave on December 23, 2012, 03:39:18 AM
128 is positive if you regard it as unsigned
That's actually true! So we can help Farabi with much shorter code, since 129-255 are also positive when regarded as unsigned:
nop
@qWord: XORPS--Bitwise Logical XOR
... but I would be grateful for a link to some Intel or AMD source that explains in more detail what are the risks of using movups/movaps for integers. Agner Fog's microarchitecture (http://www.agner.org/optimize/microarchitecture.pdf), page 88, offers a fascinating lecture in this respect - see the part on latency & throughput.
yah - but the difference is
we do not have -129 to -255 as possible input values
we DO have -128 as a possible input value
this is the nature of two's complement
i know you have been down this road before - lol
Quote from: dedndave on December 23, 2012, 05:58:28 AM
we do not have -129 to -255 as possible input values
we DO have -128 as a possible input value
And I thought the whole point of this thread was to turn negative signed bytes into positive signed bytes. Now, is 128 aka 80h a signed positive byte? What does
mov byte ptr [esi], 128
movsx eax, byte ptr [esi]
return?
Quote from: jj2007 on December 23, 2012, 05:42:37 AM
@qWord: XORPS--Bitwise Logical XOR
... but I would be grateful for a link to some Intel or AMD source that explains in more detail what are the risks of using movups/movaps for integers.
Why do you think they introduced different instructions that seem to do the same? For fun?
Besides, your own tests have shown several times (sorry, I forgot the topics, but one was about your macros) that your habit of using wrongly typed instructions causes speed issues on recent processors.
How boring. Bring evidence.
Quote
The important conclusion here is that there is a penalty in terms of latency to using an XMM
instruction of the wrong type on the Nehalem. On previous Intel processors there is no
penalty for using move and shuffle instructions on other types of operands than they are
intended for.
The bypass delay is important in long dependency chains where latency is a bottleneck, but
not where it is throughput rather than latency that matters. In fact, the throughput may
actually be improved by using the integer vector versions of the move and Boolean
instructions
Quote from: Intel® 64 and IA-32 Architectures Optimization Reference Manual
3.5.1.9 Mixing SIMD Data Types
Previous microarchitectures (before Intel Core microarchitecture) do not have
explicit restrictions on mixing integer and floating-point (FP) operations on XMM
registers. For Intel Core microarchitecture, mixing integer and floating-point operations
on the content of an XMM register can degrade performance. Software should
avoid mixed-use of integer/FP operation on XMM registers. Specifically,
- Use SIMD integer operations to feed SIMD integer operations. Use PXOR for
idiom.
- Use SIMD floating point operations to feed SIMD floating point operations. Use
XORPS for idiom.
- When floating point operations are bitwise equivalent, use PS data type instead
of PD data type. MOVAPS and MOVAPD do the same thing, but MOVAPS takes one
less byte to encode the instruction.
Intel recommendations are one thing, evidence is a different one. The latter is a testbed showing whether using movups instead of movdqu does degrade performance (not "can" degrade performance). Go ahead, set up a testbed, and let's have some fun in the lab :icon14:
Thanks for the trouble.
I see that it is difficult to determine whether -1 and 255 are the same or not. So I don't think this should be done after the subtraction, and it cannot be done after all the subtractions are done. But judging that no subtraction would yield a result of 255, we can assume 255 never occurs and treat it as -1.
Quote from: jj2007 on December 23, 2012, 08:32:05 AM
(not "can" degrade performance). Go ahead, set up a testbed, and let's have some fun in the lab :icon14:
that is boring ;)
Quote from: Farabi on December 23, 2012, 08:42:28 AM
Thanks for the trouble.
I see that it is difficult to determine whether -1 and 255 are the same or not. So I don't think this should be done after the subtraction, and it cannot be done after all the subtractions are done. But judging that no subtraction would yield a result of 255, we can assume 255 never occurs and treat it as -1.
no, that's not the issue
the issue is: what to do with the value -128
and, are the resulting bytes signed or unsigned
range for unsigned bytes: 0 to 255
range for signed bytes: -128 to +127
so, when you convert -128 to a positive value, it exceeds the range of signed bytes
it boils down to: do you expect all resulting bytes to be representable with 7-bits
the other issue is: which sse mov instruction to use :P
Quote from: dedndave on December 24, 2012, 06:39:45 AM
Quote from: Farabi on December 23, 2012, 08:42:28 AM
Thanks for the trouble.
I see that it is difficult to determine whether -1 and 255 are the same or not. So I don't think this should be done after the subtraction, and it cannot be done after all the subtractions are done. But judging that no subtraction would yield a result of 255, we can assume 255 never occurs and treat it as -1.
no, that's not the issue
the issue is: what to do with the value -128
and, are the resulting bytes signed or unsigned
range for unsigned bytes: 0 to 255
range for signed bytes: -128 to +127
so, when you convert -128 to a positive value, it exceeds the range of signed bytes
it boils down to: do you expect all resulting bytes to be representable with 7-bits
the other issue is: which sse mov instruction to use :P
Yes, you're right, I had better convert the bytes to words and then do the subtraction so the result fits in the bits. I just want to subtract a pixel from its neighbour and then check whether the result is below ten; if it is, then it is a different pixel, if it is not, then it is the same pixel with just a different intensity.
then, this is good code - no need for words
movups xmm0,oword ptr oData
xorps xmm1,xmm1
pcmpgtb xmm1,xmm0
xorps xmm0,xmm1
psubb xmm0,xmm1
movups oword ptr oData,xmm0
i am just not sure if i am using MOVUPS correctly - there may be a better instruction for that
qWord and Jochen were discussing it - then they went to discussing 64-bit moves :dazzled:
i don't understand the outcome - lol
according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that
some testing may be needed
Hi
movups xmm0,oword ptr oData
xorps xmm1,xmm1
pcmpgtb xmm1,xmm0
xorps xmm0,xmm1
psubb xmm0,xmm1
movups oword ptr oData,xmm0
Can you please tell me what this code does? I don't understand the xorps part.
it does this, but SSE on 16 bytes at once
mov al,n
cbw
xor al,ah
sub al,ah
if n is negative, then CBW makes AH = 0FFh
if n is positive, then CBW makes AH = 0
if n is negative, then XOR inverts all the bits
if n is positive, then XOR does nothing to AL
if n is negative, then SUB AL,AH adds one to AL (subtracts -1)
if n is positive, then SUB AL,AH does nothing (AH = 0)
the idea is this:
one way to negate a value is to invert all the bits, then add 1
(you could also subtract 1, then invert all the bits)
for absolute value, if the initial value is positive, we do not want to negate it
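The same steps, written out as a lane-by-lane Python model of the six-instruction SSE sequence on the oData bytes from earlier in the thread. This is a sketch for checking the logic only; abs_bytes and to_signed are names made up here.

```python
def to_signed(b):
    """Interpret a byte value 0..255 as a signed 8-bit number."""
    return b - 256 if b >= 128 else b

def abs_bytes(data):
    """Per byte: mask = (0 > x) ? 0FFh : 0, then (x XOR mask) - mask."""
    out = []
    for b in data:
        mask = 0xFF if 0 > to_signed(b) else 0x00   # pcmpgtb xmm1, xmm0 with xmm1 = 0
        flipped = b ^ mask                          # xorps / pxor xmm0, xmm1
        out.append((flipped - mask) & 0xFF)         # psubb xmm0, xmm1 (wraps at 8 bits)
    return out

oData = [v & 0xFF for v in (123, 99, 127, -127, -1, -128, 0, 100,
                            123, -127, -1, -128, 99, 127, 0, 100)]
print(abs_bytes(oData))
# prints [123, 99, 127, 127, 1, 128, 0, 100, 123, 127, 1, 128, 99, 127, 0, 100]
```

The mask plays the role of AH after CBW: for non-negative bytes it is 0 and both steps do nothing, for negative bytes it is 0FFh, so the XOR inverts and the subtraction adds one.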
Quote
XORPS—Bitwise Logical XOR for Single-Precision Floating-Point Values
Description
------------------------
Performs a bitwise logical exclusive-OR of the four packed single-precision floating-point values from the source
operand (second operand) and the destination operand (first operand), and stores the result in the destination
operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an
XMM register.
Quote from: dedndave on December 24, 2012, 03:05:36 PM
it does this, but SSE on 16 bytes at once
Great idea. :U
Here is my code so far
SSECmpPixel proc uses esi edi pix1:dword,pix2:dword,tol:dword
LOCAL buff[256]:dword
lea esi,buff
movd xmm0,pix1
movd xmm1,pix2
psubb xmm0,xmm1
movd [esi],xmm0
test byte ptr[esi],10000000b
.if !ZERO?
neg byte ptr[esi]
.endif
test byte ptr[esi+1],10000000b
.if !ZERO?
neg byte ptr[esi+1]
.endif
test byte ptr[esi+2],10000000b
.if !ZERO?
neg byte ptr[esi+2]
.endif
test byte ptr[esi+3],10000000b
.if !ZERO?
neg byte ptr[esi+3]
.endif
movd xmm0,[esi]
movd xmm1,tol
pcmpgtb xmm0,xmm1
movd eax,xmm0
ret
SSECmpPixel endp
This code does the same, but without SSE, and it is faster.
CmpPix proc uses esi edi ecx pix1:dword,pix2:dword,tol:dword
LOCAL r3,g3,b3,r4,g4,b4:dword
LOCAL rr,rg,rb:dword
mov eax,pix1
mov edx,pix2
movzx ecx,al
mov r3,ecx
shr eax,8
movzx ecx,al
mov g3,ecx
shr eax,8
movzx ecx,al
mov b3,ecx
movzx ecx,dl
mov r4,ecx
shr edx,8
movzx ecx,dl
mov g4,ecx
shr edx,8
movzx ecx,dl
mov b4,ecx
mov eax,r3
sub eax,r4
cmp eax,0
jg @f
neg eax
@@:
cmp eax,tol
jle @f
xor eax,eax
mov ecx,0
ret
@@:
mov rr,eax
mov eax,g3
sub eax,g4
cmp eax,0
jg @f
neg eax
@@:
cmp eax,tol
jle @f
xor eax,eax
mov ecx,1
ret
@@:
mov rg,eax
mov eax,b3
sub eax,b4
cmp eax,0
jg @f
neg eax
@@:
cmp eax,tol
jle @f
xor eax,eax
mov ecx,2
ret
@@:
xor eax,eax
inc eax
mov ecx,3
ret
CmpPix endp
I don't think SSE is that great.
Quote from: Farabi on December 24, 2012, 04:47:03 PM
I dont think SSE that great.
because you use the wrong approach.
.data
align 16
msk1 LABEL OWORD
db 16 dup (1)
.code
movdqa xmm0,16 bytes
pcmpeqb xmm3,xmm3
pxor xmm1,xmm1
pcmpgtb xmm1,xmm0
movdqa xmm2,xmm1
pandn xmm2,xmm0
pand xmm1,xmm0
pandn xmm1,xmm3
paddb xmm1,msk1
por xmm1,xmm2
; xmm1 = abs(xmm0)
EDIT: daves solution is of course much better :t
Quote from: dedndave on December 24, 2012, 02:55:23 PM
according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that
some testing may be needed
Xmas present for you :biggrin:
m2m ecx, 7
LoopAlign ; same for all algos
.Repeat
movups xmm0, OWORD PTR [esi]
?xor? xmm1, xmm1
pcmpgtb xmm1, xmm0
?xor? xmm0, xmm1
psubb xmm0, xmm1
movups OWORD PTR [edi], xmm0
dec ecx
.Until Sign?
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Testing with 10000000 loops
453 ms for psubb with pxor
453 ms for psubb with xorpd
454 ms for psubb with xorps
453 ms for psubb with pxor
454 ms for psubb with xorpd
453 ms for psubb with xorps
453 ms for psubb with pxor
452 ms for psubb with xorpd
454 ms for psubb with xorps
27 bytes for psubb with pxor
27 bytes for psubb with xorpd
25 bytes for psubb with xorps
Quote from: dedndave on December 24, 2012, 02:55:23 PM
according to qWord, i should use PXOR instead of XORPS
the documents i read didn't seem to say anything about that
Intel's optimization manual says: "Use SIMD integer operations to feed SIMD integer operations. Use PXOR for
idiom"
:t
thanks qWord
Jochen - didn't run the test
i know you don't want to see P4 results - lol
what you want is newer CPU results
should work
CmpPix proc uses esi edi ecx pix1:dword,pix2:dword,tol:dword
movd xmm0, pix1
movd xmm1, pix2
movd xmm5, tol ; for the sake of simplicity every byte of "tol" has same value
; difference between 2 values (not specifically between 0 and a value)
pxor xmm4, xmm4
movdqa xmm3, xmm1
psubusb xmm1, xmm0
pcmpeqb xmm4, xmm1
pand xmm0, xmm4
psubusb xmm0, xmm3
por xmm0, xmm1 ; xmm0[7:0], xmm0[15:8], xmm0[23:16] = diff betw colors
pxor xmm4, xmm4
psubusb xmm0, xmm5
pcmpeqb xmm0, xmm4
pmovmskb eax, xmm0 ; low 3 bits indicate if color component was different or not
; optional
not eax
and eax, 0ffffh
bsf ecx, eax ; ecx = 1st color component that was different
;jz all_identical
ret
CmpPix endp
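To make the result of that routine easier to follow, here is a Python model of what ends up in eax and ecx. It is a sketch under a few assumptions: only the four bytes of the dword are modelled (the upper twelve bytes of the xmm registers are zero after movd and always compare as "within tolerance"), tol carries the same tolerance in every byte as the comment says, and the names cmp_pix, unpack and psubusb are invented for this example.

```python
def psubusb(a, b):
    """Unsigned saturating byte subtract: results below 0 clamp to 0."""
    return [max(x - y, 0) for x, y in zip(a, b)]

def unpack(dword):
    """Split a dword into 4 bytes, lowest byte first."""
    return [(dword >> (8 * i)) & 0xFF for i in range(4)]

def cmp_pix(pix1, pix2, tol):
    p1, p2, t = unpack(pix1), unpack(pix2), unpack(tol)
    # absolute difference per component (one of the two subtractions is always 0)
    diff = [x | y for x, y in zip(psubusb(p1, p2), psubusb(p2, p1))]
    over = psubusb(diff, t)                       # 0 where diff <= tol
    within = sum(1 << i for i, d in enumerate(over) if d == 0)  # pcmpeqb + pmovmskb
    differing = ~within & 0xF                     # not eax / and eax, mask
    # bsf: index of the lowest set bit, None when nothing differs (ZF set)
    first = None if differing == 0 else (differing & -differing).bit_length() - 1
    return differing, first

print(cmp_pix(0x00666666, 0x00666465, 0x0A0A0A0A))   # all components within tolerance
print(cmp_pix(0x00102030, 0x00102050, 0x0A0A0A0A))   # lowest component differs by 20h
```

Note that a difference exactly equal to the tolerance still counts as "within", because the saturating subtraction gives 0 in that case.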
Quote from: hool on December 25, 2012, 03:24:03 AM
should work
:t Thanks, great job.
Hool:
The speed is impressive. 4 times faster than my fastest compare algo.
(http://www.iquilezles.org/prods/dem_41_p.jpg)
The white color is an example of the same color with different intensity, but in real life a color, even when it is the same, is sometimes distorted and has a different pattern of color.
For example, a color consisting of (RGB) 22h-22h-22h is the same as 66h-66h-66h, but in real life the color sometimes becomes 66h-64h-65h, and that confuses my algo when it has to determine whether this pixel is the same color or not. So now I have decided: if the subtraction gives an equal result for every component, it is the same color. For example, 33h-33h-33h minus 22h-22h-22h gives 11h for R, 11h for G and 11h for B, so it is the same color because every component changed by the same amount. But if R is 10, G is 9 and B is 10, it is a different color because the resulting components differ. To make things easier, I allow a tolerance (I used the word "Tol") of 10d for the difference to be acceptable.
color space is not a cube, either
R, G, and B each have different non-linear weights
this will approximate the "distance" between 2 colors
(http://img692.imageshack.us/img692/5482/colordiffapproxmod.png)
there are probably a number of good short-cuts :P
this equation is a short-cut, already
the real formula would be extremely complex
faster version of absolute difference between 2 values
simply:
movdqa xmm3, xmm1
psubusb xmm1, xmm0
psubusb xmm0, xmm3
por xmm0, xmm1
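Why the four instructions give |a - b| can be checked with a small Python model of psubusb on unsigned bytes (absdiff_u8 is a name made up for this sketch):

```python
def psubusb(a, b):
    """Unsigned saturating byte subtract: results below 0 clamp to 0."""
    return [max(x - y, 0) for x, y in zip(a, b)]

def absdiff_u8(a, b):
    """b - a and a - b with saturation: one of them is the difference, the other is 0."""
    d1 = psubusb(b, a)   # psubusb xmm1, xmm0
    d2 = psubusb(a, b)   # psubusb xmm0, xmm3 (xmm3 = saved copy of xmm1)
    return [x | y for x, y in zip(d1, d2)]   # por xmm0, xmm1

print(absdiff_u8([0, 255, 10, 200], [255, 0, 10, 190]))
```

Because the subtraction in the "wrong" direction saturates to 0, OR-ing the two results picks the non-zero one, and no sign handling or -128 special case is needed.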
Quote from: dedndave on December 26, 2012, 01:29:11 AM
color space is not a cube, either
R, G, and B each have different non-linear weights
this will approximate the "distance" between 2 colors
(http://img692.imageshack.us/img692/5482/colordiffapproxmod.png)
there are probably a number of good short-cuts :P
this equation is a short-cut, already
the real formula would be extremely complex
Anyway dave, where did you get that formula? It is exactly what I'm looking for.
sorry - i should have mentioned that
it is a slightly modified version of Thiadmer's equation found here...
http://www.compuphase.com/cmetric.htm
he based it on previous work by Charles Poynton, who is a guru for such things
then, he did some experimentation and testing to arrive at that equation
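For reference, a Python sketch of the low-cost approximation as I recall it from that compuphase page. This is the unmodified version from the page, not the slightly modified equation in the image above, which is not reproduced here; the function name color_distance and the (r, g, b) tuple layout are assumptions for the example.

```python
import math

def color_distance(c1, c2):
    """Weighted RGB distance where the red and blue weights depend on the mean red level.
    c1 and c2 are (r, g, b) tuples with components 0..255."""
    r1, g1, b1 = c1
    r2, g2, b2 = c2
    rmean = (r1 + r2) // 2
    dr, dg, db = r1 - r2, g1 - g2, b1 - b2
    return math.sqrt((((512 + rmean) * dr * dr) >> 8)
                     + 4 * dg * dg
                     + (((767 - rmean) * db * db) >> 8))

print(color_distance((0x66, 0x66, 0x66), (0x66, 0x64, 0x65)))
print(color_distance((255, 255, 255), (0, 0, 0)))
```

Green always gets weight 4, while red and blue share a total weight that shifts between them with the mean red value, which is the "not a cube" point made above.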