Hi all!
I would like to convert a color held in eax from the 00RRGGBBh format to the COLORREF layout 00BBGGRRh, and do it without needing the .486 directive.
Now it looks like this:
.486
.code
shl eax, 8
bswap eax
How can this be done easily with only the .386 directive?
.386
.model flat
.code
start:
; 0RGB
; 0BGR
movzx edx, al ; 000B
shl edx, 2*8 ; 0B00
shr eax, 8 ; eax=00RG
mov dh, al ; 0BG0
mov dl, ah ; 0BGR (result is in edx)
ret
end start
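For anyone who wants to check the sequence above without an assembler, here is a minimal C sketch that mirrors it one instruction at a time. The names rgb_to_bgr_mov and check_mov_version are made up for illustration, and eax/edx are ordinary variables standing in for the registers. Note that the result ends up in edx, not eax.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the .386 sequence; each line mirrors one instruction. */
uint32_t rgb_to_bgr_mov(uint32_t eax)                    /* eax = 00RRGGBB */
{
    uint32_t edx = eax & 0xFFu;                          /* movzx edx, al -> 000000BB */
    edx <<= 16;                                          /* shl edx, 2*8  -> 00BB0000 */
    eax >>= 8;                                           /* shr eax, 8    -> 0000RRGG */
    edx = (edx & 0xFFFF00FFu) | ((eax & 0xFFu) << 8);    /* mov dh, al    -> 00BBGG00 */
    edx = (edx & 0xFFFFFF00u) | ((eax >> 8) & 0xFFu);    /* mov dl, ah    -> 00BBGGRR */
    return edx;
}

/* Usage check with a sample value. */
void check_mov_version(void)
{
    assert(rgb_to_bgr_mov(0x00998877u) == 0x00778899u);
}
```

Whatever was in the top byte of eax is dropped, because edx is built from scratch.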
Now your problem is to find a CPU that supports such ancient code :biggrin:
if you are writing code for a machine that actually has a 386 - ok
otherwise - best to use the extended instruction set
however, i believe they are out there, in the world of embedded controllers, etc
push eax
mov ah,[esp+2]
mov [esp+2],al
mov [esp],ah
pop eax
maybe slightly faster...
push eax
mov [esp+2],al
shr eax,8
mov [esp],ah
pop eax
it only uses the source/dest register
could make a macro so it can be done on EAX, EBX, ECX, or EDX
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
2066 cycles for 100*push etc
256 cycles for 100*mov etc
2065 cycles for 100*push etc
261 cycles for 100*mov etc
Thank you!
All three solutions are interesting to me as I learn assembly language. I understand and agree that the extended instruction set has the advantage here. In my case, there is no restriction on the hardware, but performance is not critical.
The solution from jj2007 looks quick, and the two solutions from dedndave do not affect the values of other registers.
ouch - lol
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
11766 cycles for 100*push etc
284 cycles for 100*mov etc
11752 cycles for 100*push etc
288 cycles for 100*mov etc
even worse on my P4
Hi,
If the DWORD is in memory, or you can put it into memory, you might try the following.
RGB DD 00RRGGBB ; Dummy value (won't assemble).
; That would be the same as
RGB DB BB, GG, RR, 0 ; Dummies again. Little endian storage.
; Then
MOV AL,[RGB]
XCHG AL,[RGB+2] ; Use AH and two MOVes for speed?
MOV [RGB],AL
Regards,
Steve N.
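As a rough illustration of the memory approach, here is a C sketch of the two-MOV variant hinted at in the comment above. It assumes the DWORD is stored little endian as the four bytes BB, GG, RR, 0. The function names are hypothetical, and the locals al/ah only stand in for the byte registers.

```c
#include <assert.h>
#include <stdint.h>

/* Swap byte 0 (blue) and byte 2 (red) of a little-endian 00RRGGBB DWORD. */
void swap_red_blue_bytes(uint8_t pixel[4])
{
    uint8_t al = pixel[0];      /* MOV AL,[RGB]   */
    uint8_t ah = pixel[2];      /* MOV AH,[RGB+2] */
    pixel[2] = al;              /* MOV [RGB+2],AL */
    pixel[0] = ah;              /* MOV [RGB],AH   */
}

/* Usage check: 00998877h is stored as 77 88 99 00. */
void check_memory_version(void)
{
    uint8_t pixel[4] = { 0x77, 0x88, 0x99, 0x00 };
    swap_red_blue_bytes(pixel);
    assert(pixel[0] == 0x99 && pixel[1] == 0x88 && pixel[2] == 0x77 && pixel[3] == 0x00);
}
```

The green byte and the top byte are never touched, so the same routine converts in either direction.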
Try this on an antique processor.
mov eax, 00998877h
push eax
print hex$(eax),13,10
pop eax
xchg al, ah
ror eax, 16
xchg al, ah
ror eax, 8
print hex$(eax),13,10
i seem to recall a bit-swap thread in the old forum
something about a "stir fry" method - lol
Here are 3 variations.
; variation 1
xchg al, ah
ror eax, 16
xchg al, ah
ror eax, 8
; variation 2
ror eax, 24
xchg al, ah
ror eax, 16
xchg al, ah
; variation 3
rol ax, 8
rol eax, 16
rol ax, 8
rol eax, 24
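To see why the rotate-only sequence (rol ax / rol eax / rol ax / rol eax) works, here is a small C model of it. The helpers rol32 and rol_ax8 are stand-ins I made up for the two rotate forms; they are not library calls.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate left; n must be in 1..31. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32u - n));
}

/* Models "rol ax, 8": swaps the two low bytes, leaves the upper 16 bits alone. */
static uint32_t rol_ax8(uint32_t v)
{
    return (v & 0xFFFF0000u) | ((v & 0x00FFu) << 8) | ((v & 0xFF00u) >> 8);
}

uint32_t rgb_to_bgr_rol(uint32_t eax)   /* eax = 00RRGGBB */
{
    eax = rol_ax8(eax);                 /* rol ax, 8   -> 00RRBBGG */
    eax = rol32(eax, 16);               /* rol eax, 16 -> BBGG00RR */
    eax = rol_ax8(eax);                 /* rol ax, 8   -> BBGGRR00 */
    eax = rol32(eax, 24);               /* rol eax, 24 -> 00BBGGRR */
    return eax;
}

/* Usage check with the test value used in this thread. */
void check_rol_version(void)
{
    assert(rgb_to_bgr_rol(0x00998877u) == 0x00778899u);
}
```

One side effect of using rotates only: the top byte is carried through unchanged, so a value like 12804020h comes out as 12204080h.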
More examples. The last option is attractive and must be the fastest: everything stays in registers and only one register is used. Thank you for your interesting ideas!
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
1321 cycles for 100*push etc Dave
261 cycles for 100*mov etc JJ
1036 cycles for 100*Hutch 1
1042 cycles for 100*Hutch 2
1044 cycles for 100*Hutch 3
730 cycles for 100*Hutch 4
1321 cycles for 100*push etc Dave
263 cycles for 100*mov etc JJ
1034 cycles for 100*Hutch 1
1039 cycles for 100*Hutch 2
1047 cycles for 100*Hutch 3
730 cycles for 100*Hutch 4
hang on a second - lol
you're telling me that it only takes ~2.6 cycles to execute 5 instructions,
which includes MOVZX and 2 shifts with a count ?
i smell an Italian fish :P
NOTE that I recommended testing them on antique processors; BSWAP is a better option on anything later. :biggrin:
I think it is still faster and smaller using BSWAP.
bswap eax
shr eax,8
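A quick C sketch comparing the two BSWAP orderings seen in this thread (shl then bswap, and bswap then shr). bswap32 is a hand-written stand-in for the instruction, not a compiler builtin, and the function names are only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for BSWAP: reverses the four bytes of a DWORD. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* shl eax, 8 ; bswap eax */
uint32_t rgb_to_bgr_shl_bswap(uint32_t eax) { return bswap32(eax << 8); }

/* bswap eax ; shr eax, 8 */
uint32_t rgb_to_bgr_bswap_shr(uint32_t eax) { return bswap32(eax) >> 8; }

/* Usage check: both orderings give the same answer. */
void check_bswap_versions(void)
{
    assert(rgb_to_bgr_shl_bswap(0x00998877u) == 0x00778899u);
    assert(rgb_to_bgr_bswap_shr(0x00998877u) == 0x00778899u);
}
```

In this model both orderings throw away the top byte, so the result always has 00 in bits 24 to 31.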
Quote from: dedndave on June 21, 2012, 07:42:34 AM
i smell an Italian fish :P
Typical reaction from the same kind of people who refuse to admit that Marconi invented the radio :eusa_boohoo:
Besides, you have the source attached. Make it faster instead of ranting here, Dave :lol:
Since in most instances an RGB value comes as a memory operand, here is a small proc that should perform OK. If you are serious about speed on late-model hardware (SSE4), there is a very fast masked instruction that will do 4 at a time.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
flip_rgb proc rgb:DWORD ; input as memory operand
; ------------------
; swap the two bytes
; ------------------
mov al, [esp+4+2]
mov cl, [esp+4]
mov [esp+4], al
mov [esp+4+2], cl
mov eax, [esp+4] ; result in EAX
ret 4
flip_rgb endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
here's a macro for converting immediate constants
it will convert RGB to BGR or convert BGR to RGB :biggrin:
RgbSwap MACRO rgb
EXITM % ((rgb AND 0FF0000h) SHR 16) OR (rgb AND 0FF00h) OR ((rgb AND 0FFh) SHL 16)
ENDM
mov eax,RgbSwap(804020h)
;EAX = 204080h
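For comparison, the same compile-time trick can be sketched in C. RGB_SWAP and SWAPPED_EXAMPLE are hypothetical names; the expression is just the three mask-and-shift terms from the macro above.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the red and blue bytes of a constant; green stays where it is. */
#define RGB_SWAP(rgb) \
    ((((rgb) & 0xFF0000u) >> 16) | ((rgb) & 0x00FF00u) | (((rgb) & 0x0000FFu) << 16))

/* Folded by the compiler, like the MASM macro is folded by the assembler. */
enum { SWAPPED_EXAMPLE = RGB_SWAP(0x804020u) };

/* Usage check: same input and output as the mov eax example. */
void check_macro_version(void)
{
    assert(SWAPPED_EXAMPLE == 0x204080);
    assert(RGB_SWAP(RGB_SWAP(0x804020u)) == 0x804020u);   /* works both ways */
}
```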
either way, Hutch's age = green :P
how's it feel to be 40 again ?
Quote from: jj2007 on June 21, 2012, 06:06:19 PM
Typical reaction from the same kind of people who refuse to admit that Marconi invented the radio :eusa_boohoo:
i am a big fan of Marconi
in fact, i have used some old equipment with the Marconi brand-name on it
even by today's standards - some good stuff :P
but i think Maxwell, Hertz, and Faraday preceded him in the discovery of the properties of electromagnetic waves
Dave,
Cute macro but almost exclusively those types of conversions are dynamic and usually done at high speed.
i thought you'd like it because you were 64 going in and 40h going out - lol
I would do it the other way round, nothing wrong with being 64.
Quote from: dedndave on June 21, 2012, 07:42:34 AM
you're telling me that it only takes ~2.6 cycles to execute 5 instructions,
which includes MOVZX and 2 shifts with a count ?
What is strange in that fact?
with dependencies, it should be at least 7 cycles or so
AMD Phenom(tm) II X6 1100T Processor (SSE3)
1308 cycles for 100*push etc Dave
173 cycles for 100*mov etc JJ
536 cycles for 100*Hutch 1
535 cycles for 100*Hutch 2
534 cycles for 100*Hutch 3
355 cycles for 100*Hutch 4
>it should be at least 7 cycles or so
Why?
Seems my Core2 quad does not like any of them much. I think the culprit is XCHG in the 1st 3.
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
1437 cycles for 100*push etc Dave
221 cycles for 100*mov etc JJ
1047 cycles for 100*Hutch 1
1045 cycles for 100*Hutch 2
1057 cycles for 100*Hutch 3
698 cycles for 100*Hutch 4
1443 cycles for 100*push etc Dave
221 cycles for 100*mov etc JJ
1050 cycles for 100*Hutch 1
1049 cycles for 100*Hutch 2
1055 cycles for 100*Hutch 3
698 cycles for 100*Hutch 4
--- ok ---
Hi,
Amusing to do a percent change with Hutch's new CPU.
pre-P4 (SSE1)
1367 cycles for 100*push etc Dave 95%
265 cycles for 100*mov etc JJ 120%
1680 cycles for 100*Hutch 1 160%
1680 cycles for 100*Hutch 2 161%
1676 cycles for 100*Hutch 3 159%
1413 cycles for 100*Hutch 4 202%
1366 cycles for 100*push etc Dave
265 cycles for 100*mov etc JJ
1680 cycles for 100*Hutch 1
1680 cycles for 100*Hutch 2
1676 cycles for 100*Hutch 3
1411 cycles for 100*Hutch 4
--- ok ---
Cheers.
Steve N.
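The percentages above appear to be the pre-P4 cycle counts divided by the first run of the Core2 Quad figures (for example 1367 / 1437 is about 95%). That reading is inferred from the numbers, not stated in the post. A small C helper with a made-up name reproduces them:

```c
#include <assert.h>

/* 100 * cycles / reference, rounded to the nearest whole percent. */
unsigned percent_of(unsigned cycles, unsigned reference)
{
    assert(reference != 0);
    return (100u * cycles + reference / 2u) / reference;
}

/* Usage check against two rows of the table. */
void check_percentages(void)
{
    assert(percent_of(1367u, 1437u) == 95u);    /* push etc, Dave */
    assert(percent_of(1413u, 698u) == 202u);    /* Hutch 4 */
}
```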
Here is a quick tweak using JJ's test bed that substitutes one of the slow XCHG versions with a simple memory based byte swap. Its purpose was to work directly on the memory rather than converting it back and forth in registers.
This is the code change.
invoke Sleep, 100
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 100
; ror eax, 24
; xchg al, ah
; ror eax, 16
; xchg al, ah
; =========================
mov cl, [ebp]
mov dl, [ebp+2]
mov [ebp+2], cl
mov [ebp], dl
; =========================
ENDM
counter_end
print str$(eax), 9, "cycles for 100*Hutch 3 --- mem op version", 13, 10
These are the results on my Core2 quad.
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
1437 cycles for 100*push etc Dave
221 cycles for 100*mov etc JJ
1047 cycles for 100*Hutch 1
1045 cycles for 100*Hutch 2
605 cycles for 100*Hutch 3 --- mem op version
695 cycles for 100*Hutch 4
1436 cycles for 100*push etc Dave
221 cycles for 100*mov etc JJ
1047 cycles for 100*Hutch 1
1045 cycles for 100*Hutch 2
605 cycles for 100*Hutch 3 --- mem op version
696 cycles for 100*Hutch 4
--- ok ---
Here's your new one hutch, 'yuck' says AMD
AMD Phenom(tm) II X6 1100T Processor (SSE3)
1295 cycles for 100*push etc Dave
174 cycles for 100*mov etc JJ
533 cycles for 100*Hutch 1
532 cycles for 100*Hutch 2
1173 cycles for 100*Hutch 3 --- mem op version
353 cycles for 100*Hutch 4
1290 cycles for 100*push etc Dave
173 cycles for 100*mov etc JJ
532 cycles for 100*Hutch 1
533 cycles for 100*Hutch 2
1167 cycles for 100*Hutch 3 --- mem op version
353 cycles for 100*Hutch 4
Quote from: sinsi on June 23, 2012, 12:22:30 AM
AMD Phenom(tm) II X6 1100T Processor (SSE3)
1295 cycles for 100*push etc Dave
174 cycles for 100*mov etc JJ
1.74 cycles = witchcraft or worse, according to Dave :dazzled:
abra abra cadabra, baby 8)
http://www.youtube.com/watch?v=wCuTrfTfGd0