How to convert 00RRGGBBh to 00BBGGRRh without bswap

Started by mc-black, June 21, 2012, 04:45:30 AM

jj2007

Quote from: dedndave on June 21, 2012, 07:42:34 AM
i smell an Italian fish   :P

Typical reaction from the same kind of people who refuse to admit that Marconi invented the radio :eusa_boohoo:

Besides, you have the source attached. Make it faster instead of ranting here, Dave :lol:

hutch--

Since an RGB value usually arrives as a memory operand, here is a small proc that should perform OK. If you are genuinely serious about speed on late-model hardware (SSE4), there is a very fast masked instruction that will do 4 at a time.



; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

flip_rgb proc rgb:DWORD     ; input as memory operand

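  ; ---------------------------------------------
  ; swap the red and blue bytes (offsets 0 and 2)
  ; ---------------------------------------------
    mov al, [esp+4+2]       ; AL = byte 2 of the argument
    mov cl, [esp+4]         ; CL = byte 0 of the argument
    mov [esp+4], al         ; write them back in swapped order
    mov [esp+4+2], cl

    mov eax, [esp+4]        ; result in EAX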

    ret 4

flip_rgb endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
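For readers who want to check the byte positions outside an assembler, the same memory-operand idea can be sketched in C. This is only an illustration under stated assumptions: the name `flip_rgb_buffer` and the 4-bytes-per-pixel buffer layout are hypothetical and not part of the posted proc, which handles a single DWORD on the stack.

```c
#include <stddef.h>

/* Hypothetical helper, not from the thread.
   Assumes each pixel occupies 4 consecutive bytes, laid out the way x86
   stores 00RRGGBBh in memory: byte 0 = BB, byte 1 = GG, byte 2 = RR,
   byte 3 = 00. Swapping bytes 0 and 2 turns it into 00BBGGRRh. */
static void flip_rgb_buffer(unsigned char *pixels, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        unsigned char *p = pixels + 4 * i;  /* start of pixel i */
        unsigned char tmp = p[0];           /* like mov cl, [esp+4]   */
        p[0] = p[2];                        /* like mov [esp+4], al   */
        p[2] = tmp;                         /* like mov [esp+4+2], cl */
    }
}
```

Bytes 1 and 3 of each pixel are never touched, so green and the top byte pass through unchanged.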

dedndave

here's a macro for converting immediate constants
it will convert RGB to BGR or convert BGR to RGB   :biggrin:
RgbSwap MACRO   rgb
        EXITM   % ((rgb AND 0FF0000h) SHR 16) OR (rgb AND 0FF00h) OR ((rgb AND 0FFh) SHL 16)
        ENDM


        mov     eax,RgbSwap(804020h)
;EAX = 204080h
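
The mask-and-shift arithmetic in the macro can be cross-checked with a rough C counterpart. This is a sketch, not a translation of MASM semantics: the name `RGB_SWAP` is made up for the illustration, and the argument is assumed to fit in 32 bits.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the RgbSwap macro (the name is illustrative).
   Red moves down 16 bits, green stays in place, blue moves up 16 bits.
   With a literal argument the whole expression is a constant expression,
   so a compiler can evaluate it at build time. */
#define RGB_SWAP(rgb)                          \
    ( (((uint32_t)(rgb) & 0xFF0000u) >> 16)    \
    |  ((uint32_t)(rgb) & 0x00FF00u)           \
    | (((uint32_t)(rgb) & 0x0000FFu) << 16) )
```

As with the MASM version, the three masks cover only the low 24 bits, so anything in the top byte is dropped, and applying the swap twice gives back the original 24-bit value.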


either way, Hutch's age = green   :P
how's it feel to be 40 again ?

dedndave

Quote from: jj2007 on June 21, 2012, 06:06:19 PM
Typical reaction from the same kind of people who refuse to admit that Marconi invented the radio :eusa_boohoo:

i am a big fan of Marconi
in fact, i have used some old equipment with the Marconi brand-name on it
even by today's standards - some good stuff   :P

but i think Maxwell, Hertz, and Faraday preceded him in the discovery of the properties of electromagnetic waves

hutch--

Dave,

Cute macro, but conversions like this are almost always done at run time on dynamic data, and usually at high speed.

dedndave

i thought you'd like it because you were 64 going in and 40h going out - lol

hutch--

I would do it the other way round, nothing wrong with being 64.

Antariy

 
Quote from: dedndave on June 21, 2012, 07:42:34 AM
you're telling me that it only takes ~2.6 cycles to execute 5 instructions,
which includes MOVZX and 2 shifts with a count ?

What is strange about that?

dedndave

with dependencies, it should be at least 7 cycles or so

sinsi

AMD Phenom(tm) II X6 1100T Processor (SSE3)
1308    cycles for 100*push etc Dave
173     cycles for 100*mov etc JJ
536     cycles for 100*Hutch 1
535     cycles for 100*Hutch 2
534     cycles for 100*Hutch 3
355     cycles for 100*Hutch 4


>it should be at least 7 cycles or so
Why?

hutch--

Seems my Core2 quad does not like any of them much. I think the culprit is the XCHG in the first three.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
1437    cycles for 100*push etc Dave
221     cycles for 100*mov etc JJ
1047    cycles for 100*Hutch 1
1045    cycles for 100*Hutch 2
1057    cycles for 100*Hutch 3
698     cycles for 100*Hutch 4

1443    cycles for 100*push etc Dave
221     cycles for 100*mov etc JJ
1050    cycles for 100*Hutch 1
1049    cycles for 100*Hutch 2
1055    cycles for 100*Hutch 3
698     cycles for 100*Hutch 4

--- ok ---

FORTRANS

Hi,

   Amusing to work out the percent change against Hutch's new CPU.

pre-P4 (SSE1)
1367    cycles for 100*push etc Dave  95%
265     cycles for 100*mov etc JJ    120%
1680    cycles for 100*Hutch 1       160%
1680    cycles for 100*Hutch 2       161%
1676    cycles for 100*Hutch 3       159%
1413    cycles for 100*Hutch 4       202%

1366    cycles for 100*push etc Dave
265     cycles for 100*mov etc JJ
1680    cycles for 100*Hutch 1
1680    cycles for 100*Hutch 2
1676    cycles for 100*Hutch 3
1411    cycles for 100*Hutch 4


--- ok ---


Cheers.

Steve N.
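
The percentage column in the first pre-P4 run appears to be the pre-P4 cycle count divided by the matching count from the first Core2 Quad run posted just before it, rounded to the nearest percent. That reading is an inference from the numbers rather than something stated in the post. A minimal C sketch of the arithmetic, with a hypothetical helper name:

```c
/* Hypothetical helper, not from the thread: expresses `cycles` as a
   percentage of `baseline`, rounded to the nearest whole percent using
   integer arithmetic. Assumes both counts are positive and small enough
   that cycles * 100 does not overflow an int. */
static int percent_of(int cycles, int baseline)
{
    return (cycles * 100 + baseline / 2) / baseline;
}
```

For example, `percent_of(1367, 1437)` gives 95 for Dave's routine and `percent_of(1413, 698)` gives 202 for Hutch 4, matching the first and last rows of the table.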

hutch--

Here is a quick tweak using JJ's test bed that replaces one of the slow XCHG versions with a simple memory-based byte swap. Its purpose was to work directly on the memory rather than converting it back and forth in registers.

This is the code change.


      invoke Sleep, 100
      counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
         REPEAT 100
;             ror eax, 24
;             xchg al, ah
;             ror eax, 16
;             xchg al, ah

                    ; =========================

                        mov cl, [ebp]
                        mov dl, [ebp+2]
                        mov [ebp+2], cl
                        mov [ebp], dl

                    ; =========================

             ENDM
      counter_end
      print str$(eax), 9, "cycles for 100*Hutch 3  --- mem op version", 13, 10
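
For anyone wondering how the commented-out ROR/XCHG sequence gets from 00RRGGBBh to 00BBGGRRh, here is a step-by-step model of it in C. It is only a sketch for following the byte movement: the helper names are hypothetical, and it says nothing about timing.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits, like ROR. Assumes 0 < n < 32. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Swap the two low bytes and keep the upper 16 bits, like XCHG AL, AH. */
static uint32_t xchg_al_ah(uint32_t v)
{
    return (v & 0xFFFF0000u) | ((v & 0x00FFu) << 8) | ((v & 0xFF00u) >> 8);
}

/* Hypothetical model of the four commented-out instructions. */
static uint32_t flip_rgb_rotate(uint32_t eax)
{
    eax = ror32(eax, 24);     /* 00RRGGBB -> RRGGBB00 */
    eax = xchg_al_ah(eax);    /* RRGGBB00 -> RRGG00BB */
    eax = ror32(eax, 16);     /* RRGG00BB -> 00BBRRGG */
    eax = xchg_al_ah(eax);    /* 00BBRRGG -> 00BBGGRR */
    return eax;
}
```

Unlike a pure mask-and-shift approach over the low 24 bits, this sequence carries the top byte through unchanged, which is the same outcome as the memory version that only touches bytes 0 and 2.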


These are the results on my Core2 quad.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
1437    cycles for 100*push etc Dave
221     cycles for 100*mov etc JJ
1047    cycles for 100*Hutch 1
1045    cycles for 100*Hutch 2
605     cycles for 100*Hutch 3  --- mem op version
695     cycles for 100*Hutch 4

1436    cycles for 100*push etc Dave
221     cycles for 100*mov etc JJ
1047    cycles for 100*Hutch 1
1045    cycles for 100*Hutch 2
605     cycles for 100*Hutch 3  --- mem op version
696     cycles for 100*Hutch 4


--- ok ---


sinsi

Here's your new one hutch, 'yuck' says AMD

AMD Phenom(tm) II X6 1100T Processor (SSE3)
1295    cycles for 100*push etc Dave
174     cycles for 100*mov etc JJ
533     cycles for 100*Hutch 1
532     cycles for 100*Hutch 2
1173    cycles for 100*Hutch 3  --- mem op version
353     cycles for 100*Hutch 4

1290    cycles for 100*push etc Dave
173     cycles for 100*mov etc JJ
532     cycles for 100*Hutch 1
533     cycles for 100*Hutch 2
1167    cycles for 100*Hutch 3  --- mem op version
353     cycles for 100*Hutch 4

jj2007

Quote from: sinsi on June 23, 2012, 12:22:30 AM
AMD Phenom(tm) II X6 1100T Processor (SSE3)
1295    cycles for 100*push etc Dave
174     cycles for 100*mov etc JJ

1.74 cycles = witchcraft or worse, according to Dave :dazzled: