I would like a lesson

Started by Grincheux, December 08, 2015, 05:18:30 AM


Grincheux

Just title!

I am coding a program that modifies colors in an image.

xRGB - 32 bits

Which MMX, SSE or other instructions could I use?
I don't know these instructions; perhaps some of them could help me make it faster.


jmp @Loop

; **********************************************************************************
ALIGN 16
; **********************************************************************************

@Loop :

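mov eax,[edi]            ; load one xRGB pixel
and eax,00ffffffh        ; drop the unused top byte
mov edx,eax
mov ebx,eax
shr edx,8                ; second byte down to DL
shr ebx,16               ; third byte down to BL
and eax,000000ffh        ; isolate each 8-bit channel
and edx,000000ffh
and ebx,000000ffh
add eax,ebx
add eax,edx              ; EAX = sum of the three channels (0..765)
xor edx,edx              ; DX:AX is the dividend for the 16-bit DIV
mov ebx,3
div bx                   ; AX = sum / 3 (the slow instruction here)
mov ah,al                ; copy the average into the next byte
shl eax,8
mov al,ah                ; EAX = 00YYYYYYh
mov [edi],eax
add edi,SIZEOF DWord     ; next pixel
sub ecx,SIZEOF DWord     ; ECX goes down by 4 bytes per pixel
jnz @Loop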

ret
Effect_Grey_Minimum ENDP


This is an example of what I do.
This is a grey conversion based on the average of RED / GREEN / BLUE.
I would like to optimize it.

JJ2007 vexed me with my very slow program. I laugh.

jj2007

Quote from: Grincheux on December 08, 2015, 05:18:30 AM
JJ2007 vexed me with my very slow program. I laugh.

Sorry, no bad intentions :P

Have a look specifically at
pshuf*
andps
movlps, movups
pcmpeqb

dedndave

a little work on your algorithm is probably in order

avoid the use of DIV when possible
in the case of division by 3, it is easily avoided
you can use "multiply-to-divide" and it will be much faster
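
The "multiply-to-divide" idea can be modelled in Python as plain integer arithmetic. This is only a sketch: the constant 0AAAAAAABh is ceil(2^33 / 3), a commonly used reciprocal for unsigned 32-bit division by 3, and it is not taken from the thread. The name `div3` is made up for the illustration.

```python
MAGIC_DIV3 = 0xAAAAAAAB  # ceil(2**33 / 3); assumption, not a constant from the thread

def div3(x):
    """Divide an unsigned 32-bit value by 3 with one multiply and one shift."""
    # Rough 32-bit equivalent: mov eax,0AAAAAAABh / mul x / shr edx,1
    return (x * MAGIC_DIV3) >> 33

# The sum of three 8-bit channels never exceeds 765, so check that whole range
sums_ok = all(div3(s) == s // 3 for s in range(766))
```

In assembly the high half of the 64-bit product lands in EDX, so the shift by 33 becomes a shift of EDX by 1.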

however, the gray-scale conversion can be performed another way...

Y = .587G + .299R + .114B

so, select scaling constants that give a total of 0FF_FFFFFFFFh for full white (R = G = B = 0FFh) and accumulate in EDX:EAX
when you are done, DL will have the luminance byte value without division   :biggrin:

Y = (LG + MR + NB) / 4294967296
L + M + N = 0FFFFFFFFFFh / 0FFh = 4311810305
L = .587 * 4311810305 = 2531032649
M = .299 * 4311810305 = 1289231281
N = .114 * 4311810305 = 491546375
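
The derivation of the three constants can be checked with a short Python sketch. It uses integer arithmetic with the weights written in thousandths to avoid floating-point rounding; the names `SCALE` and `luminance` are made up for the illustration.

```python
SCALE = 0xFFFFFFFFFF // 0xFF  # 4311810305, i.e. 101010101h

# Round each weight to the nearest integer (weights are in thousandths)
L = (587 * SCALE + 500) // 1000  # green
M = (299 * SCALE + 500) // 1000  # red
N = (114 * SCALE + 500) // 1000  # blue

def luminance(r, g, b):
    """High dword of the 64-bit sum, which is what ends up in EDX (DL)."""
    return (L * g + M * r + N * b) >> 32
```

The three rounded constants add up to exactly 101010101h, so full white gives 0FF_FFFFFFFFh and the high dword is 255.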


so....

MOVZX EAX,byte ptr [blue]
MOV ECX,491546375
MUL ECX
MOV ESI,EAX
MOV EDI,EDX

MOVZX EAX,byte ptr [red]
MOV ECX,1289231281
MUL ECX
ADD ESI,EAX
ADC EDI,EDX

MOVZX EAX,byte ptr [green]
MOV ECX,2531032649
MUL ECX
ADD EAX,ESI
ADC EDX,EDI

;DL = luminance byte


now, you could do this...

MOV byte ptr [blue],DL
MOV byte ptr [green],DL
MOV byte ptr [red],DL


but, faster, MUL DL by 10101h and write all 3 bytes as a dword
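
A small Python sketch of why multiplying by 10101h works: it copies the grey byte into the three low bytes of the dword in one step. The function names are made up for the illustration, and the grey level is assumed to be zero-extended to a full register before the multiply.

```python
def gray_dword(y):
    """Replicate an 8-bit grey level into the three low bytes of a dword."""
    return (y & 0xFF) * 0x10101

def gray_dword_shifts(y):
    """Same result built byte by byte, for comparison."""
    y &= 0xFF
    return y | (y << 8) | (y << 16)

same_for_all = all(gray_dword(y) == gray_dword_shifts(y) for y in range(256))
```

The top byte stays zero, so an opacity value can still be ORed in afterwards if the format needs one.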

Grincheux

For grey conversion I know your formula, but there are three others:

Minimum of all the components
Maximum
Average

They all produce grey.

dedndave

that's a poor shortcut

use the equation i posted - lol

Grincheux

JJ2007, thanks for your docs, but I already have them.
I have searched with Google, but the results just say what each instruction does. One line per instruction!
Many instructions use floating point, but what I have is integers. Is that a problem?
https://www.csie.ntu.edu.tw/~cyy/courses/assembly/docs/ch11_MMX.pdf This is the best I have found.
Many use gas syntax.

jj2007

Quote from: Grincheux on December 08, 2015, 06:49:24 AM
JJ2007, thanks for your docs, but I already have them.

It's the best I found so far. The original Intel manuals are too big and detailed for my taste.

Quote
Many instructions use floating point, but what I have is integers. Is that a problem?

No. SSE instructions that load, store or perform bitwise operations do not distinguish between float, double or integer.
Btw avoid mmx, go for xmm. More bits and no conflict with the FPU.
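
To illustrate the point that a bitwise operation only sees bits, here is a minimal Python sketch that applies the same AND to one 32-bit lane holding a float and to one holding a pixel. The helper names are made up; the real instructions work on four such lanes at once.

```python
import struct

def float_bits(x):
    """Return the 32-bit pattern of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_float(u):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def and_mask(u, mask):
    """What a bitwise AND does to one 32-bit lane, whatever it holds."""
    return u & mask & 0xFFFFFFFF

# The same AND clears the sign bit of a float or the top byte of a pixel
abs_value = bits_float(and_mask(float_bits(-1.5), 0x7FFFFFFF))
rgb_only = and_mask(0xFF336699, 0x00FFFFFF)
```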

Grincheux

I know that dividing is very slow. I suggested another way (http://masm32.com/board/index.php?topic=4886.0).
Many people have a formula for grey conversion. It is difficult to choose.

Grey = 0.2125 Red + 0.7154 Green + 0.0721 Blue
Grey = 0.299 Red + 0.587 Green + 0.114 Blue

A site that can help with research: http://www.tutorialspoint.com/dip/pdf/Gray_Level_Transformations.pdf

Siekmanski

Hi Grincheux,

Here are 4 methods:

1) The lightness method:
   Averages the most prominent and least prominent colors: (max(R, G, B) + min(R, G, B)) / 2

2) The average method:
   Averages the values: (R + G + B) / 3

3) Colorimetric method:
   It forms a weighted average to account for human perception.
   Luminosity is: (R * 0.2126) + (G * 0.7152) + (B * 0.0722)

4) Luma coding method:
   This is how we perceive colors from TVs, monitors etc.
   Luma is: (R * 0.299) + (G * 0.587) + (B * 0.114)
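
The four methods can be compared side by side with a short Python sketch. The weights are scaled to integers so the results do not depend on floating-point rounding; rounding to nearest (rather than truncating) and the function names are assumptions of this sketch, not something the post specifies.

```python
def lightness(r, g, b):
    return (max(r, g, b) + min(r, g, b)) // 2

def average(r, g, b):
    return (r + g + b) // 3

def luminosity(r, g, b):
    # 0.2126, 0.7152, 0.0722 scaled by 10000, rounded to nearest
    return (2126 * r + 7152 * g + 722 * b + 5000) // 10000

def luma(r, g, b):
    # 0.299, 0.587, 0.114 scaled by 1000, rounded to nearest
    return (299 * r + 587 * g + 114 * b + 500) // 1000

pixel = (255, 123, 11)  # R, G, B
results = [f(*pixel) for f in (lightness, average, luminosity, luma)]
```

For this pixel the four methods give 133, 129, 143 and 150, which shows how far apart they can be on a saturated color.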


This is the SSE version of the average method.
It processes four 32 bit pixels at once.

EDIT: inserted roundup code to prevent overflow.


.data
align 16
div3 equ 256/3
MagicDiv3   dd div3,div3,div3,div3
ByteMask    dd 000000ffh,000000ffh,000000ffh,000000ffh
RoundUp     dd 3,3,3,3

rgb_data    db 255,123,11,0, 210,33,77,0, 239,111,178,0, 99,88,22,0
Average     dd 0,0,0,0

.code
    movaps  xmm0,oword ptr rgb_data
    movaps  xmm1,oword ptr ByteMask
    movaps  xmm2,xmm0   ; __ B3 G3 R3,  __ B2 G2 R2,  __ B1 G1 R1,  __ B0 G0 R0
    psrld   xmm0,8      ; __ __ B3 G3,  __ __ B2 G2,  __ __ B1 G1,  __ __ B0 G0
    movaps  xmm3,xmm0   ; __ __ B3 G3,  __ __ B2 G2,  __ __ B1 G1,  __ __ B0 G0
    psrld   xmm0,8      ; __ __ __ B3,  __ __ __ B2,  __ __ __ B1,  __ __ __ B0
    pand    xmm2,xmm1   ; __ __ __ R3,  __ __ __ R2,  __ __ __ R1,  __ __ __ R0
    pand    xmm3,xmm1   ; __ __ __ G3,  __ __ __ G2,  __ __ __ G1,  __ __ __ G0
    pand    xmm0,xmm1   ; __ __ __ B3,  __ __ __ B2,  __ __ __ B1,  __ __ __ B0
    paddd   xmm0,xmm3   ; B + G
    paddd   xmm0,xmm2   ; B + G + R
    paddd   xmm0,oword ptr RoundUp
    pmullw  xmm0,oword ptr MagicDiv3
    psrld   xmm0,8      ; __ __ __ A3,  __ __ __ A2,  __ __ __ A1,  __ __ __ A0
    movaps  xmm1,xmm0
    movaps  xmm2,xmm0
    pslld   xmm1,8      ; __ __ A3 __,  __ __ A2 __,  __ __ A1 __,  __ __ A0 __
    pslld   xmm2,16     ; __ A3 __ __,  __ A2 __ __,  __ A1 __ __,  __ A0 __ __
    pxor    xmm0,xmm1   ; __ __ A3 A3,  __ __ A2 A2,  __ __ A1 A1,  __ __ A0 A0
    pxor    xmm0,xmm2   ; __ A3 A3 A3,  __ A2 A2 A2,  __ A1 A1 A1,  __ A0 A0 A0
    movaps  oword ptr Average,xmm0 ; A0=130 A1=107 A2=176 A3=70 (memory order)
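
The arithmetic of one 32-bit lane of the routine above can be modelled in Python to see what the multiply by 85 and the shift by 8 produce. This is a sketch of the integer math only, not of the SSE instructions themselves; the function names are made up.

```python
DIV3 = 256 // 3   # 85, same as "div3 equ 256/3"
ROUND_UP = 3

def approx_average(r, g, b):
    """One 32-bit lane of the SSE routine."""
    s = r + g + b + ROUND_UP
    product = (s * DIV3) & 0xFFFF   # pmullw keeps only the low 16 bits
    return product >> 8

def replicate(a):
    """Spread the average into the three low bytes, as the shifts and XORs do."""
    return a | (a << 8) | (a << 16)

rgb_data = [(255, 123, 11), (210, 33, 77), (239, 111, 178), (99, 88, 22)]
averages = [approx_average(*p) for p in rgb_data]
```

For the sample rgb_data this model gives 130, 107, 176 and 70 in memory order. Over all possible sums (0 to 765) the result is never below the exact integer average and at most one above it, and the product always fits in 16 bits.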


Marinus
Creative coders use backward thinking techniques as a strategy.

dedndave

Marinus...

1) this method makes no sense for individual-pixel conversion   :redface:

2) this method is a poor shortcut - simpler, yes - faster, probably not

3) this method is a color-contrast equation, used primarily for contrasting text comparisons

4) this is the proper method for converting to gray-shades
it is how we perceive light (not just from TVs, research human eye, rods and cones)
it is used for all sorts of things, not just simple image conversion (astronomy, facial recognition, bio-medical, etc)

no matter which method you use, an image can be created which will be totally ambiguous when converted
in other words, the 3 image colors may contrast,
but when converted to gray-shades, the entire image is a single shade
over a wide variety of images, the latter method will perform the best
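
The ambiguity described above can be checked numerically. The sketch below uses the three constants posted earlier in the thread; the two test colors are chosen for this illustration and are not from the thread.

```python
L, M, N = 2531032649, 1289231281, 491546375  # green, red, blue (from the thread)

def luminance(r, g, b):
    """High dword of the 64-bit weighted sum."""
    return (L * g + M * r + N * b) >> 32

red = (255, 0, 0)      # pure red
green = (0, 130, 0)    # a medium green

same_shade = luminance(*red) == luminance(*green)
```

Both colors come out as grey level 76, so an image drawn only in these two colors would turn into a single flat shade.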

dedndave

give this a try
you won't be disappointed with speed or results

the _lpBmpData argument is a pointer to the image data (after the header)
it will be fastest if this address is 4-aligned

Bmp2Gray32 PROTO :DWORD,:DWORD,:LPVOID

Bmp2Gray32 PROC USES EBX ESI EDI _dwWidth:DWORD,_dwHeight:DWORD,_lpBmpData:LPVOID

    mov     eax,_dwWidth
    push    ebp
    mul     _dwHeight
    mov     ebx,_lpBmpData
    xchg    eax,ebp
    .repeat
        movzx   eax,byte ptr [ebx]           ;EAX = blue
        mov     ecx,491546375
        mul     ecx
        xchg    eax,esi
        mov     edi,edx
        movzx   eax,byte ptr [ebx+1]         ;EAX = green
        mov     ecx,2531032649
        mul     ecx
        add     esi,eax
        adc     edi,edx
        movzx   eax,byte ptr [ebx+2]         ;EAX = red
        mov     ecx,1289231281
        mul     ecx
        add     eax,esi
        adc     edx,edi
        mov     eax,10101h
        mul     edx
        or      eax,0FF000000h               ;opacity = 255
        mov     [ebx],eax
        dec     ebp
        lea     ebx,[ebx+4]
    .until ZERO?
    pop     ebp
    ret

Bmp2Gray32 ENDP

FORTRANS

Hi Dave,

Quote from: dedndave on December 08, 2015, 11:46:28 PM
1) this method makes no sense for individual-pixel conversion

   At first glance, I would tend to agree.  I may try it out though,
if I get curious.

Quote
2) this method is a poor shortcut - simpler, yes - faster, probably not

   But it is commonly used.

Quote
3) this method is a color-contrast equation, used primarily for contrasting text comparisons

   Based on the reference I use, this is closest to using S.M.P.T.E.
RGB primaries and YIQ or YUV intensity encoding.  The numbers
in his table are 0.212, 0.701, and 0.087.  These are close to the
numbers they posted.  Essentially an update of the N.T.S.C. values
to more modern phosphors (and so forth).

Quote
4) this is the proper method for converting to gray-shades
it is how we perceive light (not just from TVs, research human eye, rods and cones)
it is used for all sorts of things, not just simple image conversion (astronomy, facial recognition, bio-medical, etc)

   This is using N.T.S.C. RGB primaries and the YIQ intensity channel.
Often used, but uses old phosphor emission curves.  This was
developed for color television transmission and reception in the U.S.

   If I look up in the table the C.I.E. colorimetric (actually "Spectral
Primary Color Coordinate System") RGB primaries and its YIQ
transformation, its factors are 0.177, 0.813, and 0.011.
This is a bit odd, as it is using tristimulus values of a "Standard
Observer" and TV transmission transforms.  But it also brings
up the old fast color to gray algorithm, Gray = Green.  Remember
that one from "the good old days"?

   In practice, the N.T.S.C. YIQ intensity transformation works
well on most common images.  YMMV.

Regards,

Steve N.

Grincheux

Quote
the _lpBmpData argument is a pointer to the image data (after the header)
it will be fastest if this address is 4-aligned

The BITMAPINFO structure is 44 bytes long, so I suppose it is aligned.
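
The 44-byte figure can be confirmed with a Python ctypes sketch that mirrors the Windows structure layouts. The field names follow the Windows headers; the check on 16-byte alignment is an extra observation of this sketch, relevant only if aligned SSE loads such as movaps are used directly on the pixel data.

```python
import ctypes

class BITMAPINFOHEADER(ctypes.Structure):
    _fields_ = [
        ("biSize", ctypes.c_uint32),
        ("biWidth", ctypes.c_int32),
        ("biHeight", ctypes.c_int32),
        ("biPlanes", ctypes.c_uint16),
        ("biBitCount", ctypes.c_uint16),
        ("biCompression", ctypes.c_uint32),
        ("biSizeImage", ctypes.c_uint32),
        ("biXPelsPerMeter", ctypes.c_int32),
        ("biYPelsPerMeter", ctypes.c_int32),
        ("biClrUsed", ctypes.c_uint32),
        ("biClrImportant", ctypes.c_uint32),
    ]

class RGBQUAD(ctypes.Structure):
    _fields_ = [
        ("rgbBlue", ctypes.c_uint8),
        ("rgbGreen", ctypes.c_uint8),
        ("rgbRed", ctypes.c_uint8),
        ("rgbReserved", ctypes.c_uint8),
    ]

class BITMAPINFO(ctypes.Structure):
    _fields_ = [("bmiHeader", BITMAPINFOHEADER), ("bmiColors", RGBQUAD * 1)]

header_size = ctypes.sizeof(BITMAPINFO)   # 40-byte header + one 4-byte RGBQUAD
offset_mod_4 = header_size % 4            # 0: pixel data stays dword-aligned
offset_mod_16 = header_size % 16          # 12: not 16-aligned after the header
```

So if the buffer itself starts on a 4-byte boundary, the pixels after the header are 4-aligned. They are not 16-aligned even when the buffer is, so an SSE version would need unaligned loads (movups) or a different buffer layout.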


.Data
ALIGN 4

TabDiv3 Byte 0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5
Byte 5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10
Byte 10,11,11,11,12,12,12,13,13,13,14,14,14,15,15,15
Byte 16,16,16,17,17,17,18,18,18,19,19,19,20,20,20,21
Byte 21,21,22,22,22,23,23,23,24,24,24,25,25,25,26,26
Byte 26,27,27,27,28,28,28,29,29,29,30,30,30,31,31,31
Byte 32,32,32,33,33,33,34,34,34,35,35,35,36,36,36,37
Byte 37,37,38,38,38,39,39,39,40,40,40,41,41,41,42,42
Byte 42,43,43,43,44,44,44,45,45,45,46,46,46,47,47,47
Byte 48,48,48,49,49,49,50,50,50,51,51,51,52,52,52,53
Byte 53,53,54,54,54,55,55,55,56,56,56,57,57,57,58,58
Byte 58,59,59,59,60,60,60,61,61,61,62,62,62,63,63,63
Byte 64,64,64,65,65,65,66,66,66,67,67,67,68,68,68,69
Byte 69,69,70,70,70,71,71,71,72,72,72,73,73,73,74,74
Byte 74,75,75,75,76,76,76,77,77,77,78,78,78,79,79,79
Byte 80,80,80,81,81,81,82,82,82,83,83,83,84,84,84,85
Byte 85,85,86,86,86,87,87,87,88,88,88,89,89,89,90,90
Byte 90,91,91,91,92,92,92,93,93,93,94,94,94,95,95,95
Byte 96,96,96,97,97,97,98,98,98,99,99,99,100,100,100,101
Byte 101,101,102,102,102,103,103,103,104,104,104,105,105,105,106,106
Byte 106,107,107,107,108,108,108,109,109,109,110,110,110,111,111,111
Byte 112,112,112,113,113,113,114,114,114,115,115,115,116,116,116,117
Byte 117,117,118,118,118,119,119,119,120,120,120,121,121,121,122,122
Byte 122,123,123,123,124,124,124,125,125,125,126,126,126,127,127,127
Byte 128,128,128,129,129,129,130,130,130,131,131,131,132,132,132,133
Byte 133,133,134,134,134,135,135,135,136,136,136,137,137,137,138,138
Byte 138,139,139,139,140,140,140,141,141,141,142,142,142,143,143,143
Byte 144,144,144,145,145,145,146,146,146,147,147,147,148,148,148,149
Byte 149,149,150,150,150,151,151,151,152,152,152,153,153,153,154,154
Byte 154,155,155,155,156,156,156,157,157,157,158,158,158,159,159,159
Byte 160,160,160,161,161,161,162,162,162,163,163,163,164,164,164,165
Byte 165,165,166,166,166,167,167,167,168,168,168,169,169,169,170,170
Byte 170,171,171,171,172,172,172,173,173,173,174,174,174,175,175,175
Byte 176,176,176,177,177,177,178,178,178,179,179,179,180,180,180,181
Byte 181,181,182,182,182,183,183,183,184,184,184,185,185,185,186,186
Byte 186,187,187,187,188,188,188,189,189,189,190,190,190,191,191,191
Byte 192,192,192,193,193,193,194,194,194,195,195,195,196,196,196,197
Byte 197,197,198,198,198,199,199,199,200,200,200,201,201,201,202,202
Byte 202,203,203,203,204,204,204,205,205,205,206,206,206,207,207,207
Byte 208,208,208,209,209,209,210,210,210,211,211,211,212,212,212,213
Byte 213,213,214,214,214,215,215,215,216,216,216,217,217,217,218,218
Byte 218,219,219,219,220,220,220,221,221,221,222,222,222,223,223,223
Byte 224,224,224,225,225,225,226,226,226,227,227,227,228,228,228,229
Byte 229,229,230,230,230,231,231,231,232,232,232,233,233,233,234,234
Byte 234,235,235,235,236,236,236,237,237,237,238,238,238,239,239,239
Byte 240,240,240,241,241,241,242,242,242,243,243,243,244,244,244,245
Byte 245,245,246,246,246,247,247,247,248,248,248,249,249,249,250,250
Byte 250,251,251,251,252,252,252,253,253,253,254,254,254,255,255,255

ALIGN 4

TabRed Byte 0,0,0,0,1,1,1,2,2,2,2,3,3,3,4,4
Byte 4,5,5,5,5,6,6,6,7,7,7,8,8,8,8,9
Byte 9,9,10,10,10,11,11,11,11,12,12,12,13,13,13,14
Byte 14,14,14,15,15,15,16,16,16,17,17,17,17,18,18,18
Byte 19,19,19,20,20,20,20,21,21,21,22,22,22,23,23,23
Byte 23,24,24,24,25,25,25,25,26,26,26,27,27,27,28,28
Byte 28,28,29,29,29,30,30,30,31,31,31,31,32,32,32,33
Byte 33,33,34,34,34,34,35,35,35,36,36,36,37,37,37,37
Byte 38,38,38,39,39,39,40,40,40,40,41,41,41,42,42,42
Byte 43,43,43,43,44,44,44,45,45,45,46,46,46,46,47,47
Byte 47,48,48,48,49,49,49,49,50,50,50,51,51,51,51,52
Byte 52,52,53,53,53,54,54,54,54,55,55,55,56,56,56,57
Byte 57,57,57,58,58,58,59,59,59,60,60,60,60,61,61,61
Byte 62,62,62,63,63,63,63,64,64,64,65,65,65,66,66,66
Byte 66,67,67,67,68,68,68,69,69,69,69,70,70,70,71,71
Byte 71,72,72,72,72,73,73,73,74,74,74,75,75,75,75,76

ALIGN 4

TabGreen Byte 2,2,3,3,3,4,4,4,4,4,4,5,5,5,5,5
Byte 5,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8
Byte 9,9,9,9,9,9,10,10,10,11,11,11,11,11,11,12
Byte 12,12,12,12,12,13,13,13,14,14,14,14,14,14,15,15
Byte 15,15,15,15,16,16,16,17,17,17,17,17,17,18,18,18
Byte 18,18,18,19,19,19,19,19,19,20,20,20,21,21,21,21
Byte 21,21,22,22,22,22,22,22,23,23,23,24,24,24,24,24
Byte 24,25,25,25,25,25,25,26,26,26,26,26,26,27,27,27
Byte 28,28,28,28,28,28,29,29,29,29,29,29,30,30,30,31
Byte 31,31,31,31,31,32,32,32,32,32,32,33,33,33,34,34
Byte 34,34,34,34,35,35,35,35,35,35,36,36,36,36,36,36
Byte 37,37,37,38,38,38,38,38,38,39,39,39,39,39,39,40
Byte 40,40,41,41,41,41,41,41,42,42,42,42,42,42,43,43
Byte 43,44,44,44,44,44,44,45,45,45,45,45,45,46,46,46
Byte 46,46,46,47,47,47,48,48,48,48,48,48,49,49,49,49
Byte 49,49,50,50,50,51,51,51,51,51,51,52,52,52,52,52

ALIGN 4

TabBlue Byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
Byte 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Byte 1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2
Byte 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
Byte 2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3
Byte 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
Byte 3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4
Byte 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
Byte 4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5
Byte 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
Byte 5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6
Byte 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6
Byte 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
Byte 7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8
Byte 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8
Byte 8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9

.Code

; ==================================================================================
ALIGN 16
; ==================================================================================

Effect_Grey_601 PROC __lpImageBuffer:LPBYTE

push edi

mov edi,__lpImageBuffer
mov ecx,[edi].BITMAPINFO.bmiHeader.biSizeImage
add edi,SIZEOF BITMAPINFO
jmp @Loop

; **********************************************************************************
ALIGN 16
; **********************************************************************************

@Loop :

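mov eax,[edi]                            ; load one pixel
push eax                                 ; keep a copy for the third channel
movzx edx,ah                             ; EDX = byte 1 (green)
movzx eax,al                             ; EAX = byte 0 (blue)
mov dl,Byte Ptr [OFFSET TabGreen + edx]  ; weighted green
mov al,Byte Ptr [OFFSET TabBlue + eax]   ; weighted blue
add eax,edx
pop edx
shr edx,16
movzx edx,dl                             ; EDX = byte 2 (red)
mov dl,Byte Ptr [OFFSET TabRed + edx]    ; weighted red
add eax,edx                              ; EAX = grey level
movzx eax,al

mov ah,al                                ; replicate the grey byte
shl eax,8
mov al,ah                                ; EAX = 00YYYYYYh

mov [edi],eax

add edi,SIZEOF DWord
sub ecx,SIZEOF DWord
jnz @Loop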

pop edi

ret
Effect_Grey_601 ENDP


TabBlue, TabGreen and TabRed hold the precomputed products of each channel value and its coefficient.

Thanks everyone for your help. I have learned a lot.
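
Lookup tables like the ones above can be generated rather than typed by hand. The Python sketch below assumes the tables are meant to hold floor(i * weight) for the 0.299 / 0.587 / 0.114 coefficients; that matches the first row and the last entry (76) of the posted TabRed. The posted TabGreen and TabBlue end at 52 and 9, whereas these weights would give 149 and 29, so those two may be worth regenerating. The helper names are made up for the illustration.

```python
def make_table(weight_thousandths, size=256):
    """floor(i * weight) for i in 0..size-1, using integer arithmetic."""
    return [(i * weight_thousandths) // 1000 for i in range(size)]

tab_red = make_table(299)
tab_green = make_table(587)
tab_blue = make_table(114)
tab_div3 = [i // 3 for i in range(768)]   # same shape as TabDiv3

def masm_rows(name, table, per_row=16):
    """Format a table as MASM Byte lines, 16 values per row."""
    rows = []
    for start in range(0, len(table), per_row):
        values = ",".join(str(v) for v in table[start:start + per_row])
        prefix = name + " Byte " if start == 0 else "Byte "
        rows.append(prefix + values)
    return rows
```

Because each entry is truncated, white maps to 76 + 149 + 29 = 254 with these tables; rounding the entries instead of truncating is one way to reach 255.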

Siekmanski

Dave...

I agree that the " (R * 0.299) + (G * 0.587) + (B * 0.114) " is the best routine for gray-scale conversion.  ;)
I just summed up the other methods I've heard of.

Grincheux asked: What are the MMX or SSE or xxx that I could use.
So I translated the average method routine he posted to SSE.

Grincheux

Super. That's what I expected.
Thank you Siekmanski  :eusa_boohoo:.