I would like a lesson

Grincheux · December 08, 2015, 05:18:30 AM

Just title!

I am coding a program that modifies colors in an image.

xRGB - 32 bits

Which MMX, SSE or other instructions could I use?
I don't know these instructions; perhaps some of them could help me make it faster.

Code Select


						jmp		@Loop

;	**********************************************************************************
						ALIGN	16
;	**********************************************************************************

@Loop :

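						mov		eax,[edi]			; load one xRGB pixel
						and		eax,00ffffffh		; drop the unused top byte
						mov		edx,eax
						mov		ebx,eax
						shr		edx,8				; second byte -> DL
						shr		ebx,16				; third byte -> BL
						and		eax,000000ffh		; isolate each channel
						and		edx,000000ffh
						and		ebx,000000ffh
						add		eax,ebx
						add		eax,edx				; EAX = sum of the three channels (0..765)
						xor		edx,edx				; clear DX for the 16-bit divide
						mov		ebx,3
						div		bx					; AL = sum / 3
						mov		ah,al
						shl		eax,8
						mov		al,ah				; EAX = 00 AV AV AV
						mov		[edi],eax
						add		edi,SIZEOF DWord	; next pixel
						sub		ecx,SIZEOF DWord	; ECX counts the remaining bytes
						jnz		@Loop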

						ret
Effect_Grey_Minimum		ENDP

This is an example of what I do.
This is a grey conversion based on the average of RED / GREEN / BLUE.
I would like to optimize it.

JJ2007 vexed me with my very slow program. I laugh.

jj2007 · December 08, 2015, 06:01:40 AM

Quote from: Grincheux on December 08, 2015, 05:18:30 AM
JJ2007 vexed me with my very slow program. I laugh.

Sorry, no bad intentions :P

Have a look specifically at
pshuf*
andps
movlps, movups
pcmpeqb

dedndave · December 08, 2015, 06:35:16 AM

a little work on your algorithm is probably in order

avoid the use of DIV when possible
in the case of division by 3, it is easily avoided
you can use "multiply-to-divide" and it will be much faster
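
A minimal C sketch of the multiply-to-divide idea for the divide-by-3 case, assuming the dividend is a channel sum in the range 0..765. The function name and the constant 43691 (2^17 / 3, rounded up) are illustrative choices, not taken from the thread; in assembly the same thing is one multiply followed by one shift.

```c
#include <stdint.h>

/* Exact floor(sum / 3) for sum in 0..765 (the range of R + G + B),
   using a multiply and a shift instead of DIV.
   43691 = ceil(2^17 / 3); the name and constant are illustrative. */
uint32_t div3_by_mul(uint32_t sum)
{
    return (sum * 43691u) >> 17;
}
```

For sums in that range the result matches sum / 3 exactly, e.g. 765 gives 255 and 764 gives 254.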

however, the gray-scale conversion can be performed another way...

Code Select

Y = .587G + .299R + .114B

so, select scaling constants such that a white pixel (all three bytes 0FFh) gives a total of 0FF_FFFFFFFFh, and accumulate the products in EDX:EAX
when you are done, DL will have the luminance byte value without division

Code Select

Y = (LG + MR + NB) / 4294967296
L + M + N = FFFFFFFFFF / FF = 4311810305
L = .587 * 4311810305 = 2531032649
M = .299 * 4311810305 = 1289231281
N = .114 * 4311810305 = 491546375

so....

Code Select

MOVZX EAX,byte ptr [blue]
MOV ECX,491546375
MUL ECX                     ;EDX:EAX = N * blue
MOV ESI,EAX
MOV EDI,EDX

MOVZX EAX,byte ptr [red]
MOV ECX,1289231281
MUL ECX                     ;EDX:EAX = M * red
ADD ESI,EAX
ADC EDI,EDX

MOVZX EAX,byte ptr [green]
MOV ECX,2531032649
MUL ECX                     ;EDX:EAX = L * green
ADD EAX,ESI
ADC EDX,EDI

;DL = luminance byte

now, you could do this...

Code Select

MOV byte ptr [blue],DL
MOV byte ptr [green],DL
MOV byte ptr [red],DL

but, faster, MUL DL by 10101h and write all 3 bytes as a dword
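
The arithmetic above can be modelled in C to check the constants before committing to assembly. This is only a sketch: the names K_GREEN, K_RED, K_BLUE, luma_fixed and gray_dword are made up for the illustration, and a 64-bit accumulator stands in for the EDX:EAX register pair.

```c
#include <stdint.h>

/* Scaling constants from the post: 0.587, 0.299 and 0.114 of 4311810305. */
#define K_GREEN 2531032649ull
#define K_RED   1289231281ull
#define K_BLUE   491546375ull

/* Luminance byte = high 32 bits of the 64-bit weighted sum
   (what ends up in EDX, and therefore DL, in the assembly). */
uint8_t luma_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    uint64_t acc = K_GREEN * g + K_RED * r + K_BLUE * b;
    return (uint8_t)(acc >> 32);
}

/* Copy the luminance byte into the three low bytes of a dword
   with a single multiply by 10101h. */
uint32_t gray_dword(uint8_t y)
{
    return (uint32_t)y * 0x10101u;
}
```

With these constants a white pixel gives 255, and pure red, green and blue give 76, 150 and 29.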

Grincheux · December 08, 2015, 06:40:42 AM

For grey conversion there is your formula, which I know, but there are three others:

Minimum of all the components
Maximum
Average

They all give grey.

dedndave · December 08, 2015, 06:44:47 AM

that's a poor shortcut

use the equation i posted - lol

Grincheux · December 08, 2015, 06:49:24 AM

JJ2007 thanks for your docs but I already have it.
I have searched with Google, but the results just say what each instruction does. One line per instruction!
Many instructions use floating point, but what I have is integers. Is that a problem?
https://www.csie.ntu.edu.tw/~cyy/courses/assembly/docs/ch11_MMX.pdf This is the best I have found.
Many use gas syntax.

jj2007 · December 08, 2015, 07:27:53 AM

Quote from: Grincheux on December 08, 2015, 06:49:24 AM
JJ2007 thanks for your docs but I already have it.

It's the best I found so far. The original Intel manuals are too big and detailed for my taste.

Quote
Many instructions use floating point, but what I have is integers. Is that a problem?

No. SSE instructions that load, store or perform bitwise operations do not distinguish between float, double or integer data.
Btw, avoid MMX and go for XMM: more bits and no conflict with the FPU.

Grincheux · December 08, 2015, 07:31:21 AM

For dividing, I know that it is very slow. I suggested another way (http://masm32.com/board/index.php?topic=4886.0).
Many people have a formula for grey conversion. It is difficult to choose.

Grey = 0.2125 Red + 0.7154 Green + 0.0721 Blue
Grey = 0.299 Red + 0.587 Green + 0.114 Blue

A site that can help with research: http://www.tutorialspoint.com/dip/pdf/Gray_Level_Transformations.pdf

Siekmanski · December 08, 2015, 12:21:01 PM

Hi Grincheux,

Here are 4 methods:

1) The lightness method:
Averages the most prominent and least prominent colors: (max(R, G, B) + min(R, G, B)) / 2

2) The average method:
Averages the values: (R + G + B) / 3

3) Colorimetric method:
It forms a weighted average to account for human perception.
Luminosity is: (R * 0.2126) + (G * 0.7152) + (B * 0.0722)

4) Luma coding method:
This is how we perceive colors from TVs, monitors, etc.
Luma is: (R * 0.299) + (G * 0.587) + (B * 0.114)
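
As a rough per-pixel reference, the four formulas can be written out in C like this. The function names are hypothetical, and rounding the two weighted sums to the nearest integer is an assumption (the list above does not say how to round).

```c
#include <stdint.h>

static uint8_t max3(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t m = a > b ? a : b;
    return m > c ? m : c;
}

static uint8_t min3(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* 1) lightness: mean of the largest and smallest channel */
uint8_t gray_lightness(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((max3(r, g, b) + min3(r, g, b)) / 2);
}

/* 2) average: plain mean of the three channels (integer division) */
uint8_t gray_average(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r + g + b) / 3);
}

/* 3) colorimetric weights, rounded to nearest (rounding is assumed) */
uint8_t gray_luminosity(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(0.2126 * r + 0.7152 * g + 0.0722 * b + 0.5);
}

/* 4) luma weights, rounded to nearest (rounding is assumed) */
uint8_t gray_luma(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(0.299 * r + 0.587 * g + 0.114 * b + 0.5);
}
```

For the pixel R=255, G=123, B=11 the four functions return 133, 129, 143 and 150, which shows how far the methods can drift apart on a saturated colour.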

This is the SSE version of the average method.
It processes four 32 bit pixels at once.

EDIT: inserted roundup code to prevent overflow.

Code Select


.data
align 16
div3 equ 256/3
MagicDiv3   dd div3,div3,div3,div3
ByteMask    dd 000000ffh,000000ffh,000000ffh,000000ffh
RoundUp     dd 3,3,3,3

rgb_data    db 255,123,11,0, 210,33,77,0, 239,111,178,0, 99,88,22,0
Average     dd 0,0,0,0

.code
    movaps  xmm0,oword ptr rgb_data
    movaps  xmm1,oword ptr ByteMask 
    movaps  xmm2,xmm0   ; __ B3 G3 R3,  __ B2 G2 R2,  __ B1 G1 R1,  __ B0 G0 R0 
    psrld   xmm0,8      ; __ __ B3 G3,  __ __ B2 G2,  __ __ B1 G1,  __ __ B0 G0 
    movaps  xmm3,xmm0   ; __ __ B3 G3,  __ __ B2 G2,  __ __ B1 G1,  __ __ B0 G0 
    psrld   xmm0,8      ; __ __ __ B3,  __ __ __ B2,  __ __ __ B1,  __ __ __ B0 
    pand    xmm2,xmm1   ; __ __ __ R3,  __ __ __ R2,  __ __ __ R1,  __ __ __ R0 
    pand    xmm3,xmm1   ; __ __ __ G3,  __ __ __ G2,  __ __ __ G1,  __ __ __ G0 
    pand    xmm0,xmm1   ; __ __ __ B3,  __ __ __ B2,  __ __ __ B1,  __ __ __ B0 
    paddd   xmm0,xmm3   ; B + G
    paddd   xmm0,xmm2   ; B + G + R
    paddd   xmm0,oword ptr RoundUp
    pmullw  xmm0,oword ptr MagicDiv3
    psrld   xmm0,8      ; __ __ __ A3,  __ __ __ A2,  __ __ __ A1,  __ __ __ A0
    movaps  xmm1,xmm0
    movaps  xmm2,xmm0
    pslld   xmm1,8      ; __ __ A3 __,  __ __ A2 __,  __ __ A1 __,  __ __ A0 __
    pslld   xmm2,16     ; __ A3 __ __,  __ A2 __ __,  __ A1 __ __,  __ A0 __ __
    pxor    xmm0,xmm1   ; __ __ A3 A3,  __ __ A2 A2,  __ __ A1 A1,  __ __ A0 A0
    pxor    xmm0,xmm2   ; __ A3 A3 A3,  __ A2 A2 A2,  __ A1 A1 A1,  __ A0 A0 A0
    movaps  oword ptr Average,xmm0 ; A0=130 A1=107 A2=176 A3=70

Marinus
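
To check the lane arithmetic of the SSE code without a debugger, one 32-bit lane can be modelled in scalar C. This is a sketch only: avg_lane and gray_lane are invented names, and the mask to 16 bits mimics pmullw keeping the low word of each product.

```c
#include <stdint.h>

/* One lane of the SSE sequence: add RoundUp (3), multiply by
   MagicDiv3 (256/3 = 85) keeping the low 16 bits, shift right by 8. */
uint32_t avg_lane(uint8_t c0, uint8_t c1, uint8_t c2)
{
    uint32_t sum  = (uint32_t)c0 + c1 + c2 + 3u;
    uint32_t prod = (sum * 85u) & 0xFFFFu;   /* pmullw: low word only */
    return prod >> 8;                        /* psrld 8 */
}

/* Replicate the average into the three low bytes (pslld + pxor steps). */
uint32_t gray_lane(uint32_t a)
{
    return a ^ (a << 8) ^ (a << 16);
}
```

For the four sample pixels in rgb_data this gives 130, 107, 176 and 70 in memory order. Because 85/256 is slightly below 1/3 and 3 is added first, the result is either the exact sum / 3 or one more (a sum of 1 gives 1, for instance).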

dedndave · December 08, 2015, 11:46:28 PM

Marinus...

1) this method makes no sense for individual-pixel conversion :redface:

2) this method is a poor shortcut - simpler, yes - faster, probably not

3) this method is a color-contrast equation, used primarily for contrasting text comparisons

4) this is the proper method for converting to gray-shades
it is how we perceive light (not just from TVs, research human eye, rods and cones)
it is used for all sorts of things, not just simple image conversion (astronomy, facial recognition, bio-medical, etc)

no matter which method you use, an image can be created which will be totally ambiguous when converted
in other words, the 3 image colors may contrast,
but when converted to gray-shades, the entire image is a single shade
over a wide variety of images, the latter method will perform the best

dedndave · December 09, 2015, 12:20:10 AM

give this a try
you won't be disappointed with speed or results

the _lpBmpData argument is a pointer to the image data (after the header)
it will be fastest if this address is 4-aligned

Code Select

Bmp2Gray32 PROTO :DWORD,:DWORD,:LPVOID

Code Select

Bmp2Gray32 PROC USES EBX ESI EDI _dwWidth:DWORD,_dwHeight:DWORD,_lpBmpData:LPVOID

    mov     eax,_dwWidth
    push    ebp
    mul     _dwHeight
    mov     ebx,_lpBmpData
    xchg    eax,ebp
    .repeat
        movzx   eax,byte ptr [ebx]           ;EAX = blue
        mov     ecx,491546375
        mul     ecx
        xchg    eax,esi
        mov     edi,edx
        movzx   eax,byte ptr [ebx+1]         ;EAX = green
        mov     ecx,2531032649
        mul     ecx
        add     esi,eax
        adc     edi,edx
        movzx   eax,byte ptr [ebx+2]         ;EAX = red
        mov     ecx,1289231281
        mul     ecx
        add     eax,esi
        adc     edx,edi
        mov     eax,10101h
        mul     edx
        or      eax,0FF000000h               ;opacity = 255
        mov     [ebx],eax
        dec     ebp
        lea     ebx,[ebx+4]
    .until ZERO?
    pop     ebp
    ret

Bmp2Gray32 ENDP
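
For comparing results, here is a plain C model of the same loop. It is a sketch under the assumptions visible in the assembly: pixels are stored as blue, green, red and one unused byte, and the buffer holds width * height of them. bmp_to_gray32 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* In-place gray conversion of width * height 32-bit pixels (B, G, R, X). */
void bmp_to_gray32(uint32_t width, uint32_t height, uint8_t *data)
{
    size_t count = (size_t)width * height;

    for (size_t i = 0; i < count; ++i, data += 4) {
        uint64_t acc = 491546375ull  * data[0]     /* blue  */
                     + 2531032649ull * data[1]     /* green */
                     + 1289231281ull * data[2];    /* red   */
        uint8_t y = (uint8_t)(acc >> 32);          /* high dword = luminance */

        data[0] = data[1] = data[2] = y;           /* same as y * 10101h */
        data[3] = 0xFF;                            /* opacity = 255 */
    }
}
```

Running it on a copy of the image and diffing against the assembly output is a quick way to catch a swapped channel or a wrong constant.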

FORTRANS · December 09, 2015, 01:36:36 AM

Hi Dave,

Quote from: dedndave on December 08, 2015, 11:46:28 PM
1) this method makes no sense for individual-pixel conversion

At first glance, I would tend to agree. I may try it out though,
if I get curious.

Quote
2) this method is a poor shortcut - simpler, yes - faster, probably not

But it is commonly used.

Quote
3) this method is a color-contrast equation, used primarily for contrasting text comparisons

Based on the reference I use, this is closest to using S.M.P.T.E.
RGB primaries and YIQ or YUV intensity encoding. The numbers
in his table are 0.212, 0.701, and 0.087. These are close to the
numbers they posted. Essentially an update of the N.T.S.C. values
to more modern phosphors (and so forth).

Quote
4) this is the proper method for converting to gray-shades
it is how we perceive light (not just from TVs, research human eye, rods and cones)
it is used for all sorts of things, not just simple image conversion (astronomy, facial recognition, bio-medical, etc)

This is using N.T.S.C. RGB primaries and the YIQ intensity channel.
Often used, but uses old phosphor emission curves. This was
developed for color television transmission and reception in the U.S..

If I look up in the table the C.I.E. colorimetric (actually "Spectral
Primary Color Coordinate System") RGB primaries and its YIQ
transformation, its factors are 0.177, 0.813, and 0.011.
This is a bit odd, as it is using tristimulus values of a "Standard
Observer" and TV transmission transforms. But it also brings
up the old fast color to gray algorithm, Gray = Green. Remember
that one from "the good old days"?

In practice, the N.T.S.C. YIQ intensity transformation works
well on most common images. YMMV.

Regards,

Steve N.

Grincheux · December 09, 2015, 02:42:44 AM

Quote
the _lpBmpData argument is a pointer to the image data (after the header)
it will be fastest if this address is 4-aligned

The BITMAPINFO structure is 44 bytes long, so I suppose it is aligned.

Code Select


.Data
						ALIGN	4

TabDiv3					Byte	0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5
						Byte	5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10
						Byte	10,11,11,11,12,12,12,13,13,13,14,14,14,15,15,15
						Byte	16,16,16,17,17,17,18,18,18,19,19,19,20,20,20,21
						Byte	21,21,22,22,22,23,23,23,24,24,24,25,25,25,26,26
						Byte	26,27,27,27,28,28,28,29,29,29,30,30,30,31,31,31
						Byte	32,32,32,33,33,33,34,34,34,35,35,35,36,36,36,37
						Byte	37,37,38,38,38,39,39,39,40,40,40,41,41,41,42,42
						Byte	42,43,43,43,44,44,44,45,45,45,46,46,46,47,47,47
						Byte	48,48,48,49,49,49,50,50,50,51,51,51,52,52,52,53
						Byte	53,53,54,54,54,55,55,55,56,56,56,57,57,57,58,58
						Byte	58,59,59,59,60,60,60,61,61,61,62,62,62,63,63,63
						Byte	64,64,64,65,65,65,66,66,66,67,67,67,68,68,68,69
						Byte	69,69,70,70,70,71,71,71,72,72,72,73,73,73,74,74
						Byte	74,75,75,75,76,76,76,77,77,77,78,78,78,79,79,79
						Byte	80,80,80,81,81,81,82,82,82,83,83,83,84,84,84,85
						Byte	85,85,86,86,86,87,87,87,88,88,88,89,89,89,90,90
						Byte	90,91,91,91,92,92,92,93,93,93,94,94,94,95,95,95
						Byte	96,96,96,97,97,97,98,98,98,99,99,99,100,100,100,101
						Byte	101,101,102,102,102,103,103,103,104,104,104,105,105,105,106,106
						Byte	106,107,107,107,108,108,108,109,109,109,110,110,110,111,111,111
						Byte	112,112,112,113,113,113,114,114,114,115,115,115,116,116,116,117
						Byte	117,117,118,118,118,119,119,119,120,120,120,121,121,121,122,122
						Byte	122,123,123,123,124,124,124,125,125,125,126,126,126,127,127,127
						Byte	128,128,128,129,129,129,130,130,130,131,131,131,132,132,132,133
						Byte	133,133,134,134,134,135,135,135,136,136,136,137,137,137,138,138
						Byte	138,139,139,139,140,140,140,141,141,141,142,142,142,143,143,143
						Byte	144,144,144,145,145,145,146,146,146,147,147,147,148,148,148,149
						Byte	149,149,150,150,150,151,151,151,152,152,152,153,153,153,154,154
						Byte	154,155,155,155,156,156,156,157,157,157,158,158,158,159,159,159
						Byte	160,160,160,161,161,161,162,162,162,163,163,163,164,164,164,165
						Byte	165,165,166,166,166,167,167,167,168,168,168,169,169,169,170,170
						Byte	170,171,171,171,172,172,172,173,173,173,174,174,174,175,175,175
						Byte	176,176,176,177,177,177,178,178,178,179,179,179,180,180,180,181
						Byte	181,181,182,182,182,183,183,183,184,184,184,185,185,185,186,186
						Byte	186,187,187,187,188,188,188,189,189,189,190,190,190,191,191,191
						Byte	192,192,192,193,193,193,194,194,194,195,195,195,196,196,196,197
						Byte	197,197,198,198,198,199,199,199,200,200,200,201,201,201,202,202
						Byte	202,203,203,203,204,204,204,205,205,205,206,206,206,207,207,207
						Byte	208,208,208,209,209,209,210,210,210,211,211,211,212,212,212,213
						Byte	213,213,214,214,214,215,215,215,216,216,216,217,217,217,218,218
						Byte	218,219,219,219,220,220,220,221,221,221,222,222,222,223,223,223
						Byte	224,224,224,225,225,225,226,226,226,227,227,227,228,228,228,229
						Byte	229,229,230,230,230,231,231,231,232,232,232,233,233,233,234,234
						Byte	234,235,235,235,236,236,236,237,237,237,238,238,238,239,239,239
						Byte	240,240,240,241,241,241,242,242,242,243,243,243,244,244,244,245
						Byte	245,245,246,246,246,247,247,247,248,248,248,249,249,249,250,250
						Byte	250,251,251,251,252,252,252,253,253,253,254,254,254,255,255,255

						ALIGN	4

TabRed					Byte	0,0,0,0,1,1,1,2,2,2,2,3,3,3,4,4
						Byte	4,5,5,5,5,6,6,6,7,7,7,8,8,8,8,9
						Byte	9,9,10,10,10,11,11,11,11,12,12,12,13,13,13,14
						Byte	14,14,14,15,15,15,16,16,16,17,17,17,17,18,18,18
						Byte	19,19,19,20,20,20,20,21,21,21,22,22,22,23,23,23
						Byte	23,24,24,24,25,25,25,25,26,26,26,27,27,27,28,28
						Byte	28,28,29,29,29,30,30,30,31,31,31,31,32,32,32,33
						Byte	33,33,34,34,34,34,35,35,35,36,36,36,37,37,37,37
						Byte	38,38,38,39,39,39,40,40,40,40,41,41,41,42,42,42
						Byte	43,43,43,43,44,44,44,45,45,45,46,46,46,46,47,47
						Byte	47,48,48,48,49,49,49,49,50,50,50,51,51,51,51,52
						Byte	52,52,53,53,53,54,54,54,54,55,55,55,56,56,56,57
						Byte	57,57,57,58,58,58,59,59,59,60,60,60,60,61,61,61
						Byte	62,62,62,63,63,63,63,64,64,64,65,65,65,66,66,66
						Byte	66,67,67,67,68,68,68,69,69,69,69,70,70,70,71,71
						Byte	71,72,72,72,72,73,73,73,74,74,74,75,75,75,75,76

						ALIGN	4

TabGreen				Byte	2,2,3,3,3,4,4,4,4,4,4,5,5,5,5,5
						Byte	5,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8
						Byte	9,9,9,9,9,9,10,10,10,11,11,11,11,11,11,12
						Byte	12,12,12,12,12,13,13,13,14,14,14,14,14,14,15,15
						Byte	15,15,15,15,16,16,16,17,17,17,17,17,17,18,18,18
						Byte	18,18,18,19,19,19,19,19,19,20,20,20,21,21,21,21
						Byte	21,21,22,22,22,22,22,22,23,23,23,24,24,24,24,24
						Byte	24,25,25,25,25,25,25,26,26,26,26,26,26,27,27,27
						Byte	28,28,28,28,28,28,29,29,29,29,29,29,30,30,30,31
						Byte	31,31,31,31,31,32,32,32,32,32,32,33,33,33,34,34
						Byte	34,34,34,34,35,35,35,35,35,35,36,36,36,36,36,36
						Byte	37,37,37,38,38,38,38,38,38,39,39,39,39,39,39,40
						Byte	40,40,41,41,41,41,41,41,42,42,42,42,42,42,43,43
						Byte	43,44,44,44,44,44,44,45,45,45,45,45,45,46,46,46
						Byte	46,46,46,47,47,47,48,48,48,48,48,48,49,49,49,49
						Byte	49,49,50,50,50,51,51,51,51,51,51,52,52,52,52,52

						ALIGN	4

TabBlue					Byte	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
						Byte	1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
						Byte	1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2
						Byte	2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
						Byte	2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3
						Byte	3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
						Byte	3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4
						Byte	4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
						Byte	4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5
						Byte	5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
						Byte	5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6
						Byte	6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6
						Byte	7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
						Byte	7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8
						Byte	8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8
						Byte	8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9

.Code

;	==================================================================================
						ALIGN	16
;	==================================================================================

Effect_Grey_601			PROC	__lpImageBuffer:LPBYTE

						push	edi

						mov		edi,__lpImageBuffer
						mov		ecx,[edi].BITMAPINFO.bmiHeader.biSizeImage
						add		edi,SIZEOF BITMAPINFO
						jmp		@Loop

;	**********************************************************************************
						ALIGN	16
;	**********************************************************************************

@Loop :

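						mov		eax,[edi]							; load one xRGB pixel
						push	eax									; keep a copy for the red byte
						movzx	edx,ah								; EDX = green
						movzx	eax,al								; EAX = blue
						mov		dl,Byte Ptr [OFFSET TabGreen + edx]	; weighted green
						mov		al,Byte Ptr [OFFSET TabBlue + eax]	; weighted blue
						add		eax,edx
						pop		edx
						shr		edx,16
						movzx	edx,dl								; EDX = red
						mov		dl,Byte Ptr [OFFSET TabRed + edx]	; weighted red
						add		eax,edx
						movzx	eax,al								; EAX = grey value

						mov		ah,al
						shl		eax,8
						mov		al,ah								; EAX = 00 YY YY YY

						mov		[edi],eax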

						add		edi,SIZEOF DWord
						sub		ecx,SIZEOF DWord
						jnz		@Loop

						pop		edi

						ret
Effect_Grey_601			ENDP

TabBlue, TabGreen and TabRed are lookup tables holding each channel value already multiplied by its coefficient.
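
Tables like these can be generated rather than typed in. The sketch below fills three 256-entry tables with floor(i * coefficient) for the 0.299 / 0.587 / 0.114 weights; the names tab_red, tab_green, tab_blue, build_luma_tables and luma_lookup are illustrative. With those weights the last entries come out as 76, 149 and 29, whereas the TabGreen and TabBlue listed above end at 52 and 9, so those two tables may have been built with different factors.

```c
#include <stdint.h>

static uint8_t tab_red[256], tab_green[256], tab_blue[256];

/* Fill each table with floor(i * weight); weights are given in
   thousandths so that only integer arithmetic is needed. */
void build_luma_tables(void)
{
    for (int i = 0; i < 256; ++i) {
        tab_red[i]   = (uint8_t)(i * 299 / 1000);
        tab_green[i] = (uint8_t)(i * 587 / 1000);
        tab_blue[i]  = (uint8_t)(i * 114 / 1000);
    }
}

/* Grey value as the sum of three table lookups, no multiply or divide. */
uint8_t luma_lookup(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(tab_red[r] + tab_green[g] + tab_blue[b]);
}
```

Because each entry is truncated, a white pixel sums to 254 here; rounding the entries instead would bring it to 255.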

Thanks everyone for your help. I am learning a lot of things.

Siekmanski · December 09, 2015, 06:55:26 AM

Dave...

I agree that the " (R * 0.299) + (G * 0.587) + (B * 0.114) " is the best routine for gray-scale conversion. ;)
I just summed up the other methods I've heard of.

Grincheux asked: What are the MMX or SSE or xxx that I could use.
So I translated the average method routine he posted to SSE.

Grincheux · December 09, 2015, 07:31:05 AM

Super. That is what I expected.
Thank you Siekmanski


The MASM Forum
