Fast Compare Real8 with SSE and ColorSpaces

daydreamer · February 16, 2019, 08:28:49 PM

Guga, please post the PDF. I dug up an SSE/SSE2 tutorial and posted it in my SSE thread to keep it from disappearing among lots of posts.
I think if you understand how packed compares work, you don't need to be stuck with lots of packed SSE code bottlenecked by scalar IFs.
I want to make a realtime colorization demo, starting with a simpler colorspace if CieLCH is too slow.
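
To illustrate the point about packed compares replacing scalar IFs, here is a minimal sketch in Python (not SSE code, so it is easy to run). It emulates, lane by lane, what a packed compare such as PCMPGTD followed by AND/ANDN/OR would do: the compare produces an all-ones or all-zeros mask per lane, and the mask selects between two values without a branch. The function names are made up for this example, and the lanes are assumed to hold non-negative 32-bit integers.

```python
# Lane-by-lane emulation of "packed compare + mask select" on 32-bit lanes.
# Illustration only: real SSE code would do each step on all lanes at once.

MASK32 = 0xFFFFFFFF

def packed_greater_than(a, b):
    """Per lane: all ones (0xFFFFFFFF) where a[i] > b[i], otherwise zero."""
    return [MASK32 if x > y else 0 for x, y in zip(a, b)]

def packed_select(mask, if_true, if_false):
    """Per lane: (mask AND if_true) OR (NOT mask AND if_false)."""
    return [(m & t) | (~m & MASK32 & f) for m, t, f in zip(mask, if_true, if_false)]

def clamp_to_255(lanes):
    """Clamp every lane to an upper limit of 255 using only mask operations."""
    limit = [255] * len(lanes)
    mask = packed_greater_than(lanes, limit)
    return packed_select(mask, limit, lanes)

print(clamp_to_255([12, 300, 255, 1024]))  # [12, 255, 255, 255]
```

The same pattern (compare, then AND/ANDN/OR with the mask) covers most "if value is out of range then fix it" cases without leaving the packed registers.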

Siekmanski great work :t

guga · February 16, 2019, 11:29:41 PM

Quote from: Siekmanski on February 16, 2019, 07:47:59 PM
CieLCH seems to be the best option but also the most complicated color space conversion algorithm.
For me this is a new field to explore and I have to learn a lot more to understand it fully.
Looking forward to seeing your CieLCH routine when finished. :t

Found this paper: http://jcgt.org/published/0002/02/01/paper.pdf

In my own logic, I always try to understand algorithms by working my way backwards.
Then I try to simplify the calculations if possible.
In the case of color space conversion you can precalculate the coefficients for a dot3 matrix routine.

Once you know the 3*3 matrix coefficients for the "forward" RGB -> XYZ calculations,
you can use the inverse of the 3*3 matrix coefficients to compute the "backwards" XYZ -> RGB.
This way, the "backwards" results will always be correct.
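
As a rough check of the forward/inverse idea, the sketch below inverts a 3*3 matrix with the cofactor formula and runs a round trip. It is written in Python for easy testing. The coefficients are the CIE RGB ones from the CIERGB2XYZ table in this thread, reordered to R, G, B columns; the helper names (invert_3x3, dot3) are made up for this example.

```python
# CIE RGB -> XYZ coefficients in R, G, B column order
# (same numbers as the CIERGB2XYZ table in this thread, which stores them as B, G, R).
RGB_TO_XYZ = [
    [0.4887180, 0.3106803, 0.2006017],  # X
    [0.1762044, 0.8129847, 0.0108109],  # Y
    [0.0000000, 0.0102048, 0.9897952],  # Z
]

def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0.0:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def dot3(m, v):
    """Three dot products: one matrix row times the 3-component vector each."""
    return [sum(row[k] * v[k] for k in range(3)) for row in m]

XYZ_TO_RGB = invert_3x3(RGB_TO_XYZ)

rgb = [0.25, 0.5, 0.75]
xyz = dot3(RGB_TO_XYZ, rgb)    # "forward"
back = dot3(XYZ_TO_RGB, xyz)   # "backwards"
print(back)                    # equals rgb up to floating-point rounding
```

Because the backward matrix is computed from the forward one instead of being typed in separately, the two always stay consistent with each other.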

Bruce Lindbloom has done some of the coefficients math for us to compute the RGB -> XYZ and XYZ -> RGB matrices.
http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html

There is a CIE Color Calculator on his site:
http://www.brucelindbloom.com/ColorCalculator.html

My routines are not finished yet but I will post them when ready, then we can check if it is done right.

Here is the dot3 color conversion routine I wrote, the one used in the example above in reply #70.
I wrote it in SSE2 instructions so it can be used on older computers as well. ( so no fancy byte shuffles in this one. )
Adjusted the 3*3 matrix transpose routine to handle 4 row elements, preserving alpha.

Code Select

;                   B          G          R          A
CIERGB2XYZ  real4 0.2006017, 0.3106803, 0.4887180, 0.0 ; X
            real4 0.0108109, 0.8129847, 0.1762044, 0.0 ; Y
            real4 0.9897952, 0.0102048, 0.0000000, 1.0 ; Z ( AZ must be 1.0 )

ALPHA_mask  dd -1,-1,-1,0

align 4
ColorConversionInt2Float proc uses ebx esi edi BitmapWidth:DWORD,BitmapHeight:DWORD,pSourceMem:DWORD,pDestinationMem:DWORD,pConversionType:DWORD

    mov         esi,pSourceMem
    mov         edi,pDestinationMem
    mov         edx,pConversionType

    mov         ecx,BitmapWidth
    imul        ecx,BitmapHeight
    shr         ecx,2

    pxor        xmm5,xmm5                   ; Empty the source operand, to zero the integer high parts,
                                            ; in the "punpcklbw", "punpcklwd" instructions
align 16
LoadFourPixels:
    mov         ebx,4
    movdqa      xmm6,oword ptr [esi]        ; Load 4 ARGB pixels at once
FourPixelLP:
    movq        xmm0,xmm6                   ; 1 pixel
    punpcklbw   xmm0,xmm5                   ; Convert 4 bytes to 4 words
    punpcklwd   xmm0,xmm5                   ; Convert 4 words to 4 dwords
    cvtdq2ps    xmm0,xmm0                   ; Convert 4 dwords to 4 real4 values

    movaps      xmm1,xmm0                   ; [B G R A]
    movaps      xmm2,xmm0                   ; [B G R A]

    mulps       xmm0,oword ptr [edx]        ; [BX GX RX --] Multiply Color X conversion coefficients
    mulps       xmm1,oword ptr [edx+16]     ; [BY GY RY --] Multiply Color Y conversion coefficients
    mulps       xmm2,oword ptr [edx+32]     ; [BZ GZ RZ AZ] Multiply Color Z conversion coefficients

; Color conversion using an adjusted SSE2 4*3 matrix transposition routine ( preserving Alpha_Z)
; Now we can run a fast Dot3 ( 3-component vector ) calculation on the
; 12 color components and 12 color coefficients ( 9 of each + 3 Alpha components )
; Calculations are in parallel, 3 muls and 2 adds

    movaps      xmm3,xmm0                   ; [BX GX RX --]
    movaps      xmm4,xmm2                   ; [BZ GZ RZ AZ]
    unpcklps    xmm3,xmm1                   ; [BX BY GX GY]
    unpcklps    xmm4,xmm4                   ; [BZ BZ GZ GZ]
    movhlps     xmm4,xmm3                   ; [GX GY GZ GZ]
    movlhps     xmm3,xmm2                   ; [BX BY BZ GZ]
    unpckhps    xmm0,xmm1                   ; [RX RY -- --]
    shufps      xmm0,xmm2,Shuffle(3,2,1,0)  ; [RX RY RZ AZ]
    shufps      xmm6,xmm6,Shuffle(0,3,2,1)  ; pre-load next ARGB pixel
    addps       xmm3,xmm4                   ; [BX+GX BY+GY BZ+GZ GZ+GZ]
    andps       xmm3,oword ptr ALPHA_mask   ; [BX+GX BY+GY BZ+GZ --]
    addps       xmm0,xmm3                   ; [RX+BX+GX RY+BY+GY RZ+BZ+GZ AZ]

; result: BGRA

    movaps      oword ptr [edi],xmm0        ; Store BGRA Pixel in Real4 format
    add         edi,16
    dec         ebx
    jnz         FourPixelLP

    add         esi,16
    dec         ecx
    jnz         LoadFourPixels
    ret

ColorConversionInt2Float endp

Great work :t

About the transposed matrix: yes, the backwards calculation uses the inverse matrix, transposed. All of this is precalculated well before the RGBtoCieLch/CieLCHtoRGB functions start. One thing only: don't forget to include the gamma adjustments and the white reference I described in the paper.

Bruce has done a great job on all of this, but the problem with this colorspace in particular remains. He managed to fix the discontinuity problem by adjusting the threshold after the RGB is converted to XYZ and before converting it to Lab/LCH. What he didn't think of is checking the limits between Hue/Chroma and Luminosity, which do have some issues if you use the CIE formula without the necessary fixes.

If you use the formula in the "normal" way, you end up having to clip the resulting R, G or B in the backwards computation. So, when you are doing the backwards computation, you always need to check whether the relation between chroma/hue and luma fits the formula I proposed.

About hue: the delta hue I showed previously, [5*cos(Hue)-2*sin(Hue)], seems to be limited to sqrt(29). I'll finish the paper, updating it to include the backward formula, and post it here for you to see.
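
The sqrt(29) limit follows from the standard identity for a sum of a sine and a cosine of the same angle. A short derivation, writing H for Hue (the auxiliary angle phi is only introduced here for the derivation):

```latex
\begin{aligned}
5\cos H - 2\sin H &= R\cos(H + \varphi), \qquad
R = \sqrt{5^2 + 2^2} = \sqrt{29}, \quad
\cos\varphi = \tfrac{5}{\sqrt{29}}, \quad
\sin\varphi = \tfrac{2}{\sqrt{29}} \\[4pt]
R\cos(H + \varphi) &= R\cos H\cos\varphi - R\sin H\sin\varphi = 5\cos H - 2\sin H \\[4pt]
\left|\,5\cos H - 2\sin H\,\right| &\le \sqrt{29} \approx 5.385
\end{aligned}
```

So the expression stays between -sqrt(29) and +sqrt(29) for any hue angle, and it reaches those extremes when H + phi is a multiple of 180 degrees.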

guga · February 16, 2019, 11:46:32 PM

Quote from: daydreamer on February 16, 2019, 08:28:49 PM
Guga, please post the PDF. I dug up an SSE/SSE2 tutorial and posted it in my SSE thread to keep it from disappearing among lots of posts.
I think if you understand how packed compares work, you don't need to be stuck with lots of packed SSE code bottlenecked by scalar IFs.
I want to make a realtime colorization demo, starting with a simpler colorspace if CieLCH is too slow.

Siekmanski great work :t

Hi DayDreamer. I'm finishing some details on the algorithm and will update it with the backwards algorithm for you and Siekmanski.

Siekmanski · February 16, 2019, 11:54:57 PM

:t

daydreamer · February 17, 2019, 02:20:06 AM

Quote from: guga on February 16, 2019, 11:46:32 PM
Hi DayDreamer. I'm finishing some details on the algorithm and will update it with the backwards algorithm for you and Siekmanski.

great :t

guga · February 18, 2019, 03:01:52 AM

Hi Siekmanski, can you try something ?

I altered the original CieLab formula to better fit the Chroma and Hue in CieLCH. Could you please test it by replacing this:

Code Select

a = (X-Y)*500
b = (Y-Z)*200

with this:

Code Select


a = (X-Y)*(5/2)*225
b = (Y-Z)*225

It seems that this new formula fits the Hue and Chroma better and doesn't extrapolate the result (I hope).

There is a problematic gap in the original CieLab equation and I'm trying to fix it. If I did the math correctly, the new equation may be the one that corrects that problem. I would like to see the results using your images, as you did previously.

It will increase the chroma by a factor of 1.125 but, at the same time, it forces the Hue to stay within its own limits, like the ones I found for luma.

If this is correct, then I may have found the correct boundaries/ranges for chroma too.

And then we can assume that we have only 256 different variations of luminance, each one having its own (fixed) ranges for chroma, while allowing Hue to stay within 0° to 360° without extrapolating and spreading the hue angles better according to each chroma/luma range.

So, the final CieLCH function can be even faster, simply pointing to a table containing the ranges for Luma and Chroma (which can also be integers from 0 to 255, rather than floating point) and using only Hue with fractions. (Limited to the chroma/luma table too, I hope :icon_mrgreen:)

Note:
Using b = (Y-Z)*225 seems compatible with some other perceptual colorspaces and with gamma correction as well, since they tend to limit the RGB values to 253. So: 225*1.125 = 253.125 ~ 253

If we want to keep the range of 0 to 255, we could probably also try this other formula:

Code Select


a = (X-Y)*(5/2)*(226+2/3)
b = (Y-Z)*(226+2/3)

since: (226+2/3)*1.125 = 255 ;) This last one seems more logical, using the whole range of 0 to 255, since we have already corrected gamma and white reference previously. So it seems unreasonable to me to limit the range to 253 as is done in the 'normal' way.
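
The arithmetic behind these scale factors can be checked exactly with fractions. This is only a small Python sketch of the numbers quoted in this post; the function name scale_ratios is made up for the example.

```python
from fractions import Fraction

# Multipliers of the original CieLab "a" and "b" equations.
ORIGINAL_A = Fraction(500)
ORIGINAL_B = Fraction(200)

def scale_ratios(b_scale):
    """Return (a_scale, a_scale/500, b_scale/200) for a proposed b multiplier,
    using a_scale = (5/2) * b_scale as in the formulas above."""
    a_scale = Fraction(5, 2) * b_scale
    return a_scale, a_scale / ORIGINAL_A, b_scale / ORIGINAL_B

first = Fraction(225)
second = Fraction(226) + Fraction(2, 3)

for b_scale in (first, second):
    a_scale, ratio_a, ratio_b = scale_ratios(b_scale)
    print(float(b_scale), float(a_scale), float(ratio_a), float(ratio_b),
          float(b_scale * Fraction(9, 8)))
```

With 225 both multipliers grow by exactly 9/8 = 1.125 relative to 500 and 200, and 225*1.125 is 253.125. With 226+2/3 the product with 1.125 is exactly 255.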

Siekmanski · February 18, 2019, 10:37:08 AM

Hi guga,

Still have to do more research before I can understand the CieLCH color conversion.

I think you are right, it is logical to use the complete range of 256 lumas or at least the maximum value of the lumas in the source bitmap.

In your case, where you want to combine the luma of bitmap A with the chromas of bitmap B to construct a new bitmap from bitmap A with new colors we only need to create a 256 colors palette.
Correct me if I'm wrong.

Not every bitmap has the full range of 256 luma values and, most probably, some luma values that are present in bitmap A aren't present in bitmap B and vice versa.

So, I think in my own logic ( I have no experience in this matter and could be totally missing the point here :icon_eek:) that we need the highest luma value from bitmap A as the total number of palette entries.

Then calculate the average ( maybe it is better to get the standard deviation or variance.... ) of each chroma pair with the same luma.
If there are missing luma values in bitmap B, we can interpolate between the neighbouring chromas and assign the value(s) to the missing lumas.
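
A minimal Python sketch of that interpolation step, under a few assumptions of mine: the table is indexed by luma, a missing luma is marked with None, each entry is a single number (for a chroma pair it would be run once per component), and gaps at either end simply copy the nearest filled entry. The name fill_missing is made up.

```python
def fill_missing(table):
    """Linearly interpolate None entries between the nearest filled neighbours.
    Entries before the first or after the last filled slot copy that slot."""
    filled = [i for i, v in enumerate(table) if v is not None]
    if not filled:
        raise ValueError("table has no filled entries")
    out = list(table)
    for i in range(len(table)):
        if out[i] is not None:
            continue
        lower = max((k for k in filled if k < i), default=None)
        upper = min((k for k in filled if k > i), default=None)
        if lower is None:
            out[i] = table[upper]          # gap at the start
        elif upper is None:
            out[i] = table[lower]          # gap at the end
        else:
            t = (i - lower) / (upper - lower)
            out[i] = table[lower] + t * (table[upper] - table[lower])
    return out

print(fill_missing([None, 10.0, None, 14.0, None]))  # [10.0, 10.0, 12.0, 14.0, 14.0]
```

For a real 256-entry palette the list would have 256 slots, one per luma value.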

Then there is another thing.
What do we do when there are fewer luma values in bitmap B than in bitmap A after we have calculated the missing in-between lumas in bitmap B?
I think we can simply stretch the chroma values to the number of lumas in bitmap A.

These are just my thoughts and probably not the best way of doing this but, I will try it out.

guga · February 18, 2019, 01:33:15 PM

Hi SiekManski

Calculating the average or standard deviation is also one way to do it when the source bitmap lacks a luma that is present in the gray image.

What I did in the previous version was search for the nearest value rather than calculating the average.

The data in Source A (the colored one) contains a table with 256 structures whose members contain the pointers to the gray/luminance values in it. The gray is used as an index.

So, as I posted previously, the table from the colored image holds a pointer to every luminance existing in it, indexed according to the gray value (which is nothing more than the luma range itself).

So, say we are trying to colorize a pixel in the gray image whose gray is 100, but the source image (the colored one) doesn't have any pixel that represents that same "gray". I then look for the one nearest to it: I start looking in the table (containing the structures) to see if it has a "gray" of 101, 102, 103, etc., and do the same backwards, looking for 99, 98, 97. Then I only pick the nearest value.

Say that in the colored image we have this sequence of "gray" (in our array of 256 structures): 96 97 98 103 105 108. Since we are looking for pixel 100 (in the gray image), I get the previous and next "gray" that are actually present in the colored image. In this case they are 98 and 103. Then I simply choose the one that is closer to 100, which is 98. I use their deltas rather than computing the average between them: Delta1 = 100-98 = 2, Delta2 = 103-100 = 3. The smaller delta (2) marks the pixel nearest to the value of 100, so I use 98 to fill the gap.

Of course, we could fill the gap by simply calculating the average of them: (103+98)/2 = 100.5. But remember that we are using luminance/gray as an index into a table, and indices are integer values, so we could never find the proper entry when the result is a fraction. That's why I'm choosing the nearest one in these preliminary tests.
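
The nearest-gray search described above can be sketched in a few lines of Python. This is not the code from the app, only an illustration: the function name is made up, the list of present grays is assumed to be sorted, and on a tie (equal deltas) the lower gray is kept, which is my own choice.

```python
import bisect

def nearest_present_gray(present, target):
    """present: sorted list of gray levels that occur in the colored image.
    Returns the gray level in 'present' with the smallest delta to 'target'."""
    if not present:
        raise ValueError("no gray levels available")
    pos = bisect.bisect_left(present, target)
    if pos == 0:
        return present[0]
    if pos == len(present):
        return present[-1]
    below, above = present[pos - 1], present[pos]
    # Assumption: on a tie, keep the lower gray level.
    return below if (target - below) <= (above - target) else above

print(nearest_present_gray([96, 97, 98, 103, 105, 108], 100))  # 98
```

The binary search replaces the "look at 101, 102, 103... then 99, 98, 97..." scan, but the result is the same nearest neighbour.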

You can't stretch the chroma arbitrarily. If you do that, you will go out of the range of hue too. The best approach is always to look up the luminance and use the table (described below) to retrieve the chroma/hue. If the luma doesn't exist, you can do as I said previously and find the nearest one, or estimate the average, or use the neighbours, etc. What you can't do is rely on chroma or even on hue to retrieve it. You always need to use the luminance as the guide for the equations.

The best choice to extend the limitation of 256 colors is calculating the luminance of the neighbouring pixels, similar to using a signature to identify images. This way we can also retrieve texture. We only need a small matrix to do this: let's say 3x3 of the gray pixels and 3x3 of the colored ones. We then look first at their positions for a full match. Since the odds that one 3x3 matrix is exactly the same as another are very low, an error is unlikely. You may think that we will need a huge database of 3x3 pixels to be used as samples, but probably not. If we succeed in fixing the CieLCH equations, we will limit the total amount of colors to search and, consequently, the total amount of luminance combinations as well.

Of all those 16 million colors, probably more than half are perceptually similar to another one. I haven't estimated the amount of unique colors yet, because I'm struggling to fix those damn CieLab/CieLCH equations, but the goal is to force the algorithm (the RGBtoCieLCh function and its reverse) to be restricted to a table containing only:

256 gray colors (each gray color corresponds to a unique luminance value)
256 chroma values (same as above; each chroma value seems to be limited to the same range as the luma, and the difference between the max and min is extremely low)
360 (or most likely 338) different hue angles

The hue angle is a particular problem. The original equation simply does not fit the results. There is a difference in the chroma/hue fraction of 1.125. I fixed it as proposed, but the difference remains. Perhaps I'll need to multiply Y by 1.125 and divide X and Z by [2/(3-1.125)] to adjust that properly.

I haven't tested it yet. Whenever I fix one part, this difference appears in another. The problem is that X, Y and Z are tied to each other through the matrix multiplication of Red, Green and Blue with the tristimulus values (from sRGB, for example). If I simply multiply Y, I'll end up with an error, because X and Z also use the same Red, Green and Blue values, just multiplied by their own tristimulus values.

I know I'm close to the solution, because the difference of 1.125 seems to be a fixed value, but I can't actually see where to fix it. The new paper I wrote is a complete mess, btw. I do the equations and write them down in the doc, but when I see an error, I put it on the same paper so I don't forget later what I was doing. :icon_mrgreen: :icon_mrgreen: :icon_mrgreen:

The good thing is that the properties of Luma, Hue and Chroma to be used in the backward computation are all there now :) I simply need to adjust the equation to fit the relation between chroma and hue values.

What I did find about Chroma is that I can put it in the same table as the Luma range, ending up with something like this:

Code Select

Gray/Color    Luminosity (Min)           Chroma Min
0             0                          0
1             2.741066938704112e-1       26.0519460866180125
2             5.48213387740825841e-1     26.49074207984956658

So, when a pixel has a luminosity in the range of 2.74e-1 to 5.48e-1, it means, in fact, that this pixel will necessarily generate a gray color whose value is "1" (which we can use as an index).
It also means that whenever a pixel falls in the range of gray "1" (no matter what original combination created that gray=1), it will always have a chroma range from 26.05194 to a max of 26.49074207....

The problem with the "normal" way is that the conversion doesn't check these kinds of limits and accepts extrapolations in the backward computation (CieLCh to RGB). So it will inevitably clip the final values of R, G, B.

Say that we are trying to decrease/increase the chroma of an image. In the "normal" way we can use whatever luminance we want with whatever chroma value but, doing that, we will clip the result, generating a color that simply was not supposed to be there.
So, using a table, it is easier to keep the values restricted to their own boundaries.
If we are looking for luma = 5.6e-1 (or looking for gray = 2; it really doesn't matter because they are the same thing, Gray = Luma range), all we need to do is search the table for the entry where the luminance has a gray value of 2 (which corresponds to the range of 5.48...e-1 to xxx) and take the chroma from there.
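
A minimal Python sketch of that lookup, using only the three table rows shown above (the real table would have 256 rows, so with this truncated table every luma above 5.48e-1 lands on gray 2). The names LUMA_MIN, CHROMA_MIN, gray_index and chroma_min_for are made up for the example.

```python
import bisect

# First three rows of the table above: minimum luminosity and minimum chroma per gray.
LUMA_MIN = [0.0, 2.741066938704112e-1, 5.48213387740825841e-1]
CHROMA_MIN = [0.0, 26.0519460866180125, 26.49074207984956658]

def gray_index(luma):
    """Gray value whose luminosity range contains 'luma'
    (the last row whose minimum is less than or equal to luma)."""
    if luma < 0.0:
        raise ValueError("luma must not be negative")
    return bisect.bisect_right(LUMA_MIN, luma) - 1

def chroma_min_for(luma):
    """Minimum chroma stored for the gray that 'luma' falls into."""
    return CHROMA_MIN[gray_index(luma)]

print(gray_index(0.3), chroma_min_for(0.3))    # 1 26.051946086618013
print(gray_index(0.56), chroma_min_for(0.56))  # 2 26.490742079849568
```

Once the gray index is known, every other per-gray limit can be fetched from the same row, which is what makes the backward computation a matter of table lookups.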

As for the hue, you may be able to use the one you input. It seems that the difference in chroma is not enough to make one color distinguishable from another. What does matter is the hue angle.

Now... all I need to know is whether there is a similar limit for hue as well. (It seems one does exist, but I simply can't find it using their own "normal" equations.)

The final result I hope we can get is something like this:

Code Select

Gray/Color    Luminosity (Min)           Chroma Min              Hue Min
0             0                          0
1             2.741066938704112e-1       26.0519460866180125     0
2             5.48213387740825841e-1     26.49074207984956658    35.1215°

If the same kind of limits also exist for hue (which seems to be the case), then the backward computation will be extremely easy: just pointers into tables.

What I'm finding in all of this is that the concepts of CIE are incorrect. Luminosity is simply the frequency of a luminous waveform, and what they call "chroma" is, in fact, the intensity of the luminosity over a pixel. And hue is the actual "chroma". They claim that Luminosity is completely isolated from Chroma and Hue, but that's pure rubbish. Luma and "Chroma" seem to be faces of the same thing, but one is related to the wavelength and the other to its intensity. Hue is the actual chroma that makes our eyes identify different colors.

guga · February 18, 2019, 04:06:11 PM

About

Quote
In your case, where you want to combine the luma of bitmap A with the chromas of bitmap B to construct a new bitmap from bitmap A with new colors we only need to create a 256 colors palette.

At this stage, yes. Using 256 different colors is the 1st stage of the colorization method. It is the easiest way to see if the algorithm works as expected before developing it more extensively. But if we want to retrieve more colors (65536 or 16 million), other techniques will be needed, such as searching the neighbouring pixels.

So far the results I'm getting with a 256-color palette are OK, way better than I expected, in fact (although still limited to 256). Once we succeed in making it work correctly with those 256 colors, making it work with 65536 or 16 million colors should be way easier, since the basic technique will already be developed and the colorspace conversion functions properly fixed.

Oh... another thing... I'll put it here as a note so I won't forget later :icon_mrgreen: :icon_mrgreen: Since there is a table of fixed values for "Chroma" (which, in fact, seems to be only the intensity of luma), we may need only 2 values for the CieLCh colorspace. We will no longer need "Chroma" to retrieve any color, since all we need to do to retrieve Chroma is point to the luma table, which will also contain the fixed minimum values of chroma already precalculated (I only need to check how big the differences within this range of chroma are, to see if the minimum values can also be used in a fixed way). All we may need is Luma and Hue. I'll make some more tests before trying to fix the equation, to see if this is also a possibility. If this is correct, then we may have created a new perceptual colorspace with fewer variables to use.

Siekmanski · February 18, 2019, 05:53:25 PM

Can't wait to see the results of your approach. :t
I still have a lot to digest.

guga · February 21, 2019, 02:22:41 AM

OK, guys, I finished the properties of the equations, but I'm having some problems solving for the Red, Green and Blue parameters.

I'm doing the equations backwards (i.e. reversing them to make the proper fixes), since Red, Green and Blue are directly related to X, Y, Z.

What is the best way to post the equations for you? PDF, doc, simple text?

I found some interesting properties for "a" and "b" and also for x, y, z but, as I said, using their own equation won't work at all. I found the limits for all of those monsters, but I'm having trouble going deeper, reverting the algorithm back to its original values for Red, Green and Blue.

The goal is to make the function fix the input values of Red, Green and Blue before going deeper into the computations of XYZ->CieLab->CIELCH. Since I found their properties, the backwards computation seems OK too. But fixing the input values needs to be done first, in the RGB to CieLCH direction.

Can someone help?

daydreamer · February 21, 2019, 06:09:48 AM

Quote from: guga on February 21, 2019, 02:22:41 AM

Can someone help?

Please post the PDF. I don't know if I can help, it's complex formulas.
Would it help if you checked 8x8 pixels simultaneously?

Siekmanski · February 21, 2019, 06:21:25 AM

Quote from: guga on February 21, 2019, 02:22:41 AM

What is the best way to post the equations for you? PDF, doc, simple text?

Whatever you like.

Quote
I found some interesting properties for "a" and "b" and also for x, y, z but, as I said, using their own equation won't work at all. I found the limits for all of those monsters, but I'm having trouble going deeper, reverting the algorithm back to its original values for Red, Green and Blue.

The goal is to make the function fix the input values of Red, Green and Blue before going deeper into the computations of XYZ->CieLab->CIELCH. Since I found their properties, the backwards computation seems OK too. But fixing the input values needs to be done first, in the RGB to CieLCH direction.

Can someone help?

This is not an easy task, if you ask me. ( very complicated algorithm.... )

daydreamer · February 22, 2019, 05:30:17 AM

Would it be possible with fixed-point/integer code, guga?
Using SSE2 integer instructions?
Hue conversion with the help of byte shuffles?

guga · February 22, 2019, 06:51:32 AM

Hi guys, I'm finishing the PDF for you :t

Daydreamer,
For the luminance values, it is possible to use integers, since I succeeded in using Luminance as a lookup table relating it to gray integers. For chroma, I'm not so sure yet, since I found out that the range of chroma for each luma is bigger than I thought (although restricted to each luma as well; for example, when luma is in the same range as gray = 190, chroma ranges from 67.23 to 157). Yesterday I started creating a fraction for chroma so that it stays in the range of 0 to 100 (or 0 to 255, which could be used as an integer/index), but I haven't tested it yet, since the equations that relate chroma and hue are still problematic.
About Hue, I'm quite sure it cannot be done with integers, since the range for hue is bigger and the formula has issues when trying to force it to stay inside its own limits.
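
One possible shape for the 0 to 255 chroma fraction mentioned above, as a Python sketch only. It assumes a plain linear mapping between the minimum and maximum chroma of a given gray, uses the gray = 190 range quoted above (67.23 to 157) as sample data, and clamps out-of-range input as a guard; the function names are made up and the untested fraction described in the post may differ.

```python
def chroma_to_index(chroma, c_min, c_max, levels=256):
    """Map a chroma inside [c_min, c_max] to an integer index 0..levels-1."""
    if c_max <= c_min:
        raise ValueError("c_max must be greater than c_min")
    clamped = min(max(chroma, c_min), c_max)   # guard against out-of-range input
    return round((clamped - c_min) / (c_max - c_min) * (levels - 1))

def index_to_chroma(index, c_min, c_max, levels=256):
    """Inverse mapping: integer index back to a chroma value."""
    return c_min + index / (levels - 1) * (c_max - c_min)

C_MIN, C_MAX = 67.23, 157.0                    # range quoted above for gray = 190
idx = chroma_to_index(100.0, C_MIN, C_MAX)
print(idx, index_to_chroma(idx, C_MIN, C_MAX))
```

The round trip loses at most half a step, i.e. (c_max - c_min) / 255 / 2, which for the gray = 190 range is about 0.18 chroma units.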

Using SSE can speed up the multiplication of the values in the tristimulus matrix, as well as the lookup table and the trigonometric functions that need to be used, but first it is important to fix the equations.

The problem with CieLab/CieLCH is that they are trying to emulate a spherical color space using values that simply don't fit spherical coordinates (i.e. coordinates x, y, z; not the colorspace XYZ, but the coordinates themselves, as when plotting a 3D graph). So what they call "hue" is not exactly an angle between 2 of the axes (even though they use red and blue to reproduce this "pseudo axis"). I tried to represent the functions in a 3D space using R, G, B as the axis coordinates, but didn't succeed in reproducing where the values of Hue, "a", "b" (or x, y, z) fit into it. It's like trying to put an elephant inside a ping pong ball :icon_mrgreen: :icon_mrgreen: :icon_mrgreen:

I'm posting the PDF today, as soon as I finish cleaning up the equations for you.

To give you guys an idea of what I'm doing, here is a screenshot containing the luminance table and some values for chroma I inserted into it to fit the ranges. The "Ws_Matrix" is a huge structure that holds all the necessary information, loaded before the RGB to CieLCH and CieLCH to RGB conversions start. Once all the necessary data is preloaded, the app runs way faster. (It can do better with SSE, for sure :t)
