Author Topic: Fast Compare Real8 with SSE and ColorSpaces  (Read 6862 times)

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #90 on: February 22, 2019, 05:12:46 PM »
Quote
The problem with CieLab/CieLCH is that they try to emulate a spherical color space using values that simply don't fit spherical coordinates (i.e. the coordinates x, y, z as plotted on a 3D graph, not the XYZ colorspace). So what they call "hue" is not exactly an angle between two of the axes (even though they use red and blue to reproduce this "pseudo axis"). I tried to represent the functions in a 3D space using R, G, B as the axis coordinates, but didn't succeed in working out where the values of Hue, "a", "b" (or x, y, z) fit into it. It's like trying to put an elephant inside a ping pong ball.

I think they don't fit because RGB is spaced in 120-degree steps and CieLCH is spaced in 90-degree steps over the color range.
Creative coders use backward thinking techniques as a strategy.

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #91 on: February 22, 2019, 07:19:36 PM »
Hi Siekmanski

Quote
I think they don't fit because RGB is spaced in 120-degree steps and CieLCH is spaced in 90-degree steps over the color range.

Probably, but I figured out that CieLCH in fact has a fixed range of 270º (and, of course, 4 gaps from the 'normal' equations: 2 for gray and 2 for non-existent colors. Those gaps are separated by 90º each. After the fix, one of them, which represented non-existent colors, is no longer used). I managed to fix those damn equations by inserting a multiplying fraction into the final X (prior to the CieLab -> CieLCH conversion).

I have found practically all the limits now. The only thing I need to make sure of is the minimum limit for Hue (it seems to start at 0º now, but I'll have to check, because previously it started at 68º, immediately after the gap for non-existent colors, which is now fixed). The maximum range is all there: it ends at a Hue of approximately 338º.

I also succeeded in finding a ratio between Chroma and Luma. I named it CLFactor, and its maximum value is sqrt(29). I now only need to check the minimum value (most likely 5, but I haven't confirmed it yet. Too tired).

I'm posting here the complete algorithm with the necessary fixes, all the properties and limits, and how I achieved those results. It is only missing the minimum values described at the end of the paper (last chapter/section).

From what I have seen so far, my preliminary thoughts were correct: Chroma is only the intensity of Luma, and they can be used together to retrieve a value of Hue.

Since I'm putting them in tables, it will be much easier to do the backwards computation using all those limits. Only 256 entries will be necessary for the limits of Chroma, and Hue can be limited to each Luminance range as well.

Here is the pdf for you to check. Sorry for the mess. I did my best to make this paper more readable.

If you can, please check the math involved to see whether the logic and results are really what I found.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #92 on: February 23, 2019, 02:25:14 AM »
Hi Guys, Gooooooood morning, Vietnam

A small fix.

I was testing the results using the new XFinal, but found errors in the output: it still overshoots the hue angle and CLFactor (it generates values bigger than sqrt(29)). Reviewing the equation, I was probably mistaken when I multiplied x by (1-2/sqrt(29)).

Since I was taking the maximum value of YFinal into account, I probably needed to adjust X to fit Y rather than treating it in isolation.

The fix was simply to make XFinal = YFinal*(1-2/sqrt(29)). With this, the resulting values no longer generated errors, except for 111 colors (among 16 million) that were caused by rounding error.

So, if I did it correctly this time, the values of XFinal, YFinal and ZFinal are given by:

Code:
If Y > (216/24389)
    YFinal = Y^(1/3)
Else
    YFinal = Y*(841/108) + (16/116)
Endif

XFinal = YFinal*(1-2/sqrt(29))

If Z > (216/24389)
    ZFinal = Z^(1/3)
Else
    ZFinal = Z*(841/108) + (16/116)
Endif


With this, X, Y and Z produce the correct values for hue and chroma, limited to the CLFactor maximum of sqrt(29).
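For anyone who wants to check the constants in the pseudocode above, here is a minimal Python sketch. It assumes the piecewise function is the standard CIE L*a*b* one, where 216/24389 = (6/29)^3, 841/108 = 1/(3*(6/29)^2) and 16/116 = 4/29. The names lab_f and finals_variant are hypothetical helpers, not something from the pdf, and finals_variant only mirrors the XFinal = YFinal*(1-2/sqrt(29)) variant as posted; it says nothing about whether that variant is right.

```python
import math

DELTA = 6 / 29                    # standard CIE constant
THRESHOLD = DELTA ** 3            # 216/24389
SLOPE = 1 / (3 * DELTA ** 2)      # 841/108
OFFSET = 4 / 29                   # 16/116


def lab_f(t):
    """Piecewise mapping applied to a white-normalised X, Y or Z value."""
    if t > THRESHOLD:
        return t ** (1 / 3)
    return t * SLOPE + OFFSET


def finals_variant(y, z):
    """Variant from the post: XFinal is derived from YFinal (hypothetical helper)."""
    y_final = lab_f(y)
    z_final = lab_f(z)
    x_final = y_final * (1 - 2 / math.sqrt(29))
    return x_final, y_final, z_final
```

The two branches meet at the threshold (both give 6/29 there), which is one quick way to confirm that the three constants belong together.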

Can someone please check if this updated fix is correct?

It's unlikely, but if it is correct, then I probably succeeded in restricting the Hue angle to true 3D spherical coordinates, with the maximum range of the angles restricted to something around 90º, as I was trying to do last year with a new ColorSpace I was developing, also based on one quadrant of a 3D sphere.
« Last Edit: February 23, 2019, 04:26:10 AM by guga »

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #93 on: February 23, 2019, 11:00:12 AM »
Hmm... forget the above post. It works for RGB to CieLCh but doesn't work for the reverse operation. We need to try fixing the original equations in the pdf. I'll continue tonight.
« Last Edit: February 23, 2019, 01:31:10 PM by guga »

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #94 on: February 24, 2019, 01:23:33 AM »
Did you guys understand what I tried to explain in the pdf?

Is it possible to fix the equations in a way similar to what I posted on page 19?

Code:
If X > (216/24389)
    XTmp = X^(1/3)
Else
    XTmp = X*(841/108) + (16/116)
Endif
XFinal = XTmp*(1-2/sqrt(29))

If Y > (216/24389)
    YFinal = Y^(1/3)
Else
    YFinal = Y*(841/108) + (16/116)
Endif

If Z > (216/24389)
    ZFinal = Z^(1/3)
Else
    ZFinal = Z*(841/108) + (16/116)
Endif


It seems that XFinal can be obtained with XFinal = XTmp*(1-2/sqrt(29)) (and perhaps ZFinal too) so that the equation fits the limit of sqrt(29), but I simply cannot find the proper way to fix those values before the conversion to "a" and "b" (which generate chroma and hue) occurs.

daydreamer

  • Member
  • ****
  • Posts: 904
  • watch Chebyshev on the backside of the Moon
Re: Fast Compare Real8 with SSE
« Reply #95 on: February 24, 2019, 03:59:02 AM »
I don't know. Couldn't you let the computer try different constants until you get the right results, running the forward conversion on the pixels of a small tile until it matches the backward conversion? Or output a graph of the curve to make it easier to understand visually?

Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
what cpu handle "press any key"? any cpu of course(from C#) :D

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #96 on: February 24, 2019, 07:00:49 AM »
Quote from: daydreamer
I don't know. Couldn't you let the computer try different constants until you get the right results, running the forward conversion on the pixels of a small tile until it matches the backward conversion? Or output a graph of the curve to make it easier to understand visually?

Hi Daydreamer.

I was just doing that. I'm building a routine to find the maximum and minimum values of CLFactor, to see how much bigger than sqrt(29) it gets.

Most likely, the biggest CLFactor is found at the smallest delta values (differences) between X, Y, Z. If I can find it using a few routines (I mean, calculating it from a maximum of 9 checks on different combinations of R, G, B using only 0 and 255), then I'll probably see how far it exceeds the limit, in order to establish a correction ratio for XFinal, YFinal and ZFinal.

I'll see if I can find the maximum values from the equation I created on page 21



Which is simply, the same as:



I'm not sure about the results, because CLFactor is directly related to Hue, and the last time I tried this (earlier this week) I had to run the routines over all 16 million colors to retrieve the maximum Hue. Also, I'll probably have to use the cubic form (power of 3) of each of X, Y, Z and CLFactor, because of the threshold limits that use the power of 1/3 to retrieve those values, but I don't know yet.

Not sure what will result, but I'll give it a try anyway.

I tried to use CieLab in 3D coordinates but was unable to understand how "a" and "b" relate to the X, Y, Z axes. CieLab/CieLCH is a cylindrical colorspace trying to emulate a spherical one, but it uses those damn (x-y)*500 and (y-z)*200 terms and thresholds to compute aFactor and bFactor, and thus Hue and Chroma.

I have no idea how to represent "x-y" and "y-z" on the 3D axes in order to fix it using basic geometry. I thought of displaying them in a sort of cube containing the cylinder represented by CieLab, but I don't know how to represent this "x-y" stuff. I thought of the sides of a cube forming the sides of the cylinder, but I'm clueless about how to represent this properly.

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #97 on: February 24, 2019, 09:40:52 PM »
We could get rid of the atan2, sin and cos trigonometric functions and calculate A and B to create the 3D space coordinates by creating a linear HUE table in the color conversion we want.

RGB color space:
luma = lightness, and brightness.
hue = color components = red(0 degrees) to green(120 degrees) to blue(240 degrees) to red(360 degrees)



As you can see, within each 120 degrees there is only a transition between 2 of the 3 RGB color components.
We can use this as our A and B components.
The first 60 degrees is where A is RED 255,0,0 to YELLOW 255,255,0
The next 60 degrees is YELLOW 255,255,0 to GREEN 0,255,0
This completes the transition in 120 degrees between A and B in 512 steps.
A full 360 degrees = 3 * 512 = 1536 HUE color combinations.
 
These are all the integer hue color combinations ( without the luma component).
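A minimal Python sketch of the 1536-entry hue ring described above: six segments of 256 steps, with one channel ramping per segment. The function names are made up for illustration. One thing the sketch makes visible: with 256 steps per segment, the colour at each segment boundary is stored twice, so only 1530 of the 1536 entries are distinct.

```python
def build_rgb_hue_table():
    """Build the 1536-entry hue ring: six segments of 256 steps each."""
    table = []
    for i in range(256):                 # red -> yellow: G rises
        table.append((255, i, 0))
    for i in range(256):                 # yellow -> green: R falls
        table.append((255 - i, 255, 0))
    for i in range(256):                 # green -> cyan: B rises
        table.append((0, 255, i))
    for i in range(256):                 # cyan -> blue: G falls
        table.append((0, 255 - i, 255))
    for i in range(256):                 # blue -> magenta: R rises
        table.append((i, 0, 255))
    for i in range(256):                 # magenta -> red: B falls
        table.append((255, 0, 255 - i))
    return table


HUE_TABLE = build_rgb_hue_table()


def hue_index_to_degrees(index):
    """Convert a table index (0..1535) to an angle in degrees."""
    return index * 360.0 / len(HUE_TABLE)
```

Index 512 lands on pure green at 120 degrees and index 1024 on pure blue at 240 degrees, matching the 120-degree spacing in the post.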

1536 * 256 (luma values) = 393216 RGB color combinations, which is 2.34375 % of all 16777216 possible RGB colors, just as the 1536 hues are 2.34375 % of the 65536 possible chroma values.
Thus we need a factor to calculate all 65536 different chromas.

65536/1536 = 42.66667 ( forward )
1536/65536 = 0.0234375 ( backward )

CIE color space:
red (0 degrees ), yellow(90 degrees), green(180 degrees), blue(270 degrees) and red(360 degrees)
A full 360 degrees = 4 * 512 = 2048 HUE color combinations.



65536/2048 = 32.0 ( forward )
2048/65536 = 0.03125 ( backward )

Now we can get A and B from a table of 2048 values using the factor, or create a table with 65536 precalculated values.
I think a table of 2048 values, which only needs 1 mul, will be faster than a table of 65536 values without the mul. ( smaller data cache footprint )

3D space:

X = A
Y = B
Z = Luma = ( R*0.1762044 + G*0.8129847 + B*0.0108109 ) ; is the Y value of the CIERGB2XYZ tristimulus matrix.

"And we don't need to mess with negative values because every value is automatically translated to the first quadrant ( X+, Y+ )"
"And we also don't need to use compare instructions because the table is a power of 2, we can use the AND instruction"
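A rough Python sketch of the indexing idea, under the assumption that hue arrives as a 16-bit value (0..65535) and the table has 2048 entries. The backward factor 2048/65536 = 0.03125 costs one multiply, and because 2048 is a power of two the wrap-around is a single AND instead of a compare. The function names are hypothetical; the luma weights are the ones quoted in the post.

```python
TABLE_SIZE = 2048                  # 4 * 512 steps, a power of two
INDEX_MASK = TABLE_SIZE - 1        # 0x7FF
BACKWARD = TABLE_SIZE / 65536      # 0.03125


def table_index(hue16):
    """Map a non-negative 16-bit style hue onto the table, wrapping with AND."""
    return int(hue16 * BACKWARD) & INDEX_MASK


def luma(r, g, b):
    """Z axis of the 3D space: the Y row of the CIE RGB to XYZ matrix."""
    return r * 0.1762044 + g * 0.8129847 + b * 0.0108109
```

For example, table_index(65536 + 64) wraps to 2, the same entry as table_index(64).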

This is as far as I understand the CIE 3D colorspace, beware, I could be wrong though.......

daydreamer

  • Member
  • ****
  • Posts: 904
  • watch Chebyshev on the backside of the Moon
Re: Fast Compare Real8 with SSE
« Reply #98 on: February 25, 2019, 12:17:30 AM »
Siekmanski
It's a simple linear interpolation of hue between two colors over 0-60 degrees, shuffling the channels according to which sector you are in.
One color channel increases with the angle while the other decreases. Maybe it can be done in a similar way with CieLCh, but with 90 degrees; if that doesn't work with linear interpolation, try a cosine or some other curve.
A cubic colorspace reminds me of super fast raycasting.

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #99 on: February 25, 2019, 12:44:32 AM »
Hi Magnus,

Yes, that is exactly what I was thinking, hence the linear LUT method.
We will see, I have to study CieLch a bit more to know what I'm doing.
From reading guga's paper, a lot of corrections are needed, which makes the algorithm more complex than it already is.....

But it has my interest, wish I had more time left to spend on it......

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #100 on: February 25, 2019, 01:32:45 AM »
Hi guga,

Is this the correct route from RGB to CIElch?
rgb2xyz -> xyz2lab -> lab2lch and vice versa?

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #101 on: February 25, 2019, 02:28:41 AM »
Quote
Is this the correct route from RGB to CIElch?
rgb2xyz -> xyz2lab -> lab2lch and vice versa?

Hi Siekmanski

yes, this is the correct route.

Just keep in mind that when converting with xyz2lab, a correction must be made only on the x axis (XFinal). I'm close to finding a constant to apply to the x axis before it is transformed into Lab values. The hue angle has a maximum of around 338º. The graphics on the wiki are not precise, because CieLCH forms a parabolic cylinder, and the "a" and "b" used in CieLab to compute hue and chroma are not exactly what you would call "axes". Hue does not occupy all 360º. That's why the original CieLab function produces incorrect results, but I'm probably close to finding a constant for it. (In fact I found one, but I'm not sure yet whether it will work for all color models: sRGB, HDTV, Adobe tristimulus matrices.)

Hue can be calculated from
hue = atan(5/2) + asin(-CLFactor*sqrt(29)/29) and, using the image I posted previously, we can get rid of the trigonometric computations by using an approximation of the asin, acos or atan functions, such as the Chebyshev functions you described. I found three functions on NVIDIA's site that use such an approximation, but I have not been able to port them yet. This way, calculating hue with a constant (atan(5/2)) plus a polynomial function will probably be faster than using atan from the FPU.

You can calculate CLFactor directly from x,y,z or a more direct way from chroma as:
CLFactor = 1000*y/Chroma . Note: "y" from xyz. If you want to use Luma, just replace y = (Luma+16)/116
Thus, hue = atan(5/2)+asin(-(1000*y/chroma)*sqrt(29)/29)

Can you give porting/optimizing these a try (with doubles, Real8, rather than float)?
Especially this one (asin):
Code:
// Cg code: polynomial approximation of asin for x in [-1, 1]
float asin(float x) {
  float negate = float(x < 0);   // 1.0 for negative input, else 0.0
  x = abs(x);
  float ret = -0.0187293;        // cubic evaluated in Horner form
  ret *= x;
  ret += 0.0742610;
  ret *= x;
  ret -= 0.2121144;
  ret *= x;
  ret += 1.5707288;
  ret = 3.14159265358979*0.5 - sqrt(1.0 - x)*ret;
  return ret - 2 * negate * ret; // flip the sign for negative input
}

Code:
float acos(float x) {
  float negate = float(x < 0);
  x = abs(x);
  float ret = -0.0187293;
  ret = ret * x;
  ret = ret + 0.0742610;
  ret = ret * x;
  ret = ret - 0.2121144;
  ret = ret * x;
  ret = ret + 1.5707288;
  ret = ret * sqrt(1.0 - x);
  ret = ret - 2 * negate * ret;
  return negate * 3.14159265358979 + ret;
}

Code:
// Two-argument arctangent (atan2): returns the angle of the point (x, y)
float atan2(float y, float x)
{
  float t0, t1, t3, t4;

  t3 = abs(x);
  t1 = abs(y);
  t0 = max(t3, t1);
  t1 = min(t3, t1);
  t3 = float(1) / t0;
  t3 = t1 * t3;                  // ratio min/max, always in [0, 1]

  t4 = t3 * t3;
  t0 = -float(0.013480470);
  t0 = t0 * t4 + float(0.057477314);
  t0 = t0 * t4 - float(0.121239071);
  t0 = t0 * t4 + float(0.195635925);
  t0 = t0 * t4 - float(0.332994597);
  t0 = t0 * t4 + float(0.999995630);
  t3 = t0 * t3;

  // undo the range reduction, octant by octant
  t3 = (abs(y) > abs(x)) ? float(1.570796327) - t3 : t3;
  t3 = (x < 0) ? float(3.141592654) - t3 : t3;
  t3 = (y < 0) ? -t3 : t3;

  return t3;
}

I found them on NVIDIA's site at http://developer.download.nvidia.com/cg/asin.html, which cites the "Handbook of Mathematical Functions - M. Abramowitz and I.A. Stegun, Ed.". I downloaded the free version of the ebook but couldn't understand how those values/constants were obtained. Still, it could be worth a try to see if we can get those constants with a bit more precision.
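As a starting point for a Real8 port, here is a direct Python translation of the three routines above (Python floats are doubles). It is only a sketch for checking the constants against the library functions; the *_approx names are made up, and the zero guard in atan2_approx is an addition that is not in the Cg code. Working in doubles does not make the result more accurate than the polynomial itself: on a sampled grid the asin/acos versions agree with math.asin/math.acos to roughly four decimal places, so better constants would be needed for more.

```python
import math


def asin_approx(x):
    """Cubic approximation of asin for x in [-1, 1]."""
    negate = 1.0 if x < 0 else 0.0
    x = abs(x)
    ret = -0.0187293
    ret = ret * x + 0.0742610
    ret = ret * x - 0.2121144
    ret = ret * x + 1.5707288
    ret = math.pi * 0.5 - math.sqrt(1.0 - x) * ret
    return ret - 2 * negate * ret          # flip the sign for negative input


def acos_approx(x):
    """Same polynomial, arranged for acos on [-1, 1]."""
    negate = 1.0 if x < 0 else 0.0
    x = abs(x)
    ret = -0.0187293
    ret = ret * x + 0.0742610
    ret = ret * x - 0.2121144
    ret = ret * x + 1.5707288
    ret = ret * math.sqrt(1.0 - x)
    ret = ret - 2 * negate * ret
    return negate * math.pi + ret


def atan2_approx(y, x):
    """Polynomial atan2 with range reduction to a ratio in [0, 1]."""
    ax, ay = abs(x), abs(y)
    hi, lo = max(ax, ay), min(ax, ay)
    if hi == 0.0:                          # guard added here, not in the Cg code
        return 0.0
    t = lo / hi
    t2 = t * t
    p = -0.013480470
    p = p * t2 + 0.057477314
    p = p * t2 - 0.121239071
    p = p * t2 + 0.195635925
    p = p * t2 - 0.332994597
    p = p * t2 + 0.999995630
    r = p * t
    if ay > ax:
        r = math.pi / 2 - r
    if x < 0:
        r = math.pi - r
    if y < 0:
        r = -r
    return r
```

Comparing each function against its math counterpart over a dense grid is a quick way to measure how much precision a new set of constants actually buys.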
« Last Edit: February 25, 2019, 03:31:35 AM by guga »

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #102 on: February 25, 2019, 03:12:01 AM »
Aha, COOL!  8)

Do you really need real8 precision?
After all, the end results are natural numbers ( integer values ).
I'll try to write the trig functions and put something together in my sparse spare time. ( this can take some time. )
Or you can try the LUT approach in my previous post.

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #103 on: February 25, 2019, 04:13:34 AM »
Quote
Aha, COOL!

Do you really need real8 precision?
After all, the end results are natural numbers ( integer values ).

Unfortunately, we will need Real8, since all the data (including what goes in the LUT) is written in Real8 (matrices and constants), and we need the precision so that the backwards computation doesn't end up at a different RGB value. For example: if the input is R 200, G 251, B 108, it will be transformed to Luma, Chroma and Hue. But if the user converts back without a bit more precision, the result will most likely be R 198, G 250, B 107, and if we do the conversion again, other numbers can be generated in a cascading error.

Quote
I'll try to write the trig functions and put something together in my sparse spare time. ( this can take some time. )
Or you can try the LUT approach in my previous post.

Thanks

Yeah, I'll try the LUT approach once I succeed in fixing the x-axis problem. If I succeed, I can try to use integers for Luminosity and Chroma, forcing them to stay within limits of, let's say, 0 to 255 (the same as for integer RGB), and for the hue ranges I'll need to see how big the difference is at each luma before trying to make them work on integers too.

For example: Luma can be stored as simple integers whose real values come from the LUT. Chroma may also be used this way (if the range is not too big) for each Luma range. So, if Luma = 100 (index), Chroma can range from 58.584 to 137.1245, and it would be enough to compute DeltaChroma/255 and multiply this fraction by the integers to retrieve the correct chroma values from the LUT. Ex:
x*(137.1245-58.584)/255 + 58.584
This way we can use an integer (x, for example from 0 to 255) that stays within the range 58.584 to 137.1245.
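A small Python sketch of the linear mapping x*(137.1245-58.584)/255 + 58.584 and its inverse. The two limits are the example values quoted for Luma index 100; in a real table each luma entry would carry its own pair. The function names are hypothetical.

```python
CHROMA_MIN = 58.584     # example lower limit for Luma index 100 (from the post)
CHROMA_MAX = 137.1245   # example upper limit for Luma index 100 (from the post)


def index_to_chroma(x, lo=CHROMA_MIN, hi=CHROMA_MAX):
    """Map an integer 0..255 onto the chroma range [lo, hi]."""
    return x * (hi - lo) / 255 + lo


def chroma_to_index(c, lo=CHROMA_MIN, hi=CHROMA_MAX):
    """Inverse mapping: nearest integer index for a chroma value in [lo, hi]."""
    return round((c - lo) * 255 / (hi - lo))
```

With these limits one integer step is (137.1245-58.584)/255, about 0.308 chroma units, which is the quantisation this scheme accepts in exchange for integer storage.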

About doing the same for Hue, I'm not sure yet, since I'll need to check whether its ranges are much bigger than the ones for Chroma, and how Hue is affected by Chroma as well. But it is possible to use Hue as a range of integers as you proposed, from 0 to 2048 (for example; even 0 to 255 could be done), without needing to build an extra table for it. It would be enough to calculate a ratio for Hue too, after seeing precisely how it is related to chroma.

In the end, if all of this goes OK, everything will be calculated from the integers for Luminance, simply adjusting chroma (or hue, both in integers too, I hope) to retrieve the proper values from the table of structures containing only 256 elements.

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #104 on: February 25, 2019, 04:50:06 AM »
Quote
Unfortunately, we will need Real8, since all the data (including what goes in the LUT) is written in Real8 (matrices and constants), and we need the precision so that the backwards computation doesn't end up at a different RGB value. For example: if the input is R 200, G 251, B 108, it will be transformed to Luma, Chroma and Hue. But if the user converts back without a bit more precision, the result will most likely be R 198, G 250, B 107, and if we do the conversion again, other numbers can be generated in a cascading error.

If we store the end results as real4 for the backwards conversion, they will have enough precision.
We only have to convert them from real4 to the integer ARGB format when we present them on screen as a bitmap image.
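The real4 question can be tested rather than argued. Below is a Python sketch that converts RGB to LCh, rounds L, C and h to single precision (via struct), converts back and compares. Several things here are assumptions, not the thread's method: it uses the textbook CieLab equations (without the XFinal correction), linear RGB with the commonly published sRGB/D65 matrix, and hue in radians. All function names are hypothetical.

```python
import math
import struct

# Assumption: linear RGB with the sRGB/D65 RGB -> XYZ matrix.
M = ((0.4124564, 0.3575761, 0.1804375),
     (0.2126729, 0.7151522, 0.0721750),
     (0.0193339, 0.1191920, 0.9503041))
WHITE = tuple(sum(row) for row in M)     # XYZ of RGB white
DELTA = 6 / 29


def to_real4(v):
    """Round a double to the nearest single-precision (real4) value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def invert3(m):
    """Invert a 3x3 matrix with the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return (((e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det),
            ((f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det),
            ((d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det))


M_INV = invert3(M)


def lab_f(t):
    return t ** (1 / 3) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4 / 29


def lab_f_inv(t):
    return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)


def rgb_to_lch(r, g, b):
    lin = (r / 255, g / 255, b / 255)
    x, y, z = (sum(m * c for m, c in zip(row, lin)) / w for row, w in zip(M, WHITE))
    fx, fy, fz = lab_f(x), lab_f(y), lab_f(z)
    a, bb = 500 * (fx - fy), 200 * (fy - fz)
    return 116 * fy - 16, math.hypot(a, bb), math.atan2(bb, a)


def lch_to_rgb(L, C, h):
    a, bb = C * math.cos(h), C * math.sin(h)
    fy = (L + 16) / 116
    fx, fz = fy + a / 500, fy - bb / 200
    xyz = [lab_f_inv(t) * w for t, w in zip((fx, fy, fz), WHITE)]
    return tuple(round(255 * sum(m * c for m, c in zip(row, xyz))) for row in M_INV)


def roundtrip_ok(step):
    """True if every sampled colour survives RGB -> real4 LCh -> RGB unchanged."""
    levels = range(0, 256, step)
    for r in levels:
        for g in levels:
            for b in levels:
                lch = [to_real4(v) for v in rgb_to_lch(r, g, b)]
                if lch_to_rgb(*lch) != (r, g, b):
                    return False
    return True
```

Calling roundtrip_ok(1) runs all 16 million colours (slow in Python); a coarser step gives a quick sanity check. A corrected set of equations could be dropped into rgb_to_lch and lch_to_rgb to see whether the same holds for them.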