Author Topic: Fast Compare Real8 with SSE and ColorSpaces  (Read 6313 times)

daydreamer

  • Member
  • ****
  • Posts: 894
  • watch Chebyshev on the backside of the Moon
Re: Fast Compare Real8 with SSE
« Reply #45 on: February 07, 2019, 12:25:39 PM »
Tks a lot, JJ :)  :t :t :t

I'm creating a PDF about the CieLCH routines that describes the algorithms, and I'm trying to figure out some of their limits for use in the backwards conversion from CieLCH to RGB. As soon as I finish, I'll post it here to see if you guys can help me find the correct limits. The goal is to make the backwards calculation more accurate and avoid clipping the final result. And, of course, once it's fixed we can try to optimize it further. I read your fastsin algo, to use later in the CieLCH routines, and it's amazing :)

The next optimizations will be in the sin, cos, atan and power (exponential, logarithm etc) functions :)
It's not good to go from Real8-precision color channels straight to RGB with integer byte-sized channels. I think it would be better to stay with floats, and maybe apply a final antialiasing filter before the final conversion to byte-sized RGB channels.
Maybe try 2 tiles, where the user can choose a bigger or smaller tile size to match their CPU's cache better. 2 tiles = two threads/two cores. Most timing reports here show that most of us have modern CPUs, so why not try that solution?

I also have a question about the minimum reference pic size needed to make it work. Is there any rule you follow when the reference is 4 times bigger than the greyscale image?

Read the PDF. Ouch, too many divisions there.
The threshold depends on the slope. Would it be possible to change the slope calculation to get the inverse slope instead, so you can change to threshold=offset/(gamma-1)*inverseslope ?
Restricting the max/min of gamma can be done using MAXSD and MINSD.
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
what cpu handle "press any key"? any cpu of course(from C#) :D

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #46 on: February 07, 2019, 02:40:08 PM »
Hi daydreamer :)

Quote
It's not good to go from Real8-precision color channels straight to RGB with integer byte-sized channels. I think it would be better to stay with floats, and maybe apply a final antialiasing filter before the final conversion to byte-sized RGB channels.
Maybe try 2 tiles, where the user can choose a bigger or smaller tile size to match their CPU's cache better. 2 tiles = two threads/two cores. Most timing reports here show that most of us have modern CPUs, so why not try that solution?

I prefer first to try to understand how Luminance, Chroma and Hue are related to each other. It is possible to use integers (from 0 to 255) instead of floating point (Real8 or Real4), but in order to do that, those "monster" equations need to be fixed. There's no need for an anti-aliasing filter in the conversion functions. It would slow down the final conversion a lot, and there's no guarantee that the final result would be accurate. About the user's choice of tile size, or using more than 1 thread: I overcame that by using tables for luminance (Kfactor and LumaMap). Kfactor holds only 256 unique values derived from the gamma/offset, pre-calculated before the main function starts. This speeds up the computation a lot, since we no longer need to perform the whole gamma calculation every time we call the RGB to CieLCH function. The same goes for the LumaMap I created, which is a simple table of 256 luminance ranges, each one related to a gray value. Both tables work for the 2 types of operation, i.e. for RGB to CieLCH and for the backward computation (CieLCH to RGB).
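A minimal sketch of the table idea, written in Python rather than assembly just to keep it short. It assumes the standard sRGB constants (gamma 2.4, offset 0.055, threshold 0.04045, slope 12.92); the real Kfactor table is built from the gamma/offset values described in the PDF, which may differ, and the names `srgb_to_linear` and `build_kfactor_lut` are only illustrative.

```python
def srgb_to_linear(v):
    """Linearize one sRGB channel value v in [0, 1].

    Uses the standard sRGB constants, which is an assumption here.
    """
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def build_kfactor_lut():
    """Precompute the linearized value for each of the 256 possible channel bytes."""
    return [srgb_to_linear(i / 255.0) for i in range(256)]


KFACTOR = build_kfactor_lut()

# Per pixel, the gamma step is now three table reads instead of three pow() calls.
r, g, b = 200, 212, 100
linear_rgb = (KFACTOR[r], KFACTOR[g], KFACTOR[b])
```

Since a channel byte can only take 256 values, the table is exact for byte input; nothing is approximated by replacing the power function with a lookup.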

Quote
I also have a question about the minimum reference pic size needed to make it work. Is there any rule you follow when the reference is 4 times bigger than the greyscale image?

Well... I haven't established any limit for the size of an image yet. At least not in the RGB to CieLCH and CieLCH to RGB functions. However, I was forced to set a limit in the algorithm I'm writing to convert gray to color, because if the reference image is big (let's say 2900x1920 etc.) it generates a buffer in memory of something around 2 GB to hold the samples, and on my PC Windows refused to work with files (or memory buffers) of 2 GB :greensml: :greensml: :greensml:
Of course, this was only a preliminary test; I don't plan to create a monster sample file. The plan is to create several different tiny sets of samples. I'm pretty sure that even a small colored image, such as 64x64, can provide enough samples to convert from gray to color, but I haven't reached that part of the development yet.

The importance of the RGB to CieLab algorithm and its reverse is exactly to avoid using unnecessary color combinations. From what I'm concluding so far, there's simply no such thing as 16 million colors. We may have millions of "colors" that are exactly the same thing as another combination of values, and all we need is a way to find them so they can be removed from the final computation. Pixel values are one thing, but colors (in terms of hue/chroma/luma inter-dependencies) are another. Ex: RGB(100,99,40) is the same color as RGB(100,99,48), despite the fact that they differ in 1 byte of the pixel. And we have thousands (if not millions) of those unnecessary color combinations representing the very same color.

I was surprised to see that luminance is responsible for the color that a gray image should have, and even more so when I figured out that a gray is nothing more than a small range of luminance.

So, I'm trying to figure out whether such ranges also exist for chroma and hue. I mean, if we have a luminance range from 88 to 89 (let's say, a gray of 136), I wonder if that range also has specific ranges of chroma and hue to work with. If I did the math correctly (and it is a big IF  :bgrin: :bgrin: :bgrin:) then these limits may exist if the following inequality is accurate:

Chroma  <  [(1000/116)*(Luma+16)] / [5*cos(Hue)-2*sin(Hue)]

The only problem is to see whether there is a limit for chroma (or even hue) for each luminance range as well. (I hope there is :) ) If there is, all that is needed is one or more tables containing those ranges (min and max), and the backwards conversion will be even more accurate (and perhaps faster).
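For anyone who wants to play with the inequality, here is a small Python sketch that evaluates its right-hand side. It only transcribes the formula as posted and says nothing about whether it is the correct limit. The function names are made up for illustration, and returning infinity when the denominator is not positive is an assumption of this sketch (the inequality gives no usable upper bound for those hues).

```python
import math


def chroma_bound(luma, hue_deg):
    """Right-hand side of: Chroma < [(1000/116)*(Luma+16)] / [5*cos(Hue) - 2*sin(Hue)].

    Returns math.inf when the denominator is not positive (assumption of
    this sketch: no upper bound is implied for those hues).
    """
    h = math.radians(hue_deg)
    denom = 5.0 * math.cos(h) - 2.0 * math.sin(h)
    if denom <= 0.0:
        return math.inf
    return (1000.0 / 116.0) * (luma + 16.0) / denom


def in_bounds(luma, chroma, hue_deg):
    """True if the (luma, chroma, hue) triple satisfies the inequality."""
    return chroma < chroma_bound(luma, hue_deg)
```

For example, at Luma = 100 and Hue = 0 the bound is (1000/116)*116/5 = 200, so `in_bounds(100.0, 150.0, 0.0)` is true. A per-luma table of min/max chroma, as suggested above, could be filled by calling `chroma_bound` once per luma range and hue step.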

Quote
The threshold depends on the slope. Would it be possible to change the slope calculation to get the inverse slope instead, so you can change to threshold=offset/(gamma-1)*inverseslope ?
Restricting the max/min of gamma can be done using MAXSD and MINSD.

I already did the backwards computation. I just haven't created the PDF containing the reverse (backwards) algorithm yet, because those limits need to be found first. If there are such limits/boundaries, then the backwards computation will be more accurate and free of clipping.

If necessary, I can try to write the backwards PDF right now, but I think solving those small issues in the algorithm first would be better.

The backwards math is quite trivial, btw. It's basically a multiplication by the inverted matrices, after converting back from Chroma/Hue/Luma to the XYZ colorspace. Once the multiplication is applied, it gives back the same "Kfactor" values (gamma, etc.), and all that is needed is to search for them in their own tables, rather than doing more math to retrieve that exact same color value.

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

  • Member
  • *****
  • Posts: 1859
Re: Fast Compare Real8 with SSE
« Reply #47 on: February 08, 2019, 03:11:58 AM »
First of all, I'm not an expert in this topic and maybe I'm totally missing the point here.  :icon_eek:

Just my thoughts over this,

A = Source Bitmap
B = Reference Bitmap
D = Destination Bitmap
L = Luminance = 256 grey values ( humans are most sensitive to these )
C = Chroma = 256 * 256 = 65536 color values

As far as I understand we are using the L of A and the C from B to construct D.

Then we need to construct a Chroma table from B and calculate the missing Chroma colors to complete the total of 256 Chroma colors in the order of the 256 L (grey) values.
Now we can combine the new Chroma table with the Luminance of A to construct D.

How do we handle different Chroma values with the same Luminance?
Do we average them?

The question is which color space model do we use for this?

My first thought was, because we only deal with a maximum of 256 different pixel colors for the Destination bitmap,
why not calculate the missing Chroma colors by interpolating between the neighbour Chroma colors, using RGB2YUV <-> YUV2RGB....... ( we can do this fast in SIMD with Dot-Matrix calculations )

If we are lucky, all the Luminance values are present in the Reference Bitmap and we don't need any missing color space conversion calculations, only averaging?

The CIE L*C*h Color Space is a very costly and heavy calculation:

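a and b are the cartesian coordinates; Chroma and hue angle are their polar form:
Chroma = sqrt(a*a+b*b)
hue angle = atan2(b, a)   ( plain atan(b/a) loses the quadrant and fails for a = 0 )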

convert back from polar to cartesian coordinates:
a = Chroma*cos(hue angle)
b = Chroma*sin(hue angle)
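The two conversions above, as a short Python sketch (not the optimized assembly being discussed). The function names are illustrative, and the hue is returned in degrees in the range 0 to 360, which is a convention chosen for this example.

```python
import math


def lab_to_lch(L, a, b):
    """Cartesian (a, b) to polar (chroma, hue in degrees, 0 <= hue < 360)."""
    chroma = math.hypot(a, b)
    hue = math.degrees(math.atan2(b, a)) % 360.0
    return L, chroma, hue


def lch_to_lab(L, chroma, hue_deg):
    """Polar (chroma, hue in degrees) back to cartesian (a, b)."""
    h = math.radians(hue_deg)
    return L, chroma * math.cos(h), chroma * math.sin(h)
```

These are exactly the sqrt, atan, cos and sin calls that a Chebyshev approximation or a lookup table would replace.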

We could calculate it on the fly and use Chebyshev polynomials to do the trigonometry calculations....
Maybe we can simplify the trigonometry by using precalculated lookup tables?

If we look at the CIE color space image below, we could precalculate the outer rim color values.
Interpolating between the 4 colors red, yellow, green, blue and red, we will end up with 1024 Chroma values which hold all the Chroma values we need.
So we have 256 (Luma) * 1024 (Chroma) = 262144 colors to choose from to construct the 256 color Destination Bitmap Image.



The fastest way, I think, would be to approximate the coefficients for a Dot-Matrix conversion in both directions.

This weekend I will try to put something together if time permits.
Creative coders use backward thinking techniques as a strategy.

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #48 on: February 08, 2019, 09:01:37 AM »
Hi Siekmanski.

Quote
As far as I understand we are using the L of A and the C from B to construct D.

Then we need to construct a Chroma table from B and calculate the missing Chroma colors to complete the total of 256 Chroma colors in the order of the 256 L (grey) values.
Now we can combine the new Chroma table with the Luminance of A to construct D.

How do we handle different Chroma values with the same Luminance?
Do we average them?

Yes, we are calculating the luminosity from the gray values and comparing it with that of the colored image.

STEP 1 - Building the necessary tables of samples.

I built 2 tables: one to be used as a map of pointers, and the other for the colored pixels themselves.

The 1st step is to map every pixel and convert it to gray (which can be used as an index, for example), luminosity, chroma and hue. I'm showing you here the very basic method of color transfer. I just extended it a bit, to allow other methods to be implemented later, ok?

It is an array of structures in this format:

[PixMap.PixAddress: D$ 0 ; Pointer to the address of the colored pixel. Currently not used, but it may be needed by other color methods, such as searching the neighboring pixels
 PixMap.Gray: D$ 0 ; The gray value of that pixel. For example, RGB = 200, 212, 100 can result in a gray of 150 (just an example; the correct value may be different). So we convert the colored pixel to gray and store its value here
 PixMap.Luma: R$ 0 ; A Real8 value containing the luminosity of the colored pixel
 PixMap.Chroma: R$ 0 ; A Real8 value containing the chroma of the colored pixel
 PixMap.Hue: R$ 0 ; A Real8 value containing the hue of the colored pixel
]

Size of PixMap = 32 bytes (In fact it is bigger, since the current structure contains a few more members, but I'm showing only the basics, because I may remove those other members later during development)

This structure forms one sample, and the samples are what we scan to find the best match for a luminosity.

So, if the image has a size of 640*480, there are 307200 pixels available as potential samples. Thus, we need an array of structures 9830400 bytes long (640*480*32 bytes).
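A quick check of those sizes in Python, assuming D$ is a 32-bit dword and R$ a 64-bit Real8 with no padding between members (the basic 5-member layout shown above, not the larger structure actually in use):

```python
import struct

# D$ = 32-bit dword, R$ = 64-bit Real8 (little-endian, no padding).
# Members in order: PixAddress, Gray, Luma, Chroma, Hue
PIXMAP = struct.Struct("<IIddd")

width, height = 640, 480
sample_count = width * height
buffer_bytes = sample_count * PIXMAP.size

print(PIXMAP.size)    # 32
print(sample_count)   # 307200
print(buffer_bytes)   # 9830400
```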

I then sort this array of structures in ascending order of luminance. (Btw... each gray value corresponds to a range of luminance, so it will be easier to find its correspondence in the map later.)

So, after sorting, the array may look like this:

[PixMap.Data0.PixAddress: D$ XXXX
 PixMap.Data0.Gray: D$ 0 ; Once we sort by luma, the array always starts with gray = 0 too ;) The luma range for gray = 0 is >= 0 and < 2.741066938704112e-1
 PixMap.Data0.Luma: R$ 1.23526e-2
 PixMap.Data0.Chroma: R$ 45.45646 ; In fact, it's closer to 0 here. It's just an example ;)
 PixMap.Data0.Hue: R$ 320.124

 PixMap.Data1.PixAddress: D$ XXXX
 PixMap.Data1.Gray: D$ 0  ; Again... this colored pixel also produces a gray value of 0
 PixMap.Data1.Luma: R$ 2.1e-1
 PixMap.Data1.Chroma: R$ 55.145
 PixMap.Data1.Hue: R$ 40.1454

 PixMap.Data2.PixAddress: D$ XXXX
 PixMap.Data2.Gray: D$ 1  ; Now the colored pixel corresponds to a gray value of 1. In other words, if we convert that pixel to gray, the result is 1. The luma range for gray = 1 is >= 2.741066938704112e-1 and < 5.48213387740825841e-1
 PixMap.Data2.Luma: R$ 2.741066938704112e-1
 PixMap.Data2.Chroma: R$ 114.125
 PixMap.Data2.Hue: R$ 20.2565

 PixMap.Data3.PixAddress: D$ XXXX
 PixMap.Data3.Gray: D$ 1  ; This colored pixel also converts to a gray value of 1
 PixMap.Data3.Luma: R$ 3.46e-1
 PixMap.Data3.Chroma: R$ 15.125
 PixMap.Data3.Hue: R$ 37.256

 PixMap.Data4.PixAddress: D$ XXXX
 PixMap.Data4.Gray: D$ 1  ; This colored pixel also converts to a gray value of 1
 PixMap.Data4.Luma: R$ 5.125e-1
 PixMap.Data4.Chroma: R$ 25.415674
 PixMap.Data4.Hue: R$ 44.689

(...)
; and the array continues until the last pixel, which should correspond to a gray value of 255 (if it exists in the colored image, btw).

 PixMap.Data307199.PixAddress: D$ XXXX
 PixMap.Data307199.Gray: D$ 255  ; This colored pixel corresponds to a gray value of 255. In other words, if we convert that pixel to gray, the result is 255
 PixMap.Data307199.Luma: R$ 100
 PixMap.Data307199.Chroma: R$ 124.2
 PixMap.Data307199.Hue: R$ 45.18
]

So, our colored image produced 307200 samples, and among them we can have 2 related to gray = 0, 12548 related to gray = 1, and so on. We are basically sorting in order to count how many colored pixels generate each gray value. Since we are using the CieLCH colorspace, each gray value generated from a colored pixel always corresponds to a given range of luminosity (also in ascending order). So when you sort the samples by luminosity you don't have to worry about the order of the corresponding grays, because a given luma range always corresponds to one fixed gray value (from 0 to 255).

As explained previously, I found out that luma is directly related to gray, no matter what the pixel combination is. Thus, the 256 gray values always correspond to 256 chunks/ranges of luma. Like this for sRGB D65:

    Gray  Luminosity (Min)
    0       0
    1       2.741066938704112e-1
    2       5.48213387740825841e-1
    3       8.22320081611237263e-1
    4       1.0964267754816519
    5       1.37053346935206344
    6       1.64464016322247808
    7       1.91874685709288939
(...)
    253     99.309586872082832
    254     99.6549222327689391
    255     100
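A hedged Python sketch of how such a table can be generated and searched. It uses the textbook sRGB and CIE L* constants, which is an assumption: with them the upper part of the table matches the values above closely (gray 254 comes out near 99.6549), but the first entries differ slightly in the later decimals, presumably because the PDF derives its own gamma/threshold constants. The names `gray_to_luma`, `LUMA_MIN` and `luma_to_gray` are illustrative only.

```python
import bisect


def gray_to_luma(gray):
    """CIE L* of the sRGB gray (gray, gray, gray), gray in 0..255."""
    v = gray / 255.0
    y = v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4
    # For a neutral gray, Y relative to the white point equals the linear value.
    if y > 216.0 / 24389.0:
        return 116.0 * y ** (1.0 / 3.0) - 16.0
    return (24389.0 / 27.0) * y


# LUMA_MIN[g] is the lower edge of the luma range that maps to gray g.
LUMA_MIN = [gray_to_luma(g) for g in range(256)]


def luma_to_gray(luma):
    """Index g of the range containing luma: LUMA_MIN[g] <= luma < LUMA_MIN[g + 1]."""
    return max(0, bisect.bisect_right(LUMA_MIN, luma) - 1)
```

Because the table is strictly increasing, a binary search finds the gray for any luma in at most 8 comparisons, which is the same reason the sorted sample array never needs a second sort key.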


So, once we have finished creating the samples, we can map them to make the transfer method easier and faster.

To do that, I created a 2nd table containing only 256 structures whose members map the samples. Each structure is formed by 5 dwords.

[LinkedColoredMap.Data0.GrayColor: D$ 0 ; The gray color index. Always starts with 0, corresponding to gray 0. I know, it may be unnecessary, since we can simply index the array directly, but I'm keeping it here for further development
 LinkedColoredMap.Data0.MapPtr: D$ 0 ; Pointer to the start of the samples corresponding to the gray value in the "GrayColor" member. If there is no corresponding sample, this member should be 0
 LinkedColoredMap.Data0.ColorCnt: D$ 0 ; The total number of pixels in the colored image that produce this gray value
 LinkedColoredMap.Data0.NextValidPtr: D$ 0 ; Pointer to the address in the sample array where the next gray value starts. For example, if the "GrayColor" member here is 100, it points to the address in the sample map corresponding to gray 101 (if it exists). If the colored image does not produce any gray = 101, it points to the next valid entry in the sample map, corresponding to gray 102 (also if it exists), and so on. If the GrayColor member here is 255, there is nothing after it (we only have 256 values, from 0 to 255), so this member is 0, meaning we reached the end of the sample map
 LinkedColoredMap.Data0.PrevValidPtr: D$ 0 ; Pointer to the address in the sample array where the previous gray value starts. Same as above, but backwards. For example, if the "GrayColor" member here is 100, it points to the address in the sample map corresponding to gray 99 (if it exists). If the colored image does not produce any gray = 99, it points to the previous valid entry in the sample map, corresponding to gray 98 (also if it exists), and so on. If the GrayColor member here is 0, there is nothing before it, so this member is 0, meaning we reached the start of the sample map
]

In general, this array of structures looks like this:

[LinkedColoredMap:
 LinkedColoredMap.Data0.GrayColor: D$ 0
 LinkedColoredMap.Data0.MapPtr: D$ PixMap.Data0.PixAddress ; Address of the start of the (colored) samples whose gray value is 0. If the colored image doesn't produce any gray pixel of 0, this member should be 0
 LinkedColoredMap.Data0.ColorCnt: D$ 2 ; Only 2 pixels in the colored image produce a gray value of 0
 LinkedColoredMap.Data0.NextValidPtr: D$ PixMap.Data2.PixAddress ; The sample image does have pixels that generate gray = 1, so this points to where they start in the sample map
 LinkedColoredMap.Data0.PrevValidPtr: D$ 0 ; Since we are at the start of the sample map (gray = 0), there is no pixel before it. So the value here is 0

 LinkedColoredMap.Data1.GrayColor: D$ 1
 LinkedColoredMap.Data1.MapPtr: D$ PixMap.Data2.PixAddress
 LinkedColoredMap.Data1.ColorCnt: D$ 12548 ; There are 12548 pixels in the colored image that, after conversion to grayscale, correspond to gray value = 1
 LinkedColoredMap.Data1.NextValidPtr: D$ PixMap.Data12550.PixAddress ; Start of the samples corresponding to gray value = 2 (if they exist; gray 1 occupies entries 2 to 12549). If they do not exist, it points to the samples corresponding to gray = 3, and so on
 LinkedColoredMap.Data1.PrevValidPtr: D$ PixMap.Data0.PixAddress ; Start of the samples corresponding to gray value = 0. If the colored image had no pixel that converts to gray 0, this member would be 0, since there is no gray value before 0

 LinkedColoredMap.Data2.GrayColor: D$ 2
 LinkedColoredMap.Data2.MapPtr: D$ PixMap.Data12550.PixAddress
 LinkedColoredMap.Data2.ColorCnt: D$ 30000
 LinkedColoredMap.Data2.NextValidPtr: D$ XXXX
 LinkedColoredMap.Data2.PrevValidPtr: D$ YYYYY

(...)
; Until the 256th structure
 LinkedColoredMap.Data255.GrayColor: D$ 255
 LinkedColoredMap.Data255.MapPtr: D$ PixMap.DataZZZZZ.PixAddress
 LinkedColoredMap.Data255.ColorCnt: D$ 4545
 LinkedColoredMap.Data255.NextValidPtr: D$ 0 ; There is nothing after it
 LinkedColoredMap.Data255.PrevValidPtr: D$ XXXX
]
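To make the roles of the five members concrete, here is a small Python sketch that builds an equivalent index from a list of samples already sorted by luma. It is not the RosAsm code: it uses list indices where the structure uses pointers, `None` where the structure uses 0, and the function name and dictionary keys are made up for this example.

```python
def build_linked_map(samples):
    """samples: list of (gray, luma, chroma, hue) tuples, already sorted by luma.

    Returns 256 dicts playing the role of the LinkedColoredMap entries, with
    list indices (or None) where the original uses pointers (or 0).
    """
    linked = [{"gray": g, "first": None, "count": 0, "next": None, "prev": None}
              for g in range(256)]

    # First index (MapPtr) and count (ColorCnt) of each run of equal grays.
    for i, (gray, _luma, _chroma, _hue) in enumerate(samples):
        entry = linked[gray]
        if entry["first"] is None:
            entry["first"] = i
        entry["count"] += 1

    # PrevValidPtr: first index of the nearest lower gray that has samples.
    last = None
    for g in range(256):
        linked[g]["prev"] = last
        if linked[g]["count"]:
            last = linked[g]["first"]

    # NextValidPtr: first index of the nearest higher gray that has samples.
    last = None
    for g in range(255, -1, -1):
        linked[g]["next"] = last
        if linked[g]["count"]:
            last = linked[g]["first"]

    return linked
```

Note that in this sketch the entries for grays that are missing from the image still get valid "next" and "prev" links, so a lookup for a missing gray can go straight to its neighbours.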


STEP 2 - Finding the Best match


To find which luminance matches best, all that is needed is to go from the gray value (in the gray image) to its corresponding member in the LinkedColoredMap structure and start searching from there.

For example:

The pixel I want to color in the gray image has the value "1" (from 0 to 255).

What needs to be done is:

1) Find the proper address in the LinkedColoredMap. In this example it is located at "LinkedColoredMap.Data1.GrayColor"

At that address of the array we have the info we need. We know that the colored map does have pixels that produce a value of 1 after conversion. We have a total of 12548 of them :greensml:

But which one to choose? In this basic method, I'm choosing the pixel (among those in the colored map related to gray 1) whose luminosity is closest to the luminosity value of gray 1.

Note: Since gray = 1 has a luminosity range greater than or equal to 2.741066938704112e-1 (minimum) and smaller than 5.48213387740825841e-1 (maximum), I'm looking for the match to the minimum value. This is just a convention. I could just as well look for the average, or the maximum etc., but in fact, for the CieLab/CieLCH colorspaces, it really doesn't matter whether we use the minimum or the maximum of the range, since they always represent the very same color that produces that specific gray, as long as they have the same hue. When we are dealing with pixels of the same hue (or with very, very tiny variations), the difference between the maximum and the minimum of the luminosity range is not enough to make the colors distinguishable. Thus the colors in that range are similar to each other. Any differences in chroma (for the same hue) are also not enough to make them perceptually distinguishable. This is why, mainly by convention, I'm using the minimum value of luminosity.

Continuing... if I'm trying to colorize a gray pixel of 1, I look up the valid correspondence in the sample map through the linked colored map. Thus, I point to PixMap.Data2.PixAddress and then select the very 1st structure of that run.

Why select the 1st and not the other 12547? Because in this basic method, I'm selecting the very 1st one, whose luminosity best approaches gray = 1, in this case the one closest to MinLuma = 2.741066938704112e-1. I'm discarding (in these tests only) all the other 12547 so I can use them later in another method (searching neighbour pixels for the average luminance, or looking at the Standard Deviation, or looking for an exact pattern match, like the exact position of the neighbor pixels etc.).

It then points to the PixMap address corresponding to gray = 1, which is the address at PixMap.Data2.PixAddress. Since the PixMap is already sorted, the luma at PixMap.Data2.Luma will always be the one closest to the one I want to find. In this case, both are "2.741066938704112e-1".

Then I simply grab the values of Chroma and Hue from PixMap.Data2.Chroma and PixMap.Data2.Hue and convert them to color with a CieLCH to RGB function. But... I'm transferring Chroma and Hue, so what about Luma? Which one to choose? The gray one or the colored one? Well... the intention is to preserve the luminosity of the gray image and transfer only the Chroma and Hue from the colored one. So when converting back from CieLCH to RGB, I use the luma of the gray pixel (in the gray image) and the Chroma and Hue from the colored image.

2) What if the colored image doesn't have the same gray value as the one we are trying to colorize?

We then look at the neighbouring gray values and search for the one whose luminosity comes closest. That's why I created the "NextValidPtr" and "PrevValidPtr" members. For example, if I'm trying to colorize a pixel of 1 in the gray image, but the colored image has no such thing as a gray pixel of 1, I start looking at the previous and next valid gray values.

In this situation, I start by following NextValidPtr (a pointer into the color map, as explained) and store its gray value (and luminosity), and do the same for the previous one. Then I calculate the difference to each of them. The one with the smaller difference is the best candidate for colorizing.

Ex:

GrayPixel = 5, but in the colored image the nearest corresponding grays are 1 (from the previous pointer into the PixMap) and 6 (from the next pointer into the PixMap)

So I do:

Delta to prev (absolute value) = |Gray - prev| = |5 - 1| = 4
Delta to next (absolute value) = |Gray - next| = |5 - 6| = 1

Ok, so I know that the value in the colored map that comes closest to the gray one is the "next ptr", whose corresponding gray value is 6. Why? Because "6" is closer to "5" than "1" is. What happens when both deltas are the same? Well... in this basic method, it chooses either of them ;)

Now I simply point to the address in the PixMap corresponding to gray = 6, grab its hue and chroma, and do the transfer in the same way as explained previously.

So, to convert from gray to color I use the luminosity of my gray pixel (in this case: 5, whose minimum luma = 1.37053346935206344) and the values of Chroma and Hue from PixMap.Data6.Chroma/PixMap.Data6.Hue ("PixMap.Data6" is just an example; the address is whatever corresponds to the start of the gray value = 6 run in the PixMap)
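Both cases of STEP 2 can be sketched in a few lines of Python. This is only an illustration of the selection logic described above, not the actual routine: `pick_sample`, the dictionary keys and the tiny hand-built tables are hypothetical, indices stand in for pointers, and on a tie this sketch happens to pick the lower gray (the post leaves the tie open).

```python
def pick_sample(linked, samples, gray):
    """Return the (chroma, hue) to transfer onto a gray pixel.

    linked[gray] has keys "first", "prev" and "next" holding indices into
    samples (or None); samples[i] is a (gray, luma, chroma, hue) tuple.
    """
    entry = linked[gray]
    idx = entry["first"]
    if idx is None:
        # No sample with this gray: compare the nearest lower and higher grays.
        candidates = [i for i in (entry["prev"], entry["next"]) if i is not None]
        if not candidates:
            raise ValueError("the reference image produced no samples")
        # On a tie, min() keeps the first candidate, i.e. the lower gray.
        idx = min(candidates, key=lambda i: abs(gray - samples[i][0]))
    _gray, _luma, chroma, hue = samples[idx]
    return chroma, hue


# The example from the post: gray 5 is missing, its neighbours are gray 1 and gray 6.
samples = [
    (1, 0.28, 114.125, 20.2565),    # index 0: first sample of gray 1
    (6, 1.65, 25.415674, 44.689),   # index 1: first sample of gray 6
]
linked = {
    1: {"first": 0, "prev": None, "next": 1},
    5: {"first": None, "prev": 0, "next": 1},
    6: {"first": 1, "prev": 0, "next": None},
}
chroma, hue = pick_sample(linked, samples, 5)   # |5 - 6| = 1 beats |5 - 1| = 4
```

The luma is deliberately not returned: the colorized pixel keeps the luma of the gray pixel itself and only borrows chroma and hue from the sample.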


Quote
The question is which color space model do we use for this?

CieLCH is the one that produces the most accurate result. The correspondence between luminance and gray is more accurate in the CieLCH/CieLab colorspaces than in YUV, HSL etc.

I'm not using CieLab because I found out that CieLab produces inaccurate results on the conversion back to RGB; that's why I'm trying to fix the problems in CieLCH first. Also, we can use Chroma and Hue for a more accurate method of color transfer, for example when we are looking at the neighbor pixels.

With CieLCH we can substantially reduce the total number of samples to be used, since we can simply discard all samples that turn out to be the very same color. (Even if the RGB values are different, they may represent the same color.) It's the difference between using 307200 samples and using only 1000 of them.

Also, with the CieLCH colorspace we can produce a more accurate image. Since we have the ability to remove all duplicated colors, we can simply create new ones from their neighbors if needed. This may be helpful to enhance the general quality of an image, or even for tracking techniques in video processing. For tracking, pattern recognition, motion estimation etc., once we remove duplicated colors it will be easier to search for similar ones in the next frame of a video, for example.

Quote
We could calculate it on the fly and use Chebyshev polynomials to do the trigonometry calculations....
Maybe we can simplify the trigonometry by using precalculated lookup tables?

Yes, this is what I did with the KfactorMap, as explained in the PDF. Almost 60% of the computation is done before the main RGB to CieLCH and CieLCH to RGB functions start. It builds the necessary lookup tables. The only thing I haven't managed to optimize is the sin/cos/atan functions, but that can be done with the Chebyshev algorithms you wrote (a fast atan using that method is needed too, if possible ;) ). This would increase the speed a lot. With the LUTs I already gained a major advantage in terms of speed, since I completely removed the math involving the power/logarithm functions and replaced it with a simple search in the LUT, as provided by JJ (which is amazingly fast, btw. Sure, it still has one minor issue, but he is working on it :)  )

As I mentioned, in my preliminary tests of colorizing with the "normal" method people use, the rendering was taking more than half an hour to colorize one single image (960*720), and now it colorizes almost immediately (it takes 1 or 2 seconds, already counting the time it takes to compute the tables etc., which can also be done when the app starts or, even better, by making the algorithm load external files with samples that were previously created by the user). Using external files may speed things up a lot too, since all the algorithm has to do is point to the necessary address in the loaded file and voilà :greensml:

Quote
If we look at the CIE color space image below, we could precalculate the outer rim color values.
Interpolating between the 4 colors red, yellow, green, blue and red, we will end up with 1024 Chroma values which hold all the Chroma values we need.
So we have 256 (Luma) * 1024 (Chroma) = 262144 colors to choose from to construct the 256 color Destination Bitmap Image.

Maybe, but the problem with the CIE colorspaces (CieLab or CieLCH) is that they produce inaccurate results on the backwards computation (CieLCH to RGB). If you feed the function with your own Luma or new Chroma/Hue, the resulting RGB values will not be accurate, since they will inevitably be clipped. That's why I'm trying to figure out a way to know the limits of, or the relation between, Hue/Chroma and Luma, to prevent such clipping. So far, I guess I was correct in finding one of the relations, which is:

Chroma  <  [(1000/116)*(Luma+16)] / [5*cos(Hue)-2*sin(Hue)]

But... I'm not sure yet. It seems that for each Luma range (which also corresponds to the ranges of gray, as I explained) we can have a similar range for Chroma (or even Hue). In that case it would be better to find their limits. But I'm still clueless on how to do that. There is no single paper describing how to fix it. They all use the CIE papers as if they were absolutely correct (which is not the case btw, as demonstrated by Bruce Lindbloom at http://www.brucelindbloom.com/index.html?LContinuity.html)
« Last Edit: February 08, 2019, 10:50:22 AM by guga »

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #49 on: February 09, 2019, 03:07:02 AM »
Hi Guys....

I may have found the limits for the CieLCH colorspace. If I did the math correctly, the limits for Hue, Chroma and Luma are the following:

LumaMin = 0
LumaMax = 100

ChromaMin = 0
ChromaMax  = 185.6953381770518631465762238462182605619006952291306495751858162074705 (This is the maximum allowed value. The true computed limit for sRGB is a bit smaller, but it follows from the same equations)


Since we have a maximum for Chroma and Luma, Hue was supposed to have a maximum value too.
HueMax = 338.1985905136481882297551339130563354323346227336940419663°

This value coincides with the limit for gray. Whenever a pixel is gray it has one of these 2 hues:
Hue when gray = 158.198590513648188229755133913056335432334622733694041966°
Hue when gray = 338.1985905136481882297551339130563354323346227336940419663°

Also, we have impossible values of Hue, where the formula divides by zero. These 2 angles should never be used:
68.1985905136481882297551339130563354323346227336940419663° (degrees) (Orange)
248.1985905136481882297551339130563354323346227336940419662° (degrees) (Blue)

Also, from the limits I found, Hue was supposed to cover all angles (except those 2 that should never be used), but it has gaps and values beyond the limit.

When we are dealing with maximum Chroma and Luma, Hue was supposed to be limited to 338.1985905136481882297551339130563354323346227336940419663° (the maximum angle, which is related to gray), but in fact we do have pixels with a Hue of 359° or 348°, due to a failure in the general formula used for the CieLab/CieLCH colorspaces. This failure causes the equations to clip the resulting RGB values, which, to some extent, means that we are retrieving other colors when we try to convert back.
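The constants above can be reproduced from the denominator of the chroma inequality, 5*cos(Hue) - 2*sin(Hue). The Python sketch below only checks that arithmetic (to double precision, not to the 60-odd digits quoted); it does not show that these are the true limits of the colorspace, and the variable names are made up for the example.

```python
import math


def denom(hue_deg):
    """Denominator of the chroma inequality: 5*cos(h) - 2*sin(h)."""
    h = math.radians(hue_deg)
    return 5.0 * math.cos(h) - 2.0 * math.sin(h)


# The denominator is zero where tan(h) = 5/2.
hue_zero = math.degrees(math.atan2(5.0, 2.0))   # about 68.19859 degrees
hue_zero_opposite = hue_zero + 180.0            # about 248.19859 degrees

# It reaches its largest value, sqrt(29), a quarter turn before the zero.
hue_max = (hue_zero + 270.0) % 360.0            # about 338.19859 degrees

# Chroma bound at Luma = 100 with the largest denominator: 1000 / sqrt(29).
chroma_max = (1000.0 / 116.0) * (100.0 + 16.0) / math.sqrt(29.0)  # about 185.69534

print(hue_zero, hue_zero_opposite, hue_max, chroma_max)
```

So the two "never use" angles are the zeros of the denominator, while 338.1986° and 158.1986° are where it reaches +sqrt(29) and -sqrt(29).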


I'll see how to handle those limits for Hue properly after analysing the returned values for all 16 million colors. Most likely, if I limit the hue angle to a max of 338° and a min of 68°, I may be able to recover the colors inside this gap. Or I could simply allow the conversion of this gap at the end, recomputing the chroma and luma to avoid clipping. I'm still thinking about the best strategy to handle those colors whose Hue angles should never be there in the first place.
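
A minimal sketch for checking such limits empirically, assuming the standard sRGB to XYZ matrix and a D65 white point (the constants and the derivation in the PDF may differ; `rgb_to_lch` and the helper names are made up for this illustration). It is written in Python rather than RosAsm so that it can be run as is:

```python
import math

# D65 reference white (2 degree observer). This is an assumption; the PDF may use other values.
XN, YN, ZN = 0.95047, 1.0, 1.08883


def srgb_to_linear(c8):
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lab_f(t):
    """The piecewise cube-root function used by CIE Lab."""
    d = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > d ** 3 else t / (3.0 * d * d) + 4.0 / 29.0


def rgb_to_lch(r8, g8, b8):
    """Convert 8-bit sRGB to (Luma, Chroma, Hue in degrees)."""
    r, g, b = (srgb_to_linear(v) for v in (r8, g8, b8))
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = lab_f(x / XN), lab_f(y / YN), lab_f(z / ZN)
    luma = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b_ = 200.0 * (fy - fz)
    chroma = math.hypot(a, b_)
    hue = math.degrees(math.atan2(b_, a)) % 360.0
    return luma, chroma, hue
```

Looping this over all 256 x 256 x 256 triples and recording the min/max of Luma, Chroma and Hue is one way to compare the computed limits with what actually occurs. Note that for gray pixels a and b are both (nearly) zero, so the hue returned by atan2 is numerically meaningless there.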

daydreamer

  • Member
  • ****
  • Posts: 894
  • watch Chebyshev on the backside of the Moon
Re: Fast Compare Real8 with SSE
« Reply #50 on: February 09, 2019, 05:06:10 AM »
Looking forward to what you make, Siekmanski.

I have decided to work on these kinds of macros and also make ***PD versions of them:
http://masm32.com/board/index.php?topic=6802.0
Maybe I also need to make inttofloat and floattoint, and maybe some MMX/SSE2 integer ones.
Maybe it would be good to have some image-oriented macros too, for example macros to load and store rectangles of pixels?
Any suggestions for useful macros are welcome.
Maybe they could help speed up colorization.

« Last Edit: February 09, 2019, 06:06:53 AM by daydreamer »

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #51 on: February 09, 2019, 06:36:01 AM »
Macros for SSE are needed, indeed :t :t :t

Especially ones related to conditional macros such as "If", "Else", "EndIf", "Loop", "While" etc. Do you have some macro examples using SSE for conditional jumps?

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #52 on: February 12, 2019, 05:47:47 AM »
Hi Jochen

This one is for 64 bit. Do you have some info about 80 bit (Real10)?

The bugs in my version are probably because I forgot to include a routine to check for numbers outside the range of the table
No, that's not a problem since we only feed valid numbers. The check can be omitted.

Quote
So, it could compare the index value found at eax, with the LOWORD of the HIDWORD of the next value (or previous one). If it also matches, then we have found the proper values.
Yes, something like that. But in most cases it's not necessary, since the numbers will be "distant" enough anyway. We are checking the high DWORD, i.e. the one that contains the exponent and part of the mantissa:

That "part" is 32-11-1=20 bits, or about 6-7 digits. Should be sufficient in most cases ;-)
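
To illustrate the "32-11-1=20 bits" remark, here is a small sketch in Python (not the thread's assembly; `high_dword` and `fields` are names invented for this example) that splits the high DWORD of a REAL8 into sign, exponent and the top 20 mantissa bits:

```python
import struct


def high_dword(x):
    """Return the upper 32 bits of the IEEE-754 double x as an integer."""
    _lo, hi = struct.unpack("<II", struct.pack("<d", x))
    return hi


def fields(x):
    """Split the high DWORD into (sign, biased exponent, top 20 mantissa bits)."""
    hi = high_dword(x)
    sign = hi >> 31                 # 1 bit
    exponent = (hi >> 20) & 0x7FF   # 11 bits
    top_mantissa = hi & 0xFFFFF     # 32 - 11 - 1 = 20 bits
    return sign, exponent, top_mantissa
```

Two doubles that differ only below those 20 mantissa bits (for example 1.0 and 1.0 + 2**-30) share the same high DWORD, which is the case where a second comparison on the low DWORD would be needed.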


I'm rewriting some old code to better display the category of all FPU errors (when present) for Real10. It was ok so far, but there is probably a little bug when it tries to identify values such as:
[Value1: D$ 0, 0FFFFFFFF W$ 0] ; This number exceeds the limit for FPU. It is converted to 6.724206...e-4932. Does it exceed the limit?
[Value2: B$ 0FE, 07F, 0, 0, 0C0, 07F, 0, 0, 0, 0] ; This number exceeds the limit for FPU. It is converted to 5.1201424......e-4937. Does it exceed the limit?

Are those values/limits correct?

Code: [Select]

[SpecialFPU_QNAN 1] ; QNAN
[SpecialFPU_SNAN 2] ; SNAN
[SpecialFPU_NegInf 3] ; Negative Infinite
[SpecialFPU_PosInf 4] ; Positive Infinite
[SpecialFPU_Indefinite 5] ; Indefinite
[SpecialFPU_SpecialIndefQNan 6] ; Special INDEFINITE QNAN
[SpecialFPU_SpecialIndefSNan 7] ; Special INDEFINITE SNAN
[SpecialFPU_SpecialIndefInfinite 8] ; Special INDEFINITE Infinite

Proc RealTenFPUNumberCategory:
    Arguments @Float80Pointer
    Local @FPUErrorMode
    Uses edi, ebx


    mov ebx D@Float80Pointer
    mov D@FPUErrorMode &FALSE

    ...If_And W$ebx+8 = 0, D$ebx+4 = 0 ; This is denormalized, but it is possible.
        ; 0000 00000000 00000000
        ; 0000 00000000 FFFFFFFF
    ...Else_If_And W$ebx+8 = 0, D$ebx+4 > 0 ; This is Ok.
        ; 0000 00000001 00000000
        ; 0000 FFFFFFFF FFFFFFFF
    ...Else_If_And W$ebx+8 > 0, W$ebx+8 < 07FFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .If D$ebx+4 < 080000000
            .Test_If D$ebx+4 040000000
                ; QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_QNAN
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; SNAN only if at least 1 bit is set
                    mov D@FPUErrorMode SpecialFPU_SNAN
                Else ; All fraction Bits are 0
                    ; Bit 15 is never reached. The bit is 0 from W$ebx+8
                    ; -INFINITE ; Bit15 = 0
                    mov D@FPUErrorMode SpecialFPU_NegInf
                End_If
            .Test_End
        .End_If
    ...Else_If W$ebx+8 = 07FFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .Test_If D$ebx+4 040000000
            ; QNAN 40000000
            mov D@FPUErrorMode SpecialFPU_QNAN
        .Test_Else
            If_And D$ebx+4 > 0, D$ebx > 0
                ; SNAN only if at least 1 bit is set
                mov D@FPUErrorMode SpecialFPU_SNAN
            Else ; All fraction Bits are 0
                ; Bit 15 is never reached. The bit is 0 from W$ebx+8
                ; -INFINITE ; Bit15 = 0
;               Test_If W$ebx+8 = 0FFFF ; we need to see if Bit 15 is set
 ;                  ; -INFINITE ; Bit15 = 0
  ;             Test_Else
   ;                ; +INFINITE ; Bit15 = 1
    ;           Test_End
                ;mov D$edi '-INF', B$edi+4 0
                mov D@FPUErrorMode SpecialFPU_NegInf
            End_If
        .Test_End
        ; Below is similar to W$ebx+8 = 0
    ...Else_If_And W$ebx+8 = 08000, D$ebx+4 = 0 ; This is denormalized, but possible.
        ; 8000 00000000 00000000 (0)
        ; 8000 00000000 FFFFFFFF (-0.0000000156560127730E-4933)
    ...Else_If_And W$ebx+8 = 08000, D$ebx+4 > 0 ; This is Ok.
        ; 8000 01000000 00000000 (-0.2626643080556322880E-4933)
        ; 8000 FFFFFFFF 00000001 (-6.7242062846585856000E-4932)
    ...Else_If_And W$ebx+8 > 08000, W$ebx+8 < 0FFFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .If D$ebx+4 < 080000000
            .Test_If D$ebx+4 040000000
                ; QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_QNAN
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; SNAN only if at least 1 bit is set
                    ;mov D$edi 'SNaN', B$edi+4 0
                    mov D@FPUErrorMode SpecialFPU_SNAN
                Else ; All fraction Bits are 0
                    ; Bit 15 is always reached. The bit is 1 from W$ebx+8
                    ; +INFINITE ; Bit15 = 1
                    ;mov D$edi '+INF', B$edi+4 0
                    mov D@FPUErrorMode SpecialFPU_PosInf
                End_If
            .Test_End
        .End_If

    ...Else_If W$ebx+8 = 0FFFF ; This branch identifies indefinite or other NaN values

        .If_And D$ebx+4 >= 040000000, D$ebx = 0
            ; INDEFINITE
            mov D@FPUErrorMode SpecialFPU_Indefinite
        .Else
            .Test_If D$ebx+4 040000000
                ; Special INDEFINITE QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_SpecialIndefQNan
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; Special INDEFINITE SNAN only if at least 1 bit is set
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefSNan
                Else ; All fraction Bits are 0
                    ; Bit 15 is always reached. The bit is 1 from W$ebx+8
                    ; Special INDEFINITE +INFINITE ; Bit15 = 1
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefInfinite
                End_If
            .Test_End
        .End_If
    ...End_If

    ..If D@FPUErrorMode <> 0

        On B$edi-1 = '-', dec edi

        .If D@FPUErrorMode = SpecialFPU_QNAN
            push esi | zcopy {"QNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SNAN
            push esi | zcopy {"SNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_NegInf
            push esi | zcopy {"-INFINITE ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_PosInf
            push esi | zcopy {"+INFINITE ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_Indefinite
            push esi | zcopy {"INDEFINITE ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SpecialIndefQNan
            push esi | zcopy {"Special INDEFINITE QNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SpecialIndefSNan
            push esi | zcopy {"Special INDEFINITE SNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SpecialIndefInfinite
            push esi | zcopy {"Special INDEFINITE +INFINITE ", 0} | pop esi
            mov B$edi 0
        .End_If

    ..End_If

    mov eax D@FPUErrorMode

EndP
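
For cross-checking the routine above, here is a hedged sketch in Python (not RosAsm) that classifies a 10-byte Real10 using the encoding table for the x87 extended format as given in the Intel manuals and on Wikipedia. The function names and category strings are invented for this example, and it is not a translation of the procedure above:

```python
from decimal import Context, Decimal

CTX = Context(prec=30)  # enough digits for printing; exponent range is the decimal default


def split_real10(raw):
    """raw: 10 bytes as stored in memory (little-endian)."""
    mantissa = int.from_bytes(raw[0:8], "little")    # bits 0..63, bit 63 = explicit integer bit
    sign_exp = int.from_bytes(raw[8:10], "little")   # bit 15 = sign (bit 79 overall)
    return sign_exp >> 15, sign_exp & 0x7FFF, mantissa


def classify_real10(raw):
    sign, exponent, mantissa = split_real10(raw)
    integer_bit = mantissa >> 63
    quiet_bit = (mantissa >> 62) & 1
    rest = mantissa & ((1 << 62) - 1)
    if exponent == 0:
        if mantissa == 0:
            return "zero"
        return "pseudo-denormal" if integer_bit else "denormal"
    if exponent < 0x7FFF:
        return "normal" if integer_bit else "unnormal"
    # From here on the exponent field is 0x7FFF.
    if not integer_bit:
        return "pseudo-infinity" if (quiet_bit == 0 and rest == 0) else "pseudo-NaN"
    if quiet_bit == 0:
        return "infinity" if rest == 0 else "SNaN"
    if rest == 0 and sign == 1:
        return "indefinite"
    return "QNaN"


def real10_value(raw):
    """Decimal value of a finite Real10 (zero, denormal or normal)."""
    sign, exponent, mantissa = split_real10(raw)
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN has no finite value")
    # An exponent field of 0 is scaled like a field of 1.
    power = max(exponent, 1) - 16383 - 63
    magnitude = CTX.multiply(Decimal(mantissa), CTX.power(Decimal(2), Decimal(power)))
    return magnitude.copy_negate() if sign else magnitude


# The two byte patterns from the post, in memory order.
VALUE1 = bytes.fromhex("00000000ffffffff0000")
VALUE2 = bytes([0xFE, 0x7F, 0, 0, 0xC0, 0x7F, 0, 0, 0, 0])
```

With the two byte patterns from the post, this sketch reports a pseudo-denormal near 6.7242e-4932 for Value1 and a denormal near 5.1201e-4937 for Value2, which agrees with the converted values quoted above. Both have an exponent field of 0, so they lie below the normalized range but are still representable.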

TimoVJL

  • Member
  • ***
  • Posts: 408
Re: Fast Compare Real8 with SSE
« Reply #53 on: February 12, 2019, 06:15:11 AM »
May the source be with you

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #54 on: February 12, 2019, 06:24:23 AM »
Tks, TimoVJL

I read this, but didn't fully understand it.

The image shows an 80 bit value, but the sign bit is marked as "Bit 63", and bit 63 is the sign bit for 64 bit values (Real8). The sign bit in Real10 is at bit 79.

I'm trying to find a similar scheme showing the Pseudo-Infinity etc., but for 80 bit.

jj2007

  • Member
  • *****
  • Posts: 9635
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Compare Real8 with SSE
« Reply #55 on: February 12, 2019, 06:49:24 AM »
This one is for 64 bit. Do you have some info about 80 bit (Real10)?

https://en.wikipedia.org/wiki/Extended_precision:


Note that in contrast to single and double, the REAL10 does have an explicit integer bit (more).

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #56 on: February 12, 2019, 12:19:06 PM »
Tks again, JJ

I read them both, but there is one thing I still don't understand. Raymond says that the limit for Real10 starts at 3.36×10^-4932, whereas the wiki says it is 3.65×10^-4951. So, what is the real limit? I mean, if e-4951 is the true limit, then I presume my function was correct in converting them; if not, I'm clueless about what I could be doing wrong.

HSE

  • Member
  • *****
  • Posts: 1079
  • <AMD>< 7-32>
Re: Fast Compare Real8 with SSE
« Reply #57 on: February 12, 2019, 01:09:08 PM »
Just to contribute to the confusion: that depends on whether you are talking about signed or unsigned REAL10.

guga

  • Member
  • *****
  • Posts: 1041
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #58 on: February 12, 2019, 01:56:16 PM »
Oh my God. :dazzled: :dazzled: :dazzled: Holy Shmoly Batman :icon_mrgreen: :icon_mrgreen: :icon_mrgreen: I give up! I'll open a bottle of beer and won't stop drinking until next Monday :icon_mrgreen: :icon_mrgreen: :icon_mrgreen: Errr....not the same beer :icon_rolleyes: :icon_rolleyes: :icon_rolleyes: :bgrin: :bgrin: :bgrin: :bgrin:

jj2007

  • Member
  • *****
  • Posts: 9635
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Compare Real8 with SSE
« Reply #59 on: February 12, 2019, 07:09:09 PM »
Raymond says that the limit for Real10 starts at 3.36×10^-4932, whereas the wiki says it is 3.65×10^-4951. So, what is the real limit?

Wiki says "The 80-bit floating point format has a range (including subnormals) from approximately 3.65×10^-4951 to 1.18×10^4932". Indeed an interesting difference 8)
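
The two figures can be reproduced with a short calculation. The sketch below (Python with the standard `decimal` module; the variable names are made up for this example) assumes the usual x87 layout with a 15-bit exponent biased by 16383 and a 64-bit mantissa with an explicit integer bit:

```python
from decimal import Context, Decimal

CTX = Context(prec=20)


def pow2(e):
    """2**e as a Decimal, since these magnitudes underflow or overflow a REAL8."""
    return CTX.power(Decimal(2), Decimal(e))


# Exponent field 1, only the integer bit set: smallest normalized value.
smallest_normal = pow2(1 - 16383)
# Exponent field 0, only mantissa bit 0 set: smallest denormal.
smallest_denormal = pow2(1 - 16383 - 63)
# Exponent field 0x7FFE, all 64 mantissa bits set: largest finite value.
largest_finite = CTX.multiply(Decimal(2**64 - 1), pow2(0x7FFE - 16383 - 63))

print(smallest_normal, smallest_denormal, largest_finite)
```

So both sources are consistent: about 3.36×10^-4932 is the smallest normalized Real10, while about 3.65×10^-4951 is the smallest value once denormals are included.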