Author Topic: Fast Compare Real8 with SSE and ColorSpaces  (Read 6869 times)

jj2007

  • Member
  • *****
  • Posts: 9686
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Compare Real8 with SSE
« Reply #120 on: February 27, 2019, 03:49:33 AM »
Quote
JJ, your Fcmp macro is actually a function, right? movups loads 4 Real4 at once, is that it?

A macro combined with a function that expects two numbers on the FPU.

  SetFloat xmm0=1234567890.1234567
  SetFloat MyR4=1234567890.1234567      ; MyR4 is a REAL4
  deb 4, "Comparing two floats:", MyR4, f:xmm0
  Fcmp MyR4, f:xmm0, high      ; compare the two with high precision (top would be strict, medium more tolerant)

This creates the following under the hood:
Code:
0040143F      DDC7              ffree st(7)                          ; float 1234567890.1234567000
00401441      83EC 08           sub esp, 8                           ; create a REAL8 slot
00401444      0F130424          movlps [esp], xmm0                   ; store low 64 bits of xmm0 into the slot
00401448      DD0424            fld qword ptr [esp]                  ; push value on FPU
0040144B      83C4 08           add esp, 8                           ; restore stack
0040144E      DDC7              ffree st(7)
00401450      D943 8A           fld dword ptr [ebx-76]               ; MyR4, the REAL4
00401453      6A 2D             push 2D                              ; requested precision
00401455      E8 46100000       call MbFloatCmp                      ; Fcmp.MbFloatCmp
0040145A      75 2B             jnz short C0004
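
For readers without MasmBasic, the idea of comparing a REAL4 against a REAL8 with a chosen tolerance can be sketched in Python. This is only an illustration: `to_real4`, `float_cmp` and the `rel_tol` values are made-up names and numbers, they do not correspond to MasmBasic's precision levels, and `MbFloatCmp` itself may work quite differently.

```python
import math
import struct

def to_real4(value):
    """Round a Python float (a REAL8) to the nearest REAL4 and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def float_cmp(a, b, rel_tol):
    """Return 0 if a and b agree within rel_tol, else -1 (a < b) or 1 (a > b)."""
    if math.isclose(a, b, rel_tol=rel_tol):
        return 0
    return -1 if a < b else 1

my_r8 = 1234567890.1234567      # the value held in xmm0 as a REAL8
my_r4 = to_real4(my_r8)         # a REAL4 keeps only 24 significant bits

print(my_r4)                            # 1234567936.0
print(float_cmp(my_r4, my_r8, 1e-6))    # 0: equal at a tolerant setting
print(float_cmp(my_r4, my_r8, 1e-9))    # 1: the REAL4 was rounded upwards
```

The same pair of numbers is "equal" or "greater" depending only on the tolerance, which is why a compare routine for mixed REAL4/REAL8 operands needs a precision argument at all.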

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #121 on: February 27, 2019, 03:55:29 AM »
Quote
For 2 real8 you have 2 sign states in eax, which gives you 4 possibilities ( movmskpd )
For 4 real4 you have 4 sign states in eax, which gives you 16 possibilities ( movmskps )

This gives you also the possibility to use a jump table to execute code without branching.
For RGB conversion you will be able to handle 3 sign bits ( 8 possibilities ).

So, 1 compare makes it possible to execute the code at once without branching.

Tks a lot, Marinus. I'll give it a try this afternoon after reading the Intel manual about the basics of SSE.
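
The mask-plus-jump-table idea in the quote can be modelled in Python. This is a sketch of the control flow only: in assembly the mask would come from a packed compare followed by movmskps and the dispatch would be an indirect jump, whereas here `sign_mask`, `make_handler` and `JUMP_TABLE` are hypothetical names and the mask is built with an ordinary loop. The threshold and the two formulas are the ones from the xyz2lab pseudo code discussed in this thread.

```python
THRESHOLD = 0.008856

def cube_root(t):
    return t ** (1.0 / 3.0)

def linear(t):
    return t * 7.787 + 16.0 / 116.0

def sign_mask(lanes, threshold=THRESHOLD):
    """Model of compare + movmskps: bit i is set when lane i is above the threshold."""
    mask = 0
    for i, value in enumerate(lanes):
        mask |= (value > threshold) << i
    return mask

def make_handler(mask):
    """Build the code path for one mask value: cube root where the bit is set."""
    ops = [cube_root if (mask >> i) & 1 else linear for i in range(3)]
    return lambda lanes: tuple(op(v) for op, v in zip(ops, lanes))

# 3 sign bits give 8 table entries, indexed directly by the mask
JUMP_TABLE = [make_handler(mask) for mask in range(8)]

def convert(lanes):
    """One compare, one table lookup, no per-channel if/else."""
    return JUMP_TABLE[sign_mask(lanes)](lanes)

print(sign_mask((0.5, 0.001, 0.5)))   # 5 (binary 101)
print(convert((1.0, 0.0, 1.0)))
```

Each table entry is a complete code path for one combination of the three comparisons, so the per-channel tests happen once, when the mask is formed.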


Quote
For example this piece of pseudo code (you can see I'm learning xyz2lab  :biggrin: ) where you can compare red green and blue at once and jump to 1 of the 6 possibilities.
:greensml: :greensml: :greensml: :greensml: :greensml:
It's cool, right? :icon_mrgreen: :icon_mrgreen:

Oh... just keep in mind that the equation in CIE to convert XYZ to Lab is incorrect. Bruce made the initial fix and I fixed the rest.

All you have to do is multiply immediately after it checks the threshold at:

Quote
  if (x > 0.008856) x = pow(x, 1.0 / 3.0)
   else x = (x * 7.787) + (16.0 / 116.0)

The resultant x value must be multiplied with this:


Or, in a more simplified way, multiply x with:


Probably it will result in a greenish image if you don't adjust "a" and "b" to also stay within the range of luma, but this is (in fact) what the CieLab equations are actually doing.

It's only the final "x" that is incorrect. This fix will also work once you convert it to CieLCH.
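
For reference, the threshold step quoted in this post belongs to the standard CIE XYZ to Lab conversion, sketched below in Python. The sketch shows only the textbook formulas; it does not include the extra multiplication described in this post, because the factor is not reproduced here. The function names and the D65 white point are assumptions made for the example.

```python
D65_WHITE = (0.95047, 1.0, 1.08883)   # assumed reference white (Xn, Yn, Zn)

def f(t):
    """The piecewise step from the quoted pseudo code."""
    if t > 0.008856:
        return t ** (1.0 / 3.0)
    return t * 7.787 + 16.0 / 116.0

def xyz_to_lab(x, y, z, white=D65_WHITE):
    """Standard CIE XYZ -> L*a*b*, with XYZ scaled so that white has Y = 1."""
    fx = f(x / white[0])
    fy = f(y / white[1])
    fz = f(z / white[2])
    lightness = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return lightness, a, b

print(xyz_to_lab(*D65_WHITE))   # (100.0, 0.0, 0.0)
```

The reference white maps to L = 100 with a = b = 0, which is a quick sanity check for any modified version of the equations.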
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Re: Fast Compare Real8 with SSE
« Reply #122 on: February 27, 2019, 04:27:15 AM »
Great work. :t :t :t :t

In the end, what it is doing is comparing the values on the FPU rather than with SSE, right?

I was thinking of something similar to the "If" macro (same as in the MASM version) or the "Fpu_If" macro (in RosAsm; I don't know if there is a similar one for MASM), but using the SSE instructions more directly, similar to what you did with the comisd instruction.

jj2007

Re: Fast Compare Real8 with SSE
« Reply #123 on: February 27, 2019, 05:36:26 AM »
Quote
Great work. :t :t :t :t

Thanks :bgrin:

Quote
In the end, what it is doing is comparing the values on the FPU rather than with SSE, right?

Yes, more or less. There was a long thread discussing various options some years ago.

If you want the real fun, try this:

include \masm32\MasmBasic\MasmBasic.inc         ; download
  SetGlobals temp:REAL10
  Init
  SetFloat temp= 123.45678901234567891   ; temp is a REAL10
  SetFloat ST(0)=123.45678901234567890
  deb 4, "Comparing two floats:", ST(0), temp
  Fcmp temp, ST(0), xtra
  .if Zero?
        Print Str$("%Jf==", ST(0)), Str$(temp)
  .elseif FcmpLess
        PrintLine Str$("temp %Jf", temp), Chr$(60), Str$(ST(0))
  .else
        PrintLine Str$("temp %Jf", temp), Chr$(62), Str$(ST(0))
  .endif
  Inkey CrLf$, "comparing floats is fun!!!!"
EndOfCode


Output:
Code:
Comparing two floats:
ST(0)           123.4567890123456789
temp            123.4567890123456789
temp 123.4567890123456789>123.4567890123456789

guga

Re: Fast Compare Real8 with SSE
« Reply #124 on: February 27, 2019, 07:39:07 AM »
Good news :)

The first tests converting Luma to integer went OK. As a result of the conversion, when I compensate Luma to be adjusted to Chroma, there are no more errors whatsoever from extrapolating past the sqrt(29) limit. It seems that the compensation ratio adjusts Chroma and Hue in such a way that they will always be within their boundaries.

I'll make a minor test on the backward function. It does need to calculate the reversed compensation ratio, but that's no problem; it won't affect performance. The limits need to be checked anyway, so a pointer to another member of the structure won't make any significant difference in terms of speed, and any eventual loss of speed can be compensated for later, when I try converting it to SSE and Real4 to be used in the matrix. I just hope we won't lose accuracy after converting all variables to Real4 rather than Real8.

The errors in the algorithm, when they show up, play a kind of hide and seek :greensml: :greensml: :greensml:


One question: how do I calculate the probability or density of a certain range of data?

This is to analyse what CieLCH is doing with the pixels. I mean, after the fixes, I succeeded in mapping all 16 million pixels onto another table from 0 to 255 (also based on the luma), and figured out that the distribution is almost proportional (or makes a sine curve, perhaps?) all over the table.

For example:
Gray/Luma = 0, total pixels = 38
Gray/Luma = 1, total pixels = 250
Gray/Luma = 2, total pixels = 330
....
Gray/Luma = 142, total pixels = 128694
after that it starts decreasing again.
Gray/Luma = 143, total pixels = 108965
Gray/Luma = 144, total pixels = 107555
...
Gray/Luma = 255, total pixels = 0

It seems that I succeeded in adjusting the CieLab equations to behave as a parabolic cylinder, but the midpoint seems to be at 142 and the decreasing values do not match those on the opposite side.

So, how do I "equalize" the distribution of a range of data so that it results in something like this:

Gray/Luma = 0, total pixels = 38
Gray/Luma = 1, total pixels = 250
Gray/Luma = 2, total pixels = 330
....
Gray/Luma = 126, total pixels = 100000
Gray/Luma = 127, total pixels = 128694 ; midpoint
Gray/Luma = 128, total pixels = 128694 ; midpoint
after that it starts decreasing again.
Gray/Luma = 129, total pixels = 100000
...

Gray/Luma = 253, total pixels = 330
Gray/Luma = 254, total pixels = 250
Gray/Luma = 255, total pixels = 38
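
On the "probability or density" question: with a table of counts per luma level, the probability of a level is its count divided by the total, and the probability of a range is the sum over that range. A minimal Python sketch follows; the function names are made up, and the four-entry table is hypothetical sample data (the first three counts are borrowed from the list above, the fourth is invented so that the total is 1000).

```python
def density(counts):
    """Probability of each level: count / total number of pixels."""
    total = sum(counts)
    return [c / total for c in counts]

def range_probability(counts, lo, hi):
    """Probability that a pixel falls in levels lo..hi (inclusive)."""
    return sum(counts[lo:hi + 1]) / sum(counts)

def mean_level(counts):
    """Count-weighted average level, the balance point of the histogram."""
    total = sum(counts)
    return sum(level * c for level, c in enumerate(counts)) / total

counts = [38, 250, 330, 382]            # hypothetical 4-level table
print(density(counts))                  # [0.038, 0.25, 0.33, 0.382]
print(range_probability(counts, 0, 1))  # 0.288
print(mean_level(counts))               # 2.056
```

For a full 0 to 255 table, comparing `mean_level` with 127.5 gives a quick measure of how far the distribution is skewed away from the midpoint.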


guga

Re: Fast Compare Real8 with SSE
« Reply #125 on: February 27, 2019, 11:28:18 AM »
First result using integers as input and output, with Luma set at 49% :t

The fix for the limits of the Chroma and Hue shift is not done yet. I'm currently working on it.

SheHulk is looking better :greensml: :greensml: :greensml: :greensml: :greensml: :greensml: :greensml:

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #126 on: February 27, 2019, 11:47:52 AM »
 :t

I've made her an avatar from Pandora  :biggrin:

Creative coders use backward thinking techniques as a strategy.

guga

Re: Fast Compare Real8 with SSE
« Reply #127 on: February 27, 2019, 11:59:45 AM »
 :greenclp: :greenclp: :greenclp: :greenclp: :greenclp:

guga

Re: Fast Compare Real8 with SSE
« Reply #128 on: February 27, 2019, 07:50:01 PM »
SheHulk is getting mature, now :greensml: :greensml: :greensml: :greensml:




Hi guys, I fixed Luma and Chroma. Now Luma is an integer from 0 to 255, and Chroma (which really is an intensity of Luma, as I mentioned before) I set as a Real8 number from 0 to 100. I'll solve the hue shift the same way I did for Chroma: it also needs to stay within the boundaries of Luma inside the LUT. There is still a tiny loss of chroma (around 2%) due to the shifting of Hue.

In this test, I reduced Luma to 49% and kept the chromaticity percentage untouched. Once I fix the hue shifting problem, I'll look for the best strategy to link both (Hue and Chroma) and start creating some flags to be used during the output (CieLCH to RGB).

Siekmanski

Re: Fast Compare Real8 with SSE
« Reply #129 on: February 27, 2019, 08:09:12 PM »
 :t

FORTRANS

  • Member
  • *****
  • Posts: 1056
Re: Fast Compare Real8 with SSE
« Reply #130 on: February 28, 2019, 12:55:31 AM »
Hi,

Quote
So, how to "equalize" the distribution of a range of data in order to it results in something like this:

Gray/Luma = 0, total pixels = 38
Gray/Luma = 1, total pixels = 250
Gray/Luma = 2, total pixels = 330
....
Gray/Luma = 126, total pixels = 100000
Gray/Luma = 127, total pixels = 128694 ; midpoint
Gray/Luma = 128, total pixels = 128694 ; midpoint
after that it starts decreasing again.
Gray/Luma = 129, total pixels = 100000
...

Gray/Luma = 253, total pixels = 330
Gray/Luma = 254, total pixels = 250
Gray/Luma = 255, total pixels = 38

   Well I did this on gray scale images.  I made a quick look, but
I did not find my code for a Gaussian distribution.  What I did
was read in the pixel data, assigned a number corresponding to
its position in the picture, and then sorted the data.  I then
replaced the data with values that followed the desired distribution
and unsorted the data back to reform the image.
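
The sort, replace and unsort procedure described above can be sketched in Python as follows. This is not the original FORTRAN, which is not shown in this post; the function names, the Gaussian target and its `sigma` are assumptions chosen for the example.

```python
import bisect
import math

def gaussian_cdf(levels=256, sigma=40.0):
    """Cumulative target distribution: a Gaussian centred on the middle of the range."""
    mid = (levels - 1) / 2.0
    weights = [math.exp(-0.5 * ((v - mid) / sigma) ** 2) for v in range(levels)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    cdf[-1] = 1.0                      # guard against rounding in the last bin
    return cdf

def match_by_sorting(pixels, cdf):
    """Rank every pixel, then give it the target level found at that rank."""
    n = len(pixels)
    # Sorting positions (not values) remembers where each pixel came from.
    order = sorted(range(n), key=lambda pos: pixels[pos])
    out = [0] * n
    for rank, pos in enumerate(order):
        quantile = (rank + 0.5) / n
        out[pos] = bisect.bisect_left(cdf, quantile)   # "unsort" by writing to pos
    return out

pixels = list(range(1000))             # a plain ramp as test input
out = match_by_sorting(pixels, gaussian_cdf())
print(out[499], out[500])              # 127 128
```

Because the sort is stable, pixels with equal input values are split between output levels by their position in the picture, which is what lets the output follow the target distribution exactly.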

   A brute force method, but easy to program.  A better way is
to count up the data values and apply an algorithm to replace
an input data value with a value that will create the desired
distribution.  A somewhat harder programming job, as a given
input value may need to map to multiple output values.  So
you may actually have to think a bit when writing your program.
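
The counting approach can be sketched in Python like this, using a uniform target for simplicity. It is an assumption-laden illustration (the names `uniform_cdf` and `match_by_counting` are made up): a counting pass gives each input level a block of ranks, and each pixel's rank then selects its output level, so a crowded input level is spread over several output levels.

```python
import bisect
from collections import Counter

def uniform_cdf(levels=256):
    """Cumulative target distribution with the same share for every level."""
    return [(v + 1) / levels for v in range(levels)]

def match_by_counting(pixels, cdf, levels=256):
    """Histogram matching without sorting; pixels must be ints in 0..levels-1."""
    n = len(pixels)
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    # next_rank[v] = rank that the next pixel with input value v will receive
    next_rank, running = [0] * levels, 0
    for v in range(levels):
        next_rank[v] = running
        running += counts[v]
    out = []
    for p in pixels:
        rank = next_rank[p]
        next_rank[p] += 1
        out.append(bisect.bisect_left(cdf, (rank + 0.5) / n))
    return out

flat = [5] * 512                        # every pixel has the same input value
out = match_by_counting(flat, uniform_cdf())
print(Counter(out)[0], Counter(out)[255])   # 2 2
```

In this extreme case a single input value is mapped to all 256 output values, two pixels each, which is the "one input, multiple outputs" situation mentioned above.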

   I did find an example with a uniform output distribution that
can show that result if wanted.

Cheers,

Steve N.

guga

Re: Fast Compare Real8 with SSE
« Reply #131 on: February 28, 2019, 03:16:53 AM »
Tks, Steve

Quote
   Well I did this on gray scale images.  I made a quick look, but
I did not find my code for a Gaussian distribution.  What I did
was read in the pixel data, assigned a number corresponding to
its position in the picture, and then sorted the data.  I then
replaced the data with values that followed the desired distribution
and unsorted the data back to reform the image.

If you find it, can you post it for me to read? I did something like that to equalize the histogram on gray as well, but I don't know if it is OK. I'm trying to better understand how to equalize the whole 16 million colors rather than an image itself. The distribution of all 16 million colors in CieLCH using AdobeRGB, after the fixes I made, is now more or less even, but I don't know exactly how it got that way. It looks like a sine distribution, but I'm clueless about how to create an equation that captures the actual math behind this kind of distribution. I mean, I want to find the proper equation that generated that result (the distribution) and fix the math in it (if necessary) so that it distributes the values/pixels following that equation.

As you see in the attached image, when I reduced luma to 50%, the algorithm generated a distribution similar to the original one (except reduced to half of its pixels). It is still shifting the hue, since I haven't made the proper fix yet (it's hard to follow all those equations, btw, and to develop a strategy to avoid shifting :dazzled: :dazzled:).

Quote
   I did find an example with a uniform output distribution that
can show that result if wanted.

Please, can you upload it, or do you have a link where I can see this?


Tks a lot

FORTRANS

Re: Fast Compare Real8 with SSE
« Reply #132 on: February 28, 2019, 08:12:38 AM »
Hi,

   Okay, the last version of the program did uniform weighting and
was written in 2001.  (And still had errors.)  The Gaussian distribution
version was done in 1995.  Its output looks a little rough, see
attached histograms with a maximum bin count.

   Old FORTRAN is sometimes hard to figure out when you don't
remember writing it any more.  I can post some of the code if you
want, but it follows my description above as far as I can see.
And it was for rescaling an image to improve the dynamic range.

Regards,

Steve

guga

Re: Fast Compare Real8 with SSE
« Reply #133 on: February 28, 2019, 01:22:28 PM »
Tks, Steve. Pls post. :icon_cool: :icon_cool: I would like to read it to understand it better. :t :t :t

Regards

guga

FORTRANS

Re: Fast Compare Real8 with SSE
« Reply #134 on: March 01, 2019, 02:23:02 AM »
Quote
Tks, Steve. Pls post. :icon_cool: :icon_cool: I would like to read to understand it better. :t :t :t

Regards

guga

Hi,

   Okay.  This should be it.  I deleted some debugging code
and a comment was changed to make more sense, sort of.
But I don't think anything meaningful was changed.  Not that
this code is meaningful anyway.

Enjoy,

Steve