Author Topic: Fast Compare Real8 with SSE and ColorSpaces  (Read 6924 times)

HSE

  • Member
  • *****
  • Posts: 1106
  • <AMD>< 7-32>
Re: Fast Compare Real8 with SSE
« Reply #60 on: February 12, 2019, 09:37:34 PM »
Intel manual (downloaded minutes ago) say that Double Extended Precision (80 bits) is signed and range is 3.37 × 10^–4932 to 1.18 × 10^4932 

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #61 on: February 12, 2019, 11:05:21 PM »
Interesting info on the IEEE Standard for Floating-Point Arithmetic IEEE 754-2008 revision
http://www.dsc.ufcg.edu.br/~cnum/modulos/Modulo2/IEEE754_2008.pdf

https://steve.hollasch.net/cgindex/coding/ieeefloat.html
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 9687
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Compare Real8 with SSE
« Reply #62 on: February 12, 2019, 11:38:50 PM »
Intel manual (downloaded minutes ago) say that Double Extended Precision (80 bits) is signed and range is 3.37 × 10^–4932 to 1.18 × 10^4932

Wiki says The 80-bit floating point format has a range (including subnormals) from approximately 3.65×10−4951 to 1.18×104932.

The 3.37 × 10^–4932 are normalised numbers. Below that you have approx 2^63 subnormal numbers with exponent all zero and decreasing precision.

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #63 on: February 13, 2019, 12:18:07 AM »
Too confusing.  :dazzled: :dazzled:

The problem is that the generated values of such things as "0FE, 07F, 0, 0, 0C0, 07F, 0, 0, 0, 0" do generates a valid number that exceeds IEEE (since this, in particular is subnormal). The function i made can actually convert such things on the same way ollydebugger does (except, on mine, i didn´t threw any msg of bad number (subnormal) as in olly. I achieved the same result as in Olly. (5.12...e-4937) but still not convinced what it is all about. If this (Subnormal numbers) should be considered a NAN or not. And if it is, then what are the bits that could be settled to configure this as a NAN since the bits responsible for identify NAN´s simply does not identify this  ? And, btw, the FPU operators seems to accept this number without throwing any exception .

HSE, i read Intel manual, but i couldn´t found on them anything about those subnormal ranges. JJ is correct about the lack of precision and the limits, but i don´t know how (or if) those numbers can be categorized as NANs for usage in a disassembler or debugger, for example. I realize that such values shouldn´t be a big beef (afterall who on Earth will use values such as 1e-4937 ?) but, for a disassembler point of view, it maybe necessary to know what to do in order to avoid cascading errors. For example, if we have a table of bytes (or structures) that contains a mix of FloatingPoints (Real10) and single bytes, if those numbers were considered valid, the whole strucure will be converted to Float units. However, if they are really to be considered NAN´s, then the disassembler will simply discard those bytes as true numbers and display the corresponding bytes rather then the float, and this bytes may also be part of pointer to addresses, for example. So, to avoid cascading errors, it is reasonable think whether those subnormal may represent valid Floats (although imprecise) or not

I start wondering if posit is really better then IEEE floats.

https://supercomputingfrontiers.eu/2017/wp-content/uploads/2017/03/2_1100_John-Gustafson.pdf
https://github.com/libcg/bfp
https://posithub.org/docs/PositTutorial_Part1.html

Maybe it´s worth to give a try later ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

HSE

  • Member
  • *****
  • Posts: 1106
  • <AMD>< 7-32>
Re: Fast Compare Real8 with SSE
« Reply #64 on: February 13, 2019, 12:48:09 AM »
Wiki says The 80-bit floating point format has a range (including subnormals) from approximately 3.65×10−4951 to 1.18×104932.
Working with exceptions then is 3.65×10−4951 to 1.18×104951, because also there are denormalized positive numbers

jj2007

  • Member
  • *****
  • Posts: 9687
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Compare Real8 with SSE
« Reply #65 on: February 13, 2019, 12:59:10 AM »
With Olly, a Num dd 0,  80000000h, 1h is the limit.

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #66 on: February 13, 2019, 02:05:10 AM »
With Olly, a Num dd 0,  80000000h, 1h is the limit.

I´ll have to do the same thing then and insert a flag of bad number for the debugger (displaying a msg as in ollydbg) and on the disassembler as well. Damn ! I wonder why why why people uses such things or why compilers can´t simply adjust their precision ? I´ll have to plan a whole different strategy now for the disassembler since compilers can generate such bad numbers and see how can i properly identify them as a number (although bad ones) or a chunk of bytes that can be used as pointers or whatever.

I was giving a test on such numbers using old apps i have since the late 90´s. I found those numbers while disassembling an old version of Mirc that used a library containing functions such as: logl, f87_Log, matherrl that contained such numbers displayed on a Array of Real10 Floats. It was compiled using BCC 4/5 :greensml: :greensml:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #67 on: February 13, 2019, 02:48:21 AM »
Interesting info on the IEEE Standard for Floating-Point Arithmetic IEEE 754-2008 revision
http://www.dsc.ufcg.edu.br/~cnum/modulos/Modulo2/IEEE754_2008.pdf

https://steve.hollasch.net/cgindex/coding/ieeefloat.html

Tks a lot, Siekmanski. It will help to properly categorize those situations  :t :t :t :t
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #68 on: February 13, 2019, 03:28:21 AM »
Ok, so according to the documents and information, it´s safe to conclude that whenever the last Word of a TenByte is 0, the number is subnormal and have his range outside the maximum values, right ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #69 on: February 13, 2019, 08:19:34 AM »
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #70 on: February 15, 2019, 06:59:57 AM »
Hi guga,
The color space routine is finished and I hope it is as should be? (not 100% sure, I don't have reference bitmaps to check against)
Have to write the stuff to combine the source-luma and the reference-chromas to get the destination bitmap.
Next week I'll write the rest.

Creative coders use backward thinking techniques as a strategy.

daydreamer

  • Member
  • ****
  • Posts: 907
  • watch Chebyshev on the backside of the Moon
Re: Fast Compare Real8 with SSE
« Reply #71 on: February 16, 2019, 05:50:13 AM »
Good work Siekmanski  :t
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #72 on: February 16, 2019, 06:18:43 AM »
Thanks Magnus, hope I didn't screw up the calculations....  :biggrin:
Creative coders use backward thinking techniques as a strategy.

guga

  • Member
  • *****
  • Posts: 1043
  • Assembly is a state of art.
    • RosAsm
Re: Fast Compare Real8 with SSE
« Reply #73 on: February 16, 2019, 04:00:58 PM »
Wonderfull, Siekmanski. :t :t :t


I finished some calculations about the CieLCH routines. CieLCH/CieLab are way better then YUV colorspace, but still have some issues on the backwards computation. I succeeded to find the limits for Chroma And Hue on the RGB to CieLCH convertion. So, whenever you need to readjust luma or chorma to a new RGB it won´t mess up the result.

If you need i can upload the updated pdf for you. One thing i found is that the hue from CieLCH is supposed to have a limit of sqrt(29) According to this:

Chroma < [(1000/116)*(Luma+16)] / [5*cos(Hue)-2*sin(Hue)]

And for the maximum value for Hue, it must obey this limit: 5*cos(Hue)-2*sin(Hue) <= sqrt(29)

The only problem about those limitations of CieLab/CieLCH is that Hue was to be restricted to an angle of something around 338º but, it is extrapolating in 1 quadrant (So, plus 90º). One solution is reduce Hue forcing it to be restricted to 0 to 270º

The problem on CieLab relies on this:

Luma = YFinal*116-16
a = (XFinal-YFinal)*500 = AFactor
b = (YFinal-ZFinal)*200 = BFactor

The bFactor is extrapolating the Hue. I`m quite sure, the ratio is 3/4. So, perhaps, multiplying the final "b' with 3/4 could fix the Chroma and also the hue forcing it to stay in the limit of only 3 quadrants


To convert back from CieLCH to RGb you must 1st see if the relation between Hue, Chroma and Hue fits according to this formula:

Chroma < [(1000/116)*(Luma+16)] / [5*cos(Hue)-2*sin(Hue)]

But it probably needs to reduce Chroma (reducing "b' with 3/4) to make the hue angle stay on the proper limits.

Tomorrow i´ll try making some more calculations  to check if it is correct those limits on the backward convertion. Raymond, JJ and AW helped me with a problem i was having on the FPU routines i use to analyse all of this :)
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

  • Member
  • *****
  • Posts: 1906
Re: Fast Compare Real8 with SSE
« Reply #74 on: February 16, 2019, 07:47:59 PM »
CieLCH seems to be the best option but also the most complicated color space conversion algorithm.
For me this is a new field to explore and have to learn a lot more to understand it fully.
Looking forward to see your CieLCH routine when finished.  :t

Found this paper: http://jcgt.org/published/0002/02/01/paper.pdf

In my own logic, I always try to understand algorithms by working my way backwards.
Then I try to simplify the calculations if possible.
In the case of color space conversion you can precalculate the coefficients for a dot3 matrix routine.

Once you know the 3*3 matrix coefficients for the "forward" RGB -> XYZ calculations,
you can use the inverse of the 3*3 matrix coefficients to compute the "backwards" XYZ -> RGB.
This way, the "backwards" results will always be correct.

Bruce Lindbloom has done some of the coefficients math for us to compute the RGB -> XYZ and XYZ -> RGB matrices.
http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html

There is a CIE Color Calculator on his site:
http://www.brucelindbloom.com/ColorCalculator.html

My routines are not finished yet but I will post them when ready, then we can check if it is done right.

Here is the dot3 color conversion routine I wrote, the one is used in the example above in reply #70.
I wrote it in SSE2 instructions so it can be used on older computers as well. ( so no fancy byte shuffles in this one. )
Adjusted the 3*3 matrix transpose routine to handle 4 row elements preserving alpha

Code: [Select]
                ;    B          G          R          A
CIERGB2XYZ  real4  0.2006017, 0.3106803, 0.4887180, 0.0 ; X
            real4  0.0108109, 0.8129847, 0.1762044, 0.0 ; Y
            real4  0.9897952, 0.0102048, 0.0000000, 1.0 ; Z ( AZ must be 1.0 )

ALPHA_mask  dd  -1,-1,-1,0


align 4
ColorConversionInt2Float proc uses ebx esi edi BitmapWidth:DWORD,BitmapHeight:DWORD,pSourceMem:DWORD,pDestinationMem:DWORD,pConversionType:DWORD

    mov         esi,pSourceMem
    mov         edi,pDestinationMem
    mov         edx,pConversionType
   
    mov         ecx,BitmapWidth
    imul        ecx,BitmapHeight
    shr         ecx,2
   
    pxor        xmm5,xmm5                   ; Empty the source operand, to zero the integer high parts,
                                            ; in the "punpcklbw", "punpcklwd" instructions
align 16
LoadFourPixels:
    mov         ebx,4
    movdqa      xmm6,oword ptr [esi]        ; Load 4 ARGB pixels at once

FourPixelLP:
    movq        xmm0,xmm6                   ; 1 pixel
    punpcklbw   xmm0,xmm5                   ; Convert 4 bytes to 4 words
    punpcklwd   xmm0,xmm5                   ; Convert 4 words to 4 dwords
    cvtdq2ps    xmm0,xmm0                   ; Convert 4 dwords to 4 real4 values

    movaps      xmm1,xmm0                   ; [B G R A]
    movaps      xmm2,xmm0                   ; [B G R A]
    mulps       xmm0,oword ptr [edx]        ; [BX GX RX  --] Multiply Color X conversion coefficients
    mulps       xmm1,oword ptr [edx+16]     ; [BY GY RY  --] Multiply Color Y conversion coefficients
    mulps       xmm2,oword ptr [edx+32]     ; [BZ GZ RZ  AZ] Multiply Color Z conversion coefficients

    ; Color conversion using an adjusted SSE2 4*3 matrix transposition routine ( preserving Alpha_Z)
    ; Now we can run a fast Dot3 ( 3-component vector ) calculation on the
    ; 12 color components and 12 color coefficients ( 9 of each + 3 Alpha components )
    ; Calculations are in parallel, 3 muls and 2 adds
   
    movaps      xmm3,xmm0                   ; [BX GX RX --]
    movaps      xmm4,xmm2                   ; [BZ GZ RZ AZ]
    unpcklps    xmm3,xmm1                   ; [BX BY GX GY]
    unpcklps    xmm4,xmm4                   ; [BZ BZ GZ GZ]
    movhlps     xmm4,xmm3                   ; [GX GY GZ GZ]
    movlhps     xmm3,xmm2                   ; [BX BY BZ GZ]
    unpckhps    xmm0,xmm1                   ; [RX RY -- --]
    shufps      xmm0,xmm2,Shuffle(3,2,1,0)  ; [RX RY RZ AZ]
    shufps      xmm6,xmm6,Shuffle(0,3,2,1)  ; pre-load next ARGB pixel
    addps       xmm3,xmm4                   ; [BX+GX BY+GY BZ+GZ GZ+GZ]
    andps       xmm3,oword ptr ALPHA_mask   ; [BX+GX BY+GY BZ+GZ --]   
    addps       xmm0,xmm3                   ; [RX+BX+GX RY+BY+GY RZ+BZ+GZ AZ]
                                            ; result: BGRA
    movaps      oword ptr [edi],xmm0        ; Store BGRA Pixel in Real4 format
    add         edi,16
    dec         ebx
    jnz         FourPixelLP
    add         esi,16
    dec         ecx
    jnz         LoadFourPixels
    ret

ColorConversionInt2Float endp
Creative coders use backward thinking techniques as a strategy.