### Author Topic: Fast Compare Real8 with SSE and ColorSpaces  (Read 8986 times)

#### HSE

• Member
• Posts: 1205
• <AMD>< 7-32>
##### Re: Fast Compare Real8 with SSE
« Reply #60 on: February 12, 2019, 09:37:34 PM »
The Intel manual (downloaded minutes ago) says that Double Extended Precision (80 bits) is signed and that its range is 3.37 × 10^–4932 to 1.18 × 10^4932.

#### Siekmanski

• Member
• Posts: 1973
##### Re: Fast Compare Real8 with SSE
« Reply #61 on: February 12, 2019, 11:05:21 PM »
Interesting info on the IEEE Standard for Floating-Point Arithmetic IEEE 754-2008 revision
http://www.dsc.ufcg.edu.br/~cnum/modulos/Modulo2/IEEE754_2008.pdf

https://steve.hollasch.net/cgindex/coding/ieeefloat.html
Creative coders use backward thinking techniques as a strategy.

#### jj2007

• Member
• Posts: 9998
• Assembler is fun ;-)
##### Re: Fast Compare Real8 with SSE
« Reply #62 on: February 12, 2019, 11:38:50 PM »
> The Intel manual (downloaded minutes ago) says that Double Extended Precision (80 bits) is signed and that its range is 3.37 × 10^–4932 to 1.18 × 10^4932.

Wikipedia says: "The 80-bit floating point format has a range (including subnormals) from approximately 3.65 × 10^−4951 to 1.18 × 10^4932."

3.37 × 10^–4932 is the lower limit for normalised numbers. Below that you have approx. 2^63 subnormal numbers, with an all-zero exponent field and decreasing precision.
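The three limits quoted above follow directly from the layout of the 80-bit format (15-bit exponent field with bias 16383, 64-bit mantissa with an explicit integer bit). A minimal Python sketch, not part of the original posts, recomputes them with the `decimal` module; the constant and variable names are made up for illustration:

```python
from decimal import Decimal, getcontext

getcontext().prec = 30      # enough digits for a readable approximation

BIAS = 16383                # exponent bias of the 80-bit extended format
MANTISSA_BITS = 64          # explicit integer bit + 63 fraction bits

two = Decimal(2)

# Smallest subnormal: exponent field 0, only the lowest mantissa bit set (2^-16445).
smallest_subnormal = two ** (1 - BIAS - (MANTISSA_BITS - 1))

# Smallest normal: exponent field 1, mantissa 8000000000000000h (2^-16382).
smallest_normal = two ** (1 - BIAS)

# Largest finite: exponent field 7FFEh, mantissa all ones.
largest_finite = (two - two ** (1 - MANTISSA_BITS)) * two ** (0x7FFE - BIAS)

for name, value in (
    ("smallest subnormal", smallest_subnormal),
    ("smallest normal", smallest_normal),
    ("largest finite", largest_finite),
):
    print(f"{name:20s} {value:.4E}")
```

The smallest normal value comes out a little below 3.37 × 10^–4932, so the figure in the Intel table appears to be rounded up.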

#### guga

• Member
• Posts: 1074
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #63 on: February 13, 2019, 12:18:07 AM »
Too confusing.

The problem is that byte sequences such as "0FE, 07F, 0, 0, 0C0, 07F, 0, 0, 0, 0" do produce a valid number that falls outside the normal IEEE range (this one, in particular, is subnormal). The function I made converts such values the same way OllyDbg does, except that mine doesn't show any "bad number (subnormal)" message as Olly does. I got the same result as Olly (5.12...e-4937), but I'm still not convinced what this is all about: should subnormal numbers be considered NaNs or not? And if they should, which bits mark them as NaNs, since the bits responsible for identifying NaNs simply don't flag this one? And, by the way, the FPU instructions seem to accept this number without throwing any exception.

HSE, I read the Intel manual, but I couldn't find anything in it about those subnormal ranges. JJ is correct about the loss of precision and the limits, but I don't know how (or if) those numbers can be categorized as NaNs for use in a disassembler or debugger, for example. I realize that such values shouldn't be a big deal (after all, who on Earth will use values such as 1e-4937?), but from a disassembler's point of view it may be necessary to know what to do with them in order to avoid cascading errors. For example, suppose we have a table of bytes (or structures) that contains a mix of floating-point values (Real10) and single bytes. If those numbers are considered valid, the whole structure will be converted to float units. However, if they really are to be considered NaNs, then the disassembler will simply discard them as numbers and display the corresponding bytes rather than the float, and those bytes may also be part of pointers to addresses, for example. So, to avoid cascading errors, it is reasonable to ask whether those subnormals represent valid floats (although imprecise ones) or not.
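For what it's worth, in the x87 encoding a subnormal and a NaN sit at opposite ends of the exponent field: a NaN needs an all-ones exponent (7FFFh), while a subnormal has an all-zero exponent with a non-zero mantissa. Below is a rough Python sketch of a ten-byte classifier along those lines. It is not the routine discussed in the thread; the function name `decode_real10` and the category labels are invented for illustration, and the "invalid" bucket lumps together the pseudo-NaN, pseudo-infinity and unnormal encodings.

```python
from decimal import Decimal

def decode_real10(raw: bytes):
    """Classify a little-endian 80-bit value; return (kind, Decimal value or None)."""
    if len(raw) != 10:
        raise ValueError("a Real10 is exactly ten bytes")
    mantissa = int.from_bytes(raw[:8], "little")   # 64 bits, explicit integer bit on top
    exp_word = int.from_bytes(raw[8:], "little")   # sign bit + 15-bit biased exponent
    sign = exp_word >> 15
    exponent = exp_word & 0x7FFF
    integer_bit = mantissa >> 63
    fraction = mantissa & ((1 << 63) - 1)

    if exponent == 0x7FFF:                         # only here can a NaN live
        if not integer_bit:
            return "invalid", None                 # pseudo-infinity / pseudo-NaN
        return ("infinity" if fraction == 0 else "nan"), None

    if exponent == 0:
        if mantissa == 0:
            kind = "zero"
        elif integer_bit:
            kind = "pseudo-denormal"
        else:
            kind = "denormal"                      # subnormal: valid, reduced precision
        scale = 1 - 16383 - 63                     # exponent field 0 is treated as 1
    else:
        if not integer_bit:
            return "invalid", None                 # unnormal
        kind = "normal"
        scale = exponent - 16383 - 63

    value = Decimal(mantissa) * Decimal(2) ** scale
    return kind, (-value if sign else value)

kind, value = decode_real10(bytes([0xFE, 0x7F, 0, 0, 0xC0, 0x7F, 0, 0, 0, 0]))
print(kind, value)
```

With the byte sequence from the post this reports a denormal with a value of about 5.12e-4937, so a disassembler could reasonably treat it as a number and attach a "subnormal" note instead of rejecting it as a NaN.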

I'm starting to wonder if posits are really better than IEEE floats.

https://github.com/libcg/bfp
https://posithub.org/docs/PositTutorial_Part1.html

Maybe it's worth a try later?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### HSE

• Member
• Posts: 1205
• <AMD>< 7-32>
##### Re: Fast Compare Real8 with SSE
« Reply #64 on: February 13, 2019, 12:48:09 AM »
> Wikipedia says: "The 80-bit floating point format has a range (including subnormals) from approximately 3.65 × 10^−4951 to 1.18 × 10^4932."

Working with exceptions, then, the range is 3.65 × 10^−4951 to 1.18 × 10^4951, because there are also denormalized positive numbers.

#### jj2007

• Member
• Posts: 9998
• Assembler is fun ;-)
##### Re: Fast Compare Real8 with SSE
« Reply #65 on: February 13, 2019, 12:59:10 AM »
With Olly, a Num dd 0,  80000000h, 1h is the limit.

#### guga

• Member
• Posts: 1074
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #66 on: February 13, 2019, 02:05:10 AM »
> With Olly, a Num dd 0,  80000000h, 1h is the limit.

I'll have to do the same thing then, and add a "bad number" flag to the debugger (displaying a message as OllyDbg does) and to the disassembler as well. Damn! I wonder why people use such things, or why compilers can't simply adjust their precision. I'll have to plan a whole different strategy for the disassembler now, since compilers can generate such bad numbers, and see how I can properly identify them either as numbers (although bad ones) or as a chunk of bytes that may be used as pointers or whatever.

I was testing such numbers with old apps I've had since the late 90s. I found them while disassembling an old version of Mirc that used a library containing functions such as logl, f87_Log and matherrl, which held such numbers in an array of Real10 floats. It was compiled using BCC 4/5.

#### guga

• Member
• Posts: 1074
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #67 on: February 13, 2019, 02:48:21 AM »
> Interesting info on the IEEE Standard for Floating-Point Arithmetic IEEE 754-2008 revision
> http://www.dsc.ufcg.edu.br/~cnum/modulos/Modulo2/IEEE754_2008.pdf
>
> https://steve.hollasch.net/cgindex/coding/ieeefloat.html

Thanks a lot, Siekmanski. It will help to properly categorize those situations  :t :t :t :t

#### guga

• Member
• Posts: 1074
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #68 on: February 13, 2019, 03:28:21 AM »
OK, so according to the documents and information, it's safe to conclude that whenever the last word of a TenByte is 0, the number is subnormal and lies outside the normal range, right?

#### guga

• Member
• Posts: 1074
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #69 on: February 13, 2019, 08:19:34 AM »

#### Siekmanski

• Member
• Posts: 1973
##### Re: Fast Compare Real8 with SSE
« Reply #70 on: February 15, 2019, 06:59:57 AM »
Hi guga,
The color space routine is finished and I hope it works as it should (not 100% sure, I don't have reference bitmaps to check against).
I still have to write the code that combines the source luma and the reference chromas to get the destination bitmap.
Next week I'll write the rest.


#### daydreamer

• Member
• Posts: 1081
• I also want a stargate
##### Re: Fast Compare Real8 with SSE
« Reply #71 on: February 16, 2019, 05:50:13 AM »
Good work Siekmanski  :t
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D
I love assembly,because its legal to write
princess:lea eax,luke
:)

#### Siekmanski

• Member
• Posts: 1973
##### Re: Fast Compare Real8 with SSE
« Reply #72 on: February 16, 2019, 06:18:43 AM »
Thanks Magnus, hope I didn't screw up the calculations....

#### guga

• Member
• Posts: 1074
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #73 on: February 16, 2019, 04:00:58 PM »
Wonderful, Siekmanski. :t :t :t

I finished some calculations for the CieLCH routines. CieLCH/CieLab are way better than the YUV colorspace, but they still have some issues in the backward computation. I succeeded in finding the limits for Chroma and Hue in the RGB to CieLCH conversion, so whenever you need to readjust luma or chroma for a new RGB value it won't mess up the result.

If you need it, I can upload the updated PDF for you. One thing I found is that the hue term in CieLCH is supposed to have a limit of sqrt(29), according to this:

Chroma < [(1000/116)*(Luma+16)] / [5*cos(Hue)-2*sin(Hue)]

And for the maximum value for Hue, it must obey this limit: 5*cos(Hue)-2*sin(Hue) <= sqrt(29)

The only problem with those limits in CieLab/CieLCH is that Hue should be restricted to an angle of around 338º, but it extends into one more quadrant (so, plus 90º). One solution is to reduce Hue, forcing it into the range 0 to 270º.
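As a side note, the expression 5*cos(Hue)-2*sin(Hue) can be rewritten as sqrt(29)*cos(Hue + atan(2/5)), which is why it never exceeds sqrt(29). Its peak falls at a hue of about 338.2º, which may be where the figure of roughly 338º comes from. The Python sketch below only checks that identity numerically and evaluates the chroma bound exactly as written in the post; the function names are hypothetical, and returning `None` where the denominator is not positive is an assumption of this sketch, not something stated in the thread.

```python
import math

AMPLITUDE = math.sqrt(29)                           # sqrt(5^2 + 2^2)
PHASE_DEG = math.degrees(math.atan2(2, 5))          # about 21.8 degrees
PEAK_HUE_DEG = 360.0 - PHASE_DEG                    # about 338.2 degrees

def hue_term(hue_deg):
    """5*cos(Hue) - 2*sin(Hue), with Hue given in degrees."""
    h = math.radians(hue_deg)
    return 5 * math.cos(h) - 2 * math.sin(h)

def chroma_limit(luma, hue_deg):
    """Upper bound on Chroma as written in the post.

    Returns None when the denominator is not positive (assumption: the
    bound is then taken as not applicable).
    """
    denominator = hue_term(hue_deg)
    if denominator <= 0:
        return None
    return (1000 / 116) * (luma + 16) / denominator

print(round(PEAK_HUE_DEG, 1), hue_term(PEAK_HUE_DEG), AMPLITUDE)
print(chroma_limit(50, 0))
```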

The problem in CieLab lies here:

Luma = YFinal*116-16
a = (XFinal-YFinal)*500 = AFactor
b = (YFinal-ZFinal)*200 = BFactor
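A small Python sketch of these three formulas, followed by the usual polar step from (a, b) to Chroma and Hue and back. It assumes that XFinal, YFinal and ZFinal are the already-transformed X, Y, Z terms (the post does not define them), and the function names are invented for illustration:

```python
import math

def lab_from_f(x_final, y_final, z_final):
    """Apply the three formulas from the post to the transformed X, Y, Z terms."""
    luma = y_final * 116 - 16
    a = (x_final - y_final) * 500
    b = (y_final - z_final) * 200
    return luma, a, b

def lch_from_lab(luma, a, b):
    """Polar form: Chroma is the length of (a, b), Hue its angle in degrees (0..360)."""
    chroma = math.hypot(a, b)
    hue_deg = math.degrees(math.atan2(b, a)) % 360.0
    return luma, chroma, hue_deg

def lab_from_lch(luma, chroma, hue_deg):
    """Backward step: recover a and b from Chroma and Hue."""
    hue = math.radians(hue_deg)
    return luma, chroma * math.cos(hue), chroma * math.sin(hue)

print(lab_from_f(1.0, 1.0, 1.0))                     # equal terms give a = b = 0
print(lch_from_lab(50.0, -20.0, -30.0))
print(lab_from_lch(*lch_from_lab(50.0, -20.0, -30.0)))
```

Using `atan2` rather than `atan(b/a)` keeps the hue in the correct quadrant, which matters for any limit expressed in terms of the hue angle.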

The b factor is what pushes the Hue out of range. I'm quite sure the ratio is 3/4. So perhaps multiplying the final "b" by 3/4 could fix the Chroma and also the Hue, forcing it to stay within only 3 quadrants.

To convert back from CieLCH to RGB you must first check whether the relation between Luma, Chroma and Hue satisfies this formula:

Chroma < [(1000/116)*(Luma+16)] / [5*cos(Hue)-2*sin(Hue)]

But it probably needs Chroma reduced (scaling "b" by 3/4) to keep the hue angle within the proper limits.

Tomorrow I'll try some more calculations to check whether those limits are correct for the backward conversion. Raymond, JJ and AW helped me with a problem I was having in the FPU routines I use to analyse all of this :)

#### Siekmanski

• Member
• Posts: 1973
##### Re: Fast Compare Real8 with SSE
« Reply #74 on: February 16, 2019, 07:47:59 PM »
CieLCH seems to be the best option but also the most complicated color space conversion algorithm.
For me this is a new field to explore, and I have to learn a lot more to understand it fully.
Looking forward to see your CieLCH routine when finished.  :t

Found this paper: http://jcgt.org/published/0002/02/01/paper.pdf

In my own logic, I always try to understand algorithms by working my way backwards.
Then I try to simplify the calculations if possible.
In the case of color space conversion you can precalculate the coefficients for a dot3 matrix routine.

Once you know the 3*3 matrix coefficients for the "forward" RGB -> XYZ calculations,
you can use the inverse of the 3*3 matrix coefficients to compute the "backwards" XYZ -> RGB.
This way, the "backwards" results will always be correct.
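To illustrate the forward/inverse idea, here is a short Python sketch (not the SSE2 routine) that inverts a 3*3 matrix with the cofactor formula and round-trips one colour. The coefficients are the CIE RGB values that appear in the routine's CIERGB2XYZ table, reordered here to R, G, B; the helper names are made up for this example.

```python
# Forward matrix, rows X, Y, Z and columns R, G, B.
CIERGB_TO_XYZ = [
    [0.4887180, 0.3106803, 0.2006017],
    [0.1762044, 0.8129847, 0.0108109],
    [0.0000000, 0.0102048, 0.9897952],
]

def mat_vec(m, v):
    """Three dot products: one per output component."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def invert3(m):
    """Inverse of a 3*3 matrix via the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    det = a * adj[0][0] + b * adj[1][0] + c * adj[2][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[x / det for x in row] for row in adj]

XYZ_TO_CIERGB = invert3(CIERGB_TO_XYZ)

rgb = [0.2, 0.5, 0.8]
xyz = mat_vec(CIERGB_TO_XYZ, rgb)
print(xyz, mat_vec(XYZ_TO_CIERGB, xyz))   # second list should match rgb again
```

Precomputing the inverse once and storing it as a second coefficient table would let the same dot3 routine serve both directions.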

Bruce Lindbloom has done some of the coefficients math for us to compute the RGB -> XYZ and XYZ -> RGB matrices.
http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html

There is a CIE Color Calculator on his site:
http://www.brucelindbloom.com/ColorCalculator.html

My routines are not finished yet but I will post them when ready, then we can check if it is done right.

Here is the dot3 color conversion routine I wrote, the one used in the example in reply #70 above.
I wrote it in SSE2 instructions so it can be used on older computers as well. ( so no fancy byte shuffles in this one. )
Adjusted the 3*3 matrix transpose routine to handle 4 row elements preserving alpha

```asm
                ;    B          G          R          A
CIERGB2XYZ  real4  0.2006017, 0.3106803, 0.4887180, 0.0 ; X
            real4  0.0108109, 0.8129847, 0.1762044, 0.0 ; Y
            real4  0.9897952, 0.0102048, 0.0000000, 1.0 ; Z ( AZ must be 1.0 )

ALPHA_mask  dd  -1,-1,-1,0

align 4
ColorConversionInt2Float proc uses ebx esi edi BitmapWidth:DWORD,BitmapHeight:DWORD,pSourceMem:DWORD,pDestinationMem:DWORD,pConversionType:DWORD

    mov         esi,pSourceMem
    mov         edi,pDestinationMem
    mov         edx,pConversionType

    mov         ecx,BitmapWidth
    imul        ecx,BitmapHeight
    shr         ecx,2

    pxor        xmm5,xmm5                   ; Empty the source operand, to zero the integer high parts,
                                            ; in the "punpcklbw", "punpcklwd" instructions
align 16
LoadFourPixels:
    mov         ebx,4
    movdqa      xmm6,oword ptr [esi]        ; Load 4 ARGB pixels at once
FourPixelLP:
    movq        xmm0,xmm6                   ; 1 pixel
    punpcklbw   xmm0,xmm5                   ; Convert 4 bytes to 4 words
    punpcklwd   xmm0,xmm5                   ; Convert 4 words to 4 dwords
    cvtdq2ps    xmm0,xmm0                   ; Convert 4 dwords to 4 real4 values
    movaps      xmm1,xmm0                   ; [B G R A]
    movaps      xmm2,xmm0                   ; [B G R A]
    mulps       xmm0,oword ptr [edx]        ; [BX GX RX  --] Multiply Color X conversion coefficients
    mulps       xmm1,oword ptr [edx+16]     ; [BY GY RY  --] Multiply Color Y conversion coefficients
    mulps       xmm2,oword ptr [edx+32]     ; [BZ GZ RZ  AZ] Multiply Color Z conversion coefficients

    ; Color conversion using an adjusted SSE2 4*3 matrix transposition routine ( preserving Alpha_Z)
    ; Now we can run a fast Dot3 ( 3-component vector ) calculation on the
    ; 12 color components and 12 color coefficients ( 9 of each + 3 Alpha components )
    ; Calculations are in parallel, 3 muls and 2 adds

    movaps      xmm3,xmm0                   ; [BX GX RX --]
    movaps      xmm4,xmm2                   ; [BZ GZ RZ AZ]
    unpcklps    xmm3,xmm1                   ; [BX BY GX GY]
    unpcklps    xmm4,xmm4                   ; [BZ BZ GZ GZ]
    movhlps     xmm4,xmm3                   ; [GX GY GZ GZ]
    movlhps     xmm3,xmm2                   ; [BX BY BZ GZ]
    unpckhps    xmm0,xmm1                   ; [RX RY -- --]
    shufps      xmm0,xmm2,Shuffle(3,2,1,0)  ; [RX RY RZ AZ]
    shufps      xmm6,xmm6,Shuffle(0,3,2,1)  ; pre-load next ARGB pixel
    addps       xmm3,xmm4                   ; [BX+GX BY+GY BZ+GZ GZ+GZ]
    andps       xmm3,oword ptr ALPHA_mask   ; [BX+GX BY+GY BZ+GZ --]

    addps       xmm0,xmm3                   ; [RX+BX+GX RY+BY+GY RZ+BZ+GZ AZ]
                                            ; result: BGRA
    movaps      oword ptr [edi],xmm0        ; Store BGRA Pixel in Real4 format
    add         edi,16
    dec         ebx
    jnz         FourPixelLP
    add         esi,16
    dec         ecx
    jnz         LoadFourPixels
    ret
ColorConversionInt2Float endp
```
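For checking the output of the routine without reference bitmaps, a plain scalar version of the same arithmetic can help. The Python sketch below is only a reading of what the listing appears to compute per pixel (three dot products over B, G, R, with alpha passed through the fourth Z coefficient); the function name is hypothetical, and because it works in double precision while the SSE2 code uses real4, results should be compared with a small tolerance.

```python
# Coefficients copied from the CIERGB2XYZ table in the listing (B, G, R, A order).
CIERGB2XYZ = [
    (0.2006017, 0.3106803, 0.4887180, 0.0),  # X
    (0.0108109, 0.8129847, 0.1762044, 0.0),  # Y
    (0.9897952, 0.0102048, 0.0000000, 1.0),  # Z ( alpha coefficient is 1.0 )
]

def convert_bgra_to_float(src: bytes, table=CIERGB2XYZ):
    """Return one [X, Y, Z, A] list of floats per 4-byte BGRA pixel in src."""
    if len(src) % 4 != 0:
        raise ValueError("source length must be a multiple of 4 bytes")
    pixels = []
    for offset in range(0, len(src), 4):
        b, g, r, a = src[offset:offset + 4]
        # Dot3 per output component: only the B, G, R columns take part.
        xyz = [row[0] * b + row[1] * g + row[2] * r for row in table]
        # Alpha is carried through by the fourth coefficient of the Z row.
        pixels.append(xyz + [table[2][3] * a])
    return pixels

print(convert_bgra_to_float(bytes([255, 255, 255, 128, 255, 0, 0, 255])))
```

Note that, like the listing, this does not scale the 0..255 byte values down to 0..1 before multiplying.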