Recent Posts

Pages: [1] 2 3 ... 10
1
UASM Assembler Development / Immediate too large
« Last post by Biterider on Today at 03:00:41 AM »
Hi
It seems that UASM 2.47, in 64-bit mode, accepts a line like this one

Code:
mov PPP, 01122334455667788h

00007FF6B8582BA2 48 C7 05 D3 91 00 00 88 77 66 55 mov         qword ptr [PPP (07FF6B858BD80h)],55667788h 

without any warning that the value is too large. As the disassembly shows, only the low dword (55667788h) is encoded as the immediate. An error is issued, however, when an offset is passed.

I think that this behavior is not intended ...
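For reference, x86-64 has no "mov m64, imm64" form: the 48 C7 05 encoding in the disassembly (REX.W C7 /0) carries only a 32-bit immediate, which the CPU sign-extends to 64 bits. That is why only 55667788h survives. Below is a minimal C sketch of the range check an assembler would need here. The helper names (fits_simm32, stored_if_truncated) are hypothetical and are not UASM's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v survives the round trip through a sign-extended 32-bit immediate,
   i.e. "mov qword ptr [mem], imm32" (REX.W C7 /0) can encode it exactly. */
bool fits_simm32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* The qword actually stored when only the low 32 bits of v are kept as the
   immediate: the CPU sign-extends bit 31 into the upper dword. */
int64_t stored_if_truncated(int64_t v)
{
    uint32_t low = (uint32_t)((uint64_t)v & 0xFFFFFFFFu);
    return (low & 0x80000000u) ? (int64_t)low - 0x100000000LL : (int64_t)low;
}
```

For the value in the post, fits_simm32(01122334455667788h) is false and stored_if_truncated returns 55667788h, which matches the disassembly. A check like this is where a warning or an error could be issued.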

Biterider
2
The Laboratory / Re: Fast Compare Real8 with SSE
« Last post by daydreamer on Today at 02:20:06 AM »
Quote
Hi DayDreamer. I'm finishing some details on the algorithm and will update it with the backwards algorithm for you and Siekmanski.
great  :t

3
RosAsm / Re: RosAsm update - Fev-2019 (V2.054h)
« Last post by guga on Today at 01:41:46 AM »
New Update
V 2.055a

Fixed the FPU routine used by the debugger to identify NaN, infinity, QNaN and Indefinite values. The fixes are in the functions RealTenFPUNumberCategory and FloatToUString. I added 4 new FPU error categories to avoid reporting "Unknown FPU error". These new flags identify a TenByte that does not have the integer bit set (bit 63 of the TenByte, which is bit 31 of its 2nd dword). When the integer bit is missing on a number that is not a zero or a denormal, the FPU simply refuses to process the TenByte, since it contains an error (it raises an invalid-operation exception). So, to identify these cases, I created a new category of errors.

        SpecialFPU_SpecialIndefQNan         10      Special INDEFINITE QNAN (Same as QNAN, but occurring in a TenByte without the integer bit set)
        SpecialFPU_SpecialIndefSNan         11      Special INDEFINITE SNAN (Same as SNAN, but occurring in a TenByte without the integer bit set)
        SpecialFPU_SpecialIndefNegInfinite  12      Special INDEFINITE Negative Infinite (Same as Negative Infinite, but occurring in a TenByte without the integer bit set)
        SpecialFPU_SpecialIndefPosInfinite  13      Special INDEFINITE Positive Infinite (Same as Positive Infinite, but occurring in a TenByte without the integer bit set)

I also created a category to identify zeros. A negative zero is placed in the same category as zero: IEEE 754 does encode -0 (only the sign bit set), but it compares equal to +0.
        SpecialFPU_Zero                     4       The FPU contains a valid zero number
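To illustrate the negative-zero point with ordinary IEEE 754 doubles (this is not the REAL10 routine itself): -0 is a distinct encoding that compares equal to +0, but its sign can still be observed, for example through signbit or a division. The helper names below are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* True for both +0.0 and -0.0: they compare equal, which is why a single
   "zero" category is enough when only comparisons matter. */
bool is_zero(double x)
{
    return x == 0.0;
}

/* True only for -0.0: the sign bit is still present and can be tested. */
bool is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```

For example, is_zero(-0.0) and is_negative_zero(-0.0) are both true, while 1.0 / -0.0 evaluates to negative infinity under IEEE 754. So the sign can still matter in later computations, even if the debugger reports both zeros in one category.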

So, the full set of flags is as follows:

Code:
        Equate                              Value   Description
       
        SpecialFPU_PosValid                 0       The FPU contains a valid positive number.
        SpecialFPU_NegValid                 1       The FPU contains a valid negative number.
        SpecialFPU_PosSubNormal             2       The FPU produced a positive Subnormal (denormalized) number.
                                                    Although its magnitude is below the smallest normal number (3.36...e-4932) and it has lost precision, it is still valid
                                                    Ex: 0000 00000000 00000000 (0, classified as SpecialFPU_Zero)
                                                        0000 00000000 FFFFFFFF
                                                        0000 00000000 00008000
                                                        0000 00000001 00000000
                                                        0000 FFFFFFFF FFFFFFFF
        SpecialFPU_NegSubNormal             3       The FPU produced a negative Subnormal (denormalized) number.
                                                    Although its magnitude is below the smallest normal number (3.36...e-4932) and it has lost precision, it is still valid
                                                    Ex: 8000 00000000 00000000 (0) (Negative zero must be considered only as zero)
                                                        8000 00000000 FFFFFFFF (-0.0000000156560127730E-4933)
                                                        8000 01000000 00000000 (-0.2626643080556322880E-4933)
                                                        8000 FFFFFFFF 00000001 (-6.7242062846585856000E-4932)
        SpecialFPU_Zero                     4       The FPU contains a valid zero number
        SpecialFPU_QNAN                     5       QNAN - Quiet NAN (Not a number)
        SpecialFPU_SNAN                     6       SNAN - Signaling NAN (Not a number)
        SpecialFPU_NegInf                   7       Negative Infinite
        SpecialFPU_PosInf                   8       Positive Infinite
        SpecialFPU_Indefinite               9       Indefinite

        The 4 equates below are not official IEEE categories. They were created to represent the cases where the integer bit of the TenByte
        is missing, e.g. because of a compiler error. Apart from zeros and denormals, a TenByte should always have this bit set (value = 1);
        when it is not set, the FPU refuses to process the number (it raises an invalid-operation exception). The 4 equates below were
        created to handle this missing category of error. The integer bit is bit 63 of the TenByte (bit 31 of the 2nd dword).
       
        SpecialFPU_SpecialIndefQNan         10      Special INDEFINITE QNAN (Same as QNAN, but occurring in a TenByte without the integer bit set)
        SpecialFPU_SpecialIndefSNan         11      Special INDEFINITE SNAN (Same as SNAN, but occurring in a TenByte without the integer bit set)
        SpecialFPU_SpecialIndefNegInfinite  12      Special INDEFINITE Negative Infinite (Same as Negative Infinite, but occurring in a TenByte without the integer bit set)
        SpecialFPU_SpecialIndefPosInfinite  13      Special INDEFINITE Positive Infinite (Same as Positive Infinite, but occurring in a TenByte without the integer bit set)
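As a cross-check of the table, here is a minimal C sketch that classifies the 10 bytes of a TenByte into the same SpecialFPU_* values, following the x87 encoding rules in the Intel manual. It is not a translation of RealTenFPUNumberCategory. The function and enum names are hypothetical, and mapping unnormals (integer bit missing, exponent neither 0 nor 7FFF) with the same quiet-bit rule as pseudo-NaNs is an assumption; the posted routine applies its own rule to those.

```c
#include <stdint.h>

/* Values copied from the SpecialFPU_* table (names shortened, hypothetical). */
enum {
    FPU_POS_VALID = 0, FPU_NEG_VALID = 1,
    FPU_POS_SUBNORMAL = 2, FPU_NEG_SUBNORMAL = 3,
    FPU_ZERO = 4, FPU_QNAN = 5, FPU_SNAN = 6,
    FPU_NEG_INF = 7, FPU_POS_INF = 8, FPU_INDEFINITE = 9,
    FPU_SPECIAL_INDEF_QNAN = 10, FPU_SPECIAL_INDEF_SNAN = 11,
    FPU_SPECIAL_INDEF_NEG_INF = 12, FPU_SPECIAL_INDEF_POS_INF = 13
};

/* Integer bit missing with a nonzero exponent: pseudo-infinity/pseudo-NaN
   (exponent 7FFF) or unnormal (assumption: same quiet-bit rule). */
static int missing_integer_bit(int neg, uint64_t frac)
{
    if (frac == 0)
        return neg ? FPU_SPECIAL_INDEF_NEG_INF : FPU_SPECIAL_INDEF_POS_INF;
    return ((frac >> 62) & 1) ? FPU_SPECIAL_INDEF_QNAN : FPU_SPECIAL_INDEF_SNAN;
}

/* tb: the 10 bytes in memory order (D$ low, D$ high, W$ sign/exponent). */
int classify_tenbyte(const uint8_t tb[10])
{
    uint64_t mant = 0;
    for (int i = 7; i >= 0; i--)
        mant = (mant << 8) | tb[i];                  /* bits 0..63 */
    unsigned se = (unsigned)tb[8] | ((unsigned)tb[9] << 8);
    int neg = (se & 0x8000u) != 0;                   /* bit 79 */
    unsigned exp = se & 0x7FFFu;                     /* bits 64..78 */
    int int_bit = (int)(mant >> 63);                 /* bit 63 */
    uint64_t frac = mant & 0x7FFFFFFFFFFFFFFFull;    /* bits 0..62 */

    if (exp == 0) {
        if (mant == 0)
            return FPU_ZERO;                         /* +0 and -0 */
        /* denormal (integer bit 0) or pseudo-denormal (integer bit 1) */
        return neg ? FPU_NEG_SUBNORMAL : FPU_POS_SUBNORMAL;
    }
    if (!int_bit)
        return missing_integer_bit(neg, frac);
    if (exp == 0x7FFF) {
        if (frac == 0)
            return neg ? FPU_NEG_INF : FPU_POS_INF;
        if ((frac >> 62) & 1) {                      /* quiet bit set */
            if (neg && frac == (1ull << 62))
                return FPU_INDEFINITE;               /* FFFF C0000000 00000000 */
            return FPU_QNAN;
        }
        return FPU_SNAN;
    }
    return neg ? FPU_NEG_VALID : FPU_POS_VALID;
}
```

With this ordering, a true denormal such as D$ 0FFFFFFFF, 0, W$ 0 (integer bit clear) lands in the subnormal category rather than in one of the error categories.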

New test Version 2.055a
Portal
Forum


Many thanks to Raymond, JJ and AW
4
The Workshop / Re: Rounding Mode in FPU question
« Last post by guga on Today at 12:04:09 AM »
You're welcome. Best wishes for your project.

One minor clarification to my last statement.
Quote
The explicit 1 in bit 63 always remains set for the REAL10 format
The one exception is when all exponent and fraction bits are 0's, defining the value of +/-0 depending on the sign bit.

Thanks  :t :t :t :t  :)


About the integer bit being set: I did that, and I handled the zero cases as you described :)  I put negative zeros in the same category as zero, since -0 compares equal to +0.

I just added 4 new categories for the rare situations where the integer bit is missing from the TenByte (rare, but it happens when you analyse some old libraries with a disassembler), which results in an error in the FPU computations, since the FPU rejects such numbers.

[SpecialFPU_SpecialIndefQNan 10] ; Special INDEFINITE QNAN. Same as QNAN, but without the integer bit set
[SpecialFPU_SpecialIndefSNan 11] ; Special INDEFINITE SNAN. Same as SNAN, but without the integer bit set
[SpecialFPU_SpecialIndefNegInfinite 12] ; Special INDEFINITE Negative Infinite. Same as Negative Infinite, but without the integer bit set
[SpecialFPU_SpecialIndefPosInfinite 13] ; Special INDEFINITE Positive Infinite. Same as Positive Infinite, but without the integer bit set

For disassembler and debugger purposes, I added these just to categorize the error type better, rather than throwing an Unknown Format flag.

The updated code looks like this:

Code:

[SpecialFPU_PosValid 0] ; The FPU contains a valid positive result
[SpecialFPU_NegValid 1] ; The FPU contains a valid negative result
[SpecialFPU_PosSubNormal 2] ; The FPU produced a positive Subnormal (denormalized) number. Although its magnitude is below the smallest normal number (3.36...e-4932) and it has lost precision, it is still valid
[SpecialFPU_NegSubNormal 3] ; The FPU produced a negative Subnormal (denormalized) number. Although its magnitude is below the smallest normal number (3.36...e-4932) and it has lost precision, it is still valid
[SpecialFPU_Zero 4] ; The FPU contains a valid zero number
[SpecialFPU_QNAN 5] ; QNAN
[SpecialFPU_SNAN 6] ; SNAN
[SpecialFPU_NegInf 7] ; Negative Infinite
[SpecialFPU_PosInf 8] ; Positive Infinite
[SpecialFPU_Indefinite 9] ; Indefinite
[SpecialFPU_SpecialIndefQNan 10] ; Special INDEFINITE QNAN
[SpecialFPU_SpecialIndefSNan 11] ; Special INDEFINITE SNAN
[SpecialFPU_SpecialIndefNegInfinite 12] ; Special INDEFINITE Negative Infinite
[SpecialFPU_SpecialIndefPosInfinite 13] ; Special INDEFINITE Positive Infinite

Proc RealTenFPUNumberCategory:
    Arguments @Float80Pointer
    Local @FPUErrorMode, @TenByteErr
    Uses ebx


    mov ebx D@Float80Pointer
    mov D@FPUErrorMode SpecialFPU_PosValid

    ; 1st: locate all zeros (no bits set in the whole TenByte, or only the sign bit set: bit 15 of the last word = bit 79 of the TenByte)
    ; Ex: D$ 0, 0, W$ 0 ; (0)
    ; Ex: D$ 0, 0, W$ 08000 ; negative zero, classified as zero
    .If_and D$ebx = 0, D$ebx+4 = 0
        ; W$ebx+8 = 08000 is a negative zero. It compares equal to +0, so it is classified as zero as well
        If_Or W$ebx+8 = 0, W$ebx+8 = 08000
            mov eax SpecialFPU_Zero ; return with eax = SpecialFPU_Zero
            ExitP
        End_If
    .End_If

    ; Check if the TenByte is valid. Check for the integer bit (bit 63 of the tenbyte = bit 31 of the 2nd dword)
    mov D@TenByteErr &FALSE
    Test_If_Not B$ebx+7 080 ; See if the integer bit is present on the tenbyte. No integer bit ? Flag it as an error
        mov D@TenByteErr &TRUE
    Test_End


    ; Possible NaNs always have a biased exponent of all ones (11..11b); QNaNs also have the top 2 bits of the 2nd dword
    ; set (integer bit and quiet bit). See pg 91/100 of the Intel manual.

     ; 2nd: locate all denormalized positive numbers (biased exponent = 0).
     ; Ex: D$ 0FFFFFFFF, 0, W$ 0         ; (1.56560127731845315e-4941)
     ; Ex: D$ 0, 01, W$ 0                ; (1.56560127768297311e-4941)
     ; Ex: D$ 0FFFFFFFF, 0FFFFFFFF, W$ 0 ; (6.72420628622418417e-4932)
                                         ; (6.7242062862241870120e-4932) (Olly)

    ; A biased exponent of 0 (zeros were already handled above) means a denormal. A true denormal has the integer bit
    ; clear and a pseudo-denormal has it set; the FPU accepts both, so TenByteErr is not tested here.
    ...If W$ebx+8 = 0

        mov D@FPUErrorMode SpecialFPU_PosSubNormal

    ...Else_If W$ebx+8 = 08000

    ; 3rd: locate all denormalized negative numbers. Bit 15 of the last word (bit 79 of the TenByte) is set
    ; Ex: D$ 0FFFFFFFF, 0, W$ 08000 ; (-1.56560127731845315e-4941)
    ; Ex: D$ 0, 01000000, W$ 08000  ; (-2.626643080565632194e-4934)  in olly dbg = (-0.2626643080556323050e-4933)
    ; Ex: D$ 01, 0FFFFFFFF, W$ 08000 ; (-6.72420628465858289e-4932)  in olly dbg = (-6.7242062846585857350e-4932)

        mov D@FPUErrorMode SpecialFPU_NegSubNormal

    ...Else_If W$ebx+8 = 07FFF ; Locate positive infinity, NaNs and the special indefinite cases. Exponent: 00__0111_1111__1111_1111
    ; 00__0110_0000__0000_0011 06003 ; error happens here
    ; 00__0111_1111__1111_1111 07FFF ; error happens here

    ; 00__0111_1111__1111_1110 07FFE ; ok normal
    ; 00__0110_0000__0000_0010 06002 ; ok normal
    ; 00__0000_0000__0000_0001 01 ; ok normal

        ; Positive infinity: the 2nd dword contains only the integer bit (00__1000_0000...) and the 1st dword is zero
        ; Ex: D$ 0, 080000000, W$ 07FFF ; 7FFF 80000000 00000000
        ; The integer bit (bit 63 of the TenByte = bit 31 of the 2nd dword) is tested first: without it, these are
        ; pseudo-infinities/pseudo-NaNs (e.g. a compiler error), which the FPU rejects.

        .If_And D$ebx+4 = 0, D$ebx = 0
            ; No integer bit and no fraction bits: pseudo-infinity
            mov D@FPUErrorMode SpecialFPU_SpecialIndefPosInfinite

        .Else_If D$ebx+4 < 040000000
            ; No integer bit, quiet bit (bit 30 of the 2nd dword) clear, fraction <> 0: pseudo-SNaN
            mov D@FPUErrorMode SpecialFPU_SpecialIndefSNan

        .Else_If D$ebx+4 < 080000000
            ; No integer bit, quiet bit set: pseudo-QNaN
            mov D@FPUErrorMode SpecialFPU_SpecialIndefQNan

        .Else_If_And D$ebx+4 = 080000000, D$ebx = 0 ; Only the integer bit is set
            mov D@FPUErrorMode SpecialFPU_PosInf

        .Else_If D$ebx+4 >= 0C0000000 ; Integer bit and quiet bit set
            mov D@FPUErrorMode SpecialFPU_QNAN

        .Else ; Integer bit set, quiet bit clear, fraction <> 0
            mov D@FPUErrorMode SpecialFPU_SNAN

        .End_If

    ...Else_If W$ebx+8 = 0FFFF ; Locate negative infinity, NaNs, Indefinite and the special indefinite cases

        .If_And D$ebx+4 = 0, D$ebx = 0
            ; Pseudo-infinity (no integer bit, no fraction bits)
            mov D@FPUErrorMode SpecialFPU_SpecialIndefNegInfinite

        .Else_If D$ebx+4 < 040000000
            ; Pseudo-SNaN
            mov D@FPUErrorMode SpecialFPU_SpecialIndefSNan

        .Else_If D$ebx+4 < 080000000
            ; Pseudo-QNaN
            mov D@FPUErrorMode SpecialFPU_SpecialIndefQNan

        .Else_If_And D$ebx+4 = 080000000, D$ebx = 0
            mov D@FPUErrorMode SpecialFPU_NegInf

        .Else_If_And D$ebx+4 = 0C0000000, D$ebx = 0
            ; Real indefinite: FFFF C0000000 00000000 (a special QNaN), so it must be tested before QNAN
            mov D@FPUErrorMode SpecialFPU_Indefinite

        .Else_If D$ebx+4 >= 0C0000000
            mov D@FPUErrorMode SpecialFPU_QNAN

        .Else
            mov D@FPUErrorMode SpecialFPU_SNAN

        .End_If

    ...Else

        ; Now analyse the cases where the integer bit (bit 63 of the TenByte = bit 31 of the 2nd dword) is missing,
        ; e.g. because of a compiler error (unnormals). The FPU rejects these as invalid operands.
        ; Ex: [NumTest18: D$ 0, 013900F00, W$ 07ED]
        ..If D$ebx+4 < 080000000 ; Integer bit not set

            .If_And D$ebx+4 >= 040000000, D$ebx >= 01
                ; 00__0100_0000__0000_0000__0000_0000__0000_0000 Bit 30 of the 2nd dword is set, but the integer bit is missing
                mov D@FPUErrorMode SpecialFPU_SpecialIndefQNan

            .Else_If D$ebx >= 01
                ; At least 1 bit set in the 1st dword (bit 30 of the 2nd dword clear), and the integer bit is missing

                mov D@FPUErrorMode SpecialFPU_SpecialIndefSNan

            .Else
                ; All remaining cases have D$ebx = 0, so we only need to check whether the number is positive or negative
                ; Sign bit set
                Test_If B$ebx+9 0_80
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefNegInfinite
                Test_Else
                    ; Sign bit not set
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefPosInfinite
                Test_End
            .End_If
        ..End_If

    ...End_If

    .If D@FPUErrorMode = SpecialFPU_PosValid
        Test_If B$ebx+9 0_80
            mov D@FPUErrorMode SpecialFPU_NegValid
        Test_End
    .End_If

    mov eax D@FPUErrorMode

EndP

5
The Laboratory / Re: Fast Compare Real8 with SSE
« Last post by Siekmanski on February 16, 2019, 11:54:57 PM »
 :t
6
The Laboratory / Re: Fast Compare Real8 with SSE
« Last post by guga on February 16, 2019, 11:46:32 PM »
Quote
Guga, please post the PDF. I dug up an SSE/SSE2 tutorial and posted it in my SSE thread to keep it from disappearing among lots of posts.
I think that if you understand how packed compares work, you don't need to be stuck with lots of packed SSE code bottlenecked by scalar IFs.
I want to make a realtime colorization demo, starting with a simpler colorspace if CieLCH is too slow.

Siekmanski great work :t

Hi DayDreamer. I'm finishing some details on the algorithm and will update it with the backwards algorithm for you and Siekmanski.
7
The Laboratory / Re: Fast Compare Real8 with SSE
« Last post by guga on February 16, 2019, 11:29:41 PM »
Quote
CieLCH seems to be the best option, but also the most complicated color space conversion algorithm.
For me this is a new field to explore, and I have to learn a lot more to understand it fully.
Looking forward to see your CieLCH routine when finished.  :t

Found this paper: http://jcgt.org/published/0002/02/01/paper.pdf

In my own logic, I always try to understand algorithms by working my way backwards.
Then I try to simplify the calculations if possible.
In the case of color space conversion you can precalculate the coefficients for a dot3 matrix routine.

Once you know the 3*3 matrix coefficients for the "forward" RGB -> XYZ calculations,
you can use the inverse of the 3*3 matrix coefficients to compute the "backwards" XYZ -> RGB.
This way, the "backwards" results always match the forward conversion (up to floating-point rounding).
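A minimal C sketch of this forward/inverse idea: the 3*3 matrix uses the CIERGB2XYZ coefficients from the listing further down, reordered to R, G, B columns, and the inverse is computed once with the adjugate formula. The type and function names are hypothetical, and gamma handling and the white reference are deliberately left out.

```c
#include <math.h>
#include <stdbool.h>

typedef struct { double m[3][3]; } Mat3;

/* CIE RGB -> XYZ: rows X, Y, Z; columns R, G, B. */
static const Mat3 CIE_RGB_TO_XYZ = {{
    {0.4887180, 0.3106803, 0.2006017},
    {0.1762044, 0.8129847, 0.0108109},
    {0.0000000, 0.0102048, 0.9897952},
}};

/* out = a * in (dot3 of each matrix row with the input vector). */
void mat3_apply(const Mat3 *a, const double in[3], double out[3])
{
    for (int r = 0; r < 3; r++)
        out[r] = a->m[r][0] * in[0] + a->m[r][1] * in[1] + a->m[r][2] * in[2];
}

/* Inverse via the adjugate (transposed cofactors) divided by the determinant.
   Returns false if the matrix is (nearly) singular. */
bool mat3_inverse(const Mat3 *a, Mat3 *inv)
{
    const double (*m)[3] = a->m;
    double c00 = m[1][1] * m[2][2] - m[1][2] * m[2][1];
    double c01 = m[1][2] * m[2][0] - m[1][0] * m[2][2];
    double c02 = m[1][0] * m[2][1] - m[1][1] * m[2][0];
    double det = m[0][0] * c00 + m[0][1] * c01 + m[0][2] * c02;
    if (fabs(det) < 1e-12)
        return false;
    inv->m[0][0] = c00 / det;
    inv->m[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / det;
    inv->m[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det;
    inv->m[1][0] = c01 / det;
    inv->m[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det;
    inv->m[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / det;
    inv->m[2][0] = c02 / det;
    inv->m[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / det;
    inv->m[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det;
    return true;
}
```

Each row of this matrix sums to 1, so RGB (1, 1, 1) maps to XYZ (1, 1, 1), which makes a convenient sanity check. Applying mat3_apply with the forward matrix and then with its inverse returns the original RGB values up to rounding.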

Bruce Lindbloom has done some of the coefficients math for us to compute the RGB -> XYZ and XYZ -> RGB matrices.
http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html

There is a CIE Color Calculator on his site:
http://www.brucelindbloom.com/ColorCalculator.html

My routines are not finished yet but I will post them when ready, then we can check if it is done right.

Here is the dot3 color conversion routine I wrote; it is the one used in the example in reply #70.
I wrote it with SSE2 instructions so it can be used on older computers as well (so no fancy byte shuffles in this one).
I adjusted the 3*3 matrix transpose routine to handle 4 row elements, preserving alpha.

Code:
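align 16                                    ; mulps with a memory operand needs 16-byte aligned coefficients
                ;    B          G          R          A
CIERGB2XYZ  real4  0.2006017, 0.3106803, 0.4887180, 0.0 ; X
            real4  0.0108109, 0.8129847, 0.1762044, 0.0 ; Y
            real4  0.9897952, 0.0102048, 0.0000000, 1.0 ; Z ( AZ must be 1.0 )

align 16                                    ; andps with a memory operand needs a 16-byte aligned mask
ALPHA_mask  dd  -1,-1,-1,0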


align 4
ColorConversionInt2Float proc uses ebx esi edi BitmapWidth:DWORD,BitmapHeight:DWORD,pSourceMem:DWORD,pDestinationMem:DWORD,pConversionType:DWORD

    mov         esi,pSourceMem
    mov         edi,pDestinationMem
    mov         edx,pConversionType
   
    mov         ecx,BitmapWidth
    imul        ecx,BitmapHeight
    shr         ecx,2
   
    pxor        xmm5,xmm5                   ; Empty the source operand, to zero the integer high parts,
                                            ; in the "punpcklbw", "punpcklwd" instructions
align 16
LoadFourPixels:
    mov         ebx,4
    movdqa      xmm6,oword ptr [esi]        ; Load 4 ARGB pixels at once

FourPixelLP:
    movq        xmm0,xmm6                   ; 1 pixel
    punpcklbw   xmm0,xmm5                   ; Convert 4 bytes to 4 words
    punpcklwd   xmm0,xmm5                   ; Convert 4 words to 4 dwords
    cvtdq2ps    xmm0,xmm0                   ; Convert 4 dwords to 4 real4 values

    movaps      xmm1,xmm0                   ; [B G R A]
    movaps      xmm2,xmm0                   ; [B G R A]
    mulps       xmm0,oword ptr [edx]        ; [BX GX RX  --] Multiply Color X conversion coefficients
    mulps       xmm1,oword ptr [edx+16]     ; [BY GY RY  --] Multiply Color Y conversion coefficients
    mulps       xmm2,oword ptr [edx+32]     ; [BZ GZ RZ  AZ] Multiply Color Z conversion coefficients

    ; Color conversion using an adjusted SSE2 4*3 matrix transposition routine ( preserving Alpha_Z)
    ; Now we can run a fast Dot3 ( 3-component vector ) calculation on the
    ; 12 color components and 12 color coefficients ( 9 of each + 3 Alpha components )
    ; Calculations are in parallel, 3 muls and 2 adds
   
    movaps      xmm3,xmm0                   ; [BX GX RX --]
    movaps      xmm4,xmm2                   ; [BZ GZ RZ AZ]
    unpcklps    xmm3,xmm1                   ; [BX BY GX GY]
    unpcklps    xmm4,xmm4                   ; [BZ BZ GZ GZ]
    movhlps     xmm4,xmm3                   ; [GX GY GZ GZ]
    movlhps     xmm3,xmm2                   ; [BX BY BZ GZ]
    unpckhps    xmm0,xmm1                   ; [RX RY -- --]
    shufps      xmm0,xmm2,Shuffle(3,2,1,0)  ; [RX RY RZ AZ]
    shufps      xmm6,xmm6,Shuffle(0,3,2,1)  ; pre-load next ARGB pixel
    addps       xmm3,xmm4                   ; [BX+GX BY+GY BZ+GZ GZ+GZ]
    andps       xmm3,oword ptr ALPHA_mask   ; [BX+GX BY+GY BZ+GZ --]   
    addps       xmm0,xmm3                   ; [RX+BX+GX RY+BY+GY RZ+BZ+GZ AZ]
                                            ; result: [X Y Z A] in the B G R A lanes
    movaps      oword ptr [edi],xmm0        ; Store the converted pixel (X Y Z A) as 4 Real4 values
    add         edi,16
    dec         ebx
    jnz         FourPixelLP
    add         esi,16
    dec         ecx
    jnz         LoadFourPixels
    ret

ColorConversionInt2Float endp
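To test the SSE2 routine, a scalar C reference can help. This is a sketch, not Siekmanski's code: it assumes 32-bit pixels stored as bytes B, G, R, A and a coefficient table laid out like CIERGB2XYZ (rows X, Y, Z; columns B, G, R, A). Note that the SSE2 version converts (width*height)/4 groups of 4 pixels, so any remainder is skipped, and that movdqa/movaps require 16-byte aligned source and destination buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for ColorConversionInt2Float (sketch).
   src:   pixel_count pixels, 4 bytes each, in memory order B, G, R, A.
   dst:   pixel_count * 4 floats (X, Y, Z, A per pixel).
   coeff: rows X, Y, Z; columns B, G, R, A, laid out like CIERGB2XYZ. */
void color_conversion_int2float_ref(const uint8_t *src, float *dst,
                                    size_t pixel_count, const float coeff[3][4])
{
    for (size_t p = 0; p < pixel_count; p++) {
        const uint8_t *px = src + 4 * p;
        float *out = dst + 4 * p;
        /* Dot3 per output channel: B, G and R weighted by one matrix row. */
        for (int row = 0; row < 3; row++)
            out[row] = coeff[row][0] * px[0] + coeff[row][1] * px[1] + coeff[row][2] * px[2];
        /* Alpha only goes through the Z row's alpha coefficient (AZ), as in the SSE2 code. */
        out[3] = coeff[2][3] * px[3];
    }
}
```

Comparing this output with the SSE2 output for a few random bitmaps is a quick way to catch a wrong shuffle or an unaligned table.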

Great work :t

About the matrix transposition: yes. The backwards calculation uses the transposed inverse matrix. All of this is precalculated well before the RGBtoCieLch/CieLCHtoRGB functions start. Just one thing: don't forget to include the gamma adjustments and the white reference I described in the paper.


Bruce has done a great job on all of this, but the problem with this colorspace in particular still remains. He managed to fix the discontinuity problem by adjusting the threshold after RGB is converted to XYZ and before it is converted to Lab/LCH. What he didn't consider is checking the limits between Hue/Chroma and Luminosity, which do have some issues if you use the CIE formula without the necessary fixes.

If you use the formula the "normal" way, you will end up having to clip the resulting R, G or B in the backwards computation. So, when you are doing the backwards computation, you always need to check whether the relation between chroma/hue and luma fits the formula I proposed.

About hue: the delta hue I showed previously, [5*cos(Hue)-2*sin(Hue)], is limited to sqrt(29) in absolute value. I'll finish updating the paper to include the backward formula and post it here for you to see.
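The sqrt(29) bound follows from writing the expression as a single cosine (harmonic addition), since 5^2 + 2^2 = 29:

```latex
5\cos H - 2\sin H = \sqrt{29}\,\cos(H + \varphi),
\qquad \cos\varphi = \frac{5}{\sqrt{29}},\quad \sin\varphi = \frac{2}{\sqrt{29}}
\quad\Longrightarrow\quad \lvert 5\cos H - 2\sin H \rvert \le \sqrt{29} \approx 5.385
```

The maximum +sqrt(29) is reached at H = -phi and the minimum -sqrt(29) at H = pi - phi, with phi = atan2(2, 5).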





8
The Workshop / Re: Rounding Mode in FPU question
« Last post by daydreamer on February 16, 2019, 08:52:15 PM »
Quote
According to the data which I had gathered at the time, I would have the following for the REAL10 format:

Positive INFINITY is represented by the bit pattern 7FFF_80000000_00000000
Negative INFINITY is represented by the bit pattern FFFF_80000000_00000000

Thanks Raymond, it's good to know the infinity encodings, so I can choose:

;x/y
.IF y != 0
    fld x
    fdiv y
.ELSE
    fld infinity    ; option 1: humans think x/0 = infinity ("infinity" is a REAL variable holding +INF)
    ; or
    ;fldz           ; option 2: treat x/0 as 0
.ENDIF

depending on what works best in the formula you are coding.
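The same choice as a small C sketch, for comparison with plain IEEE 754 behaviour (with the exception masked, x/0 already gives +/-infinity for nonzero x and NaN for 0/0). The enum and function names are hypothetical.

```c
#include <math.h>

typedef enum { DIV0_AS_INFINITY, DIV0_AS_ZERO } Div0Policy;

/* x / y, with an explicit, caller-chosen result when y == 0.
   DIV0_AS_INFINITY mirrors "fld infinity" (always +infinity, like the snippet);
   DIV0_AS_ZERO mirrors "fldz". Note that -0.0 == 0.0, so -0.0 is caught too. */
double div_with_policy(double x, double y, Div0Policy policy)
{
    if (y != 0.0)
        return x / y;
    return (policy == DIV0_AS_INFINITY) ? INFINITY : 0.0;
}
```

Unlike raw IEEE division, this returns +infinity even for 0/0 or a negative x, which is exactly the kind of decision the snippet leaves to the formula being coded.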

9
The Laboratory / Re: Fast Compare Real8 with SSE
« Last post by daydreamer on February 16, 2019, 08:28:49 PM »
Guga, please post the PDF. I dug up an SSE/SSE2 tutorial and posted it in my SSE thread to keep it from disappearing among lots of posts.
I think that if you understand how packed compares work, you don't need to be stuck with lots of packed SSE code bottlenecked by scalar IFs.
I want to make a realtime colorization demo, starting with a simpler colorspace if CieLCH is too slow.
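A minimal sketch of that idea with SSE intrinsics in C (x86 only; the function name is hypothetical): a packed compare produces an all-ones or all-zeros mask per lane, and AND/ANDNOT/OR then select between two values without a scalar IF per element.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* out[i] = (in[i] < threshold) ? below : in[i], four floats at a time and
   without a per-element branch. Sketch: n must be a multiple of 4. */
void replace_below_threshold(const float *in, float *out, size_t n,
                             float threshold, float below)
{
    const __m128 t = _mm_set1_ps(threshold);
    const __m128 b = _mm_set1_ps(below);
    for (size_t i = 0; i < n; i += 4) {
        __m128 v    = _mm_loadu_ps(in + i);
        __m128 mask = _mm_cmplt_ps(v, t);               /* all ones where v < threshold */
        __m128 r    = _mm_or_ps(_mm_and_ps(mask, b),    /* 'below' where mask is set    */
                                _mm_andnot_ps(mask, v)); /* original value elsewhere    */
        _mm_storeu_ps(out + i, r);
    }
}
```

In assembly this is the cmpltps / andps / andnps / orps sequence; SSE4.1 adds blendvps to do the select in one instruction.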

Siekmanski great work :t
10
The Laboratory / Re: My SSE macros
« Last post by Siekmanski on February 16, 2019, 08:25:16 PM »
Thanks  :t