Rounding Mode in FPU question

raymond · February 14, 2019, 06:50:30 AM

Whenever all the bits are set to 1 in the exponent field of a real number format, the value is designated as a NAN (or as an infinity when the fraction bits are all zero).

The number you used as an example would thus not qualify as a NAN. However, it would qualify as a denormalized number (http://www.ray.masmcode.com/tutorial/fpuchap2.htm#denormal) which is very different from a NAN.

aw27 · February 14, 2019, 07:08:18 AM

To be honest, I have no idea what guga is talking about, and I can't even see how he obtains the values he mentions.

guga · February 14, 2019, 07:12:45 AM

Hi Raymond, I'm not sure I'm fully understanding this.

From what I understood of the documentation, the Word 07ED (the last 2 bytes of the TenByte) holds the sign bit (bit 79) and the exponent, right? So it can't be denormalized, since the last Word of the TenByte is not zero and the second Dword is not zero either. Denormalized numbers are the ones where the second Dword of the TenByte is zero and the last Word is either zero or 08000, right?

So, does this number qualify as -Infinite? I don't understand why you say that this number is denormalized.

guga · February 14, 2019, 07:16:50 AM

I'm quite confused right now.

I created a function to categorize FPU numbers (a TenByte in memory). Is this correct?

    RealTenFPUNumberCategory
        This function identifies the errors present in a Real10 FPU value.

    Parameters:
        Float80Pointer - A pointer to a variable containing a TenByte (80 bit) value

    Returned Values:
    
        The function will return one of the following equates:

        Equate                          Value   Description
        
        SpecialFPU_PosValid             0       The FPU contains a valid positive number.
        SpecialFPU_NegValid             1       The FPU contains a valid negative number.
        SpecialFPU_PosSubNormal         2       The FPU produced a positive Subnormal (denormalized) number.
                                                Its magnitude is below the normal limit of 3.36...e-4932, so it has lost precision, but it is still valid
                                                Ex: 0000 00000000 00000000
                                                    0000 00000000 FFFFFFFF
                                                    0000 00000000 00008000
                                                    0000 00000001 00000000
                                                    0000 FFFFFFFF FFFFFFFF
        SpecialFPU_NegSubNormal         3       The FPU produced a negative Subnormal (denormalized) number.
                                                Its magnitude is below the normal limit of 3.36...e-4932, so it has lost precision, but it is still valid
                                                Ex: 8000 00000000 00000000 (0) (Negative zero must be considered only as zero)
                                                    8000 00000000 FFFFFFFF (-0.0000000156560127730E-4933)
                                                    8000 01000000 00000000 (-0.2626643080556322880E-4933)
                                                    8000 FFFFFFFF 00000001 (-6.7242062846585856000E-4932)
        SpecialFPU_QNAN                 4       QNAN - Quiet NAN (Not a number)
        SpecialFPU_SNAN                 5       SNAN - Signaling NAN (Not a number)
        SpecialFPU_NegInf               6       Negative Infinite
        SpecialFPU_PosInf               7       Positive Infinite
        SpecialFPU_Indefinite           8       Indefinite
        SpecialFPU_SpecialIndefQNan     9       Special INDEFINITE QNAN
        SpecialFPU_SpecialIndefSNan     10      Special INDEFINITE SNAN
        SpecialFPU_SpecialIndefInfinite 11      Special INDEFINITE Infinite
__________________________________________________________________________

; Equates related to the function

[SpecialFPU_PosValid 0] ; The FPU contains a valid positive result
[SpecialFPU_NegValid 1] ; The FPU contains a valid negative result
[SpecialFPU_PosSubNormal 2] ; The FPU produced a positive Subnormal (denormalized) number. Its magnitude is below the normal limit of 3.36...e-4932, so it has lost precision, but it is still valid
[SpecialFPU_NegSubNormal 3] ; The FPU produced a negative Subnormal (denormalized) number. Its magnitude is below the normal limit of 3.36...e-4932, so it has lost precision, but it is still valid
[SpecialFPU_QNAN 4] ; QNAN
[SpecialFPU_SNAN 5] ; SNAN
[SpecialFPU_NegInf 6] ; Negative Infinite
[SpecialFPU_PosInf 7] ; Positive Infinite
[SpecialFPU_Indefinite 8] ; Indefinite
[SpecialFPU_SpecialIndefQNan 9] ; Special INDEFINITE QNAN
[SpecialFPU_SpecialIndefSNan 10] ; Special INDEFINITE SNAN
[SpecialFPU_SpecialIndefInfinite 11] ; Special INDEFINITE Infinite

; Updated in 12/02/2019
Proc RealTenFPUNumberCategory:
    Arguments @Float80Pointer
    Local @FPUErrorMode
    Uses edi, ebx


    mov ebx D@Float80Pointer
    mov D@FPUErrorMode SpecialFPU_PosValid

    ...If_And W$ebx+8 = 0, D$ebx+4 = 0 ; This is denormalized, but it is possible.
        ; 0000 00000000 00000000
        ; 0000 00000000 FFFFFFFF
        mov D@FPUErrorMode SpecialFPU_PosSubNormal
    ...Else_If_And W$ebx+8 = 0, D$ebx+4 > 0 ; This is Ok.
        ; 0000 00000001 00000000
        ; 0000 FFFFFFFF FFFFFFFF
        mov D@FPUErrorMode SpecialFPU_PosSubNormal
    ...Else_If_And W$ebx+8 > 0, W$ebx+8 < 07FFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .If D$ebx+4 < 080000000
            .Test_If D$ebx+4 040000000
                ; QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_QNAN
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; SNAN only if at least 1 bit is set
                    mov D@FPUErrorMode SpecialFPU_SNAN
                Else ; All fraction Bits are 0
                    ; Bit 15 is never reached. The bit is 0 from W$ebx+8
                    ; -INFINITE ; Bit15 = 0
                    mov D@FPUErrorMode SpecialFPU_NegInf
                End_If
            .Test_End
        .End_If
    ...Else_If W$ebx+8 = 07FFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .Test_If D$ebx+4 040000000
            ; QNAN 40000000
            mov D@FPUErrorMode SpecialFPU_QNAN
        .Test_Else
            If_And D$ebx+4 > 0, D$ebx > 0
                ; SNAN only if at least 1 bit is set
                mov D@FPUErrorMode SpecialFPU_SNAN
            Else ; All fraction Bits are 0
                ; Bit 15 is never reached. The bit is 0 from W$ebx+8
                ; -INFINITE ; Bit15 = 0
                mov D@FPUErrorMode SpecialFPU_NegInf
            End_If
        .Test_End
        ; Below is similar to W$ebx+8 = 0
    ...Else_If_And W$ebx+8 = 08000, D$ebx+4 = 0 ; This is denormalized, but possible.
        ; 8000 00000000 00000000 (0)
        ; 8000 00000000 FFFFFFFF (-0.0000000156560127730E-4933)
        mov D@FPUErrorMode SpecialFPU_NegSubNormal
    ...Else_If_And W$ebx+8 = 08000, D$ebx+4 > 0 ; This is Ok.
        ; 8000 01000000 00000000 (-0.2626643080556322880E-4933)
        ; 8000 FFFFFFFF 00000001 (-6.7242062846585856000E-4932)
        mov D@FPUErrorMode SpecialFPU_NegSubNormal
    ...Else_If_And W$ebx+8 > 08000, W$ebx+8 < 0FFFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .If D$ebx+4 < 080000000
            .Test_If D$ebx+4 040000000
                ; QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_QNAN
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; SNAN only if at least 1 bit is set
                    ;mov D$edi 'SNaN', B$edi+4 0
                    mov D@FPUErrorMode SpecialFPU_SNAN
                Else ; All fraction Bits are 0
                    ; Bit 15 is always reached. The bit is 1 from W$ebx+8
                    ; +INFINITE ; Bit15 = 1
                    ;mov D$edi '+INF', B$edi+4 0
                    mov D@FPUErrorMode SpecialFPU_PosInf
                End_If
            .Test_End
        .End_If

    ...Else_If W$ebx+8 = 0FFFF; This identifies indefinite or other NAN values

        .If_And D$ebx+4 >= 040000000, D$ebx = 0
            ; INDEFINITE
            mov D@FPUErrorMode SpecialFPU_Indefinite
        .Else
            .Test_If D$ebx+4 040000000
                ; Special INDEFINITE QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_SpecialIndefQNan
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; Special INDEFINITE SNAN only if at least 1 bit is set
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefSNan
                Else ; All fraction Bits are 0
                    ; Bit 15 is always reached. The bit is 1 from W$ebx+8
                    ; Special INDEFINITE +INFINITE ; Bit15 = 1
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefInfinite
                End_If
            .Test_End
        .End_If
    ...End_If

    .If D@FPUErrorMode = SpecialFPU_PosValid
        Test_If B$ebx+9 0_80
            mov D@FPUErrorMode SpecialFPU_NegValid
        Test_End
    .End_If
    mov eax D@FPUErrorMode

EndP
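
For comparison, here is a minimal Python sketch of how the encoding classes of the 80-bit format are usually laid out. It is not a port of the function above: the name `classify_real10` and the returned labels are made up for illustration, and Python is used only because it is easy to run. The key assumption is the one discussed later in the thread: with a nonzero exponent, bit 63 of the significand must be 1, otherwise the pattern is not a supported number at all.

```python
def classify_real10(raw: bytes) -> str:
    """Classify a 10-byte little-endian x87 extended-precision pattern.

    The labels are illustrative names, not equates from the thread.
    """
    if len(raw) != 10:
        raise ValueError("a REAL10 is exactly 10 bytes")

    significand = int.from_bytes(raw[:8], "little")    # bits 0..63
    sign_exp = int.from_bytes(raw[8:], "little")       # bits 64..79
    sign = "-" if sign_exp & 0x8000 else "+"
    exponent = sign_exp & 0x7FFF                       # 15-bit biased exponent
    integer_bit = significand >> 63                    # explicit leading bit
    fraction = significand & 0x7FFF_FFFF_FFFF_FFFF     # bits 0..62

    if exponent == 0:
        if significand == 0:
            kind = "zero"
        elif integer_bit:
            kind = "pseudo-denormal"
        else:
            kind = "denormal"
    elif not integer_bit:
        # Nonzero exponent with bit 63 clear: unnormal, pseudo-NaN or
        # pseudo-infinity. None of these is a NaN, an infinity or a denormal.
        kind = "unsupported"
    elif exponent < 0x7FFF:
        kind = "normal"
    elif fraction == 0:
        kind = "infinity"
    elif fraction & (1 << 62):
        # Quiet bit set. The "real indefinite" is the negative QNaN whose
        # remaining fraction bits are all zero.
        if sign == "-" and fraction == 1 << 62:
            kind = "indefinite"
        else:
            kind = "QNaN"
    else:
        kind = "SNaN"
    return sign + kind


if __name__ == "__main__":
    example = bytes([0, 0, 0, 0, 0, 0x0F, 0x90, 0x13, 0xED, 0x07])
    print(classify_real10(example))
```

Note that NaNs and infinities only appear when the exponent field is 07FFF (or 0FFFF with the sign bit); any exponent strictly between 0 and 07FFF is either a normal number or an unsupported pattern.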

aw27 · February 14, 2019, 07:33:46 AM

On my computer, when I load 0x07ED13900F0000000000 from memory I get +1.5836591183212933e-4322
I see neither a NaN nor a denormalized number. I can see a denormalized number if I look at the number backwards.
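
That reading can be reproduced by taking the bit pattern literally: significand times 2^(exponent - 16383 - 63). The Python sketch below does this with exact rational arithmetic; the helper names are hypothetical, and it only shows what a debugger would display, not whether the FPU accepts the pattern.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def real10_literal_value(raw: bytes) -> Fraction:
    """Exact value of a 10-byte pattern read literally (no validity check)."""
    significand = int.from_bytes(raw[:8], "little")
    sign_exp = int.from_bytes(raw[8:10], "little")
    sign = -1 if sign_exp & 0x8000 else 1
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("NaN or infinity has no numeric value")
    # An exponent field of 0 uses the same scale as an exponent field of 1.
    scale = max(exponent, 1) - 16383 - 63
    return sign * Fraction(significand) * Fraction(2) ** scale


def to_decimal(value: Fraction, digits: int = 20) -> Decimal:
    """Round an exact fraction to `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(value.numerator) / Decimal(value.denominator)


if __name__ == "__main__":
    raw = (0x07ED13900F0000000000).to_bytes(10, "little")
    print(f"{to_decimal(real10_literal_value(raw)):.16e}")
```

The printed value should come out at about 1.58e-4322, in line with the figure quoted above.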

BTW, since we are talking about backwards, coffee spelled backwards is eeffoc. Just know that I don't give eeffoc until I've had some coffee.

K_F · February 14, 2019, 08:35:24 AM

Quote from: Adamanteus on February 14, 2019, 04:58:59 AM
Quote from: K_F on February 12, 2019, 07:02:27 AM
A possible way of re-introducing the lost resolution is to run a 'random' generator at the 'resolution of lost bits', and then add that to the truncated Real4 (Real8).
A way of minimising FPU error propagation. ;)
One particular way of solving this type of problem is to have ONE type for all calculations. I'm using this:
MACE TYPEDEF REAL10

True. Sometimes you have to convert down from Real10 ;)

guga · February 14, 2019, 10:14:35 AM

AW, you are feeding the numbers in inverted order, I presume. That's why you are getting different results from me and Raymond.

I was talking about this sequence: db 0, 0, 0, 0, 0, 0F, 090, 013, 0ED, 07

The exponent and the sign are the last 2 bytes of the tenbyte.
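
As a rough illustration in Python (not part of the original function), the same ten bytes can be split into the three pieces the function reads as `D$ebx`, `D$ebx+4` and `W$ebx+8`. The variable names are made up for this sketch.

```python
import struct

# The sequence from the post, in memory order.
raw = bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0x90, 0x13, 0xED, 0x07])

# "<IIH" = little-endian: two 32-bit Dwords followed by one 16-bit Word.
low_dword, high_dword, sign_exp = struct.unpack("<IIH", raw)

sign = sign_exp >> 15            # bit 79 of the TenByte
exponent = sign_exp & 0x7FFF     # 15-bit biased exponent
integer_bit = high_dword >> 31   # bit 63, the explicit leading bit

if __name__ == "__main__":
    print(f"D$ebx    = {low_dword:08X}")
    print(f"D$ebx+4  = {high_dword:08X}")
    print(f"W$ebx+8  = {sign_exp:04X}")
    print(f"sign={sign} exponent={exponent:04X} integer bit={integer_bit}")
```

So the Word at offset 8 is 07ED (sign 0, exponent 07ED), the high Dword is 13900F00 and the low Dword is 0. As far as the macros in the posted function can be followed, the test `If_And D$ebx+4 > 0, D$ebx > 0` then fails because the low Dword is 0, and execution falls through to the branch that sets SpecialFPU_NegInf.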

I managed to convert the function so it can be ported to masm. Sorry for the lack of macros; I don't actually remember the masm syntax, but I ported the whole function in case someone is interested in converting it to masm syntax more properly.

I don't understand why Raymond says it is denormalized, while in my version and in ollydbg this number shows as -Infinite. What am I doing wrong?

Btw, the specification for using the function is as follows:

    FloatToUString - Updated in 10/02/2019
    
    This function converts a FPU Number to decimal string (positive or negative) to be displayed on the debugger

    Parameters:
        
        Float80Pointer - A pointer to a variable containing a TenByte (80 bit) value to be converted to decimal string.
        
        DestinationPointer - A buffer to hold the converted string. The size of the buffer must be at least 32 bytes.
                             A maximum of 19 chars (including the dot) will be converted.
                             If the number cannot be converted, the buffer will contain a string representing the proper
                             category of the FPU error, such as: QNAN, SNAN, Infinite, Indefinite.
        
        TruncateBytes - The total number of bytes (characters) to truncate. You can truncate a maximum of 3 digits
                        at the end of the string. The truncation is to prevent the rounding errors some
                        compilers make when they convert floating point numbers.
                        For TenBytes we can discard the last 3 bytes, which are the ones affected by those differences.
                        But, if you want to maintain accuracy, leave this parameter as 0.
        
        AddSubNormalMsg - A flag that appends a warning message to the end of the number stored in the buffer at DestinationPointer,
                          labeling it as a "(Bad)" number (positive or negative), meaning that the number is below the normal limit for
                          the FPU TenByte and is losing precision.
                           
                          To append the warning message, set this flag to &TRUE. Otherwise, set it to &FALSE.
                                                      
                          The 80-bit floating point format has a range (including subnormals) from approximately 3.65e-4951 to 1.18e4932.
                          Normal numbers (within the range 3.36210314311209208e-4932 to 1.18e4932) keep their accuracy.
                          Numbers below that limit are called "SubNormal" (or denormalized) and cover the range from 3.65e-4951 to 3.362103...e-4932
                           
                          All subnormal numbers lose precision the further they are below the limit of a normal number.
                          There are approximately 2^63 subnormal numbers, all very close to zero and with decreasing precision.
                           
                          The limit of a normal number is: 3.36210314311209208e-4932 (equivalent to declaring it as: "FPULimit: D$ 0, 080000000, W$ 01")
                          
                          

    Return Values:

        The function will return one of the following equates:

        Equate                          Value   Description
        
        SpecialFPU_PosValid             0       The FPU contains a valid positive number.
        SpecialFPU_NegValid             1       The FPU contains a valid negative number.
        SpecialFPU_PosSubNormal         2       The FPU produced a positive Subnormal (denormalized) number.
                                                Its magnitude is below the normal limit of 3.36...e-4932, so it has lost precision, but it is still valid
                                                Ex: 0000 00000000 00000000
                                                    0000 00000000 FFFFFFFF
                                                    0000 00000000 00008000
                                                    0000 00000001 00000000
                                                    0000 FFFFFFFF FFFFFFFF
        SpecialFPU_NegSubNormal         3       The FPU produced a negative Subnormal (denormalized) number.
                                                Its magnitude is below the normal limit of 3.36...e-4932, so it has lost precision, but it is still valid
                                                Ex: 8000 00000000 00000000 (0) (Negative zero must be considered only as zero)
                                                    8000 00000000 FFFFFFFF (-0.0000000156560127730E-4933)
                                                    8000 01000000 00000000 (-0.2626643080556322880E-4933)
                                                    8000 FFFFFFFF 00000001 (-6.7242062846585856000E-4932)
        SpecialFPU_QNAN                 4       QNAN - Quiet NAN (Not a number)
        SpecialFPU_SNAN                 5       SNAN - Signaling NAN (Not a number)
        SpecialFPU_NegInf               6       Negative Infinite
        SpecialFPU_PosInf               7       Positive Infinite
        SpecialFPU_Indefinite           8       Indefinite
        SpecialFPU_SpecialIndefQNan     9       Special INDEFINITE QNAN
        SpecialFPU_SpecialIndefSNan     10      Special INDEFINITE SNAN
        SpecialFPU_SpecialIndefInfinite 11      Special INDEFINITE Infinite
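
The range figures quoted in the AddSubNormalMsg description can be re-derived from the format itself. The Python sketch below assumes the usual layout (exponent bias 16383, 63 fraction bits below the explicit integer bit); the helper `scaled_pow2` and the constant names are made up for illustration.

```python
from decimal import Decimal, localcontext

BIAS = 16383          # exponent bias of the 80-bit format
FRACTION_BITS = 63    # bits below the explicit integer bit


def scaled_pow2(mantissa: int, exp2: int, digits: int = 21) -> Decimal:
    """Return mantissa * 2**exp2 rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(mantissa) * (Decimal(2) ** exp2)


# Exponent field 1, significand 8000000000000000 (the "FPULimit" pattern).
smallest_normal = scaled_pow2(1, 1 - BIAS)
# Exponent field 0, significand 0000000000000001.
smallest_subnormal = scaled_pow2(1, 1 - BIAS - FRACTION_BITS)
# Exponent field 7FFE, significand FFFFFFFFFFFFFFFF.
largest_normal = scaled_pow2(2**64 - 1, 0x7FFE - BIAS - FRACTION_BITS)
# Nonzero significands with exponent field 0 and the integer bit clear.
positive_subnormals = 2**FRACTION_BITS - 1

if __name__ == "__main__":
    print(f"smallest normal    : {smallest_normal:.6e}")
    print(f"smallest subnormal : {smallest_subnormal:.6e}")
    print(f"largest normal     : {largest_normal:.6e}")
    print(f"positive subnormals: {positive_subnormals}")
```

The three limits should land on roughly 3.36e-4932, 3.65e-4951 and 1.19e4932, which agrees with the ranges stated above.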

jj2007 · February 14, 2019, 12:15:43 PM

For exploring the mysteries of REAL10 notation 8)

picked=5
ife picked
NaN REAL10 0x07ED13900F0000000000
elseif picked eq 1
NaN REAL10 0x00000000000F9013ED07
elseif picked eq 2
NaN db 07, 0EDh, 13h, 90h, 0Fh, 00, 00, 00, 00, 00
elseif picked eq 3
NaN db 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
elseif picked eq 4
NaN REAL10 0x00000000000000000001
elseif picked eq 5
NaN db 1, 0, 0, 0, 0, 0, 0, 0, 0, 0
elseif picked eq 6
NaN db 00, 00h, 00h, 00h, 00h, 00h, 00, 80h, 00, 00
endif
...
int 3
fld REAL10 ptr NaN
fld FP10(1.0e4900)
fmul

picked 5 produces 3.6452e-51 after multiplying with 1.0e4900 - remember what Wiki says about the range?

Specific Watcom mysteries: I expected picked 4 & 5 to be identical. What exactly does the REAL10 0x111 notation mean?

raymond · February 14, 2019, 12:43:15 PM

Quote from: guga on February 14, 2019, 07:12:45 AM
Hi Raymond, I'm not sure I'm fully understanding this.

From what I understood of the documentation, the Word 07ED (the last 2 bytes of the TenByte) holds the sign bit (bit 79) and the exponent, right? So it can't be denormalized, since the last Word of the TenByte is not zero and the second Dword is not zero either. Denormalized numbers are the ones where the second Dword of the TenByte is zero and the last Word is either zero or 08000, right?

Maybe 'right' or 'wrong' depending on what you happen to consider the last 2 bytes. You would be right if they are the last 2 bytes in memory; wrong if you consider the 80 bits as a whole number.

I think we are getting confused with the actual location of the 10 bytes. The most significant byte of a TBYTE would have the sign bit followed by the 7 most significant bits of the biased exponent. The next byte would have the remaining 8 bits of the 15-bit biased exponent.

In the case we are discussing, the most significant byte appears to be either 7 or ED (the 7 indicating a positive number or the ED indicating a negative number), with the other byte being the remaining bits of the exponent depending on how your display of W$ 07ED is interpreted. Either way, such could never be considered as a NAN.

raymond · February 14, 2019, 02:44:29 PM

I may have solved your mystery. If you look closely at the description of a REAL10, you will notice the following:

As opposed to the REAL4 and REAL8 formats, the first bit of the number is explicitly included in the significand field and followed by the fraction bits f1, f2, etc.

Therefore, if bit #63 of a REAL10 in memory is not set to 1, it would not be considered a valid extended precision "float". It would have neither the format of a NAN if the exponent bits are not all 1s, nor that of a valid denormalized number if the exponent bits are not all 0s. The FPU would simply refuse to process it.

Such could happen easily if you try using a float from memory and referring to it as a REAL10 when it is not.
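
A small Python sketch of that rule, under the assumption stated above (bit 63 must be 1 whenever the exponent field is nonzero). The function name is hypothetical. The second example shows the "float that is not really a REAL10" case: the 8 bytes of a REAL8 followed by two arbitrary bytes.

```python
import struct


def has_valid_leading_bit(raw: bytes) -> bool:
    """True if bit 63 is consistent with the exponent field of a REAL10."""
    significand = int.from_bytes(raw[:8], "little")
    exponent = int.from_bytes(raw[8:10], "little") & 0x7FFF
    integer_bit = significand >> 63
    # Exponent 0 (zero / denormal) may have bit 63 clear;
    # every other exponent needs it set.
    return exponent == 0 or integer_bit == 1


# The number discussed in this thread.
thread_example = bytes([0, 0, 0, 0, 0, 0x0F, 0x90, 0x13, 0xED, 0x07])

# A REAL8 holding 1.0, followed by two unrelated bytes, misread as a REAL10.
misread_real8 = struct.pack("<d", 1.0) + b"\x34\x12"

# The smallest normal REAL10 (D$ 0, 080000000, W$ 01).
smallest_normal = bytes.fromhex("00 00 00 00 00 00 00 80 01 00")

if __name__ == "__main__":
    for name, raw in [("thread example", thread_example),
                      ("misread REAL8", misread_real8),
                      ("smallest normal", smallest_normal)]:
        print(name, has_valid_leading_bit(raw))
```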

hutch-- · February 14, 2019, 03:55:07 PM

> coffee spelled backwards is eeffoc.

I wonder if this is 00EEFF0Ch :P

aw27 · February 14, 2019, 06:50:05 PM

Quote from: guga on February 14, 2019, 10:14:35 AM
AW, you are feeding the numbers in inverted order, I presume. That's why you are getting different results from me and Raymond.
I was talking about this sequence: db 0, 0, 0, 0, 0, 0F, 090, 013, 0ED, 07

I am talking about the same thing as you are.
0x07ED13900F0000000000 is laid out in memory exactly like that in all little endian systems.
The number can't be a NaN because the exponent is not all ones. It can't be subnormal because the exponent is not all zeros. Is that difficult?
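
The layout claim is easy to check outside the assembler. A minimal Python sketch, treating the 80-bit pattern as a plain integer and storing it little-endian:

```python
# The 80-bit pattern written as one hexadecimal number.
value = 0x07ED13900F0000000000

# Little-endian storage puts the least significant byte first.
in_memory = value.to_bytes(10, "little")

# The db sequence quoted above, in memory order.
db_sequence = bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0x90, 0x13, 0xED, 0x07])

if __name__ == "__main__":
    print(in_memory.hex(" "))
    print(in_memory == db_sequence)
```

Both descriptions name the same ten bytes, provided the assembler stores the raw bits rather than converting the constant.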

jj2007 · February 14, 2019, 08:09:54 PM

Quote from: AW on February 14, 2019, 06:50:05 PM
0x07ED13900F0000000000 is laid out in memory exactly like that in all little endian systems.

include \masm32\MasmBasic\MasmBasic.inc	; download
align 16
R10a	REAL10 0x11223344556677889900
	db 0AAh, 0BBh, 0CCh, 0DDh, 0EEh, 0FFh	; fill with AA BB CC DD EE FF
R10b	db 00h, 99h, 88h, 77h, 66h, 55h, 44h, 33h, 22h, 11h
	db 0AAh, 0BBh, 0CCh, 0DDh, 0EEh, 0FFh	; fill with AA BB CC DD EE FF
R8a	REAL8 0x1122334455667788
	REAL8 0
R8b	db 88h, 77h, 66h, 55h, 44h, 33h, 22h, 11h
	REAL8 0

  Init
  Inkey HexDump$(offset R10a, 64, notext)
EndOfCode

Result:

00000000  00 00 00 00 00 32 11 EF 1D 40 AA BB CC DD EE FF
00000010  00 99 88 77 66 55 44 33 22 11 AA BB CC DD EE FF
00000020  00 00 00 E0 9D 59 D5 41 00 00 00 00 00 00 00 00
00000030  88 77 66 55 44 33 22 11 00 00 00 00 00 00 00 00
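
Decoding the first and third dump rows shows what was actually stored. This Python sketch only interprets the bytes printed above; why the assembler produced them is not stated in the thread, so the comments are a guess based on the decoded values.

```python
import struct
from fractions import Fraction

# First 10 bytes of row 00000000 (R10a) and first 8 bytes of row 00000020 (R8a).
r10a = bytes.fromhex("00 00 00 00 00 32 11 EF 1D 40")
r8a = bytes.fromhex("00 00 00 E0 9D 59 D5 41")

# REAL10: value = significand * 2**(exponent - 16383 - 63)
significand = int.from_bytes(r10a[:8], "little")
exponent = int.from_bytes(r10a[8:], "little") & 0x7FFF
r10a_value = Fraction(significand) * Fraction(2) ** (exponent - 16383 - 63)

# REAL8: let struct decode the IEEE double directly.
r8a_value = struct.unpack("<d", r8a)[0]

if __name__ == "__main__":
    # Both come out as ordinary integers that resemble the low 32 bits
    # of the constants in the source, not the raw bit patterns.
    print(hex(int(r10a_value)))   # compare with 0x11223344556677889900
    print(hex(int(r8a_value)))    # compare with 0x1122334455667788
```

R10a decodes to the number 0x77889900 and R8a to 0x55667780, so the `0x...` constants appear to have been treated as integer values and converted to floating point instead of being stored bit for bit. R10b and R8b, written with `db`, keep the raw bytes.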

HSE · February 15, 2019, 01:44:50 AM

It's done this way:

include \masm32\MasmBasic\MasmBasic.inc	; download
align 16
R10a	REAL10 11223344556677889900r
	db 0AAh, 0BBh, 0CCh, 0DDh, 0EEh, 0FFh	; fill with AA BB CC DD EE FF
R10b	db 00h, 99h, 88h, 77h, 66h, 55h, 44h, 33h, 22h, 11h
	db 0AAh, 0BBh, 0CCh, 0DDh, 0EEh, 0FFh	; fill with AA BB CC DD EE FF
R8a	REAL8 1122334455667788r
	REAL8 0.0
R8b	db 88h, 77h, 66h, 55h, 44h, 33h, 22h, 11h
	REAL8 0.0
NaN	REAL8  7FF8000000000000r     ;>> I use this a lot

  Init
  Inkey HexDump$(offset R10a, 64+8, notext)
EndOfCode

Result:

00000000  00 99 88 77 66 55 44 33 22 11 AA BB CC DD EE FF
00000010  00 99 88 77 66 55 44 33 22 11 AA BB CC DD EE FF
00000020  88 77 66 55 44 33 22 11 00 00 00 00 00 00 00 00
00000030  88 77 66 55 44 33 22 11 00 00 00 00 00 00 00 00
00000040  00 00 00 00 00 00 F8 7F
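
The last dump row can be cross-checked with a short Python sketch: packing the raw bits 7FF8000000000000 as a little-endian 64-bit integer gives the same eight bytes, and reading them back as a double gives a NaN.

```python
import math
import struct

bits = 0x7FF8000000000000            # the raw pattern from the "r" constant

raw = struct.pack("<Q", bits)        # store the bits unchanged, little-endian
value = struct.unpack("<d", raw)[0]  # reinterpret the same bytes as a REAL8

if __name__ == "__main__":
    print(raw.hex(" ").upper())
    print(math.isnan(value))
```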

jj2007 · February 15, 2019, 03:07:50 AM

Quote from: HSE on February 15, 2019, 01:44:50 AM
It's in this way

Yep, the "r" thing, thanks for reminding me. Qword used this notation years ago.

The MASM Forum
