Rounding Mode in FPU question

guga · February 11, 2019, 10:03:13 PM

Hi Guys

What is the minimum acceptable value for rounding in the FPU?
I mean, consider we have 2 numbers loaded in ST0 and ST1, respectively:

Number1 = 0403A B1A2 BC2E C4FF FFF7
Number2 = 0413A B1A2 BC2E C4FF FFF7

When the numbers are compared using instructions such as fcom/fcompp and friends, how small must the difference be for those numbers to be considered equal?

If both are TenBytes, are they different? Or does the difference (error due to rounding) only happen with Real8/Real4? Or is there no tolerance at all, so that the numbers are considered different after being compared?

jj2007 · February 12, 2019, 12:54:04 AM

Quote from: guga on February 11, 2019, 10:03:13 PM
what is the minimum acceptable value for rounding in FPU ?

None! Here is a testbed:

include \masm32\MasmBasic\MasmBasic.inc ; download

.data
align 16
Num1 REAL10 1234567890.1234567890
align 16
Num2 REAL10 1234567890.1234567891
align 16

.data?
Dest1 REAL10 ?
Dest2 REAL10 ?
DestX1 REAL8 ?
DestX2 REAL8 ?

Init
movaps xmm1, OWORD PTR Num1
movaps xmm2, OWORD PTR Num2
deb 4, "Test xmm", x:xmm1, x:xmm2
comisd xmm1, xmm2
deb 4, "comisd (lowercase = flag not set)", flags
fld Num1
fld Num2
deb 4, "Test FPU", ST(0), ST(1)
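fst DestX2	; ST(0) holds Num2 (loaded last)
fxch		; bring Num1 to ST(0)
fst DestX1
fxch		; restore the order: ST(0) = Num2, ST(1) = Num1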
movlps xmm1, DestX1
movlps xmm2, DestX2
comisd xmm1, xmm2
deb 4, "comisd (lowercase = not set)", flags
deb 4, "Test xmm", x:xmm1, x:xmm2, x:DestX1, x:DestX2
fstp Dest2
fstp Dest1
mov eax, dword ptr Dest1
mov edx, dword ptr Dest2
deb 4, "lowest 4 bytes", x:eax, x:edx
fld Num1
fld Num2
fcomip ST, ST(1)
deb 4, "fcomip (lowercase = not set)", flags
Inkey "ok"
EndOfCode

Test xmm
x:xmm1          00000000 0000401D 932C05A4 3F35BA6E
x:xmm2          00000000 0000401D 932C05A4 3F35BA6F
comisd (lowercase = not set)    flags:          czso

Test FPU
ST(0)           1234567890.123456789
ST(1)           1234567890.123456789
comisd (lowercase = not set)    flags:          cZso

Test xmm
x:xmm1          00000000 0000401D 41D26580 B487E6B7
x:xmm2          00000000 0000401D 41D26580 B487E6B7
x:DestX1        41D26580 B487E6B7
x:DestX2        41D26580 B487E6B7

lowest 4 bytes
x:eax           3F35BA6E
x:edx           3F35BA6F
fcomip (lowercase = not set)    flags:          czso

Note in particular that the fst DestX1 + movlps xmm1, DestX1 results in a ZERO flag after comisd!

In case you want to build the snippet & play with it: It needs the very latest version of MB of today, right now, archive ending with *_f.zip. The reason being that your request made me find a tiny little bug in deb - the flags were not reported correctly :icon_redface:
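
The collapse seen in the output above can be reproduced outside the FPU. This is a minimal Python sketch, assuming the hex dumps printed for xmm1 and xmm2 are the sign/exponent word followed by the 64-bit significand of Num1 and Num2. The helper names real10_to_fraction and real8_bits are made up for this illustration, and Python's float stands in for REAL8 with round-to-nearest.

```python
import struct
from fractions import Fraction

def real10_to_fraction(sign_exp: int, significand: int) -> Fraction:
    """Exact value of a finite REAL10, given its 16-bit sign/exponent word
    and its 64-bit significand (explicit integer bit in bit 63)."""
    sign = -1 if sign_exp & 0x8000 else 1
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN has no finite value")
    # Exponent field 0 (denormals) uses the same scale as exponent field 1.
    unbiased = max(exponent, 1) - 16383
    return sign * Fraction(significand) * Fraction(2) ** (unbiased - 63)

def real8_bits(value: Fraction) -> str:
    """Round to the nearest REAL8 (Python float) and return its hex pattern."""
    return struct.pack(">d", float(value)).hex().upper()

# Hex dumps of Num1 and Num2 as printed in the "Test xmm" output.
num1 = real10_to_fraction(0x401D, 0x932C05A43F35BA6E)
num2 = real10_to_fraction(0x401D, 0x932C05A43F35BA6F)

print(num1 == num2)                        # the REAL10 values differ in the last bit
print(real8_bits(num1), real8_bits(num2))  # the REAL8 patterns are identical
```

The last significand bit that separates the two REAL10 values lies far below the 53 bits a REAL8 keeps, so both round to the same pattern reported for DestX1 and DestX2.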

Raistlin · February 12, 2019, 01:17:07 AM

I had a similar (exactly the same) issue 16 years ago
writing a system in asm for the SA Reserve bank. See
old forum: FPU rounding. The answer came from raymond
if I remember correctly re: convert real4 within his LIB,
or face the convoluted answer via inspecting the code.

aw27 · February 12, 2019, 04:32:50 AM

Quote from: Raistlin on February 12, 2019, 01:17:07 AM
writing a system in asm for the SA Reserve bank.

Great opportunity for a salami slicing (fraudulent numbers are always round numbers).

raymond · February 12, 2019, 04:55:50 AM

Quote
what is the minimum acceptable value for rounding in FPU ?

a) Whatever value is held in FPU data registers ALWAYS has 10 bytes (80 bits) consisting of the sign bit, 15 exponent bits and 64 significand bits. This is the REAL10 format.

b) When storing such values in memory as REAL4 or REAL8, some of the significand bits would get lost due to truncation and rounding would be performed on the remaining significand bits according to the selected rounding mode.

c) If you reload a "truncated" float into an FPU data register, it would be converted into the 80-bit format but without remembering whatever bits may have been truncated previously.

d) If you compare the truncated value to its original 80-bit value, it would be considered different UNLESS ALL THE TRUNCATED BITS HAD BEEN 0's.
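
Points b) to d) can be tried out with a smaller pair of formats. The sketch below is only an analogy: Python has no REAL10, so it stores a REAL8 as REAL4 and reloads it, for values inside the normal REAL4 range. The helper names through_real4 and dropped_bits are invented for the example.

```python
import struct

def through_real4(x: float) -> float:
    """Store a REAL8 as REAL4 (round to nearest) and load it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dropped_bits(x: float) -> int:
    """The 29 low fraction bits of a REAL8 that do not fit in a REAL4."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & ((1 << 29) - 1)

for x in (0.5, 1234.5, 0.1, 1234567890.123):
    same = through_real4(x) == x
    print(f"{x!r:>16}  dropped bits = {dropped_bits(x):#010x}  survives: {same}")
```

Only the values whose dropped bits are all zero compare equal to the original after the round trip, which is the REAL8/REAL4 version of point d).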

Raistlin · February 12, 2019, 05:16:04 AM

VERY funny AW27 ... inspired :lol:
But YES! At last! To magistrate Raymond, after 16 years: it now makes sense. Thank you ever so much, my nightmares on this very topic will now end. Thanks sincerely.

Pps guga: the ladies in your avatar are too distracting, perhaps stars in certain places are appropriate ? 😈

guga · February 12, 2019, 06:44:58 AM

Quote
None! Here is a testbed:

I was afraid you'd say that :icon_mrgreen: :icon_mrgreen: :icon_mrgreen: I'm rebuilding the FPU functions for the disassembler and debugger, and will have to make them work differently. I'll need to update the debugger so that it avoids that rounding anyway :( :( In the disassembler I made a couple of fixes too, but there I allow the rounding, because of errors produced by some compilers that emit things like 07ED 7FFFFFFF 00000000 or 07ED 13900F00 00000000 when the correct value should be 07ED 80000000 00000000.

Tks, JJ ! :t :t

Thanks a lot Raymond. :t :t :t I'll be forced to make the function work without rounding or truncating (at least for the debugger). Btw... do you know where I can find a scheme for Real10 values similar to this one? https://en.wikipedia.org/wiki/Extended_precision I didn't fully understand how to correctly categorize the error modes. The wiki talks about 80-bit values, but the documentation seems to be related to 64 bits?

I'm trying to redesign an old function in RosAsm, but I'm not sure about some values. The FPU-to-string routine converts values such as:

[Value1: D$ 0, 0FFFFFFFF W$ 0] ; This number exceeds the limit for FPU. It is converted to 6.724206...e-4932 Does it exceed the limit ?
[Value2: B$ 0FE, 07F, 0, 0, 0C0, 07F, 0, 0, 0, 0] ; This number exceeds the limit for FPU. It is converted to 5.1201424......e-4937 Does it exceed the limit ?

But I'm not sure if those values are correct. Aren't they exceeding the limit? If so, how do I force them to be handled according to the error mode/category? Are they QNAN, indefinite, etc.? If so, which bits mark QNAN etc. for those 2 numbers in particular?

[SpecialFPU_QNAN 1] ; QNAN
[SpecialFPU_SNAN 2] ; SNAN
[SpecialFPU_NegInf 3] ; Negative Infinite
[SpecialFPU_PosInf 4] ; Positive Infinite
[SpecialFPU_Indefinite 5] ; Indefinite
[SpecialFPU_SpecialIndefQNan 6] ; Special INDEFINITE QNAN
[SpecialFPU_SpecialIndefSNan 7] ; Special INDEFINITE SNAN
[SpecialFPU_SpecialIndefInfinite 8] ; Special INDEFINITE Infinite

Proc RealTenFPUNumberCategory:
    Arguments @Float80Pointer
    Local @FPUErrorMode
    Uses edi, ebx


    mov ebx D@Float80Pointer
    mov D@FPUErrorMode &FALSE

    ...If_And W$ebx+8 = 0, D$ebx+4 = 0 ; This is denormalized, but it is possible.
        ; 0000 00000000 00000000
        ; 0000 00000000 FFFFFFFF
    ...Else_If_And W$ebx+8 = 0, D$ebx+4 > 0 ; This is Ok.
        ; 0000 00000001 00000000
        ; 0000 FFFFFFFF FFFFFFFF
    ...Else_If_And W$ebx+8 > 0, W$ebx+8 < 07FFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .If D$ebx+4 < 080000000
            .Test_If D$ebx+4 040000000
                ; QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_QNAN
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; SNAN only if at least 1 bit is set
                    mov D@FPUErrorMode SpecialFPU_SNAN
                Else ; All fraction Bits are 0
                    ; Bit 15 is never reached. The bit is 0 from W$ebx+8
                    ; -INFINITE ; Bit15 = 0
                    mov D@FPUErrorMode SpecialFPU_NegInf
                End_If
            .Test_End
        .End_If
    ...Else_If W$ebx+8 = 07FFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .Test_If D$ebx+4 040000000
            ; QNAN 40000000
            mov D@FPUErrorMode SpecialFPU_QNAN
        .Test_Else
            If_And D$ebx+4 > 0, D$ebx > 0
                ; SNAN only if at least 1 bit is set
                mov D@FPUErrorMode SpecialFPU_SNAN
            Else ; All fraction Bits are 0
                ; Bit 15 is never reached. The bit is 0 from W$ebx+8
                ; -INFINITE ; Bit15 = 0
;               Test_If W$ebx+8 = 0FFFF ; we need to see if Bit 15 is set
 ;                  ; -INFINITE ; Bit15 = 0
  ;             Test_Else
   ;                ; +INFINITE ; Bit15 = 1
    ;           Test_End
                ;mov D$edi '-INF', B$edi+4 0
                mov D@FPUErrorMode SpecialFPU_NegInf
            End_If
        .Test_End
        ; Below is similar to W$ebx+8 = 0
    ...Else_If_And W$ebx+8 = 08000, D$ebx+4 = 0 ; This is denormalized, but possible.
        ; 8000 00000000 00000000 (0)
        ; 8000 00000000 FFFFFFFF (-0.0000000156560127730E-4933)
    ...Else_If_And W$ebx+8 = 08000, D$ebx+4 > 0 ; This is Ok.
        ; 8000 01000000 00000000 (-0.2626643080556322880E-4933)
        ; 8000 FFFFFFFF 00000001 (-6.7242062846585856000E-4932)
    ...Else_If_And W$ebx+8 > 08000, W$ebx+8 < 0FFFF; This is ok only if the fraction Dword is bigger or equal to 080000000
        .If D$ebx+4 < 080000000
            .Test_If D$ebx+4 040000000
                ; QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_QNAN
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; SNAN only if at least 1 bit is set
                    ;mov D$edi 'SNaN', B$edi+4 0
                    mov D@FPUErrorMode SpecialFPU_SNAN
                Else ; All fraction Bits are 0
                    ; Bit 15 is always reached. The bit is 1 from W$ebx+8
                    ; +INFINITE ; Bit15 = 1
                    ;mov D$edi '+INF', B$edi+4 0
                    mov D@FPUErrorMode SpecialFPU_PosInf
                End_If
            .Test_End
        .End_If

    ...Else_If W$ebx+8 = 0FFFF; This is to we identify indefined or other NAN values

        .If_And D$ebx+4 >= 040000000, D$ebx = 0
            ; INDEFINITE
            mov D@FPUErrorMode SpecialFPU_Indefinite
        .Else
            .Test_If D$ebx+4 040000000
                ; Special INDEFINITE QNAN 40000000
                mov D@FPUErrorMode SpecialFPU_SpecialIndefQNan
            .Test_Else
                If_And D$ebx+4 > 0, D$ebx > 0
                    ; Special INDEFINITE SNAN only if at least 1 bit is set
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefSNan
                Else ; All fraction Bits are 0
                    ; Bit 15 is always reached. The bit is 1 from W$ebx+8
                    ; Special INDEFINITE +INFINITE ; Bit15 = 1
                    mov D@FPUErrorMode SpecialFPU_SpecialIndefInfinite
                End_If
            .Test_End
        .End_If
    ...End_If

    ..If D@FPUErrorMode <> 0

        On B$edi-1 = '-', dec edi

        .If D@FPUErrorMode = SpecialFPU_QNAN
            push esi | zcopy {"QNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SNAN
            push esi | zcopy {"SNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_NegInf
            push esi | zcopy {"-INFINITE ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_PosInf
            push esi | zcopy {"+INFINITE ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_Indefinite
            push esi | zcopy {"INDEFINITE ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SpecialIndefQNan
            push esi | zcopy {"Special INDEFINITE QNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SpecialIndefSNan
            push esi | zcopy {"Special INDEFINITE SNAN ", 0} | pop esi
            mov B$edi 0
        .Else_If D@FPUErrorMode = SpecialFPU_SpecialIndefInfinite
            push esi | zcopy {"Special INDEFINITE +INFINITE ", 0} | pop esi
            mov B$edi 0
        .End_If

    ..End_If

    mov eax D@FPUErrorMode

EndP

Raistlin

Quote
Pps guga: the ladies in your avatar are too distracting, perhaps stars in certain places are appropriate ? 😈

Considering when I made the avatar, those ladies must be kind of old by now (or dead)

(They are not naked, you know...Only wearing a bikini)

K_F · February 12, 2019, 07:02:27 AM

A possible way of re-introducing the lost resolution is to run a 'random' generator at the 'resolution of lost bits', and then add that to the truncated Real4 (Real8).
A way of minimising FPU error propagation. ;)

raymond · February 12, 2019, 07:39:55 AM

NANs (including QNAN's and SNAN's) are sufficiently described in Simply FPU, and it would be a waste of time to start copying that section here.

See http://www.ray.masmcode.com/tutorial/fpuchap2.htm#nans
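
For readers who want the categories in executable form, here is a rough Python sketch of the REAL10 encoding classes as the Intel manuals name them (zero, denormal, pseudo-denormal, normal, infinity, SNaN, QNaN, plus the unsupported unnormal, pseudo-infinity and pseudo-NaN encodings). The function name classify_real10 and the returned strings are made up for this illustration; it is not the RosAsm routine posted earlier.

```python
def classify_real10(sign_exp: int, significand: int) -> str:
    """Name the x87 encoding class of an 80-bit value.

    sign_exp    : the top 16 bits (sign bit plus 15 exponent bits)
    significand : the low 64 bits, explicit integer bit in bit 63
    """
    exponent = sign_exp & 0x7FFF
    negative = bool(sign_exp & 0x8000)
    integer_bit = (significand >> 63) & 1
    quiet_bit = (significand >> 62) & 1
    fraction = significand & ((1 << 63) - 1)   # everything below the integer bit

    if exponent == 0:
        if significand == 0:
            return "zero"
        return "pseudo-denormal" if integer_bit else "denormal"
    if exponent < 0x7FFF:
        return "normal" if integer_bit else "unnormal (unsupported)"
    # exponent == 0x7FFF: infinities and NaNs
    if not integer_bit:
        if fraction == 0:
            return "pseudo-infinity (unsupported)"
        return "pseudo-NaN (unsupported)"
    if fraction == 0:
        return "infinity"
    if not quiet_bit:
        return "SNaN"
    if negative and fraction == 1 << 62:
        return "QNaN (real indefinite)"
    return "QNaN"

# The two values guga asked about, read as little-endian TenBytes.
print(classify_real10(0x0000, 0xFFFFFFFF00000000))   # Value1
print(classify_real10(0x0000, 0x00007FC000007FFE))   # Value2
print(classify_real10(0x07ED, 0x13900F0000000000))   # the compiler-generated value
```

Under this reading, Value1 and Value2 both have an exponent field of zero, so neither is a NaN: the first is a pseudo-denormal and the second a denormal. The value 07ED 13900F00 00000000 has a nonzero exponent with the integer bit clear, which makes it an unnormal, an encoding the FPU rejects as an invalid operand.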

aw27 · February 12, 2019, 08:37:17 PM

If f is a NaN then f !=f is true.
This is according to IEEE, so compilers/assemblers must abide.

guga · February 14, 2019, 03:45:59 AM

Quote from: AW on February 12, 2019, 08:37:17 PM
If f is a NaN then f !=f is true.
This is according to IEEE, so compilers/assemblers must abide.

Yes, but some compilers generate errors when converting a string to an FPU value, and these are hard to identify while you are disassembling or debugging. For example, the programmer uses a Float as a global variable, but when the code was compiled, the compiler generated a NAN, like this:
07ED 13900F00 00000000

So, if that variable was supposed to be used in a multiplication or division, for example, it will produce an error. And this error is hard to identify or fix. In the disassembler we needed to add fixes for bugs like these, forcing the value of that variable back inside its own boundaries. In such situations, the fix is to look for the next good value, which in this case is: 07ED 80000000 00000000 (1.0361967820008025600E-4321)

Sure, we could also look for the previous good value, but the two print the same (in this case): 07EC FFFFFFFF FFFFFFFF (1.0361967820008025600E-4321). This shows us that the NAN was, in fact, 1.0361967820008025600E-4321 (07ED 80000000 00000000), and we can safely use that value rather than accepting the NAN as a good value.
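
The repair described here can be sketched in a few lines of Python. This is only an illustration of the heuristic as stated in this post (replace a significand whose integer bit is missing with the smallest valid one for the same exponent). The name snap_unnormal is hypothetical, and nothing guarantees the snapped value is what the programmer originally wrote.

```python
INTEGER_BIT = 1 << 63   # bit 63 of the REAL10 significand

def snap_unnormal(sign_exp: int, significand: int) -> tuple[int, int]:
    """Return (sign_exp, significand), forcing a broken significand to the
    smallest normal one for that exponent. Values that are already valid,
    and the special exponents 0 and 7FFF, are passed through unchanged."""
    exponent = sign_exp & 0x7FFF
    if 0 < exponent < 0x7FFF and not significand & INTEGER_BIT:
        return sign_exp, INTEGER_BIT
    return sign_exp, significand

for se, sig in ((0x07ED, 0x13900F0000000000), (0x07ED, 0x7FFFFFFF00000000)):
    fixed_se, fixed_sig = snap_unnormal(se, sig)
    print(f"{se:04X} {sig:016X} -> {fixed_se:04X} {fixed_sig:016X}")
```

Both bad patterns quoted in this thread end up as 07ED 80000000 00000000, the "next good value" mentioned above.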

This kind of gap filling is for disassembler and debugger purposes, not for defining a NAN or checking for divisions by zero etc. From the disassembler's point of view, it is useless to try to "guess" what the number originally was, because we are analyzing a defined value whose bytes were certainly badly generated, either because the linker/compiler failed to check for NANs or error-mode values, or because of some other unknown error. For example, if the user was trying to build something like:
T$ 1.0361967820008025600E-4321
then instead of outputting the bytes 07ED 80000000 00000000, the compiler output bad ones, like 07ED 7FFFFFFF 00000000 or 07ED 13900F00 00000000, as in the example shown.

These kinds of errors are not supposed to happen, but they do happen, especially with older compilers or old libraries.

aw27 · February 14, 2019, 04:08:20 AM

Well, let me try to reformulate, because I had not taken that from the horse's mouth. Now I have read IEEE Std 754-2008.
What it says, when talking about comparisons of floating point numbers, is:

Quote
Four mutually exclusive relations are possible: less than, equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself.
....
Languages define how the result of a comparison shall be delivered, in one of two ways: either as a relation identifying one of the four relations listed above, or as a true-false response to a predicate that names the specific comparison desired.

So, for a few high-level languages, including C (at least in Visual Studio), what I said is true:
If f is a NaN then f !=f is true
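
The same behaviour is easy to observe in Python, whose float comparisons follow IEEE 754 on the usual platforms. This is a small sketch; relation and real8_from_bits are names chosen for the example, and the bit pattern is just one possible quiet NaN.

```python
import math
import struct

def real8_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as a REAL8."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def relation(a: float, b: float) -> str:
    """Return which of the four IEEE 754 relations holds between a and b."""
    if a < b:
        return "less than"
    if a > b:
        return "greater than"
    if a == b:
        return "equal"
    return "unordered"      # none of <, >, == held: at least one NaN

f = real8_from_bits(0x7FF8000000000001)   # a quiet NaN pattern
print(math.isnan(f), f != f)
print(relation(f, f), "|", relation(1.0, f), "|", relation(1.0, 2.0))
```

The predicate f != f comes out true only because all three ordered relations fail, which is the "unordered" case from the quoted standard.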

But that is not what happens behind the scenes. What actually happens is an unordered relation.

This is the MASM equivalent of what really happens:

.model flat, stdcall
option casemap:none

includelib \masm32\lib\msvcrt.lib
printf proto C :ptr, :VARARG
includelib \masm32\lib\kernel32.lib
ExitProcess proto :dword

NANFactory UNION 
	q QWORD ?
	f REAL8 ?
NANFactory ENDS

.data
nan NANFactory   <7FF0000000000001h> ; Generates SNaN
;nan NANFactory  <7FFF100000000001h> ; Generates QNaN

isNotNaNMsg db 'Value is not a NaN',10,0
isNaNMsg db 'Value is a NaN',10,0

.code
main proc
	; ************ SSE	*************
	movsd xmm0, qword ptr nan.f 
	ucomisd xmm0, xmm0 
	COMMENT £
	The UCOMISD instruction differs from the COMISD instruction in that it signals a SIMD floating-point invalid operation exception (#I) only when a source operand is an SNaN. But both QNaN and SNaN return unordered results.
	£
	lahf	; For NaN: SF=0 ZF=1 0 AF=0 0 PF=1 1 CF=1 , UNORDERED: ZF,PF,CF←111;
			; For non NaN:
			;	GREATER_THAN: ZF,PF,CF←000
			;	LESS_THAN: ZF,PF,CF←001;
			;	EQUAL: ZF,PF,CF←100;
	test ah, 01000100b ; 44h - Test ZF and PF
	.if PARITY?
		invoke printf, offset isNaNMsg ; zf=0 pf=1
	.else
		invoke printf, offset isNotNaNMsg ; zf=0 pf=0		
	.endif

	; ************ X87	*************
	fld qword ptr nan.f 
	fld qword ptr nan.f
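	fucompp ; unordered comparison of the contents of register ST(0) and ST(1), then pops both
	COMMENT £
		Unordered result: C3, C2 and C0 are all set when either operand is a NaN (QNaN or SNaN).
		The invalid-operation exception is raised only for an SNaN or an unsupported format.
	£
	fnstsw ax 
	test ah, 01000100b ; 44h - Test C3 and C2 (bits 6 and 2 of AH, where lahf puts ZF and PF)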
	.if PARITY?
		invoke printf, offset isNaNMsg ; zf=0 pf=1
	.else
		invoke printf, offset isNotNaNMsg ; zf=0 pf=0		
	.endif
@exit:	
	invoke ExitProcess, 0
main endp

end
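
The `test ah, 44h` plus parity check used twice in the listing can be traced by hand. The Python sketch below models it under stated assumptions: the flag triples are the ones listed in the comments of the listing, SF and AF are taken as 0, and the helper names are invented for the illustration.

```python
# (ZF, PF, CF) after ucomisd; after fnstsw ax the same bit positions
# of AH hold C3, C2 and C0 from the FPU status word.
FLAGS = {
    "greater than": (0, 0, 0),
    "less than":    (0, 0, 1),
    "equal":        (1, 0, 0),
    "unordered":    (1, 1, 1),
}

def ah_after_lahf(zf: int, pf: int, cf: int) -> int:
    """Build AH as lahf does: SF:ZF:0:AF:0:PF:1:CF (SF and AF taken as 0)."""
    return (zf << 6) | (pf << 2) | (1 << 1) | cf

def parity_after_test_44h(ah: int) -> bool:
    """PF after `test ah, 44h`: set when the masked result has an even
    number of 1 bits (zero bits counts as even)."""
    return bin(ah & 0x44).count("1") % 2 == 0

for name, triple in FLAGS.items():
    ah = ah_after_lahf(*triple)
    print(f"{name:>12}: AH & 44h = {ah & 0x44:02X}h  PARITY? = {parity_after_test_44h(ah)}")
```

In this model the parity flag ends up clear only for "equal", so the `.if PARITY?` branch really means "not equal". It identifies a NaN in the listing because each value is compared with itself, where the only possible outcomes are equal and unordered.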

Adamanteus · February 14, 2019, 04:58:59 AM

Quote from: K_F on February 12, 2019, 07:02:27 AM
A possible way of re-introducing the lost resolution is to run a 'random' generator at the 'resolution of lost bits', and then add that to the truncated Real4 (Real8).
A way of minimising FPU error propagation. ;)

One particular way of solving this type of problem is to have ONE type for all calculations. I'm using this:

MACE TYPEDEF REAL10

raymond · February 14, 2019, 05:06:14 AM

Quote
the compiler generated a NAN, like this:
07ED 13900F00 00000000

Would you be kind enough to clarify if that hex value
a) is taken from an FPU register or from a memory section
b) if the latter, is it reported in the little-endian style or would it be as an ordered ten-byte value

Quote
the programmer is using Float as a global variable

Many HLL programmers consider the term "float" as meaning REAL4 variables. Then you use a REAL10 example number for discussion. Could you clarify the discrepancy?

guga · February 14, 2019, 06:26:56 AM

Hi raymond

Sorry, I forgot to fix the previous examples before posting. They are TenBytes in little-endian order, taken from a memory section. So the TenByte looks like:

D$ 0
D$ 013900F00
W$ 07ED

In byte sequence: 0, 0, 0, 0, 0, 0F, 090, 013, 0ED, 07
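
To make the layout concrete, this short Python sketch splits the ten bytes above into the exponent word and the 64-bit significand. The helper name split_real10 is made up for the example; the byte values are the ones listed in this post.

```python
import struct

def split_real10(raw: bytes) -> tuple[int, int]:
    """Split a little-endian TenByte into (sign/exponent word, significand)."""
    if len(raw) != 10:
        raise ValueError("a REAL10 is exactly 10 bytes")
    # '<' = little-endian without padding, Q = 8-byte significand, H = 2-byte word
    significand, sign_exp = struct.unpack("<QH", raw)
    return sign_exp, significand

raw = bytes([0, 0, 0, 0, 0, 0x0F, 0x90, 0x13, 0xED, 0x07])
sign_exp, significand = split_real10(raw)
print(f"{sign_exp:04X} {significand >> 32:08X} {significand & 0xFFFFFFFF:08X}")
```

The printed line matches the notation used earlier in the thread: exponent word first, then the high and low Dwords of the significand.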

Quote
Many HLL programmers consider the term "float" as meaning REAL4 variables. Then you use a REAL10 example number for discussion. Could you clarify the discrepancy.

I said Float (in general), but I was referring to the Real10 variable in the example I used.
