
## 64 bit OS and Real 4 (float) - not very reliable!

Started by Gunther, November 04, 2012, 09:52:42 PM

#### Gunther

Hi qWord,

your recent answers sound better and better, and I hope we're getting on the same page.

Quote from: qWord on November 07, 2012, 08:58:48 AM
Even this sounds a bit like you won't think about the error of a calculation and instead hope that the high precision will solve that problem. I would estimate the error and then choose the correct data type or, if needed, use a high/arbitrary precision library instead.

Excuse me please, but with the best will in the world, I can't see an error. In my testbed we have the following situation:

• No number has a fractional part. So, rounding errors of the kind caused by 0.1 and its multiples can't occur.
• We don't subtract nearly equal values, which would lead to dangerous digit cancellation.
• The valid range for REAL4 is from 1.175 x 10^(-38) up to 3.403 x 10^38. We're far away from those limits.
Therefore, the FPU does a good job in that range and delivers the right results. What you are calling "error" has only to do with the shrink in precision. This shrink isn't a law of nature, but the work of man, done by compiler builders.

Gunther
You have to know the facts before you can distort them.

#### qWord

Quote from: Gunther on November 08, 2012, 02:51:18 AM
• No number has fractional part. So, rounding errors which have to do with 0.1 and multiples of that can't occur.
all numbers are normalized to 1.xxx... * 2^y

Quote from: Gunther on November 08, 2012, 02:51:18 AM
• The valid range for REAL 4 is from 1.175 x 10^(-38) up to 3.403 x 10^38. We're far away from that limits
yes, with 24 precision bits. Even with 64 precision bits, you will only get around 18 decimal digits of precision.

Quote from: Gunther on November 08, 2012, 02:51:18 AM
This shrink isn't a law of nature, but the work of man, done by compiler builders.
no, it is a technical limit.

Quote from: Gunther on November 08, 2012, 02:51:18 AM
Therefore, the FPU does a good job in that range and delivers the right results.
no doubt it does a good job, but you will also get that using double precision variables with SSEx.

If I use float or double variables in my programs, I assume that the compiler will create code that does the calculations with at least that precision. However, I would never assume that higher precision is used. As your example shows, the compiler builders and most other software developers share this view.
Your assumption is based on your experience that it was done that way in the past. I'm sure it was never the intention of any compiler builder to always use REAL10/REAL8; they simply know that it is damn slow to switch the FPU precision flags.

regards, qWord
MREAL macros - when you need floating point arithmetic while assembling!

#### qWord

addendum: I've taken a look at the latest C standard draft and found this:
Quote from: ISO/IEC 9899:201x Committee Draft — April 12, 2011
Except for assignment and cast (which remove all extra range and precision), the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of FLT_EVAL_METHOD:

• -1 indeterminable;
• 0 evaluate all operations and constants just to the range and precision of the type;
• 1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;
• 2 evaluate all operations and constants to the range and precision of the long double type.

All other negative values for FLT_EVAL_METHOD characterize implementation-defined behavior.
What does your compiler return for FLT_EVAL_METHOD?
I currently can't find a similar statement for C++...
MREAL macros - when you need floating point arithmetic while assembling!

#### Gunther

Hi qWord,

Quote from: qWord on November 08, 2012, 09:36:45 AM
all numbers are normalized to 1.xxx...*2^y

That's clear, but I meant the decimal representation of the numbers.

Quote from: qWord on November 08, 2012, 09:36:45 AM
yes, with 24 precision bits. Even with 64 precision bits, you  will only get around 18 decimal digits of precision.
My point was REAL4, and by the way, I have strong doubts that you will get 18 valid decimal digits with REAL8; 16 digits is probably realistic.

Quote from: qWord on November 08, 2012, 09:36:45 AM
Quote from: Gunther on November 08, 2012, 02:51:18 AM
This shrink isn't a law of nature, but the work of man, done by compiler builders.
no, it is a technical limit.
The decision to use the XMM registers rather than the FPU is the work of man and has nothing to do with technical limits.

Quote from: qWord on November 08, 2012, 09:36:45 AM
no doubt it does a good job, but you will also get that using double precision variables with SSEx.
That remains to be checked.

Quote from: qWord on November 08, 2012, 09:36:45 AM
If I use float or double variables in my programs, I assume that the compiler will create code that does the calculations with at least that precision. However, I would never assume that higher precision is used. As your example shows, the compiler builders and most other software developers share this view.
Or, and that's another possibility, the compiler builders don't know enough about the dangers and risks.

Quote from: qWord on November 08, 2012, 10:02:47 AM
What does your compiler return for FLT_EVAL_METHOD?
I currently can't find a similar statement for C++...

I'll have to check that.

Gunther
You have to know the facts before you can distort them.

#### MichaelW

Everything I have is 32-bit. For my most recent installation of MinGW (with gcc version 4.5.2), __FLT_EVAL_METHOD__ is set to 2. This is the relevant section of math.h:
```c
/* Use the compiler's builtin define for FLT_EVAL_METHOD to
   set float_t and double_t.  */
#if defined(__FLT_EVAL_METHOD__)
# if ( __FLT_EVAL_METHOD__== 0)
typedef float float_t;
typedef double double_t;
# elif (__FLT_EVAL_METHOD__ == 1)
typedef double float_t;
typedef double double_t;
# elif (__FLT_EVAL_METHOD__ == 2)
typedef long double float_t;
typedef long double double_t;
#endif
#else /* ix87 FPU default */
typedef long double float_t;
typedef long double double_t;
#endif
```

I cannot find any reference to __FLT_EVAL_METHOD__ in the header files that shipped with my installation of the 2003 PSDK, the Windows Server 2003 SP1 DDK, or the Microsoft Visual C++ Toolkit 2003.
Well Microsoft, here's another nice mess you've gotten us into.

#### Gunther

Hi Mike,

thank you for your investigation. My installed gcc, version 4.7.2, contains the same section in math.h.

Gunther
You have to know the facts before you can distort them.

#### MichaelW

It seems to me that there should be a way to control the value, a command line option or some such, but in my quick search of the GCC docs I did not find one. And then there is the question of how this relates to the effective deprecation of long double under Windows.
Well Microsoft, here's another nice mess you've gotten us into.

#### Gunther

Hi Mike,

Quote from: MichaelW on November 11, 2012, 02:36:43 PM
It seems to me that there should be a way to control the value, a command line option or some such, but in my quick search of the GCC docs I did not find one. And then there is the question of how this relates to the effective deprecation of long double under Windows.

gcc supports long double under Windows but can't print such values, because it uses the Windows libc. It's a bit strange.

Gunther
You have to know the facts before you can distort them.