64 bit OS and Real 4 (float) - not very reliable!

Started by Gunther, November 04, 2012, 09:52:42 PM


Gunther

I've attached the archive testbed.zip.

It contains the sources, the binary files, and the running applications for both 64 bit operating systems (Linux and Windows). For details and explanations, please check the files readme.txt and testbed.pdf.

A special thanks to Bill Cravener; he did a bit of proofreading. But to make it clear: I'm responsible for all errors in the files and documents - nobody else. Critical remarks and proposals for improvements are welcome.

Gunther
You have to know the facts before you can distort them.

Ryan

I've got the Express Edition of VS C++.  I was trying to compile it, and then I remembered I've had no luck linking 64-bit programs.

Gunther

Ryan,

thank you for your effort. What's the problem? I only used ANSI C.

Gunther
You have to know the facts before you can distort them.

qWord

Quote
Processing REAL4 numbers with the new XMM registers means that the calculation is carried out with only 32 bits; that leads to catastrophic digit cancellation and is the reason for the false results. The data type float is not very reliable under a 64 bit operating system, because the described design decision seems to be unwise.
I can't agree with that:
1. floats have 24 precision bits, thus they can represent integers in the range from -16777216 to 16777216 exactly. Your test produces numbers that are outside that range, so rounding applies (see the sketch after this list).
2. If I'm not wrong, most C/C++ compilers set the FPU's precision control to double by default. If you set the PC to single, you will get the same results as with the SIMD instructions.
3. I can't see why you claim this is a 64-bit problem - you have simply chosen the wrong data type.
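
As a quick C sketch of point 1 (written just for illustration, not taken from the testbed): at 2^24 a float runs out of integer resolution, while a double still has bits to spare.

/* A float (REAL4) has a 24-bit significand, so integers above
   2^24 = 16777216 are no longer exactly representable. */
#include <stdio.h>

int main(void)
{
    float f = 16777216.0f;   /* 2^24: still exact in a float */
    float g = f + 1.0f;      /* 16777217 is not representable, rounds back to 2^24 */
    printf("float:  %.1f\n", g);        /* prints 16777216.0 */

    double d = 16777216.0;   /* a double has 53 significand bits */
    printf("double: %.1f\n", d + 1.0);  /* prints 16777217.0 */
    return 0;
}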

regards, qWord
MREAL macros - when you need floating point arithmetic while assembling!

Ryan

Quote from: Gunther on November 06, 2012, 07:21:27 AM
Ryan,

thank you for your effort. What's the problem? I only used ANSI C.

Gunther
I've tried multiple times to install the SDK as Microsoft recommends, but it gives a generic error message during the install and fails.  I've tried disabling my AV, but it had no effect.

http://msdn.microsoft.com/en-us/library/9yb4317s%28v=vs.80%29.aspx

Gunther

Hi qWord,

Maybe, maybe not. The fact is that the FPU did a pretty good job in that number range, while the XMM code failed resoundingly. Period.

Quote from: qWord on November 06, 2012, 07:21:36 AM
2. If I'm not wrong, most C/C++ compilers set the FPU's precision control to double by default.
That depends. A 32-bit OS starts the FPU with the full 80-bit width and the rounding mode set to round-to-nearest; that's reasonable. The ANSI standard requires the rounding mode truncate for casts, for example from floating point to integer; therefore the FPU control word must be changed and restored after the cast. But there's no need to shrink the FPU precision. There were serious reasons why William Kahan, the father of floating point, insisted on an 80-bit wide FPU, and Intel implemented it.
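
As a hedged illustration of that precision-control point (assuming a 32-bit MSVC build that generates x87 code, e.g. with /arch:IA32; _controlfp_s and the _PC_* masks are standard MSVC CRT facilities, but they have no effect on SSE code and precision control is not supported in x64 builds):

/* Shrinking the x87 precision control field makes double arithmetic
   round like float; restoring it brings the double result back. */
#include <stdio.h>
#include <float.h>

int main(void)
{
    unsigned int cw;
    volatile double x = 16777216.0;       /* volatile: keep the add at run time */

    _controlfp_s(&cw, _PC_24, _MCW_PC);   /* precision control -> single (24 bits) */
    printf("%.1f\n", x + 1.0);            /* 16777216.0, if the add runs on the x87 */

    _controlfp_s(&cw, _PC_53, _MCW_PC);   /* precision control -> double (53 bits) */
    printf("%.1f\n", x + 1.0);            /* 16777217.0 */
    return 0;
}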

Quote from: qWord on November 06, 2012, 07:21:36 AM
3. I can't see why you claim this is a 64-bit problem - you have simply chosen the wrong data type.

Why should REAL4 (float) be the wrong data type? The good old FPU does a good job in the described data range. Why not use another approach: let the FPU do the floating point operations by default (with the full register width), and if one wants to use the XMM registers, set an appropriate compiler switch. This leads to better and more reliable results. That was the standard behaviour for more than 25 years.

Gunther
You have to know the facts before you can distort them.

jj2007

You get the precision you choose...

PI on FPU=      3.141592653589793238
PI ^ 9 WinCalc= 29809.09933344621166
PI ^ 9 FPU=     29809.09933344621167
PI ^ 9 R8=      29809.09933344620003
PI ^ 9 R4=      29809.10742187500000


Windows Calculator internally uses very high precision, 30 digits or so.

include \masm32\MasmBasic\MasmBasic.inc   ; download
.data
res10   REAL10 29809.09933344621166650940240124   ; Windows Calculator
pi4   REAL4 3.1415926535897932384626433832795
pi8   REAL8 3.1415926535897932384626433832795
res4   REAL4 ?
res8   REAL8 ?

   Init
   expo=9
   fldpi
   Print Str$("PI on FPU=\t%Jf\n", ST(0))
   Print Str$("PI ^ 9 WinCalc=\t%Jf\n", res10)

   fldpi
   REPEAT expo-1
      fmul st, st(1)   ; FPU, Real10 precision
   ENDM
   Print Str$("PI ^ 9 FPU=\t%Jf\n", ST(0))

   movsd xmm0, pi8
   REPEAT expo-1
      mulsd xmm0, pi8   ; xmm, Real8 precision
   ENDM
   movsd res8, xmm0
   Print Str$("PI ^ 9 R8=\t%Jf\n", res8)

   movss xmm0, pi4
   REPEAT expo-1
      mulss xmm0, pi4   ; xmm, Real4 precision
   ENDM
   movss res4, xmm0
   Inkey Str$("PI ^ 9 R4=\t%Jf\n", res4)

   Exit
end start
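
For comparison, a rough C analogue of the demo above (a sketch only; the exact last digits depend on compiler and target, and long double is 80 bits wide with gcc on x86 but just an alias for double with MSVC):

/* pi^9 as a product chain in float, double and long double. */
#include <stdio.h>

int main(void)
{
    const long double pi = 3.14159265358979323846264338327950L;
    float       rf = (float)pi;
    double      rd = (double)pi;
    long double rl = pi;

    for (int i = 1; i < 9; i++) {     /* multiply by pi eight more times */
        rf *= (float)pi;
        rd *= (double)pi;
        rl *= pi;
    }
    printf("pi^9 as float:       %.14f\n",  rf);
    printf("pi^9 as double:      %.14f\n",  rd);
    printf("pi^9 as long double: %.14Lf\n", rl);
    return 0;
}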

Gunther

Hi Jochen,

one thing is clear: MasmBasic is a good thing.  :t On the other hand, the numbers are not inside a critical range.

Gunther
You have to know the facts before you can distort them.

qWord

Quote from: Gunther on November 06, 2012, 07:49:50 AM
But there's no need to shrink the FPU precision.
Speed!
Quote from: Gunther on November 06, 2012, 07:49:50 AM
There were serious reasons why William Kahan, the father of floating point, insisted on an 80-bit wide FPU, and Intel implemented it.
The reason was that there are calculations that need such high precision. Even then, that requires using the corresponding data type in HLLs.

Quote from: Gunther on November 06, 2012, 07:49:50 AM
Why should REAL4 (float) be the wrong data type? The good old FPU does a good job in the described data range.
Why use a data type that maybe can't represent the calculation's (intermediate) result(s)?
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

Quote from: Gunther on November 06, 2012, 08:23:06 AM
On the other hand, the numbers are not inside a critical range.

But you can see how the precision decreases rapidly for Pi^9....
What do you get from your C compilers? The same as Real8, or Real4 precision? Under the hood of the compiler there is assembler ;-)

Gunther

Hi qword,

Quote from: qWord on November 06, 2012, 08:29:25 AM
Speed!

What good is speed if the result is garbage? You get fast garbage, nothing else.

Quote from: qWord on November 06, 2012, 08:29:25 AM
The reason was that there are calculations that need such high precision. Even then, that requires using the corresponding data type in HLLs.

There are a lot of calculations that need such precision. What we have with floating point numbers is something like a comb with fewer and fewer teeth towards the end of the range. So try to comb your hair with that equipment! And you would really like to shrink the precision?
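
The comb picture can be made concrete with standard C (a sketch for illustration): nextafterf() from <math.h> returns the neighbouring float, so the difference is the width of the gap between adjacent representable values at that magnitude.

/* Print the spacing between consecutive floats at growing magnitudes. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    float points[] = { 1.0f, 1000.0f, 1.0e6f, 1.0e9f, 1.0e12f };
    for (int i = 0; i < 5; i++) {
        float x = points[i];
        printf("gap after %14g is %g\n", x, nextafterf(x, INFINITY) - x);
    }
    return 0;   /* the gap grows from ~1.2e-7 at 1.0 to 65536 at 1e12 */
}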

Quote from: qWord on November 06, 2012, 08:29:25 AM
Why use a data type that maybe can't represent the calculation's (intermediate) result(s)?

That's the point: interim results must be exact enough to reduce the rounding errors. A lot of people made a lot of effort to reach that goal; check, for example, Kahan's summation algorithm: https://en.wikipedia.org/wiki/Kahan_summation_algorithm. And don't forget what Isaac Newton put very precisely in his letter to Robert Hooke: If I have been able to see further it is because I have stood on the Shoulders of Giants. There's no reason to look down arrogantly on the work of our predecessors. Nobody alone is a pillar of wisdom.
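
A minimal C sketch of the linked Kahan summation (illustration only; it assumes true single-precision arithmetic, the default on x64, and options like -ffast-math may optimize the compensation away):

/* Compensated (Kahan) summation: c carries the low-order bits that a
   plain float accumulation would throw away at every step. */
#include <stdio.h>
#include <stdlib.h>

float kahan_sum(const float *a, int n)
{
    float sum = 0.0f, c = 0.0f;
    for (int i = 0; i < n; i++) {
        float y = a[i] - c;      /* corrected next term */
        float t = sum + y;       /* low bits of y are lost in this add */
        c = (t - sum) - y;       /* recover exactly what was lost */
        sum = t;
    }
    return sum;
}

int main(void)
{
    const int n = 1000000;                      /* one million terms of 0.1 */
    float *a = malloc(n * sizeof *a);
    float naive = 0.0f;
    for (int i = 0; i < n; i++) { a[i] = 0.1f; naive += a[i]; }

    printf("naive sum: %f\n", naive);           /* drifts visibly away from 100000 */
    printf("kahan sum: %f\n", kahan_sum(a, n)); /* stays very close to 100000 */
    free(a);
    return 0;
}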

Gunther

You have to know the facts before you can distort them.

Gunther

Jochen,

Quote from: jj2007 on November 06, 2012, 08:34:56 AM
But you can see how the precision decreases rapidly for Pi^9....
What do you get from your C compilers? The same as Real8, or Real4 precision? Under the hood of the compiler there is assembler ;-)

that is very clear. The example shows what my compiler produces. You can't blame the underlying assembler for that. It has to do with the lack of thought by some compiler builders who haven't any clue about numerical mathematics.

Gunther
You have to know the facts before you can distort them.

qWord

Quote from: Gunther on November 06, 2012, 09:13:57 AM
What good is speed if the result is garbage? You get fast garbage, nothing else.
That's my point: using the correct data type doesn't yield garbage.

Quote from: Gunther on November 06, 2012, 09:13:57 AM
There's no reason to look down arrogantly on the work of our predecessors. Nobody alone is a pillar of wisdom.
That was not my intention. My thoughts were about the data type, not about the need for high precision calculations.

regards, qWord
MREAL macros - when you need floating point arithmetic while assembling!

Gunther

Hi qWord,

Quote from: qWord on November 06, 2012, 09:38:12 AM
That's my point: using the correct data type doesn't yield garbage.

That's your point of view - why not? But it seems to me that you're labouring under a misapprehension. In practical calculations it is very common that interim results are out of range. Let me give you an example: the multiplication of two 24-bit mantissas leads to a 48-bit mantissa - out of range! Do you really think that rounding that interim result down by brute force is the ideal solution? The FPU's 80-bit width was made to accumulate such results; the appropriate rounding is done after the computation is finished. The record of success speaks for the FPU and not for the fancy new automatic XMM code generation.
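
The 24-bit/48-bit point can be shown directly in C (a sketch, not from the testbed): squaring 2^24 - 1 gives an exact 48-bit product that a REAL4 result must round, while a wider intermediate keeps every bit.

/* The exact product (2^24-1)^2 needs 48 significand bits. */
#include <stdio.h>

int main(void)
{
    float  a  = 16777215.0f;       /* 2^24 - 1, exactly representable */
    float  pf = a * a;             /* product rounded back to 24 bits */
    double pd = (double)a * a;     /* 48-bit product held exactly in a double */

    printf("float  product: %.1f\n", pf);      /* 281474943156224.0 */
    printf("double product: %.1f\n", pd);      /* 281474943156225.0 */
    printf("difference:     %.1f\n", pd - pf); /* 1.0 */
    return 0;
}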

Quote from: qWord on November 06, 2012, 09:38:12 AM
That was not my intention. My thoughts was on the data type and not about the need of high precision calculations.

Of course, but high accuracy arithmetic is only possible with good sense.

Gunther

You have to know the facts before you can distort them.

qWord

Hi,
Quote from: Gunther on November 07, 2012, 07:39:47 AM
But it seems to me that you're labouring under a misapprehension. In practical calculations it is very common that interim results are out of range. Let me give you an example: the multiplication of two 24-bit mantissas leads to a 48-bit mantissa - out of range!
That is absolutely correct: an FP multiplier must have at least a 48-bit register. However, the result gets normalized and rounded, which means that only redundant precision bits are discarded. Moreover, all this is done in hardware regardless of whether we are using the FPU or SSEx instructions.

I see that we can't get on the same page - you expect the compiler to always use the highest precision regardless of the FP type used. This sounds a bit as if you don't want to think about the error of a calculation and instead hope that high precision will solve the problem. I would estimate the error and then choose the correct data type or, if needed, use a high/arbitrary precision library.
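
As a sketch of that last route (assuming GNU MPFR is installed; it is a real arbitrary-precision library, linked with -lmpfr -lgmp), pi^9 carried with a 200-bit significand reproduces the Windows Calculator digits quoted earlier in the thread:

/* pi^9 with a 200-bit significand via MPFR. */
#include <stdio.h>
#include <mpfr.h>

int main(void)
{
    mpfr_t x;
    mpfr_init2(x, 200);                  /* 200 bits of precision */
    mpfr_const_pi(x, MPFR_RNDN);         /* x = pi */
    mpfr_pow_ui(x, x, 9, MPFR_RNDN);     /* x = pi^9 */
    mpfr_printf("pi^9 = %.28Rf\n", x);   /* 29809.0993334462116665094024... */
    mpfr_clear(x);
    return 0;
}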

regards, qWord



MREAL macros - when you need floating point arithmetic while assembling!