Hi all!
Last week we talked about rounding in the FPU. With REAL8 variables, FPU results will, almost surely, have REAL8 precision. With REAL10 variables you can get more precision, but you don't know the precision of the results. So if you need more precision than REAL8, you have to use a bignumber system.
Most of these systems are very slow because they must emulate the operations in software. A much faster approach uses a mix of FPU and software:
Quote
Double-double arithmetic
A common software technique to implement nearly quadruple precision using pairs of double-precision values is sometimes called double-double arithmetic. Using pairs of IEEE double-precision values with 53-bit significands, double-double arithmetic provides operations on numbers with significands of at least 2 × 53 = 106 bits (actually 107 bits except for some of the largest values, due to the limited exponent range), only slightly less precise than the 113-bit significand of IEEE binary128 quadruple precision. The range of a double-double remains essentially the same as the double-precision format because the exponent still has 11 bits, significantly lower than the 15-bit exponent of IEEE quadruple precision (a range of 1.8 × 10^308 for double-double versus 1.2 × 10^4932 for binary128).
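The key building block behind this is an "error-free transformation": a handful of ordinary double additions and subtractions recover exactly the rounding error of an add, giving the hi/lo pair a double-double is made of. A minimal sketch in Python (whose floats are IEEE doubles), not the Masm code from the attachment:

```python
def two_sum(a, b):
    """Knuth's error-free addition: returns (s, e) such that
    s = fl(a + b) and s + e equals a + b exactly."""
    s = a + b
    bv = s - a               # the part of b that survived the rounding
    av = s - bv              # the part of a that survived the rounding
    e = (a - av) + (b - bv)  # the rounding error lost when forming s
    return s, e

# 1e-17 is far below ulp(1.0), so a plain add loses it completely;
# two_sum hands it back in the second component
s, e = two_sum(1.0, 1e-17)
```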
I found a DoubleDouble.java file, by Martin Davis, with the most important pieces and operations, which can be translated to Masm.
Quote from: Martin Davis
Immutable, extended-precision floating-point numbers which maintain 106 bits
(approximately 30 decimal digits) of precision.
Obviously I used a lot of macros, thinking of a future implementation. This is for ML64.
The test doesn't look so amazing :biggrin: :
rr <3.14159265358979310e+000, 1.22464679914735320e-016>
rr <2.71828182845904510e+000, 1.44564689172925020e-016>
3.1415926535897932384626433832795
+
2.7182818284590452353602874713526
=
5.8598744820488384738229308546321
Press any key to continue...
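For the curious, the same pi + e test can be reproduced with a quick double-double addition built on error-free transformations. This is a Python sketch (IEEE doubles) using the simple "sloppy" add, not necessarily what the attached Masm code does:

```python
from decimal import Decimal, getcontext

def two_sum(a, b):
    # error-free: s + e == a + b exactly
    s = a + b
    bv = s - a
    av = s - bv
    return s, (a - av) + (b - bv)

def quick_two_sum(a, b):
    # error-free, assuming |a| >= |b|
    s = a + b
    return s, b - (s - a)

def dd_add(xhi, xlo, yhi, ylo):
    """Simple ('sloppy') double-double addition: add the hi parts
    error-free, fold the lo parts into the error, renormalize."""
    s, e = two_sum(xhi, yhi)
    e += xlo + ylo
    return quick_two_sum(s, e)

pi_dd = (3.141592653589793, 1.2246467991473532e-16)  # the rr pairs above
e_dd  = (2.718281828459045, 1.4456468917292502e-16)
hi, lo = dd_add(*pi_dd, *e_dd)

getcontext().prec = 50
print(Decimal(hi) + Decimal(lo))  # agrees with 5.8598744820488384738... to ~31 digits
```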
Regards, HSE.
a -------------- Real8x2.zip (15.43 kB - downloaded 8 times.)
b - Corrected toStandardNotation Real8x2b.zip (15.85 kB - downloaded 8 times.)
- Added toSciNotation
c - string problems look solved now
:thumbsup:
I wonder how much precision is necessary to bring a spaceship safely to Mars. Any ideas?
Quote from: jj2007 on February 20, 2023, 05:58:35 AM
:thumbsup:
I wonder how much precision is necessary to bring a spaceship safely to Mars. Any ideas?
:biggrin: Not so much, I think. In the end, you rely on the sensors' precision and the control of the correction jets.
Apollo used an Italian calculator :thumbsup:
Hi
Very clever. I only really understood it after reading this (https://github.com/ChiralBehaviors/Utils/blob/master/src/main/java/com/hellblazer/utils/math/DoubleDouble.java#:~:text=*%20%3Cp%3E-,*,-A%20DoubleDouble%20uses):
Quote
* A DoubleDouble uses a representation containing two double-precision values.
* A number x is represented as a pair of doubles, x.hi and x.lo, such that the
* number represented by x is x.hi + x.lo, where
*
* <pre>
* |x.lo| <= 0.5*ulp(x.hi)
* </pre>
*
* and ulp(y) means "unit in the last place of y". The basic arithmetic
* operations are implemented using convenient properties of IEEE-754
* floating-point arithmetic.
* <p>
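That invariant is easy to check numerically; for instance with the pi pair from the test output earlier in the thread (a Python sketch, using math.ulp, available since Python 3.9):

```python
import math

x_hi = 3.141592653589793        # nearest double to pi
x_lo = 1.2246467991473532e-16   # the low-order correction, pi - x_hi

# the representation invariant: |x.lo| <= 0.5 * ulp(x.hi)
assert abs(x_lo) <= 0.5 * math.ulp(x_hi)
print(math.ulp(x_hi))  # 2**-51 for values in [2, 4)
```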
Biterider
Hi Biterider!
Quote from: Biterider on February 20, 2023, 07:31:51 AM
I only really understood it after reading this (https://github.com/ChiralBehaviors/Utils/blob/master/src/main/java/com/hellblazer/utils/math/DoubleDouble.java#:~:text=*%20%3Cp%3E-,*,-A%20DoubleDouble%20uses):
:thumbsup: That is at the end of DoubleDouble.inc.
The .asm file was missing in the zip :biggrin:
Thanks, HSE.
Quote from: jj2007 on February 20, 2023, 05:58:35 AM
:thumbsup:
I wonder how much precision is necessary to bring a spaceship safely to Mars. Any ideas?
Just enough. :biggrin:
:biggrin:
Some good work there Hector. If I could think of a use (for myself) for higher levels of precision than 64-bit, I would probably use it, but SSE2 64-bit maths has result lengths that generally require truncation to be readable, so I might be wasting such extra precision.
Quote from: jj2007 on February 20, 2023, 05:58:35 AM
I wonder how much precision is necessary to bring a spaceship safely to Mars.
I googled a bit and found this JPL/NASA article (October 2022): How Many Decimals of Pi Do We Really Need? (https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimals-of-pi-do-we-really-need/)
Quote
For JPL's highest accuracy calculations, which are for interplanetary navigation, we use 3.141592653589793.
For comparison, REAL10 precision: 3.141592653589793238. So the fpu is a factor 1000 more precise than the required precision for interplanetary travel.
Quote from: hutch-- on February 20, 2023, 10:04:12 AM
if I could think of a use (for myself) of higher levels of precision than 64 bit..
If you think of one, an example would be appreciated. :biggrin:
:thumbsup:
Hitting Mars depends on whether you put a chimpanzee or a rocket scientist at the computer to calculate the Earth-Mars trajectory
None at all, just wait a few days and we will all reach Mars :tongue:
Héctor, about slow: add/sub would be a candidate for packed SSE2
Hi daydreamer!
Quote from: daydreamer on February 20, 2023, 06:41:42 PM
about slow: add/sub would be a candidate for packed SSE2
Not even close!!
The reason why the FPU can work at REAL8 precision is the existence of an internal REAL10 to perform the operations.
SIMD doesn't have an internal REAL10, so your precision is probably around a theoretical REAL6 (even if you are using REAL8 variables). You can never use SIMD if you want high precision.
As Raymond reminds us from time to time, rarely does anybody need full REAL8 precision. So SIMD is frequently a good and fast option. Just not in this case.
HSE
Hi all!
Added translation of toSciNotation procedure.
Corrected toStandardNotation procedure from a mix of original and translation problems with negative exponents. Perhaps don't was never been called to much.
Update in first post.
HSE
Hi Hector,
I have this sneaking suspicion that SSE2 and the old x87 FP share the same internal circuitry, which would be a matter of economy in basic chip design for silicon area. Dual use is not uncommon in Intel hardware: MMX and FP share the same registers. I don't have a lot of use for it, but 64-bit SSE2 seems to deliver about the same level of precision as any of the examples I have seen on the internet.
Quote from: HSE on February 21, 2023, 04:34:18 AM
The reason why FPU can work at REAL8 precision is the existence of internal REAL10 to make operations.
After finit, the fpu works internally and externally with REAL10 precision, i.e. you can load and save REAL10 numbers. That's 16 bits more precise than the corresponding SIMD instructions, and most of the time equally fast. The only reason to go for SIMD is parallel (packed) processing, but that is not always possible.
Quote from: hutch-- on February 21, 2023, 07:49:27 AM
64 bit SSE2 seems to deliver about the same level of precision as any of the examples I have seen on the internet.
:thumbsup: For most applications even REAL4 is too much precision.
In some simulations calculating 1500 equations 8.5 million times, using SSE takes 75% of the time of using the FPU, but with differences in results of around 1/45000.
SSE has very few functions; you have to code the rest in software. Those functions are probably never coded to levels of precision close to the FPU's. That is obvious: if you need FPU precision you are going to use the FPU, not SSE.
I think it could be hard to find good at-the-limit SSE-FPU precision comparisons, because people who need the FPU don't even look at SSE :biggrin:
It's not an interesting discussion, because FPUs are everywhere. I have 8 FPUs just in my phone :thumbsup:
Quote from: HSE on February 21, 2023, 08:31:34 AM
using SSE take 75% of time using FPU
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
80 cycles for 100 * Fpu mult
458 cycles for 100 * Simd mult
2223 cycles for 100 * Fpu div
2124 cycles for 100 * Simd div
81 cycles for 100 * Fpu mult
454 cycles for 100 * Simd mult
2225 cycles for 100 * Fpu div
1903 cycles for 100 * Simd div
80 cycles for 100 * Fpu mult
458 cycles for 100 * Simd mult
2217 cycles for 100 * Fpu div
2129 cycles for 100 * Simd div
83 cycles for 100 * Fpu mult
460 cycles for 100 * Simd mult
2222 cycles for 100 * Fpu div
2123 cycles for 100 * Simd div
23 bytes for Fpu mult
27 bytes for Simd mult
18 bytes for Fpu div
22 bytes for Simd div
Calculation results are identical.
Same but with REAL10 precision for the FPU (the bottleneck is fstp res10):
626 cycles for 100 * Fpu mult
460 cycles for 100 * Simd mult
1821 cycles for 100 * Fpu div
2121 cycles for 100 * Simd div
645 cycles for 100 * Fpu mult
463 cycles for 100 * Simd mult
1818 cycles for 100 * Fpu div
2125 cycles for 100 * Simd div
629 cycles for 100 * Fpu mult
459 cycles for 100 * Simd mult
1820 cycles for 100 * Fpu div
2128 cycles for 100 * Simd div
630 cycles for 100 * Fpu mult
455 cycles for 100 * Simd mult
1818 cycles for 100 * Fpu div
2128 cycles for 100 * Simd div
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
32 cycles for 100 * Fpu mult
425 cycles for 100 * Simd mult
1537 cycles for 100 * Fpu div
1831 cycles for 100 * Simd div
15 cycles for 100 * Fpu mult
420 cycles for 100 * Simd mult
1552 cycles for 100 * Fpu div
1821 cycles for 100 * Simd div
32 cycles for 100 * Fpu mult
423 cycles for 100 * Simd mult
1540 cycles for 100 * Fpu div
1821 cycles for 100 * Simd div
30 cycles for 100 * Fpu mult
418 cycles for 100 * Simd mult
1548 cycles for 100 * Fpu div
1818 cycles for 100 * Simd div
23 bytes for Fpu mult
27 bytes for Simd mult
23 bytes for Fpu div
27 bytes for Simd div
1840505499 = eax Fpu mult
1840505499 = eax Simd mult
1326909748 = eax Fpu div
1326909748 = eax Simd div
--- ok ---
I would like to see this benchmark run on later hardware; mine is Haswell era and the later stuff may have faster SSE2.
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
74 cycles for 100 * Fpu mult
205 cycles for 100 * Simd mult
277 cycles for 100 * Fpu div
334 cycles for 100 * Simd div
74 cycles for 100 * Fpu mult
3 cycles for 100 * Simd mult
288 cycles for 100 * Fpu div
813 cycles for 100 * Simd div
71 cycles for 100 * Fpu mult
197 cycles for 100 * Simd mult
278 cycles for 100 * Fpu div
562 cycles for 100 * Simd div
112 cycles for 100 * Fpu mult
116 cycles for 100 * Simd mult
283 cycles for 100 * Fpu div
627 cycles for 100 * Simd div
Quote from: jj2007 on February 21, 2023, 08:48:03 AM
Quote from: HSE on February 21, 2023, 08:31:34 AM
using SSE take 75% of time using FPU
That is the real problem. When you have results for a real problem, little tests no longer matter.
I can't see the results of the calculations in your program.
Quote from: HSE on February 21, 2023, 09:06:57 AM
That is the real problem. When you have real problem results little tests not longer matter.
Go ahead, Hector, show us a real problem :thumbsup:
Quote
I can't see results of calculations in your program.
I can see them, and they are identical.
Quote from: jj2007 on February 21, 2023, 09:08:33 AM
I can see them, and they are identical.
But you are using REAL2 numbers :biggrin:
:biggrin:
Hector
http://masm32.com/board/index.php?topic=10277.0 (http://masm32.com/board/index.php?topic=10277.0)
Is it possible with double precision in Taylor-series trigonometry?
For sine I use x^3, x^5, x^7, x^9; for example, is the second float made up by using x^11, x^13, x^15, x^17?
This is where packed SIMD can be used, to divpd x^3 and x^5 simultaneously
Quote from: daydreamer on February 21, 2023, 05:39:15 PM
Is it possible with double precision in Taylor series trigo?
Yes :thumbsup: . For standard calculations usually 4-5 polynomial terms are used in SIMD, and the FPU internally uses 8-9 terms. For sure more terms are used for quadruple precision. DoubleDouble mostly uses FPU functions, in a slightly complicated way, which results in far better precision than the FPU alone while being faster than quadruple, but I haven't played with trigonometry yet.
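To illustrate how the term count drives the attainable precision, here is a plain double-precision sketch in Python (not the DoubleDouble code; it assumes the argument has already been reduced to a small range):

```python
import math

def taylor_sin(x, terms):
    """sin(x) via the Taylor series x - x^3/3! + x^5/5! - ...,
    truncated after `terms` terms. Assumes |x| is already small."""
    total = 0.0
    term = x  # first term: x^1 / 1!
    for k in range(terms):
        total += term
        # next odd-power term: multiply by -x^2 / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

# at x = 1, 8-9 terms already exhaust double precision; a double-double
# target needs roughly twice as many, in line with the counts above
print(abs(taylor_sin(1.0, 16) - math.sin(1.0)))
```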
Hi daydreamer!
I finished the essential trigonometry. DoubleDouble precision uses 16 terms of the Taylor series for sine and cosine.
Also the exponential functions. There "only" 26 terms are used :biggrin: :biggrin:
HSE
Quote from: HSE on March 20, 2023, 07:10:48 AM
Hi daydreamer!
I finished the essential trigonometry. DoubleDouble precision uses 16 terms of the Taylor series for sine and cosine.
Also the exponential functions. There "only" 26 terms are used :biggrin: :biggrin:
HSE
Great Héctor :thumbsup:
Gonna post code?
Quote from: daydreamer on March 20, 2023, 08:27:13 PM
Gonna post code?
:thumbsup: It's going to be a new SmplMath backend. Right now it's possible to write simple equations, and that is used in more complex functions, but I'm still missing something in the shunting yard.
Hi
I came across 2 videos from an old friend that explain the basic arithmetic operations of Dekker's DoubleDouble step by step, and as he also says, the arithmetic economy of Dekker is stunning, especially when it comes to division.
Here the link to the 2 videos
https://www.youtube.com/watch?v=6OuqnaHHUG8 (https://www.youtube.com/watch?v=6OuqnaHHUG8)
https://www.youtube.com/watch?v=5IL1LJ5noww (https://www.youtube.com/watch?v=5IL1LJ5noww)
Dekker's original paper from 1971 :eusa_clap:
https://csclub.uwaterloo.ca/~pbarfuss/dekker1971.pdf (https://csclub.uwaterloo.ca/~pbarfuss/dekker1971.pdf)
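One of the stunning pieces in that paper: long before FMA hardware existed, Dekker (building on Veltkamp's split) showed how to obtain the exact error of a double multiplication using only ordinary +, -, and ×. A Python sketch of the idea (IEEE doubles), not taken from the videos:

```python
SPLITTER = 134217729.0  # 2**27 + 1, the splitting constant for 53-bit doubles

def split(a):
    """Veltkamp split: a == hi + lo, each half fitting in 26 bits."""
    t = SPLITTER * a
    hi = t - (t - a)
    lo = a - hi
    return hi, lo

def two_prod(a, b):
    """Dekker's error-free product: returns (p, e) with p = fl(a * b)
    and p + e equal to a * b exactly (barring overflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    # products of the 26-bit halves are exact in 53-bit arithmetic
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

# (10**8 + 1)**2 = 10000000200000001 needs 54 bits: the unit digit is
# rounded away in p but recovered exactly in e
p, e = two_prod(1e8 + 1.0, 1e8 + 1.0)
```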
Biterider
Perhaps the same idea?
http://web.mit.edu/tabbott/Public/quaddouble-debian/qd-2.3.4-old/docs/qd.pdf (http://web.mit.edu/tabbott/Public/quaddouble-debian/qd-2.3.4-old/docs/qd.pdf)
EDIT 2023-11-19
QDC was a QD fork in C, but never finished :sad:
Hi Timo
The idea of extending double-double to quad-double is appealing :thumbsup:
But before getting into that, wouldn't it be wiser to explore extending double-double with REAL10 instead of REAL8? This would give the same algorithm 8 digits more precision without increasing the complexity using today's existing x86 hardware.
Biterider
Hey, that's clever! And even I can understand how it works. Kewl.
it's an interesting subject, but some years ago I tested the performance of dd vs soft-float, and soft-float was a bit faster
just now I did a very simple test of qd vs my kindergarten mp float routines, and mine was about 45% slower than qd
Um, what exactly is "soft-float"? It certainly can't be floating-point implemented in software if it's faster than "hard-float".
Quote from: TimoVJL on November 18, 2023, 08:16:44 PM
Perhaps same idea ?
http://web.mit.edu/tabbott/Public/quaddouble-debian/qd-2.3.4-old/docs/qd.pdf (http://web.mit.edu/tabbott/Public/quaddouble-debian/qd-2.3.4-old/docs/qd.pdf)
Exactly the same idea.
Apparently nobody is using qd, probably because the range of exponents is still that of REAL8 anyway. That triggers the vicious circle: few users - hardly accepted for publication - few users.
The easy thing would be to use the Lawrence Berkeley National Laboratory libraries, but they now have different licenses. I don't remember what it was, just that it was no fun.
Hi Biterider,
Quote from: Biterider on November 18, 2023, 09:14:46 PM
But before getting into that, wouldn't it be wiser to explore extending double-double with REAL10 instead of REAL8?
In the FPU, calculations between REAL10 numbers don't always have REAL10 precision (never for transcendental functions, IIRC).
To make calculations between REAL8 numbers and always obtain REAL8 precision, the FPU uses REAL10 as the internal representation.
No problem with REAL10 if you solve the calculations by hand instead of with the FPU :biggrin: :biggrin:
HSE
Alan Miller implemented dd and qd in FORTRAN https://jblevins.org/mirror/amiller/#nas (https://jblevins.org/mirror/amiller/#nas) should be easy enough to translate the code to your favorite language
btw, here's SoftFloat http://www.jhauser.us/arithmetic/SoftFloat.html (http://www.jhauser.us/arithmetic/SoftFloat.html)
Quote from: jack on November 19, 2023, 09:56:04 AM
btw, here's SoftFloat (http://www.jhauser.us/arithmetic/SoftFloat.html)
Thanks (turned that into a clickable link for you; just posting URLs doesn't make them into links here anymore)
But but but ... if it's implemented in software, how could it possibly be faster than a FPU?
not faster than the FPU, but faster than dd
btw, quad_nas.f90 only needs 2 small changes for it to compile with gfortran
change line 51 from INTEGER, PARAMETER :: r10 = 3
to INTEGER, PARAMETER :: r10 = real(10)
and line 1464 to 0.0000000000000000000_r10 , 0.000000000000000000_r10 , &
Hi jack,
Quote from: jack on November 19, 2023, 09:56:04 AM
Alan Miller implemented dd and qd in FORTRAN https://jblevins.org/mirror/amiller/#nas should be easy enough to translate the code to your favorite language
btw, here's SoftFloat http://www.jhauser.us/arithmetic/SoftFloat.html
No. :eusa_naughty:
Quote
The latest release of SoftFloat implements five floating-point formats: 16-bit half-precision, 32-bit single-precision, 64-bit double-precision, 80-bit double-extended-precision, and 128-bit quadruple-precision
That doesn't have double-double precision nor quadruple-double precision.
HSE
128-bit is the same as double-double :biggrin:
but SoftFloat doesn't come with string-to-float (and vice versa) conversion routines; I need to look and see what I used
Quote from: jack on November 19, 2023, 10:54:58 AM
128-bit is the same as double-double :biggrin:
Very funny, but no.
dd and Soft-128 both give about 33-34 decimal digits, the difference being that Soft-128 has a larger exponent range
for SoftFloat you can use https://github.com/jwiegley/gdtoa for the conversions
Quote from: jack on November 19, 2023, 11:02:47 AM
dd and Soft-128 both
They are 2 different formats.
Double-double precision (Real8, Real8) uses the FPU, so it is very fast (the same idea is used in quadruple-double precision (Real8, Real8, Real8, Real8)).
Quadruple precision (Real16, or Soft-128) requires a library to perform some kind of emulation in memory, which is very slow.
Quote from: jack on November 19, 2023, 11:02:47 AM
you can use https://github.com/jwiegley/gdtoa for the conversions
You can convert numbers between any formats you can think of, provided they fit inside the format's range, possibly losing precision.
Quote from: jj2007 on February 20, 2023, 05:58:35 AM
:thumbsup:
I wonder how much precision is necessary to bring a spaceship safely to Mars. Any ideas?
https://www.iflscience.com/why-nasa-only-uses-a-handful-of-the-628-trillion-known-digits-of-pi-in-its-calculations-73401 (https://www.iflscience.com/why-nasa-only-uses-a-handful-of-the-628-trillion-known-digits-of-pi-in-its-calculations-73401)
That article, from about a year ago, discusses how many digits of pi NASA uses: 16, the integer part (3) plus 15 decimal places.
That should give an idea of the precision used in some of its calculations, and an explanation of why.
I just happened to be searching for some info on pi today, and saw another member viewing this topic a few minutes ago in Who's Online.