Playing with DoubleDouble precision

Started by HSE, February 20, 2023, 05:42:06 AM

jj2007

Quote from: HSE on February 21, 2023, 08:31:34 AM
using SSE takes 75% of the time that the FPU takes

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

80      cycles for 100 * Fpu mult
458     cycles for 100 * Simd mult
2223    cycles for 100 * Fpu div
2124    cycles for 100 * Simd div

81      cycles for 100 * Fpu mult
454     cycles for 100 * Simd mult
2225    cycles for 100 * Fpu div
1903    cycles for 100 * Simd div

80      cycles for 100 * Fpu mult
458     cycles for 100 * Simd mult
2217    cycles for 100 * Fpu div
2129    cycles for 100 * Simd div

83      cycles for 100 * Fpu mult
460     cycles for 100 * Simd mult
2222    cycles for 100 * Fpu div
2123    cycles for 100 * Simd div

23      bytes for Fpu mult
27      bytes for Simd mult
18      bytes for Fpu div
22      bytes for Simd div


Calculation results are identical.

Same but with REAL10 precision for the FPU (the bottleneck is fstp res10):
626     cycles for 100 * Fpu mult
460     cycles for 100 * Simd mult
1821    cycles for 100 * Fpu div
2121    cycles for 100 * Simd div

645     cycles for 100 * Fpu mult
463     cycles for 100 * Simd mult
1818    cycles for 100 * Fpu div
2125    cycles for 100 * Simd div

629     cycles for 100 * Fpu mult
459     cycles for 100 * Simd mult
1820    cycles for 100 * Fpu div
2128    cycles for 100 * Simd div

630     cycles for 100 * Fpu mult
455     cycles for 100 * Simd mult
1818    cycles for 100 * Fpu div
2128    cycles for 100 * Simd div

hutch--


Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)

32      cycles for 100 * Fpu mult
425     cycles for 100 * Simd mult
1537    cycles for 100 * Fpu div
1831    cycles for 100 * Simd div

15      cycles for 100 * Fpu mult
420     cycles for 100 * Simd mult
1552    cycles for 100 * Fpu div
1821    cycles for 100 * Simd div

32      cycles for 100 * Fpu mult
423     cycles for 100 * Simd mult
1540    cycles for 100 * Fpu div
1821    cycles for 100 * Simd div

30      cycles for 100 * Fpu mult
418     cycles for 100 * Simd mult
1548    cycles for 100 * Fpu div
1818    cycles for 100 * Simd div

23      bytes for Fpu mult
27      bytes for Simd mult
23      bytes for Fpu div
27      bytes for Simd div

1840505499      = eax Fpu mult
1840505499      = eax Simd mult
1326909748      = eax Fpu div
1326909748      = eax Simd div

--- ok ---


hutch--

I would like to see this benchmark run on later hardware. Mine is Haswell era, and later CPUs may have faster SSE2.

jj2007

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

74      cycles for 100 * Fpu mult
205     cycles for 100 * Simd mult
277     cycles for 100 * Fpu div
334     cycles for 100 * Simd div

74      cycles for 100 * Fpu mult
3       cycles for 100 * Simd mult
288     cycles for 100 * Fpu div
813     cycles for 100 * Simd div

71      cycles for 100 * Fpu mult
197     cycles for 100 * Simd mult
278     cycles for 100 * Fpu div
562     cycles for 100 * Simd div

112     cycles for 100 * Fpu mult
116     cycles for 100 * Simd mult
283     cycles for 100 * Fpu div
627     cycles for 100 * Simd div

HSE

Quote from: jj2007 on February 21, 2023, 08:48:03 AM
Quote from: HSE on February 21, 2023, 08:31:34 AM
using SSE takes 75% of the time that the FPU takes

That is the real problem. When you have results from a real problem, little tests no longer matter.

I can't see results of calculations in your program.
Equations in Assembly: SmplMath

jj2007

Quote from: HSE on February 21, 2023, 09:06:57 AM
That is the real problem. When you have results from a real problem, little tests no longer matter.

Go ahead, Hector, show us a real problem :thumbsup:

Quote
I can't see results of calculations in your program.

I can see them, and they are identical.

daydreamer

Hector
http://masm32.com/board/index.php?topic=10277.0
Is it possible to do Taylor-series trigonometry with double precision?
For sine I use x^3, x^5, x^7, x^9; would the second float then be made up of, for example, x^11, x^13, x^15, x^17?
This is where packed SIMD can be used: a single divpd divides the x^3 and x^5 terms simultaneously.
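A minimal C sketch of the pairing idea, assuming the argument is already range-reduced: the odd terms are produced two at a time as (numerator, factorial) lanes, and the two independent divisions in each step are what one packed divpd would perform. The function name taylor_sin_pairs is hypothetical, it computes ordinary double sine (not the second float of a DoubleDouble), and scalar C stands in for SSE2 intrinsics to keep it portable.

```c
/* Sketch: sine by Taylor series, two odd terms per step.
   Lane 0 holds x^k/k!       (k = 3, 7, 11, ...; these terms are subtracted),
   lane 1 holds x^(k+2)/(k+2)! (5, 9, 13, ...;  these terms are added).
   The two divisions per step are independent, which is what a single
   packed divpd would compute on SSE2; plain scalar C is used here.
   Assumes |x| is already range-reduced (e.g. |x| <= pi/4). */
double taylor_sin_pairs(double x, int pairs)
{
    double x2 = x * x;
    double x4 = x2 * x2;
    double num[2] = { x * x2, x * x4 };  /* x^3, x^5 */
    double den[2] = { 6.0, 120.0 };      /* 3!,  5!  */
    double sum = x;                      /* the x^1 term */
    int k = 3;                           /* exponent held in lane 0 */

    for (int i = 0; i < pairs; i++) {
        double q0 = num[0] / den[0];     /* one divpd would do both ... */
        double q1 = num[1] / den[1];     /* ... of these divisions      */
        sum += q1 - q0;
        /* advance both lanes: exponent += 4, factorial gains 4 factors */
        num[0] *= x4;
        num[1] *= x4;
        den[0] *= (double)(k + 1) * (k + 2) * (k + 3) * (k + 4);
        den[1] *= (double)(k + 3) * (k + 4) * (k + 5) * (k + 6);
        k += 4;
    }
    return sum;
}
```

With pairs = 2 the sum is x - x^3/3! + x^5/5! - x^7/7! + x^9/9!, i.e. the terms listed above; each extra pair appends the next two odd powers.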
my non-asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

HSE

Quote from: daydreamer on February 21, 2023, 05:39:15 PM
Is it possible to do Taylor-series trigonometry with double precision?

Yes :thumbsup:. For standard calculations, SIMD implementations usually use 4-5 polynomial terms, and the FPU internally uses 8-9. Quadruple precision certainly uses more. DoubleDouble mostly relies on FPU operations, combined in a somewhat complicated way, which gives far better precision than the FPU alone and is faster than Quadruple precision, but I haven't played with trigonometry yet.
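To make "far better precision than the FPU alone" concrete, here is a minimal C sketch of the usual double-double building blocks: a value is the unevaluated sum hi + lo of two doubles, Knuth's TwoSum recovers the rounding error of an addition exactly, and addition renormalises the result. This is not HSE's MASM code; the names are hypothetical, and it assumes strict IEEE-754 round-to-nearest double arithmetic (no -ffast-math; on the x87 the precision control would have to be set to 53 bits).

```c
/* Double-double value: hi + lo, with |lo| at most half an ulp of hi. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: s + *err == a + b exactly, for any a, b. */
double two_sum(double a, double b, double *err)
{
    double s = a + b;
    double bb = s - a;
    *err = (a - (s - bb)) + (b - bb);
    return s;
}

/* Faster variant, valid only when |a| >= |b|. */
double quick_two_sum(double a, double b, double *err)
{
    double s = a + b;
    *err = b - (s - a);
    return s;
}

/* Double-double addition: add high and low parts separately, then
   renormalise so that hi carries the leading 53 bits. */
dd dd_add(dd a, dd b)
{
    double e, f;
    double s = two_sum(a.hi, b.hi, &e);
    double t = two_sum(a.lo, b.lo, &f);
    e += t;
    s = quick_two_sum(s, e, &e);
    e += f;
    s = quick_two_sum(s, e, &e);
    return (dd){ s, e };
}
```

For example, 1.0 + 1e-20 in plain double rounds to 1.0, while dd_add((dd){1.0, 0.0}, (dd){1e-20, 0.0}) keeps 1e-20 in the .lo field.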

HSE

Hi daydreamer!

I have finished the essential trigonometry. The DoubleDouble version uses 16 terms of the Taylor series for sine and cosine.

The exponential functions are done too; there "only" 26 terms are used  :biggrin: :biggrin:
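A minimal sketch of how such truncated series are commonly evaluated, with the number of terms as a parameter and Horner's scheme so no explicit powers or factorials are needed. Plain double is used only to show the structure; in a DoubleDouble backend every +, * and / would be a double-double operation. The function names are hypothetical, and the argument is assumed to be range-reduced already.

```c
/* exp(x) ~ sum_{k=0}^{n-1} x^k / k!,  n >= 1 */
double taylor_exp(double x, int n)
{
    double r = 1.0;
    for (int k = n - 1; k >= 1; k--)
        r = 1.0 + x / k * r;        /* 1 + x/k * (rest of series) */
    return r;
}

/* sin(x) ~ sum_{k=0}^{n-1} (-1)^k x^(2k+1) / (2k+1)! */
double taylor_sin(double x, int n)
{
    double x2 = x * x, r = 1.0;
    for (int k = n - 1; k >= 1; k--)
        r = 1.0 - x2 / ((2.0 * k) * (2.0 * k + 1.0)) * r;
    return x * r;
}

/* cos(x) ~ sum_{k=0}^{n-1} (-1)^k x^(2k) / (2k)! */
double taylor_cos(double x, int n)
{
    double x2 = x * x, r = 1.0;
    for (int k = n - 1; k >= 1; k--)
        r = 1.0 - x2 / ((2.0 * k - 1.0) * (2.0 * k)) * r;
    return r;
}
```

With taylor_exp(x, 26) and taylor_sin(x, 16), the truncation error for |x| <= 1 is far below what a single double can resolve, so term counts like these only make sense when the arithmetic itself is carried out in DoubleDouble.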

HSE

daydreamer

Quote from: HSE on March 20, 2023, 07:10:48 AM
Hi daydreamer!

I have finished the essential trigonometry. The DoubleDouble version uses 16 terms of the Taylor series for sine and cosine.

The exponential functions are done too; there "only" 26 terms are used  :biggrin: :biggrin:

HSE
Great Héctor :thumbsup:
Gonna post code?

HSE

Quote from: daydreamer on March 20, 2023, 08:27:13 PM
Gonna post code?

:thumbsup: It's going to be a new SmplMath backend. Right now it is possible to write simple equations, and those are used inside more complex functions, but I'm still missing something in the shunting yard.

Biterider

Hi
I came across two videos by an old friend that explain the basic arithmetic operations of Dekker's DoubleDouble step by step. As he also says, the economy of Dekker's arithmetic is stunning, especially when it comes to division.

Here are the links to the two videos:
https://www.youtube.com/watch?v=6OuqnaHHUG8
https://www.youtube.com/watch?v=5IL1LJ5noww

Dekker's original paper from 1971 :eusa_clap:
https://csclub.uwaterloo.ca/~pbarfuss/dekker1971.pdf
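For readers who prefer code to video, a minimal C sketch of multiplication and division in the style of Dekker's paper: Veltkamp's split with the constant 2^27 + 1, the exact product of two doubles, and a division that takes one ordinary quotient and corrects it with a single remainder term, which is where the economy shows. The function names are hypothetical (not Dekker's own), and it assumes strict round-to-nearest binary64 arithmetic without overflow or underflow.

```c
typedef struct { double hi, lo; } dd;

/* Veltkamp split: a == *hi + *lo, each part fitting in 26 bits. */
void split(double a, double *hi, double *lo)
{
    const double c = 134217729.0;       /* 2^27 + 1 */
    double t = c * a;
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Exact product: a * b == p + *e (Dekker's multiplication). */
double two_prod(double a, double b, double *e)
{
    double p = a * b, ah, al, bh, bl;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    *e = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    return p;
}

/* Renormalise s + err, valid when |a| >= |b|. */
double quick_two_sum(double a, double b, double *err)
{
    double s = a + b;
    *err = b - (s - a);
    return s;
}

/* Double-double product: exact hi*hi plus the cross terms. */
dd dd_mul(dd a, dd b)
{
    double e, p = two_prod(a.hi, b.hi, &e);
    e += a.hi * b.lo + a.lo * b.hi;     /* lo*lo is below the precision */
    p = quick_two_sum(p, e, &e);
    return (dd){ p, e };
}

/* Double-double quotient: one double division, then one correction. */
dd dd_div(dd a, dd b)
{
    double u_lo, z_lo;
    double c = a.hi / b.hi;                       /* first approximation */
    double u = two_prod(c, b.hi, &u_lo);          /* c * b.hi, exactly   */
    /* remainder (a - c*b) divided by b gives the correction term */
    double cc = ((((a.hi - u) - u_lo) + a.lo) - c * b.lo) / b.hi;
    double z = quick_two_sum(c, cc, &z_lo);
    return (dd){ z, z_lo };
}
```

As a check, dd_div((dd){1.0, 0.0}, (dd){3.0, 0.0}) returns the ordinary double 1.0/3.0 in .hi plus a nonzero correction in .lo, and multiplying that result back by 3 with dd_mul lands on 1 to roughly double-double accuracy.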

Biterider