Fastest sqrt?
Fsqrt with real4,real8,REAL10
SSE sqrtss
SSE2 sqrtsd
rsqrtss (reciprocal square root) followed by rcpss?
Sqrt LUT: results in the 0-255 range for word inputs 0-65535
https://stackoverflow.com/questions/7724061/how-slow-how-many-cycles-is-calculating-a-square-root
Is the SSE reciprocal square root precise enough to calculate per BCD digit?
Probably good enough for a student task that calculates Pythagoras and only outputs 2-3 decimals
SSE reciprocal square root with more Newton-Raphson iterations, or real8 with fewer Newton-Raphson iterations?
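The lookup-table idea from the list above can be sketched in C. This is only a minimal sketch under one reading of that line: 16-bit word inputs (0-65535) mapped to byte results (0-255), so the table costs 65536 bytes. The names `sqrt_lut`, `init_sqrt_lut` and `isqrt16` are made up for illustration.

```c
#include <stdint.h>

/* Assumption: index = 16-bit input, value = floor(sqrt(index)).
   sqrt(65535) is just below 256, so every result fits in one byte. */
static uint8_t sqrt_lut[65536];

/* Fill the table without calling sqrt: all n in [r*r, (r+1)*(r+1)-1]
   map to r, so r only has to be bumped when n reaches the next square. */
static void init_sqrt_lut(void)
{
    uint32_t r = 0;
    for (uint32_t n = 0; n < 65536; ++n) {
        if ((r + 1) * (r + 1) <= n)
            ++r;
        sqrt_lut[n] = (uint8_t)r;
    }
}

/* One memory read per square root; call init_sqrt_lut() once first. */
static inline uint8_t isqrt16(uint16_t n)
{
    return sqrt_lut[n];
}
```

After `init_sqrt_lut()`, `isqrt16(25)` and `isqrt16(26)` both give 5 and `isqrt16(65535)` gives 255. Whether one table read beats fsqrt depends on whether the 64 KB table stays in cache, so it would need timing like any other candidate.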
Quote from: daydreamer on December 04, 2023, 06:39:06 PM
Fsqrt with real4,real8,REAL10
fsqrt in REAL10 precision costs only 5-6 cycles (https://masm32.com/board/index.php?topic=8744.msg95836#msg95836).
Clever tricks make sense for slow complex functions like the Bessel function (https://masm32.com/board/index.php?topic=8779.0).
you might want to play with inverse square root approximation https://stackoverflow.com/questions/11644441/fast-inverse-square-root-on-x64
Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation
I love it when C++ fans try to be fast :biggrin:
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
67 µs for initialising FastLog10
1927 cycles for 100 * Log10 (Guga)
4961 cycles for 100 * MasmBasic Log10
1246 cycles for 100 * FastMath Log10
541 cycles for 100 * fsqrt (good ol' FPU)
1246 cycles for 100 * FastSqrt
645 cycles for 100 * SqrtSSE
1948 cycles for 100 * Log10 (Guga)
4917 cycles for 100 * MasmBasic Log10
1242 cycles for 100 * FastMath Log10
538 cycles for 100 * fsqrt (good ol' FPU)
1242 cycles for 100 * FastSqrt
618 cycles for 100 * SqrtSSE
1944 cycles for 100 * Log10 (Guga)
4915 cycles for 100 * MasmBasic Log10
1242 cycles for 100 * FastMath Log10
575 cycles for 100 * fsqrt (good ol' FPU)
1258 cycles for 100 * FastSqrt
614 cycles for 100 * SqrtSSE
2634 cycles for 100 * Log10 (Guga)
4939 cycles for 100 * MasmBasic Log10
1270 cycles for 100 * FastMath Log10
541 cycles for 100 * fsqrt (good ol' FPU)
1242 cycles for 100 * FastSqrt
616 cycles for 100 * SqrtSSE
1883 cycles for 100 * Log10 (Guga)
4929 cycles for 100 * MasmBasic Log10
1269 cycles for 100 * FastMath Log10
548 cycles for 100 * fsqrt (good ol' FPU)
1242 cycles for 100 * FastSqrt
610 cycles for 100 * SqrtSSE
516 bytes for Log10 (Guga)
16 bytes for MasmBasic Log10
67 bytes for FastMath Log10
14 bytes for fsqrt (good ol' FPU)
67 bytes for FastSqrt
30 bytes for SqrtSSE
Real8 0.6989700043360187465
Real8 0.6989700043360188575
Real8 0.6989700043360188575
Real8 2.236067977499789805
Real8 2.236067977499789805
Real8 2.236328125000000000
Note the reduced precision for SqrtSSE: 2.236328125000000000
Quote from: jj2007 on December 05, 2023, 01:32:25 AM
Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation
I love it when C++ fans try to be fast :biggrin:
also when we need 32 / 64 bit and ARM versions :biggrin:
Fastest squirts are also interesting :biggrin:
Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation https://stackoverflow.com/questions/11644441/fast-inverse-square-root-on-x64
Not so exact, I'm afraid:
sqrt fast?
aprox 5.635775715649263800000e-001
FPU 5.638032088398418100000e-001
Press any key to continue...
:biggrin: For more precision, more cycles (and perhaps no longer faster)
inv sqrt fast?
aprox(4) 5.638032088398418100000e-001
FPU 5.638032088398418100000e-001
Press any key to continue...
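The trade-off shown in the two outputs above (more Newton-Raphson steps, more precision, more cycles) can be sketched in portable C. This is not the code that produced those numbers. It is a minimal sketch assuming the well-known 64-bit magic constant for the bit trick and assuming that the "(4)" in the output means four refinement steps. `rsqrt_approx` and `sqrt_approx` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal doubles.
   steps = number of Newton-Raphson refinements applied to the bit-trick guess. */
static double rsqrt_approx(double x, int steps)
{
    uint64_t bits;
    double y;

    memcpy(&bits, &x, sizeof bits);              /* reinterpret without aliasing UB */
    bits = 0x5FE6EB50C7B537A9ULL - (bits >> 1);  /* initial guess, a few percent off */
    memcpy(&y, &bits, sizeof y);

    for (int i = 0; i < steps; ++i)
        y = y * (1.5 - 0.5 * x * y * y);         /* each step roughly squares the error */
    return y;
}

/* sqrt(x) = x * (1/sqrt(x)), so no division is needed. */
static double sqrt_approx(double x, int steps)
{
    return x * rsqrt_approx(x, steps);
}
```

With zero steps the result is only good to a few percent. Since the relative error is roughly squared per step, four steps should bring a real8 close to full precision, at the cost of a dependent chain of multiplications that may well take longer than a single fsqrt.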
Thanks Jochen, Hector, Jack :thumbsup:
Hector, this is for the case where the user enters the number of decimals, as in Raymond's 10000 decimals. If the user wants between 1 and 3 decimals, use real8 and the approximation.
For 10000 decimals, how many repeated real8 calculations are needed?
If the user decides the number of decimals, that is arbitrary precision :thumbsup:
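A rough way to estimate the iteration count asked about above: Newton-Raphson for square roots converges quadratically, so under ideal conditions each step about doubles the number of correct digits. The sketch below only counts those doublings. It assumes a real8 seed with about 15 correct decimal digits, and it assumes the refinement steps themselves run in arbitrary-precision arithmetic, since real8 alone can never hold more than its own 15-16 digits. `newton_steps` is a hypothetical helper.

```c
/* Count the Newton-Raphson steps needed to grow from start_digits correct
   digits to at least target_digits, assuming every step doubles them.
   Returns -1 if start_digits is 0 (no usable starting guess). */
static int newton_steps(unsigned long start_digits, unsigned long target_digits)
{
    int steps = 0;

    if (start_digits == 0)
        return -1;
    while (start_digits < target_digits) {
        start_digits *= 2;   /* quadratic convergence: digits double per step */
        ++steps;
    }
    return steps;
}
```

`newton_steps(15, 10000)` returns 10 (15, 30, 60, ... 7680, 15360), while `newton_steps(15, 3)` returns 0, which matches the point that 1-3 decimals need nothing beyond real8. In practice the cost is dominated by the last few steps, because they work on the longest numbers.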