The MASM Forum

General => The Laboratory => Topic started by: daydreamer on December 04, 2023, 06:39:06 PM

Title: Fastest sqrt?
Post by: daydreamer on December 04, 2023, 06:39:06 PM
Fastest sqrt?
Fsqrt with real4,real8,REAL10
SSE sqrtss
SSE2 sqrtsd
rsqrtss followed by rcpss?
Sqrt LUT 0-255 range = 0-65535 words
https://stackoverflow.com/questions/7724061/how-slow-how-many-cycles-is-calculating-a-square-root
Is the SSE reciprocal square root accurate enough to calculate one BCD digit at a time?
It is probably good enough for a student task such as calculating Pythagoras with only 2-3 decimals of output.

SSE reciprocal square root with more Newton-Raphson iterations, versus real8 with fewer Newton-Raphson iterations?
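To make the Newton-Raphson question concrete, here is a minimal Python sketch. It does not call rsqrtss: `rough_rsqrt` is a hypothetical stand-in that rounds the exact 1/sqrt(x) to 12 significant bits, which is only an assumption about the order of accuracy of a hardware estimate. The helper names are made up for illustration.

```python
import math

def rough_rsqrt(x, bits=12):
    """Hypothetical stand-in for a hardware estimate:
    the exact 1/sqrt(x) rounded to `bits` significant bits."""
    exact = 1.0 / math.sqrt(x)
    m, e = math.frexp(exact)              # exact = m * 2**e with 0.5 <= m < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)

def refine_rsqrt(x, y, steps):
    """Newton-Raphson on f(y) = 1/y**2 - x:  y <- y * (1.5 - 0.5 * x * y * y)."""
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def sqrt_via_rsqrt(x, steps):
    """sqrt(x) computed as x * (1/sqrt(x)), with `steps` refinement steps."""
    return x * refine_rsqrt(x, rough_rsqrt(x), steps)

def rel_error(x, steps):
    """Relative error against the double-precision library sqrt."""
    ref = math.sqrt(x)
    return abs(sqrt_via_rsqrt(x, steps) - ref) / ref

if __name__ == "__main__":
    for steps in range(3):
        print(steps, sqrt_via_rsqrt(5.0, steps), rel_error(5.0, steps))
```

Each step roughly squares the relative error, so a 12-bit start should need about two steps to get close to real8 precision, while 2-3 decimals are already there with zero or one step. With x = 5 and no refinement this stand-in gives 2.236328125, the same value as the SqrtSSE line in jj2007's output further down, although that says nothing certain about how SqrtSSE is implemented.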
Title: Re: Fastest sqrt?
Post by: jj2007 on December 04, 2023, 09:21:50 PM
Quote from: daydreamer on December 04, 2023, 06:39:06 PM
Fsqrt with real4,real8,REAL10

fsqrt in REAL10 precision costs only 5-6 cycles (https://masm32.com/board/index.php?topic=8744.msg95836#msg95836).

Clever tricks make sense for slow complex functions like the Bessel function (https://masm32.com/board/index.php?topic=8779.0).
Title: Re: Fastest sqrt?
Post by: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation https://stackoverflow.com/questions/11644441/fast-inverse-square-root-on-x64
Title: Re: Fastest sqrt?
Post by: jj2007 on December 05, 2023, 01:32:25 AM
Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation

I love it when C++ fans try to be fast :biggrin:
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
67 µs for initialising FastLog10

1927    cycles for 100 * Log10 (Guga)
4961    cycles for 100 * MasmBasic Log10
1246    cycles for 100 * FastMath Log10
541    cycles for 100 * fsqrt (good ol' FPU)
1246    cycles for 100 * FastSqrt
645    cycles for 100 * SqrtSSE

1948    cycles for 100 * Log10 (Guga)
4917    cycles for 100 * MasmBasic Log10
1242    cycles for 100 * FastMath Log10
538    cycles for 100 * fsqrt (good ol' FPU)
1242    cycles for 100 * FastSqrt
618    cycles for 100 * SqrtSSE

1944    cycles for 100 * Log10 (Guga)
4915    cycles for 100 * MasmBasic Log10
1242    cycles for 100 * FastMath Log10
575    cycles for 100 * fsqrt (good ol' FPU)
1258    cycles for 100 * FastSqrt
614    cycles for 100 * SqrtSSE

2634    cycles for 100 * Log10 (Guga)
4939    cycles for 100 * MasmBasic Log10
1270    cycles for 100 * FastMath Log10
541    cycles for 100 * fsqrt (good ol' FPU)
1242    cycles for 100 * FastSqrt
616    cycles for 100 * SqrtSSE

1883    cycles for 100 * Log10 (Guga)
4929    cycles for 100 * MasmBasic Log10
1269    cycles for 100 * FastMath Log10
548    cycles for 100 * fsqrt (good ol' FPU)
1242    cycles for 100 * FastSqrt
610    cycles for 100 * SqrtSSE

516    bytes for Log10 (Guga)
16      bytes for MasmBasic Log10
67      bytes for FastMath Log10
14      bytes for fsqrt (good ol' FPU)
67      bytes for FastSqrt
30      bytes for SqrtSSE

Real8  0.6989700043360187465
Real8  0.6989700043360188575
Real8  0.6989700043360188575
Real8  2.236067977499789805
Real8  2.236067977499789805
Real8  2.236328125000000000

Note the reduced precision for SqrtSSE: 2.236328125000000000
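A quick way to put a number on that reduced precision is to compare the SqrtSSE value with a double-precision reference. This is a small Python sketch, with the value 2.236328125 copied from the output above and a hypothetical helper name.

```python
import math

def precision_of(approx, exact):
    """Return (correct bits, correct decimal digits) implied by the relative error."""
    rel_err = abs(approx - exact) / exact
    return -math.log2(rel_err), -math.log10(rel_err)

if __name__ == "__main__":
    # 2.236328125 is the SqrtSSE result shown in the output above.
    bits, digits = precision_of(2.236328125, math.sqrt(5.0))
    print(f"about {bits:.1f} bits, about {digits:.1f} decimal digits")
```

That works out to roughly 13 bits, or just under 4 decimal digits, for this one input. Other inputs may land a little higher or lower.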
Title: Re: Fastest sqrt?
Post by: TimoVJL on December 05, 2023, 01:56:54 AM
Quote from: jj2007 on December 05, 2023, 01:32:25 AM
Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation

I love it when C++ fans try to be fast :biggrin:
Also when we need 32/64-bit and ARM versions  :biggrin:

Fastest squirts are also interesting  :biggrin:
Title: Re: Fastest sqrt?
Post by: HSE on December 05, 2023, 03:29:03 AM
Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation https://stackoverflow.com/questions/11644441/fast-inverse-square-root-on-x64

Not so exact, I'm afraid:
sqrt fast?

aprox    5.635775715649263800000e-001

FPU      5.638032088398418100000e-001

Press any key to continue...
Title: Re: Fastest sqrt?
Post by: HSE on December 05, 2023, 03:41:24 AM
 :biggrin:  More precision costs more cycles (and is perhaps no longer faster)

inv sqrt fast?

aprox(4)  5.638032088398418100000e-001

FPU       5.638032088398418100000e-001

Press any key to continue...
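For anyone who wants to reproduce the trade-off, here is a rough Python sketch of the well-known bit-trick inverse square root with a selectable number of Newton-Raphson steps. It is not the code behind the output above: the function name is made up, and reading "aprox(4)" as four refinement steps is only an assumption.

```python
import math
import struct

def fast_rsqrt(x, steps=1):
    """Bit-trick estimate of 1/sqrt(x) for x > 0, then `steps` Newton-Raphson steps.
    The initial guess works on the float32 bit pattern of x; the refinement
    runs in double precision here, purely for illustration."""
    i = struct.unpack("<I", struct.pack("<f", x))[0]    # float32 bits as unsigned int
    i = 0x5F3759DF - (i >> 1)                           # magic-constant initial guess
    y = struct.unpack("<f", struct.pack("<I", i))[0]    # back to a float
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)                 # one Newton-Raphson step
    return y

def rsqrt_error(x, steps):
    """Relative error of fast_rsqrt: y * sqrt(x) should be exactly 1."""
    return abs(fast_rsqrt(x, steps) * math.sqrt(x) - 1.0)

if __name__ == "__main__":
    for steps in range(5):
        print(steps, fast_rsqrt(3.0, steps), rsqrt_error(3.0, steps))
```

The error shrinks from a few percent with no refinement to the limit of double precision after about four steps, and every step adds multiplies that eat into whatever speed advantage the trick had.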
Title: Re: Fastest sqrt?
Post by: daydreamer on December 07, 2023, 05:03:48 AM
Thanks Jochen, Hector, Jack  :thumbsup:
Hector, for the case where the user enters the number of decimals, as in Raymond's 10000 decimals: if the user wants only 1-3 decimals, real8 with an approximation is enough.
For 10000 decimals, how many repeated real8 calculations are needed?
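A rough way to estimate the count, as a Python sketch with the standard `decimal` module: seed Newton-Raphson with a real8 square root and let each step double the number of correct digits. The assumptions are that the real8 seed is good for about 15 digits, that 10 guard digits are enough, and that the refinement steps themselves run in arbitrary-precision arithmetic, since real8 cannot hold more than about 16 digits. The function name is hypothetical.

```python
import math
from decimal import Decimal, getcontext

def sqrt_newton(n, decimals):
    """Square root of a positive integer n to about `decimals` places.
    Returns (value, iterations). Note: changes the global decimal precision."""
    getcontext().prec = decimals + 10          # working precision with guard digits
    target = Decimal(n)
    x = Decimal(math.sqrt(n))                  # real8 seed, assumed good for 15 digits
    correct = 15
    iterations = 0
    while correct < getcontext().prec:
        x = (x + target / x) / 2               # Newton-Raphson step for x*x = n
        correct *= 2                           # each step roughly doubles the digits
        iterations += 1
    return x, iterations

if __name__ == "__main__":
    for d in (3, 1000, 10000):
        _, its = sqrt_newton(2, d)
        print(d, "decimals:", its, "Newton-Raphson steps")
```

Under these assumptions, 1-3 decimals need no refinement at all, and 10000 decimals need about 10 steps, each of them a full-width big-number division.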
 
Title: Re: Fastest sqrt?
Post by: HSE on December 07, 2023, 11:23:13 AM
If the user decides the number of decimals, that is arbitrary precision  :thumbsup: