Fastest sqrt?

Started by daydreamer, December 04, 2023, 06:39:06 PM


daydreamer

Fastest sqrt?
fsqrt with REAL4, REAL8, REAL10
SSE sqrtss
SSE2 sqrtsd
rsqrtss (reciprocal square root) followed by rcpss?
Sqrt LUT: results 0-255 for WORD inputs 0-65535
https://stackoverflow.com/questions/7724061/how-slow-how-many-cycles-is-calculating-a-square-root
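A minimal C sketch of the candidates above, using the SSE/SSE2 intrinsics that map to sqrtss, sqrtsd, rsqrtss and rcpss (assumes an x86 compiler that provides <emmintrin.h>). The 64 KB byte table is one reading of the LUT idea: every WORD input 0-65535 maps to its integer root 0-255. All function names are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* sqrtss: correctly rounded single-precision square root */
static float sqrt_ss(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* sqrtsd: correctly rounded double-precision square root (SSE2) */
static double sqrt_sd(double x) {
    __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v));  /* sqrt of the low lane of the 2nd operand */
}

/* rsqrtss then rcpss: two low-precision estimates chained, no refinement */
static float sqrt_rsqrt_rcp(float x) {
    assert(x >= 0.0f);  /* rsqrtss(0) = +inf, rcpss(+inf) = 0, so 0 still works */
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_rsqrt_ss(_mm_set_ss(x))));
}

/* LUT: floor(sqrt(n)) for every 16-bit n; all results fit in 0..255 */
static unsigned char isqrt_lut[65536];

static void isqrt_lut_init(void) {
    unsigned r = 0;
    for (unsigned n = 0; n < 65536u; n++) {
        while ((r + 1) * (r + 1) <= n) r++;  /* keep r = largest root with r*r <= n */
        isqrt_lut[n] = (unsigned char)r;
    }
}

static unsigned isqrt_word(unsigned short n) {
    return isqrt_lut[n];
}
```

Call isqrt_lut_init() once; after that isqrt_word(144) is a single table load returning 12. sqrtss and sqrtsd are correctly rounded, while the rsqrtss/rcpss chain trades precision for speed: Intel only specifies each of those two instructions to a relative error of at most 1.5*2^-12.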
Is the SSE reciprocal square root precise enough to calculate per BCD digit?
It is probably good enough for a student task like calculating Pythagoras with only 2-3 decimals of output.
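For the Pythagoras case, a hedged sketch (hypothetical helper hypot_rsqrt, x86 SSE assumed) that gets sqrt(s) as s * rsqrtss(s), optionally followed by one Newton-Raphson step. Intel specifies the rsqrtss relative error as at most 1.5*2^-12 (about 3.7e-4), so the raw estimate can already be off by roughly 0.002 for a hypotenuse near 5; the refinement step is cheap insurance when the third decimal matters.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: _mm_rsqrt_ss */

/* Hypotenuse via rsqrtss: sqrt(s) = s * (1/sqrt(s)).
   refine == 0: raw hardware estimate only
   refine != 0: plus one Newton-Raphson step done in float */
static float hypot_rsqrt(float a, float b, int refine) {
    float s = a * a + b * b;
    assert(s > 0.0f);  /* s == 0 would give 0 * inf = NaN */
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(s)));
    if (refine)
        y = y * (1.5f - 0.5f * s * y * y);  /* y <- y * (3 - s*y*y) / 2 */
    return s * y;
}
```

For a 3-4-5 triangle, printf("%.3f\n", hypot_rsqrt(3.0f, 4.0f, 1)) is the intended use. The zero case has to be handled separately because rsqrtss(0) returns +inf.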

Is the SSE reciprocal square root refined with more Newton-Raphson iterations better than REAL8 with fewer Newton-Raphson iterations?
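On the Newton-Raphson question, a sketch (hypothetical name rsqrt_nr, SSE assumed) that takes the rsqrtss estimate as a seed and runs the iterations in double. Each step maps a relative error e to about -1.5*e^2, so the seed (|e| below about 3.7e-4) becomes roughly 22 correct bits after one step and roughly 44 after two; a third step reaches REAL8 rounding level.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: _mm_rsqrt_ss */

/* 1/sqrt(x): rsqrtss seed, then `iters` Newton-Raphson steps in double (REAL8).
   Assumes x is positive and within float range, since the seed is computed in float. */
static double rsqrt_nr(double x, int iters) {
    assert(x > 0.0 && iters >= 0);
    double y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss((float)x)));  /* low-precision seed */
    for (int i = 0; i < iters; i++)
        y = y * (1.5 - 0.5 * x * y * y);  /* relative error e -> about -1.5 * e^2 */
    return y;
}
```

So with an rsqrtss seed, three REAL8 iterations are about as far as it makes sense to go; after that, double rounding rather than the seed limits the result.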
My non-asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

Quote from: daydreamer on December 04, 2023, 06:39:06 PM
Fsqrt with real4,real8,REAL10

fsqrt in REAL10 precision costs only 5-6 cycles.

Clever tricks make sense for slow complex functions like the Bessel function.

jack

You might want to play with the inverse square root approximation: https://stackoverflow.com/questions/11644441/fast-inverse-square-root-on-x64
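The approximation behind that link is the classic magic-constant trick. A C sketch (hypothetical name fast_rsqrt) that uses memcpy for the float/integer reinterpretation so it does not break C's strict aliasing rule, with one Newton-Raphson step. The commonly quoted maximum relative error for this constant after one step is about 0.175%.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic-constant inverse square root (the well-known Quake III style trick) */
static float fast_rsqrt(float x) {
    assert(x > 0.0f);
    uint32_t i;
    memcpy(&i, &x, sizeof i);          /* reinterpret the float bits */
    i = 0x5f3759dfu - (i >> 1);        /* rough guess from exponent/mantissa bits */
    float y;
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson refinement */
}
```

For example, fast_rsqrt(4.0f) comes out at about 0.499 instead of 0.5; after the Newton-Raphson step the estimate sits slightly below the true value.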

jj2007

Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation

I love it when C++ fans try to be fast :biggrin:
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
67 µs for initialising FastLog10

1927    cycles for 100 * Log10 (Guga)
4961    cycles for 100 * MasmBasic Log10
1246    cycles for 100 * FastMath Log10
541    cycles for 100 * fsqrt (good ol' FPU)
1246    cycles for 100 * FastSqrt
645    cycles for 100 * SqrtSSE

1948    cycles for 100 * Log10 (Guga)
4917    cycles for 100 * MasmBasic Log10
1242    cycles for 100 * FastMath Log10
538    cycles for 100 * fsqrt (good ol' FPU)
1242    cycles for 100 * FastSqrt
618    cycles for 100 * SqrtSSE

1944    cycles for 100 * Log10 (Guga)
4915    cycles for 100 * MasmBasic Log10
1242    cycles for 100 * FastMath Log10
575    cycles for 100 * fsqrt (good ol' FPU)
1258    cycles for 100 * FastSqrt
614    cycles for 100 * SqrtSSE

2634    cycles for 100 * Log10 (Guga)
4939    cycles for 100 * MasmBasic Log10
1270    cycles for 100 * FastMath Log10
541    cycles for 100 * fsqrt (good ol' FPU)
1242    cycles for 100 * FastSqrt
616    cycles for 100 * SqrtSSE

1883    cycles for 100 * Log10 (Guga)
4929    cycles for 100 * MasmBasic Log10
1269    cycles for 100 * FastMath Log10
548    cycles for 100 * fsqrt (good ol' FPU)
1242    cycles for 100 * FastSqrt
610    cycles for 100 * SqrtSSE

516    bytes for Log10 (Guga)
16      bytes for MasmBasic Log10
67      bytes for FastMath Log10
14      bytes for fsqrt (good ol' FPU)
67      bytes for FastSqrt
30      bytes for SqrtSSE

Real8  0.6989700043360187465
Real8  0.6989700043360188575
Real8  0.6989700043360188575
Real8  2.236067977499789805
Real8  2.236067977499789805
Real8  2.236328125000000000

Note the reduced precision for SqrtSSE: 2.236328125000000000

TimoVJL

Quote from: jj2007 on December 05, 2023, 01:32:25 AM
Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation

I love it when C++ fans try to be fast :biggrin:
Also when we need 32/64-bit and ARM versions  :biggrin:

Fastest squirts are also interesting  :biggrin:
May the source be with you
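On the 32/64-bit and ARM point, a hedged sketch (hypothetical name rsqrt_portable) that picks the hardware reciprocal square root estimate per target: rsqrtss via <xmmintrin.h> on SSE targets, vrsqrteq_f32/vrsqrtsq_f32 via <arm_neon.h> on NEON targets, each followed by one Newton-Raphson step, with a plain 1.0f/sqrtf fallback elsewhere. The preprocessor checks are assumptions about typical compiler macros; a 32-bit MSVC build, for example, would take the fallback path.

```c
#include <assert.h>
#include <math.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define RSQRT_SSE 1
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#define RSQRT_NEON 1
#endif

/* 1/sqrt(x): hardware estimate plus one Newton-Raphson step where available */
static float rsqrt_portable(float x) {
    assert(x > 0.0f);
#if defined(RSQRT_SSE)
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));   /* ~12-bit estimate */
    return y * (1.5f - 0.5f * x * y * y);                    /* one Newton-Raphson step */
#elif defined(RSQRT_NEON)
    float32x4_t vx = vdupq_n_f32(x);
    float32x4_t y = vrsqrteq_f32(vx);                         /* ~8-bit estimate */
    y = vmulq_f32(y, vrsqrtsq_f32(vmulq_f32(vx, y), y));      /* y * (3 - x*y*y) / 2 */
    return vgetq_lane_f32(y, 0);
#else
    return 1.0f / sqrtf(x);                                   /* portable fallback */
#endif
}
```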

HSE

Quote from: jack on December 05, 2023, 12:25:22 AM
you might want to play with inverse square root approximation https://stackoverflow.com/questions/11644441/fast-inverse-square-root-on-x64

Not so exact, I'm afraid:
sqrt fast?

aprox    5.635775715649263800000e-001

FPU      5.638032088398418100000e-001

Press any key to continue...
Equations in Assembly: SmplMath

HSE

 :biggrin:  For more precision, more cycles (and perhaps no longer faster)

inv sqrt fast?

aprox(4)  5.638032088398418100000e-001

FPU       5.638032088398418100000e-001

Press any key to continue...
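aprox(4) suggests repeated Newton-Raphson steps. A double-precision sketch of that idea (hypothetical name fast_rsqrt_d, not HSE's SmplMath code), seeded with a commonly cited 64-bit magic constant, 0x5FE6EB50C7B537A9. The seed is off by a few percent; each step roughly squares the relative error, so about four steps reach REAL8 level, consistent with the aprox(4) output above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic-constant 1/sqrt(x) in double, refined by `iters` Newton-Raphson steps.
   More steps: more precision, more cycles. */
static double fast_rsqrt_d(double x, int iters) {
    assert(x > 0.0 && iters >= 0);
    uint64_t i;
    memcpy(&i, &x, sizeof i);                 /* reinterpret the double bits */
    i = 0x5FE6EB50C7B537A9ull - (i >> 1);     /* seed, a few percent off */
    double y;
    memcpy(&y, &i, sizeof y);
    for (int k = 0; k < iters; k++)
        y = y * (1.5 - 0.5 * x * y * y);      /* relative error e -> about -1.5 * e^2 */
    return y;
}
```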

daydreamer

Thanks Jochen, Hector, Jack  :thumbsup:
Hector, for letting the user enter the number of decimals (like in Raymond's 10000 decimals): if the user wants only 1-3 decimals and REAL8, use the approximation.
For 10000 decimals, how many repeated REAL8 calculations are needed?
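A back-of-the-envelope answer, assuming Newton's method (which roughly doubles the number of correct digits per step) seeded from a REAL8 result with about 15 correct digits. A hypothetical helper that counts the steps:

```c
#include <assert.h>

/* Newton steps needed to grow `seed_digits` correct digits to `target_digits`,
   assuming quadratic convergence (digits double per step). */
static int newton_steps(int seed_digits, int target_digits) {
    assert(seed_digits > 0);
    int steps = 0;
    for (long d = seed_digits; d < target_digits; d *= 2)
        steps++;
    return steps;
}
```

newton_steps(15, 10000) gives 10 (15, 30, 60, ..., 7680, 15360 digits). Each of those steps has to run in an arbitrary-precision format, because REAL8 only holds about 15-16 significant decimal digits.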
 

HSE

If the user decides the number of decimals, that is arbitrary precision  :thumbsup:
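As a simple (not fast) example of user-chosen decimals in arbitrary precision, a C sketch of the schoolbook digit-by-digit square root on decimal big integers. Everything here (sqrt_decimals, the big type, MAXD) is hypothetical; the result is truncated rather than rounded, and the cost grows roughly quadratically with the number of decimals.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXD 4096  /* capacity in decimal digits (arbitrary choice) */

/* Non-negative big integer, decimal digits stored least significant first */
typedef struct { int len; unsigned char d[MAXD]; } big;

static void big_zero(big *a) { a->len = 1; a->d[0] = 0; }

static void big_trim(big *a) {
    while (a->len > 1 && a->d[a->len - 1] == 0) a->len--;
}

/* a = a * m + add, for small m and add */
static void big_muladd(big *a, unsigned m, unsigned add) {
    unsigned carry = add;
    for (int i = 0; i < a->len; i++) {
        unsigned t = a->d[i] * m + carry;
        a->d[i] = (unsigned char)(t % 10);
        carry = t / 10;
    }
    while (carry) { a->d[a->len++] = (unsigned char)(carry % 10); carry /= 10; }
    big_trim(a);
}

static int big_cmp(const big *a, const big *b) {
    if (a->len != b->len) return a->len < b->len ? -1 : 1;
    for (int i = a->len - 1; i >= 0; i--)
        if (a->d[i] != b->d[i]) return a->d[i] < b->d[i] ? -1 : 1;
    return 0;
}

/* a -= b, requires a >= b */
static void big_sub(big *a, const big *b) {
    int borrow = 0;
    for (int i = 0; i < a->len; i++) {
        int t = a->d[i] - borrow - (i < b->len ? b->d[i] : 0);
        borrow = t < 0;
        a->d[i] = (unsigned char)(borrow ? t + 10 : t);
    }
    big_trim(a);
}

/* Writes sqrt(n) truncated to `decimals` decimals into out, e.g. "2.2360679774".
   Returns 0 on success, -1 if the request does not fit. */
static int sqrt_decimals(unsigned n, int decimals, char *out, size_t outsize) {
    char digits[16];
    int nd = snprintf(digits, sizeof digits, "%u", n);
    int int_groups = (nd + 1) / 2;         /* one root digit per pair of input digits */
    int groups = int_groups + decimals;
    if (decimals < 0 || groups + 8 > MAXD || (size_t)groups + 2 > outsize) return -1;

    big r, p, t;                           /* remainder, root so far, trial product */
    big_zero(&r);
    big_zero(&p);
    int pos = 0;
    char *o = out;
    for (int g = 0; g < groups; g++) {
        unsigned pair = 0;                 /* past the integer part: pairs of zeros */
        if (g < int_groups) {
            int take = (g == 0 && nd % 2) ? 1 : 2;  /* odd length: first group is one digit */
            while (take--) pair = pair * 10 + (unsigned)(digits[pos++] - '0');
        }
        big_muladd(&r, 100, pair);         /* bring down the next pair */
        int x = 9;                         /* largest x with (20*p + x) * x <= r */
        for (;; x--) {
            t.len = p.len;
            memcpy(t.d, p.d, (size_t)p.len);
            big_muladd(&t, 20, (unsigned)x);
            big_muladd(&t, (unsigned)x, 0);
            if (big_cmp(&t, &r) <= 0) break;  /* x == 0 always fits */
        }
        big_sub(&r, &t);
        big_muladd(&p, 10, (unsigned)x);
        if (g == int_groups) *o++ = '.';
        *o++ = (char)('0' + x);
    }
    *o = '\0';
    return 0;
}
```

For example, sqrt_decimals(5, 10, buf, sizeof buf) writes "2.2360679774". For 10000 decimals MAXD has to be raised above 10000, and this method gets slow; Newton iterations on big numbers with fast multiplication scale better.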