Timings for a million iterations, trusty old CRT vs GNU Scientific Library (http://www.gnu.org/software/gsl/manual/gsl-ref.html#An-Example-Program):
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
J0 =-0.177596771314338304... Wolfram Alpha
J0 =-0.17759677131433826 crt
J0 =-0.17759677131433829 Gsl
Y0 =-0.308517625249033780... Wolfram Alpha
Y0 =-0.30851762524903474 crt
Y0 =-0.30851762524903376 Gsl
Result: -0.17759677131433826 53 ms for CRT J0
Result: -0.17759677131433829 519 ms for GSL
Result: -0.30851762524903474 115 ms for CRT Y0
Result: -0.30851762524903376 514 ms for GSL
I've tried to add an example from Intel's Math Kernel Library - it's a free 613MB download.
Technically speaking, the MKL is accessible from assembler (see attachment), but I couldn't find an example for the simple Bessel functions.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Intel(R) Math Kernel Library Version 2017.0.0 Product Build 20160801 for 32-bit applications
CPU current: 2.89 GHz
CPU max: 2.50 GHz
CPU clocks: 2.49 GHz
Found another test case for the MKL - square root:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Intel(R) Math Kernel Library Version 2017.0.0 Product Build 20160801 for 32-bit applications
SQRT: 10 elements, 1000000 loops
70 ms for FPU, fsqrt
110 ms for MKL, vdSqrt
SQRT: 1000 elements, 1000000 loops
7815 ms for FPU, fsqrt
3573 ms for MKL, vdSqrt
sq =1.000000000000000000 MKL square root
sq =1.414213562373095145 MKL square root
sq =1.732050807568877193 MKL square root
sq =2.000000000000000000 MKL square root
sq =2.236067977499789805 MKL square root
This loads a double array with 0, 1, 2 etc. and then calculates the square root of each element. For a small number of elements (<=20), the FPU is faster, but for high element counts the MKL is about twice as fast (3573 ms vs 7815 ms at 1000 elements).
Source and exe attached, but the MKL is required; the exe will invite you to visit Intel (https://registrationcenter.intel.com/en/forms/?productid=2558&licensetype=2) if it can't find it. The download is free but registration is required.
(the *.asc source is in RTF format, opens in Wordpad but much better in RichMasm (http://masm32.com/board/index.php?topic=5314.0))
Another test, this time with the Intel compiler's libmmd.dll, which I just discovered on my Win7-64 machine:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
Init
Dll "%CommonProgramFiles%\Intel\Shared Libraries\redist\ia32\compiler\libmmd.dll"
Declare double j0, C:1
PrintLine "Bessel J0(5)=-0.17759677131433830435 (Wolfram Alpha)" ; site (http://functions.wolfram.com/webMathematica/FunctionEvaluation.jsp?name=BesselJ)
Print Str$("Bessel J0(5)=%Jf (Intel)\n", j0(5.0)#)
invoke crt__j0, FP8(5.0)
Print Str$("Bessel J0(5)=%Jf (CRT)\n\n", ST(0)#)
PrintLine "Bessel J0(7)= 0.30007927051955559665 (Wolfram Alpha)"
Print Str$("Bessel J0(7)= %Jf (Intel)\n", j0(7.0)#)
invoke crt__j0, FP8(7.0)
Inkey Str$("Bessel J0(7)= %Jf (CRT)", ST(0)#)
EndOfCode
The output shows that the Intel compiler returns REAL10 precision:
Bessel J0(5)=-0.17759677131433830435 (Wolfram Alpha)
Bessel J0(5)=-0.1775967713143383044 (Intel)
Bessel J0(5)=-0.1775967713143382590 (CRT)
Bessel J0(7)= 0.30007927051955559665 (Wolfram Alpha)
Bessel J0(7)= 0.3000792705195555967 (Intel)
Bessel J0(7)= 0.3000792705195550060 (CRT)
Does everybody have the file C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\ia32\compiler\libmmd.dll ?
These DLLs are available here (//http://), but I don't remember installing them knowingly; perhaps some other package did it.