The MASM Forum

General => The Laboratory => Topic started by: jj2007 on October 18, 2016, 05:06:20 AM

Title: CRT vs GSL
Post by: jj2007 on October 18, 2016, 05:06:20 AM
Timings for a million iterations of the Bessel functions J0 and Y0, trusty old CRT vs GNU Scientific Library (http://www.gnu.org/software/gsl/manual/gsl-ref.html#An-Example-Program):

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
J0 =-0.177596771314338304...    Wolfram Alpha
J0 =-0.17759677131433826        crt
J0 =-0.17759677131433829        Gsl

Y0 =-0.308517625249033780...    Wolfram Alpha
Y0 =-0.30851762524903474        crt
Y0 =-0.30851762524903376        Gsl

Result: -0.17759677131433826    53 ms for CRT J0
Result: -0.17759677131433829    519 ms for GSL

Result: -0.30851762524903474    115 ms for CRT Y0
Result: -0.30851762524903376    514 ms for GSL
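
To put the numbers above side by side, here is a minimal sketch (Python, not part of the original test program) that computes the relative error of each printed result against the Wolfram Alpha value, plus the GSL/CRT time ratio. The names REFERENCE, RESULTS, relative_error and slowdown are made up for this illustration, and the reference values are used exactly as truncated in the listing, so the last digit or so of each error is approximate.

```python
from decimal import Decimal, getcontext

getcontext().prec = 40  # far more digits than the 18 printed above

# Reference values and measurements copied from the output above.
# The Wolfram Alpha values are truncated ("..."), so errors are approximate.
REFERENCE = {
    "J0": Decimal("-0.177596771314338304"),
    "Y0": Decimal("-0.308517625249033780"),
}
RESULTS = [
    # (function, library, printed result, milliseconds for a million calls)
    ("J0", "CRT", Decimal("-0.17759677131433826"), 53),
    ("J0", "GSL", Decimal("-0.17759677131433829"), 519),
    ("Y0", "CRT", Decimal("-0.30851762524903474"), 115),
    ("Y0", "GSL", Decimal("-0.30851762524903376"), 514),
]


def relative_error(value, reference):
    """Relative deviation of a printed result from the reference value."""
    return abs((value - reference) / reference)


def slowdown(func):
    """How many times longer GSL took than the CRT for one function."""
    ms = {lib: t for f, lib, _, t in RESULTS if f == func}
    return ms["GSL"] / ms["CRT"]


for func, lib, value, ms in RESULTS:
    err = relative_error(value, REFERENCE[func])
    print(f"{func} {lib}: {ms:4d} ms, relative error {float(err):.1e}")

for func in ("J0", "Y0"):
    print(f"{func}: GSL takes {slowdown(func):.1f}x as long as the CRT")
```

On these figures GSL takes roughly ten times as long as the CRT for J0 and about four and a half times as long for Y0, while its Y0 result sits much closer to the reference than the CRT's.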
Title: Re: CRT vs GSL
Post by: jj2007 on October 18, 2016, 11:31:58 AM
I've tried to add an example from Intel's Math Kernel Library - it's a free 613MB download.

Technically speaking, the MKL is accessible from assembler, see attachment, but I couldn't find an example for the simple Bessel functions ::)

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Intel(R) Math Kernel Library Version 2017.0.0 Product Build 20160801 for 32-bit applications
CPU current:    2.89 GHz
CPU max:        2.50 GHz
CPU clocks:     2.49 GHz
Title: Re: CRT vs GSL
Post by: jj2007 on October 19, 2016, 12:29:35 AM
Found another test case for the MKL - square root:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Intel(R) Math Kernel Library Version 2017.0.0 Product Build 20160801 for 32-bit applications

SQRT: 10 elements, 1000000 loops
70 ms for FPU, fsqrt
110 ms for MKL, vdSqrt

SQRT: 1000 elements, 1000000 loops
7815 ms for FPU, fsqrt
3573 ms for MKL, vdSqrt

sq =1.000000000000000000        MKL square root
sq =1.414213562373095145        MKL square root
sq =1.732050807568877193        MKL square root
sq =2.000000000000000000        MKL square root
sq =2.236067977499789805        MKL square root


This loads a double array with 0, 1, 2 etc. and then calculates the square root of each element. For a small number of elements (<=20), the FPU is faster, but for high element counts, the MKL is about twice as fast (3573 ms vs 7815 ms for 1000 elements).
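
The timings above are easier to compare as cost per square root. A small sketch (Python, not taken from the attached source; RUNS and ns_per_element are names invented here) that only redoes the arithmetic on the posted figures:

```python
# Timings copied from the output above: (elements, loops, FPU ms, MKL ms).
RUNS = [
    (10, 1_000_000, 70, 110),
    (1000, 1_000_000, 7815, 3573),
]


def ns_per_element(ms, elements, loops):
    """Average nanoseconds spent on one square root."""
    return ms * 1_000_000 / (elements * loops)


for elements, loops, fpu_ms, mkl_ms in RUNS:
    fpu = ns_per_element(fpu_ms, elements, loops)
    mkl = ns_per_element(mkl_ms, elements, loops)
    print(f"{elements:5d} elements: FPU {fpu:.2f} ns, MKL {mkl:.2f} ns, "
          f"FPU/MKL = {fpu / mkl:.2f}")
```

The fsqrt loop stays between 7 and 8 ns per element in both runs, while vdSqrt drops from 11 ns to about 3.6 ns. That would be consistent with a fixed per-call overhead that only pays off on longer arrays, though the two data points alone do not prove it.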

Source and exe attached, but the MKL is required; the exe will invite you to visit Intel (https://registrationcenter.intel.com/en/forms/?productid=2558&licensetype=2) if it can't find it. The download is free but registration is required.

(the *.asc source is in RTF format; it opens in WordPad, but much better in RichMasm (http://masm32.com/board/index.php?topic=5314.0))
Title: Re: CRT vs GSL
Post by: jj2007 on October 20, 2016, 09:31:33 AM
Another test, this time with the Intel compiler's libmmd.dll, which I just discovered on my Win7-64 machine:

include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)
  Init

  Dll "%CommonProgramFiles%\Intel\Shared Libraries\redist\ia32\compiler\libmmd.dll"
  Declare double j0, C:1

  PrintLine "Bessel J0(5)=-0.17759677131433830435      (Wolfram Alpha)"      ; site (http://functions.wolfram.com/webMathematica/FunctionEvaluation.jsp?name=BesselJ)
  Print Str$("Bessel J0(5)=%Jf      (Intel)\n", j0(5.0)#)
  invoke crt__j0, FP8(5.0)
  Print Str$("Bessel J0(5)=%Jf      (CRT)\n\n", ST(0)#)

  PrintLine "Bessel J0(7)= 0.30007927051955559665      (Wolfram Alpha)"
  Print Str$("Bessel J0(7)= %Jf      (Intel)\n", j0(7.0)#)
  invoke crt__j0, FP8(7.0)
  Inkey Str$("Bessel J0(7)= %Jf      (CRT)", ST(0)#)
EndOfCode


The output shows that the Intel compiler returns REAL10 precision:

Bessel J0(5)=-0.17759677131433830435    (Wolfram Alpha)
Bessel J0(5)=-0.1775967713143383044     (Intel)
Bessel J0(5)=-0.1775967713143382590     (CRT)

Bessel J0(7)= 0.30007927051955559665    (Wolfram Alpha)
Bessel J0(7)= 0.3000792705195555967     (Intel)
Bessel J0(7)= 0.3000792705195550060     (CRT)
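
One way to check the REAL10 reading of this output is to express each error in units of the spacing between neighbouring doubles (ulp). The sketch below is Python 3.9+ (for math.ulp), not MasmBasic; ROWS and error_in_double_ulps are names invented for this illustration, and the digits are copied from the output above.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough to subtract 20-digit values exactly

ROWS = [
    # (label, printed result, Wolfram Alpha reference)
    ("J0(5) Intel", "-0.1775967713143383044", "-0.17759677131433830435"),
    ("J0(5) CRT",   "-0.1775967713143382590", "-0.17759677131433830435"),
    ("J0(7) Intel", "0.3000792705195555967",  "0.30007927051955559665"),
    ("J0(7) CRT",   "0.3000792705195550060",  "0.30007927051955559665"),
]


def error_in_double_ulps(printed, reference):
    """Absolute error divided by the spacing of doubles near the reference."""
    ref = Decimal(reference)
    # math.ulp returns a power of two, so the Decimal conversion is exact.
    ulp = Decimal(math.ulp(float(ref)))
    return abs(Decimal(printed) - ref) / ulp


for label, printed, reference in ROWS:
    ulps = float(error_in_double_ulps(printed, reference))
    print(f"{label}: {ulps:.4f} double ulp")
```

A correctly rounded double can be off by up to half an ulp. Both Intel results land within a few thousandths of an ulp of the reference, which fits an 80-bit return value, while the CRT results are off by more than one ulp for J0(5) and more than ten for J0(7).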


Does everybody have the file C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\ia32\compiler\libmmd.dll?
These DLLs are available here (//http://), but I can't remember installing them knowingly; perhaps some other package did it ::)