LOG timings

Started by jj2007, January 31, 2013, 07:34:54 AM

jj2007

I am tinkering with Pelles C and stumbled over a hilariously complicated implementation of log(x). The timings reflect that :biggrin:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 189/100 cycles

6357    cycles for 100 * log Asm
26583   cycles for 100 * log CRT
21191   cycles for 100 * logC Math.h

6343    cycles for 100 * log Asm
26624   cycles for 100 * log CRT
21163   cycles for 100 * logC Math.h

6351    cycles for 100 * log Asm
26614   cycles for 100 * log CRT
21157   cycles for 100 * logC Math.h

13      bytes for log Asm
24      bytes for log CRT
23      bytes for logC Math.h

R8=-5.54677872584653642 for t log Asm
R8=-5.54677872584653642 for t log CRT
R8=-5.54677872584653642 for t logC Math.h


P.S.: Build requires linking LogC.obj (included) and \Masm32\PellesC\Lib\crt.lib (not included)

Gunther

Hi Jochen,

here are my results.


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
++++++++++++++++++2 of 20 tests valid, loop overhead is approx. 191/100 cycles

5012 cycles for 100 * log Asm
16923 cycles for 100 * log Asm with checks
21842 cycles for 100 * log CRT
22318 cycles for 100 * logC Math.h

11258 cycles for 100 * log Asm
18792 cycles for 100 * log Asm with checks
15586 cycles for 100 * log CRT
19173 cycles for 100 * logC Math.h

10486 cycles for 100 * log Asm
16812 cycles for 100 * log Asm with checks
21828 cycles for 100 * log CRT
22309 cycles for 100 * logC Math.h

13 bytes for log Asm
19 bytes for log Asm with checks
24 bytes for log CRT
23 bytes for logC Math.h

R8=-5.54677872584653642 for t log Asm
R8=-5.54677872584653642 for t log Asm with checks
R8=-5.54677872584653642 for t log CRT
R8=-5.54677872584653642 for t logC Math.h

--- ok ---


I can't rebuild the EXE because I don't have crt.lib. But transcendental functions are often implemented awkwardly by compiler writers, and they are time-consuming, too.

Gunther
You have to know the facts before you can distort them.

dedndave

18 of 20 ? - sounds borg to me   :P

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
++18 of 20 tests valid, loop overhead is approx. 239/100 cycles

10014   cycles for 100 * log Asm
81822   cycles for 100 * log Asm with checks
84241   cycles for 100 * log CRT
47662   cycles for 100 * logC Math.h

11296   cycles for 100 * log Asm
82449   cycles for 100 * log Asm with checks
82746   cycles for 100 * log CRT
54064   cycles for 100 * logC Math.h

9925    cycles for 100 * log Asm
77921   cycles for 100 * log Asm with checks
90875   cycles for 100 * log CRT
47641   cycles for 100 * logC Math.h

13      bytes for log Asm
19      bytes for log Asm with checks
24      bytes for log CRT
23      bytes for logC Math.h

R8=-5.54677872584653642 for t log Asm
R8=-5.54677872584653642 for t log Asm with checks
R8=-5.54677872584653642 for t log CRT
R8=-5.54677872584653642 for t logC Math.h

qWord

It seems that a series for ln(x) = 2*artanh((x-1)/(x+1)) is used - really not that efficient, even if the FPU is already used...
MREAL macros - when you need floating point arithmetic while assembling!
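
For readers who want to see what the identity ln(x) = 2*artanh((x-1)/(x+1)) looks like in code, here is a minimal C sketch. It is not the Pelles C routine, which is not shown in this thread. The function name log_series, the range reduction with frexp and the fixed term count are all assumptions made for illustration. It expects a finite x > 0 and does no error checking.

```c
#include <math.h>   /* frexp */

/* Hypothetical helper, NOT the Pelles C implementation.
   Computes ln(x) for finite x > 0 via ln(x) = 2*artanh((x-1)/(x+1)). */
double log_series(double x)
{
    int e;
    double m = frexp(x, &e);            /* x = m * 2^e with m in [0.5, 1) */

    if (m < 0.70710678118654752) {      /* move m into [sqrt(0.5), sqrt(2)) */
        m *= 2.0;
        e -= 1;
    }

    double z = (m - 1.0) / (m + 1.0);   /* now |z| < 0.172, so the series converges fast */
    double z2 = z * z;
    double term = z;
    double sum = 0.0;

    /* artanh(z) = z + z^3/3 + z^5/5 + ...  (15 terms, an assumed cutoff) */
    for (int k = 1; k <= 29; k += 2) {
        sum += term / k;
        term *= z2;
    }

    /* ln(x) = ln(m) + e * ln(2) */
    return 2.0 * sum + e * 0.69314718055994531;
}
```

Without the range reduction the series converges very slowly for x far from 1, because (x-1)/(x+1) gets close to +/-1. Even with it, the loop costs a division per term, which fits the observation that a series like this is not cheap compared to letting the FPU do the work directly.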