Topic: 2^x timings

qWord

Re: 2^x timings
« Reply #15 on: April 11, 2013, 08:08:48 AM »
Quote
Looks very competitive, qWord - compliments :t
Would you mind adding it to MasmBasic, with proper acknowledgement, of course?
I'm afraid Agner Fog had found that long before me ;-D
Code:
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
loop overhead is approx. 64/20 cycles

1194    cycles for 20 * Pow2 fist/fild
1118    cycles for 20 * Pow2 fadd One
1189    cycles for 20 * Pow2 frndint
2202    cycles for 20 * PowX fist/fild
754     cycles for 20 * pow qWord

885     cycles for 20 * Pow2 fist/fild
1011    cycles for 20 * Pow2 fadd One
1240    cycles for 20 * Pow2 frndint
2100    cycles for 20 * PowX fist/fild
752     cycles for 20 * pow qWord

1085    cycles for 20 * Pow2 fist/fild
1118    cycles for 20 * Pow2 fadd One
1354    cycles for 20 * Pow2 frndint
2147    cycles for 20 * PowX fist/fild
925     cycles for 20 * pow qWord

877     cycles for 20 * Pow2 fist/fild
894     cycles for 20 * Pow2 fadd One
1208    cycles for 20 * Pow2 frndint
2103    cycles for 20 * PowX fist/fild
1006    cycles for 20 * pow qWord

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
36      bytes for Pow2 frndint
50      bytes for PowX fist/fild
96      bytes for pow qWord


--- ok ---
MREAL macros - when you need floating point arithmetic while assembling!
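A note for readers without the attachment: the routines timed above are x87 code, and all of them presumably rest on the identity 2^x = 2^n * 2^f. Here n is an integer and f = x - n is small enough for f2xm1, which only accepts |f| <= 1, and fscale then applies the 2^n. Judging by their names, the variants differ mainly in how n is obtained (a fist/fild round trip vs. frndint). That reading is an assumption and is not stated in the thread. Below is a minimal Python sketch of the identity only, with 2.0 ** f standing in for f2xm1 + 1 and math.ldexp standing in for fscale. pow2_split is a hypothetical name, and this is not the code from the thread.

```python
import math

def pow2_split(x: float) -> float:
    """2^x via 2^x = 2^n * 2^f (hypothetical helper, not the thread's routine).

    n is x rounded to the nearest integer (what frndint, or fistp + fild,
    does in the FPU's default round-to-nearest mode), so the remainder
    f = x - n lies in [-0.5, 0.5], inside the |f| <= 1 domain of f2xm1.
    """
    n = round(x)                 # integer part; halves round to even, like the FPU default
    f = x - n                    # fractional remainder in [-0.5, 0.5]
    two_f = 2.0 ** f             # stand-in for f2xm1(f) + 1
    return math.ldexp(two_f, n)  # stand-in for fscale: multiply by 2^n

for x in (-3.25, -0.5, 0.0, 0.75, 1.5, 10.1):
    print(f"2^{x:<6} = {pow2_split(x):.12g}  (2.0 ** x = {2.0 ** x:.12g})")
```

Rounding to nearest keeps f in [-0.5, 0.5]. Truncating toward zero would also stay legal for f2xm1, but f would then span (-1, 1).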

jj2007

Re: 2^x timings
« Reply #16 on: April 11, 2013, 08:16:29 AM »
Quote
I'm afraid Agner Fog had found that long before me ;-D

Just found bitRAKE's version - he quotes Agner.


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 38/20 cycles

2647    cycles for 20 * Pow2 fist/fild
1399    cycles for 20 * pow qWord/Agner

2654    cycles for 20 * Pow2 fist/fild
1399    cycles for 20 * pow qWord/Agner

2645    cycles for 20 * Pow2 fist/fild
1406    cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner

« Last Edit: April 11, 2013, 09:30:30 AM by jj2007 »
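The qWord/Agner routine is only in the attachment, so its exact code is not reproduced here. One well-known way to speed up 2^x, and plausibly (not confirmed) what it does, is to skip fscale and write n straight into the exponent field of a float, since 2^n is just a float with a zero mantissa. Here is a Python sketch under these assumptions: IEEE 754 binary64 (what a MASM REAL8 holds, exponent bias 1023), n limited to the normal exponent range, and hypothetical helper names pow2_int and pow2_bits. The x87 80-bit format uses a different bias (16383) and an explicit integer bit, so the constants would differ there.

```python
import math
import struct

EXP_BIAS = 1023   # IEEE 754 binary64 exponent bias
MANT_BITS = 52    # stored mantissa bits in binary64

def pow2_int(n: int) -> float:
    """Build 2^n directly from its bit pattern (normal range only; hypothetical helper)."""
    if not -1022 <= n <= 1023:
        raise ValueError("n outside the normal binary64 exponent range")
    bits = (n + EXP_BIAS) << MANT_BITS   # sign 0, mantissa 0, biased exponent n + 1023
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def pow2_bits(x: float) -> float:
    """2^x = 2^f * 2^n, with 2^n assembled in the exponent field instead of via fscale."""
    n = round(x)          # nearest integer, as with frndint in round-to-nearest mode
    f = x - n             # remainder in [-0.5, 0.5]
    return (2.0 ** f) * pow2_int(n)   # 2.0 ** f stands in for f2xm1 + 1

print(pow2_int(0), pow2_int(10), pow2_int(-3))   # 1.0 1024.0 0.125
print(pow2_bits(10.1), 2.0 ** 10.1)
```

The multiply by a pure power of two is exact as long as the result stays in the normal range, so the only rounding error comes from the 2^f part.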

Antariy

Re: 2^x timings
« Reply #17 on: April 11, 2013, 02:41:47 PM »
Hi, Jochen :t

Code:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 68/20 cycles

6315    cycles for 20 * Pow2 fist/fild
4175    cycles for 20 * pow qWord/Agner

6285    cycles for 20 * Pow2 fist/fild
3900    cycles for 20 * pow qWord/Agner

6479    cycles for 20 * Pow2 fist/fild
4086    cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner


--- ok ---

jj2007

Re: 2^x timings
« Reply #18 on: April 11, 2013, 03:22:09 PM »
Thanks, Alex :icon14:

Not much difference for your Celeron, it seems ;-)

sinsi

Re: 2^x timings
« Reply #19 on: April 11, 2013, 03:43:42 PM »

AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 71/20 cycles
837     cycles for 20 * Pow2 fist/fild
1054    cycles for 20 * pow qWord/Agner

Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 55/20 cycles
1227    cycles for 20 * Pow2 fist/fild
1065    cycles for 20 * pow qWord/Agner

Intel(R) Core(TM)2 Duo CPU     T8100  @ 2.10GHz (SSE4)
loop overhead is approx. 37/20 cycles
2394    cycles for 20 * Pow2 fist/fild
1249    cycles for 20 * pow qWord/Agner

I can walk on water but stagger on beer.

ToutEnMasm

Re: 2^x timings
« Reply #20 on: April 11, 2013, 03:58:30 PM »

Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
loop overhead is approx. 53/20 cycles

5799    cycles for 20 * Pow2 fist/fild
3774    cycles for 20 * pow qWord/Agner

5806    cycles for 20 * Pow2 fist/fild
3794    cycles for 20 * pow qWord/Agner

5801    cycles for 20 * Pow2 fist/fild
3772    cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner


--- ok ---
Fa is a musical note to play with CL

jj2007

Re: 2^x timings
« Reply #21 on: April 11, 2013, 04:34:56 PM »
Thanks, John and Yves. Apparently the algo doesn't like AMD...


AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 78/20 cycles

913     cycles for 20 * Pow2 fist/fild
1161    cycles for 20 * pow qWord/Agner


Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz (SSE4)
loop overhead is approx. 69/20 cycles

1090    cycles for 20 * Pow2 fist/fild
813     cycles for 20 * pow qWord/Agner
« Last Edit: April 11, 2013, 10:25:16 PM by jj2007 »

sinsi

Re: 2^x timings
« Reply #22 on: April 11, 2013, 04:49:43 PM »
>Apparently the algo doesn't like AMD...
I sometimes feel left out amid all these Intel CPUs.
Plenty of timing code seems to give the opposite result on my AMD, probably because nearly everyone else is on Intel.

Magnum

Re: 2^x timings
« Reply #23 on: April 11, 2013, 09:36:04 PM »
Are there instructions that don't work on AMDs, or that work in a non-standard way?

I used to have a K-6 myself.

Andy
Take care,
                   Andy

Ubuntu-mate-16.04-desktop-amd64

http://www.goodnewsnetwork.org

Antariy

Re: 2^x timings
« Reply #24 on: April 11, 2013, 11:55:28 PM »
Jochen, what if you add invoke SetProcessAffinityMask, eax, 1 (with eax from a preceding invoke GetCurrentProcess) to the program's init code? AMD CPUs are mostly multicore; maybe the thread just keeps getting switched from one core to another?

jj2007

Re: 2^x timings
« Reply #25 on: April 12, 2013, 12:58:59 AM »
Quote
Jochen, what if you add invoke SetProcessAffinityMask, eax, 1 (with eax from a preceding invoke GetCurrentProcess) to the program's init code? AMD CPUs are mostly multicore; maybe the thread just keeps getting switched from one core to another?

Alex,
I have been scratching my head all along over why the timings were so volatile on some CPUs... thanks for reminding me of SetProcessAffinityMask :t
I was firmly convinced I had set it somewhere, but nope, it just wasn't included :redface:

The good news is it's now included, see attachment.
The bad news is it doesn't make AMD any faster :icon_mrgreen:

dedndave

Re: 2^x timings
« Reply #26 on: April 12, 2013, 02:10:37 AM »
Quote
thanks for reminding me of SetProcessAffinityMask

 :icon_eek:
from the guy who has probably written more timing tests than anyone else - lol
the tests could probably run a little longer, too:
1) select a single core
2) then Sleep, 500 (or longer) to let the thread bind and settle
3) try to make each test run for about 0.5 seconds
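Here are dedndave's three steps sketched in Python, for readers who want the pattern outside MASM. This is not the thread's testbed, which counts cycles; the sketch measures nanoseconds with time.perf_counter_ns instead. On Windows it calls the real SetProcessAffinityMask through ctypes with mask 1 (first core). On Linux it uses os.sched_setaffinity as an analog, and on other platforms it skips pinning. The function names, the batch size and taking the minimum over batches are my own assumptions, not something the thread prescribes.

```python
import math
import os
import sys
import time

def pin_to_one_core() -> None:
    """Step 1: keep the process on a single core so it is not migrated mid-test."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes
        k32 = ctypes.WinDLL("kernel32", use_last_error=True)
        k32.GetCurrentProcess.restype = wintypes.HANDLE
        k32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        k32.SetProcessAffinityMask.restype = wintypes.BOOL
        if not k32.SetProcessAffinityMask(k32.GetCurrentProcess(), 1):  # mask 1 = first core
            raise ctypes.WinError(ctypes.get_last_error())
    elif hasattr(os, "sched_setaffinity"):
        # Linux analog: pin to the lowest-numbered core we are allowed to run on
        os.sched_setaffinity(0, {min(os.sched_getaffinity(0))})
    # other platforms: no pinning in this sketch

def time_per_call_ns(fn, budget_s=0.5, batch=1000, settle_s=0.5):
    """Pin, settle, then run batches for about budget_s seconds; return the best ns per call."""
    pin_to_one_core()
    time.sleep(settle_s)                        # Step 2: like Sleep, 500, let things settle
    best = float("inf")
    deadline = time.perf_counter() + budget_s   # Step 3: spend roughly budget_s on the test
    while time.perf_counter() < deadline:
        t0 = time.perf_counter_ns()
        for _ in range(batch):
            fn()
        t1 = time.perf_counter_ns()
        best = min(best, (t1 - t0) / batch)     # the minimum filters out interrupts and hiccups
    return best

if __name__ == "__main__":
    x = 10.3
    n = round(x)
    print(f"{time_per_call_ns(lambda: math.ldexp(2.0 ** (x - n), n)):.1f} ns per call")
```

Taking the minimum rather than the mean is a design choice. Background activity can only add time, never remove it, so the fastest batch is usually the most repeatable figure.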

Prescott with HTT:
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 58/20 cycles

5899    cycles for 20 * Pow2 fist/fild
4082    cycles for 20 * pow qWord/Agner

5930    cycles for 20 * Pow2 fist/fild
4049    cycles for 20 * pow qWord/Agner

5925    cycles for 20 * Pow2 fist/fild
4096    cycles for 20 * pow qWord/Agner

Gunther

Re: 2^x timings
« Reply #27 on: April 12, 2013, 05:24:38 AM »
Jochen,

results from Pow2Timings3.zip:

Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 76/20 cycles

1029   cycles for 20 * Pow2 fist/fild
1503   cycles for 20 * pow qWord/Agner

1027   cycles for 20 * Pow2 fist/fild
1726   cycles for 20 * pow qWord/Agner

1639   cycles for 20 * Pow2 fist/fild
899   cycles for 20 * pow qWord/Agner

44   bytes for Pow2 fist/fild
90   bytes for pow qWord/Agner

--- ok ---

Gunther
Get your facts first, and then you can distort them.

Antariy

Re: 2^x timings
« Reply #28 on: April 12, 2013, 02:12:06 PM »
Hi Jochen!

Quote
I have been scratching my head all along over why the timings were so volatile on some CPUs... thanks for reminding me of SetProcessAffinityMask :t
I was firmly convinced I had set it somewhere, but nope, it just wasn't included :redface:

The good news is it's now included, see attachment.
The bad news is it doesn't make AMD any faster :icon_mrgreen:

:redface:

At least the timings will no longer jump around like crazy rabbits in other tests that use the testbed, whether multicore-affected or not - you made it a very comprehensive template :t

jj2007

Re: 2^x timings
« Reply #29 on: April 15, 2013, 08:48:17 AM »
Quote
At least the timings will no longer jump around like crazy rabbits in other tests that use the testbed, whether multicore-affected or not - you made it a very comprehensive template :t

Thanks, Alex.

Exp10, Exp2, ExpE and ExpXY are now implemented, see here.