2^x timings

Started by jj2007, April 10, 2013, 10:30:23 PM

qWord

Quote from: jj2007 on April 11, 2013, 07:48:51 AM
Looks very competitive, qWord - compliments :t
Would you mind adding it to MasmBasic, with proper acknowledgement, of course?
I'm afraid Agner Fog had found that long before me ;-D
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
loop overhead is approx. 64/20 cycles

1194    cycles for 20 * Pow2 fist/fild
1118    cycles for 20 * Pow2 fadd One
1189    cycles for 20 * Pow2 frndint
2202    cycles for 20 * PowX fist/fild
754     cycles for 20 * pow qWord

885     cycles for 20 * Pow2 fist/fild
1011    cycles for 20 * Pow2 fadd One
1240    cycles for 20 * Pow2 frndint
2100    cycles for 20 * PowX fist/fild
752     cycles for 20 * pow qWord

1085    cycles for 20 * Pow2 fist/fild
1118    cycles for 20 * Pow2 fadd One
1354    cycles for 20 * Pow2 frndint
2147    cycles for 20 * PowX fist/fild
925     cycles for 20 * pow qWord

877     cycles for 20 * Pow2 fist/fild
894     cycles for 20 * Pow2 fadd One
1208    cycles for 20 * Pow2 frndint
2103    cycles for 20 * PowX fist/fild
1006    cycles for 20 * pow qWord

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
36      bytes for Pow2 frndint
50      bytes for PowX fist/fild
96      bytes for pow qWord


--- ok ---
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

#16
Quote from: qWord on April 11, 2013, 08:08:48 AM
I'm afraid Agner Fog had found that long before me ;-D

Just found bitRAKE's version - he quotes Agner.


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 38/20 cycles

2647    cycles for 20 * Pow2 fist/fild
1399    cycles for 20 * pow qWord/Agner

2654    cycles for 20 * Pow2 fist/fild
1399    cycles for 20 * pow qWord/Agner

2645    cycles for 20 * Pow2 fist/fild
1406    cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner


Antariy

Hi, Jochen :t


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 68/20 cycles

6315    cycles for 20 * Pow2 fist/fild
4175    cycles for 20 * pow qWord/Agner

6285    cycles for 20 * Pow2 fist/fild
3900    cycles for 20 * pow qWord/Agner

6479    cycles for 20 * Pow2 fist/fild
4086    cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner


--- ok ---

jj2007

Thanks, Alex :icon14:

Not much difference for your Celeron, it seems ;-)

sinsi


AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 71/20 cycles
837     cycles for 20 * Pow2 fist/fild
1054    cycles for 20 * pow qWord/Agner

Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 55/20 cycles
1227    cycles for 20 * Pow2 fist/fild
1065    cycles for 20 * pow qWord/Agner

Intel(R) Core(TM)2 Duo CPU     T8100  @ 2.10GHz (SSE4)
loop overhead is approx. 37/20 cycles
2394    cycles for 20 * Pow2 fist/fild
1249    cycles for 20 * pow qWord/Agner


TouEnMasm


Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
loop overhead is approx. 53/20 cycles

5799    cycles for 20 * Pow2 fist/fild
3774    cycles for 20 * pow qWord/Agner

5806    cycles for 20 * Pow2 fist/fild
3794    cycles for 20 * pow qWord/Agner

5801    cycles for 20 * Pow2 fist/fild
3772    cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner


--- ok ---
Fa is a musical note to play with CL

jj2007

#21
Thanks, John and Yves. Apparently the algo doesn't like AMD...


AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 78/20 cycles

913     cycles for 20 * Pow2 fist/fild
1161    cycles for 20 * pow qWord/Agner


Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz (SSE4)
loop overhead is approx. 69/20 cycles

1090    cycles for 20 * Pow2 fist/fild
813     cycles for 20 * pow qWord/Agner

sinsi

>Apparently the algo doesn't like AMD...
I sometimes feel left out amid all these Intel CPUs.
Plenty of timing results come out the opposite way on my AMD, probably because everyone else is on Intel.

Magnum

Are there instructions that don't work on AMDs, or that work in a non-standard way?

I used to have a K-6 myself.

Andy
Take care,
                   Andy

Ubuntu-mate-18.04-desktop-amd64

http://www.goodnewsnetwork.org

Antariy

Jochen, what if you add invoke SetProcessAffinityMask,1 to the init of the program? Maybe on the (mostly multicore) AMD CPUs the thread just gets switched from one core to another?

jj2007

Quote from: Antariy on April 11, 2013, 11:55:28 PM
Jochen, what if you add invoke SetProcessAffinityMask,1 to the init of the program? Maybe on the (mostly multicore) AMD CPUs the thread just gets switched from one core to another?

Alex,
I have been scratching my head all the time why the timings were so volatile on some CPUs... thanks for reminding me of SetProcessAffinityMask :t
I was deeply convinced that I had set it somewhere but nope, it just wasn't included :redface:

The good news is it's now included, see attachment.
The bad news is it doesn't make AMD any faster :icon_mrgreen:

dedndave

Quote
thanks for reminding me of SetProcessAffinityMask

:icon_eek:
from the guy who has probably written more timing tests than anyone else - lol
they could probably run a little longer, too
1) select a single core
2) use Sleep,500 after that (or more) to bind and settle
3) try to make each test use about 0.5 seconds
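
The three steps above can be sketched in C roughly as follows. This is only an illustration, not the code in the attached testbed: `pin_and_settle` and `passes_for_duration` are made-up helper names, and the clock rate has to be supplied by the caller. Note that the real `SetProcessAffinityMask` takes a process handle as well as the mask, which the shorthand `invoke SetProcessAffinityMask,1` leaves out.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Steps 1 and 2: pin the process to one core, then wait so the
   scheduler binds and settles. Returns 1 on success, 0 if pinning
   failed or is not available on this platform.
   (Hypothetical helper, not part of the testbed.) */
int pin_and_settle(void)
{
#ifdef _WIN32
    /* Affinity mask 1 selects logical CPU 0 only. */
    if (!SetProcessAffinityMask(GetCurrentProcess(), 1))
        return 0;
    Sleep(500);   /* 500 ms, as suggested in step 2 */
    return 1;
#else
    return 0;     /* this sketch does nothing outside Windows */
#endif
}

/* Step 3: from the cycles measured for one pass and the clock rate
   in Hz, estimate how many passes fill about target_seconds.
   (Hypothetical helper; the clock rate is an input, not detected.) */
uint64_t passes_for_duration(uint64_t cycles_per_pass,
                             double clock_hz,
                             double target_seconds)
{
    assert(cycles_per_pass > 0 && clock_hz > 0.0 && target_seconds > 0.0);
    double passes = target_seconds * clock_hz / (double)cycles_per_pass;
    return passes < 1.0 ? 1 : (uint64_t)passes;   /* run at least once */
}
```

For example, a pass that costs about 1000 cycles on a 2 GHz machine would need roughly a million repetitions to fill half a second.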

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 58/20 cycles

5899    cycles for 20 * Pow2 fist/fild
4082    cycles for 20 * pow qWord/Agner

5930    cycles for 20 * Pow2 fist/fild
4049    cycles for 20 * pow qWord/Agner

5925    cycles for 20 * Pow2 fist/fild
4096    cycles for 20 * pow qWord/Agner

Gunther

Jochen,

results from Pow2Timings3.zip:

Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 76/20 cycles

1029    cycles for 20 * Pow2 fist/fild
1503    cycles for 20 * pow qWord/Agner

1027    cycles for 20 * Pow2 fist/fild
1726    cycles for 20 * pow qWord/Agner

1639    cycles for 20 * Pow2 fist/fild
899     cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner

--- ok ---

Gunther
You have to know the facts before you can distort them.

Antariy

Hi Jochen!

Quote from: jj2007 on April 12, 2013, 12:58:59 AM
I have been scratching my head all the time why the timings were so volatile on some CPUs... thanks for reminding me of SetProcessAffinityMask :t
I was deeply convinced that I had set it somewhere but nope, it just wasn't included :redface:

The good news is it's now included, see attachment.
The bad news is it doesn't make AMD any faster :icon_mrgreen:

:redface:

At least the timings now will not jump like crazy rabbits in other multicore-affected, and not only, tests where testbed could be used - you made it a very comprehensive template :t

jj2007

Quote from: Antariy on April 12, 2013, 02:12:06 PM
At least the timings now will not jump like crazy rabbits in other multicore-affected, and not only, tests where testbed could be used - you made it a very comprehensive template :t

Thanks, Alex.

Exp10, Exp2, ExpE and ExpXY are now implemented, see here.