Title: 2^x timings
Post by: jj2007 on April 10, 2013, 10:30:23 PM

Hi folks,
Could I please have some timings for these Y=2^x algos?
Thanks, JJ

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 79/20 cycles

913 cycles for 20 * Pow2 fist/fild
907 cycles for 20 * Pow2 fadd One
1500 cycles for 20 * Pow2 frndint

Title: Re: 2^x timings
Post by: anta40 on April 10, 2013, 10:43:31 PM

Updated:

Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 30/20 cycles

1037    cycles for 20 * Pow2 fist/fild
1039    cycles for 20 * Pow2 fadd One
1183    cycles for 20 * Pow2 frndint
2301    cycles for 20 * PowX fist/fild
2429    cycles for 20 * PowX frndint

1071    cycles for 20 * Pow2 fist/fild
1046    cycles for 20 * Pow2 fadd One
1153    cycles for 20 * Pow2 frndint
2544    cycles for 20 * PowX fist/fild
2740    cycles for 20 * PowX frndint

1343    cycles for 20 * Pow2 fist/fild
1202    cycles for 20 * Pow2 fadd One
1448    cycles for 20 * Pow2 frndint
2600    cycles for 20 * PowX fist/fild
2438    cycles for 20 * PowX frndint

1295    cycles for 20 * Pow2 fist/fild
1322    cycles for 20 * Pow2 fadd One
1452    cycles for 20 * Pow2 frndint
2775    cycles for 20 * PowX fist/fild
2624    cycles for 20 * PowX frndint

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
36      bytes for Pow2 frndint
50      bytes for PowX fist/fild
47      bytes for PowX frndint


--- ok ---

Title: Re: 2^x timings
Post by: Gunther on April 10, 2013, 11:04:27 PM

Jochen,

here are my results:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 77/20 cycles

1036	cycles for 20 * Pow2 fist/fild
1053	cycles for 20 * Pow2 fadd One
24582	cycles for 20 * Pow2 frndint

1027	cycles for 20 * Pow2 fist/fild
1054	cycles for 20 * Pow2 fadd One
24499	cycles for 20 * Pow2 frndint

1026	cycles for 20 * Pow2 fist/fild
1351	cycles for 20 * Pow2 fadd One
24550	cycles for 20 * Pow2 frndint

44	bytes for Pow2 fist/fild
38	bytes for Pow2 fadd One
34	bytes for Pow2 frndint

--- ok ---

By the way: well done. :t

Gunther

Title: Re: 2^x timings
Post by: Magnum on April 10, 2013, 11:19:15 PM

Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz (SSE4)
loop overhead is approx. 68/20 cycles

2296   cycles for 20 * Pow2 fist/fild
2324   cycles for 20 * Pow2 fadd One
14751   cycles for 20 * Pow2 frndint

2298   cycles for 20 * Pow2 fist/fild
2300   cycles for 20 * Pow2 fadd One
14857   cycles for 20 * Pow2 frndint

2325   cycles for 20 * Pow2 fist/fild
2295   cycles for 20 * Pow2 fadd One
14731   cycles for 20 * Pow2 frndint

44   bytes for Pow2 fist/fild
38   bytes for Pow2 fadd One
34   bytes for Pow2 frndint

Title: Re: 2^x timings
Post by: FORTRANS on April 10, 2013, 11:27:18 PM

Hi,

P-III, others if wanted.

pre-P4 (SSE1)
loop overhead is approx. 48/20 cycles

2665    cycles for 20 * Pow2 fist/fild
2654    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

2664    cycles for 20 * Pow2 fist/fild
2655    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

2664    cycles for 20 * Pow2 fist/fild
2650    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
34      bytes for Pow2 frndint


--- ok ---

Regards,

Steve N.

Title: Re: 2^x timings
Post by: dedndave on April 11, 2013, 12:26:50 AM

prescott w/htt

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 51/20 cycles

5718    cycles for 20 * Pow2 fist/fild
5759    cycles for 20 * Pow2 fadd One
78540   cycles for 20 * Pow2 frndint

5740    cycles for 20 * Pow2 fist/fild
5725    cycles for 20 * Pow2 fadd One
78548   cycles for 20 * Pow2 frndint

5732    cycles for 20 * Pow2 fist/fild
5716    cycles for 20 * Pow2 fadd One
79000   cycles for 20 * Pow2 frndint

i don't know what's in frndint, but it doesn't like P4's - lol

Title: Re: 2^x timings
Post by: sinsi on April 11, 2013, 12:39:46 AM

AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 75/20 cycles

895 cycles for 20 * Pow2 fist/fild
890 cycles for 20 * Pow2 fadd One
1465 cycles for 20 * Pow2 frndint

Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 39/20 cycles

1254 cycles for 20 * Pow2 fist/fild
1279 cycles for 20 * Pow2 fadd One
28154 cycles for 20 * Pow2 frndint

Title: Re: 2^x timings
Post by: jj2007 on April 11, 2013, 12:53:11 AM

Quote from: dedndave on April 11, 2013, 12:26:50 AM
i don't know what's in frndint, but it doesn't like P4's - lol

Could be related to the fact that I forgot one fld st in that algo :redface:

New version attached to the first post of this thread. My apologies for having wasted your time. To compensate, I added z=x^y to the list - see PowX.

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)

912 cycles for 20 * Pow2 fist/fild
980 cycles for 20 * Pow2 fadd One
1041 cycles for 20 * Pow2 frndint
4059 cycles for 20 * PowX fist/fild
3982 cycles for 20 * PowX frndint

841 cycles for 20 * Pow2 fist/fild
906 cycles for 20 * Pow2 fadd One
1039 cycles for 20 * Pow2 frndint
4054 cycles for 20 * PowX fist/fild
4065 cycles for 20 * PowX frndint

Title: Re: 2^x timings
Post by: dedndave on April 11, 2013, 01:18:18 AM

much better :biggrin:

prescott w/htt

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 47/20 cycles

5724    cycles for 20 * Pow2 fist/fild
5745    cycles for 20 * Pow2 fadd One
5932    cycles for 20 * Pow2 frndint
8048    cycles for 20 * PowX fist/fild
8082    cycles for 20 * PowX frndint

5739    cycles for 20 * Pow2 fist/fild
5715    cycles for 20 * Pow2 fadd One
5947    cycles for 20 * Pow2 frndint
8371    cycles for 20 * PowX fist/fild
8105    cycles for 20 * PowX frndint

5737    cycles for 20 * Pow2 fist/fild
5759    cycles for 20 * Pow2 fadd One
5924    cycles for 20 * Pow2 frndint
8060    cycles for 20 * PowX fist/fild
8094    cycles for 20 * PowX frndint

5748    cycles for 20 * Pow2 fist/fild
5736    cycles for 20 * Pow2 fadd One
5930    cycles for 20 * Pow2 frndint
8021    cycles for 20 * PowX fist/fild
8116    cycles for 20 * PowX frndint

Title: Re: 2^x timings
Post by: Gunther on April 11, 2013, 01:27:45 AM

Jochen,

never mind; no need to apologize. Here are the new timings:

Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 78/20 cycles

1481   cycles for 20 * Pow2 fist/fild
1505   cycles for 20 * Pow2 fadd One
1479   cycles for 20 * Pow2 frndint
2755   cycles for 20 * PowX fist/fild
2820   cycles for 20 * PowX frndint

1325   cycles for 20 * Pow2 fist/fild
1356   cycles for 20 * Pow2 fadd One
1636   cycles for 20 * Pow2 frndint
2774   cycles for 20 * PowX fist/fild
2810   cycles for 20 * PowX frndint

1325   cycles for 20 * Pow2 fist/fild
1352   cycles for 20 * Pow2 fadd One
1644   cycles for 20 * Pow2 frndint
2595   cycles for 20 * PowX fist/fild
2794   cycles for 20 * PowX frndint

1494   cycles for 20 * Pow2 fist/fild
1509   cycles for 20 * Pow2 fadd One
1180   cycles for 20 * Pow2 frndint
2591   cycles for 20 * PowX fist/fild
2803   cycles for 20 * PowX frndint

44   bytes for Pow2 fist/fild
38   bytes for Pow2 fadd One
36   bytes for Pow2 frndint
50   bytes for PowX fist/fild
47   bytes for PowX frndint

--- ok ---

Gunther

Title: Re: 2^x timings
Post by: FORTRANS on April 11, 2013, 01:46:09 AM

P-III

pre-P4 (SSE1)
loop overhead is approx. 48/20 cycles

2665	cycles for 20 * Pow2 fist/fild
2652	cycles for 20 * Pow2 fadd One
2717	cycles for 20 * Pow2 frndint
4829	cycles for 20 * PowX fist/fild
5023	cycles for 20 * PowX frndint

2664	cycles for 20 * Pow2 fist/fild
2658	cycles for 20 * Pow2 fadd One
2711	cycles for 20 * Pow2 frndint
4833	cycles for 20 * PowX fist/fild
5017	cycles for 20 * PowX frndint

2670	cycles for 20 * Pow2 fist/fild
2652	cycles for 20 * Pow2 fadd One
2712	cycles for 20 * Pow2 frndint
4835	cycles for 20 * PowX fist/fild
5028	cycles for 20 * PowX frndint

2664	cycles for 20 * Pow2 fist/fild
2653	cycles for 20 * Pow2 fadd One
2711	cycles for 20 * Pow2 frndint
4837	cycles for 20 * PowX fist/fild
5026	cycles for 20 * PowX frndint

44	bytes for Pow2 fist/fild
38	bytes for Pow2 fadd One
36	bytes for Pow2 frndint
50	bytes for PowX fist/fild
47	bytes for PowX frndint


--- ok ---

Title: Re: 2^x timings
Post by: qWord on April 11, 2013, 03:26:59 AM

it may also be interesting to replace FSCALE with non-FPU instructions, because it does nothing more than st(0)*2^trunc(st(1)) (FSCALE truncates st(1) toward zero). It could be replaced by code that directly manipulates the exponent field of the value 1.0 to get 2^rndint(st(1)). Besides, we already have the rounded value of st(1) as an integer...
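The idea can be sketched in C with an IEEE 754 double (11-bit exponent, bias 1023, 52-bit fraction) instead of the x87 REAL10 (15-bit exponent, bias 3FFFh, explicit integer bit). Writing n+1023 into the exponent bits of 1.0 gives 2^n without any FPU scaling instruction; the C library equivalent of the FSCALE-style step would be ldexp(). The function names are hypothetical, and only normal results (n in [-1022, 1023]) are covered; range checks are left to the caller:

```c
#include <stdint.h>
#include <string.h>

/* Build 2^n by writing the biased exponent into the bit pattern of a
   double whose fraction bits are zero (i.e. 1.0 with another exponent).
   Valid only for -1022 <= n <= 1023 (normal doubles). */
double pow2_int_bits(int n)
{
    uint64_t bits = (uint64_t)(n + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);   /* reinterpret bits without aliasing UB */
    return d;
}

/* FSCALE-like step for an integer exponent: x * 2^n. */
double scale_by_pow2(double x, int n)
{
    return x * pow2_int_bits(n);
}
```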

Title: Re: 2^x timings
Post by: dedndave on April 11, 2013, 05:46:16 AM

i was playing with a little code to do that
and, while it may be simple to manipulate directly in 99.99 % of the cases, :P
there are those special cases where you need a bunch of if/else statements to handle properly

Title: Re: 2^x timings
Post by: qWord on April 11, 2013, 06:56:58 AM

Quote from: dedndave on April 11, 2013, 05:46:16 AM
i was playing with a little code to do that
and, while it may be simple to manipulate directly in 99.99 % of the cases, :P
there are those special cases where you need a bunch of if/else statements to handle properly

the following code should do it in all cases of valid input:

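; calc: 2^x = 2^(a+b) = 2^a*2^b = 2^fract_part(st0)*2^int_part(st0)
; In:   st0 == exponent
; Out:  st0 == 2^st0
; Note: assumes |st0| < 2^31, otherwise fist cannot store the integer part.
;       FP4() is assumed to be a macro that places a REAL4 constant in memory.
pow2 proc
LOCAL r10[3]:DWORD		; 12-byte buffer for a REAL10 value
LOCAL exp:SDWORD		; integer part n

    mov eax,3fffh		; REAL10 exponent bias
    fist exp			; n = x rounded per FPU control word (default: nearest)
    fisub exp			; st0 = x - n, in [-0.5, 0.5]: valid input range for f2xm1
    add eax,exp		; case: add 3fffh,-X ==> sub 3fffh,X
    jle @err1		; underflow
    cmp eax,8000h
    jae @err2		; overflow (7FFFh still works: it encodes +infinity)
    mov r10[0],0		; mantissa, low dword
    mov r10[4],80000000h	; mantissa, high dword: explicit integer bit, i.e. 1.0
    mov r10[8],eax		; sign 0 + biased exponent: r10 == 2^n
    f2xm1			; st0 = 2^(x-n) - 1
    fadd FP4(1.0)		; st0 = 2^(x-n)
    fld REAL10 ptr r10	; st0 = 2^n, st1 = 2^(x-n)
    fmulp st(1),st		; st0 = 2^(x-n) * 2^n = 2^x
    ret

@err1:
    fstp st(0)		; discard x - n
    fld FP4(0.0)		; result flushed to zero
    ret
@err2:
    fstp st(0)		; discard x - n
    fld FP4(07F800000r)	; +infinity
    ret
    
pow2 endp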

EDIT: overflow detection was incomplete

Title: Re: 2^x timings
Post by: jj2007 on April 11, 2013, 07:48:51 AM

Looks very competitive, qWord - compliments :t
Would you mind adding it to MasmBasic, with proper acknowledgement, of course?

EDIT: Shaved off a cycle and a few bytes. See Reply #16 for the attachment.

Title: Re: 2^x timings
Post by: qWord on April 11, 2013, 08:08:48 AM

Quote from: jj2007 on April 11, 2013, 07:48:51 AM
Looks very competitive, qWord - compliments :t
Would you mind adding it to MasmBasic, with proper acknowledgement, of course?

I'm afraid Agner Fog had found that long before me ;-D

Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
loop overhead is approx. 64/20 cycles

1194    cycles for 20 * Pow2 fist/fild
1118    cycles for 20 * Pow2 fadd One
1189    cycles for 20 * Pow2 frndint
2202    cycles for 20 * PowX fist/fild
754     cycles for 20 * pow qWord

885     cycles for 20 * Pow2 fist/fild
1011    cycles for 20 * Pow2 fadd One
1240    cycles for 20 * Pow2 frndint
2100    cycles for 20 * PowX fist/fild
752     cycles for 20 * pow qWord

1085    cycles for 20 * Pow2 fist/fild
1118    cycles for 20 * Pow2 fadd One
1354    cycles for 20 * Pow2 frndint
2147    cycles for 20 * PowX fist/fild
925     cycles for 20 * pow qWord

877     cycles for 20 * Pow2 fist/fild
894     cycles for 20 * Pow2 fadd One
1208    cycles for 20 * Pow2 frndint
2103    cycles for 20 * PowX fist/fild
1006    cycles for 20 * pow qWord

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
36      bytes for Pow2 frndint
50      bytes for PowX fist/fild
96      bytes for pow qWord


--- ok ---

Title: Re: 2^x timings
Post by: jj2007 on April 11, 2013, 08:16:29 AM

Quote from: qWord on April 11, 2013, 08:08:48 AM
I'm afraid Agner Fog had found that long before me ;-D

Just found bitRAKE's version (http://www.asmcommunity.net/board/index.php?PHPSESSID=cf1081402952c2c1ed7f56c24f03a5ba&topic=2979.msg30586#msg30586) - he quotes Agner.

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 38/20 cycles

2647 cycles for 20 * Pow2 fist/fild
1399 cycles for 20 * pow qWord/Agner

2654 cycles for 20 * Pow2 fist/fild
1399 cycles for 20 * pow qWord/Agner

2645 cycles for 20 * Pow2 fist/fild
1406 cycles for 20 * pow qWord/Agner

44 bytes for Pow2 fist/fild
90 bytes for pow qWord/Agner

Title: Re: 2^x timings
Post by: Antariy on April 11, 2013, 02:41:47 PM

Hi, Jochen :t

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
loop overhead is approx. 68/20 cycles

6315    cycles for 20 * Pow2 fist/fild
4175    cycles for 20 * pow qWord/Agner

6285    cycles for 20 * Pow2 fist/fild
3900    cycles for 20 * pow qWord/Agner

6479    cycles for 20 * Pow2 fist/fild
4086    cycles for 20 * pow qWord/Agner

44      bytes for Pow2 fist/fild
90      bytes for pow qWord/Agner


--- ok ---

Title: Re: 2^x timings
Post by: jj2007 on April 11, 2013, 03:22:09 PM

Thanks, Alex :icon14:

Not much difference for your Celeron, it seems ;-)

Title: Re: 2^x timings
Post by: sinsi on April 11, 2013, 03:43:42 PM

AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 71/20 cycles
837 cycles for 20 * Pow2 fist/fild
1054 cycles for 20 * pow qWord/Agner

Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 55/20 cycles
1227 cycles for 20 * Pow2 fist/fild
1065 cycles for 20 * pow qWord/Agner

Intel(R) Core(TM)2 Duo CPU T8100 @ 2.10GHz (SSE4)
loop overhead is approx. 37/20 cycles
2394 cycles for 20 * Pow2 fist/fild
1249 cycles for 20 * pow qWord/Agner

Title: Re: 2^x timings
Post by: TouEnMasm on April 11, 2013, 03:58:30 PM

Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
loop overhead is approx. 53/20 cycles

5799 cycles for 20 * Pow2 fist/fild
3774 cycles for 20 * pow qWord/Agner

5806 cycles for 20 * Pow2 fist/fild
3794 cycles for 20 * pow qWord/Agner

5801 cycles for 20 * Pow2 fist/fild
3772 cycles for 20 * pow qWord/Agner

44 bytes for Pow2 fist/fild
90 bytes for pow qWord/Agner

--- ok ---

Title: Re: 2^x timings
Post by: jj2007 on April 11, 2013, 04:34:56 PM

Thanks, John and Yves. Apparently the algo doesn't like AMD...

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 78/20 cycles

913 cycles for 20 * Pow2 fist/fild
1161 cycles for 20 * pow qWord/Agner

Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz (SSE4)
loop overhead is approx. 69/20 cycles

1090 cycles for 20 * Pow2 fist/fild
813 cycles for 20 * pow qWord/Agner

Title: Re: 2^x timings
Post by: sinsi on April 11, 2013, 04:49:43 PM

>Apparently the algo doesn't like AMD...
I sometimes feel left out amid all these Intel CPUs.
Plenty of timing code seems to behave the opposite way on my AMD, probably because everyone else is on Intel.

Title: Re: 2^x timings
Post by: Magnum on April 11, 2013, 09:36:04 PM

Are there instructions that don't work on AMDs, or that work in a non-standard way?

I used to have a K-6 myself.

Andy

Title: Re: 2^x timings
Post by: Antariy on April 11, 2013, 11:55:28 PM

Jochen, what if you add a call to SetProcessAffinityMask (current process, affinity mask 1) to the program's init? Maybe on multicore AMD CPUs the thread just gets switched from one core to another?

Title: Re: 2^x timings
Post by: jj2007 on April 12, 2013, 12:58:59 AM

Quote from: Antariy on April 11, 2013, 11:55:28 PM
Jochen, what if you add a call to SetProcessAffinityMask (current process, affinity mask 1) to the program's init? Maybe on multicore AMD CPUs the thread just gets switched from one core to another?

Alex,
I have been scratching my head all the time why the timings were so volatile on some CPUs... thanks for reminding me of SetProcessAffinityMask :t
I was deeply convinced that I had set it somewhere but nope, it just wasn't included :redface:

The good news is it's now included, see attachment.
The bad news is it doesn't make AMD any faster :icon_mrgreen:

Title: Re: 2^x timings
Post by: dedndave on April 12, 2013, 02:10:37 AM

Quote
thanks for reminding me of SetProcessAffinityMask

:icon_eek:
from the guy who has probably written more timing tests than anyone else - lol
they could probably run a little longer, too
1) select a single core
2) use Sleep,500 after that (or more) to bind and settle
3) try to make each test use about 0.5 seconds
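Those three steps can be sketched in C roughly as follows. This is not the MASM testbed used in this thread: the function names, the 1000-call batches and the demo_work placeholder are hypothetical. On Windows the sketch uses SetProcessAffinityMask(GetCurrentProcess(), 1) and Sleep, as discussed above; on Linux it swaps in sched_getaffinity/sched_setaffinity and nanosleep:

```c
#define _GNU_SOURCE            /* Linux: exposes cpu_set_t, sched_setaffinity */
#include <stddef.h>
#include <time.h>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

/* Step 1: restrict the process to one logical CPU. Returns 0 on success. */
int pin_to_one_cpu(void)
{
#if defined(_WIN32)
    /* Affinity mask 1 = first logical processor. */
    return SetProcessAffinityMask(GetCurrentProcess(), 1) ? 0 : -1;
#elif defined(__linux__)
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {    /* first CPU we are allowed to use */
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            return sched_setaffinity(0, sizeof set, &set);
        }
    }
    return -1;
#else
    return -1;                         /* not implemented in this sketch */
#endif
}

/* Step 2: sleep so the thread can be bound and the system can settle. */
static void settle_ms(unsigned ms)
{
#if defined(_WIN32)
    Sleep(ms);
#else
    struct timespec ts;
    ts.tv_sec = (time_t)(ms / 1000);
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
#endif
}

/* Step 3: call fn in batches of 1000 until `seconds` have elapsed according
   to clock() (CPU time on POSIX, wall time on Windows), then return the
   average cost per call in nanoseconds. */
double time_per_call_ns(void (*fn)(void), unsigned settle, double seconds,
                        unsigned long *calls_out)
{
    settle_ms(settle);
    const clock_t limit = (clock_t)(seconds * CLOCKS_PER_SEC);
    unsigned long calls = 0;
    clock_t elapsed;
    const clock_t start = clock();
    do {
        for (int i = 0; i < 1000; i++)
            fn();
        calls += 1000;
        elapsed = clock() - start;
    } while (elapsed < limit);
    if (calls_out != NULL)
        *calls_out = calls;
    return (double)elapsed / CLOCKS_PER_SEC * 1e9 / (double)calls;
}

/* Hypothetical stand-in for the routine under test. */
static volatile double sink = 1.0;
void demo_work(void) { sink = sink * 0.5 + 1.0; }
```

Calling pin_to_one_cpu() and then time_per_call_ns(demo_work, 500, 0.5, NULL) follows the list above: one core, Sleep 500, about half a second per test. The coarse resolution of clock() is one more reason to let each test run that long.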

prescott w/htt

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 58/20 cycles

5899    cycles for 20 * Pow2 fist/fild
4082    cycles for 20 * pow qWord/Agner

5930    cycles for 20 * Pow2 fist/fild
4049    cycles for 20 * pow qWord/Agner

5925    cycles for 20 * Pow2 fist/fild
4096    cycles for 20 * pow qWord/Agner

Title: Re: 2^x timings
Post by: Gunther on April 12, 2013, 05:24:38 AM

Jochen,

results from Pow2Timings3.zip:

Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 76/20 cycles

1029   cycles for 20 * Pow2 fist/fild
1503   cycles for 20 * pow qWord/Agner

1027   cycles for 20 * Pow2 fist/fild
1726   cycles for 20 * pow qWord/Agner

1639   cycles for 20 * Pow2 fist/fild
899   cycles for 20 * pow qWord/Agner

44   bytes for Pow2 fist/fild
90   bytes for pow qWord/Agner

--- ok ---

Gunther

Title: Re: 2^x timings
Post by: Antariy on April 12, 2013, 02:12:06 PM

Hi Jochen!

Quote from: jj2007 on April 12, 2013, 12:58:59 AM
I have been scratching my head all the time why the timings were so volatile on some CPUs... thanks for reminding me of SetProcessAffinityMask :t
I was deeply convinced that I had set it somewhere but nope, it just wasn't included :redface:

The good news is it's now included, see attachment.
The bad news is it doesn't make AMD any faster :icon_mrgreen:

:redface:

At least the timings will no longer jump like crazy rabbits in other tests where the testbed could be used, multicore-affected or not - you made it a very comprehensive template :t

Title: Re: 2^x timings
Post by: jj2007 on April 15, 2013, 08:48:17 AM

Quote from: Antariy on April 12, 2013, 02:12:06 PM
At least the timings will no longer jump like crazy rabbits in other tests where the testbed could be used, multicore-affected or not - you made it a very comprehensive template :t

Thanks, Alex.

Exp10, Exp2, ExpE and ExpXY are now implemented, see here (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1192).

Title: Re: 2^x timings
Post by: Gunther on April 16, 2013, 01:11:55 AM

Jochen,

Quote from: jj2007 on April 15, 2013, 08:48:17 AM
Exp10, Exp2, ExpE and ExpXY are now implemented, see here (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1192).

rock solid work. I've checked it. :t

Gunther

The MASM Forum

General => The Laboratory => Topic started by: jj2007 on April 10, 2013, 10:30:23 PM