Author Topic: 2^x timings  (Read 7837 times)

jj2007

  • Member
  • *****
  • Posts: 7542
  • Assembler is fun ;-)
    • MasmBasic
2^x timings
« on: April 10, 2013, 10:30:23 PM »
Hi folks,
Could I please have some timings for these Y=2^x algos?
Thanks, JJ

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 79/20 cycles

913     cycles for 20 * Pow2 fist/fild
907     cycles for 20 * Pow2 fadd One
1500    cycles for 20 * Pow2 frndint
« Last Edit: April 11, 2013, 12:53:49 AM by jj2007 »

anta40

  • Member
  • ***
  • Posts: 293
Re: 2^x timings
« Reply #1 on: April 10, 2013, 10:43:31 PM »
Updated:

Code:
Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 30/20 cycles

1037    cycles for 20 * Pow2 fist/fild
1039    cycles for 20 * Pow2 fadd One
1183    cycles for 20 * Pow2 frndint
2301    cycles for 20 * PowX fist/fild
2429    cycles for 20 * PowX frndint

1071    cycles for 20 * Pow2 fist/fild
1046    cycles for 20 * Pow2 fadd One
1153    cycles for 20 * Pow2 frndint
2544    cycles for 20 * PowX fist/fild
2740    cycles for 20 * PowX frndint

1343    cycles for 20 * Pow2 fist/fild
1202    cycles for 20 * Pow2 fadd One
1448    cycles for 20 * Pow2 frndint
2600    cycles for 20 * PowX fist/fild
2438    cycles for 20 * PowX frndint

1295    cycles for 20 * Pow2 fist/fild
1322    cycles for 20 * Pow2 fadd One
1452    cycles for 20 * Pow2 frndint
2775    cycles for 20 * PowX fist/fild
2624    cycles for 20 * PowX frndint

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
36      bytes for Pow2 frndint
50      bytes for PowX fist/fild
47      bytes for PowX frndint


--- ok ---
« Last Edit: April 11, 2013, 02:28:09 AM by anta40 »

Gunther

  • Member
  • *****
  • Posts: 3515
  • Forgive your enemies, but never forget their names
Re: 2^x timings
« Reply #2 on: April 10, 2013, 11:04:27 PM »
Jochen,

here are my results:

Code:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 77/20 cycles

1036 cycles for 20 * Pow2 fist/fild
1053 cycles for 20 * Pow2 fadd One
24582 cycles for 20 * Pow2 frndint

1027 cycles for 20 * Pow2 fist/fild
1054 cycles for 20 * Pow2 fadd One
24499 cycles for 20 * Pow2 frndint

1026 cycles for 20 * Pow2 fist/fild
1351 cycles for 20 * Pow2 fadd One
24550 cycles for 20 * Pow2 frndint

44 bytes for Pow2 fist/fild
38 bytes for Pow2 fadd One
34 bytes for Pow2 frndint

--- ok ---

By the way: well done.  :t

Gunther
Get your facts first, and then you can distort them.

Magnum

  • Member
  • *****
  • Posts: 2235
Re: 2^x timings
« Reply #3 on: April 10, 2013, 11:19:15 PM »
Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz (SSE4)
loop overhead is approx. 68/20 cycles

2296   cycles for 20 * Pow2 fist/fild
2324   cycles for 20 * Pow2 fadd One
14751   cycles for 20 * Pow2 frndint

2298   cycles for 20 * Pow2 fist/fild
2300   cycles for 20 * Pow2 fadd One
14857   cycles for 20 * Pow2 frndint

2325   cycles for 20 * Pow2 fist/fild
2295   cycles for 20 * Pow2 fadd One
14731   cycles for 20 * Pow2 frndint

44   bytes for Pow2 fist/fild
38   bytes for Pow2 fadd One
34   bytes for Pow2 frndint
Take care,
                   Andy

Ubuntu-mate-16.04-desktop-amd64

http://www.goodnewsnetwork.org

FORTRANS

  • Member
  • ****
  • Posts: 944
Re: 2^x timings
« Reply #4 on: April 10, 2013, 11:27:18 PM »
Hi,

   P-III, others if wanted.

Code:
pre-P4 (SSE1)
loop overhead is approx. 48/20 cycles

2665    cycles for 20 * Pow2 fist/fild
2654    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

2664    cycles for 20 * Pow2 fist/fild
2655    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

2664    cycles for 20 * Pow2 fist/fild
2650    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
34      bytes for Pow2 frndint


--- ok ---

Regards,

Steve N.

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: 2^x timings
« Reply #5 on: April 11, 2013, 12:26:50 AM »
prescott w/htt
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 51/20 cycles

5718    cycles for 20 * Pow2 fist/fild
5759    cycles for 20 * Pow2 fadd One
78540   cycles for 20 * Pow2 frndint

5740    cycles for 20 * Pow2 fist/fild
5725    cycles for 20 * Pow2 fadd One
78548   cycles for 20 * Pow2 frndint

5732    cycles for 20 * Pow2 fist/fild
5716    cycles for 20 * Pow2 fadd One
79000   cycles for 20 * Pow2 frndint

i don't know what's in frndint, but it doesn't like P4's - lol

sinsi

  • Member
  • ****
  • Posts: 996
Re: 2^x timings
« Reply #6 on: April 11, 2013, 12:39:46 AM »

AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 75/20 cycles

895     cycles for 20 * Pow2 fist/fild
890     cycles for 20 * Pow2 fadd One
1465    cycles for 20 * Pow2 frndint


Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 39/20 cycles

1254    cycles for 20 * Pow2 fist/fild
1279    cycles for 20 * Pow2 fadd One
28154   cycles for 20 * Pow2 frndint
I can walk on water but stagger on beer.

jj2007

  • Member
  • *****
  • Posts: 7542
  • Assembler is fun ;-)
    • MasmBasic
Re: 2^x timings
« Reply #7 on: April 11, 2013, 12:53:11 AM »
Quote from: dedndave
i don't know what's in frndint, but it doesn't like P4's - lol

Could be related to the fact that I forgot one fld st in that algo :redface:

New version attached at the top of this thread. My apologies for having wasted your time. To compensate, I added z=x^y to the list - see PowX.

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)

912     cycles for 20 * Pow2 fist/fild
980     cycles for 20 * Pow2 fadd One
1041    cycles for 20 * Pow2 frndint
4059    cycles for 20 * PowX fist/fild
3982    cycles for 20 * PowX frndint

841     cycles for 20 * Pow2 fist/fild
906     cycles for 20 * Pow2 fadd One
1039    cycles for 20 * Pow2 frndint
4054    cycles for 20 * PowX fist/fild
4065    cycles for 20 * PowX frndint

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: 2^x timings
« Reply #8 on: April 11, 2013, 01:18:18 AM »
much better   :biggrin:

prescott w/htt
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 47/20 cycles

5724    cycles for 20 * Pow2 fist/fild
5745    cycles for 20 * Pow2 fadd One
5932    cycles for 20 * Pow2 frndint
8048    cycles for 20 * PowX fist/fild
8082    cycles for 20 * PowX frndint

5739    cycles for 20 * Pow2 fist/fild
5715    cycles for 20 * Pow2 fadd One
5947    cycles for 20 * Pow2 frndint
8371    cycles for 20 * PowX fist/fild
8105    cycles for 20 * PowX frndint

5737    cycles for 20 * Pow2 fist/fild
5759    cycles for 20 * Pow2 fadd One
5924    cycles for 20 * Pow2 frndint
8060    cycles for 20 * PowX fist/fild
8094    cycles for 20 * PowX frndint

5748    cycles for 20 * Pow2 fist/fild
5736    cycles for 20 * Pow2 fadd One
5930    cycles for 20 * Pow2 frndint
8021    cycles for 20 * PowX fist/fild
8116    cycles for 20 * PowX frndint

Gunther

  • Member
  • *****
  • Posts: 3515
  • Forgive your enemies, but never forget their names
Re: 2^x timings
« Reply #9 on: April 11, 2013, 01:27:45 AM »
Jochen,

never mind; no need for excuses. Here are the new timings:

Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 78/20 cycles

1481   cycles for 20 * Pow2 fist/fild
1505   cycles for 20 * Pow2 fadd One
1479   cycles for 20 * Pow2 frndint
2755   cycles for 20 * PowX fist/fild
2820   cycles for 20 * PowX frndint

1325   cycles for 20 * Pow2 fist/fild
1356   cycles for 20 * Pow2 fadd One
1636   cycles for 20 * Pow2 frndint
2774   cycles for 20 * PowX fist/fild
2810   cycles for 20 * PowX frndint

1325   cycles for 20 * Pow2 fist/fild
1352   cycles for 20 * Pow2 fadd One
1644   cycles for 20 * Pow2 frndint
2595   cycles for 20 * PowX fist/fild
2794   cycles for 20 * PowX frndint

1494   cycles for 20 * Pow2 fist/fild
1509   cycles for 20 * Pow2 fadd One
1180   cycles for 20 * Pow2 frndint
2591   cycles for 20 * PowX fist/fild
2803   cycles for 20 * PowX frndint

44   bytes for Pow2 fist/fild
38   bytes for Pow2 fadd One
36   bytes for Pow2 frndint
50   bytes for PowX fist/fild
47   bytes for PowX frndint

--- ok ---

Gunther

Get your facts first, and then you can distort them.

FORTRANS

  • Member
  • ****
  • Posts: 944
Re: 2^x timings
« Reply #10 on: April 11, 2013, 01:46:09 AM »
P-III

Code:
pre-P4 (SSE1)
loop overhead is approx. 48/20 cycles

2665 cycles for 20 * Pow2 fist/fild
2652 cycles for 20 * Pow2 fadd One
2717 cycles for 20 * Pow2 frndint
4829 cycles for 20 * PowX fist/fild
5023 cycles for 20 * PowX frndint

2664 cycles for 20 * Pow2 fist/fild
2658 cycles for 20 * Pow2 fadd One
2711 cycles for 20 * Pow2 frndint
4833 cycles for 20 * PowX fist/fild
5017 cycles for 20 * PowX frndint

2670 cycles for 20 * Pow2 fist/fild
2652 cycles for 20 * Pow2 fadd One
2712 cycles for 20 * Pow2 frndint
4835 cycles for 20 * PowX fist/fild
5028 cycles for 20 * PowX frndint

2664 cycles for 20 * Pow2 fist/fild
2653 cycles for 20 * Pow2 fadd One
2711 cycles for 20 * Pow2 frndint
4837 cycles for 20 * PowX fist/fild
5026 cycles for 20 * PowX frndint

44 bytes for Pow2 fist/fild
38 bytes for Pow2 fadd One
36 bytes for Pow2 frndint
50 bytes for PowX fist/fild
47 bytes for PowX frndint


--- ok ---

qWord

  • Member
  • *****
  • Posts: 1454
  • The base type of a type is the type itself
    • SmplMath macros
Re: 2^x timings
« Reply #11 on: April 11, 2013, 03:26:59 AM »
it may also be interesting to replace FSCALE with non-FPU instructions, because it does nothing more than st(0)*2^rndint(st(1)). This could be replaced by code that directly manipulates the exponent field (of value = 1.0) to get 2^rndint(st(1)). We even already have the rounded value of st(1) as an integer...
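
For readers who want to see the exponent-field idea in isolation, here is a minimal sketch in Python rather than MASM. It assumes a 64-bit IEEE 754 double (bias 1023, 52 mantissa bits) instead of the x87 80-bit format discussed in this thread, and the name `pow2_int` is made up for illustration:

```python
import struct

def pow2_int(n: int) -> float:
    """Return 2.0**n for an integer n by writing the exponent field directly.

    Assumption: 64-bit IEEE 754 double, not the x87 REAL10 format.
    """
    biased = n + 1023                 # double-precision exponent bias
    if not 1 <= biased <= 2046:       # 0 and 2047 are reserved encodings
        raise OverflowError("2**n is not a normal double")
    bits = biased << 52               # sign = 0, mantissa = 0, i.e. 1.0 * 2**n
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

`pow2_int(10)` gives `1024.0` and `pow2_int(-3)` gives `0.125`, without any multiplication or scaling instruction. Exponents outside the normal range are rejected here instead of being turned into denormals or infinity.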
MREAL macros - when you need floating point arithmetic while assembling!

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: 2^x timings
« Reply #12 on: April 11, 2013, 05:46:16 AM »
i was playing with a little code to do that
and, while it may be simple to manipulate directly in 99.99 % of the cases,   :P
there are those special cases where you need a bunch of if/else statements to handle properly

qWord

  • Member
  • *****
  • Posts: 1454
  • The base type of a type is the type itself
    • SmplMath macros
Re: 2^x timings
« Reply #13 on: April 11, 2013, 06:56:58 AM »
Quote from: dedndave
i was playing with a little code to do that
and, while it may be simple to manipulate directly in 99.99 % of the cases,   :P
there are those special cases where you need a bunch of if/else statements to handle properly

the following code should do it in all cases of valid input:
Code:
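; calc: 2^x = 2^(a+b) = 2^a*2^b = 2^fract_part(st0)*2^int_part(st0)
; In:   st0 == exponent
; Out:  st0 == 2^st0
pow2 proc
LOCAL r10[3]:DWORD          ; 12-byte buffer, the low 10 bytes are read as REAL10
LOCAL exp:SDWORD            ; integer part of the input

    mov eax,3fffh           ; REAL10 exponent bias
    fist exp                ; exp = st0 rounded to integer (current rounding mode)
    fisub exp               ; st0 = fractional part
    add eax,exp             ; biased exponent; case: add 3fffh,-X ==> sub 3fffh,X
    jle @err1               ; underflow
    cmp eax,8000h           ; biased exponent must fit into 15 bits
    jae @err2               ; overflow
    mov r10[0],0            ; mantissa, low dword
    mov r10[4],80000000h    ; mantissa, high dword: explicit integer bit, i.e. 1.0
    mov r10[8],eax          ; sign = 0 plus biased exponent
    f2xm1                   ; st0 = 2^fract - 1
    fadd FP4(1.0)           ; st0 = 2^fract
    fld REAL10 ptr r10      ; st0 = 2^int, st1 = 2^fract
    fmulp st(1),st          ; st0 = 2^fract * 2^int
    ret

@err1:
    fstp st(0)
    fld FP4(0.0)            ; result for underflow
    ret
@err2:
    fstp st(0)
    fld FP4(07F800000r)     ; infinite
    ret

pow2 endp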

EDIT: overflow detection was incomplete
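
The same split into integer and fractional part can be sketched in Python to check the logic outside the FPU. This is only an illustration under stated assumptions, not a port of the routine: it uses a 64-bit double (bias 1023) instead of REAL10 (bias 3FFFh), expects a finite input, and replaces F2XM1 with `math.expm1`. The constant `BIAS` and the function name are chosen for this example:

```python
import math
import struct

BIAS = 1023  # assumption: IEEE 754 double; the REAL10 routine uses 3FFFh

def pow2(x: float) -> float:
    """Compute 2**x as 2**fract * 2**int for a finite x."""
    n = round(x)                  # integer part, like FIST in round-to-nearest mode
    f = x - n                     # fractional part in [-0.5, 0.5], like FISUB
    biased = n + BIAS
    if biased <= 0:
        return 0.0                # underflow, denormals are not produced
    if biased >= 2047:
        return math.inf           # overflow
    # 2**n built by writing the exponent field: sign = 0, mantissa = 0
    scale = struct.unpack("<d", struct.pack("<Q", biased << 52))[0]
    # 2**f, analogous to F2XM1 followed by adding 1.0
    frac = math.expm1(f * math.log(2.0)) + 1.0
    return frac * scale
```

Integer inputs such as `pow2(10.0)` come out exact because the fractional factor is exactly 1.0. For other inputs the accuracy depends on `math.expm1`, so compare against `2.0 ** x` with a tolerance rather than for equality.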
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

  • Member
  • *****
  • Posts: 7542
  • Assembler is fun ;-)
    • MasmBasic
Re: 2^x timings
« Reply #14 on: April 11, 2013, 07:48:51 AM »
Looks very competitive, qWord - compliments :t
Would you mind if I added it to MasmBasic, with proper acknowledgement, of course?

EDIT: Shaved off a cycle and a few bytes. See Reply #16 for the attachment.
« Last Edit: April 11, 2013, 09:30:03 AM by jj2007 »