2^x timings

Started by jj2007, April 10, 2013, 10:30:23 PM

jj2007

Hi folks,
Could I please have some timings for these Y=2^x algos?
Thanks, JJ

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 79/20 cycles

913     cycles for 20 * Pow2 fist/fild
907     cycles for 20 * Pow2 fadd One
1500    cycles for 20 * Pow2 frndint

anta40

Updated:

Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 30/20 cycles

1037    cycles for 20 * Pow2 fist/fild
1039    cycles for 20 * Pow2 fadd One
1183    cycles for 20 * Pow2 frndint
2301    cycles for 20 * PowX fist/fild
2429    cycles for 20 * PowX frndint

1071    cycles for 20 * Pow2 fist/fild
1046    cycles for 20 * Pow2 fadd One
1153    cycles for 20 * Pow2 frndint
2544    cycles for 20 * PowX fist/fild
2740    cycles for 20 * PowX frndint

1343    cycles for 20 * Pow2 fist/fild
1202    cycles for 20 * Pow2 fadd One
1448    cycles for 20 * Pow2 frndint
2600    cycles for 20 * PowX fist/fild
2438    cycles for 20 * PowX frndint

1295    cycles for 20 * Pow2 fist/fild
1322    cycles for 20 * Pow2 fadd One
1452    cycles for 20 * Pow2 frndint
2775    cycles for 20 * PowX fist/fild
2624    cycles for 20 * PowX frndint

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
36      bytes for Pow2 frndint
50      bytes for PowX fist/fild
47      bytes for PowX frndint


--- ok ---

Gunther

Jochen,

here are my results:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 77/20 cycles

1036 cycles for 20 * Pow2 fist/fild
1053 cycles for 20 * Pow2 fadd One
24582 cycles for 20 * Pow2 frndint

1027 cycles for 20 * Pow2 fist/fild
1054 cycles for 20 * Pow2 fadd One
24499 cycles for 20 * Pow2 frndint

1026 cycles for 20 * Pow2 fist/fild
1351 cycles for 20 * Pow2 fadd One
24550 cycles for 20 * Pow2 frndint

44 bytes for Pow2 fist/fild
38 bytes for Pow2 fadd One
34 bytes for Pow2 frndint

--- ok ---


By the way: well done.  :t

Gunther
You have to know the facts before you can distort them.

Magnum

Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz (SSE4)
loop overhead is approx. 68/20 cycles

2296   cycles for 20 * Pow2 fist/fild
2324   cycles for 20 * Pow2 fadd One
14751   cycles for 20 * Pow2 frndint

2298   cycles for 20 * Pow2 fist/fild
2300   cycles for 20 * Pow2 fadd One
14857   cycles for 20 * Pow2 frndint

2325   cycles for 20 * Pow2 fist/fild
2295   cycles for 20 * Pow2 fadd One
14731   cycles for 20 * Pow2 frndint

44   bytes for Pow2 fist/fild
38   bytes for Pow2 fadd One
34   bytes for Pow2 frndint
Take care,
                   Andy

Ubuntu-mate-18.04-desktop-amd64

http://www.goodnewsnetwork.org

FORTRANS

Hi,

   P-III, others if wanted.


pre-P4 (SSE1)
loop overhead is approx. 48/20 cycles

2665    cycles for 20 * Pow2 fist/fild
2654    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

2664    cycles for 20 * Pow2 fist/fild
2655    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

2664    cycles for 20 * Pow2 fist/fild
2650    cycles for 20 * Pow2 fadd One
8635    cycles for 20 * Pow2 frndint

44      bytes for Pow2 fist/fild
38      bytes for Pow2 fadd One
34      bytes for Pow2 frndint


--- ok ---


Regards,

Steve N.

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 51/20 cycles

5718    cycles for 20 * Pow2 fist/fild
5759    cycles for 20 * Pow2 fadd One
78540   cycles for 20 * Pow2 frndint

5740    cycles for 20 * Pow2 fist/fild
5725    cycles for 20 * Pow2 fadd One
78548   cycles for 20 * Pow2 frndint

5732    cycles for 20 * Pow2 fist/fild
5716    cycles for 20 * Pow2 fadd One
79000   cycles for 20 * Pow2 frndint


i don't know what's in frndint, but it doesn't like P4's - lol

sinsi


AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 75/20 cycles

895     cycles for 20 * Pow2 fist/fild
890     cycles for 20 * Pow2 fadd One
1465    cycles for 20 * Pow2 frndint


Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz (SSE4)
loop overhead is approx. 39/20 cycles

1254    cycles for 20 * Pow2 fist/fild
1279    cycles for 20 * Pow2 fadd One
28154   cycles for 20 * Pow2 frndint

jj2007

Quote from: dedndave on April 11, 2013, 12:26:50 AM
i don't know what's in frndint, but it doesn't like P4's - lol

Could be related to the fact that I forgot one fld st in that algo :redface:

New version attached on top of this thread. My apologies for having wasted your time. To compensate, I added z=x^y to the list - see PowX.

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)

912     cycles for 20 * Pow2 fist/fild
980     cycles for 20 * Pow2 fadd One
1041    cycles for 20 * Pow2 frndint
4059    cycles for 20 * PowX fist/fild
3982    cycles for 20 * PowX frndint

841     cycles for 20 * Pow2 fist/fild
906     cycles for 20 * Pow2 fadd One
1039    cycles for 20 * Pow2 frndint
4054    cycles for 20 * PowX fist/fild
4065    cycles for 20 * PowX frndint

dedndave

much better   :biggrin:

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 47/20 cycles

5724    cycles for 20 * Pow2 fist/fild
5745    cycles for 20 * Pow2 fadd One
5932    cycles for 20 * Pow2 frndint
8048    cycles for 20 * PowX fist/fild
8082    cycles for 20 * PowX frndint

5739    cycles for 20 * Pow2 fist/fild
5715    cycles for 20 * Pow2 fadd One
5947    cycles for 20 * Pow2 frndint
8371    cycles for 20 * PowX fist/fild
8105    cycles for 20 * PowX frndint

5737    cycles for 20 * Pow2 fist/fild
5759    cycles for 20 * Pow2 fadd One
5924    cycles for 20 * Pow2 frndint
8060    cycles for 20 * PowX fist/fild
8094    cycles for 20 * PowX frndint

5748    cycles for 20 * Pow2 fist/fild
5736    cycles for 20 * Pow2 fadd One
5930    cycles for 20 * Pow2 frndint
8021    cycles for 20 * PowX fist/fild
8116    cycles for 20 * PowX frndint

Gunther

Jochen,

never mind; no need for excuses. Here are the new timings:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 78/20 cycles

1481   cycles for 20 * Pow2 fist/fild
1505   cycles for 20 * Pow2 fadd One
1479   cycles for 20 * Pow2 frndint
2755   cycles for 20 * PowX fist/fild
2820   cycles for 20 * PowX frndint

1325   cycles for 20 * Pow2 fist/fild
1356   cycles for 20 * Pow2 fadd One
1636   cycles for 20 * Pow2 frndint
2774   cycles for 20 * PowX fist/fild
2810   cycles for 20 * PowX frndint

1325   cycles for 20 * Pow2 fist/fild
1352   cycles for 20 * Pow2 fadd One
1644   cycles for 20 * Pow2 frndint
2595   cycles for 20 * PowX fist/fild
2794   cycles for 20 * PowX frndint

1494   cycles for 20 * Pow2 fist/fild
1509   cycles for 20 * Pow2 fadd One
1180   cycles for 20 * Pow2 frndint
2591   cycles for 20 * PowX fist/fild
2803   cycles for 20 * PowX frndint

44   bytes for Pow2 fist/fild
38   bytes for Pow2 fadd One
36   bytes for Pow2 frndint
50   bytes for PowX fist/fild
47   bytes for PowX frndint

--- ok ---

Gunther


FORTRANS

P-III


pre-P4 (SSE1)
loop overhead is approx. 48/20 cycles

2665 cycles for 20 * Pow2 fist/fild
2652 cycles for 20 * Pow2 fadd One
2717 cycles for 20 * Pow2 frndint
4829 cycles for 20 * PowX fist/fild
5023 cycles for 20 * PowX frndint

2664 cycles for 20 * Pow2 fist/fild
2658 cycles for 20 * Pow2 fadd One
2711 cycles for 20 * Pow2 frndint
4833 cycles for 20 * PowX fist/fild
5017 cycles for 20 * PowX frndint

2670 cycles for 20 * Pow2 fist/fild
2652 cycles for 20 * Pow2 fadd One
2712 cycles for 20 * Pow2 frndint
4835 cycles for 20 * PowX fist/fild
5028 cycles for 20 * PowX frndint

2664 cycles for 20 * Pow2 fist/fild
2653 cycles for 20 * Pow2 fadd One
2711 cycles for 20 * Pow2 frndint
4837 cycles for 20 * PowX fist/fild
5026 cycles for 20 * PowX frndint

44 bytes for Pow2 fist/fild
38 bytes for Pow2 fadd One
36 bytes for Pow2 frndint
50 bytes for PowX fist/fild
47 bytes for PowX frndint


--- ok ---

qWord

It may also be interesting to replace FSCALE with non-FPU instructions, because it does nothing more than st(0)*2^rndint(st(1)) (strictly, FSCALE chops st(1) toward zero, which makes no difference whenever st(1) already holds an integer). It could be replaced by code that directly manipulates the exponent field of the value 1.0 to get 2^rndint(st(1)). We even already have the rounded value of st(1) as an integer...
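To make the exponent-field idea concrete, here is a minimal sketch in Python rather than MASM, purely so it can be run anywhere. It assumes the standard x87 REAL10 layout (64-bit mantissa with an explicit integer bit, 15-bit exponent biased by 3FFFh, sign in the top bit). The helper names `real10_pow2` and `decode_real10` are made up for this illustration and are not part of any library.

```python
import math
import struct

BIAS = 0x3FFF  # exponent bias of the x87 80-bit extended (REAL10) format


def real10_pow2(n):
    """Return the 10 little-endian bytes of a REAL10 holding 2**n.

    Hypothetical helper: only normal exponents are handled, so the
    biased exponent must stay strictly between 0 and 7FFFh.
    """
    biased = BIAS + n
    if not 0 < biased < 0x7FFF:
        raise OverflowError("2**n is outside the normal REAL10 range")
    mantissa = 0x8000000000000000  # integer bit set, fraction zero: 1.0
    # Q = 8-byte mantissa, H = 2-byte sign+exponent word (sign bit clear)
    return struct.pack("<QH", mantissa, biased)


def decode_real10(raw):
    """Decode a positive, normal REAL10 into a Python float."""
    mantissa, sign_exp = struct.unpack("<QH", raw)
    exponent = (sign_exp & 0x7FFF) - BIAS
    # The mantissa is an integer scaled by 2**63, hence the extra -63.
    return math.ldexp(mantissa, exponent - 63)
```

For example, `decode_real10(real10_pow2(10))` gives `1024.0`. No FPU scaling step is involved: the power of two comes entirely from the exponent word.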
MREAL macros - when you need floating point arithmetic while assembling!

dedndave

i was playing with a little code to do that
and, while it may be simple to manipulate directly in 99.99 % of the cases,   :P
there are those special cases where you need a bunch of if/else statements to handle properly

qWord

Quote from: dedndave on April 11, 2013, 05:46:16 AM
i was playing with a little code to do that
and, while it may be simple to manipulate directly in 99.99 % of the cases,   :P
there are those special cases where you need a bunch of if/else statements to handle properly

The following code should do it in all cases of valid input:
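; calc: 2^x = 2^(a+b) = 2^a*2^b = 2^fract_part(st0)*2^int_part(st0)
; In:   st0 == exponent
; Out:  st0 == 2^st0
pow2 proc
LOCAL r10[3]:DWORD      ; 12-byte buffer for a REAL10 holding 2^int_part
LOCAL exp:SDWORD        ; integer part of the exponent

    mov eax,3fffh       ; REAL10 exponent bias
    fist exp            ; exp = st0 rounded per the current FPU rounding mode
    fisub exp           ; st0 = st0 - exp (fractional part)
    add eax,exp ; case: add 3fffh,-X ==> sub 3fffh,X
    jle @err1 ; underflow
    cmp eax,8000h
    jae @err2 ; overflow
    mov r10[0],0            ; mantissa, low dword
    mov r10[4],80000000h    ; mantissa, high dword: integer bit set ==> 1.0
    mov r10[8],eax          ; biased exponent, sign bit clear
    f2xm1                   ; st0 = 2^fract_part - 1
    fadd FP4(1.0)           ; st0 = 2^fract_part
    fld REAL10 ptr r10      ; st0 = 2^int_part
    fmulp st(1),st          ; st0 = 2^fract_part * 2^int_part
    ret

@err1:
    fstp st(0)
    fld FP4(0.0)            ; underflow: return 0
    ret
@err2:
    fstp st(0)
    fld FP4(07F800000r) ; +infinity
    ret

pow2 endp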


EDIT: overflow detection was incomplete
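
The decomposition 2^x = 2^fract_part * 2^int_part used in the proc above can be checked numerically with a short Python sketch. This is not a translation of the MASM code: it works on 64-bit doubles instead of REAL10, uses `math.expm1` as a stand-in for `f2xm1`, and uses `math.ldexp` for the power-of-two scaling. The function name `pow2_split` is invented for this example.

```python
import math


def pow2_split(x):
    """Compute 2**x as 2**fract_part * 2**int_part.

    Assumption: x is a finite float. Rounding is to nearest (ties to
    even), so the fractional part always lies in [-0.5, 0.5].
    """
    n = round(x)   # integer part, like FIST in round-to-nearest mode
    f = x - n      # fractional part, like FISUB
    # Stand-in for F2XM1 followed by adding 1.0: 2**f = expm1(f*ln 2) + 1
    frac_pow = math.expm1(f * math.log(2.0)) + 1.0
    try:
        # Scaling by 2**n only touches the exponent of frac_pow.
        return math.ldexp(frac_pow, n)
    except OverflowError:
        return math.inf  # result too large for a double


if __name__ == "__main__":
    for x in (3.0, -2.0, 0.5, 10.25):
        print(x, pow2_split(x), 2.0 ** x)
```

On underflow `math.ldexp` returns 0.0 by itself, which matches the behaviour of the `@err1` path; the overflow case is mapped to infinity like `@err2`.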

jj2007

Looks very competitive, qWord - compliments :t
Would you mind if I added it to MasmBasic, with proper acknowledgement, of course?

EDIT: Shaved off a cycle and a few bytes. See Reply#16 for attachment.