News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Fast Exp approximation

Started by guga, August 12, 2020, 12:29:23 AM

Previous topic - Next topic

jj2007

Quote from: Siekmanski on August 14, 2020, 08:32:19 AM
Hi Jochen,
This is cool stuff or not?

Yessss :thumbsup:

Quote from: Siekmanski on August 14, 2020, 08:48:42 AM
You calculated Exp2

Exp2(0.5) is equal to sqrt(2.0)

Indeed. Normally I use Print Str$("2^0.5=%Jf\n", Exp2(0.5)v). Output: 2^0.5=1.414213562373095049  :smiley:

guga

Quote from: Siekmanski on August 14, 2020, 07:08:51 AM
2**x is just the notation for the Exponent function, the second one is the weight function.
Just as you, I have to find my way in this LolRemez program.  :thumbsup:

The example -1 to 1 is often used in audio.
You stay always in that range, so you don't have compensation code for numbers outside that range.

I just tried to get the polynomials I used in my SSE2 example.
They are almost the same.
I will use the new ones. ( should be more precise since LolRemez has evolved to version 0.6  :cool:)

lolremez -d 5 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 5.
// p(x)=((((1.8775766751914795e-3*x+8.9893400904946651e-3)*x+5.5826318053295672e
-2)*x+2.401536170443754e-1)*x+6.9315307320016896e-1)*x+9.9999992506352618e-1
double f(double x)
{
    double u = 1.8775766751914795e-3;
    u = u * x + 8.9893400904946651e-3;
    u = u * x + 5.5826318053295672e-2;
    u = u * x + 2.401536170443754e-1;
    u = u * x + 6.9315307320016896e-1;
    return u * x + 9.9999992506352618e-1;


Hi Marinus. You´re welcome  :thumbsup: :thumbsup: :thumbsup:

Btw,, i still don´t get this app to work properly.

I tried the 10th term and did this:
lolremez -d 10 -r 0:1 "2**x" "2**x"

It created this polynomials


// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998884894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.3333543730974749e-3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*x+6.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}




But, when i tried to put them onto a function, it failed miserably. I don´t understand  what i´m doing wrong.   All i did was:


[<16 Cheby10Exp0: R$ 9.9522144894077596e-9, R$ 9.9522144894077596e-9]
[<16 Cheby10Exp1: R$ 9.4609455637620191e-8, R$ 9.4609455637620191e-8]
[<16 Cheby10Exp2: R$ 1.331271998884894e-6, R$ 1.331271998884894e-6]
[<16 Cheby10Exp3: R$ 1.5244659656760603e-5, R$ 1.5244659656760603e-5]
[<16 Cheby10Exp4: R$ 1.5403957490414841e-4, R$ 1.5403957490414841e-4]
[<16 Cheby10Exp5: R$ 1.3333543730974749e-3, R$ 1.3333543730974749e-3]
[<16 Cheby10Exp6: R$ 9.6181294091749148e-3, R$ 9.6181294091749148e-3]
[<16 Cheby10Exp7: R$ 5.5504108628244985e-2, R$ 5.5504108628244985e-2]
[<16 Cheby10Exp8: R$ 2.402265069613608e-1, R$ 2.402265069613608e-1]
[<16 Cheby10Exp9: R$ 6.9314718055989127e-1, R$ 6.9314718055989127e-1]
[<16 Cheby10Exp10: R$ 1.0000000000000002, R$ 1.0000000000000002]


Proc SSE2_Exp2::
    Arguments @Number, @Flag

    mov eax D@Flag
    Test_if eax SSE_EXP_INT
        cvtsi2ss xmm1 D@Number ; converts a signed integer to double
    Test_Else_if eax SSE_EXP_FLOAT
        ;movss xmm1 D@Number ; converts a single precision float to double
        cvtss2sd xmm1 X@Number ; converts a single precision float to double
    Test_Else_if eax SSE_EXP_REAL8
        mov eax D@Number | movss XMM1 X$eax
    Test_Else_if eax SSE_EXP_QWORD
        mov eax D@Number | movq XMM1 X$eax
    Test_Else
        xor eax eax | ExitP ; return 0 Invalid parameter
    Test_End



    movupd xmm0 X$Cheby10Exp0
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp1
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp2
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp3
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp4
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp5
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp6
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp7
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp8
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp9
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp10

EndP


It was supposed to work, because all the values and way to calculate as equal to the C version, right ?

I tried calculating exp of 5, but instead resulting in:

148.41315910257660342111558004055227962348766759387898904675284511

It returned:
31.987334657819531
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

 :biggrin:

you created an Exp2 function.
Exp function calculates the value of e (the base of natural logarithms) raised to the power of x.
See the version I posted before.
I have to go to bed now, tomorrow I'll show you how to implement the 10th degree polynomials.
Creative coders use backward thinking techniques as a strategy.

guga

Thanks to dodicat (From Freebasic forum), we also have a a FreeBasic conversion of this LOlRemex app.

Can someone, please port it to asm ?

"Compile the file poly.bas to .exe
Click it to get the command prompt.
It runs thus:
At the prompt something like:

poly "-6,8,cos(x)*sin(x)"
Remember to use the double quotes as above!
meaning from x = -6 to 8 the function cos(x)*sin(x) is approximated.
Press the spacebar to build up the approximation until you are happy that the curves are similar.
The standard way of showing the approximation coefficients is shown on the console, but you have the option (upon exit by pressing <esc>), to save to a .bas file, the nested method, which is much faster arithmetically for the computer."

Reference: https://www.freebasic.net/forum/viewtopic.php?t=25897
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

With your suggested input lolremez.exe poly "-6,8,cos(x)*sin(x)" it says input error.
With lolremez.exe "poly -6,8,cos(x)*sin(x)" it says EXPECTED (, got '-' but proceeds afterwards with nice graphs.

guga

Quote from: jj2007 on August 14, 2020, 06:58:05 PM
With your suggested input lolremez.exe poly "-6,8,cos(x)*sin(x)" it says input error.
With lolremez.exe "poly -6,8,cos(x)*sin(x)" it says EXPECTED (, got '-' but proceeds afterwards with nice graphs.

Tks a lot JJ. I saw your post on the original forum and gave another try and it compiled ok  :azn: :azn: :azn: I saw the graphics and it is really nice.


The command works without this "poly" token

I used as:
LolRemez "-3,3,x^3-x^2"
pause





It also exports a function to be compiled in freebasic containing the polynomials :)



I tried also with

LolRemez "0,1,exp(x)"

And it worked, but exported somethign i failed to understand. It exported this:

(((((((((((((((((((((((((((((((+0.04829789212904446)*x-0.1792981145327924)*x-0.6677612347982678)*x+5.893146189666752)*x _
-17.1355277003133)*x _
+26.0763728821472)*x _
-19.69798344538533)*x _
+0.9900030668437522)*x _
+10.30400147926733)*x _
-8.096178275550896)*x _
+11.14649562485596)*x _
-31.20839952020678)*x _
+51.6443661394669)*x _
-51.27397166179219)*x _
+31.83357261868321)*x _
-10.78621348361321)*x _
-0.4973412406197402)*x _
+2.927004122935927)*x _
-1.894224695114594)*x _
+0.741826698340469)*x _
-0.2034182656840364)*x _
+0.04056360695256523)*x _
-0.005880298181518774)*x _
+0.0008169520352547144)*x _
+0.001343698483337935)*x _
+0.008335523163337043)*x _
+0.04166660157481193)*x _
+0.1666666677131286)*x _
+0.4999999999927096)*x _
+1.000000000000015)*x _
+1) _


All of these polynomials are to be used ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

Quote from: guga on August 14, 2020, 09:22:32 PMThe command works without this "poly" token

I'm not sure about that - the output looks a lot different.

guga

I think that on each key you press, it is equivalent to one iteration on the original lolremez. I gave a try on the freebasic version with

LolRemez "0,1,exp(x)"

and hit 11 times the key until it produces a error of something around e-11. It produces these values:

(((((((((((+4.567028385771292e-007)*x+2.288390713140263e-006)*x+2.545465440196621e-005)*x _
+0.0001978554379130849)*x _
+0.001389191775841137)*x _
+0.008333228132164795)*x _
+0.04166668937151426)*x _
+0.1666666638148875)*x _
+0.5000000001833158)*x _
+0.9999999999954152)*x _
+1.000000000000019) _


I´ll give a try on them to see if it works, but i´m clueless how to use them yet. Better would be wait until Marinus wake up and explain to us how to use this lolremez app.

The graphics are very handy, btw, specially because we can actually see how much of the error we want to get rid off.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Ok, i gave a try on the generated values from dodicat FreeBasic version. On 10 iterations, it produced the values i used below. But the result completely lacks accuracy

[<16 Cheby10Exp0: R$ 4.567028385771292e-007]
[<16 Cheby10Exp1: R$ 2.288390713140263e-006]
[<16 Cheby10Exp2: R$ 2.545465440196621e-005]
[<16 Cheby10Exp3: R$ 0.0001978554379130849]
[<16 Cheby10Exp4: R$ 0.001389191775841137]
[<16 Cheby10Exp5: R$ 0.008333228132164795]
[<16 Cheby10Exp6: R$ 0.04166668937151426]
[<16 Cheby10Exp7: R$ 0.1666666638148875]
[<16 Cheby10Exp8: R$ 0.5000000001833158]
[<16 Cheby10Exp9: R$ 0.9999999999954152]
[<16 Cheby10Exp10: R$ 1.000000000000019]


Proc SSE2_Exp2::
    Arguments @Number, @Flag

    mov eax D@Flag
    Test_if eax SSE_EXP_INT
        cvtsi2ss xmm1 D@Number ; converts a signed integer to double
    Test_Else_if eax SSE_EXP_FLOAT
        ;movss xmm1 D@Number ; converts a single precision float to double
        cvtss2sd xmm1 X@Number ; converts a single precision float to double
    Test_Else_if eax SSE_EXP_REAL8
        mov eax D@Number | movss XMM1 X$eax
    Test_Else_if eax SSE_EXP_QWORD
        mov eax D@Number | movq XMM1 X$eax
    Test_Else
        xor eax eax | ExitP ; return 0 Invalid parameter
    Test_End

    movsd xmm0 X$Cheby10Exp0
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp1
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp2
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp3
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp4
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp5
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp6
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp7
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp8
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp9
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp10


EndP


I gave a try calculating the exp of 5. The correct value of exp^5 should be:
148.41315910257660342111558004055227962348766759387898904675284511

But this lolremez app returned to me:
147.452655481459771

(Similar result as the original lolremez, btw). So, better wait Marinus wake up and help on this to explain why we are having so huge margin of error.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jack

you may also benefit from visiting the lolremez-wiki https://github.com/samhocevar/lolremez/wiki/Tutorial-1-of-5%3A-exp%28x%29-the-quick-way
the git repo is at https://github.com/samhocevar/lolremez you can build lolremez on windows using VS2019 and of course on linux

Siekmanski

Hi Jack,

Guga already compiled it for us.

Hi Guga,

Exp(x) = Exp2(x / Logn(2.0))
1/Logn(2.0) = 1.4426950408889634073599246810019

Now we can use ( x * 1.4426950408889634073599246810019 ) to calculate the Exp.

E = x * 1.442695
To use the polynomials we have to use only the fractional part of E [0 to 1]
When done, scale the result by adding the integer part to the exponent field of the floating point result.

Here are the 10 degree polynomials and the SSE2 routine.


lolremez -d 10 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}

.const
Chebylog2E      real4 4 dup (1.44269504089) ; 1/Logn(2.0)

Cheby10Exp0  real4 4 dup (9.9522144894077596e-9)
Cheby10Exp1  real4 4 dup (9.4609455637620191e-8)
Cheby10Exp2  real4 4 dup (1.331271998884894e-6)
Cheby10Exp3  real4 4 dup (1.5244659656760603e-5)
Cheby10Exp4  real4 4 dup (1.5403957490414841e-4)
Cheby10Exp5  real4 4 dup (1.3333543730974749e-3)
Cheby10Exp6  real4 4 dup (9.6181294091749148e-3)
Cheby10Exp7  real4 4 dup (5.5504108628244985e-2)
Cheby10Exp8  real4 4 dup (2.402265069613608e-1)
Cheby10Exp9  real4 4 dup (6.9314718055989127e-1)
Cheby10Exp10  real4 4 dup (1.0000000000000002)


.code
align 16
SSE2_Exp_10:  ; in: xmm0, out: xmm1
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp5
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp6
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp7
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp8
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp9
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp10
    paddd       xmm0,xmm1
    ret
Creative coders use backward thinking techniques as a strategy.

Siekmanski

Here is a real8 version,  where I explain the above exp math in code.

align 16
SSE41_Exp_10: ; in: xmm0, out: xmm0
    mulpd       xmm0,oword ptr NaturalLogE
    roundpd     xmm1,xmm0,1 ; lose the fractional parts and save the integer parts in xmm1
    subpd       xmm0,xmm1               ; Subtract them, now it's inside the range, results are values between 0.0 to 1.0
    cvtpd2dq    xmm2,xmm1 ; save the integer parts in xmm2
    psllq       xmm2,52 ; shift quadwords 52 bits left ( size of real8 mantissa ) to the integer parts of the exponent parts
    movapd      xmm1,xmm0 ; copy the range
    mulpd       xmm0,oword ptr Exp_Cheby10_0_pd
    addpd       xmm0,oword ptr Exp_Cheby10_1_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_2_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_3_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_4_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_5_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_6_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_7_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_8_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_9_pd
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_10_pd
    paddq       xmm0,xmm2       ; add the exponent parts to the floating point result
    ret
Creative coders use backward thinking techniques as a strategy.

guga

Quote from: Siekmanski on August 15, 2020, 04:53:26 AM
Hi Jack,

Guga already compiled it for us.

Hi Guga,

Exp(x) = Exp2(x / Logn(2.0))
1/Logn(2.0) = 1.4426950408889634073599246810019

Now we can use ( x * 1.4426950408889634073599246810019 ) to calculate the Exp.

E = x * 1.442695
To use the polynomials we have to use only the fractional part of E [0 to 1]
When done, scale the result by adding the integer part to the exponent field of the floating point result.

Here are the 10 degree polynomials and the SSE2 routine.


lolremez -d 10 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}

.const
Chebylog2E      real4 4 dup (1.44269504089) ; 1/Logn(2.0)

Cheby10Exp0  real4 4 dup (9.9522144894077596e-9)
Cheby10Exp1  real4 4 dup (9.4609455637620191e-8)
Cheby10Exp2  real4 4 dup (1.331271998884894e-6)
Cheby10Exp3  real4 4 dup (1.5244659656760603e-5)
Cheby10Exp4  real4 4 dup (1.5403957490414841e-4)
Cheby10Exp5  real4 4 dup (1.3333543730974749e-3)
Cheby10Exp6  real4 4 dup (9.6181294091749148e-3)
Cheby10Exp7  real4 4 dup (5.5504108628244985e-2)
Cheby10Exp8  real4 4 dup (2.402265069613608e-1)
Cheby10Exp9  real4 4 dup (6.9314718055989127e-1)
Cheby10Exp10  real4 4 dup (1.0000000000000002)


.code
align 16
SSE2_Exp_10:  ; in: xmm0, out: xmm1
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp5
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp6
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp7
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp8
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp9
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp10
    paddd       xmm0,xmm1
    ret


Now i see where you got those values :) Thanks a lot.

One question....Why using the range only to 0 to 1 ? On github, the example is from -1 to 1 . When should we use them ?

Does 0 to 1 means only positive exponential ? Ex: exp^5
and
-1 to 1 means negative ? Ex: exp^-5 ??


Btw...did you checked the margin of error ?


I´ll give a try on the method you use and the one that uses a double check for error, like this:

lolremez -d 10 -r -1:1 "exp(x)"  "exp(x)"

What´s the difference from yours ? I mean why using "2**x" "2**x" ?

When i use "exp(x)" the values are different


Other question, why using...

    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31 ; Why ?
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23
(...)


wouldn´t only be necessary multiply this new value ?

Why not using only ?

    mulsd xmm0 X$Chebylog2E and later add and multiply by the remaining values ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

tested for accuracy :)

It looses accuracy after the 4th digit after the "."

correct value:
exp^5 =  148.41315910257660342111558004055227962348766759387898904675284511

using lolremez
exp^5 = 148.413162231445312

This is similar to what i found yesterday. On the version i made it has 0 loss and the speed was not so different from the ones using lolremez.  Wouldn´t be better if we try to optimize the version i ported ? I´m pretty sure there´s room for optimization yet.

It´s juust a matter of trying to understand what are the values found on the Exptable.

For what i saw, those values (The bigger ones) seems to be exponents of  integers divided by 5000. While the others (around e-310 seems to be some shifted values of some sort.)

Ex: 2.04582073601169570e-16 = exp(-180628/5000)
2.42294657314453310e-310 = exp(-712.9163944364413) = exp(some shift or division by what ????)
4.99974487227263260e-17 = exp(-187673/5000)

And the "SSE_Shifter4" seems to be formed by 2 exponentials of log(2). Like this:

; (x) = exp(51 log(2) + log(3)) = 6.755399441055744e+15
; (y) = exp(-60 log(2)) = 8.67361737988403547e-19
[SSE_Shifter4: R$ 6.755399441055744e+15,    R$ 6.755399441055744e+15,
                        R$ 8.67361737988403547e-19,  R$ 8.67361737988403547e-19]
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

#59
Remember that real4 has only 7.2 digits precision so, you need real8 to get more digits ( I believe 15 for real8 )

QuoteOne question....Why using the range only to 0 to 1 ? On github, the example is from -1 to 1 . When should we use them ?

-> For example if you only need to do calculations in the range -1 to 1 as in e.g. Audio routines.
   It's saves you the correction code, splitting it up in fractions and integers and glue everything together into a float.

QuoteDoes 0 to 1 means only positive exponential ? Ex: exp^5
and
-1 to 1 means negative ? Ex: exp^-5 ??

-> No, the signs are taken care of.

QuoteBtw...did you checked the margin of error ?

-> No

QuoteI´ll give a try on the method you use and the one that uses a double check for error, like this:

lolremez -d 10 -r -1:1 "exp(x)"  "exp(x)"

What´s the difference from yours ? I mean why using "2**x" "2**x" ?

When i use "exp(x)" the values are different

-> I use the Exp2 polynomials to calculate Exp()
   2**x = 2 raised to the power of x = Exp2()
   Why? because, Exp() calculates the value of e (the base of natural logarithms) raised to the power of x.

-> By simplifying the math, we have 1 set of polynomials to calculate 2 functions, Exp() and Exp2()  :cool:
   Exp(x) = Exp2(x / Logn(2.0))
   Create a reciprocal: 1/Logn(2.0) = 1.442695
   Now we can use ( x * 1.442695 ) to calculate Exp().

QuoteOther question, why using...
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31 ; Why ?
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23


-> In SSE2 we don't have a floor instruction ( SSE4_1 does: roundps xmm1,xmm0,1 )
   We need that to work outside the range of the polynomials.
   So we have to create it ourselfs:

    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31 ; put sign-bits in position
    cvttps2dq   xmm1,xmm2 ; convert with truncation, packed real4 to packed signed 32bit integers
    psubd       xmm1,xmm0 ; signed integers - sign bits
    movdqa      xmm0,xmm1 ; save them
    cvtdq2ps    xmm1,xmm1 ; convert 32bit packed integers to packed real4
    subps       xmm2,xmm1 ; get the fractional parts ( range 0.0 to 1.0 )
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23 ; shift 32bit integers 23 bits to the left, ( size of real4 mantissa ) into the exponent fields

at the end of the routine:
    paddd       xmm0,xmm1 ; add the exponent fields to the real4 results



Quotewouldn´t only be necessary multiply this new value ?

Why not using only ?

    mulsd xmm0 X$Chebylog2E and later add and multiply by the remaining values ?
Logged

-> Because it doesn't work that way, we need a range to work with the polynomials.
Creative coders use backward thinking techniques as a strategy.