Fast Exp approximation

jj2007 · August 14, 2020, 09:30:31 AM

Quote from: Siekmanski on August 14, 2020, 08:32:19 AM
Hi Jochen,
This is cool stuff or not?

Yessss

Quote from: Siekmanski on August 14, 2020, 08:48:42 AM
You calculated Exp2

Exp2(0.5) is equal to sqrt(2.0)

Indeed. Normally I use Print Str$("2^0.5=%Jf\n", Exp2(0.5)v). Output: 2^0.5=1.414213562373095049

guga · August 14, 2020, 02:18:53 PM

Quote from: Siekmanski on August 14, 2020, 07:08:51 AM
2**x is just the notation for the Exponent function, the second one is the weight function.
Just as you, I have to find my way in this LolRemez program.

The example -1 to 1 is often used in audio.
You stay always in that range, so you don't have compensation code for numbers outside that range.

I just tried to get the polynomials I used in my SSE2 example.
They are almost the same.
I will use the new ones. ( should be more precise since LolRemez has evolved to version 0.6 )

lolremez -d 5 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 5.
// p(x)=((((1.8775766751914795e-3*x+8.9893400904946651e-3)*x+5.5826318053295672e
-2)*x+2.401536170443754e-1)*x+6.9315307320016896e-1)*x+9.9999992506352618e-1
double f(double x)
{
double u = 1.8775766751914795e-3;
u = u * x + 8.9893400904946651e-3;
u = u * x + 5.5826318053295672e-2;
u = u * x + 2.401536170443754e-1;
u = u * x + 6.9315307320016896e-1;
return u * x + 9.9999992506352618e-1;

Hi Marinus. You´re welcome

Btw,, i still don´t get this app to work properly.

I tried the 10th term and did this:
lolremez -d 10 -r 0:1 "2**x" "2**x"

It created this polynomials

Code Select


// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998884894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.3333543730974749e-3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*x+6.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}

But, when i tried to put them onto a function, it failed miserably. I don´t understand what i´m doing wrong. All i did was:

Code Select


[<16 Cheby10Exp0: R$ 9.9522144894077596e-9, R$ 9.9522144894077596e-9]
[<16 Cheby10Exp1: R$ 9.4609455637620191e-8, R$ 9.4609455637620191e-8]
[<16 Cheby10Exp2: R$ 1.331271998884894e-6, R$ 1.331271998884894e-6]
[<16 Cheby10Exp3: R$ 1.5244659656760603e-5, R$ 1.5244659656760603e-5]
[<16 Cheby10Exp4: R$ 1.5403957490414841e-4, R$ 1.5403957490414841e-4]
[<16 Cheby10Exp5: R$ 1.3333543730974749e-3, R$ 1.3333543730974749e-3]
[<16 Cheby10Exp6: R$ 9.6181294091749148e-3, R$ 9.6181294091749148e-3]
[<16 Cheby10Exp7: R$ 5.5504108628244985e-2, R$ 5.5504108628244985e-2]
[<16 Cheby10Exp8: R$ 2.402265069613608e-1, R$ 2.402265069613608e-1]
[<16 Cheby10Exp9: R$ 6.9314718055989127e-1, R$ 6.9314718055989127e-1]
[<16 Cheby10Exp10: R$ 1.0000000000000002, R$ 1.0000000000000002]


Proc SSE2_Exp2::
    Arguments @Number, @Flag

    mov eax D@Flag
    Test_if eax SSE_EXP_INT
        cvtsi2ss xmm1 D@Number ; converts a signed integer to double
    Test_Else_if eax SSE_EXP_FLOAT
        ;movss xmm1 D@Number ; converts a single precision float to double
        cvtss2sd xmm1 X@Number ; converts a single precision float to double
    Test_Else_if eax SSE_EXP_REAL8
        mov eax D@Number | movss XMM1 X$eax
    Test_Else_if eax SSE_EXP_QWORD
        mov eax D@Number | movq XMM1 X$eax
    Test_Else
        xor eax eax | ExitP ; return 0 Invalid parameter
    Test_End



    movupd xmm0 X$Cheby10Exp0
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp1
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp2
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp3
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp4
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp5
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp6
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp7
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp8
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp9
    mulpd xmm0 xmm1 | addpd xmm0 X$Cheby10Exp10

EndP

It was supposed to work, because all the values and way to calculate as equal to the C version, right ?

I tried calculating exp of 5, but instead resulting in:

148.41315910257660342111558004055227962348766759387898904675284511

It returned:
31.987334657819531

Siekmanski · August 14, 2020, 02:51:54 PM

you created an Exp2 function.
Exp function calculates the value of e (the base of natural logarithms) raised to the power of x.
See the version I posted before.
I have to go to bed now, tomorrow I'll show you how to implement the 10th degree polynomials.

guga · August 14, 2020, 03:40:49 PM

Thanks to dodicat (From Freebasic forum), we also have a a FreeBasic conversion of this LOlRemex app.

Can someone, please port it to asm ?

"Compile the file poly.bas to .exe
Click it to get the command prompt.
It runs thus:
At the prompt something like:

poly "-6,8,cos(x)*sin(x)"
Remember to use the double quotes as above!
meaning from x = -6 to 8 the function cos(x)*sin(x) is approximated.
Press the spacebar to build up the approximation until you are happy that the curves are similar.
The standard way of showing the approximation coefficients is shown on the console, but you have the option (upon exit by pressing <esc>), to save to a .bas file, the nested method, which is much faster arithmetically for the computer."

Reference: https://www.freebasic.net/forum/viewtopic.php?t=25897

jj2007 · August 14, 2020, 06:58:05 PM

With your suggested input lolremez.exe poly "-6,8,cos(x)*sin(x)" it says input error.
With lolremez.exe "poly -6,8,cos(x)*sin(x)" it says EXPECTED (, got '-' but proceeds afterwards with nice graphs.

guga · August 14, 2020, 09:22:32 PM

Quote from: jj2007 on August 14, 2020, 06:58:05 PM
With your suggested input lolremez.exe poly "-6,8,cos(x)*sin(x)" it says input error.
With lolremez.exe "poly -6,8,cos(x)*sin(x)" it says EXPECTED (, got '-' but proceeds afterwards with nice graphs.

Tks a lot JJ. I saw your post on the original forum and gave another try and it compiled ok

I saw the graphics and it is really nice.

The command works without this "poly" token

I used as:
LolRemez "-3,3,x^3-x^2"
pause

It also exports a function to be compiled in freebasic containing the polynomials :)

I tried also with

LolRemez "0,1,exp(x)"

And it worked, but exported somethign i failed to understand. It exported this:

Code Select


(((((((((((((((((((((((((((((((+0.04829789212904446)*x-0.1792981145327924)*x-0.6677612347982678)*x+5.893146189666752)*x _ 
-17.1355277003133)*x _ 
+26.0763728821472)*x _ 
-19.69798344538533)*x _ 
+0.9900030668437522)*x _ 
+10.30400147926733)*x _ 
-8.096178275550896)*x _ 
+11.14649562485596)*x _ 
-31.20839952020678)*x _ 
+51.6443661394669)*x _ 
-51.27397166179219)*x _ 
+31.83357261868321)*x _ 
-10.78621348361321)*x _ 
-0.4973412406197402)*x _ 
+2.927004122935927)*x _ 
-1.894224695114594)*x _ 
+0.741826698340469)*x _ 
-0.2034182656840364)*x _ 
+0.04056360695256523)*x _ 
-0.005880298181518774)*x _ 
+0.0008169520352547144)*x _ 
+0.001343698483337935)*x _ 
+0.008335523163337043)*x _ 
+0.04166660157481193)*x _ 
+0.1666666677131286)*x _ 
+0.4999999999927096)*x _ 
+1.000000000000015)*x _ 
+1) _

All of these polynomials are to be used ?

jj2007 · August 14, 2020, 09:42:46 PM

Quote from: guga on August 14, 2020, 09:22:32 PMThe command works without this "poly" token

I'm not sure about that - the output looks a lot different.

guga · August 14, 2020, 10:10:10 PM

I think that on each key you press, it is equivalent to one iteration on the original lolremez. I gave a try on the freebasic version with

LolRemez "0,1,exp(x)"

and hit 11 times the key until it produces a error of something around e-11. It produces these values:

Code Select


(((((((((((+4.567028385771292e-007)*x+2.288390713140263e-006)*x+2.545465440196621e-005)*x _ 
+0.0001978554379130849)*x _ 
+0.001389191775841137)*x _ 
+0.008333228132164795)*x _ 
+0.04166668937151426)*x _ 
+0.1666666638148875)*x _ 
+0.5000000001833158)*x _ 
+0.9999999999954152)*x _ 
+1.000000000000019) _

I´ll give a try on them to see if it works, but i´m clueless how to use them yet. Better would be wait until Marinus wake up and explain to us how to use this lolremez app.

The graphics are very handy, btw, specially because we can actually see how much of the error we want to get rid off.

guga · August 14, 2020, 10:25:01 PM

Ok, i gave a try on the generated values from dodicat FreeBasic version. On 10 iterations, it produced the values i used below. But the result completely lacks accuracy

Code Select

[<16 Cheby10Exp0: R$ 4.567028385771292e-007]
[<16 Cheby10Exp1: R$ 2.288390713140263e-006]
[<16 Cheby10Exp2: R$ 2.545465440196621e-005]
[<16 Cheby10Exp3: R$ 0.0001978554379130849]
[<16 Cheby10Exp4: R$ 0.001389191775841137]
[<16 Cheby10Exp5: R$ 0.008333228132164795]
[<16 Cheby10Exp6: R$ 0.04166668937151426]
[<16 Cheby10Exp7: R$ 0.1666666638148875]
[<16 Cheby10Exp8: R$ 0.5000000001833158]
[<16 Cheby10Exp9: R$ 0.9999999999954152]
[<16 Cheby10Exp10: R$ 1.000000000000019]


Proc SSE2_Exp2::
    Arguments @Number, @Flag

    mov eax D@Flag
    Test_if eax SSE_EXP_INT
        cvtsi2ss xmm1 D@Number ; converts a signed integer to double
    Test_Else_if eax SSE_EXP_FLOAT
        ;movss xmm1 D@Number ; converts a single precision float to double
        cvtss2sd xmm1 X@Number ; converts a single precision float to double
    Test_Else_if eax SSE_EXP_REAL8
        mov eax D@Number | movss XMM1 X$eax
    Test_Else_if eax SSE_EXP_QWORD
        mov eax D@Number | movq XMM1 X$eax
    Test_Else
        xor eax eax | ExitP ; return 0 Invalid parameter
    Test_End

    movsd xmm0 X$Cheby10Exp0
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp1
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp2
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp3
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp4
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp5
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp6
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp7
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp8
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp9
    mulsd xmm0 xmm1 | addsd xmm0 X$Cheby10Exp10


EndP

I gave a try calculating the exp of 5. The correct value of exp^5 should be:
148.41315910257660342111558004055227962348766759387898904675284511

But this lolremez app returned to me:
147.452655481459771

(Similar result as the original lolremez, btw). So, better wait Marinus wake up and help on this to explain why we are having so huge margin of error.

jack · August 15, 2020, 04:45:10 AM

you may also benefit from visiting the lolremez-wiki https://github.com/samhocevar/lolremez/wiki/Tutorial-1-of-5%3A-exp%28x%29-the-quick-way
the git repo is at https://github.com/samhocevar/lolremez you can build lolremez on windows using VS2019 and of course on linux

Siekmanski · August 15, 2020, 04:53:26 AM

Hi Jack,

Guga already compiled it for us.

Hi Guga,

Exp(x) = Exp2(x / Logn(2.0))
1/Logn(2.0) = 1.4426950408889634073599246810019

Now we can use ( x * 1.4426950408889634073599246810019 ) to calculate the Exp.

E = x * 1.442695
To use the polynomials we have to use only the fractional part of E [0 to 1]
When done, scale the result by adding the integer part to the exponent field of the floating point result.

Here are the 10 degree polynomials and the SSE2 routine.

Code Select


lolremez -d 10 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}

.const
Chebylog2E      real4 4 dup (1.44269504089) ; 1/Logn(2.0)

Cheby10Exp0  	real4 4 dup (9.9522144894077596e-9)
Cheby10Exp1  	real4 4 dup (9.4609455637620191e-8)
Cheby10Exp2  	real4 4 dup (1.331271998884894e-6)
Cheby10Exp3  	real4 4 dup (1.5244659656760603e-5)
Cheby10Exp4  	real4 4 dup (1.5403957490414841e-4)
Cheby10Exp5  	real4 4 dup (1.3333543730974749e-3)
Cheby10Exp6  	real4 4 dup (9.6181294091749148e-3)
Cheby10Exp7  	real4 4 dup (5.5504108628244985e-2)
Cheby10Exp8  	real4 4 dup (2.402265069613608e-1)
Cheby10Exp9  	real4 4 dup (6.9314718055989127e-1)
Cheby10Exp10  	real4 4 dup (1.0000000000000002)


.code
align 16
SSE2_Exp_10:  ; in: xmm0, out: xmm1
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp5
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp6
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp7
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp8
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp9
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby10Exp10
    paddd       xmm0,xmm1
    ret

Siekmanski · August 15, 2020, 05:37:57 AM

Here is a real8 version, where I explain the above exp math in code.

Code Select

align 16
SSE41_Exp_10: ; in: xmm0, out: xmm0
    mulpd       xmm0,oword ptr NaturalLogE
    roundpd     xmm1,xmm0,1		; lose the fractional parts and save the integer parts in xmm1
    subpd       xmm0,xmm1               ; Subtract them, now it's inside the range, results are values between 0.0 to 1.0
    cvtpd2dq    xmm2,xmm1		; save the integer parts in xmm2
    psllq       xmm2,52			; shift quadwords 52 bits left ( size of real8 mantissa ) to the integer parts of the exponent parts
    movapd      xmm1,xmm0		; copy the range
    mulpd       xmm0,oword ptr Exp_Cheby10_0_pd
    addpd       xmm0,oword ptr Exp_Cheby10_1_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_2_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_3_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_4_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_5_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_6_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_7_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_8_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_9_pd 
    mulpd       xmm0,xmm1
    addpd       xmm0,oword ptr Exp_Cheby10_10_pd 
    paddq       xmm0,xmm2       ; add the exponent parts to the floating point result
    ret

guga · August 15, 2020, 05:42:08 AM

Quote from: Siekmanski on August 15, 2020, 04:53:26 AM
Hi Jack,

Guga already compiled it for us.

Hi Guga,

Exp(x) = Exp2(x / Logn(2.0))
1/Logn(2.0) = 1.4426950408889634073599246810019

Now we can use ( x * 1.4426950408889634073599246810019 ) to calculate the Exp.

E = x * 1.442695
To use the polynomials we have to use only the fractional part of E [0 to 1]
When done, scale the result by adding the integer part to the exponent field of the floating point result.

Here are the 10 degree polynomials and the SSE2 routine.

Code Select Expand
lolremez -d 10 -r 0:1 "2**x" "2**x" // Approximation of f(x) = 2**x // with weight function g(x) = 2**x // on interval [ 0, 1 ] // with a polynomial of degree 10. double f(double x) { double u = 9.9522144894077596e-9; u = u * x + 9.4609455637620191e-8; u = u * x + 1.331271998884894e-6; u = u * x + 1.5244659656760603e-5; u = u * x + 1.5403957490414841e-4; u = u * x + 1.3333543730974749e-3; u = u * x + 9.6181294091749148e-3; u = u * x + 5.5504108628244985e-2; u = u * x + 2.402265069613608e-1; u = u * x + 6.9314718055989127e-1; return u * x + 1.0000000000000002; } .const Chebylog2E real4 4 dup (1.44269504089) ; 1/Logn(2.0) Cheby10Exp0 real4 4 dup (9.9522144894077596e-9) Cheby10Exp1 real4 4 dup (9.4609455637620191e-8) Cheby10Exp2 real4 4 dup (1.331271998884894e-6) Cheby10Exp3 real4 4 dup (1.5244659656760603e-5) Cheby10Exp4 real4 4 dup (1.5403957490414841e-4) Cheby10Exp5 real4 4 dup (1.3333543730974749e-3) Cheby10Exp6 real4 4 dup (9.6181294091749148e-3) Cheby10Exp7 real4 4 dup (5.5504108628244985e-2) Cheby10Exp8 real4 4 dup (2.402265069613608e-1) Cheby10Exp9 real4 4 dup (6.9314718055989127e-1) Cheby10Exp10 real4 4 dup (1.0000000000000002) .code align 16 SSE2_Exp_10: ; in: xmm0, out: xmm1 movaps xmm2,oword ptr Chebylog2E mulps xmm2,xmm0 psrld xmm0,31 cvttps2dq xmm1,xmm2 psubd xmm1,xmm0 movdqa xmm0,xmm1 cvtdq2ps xmm1,xmm1 subps xmm2,xmm1 movaps xmm1,oword ptr Cheby10Exp0 pslld xmm0,23 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp1 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp2 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp3 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp4 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp5 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp6 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp7 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp8 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp9 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby10Exp10 paddd xmm0,xmm1 ret

Now i see where you got those values :) Thanks a lot.

One question....Why using the range only to 0 to 1 ? On github, the example is from -1 to 1 . When should we use them ?

Does 0 to 1 means only positive exponential ? Ex: exp^5
and
-1 to 1 means negative ? Ex: exp^-5 ??

Btw...did you checked the margin of error ?

I´ll give a try on the method you use and the one that uses a double check for error, like this:

lolremez -d 10 -r -1:1 "exp(x)" "exp(x)"

What´s the difference from yours ? I mean why using "2**x" "2**x" ?

When i use "exp(x)" the values are different

Other question, why using...

Code Select


    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31 ; Why ?
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23
(...)

wouldn´t only be necessary multiply this new value ?

Why not using only ?

mulsd xmm0 X$Chebylog2E and later add and multiply by the remaining values ?

guga · August 15, 2020, 06:50:29 AM

tested for accuracy :)

It looses accuracy after the 4th digit after the "."

correct value:
exp^5 = 148.41315910257660342111558004055227962348766759387898904675284511

using lolremez
exp^5 = 148.413162231445312

This is similar to what i found yesterday. On the version i made it has 0 loss and the speed was not so different from the ones using lolremez. Wouldn´t be better if we try to optimize the version i ported ? I´m pretty sure there´s room for optimization yet.

It´s juust a matter of trying to understand what are the values found on the Exptable.

For what i saw, those values (The bigger ones) seems to be exponents of integers divided by 5000. While the others (around e-310 seems to be some shifted values of some sort.)

Ex: 2.04582073601169570e-16 = exp(-180628/5000)
2.42294657314453310e-310 = exp(-712.9163944364413) = exp(some shift or division by what ????)
4.99974487227263260e-17 = exp(-187673/5000)

And the "SSE_Shifter4" seems to be formed by 2 exponentials of log(2). Like this:

; (x) = exp(51 log(2) + log(3)) = 6.755399441055744e+15
; (y) = exp(-60 log(2)) = 8.67361737988403547e-19
[SSE_Shifter4: R$ 6.755399441055744e+15, R$ 6.755399441055744e+15,
R$ 8.67361737988403547e-19, R$ 8.67361737988403547e-19]

Siekmanski · August 15, 2020, 07:08:16 AM

Remember that real4 has only 7.2 digits precision so, you need real8 to get more digits ( I believe 15 for real8 )

QuoteOne question....Why using the range only to 0 to 1 ? On github, the example is from -1 to 1 . When should we use them ?

-> For example if you only need to do calculations in the range -1 to 1 as in e.g. Audio routines.
It's saves you the correction code, splitting it up in fractions and integers and glue everything together into a float.

QuoteDoes 0 to 1 means only positive exponential ? Ex: exp^5
and
-1 to 1 means negative ? Ex: exp^-5 ??

-> No, the signs are taken care of.

QuoteBtw...did you checked the margin of error ?

-> No

QuoteI´ll give a try on the method you use and the one that uses a double check for error, like this:

lolremez -d 10 -r -1:1 "exp(x)" "exp(x)"

What´s the difference from yours ? I mean why using "2**x" "2**x" ?

When i use "exp(x)" the values are different

-> I use the Exp2 polynomials to calculate Exp()
2**x = 2 raised to the power of x = Exp2()
Why? because, Exp() calculates the value of e (the base of natural logarithms) raised to the power of x.

-> By simplifying the math, we have 1 set of polynomials to calculate 2 functions, Exp() and Exp2()

Exp(x) = Exp2(x / Logn(2.0))
Create a reciprocal: 1/Logn(2.0) = 1.442695
Now we can use ( x * 1.442695 ) to calculate Exp().

QuoteOther question, why using...
Code Select Expand
movaps xmm2,oword ptr Chebylog2E mulps xmm2,xmm0 psrld xmm0,31 ; Why ? cvttps2dq xmm1,xmm2 psubd xmm1,xmm0 movdqa xmm0,xmm1 cvtdq2ps xmm1,xmm1 subps xmm2,xmm1 movaps xmm1,oword ptr Cheby10Exp0 pslld xmm0,23

-> In SSE2 we don't have a floor instruction ( SSE4_1 does: roundps xmm1,xmm0,1 )
We need that to work outside the range of the polynomials.
So we have to create it ourselfs:

Code Select

    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31		; put sign-bits in position
    cvttps2dq   xmm1,xmm2	; convert with truncation, packed real4 to packed signed 32bit integers
    psubd       xmm1,xmm0	; signed integers - sign bits
    movdqa      xmm0,xmm1	; save them
    cvtdq2ps    xmm1,xmm1	; convert 32bit packed integers to packed real4
    subps       xmm2,xmm1	; get the fractional parts ( range 0.0 to 1.0 )
    movaps      xmm1,oword ptr Cheby10Exp0
    pslld       xmm0,23		; shift 32bit integers 23 bits to the left, ( size of real4 mantissa ) into the exponent fields

at the end of the routine:
    paddd       xmm0,xmm1	; add the exponent fields to the real4 results

Quotewouldn´t only be necessary multiply this new value ?

Why not using only ?

mulsd xmm0 X$Chebylog2E and later add and multiply by the remaining values ?
Logged

-> Because it doesn't work that way, we need a range to work with the polynomials.

The MASM Forum

News:

Fast Exp approximation

jj2007

guga

Siekmanski

guga

jj2007

guga

jj2007

guga

guga

jack

Siekmanski

Siekmanski

guga

guga

Siekmanski