Fast Exp approximation

jj2007 · August 13, 2020, 07:20:29 PM

Quote from: nidud on August 13, 2020, 11:43:27 AMThe assembler don't use the FPU (or SSE) so it doesn't set the underflow flag but rather the function that convert the string sets errno to ERANGE. I assume the changes in Masm (and Uasm) has to do with external library functions rather than any deliberate changes to the assembler.

errno = 0; /* v2.11: init errno; errno is set on over- and under-flow */
double_value = strtod( inp, NULL );

Of course it's not a runtime check. When you try to assemble somebyte db 256, the assembler checks the bounds and throws an error. Same for somedouble REAL8 1.0e-400 - that's an error, and the assembler knows it. But there is a grey area in this case due to the capacity of the FPU to load "denormalised" values beyond 2.22e-308. They behave like legit numbers, so those who need that (a very exotic case, but we stumbled over it in this thread) should be able to use this feature.

Therefore my proposal: issue a warning for the grey area. Maybe only once, I hate long listings of identical warnings.

At present, you get fatal error A1012: error count exceeds 100; stopping assembly, and that clearly is not necessary, given that the code runs just fine when assembled with UAasm.

Sheer luxury would be a one-liner:

W1000: 112 denormalised numbers between lines 14 and 39

Siekmanski · August 14, 2020, 01:03:38 AM

Hi guga,

Very fast real4 SSE2 Exponent routine.
Up to 4 exponents at once.
Precision: 7 digits

Code Select

.const
Chebylog2E  real4 4 dup (1.44269504089)

; 5th Degree Chebyshev ( Remez Algorithm ) Polynomials
Cheby5Exp0  real4 4 dup (0.0018775767)
Cheby5Exp1  real4 4 dup (0.0089893397)
Cheby5Exp2  real4 4 dup (0.055826318)
Cheby5Exp3  real4 4 dup (0.24015361)
Cheby5Exp4  real4 4 dup (0.69315308)
Cheby5Exp5  real4 4 dup (0.99999994)

.code
align 16
SSE2_Exp:  ; in: xmm0, out: xmm0
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby5Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp5
    paddd       xmm0,xmm1
    ret

nidud · August 14, 2020, 01:06:50 AM

deleted

Siekmanski · August 14, 2020, 01:14:10 AM

SSE4_1 version, faster than the SSE2 version.

Code Select

align 16
SSE41_Exp: ; in: xmm0, out: xmm0
    mulps       xmm0,oword ptr Chebylog2E
    roundps     xmm1,xmm0,1
    subps       xmm0,xmm1
    cvtps2dq    xmm2,xmm1
    pslld       xmm2,23
    movaps      xmm1,xmm0
    mulps       xmm0,oword ptr Cheby5Exp0
    addps       xmm0,oword ptr Cheby5Exp1
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp2
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp3
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp4
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp5
    paddd       xmm0,xmm2
    ret

jj2007 · August 14, 2020, 02:20:26 AM

@nidud: It seems you just want to argue. Frustrated? Can we help you?

Code Select

include \masm32\MasmBasic\MasmBasic.inc
NotSoSmall	REAL8 1.234567890123456e-308	; legal for all assemblers, even AsmC
VerySmall	REAL8 1.234567890123456e-322	; legal for UAsm and MASM 14+
; TooSmall	REAL8 1.234567890123456e-324	; illegal even for UAsm
Result	REAL8 ?
  Init
  fld NotSoSmall
  fmul FP8(1.0e154)
  fmul FP8(1.0e154)
  fstp Result
  Print Str$("NotSoSmall*1e308=\t%Jf\n", Result)
  fld VerySmall
  fmul FP8(1.0e161)
  fmul FP8(1.0e161)
  fstp Result
  Inkey Str$("VerySmall*1e322=\t%Jf\n", Result)
EndOfCode

Output:
NotSoSmall*1e308= 1.234567890123456246
VerySmall*1e322= 1.235164114603116481

guga · August 14, 2020, 02:51:24 AM

Quote from: Siekmanski on August 14, 2020, 01:03:38 AM
Hi guga,

Very fast real4 SSE2 Exponent routine.
Up to 4 exponents at once.
Precision: 7 digits

Code Select Expand
.const Chebylog2E real4 4 dup (1.44269504089) ; 5th Degree Chebyshev ( Remez Algorithm ) Polynomials Cheby5Exp0 real4 4 dup (0.0018775767) Cheby5Exp1 real4 4 dup (0.0089893397) Cheby5Exp2 real4 4 dup (0.055826318) Cheby5Exp3 real4 4 dup (0.24015361) Cheby5Exp4 real4 4 dup (0.69315308) Cheby5Exp5 real4 4 dup (0.99999994) .code align 16 SSE2_Exp: ; in: xmm0, out: xmm0 movaps xmm2,oword ptr Chebylog2E mulps xmm2,xmm0 psrld xmm0,31 cvttps2dq xmm1,xmm2 psubd xmm1,xmm0 movdqa xmm0,xmm1 cvtdq2ps xmm1,xmm1 subps xmm2,xmm1 movaps xmm1,oword ptr Cheby5Exp0 pslld xmm0,23 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby5Exp1 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby5Exp2 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby5Exp3 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby5Exp4 mulps xmm1,xmm2 addps xmm1,oword ptr Cheby5Exp5 paddd xmm0,xmm1 ret

Great ! A couple of questions:

1 - How did you got those numbers of ChebyXXX ? How to calculate this cheby values ?

2 - Can we improve the precision to the 13Th digit ? What is the cost in terms of speed ?

3 - what if we extend it to a 10th Degree Chebyshev ( Remez Algorithm ) Polynomials ? Is it possible to grant more accuracy ?

Siekmanski · August 14, 2020, 04:30:26 AM

Hi guga,

I had the calculated polynomials already on my computer for a few years.
You can create them with the program called LolRemez.
It's open source.

https://github.com/samhocevar/lolremez

Maybe somebody here on the forum would be so kind to compile the program for us in Visual Studio?

https://github.com/samhocevar/lolremez/releases/download/v0.6/lolremez-0.6.zip

guga · August 14, 2020, 05:36:21 AM

I guess i did it

See if it is working as expected, please. I had to install github on the desktop to make this thing compile

Siekmanski · August 14, 2020, 06:16:40 AM

lolremez -d 10 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998
894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.333354373097474
3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*
.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
double u = 9.9522144894077596e-9;
u = u * x + 9.4609455637620191e-8;
u = u * x + 1.331271998884894e-6;
u = u * x + 1.5244659656760603e-5;
u = u * x + 1.5403957490414841e-4;
u = u * x + 1.3333543730974749e-3;
u = u * x + 9.6181294091749148e-3;
u = u * x + 5.5504108628244985e-2;
u = u * x + 2.402265069613608e-1;
u = u * x + 6.9314718055989127e-1;
return u * x + 1.0000000000000002;
}

nidud · August 14, 2020, 06:32:37 AM

deleted

guga · August 14, 2020, 06:42:51 AM

Quote from: Siekmanski on August 14, 2020, 06:16:40 AM

lolremez -d 10 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998
894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.333354373097474
3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*
.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
double u = 9.9522144894077596e-9;
u = u * x + 9.4609455637620191e-8;
u = u * x + 1.331271998884894e-6;
u = u * x + 1.5244659656760603e-5;
u = u * x + 1.5403957490414841e-4;
u = u * x + 1.3333543730974749e-3;
u = u * x + 9.6181294091749148e-3;
u = u * x + 5.5504108628244985e-2;
u = u * x + 2.402265069613608e-1;
u = u * x + 6.9314718055989127e-1;
return u * x + 1.0000000000000002;
}

Hi Marinus

Thanks...

This is what i would ask yo8u When do you know which range to use ?

I mean, on https://github.com/samhocevar/lolremez/wiki/Tutorial-1-of-5%3A-exp%28x%29-the-quick-way It shows a example of a exp(x) function with a range of -1 to 1. Yours is a range of 0 to 1. So, i don´t get it. We need to use the value of x always normalized ???

Because even when we insert a range of 0 to 1 with exp(x), the values are different from the ones you posted.

Btw... why did you inputed "2**x" "2**x" to calculate exp(x) ?

Siekmanski · August 14, 2020, 07:08:51 AM

2**x is just the notation for the Exponent function, the second one is the weight function.
Just as you, I have to find my way in this LolRemez program.

The example -1 to 1 is often used in audio.
You stay always in that range, so you don't have compensation code for numbers outside that range.

I just tried to get the polynomials I used in my SSE2 example.
They are almost the same.
I will use the new ones. ( should be more precise since LolRemez has evolved to version 0.6

)

lolremez -d 5 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 5.
// p(x)=((((1.8775766751914795e-3*x+8.9893400904946651e-3)*x+5.5826318053295672e
-2)*x+2.401536170443754e-1)*x+6.9315307320016896e-1)*x+9.9999992506352618e-1
double f(double x)
{
double u = 1.8775766751914795e-3;
u = u * x + 8.9893400904946651e-3;
u = u * x + 5.5826318053295672e-2;
u = u * x + 2.401536170443754e-1;
u = u * x + 6.9315307320016896e-1;
return u * x + 9.9999992506352618e-1;

jj2007 · August 14, 2020, 08:23:28 AM

Code Select

  fld FP4(0.5)	; x
  fld st
  fld st
  fld st
  fld st
  fmul FP8(1.8775766751914795e-3)
  fadd FP8(8.9893400904946651e-3)
  fmul
  fadd FP8(5.5826318053295672e-2)
  fmul
  fadd FP8(2.401536170443754e-1)
  fmul
  fadd FP8(6.9315307320016896e-1)
  fmul
  fadd FP8(9.9999992506352618e-1)	; result on FPU
  PrintLine Str$("2^0.5=\t  %Jf", ST(0)v), CrLf$, "expected: 1.414213562373095048"

Output:

Code Select

2^0.5=    1.414213663708122130
expected: 1.414213562373095048

Siekmanski · August 14, 2020, 08:32:19 AM

Guga,
Thank you for compiling this piece of GOLD for me in Visual Studio.
LolRemez is really a very valuable program.
It opens doors to get very fast trigonometric functions with polynomials exactly the way I want them.

Hi Jochen,
This is cool stuff or not?
Now we can adjust the precision to real4 or real8 just by finding the correct degree of polynomials.

Siekmanski · August 14, 2020, 08:48:42 AM

Jochen,

You calculated Exp2

Exp2(0.5) is equal to sqrt(2.0)
It calculates the value of 2 raised to the power of x

Here is the FPU function for Exp
It calculates the value of e (the base of natural logarithms) raised to the power of x

Code Select

FPU_Exp:
; st(0) == x
    fldl2e
    fmulp       st(1), st(0)
    fld         st(0)
    frndint
    fsub        st(1), st(0)
    fxch
    f2xm1
    fld1
    faddp
    fscale
    fstp        st(1)
    ret

result: 1.648721270700 ( for 0.5 )

The MASM Forum

News:

Fast Exp approximation

jj2007

Siekmanski

nidud

Siekmanski

jj2007

guga

Siekmanski

guga

Siekmanski

nidud

guga

Siekmanski

jj2007

Siekmanski

Siekmanski