News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Fast Exp approximation

Started by guga, August 12, 2020, 12:29:23 AM

Previous topic - Next topic

jj2007

Quote from: nidud on August 13, 2020, 11:43:27 AMThe assembler don't use the FPU (or SSE) so it doesn't set the underflow flag but rather the function that convert the string sets errno to ERANGE. I assume the changes in Masm (and Uasm) has to do with external library functions rather than any deliberate changes to the assembler.

    errno = 0; /* v2.11: init errno; errno is set on over- and under-flow */
    double_value = strtod( inp, NULL );

Of course it's not a runtime check. When you try to assemble somebyte db 256, the assembler checks the bounds and throws an error. Same for somedouble REAL8 1.0e-400 - that's an error, and the assembler knows it. But there is a grey area in this case due to the capacity of the FPU to load "denormalised" values beyond 2.22e-308. They behave like legit numbers, so those who need that (a very exotic case, but we stumbled over it in this thread) should be able to use this feature.

Therefore my proposal: issue a warning for the grey area. Maybe only once, I hate long listings of identical warnings.

At present, you get fatal error A1012: error count exceeds 100; stopping assembly, and that clearly is not necessary, given that the code runs just fine when assembled with UAasm.

Sheer luxury would be a one-liner:

W1000: 112 denormalised numbers between lines 14 and 39 :cool:

Siekmanski

Hi guga,

Very fast real4 SSE2 Exponent routine.
Up to 4 exponents at once.
Precision: 7 digits

.const
Chebylog2E  real4 4 dup (1.44269504089)

; 5th Degree Chebyshev ( Remez Algorithm ) Polynomials
Cheby5Exp0  real4 4 dup (0.0018775767)
Cheby5Exp1  real4 4 dup (0.0089893397)
Cheby5Exp2  real4 4 dup (0.055826318)
Cheby5Exp3  real4 4 dup (0.24015361)
Cheby5Exp4  real4 4 dup (0.69315308)
Cheby5Exp5  real4 4 dup (0.99999994)

.code
align 16
SSE2_Exp:  ; in: xmm0, out: xmm0
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby5Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp5
    paddd       xmm0,xmm1
    ret
Creative coders use backward thinking techniques as a strategy.

nidud

#32
deleted

Siekmanski

SSE4_1 version, faster than the SSE2 version.

align 16
SSE41_Exp: ; in: xmm0, out: xmm0
    mulps       xmm0,oword ptr Chebylog2E
    roundps     xmm1,xmm0,1
    subps       xmm0,xmm1
    cvtps2dq    xmm2,xmm1
    pslld       xmm2,23
    movaps      xmm1,xmm0
    mulps       xmm0,oword ptr Cheby5Exp0
    addps       xmm0,oword ptr Cheby5Exp1
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp2
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp3
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp4
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp5
    paddd       xmm0,xmm2
    ret
Creative coders use backward thinking techniques as a strategy.

jj2007

@nidud: It seems you just want to argue. Frustrated? Can we help you?

include \masm32\MasmBasic\MasmBasic.inc
NotSoSmall REAL8 1.234567890123456e-308 ; legal for all assemblers, even AsmC
VerySmall REAL8 1.234567890123456e-322 ; legal for UAsm and MASM 14+
; TooSmall REAL8 1.234567890123456e-324 ; illegal even for UAsm
Result REAL8 ?
  Init
  fld NotSoSmall
  fmul FP8(1.0e154)
  fmul FP8(1.0e154)
  fstp Result
  Print Str$("NotSoSmall*1e308=\t%Jf\n", Result)
  fld VerySmall
  fmul FP8(1.0e161)
  fmul FP8(1.0e161)
  fstp Result
  Inkey Str$("VerySmall*1e322=\t%Jf\n", Result)
EndOfCode


Output:
NotSoSmall*1e308=       1.234567890123456246
VerySmall*1e322=        1.235164114603116481

guga

Quote from: Siekmanski on August 14, 2020, 01:03:38 AM
Hi guga,

Very fast real4 SSE2 Exponent routine.
Up to 4 exponents at once.
Precision: 7 digits

.const
Chebylog2E  real4 4 dup (1.44269504089)

; 5th Degree Chebyshev ( Remez Algorithm ) Polynomials
Cheby5Exp0  real4 4 dup (0.0018775767)
Cheby5Exp1  real4 4 dup (0.0089893397)
Cheby5Exp2  real4 4 dup (0.055826318)
Cheby5Exp3  real4 4 dup (0.24015361)
Cheby5Exp4  real4 4 dup (0.69315308)
Cheby5Exp5  real4 4 dup (0.99999994)

.code
align 16
SSE2_Exp:  ; in: xmm0, out: xmm0
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby5Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp5
    paddd       xmm0,xmm1
    ret



Great ! A couple of questions:


1 - How did you got those numbers of ChebyXXX ? How to calculate this cheby values ?

2 - Can we improve the precision to the 13Th digit ? What is the cost in terms of speed ?

3 - what if we extend it to a 10th Degree Chebyshev ( Remez Algorithm ) Polynomials ? Is it possible to grant more accuracy ?

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

Hi guga,

I had the calculated polynomials already on my computer for a few years.
You can create them with the program called LolRemez.
It's open source.

https://github.com/samhocevar/lolremez

Maybe somebody here on the forum would be so kind to compile the program for us in Visual Studio?  :eusa_pray:

https://github.com/samhocevar/lolremez/releases/download/v0.6/lolremez-0.6.zip
Creative coders use backward thinking techniques as a strategy.

guga

I guess i did it  :thumbsup:


See if it is working as expected, please. I had to install github on the desktop to make this thing compile :mrgreen: :mrgreen: :mrgreen:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

 :thumbsup:

lolremez -d 10 -r 0:1 "2**x" "2**x"


// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998
894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.333354373097474
3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*
.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}
Creative coders use backward thinking techniques as a strategy.

nidud

#39
deleted

guga

Quote from: Siekmanski on August 14, 2020, 06:16:40 AM
:thumbsup:

lolremez -d 10 -r 0:1 "2**x" "2**x"


// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998
894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.333354373097474
3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*
.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}

Hi Marinus

Thanks... :thumbsup: :thumbsup: :thumbsup:

This is what i would ask yo8u When do you know which range to use ?

I mean, on https://github.com/samhocevar/lolremez/wiki/Tutorial-1-of-5%3A-exp%28x%29-the-quick-way  It shows a example of a exp(x) function with a range of -1 to 1. Yours is a range of 0 to 1. So, i donĀ“t get it. We need to use the value of x always normalized ???

Because even when we insert a range of 0 to 1 with exp(x), the values are different from the ones you posted.

Btw... why did you inputed "2**x" "2**x" to calculate exp(x) ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

2**x is just the notation for the Exponent function, the second one is the weight function.
Just as you, I have to find my way in this LolRemez program.  :thumbsup:

The example -1 to 1 is often used in audio.
You stay always in that range, so you don't have compensation code for numbers outside that range.

I just tried to get the polynomials I used in my SSE2 example.
They are almost the same.
I will use the new ones. ( should be more precise since LolRemez has evolved to version 0.6  :cool:)

lolremez -d 5 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 5.
// p(x)=((((1.8775766751914795e-3*x+8.9893400904946651e-3)*x+5.5826318053295672e
-2)*x+2.401536170443754e-1)*x+6.9315307320016896e-1)*x+9.9999992506352618e-1
double f(double x)
{
    double u = 1.8775766751914795e-3;
    u = u * x + 8.9893400904946651e-3;
    u = u * x + 5.5826318053295672e-2;
    u = u * x + 2.401536170443754e-1;
    u = u * x + 6.9315307320016896e-1;
    return u * x + 9.9999992506352618e-1;
Creative coders use backward thinking techniques as a strategy.

jj2007

  fld FP4(0.5) ; x
  fld st
  fld st
  fld st
  fld st
  fmul FP8(1.8775766751914795e-3)
  fadd FP8(8.9893400904946651e-3)
  fmul
  fadd FP8(5.5826318053295672e-2)
  fmul
  fadd FP8(2.401536170443754e-1)
  fmul
  fadd FP8(6.9315307320016896e-1)
  fmul
  fadd FP8(9.9999992506352618e-1) ; result on FPU
  PrintLine Str$("2^0.5=\t  %Jf", ST(0)v), CrLf$, "expected: 1.414213562373095048"


Output:
2^0.5=    1.414213663708122130
expected: 1.414213562373095048

Siekmanski

Guga,
Thank you for compiling this piece of GOLD for me in Visual Studio.
LolRemez is really a very valuable program.
It opens doors to get very fast trigonometric functions with polynomials exactly the way I want them.

Hi Jochen,
This is cool stuff or not?
Now we can adjust the precision to real4 or real8 just by finding the correct degree of polynomials.
Creative coders use backward thinking techniques as a strategy.

Siekmanski

Jochen,

You calculated Exp2

Exp2(0.5) is equal to sqrt(2.0)
It calculates the value of 2 raised to the power of x

Here is the FPU function for Exp
It calculates the value of e (the base of natural logarithms) raised to the power of x

FPU_Exp:
; st(0) == x
    fldl2e
    fmulp       st(1), st(0)
    fld         st(0)
    frndint
    fsub        st(1), st(0)
    fxch
    f2xm1
    fld1
    faddp
    fscale
    fstp        st(1)
    ret


result: 1.648721270700 ( for 0.5 )
Creative coders use backward thinking techniques as a strategy.