Author Topic: Fast Exp approximation  (Read 1511 times)

jj2007

  • Member
  • *****
  • Posts: 10636
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Exp approximation
« Reply #30 on: August 13, 2020, 07:20:29 PM »
The assembler don't use the FPU (or SSE) so it doesn't set the underflow flag but rather the function that convert the string sets errno to ERANGE. I assume the changes in Masm (and Uasm) has to do with external library functions rather than any deliberate changes to the assembler.

    errno = 0; /* v2.11: init errno; errno is set on over- and under-flow */
    double_value = strtod( inp, NULL );

Of course it's not a runtime check. When you try to assemble somebyte db 256, the assembler checks the bounds and throws an error. Same for somedouble REAL8 1.0e-400 - that's an error, and the assembler knows it. But there is a grey area in this case due to the capacity of the FPU to load "denormalised" values beyond 2.22e-308. They behave like legit numbers, so those who need that (a very exotic case, but we stumbled over it in this thread) should be able to use this feature.

Therefore my proposal: issue a warning for the grey area. Maybe only once, I hate long listings of identical warnings.

At present, you get fatal error A1012: error count exceeds 100; stopping assembly, and that clearly is not necessary, given that the code runs just fine when assembled with UAasm.

Sheer luxury would be a one-liner:

W1000: 112 denormalised numbers between lines 14 and 39 :cool:

Siekmanski

  • Member
  • *****
  • Posts: 2365
Re: Fast Exp approximation
« Reply #31 on: August 14, 2020, 01:03:38 AM »
Hi guga,

Very fast real4 SSE2 Exponent routine.
Up to 4 exponents at once.
Precision: 7 digits

Code: [Select]
.const
Chebylog2E  real4 4 dup (1.44269504089)

; 5th Degree Chebyshev ( Remez Algorithm ) Polynomials
Cheby5Exp0  real4 4 dup (0.0018775767)
Cheby5Exp1  real4 4 dup (0.0089893397)
Cheby5Exp2  real4 4 dup (0.055826318)
Cheby5Exp3  real4 4 dup (0.24015361)
Cheby5Exp4  real4 4 dup (0.69315308)
Cheby5Exp5  real4 4 dup (0.99999994)

.code
align 16
SSE2_Exp:  ; in: xmm0, out: xmm0
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby5Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp5
    paddd       xmm0,xmm1
    ret
Creative coders use backward thinking techniques as a strategy.

nidud

  • Member
  • *****
  • Posts: 1989
    • https://github.com/nidud/asmc
Re: Fast Exp approximation
« Reply #32 on: August 14, 2020, 01:06:50 AM »
Quote
Same for somedouble REAL8 1.0e-400 - that's an error, and the assembler knows it.

Why is that an error? According to your logic (and some assemblers) this is just zero.

Quote
But there is a grey area in this case due to the capacity of the FPU to load "denormalised" values beyond 2.22e-308.

The FPU is 80 bit and not 64, so again:

    f tbyte 2.42294657314453310e-310 ; legal 80 bit input
    x real8 ? ; 64 bit result
...
    fld f ; load 80 bits value (Asmc loads 128 bits)
    fstp x ; convert to 64 bit <-- error:0030: bit 4 and 5 set

So the FPU, Asmc, Jwasm and all versions of Masm up to v14 are in agreement here. This means something happened to the MS version of strtod() at some point.

Siekmanski

  • Member
  • *****
  • Posts: 2365
Re: Fast Exp approximation
« Reply #33 on: August 14, 2020, 01:14:10 AM »
SSE4_1 version, faster than the SSE2 version.

Code: [Select]
align 16
SSE41_Exp: ; in: xmm0, out: xmm0
    mulps       xmm0,oword ptr Chebylog2E
    roundps     xmm1,xmm0,1
    subps       xmm0,xmm1
    cvtps2dq    xmm2,xmm1
    pslld       xmm2,23
    movaps      xmm1,xmm0
    mulps       xmm0,oword ptr Cheby5Exp0
    addps       xmm0,oword ptr Cheby5Exp1
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp2
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp3
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp4
    mulps       xmm0,xmm1
    addps       xmm0,oword ptr Cheby5Exp5
    paddd       xmm0,xmm2
    ret
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 10636
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Exp approximation
« Reply #34 on: August 14, 2020, 02:20:26 AM »
@nidud: It seems you just want to argue. Frustrated? Can we help you?

Code: [Select]
include \masm32\MasmBasic\MasmBasic.inc
NotSoSmall REAL8 1.234567890123456e-308 ; legal for all assemblers, even AsmC
VerySmall REAL8 1.234567890123456e-322 ; legal for UAsm and MASM 14+
; TooSmall REAL8 1.234567890123456e-324 ; illegal even for UAsm
Result REAL8 ?
  Init
  fld NotSoSmall
  fmul FP8(1.0e154)
  fmul FP8(1.0e154)
  fstp Result
  Print Str$("NotSoSmall*1e308=\t%Jf\n", Result)
  fld VerySmall
  fmul FP8(1.0e161)
  fmul FP8(1.0e161)
  fstp Result
  Inkey Str$("VerySmall*1e322=\t%Jf\n", Result)
EndOfCode

Output:
NotSoSmall*1e308=       1.234567890123456246
VerySmall*1e322=        1.235164114603116481

guga

  • Member
  • *****
  • Posts: 1344
  • Assembly is a state of art.
    • RosAsm
Re: Fast Exp approximation
« Reply #35 on: August 14, 2020, 02:51:24 AM »
Hi guga,

Very fast real4 SSE2 Exponent routine.
Up to 4 exponents at once.
Precision: 7 digits

Code: [Select]
.const
Chebylog2E  real4 4 dup (1.44269504089)

; 5th Degree Chebyshev ( Remez Algorithm ) Polynomials
Cheby5Exp0  real4 4 dup (0.0018775767)
Cheby5Exp1  real4 4 dup (0.0089893397)
Cheby5Exp2  real4 4 dup (0.055826318)
Cheby5Exp3  real4 4 dup (0.24015361)
Cheby5Exp4  real4 4 dup (0.69315308)
Cheby5Exp5  real4 4 dup (0.99999994)

.code
align 16
SSE2_Exp:  ; in: xmm0, out: xmm0
    movaps      xmm2,oword ptr Chebylog2E
    mulps       xmm2,xmm0
    psrld       xmm0,31
    cvttps2dq   xmm1,xmm2
    psubd       xmm1,xmm0
    movdqa      xmm0,xmm1
    cvtdq2ps    xmm1,xmm1
    subps       xmm2,xmm1
    movaps      xmm1,oword ptr Cheby5Exp0
    pslld       xmm0,23
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp1
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp2
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp3
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp4
    mulps       xmm1,xmm2
    addps       xmm1,oword ptr Cheby5Exp5
    paddd       xmm0,xmm1
    ret


Great ! A couple of questions:


1 - How did you got those numbers of ChebyXXX ? How to calculate this cheby values ?

2 - Can we improve the precision to the 13Th digit ? What is the cost in terms of speed ?

3 - what if we extend it to a 10th Degree Chebyshev ( Remez Algorithm ) Polynomials ? Is it possible to grant more accuracy ?

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

  • Member
  • *****
  • Posts: 2365
Re: Fast Exp approximation
« Reply #36 on: August 14, 2020, 04:30:26 AM »
Hi guga,

I had the calculated polynomials already on my computer for a few years.
You can create them with the program called LolRemez.
It's open source.

https://github.com/samhocevar/lolremez

Maybe somebody here on the forum would be so kind to compile the program for us in Visual Studio?  :eusa_pray:

https://github.com/samhocevar/lolremez/releases/download/v0.6/lolremez-0.6.zip
Creative coders use backward thinking techniques as a strategy.

guga

  • Member
  • *****
  • Posts: 1344
  • Assembly is a state of art.
    • RosAsm
Re: Fast Exp approximation
« Reply #37 on: August 14, 2020, 05:36:21 AM »
I guess i did it  :thumbsup:


See if it is working as expected, please. I had to install github on the desktop to make this thing compile :mrgreen: :mrgreen: :mrgreen:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

  • Member
  • *****
  • Posts: 2365
Re: Fast Exp approximation
« Reply #38 on: August 14, 2020, 06:16:40 AM »
 :thumbsup:

lolremez -d 10 -r 0:1 "2**x" "2**x"


// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998
894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.333354373097474
3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*
.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}
Creative coders use backward thinking techniques as a strategy.

nidud

  • Member
  • *****
  • Posts: 1989
    • https://github.com/nidud/asmc
Re: Fast Exp approximation
« Reply #39 on: August 14, 2020, 06:32:37 AM »
@nidud: It seems you just want to argue. Frustrated? Can we help you?

 :biggrin:

Referring to yourself in plural now?

What I say (or argue) is this:
The Masm @Version for Asmc (and Jwasm/Uasm) is 8, so this means that this error is expected and thus correct.

The gray area you refer to doesn't really exist. The conversion fails on under/overflow so it doesn't matter if you leave one bit or shift out all of them: the number is equally erroneous as 2 (e-323) and 0 (e-234).

Nasm accepts all of these without any errors and so does the compilers (CL/GCC). Masm trows an error at the last one. The output for (e-323) is 2 by the compilers and 5 for the assemblers (Asmc included).

So this covers one end of real8's and you have real2, 4, 10, and 16 in addition so I think I will hold my beer on this one. In the meantime if you want to play with erroneous floats use hexadecimal notation as normally done in these cases.

Quote
Output:
NotSoSmall*1e308=       1.234567890123456246
VerySmall*1e322=        1.235164114603116481


V = 1.234567890123456e-322
%echo @CatStr(%(V*1.0e161*1.0e161))

1.234567890123456

guga

  • Member
  • *****
  • Posts: 1344
  • Assembly is a state of art.
    • RosAsm
Re: Fast Exp approximation
« Reply #40 on: August 14, 2020, 06:42:51 AM »
:thumbsup:

lolremez -d 10 -r 0:1 "2**x" "2**x"


// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 10.
// p(x)=(((((((((9.9522144894077596e-9*x+9.4609455637620191e-8)*x+1.331271998
894e-6)*x+1.5244659656760603e-5)*x+1.5403957490414841e-4)*x+1.333354373097474
3)*x+9.6181294091749148e-3)*x+5.5504108628244985e-2)*x+2.402265069613608e-1)*
.9314718055989127e-1)*x+1.0000000000000002
double f(double x)
{
    double u = 9.9522144894077596e-9;
    u = u * x + 9.4609455637620191e-8;
    u = u * x + 1.331271998884894e-6;
    u = u * x + 1.5244659656760603e-5;
    u = u * x + 1.5403957490414841e-4;
    u = u * x + 1.3333543730974749e-3;
    u = u * x + 9.6181294091749148e-3;
    u = u * x + 5.5504108628244985e-2;
    u = u * x + 2.402265069613608e-1;
    u = u * x + 6.9314718055989127e-1;
    return u * x + 1.0000000000000002;
}

Hi Marinus

Thanks... :thumbsup: :thumbsup: :thumbsup:

This is what i would ask yo8u When do you know which range to use ?

I mean, on https://github.com/samhocevar/lolremez/wiki/Tutorial-1-of-5%3A-exp%28x%29-the-quick-way  It shows a example of a exp(x) function with a range of -1 to 1. Yours is a range of 0 to 1. So, i donĀ“t get it. We need to use the value of x always normalized ???

Because even when we insert a range of 0 to 1 with exp(x), the values are different from the ones you posted.

Btw... why did you inputed "2**x" "2**x" to calculate exp(x) ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

  • Member
  • *****
  • Posts: 2365
Re: Fast Exp approximation
« Reply #41 on: August 14, 2020, 07:08:51 AM »
2**x is just the notation for the Exponent function, the second one is the weight function.
Just as you, I have to find my way in this LolRemez program.  :thumbsup:

The example -1 to 1 is often used in audio.
You stay always in that range, so you don't have compensation code for numbers outside that range.

I just tried to get the polynomials I used in my SSE2 example.
They are almost the same.
I will use the new ones. ( should be more precise since LolRemez has evolved to version 0.6  :cool:)

lolremez -d 5 -r 0:1 "2**x" "2**x"

// Approximation of f(x) = 2**x
// with weight function g(x) = 2**x
// on interval [ 0, 1 ]
// with a polynomial of degree 5.
// p(x)=((((1.8775766751914795e-3*x+8.9893400904946651e-3)*x+5.5826318053295672e
-2)*x+2.401536170443754e-1)*x+6.9315307320016896e-1)*x+9.9999992506352618e-1
double f(double x)
{
    double u = 1.8775766751914795e-3;
    u = u * x + 8.9893400904946651e-3;
    u = u * x + 5.5826318053295672e-2;
    u = u * x + 2.401536170443754e-1;
    u = u * x + 6.9315307320016896e-1;
    return u * x + 9.9999992506352618e-1;
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 10636
  • Assembler is fun ;-)
    • MasmBasic
Re: Fast Exp approximation
« Reply #42 on: August 14, 2020, 08:23:28 AM »
Code: [Select]
  fld FP4(0.5) ; x
  fld st
  fld st
  fld st
  fld st
  fmul FP8(1.8775766751914795e-3)
  fadd FP8(8.9893400904946651e-3)
  fmul
  fadd FP8(5.5826318053295672e-2)
  fmul
  fadd FP8(2.401536170443754e-1)
  fmul
  fadd FP8(6.9315307320016896e-1)
  fmul
  fadd FP8(9.9999992506352618e-1) ; result on FPU
  PrintLine Str$("2^0.5=\t  %Jf", ST(0)v), CrLf$, "expected: 1.414213562373095048"

Output:
Code: [Select]
2^0.5=    1.414213663708122130
expected: 1.414213562373095048

Siekmanski

  • Member
  • *****
  • Posts: 2365
Re: Fast Exp approximation
« Reply #43 on: August 14, 2020, 08:32:19 AM »
Guga,
Thank you for compiling this piece of GOLD for me in Visual Studio.
LolRemez is really a very valuable program.
It opens doors to get very fast trigonometric functions with polynomials exactly the way I want them.

Hi Jochen,
This is cool stuff or not?
Now we can adjust the precision to real4 or real8 just by finding the correct degree of polynomials.
Creative coders use backward thinking techniques as a strategy.

Siekmanski

  • Member
  • *****
  • Posts: 2365
Re: Fast Exp approximation
« Reply #44 on: August 14, 2020, 08:48:42 AM »
Jochen,

You calculated Exp2

Exp2(0.5) is equal to sqrt(2.0)
It calculates the value of 2 raised to the power of x

Here is the FPU function for Exp
It calculates the value of e (the base of natural logarithms) raised to the power of x

Code: [Select]
FPU_Exp:
; st(0) == x
    fldl2e
    fmulp       st(1), st(0)
    fld         st(0)
    frndint
    fsub        st(1), st(0)
    fxch
    f2xm1
    fld1
    faddp
    fscale
    fstp        st(1)
    ret

result: 1.648721270700 ( for 0.5 )
Creative coders use backward thinking techniques as a strategy.