### Author Topic: FPU power  (Read 9409 times)

#### Siekmanski

• Member
• Posts: 1553
##### Re: FPU power
« Reply #30 on: October 07, 2015, 03:00:11 AM »
Am I right, that pow(x,y) is the same as exp(y * log(x)) ?

For example: pow(2.0, 0.5) =  1.414213562373 as calculated in the routines from Dave and JJ2007.
But if I calculate: exp(0.5 * log(2.0)).
I get 1.16243273894 as a result on my calculator.

log(2.0) = 0.30102999566398119521
exp(0.5 * 0.30102999566398119521) = 1.16243273894

Below is a fast exp approximation routine. (The log() value is hard-coded; it needs a fast log routine.)
It's not exact, but it's fast.

Code: [Select]
```
.data
exp0   real8 0.99962789571721377560130024774
exp1   real8 0.99793872910703643074502373922
exp2   real8 0.50289865085404914825661658491
exp3   real8 0.17648623219024696305508965346
exp4   real8 0.03996291422520886755278730528
Y      real8 0.5
Xlog_2 real8 0.30102999566398119521 ; log(2.0)
Result real8 0.0

.code
    movsd xmm0,real8 ptr [Xlog_2]
    mulsd xmm0,real8 ptr [Y]
    movsd xmm1,real8 ptr [exp4]
    mulsd xmm1,xmm0
    addsd xmm1,real8 ptr [exp3]
    mulsd xmm1,xmm0
    addsd xmm1,real8 ptr [exp2]
    mulsd xmm1,xmm0
    addsd xmm1,real8 ptr [exp1]
    mulsd xmm0,xmm1
    addsd xmm0,real8 ptr [exp0]
    movsd Result,xmm0 ; result = 1.161847999603
    ret
```
Creative coders use backward thinking techniques as their strategy.

#### gelatine1

• Member
• Posts: 217
##### Re: FPU power
« Reply #31 on: October 07, 2015, 05:15:59 AM »
exp(y * ln(x)) is equal to x^y. (ln is log with base e...). I calculated exp(0.5 * ln(2.0)) and it is equal to 1.414213562373 as expected.

#### rrr314159

• Member
• Posts: 1382
##### Re: FPU power
« Reply #32 on: October 07, 2015, 05:50:16 AM »
Quote from: Siekmanski
Am I right, that pow(x,y) is the same as exp(y * log(x)) ?

Close! But you've mixed up log (base 10) and ln (base e).

The relevant math goes like this:

let f = x^y
ln f = y ln x    ; take ln of both sides
f = e ^ (y ln x) ; take exp of both sides

So, pow(x,y) = x^y = e ^ (y ln x). (as gelatine1 says)

OR ELSE,

let f = x^y
log f = y log x    ; take log of both sides
f = 10 ^ (y log x) ; take power-of-10 of both sides

So, pow(x,y) = x^y = 10 ^ (y log x)

Check it, both formulas give 1.41421...

10 was used in the old days when we looked answers up in log tables.

Your exp routine is using a set of coefficients close to the standard Taylor series but not the same; I guess those are Chebyshev coefficients? The Taylor series, of course, is

e ^ a = 1 + a + a^2/2! + a^3/3! + a^4/4! + ...

So all you need to make that a full "pow" routine is the ln series, plus some manipulation to scale by powers of 2 so that the series converge. (With computers, of course, we use powers of 2; in the old days it would be powers of 10.)
I am NaN

#### dedndave

• Member
• Posts: 8790
• Still using Abacus 2.0
##### Re: FPU power
« Reply #33 on: October 07, 2015, 07:00:20 AM »
x^y = log⁻¹(y*log(x))

doesn't matter which base you use, so long as it's the same base for log⁻¹ (antilog) and log
for the FPU, it's base 2, of course

#### dedndave

• Member
• Posts: 8790
• Still using Abacus 2.0
##### Re: FPU power
« Reply #34 on: October 07, 2015, 07:07:13 AM »
the FY2L2X instruction performs the y*log(x) part in a single step

it's the antilog that's a pain - lol
it just looks bad because you have to work on the fraction and exponent separately

#### rrr314159

• Member
• Posts: 1382
##### Re: FPU power
« Reply #35 on: October 07, 2015, 08:39:36 AM »
Quote from: dedndave
the FY2L2X instruction performs the y*log(x) part in a single step

- that's fyl2x (only one 2). It's the log base 2, of course

Quote from: dedndave
it's the antilog that's a pain - lol

- f2xm1 does the antilog (or exp) - base 2 of course

Quote from: dedndave
it just looks bad because you have to work on the fraction and exponent separately

- ? Don't get it. What looks bad about it? The hardest part is remembering the names of the instructions! I had to check to make sure there's only one "2" in fyl2x

- When computed with a power series, both functions require scaling; by 2, in the computer case; by 10, in the good old days. The reason is that the respective power series must converge. No doubt the new instructions use series internally, and f2xm1 requires input in the range -1 to +1. However, fyl2x does not; apparently they do the scaling internally for us, which is nice. (Oddly, fyl2xp1 does limit the input X! According to the masm32 FPU tute, anyway.)

- So in that sense you're right, f2xm1 looks more "bad". But for speed, we can't use these instructions anyway (AFAIK) so both operations require scaling. Amounts to using the mantissa in the series, and adjusting the exponent by shifting by powers of 2. That's what fscale does, but again, according to Agner, for speed it should be avoided also
I am NaN

#### Siekmanski

• Member
• Posts: 1553
##### Re: FPU power
« Reply #36 on: October 07, 2015, 03:24:06 PM »
@gelatine1,
Yes, I've mixed up log (base 10) and ln (base e).

It works with a natural logarithm.
ln(2.0) = 0.69314718056

Next step is a fast natural logarithm routine...

@rrr314159,
Those are Chebyshev coefficients,
http://lolengine.net/wiki/doc/maths/remez/tutorial-relative-error
Creative coders use backward thinking techniques as their strategy.

#### jj2007

• Member
• Posts: 8521
• Assembler is fun ;-)
##### Re: FPU power
« Reply #37 on: October 07, 2015, 06:18:14 PM »
Those are Chebyshev coefficients,
http://lolengine.net/wiki/doc/maths/remez/tutorial-relative-error

Quote
```c
double fastexp2(double x)
{
    const double a0 = 9.996278957172137756013002477405568253404e-1;
    const double a1 = 9.979387291070364307450237392251445288594e-1;
    const double a2 = 5.028986508540491482566165849167001609232e-1;
    const double a3 = 1.764862321902469630550896534647276819692e-1;
    const double a4 = 3.996291422520886755278730528311966596433e-2;

    return a0 + x * (a1 + x * (a2 + x * (a3 + x * a4)));
}
```

Somebody should enlighten these guys about the physical limits of a double aka REAL8

The logic of the fast Sinus() and Cosinus() algos could be used, but the problem is the range. For Sinus & Cosinus, periodicity allows limiting the lookup table, but logs have no limits in principle. Of course, if there were a concrete application with a known range, one could cheat... The potential gain is high; the logs are even slower than fsin:
Code: [Select]
```
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
8479    cycles for 100 * fsin
12385   cycles for 100 * lognat
1609    cycles for 100 * fast Sinus
```

#### Siekmanski

• Member
• Posts: 1553
##### Re: FPU power
« Reply #38 on: October 07, 2015, 06:20:41 PM »
Still looking for a faster natural log routine...

Code: [Select]
```
.data
X     real8 2.0
Y     real8 0.5
log_2 real8 0.69314718055994530941 ; log(2.0)
exp0  real8 0.99962789571721377560
exp1  real8 0.99793872910703643074
exp2  real8 0.50289865085404914825
exp3  real8 0.17648623219024696305
exp4  real8 0.03996291422520886755

.code
    ;pow(X,Y) == exp(Y * ln(X))
    fld real8 ptr [log_2]
    fld real8 ptr [X]
    fyl2x                       ; slow....
    fstp real8 ptr [esp]        ; ln(X)
    movsd xmm0,real8 ptr [esp]
    mulsd xmm0,real8 ptr [Y]
    movsd xmm1,real8 ptr [exp4]
    mulsd xmm1,xmm0
    addsd xmm1,real8 ptr [exp3]
    mulsd xmm1,xmm0
    addsd xmm1,real8 ptr [exp2]
    mulsd xmm1,xmm0
    addsd xmm1,real8 ptr [exp1]
    mulsd xmm0,xmm1
    addsd xmm0,real8 ptr [exp0]
    ;result in xmm0
```
Creative coders use backward thinking techniques as their strategy.

#### rrr314159

• Member
• Posts: 1382
##### Re: FPU power
« Reply #39 on: October 08, 2015, 12:58:41 AM »
This is the basic idea, very simple Taylor series:

let z = x - 1, then

ln x  = ln(1+z) = z - z^2/2 + z^3/3 - z^4/4 + z^5/5 - z^6/6 ...

It converges pretty well. Chebyshev-type coefficients are no doubt a bit better, but this is good enough.

Of course abs val z must be < 1, so use the mantissa in this formula. Then you just add the exponent.

These days I'm trying to take advantage of the last few days of autumn, soon I'll be snow-bound and will write a ln routine, assuming it's still wanted. Maybe get around to it pretty quick
I am NaN

#### Siekmanski

• Member
• Posts: 1553
##### Re: FPU power
« Reply #40 on: October 08, 2015, 01:25:37 AM »
I also have other things to do first, but later I'll try the Chebyshev polynomial algorithm to create a natural log routine.
Creative coders use backward thinking techniques as their strategy.

#### rrr314159

• Member
• Posts: 1382
##### Re: FPU power
« Reply #41 on: October 08, 2015, 08:34:51 AM »
Quote from: rrr
It converges pretty well

- u know, actually the ln series won't converge too well. Power series convergence depends on the size of the coefficients (you want them small), but ln has rather large coeff's: namely 1, -.5, .333..., -.25, .2, -.1666..., etc. Compare to exp, where the coeff's are inverses of factorials, so instead of 1/4 (.25) you have 1/24 (.041666...). Sin and cos are really good, because every 2nd coeff is 0. So don't expect it to be as easy as it was with them. Chebyshev is the same story; they're merely rather more accurate adjustments of Taylor. The only good thing about ln is the alternating signs (true also for sin and cos)

Quote from: jj
the logs are even slower than fsin

- The above explains why ...

Quote from: jj
problem is the range ... logs have no limits in principle

- That's not a problem, actually. The log (ln) series inputs must be limited anyway, for the series to even dream of converging.

Quote from: rrr
Of course abs val z must be < 1, so use the mantissa in this formula. Then you just add the exponent.

- Remember log tables, in the good old days? For a number like, say, 12345, you look up log of 1.2345 (the mantissa). Then you add 4 to it, because there are 4 powers of 10. With computers same idea but powers of 2 are used. It's very easy because the log of 10^n is simply n (the exponent).

Quote from: jj
Somebody should enlighten these guys about the physical limits of a double aka REAL8

- The rubes are very impressed when they see

Code: [Select]
`const double a2 = 5.028986508540491482566165849167001609232e-1; //!`
- It's even got a semicolon at the end - wow this guy knows what he's doing! Compare

Code: [Select]
` a2 real8 5.0289865085404914`
- I'm falling asleep just looking at it. BTW in this case it should end in 5; round up for the next digit, 8. Actually it's more complicated, for that last digit you have to express it to the closest precision as a binary number so it may be best to use 4 (? not sure about that)

- Other than PR there are a couple excuses for using all the digits. Someday you may want to change it to real10 or even larger, might as well have them all there for you. Think of the unused digits as a comment. Also ... I once had the impression the compiler actually looked one digit farther and took care of rounding; that was a long time ago, different compiler (C), but I've retained the habit of retaining at least one extra digit

Quote from: Siekmanski
First I have also other things to do but later I'll try the Chebyshev polynomial algorithm to create a natural log routine.

- Me too ... there's another problem with ln routine (and, pow(X,Y)) - neither of us need it! We both had a real need for sin / cos in our graphics work. In fact only jj2007 really wants faster pow, for MasmBasic. Maybe mabdelouahab also, but he hasn't mentioned need for speed. Of course I'm happy to help make MasmBasic as fast as possible, and I'm sure you are too (no, I'm not being sarcastic), plus there's the challenge of the thing, but it is, no doubt, easier to put off than trig functions. My MathArt, which you may remember posted in Workshop, was almost twice as fast (with "sin smoothing" option) due to our trig efforts.

Finally it's worth mentioning - and this may be the really fun part - there may be faster ways to tackle pow. The obvious thing is using the standard algo with a fast exp and fast ln; but it's possible to glom them together into one (2-valued) power series, if all you want is the final result (pow). May or may not help

There may be alternatives to the standard power series also. You know, for exp (ln also, but I don't remember those) there are series that converge much faster; but they're not power series and implementation (on computer) is typically so much messier that they're not worth the trouble. Anyway, (note to self:) browse the net a bit looking for tips like this. Find out how Mathematica, MathLab etc does it. When you get a round Tuit ...
I am NaN

#### jj2007

• Member
• Posts: 8521
• Assembler is fun ;-)
##### Re: FPU power
« Reply #42 on: October 08, 2015, 10:10:48 AM »
In fact only jj2007 really wants faster pow, for MasmBasic

But I have a couple of other things to solve first, and ExpXY is fast enough already. Plus, the gliding SetPoly3 thing won't work as smoothly as it does for the Sinus().

#### rrr314159

• Member
• Posts: 1382
##### Re: FPU power
« Reply #43 on: October 08, 2015, 11:23:54 AM »
Quote from: jj2007
ExpXY is fast enough

- there's no such thing as "fast enough"
I am NaN

#### qWord

• Member
• Posts: 1473
• The base type of a type is the type itself
##### Re: FPU power
« Reply #44 on: October 10, 2015, 12:57:06 PM »
- The rubes are very impressed when they see

Code: [Select]
`const double a2 = 5.028986508540491482566165849167001609232e-1; //!`
- It's even got a semicolon at the end - wow this guy knows what he's doing! Compare

Code: [Select]
` a2 real8 5.0289865085404914`
Reminds me of this nice example found in a book:
We do the conversion from base 10 to base 2 with infinite precision and then round the result to the nearest (half-to-even) REAL4-precision (24-bit) value:
x1 = 1 + 2^-24 = 1.000000059604644775390625 (this number is the midpoint between 1 and its floating-point successor 1 + 2^-23)
x2 = 1 + 2^-24 + 10^-60 = 1.000000059604644775390625000000000000000000000000000000000001
A correct conversion routine* should return 1 (03F800000r) for x1 and the successor of 1 for x2 (= 1 + 2^-23 = 03F800001r)

*e.g. as described in the ISO C99 or IEEE 754-2008 standard

regards
MREAL macros - when you need floating point arithmetic while assembling!