News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Lambert W Function

Started by guga, August 19, 2020, 11:16:17 AM

Previous topic - Next topic

guga

Hi Guys

I succeeded to implement the lambert W Function (product logarithm) with 100% of accuracy depending on the user input of the precision. The problem is that even considering i used the fast exp and log and log10 functions i posted previously, this lambert function will becomes slow because it needs XXX iterations to reach precision.

I was wondering if there is a way to make it works faster recalculating the math of this stuff.

The main math of this beast is as follows:

y = y - ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))

where z = exp(w), y = w and x = our number.
and
w = 3 * log(x + 1) / 4

The code i made is this:

[Float_Exp_Minus_One: R$ 0.367879441171442321595523770161460867445811131031767834507] ; exp(-1)
[Float_Three_Quarter: R$ (3/4)]
[Float_Two: R$ 2]

Proc LambertW:
    Arguments @pNumber, @Output, @Flag
    Local @AmountOfIterations, @BaseNumber, @WVar, @ExpWVar, @FloatOne, @pWVar ; <----- Some of the Local vars are not being used. I´ll remove later once i finish the function and see if it can be optimized even further
    Structure @TempStorage 16, @pTmpInputValDis 0, @pTmpWVar 8
    Uses edi, ecx

    mov eax D@Flag
    Test_if eax SSE_EXP_INT
        cvtsi2sd xmm0 D@pNumber ; converts a signed integer to double
    Test_Else_if eax SSE_EXP_FLOAT
        cvtss2sd xmm0 X@pNumber ; converts a single precision float to double
    Test_Else_if eax SSE_EXP_REAL8
        mov eax D@pNumber | movsd XMM0 X$eax
    Test_Else_if eax SSE_EXP_QWORD
        mov eax D@pNumber | movq XMM0 X$eax
    Test_Else
        xor eax eax | ExitP ; return 0 Invalid parameter
    Test_End

    SSE_D_If xmm0 <= X$Float_Exp_Minus_One ; Lambert function cannot be smaller then exp(-1)
        xor eax eax
        ExitP
    SSE_D_End

    lea edi D@pTmpInputValDis
    movdqu X$edi xmm0 ; save the copy of inputvar
    movdqu X$edi+8 xmm0 ; save the WVar

    ; computes the first branch for real values only

    ; amount of iterations (empirically found)

    ; amountOfIterations = max(4, (int)Math.Ceiling(Math.Log10(x) / 3));
    ; calculate Log10(x)
    call Sse2_log10_precise edi, SSE_EXP_REAL8

    mov eax 3 | cvtsi2sd xmm1 eax | divsd xmm0 xmm1 ; divide our Log10(x) with 3
    mov eax 4 | cvtsi2sd xmm1 eax
    maxsd xmm0 xmm1
    cvtsd2si eax xmm0 ; convert double to int
    mov D@AmountOfIterations eax ; This is precision. We started biasing on a max precision of 4 digits. But here we can simply extend it to 16 with "mov D@AmountOfIterations 16" but it will become slow

    ; initial guess is based on 0 < ln(a) < 3
    ; double w = 3 * log(x + 1) / 4
    movsd xmm0 X$edi | addsd xmm0 X$Float_One | movsd X$edi+8 xmm0 ; save it to pWVar

    lea eax D$edi+8
    call Sse2_log eax, SSE_EXP_REAL8
    mulsd xmm0 X$Float_Three_Quarter ; mul by 3/4
    movsd X$edi+8 xmm0 ; and store our WVar

    ;  Halley's method via eqn (5.9) in Corless et al (1996)
    ; w = w - (w * exp(w) - x) / (exp(w) * (w + 1) - (w + 2) * (w * exp(w) - x) / (2 * w + 2));
    ; can be rewritten as:
    ; y - (y * z - x) / (z * (y + 1) - (y + 2) * (y * z - x) / (2 * y + 2))
    ; where z = exp(w), y = w and x = our number. Resulting in:
    ; y = y - ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))

    xor ecx ecx
    mov eax D@AmountOfIterations
    .While ecx < D@AmountOfIterations ; jge
        lea eax D$edi+8
        call Sse2_exp_precise eax, SSE_EXP_REAL8
        movsd xmm1 X$edi+8; WVar
        ; (y z - x)
        movsd xmm2 xmm0 ; z = xmm2 = Exp(WVar) (In Real8 format)
        movsd xmm3 xmm1 | mulsd xmm3 xmm2 ; y*z = WVar*(ExpWVar)
        movsd xmm4 X$edi ; Our Number
        ; xmm1 = WVar = y
        ; xmm2 = Exp(WVar) = z
        ; xmm3 = WVar*(ExpWVar) = y*z
        ; xmm4 our Number converted to double = x

        ; y*z-x = WVar*(ExpWVar)- pNumber
        movsd xmm0 xmm3 | movsd xmm5 xmm4 | subsd xmm0 xmm5
        mulsd xmm0 X$Float_Two ; 2*(y*z-x)
        movsd xmm5 xmm1 | addsd xmm5 X$Float_One ; (y+1)
        mulsd xmm0 xmm5 ; 2*(y*z-x) * (y+1)
        ; xmm0 is our numerator. Let´s store it to xmm5 and start calculating the divisor

        ; xmm5  = 2*(y*z-x) * (y+1)
        movsd xmm5 xmm0

        ; let's work with the divisor now

        ; y*z+x = WVar*(ExpWVar)+ pNumber
        movsd xmm0 xmm3 | movsd xmm6 xmm4 | addsd xmm0 xmm6
        ; y+2
        movsd xmm7 xmm1 | addsd xmm7 X$Float_Two ; (y+2) = WVar+2
        ; (y+2) * (y*z+x)
        mulsd xmm0 xmm7

        ; y*2
        movsd xmm7 xmm1 | addsd xmm7 xmm7

        ; ((y+2) * (y*z+x)) + 2*y
        addsd xmm0 xmm7 ; Ok. xmm0 is our divisor

        ;let's divide the numerator (at xmm5) by the divisor (at xmm0)
        ; ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))
        divsd xmm5 xmm0

        ; good. Now we will subtract y (WVar) from this value at xmm5
        subsd xmm1 xmm5

        ; now we need only to convert it to single Float and store our WVar for the next loop
        movsd X$edi+8 xmm1  ; and store our WVar

        inc ecx
    .End_While

    mov eax D@Output
    fld R$edi+8
    fstp F$eax

EndP



The above function is called as:



[SSE_EXP_INT 1
SSE_EXP_FLOAT 2
SSE_EXP_REAL8 4
SSE_EXP_QWORD 8]


[TestLog: R$ 5.0]
[LambertOut: R$ 0]

call LambertW TestLog, LambertOut, SSE_EXP_REAL8


The original code was biased from: https://stackoverflow.com/questions/60211021/lambert-w-function-in-c-sharp

But i found that there could be a way using something called  Fritsch's iteration  that claims to make it work with only 1 iteration. See: https://arxiv.org/pdf/1003.1628.pdf at page 6



Also, there are some references about approximations that is way out of my head here:
https://www.researchgate.net/post/Is_there_approximation_to_the_LambertWx_function
https://mathoverflow.net/questions/57819/best-approximation-to-the-lambertwx-or-explambertwx
https://rgmia.org/papers/v10n2/lambert-v2.pdf
https://github.com/DarkoVeberic/LambertW

The question is, someone have done a stuff like that ? I mean trying to simplify this W lambert function ?  Although my version is 100% precise (and way way way faster then the C versions, btw)  it will becomes slow on the same way as the precision increases.


If i succeed to make it work faster, i´ll post here a benchmark for us. I just don´t put it right now, because i would have to port again the code to masm and it is not fast enough yet.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Done :)  :thumbsup: :thumbsup: :thumbsup:

The function takes something around 480 clocks and the precision is at least the 14th digit. I only had to implement this  Fritsch's iteration twice to work properly and 5x faster.



Lambert W of  445.0

Expected (https://www.wolframalpha.com/input/?i=lambert+w%28445%29):
4.5770250479420729678480...
my version
4.57702504784207332

I´ll make further tests tomorrow and iwill post the code and executable here to we benchmark.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Ok, Guys. I´m quite over.


Finished fixing a error of small numbers closer to zero and now i´m trying to fix the problem of negative values using LambertW (inside the limit of -1/exp(1)

The current version is fast. (Only 2 iterations needed to grant accuracy)



[SSE_Two_Thirds: R$ (2/3)]

Proc LambertW2:
    Arguments @pNumber, @Output, @Flag
    Local @AmountOfIterations, @BaseNumber, @WVar, @ExpWVar, @FloatOne, @pWVar
    Structure @TempStorage 32, @pTmpInputValDis 0, @pTmpWVar 8, @pZn 16, @Qn 24
    Uses edi, ecx

    mov eax D@Flag
    Test_if eax SSE_EXP_INT
        cvtsi2sd xmm0 D@pNumber ; converts a signed integer to double
    Test_Else_if eax SSE_EXP_FLOAT
        cvtss2sd xmm0 X@pNumber ; converts a single precision float to double
    Test_Else_if eax SSE_EXP_REAL8
        mov eax D@pNumber | movsd XMM0 X$eax
    Test_Else_if eax SSE_EXP_QWORD
        mov eax D@pNumber | movq XMM0 X$eax
    Test_Else
        xor eax eax | ExitP ; return 0 Invalid parameter
    Test_End

    SSE_D_If xmm0 <= X$Float_LambertBranch ; Lambert function cannot be smaller then exp(-1)
        xor eax eax
        ExitP
    SSE_D_End

    ; Before continue, we must check for NAN and friends. Todo later

    lea edi D@pTmpInputValDis
    movdqu X$edi xmm0 ; save the copy of inputvar
    movdqu X$edi+8 xmm0 ; save the WVar

    ; initial guess is based on 0 < ln(a) < 3
    ; double w = 3 * log(x + 1) / 4
    movsd xmm0 X$edi | addsd xmm0 X$Float_One | movsd X$edi+8 xmm0 ; save it to pWVar

    lea eax D$edi+8
    call Sse2_log eax, SSE_EXP_REAL8
;    mulsd xmm0 X$Float_Three_Quarter ; mul by 3/4 No longer needed


    movsd X$edi+8 xmm0  ; and store our WVar

    ;  Halley's method via eqn (5.9) in Corless et al (1996)
    ; w = w - (w * exp(w) - x) / (exp(w) * (w + 1) - (w + 2) * (w * exp(w) - x) / (2 * w + 2));
    ; can be rewritten as:
    ; y - (y * z - x) / (z * (y + 1) - (y + 2) * (y * z - x) / (2 * y + 2))
    ; where z = exp(w), y = w and x = our number. Resulting in:
    ; y = y - ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))

    lea eax D$edi+8
    call Sse2_exp_precise eax, SSE_EXP_REAL8
    movsd xmm1 X$edi+8; WVar
    ; (y z - x)
    movsd xmm2 xmm0 ; z = xmm2 = Exp(WVar) (In Real8 format)
    movsd xmm3 xmm1 | mulsd xmm3 xmm2 ; y*z = WVar*(ExpWVar)
    movsd xmm4 X$edi ; Our Number
    ; xmm1 = WVar = y
    ; xmm2 = Exp(WVar) = z
    ; xmm3 = WVar*(ExpWVar) = y*z
    ; xmm4 our Number converted to double = x

    ; y*z-x = WVar*(ExpWVar)- pNumber
    movsd xmm0 xmm3 | movsd xmm5 xmm4 | subsd xmm0 xmm5
    mulsd xmm0 X$Float_Two ; 2*(y*z-x)
    movsd xmm5 xmm1 | addsd xmm5 X$Float_One ; (y+1)
    mulsd xmm0 xmm5 ; 2*(y*z-x) * (y+1)
    ; xmm0 is our numerator. Let´s store it to xmm5 and start calculating the divisor
    ; xmm5  = 2*(y*z-x) * (y+1)
    movsd xmm5 xmm0
    ; let's work with the divisor now
    ; y*z+x = WVar*(ExpWVar)+ pNumber
    movsd xmm0 xmm3 | movsd xmm6 xmm4 | addsd xmm0 xmm6
    ; y+2
    movsd xmm7 xmm1 | addsd xmm7 X$Float_Two ; (y+2) = WVar+2
    ; (y+2) * (y*z+x)
    mulsd xmm0 xmm7
    ; y*2
    movsd xmm7 xmm1 | addsd xmm7 xmm7
    ; ((y+2) * (y*z+x)) + 2*y
    addsd xmm0 xmm7 ; Ok. xmm0 is our divisor
    ;let's divide the numerator (at xmm5) by the divisor (at xmm0)
    ; ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))
    divsd xmm5 xmm0
    ; good. Now we will subtract y (WVar) from this value at xmm5
    subsd xmm1 xmm5
    ; now we need only to convert it to store our WVar for the next loop
    movsd X$edi+8 xmm1  ; and store our WVar

    ; Try Fritsch iteration

    ; Wn+1 = Wn(1+E)

    ; E = (z/(1+w)) * ((q-z)/(q-2z))
    ; z = ln(x/W - W)
    ; q = 2*(1+W)*(1+W+Z*2/3)

    ; get Z to xmm1
    mov ecx 2; Perform only 2 iterations to ensure accuracy
    .Do

        movsd xmm0 X$edi | divsd xmm0 xmm1 | movsd X$edi+16 xmm0
        lea eax D$edi+16
        call Sse2_log eax, SSE_EXP_REAL8
        subsd xmm0 X$edi+8
        movsd X$edi+16 xmm0 ; save Zn

        ; Get Qn
        mulsd xmm0 X$SSE_Two_Thirds | addsd xmm0 X$edi+8 | addsd xmm0 X$Float_One
        movsd xmm1 X$edi+8 | addsd xmm1 X$Float_One | addsd xmm1 xmm1 | mulsd xmm0 xmm1
        movsd X$edi+24 xmm0

            ; get E
        subsd xmm0 X$edi+16
        movsd xmm1 X$edi+24 | subsd xmm1 X$edi+16 | subsd xmm1 X$edi+16
        divsd xmm0 xmm1
        mulsd xmm0 X$edi+16
        movsd xmm1 X$edi+8 | addsd xmm1 X$Float_One
        divsd xmm0 xmm1

        ; Get Wn+1
        addsd xmm0 X$Float_One | mulsd xmm0 X$edi+8

        movsd X$edi+8 xmm0; and store the result
        movsd xmm1 xmm0
        dec ecx
    .Loop_Until ecx = 0

    mov eax D@Output ; Allow the user to chose what type of output he want. Todo.
    fld R$edi+8 ; For now, output only as a real8. Maybe allowing to output int or even Real4, but...should it be necessary ? It maybe loose precision. To be checked later :)
    fstp F$eax

EndP


Once i succeed to fix the negative problem, i´ll write a small routine to check for NAN, Infinite etc and will try porting it to masm to we test. I could make all the calls to other functions such as exp or log be inline, but the could would be very very very unreadable. To get a bit more speed and also keep readability the better would be create variations of exp, log, log10 functions to use xmm0 as input without the internal checks for Nan´s (Afterall, this LambertW function will check for NAN already)

Lambert W is calculated to find x on a exponential function, like:

x*e(x) = 5

I´m trying to create this so we could apply these kind of complex math to image processing such as a faster Laplace Edge detector or Gaussian 2D filters
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

I´m facing a small problem when dealing with Negative values to be passed to Lambert W function.

Someone has any idea how to fix this on a faster way ? I mean, allowing us to calculate values such as "-0.2", for example ?

I suceeded to make it work for negative cases, but still looses accuracy. All i had to do is identify when the inputed number is negative and calculate the Poly-logarithm to work as a initial value to be found in the very 1st iteration.

        ; try with PolyLogarithm of iteration 1 https://mathworld.wolfram.com/Polylogarithm.html
        movsd xmm1 X$Float_LambertBranch | movsd X$edi+8 xmm1
        movsd xmm0 X$Float_One | subsd xmm0 X$edi+8 | movsd X$edi+8 xmm0 ; save it to pWVar
        lea eax D$edi+8
        call Sse2_log eax, SSE_EXP_REAL8
        mulsd xmm0 X$Float_Minus_One
        movsd X$edi+8 xmm0  ; and store our 1st iteration guessing WVar to be used to compute the final resultant WVar


It worked and gave a bit more of precision (something around 2 to 4 digits) but it´s not precise enough. I´m considering passing the result in a loop and calculate the difference (or a ratio) between the inputed value and the generated result and then pass the resultant value to this rate until it reaches a error of less then 1e-14.

Putting at the end of the function a flag of something like the pseudo-code below:

(....)

If InputValue < NegativeLimit
     Do
         newVal = WValue*exp(WValue) ; Check if the generated W Value will produce the same value as the input
         Ratio = newVal/InputVal ; calculate the ratio (difference) between the original inputed value and this new one we generated that could be the correct one
        Fix = sqrt(ratio) ;  Geet the ratio to fix the generated WValue
        newVal2 = WValue/Fix
        Diff =  InputVal-NewVal2
     Loop_Until Diff = 1e-14 ; See if the difference of the generated new WValue is accurate enough
      return NewVal2
Else
      return WValue
EndIf

But, it will probably slow down the code a little bit when we are dealing with negative value. Someone has any idea how to fix that and also keep the speed ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jack

Hi guga
some time ago I implemented the LambertW function in FreeBasic if you are interested I can post the code

guga

Quote from: jack on August 21, 2020, 05:02:53 AM
Hi guga
some time ago I implemented the LambertW function in FreeBasic if you are interested I can post the code

Hi Jack. many thanks.

Please  post it here so i can try to understand what it is doing. :thumbsup: :thumbsup: :thumbsup: I succeeded to make it work for positive values only. But negative values from x<0 x > -(1/e) it is failing miserably.

For positive numbers it works with only 1 iteration using Halley´s method followed by only 2 iterations of Fritsch method :thumbsup:. But for some odd reason, i´m not being able to find the initial approximation when the inputted number is negative.

For negative numbers, ex: -0.1 It do works only if i iterate with Halley´s method about 20 times and yet, it looses precision. :undecided:

I also found this one, but didn´t checked yet for precision, neither speed.
http://keithbriggs.info/software/LambertW.c
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jack

guga, I think that the C code you posted is what I translated to FB, not sure though

function LambertW(byval z as const double) as double
dim i as long
dim eps as const double = 4.0e-16
dim em1 as const double = 0.3678794411714423215955237701614608
dim p as double
dim e as double
dim t as double
dim w as double

if z < (-em1) then
print "LambertW: bad argument ";z;", exiting."
return 1
end if
if 0.0 = z then
return 0.0
end if
if z < ((-em1) + 1e-4) then
dim q as double = z + em1
dim r as double = sqr(q)
dim q2 as double = q * q
dim q3 as double = q2 * q
return ((((((((-1.0) + (2.331643981597124203363536062168 * r)) - (1.812187885639363490240191647568 * q)) + ((1.936631114492359755363277457668 * r) * q)) - (2.353551201881614516821543561516 * q2)) + ((3.066858901050631912893148922704 * r) * q2)) - (4.175335600258177138854984177460 * q3)) + ((5.858023729874774148815053846119 * r) * q3)) - ((8.401032217523977370984161688514 * q3) * q)
end if
if z < 1.0 then
p = sqr(2.0 * ((2.7182818284590452353602874713526625 * z) + 1.0))
w = (-1.0) + (p * (1.0 + (p * ((-0.333333333333333333333) + (p * 0.152777777777777777777777)))))
else
w = log(z)
end if
if z > 3.0 then
w -= log(w)
end if
for i as long=0 to 9
e=exp(w)
t=w*e-z
p=w+1.0
t/=e*p-0.5*(p+1.0)*t/p
w-=t
if abs(t)<eps*(1.0+abs(w)) then return w
next
print "LambertW: No convergence at z=";z;", exiting."
return 1
end function

sub main1
  dim as double z,w,er
  for i as long=0 to 99
    z=i/100.0-0.3678794411714423215955: w=LambertW(z)
    er=exp(w)-z/w     
    print using "W(###.####) =##.###############^^^^, check: exp(W(z))-z/W(z) =##.#####^^^^";z;w;er
  next
  for i as long=0 to 99
    z=i/1.0e-1-0.3: w=LambertW(z)
    err=exp(w)-z/w
    print using "W(###.####) =##.###############^^^^, check: exp(W(z))-z/W(z) =##.#####^^^^";z;w;er
  next
end sub

guga

Thanks a lot Jack. :thumbsup: :thumbsup: :thumbsup:

I´ll take a look. Indeed, both codes looks similar, but yours seems easier to follow :thumbsup: :thumbsup:

A few questions
a)  does it works for negative values ex: -0.1, -0.2 etc ?
b) What is the precision ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jack

a- yes
b- the precision is about 15 digits

coaster

Greetings,

This is my first post on this forum... proving that it's not a spam registration :) 
My knowledge of assembly is limited (X86) so cannot contribute to advanced coding, let alone to optimisations. Nevertheless, I recently looked for fast Lambert W implementations for large positive reals and have found some important papers missing from the original post.

The paper by Iacono & Boyd worth reading: 
https://www.researchgate.net/profile/John_Boyd2/publication/315904900_New_approximations_to_the_principal_real-valued_branch_of_the_Lambert_W-function/links/5c6bcb124585156b5706e27c/New-approximations-to-the-principal-real-valued-branch-of-the-Lambert-W-function.pdf

Fukushima's used tricks to avoid avoid log/exp evaluations, hence his solution is about twice faster than that of Fritsch, for doubles.
https://www.researchgate.net/publication/233741496_Precise_and_fast_computation_of_Lambert_W-functions_without_transcendental_function_evaluations
His original code (Fortran) was ported to C++ (from memory) by Veberic, and recently was enhanced by a group working on a library (duh, lost the link). As I recall they use C++ so that very likely can be improved using assembly.

Fritsch et al (it's easy to find their paper) was crafted for positive real arguments, so no wonder if Guga had issues for negative inputs. It is a remarkable solution because an iteration requires only one computationally expensive (log) evaluation. It's weakness is the poor initial solution. I needed W0(z) with about 6-7 digits accuracy using a single iteration so wanted a better initial value. By fitting data to various 'prototype' functions I found that with
c=ln(1+z)
Wi=c*(1-ln(1+c)/(2+c))
gives an initial Wi value with 2 correct significant digits. Bingo,  a single iteration gives 8 correct significant digits, my problem was solved. But to achieve double precision with a single Fritsch iteration would require an initial value with t 10^-4 accuracy... looks like a tough call.

guga

Quote from: coaster on October 11, 2020, 05:16:45 AM
Greetings,

This is my first post on this forum... proving that it's not a spam registration :) 
My knowledge of assembly is limited (X86) so cannot contribute to advanced coding, let alone to optimisations. Nevertheless, I recently looked for fast Lambert W implementations for large positive reals and have found some important papers missing from the original post.

The paper by Iacono & Boyd worth reading: 
https://www.researchgate.net/profile/John_Boyd2/publication/315904900_New_approximations_to_the_principal_real-valued_branch_of_the_Lambert_W-function/links/5c6bcb124585156b5706e27c/New-approximations-to-the-principal-real-valued-branch-of-the-Lambert-W-function.pdf

Fukushima's used tricks to avoid avoid log/exp evaluations, hence his solution is about twice faster than that of Fritsch, for doubles.
https://www.researchgate.net/publication/233741496_Precise_and_fast_computation_of_Lambert_W-functions_without_transcendental_function_evaluations
His original code (Fortran) was ported to C++ (from memory) by Veberic, and recently was enhanced by a group working on a library (duh, lost the link). As I recall they use C++ so that very likely can be improved using assembly.

Fritsch et al (it's easy to find their paper) was crafted for positive real arguments, so no wonder if Guga had issues for negative inputs. It is a remarkable solution because an iteration requires only one computationally expensive (log) evaluation. It's weakness is the poor initial solution. I needed W0(z) with about 6-7 digits accuracy using a single iteration so wanted a better initial value. By fitting data to various 'prototype' functions I found that with
c=ln(1+z)
Wi=c*(1-ln(1+c)/(2+c))
gives an initial Wi value with 2 correct significant digits. Bingo,  a single iteration gives 8 correct significant digits, my problem was solved. But to achieve double precision with a single Fritsch iteration would require an initial value with t 10^-4 accuracy... looks like a tough call.

Tks you a lot Coaster, and welcome onboard :bgrin:

I´pll take a look at the papers when i have some free time.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

LambertW() looks like a candidate for FastMath. If somebody volunteers to write a Masm proc, I can implement it, so that we can do some benchmarking.

guga

Quote from: jj2007 on October 11, 2020, 07:40:02 AM
LambertW() looks like a candidate for FastMath. If somebody volunteers to write a Masm proc, I can implement it, so that we can do some benchmarking.

Hi JJ

When i have some free time i´ll try to rewrite the function using the equation and articles posted by Coaster. Perhaps i can suceed to port to masmbasic using masm syntax as well, i presume. I just don´t know when i´ll be able to do it, because i´m trying to fix some  old old problems in RosAsm 1st. A hell, my friend, a true hell  :greensml: :greensml: :greensml: :greensml:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

Quote from: guga on October 11, 2020, 09:48:01 AMA hell, my friend, a true hell  :greensml: :greensml: :greensml: :greensml:

I can understand you  :greensml:

Yesterday I solved, after many hours of fighting, the Balloon tooltips flicker when themed (manifest) applied problem. If Bill Gates would have to pay for all his bugs, he'd sit in jail as a poor man :cool:

coaster

In the meantime I've found the library development link mentioned earlier:
https://www.boost.org/doc/libs/develop/libs/math/doc/html/math_toolkit/lambert_w.html