Ok, Guys. I´m quite over.

Finished fixing a error of small numbers closer to zero and now i´m trying to fix the problem of negative values using LambertW (inside the limit of -1/exp(1)

The current version is fast. (Only 2 iterations needed to grant accuracy)

`[SSE_Two_Thirds: R$ (2/3)]`

Proc LambertW2:

Arguments @pNumber, @Output, @Flag

Local @AmountOfIterations, @BaseNumber, @WVar, @ExpWVar, @FloatOne, @pWVar

Structure @TempStorage 32, @pTmpInputValDis 0, @pTmpWVar 8, @pZn 16, @Qn 24

Uses edi, ecx

mov eax D@Flag

Test_if eax SSE_EXP_INT

cvtsi2sd xmm0 D@pNumber ; converts a signed integer to double

Test_Else_if eax SSE_EXP_FLOAT

cvtss2sd xmm0 X@pNumber ; converts a single precision float to double

Test_Else_if eax SSE_EXP_REAL8

mov eax D@pNumber | movsd XMM0 X$eax

Test_Else_if eax SSE_EXP_QWORD

mov eax D@pNumber | movq XMM0 X$eax

Test_Else

xor eax eax | ExitP ; return 0 Invalid parameter

Test_End

SSE_D_If xmm0 <= X$Float_LambertBranch ; Lambert function cannot be smaller then exp(-1)

xor eax eax

ExitP

SSE_D_End

; Before continue, we must check for NAN and friends. Todo later

lea edi D@pTmpInputValDis

movdqu X$edi xmm0 ; save the copy of inputvar

movdqu X$edi+8 xmm0 ; save the WVar

; initial guess is based on 0 < ln(a) < 3

; double w = 3 * log(x + 1) / 4

movsd xmm0 X$edi | addsd xmm0 X$Float_One | movsd X$edi+8 xmm0 ; save it to pWVar

lea eax D$edi+8

call Sse2_log eax, SSE_EXP_REAL8

; mulsd xmm0 X$Float_Three_Quarter ; mul by 3/4 No longer needed

movsd X$edi+8 xmm0 ; and store our WVar

; Halley's method via eqn (5.9) in Corless et al (1996)

; w = w - (w * exp(w) - x) / (exp(w) * (w + 1) - (w + 2) * (w * exp(w) - x) / (2 * w + 2));

; can be rewritten as:

; y - (y * z - x) / (z * (y + 1) - (y + 2) * (y * z - x) / (2 * y + 2))

; where z = exp(w), y = w and x = our number. Resulting in:

; y = y - ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))

lea eax D$edi+8

call Sse2_exp_precise eax, SSE_EXP_REAL8

movsd xmm1 X$edi+8; WVar

; (y z - x)

movsd xmm2 xmm0 ; z = xmm2 = Exp(WVar) (In Real8 format)

movsd xmm3 xmm1 | mulsd xmm3 xmm2 ; y*z = WVar*(ExpWVar)

movsd xmm4 X$edi ; Our Number

; xmm1 = WVar = y

; xmm2 = Exp(WVar) = z

; xmm3 = WVar*(ExpWVar) = y*z

; xmm4 our Number converted to double = x

; y*z-x = WVar*(ExpWVar)- pNumber

movsd xmm0 xmm3 | movsd xmm5 xmm4 | subsd xmm0 xmm5

mulsd xmm0 X$Float_Two ; 2*(y*z-x)

movsd xmm5 xmm1 | addsd xmm5 X$Float_One ; (y+1)

mulsd xmm0 xmm5 ; 2*(y*z-x) * (y+1)

; xmm0 is our numerator. Let´s store it to xmm5 and start calculating the divisor

; xmm5 = 2*(y*z-x) * (y+1)

movsd xmm5 xmm0

; let's work with the divisor now

; y*z+x = WVar*(ExpWVar)+ pNumber

movsd xmm0 xmm3 | movsd xmm6 xmm4 | addsd xmm0 xmm6

; y+2

movsd xmm7 xmm1 | addsd xmm7 X$Float_Two ; (y+2) = WVar+2

; (y+2) * (y*z+x)

mulsd xmm0 xmm7

; y*2

movsd xmm7 xmm1 | addsd xmm7 xmm7

; ((y+2) * (y*z+x)) + 2*y

addsd xmm0 xmm7 ; Ok. xmm0 is our divisor

;let's divide the numerator (at xmm5) by the divisor (at xmm0)

; ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))

divsd xmm5 xmm0

; good. Now we will subtract y (WVar) from this value at xmm5

subsd xmm1 xmm5

; now we need only to convert it to store our WVar for the next loop

movsd X$edi+8 xmm1 ; and store our WVar

; Try Fritsch iteration

; Wn+1 = Wn(1+E)

; E = (z/(1+w)) * ((q-z)/(q-2z))

; z = ln(x/W - W)

; q = 2*(1+W)*(1+W+Z*2/3)

; get Z to xmm1

mov ecx 2; Perform only 2 iterations to ensure accuracy

.Do

movsd xmm0 X$edi | divsd xmm0 xmm1 | movsd X$edi+16 xmm0

lea eax D$edi+16

call Sse2_log eax, SSE_EXP_REAL8

subsd xmm0 X$edi+8

movsd X$edi+16 xmm0 ; save Zn

; Get Qn

mulsd xmm0 X$SSE_Two_Thirds | addsd xmm0 X$edi+8 | addsd xmm0 X$Float_One

movsd xmm1 X$edi+8 | addsd xmm1 X$Float_One | addsd xmm1 xmm1 | mulsd xmm0 xmm1

movsd X$edi+24 xmm0

; get E

subsd xmm0 X$edi+16

movsd xmm1 X$edi+24 | subsd xmm1 X$edi+16 | subsd xmm1 X$edi+16

divsd xmm0 xmm1

mulsd xmm0 X$edi+16

movsd xmm1 X$edi+8 | addsd xmm1 X$Float_One

divsd xmm0 xmm1

; Get Wn+1

addsd xmm0 X$Float_One | mulsd xmm0 X$edi+8

movsd X$edi+8 xmm0; and store the result

movsd xmm1 xmm0

dec ecx

.Loop_Until ecx = 0

mov eax D@Output ; Allow the user to chose what type of output he want. Todo.

fld R$edi+8 ; For now, output only as a real8. Maybe allowing to output int or even Real4, but...should it be necessary ? It maybe loose precision. To be checked later :)

fstp F$eax

EndP

Once i succeed to fix the negative problem, i´ll write a small routine to check for NAN, Infinite etc and will try porting it to masm to we test. I could make all the calls to other functions such as exp or log be inline, but the could would be very very very unreadable. To get a bit more speed and also keep readability the better would be create variations of exp, log, log10 functions to use xmm0 as input without the internal checks for Nan´s (Afterall, this LambertW function will check for NAN already)

Lambert W is calculated to find x on a exponential function, like:

x*e^{(x)} = 5

I´m trying to create this so we could apply these kind of complex math to image processing such as a faster Laplace Edge detector or Gaussian 2D filters