Ok, Guys. I´m quite over.
Finished fixing a error of small numbers closer to zero and now i´m trying to fix the problem of negative values using LambertW (inside the limit of -1/exp(1)
The current version is fast. (Only 2 iterations needed to grant accuracy)
[SSE_Two_Thirds: R$ (2/3)]
Proc LambertW2:
Arguments @pNumber, @Output, @Flag
Local @AmountOfIterations, @BaseNumber, @WVar, @ExpWVar, @FloatOne, @pWVar
Structure @TempStorage 32, @pTmpInputValDis 0, @pTmpWVar 8, @pZn 16, @Qn 24
Uses edi, ecx
mov eax D@Flag
Test_if eax SSE_EXP_INT
cvtsi2sd xmm0 D@pNumber ; converts a signed integer to double
Test_Else_if eax SSE_EXP_FLOAT
cvtss2sd xmm0 X@pNumber ; converts a single precision float to double
Test_Else_if eax SSE_EXP_REAL8
mov eax D@pNumber | movsd XMM0 X$eax
Test_Else_if eax SSE_EXP_QWORD
mov eax D@pNumber | movq XMM0 X$eax
Test_Else
xor eax eax | ExitP ; return 0 Invalid parameter
Test_End
SSE_D_If xmm0 <= X$Float_LambertBranch ; Lambert function cannot be smaller then exp(-1)
xor eax eax
ExitP
SSE_D_End
; Before continue, we must check for NAN and friends. Todo later
lea edi D@pTmpInputValDis
movdqu X$edi xmm0 ; save the copy of inputvar
movdqu X$edi+8 xmm0 ; save the WVar
; initial guess is based on 0 < ln(a) < 3
; double w = 3 * log(x + 1) / 4
movsd xmm0 X$edi | addsd xmm0 X$Float_One | movsd X$edi+8 xmm0 ; save it to pWVar
lea eax D$edi+8
call Sse2_log eax, SSE_EXP_REAL8
; mulsd xmm0 X$Float_Three_Quarter ; mul by 3/4 No longer needed
movsd X$edi+8 xmm0 ; and store our WVar
; Halley's method via eqn (5.9) in Corless et al (1996)
; w = w - (w * exp(w) - x) / (exp(w) * (w + 1) - (w + 2) * (w * exp(w) - x) / (2 * w + 2));
; can be rewritten as:
; y - (y * z - x) / (z * (y + 1) - (y + 2) * (y * z - x) / (2 * y + 2))
; where z = exp(w), y = w and x = our number. Resulting in:
; y = y - ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))
lea eax D$edi+8
call Sse2_exp_precise eax, SSE_EXP_REAL8
movsd xmm1 X$edi+8; WVar
; (y z - x)
movsd xmm2 xmm0 ; z = xmm2 = Exp(WVar) (In Real8 format)
movsd xmm3 xmm1 | mulsd xmm3 xmm2 ; y*z = WVar*(ExpWVar)
movsd xmm4 X$edi ; Our Number
; xmm1 = WVar = y
; xmm2 = Exp(WVar) = z
; xmm3 = WVar*(ExpWVar) = y*z
; xmm4 our Number converted to double = x
; y*z-x = WVar*(ExpWVar)- pNumber
movsd xmm0 xmm3 | movsd xmm5 xmm4 | subsd xmm0 xmm5
mulsd xmm0 X$Float_Two ; 2*(y*z-x)
movsd xmm5 xmm1 | addsd xmm5 X$Float_One ; (y+1)
mulsd xmm0 xmm5 ; 2*(y*z-x) * (y+1)
; xmm0 is our numerator. Let´s store it to xmm5 and start calculating the divisor
; xmm5 = 2*(y*z-x) * (y+1)
movsd xmm5 xmm0
; let's work with the divisor now
; y*z+x = WVar*(ExpWVar)+ pNumber
movsd xmm0 xmm3 | movsd xmm6 xmm4 | addsd xmm0 xmm6
; y+2
movsd xmm7 xmm1 | addsd xmm7 X$Float_Two ; (y+2) = WVar+2
; (y+2) * (y*z+x)
mulsd xmm0 xmm7
; y*2
movsd xmm7 xmm1 | addsd xmm7 xmm7
; ((y+2) * (y*z+x)) + 2*y
addsd xmm0 xmm7 ; Ok. xmm0 is our divisor
;let's divide the numerator (at xmm5) by the divisor (at xmm0)
; ((2 (y + 1) (y z - x))/((y + 2) (x + y z) + 2 z))
divsd xmm5 xmm0
; good. Now we will subtract y (WVar) from this value at xmm5
subsd xmm1 xmm5
; now we need only to convert it to store our WVar for the next loop
movsd X$edi+8 xmm1 ; and store our WVar
; Try Fritsch iteration
; Wn+1 = Wn(1+E)
; E = (z/(1+w)) * ((q-z)/(q-2z))
; z = ln(x/W - W)
; q = 2*(1+W)*(1+W+Z*2/3)
; get Z to xmm1
mov ecx 2; Perform only 2 iterations to ensure accuracy
.Do
movsd xmm0 X$edi | divsd xmm0 xmm1 | movsd X$edi+16 xmm0
lea eax D$edi+16
call Sse2_log eax, SSE_EXP_REAL8
subsd xmm0 X$edi+8
movsd X$edi+16 xmm0 ; save Zn
; Get Qn
mulsd xmm0 X$SSE_Two_Thirds | addsd xmm0 X$edi+8 | addsd xmm0 X$Float_One
movsd xmm1 X$edi+8 | addsd xmm1 X$Float_One | addsd xmm1 xmm1 | mulsd xmm0 xmm1
movsd X$edi+24 xmm0
; get E
subsd xmm0 X$edi+16
movsd xmm1 X$edi+24 | subsd xmm1 X$edi+16 | subsd xmm1 X$edi+16
divsd xmm0 xmm1
mulsd xmm0 X$edi+16
movsd xmm1 X$edi+8 | addsd xmm1 X$Float_One
divsd xmm0 xmm1
; Get Wn+1
addsd xmm0 X$Float_One | mulsd xmm0 X$edi+8
movsd X$edi+8 xmm0; and store the result
movsd xmm1 xmm0
dec ecx
.Loop_Until ecx = 0
mov eax D@Output ; Allow the user to chose what type of output he want. Todo.
fld R$edi+8 ; For now, output only as a real8. Maybe allowing to output int or even Real4, but...should it be necessary ? It maybe loose precision. To be checked later :)
fstp F$eax
EndP
Once i succeed to fix the negative problem, i´ll write a small routine to check for NAN, Infinite etc and will try porting it to masm to we test. I could make all the calls to other functions such as exp or log be inline, but the could would be very very very unreadable. To get a bit more speed and also keep readability the better would be create variations of exp, log, log10 functions to use xmm0 as input without the internal checks for Nan´s (Afterall, this LambertW function will check for NAN already)
Lambert W is calculated to find x on a exponential function, like:
x*e(x) = 5
I´m trying to create this so we could apply these kind of complex math to image processing such as a faster Laplace Edge detector or Gaussian 2D filters