Hi Guys
Another test. This time it is a fast Log10 function, along the same lines as the fast exp we are testing in this thread:
http://masm32.com/board/index.php?topic=8734.0
I succeeded in assembling the file with MasmBasic (it has small errors, but it is working as expected). The function calculates the Log10 of a number with a precision of 16 digits after the ".".
Also, it checks for NaN, zero, and negative and positive infinity, and it calculates denormalized values.
The resulting value is stored in xmm0. (This time I succeeded in making it use fewer xmm registers.)
Important: If you assemble the function on its own, it is mandatory to align the data on a 16-byte boundary, due to the 128-bit memory operands used with the xmm registers. (I used memory addresses to minimize the usage of so many xmm registers.)
Btw, please don't mind the labels in the console output. I don't know how to configure MasmBasic to output the proper ones.

Original value of Log10(5):
Log10(5) = 0.698970004336018804786261105275506973231810118537891458689
My version:
Log10(5) = 0.698970004336018857
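If you want to cross-check the digits yourself, here is a small C sketch (my assumption: any C99 compiler with the standard math library; the helper name is made up for this post). It measures how far a computed value is from the reference digits quoted above. Keep in mind that a Real8 only carries about 16 to 17 significant decimal digits, so anything printed after that is not meaningful.

```c
#include <math.h>

/* Reference value of log10(5), truncated from the digits quoted above. */
static const double LOG10_5_REF = 0.698970004336018804786261105275507;

/* Hypothetical helper: absolute distance between a computed value and the reference. */
static double log10_5_abs_error(double got)
{
    return fabs(got - LOG10_5_REF);
}
```

Pass it either the CRT result, log10(5.0), or the Real8 you stored from xmm0 after calling the routine. Both should land within about 1e-15 of the reference.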
On my PC, the average speed is around 2500 cycles, but again, we can make it work even faster with some extra effort. I can make it run in around 1600 cycles
(without losing precision) simply by removing the check for NaNs, i.e. commenting out these lines:
cmp ecx, 2045 ; Check for NAN, Infinite, Zero and Denormalized values
jbe loc_4C6BFC
(...)
loc_4C6BFC:
But if we do so, we won't be able to check for errors in the input number, nor will we be able to calculate the log10 of denormalized values.
The parameters are the same as in the exp function:
1st Parameter = The value to be calculated. It can be an integer, a Float or the address of a Real8 (or qword)
2nd Parameter = The flag that tells the function which type of input it gets. I built a set of equates for that.
SSE_EXP_INT equ 1
SSE_EXP_FLOAT equ 2
SSE_EXP_REAL8 equ 4
SSE_EXP_QWORD equ 8
So, if you want to calculate the log10 of 5, and the input is an integer, you must use
invoke Sse2_log10_precise, 5, 1
or
invoke Sse2_log10_precise, 5, SSE_EXP_INT
If the input is a Float (Real4) you use:
MyValue Real4 5.0 ; I believe this is the masm syntax for Float, right ?
invoke Sse2_log10_precise, [MyValue], 2
or
invoke Sse2_log10_precise, [MyValue], SSE_EXP_FLOAT
If the input is a Real8, you must use as input the address (offset) of the value, like this:
MyValue Real8 5.0
invoke Sse2_log10_precise, offset MyValue, 4
or
MyValue Real8 5.0
invoke Sse2_log10_precise, offset MyValue, SSE_EXP_REAL8
and the same goes for a QWord (int64), but using 8 as the flag (2nd parameter) and a pointer to the int64 value as the 1st parameter.
Ex:
push SSE_EXP_REAL8 ; The flag (4) we use to compute a Real8, as I explained above
push offset MyValue ; Pointer to a Real8 or to a QWord, only when we are using 64-bit values. For 32-bit values (dword, Float etc.),
; we use the value directly because int or Float are not pointers. (So, without the "offset" thing in masm)
call Sse2_log10_precise
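For those who prefer to read it in C, here is a rough sketch of how the two parameters are interpreted, based on the Test_if chain in the RosAsm listing in this post. The helper name load_input is hypothetical, and I use uintptr_t for the first parameter only so the sketch also compiles on 64-bit (the real routine takes a 32-bit stack argument).

```c
#include <stdint.h>
#include <string.h>

#define SSE_EXP_INT   1
#define SSE_EXP_FLOAT 2
#define SSE_EXP_REAL8 4
#define SSE_EXP_QWORD 8

/* Hypothetical helper: returns 1 and stores the input as a double, or 0 for an unknown flag. */
static int load_input(uintptr_t number, int flag, double *out)
{
    if (flag & SSE_EXP_INT) {              /* signed 32-bit integer passed by value (cvtsi2sd) */
        *out = (double)(int32_t)(uint32_t)number;
    } else if (flag & SSE_EXP_FLOAT) {     /* Real4 bit pattern passed by value (cvtss2sd) */
        uint32_t raw = (uint32_t)number;
        float f;
        memcpy(&f, &raw, sizeof f);
        *out = (double)f;
    } else if (flag & SSE_EXP_REAL8) {     /* pointer to a Real8 (movsd) */
        memcpy(out, (const void *)number, sizeof *out);
    } else if (flag & SSE_EXP_QWORD) {     /* pointer to 64 bits, loaded unchanged (movq) */
        memcpy(out, (const void *)number, sizeof *out);
    } else {
        return 0;                          /* invalid flag */
    }
    return 1;
}
```

Because the flag bits are tested in order, a combined value such as 5 (1 or 4) would take the integer branch, so pass exactly one flag. Note also that in the listing the QWord path loads the 64 bits as they are, without an integer-to-double conversion.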
RosAsm version is as follows:
[Float_Zero: R$ 0]
[SSE_Two52: Q$ 4841369599423283200, 4841369599423283200] ; = 2^52 ;D$ 0, 043300000, 0, 043300000]
[SSE_One: R$ 1, R$ 1]
[SSE_LOG_EMASK < 2.22507385850720082e-308 >]; Using Body Equate to set the proper value which is Closer to DBL_MIN that is: 2.2250738585072014e-308
[SSE_LOG_HIMASK < 1.79769313486231571e+308 >]; Using Body Equate to set the proper value which is Closer to DBL_MAX that is: 1.7976931348623158e+308
[SSE_Emask: R$ SSE_LOG_EMASK, SSE_LOG_EMASK] ; D$ 0-1, 0FFFFF, 0-1, 0FFFFF
[<16 SSE_LOG10_CC_0: R$ (111/256), R$ (111/256)] ; L10EA
[<16 SSE_Magic0: R$ 4.39804651110300781e+12, R$ 4.39804651110300781e+12]
[SSE_LOG10_HIMASK < 07FFFFFFF__80000000 >]; Using Body Equate. Bit mask that keeps the exponent and the upper mantissa bits (clears the sign bit and the low 31 mantissa bits)
[SSE_HiMask0: Q$ SSE_LOG10_HIMASK, SSE_LOG10_HIMASK]
[<16 SSE_Place_Log2: D$ 0, 0
D$ 0FFFFFFFF, 0FFFFFFFF,
D$ 0FFFFFFFF, 0FFFFFFFF,
D$ 0, 0]
[<16 SSE_Log1020: R$ 3.01029995663952832e-1, R$ 2.83633945510449641e-14] ; Log10(2) = 0.3010299956639811952137388947244930267681898814621085413104274611
[<16 Log_Coeff0: R$ 21.5354732628465832, R$ -3.07179525615370474]
[<16 SSE_Log10Var1: R$ -10.8935578527763628, R$ 1.77588163534834509]
[<16 SSE_Log10Var2: R$ 5.66760060334353621, R$ -1.15501676674018694]
[<16 SSE_Log10Var3: R$ 1.61610240749971053e-3, R$ 0]
; most likely a Log10 table stored as pairs (high part followed by a small low-order correction), running from 0 up to Log10(2)
[<16 Log10_Table_T: R$ 0, R$ 0, R$ 6.83942453019881210e-003, R$ 1.06653844666057790e-013
R$ 1.33507081443440260e-002, R$ 8.67505934719954320e-014, R$ 1.99611018521181900e-002, R$ 9.23215523723840900e-014
R$ 2.62229227369061850e-002, R$ 7.49930509874774870e-014, R$ 3.25763513509400580e-002, R$ 2.41276323350416150e-014
R$ 3.90241079016959700e-002, R$ 1.07524865705891230e-014, R$ 4.50982556138797010e-002, R$ 2.01960064510954180e-014
R$ 5.12585643186866950e-002, R$ 3.16576133915421310e-014, R$ 5.70236199724831750e-002, R$ 2.44070710374291600e-014
R$ 6.31113911136935710e-002, R$ 2.48257788963393190e-014, R$ 6.87885240052992230e-002, R$ 1.09693456491489220e-013
R$ 7.45408528944153660e-002, R$ 8.48557345140544200e-014, R$ 8.03703965551676450e-002, R$ 5.64317690419344980e-014
R$ 8.60206705779091860e-002, R$ 2.11074530681352090e-014, R$ 9.14835662794075690e-002, R$ 2.48825492648481080e-014
R$ 9.70160548793046470e-002, R$ 8.88302993483350900e-014, R$ 1.02351435027458140e-001, R$ 8.15103304243699220e-014
R$ 1.07753177325776050e-001, R$ 4.45086172353454940e-014, R$ 1.12947822295495830e-001, R$ 3.08831178628687270e-015
R$ 1.18205353949292660e-001, R$ 3.88929326942825740e-014, R$ 1.23245578588807800e-001, R$ 4.71750903210203580e-014
R$ 1.28344985300145710e-001, R$ 6.57467750563177680e-014, R$ 1.33216699989134210e-001, R$ 2.71250821687622270e-014
R$ 1.38435254551609430e-001, R$ 7.56877432679591610e-015, R$ 1.43127205461155430e-001, R$ 6.81055570899612790e-015
R$ 1.48168577326714510e-001, R$ 6.02542674600232690e-014, R$ 1.52967460208515150e-001, R$ 2.83427723708756480e-014
R$ 1.57515087959154700e-001, R$ 1.09440764415528030e-013, R$ 1.62418959194383210e-001, R$ 5.35145729088875760e-014
R$ 1.67067178541742580e-001, R$ 5.99509971730059460e-014, R$ 1.71450865902443180e-001, R$ 1.13452388582535560e-013
R$ 1.76197300927015020e-001, R$ 3.28390775041250930e-015, R$ 1.80674603281659070e-001, R$ 1.03481913998834240e-013
R$ 1.85198545041771470e-001, R$ 3.73116362151814480e-014, R$ 1.89441967200082220e-001, R$ 2.98008559425539880e-014
R$ 1.93727260613627550e-001, R$ 8.13197508503595880e-014, R$ 1.98055259839406970e-001, R$ 3.57492486261199750e-014
R$ 2.02426824636404490e-001, R$ 7.53155364447138070e-014, R$ 2.06501548650066980e-001, R$ 7.07722033818675070e-014
R$ 2.10959407186123830e-001, R$ 1.06420585148621610e-013, R$ 2.15115366957320480e-001, R$ 6.74895867627637050e-014
R$ 2.18960252674605730e-001, R$ 6.67672190370376170e-014, R$ 2.23193863603228240e-001, R$ 1.36388742479585940e-014
R$ 2.27111265564531100e-001, R$ 2.32760258694916680e-014, R$ 2.31425484637043160e-001, R$ 2.92761620518569070e-014
R$ 2.35053696899512940e-001, R$ 6.25806885708624120e-014, R$ 2.39080054690248290e-001, R$ 3.00590055364650880e-014
R$ 2.43144090557620980e-001, R$ 1.05192270531529520e-014, R$ 2.46871963076841890e-001, R$ 3.27758102924169520e-014
R$ 2.50632111950153560e-001, R$ 2.79059969846129800e-014, R$ 2.54425100967296200e-001, R$ 2.43505807623931280e-014
R$ 2.58251508820308120e-001, R$ 6.53068360680055810e-014, R$ 2.62111929633533690e-001, R$ 7.78445532603097920e-014
R$ 2.65615893362905810e-001, R$ 1.97240087303329050e-014, R$ 2.69542633331980140e-001, R$ 6.12305485363238830e-014
R$ 2.73107313935042840e-001, R$ 3.18805618188936470e-014, R$ 2.76701495678366880e-001, R$ 1.05904780751833190e-013
R$ 2.80325670940214880e-001, R$ 4.14680805130042090e-014, R$ 2.83572747613220600e-001, R$ 1.90891161216058830e-014
R$ 2.87254964996350280e-001, R$ 1.65981890215736650e-014, R$ 2.90554464110186930e-001, R$ 4.83600159236775040e-014
R$ 2.94296613004917160e-001, R$ 9.56300328864571600e-014, R$ 2.97650255012513300e-001, R$ 8.72995849029738040e-014
R$ 3.01029995663952830e-001, R$ 2.83633945510449640e-014
Log10_Table_B: R$ (227328/524288), R$ (227328/524288), R$ (223776/524288), R$ (223776/524288)
R$ (220446/524288), R$ (220446/524288), R$ (217116/524288), R$ (217116/524288)
R$ (214008/524288), R$ (214008/524288), R$ (210900/524288), R$ (210900/524288)
R$ (207792/524288), R$ (207792/524288), R$ (204906/524288), R$ (204906/524288)
R$ (202020/524288), R$ (202020/524288), R$ (199356/524288), R$ (199356/524288)
R$ (196581/524288), R$ (196581/524288), R$ (194028/524288), R$ (194028/524288)
R$ (191475/524288), R$ (191475/524288), R$ (188922/524288), R$ (188922/524288)
R$ (186480/524288), R$ (186480/524288), R$ (184149/524288), R$ (184149/524288)
R$ (181818/524288), R$ (181818/524288), R$ (179598/524288), R$ (179598/524288)
R$ (177378/524288), R$ (177378/524288), R$ (175269/524288), R$ (175269/524288)
R$ (173160/524288), R$ (173160/524288), R$ (171162/524288), R$ (171162/524288)
R$ (169164/524288), R$ (169164/524288), R$ (167277/524288), R$ (167277/524288)
R$ (165279/524288), R$ (165279/524288), R$ (163503/524288), R$ (163503/524288)
R$ (161616/524288), R$ (161616/524288), R$ (159840/524288), R$ (159840/524288)
R$ (158175/524288), R$ (158175/524288), R$ (156399/524288), R$ (156399/524288)
R$ (154734/524288), R$ (154734/524288), R$ (153180/524288), R$ (153180/524288)
R$ (151515/524288), R$ (151515/524288), R$ (149961/524288), R$ (149961/524288)
R$ (148407/524288), R$ (148407/524288), R$ (146964/524288), R$ (146964/524288)
R$ (145521/524288), R$ (145521/524288), R$ (144078/524288), R$ (144078/524288)
R$ (142635/524288), R$ (142635/524288), R$ (141303/524288), R$ (141303/524288)
R$ (139860/524288), R$ (139860/524288), R$ (138528/524288), R$ (138528/524288)
R$ (137307/524288), R$ (137307/524288), R$ (135975/524288), R$ (135975/524288)
R$ (134754/524288), R$ (134754/524288), R$ (133422/524288), R$ (133422/524288)
R$ (132312/524288), R$ (132312/524288), R$ (131091/524288), R$ (131091/524288)
R$ (129870/524288), R$ (129870/524288), R$ (128760/524288), R$ (128760/524288)
R$ (127650/524288), R$ (127650/524288), R$ (126540/524288), R$ (126540/524288)
R$ (125430/524288), R$ (125430/524288), R$ (124320/524288), R$ (124320/524288)
R$ (123321/524288), R$ (123321/524288), R$ (122211/524288), R$ (122211/524288)
R$ (121212/524288), R$ (121212/524288), R$ (120213/524288), R$ (120213/524288)
R$ (119214/524288), R$ (119214/524288), R$ (118326/524288), R$ (118326/524288)
R$ (117327/524288), R$ (117327/524288), R$ (116439/524288), R$ (116439/524288)
R$ (115440/524288), R$ (115440/524288), R$ (114552/524288), R$ (114552/524288)
R$ (113664/524288), R$ (113664/524288)]
; Parameters flag
[SSE_EXP_INT 1
SSE_EXP_FLOAT 2
SSE_EXP_REAL8 4
SSE_EXP_QWORD 8]
; Values to return
[SSE_EXP_INVALID_PARAMETER 0-1] ; Invalid flag
[SSE_UNDERFLOW 0-2] ; The input number underflows
[SSE_OVERFLOWN 0-3] ; The input number overflows
[SSE_INFINITE 0-4] ; General error. The input number is infinite, NaN, etc.
[SSE_ZERO 0-5] ; The input number is zero. Log and Ln are not defined for it
[SSE_NEG_INFINITE 0-6] ; Negative infinite found
[SSE_POS_INFINITE 0-7] ; Positive infinite found
[SSE_NAN 0-9] ; NaN. Not a number
Proc Sse2_log10_precise:
Arguments @Number, @Flag
Uses edx, ecx
mov eax D@Flag
Test_if eax SSE_EXP_INT
cvtsi2sd xmm0 D@Number ; converts a signed integer to double
Test_Else_if eax SSE_EXP_FLOAT
cvtss2sd xmm0 X@Number ; converts a single precision float to double
Test_Else_if eax SSE_EXP_REAL8
mov eax D@Number | movsd XMM0 X$eax
Test_Else_if eax SSE_EXP_QWORD
mov eax D@Number | movq XMM0 X$eax
Test_Else
xor eax eax | ExitP ; return 0 Invalid parameter
Test_End
xor edx edx
movupd XMM1 XMM0
unpcklpd XMM0 XMM0
psrlq XMM1 52 | pextrw ecx XMM1 0 | and ecx 0FFF | sub ecx 1
...If ecx > 2045 ; Special cases. The number needs special handling
.SSE_D_If xmm0 <= X$Float_Zero ; Input value is zero
mov eax SSE_ZERO
SSE_D_If xmm0 < X$Float_Zero ; Input value is negative
mov eax SSE_NEG_INFINITE
SSE_D_End
ExitP ; Exit the function
.SSE_D_Else
..If ecx = 0-1 ; number is denormalized. We can continue
mulsd XMM0 X$SSE_Two52 ; for very tiny numbers. Ex: x = 2e-314
mov edx 0-52
xor eax eax
movupd XMM1 XMM0
unpcklpd XMM0 XMM0
psrlq XMM1 52 | pextrw ecx XMM1 0 | and ecx 0FFF | sub ecx 1
..Else
movupd XMM1 X$SSE_One ; same as SSE_One2
andpd XMM0 X$SSE_Emask ; same as SSE_Emask2
orpd XMM0 XMM1
cmpsd XMM1 XMM0 0 ; (EQ) error in rosasm
pextrw eax XMM1 0
If eax = 0 ; Not a Number
mov eax SSE_NAN
Else ; Number is positive infinite
mov eax SSE_POS_INFINITE
End_If
ExitP ; Exit the function
..End_If
.SSE_D_End
...End_If
movupd XMM1 X$SSE_Magic0 | andpd XMM0 X$SSE_Emask | orpd XMM0 X$SSE_One | addpd XMM1 XMM0
pextrw eax XMM1 0 | and eax ((Size_of_LogTable*2)-48)
movupd XMM4 X$Log10_Table_T+eax
movupd XMM1 X$SSE_HiMask0 | andpd XMM1 XMM0
subpd XMM0 XMM1 | mulpd XMM0 X$Log10_Table_B+eax
mulpd XMM1 X$Log10_Table_B+eax | subpd XMM1 X$SSE_LOG10_CC_0
addsd XMM4 XMM1
sub ecx 1022 | add ecx edx
cvtsi2sd XMM2 ecx | shl ecx 10 | add eax ecx | mov ecx 16 | mov edx 0 | cmp eax 0 | cmovz edx ecx
movupd XMM3 XMM0 | andpd XMM3 X$SSE_Place_Log2+edx
addpd XMM0 XMM1
; same as SSE_Place_Log0
unpcklpd XMM2 XMM2 | mulpd XMM2 X$SSE_Log1020 | addpd XMM2 XMM3 | addpd XMM4 XMM2
movupd XMM1 X$Log_Coeff0
movupd XMM2 XMM0 | mulpd XMM2 XMM2 | mulsd XMM2 XMM2 | mulsd XMM2 XMM0
mulpd XMM1 XMM0 | addpd XMM1 X$SSE_Log10Var1
mulpd XMM1 XMM0
addpd XMM1 X$SSE_Log10Var2 | mulpd XMM1 XMM2
movupd xmm2 X$SSE_Log10Var3 | mulpd XMM2 XMM0
movupd XMM3 XMM4
unpckhpd XMM3 XMM3
movupd XMM0 XMM1
addpd XMM1 XMM2
unpckhpd XMM0 XMM0
addsd XMM0 XMM1
addsd XMM0 XMM3
addsd XMM0 XMM4
EndP
My timings
AMD Ryzen 5 2400G with Radeon Vega Graphics (SSE4)
2747 cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise , 2.7182818^5)
11641 cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27174 cycles for 100 * pow (CRT, 2.7182818^5)
2619 cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise , 2.7182818^5)
11399 cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27011 cycles for 100 * pow (CRT, 2.7182818^5)
2632 cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise , 2.7182818^5)
11207 cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
25996 cycles for 100 * pow (CRT, 2.7182818^5)
2593 cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise , 2.7182818^5)
11419 cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26219 cycles for 100 * pow (CRT, 2.7182818^5)
2553 cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise , 2.7182818^5)
11310 cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26940 cycles for 100 * pow (CRT, 2.7182818^5)
148.413159102577 for Sse2_log10_precise (Guga SSE2 Log10 precise , 2.7182818^5)
148.413159102577 for ExpXY (MasmBasic, 2.7182818^5)
148.413159102577 for pow (CRT, 2.7182818^5)
Updated version, with the proper values (the ones in parentheses) calculated. File: TimmingsLog10g.zip (faster and still accurate)