Fast Log10 approximation

jj2007 · August 16, 2020, 10:45:44 AM

Quote from: HSE on August 16, 2020, 10:31:42 AM
It's the routine with the mov.

The code is above; just change line 345.

Please show the line in context; I can't find it.

guga · August 16, 2020, 10:57:53 AM

Guys, the syntax of table B is wrong. How can I make MASM calculate the values in parentheses in the data section?

Log10_Table_B dq (227328/524288), (227328/524288),   <----- this makes MASM assemble 000000000 (zeros) rather than the value of each division
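A likely explanation, assuming MASM evaluates assembly-time constant expressions in integer arithmetic only (which is why floating-point macro packages exist for it): `227328/524288` is an integer division, so the quotient truncates to 0 before `dq` ever sees it. The same effect is easy to reproduce in C, where a single floating-point operand is enough to get the real quotient. The variable names are hypothetical:

```c
/* Both operands are integers: the true quotient 0.43359375 is truncated
   toward zero, giving 0 (presumably what MASM's integer-only constant
   evaluator does as well). */
const long table_b_int = 227328 / 524288;

/* One floating-point operand forces a real division: 0.43359375,
   which is exactly representable as a double (222 / 512). */
const double table_b_real = 227328.0 / 524288.0;
```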

guga · August 16, 2020, 01:42:48 PM

OK, guys, now it is working as expected. The result is correct.

All I did was replace the expressions in parentheses with their precomputed values, and it returned the correct answer.

Update attached. The source is the same as before, except for the fix to the values in parentheses. I updated the first post too.
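Precomputing every entry by hand is error-prone for a large table. A minimal sketch of generating the lines instead, assuming a small C helper run at build time (the function name `format_dq_entry` is hypothetical). `%#.17g` prints 17 significant digits, enough for the decimal text to round-trip to the same double, and the `#` flag always keeps the decimal point, so MASM should read each value as a real rather than an integer:

```c
#include <stdio.h>

/* Hypothetical build-time helper: formats one MASM "dq" table entry
   computed as num / den in double precision.
   %#.17g prints 17 significant digits and, because of '#', always keeps
   the decimal point so the value is not mistaken for an integer. */
int format_dq_entry(char *buf, size_t size, const char *label,
                    double num, double den)
{
    return snprintf(buf, size, "%s dq %#.17g", label, num / den);
}
```

For example, `format_dq_entry(buf, sizeof buf, "Log10_Table_B", 227328, 524288)` writes `Log10_Table_B dq 0.43359375000000000`, which can be pasted into the data section.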

Result is:


AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

2786    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11317   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26628   cycles for 100 * pow (CRT, 2.7182818^5)

2662    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11268   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27312   cycles for 100 * pow (CRT, 2.7182818^5)

2686    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11511   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26168   cycles for 100 * pow (CRT, 2.7182818^5)

2832    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
12049   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27194   cycles for 100 * pow (CRT, 2.7182818^5)

2781    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11254   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26377   cycles for 100 * pow (CRT, 2.7182818^5)

0.698970004336019 for Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
148.413159102577 for ExpXY (MasmBasic, 2.7182818^5)
148.413159102577 for pow (CRT, 2.7182818^5)

--- ok ---

TimoVJL · August 16, 2020, 05:18:16 PM

AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

26512   cycles for 10000 * movlps
26975   cycles for 10000 * movlpd
62428   cycles for 10000 * movdqu

26448   cycles for 10000 * movlps
26523   cycles for 10000 * movlpd
62479   cycles for 10000 * movdqu

26537   cycles for 10000 * movlps
26812   cycles for 10000 * movlpd
62242   cycles for 10000 * movdqu

26604   cycles for 10000 * movlps
26673   cycles for 10000 * movlpd
62401   cycles for 10000 * movdqu

26578   cycles for 10000 * movlps
26707   cycles for 10000 * movlpd
62530   cycles for 10000 * movdqu

28      bytes for movlps
32      bytes for movlpd
32      bytes for movdqu

R8      1234567890.123456716
R8      1234567890.123456716
R8      1234567890.123456716

HSE · August 17, 2020, 02:53:55 AM

Quote from: jj2007 on August 16, 2020, 10:45:44 AM
Please show the line in context, I can't find it.

Are you having problems with the TestBed?
(Clarification: the TestBed program was written by jj2007.)

Quote from: TimoVJL on August 16, 2020, 05:18:16 PM
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)

Only SSE3 here! Perhaps that's the difference.

Quote from: guga on August 16, 2020, 10:57:53 AM
Guys, the syntax of table B is wrong. How can I make MASM calculate the values in parentheses in the data section?

Log10_Table_B dq (227328/524288), (227328/524288),   <----- this makes MASM assemble 000000000 (zeros) rather than the value of each division

You can use qWord's MREAL-macros to do floating-point arithmetic at assembly time.

jack · August 17, 2020, 03:18:10 AM

From what I gather, you use tables to calculate the logarithm. Just for fun, I used Maple to compute a rational approximation to ln(1+x); multiplying it by log10(e) gives an approximation to log10(1+x), valid for x between 0 and 1:


ln(1+x) = (1+(2.45432048794495419+(2.17440739254242255+(.841813647259988564+(.135819954481562186+(0.687077985071464530e-2+0.231261455529019880e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)

log10(1+x) = (.434294481903251828+(1.06589784473659010+(.944333131990812123+(.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)

guga · August 17, 2020, 04:54:55 AM

Quote from: jack on August 17, 2020, 03:18:10 AM
from what I gather you use tables to calculate the logarithm, just for fun, using Maple I computed a rational approximation to ln(1+x) and by multiplying that by log10(e) you get an approximation to log10(1+x) with x between 0 and 1
ln(1+x) = (1+(2.45432048794495419+(2.17440739254242255+(.841813647259988564+(.135819954481562186+(0.687077985071464530e-2+0.231261455529019880e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)

log10(1+x) = (.434294481903251828+(1.06589784473659010+(.944333131990812123+(.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)

Great, jack. Thanks a lot.

Some questions:

1 - What is Maple? (Where can I download it?)
2 - Did you test the accuracy? What is the precision of the coefficients you entered, and can it also handle denormalized values?
3 - Can Maple produce these polynomial coefficients without a division? Division is slow.

Siekmanski · August 17, 2020, 05:45:59 AM

Hi Guga,

You can always avoid division by a constant.
Precompute the reciprocal of the constant, and you can multiply instead of dividing.

Value    real4 256.0
recValue real4 0.00390625          ; 1/256

divss xmm0, real4 ptr Value        ; either divide by 256.0 ...
mulss xmm0, real4 ptr recValue     ; ... or multiply by 1/256 instead

Both give the same result.
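A small C illustration of why the two forms agree here, with one caveat. 1/256 = 2^-8 is exactly representable, so for finite v, `v / 256.0f` and `v * 0.00390625f` round the same real number and always match. For a constant whose reciprocal is not exact, such as 10, the rounded reciprocal can change the last bit: in double precision, 3.0 / 10.0 gives 0.3 but 3.0 * 0.1 gives 0.30000000000000004. The function names are hypothetical:

```c
/* Power-of-two constant: the reciprocal 2^-8 is exact, so both forms
   produce the identical float for every finite input. */
float div_by_256(float v)       { return v / 256.0f; }
float mul_by_recip_256(float v) { return v * 0.00390625f; } /* 1/256 */

/* Non-power-of-two constant: 0.1 is itself rounded, so multiplying by it
   can differ from the correctly rounded quotient in the last bit. */
double div_by_10(double v)       { return v / 10.0; }
double mul_by_recip_10(double v) { return v * 0.1; }
```

So the reciprocal trick is always safe for powers of two; for other constants it trades a possible 1-ulp difference for speed.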

jack · August 17, 2020, 05:53:26 AM

1- Maple is a computer algebra system: https://www.maplesoft.com/ns/maple/cas/computer-algebra-systems-math-education.aspx
2- The input range for x is between 0 and 1. The error is probably +/-1e-16, depending on the precision used in the evaluation. As for denormalized values, this is just a rational polynomial approximation, unrelated to floating-point intrinsics.
3- Yes, but a rational polynomial usually needs fewer terms than a plain polynomial to achieve a given precision; it would take a polynomial of degree 19 or 20 to get the same precision.

mineiro · August 17, 2020, 06:13:45 AM

Quote from: Siekmanski on August 17, 2020, 05:45:59 AM
Make a reciprocal of the constant and you can multiply instead of divide.

Multiplication is more efficient than division:

Integer division using reciprocals -- Robert Alverson
https://www.computer.org/csdl/proceedings-article/arith/1991/00145558/12OmNyaXPS1
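The same idea applies to integer division by a constant. A minimal C sketch of the well-known multiply-and-shift ("magic number") form that compilers emit for unsigned 32-bit division; this is the common technique, not necessarily the exact method of the paper cited above. 0xAAAAAAAB = ceil(2^33 / 3) and 0xCCCCCCCD = ceil(2^35 / 10), and both give the exact quotient for every 32-bit input:

```c
#include <stdint.h>

/* n / 3 for any uint32_t n: multiply by ceil(2^33 / 3), keep the high bits. */
uint32_t div3(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
}

/* n / 10 for any uint32_t n: multiply by ceil(2^35 / 10), keep the high bits. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

The 64-bit product cannot overflow because both factors are below 2^32.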

jj2007 · August 17, 2020, 10:20:09 AM

Quote from: HSE on August 16, 2020, 07:47:28 AM

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

5396    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11737   cycles for 100 * SmplMath: fSlv MyReal8 = log(MyExpo)

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3163    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
5060    cycles for 100 * Log10

3211    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4933    cycles for 100 * Log10

3209    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4823    cycles for 100 * Log10

3186    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4773    cycles for 100 * Log10

3219    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4739    cycles for 100 * Log10

536     bytes for Sse2_log10_precise (Guga SSE2 Log10 precise )
16      bytes for Log10

0.699076046207356 for Sse2_log10_precise (Guga SSE2 Log10 precise )
0.698970004336019 for Log10
0.6989700043360188 expected

HSE · August 17, 2020, 10:36:07 AM

With Guga's latest correction:

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

5257    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11751   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11687   cycles for 100 * JJ Log10

5251    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11741   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11637   cycles for 100 * JJ Log10

5254    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11758   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11664   cycles for 100 * JJ Log10

5248    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11760   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11641   cycles for 100 * JJ Log10

5274    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11735   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11653   cycles for 100 * JJ Log10

0.698970004336019 for Sse2_log10_precise (Guga SSE2 Log10 precise )
0.698970004336019 for SmplMath:  fSlv MyReal8 = log(MyExpo)
0.698970004336019 for JJ Log10

jj2007 · August 17, 2020, 10:54:07 AM

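MasmBasic (MB) Log10 under the hood:

fldlg2              ; ST(0) = log10(2)
fld MyExpo          ; ST(0) = x, ST(1) = log10(2)
fyl2x               ; ST(1) * log2(ST(0)) = log10(x), pops; result in ST(0)
fstp MyReal8        ; store log10(x) and pop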

Mikl__ · August 17, 2020, 01:00:08 PM

[delete]

guga · August 17, 2020, 05:14:44 PM

Hi Marinus,

With the equation provided by Jack, unfortunately we can't use a reciprocal: the divisor is another polynomial, not a constant.

Jack, I tried log10(5+1) with the formula, and the precision is only about 5 digits after the decimal point. If you input x = 5, the result for log10(x+1) = log10(6) is:

Numerator, x = 5:
(0.434294481903251828+(1.0658978447365901+(0.944333131990812123+(0.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x = 607.096994200488817

Denominator, x = 5:
(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(0.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x) = 780.177558728198735

Quotient:
607.096994200488817/780.177558728198735 = 0.778152341615854664455814528055710104487791345037987518732

Expected log10(6) = 0.7781512503836436325087667979796083359683187456528044061402931014.

0.778152341615854664455814528055710104487791345037987518732 ; result using the Maple coefficients
0.7781512503836436325087667979796083359683187456528044061402931014 ; expected result

Can you please try to see whether the precision can be extended to at least 14 digits after the decimal point, while keeping the same number of powers of x (or fewer, which would be better)? The numerator is a polynomial of the form x^7 + x^6 + ... and the divisor is of the form x^6 + x^5 + ..., so it can be rewritten as:
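
log10(1+x) = (A*x + B*x^2 + C*x^3 + D*x^4 + E*x^5 + F*x^6 + G*x^7) / (1 + H*x + I*x^2 + J*x^3 + K*x^4 + L*x^5 + M*x^6)
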
where A, B, C, ... are the values you posted and x is the input value. We could try putting the numerator in one matrix and the divisor in another and dividing by the inverse of the divisor (the x^6 + ... polynomial), but computing the inverse matrix, and later checking whether the division is possible, would also take a lot of processing time.

Please see if the coefficients you generated with Maple can be extended to at least 14 digits of precision (after the decimal point), and also try simplifying the equation so we can see whether it's faster than the one I made.

The MASM Forum

News:

Fast Log10 approximation

jj2007

guga

guga

TimoVJL

HSE

jack

guga

Siekmanski

jack

mineiro

jj2007

HSE

jj2007

Mikl__

guga