Fast Log10 approximation

Started by guga, August 16, 2020, 06:01:01 AM


jj2007

Quote from: HSE on August 16, 2020, 10:31:42 AM
It's the routine with the mov.

The code is above; just change line 345.

Please show the line in context, I can't find it.

guga

Guys, the syntax of tableB is wrong. How can I make MASM calculate the values in parentheses in the data section?

Log10_Table_B dq (227328/524288),  (227328/524288), <----- this makes MASM emit zeros (db 000000000) rather than the value of each division
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

OK, guys... now it is working as expected. The result is OK.

All I did was replace the expressions in between ( ) with their precalculated values, and it returned the correct answer.
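A small sketch of why the original table came out as zeros, assuming the assembler treats (227328/524288) as an integer constant expression, which is consistent with the zeros reported above. The Python below only mimics the two kinds of division; it is not MASM output.

```python
# The table entry as written in the source: 227328/524288
numerator, denominator = 227328, 524288

# Integer (truncating) division, as an integer-only expression evaluator would do it
as_integer = numerator // denominator

# Real-valued division, which is what the table actually needs
as_real = numerator / denominator

print(as_integer)  # the value an integer evaluator would store
print(as_real)     # the literal to paste into the dq line instead
```

Writing the real-valued literal into the data section by hand is the fix described in this post.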

Update attached. The source is the same as before, except for the fix to the values in parentheses. I updated the 1st post too.

Result is:

AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

2786    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11317   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26628   cycles for 100 * pow (CRT, 2.7182818^5)

2662    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11268   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27312   cycles for 100 * pow (CRT, 2.7182818^5)

2686    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11511   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26168   cycles for 100 * pow (CRT, 2.7182818^5)

2832    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
12049   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27194   cycles for 100 * pow (CRT, 2.7182818^5)

2781    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11254   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26377   cycles for 100 * pow (CRT, 2.7182818^5)

0.698970004336019 for Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
148.413159102577 for ExpXY (MasmBasic, 2.7182818^5)
148.413159102577 for pow (CRT, 2.7182818^5)

--- ok ---

TimoVJL

AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

26512   cycles for 10000 * movlps
26975   cycles for 10000 * movlpd
62428   cycles for 10000 * movdqu

26448   cycles for 10000 * movlps
26523   cycles for 10000 * movlpd
62479   cycles for 10000 * movdqu

26537   cycles for 10000 * movlps
26812   cycles for 10000 * movlpd
62242   cycles for 10000 * movdqu

26604   cycles for 10000 * movlps
26673   cycles for 10000 * movlpd
62401   cycles for 10000 * movdqu

26578   cycles for 10000 * movlps
26707   cycles for 10000 * movlpd
62530   cycles for 10000 * movdqu

28      bytes for movlps
32      bytes for movlpd
32      bytes for movdqu

R8      1234567890.123456716
R8      1234567890.123456716
R8      1234567890.123456716

-
May the source be with you

HSE

Quote from: jj2007 on August 16, 2020, 10:45:44 AM
Please show the line in context, I can't find it.

:biggrin: Do you have problems with the TestBed?
(clarification: the TestBed program was written by jj2007)

Quote from: TimoVJL on August 16, 2020, 05:18:16 PM
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)


Only SSE3 here! Perhaps that's the reason.


Quote from: guga on August 16, 2020, 10:57:53 AM
Guys, the syntax of tableB is wrong. How can I make MASM calculate the values in parentheses in the data section?

Log10_Table_B dq (227328/524288),  (227328/524288), <----- this makes MASM emit zeros (db 000000000) rather than the value of each division
You can use qWord's floating point arithmetic while assembling: MREAL-macros


Equations in Assembly: SmplMath

jack

From what I gather, you use tables to calculate the logarithm. Just for fun, I used Maple to compute a rational approximation to ln(1+x); multiplying that by log10(e) gives an approximation to log10(1+x), with x between 0 and 1.

ln(1+x) = (1+(2.45432048794495419+(2.17440739254242255+(.841813647259988564+(.135819954481562186+(0.687077985071464530e-2+0.231261455529019880e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)

log10(1+x) = (.434294481903251828+(1.06589784473659010+(.944333131990812123+(.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)
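A minimal sketch of how the log10(1+x) formula above can be evaluated and checked, assuming ordinary double precision and Python's math.log10 as the reference. The function names are made up for illustration; the coefficients are copied from the post.

```python
import math

# Coefficients from the log10(1+x) formula above, lowest power first
NUM = [0.434294481903251828, 1.06589784473659010, 0.944333131990812123,
       0.365595021795863521, 0.589858567636932956e-1,
       0.298394177553741882e-2, 0.100435574013167601e-4]
DEN = [1.0, 2.95432048794495173, 3.31823430318176310, 1.76615730286299464,
       0.451400626940937546, 0.492131362215352252e-1,
       0.158489557252300951e-2]

def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... with Horner's scheme."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def log10_1p_rational(x):
    """Rational approximation of log10(1+x), intended for 0 <= x <= 1."""
    return horner(NUM, x) * x / horner(DEN, x)

def max_abs_error(samples=1001):
    """Largest deviation from math.log10 on evenly spaced points in [0, 1]."""
    worst = 0.0
    for i in range(samples):
        x = i / (samples - 1)
        worst = max(worst, abs(log10_1p_rational(x) - math.log10(1.0 + x)))
    return worst

print(max_abs_error())
```

The whole evaluation costs twelve multiply-add steps, one extra multiply and a single division, which is the part that matters for a speed comparison.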

guga

Quote from: jack on August 17, 2020, 03:18:10 AM
From what I gather, you use tables to calculate the logarithm. Just for fun, I used Maple to compute a rational approximation to ln(1+x); multiplying that by log10(e) gives an approximation to log10(1+x), with x between 0 and 1.

ln(1+x) = (1+(2.45432048794495419+(2.17440739254242255+(.841813647259988564+(.135819954481562186+(0.687077985071464530e-2+0.231261455529019880e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)

log10(1+x) = (.434294481903251828+(1.06589784473659010+(.944333131990812123+(.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)


Great, jack. Thanks a lot.

Some questions:

1 - What is Maple? (Where can I download it?)
2 - Did you test the accuracy? What is the precision of the values you input, and can it handle denormalized values too?
3 - Can Maple produce these polynomial values without division? Division is slow.

Siekmanski

Hi Guga,

You can always avoid divisions when using constants.
Make a reciprocal of the constant and you can multiply instead of divide.

Value    real4 256.0
recValue real4 0.00390625        ; 1/256

divss xmm0,real4 ptr Value       ; either divide by the constant ...
mulss xmm0,real4 ptr recValue    ; ... or multiply by its reciprocal

both have the same result.
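A quick sketch of the reciprocal trick, using Python doubles rather than real4 (an assumption made only to keep the example short). For 256 the reciprocal is an exact power of two, so both forms agree bit for bit; for a divisor such as 49 the reciprocal is rounded, so the two forms may differ in the last bit for some inputs.

```python
# Sample inputs; the step 0.37 is arbitrary
samples = [i * 0.37 for i in range(1, 10001)]

# Divisor 256: 0.00390625 is exactly 2**-8, so nothing is lost
mismatch_256 = sum(1 for x in samples if x / 256.0 != x * 0.00390625)

# Divisor 49: 1/49 is not exactly representable, so results can disagree
recip_49 = 1.0 / 49.0
mismatch_49 = sum(1 for x in samples if x / 49.0 != x * recip_49)

print("mismatches for 256:", mismatch_256)
print("mismatches for 49: ", mismatch_49)
```

When the last bit matters, it is worth checking the specific constant before swapping a division for a multiplication.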
Creative coders use backward thinking techniques as a strategy.

jack

1- maple is a Computer Algebra System https://www.maplesoft.com/ns/maple/cas/computer-algebra-systems-math-education.aspx
2- the input range for x is between 0 and 1. The error is probably +/-1e-16; it depends on the precision used in the evaluation. As for denormalized values: this is just a rational polynomial approximation, unrelated to floating-point intrinsics
3- yes, but a rational polynomial usually requires fewer terms than a plain polynomial to achieve the same precision; it would take a polynomial of degree 19 or 20 to match it

mineiro

Quote from: Siekmanski on August 17, 2020, 05:45:59 AM
Make a reciprocal of the constant and you can multiply instead of divide.
Multiply is more efficient than divide  :thumbsup:

Integer division using reciprocals -- Robert Alverson
https://www.computer.org/csdl/proceedings-article/arith/1991/00145558/12OmNyaXPS1
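A minimal sketch of the same idea for integers, assuming unsigned 32-bit inputs and a divisor of 10. The constant and shift are the widely used pair for this case and are not taken from the paper linked above; the function name is invented for the example.

```python
# Multiply-and-shift replacement for x // 10 on unsigned 32-bit values.
# MAGIC is 2**35 / 10 rounded up; the product needs 64 bits in real code
# (for example the EDX:EAX pair produced by a 32-bit mul).
MAGIC = 0xCCCCCCCD
SHIFT = 35

def div10(x):
    """Return x // 10 for 0 <= x < 2**32 without a division."""
    return (x * MAGIC) >> SHIFT

for value in (0, 9, 10, 99, 12345, 2**31, 2**32 - 1):
    print(value, div10(value), value // 10)
```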
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

jj2007

Quote from: HSE on August 16, 2020, 07:47:28 AM
:thumbsup:

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

5396    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11737   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3163    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
5060    cycles for 100 * Log10

3211    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4933    cycles for 100 * Log10

3209    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4823    cycles for 100 * Log10

3186    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4773    cycles for 100 * Log10

3219    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4739    cycles for 100 * Log10

536     bytes for Sse2_log10_precise (Guga SSE2 Log10 precise )
16      bytes for Log10

0.699076046207356 for Sse2_log10_precise (Guga SSE2 Log10 precise )
0.698970004336019 for Log10
0.6989700043360188 expected

HSE

With Guga's latest correction:
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

5257    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11751   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11687   cycles for 100 * JJ Log10

5251    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11741   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11637   cycles for 100 * JJ Log10

5254    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11758   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11664   cycles for 100 * JJ Log10

5248    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11760   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11641   cycles for 100 * JJ Log10

5274    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11735   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11653   cycles for 100 * JJ Log10

0.698970004336019 for Sse2_log10_precise (Guga SSE2 Log10 precise )
0.698970004336019 for SmplMath:  fSlv MyReal8 = log(MyExpo)
0.698970004336019 for JJ Log10

jj2007

MB Log10 under the hood:
fldlg2
fld MyExpo
fyl2x
fstp MyReal8
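The four instructions above rely on the identity log10(x) = log10(2) * log2(x): fldlg2 loads log10(2) and fyl2x multiplies it by log2 of the value on top of the FPU stack. A small sketch of that identity in Python doubles follows (the x87 unit works in 80-bit extended precision, so the last digits can differ); the function name is invented for the example.

```python
import math

def log10_via_fyl2x(x):
    """Mimic fldlg2 / fld x / fyl2x: y * log2(x) with y = log10(2)."""
    y = math.log10(2.0)       # what fldlg2 pushes
    return y * math.log2(x)   # what fyl2x computes

print(log10_via_fyl2x(5.0))
print(math.log10(5.0))
```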

Mikl__


guga

Hi Marinus

With the equation provided by Jack we can't use a reciprocal, unfortunately. The divisor is another polynomial.

Jack, I tried Log10(5+1) with the formula, and the result is accurate to only about 5 digits after the ".". With x = 5, log10(x+1) = log10(6) works out as follows:

(0.434294481903251828+(1.0658978447365901+(0.944333131990812123+(0.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x, x=5
= 607.096994200488817

(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(0.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x) , x=5
= 780.177558728198735

607.096994200488817/780.177558728198735 = 0.778152341615854664455814528055710104487791345037987518732

Expected log10(6) = 0.7781512503836436325087667979796083359683187456528044061402931014.

0.778152341615854664455814528055710104487791345037987518732 ;  result using maple
0.7781512503836436325087667979796083359683187456528044061402931014 ; expected result
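One possible reason for the gap: jack stated that the formula is meant for x between 0 and 1, while x = 5 lies outside that range. A hedged sketch of the usual workaround follows, assuming a positive input that is split as m * 2^e with 1 <= m < 2, so that log10(value) = e * log10(2) + log10(1 + (m - 1)). The helper names are invented for the example; the coefficients are the ones posted by jack.

```python
import math

# jack's log10(1+x) coefficients, lowest power first
NUM = [0.434294481903251828, 1.06589784473659010, 0.944333131990812123,
       0.365595021795863521, 0.589858567636932956e-1,
       0.298394177553741882e-2, 0.100435574013167601e-4]
DEN = [1.0, 2.95432048794495173, 3.31823430318176310, 1.76615730286299464,
       0.451400626940937546, 0.492131362215352252e-1,
       0.158489557252300951e-2]
LOG10_2 = math.log10(2.0)

def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... with Horner's scheme."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def log10_1p_rational(x):
    """jack's rational approximation of log10(1+x), meant for 0 <= x <= 1."""
    return horner(NUM, x) * x / horner(DEN, x)

def log10_reduced(value):
    """log10 of a positive value via range reduction to 1 <= m < 2."""
    mantissa, exponent = math.frexp(value)  # value = mantissa * 2**exponent, 0.5 <= mantissa < 1
    m = mantissa * 2.0                      # move the mantissa into [1, 2)
    e = exponent - 1
    return e * LOG10_2 + log10_1p_rational(m - 1.0)

direct = log10_1p_rational(5.0)   # x = 5, outside the intended range
reduced = log10_reduced(6.0)      # 6 = 1.5 * 2**2, so x = 0.5
print(direct, reduced, math.log10(6.0))
```

In assembly the split into exponent and mantissa can be done by masking the bits of the double, so the reduction itself should add little cost.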



Can you please check whether the precision can be extended to at least 14 digits after the ".", while keeping the number of polynomial terms in "x" the same (or, better, reducing it)? The formula is a ratio in which the numerator has the form x^7+x^6+... and the divisor x^6+x^5..., so it can be reformulated as one polynomial divided by another,

where A, B, C... are the values you posted and "x" is the input value to calculate. We could put the numerator in one matrix and the divisor in another, and divide by multiplying with the inverse of the divisor's matrix (the one with x^6+...), but computing the inverse matrix, and then checking whether the division is possible at all, would also take a lot of processing time.

Please see whether the numbers you created with Maple can be extended to at least 14 digits of precision (after the "."), and also try simplifying the equation so we can see if it's faster than the one I made.