Author Topic: Fast Log10 approximation  (Read 15225 times)

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: Fast Log10 approximation
« Reply #15 on: August 16, 2020, 10:45:44 AM »
Quote
It's the routine with the mov.
The code is above; just change line 345.

Please show the line in context, I can't find it.

guga

  • Member
  • *****
  • Posts: 1467
  • Assembly is a state of art.
    • RosAsm
Re: Fast Log10 approximation
« Reply #16 on: August 16, 2020, 10:57:53 AM »
Guys, the syntax of table B is wrong. How do I make MASM calculate the values in parentheses in the data section?

Log10_Table_B dq (227328/524288),  (227328/524288), <----- this is causing MASM to assemble zeros (as if it were db 000000000) rather than the value of each division
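The zeros are what integer arithmetic produces: an assembler that evaluates constant expressions with integers only turns 227328/524288 into 0 before the value is stored. A minimal sketch in C rather than MASM, assuming C's integer division behaves like the assembler's constant-expression division here; the function names are made up for illustration:

```c
#include <stdint.h>

/* Integer division, as an integer-only constant expression would do it:
   227328 / 524288 truncates to 0. */
int64_t table_entry_as_integer(void)
{
    return (int64_t)227328 / (int64_t)524288;
}

/* The same ratio computed in floating point: 227328 / 2^19 = 0.43359375,
   which is exactly representable as a double. */
double table_entry_as_real(void)
{
    return 227328.0 / 524288.0;
}
```

Writing the table with precomputed real literals (0.43359375 for this entry) sidesteps the problem, which matches the fix described in Reply #17.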
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

  • Member
  • *****
  • Posts: 1467
  • Assembly is a state of art.
    • RosAsm
Re: Fast Log10 approximation
« Reply #17 on: August 16, 2020, 01:42:48 PM »
OK, guys... now it is working as expected and the result is correct.

All I did was replace the expressions in parentheses with their precomputed values, and the routine returned the correct answer.

Update attached. The source is the same as before, except for the fix to the values in parentheses. I updated the 1st post too.

Result is:
Code: [Select]
AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

2786    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11317   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26628   cycles for 100 * pow (CRT, 2.7182818^5)

2662    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11268   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27312   cycles for 100 * pow (CRT, 2.7182818^5)

2686    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11511   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26168   cycles for 100 * pow (CRT, 2.7182818^5)

2832    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
12049   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
27194   cycles for 100 * pow (CRT, 2.7182818^5)

2781    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
11254   cycles for 100 * ExpXY (MasmBasic, 2.7182818^5)
26377   cycles for 100 * pow (CRT, 2.7182818^5)

0.698970004336019 for Sse2_log10_precise (Guga SSE2 Log10 precise of 5)
148.413159102577 for ExpXY (MasmBasic, 2.7182818^5)
148.413159102577 for pow (CRT, 2.7182818^5)

--- ok ---

TimoVJL

  • Member
  • *****
  • Posts: 1320
Re: Fast Log10 approximation
« Reply #18 on: August 16, 2020, 05:18:16 PM »
Code: [Select]
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

26512   cycles for 10000 * movlps
26975   cycles for 10000 * movlpd
62428   cycles for 10000 * movdqu

26448   cycles for 10000 * movlps
26523   cycles for 10000 * movlpd
62479   cycles for 10000 * movdqu

26537   cycles for 10000 * movlps
26812   cycles for 10000 * movlpd
62242   cycles for 10000 * movdqu

26604   cycles for 10000 * movlps
26673   cycles for 10000 * movlpd
62401   cycles for 10000 * movdqu

26578   cycles for 10000 * movlps
26707   cycles for 10000 * movlpd
62530   cycles for 10000 * movdqu

28      bytes for movlps
32      bytes for movlpd
32      bytes for movdqu

R8      1234567890.123456716
R8      1234567890.123456716
R8      1234567890.123456716

-
May the source be with you

HSE

  • Member
  • *****
  • Posts: 2502
  • AMD 7-32 / i3 10-64
Re: Fast Log10 approximation
« Reply #19 on: August 17, 2020, 02:53:55 AM »
Quote from: jj2007 on August 16, 2020, 10:45:44 AM
Please show the line in context, I can't find it.

 :biggrin: Are you having problems with the TestBed?
 (clarification: the TestBed program was written by jj2007)

Code: [Select]
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

Only SSE3 here! Perhaps that is why.


Quote from: guga on August 16, 2020, 10:57:53 AM
Guys, the syntax of table B is wrong. How do I make MASM calculate the values in parentheses in the data section?

Log10_Table_B dq (227328/524288),  (227328/524288), <----- this is causing MASM to assemble zeros (as if it were db 000000000) rather than the value of each division

You can use qWord's floating-point arithmetic at assembly time: MREAL-macros


Equations in Assembly: SmplMath

jack

  • Member
  • **
  • Posts: 231
Re: Fast Log10 approximation
« Reply #20 on: August 17, 2020, 03:18:10 AM »
From what I gather, you use tables to calculate the logarithm. Just for fun, I used Maple to compute a rational approximation to ln(1+x); multiplying it by log10(e) gives an approximation to log10(1+x), with x between 0 and 1.
Code: [Select]
ln(1+x) = (1+(2.45432048794495419+(2.17440739254242255+(.841813647259988564+(.135819954481562186+(0.687077985071464530e-2+0.231261455529019880e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)

log10(1+x) = (.434294481903251828+(1.06589784473659010+(.944333131990812123+(.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x/(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x)
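A sketch of how the log10(1+x) formula above can be evaluated, in C rather than assembly. The coefficients are copied from the post; the names `log10_1p_rational`, `NUM` and `DEN` are hypothetical, and the input is assumed to lie in the stated range 0 <= x <= 1:

```c
/* Numerator coefficients of log10(1+x), lowest power first.
   The numerator is x * (NUM[0] + NUM[1]*x + ... + NUM[6]*x^6). */
static const double NUM[7] = {
    0.434294481903251828, 1.06589784473659010, 0.944333131990812123,
    0.365595021795863521, 0.589858567636932956e-1,
    0.298394177553741882e-2, 0.100435574013167601e-4
};

/* Denominator coefficients: DEN[0] + DEN[1]*x + ... + DEN[6]*x^6. */
static const double DEN[7] = {
    1.0, 2.95432048794495173, 3.31823430318176310, 1.76615730286299464,
    0.451400626940937546, 0.492131362215352252e-1, 0.158489557252300951e-2
};

/* Evaluate the rational approximation with Horner's scheme:
   12 multiply-adds, one extra multiply and a single division. */
double log10_1p_rational(double x)
{
    double p = NUM[6], q = DEN[6];
    for (int i = 5; i >= 0; --i) {
        p = p * x + NUM[i];
        q = q * x + DEN[i];
    }
    return p * x / q;
}
```

Horner's form keeps the operation count low, but the one division at the end remains; it cannot be replaced by a precomputed reciprocal because the denominator depends on x.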

guga

  • Member
  • *****
  • Posts: 1467
  • Assembly is a state of art.
    • RosAsm
Re: Fast Log10 approximation
« Reply #21 on: August 17, 2020, 04:54:55 AM »
Great, jack. Thanks a lot.

Some questions:

1 - What is Maple? (Where can I download it?)
2 - Did you test the accuracy? What is the precision of the values you entered, and can it handle denormalized values too?
3 - Can Maple produce these polynomial values without a division? Division is slow.

Siekmanski

  • Member
  • *****
  • Posts: 2725
Re: Fast Log10 approximation
« Reply #22 on: August 17, 2020, 05:45:59 AM »
Hi Guga,

You can always avoid divisions when the divisor is a constant.
Take the reciprocal of the constant and multiply by it instead of dividing.

Value    real4 256.0
recValue real4 0.00390625   ; 1/256

 divss xmm0,real4 ptr Value
 mulss xmm0,real4 ptr recValue

Both give the same result here, because 256 is a power of two and its reciprocal is exact; for other constants the last bit can differ.
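A small C sketch of the same idea, with one assumption spelled out: the product matches the quotient bit for bit when the reciprocal is exactly representable (as with 256), but not necessarily otherwise. The helper `count_mismatches` is made up for this illustration:

```c
/* Count the integers x in 1..n for which x / d and x * (1/d) differ
   when both are rounded to single precision.
   volatile forces every intermediate to be stored as a float. */
int count_mismatches(float d, int n)
{
    volatile float r = 1.0f / d;      /* reciprocal, rounded to float */
    int mismatches = 0;
    for (int i = 1; i <= n; ++i) {
        volatile float x = (float)i;
        volatile float q = x / d;     /* what divss computes */
        volatile float p = x * r;     /* what mulss computes */
        if (q != p)
            ++mismatches;
    }
    return mismatches;
}
```

For a divisor such as 3.0 the two can differ in the last bit (x = 5 is one such input), which matters only if bit-exact agreement with a true division is required.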
Creative coders use backward thinking techniques as a strategy.

jack

  • Member
  • **
  • Posts: 231
Re: Fast Log10 approximation
« Reply #23 on: August 17, 2020, 05:53:26 AM »
1 - Maple is a computer algebra system: https://www.maplesoft.com/ns/maple/cas/computer-algebra-systems-math-education.aspx
2 - The input range for x is 0 to 1. The error is probably about +/-1e-16, depending on the precision used in the evaluation. As for denormalized values: this is just a rational polynomial approximation, unrelated to floating-point intrinsics.
3 - Yes, but a rational polynomial usually needs fewer terms than a plain polynomial for the same precision; a plain polynomial would need degree 19 or 20 to match it.

mineiro

  • Member
  • ****
  • Posts: 958
Re: Fast Log10 approximation
« Reply #24 on: August 17, 2020, 06:13:45 AM »
Quote from: Siekmanski on August 17, 2020, 05:45:59 AM
Make a reciprocal of the constant and you can multiply instead of divide.

Multiplying is more efficient than dividing  :thumbsup:

Integer division using reciprocals -- Robert Alverson
https://www.computer.org/csdl/proceedings-article/arith/1991/00145558/12OmNyaXPS1
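The integer counterpart of the reciprocal trick, sketched in C under the assumption of unsigned 32-bit operands and a divisor fixed at 10: multiply by a scaled reciprocal and shift. This is an illustration in the spirit of the technique, not code from the paper. The constant 0xCCCCCCCD is ceil(2^35 / 10); the function name is hypothetical:

```c
#include <stdint.h>

/* x / 10 for any 32-bit unsigned x, without a divide instruction:
   widen to 64 bits, multiply by ceil(2^35 / 10), keep the top bits. */
uint32_t div10_by_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

Optimizing compilers commonly generate this kind of multiply-and-shift sequence themselves when the divisor is a compile-time constant.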
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: Fast Log10 approximation
« Reply #25 on: August 17, 2020, 10:20:09 AM »
:thumbsup:

Code: [Select]
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

5396    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11737   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3163    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
5060    cycles for 100 * Log10

3211    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4933    cycles for 100 * Log10

3209    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4823    cycles for 100 * Log10

3186    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4773    cycles for 100 * Log10

3219    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
4739    cycles for 100 * Log10

536     bytes for Sse2_log10_precise (Guga SSE2 Log10 precise )
16      bytes for Log10

0.699076046207356 for Sse2_log10_precise (Guga SSE2 Log10 precise )
0.698970004336019 for Log10
0.6989700043360188 expected

HSE

  • Member
  • *****
  • Posts: 2502
  • AMD 7-32 / i3 10-64
Re: Fast Log10 approximation
« Reply #26 on: August 17, 2020, 10:36:07 AM »
With Guga's latest correction:
Code: [Select]
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

5257    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11751   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11687   cycles for 100 * JJ Log10

5251    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11741   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11637   cycles for 100 * JJ Log10

5254    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11758   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11664   cycles for 100 * JJ Log10

5248    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11760   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11641   cycles for 100 * JJ Log10

5274    cycles for 100 * Sse2_log10_precise (Guga SSE2 Log10 precise )
11735   cycles for 100 * SmplMath:  fSlv MyReal8 = log(MyExpo)
11653   cycles for 100 * JJ Log10

0.698970004336019 for Sse2_log10_precise (Guga SSE2 Log10 precise )
0.698970004336019 for SmplMath:  fSlv MyReal8 = log(MyExpo)
0.698970004336019 for JJ Log10

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: Fast Log10 approximation
« Reply #27 on: August 17, 2020, 10:54:07 AM »
MB Log10 under the hood:
Code: [Select]
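fldlg2          ; ST(0) = log10(2)
fld MyExpo      ; ST(0) = x, ST(1) = log10(2)
fyl2x           ; ST(0) = ST(1) * log2(ST(0)) = log10(x), pops once
fstp MyReal8    ; store the result and pop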

Mikl__

  • Member
  • *****
  • Posts: 1346
Re: Fast Log10 approximation
« Reply #28 on: August 17, 2020, 01:00:08 PM »
[delete]

guga

  • Member
  • *****
  • Posts: 1467
  • Assembly is a state of art.
    • RosAsm
Re: Fast Log10 approximation
« Reply #29 on: August 17, 2020, 05:14:44 PM »
Hi Marinus,

With the equation provided by jack we can't use a reciprocal, unfortunately: the divisor is another polynomial, not a constant.

Jack, I tried the formula on log10(5+1), and the result is accurate to only about 5 digits after the ".". With x = 5, log10(x+1) = log10(6) works out as follows:

(0.434294481903251828+(1.0658978447365901+(0.944333131990812123+(0.365595021795863521+(0.589858567636932956e-1+(0.298394177553741882e-2+0.100435574013167601e-4*x)*x)*x)*x)*x)*x)*x, x=5
= 607.096994200488817

(1+(2.95432048794495173+(3.31823430318176310+(1.76615730286299464+(0.451400626940937546+(0.492131362215352252e-1+0.158489557252300951e-2*x)*x)*x)*x)*x)*x) , x=5
= 780.177558728198735

 607.096994200488817/780.177558728198735 = 0.778152341615854664455814528055710104487791345037987518732

Expected log10(6) = 0.7781512503836436325087667979796083359683187456528044061402931014.

0.778152341615854664455814528055710104487791345037987518732 ;  result using maple
0.7781512503836436325087667979796083359683187456528044061402931014 ; expected result
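
One likely reason for the gap: x = 5 lies outside the interval for which the formula was stated to hold (jack gives the range as x between 0 and 1). A hedged sketch in C of reducing the argument first, using the identity log10(m * 2^k) = log10(m) + k*log10(2); `frexp` is the standard C library function, while `log10_1p_rational`, `log10_reduced`, `NUM` and `DEN` are hypothetical names:

```c
#include <math.h>

/* Coefficients copied from jack's log10(1+x) formula, lowest power first. */
static const double NUM[7] = {
    0.434294481903251828, 1.06589784473659010, 0.944333131990812123,
    0.365595021795863521, 0.589858567636932956e-1,
    0.298394177553741882e-2, 0.100435574013167601e-4
};
static const double DEN[7] = {
    1.0, 2.95432048794495173, 3.31823430318176310, 1.76615730286299464,
    0.451400626940937546, 0.492131362215352252e-1, 0.158489557252300951e-2
};

/* Rational approximation of log10(1+x), intended for 0 <= x <= 1. */
double log10_1p_rational(double x)
{
    double p = NUM[6], q = DEN[6];
    for (int i = 5; i >= 0; --i) {
        p = p * x + NUM[i];
        q = q * x + DEN[i];
    }
    return p * x / q;
}

/* log10(v) for finite v > 0: split v = m * 2^k with 1 <= m < 2,
   then log10(v) = log10(1 + (m - 1)) + k * log10(2). */
double log10_reduced(double v)
{
    int k;
    double m = frexp(v, &k);   /* v = m * 2^k with 0.5 <= m < 1 */
    m *= 2.0;                  /* rescale so that 1 <= m < 2 */
    k -= 1;
    return log10_1p_rational(m - 1.0) + k * 0.301029995663981195;
}
```

With this reduction log10(6) is computed as log10(1.5) + 2*log10(2), so the rational function is only evaluated at x = 0.5, inside its intended range.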



Can you please check whether the precision can be extended to at least 14 digits after the ".", while keeping the same number of polynomial terms in x (or, better, fewer)? The formula is a ratio in which the numerator is a polynomial of the form x^7+x^6+... and the divisor one of the form x^6+x^5+... It can be reformulated as a ratio of two polynomials with coefficients A, B, C..., where A, B, C... are the values you posted and x is the input value. We could try to put the numerator in one matrix and the divisor in another, and divide by multiplying with the inverse of the divisor matrix (the polynomial with x^6+...), but calculating the inverse matrix, and checking afterwards that the division is possible, will take a lot of processing time too.

Please see if the numbers you created with Maple can be extended to at least 14 digits of precision (after the "."), and also try simplifying the equation so we can see whether it's faster than the one I made.
« Last Edit: August 17, 2020, 07:58:17 PM by guga »