Author Topic: ArcSin timings  (Read 4758 times)

jj2007

  • Member
  • *****
  • Posts: 11551
  • Assembler is fun ;-)
    • MasmBasic
Re: ArcSin timings
« Reply #15 on: December 22, 2020, 06:25:58 AM »
:thumbsup: Was missing MbProHeap initialization:
Code: [Select]
  ifdef MbBufferInit
    call MbBufferInit
  endif

Did you call ArcSinInit before ShowCpu? Normally the Init macro takes care of all that, but my timings template keeps the option open to run it without MasmBasic, so I chose the ifdef MbBufferInit instead.

HSE

  • Member
  • *****
  • Posts: 1744
  • <AMD>< 7-32>
Re: ArcSin timings
« Reply #16 on: December 23, 2020, 01:24:33 AM »
Quote from: jj2007
Did you call ArcSinInit before ShowCpu? Normally the Init macro takes care of all that, but my timings template keeps the option open to run it without MasmBasic, so I chose the ifdef MbBufferInit instead.

Essentially it is the code you posted. The MasmBasic book is not so clear  :biggrin:

BTW, the FastMath name sounds nice but is a little misleading. Perhaps MathLUT or something like that, because it is a lookup table creation macro, not a fast calculation.  :thumbsup:

jj2007

Re: ArcSin timings
« Reply #17 on: December 23, 2020, 01:35:06 AM »
Quote from: HSE
BTW, the FastMath name sounds nice but is a little misleading. Perhaps MathLUT or something like that, because it is a lookup table creation macro, not a fast calculation.  :thumbsup:

If it gives you a factor 8-12 faster math, then the name is not that important ;-)

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: ArcSin timings
« Reply #18 on: December 23, 2020, 02:35:41 AM »
Originally made for Guga's color conversions.
SIMD Taylor version:
Code: [Select]
I love SIMD
0.234714
0.61685
0.380504
0.234714
0.144784
2!,4!,6!,8!
0.5
0.0416667
0.00138889
2.48016e-05
cosine result :0.707426
sine result :0.707107
arcsine result:0.900242
times x:1000000
clock cycles :55624547
cycles/loop :55
SIMD fan and macro fan
why assembly is fastest is because its switch has no (brakes) breaks
:P
only in 16bit assembly you can get away with "Only words" :P

HSE

Re: ArcSin timings
« Reply #19 on: December 23, 2020, 03:03:25 AM »
Quote from: jj2007
If it gives you a factor 8-12 faster math

Not really.

I'm thinking of these equations (without accounting for precision):
 
  •   Time_of_building_per_access = Time_to_build_table/Number_of_accesses_per_program
  •   FactorJJ = Time_of_calculation/(Time_of_access  + Time_of_building_per_access)
 

The indifference point is at around 25250 replaced calculations. At that point the factor is 1.
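The two equations above can be turned into a small break-even calculation. This is only a sketch in Python: the function names are made up for illustration, and the timings are hypothetical values picked to keep the arithmetic simple, not measurements from this thread.

```python
def build_cost_per_access(t_build, n_accesses):
    """Table build time spread over the number of table accesses."""
    return t_build / n_accesses


def speed_factor(t_calc, t_access, t_build, n_accesses):
    """Direct calculation time divided by the effective time of one table access."""
    return t_calc / (t_access + build_cost_per_access(t_build, n_accesses))


def break_even_accesses(t_calc, t_access, t_build):
    """Number of accesses at which the factor is exactly 1 (the indifference point)."""
    if t_calc <= t_access:
        raise ValueError("the table never pays off if a lookup is not faster")
    return t_build / (t_calc - t_access)


# Hypothetical timings in nanoseconds (assumptions, not measured values)
T_BUILD, T_CALC, T_ACCESS = 1_000_000.0, 50.0, 10.0

n0 = break_even_accesses(T_CALC, T_ACCESS, T_BUILD)
print("indifference point:", n0)
print("factor at that point:", speed_factor(T_CALC, T_ACCESS, T_BUILD, n0))
print("factor at twice that point:", speed_factor(T_CALC, T_ACCESS, T_BUILD, 2 * n0))
```

With a very large number of accesses the factor approaches Time_of_calculation/Time_of_access, which is the best case a table can reach.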

jj2007

Re: ArcSin timings
« Reply #20 on: December 23, 2020, 04:53:18 AM »
It takes one millisecond to build the table, Héctor. The more complex the mathematical function is, the more you can gain with FastMath :cool:

HSE

Re: ArcSin timings
« Reply #21 on: December 23, 2020, 05:26:12 AM »
Quote from: jj2007
The more complex the mathematical function is, the more you can gain with FastMath :cool:

Only if you are calling the table beyond the indifference point  :biggrin:

daydreamer

Re: ArcSin timings
« Reply #22 on: December 23, 2020, 05:55:39 AM »
Quote from: jj2007
It takes one millisecond to build the table, Héctor. The more complex the mathematical function is, the more you can gain with FastMath :cool:

But if you build the tables in a worker thread while the Windows main thread creates and loads lots of things at startup, you wouldn't notice the milliseconds it takes to make one or several tables.
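The worker-thread idea can be sketched as follows. This is Python rather than assembly, the function and variable names are hypothetical, and the "startup work" is only a stand-in; the point is the pattern of starting the build early and joining before the first lookup.

```python
import math
import threading


def build_arcsin_table(n_cells, out):
    """Append arcsin(x) in degrees for x = 0, 1/n_cells, ..., 1 to the list out."""
    for i in range(n_cells + 1):
        out.append(math.degrees(math.asin(i / n_cells)))


table = []
worker = threading.Thread(target=build_arcsin_table, args=(1000, table))
worker.start()                      # the table is built in the background

# Stand-in for whatever the main thread does at startup (hypothetical)
startup_steps = [name.upper() for name in ("create window", "load resources")]

worker.join()                       # make sure the table is complete before the first lookup
print(len(table), round(table[500], 6))
```

In CPython the two threads do not actually compute in parallel, so this only shows the structure; with native threads the build would overlap with the startup work.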

HSE

Re: ArcSin timings
« Reply #23 on: December 23, 2020, 06:40:50 AM »
Quote from: daydreamer
But if you build the tables in a worker thread while the Windows main thread creates and loads lots of things at startup, you wouldn't notice the milliseconds it takes to make one or several tables.

Yes, you need even more time to prepare for the game :biggrin:

Tables like this are very useful in games because you don't need precision.

Quote from: HSE
Only if you are calling the table beyond the indifference point  :biggrin:

Indeed, you only see some profit after 50504 table accesses (almost double the indifference point).


jj2007

Re: ArcSin timings
« Reply #24 on: December 23, 2020, 08:46:08 AM »
Quote from: HSE
Indeed, you only see some profit after 50504 table accesses (almost double the indifference point).

Ok, put Step 0.5 in ArcSinInit and do your optimisation again :badgrin:

HSE

Re: ArcSin timings
« Reply #25 on: December 23, 2020, 09:03:07 AM »
Quote from: jj2007
Ok, put Step 0.02 in ArcSinInit and do your optimisation again :badgrin:

If you don't need any precision you could just put any number as a solution; that could be even faster  :biggrin:

jj2007

Re: ArcSin timings
« Reply #26 on: December 23, 2020, 09:42:01 AM »
If you don't need precision, check if it's still good enough with Step 0.5:
Code: [Select]
ArcSinInit:
  NanoTimer()                           ; start timing
  FastMath ArcSin                       ; define a math function
    For_ fct=0.0 To 1.0 Step 0.1
      fld fct
      fstp REAL10 ptr [edi]             ; store x
      void Arcsinus(fct)                ; exact value, left in ST(0)
      fstp REAL10 ptr [edi+REAL10]      ; store y
      add edi, 2*REAL10                 ; advance to the next x/y pair
    Next
  FastMath                              ; end of the definition
  PrintLine NanoTimer$(), " for initialising the FastMath macro"
  For_ fct=0.05 To 1.0 Step 0.1         ; compare the exact value with the estimate
    PrintLine Str$("%3f\t", fct), Str$("%9f\t", Arcsinus(fct)v), Str$("%9f", ArcSin(fct)v)
  Next
  retn

Internally, FastMath uses SetPoly3. It's a fairly sophisticated LUT :cool:
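For readers who want to see why a polynomial-based table is more than a plain nearest-value lookup, here is a minimal Python sketch. It assumes that a three-point polynomial means a parabola through three neighbouring table points; that is an assumption made for illustration and is not taken from the MasmBasic source. All names are hypothetical.

```python
import math


def arcsin_deg(x):
    """Exact reference value: arcsine in degrees."""
    return math.degrees(math.asin(x))


def make_table(f, x0, x1, step):
    """Sample f at evenly spaced nodes from x0 to x1."""
    n = round((x1 - x0) / step)
    xs = [x0 + (x1 - x0) * i / n for i in range(n + 1)]
    return xs, [f(x) for x in xs]


def lookup_quadratic(xs, ys, x):
    """Evaluate the parabola through the three table points around x (Lagrange form)."""
    step = xs[1] - xs[0]
    i = round((x - xs[0]) / step)               # nearest node
    i = max(1, min(len(xs) - 2, i))             # keep i-1 and i+1 inside the table
    a, b, c = xs[i - 1], xs[i], xs[i + 1]
    return (ys[i - 1] * (x - b) * (x - c) / ((a - b) * (a - c))
            + ys[i] * (x - a) * (x - c) / ((b - a) * (b - c))
            + ys[i + 1] * (x - a) * (x - b) / ((c - a) * (c - b)))


xs, ys = make_table(arcsin_deg, 0.0, 1.0, 0.001)
for x in (0.0505, 0.4505, 0.9505):
    print(f"{x:.4f}\t{arcsin_deg(x):.7f}\t{lookup_quadratic(xs, ys, x):.7f}")
```

The probe values sit between table nodes on purpose, because a lookup exactly on a node would show no interpolation error at all.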

HSE

Re: ArcSin timings
« Reply #27 on: December 24, 2020, 01:14:09 AM »
Hi JJ!

With step = 0.1 you can have an absolute error = 5.6 x 10^0 (the relative error is 9.6% at that point). That is very big, especially compared with the usual default: absolute error = 1.0 x 10^-6

So first you have to find the step size for the precision you need, and second you have to choose between direct calculation and a table, depending on the number of accesses to the solution.

Regards. HSE
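The first of those two steps, finding a step size for a given precision, could look like the sketch below. It is Python, uses plain linear interpolation as a stand-in (not what FastMath does internally), and only probes up to x = 0.95 because the arcsine gets very steep near 1; the names, the probe count and the tolerance are all assumptions.

```python
import math


def arcsin_deg(x):
    """Exact reference value: arcsine in degrees."""
    return math.degrees(math.asin(x))


def max_abs_error(n_cells, x_max=0.95, probes=2000):
    """Largest probed error of a linearly interpolated table with n_cells cells on [0, 1]."""
    ys = [arcsin_deg(i / n_cells) for i in range(n_cells + 1)]
    worst = 0.0
    for k in range(probes + 1):
        x = x_max * k / probes
        pos = x * n_cells
        i = min(int(pos), n_cells - 1)          # cell that contains x
        frac = pos - i                          # position inside the cell, 0..1
        approx = ys[i] + frac * (ys[i + 1] - ys[i])
        worst = max(worst, abs(approx - arcsin_deg(x)))
    return worst


def cells_for_tolerance(tolerance, n_cells=10, limit=1 << 20):
    """Double the number of cells (halve the step) until the probed error meets the tolerance."""
    while max_abs_error(n_cells) > tolerance:
        n_cells *= 2
        if n_cells > limit:
            raise ValueError("tolerance not reached within the cell limit")
    return n_cells


n = cells_for_tolerance(1e-3)                   # hypothetical tolerance in degrees
print("cells:", n, "step:", 1.0 / n, "max probed error:", max_abs_error(n))
```

The probed error is only an estimate of the true maximum, so a real application would leave some margin before fixing the step size.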

jj2007

Re: ArcSin timings
« Reply #28 on: December 24, 2020, 03:15:19 AM »
Right, 9.6% is quite big. Use 0.01 instead, the table gets created in less than 100 microseconds, and the error is 1% max.

If that is not precise enough, go for 0.001:
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
202 µs for initialising the FastMath macro
0.0500  2.86598398      2.86598398
0.150   8.62692656      8.62692656
0.250   14.4775122      14.4775122
0.350   20.4873151      20.4873151
0.450   26.7436840      26.7436840
0.550   33.3670130      33.3670130
0.650   40.5416019      40.5416019
0.750   48.5903779      48.5903789
0.850   58.2116694      58.2116729
0.950   71.8051277      71.8051861

202 microseconds, or 0.2 milliseconds. Given that some of my "professional" software needs minutes to start, I am starting to wonder what the motivation behind your critical comments is, Héctor :badgrin:

HSE

Re: ArcSin timings
« Reply #29 on: December 24, 2020, 04:07:18 AM »
Quote from: jj2007
I am starting to wonder what the motivation behind your critical comments is, Héctor :badgrin:

Just beating what I can. It's the laboratory:
Quote
Algorithm and code design research laboratory. This is the place to post assembler algorithms and code design for discussion, optimisation and any other improvements that can be made on it. Post code here to be beaten to death to make it better, smaller, faster or more powerful. Feel free to explain the optimisation methods used so that everyone can get a feel for the code design.


Tables are interesting tools, especially since your macro makes them so easy to create, but how to build them and when to use them deserve some consideration.