The MASM Forum

General => The Laboratory => Topic started by: jj2007 on December 21, 2020, 11:16:57 AM

Title: ArcSin timings
Post by: jj2007 on December 21, 2020, 11:16:57 AM
Two algos that calculate arcsin(x) in the range x=0 ... 0.5. The first one, Arcsinus(), uses Raymond's tutorial (http://www.ray.masmcode.com/tutorial/fpuchap10.htm), the second algo uses FastMath with Arcsinus() values:

FastMath ArcSin ; define a math function
  For_ fct=0.0 To 1.0 Step 0.0001
        fld fct
        fstp REAL10 ptr [edi]
        void Arcsinus(fct)
        fstp REAL10 ptr [edi+REAL10]
        add edi, 2*REAL10
  Next
FastMath
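
For readers without MasmBasic, the idea behind the table can be sketched in Python. This is only an illustration under stated assumptions: build_table, lookup and arcsin_deg are hypothetical names, and plain linear interpolation stands in for whatever FastMath does internally.

```python
import math

def build_table(func, lo, hi, step):
    """Precompute (x, func(x)) pairs on a regular grid, like the For_ loop above."""
    n = round((hi - lo) / step)                      # number of intervals
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return [(x, func(x)) for x in xs]

def lookup(table, x):
    """Estimate func(x) by linear interpolation between the two nearest entries.

    Assumption: linear interpolation is used here only to keep the sketch short.
    """
    lo = table[0][0]
    step = table[1][0] - table[0][0]
    i = min(max(int((x - lo) / step), 0), len(table) - 2)
    (x0, y0), (x1, y1) = table[i], table[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def arcsin_deg(x):
    """Exact reference: arcsin in degrees."""
    return math.degrees(math.asin(x))

table = build_table(arcsin_deg, 0.0, 1.0, 0.0001)
print(lookup(table, 0.25005), arcsin_deg(0.25005))
```

The table costs time once; every later call is an index computation plus a few multiplications.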


May I have some timings please?

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

15296   cycles for 100 * Arcsinus
1907    cycles for 100 * ArcSin

15373   cycles for 100 * Arcsinus
1899    cycles for 100 * ArcSin

15238   cycles for 100 * Arcsinus
1912    cycles for 100 * ArcSin

15206   cycles for 100 * Arcsinus
1910    cycles for 100 * ArcSin

15219   cycles for 100 * Arcsinus
1905    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin
Title: Re: ArcSin timings
Post by: Siekmanski on December 21, 2020, 11:24:33 AM
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

18643   cycles for 100 * Arcsinus
2256    cycles for 100 * ArcSin

18636   cycles for 100 * Arcsinus
2246    cycles for 100 * ArcSin

18650   cycles for 100 * Arcsinus
2261    cycles for 100 * ArcSin

18629   cycles for 100 * Arcsinus
2255    cycles for 100 * ArcSin

18640   cycles for 100 * Arcsinus
2261    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

--- ok ---
Title: Re: ArcSin timings
Post by: jj2007 on December 21, 2020, 11:30:13 AM
Thanks, Marinus :thup:

I wonder why my old i5 is a tick faster... doesn't make much sense :cool:

Core i5-2450M (https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-2450M+%40+2.50GHz&id=800)
Core i7-4930K (https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-4930K+%40+3.40GHz&id=2023)
Title: Re: ArcSin timings
Post by: Siekmanski on December 21, 2020, 04:45:32 PM
Hi Jochen,
My system is clocked down to prevent noise, for live audio recordings.
Title: Re: ArcSin timings
Post by: daydreamer on December 21, 2020, 06:37:33 PM
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)

16580   cycles for 100 * Arcsinus
1378    cycles for 100 * ArcSin

16408   cycles for 100 * Arcsinus
1378    cycles for 100 * ArcSin

16397   cycles for 100 * Arcsinus
1358    cycles for 100 * ArcSin

16463   cycles for 100 * Arcsinus
1368    cycles for 100 * ArcSin

16674   cycles for 100 * Arcsinus
1358    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

-

I also wonder how the old-school raycasting optimization compares to this: an arccos LUT? :biggrin:

I have a general SSE trig PROC that's untested: you just input 4 floats and an offset value that controls which set of constants it points to, so it becomes a different Taylor series.


@Marinus
I thought you underclocked it because it would be a bigger challenge to optimize code to run on a slower CPU :badgrin:
AMIGA clock speeds would be really challenging today :biggrin:
Title: Re: ArcSin timings
Post by: Siekmanski on December 21, 2020, 10:47:47 PM
@Magnus
Still miss the Amiga days, banging directly to the hardware was a lot of fun.
Title: Re: ArcSin timings
Post by: HSE on December 22, 2020, 12:21:51 AM
 :biggrin: What is ArcSinInit's timing?
Title: Re: ArcSin timings
Post by: jj2007 on December 22, 2020, 12:50:55 AM
Quote from: HSE on December 22, 2020, 12:21:51 AM
:biggrin: What is ArcSinInit's timing?

Test yourself - it might be enough for a coffee break, who knows? :biggrin:

ArcSinInit:
NanoTimer()
FastMath ArcSin        ; define a math function
  For_ fct=0.0 To 1.0 Step 0.0001
        fld fct
        fstp REAL10 ptr [edi]
        void Arcsinus(fct)
        fstp REAL10 ptr [edi+REAL10]
        add edi, 2*REAL10
  Next
FastMath
PrintLine NanoTimer$(), " for initialising the ArcSin macro"
retn
Title: Re: ArcSin timings
Post by: HSE on December 22, 2020, 01:05:42 AM
Quote from: jj2007 on December 22, 2020, 12:50:55 AM
Test yourself - it might be enough for a coffee break, who knows? :biggrin:

I tried earlier, but I got a little crash related to memory allocation  :biggrin:
Title: Re: ArcSin timings
Post by: jj2007 on December 22, 2020, 02:36:31 AM
Post the exe, I am curious
Title: Re: ArcSin timings
Post by: TouEnMasm on December 22, 2020, 03:26:26 AM

Intel(R) Core(TM) i3-4150 CPU @ 3.50GHz (SSE4)

18942   cycles for 100 * Arcsinus
2095    cycles for 100 * ArcSin

18950   cycles for 100 * Arcsinus
2100    cycles for 100 * ArcSin

19068   cycles for 100 * Arcsinus
2137    cycles for 100 * ArcSin

18970   cycles for 100 * Arcsinus
2596    cycles for 100 * ArcSin

18904   cycles for 100 * Arcsinus
2112    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

--- ok ---
Title: Re: ArcSin timings
Post by: TimoVJL on December 22, 2020, 03:33:06 AM
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

19735   cycles for 100 * Arcsinus
2530    cycles for 100 * ArcSin

19817   cycles for 100 * Arcsinus
2535    cycles for 100 * ArcSin

19796   cycles for 100 * Arcsinus
2309    cycles for 100 * ArcSin

19779   cycles for 100 * Arcsinus
2326    cycles for 100 * ArcSin

19822   cycles for 100 * Arcsinus
2321    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

-
Title: Re: ArcSin timings
Post by: HSE on December 22, 2020, 04:12:07 AM
Quote from: jj2007 on December 22, 2020, 02:36:31 AM
Post the exe, I am curious

:thumbsup: The MbProHeap initialization was missing:
  ifdef MbBufferInit
        call MbBufferInit
  endif
Title: Re: ArcSin timings
Post by: quarantined on December 22, 2020, 04:40:57 AM

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)
1.60 GHz

28377   cycles for 100 * Arcsinus
4369    cycles for 100 * ArcSin

28350   cycles for 100 * Arcsinus
4247    cycles for 100 * ArcSin

28358   cycles for 100 * Arcsinus
4414    cycles for 100 * ArcSin

28341   cycles for 100 * Arcsinus
4440    cycles for 100 * ArcSin

28374   cycles for 100 * Arcsinus
4300    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin


Windows 7 Pro, 32 bit
Title: Re: ArcSin timings
Post by: coaster on December 22, 2020, 05:13:35 AM
Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz (SSE4)

14448   cycles for 100 * Arcsinus
1225    cycles for 100 * ArcSin

14644   cycles for 100 * Arcsinus
1214    cycles for 100 * ArcSin

14663   cycles for 100 * Arcsinus
1237    cycles for 100 * ArcSin

14606   cycles for 100 * Arcsinus
1206    cycles for 100 * ArcSin

14518   cycles for 100 * Arcsinus
1272    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin
Title: Re: ArcSin timings
Post by: jj2007 on December 22, 2020, 06:25:58 AM
Quote from: HSE on December 22, 2020, 04:12:07 AM
:thumbsup: The MbProHeap initialization was missing:
  ifdef MbBufferInit
        call MbBufferInit
  endif


Did you call ArcSinInit before ShowCpu? Normally the Init macro takes care of all that, but my timings template keeps the option open to run it without MasmBasic, so I chose the ifdef MbBufferInit instead.
Title: Re: ArcSin timings
Post by: HSE on December 23, 2020, 01:24:33 AM
Quote from: jj2007 on December 22, 2020, 06:25:58 AM
Did you call ArcSinInit before ShowCpu? Normally the Init macro takes care of all that, but my timings template keeps the option open to run it without MasmBasic, so I chose the ifdef MbBufferInit instead.
Essentially it is the code you posted. The MasmBasic book is not so clear  :biggrin:

BTW, the FastMath name sounds nice, but it is a little misleading. Perhaps MathLUT or something like that, because it is a lookup table creation macro, not a fast calculation.  :thumbsup:
Title: Re: ArcSin timings
Post by: jj2007 on December 23, 2020, 01:35:06 AM
Quote from: HSE on December 23, 2020, 01:24:33 AM
BTW, the FastMath name sounds nice, but it is a little misleading. Perhaps MathLUT or something like that, because it is a lookup table creation macro, not a fast calculation.  :thumbsup:

If it gives you a factor 8-12 faster math, then the name is not that important ;-)
Title: Re: ArcSin timings
Post by: daydreamer on December 23, 2020, 02:35:41 AM
originally made for Guga's color conversions
SIMD taylor version
I love SIMD
0.234714
0.61685
0.380504
0.234714
0.144784
2!,4!,6!,8!
0.5
0.0416667
0.00138889
2.48016e-05
cosine result :0.707426
sine result :0.707107
arcsine result:0.900242
times x:1000000
clock cycles :55624547
cycles/loop :55
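
The printed values look like the even powers of x = pi/4 and the reciprocals of 2!, 4!, 6! and 8!, which suggests a Taylor series for cosine. Assuming that reading is right, a scalar Python sketch of the series (not the SSE code itself; cos_taylor is a hypothetical name) could look like this:

```python
import math

# Reciprocals of 2!, 4!, 6!, 8! (0.5, 0.0416667, 0.00138889, 2.48016e-05)
INV_FACT = [1 / math.factorial(k) for k in (2, 4, 6, 8)]

def cos_taylor(x):
    """cos(x) ~ 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!, accumulated term by term."""
    x2 = x * x
    power = 1.0
    result = 1.0
    sign = -1.0
    for inv in INV_FACT:
        power *= x2                  # x^2, x^4, x^6, x^8 in turn
        result += sign * power * inv
        sign = -sign                 # the series alternates in sign
    return result

print(cos_taylor(math.pi / 4), math.cos(math.pi / 4))
```

A SIMD version would evaluate the same polynomial for four inputs at once, with the constants loaded from a table selected by the offset.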

Title: Re: ArcSin timings
Post by: HSE on December 23, 2020, 03:03:25 AM
Quote from: jj2007 on December 23, 2020, 01:35:06 AM
If it gives you a factor 8-12 faster math
Not really.

I'm thinking of the equations (without taking precision into account):

The indifference point is at around 25250 replaced calculations. At this point the factor is 1.
Title: Re: ArcSin timings
Post by: jj2007 on December 23, 2020, 04:53:18 AM
It takes one millisecond to build the table, Héctor. The more complex the mathematical function is, the more you can gain with FastMath :cool:
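
The break-even argument can be sketched as a small calculation: the table pays off once the cycles saved per call add up to the cost of building it. The sketch below is an illustration, not HSE's equations (which are not shown in the thread); break_even_calls is a hypothetical name, and the 2.5 GHz clock used to convert one millisecond into cycles is an assumption.

```python
import math

def break_even_calls(init_cycles, direct_cycles_per_call, table_cycles_per_call):
    """Number of calls after which building the table has paid for itself."""
    saving = direct_cycles_per_call - table_cycles_per_call
    if saving <= 0:
        raise ValueError("the table is not faster per call")
    return math.ceil(init_cycles / saving)

# Illustrative inputs from the first i5-2450M run in this thread
direct = 15296 / 100        # cycles per Arcsinus call
table = 1907 / 100          # cycles per ArcSin call
init = 1e-3 * 2.5e9         # 1 ms at a nominal 2.5 GHz (assumption)

calls = break_even_calls(init, direct, table)
print(calls)
```

With other measurements for the init time or the per-call costs the result shifts accordingly, which is presumably why the figures quoted in the thread differ.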
Title: Re: ArcSin timings
Post by: HSE on December 23, 2020, 05:26:12 AM
Quote from: jj2007 on December 23, 2020, 04:53:18 AM
The more complex the mathematical function is, the more you can gain with FastMath :cool:
Only if you are calling the table beyond the indifference point  :biggrin:
Title: Re: ArcSin timings
Post by: daydreamer on December 23, 2020, 05:55:39 AM
Quote from: jj2007 on December 23, 2020, 04:53:18 AM
It takes one millisecond to build the table, Héctor. The more complex the mathematical function is, the more you can gain with FastMath :cool:
But if you build the tables in a worker thread while the Windows main thread creates and loads lots of things at startup, you wouldn't notice the milliseconds it takes to make one or several tables.
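
The worker-thread idea can be sketched as follows. This is a structural illustration only: build_arcsin_table is a hypothetical name, and in CPython the global interpreter lock limits how much real overlap you get, unlike a native Windows thread.

```python
import math
import threading

table = []

def build_arcsin_table(step=0.0001):
    """Fill the shared table with arcsin values in degrees for x = 0 ... 1."""
    n = round(1.0 / step)
    table.extend(math.degrees(math.asin(i / n)) for i in range(n + 1))

worker = threading.Thread(target=build_arcsin_table)
worker.start()

# ... the main thread would create windows and load resources here ...

worker.join()               # make sure the table is complete before the first lookup
print(len(table))
```

The only synchronisation needed is the join (or an equivalent flag) before the table is first read.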

Title: Re: ArcSin timings
Post by: HSE on December 23, 2020, 06:40:50 AM
Quote from: daydreamer on December 23, 2020, 05:55:39 AM
But if you build the tables in a worker thread while the Windows main thread creates and loads lots of things at startup, you wouldn't notice the milliseconds it takes to make one or several tables.
Yes, you need even more time to prepare for the game :biggrin:

Tables like this are very useful in games because you don't need precision.

Quote from: HSE on December 23, 2020, 05:26:12 AM
Only if you are calling the table beyond the indifference point  :biggrin:
Indeed, you get some profit after 50504 table accesses (almost double the indifference point)

Title: Re: ArcSin timings
Post by: jj2007 on December 23, 2020, 08:46:08 AM
Quote from: HSE on December 23, 2020, 06:40:50 AM
Indeed, you get some profit after 50504 table accesses (almost double the indifference point)

Ok, put Step 0.5 in ArcSinInit and do your optimisation again :badgrin:
Title: Re: ArcSin timings
Post by: HSE on December 23, 2020, 09:03:07 AM
Quote from: jj2007 on December 23, 2020, 08:46:08 AM
Ok, put Step 0.02 in ArcSinInit and do your optimisation again :badgrin:
If you don't need any precision, you could just put any number as a solution; that could be even faster  :biggrin:
Title: Re: ArcSin timings
Post by: jj2007 on December 23, 2020, 09:42:01 AM
If you don't need precision, check if it's still good enough with Step 0.1:
ArcSinInit:
  NanoTimer()
  FastMath ArcSin ; define a math function
  For_ fct=0.0 To 1.0 Step 0.1
        fld fct
        fstp REAL10 ptr [edi]
        void Arcsinus(fct)
        fstp REAL10 ptr [edi+REAL10]
        add edi, 2*REAL10
  Next
  FastMath
  PrintLine NanoTimer$(), " for initialising the FastMath macro"
  For_ fct=0.05 To 1.0 Step 0.1 ; compare the exact value with the estimate
PrintLine Str$("%3f\t", fct), Str$("%9f\t", Arcsinus(fct)v), Str$("%9f", ArcSin(fct)v)
  Next
  retn


Internally, FastMath uses SetPoly3 (http://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1124). It's a fairly sophisticated LUT :cool:
Title: Re: ArcSin timings
Post by: HSE on December 24, 2020, 01:14:09 AM
Hi JJ!

With step = 0.1 you can have an absolute error of 5.6 x 10^0 (the relative error is 9.6% at that point). That is very big, especially compared with the usual default: absolute error = 1.0 x 10^-6

So first you have to find the step size for the precision you need, and second you have to choose between direct calculation and a table, depending on the number of accesses to the solution.
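
Finding the step size for a given precision can be sketched like this. The sketch makes several assumptions: it uses plain linear interpolation rather than SetPoly3, so its error figures will not match the ones in this thread; it is restricted to the range 0 ... 0.5 from the first post; and max_interp_error and pick_step are hypothetical names. Near x = 1 the slope of arcsin grows without bound, so any table is less accurate there.

```python
import math

def max_interp_error(step, hi=0.5, samples_per_interval=4):
    """Largest absolute error (in degrees) of a linearly interpolated arcsin table."""
    n = round(hi / step)
    xs = [hi * i / n for i in range(n + 1)]
    ys = [math.degrees(math.asin(x)) for x in xs]
    worst = 0.0
    for i in range(n):
        for k in range(1, samples_per_interval):
            t = k / samples_per_interval             # position inside the interval
            x = xs[i] + t * (xs[i + 1] - xs[i])
            estimate = ys[i] + t * (ys[i + 1] - ys[i])
            worst = max(worst, abs(estimate - math.degrees(math.asin(x))))
    return worst

def pick_step(tolerance, candidates=(0.1, 0.01, 0.001, 0.0001)):
    """Return the coarsest candidate step that meets the tolerance, or None."""
    for step in candidates:
        if max_interp_error(step) <= tolerance:
            return step
    return None

for step in (0.1, 0.01, 0.001, 0.0001):
    print(step, max_interp_error(step))
```

Once the step is fixed, the table size and therefore the init time follow, and the table-versus-direct decision can be made on the number of expected calls.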

Regards. HSE
Title: Re: ArcSin timings
Post by: jj2007 on December 24, 2020, 03:15:19 AM
Right, 9.6% is quite big. Use 0.01 instead, the table gets created in less than 100 microseconds, and the error is 1% max.

If that is not precise enough, go for 0.001:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
202 µs for initialising the FastMath macro
0.0500  2.86598398      2.86598398
0.150   8.62692656      8.62692656
0.250   14.4775122      14.4775122
0.350   20.4873151      20.4873151
0.450   26.7436840      26.7436840
0.550   33.3670130      33.3670130
0.650   40.5416019      40.5416019
0.750   48.5903779      48.5903789
0.850   58.2116694      58.2116729
0.950   71.8051277      71.8051861


202 microseconds, or 0.2 milliseconds. Given that some of my "professional" software needs minutes to start, I start wondering what is the motivation behind your critical comments, Héctor :badgrin:
Title: Re: ArcSin timings
Post by: HSE on December 24, 2020, 04:07:18 AM
Quote from: jj2007 on December 24, 2020, 03:15:19 AM
I start wondering what is the motivation behind your critical comments, Héctor :badgrin:

Just beating what I can. It's the laboratory:
Quote
Algorithm and code design research laboratory. This is the place to post assembler algorithms and code design for discussion, optimisation and any other improvements that can be made on it. Post code here to be beaten to death to make it better, smaller, faster or more powerful. Feel free to explain the optimisation methods used so that everyone can get a feel for the code design.

Tables are interesting tools, especially if your macro makes them so easy to create, but how to build them and when to use them deserves some consideration.
Title: Re: ArcSin timings
Post by: daydreamer on December 24, 2020, 06:44:30 AM
Quote from: HSE on December 24, 2020, 04:07:18 AM
Quote from: jj2007 on December 24, 2020, 03:15:19 AM
I start wondering what is the motivation behind your critical comments, Héctor :badgrin:

Just beating what I can. It's the laboratory:
Quote
Algorithm and code design research laboratory. This is the place to post assembler algorithms and code design for discussion, optimisation and any other improvements that can be made on it. Post code here to be beaten to death to make it better, smaller, faster or more powerful. Feel free to explain the optimisation methods used so that everyone can get a feel for the code design.

Tables are interesting tools, especially if your macro makes them so easy to create, but how to build them and when to use them deserves some consideration.
Well, if you use trig later for drawing in this 64-bit era, wouldn't a fixed-point table be a faster alternative? Even 32-bit code supports MUL, which returns the result as 32 bits in eax and 32 bits in edx, and DIV also uses both 32-bit registers.
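
The fixed-point idea can be sketched in Python by emulating the 32 x 32 -> 64 bit multiply. Everything here is an assumption for illustration: the 16.16 format, the one-entry-per-degree sine table, and the names to_fixed, mul32 and fixed_mul. The sketch handles non-negative values only.

```python
import math

FRAC_BITS = 16                      # 16.16 fixed point (assumed format)
ONE = 1 << FRAC_BITS

def to_fixed(value):
    """Convert a non-negative float to 16.16 fixed point."""
    return round(value * ONE)

def mul32(a, b):
    """Unsigned 32x32 -> 64 bit multiply, split into (high, low) like edx:eax."""
    product = (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF)
    return product >> 32, product & 0xFFFFFFFF

def fixed_mul(a, b):
    """Multiply two 16.16 values: recombine edx:eax and drop 16 fraction bits."""
    high, low = mul32(a, b)
    return ((high << 32) | low) >> FRAC_BITS

# Integer sine table, one entry per degree for 0..90
SIN_TABLE = [to_fixed(math.sin(math.radians(d))) for d in range(91)]

radius = to_fixed(100.0)
y = fixed_mul(radius, SIN_TABLE[30])     # 100 * sin(30 degrees) in 16.16
print(y / ONE)
```

Whether this beats a floating-point table on current CPUs would still have to be timed.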
Title: Re: ArcSin timings
Post by: HSE on December 24, 2020, 07:08:13 AM
Quote from: daydreamer on December 24, 2020, 06:44:30 AM
Well, if you use trig later for drawing in this 64-bit era, wouldn't a fixed-point table be a faster alternative? Even 32-bit code supports MUL, which returns the result as 32 bits in eax and 32 bits in edx, and DIV also uses both 32-bit registers.
You can time that to be sure  :biggrin:
Title: Re: ArcSin timings
Post by: daydreamer on December 28, 2020, 12:42:47 AM
Quote from: HSE on December 24, 2020, 07:08:13 AM
Quote from: daydreamer on December 24, 2020, 06:44:30 AM
Well, if you use trig later for drawing in this 64-bit era, wouldn't a fixed-point table be a faster alternative? Even 32-bit code supports MUL, which returns the result as 32 bits in eax and 32 bits in edx, and DIV also uses both 32-bit registers.
You can time that to be sure  :biggrin:
It's best to do optimisation and timings in practical uses too, e.g. whole tunnel (stargate) or sphere (planet) code. Float to int conversion not only takes some cycles: when mixing SSE floating point code and SSE2 integer code you also get a penalty.
For example, a circle can be anything from a 32x32 sprite to a big hi-res 4K HD screen: 32 diameter * pi vs 1080 diameter * pi. So a general 360 degree LUT works best for a 360 pixel circle; it has too many entries for a 32 diameter and too few for a 1080 diameter.
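
The circumference argument can be put into numbers with a tiny sketch. It assumes roughly one table entry per pixel on the circumference, which is a simplification made here, and angles_for_circle is a hypothetical name.

```python
import math

def angles_for_circle(diameter_px):
    """Approximate number of distinct angles needed: one per circumference pixel."""
    return math.ceil(math.pi * diameter_px)

for diameter in (32, 1080):
    print(diameter, angles_for_circle(diameter))
```

A table sized for one case is oversized or too coarse for the other, so the table size would ideally be chosen per use.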
Title: Re: ArcSin timings
Post by: quarantined on April 11, 2021, 03:49:20 AM
Quote from: quarantined on December 22, 2020, 04:40:57 AM

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)
1.60 GHz

28377   cycles for 100 * Arcsinus
4369    cycles for 100 * ArcSin

28350   cycles for 100 * Arcsinus
4247    cycles for 100 * ArcSin

28358   cycles for 100 * Arcsinus
4414    cycles for 100 * ArcSin

28341   cycles for 100 * Arcsinus
4440    cycles for 100 * ArcSin

28374   cycles for 100 * Arcsinus
4300    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

Windows 7 Pro, 32 bit

from new computer, xp 32 bit  :biggrin:
Quote
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (SSE4)

17230   cycles for 100 * Arcsinus
2710    cycles for 100 * ArcSin

17212   cycles for 100 * Arcsinus
2726    cycles for 100 * ArcSin

17212   cycles for 100 * Arcsinus
2714    cycles for 100 * ArcSin

17219   cycles for 100 * Arcsinus
2710    cycles for 100 * ArcSin

17212   cycles for 100 * Arcsinus
2710    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

--- ok ---

When I say 'new', it's a misnomer. It's a refurbished HP 8100 Elite SFF box. For $130 USD, complete with monitor, keyboard and mouse, it's doing a terrific job so far. And drivers are still around for the good old XP.  :tongue: