Author Topic: ArcSin timings  (Read 4686 times)

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: ArcSin timings
« Reply #30 on: December 24, 2020, 06:44:30 AM »
I start wondering what is the motivation behind your critical comments, Héctor :badgrin:

Just beating what I can. It's the laboratory:
Quote
Algorithm and code design research laboratory. This is the place to post assembler algorithms and code design for discussion, optimisation and any other improvements that can be made on it. Post code here to be beaten to death to make it better, smaller, faster or more powerful. Feel free to explain the optimisation methods used so that everyone can get a feel for the code design.


Tables are interesting tools, especially if your macro makes them so easy to create, but how to build them and when to use them deserve some consideration.
Well, if you use trig later for drawing in this 64-bit era, wouldn't a fixed-point table be a faster alternative? Even 32-bit code supports MUL, which returns a 64-bit result split across two registers (low 32 bits in eax, high 32 bits in edx). And DIV likewise uses both 32-bit registers, taking edx:eax as its dividend.
SIMD fan and macro fan
why assembly is fastest is because its switch has no (brakes) breaks
:P
only in 16bit assembly you can get away with "Only words" :P

HSE

  • Member
  • *****
  • Posts: 1741
  • <AMD>< 7-32>
Re: ArcSin timings
« Reply #31 on: December 24, 2020, 07:08:13 AM »
Well, if you use trig later for drawing in this 64-bit era, wouldn't a fixed-point table be a faster alternative? Even 32-bit code supports MUL, which returns a 64-bit result split across two registers (low 32 bits in eax, high 32 bits in edx). And DIV likewise uses both 32-bit registers, taking edx:eax as its dividend.
You can time that to be sure  :biggrin:

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: ArcSin timings
« Reply #32 on: December 28, 2020, 12:42:47 AM »
Well, if you use trig later for drawing in this 64-bit era, wouldn't a fixed-point table be a faster alternative? Even 32-bit code supports MUL, which returns a 64-bit result split across two registers (low 32 bits in eax, high 32 bits in edx). And DIV likewise uses both 32-bit registers, taking edx:eax as its dividend.
You can time that to be sure  :biggrin:
It's best to do the optimisation and timings in practical use as well, for example in whole tunnel (stargate) or sphere (planet) code. Float-to-int conversion costs more than just a few cycles, and when you mix SSE floating-point code with SSE2 integer code you get some penalty.
For example, a circle can be anything from a 32x32 sprite to a big hi-res 4K/HD screen: 32 diameter * pi vs. 1080 diameter * pi. So a general 360-degree LUT works best for a 360-pixel circle; it has too many entries for a 32 diameter and too few for a 1080 diameter.

quarantined

  • Regular Member
  • *
  • Posts: 22
Re: ArcSin timings
« Reply #33 on: April 11, 2021, 03:49:20 AM »

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)
1.60 GHz

28377   cycles for 100 * Arcsinus
4369    cycles for 100 * ArcSin

28350   cycles for 100 * Arcsinus
4247    cycles for 100 * ArcSin

28358   cycles for 100 * Arcsinus
4414    cycles for 100 * ArcSin

28341   cycles for 100 * Arcsinus
4440    cycles for 100 * ArcSin

28374   cycles for 100 * Arcsinus
4300    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

Windows 7 Pro, 32 bit

From the new computer, XP 32-bit  :biggrin:
Quote
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (SSE4)

17230   cycles for 100 * Arcsinus
2710    cycles for 100 * ArcSin

17212   cycles for 100 * Arcsinus
2726    cycles for 100 * ArcSin

17212   cycles for 100 * Arcsinus
2714    cycles for 100 * ArcSin

17219   cycles for 100 * Arcsinus
2710    cycles for 100 * ArcSin

17212   cycles for 100 * Arcsinus
2710    cycles for 100 * ArcSin

58      bytes for Arcsinus
209     bytes for ArcSin

Real8   29.99999926061033761    Arcsinus
Real8   29.99999926061034117    ArcSin

--- ok ---

When I say 'new', it's a misnomer. It's a refurbished HP 8100 Elite SFF box. For $130 USD, complete with monitor, keyboard and mouse, it's doing a terrific job so far. And drivers are still around for the good old XP.  :tongue: