Topic: Atan2 SSE2  (Read 7493 times)

guga

  • Member
  • *****
  • Posts: 1451
  • Assembly is a state of art.
    • RosAsm
Re: Atan2 SSE2
« Reply #60 on: March 03, 2022, 05:27:22 AM »
Hi Guys

JJ, I don't know yet exactly how the ranges work in the original lolremez version.

You can give it a shot with the attached file, or check its documentation here: https://github.com/samhocevar/lolremez

About FB itself, I'm finishing the port, but I must confess this is one of the most painful things I have ever seen. FB inserts tons of useless code during compilation, and the code is so badly organized that it's really hard to port or even to follow the code flow.
For example, printing a simple line to the console in FB seems easy, fast and reliable, right? In principle, something as simple as:
Code:
    ' puts (String(Loword(Width)*Hiword(Width)," ")+Chr(10)) 'clear console
    puts("_____________________________")
    puts(!"Approximating Polynomial\n")
    sh = copy + "  [" + str(lower) + " To " + str(upper) + "]"

OK... but that's not true at all. Internally, to output that simple line, FB does this:

Code:
    ; puts (String(Loword(Width)*Hiword(Width)," ")+Chr(10)) 'clear console
    C_call 'msvcrt.puts' {B$ "_____________________________", 0}            ; puts("_____...")
    C_call 'msvcrt.puts' {B$ "Approximating Polynomial", 0A, 0}             ; puts(!"Approximating Polynomial\n")
    call 'FbRtl32.fb_StrAssign' HValue, 0-1, DValue, 0-1, 0                  ; sh = copy
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ "  [", 0}, 4, 0       ; sh += "  ["
    call 'FbRtl32.fb_DoubleToStr' D$Lower, D$Lower+4                         ; eax = str(lower), a temporary string
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, eax, 0-1, 0               ; sh += str(lower)
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ " To ", 0}, 5, 0      ; sh += " To "
    call 'FbRtl32.fb_DoubleToStr' D$Upper, D$Upper+4                         ; eax = str(upper)
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, eax, 0-1, 0               ; sh += str(upper)
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ "]", 0}, 2, 0         ; sh += "]"
    C_call 'msvcrt.puts' D$HValue.pStringData                                ; print sh

Yet each of those runtime functions in turn calls several C runtime (msvcrt) functions, when the whole thing could be done in 2 lines of code using only two msvcrt functions, sprintf and puts, like this:

Code:
[Sz_PolyminalIntro: B$ "_____________________________
Approximating Polynomial

%s  [%.16g To %.16g]", 0]

[Sz_OutputStrBuff: B$ 0 #512]

    C_call 'msvcrt.sprintf' Sz_OutputStrBuff, Sz_PolyminalIntro, D$DValue.pStringData, D$Lower, D$Lower+4, D$Upper, D$Upper+4
    C_call 'msvcrt.puts' Sz_OutputStrBuff
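To illustrate the difference outside of assembly, here is a minimal Python sketch: one printf-style format string versus building the range line fragment by fragment. Python's `%` operator follows C printf conversions such as `%.16g`, so the format string is the same one passed to sprintf above. The function names and sample values are hypothetical, and the sketch does not reproduce FB's own `str()` number formatting.

```python
# Hypothetical helpers contrasting one format call with piecewise concatenation.

INTRO_FORMAT = (
    "_____________________________\n"
    "Approximating Polynomial\n"
    "\n"
    "%s  [%.16g To %.16g]"
)


def intro_single_format(label: str, lower: float, upper: float) -> str:
    """Build the whole banner with a single printf-style call (like one sprintf)."""
    return INTRO_FORMAT % (label, lower, upper)


def range_by_concatenation(label: str, lower: float, upper: float) -> str:
    """Build only the range line piece by piece, mirroring the chain of
    fb_StrAssign / fb_StrConcatAssign / fb_DoubleToStr calls."""
    s = label                 # sh = copy
    s += "  ["
    s += "%.16g" % lower      # stands in for str(lower)
    s += " To "
    s += "%.16g" % upper      # stands in for str(upper)
    s += "]"
    return s


if __name__ == "__main__":
    print(intro_single_format("sin(x)+cos(x)", -3.141592653589793, 3.141592653589793))
```

Both functions produce the same range line; the single format call simply does it in one step, with one output buffer.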

Again, the idea itself is not bad, but the internal organization is terrible. BCX, by contrast, used to produce much cleaner code. (I don't know how it is doing now, because I haven't had time to test it, but as far as I remember BCX was much more robust than FB, at least in terms of internal organization.)

Anyway... I'll continue porting this and will try to finish today or tomorrow. Keep in mind that if FB made that mess of a single string written to the console, you can imagine what a hell I'm trying to fix in the rest of the code :biggrin: :biggrin: :biggrin: :biggrin:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

  • Member
  • *****
  • Posts: 13661
  • Assembly is fun ;-)
    • MasmBasic
Re: Atan2 SSE2
« Reply #61 on: March 03, 2022, 08:31:19 AM »
Quote from: guga
JJ, I don't know yet exactly how the ranges work in the original lolremez version.
It's the -r parameter:
lolremez --double -d 9 -r "-pi:pi" "sin(x)+cos(x)"
pause

Quote from: guga
You can give it a shot with the attached file

Here are the timings. LolRemez is faster if you don't need much precision. Note that these results cover the full -180...+180 degree range:

Code:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

6476    cycles for 100 * SinPlusCos FastMath
39649   cycles for 100 * SinPlusCos Fpu
5289    cycles for 100 * LolRemez 6
6341    cycles for 100 * LolRemez 7
7350    cycles for 100 * LolRemez 8
10189   cycles for 100 * LolRemez 10

6480    cycles for 100 * SinPlusCos FastMath
39487   cycles for 100 * SinPlusCos Fpu
5296    cycles for 100 * LolRemez 6
6334    cycles for 100 * LolRemez 7
7378    cycles for 100 * LolRemez 8
10195   cycles for 100 * LolRemez 10

75      bytes for SinPlusCos FastMath
33      bytes for SinPlusCos Fpu
83      bytes for LolRemez 6
93      bytes for LolRemez 7
103     bytes for LolRemez 8
123     bytes for LolRemez 10

exact   -0.9823952887191077263
Real8   -0.9823952887191077510  SinPlusCos FastMath
Real8   -0.9823952887191077510  SinPlusCos Fpu
Real8   -0.9580416352510443546  LolRemez 6
Real8   -0.9768328662750306313  LolRemez 7
Real8   -0.9833290686311381146  LolRemez 8
Real8   -0.9823756874760426472  LolRemez 10
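The accuracy lines above compare the variants at a single test value only. A rough way to compare degrees over the whole -pi...pi range is to evaluate each polynomial with Horner's rule and sample the error on a grid. This is a minimal sketch assuming coefficients are stored lowest degree first; the actual LolRemez coefficients from this benchmark are not shown in the thread, so the demo uses a truncated Taylor series purely as a stand-in (a Remez minimax polynomial spreads its error much more evenly over the range). All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def horner(coeffs: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + ... + cn*x**n (coefficients lowest degree first)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def max_abs_error(coeffs: Sequence[float], f: Callable[[float], float],
                  lo: float, hi: float, samples: int = 10001) -> float:
    """Largest |poly(x) - f(x)| over evenly spaced samples in [lo, hi]."""
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(horner(coeffs, x) - f(x)))
    return worst


def sin_plus_cos(x: float) -> float:
    return math.sin(x) + math.cos(x)


def taylor_sin_plus_cos(degree: int) -> List[float]:
    """Stand-in coefficients: Maclaurin series of sin(x)+cos(x) up to `degree`.
    The derivatives at 0 cycle through 1, 1, -1, -1."""
    pattern = (1.0, 1.0, -1.0, -1.0)
    return [pattern[k % 4] / math.factorial(k) for k in range(degree + 1)]


if __name__ == "__main__":
    for d in (6, 7, 8, 10):
        err = max_abs_error(taylor_sin_plus_cos(d), sin_plus_cos, -math.pi, math.pi)
        print(f"degree {d}: max error {err:.3e}")
```

Swapping in the real coefficients printed by lolremez for each degree would give a worst-case error figure to put next to the cycle counts.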

guga

Re: Atan2 SSE2
« Reply #62 on: March 03, 2022, 01:11:56 PM »
Hi JJ. So, in this example it's not very useful. I'm finishing the app so we can test it later, but in the meantime I found some interesting material on improving accuracy with something called Caratheodory-Fejer approximation. This method seems to use the Chebyshev coefficients to decrease the error.

It's way over my head right now, but these articles are the kind of material Siekmanski and Jack would like :azn: :azn: :azn:

https://encyclopediaofmath.org/wiki/Carath%C3%A9odory-Fej%C3%A9r_problem
https://www.embeddedrelated.com/showarticle/152.php
https://www.chebfun.org/docs/guide/guide04.html
https://www.mathworks.com/matlabcentral/fileexchange/22055-caratheodory-fejer-approximation
https://books.google.com.br/books?id=xsnBDwAAQBAJ&pg=PA159&lpg=PA159&dq=Caratheodory-Fejer+approximation,&source=bl&ots=uIKjZvoFNT&sig=ACfU3U1fb_hKZ9FRvyqG7b5y5KO4pCWsnA&hl=pt-BR&sa=X&ved=2ahUKEwiRn8_X7Kj2AhXJr5UCHQ8tDVo4HhDoAXoECA8QAw#v=onepage&q=Caratheodory-Fejer%20approximation%2C&f=false
« Last Edit: March 03, 2022, 07:30:40 PM by guga »

jj2007

Re: Atan2 SSE2
« Reply #63 on: March 03, 2022, 08:04:47 PM »
Quote from: guga
Hi JJ. So, in this example it's not very useful.

FastMath is a factor of 6 faster than the FPU, and on par with a LolRemez with 7 coefficients. The latter deviates by roughly half a percent (0.57% at this test value) from the true result; that is 9 pixels on a screen 1600 pixels wide... so it depends on your usage. IIRC you need it for colours, so that is 256*0.57% = 1.45, i.e. about +-0.7 for the LolRemez 7. I suppose you can live with that :smiley:

Code:
6476    cycles for 100 * SinPlusCos FastMath
39649   cycles for 100 * SinPlusCos Fpu
5289    cycles for 100 * LolRemez 6
6341    cycles for 100 * LolRemez 7
...
exact   -0.9823952887191077263
Real8   -0.9823952887191077510  SinPlusCos FastMath
Real8   -0.9823952887191077510  SinPlusCos Fpu
Real8   -0.9580416352510443546  LolRemez 6
Real8   -0.9768328662750306313  LolRemez 7
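The error figures can be checked directly from the printed values. A small sketch (function names are illustrative) computing the relative error of the LolRemez 7 result at this single test value, and what it means for a 1600-pixel width and a 0...255 colour channel:

```python
EXACT = -0.9823952887191077263
LOLREMEZ_7 = -0.9768328662750306313


def relative_error(approx: float, exact: float) -> float:
    """|approx - exact| / |exact|."""
    return abs(approx - exact) / abs(exact)


def scaled_deviation(rel_err: float, full_scale: float) -> float:
    """Absolute deviation when the result is mapped onto 0...full_scale."""
    return rel_err * full_scale


if __name__ == "__main__":
    rel = relative_error(LOLREMEZ_7, EXACT)
    print(f"relative error: {rel:.4%}")                              # about 0.57%
    print(f"pixels on 1600: {scaled_deviation(rel, 1600):.2f}")      # about 9
    print(f"colour span on 256: {scaled_deviation(rel, 256):.2f}")   # about 1.45, i.e. +-0.7
```

This is the error at one point only; the worst case over the whole range may be larger.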

guga

Re: Atan2 SSE2
« Reply #64 on: March 04, 2022, 05:57:15 AM »
Hi JJ

I agree that for colours it doesn't need to be that precise, but other uses may need something that guarantees more precision. I'm almost finished with the FB version and will release it soon so we can test both Dodicat's approach and the FB runtime DLLs we created :thumbsup: :thumbsup: :thumbsup:

Here is the app, fully working in the same way as the original. I'll clean up the code and move on to fixing the export (saving) now.

One question: I plan to also export the result in MASM and RosAsm syntax, so we can use it if needed, but I don't know how to create multi-line comments in MASM.

In RosAsm, when I want a multi-line comment, I put the text between two lines of double semicolons, like:
;;
     This is a comment
;;


How do I do that in MASM? What is the syntax for multi-line comments in MASM?



HSE

  • Member
  • *****
  • Posts: 2366
  • AMD 7-32 / i3 10-64
Re: Atan2 SSE2
« Reply #65 on: March 04, 2022, 06:09:56 AM »
Standard is:
Code:
comment *
     This is a comment
*
 
The delimiter can be any character that is not used in the commented lines.

To comment code:
Code:
if 0
commented code
endif
because you can then uncomment it simply by changing it to:
Code:
if 1
commented code
endif
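Since the plan is to export the approximation results with comments in both syntaxes, here is a minimal sketch of a comment writer following the rules given in this thread: RosAsm wraps the text between `;;` lines, and MASM's `COMMENT` directive needs a delimiter character that does not occur in the commented text. The function names and the list of candidate delimiters are hypothetical.

```python
from typing import Iterable

# Hypothetical candidate delimiter characters for MASM's COMMENT directive.
MASM_DELIMITERS = "*!~^|#@"


def masm_comment(text: str, candidates: Iterable[str] = MASM_DELIMITERS) -> str:
    """Wrap text in a MASM COMMENT block, picking a delimiter absent from the text."""
    for delim in candidates:
        if delim not in text:
            return f"comment {delim}\n{text}\n{delim}\n"
    raise ValueError("no free delimiter character for COMMENT")


def rosasm_comment(text: str) -> str:
    """RosAsm multi-line comment: the text between two ';;' lines.
    Assumes no line of the text is itself ';;'."""
    return f";;\n{text}\n;;\n"


if __name__ == "__main__":
    notes = "Approximating Polynomial\nsin(x)+cos(x)  [-3.14 To 3.14]"
    print(masm_comment(notes))
    print(rosasm_comment(notes))
```

For commenting out code rather than text, the `if 0 ... endif` form above remains the more convenient choice, since it can be switched back on.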



Equations in Assembly: SmplMath

jj2007

Re: Atan2 SSE2
« Reply #66 on: March 04, 2022, 07:58:12 AM »
Quote from: HSE
To comment code:
Code:
if 0
  commented code
endif

That's what I use (169 occurrences in the RichMasm source...)

For comparing several variants, I often use:
Code:
if 0
   ... code version A ...
elseif 1
   ... code version B ... (will be used in this example)
else
   ... code version C ...
endif

FORTRANS

  • Member
  • *****
  • Posts: 1227
Re: Atan2 SSE2
« Reply #67 on: March 04, 2022, 08:19:22 AM »
Hi,

   There is the "COMMENT" directive.

Code:
        COMMENT !

   That works as [COMMENT Symbol] and when the symbol
next appears, it ends the comments section.  (With the
following line as code, I think.)

                !
; This is now normal code.
        MOV     EAX,-1

Cheers,

Steve N.

daydreamer

  • Member
  • *****
  • Posts: 2308
  • my kind of REAL10 Blonde
Re: Atan2 SSE2
« Reply #68 on: March 04, 2022, 07:02:13 PM »
Thanks Steve :thumbsup:
That makes it easier to translate C-style multi-line comments:
/*
multiline comment
*/
My non-asm creations:
http://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

InfiniteLoop

  • Regular Member
  • *
  • Posts: 39
Re: Atan2 SSE2
« Reply #69 on: April 21, 2022, 11:47:56 PM »
How are you calculating the "lolremez" (a.k.a. Chebyshev) coefficients? Are you using the formula or solving the simultaneous equations? I got stuck and gave up ages ago.

jj2007

Re: Atan2 SSE2
« Reply #70 on: April 22, 2022, 01:14:27 AM »
See Reply #61
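On the "formula" route from InfiniteLoop's question: Chebyshev interpolation coefficients can be computed directly from samples at Chebyshev nodes, with no simultaneous equations, using c_k = (2/n) * sum_j f(x_j) * cos(k * theta_j) with theta_j = pi*(j + 0.5)/n, and the range [a, b] mapped onto [-1, 1]. The sketch below implements that textbook formula plus Clenshaw's recurrence for evaluation. Note it is not the Remez exchange algorithm that lolremez runs; Chebyshev interpolation is typically close to, but not identical to, the minimax polynomial. Function names are hypothetical.

```python
import math
from typing import Callable, List


def cheb_coeffs(f: Callable[[float], float], a: float, b: float, n: int) -> List[float]:
    """Chebyshev coefficients c_0..c_{n-1} of f on [a, b] from n Chebyshev nodes:
    c_k = (2/n) * sum_j f(x_j) * cos(k * theta_j), theta_j = pi * (j + 0.5) / n."""
    half_width, mid = 0.5 * (b - a), 0.5 * (b + a)
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    # Sample f at the nodes mapped from [-1, 1] onto [a, b].
    samples = [f(mid + half_width * math.cos(t)) for t in thetas]
    return [2.0 / n * sum(fx * math.cos(k * t) for fx, t in zip(samples, thetas))
            for k in range(n)]


def cheb_eval(c: List[float], a: float, b: float, x: float) -> float:
    """Evaluate c_0/2 + sum_{k>=1} c_k * T_k(t), t = (2x - a - b)/(b - a),
    using Clenshaw's recurrence."""
    t = (2.0 * x - a - b) / (b - a)
    d = dd = 0.0
    for ck in reversed(c[1:]):
        d, dd = 2.0 * t * d - dd + ck, d
    return t * d - dd + 0.5 * c[0]


if __name__ == "__main__":
    f = lambda x: math.sin(x) + math.cos(x)
    coeffs = cheb_coeffs(f, -math.pi, math.pi, 16)
    for x in (-3.0, 0.0, 2.5):
        print(x, cheb_eval(coeffs, -math.pi, math.pi, x), f(x))
```

The size of the trailing coefficients gives a quick estimate of how many terms are needed for a given accuracy.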