Atan2 SSE2

Started by guga, February 07, 2022, 08:27:40 AM

Hi Guys

JJ, I don't know yet exactly how the ranges work in the original lolremez version.

You can give it a shot with the attached file, or check its documentation here:

As for FB itself, I'm finishing the port, but I must confess this is one of the most painful things I have ever seen. FB inserts tons of useless code during compilation, and the code is so badly organized that it's really hard to port it or even to follow the code flow.
For example, printing a simple line to the console in FB looks easy, fast and reliable, right? In principle, something as simple as:
    ' puts (String(Loword(Width)*Hiword(Width)," ")+Chr(10)) 'clear console
    puts(!"Approximating Polynomial\n")
    sh = copy + "  [" + str(lower) + " To " + str(upper) + "]"

OK... but that's not true at all. Internally, to output that simple line, FB does this:

    ; puts (String(Loword(Width)*Hiword(Width)," ")+Chr(10)) 'clear console
    C_call 'msvcrt.puts' {B$ "_____________________________", 0}
    C_call 'msvcrt.puts' {B$ "Approximating Polynomial", 0A, 0}
    call 'FbRtl32.fb_StrAssign' HValue, 0-1, DValue, 0-1, 0
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ "  [", 0}, 4, 0
    call 'FbRtl32.fb_DoubleToStr' D$Lower, D$Lower+4
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, eax, 0-1, 0
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ " To ", 0}, 5, 0
    call 'FbRtl32.fb_DoubleToStr' D$Upper, D$Upper+4
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, eax, 0-1, 0
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ "]", 0}, 2, 0
    C_call 'msvcrt.puts' D$HValue.pStringData

Each of the functions above in turn goes through several C runtime (msvcrt) calls, when the whole thing could simply be done in 2 lines of code, using only two of those APIs (sprintf and puts), like this:

[Sz_PolyminalIntro: B$ "_____________________________
Approximating Polynomial

%s  [%.16g To %.16g]", 0]

[Sz_OutputStrBuff: B$ 0 #512]

    C_call 'msvcrt.sprintf' Sz_OutputStrBuff, Sz_PolyminalIntro, D$DValue.pStringData, D$Lower, D$Lower+4, D$Upper, D$Upper+4
    C_call 'msvcrt.puts' Sz_OutputStrBuff
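
For readers who don't use RosAsm, the same idea (one format string, one formatting call, one output call) can be sketched in Python, whose `%` operator accepts the same `%s` and `%.16g` conversions as sprintf. This is only an illustration: `format_intro`, `INTRO_FORMAT` and the sample name and range below are hypothetical stand-ins, not values taken from the port.

```python
# Hypothetical stand-ins for the values the assembly passes to sprintf.
function_name = "sin(x)"
lower, upper = -1.0, 1.0

# Same layout as Sz_PolyminalIntro: a ruler, a title, a blank line,
# then the name and the range printed with up to 16 significant digits.
INTRO_FORMAT = (
    "_____________________________\n"
    "Approximating Polynomial\n"
    "\n"
    "%s  [%.16g To %.16g]"
)


def format_intro(name, lower_bound, upper_bound):
    """Build the whole console text in one formatting step."""
    return INTRO_FORMAT % (name, lower_bound, upper_bound)


# One formatting call followed by one output call, like sprintf + puts.
print(format_intro(function_name, lower, upper))
```

In the assembly version the 512-byte Sz_OutputStrBuff has to be large enough for the formatted text, because sprintf does not check the size of the destination buffer.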

Again, the whole idea is not bad, but the internal organization is terrible. BCX, on the other hand, used to produce much cleaner code. (I don't know how it is doing right now, because I haven't had time to test it, but as far as I remember, BCX was far more robust than FB, at least in terms of internal organization.)

Anyway, I'll continue porting this and will try to finish today or tomorrow. Keep in mind that if FB made that mess out of a single string written to the console, you can imagine what kind of hell I'm trying to fix in the rest of the code :biggrin: :biggrin: :biggrin: :biggrin:

Here are the timings: LolRemez is faster if you don't need precision. Note these are results over the full -180...+180 degrees range:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

6476    cycles for 100 * SinPlusCos FastMath
39649   cycles for 100 * SinPlusCos Fpu
5289    cycles for 100 * LolRemez 6
6341    cycles for 100 * LolRemez 7
7350    cycles for 100 * LolRemez 8
10189   cycles for 100 * LolRemez 10

6480    cycles for 100 * SinPlusCos FastMath
39487   cycles for 100 * SinPlusCos Fpu
5296    cycles for 100 * LolRemez 6
6334    cycles for 100 * LolRemez 7
7378    cycles for 100 * LolRemez 8
10195   cycles for 100 * LolRemez 10

75      bytes for SinPlusCos FastMath
33      bytes for SinPlusCos Fpu
83      bytes for LolRemez 6
93      bytes for LolRemez 7
103     bytes for LolRemez 8
123     bytes for LolRemez 10

exact   -0.9823952887191077263
Real8   -0.9823952887191077510  SinPlusCos FastMath
Real8   -0.9823952887191077510  SinPlusCos Fpu
Real8   -0.9580416352510443546  LolRemez 6
Real8   -0.9768328662750306313  LolRemez 7
Real8   -0.9833290686311381146  LolRemez 8
Real8   -0.9823756874760426472  LolRemez 10


Hi JJ. So, for this example it's not all that useful. I'm finishing the app so we can test it later, but meanwhile I found some interesting material on improving accuracy with something called Caratheodory-Fejer approximation. This method seems to use the Chebyshev coefficients to decrease the error.

It's way over my head right now, but these articles are the kind of material that Siekmanski and Jack would like :azn: :azn: :azn:

FastMath is a factor 6 faster than the FPU, and on par with a LolRemez with 7 coefficients. The latter deviates by half a percent from the true result; that is 9 pixels on a screen 1600 pixels wide... so it depends on your usage. IIRC you need it for colours, so that is 256 * 0.5% -> ±0.7 for the LolRemez 7. I suppose you can live with that :smiley:

6476    cycles for 100 * SinPlusCos FastMath
39649   cycles for 100 * SinPlusCos Fpu
5289    cycles for 100 * LolRemez 6
6341    cycles for 100 * LolRemez 7
exact   -0.9823952887191077263
Real8   -0.9823952887191077510  SinPlusCos FastMath
Real8   -0.9823952887191077510  SinPlusCos Fpu
Real8   -0.9580416352510443546  LolRemez 6
Real8   -0.9768328662750306313  LolRemez 7
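
The "half a percent" and "factor 6" figures can be checked directly from the numbers in the listing. The snippet below is a small sketch of that arithmetic; the helper name `relative_error_percent` is made up for illustration, and the result only describes this single test value, not the maximum error over the whole range.

```python
exact = -0.9823952887191077263

# Real8 results copied from the listing above.
results = {
    "SinPlusCos FastMath": -0.9823952887191077510,
    "LolRemez 6": -0.9580416352510443546,
    "LolRemez 7": -0.9768328662750306313,
    "LolRemez 8": -0.9833290686311381146,
    "LolRemez 10": -0.9823756874760426472,
}


def relative_error_percent(approx, reference):
    """Relative deviation of approx from reference, in percent."""
    return abs(approx - reference) / abs(reference) * 100.0


for name, value in results.items():
    err = relative_error_percent(value, exact)
    pixels = 1600 * err / 100.0   # deviation scaled to a 1600 pixel wide screen
    levels = 256 * err / 100.0    # deviation scaled to 256 colour levels
    print(f"{name:<20} {err:10.6f} %  {pixels:8.3f} px  {levels:7.3f} levels")

# Cycle counts for 100 calls, taken from the first timing run.
fpu_cycles, fastmath_cycles = 39649, 6476
print(f"FPU / FastMath cycle ratio: {fpu_cycles / fastmath_cycles:.2f}")
```

For LolRemez 7 this gives a deviation of a little under 0.6 %, which is where the roughly 9 pixels out of 1600 come from.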



I agree that for colors it doesn't need to be that precise, but other uses may need something that gives more precision. I'm almost done with the FB version and will release it soon so we can test both Dodicat's approach and the FB runtime DLLs we created  :thumbsup: :thumbsup: :thumbsup:

This is the app, fully working in the same way as the original. I'll clean up the code and then move on to fixing the export (saving).

One question: I plan to export the result in MASM and RosAsm syntax as well, so we can use this thing if needed, but I don't know how to create multiline comments in MASM.

In RosAsm, when I want a multiline comment, I put the text between two double-semicolon lines, like:

;;
     This is a comment
;;

How do I do it in MASM? What syntax allows multi-line comments in MASM?

Standard is:

comment *
     This is a comment
*

The delimiter can be any character that is not used in the commented lines.

To comment out code:

if 0
commented code
endif

because you can uncomment it again:

if 1
commented code
endif
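
Since the plan is to have an exporter write these comments, here is a minimal sketch of how such a tool could pick a MASM delimiter that does not occur in the text. It is written in Python purely for illustration; `masm_comment`, `rosasm_comment` and the candidate delimiter list are hypothetical, and the RosAsm variant simply follows the double-semicolon form described earlier in this thread.

```python
def masm_comment(text, candidates="*!~#@^|"):
    """Wrap text in a MASM COMMENT block, picking a delimiter absent from text."""
    for delimiter in candidates:
        if delimiter not in text:
            # The block ends on the line where the delimiter appears again.
            return f"comment {delimiter}\n{text}\n{delimiter}\n"
    raise ValueError("every candidate delimiter occurs in the text")


def rosasm_comment(text):
    """Wrap text between two ';;' lines, as described for RosAsm in this thread."""
    if any(line.lstrip().startswith(";;") for line in text.splitlines()):
        raise ValueError("text contains a ';;' line that would end the comment early")
    return f";;\n{text}\n;;\n"


# Example text containing '*', so the first candidate has to be skipped.
note = "Degree 7 polynomial, range [-1 To 1]\nc0 * x + c1"
print(masm_comment(note))
print(rosasm_comment(note))
```

Checking the text first matters because MASM ends the comment at the next occurrence of the delimiter, wherever it appears.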



That's what I use (169 occurrences in the RichMasm source...)

For comparing several variants, I often use
if 0
   ... code version A ...
elseif 1
   ... code version B ... (will be used in this example)
else
   ... code version C ...
endif



   There is the "COMMENT" directive.

        COMMENT !

   That works as [COMMENT Symbol] and when the symbol
next appears, it ends the comments section.  (With the
following line as code, I think.)

!
; This is now normal code.
        MOV     EAX,-1


Steve N.


Thanks Steve :thumbsup:
That makes it easier to translate C-style multiline comments.

How are you calculating the "lolremez" AKA Chebyshev coefficients? Are you using the formula or solving the simultaneous equations? I got stuck and gave up ages ago.
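
For context on the "formula" route: Chebyshev coefficients can be computed directly by sampling the function at the Chebyshev nodes, with no equations to solve. The sketch below shows that textbook formula in Python. It is not taken from the port discussed here, and the names `chebyshev_coefficients` and `chebyshev_eval` are made up for illustration. Note that LolRemez itself uses the Remez exchange algorithm, which does solve a linear system on each iteration, so its minimax coefficients are close to, but not the same as, the ones this formula produces.

```python
import math


def chebyshev_coefficients(f, a, b, n):
    """Coefficients c[0..n-1] of the Chebyshev interpolant of f on [a, b]."""
    half_width = 0.5 * (b - a)
    midpoint = 0.5 * (b + a)
    # Sample f at the n Chebyshev nodes, mapped from [-1, 1] to [a, b].
    samples = [
        f(midpoint + half_width * math.cos(math.pi * (k + 0.5) / n))
        for k in range(n)
    ]
    # c_j = (2/n) * sum_k f(x_k) * cos(pi * j * (k + 1/2) / n)
    return [
        (2.0 / n) * sum(
            samples[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n)
        )
        for j in range(n)
    ]


def chebyshev_eval(c, a, b, x):
    """Evaluate c[0]/2 + sum_{j>=1} c[j]*T_j(t) with the Clenshaw recurrence."""
    t = (2.0 * x - a - b) / (b - a)   # map x from [a, b] to [-1, 1]
    d1 = d2 = 0.0
    for cj in reversed(c[1:]):
        d1, d2 = 2.0 * t * d1 - d2 + cj, d1
    return t * d1 - d2 + 0.5 * c[0]


# Approximate sin over the full -180...+180 degree range with 12 coefficients.
coeffs = chebyshev_coefficients(math.sin, -math.pi, math.pi, 12)
worst = max(
    abs(chebyshev_eval(coeffs, -math.pi, math.pi, x) - math.sin(x))
    for x in (-math.pi + i * 2.0 * math.pi / 1000 for i in range(1001))
)
print(f"largest error on the test grid: {worst:.3e}")
```

The coefficients are in the Chebyshev basis; converting them to plain powers of x, as a polynomial routine in assembly would want, is a separate step not shown here.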
