Atan2 SSE2

Started by guga, February 07, 2022, 08:27:40 AM


guga

Hi Guys

JJ, I don't know yet exactly how the ranges work in the original lolremez version.

You can give it a shot with the attached file, or check its documentation here: https://github.com/samhocevar/lolremez

About FB itself, I'm finishing the port, but I must confess this is one of the most painful things I have ever seen. FB inserts tons of useless code during compilation, and the code is so badly organized that it's really hard to port or even to understand the code flow.
For example, putting a simple line on the console in FB code seems easy, fast and reliable, right? In principle, something as simple as:
    ' puts (String(Loword(Width)*Hiword(Width)," ")+Chr(10)) 'clear console
    puts("_____________________________")
    puts(!"Approximating Polynomial\n")
    sh=copy + "  [" +str(lower) + " To " + str(upper) + "]"


OK... but that's not true at all. Internally, to output that simple line, FB does this:


    ; puts (String(Loword(Width)*Hiword(Width)," ")+Chr(10)) 'clear console
    C_call 'msvcrt.puts' {B$ "_____________________________", 0}
    C_call 'msvcrt.puts' {B$ "Approximating Polynomial", 0A, 0}
    call 'FbRtl32.fb_StrAssign' HValue, 0-1, DValue, 0-1, 0
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ "  [", 0}, 4, 0
    call 'FbRtl32.fb_DoubleToStr' D$Lower, D$Lower+4
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, eax, 0-1, 0
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ " To ", 0}, 5, 0
    call 'FbRtl32.fb_DoubleToStr' D$Upper, D$Upper+4
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, eax, 0-1, 0
    call 'FbRtl32.fb_StrConcatAssign' HValue, 0-1, {B$ "]", 0}, 2, 0
    C_call 'msvcrt.puts' D$HValue.pStringData


In fact, each of the functions above goes through several msvcrt routines, when the whole thing could be done in 2 lines of code using only two of them (sprintf and puts), like this:

[Sz_PolyminalIntro: B$ "_____________________________
Approximating Polynomial

%s  [%.16g To %.16g]", 0]

[Sz_OutputStrBuff: B$ 0 #512]

    C_call 'msvcrt.sprintf' Sz_OutputStrBuff, Sz_PolyminalIntro, D$DValue.pStringData, D$Lower, D$Lower+4, D$Upper, D$Upper+4
    C_call 'msvcrt.puts' Sz_OutputStrBuff
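
As a cross-check of the idea above, here is a minimal Python sketch of the same "one format string, one output call" approach. Python's `%` operator follows the C `sprintf` conversion rules for `%s` and `%.16g`, so the format string from the post can be reused almost verbatim. The names `INTRO_FORMAT` and `format_intro` are made up for this illustration and are not part of the port.

```python
# Sketch only: mirrors the sprintf + puts pair from the post.
# INTRO_FORMAT and format_intro are hypothetical names.
INTRO_FORMAT = (
    "_____________________________\n"
    "Approximating Polynomial\n"
    "\n"
    "%s  [%.16g To %.16g]"
)


def format_intro(expression, lower, upper):
    """Build the whole console banner with a single formatting call."""
    return INTRO_FORMAT % (expression, lower, upper)


if __name__ == "__main__":
    # One formatting call, one output call.
    print(format_intro("sin(x)+cos(x)", -3.5, 3.5))
```

The whole banner, including the range, comes out of one formatting call, which is the point the two-line RosAsm version makes against the string-concatenation chain.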


Again, the whole idea is not bad, but the internal organization is terrible. BCX, on the other hand, used to produce way cleaner code. (I don't know how it is doing right now, because I haven't had time to test it, but as far as I remember, BCX was way more robust than FB, at least in terms of internal organization.)

Anyway... I'll continue porting this and will try to finish today or tomorrow. Keep in mind that if FB made that mess for a single string output to the console, you can get an idea of the hell I'm trying to fix in the rest of the code :biggrin: :biggrin: :biggrin: :biggrin:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

Quote from: guga on March 03, 2022, 05:27:22 AM
JJ, I don't know yet exactly how the ranges work in the original lolremez version.
It's the -r parameter:
lolremez --double -d 9 -r "-pi:pi" "sin(x)+cos(x)"
pause

Quote: You can give it a shot with the attached file

Here are the timings: LolRemez is faster if you don't need precision. Note these are results over the full -180...+180 degrees range:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

6476    cycles for 100 * SinPlusCos FastMath
39649   cycles for 100 * SinPlusCos Fpu
5289    cycles for 100 * LolRemez 6
6341    cycles for 100 * LolRemez 7
7350    cycles for 100 * LolRemez 8
10189   cycles for 100 * LolRemez 10

6480    cycles for 100 * SinPlusCos FastMath
39487   cycles for 100 * SinPlusCos Fpu
5296    cycles for 100 * LolRemez 6
6334    cycles for 100 * LolRemez 7
7378    cycles for 100 * LolRemez 8
10195   cycles for 100 * LolRemez 10

75      bytes for SinPlusCos FastMath
33      bytes for SinPlusCos Fpu
83      bytes for LolRemez 6
93      bytes for LolRemez 7
103     bytes for LolRemez 8
123     bytes for LolRemez 10

exact   -0.9823952887191077263
Real8   -0.9823952887191077510  SinPlusCos FastMath
Real8   -0.9823952887191077510  SinPlusCos Fpu
Real8   -0.9580416352510443546  LolRemez 6
Real8   -0.9768328662750306313  LolRemez 7
Real8   -0.9833290686311381146  LolRemez 8
Real8   -0.9823756874760426472  LolRemez 10
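
To make the speed/precision trade-off in the figures above easier to read, here is a small Python sketch that turns them into a speed-up relative to the FPU version and a relative error. The numbers are copied from the first timing run and the results list; the names `RESULTS`, `relative_error` and `summarize` are made up for this illustration. Note that the error is measured at the single test value shown in the results, not as a maximum over the whole range.

```python
# Figures copied from the post: (cycles for 100 calls, Real8 result).
EXACT = -0.9823952887191077263
RESULTS = {
    "SinPlusCos FastMath": (6476, -0.9823952887191077510),
    "SinPlusCos Fpu": (39649, -0.9823952887191077510),
    "LolRemez 6": (5289, -0.9580416352510443546),
    "LolRemez 7": (6341, -0.9768328662750306313),
    "LolRemez 8": (7350, -0.9833290686311381146),
    "LolRemez 10": (10189, -0.9823756874760426472),
}


def relative_error(approx, exact=EXACT):
    """Relative deviation of one result from the reference value."""
    return abs(approx - exact) / abs(exact)


def summarize(results=RESULTS, baseline="SinPlusCos Fpu"):
    """Return (name, speed-up over baseline, relative error) per variant."""
    base_cycles = results[baseline][0]
    return [
        (name, base_cycles / cycles, relative_error(value))
        for name, (cycles, value) in results.items()
    ]


if __name__ == "__main__":
    for name, speedup, err in summarize():
        print(f"{name:20s} {speedup:5.2f}x vs FPU, relative error {err:.2e}")
```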

guga

Hi JJ. So, in this example it's not really useful. I'm finishing the app so we can test it later, but in the meantime I found some interesting material that can improve accuracy using something called Carathéodory-Fejér approximation. This method seems to use the Chebyshev coefficients to decrease the error.

It's way over my head right now. But these articles are the kind of material that Siekmanski and Jack would like  :azn: :azn: :azn:

https://encyclopediaofmath.org/wiki/Carath%C3%A9odory-Fej%C3%A9r_problem
https://www.embeddedrelated.com/showarticle/152.php
https://www.chebfun.org/docs/guide/guide04.html
https://www.mathworks.com/matlabcentral/fileexchange/22055-caratheodory-fejer-approximation
https://books.google.com.br/books?id=xsnBDwAAQBAJ&pg=PA159&lpg=PA159&dq=Caratheodory-Fejer+approximation,&source=bl&ots=uIKjZvoFNT&sig=ACfU3U1fb_hKZ9FRvyqG7b5y5KO4pCWsnA&hl=pt-BR&sa=X&ved=2ahUKEwiRn8_X7Kj2AhXJr5UCHQ8tDVo4HhDoAXoECA8QAw#v=onepage&q=Caratheodory-Fejer%20approximation%2C&f=false

jj2007

Quote from: guga on March 03, 2022, 01:11:56 PM
Hi JJ. So, in this example it's not really useful.

FastMath is a factor 6 faster than the FPU, and on par with a LolRemez with 7 coefficients. The latter deviates by half a percent from the true result; that is 9 pixels on a screen 1600 pixels wide... so it depends on your usage. IIRC you need it for colours, so that is 256*0.5%->+-0.7 for the LolRemez 7. I suppose you can live with that :smiley:

6476    cycles for 100 * SinPlusCos FastMath
39649   cycles for 100 * SinPlusCos Fpu
5289    cycles for 100 * LolRemez 6
6341    cycles for 100 * LolRemez 7
...
exact   -0.9823952887191077263
Real8   -0.9823952887191077510  SinPlusCos FastMath
Real8   -0.9823952887191077510  SinPlusCos Fpu
Real8   -0.9580416352510443546  LolRemez 6
Real8   -0.9768328662750306313  LolRemez 7
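
The pixel and colour estimates above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it takes the "exact" and "LolRemez 7" values from the results list, computes the relative deviation, and scales it to a 1600-pixel width and a 256-level colour channel. The function names are made up for this illustration, and halving the colour deviation to get a "±" figure is an assumption about how the ±0.7 was meant.

```python
# Values copied from the results list in the post.
EXACT = -0.9823952887191077263
LOLREMEZ_7 = -0.9768328662750306313


def relative_error(approx, exact):
    """Relative deviation of an approximation from the reference value."""
    return abs(approx - exact) / abs(exact)


def scaled_deviation(rel_err, full_scale):
    """Deviation expressed in units of a given full-scale range."""
    return rel_err * full_scale


if __name__ == "__main__":
    rel = relative_error(LOLREMEZ_7, EXACT)
    print(f"relative error : {rel:.4%}")
    print(f"1600 px screen : {scaled_deviation(rel, 1600):.1f} px")
    # Assumption: the +- figure is half of the total deviation.
    print(f"256 colours    : +-{scaled_deviation(rel, 256) / 2:.2f} levels")
```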

guga

Hi JJ

I agree that for colors the lower precision may be fine, but other uses may need something more precise. I'm almost done with the FB version and will release it soon so we can test both Dodicat's approach and the FB runtime DLLs we created  :thumbsup: :thumbsup: :thumbsup:

This is the app fully working in the same way as the original. I'll clean up the code and then move on to fixing the export (saving).

One question: I plan to export the result in MASM and RosAsm syntax as well, so we can use this thing if needed, but I don't know how to create multi-line comments in MASM.

In RosAsm, when I want a multi-line comment, I put the text between two lines containing a double semicolon, like:
;;
     This is a comment
;;


How do I do it in MASM? What syntax allows multi-line comments in MASM?



HSE

Standard is:

comment *
     This is a comment
*

The delimiter can be any character that is not used in the commented lines.

To comment code:
if 0
commented code
endif

because you can uncomment:
if 1
commented code
endif



Equations in Assembly: SmplMath

jj2007

Quote from: HSE on March 04, 2022, 06:09:56 AM
To comment code:
if 0
  commented code
endif

That's what I use (169 occurrences in the RichMasm source...)

For comparing several variants, I often use
if 0
   ... code version A ...
elseif 1
   ... code version B ... (will be used in this example)
else
   ... code version C ...
endif

FORTRANS

Hi,

   There is the "COMMENT" directive.

        COMMENT !

   That works as [COMMENT Symbol] and when the symbol
next appears, it ends the comments section.  (With the
following line as code, I think.)

                !
; This is now normal code.
        MOV     EAX,-1


Cheers,

Steve N.
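
For the export feature asked about earlier in the thread, the comment styles discussed so far can be emitted by a small helper. The Python sketch below is only an illustration: `pick_delimiter`, `masm_comment` and `rosasm_comment` are made-up names, the candidate delimiter list is arbitrary, and refusing RosAsm text that contains a line starting with `;;` is a cautious assumption rather than documented RosAsm behaviour.

```python
def pick_delimiter(text, candidates="*!#@~^|"):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("no free delimiter character for this text")


def masm_comment(text):
    """Wrap text in a MASM COMMENT block with a delimiter unused in the text."""
    delim = pick_delimiter(text)
    body = "\n".join("    " + line for line in text.splitlines())
    return f"comment {delim}\n{body}\n{delim}\n"


def rosasm_comment(text):
    """Wrap text between two ';;' lines, as described for RosAsm in the thread."""
    lines = text.splitlines()
    # Assumption: a ';;' at the start of a line would close the block early.
    if any(line.lstrip().startswith(";;") for line in lines):
        raise ValueError("text contains a line starting with ';;'")
    body = "\n".join("    " + line for line in lines)
    return f";;\n{body}\n;;\n"


if __name__ == "__main__":
    note = "Coefficients for sin(x)+cos(x)\nrange: -pi * 1 To pi * 1"
    print(masm_comment(note))
    print(rosasm_comment(note))
```

Choosing the delimiter from the text itself avoids the case where a `*` inside the exported comment (for example in a formula) would end the MASM block too early.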

daydreamer

Quote from: FORTRANS on March 04, 2022, 08:19:22 AM
Hi,

   There is the "COMMENT" directive.

        COMMENT !

   That works as [COMMENT Symbol] and when the symbol
next appears, it ends the comments section.  (With the
following line as code, I think.)

                !
; This is now normal code.
        MOV     EAX,-1


Cheers,

Steve N.
Thanks Steve :thumbsup:
That makes it easier to translate a C-style
/*
multiline comment
*/
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

InfiniteLoop

How are you calculating the "lolremez" (AKA Chebyshev) coefficients? Are you using the formula or solving the simultaneous equations? I got stuck and gave up ages ago.
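
For reference, the "formula" route mentioned in the question can be sketched in a few lines of Python. This is not what lolremez does (it runs the Remez exchange algorithm to get a minimax polynomial) and it is not the method used by anyone in this thread; it is plain Chebyshev interpolation, which usually lands close to the minimax result. The coefficients come from the discrete cosine-sum formula at the Chebyshev nodes, and evaluation uses the Clenshaw recurrence. Function names are made up for this illustration.

```python
import math


def chebyshev_coefficients(f, a, b, n):
    """Return n Chebyshev coefficients of f on [a, b] (cosine-sum formula)."""
    half_width = 0.5 * (b - a)
    midpoint = 0.5 * (b + a)
    # Sample f at the n Chebyshev nodes, mapped from [-1, 1] to [a, b].
    samples = [
        f(midpoint + half_width * math.cos(math.pi * (j + 0.5) / n))
        for j in range(n)
    ]
    return [
        (2.0 / n)
        * sum(samples[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n)
    ]


def chebyshev_eval(coeffs, a, b, x):
    """Evaluate c0/2 + sum(c_k * T_k(t)) with the Clenshaw recurrence."""
    t = (2.0 * x - a - b) / (b - a)  # map x from [a, b] to [-1, 1]
    d1 = d2 = 0.0
    for c in reversed(coeffs[1:]):
        d1, d2 = 2.0 * t * d1 - d2 + c, d1
    return t * d1 - d2 + 0.5 * coeffs[0]


def max_error(f, coeffs, a, b, steps=2000):
    """Largest absolute deviation on an evenly spaced grid over [a, b]."""
    xs = (a + (b - a) * i / steps for i in range(steps + 1))
    return max(abs(f(x) - chebyshev_eval(coeffs, a, b, x)) for x in xs)


if __name__ == "__main__":
    f = lambda x: math.sin(x) + math.cos(x)
    for n in (6, 7, 8, 10):
        c = chebyshev_coefficients(f, -math.pi, math.pi, n)
        print(n, "coefficients, max error", max_error(f, c, -math.pi, math.pi))
```

To use the result in assembly, the Chebyshev coefficients would still have to be converted to ordinary power-series coefficients (or evaluated with the Clenshaw loop directly); that step is left out here.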

jj2007