### Topic: asin() in SmplMath

##### asin() in SmplMath
« on: February 22, 2017, 11:07:37 AM »
Hi qWord!!

There is an issue in asin():
```
fslv_fnc_asin macro
    fld st
    fmulp st,st
    fld1
    fsubr fpu_const_r4_one
    fsqrt
    fpatan
endm
```
It should be:
```
fslv_fnc_asin macro
    fld st
    fmul st,st
    fsubr fpu_const_r4_one
    fsqrt
    fpatan
endm
```
Regards. HSE

##### Re: asin() in SmplMath
« Reply #1 on: February 23, 2017, 04:36:35 AM »
I believe qWord's procedure is basically correct. The intent is to compute the equivalent cosine according to the relation
sin² + cos² = 1

which translates into
cos = sqrt(1 - sin²)

And then you get the angle from the arctan of the sin/cos ratio.
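
Spelled out for the macro's input x, the relation above gives the following. This is only a sketch of the reasoning; the symbol θ for the resulting angle is introduced here for illustration and does not appear in the macro.

```latex
\begin{aligned}
x &= \sin\theta, \qquad -\tfrac{\pi}{2} \le \theta \le \tfrac{\pi}{2} \\
\cos\theta &= \sqrt{1 - x^2} \quad \text{(non-negative on this range)} \\
\arcsin x &= \theta
  = \arctan\!\left(\frac{\sin\theta}{\cos\theta}\right)
  = \arctan\!\left(\frac{x}{\sqrt{1 - x^2}}\right), \qquad |x| < 1
\end{aligned}
```

At |x| = 1 the denominator is zero, but fpatan takes the numerator and denominator as two separate operands, so it should still return ±π/2 there rather than dividing by zero.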

My only concern is the use of "fpu_const_r4_one": I don't know what it is, and it probably should not even be there.
##### Re: asin() in SmplMath
« Reply #2 on: February 23, 2017, 04:59:12 AM »
Hi Raymond!

qWord's entire macro system is in SmplMath.

What currently sits in place of the arcsin function is certainly computing something else. I have not studied the code in depth; instead I tested the results against other programs.

LATER

Perhaps the idea was:
```
fslv_fnc_asin macro
    fld st
    fmul st,st    ; without p
    fld1
    fsubr         ; without fpu_const_r4_one
    fsqrt
    fpatan
endm
```
##### Re: asin() in SmplMath
« Reply #3 on: February 23, 2017, 07:00:09 AM »
> Perhaps the idea was:

It could very well be that he forgot to comment it out after some other modification. Good point. qWord should be able to confirm that.
##### Re: asin() in SmplMath
« Reply #4 on: February 23, 2017, 07:25:22 AM »
Yes, I confirm this is a bug caused by a modification I made in version 2.0.

The FLD1 should be omitted and FMULP becomes FMUL. The subtraction 1-x² is then done using the constant fpu_const_r4_one (value = 1.0) as the argument for FSUBR.
The idea was to keep the FPU stack usage as small as possible: the version with FLD1 needs one more free FPU register.

I will fix that when time permits. Until then you can correct the macro yourself as described (using the FLD1 version would be wrong because of the additional FPU register usage).
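
To make the register usage visible, here is the corrected macro again with the FPU stack contents written next to each instruction. It is an annotated sketch, not a copy of the SmplMath source, and it assumes that x is in ST(0) on entry and that fpu_const_r4_one is a REAL4 memory constant holding 1.0.

```
; Annotated sketch of the corrected macro (assumption: ST(0) = x on entry)
fslv_fnc_asin macro
    fld   st                    ; ST(0) = x            ST(1) = x
    fmul  st,st                 ; ST(0) = x*x          ST(1) = x
    fsubr fpu_const_r4_one      ; ST(0) = 1.0 - x*x    ST(1) = x
    fsqrt                       ; ST(0) = sqrt(1-x*x)  ST(1) = x
    fpatan                      ; pops one register: ST(0) = atan(x / sqrt(1-x*x)) = asin(x)
endm
```

The macro never holds more than one register beyond the argument itself. A variant that loads the 1.0 with FLD1 needs a second extra register for a moment. As far as I can tell, the broken version goes wrong because FMULP ST,ST pops the square it has just computed, so the following FLD1 and FSUBR produce 1.0 - 1.0 = 0 and FPATAN returns ±π/2 for any nonzero x.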

regards
qWord
##### Re: asin() in SmplMath
« Reply #5 on: February 26, 2017, 10:24:20 AM »
I was thinking that perhaps the version with fld1 would be faster, but most of the time the opposite is true.
```
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                     Build this console app with
                  "MAKEIT.BAT" on the PROJECT menu.
        ----------------------------------------------------- *

    .data?
      value dd ?

    .data
      veces dd 10000000
      item dd 0
      x1 dq 0.1596
      fp64 dq 0.0
      fpu_const_r4_one dq 1.0

    .code

start:

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    cls
    finit
    mov ecx , 5
mayor:
    push ecx
    mov item, rv(GetTickCount)
    mov ecx, veces
@empieza:
    fld x1
    fld st
    fmul st,st
    fld1
    fsubr ;         fpu_const_r4_one
    fsqrt
    fpatan
    fstp fp64
    loop @empieza
    sub rv(GetTickCount), item
    printf("%d\t is a value\n", eax);

    mov item, rv(GetTickCount)
    mov ecx, veces
@empieza2:
    fld x1
    fld st
    fmul st,st
    fsubr fpu_const_r4_one
    fsqrt
    fpatan
    fstp fp64
    loop @empieza2
    sub rv(GetTickCount), item
    printf("%d\t is a value\n", eax);
    printf("\t \n");
    pop ecx
    dec ecx
    .if ecx > 0
        jmp mayor
    .endif

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
```
```
1029     is a value
1030     is a value

1045     is a value
1030     is a value

998      is a value
983      is a value

998      is a value
983      is a value

983      is a value
998      is a value

Press any key to continue ....
```

##### Re: asin() in SmplMath
« Reply #6 on: February 26, 2017, 09:28:58 PM »
> I was thinking that perhaps with fld1 is fast.

fld1 is fast (approx. 1 cycle), especially when followed by ultra-slow fsqrt or fpatan:
```
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

143     cycles for 100 * fld1
62      cycles for 100 * fld one real
62      cycles for 100 * fld one int
63      cycles for 100 * fldpi
1818    cycles for 100 * fsqrt
28991   cycles for 100 * fpatan

144     cycles for 100 * fld1
172     cycles for 100 * fld one real
172     cycles for 100 * fld one int
143     cycles for 100 * fldpi
1817    cycles for 100 * fsqrt
29007   cycles for 100 * fpatan

143     cycles for 100 * fld1
62      cycles for 100 * fld one real
62      cycles for 100 * fld one int
63      cycles for 100 * fldpi
1820    cycles for 100 * fsqrt
28994   cycles for 100 * fpatan
```
The real surprise here is that fld1 is a little bit slower than fldpi (same for FLDL2E etc).

##### Re: asin() in SmplMath
« Reply #7 on: February 27, 2017, 03:27:15 AM »
Perfect, JJ!  :t

```
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

??      cycles for 1000 * fld1
232     cycles for 1000 * fld one real
140     cycles for 1000 * fld one int
??      cycles for 1000 * fldpi
29720   cycles for 1000 * fsqrt
44963   cycles for 1000 * fpatan
212128  cycles for 1000 * WithFld1
212253  cycles for 1000 * WithoutFld1

??      cycles for 1000 * fld1
11      cycles for 1000 * fld one real
141     cycles for 1000 * fld one int
4       cycles for 1000 * fldpi
29691   cycles for 1000 * fsqrt
45041   cycles for 1000 * fpatan
212272  cycles for 1000 * WithFld1
212404  cycles for 1000 * WithoutFld1

0       cycles for 1000 * fld1
38      cycles for 1000 * fld one real
143     cycles for 1000 * fld one int
??      cycles for 1000 * fldpi
29729   cycles for 1000 * fsqrt
44907   cycles for 1000 * fpatan
212263  cycles for 1000 * WithFld1
212198  cycles for 1000 * WithoutFld1

4       bytes for fld1
8       bytes for fld one real
8       bytes for fld one int
4       bytes for fldpi
6       bytes for fsqrt
6       bytes for fpatan
24      bytes for WithFld1
26      bytes for WithoutFld1

--- ok ---
```
With these numbers, keeping one FPU register free costs about 0.03% of the time. Cheap, I think.
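
One way to arrive at that figure, assuming it is the mean difference between WithoutFld1 and WithFld1 over the three runs above, taken relative to the mean WithFld1 time:

```latex
\begin{aligned}
\Delta_1 &= 212253 - 212128 = 125 \\
\Delta_2 &= 212404 - 212272 = 132 \\
\Delta_3 &= 212198 - 212263 = -65 \\
\bar{\Delta} &= \frac{125 + 132 - 65}{3} = 64 \ \text{cycles per 1000 calls} \\
\frac{\bar{\Delta}}{\overline{\text{WithFld1}}} &\approx \frac{64}{212221} \approx 0.0003 = 0.03\,\%
\end{aligned}
```

The third run has the opposite sign, so the difference is within the run-to-run noise of the measurement.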

Thanks. HSE

##### Re: asin() in SmplMath
« Reply #8 on: February 27, 2017, 04:13:07 AM »
Hi,

Some results from the oldie, moldy CPU collection. Somewhat weird results?

```
P-III
pre-P4 (SSE1)

2 cycles for 100 * fld1
103 cycles for 100 * fld one real
193 cycles for 100 * fld one int
42 cycles for 100 * fldpi
6861 cycles for 100 * fsqrt
10301 cycles for 100 * fpatan

2 cycles for 100 * fld1
103 cycles for 100 * fld one real
192 cycles for 100 * fld one int
42 cycles for 100 * fldpi
6849 cycles for 100 * fsqrt
10300 cycles for 100 * fpatan

1 cycles for 100 * fld1
103 cycles for 100 * fld one real
192 cycles for 100 * fld one int
41 cycles for 100 * fldpi
6849 cycles for 100 * fsqrt
10297 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan

--- ok ---

P-MMX
pre-P4

303 cycles for 100 * fld1
201 cycles for 100 * fld one real
403 cycles for 100 * fld one int
911 cycles for 100 * fldpi
8012 cycles for 100 * fsqrt
7287 cycles for 100 * fpatan

303 cycles for 100 * fld1
202 cycles for 100 * fld one real
405 cycles for 100 * fld one int
914 cycles for 100 * fldpi
8044 cycles for 100 * fsqrt
7283 cycles for 100 * fpatan

305 cycles for 100 * fld1
200 cycles for 100 * fld one real
402 cycles for 100 * fld one int
915 cycles for 100 * fldpi
7994 cycles for 100 * fsqrt
7285 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan

--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

206 cycles for 100 * fld1
99 cycles for 100 * fld one real
280 cycles for 100 * fld one int
242 cycles for 100 * fldpi
7307 cycles for 100 * fsqrt
14883 cycles for 100 * fpatan

204 cycles for 100 * fld1
101 cycles for 100 * fld one real
271 cycles for 100 * fld one int
237 cycles for 100 * fldpi
6864 cycles for 100 * fsqrt
14878 cycles for 100 * fpatan

205 cycles for 100 * fld1
98 cycles for 100 * fld one real
274 cycles for 100 * fld one int
247 cycles for 100 * fldpi
6860 cycles for 100 * fsqrt
14882 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan

--- ok ---
```
HTH,

Steve N.

##### Re: asin() in SmplMath
« Reply #9 on: February 27, 2017, 05:01:42 AM »

Yeah, the results look a bit weird. The cycle counts are the difference between a full loop and an empty loop. That doesn't work exactly the same way on all processors. On the positive side, the timings are usually quite stable.
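
For readers who have not seen the "full loop minus empty loop" method, here is a minimal sketch of the idea. It is not the test bed that produced the numbers above. It assumes a 32-bit MASM build with `.586` or later enabled so that rdtsc and cpuid assemble, and the name REPS and the choice of esi/edi as scratch registers are made up for the illustration.

```
; Hypothetical sketch of "full loop minus empty loop" timing.
; cpuid clobbers eax, ebx, ecx and edx, so preserve ebx if the caller needs it.
REPS equ 100

    ; --- empty loop: measures the loop overhead only ---
    xor   eax, eax
    cpuid                   ; serialize before reading the counter
    rdtsc                   ; edx:eax = time-stamp counter
    mov   esi, eax          ; keep the low dword of the start count
    mov   ecx, REPS
@@:
    dec   ecx
    jnz   @B
    xor   eax, eax
    cpuid
    rdtsc
    sub   eax, esi
    mov   edi, eax          ; edi = cycles for the empty loop

    ; --- full loop: the same loop plus the code under test ---
    xor   eax, eax
    cpuid
    rdtsc
    mov   esi, eax
    mov   ecx, REPS
@@:
    fld1                    ; instruction under test
    fstp  st(0)             ; pop again so the FPU stack cannot overflow
    dec   ecx
    jnz   @B
    xor   eax, eax
    cpuid
    rdtsc
    sub   eax, esi
    sub   eax, edi          ; eax = full loop minus empty loop
```

Note that this sketch times fld1 together with the fstp that balances the stack. On processors that overlap a cheap instruction with the loop overhead, the two loops can take almost the same time, so the difference may come out close to zero or even negative. That is one plausible reason for results that look weird on some CPUs.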