### asin() in SmplMath

#### HSE

• Member
• Posts: 1104
• <AMD>< 7-32>
##### asin() in SmplMath
« on: February 22, 2017, 11:07:37 AM »
Hi qWord!!

There is an issue in asin():
Code:
```
fslv_fnc_asin macro
    fld st
    fmulp st,st
    fld1
    fsubr fpu_const_r4_one
    fsqrt
    fpatan
endm
```
It should be:
Code:
```
fslv_fnc_asin macro
    fld st
    fmul st,st
    fsubr fpu_const_r4_one
    fsqrt
    fpatan
endm
```
Regards. HSE

#### raymond

• Member
• Posts: 234
##### Re: asin() in SmplMath
« Reply #1 on: February 23, 2017, 04:36:35 AM »
I believe qWord's procedure is basically correct. The intent is to compute the equivalent cosine according to the relation
$\sin^2 + \cos^2 = 1$

which translates into
$\cos = \sqrt{1 - \sin^2}$

And then you get the angle from the arctan of the sin/cos ratio.
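
Spelled out for an input $x = \sin\theta$ with $|x| \le 1$ and $\theta \in [-\pi/2, \pi/2]$ (so the cosine is non-negative), the relation gives the identity the macro appears to rely on. The symbols $\theta$ and $x$ are introduced here only for illustration; FPATAN computes the arctangent of ST(1)/ST(0) with quadrant handling, which is written as atan2 below.

$$
\cos\theta = \sqrt{1 - x^2}, \qquad
\arcsin x = \theta = \operatorname{atan2}\!\left(x,\ \sqrt{1 - x^2}\right), \qquad |x| \le 1 .
$$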

My only concern is the use of "fpu_const_r4_one": I don't know what it is, and it probably should not even be there.
Whenever you assume something, you risk being wrong half the time.
http://www.ray.masmcode.com/

#### HSE

• Member
• Posts: 1104
• <AMD>< 7-32>
##### Re: asin() in SmplMath
« Reply #2 on: February 23, 2017, 04:59:12 AM »
Hi Raymond!

qWord's entire macro system is in SmplMath.

For sure, something other than arcsin is being computed in that function's place. I have not studied the code much; instead, I tested the results against other programs.

LATER

Perhaps the idea was:
Code:
```
fslv_fnc_asin macro
    fld st
    fmul st,st      ; without p
    fld1
    fsubr           ; without fpu_const_r4_one
    fsqrt
    fpatan
endm
```
« Last Edit: February 23, 2017, 06:04:24 AM by HSE »

#### raymond

• Member
• Posts: 234
##### Re: asin() in SmplMath
« Reply #3 on: February 23, 2017, 07:00:09 AM »
Quote
Perhaps the idea was:

It could very well be that he forgot to comment it out after some other modification. Good point. qWord should be able to confirm that.

#### qWord

• Member
• Posts: 1473
• The base type of a type is the type itself
##### Re: asin() in SmplMath
« Reply #4 on: February 23, 2017, 07:25:22 AM »
Yes, I confirm this is a bug caused by a modification I made in version 2.0.

The FLD1 should be omitted and FMULP becomes FMUL. The subtraction 1 - x² is then done using the constant fpu_const_r4_one (value = 1.0) as the argument for FSUBR.
The idea was to keep the FPU stack usage as small as possible: the version with FLD1 needs one more free FPU register.

I will fix that when time permits. Until then, you can correct the macro yourself as described (using the FLD1 version would be wrong because of the additional FPU register it uses).
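
To make the three instruction sequences from this thread easier to follow, here is a minimal Python sketch that models the x87 register stack as a list. It is only an illustration, not part of SmplMath. The class `X87` and the function names are hypothetical, and it assumes that `fmulp st,st` assembles to FMULP ST(0),ST(0), so the product is popped away, and that `fsubr` without operands behaves as FSUBRP ST(1),ST.

```python
import math


class X87:
    """Toy model of the x87 register stack; st[0] plays the role of ST(0)."""

    def __init__(self, x):
        self.st = [x]   # the macro expects its argument in ST(0)
        self.peak = 1   # deepest stack usage seen so far

    def _push(self, value):
        self.st.insert(0, value)
        self.peak = max(self.peak, len(self.st))

    def fld_st(self):            # fld st: duplicate ST(0)
        self._push(self.st[0])

    def fld1(self):              # fld1: push +1.0
        self._push(1.0)

    def fmul_st_st(self):        # fmul st,st: ST(0) = ST(0) * ST(0)
        self.st[0] *= self.st[0]

    def fmulp_st_st(self):       # fmulp st,st: multiply, then pop (product is lost)
        self.st[0] *= self.st[0]
        self.st.pop(0)

    def fsubr_mem(self, m):      # fsubr mem: ST(0) = mem - ST(0)
        self.st[0] = m - self.st[0]

    def fsubrp(self):            # fsubr (no operands): ST(1) = ST(0) - ST(1), pop
        top = self.st.pop(0)
        self.st[0] = top - self.st[0]

    def fsqrt(self):             # fsqrt: ST(0) = sqrt(ST(0))
        self.st[0] = math.sqrt(self.st[0])

    def fpatan(self):            # fpatan: ST(1) = atan2(ST(1), ST(0)), pop
        top = self.st.pop(0)
        self.st[0] = math.atan2(self.st[0], top)


def asin_fixed(x):
    """Corrected macro: FMUL plus FSUBR with the memory constant 1.0."""
    f = X87(x)
    f.fld_st(); f.fmul_st_st(); f.fsubr_mem(1.0); f.fsqrt(); f.fpatan()
    return f.st[0], f.peak


def asin_fld1(x):
    """FLD1 variant: same result, but one more register in use at the peak."""
    f = X87(x)
    f.fld_st(); f.fmul_st_st(); f.fld1(); f.fsubrp(); f.fsqrt(); f.fpatan()
    return f.st[0], f.peak


def asin_v2_bug(x):
    """Sequence reported as buggy: FMULP, FLD1, then FSUBR with the constant."""
    f = X87(x)
    f.fld_st(); f.fmulp_st_st(); f.fld1(); f.fsubr_mem(1.0); f.fsqrt(); f.fpatan()
    return f.st[0], f.peak


if __name__ == "__main__":
    x = 0.1596  # same test value as the benchmark later in the thread
    for fn in (asin_fixed, asin_fld1, asin_v2_bug):
        print(fn.__name__, fn(x), "reference:", math.asin(x))
```

Under these assumptions, the corrected and FLD1 sequences agree with `math.asin`, the FLD1 sequence reaches a stack depth of 3 instead of 2, and the buggy sequence ends up evaluating atan2(x, 0), which would explain results that do not match other programs.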

regards
qWord
MREAL macros - when you need floating point arithmetic while assembling!

#### HSE

• Member
• Posts: 1104
• <AMD>< 7-32>
##### Re: asin() in SmplMath
« Reply #5 on: February 26, 2017, 10:24:20 AM »
I was thinking that perhaps the version with fld1 is faster, but most of the time the opposite is true.
Code:
```
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                     Build this console app with
                  "MAKEIT.BAT" on the PROJECT menu.
        ----------------------------------------------------- *

    .data?
      value dd ?

    .data
      veces dd 10000000
      item dd 0
      x1 dq 0.1596
      fp64 dq 0.0
      fpu_const_r4_one dq 1.0

    .code

start:

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    cls
    finit
    mov ecx, 5                      ; repeat the whole comparison 5 times
mayor:
    push ecx

    ; first timing: variant with fld1 / fsubr
    mov item, rv(GetTickCount)
    mov ecx, veces
@empieza:
    fld x1
      fld st
      fmul st,st
      fld1
      fsubr ;         fpu_const_r4_one
      fsqrt
      fpatan
    fstp fp64
    loop @empieza
    sub rv(GetTickCount), item      ; elapsed milliseconds in eax
    printf("%d\t is a value\n", eax);

    ; second timing: variant with fsubr fpu_const_r4_one
    mov item, rv(GetTickCount)
    mov ecx, veces
@empieza2:
    fld x1
      fld st
      fmul st,st
      fsubr fpu_const_r4_one
      fsqrt
      fpatan
    fstp fp64
    loop @empieza2
    sub rv(GetTickCount), item      ; elapsed milliseconds in eax
    printf("%d\t is a value\n", eax);
    printf("\t \n");

    pop ecx
    dec ecx
    .if ecx > 0
        jmp mayor
    .endif
    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
```
Code:
```
1029     is a value
1030     is a value

1045     is a value
1030     is a value

998      is a value
983      is a value

998      is a value
983      is a value

983      is a value
998      is a value

Press any key to continue ....
```

#### jj2007

• Member
• Posts: 9684
• Assembler is fun ;-)
##### Re: asin() in SmplMath
« Reply #6 on: February 26, 2017, 09:28:58 PM »
Quote
I was thinking that perhaps the version with fld1 is faster.

fld1 is fast (approx. 1 cycle), especially when followed by ultra-slow fsqrt or fpatan:
Code:
```
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

143     cycles for 100 * fld1
62      cycles for 100 * fld one real
62      cycles for 100 * fld one int
63      cycles for 100 * fldpi
1818    cycles for 100 * fsqrt
28991   cycles for 100 * fpatan

144     cycles for 100 * fld1
172     cycles for 100 * fld one real
172     cycles for 100 * fld one int
143     cycles for 100 * fldpi
1817    cycles for 100 * fsqrt
29007   cycles for 100 * fpatan

143     cycles for 100 * fld1
62      cycles for 100 * fld one real
62      cycles for 100 * fld one int
63      cycles for 100 * fldpi
1820    cycles for 100 * fsqrt
28994   cycles for 100 * fpatan
```
The real surprise here is that fld1 is a little bit slower than fldpi (same for FLDL2E etc).

#### HSE

• Member
• Posts: 1104
• <AMD>< 7-32>
##### Re: asin() in SmplMath
« Reply #7 on: February 27, 2017, 03:27:15 AM »
Perfect, JJ!  :t

Code:
```
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

??      cycles for 1000 * fld1
232     cycles for 1000 * fld one real
140     cycles for 1000 * fld one int
??      cycles for 1000 * fldpi
29720   cycles for 1000 * fsqrt
44963   cycles for 1000 * fpatan
212128  cycles for 1000 * WithFld1
212253  cycles for 1000 * WithoutFld1

??      cycles for 1000 * fld1
11      cycles for 1000 * fld one real
141     cycles for 1000 * fld one int
4       cycles for 1000 * fldpi
29691   cycles for 1000 * fsqrt
45041   cycles for 1000 * fpatan
212272  cycles for 1000 * WithFld1
212404  cycles for 1000 * WithoutFld1

0       cycles for 1000 * fld1
38      cycles for 1000 * fld one real
143     cycles for 1000 * fld one int
??      cycles for 1000 * fldpi
29729   cycles for 1000 * fsqrt
44907   cycles for 1000 * fpatan
212263  cycles for 1000 * WithFld1
212198  cycles for 1000 * WithoutFld1

4       bytes for fld1
8       bytes for fld one real
8       bytes for fld one int
4       bytes for fldpi
6       bytes for fsqrt
6       bytes for fpatan
24      bytes for WithFld1
26      bytes for WithoutFld1

--- ok ---
```
With these numbers, keeping one FPU register free costs about 0.03% of the time. Cheap, I think.
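
One way to arrive at a figure of about 0.03% from the WithFld1 and WithoutFld1 rows is to average the per-run differences and divide by the mean WithFld1 time. Whether the number was actually computed this way is an assumption; the variable names below are only illustrative, and the cycle counts are copied from the three runs in the listing.

```python
# Cycle counts for 1000 iterations, copied from the three runs in the listing.
with_fld1 = [212128, 212272, 212263]
without_fld1 = [212253, 212404, 212198]

# Extra cycles spent by the variant that keeps one FPU register free.
diffs = [wo - w for w, wo in zip(with_fld1, without_fld1)]

mean_diff = sum(diffs) / len(diffs)
mean_with = sum(with_fld1) / len(with_fld1)
relative_cost_percent = mean_diff / mean_with * 100

print(diffs)                           # per-run differences in cycles
print(f"{relative_cost_percent:.3f}%")  # average relative cost
```

Note that the third run has the opposite sign, so the difference is within the run-to-run noise of this measurement.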

Thanks. HSE

#### FORTRANS

• Member
• Posts: 1056
##### Re: asin() in SmplMath
« Reply #8 on: February 27, 2017, 04:13:07 AM »
Hi,

Some results from the oldie, moldy CPU collection.  Somewhat
weird results?

Code:
```
P-III
pre-P4 (SSE1)

2 cycles for 100 * fld1
103 cycles for 100 * fld one real
193 cycles for 100 * fld one int
42 cycles for 100 * fldpi
6861 cycles for 100 * fsqrt
10301 cycles for 100 * fpatan

2 cycles for 100 * fld1
103 cycles for 100 * fld one real
192 cycles for 100 * fld one int
42 cycles for 100 * fldpi
6849 cycles for 100 * fsqrt
10300 cycles for 100 * fpatan

1 cycles for 100 * fld1
103 cycles for 100 * fld one real
192 cycles for 100 * fld one int
41 cycles for 100 * fldpi
6849 cycles for 100 * fsqrt
10297 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan

--- ok ---

P-MMX
pre-P4

303 cycles for 100 * fld1
201 cycles for 100 * fld one real
403 cycles for 100 * fld one int
911 cycles for 100 * fldpi
8012 cycles for 100 * fsqrt
7287 cycles for 100 * fpatan

303 cycles for 100 * fld1
202 cycles for 100 * fld one real
405 cycles for 100 * fld one int
914 cycles for 100 * fldpi
8044 cycles for 100 * fsqrt
7283 cycles for 100 * fpatan

305 cycles for 100 * fld1
200 cycles for 100 * fld one real
402 cycles for 100 * fld one int
915 cycles for 100 * fldpi
7994 cycles for 100 * fsqrt
7285 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan

--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

206 cycles for 100 * fld1
99 cycles for 100 * fld one real
280 cycles for 100 * fld one int
242 cycles for 100 * fldpi
7307 cycles for 100 * fsqrt
14883 cycles for 100 * fpatan

204 cycles for 100 * fld1
101 cycles for 100 * fld one real
271 cycles for 100 * fld one int
237 cycles for 100 * fldpi
6864 cycles for 100 * fsqrt
14878 cycles for 100 * fpatan

205 cycles for 100 * fld1
98 cycles for 100 * fld one real
274 cycles for 100 * fld one int
247 cycles for 100 * fldpi
6860 cycles for 100 * fsqrt
14882 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan

--- ok ---
```
HTH,

Steve N.

#### jj2007

• Member
• Posts: 9684
• Assembler is fun ;-)
##### Re: asin() in SmplMath
« Reply #9 on: February 27, 2017, 05:01:42 AM »

Yeah, the results look a bit weird. The cycle counts are the difference between a full loop and an empty loop. That doesn't work exactly the same way on all processors. On the positive side, the timings are usually quite stable.
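
The "full loop minus empty loop" idea can be sketched in Python as follows. This is only an analogy to the cycle-counting harness used above, not its actual code: the function names are hypothetical, it measures wall-clock nanoseconds with `time.perf_counter_ns` rather than CPU cycles, and taking the fastest of several repeats is a design choice made here to reduce noise.

```python
import time


def best_loop_ns(fn, iterations, repeats):
    """Time `iterations` calls of fn several times; return the fastest run in ns."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            fn()
        elapsed = time.perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def net_ns(body, iterations=100_000, repeats=5):
    """Full loop minus empty loop: an estimate of the cost of `body` alone."""
    def empty():
        pass

    full = best_loop_ns(body, iterations, repeats)
    overhead = best_loop_ns(empty, iterations, repeats)
    # For very cheap bodies the two timings are nearly equal, so the
    # difference can come out close to zero or even negative.
    return full - overhead


if __name__ == "__main__":
    print(net_ns(lambda: sum(range(50)), iterations=10_000))
```

When the measured body costs about as much as the loop overhead itself, the subtraction leaves mostly noise, which is consistent with the very small values reported for the cheapest instructions.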