The MASM Forum

Miscellaneous => Miscellaneous Projects => Topic started by: HSE on February 22, 2017, 11:07:37 AM

Title: asin() in SmplMath
Post by: HSE on February 22, 2017, 11:07:37 AM
Hi qWord!!

There is an issue in asin():
fslv_fnc_asin macro
  fld st
  fmulp st,st
  fld1
  fsubr fpu_const_r4_one
  fsqrt
  fpatan
endm


And should be:
fslv_fnc_asin macro
  fld st
  fmul st,st
  fsubr fpu_const_r4_one
  fsqrt
  fpatan
endm


Regards. HSE
Title: Re: asin() in SmplMath
Post by: raymond on February 23, 2017, 04:36:35 AM
I believe qWord's procedure is basically correct. The intent is to compute the equivalent cosine according to the relation
$$\sin^2\theta + \cos^2\theta = 1$$

which translates into
$$\cos\theta = \sqrt{1 - \sin^2\theta}$$

And then you get the angle from the arctan of the sin/cos ratio.
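Here is the same reasoning written out for an argument $x = \sin\theta$ with $-\pi/2 \le \theta \le \pi/2$. That is the range of arcsin, where $\cos\theta \ge 0$, so the positive root is the right one. FPATAN computes $\arctan(\mathrm{ST}(1)/\mathrm{ST}(0))$, so the stack has to hold the two operands shown on the right:

$$\arcsin x = \theta = \arctan\frac{\sin\theta}{\cos\theta} = \arctan\frac{x}{\sqrt{1 - x^2}}, \qquad \mathrm{ST}(1) = x,\quad \mathrm{ST}(0) = \sqrt{1 - x^2}$$

At $x = \pm 1$ the ratio is undefined. FPATAN takes numerator and denominator separately, though, and returns $\pm\pi/2$ when the denominator is zero.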

The only concern I have is the use of "fpu_const_r4_one": I don't know what it is, and it probably should not even be there.
Title: Re: asin() in SmplMath
Post by: HSE on February 23, 2017, 04:59:12 AM
Hi Raymond!

qWord's entire macro system is in SmplMath (https://sourceforge.net/projects/smplmath/).

Something else is definitely being computed in place of the arcsine. I have not studied the code very much; instead, I tested the results against other programs :biggrin:

LATER

Perhaps the idea was:
fslv_fnc_asin macro
  fld st
  fmul st,st ; without p
  fld1
  fsubr    ; without fpu_const_r4_one
  fsqrt
  fpatan
endm
Title: Re: asin() in SmplMath
Post by: raymond on February 23, 2017, 07:00:09 AM
Quote
Perhaps the idea was:

It could very well be that he forgot to comment it out after some other modification. Good point. qWord should be able to confirm that.
Title: Re: asin() in SmplMath
Post by: qWord on February 23, 2017, 07:25:22 AM
Yes, I confirm this is a bug caused by a modification I made in version 2.0.

The FLD1 should be omitted, and FMULP becomes FMUL. The subtraction $1 - x^2$ is then done using the constant fpu_const_r4_one (value = 1.0) as the argument of FSUBR.
The idea was to keep the FPU stack usage as small as possible; the version with FLD1 needs one more free FPU register.

I will fix that if time permits. Until then, you can correct the macro yourself as described (using the FLD1 version would be wrong because of the additional FPU register usage).

regards
qWord
Title: Re: asin() in SmplMath
Post by: HSE on February 26, 2017, 10:24:20 AM
I was thinking that perhaps with fld1 is fast. But most of the time the opposite is true.; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                     Build this console app with
                  "MAKEIT.BAT" on the PROJECT menu.
        ----------------------------------------------------- *

    .data?
value dd ?

    .data
veces dd 10000000
item dd 0

x1 dq 0.1596
fp64 dq 0.0

fpu_const_r4_one dq 1.0
   
    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc
   cls

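    finit
    mov ecx, 5                  ; 5 rounds, each timing both variants
mayor:
    push ecx                    ; save round counter (LOOP below uses ECX)

    ; variant 1: FLD1 version, first printed number
    mov item, rv(GetTickCount)
    mov ecx, veces
@empieza:
    fld x1
    fld st
    fmul st,st
    fld1
    fsubr                       ; fsubrp st(1),st: ST(1) = 1 - x*x, pop
    fsqrt
    fpatan
    fstp fp64
    loop @empieza

    sub rv(GetTickCount), item  ; elapsed milliseconds in EAX
    printf("%d\t is a value\n", eax);

    ; variant 2: memory-operand version, second printed number
    mov item, rv(GetTickCount)
    mov ecx, veces
@empieza2:
    fld x1
    fld st
    fmul st,st
    fsubr fpu_const_r4_one      ; ST(0) = 1.0 - x*x
    fsqrt
    fpatan
    fstp fp64
    loop @empieza2

    sub rv(GetTickCount), item
    printf("%d\t is a value\n", eax);
    printf("\t \n");
    pop ecx
    dec ecx
    .if ecx > 0
        jmp mayor
    .endif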
    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start


1029     is a value
1030     is a value

1045     is a value
1030     is a value

998      is a value
983      is a value

998      is a value
983      is a value

983      is a value
998      is a value

Press any key to continue ....

Title: Re: asin() in SmplMath
Post by: jj2007 on February 26, 2017, 09:28:58 PM
Quote from: HSE on February 26, 2017, 10:24:20 AM
I was thinking that perhaps with fld1 is fast.

fld1 is fast (approx. 1 cycle), especially when followed by ultra-slow fsqrt or fpatan:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

143     cycles for 100 * fld1
62      cycles for 100 * fld one real
62      cycles for 100 * fld one int
63      cycles for 100 * fldpi
1818    cycles for 100 * fsqrt
28991   cycles for 100 * fpatan

144     cycles for 100 * fld1
172     cycles for 100 * fld one real
172     cycles for 100 * fld one int
143     cycles for 100 * fldpi
1817    cycles for 100 * fsqrt
29007   cycles for 100 * fpatan

143     cycles for 100 * fld1
62      cycles for 100 * fld one real
62      cycles for 100 * fld one int
63      cycles for 100 * fldpi
1820    cycles for 100 * fsqrt
28994   cycles for 100 * fpatan


The real surprise here is that fld1 is a little bit slower than fldpi (the same holds for FLDL2E etc.).
Title: Re: asin() in SmplMath
Post by: HSE on February 27, 2017, 03:27:15 AM
Perfect, JJ! :t

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

??      cycles for 1000 * fld1
232     cycles for 1000 * fld one real
140     cycles for 1000 * fld one int
??      cycles for 1000 * fldpi
29720   cycles for 1000 * fsqrt
44963   cycles for 1000 * fpatan
212128  cycles for 1000 * WithFld1
212253  cycles for 1000 * WithoutFld1

??      cycles for 1000 * fld1
11      cycles for 1000 * fld one real
141     cycles for 1000 * fld one int
4       cycles for 1000 * fldpi
29691   cycles for 1000 * fsqrt
45041   cycles for 1000 * fpatan
212272  cycles for 1000 * WithFld1
212404  cycles for 1000 * WithoutFld1

0       cycles for 1000 * fld1
38      cycles for 1000 * fld one real
143     cycles for 1000 * fld one int
??      cycles for 1000 * fldpi
29729   cycles for 1000 * fsqrt
44907   cycles for 1000 * fpatan
212263  cycles for 1000 * WithFld1
212198  cycles for 1000 * WithoutFld1

4       bytes for fld1
8       bytes for fld one real
8       bytes for fld one int
4       bytes for fldpi
6       bytes for fsqrt
6       bytes for fpatan
24      bytes for WithFld1
26      bytes for WithoutFld1

--- ok ---


With these numbers, keeping one FPU register free costs about 0.03% of the time. Cheap, I think :biggrin:.

Thanks. HSE
Title: Re: asin() in SmplMath
Post by: FORTRANS on February 27, 2017, 04:13:07 AM
Hi,

   Some results from the oldie, moldy CPU collection. Somewhat weird results?

P-III
pre-P4 (SSE1)

2 cycles for 100 * fld1
103 cycles for 100 * fld one real
193 cycles for 100 * fld one int
42 cycles for 100 * fldpi
6861 cycles for 100 * fsqrt
10301 cycles for 100 * fpatan

2 cycles for 100 * fld1
103 cycles for 100 * fld one real
192 cycles for 100 * fld one int
42 cycles for 100 * fldpi
6849 cycles for 100 * fsqrt
10300 cycles for 100 * fpatan

1 cycles for 100 * fld1
103 cycles for 100 * fld one real
192 cycles for 100 * fld one int
41 cycles for 100 * fldpi
6849 cycles for 100 * fsqrt
10297 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan


--- ok ---

P-MMX
pre-P4
303 cycles for 100 * fld1
201 cycles for 100 * fld one real
403 cycles for 100 * fld one int
911 cycles for 100 * fldpi
8012 cycles for 100 * fsqrt
7287 cycles for 100 * fpatan

303 cycles for 100 * fld1
202 cycles for 100 * fld one real
405 cycles for 100 * fld one int
914 cycles for 100 * fldpi
8044 cycles for 100 * fsqrt
7283 cycles for 100 * fpatan

305 cycles for 100 * fld1
200 cycles for 100 * fld one real
402 cycles for 100 * fld one int
915 cycles for 100 * fldpi
7994 cycles for 100 * fsqrt
7285 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

206 cycles for 100 * fld1
99 cycles for 100 * fld one real
280 cycles for 100 * fld one int
242 cycles for 100 * fldpi
7307 cycles for 100 * fsqrt
14883 cycles for 100 * fpatan

204 cycles for 100 * fld1
101 cycles for 100 * fld one real
271 cycles for 100 * fld one int
237 cycles for 100 * fldpi
6864 cycles for 100 * fsqrt
14878 cycles for 100 * fpatan

205 cycles for 100 * fld1
98 cycles for 100 * fld one real
274 cycles for 100 * fld one int
247 cycles for 100 * fldpi
6860 cycles for 100 * fsqrt
14882 cycles for 100 * fpatan

4 bytes for fld1
8 bytes for fld one real
8 bytes for fld one int
4 bytes for fldpi
6 bytes for fsqrt
6 bytes for fpatan


--- ok ---


HTH,

Steve N.
Title: Re: asin() in SmplMath
Post by: jj2007 on February 27, 2017, 05:01:42 AM
 :biggrin:

Yeah, the results look a bit weird. The cycles are taken from the difference between a full loop and the empty loop. That doesn't work exactly the same way on all processors. On the positive side, the timings are usually quite stable :bgrin: