Hi folks,
Just uploaded a new version (http://masm32.com/board/index.php?topic=94.0) featuring faster Sinus() and Cosinus() algos. The absolute error is rather small, here compared to the FPU fsin and fcos functions:
x sin x error
-90 -1.000000000000000000 0.0
-45 -0.7071067811865475244 -9.38e-18
0 0.0 0.0
45 0.7071067811865475244 9.38e-18
90 1.000000000000000000 0.0
135 0.7071067811865475244 -2.82e-17
180 0.0 -5.31e-17
225 -0.7071067811865475244 -4.69e-17
270 -1.000000000000000000 0.0
315 -0.7071067811865475244 6.57e-17
360 0.0 1.06e-16
405 0.7071067811865475244 8.44e-17
450 1.000000000000000000 0.0
x cos x error
-90 0.0 0.0
-45 0.7071067811865475244 -9.38e-18
0 1.000000000000000000 0.0
45 0.7071067811865475244 -9.38e-18
90 0.0 -2.65e-17
135 -0.7071067811865475244 -2.81e-17
180 -1.000000000000000000 0.0
225 -0.7071067811865475244 4.69e-17
270 0.0 7.96e-17
315 0.7071067811865475244 6.56e-17
360 1.000000000000000000 0.0
405 0.7071067811865475244 -8.45e-17
450 0.0 -1.33e-16
On my machine, they are almost five times as fast. Can I have some timings please? Thanks.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
293 ms fpu fsin 71 ms MasmBasic
295 ms fpu fsin 61 ms MasmBasic
294 ms fpu fsin 61 ms MasmBasic
295 ms fpu fsin 66 ms MasmBasic
290 ms fpu fsin 60 ms MasmBasic
290 ms fpu fsin 61 ms MasmBasic
299 ms fpu fsin 61 ms MasmBasic
288 ms fpu fsin 67 ms MasmBasic
Intel(R) Pentium(R) M processor 1.70GHz
459 ms fpu fsin 167 ms MasmBasic
433 ms fpu fsin 167 ms MasmBasic
442 ms fpu fsin 166 ms MasmBasic
451 ms fpu fsin 167 ms MasmBasic
456 ms fpu fsin 167 ms MasmBasic
432 ms fpu fsin 167 ms MasmBasic
443 ms fpu fsin 166 ms MasmBasic
449 ms fpu fsin 167 ms MasmBasic
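For anyone who wants to reproduce this kind of comparison outside MasmBasic, here is a minimal C++ timing sketch. It is not the test program posted in this thread: the helper name `time_calls_ms`, the call count and the argument pattern are all assumptions, so absolute numbers will not match the tables here.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper (not from the thread): time `calls` invocations of f
// and return the elapsed wall-clock time in milliseconds.
double time_calls_ms(double (*f)(double), long calls)
{
    volatile double sink = 0.0;  // volatile so the compiler cannot drop the calls
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < calls; ++i)
        sink = sink + f(i * 1e-6);  // arbitrary, slowly growing argument
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling it once with a wrapper around `std::sin` and once with the candidate routine, using the same `calls` value, gives two figures that can be compared the way the "fpu fsin" and "MasmBasic" columns are compared above.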
C:\Users\luce\Downloads>fastsincos
Intel(R) Core(TM) i3-4150 CPU @ 3.50GHz
236 ms fpu fsin 46 ms MasmBasic
258 ms fpu fsin 51 ms MasmBasic
238 ms fpu fsin 56 ms MasmBasic
239 ms fpu fsin 46 ms MasmBasic
241 ms fpu fsin 46 ms MasmBasic
240 ms fpu fsin 46 ms MasmBasic
241 ms fpu fsin 68 ms MasmBasic
238 ms fpu fsin 46 ms MasmBasic
AMD A10-7850K APU with Radeon(TM) R7 Graphics
252 ms fpu fsin 76 ms MasmBasic
250 ms fpu fsin 73 ms MasmBasic
249 ms fpu fsin 75 ms MasmBasic
250 ms fpu fsin 76 ms MasmBasic
250 ms fpu fsin 76 ms MasmBasic
248 ms fpu fsin 74 ms MasmBasic
249 ms fpu fsin 76 ms MasmBasic
249 ms fpu fsin 76 ms MasmBasic
Thanxalot :icon14:
One more, with CRT sin added:
Intel(R) Celeron(R) CPU N2840 @ 2.16GHz
411 ms fpu fsin 1074 ms CRT sin 175 ms MasmBasic
409 ms fpu fsin 1096 ms CRT sin 180 ms MasmBasic
401 ms fpu fsin 1070 ms CRT sin 174 ms MasmBasic
397 ms fpu fsin 1098 ms CRT sin 181 ms MasmBasic
407 ms fpu fsin 1072 ms CRT sin 173 ms MasmBasic
395 ms fpu fsin 1054 ms CRT sin 172 ms MasmBasic
392 ms fpu fsin 1055 ms CRT sin 172 ms MasmBasic
394 ms fpu fsin 1054 ms CRT sin 172 ms MasmBasic
Precision of Sinus() is marginally better than the CRT algo.
Genuine Intel(R) CPU T2060 @ 1.60GHz
524 ms fpu fsin 179 ms MasmBasic
472 ms fpu fsin 211 ms MasmBasic
463 ms fpu fsin 175 ms MasmBasic
460 ms fpu fsin 175 ms MasmBasic
449 ms fpu fsin 175 ms MasmBasic
459 ms fpu fsin 175 ms MasmBasic
462 ms fpu fsin 175 ms MasmBasic
458 ms fpu fsin 175 ms MasmBasic
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz
438 ms fpu fsin 156 ms MasmBasic
445 ms fpu fsin 149 ms MasmBasic
431 ms fpu fsin 167 ms MasmBasic
441 ms fpu fsin 147 ms MasmBasic
433 ms fpu fsin 147 ms MasmBasic
432 ms fpu fsin 150 ms MasmBasic
434 ms fpu fsin 146 ms MasmBasic
431 ms fpu fsin 149 ms MasmBasic
Same computer as my previous posting, but this time I started up with 'Minimal' services running,
just to see if there is much difference.
Genuine Intel(R) CPU T2060 @ 1.60GHz
451 ms fpu fsin 175 ms MasmBasic
457 ms fpu fsin 175 ms MasmBasic
460 ms fpu fsin 175 ms MasmBasic
455 ms fpu fsin 175 ms MasmBasic
447 ms fpu fsin 175 ms MasmBasic
462 ms fpu fsin 178 ms MasmBasic
460 ms fpu fsin 175 ms MasmBasic
455 ms fpu fsin 175 ms MasmBasic
11 year old Gateway laptop.
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
254 ms fpu fsin 50 ms MasmBasic
245 ms fpu fsin 50 ms MasmBasic
245 ms fpu fsin 50 ms MasmBasic
245 ms fpu fsin 50 ms MasmBasic
244 ms fpu fsin 50 ms MasmBasic
245 ms fpu fsin 50 ms MasmBasic
245 ms fpu fsin 50 ms MasmBasic
245 ms fpu fsin 50 ms MasmBasic
Quote from: zedd151 on September 10, 2015, 08:30:22 AM
Same computer as my previous posting, but this time I started up with 'Minimal' services running,
just to see if there is much difference.
..
11 year old Gateway laptop.
No difference for the lowest values. The algo grabs its share and doesn't care for services :P
@Marinus: This was inspired by your earlier lolengine/remez post (http://masm32.com/board/index.php?topic=4118.msg44033#msg44033), thanks :icon14:
Hi Jochen,
Did you use his c++ example source code to get the Chebyshev coefficients ?
Quote from: Siekmanski on September 10, 2015, 09:30:39 PM
Did you use his c++ example source code to get the Chebyshev coefficients ?
I tried these coefficients for fastsin2:
Quote from: Siekmanski on April 15, 2015, 04:36:41 PM
And this one is our goldmine i think,
http://lolengine.net/wiki/doc/maths/remez
...
There is even a faster one with 4 coeffs,
double fastsin2(double x)
{
    const double a3 = -1.666665709650470145824129400050267289858e-1;
    const double a5 = 8.333017291562218127986291618761571373087e-3;
    const double a7 = -1.980661520135080504411629636078917643846e-4;
    const double a9 = 2.600054767890361277123254766503271638682e-6;
    return x + x*x*x * (a3 + x*x * (a5 + x*x * (a7 + x*x * a9)));
}
Problem is they are reasonably precise only for the 0...90 degrees range. Of course, with known symmetries you can cover the whole range, but that makes it a bit slower.
In the end, I redesigned my SetPoly3 (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1124) for REAL10 precision and voilà, fast and precise :biggrin:
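To illustrate the "known symmetries" remark, here is a rough C++ sketch of folding an arbitrary angle into the range where the quoted polynomial is usable. The wrapper `sin_any_range` is a hypothetical name and not MasmBasic's Sinus(); the simple `x - k*pi` reduction is an assumption that loses accuracy for large arguments.

```cpp
#include <cmath>

// Polynomial from the quoted post; reasonably precise only near [-pi/2, pi/2].
double fastsin2(double x)
{
    const double a3 = -1.666665709650470145824129400050267289858e-1;
    const double a5 =  8.333017291562218127986291618761571373087e-3;
    const double a7 = -1.980661520135080504411629636078917643846e-4;
    const double a9 =  2.600054767890361277123254766503271638682e-6;
    const double x2 = x * x;
    return x + x * x2 * (a3 + x2 * (a5 + x2 * (a7 + x2 * a9)));
}

// Hypothetical wrapper (not from the thread): reduce x to r in [-pi/2, pi/2]
// and use sin(r + k*pi) = (-1)^k * sin(r).
double sin_any_range(double x)
{
    const double pi = 3.14159265358979323846;
    const double k = std::nearbyint(x / pi);    // nearest multiple of pi
    const double r = x - k * pi;                // naive reduction, fine for small |x|
    const bool odd = std::fmod(k, 2.0) != 0.0;  // odd k flips the sign
    return odd ? -fastsin2(r) : fastsin2(r);
}
```

The extra divide, rounding and sign test are the cost mentioned above: they cover the whole circle but add work to every call.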
Quote from: jj2007 on September 11, 2015, 03:50:49 AM
Problem is they are reasonably precise only for the 0...90 degrees range. Of course, with known symmetries you can cover the whole range, but that makes it a bit slower.
Just to note that the range reduction is the complicated part here and critical for the precision.
It might be also interesting to see plots of the relative error compared to the FPU (or any high-precision library with correct rounding)...
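As a small illustration of why relative error tells a different story than absolute error, here is a C++ sketch. The names `SinError` and `sin_error` are made up for this example, and the reference is simply the `long double` overload of `std::sin`, which is only an assumption of "high precision" (on some compilers `long double` is the same as `double`).

```cpp
#include <cmath>

// Hypothetical result type: both error measures for one sample.
struct SinError {
    long double absolute;  // approx - reference
    long double relative;  // absolute / reference, or 0 if the reference is exactly 0
};

// Compare an approximate sine value against std::sin at x_rad (radians).
SinError sin_error(long double approx, long double x_rad)
{
    const long double reference = std::sin(x_rad);
    const long double absolute = approx - reference;
    const long double relative = (reference != 0.0L) ? absolute / reference : 0.0L;
    return {absolute, relative};
}
```

Near a zero of the sine the two measures diverge: an approximation that returns exactly 0.0 for an argument that is only the nearest machine number to pi has a tiny absolute error but a relative error of -1, i.e. 100%.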
Quote from: qWord on September 12, 2015, 02:55:41 AM
It might be also interesting to see plots of the relative error compared to the FPU (or any high-precision library with correct rounding)...
Can't offer plots right now, but here is a table version:
x MasmBasic Sinus(x)
-0.707106781186547524400844362104849039284...(Wolfram Alpha)
error fpu-CRT error fpu-MB
-45 -0.7071067811865475244 -5.33e-17 -9.38e-18
-30 -0.5000000000000000001 -4.78e-17 -7.72e-18
-15 -0.2588190451025207624 -1.86e-17 -4.31e-18
0 0.0 0.0 0.0
15 0.2588190451025207624 1.86e-17 4.31e-18
30 0.5000000000000000001 4.78e-17 7.72e-18
45 0.7071067811865475244 5.33e-17 9.38e-18
60 0.8660254037844386468 4.13e-17 8.89e-18
75 0.9659258262890682867 -3.12e-17 5.64e-18
90 1.000000000000000000 0.0 0.0
105 0.9659258262890682867 -1.75e-17 -8.08e-18
120 0.8660254037844386468 -4.32e-17 -1.77e-17
135 0.7071067811865475244 -2.02e-17 -2.82e-17
150 0.5000000000000000001 9.38e-17 -3.83e-17
165 0.2588190451025207624 -2.08e-16 -4.69e-17
180 0.0 -6.94e-17 -5.31e-17
195 -0.2588190451025207624 8.82e-17 -5.56e-17
210 -0.5000000000000000001 1.65e-16 -5.37e-17
225 -0.7071067811865475244 -1.58e-17 -4.69e-17
240 -0.8660254037844386468 -2.37e-16 -3.54e-17
255 -0.9659258262890682867 4.49e-17 -1.94e-17
270 -1.000000000000000000 0.0 0.0
285 -0.9659258262890682867 1.15e-16 2.18e-17
300 -0.8660254037844386468 -9.44e-17 4.42e-17
315 -0.7071067811865475244 9.37e-17 6.57e-17
330 -0.5000000000000000001 3.60e-16 8.42e-17
345 -0.2588190451025207624 -1.77e-16 9.82e-17
360 0.0 1.39e-16 1.06e-16
375 0.2588190451025207624 4.16e-16 1.07e-16
390 0.5000000000000000001 -9.96e-17 9.96e-17
405 0.7071067811865475244 8.93e-17 8.44e-17
x MasmBasic Cosinus(x)
0.707106781186547524400844362104849039284...(Wolfram Alpha)
error fpu-CRT error fpu-MB
-45 0.7071067811865475244 -3.90e-17 -9.38e-18
-30 0.8660254037844386468 -5.64e-17 -4.39e-18
-15 0.9659258262890682867 -2.43e-17 -1.19e-18
0 1.000000000000000000 0.0 0.0
15 0.9659258262890682867 -2.43e-17 -1.19e-18
30 0.8660254037844386468 -5.64e-17 -4.39e-18
45 0.7071067811865475244 -3.90e-17 -9.38e-18
60 0.5000000000000000001 -9.57e-17 -1.53e-17
75 0.2588190451025207624 4.42e-17 -2.13e-17
90 0.0 -3.47e-17 -2.65e-17
105 -0.2588190451025207624 1.18e-16 -2.99e-17
120 -0.5000000000000000001 -1.91e-16 -3.07e-17
135 -0.7071067811865475244 -3.45e-17 -2.81e-17
150 -0.8660254037844386468 8.29e-17 -2.21e-17
165 -0.9659258262890682867 -7.30e-17 -1.25e-17
180 -1.000000000000000000 0.0 0.0
195 -0.9659258262890682867 1.06e-17 1.50e-17
210 -0.8660254037844386468 -8.12e-17 3.10e-17
225 -0.7071067811865475244 1.12e-16 4.69e-17
240 -0.5000000000000000001 3.83e-16 6.12e-17
255 -0.2588190451025207624 -2.07e-16 7.26e-17
270 0.0 1.04e-16 7.96e-17
285 0.2588190451025207624 3.86e-16 8.12e-17
300 0.5000000000000000001 -1.88e-16 7.67e-17
315 0.7071067811865475244 1.08e-16 6.56e-17
330 0.8660254037844386468 2.24e-16 4.87e-17
345 0.9659258262890682867 -5.18e-17 2.62e-17
360 1.000000000000000000 0.0 0.0
375 0.9659258262890682867 -1.08e-16 -2.87e-17
390 0.8660254037844386468 1.08e-16 -5.75e-17
405 0.7071067811865475244 -7.49e-17 -8.45e-17
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
287 ms fpu fsin 610 ms CRT sin 64 ms MasmBasic
292 ms fpu fsin 604 ms CRT sin 62 ms MasmBasic
294 ms fpu fsin 604 ms CRT sin 62 ms MasmBasic
289 ms fpu fsin 604 ms CRT sin 62 ms MasmBasic
288 ms fpu fsin 601 ms CRT sin 62 ms MasmBasic
289 ms fpu fsin 601 ms CRT sin 63 ms MasmBasic
300 ms fpu fsin 600 ms CRT sin 64 ms MasmBasic
286 ms fpu fsin 612 ms CRT sin 62 ms MasmBasic
From my slow assed laptop
Quote
Genuine Intel(R) CPU T2060 @ 1.60GHz (SSE3)
x MasmBasic Sinus(x)
-0.707106781186547524400844362104849039284...(Wolfram Alpha)
error fpu-CRT error fpu-MB
-45 -0.7071067811865475244 -5.33e-17 -9.38e-18
-30 -0.5000000000000000001 -4.78e-17 -7.72e-18
-15 -0.2588190451025207624 -1.86e-17 -4.31e-18
0 0.0 0.0 0.0
15 0.2588190451025207624 1.86e-17 4.31e-18
30 0.5000000000000000001 4.78e-17 7.72e-18
45 0.7071067811865475244 5.33e-17 9.38e-18
60 0.8660254037844386468 4.13e-17 8.89e-18
75 0.9659258262890682867 -3.12e-17 5.64e-18
90 1.000000000000000000 0.0 0.0
105 0.9659258262890682867 -1.75e-17 -8.08e-18
120 0.8660254037844386468 -4.32e-17 -1.77e-17
135 0.7071067811865475244 -2.02e-17 -2.82e-17
150 0.5000000000000000001 9.38e-17 -3.83e-17
165 0.2588190451025207624 -2.08e-16 -4.69e-17
180 0.0 -6.94e-17 -5.31e-17
195 -0.2588190451025207624 8.82e-17 -5.56e-17
210 -0.5000000000000000001 1.65e-16 -5.37e-17
225 -0.7071067811865475244 -1.58e-17 -4.69e-17
240 -0.8660254037844386468 -2.37e-16 -3.54e-17
255 -0.9659258262890682867 4.49e-17 -1.94e-17
270 -1.000000000000000000 0.0 0.0
285 -0.9659258262890682867 1.15e-16 2.18e-17
300 -0.8660254037844386468 -9.44e-17 4.42e-17
315 -0.7071067811865475244 9.37e-17 6.57e-17
330 -0.5000000000000000001 3.60e-16 8.42e-17
345 -0.2588190451025207624 -1.77e-16 9.82e-17
360 0.0 1.39e-16 1.06e-16
375 0.2588190451025207624 4.16e-16 1.07e-16
390 0.5000000000000000001 -9.96e-17 9.96e-17
405 0.7071067811865475244 8.93e-17 8.44e-17
x MasmBasic Cosinus(x)
0.707106781186547524400844362104849039284...(Wolfram Alpha)
error fpu-CRT error fpu-MB
-45 0.7071067811865475244 -3.90e-17 -9.38e-18
-30 0.8660254037844386468 -5.64e-17 -4.39e-18
-15 0.9659258262890682867 -2.43e-17 -1.19e-18
0 1.000000000000000000 0.0 0.0
15 0.9659258262890682867 -2.43e-17 -1.19e-18
30 0.8660254037844386468 -5.64e-17 -4.39e-18
45 0.7071067811865475244 -3.90e-17 -9.38e-18
60 0.5000000000000000001 -9.57e-17 -1.53e-17
75 0.2588190451025207624 4.42e-17 -2.13e-17
90 0.0 -3.47e-17 -2.65e-17
105 -0.2588190451025207624 1.18e-16 -2.99e-17
120 -0.5000000000000000001 -1.91e-16 -3.07e-17
135 -0.7071067811865475244 -3.45e-17 -2.81e-17
150 -0.8660254037844386468 8.29e-17 -2.21e-17
165 -0.9659258262890682867 -7.30e-17 -1.25e-17
180 -1.000000000000000000 0.0 0.0
195 -0.9659258262890682867 1.06e-17 1.50e-17
210 -0.8660254037844386468 -8.12e-17 3.10e-17
225 -0.7071067811865475244 1.12e-16 4.69e-17
240 -0.5000000000000000001 3.83e-16 6.12e-17
255 -0.2588190451025207624 -2.07e-16 7.26e-17
270 0.0 1.04e-16 7.96e-17
285 0.2588190451025207624 3.86e-16 8.12e-17
300 0.5000000000000000001 -1.88e-16 7.67e-17
315 0.7071067811865475244 1.08e-16 6.56e-17
330 0.8660254037844386468 2.24e-16 4.87e-17
345 0.9659258262890682867 -5.18e-17 2.62e-17
360 1.000000000000000000 0.0 0.0
375 0.9659258262890682867 -1.08e-16 -2.87e-17
390 0.8660254037844386468 1.08e-16 -5.75e-17
405 0.7071067811865475244 -7.49e-17 -8.45e-17
Genuine Intel(R) CPU T2060 @ 1.60GHz
531 ms fpu fsin 1264 ms CRT sin 198 ms MasmBasic
461 ms fpu fsin 1323 ms CRT sin 196 ms MasmBasic
462 ms fpu fsin 1257 ms CRT sin 195 ms MasmBasic
462 ms fpu fsin 1256 ms CRT sin 196 ms MasmBasic
461 ms fpu fsin 1265 ms CRT sin 196 ms MasmBasic
461 ms fpu fsin 1268 ms CRT sin 196 ms MasmBasic
487 ms fpu fsin 1333 ms CRT sin 201 ms MasmBasic
462 ms fpu fsin 1267 ms CRT sin 197 ms MasmBasic
Hi jj, how many times was it called? Fsin took 100 microseconds for a 1 million loop. 358 ms is too slow.
Quote from: Farabi on October 16, 2015, 11:22:40 PM
Hi jj, how many times was it called? Fsin took 100 microseconds for a 1 million loop. 358 ms is too slow.
Can you explain a bit what you are referring to?
Edit:
Sorry, I had the wrong impression.
Quote from: jj2007 on September 10, 2015, 12:08:13 AM
Hi folks,
Just uploaded a new version (http://masm32.com/board/index.php?topic=94.0) featuring faster Sinus() and Cosinus() algos. The absolute error is rather small, here compared to the FPU fsin and fcos functions:
..
On my machine, they are almost five times as fast. Can I have some timings please? Thanks.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
293 ms fpu fsin 71 ms MasmBasic
295 ms fpu fsin 61 ms MasmBasic
294 ms fpu fsin 61 ms MasmBasic
295 ms fpu fsin 66 ms MasmBasic
290 ms fpu fsin 60 ms MasmBasic
290 ms fpu fsin 61 ms MasmBasic
299 ms fpu fsin 61 ms MasmBasic
288 ms fpu fsin 67 ms MasmBasic
Ah, my mistake, you are faster. :dazzled:
And it looks like it is not only yours. As for the 18 matching digits with fpu-CRT, I am not sure: my test of the original C++ algo shows a minimum of 6 matching digits, as at this angle:
36.66666 0.59715859 0.59715860
Mostly 7 digits are correct; in 8-digit output there are already many differences from the MS-CRT sin in the test tables.
Could you post your source, please? I have tried to mimic your table output, but somehow the steps are not exactly the same.
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
SetGlobals fct:REAL10
Init
For_ fct=0.0 To 0.785398 Step 0.00289814760147601476014760147601
Print Str$("\n%9f\t", fct), Str$("%8f", Sinus(fct, rad)#)
Next
EndOfCode
Sinus() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1334) has the same precision as the FPU fsin function when used with integer arguments such as 45°, 123° etc.
My source is in C++, so the precision is REAL8 and the step is 10 arc minutes. I don't have an asm source for REAL10, so I can't run a full test.
P.S. Translate the source yourself if you need it; doing so would be a nuisance for me too.
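For reference, a sweep like the one described (REAL8, fixed step in degrees) can be sketched in C++ as follows. This is not the source either poster used: `max_abs_error` is a hypothetical helper, and `std::sin` stands in for "MS-CRT sin".

```cpp
#include <cmath>

// Hypothetical helper: worst absolute difference between approx(x) and
// std::sin(x) for angles lo_deg..hi_deg (inclusive) in steps of step_deg.
double max_abs_error(double (*approx)(double), double lo_deg, double hi_deg, double step_deg)
{
    const double deg_to_rad = 3.14159265358979323846 / 180.0;
    // Index the steps with an integer so rounding does not accumulate.
    const long steps = static_cast<long>(std::floor((hi_deg - lo_deg) / step_deg + 1e-9));
    double worst = 0.0;
    for (long i = 0; i <= steps; ++i) {
        const double x = (lo_deg + i * step_deg) * deg_to_rad;
        const double err = std::fabs(approx(x) - std::sin(x));
        if (err > worst) worst = err;
    }
    return worst;
}
```

With a step of 1.0/6.0 degrees (10 arc minutes) over 0 to 45 degrees, taking `-std::log10` of the result gives a rough count of matching decimal digits, which is the kind of "6 or 7 digits" figure discussed above.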
Slower in this AMD? Why?
AMD Athlon(tm) II X2 220 Processor 2.8 GHz
230 ms fpu fsin 725 ms CRT sin 116 ms MasmBasic
231 ms fpu fsin 706 ms CRT sin 119 ms MasmBasic
234 ms fpu fsin 704 ms CRT sin 116 ms MasmBasic
229 ms fpu fsin 707 ms CRT sin 116 ms MasmBasic
229 ms fpu fsin 708 ms CRT sin 116 ms MasmBasic
231 ms fpu fsin 710 ms CRT sin 116 ms MasmBasic
229 ms fpu fsin 714 ms CRT sin 117 ms MasmBasic
230 ms fpu fsin 710 ms CRT sin 115 ms MasmBasic