### Topic: Trigonometry ...

#### rrr314159

• Member
• Posts: 1382
##### Trigonometry ...
« on: March 26, 2015, 04:48:39 PM »
These routines are FPU-based replacements for fsin and fcos, built on the Taylor series for sin and cos. I had been using sin/cos lookup tables (4096 entries), which are a bit faster, but for more precision the LUT gets very large. The main reason I developed these, though, is that I'm translating all my algos to SSE and would need the gather instruction (AVX2) to use LUTs. Even if I had it on my machine, most people don't.

I have various requirements for trig routines:

- min precision about 4 decimal digits but capable of (much) higher when a flag is set.
- use no more than 4 FPU registers preferably (could be waived if worth it).
- at least 3 times faster than the FPU's fsin/fcos.
- SIMD-izable, which rules out some techniques.

These two routines, trigC and trigS, are based on the Cos and Sin Taylor series respectively. trigC is faster but requires one FPU reg per T.S. term (beyond the first "1"). With 4 regs (last term is x^8) it achieves only 5 significant decimal digits. trigS OTOH uses only 4 regs no matter how many terms. This version, at the higher precision setting, uses 6 terms, up to x^11, to achieve almost 9 digits. Both routines calculate both sin and cos, with different accuracy patterns (the trigS pattern is better, but that gets into too much detail - if interested, say so).
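For readers who want to see the trigS idea outside of assembler, here is a minimal C sketch. It is an assumption about the general shape, not the actual trigS code: the function name `sin_taylor11` is made up, and it only handles the first quadrant (0 to pi/2). It sums the sine series through x^11 in nested form, so the number of live values stays small however many terms are added.

```c
/* Sketch only: sine on [0, pi/2] from its Taylor series through x^11.
   Nested (Horner) evaluation in x^2 keeps the working set to a few values.
   The name sin_taylor11 is hypothetical, not taken from the posted code. */
static double sin_taylor11(double x)
{
    const double x2 = x * x;
    double p = -1.0 / 39916800.0;      /* -1/11! */
    p = p * x2 + 1.0 / 362880.0;       /* +1/9!  */
    p = p * x2 - 1.0 / 5040.0;         /* -1/7!  */
    p = p * x2 + 1.0 / 120.0;          /* +1/5!  */
    p = p * x2 - 1.0 / 6.0;            /* -1/3!  */
    p = p * x2 + 1.0;                  /* leading 1 */
    return p * x;                      /* x * (1 - x^2/3! + ... - x^10/11!) */
}
```

At x = pi/2 the first dropped term is x^13/13!, about 5.7e-8, which is the same order as the worst-case error reported for the higher-precision trigS runs in this thread.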

On the Intel i5 speed is satisfactory, varying from about 4 times faster than the FPU to more than 9: roughly 3.5 nanoseconds per iteration with about 4 digits of precision, enough for some purposes.

However on the AMD they're unsatisfactory. First, the routines run about 3 times slower than on the i5, rather worse than usual. But the amazing thing is, AMD FPU sin and cos are almost as fast as Intel! Since clock speed is just a little more than half, per cycle AMD is much better.
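To put the per-cycle remark in numbers, a time in nanoseconds can be converted to clock cycles by multiplying by the clock rate in GHz. A small C helper (the name `ns_to_cycles` is just for illustration), meant to be fed the fsin times and clock speeds from the runs posted further down:

```c
/* Convert a time in nanoseconds to clock cycles at a given clock rate in GHz.
   Hypothetical helper for illustration; 1 GHz means 1 cycle per nanosecond. */
static double ns_to_cycles(double ns, double ghz)
{
    return ns * ghz;
}

/* fsin on the Intel i5 at 2.94 GHz: ns_to_cycles(32.129, 2.94) */
/* fsin on the AMD A6 at 1.8 GHz:    ns_to_cycles(35.279, 1.8)  */
```

That works out to roughly 94 cycles on the i5 against roughly 64 on the A6, assuming both chips really ran at their nominal clock during the test.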

BTW I've heard it said there's no big difference between Intel and AMD - that's not my experience.

Anyway, I want faster and better routines if possible. These two can be sped up maybe 10% in obvious ways - some of which, no doubt, I'm overlooking - so let me know of any you notice. But also, I'm wondering if there are any tricks that would really make a difference. Some ideas,

- I wonder if fixed point would help. I doubt it.
- By factoring the series you can eliminate one multiply, don't think it's worth it.
- Manipulating the quadrants flag you can eliminate one branch, doesn't seem to help though, the extra memory access kills the advantage.
- Mixing the two Taylor Series in the obvious way - I'm currently investigating that
- I did look on the net, saw various libraries, but it's a lot easier (and almost always better) to just write my own than hassle with them. So please don't just give me a link to a library unless u have reason to think it's worth it.

If you have a candidate routine use my test bed to get stats, or give it to me and I'll do it.

Here is the fastest version I have at the moment, called trigC:

Code:
`.data                             ;; for both trig routines
    piover2 real8 1.5707963267948966
    twooverpi real8 0.63661977236758138
.code
; »»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
.data                                   ; data used by trigC MACRO
    __cos1 real8 -1.2337005501361697    ; -1/2 (pi/2)^2
    __cos2 real8 0.16666666666666667    ; 2/(3*4)
    __cos3 real8 0.06666666666666667    ; 2/(5*6)
    __cos4 real8 0.03571428571428571    ; 2/(7*8)
    one real8 1.0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
;;*******************************       ;; sin or cos in st(0), uses 4 regs
trigC MACRO sincosflag:=<0>             ;; but needs one extra per term
;;*******************************
    fmul twooverpi                      ;; div by pi/2 to put in 4 quadrants
    fld st
    fisttp qword ptr [esp-8]            ;; intended: math "int", down towards -inf (fisttp itself truncates toward 0)
    mov eax, dword ptr [esp-8]          ;; (lower dword of) int quotient x / pi/2
    fild qword ptr [esp-8]
    add eax, sincosflag                 ;; sin if sincosflag = 0, cos if 1
    fsub                                ;; now mod 1 (0 to .999999) meaning 0-pi/2
    test eax, 1
    jnz @F
        fld1
        fsubrp                          ;; replace w/ 1-x for these quadrants
    @@:
    fmul st, st
    fmul __cos1
    fld st              ;; c1*x^2, c1*x^2
    fmul st, st(1)      ;; c1^2*x^4, c1*x^2
    fmul __cos2         ;; c2*c1^2*x^4, c1*x^2
    fld st              ;; c2*c1^2*x^4, c2*c1^2*x^4, c1*x^2
    fmul st, st(2)      ;; c2*c1^3*x^6, c2*c1^2*x^4, c1*x^2
    fmul __cos3         ;; c3*c2*c1^3*x^6, c2*c1^2*x^4, c1*x^2
    IF MORE_PRECISION EQ 1                       ;; do one more term
        fld st          ;; c3*c2*c1^3*x^6, c3*c2*c1^3*x^6, c2*c1^2*x^4, c1*x^2
        fmul st, st(3)  ;; c3*c2*c1^4*x^8, c3*c2*c1^3*x^6, c2*c1^2*x^4, c1*x^2
        fmul __cos4     ;; c4*c3*c2*c1^4*x^8, c3*c2*c1^3*x^6, c2*c1^2*x^4, c1*x^2
        fadd
    ENDIF
    fadd
    fadd
    fadd one                            ;; answer in st(0) all other regs free
    and eax, 2
    je @F
        fchs                            ;; was in a negative quadrant
    @@:
ENDM
;;*******************************`
and here are runs with timing and precision stats:

Code:
`Intel i5 3330 2.94 Ghz
    ----------------------------------------
FPU fsin nanos per iter         32.129
FPU fcos nanos per iter         31.583
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       4.996
Speed ratio FPU/test fn         6.38
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration       4.910
Speed ratio FPU/test fn         6.49
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       8.427
Speed ratio FPU/test fn         3.78
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration       8.380
Speed ratio FPU/test fn         3.8
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       3.604
Speed ratio FPU/test fn         8.84
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration       3.463
Speed ratio FPU/test fn         9.2
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       5.211
Speed ratio FPU/test fn         6.11
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration       5.206
Speed ratio FPU/test fn         6.12
Precision
average precision               2.29e-006
worst precision                 2.47e-005
================================================================================================
AMD A6 1.8 Ghz
    ----------------------------------------
FPU fsin nanos per iter         35.279
FPU fcos nanos per iter         37.223
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       17.517
Speed ratio FPU/test fn         2.07
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration       17.350
Speed ratio FPU/test fn         2.09
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       27.397
Speed ratio FPU/test fn         1.32
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration       27.137
Speed ratio FPU/test fn         1.34
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       12.161
Speed ratio FPU/test fn         2.98
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration       12.103
Speed ratio FPU/test fn         3
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       16.754
Speed ratio FPU/test fn         2.16
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration       16.673
Speed ratio FPU/test fn         2.17
Precision
average precision               2.29e-006
worst precision                 2.47e-005`
The zip includes trig32.asm with trigS and trigC, plus test_support_macros.asm with the test bed. Writing that was a PITA, more code than the trig routines, and much less fun. The zip also has trig32.exe which produces tables like that shown above.

Apologies to anyone who downloaded this expecting it to be correct - u shld know me better :) qword pointed out a mistake: fisttp shld be fistp, with the FPU set to "down towards -inf" rounding (fisttp always truncates toward zero). It has no effect on any runs made so far, but would if u used negative inputs...
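A C rendering of the same steps may help when checking the quadrant logic, including the rounding fix just mentioned. This is a sketch, not a translation anyone has benchmarked: `trig_c` and the constant names are invented here, the extra x^8 term is always included, and the floor is done by hand so the block needs no math library.

```c
/* Hypothetical C rendering of the trigC steps; all names are illustrative. */
static const double kTwoOverPi = 0.63661977236758138;
static const double kHalfPi    = 1.5707963267948966;

/* want_cos = 0 returns an approximation of sin(x), want_cos = 1 of cos(x). */
static double trig_c(double x, int want_cos)
{
    double t = x * kTwoOverPi;               /* angle in quarter turns */
    long long q = (long long)t;              /* the cast truncates toward zero ... */
    if ((double)q > t) q -= 1;               /* ... so step down for negatives (floor) */
    double f = t - (double)q;                /* fraction of a quarter turn, 0 <= f < 1 */

    unsigned k = (unsigned)q + (unsigned)want_cos;   /* low bits pick the quadrant */
    if ((k & 1u) == 0) f = 1.0 - f;          /* these quadrants use cos(pi/2 - angle) */

    double a  = f * kHalfPi;                 /* reduced angle in radians */
    double u  = -0.5 * a * a;                /* -a^2/2!  */
    double t2 = u * u  * (2.0 / (3.0 * 4.0));    /* +a^4/4!  */
    double t3 = t2 * u * (2.0 / (5.0 * 6.0));    /* -a^6/6!  */
    double t4 = t3 * u * (2.0 / (7.0 * 8.0));    /* +a^8/8!  */
    double c  = 1.0 + u + t2 + t3 + t4;      /* cos(a) through the x^8 term */

    return (k & 2u) ? -c : c;                /* negative half of the cycle */
}
```

If the floor adjustment is removed, negative inputs leave f negative, and after the `1.0 - f` step the series can end up evaluated at angles beyond pi/2, where the truncated series is noticeably less accurate.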

« Last Edit: March 31, 2015, 03:14:37 PM by rrr314159 »
I am NaN ;)

#### sinsi

• Guest
##### Re: Trigonometry ...
« Reply #1 on: March 26, 2015, 05:32:48 PM »
AMD A10-7850K 3.7GHz
Code:
`    --------------------------------------
FPU fsin nanos per iter         30.486
FPU fcos nanos per iter         29.228
Test Function: trigS =====================
********** SIN Using faster version ******
Nanoseconds per Iteration       4.541
Speed ratio FPU/test fn         6.57
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version ******
Nanoseconds per Iteration       4.527
Speed ratio FPU/test fn         6.6
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision ****
Nanoseconds per Iteration       7.935
Speed ratio FPU/test fn         3.76
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision ****
Nanoseconds per Iteration       8.240
Speed ratio FPU/test fn         3.62
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC =====================
********** SIN Using faster version ******
Nanoseconds per Iteration       2.835
Speed ratio FPU/test fn         10.5
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version ******
Nanoseconds per Iteration       2.906
Speed ratio FPU/test fn         10.3
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision ****
Nanoseconds per Iteration       4.382
Speed ratio FPU/test fn         6.81
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision ****
Nanoseconds per Iteration       4.677
Speed ratio FPU/test fn         6.38
Precision
average precision               2.29e-006
worst precision                 2.47e-005`

#### rrr314159

• Member
• Posts: 1382
##### Re: Trigonometry ...
« Reply #2 on: March 26, 2015, 05:46:39 PM »
Thanks Sinsi, your A10 has similar numbers to my i5. Strange how AMD A6, while generally much slower, is so fast for fsin and fcos

#### MichaelW

• Global Moderator
• Member
• Posts: 1204
##### Re: Trigonometry ...
« Reply #3 on: March 26, 2015, 05:47:21 PM »
Core-i3 3.0 GHz:
Code:
`    ----------------------------------------
FPU fsin nanos per iter         34.265
FPU fcos nanos per iter         34.241
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       5.325
Speed ratio FPU/test fn         6.43
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration       5.256
Speed ratio FPU/test fn         6.52
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       9.030
Speed ratio FPU/test fn         3.79
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration       8.969
Speed ratio FPU/test fn         3.82
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       3.778
Speed ratio FPU/test fn         9.07
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration       3.645
Speed ratio FPU/test fn         9.4
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       5.434
Speed ratio FPU/test fn         6.3
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration       5.444
Speed ratio FPU/test fn         6.29
Precision
average precision               2.29e-006
worst precision                 2.47e-005
E:\Downloads\NaN\trig_functions\trig functions>pause
Press any key to continue . . .`

Well Microsoft, here’s another nice mess you’ve gotten us into.

#### dedndave

• Member
• Posts: 8829
• Still using Abacus 2.0
##### Re: Trigonometry ...
« Reply #4 on: March 26, 2015, 05:47:55 PM »
p-4 prescott w/htt @ 3 GHz
Code:
`FPU fsin nanos per iter 0.048
FPU fcos nanos per iter 0.048
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration 0.029
Speed ratio FPU/test fn 1.63
Precision
average precision 1.59e-005
worst precision    1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration 0.030
Speed ratio FPU/test fn 1.6
Precision
average precision 1.59e-005
worst precision    1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration 0.032
Speed ratio FPU/test fn 1.51
Precision
average precision 4.12e-009
worst precision    5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration 0.031
Speed ratio FPU/test fn 1.51
Precision
average precision 4.12e-009
worst precision    5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration 0.029
Speed ratio FPU/test fn 1.63
Precision
average precision 1.01e-004
worst precision    8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration 0.029
Speed ratio FPU/test fn 1.63
Precision
average precision 1.01e-004
worst precision    8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration 0.030
Speed ratio FPU/test fn 1.6
Precision
average precision 2.29e-006
worst precision    2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration 0.030
Speed ratio FPU/test fn 1.6
Precision
average precision 2.29e-006
worst precision    2.47e-005`

#### TWell

• Member
• Posts: 748
##### Re: Trigonometry ...
« Reply #5 on: March 26, 2015, 06:11:29 PM »
AMD E1-6010 1.35 GHz
Code:
`    ----------------------------------------
FPU fsin nanos per iter 60.574
FPU fcos nanos per iter 63.197
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration 29.655
Speed ratio FPU/test fn 2.09
Precision
average precision 1.59e-005
worst precision    1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration 30.067
Speed ratio FPU/test fn 2.06
Precision
average precision 1.59e-005
worst precision    1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration 51.286
Speed ratio FPU/test fn 1.21
Precision
average precision 4.12e-009
worst precision    5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration 48.504
Speed ratio FPU/test fn 1.28
Precision
average precision 4.12e-009
worst precision    5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration 20.868
Speed ratio FPU/test fn 2.97
Precision
average precision 1.01e-004
worst precision    8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration 20.494
Speed ratio FPU/test fn 3.02
Precision
average precision 1.01e-004
worst precision    8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration 28.468
Speed ratio FPU/test fn 2.17
Precision
average precision 2.29e-006
worst precision    2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration 29.783
Speed ratio FPU/test fn 2.08
Precision
average precision 2.29e-006
worst precision    2.47e-005`

#### MichaelW

• Global Moderator
• Member
• Posts: 1204
##### Re: Trigonometry ...
« Reply #6 on: March 26, 2015, 07:01:26 PM »
Core-i3 3.0 GHz:
FPU fsin nanos per iter         34.265

p-4 prescott w/htt @ 3 GHz
FPU fsin nanos per iter    0.048

A P4 with the same clock speed and a lower IPC is ~713 times faster?

#### jj2007

• Member
• Posts: 11894
• Assembler is fun ;-)
##### Re: Trigonometry ...
« Reply #7 on: March 26, 2015, 07:25:19 PM »
Core i5:
Code:
`FPU fsin nanos per iter         33.359
FPU fcos nanos per iter         32.851
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       4.705
Speed ratio FPU/test fn         7.04
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration       4.586
Speed ratio FPU/test fn         7.22
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       8.167
Speed ratio FPU/test fn         4.05
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration       8.208
Speed ratio FPU/test fn         4.03
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       3.212
Speed ratio FPU/test fn         10.3
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration       3.200
Speed ratio FPU/test fn         10.3
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       4.908
Speed ratio FPU/test fn         6.75
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration       4.911
Speed ratio FPU/test fn         6.74
Precision
average precision               2.29e-006
worst precision                 2.47e-005`

#### hutch--

• Member
• Posts: 8893
• Mnemonic Driven API Grinder
##### Re: Trigonometry ...
« Reply #8 on: March 26, 2015, 07:29:03 PM »
On my i7.

Code:
`FPU fsin nanos per iter         30.473
FPU fcos nanos per iter         30.446
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       8.907
Speed ratio FPU/test fn         3.42
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration       9.117
Speed ratio FPU/test fn         3.34
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       12.853
Speed ratio FPU/test fn         2.37
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration       12.999
Speed ratio FPU/test fn         2.34
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       6.278
Speed ratio FPU/test fn         4.85
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration       5.839
Speed ratio FPU/test fn         5.22
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       8.680
Speed ratio FPU/test fn         3.51
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration       8.861
Speed ratio FPU/test fn         3.44
Precision
average precision               2.29e-006
worst precision                 2.47e-005`
hutch at movsd dot com
http://www.masm32.com

#### nidud

• Member
• Posts: 2354
##### Re: Trigonometry ...
« Reply #9 on: March 26, 2015, 11:31:08 PM »
AMD Athlon(tm) II X2 245 Processor (SSE3)
Code:
`FPU fsin nanos per iter 0.024
FPU fcos nanos per iter 0.025
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration 0.022
Speed ratio FPU/test fn 1.14
Precision
average precision 1.59e-005
worst precision    1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration 0.021
Speed ratio FPU/test fn 1.15
Precision
average precision 1.59e-005
worst precision    1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration 0.027
Speed ratio FPU/test fn 0.903
Precision
average precision 4.12e-009
worst precision    5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration 0.027
Speed ratio FPU/test fn 0.898
Precision
average precision 4.12e-009
worst precision    5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration 0.021
Speed ratio FPU/test fn 1.16
Precision
average precision 1.01e-004
worst precision    8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration 0.021
Speed ratio FPU/test fn 1.16
Precision
average precision 1.01e-004
worst precision    8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration 0.026
Speed ratio FPU/test fn 0.96
Precision
average precision 2.29e-006
worst precision    2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration 0.025
Speed ratio FPU/test fn 0.962
Precision
average precision 2.29e-006
worst precision    2.47e-005`

#### Siekmanski

• Member
• Posts: 2446
##### Re: Trigonometry ...
« Reply #10 on: March 27, 2015, 12:12:01 AM »
Windows 8.1  Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz

Code:
`    ----------------------------------------
FPU fsin nanos per iter         6.956
FPU fcos nanos per iter         6.860
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       1.051
Speed ratio FPU/test fn         6.57
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration       1.031
Speed ratio FPU/test fn         6.7
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       1.835
Speed ratio FPU/test fn         3.76
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration       1.787
Speed ratio FPU/test fn         3.86
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       0.749
Speed ratio FPU/test fn         9.23
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration       0.718
Speed ratio FPU/test fn         9.62
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       1.096
Speed ratio FPU/test fn         6.3
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration       1.096
Speed ratio FPU/test fn         6.3
Precision
average precision               2.29e-006
worst precision                 2.47e-005`
« Last Edit: March 30, 2015, 08:40:50 AM by Siekmanski »
Creative coders use backward thinking techniques as a strategy.

#### dedndave

• Member
• Posts: 8829
• Still using Abacus 2.0
##### Re: Trigonometry ...
« Reply #11 on: March 27, 2015, 01:27:46 AM »
anyone notice that my P4 and nidud's Athlon are kicking ass ?

maybe something not right with the test program ?

#### rrr314159

• Member
• Posts: 1382
##### Re: Trigonometry ...
« Reply #12 on: March 27, 2015, 02:50:48 AM »
Thanks Marinus,

But, (AFAIK) u need vgather instruction to use LUT with SSE, which is no good for me because, even if I had it, very few others do. Maybe your routine has some way to deal with its absence, I'll see (this evening, busy with "life" stuff now). Your i7 is scary fast, I gotta have one of those!

BTW LUT is used only for first quadrant as a matter of course, the other quadrants are symmetric - if u know anyone using a LUT all around the circle, ... well, don't let them perform any work where numerical skills are vital, let's put it that way.

My LUT currently is only 4096, I did make a version with 16384, even 65536 just for the heck of it, but that was overkill. One problem is that, although precision vs. speed is rather better than Taylor Series, it's not smooth, but stair-steps between the LUT points. For graphics (my main use these days) a less precise, but smooth, curve is much better than a more precise jagged one. (Of course u can interpolate between LUT points to give an n-gon shape but that totally kills speed advantage). Considering that, the LUT and T.S. approaches are fairly equal. The T.S. is easier to deal with. However nice thing about LUT, easily adaptable to other periodic curves such as sin^2, can even make custom curves (user makes with the mouse) that can look cool or weird. Bottom line - I'd stick with LUT if I, and "almost all" users, had vgather.
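The stair-step versus interpolation trade-off can be seen in a few lines of C. This is an illustrative sketch only: the table size matches the 4096 mentioned above, but the function names are invented, the table covers just the first quadrant (the other three follow by symmetry), and it is filled from a Taylor series purely to keep the example free of library calls.

```c
#define LUT_SIZE     4096                    /* entries per quarter wave */
#define LUT_HALF_PI  1.5707963267948966

static double lut[LUT_SIZE + 1];             /* one extra entry so pi/2 itself is valid */

/* Sine on [0, pi/2] from the Taylor series through x^11 (table filler only). */
static double sin_series(double x)
{
    const double x2 = x * x;
    double p = -1.0 / 39916800.0;
    p = p * x2 + 1.0 / 362880.0;
    p = p * x2 - 1.0 / 5040.0;
    p = p * x2 + 1.0 / 120.0;
    p = p * x2 - 1.0 / 6.0;
    return (p * x2 + 1.0) * x;
}

/* Fill the table once before any lookup. */
static void lut_init(void)
{
    for (int i = 0; i <= LUT_SIZE; i++)
        lut[i] = sin_series(i * (LUT_HALF_PI / LUT_SIZE));
}

/* Nearest-lower entry: fast, but the output moves in steps. x in [0, pi/2]. */
static double lut_sin_step(double x)
{
    int i = (int)(x * (LUT_SIZE / LUT_HALF_PI));
    if (i > LUT_SIZE) i = LUT_SIZE;
    return lut[i];
}

/* Linear interpolation between neighbours: smoother, more work per call. */
static double lut_sin_lerp(double x)
{
    double p = x * (LUT_SIZE / LUT_HALF_PI);
    int i = (int)p;
    if (i >= LUT_SIZE) return lut[LUT_SIZE];
    double w = p - (double)i;                /* position between entries i and i+1 */
    return lut[i] + w * (lut[i + 1] - lut[i]);
}
```

After `lut_init()`, `lut_sin_step(0.5)` is off from sin(0.5) in the fourth decimal place, while `lut_sin_lerp(0.5)` agrees to better than 1e-6, at the cost of a second table read, a subtract and a multiply-add.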

@dedndave and MichaelW, yes something is definitely wrong w/ test program on those old machines, Prescott and Athlon II! I guess - they don't have QueryPerformanceFrequency instruction? (Or, mfence ... but why didn't they just blow up...?) So they're trying to divide by zero, or random garbage. Sorry. Still don't feel you've wasted your time (dedndave and nidud) those results are more valuable than others, to remind me there are still some computers out there that need alternate routines. Perhaps, for them, I can keep count by making notches in a clay tablet ... At least the actual calc's are being done right, as u can tell from the precision numbers, which should be (and are) identical.

All the other times look right, thanks to all.

 Siekmanski, put life on hold for a moment and looked at your routine, didn't realize it was so simple. Looks like better way to handle LUT than way I've been doing it (which has about twice as many instructions), but - it only looks up one value at a time, right? Not using SIMD - that's why there's no vgather. Presumably T.S. (which can easily calc 4 or 8, whatever, values at once, with SSE) will turn out to be better - will let u know.

#### Antariy

• Member
• Posts: 564
##### Re: Trigonometry ...
« Reply #13 on: March 27, 2015, 05:24:51 AM »
rrr, what is "T.S"?

The timings on Celeron D310
Code:
`    ----------------------------------------
FPU fsin nanos per iter         78.477
FPU fcos nanos per iter         78.188
Test Function: trigS ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       26.469
Speed ratio FPU/test fn         2.96
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** COS Using faster version **********
Nanoseconds per Iteration       26.556
Speed ratio FPU/test fn         2.95
Precision
average precision               1.59e-005
worst precision                 1.57e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       43.149
Speed ratio FPU/test fn         1.82
Precision
average precision               4.12e-009
worst precision                 5.63e-008
********** COS Using higher precision **********
Nanoseconds per Iteration       43.107
Speed ratio FPU/test fn         1.82
Precision
average precision               4.12e-009
worst precision                 5.63e-008
Test Function: trigC ========================================
********** SIN Using faster version **********
Nanoseconds per Iteration       22.383
Speed ratio FPU/test fn         3.5
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** COS Using faster version **********
Nanoseconds per Iteration       22.422
Speed ratio FPU/test fn         3.49
Precision
average precision               1.01e-004
worst precision                 8.95e-004
********** SIN Using higher precision **********
Nanoseconds per Iteration       32.139
Speed ratio FPU/test fn         2.44
Precision
average precision               2.29e-006
worst precision                 2.47e-005
********** COS Using higher precision **********
Nanoseconds per Iteration       32.084
Speed ratio FPU/test fn         2.44
Precision
average precision               2.29e-006
worst precision                 2.47e-005`

#### dedndave

• Member
• Posts: 8829
• Still using Abacus 2.0
##### Re: Trigonometry ...
« Reply #14 on: March 27, 2015, 05:30:46 AM »
T.S. = Taylor Series   :P

this machine has QueryPerformanceCounter
something not quite right, there   :redface:
maybe you do a MUL and disregard the high dword ? - something along those lines
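The kind of slip suggested here is easy to illustrate in C, without claiming it is what actually went wrong in the test bed. The sketch below assumes a tick count, a counter frequency in ticks per second and an iteration count (all hypothetical inputs, as are the function names) and compares a full-width calculation with one that keeps only the low 32 bits of the scaled product.

```c
#include <stdint.h>

/* Nanoseconds per iteration from a tick delta, a counter frequency
   (ticks per second) and an iteration count. Hypothetical helper. */
static double ns_per_iter(uint64_t ticks, uint64_t freq, uint64_t iters)
{
    return (double)ticks * 1e9 / ((double)freq * (double)iters);
}

/* Same calculation, but the product ticks * 1e9 is cut down to 32 bits,
   as happens if the high dword of a MUL result is ignored. */
static double ns_per_iter_low32(uint64_t ticks, uint64_t freq, uint64_t iters)
{
    uint32_t scaled = (uint32_t)(ticks * 1000000000ull);   /* high dword discarded */
    return (double)scaled / ((double)freq * (double)iters);
}
```

With 34,000 ticks from an assumed 1 MHz counter over a million iterations, the first function gives 34 ns per iteration, while the second gives a figure far below 1 ns.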