Topic: 1/x timings for FPU and SIMD code

jj2007

Re: 1/x timings for FPU and SIMD code
« Reply #15 on: June 25, 2018, 03:15:00 AM »
Quote: fdiv is definitely out

Not on Yuri's i3. Keep in mind that rcpss computes 1/x only, while fdiv and divss do generic division. But I agree that on other CPUs divss is faster than fdiv. Whether it matters is another question: do you have an innermost loop with a million iterations that needs a division and can live with low precision?
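If low precision is the only objection to rcpss, the usual remedy is one Newton-Raphson step, y1 = y0 * (2 - x * y0), which roughly doubles the number of correct bits at the cost of two multiplications and a subtraction. Below is a minimal Python sketch of that arithmetic, not of the test program. `approx_rcp` is a hypothetical stand-in that rounds 1/x to 12 mantissa bits to mimic a coarse estimate (Intel documents a relative error of at most 1.5 * 2^-12 for rcpss); the real instruction will differ in the low bits.

```python
import math

def approx_rcp(x, bits=12):
    """Hypothetical stand-in for a coarse reciprocal estimate:
    1/x with its mantissa rounded to `bits` bits (assumption, not rcpss itself)."""
    mantissa, exponent = math.frexp(1.0 / x)   # |mantissa| is in [0.5, 1)
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

def refine(x, y):
    """One Newton-Raphson step for f(y) = 1/y - x."""
    return y * (2.0 - x * y)

def rel_error(x, y):
    """Relative error of y as an approximation of 1/x."""
    return abs(x * y - 1.0)

if __name__ == "__main__":
    x = 3.0
    y0 = approx_rcp(x)       # coarse estimate
    y1 = refine(x, y0)       # after one refinement step
    print("coarse :", y0, "rel. error", rel_error(x, y0))
    print("refined:", y1, "rel. error", rel_error(x, y1))
```

For x = 3 the relative error drops from 2^-13 to 2^-26, i.e. it is squared by the step. Whether the extra instructions still leave the refined version ahead of divss would have to be timed on each CPU.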

LiaoMi

Re: 1/x timings for FPU and SIMD code
« Reply #16 on: June 25, 2018, 10:56:44 PM »
Code:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

2838 cycles for 1000 * rcpss
12369 cycles for 1000 * 1/x using fdiv
4554 cycles for 1000 * 1/x using divss

2852 cycles for 1000 * rcpss
12214 cycles for 1000 * 1/x using fdiv
4574 cycles for 1000 * 1/x using divss

2873 cycles for 1000 * rcpss
13037 cycles for 1000 * 1/x using fdiv
4570 cycles for 1000 * 1/x using divss

24 bytes for rcpss
23 bytes for 1/x using fdiv
39 bytes for 1/x using divss

ST0 123453440.0000000000
ST0 123456792.0000000000
ST0 123453440.0000000000

--- ok ---

FORTRANS

Re: 1/x timings for FPU and SIMD code
« Reply #17 on: June 26, 2018, 03:20:18 AM »
Code:
Cut and paste from screen.
F:\TEMP\TEST>1_DIV_X
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

2088    cycles for 1000 * rcpss
13466   cycles for 1000 * fdiv

2085    cycles for 1000 * rcpss
13432   cycles for 1000 * fdiv

2087    cycles for 1000 * rcpss
13449   cycles for 1000 * fdiv

2050    cycles for 1000 * rcpss
13485   cycles for 1000 * fdiv

2083    cycles for 1000 * rcpss
13454   cycles for 1000 * fdiv

24      bytes for rcpss
23      bytes for fdiv

ST0     123453440.0000000000
ST0     123456792.0000000000

--- ok ---

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

3732 cycles for 1000 * rcpss
15936 cycles for 1000 * fdiv

3736 cycles for 1000 * rcpss
15921 cycles for 1000 * fdiv

3738 cycles for 1000 * rcpss
15888 cycles for 1000 * fdiv

3762 cycles for 1000 * rcpss
15983 cycles for 1000 * fdiv

3762 cycles for 1000 * rcpss
15903 cycles for 1000 * fdiv

24 bytes for rcpss
23 bytes for fdiv

ST0 123453440.0000000000
ST0 123456792.0000000000

--- ok ---

Output redirected to file.
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

2246 cycles for 1000 * rcpss
13534 cycles for 1000 * 1/x using fdiv
16239 cycles for 1000 * 1/x using divss

2078 cycles for 1000 * rcpss
13686 cycles for 1000 * 1/x using fdiv
16026 cycles for 1000 * 1/x using divss

2482 cycles for 1000 * rcpss
13335 cycles for 1000 * 1/x using fdiv
16349 cycles for 1000 * 1/x using divss

24 bytes for rcpss
23 bytes for 1/x using fdiv
39 bytes for 1/x using divss

ST0 123453440.0000000000
ST0 123456792.0000000000
ST0 123453440.0000000000

--- ok ---

F:\TEMP\TEST>1_DIV_X
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

3770    cycles for 1000 * rcpss
15923   cycles for 1000 * 1/x using fdiv
5992    cycles for 1000 * 1/x using divss

3768    cycles for 1000 * rcpss
15919   cycles for 1000 * 1/x using fdiv
5970    cycles for 1000 * 1/x using divss

3770    cycles for 1000 * rcpss
15932   cycles for 1000 * 1/x using fdiv
5970    cycles for 1000 * 1/x using divss

24      bytes for rcpss
23      bytes for 1/x using fdiv
39      bytes for 1/x using divss

ST0     123453440.0000000000
ST0     123456792.0000000000
ST0     123453440.0000000000

--- ok ---

Intel(R) Celeron(R) CPU N3350 @ 1.10GHz (SSE4)

1433    cycles for 1000 * rcpss
22803   cycles for 1000 * 1/x using fdiv
8498    cycles for 1000 * 1/x using divss

1411    cycles for 1000 * rcpss
21311   cycles for 1000 * 1/x using fdiv
8264    cycles for 1000 * 1/x using divss

1420    cycles for 1000 * rcpss
22523   cycles for 1000 * 1/x using fdiv
8388    cycles for 1000 * 1/x using divss

24      bytes for rcpss
23      bytes for 1/x using fdiv
39      bytes for 1/x using divss

ST0     123453440.0000000000
ST0     123456792.0000000000
ST0     123453440.0000000000

--- ok ---
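
To put the posted numbers side by side, here is a small Python sketch (not part of the test program) that converts the cycle counts to cycles per call and compares the printed ST0 values. The figures are the first run of each three-way output above. That the ST0 lines appear in the same order as the timing lines (rcpss, fdiv, divss) is an assumption on my part.

```python
# First-run cycle counts for 1000 iterations, copied from the outputs above.
RUNS = {
    "i7-4810MQ":     {"rcpss": 2838, "fdiv": 12369, "divss": 4554},
    "Pentium M":     {"rcpss": 2246, "fdiv": 13534, "divss": 16239},
    "i3-4005U":      {"rcpss": 3770, "fdiv": 15923, "divss": 5992},
    "Celeron N3350": {"rcpss": 1433, "fdiv": 22803, "divss": 8498},
}
ITERATIONS = 1000

def cycles_per_call(cycles, iterations=ITERATIONS):
    """Average cycles for a single 1/x computation."""
    return cycles / iterations

def faster_division(timings):
    """Return 'fdiv' or 'divss', whichever needed fewer cycles."""
    return min(("fdiv", "divss"), key=lambda name: timings[name])

def relative_difference(value, reference):
    """|value - reference| / |reference|."""
    return abs(value - reference) / abs(reference)

if __name__ == "__main__":
    for cpu, timings in RUNS.items():
        per_call = {name: cycles_per_call(c) for name, c in timings.items()}
        print(cpu, per_call, "faster true division:", faster_division(timings))
    # First vs. second ST0 line (assumed order: rcpss, then fdiv).
    print("relative difference:", relative_difference(123453440.0, 123456792.0))
```

On these figures fdiv beats divss only on the Pentium M; on the other three CPUs divss is the faster true division, and rcpss is the fastest of the three everywhere. The first and second ST0 values differ by 3352, a relative difference of about 2.7e-5.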