Author Topic: Fast Fourier transform - AVX and FMA3  (Read 217 times)

LiaoMi

  • Member
  • ***
  • Posts: 486
Fast Fourier transform - AVX and FMA3
« on: June 12, 2019, 11:36:37 PM »
Hi,



I began to translate one example for masm assembler located here - https://www.nayuki.io/page/fast-fourier-transform-in-x86-assembly

FFT library to assess the effort and speedup of a hand-written SIMD vectorized implementation. The assembly implementation is under 150 lines of clear code; it achieves a speedup of 2× on small inputs, but only slight speedup on large inputs (memory-bound?).
The assembly code requires x86-64 AVX and FMA3, supported by Intel Haswell and later CPUs.

License: MIT (open source)
Source code notes:

The key idea behind the x86-64 AVX implementation is that all processing happens on vectors of four 64-bit floats, rather than individual scalar floats. In theory this could be 4× as fast as scalar code, but is limited by memory bandwidth, FPU execution resources, inherently serial loop control code, etc. Also, the cosine/sine tables are interleaved in blocks of 4 elements, and the table values for each FFT size are packed together so that the memory access stride is always contiguous.

As a bonus, the x86-64 AVX auxiliary C code has an implementation of a more accurate sine function by doing argument reduction to a smaller domain (1/8th of a circle instead of a half circle).


It remains to correct fft-test.asm and fft-x8664-avx-aux.c.asm, I'll finish what I started with small steps, in the meantime, maybe someone will be interested to see the implementation.

daydreamer

  • Member
  • ****
  • Posts: 808
  • watch Chebyshev on the backside of the Moon
Re: Fast Fourier transform - AVX and FMA3
« Reply #1 on: June 13, 2019, 04:18:11 AM »
nice Liaomi
but if you replace sincos table with generate it with AVX on the fly,the code doesnt need to read lots of values from tables it would maybe handle higher input?

Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
MM,XMM,YMM,ZMM registers,whats next?use nonenglish 29 letter alphabet,ÅMM,ÄMM,ÖMM registers?:D

Siekmanski

  • Member
  • *****
  • Posts: 1827
Re: Fast Fourier transform - AVX and FMA3
« Reply #2 on: June 13, 2019, 10:03:06 AM »
Don't think that would be faster.
You can precalculate all sin_cos pairs for each log_loop and read 8 values at once and do 8 multiplications at once.
Creative coders use backward thinking techniques as a strategy.

daydreamer

  • Member
  • ****
  • Posts: 808
  • watch Chebyshev on the backside of the Moon
Re: Fast Fourier transform - AVX and FMA3
« Reply #3 on: June 13, 2019, 01:54:55 PM »
Don't think that would be faster.
You can precalculate all sin_cos pairs for each log_loop and read 8 values at once and do 8 multiplications at once.
Been there,done that and I always a SIMD SSEalot (zealot)  :tongue:
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
MM,XMM,YMM,ZMM registers,whats next?use nonenglish 29 letter alphabet,ÅMM,ÄMM,ÖMM registers?:D

Raistlin

  • Member
  • ***
  • Posts: 481
Re: Fast Fourier transform - AVX and FMA3
« Reply #4 on: June 14, 2019, 03:54:39 PM »
@LiaoMi: I do believe Siekmanski and company killed this topic (as in serious SSE skill-sets applied)
numerous times in this thread -> http://masm32.com/board/index.php?topic=4231.msg75007#msg75007

Some interesting outcomes included reduction of transforms [complexity down from 5 N log2(N) to 4 N log(N)] ,
and a genius-level butterfly bit-reversal strategy (ala Siekmanski/rrr), all within multi-threaded framework.
As to your topical interest: AVX was theorized to provide a factor of x2 improvement on calculations but might
potentially incur memory management penalties that seemingly would negate performance gains from its use.
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...

daydreamer

  • Member
  • ****
  • Posts: 808
  • watch Chebyshev on the backside of the Moon
Re: Fast Fourier transform - AVX and FMA3
« Reply #5 on: June 20, 2019, 09:25:24 PM »
@LiaoMi: I do believe Siekmanski and company killed this topic (as in serious SSE skill-sets applied)
numerous times in this thread -> http://masm32.com/board/index.php?topic=4231.msg75007#msg75007
there is also some very fast chebyshev sine somewhere on board,that is my opinion to test that code instead of the LUT's in FFT code you posted
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
MM,XMM,YMM,ZMM registers,whats next?use nonenglish 29 letter alphabet,ÅMM,ÄMM,ÖMM registers?:D

Raistlin

  • Member
  • ***
  • Posts: 481
Re: Fast Fourier transform - AVX and FMA3
« Reply #6 on: June 22, 2019, 03:06:08 AM »
Very cool Daydreamer will look into what I perceived to be
a passing comment way back when : chebychev thingy
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...