News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Fast Fourier transform - AVX and FMA3

Started by LiaoMi, June 12, 2019, 11:36:37 PM

Previous topic - Next topic

LiaoMi

Hi,



I began to translate one example for masm assembler located here - https://www.nayuki.io/page/fast-fourier-transform-in-x86-assembly

FFT library to assess the effort and speedup of a hand-written SIMD vectorized implementation. The assembly implementation is under 150 lines of clear code; it achieves a speedup of 2× on small inputs, but only slight speedup on large inputs (memory-bound?).
The assembly code requires x86-64 AVX and FMA3, supported by Intel Haswell and later CPUs.

License: MIT (open source)
Source code notes:

The key idea behind the x86-64 AVX implementation is that all processing happens on vectors of four 64-bit floats, rather than individual scalar floats. In theory this could be 4× as fast as scalar code, but is limited by memory bandwidth, FPU execution resources, inherently serial loop control code, etc. Also, the cosine/sine tables are interleaved in blocks of 4 elements, and the table values for each FFT size are packed together so that the memory access stride is always contiguous.

As a bonus, the x86-64 AVX auxiliary C code has an implementation of a more accurate sine function by doing argument reduction to a smaller domain (1/8th of a circle instead of a half circle).


It remains to correct fft-test.asm and fft-x8664-avx-aux.c.asm, I'll finish what I started with small steps, in the meantime, maybe someone will be interested to see the implementation.

daydreamer

nice Liaomi
but if you replace sincos table with generate it with AVX on the fly,the code doesnt need to read lots of values from tables it would maybe handle higher input?

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

Siekmanski

Don't think that would be faster.
You can precalculate all sin_cos pairs for each log_loop and read 8 values at once and do 8 multiplications at once.
Creative coders use backward thinking techniques as a strategy.

daydreamer

Quote from: Siekmanski on June 13, 2019, 10:03:06 AM
Don't think that would be faster.
You can precalculate all sin_cos pairs for each log_loop and read 8 values at once and do 8 multiplications at once.
Been there,done that and I always a SIMD SSEalot (zealot)  :tongue:
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

Raistlin

@LiaoMi: I do believe Siekmanski and company killed this topic (as in serious SSE skill-sets applied)
numerous times in this thread -> http://masm32.com/board/index.php?topic=4231.msg75007#msg75007

Some interesting outcomes included reduction of transforms [complexity down from 5 N log2(N) to 4 N log(N)] ,
and a genius-level butterfly bit-reversal strategy (ala Siekmanski/rrr), all within multi-threaded framework.
As to your topical interest: AVX was theorized to provide a factor of x2 improvement on calculations but might
potentially incur memory management penalties that seemingly would negate performance gains from its use.
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...

daydreamer

Quote from: Raistlin on June 14, 2019, 03:54:39 PM
@LiaoMi: I do believe Siekmanski and company killed this topic (as in serious SSE skill-sets applied)
numerous times in this thread -> http://masm32.com/board/index.php?topic=4231.msg75007#msg75007
there is also some very fast chebyshev sine somewhere on board,that is my opinion to test that code instead of the LUT's in FFT code you posted
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

Raistlin

Very cool Daydreamer will look into what I perceived to be
a passing comment way back when : chebychev thingy
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...