Hi marinus, I analysed your code and there are a few things I didn't understand.
Your function InitFFTSinCosTable takes a pointer to the variable FFT_SinCosTable and is called like this:
invoke InitFFTSinCosTable, addr FFT_SinCosTable, FFTsize, FFT
But there seems to be a problem with the size of that variable, which is followed by an array of other variables with a different size (labeled ComplexDataXXX, each of which is also virtual data of 2048 dwords).
FFT_SinCosTable dd 1536 dup(?)
ComplexData dd 2048 dup(?)
ComplexData1 dd 2048 dup(?)
ComplexData2 dd 2048 dup(?)
.....
But since FFTSize = 1024, InitFFTSinCosTable fills 8160 bytes (2040 dwords) starting at FFT_SinCosTable. FFT_SinCosTable holds only 1536 dwords (6144 bytes), so the rest of the data ends up written at the address of ComplexData, as in the image below:

Also, why 8160 bytes? The total size of ComplexData is 8192 bytes (2048 dwords/Real4). So is it leaving 8 dwords unprocessed?
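To make the numbers concrete, here is a quick sketch of the arithmetic, using Python purely as a calculator. It assumes the write starts at the first byte of FFT_SinCosTable and that ComplexData sits immediately after it in memory, as the declarations suggest. The variable names are mine, not from your source.

```python
DWORD = 4  # bytes per dword / Real4

# Sizes taken from the declarations and figures quoted above
table_dwords = 1536     # FFT_SinCosTable dd 1536 dup(?)
complex_dwords = 2048   # ComplexData dd 2048 dup(?)
written_bytes = 8160    # bytes InitFFTSinCosTable appears to fill

table_bytes = table_dwords * DWORD        # capacity of FFT_SinCosTable in bytes
complex_bytes = complex_dwords * DWORD    # capacity of ComplexData in bytes
written_dwords = written_bytes // DWORD   # dwords actually filled

# Assumption: the write begins at offset 0 of FFT_SinCosTable,
# so everything past the table's capacity lands in ComplexData.
overflow_dwords = written_dwords - table_dwords
overflow_bytes = overflow_dwords * DWORD

# Difference between one ComplexData buffer and the filled length
missing_dwords = complex_dwords - written_dwords

print("table capacity :", table_bytes, "bytes")
print("filled         :", written_bytes, "bytes =", written_dwords, "dwords")
print("overflow       :", overflow_dwords, "dwords =", overflow_bytes, "bytes")
print("short of 2048  :", missing_dwords, "dwords")
```

If those assumptions hold, 504 dwords (2016 bytes) go past the end of FFT_SinCosTable, and the filled length is 8 dwords short of 2048.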
Another question, concerning the multithreading.
Why use ComplexData as virtual data, and why use multiple variables for it? Does it mean that each thread function (ThreadM1, ThreadM2, etc.) will use separate data chunks from ComplexData? I mean, is each of ComplexData2, ComplexData3 a copy of ComplexData, or are they pointers into the total size of ComplexData divided by 16?