I finished a routine that calculates tables of size 16 to 1024.
The algorithm works, but I don't have enough registers to calculate tables larger than 1024 (at least, I think so...).
To calculate larger power-of-2 size tables, I may need to store temporary precalculated column values inside the table memory.
After all, the goal is to use this algorithm to calculate very large tables as fast as possible.
Edit: see Reply #26 for the source code