Well, if you use trig later for drawing in this 64-bit era, wouldn't a fixed-point table be a faster alternative? Even 32-bit code has MUL, which gives a 64-bit product with the low 32 bits in EAX and the high 32 bits in EDX, and DIV uses both registers too (EDX:EAX as the dividend).
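To make the fixed-point idea concrete, here is a minimal sketch in C rather than asm. Everything in it is an assumption for illustration: the Q16.16 format, the names fx_sin, fx_cos and fx_mul, and the tiny 16-steps-per-turn table (a real table would have far more entries). The 64-bit widening in fx_mul is the C counterpart of the product that a 32-bit multiply leaves in EDX:EAX.

```c
#include <stdint.h>

/* Q16.16 fixed point: 65536 represents 1.0 (format chosen for this sketch). */
#define FX_ONE 65536

/* Hypothetical quarter-wave table: sin of 0, 22.5, 45, 67.5 and 90 degrees,
   each scaled by 65536 and rounded. Far too coarse for real drawing. */
static const int32_t quarter_sin[5] = { 0, 25080, 46341, 60547, FX_ONE };

/* Sine for a step index with 16 steps per full turn, using symmetry so only
   a quarter of the wave has to be stored. */
int32_t fx_sin(unsigned step)
{
    unsigned idx;
    int32_t s;

    step &= 15;                    /* wrap around the full turn */
    idx = step & 7;                /* position within the half turn */
    if (idx > 4)
        idx = 8 - idx;             /* mirror the second quarter */
    s = quarter_sin[idx];
    return (step & 8) ? -s : s;    /* second half turn is negative */
}

/* Cosine is sine shifted by a quarter turn (4 of 16 steps). */
int32_t fx_cos(unsigned step)
{
    return fx_sin(step + 4);
}

/* Multiply by a Q16.16 factor: widen to 64 bits, then drop the 16 fraction
   bits. Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but true on mainstream x86 compilers. */
int32_t fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}
```

With a plain integer radius, fx_mul(radius, fx_sin(step)) gives the y offset in pixels and fx_mul(radius, fx_cos(step)) the x offset, with no float-to-int conversion anywhere.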
You can time that to be sure.
It's best to do the optimisation and timing in practical use as well, e.g. on the whole tunnel (stargate) or sphere (planet) code. Float-to-int conversion costs more than just a few cycles, and mixing SSE floating-point code with SSE2 integer code gets you some penalty.
For example, a circle can be anything from a 32x32 sprite to a big hi-res 4K HD screen: 32 diameter * pi (about 101 pixels around) vs 1080 diameter * pi (about 3393 pixels around). So a general 360-entry degree LUT works best for a circle about 360 pixels around; it has too many entries for a 32 diameter and too few for a 1080 diameter.
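A rough sketch of that arithmetic in C, so the step count follows the circle size instead of being fixed at 360. The names circle_steps and lut_index are made up for this post, and pi is approximated as 355/113 to stay in integers, so treat the results as approximate for very large diameters.

```c
#include <stdint.h>

/* Steps needed so neighbouring points on the circle are about one pixel
   apart: ceil(pi * diameter), with pi approximated as 355/113.
   Assumes the diameter is small enough that diameter * 355 fits in 32 bits. */
uint32_t circle_steps(uint32_t diameter)
{
    return (diameter * 355u + 112u) / 113u;
}

/* Map step i of n onto a table with table_size entries per full turn.
   With a 360-entry table, a small circle skips entries and a large
   circle reuses the same entry for several neighbouring pixels. */
uint32_t lut_index(uint32_t i, uint32_t n, uint32_t table_size)
{
    return (uint32_t)(((uint64_t)i * table_size) / n);
}
```

For a 32 diameter, circle_steps gives 101, so consecutive steps land about 3 entries apart in a 360-entry table. For a 1080 diameter it gives 3393, so roughly 9 consecutive pixels share one table entry unless you interpolate or use a bigger table.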