If you're spoiled by a 4.5 GHz CPU on a PC, you don't notice any lag or problems with your code.
But if you program for a 1.6+ GHz console, I guess the goal is to get the most out of SIMT first, to distribute the work across many cores; maybe the performance-critical sections then get converted from scalar code to SIMD.
It must also be an advantage that the same code then runs more easily on Atom laptops.
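
To make the scalar-to-SIMD idea concrete, here is a rough sketch in C of the same loop written both ways. The function names `add_scalar` and `add_sse` are made up for illustration, and it assumes an x86 CPU with SSE (which Atom has); a real hot loop would need profiling before deciding it is worth converting.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics, x86/x86-64 only */

/* Scalar version: one float add per iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SSE version: four float adds per iteration, scalar loop for the tail. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                     /* leftover 0..3 elements */
        out[i] = a[i] + b[i];
}
```

Both functions should give identical results for the same input, so the scalar one can stay around as a reference while the SIMD one is tuned.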
Also, one SIMT question:
So I get a separate stack space for each worker thread? Then I could keep the thread running inside a PROC and do stack tricks there without affecting the main thread's stack?
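
As far as I know the answer is yes: every thread gets its own stack. Here is a small sketch in C that checks it, assuming POSIX threads (the post doesn't name a thread API, so that part is my assumption; on Windows, `CreateThread` likewise takes a stack size for the new thread). The names `worker`, `worker_stack` and `worker_has_own_stack` are invented for the demo. The worker is handed an explicit block of memory as its stack and reports whether its own local variable lives inside that block.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define WORKER_STACK_SIZE (256 * 1024)   /* arbitrary size picked for the demo */

/* Memory handed to the worker thread as its private stack. */
static _Alignas(4096) unsigned char worker_stack[WORKER_STACK_SIZE];

/* True if address p lies inside worker_stack. */
static int in_worker_stack(const volatile void *p)
{
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)worker_stack;
    return a >= lo && a < lo + WORKER_STACK_SIZE;
}

/* Hypothetical worker: reports whether its own local sits in worker_stack. */
static void *worker(void *arg)
{
    volatile char scratch[128];          /* lives on this thread's stack */
    scratch[0] = 1;
    *(int *)arg = in_worker_stack(scratch);
    return NULL;
}

/* Returns 1 if the worker's local was on its private stack
   and the calling thread's local was not. */
int worker_has_own_stack(void)
{
    volatile char caller_local = 0;
    int worker_local_inside = 0;
    pthread_attr_t attr;
    pthread_t t;

    int rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setstack(&attr, worker_stack, WORKER_STACK_SIZE);
    assert(rc == 0);
    rc = pthread_create(&t, &attr, worker, &worker_local_inside);
    assert(rc == 0);
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);

    return worker_local_inside && !in_worker_stack(&caller_local);
}
```

Calling `worker_has_own_stack()` from the main thread should return 1. Whatever the worker pushes, pops or overwrites stays inside its own block, so stack tricks there should not touch the main thread's stack as long as they stay within the worker's stack size.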