Hello!
You appear to know a lot about these things.
Not really. I'm fairly new to asm myself, but I've been doing some really intensive training and learning from whatever source I can. As I'm currently writing a program in asm myself, I happened to benchmark shl, and in my tests it came out as a rather slow option if you are aiming for speed.
Have you never heard that lea is a handy, fast arithmetic calculator? I am using it like that, not to load an effective memory address.
I have! But I confess I only skimmed over the lea instructions. Sorry. Still, it's worth putting it on the clock to see whether it's the fastest option.
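For anyone following along, here is a minimal sketch of what "lea as a calculator" amounts to, written in C rather than asm so it can be compiled and checked anywhere. The function names are hypothetical, made up for illustration, and the asm in the comments is only the instruction shape each function models, not something a compiler is guaranteed to emit. lea computes base + index*scale + displacement, with the scale limited to 1, 2, 4 or 8, and it does not touch memory.

```c
#include <stdint.h>

/* x * 5, in the shape of `lea eax, [eax + eax*4]`:
   base + index * scale, with scale restricted to 1, 2, 4 or 8. */
uint32_t mul5_lea_style(uint32_t x) {
    return x + x * 4u;
}

/* x * 8 + 3, in the shape of `lea eax, [eax*8 + 3]`:
   scaled index plus a constant displacement, no base register. */
uint32_t mul8_add3_lea_style(uint32_t x) {
    return x * 8u + 3u;
}

/* The same x * 5 done as a shift followed by an add,
   i.e. the `shl` + `add` route (two instructions instead of one). */
uint32_t mul5_shift_style(uint32_t x) {
    return (x << 2) + x;
}
```

Both x * 5 versions wrap around modulo 2^32 exactly like a 32-bit register would, so they always agree. Which one is actually faster depends on the CPU and the surrounding code, so it still needs to be timed.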
I'm very curious to see how you and these guys can optimize this algo and how the asm behaves afterwards.
There is also a suggestion from Hutch, which I read some time ago, that is worth taking into consideration: build the asm side in a dedicated IDE, not in VS.