The 32-bit CL.EXE used to have an option to output assembler code, and while it was scruffy-looking stuff, it generally worked OK. It was a reasonable way to take a C algo, output it as asm, then optimise the asm as well as tidy it up.
It still does. Sometimes we can get smaller executables simply by running an assembler on that output.
Gaining speed is more difficult, but sometimes possible.
It is not always possible. If I provide some pseudo code and tell an ASM programmer, even a good one, to code it without checking what a C/C++ compiler would do, most times the compiler will do better. So looking first at what the compiler does and optimizing from there may be fruitful. Trying to beat it blindly will likely fail.
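For anyone who wants to try the "look at the compiler output first" approach, here is a minimal sketch. The file name `sum_bytes.c` and the function `sum_bytes` are made up for illustration and are not from this thread. With MSVC, the listing option is `/FA` (`/FAs` interleaves the C source as comments); other compilers use different switches.

```c
/* sum_bytes.c: hypothetical example, not from the thread.
 *
 * With MSVC, an assembly listing can be requested with /FA
 * (/FAs interleaves the C source as comments), for example:
 *
 *     cl /c /O2 /DNDEBUG /FAs sum_bytes.c
 *
 * which should write sum_bytes.asm alongside the object file.
 */
#include <assert.h>
#include <stddef.h>

/* Sum the bytes of a buffer: a small routine worth inspecting as asm. */
unsigned int sum_bytes(const unsigned char *data, size_t len)
{
    unsigned int total = 0;
    size_t i;

    /* Precondition check; compiled out when NDEBUG is defined. */
    assert(data != NULL || len == 0);

    for (i = 0; i < len; ++i)
        total += data[i];
    return total;
}
```

The idea is to read the generated listing for the loop before writing any asm by hand, then tidy or tune that listing rather than starting from a blank page.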
