I opened a new thread called "Timings for bswap, ror, imul, push+pop vs mov [esp+x], nnn, lodsd vs mov eax,.." in the Lab.

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
23 cycles for 100 * imul 10
54 cycles for 100 * lea: *10
4871 cycles for 100 * lodsd (25 DWORDs)
4863 cycles for 100 * mov eax, [esi] + add esi, 4
58 cycles for 100 * lea10, add eax
57 cycles for 100 * lea10, shl eax, 1
27 cycles for 100 * bswap
62 cycles for 100 * ror 16