My test of the routine that reverses a 4096-byte array:
-------------------------------------------------
Intel(R) Core(TM)2 CPU E6600 @ 2.40GHz
Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
-------------------------------------------------
1.193 cycles for Reverse Array with PSHUFB
1.193 cycles for Reverse Array with PSHUFB
Any smarter code, or improvement?
Note: PSHUFB needs a CPU with SSSE3 support,
i.e. a Core 2 Duo or newer.
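
For anyone who wants to try the idea without the assembler source, here is a minimal sketch in C intrinsics. It is not the routine timed above, only an illustration of the same technique: load 16 bytes from each end, reverse each block with PSHUFB (`_mm_shuffle_epi8`), and store them swapped. All function names (`reverse_bytes`, `reverse_bytes_pshufb`, `reverse_bytes_scalar`, `reverse_selfcheck`) are made up for this example, and it assumes GCC or Clang, since it relies on `__attribute__((target))` and `__builtin_cpu_supports`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <tmmintrin.h>          /* SSSE3 intrinsics: _mm_shuffle_epi8 = PSHUFB */
#define HAVE_X86 1
#endif

/* Scalar fallback, also used for the leftover middle part:
   swap bytes from both ends, moving inward. */
static void reverse_bytes_scalar(uint8_t *buf, size_t len)
{
    for (size_t i = 0, j = len; i + 1 < j; ++i) {
        --j;
        uint8_t t = buf[i];
        buf[i] = buf[j];
        buf[j] = t;
    }
}

#ifdef HAVE_X86
/* In-place reversal, 16 bytes from each end per iteration. */
__attribute__((target("ssse3")))
static void reverse_bytes_pshufb(uint8_t *buf, size_t len)
{
    /* Shuffle mask: output byte i is taken from input byte 15 - i. */
    const __m128i mask = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                        7,  6,  5,  4,  3,  2, 1, 0);
    size_t lo = 0, hi = len;    /* [lo, hi) is the part not yet reversed */

    while (hi - lo >= 32) {
        /* Unaligned loads, so no alignment is assumed for buf. */
        __m128i a = _mm_loadu_si128((const __m128i *)(buf + lo));
        __m128i b = _mm_loadu_si128((const __m128i *)(buf + hi - 16));
        a = _mm_shuffle_epi8(a, mask);
        b = _mm_shuffle_epi8(b, mask);
        /* Store the reversed blocks at the opposite ends. */
        _mm_storeu_si128((__m128i *)(buf + lo), b);
        _mm_storeu_si128((__m128i *)(buf + hi - 16), a);
        lo += 16;
        hi -= 16;
    }
    /* Fewer than 32 bytes remain in the middle. */
    reverse_bytes_scalar(buf + lo, hi - lo);
}
#endif

/* Use PSHUFB when the CPU reports SSSE3, otherwise the scalar loop. */
static void reverse_bytes(uint8_t *buf, size_t len)
{
#ifdef HAVE_X86
    if (__builtin_cpu_supports("ssse3")) {
        reverse_bytes_pshufb(buf, len);
        return;
    }
#endif
    reverse_bytes_scalar(buf, len);
}

/* Fill n bytes with a pattern, reverse them, and verify the result.
   Returns 1 on success, 0 on mismatch. */
static int reverse_selfcheck(size_t n)
{
    static uint8_t buf[4096];
    assert(n <= sizeof buf);
    for (size_t i = 0; i < n; ++i)
        buf[i] = (uint8_t)(i * 7 + 3);
    reverse_bytes(buf, n);
    for (size_t i = 0; i < n; ++i)
        if (buf[i] != (uint8_t)((n - 1 - i) * 7 + 3))
            return 0;
    return 1;
}
```

Calling `reverse_selfcheck(4096)` exercises the same array size as the test above. With an aligned buffer, `_mm_load_si128` / `_mm_store_si128` could replace the unaligned versions; whether that or unrolling the loop helps on a given CPU would have to be measured.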
Frank