Inspired by Vortex' code for PoAsm (http://masm32.com/board/index.php?topic=445.msg3261#new):
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
100 result UniStrLen
100 result UniStrLen2
100 result UniStrLen3
100 result UniStrLen4
100 result wStrLen
30 bytes for UniStrLen
20 bytes for UniStrLen2
20 bytes for UniStrLen3
15 bytes for UniStrLen4
11 bytes for wStrLen
227 cycles for UniStrLen
213 cycles for UniStrLen2
214 cycles for UniStrLen3
218 cycles for UniStrLen4
215 cycles for wStrLen
226 cycles for UniStrLen
213 cycles for UniStrLen2
214 cycles for UniStrLen3
218 cycles for UniStrLen4
215 cycles for wStrLen
MasmBasic does the 100-char string in about 150 cycles, but the attached source is pure Masm32 ;)
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
100 result UniStrLen
100 result UniStrLen2
100 result UniStrLen3
100 result UniStrLen4
100 result wStrLen
30 bytes for UniStrLen
20 bytes for UniStrLen2
20 bytes for UniStrLen3
15 bytes for UniStrLen4
11 bytes for wStrLen
167 cycles for UniStrLen
141 cycles for UniStrLen2
141 cycles for UniStrLen3
162 cycles for UniStrLen4
162 cycles for wStrLen
167 cycles for UniStrLen
145 cycles for UniStrLen2
141 cycles for UniStrLen3
162 cycles for UniStrLen4
162 cycles for wStrLen
--- ok ---
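The assembly source itself is in the attachment, not in the post. All five routines report 100 for the 100-character test string, so this sketch assumes they return the number of 16-bit code units before the terminating zero word. Under that assumption, a slow but obvious reference like the one below can be used to cross-check the results. The helper name `uni_str_len` is hypothetical and is not taken from the attached source.

```python
import struct


def uni_str_len(buf: bytes) -> int:
    """Hypothetical reference: count 16-bit little-endian code units
    before the first zero word, as the asm routines are assumed to do."""
    count = 0
    # Trim a trailing odd byte so iter_unpack sees whole 16-bit words.
    for (unit,) in struct.iter_unpack("<H", buf[: len(buf) & ~1]):
        if unit == 0:
            return count
        count += 1
    raise ValueError("no terminating zero word found")
```

Usage: `uni_str_len(("x" * 100).encode("utf-16-le") + b"\x00\x00")` returns 100. Like `wcslen` with a 16-bit `wchar_t`, this counts code units, not characters, so a surrogate pair counts as 2.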