Yup, a little late, but I wanted to test the performance of my new toy...
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)

27111 cycles for 100 * MbInstr 0
24259 cycles for 100 * MbInstr 1
26501 cycles for 100 * MbInstr 2
23610 cycles for 100 * MbInstr 4
24436 cycles for 100 * crt_strstr
27061 cycles for 100 * M32 find$

26900 cycles for 100 * MbInstr 0
29255 cycles for 100 * MbInstr 1
29007 cycles for 100 * MbInstr 2
27196 cycles for 100 * MbInstr 4
28573 cycles for 100 * crt_strstr
26529 cycles for 100 * M32 find$

29880 cycles for 100 * MbInstr 0
24030 cycles for 100 * MbInstr 1
23902 cycles for 100 * MbInstr 2
25233 cycles for 100 * MbInstr 4
22454 cycles for 100 * crt_strstr
26943 cycles for 100 * M32 find$

18 bytes for MbInstr 0
18 bytes for MbInstr 1
18 bytes for MbInstr 2
18 bytes for MbInstr 4
22 bytes for crt_strstr
15 bytes for M32 find$

97 = eax MbInstr 0
97 = eax MbInstr 1
97 = eax MbInstr 2
97 = eax MbInstr 4
97 = eax crt_strstr
97 = eax M32 find$
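
The three passes jump around quite a bit (MbInstr 1 alone ranges from 24030 to 29255 cycles), so it may help to look at the fastest pass and the mean per routine rather than any single line. A rough Python sketch of that, not part of the testbed: the cycle counts are copied from the output above, and `runs` and `summarize` are names made up for this illustration.

```python
# Cycle counts copied from the three passes above (each figure covers 100 calls).
runs = {
    "MbInstr 0":  [27111, 26900, 29880],
    "MbInstr 1":  [24259, 29255, 24030],
    "MbInstr 2":  [26501, 29007, 23902],
    "MbInstr 4":  [23610, 27196, 25233],
    "crt_strstr": [24436, 28573, 22454],
    "M32 find$":  [27061, 26529, 26943],
}

def summarize(runs):
    """Return {name: (fastest pass, mean over passes)} for each routine."""
    return {name: (min(c), sum(c) / len(c)) for name, c in runs.items()}

summary = summarize(runs)

# List the routines from fastest to slowest best pass.
for name, (best, avg) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{name:<11} best {best:>6}  mean {avg:>8.0f}")
```

With only three passes, and a spread inside one routine that is about as large as the gaps between routines, the ranking should be read as indicative at most.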

:P