The general slowness of the MSVCRT functions can be partially explained by the need to run on older processors. For my test I used the Microsoft strcmp source from the PSDK, compiled with the range of optimizations provided with the VC++ Toolkit 2003 compiler.
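For readers without the PSDK source to hand, a plain byte-at-a-time strcmp looks roughly like the sketch below. This is not the Microsoft source, and the name `strcmp_sketch` is made up for illustration; it only shows the kind of small scalar loop that the code-generation options get to work on.

```c
/* Hypothetical stand-in for the function under test; NOT the PSDK source.
   Bytes are compared as unsigned char, as the C standard requires. */
int strcmp_sketch(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Advance while the bytes match and the terminator has not been reached. */
    while (*p != '\0' && *p == *q) {
        ++p;
        ++q;
    }

    /* Negative, zero, or positive, depending on the first differing byte. */
    return (int)*p - (int)*q;
}
```

A loop like this has one load, one compare, and one branch per byte, so the compiler has little room to do anything different from one target processor to the next.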

Windows 2000 SP4, P6:

| Function | Pass 1 (cycles) | Pass 2 (cycles) | Pass 3 (cycles) |
|---|---|---|---|
| crt_strcmp | 1105 | 1106 | 1106 |
| strcmp_gb | 882 | 883 | 884 |
| strcmp_g3 | 883 | 883 | 883 |
| strcmp_g4 | 882 | 883 | 883 |
| strcmp_g5 | 882 | 883 | 883 |
| strcmp_g6 | 882 | 883 | 883 |
| strcmp_g7 | 1098 | 1098 | 1097 |
| strcmp_g7_sse2 | 1098 | 1098 | 1097 |

Windows XP SP3, P4 Northwood:

| Function | Pass 1 (cycles) | Pass 2 (cycles) | Pass 3 (cycles) |
|---|---|---|---|
| crt_strcmp | 633 | 619 | 626 |
| strcmp_gb | 1318 | 1317 | 1317 |
| strcmp_g3 | 1316 | 1316 | 1315 |
| strcmp_g4 | 1316 | 1316 | 1316 |
| strcmp_g5 | 1316 | 1316 | 1316 |
| strcmp_g6 | 1319 | 1316 | 1316 |
| strcmp_g7 | 893 | 904 | 904 |
| strcmp_g7_sse2 | 911 | 914 | 915 |

Note how much lower the cycle count is for the XP SP3 MSVCRT strcmp (about 620 to 630 cycles, against about 1105 under Windows 2000 SP4), even though it is running on a processor with a lower IPC than the P3.

The relevant parts of the code-generation options:

`/G3 optimize for 80386`

`/G4 optimize for 80486`

`/G5 optimize for Pentium`

`/G6 optimize for PPro, P-II, P-III`

`/G7 optimize for Pentium 4 or Athlon`

`/GB optimize for blended model (default)`

`/arch:<SSE|SSE2> minimum CPU architecture requirements, one of:`

`SSE - enable use of instructions available with SSE enabled CPUs`

`SSE2 - enable use of instructions available with SSE2 enabled CPUs`

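The SSE2 option had no meaningful effect: strcmp_g7 and strcmp_g7_sse2 are identical on the P6 and differ by at most 18 cycles on the P4.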
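For anyone wanting to repeat this kind of comparison, here is a minimal sketch of a timing helper. It is not the harness used for the numbers above. It assumes a current compiler that provides the `__rdtsc` intrinsic (it falls back to `clock()` elsewhere), and `read_ticks`, `strcmp_fn`, and `min_ticks_per_batch` are made-up names. It keeps the smallest count over several batches, so a batch disturbed by an interrupt or a task switch is discarded.

```c
#include <stdint.h>
#include <string.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>                 /* __rdtsc on MSVC */
#define SKETCH_HAVE_TSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>              /* __rdtsc on GCC/Clang */
#define SKETCH_HAVE_TSC 1
#else
#include <time.h>                   /* fallback: clock() ticks, not cycles */
#define SKETCH_HAVE_TSC 0
#endif

/* Signature shared by every strcmp variant under test. */
typedef int (*strcmp_fn)(const char *, const char *);

/* Read the time-stamp counter where available, clock() ticks otherwise. */
static uint64_t read_ticks(void)
{
#if SKETCH_HAVE_TSC
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}

/* Time `batches` batches of `calls_per_batch` calls to fn(a, b) and return
   the smallest tick count seen for one batch. Returns UINT64_MAX when
   batches is zero or negative. */
uint64_t min_ticks_per_batch(strcmp_fn fn, const char *a, const char *b,
                             int batches, int calls_per_batch)
{
    uint64_t best = UINT64_MAX;
    volatile int sink = 0;          /* keeps the calls from being optimized away */
    int i, j;

    for (i = 0; i < batches; ++i) {
        uint64_t start = read_ticks();
        uint64_t elapsed;

        for (j = 0; j < calls_per_batch; ++j)
            sink = fn(a, b);

        elapsed = read_ticks() - start;
        if (elapsed < best)
            best = elapsed;
    }

    (void)sink;
    return best;
}
```

Calling `min_ticks_per_batch(strcmp, s1, s2, 10, 1000)` once per variant gives directly comparable figures, provided every variant is given the same two strings. The result includes loop and call overhead, so only differences between variants are meaningful.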