That's an old story, of course. IMHO Microsoft should at least have declared ecx non-volatile, because several special instructions use it implicitly (rep movs/stos, loop, jecxz, shifts by cl); that's why MB preserves it. One extra push/pop pair in API code, which is slow anyway, would cost next to nothing.
In 64-bit code, passing params seems to be a big issue. I would exclude MMX, because the MMX registers are aliased onto the x87 FPU registers and I like the FPU, but one could call that a matter of taste.
OTOH, why aren't the additional r8...r15 enough? Do we really need so many regs? Besides, with movd, movlps and movhps the XMM registers give you plenty of additional dword and qword storage. Wasn't one of the "important" arguments for 64-bit code that you get more registers? I was never short of registers in 32-bit code, so I really can't see why we suddenly need three or four times as many...
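To illustrate the "spare registers" point, here is a minimal sketch in C with SSE intrinsics, assuming an x86-64 target and GCC or Clang. The helper names are hypothetical. Compilers typically map `_mm_loadl_pi`/`_mm_loadh_pi` (and their store counterparts) to movlps/movhps, and `_mm_cvtsi32_si128`/`_mm_cvtsi128_si32` to movd, but the exact instructions they emit are not guaranteed.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2; also pulls in the SSE movlps/movhps intrinsics */

/* Hypothetical helper: park two qwords in one XMM register.
   _mm_loadl_pi / _mm_loadh_pi typically compile to movlps / movhps. */
static __m128 park_qwords(const uint64_t *lo, const uint64_t *hi)
{
    __m128 x = _mm_setzero_ps();
    x = _mm_loadl_pi(x, (const __m64 *)lo);  /* low 64 bits  <- *lo (pure move, no FP conversion) */
    x = _mm_loadh_pi(x, (const __m64 *)hi);  /* high 64 bits <- *hi */
    return x;
}

/* Hypothetical helper: copy both qwords back out of the register. */
static void unpark_qwords(__m128 x, uint64_t *lo, uint64_t *hi)
{
    _mm_storel_pi((__m64 *)lo, x);  /* *lo <- low 64 bits  (movlps store) */
    _mm_storeh_pi((__m64 *)hi, x);  /* *hi <- high 64 bits (movhps store) */
}

/* Hypothetical helpers: movd puts a dword into the low lane and reads it back. */
static __m128i park_dword(int32_t v) { return _mm_cvtsi32_si128(v); }
static int32_t get_dword(__m128i x)  { return _mm_cvtsi128_si32(x); }
```

Each access costs an extra move, so this is parking space for values rather than a drop-in replacement for a general-purpose register.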