hutch, I thought you'd appreciate a little humor to lighten your day! But the main reason I didn't answer: I couldn't find my previous posts from long ago that went into this, and I don't want to get into a long discussion about a trivial error I might make recalling how it goes. Anyway - mineiro is right, but here's my take on it (with probably a trivial error).
The Win64 ABI (the 64-bit "fastcall") passes the first four parameters in registers rcx, rdx, r8, r9. After that they go on the stack. But the strange thing is you must still allow four 8-byte slots (20h bytes) on the stack for those four, even though you don't send any data in them. That's called spill or shadow space. The called routine can use that space to store the four registers if it wants. It's hokey but that's MS for you.
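To make the slot layout concrete, here is a small sketch in C rather than assembly, so the arithmetic can be checked on any compiler. The helper name arg_slot_offset is made up for illustration; it assumes 1-based argument numbers and gives the offset from rsp at the moment of the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any API.
   Returns the offset from rsp (as seen by the caller right before the
   call instruction) of the stack slot that belongs to argument n.
   Slots 1..4 are the shadow space: the values travel in rcx, rdx, r8, r9
   and the slots are only reserved. Slots 5 and up hold real arguments. */
static size_t arg_slot_offset(unsigned n)
{
    assert(n >= 1);               /* arguments are numbered from 1 */
    return (size_t)(n - 1) * 8;   /* every slot is one qword */
}
```

So a fifth argument would be stored at [rsp+20h], right above the four reserved slots, and a sixth at [rsp+28h].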
The other requirement is that when you make a call, the stack must be on a 16-byte boundary: rsp = ???????0h. The call puts the return address on the stack and jumps to the called routine, so when that routine starts, rsp ends in 8h. As long as everyone follows the rules it will always be that way. The same thing has occurred in your own routine: when it started, you were on an 8-boundary. Therefore you have to add one more qword (8 bytes) to get back to 0h. That's why you need five 8's in all: 40, or 28h. One of them rounds it up to 0h, the other four (20h) are the actual spill space.
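The arithmetic above can be sketched in C (again just to make the numbers checkable, not as anything you'd ship). The name frame_bytes is invented here, and the sketch assumes the routine starts with rsp ending in 8h and pushes nothing before the sub rsp.

```c
#include <assert.h>

/* Hypothetical helper: how many bytes to subtract from rsp on entry
   to a routine whose biggest call passes max_args arguments.
   Assumes rsp ends in 8h on entry and no pushes happen before the sub. */
static unsigned frame_bytes(unsigned max_args)
{
    /* shadow space is always four slots, even for fewer arguments */
    unsigned slots = max_args < 4 ? 4 : max_args;
    unsigned bytes = slots * 8;

    /* the +8 is the return address already sitting on the stack */
    if ((bytes + 8) % 16 != 0)
        bytes += 8;               /* one padding qword to reach ...0h */

    assert((bytes + 8) % 16 == 0);
    return bytes;
}
```

With up to four arguments this gives 28h, the "five 8's" from above. Five arguments also give 28h, because the fifth argument takes the place of the padding qword; six give 38h.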
You mentioned it works only with exactly that number; no, it's ok with 38h, 48h, etc. (any larger value that still ends in 8h). Whatever amount you subtract, you have to add the same amount back to rsp before returning from your routine.
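A quick way to see which amounts work, sketched in C. The function adjust_ok is a made-up name, and it assumes the same simple case as before: rsp ends in 8h on entry, no pushes, and no call with more than four arguments.

```c
#include <stdbool.h>

/* Hypothetical check: is `bytes` an acceptable amount for sub rsp
   at the top of a routine that only makes calls with <= 4 arguments?
   Two conditions: room for the 20h bytes of shadow space, and
   rsp back on a 16-byte boundary once the 8-byte return address
   already on the stack is counted. */
static bool adjust_ok(unsigned bytes)
{
    return bytes >= 0x20 && (bytes + 8) % 16 == 0;
}
```

By this check 28h, 38h and 48h pass, while 20h and 30h fail: they leave room for the shadow space but leave rsp ending in 8h at the call.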
The reason for insisting on this standard alignment is that XMM registers must go on the stack at 16-byte boundaries; some of the instructions (the aligned moves such as movaps and movdqa) fault if the address isn't aligned.
It's important to note the following fact, which has tripped up many people. When I was learning I found long threads on StackExchange (or whatever) that never did get this point straight. MessageBox is one of the few simple functions that really does insist on this alignment! printf, for instance, does not. So if you experiment on many other simple calls you wind up thinking you have more latitude. But then MessageBox will get you, and so will some others. Best to follow the rules at all times; although, for convenience and speed, my code breaks this rule often - when I know all subsequent calls will be "safe".
Why does MessageBox behave like that? I don't doubt it's because they make a call to a window routine to put up that box. Whereas most other basic functions don't, and their code just never uses XMM registers.
There are other mistakes in all the tutorials you'll see, which I'll mention briefly. They say all floating-point args are passed in XMM registers. Not always: for VARARG functions they have to go in the GPRs as well. For instance printf gets its floats from the GPRs and will ignore any data you send in XMM. I found one ref somewhere on MS that explained the VARARG handling correctly. Other MS pages, and (iirc) all others, got it wrong. I actually don't remember the details. See the way I did it in my nvk Macro, in the "Yet Another Macro" post; it's about 40 posts ago in the 64-bit forum. There was also a post a year ago, or so, where I answered all this in detail. It's not in the 64-bit forum though, because the OP (I think it was fearless?) asked the question somewhere else. Generally, you could do a lot worse than simply review all my 64-bit work from that period.
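On the printf point: what goes in the GPR is the raw 64-bit pattern of the double, not a converted integer. A small C sketch of what that pattern looks like; double_bits is a made-up helper name, and it assumes the usual IEEE 754 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 64 bits of a double, i.e. the value you
   would want in rdx, r8 or r9 when handing a double to a VARARG
   function. memcpy copies the bits unchanged; a cast to an integer
   type would convert the value instead. */
static uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```

For example 1.5 comes out as 3FF8000000000000h. Also remember that a C float passed through the "..." is promoted to double, so printf's %f always expects the 8-byte form.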