The problem as I see it is that you are shifting across a wide variety of calling conventions and effectively blurring the distinctions between them. We all know how 32 bit Windows STDCALL worked with its push/call notation, and the specs for Microsoft x64 Windows are publicly available. There is no leakage across the two.
The Microsoft x64 ABI FASTCALL does not use PUSH to pass arguments at all. The notion of a PUSH order may be convenient when you are dealing with a multitude of different calling conventions, but it misrepresents the 64 bit Windows FASTCALL. As I am sure you understand the M$ FASTCALL, the obvious point is that when you write the 1st 4 args to registers,
mov rcx, 1    ; 1st argument
mov rdx, 2    ; 2nd argument
mov r8, 3     ; 3rd argument
mov r9, 4     ; 4th argument
is the same as
mov r8, 3     ; 3rd argument
mov rcx, 1    ; 1st argument
mov r9, 4     ; 4th argument
mov rdx, 2    ; 2nd argument
Identical data ends up in the 1st 4 registers even though the writes happen in a different order. The order simply does not matter as long as each of the 4 arguments is written to the correct register. The idea of PUSH order is incorrect here; what matters is where each value is WRITTEN, not the order of the writes. The 5th and higher arguments are written to the stack after the 4 shadow space positions, and they don't have to be written in PUSH order either. If you write the correct data to each 64 bit location on the stack, you can write it in any order you like.
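
As a rough sketch of the stack part, here is what a six argument call could look like. SixArgs is a hypothetical function made up for illustration, and the fragment assumes RSP is already 16 byte aligned before the SUB. The offsets follow the published Microsoft x64 layout: the caller reserves 32 bytes of shadow space at [rsp] through [rsp+31], so the 5th argument lives at [rsp+32] and the 6th at [rsp+40].

```asm
; Sketch only. SixArgs is a hypothetical six-argument function.
; Assumes RSP is 16-byte aligned before this fragment runs.
    sub  rsp, 48                  ; 32 bytes shadow space + 16 bytes for args 5 and 6
    mov  qword ptr [rsp+40], 6    ; 6th argument, deliberately written first
    mov  r9, 4                    ; 4th argument
    mov  qword ptr [rsp+32], 5    ; 5th argument
    mov  rcx, 1                   ; 1st argument
    mov  r8, 3                    ; 3rd argument
    mov  rdx, 2                   ; 2nd argument
    call SixArgs                  ; callee sees the same layout whatever the write order
    add  rsp, 48                  ; release shadow space and stack argument slots
```

Shuffling the six MOV lines into any other order leaves the same values in the same registers and stack slots at the CALL, which is the whole point.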