Passing arguments in registers has been with us for nearly an eternity in computer terms, from before DOS interrupts to the current Win64 ABI. What I have had in mind, in what is now the twilight of Win32 (even if it's a long twilight), is producing a consistent system that is Win32 ABI compliant and can pick up the performance gains, and in some contexts the simplicity gains, of passing up to 3 arguments in registers (EAX, ECX & EDX). An added advantage is the ability to use EBP without the complicated ESP adjustments that are required in stack based procedures that have no stack frame.
I have done some benchmarking on what to do with procedures that take more than 3 arguments. While registers are the fastest, global variables are nearly as fast and clearly faster than using the stack. As registers are themselves global in scope, the use of global scope memory operands is unproblematic. All it would require is a list of global scope variables included in the app, and you could have as many as you like.
I have attached a simple test library that handles a number of single argument procedures, with the arg in EAX and the result in EAX. The spec I had in mind uses EAX, ECX & EDX in that order for the first 3 args, then global scope variables for any other args. Return values would be in the same 3 registers, EAX, ECX and EDX, in that order.
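
To make the shape of the spec easier to see, here is a rough model of it in C. This is only a sketch and is not code from the attached library: C gives no control over which registers are used, so the three parameters merely stand in for EAX, ECX and EDX, and the returned struct stands in for the three return registers. The names g_arg4, g_arg5, Regs3, sum_min_max5 and demo_call are all made up for the illustration.

```c
/* Hypothetical global scope slots for arguments 4 and 5.
   In the real thing these would be memory operands declared once in the app. */
int g_arg4;
int g_arg5;

/* Stand-in for the three return registers, in spec order. */
typedef struct {
    int eax;   /* first return value  */
    int ecx;   /* second return value */
    int edx;   /* third return value  */
} Regs3;

/* A five argument procedure under the proposed convention:
   args 1-3 arrive in "EAX, ECX, EDX", args 4-5 are read from globals.
   Returns the sum, the minimum and the maximum of the five values. */
Regs3 sum_min_max5(int eax, int ecx, int edx)
{
    int v[5] = { eax, ecx, edx, g_arg4, g_arg5 };
    Regs3 r = { 0, v[0], v[0] };

    for (int i = 0; i < 5; i++) {
        r.eax += v[i];                    /* running sum */
        if (v[i] < r.ecx) r.ecx = v[i];   /* running minimum */
        if (v[i] > r.edx) r.edx = v[i];   /* running maximum */
    }
    return r;
}

/* The caller loads the global slots first, then the three "register" args. */
Regs3 demo_call(void)
{
    g_arg4 = 40;
    g_arg5 = 5;
    return sum_min_max5(10, 20, 30);
}
```

In an assembler version the caller would do the same thing in the same order: write the extra args to the global variables, load EAX, ECX and EDX, then make the call with nothing pushed on the stack.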