Test of a FASTCALL library.

Started by hutch--, January 30, 2016, 07:27:05 PM

hutch--

Passing arguments in registers has been with us for nearly an eternity in computer terms, from before DOS interrupts to the current Win64 ABI. What I have had in mind, in what is now the twilight of Win32 (even if it's a long twilight), is producing a consistent system that is Win32 ABI compliant and can pick up the performance gains, and in some contexts the simplicity gains, of passing up to 3 arguments in registers (EAX, ECX & EDX). An added advantage is the ability to use EBP without the complicated ESP adjustments that are required in stack-based procedures that have no stack frame.

I have done some benchmarking on what to do with procedures that have more than 3 arguments passed to them, and while registers are the fastest, global variables are nearly as fast and clearly faster than using the stack. As registers are themselves global in scope, the use of global scope memory operands is unproblematic; all it would require is a list of global scope variables included in the app, and you could have as many as you like.

I have attached a simple test library that handles a number of single argument procedures, arg in EAX, result in EAX. The spec I had in mind was using EAX, ECX & EDX in that order for the first 3 args then global scope variables for any other args. Return values would be in the same 3 registers, EAX, ECX and EDX in that order.
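
As a rough illustration of the spec described above (not code from the attached library), here is a small C model of the split between three direct arguments and global "overflow" arguments. C cannot portably pin parameters to EAX, ECX and EDX, so the three parameters only stand in for those registers. The names `g_arg4`, `g_arg5` and `sum5` are hypothetical.

```c
/* Hypothetical global slots standing in for a 4th and 5th argument.
   In the proposed scheme these would be global scope variables that
   the application includes once and every procedure can see. */
static int g_arg4;
static int g_arg5;

/* Model of a 5-argument procedure under the proposed spec.
   a, b and c stand in for EAX, ECX and EDX (C does not guarantee that
   mapping); the remaining arguments are read from the globals, which
   the caller must fill in before making the call. */
static int sum5(int a, int b, int c)
{
    return a + b + c + g_arg4 + g_arg5;
}
```

A caller would store 4 in `g_arg4` and 5 in `g_arg5` and then call `sum5(1, 2, 3)`, which returns 15. The trade-off is the same one globals always carry: a procedure written this way is not re-entrant, since a nested or concurrent call can overwrite the slots.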

jj2007

Works fine.

Of course, it requires a bit of programming discipline to ensure that the arguments, once the point of calling the proc is reached, are already in eax, ecx and edx.

hutch--

The attached test piece is a quick knife and fork of an algo in the masm32 library and it is to test a couple of things, a FASTPROC with 2 arguments and a MACRO to simulate a local in the uninitialised data section for a FASTPROC procedure that does not allocate locals on the stack. It calls the no stack version of "strlen?" to determine 2 lengths and scans through the source string counting the search text returning the result in EAX. The algo could probably do with a complete rewrite but it works OK as it is.
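
For readers without the attachment, the shape of that algorithm can be sketched in C. This is only a model under stated assumptions, not the masm32 routine: the name `count_text` is hypothetical, a `static` variable stands in for the macro-simulated local in the uninitialised data section, and matches are assumed not to overlap, which the post does not specify.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of pat in src (hypothetical name and signature). */
static size_t count_text(const char *src, const char *pat)
{
    /* Static storage lives in the uninitialised data section rather than
       on the stack, which is the role the simulated local plays. The cost
       is that the function is neither re-entrant nor thread-safe. */
    static size_t count;

    /* Two length calls, as in the description of the test piece. */
    size_t slen = strlen(src);
    size_t plen = strlen(pat);

    count = 0;
    if (plen == 0 || plen > slen)
        return 0;

    /* Scan the source, comparing the search text at each position. */
    for (size_t i = 0; i + plen <= slen; ) {
        if (memcmp(src + i, pat, plen) == 0) {
            ++count;
            i += plen;   /* assumption: matches do not overlap */
        } else {
            ++i;
        }
    }
    return count;
}
```

Under these assumptions, searching "abcabcabc" for "abc" gives 3, and searching "aaaa" for "aa" gives 2 rather than 3.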