While work continues on a GoLink update (done, see version 1.0), I have started further planning for GoAsm. For x64 exception handling, FRAME should provide .pdata RUNTIME_FUNCTION entries and .xdata UNWIND_INFO describing the function's prolog stack usage (work in progress). Version 0.58 began with some FRAME and USEDATA adjustments required for this, and a few more are on the way. For your information:
x86/x64 LOCALFREE (done, see version 0.59)
Since LOCALFREE's stack re-allocation will usually alter what was recorded in the prolog UNWIND_INFO, LOCALFREE is no longer allowed in x86 and x64 modes (it remains allowed only for regular 32-bit code).
x64 ARGD ARGQ INVOKE (done, see version 0.62)
Since GoAsm does not do type checking, ARGD and ARGQ must be used with INVOKE to identify an argument as a 32-bit single-precision (ARGD) or a 64-bit double-precision (ARGQ) floating-point value. These variants of ARG use the XMM0-3 register matching the argument's position. A similar extension is also available for arguments given on the INVOKE line itself: D or Q, followed by a space, then the argument.
ARGD xmm0
ARGD [FP_DD]
ARGD eax
ARGD 4.32 ;uses r11
INVOKE Procedure
INVOKE Procedure, D 4.32, D eax, D [FP_DD], D xmm0
ARGQ xmm0
ARGQ [FP_DQ]
ARGQ rax
ARGQ 8.65 ;uses r11
INVOKE Procedure
INVOKE Procedure, Q 8.65, Q rax, Q [FP_DQ], Q xmm0
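To show why the D/Q marking matters, here is a small C model of the Windows x64 register assignment that ARG, ARGD and ARGQ feed into. It is a sketch of the calling convention only, not GoAsm's code; the names `ArgKind` and `arg_register` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels: ARG_INT for a plain ARG (integer or pointer),
   ARG_D for ARGD (single precision), ARG_Q for ARGQ (double precision). */
typedef enum { ARG_INT, ARG_D, ARG_Q } ArgKind;

/* Register carrying argument `pos` (0-based, in INVOKE order) under the
   Windows x64 convention, or NULL when the argument goes on the stack.
   The position alone picks the register; D and Q only differ in the
   size of the value placed in the xmm register. */
static const char *arg_register(int pos, ArgKind kind)
{
    static const char *const int_regs[4] = { "rcx", "rdx", "r8", "r9" };
    static const char *const xmm_regs[4] = { "xmm0", "xmm1", "xmm2", "xmm3" };

    assert(pos >= 0);
    if (pos >= 4)
        return NULL;                 /* 5th and later arguments: stack */
    return kind == ARG_INT ? int_regs[pos] : xmm_regs[pos];
}
```

Because the position decides the register, INVOKE Procedure, rax, D 4.32 would place the float in xmm1 (second position), not xmm0.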
x64 SHADOW (done, see version 0.62)
FRAME automatically saves the regular parameter registers (RCX, RDX, R8, R9) to the shadow space. Since GoAsm does not do type checking, SHADOW must be used with FRAME to identify a floating-point parameter passed in an XMM0-3 register. SHADOW can also be used to omit saving one or more of these registers, typically in a small procedure that uses the register directly before any function call that could overwrite it. In that case the parameter name will not refer to the proper value, since the register was never saved to the shadow space.
SHADOW xmm0, xmm1, r8, r9 ;a mixed register example
SHADOW ;no registers specified, no registers saved
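As a rough illustration of where a SHADOW list sends each register, this C sketch maps a register name to its parameter position and its home slot in the shadow space. Offsets are relative to RSP on entry to the procedure (where [RSP] holds the return address); GoAsm's FRAME addresses these slots through RBP instead, so the numbers are for orientation only. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Parameter position of a SHADOW register: rcx/xmm0 -> 0, rdx/xmm1 -> 1,
   r8/xmm2 -> 2, r9/xmm3 -> 3. Returns -1 for any other register. */
static int shadow_param_index(const char *reg)
{
    static const char *const int_regs[4] = { "rcx", "rdx", "r8", "r9" };
    static const char *const xmm_regs[4] = { "xmm0", "xmm1", "xmm2", "xmm3" };
    for (int i = 0; i < 4; i++)
        if (strcmp(reg, int_regs[i]) == 0 || strcmp(reg, xmm_regs[i]) == 0)
            return i;
    return -1;
}

/* Home slot of that parameter, relative to RSP on entry: 8h, 10h, 18h, 20h.
   Each parameter has one slot, whether it arrives in a general or xmm register. */
static int shadow_home_offset(const char *reg)
{
    assert(reg != NULL);
    int i = shadow_param_index(reg);
    return i < 0 ? -1 : 8 + 8 * i;
}
```

For SHADOW xmm0, xmm1, r8, r9 this gives slots 8h, 10h, 18h and 20h; an empty SHADOW line simply stores nothing.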
x64 USES (done, see version 0.62)
The xmm registers are allowed with USES within FRAME...ENDF. However, since the xmm registers are handled differently (stack space is allocated, properly aligned for x64, and they are saved with movaps, or movups for x86), it is clearer when comparing with the list file output to put them on a separate USES line, placed after the regular register USES line if there is one. Note that because this stack allocation shifts the LOCAL offsets, if USES lists an xmm register, the USES line(s) must be placed before any LOCAL line(s).
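The alignment part can be sketched in C. This assumes, hypothetically, a prolog of push rbp / mov rbp,rsp followed by one push per regular USES register and then the xmm save area; GoAsm's actual layout may differ. On entry RSP is 8 mod 16 (the CALL pushed the return address), so it is 16-byte aligned after push rbp and each further push shifts it by 8.

```c
#include <assert.h>

/* Bytes to allocate for saving n_xmm xmm registers with movaps after
   n_gpr regular USES pushes, under the assumed prolog described above.
   Each xmm register needs 16 bytes; an odd number of 8-byte pushes
   leaves RSP off by 8, so an 8-byte pad restores 16-byte alignment. */
static int xmm_save_area_bytes(int n_gpr, int n_xmm)
{
    assert(n_gpr >= 0 && n_xmm >= 0);
    if (n_xmm == 0)
        return 0;
    int pad = (n_gpr % 2 == 1) ? 8 : 0;
    return pad + 16 * n_xmm;
}
```

Since this allocation sits between the saved registers and the local data, it moves every LOCAL offset, which is why such a USES line has to come first.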
A procedure based on the less common USES...ENDU remains available for 32-bit code, but for /x86 and /x64 it will now need to be changed to a USES statement within FRAME...ENDF (since it is not a leaf function) to support the upcoming exception handling.
x86/x64 USEDATA
Currently there is a default SHIELDSIZE (100h bytes for 32-bit, 200h bytes for 64-bit, with a minimum of 20h) which allows for stack usage in the calling FRAME beyond what was done in its prolog, since GoAsm does not track this. The USEDATA prolog adjusts RSP in several steps by a fixed amount relative to RBP, but the resulting amount relative to RSP is not known to GoAsm at assembly time, and that amount is required for assembly-time UNWIND_INFO.
The change here would be to set the default SHIELDSIZE to 4h bytes for x86 and 8h bytes for x64 (unchanged for regular 32-bit code). These would also be the new minimum values, accounting only for the return address placed on the stack by the call in INVOKE. The USEDATA prolog would be simplified, and the epilog would restore RSP using LEA, the form valid for SEH. Note that this may require adjusting existing x86/x64 code to specify a different SHIELDSIZE matching the stack space actually used (e.g. when manually supplying arguments to a USEDATA procedure, plus the return address for the call). See also the following...
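The proposed SHIELDSIZE arithmetic is small enough to write down as a C sketch. The mode names and `extra_bytes` parameter are hypothetical labels; `extra_bytes` stands for whatever the calling FRAME put on the stack itself before the call, such as manually pushed arguments.

```c
#include <assert.h>

typedef enum { MODE_WIN32, MODE_X86, MODE_X64 } AsmMode;

/* Proposed default (and minimum) SHIELDSIZE per mode. */
static unsigned default_shield_size(AsmMode mode)
{
    switch (mode) {
    case MODE_X86: return 0x4;    /* 32-bit return address */
    case MODE_X64: return 0x8;    /* 64-bit return address */
    default:       return 0x100;  /* regular 32-bit code: unchanged */
    }
}

/* SHIELDSIZE needed under the proposal when the caller has pushed or
   allocated extra_bytes of its own before calling the USEDATA procedure. */
static unsigned required_shield_size(AsmMode mode, unsigned extra_bytes)
{
    assert(mode == MODE_X86 || mode == MODE_X64);
    return default_shield_size(mode) + extra_bytes;
}
```

For example, two manually pushed dword arguments in x86 mode would call for a SHIELDSIZE of 4h + 8h = 0Ch.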
x64 INVOKE
The current x64 approach is somewhat similar to 32-bit code, which pushes function arguments on the stack, except that the first 4 arguments are passed in registers (stack space for these 4 is still always allocated for each INVOKE and removed after the call). Because x64 requires the stack to be 16-byte aligned at a call, extra alignment code is also added with each x64 INVOKE.
This is where the USEDATA change above gets a bit tricky. Depending on the USES and LOCAL lists, there may be an extra 8 bytes for stack alignment, and if so they need to be included in the SHIELDSIZE.
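To see when that extra 8 bytes appears, here is a C sketch of the alignment check. It assumes, hypothetically, a prolog of push rbp / mov rbp,rsp, one push per USES register, then sub rsp for the LOCAL data (a multiple of 8), with INVOKE pushing arguments beyond the 4th and then allocating the 32-byte shadow space, which does not change alignment.

```c
#include <assert.h>

/* Extra bytes an x64 INVOKE needs so that RSP is 16-byte aligned at the
   CALL, under the assumed layout above. RSP is 16-byte aligned right
   after push rbp, so only what lies below that point matters. */
static int invoke_pad_bytes(int n_uses, int local_bytes, int n_args)
{
    assert(n_uses >= 0 && local_bytes >= 0 && local_bytes % 8 == 0);
    assert(n_args >= 0);
    int stack_args = n_args > 4 ? n_args - 4 : 0;
    int depth = 8 * n_uses + local_bytes + 8 * stack_args;
    return (depth % 16 == 0) ? 0 : 8;
}
```

With one USES register, no LOCAL data and a 4-argument INVOKE, the pad is 8 bytes; adding a single qword LOCAL removes it again.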
Because of this USEDATA issue, and also to produce better optimised and easier-to-follow code for x64 INVOKE, there is another approach. It involves reserving enough stack space in advance for the arguments (properly 16-byte aligned) and reusing it for each INVOKE, using mov for the non-register arguments (for ARG [Label], this 'memory to memory' mov would most likely go through the R11 register). However, to use this you would need to ensure that, at each x64 INVOKE, the stack has not changed from where it was after the prolog, which may still require adjusting existing x86/x64 code.
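A C sketch of the argument placement in this approach: with the area reserved at the bottom of the frame, argument `pos` (0-based) sits at [RSP+8*pos] at the CALL. Slots 0 to 3 are the shadow space (those values travel in registers), and later arguments are written with mov, staging a memory operand through R11 since x64 has no memory-to-memory mov. The function name and the emitted text are illustrative only, not GoAsm's actual code generation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes illustrative instructions for a memory argument at position pos.
   Returns the length written (as snprintf does), or 0 when the argument
   is passed in a register (rcx/rdx/r8/r9 or xmm0-3). */
static int emit_stack_arg(char *buf, size_t len, int pos, const char *mem_operand)
{
    assert(pos >= 0 && mem_operand != NULL);
    if (pos < 4)
        return 0;
    /* Decimal offset avoids hex literals that would need a leading 0. */
    return snprintf(buf, len, "mov r11,%s\nmov [rsp+%d],r11", mem_operand, 8 * pos);
}
```

For the 5th argument, emit_stack_arg(buf, sizeof buf, 4, "[Label]") produces mov r11,[Label] followed by mov [rsp+32],r11. No push or stack adjustment is involved, which is why the stack must be unchanged since the prolog.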
x64 FRAME / USEDATA / LOCAL
The above would be enabled per x64 FRAME (or USEDATA procedure) by specifying the maximum number of ARGs used by any INVOKE within it (or more, if you don't mind wasting stack space and would rather not be bothered with error messages). GoAsm is a single-pass assembler and does not look that far ahead, but it would give an error message with the ARG count if an INVOKE exceeds the maximum.
The syntax for this would be added to LOCAL as the keyword ARGS with a required [N], placed at the end of the LOCAL list, where N is the maximum number of arguments allowed for any INVOKE within the FRAME or USEDATA procedure (the minimum N would be 4, for the required shadow space). For example:
LOCAL Loc1, Loc2, ARGS[4] ;placed at end on same line
;also
LOCAL Loc1, Loc2
LOCAL ARGS[8] ;placed at end on a separate line
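The bookkeeping behind ARGS[N] can be sketched in C: the reserved area is 8 bytes per argument rounded up to a multiple of 16, N below 4 is rejected, and, since GoAsm is single pass, each INVOKE is checked against N as it is met. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes reserved for LOCAL ARGS[n], or -1 if n is below the 4 slots
   the shadow space requires. */
static int args_area_bytes(int n)
{
    if (n < 4)
        return -1;
    return (8 * n + 15) & ~15;   /* round 8*n up to a multiple of 16 */
}

/* True if an INVOKE with invoke_arg_count arguments fits in ARGS[declared_n];
   false is where an error message with the ARG count would be reported. */
static bool invoke_fits(int declared_n, int invoke_arg_count)
{
    assert(declared_n >= 4 && invoke_arg_count >= 0);
    return invoke_arg_count <= declared_n;
}
```

So ARGS[4] reserves 32 bytes, ARGS[5] 48 and ARGS[8] 64. Rounding the area size to 16 keeps it from disturbing alignment, but the rest of the frame still has to leave RSP 16-byte aligned.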
The use of ARGS would not alter x86 mode stack allocation or INVOKE.
That is it for now...