
Upcoming GoAsm x86/x64 changes

Started by wjr, September 08, 2013, 09:27:37 AM



While progress is being made on a GoLink update (done, see version 1.0), I have started more planning for GoAsm. For x64 exception handling, FRAME should provide .pdata RUNTIME_FUNCTION entries and .xdata UNWIND_INFO describing the function's prolog stack usage (work in progress). Version 0.58 started with some FRAME and USEDATA adjustments required for this, and a few more are on the way. For your info:
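For reference, the two records FRAME would have to emit are documented by Microsoft for x64 exception handling. The Python sketch below (ctypes, not GoAsm) mirrors those layouts and encodes a simplified prolog of just push rbp / sub rsp,20h. The helper names push_nonvol and alloc_small are my own, and a real FRAME prolog that also sets RBP as a frame pointer would additionally need a UWOP_SET_FPREG entry.

```python
import ctypes

# Layouts as documented for Windows x64 exception handling (.pdata / .xdata).
class RUNTIME_FUNCTION(ctypes.Structure):
    # One .pdata entry per non-leaf function; all three fields are RVAs.
    _fields_ = [
        ("BeginAddress", ctypes.c_uint32),
        ("EndAddress", ctypes.c_uint32),
        ("UnwindData", ctypes.c_uint32),  # RVA of the function's UNWIND_INFO
    ]

class UNWIND_CODE(ctypes.Structure):
    # One 2-byte slot describing a single prolog operation.
    _fields_ = [
        ("CodeOffset", ctypes.c_uint8),   # prolog offset just past the instruction
        ("UnwindOp", ctypes.c_uint8, 4),
        ("OpInfo", ctypes.c_uint8, 4),
    ]

class UNWIND_INFO_HEADER(ctypes.Structure):
    # Fixed 4-byte header; CountOfCodes UNWIND_CODE slots follow it.
    _fields_ = [
        ("Version", ctypes.c_uint8, 3),
        ("Flags", ctypes.c_uint8, 5),
        ("SizeOfProlog", ctypes.c_uint8),
        ("CountOfCodes", ctypes.c_uint8),
        ("FrameRegister", ctypes.c_uint8, 4),
        ("FrameOffset", ctypes.c_uint8, 4),
    ]

UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_SMALL = 2
RBP = 5  # x64 register number as used in OpInfo / FrameRegister

def push_nonvol(code_offset, reg):
    return UNWIND_CODE(code_offset, UWOP_PUSH_NONVOL, reg)

def alloc_small(code_offset, size):
    # ALLOC_SMALL encodes 8..128 bytes as OpInfo * 8 + 8.
    if size % 8 or not 8 <= size <= 128:
        raise ValueError("UWOP_ALLOC_SMALL covers 8..128 bytes in steps of 8")
    return UNWIND_CODE(code_offset, UWOP_ALLOC_SMALL, size // 8 - 1)

# Prolog "push rbp" (1 byte) then "sub rsp,20h" (4 bytes); codes are stored in reverse order.
codes = [alloc_small(5, 0x20), push_nonvol(1, RBP)]
```

This is exactly why anything that moves RSP after the prolog (LOCALFREE, below) is a problem: the unwinder only knows the amounts recorded here.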

x86/x64 LOCALFREE (done, see version 0.59)
Since this stack reallocation would usually alter what was recorded in the prolog UNWIND_INFO, LOCALFREE would no longer be allowed in x86 and x64 modes (it remains allowed only for regular 32-bit code).

x64 ARGD ARGQ INVOKE (done, see version 0.62)
Since GoAsm does not do type checking, ARGD and ARGQ need to be used with INVOKE to identify whether an argument is a 32-bit single-precision (ARGD) or 64-bit double-precision (ARGQ) floating-point value. These variations of ARG use the appropriate XMM0-3 register. A similar extension has also been added for arguments passed on the same line as INVOKE: D or Q, followed by a space character, then the argument.

ARGD xmm0
ARGD eax
ARGD 4.32 ;uses r11
INVOKE Procedure

INVOKE Procedure, D 4.32, D eax, D [FP_DD], D xmm0

ARGQ xmm0
ARGQ rax
ARGQ 8.65 ;uses r11
INVOKE Procedure

INVOKE Procedure, Q 8.65, Q rax, Q [FP_DQ], Q xmm0
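Why the type has to be spelled out: under the Windows x64 calling convention the register is picked by argument position, and a floating-point argument in position n goes in XMMn instead of the nth integer register. A small Python sketch of that mapping follows; the function assign_args is hypothetical, and the variadic case (where a float is also copied to the matching integer register) is left out.

```python
INT_REGS = ["rcx", "rdx", "r8", "r9"]
FP_REGS = ["xmm0", "xmm1", "xmm2", "xmm3"]

def assign_args(kinds):
    """Map argument kinds ('int', 'D' = single, 'Q' = double) to their locations
    at the CALL under the Windows x64 convention (non-variadic functions)."""
    locations = []
    for position, kind in enumerate(kinds):
        if kind not in ("int", "D", "Q"):
            raise ValueError(f"unknown argument kind {kind!r}")
        if position >= 4:
            # Arguments 5 and above live above the 32-byte shadow space.
            locations.append(f"[rsp+{0x20 + 8 * (position - 4):X}h]")
        elif kind == "int":
            locations.append(INT_REGS[position])
        else:
            # The slot is fixed by position: a float 2nd argument uses XMM1, not XMM0.
            locations.append(FP_REGS[position])
    return locations

print(assign_args(["D", "int", "Q", "int", "Q"]))
```

Note that the fifth argument goes to the stack even though it is a double; only the first four positions have XMM registers.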

x64 SHADOW (done, see version 0.62)
Saving regular registers to the shadow space is done automatically with FRAME. Since GoAsm does not do type checking, SHADOW needs to be used with FRAME to identify a floating-point parameter passed in an XMM0-3 register. It can also be used to omit saving one or more of these registers, typically in a small procedure that uses the register directly before any function call that would overwrite it. In that case the parameter name will not give the proper value, since the register is not saved to the shadow space.

SHADOW xmm0, xmm1, r8, r9 ;a mixed register example

SHADOW ;no registers specified, no registers saved
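Where the saved values end up: a Python sketch that generates the stores for a SHADOW-style register list, assuming a push rbp / mov rbp,rsp prolog so the caller's 32-byte shadow space begins at [rbp+10h]. The function, its parameters and the choice of MOVQ for the XMM stores are assumptions for illustration, not GoAsm's actual output.

```python
# Home-slot position of each register-passed parameter (Windows x64 convention).
PARAM_INDEX = {"rcx": 0, "rdx": 1, "r8": 2, "r9": 3,
               "xmm0": 0, "xmm1": 1, "xmm2": 2, "xmm3": 3}

def shadow_saves(registers, frame_base="rbp", first_slot=0x10):
    """Return store instructions for the registers named on a SHADOW-style list.

    Assumes [rbp] = saved RBP, [rbp+8] = return address, so the four
    home slots are [rbp+10h], [rbp+18h], [rbp+20h] and [rbp+28h].
    """
    lines = []
    seen = set()
    for reg in registers:
        reg = reg.lower()
        index = PARAM_INDEX[reg]
        if index in seen:
            raise ValueError(f"parameter {index + 1} listed twice")
        seen.add(index)
        offset = first_slot + 8 * index
        op = "movq" if reg.startswith("xmm") else "mov"
        lines.append(f"{op} [{frame_base}+{offset:X}h],{reg}")
    return lines

for line in shadow_saves(["xmm0", "xmm1", "r8", "r9"]):
    print(line)
```

An empty list, as in the second SHADOW example, simply produces no stores.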

x64 USES (done, see version 0.62)
The XMM registers should be allowed with USES within FRAME...ENDF. However, since they are handled differently (stack space is allocated, properly aligned for x64, and the registers are saved with MOVAPS, or MOVUPS for x86), it is clearer when comparing with the list file output to put them on a separate USES line placed after the regular register USES line (if there is one). Note that because this stack allocation shifts the LOCAL offsets, if USES lists an XMM register, the USES line(s) must be placed before any LOCAL line(s).
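To see why XMM registers on USES need their own aligned area (in practice XMM6-XMM15, the nonvolatile ones) and why LOCAL has to come after it, here is a Python sketch of RBP-relative save slots. It assumes a push rbp / mov rbp,rsp prolog followed by the regular USES pushes; the layout and function names are hypothetical, not GoAsm's actual frame.

```python
def xmm_save_slots(n_regular_uses, xmm_regs):
    """RBP-relative save slots for XMM registers listed on USES (illustration only).

    Assumes: entry RSP % 16 == 8, then push rbp / mov rbp,rsp, so RBP % 16 == 0;
    the regular USES registers are pushed next, then an aligned XMM area follows.
    """
    pad = 8 if n_regular_uses % 2 else 0     # realign after an odd number of pushes
    top = -(8 * n_regular_uses + pad)
    slots = {}
    for i, reg in enumerate(xmm_regs):
        slots[reg] = top - 16 * (i + 1)      # 16 bytes each, so MOVAPS is legal
    return slots

def xmm_area_bottom(n_regular_uses, xmm_regs):
    """Lowest RBP offset in use once USES is done; LOCAL space must start below it."""
    slots = xmm_save_slots(n_regular_uses, xmm_regs)
    return min(slots.values(), default=-8 * n_regular_uses)

print(xmm_save_slots(1, ["xmm6", "xmm7"]))
print(xmm_area_bottom(1, ["xmm6", "xmm7"]))
```

If a LOCAL line came first, its offsets would already be fixed before the size of this save area was known.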

A function based on the less common USES...ENDU remains available for 32-bit code, but for /x86 and /x64 it will now need to be changed (since it is not a leaf function) to a USES statement within FRAME...ENDF for the upcoming exception handling.

x86/x64 USEDATA
Currently there is a default SHIELDSIZE (100h bytes for 32-bit, 200h bytes for 64-bit, with a minimum of 20h) which allows for possible stack usage in the FRAME beyond that done in the prolog, since GoAsm does not track this. RSP is adjusted in several steps by an amount that is fixed relative to RBP, but the amount relative to RSP is not known to GoAsm at assembly time, and that is what assembly-time UNWIND_INFO requires.

The change here would be to alter the default SHIELDSIZE to 4h bytes for x86 and 8h bytes for x64 (unchanged for regular 32-bit code). These would also be the new minimum values, which account only for the return address pushed on the stack by the call in INVOKE. The USEDATA prolog would be simplified, and the epilog would restore RSP using the LEA form that is valid for SEH. Note that this may require adjusting current x86/x64 code by specifying a different SHIELDSIZE value for the stack space actually used (for example, when manually supplying arguments to a USEDATA procedure, plus the return address for the call). See also the following...

The current x64 approach is somewhat similar to 32-bit code, which pushes function arguments on the stack, except that the first 4 are passed in registers (stack space for these 4 is still always allocated for each INVOKE and removed after the call). Because x64 requires the stack to be 16-byte aligned at a call, extra code is also added to each x64 INVOKE.
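The per-INVOKE alignment work comes down to this arithmetic: reserve at least 32 bytes (four 8-byte slots) and pad so that RSP is a multiple of 16 at the CALL. When the current RSP alignment is not known at assembly time, the pad presumably has to be decided by code inserted into each INVOKE; the sketch below (a hypothetical invoke_stack_bytes) only shows the calculation.

```python
def invoke_stack_bytes(n_args, rsp_mod16):
    """Bytes to subtract from RSP before the CALL so that RSP % 16 == 0 there.

    rsp_mod16 is RSP % 16 just before the INVOKE (0 or 8 with 8-byte pushes).
    The pad sits above the arguments, so it is subtracted first.
    """
    arg_bytes = 8 * max(n_args, 4)          # 4 shadow slots are always reserved
    pad = (rsp_mod16 - arg_bytes) % 16
    return pad + arg_bytes

for n in (4, 5):
    for mod in (0, 8):
        print(n, mod, invoke_stack_bytes(n, mod))
```

With an odd number of 8-byte slots, or RSP already 8 bytes off, one extra 8-byte pad appears; that is the "extra code" cost paid on every call.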

This is where the above USEDATA change can get a bit tricky. Depending on the USES and LOCAL lists, there may be an extra 8 bytes for stack alignment, and if so, these need to be included in the SHIELDSIZE.
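Where the extra 8 bytes come from: on entry RSP is 8 bytes off a 16-byte boundary (the CALL pushed the return address), push rbp restores alignment, and then every 8-byte USES push or LOCAL slot flips it again. A sketch under that assumed prolog (prolog_padding is a hypothetical name):

```python
def prolog_padding(n_uses, local_bytes):
    """Extra bytes needed after USES pushes and LOCAL space to keep RSP 16-byte aligned.

    Assumes: on entry RSP % 16 == 8, then push rbp, so RSP % 16 == 0
    before the USES pushes and the LOCAL allocation.
    """
    used = 8 * n_uses + local_bytes
    return (-used) % 16

print(prolog_padding(1, 0), prolog_padding(2, 0), prolog_padding(1, 8))
```

So one USES register with no LOCALs needs the 8-byte pad, while two registers, or one register plus one 8-byte LOCAL, do not.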

Because of this USEDATA-related issue, and also for better optimised and easier to follow x64 INVOKE code, there is another approach. It involves creating enough stack space in advance for the arguments (properly 16-byte aligned) and reusing it for each INVOKE, with MOV for the non-register arguments (for ARG [Label] this 'memory to memory' move would most likely go through the R11 register). However, to use this, you would need to ensure that whenever you use x64 INVOKE, the stack has not changed from where it was after the prolog, which may still require adjusting current x86/x64 code.

The above would only be available on a per x64 FRAME (or USEDATA) basis, by specifying the maximum number of ARGs (or more, if you want to waste stack space, or not be bothered with error messages) that will be used by INVOKE within the FRAME (or USEDATA). GoAsm is a single-pass assembler and does not look that far ahead, but it would give an error message with the ARG count if an INVOKE exceeds the maximum.

The syntax to achieve this would be added to LOCAL with the keyword ARGS and a required [N], placed at the end of the LOCAL list, where N is the maximum number of arguments allowed for any INVOKE within the FRAME or USEDATA procedure (the minimum N would be 4 for the required shadow space). For example:

LOCAL Loc1, Loc2, ARGS[4] ;placed at end on same line


LOCAL Loc1, Loc2
LOCAL ARGS[8] ;placed at end on a separate line

The use of ARGS would not alter x86 mode stack allocation or INVOKE.
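A sketch of the proposed ARGS[N] bookkeeping: the area is 8*N bytes rounded up to 16 and reused by every INVOKE, arguments 5 and above are written with MOV to fixed slots starting at [rsp+20h], and a single-pass check complains as soon as an INVOKE needs more than N. ArgsBudget and args_area_bytes are hypothetical names; this models the proposal, not existing GoAsm behaviour.

```python
def args_area_bytes(max_args):
    """Size of the preallocated argument area for a hypothetical ARGS[N]."""
    if max_args < 4:
        raise ValueError("ARGS[N] needs N >= 4 for the shadow space")
    return (8 * max_args + 15) & ~15   # round up to a 16-byte multiple

class ArgsBudget:
    """Single-pass model: each INVOKE is checked against N as it is met."""

    def __init__(self, max_args):
        self.max_args = max_args
        self.area = args_area_bytes(max_args)

    def invoke(self, name, n_args):
        if n_args > self.max_args:
            raise ValueError(f"INVOKE {name}: {n_args} ARGs exceeds ARGS[{self.max_args}]")
        # Args 1-4 go in registers; args 5+ are MOVed into fixed slots (no PUSH).
        return [f"[rsp+{8 * i:X}h]" for i in range(4, n_args)]

frame = ArgsBudget(8)
print(frame.area)
print(frame.invoke("CreateFileA", 7))
```

Because the slots never move, RSP stays where the prolog left it, which is what makes the small fixed SHIELDSIZE workable.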

That is it for now...


Shifting main focus back to GoAsm...


Other progress didn't make it into version 0.59. Back to regularly scheduled programming...


Hi Wayne,

I was wondering if there is a chance we may see improved (enhanced) source listings for GoAsm. As it is now, listings are cropped at 80 columns, which is quite limiting, especially when comments are taken into consideration. More often than not, a source listing is all one needs for debugging, and I frequently use listings in place of GoBug for the simpler things (and I also really like to see the translation into binary). Also, as it is now, listings at the assembler stage are only partially helpful for viewing the binary, so would a linker-stage listing be better in this respect?

GoAsm is really a great assembler and I work with it literally every day, but am wondering about this one topic and if there is anything planned for the future and what would be involved in tackling it.  If not, is there a workaround I might implement on my own?

Thank you.


The listing file will have plenty of [00000000] since address relocations are done at a later stage, during linking. Having those values filled in, along with instruction and label addresses, would be more accurate, but having the linker do this would most likely involve GoAsm passing line number information (COFF line number info has been deprecated). I cannot think of an easy workaround to implement on your own.

The intent may have been to limit line length for printer-friendly output, but more characters can fit per line with a smaller font and landscape orientation.

The current typical listing line format is 26 characters for opcodes plus 74 characters from the source file (that is 100 characters; perhaps your text editor is cropping this further to 80 characters for printing).

I made some small changes to the listing file in an earlier version for displaying code that GoAsm inserts, and have considered an adjustment which would extend the opcode section to at least 32 characters (also displaying a bit more in the data section, e.g. strings). I took a quick look, and it should be very easy to extend the portion copied from the source file. Going with around 98 characters (an extra 24) seems reasonable, for a total of 130 characters at font size 9 with landscape orientation and narrower margins.
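The column arithmetic: 26 + 74 = 100 characters per line now, versus 32 + 98 = 130 with the proposed widths. A trivial Python sketch of padding the opcode column and cropping the source text (listing_line is hypothetical, not GoAsm's list file code; 4883EC28 is sub rsp,28h):

```python
def listing_line(opcodes, source, opcode_width, source_width):
    """Pad (or crop) the opcode column, then crop the source text."""
    return opcodes.ljust(opcode_width)[:opcode_width] + source[:source_width]

long_source = "sub rsp,28h ;" + " a long comment" * 20
print(len(listing_line("4883EC28", long_source, 26, 74)))   # current layout
print(len(listing_line("4883EC28", long_source, 32, 98)))   # proposed layout
```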

As for the opcode adjustment, instead of left-justifying, I was considering some spacing and alignment so that it would be easier to identify the parts of the instruction encoding (various Prefixes, Opcode(s), ModRM, SIB, Displacement, and Immediate values).
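One way lined-up opcode columns could look: each encoding field gets a fixed-width column, so ModRM, SIB, displacement and immediate bytes line up from one instruction to the next. The widths below are my own choice, and the two encodings are hand-assembled: mov rax,[rbx+rcx*4+10h] is 48 8B 44 8B 10 (REX.W, opcode, ModRM, SIB, disp8), and mov dword [rbx+8],1 is C7 43 08 01000000.

```python
# Column width for each encoding field, in hex characters (my own choice).
FIELD_WIDTHS = [("prefixes", 8), ("opcode", 6), ("modrm", 2),
                ("sib", 2), ("disp", 8), ("imm", 8)]

def encoding_columns(fields):
    """fields: dict of field name -> bytes; missing fields leave their column blank."""
    columns = []
    for name, width in FIELD_WIDTHS:
        text = fields.get(name, b"").hex().upper()
        columns.append(text.ljust(width))
    return " ".join(columns).rstrip()

# mov rax,[rbx+rcx*4+10h]
a = encoding_columns({"prefixes": b"\x48", "opcode": b"\x8b", "modrm": b"\x44",
                      "sib": b"\x8b", "disp": b"\x10"})
# mov dword [rbx+8],1
b = encoding_columns({"opcode": b"\xc7", "modrm": b"\x43", "disp": b"\x08",
                      "imm": b"\x01\x00\x00\x00"})
print(a)
print(b)
```

Printed one above the other, the opcode, ModRM and displacement bytes of both instructions start in the same columns.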


Thanks for the response. Actually, I did later notice it was cropped at 100 columns, not 80; my confusion came from using two different vertical edge settings (one in the editor for writing source, and the other in Notepad2 for quickly opening .asm files). My source, including comments, does extend to 130 columns. One can live with printing the source text in landscape mode and debugging with the .lst file.

Anything you feel might help with GoAsm listings will certainly be appreciated here.

Thanks again,


The next update will have some changes to the list file and also include support for SSE3 and SSSE3 instructions. Relatively soon, depending on how far I can go with the list file changes...


Great news, wjr.
Keep it up, GoAsm is getting better and better!



Maybe you can explain the following.

I am building an x64 dialog box program.
What I am wondering is this: in other x64 processes, all system DLLs are loaded in the 7ffc'........ range, i.e. at 64-bit addresses.
In my x64 program, only ntdll is in the 64-bit range; all the other system DLLs are below that.
Could you explain why that is?
I tested Jeremy's HelloWorld2 program, and in that program all system DLLs are in the 64-bit range.


Use the GoLink /LARGEADDRESSAWARE command line switch.

This assumes that you have made any necessary 64-bit pointer changes in your program. For example, instructions that access memory using a base register and/or index register together with a label address (e.g. [rdx*4+MyTable]) do not use RIP-relative addressing; the label address is encoded as a sign-extended 32-bit displacement, so this would need to be adjusted to handle a larger Image Base (e.g. [rax+rdx*4] with rax holding ADDR MyTable).
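Why [rdx*4+MyTable] breaks: with an index register and no base register there is no RIP-relative form, so the absolute address of MyTable is stored in a sign-extended 32-bit displacement field, which only works while that address is below 80000000h. A quick Python check (fits_disp32 is a hypothetical helper; the sample addresses are arbitrary):

```python
def fits_disp32(address):
    """True if an absolute address survives encoding as a sign-extended disp32."""
    return -(2**31) <= address <= 2**31 - 1

for addr in (0x00403000, 0x7FFF0000, 0x80000000, 0x140003000):
    print(hex(addr), fits_disp32(addr))
```

Loading the address into a register first, as in [rax+rdx*4], removes the limit because the full 64-bit value lives in RAX.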


SSE4 looks fairly straightforward to add as well...


It has been a while. I added ARGD, ARGQ, and SHADOW to the above to-do list, but have already done these in a Beta version available here. Although GoAsm version 0.62 will mainly address x64 floating-point issues, it would still be good if a few others could test things out in advance, since it touches upon INVOKE processing.

BetaG also includes an encoding fix for MOVQ r64,xmm and MOVQ xmm/m64,xmm as well as improved List file output for ARGs, INVOKE, and repeat instructions and data declarations. I still have some work to do on the INVOKE error for a register already used (mixed case of regular and XMM), and on shorter coding for an immediate floating-point value of 0.0.

Not mentioned above: ARGD and ARGQ memory arguments typically do not need a type indicator, so as an additional feature, when one is specified, the memory operand is assumed to be an integer and is converted to a floating-point value (CVTSI2SS or CVTSI2SD) in register XMM0-3 (or, for arguments 5 and above, converted in XMM5, then moved with MOVD or MOVQ into R11, which is pushed onto the stack).
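What the integer-to-float conversion produces: the Python sketch below mimics CVTSI2SS/CVTSI2SD followed by MOVD/MOVQ into a general register by returning the IEEE 754 bit pattern. Python's float() and struct stand in for the instructions, and the result is exact for integers up to 2^24 in magnitude in single precision. The all-zero pattern for 0.0 is presumably what makes the shorter coding mentioned above possible, e.g. XORPS xmm0,xmm0 instead of loading a constant.

```python
import struct

def int_to_single_bits(value):
    """Like CVTSI2SS then MOVD: integer -> float32 -> 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", float(value)))[0]

def int_to_double_bits(value):
    """Like CVTSI2SD then MOVQ: integer -> float64 -> 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", float(value)))[0]

for n in (0, 1, 3, -2):
    print(n, f"{int_to_single_bits(n):08X}", f"{int_to_double_bits(n):016X}")
```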


So, this ARGD/Q feature can only be used in x64 code? What about floating point parameters in x86 mode?


In x86 mode, ARGD is similar to ARG and does the usual push, but ARGQ enhancements are under consideration...
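For context on what an x86 ARGQ would have to do: in 32-bit code a double argument occupies 8 bytes on the stack, so it is pushed as two dwords, high dword first, leaving the low dword at the lower address. This Python sketch (a hypothetical helper, not GoAsm's planned implementation) returns the dwords in push order:

```python
import struct

def x86_double_push_order(value):
    """Return (first_push, second_push) for an 8-byte double on a 32-bit stack."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    high, low = bits >> 32, bits & 0xFFFFFFFF
    # The stack grows down: pushing high then low leaves the value little-endian in memory.
    return high, low

for v in (1.0, -2.0, 0.1):
    high, low = x86_double_push_order(v)
    print(v, f"{high:08X}", f"{low:08X}")
```

ARGD, by contrast, needs only one dword push, which is why it already behaves like ARG in x86 mode.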


Version 0.62 BetaH is now available here, providing support for USES with XMM registers within FRAME...ENDF (for now, they are ignored within USEDATA...ENDU and USES...ENDU). I have adjusted the initial post's description; a single USES statement can also list a mix of regular and XMM registers (they are still dealt with separately, though).

As before, I still have some work to do on INVOKE error for register already used (mixed case of regular and XMM), shorter coding for an immediate floating-point value of 0.0, along with x86 ARGQ enhancements under consideration...