PUSH ADDR in x64

Yuri · June 26, 2013, 02:56:27 AM

There is one paragraph in the GoAsm help file I don't quite understand:

Quote
PUSH immediate on the AMD64 takes a 32-bit immediate (number) value and sign extends bit 31 into all higher bits. There is no single instruction capable of taking a 64-bit immediate value and PUSHing that onto the stack. For this reason PUSH ADDR THING is not a recognised instruction on the AMD64 (the offset value is treated as an immediate). The problem here is that the actual immediate value of any particular offset is unknown until link-time, and at assemble-time it is impossible for the assembler to know whether the offset is above 7FFFFFFFh and so would be affected by the sign extension. Therefore in GoAsm, PUSH ADDR THING is actually coded as:-

PUSH RAX
MOV [RSP],ADDR THING

According to the Intel docs, both instructions, PUSH and MOV, sign extend the 32-bit immediate:

Quote
PUSH imm32 Push sign-extended imm32.
MOV r/m64,imm32 Move imm32 sign extended to 64-bits to r/m64

So what does the above trick actually do? It's longer by 4 bytes, so maybe they could be saved.
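The quoted rule is easy to model. The Python sketch below only illustrates what the Intel docs describe: bit 31 of the imm32 is copied into bits 32-63. It is not GoAsm code, and the sample addresses are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend_imm32(imm32: int) -> int:
    """Widen a 32-bit immediate the way PUSH imm32 and MOV r/m64,imm32 do."""
    if not 0 <= imm32 <= 0xFFFFFFFF:
        raise ValueError("immediate must fit in 32 bits")
    if imm32 & 0x80000000:
        # Bit 31 set: bits 32-63 are filled with ones.
        return (imm32 | 0xFFFFFFFF00000000) & MASK64
    # Bit 31 clear: the upper half is zero and the value is unchanged.
    return imm32


for address in (0x00402000, 0x7FFFFFFF, 0x80001000):
    print(f"{address:#010x} -> {sign_extend_imm32(address):#018x}")
```

With either instruction, an address such as 80001000h ends up on the stack as FFFFFFFF80001000h.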

dedndave · June 26, 2013, 05:33:43 AM

what they are saying is, there is no instruction to push 64-bit immediates
however, you can push a 32-bit immediate value, which is sign extended
addresses in 64-bit may or may not fit into 32 bits

so, they PUSH RAX, which creates 64 bits (8 bytes) of space on the stack (similar to SUB RSP,8 or LEA RSP,[RSP-8])
then, they fill that space with the 64-bit address

qWord · June 26, 2013, 06:07:49 AM

Quote from: dedndave on June 26, 2013, 05:33:43 AM
so, they PUSH RAX, which creates 64 bits (8 bytes) of space on the stack (similar to SUB RSP,8 or LEA RSP,[RSP-8])
then, they fill that space with the 64-bit address

there is no MOV mem64,imm64 encoding.
maybe the following sequence is meant:

xor eax,eax                      ; 2 bytes (also zeroes the upper half of RAX)
...
push rax                         ; 1 byte
mov DWORD ptr [rsp],offset THING ; 7 bytes, 8 bytes with the PUSH

vs.

mov rax,imm64                    ; 10 bytes
push rax                         ; 1 byte, 11 bytes in total
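The byte counts in these two sequences can be checked by assembling them by hand. This Python sketch builds the standard x64 encodings (31 C0, 50, C7 04 24 imm32, 48 B8 imm64) for a hypothetical address. It is not assembler output. The first sequence leaves the upper dword at zero, so it only covers addresses below 4 GB. MOV RAX,imm64 covers any address.

```python
import struct


def zeroed_push_then_patch(address: int) -> bytes:
    """xor eax,eax / push rax / mov dword ptr [rsp],imm32 (address zero-extended)."""
    xor_eax = bytes([0x31, 0xC0])    # xor eax,eax (also clears the upper half of RAX)
    push_rax = bytes([0x50])         # push rax
    # mov dword ptr [rsp],imm32 writes the low dword only
    mov_low = bytes([0xC7, 0x04, 0x24]) + struct.pack("<I", address)
    return xor_eax + push_rax + mov_low


def mov_imm64_then_push(address: int) -> bytes:
    """mov rax,imm64 / push rax (works for any 64-bit address)."""
    return bytes([0x48, 0xB8]) + struct.pack("<Q", address) + bytes([0x50])


a = zeroed_push_then_patch(0x00402000)
b = mov_imm64_then_push(0x00402000)
print(len(a), len(a) - 2, len(b))
```

After EAX has been cleared once, each additional pushed address costs 8 bytes, compared with 11 for the MOV RAX,imm64 form. `struct.pack("<I", ...)` rejects values that do not fit in 32 bits, which mirrors the limitation of the shorter form.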

Yuri · June 26, 2013, 12:40:08 PM

Quote from: qWord on June 26, 2013, 06:07:49 AM
there is no MOV mem64,imm64 encoding.

Yes, exactly. And there seems to be no difference in the resulting value on the stack between pushing RAX and then moving a 32-bit immediate to [RSP] and simply pushing a 32-bit immediate. But the latter would be 4 bytes shorter.

0000000000401001: 50                       push rax
0000000000401002: 48 C7 04 24 00 20 40 00  mov qword ptr [rsp],402000h
                 
000000000040100A: 68 00 20 40 00           push 402000h

Sign extension can be a problem for addresses above 7FFFFFFF, but the trick with MOV doesn't seem to solve it. We still have to handle it manually.
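Both points, the 4-byte difference and the identical value on the stack, can be read directly from the listing. This sketch copies those bytes and decodes the trailing imm32 of each sequence. The sign-extension helper repeats the rule from the Intel docs.

```python
import struct

# Bytes copied from the disassembly listing.
push_rax_then_mov = bytes.fromhex("50" "48C7042400204000")  # push rax / mov qword ptr [rsp],402000h
push_imm32 = bytes.fromhex("6800204000")                    # push 402000h


def trailing_imm32(code: bytes) -> int:
    """Return the little-endian imm32 at the end of an instruction sequence."""
    return struct.unpack("<I", code[-4:])[0]


def sign_extend_imm32(imm32: int) -> int:
    """Both PUSH imm32 and MOV r/m64,imm32 copy bit 31 into bits 32-63."""
    return (imm32 | 0xFFFFFFFF00000000) if imm32 & 0x80000000 else imm32


for code in (push_rax_then_mov, push_imm32):
    value = sign_extend_imm32(trailing_imm32(code))
    print(f"{len(code)} bytes -> qword on stack {value:#x}")
```

Both sequences pass the imm32 through the same sign extension, so the longer one offers no protection for addresses at or above 80000000h.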

wjr · June 27, 2013, 12:07:23 PM

Note that the above also affects push and INVOKE with "..." and <...>. At least it does not do this with push ADDR of a FRAME parameter or LOCAL variable (which does push rbp, then add [rsp],sign-extended imm32 displacement).
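The FRAME/LOCAL exception is safe because RBP already holds the full 64-bit frame address. Only the small displacement passes through an imm32, and sign extension is exactly what a negative LOCAL offset needs. Below is a minimal model of push rbp / add qword ptr [rsp],imm32. The RBP values and offsets are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend_imm32(imm32: int) -> int:
    return (imm32 | 0xFFFFFFFF00000000) if imm32 & 0x80000000 else imm32


def as_imm32(offset: int) -> int:
    """Encode a signed offset as a 32-bit two's-complement immediate."""
    return offset & 0xFFFFFFFF


def push_addr_of_frame_item(rbp: int, disp_imm32: int) -> int:
    """Qword left on the stack by: push rbp / add qword ptr [rsp],imm32."""
    slot = rbp                                               # push rbp
    slot = (slot + sign_extend_imm32(disp_imm32)) & MASK64   # add [rsp],imm32
    return slot


print(hex(push_addr_of_frame_item(0x14FE80, as_imm32(-8))))    # a LOCAL at [rbp-8]
print(hex(push_addr_of_frame_item(0x14FE80, as_imm32(0x10))))  # a FRAME ARG at [rbp+10h]
```

The result is correct even when RBP is far above 4 GB, because the upper bits come from the register and not from the immediate.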

Correct: comparing the two, the shorter version could be used. However, that still does not address the potential address sign-extension issue in larger programs or those with a higher Image Base. The following does not change the size of the current instructions, but covers the address issue to a certain extent:

6A00			push	0			;imm8 sign-extended to 64 bits
C70424[00000000]	mov	D[RSP],ADDR Symbol	;imm32 mov to mem32

I should be able to make this change soon. This looks good from the point of view of source for 32-bit output with the /x86 switch and 64-bit output with the /x64 switch (the above would be done the normal way for the 32-bit case).
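Both sequences are 9 bytes, so the change costs nothing in size. The difference is the upper dword. This sketch models the qword each sequence leaves on the stack for a hypothetical symbol address in the 2-4 GB range. The encodings are 50 / 48 C7 04 24 imm32 versus 6A 00 / C7 04 24 imm32.

```python
import struct


def sign_extend_imm32(imm32: int) -> int:
    return (imm32 | 0xFFFFFFFF00000000) if imm32 & 0x80000000 else imm32


def old_push_addr(address: int):
    """push rax / mov qword ptr [rsp],imm32: the imm32 is sign-extended."""
    code = bytes([0x50, 0x48, 0xC7, 0x04, 0x24]) + struct.pack("<I", address)
    return code, sign_extend_imm32(address)


def new_push_addr(address: int):
    """push 0 / mov dword ptr [rsp],imm32: upper dword stays 0 (zero-extension)."""
    code = bytes([0x6A, 0x00, 0xC7, 0x04, 0x24]) + struct.pack("<I", address)
    return code, address


for make in (old_push_addr, new_push_addr):
    code, qword = make(0x80001000)
    print(make.__name__, len(code), hex(qword))
```

Addresses of 4 GB and above still cannot be encoded (`struct.pack` raises `struct.error`). This is the remaining 32-bit limitation of ADDR.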

From a more pure 64-bit point of view, ADDR still has 32-bit limitations. I shall look into the mov reg64, ADDR instruction. The above may need changing to the larger 11 byte sequence qWord mentions above, but this has other implications with use of a register (MASM INVOKE can use eax, GoAsm INVOKE so far does not)...

Yuri · June 27, 2013, 06:01:15 PM

Quote from: wjr on June 27, 2013, 12:07:23 PM
Note that the above also affects push and INVOKE with "..." and <...>.

Actually this is what set me thinking about passing addresses in x64. For label addresses, the first 4 parameters are OK because GoAsm uses LEA, and we can use LEA manually to handle the rest of them, but neither works for string literals. Assigning each and every short string a label can be quite annoying. Maybe it makes sense to implement LEA RAX,"String"? Of course, automatic address handling for all parameters would be ideal.

I was also going to suggest automatic substitution of LEA RAX,[Label] for MOV RAX,ADDR Label. In x86 mode GoAsm already does the opposite replacement, so it has a precedent.

I don't know what the odds are of a DLL being loaded above the first 4 GB, but it is at least technically possible. I tried it and it worked, although GoLink refused to accept such a large base address, so I had to change it manually with a hex editor.

wjr · June 28, 2013, 01:51:06 PM

I tried that too. For now, probably best to keep GoLink with an Image Base address at 32-bits. Setting it to 80000000h will allow the above ADDR sign-extension problems to occur. With the GoLink /DYNAMICBASE option to allow ASLR, it looks like the OS still picks an Image Base below this. However, I see that there is an even newer /HIGHENTROPYVA Link option used with /DYNAMICBASE for 64-bit ASLR which most likely has a much larger range (for now, not available with GoLink).

Yes, there is a similar sign-extend problem with mov reg64, ADDR Label which also affects string literals. There would be two solutions for this:

	lea	reg64,[Label]	;7 bytes
;or
	mov	reg32,imm32	;5 bytes, zero-extends 32-bit ADDR to 64-bits

Between the two, it looks like the second shorter one would be better. This is because the first one as is, if I understand correctly, would use RIP-relative addressing and the displacement would be +/- 2GB which may not be sufficient in a very large program in the 2-4GB range.
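The trade-off between the two encodings comes down to which addresses each one can reach. LEA encodes a signed disp32 measured from the address of the next instruction. MOV reg32,imm32 writes a 32-bit value that the CPU zero-extends into the 64-bit register. Below is a minimal sketch. All addresses are hypothetical.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def lea_reaches(next_rip: int, target: int) -> bool:
    """lea reg64,[Label]: RIP-relative, disp32 = target - next_rip must fit in int32."""
    return INT32_MIN <= target - next_rip <= INT32_MAX


def mov_r32_reaches(target: int) -> bool:
    """mov reg32,imm32: zero-extended, so only targets below 4 GB are representable."""
    return 0 <= target <= 0xFFFFFFFF


cases = {
    "small image at 400000h": (0x00401007, 0x00402000),
    "Image Base at 140000000h": (0x140001007, 0x140002000),
    "code at 401000h, data near F0000000h": (0x00401007, 0xF0000000),
}
for name, (next_rip, target) in cases.items():
    print(f"{name}: lea={lea_reaches(next_rip, target)} mov={mov_r32_reaches(target)}")
```

The third case is the 2-4 GB situation described in the post. The 5-byte MOV wins there, and LEA wins as soon as the Image Base is above 4 GB.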

So with these fixes ADDR will provide the correct 64-bit address. However, the Image Base and program size will for now still require ADDR to fit within a 32-bit value.

Yuri · June 28, 2013, 03:18:24 PM

Quote from: wjr on June 28, 2013, 01:51:06 PM
This is because the first one as is, if I understand correctly, would use RIP-relative addressing and the displacement would be +/- 2GB which may not be sufficient in a very large program in the 2-4GB range.

According to the MS PE and COFF specification, an image can't be larger than 2 GB:

Quote
PE32+ images allow for a 64-bit address space while limiting the image size to 2 gigabytes.

I think this is exactly because RIP relative addressing should be able to reach any place from any other place, even if one is at the beginning and the other at the end of the image.

wjr · June 29, 2013, 04:19:02 AM

My understanding has been corrected by -2GB (I will also correct the RIP-relative addressing section of the GoAsm help, where both sizes are mentioned). Still, between the two, I'm opting for the second one because it is shorter.

Spoke too soon... there is one other case, with mov mem32/64, ADDR. This appears to use RIP-relative addressing, but the mem32/64 address has not been adjusted to give the proper displacement for it. It therefore does not work at the moment: it either raises an exception or writes to a location that is off by the amount of the Image Base.

With these fixed soon, we could have an /x64 EXE just under 2GB with a DLL loaded at 80000000h also around 2GB... good enough for now. For future consideration would be an option to have ADDR actually use 64-bits. This would involve modifying ADDR related instructions with an increase in size by 4+ in some cases with use of eax.

Yuri · June 29, 2013, 12:26:38 PM

Quote from: wjr on June 29, 2013, 04:19:02 AM
Still, between the two, I'm opting for second one based on shorter size.

I agree that this is more practical for now. If a necessity arises to handle larger addresses, LEA is always available.

Actually not all Windows code can work properly with a DLL loaded above 4GB (at least on my Win7 SP1). Neither JScript nor VBScript were able to create an object from a COM server DLL because they treated function pointers in the Class Factory VTable as 32-bit, ignoring the upper part.

wjr · July 08, 2013, 03:25:51 AM

I shall post a message once this is up. I still have a few things to check, and the help file to update, but it is getting closer:

PUSH ADDR... had an easy fix.
MOV reg64,ADDR... didn't go as planned with the shorter option. This was a little more challenging to trace, but there ended up being an easy fix which used the longer LEA option (this was originally intended).
MOV mem64,ADDR... was harder to trace, dealing with multiple relocations, and has been partially fixed. The remaining problem here is that this instruction is still a sign extended imm32.

In the second one, LEA is the way to go since it uses RIP-relative addressing. The imm32 is a signed displacement within +/-2GB from the starting RIP for the next instruction, so there would be no problem here in having the Image Base somewhere >4GB.

I wanted to avoid using a register, but for the first and third cases to work similarly, one would need to be used. Although ML INVOKE can use EAX (no comparison with ML64, INVOKE revoked), this is not a good choice since RAX is used as a return value and that could be needed in a following call. If I were to take this next step for x64, how about using R11?

4C8D1D[00000000]	lea r11,[x]	;PUSH ADDR x
4153			push r11

4C8D1D[00000000]	lea r11,[x]	;MOV [y],ADDR x
4C891D[00000000]	mov [y],r11

No impact on the more direct 32-bit code which works as usual. Same for x86 which can't use R11, so no conflict if also used for x64. One would need to be aware of this for newer x64 code taking advantage of the extra registers (the first one would proceed as usual for the case of ADDR with a FRAME ARG or LOCAL and not use R11).
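Assembling the two R11 sequences by hand shows that the PUSH form is still 9 bytes, the same as the existing PUSH ADDR sequences, but no longer depends on the Image Base. The opcode bytes come from the listing in the post, and each disp32 is measured from the end of its own instruction. The RIP values and symbol addresses in this sketch are hypothetical.

```python
import struct


def rip_disp32(next_rip: int, target: int) -> bytes:
    """Signed disp32 for RIP-relative addressing; struct.error if out of range."""
    return struct.pack("<i", target - next_rip)


def push_addr_via_r11(rip: int, x: int) -> bytes:
    lea = bytes([0x4C, 0x8D, 0x1D]) + rip_disp32(rip + 7, x)   # lea r11,[x]
    return lea + bytes([0x41, 0x53])                           # push r11


def mov_mem_addr_via_r11(rip: int, x: int, y: int) -> bytes:
    lea = bytes([0x4C, 0x8D, 0x1D]) + rip_disp32(rip + 7, x)   # lea r11,[x]
    mov = bytes([0x4C, 0x89, 0x1D]) + rip_disp32(rip + 14, y)  # mov [y],r11
    return lea + mov


code = push_addr_via_r11(0x140001000, 0x140003000)
print(len(code), code.hex(" ").upper())
print(len(mov_mem_addr_via_r11(0x140001000, 0x140003000, 0x140004000)))
```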

Yuri · July 08, 2013, 01:30:06 PM

I was about to suggest using R11 too. All code I now write is switchable, so I almost never use the new registers. Of course, it may not be the case for others, so I am only speaking for myself.

What will happen to the value in R11 if we pass, move or push an address? Will it be lost or temporarily stored somewhere and moved back? If it was only about Invoke, it would not be a great problem to remember about the damage to R11's content, but with three cases it looks easier to forget.

wjr · July 09, 2013, 04:04:28 PM

I would be fine with losing it, R11 that is.

This would only occur in the 1st and 3rd cases where the ADDR (or address of "..." or <...>) is being stored in memory (stack). However, as mentioned, for the 1st case R11 would not be used for a FRAME ARG or LOCAL. Also, R11 would not be used if the ADDR (or address of "..." or <...>) was one of the first four parameters of INVOKE since it uses registers RCX RDX R8 R9 for these...

R11 would not be used for the 2nd case where the ADDR is being moved (changed to LEA) into a register. So, if R11 was being used for something else, or just to avoid some of the above... this would be the way out, do it yourself, specify another register to move ADDR to, and use that register instead.
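The rules in the last two replies can be summarised as a small decision function. This only restates what is described in this thread and is not GoAsm's actual logic. The case names and parameters are hypothetical.

```python
from enum import Enum
from typing import Optional


class Case(Enum):
    PUSH_OR_INVOKE = 1  # PUSH ADDR x, or an address argument of INVOKE
    MOV_REG64 = 2       # MOV reg64,ADDR x
    MOV_MEM64 = 3       # MOV mem64,ADDR x


def uses_r11(case: Case, frame_arg_or_local: bool = False,
             invoke_position: Optional[int] = None) -> bool:
    """Return True when the planned x64 ADDR handling would overwrite R11."""
    if case is Case.MOV_REG64:
        return False  # becomes LEA reg64,[x] into the named register
    if case is Case.MOV_MEM64:
        return True   # LEA R11,[x] / MOV [y],R11
    if frame_arg_or_local:
        return False  # push rbp / add [rsp],disp
    if invoke_position is not None and invoke_position <= 4:
        return False  # first four INVOKE arguments go in RCX, RDX, R8, R9
    return True       # LEA R11,[x] / PUSH R11
```

For example, the address of a string literal passed as the fifth INVOKE argument would clobber R11, but the same literal passed as the second argument would not.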

This is important with an 8TB virtual address space. I am seeing a typical x64 Image Base of 140000000h which will quickly point out these bigger problems with pointers...

Yuri · July 10, 2013, 01:03:35 AM

Yes, I understand. By three cases I meant Invoke, explicit PUSH, and MOV to memory. Personally I have nothing against R11 being special. Actually it must be easier to remember the special status of one register than the need to count the parameters of every Invoke to figure out whether an address goes beyond the 4th one and requires a LEA.

wjr · July 16, 2013, 03:23:03 PM

Very soon now... the R11 adjustments look good, I am just making sure the relocation records are also good.

Although not directly related to making ADDR capable of handling a larger address, there are some instructions which do not use RIP-relative addressing, for example memory using a base register and/or index register with a label address (ex. [rdx*4+MyTable]). For these to work properly, the Image Base would need to be well below 7FFFFFFFh. These instructions would still need to be manually adjusted to handle a larger Image Base (ex. [rax+rdx*4] with rax as ADDR of MyTable).
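These instructions limit the Image Base because they cannot use RIP-relative addressing. The label in [rdx*4+MyTable] is encoded as a disp32 that the CPU sign-extends, so the table's absolute address must survive that encoding. The manual form [rax+rdx*4] takes the full address from a register instead. Below is a minimal check. The table addresses are hypothetical.

```python
def fits_sign_extended_disp32(address: int) -> bool:
    """True if a 64-bit address survives being encoded as a sign-extended disp32."""
    return address <= 0x7FFFFFFF or address >= 0xFFFFFFFF80000000


def element_address(table_base: int, index: int, scale: int = 4) -> int:
    """The manual form [rax+rdx*4], with RAX loaded from ADDR MyTable."""
    return table_base + index * scale


for table in (0x00403000, 0x140003000):
    print(hex(table), fits_sign_extended_disp32(table), hex(element_address(table, 5)))
```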

Currently for x64, GoLink sets the LARGEADDRESSAWARE flag. Because of the above, I think it would be better if this was not set by default, and GoLink had a /LARGEADDRESSAWARE option which would then set this flag, giving a default Image Base of 140000000h (and allowing other >4GB addresses for the /BASE option). This could also be useful for a 32-bit program which on a 64-bit version of Windows would be given a 4GB address space instead of 2GB.

The MASM Forum
