Author Topic: Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?  (Read 3970 times)

nidud

  • Member
  • *****
  • Posts: 2212
    • https://github.com/nidud/asmc
Quote
it uses the first 4 registers then WRITES arguments to sequential addresses on the stack.

Microsoft and GCC do this in reverse, in push-order which is right-to-left.

    invoke foo,1,2,3,4,ecx,edx,eax

right-to-left:
        mov     dword ptr [rsp+30H], eax
        mov     dword ptr [rsp+28H], edx
        mov     dword ptr [rsp+20H], ecx
        mov     r9d, 4                 
        mov     r8d, 3                 
        mov     edx, 2                 
        mov     ecx, 1                 
        call    foo                     

left-to-right:
        mov     rcx, 1
        mov     rdx, 2
        mov     r8, 3
        mov     r9, 4
        mov     dword ptr [rsp+20H], ecx
        mov     dword ptr [rsp+28H], edx
        mov     dword ptr [rsp+30H], eax
        call    foo                     
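
To make the difference concrete, here is a small Python model of the two listings above. It is only a sketch, not assembler output: ecx/edx/eax are treated as the same registers as rcx/rdx/rax, and the initial register values (0xAAAA and so on) are invented for illustration.

```python
# Toy model of the two listings above. Initial register values are made up.

def run(moves, regs):
    """Execute (dst, src) moves in order and return the resulting stack slots."""
    regs = dict(regs)                  # do not modify the caller's dict
    stack = {}
    for dst, src in moves:
        value = regs[src] if isinstance(src, str) else src   # register or immediate
        if isinstance(dst, int):       # an integer destination is an offset from rsp
            stack[dst] = value
        else:
            regs[dst] = value
    return stack

# Hypothetical live values in the caller before "invoke foo,1,2,3,4,ecx,edx,eax".
initial = {"rcx": 0xAAAA, "rdx": 0xBBBB, "rax": 0xCCCC}

right_to_left = [
    (0x30, "rax"), (0x28, "rdx"), (0x20, "rcx"),
    ("r9", 4), ("r8", 3), ("rdx", 2), ("rcx", 1),
]
left_to_right = [
    ("rcx", 1), ("rdx", 2), ("r8", 3), ("r9", 4),
    (0x20, "rcx"), (0x28, "rdx"), (0x30, "rax"),
]

stack_rtl = run(right_to_left, initial)
stack_ltr = run(left_to_right, initial)
print("right-to-left:", {hex(k): hex(v) for k, v in sorted(stack_rtl.items())})
print("left-to-right:", {hex(k): hex(v) for k, v in sorted(stack_ltr.items())})
```

In the left-to-right run, slots 20h and 28h receive the freshly loaded 1 and 2 instead of the caller's ecx and edx; only the eax slot survives.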

jj2007

  • Member
  • *****
  • Posts: 11550
  • Assembler is fun ;-)
    • MasmBasic
Re: Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #1 on: March 04, 2021, 06:23:53 AM »
Quote
Microsoft and GCC do this in reverse, in push-order which is right-to-left

Normally, the order is not an issue, except when the values you pass are held in the very registers that carry the four fastcall arguments (rcx, rdx, r8, r9):
include \Masm32\MasmBasic\Res\JBasic.inc ; ## builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
BytesWritten dq ?
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  jinvoke DeleteFile, Chr$("CreateFile.opened")
  if 1
mov eax, GENERIC_WRITE
mov ecx, FILE_SHARE_WRITE
mov edx, CREATE_ALWAYS
mov r8, FILE_ATTRIBUTE_NORMAL

jinvoke CreateFile, Chr$("CreateFile.opened"), eax, ecx, NULL, edx, r8, 0
  else
jinvoke CreateFile, Chr$("CreateFile.opened"), GENERIC_WRITE, FILE_SHARE_WRITE, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0
  endif
  xchg rax, rbx
  PrintLine Err$()
  jinvoke WriteFile, rbx, Chr$("Hello World, how are you?"), c$Len, addr BytesWritten, 0
  jinvoke CloseHandle, rbx
EndOfCode

Under the hood:
int3
mov eax,40000000
mov ecx,2
mov edx,2
mov r8,80

and qword ptr ss:[rsp+30],0
mov r10,r8
mov qword ptr ss:[rsp+28],r10
mov r10d,edx
mov qword ptr ss:[rsp+20],r10
xor r9d,r9d
mov r8d,ecx
mov edx,eax
lea rcx,qword ptr ds:[1400013A9]        ; 1400013A9:"CreateFile.opened"
call qword ptr ds:[<sub_140001740>]

Obviously, left-to-right order would fail when passing arguments in rcx and rdx.
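
One general way to avoid this kind of clobbering is to order the argument moves so that every source is read before it is overwritten, sometimes called sequentialising a parallel move. The Python sketch below illustrates that idea on the CreateFile call above. It is not what jinvoke or any assembler mentioned here actually does; `schedule` and `run` are hypothetical helper names, and r10 is assumed to be free as a scratch register and not otherwise involved in the moves.

```python
def schedule(moves, scratch="r10"):
    """Order (dst, src) moves so each source is read before it is overwritten.

    Sources are register names (str) or immediates (int). A cycle such as
    rcx <-> rdx is broken by saving one value in the scratch register.
    """
    pending = list(moves)
    ordered = []
    while pending:
        for i, (dst, _) in enumerate(pending):
            # dst may be written once no *other* pending move still reads it
            still_read = any(src == dst
                             for j, (_, src) in enumerate(pending) if j != i)
            if not still_read:
                ordered.append(pending.pop(i))
                break
        else:
            # every remaining destination is still needed as a source: a cycle
            victim = pending[0][0]
            ordered.append((scratch, victim))            # save the old value
            pending = [(d, scratch if s == victim else s) for d, s in pending]
    return ordered

def run(moves, state):
    """Execute the moves in order on a copy of state and return the result."""
    state = dict(state)
    for dst, src in moves:
        state[dst] = state[src] if isinstance(src, str) else src
    return state

# Register contents taken from the disassembly above.
initial = {"rax": 0x40000000, "rcx": 2, "rdx": 2, "r8": 0x80}

# The seven CreateFile arguments, written in left-to-right order.
moves = [
    ("rcx", 0x1400013A9),      # address of "CreateFile.opened"
    ("rdx", "rax"),
    ("r8", "rcx"),
    ("r9", 0),
    ("[rsp+20h]", "rdx"),
    ("[rsp+28h]", "r8"),
    ("[rsp+30h]", 0),
]

# What the caller intended: every destination gets the ORIGINAL source value.
wanted = {d: (initial[s] if isinstance(s, str) else s) for d, s in moves}
naive = run(moves, initial)
safe = run(schedule(moves), initial)

print("left-to-right correct:", all(naive[d] == v for d, v in wanted.items()))
print("scheduled correct:    ", all(safe[d] == v for d, v in wanted.items()))
for dst, src in schedule(moves):
    print(dst, "<-", hex(src) if isinstance(src, int) else src)
```

For this call the scheduler writes each stack slot before the register it reads from is reloaded, and loads rcx only after r8 has copied it, which is close to the right-to-left order shown in the disassembly.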

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #2 on: March 04, 2021, 06:32:43 AM »
Write order is not specified, only argument location.

        mov     dword ptr [rsp+28H], edx
        mov     dword ptr [rsp+30H], eax
        mov     dword ptr [rsp+20H], ecx
        mov     r9d, 4                 
        mov     r8d, 3                 
        mov     edx, 2                 
        mov     ecx, 1                 
        call    foo


Quote
Microsoft and GCC do this in reverse, in push-order which is right-to-left.

x64 does not have a PUSH order.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

nidud

Re: Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #3 on: March 04, 2021, 08:02:16 AM »
Quote
x64 does not have a PUSH order.

 :biggrin:

Of course it has. This is the main part of the convention, which determines the position of each argument on the stack for a function call.

In the original convention for Windows the push-order was (and still is) left-to-right, which prohibits the use of a variable-length argument list (VARARG). The push-order in a C-stack is right-to-left, which is used in most conventions today.

Note that push-order doesn't necessarily refer to the PUSH instruction, as arguments could be larger than the stack word size.

foo proc a:zword, b:dword, c:dword
    mov eax,b
    mov edx,c
    ret
foo endp

main proc
    invoke foo,zmm0,2,3
    ret
main endp

foo:
        mov     qword ptr [rsp+88H], r8
        mov     qword ptr [rsp+48H], rdx
        push    rbp                     
        mov     rbp, rsp               
        sub     rsp, 32                 
        mov     eax, dword ptr [rbp+50H]
        mov     edx, dword ptr [rbp+90H]
        leave                           
        ret                             
main:
        sub     rsp, 264 ; 4*64+8
        mov     r8d, 3                 
        mov     edx, 2                 
        call    foo                     
        add     rsp, 264               
        ret                             

hutch--

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #4 on: March 04, 2021, 10:54:06 AM »
> Of course it has. This is the main part of the convention which determine the position of each argument on the stack for a function call.

You may have a deviant terminology but I suggest that it has more to do with you supporting 32 bit STDCALL than 64 bit FASTCALL. With the x64 ABI you can have left to right, up to down, inside out and upside down or even a random distribution, as long as all the arguments end up in the right sequence, it will work but PUSH order, forget it.

jj2007

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #5 on: March 04, 2021, 12:28:43 PM »
Quote
With the x64 ABI you can have left to right, up to down, inside out and upside down or even a random distribution, as long as all the arguments end up in the right sequence, it will work

See Reply #1. Remember the "register gets overwritten" error?

hutch--

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #6 on: March 05, 2021, 12:22:21 AM »
Yes, it was a bad example but I understood what he was aiming at. It seems that many of those paddling around with x64 FASTCALL have yet to fully understand how it works.

This is among the reasons why shadow space is required in some situations, but the general drift is: don't pass any of the first four argument registers (rcx, rdx, r8, r9) as one of the first four arguments.

The layout of x64 FASTCALL is primarily designed for a 64 bit C compiler but MASM can be configured to properly use x64 FASTCALL.

nidud

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #7 on: March 05, 2021, 01:14:22 AM »
Quote
> Of course it has. This is the main part of the convention which determine the position of each argument on the stack for a function call.

You may have a deviant terminology but I suggest that it has more to do with you supporting 32 bit STDCALL than 64 bit FASTCALL. With the x64 ABI you can have left to right, up to down, inside out and upside down or even a random distribution, as long as all the arguments end up in the right sequence, it will work but PUSH order, forget it.

 :biggrin:

Well, it goes a bit deeper than that. The Binary Interface covers a wide array of conventions. The most commonly used in a typical Windows DLL are THISCALL, FASTCALL, and VECTORCALL for public functions, in addition to an array of other conventions used internally.

So the idea that I support 32 bit STDCALL rather than 64 bit FASTCALL is a bit misguided (I will assume you mean Asmc here), as Asmc also supports 64 bit C, STDCALL, PASCAL and a whole array of other calling conventions. Here's a sample using a few of these conventions.

asmc64 -pe test.asm

include stdio.inc

.code

foo_p proc pascal a:dword, b:dword, c:dword, d:dword, e:dword, f:dword
    printf("args: %d,%d,%d,%d,%d,%d\n", a,b,c,d,e,f )
    ret
foo_p endp

foo_c proc c a:dword, b:dword, c:dword, d:dword, e:dword, f:dword
    printf("args: %d,%d,%d,%d,%d,%d\n", a,b,c,d,e,f )
    ret
foo_c endp

foo_s proc stdcall a:dword, b:dword, c:dword, d:dword, e:dword, f:dword
    printf("args: %d,%d,%d,%d,%d,%d\n", a,b,c,d,e,f )
    ret
foo_s endp

foo_f proc fastcall a:dword, b:dword, c:dword, d:dword, e:dword, f:dword
    printf("args: %d,%d,%d,%d,%d,%d\n", a,b,c,d,e,f )
    ret
foo_f endp

main proc
    foo_p(1,2,3,4,5,6)
    foo_c(1,2,3,4,5,6)
    foo_s(1,2,3,4,5,6)
    foo_f(1,2,3,4,5,6)
    ret
main endp

    end main

Output:

args: 1,2,3,4,5,6
args: 1,2,3,4,5,6
args: 1,2,3,4,5,6
args: 1,2,3,4,5,6

Disassembly:
FOO_P   PROC
        push    rbp                                     
        mov     rbp, rsp                               
        sub     rsp, 64                                 
        mov     eax, dword ptr [rbp+10H]               
        mov     dword ptr [rsp+30H], eax               
        mov     eax, dword ptr [rbp+18H]               
        mov     dword ptr [rsp+28H], eax               
        mov     eax, dword ptr [rbp+20H]               
        mov     dword ptr [rsp+20H], eax               
        mov     r9d, dword ptr [rbp+28H]               
        mov     r8d, dword ptr [rbp+30H]               
        mov     edx, dword ptr [rbp+38H]               
        lea     rcx, [DS0000]                           
        call    printf                                 
        leave                                           
        ret     48                                     
FOO_P   ENDP

_foo_c  PROC
        push    rbp                                     
        mov     rbp, rsp                               
        sub     rsp, 64                                 
        mov     eax, dword ptr [rbp+38H]               
        mov     dword ptr [rsp+30H], eax               
        mov     eax, dword ptr [rbp+30H]               
        mov     dword ptr [rsp+28H], eax               
        mov     eax, dword ptr [rbp+28H]               
        mov     dword ptr [rsp+20H], eax               
        mov     r9d, dword ptr [rbp+20H]               
        mov     r8d, dword ptr [rbp+18H]               
        mov     edx, dword ptr [rbp+10H]               
        lea     rcx, [DS0000]                           
        call    printf                                 
        leave                                           
        ret                                             
_foo_c  ENDP

_foo_s@48 PROC
        push    rbp                                     
        mov     rbp, rsp                               
        sub     rsp, 64                                 
        mov     eax, dword ptr [rbp+38H]               
        mov     dword ptr [rsp+30H], eax               
        mov     eax, dword ptr [rbp+30H]               
        mov     dword ptr [rsp+28H], eax               
        mov     eax, dword ptr [rbp+28H]               
        mov     dword ptr [rsp+20H], eax               
        mov     r9d, dword ptr [rbp+20H]               
        mov     r8d, dword ptr [rbp+18H]               
        mov     edx, dword ptr [rbp+10H]               
        lea     rcx, [DS0000]                           
        call    printf                                 
        leave                                           
        ret     48                                     
_foo_s@48 ENDP

foo_f   PROC
        mov     qword ptr [rsp+20H], r9                 
        mov     qword ptr [rsp+18H], r8                 
        mov     qword ptr [rsp+10H], rdx               
        mov     qword ptr [rsp+8H], rcx                 
        push    rbp                                     
        mov     rbp, rsp                               
        sub     rsp, 64                                 
        mov     eax, dword ptr [rbp+38H]               
        mov     dword ptr [rsp+30H], eax               
        mov     eax, dword ptr [rbp+30H]               
        mov     dword ptr [rsp+28H], eax               
        mov     eax, dword ptr [rbp+28H]               
        mov     dword ptr [rsp+20H], eax               
        mov     r9d, dword ptr [rbp+20H]               
        mov     r8d, dword ptr [rbp+18H]               
        mov     edx, dword ptr [rbp+10H]               
        lea     rcx, [DS0000]                           
        call    printf                                 
        leave                                           
        ret                                             
foo_f   ENDP

main    PROC
        sub     rsp, 56                                 
        push    1                                       
        push    2                                       
        push    3                                       
        push    4                                       
        push    5                                       
        push    6                                       
        call    FOO_P                                   
        push    6                                       
        push    5                                       
        push    4                                       
        push    3                                       
        push    2                                       
        push    1                                       
        call    _foo_c                                 
        add     rsp, 48                                 
        push    6                                       
        push    5                                       
        push    4                                       
        push    3                                       
        push    2                                       
        push    1                                       
        call    _foo_s@48                               
        mov     dword ptr [rsp+28H], 6                 
        mov     dword ptr [rsp+20H], 5                 
        mov     r9d, 4                                 
        mov     r8d, 3                                 
        mov     edx, 2                                 
        mov     ecx, 1                                 
        call    foo_f                                   
        add     rsp, 56                                 
        ret                                             
main    ENDP

Note that this compiles, links, and runs on Windows, so it adheres to the x64 ABI. As for push-order, they all use a C-stack except the first one (PASCAL), which is left-to-right. The reason the word C-stack is used to describe FASTCALL is that the two stack layouts are identical, so a C call may be rendered like this:

        sub     rsp, 6*8
        mov     dword ptr [rsp+5*8], 6
        mov     dword ptr [rsp+4*8], 5
        mov     dword ptr [rsp+3*8], 4
        mov     dword ptr [rsp+2*8], 3
        mov     dword ptr [rsp+1*8], 2
        mov     dword ptr [rsp+0*8], 1
        call    foo_c
        add     rsp, 6*8

And FASTCALL like this:

        mov     r9d, 4                                 
        mov     r8d, 3                                 
        mov     edx, 2                                 
        mov     ecx, 1                                 
        push    6
        push    5                                       
        push    r9
        push    r8
        push    rdx
        push    rcx
        call    foo_f
        add     rsp, 6*8
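
The claim that these renderings produce the same stack image can be checked with a toy stack model. The Python sketch below is only an illustration: the `Stack` class and the starting rsp of 0x1000 are hypothetical, and the values 1..6 are the ones used in the sample. It replays the C rendering (sub rsp plus mov), the FASTCALL rendering (pushes) and the PASCAL call from main, then lists the six slots upward from rsp just before the call instruction.

```python
class Stack:
    """Tiny model of a downward-growing stack made of 8-byte slots."""

    def __init__(self, rsp=0x1000):    # arbitrary starting address
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        self.rsp -= 8
        self.mem[self.rsp] = value

    def sub(self, nbytes):
        self.rsp -= nbytes

    def store(self, offset, value):    # mov [rsp+offset], value
        self.mem[self.rsp + offset] = value

    def slots(self, count):
        """Values at [rsp+0], [rsp+8], ... as seen just before the call."""
        return [self.mem[self.rsp + 8 * i] for i in range(count)]

def c_call():
    s = Stack()
    s.sub(6 * 8)
    for i in (5, 4, 3, 2, 1, 0):       # same store order as the listing
        s.store(i * 8, i + 1)
    return s.slots(6)

def fastcall_with_pushes():
    regs = {"rcx": 1, "rdx": 2, "r8": 3, "r9": 4}
    s = Stack()
    for item in (6, 5, "r9", "r8", "rdx", "rcx"):
        s.push(regs[item] if isinstance(item, str) else item)
    return s.slots(6)

def pascal_call():
    s = Stack()
    for value in (1, 2, 3, 4, 5, 6):   # left-to-right pushes, as in main
        s.push(value)
    return s.slots(6)

print("C       :", c_call())
print("FASTCALL:", fastcall_with_pushes())
print("PASCAL  :", pascal_call())
```

Once the call has pushed the return address and the callee has executed push rbp, slot i sits at [rbp+10H+8*i]. That agrees with the disassembly, where the first argument is read from [rbp+10H] in _foo_c and from [rbp+38H] in FOO_P.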

hutch--

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #8 on: March 05, 2021, 12:28:45 PM »
The problem as I see it is that you are shifting across a large variety of calling conventions and effectively blurring the distinctions. We all know how 32 bit Windows STDCALL worked (push/call notation), the specs for Microsoft x64 Windows are publicly available, and there is no leakage across the two.

The Microsoft x64 ABI FASTCALL does not use PUSH at all, and the notion of a PUSH order may be convenient when you are dealing with a multitude of different calling conventions, but it misrepresents the 64 bit Windows FASTCALL. Now, as I am sure that you understand the M$ FASTCALL, the obvious point is that when you write the 1st 4 args to registers,

mov rcx, 1
mov rdx, 2
mov r8, 3
mov r9, 4

is the same as

mov r8, 3
mov rcx, 1
mov r9, 4
mov rdx, 2

While the identical data is written to the 1st 4 registers, it is not written in the same order, and that simply does not matter: the 4 arguments end up in the correct registers. The idea of PUSH order is incorrect here; what matters is WHERE each value is written, not the order of the writes. With the 5th and higher arguments written to the stack after the 4 shadow space positions, they don't have to be written in PUSH order either. For each 64 bit location on the stack, if you write the correct data to that location, you can write it in any order you like.
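
This can be checked exhaustively with a short Python sketch (a model only; the `run` helper, the stack values 5 and 6, and the register values 0xAAAA/0xBBBB are assumptions made for illustration). It tries every ordering of six independent writes with immediate sources, and then, for contrast, every ordering of four writes in which two stack slots take their values from rcx and rdx.

```python
from itertools import permutations

def run(moves, state):
    """Execute (dst, src) moves in order on a copy of state."""
    state = dict(state)
    for dst, src in moves:
        state[dst] = state[src] if isinstance(src, str) else src
    return state

# Case 1: every source is an immediate, so the writes are independent.
immediates = [("rcx", 1), ("rdx", 2), ("r8", 3), ("r9", 4),
              ("[rsp+20h]", 5), ("[rsp+28h]", 6)]
outcomes = {tuple(sorted(run(p, {}).items())) for p in permutations(immediates)}
print("distinct outcomes with immediate sources:", len(outcomes))

# Case 2: two stack arguments are read from rcx and rdx (invented values).
initial = {"rcx": 0xAAAA, "rdx": 0xBBBB}
from_regs = [("rcx", 1), ("rdx", 2),
             ("[rsp+20h]", "rcx"), ("[rsp+28h]", "rdx")]
wanted = {"rcx": 1, "rdx": 2, "[rsp+20h]": 0xAAAA, "[rsp+28h]": 0xBBBB}
good = sum(run(p, initial) == wanted for p in permutations(from_regs))
print("orders that work with register sources:", good, "of 24")
```

With immediate sources all 720 orderings give one and the same result. With rcx and rdx as sources, only the orderings that store each register before reloading it are correct.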

nidud

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #9 on: March 05, 2021, 09:55:56 PM »
It actually changes everything and it's done so by design. RCX is the most important register as it is used as a counter (hence the C) and operand for various shift operations. RDX is also a better choice than R8..Rn as it has two byte registers. Compilers will therefore use (spill) RAX, RCX, and RDX for loading arguments from the right and assign the value to RCX at the end.

Assigning values from the left also means you only have RAX, R10, and R11 available for the argument list on the right. This changes the way you think and thus has an impact on the code produced.

    foo(1,2,3,4,rax,r10,r11)

versus

    foo(1,2,3,4,rax,rcx,rdx,r8,r9,r10,r11)
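
The two argument lists follow from simple bookkeeping, sketched here in Python. The Win64 volatile (caller-saved) integer registers are rax, rcx, rdx, r8, r9, r10 and r11; the function name is made up, and non-volatile registers, which are usable as sources in either order, are left out.

```python
VOLATILE = ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11"]  # Win64 caller-saved
ARG_REGS = ["rcx", "rdx", "r8", "r9"]                       # arguments 1..4

def usable_sources_for_stack_args(order):
    """Volatile registers still holding their old value when args 5+ are stored."""
    if order == "left-to-right":
        clobbered = set(ARG_REGS)      # rcx, rdx, r8, r9 were loaded first
    elif order == "right-to-left":
        clobbered = set()              # stack arguments are stored first
    else:
        raise ValueError(order)
    return [reg for reg in VOLATILE if reg not in clobbered]

print("left-to-right:", usable_sources_for_stack_args("left-to-right"))
print("right-to-left:", usable_sources_for_stack_args("right-to-left"))
```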


hutch--

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #10 on: March 05, 2021, 10:52:05 PM »
 :biggrin:
Quote
It actually changes everything and it's done so by design. RCX is the most important register as it is used as a counter (hence the C) and operand for various shift operations. RDX is also a better choice than R8..Rn as it has two byte registers. Compilers will therefore use (spill) RAX, RCX, and RDX for loading arguments from the right and assign the value to RCX at the end.

This is MS-DOS level thinking and it is not x64 Windows ABI compliant.

ax = accumulator
bx = base address
cx = counter
dx = data
si = source index
di = destination address
sp = stack pointer
bp = base pointer

With Win64 ABI you are free from this ancient thinking; in the 1st 4 registers in FASTCALL you can put anything you like as long as you don't overwrite registers with other registers.

Now when you make a call in 64 bit Windows FASTCALL, you have written values to the first 4 registers, and in the procedure that you are calling you can use them directly, assign them to locals, copy them to other registers, or even save some of the system registers and park the arguments there.

There is no reason why a compiler cannot be fully ABI compliant; slopping data around with unsound assumptions produces junk code, something a compiler should not do.

jj2007

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #11 on: March 05, 2021, 11:29:49 PM »
Quote
as long as you don't overwrite registers with other registers.

Yep, and that's the only reason why the "push order" matters a little bit.

nidud

Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #12 on: March 06, 2021, 01:20:46 AM »
:biggrin:
Quote
It actually changes everything and it's done so by design. RCX is the most important register as it is used as a counter (hence the C) and operand for various shift operations. RDX is also a better choice than R8..Rn as it has two byte registers. Compilers will therefore use (spill) RAX, RCX, and RDX for loading arguments from the right and assign the value to RCX at the end.
This is MS-DOS level thinking and it is not x64 Windows ABI compliant.

ax = accumulator
bx = base address
cx = counter
dx = data
si = source index
di = destination address
sp = stack pointer
bp = base pointer


 :biggrin:

The point is that RCX and RDX have other qualities (or privileges) than R10 and R11.

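    movsx edx,ah        ; works
    shrd rax,rdx,cl

    movsx r10d,ah       ; fails: AH cannot be encoded together with the REX prefix that r10d needs
    shrd rax,rdx,r10b   ; fails: the shift count must be CL or an immediate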
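
The two rules behind those failures can be written down as a small Python check. This is a simplified sketch, not an assembler: the function names are invented, and the REX.W prefix needed for 64-bit operand size is not modelled. In 64-bit mode the high-byte registers AH, BH, CH and DH cannot appear in an instruction that carries a REX prefix, and r8..r15 (including r10d and r10b) can only be encoded with one. Separately, SHRD takes its count only from CL or an 8-bit immediate.

```python
import re

HIGH_BYTE = {"ah", "bh", "ch", "dh"}
REX_ONLY_LOW_BYTE = {"spl", "bpl", "sil", "dil"}

def needs_rex(reg):
    """True if naming this register forces a REX prefix (REX.W not modelled)."""
    if reg in REX_ONLY_LOW_BYTE:
        return True
    # r8..r15 and their d/w/b sub-registers
    return re.fullmatch(r"r(8|9|1[0-5])[dwb]?", reg) is not None

def encodable(*operands):
    """A high-byte register and a REX-only register cannot share an instruction."""
    uses_high_byte = any(op in HIGH_BYTE for op in operands)
    uses_rex = any(needs_rex(op) for op in operands)
    return not (uses_high_byte and uses_rex)

def valid_shift_count(op):
    """SHLD/SHRD accept only CL or an 8-bit immediate as the count."""
    return op == "cl" or (isinstance(op, int) and 0 <= op <= 255)

print("movsx edx,ah      :", encodable("edx", "ah"))
print("movsx r10d,ah     :", encodable("r10d", "ah"))
print("shrd rax,rdx,cl   :", valid_shift_count("cl"))
print("shrd rax,rdx,r10b :", valid_shift_count("r10b"))
```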

And if anybody should have any doubt, RCX is still the counter, RSI source, RDI destination, and RSP the stack pointer.

    mov rsi,source
    mov rdi,destination
    mov rcx,count
    rep movsb

Quote
With Win64 ABI you are free from this ancient thinking, in the 1st 4 registers in FASTCALL you can put anything you like as long as you don't overwrite registers with other registers.

If you render the call manually, sure.

As for the ABI this refers to the binary used by the loader, debugger and (AV) scanners. The debugger will use the stack position for arguments and not the registers as this is not a "real" FASTCALL convention.

TimoVJL

  • Member
  • ****
  • Posts: 723
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #13 on: March 06, 2021, 01:34:59 AM »
Windows x64 ABI and C/C++ rules, so be calm.

BTW:
A nice feature of programming: when we old people retire, our software still runs, even after we die or vanish :smiley:
Hardware people are even better off; their work stays usable a lot longer.
« Last Edit: March 06, 2021, 02:48:23 AM by TimoVJL »
May the source be with you

HSE

  • Member
  • *****
  • Posts: 1741
  • <AMD>< 7-32>
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #14 on: March 06, 2021, 01:59:52 AM »
Quote
as this is not a "real" FASTCALL convention.

Things begin to make sense  :thumbsup:

Thanks.