Author Topic: Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?  (Read 3969 times)

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #15 on: March 06, 2021, 08:28:11 AM »
> The point is that RCX and RDX has other qualities (or privileges) than R10 and R11.

The catch here is you have to go back a very long way with your instruction choice to find these things. With the exception of the MOVS STOS and SCAS which have special case circuitry due to their popularity over a long period, these fixed register instructions are mainly old junk that is best avoided in 64 bit Windows as they are often very slow.

Now as far as FASTCALL, in this context of Microsoft Windows x64 ABI for Windows 10, its specifications are known by most programmers who write 64 bit Windows code and it is not some open sauce binding set of rules that bridge across multiple OS versions. The UNIX guys have their own specs and I have no doubt that other platforms have theirs as well but there is little in common between them.

I mean seriously, who cares what you need for MAC on MIPS ?

I have no doubt that you well understand the Win64 x64 ABI but your description of how it works shows the cross influence of working across different OS types and platforms.

Here is a simple byte copy procedure using REP MOVSB.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 NOSTACKFRAME

 bcopy proc

  ; rcx = src
  ; rdx = dst
  ; r8  = count

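    mov r11, rsi        ; save non-volatile RSI in volatile R11
    mov r10, rdi        ; save non-volatile RDI in volatile R10

    mov rsi, rcx        ; source pointer
    mov rdi, rdx        ; destination pointer
    mov rcx, r8         ; byte count

    rep movsb           ; copy RCX bytes from [RSI] to [RDI]

    mov rsi, r11        ; restore RSI
    mov rdi, r10        ; restore RDI

    ret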

 bcopy endp

 STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

No stack frame, and it uses some of the extra registers (R10 and R11) rather than LOCALs to preserve RSI and RDI. The arguments are read in the proc directly from registers, in accordance with the Microsoft 64 bit FASTCALL ABI, without the need to read shadow space.
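
For anyone more at home in C, the effect of the proc above can be modelled roughly like this. It is only a sketch: `bcopy_model` is a made-up name, and the loop mirrors what REP MOVSB does architecturally with the direction flag clear (copy `count` bytes in ascending address order), not how the CPU actually implements it. Note that the argument order follows the proc (source first), not `memcpy`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the bcopy proc: src, dst, count in that order,
   matching rcx, rdx, r8. Copies forward one byte at a time, as REP MOVSB
   does when the direction flag is clear. */
void bcopy_model(const void *src, void *dst, size_t count)
{
    const unsigned char *s = src;   /* plays the role of RSI */
    unsigned char *d = dst;         /* plays the role of RDI */

    assert(count == 0 || (s != NULL && d != NULL));

    while (count--) {               /* RCX counts down to zero */
        *d++ = *s++;
    }
}
```

Calling `bcopy_model(src, dst, 6)` on a 6 byte buffer copies all 6 bytes, including any terminating zero.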
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

HSE

  • Member
  • *****
  • Posts: 1741
  • <AMD>< 7-32>
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #16 on: March 06, 2021, 09:02:58 AM »
Hi Hutch!

> Microsoft 64 bit FASTCALL ABI without the need to read shadow space.

I read an article by Microsoft from twelve years ago that says that in 64 bits there is only one calling convention, the (Microsoft) "x64 calling convention". Fastcall is not part of the name. They only say it is a fastcall-like calling convention. That is a lot clearer for me since, you know, I know very little :biggrin:


hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #17 on: March 06, 2021, 09:08:42 AM »
Hi Hector,

I think it's a case of a rose by any other name still has thorns.  :tongue:

nidud

  • Member
  • *****
  • Posts: 2212
    • https://github.com/nidud/asmc
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #18 on: March 06, 2021, 10:50:54 AM »
> The point is that RCX and RDX has other qualities (or privileges) than R10 and R11.

> The catch here is you have to go back a very long way with your instruction choice to find these things. With the exception of the MOVS STOS and SCAS which have special case circuitry due to their popularity over a long period, these fixed register instructions are mainly old junk that is best avoided in 64 bit Windows as they are often very slow.

True, newer CPUs actually favor the MOVS instructions over XMM, but the usage and functionality of the shift/rotate instructions are the same, so nothing has changed in that regard. MOVSX is a relatively new instruction that didn't exist back in the DOS era. An immediate operand was also added to most instructions using CL as count.

    SHL, SHR, ROR, ROL, SAL, SAR, RCL, RCR, SHLD, SHRD, ...

Quote
Now as far as FASTCALL, in this context of Microsoft Windows x64 ABI for Windows 10, its specifications are known by most programmers who write 64 bit Windows code and it is not some open sauce binding set of rules that bridge across multiple OS versions. The UNIX guys have their own specs and I have no doubt that other platforms have theirs as well but there is little in common between them.

As for the calling convention, the stack layout is exactly the same as a C stack in 16-bit code. The register part is mainly to create dependency between caller and callee to enforce alignment, and offset some of the bloat created in the process.

However, the argument here was about right-to-left versus left-to-right and the consequences of choosing one over the other. Or more to the point, the origin of the idea behind the choice made.

daydreamer

  • Member
  • *****
  • Posts: 1719
  • building nextdoor
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #19 on: March 06, 2021, 11:57:10 PM »
> The point is that RCX and RDX has other qualities (or privileges) than R10 and R11.

> The catch here is you have to go back a very long way with your instruction choice to find these things. With the exception of the MOVS STOS and SCAS which have special case circuitry due to their popularity over a long period, these fixed register instructions are mainly old junk that is best avoided in 64 bit Windows as they are often very slow.

> True, newer CPUs actually favor the MOVS instructions over XMM, but the usage and functionality of the shift/rotate instructions are the same, so nothing has changed in that regard. MOVSX is a relatively new instruction that didn't exist back in the DOS era. An immediate operand was also added to most instructions using CL as count.

>     SHL, SHR, ROR, ROL, SAL, SAR, RCL, RCR, SHLD, SHRD, ...


In 16-bit DOS code, aren't the MOVS and STOS instructions the smallest in size?
They wrap a MOV [edi],[esi] together with an INC (or alternatively DEC) of the pointers.
My theory: when they went from the old hardwired opcode->circuitry designs to modern microcode architecture, longer snippets of microcode became necessary to perform all those instructions, compared to simpler modern MOV instructions.
With 64-bit opcodes, many of them carrying 8 bytes of immediate or address, it's pointless to use them just for the smaller instruction size.
SIMD fan and macro fan
why assembly is fastest is because its switch has no (brakes) breaks
:P
only in 16bit assembly you can get away with "Only words" :P

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #20 on: March 07, 2021, 08:45:11 AM »
> However, the argument here was about right-to-left versus left-to-right and the consequences of choosing one over the other. Or more to the point, the origin of the idea behind the choice made.

The argument was about your use of PUSH ORDER, done by association with a collection of technologies that belong in the past. I make the same point: Windows x64 ABI only requires the WRITE order to be correct, there is no PUSH ORDER. For the obvious reason, I don't see the association between DOS C and x64 Windows, and that is because there is none. You may also note that under DOS in a non-re-entrant environment there were no formal calling conventions as it did not matter, they all did their own thing.

Win32 had a PUSH ORDER using push/call notation via STDCALL, Win x64 writes addresses to locations on the stack WITHOUT CHANGING THE STACK ADDRESS.

jj2007

  • Member
  • *****
  • Posts: 11550
  • Assembler is fun ;-)
    • MasmBasic
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #21 on: March 07, 2021, 09:08:45 AM »
> Win32 had a PUSH ORDER using push/call notation via STDCALL, Win x64 writes addresses to locations on the stack WITHOUT CHANGING THE STACK ADDRESS.

Hutch, I think we all know that we are not talking about a "push order" in the old Win32 sense...

> as long as you don't over write registers with other registers.

Yep, and that's the only reason why the "push order" matters a little bit.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #22 on: March 07, 2021, 12:16:13 PM »
I have no doubt that most DO understand how the win32 push order works but I complained about it being applied to windows x64 because it's simply wrong. Anyone who can read the Win64 ABI knows that values are written to the 1st 4 registers and depending on how the call is done, the same 4 registers are written to shadow space then all other arguments are written after that.

The difference is MOV rather than PUSH, PUSH changes the stack, MOV does not and that is the spec for Windows x64.

jj2007

  • Member
  • *****
  • Posts: 11550
  • Assembler is fun ;-)
    • MasmBasic
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #23 on: March 07, 2021, 02:34:13 PM »
> values are written to the 1st 4 registers and depending on how the call is done, the same 4 registers are written to shadow space then all other arguments are written after that.

My understanding is that the 4 registers rcx rdx r8 r9 are not written to shadow space before the call; it is the called proc's task to write them to shadow space if needed.

IMHO it is better to deal first with the higher arguments, in order to be able to use rcx rdx r8 r9 also as arguments. Afterwards, the first 4 arguments can be assigned to rcx rdx r8 r9 without the risk of overwriting arguments.
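
A toy C model of that ordering issue, as a sketch only: the `Machine` struct and both function names are invented for illustration, and one register plus one stack slot stand in for the real thing. It assumes the value wanted as argument 5 currently lives in rcx while argument 1 is the constant 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one register and one stack slot are enough to show the issue. */
typedef struct {
    uint64_t rcx;     /* holds a value the caller wants to pass as arg 5 */
    uint64_t slot5;   /* stands in for [rsp+32], the fifth argument slot */
} Machine;

/* Register arguments first: rcx is overwritten before arg 5 reads it. */
void load_regs_first(Machine *m)
{
    assert(m != NULL);
    m->rcx = 1;          /* mov rcx, 1         ; arg 1 */
    m->slot5 = m->rcx;   /* mov [rsp+32], rcx  ; arg 5 now sees 1 */
}

/* Higher arguments first: arg 5 is stored while rcx is still intact. */
void load_stack_first(Machine *m)
{
    assert(m != NULL);
    m->slot5 = m->rcx;   /* mov [rsp+32], rcx  ; arg 5 gets the old value */
    m->rcx = 1;          /* mov rcx, 1         ; arg 1 */
}
```

Starting from rcx = 42, the first order leaves 1 in the fifth slot and the second leaves 42, which is the only sense in which the order of the writes matters here.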

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #24 on: March 07, 2021, 05:36:53 PM »
Interesting view but it's not part of the x64 ABI. The ABI only specifies the first 4 registers for the first 4 arguments and, after the 4 shadow space locations for those arguments, any following arguments written to the 64 bit locations that follow. It's a mistake that many make, trying to append their own rules to an existing specification.

nidud

  • Member
  • *****
  • Posts: 2212
    • https://github.com/nidud/asmc
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #25 on: March 07, 2021, 11:36:55 PM »
 :biggrin:

Semantics.
The ABI defines the stack frame, not how you allocate it. What the Microsoft documentation says about the convention is this:
Quote
This convention simplifies support for unprototyped C-language functions and vararg C/C++ functions.
and, given this is a regular C-stack frame:
Quote
Remaining arguments get pushed on the stack in right-to-left order.

This is how Microsoft populate the stack frame, as in:
Quote
The register part is mainly to create dependency between caller and callee to enforce alignment,

    sub     rsp, 56 ; 6*8 + 8 for alignment
    mov     DWORD PTR [rsp+40], 6
    mov     DWORD PTR [rsp+32], 5
    mov     r9d, 4
    mov     r8d, 3
    mov     edx, 2
    mov     ecx, 1
    call    int foo(int,int,int,int,int,int)
    foo:
        mov     qword ptr [rsp+20H], r9
        mov     qword ptr [rsp+18H], r8
        mov     qword ptr [rsp+10H], rdx
        mov     qword ptr [rsp+8H], rcx
        ; The C-stack is now:
        ; [rsp+0*8] return address
        ; [rsp+1*8] arg(1) --> shadow space
        ; [rsp+2*8] arg(2)
        ; [rsp+3*8] arg(3)
        ; [rsp+4*8] arg(4) <--
        ; [rsp+5*8] arg(5)
        ; [rsp+6*8] arg(6)
        ...
        ret
    add     rsp, 56 ; C-stack clean-up

Same as:

    push    rbp     ; align 16
    mov     rbp,rsp
    push    6
    push    5
    sub     rsp,4*8 ; shadow space
    mov     r9d, 4
    mov     r8d, 3
    mov     edx, 2
    mov     ecx, 1
    call    int foo(int,int,int,int,int,int)
    ...
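
The arithmetic behind the two listings can be sketched in C. The function names are hypothetical, and the size calculation assumes integer arguments only, a caller that pushes nothing and has no locals, and rsp at 8 mod 16 on entry to the caller (the usual state right after its own return address was pushed).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the caller subtracts from rsp for a call with nargs integer
   arguments, under the assumptions stated above. */
size_t call_area_size(size_t nargs)
{
    size_t slots = nargs < 4 ? 4 : nargs;  /* four home slots always */
    size_t bytes = slots * 8;

    if ((bytes + 8) % 16 != 0)             /* keep rsp 16-byte aligned at the CALL */
        bytes += 8;
    return bytes;
}

/* Offset of argument n (1-based) from rsp in the caller, just before CALL. */
size_t caller_offset(size_t n)
{
    assert(n >= 1);
    return (n - 1) * 8;
}

/* Offset of argument n (1-based) from rsp on entry to the callee,
   where the return address sits at [rsp]. */
size_t callee_offset(size_t n)
{
    assert(n >= 1);
    return n * 8;
}
```

For six arguments this gives 56 bytes, with arguments 5 and 6 at [rsp+32] and [rsp+40] in the caller, and argument 4 homed at [rsp+20H] in the callee, matching the first listing.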

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #26 on: March 08, 2021, 01:58:38 AM »
You are mixing the creation of a stack frame with passing arguments to a procedure. From a calling procedure, the stack must be aligned for another procedure call due to the CALL mnemonic and the RET mnemonic, this is why you provide multiple stack frame designs for different tasks.

jj2007

  • Member
  • *****
  • Posts: 11550
  • Assembler is fun ;-)
    • MasmBasic
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #27 on: March 08, 2021, 02:44:39 AM »
> Anyone who can read the Win64 ABI knows that values are written to the 1st 4 registers and depending on how the call is done, the same 4 registers are written to shadow space then all other arguments are written after that.

The caller does not write the regs to shadow space. The callee may write them to shadow space.

https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160
Quote
The caller is responsible for allocating space for the callee's parameters. The caller must always allocate sufficient space to store four register parameters, even if the callee doesn't take that many parameters.
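
One place where the four always-present slots pay off is varargs: the callee can spill rcx, rdx, r8 and r9 into them so that all arguments sit contiguously in memory for `va_arg` to walk. A minimal sketch in standard C follows; `sum_ints` is a hypothetical example function, and whether and how the spill happens is up to the compiler.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* next argument, register-passed or not */
    va_end(ap);
    return total;
}
```

In a call such as `sum_ints(6, 1, 2, 3, 4, 5, 6)` on Windows x64, the count and the first three values travel in registers and the rest go on the stack, yet the callee reads them all the same way.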

nidud

  • Member
  • *****
  • Posts: 2212
    • https://github.com/nidud/asmc
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #28 on: March 08, 2021, 02:52:54 AM »
> You are mixing the creation of a stack frame with passing arguments to a procedure.

Difficult to avoid since they're interlinked.

Quote
From a calling procedure, the stack must be aligned for another procedure call due to the CALL mnemonic and the RET mnemonic, this is why you provide multiple stack frame designs for different tasks.

There aren't multiple stack frame designs here. The caller creates a regular C stack.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: 64-bit: Why Can't I get "CreateFileA" to Access a File or Device?
« Reply #29 on: March 08, 2021, 03:59:40 AM »
Are you saying you cannot produce varieties of stack frames? I can routinely do it in MASM.