News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Playtime with ML64 and a question on spill space.

Started by hutch--, June 23, 2016, 01:52:21 PM

Previous topic - Next topic

Mikl__

Vou ver o que pode ser feito com veh-exemplo...

mineiro

beleza, eu lembro que fiz uma divisão por zero para causar um erro intencional na época em que estava me aventurando com win64, inserí alguns nop's antes e depois da instrução para ter um limite de endereços para trabalhar.
abraços.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

rrr314159

hutch, I thought you'd appreciate a little humor to lighten your day! But the main reason I didn't answer, couldn't find my previous posts from long ago that went into this; and don't want to get into a long discussion about a trivial error I might make, recalling how it goes. Anyway - mineiro is right, but here's my take on it (with probably a trivial error).

The ABI fastcall allows up to four parameters passed in registers rcx, rdx, r8, r9. After that they go on the stack. But the strange thing is you must allow four spaces on the stack even though you don't send any data in them. Called spill or shadow space. The called routine can use that space to store the four registers if they want. It's hokey but that's MS for you.

The other requirement is that when you call, the stack must be on 16-bit boundary, ???????0h. The call will put the return address on stack and jump to called routine. So when that routine starts, it will be on 8h. As long as everyone follows the rules it will always be that way. So the same thing has occurred in your own routine: when it started, you're on an 8-boundary. Therefore you have to add one more dword to get to 0h. That's why you need 5 8's in all: 40, or 28h. One of them is to round it up to 0h, then 4 (20h) for the actual spill space.

You mentioned it works only with exactly that number; no, it's ok with 38h, 48h, etc; but you have to adjust stack afterwards, before returning from your routine.

The reason for insisting on this standard alignment is that XMM registers must go on the stack at even boundaries; some of the instructions need that.

It's important to note the following fact, which has tripped up many people. When I was learning I found long threads on StackExchange (or whatever) that never did get this point straight. MessageBox is one of the few simple functions that really does insist on this alignment! printf, for instance, does not. So if you experiment on many other simple calls you wind up thinking you have more latitude. But then MessageBox will get you; and, some others. Best to follow the rules at all times; although, for convenience and speed, my code breaks this rule often - when I know all subsequent calls will be "safe".

Why does MessageBox behave like that? I don't doubt it's because they make a call to a window routine to put up that box. Whereas most other basic functions don't, and their code just never uses XMM registers.

There are other mistakes in all tutorials you'll see, which I'll mention briefly. They say all floating points are passed in XMM's. No, they're often passed in the GPRs. For instance printf gets floats from GPRs and will ignore any data you send in XMM. Also VARARG is handled specially. I found one ref somewhere on MS that explained that correctly. Other MS pages, and (iirc) all others, got it wrong. I actually don't remember the details. See the way I did it in my nvk Macro, "Yet Another Macro" post, it's about 40 posts ago in 64-bit forum. There was also a post a year ago, or so, where I answered all this in detail. It's not on 64-bit forum though, because OP (I think it was fearless?) asked the q. somewhere else. Generally, you could do a lot worse than simply review all my 64-bit work from that period.
I am NaN ;)

jj2007

#18
Quote from: rrr314159 on June 24, 2016, 12:21:10 AM"Yet Another Macro"

It's here: http://masm32.com/board/index.php?topic=3988.msg42003#msg42003

It might be helpful to have a sticky post in the 64-bit forum with a "Hello World" archive containing
- basic includes (kernel, user, msvcrt, ...), or at least exact links to them
- basic libs, or at least exact links to them
- a batch file that takes an asm file as argument
- a link to the version of ml/jwasm/hjwasm/asmc that works with the hello world
- a link to a free 64-bit debugger

So far, I find it far too confusing to even start playing with 64-bit code 8)

mineiro

#19
I really don't get the point about fastcall calling convention. They said that's because speed, ok, I agree, parameters on registers is really quickly. But you should move registers to stack, so where's the gain? Only to use rbp as a normal register because rip relative addressing?
You're not forced to move parameters to stack, but this becames bad habits.
So, why not code as a stdcall, where you push things on stack and adjust that after function is callled if more than 4 parameter? On linux is the same thing I suppose, the difference is that have 6 registers instead of 4. So, why bother about rsp alignment? I really don't get the point about fastcall.
Try a wsprintf with 7 parameters and an error can happen, this way we lost precious memory to alignt stack and to nothing.

---edited----
C calling convention instead of stdcall as I said before.
And more, reading that topic about 32 bits versus 64 bits I think that everybody agree that does not have a real gain from one to another, only on specific types of code (overhead removed). So my conclusion is that programs to 64 bits eats more memory and do not have a real gain.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

hutch--

This is just playing with ML64 macros.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    OPTION DOTNAME
   
    option casemap:none

    include \masm64\include\win64.inc
    include \masm64\include\temphls.inc

    include \masm64\include\kernel32.inc
    include \masm64\include\user32.inc
    include \masm64\include\msvcrt.inc

    includelib \masm64\lib\user32.lib   
    includelib \masm64\lib\kernel32.lib
    includelib \masm64\lib\msvcrt.lib

; char *_itoa(
;    int value,
;    char *str,
;    int radix

    buff$ MACRO valu
      LOCAL buffer,pbuf
      .data?
        buffer db 32 dup (?)
      .data
        pbuf dq buffer
      .code
      invoke _itoa,valu,pbuf,10
      EXITM <pbuf>
    ENDM

    falloc MACRO bsize
      invoke GlobalAlloc,GMEM_FIXED,bsize
      EXITM <rax>
    ENDM

    fxfree MACRO hndl
      invoke GlobalFree,hndl
    ENDM

    appexit MACRO valu
      invoke ExitProcess,0
    ENDM

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

  .data
    pttl db "Memory Address",0

  .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL pMem  :QWORD

    push rbp

    mov pMem, falloc(1024*1024*1024*8)              ; allocate fixed memory
    invoke MessageBox,0,buff$(pMem),ADDR pttl,0     ; display string of memory value
    fxfree pMem                                     ; release memory
    appexit 0                                       ; exit the process

    pop rbp

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment #

    https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

    Volatile
    rax rcx rdx r8 r9 r10 r11

    Non Volatile
    r12 r13 r14 r15 rdi rsi rbx rbp rsp

    Volotile
    xmm0 ymmo
    xmm1 ymm1
    xmm2 ymm2
    xmm3 ymm3
    xmm4 ymm4
    xmm5 ymm5

    Nonvolatile (XMM), Volatile (upper half of YMM)
    xmm6-15
    ymm6-15

#

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

  end


The register data is ALA Microsoft.

mineiro

Structures should be padded to 8 (on each member?, if yes a lot of them should be done by hands, I think assemblers get total size of structure instead of each member size), so, pointers are all 8 bytes, but handles are not I suppose. If speed is the argument as they say, put all types to qwords make more sense, but it's not this way. But again I think, not sure, lea instruction (addr) deals with dword addressing size on long mode (x86-64) and offset deals with qwords. Procedures should be aligned to 16 to favor use of xmm/ymm.
One doubt, whats the minimum machine that can be used to win64? I was reading about SSE2 as minimum, not all machines have ymm registers and instructions set.
too much headcache, good adventures.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

rrr314159

mineiro, SSE2 is all you need.

As for fastcall, if you just remember that MS did a lousy job with the ABI it all makes sense. As you say advantage of passing values in rcx, rdx, r8, r9 is somewhat negated by necessity of stack manipulation, and so forth. But remember, with your own code you don't have to follow any conventions at all. Only when interfacing with MS or other outside entities. In your own code, you can take advantage of those extra registers to almost eliminate passing anything on the stack. And of course ignore alignment except when using instructions (like some SSE XMM instructions) that demand it.
I am NaN ;)

rrr314159

Quote from: jj2007 on June 24, 2016, 01:11:22 AMIt might be helpful to have a sticky post in the 64-bit forum with a "Hello World" archive containing ...

Only problem with that idea, it sounds suspiciously similar to work

Quote from: jj2007So far, I find it far too confusing to even start playing with 64-bit code

For the type of thing that MasmBasic does 64-bit is a lot of work for essentially no gain. You "join the 64-bit world", and someday when everybody has mega-gigs of RAM it might be necessary, but apart from that there is no added functionality. It just bloats the code and slows it down a tiny bit.

There are two main reasons why it's bad, neither of them having to do with Intel, or the basic concept of 64-bit. First, MS did a lousy job with the ABI and much of the rest of their implementation. Second there's no "masm64", so 64-bit asm is not standardized. For instance MikL and I use different libraries and a couple other less important differences so are not immediately compatible.

All these bad points disappear when writing code for yourself (more or less). All my code obeys MS interface where it must, maybe 10%. In the rest I can freely use the extra registers and qword-manipulation capability. It all works great, easy to learn, big advantage especially for math routines, but also graphics and, lesser extent, any other code.

So for non-production code, for your own use, or distributed only to (more or less) friends, I extremely recommend getting into 64-bit. But for the type of thing most people here do, like you and hutch, not. It's still worth learning about but only as a dull chore, so you can keep up with the times. For MS-production coding it's pretty much an unalloyed negative.

If you, hutch and similar want to make something good out of it, well worth considering is working with Habran to make HJWasm a de facto standard. It could be developed to be the core of a "masm64.com" site.
I am NaN ;)

habran

Thanks rrr314159 :t
You are downright rational thinker 8)

hutch and jj2007, if you find the FASTCALL to compex, brace yourself for the VECTORCALL which is coming in the next release of HJWasm.
The VECTORCALL will also work in x86.
I am sure that rrr314159 and qWORD will embrace it ;)
Here is MSDN introduction what is it about:
Quote
In addition to SIMD data types, Vector Calling Convention can also be
used for Homogeneous Vector Aggregate data-type (HVA) and Homogeneous Float Aggregate data-type (HFA).
An HVA/HFA data-type is a composite type where all of fundamental data types of members that compose
the type are the same and are of Vector or Floating Point data type. (__m128, __m256,__512 float/double).
An HVA/HFA data type can have at most four members.
Cod-Father

mineiro

Hmm, yes sir rrr314159, you're right. Inside scope of our program we can do anything and if need to talk with other calling conventions/abi we follow that rules. This way can have a real gain to justify fastcall.
I like to read your opinion, so, to you, whats the best way to release a library (like masm32 lib)? Maybe a (or many) prologue and epilogue function(s) to deal with other languages/abi and an internal calling convention to assembly programmers?

Sir habran, you have any plans to continue support to linux?
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

fearless

Quote from: jj2007 on June 24, 2016, 01:11:22 AM
- basic includes (kernel, user, msvcrt, ...), or at least exact links to them
- basic libs, or at least exact links to them

Some information related to these might be found in this post: JWasm64 with RadASM - http://masm32.com/board/index.php?topic=4162.msg44176#msg44176

Quote from: jj2007 on June 24, 2016, 01:11:22 AM
- a link to a free 64-bit debugger
http://x64dbg.com/#start
Latest snapshots are available from here: https://github.com/x64dbg/x64dbg/releases

Some additional bits and pieces i played around with ive uploaded to bitbucket: https://bitbucket.org/mrfearless/jwasm64-with-radasm and https://bitbucket.org/mrfearless/debug64-for-jwasm64 - related post (http://masm32.com/board/index.php?topic=4203.msg44670#msg44670)

I also started a port of some of the functions from the masm32.lib for x64 a while ago: https://bitbucket.org/LetTheLightIn/masm64-library

Any and all can be downloaded, modified etc - they are a work in progress, or a starting point for some other enterprising fella to continue on with.

habran

Sir mineiro ;)
I can assure you that my co-developer sir Johnsa will make it work for linux 8)
Cod-Father

habran

Excellent job mr. fearless :t
However, when I tied to download RadAsm from your link this is what I get:
QuoteThe site ahead contains harmful programs

Attackers on www.assembly.com.br might attempt to trick you into installing programs that harm your browsing experience (for example, by changing your homepage or showing extra ads on sites you visit).
Cod-Father

rrr314159

Quote from: mineiro on June 24, 2016, 09:29:47 AM
I like to read your opinion, so, to you, whats the best way to release a library (like masm32 lib)? Maybe a (or many) prologue and epilogue function(s) to deal with other languages/abi and an internal calling convention to assembly programmers?

Hi mineiro, since you ask,

Avoid complexity. I gather you want to support both Windows and Linux - that's already a fair amount of complexity in the interface. Don't forget, not only do you have to program it, but also produce documentation; and your users have to understand it.

If you want one function to serve for both, you should just use the simplest approach. My guess is, that would mean doing it the Linux way and providing prologue / epilogue to translate to Windows. But if the best way is to develop your own internal methods and translate to both OS's, then fine do that. But in that case I wouldn't publish your internal ABI so others can use it. Then you're locked in to that definition, and also must provide documentation and support. Every error they find can be a big headache: minimize the ways they can access the code to produce errors.

Perhaps you've already done this sort of thing and know all about it, in which case my opinion is superfluous. I've directed many software interfaces on Navy projects but have almost no professional experience in commercial projects. FWIW, when planning a project I always emphasize one thing: simplicity. Not speed or anything else. Everything tends to be a lot more complex than you thought at first, don't add any "extras". That can be done in version 2.

So bottom line - decide what interfaces you MUST support, decide the simplest way to do that, don't publish any more interfaces (like your internal ABI) than necessary.

An alternative might be only publish your own, "improved", ABI, and leave translation for Windows and Linux to external routines.

The "KISS" principle: "Keep it simple, sailor!"
I am NaN ;)