atou_ex 64bit version

Started by fearless, April 25, 2015, 10:04:50 AM

fearless

I've modified Paul Dixon's original atou_ex function for use with 64-bit JWasm64. The return value is in rax. I've done some basic tests and it seems to work, so fingers crossed that it continues to. If anything looks out of place, let me know.

.686
.MMX
.XMM
.x64

option casemap : none
option win64 : 11
option frame : auto
option stackbase : rsp

atou_ex  PROTO :QWORD

.code

;--------------------------------------------------------------------------------------------------------------------
; Convert ascii string pointed to by String param to unsigned qword value. Returns qword value in rax.
; Original code for atou_ex by Paul Dixon. Converted to 64bit support by fearless 2015
;--------------------------------------------------------------------------------------------------------------------

atou_ex PROC FRAME USES RCX RDX String:QWORD

  ; ------------------------------------------------
  ; Convert decimal string into UNSIGNED QWORD value
  ; ------------------------------------------------

    mov rdx, String

    xor rcx, rcx
    movzx rax, BYTE PTR [rdx]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+1]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+2]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+3]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+4]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+5]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+6]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+7]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+8]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+9]
    test rax, rax
    jz quit

    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+10]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+11]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+12]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+13]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+14]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+15]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+16]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+17]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+18]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+19]
    test rax, rax
    jz quit
   
    lea rcx, [rcx+rcx*4]
    lea rcx, [rax+rcx*2-48]
    movzx rax, BYTE PTR [rdx+20] ; if > 20 ascii chars in length will fall out.
    test rax, rax
    jnz out_of_range

  quit:
    lea rax, [rcx]      ; return value in RAX
    or rcx, -1          ; non zero in RCX for success
    ret

  out_of_range:
    xor rax, rax        ; zero return value on error
    xor rcx, rcx        ; zero in RCX is out of range error
    ret

atou_ex endp

END
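
As a reading aid: the two lea instructions repeated in each step above multiply the running value by ten and add the next digit. The fragment below is only a sketch that traces that pair by hand for the string "42". The procedure name trace_42 is made up for illustration and takes no arguments.

```asm
; Sketch: trace of the lea pair for the characters '4' (52) and '2' (50).
trace_42 PROC
    xor ecx, ecx                ; rcx = 0 (running value)
    mov eax, '4'                ; rax = 52, first character
    lea rcx, [rcx+rcx*4]        ; rcx = 0 * 5 = 0
    lea rcx, [rax+rcx*2-48]     ; rcx = 52 + 0*2 - 48 = 4
    mov eax, '2'                ; rax = 50, second character
    lea rcx, [rcx+rcx*4]        ; rcx = 4 * 5 = 20
    lea rcx, [rax+rcx*2-48]     ; rcx = 50 + 20*2 - 48 = 42
    mov rax, rcx                ; return 42 in rax
    ret
trace_42 ENDP
```

Note that nothing in this step checks that the character is actually a digit, so any non-zero byte is folded into the result.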

Gunther

Hi fearless,

why do you use a stack frame? Is it necessary?

Gunther
You have to know the facts before you can distort them.

fearless

Hi Gunther,

Not sure if it is necessary, tbh. I just added it to most of the functions I created and it seemed to compile OK, so I kept my fingers crossed. Maybe others will know if it's needed, or if there are other (maybe more optimal?) ways of calling/defining functions. I'm still new to the 64-bit stuff, so my fingers have been crossed a lot lately ;-)

Gunther

Hi  fearless,

Quote from: fearless on April 26, 2015, 07:19:59 AM
Not sure if it is necessary, tbh. I just added it to most of the functions I created and it seemed to compile OK, so I kept my fingers crossed. Maybe others will know if it's needed, or if there are other (maybe more optimal?) ways of calling/defining functions. I'm still new to the 64-bit stuff, so my fingers have been crossed a lot lately ;-)

I think that if you follow the calling conventions, a stack frame isn't necessary. The details can be found here. Please note the differences between the MS calling convention and the System V AMD64 ABI. Also new is MS __vectorcall, which supports homogeneous vector aggregate (HVA) values. I haven't used that convention so far, because it wasn't necessary, but in some situations it could be quite useful.

Gunther

rrr314159

Since fearless has no local variables, no stack frame will be generated anyway. Habran can correct me if wrong; certainly, it shouldn't be generated. So you can leave that option on always, will only have effect when it's needed. It's also involved with SEH, but I don't think that's relevant here.

In general a stack frame is not necessary for the fastcall convention. Arguments are (well, can be) pushed on the stack, like 32-bit; the main difference is that the first four are also in registers (RCX etc.). A stack frame, as I say, is for local variables and isn't directly concerned with calling conventions.

MS "vectorcall" - in assembler of course we can use SIMD's to pass args without Bill's gracious permission; I've been doing it as a matter of course.

U know, fearless, u don't have to "uses" RCX and RDX since they're volatile. In fact since rcx is used for (the first) argument, I don't think "uses" is of any use. And, since rcx already contains "String" it would make more sense to switch the roles of rcx and rdx in your code.
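
A side effect of USES that may be worth checking against the listing: MASM-style assemblers normally save the listed registers in the prologue and restore them just before ret, so a status written to rcx inside the procedure would be overwritten on return. The sketch below spells out by hand roughly what such an expansion looks like. The name uses_demo is hypothetical, and the exact code JWasm emits under option win64 may differ.

```asm
; Sketch: hand-written equivalent of "USES RCX RDX" around a body
; that tries to return a status in rcx.
uses_demo PROC
    push rcx                    ; saved by USES
    push rdx                    ; saved by USES
    or rcx, -1                  ; body: "success" status in rcx
    pop rdx                     ; restored by USES
    pop rcx                     ; restored by USES: the status is gone
    ret
uses_demo ENDP
```

If the status in rcx is meant to reach the caller, leaving rcx out of the USES list avoids this.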

BTW fearless, (answering the other thread) no I don't have vKim 64-bit macros handy! Someday when I have spare time it would be an interesting project
I am NaN ;)

dedndave

local variables OR passed arguments will cause generation of a prologue/epilogue

nidud

#6
deleted

fearless

I think I was mainly using FRAME as a just-in-case measure, not really understanding its inner workings, but hoping that including it would be OK. I used it primarily when doing stuff for the x64dbg plugins: the callback functions crashed without FRAME, so when I put it back in, I also ended up putting it in every other function as well. Declaring USES X Y Z is probably a habit now as well, just in case, even though, as you say, rcx, rdx, r8 and r9 are volatile and are used anyhow.

When I was converting the original code for atou_ex, it used the edx and ecx registers, so I didn't want to modify it too heavily or change the flow of the code and its register use that much. I was hoping to just expand the registers to their full 64-bit width and adjust the start of the code that had the push ebp stuff (I commented that out) to use the params directly. It seemed to work, so I just left it as listed above.

I've had a look at the utoa_ex function as well, but so far it has proved a little more troublesome. I can get it to work, sort of, but so far only the dword-equivalent length of a number is converted to its ASCII representation.

rrr314159

It's a little confusing ...

To me, "stack frame" means using rbp (or ebp) to reserve space below the entry point on the stack for local variables (or other purposes, see SEH). However, I found an MSDN reference which uses the word "frame" in the context of merely putting arguments on the stack - like dedndave (who's implying that just having a prologue/epilogue constitutes a "frame"). So, let's ignore the language.

What JWasm does when option frame : auto is used - OR not used - is this. If there are no local variables (or other complications like RtlAddFunctionTable() - not applicable here; or non-leaf, see below), then it doesn't use rbp to create what I've traditionally called a frame. Instead it puts the first 4 args in rcx, rdx, r8, r9, reserves 20h bytes on the stack for them, and puts any other args on the stack above that (ignoring other complications which aren't applicable here). When you do have locals, JWasm creates an "rbp frame" whether the option is on or not. In other words, the option is ignored.

But when the function is not a leaf function, the option causes an "rbp frame", due to Habran's latest work. Again - NAH (not applicable here). I believe that's the only difference with the "auto" option. See Habran's posts in the JWasm development forum for examples.

Habran of course knows exactly so if I'm wrong pls correct me. We mortals can simply check the code generated and see what I mean. AFAIK fearless's code will compile the same with or without that option; and so will the invoke used to call it. I should check it myself but I'll take a chance on being wrong, I'm used to it.

nidud, not just C-proc, an assembler proc also needs to use the stack for arguments if you're using JWasm invoke, and following new ABI.

p.s. just checked JWasm, seems it also puts first args on stack, which is not required by ABI but makes sense.

rrr314159

@fearless, I posted the above b4 seeing your post - sure your code will work as it is. Just makes more sense to use rcx for String since it's there already.

p.s. the frame has something to do with SEH (not sure what exactly) so makes sense that when working with a debugger it's necessary.

nidud

#10
deleted

habran

Listen to rrr314159, he is a teacher ;)
Cod-Father

Gunther

Hi nidud,

Quote from: nidud on April 27, 2015, 05:12:37 AM
Is this correct ?

it's all described here and in post #3 above.

Gunther

rrr314159

Listen to Habran, he is a doer, much better than a teacher

@nidud, no that's not quite it.

mov rcx,1 ; load regs with 4 args
mov rdx,2
mov r8,3
mov r9,4
sub rsp,8*8 ; reserve shadow space for first four, plus "real" space for last 4
mov [rsp+32],5 ; load stack with 4 args, 5 thru 8, above the 32-byte shadow space
mov [rsp+40],6
mov [rsp+48],7
mov [rsp+56],8
call proc8 ; make the call
add rsp,8*8 ; release stack

It could be done like this, modifying your example minimally. Although JWasm doesn't do it exactly that way, the effect is the same. Of course this assumes "5", "6" etc are registers else the mov's won't work ...

And the called proc doesn't allocate stack, that's already been done. But it may spill rcx, rdx, r8 r9 to their (already allocated) space on the stack - that's what it's there for. That's as you show, except you have them in upside-down order. And in this case we wouldn't call them "locals" they're saved ("spilled") args.
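
Seen from the called procedure's side, the spilling described above might look like the following sketch. It assumes the documented Microsoft x64 layout at entry (return address at [rsp], then 32 bytes of shadow space, then the fifth and later arguments). The body of proc8 is invented purely for illustration.

```asm
; Sketch: a callee spilling its register args into the caller-provided
; shadow space, then reading one spilled arg and one stack arg back.
; At entry: [rsp] = return address, [rsp+8]..[rsp+39] = shadow space.
proc8 PROC
    mov [rsp+8],  rcx           ; spill arg 1 to its home slot
    mov [rsp+16], rdx           ; spill arg 2
    mov [rsp+24], r8            ; spill arg 3
    mov [rsp+32], r9            ; spill arg 4
    mov rax, [rsp+40]           ; arg 5 sits just above the shadow space
    add rax, [rsp+8]            ; rax = arg 5 + spilled arg 1
    ret                         ; no stack allocated here, so nothing to release
proc8 ENDP
```

The offsets are 8 bytes higher than in the caller's view because the call instruction has pushed the return address.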

JWasm doesn't put rcx etc on the stack when calling. But when they're used in the proc, it spills them first. Me, I write my own invoke, and always put them on the stack when calling, except when speed is really critical.

Anyway this means that even if fearless xor's rcx before putting String in rdx, it will work OK, because rcx gets saved before xor'ing. Seems strange but if JWasm does it that way it must be good!

If there were locals - which you don't show - then space would be used on the stack, but it goes below the entry point, not above (as for args), and uses rbp to reference them.
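
For comparison, a traditional rbp frame with two locals could be sketched as below. This is written by hand rather than taken from JWasm output; the name with_locals and the two-argument body are assumptions for illustration, and unwind (SEH) information is left out.

```asm
; Sketch: locals live below the saved rbp and are addressed through rbp.
with_locals PROC
    push rbp                    ; save caller's frame pointer
    mov rbp, rsp                ; rbp marks the frame
    sub rsp, 16                 ; room for two qword locals below rbp
    mov [rbp-8],  rcx           ; local 1 = arg 1
    mov [rbp-16], rdx           ; local 2 = arg 2
    mov rax, [rbp-8]
    add rax, [rbp-16]           ; rax = local 1 + local 2
    mov rsp, rbp                ; release the locals
    pop rbp                     ; restore caller's frame pointer
    ret
with_locals ENDP
```

The spilled args stay above the return address ([rbp+16] and up in this layout), while the locals sit at negative offsets from rbp.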

There are more details, some args are pointed to not passed, stack alignment is critical (above discussion assumes it's already aligned) and then there's SEH which I haven't really gotten to yet

BTW Gunther's reference (Wikipedia) is incomplete and has the standard mistake re. floating point (which I've mentioned often) - seems to be required for any write-up on the net. Use MSDN for info, the only place I've seen that gets this one right

habran

#14
Here is one way to do that:

  atou_ex PROC FRAME String:QWORD   
    xor eax, eax
  .repeat
    movzx rdx, BYTE PTR [rcx]
    test rdx, rdx
    .break .if (ZERO?)
    lea rax, [rax+rax*4]
    lea rax, [rdx+rax*2-48]
    inc rcx
  .until FALSE
    mov rcx, -1          ; non zero in RCX for success
    ret                  ; rax contains the number
atou_ex endp

atou_ex:
0000000001041147 33 C0                xor         eax,eax 
0000000001041149 48 0F B6 11          movzx       rdx,byte ptr [rcx] 
000000000104114D 48 85 D2             test        rdx,rdx 
0000000001041150 74 0E                je          atou_ex+19h (01041160h) 
0000000001041152 48 8D 04 80          lea         rax,[rax+rax*4] 
0000000001041156 48 8D 44 42 D0       lea         rax,[rdx+rax*2-30h] 
000000000104115B 48 FF C1             inc         rcx 
000000000104115E EB E9                jmp         atou_ex+2h (01041149h) 
0000000001041160 48 C7 C1 FF FF FF FF mov         rcx,0FFFFFFFFFFFFFFFFh 
0000000001041167 C3                   ret

You can try to add a counter :biggrin:
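
One possible way to add the counter, as a sketch only: it keeps the loop above and restores the 20-character limit and the rcx status convention of the original listing. The name atou_ex_counted and the choice of r8 as the counter (volatile under the Microsoft x64 convention) are assumptions, and the pointer is expected in rcx.

```asm
; Sketch: looped conversion with a 20-digit limit.
; rax = value, rcx = -1 on success; rax = 0, rcx = 0 if the string is too long.
atou_ex_counted PROC
    xor eax, eax                ; rax = 0 (running value)
    mov r8d, 20                 ; digits still allowed
next_digit:
    movzx edx, BYTE PTR [rcx]   ; next character, zero-extended into rdx
    test edx, edx
    jz done                     ; terminator reached
    sub r8d, 1
    jc out_of_range             ; a 21st character was found
    lea rax, [rax+rax*4]        ; rax = rax * 5
    lea rax, [rdx+rax*2-48]     ; rax = rax * 10 + (char - '0')
    inc rcx
    jmp next_digit
done:
    or rcx, -1                  ; non-zero in rcx for success
    ret
out_of_range:
    xor eax, eax                ; zero return value on error
    xor ecx, ecx                ; zero in rcx signals out of range
    ret
atou_ex_counted ENDP
```

Like the versions above, this does not reject non-digit characters or detect 20-digit values that overflow 64 bits.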