Int128 in assembler

Started by bigbadbob, June 13, 2018, 02:04:43 PM


aw27

Quote from: nidud on June 15, 2018, 04:48:24 AM
So it's not because he follows the Windows ABI but because he does not know yet how to do it otherwise.
Nah, it is a little "secret" of the Windows ABI how to do it otherwise.  :biggrin:

bigbadbob

The PInvoke feature in C# has to conform to the 64-bit ABI when running in 64-bit mode, because it is used to call the Windows API or any other native DLL. A DLL that contains 64-bit assembly or standard C/C++ code is native. In 32-bit mode I think it uses STDCALL.

I'm able to tell a .Net DLL to be compiled to work only for x64.  The default is 'Any CPU' which is independent of architecture.

I do not know if PInvoke will accept an XMM0 return value. I'm aware that RAX is the 64-bit return value register, but it cannot be used since I would have a 128-bit return value.

Not all my functions are void; I just had not shown this one yet.

_text SEGMENT 

public UInt128Parse

; UInt128Parse
; ---------
; RCX - QWORD - PTR to String (input)
; RDX - QWORD - PTR to Int128 (result)
; R8  - unused
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
; RBP, RBX, R12 nonvolatile, saved in the prolog
;----------
; C Header
; DWORD UInt128Parse(wchar_t* lpwszString, _int128* result )
;----------
; input lpwszString remains unchanged
; result the pointer's contents are updated on success only
; ---------
; returns 0 success
; returns 1 overflow
; returns 2 invalid format
; ---------
;
UInt128Parse PROC FRAME
   push rbp
.pushreg rbp
   push rdx
.pushreg rdx
   push rbx
.pushreg rbx
   push r12
.pushreg r12
   sub rsp, 038h        ; 8 (return address) + 4 pushes + 38h = 60h, keeps RSP 16-byte aligned
.allocstack 038h
   mov rbp, rsp
.setframe rbp, 0
.endprolog
   mov r12, rdx         ; keep the result pointer, MUL overwrites RDX

   ; locals: [rbp+8] low QWORD, [rbp+16] high QWORD,
   ;         [rbp+24] overflow QWORD, [rbp+32] scratch
   xor r11, r11 ; keep r11 zero
   mov QWORD PTR [rbp+8], r11
   mov QWORD PTR [rbp+16], r11
   mov QWORD PTR [rbp+24], r11

   xor r10, r10 ; move to the beginning of the string
   jmp start_loop

keep_looping:                   ; value = value * 10
   mov rbx, 10
   mov rax, [rbp+8]
   mul rbx
   mov QWORD PTR [rbp+32], rdx
   mov QWORD PTR [rbp+8], rax
   mov rax, QWORD PTR [rbp+16]
   mul rbx

   add rax, QWORD PTR [rbp+32]
   adc rdx, r11                ; add with carry and zero
   mov QWORD PTR [rbp+16], rax
   mov QWORD PTR [rbp+24], rdx
   cmp rdx, r11
   jne overflow

start_loop:
   mov dx, WORD PTR [rcx+r10] ; r10 is the string offset
   cmp dx, '0'
   jb  invalid_character

   cmp dx, '9'
   ja  invalid_character

   sub dx, '0'
   xor rax, rax
   mov ax, dx

   add QWORD PTR [rbp+8], rax  ; value = value + digit
   adc QWORD PTR [rbp+16], r11 ; add with carry and zero
   adc QWORD PTR [rbp+24], r11 ; add with carry and zero

   cmp QWORD PTR [rbp+24], r11 ; compare to zero
   jne overflow

   add r10, 2
   mov dx, WORD PTR [rcx+r10]
   cmp dx, 0
   je  done
   jmp keep_looping


done:
   mov rax, QWORD PTR [rbp+8]
   mov rdx, QWORD PTR [rbp+16]
   mov QWORD PTR [r12], rax
   mov QWORD PTR [r12+8], rdx

   xor eax, eax
   jmp method_exit

overflow:
   mov eax, 1
   jmp method_exit

invalid_character:
   mov eax, 2

method_exit:

   ; epilog
   add rsp, 038h
   pop r12
   pop rbx
   pop rdx
   pop rbp
   ret
UInt128Parse ENDP
_text ENDS 
END
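If it helps to cross-check the assembly, here is a hypothetical C reference of the same algorithm. The names `UInt128`, `mul10` and `uint128_parse_ref` are made up for this sketch. It mirrors the loop above: add the digit with carry, multiply the 128-bit value by 10 before each following character, treat any carry out of the high QWORD as overflow, and write the result only on success. It takes UTF-16 code units as `uint_least16_t` because `wchar_t` is 16 bits on Windows but usually 32 bits elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C reference for UInt128Parse. The value is kept as two
   64-bit halves, like the [rbp+8]/[rbp+16] locals in the assembly. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} UInt128;

enum { PARSE_OK = 0, PARSE_OVERFLOW = 1, PARSE_INVALID = 2 };

/* 64 x 10 -> 128-bit product: returns the low QWORD and stores the high QWORD
   in *high, like MUL RBX leaves RDX:RAX. Split into 32-bit halves so no
   128-bit integer type is needed. */
static uint64_t mul10(uint64_t x, uint64_t *high)
{
    uint64_t lo_part = (x & 0xFFFFFFFFu) * 10u;  /* at most 36 bits */
    uint64_t hi_part = (x >> 32) * 10u;          /* at most 36 bits */
    uint64_t low = (hi_part << 32) + lo_part;    /* wraps modulo 2^64 */
    *high = (hi_part >> 32) + (low < lo_part);   /* bits shifted out + carry */
    return low;
}

/* s points to a NUL-terminated UTF-16 string. */
int uint128_parse_ref(const uint_least16_t *s, UInt128 *result)
{
    uint64_t lo = 0, hi = 0;
    size_t i = 0;

    for (;;) {
        uint_least16_t c = s[i];
        if (c < u'0' || c > u'9')          /* also rejects an empty string */
            return PARSE_INVALID;

        /* value += digit; a carry out of the high QWORD is overflow */
        uint64_t digit = (uint64_t)(c - u'0');
        lo += digit;
        uint64_t carry = lo < digit;
        if (carry && hi == UINT64_MAX)
            return PARSE_OVERFLOW;
        hi += carry;

        if (s[++i] == 0)
            break;

        /* value *= 10, as in keep_looping */
        uint64_t lo_high, overflow;
        uint64_t new_lo = mul10(lo, &lo_high);
        uint64_t new_hi = mul10(hi, &overflow);
        new_hi += lo_high;
        overflow += new_hi < lo_high;      /* adc rdx, 0 */
        if (overflow != 0)
            return PARSE_OVERFLOW;
        lo = new_lo;
        hi = new_hi;
    }

    result->lo = lo;                        /* written only on success */
    result->hi = hi;
    return PARSE_OK;
}
```

Feeding the same strings to the DLL and to this function and comparing the return codes and QWORDs is a cheap way to test edge cases such as 2^64 and 2^128 - 1.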

hutch--

Bob,

If the return value is a problem, what about passing the address of a buffer in the arguments that is any size you like, write the results to that buffer in your assembler proc then back in your calling language just read the buffer ? This is pretty standard stuff and Windows API functions use it regularly.

bigbadbob

My latest method signature in C# is this:


        [DllImport("BigBadInt128.dll")]
        private static extern Int128 Int128Add(ref Int128 addend1, ref Int128 addend2, out Int128 result);


The first 2 parameters are "ref" Int128, which means each one is passed as a pointer to my struct.
The last parameter is "out" Int128. It is still a pointer, but it means that it is only a result, not an input. I don't need a buffer; my struct is 128 bits (2 QWORDs in size).


    [StructLayout(LayoutKind.Sequential)]
    public struct Int128
    {
        private Int64 loQWORD;
        private Int64 hiQWORD;
    }
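For comparison, here is a C view of the same thing (a sketch, not Bob's code). With LayoutKind.Sequential the struct is just two QWORDs, low first, and both `ref` and `out` arrive as plain pointers. This assumes a `void` Int128Add, which is how aw27 describes Bob's function later in the thread. The fields use `uint64_t` so the carry arithmetic is well defined in C, but the bits are the same as the C# Int64 fields.

```c
#include <stddef.h>
#include <stdint.h>

/* C mirror of the C# struct: sequential layout, low QWORD at offset 0,
   high QWORD at offset 8. */
typedef struct {
    uint64_t loQWORD;
    uint64_t hiQWORD;
} Int128;

_Static_assert(sizeof(Int128) == 16, "two QWORDs");
_Static_assert(offsetof(Int128, hiQWORD) == 8, "high QWORD second");

/* Hypothetical void version: "ref" and "out" both become pointers
   (RCX, RDX, R8 on x64 when the function returns void). */
void Int128Add(const Int128 *addend1, const Int128 *addend2, Int128 *result)
{
    uint64_t lo = addend1->loQWORD + addend2->loQWORD;
    uint64_t carry = lo < addend1->loQWORD;                        /* carry out of the low add */
    result->hiQWORD = addend1->hiQWORD + addend2->hiQWORD + carry;  /* adc */
    result->loQWORD = lo;
}
```

Because two's complement addition is the same for signed and unsigned values, the same code serves a signed Int128; only overflow detection would differ.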


As to Hutch's comment about the buffer, I was only responding to people saying that I don't know any other way.

By the way I knew the whole time that even the first add function worked.  I ran it with C# already.  I was just wondering if I did it right.
I was not sure if I had to have a FRAME pointer even if I don't use the stack.  The reason is I read some article, but I don't remember where it is.  Someone said always make a FRAME for exception unwinding.

I might eventually write a C++ program and call my DLL from that also.  I just don't usually write code in C++.  Not that I never have.  I think about 15 years ago I wrote some COM in C++.  But it has been so long ago that I most likely won't remember all of the details.

So I keep reading that we allocate 32 bytes of shadow space just in case we need to save our register arguments.
What are the locations if I want to use that space?

mov [rsp+8], rcx - ??? - I think that I saw this somewhere.
mov [rsp+16], rdx
mov [rsp+24], r8
mov [rsp+32], r9

I'm sorry if I'm wrong I'm only guessing.

hutch--

Bob,

Here is the reference on the calling convention that I use for the MASM64 SDK. There is a mountain of bullshit about how it works across the internet; I did this one the hard way (write, test, verify) and it works correctly.

The Win 64 Calling Convention, How Does It Work ?

Win 64 effectively only has one form of calling convention, and it is used on all of the Windows API functions. While it is more complicated than the STDCALL and C calling conventions in Win 32, it is also more flexible in that each argument passed to another function can be specified in any of 4 different data sizes: BYTE, WORD, DWORD and QWORD, being respectively 8 bit, 16 bit, 32 bit and 64 bit.

Whereas Win32 only used the stack with STDCALL and the C calling convention, Win 64 uses a combination of integer registers and stack locations to pass the arguments required by different procedures. The Win 64 calling convention also specifies that the stack pointer (RSP) is kept 16-byte aligned outside of a procedure's prologue, so it is aligned at every CALL and, on entry to the called procedure, is off by the 8-byte return address. This is done for performance with larger data types and because a number of instructions need aligned memory to work.
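The alignment rule can be turned into a little arithmetic. Below is a hypothetical C helper (`prolog_alloc` is not from any SDK) that computes the SUB RSP amount for a prologue, assuming the stack was 16-byte aligned before the CALL, so that on entry only the 8-byte return address sits below that boundary.

```c
#include <stddef.h>

/* Hypothetical helper: given the number of 8-byte PUSHes in a prologue and the
   number of bytes of locals/shadow space actually needed, return the SUB RSP
   amount that leaves RSP 16-byte aligned again. */
size_t prolog_alloc(size_t pushes, size_t bytes_needed)
{
    /* Bytes already below the caller's aligned RSP: return address + pushes. */
    size_t used = 8 + 8 * pushes;
    /* Round the total up to a multiple of 16, then remove what is already used. */
    size_t total = (used + bytes_needed + 15) / 16 * 16;
    return total - used;
}
```

For instance, four pushes plus 30h bytes of locals needs SUB RSP, 38h rather than 30h to come out aligned.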

Calling a procedure

The first four (4) arguments are written to RCX, RDX, R8 and R9 in any of the 4 data sizes supported by the calling convention, and any following arguments are written to a stack-relative location in memory without changing the stack pointer (RSP). Many procedures have 4 or fewer arguments and obtain the advantage of lower calling overhead by receiving them directly in the four specified registers.

When there are more than four arguments, arguments 5 and upwards are written to the stack. Here there is another consideration that becomes obvious at the receiving end of a procedure call: the first four locations on the stack are left empty so that the 4 registers can be stored there if necessary. These first four stack addresses, [rsp], [rsp+8], [rsp+16] and [rsp+24], are left empty. Arguments 5 and upwards are written to the RSP-relative address [rsp+32] and upwards, with the displacement increasing by 8 bytes for each argument.

A typical procedure call with 6 arguments will look like this.

mov rcx, arg1
mov rdx, arg2
mov r8, arg3
mov r9, arg4
mov QWORD PTR [rsp+32], arg5
mov QWORD PTR [rsp+40], arg6
call FunctionName

It is worth noting that with the stack arguments, if they are either a register or an immediate operand they are written directly to the RSP-relative stack address. If the argument is a memory operand, either LOCAL or GLOBAL, it is first loaded into a register and then the register is written to the stack address, as x86-64 processors do not support a direct memory-to-memory copy.

It will look like this.

mov rax, arg5
mov QWORD PTR [rsp+32], rax
mov rax, arg6
mov QWORD PTR [rsp+40], rax
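The caller-side offsets follow a simple pattern, sketched here in C with hypothetical helper names. Offsets are measured from RSP just before the CALL. Rounding the outgoing area to 16 bytes is one common choice that keeps it from disturbing alignment; the prologue still has to account for its pushes as well.

```c
#include <stddef.h>

/* Offset of argument n (1-based) from RSP just before the CALL.
   Arguments 1-4 travel in registers, but their shadow slots still occupy
   [rsp], [rsp+8], [rsp+16] and [rsp+24]. */
size_t caller_arg_offset(size_t n)
{
    return 8 * (n - 1);
}

/* Bytes of outgoing-argument area a call with nargs arguments needs:
   at least 32 bytes of shadow space, rounded up to a multiple of 16. */
size_t outgoing_area(size_t nargs)
{
    size_t slots = nargs < 4 ? 4 : nargs;
    return (8 * slots + 15) / 16 * 16;
}
```

With six arguments this gives [rsp+32] and [rsp+40] for arguments 5 and 6, as in the example, and a 48-byte outgoing area.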

The Procedure Being Called

Depending on the number of arguments being passed to the procedure that is being called, a simple procedure that does not call other procedures (a leaf procedure) does not need to create a stack frame and can use the 4 or fewer registers in the design of the procedure along with other available registers. When a procedure receives 5 or more arguments and requires LOCAL variables, it usually requires a stack frame, which makes the arguments passed on the stack RBP relative.

When a procedure with a stack frame is called, 8 bytes are stored on the stack for the return address and another 8 bytes are used when creating the stack frame (PUSH RBP). With MOV RBP, RSP directly after PUSH RBP, this shifts the location of the first 4 empty argument slots up by 16 bytes, so the first empty stack location is at address [rbp+16]. The four registers that hold the first four arguments are volatile, which means they can be overwritten by any following instructions, so in a normal high-level procedure that calls other procedures the correct solution is to copy the four registers into the four stack locations. The four empty locations are generally referred to as shadow space.

mov [rbp+16], rcx
mov [rbp+24], rdx
mov [rbp+32], r8
mov [rbp+40], r9

There is good reason to write the four registers to the RBP-relative addresses rather than to the variable names, as the addresses are fixed at 64 bit and you don't have to bother with different data sizes. A modern compiler automates this process; in assembler you need to preserve the 4 register arguments in the shadow space yourself so they are not overwritten.
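The callee-side numbers are the caller-side ones shifted by the return address and by PUSH RBP. Here is a small C sketch with hypothetical helper names, assuming MOV RBP, RSP comes directly after PUSH RBP.

```c
#include <stddef.h>

/* n is the 1-based argument number; slots 1-4 are the shadow space. */

/* At the first instruction of the procedure: [rsp] holds the return address. */
size_t entry_rsp_offset(size_t n)
{
    return 8 * n;              /* arg1 shadow at [rsp+8], arg5 at [rsp+40] */
}

/* After PUSH RBP / MOV RBP, RSP: the saved RBP adds one more 8-byte slot. */
size_t rbp_offset(size_t n)
{
    return 8 + 8 * n;          /* arg1 shadow at [rbp+16], arg5 at [rbp+48] */
}
```

At entry this gives [rsp+8] to [rsp+32] for the four shadow slots, which matches Bob's guess earlier in the thread, and [rbp+16] to [rbp+40] once the frame is built. In a prologue like UInt128Parse's, where RBP is set after several pushes and a SUB RSP, the RBP-relative offsets are larger.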

With a language that specifies an argument list at a procedure's entry with an example something like this,

MyFunction proc arg1:QWORD,arg2:QWORD,arg3:QWORD,arg4:QWORD,arg5:QWORD,arg6:QWORD

Once the four registers have been copied to the four RBP relative addresses (shadow space), you can use the argument names in the argument list in the normal manner when writing the procedure.

The notation used in the above examples is in the format of the 64 bit Microsoft assembler, ML64.EXE and has been developed and tested successfully in Windows 10 Professional. Compatibility testing has also been successfully performed on Win7 64 bit Ultimate and Win 8/8.1 64 bit.

aw27

I wrote a few programs in C#, possibly more complicated than any you have ever done.
I have even written an article about mixing C# and ASM in a single executable.
So, I know very well what C# is all about.

Quote
I was only responding to people saying that I don't know any other way

You have not. Everybody and their cat know that integer values up to 64-bit are returned in eax/rax. We were talking about the reason you used void functions when you had a return value.

aw27

@Hutch,

There is the part about float (real4) parameters and returning floats that you did not mention.

bigbadbob

Quote
You have not. Everybody and their cat know that integer values up to 64-bit are returned in eax/rax. We were talking about the reason you used void functions when you had a return value.

My return value was 128 bit, so I received a passed-in pointer for most functions. I probably misused RAX. The function was "void", meaning you can ignore RAX, though I don't know if that is a bad practice. I thought that RAX is volatile if the function is void. If you think that is a bad practice I won't use RAX for a void.

I used EAX to return a DWORD in my parse routine.  I used that as a success and failure code.  0 is for success.  Otherwise in C# I throw an exception.  Like OverflowException and FormatException.

I was also responding to Hutch.  I'm not sure if he called assembly from C#.  Not saying he did or did not.  Just I could not gauge it based on how he responded.  Sorry, too many people on the forum.  AW, I'm not trying to call myself smarter than you so there was really no reason to belittle me as a defense mechanism and say that your C# is more complicated than mine.  Maybe it is, but I really thought that stating that at this time was uncalled for.

aw27

Maybe I am speaking Chinese without knowing it.
Let me try again.

You are doing this:
void Int128Add(_m128* const input1, _m128* const input2, _m128* result )

while what you really want is this (or some variation on the same line):
_m128 result = Int128Add(_m128* const input1, _m128* const input2)

but you don't know yet how to do it.


hutch--

Jose,

> There is the part about float (real4) parameters and returning floats that you did not mention.

You are correct here, and I also did not address a number of other data types that can be returned within a 64-bit data size. In some contexts a returned register does the job: if you return the x87 floating point register fp0 (ST(0)), it can handle 32-, 64- and 80-bit data, whereas a 64-bit return value only covers 32- and 64-bit.

sinsi

Doesn't the ABI tell you? If you need a result that doesn't fit into RAX (integer) or XMM0 (float), you pass the address of the space for the return value in RCX and bump the other args along by one register.

https://docs.microsoft.com/en-au/cpp/build/return-values-cpp


aw27

- RCX, RDX, R8, R9 are used for integer and pointer arguments, in that order from left to right.
- XMM0, 1, 2 and 3 are used for floating point arguments.
- Additional arguments are passed on the stack, left to right at increasing addresses (argument 5 at [rsp+32] at the call).
- Parameters less than 64 bits long are not zero extended; the high bits contain garbage.
- It is the caller's responsibility to allocate 32 bytes of "shadow space" (for storing RCX, RDX, R8 and R9 if needed) before calling the function.
- It is the caller's responsibility to clean the stack after the call.
- Integer return values (similar to x86) are returned in RAX if 64 bits or less.
- Floating point return values are returned in XMM0.
- Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller.

It is all there, or almost all.
For example, if you use XMM0 to pass a float you cannot use RCX to pass a value; if you use XMM1 you can't use RDX, etc., because each of the first four argument positions gets exactly one register.
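A C model may make what sinsi and aw27 describe concrete. On x64 Windows a struct is returned in RAX only if it is 1, 2, 4 or 8 bytes, so a 16-byte Int128 goes through a hidden pointer (intrinsic vector types such as `__m128` are returned in XMM0 instead). The two functions below are hypothetical and compute the same sum; the second spells out the hidden argument by hand. C itself cannot show which register carries what, so the register names appear only in the comments.

```c
#include <stdint.h>

/* Two QWORDs, low first, same layout as the C# struct in the thread. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Int128;

/* Written the way aw27 suggests: the 16-byte sum is the return value. */
Int128 Int128AddByValue(const Int128 *a, const Int128 *b)
{
    Int128 r;
    r.lo = a->lo + b->lo;
    r.hi = a->hi + b->hi + (r.lo < a->lo);   /* add, then adc */
    return r;
}

/* Hand-written model of the lowered call: the caller passes the return
   buffer as a hidden first argument (RCX), a and b shift to RDX and R8,
   and the same pointer comes back in RAX. */
Int128 *Int128AddHidden(Int128 *ret, const Int128 *a, const Int128 *b)
{
    uint64_t lo = a->lo + b->lo;
    uint64_t hi = a->hi + b->hi + (lo < a->lo);  /* computed first, so ret may alias a or b */
    ret->lo = lo;
    ret->hi = hi;
    return ret;
}
```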

hutch--

Bob,

> I'm not sure if he called assembly from C#.

You can be sure I have never called assembler from C#, as I never have and never will use it. What I was suggesting was that if you simply pass a pointer to memory, a structure or a variable to an assembler procedure, you can write whatever result you like to that address, and at the caller end you will have the result you produced in the assembler procedure.

aw27

In other words:
"Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller."

It is all here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly
This is all you need to know to become a great 64-bit ASM programmer.


bigbadbob

Quote from: hutch-- on June 15, 2018, 03:52:50 PM
Bob,

> I'm not sure if he called assembly from C#.

You can be sure I have never called assembler from C#, as I never have and never will use it. What I was suggesting was that if you simply pass a pointer to memory, a structure or a variable to an assembler procedure, you can write whatever result you like to that address, and at the caller end you will have the result you produced in the assembler procedure.

Sorry, I thought you meant a BYTE or CHAR buffer. I was using pointers the whole time.

Quote from: AW on June 15, 2018, 03:59:30 PM
In other words:
"Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller."

It is all here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly
This is all you need to know to become a great 64-bit ASM programmer.



Thank you AW. I read that and forgot. Not sure if it was the Intel page. So RCX is my pointer.