The MASM Forum

General => The Workshop => Topic started by: bigbadbob on June 13, 2018, 02:04:43 PM

Title: Int128 in assembler
Post by: bigbadbob on June 13, 2018, 02:04:43 PM
I'm a C# programmer, but have always understood assembly.  I just don't code in it very often.

This is my first at least semi-working 64-bit dll written in assembler.
I actually called it from C# using PInvoke to test it, so at a minimum it is compliant with the ABI, and it should work from assembler, C/C++, and .NET using PInvoke.
Not really sure if it is fully compliant, but it did not crash C#.

Sample Add method:

_text SEGMENT 

public Int128Add

; Int128Add
; ---------
; RCX - QWORD - PTR to Int128 (input1)
; RDX - QWORD - PTR to Int128 (input2)
; R8  - QWORD - PTR to Int128 (result)
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header
; void Int128Add(_m128* const input1, _m128* const input2, _m128* result )
;----------
; source remains unchanged
; performs *R8 = *RCX + *RDX in 128 bit mode
Int128Add PROC FRAME 
   push rbp 
.pushreg rbp 
   sub rsp, 010h 
.allocstack 010h 
   mov rbp, rsp 
.setframe rbp, 0 
.endprolog 
   mov rax, QWORD PTR [rcx]
   mov r10, QWORD PTR [rcx+8]
   add rax, QWORD PTR [rdx]
   adc r10, QWORD PTR [rdx+8]
   mov QWORD PTR [r8], rax
   mov QWORD PTR [r8+8], r10

   ; epilog 
   add rsp, 010h 
   pop rbp 
   ret 
Int128Add ENDP 
_text ENDS 
END


And that method is working, but I think that I allocated too much stack.

I think that add and subtract are the same for Int128 (signed) and UInt128 (unsigned).
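The point about signed and unsigned sharing one add/subtract routine can be checked with a minimal C sketch. It assumes a hypothetical two-limb struct `u128` (low QWORD first, like the OWORD the DLL uses); the names `u128_add` and `u128_sub` are illustrative and not part of the project. The carry and borrow are computed by hand, standing in for what ADC and SBB take from the carry flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-limb layout: lo is the QWORD at offset 0, hi at offset 8. */
typedef struct { uint64_t lo, hi; } u128;

/* *result = *a + *b, modulo 2^128. The same bits come out whether the
   operands are read as signed (two's complement) or unsigned. */
void u128_add(const u128 *a, const u128 *b, u128 *result)
{
    assert(a && b && result);
    uint64_t lo = a->lo + b->lo;       /* ADD: wraps modulo 2^64 */
    uint64_t carry = lo < a->lo;       /* the carry that ADC would consume */
    result->hi = a->hi + b->hi + carry;
    result->lo = lo;
}

/* *result = *a - *b, modulo 2^128. */
void u128_sub(const u128 *a, const u128 *b, u128 *result)
{
    assert(a && b && result);
    uint64_t borrow = a->lo < b->lo;   /* the borrow that SBB would consume */
    uint64_t lo = a->lo - b->lo;
    result->hi = a->hi - b->hi - borrow;
    result->lo = lo;
}
```

Adding 1 to the all-ones pattern gives zero, which reads as -1 + 1 = 0 for a signed value and as a wrap past 2^128 for an unsigned one; only overflow detection differs between the two interpretations.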

Also untested, multiply:

_text SEGMENT 

public UInt128Mul

; UInt128Mul
; ---------
; RCX - QWORD - PTR to Int128 (input1)
; RDX - QWORD - PTR to Int128 (input2)
; R8  - QWORD - PTR to Int256 (result)
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header
; void UInt128Mul(_int128* const input1, _int128* const input2, _int256* result )
;----------
; source remains unchanged
; performs *R8 = *RCX * *RDX in 128 bit mode resulting in 256 bits
UInt128Mul PROC FRAME 
   push rbp 
.pushreg rbp
   push rdx
.pushreg rdx 
   sub rsp, 050h 
.allocstack 050h 
   mov rbp, rsp 
.setframe rbp, 0 
.endprolog 
;  _int256 temp1 = 20h bytes rbp,    rbp+8,  rbp+16, rbp+24
;  _int256 temp2 = 20h bytes rbp+32, rbp+40, rbp+48, rbp+56
;  _int128 input2 shadow = 16 bytes (10h) rbp+64, rbp+72
   
   ; temp1 = temp2 = 0
   xor rax, rax
   mov QWORD PTR [rbp], rax
   mov QWORD PTR [rbp+8], rax
   mov QWORD PTR [rbp+16], rax
   mov QWORD PTR [rbp+24], rax
   mov QWORD PTR [rbp+32], rax
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rax
   mov QWORD PTR [rbp+56], rax

   ; input2 shadow = *input2
   mov rax, QWORD PTR [rdx]
   mov QWORD PTR [rbp+64], rax
   mov rax, QWORD PTR [rdx+8]
   mov QWORD PTR [rbp+72], rax

   ; lo1 * lo2 -> temp1 bits 0..127
   mov rax, QWORD PTR [rcx]
   mov rdx, QWORD PTR [rbp+64]
   mul rdx
   mov QWORD PTR [rbp], rax
   mov QWORD PTR [rbp+8], rdx

   ; hi1 * lo2 -> temp2 bits 64..191 (cross product sits one QWORD up)
   mov rax, QWORD PTR [rcx+8]
   mov rdx, QWORD PTR [rbp+64]
   mul rdx
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rdx
   call add_temp

   ; lo1 * hi2 -> temp2 bits 64..191
   mov rax, QWORD PTR [rcx]
   mov rdx, QWORD PTR [rbp+72]
   mul rdx
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rdx
   call add_temp

   ; hi1 * hi2 -> temp2 bits 128..255, lower half of temp2 cleared
   mov rax, QWORD PTR [rcx+8]
   mov rdx, QWORD PTR [rbp+72]
   mul rdx
   mov QWORD PTR [rbp+48], rax
   mov QWORD PTR [rbp+56], rdx
   xor rax, rax
   mov QWORD PTR [rbp+32], rax
   mov QWORD PTR [rbp+40], rax
   call add_temp

   mov rax, QWORD PTR [rbp]
   mov rdx, QWORD PTR [rbp+8]
   mov r10, QWORD PTR [rbp+16]
   mov r11, QWORD PTR [rbp+24]
   mov QWORD PTR [r8], rax
   mov QWORD PTR [r8+8], rdx
   mov QWORD PTR [r8+16], r10
   mov QWORD PTR [r8+24], r11

   ; epilog 
   add rsp, 050h
   pop rdx 
   pop rbp 
   ret 

add_temp:
   mov rax, QWORD PTR [rbp]
   add rax, QWORD PTR [rbp+32]
   mov QWORD PTR [rbp], rax

   mov rax, QWORD PTR [rbp+8]
   adc rax, QWORD PTR [rbp+40]
   mov QWORD PTR [rbp+8], rax

   mov rax, QWORD PTR [rbp+16]
   adc rax, QWORD PTR [rbp+48]
   mov QWORD PTR [rbp+16], rax

   mov rax, QWORD PTR [rbp+24]
   adc rax, QWORD PTR [rbp+56]
   mov QWORD PTR [rbp+24], rax
   
   ret
UInt128Mul ENDP 
_text ENDS 
END
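A reference model in C is handy for checking an untested multiply like this one. The following is only a sketch under stated assumptions: numbers are little-endian arrays of 64-bit limbs, and `mul64`, `add_at` and `u128_mul_256` are hypothetical helper names, not part of the DLL. The key detail is that the partial product a[i]*b[j] has to be added in at limb i+j.

```c
#include <assert.h>
#include <stdint.h>

/* 64 x 64 -> 128 multiply built from 32-bit halves; stands in for MUL. */
void mul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Add the 128-bit value (lo, hi) into r starting at limb pos (0..2),
   rippling the carry upward like an ADD/ADC chain. */
void add_at(uint64_t r[4], int pos, uint64_t lo, uint64_t hi)
{
    uint64_t add[2] = { lo, hi };
    uint64_t carry = 0;
    for (int i = pos; i < 4; i++) {
        uint64_t term = (i - pos < 2) ? add[i - pos] : 0;
        uint64_t sum = r[i] + term;
        uint64_t c1 = sum < term;      /* carry out of r[i] + term */
        sum += carry;
        uint64_t c2 = sum < carry;     /* carry out of adding the old carry */
        r[i] = sum;
        carry = c1 | c2;
    }
}

/* r (4 limbs) = a (2 limbs) * b (2 limbs). r must not alias a or b. */
void u128_mul_256(const uint64_t a[2], const uint64_t b[2], uint64_t r[4])
{
    assert(a && b && r);
    uint64_t lo, hi;
    r[0] = r[1] = r[2] = r[3] = 0;
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            mul64(a[i], b[j], &lo, &hi);
            add_at(r, i + j, lo, hi);  /* a[i]*b[j] has weight 2^(64*(i+j)) */
        }
    }
}
```

A quick sanity check is 2^64 * 5: the only non-zero limb of the result should be limb 1, which catches a cross product stored at the wrong offset.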


I'm also looking for a good divide 128 algorithm.
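One simple starting point for a 128-bit divide is restoring (shift-and-subtract) division, which produces one quotient bit per iteration. The C sketch below is not tuned for speed and is only meant as a reference to test an assembler version against; the struct `u128` and the names `ge128`, `sub128` and `u128_divmod` are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;   /* hypothetical two-limb layout */

/* Returns 1 when a >= b (unsigned compare, high limb first). */
int ge128(u128 a, u128 b)
{
    return a.hi != b.hi ? a.hi > b.hi : a.lo >= b.lo;
}

/* a - b modulo 2^128. */
u128 sub128(u128 a, u128 b)
{
    u128 r = { a.lo - b.lo, a.hi - b.hi - (a.lo < b.lo) };
    return r;
}

/* Unsigned n / d and n % d. Returns 0 on success, 1 on division by zero. */
int u128_divmod(u128 n, u128 d, u128 *q, u128 *rem)
{
    assert(q && rem);
    if (d.lo == 0 && d.hi == 0)
        return 1;
    u128 quot = { 0, 0 }, r = { 0, 0 };
    for (int i = 127; i >= 0; i--) {
        /* Bring down bit i of the dividend into the running remainder. */
        uint64_t nbit = (i >= 64 ? n.hi >> (i - 64) : n.lo >> i) & 1;
        /* r holds at most 127 consumed bits here, so the shift cannot overflow. */
        r.hi = (r.hi << 1) | (r.lo >> 63);
        r.lo = (r.lo << 1) | nbit;
        if (ge128(r, d)) {
            r = sub128(r, d);
            if (i >= 64) quot.hi |= 1ULL << (i - 64);
            else         quot.lo |= 1ULL << i;
        }
    }
    *q = quot;
    *rem = r;
    return 0;
}
```

The loop always runs 128 times, so a production routine would likely special-case divisors that fit in 64 bits and use the hardware DIV there.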
Title: Re: Int128 in assembler
Post by: aw27 on June 13, 2018, 04:54:04 PM
There is nothing we can call a _int128 data type (as far as I know), it is likely a structure you forgot to define.

In ASM we have owords but the ABI does not consider them a "returnable" data type.

Your coding style is bad, nobody really uses exception frames in ASM without knowing exactly what they are doing. This is not your case, you are not even sure you need 16 bytes of stack.


I did not look at your 2nd function.
Title: Re: Int128 in assembler
Post by: hutch-- on June 13, 2018, 05:44:44 PM
Bob,

If you want to pass and return a 128 bit sized piece of data, you would normally pass the address of that data. You could of course use one or more SSE registers both in and out but AW is correct here, according to the Win64 ABI, you can only pass up to a 64 bit value as normal arguments as the ABI is designed that way.
Title: Re: Int128 in assembler
Post by: nidud on June 13, 2018, 08:18:44 PM
deleted
Title: Re: Int128 in assembler
Post by: Mikl__ on June 13, 2018, 11:52:17 PM
Hi, bigbadbob!
Take a look here (http://x86asm.net/articles/working-with-big-numbers-using-x86-instructions/index.html)
Title: Re: Int128 in assembler
Post by: bigbadbob on June 14, 2018, 12:51:23 PM
Thank you for your responses.
My goal is to learn enough that I can properly explain/teach it to someone else.  There should be enough comments that I could learn the ABI all over again if I forget.
I've never written assembly language code for pay.  I would consider myself a beginner, but I learn quickly.

Just so you know, I'm writing a DLL that is hosted on GitHub.
https://github.com/robertkolski/BigBadInt128/blob/master/src/Int128Add.asm
As far as I know, nobody looks at my project.  I thought that joining this forum would be a way to learn.

Now back to the assembly:
I learned somewhere that if you change the stack you should have an exception FRAME pointer in case a memory access such as mov r10, [rcx] fails.  For instance, if someone passed rcx = 0 to your function, the system would use the FRAME information to unwind.  Later I read that you only need to set up a frame pointer if you actually use the stack within the procedure.  I was not sure if I could call this function a leaf function, but now I think that I can.

Please see the revised Add function (I use it as a method)

_text SEGMENT 

public Int128Add

; Int128Add
; -------------------------------------
; Signed and unsigned add.
; RCX - PTR to OWORD (input1)
; RDX - PTR to OWORD (input2)
; R8  - PTR to OWORD (result)
; R9  - unused
; -------------------------------------
; RAX volatile - but not used
; R10 volatile
; R11 volatile
;--------------------------------------
; C Header - pseudocode prototype
; void Int128Add(_int128* input1, _int128* input2, _int128* result )
; assume that I have a C compiler that supports _int128 or it is a struct
;--------------------------------------
; C# types (without full implementation here):
; public struct Int128
; {
;    private Int64 loQWORD;
;    private Int64 hiQWORD;
;    [DllImport("BigBadInt128.dll")]
;    private static extern void Int128Add(IntPtr addend1, IntPtr addend2, IntPtr result);
;    // public methods not shown
; }
; public struct UInt128
; {
;    private UInt64 loQWORD;
;    private UInt64 hiQWORD;
;    [DllImport("BigBadInt128.dll")]
;    private static extern void Int128Add(IntPtr addend1, IntPtr addend2, IntPtr result);
;    // public methods not shown
; }
;--------------------------------------
; input1 and input2 remain unchanged
; the contents of the OWORD result is modified
;--------------------------------------
; Don't need FRAME and PROLOG because
; this is a leaf function
; it does not need any stack space
; -------------------------------------
Int128Add PROC
   mov r10, QWORD PTR [rcx]
   mov r11, QWORD PTR [rcx+8]
   add r10, QWORD PTR [rdx]
   adc r11, QWORD PTR [rdx+8]
   mov QWORD PTR [r8], r10
   mov QWORD PTR [r8+8], r11
   ret 
Int128Add ENDP 
_text ENDS 
END
Title: Re: Int128 in assembler
Post by: aw27 on June 14, 2018, 03:36:40 PM
Quote
I would consider myself a beginner, but I learn quickly
You don't need to put it that way, people see immediately who you are.
In addition you are a C# programmer which is not a good starting point.

ASM programmers usually do not spend time with FRAME, particularly if they are beginners, for a few reasons I could detail here but will not. However, for an introduction to the subject you can read this article: 64-bit Structured Exception Handling (SEH) in ASM (https://www.codeproject.com/Articles/1212332/bit-Structured-Exception-Handling-SEH-in-ASM)




Title: Re: Int128 in assembler
Post by: nidud on June 14, 2018, 09:18:56 PM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 14, 2018, 09:38:35 PM
@nidud

Quote
or in this case RDX:RAX.

Is this the Windows ABI? I don't think so.
Title: Re: Int128 in assembler
Post by: nidud on June 14, 2018, 11:12:14 PM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 12:39:28 AM
Quote
I'm a C# programmer
He was very clear at that.
Title: Re: Int128 in assembler
Post by: hutch-- on June 15, 2018, 01:13:35 AM
It still sounds like there are 2 choices, either use 64 bit pointers to the 128 bit variable OR pass the data in SSE registers. I guess you could pass that data in AVX registers as well. As far as I know VS does not support registers directly so the choices collapse down to passing 64 bit pointers to the 128 bit data like the ABI for Win 64 supports.
Title: Re: Int128 in assembler
Post by: nidud on June 15, 2018, 01:20:04 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 03:01:05 AM
According to the Windows ABI he has to use pointers to the variable.
The Windows ABI is a convention to be used by all programs, written in any programming language. We know that it is possible to use special calling conventions, or derivatives, when we know beforehand what our ASM module will be linked with.
This is really not the case here, C# does not allow special calling conventions.

Quote
None of the functions presented by Bob use any return type other than void and all arguments are passed as pointers
We know why he did that, because he does not know yet how to do it otherwise.
Title: Re: Int128 in assembler
Post by: nidud on June 15, 2018, 04:48:24 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 04:53:18 AM
Quote from: nidud on June 15, 2018, 04:48:24 AM
So it's not because he follows the Windows ABI but because he does not know yet how to do it otherwise.
Nah, it is a little "secret" of the Windows ABI how to do it otherwise.  :biggrin:
Title: Re: Int128 in assembler
Post by: bigbadbob on June 15, 2018, 12:18:41 PM
The PInvoke feature in C# has to conform to the 64 bit ABI when running in 64 bit mode.  The reason is that it is used to call the Windows API or any other native DLL.  DLLs that contain 64 bit assembly or standard C/C++ are native.  In 32 bit mode I think that it uses STDCALL.

I'm able to tell a .Net DLL to be compiled to work only for x64.  The default is 'Any CPU' which is independent of architecture.

I do not know if PInvoke will accept an XMM0 return value.  I'm aware that RAX is the 64 bit return value register, but cannot be used since I would have a 128 bit return value.

Not all my functions are void, but I did not show this one yet.

_text SEGMENT 

public UInt128Parse

; UInt128Parse
; ---------
; RCX - QWORD - PTR to String (input)
; RDX - QWORD - PTR to Int128 (result)
; R8  - unused
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header
; DWORD UInt128Parse(wchar* lpwszString, _int128* result )
;----------
; input lpwszString remains unchanged
; result the pointer's contents are updated
; ---------
; returns 0 success
; returns 1 overflow
; returns 2 invalid format
; ---------
;
UInt128Parse PROC FRAME 
   push rbp 
.pushreg rbp
   push rdx 
.pushreg rdx
   push rbx
.pushreg rbx
   push r12
.pushreg r12
   sub rsp, 030h 
.allocstack 030h 
   mov rbp, rsp 
.setframe rbp, 0 
.endprolog 
   mov r12, rdx

   xor r11, r11 ; keep r11 zero
   mov QWORD PTR [rbp+8], r11
   mov QWORD PTR [rbp+16], r11
   mov QWORD PTR [rbp+24], r11
   
   xor r10, r10 ; move to the beginning of the string
   jmp start_loop

keep_looping:
   mov rbx, 10
   mov rax, [rbp+8]
   mul rbx
   mov QWORD PTR [rbp+32], rdx
   mov QWORD PTR [rbp+8], rax
   mov rax, QWORD PTR [rbp+16]
   mul rbx

   add rax, QWORD PTR [rbp+32]
   adc rdx, r11                ; add with carry and zero
   mov QWORD PTR [rbp+16], rax
   mov QWORD PTR [rbp+24], rdx
   cmp rdx, r11
   jne overflow

start_loop:
   mov dx, WORD PTR [rcx+r10] ; r10 is the string offset
   cmp dx, '0'
   jb  invalid_character

   cmp dx, '9'
   ja  invalid_character

   sub dx, '0'
   xor rax, rax
   mov ax, dx

   add QWORD PTR [rbp+8], rax
   adc QWORD PTR [rbp+16], r11 ; add with carry and zero
   adc QWORD PTR [rbp+24], r11 ; add with carry and zero

   cmp QWORD PTR [rbp+24], r11 ; compare to zero
   jne overflow
   
   add r10, 2
   mov dx, WORD PTR [rcx+r10]
   cmp dx, 0
   je  done   
   jmp keep_looping


done:
   mov rax, QWORD PTR [rbp+8]
   mov rdx, QWORD PTR [rbp+16]
   mov QWORD PTR [r12], rax
   mov QWORD PTR [r12+8], rdx

   xor eax, eax
   jmp method_exit

overflow:
   mov eax, 1
   jmp method_exit

invalid_character:
   mov eax, 2

method_exit:

   ; epilog 
   add rsp, 030h
   pop r12
   pop rbx
   pop rdx
   pop rbp 
   ret 
UInt128Parse ENDP 
_text ENDS 
END
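For cross-checking the parse routine, here is a minimal C sketch of the same multiply-by-10-and-add loop with the same return codes (0 success, 1 overflow, 2 invalid format). Everything named here (`u128`, `mul10_add`, `u128_parse`, `ascii_to_utf16`) is hypothetical and only for illustration; `uint16_t` units stand in for the 16-bit characters the assembly reads, and the order in which overflow and invalid-character errors are detected may differ slightly from the assembly.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;   /* hypothetical two-limb layout */

enum { PARSE_OK = 0, PARSE_OVERFLOW = 1, PARSE_INVALID = 2 };

/* v = v * 10 + digit; returns 1 if the result no longer fits in 128 bits. */
int mul10_add(u128 *v, unsigned digit)
{
    /* Work in 32-bit pieces so every partial product fits in 64 bits. */
    uint32_t limb[4] = {
        (uint32_t)v->lo, (uint32_t)(v->lo >> 32),
        (uint32_t)v->hi, (uint32_t)(v->hi >> 32)
    };
    uint64_t carry = digit;
    for (int i = 0; i < 4; i++) {
        uint64_t t = (uint64_t)limb[i] * 10u + carry;
        limb[i] = (uint32_t)t;
        carry = t >> 32;
    }
    v->lo = ((uint64_t)limb[1] << 32) | limb[0];
    v->hi = ((uint64_t)limb[3] << 32) | limb[2];
    return carry != 0;
}

/* Parses a NUL-terminated string of 16-bit characters holding decimal digits. */
int u128_parse(const uint16_t *s, u128 *result)
{
    assert(s && result);
    u128 v = { 0, 0 };
    if (*s == 0)
        return PARSE_INVALID;          /* empty string is rejected */
    for (; *s != 0; s++) {
        if (*s < '0' || *s > '9')
            return PARSE_INVALID;
        if (mul10_add(&v, (unsigned)(*s - '0')))
            return PARSE_OVERFLOW;
    }
    *result = v;                       /* result is written only on success */
    return PARSE_OK;
}

/* Convenience for trying it out: widen an ASCII literal to 16-bit units. */
void ascii_to_utf16(const char *src, uint16_t *dst)
{
    do {
        *dst++ = (uint16_t)(unsigned char)*src;
    } while (*src++ != '\0');
}
```

Useful boundary inputs are 340282366920938463463374607431768211455 (2^128 - 1, the largest value that fits) and the same number plus one, which should report overflow.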
Title: Re: Int128 in assembler
Post by: hutch-- on June 15, 2018, 01:38:08 PM
Bob,

If the return value is a problem, what about passing the address of a buffer in the arguments that is any size you like, write the results to that buffer in your assembler proc then back in your calling language just read the buffer ? This is pretty standard stuff and Windows API functions use it regularly.
Title: Re: Int128 in assembler
Post by: bigbadbob on June 15, 2018, 02:23:43 PM
My latest method signature in C# is this:


        [DllImport("BigBadInt128.dll")]
        private static extern void Int128Add(ref Int128 addend1, ref Int128 addend2, out Int128 result);


The first 2 parameters are "ref" Int128, so that means pointer to my struct.
The last parameter is "out" Int128.  It is still a pointer, but it means that it is only a result, not an input.  I don't need a buffer; my struct is 128 bits (2 QWORDs in size).


    [StructLayout(LayoutKind.Sequential)]
    public struct Int128
    {
        private Int64 loQWORD;
        private Int64 hiQWORD;
    }


As to Hutch's comment about the buffer, I was only responding to people saying that I don't know any other way.

By the way I knew the whole time that even the first add function worked.  I ran it with C# already.  I was just wondering if I did it right.
I was not sure if I had to have a FRAME pointer even if I don't use the stack.  The reason is I read some article, but I don't remember where it is.  Someone said always make a FRAME for exception unwinding.

I might eventually write a C++ program and call my DLL from that also.  I just don't usually write code in C++.  Not that I never have.  I think about 15 years ago I wrote some COM in C++.  But it has been so long ago that I most likely won't remember all of the details.

So I keep reading that we allocate shadow space of 32 bytes just in case we erase our registers.
What are the locations if I want to use that space?

mov [rsp+8], rcx - ??? - I think that I saw this somewhere.
mov [rsp+16], rdx
mov [rsp+24], r8
mov [rsp+32], r9

I'm sorry if I'm wrong; I'm only guessing.
Title: Re: Int128 in assembler
Post by: hutch-- on June 15, 2018, 02:31:23 PM
Bob,

Here is the reference on the calling convention that I use for the MASM64 SDK. There is a mountain of bullshit about how it works across the internet, I did this one the hard way, write, test, verify and it works correctly.

The Win 64 Calling Convention, How Does It Work ?

Win 64 effectively has only one calling convention, and it is used by all of the Windows API functions. While it is more complicated than the STDCALL and C calling conventions in Win 32, it is also more flexible in that each argument passed to another function can be specified in any of 4 different data sizes: BYTE, WORD, DWORD and QWORD, being respectively 8 bit, 16 bit, 32 bit and 64 bit.

Whereas Win32 only used the stack with STDCALL and the C calling convention, Win 64 uses a combination of integer registers and stack locations to pass the number of arguments required for different procedures. In the specification of the Win 64 calling convention, the stack pointer (RSP) must be 16 byte aligned whenever a call is made, which is done for performance reasons with larger data types and for a number of instructions that need aligned memory to operate.

Calling a procedure

The first four (4) arguments are written to the RCX RDX R8 and R9 in any of the 4 data sizes supported by the calling convention and any following arguments are written to a stack relative location in memory without changing the stack pointer (RSP). Many procedures have 4 or less arguments and obtain the advantage of lower calling overhead by receiving the 4 or less arguments directly in the four specified registers.

When there are more than four arguments, arguments 5 and upwards are written to the stack, and here there is another consideration that will become obvious at the receiving end of a procedure call: the first four locations on the stack are left empty so that the 4 registers can be stored at those locations if necessary. As seen by the caller just before the call, the first four stack addresses are [rsp], [rsp+8], [rsp+16] and [rsp+24], which are left empty. Arguments 5 and upwards are written to the RSP relative address [rsp+32] and upwards, with an increase in displacement of 8 bytes for each argument.

A typical procedure call with 6 arguments will look like this.

mov rcx, arg1
mov rdx, arg2
mov r8, arg3
mov r9, arg4
mov QWORD PTR [rsp+32], arg5
mov QWORD PTR [rsp+40], arg6
call FunctionName

It is worth noting that with the stack arguments, if they are either a register or an immediate operand they are written directly to the RSP relative stack address. If the argument is a memory operand, either LOCAL or GLOBAL, it will be written to a register first, then the register is written to the stack address, as x86-64 processors do not support direct memory to memory copy.

It will look like this.

mov rax, arg5
mov QWORD PTR [rsp+32], rax
mov rax, arg6
mov QWORD PTR [rsp+40], rax

The Procedure Being Called

Depending on the number of arguments being passed to the procedure that is being called, a simple procedure that does not call other procedures (a leaf procedure) does not need to create a stack frame and can use the 4 or fewer argument registers in the design of the procedure along with other available registers. When a procedure receives 5 or more arguments and requires LOCAL variables, it usually requires a stack frame, which makes the arguments passed on the stack RBP relative.

When a procedure with a stack frame is called, 8 bytes are stored on the stack for the return address and another 8 bytes are used when creating the stack frame. This shifts the location of the first 4 empty arguments up by 16 bytes so that the first empty stack location is located at address [rbp+16]. The four registers that hold the first four arguments are volatile registers which means they can be overwritten by any following mnemonics so in a normal high level procedure that will call multiple procedures, the correct solution is to copy the four registers into the four stack locations. The four empty locations are generally referred to as shadow space.

mov [rbp+16], rcx
mov [rbp+24], rdx
mov [rbp+32], r8
mov [rbp+40], r9

There is good reason to write the four registers to the RBP relative addresses rather than to the variable names as the addresses are fixed at 64 bit and you don't have to bother with any different data sizes. A modern compiler will automate this process and an assembler needs to preserve the 4 register arguments in the (shadow space) so they are not overwritten.

With a language that specifies an argument list at a procedure's entry with an example something like this,

MyFunction proc arg1:QWORD,arg2:QWORD,arg3:QWORD,arg4:QWORD,arg5:QWORD,arg6:QWORD

Once the four registers have been copied to the four RBP relative addresses (shadow space), you can use the argument names in the argument list in the normal manner when writing the procedure.

The notation used in the above examples is in the format of the 64 bit Microsoft assembler, ML64.EXE and has been developed and tested successfully in Windows 10 Professional. Compatibility testing has also been successfully performed on Win7 64 bit Ultimate and Win 8/8.1 64 bit.
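The three sets of offsets that appear in this thread ([rsp+0..24] in the caller, [rsp+8..32] on entry to the callee, [rbp+16..40] once a frame is set up) describe the same four slots seen from different points. A small C sketch of the arithmetic, with hypothetical function names and assuming the frame is created with a plain "push rbp / mov rbp, rsp":

```c
#include <assert.h>

/* slot 0..3 selects the home (shadow) slot for RCX, RDX, R8, R9. */

/* Caller's view just before CALL: the slots start at [rsp+0]. */
int home_offset_caller(int slot)
{
    assert(slot >= 0 && slot < 4);
    return 8 * slot;
}

/* Callee's view on entry: CALL has pushed an 8-byte return address. */
int home_offset_callee_entry(int slot)
{
    return 8 + home_offset_caller(slot);
}

/* Callee's view after "push rbp / mov rbp, rsp": the saved RBP adds 8 more,
   and the offsets are now relative to RBP. */
int home_offset_rbp_frame(int slot)
{
    return 16 + home_offset_caller(slot);
}

/* Stack argument n (1-based, n >= 5) as written by the caller, RSP relative. */
int stack_arg_offset_caller(int n)
{
    assert(n >= 5);
    return 32 + 8 * (n - 5);
}
```

So a callee that has not touched RSP yet can save RCX at [rsp+8], while a callee with the frame described above uses [rbp+16] for the same slot.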
Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 02:37:51 PM
I wrote a few programs in C#, possibly more complicated than any you have ever done.
I have even written an article about mixing C# and ASM in a single executable.
So, I know very well what C# is all about.

Quote
I was only responding to people saying that I don't know any other way

You have not. Everybody and their cat know that integer values up to 64-bit are returned in eax/rax. We were talking about the reason you used void functions when you had a return value.
Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 02:41:22 PM
@Hutch,

There is the part about float (real4) parameters and returning floats that you did not mention.
Title: Re: Int128 in assembler
Post by: bigbadbob on June 15, 2018, 03:00:32 PM
Quote
You have not. Everybody and their cat know that integer values up to 64-bit are returned in eax/rax. We were talking about the reason you used void functions when you had a return value.

My return value was 128 bit, so I received a passed-in pointer for most functions.  I probably misused RAX.  The function was "void", meaning you can ignore RAX, though I don't know if that is a bad practice.  I thought that RAX is volatile if the function is void.  If you think that is a bad practice I won't use RAX for a void.

I used EAX to return a DWORD in my parse routine.  I used that as a success and failure code.  0 is for success.  Otherwise in C# I throw an exception.  Like OverflowException and FormatException.

I was also responding to Hutch.  I'm not sure if he called assembly from C#.  Not saying he did or did not.  Just I could not gauge it based on how he responded.  Sorry, too many people on the forum.  AW, I'm not trying to call myself smarter than you so there was really no reason to belittle me as a defense mechanism and say that your C# is more complicated than mine.  Maybe it is, but I really thought that stating that at this time was uncalled for.
Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 03:23:37 PM
Maybe I am talking Chinese without knowing.
Let me try again.

You are doing this:
void Int128Add(_m128* const input1, _m128* const input2, _m128* result )

while what you really want is this (or some variation on the same line):
_m128 result = Int128Add(_m128* const input1, _m128* const input2)

but you don't know yet how to do it.

Title: Re: Int128 in assembler
Post by: hutch-- on June 15, 2018, 03:34:54 PM
Jose,

> There is the part about float (real4) parameters and returning floats that you did not mention.

You are correct here, and I also did not address a number of other data types that can be returned within a 64 bit data size. In some contexts a returned register does the job: if you return the fp0 floating point register it can handle 32, 64 and 80 bit data, whereas with a 64 bit return value you will only do 32 and 64 bit.
Title: Re: Int128 in assembler
Post by: sinsi on June 15, 2018, 03:37:59 PM
Doesn't the ABI tell you? If you need a result that doesn't fit into RAX (integer) or XMM0 (float) you pass the address of the return type in RCX and bump the other args.

https://docs.microsoft.com/en-au/cpp/build/return-values-cpp

Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 03:44:23 PM
• RCX, RDX, R8, R9 are used for integer and pointer arguments in that order left to right.
• XMM0, 1, 2, and 3 are used for floating point arguments.
• Additional arguments are pushed on the stack left to right.
• Parameters less than 64 bits long are not zero extended; the high bits contain garbage.
• It is the caller's responsibility to allocate 32 bytes of "shadow space" (for storing RCX, RDX, R8, and R9 if needed) before calling the function.
• It is the caller's responsibility to clean the stack after the call.
• Integer return values (similar to x86) are returned in RAX if 64 bits or less.
• Floating point return values are returned in XMM0.
• Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller.

It is all there, or almost all.
For example, if you use XMM0 to pass a float you cannot use RCX to pass a value; if you use XMM1 you can't use RDX, etc.
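The XMM0/RCX remark follows from the registers being assigned by argument position rather than by counting each kind separately. A minimal C sketch of that rule, using a hypothetical helper `arg_location` and ignoring varargs and by-reference structs:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INT, ARG_FLOAT } arg_kind;

/* Returns where the argument at 0-based position pos is passed.
   Position decides the slot; the kind only picks the register file. */
const char *arg_location(arg_kind kind, size_t pos)
{
    static const char *const int_regs[4]   = { "RCX", "RDX", "R8", "R9" };
    static const char *const float_regs[4] = { "XMM0", "XMM1", "XMM2", "XMM3" };
    assert(kind == ARG_INT || kind == ARG_FLOAT);
    if (pos >= 4)
        return "stack";                /* fifth argument onward */
    return kind == ARG_FLOAT ? float_regs[pos] : int_regs[pos];
}
```

For a hypothetical prototype f(int a, double b, int c), this gives RCX, XMM1 and R8: slot 1 is taken by XMM1, so RDX is skipped rather than handed to c.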
Title: Re: Int128 in assembler
Post by: hutch-- on June 15, 2018, 03:52:50 PM
Bob,

> I'm not sure if he called assembly from C#.

You can be sure I have never called assembler from C# as I never have and never will use it. What I was suggesting was that if you simply pass a pointer to memory, a structure or a variable to an assembler procedure, you can write whatever result you like to that address and at the caller end you will have the result you produced in the assembler procedure.
Title: Re: Int128 in assembler
Post by: aw27 on June 15, 2018, 03:59:30 PM
In other words:
"Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller."

It is all here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly
This is all you need to know to become a great 64-bit ASM programmer.

Title: Re: Int128 in assembler
Post by: bigbadbob on June 15, 2018, 04:07:16 PM
Quote from: hutch-- on June 15, 2018, 03:52:50 PM
Bob,

> I'm not sure if he called assembly from C#.

You can be sure I have never called assembler from C# as I never have and never will use it. What I was suggesting was that if you simply pass a pointer to memory, a structure or a variable to an assembler procedure, you can write whatever result you like to that address and at the caller end you will have the result you produced in the assembler procedure.

Sorry I thought you meant BYTE or CHAR buffer.  I was using pointers the whole time.

Quote from: AW on June 15, 2018, 03:59:30 PM
In other words:
"Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called, Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller.
"

It is all here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly
This is all you need to know to become a great 64-bit ASM programmer.



Thank you AW.  I read that and forgot.  Not sure if it was the Intel page.  So RCX is my pointer.
Title: Re: Int128 in assembler
Post by: bigbadbob on June 15, 2018, 04:26:54 PM
So I did this in C#:


    [StructLayout(LayoutKind.Sequential)]
    public struct Int128
    {
        private Int64 loQWORD;
        private Int64 hiQWORD;

        [DllImport("BigBadInt128.dll")]
        private static extern Int128 Int128Add(ref Int128 addend1, ref Int128 addend2);

        public static Int128 operator+ (Int128 addend1, Int128 addend2)
        {
            return Int128Add(ref addend1, ref addend2);
        }
    }


And this in 64-bit assembly:

Int128Add PROC
   mov r10, QWORD PTR [rdx]
   mov r11, QWORD PTR [rdx+8]
   add r10, QWORD PTR [r8]
   adc r11, QWORD PTR [r8+8]
   mov QWORD PTR [rcx], r10
   mov QWORD PTR [rcx+8], r11
   mov rax, rcx
   ret 
Int128Add ENDP 
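In C terms, the assembly above behaves like the "lowered" form of a function that returns a 16-byte struct by value. The sketch below is only an illustration of that lowering: the struct and the name `Int128Add_lowered` are hypothetical, and unsigned limbs are used because the bit pattern of the sum is the same for signed and unsigned operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the C# struct: low QWORD first. */
typedef struct { uint64_t lo, hi; } Int128;

/* Source level:  Int128 Int128Add(const Int128 *a, const Int128 *b);
   After lowering, the caller supplies the return buffer as a hidden first
   argument: ret arrives in RCX, a in RDX, b in R8, and the function hands
   ret back in RAX. */
Int128 *Int128Add_lowered(Int128 *ret, const Int128 *a, const Int128 *b)
{
    assert(ret && a && b);
    uint64_t lo = a->lo + b->lo;
    uint64_t carry = lo < a->lo;       /* what ADC picks up from the carry flag */
    uint64_t hi = a->hi + b->hi + carry;
    ret->lo = lo;
    ret->hi = hi;
    return ret;                        /* corresponds to "mov rax, rcx" */
}
```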
Title: Re: Int128 in assembler
Post by: nidud on June 15, 2018, 11:55:18 PM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 16, 2018, 12:26:41 AM
Come on nidud, you should know this  :redface:

typedef struct
{
   __int64 num1;
   __int64 num2;
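}_m128t;

/* Implemented in the ASM module; prototype assumed from the call below. */
_m128t Int128Add(const _m128t* in1, const _m128t* in2);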

int main()
{
   _m128t in1 = { 1,1 };
   _m128t in2 = { 2,2 };

   _m128t myNum = Int128Add(&in1, &in2);

    return 0;
}

   _m128t myNum = Int128Add(&in1, &in2);
000000013F124D50  lea         r8,[in2] 
000000013F124D54  lea         rdx,[in1] 
000000013F124D58  lea         rcx,[rbp+188h] 
000000013F124D5F  call        Int128Add (013F121375h) 


Look at how RCX is used, lol.
Title: Re: Int128 in assembler
Post by: nidud on June 16, 2018, 01:31:32 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 16, 2018, 01:41:49 AM
I am talking about Windows ABI since the beginning of this thread and you are talking about a feature of the GCC compiler (a more sophisticated compiler according to you  :icon_eek:)
Title: Re: Int128 in assembler
Post by: nidud on June 16, 2018, 01:47:00 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 16, 2018, 01:52:05 AM
I know you will end up winning any discussion due to fatigue of the opponent.

I will keep only this for the record:
Quote
Unless he uses a more sophisticated compiler this will not be possible given the maximum returned integer value is 64-bit.
:bgrin:
Title: Re: Int128 in assembler
Post by: nidud on June 16, 2018, 01:58:34 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 16, 2018, 02:07:10 AM
Quote from: nidud on June 16, 2018, 01:58:34 AM
As for your usual (you're all idiots because you don't know what I just learned five minutes ago from Google) babble, that's just entertainment.
You are a bad loser, you should recognize the nonsense you have been saying. This kind of ignorance is not acceptable from someone that is developing an assembler supposed to be compliant with the Windows ABI. Or is it compliant with the GCC ABI?
Bob has been saying since the first message that he is a C# developer, and you won't stop pushing your GCC ideas.
Title: Re: Int128 in assembler
Post by: bigbadbob on June 16, 2018, 11:50:28 AM
I would like to learn the Windows ABI way of doing it.  I'll be using C# from Windows.  I might also use it from C++, but it will be the Visual Studio compiler.
Title: Re: Int128 in assembler
Post by: hutch-- on June 16, 2018, 12:49:26 PM
Bob,

Don't take any notice of the "kiddies", it's just a form of sport.  :P

Kiddies,

Behave yourself ! ;)
Title: Re: Int128 in assembler
Post by: nidud on June 16, 2018, 11:11:34 PM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 16, 2018, 11:29:44 PM
Quote
What I've been saying is that using VS you're limited to 64-bit size arguments and return values so you have to use pointers.
What's wrong with using pointers? We all know that XMM registers are bad at integer operations, and they don't do any 128-bit arithmetic! They are just carriers  :badgrin:, so you will have to offload their content to do something useful. People usually forget that! It is the same with the VectorCall convention: people forget that the XMM registers have to be loaded, and this takes CPU cycles. Don't get carried away by buzzwords; experiment and test by yourself.
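The "just carriers" point can be illustrated with a plain C model of a packed 64-bit add such as PADDQ, where each lane wraps on its own and no carry crosses into the other lane. This is a model only: `xmm_model`, `paddq_model` and `add128_model` are hypothetical names and no SSE intrinsics are used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an XMM register viewed as two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } xmm_model;

/* Models a packed quadword add: lanes are independent, carries are dropped. */
xmm_model paddq_model(xmm_model a, xmm_model b)
{
    xmm_model r;
    r.lane[0] = a.lane[0] + b.lane[0];
    r.lane[1] = a.lane[1] + b.lane[1];
    return r;
}

/* A true 128-bit add has to move the carry from lane 0 to lane 1 by hand,
   which is what ADD/ADC on general purpose registers does directly. */
xmm_model add128_model(xmm_model a, xmm_model b)
{
    xmm_model r = paddq_model(a, b);
    uint64_t carry = r.lane[0] < a.lane[0];
    r.lane[1] += carry;
    return r;
}
```

With a = 2^64 - 1 and b = 1, the packed model yields zero in both lanes while the 128-bit add yields 2^64, which is why the contents usually end up back in general purpose registers for this kind of arithmetic.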

Quote
bla, bla bla, ...

No comments. You insist that C# has 128-bit data types. It has NOT.

Quote
As already mentioned, C# is not bound to a specific ABI. It's created to be used on different computer platforms without being rewritten for specific architectures.
Welcome to planet Earth, please land now that there is no fog. Things here are quite different.  :biggrin:
Title: Re: Int128 in assembler
Post by: hutch-- on June 17, 2018, 01:57:49 AM
 :biggrin:

> What I've been saying is that using VS you're limited to 64-bit size arguments and return values so you have to use pointers.

VS does have a technique for writing 128 and 256 bit data types, it's called MASM. That is why Microsoft supplies MASM in both the old 32 bit version and the 64 bit version. Now you can be sure that neither will run on a Motorola MAC, MIPS, PDP8 or Lunix but both can produce binaries for the OS they are supplied for, Windows.  :P
Title: Re: Int128 in assembler
Post by: nidud on June 17, 2018, 02:57:05 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 17, 2018, 04:22:34 AM
@nidud

Quote
Registers:
    mov rax,rcx ; in:  rdx:rcx, r8:r9
    add rax,r8  ; out: rdx:rax
    adc rdx,r9
Pointers:
    mov r9,[rcx]        ; in:  [rcx], [rdx]
    mov r10,[rcx+8]     ; out: [r8]
    add r9,[rdx]
    adc r10,[rdx+8]
    mov [r8],r9
    mov [r8+8],r10

You are mixing apples with oranges in a frustrated attempt to come up with something true.
I am talking about what is wrong with pointers when calling functions, and you start a compulsive addition manipulation inside a function.

Quote
The Linux implementation of the Quadmath actually uses both. There may be some advantages in doing that but I failed to see any:
If we need a high precision (you define the precision you want) math library we should use MPIR (GMP fork), not bloated limited precision math DLLs like the quadmath. MPIR is in large part written in ASM. I have already posted how to use MPIR from ASM.
Tell me, what is quadmath good for? Can we use it for a large number factorization for example?

Quote
This is simply not true so I think it's safe to just write this off as pure ignorance on your part.
Is this an argument? Why don't you produce another compulsive code demo on this one?

Quote
You see, your assertion that it's impossible to write assembler code which is faster and more compact than optimized C++ is simply not true (I assume that was the conclusion in the article you wrote).
I never said it was impossible, but am waiting patiently for someone to beat the compiler on the same routines. This will be more interesting than going there and downvoting an article, as some people do once in a while, that has deserved the prize of article of the month.

Title: Re: Int128 in assembler
Post by: nidud on June 17, 2018, 07:54:08 AM
deleted
Title: Re: Int128 in assembler
Post by: bigbadbob on June 17, 2018, 09:48:00 AM
I found something annoying about using FASTCALL (Win64 ABI) when using PROC arguments.


Test1 PROC arg1:QWORD,arg2:QWORD,arg3:QWORD,arg4:QWORD,arg5:QWORD,arg6:QWORD
   ; save shadow space
   mov [rbp+16], rcx
   mov [rbp+24], rdx
   mov [rbp+32], r8
   mov [rbp+40], r9

   ; ready to add the arguments up
   mov rax, arg1
   add rax, arg2
   add rax, arg3
   add rax, arg4
   add rax, arg5
   add rax, arg6
   ret 
Test1 ENDP 


And this one:

Test2 PROC arg1:QWORD,arg2:QWORD,arg3:QWORD,arg4:QWORD,arg5:QWORD,arg6:QWORD
   ; save shadow space
   mov arg1, rcx
   mov arg2, rdx
   mov arg3, r8
   mov arg4, r9

   ; ready to add the arguments up
   mov rax, arg1
   add rax, arg2
   add rax, arg3
   add rax, arg4
   add rax, arg5
   add rax, arg6
   ret 
Test2 ENDP 


I made up an example of storing the shadow space. 
Both of those work the same.
I wrote this based on what I thought Hutch said.  I would think that always using the name of the argument is better.  Though I know it might be different if the arguments were not all QWORD.

I have this example, though...

Test3 PROC arg1:WORD,arg2:WORD,arg3:WORD,arg4:WORD,arg5:WORD,arg6:WORD
   ; save shadow space
   mov QWORD PTR arg1, rcx
   mov QWORD PTR arg2, rdx
   mov QWORD PTR arg3, r8
   mov QWORD PTR arg4, r9

   ; ready to add the arguments up
   mov ax, arg1
   add ax, arg2
   add ax, arg3
   add ax, arg4
   add ax, arg5
   add ax, arg6
   ret 
Test3 ENDP 


What is your opinion on this?  I recast it to QWORD so that the actual type of the argument does not matter.
My concern is that if I added "USES RBX", the stack layout would change.  I think that using the named arguments is better.
If a programmer is going to bother using the automatic stack manipulation for named arguments, then the programmer
should never refer to the stack locations manually, because they might change if the procedure is rewritten and a new
register is added to the list in USES.

Now I would also like to reveal something...
I've used MASM32 before and I know all about PROC, PROTO, invoke, etc.  (It might have been 10-15 years ago).
But I've read the MASM64 help and noticed that ML64 does not have invoke, but someone made a macro.

I was also wondering why no one has yet written an invokefast macro that would not waste time storing the first 4 arguments on the stack.
The callee should use the stack space if needed, but the caller shouldn't.
It should be a black box.  The caller says here is a shadow space, I will not fill it in.  Use it if you want, but I'll ignore it.

The callee if it uses a whole bunch of registers might use the shadow space.  But the caller should not even know.
What I'm asking about is a macro that follows the FASTCALL convention exactly.

So I would like to tell you that I expected the named arguments to be on the stack, that is why I did not use this notation.
I started coding after I read to use the registers, so that is what I've been doing.

Time to rewrite my UInt128Mul... I'll be using the shadow space correctly now.
Title: Re: Int128 in assembler
Post by: hutch-- on June 17, 2018, 02:30:26 PM
Bob,

The calling techniques in the MASM64 stuff so far do procedure calls in a couple of ways. There is a general "procedure_call" type macro that is called with a number of wrappers, "invoke" included, and a pure register call macro which will accept up to 4 arguments. Both conform to the Win 64 ABI and handle both ends of the market.

The invoke style macro writes the first 4 args to shadow space as well as the registers and the rest directly to the correct stack locations. It also supports quoted text. The direct register call macros only write up to the first 4 registers so you can do both and in a very efficient manner.

ML64 is an unconfigured assembler and needs to use the pre-processor to configure it. Whereas with ML.EXE in Win 32 it was easy enough to write pure mnemonic code, the Win 64 ABI is a lot more complex than stack based Win 32, and while it can be done purely manually, it's not for the faint of heart.

With stackframe support, you can handle any of the normal high level API and procedure calls but for pure algorithms you write procedures with no stack frame and call them using up to the first 4 registers.
Title: Re: Int128 in assembler
Post by: bigbadbob on June 17, 2018, 03:40:23 PM
Quote from: hutch-- on June 17, 2018, 02:30:26 PM
The invoke style macro writes the first 4 args to shadow space as well as the registers and the rest directly to the correct stack locations. It also supports quoted text. The direct register call macros only write up to the first 4 registers so you can do both and in a very efficient manner.

I read the macro source code and I don't think that invoke writes to the shadow space, it actually appears to work as I expected.

Please note that I installed MASM32 and MASM64 on my current computer in 2017.

Title: Re: Int128 in assembler
Post by: aw27 on June 17, 2018, 10:28:25 PM
@nidud

Quote
Well, lets take this from the start

Sure, we can retry as many times as you need, hopefully you will understand in the end.

Quote
GCC extends this to 64-bit.

    mov rax,rcx ; in:  rdx:rcx, r8:r9
    add rax,r8  ; out: rdx:rax
    adc rdx,r9
    ret

Ah, so this is the part you find really cool.
The problem is that nothing useful can be done with the return value in RDX:RAX.
You will still need to call functions with pointer arguments, or are you going to pass arguments in rdx:rax, or maybe also in r11:r10 and r13:r12? You don't clarify this part (ah, from __int128 foo(__int128 a, __int128 b) it appears that you are thinking about some 128-bit registers that don't exist yet on this planet).
If neither is true you will need to save RDX:RAX to memory, like it or not. This means that RDX:RAX is only a useless carrier and consumer of CPU cycles of the return value from callee to caller.


Title: Re: Int128 in assembler
Post by: hutch-- on June 18, 2018, 12:11:42 AM
Bob,

This works fine, the macro that "invoke" calls definitely writes to shadow space.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    invoke testproc,150,300,450,600

    waitkey

    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

testproc proc arg1:QWORD,arg2:QWORD,arg3:QWORD,arg4:QWORD

  ; clear the 4 registers

    xor rcx, rcx
    xor rdx, rdx
    xor r8, r8
    xor r9, r9

  ; display the values from shadow space

    conout str$(arg1),lf
    conout str$(arg2),lf
    conout str$(arg3),lf
    conout str$(arg4),lf

    ret

testproc endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end
Title: Re: Int128 in assembler
Post by: jj2007 on June 18, 2018, 12:49:50 AM
Quote from: bigbadbob on June 17, 2018, 03:40:23 PM
I read the macro source code and I don't think that invoke writes to the shadow space

Macros are powerful, you can ask them explicitly to write to shadow space; with jinvoke, just add a <cb> after proc:

include \Masm32\MasmBasic\Res\JBasic.inc        ; see 64-bit assembly with RichMasm (http://masm32.com/board/index.php?topic=5314.msg59884#msg59884)
.code
testproc proc <cb> arg1:QWORD,arg2:QWORD,arg3:QWORD,arg4:QWORD
    xor rcx, rcx                        ; clear the 4 registers
    xor rdx, rdx
    xor r8, r8
    xor r9, r9
    PrintLine Str$("a1: %i\na2: %i\na3: %i\na4: %i", arg1, arg2, arg3, arg4)  ; display the values from shadow space
    ret
testproc endp

Init           ; OPT_64 1      ; put 0 for 32 bit, 1 for 64 bit assembly
  jinvoke testproc, 100, 200, 300, 400
  Inkey Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
EndOfCode


Output:
100
200
300
400
This code was assembled with ml64 in 64-bit format

Builds & runs also with UAsm, of course. The <cb> behaviour is needed for Windows callback functions, such as WndProc. Without the <cb>, the macro saves the four instructions needed to write to shadow space, but the callee must know what to do with the four registers 8)
Title: Re: Int128 in assembler
Post by: nidud on June 18, 2018, 03:34:20 AM
deleted
Title: Re: Int128 in assembler
Post by: bigbadbob on June 18, 2018, 06:57:03 AM
Let's move further ABI/FASTCALL discussion to http://masm32.com/board/index.php?topic=7222.0

nidud, I noticed that this code does not properly multiply the generic case of OWORD multiplied by OWORD.  You said that it was inline so it probably works for a special case.
Quote
These functions are used for the REAL16 (quadmath) implementation so most of them are inline. The mul function goes something like this:


        .if !rdx && !r11
            mul     r10
            xor     r10,r10
        .else
            mul     r10
            mov     rbx,rdx
            mov     rdi,rax
            mov     rax,rcx
            mul     r11
            mov     r11,rdx
            xchg    r10,rax
            mov     rdx,rcx
            mul     rdx
            add     rbx,rax
            adc     r10,rdx
            adc     r11,0


There should be 4 multiplies in the else part.
rdx:rax * r11:r10 = rax * r10 (QWORD0 and QWORD1) + rdx * r10 (QWORD1 and QWORD2) + rax * r11 (QWORD1 and QWORD2) + rdx * r11 (QWORD2 and QWORD3)
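
For readers who follow C more easily than register shuffling, the four partial products can be sketched like this. It is only an illustration in portable C, not the code from either post: the types u128 and u256 and the helpers mul64, add_at and mul128 are names made up for this sketch, and the 64x64 multiply is built from 32-bit halves so that no compiler-specific 128-bit type is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for this sketch; limbs are least significant first. */
typedef struct { uint64_t lo, hi; } u128;
typedef struct { uint64_t q[4]; } u256;

/* 64x64 -> 128 multiply from 32-bit halves (what a single MUL does in hardware). */
static u128 mul64(uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column: cannot overflow 64 bits, each term is below 2^32. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    u128 r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}

/* Add partial product p into r starting at limb i, rippling the carry upward. */
static void add_at(u256 *r, int i, u128 p)
{
    uint64_t term[2] = { p.lo, p.hi };
    uint64_t carry = 0;
    assert(i >= 0 && i <= 2);
    for (int k = 0; i + k < 4; k++) {
        uint64_t t = (k < 2) ? term[k] : 0;
        uint64_t sum = r->q[i + k] + t;
        uint64_t c1 = sum < t;        /* carry out of the limb addition */
        sum += carry;
        uint64_t c2 = sum < carry;    /* carry out of adding the old carry */
        r->q[i + k] = sum;
        carry = c1 | c2;              /* the two can never both be set */
    }
}

/* Schoolbook 128x128 -> 256: four 64x64 products placed at limbs 0, 1, 1 and 2. */
static u256 mul128(u128 a, u128 b)
{
    u256 r = { { 0, 0, 0, 0 } };
    add_at(&r, 0, mul64(a.lo, b.lo));   /* QWORD0 and QWORD1 */
    add_at(&r, 1, mul64(a.hi, b.lo));   /* QWORD1 and QWORD2 */
    add_at(&r, 1, mul64(a.lo, b.hi));   /* QWORD1 and QWORD2 */
    add_at(&r, 2, mul64(a.hi, b.hi));   /* QWORD2 and QWORD3 */
    return r;
}
```

Squaring the largest 128-bit value (both limbs all ones) should give the limbs 1, 0, 0xFFFFFFFFFFFFFFFE, 0xFFFFFFFFFFFFFFFF from least to most significant, which makes a handy cross-check for an assembler version.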

Title: Re: Int128 in assembler
Post by: nidud on June 18, 2018, 07:46:51 AM
deleted
Title: Re: Int128 in assembler
Post by: nidud on June 18, 2018, 09:10:02 AM
deleted
Title: Re: Int128 in assembler
Post by: bigbadbob on June 18, 2018, 10:04:24 AM
Quote from: nidud on June 18, 2018, 09:10:02 AM
:biggrin:

There actually are 4 multiplies in there so it's more or less the same code as above using different regs.

I accidentally did not scroll and then quoted incorrectly.  You are correct, the first post was correct.
:t
Title: Re: Int128 in assembler
Post by: hutch-- on June 18, 2018, 10:26:58 AM
Guys,

I moved this topic as it is way too complicated for learners.
Title: Re: Int128 in assembler
Post by: bigbadbob on June 18, 2018, 02:00:44 PM
Hi nidud,

Here is my version, heavily commented.

_text SEGMENT 

public UInt128Mul

UInt128 STRUCT
    loQWORD QWORD ?
    hiQWORD QWORD ?
UInt128 ENDS

UInt256 STRUCT
    myQWORD0 QWORD ?
    myQWORD1 QWORD ?
    myQWORD2 QWORD ?
    myQWORD3 QWORD ?
UInt256 ENDS

; UInt128Mul
; ---------
; RCX - QWORD - PTR to UInt256 (result)
; RDX - QWORD - PTR to UInt128 (input1)
; R8  - QWORD - PTR to UInt128 (input2)
; R9  - volatile, not used for parameter passing
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header - variant 1
; UInt256 UInt128Mul(UInt128* const input1, UInt128* const input2);
; C Header - variant 2
; void UInt128Mul(UInt256* result, UInt128* const input1, UInt128* const input2);
;----------
; source remains unchanged
; performs *RCX = *RDX * *R8 (128 bit inputs, 256 bit result)
UInt128Mul PROC result:PTR UInt256, input1:PTR UInt128, input2:PTR UInt128
   ; no need to save shadow space yet
   cmp (UInt128 PTR [rdx]).hiQWORD, 0
   jne long_math
   cmp (UInt128 PTR [r8]).hiQWORD, 0
   jne long_math

   mov rax, (UInt128 PTR [rdx]).loQWORD
   mov r10, (UInt128 PTR [r8]).loQWORD
   mul r10
   mov (UInt256 PTR [rcx]).myQWORD0, rax
   mov (UInt256 PTR [rcx]).myQWORD1, rdx
   mov (UInt256 PTR [rcx]).myQWORD2, 0
   mov (UInt256 PTR [rcx]).myQWORD3, 0
   mov rax, rcx
   ret

long_math:
   ; save shadow space
   mov result, rcx ; this saved shadow parameter is actually used
   mov input1, rdx ; saved but not used
   mov input2, r8  ; saved but not used

   push r12
   push r13
   push r14
   push r15
   xor r14, r14 ; make r14 and r15 zero because they will start as carries
   xor r15, r15
   
   mov rax, (UInt128 PTR [rdx]).loQWORD ; rdx is a pointer to input1
   mov r10, (UInt128 PTR [r8]).loQWORD  ; r8 is a pointer to input2
   mov rcx, (UInt128 PTR [rdx]).hiQWORD
   mov r11, (UInt128 PTR [r8]).hiQWORD
   mov r8, rax
   mul r10         ; input1.loQWORD * input2.loQWORD ==> rdx : rax
   mov r12, rax    ; this is the result.myQWORD0 (final result)
   mov r13, rdx    ; this is the result.myQWORD1 (temp)
   mov rax, r8   
   mul r11         ; input1.loQWORD * input2.hiQWORD ==> rdx : rax
   add r13, rax    ; update result.myQWORD1 (still temp) by adding
   adc r14, rdx    ; result.myQWORD2 (temp) add with carry
   mov rax, rcx
   mul r10         ; input1.hiQWORD * input2.loQWORD ==> rdx : rax
   add r13, rax    ; update result.myQWORD1 (final result) by adding
   adc r14, rdx    ; update result.myQWORD2 (temp) by adding with carry
   adc r15, 0      ; begin using result.myQWORD3 (temp) in case of carry
   mov rax, rcx
   mul r11         ; input1.hiQWORD * input2.hiQWORD ==> rdx : rax
   add r14, rax    ; update result.myQWORD2 (final result) by adding
   adc r15, rdx    ; update result.myQWORD3 (final result) by adding with carry
   mov rax, result ; load result pointer into rax to begin storing in memory
   mov (UInt256 PTR [rax]).myQWORD0, r12
   mov (UInt256 PTR [rax]).myQWORD1, r13
   mov (UInt256 PTR [rax]).myQWORD2, r14
   mov (UInt256 PTR [rax]).myQWORD3, r15
   pop r15
   pop r14
   pop r13
   pop r12
   ret
UInt128Mul ENDP 
_text ENDS 
END
Title: Re: Int128 in assembler
Post by: aw27 on June 18, 2018, 06:08:57 PM
Quote from: nidud on June 18, 2018, 03:34:20 AM
Nevertheless you can't do the RDX:RAX thing in VS (if that was your plan) as already explained, so this is strictly assembler.
I know, it is something that can eventually be explored only in Assembler, not within either the Windows ABI or the System V ABI.
However, I cannot visualize a good way to explore it in an ASM-only application. Don't be afraid to post a solution if you have it.
Title: Re: Int128 in assembler
Post by: nidud on June 19, 2018, 12:28:00 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 19, 2018, 05:01:10 AM
It is not a routine that proves what you want. You need to make a function that calls that routine and then calls the same or a similar routine in order to produce something useful that can be printed - in other words make a f*g application, stop bluffing. When you try that you will see you are wasting CPU cycles with those maneuvers.
Title: Re: Int128 in assembler
Post by: hutch-- on June 19, 2018, 02:56:00 PM
What fascinates me with this discussion is the level of FUD involved. In Windows you have a published ABI that handles BYTE to QWORD; above that (SSE, AVX) you use pointers to larger data sizes OR you directly load SSE or AVX registers and then call the procedure. Now I have no doubt that you can do things in different ways if you have code at the receiving end that will handle it, but it's hard to beat a single pointer when the alternative is to have to reassemble the pieces from weird techniques back into a usable data type.

I have no doubt it's character building and may even be amusing, but it's not an improvement over the published ABI. Now an alternative is to create a structure if you have to pass a variety of different sizes to the same proc. Ensure the data in the struct is aligned correctly, biggest first and dropping down to the smaller sizes, then pass a single structure pointer, and you have probably hit the big time in terms of the most efficient technique to call a procedure with variable sized arguments.
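
The "biggest first" struct layout can be sketched in C. This is an illustration under stated assumptions, not anything taken from the MASM64 SDK: CallArgs, CallArgsMixed and sum_args are hypothetical names, and the remark about padding in the mixed layout assumes a typical x64 compiler where a 64-bit integer is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block, biggest members first: no gaps between members. */
typedef struct {
    uint64_t q[2];   /* 128-bit value as two QWORDs, offset 0  */
    uint64_t count;  /* QWORD, offset 16 */
    uint32_t flags;  /* DWORD, offset 24 */
    uint16_t mode;   /* WORD,  offset 28 */
    uint8_t  tag;    /* BYTE,  offset 30 */
} CallArgs;

/* Same members in arbitrary order: the compiler may have to pad between them. */
typedef struct {
    uint8_t  tag;
    uint64_t count;
    uint16_t mode;
    uint64_t q[2];
    uint32_t flags;
} CallArgsMixed;

/* Compile-time checks of the layout claimed in the comments above. */
static_assert(offsetof(CallArgs, count) == 16, "count follows the two QWORDs");
static_assert(offsetof(CallArgs, tag) == 30, "sorted layout has no internal gaps");
static_assert(sizeof(CallArgs) <= sizeof(CallArgsMixed), "sorting never costs space");

/* The callee receives one pointer, whatever the mix of member sizes. */
static uint64_t sum_args(const CallArgs *a)
{
    return a->q[0] + a->q[1] + a->count + a->flags + a->mode + a->tag;
}
```

Under the Win64 convention a call such as sum_args(&args) needs only RCX for the pointer, however many fields the block carries.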
Title: Re: Int128 in assembler
Post by: aw27 on June 19, 2018, 05:28:37 PM
Quote from: hutch-- on June 19, 2018, 02:56:00 PM
What fascinates me with this discussion is the level of fud involved,

Nothing forbids us from inventing our own ABI and using it inside our ASM-only application - practically the only restriction is to keep aligned what needs to be aligned, to prevent an exception. But in almost every case there is little to no advantage in inventing a new ABI (this includes VectorCall, which can be advantageous only with specially tailor-made routines).

Large data will need to be passed with pointers, like it or not. And return values will continue to be limited by the size of registers or returned through pointers. Now comes @nidud saying that we can return values in 2 registers instead of one (he got the idea from the 32-bit way of returning a 64-bit value). It looks like an appealing idea if we ignore how we are going to deal with the data in 2 registers on the calling end.
@nidud says "easy, see that we can multiply and come up with a 256 bit value in 4 registers!" (I am sure we can also come up with a 512 bit value in 8 registers).

What else can we do out of that mess of data spread across multiple registers?
If we need to call a function we will have to move data around to new registers (which may end up being a "musical chairs" game) or save it in memory (something we were trying to avoid at all costs in the first place).
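
To make the register-versus-pointer argument concrete from the C side, here is a small sketch with a hypothetical u128 struct and two made-up helpers, dbl_val and dbl_ptr, that double a 128-bit value. The by-value form leaves the mechanics to the compiler's ABI: the Windows x64 convention returns a 16-byte struct through a hidden pointer supplied by the caller, while the System V AMD64 convention returns it in RDX:RAX. The source is the same either way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value as two QWORDs, low limb first. */
typedef struct { uint64_t lo, hi; } u128;
static_assert(sizeof(u128) == 16, "two QWORDs, no padding");

/* By value: how the 16 bytes travel back is decided by the ABI, not by this source. */
static u128 dbl_val(u128 a)
{
    u128 r;
    r.hi = (a.hi << 1) | (a.lo >> 63);   /* top bit of lo moves into hi */
    r.lo = a.lo << 1;
    return r;
}

/* Through pointers: the explicit form, same shape as Int128Add at the top of the thread. */
static void dbl_ptr(const u128 *a, u128 *r)
{
    uint64_t lo = a->lo;                 /* read first so that r may alias a */
    r->hi = (a->hi << 1) | (lo >> 63);
    r->lo = lo << 1;
}
```

Chaining works with both forms: dbl_val(dbl_val(x)) feeds one result straight into the next call, and dbl_ptr(&x, &x) called twice does the same in place. Whether either form wastes cycles is something to measure rather than assume.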
Title: Re: Int128 in assembler
Post by: nidud on June 20, 2018, 12:25:12 AM
deleted
Title: Re: Int128 in assembler
Post by: jj2007 on June 20, 2018, 02:28:51 AM
Quote from: nidud on June 20, 2018, 12:25:12 AM
Look, nobody has disputed the fact that you need a pointer to return a value larger than 64-bit in Windows 64 unless you use a vector to do so. In the case of __int128 and larger integer values (the subject of this thread) I recommended, based on experience, not to do so and use pointers instead.

In the case of __int128, returning the value in xmm0 would be a natural choice.
Title: Re: Int128 in assembler
Post by: aw27 on June 20, 2018, 02:43:55 AM
Quote from: jj2007 on June 20, 2018, 02:28:51 AM
In the case of __int128, returning the value in xmm0 would be a natural choice.
Complete nonsense.  :badgrin:
Title: Re: Int128 in assembler
Post by: aw27 on June 20, 2018, 02:53:17 AM
Quote
It was the big software corporations and chip manufacturers who invented things like vectorcall and System V to improve performance. You think they were all wrong, idiots even?
Sorry, I believed you were the socialist guy in here. I never thought that about big corporations, they are the best thing in this World  :t
Title: Re: Int128 in assembler
Post by: nidud on June 20, 2018, 03:10:56 AM
deleted
Title: Re: Int128 in assembler
Post by: jj2007 on June 20, 2018, 03:31:08 AM
Quote from: AW on June 20, 2018, 02:43:55 AM
Quote from: jj2007 on June 20, 2018, 02:28:51 AM
In the case of __int128, returning the value in xmm0 would be a natural choice.
Complete nonsense.  :badgrin:

Your remark is about as competent as saying that returning DWORD values in eax is "complete nonsense".
(ok ok, I know I shouldn't feed the troll, but he looks soooo hungry... poor beast :shock:)
Title: Re: Int128 in assembler
Post by: nidud on June 20, 2018, 03:33:13 AM
deleted
Title: Re: Int128 in assembler
Post by: aw27 on June 20, 2018, 03:40:43 AM
Quote from: nidud on June 20, 2018, 03:10:56 AM
Quote from: AW on June 20, 2018, 02:53:17 AM
I never thought that about big corporations, they are the best thing in this World  :t

:biggrin:

So who's the idiot then?
I guess you already got the confirmation from the mirror:
"Mirror, mirror, on the wall,
Who in this land is the most idiot of all?"

:t
Title: Re: Int128 in assembler
Post by: aw27 on June 20, 2018, 03:42:23 AM
Quote from: jj2007 on June 20, 2018, 03:31:08 AM
Quote from: AW on June 20, 2018, 02:43:55 AM
Quote from: jj2007 on June 20, 2018, 02:28:51 AM
In the case of __int128, returning the value in xmm0 would be a natural choice.
Complete nonsense.  :badgrin:

Your remark is about as competent as saying that returning DWORD values in eax is "complete nonsense".
(ok ok, I know I shouldn't feed the troll, but he looks soooo hungry... poor beast :shock:)

LOL, you are becoming less and less intelligent every day.  :badgrin:
Title: Re: Int128 in assembler
Post by: hutch-- on June 20, 2018, 04:02:57 AM
 :biggrin:

Maybe we should have moved this discussion to Romper Room.  :P
Title: Re: Int128 in assembler
Post by: daydreamer on June 20, 2018, 09:02:21 PM
(puts on flameproof asbestos suit)
For simple add128 and sub128, wouldn't it be easier to use macros than to worry about calling conventions?
The calling convention plus call/ret would add unnecessary overhead to what should be a fast add/sub.
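
The macro idea could look something like this when written in C rather than MASM. ADD128, SUB128 and the u128 struct are names invented for this sketch; each macro expands in place, so there is no call/ret and no calling convention involved. The carry and the borrow are recovered with an unsigned comparison, which plays the role of ADC/SBB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value as two QWORDs, low limb first. */
typedef struct { uint64_t lo, hi; } u128;
static_assert(sizeof(u128) == 16, "two QWORDs, no padding");

/* r = a + b. Arguments are evaluated more than once, so pass plain variables.
   The low sum wrapped around exactly when it is smaller than an operand. */
#define ADD128(r, a, b) do {                              \
        uint64_t lo_ = (a).lo + (b).lo;                   \
        (r).hi = (a).hi + (b).hi + (lo_ < (a).lo);        \
        (r).lo = lo_;                                     \
    } while (0)

/* r = a - b. A borrow is needed exactly when a.lo is smaller than b.lo. */
#define SUB128(r, a, b) do {                              \
        uint64_t lo_ = (a).lo - (b).lo;                   \
        (r).hi = (a).hi - (b).hi - ((a).lo < (b).lo);     \
        (r).lo = lo_;                                     \
    } while (0)
```

The high limb is written before the low limb, so the result may be the same variable as either input, as in SUB128(r, r, b). Since two's complement addition and subtraction are identical for signed and unsigned values, the same macros serve Int128 and UInt128.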

Title: Re: Int128 in assembler
Post by: nidud on June 20, 2018, 11:00:48 PM
deleted