Int128 in assembler

Started by bigbadbob, June 13, 2018, 02:04:43 PM


bigbadbob

I'm a C# programmer, but have always understood assembly.  I just don't code in it very often.

This is my first at least semi-working 64-bit dll written in assembler.
I actually called it from C# using PInvoke to test it, so at a minimum it is compliant with the ABI, and it should work from assembler, C/C++, and .NET using PInvoke.
Not really sure if it is fully compliant, but it did not crash C#.

Sample Add method:

_text SEGMENT 

public Int128Add

; Int128Add
; ---------
; RCX - QWORD - PTR to Int128 (input1)
; RDX - QWORD - PTR to Int128 (input2)
; R8  - QWORD - PTR to Int128 (result)
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header
; void Int128Add(_m128* const input1, _m128* const input2, _m128* result )
;----------
; source remains unchanged
; performs *R8 = *RCX + *RDX in 128 bit mode
Int128Add PROC FRAME 
   push rbp 
.pushreg rbp 
   sub rsp, 010h 
.allocstack 010h 
   mov rbp, rsp 
.setframe rbp, 0 
.endprolog 
   mov rax, QWORD PTR [rcx]
   mov r10, QWORD PTR [rcx+8]
   add rax, QWORD PTR [rdx]
   adc r10, QWORD PTR [rdx+8]
   mov QWORD PTR [r8], rax
   mov QWORD PTR [r8+8], r10

   ; epilog 
   add rsp, 010h 
   pop rbp 
   ret 
Int128Add ENDP 
_text ENDS 
END


And that method is working, but I think that I allocated too much stack.

I think that add and subtract are the same for Int128 (signed) and UInt128 (unsigned).
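The point about signed and unsigned add can be checked with a small C model. This is only a sketch: the `u128` struct and `u128_add` function are hypothetical names that mirror the two-QWORD layout and the add/adc pair in the listing above; they are not part of the DLL.

```c
#include <stdint.h>

/* Hypothetical model of the 128-bit value: two QWORDs, low half first. */
typedef struct { uint64_t lo, hi; } u128;

/* Same steps as the assembly: add the low halves, then add the high
   halves plus the carry out of the low add (which is what ADC does). */
static void u128_add(const u128 *a, const u128 *b, u128 *r)
{
    uint64_t lo = a->lo + b->lo;
    uint64_t carry = lo < a->lo;      /* 1 if the low add wrapped around */
    r->hi = a->hi + b->hi + carry;
    r->lo = lo;
}
```

Feeding it all ones plus 1 gives all zeros. Read as signed that is -1 + 1 = 0, and read as unsigned it is 2^128 - 1 wrapping around to 0. The bit pattern is the same either way, so one routine should be able to serve both types. Only the interpretation of overflow differs.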

Also untested, multiply:

_text SEGMENT 

public UInt128Mul

; UInt128Mul
; ---------
; RCX - QWORD - PTR to UInt128 (input1)
; RDX - QWORD - PTR to UInt128 (input2)
; R8  - QWORD - PTR to UInt256 (result)
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header
; void UInt128Mul(_int128* const input1, _int128* const input2, _int256* result )
;----------
; source remains unchanged
; performs *R8 = *RCX * *RDX in 128 bit mode resulting in 256 bits
UInt128Mul PROC FRAME 
   push rbp 
.pushreg rbp
   push rdx
.pushreg rdx 
   sub rsp, 050h 
.allocstack 050h 
   mov rbp, rsp 
.setframe rbp, 0 
.endprolog 
;  _int256 temp1 = 20h bytes rbp,    rbp+8,  rbp+16, rbp+24
;  _int256 temp2 = 20h bytes rbp+32, rbp+40, rbp+48, rbp+56
;  _int128 input2 shadow = 16 bytes (10h) rbp+64, rbp+72
   
   ; temp1 = temp2 = 0
   xor rax, rax
   mov QWORD PTR [rbp], rax
   mov QWORD PTR [rbp+8], rax
   mov QWORD PTR [rbp+16], rax
   mov QWORD PTR [rbp+24], rax
   mov QWORD PTR [rbp+32], rax
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rax
   mov QWORD PTR [rbp+56], rax

   ; input2 shadow = *input2
   mov rax, QWORD PTR [rdx]
   mov QWORD PTR [rbp+64], rax
   mov rax, QWORD PTR [rdx+8]
   mov QWORD PTR [rbp+72], rax

   mov rax, QWORD PTR [rcx]
   mov rdx, QWORD PTR [rbp+64]
   mul rdx
   mov QWORD PTR [rbp], rax
   mov QWORD PTR [rbp+8], rdx

   ; hi(input1) * lo(input2) is weighted by 2^64: temp2 = { 0, lo, hi, 0 }
   mov rax, QWORD PTR [rcx+8]
   mov rdx, QWORD PTR [rbp+64]
   mul rdx
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rdx
   call add_temp

   ; lo(input1) * hi(input2) is also weighted by 2^64
   mov rax, QWORD PTR [rcx]
   mov rdx, QWORD PTR [rbp+72]
   mul rdx
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rdx
   call add_temp

   ; hi(input1) * hi(input2) is weighted by 2^128: temp2 = { 0, 0, lo, hi }
   mov rax, QWORD PTR [rcx+8]
   mov rdx, QWORD PTR [rbp+72]
   mul rdx
   mov QWORD PTR [rbp+48], rax
   mov QWORD PTR [rbp+56], rdx
   xor rax, rax
   mov QWORD PTR [rbp+40], rax
   call add_temp

   mov rax, QWORD PTR [rbp]
   mov rdx, QWORD PTR [rbp+8]
   mov r10, QWORD PTR [rbp+16]
   mov r11, QWORD PTR [rbp+24]
   mov QWORD PTR [r8], rax
   mov QWORD PTR [r8+8], rdx
   mov QWORD PTR [r8+16], r10
   mov QWORD PTR [r8+24], r11

   ; epilog 
   add rsp, 050h
   pop rdx 
   pop rbp 
   ret 

add_temp:
   mov rax, QWORD PTR [rbp]
   add rax, QWORD PTR [rbp+32]
   mov QWORD PTR [rbp], rax

   mov rax, QWORD PTR [rbp+8]
   adc rax, QWORD PTR [rbp+40]
   mov QWORD PTR [rbp+8], rax

   mov rax, QWORD PTR [rbp+16]
   adc rax, QWORD PTR [rbp+48]
   mov QWORD PTR [rbp+16], rax

   mov rax, QWORD PTR [rbp+24]
   adc rax, QWORD PTR [rbp+56]
   mov QWORD PTR [rbp+24], rax
   
   ret
UInt128Mul ENDP 
_text ENDS 
END
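
Since the multiply is untested, a reference model in C may help for comparing results. This is a sketch under assumptions: `mul64`, `add_at` and `u128_mul` are hypothetical helper names, the 64 x 64 multiply is built from 32-bit halves as a stand-in for the MUL instruction, and values are stored least significant QWORD first as in the assembly.

```c
#include <stdint.h>

/* 64 x 64 -> 128 multiply from 32-bit halves (stands in for MUL). */
static void mul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Add v into t at QWORD position k and ripple the carry upward. */
static void add_at(uint64_t t[4], int k, uint64_t v)
{
    while (k < 4 && v != 0) {
        t[k] += v;
        v = t[k] < v;                 /* carry into the next QWORD */
        k++;
    }
}

/* r[0..3] = a[0..1] * b[0..1], schoolbook style. */
static void u128_mul(const uint64_t a[2], const uint64_t b[2], uint64_t r[4])
{
    uint64_t t[4] = { 0, 0, 0, 0 };
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            uint64_t lo, hi;
            mul64(a[i], b[j], &lo, &hi);
            /* The partial product a[i] * b[j] is weighted by 2^(64*(i+j)). */
            add_at(t, i + j, lo);
            add_at(t, i + j + 1, hi);
        }
    }
    for (int k = 0; k < 4; k++)
        r[k] = t[k];
}
```

A useful check is 2^64 times 3, which must land in the second QWORD of the result. That kind of input shows quickly whether each partial product is added at the right position.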


I'm also looking for a good divide 128 algorithm.
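
One simple candidate is restoring (shift-and-subtract) division, sketched below in C for the unsigned case. It is not fast and not necessarily "good", just easy to verify and to translate to shl/rcl/sbb later. The names `u128`, `u128_ge`, `u128_sub` and `u128_divmod` are hypothetical.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* Unsigned compare: a >= b */
static int u128_ge(u128 a, u128 b)
{
    return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}

/* a - b with the borrow carried into the high half, like SUB then SBB. */
static u128 u128_sub(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);
    return r;
}

/* Computes q = n / d and rem = n % d one bit at a time.
   Returns 0 (and writes nothing) when d is zero, 1 otherwise. */
static int u128_divmod(u128 n, u128 d, u128 *q, u128 *rem)
{
    u128 quo = { 0, 0 }, r = { 0, 0 };
    if (d.lo == 0 && d.hi == 0)
        return 0;
    for (int i = 127; i >= 0; i--) {
        /* Shift the next dividend bit into the running remainder.
           The remainder is below 2^127 before the shift, so no bit is lost. */
        uint64_t bit = i >= 64 ? (n.hi >> (i - 64)) & 1 : (n.lo >> i) & 1;
        r.hi = (r.hi << 1) | (r.lo >> 63);
        r.lo = (r.lo << 1) | bit;
        if (u128_ge(r, d)) {
            r = u128_sub(r, d);
            if (i >= 64) quo.hi |= (uint64_t)1 << (i - 64);
            else         quo.lo |= (uint64_t)1 << i;
        }
    }
    *q = quo;
    *rem = r;
    return 1;
}
```

The loop always runs 128 iterations. A signed divide could be layered on top by taking absolute values first and fixing up the signs afterwards, but that part is not shown here.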

aw27

There is nothing we can call an _int128 data type (as far as I know); it is likely a structure you forgot to define.

In ASM we have owords but the ABI does not consider them a "returnable" data type.

Your coding style is bad, nobody really uses exception frames in ASM without knowing exactly what they are doing. This is not your case, you are not even sure you need 16 bytes of stack.


I did not look at your 2nd function.

hutch--

Bob,

If you want to pass and return a 128 bit sized piece of data, you would normally pass the address of that data. You could of course use one or more SSE registers both in and out but AW is correct here, according to the Win64 ABI, you can only pass up to a 64 bit value as normal arguments as the ABI is designed that way.

nidud

#3
deleted

Mikl__


bigbadbob

Thank you for your responses.
My goal is to learn enough that I can properly explain/teach it to someone else.  There should be enough comments that I could learn the ABI all over again if I forget.
I've never written assembly language code for pay.  I would consider myself a beginner, but I learn quickly.

Just so you know, I'm writing a DLL and keeping the source on GitHub.
https://github.com/robertkolski/BigBadInt128/blob/master/src/Int128Add.asm
As far as I know, nobody looks at my project.  I thought that joining this forum would be a way to learn.

Now back to the assembly:
I learned somewhere that if you change the stack you should have an exception FRAME pointer in case a memory access, like mov r10, [rcx], fails.  For instance, if someone passed rcx = 0 to your function, it should use the FRAME pointer to unwind.  Later I read that you only need to set up a frame pointer if you actually use the stack within the procedure.  I was not sure whether I could call this function a leaf function, but now I think that I can.

Please see the revised Add function (I use it as a method)

_text SEGMENT 

public Int128Add

; Int128Add
; -------------------------------------
; Signed and unsigned add.
; RCX - PTR to OWORD (input1)
; RDX - PTR to OWORD (input2)
; R8  - PTR to OWORD (result)
; R9  - unused
; -------------------------------------
; RAX volatile - but not used
; R10 volatile
; R11 volatile
;--------------------------------------
; C Header - pseudocode prototype
; void Int128Add(_int128* input1, _int128* input2, _int128* result )
; assume that I have a C compiler that supports _int128 or it is a struct
;--------------------------------------
; C# types (without full implementation here):
; public struct Int128
; {
;    private Int64 loQWORD;
;    private Int64 hiQWORD;
;    [DllImport("BigBadInt128.dll")]
;    private static extern void Int128Add(IntPtr addend1, IntPtr addend2, IntPtr result);
;    // public methods not shown
; }
; public struct UInt128
; {
;    private UInt64 loQWORD;
;    private UInt64 hiQWORD;
;    [DllImport("BigBadInt128.dll")]
;    private static extern void Int128Add(IntPtr addend1, IntPtr addend2, IntPtr result);
;    // public methods not shown
; }
;--------------------------------------
; input1 and input2 remain unchanged
; the contents of the OWORD result is modified
;--------------------------------------
; Don't need FRAME and PROLOG because
; this is a leaf function
; it does not need any stack space
; -------------------------------------
Int128Add PROC
   mov r10, QWORD PTR [rcx]
   mov r11, QWORD PTR [rcx+8]
   add r10, QWORD PTR [rdx]
   adc r11, QWORD PTR [rdx+8]
   mov QWORD PTR [r8], r10
   mov QWORD PTR [r8+8], r11
   ret 
Int128Add ENDP 
_text ENDS 
END

aw27

Quote
I would consider myself a beginner, but I learn quickly
You don't need to put it that way, people see immediately who you are.
In addition you are a C# programmer which is not a good starting point.

ASM programmers usually do not spend time with FRAME, particularly if they are beginners, for a few reasons I could detail here but will not. However, for an introduction on the subject you can read this article: 64-bit Structured Exception Handling (SEH) in ASM





nidud

#7
deleted

aw27

@nidud

Quote
or in this case RDX:RAX.

Is this the Windows ABI? I don't think so.

nidud

#9
deleted

aw27

Quote
I'm a C# programmer
He was very clear about that.

hutch--

It still sounds like there are 2 choices, either use 64 bit pointers to the 128 bit variable OR pass the data in SSE registers. I guess you could pass that data in AVX registers as well. As far as I know VS does not support registers directly so the choices collapse down to passing 64 bit pointers to the 128 bit data like the ABI for Win 64 supports.

nidud

#12
deleted

aw27

According to the Windows ABI he has to use pointers to the variable.
The Windows ABI is a convention to be used by all programs, written in any programming language. We know that it is possible to use special calling conventions, or derivatives, when we know beforehand what our ASM module will be linked with.
This is really not the case here, C# does not allow special calling conventions.

Quote
None of the functions presented by Bob use any return type other than void and all arguments are passed as pointers
We know why he did that, because he does not know yet how to do it otherwise.

nidud

#14
deleted