Author Topic: Int128 in assembler  (Read 770 times)

bigbadbob

  • Regular Member
  • *
  • Posts: 17
Int128 in assembler
« on: June 13, 2018, 02:04:43 PM »
I'm a C# programmer, but have always understood assembly.  I just don't code in it very often.

This is my first at least semi-working 64-bit dll written in assembler.
I actually called it from C# using P/Invoke to test it, so at a minimum it complies with the ABI well enough to be callable, and it should work from assembler, C/C++, and .NET via P/Invoke.
Not really sure if it is fully compliant, but it did not crash C#.

Sample Add method:

Code: [Select]
_text SEGMENT 

public Int128Add

; Int128Add
; ---------
; RCX - QWORD - PTR to Int128 (input1)
; RDX - QWORD - PTR to Int128 (input2)
; R8  - QWORD - PTR to Int128 (result)
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header
; void Int128Add(_m128* const input1, _m128* const input2, _m128* result )
;----------
; source remains unchanged
; performs *R8 = *RCX + *RDX in 128 bit mode
Int128Add PROC FRAME 
   push rbp 
.pushreg rbp 
   sub rsp, 010h 
.allocstack 010h 
   mov rbp, rsp 
.setframe rbp, 0 
.endprolog 
   mov rax, QWORD PTR [rcx]
   mov r10, QWORD PTR [rcx+8]
   add rax, QWORD PTR [rdx]
   adc r10, QWORD PTR [rdx+8]
   mov QWORD PTR [r8], rax
   mov QWORD PTR [r8+8], r10

   ; epilog 
   add rsp, 010h 
   pop rbp 
   ret 
Int128Add ENDP 
_text ENDS 
END

And that method is working, but I think that I allocated too much stack.

I think that add and subtract are the same for Int128 (signed) and UInt128 (unsigned).
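
To illustrate why one routine can serve both types, here is a minimal C sketch of the same two-limb add and subtract. The `U128` struct and the function names are hypothetical stand-ins, not part of the DLL; the comparisons play the role of the carry flag that ADC and SBB consume.

```c
#include <stdint.h>

/* Hypothetical C mirror of the 16-byte value: low qword first, as in the asm. */
typedef struct { uint64_t lo, hi; } U128;

/* r = a + b (mod 2^128): like ADD on the low limb, then ADC on the high limb. */
static U128 u128_add(U128 a, U128 b)
{
    U128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* (r.lo < a.lo) is the carry out of the low limb */
    return r;
}

/* r = a - b (mod 2^128): like SUB on the low limb, then SBB on the high limb. */
static U128 u128_sub(U128 a, U128 b)
{
    U128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);   /* (a.lo < b.lo) is the borrow out of the low limb */
    return r;
}
```

Adding 1 to the all-ones pattern gives all zeros. Read as signed that is -1 + 1 = 0; read as unsigned it is the maximum value wrapping around. The bits are identical either way, only the interpretation of overflow differs.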

Also untested, multiply:

Code: [Select]
_text SEGMENT 

public UInt128Mul

; UInt128Mul
; ---------
; RCX - QWORD - PTR to Int128 (input1)
; RDX - QWORD - PTR to Int128 (input2)
; R8  - QWORD - PTR to Int256 (result)
; R9  - unused
; ---------
; RAX volatile
; R10 volatile
; R11 volatile
;----------
; C Header
; void UInt128Mul(_int128* const input1, _int128* const input2, _int256* result )
;----------
; source remains unchanged
; performs *R8 = *RCX * *RDX in 128 bit mode resulting in 256 bits
UInt128Mul PROC FRAME 
   push rbp 
.pushreg rbp
   push rdx
.pushreg rdx 
   sub rsp, 050h 
.allocstack 050h 
   mov rbp, rsp 
.setframe rbp, 0 
.endprolog 
;  _int256 temp1 = 20h bytes rbp,    rbp+8,  rbp+16, rbp+24
;  _int256 temp2 = 20h bytes rbp+32, rbp+40, rbp+48, rbp+56
;  _int128 input2 shadow = 16 bytes (10h) rbp+64, rbp+72
   
   ; temp1 = temp2 = 0
   xor rax, rax
   mov QWORD PTR [rbp], rax
   mov QWORD PTR [rbp+8], rax
   mov QWORD PTR [rbp+16], rax
   mov QWORD PTR [rbp+24], rax
   mov QWORD PTR [rbp+32], rax
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rax
   mov QWORD PTR [rbp+56], rax

   ; input2 shadow = *input2
   mov rax, QWORD PTR [rdx]
   mov QWORD PTR [rbp+64], rax
   mov rax, QWORD PTR [rdx+8]
   mov QWORD PTR [rbp+72], rax

   mov rax, QWORD PTR [rcx]
   mov rdx, QWORD PTR [rbp+64]
   mul rdx
   mov QWORD PTR [rbp], rax
   mov QWORD PTR [rbp+8], rdx

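   ; hi(input1) * lo(input2): this partial product is weighted by 2^64,
   ; so it goes into temp2 one qword up (temp2 qwords 0 and 3 stay zero)
   mov rax, QWORD PTR [rcx+8]
   mov rdx, QWORD PTR [rbp+64]
   mul rdx
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rdx
   call add_temp

   ; lo(input1) * hi(input2): also weighted by 2^64
   mov rax, QWORD PTR [rcx]
   mov rdx, QWORD PTR [rbp+72]
   mul rdx
   mov QWORD PTR [rbp+40], rax
   mov QWORD PTR [rbp+48], rdx
   call add_temp

   ; hi(input1) * hi(input2): weighted by 2^128, so two qwords up
   mov rax, QWORD PTR [rcx+8]
   mov rdx, QWORD PTR [rbp+72]
   mul rdx
   mov QWORD PTR [rbp+48], rax
   mov QWORD PTR [rbp+56], rdx
   xor rax, rax
   mov QWORD PTR [rbp+32], rax
   mov QWORD PTR [rbp+40], rax
   call add_temp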

   mov rax, QWORD PTR [rbp]
   mov rdx, QWORD PTR [rbp+8]
   mov r10, QWORD PTR [rbp+16]
   mov r11, QWORD PTR [rbp+24]
   mov QWORD PTR [r8], rax
   mov QWORD PTR [r8+8], rdx
   mov QWORD PTR [r8+16], r10
   mov QWORD PTR [r8+24], r11

   ; epilog 
   add rsp, 050h
   pop rdx 
   pop rbp 
   ret 

add_temp:
   mov rax, QWORD PTR [rbp]
   add rax, QWORD PTR [rbp+32]
   mov QWORD PTR [rbp], rax

   mov rax, QWORD PTR [rbp+8]
   adc rax, QWORD PTR [rbp+40]
   mov QWORD PTR [rbp+8], rax

   mov rax, QWORD PTR [rbp+16]
   adc rax, QWORD PTR [rbp+48]
   mov QWORD PTR [rbp+16], rax

   mov rax, QWORD PTR [rbp+24]
   adc rax, QWORD PTR [rbp+56]
   mov QWORD PTR [rbp+24], rax
   
   ret
UInt128Mul ENDP 
_text ENDS 
END
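
For checking a 128 x 128 -> 256 bit multiply, a portable C reference can help. The sketch below is an assumption-laden illustration, not the DLL's code: `mul64`, `add_at`, and `u128_mul` are hypothetical names. It makes the weights of the four partial products explicit: lo*lo lands at limb 0, both cross products at limb 1 (2^64), and hi*hi at limb 2 (2^128).

```c
#include <stdint.h>

/* 64 x 64 -> 128 bit product built from 32-bit halves (what MUL gives in RDX:RAX). */
static void mul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Add the 128-bit value hi:lo into the 256-bit accumulator r, starting at limb i,
   and ripple the carry through the remaining limbs. */
static void add_at(uint64_t r[4], int i, uint64_t lo, uint64_t hi)
{
    uint64_t v[2] = { lo, hi };
    unsigned carry = 0;
    for (int k = i; k < 4; ++k) {
        uint64_t addend = (k - i < 2) ? v[k - i] : 0;
        uint64_t s = r[k] + addend;
        unsigned c1 = s < addend;      /* carry from adding the limb */
        s += carry;
        unsigned c2 = s < carry;       /* carry from adding the incoming carry */
        r[k] = s;
        carry = c1 | c2;
    }
}

/* r (4 limbs, least significant first) = a * b, with a and b as 2-limb values. */
static void u128_mul(const uint64_t a[2], const uint64_t b[2], uint64_t r[4])
{
    uint64_t lo, hi;
    r[0] = r[1] = r[2] = r[3] = 0;
    mul64(a[0], b[0], &lo, &hi); add_at(r, 0, lo, hi);  /* weight 2^0   */
    mul64(a[1], b[0], &lo, &hi); add_at(r, 1, lo, hi);  /* weight 2^64  */
    mul64(a[0], b[1], &lo, &hi); add_at(r, 1, lo, hi);  /* weight 2^64  */
    mul64(a[1], b[1], &lo, &hi); add_at(r, 2, lo, hi);  /* weight 2^128 */
}
```

A quick sanity check is 2^64 * 2^64: both inputs are {0, 1} and the result should have a single 1 in limb 2.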

I'm also looking for a good divide 128 algorithm.

AW

  • Member
  • *****
  • Posts: 1300
  • Let's Make ASM Great Again!
Re: Int128 in assembler
« Reply #1 on: June 13, 2018, 04:54:04 PM »
There is nothing we can call an _int128 data type (as far as I know); it is likely a structure you forgot to define.

In ASM we have owords but the ABI does not consider them a "returnable" data type.

Your coding style is bad; nobody really uses exception frames in ASM without knowing exactly what they are doing. That is not your case: you are not even sure you need 16 bytes of stack.


I did not look at your 2nd function.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 5427
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Int128 in assembler
« Reply #2 on: June 13, 2018, 05:44:44 PM »
Bob,

If you want to pass and return a 128 bit sized piece of data, you would normally pass the address of that data. You could of course use one or more SSE registers both in and out but AW is correct here, according to the Win64 ABI, you can only pass up to a 64 bit value as normal arguments as the ABI is designed that way.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

nidud

  • Member
  • *****
  • Posts: 1506
    • https://github.com/nidud/asmc
Re: Int128 in assembler
« Reply #3 on: June 13, 2018, 08:18:44 PM »
Hi Bob,

Quote
I'm also looking for a good divide 128 algorithm.

Here's the unsigned divide function I use:
Code: [Select]
;
; _udiv256() - Divide
;
; Unsigned binary division of dividend by source.
; Note: The quotient is stored in dividend.
;
include quadmath.inc

.code

ifdef _LINUX
_udiv256 proc uses rbx dividend:PU256, divisor:PU256, reminder:PU256
    mov r10,dividend
    mov r11,reminder
    mov rbx,divisor
else
option cstack:on
option win64:rsp nosave noauto
_udiv256 proc uses rsi rdi rbx dividend:PU256, divisor:PU256, reminder:PU256
    mov r10,rcx ; R10: quotient
    mov r11,r8  ; R11: reminder
    mov rbx,rdx ; RBX: divisor
endif

    mov rsi,r10 ; dividend --> reminder
    mov rdi,r11
    mov ecx,4
    rep movsq

    xor eax,eax ; quotient (dividend) --> 0
    mov rdi,r10
    mov ecx,4
    rep stosq

    .repeat

        or rax,[rbx] ; divisor zero ?
        or rax,[rbx+8]
        or rax,[rbx+16]
        or rax,[rbx+24]
        .ifz
            mov rdi,r11
            mov ecx,4
            rep stosq
            .break
        .endif

        mov rax,[rbx+24]
        .if rax == [r11+24]
            mov rax,[rbx+16]
            .if rax == [r11+16]
                mov rax,[rbx+8]
                .if rax == [r11+8]
                    mov rax,[rbx]
                    cmp rax,[r11]
                .endif
            .endif
        .endif
        .ifa
            ;
            ; divisor > dividend : reminder = dividend, quotient = 0
            ;
            .break
        .else
            .ifz
                ;
                ; divisor == dividend : reminder = 0, quotient = 1
                ;
                mov rdi,r11
                mov ecx,4
                rep stosq
                inc byte ptr [r10]
                .break
            .endif
        .endif

        mov rdi,rbx
        mov rcx,[rdi+24]
        mov rbx,[rdi+16]
        mov rdx,[rdi+8]
        mov rax,[rdi]
        xor r8d,r8d

        .while 1
            add rax,rax
            adc rdx,rdx
            adc rbx,rbx
            adc rcx,rcx
            .break .ifc
            .if rcx == [r11+24]
                .if rbx == [r11+16]
                    .if rdx == [r11+8]
                        cmp rax,[r11]
                    .endif
                .endif
            .endif
            .break .ifa
            inc r8d
        .endw

        .while 1
            rcr rcx,1
            rcr rbx,1
            rcr rdx,1
            rcr rax,1
            sub [r11],rax
            sbb [r11+8],rdx
            sbb [r11+16],rbx
            sbb [r11+24],rcx
            cmc
            .ifnc
                .repeat
                    mov r9,[r10]
                    add [r10],r9
                    mov r9,[r10+8]
                    adc [r10+8],r9
                    mov r9,[r10+16]
                    adc [r10+16],r9
                    mov r9,[r10+24]
                    adc [r10+24],r9
                    dec r8d
                    .ifs
                        add [r11],rax
                        adc [r11+8],rdx
                        adc [r11+16],rbx
                        adc [r11+24],rcx
                        .break(1)
                    .endif
                    shr rcx,1
                    rcr rbx,1
                    rcr rdx,1
                    rcr rax,1
                    add [r11],rax
                    adc [r11+8],rdx
                    adc [r11+16],rbx
                    adc [r11+24],rcx
                .untilb
            .endif
            mov r9,[r10]
            adc [r10],r9
            mov r9,[r10+8]
            adc [r10+8],r9
            mov r9,[r10+16]
            adc [r10+16],r9
            mov r9,[r10+24]
            adc [r10+24],r9
            dec r8d
            .break .ifs
        .endw
    .until 1
    ret

_udiv256 endp

    end

These functions are used for the REAL16 (quadmath) implementation so most of them are inline. The mul function goes something like this:

Code: [Select]
        .if !rdx && !r11
            mul     r10
            xor     r10,r10
        .else
            mul     r10
            mov     rbx,rdx
            mov     rdi,rax
            mov     rax,rcx
            mul     r11
            mov     r11,rdx
            xchg    r10,rax
            mov     rdx,rcx
            mul     rdx
            add     rbx,rax
            adc     r10,rdx
            adc     r11,0
            mov     rax,r8
            mov     rdx,r9
            mul     rdx
            add     rbx,rax
            adc     r10,rdx
            adc     r11,0
            mov     rdx,rbx
            mov     rax,rdi
        .endif
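
For readers who want the idea behind an unsigned 128-bit divide in a higher-level form, here is a minimal C sketch of plain restoring (shift-and-subtract) division. It is a simpler variant than the routine earlier in this reply, which first aligns the divisor with the remainder; the `U128` struct and function names are hypothetical, and returning -1 for a zero divisor is an assumption of this sketch only.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } U128;   /* hypothetical: low qword first */

static int u128_ge(U128 a, U128 b)
{
    return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}

static U128 u128_sub(U128 a, U128 b)
{
    U128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);     /* borrow from the low limb */
    return r;
}

/* Computes *q = n / d and *r = n % d. Returns 0 on success, -1 if d is zero. */
static int u128_divmod(U128 n, U128 d, U128 *q, U128 *r)
{
    U128 quo = { 0, 0 }, rem = { 0, 0 };
    if ((d.lo | d.hi) == 0)
        return -1;
    for (int i = 127; i >= 0; --i) {
        /* Shift the remainder left and bring in bit i of the dividend.
           The remainder holds at most 127 bits here, so nothing is lost. */
        uint64_t bit = (i >= 64 ? n.hi >> (i - 64) : n.lo >> i) & 1;
        rem.hi = (rem.hi << 1) | (rem.lo >> 63);
        rem.lo = (rem.lo << 1) | bit;
        /* If the divisor fits, subtract it and set quotient bit i. */
        if (u128_ge(rem, d)) {
            rem = u128_sub(rem, d);
            if (i >= 64) quo.hi |= 1ULL << (i - 64);
            else         quo.lo |= 1ULL << i;
        }
    }
    *q = quo;
    *r = rem;
    return 0;
}
```

This always runs 128 iterations, so it is slower than an aligned version or one built on the hardware DIV instruction, but it is easy to verify and makes a handy reference when testing an assembler implementation.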

Mikl__

  • Member
  • ****
  • Posts: 613
Re: Int128 in assembler
« Reply #4 on: June 13, 2018, 11:52:17 PM »
Hi, bigbadbob!
Have a look here

bigbadbob

Re: Int128 in assembler
« Reply #5 on: June 14, 2018, 12:51:23 PM »
Thank you for your responses.
My goal is to learn enough that I can properly explain/teach it to someone else.  There should be enough comments that I could learn the ABI all over again if I forget.
I've never written assembly language code for pay.  I would consider myself a beginner, but I learn quickly.

Just so you know, I'm writing a DLL that is hosted on GitHub.
https://github.com/robertkolski/BigBadInt128/blob/master/src/Int128Add.asm
As far as I know, nobody looks at my project.  I thought that joining this forum would be a way to learn.

Now back to the assembly:
I learned somewhere that if you change the stack you should declare an exception FRAME, in case a memory access such as mov r10, [rcx] fails.  For instance, if someone passed rcx = 0, the FRAME information would be used to unwind out of your function.  Later I read that you only need to set up a frame if you actually use the stack within the procedure.  I was not sure whether I could call this function a leaf function, but now I think that I can.

Please see the revised Add function (I use it as a method)
Code: [Select]
_text SEGMENT 

public Int128Add

; Int128Add
; -------------------------------------
; Signed and unsigned add.
; RCX - PTR to OWORD (input1)
; RDX - PTR to OWORD (input2)
; R8  - PTR to OWORD (result)
; R9  - unused
; -------------------------------------
; RAX volatile - but not used
; R10 volatile
; R11 volatile
;--------------------------------------
; C Header - pseudocode prototype
; void Int128Add(_int128* input1, _int128* input2, _int128* result )
; assume that I have a C compiler that supports _int128 or it is a struct
;--------------------------------------
; C# types (without full implementation here):
; public struct Int128
; {
;    private Int64 loQWORD;
;    private Int64 hiQWORD;
;    [DllImport("BigBadInt128.dll")]
;    private static extern void Int128Add(IntPtr addend1, IntPtr addend2, IntPtr result);
;    // public methods not shown
; }
; public struct UInt128
; {
;    private UInt64 loQWORD;
;    private UInt64 hiQWORD;
;    [DllImport("BigBadInt128.dll")]
;    private static extern void Int128Add(IntPtr addend1, IntPtr addend2, IntPtr result);
;    // public methods not shown
; }
;--------------------------------------
; input1 and input2 remain unchanged
; the contents of the OWORD result is modified
;--------------------------------------
; Don't need FRAME and PROLOG because
; this is a leaf function
; it does not need any stack space
; -------------------------------------
Int128Add PROC
   mov r10, QWORD PTR [rcx]
   mov r11, QWORD PTR [rcx+8]
   add r10, QWORD PTR [rdx]
   adc r11, QWORD PTR [rdx+8]
   mov QWORD PTR [r8], r10
   mov QWORD PTR [r8+8], r11
   ret 
Int128Add ENDP 
_text ENDS 
END
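
As a sketch of the contract the comments above describe, here is how the same layout might look from C. `Int128Raw` and `Int128Add_ref` are hypothetical names: the struct mirrors the two QWORD fields, and the function is only a C stand-in with the same pointer-based signature, not the assembly export itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the 16-byte value the routine reads and writes. */
typedef struct {
    uint64_t lo;   /* accessed by the routine at [ptr]     */
    uint64_t hi;   /* accessed by the routine at [ptr + 8] */
} Int128Raw;

/* The assembly relies on this exact layout, so check it at compile time. */
_Static_assert(sizeof(Int128Raw) == 16, "value must be 16 bytes");
_Static_assert(offsetof(Int128Raw, lo) == 0, "low qword at offset 0");
_Static_assert(offsetof(Int128Raw, hi) == 8, "high qword at offset 8");

/* C stand-in with the same contract: inputs unchanged, *result overwritten.
   Both limbs are computed before storing, so result may alias an input. */
static void Int128Add_ref(const Int128Raw *a, const Int128Raw *b, Int128Raw *result)
{
    uint64_t lo = a->lo + b->lo;
    uint64_t hi = a->hi + b->hi + (lo < a->lo);   /* carry from the low limb */
    result->lo = lo;
    result->hi = hi;
}
```

Because every argument is a pointer, each one fits in a single integer register, which is why the routine needs nothing beyond RCX, RDX, and R8.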

AW

Re: Int128 in assembler
« Reply #6 on: June 14, 2018, 03:36:40 PM »
Quote
I would consider myself a beginner, but I learn quickly
You don't need to put it that way, people see immediately who you are.
In addition you are a C# programmer which is not a good starting point.

ASM programmers usually do not spend time with FRAME, particularly if they are beginners, for a few reasons I could detail here but will not. However, for an introduction to the subject you can read the article "64-bit Structured Exception Handling (SEH) in ASM".

nidud

Re: Int128 in assembler
« Reply #7 on: June 14, 2018, 09:18:56 PM »
There's nothing wrong with the Add function, Bob, and the first version also works.

The C function goes something like this:
Code: [Select]
void Int128Add(__int128 *a, __int128 *b, __int128 *result)
{
    *result = *a + *b;
}

You may copy and paste it into https://gcc.godbolt.org/

Using the switch -O2 produces this code:
Code: [Select]
  mov rax, QWORD PTR [rdi]
  mov rcx, rdx
  mov rdx, QWORD PTR [rdi+8]
  mov rdi, QWORD PTR [rsi+8]
  mov rsi, QWORD PTR [rsi]
  add rax, rsi
  adc rdx, rdi
  mov QWORD PTR [rcx], rax
  mov QWORD PTR [rcx+8], rdx
  ret

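Note the difference in argument registers: RCX/RDX/R8 (Win64) versus RDI/RSI/RDX (System V, which GCC targets on Linux). Otherwise the code produced is the same.

As for returning values or passing arguments that are twice the register width (2*n bits), this is normally done using a register pair: DX:AX, EDX:EAX, or in this case RDX:RAX.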

Code: [Select]
__int128 foo(__int128 a, __int128 b )
{
    __int128 result;
   
    Int128Add(&a, &b, &result);
    return result;
}

Code: [Select]
  mov r9, rdi
  mov r10, rsi
  add r9, rdx
  adc r10, rcx
  mov rax, r9
  mov rdx, r10
  ret
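
To make the register-pair idea concrete, here is a small C sketch that splits a 128-bit value into the two 64-bit halves such a pair would carry. It assumes GCC or Clang on a 64-bit target, where `unsigned __int128` is available as an extension; the helper names are hypothetical.

```c
#include <stdint.h>

/* GCC/Clang extension on 64-bit targets; not available in MSVC. */
__extension__ typedef unsigned __int128 u128;

/* Low half: the part that would travel in RAX. */
static uint64_t lo64(u128 x) { return (uint64_t)x; }

/* High half: the part that would travel in RDX. */
static uint64_t hi64(u128 x) { return (uint64_t)(x >> 64); }

/* Rebuild the value from its halves. */
static u128 make_u128(uint64_t hi, uint64_t lo) { return ((u128)hi << 64) | lo; }

/* Pointer-based add with the same shape as Int128Add. */
static void Int128Add_ptr(const u128 *a, const u128 *b, u128 *result)
{
    *result = *a + *b;
}

/* By-value wrapper, like foo() above. */
static u128 add_by_value(u128 a, u128 b)
{
    u128 result;
    Int128Add_ptr(&a, &b, &result);
    return result;
}
```

How the by-value version is actually passed and returned depends on the compiler and target ABI, so the halves shown here describe the value, not a guarantee about registers.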

AW

Re: Int128 in assembler
« Reply #8 on: June 14, 2018, 09:38:35 PM »
@nidud

Quote
or in this case RDX:RAX.

Is this the Windows ABI? I don't think so.

nidud

Re: Int128 in assembler
« Reply #9 on: June 14, 2018, 11:12:14 PM »
If he plans to interact with Windows C compilers that may be a relevant question. The __int128 keyword used in his comment suggests he doesn't.

One would assume that register spanning like long in 16-bit and __int64 in 32-bit should expand to __int128 in 64-bit as in GCC, but this (as far as I know) is not the case in VS. The MS version will be vectorcall using xmm registers for arguments and return value.

So there is no Windows ABI for 128-bit values. In other words, the maximum value is still 64-bit.

GCC for Windows will (for this reason) convert all arguments above 64-bit to pointers and return the value in xmm0.
Code: [Select]
        sub     rsp, 24
        mov     rax, qword ptr [rdx]
        mov     r9, qword ptr [rcx]
        mov     r10, qword ptr [rcx+8H]
        mov     rdx, qword ptr [rdx+8H]
        add     r9, rax
        adc     r10, rdx
        mov     qword ptr [rsp], r9
        mov     qword ptr [rsp+8H], r10
        movdqa  xmm0, xmmword ptr [rsp]
        add     rsp, 24
        ret

The same logic applies to Asmc:
Code: [Select]
.code

foo proto :oword, :oword

bar proc

  local a:oword
  local b:oword

  foo(a, b)
  ret

bar endp

    end

Code: [Select]
bar     PROC
        push    rbp
        mov     rbp, rsp
        sub     rsp, 64
        lea     rcx, [rbp-10H]
        lea     rdx, [rbp-20H]
        call    foo
        leave
        ret
bar     ENDP

AW

Re: Int128 in assembler
« Reply #10 on: June 15, 2018, 12:39:28 AM »
Quote
I'm a C# programmer
He was very clear about that.

hutch--

Re: Int128 in assembler
« Reply #11 on: June 15, 2018, 01:13:35 AM »
It still sounds like there are 2 choices: either use 64 bit pointers to the 128 bit variable OR pass the data in SSE registers. I guess you could pass that data in AVX registers as well. As far as I know VS does not support registers directly, so the choices collapse down to passing 64 bit pointers to the 128 bit data, as the Win64 ABI supports.

nidud

Re: Int128 in assembler
« Reply #12 on: June 15, 2018, 01:20:04 AM »
 :biggrin:

None of the functions presented by Bob use any return type other than void and all arguments are passed as pointers.

AW

Re: Int128 in assembler
« Reply #13 on: June 15, 2018, 03:01:05 AM »
According to the Windows ABI he has to use pointers to the variable.
The Windows ABI is a convention to be used by all programs, written in any programming language. We know that it is possible to use special calling conventions, or derivatives, when we know beforehand what our ASM module will be linked with.
This is really not the case here, C# does not allow special calling conventions.

Quote
None of the functions presented by Bob use any return type other than void and all arguments are passed as pointers
We know why he did that, because he does not know yet how to do it otherwise.

nidud

Re: Int128 in assembler
« Reply #14 on: June 15, 2018, 04:48:24 AM »
Quote
According to the Windows ABI he has to use pointers to the variable.

So you think that's the reason why he's doing just that?

Quote
This is really not the case here, C# does not allow special calling conventions.

True. C# is a common language infrastructure abstracted from any hardware platform and thus not bound to any specific binary interface.

Quote
Quote
None of the functions presented by Bob use any return type other than void and all arguments are passed as pointers
We know why he did that, because he does not know yet how to do it otherwise.

 :biggrin:

So it's not because he follows the Windows ABI but because he does not know yet how to do it otherwise.