The MASM Forum

General => The Workshop => Topic started by: dedndave on April 22, 2013, 11:14:29 AM

Title: Large 64-bit MUL
Post by: dedndave on April 22, 2013, 11:14:29 AM
my brain isn't working as well as it used to - lol
i am wondering if some of you guys might help   :t

i am having difficulty managing the registers for this routine
;############################################################################################

LrgMul64 PROC USES EBX ESI EDI lpuLrgInt:LPVOID,nDwords:DWORD,uMultLo:DWORD,uMultHi:DWORD

;Call With:
;lpuLrgInt       = pointer to large unsigned integer
;                  the buffer must be at least 2 dwords longer than the input value
;                  the bytes at the end of the integer must be 0 to form a complete dword
;nDwords         = number of dwords in integer pointed to by lpuLrgInt
;uMultHi:uMultLo = 64-bit multiplier

;Returns:
;integer pointed to by lpuLrgInt is multiplied by 64-bit value of uMultHi:uMultLo

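        xor     ecx,ecx                 ;ECX = carry 0 (lowest pending dword)
        mov     esi,lpuLrgInt           ;ESI = pointer to current dword
        xor     ebx,ebx                 ;EBX = carry 1
        xor     edi,edi                 ;EDI = carry 2

LgMul0: mov     eax,[esi]
        push    eax                     ;save the dword for the second MUL
        mul dword ptr uMultLo           ;EDX:EAX = dword * uMultLo
        add     ecx,eax
        adc     ebx,edx
        pop     eax                     ;POP does not modify the flags
        adc     edi,0
        mov     [esi],ecx               ;low dword of the result is final
        mul dword ptr uMultHi           ;EDX:EAX = dword * uMultHi
        add     esi,4
        add     ebx,eax
        mov     ecx,0                   ;MOV, not XOR, to preserve the carry flag
        adc     edi,edx
        adc     ecx,0                   ;ECX = carry out of EDI
        dec dword ptr nDwords           ;sets ZF for the JNZ below
        xchg    ebx,edi                 ;rotate the carries down one dword:
        xchg    ecx,edi                 ;ECX<-EBX, EBX<-EDI, EDI<-ECX
        jnz     LgMul0                  ;XCHG does not modify the flags

        mov     [esi],ecx               ;store the two extra high dwords
        mov     [esi+4],ebx
        ret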

LrgMul64 ENDP

;############################################################################################


EDIT: corrected code for the working routine
the routine does not set any return values, as shown
it could be modified to return the length of the resulting integer
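
For testing the routine above against known-good results, a portable model can help. The following is only a sketch in C99, not part of the original post: the name lrg_mul64_model is hypothetical, and the variables c0, c1 and c2 are assumed to play the roles of ECX, EBX and EDI. It performs the same two 32x32 multiplies per dword and the same carry rotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of LrgMul64 (name and layout are assumptions).
   Multiplies the little-endian integer in buf[0..n-1] by mult, in place.
   buf must have room for n + 2 dwords, as the asm routine requires. */
static void lrg_mul64_model(uint32_t *buf, size_t n, uint64_t mult)
{
    uint32_t mlo = (uint32_t)mult;          /* uMultLo */
    uint32_t mhi = (uint32_t)(mult >> 32);  /* uMultHi */
    uint32_t c0 = 0, c1 = 0, c2 = 0;        /* ECX, EBX, EDI */
    size_t i;

    assert(n > 0);  /* the asm loop also assumes nDwords >= 1 */

    for (i = 0; i < n; i++) {
        uint64_t plo = (uint64_t)buf[i] * mlo;  /* first MUL  */
        uint64_t phi = (uint64_t)buf[i] * mhi;  /* second MUL */
        uint64_t t;
        uint32_t c3;

        /* add plo into c2:c1:c0 (ADD / ADC / ADC 0) */
        t  = (uint64_t)c0 + (uint32_t)plo;
        c0 = (uint32_t)t;
        t  = (uint64_t)c1 + (uint32_t)(plo >> 32) + (t >> 32);
        c1 = (uint32_t)t;
        c2 += (uint32_t)(t >> 32);

        buf[i] = c0;  /* the low dword is final */

        /* add phi one dword higher, into c3:c2:c1 */
        t  = (uint64_t)c1 + (uint32_t)phi;
        c1 = (uint32_t)t;
        t  = (uint64_t)c2 + (uint32_t)(phi >> 32) + (t >> 32);
        c2 = (uint32_t)t;
        c3 = (uint32_t)(t >> 32);

        /* rotate the carries down one dword (the two XCHGs) */
        c0 = c1;
        c1 = c2;
        c2 = c3;
    }

    buf[n]     = c0;  /* the two extra high dwords */
    buf[n + 1] = c1;
}
```

Calling it on a copy of the input buffer and comparing dword by dword with the output of LrgMul64 should show any register mix-up quickly.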
Title: Re: Large 64-bit MUL
Post by: dedndave on April 22, 2013, 12:06:13 PM
i think i got it   :P
i need to do some testing
Title: Re: Large 64-bit MUL
Post by: Ficko on April 22, 2013, 05:33:38 PM
Hi Dave,

I am using this proc for my own stuff; it may give you some hints.


HIGHDWORD equ 4

i64mul proc multiplicand:SQWORD,multiplier:SQWORD
mov edx, dword ptr multiplicand + HIGHDWORD
mov ecx, dword ptr multiplier + HIGHDWORD
or edx, ecx ; One operand >= 2^32?
mov edx, dword ptr multiplier
mov eax, dword ptr multiplicand
jnz @F ; Yes, need two multiplies.
mul edx ; multiplicand_lo * multiplier_lo
ret 10h ; Done, return to caller.

@@: imul edx, dword ptr multiplicand + HIGHDWORD ; p3_lo = multiplicand_hi * multiplier_lo
imul ecx, eax ; p2_lo = multiplier_hi * multiplicand_lo
add ecx, edx ; p2_lo + p3_lo
mul dword ptr multiplier ; p1 = multiplicand_lo * multiplier_lo
add edx, ecx ; p1 + p2_lo + p3_lo = result in EDX:EAX
ret 10h ; Done, return to caller.
i64mul endp
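
The partial-product scheme in the comments above can be sketched in C for checking. This is an illustration only: the names mul64_from_32 and check_mul64 are hypothetical, and the early-out path for operands below 2^32 is left out because the general path gives the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a * b, built from 32-bit pieces as in the comments above. */
static uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t p1    = (uint64_t)a_lo * b_lo;  /* MUL: full 64-bit product    */
    uint32_t p2_lo = b_hi * a_lo;            /* IMUL: low 32 bits only      */
    uint32_t p3_lo = a_hi * b_lo;            /* IMUL: low 32 bits only      */
    uint32_t hi    = (uint32_t)(p1 >> 32) + p2_lo + p3_lo;  /* wraps mod 2^32 */

    return ((uint64_t)hi << 32) | (uint32_t)p1;
}

/* Compare against the compiler's own 64-bit multiply. */
static void check_mul64(uint64_t a, uint64_t b)
{
    assert(mul64_from_32(a, b) == a * b);
}
```

The a_hi * b_hi term never appears because it only affects bits 64 and up, which a 64-bit result discards.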
Title: Re: Large 64-bit MUL
Post by: ToutEnMasm on April 22, 2013, 06:11:02 PM

It seems Intel has already done it for you here (VC++ Express):
C:\Program Files\Microsoft Visual Studio 10.0\VC\crt\src\intel\llmul.asm
Title: Re: Large 64-bit MUL
Post by: dedndave on April 22, 2013, 09:30:57 PM
thanks guys
64 x 64 isn't too hard
i am doing 64 x 32N   :P
it would be easy with 64-bit registers - lol
Title: Re: Large 64-bit MUL
Post by: FORTRANS on April 22, 2013, 10:33:48 PM
Hi Dave,

   If I can get N x N byte and word multiplies to work, I am sure you
can get N x N double multiplies to work.  I may have to revisit mine
to find out why I have a really odd routine in there to make it work.
It has to be a blindness on my part as no one else needs it.

Best of luck,

Steve N.
Title: Re: Large 64-bit MUL
Post by: dedndave on April 23, 2013, 12:43:12 AM
i am close   :lol:

        xor     ecx,ecx
        mov     esi,lpuLrgInt
        xor     ebx,ebx
        xor     edi,edi

;EDX:EAX = working registers
;ESI = source pointer
;ECX = carry 0
;EBX = carry 1
;EDI = carry 2

loop00: mov     eax,[esi]
        push    eax

        mul dword ptr uMultLo

        add     ecx,eax
        adc     ebx,edx
        pop     eax
        adc     edi,0
        mov     [esi],ecx

        mul dword ptr uMultHi

        add     ebx,eax
        mov     ecx,0
        adc     edi,edx
        adc     ecx,0

        xchg    ebx,edi
        xchg    ecx,edi

        add     esi,4
        dec dword ptr nDwords
        jnz     loop00

        mov     [esi],ecx
        mov     [esi+4],ebx
        ret


but no cigar   :(
Title: Re: Large 64-bit MUL
Post by: dedndave on April 23, 2013, 01:06:57 AM
it works !!!!!   :eusa_dance:

i was passing the wrong register on the call - lol
Title: Re: Large 64-bit MUL
Post by: dedndave on April 23, 2013, 01:25:47 AM
i updated the code in the original post   :t
Title: Re: Large 64-bit MUL
Post by: Jibz on April 23, 2013, 08:40:09 AM
Quote from: ToutEnMasm on April 22, 2013, 06:11:02 PM

It seems Intel has already done it for you here (VC++ Express):
C:\Program Files\Microsoft Visual Studio 10.0\VC\crt\src\intel\llmul.asm

As an interesting aside, the implementations of the 64-bit arithmetic functions for 32-bit code in Visual C++ are not optimal (at least on today's hardware). I wrote a post about it a couple of years ago, if anyone is interested:

http://www.hardtoc.com/archives/154
Title: Re: Large 64-bit MUL
Post by: dedndave on April 23, 2013, 08:43:47 AM
no matter - i looked at the code for that function
it appears to be for 64x64 mul

i didn't bother timing my routine
it could be done faster with SSE code, i am sure
but, one of my goals in this case is not to use MMX/FPU/XMM registers
Title: Re: Large 64-bit MUL
Post by: jj2007 on April 23, 2013, 05:06:45 PM
Quote from: dedndave on April 23, 2013, 08:43:47 AM
but, one of my goals in this case is not to use MMX/FPU/XMM registers

What's wrong with the FPU? Just curious ;-)

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
loop overhead is approx. 238/100 cycles

3994    cycles for 100 * div
1971    cycles for 100 * fdiv

3984    cycles for 100 * div
1902    cycles for 100 * fdiv

3981    cycles for 100 * div
1912    cycles for 100 * fdiv

16      bytes for div
16      bytes for fdiv

TestA proc
  mov ebx, AlgoLoops-1   ; loop e.g. 100x
  mov ecx, 123
  push -1
  push -123
  align 4
  .Repeat
   mov eax, dword ptr [esp]
   mov edx, dword ptr [esp+4]
   idiv ecx
   dec ebx
  .Until Sign?
  pop eax
  pop edx
  ret
TestA endp

align 16
TestB proc
  mov ebx, AlgoLoops-1   ; loop e.g. 100x
  mov ecx, 123
  push -1
  push -123
  align 4
  .Repeat
   fild qword ptr [esp]
   fdiv FP8(123.0)
   dec ebx
  .Until Sign?
  pop eax
  pop edx
  ret
TestB endp
Title: Re: Large 64-bit MUL
Post by: dedndave on April 23, 2013, 07:05:18 PM
nothing wrong with using FPU or SSE, at all
i am doing a float-to-ascii type routine
my "theory" is, if conversion routines don't use those registers,
you can convert and display values in the middle of your FPU/SSE code, without destroying the contents or state

i don't understand why you are timing the DIV operation
my desired routine performs a MUL operation

if you want to time my function.....
the longest integer in my application is 1194 dwords long
so, the worst case would be that many dwords of FFFFFFFFh, multiplied by FFFFFFFF_FFFFFFFFh
be sure to leave extra room in the buffer (at least 2 more dwords)
see the code in the first post
i am guessing something like 35,000 clock cycles   :P

EDIT: i was close   :P
i get about 41,000 cycles on my old P4
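
The worst case described above has a closed form, which makes the result easy to check: (2^(32n) - 1) * (2^64 - 1) = 2^(32n+64) - 2^(32n) - 2^64 + 1. As dwords, low to high, that is 00000001h, 00000000h, then n-2 dwords of FFFFFFFFh, then FFFFFFFEh and FFFFFFFFh. A hedged C sketch of a test helper follows; the names fill_worst_case and is_worst_case_product are hypothetical, not part of the routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf[0..n-1] with FFFFFFFFh and clear the two spare dwords. */
static void fill_worst_case(uint32_t *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        buf[i] = 0xFFFFFFFFu;
    buf[n]     = 0u;
    buf[n + 1] = 0u;
}

/* Return 1 if buf[0..n+1] holds (2^(32n) - 1) * (2^64 - 1), else 0. */
static int is_worst_case_product(const uint32_t *buf, size_t n)
{
    size_t i;

    assert(n >= 2);  /* the dword pattern below is different for n == 1 */

    if (buf[0] != 1u || buf[1] != 0u)
        return 0;
    for (i = 2; i < n; i++)
        if (buf[i] != 0xFFFFFFFFu)
            return 0;
    return buf[n] == 0xFFFFFFFEu && buf[n + 1] == 0xFFFFFFFFu;
}
```

For the 1194-dword case, the buffer would be 1196 dwords: fill it, call the multiply with FFFFFFFF_FFFFFFFFh, then pass the buffer to the checker.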
Title: Re: Large 64-bit MUL
Post by: dedndave on April 23, 2013, 07:59:50 PM
oh, and that's only part of the function - lol

to generate the exponential probably takes 10,000 cycles
then, multiply it - 41,000 cycles
then, convert to ASCII (11,514 digits) - probably another 40,000 cycles

so, to evaluate the REAL10 value 0001_FFFFFFFF_FFFFFFFF to full precision, about 100,000 cycles
i am sure that's faster than my old version
it shifted the ASCII string right to divide by 2^N
evaluating the same real with that method took about 1 to 2 seconds   :biggrin:
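
The sizes quoted above can be reproduced with a little arithmetic. This rests on an assumption that the post does not state: that the "exponential" is 5^16445. A REAL10 with biased exponent 0001 and a 64-bit significand m has the value m * 2^-16445, which equals m * 5^16445 / 10^16445. The C sketch below estimates the sizes from logarithm constants; the function names are hypothetical, and the floating-point estimates could be off by one when a true value lands very close to an integer.

```c
#include <assert.h>
#include <stdint.h>

#define LOG2_5   2.321928094887362   /* log2(5)  */
#define LOG10_5  0.698970004336019   /* log10(5) */
#define LOG10_2  0.301029995663981   /* log10(2) */

/* Estimated number of bits in 5^k, for k >= 1. */
static unsigned bits_of_pow5(unsigned k)
{
    assert(k > 0);
    return (unsigned)(k * LOG2_5) + 1u;
}

/* Estimated number of dwords needed to hold 5^k. */
static unsigned dwords_of_pow5(unsigned k)
{
    return (bits_of_pow5(k) + 31u) / 32u;
}

/* Estimated decimal digits in 5^k * (2^64 - 1); 2^64 - 1 is treated as
   2^64, which changes the logarithm by a negligible amount. */
static unsigned digits_of_product(unsigned k)
{
    return (unsigned)(k * LOG10_5 + 64 * LOG10_2) + 1u;
}
```

With k = 16445 this gives 1194 dwords for the power of 5 and 11,514 digits for the product, matching the figures in the posts.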
Title: Re: Large 64-bit MUL
Post by: Gunther on April 23, 2013, 08:22:58 PM
Hi Jibz,

Quote from: Jibz on April 23, 2013, 08:40:09 AM
As an interesting aside, the implementations of the 64-bit arithmetic functions for 32-bit code in Visual C++ are not optimal (at least on today's hardware). I wrote a post about it a couple of years ago, if anyone is interested:

http://www.hardtoc.com/archives/154

very interesting post. I'll check next weekend what gcc has to say.

Gunther
Title: Re: Large 64-bit MUL
Post by: jj2007 on April 23, 2013, 10:33:51 PM
Quote from: dedndave on April 23, 2013, 07:05:18 PM
i don't understand why you are timing the DIV operation
my desired routine performs a MUL operation

Sorry, I should read your posts more carefully :redface:
Title: Re: Large 64-bit MUL
Post by: dedndave on April 24, 2013, 12:39:46 AM
no biggy, Jochen
interesting to see the FDIV stuff   :P