Simplifying this Short MASM

alfrqan · March 29, 2013, 07:49:58 PM

heyo
I am trying to simplify this code to make it faster. It takes 700 ms for every addition.
I want to make it shorter,
so I need to simplify my code, if you can help me.
      _asm {
         mov edx, summand
         mov eax, [edx]
         mov ebx, this
         add eax, [ebx]
         mov [ebx], eax

         mov ecx, 4
         mov eax, [edx + ecx]
         adc eax, [ebx + ecx]
         mov [ebx + ecx], eax

         mov ecx, 8
         mov eax, [edx + ecx]
         adc eax, [ebx + ecx]
         mov [ebx + ecx], eax

         mov ecx, 12
         mov eax, [edx + ecx]
         adc eax, [ebx + ecx]
         mov [ebx + ecx], eax
      }
I need to write this code in a shorter and simpler way, if possible.

jj2007 · March 29, 2013, 08:47:00 PM

Hi,
Your code definitely does not take 700ms for an addition. It takes a few cycles, i.e. nanoseconds. If it is slow, then the reason is elsewhere. Putting it in a loop to make it shorter, like this:

xor ecx, ecx
clc
.Repeat
lea ecx, [ecx+4]
mov eax, [edx + ecx]
adc eax, [ebx + ecx]
mov [ebx + ecx], eax
.Until ecx==3*4

looks tempting, but the ecx== comparison destroys your carry flag.

EDIT: Ossa posted a nice solution here, taking account of the fact that inc does not change the carry flag.

Perhaps you should post some more details, so that we understand better what's going wrong in your code.

And, by the way: Welcome to the Forum :icon14:

alfrqan · March 29, 2013, 09:10:12 PM

Thanks, mate, for your great answer.
What is the xor for in the code you posted?
Also, you did not use my summand variable with this.
Are you sure this works with every MASM assembler?
Thanks for your helpful loop idea and for your response.

I am doing big integer addition.

jj2007 · March 29, 2013, 09:28:45 PM

- xor ecx, ecx is a shorter way of achieving mov ecx, 0
- yes, I used your summand variable
- no, the loop does not solve your problem, since the comparison trashes the carry flag.

alfrqan · March 29, 2013, 09:45:39 PM

This is inline assembler, by the way. Thanks, friend.
      _asm {
         mov edx, summand
         mov eax, [edx]
         mov ebx, this
         add eax, [ebx]
         mov [ebx], eax
         xor esi, esi
         add esi, 12
         xor ecx, ecx
      repeat:
         lea ecx, [ecx+4]
         mov eax, [edx + ecx]
         adc eax, [ebx + ecx]
         mov [ebx + ecx], eax
         cmp ecx, esi
         jne repeat
      }
This is what I am trying to convert from C++ to asm:
   template <std::uint16_t N, typename AtomicType>
   void big_int<N, AtomicType>::add_(const big_int& summand) {
      AtomicType carry = 0;
      for(std::size_t n = 0; n < N; ++n) { // invert direction for big endian
         AtomicType result = storage_[n] + summand.storage_[n] + carry;
         carry = carry ? result <= storage_[n] : result < storage_[n];
         storage_[n] = result;
      }
   }
That is how it looks now.
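For anyone who wants to test the C++ side on its own, here is a minimal self-contained sketch of the same limb-by-limb addition. The name `big_int_sketch`, the `uint32_t` default and the returned carry are assumptions made for illustration; the original `add_` returns `void` and belongs to a class that is not shown in the thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the poster's big_int: N limbs, least significant first.
template <std::size_t N, typename AtomicType = std::uint32_t>
struct big_int_sketch {
    std::array<AtomicType, N> storage_{};

    // Adds summand into *this and returns the carry out of the top limb
    // (returning the carry is an addition of this sketch).
    AtomicType add_(const big_int_sketch& summand) {
        AtomicType carry = 0;
        for (std::size_t n = 0; n < N; ++n) {
            const AtomicType a = storage_[n];
            const AtomicType result =
                static_cast<AtomicType>(a + summand.storage_[n] + carry);
            // Without a carry in, wraparound shows as result < a.
            // With a carry in, result == a also means the limb wrapped.
            carry = carry ? (result <= a) : (result < a);
            storage_[n] = result;
        }
        return carry;
    }
};
```

With four 32-bit limbs, adding 1 to a value whose two low limbs are 0xFFFFFFFF should ripple the carry into the third limb, which is the job the ADC chain does in the asm version.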

dedndave · March 29, 2013, 10:30:28 PM

putting it in a loop won't make it faster - just smaller
but, if it's slow, it may be because you use EBX without preserving it
try this...

_asm {
    mov     ecx,this
    mov     edx,summand

    mov     eax,[ecx]
    add     eax,[edx]
    mov     [ecx],eax

    mov     eax,[ecx+4]
    adc     eax,[edx+4]
    mov     [ecx+4],eax

    mov     eax,[ecx+8]
    adc     eax,[edx+8]
    mov     [ecx+8],eax

    mov     eax,[ecx+12]
    adc     eax,[edx+12]
    mov     [ecx+12],eax
}

... and welcome to the forum :t

alfrqan · March 29, 2013, 11:56:03 PM

Well, now it is slower by 1 second :P It now takes 1.7 seconds, where before it took 0.7 seconds.
So does the choice of register affect the speed?!
It is definitely the asm that is slowing things down. There must be something that can be changed to make it fast.

jj2007 · March 30, 2013, 12:07:29 AM

Quote from: alfrqan on March 29, 2013, 11:56:03 PM
well now it is slower by 1 second:P

> it takes 700 ms for every addition

The code shown above takes a handful of cycles for every addition, say: 20.
With a CPU that runs at 2 GHz, i.e. 2000000000 cycles per second, that makes
20 cycles/2000000000 cycles/s=1.0E-008 seconds = 1.0E-005 milliseconds = 10 nanoseconds. So
1. either your problem is somewhere else
2. or you are talking not about "every addition" but rather about "every ten Million additions".

#2?

alfrqan · March 30, 2013, 12:31:14 AM

Well, I am trying to add numbers of nearly 512 bits.
A high-level language such as C++ gives me nearly 300 ms, while assembly gives me nearly 700 ms with my own asm code; the last one you gave me takes 1.7 seconds.
Waiting for your answer.
Thanks

dedndave · March 30, 2013, 01:07:47 AM

Quote: Well, I am trying to add numbers of nearly 512 bits

ahah ! - lol

so - you are calling this routine more than once for each add - there's your problem
write the asm routine to handle 512 bits, rather than 128

take my code from reply #5 and extend it to add 512 bits
for that, you could actually use another register or 2 - it is worth the cost of push/pop
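The same idea, one call for the whole 512 bits, can be sketched in portable C++ as a reference to check an asm version against. The name `add512`, the constant `kLimbs` and the limb order (least significant first) are assumptions for illustration; it widens to 64 bits to get the carry instead of using the CPU's carry flag.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLimbs = 16;  // 16 x 32 bits = 512 bits

// Adds src into dst, both kLimbs long, least significant limb first.
// Returns the carry out of the top limb (0 or 1).
inline std::uint32_t add512(std::uint32_t* dst, const std::uint32_t* src) {
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < kLimbs; ++i) {
        // The 64-bit sum cannot overflow: two 32-bit values plus a carry of 0 or 1.
        const std::uint64_t sum = std::uint64_t{dst[i]} + src[i] + carry;
        dst[i] = static_cast<std::uint32_t>(sum);  // low 32 bits are the limb
        carry = sum >> 32;                         // high bits are the carry
    }
    return static_cast<std::uint32_t>(carry);
}
```

Adding 1 to a value with all sixteen limbs set to 0xFFFFFFFF should leave every limb at 0 and return a carry of 1, which makes a handy worst-case test for the carry chain.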

ADD is a fairly fast instruction
ADC is a bit slower
so, if you can ADC register to register, so much the better, i think

the fact is - what you really want is to use SSE :P

qWord · March 30, 2013, 01:53:00 AM

Quote from: dedndave on March 30, 2013, 01:07:47 AM
the fact is - what you really want is to use SSE :P

That is definitely not what he wants, because these instructions do not have carry logic.
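To illustrate the point, here is a small C++ sketch that imitates a packed 32-bit add (such as SSE2's PADDD) with plain arrays. It is an emulation under that assumption, not real SSE code, and the names `Lanes`, `add_lanes` and `lost_carries` are made up for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One 128-bit register modelled as four independent 32-bit lanes.
using Lanes = std::array<std::uint32_t, 4>;

// Lane-wise add: every lane wraps on its own, nothing moves to the next lane.
inline Lanes add_lanes(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = a[i] + b[i];
    }
    return r;
}

// The carries the lane-wise add dropped: a lane wrapped if its sum is
// smaller than one of its operands.
inline Lanes lost_carries(const Lanes& a, const Lanes& sum) {
    Lanes c{};
    for (std::size_t i = 0; i < c.size(); ++i) {
        c[i] = (sum[i] < a[i]) ? 1u : 0u;
    }
    return c;
}
```

Adding 1 to a lowest lane of 0xFFFFFFFF gives 0 in that lane and leaves the next lane untouched, so the carries have to be recovered and propagated in extra steps, which is what ADC does for free.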

dedndave · March 30, 2013, 02:19:49 AM

i thought i saw some that did
even so, i would think you could load 64 bits (high 64 bits set to 0) and add it as a 128 bit value, move to the next 64
i will leave that for those who know SSE well :P

this should be a reasonably fast 512-bit add...

_asm {
    push    ebx
    push    esi
    push    edi
    mov     edx,summand
    mov     ecx,this

    mov     eax,[edx]
    mov     edi,[edx+4]
    mov     esi,[edx+8]
    mov     ebx,[edx+12]
    add     [ecx],eax
    adc     [ecx+4],edi
    adc     [ecx+8],esi
    adc     [ecx+12],ebx

    mov     eax,[edx+16]
    mov     edi,[edx+20]
    mov     esi,[edx+24]
    mov     ebx,[edx+28]
    adc     [ecx+16],eax
    adc     [ecx+20],edi
    adc     [ecx+24],esi
    adc     [ecx+28],ebx

    mov     eax,[edx+32]
    mov     edi,[edx+36]
    mov     esi,[edx+40]
    mov     ebx,[edx+44]
    adc     [ecx+32],eax
    adc     [ecx+36],edi
    adc     [ecx+40],esi
    adc     [ecx+44],ebx

    mov     eax,[edx+48]
    mov     edi,[edx+52]
    mov     esi,[edx+56]
    mov     ebx,[edx+60]
    adc     [ecx+48],eax
    adc     [ecx+52],edi
    adc     [ecx+56],esi
    adc     [ecx+60],ebx

    pop     edi
    pop     esi
    pop     ebx
}

alfrqan · March 30, 2013, 07:33:43 AM

Well, this is very slow :P
It took nearly 2543 ms.
I really appreciate your answers :D
Still waiting for the best solution.

alfrqan · March 30, 2013, 07:35:25 AM

Quote from: dedndave on March 29, 2013, 10:30:28 PM
putting it in a loop won't make it faster - just smaller
but, if it's slow, it may be because you use EBX without preserving it
try this...

... and welcome to the forum :t

Thanks for trying to help me,
and yes, thanks for welcoming me :D
I hope I have found a good community, as I expected.
tc

alfrqan · March 30, 2013, 07:38:47 AM

Quote from: qWord on March 30, 2013, 01:53:00 AM
Quote from: dedndave on March 30, 2013, 01:07:47 AM
the fact is - what you really want is to use SSE :P
That is definitely not what he wants, because these instructions do not have carry logic.

Exactly true, the carry is the most important thing.
Thanks, mate.

The MASM Forum
