Author Topic: ascii adder  (Read 30947 times)

zedd151

  • Member
  • *****
  • Posts: 1942
Re: ascii adder
« Reply #15 on: September 26, 2015, 04:07:44 PM »
I went back and took another look at using AAA. It replaced a whole series of discrete instructions. Here are the results, using the same test methods as in the first (except only 5 reps):

Code: [Select]
original
5470
5498
5499
5496
5501

with AAA
2659
2557
2558
2556
2556

And this is the revised algo:

Code: [Select]
bugs

I found out that I was using a mov before, where I should have had an add.   ::)

Now lemme work on the code before the loop....
« Last Edit: September 27, 2015, 11:42:48 AM by zedd151 »
Regards, zedd.
:tongue:

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #16 on: September 26, 2015, 04:11:29 PM »
that's a little better   :t

here's a table
each byte is meant to be SHR'ed
bit 0 becomes the carry flag for the next pass
the remaining bits are the ASCII byte total

Code: [Select]
AddAscTable db 60h,62h,64h,66h,68h,6Ah,6Ch,6Eh,70h,72h
            db 61h,63h,65h,67h,69h,6Bh,6Dh,6Fh,71h,73h
« Last Edit: September 27, 2015, 01:10:02 AM by dedndave »

rrr314159

  • Member
  • *****
  • Posts: 1378
Re: ascii adder
« Reply #17 on: September 26, 2015, 04:55:22 PM »
FWIW,

Here's what I came up with. According to my (unstable) timings, takes about 2000 cycles for the inputs you used for timing (111... and 222...). Takes about 400 more for the other test inputs (where the answer is 0101010101010101010101736973578888
297386792883747746736446685377357724486347246553692855982856
984682746982760763986982798077775597748957736987058639774692
870248639867730348707686676977588882886675865579578755697664
858857) since that pair involves carries. On AMD takes considerably longer.

Here's the main routine:

Code: [Select]
; ascii adder by rrr3134159 9/25/2015

include support.inc
Timer_Data

ASCII_LENGTH = 256                      ; must be divisible by 4
ASCII_LENGTH_DWORDS = ASCII_LENGTH / 4

copy_from_end MACRO dest, src, cnt
    lea esi, src
    lea edi, dest
    mov ecx, cnt
    inc ecx
    std
    rep movsb
    cld
ENDM

.data

; didn't bother to input these from console ...

a1  db "72687256878728728578278273764572634567527634762347624623645268275497274697367264"
    db "59726597629769726979767654976479567269726957629764592769247629767629347697676576"
    db "876587872876575764578568745687564758756", 0

;a1 db 256 dup (31h), 0          ; these data take about 2000 cycles, 400 less, since no carry

    a1_end = $ - 1
    a1_length = a1_end - a1

a2  db "10101010101010101010101010101010101010101010101010101010101010101010101010101010"
    db "10101010101100110101010101010010100101001010101001010010100100101010010100101001"
    db "010100101001010010100101001010010100101001010010010100100101", 0

;a2 db 256 dup (32h), 0

    a2_end = $ - 1
    a2_length = a2_end - a2

    num1 db ASCII_LENGTH dup(30h)
    num1_end = $
    db 0
    num2 db ASCII_LENGTH dup(30h)
    num2_end = $
    db 0
    ans db ASCII_LENGTH dup(30h)
    ans_end = $
    db 0
    ans_length dd 0
.code

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
aa:
    pp "****************************************************************\n"
    pp "num1 %s\n\n", offset a1
    pp "num2 %s\n\n", offset a2

    InitCycles                          ; start timer

    copy_from_end num1_end, a1_end, a1_length
    copy_from_end num2_end, a2_end, a2_length

;    get_time inputting took             ; uncomment to separate input timing

; add per dword, and propagate the carry thru the dword
    mov ecx, ASCII_LENGTH_DWORDS
    dec ecx
    mov dl, 0
    add_loop:
        mov eax, DWORD PTR num1[ecx*4]
        add eax, DWORD PTR num2[ecx*4]
        sub eax, 30303030h
        add al, dl
        cmp al, 3ah
        jl @f
            sub al, 0ah
            add ah, 1
        @@:
        mov dl, 0
        cmp ah, 3ah
        jl @F
            sub ah, 0ah
            mov dl, 1 
        @@:
        bswap eax
        add ah, dl
        cmp ah, 3ah
        jl @f
            sub ah, 0ah
            add al, 1
        @@:
        mov dl, 0
        cmp al, 3ah
        jl @F
            sub al, 0ah
            mov dl, 1 
        @@:
        bswap eax
        mov DWORD PTR ans[ecx*4], eax
        dec ecx
        jge add_loop
; if dl = 1 the addition overflowed

; find first non-zero digit to determine length of answer
    xor ecx, ecx
    @@:
        cmp BYTE PTR ans[ecx], 30h
        jne @F
        inc ecx
        jmp @B
    @@:
    sub ecx, ASCII_LENGTH
    neg ecx
    mov ans_length, ecx

; get time, print answer
    get_time adding took
    lea esi, ans_end
    sub esi, ans_length
    pp "\nans  %s \n\n", esi

ret
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end aa
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

I really have no idea if this is worth anything, if anyone cares I can clean it up in various ways.
I am NaN ;)

zedd151

  • Member
  • *****
  • Posts: 1942
Re: ascii adder
« Reply #18 on: September 26, 2015, 05:07:37 PM »
I really have no idea if this is worth anything, if anyone cares I can clean it up in various ways.

No, the code looks fine. It's doing still around 5000 cycles, I think bswap is killing it.

is the main loop aligned, I haven't checked?

later:

with the main loop aligned, no change
Code: [Select]

[code]
adding took 5016 cycles



Anyway, I was trying earlier to do something very similar 4 bytes at a time.

Could you set up my original algo - the faster one - in your testbed for better comparison?

I would try to put yours into mine but it might alter certain facets of your algo.

Mine is a little more forgiving I think. You can put a call to it from a simple proc as in my example.

later:

I tried putting my algo in your testbed for better comparison, but this is the result I got,
which I question.

Code: [Select]
adding took 7044 cycles - yours

adding took 6288 cycles - mine

So, I'm pretty sure I screwed it up. lol.
Regards, zedd.
:tongue:

zedd151

  • Member
  • *****
  • Posts: 1942
Re: ascii adder
« Reply #19 on: September 26, 2015, 07:05:01 PM »
used a rolled up version of StrLen as a macro here, in place of the clumsy string length loops.

Code: [Select]
w StrLen macro and AAA
2055
2003
2001
2000
2002

with AAA
2643
2565
2567
2565
2565

original
5492
5549
5533
5549
5509

Press any key to continue ...


I found a bug in the original design.

Not in the algo itself, but with the stack balancing.  ::)
So I removed all the faulty code.....
« Last Edit: September 27, 2015, 11:51:09 AM by zedd151 »
Regards, zedd.
:tongue:

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #20 on: September 26, 2015, 10:47:34 PM »
the one i am writing uses the masm32 StrLen function, because i know it's reasonably fast   :P

rrr314159

  • Member
  • *****
  • Posts: 1378
Re: ascii adder
« Reply #21 on: September 26, 2015, 11:45:28 PM »
Quote from: zedd151
No, the code looks fine. It's doing still around 5000 cycles, I think bswap is killing it.

- dunno, on my i5 it's usually about 2400 - or close to 2000, with the "111..." and "222..." test cases; but on AMD more like twice that. Do u have an AMD? I think bswap is slow on them ...
I am NaN ;)

TWell

  • Member
  • ****
  • Posts: 743
Re: ascii adder
« Reply #22 on: September 26, 2015, 11:58:51 PM »
AMD E450 1.65 GHz
Code: [Select]
w strlen
3720
3708
3719
3825
3719
with AAA
4878
4874
4907
4873
4875
original
5926
5992
6011
6109
6145

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #23 on: September 27, 2015, 12:16:48 AM »
i tested mine with 256 dup(31h) and 256 dup(32h)
the counts are over 5000 cycles, so no use finishing that routine - lol

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #24 on: September 27, 2015, 12:44:05 AM »
here's the code, if you're interested

look-up table methods usually compare rather well
in this case, though, it's a pipeline pile-up in the loop   :lol:

Code: [Select]
        ALIGN   16

AddAscii PROC USES EBX ESI EDI lpszVal1:LPSTR,lpszVal2:LPSTR,lpszResult:LPSTR

;--------------------------------

        .DATA

AddAscTable db 60h,62h,64h,66h,68h,6Ah,6Ch,6Eh,70h,72h
            db 61h,63h,65h,67h,69h,6Bh,6Dh,6Fh,71h,73h

        ALIGN   4

;--------------------------------

        .CODE

    mov     edi,lpszVal1
    mov     esi,lpszVal2
    INVOKE  StrLen,edi
    xchg    eax,ebx
    INVOKE  StrLen,esi
    mov     edx,lpszResult                  ;EDX = base address of result buffer
    .if eax<ebx
        xchg    eax,ebx                     ;EDI,EBX = address,length of shorter input
        xchg    esi,edi                     ;ESI,EAX = address,length of longer input
    .endif
    add     edx,eax                         ;EDX = address of last result byte
    lea     esi,[esi+eax-1]                 ;ESI = address of last input byte (longer)
    xor     ecx,ecx                         ;ECX = 0
    sub     eax,ebx                         ;EAX = length difference (ripple carry count) (clears CF)
    mov byte ptr [edx+1],cl                 ;null terminate result
    mov     eax,ecx                         ;EAX = 0
    lea     edi,[edi+ebx-1]                 ;EDI = address of last input byte (shorter)
    .repeat
        mov     cl,[esi]
        dec     esi
        adc     cl,[edi]
        dec     edi
        mov     al,AddAscTable[ecx-60h]
        shr     eax,1
        dec     ebx
        mov     [edx],al
        lea     edx,[edx-1]
    .until ZERO?

;more code here to ripple carry

    ret

AddAscii ENDP

zedd151

  • Member
  • *****
  • Posts: 1942
Re: ascii adder
« Reply #25 on: September 27, 2015, 12:59:10 AM »
the one i am writing uses the masm32 StrLen function, because i know it's reasonably fast   :P

Copycat. I had the same idea, but I macro-ized it. It did improve things.

Quote
look-up table methods usually compare rather well
I awas working on one too.

I'll check your out though. :t
Regards, zedd.
:tongue:

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #26 on: September 27, 2015, 01:09:16 AM »
oops - one of the table values is wrong
won't speed things up, though
edited the two tables above

zedd151

  • Member
  • *****
  • Posts: 1942
Re: ascii adder
« Reply #27 on: September 27, 2015, 01:11:03 AM »
 
Code: [Select]
zedds w strlen macro
2043
2005
2011
2009
2016

with AAA
2687
2706
2668
2579
2570

dave AddAscii
5102
4885
4946
4886
4871

I dunno, dave.

run my testebd below, on your machine to see what he says.

I fixed the table in the download, dave.
« Last Edit: September 27, 2015, 11:47:01 AM by zedd151 »
Regards, zedd.
:tongue:

zedd151

  • Member
  • *****
  • Posts: 1942
Re: ascii adder
« Reply #28 on: September 27, 2015, 01:27:34 AM »
Here, I swapped the calls to StrLen with the macro calls

Code: [Select]
dave AddAscii - using my StrLen macro

4954
4883
4875
4892
4900


Well, the macros use a rolled (not unrolled StrLen)

and my fastest algo using a call to StrLen, instead of the macros...

Code: [Select]
dave AddAscii
2032
1979
1985
1990
1988


I think I will stick with my fast version, and the StrLen macro.  8)

Your code is fine dave, find the bottleneck.
Regards, zedd.
:tongue:

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #29 on: September 27, 2015, 01:38:20 AM »
oh - well, if it runs well on your machine, i will finish the code
i am using an older P4 model CPU