Author Topic: ascii adder  (Read 19540 times)

dedndave

  • Member
  • *****
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #45 on: September 27, 2015, 03:07:05 AM »
i highly recommend Ray's tutorial (and his libraries, too)

http://www.ray.masmcode.com/

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #46 on: September 27, 2015, 03:08:17 AM »
i highly recommend Ray's tutorial (and his libraries, too)

http://www.ray.masmcode.com/

I just posted above yours :P
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

rrr314159

  • Member
  • *****
  • Posts: 1382
Re: ascii adder
« Reply #47 on: September 27, 2015, 04:00:04 AM »
Quote from: dedndave
if you just want to add 2 strings together,  working directly on ascii may be ok

- I bet just adding 2 numbers (or even doing a few additions) can be done faster working directly with ascii, than any other way. You have to remember that conversion from ascii decimal to binary integer - and back again - takes time. Not so sure about subtraction. For mult and div, or many adds/subtracts in a row, undoubtedly better to convert to binary first.

Quote from: dedndave
if you want fast, learn to use the FPU and SSE floats

- for exact precision of very large numbers (like 10^256, or 10^2048, as used in RSA algo) it's not so clear that floating point is the right way? 80 bits only gets you to 10^24 so you still need to chain them together - tricky to get exact precision? Don't forget 64-bit integers (with 64-bit code) are very easy to use. I suppose somebody knows what's the best approach; my guess, 64-bit integer is better than FPU (or SSE) - for exact precision. AVX, not to mention AVX512, is another story

Quote from: zedd151
Was a fun little project, wasn't it? ...
Was a fun project though.  8)

- well, sure! ... also, as I say, if you only want one or a few additions ascii approach is probably the best
« Last Edit: September 27, 2015, 07:13:43 AM by rrr314159 »
I am NaN ;)

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #48 on: September 27, 2015, 04:06:50 AM »
... also, as I say, if you only want one or a few additions ascii approach is probably the best

 You just gave me an idea  :idea:

Suppose we have a variable number of additions to do. I am going to *try* to run a simple
test using 3, 4, 5 inputs, to determine how much longer each extra calculation takes.

During those tests, for simplicity all the input strings will have the same length.
Then again, they don't necessarily have to be the same length....

thinking....................
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #49 on: September 27, 2015, 09:08:26 AM »
Tried a minor rewrite, attempting to free up ebp from usage.

Code: [Select]
    OPTION PROLOGUE:NONE
    OPTION EPILOGUE:NONE
    align 16
    nops 12
    asciiadderNA proc src1:dword, src2:dword, dst:dword
        push ebx
        push esi
        push edi
        mov esi, [esp+10h]
        mov ebx, [esp+14h]
        invoke StrLen, esi
        push eax
        invoke StrLen, ebx
        mov edx, eax
        pop ecx
        mov edi, [esp+18h]
        add edi, ecx
        add edi, 1
    @@:
        xor eax, eax
        mov al, byte ptr [esi+ecx]
        add al, byte ptr [ebx+edx]
        AAA
        add al, 30h
        add byte ptr [edi], al
        mov byte ptr [edi-1], ah
        dec edi
        dec edx
        dec ecx
        cmp ecx, -1
        jnz @b
        dec edi
        add word ptr [edi], 3030h
        pop edi
        pop esi
        pop ebx
        ret 12
    asciiadderNA endp
    OPTION PROLOGUE:PrologueDef
    OPTION EPILOGUE:EpilogueDef

Results on m,y machine were about ~20 cycles slower. I even used the unrolled StrLen proc,

and took a shorcut (necessitating the input strings be the same length)

It was a half-assed try any way.

I have another idea to do this two bytes at a time, no shr or ror needed....
lemme try it out.
I still want to exclude ebp from casual usage.....
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #50 on: September 27, 2015, 09:30:56 AM »
 :biggrin:
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

rrr314159

  • Member
  • *****
  • Posts: 1382
Re: ascii adder
« Reply #51 on: September 27, 2015, 09:45:23 AM »
I managed to squeeze another 17 cycles out of the fastest (so far) algo. Now ebp is left alone.

 - I thought you were moving on to subtraction! Maybe I'll revisit the addition algo, see if I can squeeze out a few cycles.

- BTW, do u (and / or anyone else) agree that (as I think) for a few additions, this ascii-based way ought to be the fastest? If so it makes the exercise worthwhile; if not, not. For mult, div, any serious manipulations, clearly it's best to convert to binary and do all arithmetic there, then convert back. But there could be apps where just additions are desired. So do u (and / or anyone else) agree that "ascii arithmetic" is probably the fastest way to add two (or a few) large decimal ascii numbers?
I am NaN ;)

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #52 on: September 27, 2015, 10:00:17 AM »
Ummmm...the project is on hold. I found a bug where if one string is much shorter, a sort of buffer under run, if that makes any sense.

Now back to the drawing board.

I know it was working 100% at one point
Code: [Select]
1

plus

99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999999999999999999999999999999999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999

equals

10000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000

all good now. Unfortunately that is basically the old algo.

Somewhere in the optimized main loop of the 'fast' algo, there is some sort of bug....
.................................................
..................................... now to go bughunting.

That'll teach me to make a bunch of changes without testing the results.  ::)

The old algo is the only one that works with inputs with uneven lengths.
I have tried to coerce the new algo to work properly, but have failed.

The new algo only works if the input string lengths are equal. :(


My software does not have bugs, it just has features you need to understand...
....to use it properly.  :biggrin:
« Last Edit: September 27, 2015, 12:02:58 PM by zedd151 »
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #53 on: September 27, 2015, 01:22:18 PM »
Okay a fresh new look at this little program.

I removed ebp as a general purpose register. Also using StrLen, to obtain of course the length of the strings.

Code: [Select]

    include \masm32\include\masm32rt.inc

    asciiadder PROTO :DWORD, :DWORD, :DWORD

    .data
         ss1 db "1", 0
         ss2 db "99999999999999999999999999999999999", 0
         dd1 db 40h dup (0)
    .code

    start:
        invoke asciiadder, addr ss2, addr ss1, addr dd1
        invoke MessageBoxA, 0, addr dd1, 0, 0
        invoke ExitProcess, 0

    OPTION PROLOGUE:NONE
    OPTION EPILOGUE:NONE
    asciiadder proc src1:dword, src2:dword, dst:dword
        push ebx
        push esi
        push edi
        mov esi, [esp+10h]
        mov ebx, [esp+14h]
        mov edi, [esp+18h]

        invoke StrLen, esi
        dec eax
        push eax

        invoke StrLen, ebx
        dec eax
        mov edx, eax
        pop ecx

        cmp ecx, edx
        jl edxgr
        add edi, ecx
        jmp setdst
    edxgr:
        add edi, edx
    setdst:
        inc edi
        ; ---------------------- main loop
    calc:
        mov al, 0
        cmp ecx, -1
        js @f
        mov al, byte ptr [esi+ecx]
    @@:
        cmp edx, -1
        js @f
        add al, byte ptr [ebx+edx]
    @@:
        sub al, 30h
        cmp al, 0Ah
        jl @f
        sub al, 30h
    @@:
        cmp al, 0Ah
        jl @f
        sub al, 0Ah
        inc byte ptr [edi-1]
    @@:
        add al, 30h
        add al, byte ptr [edi]
        cmp al, 39h
        jng @f
        sub al, 0Ah
        inc byte ptr [edi-1]
    @@:
        mov byte ptr [edi], al
        dec edi
        dec edx
        dec ecx
        cmp edi, [esp+18h]
        ja calc
        ; ---------------------- end main loop
        cmp byte ptr [edi], 30h
        jg @f
        add byte ptr [edi], 30h
    @@:
        pop edi
        pop esi
        pop ebx
        ret 12
    asciiadder endp
    OPTION PROLOGUE:PrologueDef
    OPTION EPILOGUE:EpilogueDef

    end start

Now I will once again concentrate on optimizing the main loop. Or at least try to minimize the number of jumps in this thing.
« Last Edit: July 20, 2018, 04:53:23 AM by zedd151 »
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #54 on: September 27, 2015, 02:24:52 PM »
New Test Piece.

Easy to use fixture for testing the modifications I will be making while 'optimising' the AsciiAdder.
 ;)
« Last Edit: July 20, 2018, 04:52:47 AM by zedd151 »
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

rrr314159

  • Member
  • *****
  • Posts: 1382
Re: ascii adder
« Reply #55 on: September 27, 2015, 03:04:58 PM »
We both tried the approach of adding dwords at a time, and subtracting 30303030h 4 bytes at a time, etc. Makes sense but doesn't gain enough for the trouble; byte-by-byte is still faster. Obvious is to use SSE: 16 bytes at a time will probably be fastest way. If I get around to it ...
I am NaN ;)

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #56 on: September 27, 2015, 03:18:54 PM »
I have reduced the register usage:

Code: [Select]
    OPTION PROLOGUE:NONE
    OPTION EPILOGUE:NONE
    asciiadder proc src1:dword, src2:dword, dst:dword
        push edi
        mov edi, [esp+10h]

        invoke StrLen, [esp+8]
        push eax

        invoke StrLen, [esp+10h]
        mov edx, eax
        pop ecx

        cmp ecx, edx
        jl edxgr
        add edi, ecx
        jmp setdst
    edxgr:
        add edi, edx
    setdst:
        add ecx, [esp+8]
        add edx, [esp+0Ch]
        dec ecx
        dec edx
        ; ---------------------- main loop
       
    calc:
        xor eax, eax
        cmp ecx, [esp+8]
        jl @f
        add al, byte ptr [ecx]
    @@:
        cmp edx, [esp+0Ch]
        jl @f
        add al, byte ptr [edx]
    @@:
        sub al, 30h
        cmp al, 0Ah
        jl @f
        sub al, 30h
    @@:
        cmp al, 0Ah
        jl @f
        sub al, 0Ah
        inc byte ptr [edi-1]
    @@:
        add al, 30h
        add al, byte ptr [edi]
        cmp al, 39h
        jng @f
        sub al, 0Ah
        inc byte ptr [edi-1]
    @@:
        mov byte ptr [edi], al
        dec ecx
        dec edx
        dec edi
        cmp edi, [esp+10h]
        jg calc
        inc edi
        ; ---------------------- end main loop
       
        cmp byte ptr [edi-1], 30h
        jg @f
        add byte ptr [edi-1], 30h
    @@:
        pop edi
        ret 12
    asciiadder endp
    OPTION PROLOGUE:PrologueDef
    OPTION EPILOGUE:EpilogueDef

Now I think I can put AAA back in. using a different mechnism to detect the start of the uneven string. I had previously relied on checking the sign flag for the buffers' counter register.

But since the buffers no longer have counters associated, I am simply directly comparing with the src pointer for each buffer. It took a little while to get to this point.

Now I will go through the process again of trying to get it up to speed. (again)
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #57 on: September 27, 2015, 03:53:23 PM »
Okay!! I have finally managed to both keep register usage to minimum, and also reinstate using the AAA instruction.  :icon_exclaim:

It took longer than I thought it should

Code: [Select]
    OPTION PROLOGUE:NONE
    OPTION EPILOGUE:NONE
    asciiadder proc src1:dword, src2:dword, dst:dword
        push edi
        mov edi, [esp+10h]
            invoke StrLen, [esp+8]
        push eax
            invoke StrLen, [esp+10h]
        mov edx, eax
        pop ecx
        cmp ecx, edx
        jl edxgr
        add edi, ecx
        jmp setdst
    edxgr:
        add edi, edx
    setdst:
        add ecx, [esp+8]
        add edx, [esp+0Ch]
        dec ecx
        dec edx
    calc:
        xor eax, eax
        mov al, byte ptr [edi]
        cmp ecx, [esp+8]
      jl @f
        add al, byte ptr [ecx]
      @@:
        cmp edx, [esp+0Ch]
      jl @f
        add al, byte ptr [edx]
      @@:
        cmp al, 39h
      jle @f
        AAA                         ; AAA per suggestion by dedndave
        add al, 30h
      @@:
        mov byte ptr [edi-1], ah    ; ah contains the carry byte
        mov byte ptr [edi], al      ; al contains the sum
        dec ecx
        dec edx
        dec edi
        cmp edi, [esp+10h]
        jg calc
        inc edi
        cmp byte ptr [edi-1], 30h
      jg @f
        add byte ptr [edi-1], 30h
      @@:
        pop edi
        ret 12
    asciiadder endp
    OPTION PROLOGUE:PrologueDef
    OPTION EPILOGUE:EpilogueDef

Still don't like all the cmp's and jxx's in there...
« Last Edit: July 20, 2018, 04:51:31 AM by zedd151 »
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

zedd151

  • Member
  • ****
  • Posts: 871
Re: ascii adder
« Reply #58 on: September 27, 2015, 04:05:47 PM »
Okay, here are the numbers so far..

Code: [Select]
newstyle --- * with the lower register usage. and AAA
3399
3344
3345
3363
3350

original old style - from the very first version, for comarison
5498
5543
5510
5513
5555

« Last Edit: July 20, 2018, 04:50:59 AM by zedd151 »
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

dedndave

  • Member
  • *****
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: ascii adder
« Reply #59 on: September 27, 2015, 07:33:32 PM »
P4 Prescott w/htt @3GHz, XP MCE2005 SP3, 4Gb RAM
Code: [Select]
newstyle --- *
8492
8504
8505
8504
8745
original old style
7783
8219
7853
7873
7852