News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

String append algo for JJ.

Started by hutch--, April 28, 2018, 12:06:36 PM

Previous topic - Next topic

hutch--

I have only just finished working on this in asm for PB so it was an easy conversion to 32 bit MASM. I don't have time at the moment to write a benchmark for it but in the exact PB version it was appending about 1 billion average sized words in about 5 seconds (error corrected here). Its done as pseudo FASTCALL and uses a macro to call it but its easy enough to manually call the algo. Timings vary depending on processor but it should be fast on most hardware.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    AppendText MACRO buff,text,eloc
      mov eax, buff
      mov ecx, text
      mov edx, eloc
      call szappend_data
      EXITM <eax>
    ENDM

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL ptxt1 :DWORD
    LOCAL ptxt2 :DWORD
    LOCAL ptxt3 :DWORD
    LOCAL ptxt4 :DWORD
    LOCAL ptxt5 :DWORD
    LOCAL ptxt6 :DWORD
    LOCAL ptxt7 :DWORD
    LOCAL ptxt8 :DWORD

    LOCAL pmem  :DWORD
    LOCAL cloc  :DWORD

    sas ptxt1,"one "
    sas ptxt2,"two "
    sas ptxt3,"three "
    sas ptxt4,"four "
    sas ptxt5,"five "
    sas ptxt6,"six "
    sas ptxt7,"seven "
    sas ptxt8,"eight "

    mov pmem, alloc(4096)
    mov cloc, 0

    mov cloc, AppendText(pmem,ptxt1,cloc)
    mov cloc, AppendText(pmem,ptxt2,cloc)
    mov cloc, AppendText(pmem,ptxt3,cloc)
    mov cloc, AppendText(pmem,ptxt4,cloc)
    mov cloc, AppendText(pmem,ptxt5,cloc)
    mov cloc, AppendText(pmem,ptxt6,cloc)
    mov cloc, AppendText(pmem,ptxt7,cloc)
    mov cloc, AppendText(pmem,ptxt8,cloc)

    print pmem,13,10

    free pmem

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

szappend_data proc

  ; ------------------------------------------------------------
  ; FASTCALL
  ; eax = pBuffer     the main buffer to append extra data to.
  ; ecx = pData       the byte data to append to the main buffer
  ; edx = location    current location pointer
  ; ------------------------------------------------------------

    push esi
    push edi

    mov esi, eax
    ;;; mov ecx, ecx
    add esi, edx
    mov edi, edx

    or eax, -1
  ; -----------
  ; unroll by 2
  ; -----------
  lbl0:
    add eax, 1
    movzx edx, BYTE PTR [ecx+eax]
    mov [esi+eax], dl
    test edx, edx
    jz lbl1                     ; exit on written terminator

    add eax, 1
    movzx edx, BYTE PTR [ecx+eax]
    mov [esi+eax], dl
    test edx, edx
    jnz lbl0                    ; exit on written terminator

  lbl1:
    add eax, edi

    pop edi
    pop esi
    ret

szappend_data endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

jj2007

Thanks, Hutch - I appreciate this, and it runs fine, of course.

My perspective was another one, though:

Imagine you have a string on the heap. Now you need to add another one, without knowing yet how long that string is.

Here is a snippet performing that simple task - "simple" for BASIC programmers, at least:
include \masm32\MasmBasic\MasmBasic.inc
  Init
  Recall "\Masm32\include\Windows.inc", L$() ; create a string array
  NanoTimer()
  xor ecx, ecx ; counter
  Let esi="" ; start with an empty string
  .Repeat
Let esi=esi+L$(ecx) ; append elements of the string array
test ecx, 1023
.if Zero?
Print "*" ; progress bar ;-)
.endif
inc ecx
  .Until ecx>=L$(?)
  Inkey CrLf$, NanoTimer$(), Str$(" ms to concatenate %i strings", L$(?))
EndOfCode


Project attached. Hint: For a Billion strings, it will take more than 5 seconds ::)

P.S.: A faster variant:  .Repeat
Let esi=esi+L$(ecx)+L$(ecx+1)+L$(ecx+2)+L$(ecx+3)+L$(ecx+4)+L$(ecx+5) ; append elements of the string array
add ecx, 6
  .Until ecx>=L$(?)

aw27

A shorter version.


include \masm32\include\masm32rt.inc

.code

_strcat proc dest : ptr, src : ptr
mov eax, dest
        dec eax
@@:
inc eax
cmp byte ptr [eax], 0
jne short @B

mov ecx, src
sub ecx, eax
@@:
mov dl, byte ptr [ecx+eax]
mov byte ptr [eax], dl
inc eax
test dl, dl
jne short @B
dec   eax
ret
_strcat endp

main proc
LOCAL ptxt1 :ptr
LOCAL ptxt2 :ptr
LOCAL ptxt3 :ptr
LOCAL ptxt4 :ptr
LOCAL ptxt5 :ptr
LOCAL ptxt6 :ptr
LOCAL ptxt7 :ptr
LOCAL ptxt8 :ptr

LOCAL pmem  :ptr

sas ptxt1,"one "
sas ptxt2,"two "
sas ptxt3,"three "
sas ptxt4,"four "
sas ptxt5,"five "
sas ptxt6,"six "
sas ptxt7,"seven "
sas ptxt8,"eight "

mov pmem, alloc(4096)

;mov eax, pmem

invoke _strcat, eax, ptxt1
invoke _strcat, eax, ptxt2
invoke _strcat, eax, ptxt3
invoke _strcat, eax, ptxt4
invoke _strcat, eax, ptxt5
invoke _strcat, eax, ptxt6
invoke _strcat, eax, ptxt7
invoke _strcat, eax, ptxt8

print pmem,10

free pmem

ret
main endp

end main

aw27

Even shorter:


include \masm32\include\masm32rt.inc

.code

_strcat proc dest : ptr, src : ptr
mov eax, dest

mov ecx, src
sub ecx, eax
@@:
mov dl, byte ptr [ecx+eax]
mov byte ptr [eax], dl
inc eax
test dl, dl
jne short @B
dec eax
ret
_strcat endp

main proc
LOCAL ptxt1 :ptr
LOCAL ptxt2 :ptr
LOCAL ptxt3 :ptr
LOCAL ptxt4 :ptr
LOCAL ptxt5 :ptr
LOCAL ptxt6 :ptr
LOCAL ptxt7 :ptr
LOCAL ptxt8 :ptr

LOCAL pmem  :ptr

sas ptxt1,"one "
sas ptxt2,"two "
sas ptxt3,"three "
sas ptxt4,"four "
sas ptxt5,"five "
sas ptxt6,"six "
sas ptxt7,"seven "
sas ptxt8,"eight "

mov pmem, alloc(4096)

;mov eax, pmem

invoke _strcat, eax, ptxt1
invoke _strcat, eax, ptxt2
invoke _strcat, eax, ptxt3
invoke _strcat, eax, ptxt4
invoke _strcat, eax, ptxt5
invoke _strcat, eax, ptxt6
invoke _strcat, eax, ptxt7
invoke _strcat, eax, ptxt8

print pmem,10

free pmem

ret
main endp

end main

daydreamer

Nice work,so if I replace my print loops with append string,I go from milliseconds * number of character to few clock cycles * character + usage of one single print or one winapi call to show in text  :t
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

aw27

#5
Quote from: daydreamer on April 29, 2018, 07:56:21 AM
Nice work,so if I replace my print loops with append string,I go from milliseconds * number of character to few clock cycles * character + usage of one single print or one winapi call to show in text  :t
Actually, Shlemiel had a O(n) time complexity problem, he could only satisfy his boss by transforming it to a O(1) time complexity problem.
strcat is also O(n). In some cases, as illustrated, but not always, it is possible to find a O(1) substitute - but these are not universal panaceas.

Shiemiel, could also drag the paint can as he advances, but what if the paint can is too heavy and what happens when he needs to replace the can?

jj2007

Quote from: aw27 on April 29, 2018, 04:23:47 PMwhat happens when he needs to replace the can?

Yes indeed. That is where the analogy becomes a bit odd: HeapReAlloc is the keyword, but Shlemiel's boss will hardly tell him "repaint everything on another, fresh road today" ;-)

There are cases where you know beforehand how many bytes you need for your initial big enough buffer; but a general purpose routine a$=b$+c$ should work even if you don't know that.

aw27

Yes, memory reallocation is equivalent to getting a new can. Soon or later we will need to do it and in practice people don't make a big pool of memory for the effect, usually they realloc as needed, and don't use giant paint cans.

hutch--

You can do sequential re-allocations and I have done them in the past but with modern hardware, half a gig or so is no big deal so a big single buffer works fine. When I have gone the re-alloc route, I found that as long as you increment the allocation by a big enough block, it is not all that much slower.

aw27

After a certain amount most of the reallocated memory will be swapped to disk. And in 32-bit you will have real sluggish problems accessing it when it approaches 1 GB (even with memory mapped files), although in theory you can go up to 2 GB in practice is much less.

jj2007

Big allocations are not a problem if you are dealing with a few strings. But if you have an array of x,000 strings, the problem can't be solved with a fat allocation for each string. See e.g. my old example that opens Windows.inc, converts all hexadecimal equates into decimal ones, and writes it back to disk. There, the process is indeed (identical btw to what Spolsky proposes)

- check required length of new element n, i.e. Left$(L$(n), pos-1)+"EQU "+Str$(esi)+Mid$(L$(n), posAfter)
- heapallocate the exact memory needed
- construct & copy the new string
- free the old string

Takes a bit longer than the algos above, but overall it's just 16ms. If you guess-allocate, say, 32k for each string, you end up with bigger initial strings, and in the end you must reallocate them anyway because you can't leave 32k*26k in memory. Besides, you need a human intervention beforehand when guessing that 32k will be enough.

daydreamer

#11
It doesn't work,like I intended code to work,because. I discovered string is unicode,so I produce garbage and/or causes protection fault
  :(
maybe better control use winapi textarea?
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

Try DeepL, they say its results are better than G translate.

hutch--

Magnus,

Just write a UNICODE version. If you have an ANSI version, use WORD sized data and increment by 2.

daydreamer

Quote from: hutch-- on May 05, 2018, 11:52:10 AM
Just write a UNICODE version. If you have an ANSI version, use WORD sized data and increment by 2.
I have some vc++ 2008 express , that has c++strings that dont work with asm inline simd code, i think its time to throw away strings and console , in favour of memorybuffers and a richedit textarea
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding