I have only just finished working on this in asm for PB so it was an easy conversion to 32 bit MASM. I don't have time at the moment to write a benchmark for it but in the exact PB version it was appending about 1 billion average sized words in about 5 seconds (error corrected here). Its done as pseudo FASTCALL and uses a macro to call it but its easy enough to manually call the algo. Timings vary depending on processor but it should be fast on most hardware.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
AppendText MACRO buff,text,eloc
mov eax, buff
mov ecx, text
mov edx, eloc
call szappend_data
EXITM <eax>
ENDM
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL ptxt1 :DWORD
LOCAL ptxt2 :DWORD
LOCAL ptxt3 :DWORD
LOCAL ptxt4 :DWORD
LOCAL ptxt5 :DWORD
LOCAL ptxt6 :DWORD
LOCAL ptxt7 :DWORD
LOCAL ptxt8 :DWORD
LOCAL pmem :DWORD
LOCAL cloc :DWORD
sas ptxt1,"one "
sas ptxt2,"two "
sas ptxt3,"three "
sas ptxt4,"four "
sas ptxt5,"five "
sas ptxt6,"six "
sas ptxt7,"seven "
sas ptxt8,"eight "
mov pmem, alloc(4096)
mov cloc, 0
mov cloc, AppendText(pmem,ptxt1,cloc)
mov cloc, AppendText(pmem,ptxt2,cloc)
mov cloc, AppendText(pmem,ptxt3,cloc)
mov cloc, AppendText(pmem,ptxt4,cloc)
mov cloc, AppendText(pmem,ptxt5,cloc)
mov cloc, AppendText(pmem,ptxt6,cloc)
mov cloc, AppendText(pmem,ptxt7,cloc)
mov cloc, AppendText(pmem,ptxt8,cloc)
print pmem,13,10
free pmem
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
szappend_data proc
; ------------------------------------------------------------
; FASTCALL
; eax = pBuffer the main buffer to append extra data to.
; ecx = pData the byte data to append to the main buffer
; edx = location current location pointer
; ------------------------------------------------------------
push esi
push edi
mov esi, eax
;;; mov ecx, ecx
add esi, edx
mov edi, edx
or eax, -1
; -----------
; unroll by 2
; -----------
lbl0:
add eax, 1
movzx edx, BYTE PTR [ecx+eax]
mov [esi+eax], dl
test edx, edx
jz lbl1 ; exit on written terminator
add eax, 1
movzx edx, BYTE PTR [ecx+eax]
mov [esi+eax], dl
test edx, edx
jnz lbl0 ; exit on written terminator
lbl1:
add eax, edi
pop edi
pop esi
ret
szappend_data endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Thanks, Hutch - I appreciate this, and it runs fine, of course.
My perspective was another one, though:
Imagine you have a string on the heap. Now you need to add another one, without knowing yet how long that string is.
Here is a snippet performing that simple task - "simple" for BASIC programmers, at least:
include \masm32\MasmBasic\MasmBasic.inc
Init
Recall "\Masm32\include\Windows.inc", L$() ; create a string array
NanoTimer()
xor ecx, ecx ; counter
Let esi="" ; start with an empty string
.Repeat
Let esi=esi+L$(ecx) ; append elements of the string array
test ecx, 1023
.if Zero?
Print "*" ; progress bar ;-)
.endif
inc ecx
.Until ecx>=L$(?)
Inkey CrLf$, NanoTimer$(), Str$(" ms to concatenate %i strings", L$(?))
EndOfCode
Project attached. Hint: For a Billion strings, it will take more than 5 seconds ::)
P.S.: A faster variant: .Repeat
Let esi=esi+L$(ecx)+L$(ecx+1)+L$(ecx+2)+L$(ecx+3)+L$(ecx+4)+L$(ecx+5) ; append elements of the string array
add ecx, 6
.Until ecx>=L$(?)
A shorter version.
include \masm32\include\masm32rt.inc
.code
_strcat proc dest : ptr, src : ptr
mov eax, dest
dec eax
@@:
inc eax
cmp byte ptr [eax], 0
jne short @B
mov ecx, src
sub ecx, eax
@@:
mov dl, byte ptr [ecx+eax]
mov byte ptr [eax], dl
inc eax
test dl, dl
jne short @B
dec eax
ret
_strcat endp
main proc
LOCAL ptxt1 :ptr
LOCAL ptxt2 :ptr
LOCAL ptxt3 :ptr
LOCAL ptxt4 :ptr
LOCAL ptxt5 :ptr
LOCAL ptxt6 :ptr
LOCAL ptxt7 :ptr
LOCAL ptxt8 :ptr
LOCAL pmem :ptr
sas ptxt1,"one "
sas ptxt2,"two "
sas ptxt3,"three "
sas ptxt4,"four "
sas ptxt5,"five "
sas ptxt6,"six "
sas ptxt7,"seven "
sas ptxt8,"eight "
mov pmem, alloc(4096)
;mov eax, pmem
invoke _strcat, eax, ptxt1
invoke _strcat, eax, ptxt2
invoke _strcat, eax, ptxt3
invoke _strcat, eax, ptxt4
invoke _strcat, eax, ptxt5
invoke _strcat, eax, ptxt6
invoke _strcat, eax, ptxt7
invoke _strcat, eax, ptxt8
print pmem,10
free pmem
ret
main endp
end main
Even shorter:
include \masm32\include\masm32rt.inc
.code
_strcat proc dest : ptr, src : ptr
mov eax, dest
mov ecx, src
sub ecx, eax
@@:
mov dl, byte ptr [ecx+eax]
mov byte ptr [eax], dl
inc eax
test dl, dl
jne short @B
dec eax
ret
_strcat endp
main proc
LOCAL ptxt1 :ptr
LOCAL ptxt2 :ptr
LOCAL ptxt3 :ptr
LOCAL ptxt4 :ptr
LOCAL ptxt5 :ptr
LOCAL ptxt6 :ptr
LOCAL ptxt7 :ptr
LOCAL ptxt8 :ptr
LOCAL pmem :ptr
sas ptxt1,"one "
sas ptxt2,"two "
sas ptxt3,"three "
sas ptxt4,"four "
sas ptxt5,"five "
sas ptxt6,"six "
sas ptxt7,"seven "
sas ptxt8,"eight "
mov pmem, alloc(4096)
;mov eax, pmem
invoke _strcat, eax, ptxt1
invoke _strcat, eax, ptxt2
invoke _strcat, eax, ptxt3
invoke _strcat, eax, ptxt4
invoke _strcat, eax, ptxt5
invoke _strcat, eax, ptxt6
invoke _strcat, eax, ptxt7
invoke _strcat, eax, ptxt8
print pmem,10
free pmem
ret
main endp
end main
Nice work,so if I replace my print loops with append string,I go from milliseconds * number of character to few clock cycles * character + usage of one single print or one winapi call to show in text :t
Quote from: daydreamer on April 29, 2018, 07:56:21 AM
Nice work,so if I replace my print loops with append string,I go from milliseconds * number of character to few clock cycles * character + usage of one single print or one winapi call to show in text :t
Actually, Shlemiel had a O(n) time complexity problem, he could only satisfy his boss by transforming it to a O(1) time complexity problem.
strcat is also O(n). In some cases, as illustrated, but not always, it is possible to find a O(1) substitute - but these are not universal panaceas.
Shiemiel, could also drag the paint can as he advances, but what if the paint can is too heavy and what happens when he needs to replace the can?
Quote from: aw27 on April 29, 2018, 04:23:47 PMwhat happens when he needs to replace the can?
Yes indeed. That is where the analogy becomes a bit odd: HeapReAlloc is the keyword, but Shlemiel's boss will hardly tell him "repaint everything on another, fresh road today" ;-)
There are cases where you know beforehand how many bytes you need for your initial big enough buffer; but a general purpose routine a$=b$+c$ should work even if you don't know that.
Yes, memory reallocation is equivalent to getting a new can. Soon or later we will need to do it and in practice people don't make a big pool of memory for the effect, usually they realloc as needed, and don't use giant paint cans.
You can do sequential re-allocations and I have done them in the past but with modern hardware, half a gig or so is no big deal so a big single buffer works fine. When I have gone the re-alloc route, I found that as long as you increment the allocation by a big enough block, it is not all that much slower.
After a certain amount most of the reallocated memory will be swapped to disk. And in 32-bit you will have real sluggish problems accessing it when it approaches 1 GB (even with memory mapped files), although in theory you can go up to 2 GB in practice is much less.
Big allocations are not a problem if you are dealing with a few strings. But if you have an array of x,000 strings, the problem can't be solved with a fat allocation for each string. See e.g. my old example that opens Windows.inc, converts all hexadecimal equates into decimal ones, and writes it back to disk (http://masm32.com/board/index.php?topic=94.msg265#msg265). There, the process is indeed (identical btw to what Spolsky proposes)
- check required length of new element n, i.e. Left$(L$(n), pos-1)+"EQU "+Str$(esi)+Mid$(L$(n), posAfter)
- heapallocate the exact memory needed
- construct & copy the new string
- free the old string
Takes a bit longer than the algos above, but overall it's just 16ms. If you guess-allocate, say, 32k for each string, you end up with bigger initial strings, and in the end you must reallocate them anyway because you can't leave 32k*26k in memory. Besides, you need a human intervention beforehand when guessing that 32k will be enough.
It doesn't work,like I intended code to work,because. I discovered string is unicode,so I produce garbage and/or causes protection fault
:(
maybe better control use winapi textarea?
Try DeepL (https://www.deepl.com/translator), they say its results are better than G translate.
Magnus,
Just write a UNICODE version. If you have an ANSI version, use WORD sized data and increment by 2.
Quote from: hutch-- on May 05, 2018, 11:52:10 AM
Just write a UNICODE version. If you have an ANSI version, use WORD sized data and increment by 2.
I have some vc++ 2008 express , that has c++strings that dont work with asm inline simd code, i think its time to throw away strings and console , in favour of memorybuffers and a richedit textarea
I question to C programmers:should it be better use char16_t than short in include files?