This algo does a one pass from left to right, determines the last valid character if there is one then terminates the string after that last character. If it is an empty string or a string full of spaces it writes the terminator at the beginning of the buffer so it is a zero length string.
The idea was to avoid alternative techniques where you had to determine the length then back scan to find the first non blank character. My guess is a one pass version, even though the instruction count per iteration is high is still faster than doing the same task in 2 passes.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
rtrim2 PROTO :DWORD
.data
item1 db " this is a test ",0
item2 db " ",0
item3 db 0
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL txt :DWORD
mov txt, rv(rtrim2,ADDR item1) ; string with characters
print txt,13,10
print ustr$(len(txt)),13,10
mov txt, rv(rtrim2,ADDR item2) ; string with spaces only
print txt,13,10
print ustr$(len(txt)),13,10
mov txt, rv(rtrim2,ADDR item3) ; empty string, must be valid address
print txt,13,10
print ustr$(len(txt)),13,10
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim2 proc ptxt:DWORD
; ------------------------------------------------
; one pass left to right to determine last valid
; character then terminate it after that character
; ------------------------------------------------
mov edx, [esp+4] ; load address into EDX
mov ecx, edx ; store that address in ECX
sub edx, 1
lpst:
add edx, 1
movzx eax, BYTE PTR [edx] ; zero extend byte into EAX
test eax, eax ; test for zero terminator
jz lpout ; exit loop on zero
cmp eax, 32 ; test if space character or lower
jbe lpst ; jump back if below or equal
mov ecx, edx ; store updated last character location in ECX
jmp lpst ; jump back to loop start
lpout:
cmp ecx, [esp+4] ; if ECX has not been modified (empty string)
je nxt ; jump to NXT label
add ecx, 1 ; add 1 to ECX to write terminator past last character
nxt:
mov BYTE PTR [ecx], 0 ; write the terminator
mov eax, [esp+4] ; put original stack address in EAX as return address
ret 4 ; BYE !
rtrim2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Timings:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
77434 cycles for 100 * rtrim2
76024 cycles for 100 * rtrim$
34988 cycles for 100 * Rtrim$
76767 cycles for 100 * trim2
76382 cycles for 100 * rtrim$
35256 cycles for 100 * Rtrim$
76691 cycles for 100 * trim2
76024 cycles for 100 * rtrim$
34993 cycles for 100 * Rtrim$
Results:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
43270 cycles for 100 * trim2
26528 cycles for 100 * rtrim$
9904 cycles for 100 * Rtrim$
42794 cycles for 100 * trim2
26361 cycles for 100 * rtrim$
9810 cycles for 100 * Rtrim$
43108 cycles for 100 * trim2
26416 cycles for 100 * rtrim$
9834 cycles for 100 * Rtrim$
8 bytes for trim2
15 bytes for rtrim$
10 bytes for Rtrim$
--- ok ---
Gunther
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
93453 cycles for 100 * trim2
91914 cycles for 100 * rtrim$
57921 cycles for 100 * Rtrim$
93506 cycles for 100 * trim2
92673 cycles for 100 * rtrim$
57474 cycles for 100 * Rtrim$
93606 cycles for 100 * trim2
92791 cycles for 100 * rtrim$
57808 cycles for 100 * Rtrim$
JJ,
Do you have a copy of your rtrim algo as I cannot build it with the normal masm32 system.
Quote from: hutch-- on June 13, 2014, 03:34:08 AM
Do you have a copy of your rtrim algo as I cannot build it with the normal masm32 system.
Not sure if that is helpful, but here it is. I guess the "speed secret" is the call to MbStrLen aka Len() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1144)
MbLMR proc uses esi edi ebx ecx edx srcLMR:DWORD, ctLMR:DWORD, stposLMR:DWORD ; 192 bytes (11/2013)
if 0
mov eax, ctLMR
bswap eax
.if al==128
shl ctLMR, 1
.endif
else
.if byte ptr ctLMR+3==128
shl ctLMR, 1
.endif
endif
mov esi, srcLMR ; if not 0, esi is a pointer to the source string
.if esi==127
push 0
call MbGetSlotPointer
mov esi, [edx]
mov eax, [edx+4]
test eax, eax
.if Sign?
neg eax
.endif
.elseif esi<127 ; PreAddLen
mov eax, offset MbSlots
.if !dword ptr [eax-4]
call MbBufferInit
.endif
lea eax, [eax+8*esi-80]
mov esi, [eax]
mov eax, [eax+4]
.else
push esi
.if al==128
call MbStrLenW
shl eax, 1
.else
call MbStrLen
.endif
.endif
mov ebx, eax ; Len(esi)
mov ecx, ctLMR ; no. of characters to be copied
if 0
test ecx, ecx
je lmrRet
else
jecxz lmrRet ; nothing to copy, nullslot - nullpointer??
endif
mov edx, stposLMR ; edx loaded with start pos: 1 for Left$, 0 for Null$, >0 for Mid$, <0 for Right$
test edx, edx
je lmrRetZero ; nothing to copy, nullslot - nullpointer??
.if sdword ptr edx<0 ; Right$("Hallo", 3) (traps also bad MID$("...", -1, n)
cmp ecx, eax ; ct>len()
jg lmrRetFull ; Greater (signed)
add esi, eax ; Len(src)
.if sdword ptr ecx<0 ; negative ct allowed, returns full string
mov ecx, eax
.endif
sub esi, ecx
inc eax
.elseif edx ; MID$ if stpos>1, otherwise Left$
cmp edx, eax ; start>len?
jg lmrRetZero
inc eax
add esi, edx ; MID$ start pos 1
dec esi
sub eax, edx
.endif
.if ecx>eax ; unsigned, ecx might be negative
mov ecx, eax ; ct > strlen: limit
.endif
test ecx, ecx
jge @F
lmrRetFull:
mov ecx, ebx ; use full len
jmp @F
lmrRetZero:
xor ecx, ecx
@@:
lmrRet:
push esi
.if srcLMR==127
sub [esp], esi ; clearing the stack is one byte shorter than .else construct
.endif
call MbGetSlotPointer ; returns pointer to string in [edx]
mov [edx], esi ; pass the address
mov [edx+4], ecx ; and the len
ret
MbLMR endp
It looks like an interesting number but it is still unbuildable.
Tough luck, Hutch - it's a bit too strongly integrated with the rest of the library. But if you are really keen on a fast algo, check the old forum for the fastest StrLen algo and use it. Or replace include \masm32\include\masm32rt.inc with ... MasmBasic.inc and use Len() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1144) ;)
rtrimUltra: ; fastest rtrim ever
mov edx, [esp+4] ; load address into EDX
add Len(edx), edx
.Repeat
dec eax
.Until byte ptr [eax]>32 || eax<=edx
mov byte ptr [eax+1], 0
xchg eax, edx
retn 4
If that is still too MasmBasic, replace Len with len, and build it with masm32rt.inc - half as fast, but still a factor three faster than the original rtrim2.
BTW inserting the zero delimiter is considered an anti-social act, especially in read-only memory. MasmBasic Rtrim$() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1166) uses a softer technique 8)
What I was looking for was a technique to benchmark and test different algos but it appears there is not one available. Sounds like you have a fast combination to perform this task but if I can't build it, I can 't test it.
Quote from: hutch-- on June 14, 2014, 11:54:05 AM
What I was looking for was a technique to benchmark and test different algos but it appears there is not one available.
Attached. It's very easy to use and builds with Masm32 only if you wish so.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
14953 cycles for 100 * rtrimUltra (Masm32 len)
73356 cycles for 100 * rtrim$
14960 cycles for 100 * rtrimUltra (Masm32 len)
72571 cycles for 100 * rtrim$QuoteSounds like you have a fast combination to perform this task but if I can't build it, I can 't test it.
Sounds a bit like "the CRT seems to have some good algos but since I don't want to include msvcrt.lib, I can't test them" ;-)
Hey jj, am I missing something?
In your original attachment you reuse somestring but hutch's rtrim2 trims it, leaving the other tests working on an already-trimmed string.
There are a number of things I need to test that don't come in a pre-canned test bed, the attack rate (many short strings), the long read with a high count of end spaces as you sometimes get in log files to test any backward scanning techniques, whether the iteration count effects the timing etc etc etc ....
So far I have the original library version and the one I first posted here to test the newer one(s) against but as you seem to have a fast version, I was interested to see how it performed on multiple complex tests.
RE: read only files, if you need to trim a string from a file or source that is read only, is there any other way than to write it to a buffer while processing it ?
Quote from: sinsi on June 14, 2014, 06:00:49 PM
Hey jj, am I missing something?
In your original attachment you reuse somestring but hutch's rtrim2 trims it, leaving the other tests working on an already-trimmed string.
Sinsi,
You are perfectly right. Ideally, one would have to re-create the string for each iteration, but how to time that??
I have inserted this...
mov byte ptr [eax+1
+40], 0
... to provide more realistic timings, i.e. rtrimUltra does not trim off the original. In round 2, the Masm32 original trims it, though.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
31666 cycles for 100 * rtrimUltra (Masm32 len)
17806 cycles for 100 * rtrimUltra (MasmBasic Len)
73692 cycles for 100 * rtrim$
14948 cycles for 100 * rtrimUltra (Masm32 len)
4899 cycles for 100 * rtrimUltra (MasmBasic Len)
73104 cycles for 100 * rtrim$Quote from: hutch-- on June 14, 2014, 06:11:48 PM
RE: read only files, if you need to trim a string from a file or source that is read only, is there any other way than to write it to a buffer while processing it ?
Valid point, of course, though I still would feel a bit uneasy doing that in a library routine. Maybe mention it in the documentation?
Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz (SSE4)
24159 cycles for 100 * rtrimUltra (Masm32 len)
17598 cycles for 100 * rtrimUltra (MasmBasic Len)
68562 cycles for 100 * rtrim$
11685 cycles for 100 * rtrimUltra (Masm32 len)
4215 cycles for 100 * rtrimUltra (MasmBasic Len)
68624 cycles for 100 * rtrim$
11795 cycles for 100 * rtrimUltra (Masm32 len)
4337 cycles for 100 * rtrimUltra (MasmBasic Len)
68742 cycles for 100 * rtrim$
36 bytes for rtrimUltra (Masm32 len)
8 bytes for rtrimUltra (MasmBasic Len)
13 bytes for rtrim$
This old quad has made a clown out of me many times with timings, after writing a second algo that has a shorter instruction path and writing a method of benchmarking the difference, the two algos stubbornly refuse to yield any difference in timing. The technique allocates very large amounts of memory as an array of strings then tests each algo. Timings on 1.5 gig of memory yield 250 ms (32 million strings of 48 bytes each), I have dropped the count to 12 million so more people could run it without allocating so much memory but it yields timings as follows,
Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
93 milliseconds rtrim1
94 milliseconds rtrim2
93 milliseconds rtrim1
94 milliseconds rtrim2
94 milliseconds rtrim1
94 milliseconds rtrim2
94 milliseconds rtrim1
94 milliseconds rtrim2
Press any key to continue ...
With just on 590 mem of memory with 12 million strings running under 100 ms, its a case of who cares.
Here is the test bed.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
rtrim1 PROTO :DWORD ; first test algo
rtrim2 PROTO :DWORD ; second test algo
.data
item1 db "this is a test of rtrim algos ",0 ; 48 bytes total
item2 db " this is a test of rtrim algos ",0 ; 48 bytes total
item3 db " this is a test of rtrim algos ",0 ; 48 bytes total
item4 db " ",0 ; 48 bytes total
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
lpcount equ <1024*1024*12>
main proc
LOCAL txt :DWORD
LOCAL hArr :DWORD
LOCAL pArr :DWORD
LOCAL aInd :DWORD
LOCAL lcnt :DWORD
LOCAL icnt :DWORD
push esi
push edi
print "Allocating "
print ustr$(lpcount*48 / 1024)," megabytes for testing",13,10
print "If it crashes due to lack of memory, reduce the allocation",13,10
print "size in the 'lpcount' equate above",13,10
invoke SetPriorityClass,rv(GetCurrentProcess),HIGH_PRIORITY_CLASS
mov icnt, 4
tlp:
; ******************************************
; write an array of strings
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr1:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr1
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
tarr1:
mov txt, rv(rtrim1,[edi])
mov txt, rv(rtrim1,[edi+4])
mov txt, rv(rtrim1,[edi+8])
mov txt, rv(rtrim1,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr1
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim1",13,10
; ******************************************
; write an array of strings
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr2:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr2
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
tarr2:
mov txt, rv(rtrim2,[edi])
mov txt, rv(rtrim2,[edi+4])
mov txt, rv(rtrim2,[edi+8])
mov txt, rv(rtrim2,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr2
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim2",13,10
; ******************************************
sub icnt, 1
jnz tlp
invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS
pop edi
pop esi
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim1 proc ptxt:DWORD
; ------------------------------------------------
; one pass left to right to determine last valid
; character then terminate it after that character
; ------------------------------------------------
mov edx, [esp+4] ; load address into EDX
mov ecx, edx ; store that address in ECX
sub edx, 1
lpst:
add edx, 1
movzx eax, BYTE PTR [edx] ; zero extend byte into EAX
test eax, eax ; test for zero terminator
jz lpout ; exit loop on zero
cmp eax, 32 ; test if space character or lower
jbe lpst ; jump back if below or equal
mov ecx, edx ; store updated last character location in ECX
jmp lpst ; jump back to loop start
lpout:
cmp ecx, [esp+4] ; if ECX has not been modified (empty string)
je nxt ; jump to NXT label
add ecx, 1 ; add 1 to ECX to write terminator past last character
nxt:
mov BYTE PTR [ecx], 0 ; write the terminator
mov eax, [esp+4] ; put original stack address in EAX as return address
ret 4 ; BYE !
rtrim1 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim2 proc ptxt:DWORD
; ------------------------------------------------
; one pass left to right to determine last valid
; character then terminate it after that character
; ------------------------------------------------
mov edx, [esp+4] ; load address into EDX
mov ecx, edx ; store that address in ECX
jmp ji
pre:
mov ecx, edx ; store updated last character location in ECX
lp1:
add edx, 1
ji:
movzx eax, BYTE PTR [edx] ; zero extend byte into EAX
cmp eax, 32 ; test if space character or lower
ja pre
test eax, eax ; test for zero terminator
jnz lp1 ; fall through on zero
lpout:
cmp ecx, [esp+4] ; if ECX has not been modified (empty string)
je nxt ; jump to NXT label
add ecx, 1 ; add 1 to ECX to write terminator past last character
nxt:
mov BYTE PTR [ecx], 0 ; write the terminator
mov eax, [esp+4] ; put original stack address in EAX as return address
ret 4 ; BYE !
rtrim2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Thanks. The performance of any rtrim() algo depends basically on
1. the cycles required to find the end of the string (len, Len, lstrlen, whatever)
2. the cycles required to find (backwards) the first char above Ascii 32
Once found, Masm32 inserts the zero delimiter there; MasmBasic Rtrim$() does not change the original but rather passes the pointer to the original plus the string len without spaces to the string engine.
Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
94 milliseconds rtrim1
78 milliseconds rtrim2
78 milliseconds rtrim1
93 milliseconds rtrim2
78 milliseconds rtrim1
78 milliseconds rtrim2
78 milliseconds rtrim1
93 milliseconds rtrim2
Press any key to continue ...
prescott w/htt @ 3 GHz
171 milliseconds rtrim1
171 milliseconds rtrim2
172 milliseconds rtrim1
187 milliseconds rtrim2
171 milliseconds rtrim1
172 milliseconds rtrim2
172 milliseconds rtrim1
172 milliseconds rtrim2
may i suggest.....
seperate the functions
you want to work on the RTrim operation, not StrLen
and what about UNICODE aware ? :icon_eek:
Dave,
The two algos do not use a len operation, they are a single pass from left to right that determines the last non blank space then places a terminator after it. If I have it right the same speed of both algos is determined by the memory access and for a byte scan design it probably will not go any faster.
A unicode version would look very similar but would read WORD size characters, not byte.
deleted
Hi nidud,
results from strtrim:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
171741 cycles - 0: standard (scasb)
10004 cycles - 1: rtrim1
11408 cycles - 2: rtrim2
11234 cycles - 3: new strlen (AgnerFog)
71151 cycles - 0: standard (scasb)
9874 cycles - 1: rtrim1
26642 cycles - 2: rtrim2
27218 cycles - 3: new strlen (AgnerFog)
133009 cycles - 0: standard (scasb)
23452 cycles - 1: rtrim1
26731 cycles - 2: rtrim2
27158 cycles - 3: new strlen (AgnerFog)
--- ok ---
Gunther
Here is a version that also tests an algo that scans the length then back scans to find last acceptable character. It is substantially slower than the single pass versions. It uses Agner Fog's old StrLen algo unrolled by 4 then back scans the string to find that last character. It could be optimised some more but there is little gain in it as the back scanner only has a single memory access.
Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
94 milliseconds rtrim1
94 milliseconds rtrim2
125 milliseconds rtrim3
94 milliseconds rtrim1
93 milliseconds rtrim2
125 milliseconds rtrim3
94 milliseconds rtrim1
94 milliseconds rtrim2
125 milliseconds rtrim3
94 milliseconds rtrim1
94 milliseconds rtrim2
125 milliseconds rtrim3
Press any key to continue ...
This is the test piece.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
rtrim1 PROTO :DWORD ; first test algo
rtrim2 PROTO :DWORD ; second test algo
rtrim3 PROTO :DWORD ; third test algo
.data
item1 db "this is a test of rtrim algos ",0 ; 48 bytes total
item2 db " this is a test of rtrim algos ",0 ; 48 bytes total
item3 db " this is a test of rtrim algos ",0 ; 48 bytes total
item4 db " ",0 ; 48 bytes total
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
lpcount equ <1024*1024*12>
main proc
LOCAL txt :DWORD
LOCAL hArr :DWORD
LOCAL pArr :DWORD
LOCAL aInd :DWORD
LOCAL lcnt :DWORD
LOCAL icnt :DWORD
push esi
push edi
print "Allocating "
print ustr$(lpcount*48 / 1024)," megabytes for testing",13,10
print "If it crashes due to lack of memory, reduce the allocation",13,10
print "size in the 'lpcount' equate above",13,10
invoke SetPriorityClass,rv(GetCurrentProcess),HIGH_PRIORITY_CLASS
mov icnt, 4
tlp:
; ******************************************
; write an array of strings
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr1:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr1
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
tarr1:
mov txt, rv(rtrim1,[edi])
mov txt, rv(rtrim1,[edi+4])
mov txt, rv(rtrim1,[edi+8])
mov txt, rv(rtrim1,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr1
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim1",13,10
; ******************************************
; write an array of strings
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr2:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr2
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
tarr2:
mov txt, rv(rtrim2,[edi])
mov txt, rv(rtrim2,[edi+4])
mov txt, rv(rtrim2,[edi+8])
mov txt, rv(rtrim2,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr2
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim2",13,10
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr3:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr3
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
tarr3:
mov txt, rv(rtrim3,[edi])
mov txt, rv(rtrim3,[edi+4])
mov txt, rv(rtrim3,[edi+8])
mov txt, rv(rtrim3,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr3
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim3",13,10
; ******************************************
sub icnt, 1
jnz tlp
invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS
pop edi
pop esi
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim1 proc ptxt:DWORD
; ------------------------------------------------
; one pass left to right to determine last valid
; character then terminate it after that character
; ------------------------------------------------
mov edx, [esp+4] ; load address into EDX
mov ecx, edx ; store that address in ECX
sub edx, 1
lpst:
add edx, 1
movzx eax, BYTE PTR [edx] ; zero extend byte into EAX
test eax, eax ; test for zero terminator
jz lpout ; exit loop on zero
cmp eax, 32 ; test if space character or lower
jbe lpst ; jump back if below or equal
mov ecx, edx ; store updated last character location in ECX
jmp lpst ; jump back to loop start
lpout:
cmp ecx, [esp+4] ; if ECX has not been modified (empty string)
je nxt ; jump to NXT label
add ecx, 1 ; add 1 to ECX to write terminator past last character
nxt:
mov BYTE PTR [ecx], 0 ; write the terminator
mov eax, [esp+4] ; put original stack address in EAX as return address
ret 4 ; BYE !
rtrim1 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim2 proc ptxt:DWORD
; ------------------------------------------------
; one pass left to right to determine last valid
; character then terminate it after that character
; ------------------------------------------------
mov edx, [esp+4] ; load address into EDX
mov ecx, edx ; store that address in ECX
jmp ji
pre:
mov ecx, edx ; store updated last character location in ECX
lpst:
add edx, 1
ji:
movzx eax, BYTE PTR [edx] ; zero extend byte into EAX
cmp eax, 32 ; test if space character or lower
ja pre
test eax, eax ; test for zero terminator
jnz lpst ; fall through on zero
lpout:
cmp ecx, [esp+4] ; if ECX has not been modified (empty string)
je nxt ; jump to NXT label
add ecx, 1 ; add 1 to ECX to write terminator past last character
nxt:
mov BYTE PTR [ecx], 0 ; write the terminator
mov eax, [esp+4] ; put original stack address in EAX as return address
ret 4 ; BYE !
rtrim2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim3 proc ptxt:DWORD
mov eax, [esp+4] ; get pointer to string
lea edx, [eax+3] ; pointer+3 used in the end
push esi
push edi
mov edi, 80808080h
@@:
REPEAT 3
mov esi, [eax] ; read first 4 bytes
add eax, 4 ; increment pointer
lea ecx, [esi-01010101h] ; subtract 1 from each byte
not esi ; invert all bytes
and ecx, esi ; and these two
and ecx, edi
jnz nxt
ENDM
mov esi, [eax] ; read first 4 bytes
add eax, 4 ; 4 increment DWORD pointer
lea ecx, [esi-01010101h] ; subtract 1 from each byte
not esi ; invert all bytes
and ecx, esi ; and these two
and ecx, edi
jz @B ; no zero bytes, continue loop
nxt:
test ecx, 00008080h ; test first two bytes
jnz @F
shr ecx, 16 ; not in the first 2 bytes
add eax, 2
@@:
shl cl, 1 ; use carry flag to avoid branch
sbb eax, edx ; compute length
; --------------------------------------
; set up for back scanning end of string
; --------------------------------------
mov esi, [esp+4][8] ; lower
mov edi, eax
add edi, esi ; higher
add edi, 1
bscn:
sub edi, 1
cmp esi, edi ; don't scan below start address
je bsout
cmp BYTE PTR [edi], 32 ; test if ascii 32 or less
jbe bscn ; loop back if unwanted character
add edi, 1 ; correct for string with characters
bsout:
mov BYTE PTR [edi], 0 ; terminate string
mov eax, esi ; copy start address into EAX
pop edi
pop esi
ret 4 ; BYE !
rtrim3 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
deleted
prescott w/htt @ 3.0 GHz
Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
188 milliseconds rtrim1
172 milliseconds rtrim2
297 milliseconds rtrim3
172 milliseconds rtrim1
188 milliseconds rtrim2
297 milliseconds rtrim3
188 milliseconds rtrim1
187 milliseconds rtrim2
297 milliseconds rtrim3
188 milliseconds rtrim1
188 milliseconds rtrim2
297 milliseconds rtrim3
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
31757 cycles - 0: standard (scasb)
27877 cycles - 1: rtrim1
17383 cycles - 2: rtrim2
12989 cycles - 3: new strlen (AgnerFog)
18351 cycles - 4: rtrimUltra (Masm32 len)
16126 cycles - 5: rtrim3
13127 cycles - 6: repeat 4 (AgnerFog)
13788 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8063 cycles - 8: strtrim unaligned - SSE
31439 cycles - 0: standard (scasb)
27853 cycles - 1: rtrim1
18167 cycles - 2: rtrim2
12934 cycles - 3: new strlen (AgnerFog)
18120 cycles - 4: rtrimUltra (Masm32 len)
15627 cycles - 5: rtrim3
13031 cycles - 6: repeat 4 (AgnerFog)
13014 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8047 cycles - 8: strtrim unaligned - SSE
33208 cycles - 0: standard (scasb)
27703 cycles - 1: rtrim1
17897 cycles - 2: rtrim2
15139 cycles - 3: new strlen (AgnerFog)
18173 cycles - 4: rtrimUltra (Masm32 len)
15778 cycles - 5: rtrim3
13135 cycles - 6: repeat 4 (AgnerFog)
10862 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
10029 cycles - 8: strtrim unaligned - SSE
Quote from: nidud on June 15, 2014, 09:24:15 PM
Some new results including SSE functions.
Nice work :t
It seems your SSE2 version is a bit like Len() without the overhead.
But Hutch has created a testbed that identifies other champions, and I still have not fully understood why. No cache, of course, with hundreds of megabytes to process. And the trailing spaces are quite many (which makes the testbed a bit unrealistic 8)), but still, that doesn't explain why the one-pass should be faster than the len-plus-search-backwards strategy ::)
I did the data this way for a reason, small to large trailing spaces on a reasonable length string (48 bytes) to test the linear speed of the forward scan, something that should suit a DWORD scanner over a BYTE scanner but then taking extra time to back scan to find the last non character position. The very large sample was to get a duration long enough, I used to test on 100 meg but with faster processors its too small.
Testing on a very large linear source defeats loop and cache effects and you are left with processing time on always new data, not looped in cache data.
deleted
Results strtrim3:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
13273 cycles - 0: standard (scasb)
12690 cycles - 1: rtrim1
10177 cycles - 2: rtrim2
4261 cycles - 3: new strlen (AgnerFog)
5656 cycles - 4: rtrimUltra (Masm32 len)
3653 cycles - 5: rtrim3
3450 cycles - 6: repeat 4 (AgnerFog)
2589 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1501 cycles - 8: strtrim unaligned - SSE
13906 cycles - 0: standard (scasb)
32205 cycles - 1: rtrim1
24778 cycles - 2: rtrim2
11307 cycles - 3: new strlen (AgnerFog)
13530 cycles - 4: rtrimUltra (Masm32 len)
8814 cycles - 5: rtrim3
8101 cycles - 6: repeat 4 (AgnerFog)
5996 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
4181 cycles - 8: strtrim unaligned - SSE
32142 cycles - 0: standard (scasb)
30590 cycles - 1: rtrim1
24709 cycles - 2: rtrim2
10966 cycles - 3: new strlen (AgnerFog)
13328 cycles - 4: rtrimUltra (Masm32 len)
8860 cycles - 5: rtrim3
8013 cycles - 6: repeat 4 (AgnerFog)
5974 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
3501 cycles - 8: strtrim unaligned - SSE
--- ok ---
Gunther
I suspected there was a stuffup in the benchmark I had used because it was not responding to code changes but I found the problem, I was not resetting EDI before each read block performing the rtrim function. Very different results which are now making sense. Once I got it working correctly I have unrolled the 3rd algo (Agner Fog's Strlen algo with back scanner) by a factor of 8 which is overdoing it for an algo of this type but on this old quad, its about 5 times faster than the one pass scanners. I also mis-aligned the data by 1 byte to reduce the effectiveness of aligned reads but it made no difference in timings.
391 milliseconds rtrim1
422 milliseconds rtrim2
78 milliseconds rtrim3
375 milliseconds rtrim1
406 milliseconds rtrim2
78 milliseconds rtrim3
390 milliseconds rtrim1
406 milliseconds rtrim2
78 milliseconds rtrim3
390 milliseconds rtrim1
406 milliseconds rtrim2
78 milliseconds rtrim3
Press any key to continue ...
The modified benchmark is as follows.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
rtrim1 PROTO :DWORD ; first test algo
rtrim2 PROTO :DWORD ; second test algo
rtrim3 PROTO :DWORD ; third test algo
.data
align 4
db 0
item1 db "this is a test of rtrim algos ",0 ; 48 bytes total
align 4
db 0
item2 db " this is a test of rtrim algos ",0 ; 48 bytes total
align 4
db 0
item3 db " this is a test of rtrim algos ",0 ; 48 bytes total
align 4
db 0
item4 db " ",0 ; 48 bytes total
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
lpcount equ <1024*1024*4>
main proc
LOCAL txt :DWORD
LOCAL hArr :DWORD
LOCAL pArr :DWORD
LOCAL aInd :DWORD
LOCAL lcnt :DWORD
LOCAL icnt :DWORD
push esi
push edi
invoke SetPriorityClass,rv(GetCurrentProcess),HIGH_PRIORITY_CLASS
mov icnt, 4
tlp:
; ******************************************
; write an array of strings
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr1:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr1
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov edi, pArr
tarr1:
mov txt, rv(rtrim1,[edi])
mov txt, rv(rtrim1,[edi+4])
mov txt, rv(rtrim1,[edi+8])
mov txt, rv(rtrim1,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr1
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim1",13,10
; ******************************************
; write an array of strings
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr2:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr2
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov edi, pArr
tarr2:
mov txt, rv(rtrim2,[edi])
mov txt, rv(rtrim2,[edi+4])
mov txt, rv(rtrim2,[edi+8])
mov txt, rv(rtrim2,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr2
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim2",13,10
; ------------------------------------------
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov pArr, rv(create_array,lpcount,48) ; pointer array handle
mov hArr, ecx ; main array memory handle
mov edi, pArr
larr3:
cst [edi], OFFSET item1
cst [edi+4], OFFSET item2
cst [edi+8], OFFSET item3
cst [edi+12], OFFSET item4
add edi, 4
sub lcnt, 1
jnz larr3
; ------------------------------------------
invoke GetTickCount
push eax
mov eax, lpcount
shr eax, 2
mov lcnt, eax
mov edi, pArr
tarr3:
mov txt, rv(rtrim3,[edi])
mov txt, rv(rtrim3,[edi+4])
mov txt, rv(rtrim3,[edi+8])
mov txt, rv(rtrim3,[edi+12])
add edi, 4
sub lcnt, 1
jnz tarr3
free pArr
free hArr
invoke GetTickCount
pop ecx
sub eax, ecx
print ustr$(eax)," milliseconds rtrim3",13,10
; ******************************************
sub icnt, 1
jnz tlp
invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS
pop edi
pop esi
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim1 proc ptxt:DWORD
; ------------------------------------------------
; one pass left to right to determine last valid
; character then terminate it after that character
; ------------------------------------------------
mov edx, [esp+4] ; load address into EDX
mov ecx, edx ; store that address in ECX
sub edx, 1
lpst:
add edx, 1
movzx eax, BYTE PTR [edx] ; zero extend byte into EAX
test eax, eax ; test for zero terminator
jz lpout ; exit loop on zero
cmp eax, 32 ; test if space character or lower
jbe lpst ; jump back if below or equal
mov ecx, edx ; store updated last character location in ECX
jmp lpst ; jump back to loop start
lpout:
cmp ecx, [esp+4] ; if ECX has not been modified (empty string)
je nxt ; jump to NXT label
add ecx, 1 ; add 1 to ECX to write terminator past last character
nxt:
mov BYTE PTR [ecx], 0 ; write the terminator
mov eax, [esp+4] ; put original stack address in EAX as return address
ret 4 ; BYE !
rtrim1 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim2 proc ptxt:DWORD
; ------------------------------------------------
; one pass left to right to determine last valid
; character then terminate it after that character
; ------------------------------------------------
mov edx, [esp+4] ; load address into EDX
mov ecx, edx ; store that address in ECX
jmp ji
pre:
mov ecx, edx ; store updated last character location in ECX
lpst:
add edx, 1
ji:
movzx eax, BYTE PTR [edx] ; zero extend byte into EAX
cmp eax, 32 ; test if space character or lower
ja pre
test eax, eax ; test for zero terminator
jnz lpst ; fall through on zero
lpout:
cmp ecx, [esp+4] ; if ECX has not been modified (empty string)
je nxt ; jump to NXT label
add ecx, 1 ; add 1 to ECX to write terminator past last character
nxt:
mov BYTE PTR [ecx], 0 ; write the terminator
mov eax, [esp+4] ; put original stack address in EAX as return address
ret 4 ; BYE !
rtrim2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
rtrim3 proc ptxt:DWORD
mov eax, [esp+4] ; get pointer to string
lea edx, [eax+3] ; pointer+3 used in the end
push esi
push edi
mov edi, 80808080h
@@:
REPEAT 7
mov esi, [eax] ; read first 4 bytes
add eax, 4 ; increment pointer
lea ecx, [esi-01010101h] ; subtract 1 from each byte
not esi ; invert all bytes
and ecx, esi ; and these two
and ecx, edi
jnz nxt
ENDM
mov esi, [eax] ; read first 4 bytes
add eax, 4 ; 4 increment DWORD pointer
lea ecx, [esi-01010101h] ; subtract 1 from each byte
not esi ; invert all bytes
and ecx, esi ; and these two
and ecx, edi
jz @B ; no zero bytes, continue loop
nxt:
test ecx, 00008080h ; test first two bytes
jnz @F
shr ecx, 16 ; not in the first 2 bytes
add eax, 2
@@:
shl cl, 1 ; use carry flag to avoid branch
sbb eax, edx ; compute length
; --------------------------------------
; set up for back scanning end of string
; --------------------------------------
mov esi, [esp+4][8] ; lower
mov edi, eax
add edi, esi ; higher
add edi, 1
bscn:
REPEAT 7
sub edi, 1
cmp esi, edi ; don't scan below start address
je bsout
cmp BYTE PTR [edi], 32 ; test if ascii 32 or less
ja bnxt ; loop back if unwanted character
ENDM
sub edi, 1
cmp esi, edi ; don't scan below start address
je bsout
cmp BYTE PTR [edi], 32 ; test if ascii 32 or less
jbe bscn ; loop back if unwanted character
bnxt:
add edi, 1 ; correct for string with characters
bsout:
mov BYTE PTR [edi], 0 ; terminate string
mov eax, esi ; copy start address into EAX
pop edi
pop esi
ret 4 ; BYE !
rtrim3 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
nidud StrTrim3
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
36191 cycles - 0: standard (scasb)
21212 cycles - 1: rtrim1
20491 cycles - 2: rtrim2
11768 cycles - 3: new strlen (AgnerFog)
16217 cycles - 4: rtrimUltra (Masm32 len)
15518 cycles - 5: rtrim3
11998 cycles - 6: repeat 4 (AgnerFog)
10416 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8129 cycles - 8: strtrim unaligned - SSE
29498 cycles - 0: standard (scasb)
21395 cycles - 1: rtrim1
22650 cycles - 2: rtrim2
11830 cycles - 3: new strlen (AgnerFog)
15956 cycles - 4: rtrimUltra (Masm32 len)
14441 cycles - 5: rtrim3
11902 cycles - 6: repeat 4 (AgnerFog)
10443 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
9087 cycles - 8: strtrim unaligned - SSE
31280 cycles - 0: standard (scasb)
21175 cycles - 1: rtrim1
23340 cycles - 2: rtrim2
11768 cycles - 3: new strlen (AgnerFog)
16005 cycles - 4: rtrimUltra (Masm32 len)
14263 cycles - 5: rtrim3
12082 cycles - 6: repeat 4 (AgnerFog)
10416 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8142 cycles - 8: strtrim unaligned - SSE
Hutch #3
prescott w/htt @ 3.0 GHz
359 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
438 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
360 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
375 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
hutch latest (use code tags, n00b)
125 milliseconds rtrim1
78 milliseconds rtrim2
62 milliseconds rtrim3
141 milliseconds rtrim1
78 milliseconds rtrim2
46 milliseconds rtrim3
125 milliseconds rtrim1
78 milliseconds rtrim2
47 milliseconds rtrim3
125 milliseconds rtrim1
94 milliseconds rtrim2
47 milliseconds rtrim3
nidud latest
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
----------------------------------------------------
13864 cycles - 0: standard (scasb)
13400 cycles - 1: rtrim1
11011 cycles - 2: rtrim2
4630 cycles - 3: new strlen (AgnerFog)
5866 cycles - 4: rtrimUltra (Masm32 len)
3843 cycles - 5: rtrim3
3542 cycles - 6: repeat 4 (AgnerFog)
2604 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1555 cycles - 8: strtrim unaligned - SSE
14328 cycles - 0: standard (scasb)
14194 cycles - 1: rtrim1
10710 cycles - 2: rtrim2
4967 cycles - 3: new strlen (AgnerFog)
6027 cycles - 4: rtrimUltra (Masm32 len)
3792 cycles - 5: rtrim3
3449 cycles - 6: repeat 4 (AgnerFog)
2669 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1642 cycles - 8: strtrim unaligned - SSE
13670 cycles - 0: standard (scasb)
14597 cycles - 1: rtrim1
10797 cycles - 2: rtrim2
4746 cycles - 3: new strlen (AgnerFog)
5665 cycles - 4: rtrimUltra (Masm32 len)
3872 cycles - 5: rtrim3
3487 cycles - 6: repeat 4 (AgnerFog)
2690 cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1514 cycles - 8: strtrim unaligned - SSE
deleted
Results from rtrim:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
140233 cycles - 500 ( 41) 0: strtrim
114064 cycles - 500 ( 53) 1: **failed** rtrim1
107836 cycles - 500 ( 50) 2: **failed** rtrim2
44405 cycles - 500 (319) 3: **failed** rtrim3
50526 cycles - 500 (103) 4: AgnerFog
49096 cycles - 500 ( 66) 5: AgnerFog (unaligned)
68013 cycles - 500 ( 0) 6: **failed** rtrimUltra (Masm32 len)
27418 cycles - 500 ( 0) 7: **failed** rtrimUltra (MasmBasic Len) - SSE
26399 cycles - 500 ( 60) 8: **failed** strtrim - SSE
--- ok ---
strlen:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
113712 cycles - 50 ( 0) 0: crt_strlen
189882 cycles - 50 ( 0) 1: MASM32 - szLen()
21866 cycles - 50 ( 0) 2: MasmBasic MbStrLen Len() - SSE
102459 cycles - 50 ( 86) 3: AgnerFog
100484 cycles - 50 ( 49) 4: AgnerFog unaligned
222688 cycles - 50 ( 94) 5: Dave
24659 cycles - 50 ( 45) 6: strlen SSE
--- ok ---
strstr:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
382038 cycles - 50 ( 67) 1: strstr
379086 cycles - 50 ( 71) 2: x
376889 cycles - 50 ( 57) 4: x
382763 cycles - 50 ( 54) 5: x
382307 cycles - 50 ( 50) 7: x
--- ok ---
OS is Win 7-64, SP 1.
Gunther
Quote from: nidud on July 16, 2014, 09:20:31 PM
148115 cycles - 500 ( 0) 6: **failed** rtrimUltra (Masm32 len)
102954 cycles - 500 ( 0) 7: **failed** rtrimUltra (MasmBasic Len) - SSE
How do they fail? Where are they? I count 40 files in your archive, dispersed over three folders, but can't find my algos there (and they usually don't fail :icon_mrgreen:).
Besides, the various makeit.bat don't create any executable, so where is the testbed? Is it one folder higher:
test.exe
test.obj
strlen.exe
test.asm
strstr.exe
rtrim.exe
makeit.bat
?? All 4 exe files brutally fail with a GPF...
deleted
Quote from: nidud on July 18, 2014, 11:21:02 AM
Quote(and they usually don't fail :icon_mrgreen:).
arg: db ' ',13,10,0 ; 000A0D20h
result: db ' ',00,10,0 ; 000A0020h
error: **failed** rtrimUltra (MasmBasic Len) - SSE
They don't fail, they just don't return what
validate_x expects. Rtrim$() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1166) tells Let (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1132), Cat$() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1133) or Print (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1112) where the trimmed string starts, and how many bytes are valid. In many cases, this is more efficient than shuffling bytes around.
deleted
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
TheSrc db "The source ", 0
Init
mov esi, Cat$(Rtrim$(offset TheSrc))
.While 1
lodsb
.Break .if !al
Print Hex$(al), " "
.Endw
Inkey " - OK?"
Exit
end start
54 68 65 20 73 6F 75 72 63 65 - OK?
;-)
deleted
Sorry, this intrigues me - and I can't reproduce the problems. Can you post source and exe, please?
Thanks :icon14:
Again, can't see how to reproduce this one. Besides, what did you use to produce the Ascii 13 of \r ? cStyle$() doesn't recognise that escape character, just \t and \n.
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
.code
TheSrc db "The source ", 0
TheR db " ", 9, 13, 13, 10, 0
start:
Print "RTRIM$: |", Rtrim$(" Trim a string "), "|",CrLf$
Print "[", Rtrim$(cStyle$("1 \t\n\n")), "]", CrLf$
Print "[", Rtrim$(cStyle$(" \t\n\n")), "]", CrLf$
Print "[", Rtrim$(offset TheR), "]", CrLf$
mov esi, Cat$(Rtrim$(offset TheSrc))
.Repeat
lodsb
Print Hex$(al), " "
.Until !byte ptr [esi]
Inkey " - OK?"
Exit
end start
Output:
RTRIM$: | Trim a string|
[1]
[]
[]
54 68 65 20 73 6F 75 72 63 65 - OK?
Can you post your source and the crashing exe, please?
deleted
deleted
Print "RTRIM$: |", Rtrim$(" Trim a string "), "|",CrLf$
Output:
RTRIM$: | Trim a string|
Looks OK for me...
The exe in your attachment crashes indeed, but even with full symbols the test.asm you sent assembles to a 35k file - your exe has 45k. So it was produced with a different source. Can you post source and matching exe, please?
deleted
deleted
Quote from: nidud on July 19, 2014, 12:17:23 AM
ok. I updated the library -- no crash
however, it still mess up the font
Try a
MbCpAnsi=437 before the first Print. See the orange section here (http://en.wikipedia.org/wiki/Code_page_437#Standard_code_page) for details.
deleted
Quote from: nidud on July 19, 2014, 04:24:34 AM
The standard code page for the console is normally "DOS code pages", 850 on my machine. The default for MasmBasic seems to be "Windows-1252" (http://en.wikipedia.org/wiki/Code_page_1252#Code_page).
Indeed. You can change that with MbCpAnsi=850
Quoteif I test the Rtrim$(" \r\n") macro nothing happens
proc_7 proc string:dword
mov eax,Rtrim$(string)
mov eax,string
ret
proc_7 endp
returns " \r\n"
Yes, because Rtrim$() doesn't change the original string. Comment out the
mov eax,string to get a pointer to the copy.