The MASM Forum

General => The Laboratory => Topic started by: hutch-- on June 13, 2014, 01:14:42 AM

Title: New right trim algo.
Post by: hutch-- on June 13, 2014, 01:14:42 AM
This algo does a one pass from left to right, determines the last valid character if there is one then terminates the string after that last character. If it is an empty string or a string full of spaces it writes the terminator at the beginning of the buffer so it is a zero length string.

The idea was to avoid alternative techniques where you had to determine the length then back scan to find the first non blank character. My guess is a one pass version, even though the instruction count per iteration is high is still faster than doing the same task in 2 passes.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    rtrim2 PROTO :DWORD

    .data
      item1 db "    this is a test    ",0
      item2 db "                      ",0
      item3 db 0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL txt   :DWORD

    mov txt, rv(rtrim2,ADDR item1)      ; string with characters
    print txt,13,10
    print ustr$(len(txt)),13,10

    mov txt, rv(rtrim2,ADDR item2)      ; string with spaces only
    print txt,13,10
    print ustr$(len(txt)),13,10

    mov txt, rv(rtrim2,ADDR item3)      ; empty string, must be valid address
    print txt,13,10
    print ustr$(len(txt)),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim2 proc ptxt:DWORD

  ; ------------------------------------------------
  ; one pass left to right to determine last valid
  ; character then terminate it after that character
  ; ------------------------------------------------

    mov edx, [esp+4]                ; load address into EDX
    mov ecx, edx                    ; store that address in ECX
    sub edx, 1

  lpst:
    add edx, 1
    movzx eax, BYTE PTR [edx]       ; zero extend byte into EAX
    test eax, eax                   ; test for zero terminator
    jz lpout                        ; exit loop on zero
    cmp eax, 32                     ; test if space character or lower
    jbe lpst                        ; jump back if below or equal
    mov ecx, edx                    ; store updated last character location in ECX
    jmp lpst                        ; jump back to loop start

  lpout:
    cmp ecx, [esp+4]                ; if ECX has not been modified (empty string)
    je nxt                          ; jump to NXT label
    add ecx, 1                      ; add 1 to ECX to write terminator past last character
  nxt:
    mov BYTE PTR [ecx], 0           ; write the terminator

    mov eax, [esp+4]                ; put original stack address in EAX as return address
    ret 4                           ; BYE !

rtrim2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: New right trim algo.
Post by: jj2007 on June 13, 2014, 01:57:24 AM
Timings:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
77434   cycles for 100 * rtrim2
76024   cycles for 100 * rtrim$
34988   cycles for 100 * Rtrim$

76767   cycles for 100 * trim2
76382   cycles for 100 * rtrim$
35256   cycles for 100 * Rtrim$

76691   cycles for 100 * trim2
76024   cycles for 100 * rtrim$
34993   cycles for 100 * Rtrim$
Title: Re: New right trim algo.
Post by: Gunther on June 13, 2014, 02:01:17 AM
Results:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

43270   cycles for 100 * trim2
26528   cycles for 100 * rtrim$
9904    cycles for 100 * Rtrim$

42794   cycles for 100 * trim2
26361   cycles for 100 * rtrim$
9810    cycles for 100 * Rtrim$

43108   cycles for 100 * trim2
26416   cycles for 100 * rtrim$
9834    cycles for 100 * Rtrim$

8       bytes for trim2
15      bytes for rtrim$
10      bytes for Rtrim$

--- ok ---


Gunther
Title: Re: New right trim algo.
Post by: dedndave on June 13, 2014, 02:08:18 AM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

93453   cycles for 100 * trim2
91914   cycles for 100 * rtrim$
57921   cycles for 100 * Rtrim$

93506   cycles for 100 * trim2
92673   cycles for 100 * rtrim$
57474   cycles for 100 * Rtrim$

93606   cycles for 100 * trim2
92791   cycles for 100 * rtrim$
57808   cycles for 100 * Rtrim$
Title: Re: New right trim algo.
Post by: hutch-- on June 13, 2014, 03:34:08 AM
JJ,

Do you have a copy of your rtrim algo as I cannot build it with the normal masm32 system.
Title: Re: New right trim algo.
Post by: jj2007 on June 13, 2014, 04:01:09 AM
Quote from: hutch-- on June 13, 2014, 03:34:08 AM
Do you have a copy of your rtrim algo as I cannot build it with the normal masm32 system.

Not sure if that is helpful, but here it is. I guess the "speed secret" is the call to MbStrLen aka Len() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1144)

MbLMR proc uses esi edi ebx ecx edx srcLMR:DWORD, ctLMR:DWORD, stposLMR:DWORD      ; 192 bytes (11/2013)
  if 0
      mov eax, ctLMR
      bswap eax
      .if al==128
            shl ctLMR, 1
      .endif
else
      .if byte ptr ctLMR+3==128
            shl ctLMR, 1
      .endif
endif
  mov esi, srcLMR                  ; if not 0, esi is a pointer to the source string
  .if esi==127
      push 0
      call MbGetSlotPointer
      mov esi, [edx]
      mov eax, [edx+4]
      test eax, eax
      .if Sign?
            neg eax
      .endif
  .elseif esi<127                              ; PreAddLen
      mov eax, offset MbSlots
      .if !dword ptr [eax-4]
            call MbBufferInit
      .endif
      lea eax, [eax+8*esi-80]
      mov esi, [eax]
      mov eax, [eax+4]
  .else
      push esi
      .if al==128
            call MbStrLenW
            shl eax, 1
      .else
            call MbStrLen
      .endif
  .endif
  mov ebx, eax                  ; Len(esi)
  mov ecx, ctLMR                  ; no. of characters to be copied
  if 0
      test ecx, ecx
      je lmrRet
  else
      jecxz lmrRet                        ; nothing to copy, nullslot - nullpointer??
  endif
  mov edx, stposLMR            ; edx loaded with start pos: 1 for Left$, 0 for Null$, >0 for Mid$, <0 for Right$
  test edx, edx
  je lmrRetZero                  ; nothing to copy, nullslot - nullpointer??
  .if sdword ptr edx<0      ; Right$("Hallo", 3)      (traps also bad MID$("...", -1, n)
      cmp ecx, eax            ; ct>len()
      jg lmrRetFull            ; Greater (signed)
      add esi, eax            ; Len(src)
      .if sdword ptr ecx<0      ; negative ct allowed, returns full string
            mov ecx, eax
      .endif
      sub esi, ecx
      inc eax
  .elseif edx                        ; MID$ if stpos>1, otherwise Left$
      cmp edx, eax            ; start>len?
      jg lmrRetZero
      inc eax
      add esi, edx            ; MID$ start pos 1
      dec esi
      sub eax, edx
  .endif
  .if ecx>eax            ; unsigned, ecx might be negative
      mov ecx, eax                  ; ct > strlen: limit
  .endif
  test ecx, ecx
  jge @F
lmrRetFull:
  mov ecx, ebx            ; use full len
  jmp @F
lmrRetZero:
  xor ecx, ecx
@@:
lmrRet:
  push esi
  .if srcLMR==127
      sub [esp], esi                  ; clearing the stack is one byte shorter than .else construct
  .endif
  call MbGetSlotPointer      ; returns pointer to string in [edx]
  mov [edx], esi            ; pass the address
  mov [edx+4], ecx            ; and the len
  ret
MbLMR endp
Title: Re: New right trim algo.
Post by: hutch-- on June 14, 2014, 02:50:57 AM
It looks like an interesting number but it is still unbuildable.
Title: Re: New right trim algo.
Post by: jj2007 on June 14, 2014, 07:53:06 AM
Tough luck, Hutch - it's a bit too strongly integrated with the rest of the library. But if you are really keen on a fast algo, check the old forum for the fastest StrLen algo and use it. Or replace include \masm32\include\masm32rt.inc with ... MasmBasic.inc and use Len() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1144)  ;)

rtrimUltra:  ; fastest rtrim ever
    mov edx, [esp+4]                ; load address into EDX
    add Len(edx), edx
    .Repeat
      dec eax
    .Until byte ptr [eax]>32 || eax<=edx
    mov byte ptr [eax+1], 0
    xchg eax, edx
    retn 4


If that is still too MasmBasic, replace Len with len, and build it with masm32rt.inc - half as fast, but still a factor three faster than the original rtrim2.

BTW inserting the zero delimiter is considered an anti-social act, especially in read-only memory. MasmBasic Rtrim$() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1166) uses a softer technique 8)
Title: Re: New right trim algo.
Post by: hutch-- on June 14, 2014, 11:54:05 AM
What I was looking for was a technique to benchmark and test different algos but it appears there is not one available. Sounds like you have a fast combination to perform this task but if I can't build it, I can 't test it.
Title: Re: New right trim algo.
Post by: jj2007 on June 14, 2014, 05:23:16 PM
Quote from: hutch-- on June 14, 2014, 11:54:05 AM
What I was looking for was a technique to benchmark and test different algos but it appears there is not one available.
Attached. It's very easy to use and builds with Masm32 only if you wish so.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

14953   cycles for 100 * rtrimUltra (Masm32 len)
73356   cycles for 100 * rtrim$

14960   cycles for 100 * rtrimUltra (Masm32 len)
72571   cycles for 100 * rtrim$


QuoteSounds like you have a fast combination to perform this task but if I can't build it, I can 't test it.
Sounds a bit like "the CRT seems to have some good algos but since I don't want to include msvcrt.lib, I can't test them" ;-)
Title: Re: New right trim algo.
Post by: sinsi on June 14, 2014, 06:00:49 PM
Hey jj, am I missing something?
In your original attachment you reuse somestring but hutch's rtrim2 trims it, leaving the other tests working on an already-trimmed string.
Title: Re: New right trim algo.
Post by: hutch-- on June 14, 2014, 06:11:48 PM
There are a number of things I need to test that don't come in a pre-canned test bed, the attack rate (many short strings), the long read with a high count of end spaces as you sometimes get in log files to test any backward scanning techniques, whether the iteration count effects the timing etc etc etc ....

So far I have the original library version and the one I first posted here to test the newer one(s) against but as you seem to have a fast version, I was interested to see how it performed on multiple complex tests.

RE: read only files, if you need to trim a string from a file or source that is read only, is there any other way than to write it to a buffer while processing it ?
Title: Re: New right trim algo.
Post by: jj2007 on June 14, 2014, 06:50:53 PM
Quote from: sinsi on June 14, 2014, 06:00:49 PM
Hey jj, am I missing something?
In your original attachment you reuse somestring but hutch's rtrim2 trims it, leaving the other tests working on an already-trimmed string.

Sinsi,
You are perfectly right. Ideally, one would have to re-create the string for each iteration, but how to time that??

I have inserted this...

    mov byte ptr [eax+1+40], 0

... to provide more realistic timings, i.e. rtrimUltra does not trim off the original. In round 2, the Masm32 original trims it, though.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

31666   cycles for 100 * rtrimUltra (Masm32 len)
17806   cycles for 100 * rtrimUltra (MasmBasic Len)
73692   cycles for 100 * rtrim$

14948   cycles for 100 * rtrimUltra (Masm32 len)
4899    cycles for 100 * rtrimUltra (MasmBasic Len)
73104   cycles for 100 * rtrim$


Quote from: hutch-- on June 14, 2014, 06:11:48 PM
RE: read only files, if you need to trim a string from a file or source that is read only, is there any other way than to write it to a buffer while processing it ?

Valid point, of course, though I still would feel a bit uneasy doing that in a library routine. Maybe mention it in the documentation?
Title: Re: New right trim algo.
Post by: LarryC on June 14, 2014, 08:46:43 PM
Intel(R) Core(TM) i7 CPU         960  @ 3.20GHz (SSE4)

24159   cycles for 100 * rtrimUltra (Masm32 len)
17598   cycles for 100 * rtrimUltra (MasmBasic Len)
68562   cycles for 100 * rtrim$

11685   cycles for 100 * rtrimUltra (Masm32 len)
4215    cycles for 100 * rtrimUltra (MasmBasic Len)
68624   cycles for 100 * rtrim$

11795   cycles for 100 * rtrimUltra (Masm32 len)
4337    cycles for 100 * rtrimUltra (MasmBasic Len)
68742   cycles for 100 * rtrim$

36      bytes for rtrimUltra (Masm32 len)
8       bytes for rtrimUltra (MasmBasic Len)
13      bytes for rtrim$
Title: Re: New right trim algo.
Post by: hutch-- on June 14, 2014, 10:23:46 PM
This old quad has made a clown out of me many times with timings, after writing a second algo that has a shorter instruction path and writing a method of benchmarking the difference, the two algos stubbornly refuse to yield any difference in timing. The technique allocates very large amounts of memory as an array of strings then tests each algo. Timings on 1.5 gig of memory yield 250 ms (32 million strings of 48 bytes each), I have dropped the count to 12 million so more people could run it without allocating so much memory but it yields timings as follows,


Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
93 milliseconds rtrim1
94 milliseconds rtrim2
93 milliseconds rtrim1
94 milliseconds rtrim2
94 milliseconds rtrim1
94 milliseconds rtrim2
94 milliseconds rtrim1
94 milliseconds rtrim2
Press any key to continue ...


With just on 590 mem of memory with 12 million strings running under 100 ms, its a case of who cares.

Here is the test bed.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    rtrim1 PROTO :DWORD         ; first test algo
    rtrim2 PROTO :DWORD         ; second test algo

    .data
      item1 db "this is a test of rtrim algos                  ",0  ; 48 bytes total
      item2 db "       this is a test of rtrim algos           ",0  ; 48 bytes total
      item3 db "              this is a test of rtrim algos    ",0  ; 48 bytes total
      item4 db "                                               ",0  ; 48 bytes total

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    lpcount equ <1024*1024*12>

main proc

    LOCAL txt   :DWORD
    LOCAL hArr  :DWORD
    LOCAL pArr  :DWORD
    LOCAL aInd  :DWORD
    LOCAL lcnt  :DWORD
    LOCAL icnt  :DWORD

    push esi
    push edi

    print "Allocating "
    print ustr$(lpcount*48 / 1024)," megabytes for testing",13,10
    print "If it crashes due to lack of memory, reduce the allocation",13,10
    print "size in the 'lpcount' equate above",13,10

    invoke SetPriorityClass,rv(GetCurrentProcess),HIGH_PRIORITY_CLASS

  mov icnt, 4
  tlp:

  ; ******************************************
  ; write an array of strings
  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr1:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr1
  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

  tarr1:
    mov txt, rv(rtrim1,[edi])
    mov txt, rv(rtrim1,[edi+4])
    mov txt, rv(rtrim1,[edi+8])
    mov txt, rv(rtrim1,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr1

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim1",13,10

  ; ******************************************
  ; write an array of strings
  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr2:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr2
  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

  tarr2:
    mov txt, rv(rtrim2,[edi])
    mov txt, rv(rtrim2,[edi+4])
    mov txt, rv(rtrim2,[edi+8])
    mov txt, rv(rtrim2,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr2

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim2",13,10

  ; ******************************************

    sub icnt, 1
    jnz tlp

    invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS

    pop edi
    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim1 proc ptxt:DWORD

  ; ------------------------------------------------
  ; one pass left to right to determine last valid
  ; character then terminate it after that character
  ; ------------------------------------------------

    mov edx, [esp+4]                ; load address into EDX
    mov ecx, edx                    ; store that address in ECX
    sub edx, 1

  lpst:
    add edx, 1
    movzx eax, BYTE PTR [edx]       ; zero extend byte into EAX
    test eax, eax                   ; test for zero terminator
    jz lpout                        ; exit loop on zero
    cmp eax, 32                     ; test if space character or lower
    jbe lpst                        ; jump back if below or equal
    mov ecx, edx                    ; store updated last character location in ECX
    jmp lpst                        ; jump back to loop start

  lpout:
    cmp ecx, [esp+4]                ; if ECX has not been modified (empty string)
    je nxt                          ; jump to NXT label
    add ecx, 1                      ; add 1 to ECX to write terminator past last character
  nxt:
    mov BYTE PTR [ecx], 0           ; write the terminator

    mov eax, [esp+4]                ; put original stack address in EAX as return address
    ret 4                           ; BYE !

rtrim1 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim2 proc ptxt:DWORD

  ; ------------------------------------------------
  ; one pass left to right to determine last valid
  ; character then terminate it after that character
  ; ------------------------------------------------

    mov edx, [esp+4]                ; load address into EDX
    mov ecx, edx                    ; store that address in ECX
    jmp ji

  pre:
    mov ecx, edx                    ; store updated last character location in ECX
  lp1:
    add edx, 1
  ji:
    movzx eax, BYTE PTR [edx]       ; zero extend byte into EAX
    cmp eax, 32                     ; test if space character or lower
    ja pre
    test eax, eax                   ; test for zero terminator
    jnz lp1                         ; fall through on zero

  lpout:
    cmp ecx, [esp+4]                ; if ECX has not been modified (empty string)
    je nxt                          ; jump to NXT label
    add ecx, 1                      ; add 1 to ECX to write terminator past last character
  nxt:
    mov BYTE PTR [ecx], 0           ; write the terminator

    mov eax, [esp+4]                ; put original stack address in EAX as return address
    ret 4                           ; BYE !

rtrim2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: New right trim algo.
Post by: jj2007 on June 14, 2014, 10:24:20 PM
Thanks. The performance of any rtrim() algo depends basically on
1. the cycles required to find the end of the string (len, Len, lstrlen, whatever)
2. the cycles required to find (backwards) the first char above Ascii 32

Once found, Masm32 inserts the zero delimiter there; MasmBasic Rtrim$() does not change the original but rather passes the pointer to the original plus the string len without spaces to the string engine.
Title: Re: New right trim algo.
Post by: LarryC on June 14, 2014, 11:54:31 PM
Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
94 milliseconds rtrim1
78 milliseconds rtrim2
78 milliseconds rtrim1
93 milliseconds rtrim2
78 milliseconds rtrim1
78 milliseconds rtrim2
78 milliseconds rtrim1
93 milliseconds rtrim2
Press any key to continue ...
Title: Re: New right trim algo.
Post by: dedndave on June 15, 2014, 01:29:48 AM
prescott w/htt @ 3 GHz
171 milliseconds rtrim1
171 milliseconds rtrim2
172 milliseconds rtrim1
187 milliseconds rtrim2
171 milliseconds rtrim1
172 milliseconds rtrim2
172 milliseconds rtrim1
172 milliseconds rtrim2


may i suggest.....
seperate the functions
you want to work on the RTrim operation, not StrLen
and what about UNICODE aware ?   :icon_eek:
Title: Re: New right trim algo.
Post by: hutch-- on June 15, 2014, 02:23:50 AM
Dave,

The two algos do not use a len operation, they are a single pass from left to right that determines the last non blank space then places a terminator after it. If I have it right the same speed of both algos is determined by the memory access and for a byte scan design it probably will not go any faster.

A unicode version would look very similar but would read WORD size characters, not byte.
Title: Re: New right trim algo.
Post by: nidud on June 15, 2014, 02:58:18 AM
deleted
Title: Re: New right trim algo.
Post by: Gunther on June 15, 2014, 03:04:41 AM
Hi nidud,

results from strtrim:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
171741  cycles - 0: standard (scasb)
10004   cycles - 1: rtrim1
11408   cycles - 2: rtrim2
11234   cycles - 3: new strlen (AgnerFog)

71151   cycles - 0: standard (scasb)
9874    cycles - 1: rtrim1
26642   cycles - 2: rtrim2
27218   cycles - 3: new strlen (AgnerFog)

133009  cycles - 0: standard (scasb)
23452   cycles - 1: rtrim1
26731   cycles - 2: rtrim2
27158   cycles - 3: new strlen (AgnerFog)

--- ok ---


Gunther
Title: Re: New right trim algo.
Post by: hutch-- on June 15, 2014, 03:09:18 PM
Here is a version that also tests an algo that scans the length then back scans to find last acceptable character. It is substantially slower than the single pass versions. It uses Agner Fog's old StrLen algo unrolled by 4 then back scans the string to find that last character. It could be optimised some more but there is little gain in it as the back scanner only has a single memory access.


Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
94 milliseconds rtrim1
94 milliseconds rtrim2
125 milliseconds rtrim3
94 milliseconds rtrim1
93 milliseconds rtrim2
125 milliseconds rtrim3
94 milliseconds rtrim1
94 milliseconds rtrim2
125 milliseconds rtrim3
94 milliseconds rtrim1
94 milliseconds rtrim2
125 milliseconds rtrim3
Press any key to continue ...


This is the test piece.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    rtrim1 PROTO :DWORD         ; first test algo
    rtrim2 PROTO :DWORD         ; second test algo
    rtrim3 PROTO :DWORD         ; third test algo

    .data
      item1 db "this is a test of rtrim algos                  ",0  ; 48 bytes total
      item2 db "       this is a test of rtrim algos           ",0  ; 48 bytes total
      item3 db "              this is a test of rtrim algos    ",0  ; 48 bytes total
      item4 db "                                               ",0  ; 48 bytes total

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    lpcount equ <1024*1024*12>

main proc

    LOCAL txt   :DWORD
    LOCAL hArr  :DWORD
    LOCAL pArr  :DWORD
    LOCAL aInd  :DWORD
    LOCAL lcnt  :DWORD
    LOCAL icnt  :DWORD

    push esi
    push edi

    print "Allocating "
    print ustr$(lpcount*48 / 1024)," megabytes for testing",13,10
    print "If it crashes due to lack of memory, reduce the allocation",13,10
    print "size in the 'lpcount' equate above",13,10

    invoke SetPriorityClass,rv(GetCurrentProcess),HIGH_PRIORITY_CLASS

  mov icnt, 4
  tlp:

  ; ******************************************
  ; write an array of strings
  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr1:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr1
  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

  tarr1:
    mov txt, rv(rtrim1,[edi])
    mov txt, rv(rtrim1,[edi+4])
    mov txt, rv(rtrim1,[edi+8])
    mov txt, rv(rtrim1,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr1

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim1",13,10

  ; ******************************************
  ; write an array of strings
  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr2:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr2
  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

  tarr2:
    mov txt, rv(rtrim2,[edi])
    mov txt, rv(rtrim2,[edi+4])
    mov txt, rv(rtrim2,[edi+8])
    mov txt, rv(rtrim2,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr2

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim2",13,10

  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr3:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr3

  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

  tarr3:
    mov txt, rv(rtrim3,[edi])
    mov txt, rv(rtrim3,[edi+4])
    mov txt, rv(rtrim3,[edi+8])
    mov txt, rv(rtrim3,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr3

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim3",13,10


  ; ******************************************

    sub icnt, 1
    jnz tlp

    invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS

    pop edi
    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim1 proc ptxt:DWORD

  ; ------------------------------------------------
  ; one pass left to right to determine last valid
  ; character then terminate it after that character
  ; ------------------------------------------------

    mov edx, [esp+4]                ; load address into EDX
    mov ecx, edx                    ; store that address in ECX
    sub edx, 1

  lpst:
    add edx, 1
    movzx eax, BYTE PTR [edx]       ; zero extend byte into EAX
    test eax, eax                   ; test for zero terminator
    jz lpout                        ; exit loop on zero
    cmp eax, 32                     ; test if space character or lower
    jbe lpst                        ; jump back if below or equal
    mov ecx, edx                    ; store updated last character location in ECX
    jmp lpst                        ; jump back to loop start

  lpout:
    cmp ecx, [esp+4]                ; if ECX has not been modified (empty string)
    je nxt                          ; jump to NXT label
    add ecx, 1                      ; add 1 to ECX to write terminator past last character
  nxt:
    mov BYTE PTR [ecx], 0           ; write the terminator

    mov eax, [esp+4]                ; put original stack address in EAX as return address
    ret 4                           ; BYE !

rtrim1 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim2 proc ptxt:DWORD

  ; ------------------------------------------------
  ; one pass left to right to determine last valid
  ; character then terminate it after that character
  ; ------------------------------------------------
    mov edx, [esp+4]                ; load address into EDX
    mov ecx, edx                    ; store that address in ECX
    jmp ji

  pre:
    mov ecx, edx                    ; store updated last character location in ECX
  lpst:
    add edx, 1
  ji:
    movzx eax, BYTE PTR [edx]       ; zero extend byte into EAX
    cmp eax, 32                     ; test if space character or lower
    ja pre
    test eax, eax                   ; test for zero terminator
    jnz lpst                        ; fall through on zero

  lpout:
    cmp ecx, [esp+4]                ; if ECX has not been modified (empty string)
    je nxt                          ; jump to NXT label
    add ecx, 1                      ; add 1 to ECX to write terminator past last character
  nxt:
    mov BYTE PTR [ecx], 0           ; write the terminator

    mov eax, [esp+4]                ; put original stack address in EAX as return address
    ret 4                           ; BYE !

rtrim2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim3 proc ptxt:DWORD

    mov     eax, [esp+4]            ; get pointer to string
    lea     edx, [eax+3]            ; pointer+3 used in the end
    push    esi
    push    edi
    mov     edi, 80808080h

  @@:     
  REPEAT 3
    mov     esi, [eax]              ; read first 4 bytes
    add     eax, 4                  ; increment pointer
    lea     ecx, [esi-01010101h]    ; subtract 1 from each byte
    not     esi                     ; invert all bytes
    and     ecx, esi                ; and these two
    and     ecx, edi
    jnz     nxt
  ENDM

    mov     esi, [eax]              ; read first 4 bytes
    add     eax, 4                  ; 4 increment DWORD pointer
    lea     ecx, [esi-01010101h]    ; subtract 1 from each byte
    not     esi                     ; invert all bytes
    and     ecx, esi                ; and these two
    and     ecx, edi
    jz      @B                      ; no zero bytes, continue loop

  nxt:
    test    ecx, 00008080h          ; test first two bytes
    jnz     @F
    shr     ecx, 16                 ; not in the first 2 bytes
    add     eax, 2
  @@:
    shl     cl, 1                   ; use carry flag to avoid branch
    sbb     eax, edx                ; compute length

  ; --------------------------------------
  ; set up for back scanning end of string
  ; --------------------------------------
    mov esi, [esp+4][8]             ; lower
    mov edi, eax
    add edi, esi                    ; higher
    add edi, 1

  bscn:
    sub edi, 1
    cmp esi, edi                    ; don't scan below start address
    je bsout
    cmp BYTE PTR [edi], 32          ; test if ascii 32 or less
    jbe bscn                        ; loop back if unwanted character

    add edi, 1                      ; correct for string with characters

  bsout:
    mov BYTE PTR [edi], 0           ; terminate string
    mov eax, esi                    ; copy start address into EAX

    pop     edi
    pop     esi

    ret 4                           ; BYE !

rtrim3 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: New right trim algo.
Post by: nidud on June 15, 2014, 09:24:15 PM
deleted
Title: Re: New right trim algo.
Post by: dedndave on June 15, 2014, 09:24:57 PM
prescott w/htt @ 3.0 GHz
Allocating 589824 megabytes for testing
If it crashes due to lack of memory, reduce the allocation
size in the 'lpcount' equate above
188 milliseconds rtrim1
172 milliseconds rtrim2
297 milliseconds rtrim3
172 milliseconds rtrim1
188 milliseconds rtrim2
297 milliseconds rtrim3
188 milliseconds rtrim1
187 milliseconds rtrim2
297 milliseconds rtrim3
188 milliseconds rtrim1
188 milliseconds rtrim2
297 milliseconds rtrim3
Title: Re: New right trim algo.
Post by: dedndave on June 15, 2014, 09:27:51 PM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
31757   cycles - 0: standard (scasb)
27877   cycles - 1: rtrim1
17383   cycles - 2: rtrim2
12989   cycles - 3: new strlen (AgnerFog)
18351   cycles - 4: rtrimUltra (Masm32 len)
16126   cycles - 5: rtrim3
13127   cycles - 6: repeat 4 (AgnerFog)
13788   cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8063    cycles - 8: strtrim unaligned - SSE

31439   cycles - 0: standard (scasb)
27853   cycles - 1: rtrim1
18167   cycles - 2: rtrim2
12934   cycles - 3: new strlen (AgnerFog)
18120   cycles - 4: rtrimUltra (Masm32 len)
15627   cycles - 5: rtrim3
13031   cycles - 6: repeat 4 (AgnerFog)
13014   cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8047    cycles - 8: strtrim unaligned - SSE

33208   cycles - 0: standard (scasb)
27703   cycles - 1: rtrim1
17897   cycles - 2: rtrim2
15139   cycles - 3: new strlen (AgnerFog)
18173   cycles - 4: rtrimUltra (Masm32 len)
15778   cycles - 5: rtrim3
13135   cycles - 6: repeat 4 (AgnerFog)
10862   cycles - 7: rtrimUltra (MasmBasic Len) - SSE
10029   cycles - 8: strtrim unaligned - SSE
Title: Re: New right trim algo.
Post by: jj2007 on June 15, 2014, 09:40:41 PM
Quote from: nidud on June 15, 2014, 09:24:15 PM
Some new results including SSE functions.

Nice work :t
It seems your SSE2 version is a bit like Len() without the overhead.

But Hutch has created a testbed that identifies other champions, and I still have not fully understood why. No cache, of course, with hundreds of megabytes to process. And the trailing spaces are quite many (which makes the testbed a bit unrealistic 8)), but still, that doesn't explain why the one-pass should be faster than the len-plus-search-backwards strategy ::)
Title: Re: New right trim algo.
Post by: hutch-- on June 15, 2014, 10:12:59 PM
I did the data this way for a reason, small to large trailing spaces on a reasonable length string (48 bytes) to test the linear speed of the forward scan, something that should suit a DWORD scanner over a BYTE scanner but then taking extra time to back scan to find the last non character position. The very large sample was to get a duration long enough, I used to test on 100 meg but with faster processors its too small.

Testing on a very large linear source defeats loop and cache effects and you are left with processing time on always new data, not looped in cache data.
Title: Re: New right trim algo.
Post by: nidud on June 15, 2014, 11:07:34 PM
deleted
Title: Re: New right trim algo.
Post by: Gunther on June 15, 2014, 11:23:04 PM
Results strtrim3:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
13273   cycles - 0: standard (scasb)
12690   cycles - 1: rtrim1
10177   cycles - 2: rtrim2
4261    cycles - 3: new strlen (AgnerFog)
5656    cycles - 4: rtrimUltra (Masm32 len)
3653    cycles - 5: rtrim3
3450    cycles - 6: repeat 4 (AgnerFog)
2589    cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1501    cycles - 8: strtrim unaligned - SSE

13906   cycles - 0: standard (scasb)
32205   cycles - 1: rtrim1
24778   cycles - 2: rtrim2
11307   cycles - 3: new strlen (AgnerFog)
13530   cycles - 4: rtrimUltra (Masm32 len)
8814    cycles - 5: rtrim3
8101    cycles - 6: repeat 4 (AgnerFog)
5996    cycles - 7: rtrimUltra (MasmBasic Len) - SSE
4181    cycles - 8: strtrim unaligned - SSE

32142   cycles - 0: standard (scasb)
30590   cycles - 1: rtrim1
24709   cycles - 2: rtrim2
10966   cycles - 3: new strlen (AgnerFog)
13328   cycles - 4: rtrimUltra (Masm32 len)
8860    cycles - 5: rtrim3
8013    cycles - 6: repeat 4 (AgnerFog)
5974    cycles - 7: rtrimUltra (MasmBasic Len) - SSE
3501    cycles - 8: strtrim unaligned - SSE

--- ok ---


Gunther
Title: Re: New right trim algo.
Post by: hutch-- on June 16, 2014, 01:49:23 AM
I suspected there was a stuffup in the benchmark I had used because it was not responding to code changes but I found the problem, I was not resetting EDI before each read block performing the rtrim function. Very different results which are now making sense. Once I got it working correctly I have unrolled the 3rd algo (Agner Fog's Strlen algo with back scanner) by a factor of 8 which is overdoing it for an algo of this type but on this old quad, its about 5 times faster than the one pass scanners. I also mis-aligned the data by 1 byte to reduce the effectiveness of aligned reads but it made no difference in timings.


391 milliseconds rtrim1
422 milliseconds rtrim2
78 milliseconds rtrim3
375 milliseconds rtrim1
406 milliseconds rtrim2
78 milliseconds rtrim3
390 milliseconds rtrim1
406 milliseconds rtrim2
78 milliseconds rtrim3
390 milliseconds rtrim1
406 milliseconds rtrim2
78 milliseconds rtrim3
Press any key to continue ...


The modified benchmark is as follows.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    rtrim1 PROTO :DWORD         ; first test algo
    rtrim2 PROTO :DWORD         ; second test algo
    rtrim3 PROTO :DWORD         ; third test algo

    .data
    align 4
    db 0
      item1 db "this is a test of rtrim algos                  ",0  ; 48 bytes total

    align 4
    db 0
      item2 db "       this is a test of rtrim algos           ",0  ; 48 bytes total

    align 4
    db 0
      item3 db "              this is a test of rtrim algos    ",0  ; 48 bytes total

    align 4
    db 0
      item4 db "                                               ",0  ; 48 bytes total

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    lpcount equ <1024*1024*4>

main proc

    LOCAL txt   :DWORD
    LOCAL hArr  :DWORD
    LOCAL pArr  :DWORD
    LOCAL aInd  :DWORD
    LOCAL lcnt  :DWORD
    LOCAL icnt  :DWORD

    push esi
    push edi

    invoke SetPriorityClass,rv(GetCurrentProcess),HIGH_PRIORITY_CLASS

  mov icnt, 4
  tlp:

  ; ******************************************
  ; write an array of strings
  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr1:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr1
  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov edi, pArr

  tarr1:
    mov txt, rv(rtrim1,[edi])
    mov txt, rv(rtrim1,[edi+4])
    mov txt, rv(rtrim1,[edi+8])
    mov txt, rv(rtrim1,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr1

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim1",13,10

  ; ******************************************
  ; write an array of strings
  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr2:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr2
  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov edi, pArr

  tarr2:
    mov txt, rv(rtrim2,[edi])
    mov txt, rv(rtrim2,[edi+4])
    mov txt, rv(rtrim2,[edi+8])
    mov txt, rv(rtrim2,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr2

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim2",13,10

  ; ------------------------------------------
    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov pArr, rv(create_array,lpcount,48)       ; pointer array handle
    mov hArr, ecx                               ; main array memory handle
    mov edi, pArr

  larr3:
    cst [edi],    OFFSET item1
    cst [edi+4],  OFFSET item2
    cst [edi+8],  OFFSET item3
    cst [edi+12], OFFSET item4
    add edi, 4
    sub lcnt, 1
    jnz larr3

  ; ------------------------------------------

    invoke GetTickCount
    push eax

    mov eax, lpcount
    shr eax, 2
    mov lcnt, eax

    mov edi, pArr

  tarr3:
    mov txt, rv(rtrim3,[edi])
    mov txt, rv(rtrim3,[edi+4])
    mov txt, rv(rtrim3,[edi+8])
    mov txt, rv(rtrim3,[edi+12])
    add edi, 4
    sub lcnt, 1
    jnz tarr3

    free pArr
    free hArr

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," milliseconds rtrim3",13,10

  ; ******************************************

    sub icnt, 1
    jnz tlp

    invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS

    pop edi
    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim1 proc ptxt:DWORD

  ; ------------------------------------------------
  ; one pass left to right to determine last valid
  ; character then terminate it after that character
  ; ------------------------------------------------

    mov edx, [esp+4]                ; load address into EDX
    mov ecx, edx                    ; store that address in ECX
    sub edx, 1

  lpst:
    add edx, 1
    movzx eax, BYTE PTR [edx]       ; zero extend byte into EAX
    test eax, eax                   ; test for zero terminator
    jz lpout                        ; exit loop on zero
    cmp eax, 32                     ; test if space character or lower
    jbe lpst                        ; jump back if below or equal
    mov ecx, edx                    ; store updated last character location in ECX
    jmp lpst                        ; jump back to loop start

  lpout:
    cmp ecx, [esp+4]                ; if ECX has not been modified (empty string)
    je nxt                          ; jump to NXT label
    add ecx, 1                      ; add 1 to ECX to write terminator past last character
  nxt:
    mov BYTE PTR [ecx], 0           ; write the terminator

    mov eax, [esp+4]                ; put original stack address in EAX as return address
    ret 4                           ; BYE !

rtrim1 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim2 proc ptxt:DWORD

  ; ------------------------------------------------
  ; one pass left to right to determine last valid
  ; character then terminate it after that character
  ; ------------------------------------------------
    mov edx, [esp+4]                ; load address into EDX
    mov ecx, edx                    ; store that address in ECX
    jmp ji

  pre:
    mov ecx, edx                    ; store updated last character location in ECX
  lpst:
    add edx, 1
  ji:
    movzx eax, BYTE PTR [edx]       ; zero extend byte into EAX
    cmp eax, 32                     ; test if space character or lower
    ja pre
    test eax, eax                   ; test for zero terminator
    jnz lpst                        ; fall through on zero

  lpout:
    cmp ecx, [esp+4]                ; if ECX has not been modified (empty string)
    je nxt                          ; jump to NXT label
    add ecx, 1                      ; add 1 to ECX to write terminator past last character
  nxt:
    mov BYTE PTR [ecx], 0           ; write the terminator

    mov eax, [esp+4]                ; put original stack address in EAX as return address
    ret 4                           ; BYE !

rtrim2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

rtrim3 proc ptxt:DWORD

    mov     eax, [esp+4]            ; get pointer to string
    lea     edx, [eax+3]            ; pointer+3 used in the end
    push    esi
    push    edi
    mov     edi, 80808080h

  @@:     
  REPEAT 7
    mov     esi, [eax]              ; read first 4 bytes
    add     eax, 4                  ; increment pointer
    lea     ecx, [esi-01010101h]    ; subtract 1 from each byte
    not     esi                     ; invert all bytes
    and     ecx, esi                ; and these two
    and     ecx, edi
    jnz     nxt
  ENDM

    mov     esi, [eax]              ; read first 4 bytes
    add     eax, 4                  ; 4 increment DWORD pointer
    lea     ecx, [esi-01010101h]    ; subtract 1 from each byte
    not     esi                     ; invert all bytes
    and     ecx, esi                ; and these two
    and     ecx, edi
    jz      @B                      ; no zero bytes, continue loop

  nxt:
    test    ecx, 00008080h          ; test first two bytes
    jnz     @F
    shr     ecx, 16                 ; not in the first 2 bytes
    add     eax, 2
  @@:
    shl     cl, 1                   ; use carry flag to avoid branch
    sbb     eax, edx                ; compute length

  ; --------------------------------------
  ; set up for back scanning end of string
  ; --------------------------------------
    mov esi, [esp+4][8]             ; lower
    mov edi, eax
    add edi, esi                    ; higher
    add edi, 1

  bscn:
  REPEAT 7
    sub edi, 1
    cmp esi, edi                    ; don't scan below start address
    je bsout
    cmp BYTE PTR [edi], 32          ; test if ascii 32 or less
    ja bnxt                         ; loop back if unwanted character
  ENDM

    sub edi, 1
    cmp esi, edi                    ; don't scan below start address
    je bsout
    cmp BYTE PTR [edi], 32          ; test if ascii 32 or less
    jbe bscn                        ; loop back if unwanted character

  bnxt:
    add edi, 1                      ; correct for string with characters

  bsout:
    mov BYTE PTR [edi], 0           ; terminate string
    mov eax, esi                    ; copy start address into EAX

    pop     edi
    pop     esi

    ret 4                           ; BYE !

rtrim3 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: New right trim algo.
Post by: dedndave on June 16, 2014, 02:33:19 AM
nidud StrTrim3
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
36191   cycles - 0: standard (scasb)
21212   cycles - 1: rtrim1
20491   cycles - 2: rtrim2
11768   cycles - 3: new strlen (AgnerFog)
16217   cycles - 4: rtrimUltra (Masm32 len)
15518   cycles - 5: rtrim3
11998   cycles - 6: repeat 4 (AgnerFog)
10416   cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8129    cycles - 8: strtrim unaligned - SSE

29498   cycles - 0: standard (scasb)
21395   cycles - 1: rtrim1
22650   cycles - 2: rtrim2
11830   cycles - 3: new strlen (AgnerFog)
15956   cycles - 4: rtrimUltra (Masm32 len)
14441   cycles - 5: rtrim3
11902   cycles - 6: repeat 4 (AgnerFog)
10443   cycles - 7: rtrimUltra (MasmBasic Len) - SSE
9087    cycles - 8: strtrim unaligned - SSE

31280   cycles - 0: standard (scasb)
21175   cycles - 1: rtrim1
23340   cycles - 2: rtrim2
11768   cycles - 3: new strlen (AgnerFog)
16005   cycles - 4: rtrimUltra (Masm32 len)
14263   cycles - 5: rtrim3
12082   cycles - 6: repeat 4 (AgnerFog)
10416   cycles - 7: rtrimUltra (MasmBasic Len) - SSE
8142    cycles - 8: strtrim unaligned - SSE
Title: Re: New right trim algo.
Post by: dedndave on June 16, 2014, 02:35:35 AM
Hutch #3
prescott w/htt @ 3.0 GHz
359 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
438 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
360 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
375 milliseconds rtrim1
328 milliseconds rtrim2
141 milliseconds rtrim3
Title: Re: New right trim algo.
Post by: sinsi on June 16, 2014, 03:57:34 PM
hutch latest (use code tags, n00b)

125 milliseconds rtrim1
78 milliseconds rtrim2
62 milliseconds rtrim3

141 milliseconds rtrim1
78 milliseconds rtrim2
46 milliseconds rtrim3

125 milliseconds rtrim1
78 milliseconds rtrim2
47 milliseconds rtrim3

125 milliseconds rtrim1
94 milliseconds rtrim2
47 milliseconds rtrim3


Title: Re: New right trim algo.
Post by: sinsi on June 16, 2014, 04:00:42 PM
nidud latest

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
----------------------------------------------------
13864   cycles - 0: standard (scasb)
13400   cycles - 1: rtrim1
11011   cycles - 2: rtrim2
4630    cycles - 3: new strlen (AgnerFog)
5866    cycles - 4: rtrimUltra (Masm32 len)
3843    cycles - 5: rtrim3
3542    cycles - 6: repeat 4 (AgnerFog)
2604    cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1555    cycles - 8: strtrim unaligned - SSE

14328   cycles - 0: standard (scasb)
14194   cycles - 1: rtrim1
10710   cycles - 2: rtrim2
4967    cycles - 3: new strlen (AgnerFog)
6027    cycles - 4: rtrimUltra (Masm32 len)
3792    cycles - 5: rtrim3
3449    cycles - 6: repeat 4 (AgnerFog)
2669    cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1642    cycles - 8: strtrim unaligned - SSE

13670   cycles - 0: standard (scasb)
14597   cycles - 1: rtrim1
10797   cycles - 2: rtrim2
4746    cycles - 3: new strlen (AgnerFog)
5665    cycles - 4: rtrimUltra (Masm32 len)
3872    cycles - 5: rtrim3
3487    cycles - 6: repeat 4 (AgnerFog)
2690    cycles - 7: rtrimUltra (MasmBasic Len) - SSE
1514    cycles - 8: strtrim unaligned - SSE

Title: Re: New right trim algo.
Post by: nidud on July 16, 2014, 09:20:31 PM
deleted
Title: Re: New right trim algo.
Post by: Gunther on July 16, 2014, 11:35:22 PM
Results from rtrim:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
140233  cycles - 500 ( 41) 0: strtrim
114064  cycles - 500 ( 53) 1: **failed** rtrim1
107836  cycles - 500 ( 50) 2: **failed** rtrim2
44405   cycles - 500 (319) 3: **failed** rtrim3
50526   cycles - 500 (103) 4: AgnerFog
49096   cycles - 500 ( 66) 5: AgnerFog (unaligned)
68013   cycles - 500 (  0) 6: **failed** rtrimUltra (Masm32 len)
27418   cycles - 500 (  0) 7: **failed** rtrimUltra (MasmBasic Len) - SSE
26399   cycles - 500 ( 60) 8: **failed** strtrim - SSE
--- ok ---

strlen:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
113712  cycles - 50 (  0) 0: crt_strlen
189882  cycles - 50 (  0) 1: MASM32 - szLen()
21866   cycles - 50 (  0) 2: MasmBasic MbStrLen Len() - SSE
102459  cycles - 50 ( 86) 3: AgnerFog
100484  cycles - 50 ( 49) 4: AgnerFog unaligned
222688  cycles - 50 ( 94) 5: Dave
24659   cycles - 50 ( 45) 6: strlen SSE
--- ok ---

strstr:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
382038  cycles - 50 ( 67) 1: strstr
379086  cycles - 50 ( 71) 2: x
376889  cycles - 50 ( 57) 4: x
382763  cycles - 50 ( 54) 5: x
382307  cycles - 50 ( 50) 7: x
--- ok ---


OS is Win 7-64, SP 1.

Gunther
Title: Re: New right trim algo.
Post by: jj2007 on July 18, 2014, 07:17:35 AM
Quote from: nidud on July 16, 2014, 09:20:31 PM
148115  cycles - 500 (  0) 6: **failed** rtrimUltra (Masm32 len)
102954  cycles - 500 (  0) 7: **failed** rtrimUltra (MasmBasic Len) - SSE

How do they fail? Where are they? I count 40 files in your archive, dispersed over three folders, but can't find my algos there (and they usually don't fail :icon_mrgreen:).

Besides, the various makeit.bat don't create any executable, so where is the testbed? Is it one folder higher:

test.exe
test.obj
strlen.exe
test.asm
strstr.exe
rtrim.exe
makeit.bat
?? All 4 exe files brutally fail with a GPF...
Title: Re: New right trim algo.
Post by: nidud on July 18, 2014, 11:21:02 AM
deleted
Title: Re: New right trim algo.
Post by: jj2007 on July 18, 2014, 04:41:09 PM
Quote from: nidud on July 18, 2014, 11:21:02 AM
Quote(and they usually don't fail :icon_mrgreen:).
arg:    db ' ',13,10,0 ; 000A0D20h
result: db ' ',00,10,0 ; 000A0020h

error: **failed** rtrimUltra (MasmBasic Len) - SSE

They don't fail, they just don't return what validate_x expects. Rtrim$() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1166) tells Let (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1132), Cat$() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1133) or Print (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1112) where the trimmed string starts, and how many bytes are valid. In many cases, this is more efficient than shuffling bytes around.
Title: Re: New right trim algo.
Post by: nidud on July 18, 2014, 08:37:45 PM
deleted
Title: Re: New right trim algo.
Post by: jj2007 on July 18, 2014, 08:48:28 PM
include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)
TheSrc      db "The source       ", 0
  Init
  mov esi, Cat$(Rtrim$(offset TheSrc))
  .While 1
      lodsb
      .Break .if !al
      Print Hex$(al), " "
  .Endw
  Inkey " - OK?"
  Exit
end start



54 68 65 20 73 6F 75 72 63 65  - OK?

;-)
Title: Re: New right trim algo.
Post by: nidud on July 18, 2014, 09:23:52 PM
deleted
Title: Re: New right trim algo.
Post by: jj2007 on July 18, 2014, 09:39:31 PM
Sorry, this intrigues me - and I can't reproduce the problems. Can you post source and exe, please?
Thanks :icon14:
Title: Re: New right trim algo.
Post by: jj2007 on July 18, 2014, 09:50:16 PM
Again, can't see how to reproduce this one. Besides, what did you use to produce the Ascii 13 of \r ? cStyle$() doesn't recognise that escape character, just \t and \n.

include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)
.code
TheSrc    db "The source       ", 0
TheR      db "  ", 9, 13, 13, 10, 0
start:
  Print "RTRIM$: |", Rtrim$("                Trim a string                "), "|",CrLf$
  Print "[", Rtrim$(cStyle$("1 \t\n\n")), "]", CrLf$
  Print "[", Rtrim$(cStyle$("  \t\n\n")), "]", CrLf$
  Print "[", Rtrim$(offset TheR), "]", CrLf$
  mov esi, Cat$(Rtrim$(offset TheSrc))
  .Repeat
      lodsb
      Print Hex$(al), " "
  .Until !byte ptr [esi]
  Inkey " - OK?"
  Exit
end start


Output:

RTRIM$: |                Trim a string|
[1]
[]
[]
54 68 65 20 73 6F 75 72 63 65  - OK?


Can you post your source and the crashing exe, please?
Title: Re: New right trim algo.
Post by: nidud on July 18, 2014, 09:57:00 PM
deleted
Title: Re: New right trim algo.
Post by: nidud on July 18, 2014, 10:19:53 PM
deleted
Title: Re: New right trim algo.
Post by: jj2007 on July 18, 2014, 10:22:47 PM
   Print "RTRIM$: |", Rtrim$("                Trim a string                "), "|",CrLf$

Output:
RTRIM$: |                Trim a string|

Looks OK for me...

The exe in your attachment crashes indeed, but even with full symbols the test.asm you sent assembles to a 35k file - your exe has 45k. So it was produced with a different source. Can you post source and matching exe, please?
Title: Re: New right trim algo.
Post by: nidud on July 18, 2014, 10:41:09 PM
deleted
Title: Re: New right trim algo.
Post by: nidud on July 19, 2014, 12:17:23 AM
deleted
Title: Re: New right trim algo.
Post by: jj2007 on July 19, 2014, 02:15:32 AM
Quote from: nidud on July 19, 2014, 12:17:23 AM
ok. I updated the library -- no crash
however, it still mess up the font

Try a MbCpAnsi=437 before the first Print. See the orange section here (http://en.wikipedia.org/wiki/Code_page_437#Standard_code_page) for details.
Title: Re: New right trim algo.
Post by: nidud on July 19, 2014, 04:24:34 AM
deleted
Title: Re: New right trim algo.
Post by: jj2007 on July 19, 2014, 07:50:14 AM
Quote from: nidud on July 19, 2014, 04:24:34 AM
The standard code page for the console is normally "DOS code pages", 850 on my machine. The default for MasmBasic seems to be "Windows-1252" (http://en.wikipedia.org/wiki/Code_page_1252#Code_page).

Indeed. You can change that with MbCpAnsi=850

Quoteif I test the Rtrim$(" \r\n") macro nothing happens
proc_7 proc string:dword
    mov eax,Rtrim$(string)
    mov eax,string
    ret
proc_7 endp


returns " \r\n"

Yes, because Rtrim$() doesn't change the original string. Comment out the mov eax,string to get a pointer to the copy.