Trimming spaces and tabs inside a string

Started by Vortex, June 17, 2024, 07:52:59 PM


Vortex

Hello,

Here are two functions that remove all spaces and tabs from a string:

include     RemoveSpaces.inc

.data

lookupTbl   db 1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1


mystr       db '    This    Is   A       Test String.',0
msg         db '%s',13,10,'Length of string = %u',0

.data?

buffer      db 64 dup(?)
buffer2     db 32 dup(?)

.code

RemoveSpaces PROC uses edi ebx str1:DWORD,buff:DWORD

    mov     ebx,OFFSET lookupTbl
    mov     ecx,str1
    mov     edi,buff
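@@:
    movzx   eax,BYTE PTR [ecx]      ; load the next source character
    movzx   edx,BYTE PTR [ebx+eax]  ; edx = 0 for tab or space, 1 otherwise
    mov     BYTE PTR [edi],al       ; store it unconditionally
    add     ecx,1
    add     edi,edx                 ; advance only if the character is kept
    test    eax,eax
    jnz     @b                      ; loop until the NUL has been copied

finish:

    mov     eax,edi                 ; edi points one past the copied NUL
    sub     eax,1
    sub     eax,buff                ; eax = length of the result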
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer

    invoke  wsprintf,ADDR buffer2,\
            ADDR msg,ADDR buffer,eax

    invoke  StdOut,ADDR buffer2
    invoke  ExitProcess,0

END start

Another version, without a lookup table:

include     RemoveSpaces.inc

.data

mystr       db '    This    Is   A       Test String.',0
msg         db '%s',13,10,'Length of string = %u',0

.data?

buffer      db 256 dup(?)
buffer2     db 32 dup(?)

.code

RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
    xor     ebx,ebx
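@@:
    movzx   eax,BYTE PTR [ecx]      ; load the next source character
    mov     BYTE PTR [edx],al       ; store it unconditionally
    add     ecx,1

    xor     al,32                   ; al = char XOR 32, zero only for a space
    setnz   ah                      ; ah = 1 if the character is not a space
    xor     al,41                   ; 32 XOR 41 = 9, so al = char XOR 9
    setnz   bl                      ; bl = 1 if the character is not a tab
    and     bl,ah                   ; bl = 1 only if neither space nor tab
    add     edx,ebx                 ; advance only if the character is kept

    cmp     al,9                    ; char XOR 9 = 9 only when char = 0
    jnz     @b                      ; loop until the NUL has been copied

finish:

    mov     eax,edx                 ; edx points one past the copied NUL
    sub     eax,1
    sub     eax,buff                ; eax = length of the result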
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer

    invoke  wsprintf,ADDR buffer2,\
            ADDR msg,ADDR buffer,eax

    invoke  StdOut,ADDR buffer2
    invoke  ExitProcess,0

END start
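
The two versions above should keep and drop exactly the same characters. A minimal sketch to check that, assuming the MASM32 SDK is installed in \masm32 (masm32rt.inc supplies the print, str$ and exit macros): it runs the XOR sequence of the second version over all 256 byte values and compares the resulting keep flag with the lookup table of the first version. The table is declared with dup here, which should produce the same 256 bytes as the eight db rows. The check label and the mismatch counter are only for illustration.

```asm
include \masm32\include\masm32rt.inc

.data

; 0 at index 9 (tab) and index 32 (space), 1 everywhere else: 256 bytes
lookupTbl   db 9 dup(1), 0, 22 dup(1), 0, 223 dup(1)

.code

start:

    xor     esi,esi             ; esi = character code under test, 0..255
    xor     edi,edi             ; edi = number of mismatches found
    xor     ebx,ebx             ; upper bits of ebx stay zero, as in RemoveSpaces
check:
    mov     eax,esi
    xor     al,32               ; same sequence as in the second version
    setnz   ah
    xor     al,41
    setnz   bl
    and     bl,ah               ; ebx = 1 if the character would be kept
    movzx   edx,BYTE PTR lookupTbl[esi]
    cmp     edx,ebx
    je      @f
    add     edi,1               ; table and XOR trick disagree
@@:
    add     esi,1
    cmp     esi,256
    jb      check

    print   str$(edi)," mismatches",13,10
    exit

END start
```

If both approaches agree for every byte value, the count printed is zero.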

NoCforMe

#1
Is that really better than this?
MOV/LEA ESI, source
MOV/LEA EDI, dest
XOR EDX, EDX

skip: LODSB
TEST AL, AL
JZ done
CMP AL, ' '
JE skip
CMP AL, $tab
JE skip
STOSB
INC EDX
JMP skip

done: STOSB ;Null-terminate result.
MOV EAX, EDX ;Return w/trimmed len.
    . . .
Mine is far simpler, anyhow. No sexy tricks, though.
Assembly language programming should be fun. That's why I do it.

zedd151

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
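nospace proc string:dword, bufferx:dword
    mov ecx, [esp+4]            ; ecx = source string
    mov eax, [esp+8]            ; eax = destination buffer
    dec ecx                     ; compensate for the inc at the top of the loop
  @@:
    inc ecx
    cmp byte ptr [ecx], 0
    jz @f                       ; end of the source string
    cmp byte ptr [ecx], 20h
    jz @b                       ; skip spaces
    cmp byte ptr [ecx], 9
    jz @b                       ; skip tabs
    mov dl, [ecx]
    mov [eax], dl               ; copy any other character
    inc eax
    jmp @b
  @@:
    sub eax, [esp+8]            ; eax = length; no NUL is written, so the buffer must start zero-filled
    ret 8
nospace endp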
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
  :biggrin:  A different approach to achieve the same, no stack frame.

NoCforMe

Yes. Your code is basically a paraphrase of mine.
So what's wrong with stack frames?

zedd151

Quote from: NoCforMe on June 18, 2024, 05:44:31 AM: "Yes. Your code is basically a paraphrase of mine."
Not really.
Quote: "So what's wrong with stack frames?"
Saves a few bytes.  :smiley: and mine does not use esi, edi or ebx and of course, ebp. Hence, no need to preserve registers - which saves a couple more bytes.
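
One way to put a number on the bytes saved is to let the assembler measure both variants. The sketch below assumes the MASM32 SDK is installed in \masm32. nospaceFramed, framedEnd and framelessEnd are names made up for this illustration; nospaceFramed is the same loop assembled with MASM's default prologue and epilogue. The default frame adds push ebp, mov ebp,esp and leave (4 bytes in total), while each esp-relative operand in the frameless version needs an extra SIB byte, so the net difference may be smaller than the frame itself.

```asm
include \masm32\include\masm32rt.inc

.code

; the nospace loop with the default stack frame (hypothetical variant)
nospaceFramed proc string:dword, bufferx:dword
    mov ecx, string
    mov eax, bufferx
    dec ecx
  @@:
    inc ecx
    cmp byte ptr [ecx], 0
    jz @f
    cmp byte ptr [ecx], 20h
    jz @b
    cmp byte ptr [ecx], 9
    jz @b
    mov dl, [ecx]
    mov [eax], dl
    inc eax
    jmp @b
  @@:
    sub eax, bufferx
    ret                         ; expands to leave / ret 8
nospaceFramed endp
framedEnd:

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
nospace proc string:dword, bufferx:dword
    mov ecx, [esp+4]
    mov eax, [esp+8]
    dec ecx
  @@:
    inc ecx
    cmp byte ptr [ecx], 0
    jz @f
    cmp byte ptr [ecx], 20h
    jz @b
    cmp byte ptr [ecx], 9
    jz @b
    mov dl, [ecx]
    mov [eax], dl
    inc eax
    jmp @b
  @@:
    sub eax, [esp+8]
    ret 8
nospace endp
framelessEnd:
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    mov     eax, OFFSET framedEnd
    sub     eax, OFFSET nospaceFramed   ; size of the framed variant
    print   str$(eax)," bytes with a stack frame",13,10
    mov     eax, OFFSET framelessEnd
    sub     eax, OFFSET nospace         ; size of the frameless variant
    print   str$(eax)," bytes without a stack frame",13,10
    exit

END start
```

A version that also needs "uses esi edi" pays one push and one pop per preserved register on top of the frame.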

NoCforMe

Quote from: sudoku on June 18, 2024, 05:53:09 AM:
Quote from: NoCforMe on June 18, 2024, 05:44:31 AM: "Yes. Your code is basically a paraphrase of mine."
"Not really."
Sure it is:
o Look @ next character:
o    Zero? done
o    Space or tab? skip
o    Else store it
Quote: "So what's wrong with stack frames?"
Quote: "Saves a few bytes.  :smiley: and mine does not use esi, edi or ebx and of course, ebp. Hence, no need to preserve registers - which saves a couple more bytes."
Yawn. Color me unimpressed. Do you really count code bytes in your programs?

zedd151

Quote from: NoCforMe on June 18, 2024, 06:07:11 AM: "Yawn. Color me unimpressed."
:rolleyes:  I am not surprised.
Quote: "Do you actually count code bytes in your programs?"
No, but this is The Laboratory after all. "Post code here to be beaten to death to make it better, smaller, faster or more powerful." I made my version smaller, by removing the stack frame and not using esi or edi.

NoCforMe

Well then, by that metric the OP's code is a clear loser.
Interesting, though.

zedd151

For much longer strings, Vortex's use of the lookup table may be faster - but would need testing.
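
A rough way to do that testing, as a sketch only: time both routines over many calls with GetTickCount. It assumes the MASM32 SDK is installed in \masm32; LOOPS is an arbitrary repeat count, and the short test string from the first post is reused, so a much longer string would have to be substituted (and the buffer enlarged) to check the claim about long strings. GetTickCount is coarse, so small differences mean little.

```asm
include \masm32\include\masm32rt.inc

LOOPS       equ 5000000         ; hypothetical repeat count, adjust to taste

.data

; same content as the table in the first post
lookupTbl   db 9 dup(1), 0, 22 dup(1), 0, 223 dup(1)
mystr       db '    This    Is   A       Test String.',0

.data?

buffer      db 64 dup(?)

.code

RemoveSpaces PROC uses edi ebx str1:DWORD,buff:DWORD
    mov     ebx,OFFSET lookupTbl
    mov     ecx,str1
    mov     edi,buff
@@:
    movzx   eax,BYTE PTR [ecx]
    movzx   edx,BYTE PTR [ebx+eax]
    mov     BYTE PTR [edi],al
    add     ecx,1
    add     edi,edx
    test    eax,eax
    jnz     @b
    mov     eax,edi
    sub     eax,1
    sub     eax,buff
    ret
RemoveSpaces ENDP

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
nospace proc string:dword, bufferx:dword
    mov ecx, [esp+4]
    mov eax, [esp+8]
    dec ecx
  @@:
    inc ecx
    cmp byte ptr [ecx], 0
    jz @f
    cmp byte ptr [ecx], 20h
    jz @b
    cmp byte ptr [ecx], 9
    jz @b
    mov dl, [ecx]
    mov [eax], dl
    inc eax
    jmp @b
  @@:
    sub eax, [esp+8]
    ret 8
nospace endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    invoke  GetTickCount
    mov     esi,eax             ; esi = start time in milliseconds
    mov     edi,LOOPS
@@:
    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    sub     edi,1
    jnz     @b
    invoke  GetTickCount
    sub     eax,esi             ; eax = elapsed milliseconds
    print   str$(eax)," ms for the lookup table version",13,10

    invoke  GetTickCount
    mov     esi,eax
    mov     edi,LOOPS
@@:
    invoke  nospace,ADDR mystr,ADDR buffer
    sub     edi,1
    jnz     @b
    invoke  GetTickCount
    sub     eax,esi
    print   str$(eax)," ms for nospace",13,10
    exit

END start
```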

jj2007

Quote from: NoCforMe on June 18, 2024, 04:59:30 AM: "Mine is far simpler"
Yes indeed.

TrimIt proc uses esi edi pString
  mov esi, pString
  mov edi, esi
  .Repeat        ; skip leading whitespace
    lodsb
  .Until al!=9 && al!=32
  stosb
  .Repeat
    .Repeat    ; skip more than one whitespace
        lodsb
        mov dl, [esi]
    .Until al!=9 && al!=32 || dl!=9 && dl!=32
    stosb
  .Until !al
  dec edi
  .Repeat
    dec edi
    mov al, [edi]
  .Until al!=32 && al!=9
  mov byte ptr [edi+1], 0
  ret
TrimIt endp

Original: [    This    Is  A        Test String.  ]
Erol:     [ThisIsATestString.]
nospace:  [ThisIsATestString.]
TrimIt:   [This Is A Test String.]

NoCforMe

So yours is really a "trim excess spaces & tabs" function, yes? Which is different from all the other examples so far (which discard all spaces & tabs).
Your routine basically does what HTML does: collapse all whitespace to a single space (except that you trim leading and trailing spaces). A useful function for sure.
And has the advantage of doing the trimming in-place without needing a second buffer.

Question: does your code work correctly if there's a space and a tab next to each other?

zedd151

NoCforMe and jj2007, what is the value in eax when your code finishes? It should contain the length of the string after processing (sans spaces and/or tabs). I had tested my results (using 'nospace') and compared it to Vortex's results, so that my code is equivalent to his (using a different method).

jj2007

Quote from: NoCforMe on June 18, 2024, 06:56:29 AM: "does your code work correctly if there's a space and a tab next to each other?"

Yes.
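
For anyone who wants to try the space-next-to-tab case, here is a small test sketch. It assumes the MASM32 SDK is installed in \masm32; teststr is made-up test data with a space directly followed by a tab, and a tab directly followed by a space. TrimIt is copied from the post above. Because the inner loop reads the byte after the current one, a spare byte follows the terminator.

```asm
include \masm32\include\masm32rt.inc

.data

; hypothetical test data: space+tab between A and B, tab+space between B and C
teststr     db "  A ",9,"B",9," C  ",0
            db 0                ; spare byte, TrimIt peeks one byte past the NUL

.code

TrimIt proc uses esi edi pString
  mov esi, pString
  mov edi, esi
  .Repeat        ; skip leading whitespace
    lodsb
  .Until al!=9 && al!=32
  stosb
  .Repeat
    .Repeat    ; skip more than one whitespace
        lodsb
        mov dl, [esi]
    .Until al!=9 && al!=32 || dl!=9 && dl!=32
    stosb
  .Until !al
  dec edi
  .Repeat
    dec edi
    mov al, [edi]
  .Until al!=32 && al!=9
  mov byte ptr [edi+1], 0
  ret
TrimIt endp

start:
    invoke  TrimIt,ADDR teststr     ; trims in place, so the string must be writable
    print   "["
    invoke  StdOut,ADDR teststr
    print   "]",13,10
    exit

END start
```

The brackets make leading and trailing whitespace visible. It is also worth checking whether the character that survives each run is a space or a tab.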

NoCforMe

Quote from: sudoku on June 18, 2024, 07:42:16 AM: "NoCforMe and jj2007, what is the value in eax when your code finishes?"
It's AL = last character seen, the rest "undefined".
Quote: "It should contain the length of the string after processing (sans spaces and/or tabs)."
Sez who? I don't remember seeing that as one of the requirements for this function.

zedd151

Quote from: NoCforMe on June 18, 2024, 09:01:26 AM: "Sez who? I don't remember seeing that as one of the requirements for this function."
I guess that you didn't really look too hard at Vortex's code.

From example 1:
finish:

    mov     eax,edi
    sub     eax,1
    sub     eax,buff ; <---- here
    ret


From example 2:
finish:

    mov    eax,edx
    sub    eax,1
    sub    eax,buff ; <---- here
    ret