The MASM Forum

General => The Laboratory => Topic started by: Vortex on June 17, 2024, 07:52:59 PM

Title: Trimming spaces and tabs inside a string
Post by: Vortex on June 17, 2024, 07:52:59 PM
Hello,

Functions to trim spaces and tabs inside a string :

include     RemoveSpaces.inc

.data

lookupTbl   db 1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1


mystr       db '    This    Is   A       Test String.',0
msg         db '%s',13,10,'Length of string = %u',0

.data?

buffer      db 64 dup(?)
buffer2     db 64 dup(?)                ; the formatted text needs 42 bytes, so 32 is too small

.code

RemoveSpaces PROC uses edi ebx str1:DWORD,buff:DWORD

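    mov     ebx,OFFSET lookupTbl
    mov     ecx,str1
    mov     edi,buff
@@:
    movzx   eax,BYTE PTR [ecx]          ; next source character
    movzx   edx,BYTE PTR [ebx+eax]      ; 0 for tab and space, 1 for everything else
    mov     BYTE PTR [edi],al           ; always store; a skipped character is overwritten on the next pass
    add     ecx,1
    add     edi,edx                     ; advance the output only for kept characters
    test    eax,eax
    jnz     @b                          ; on exit the terminator has been copied and counted

finish:

    mov     eax,edi
    sub     eax,1                       ; do not count the terminator
    sub     eax,buff                    ; eax = length of the stripped string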
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer

    invoke  wsprintf,ADDR buffer2,\
            ADDR msg,ADDR buffer,eax

    invoke  StdOut,ADDR buffer2
    invoke  ExitProcess,0

END start
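
The 256-entry table above can also be generated at assembly time instead of being typed out. A minimal sketch, assuming MASM's REPEAT directive and conditional assembly; the counter name idx is made up for this example:

```asm
.data

; lookupTbl[i] = 0 for tab (9) and space (32), 1 for every other byte value.
; idx is an assembly-time counter, not a run-time variable.
lookupTbl   LABEL BYTE
idx = 0
REPEAT 256
    IF (idx EQ 9) OR (idx EQ 32)
        db 0
    ELSE
        db 1
    ENDIF
    idx = idx + 1
ENDM
```

Stripping one more character then only takes one more EQ term in the IF line.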

Another version without a lookup table :

include     RemoveSpaces.inc

.data

mystr       db '    This    Is   A       Test String.',0
msg         db '%s',13,10,'Length of string = %u',0

.data?

buffer      db 256 dup(?)
buffer2     db 64 dup(?)                ; the formatted text needs 42 bytes, so 32 is too small

.code

RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
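    xor     ebx,ebx                     ; the upper 24 bits of ebx stay zero
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al           ; always store; a skipped character is overwritten on the next pass
    add     ecx,1

    xor     al,32                       ; al = 0 only for a space
    setnz   ah                          ; ah = 1 if not a space
    xor     al,41                       ; 32 xor 9 = 41, so al = original character xor 9
    setnz   bl                          ; bl = 1 if not a tab
    and     bl,ah                       ; bl = 1 only if neither space nor tab
    add     edx,ebx                     ; advance the output only for kept characters

    cmp     al,9                        ; original xor 9 equals 9 only for the terminator
    jnz     @b

finish:

    mov     eax,edx
    sub     eax,1                       ; do not count the terminator
    sub     eax,buff                    ; eax = length of the stripped string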
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer

    invoke  wsprintf,ADDR buffer2,\
            ADDR msg,ADDR buffer,eax

    invoke  StdOut,ADDR buffer2
    invoke  ExitProcess,0

END start
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 18, 2024, 04:59:30 AM
Is that really better than this?
MOV/LEA ESI, source
MOV/LEA EDI, dest
XOR EDX, EDX

skip: LODSB
TEST AL, AL
JZ done
CMP AL, ' '
JE skip
CMP AL, $tab
JE skip
STOSB
INC EDX
JMP skip

done: STOSB ;Null-terminate result.
MOV EAX, EDX ;Return w/trimmed len.
    . . .
Mine is far simpler, anyhow. No sexy tricks, though.
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 05:19:49 AM
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
nospace proc string:dword, bufferx:dword
    mov ecx, [esp+4]
    mov eax, [esp+8]
    dec ecx
  @@:
    inc ecx
    cmp byte ptr [ecx], 0
    jz @f
    cmp byte ptr [ecx], 20h
    jz @b
    cmp byte ptr [ecx], 9
    jz @b
    mov dl, [ecx]
    mov [eax], dl
    inc eax
    jmp @b
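  @@:
    mov byte ptr [eax], 0   ; terminate the output; nothing above stores the final zero
    sub eax, [esp+8]        ; eax = length of the stripped string
    ret 8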
nospace endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
  :biggrin:  A different approach to achieve the same, no stack frame.
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 18, 2024, 05:44:31 AM
Yes. Your code is basically a paraphrase of mine.
So what's wrong with stack frames?
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 05:53:09 AM
Quote from: NoCforMe on June 18, 2024, 05:44:31 AM: "Yes. Your code is basically a paraphrase of mine."
Not really.
Quote: "So what's wrong with stack frames?"
Saves a few bytes.  :smiley: and mine does not use esi, edi or ebx and of course, ebp. Hence, no need to preserve registers - which saves a couple more bytes.
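
For reference, this is the code MASM's default prologue and epilogue generate for a PROC like Vortex's first one (two DWORD arguments, uses edi ebx), written out by hand. FrameDemo is a made-up name and the body is only a placeholder:

```asm
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
FrameDemo PROC str1:DWORD,buff:DWORD
    push    ebp             ; 1 byte   \ stack frame
    mov     ebp,esp         ; 2 bytes  /
    push    edi             ; 1 byte   \ from "uses edi ebx"
    push    ebx             ; 1 byte   /
    mov     eax,[ebp+8]     ; str1 (placeholder body)
    mov     eax,[ebp+12]    ; buff
    pop     ebx             ; 1 byte
    pop     edi             ; 1 byte
    leave                   ; 1 byte, undoes the frame
    ret     8               ; 3 bytes, needed with or without a frame
FrameDemo ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
```

So the frame itself costs 4 bytes and each preserved register 2. Against that, an [esp+n] operand needs a SIB byte, so each argument access in the frameless version is one byte longer than its [ebp+n] counterpart.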
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 18, 2024, 06:07:11 AM
Quote from: sudoku on June 18, 2024, 05:53:09 AM:
    Quote from: NoCforMe on June 18, 2024, 05:44:31 AM: "Yes. Your code is basically a paraphrase of mine."
    "Not really."
Sure it is:
o Look @ next character:
o    Zero? done
o    Space or tab? skip
o    Else store it
Quote: "So what's wrong with stack frames?"
Quote: "Saves a few bytes.  :smiley: and mine does not use esi, edi or ebx and of course, ebp. Hence, no need to preserve registers - which saves a couple more bytes."
Yawn. Color me unimpressed. Do you really count code bytes in your programs?
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 06:14:38 AM
Quote from: NoCforMe on June 18, 2024, 06:07:11 AM: "Yawn. Color me unimpressed."
:rolleyes:  I am not surprised.
Quote: "Do you actually count code bytes in your programs?"
No, but this is The Laboratory after all. "Post code here to be beaten to death to make it better, smaller, faster or more powerful." I made my version smaller, by removing the stack frame and not using esi or edi.
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 18, 2024, 06:24:11 AM
Well then, by that metric the OP's code is a clear loser.
Interesting, though.
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 06:26:47 AM
For much longer strings, Vortex's use of the lookup table may be faster - but would need testing.
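
A rough way to run that test is to call each routine many times and compare tick counts. This is a sketch only: TimeStrip and its parameter names are made up here, and it assumes the include file makes GetTickCount available (it lives in kernel32, like the ExitProcess used above). RemoveSpaces and nospace both take (source, destination) as stdcall arguments, so one harness fits both:

```asm
; Returns in eax the milliseconds spent calling pfn(pSrc,pDst) nCalls times.
; nCalls must be at least 1.
TimeStrip PROC uses esi ebx pfn:DWORD,pSrc:DWORD,pDst:DWORD,nCalls:DWORD

    invoke  GetTickCount
    mov     esi,eax             ; start time
    mov     ebx,nCalls
@@:
    push    pDst
    push    pSrc
    call    pfn                 ; stdcall: the callee removes both arguments
    sub     ebx,1
    jnz     @b

    invoke  GetTickCount
    sub     eax,esi             ; elapsed milliseconds
    ret

TimeStrip ENDP
```

A call would look like invoke TimeStrip,OFFSET RemoveSpaces,ADDR mystr,ADDR buffer,1000000. GetTickCount is coarse, typically 10 to 16 ms per tick, so the call count has to be large, and a string much longer than the 37-character test string is needed to say anything about long inputs.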
Title: Re: Trimming spaces and tabs inside a string
Post by: jj2007 on June 18, 2024, 06:47:50 AM
Quote from: NoCforMe on June 18, 2024, 04:59:30 AM: "Mine is far simpler"
Yes indeed.

TrimIt proc uses esi edi pString
  mov esi, pString
  mov edi, esi
  .Repeat        ; skip leading whitespace
    lodsb
  .Until al!=9 && al!=32
  stosb
  .Repeat
    .Repeat    ; skip more than one whitespace
        lodsb
        mov dl, [esi]
    .Until al!=9 && al!=32 || dl!=9 && dl!=32
    stosb
  .Until !al
  dec edi
  .Repeat
    dec edi
    mov al, [edi]
  .Until al!=32 && al!=9
  mov byte ptr [edi+1], 0
  ret
TrimIt endp

Original: [    This    Is  A        Test String.  ]
Erol:     [ThisIsATestString.]
nospace:  [ThisIsATestString.]
TrimIt:   [This Is A Test String.]
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 18, 2024, 06:56:29 AM
So yours is really a "trim excess spaces & tabs" function, yes? Which is different from all the other examples so far (which discard all spaces & tabs).
Your routine basically does what HTML does: collapse all whitespace to a single space (except that you trim leading and trailing spaces). A useful function for sure.
And has the advantage of doing the trimming in-place without needing a second buffer.

Question: does your code work correctly if there's a space and a tab next to each other?
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 07:42:16 AM
NoCforMe and jj2007, what is the value in eax when your code finishes? It should contain the length of the string after processing (sans spaces and/or tabs). I had tested my results (using 'nospace') and compared it to Vortex's results, so that my code is equivalent to his (using a different method).
Title: Re: Trimming spaces and tabs inside a string
Post by: jj2007 on June 18, 2024, 07:55:11 AM
Quote from: NoCforMe on June 18, 2024, 06:56:29 AM: "does your code work correctly if there's a space and a tab next to each other?"

Yes.
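
Tracing TrimIt by hand on a mixed run shows what that means in practice: from each run of whitespace the routine keeps the last character, whatever it is, so a run that ends in a tab leaves a tab rather than a space. A fragment to try this, assuming TrimIt from the post above is in the same module; the name mixed is made up:

```asm
.data
; 'A', space, tab, space, 'B', tab, tab, 'C'
mixed       db 'A',32,9,32,'B',9,9,'C',0

.code
    ; somewhere in the startup code:
    invoke  TrimIt,ADDR mixed   ; works in place, so the string must be writable
    ; traced by hand, mixed now holds: 'A',32,'B',9,'C',0
```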
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 18, 2024, 09:01:26 AM
Quote from: sudoku on June 18, 2024, 07:42:16 AM: "NoCforMe and jj2007, what is the value in eax when your code finishes?"
It's AL = last character seen, the rest "undefined".
Quote: "It should contain the length of the string after processing (sans spaces and/or tabs)."
Sez who? I don't remember seeing that as one of the requirements for this function.
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 09:04:06 AM
Quote from: NoCforMe on June 18, 2024, 09:01:26 AM: "Sez who? I don't remember seeing that as one of the requirements for this function."
I guess that you didn't really look too hard at Vortex's code.

From example 1
finish:

    mov     eax,edi
    sub     eax,1
    sub     eax,buff ; <---- here
    ret


from example 2
finish:

    mov    eax,edx
    sub    eax,1
    sub    eax,buff ; <---- here
    ret
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 18, 2024, 09:07:37 AM
Nope, I missed that.
Modified my code above to return the trimmed length.
Happy?
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 09:10:58 AM
Quote from: NoCforMe on June 18, 2024, 09:07:37 AM: "Happy?"
As a clam.  :smiley:
Title: Re: Trimming spaces and tabs inside a string
Post by: TimoVJL on June 18, 2024, 09:06:29 PM
Quote from: sudoku on June 18, 2024, 09:10:58 AM:
    Quote from: NoCforMe on June 18, 2024, 09:07:37 AM: "Happy?"
    "As a clam.  :smiley:"
is clam clam.s or clam.asm ?   :tongue:
Also is an as a linux as or windows as.exe ?  :tongue:
Title: Re: Trimming spaces and tabs inside a string
Post by: zedd on June 18, 2024, 09:51:55 PM
Quote from: TimoVJL on June 18, 2024, 09:06:29 PM: "Also is an as a linux as or windows as.exe ?  :tongue:"
You do have a sense of humor.   :greenclp:
No, not using "as" as assembler here.  :tongue:  therefore I am not assembling said clam.  :toothy:
Title: Re: Trimming spaces and tabs inside a string
Post by: Vortex on June 19, 2024, 04:51:36 AM
Another version :

- Removed conditional setnz instructions.
- No need of ebx.

include     RemoveSpaces.inc

.data

mystr       db '    This    Is   A       Test String.',0
msg         db '%s',13,10,'Length of string = %u',0

.data?

buffer      db 256 dup(?)
buffer2     db 64 dup(?)                ; the formatted text needs 42 bytes, so 32 is too small

.code

RemoveSpaces PROC str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al
    add     ecx,1

    xor     al,32
    mov     ah,al
    xor     al,41
    and     ah,al
    add     ah,0FFh
    adc     edx,0

    cmp     al,9
    jne     @b

finish:

    mov     eax,edx
    sub     eax,buff
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer

    invoke  wsprintf,ADDR buffer2,\
            ADDR msg,ADDR buffer,eax

    invoke  StdOut,ADDR buffer2
    invoke  ExitProcess,0

END start

Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 19, 2024, 06:02:51 AM
So can you explain to us how this works?
    xor    al,32
    mov    ah,al
    xor    al,41
    and    ah,al
    add    ah,0FFh
    adc    edx,0

    cmp    al,9
    jne    @b
It may be obvious to you, but it sure isn't to me.
Title: Re: Trimming spaces and tabs inside a string
Post by: Vortex on June 19, 2024, 06:36:19 AM
Hi NoCforMe,

    xor    al,32 ; al becomes zero if the character was a space (32), non-zero otherwise
    mov    ah,al ; keep that result in ah

    xor    al,41 ; To get the original value of al back we could do xor al,32 again, followed by
                   xor al,9 to test for a tab. The two XORs combine into one : 32 xor 9 = 41, so
                   xor ( xor al,32 ) , 9 = xor al,41
                   al becomes zero if the character was a tab (9), non-zero otherwise

    and    ah,al ; This AND reduces the two results ( ah and al ) to one.
                   The possible combinations :
                   ah=0 , al=non-zero , and ah,al -> 0
                   ah=non-zero , al=0 , and ah,al -> 0
                   ah=non-zero , al=non-zero , and ah,al -> assumed non-zero (not guaranteed : two
                   non-zero values with no set bits in common also AND to 0)

    add    ah,0FFh ; if (and ah,al) = 0 => 0+255 = 255 => no carry, the carry flag is 0.
                     if (and ah,al) = non-zero => <non-zero value> + 255 does not fit in 8 bits,
                     which sets the carry flag to 1

    adc    edx,0   ; Carry flag = 0 => the original value of al was 32 or 9 and the character is
                     skipped : edx + 0 + carry flag 0 = edx
                   ; Carry flag = 1 => the original value of al was not 32 or 9 and the character is
                     kept in the buffer pointed to by edx : edx + 0 + carry flag 1 = edx+1
    cmp    al,9    ; al now holds the original character xor 9, so the NULL terminator shows up
                     as ASCII 9
    jne    @b      ; If al != 9, jump back to the beginning of the loop.
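
The add ah,0FFh / adc edx,0 pair is a general branch-free way to add 1 when a byte is non-zero. A small sketch of the idiom on its own, where it is exact because only a single byte is tested; CountNonZero and its parameter names are made up for illustration:

```asm
; Returns in eax the number of non-zero bytes among the first cbBuf bytes at pBuf.
CountNonZero PROC uses esi pBuf:DWORD,cbBuf:DWORD

    mov     esi,pBuf
    mov     ecx,cbBuf
    xor     edx,edx             ; running count
    test    ecx,ecx
    jz      done                ; empty buffer
@@:
    mov     al,[esi]
    add     esi,1
    add     al,0FFh             ; carry is set only if al was non-zero
    adc     edx,0               ; count = count + carry
    sub     ecx,1
    jnz     @b
done:
    mov     eax,edx
    ret

CountNonZero ENDP
```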
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 19, 2024, 07:01:08 AM
Thanks! Great explanation.
Title: Re: Trimming spaces and tabs inside a string
Post by: interfind on June 24, 2024, 06:35:51 AM
Is there a problem in RemoveSpaces with the characters ! ( ) in the result string?
Title: Re: Trimming spaces and tabs inside a string
Post by: Vortex on June 24, 2024, 06:52:35 AM
Hello,

Thanks for your feedback, you are right. The previous version (the one using setnz) seems to work fine; I tested it with the exclamation mark (!).

https://masm32.com/board/index.php?msg=131754

RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
    xor     ebx,ebx
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al
    add     ecx,1

    xor     al,32
    setnz   ah
    xor     al,41
    setnz   bl
    and     bl,ah
    add     edx,ebx

    cmp     al,9
    jnz     @b

finish:

    mov     eax,edx
    sub     eax,1
    sub     eax,buff
    ret

RemoveSpaces ENDP
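
Why '!', '(' and ')' go missing in the and ah,al version can be seen in the arithmetic. That version needs (c xor 32) and (c xor 9) to be non-zero for every character c it keeps, but two non-zero values with no bits in common also AND to zero. Working through the bits, this happens for exactly eight byte values: 00h, 01h, 08h, 09h, 20h, 21h, 28h and 29h, which includes '!' (21h), '(' (28h) and ')' (29h). The setnz version reduces each test to 0 or 1 before the AND, so it does not have this problem. A sketch that isolates the test; AndTrickSkips and byteVal are names made up for this example:

```asm
; Returns eax = 1 if the xor/and test would skip byteVal, 0 if it would keep it.
AndTrickSkips PROC byteVal:DWORD

    mov     eax,byteVal
    xor     al,32               ; al = c xor 32
    mov     ah,al
    xor     al,41               ; al = c xor 9
    and     ah,al               ; ah = (c xor 32) and (c xor 9)
    cmp     ah,1                ; carry is set only if ah = 0
    sbb     eax,eax             ; eax = -1 if ah was 0, else 0
    neg     eax                 ; eax = 1 if skipped, else 0
    ret

AndTrickSkips ENDP

; Worked values:
;   '!' = 21h : 21h xor 20h = 01h, 21h xor 09h = 28h, 01h and 28h = 00h -> skipped
;   '(' = 28h : 28h xor 20h = 08h, 28h xor 09h = 21h, 08h and 21h = 00h -> skipped
;   ')' = 29h : 29h xor 20h = 09h, 29h xor 09h = 20h, 09h and 20h = 00h -> skipped
;   'A' = 41h : 41h xor 20h = 61h, 41h xor 09h = 48h, 61h and 48h = 40h -> kept
```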
Title: Re: Trimming spaces and tabs inside a string
Post by: NoCforMe on June 24, 2024, 08:25:20 AM
@Vortex: questions for you:
1. WHY?
I get that your routine is very, very clever, but is that the reason you coded it? To prove how tricky you can be? I still can't really see the advantage over my admittedly somewhat dumbass (meaning straightforward) approach to the problem. Maybe your code is faster, I'll grant that, but in most cases, does that really matter?

2. HOW?
How did you come up with this way of stripping characters? All that XORing and stuff; did you come up with this on your own? or did you see this code somewhere?
If you came up with it on your own, how did you work this out?

Curious minds want to know.