Hello,
Functions to remove spaces and tabs from a string:
include RemoveSpaces.inc
.data
lookupTbl db 1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
mystr db ' This Is A Test String.',0
msg db '%s',13,10,'Length of string = %u',0
.data?
buffer db 64 dup(?)
buffer2 db 128 dup(?) ; the formatted message needs more than 32 bytes
.code
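RemoveSpaces PROC uses edi ebx str1:DWORD,buff:DWORD

    mov     ebx,OFFSET lookupTbl
    mov     ecx,str1                ; ecx -> source string
    mov     edi,buff                ; edi -> destination buffer
@@:
    movzx   eax,BYTE PTR [ecx]      ; fetch the next character
    movzx   edx,BYTE PTR [ebx+eax]  ; edx = 0 for space/tab, 1 otherwise
    mov     BYTE PTR [edi],al       ; always store the character
    add     ecx,1
    add     edi,edx                 ; advance only if the character is kept
    test    eax,eax
    jnz     @b                      ; loop until the NULL terminator is copied
finish:
    mov     eax,edi
    sub     eax,1                   ; edi is one past the terminator
    sub     eax,buff                ; eax = length of the result
    ret

RemoveSpaces ENDP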
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke wsprintf,ADDR buffer2,\
ADDR msg,ADDR buffer,eax
invoke StdOut,ADDR buffer2
invoke ExitProcess,0
END start
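For anyone who wants to check the table-driven idea outside the assembler, here is a rough C model of the loop above. It is only a sketch: the names keep_tbl, init_keep_tbl and remove_spaces_tbl are made up for this example, and the table is filled at run time instead of being typed out by hand. As in the assembly, every byte is stored and the destination pointer only advances when the table says the byte is kept.

```c
#include <stddef.h>

/* Hypothetical C model of the table-driven loop; all names are illustrative. */
unsigned char keep_tbl[256];

/* Fill the table: 1 = keep the byte, 0 = drop it (tab and space). */
void init_keep_tbl(void)
{
    for (int i = 0; i < 256; i++)
        keep_tbl[i] = 1;
    keep_tbl['\t'] = 0;   /* ASCII 9  */
    keep_tbl[' ']  = 0;   /* ASCII 32 */
}

/* Copies src to dst without spaces and tabs and returns the new length.
   dst must have room for strlen(src) + 1 bytes. */
size_t remove_spaces_tbl(const char *src, char *dst)
{
    char *out = dst;
    unsigned char c;
    do {
        c = (unsigned char)*src++;
        *out = (char)c;          /* always store ...                    */
        out += keep_tbl[c];      /* ... but only advance for kept bytes */
    } while (c != 0);
    return (size_t)(out - dst) - 1;  /* out ended one past the terminator */
}
```

Call init_keep_tbl() once before the first use. The final "- 1" mirrors the sub eax,1 in the assembly: the terminator counts as a kept byte in the table, so the pointer ends up one past it.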
Another version without a lookup table:
include RemoveSpaces.inc
.data
mystr db ' This Is A Test String.',0
msg db '%s',13,10,'Length of string = %u',0
.data?
buffer db 256 dup(?)
buffer2 db 128 dup(?) ; the formatted message needs more than 32 bytes
.code
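RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1                ; ecx -> source string
    mov     edx,buff                ; edx -> destination buffer
    xor     ebx,ebx                 ; clear the upper bits of ebx once
@@:
    movzx   eax,BYTE PTR [ecx]      ; fetch the next character
    mov     BYTE PTR [edx],al       ; always store it
    add     ecx,1
    xor     al,32                   ; al = 0 only for a space
    setnz   ah                      ; ah = 1 if not a space
    xor     al,41                   ; al = original xor 9, 0 only for a tab
    setnz   bl                      ; bl = 1 if not a tab
    and     bl,ah                   ; bl = 1 if the character is kept
    add     edx,ebx                 ; advance only for kept characters
    cmp     al,9                    ; the NULL terminator has become 9
    jnz     @b
finish:
    mov     eax,edx
    sub     eax,1                   ; edx is one past the terminator
    sub     eax,buff                ; eax = length of the result
    ret

RemoveSpaces ENDP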
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke wsprintf,ADDR buffer2,\
ADDR msg,ADDR buffer,eax
invoke StdOut,ADDR buffer2
invoke ExitProcess,0
END start
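The version without a table leans on two XOR facts: after xor al,32 the byte is zero only for a space, and since 32 xor 41 = 9, the following xor al,41 leaves the original byte xor 9, which is zero only for a tab and equals 9 only for the NULL terminator. Here is a hedged C model of that loop, with a made-up name (remove_spaces_setnz) and the corresponding instructions in the comments. It is a sketch for experimenting, not a drop-in replacement.

```c
#include <stddef.h>

/* Hypothetical C model of the setnz-based loop. */
size_t remove_spaces_setnz(const char *src, char *dst)
{
    char *out = dst;
    unsigned char al;
    do {
        al = (unsigned char)*src++;
        *out = (char)al;                 /* mov [edx],al              */
        al ^= 32;                        /* zero only for a space     */
        unsigned not_space = (al != 0);  /* setnz ah                  */
        al ^= 41;                        /* now original ^ 9          */
        unsigned not_tab = (al != 0);    /* setnz bl                  */
        out += not_space & not_tab;      /* and bl,ah / add edx,ebx   */
    } while (al != 9);                   /* original NUL has become 9 */
    return (size_t)(out - dst) - 1;      /* out is one past the NUL   */
}
```

Because not_space and not_tab are each exactly 0 or 1, the bitwise AND behaves like a logical AND here.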
Is that really better than this?
        MOV/LEA ESI, source
        MOV/LEA EDI, dest
        XOR     EDX, EDX
skip:   LODSB
        TEST    AL, AL
        JZ      done
        CMP     AL, ' '
        JE      skip
        CMP     AL, $tab        ;$tab is assumed to be an equate for 9.
        JE      skip
        STOSB
        INC     EDX
        JMP     skip
done:   STOSB                   ;Null-terminate result.
        MOV     EAX, EDX        ;Return w/trimmed len.
        . . .
Mine is far simpler, anyhow. No sexy tricks, though.
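OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

nospace proc string:dword, bufferx:dword

    mov     ecx, [esp+4]            ; ecx -> source string
    mov     eax, [esp+8]            ; eax -> destination buffer
    dec     ecx
@@:
    inc     ecx
    cmp     byte ptr [ecx], 0
    jz      @f                      ; end of string
    cmp     byte ptr [ecx], 20h
    jz      @b                      ; skip a space
    cmp     byte ptr [ecx], 9
    jz      @b                      ; skip a tab
    mov     dl, [ecx]
    mov     [eax], dl               ; copy everything else
    inc     eax
    jmp     @b
@@:
    mov     byte ptr [eax], 0       ; null-terminate the result
    sub     eax, [esp+8]            ; eax = length of the result
    ret     8

nospace endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef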
:biggrin: A different approach to achieve the same, no stack frame.
Yes. Your code is basically a paraphrase of mine.
So what's wrong with stack frames?
Quote from: NoCforMe on June 18, 2024, 05:44:31 AM
Yes. Your code is basically a paraphrase of mine.
Not really.
Quote: So what's wrong with stack frames?
Saves a few bytes. :smiley: and mine does not use esi, edi or ebx and of course, ebp. Hence, no need to preserve registers - which saves a couple more bytes.
Quote from: sudoku on June 18, 2024, 05:53:09 AM
Quote from: NoCforMe on June 18, 2024, 05:44:31 AM
Yes. Your code is basically a paraphrase of mine.
Not really.
Sure it is:
o Look @ next character:
o Zero? done
o Space or tab? skip
o Else store it
Quote: So what's wrong with stack frames?
Quote: Saves a few bytes. :smiley: and mine does not use esi, edi or ebx and of course, ebp. Hence, no need to preserve registers - which saves a couple more bytes.
Yawn. Color me unimpressed. Do you really count code bytes in your programs?
Quote from: NoCforMe on June 18, 2024, 06:07:11 AM
Yawn. Color me unimpressed.
:rolleyes: I am not surprised.
Quote: Do you actually count code bytes in your programs?
No, but this is The Laboratory after all: "Post code here to be beaten to death to make it better, smaller, faster or more powerful." I made my version smaller by removing the stack frame and not using esi or edi.
Well then, by that metric the OP's code is a clear loser.
Interesting, though.
For much longer strings, Vortex's use of the lookup table may be faster - but would need testing.
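Since the speed question needs measuring, here is one way a quick comparison could be set up in C. Everything in it is an assumption for illustration: strip_fn, strip_branchy and time_strip are invented names, the baseline is the plain compare-and-branch loop, and clock() is coarse, so a long input and many repetitions would be needed before the numbers mean anything.

```c
#include <stddef.h>
#include <time.h>

/* Any routine with this shape can be timed: returns the stripped length. */
typedef size_t (*strip_fn)(const char *src, char *dst);

/* Baseline: the plain compare-and-branch loop. */
size_t strip_branchy(const char *src, char *dst)
{
    char *out = dst;
    for (; *src != '\0'; src++)
        if (*src != ' ' && *src != '\t')
            *out++ = *src;
    *out = '\0';
    return (size_t)(out - dst);
}

/* Runs fn over src `reps` times and returns the elapsed CPU seconds. */
double time_strip(strip_fn fn, const char *src, char *dst, int reps)
{
    volatile size_t sink = 0;   /* keeps the calls from being optimised away */
    clock_t t0 = clock();
    for (int i = 0; i < reps; i++)
        sink += fn(src, dst);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

A table-driven or branchless variant with the same signature can be passed to time_strip in place of strip_branchy, using the same input and repetition count for each.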
Quote from: NoCforMe on June 18, 2024, 04:59:30 AM
Mine is far simpler
Yes indeed.
TrimIt proc uses esi edi pString
    mov esi, pString
    mov edi, esi
    .Repeat                 ; skip leading whitespace
        lodsb
    .Until al!=9 && al!=32
    stosb
    .Repeat
        .Repeat             ; skip more than one whitespace
            lodsb
            mov dl, [esi]
        .Until al!=9 && al!=32 || dl!=9 && dl!=32
        stosb
    .Until !al
    dec edi
    .Repeat
        dec edi
        mov al, [edi]
    .Until al!=32 && al!=9
    mov byte ptr [edi+1], 0
    ret
TrimIt endp
Original: [ This Is A Test String. ]
Erol: [ThisIsATestString.]
nospace: [ThisIsATestString.]
TrimIt: [This Is A Test String.]
So yours is really a "trim excess spaces & tabs" function, yes? Which is different from all the other examples so far (which discard all spaces & tabs).
Your routine basically does what HTML does: collapse all whitespace to a single space (except that you trim leading and trailing spaces). A useful function for sure.
And has the advantage of doing the trimming in-place without needing a second buffer.
Question: does your code work correctly if there's a space and a tab next to each other?
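The behaviour described above (collapse each run of spaces and tabs to a single space, drop leading and trailing whitespace, work in place) can be sketched in C roughly as follows. This is not a translation of TrimIt: collapse_ws and is_ws are names invented for the example, and always writing a space for a run, even one that contains tabs, is an assumption about the desired result.

```c
#include <stddef.h>

static int is_ws(char c) { return c == ' ' || c == '\t'; }

/* Collapses every run of spaces/tabs to one space, in place, and drops
   leading and trailing whitespace. Returns the new length. */
size_t collapse_ws(char *s)
{
    char *in = s, *out = s;
    int pending = 0;               /* a whitespace run has been seen */
    for (; *in != '\0'; in++) {
        if (is_ws(*in)) {
            pending = 1;
        } else {
            if (pending && out != s)
                *out++ = ' ';      /* one space stands in for the run */
            pending = 0;
            *out++ = *in;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Writing in place is safe because the output pointer never overtakes the input pointer: a space is only emitted after at least one whitespace byte has been consumed. A mixed run such as a space followed by a tab is treated like any other run.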
NoCforMe and jj2007, what is the value in eax when your code finishes? It should contain the length of the string after processing (sans spaces and/or tabs). I tested my results (using 'nospace') and compared them with Vortex's results to confirm that my code is equivalent to his (using a different method).
Quote from: NoCforMe on June 18, 2024, 06:56:29 AM
does your code work correctly if there's a space and a tab next to each other?
Yes.
Quote from: sudoku on June 18, 2024, 07:42:16 AM
NoCforMe and jj2007, what is the value in eax when your code finishes?
It's AL = last character seen, the rest "undefined".
Quote: It should contain the length of the string after processing (sans spaces and/or tabs).
Sez who? I don't remember seeing that as one of the requirements for this function.
Quote from: NoCforMe on June 18, 2024, 09:01:26 AM
Sez who? I don't remember seeing that as one of the requirements for this function.
I guess that you didn't really look too hard at Vortex's code.
From example 1
finish:
mov eax,edi
sub eax,1
sub eax,buff ; <---- here
ret
from example 2
finish:
mov eax,edx
sub eax,1
sub eax,buff ; <---- here
ret
Nope, I missed that.
Modified my code above to return the trimmed length.
Happy?
Quote from: sudoku on June 18, 2024, 09:10:58 AM
Quote from: NoCforMe on June 18, 2024, 09:07:37 AM
Happy?
As a clam. :smiley:
is clam clam.s or clam.asm ? :tongue:
Also is an as a linux as or windows as.exe ? :tongue:
Quote from: TimoVJL on June 18, 2024, 09:06:29 PM
Also is an as a linux as or windows as.exe ? :tongue:
You do have a sense of humor. :greenclp:
No, not using "as" as assembler here. :tongue: therefore I am not assembling said clam. :toothy:
Another version:
- Removed the conditional setnz instructions.
- No need for ebx.
include RemoveSpaces.inc
.data
mystr db ' This Is A Test String.',0
msg db '%s',13,10,'Length of string = %u',0
.data?
buffer db 256 dup(?)
buffer2 db 128 dup(?) ; the formatted message needs more than 32 bytes
.code
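RemoveSpaces PROC str1:DWORD,buff:DWORD

    mov     ecx,str1                ; ecx -> source string
    mov     edx,buff                ; edx -> destination buffer
@@:
    movzx   eax,BYTE PTR [ecx]      ; fetch the next character
    mov     BYTE PTR [edx],al       ; always store it
    add     ecx,1
    xor     al,32
    mov     ah,al
    xor     al,41
    and     ah,al                   ; bitwise AND of the two XOR results
    add     ah,0FFh                 ; carry = 1 if ah is non-zero
    adc     edx,0                   ; advance by the carry
    cmp     al,9                    ; the NULL terminator has become 9
    jne     @b
finish:
    mov     eax,edx                 ; edx still points at the terminator
    sub     eax,buff                ; eax = length of the result
    ret

RemoveSpaces ENDP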
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke wsprintf,ADDR buffer2,\
ADDR msg,ADDR buffer,eax
invoke StdOut,ADDR buffer2
invoke ExitProcess,0
END start
So can you explain to us how this works?
xor al,32
mov ah,al
xor al,41
and ah,al
add ah,0FFh
adc edx,0
cmp al,9
jne @b
It may be obvious to you, but it sure isn't to me.
Hi NoCforMe,
xor al,32     ; XORing al with 32 has two possible results: zero (the character was a space)
              ; or some other value.
mov ah,al     ; Copy al to ah.
xor al,41     ; To get back the original value of al we could do xor al,32 again, followed by
              ; xor al,9. Combining the two XOR operations removes the extra xor al,32,
              ; because 32 xor 9 = 41:
              ; xor ( xor al,32 ) , 9 = xor al,41
              ; XORing al with 41 has two possible results: zero (the character was a tab)
              ; or some other value.
and ah,al     ; This AND operation reduces the two results ( ah and al ) to one.
              ; The possible combinations:
              ; ah=0 , al=non-zero , and ah,al -> 0
              ; ah=non-zero , al=0 , and ah,al -> 0
              ; ah=non-zero , al=non-zero , and ah,al -> non-zero
add ah,0FFh   ; If (and ah,al) = 0 => 0+255 = 255 => the carry flag is zero.
              ; If (and ah,al) = non-zero => <non-zero value> + 255 wraps past 255, which sets
              ; the carry flag to 1.
adc edx,0     ; Carry flag = 0 => the original value of al was 32 or 9 and it should be skipped:
              ; edx + 0 + carry flag 0 = edx
              ; Carry flag = 1 => the original value of al was not 32 or 9 and this character
              ; should be kept in the buffer pointed to by edx: edx + 0 + carry flag 1 = edx+1
cmp al,9      ; After the previous operations xor al,32 and xor al,41, the NULL terminator has
              ; been converted to ASCII 9.
jne @b        ; If al != 9, jump back to the beginning of the loop.
Thanks! Great explanation.
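The add ah,0FFh / adc edx,0 pair is the least obvious part, so here is a tiny C check of just that step. It is only a model: carry_of_add_ff and carry_matches_nonzero are hypothetical names, and the 8-bit addition is imitated with a wider unsigned value so the carry shows up as bit 8.

```c
/* Models `add ah,0FFh`: returns the carry out of the 8-bit addition. */
unsigned carry_of_add_ff(unsigned char ah)
{
    unsigned sum = (unsigned)ah + 0xFFu;  /* 9-bit result            */
    return sum >> 8;                      /* bit 8 is the carry flag */
}

/* Checks that the carry equals (ah != 0) for all 256 byte values. */
int carry_matches_nonzero(void)
{
    for (unsigned v = 0; v < 256; v++)
        if (carry_of_add_ff((unsigned char)v) != (unsigned)(v != 0))
            return 0;
    return 1;
}
```

In other words, the addition turns "ah is non-zero" into a 0-or-1 carry, and adc edx,0 then adds that carry to the destination pointer without a branch.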
Is there a problem in RemoveSpaces with the characters ! ( ) in the result string?
Hello,
Thanks for your feedback, you are right. The previous version (the one using setnz) seems to work fine; I tested it with the exclamation mark ( ! ):
https://masm32.com/board/index.php?msg=131754
RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD
mov ecx,str1
mov edx,buff
xor ebx,ebx
@@:
movzx eax,BYTE PTR [ecx]
mov BYTE PTR [edx],al
add ecx,1
xor al,32
setnz ah
xor al,41
setnz bl
and bl,ah
add edx,ebx
cmp al,9
jnz @b
finish:
mov eax,edx
sub eax,1
sub eax,buff
ret
RemoveSpaces ENDP
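The difference between the two variants can be reproduced with a small C model that compares them byte by byte. The function names are invented for this sketch. The point it illustrates is that and ah,al is a bitwise AND, so two non-zero values with no set bit in common still give zero, whereas the setnz version first reduces each value to 0 or 1.

```c
#include <stddef.h>

/* Non-zero if the bitwise-AND variant would keep byte c. */
int kept_by_bitwise_and(unsigned char c)
{
    unsigned char ah = (unsigned char)(c ^ 32);  /* xor al,32 / mov ah,al    */
    unsigned char al = (unsigned char)(c ^ 9);   /* xor al,41 (32 ^ 41 == 9) */
    return (ah & al) != 0;                       /* and ah,al / add ah,0FFh  */
}

/* Non-zero if the setnz variant would keep byte c. */
int kept_by_setnz(unsigned char c)
{
    return ((c ^ 32) != 0) & ((c ^ 9) != 0);
}

/* Stores the non-NUL bytes on which the two variants disagree in out
   (room for 255 entries) and returns how many there are. */
size_t list_disagreements(unsigned char *out)
{
    size_t n = 0;
    for (unsigned c = 1; c < 256; c++)
        if (kept_by_bitwise_and((unsigned char)c) != kept_by_setnz((unsigned char)c))
            out[n++] = (unsigned char)c;
    return n;
}
```

Under this model the variants disagree on five non-NUL bytes: 1, 8, 33 ( ! ), 40 ( ( ) and 41 ( ) ). These are the values built only from bits 0, 3 and 5, which is consistent with the characters reported above.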
@Vortex: questions for you:
1. WHY?
I get that your routine is very, very clever, but is that the reason you coded it? To prove how tricky you can be? I still can't really see the advantage over my admittedly somewhat dumbass (meaning straightforward) approach to the problem. Maybe your code is faster, I'll grant that, but in most cases, does that really matter?
2. HOW?
How did you come up with this way of stripping characters? All that XORing and stuff: did you come up with this on your own, or did you see this code somewhere?
If you came up with it on your own, how did you work this out?
Curious minds want to know.