String reverse algo

hutch-- · May 07, 2021, 09:34:39 PM

I don't get much time to write 32 bit code these days but since its prototype was wasted in the campus, here is a derivation of the first version that is self contained and full register design. Nothing to test it against but it runs OK.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *

tRev PROTO pstr:DWORD

.data
numz db "1234567890",0

.code

start:

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

call main
inkey
exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

LOCAL rval :DWORD

invoke tRev,ADDR numz ; reverse the number above
mov rval, eax

print rval,13,10

ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

tRev proc pstr:DWORD
; ---------------------------------------------
; in place string reverse, fully self contained
; ---------------------------------------------
push ebx ; save EBX
push edi
push esi

mov edi, [ebp+8]
mov eax, edi
sub eax, 1

@@: ; unroll by 4
REPEAT 3
add eax, 1 ; get length
cmp BYTE PTR [eax], 0 ; loop to buffer end
je @F
ENDM
add eax, 1 ; get length
cmp BYTE PTR [eax], 0 ; loop to buffer end
jne @B

@@:
sub eax, edi ; sub address from eax for len
mov esi, eax ; store length in esi
shr eax, 1 ; get integer half length of string
mov ebx, eax ; store it in variable
mov eax, edi ; eax has start char
mov edx, edi ; load start address
add edx, esi ; get end character address
sub edx, 1 ; sub 1 to get end char
mov esi, ebx ; copy half len into edi

@@:
movzx ebx, BYTE PTR [eax] ; load start char
movzx ecx, BYTE PTR [edx] ; load end char
mov [eax], cl ; reverse char pair
add eax, 1 ; increment start char
mov [edx], bl
sub edx, 1 ; decrement end char
sub esi, 1 ; sub 1 from half length
jnz @B ; loop back if its not zero

mov eax, edi ; return the address in eax

pop esi
pop edi
pop ebx ; restore EBX

ret 4 ; balance the stack by 4 bytes

tRev endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

jj2007 · May 07, 2021, 09:53:37 PM

Since we are in the Lab

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

25756   cycles for 100 * tRev (Hutch)
33742   cycles for 100 * strrev (CRT)
13687   cycles for 100 * jRev (jj)

24425   cycles for 100 * tRev (Hutch)
33634   cycles for 100 * strrev (CRT)
13737   cycles for 100 * jRev (jj)

24779   cycles for 100 * tRev (Hutch)
33666   cycles for 100 * strrev (CRT)
13671   cycles for 100 * jRev (jj)

24694   cycles for 100 * tRev (Hutch)
33700   cycles for 100 * strrev (CRT)
13758   cycles for 100 * jRev (jj)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
51      bytes for jRev (jj)

1819043144      = eax tRev (Hutch)
1819043144      = eax strrev (CRT)
1819043144      = eax jRev (jj)

FORTRANS · May 07, 2021, 11:28:11 PM

Hi,

Two results.

Code Select

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

25531	cycles for 100 * tRev (Hutch)
38912	cycles for 100 * strrev (CRT)
14510	cycles for 100 * jRev (jj)

25745	cycles for 100 * tRev (Hutch)
38790	cycles for 100 * strrev (CRT)
14539	cycles for 100 * jRev (jj)

25463	cycles for 100 * tRev (Hutch)
39089	cycles for 100 * strrev (CRT)
14285	cycles for 100 * jRev (jj)

25515	cycles for 100 * tRev (Hutch)
38875	cycles for 100 * strrev (CRT)
14330	cycles for 100 * jRev (jj)

107	bytes for tRev (Hutch)
19	bytes for strrev (CRT)
51	bytes for jRev (jj)

1819043144	= eax tRev (Hutch)
1819043144	= eax strrev (CRT)
1819043144	= eax jRev (jj)

--- ok ---
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

37301	cycles for 100 * tRev (Hutch)
65684	cycles for 100 * strrev (CRT)
24447	cycles for 100 * jRev (jj)

37206	cycles for 100 * tRev (Hutch)
65876	cycles for 100 * strrev (CRT)
24418	cycles for 100 * jRev (jj)

37151	cycles for 100 * tRev (Hutch)
65790	cycles for 100 * strrev (CRT)
24463	cycles for 100 * jRev (jj)

37155	cycles for 100 * tRev (Hutch)
65919	cycles for 100 * strrev (CRT)
24468	cycles for 100 * jRev (jj)

107	bytes for tRev (Hutch)
19	bytes for strrev (CRT)
51	bytes for jRev (jj)

1819043144	= eax tRev (Hutch)
1819043144	= eax strrev (CRT)
1819043144	= eax jRev (jj)

--- ok ---

Cheers,

Steve N.

jj2007 · May 08, 2021, 12:26:26 AM

Thanks, Steve. Here is another one called xRev, using pshufb:

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

24424   cycles for 100 * tRev (Hutch)
33751   cycles for 100 * strrev (CRT)
13704   cycles for 100 * jRev (jj)
14602   cycles for 100 * Mirror$ (MasmBasic)
4662    cycles for 100 * xRev (jj)

24513   cycles for 100 * tRev (Hutch)
33951   cycles for 100 * strrev (CRT)
13689   cycles for 100 * jRev (jj)
14589   cycles for 100 * Mirror$ (MasmBasic)
4659    cycles for 100 * xRev (jj)

24555   cycles for 100 * tRev (Hutch)
33987   cycles for 100 * strrev (CRT)
13773   cycles for 100 * jRev (jj)
14670   cycles for 100 * Mirror$ (MasmBasic)
4677    cycles for 100 * xRev (jj)

24750   cycles for 100 * tRev (Hutch)
33709   cycles for 100 * strrev (CRT)
13749   cycles for 100 * jRev (jj)
14620   cycles for 100 * Mirror$ (MasmBasic)
4721    cycles for 100 * xRev (jj)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
51      bytes for jRev (jj)
12      bytes for Mirror$ (MasmBasic)
87      bytes for xRev (jj)

1819043144      = eax tRev (Hutch)
1819043144      = eax strrev (CRT)
1819043144      = eax jRev (jj)
2053468783      = eax Mirror$ (MasmBasic)
1819043144      = eax xRev (jj)

Note Mirror$() and Caballero's algo write to a buffer, i.e. they are not in-place algos.

Do not use for serious stuff - this is work in progress and not fully tested

P.S.: I've added Lingo's algo (fast but crashes for unaligned strings), and Caballero's contribution.

daydreamer · May 08, 2021, 03:22:06 AM

wouldnt it be more standard today with unicode 16 reverse string?

hutch-- · May 08, 2021, 03:30:07 AM

Write one for us Magnus.

FORTRANS · May 08, 2021, 04:52:04 AM

Hi,

One result and a partial.

Code Select

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

25930	cycles for 100 * tRev (Hutch)
39465	cycles for 100 * strrev (CRT)
14214	cycles for 100 * jRev (jj)
17288	cycles for 100 * Mirror$ (MasmBasic)
5350	cycles for 100 * xRev (jj, needs pshufb)
6834	cycles for 100 * RevLingo (aligned strings only)
113656	cycles for 100 * cRev (caballero)

26047	cycles for 100 * tRev (Hutch)
40261	cycles for 100 * strrev (CRT)
14364	cycles for 100 * jRev (jj)
18570	cycles for 100 * Mirror$ (MasmBasic)
5377	cycles for 100 * xRev (jj, needs pshufb)
6637	cycles for 100 * RevLingo (aligned strings only)
114863	cycles for 100 * cRev (caballero)

26578	cycles for 100 * tRev (Hutch)
39065	cycles for 100 * strrev (CRT)
14363	cycles for 100 * jRev (jj)
16243	cycles for 100 * Mirror$ (MasmBasic)
5453	cycles for 100 * xRev (jj, needs pshufb)
6650	cycles for 100 * RevLingo (aligned strings only)
115443	cycles for 100 * cRev (caballero)

26088	cycles for 100 * tRev (Hutch)
39862	cycles for 100 * strrev (CRT)
14335	cycles for 100 * jRev (jj)
16200	cycles for 100 * Mirror$ (MasmBasic)
5371	cycles for 100 * xRev (jj, needs pshufb)
6628	cycles for 100 * RevLingo (aligned strings only)
115281	cycles for 100 * cRev (caballero)

107	bytes for tRev (Hutch)
19	bytes for strrev (CRT)
51	bytes for jRev (jj)
10	bytes for Mirror$ (MasmBasic)
87	bytes for xRev (jj, needs pshufb)
123	bytes for RevLingo (aligned strings only)
56	bytes for cRev (caballero)

1259388704	= eax tRev (Hutch)
1259388704	= eax strrev (CRT)
1259388704	= eax jRev (jj)
1259388704	= eax Mirror$ (MasmBasic)
1259388704	= eax xRev (jj, needs pshufb)
1259388704	= eax RevLingo (aligned strings only)
1259388704	= eax cRev (caballero)

--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

37197	cycles for 100 * tRev (Hutch)
65773	cycles for 100 * strrev (CRT)
24460	cycles for 100 * jRev (jj)
28888	cycles for 100 * Mirror$ (MasmBasic)

Regards,

Steve N.

jj2007 · May 08, 2021, 05:31:16 AM

Thanks, Steve. Apparently your Pentium M doesn't feature pshufb...

avcaballero · May 08, 2021, 05:31:42 AM

Hi, I wonder why is my version soooo slow. As I see, I think the problem is the use of "lstrlen" each time cRev proc is called. If we know that the string is always 100 chars long, we could change the code as follows. It should be faster now.

Code Select


cRev proc uses esi edi pSrc, pDest
  mov esi, pSrc
  ;invoke lstrlen, esi
  ;mov ecx, eax
  mov ecx, 100
  mov edi, pDest
  add edi, ecx
  mov byte ptr [edi], 0
  dec edi
  @Bucle:
  movsb
  sub edi, 2
  loop @Bucle
  ret
cRev endp

jj2007 · May 08, 2021, 12:06:41 PM

The new version is indeed a bit faster. However, all other algos check for the length of the string.

Code Select

cRev proc uses esi edi pSrc, pDest
  mov esi, pSrc
  if 0
	mov ecx, 100	; 75800 cycles for 100 * cRev (caballero)
  else
	invoke lstrlen, esi	; 96500 cycles
	mov ecx, eax
  endif

hutch-- · May 08, 2021, 06:57:24 PM

I don't know the range of designs but any string reverse should be able to handle variable length strings. To be fair to all techniques, ranging from 2 or 3 characters to a couple of hundred characters would give you more useful results.

avcaballero · May 08, 2021, 07:56:51 PM

I have done some testings. It seems to penalize a lot using movs and even loop. Finally the best of all is using dwords as much as possible. All of them use the str length.

It was a bit hard to me compile jj's code, so I created my own to test it.

These are the results:

Code Select


Test Algorithm 7:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 6:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 5:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 4:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 3:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 2:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 1:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Timming Algorithm 7: 94
Test Timming Algorithm 6: 78
Test Timming Algorithm 5:  109
Test Timming Algorithm 4:  219
Test Timming Algorithm 3:  203
Test Timming Algorithm 2:  313
Test Timming Algorithm 1:  312

Code Select


algo 1: lstrlen, movsb, loop ecx
algo 2: Longitud (my own subroutine to calc str_len), movsb, loop ecx
algo 3: lstrlen, inc/dec, loop ecx
algo 4: Longitud, inc/dec, loop ecx
algo 5: Longitud, inc/dec, dec ecx
algo 6: lstrlen, dwords, dec ecx
algo 7: Longitud, dwords, dec ecx

hutch-- · May 08, 2021, 08:25:27 PM

caballero,

Try something like this.

mov eax, pTxt
sub eax, 1
@@:
add eax, 1
cmp BYTE PTR [eax], 0
jne @B
sub eax, pTxt

Length is in eax.

avcaballero · May 08, 2021, 08:41:30 PM

@Hutch
Yes, that's faster

. It definitely doesn't seem like a good idea to use movs niether scas

nidud · May 08, 2021, 08:42:56 PM

deleted

The MASM Forum

News:

String reverse algo

hutch--

jj2007

FORTRANS

jj2007

daydreamer

hutch--

FORTRANS

jj2007

avcaballero

jj2007

hutch--

avcaballero

hutch--

avcaballero

nidud