The MASM Forum

General => The Laboratory => Topic started by: hutch-- on May 07, 2021, 09:34:39 PM

Title: String reverse algo
Post by: hutch-- on May 07, 2021, 09:34:39 PM
I don't get much time to write 32 bit code these days but since its prototype was wasted in the campus, here is a derivation of the first version that is self contained and full register design. Nothing to test it against but it runs OK.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    tRev PROTO pstr:DWORD

    .data
      numz db "1234567890",0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL rval :DWORD

    invoke tRev,ADDR numz       ; reverse the number above
    mov rval, eax

    print rval,13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

tRev proc pstr:DWORD
  ; ---------------------------------------------
  ; in place string reverse, fully self contained
  ; ---------------------------------------------
    push ebx                            ; save EBX
    push edi
    push esi

    mov edi, [ebp+8]
    mov eax, edi
    sub eax, 1

  @@:                                   ; unroll by 4
  REPEAT 3
    add eax, 1                          ; get length
    cmp BYTE PTR [eax], 0               ; loop to buffer end
    je @F
  ENDM
    add eax, 1                          ; get length
    cmp BYTE PTR [eax], 0               ; loop to buffer end
    jne @B

  @@:
    sub eax, edi                        ; sub address from eax for len
    mov esi, eax                        ; store length in esi
    shr eax, 1                          ; get integer half length of string
    mov ebx, eax                        ; store it in variable
    mov eax, edi                        ; eax has start char
    mov edx, edi                        ; load start address
    add edx, esi                        ; get end character address
    sub edx, 1                          ; sub 1 to get end char
    mov esi, ebx                        ; copy half len into edi

  @@:
    movzx ebx, BYTE PTR [eax]           ; load start char
    movzx ecx, BYTE PTR [edx]           ; load end char
    mov [eax], cl                       ; reverse char pair
    add eax, 1                          ; increment start char
    mov [edx], bl
    sub edx, 1                          ; decrement end char
    sub esi, 1                          ; sub 1 from half length
    jnz @B                              ; loop back if its not zero

    mov eax, edi                        ; return the address in eax

    pop esi
    pop edi
    pop ebx                             ; restore EBX

    ret 4                               ; balance the stack by 4 bytes

tRev endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: String reverse algo
Post by: jj2007 on May 07, 2021, 09:53:37 PM
Since we are in the Lab :biggrin:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

25756   cycles for 100 * tRev (Hutch)
33742   cycles for 100 * strrev (CRT)
13687   cycles for 100 * jRev (jj)

24425   cycles for 100 * tRev (Hutch)
33634   cycles for 100 * strrev (CRT)
13737   cycles for 100 * jRev (jj)

24779   cycles for 100 * tRev (Hutch)
33666   cycles for 100 * strrev (CRT)
13671   cycles for 100 * jRev (jj)

24694   cycles for 100 * tRev (Hutch)
33700   cycles for 100 * strrev (CRT)
13758   cycles for 100 * jRev (jj)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
51      bytes for jRev (jj)

1819043144      = eax tRev (Hutch)
1819043144      = eax strrev (CRT)
1819043144      = eax jRev (jj)
Title: Re: String reverse algo
Post by: FORTRANS on May 07, 2021, 11:28:11 PM
Hi,

   Two results.

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

25531 cycles for 100 * tRev (Hutch)
38912 cycles for 100 * strrev (CRT)
14510 cycles for 100 * jRev (jj)

25745 cycles for 100 * tRev (Hutch)
38790 cycles for 100 * strrev (CRT)
14539 cycles for 100 * jRev (jj)

25463 cycles for 100 * tRev (Hutch)
39089 cycles for 100 * strrev (CRT)
14285 cycles for 100 * jRev (jj)

25515 cycles for 100 * tRev (Hutch)
38875 cycles for 100 * strrev (CRT)
14330 cycles for 100 * jRev (jj)

107 bytes for tRev (Hutch)
19 bytes for strrev (CRT)
51 bytes for jRev (jj)

1819043144 = eax tRev (Hutch)
1819043144 = eax strrev (CRT)
1819043144 = eax jRev (jj)

--- ok ---
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

37301 cycles for 100 * tRev (Hutch)
65684 cycles for 100 * strrev (CRT)
24447 cycles for 100 * jRev (jj)

37206 cycles for 100 * tRev (Hutch)
65876 cycles for 100 * strrev (CRT)
24418 cycles for 100 * jRev (jj)

37151 cycles for 100 * tRev (Hutch)
65790 cycles for 100 * strrev (CRT)
24463 cycles for 100 * jRev (jj)

37155 cycles for 100 * tRev (Hutch)
65919 cycles for 100 * strrev (CRT)
24468 cycles for 100 * jRev (jj)

107 bytes for tRev (Hutch)
19 bytes for strrev (CRT)
51 bytes for jRev (jj)

1819043144 = eax tRev (Hutch)
1819043144 = eax strrev (CRT)
1819043144 = eax jRev (jj)

--- ok ---


Cheers,

Steve N.
Title: Re: String reverse algo
Post by: jj2007 on May 08, 2021, 12:26:26 AM
Thanks, Steve. Here is another one called xRev, using pshufb:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

24424   cycles for 100 * tRev (Hutch)
33751   cycles for 100 * strrev (CRT)
13704   cycles for 100 * jRev (jj)
14602   cycles for 100 * Mirror$ (MasmBasic)
4662    cycles for 100 * xRev (jj)

24513   cycles for 100 * tRev (Hutch)
33951   cycles for 100 * strrev (CRT)
13689   cycles for 100 * jRev (jj)
14589   cycles for 100 * Mirror$ (MasmBasic)
4659    cycles for 100 * xRev (jj)

24555   cycles for 100 * tRev (Hutch)
33987   cycles for 100 * strrev (CRT)
13773   cycles for 100 * jRev (jj)
14670   cycles for 100 * Mirror$ (MasmBasic)
4677    cycles for 100 * xRev (jj)

24750   cycles for 100 * tRev (Hutch)
33709   cycles for 100 * strrev (CRT)
13749   cycles for 100 * jRev (jj)
14620   cycles for 100 * Mirror$ (MasmBasic)
4721    cycles for 100 * xRev (jj)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
51      bytes for jRev (jj)
12      bytes for Mirror$ (MasmBasic)
87      bytes for xRev (jj)

1819043144      = eax tRev (Hutch)
1819043144      = eax strrev (CRT)
1819043144      = eax jRev (jj)
2053468783      = eax Mirror$ (MasmBasic)
1819043144      = eax xRev (jj)


Note Mirror$() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1167) and Caballero's algo write to a buffer, i.e. they are not in-place algos.

Do not use for serious stuff - this is work in progress and not fully tested :cool:

P.S.: I've added Lingo's algo (http://www.masmforum.com/board/index.php?topic=14041.msg111377#msg111377) (fast but crashes for unaligned strings), and Caballero's contribution (http://masm32.com/board/index.php?topic=9347.msg102591#msg102591).
Title: Re: String reverse algo
Post by: daydreamer on May 08, 2021, 03:22:06 AM
wouldnt it be more standard today with unicode 16 reverse string?

Title: Re: String reverse algo
Post by: hutch-- on May 08, 2021, 03:30:07 AM
Write one for us Magnus.  :thup:
Title: Re: String reverse algo
Post by: FORTRANS on May 08, 2021, 04:52:04 AM
Hi,

   One result and a partial.

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

25930 cycles for 100 * tRev (Hutch)
39465 cycles for 100 * strrev (CRT)
14214 cycles for 100 * jRev (jj)
17288 cycles for 100 * Mirror$ (MasmBasic)
5350 cycles for 100 * xRev (jj, needs pshufb)
6834 cycles for 100 * RevLingo (aligned strings only)
113656 cycles for 100 * cRev (caballero)

26047 cycles for 100 * tRev (Hutch)
40261 cycles for 100 * strrev (CRT)
14364 cycles for 100 * jRev (jj)
18570 cycles for 100 * Mirror$ (MasmBasic)
5377 cycles for 100 * xRev (jj, needs pshufb)
6637 cycles for 100 * RevLingo (aligned strings only)
114863 cycles for 100 * cRev (caballero)

26578 cycles for 100 * tRev (Hutch)
39065 cycles for 100 * strrev (CRT)
14363 cycles for 100 * jRev (jj)
16243 cycles for 100 * Mirror$ (MasmBasic)
5453 cycles for 100 * xRev (jj, needs pshufb)
6650 cycles for 100 * RevLingo (aligned strings only)
115443 cycles for 100 * cRev (caballero)

26088 cycles for 100 * tRev (Hutch)
39862 cycles for 100 * strrev (CRT)
14335 cycles for 100 * jRev (jj)
16200 cycles for 100 * Mirror$ (MasmBasic)
5371 cycles for 100 * xRev (jj, needs pshufb)
6628 cycles for 100 * RevLingo (aligned strings only)
115281 cycles for 100 * cRev (caballero)

107 bytes for tRev (Hutch)
19 bytes for strrev (CRT)
51 bytes for jRev (jj)
10 bytes for Mirror$ (MasmBasic)
87 bytes for xRev (jj, needs pshufb)
123 bytes for RevLingo (aligned strings only)
56 bytes for cRev (caballero)

1259388704 = eax tRev (Hutch)
1259388704 = eax strrev (CRT)
1259388704 = eax jRev (jj)
1259388704 = eax Mirror$ (MasmBasic)
1259388704 = eax xRev (jj, needs pshufb)
1259388704 = eax RevLingo (aligned strings only)
1259388704 = eax cRev (caballero)

--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

37197 cycles for 100 * tRev (Hutch)
65773 cycles for 100 * strrev (CRT)
24460 cycles for 100 * jRev (jj)
28888 cycles for 100 * Mirror$ (MasmBasic)


Regards,

Steve N.
Title: Re: String reverse algo
Post by: jj2007 on May 08, 2021, 05:31:16 AM
Thanks, Steve. Apparently your Pentium M doesn't feature pshufb...
Title: Re: String reverse algo
Post by: avcaballero on May 08, 2021, 05:31:42 AM
Hi, I wonder why is my version soooo slow. As I see, I think the problem is the use of "lstrlen" each time cRev proc is called. If we know that the string is always 100 chars long, we could change the code as follows. It should be faster now.

cRev proc uses esi edi pSrc, pDest
  mov esi, pSrc
  ;invoke lstrlen, esi
  ;mov ecx, eax
  mov ecx, 100
  mov edi, pDest
  add edi, ecx
  mov byte ptr [edi], 0
  dec edi
  @Bucle:
  movsb
  sub edi, 2
  loop @Bucle
  ret
cRev endp

Title: Re: String reverse algo
Post by: jj2007 on May 08, 2021, 12:06:41 PM
The new version is indeed a bit faster. However, all other algos check for the length of the string.

cRev proc uses esi edi pSrc, pDest
  mov esi, pSrc
  if 0
mov ecx, 100 ; 75800 cycles for 100 * cRev (caballero)
  else
invoke lstrlen, esi ; 96500 cycles
mov ecx, eax
  endif
Title: Re: String reverse algo
Post by: hutch-- on May 08, 2021, 06:57:24 PM
I don't know the range of designs but any string reverse should be able to handle variable length strings. To be fair to all techniques, ranging from 2 or 3 characters to a couple of hundred characters would give you more useful results.
Title: Re: String reverse algo
Post by: avcaballero on May 08, 2021, 07:56:51 PM
I have done some testings. It seems to penalize a lot using movs and even loop. Finally the best of all is using dwords as much as possible. All of them use the str length.

It was a bit hard to me compile jj's code, so I created my own to test it.

These are the results:

Test Algorithm 7:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 6:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 5:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 4:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 3:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 2:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 1:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Timming Algorithm 7: 94
Test Timming Algorithm 6: 78
Test Timming Algorithm 5:  109
Test Timming Algorithm 4:  219
Test Timming Algorithm 3:  203
Test Timming Algorithm 2:  313
Test Timming Algorithm 1:  312


algo 1: lstrlen, movsb, loop ecx
algo 2: Longitud (my own subroutine to calc str_len), movsb, loop ecx
algo 3: lstrlen, inc/dec, loop ecx
algo 4: Longitud, inc/dec, loop ecx
algo 5: Longitud, inc/dec, dec ecx
algo 6: lstrlen, dwords, dec ecx
algo 7: Longitud, dwords, dec ecx

Title: Re: String reverse algo
Post by: hutch-- on May 08, 2021, 08:25:27 PM
caballero,

Try something like this.

    mov eax, pTxt
    sub eax, 1
  @@:
    add eax, 1
    cmp BYTE PTR [eax], 0
    jne @B
    sub eax, pTxt

Length is in eax.
Title: Re: String reverse algo
Post by: avcaballero on May 08, 2021, 08:41:30 PM
@Hutch
Yes, that's faster  :thumbsup:. It definitely doesn't seem like a good idea to use movs niether scas
Title: Re: String reverse algo
Post by: nidud on May 08, 2021, 08:42:56 PM
deleted
Title: Re: String reverse algo
Post by: avcaballero on May 08, 2021, 09:13:34 PM
@nidud
Interesting! though it overlaps the original string. Yet, it seems that the best algo remains the 7th one.
Quote
Test Algorithm 1:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 2:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 3:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 4:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 5:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 6:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 7:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 8:
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Timming Algorithm 1:  329
Test Timming Algorithm 2:  281
Test Timming Algorithm 3:  219
Test Timming Algorithm 4:  203
Test Timming Algorithm 5: 93
Test Timming Algorithm 6: 79
Test Timming Algorithm 7: 62
Test Timming Algorithm 8: 78
Title: Re: String reverse algo
Post by: jj2007 on May 08, 2021, 09:20:12 PM
New version with Nidud's algo (I called it nRev for consistency :biggrin:):

nRev:
  mov edx,[esp+4]
  mov ecx,[esp+4]
  xor eax,eax
slen:
  add ecx,1
  cmp al,[ecx-1]
  jne slen
  sub ecx,2
lupe:
  cmp edx,ecx
  ja done
  mov al,[edx]
  mov ah,[ecx]
  mov [ecx],al
  mov [edx],ah
  lea edx,[edx+1]
  lea ecx,[ecx-1]
  jne lupe
done:
  mov eax,[esp+4]
  retn 4


Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

24486   cycles for 100 * tRev (Hutch)
33686   cycles for 100 * strrev (CRT)
13557   cycles for 100 * jRev (jj)
13920   cycles for 100 * Mirror$ (MasmBasic)
4541    cycles for 100 * xRev (jj, needs pshufb)
5548    cycles for 100 * RevLingo (aligned strings only)
95812   cycles for 100 * cRev (caballero)
32676   cycles for 100 * nRev (Nidud)

24901   cycles for 100 * tRev (Hutch)
33800   cycles for 100 * strrev (CRT)
13677   cycles for 100 * jRev (jj)
14012   cycles for 100 * Mirror$ (MasmBasic)
4618    cycles for 100 * xRev (jj, needs pshufb)
5955    cycles for 100 * RevLingo (aligned strings only)
95700   cycles for 100 * cRev (caballero)
32590   cycles for 100 * nRev (Nidud)

24625   cycles for 100 * tRev (Hutch)
33799   cycles for 100 * strrev (CRT)
13914   cycles for 100 * jRev (jj)
14180   cycles for 100 * Mirror$ (MasmBasic)
4617    cycles for 100 * xRev (jj, needs pshufb)
5668    cycles for 100 * RevLingo (aligned strings only)
95766   cycles for 100 * cRev (caballero)
32612   cycles for 100 * nRev (Nidud)

24617   cycles for 100 * tRev (Hutch)
33827   cycles for 100 * strrev (CRT)
13672   cycles for 100 * jRev (jj)
14032   cycles for 100 * Mirror$ (MasmBasic)
4619    cycles for 100 * xRev (jj, needs pshufb)
5663    cycles for 100 * RevLingo (aligned strings only)
97332   cycles for 100 * cRev (caballero)
33423   cycles for 100 * nRev (Nidud)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
51      bytes for jRev (jj)
10      bytes for Mirror$ (MasmBasic)
87      bytes for xRev (jj, needs pshufb)
123     bytes for RevLingo (aligned strings only)
56      bytes for cRev (caballero)
63      bytes for nRev (Nidud)

1259388704      = eax tRev (Hutch)
1259388704      = eax strrev (CRT)
1259388704      = eax jRev (jj)
1259388704      = eax Mirror$ (MasmBasic)
1259388704      = eax xRev (jj, needs pshufb)
1259388704      = eax RevLingo (aligned strings only)
1259388704      = eax cRev (caballero)
1259388704      = eax nRev (Nidud)
Title: Re: String reverse algo
Post by: hutch-- on May 08, 2021, 09:49:19 PM
Using the old string instructions without the REP prefix is self inflicted punishment, with the REP prefix you get special case circuitry that performs well, still slightly slower than SSE or AVX but plenty fast enough for most situations. Avoid old instructions like the plague, they live in junk microcode for backwards compatibility, not performance.
Title: Re: String reverse algo
Post by: nidud on May 08, 2021, 09:54:06 PM
deleted
Title: Re: String reverse algo
Post by: nidud on May 08, 2021, 10:21:57 PM
deleted
Title: Re: String reverse algo
Post by: LiaoMi on May 09, 2021, 01:42:49 AM
Hi,

Fast array reversal with SIMD (A small study in hardware accelerated AoS reversal) - https://dev.to/wunk/fast-array-reversal-with-simd-j3p
https://github.com/Wunkolo/qreverse/archive/refs/heads/master.zip

// Powers of two
   Bench< ELEMENTSIZE, 8 >();
   Bench< ELEMENTSIZE, 16 >();
   Bench< ELEMENTSIZE, 32 >();
   Bench< ELEMENTSIZE, 64 >();
   Bench< ELEMENTSIZE, 128 >();
   Bench< ELEMENTSIZE, 256 >();
   Bench< ELEMENTSIZE, 512 >();
   Bench< ELEMENTSIZE, 1024 >();

   // Powers of ten
   Bench< ELEMENTSIZE, 100 >();
   Bench< ELEMENTSIZE, 1000 >();
   Bench< ELEMENTSIZE, 10000 >();
   Bench< ELEMENTSIZE, 100000 >();
   Bench< ELEMENTSIZE, 1000000 >();

   // Primes
   Bench< ELEMENTSIZE, 59 >();
   Bench< ELEMENTSIZE, 79 >();
   Bench< ELEMENTSIZE, 173 >();
   Bench< ELEMENTSIZE, 6133 >();
   Bench< ELEMENTSIZE, 10177 >();
   Bench< ELEMENTSIZE, 25253 >();
   Bench< ELEMENTSIZE, 31391 >();
   Bench< ELEMENTSIZE, 50432 >();

AVX2 byte
void qReverse<1>(void* Array, std::size_t Count)
{
std::uint8_t* Array8 = reinterpret_cast<std::uint8_t*>(Array);
std::size_t i = 0;

        for( std::size_t j = i / 32; j < ((Count / 2) / 32); ++j )
{
const __m256i ShuffleRev = _mm256_set_epi8(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15
);
// Load 32 elements at once into one 32-byte register
__m256i Lower = _mm256_loadu_si256(
reinterpret_cast<__m256i*>(&Array8[i])
);
__m256i Upper = _mm256_loadu_si256(
reinterpret_cast<__m256i*>(&Array8[Count - i - 32])
);

// Reverse the byte order of our 32-byte vectors
Lower = _mm256_shuffle_epi8(Lower,ShuffleRev);
Upper = _mm256_shuffle_epi8(Upper,ShuffleRev);

Lower = _mm256_permute2x128_si256(Lower,Lower,1);
Upper = _mm256_permute2x128_si256(Upper,Upper,1);

// Place them at their swapped position
_mm256_storeu_si256(
reinterpret_cast<__m256i*>(&Array8[i]),
Upper
);
_mm256_storeu_si256(
reinterpret_cast<__m256i*>(&Array8[Count - i - 32]),
Lower
);

// 32 elements at a time
i += 32;
}
}// BSWAP 32
for( std::size_t j = i / 4; j < ((Count / 2) / 4); ++j )
{
// Get bswapped versions of our Upper and Lower 4-byte chunks
std::uint32_t Lower = Swap32(
*reinterpret_cast<std::uint32_t*>(&Array8[i])
);
std::uint32_t Upper = Swap32(
*reinterpret_cast<std::uint32_t*>(&Array8[Count - i - 4])
);

// Place them at their swapped position
*reinterpret_cast<std::uint32_t*>(&Array8[i]) = Upper;
*reinterpret_cast<std::uint32_t*>(&Array8[Count - i - 4]) = Lower;

// Four elements at a time
i += 4;
}
// BSWAP 16
for( std::size_t j = i / 2; j < ((Count / 2) / 2); ++j )
{
// Get bswapped versions of our Upper and Lower 4-byte chunks
std::uint16_t Lower = Swap16(
*reinterpret_cast<std::uint16_t*>(&Array8[i])
);
std::uint16_t Upper = Swap16(
*reinterpret_cast<std::uint16_t*>(&Array8[Count - i - 2])
);

// Place them at their swapped position
*reinterpret_cast<std::uint16_t*>(&Array8[i]) = Upper;
*reinterpret_cast<std::uint16_t*>(&Array8[Count - i - 2]) = Lower;

// Two elements at a time
i += 2;
}

// Everything else that we can not do a bswap on, we swap normally
// Naive swaps
for( ; i < Count / 2; ++i )
{
// Exchange the upper and lower element as we work our
// way down to the middle from either end
std::uint8_t Temp(Array8[i]);
Array8[i] = Array8[Count - i - 1];
Array8[Count - i - 1] = Temp;
}
}


SSE3

void qReverse<1>(void* Array, std::size_t Count)
{
std::uint8_t* Array8 = reinterpret_cast<std::uint8_t*>(Array);
std::size_t i = 0;

for( std::size_t j = i / 16; j < ((Count / 2) / 16); ++j )
{
const __m128i ShuffleRev = _mm_set_epi8(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
);
// Load 16 elements at once into one 16-byte register
__m128i Lower = _mm_loadu_si128(
reinterpret_cast<__m128i*>(&Array8[i])
);
__m128i Upper = _mm_loadu_si128(
reinterpret_cast<__m128i*>(&Array8[Count - i - 16])
);

// Reverse the byte order of our 16-byte vectors
Lower = _mm_shuffle_epi8(Lower, ShuffleRev);
Upper = _mm_shuffle_epi8(Upper, ShuffleRev);

// Place them at their swapped position
_mm_storeu_si128(
reinterpret_cast<__m128i*>(&Array8[i]),
Upper
);
_mm_storeu_si128(
reinterpret_cast<__m128i*>(&Array8[Count - i - 16]),
Lower
);

// 16 elements at a time
i += 16;
}
// BSWAP 32
for( std::size_t j = i / 4; j < ((Count / 2) / 4); ++j )
{
// Get bswapped versions of our Upper and Lower 4-byte chunks
std::uint32_t Lower = Swap32(
*reinterpret_cast<std::uint32_t*>(&Array8[i])
);
std::uint32_t Upper = Swap32(
*reinterpret_cast<std::uint32_t*>(&Array8[Count - i - 4])
);

// Place them at their swapped position
*reinterpret_cast<std::uint32_t*>(&Array8[i]) = Upper;
*reinterpret_cast<std::uint32_t*>(&Array8[Count - i - 4]) = Lower;

// Four elements at a time
i += 4;
}
// BSWAP 16
for( std::size_t j = i / 2; j < ((Count / 2) / 2); ++j )
{
// Get bswapped versions of our Upper and Lower 4-byte chunks
std::uint16_t Lower = Swap16(
*reinterpret_cast<std::uint16_t*>(&Array8[i])
);
std::uint16_t Upper = Swap16(
*reinterpret_cast<std::uint16_t*>(&Array8[Count - i - 2])
);

// Place them at their swapped position
*reinterpret_cast<std::uint16_t*>(&Array8[i]) = Upper;
*reinterpret_cast<std::uint16_t*>(&Array8[Count - i - 2]) = Lower;

// Two elements at a time
i += 2;
}

// Everything else that we can not do a bswap on, we swap normally
// Naive swaps
for( ; i < Count / 2; ++i )
{
// Exchange the upper and lower element as we work our
// way down to the middle from either end
std::uint8_t Temp(Array8[i]);
Array8[i] = Array8[Count - i - 1];
Array8[Count - i - 1] = Temp;
}
}
Title: Re: String reverse algo
Post by: TouEnMasm on May 09, 2021, 03:03:26 AM
other method
I have modify the end of reverse,a char was missing

chaine db "Hello, this is a simple string intended for testing string algos. It has 100 characters without zero",0
firstreverse db 100h dup (0)
reorder  db  100h dup (0)

.code
reverse proc uses esi edi ebx
;invoke DebugBreak
mov esi,offset chaine
mov edx,offset chaine
add esi,sizeof chaine
dec esi
dec esi
mov edi,offset firstreverse
@@:
std
lodsb
cld
stosb
cmp esi,edx
jge @B
mov byte ptr [edi],0
ret


reverse endp
main PROC  C 
invoke reverse
invoke printf,TXT("reverse : %s "),addr firstreverse
invoke _getch
ret
main endp

Title: Re: String reverse algo
Post by: jj2007 on May 09, 2021, 04:33:08 AM
Yves,

Attached the testbed with your algo - it's very slow on my machine. Since nobody used sizeof chaine, I've used len() instead. But the problem is elsewhere: the flags switches std and cld are very, very slow.

yRev proc uses esi edi ebx pSrc, pDest
  mov esi, pSrc
  add esi, len(esi) ; sizeof chaine
  mov edx, pSrc
  ; dec esi
  dec esi
  mov edi, pDest
  @@:
  std
  lodsb
  cld
  stosb
  cmp esi,edx
  jge @B
  mov byte ptr [edi],0
  ret
yRev endp


Quote from: dedndave on January 01, 2013, 05:11:20 AMthat's because it has to test the direction flag
you may remember our previous discussions about STD and CLD taking ~80 cycles
Title: Re: String reverse algo
Post by: avcaballero on May 09, 2021, 05:44:44 AM
I have revisited the nidud's algo, doing the reversing in the same string gives some opportunities :)
Quote
Test Algorithm 1:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 2:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 3:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 4:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 5:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 6:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 7:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 8:
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Algorithm 9:
Hello, this is a simple string intended for testing string algos. It has 100 characters without zero
orez tuohtiw sretcarahc 001 sah tI .sogla gnirts gnitset rof dednetni gnirts elpmis a si siht ,olleH

Test Timming Algorithm 1:  313
Test Timming Algorithm 2:  312
Test Timming Algorithm 3:  188
Test Timming Algorithm 4:  203
Test Timming Algorithm 5: 94
Test Timming Algorithm 6: 78
Test Timming Algorithm 7: 63
Test Timming Algorithm 8: 94
Test Timming Algorithm 9: 78
Algo8 is nidud's example, Algo9 is the revisited. 78 ticks takes it at the same level of Algo6. Probably using dwords could carry it to 63 ticks.


RevAlg09 proc uses ecx esi, pSrc
  ;
  mov        esi, [pSrc]
  mov        ecx, esi
  ; Busca final
  @top:
    inc        ecx
    or         byte ptr [ecx], 0
  jnz        @top
  dec        ecx
  @Bucle:
    mov        al, byte ptr [esi]
    mov        ah, byte ptr [ecx]
    mov        byte ptr [ecx], al
    mov        byte ptr [esi], ah
    inc        esi
    dec        ecx
    cmp        ecx, esi
  jnbe       @Bucle
  ret
RevAlg09 endp

Title: String reverse algo: revised versions
Post by: jj2007 on May 09, 2021, 11:20:57 AM
New version with more thorough testing:
- my own jRev and xRev algos produced wrong results depending on the length of the string; fixed
- Hutch' algo could not handle short strings with less than 2 bytes; fixed
- Caballero's algo crashed on zero length strings; fixed

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

24614   cycles for 99 * tRev (Hutch)
33313   cycles for 99 * strrev (CRT)
6669    cycles for 99 * jRev (jj)
13590   cycles for 99 * Mirror$ (MasmBasic)
4436    cycles for 99 * xRev (jj, needs pshufb)
5542    cycles for 99 * RevLingo (aligned strings only)
94021   cycles for 99 * cRev (caballero)
32311   cycles for 99 * nRev (Nidud)

24690   cycles for 99 * tRev (Hutch)
33388   cycles for 99 * strrev (CRT)
6819    cycles for 99 * jRev (jj)
13694   cycles for 99 * Mirror$ (MasmBasic)
4539    cycles for 99 * xRev (jj, needs pshufb)
5608    cycles for 99 * RevLingo (aligned strings only)
94088   cycles for 99 * cRev (caballero)
32321   cycles for 99 * nRev (Nidud)

24671   cycles for 99 * tRev (Hutch)
33401   cycles for 99 * strrev (CRT)
6788    cycles for 99 * jRev (jj)
13728   cycles for 99 * Mirror$ (MasmBasic)
4555    cycles for 99 * xRev (jj, needs pshufb)
5617    cycles for 99 * RevLingo (aligned strings only)
94015   cycles for 99 * cRev (caballero)
32272   cycles for 99 * nRev (Nidud)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
87      bytes for jRev (jj)
10      bytes for Mirror$ (MasmBasic)
111     bytes for xRev (jj, needs pshufb)
123     bytes for RevLingo (aligned strings only)
56      bytes for cRev (caballero)
63      bytes for nRev (Nidud)

len(src)=100 bytes
464062  tRev (Hutch)                    .sogla gnirts gnitset rof dedn
464062  strrev (CRT)                    .sogla gnirts gnitset rof dedn
464062  jRev (jj)                       .sogla gnirts gnitset rof dedn
464062  Mirror$ (MasmBasic)             .sogla gnirts gnitset rof dedn
464062  xRev (jj, needs pshufb)         .sogla gnirts gnitset rof dedn
464062  RevLingo (aligned strings only) .sogla gnirts gnitset rof dedn
464062  cRev (caballero)                .sogla gnirts gnitset rof dedn
464062  nRev (Nidud)                    .sogla gnirts gnitset rof dedn
Title: Re: String reverse algo
Post by: hutch-- on May 09, 2021, 11:35:24 AM
Stick this version into the test bed. I removed the stack frame so it may be a bit faster.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

tRev proc
  ; ---------------------------------------------
  ; in place string reverse, fully self contained
  ; ---------------------------------------------
    push ebx
    push edi
    push esi

    mov edi, [esp+4][12]                ; string address
    mov eax, edi

    sub eax, 1
  @@:                                   ; unroll by 4
  REPEAT 3
    add eax, 1                          ; get length
    cmp BYTE PTR [eax], 0               ; loop to buffer end
    je @F
  ENDM
    add eax, 1                          ; get length
    cmp BYTE PTR [eax], 0               ; loop to buffer end
    jne @B

  @@:
    sub eax, edi                        ; sub address from eax for len
    mov esi, eax                        ; store length in esi

    shr eax, 1                          ; get integer half length of string
    mov ebx, eax                        ; store it in variable
    mov eax, edi                        ; eax has start char
    mov edx, edi                        ; load start address
    add edx, esi                        ; get end character address
    mov esi, ebx                        ; copy half len into esi
    sub edx, 1                          ; sub 1 to get end char

  @@:
    movzx ebx, BYTE PTR [eax]           ; load start char
    movzx ecx, BYTE PTR [edx]           ; load end char
    mov [eax], cl                       ; reverse char pair
    add eax, 1                          ; increment start char
    mov [edx], bl
    sub edx, 1                          ; decrement end char
    sub esi, 1                          ; sub 1 from half length
    jnz @B                              ; loop back if its not zero

    mov eax, edi                        ; return the address in eax

    pop esi
    pop edi
    pop ebx

    ret 4                               ; balance the stack by 4 bytes

tRev endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: String reverse algo
Post by: hutch-- on May 09, 2021, 11:42:47 AM
Result your last revised version.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)

20914   cycles for 99 * tRev (Hutch)
31184   cycles for 99 * strrev (CRT)
6057    cycles for 99 * jRev (jj)
12924   cycles for 99 * Mirror$ (MasmBasic)
4213    cycles for 99 * xRev (jj, needs pshufb)
5318    cycles for 99 * RevLingo (aligned strings only)
91759   cycles for 99 * cRev (caballero)
20469   cycles for 99 * nRev (Nidud)

20989   cycles for 99 * tRev (Hutch)
30259   cycles for 99 * strrev (CRT)
6175    cycles for 99 * jRev (jj)
13076   cycles for 99 * Mirror$ (MasmBasic)
4324    cycles for 99 * xRev (jj, needs pshufb)
5419    cycles for 99 * RevLingo (aligned strings only)
91712   cycles for 99 * cRev (caballero)
20467   cycles for 99 * nRev (Nidud)

20983   cycles for 99 * tRev (Hutch)
30276   cycles for 99 * strrev (CRT)
6175    cycles for 99 * jRev (jj)
13089   cycles for 99 * Mirror$ (MasmBasic)
4327    cycles for 99 * xRev (jj, needs pshufb)
5440    cycles for 99 * RevLingo (aligned strings only)
92241   cycles for 99 * cRev (caballero)
20443   cycles for 99 * nRev (Nidud)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
87      bytes for jRev (jj)
10      bytes for Mirror$ (MasmBasic)
111     bytes for xRev (jj, needs pshufb)
123     bytes for RevLingo (aligned strings only)
56      bytes for cRev (caballero)
63      bytes for nRev (Nidud)

len(src)=100 bytes
464062  tRev (Hutch)                    .sogla gnirts gnitset rof dedn
464062  strrev (CRT)                    .sogla gnirts gnitset rof dedn
464062  jRev (jj)                       .sogla gnirts gnitset rof dedn
464062  Mirror$ (MasmBasic)             .sogla gnirts gnitset rof dedn
464062  xRev (jj, needs pshufb)         .sogla gnirts gnitset rof dedn
464062  RevLingo (aligned strings only) .sogla gnirts gnitset rof dedn
464062  cRev (caballero)                .sogla gnirts gnitset rof dedn
464062  nRev (Nidud)                    .sogla gnirts gnitset rof dedn

--- ok ---
Title: Re: String reverse algo
Post by: jj2007 on May 09, 2021, 07:24:15 PM
Quote from: hutch-- on May 09, 2021, 11:35:24 AM
Stick this version into the test bed. I removed the stack frame so it may be a bit faster.

Done :cool:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

24595   cycles for 99 * tRev (Hutch)
33051   cycles for 99 * strrev (CRT)
6683    cycles for 99 * jRev (jj)
13491   cycles for 99 * Mirror$ (MasmBasic)
4457    cycles for 99 * xRev (jj, needs pshufb)
7378    cycles for 99 * RevLingo (aligned strings only)
93296   cycles for 99 * cRev (caballero)
32548   cycles for 99 * nRev (Nidud)

23937   cycles for 99 * tRev (Hutch)
32986   cycles for 99 * strrev (CRT)
6725    cycles for 99 * jRev (jj)
13404   cycles for 99 * Mirror$ (MasmBasic)
4427    cycles for 99 * xRev (jj, needs pshufb)
7379    cycles for 99 * RevLingo (aligned strings only)
93127   cycles for 99 * cRev (caballero)
32965   cycles for 99 * nRev (Nidud)

23949   cycles for 99 * tRev (Hutch)
33041   cycles for 99 * strrev (CRT)
6703    cycles for 99 * jRev (jj)
13384   cycles for 99 * Mirror$ (MasmBasic)
4474    cycles for 99 * xRev (jj, needs pshufb)
7385    cycles for 99 * RevLingo (aligned strings only)
93401   cycles for 99 * cRev (caballero)
31649   cycles for 99 * nRev (Nidud)

107     bytes for tRev (Hutch)
19      bytes for strrev (CRT)
87      bytes for jRev (jj)
10      bytes for Mirror$ (MasmBasic)
111     bytes for xRev (jj, needs pshufb)
123     bytes for RevLingo (aligned strings only)
56      bytes for cRev (caballero)
63      bytes for nRev (Nidud)

len(src)=99 bytes
454815  tRev (Hutch)                    sogla gnirts gnitset rof dedne
454815  strrev (CRT)                    sogla gnirts gnitset rof dedne
454815  jRev (jj)                       sogla gnirts gnitset rof dedne
454815  Mirror$ (MasmBasic)             sogla gnirts gnitset rof dedne
454815  xRev (jj, needs pshufb)         sogla gnirts gnitset rof dedne
454479  RevLingo (aligned strings only) sogla gnirtstgni set rof dedne
454815  cRev (caballero)                sogla gnirts gnitset rof dedne
454815  nRev (Nidud)                    sogla gnirts gnitset rof dedne
Title: Re: String reverse algo
Post by: hutch-- on May 09, 2021, 11:34:42 PM
At the risk of wearing you out, one more that uses an external proc to get the length, the old Agner Fog StrLen.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

tRev proc
  ; --------------------------------------------------
  ; in place string reverse using external length proc
  ; --------------------------------------------------

    push ebx                                ;
    push edi                                ; save regs
    push esi                                ;

    mov edi, [esp+4][12]                    ; load string address

    push ebp

    push edi
    call StrLen                             ; use external string length algo

    mov edx, edi                            ; load start address
    sub edx, 1                              ; sub 1 to get end char
    add edx, eax                            ; get end character address
    shr eax, 1                              ; get integer half length of string
    mov esi, eax                            ; store it in esi
    mov eax, edi                            ; eax has start char

  @@:
    movzx ebx, BYTE PTR [eax]               ; load start char
    movzx ecx, BYTE PTR [edx]               ; load end char

    mov [eax], cl                           ; reverse char pair
    add eax, 1                              ; increment start char
    mov [edx], bl
    sub edx, 1                              ; decrement end char
    sub esi, 1                              ; sub 1 from half length
    jnz @B                                  ; loop back if its not zero

    mov eax, edi                            ; return the address in eax

    pop ebp
    pop esi                                 ;
    pop edi                                 ; restore regs
    pop ebx                                 ;

    ret 4                                   ; balance the stack by 4 bytes

tRev endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: String reverse algo
Post by: jj2007 on May 10, 2021, 06:34:07 AM
You are on the right track, Hutch - 17k instead of 24kCycles, congrats :biggrin:

Indeed, one of the secrets of the fast jRev and xRev algos is this line: add Len (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1144)(ecx), ecx


Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
17399   cycles for 99 * tRev (Hutch)
33082   cycles for 99 * strrev (CRT)
6719    cycles for 99 * jRev (jj)
13395   cycles for 99 * Mirror$ (MasmBasic)
4458    cycles for 99 * xRev (jj, needs pshufb)
7459    cycles for 99 * RevLingo (aligned strings only)
93268   cycles for 99 * cRev (caballero)
31727   cycles for 99 * nRev (Nidud)

17005   cycles for 99 * tRev (Hutch)
33112   cycles for 99 * strrev (CRT)
6697    cycles for 99 * jRev (jj)
13400   cycles for 99 * Mirror$ (MasmBasic)
4452    cycles for 99 * xRev (jj, needs pshufb)
7379    cycles for 99 * RevLingo (aligned strings only)
93369   cycles for 99 * cRev (caballero)
31654   cycles for 99 * nRev (Nidud)

17052   cycles for 99 * tRev (Hutch)
33091   cycles for 99 * strrev (CRT)
6729    cycles for 99 * jRev (jj)
13402   cycles for 99 * Mirror$ (MasmBasic)
4439    cycles for 99 * xRev (jj, needs pshufb)
7371    cycles for 99 * RevLingo (aligned strings only)
93431   cycles for 99 * cRev (caballero)
31718   cycles for 99 * nRev (Nidud)

71      bytes for tRev (Hutch)
19      bytes for strrev (CRT)
87      bytes for jRev (jj)
10      bytes for Mirror$ (MasmBasic)
111     bytes for xRev (jj, needs pshufb)
123     bytes for RevLingo (aligned strings only)
56      bytes for cRev (caballero)
63      bytes for nRev (Nidud)

len(src)=99 bytes
454815  tRev (Hutch)                    sogla gnirts gnitset rof dedne
454815  strrev (CRT)                    sogla gnirts gnitset rof dedne
454815  jRev (jj)                       sogla gnirts gnitset rof dedne
454815  Mirror$ (MasmBasic)             sogla gnirts gnitset rof dedne
454815  xRev (jj, needs pshufb)         sogla gnirts gnitset rof dedne
454479  RevLingo (aligned strings only) sogla gnirtstgni set rof dedne
454815  cRev (caballero)                sogla gnirts gnitset rof dedne
454815  nRev (Nidud)                    sogla gnirts gnitset rof dedne


Btw this is necessary to avoid a crash in case somebody passes a zero length string to the algo:
    sub esi, 1                              ; sub 1 from half length
    jg @B                                  ; loop back if its not greater than zero

Option prologue etc are not necessary if the proc has no locals and no args.
Title: Re: String reverse algo
Post by: nidud on May 11, 2021, 12:10:25 AM
deleted
Title: Re: String reverse algo
Post by: nidud on May 11, 2021, 03:02:40 AM
deleted