Topic: Two versions of upper and lower case conversions.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 5484
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Two versions of upper and lower case conversions.
« on: April 23, 2018, 01:14:35 PM »
I prototyped these two in 32 bit and they were a very easy conversion to 64 bit MASM. They run OK, but I have a sneaking suspicion that there is a much faster way to do this using either SSSE3 (Supplemental SSE3) or SSE4.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL pstr  :QWORD

    mrm pstr, "This is a test"

    rcall upper,pstr
    conout pstr,lf

    rcall lower,pstr
    conout pstr,lf

    waitkey
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 NOSTACKFRAME

 upper proc

    mov rax, rcx                            ; load string address
    sub rax, 1                              ; set up for loop
    lea rdx, table                          ; load the table address
    jmp lbl                                 ; jump over pre into loop

  pre:
    sub BYTE PTR [rax], 32                  ; sub 32 = convert character to upper case
  lbl:
    add rax, 1
    movzx rcx, BYTE PTR [rax]               ; zero extend the current byte into rcx
    cmp BYTE PTR [rcx+rdx], 1               ; test if that byte is lower case
    je pre                                  ; jump to pre to convert
    test rcx, rcx                           ; test for terminator
    jnz lbl                                 ; loop back if not

    ret

  align 16
  table:
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1      ; lower case table
    db 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

 upper endp

 STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 NOSTACKFRAME

 lower proc

    mov rax, rcx                            ; load string address
    sub rax, 1                              ; set up for loop
    lea rdx, table1                         ; load the table address
    jmp lbl                                 ; jump over pre into loop

  pre:
    add BYTE PTR [rax], 32                  ; add 32 = make lower case
  lbl:
    add rax, 1
    movzx rcx, BYTE PTR [rax]               ; zero extend the current byte into rcx
    cmp BYTE PTR [rcx+rdx], 1               ; test if that byte is upper case
    je pre                                  ; jump to pre to convert
    test rcx, rcx                           ; test for terminator
    jnz lbl                                 ; loop back if not

    ret

  align 16
  table1:
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1      ; upper case table
    db 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

 lower endp

 STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end
hutch at movsd dot com
http://www.masm32.com
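For anyone who wants to check the two tables without counting columns, here is a minimal C sketch of the same table-lookup idea. It is not a translation of the MASM64 SDK code above: the names init_case_tables, upper_table and lower_table are made up for illustration, and the tables are filled at run time instead of being typed in by hand. It only handles plain ASCII, like the original.

```c
#include <stddef.h>

/* 256-entry lookup tables: an entry is 1 when that byte must be converted. */
static unsigned char is_lower[256];
static unsigned char is_upper[256];

/* Fill both tables for plain ASCII. Call once before converting.
   (Hypothetical helper, not part of the code posted above.) */
static void init_case_tables(void)
{
    for (int c = 0; c < 256; c++) {
        is_lower[c] = (unsigned char)(c >= 'a' && c <= 'z');
        is_upper[c] = (unsigned char)(c >= 'A' && c <= 'Z');
    }
}

/* In-place upper case: subtract 32 from every byte flagged as lower case. */
static void upper_table(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p != 0; p++) {
        if (is_lower[*p])
            *p = (unsigned char)(*p - 32);
    }
}

/* In-place lower case: add 32 to every byte flagged as upper case. */
static void lower_table(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p != 0; p++) {
        if (is_upper[*p])
            *p = (unsigned char)(*p + 32);
    }
}
```

Calling init_case_tables() and then upper_table() on a writable copy of "This is a test" should give the same result as the upper proc above; the string must not be a read-only literal.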

jj2007

  • Member
  • *****
  • Posts: 8507
  • Assembler is fun ;-)
    • MasmBasic
Re: Two versions of upper and lower case conversions.
« Reply #1 on: April 23, 2018, 03:07:27 PM »
I've hacked together some timings but cannot see a clear winner on my Core i5. Upper$() and Lower$() use the naive and al, -33 (for upper case) and or al, 32 (for lower case) approach:
Code:
This code was assembled with ml64 in 64-bit format
@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 1123 ms for upper
@#%&$+*this is a test string {123}[456] - 1404 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 858 ms for upper
@#%&$+*this is a test string {123}[456] - 1389 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1092 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 1107 ms for upper
@#%&$+*this is a test string {123}[456] - 1389 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 858 ms for upper
@#%&$+*this is a test string {123}[456] - 1419 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 1092 ms for upper
@#%&$+*this is a test string {123}[456] - 1404 ms for lower

Same with OPT_64 0:
Code:
This code was assembled with ML in 32-bit format
@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1092 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 874 ms for upper
@#%&$+*this is a test string {123}[456] - 1107 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1092 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 874 ms for upper
@#%&$+*this is a test string {123}[456] - 874 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1076 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 843 ms for upper
@#%&$+*this is a test string {123}[456] - 873 ms for lower
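The and/or trick works because ASCII upper and lower case letters differ only in bit 5 (value 32): clearing it with the mask 0DFh (-33 as a signed byte) gives upper case, setting it with 20h gives lower case. A rough C sketch of that idea follows. It is not the MasmBasic implementation of Upper$() or Lower$(), which is not shown in this thread; the function names are invented, and the explicit range check is an assumption about how non-letters such as @ and { are left alone.

```c
#include <stddef.h>

/* Upper-case one byte: clear bit 5 only when the byte is 'a'..'z'. */
static unsigned char upper_byte(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        c &= 0xDF;              /* same mask as: and al, -33 */
    return c;
}

/* Lower-case one byte: set bit 5 only when the byte is 'A'..'Z'. */
static unsigned char lower_byte(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        c |= 0x20;              /* same mask as: or al, 32 */
    return c;
}

/* In-place conversion of a zero-terminated string to upper case. */
static void upper_naive(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p != 0; p++)
        *p = upper_byte(*p);
}

/* In-place conversion of a zero-terminated string to lower case. */
static void lower_naive(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p != 0; p++)
        *p = lower_byte(*p);
}
```

Without the range check, the masks would also change punctuation: @ (40h) would become ` (60h) under or 20h, and { (7Bh) would become [ (5Bh) under and 0DFh.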

hutch--

Re: Two versions of upper and lower case conversions.
« Reply #2 on: April 23, 2018, 05:11:07 PM »
I think that for a lot of these things, memory read/write speed is the limiting factor.

jj2007

Re: Two versions of upper and lower case conversions.
« Reply #3 on: April 23, 2018, 05:57:33 PM »
Maybe. Here are timings for a version (attached) that also uses CharUpperBuff:
Code:
This code was assembled with ml64 in 64-bit format
@#%&$+*THIS IS A TEST STRING {123}[456] - 1763 ms for CharUpperBuff
@#%&$+*THIS IS A TEST STRING {123}[456] - 296 ms for Upper$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 468 ms for upper
@#%&$+*this is a test string {123}[456] - 359 ms for Lower$()
@#%&$+*this is a test string {123}[456] - 281 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 1747 ms for CharUpperBuff
@#%&$+*THIS IS A TEST STRING {123}[456] - 281 ms for Upper$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 452 ms for upper
@#%&$+*this is a test string {123}[456] - 359 ms for Lower$()
@#%&$+*this is a test string {123}[456] - 296 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 1763 ms for CharUpperBuff
@#%&$+*THIS IS A TEST STRING {123}[456] - 297 ms for Upper$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 452 ms for upper
@#%&$+*this is a test string {123}[456] - 374 ms for Lower$()
@#%&$+*this is a test string {123}[456] - 375 ms for lower

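CharUpperBuff is horribly slow, probably because it uses a table for the locale. I have tried to think of a way to use SIMD instructions here, but it's not so easy because only bytes inside the letter range may be changed. For example:
Code:
  mov rdi, offset srcdest          ; destination buffer address
  movups xmm0, spaces              ; presumably 16 bytes of 20h (bit 5 set in each byte)
  movups xmm1, OWORD ptr src       ; load 16 source characters
  orps xmm1, xmm0                  ; set bit 5 in all 16 bytes, letters or not
  movups OWORD ptr srcdest, xmm1   ; store the 16 converted bytes
Output:
Code: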
@#%&$+*This is A
`#%&$+*this is a
Almost perfect ;)
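The only wrong byte in that output is the leading @ (40h), which the unconditional OR turns into ` (60h). One way around the range restriction is to build a per-byte mask with two signed compares and OR in bit 5 only where the mask is set. The sketch below uses C with SSE2 intrinsics rather than MASM so that it can be compiled anywhere; each intrinsic maps to one instruction (pcmpgtb, pand, por, movdqu). This is an untimed sketch, not code from this thread: the name lower_sse2 is invented, and the scalar fallback is only there for builds without SSE2.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Lower-case len bytes in place, 16 at a time where SSE2 is available. */
static void lower_sse2(char *buf, size_t len)
{
    size_t i = 0;

#ifdef HAVE_SSE2
    const __m128i below_A = _mm_set1_epi8('A' - 1);   /* 40h in every byte */
    const __m128i above_Z = _mm_set1_epi8('Z' + 1);   /* 5Bh in every byte */
    const __m128i bit5    = _mm_set1_epi8(0x20);      /* the "spaces" constant */

    for (; i + 16 <= len; i += 16) {
        __m128i v  = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i ge = _mm_cmpgt_epi8(v, below_A);      /* FFh where byte >= 'A' */
        __m128i le = _mm_cmpgt_epi8(above_Z, v);      /* FFh where byte <= 'Z' */
        /* Keep bit 5 only in the lanes that hold 'A'..'Z'. */
        __m128i m  = _mm_and_si128(_mm_and_si128(ge, le), bit5);
        v = _mm_or_si128(v, m);
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
#endif

    /* Scalar tail (and the whole buffer when SSE2 is not available). */
    for (; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c >= 'A' && c <= 'Z')
            buf[i] = (char)(c | 0x20);
    }
}
```

The compares are signed, so bytes of 80h and above count as negative and are never treated as letters. An upper case version would use 'a' - 1 and 'z' + 1 as bounds and clear bit 5 with _mm_andnot_si128(m, v) instead of the OR. Whether this beats the table versions would need to be timed.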