Print Page - Two version of upper and lower case conversions.

Title: Two version of upper and lower case conversions.
Post by: hutch-- on April 23, 2018, 01:14:35 PM

I prototyped these 2 in 32 bit and they were a very easy conversion to 64 bit MASM. They run OK but I have a sneaking suspicion that there is a much faster way to do this using either supplementary SSE3 or SSE4.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

include \masm32\include64\masm64rt.inc

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

LOCAL pstr :QWORD

mrm pstr, "This is a test"

rcall upper,pstr
conout pstr,lf

rcall lower,pstr
conout pstr,lf

waitkey
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

upper proc

mov rax, rcx ; load string address
sub rax, 1 ; set up for loop
lea rdx, table ; load the table address
jmp lbl ; jump over pre into loop

pre:
sub BYTE PTR [rax], 32 ; sub 32 = convert character to upper case
lbl:
add rax, 1
movzx rcx, BYTE PTR [rax] ; load byte address in rcx
cmp BYTE PTR [rcx+rdx], 1 ; test if that byte is lower case
je pre ; jump to pre to convert
test rcx, rcx ; test for terminator
jnz lbl ; loop back if not

ret

align 16
table:
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ; lower case table
db 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

upper endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

lower proc

mov rax, rcx ; load string address
sub rax, 1 ; set up for loop
lea rdx, table1 ; load the table address
jmp lbl ; jump over pre into loop

pre:
add BYTE PTR [rax], 32 ; add 32 = make lower case
lbl:
add rax, 1
movzx rcx, BYTE PTR [rax] ; load byte address in rcx
cmp BYTE PTR [rcx+rdx], 1 ; test if that byte is upper case
je pre ; jump to pre to convert
test rcx, rcx ; test for terminator
jnz lbl ; loop back if not

ret

align 16
table1:
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ; upper case table
db 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

lower endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

Title: Re: Two version of upper and lower case conversions.
Post by: jj2007 on April 23, 2018, 03:07:27 PM

I've hacked together some timings but cannot see a clear winner on my Core i5. Upper$() and Lower$() use the naive and al, -31 resp. or al, 32 thing:

Code Select

This code was assembled with ml64 in 64-bit format
@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 1123 ms for upper
@#%&$+*this is a test string {123}[456] - 1404 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 858 ms for upper
@#%&$+*this is a test string {123}[456] - 1389 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1092 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 1107 ms for upper
@#%&$+*this is a test string {123}[456] - 1389 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 858 ms for upper
@#%&$+*this is a test string {123}[456] - 1419 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1108 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 1092 ms for upper
@#%&$+*this is a test string {123}[456] - 1404 ms for lower

Same with OPT_64 0:

Code Select

This code was assembled with ML in 32-bit format
@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1092 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 874 ms for upper
@#%&$+*this is a test string {123}[456] - 1107 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1092 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 874 ms for upper
@#%&$+*this is a test string {123}[456] - 874 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 889 ms for Upper$()
@#%&$+*this is a test string {123}[456] - 1076 ms for Lower$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 843 ms for upper
@#%&$+*this is a test string {123}[456] - 873 ms for lower

Title: Re: Two version of upper and lower case conversions.
Post by: hutch-- on April 23, 2018, 05:11:07 PM

I think for a lot of these things that memory read/write speed is the limiting factor.

Title: Re: Two version of upper and lower case conversions.
Post by: jj2007 on April 23, 2018, 05:57:33 PM

Maybe. Here are timings for a version (attached) using also CharUpperBuff:

Code Select

This code was assembled with ml64 in 64-bit format
@#%&$+*THIS IS A TEST STRING {123}[456] - 1763 ms for CharUpperBuff
@#%&$+*THIS IS A TEST STRING {123}[456] - 296 ms for Upper$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 468 ms for upper
@#%&$+*this is a test string {123}[456] - 359 ms for Lower$()
@#%&$+*this is a test string {123}[456] - 281 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 1747 ms for CharUpperBuff
@#%&$+*THIS IS A TEST STRING {123}[456] - 281 ms for Upper$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 452 ms for upper
@#%&$+*this is a test string {123}[456] - 359 ms for Lower$()
@#%&$+*this is a test string {123}[456] - 296 ms for lower

@#%&$+*THIS IS A TEST STRING {123}[456] - 1763 ms for CharUpperBuff
@#%&$+*THIS IS A TEST STRING {123}[456] - 297 ms for Upper$()
@#%&$+*THIS IS A TEST STRING {123}[456] - 452 ms for upper
@#%&$+*this is a test string {123}[456] - 374 ms for Lower$()
@#%&$+*this is a test string {123}[456] - 375 ms for lower

It's horribly slow, probably using a table for the LOCALE. I have tried to think of how to use SIMD instructions here, but it's not so easy because of the range restrictions. For example:

Code Select

  mov rdi, offset srcdest
  movups xmm0, spaces
  movups xmm1, OWORD ptr src
  orps xmm1, xmm0
  movups OWORD ptr srcdest, xmm1

Output:

Code Select

@#%&$+*This is A
`#%&$+*this is a

Almost perfect ;)

The MASM Forum

Microsoft 64 bit MASM => MASM64 SDK => Topic started by: hutch-- on April 23, 2018, 01:14:35 PM