Hi guys
Has anyone succeeded in creating a string case converter in SSE2? I found one here https://gist.github.com/easyaspi314/9d31e5c0f9cead66aba2ede248b74d64
But it is very confusing and also x64 only.
The goal is to convert a string to uppercase or lowercase with SSE2 in 32 bits.
Very simple:
include \masm32\MasmBasic\MasmBasic.inc
Or32 OWORD 20202020202020202020202020202020h
Init
Cls 3
Let esi="A SHORT STRING"
Let edi="This is the destination buffer"
movdqu xmm0, oword ptr [esi]
movups xmm1, Or32
orps xmm0, xmm1
movdqu [edi], xmm0
PrintLine "src= [", esi, "]"
PrintLine "dest=[", edi, "]"
EndOfCode
Output:
src= [A SHORT STRING]
dest=[a short string ination buffer]
Minor problem: you have to find a way to move 14 bytes from an xmmreg to memory :cool:
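The same demo written in C with SSE2 intrinsics (a sketch that assumes both buffers have 16 readable/writable bytes; naive_or_lower16 is a made-up name). Like the MasmBasic version, it ORs all 16 bytes with 020h, which shows the two catches: the zero terminator turns into a space, and nothing limits the store to the string's length.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Hypothetical helper: OR every one of the 16 bytes with 0x20, like the
   "orps xmm0, Or32" demo above. No range check and no length check. */
static void naive_or_lower16(const char src[16], char dst[16])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)src);
    __m128i bit5 = _mm_set1_epi8(0x20);
    _mm_storeu_si128((__m128i *)dst, _mm_or_si128(v, bit5));
}
```

With char src[16] = "A SHORT STRING"; the output starts with "a short string", and the two zero bytes of padding come out as 020h (spaces). Non-letters change too: '@' becomes '`' and '[' becomes '{'.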
Hi JJ
Many thanks.
So, for uppercase we only need an xorps after the orps, right?
Lower Case
[ToLowerTbl: Q$ 020_20_20_20_20_20_20_20, 020_20_20_20_20_20_20_20]
Proc SSEToLower:
Arguments @pString, @pOutput
mov esi D@pString
movdqu xmm0 X$esi
movdqu xmm1 X$ToLowerTbl
orps xmm0 xmm1
mov esi D@pOutput
movups X$esi xmm0
EndP
[StringInput: B$ "hEllo", 0]
[OutputString: B$ 0 #128]
call SSEToLower StringInput, OutputString
OutputString: B$ "hello", 0 ---> in fact it will also turn the trailing zero bytes into spaces (020), but this was just for me to understand whether it is similar to regular x86 code, but using xor
Proc SSEToUpper:
Arguments @pString, @pOutput
mov esi D@pString
movdqu xmm0 X$esi
movdqu xmm1 X$ToLowerTbl
orps xmm0 xmm1
xorps xmm0 xmm1
mov esi D@pOutput
movups X$esi xmm0
EndP
[StringInput: B$ "hEllo", 0]
[OutputString: B$ 0 #128]
call SSEToUpper StringInput, OutputString
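OutputString: B$ "HELLO", 0 ---> here the trailing zeros survive (0 or 020 xor 020 = 0), but any real space (020) inside the string would become 0. Again, this was just for me to understand whether it is similar to regular x86 code, but using xor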
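What the orps/xorps pair does per byte: (x or 020h) xor 020h equals x and not 020h, i.e. it just clears bit 5, so the two instructions can be replaced by a single pandn. A C sketch with SSE2 intrinsics (naive_upper16 is a made-up name) that shows this, and also why it is not enough on its own: spaces and digits get changed too.

```c
#include <assert.h>
#include <emmintrin.h>
#include <string.h>

/* Hypothetical helper: clear bit 5 of all 16 bytes with one pandn.
   _mm_andnot_si128(a, b) computes (~a) & b, which equals (b | a) ^ a
   when a = 0x20 in every byte, i.e. the orps + xorps pair. */
static void naive_upper16(const char src[16], char dst[16])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)src);
    __m128i bit5 = _mm_set1_epi8(0x20);
    _mm_storeu_si128((__m128i *)dst, _mm_andnot_si128(bit5, v));
}
```

With "hEllo 1" the letters come out as "HELLO", but the space (020h) becomes 0 and '1' (031h) becomes 011h.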
One question: how do I prevent changing the case of other chars, such as numbers, or ? | _ etc.?
I tried converting them to masks with PCMPGTB, but got nowhere. It identified the chars greater than 'Z' and set those byte positions to 0FF, but I couldn't figure out how to use those positions to convert only the needed chars to lower or upper case.
For example, say I have the string "Hello 123 i'm doing this. How are you ?". It has non-letter chars (' ? .) and the digits 1 2 3 in the middle of the text. How do I convert it all to uppercase, except the digits and the other non-letter chars?
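One way to do it (the approach the StringtoLower/StringtoUpper functions later in this thread also take) is to build a per-byte mask from two pcmpgtb compares, keep 020h only where the mask is 0FF, and xor that into the string. A C sketch with SSE2 intrinsics for one 16-byte block; upper16_masked is a made-up name.

```c
#include <assert.h>
#include <emmintrin.h>
#include <string.h>

/* Hypothetical helper: uppercase only the bytes 'a'..'z' of a 16-byte block. */
static __m128i upper16_masked(__m128i v)
{
    /* pcmpgtb is a SIGNED compare: bytes 0x80..0xFF are negative, so they
       fail the first test and are never touched. */
    __m128i ge_a     = _mm_cmpgt_epi8(v, _mm_set1_epi8('a' - 1));    /* 0xFF where v >= 'a' */
    __m128i le_z     = _mm_cmpgt_epi8(_mm_set1_epi8('z' + 1), v);    /* 0xFF where v <= 'z' */
    __m128i is_lower = _mm_and_si128(ge_a, le_z);                    /* 0xFF only for 'a'..'z' */
    __m128i flip     = _mm_and_si128(is_lower, _mm_set1_epi8(0x20)); /* 0x20 there, 0 elsewhere */
    return _mm_xor_si128(v, flip);                                   /* clear bit 5 only in letters */
}
```

With "Hello 123 i'm ?" this gives "HELLO 123 I'M ?": digits, spaces and punctuation fall outside the mask and pass through unchanged.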
Quote from: jj2007 on August 20, 2023, 11:06:06 AM
Minor problem: you have to find a way to move 14 bytes from an xmmreg to memory :cool:
It seems you didn't catch the irony...
And it's only one of your problems:
include \masm32\MasmBasic\MasmBasic.inc
Init
Let esi="Введите текст здесь: Enter text here in Russian"
PrintLine esi
PrintLine Upper$(esi)
EndOfCode
Output:
Введите текст здесь: Enter text here in Russian
ВВЕДИТЕ ТЕКСТ ЗДЕСЬ: ENTER TEXT HERE IN RUSSIAN
Hi JJ.
About the 14 bytes... :bgrin: :bgrin: :bgrin: I didn't realize it was irony. But you can do it by copying whatever is left over (less than 16 bytes) to the stack, and at the end of the routine copying those remaining bytes from the stack to the output buffer that receives the converted string. That's what the macro Structure @TmpStorage 32, @TmpStringDis 0 is for: it allocates 32 bytes on the stack to handle the cases where the string is shorter than 16 bytes (or has a remainder beyond a multiple of 16). This also prevents crashes, since we copy only the necessary converted chars to the output buffer, without having to worry whether the last 16 bytes fall outside the output buffer's allocated memory.
I managed to write 2 functions that work, but they still have issues with some chars (especially Latin ones such as ç, ã, õ etc.). But I guess this is the way to do it. Here are the functions:
StringtoLower
[<16 ToCase_asciiA: Q$ 040_40_40_40_40_40_40_40, 040_40_40_40_40_40_40_40] ; 'A'-1
[<16 ToCase_asciiZ: Q$ 05B_5B_5B_5B_5B_5B_5B_5B, 05B_5B_5B_5B_5B_5B_5B_5B] ; 'Z'+1
[<16 ToCase_Diff: Q$ 020_20_20_20_20_20_20_20, 020_20_20_20_20_20_20_20] ; 'a'-'A'
Proc StringtoLower:
Arguments @pString, @pOutput
Local @StringLenght
Structure @TmpStorage 32, @TmpStringDis 0
Uses esi, edi
mov edi D@pOutput
mov esi D@pString
call StrLen esi
mov D@StringLenght eax
..While D@StringLenght >= 16
; input string = xmm0
movdqu xmm0 X$esi
; GreaterThanA = pcmpgtb InputString, ToCase_asciiA
; All chars >= 'A' will be flagged as 0FF. The rest will be flagged as 0. Therefore, bytes 0 to '@' (040) will be flagged as 0. Since pcmpgtb is a signed compare, bytes 080-0FF count as negative and are flagged as 0 too
movdqu xmm1 xmm0; xmm1 = InputString
pcmpgtb xmm1 X$ToCase_asciiA ; xmm1 = greaterThanA. If InputChar >= 'A', Mask1 = 0FF, Else Mask1 = 0. Mask1 = xmm1. Therefore, bytes 0 to @ ('A'-1) will be flagged as 0
; lessEqualZ = pcmpgtb ToCase_asciiZ, InputString
; Now we are doing the opposite. All chars > 'Z' will be flagged as 0. Therefore, bytes 0 to 'Z' will be flagged as 0FF
movdqu xmm2 X$ToCase_asciiZ; xmm2 = X$ToCase_asciiZ
pcmpgtb xmm2 xmm0 ; xmm2 = lessEqualZ. If InputChar <= 'Z', Mask2 = 0FF, Else Mask2 = 0. Mask2 = xmm2. Therefore, bytes '[' (05B) to 07F will be flagged as 0 (bytes 080-0FF are negative for pcmpgtb and get 0FF here, but the pand below clears them)
; Mask3 = pand lessEqualz, greaterThanA
; Char >= 'A', Flag 0FF, Else 0
; Char <= 'Z', Flag 0FF, Else 0
; We have then. Value = 0FF when Char >= 'A' and Char <= 'Z'
; AND both results
pand xmm2 xmm1 ; mask3. Now everything from A to Z is flagged as 0FF, all the rest is 0
; toAdd = pand ToCase_Diff, Mask3
; Finally we AND the mask with our case difference (020 = 'a'-'A'), so xmm1 keeps 020 only at the positions flagged in the mask.
; So, everything flagged as 0FF becomes 020. Everything else becomes 0
movdqu xmm1 X$ToCase_Diff
pand xmm1 xmm2
; added = paddb toAdd, InputString
; Finally we add those flagged bytes to our string to change the case. Or we can simply OR them with orps, since bit 5 (020) is always clear in 'A'..'Z'
;paddb xmm0 xmm1 ; works with paddb, orps and xorps as well. Need to see which one is faster
;orps xmm0 xmm1
xorps xmm0 xmm1
;_mm_storeu_si128((__m128i *)str, added);
movdqu X$edi xmm0
add edi 16
add esi 16
sub D@StringLenght 16
..End_While
; calculate remainders
.If D@StringLenght > 0
mov eax D@StringLenght
movdqu xmm0 X$esi
movdqu xmm1 xmm0; xmm1 = InputString
pcmpgtb xmm1 X$ToCase_asciiA ; xmm1 = greaterThanA
movdqu xmm2 X$ToCase_asciiZ; xmm2 = X$ToCase_asciiZ
pcmpgtb xmm2 xmm0 ; xmm2 = lessEqualz
pand xmm2 xmm1 ; mask
movdqu xmm1 X$ToCase_Diff
pand xmm1 xmm2
;paddb xmm0 xmm1
xorps xmm0 xmm1
mov esi D@TmpStorage | mov D$esi+eax 0
movdqu X$esi xmm0
; ready to copy the remainders
L3: movsb | dec eax | jnz L3<
.End_If
mov eax D@StringLenght
mov B$edi 0
EndP
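For comparison, a C sketch with SSE2 intrinsics that mirrors StringtoLower's structure: a 16-byte main loop, then the tail through a small temp buffer (the job @TmpStorage does). lower16 and string_to_lower are made-up names, and returning the full length is a choice made for this sketch. One deliberate difference: the sketch copies the tail into the zeroed temp buffer before loading, so the conversion never reads past the end of the source, whereas the RosAsm/MASM versions do a 16-byte movdqu from esi even when fewer bytes are left.

```c
#include <assert.h>
#include <emmintrin.h>
#include <string.h>

/* Lowercase the bytes 'A'..'Z' of a 16-byte vector, leave everything else. */
static __m128i lower16(__m128i v)
{
    __m128i ge_A = _mm_cmpgt_epi8(v, _mm_set1_epi8('A' - 1));  /* like pcmpgtb with ToCase_asciiA */
    __m128i le_Z = _mm_cmpgt_epi8(_mm_set1_epi8('Z' + 1), v);  /* like pcmpgtb with ToCase_asciiZ */
    __m128i add  = _mm_and_si128(_mm_and_si128(ge_A, le_Z), _mm_set1_epi8(0x20));
    return _mm_add_epi8(v, add);                               /* paddb, as in the MASM version */
}

/* Sketch of StringtoLower's structure. dst needs strlen(src) + 1 bytes.
   Returns the full string length. */
static size_t string_to_lower(const char *src, char *dst)
{
    size_t len = strlen(src);
    size_t left = len;

    while (left >= 16) {                      /* main loop: 16 bytes at a time */
        __m128i v = _mm_loadu_si128((const __m128i *)src);
        _mm_storeu_si128((__m128i *)dst, lower16(v));
        src += 16;
        dst += 16;
        left -= 16;
    }
    if (left > 0) {
        /* Tail: stage the remaining bytes in a zeroed 16-byte temp,
           convert, then copy back only 'left' bytes. */
        unsigned char tmp[16] = {0};
        memcpy(tmp, src, left);
        __m128i v = _mm_loadu_si128((const __m128i *)tmp);
        _mm_storeu_si128((__m128i *)tmp, lower16(v));
        memcpy(dst, tmp, left);
        dst += left;
    }
    *dst = '\0';
    return len;
}
```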
StringtoUpper
[<16 ToLowCase_asciiA: Q$ 060_60_60_60_60_60_60_60, 060_60_60_60_60_60_60_60] ; 'a'-1
[<16 ToLowCase_asciiZ: Q$ 07B_7B_7B_7B_7B_7B_7B_7B, 07B_7B_7B_7B_7B_7B_7B_7B] ; 'z'+1
Proc StringtoUpper:
Arguments @pString, @pOutput
Local @StringLenght
Structure @TmpStorage 32, @TmpStringDis 0
Uses esi, edi
mov edi D@pOutput
mov esi D@pString
call StrLen esi
mov D@StringLenght eax
..While D@StringLenght >= 16
; input string = xmm0
movdqu xmm0 X$esi
; GreaterThanA = pcmpgtb InputString, ToLowCase_asciiA
; All chars >= 'a' will be flagged as 0FF. The rest will be flagged as 0. Therefore, bytes 0 to '`' (060) will be flagged as 0 (and, since pcmpgtb is signed, so will bytes 080-0FF)
movdqu xmm1 xmm0; xmm1 = InputString
pcmpgtb xmm1 X$ToLowCase_asciiA ; xmm1 = greaterThanA. If InputChar >= 'a', Mask1 = 0FF, Else Mask1 = 0. Mask1 = xmm1. Therefore, bytes 0 to 060 ('a'-1) will be flagged as 0
; lessEqualZ = pcmpgtb ToLowCase_asciiZ, InputString
; Now we are doing the opposite. All chars > 'z' will be flagged as 0. Therefore, bytes 0 to 'z' will be flagged as 0FF
movdqu xmm2 X$ToLowCase_asciiZ; xmm2 = X$ToLowCase_asciiZ
pcmpgtb xmm2 xmm0 ; xmm2 = lessEqualz. If InputChar <= 'z', Mask2 = 0FF, Else Mask2 = 0. Mask2 = xmm2. Therefore, bytes '{' (07B) to 07F will be flagged as 0 (bytes 080-0FF are negative for pcmpgtb and get 0FF here, but the pand below clears them)
; Mask3 = pand lessEqualz, greaterThanA
; Char >= 'a', Flag 0FF, Else 0
; Char <= 'z', Flag 0FF, Else 0
; We have then. Value = 0FF when Char >= 'a' and Char <= 'z'
; AND both results
pand xmm2 xmm1 ; mask3. Now everything from a to z is flagged as 0FF, all the rest is 0
; toSub = pand ToCase_Diff, Mask3
; Finally we AND the mask with our case difference (020 = 'a'-'A'), so xmm1 keeps 020 only at the positions flagged in the mask.
; So, everything flagged as 0FF becomes 020. Everything else becomes 0
movdqu xmm1 X$ToCase_Diff
pand xmm1 xmm2
; result = psubb InputString, toSub
; Finally we subtract those flagged bytes from our string to change the case, or xor them with xorps. (orps would NOT work here: 'a'..'z' already have bit 5 (020) set)
;psubb xmm0 xmm1 ; works with psubb and xorps as well. Need to see which one is faster
xorps xmm0 xmm1
;_mm_storeu_si128((__m128i *)str, added);
movdqu X$edi xmm0
add edi 16
add esi 16
sub D@StringLenght 16
..End_While
; calculate remainders
.If D@StringLenght > 0
mov eax D@StringLenght
movdqu xmm0 X$esi
movdqu xmm1 xmm0; xmm1 = InputString
pcmpgtb xmm1 X$ToLowCase_asciiA ; xmm1 = greaterThanA
movdqu xmm2 X$ToLowCase_asciiZ; xmm2 = X$ToLowCase_asciiZ
pcmpgtb xmm2 xmm0 ; xmm2 = lessEqualz
pand xmm2 xmm1 ; mask
movdqu xmm1 X$ToCase_Diff
pand xmm1 xmm2
;psubb xmm0 xmm1
xorps xmm0 xmm1
mov esi D@TmpStorage | mov D$esi+eax 0
movdqu X$esi xmm0
; ready to copy the remainders
L3: movsb | dec eax | jnz L3<
.End_If
mov eax D@StringLenght
mov B$edi 0
EndP
Examples of usage:
[OutputString: B$ 0 #128]
[BigText2a: B$ "[zzzzz? / \ : ; zzzzzzzzzzzzzzzzzzzzzzzzaaaaaaaaaaggTTTTTTnvd123456/", 0]
call StringtoLower BigText2a, OutputString
call StringtoUpper BigText2a, OutputString
I'll do some speed tests and convert it to MASM for you. I'll also try to find a way to fix the Latin chars.
Once it is all fixed, we can try to optimize it even further. We could also add another function with "Ex" appended to the name for cases where the length of the string is already precalculated (which will also make the functions faster, btw).
Btw, I'll also check which opcodes are faster for the final step of the case conversion, i.e. whether paddb, orps or xorps differ significantly, or whether it doesn't matter which one we choose. Originally the functions used paddb xmm0 xmm1 for StringtoLower and psubb xmm0 xmm1 for StringtoUpper, but for these tests I'm using xorps xmm0 xmm1 in both to see whether there is any difference in speed.
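Why the choice of final instruction can only affect speed, not correctness: after the pand, the 020h vector is 0 outside the letter range (and adding, OR-ing or xoring 0 changes nothing), while inside the range bit 5 is always clear in 'A'..'Z' (041h-05Ah) and always set in 'a'..'z' (061h-07Ah). A small scalar check in C (final_step_equivalent is a made-up name) that walks both ranges:

```c
#include <assert.h>

/* Returns 1 if, for every letter, the listed final steps agree:
   lowercase: paddb == orps == xorps; uppercase: psubb == xorps == pandn. */
static int final_step_equivalent(void)
{
    for (int c = 'A'; c <= 'Z'; c++)            /* bit 5 clear: +0x20, |0x20, ^0x20 agree */
        if ((c + 0x20) != (c | 0x20) || (c + 0x20) != (c ^ 0x20))
            return 0;
    for (int c = 'a'; c <= 'z'; c++)            /* bit 5 set: -0x20, ^0x20, &~0x20 agree */
        if ((c - 0x20) != (c ^ 0x20) || (c - 0x20) != (c & ~0x20))
            return 0;
    return 1;
}
```

Note that orps has no uppercase counterpart: 'a' | 020h is still 'a'.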
The functions are an adaptation of C ones for x64 that I found here - https://gist.github.com/easyaspi314/9d31e5c0f9cead66aba2ede248b74d64 (although the C version seems to be slow, because all those _mm_add_epi8 etc. intrinsics take lots of instructions, at least with gcc on https://godbolt.org).
About the Unicode version, I'm not there yet. But it seems that, at least for Russian, the difference between uppercase and lowercase is also 32 (020h) - https://en.wikipedia.org/wiki/Russian_alphabet. I'll try it later, after I test the speed of all of this and see whether I can do something about the Latin chars.
Quote from: guga on August 21, 2023, 04:15:49 AM
Hi JJ.
About the 14 bytes.. :bgrin: :bgrin: :bgrin: I didn´t thought it was a irony. But you can do it copying all that left from 16 bytes to stack and at the end of the routine copy the remainder bytes from the stack onto the outputted memory buffer where the actual string will be converted.
Yes, but copying 14 bytes from an XMM register is very, very slow. You may say that it doesn't matter if the string is one megabyte long, but did it ever happen to you (or anyone else) that one megabyte of text had to be converted to UPPERCASE?
Quote from: jj2007 on August 21, 2023, 04:50:47 AM
Yes, but copying 14 bytes from an XMM register is very, very slow.
Hi JJ
It can be a bit slow, because at the end it does a byte-by-byte copy of whatever number of bytes (less than 16) is left.
This part, right?
movdqu X$esi xmm0
L3: movsb | dec eax | jnz L3<
But... we could overcome this by calculating at the beginning of the function whether the remainder is a multiple of 8, 4 or 2, and precalculating the remainder of the remainder (perhaps using non-SSE registers just for the data shorter than 16 bytes). I'm not sure whether it will speed up the case conversion of short strings, but this can be tested later when we try to optimize it.
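The 8/4/2/1 idea can be sketched like this in C: the bits of the remainder n decide which of at most four stores run, so there is no per-byte loop. store_tail is a hypothetical name; the fixed-size memcpy calls stand in for a mov of the matching width (GCC and Clang usually compile them to a single load/store when optimizing), and whether this actually beats the movsb loop has to be measured.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the first n (0..15) bytes of an already converted
   16-byte block to dst with at most one 8-, 4-, 2- and 1-byte store, chosen
   by the bits of n, instead of a movsb loop. */
static void store_tail(char *dst, const unsigned char block[16], size_t n)
{
    size_t off = 0;
    assert(n < 16);
    if (n & 8) { memcpy(dst + off, block + off, 8); off += 8; }
    if (n & 4) { memcpy(dst + off, block + off, 4); off += 4; }
    if (n & 2) { memcpy(dst + off, block + off, 2); off += 2; }
    if (n & 1) { dst[off] = (char)block[off]; }
}
```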
Btw... on my AMD, using paddb xmm0 xmm1 for StringtoLower and psubb xmm0 xmm1 for StringtoUpper is a bit faster than using xorps or orps. Not a big difference, around 1% or less, but it may count for something once the functions are optimized further.
I'll convert these simple versions to MASM now so you can test them, and then I'll see how to make them work for Latin strings. Later a Unicode version will also be needed :azn:
Masm versions:
Data used on both:
ToCase_asciiA xmmword 40404040404040404040404040404040h
ToCase_asciiZ xmmword 5B5B5B5B5B5B5B5B5B5B5B5B5B5B5B5Bh
ToCase_Diff xmmword 20202020202020202020202020202020h
ToLowCase_asciiA xmmword 60606060606060606060606060606060h
ToLowCase_asciiZ xmmword 7B7B7B7B7B7B7B7B7B7B7B7B7B7B7B7Bh
StringtoUpper
StringtoUpper proc near
TmpStorage = dword ptr -8
StringLenght = dword ptr -4
pString = dword ptr 8
pOutput = dword ptr 0Ch
push ebp
mov ebp, esp
sub esp, 4
sub esp, 24h
mov [ebp+TmpStorage], esp
push esi
push edi
mov edi, [ebp+pOutput]
mov esi, [ebp+pString]
push esi
call StrLen
mov [ebp+StringLenght], eax
loc_404DFD:
cmp [ebp+StringLenght], 10h
jb loc_404E4A
movdqu xmm0, xmmword ptr [esi]
movdqu xmm1, xmm0
pcmpgtb xmm1, ToLowCase_asciiA
movdqu xmm2, ToLowCase_asciiZ
pcmpgtb xmm2, xmm0
pand xmm2, xmm1
movdqu xmm1, ToCase_Diff
pand xmm1, xmm2
psubb xmm0, xmm1
movdqu xmmword ptr [edi], xmm0
add edi, 10h
add esi, 10h
sub [ebp+StringLenght], 10h
jmp loc_404DFD
; ---------------------------------------------------------------------------
loc_404E4A:
cmp [ebp+StringLenght], 0
jbe loc_404E99
mov eax, [ebp+StringLenght]
movdqu xmm0, xmmword ptr [esi]
movdqu xmm1, xmm0
pcmpgtb xmm1, ToLowCase_asciiA
movdqu xmm2, ToLowCase_asciiZ
pcmpgtb xmm2, xmm0
pand xmm2, xmm1
movdqu xmm1, ToCase_Diff
pand xmm1, xmm2
psubb xmm0, xmm1
mov esi, [ebp+TmpStorage]
mov dword ptr [eax+esi], 0
movdqu xmmword ptr [esi], xmm0
loc_404E95:
movsb
dec eax
jnz short loc_404E95
loc_404E99:
mov eax, [ebp+StringLenght]
mov byte ptr [edi], 0
pop edi
pop esi
mov esp, ebp
pop ebp
retn 8
StringtoUpper endp
StringtoLower
StringtoLower proc near
TmpStorage = dword ptr -8
StringLenght = dword ptr -4
pString = dword ptr 8
pOutput = dword ptr 0Ch
push ebp
mov ebp, esp
sub esp, 4
sub esp, 24h
mov [ebp+TmpStorage], esp
push esi
push edi
mov edi, [ebp+pOutput]
mov esi, [ebp+pString]
push esi
call StrLen
mov [ebp+StringLenght], eax
loc_404D2D:
cmp [ebp+StringLenght], 10h
jb loc_404D7A
movdqu xmm0, xmmword ptr [esi]
movdqu xmm1, xmm0
pcmpgtb xmm1, ToCase_asciiA
movdqu xmm2, ToCase_asciiZ
pcmpgtb xmm2, xmm0
pand xmm2, xmm1
movdqu xmm1, ToCase_Diff
pand xmm1, xmm2
paddb xmm0, xmm1
movdqu xmmword ptr [edi], xmm0
add edi, 10h
add esi, 10h
sub [ebp+StringLenght], 10h
jmp loc_404D2D
; ---------------------------------------------------------------------------
loc_404D7A:
cmp [ebp+StringLenght], 0
jbe loc_404DC9
mov eax, [ebp+StringLenght]
movdqu xmm0, xmmword ptr [esi]
movdqu xmm1, xmm0
pcmpgtb xmm1, ToCase_asciiA
movdqu xmm2, ToCase_asciiZ
pcmpgtb xmm2, xmm0
pand xmm2, xmm1
movdqu xmm1, ToCase_Diff
pand xmm1, xmm2
paddb xmm0, xmm1
mov esi, [ebp+TmpStorage]
mov dword ptr [eax+esi], 0
movdqu xmmword ptr [esi], xmm0
loc_404DC5:
movsb
dec eax
jnz short loc_404DC5
loc_404DC9:
mov eax, [ebp+StringLenght]
mov byte ptr [edi], 0
pop edi
pop esi
mov esp, ebp
pop ebp
retn 8
StringtoLower endp
Additional function
StrLen
StrLen proc near
pString = dword ptr 8
push ebp
mov ebp, esp
push ecx
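xorps xmm0, xmm0 ; xmm0 = 0. Set only once: pcmpeqb leaves xmm0 all zero until a NUL is found, so it can stay the comparand
mov ecx, [ebp+pString]
loc_4086BA:
movups xmm1, xmmword ptr [ecx] ; unaligned 16-byte load: may read past the terminator, and near the end of a page into unmapped memory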
pcmpeqb xmm0, xmm1
add ecx, 10h
pmovmskb eax, xmm0
test ax, ax
jz short loc_4086BA
sub ecx, [ebp+pString]
add ecx, 0FFFFFFF0h
bsf ax, ax
add eax, ecx
pop ecx
mov esp, ebp
pop ebp
retn 4
StrLen endp
JJ, I created an extended version named StringtoUpperEx with one additional parameter that takes a precalculated length of the string.
To my surprise, StringtoUpperEx is extremely fast.
Text to convert:
[BigText2a: B$ "[zzzzz? / \ : ; zzzzzzzzzzzzzzzzzzzzzzzzaaaaaaaaaaggTTTTTTnvd123456/", 0]
On the normal version (with StrLen inside to calculate the length of the string), it takes 72.84 clock cycles, while the extended version (precalculated length) takes 36.33 clock cycles.
Normal version
The fastest results was found in Algo method: 2
Value: 72.84205172214122 clocks
Standard Deviation Results
Mean: 74.43649916350911 clocks
Max (STD Population): 76.02582798029908 clocks
Min (STD Population): 72.84717034671915 clocks
Variance (STD Population): 0.70297743260073 clocks
Standard Deviation (STD Population): 1.58932881678996 clocks
Max (STD Sample): 76.03094660487702 clocks
Min (STD Sample): 72.84205172214122 clocks
Variance (STD Sample): 0.70751277087558 clocks
Standard Deviation (STD Sample): 1.59444744136790 clocks
Extended version
The fastest results was found in Algo method: 2
Value: 36.33179307783062 clocks
Standard Deviation Results
Mean: 38.30579268930184 clocks
Max (STD Population): 40.27649955517551 clocks
Min (STD Population): 36.33508582342818 clocks
Variance (STD Population): 1.07834219136009 clocks
Standard Deviation (STD Population): 1.97070686587366 clocks
Max (STD Sample): 40.27979230077307 clocks
Min (STD Sample): 36.33179307783062 clocks
Variance (STD Sample): 1.08194868698336 clocks
Standard Deviation (STD Sample): 1.97399961147123 clocks
Of course, these are only very preliminary tests, since I haven't optimized the code and haven't found a way to handle the Latin chars yet. But it looks promising in terms of speed and accuracy for both versions.
In these tests the extended version has a slightly bigger variance than the normal one, which could indicate room for more optimization, alignment problems or SSE register caching effects, perhaps?
If I can cut the time of each of them by around 50%, I can then add one more parameter as a flag that lets the user enable the Latin mode or not (for ç, ã, õ, ô, é, è etc.).
Then I'll be able to look at the other problem you mentioned, moving 14 bytes from an XMM register to memory. There is a way to do it, but I don't know yet whether it will affect performance. I'll try to do it tonight or tomorrow.
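One common way to avoid the byte-wise tail altogether when the length is known (not necessarily the method used in this thread): once the string is at least 16 bytes long, the last block is simply the final 16 bytes, loaded and stored unaligned, overlapping the previous block. The overlap rewrites a few bytes with the same values (the masked conversion is idempotent: converting 'A' again leaves 'A'), and nothing outside the string is read or written. Strings shorter than 16 bytes still go through a small temp buffer. A C sketch with SSE2 intrinsics; string_to_upper_ex is a hypothetical counterpart of StringtoUpperEx that takes the precalculated length.

```c
#include <assert.h>
#include <emmintrin.h>
#include <string.h>

/* Uppercase the bytes 'a'..'z' of a 16-byte vector (signed compares, as in the asm). */
static __m128i upper16(__m128i v)
{
    __m128i ge_a = _mm_cmpgt_epi8(v, _mm_set1_epi8('a' - 1));
    __m128i le_z = _mm_cmpgt_epi8(_mm_set1_epi8('z' + 1), v);
    __m128i sub  = _mm_and_si128(_mm_and_si128(ge_a, le_z), _mm_set1_epi8(0x20));
    return _mm_sub_epi8(v, sub);                      /* psubb, as in StringtoUpper */
}

/* Hypothetical "Ex" counterpart: the caller passes the length, no strlen inside.
   dst needs len + 1 bytes; dst may equal src. */
static void string_to_upper_ex(const char *src, size_t len, char *dst)
{
    if (len >= 16) {
        size_t i = 0;
        for (; i + 16 <= len; i += 16)
            _mm_storeu_si128((__m128i *)(dst + i),
                             upper16(_mm_loadu_si128((const __m128i *)(src + i))));
        if (i < len) {
            /* Overlapping final block: the last 16 bytes of the string. Bytes that
               were already written get the same values again (the conversion is
               idempotent), and nothing outside [0, len) is read or written. */
            size_t last = len - 16;
            _mm_storeu_si128((__m128i *)(dst + last),
                             upper16(_mm_loadu_si128((const __m128i *)(src + last))));
        }
    } else if (len > 0) {
        /* Shorter than one block: go through a zeroed temp buffer. */
        unsigned char tmp[16] = {0};
        memcpy(tmp, src, len);
        _mm_storeu_si128((__m128i *)tmp, upper16(_mm_loadu_si128((const __m128i *)tmp)));
        memcpy(dst, tmp, len);
    }
    dst[len] = '\0';
}
```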
Found a way to simulate the pcmpeqb routine using only 4 bytes (a 32-bit register) at a time to perform the case conversion. In my tests, this routine is faster than the regular lowercase conversions that scan byte by byte and xor only the bytes in the range 'A' to 'Z', but it is slower than the ones I just wrote, which combine SIMD instructions with regular x86 code to maximize performance in both cases.
JJ, in my tests I managed to overcome the speed problem when computing only 14 or 15 (or fewer) bytes; copying from XMM registers to regular x86 registers was no longer a problem (at least in these preliminary tests). I'm cleaning up this whole beast and will upload it here today so we can test it (both syntaxes, MASM and RosAsm).
Btw, the function below has a small error when saving the bytes to edi: a save point is missing in the 2nd step. Since I won't use it any longer (it's slower than the other technique I wrote), I'm only posting it so you can see how this can be done without SIMD, which may be faster than the regular ways of doing it. The function below is not part of the new algo and I didn't fix it; I'm just posting it here so I won't forget the steps and tests I did while trying to optimize the main functions.
Proc ChangeCaseShort:
Arguments @pString, @StringLenght, @Output
Local @RemainderBytes, @LoopCount
Uses edi, esi, ebx, edx, ecx
mov esi D@pString;$BigText2a
mov edi D@Output
mov eax D@StringLenght; | shr eax 2 ; divide by 4. ecx now is the counter of multiple of 4 bytes
and eax 0-4 | mov ecx eax | xor eax D@StringLenght | mov D@RemainderBytes eax; | jz L1> ; When 0, the length is a multiple of 4 and we have no remainders; jmp over to the main function
shr ecx 2 | jz L1>
.Do
mov eax D$esi
mov ebx eax
mov edx eax
xor edx 020202020
or ebx 05F5F5F5F ; 05F (95 in decimal) = A or B or C ... or Z, i.e. OR-ing together all chars from A to Z gives 05F
and edx ebx
sub edx eax
; Create the mask. Why 07F on each byte? Because the mask only needs the 8th bit set, so we invert all the other bits, giving 07F (0111_1111)
; Doing that, whatever byte had 0FF as a mask will be zeroed, and all the others become nonzero with the 8th bit cleared
and edx 07F7F7F7F | xor edx 07F7F7F7F
xor ebx ebx ; create our mask now in ebx
Test_If_Not dl dl ; compare each byte to see if it is zeroed or not. If zero, we OR 0FF at the given position, thus building our mask in ebx
or ebx 0FF
Test_End
Test_If_Not dh dh
or ebx 0FF_00
Test_End
shr edx 16
Test_If_Not dl dl
or ebx 0FF_00_00
Test_End
Test_If_Not dh dh
or ebx 0FF_00_00_00
Test_End
and ebx 020202020
add eax ebx ; convert to lowercase
mov D$edi eax
add esi 4
add edi 4
dec ecx
;.Loop_Until ecx = 0
.Repeat_Until_Zero ecx
add esi D@RemainderBytes
add edi D@RemainderBytes
L1:
mov ecx D@RemainderBytes
mov eax D$esi
mov ebx eax
mov edx eax
xor edx 020202020
or ebx 05F5F5F5F ; 05F (95 decimal) = OR of all chars from A to Z
and edx ebx
sub edx eax
; Create the mask. Why 07F on each byte? Because the mask only needs the 8th bit set, so we invert all the other bits, giving 07F (0111_1111)
; Doing that, whatever byte had 0FF as a mask will be zeroed, and all the others become nonzero with the 8th bit cleared
and edx 07F7F7F7F | xor edx 07F7F7F7F
xor ebx ebx ; create our mask now in ebx
Test_If_Not dl dl ; compare each byte to see if it is zeroed or not. If zero, we OR 0FF at the given position, thus building our mask in ebx
or ebx 0FF
Test_End
Test_If_Not dh dh
or ebx 0FF_00
Test_End
shr edx 16
Test_If_Not dl dl
or ebx 0FF_00_00
Test_End
Test_If_Not dh dh
or ebx 0FF_00_00_00
Test_End
and ebx 020202020
add eax ebx ; convert to lowercase
mov B$edi al | dec ecx | jz L2>
inc edi | mov B$edi ah | dec ecx | jz L2>
shr eax 16 | inc edi | mov B$edi al | dec ecx | jz L2>
mov B$edi 0
L2:
EndP
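For the non-SIMD route, a branch-free version of the 4-bytes-at-a-time idea is a well-known SWAR trick (this is not ChangeCaseShort, just the same goal reached differently): clear bit 7 so byte-wise additions cannot carry into the neighbouring byte, add constants that turn bit 7 on for "byte >= 'A'" and for "byte > 'Z'", xor the two, and shift the resulting 080h bits down to 020h. A C sketch with made-up names; bytes 080h-0FFh are deliberately left alone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lowercase 'A'..'Z' in 4 packed bytes with no per-byte tests (SWAR).
   Bytes 0x80..0xFF are left unchanged. */
static uint32_t swar_lower4(uint32_t w)
{
    uint32_t low7  = w & 0x7F7F7F7Fu;        /* clear bit 7: the adds below cannot carry into the next byte */
    uint32_t ge_A  = low7 + 0x3F3F3F3Fu;     /* bit 7 set where byte >= 0x41 ('A') */
    uint32_t gt_Z  = low7 + 0x25252525u;     /* bit 7 set where byte >= 0x5B ('Z' + 1) */
    uint32_t ascii = ~w & 0x80808080u;       /* bit 7 set where the original byte was < 0x80 */
    uint32_t upper = (ge_A ^ gt_Z) & ascii;  /* bit 7 set only for 'A'..'Z' */
    return w | (upper >> 2);                 /* move each 0x80 down to 0x20 and OR it in */
}

/* Hypothetical driver: 4 bytes at a time; memcpy avoids alignment/aliasing issues. */
static void swar_lower(const char *src, char *dst, size_t len)
{
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4);
        w = swar_lower4(w);
        memcpy(dst + i, &w, 4);
    }
    for (; i < len; i++)                      /* 0..3 leftover bytes, plain scalar */
        dst[i] = (src[i] >= 'A' && src[i] <= 'Z') ? (char)(src[i] | 0x20) : src[i];
}
```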
When you are ready to test "Введите текст здесь: Enter text here in Russian", let me know.
Hi JJ
Ok. I'm almost finished with the regular ANSI version. Only a few comments left to write, plus a variation for Latin chars (for Portuguese, French, Italian, Spanish; I don't know yet whether German has accented chars in ANSI that need the case change).
So, at the end we will have 2 versions
StringtoLower for regular A to Z / a to z chars
StringtoLowerEx for accents in portuguese, spanish, italian, french etc
About Russian and the other Unicode versions (the ones ending with W), it will be a bit harder. Russian does have uppercase chars, and it seems that some of them also have a difference of 020, but others seem to have a difference of only 1?
https://www.unicode.org/charts/
I'm not there yet on the Unicode version, but I'm close to a solution (at least for Unicode, not the UTF-16 etc. we discussed before; Russian is UTF-16, right?)
On the ANSI version I'll add a small parameter to handle extra info, such as the language, and I'll add at least the identification of accents for Portuguese, Italian, French and Spanish (or German, if it has accents in ANSI as well). Perhaps the same can be done for Russian. I don't know yet.
More info I'm researching is at:
https://www.optilingo.com/blog/french/french-accent-marks/
https://www.busuu.com/en/french/accent-marks
https://www.fluentin3months.com/french-accent-marks/?expand_article=1
https://studyspanish.com/typing-spanish-accents
https://en.wiktionary.org/wiki/Appendix:Spanish_alphabet
https://en.wikipedia.org/wiki/Russian_alphabet
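Assuming "ANSI" here means Windows-1252 / ISO-8859-1 (an assumption; other ANSI code pages place accented letters differently), the uppercase accented letters À..Þ sit at 0C0h-0DEh and their lowercase forms are exactly 020h higher. The one non-letter in that block is 0D7h (×), and 0DFh (ß) has no single-byte uppercase. German Ä, Ö, Ü (0C4h, 0D6h, 0DCh) fall in the same block. The catch for SSE2 is that pcmpgtb is signed; xoring both sides with 080h first turns it into an unsigned compare. A C sketch with made-up names; it does not handle the Windows-1252 extras Š, Œ, Ž, Ÿ in 080h-09Fh.

```c
#include <assert.h>
#include <emmintrin.h>

/* 0xFF in each byte where lo <= v <= hi, comparing as UNSIGNED bytes.
   XOR with 0x80 maps unsigned order onto signed order for pcmpgtb.
   Requires lo >= 1 and hi <= 0xFE. */
static __m128i in_range_u8(__m128i v, unsigned lo, unsigned hi)
{
    assert(lo >= 1 && hi <= 0xFE);
    const __m128i bias = _mm_set1_epi8((char)0x80);
    __m128i vb = _mm_xor_si128(v, bias);
    __m128i ge = _mm_cmpgt_epi8(vb, _mm_set1_epi8((char)((lo - 1) ^ 0x80)));
    __m128i le = _mm_cmpgt_epi8(_mm_set1_epi8((char)((hi + 1) ^ 0x80)), vb);
    return _mm_and_si128(ge, le);
}

/* Hypothetical Latin-1 / Windows-1252 lowercase for 16 bytes:
   'A'..'Z' and 0xC0..0xDE (except 0xD7, the multiplication sign) get +0x20. */
static __m128i lower16_latin1(__m128i v)
{
    __m128i m = _mm_or_si128(in_range_u8(v, 'A', 'Z'), in_range_u8(v, 0xC0, 0xDE));
    __m128i times = _mm_cmpeq_epi8(v, _mm_set1_epi8((char)0xD7));
    m = _mm_andnot_si128(times, m);                                 /* remove 0xD7 from the mask */
    return _mm_or_si128(v, _mm_and_si128(m, _mm_set1_epi8(0x20)));  /* bit 5 is clear in both ranges */
}
```

For example Ç (0C7h) becomes ç (0E7h), Ã (0C3h) becomes ã (0E3h), while ×, ß, digits and already-lowercase letters pass through unchanged.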
Quote from: guga on August 23, 2023, 07:58:35 AM
Russian is UTF16, right ?
Russian can be handled with Utf-8, Utf-16 or its native codepage. My text is Utf-8, and so is 90% of the Internet.
Russian is only one example. You know I am a great fan of speed, but for Upper$() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1162) I chose a slow algo: MultiByteToWideChar (https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar) works under the hood, slow but versatile and reliable.
One question:
About MultiByteToWideChar: do you use different code pages for the different languages, or only UTF-8?
I don't know exactly how strings in UTF-8 (Russian, Japanese etc.) work, but most of them seem to have the same difference of 32 between the cases (upper and lower). Maybe the open-source implementations of MultiByteToWideChar in Wine or ReactOS can give more clues on how to handle other languages in a faster way.
The problem is identifying the language, but if (and it is a big IF) Unicode chars differ by only 32 (with a few exceptions), could we speed things up by using a table of languages for the exceptional chars or situations?
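For UTF-16 text the basic Russian alphabet does follow the 32 rule: а..я are U+0430..U+044F and А..Я are U+0410..U+042F. The main Russian exception is ё (U+0451) to Ё (U+0401), a difference of 050h, shared by the whole U+0450..U+045F row. The older letters from U+0460 on alternate upper/lower in adjacent code points (difference 1, e.g. Ѣ U+0462 / ѣ U+0463), which may be the "difference of only 1" seen in the charts. UTF-8 is messier: а is D0 B0 and А is D0 90, but р is D1 80 while Р is D0 A0, so a byte-wise ±020h does not work on UTF-8 directly. A sketch for UTF-16 with SSE2 (made-up names; Russian and ASCII letters only, not a general Unicode case mapping):

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>
#include <string.h>

/* 0xFFFF in each 16-bit lane where lo <= v <= hi. pcmpgtw is signed, which is
   fine for lo, hi < 0x8000: lanes >= 0x8000 are negative and never match. */
static __m128i range16(__m128i v, int lo, int hi)
{
    return _mm_and_si128(_mm_cmpgt_epi16(v, _mm_set1_epi16((short)(lo - 1))),
                         _mm_cmpgt_epi16(_mm_set1_epi16((short)(hi + 1)), v));
}

/* Uppercase 8 UTF-16 code units: a-z and U+0430..U+044F move down by 0x20,
   U+0450..U+045F (including U+0451) move down by 0x50. */
static __m128i upper8_utf16_ru(__m128i v)
{
    __m128i d20 = _mm_or_si128(range16(v, 'a', 'z'), range16(v, 0x0430, 0x044F));
    __m128i d50 = range16(v, 0x0450, 0x045F);
    __m128i sub = _mm_or_si128(_mm_and_si128(d20, _mm_set1_epi16(0x20)),
                               _mm_and_si128(d50, _mm_set1_epi16(0x50)));
    return _mm_sub_epi16(v, sub);
}

/* Hypothetical driver: n code units from src to dst (no terminator handling). */
static void upper_utf16_ru(const uint16_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm_storeu_si128((__m128i *)(dst + i),
                         upper8_utf16_ru(_mm_loadu_si128((const __m128i *)(src + i))));
    if (i < n) {                               /* tail via a zeroed temp block */
        uint16_t tmp[8] = {0};
        memcpy(tmp, src + i, (n - i) * sizeof *tmp);
        _mm_storeu_si128((__m128i *)tmp, upper8_utf16_ru(_mm_loadu_si128((const __m128i *)tmp)));
        memcpy(dst + i, tmp, (n - i) * sizeof *tmp);
    }
}
```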
https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers
Another thing: perhaps this can give more clues on how to do it
https://www.coderstool.com/utf16-encoding-decoding
Put your text "Введите текст здесь: Enter text here in Russian" there, click on convert, and then convert to uppercase/lowercase. We can use the results to identify the differences between the chars.