Reading the Characters of a Wide String

Started by Zen, June 02, 2016, 09:53:58 AM

Zen

Hi, MASMers,
I'm writing a program in which I must read the characters in a wide (Unicode) string that has been returned from a COM interface method.
The string has a format like this: "v4.0.30319". It represents the version of a .NET Framework runtime installed on my computer. The rest of the program runs OK, no problem. I assumed this would be simple, so I'm just trying to write a reliable but simple procedure. I've already completely screwed it up and I need help.
Initially, I thought I could just load the address of the string and read it two bytes at a time, using CMP with an immediate value and a conditional jump based on the flags. The way I did it didn't work correctly, so I switched to reading the string byte by byte, and this works. I did something like this:

     mov ebx, 0
     mov esi, AlphaWideStr    ;    AlphaWideStr is the address of the wide string.
     mov bl, BYTE PTR [esi]    ;    copy the first byte of the wide string (low byte of the first character).
     .IF bl==76h    ;    The character "v" is equivalent to 76h.
     JMP CheckNext
     .ENDIF
     mov eax, 0    ;    return error code, if routine fails.
     RET

CheckNext:
     mov ebx, 0
     INC esi
     mov bl, BYTE PTR [esi]    ;    second byte: the high byte of "v" in UTF-16, which must be 0.
     .IF bl==0
     JMP testOK
     .ENDIF
     mov eax, 0    ;    return error code, if routine fails.
     RET

testOK:
     mov eax, 40h    ;    Indicates success detecting initial "v" character.
     RET


...And then I just check the return code for either zero (failure) or 40h (success). This works just fine, but what I'd like to do is read the string two bytes at a time, so initially I tried code like this:

     mov ebx, 0   
     mov esi, AlphaWideStr   
     mov ebx, [esi]     ;     Copy the first 4 bytes of the wide string into the register.   
     SHR ebx, 16    ;    Shift to right by 16 bits, leaving the first 2 bytes of the string characters in the register.   
     CMP bx, 7600h    ;    76h is "v". In unicode, the two bytes should look like this: 76 00   
     JZ testOK
     mov eax, 0    ;    return error code, if routine fails.   
     RET   
testOK:
     mov eax, 40h    ;    64 in decimal. Indicates success detecting initial "v" character.
     RET   


I tried several variants of the above code and I get a failure code returned each time. For instance, I shifted the initial 4 bytes right by 24 bits, compared what should have been just the "v" character (76h), and then used a JZ instruction to jump to the success code. I'm obviously making an INCREDIBLY STUPID MISTAKE in my thinking, and this is so simple. What am I doing wrong???
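
The byte-by-byte routine in the question relies on the UTF-16LE layout Windows uses: "v" (U+0076) is stored as the byte 76h followed by 00h. As a rough C equivalent, assuming that same layout, the check could look like this (starts_with_v_bytes is a hypothetical name; like the assembly, it returns 40h on success and 0 on failure):

```c
/* Hypothetical helper mirroring the assembly routine: inspect the wide
   string one BYTE at a time. Assumes UTF-16LE, so "v" (U+0076) is stored
   as the byte 0x76 followed by the byte 0x00. */
unsigned starts_with_v_bytes(const unsigned char *bytes)
{
    if (bytes[0] != 0x76)   /* low byte of the first character */
        return 0;
    if (bytes[1] != 0x00)   /* high byte of the first character */
        return 0;
    return 0x40;            /* same success code as the assembly */
}
```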

jj2007

It works, but why so complicated?

include \masm32\include\masm32rt.inc
__UNICODE__=1
.code
start:
  mov esi, chr$("v4.0.30319") ; AlphaWideStr A
  .if dword ptr [esi]==340076h
print esi, " is version 4", 13, 10
  .elseif word ptr [esi]==76h
print esi, " is another version", 13, 10
  .else
print esi, " is not version 4", 13, 10
  .endif

  mov esi, chr$("v5.0.30319") ; AlphaWideStr B
  .if dword ptr [esi]==340076h
print esi, " is version 4", 13, 10
  .elseif word ptr [esi]==76h
print esi, " is another version", 13, 10
  .else
print esi, " is not version 4", 13, 10
  .endif

  mov esi, chr$("X4.0.30319") ; AlphaWideStr C
  .if dword ptr [esi]==340076h
inkey esi, " is version 4"
  .elseif word ptr [esi]==76h
inkey esi, " is another version"
  .else
inkey esi, " is not version 4"
  .endif

  exit

end start


Quote from: Zen
     mov ebx, [esi]     ;     Copy the first 4 bytes of the wide string into the register.
     SHR ebx, 16    ;    Shift to right by 16 bits, leaving the first 2 bytes of the string characters in the register.
     CMP bx, 7600h    ;    76h is "v". In unicode, the two bytes should look like this: 76 00


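That shr is wrong: x86 is little-endian, i.e. the first byte in memory (76h) lands in bl and the second (00h) in bh, so after mov ebx, [esi] the first character is already in bx as 0076h. The shr throws it away and leaves the second character, "4" (0034h), in bx. Drop the shr, and cmp bx, 76h should work.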
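
The effect described above can be reproduced in C. This minimal sketch assembles four bytes the way an x86 mov ebx, [esi] loads them (the byte at the lowest address goes into the lowest 8 bits), independently of the host CPU; load_le32 is a hypothetical helper name.

```c
#include <stdint.h>

/* Mimic an x86 "mov ebx, [esi]": the byte at the lowest address ends up
   in bits 0-7 (bl), the next one in bits 8-15 (bh), and so on. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the bytes 76 00 34 00 ("v4"), the result is 00340076h, the same constant jj2007 compares against: the low word (bx) is 0076h, shifting right by 16 leaves 0034h (the "4"), and shifting right by 24 leaves 0. That is why neither variant in the question could ever match "v".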

Zen

JOCHEN,
The function will evolve. Eventually, I must compare two or three (or more) of these version strings to determine which wide string represents the most recent .NET Framework version installed on the user's computer (and, it must be version four, or greater). The strings were returned from ICLRMetaHost.EnumerateInstalledRuntimes, then, IEnumUnknown.Next, and, finally, ICLRRuntimeInfo.GetVersionString.

Quote from: JOCHEN
That shr is wrong: x86 is little-endian, i.e. the two bytes are already in bx: cmp bx, 76h should work.

AH... HAH... yes, that's the answer. THANKS. That little-endian/big-endian stuff always drove me insane (they must have invented it just to destroy our brains).

mabdelouahab

Quote from: Zen on June 02, 2016, 10:31:12 AM
... I must compare two or three (or more) of these version strings to determine which wide string represents the most recent .NET Framework version installed on the user's computer (and, it must be version four, or greater). The strings were returned from ICLRMetaHost.EnumerateInstalledRuntimes, then, IEnumUnknown.Next, and, finally, ICLRRuntimeInfo.GetVersionString.

Try crt_wcsncmp:

invoke crt_wcsncmp, chr$("v4.0.30319"), chr$("v1.0.3705"), 10   ; compare at most 10 wide characters
.IF sdword ptr eax > 0                                           ; positive: the first string sorts after the second
invoke crt_wprintf,cfm$("\n is: v4.0.30319")
.ELSE
invoke crt_wprintf,cfm$("\n is: v1.0.3705")
.ENDIF
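
wcsncmp compares character by character, so "v4.0.30319" sorts after "v1.0.3705" as expected, but lexicographic and numeric order agree only while corresponding fields have the same number of digits: a hypothetical "v10.0" would sort before "v4.0", and a build "4000" would sort after "30319". If that ever matters, the dotted fields can be compared as numbers instead. The C sketch below assumes every string is an optional "v" followed by decimal fields separated by non-digit characters; the name compare_clr_versions is hypothetical.

```c
#include <uchar.h>

/* Hypothetical helper: compare two version strings such as "v4.0.30319"
   numerically, one dot-separated field at a time. Returns a negative
   value, 0 or a positive value, like wcsncmp. Any non-digit character
   after the optional leading 'v' is treated as a field separator. */
int compare_clr_versions(const char16_t *a, const char16_t *b)
{
    if (*a == u'v') a++;                      /* skip the leading "v" */
    if (*b == u'v') b++;
    while (*a || *b) {
        unsigned long na = 0, nb = 0;
        while (*a >= u'0' && *a <= u'9')      /* read one numeric field of a */
            na = na * 10 + (unsigned long)(*a++ - u'0');
        while (*b >= u'0' && *b <= u'9')      /* ...and the matching field of b */
            nb = nb * 10 + (unsigned long)(*b++ - u'0');
        if (na != nb)
            return na < nb ? -1 : 1;
        if (*a) a++;                          /* step over the separator */
        if (*b) b++;
    }
    return 0;                                 /* a missing field counts as 0 */
}
```

For instance, compare_clr_versions(u"v4.0.30319", u"v1.0.3705") is positive, matching the wcsncmp result, and compare_clr_versions(u"v10.0", u"v4.0") is positive where wcsncmp would return a negative value.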

mineiro

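I'm not sure, but I suppose a symbol is not always 2 bytes: in UTF-16, characters outside the Basic Multilingual Plane take a surrogate pair (4 bytes), and in UTF-8 a symbol can take from 1 to 4 bytes.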

include \masm32\include\masm32rt.inc

.data
widechar db "%ws",00h
AlphaWideStr db 76h,00h,34h,00h,2Eh,00h,30h,00h,2Eh,00h,33h,00h,30h,00h,33h,00h,31h,00h,39h,00h,00h,00h
buffer db 120 dup (0)

.data?
szansi dd ?
houtput dd ?
temp dd ?

.code
start:

invoke GetStdHandle,STD_OUTPUT_HANDLE
mov houtput,eax

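invoke WideCharToMultiByte,CP_UTF8,0,addr AlphaWideStr,-1,0,0,0,0        ; required size in bytes, including the terminating zero
mov szansi,eax
invoke WideCharToMultiByte,CP_UTF8,0,addr AlphaWideStr,-1,addr buffer,eax,0,0
dec szansi                                                                ; don't write the terminating zero to the console

invoke WriteFile,houtput,addr buffer,szansi,addr temp,0                   ; bytes-written pointer may not be NULL when lpOverlapped is NULL

invoke wsprintf,addr buffer,addr widechar,addr AlphaWideStr               ; %ws formats the wide string as ANSI text
invoke WriteFile,houtput,addr buffer,eax,addr temp,0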

    inkey
    exit

end start
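
As a hedged illustration of what a UTF-16 to UTF-8 conversion such as WideCharToMultiByte with CP_UTF8 involves, here is a simplified C sketch. It is not the Windows implementation; the name utf16_to_utf8 and the choice to replace unpaired surrogates with U+FFFD are assumptions. It also shows where the sizes come from: "v" needs 1 byte, "€" (U+20AC) needs 3, and a character outside the Basic Multilingual Plane, stored in UTF-16 as a 4-byte surrogate pair, needs 4.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: encode a zero-terminated UTF-16 string as UTF-8.
   Writes at most cap bytes (including a terminating zero) to dst and
   returns the number of bytes written before the terminator, or
   (size_t)-1 if dst is too small. Unpaired surrogates become U+FFFD. */
size_t utf16_to_utf8(const char16_t *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;
    while (*src) {
        unsigned long cp = *src++;
        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF) {
            /* high + low surrogate: one character stored in 4 bytes of UTF-16 */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (unsigned long)(*src++ - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                      /* lone surrogate: replacement char */
        }

        unsigned char out[4];
        size_t len;
        if (cp < 0x80) {                      /* ASCII, e.g. "v": 1 byte */
            out[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {              /* 2 bytes */
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {            /* rest of the BMP: 3 bytes */
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {                              /* outside the BMP: 4 bytes */
            out[0] = (unsigned char)(0xF0 | (cp >> 18));
            out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }

        if (n + len + 1 > cap)                /* keep room for the terminator */
            return (size_t)-1;
        for (size_t i = 0; i < len; i++)
            dst[n++] = out[i];
    }
    if (n + 1 > cap)
        return (size_t)-1;
    dst[n] = 0;
    return n;
}
```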

hutch--

Zen,

Have a look at the Unicode library modules in the masm32 library; their names all start with "uc" in the file list. Unicode is not difficult to work with: you just read a WORD at a time rather than a BYTE, and instead of incrementing the position by 1 byte at a time, you add 2.
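
In C, that advice amounts to walking a char16_t pointer: each p++ moves one UTF-16 code unit (2 bytes on common platforms), just like adding 2 to esi. This minimal sketch checks the leading "v" and reads the major version, which covers the "version four or greater" requirement; the name clr_major_version is hypothetical and only ASCII digits are recognised.

```c
#include <uchar.h>

/* Hypothetical helper: return the major version of a string such as
   "v4.0.30319", or -1 if it does not start with 'v' followed by a digit.
   Each p++ advances one UTF-16 code unit (one WORD). */
long clr_major_version(const char16_t *p)
{
    long major = 0;
    if (*p++ != u'v')                 /* first WORD must be "v" (0076h) */
        return -1;
    if (*p < u'0' || *p > u'9')       /* need at least one digit */
        return -1;
    while (*p >= u'0' && *p <= u'9')  /* accumulate digits up to '.' or the end */
        major = major * 10 + (*p++ - u'0');
    return major;
}
```

For example, clr_major_version(u"v4.0.30319") >= 4 is true, while u"v2.0.50727" gives 2 and u"X4.0.30319" gives -1.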

Zen

MABDELOUAHAB and HUTCH,
Excellent suggestions from both of you. THANKS.

For those MASM Forum members who are novices (or, like me, oblivious to reality), I've found a number of useful webpages that explain the terms big endian and little endian. It's really very simple; I just wasn't thinking when I posted my original question.
This is the best, lengthiest, and clearest explanation: Understanding Big and Little Endian Byte Order.
Here is the official Microsoft explanation: Explanation of Big Endian and Little Endian Architecture, Microsoft Support
Here is the exhaustive Wikipedia page: Endianness
...And, here is the explanation from an Assembly language tutorial: Big Endian and Little Endian