wchar declaration macro

Started by Queue, July 31, 2017, 04:07:25 AM

Queue

First post. I'd existed briefly 7 years ago on the older masmforum but didn't contribute much. I've mainly just been a lurker for a decade.

Anyway, I know of a few existing options for declaring unicode / wchar strings, but I either didn't like their syntax or their need for escape characters, so this is what I came up with:
_T macro _:VARARG
    _T_out textequ @CatStr(<>)
    _T_len = @SizeStr(<_>)
    _T_pos = 1
    _T_int = 0
    while _T_pos le _T_len
        if _T_int
            if @InStr(_T_pos,<_>,<!">) eq _T_pos
                if _T_int gt 0
                    if @InStr(_T_pos,<_>,<"">) eq _T_pos
                        _T_out catstr _T_out, <,'>, @SubStr(<_>,_T_pos,1), <'>
                        _T_pos = _T_pos + 1
                    else
                        _T_int = 0
                    endif
                else
                    _T_out catstr _T_out, <,'>, @SubStr(<_>,_T_pos,1), <'>
                endif
            elseif @InStr(_T_pos,<_>,<!'>) eq _T_pos
                if _T_int lt 0
                    if @InStr(_T_pos,<_>,<''>) eq _T_pos
                        _T_out catstr _T_out, <,">, @SubStr(<_>,_T_pos,1), <">
                        _T_pos = _T_pos + 1
                    else
                        _T_int = 0
                    endif
                else
                    _T_out catstr _T_out, <,">, @SubStr(<_>,_T_pos,1), <">
                endif
            else
                _T_out catstr _T_out, <,">, @SubStr(<_>,_T_pos,1), <">
            endif
            _T_pos = _T_pos + 1
        else
            if @InStr(_T_pos,<_>,<!">) eq _T_pos
                _T_int = 1
            elseif @InStr(_T_pos,<_>,<!'>) eq _T_pos
                _T_int = -1
            elseif @InStr(_T_pos,<_>,< >) eq _T_pos
            elseif @InStr(_T_pos,<_>,< >) eq _T_pos
            elseif @InStr(_T_pos,<_>,<,>) eq _T_pos
            elseif @InStr(_T_pos,<_>,<,>)
                _T_int = @InStr(_T_pos,<_>,<,>)
                _T_out catstr _T_out, <,>, @SubStr(<_>,_T_pos,_T_int-_T_pos)
                _T_pos = _T_int
                _T_int = 0
            else
                _T_out catstr _T_out, <,>, @SubStr(<_>,_T_pos)
                _T_pos = _T_len
            endif
            _T_pos = _T_pos + 1
        endif
    endm
    _T_out substr _T_out, 2
    exitm <_T_out>
endm
L textequ <_T(>


It's used like:
wsComSpec dw L"%ComSpec%",0)
xml dw L'<?xml version="1.0"?>',0)
wsClassName dw L"_",%PROJECT,"_",0)

Note the closing parenthesis at the end of each line and the L glued on before the string. If you add the L without the closing parenthesis, MASM will yell at you, and if you remove the L but leave the closing parenthesis, MASM will yell at you too. No escape characters are needed; the behavior is roughly the same as MASM's native support for ASCII strings (doubled quotation marks / apostrophes are supported). You could use this macro with BYTE or DWORD strings as well; the macro simply splits the quoted strings apart into individually quoted letters. Text equates need to be prefixed with %.
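
To make the splitting behaviour concrete, here is a rough Python model of the macro's state machine. It is only a sketch based on a reading of the macro above, not something MASM runs. The name split_quoted is made up for illustration, and treating tabs as well as spaces as skippable between items is an assumption.

```python
def split_quoted(args):
    """Model of the _T macro: split quoted strings into individually quoted characters.

    state mirrors _T_int: 0 = outside quotes, 1 = inside "...", -1 = inside '...'.
    """
    out = []
    pos = 0
    state = 0
    n = len(args)
    while pos < n:
        ch = args[pos]
        if state:
            if ch == '"':
                if state > 0:
                    if args.startswith('""', pos):
                        out.append("'\"'")   # doubled "" inside "..." is one literal "
                        pos += 1
                    else:
                        state = 0            # closing quotation mark
                else:
                    out.append("'\"'")       # a " inside '...' is an ordinary character
            elif ch == "'":
                if state < 0:
                    if args.startswith("''", pos):
                        out.append('"\'"')   # doubled '' inside '...' is one literal '
                        pos += 1
                    else:
                        state = 0            # closing apostrophe
                else:
                    out.append('"\'"')       # a ' inside "..." is an ordinary character
            else:
                out.append('"' + ch + '"')
            pos += 1
        else:
            if ch == '"':
                state = 1
            elif ch == "'":
                state = -1
            elif ch in " \t,":
                pass                         # separators between items are skipped
            else:
                comma = args.find(",", pos)
                if comma == -1:
                    out.append(args[pos:])   # last unquoted item, e.g. the trailing 0
                    pos = n - 1
                else:
                    out.append(args[pos:comma])
                    pos = comma
            pos += 1
    return ",".join(out)


print(split_quoted('"%ComSpec%",0'))
```

With dw in front of the result, each individually quoted letter becomes one 16-bit word, which is why the same macro would also work after BYTE or DWORD.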

I've been using this for a while, but wanted to post it here in case it can be refined further, or in case someone would benefit from using it.

Queue

felipe

Welcome to the forum!
I like to look at the old forum's archive sometimes.    :t
And thanks for your code, maybe I will try it someday.  :icon14:

jj2007

Welcome back to the forum, Queue :icon14:

Here is a full example:
include \masm32\include\masm32rt.inc

_T macro _:VARARG
    _T_out textequ @CatStr(<>)
    _T_len = @SizeStr(<_>)
    _T_pos = 1
    _T_int = 0
    while _T_pos le _T_len
        if _T_int
            if @InStr(_T_pos,<_>,<!">) eq _T_pos
                if _T_int gt 0
                    if @InStr(_T_pos,<_>,<"">) eq _T_pos
                        _T_out catstr _T_out, <,'>, @SubStr(<_>,_T_pos,1), <'>
                        _T_pos = _T_pos + 1
                    else
                        _T_int = 0
                    endif
                else
                    _T_out catstr _T_out, <,'>, @SubStr(<_>,_T_pos,1), <'>
                endif
            elseif @InStr(_T_pos,<_>,<!'>) eq _T_pos
                if _T_int lt 0
                    if @InStr(_T_pos,<_>,<''>) eq _T_pos
                        _T_out catstr _T_out, <,">, @SubStr(<_>,_T_pos,1), <">
                        _T_pos = _T_pos + 1
                    else
                        _T_int = 0
                    endif
                else
                    _T_out catstr _T_out, <,">, @SubStr(<_>,_T_pos,1), <">
                endif
            else
                _T_out catstr _T_out, <,">, @SubStr(<_>,_T_pos,1), <">
            endif
            _T_pos = _T_pos + 1
        else
            if @InStr(_T_pos,<_>,<!">) eq _T_pos
                _T_int = 1
            elseif @InStr(_T_pos,<_>,<!'>) eq _T_pos
                _T_int = -1
            elseif @InStr(_T_pos,<_>,< >) eq _T_pos
            elseif @InStr(_T_pos,<_>,< >) eq _T_pos
            elseif @InStr(_T_pos,<_>,<,>) eq _T_pos
            elseif @InStr(_T_pos,<_>,<,>)
                _T_int = @InStr(_T_pos,<_>,<,>)
                _T_out catstr _T_out, <,>, @SubStr(<_>,_T_pos,_T_int-_T_pos)
                _T_pos = _T_int
                _T_int = 0
            else
                _T_out catstr _T_out, <,>, @SubStr(<_>,_T_pos)
                _T_pos = _T_len
            endif
            _T_pos = _T_pos + 1
        endif
    endm
    _T_out substr _T_out, 2
    exitm <_T_out>
endm
L textequ <_T(>

.data
wsComSpec dw L"%ComSpec%",0)
xml dw L'<?xml version="1.0"?>',0)
wsClassName dw L"_",%PROJECT,"_",0)

.code
start:
  invoke MessageBoxW, 0, offset wsComSpec, offset xml, MB_OK
  exit

end start

aw27

Hello Queue,

In reality there are no macros that produce Unicode strings; what you find are macros that replace ASCII characters with their UTF16 counterparts, which just means appending a zero byte to each ASCII char.

Try to use that macro with strings like this and you will see what you get (garbage):

L"昨天上午", L"三分钟"
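
Why this comes out as garbage can be sketched in a few lines of Python. The sketch assumes the source file is saved as UTF8 and that the assembler hands the macro one byte per quoted character; both are assumptions, and the helper names are hypothetical.

```python
def widen_bytes(src):
    """Model of per-character quoting: every source byte becomes one 16-bit word."""
    return [b for b in src]


def real_utf16(text):
    """The UTF-16 code units the string should actually contain."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# Plain ASCII survives: the byte 43h is also the UTF-16 code unit 0043h.
print(widen_bytes("C".encode("utf-8")), real_utf16("C"))

# A CJK character is three UTF-8 bytes but a single UTF-16 code unit,
# so widening the bytes one by one yields three unrelated words.
print([hex(w) for w in widen_bytes("昨".encode("utf-8"))],
      [hex(u) for u in real_utf16("昨")])
```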

The trick to producing real UTF16 unicode, the way compilers like Visual Studio do, is:
1) Save the source in UTF8. VS actually alerts you to that when it sees non-local characters in the source.
2) At compile time, it translates all UTF8 to UTF16 and places it in the DATA section.
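
The two steps above can be sketched in Python as a tiny preprocessing helper that turns UTF8 source text into a dw line an assembler could place in the data section. This is only an illustration of the idea, not how Visual Studio or any assembler implements it; the name utf8_to_dw and the output layout are made up.

```python
def utf8_to_dw(src):
    """Translate UTF-8 bytes into a MASM-style dw line of UTF-16 code units."""
    text = src.decode("utf-8")                 # step 1: the source is UTF-8
    data = text.encode("utf-16-le")            # step 2: translate to UTF-16
    units = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]
    # A leading 0 keeps each hex literal valid even when it starts with A-F.
    return "dw " + ",".join("0%04Xh" % u for u in units) + ",0"


print(utf8_to_dw("昨天上午".encode("utf-8")))
print(utf8_to_dw("三分钟".encode("utf-8")))
```

Characters outside the Basic Multilingual Plane come out as two code units (a surrogate pair), since that is what the UTF-16 encoder produces.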

There are no miracles and no way to produce omelettes without eggs.

So far no assembler does that, but I hope the guys who develop UASM may do it someday.

nidud

#4
deleted

aw27

Quote from: nidud on July 31, 2017, 09:35:57 PM
:biggrin:

This is how Microsoft, GCC, and others implemented the use of TCHAR.

Let me see:

Source code saved as UTF8:

MessageBox(NULL, L"昨天上午", L"三分钟", 0);

Disassembles to:

011616C2  push        offset string L"\x4e09\x5206\x949f" (01166B30h) 
011616C7  push        offset string L"\x6628\x5929\x4e0a\x5348" (01166BDCh) 
011616CC  push        0 
011616CE  call        dword ptr [__imp__MessageBoxW@16 (01169098h)] 

:bgrin:

nidud

#6
deleted

aw27

It does not matter whether people pronounce potatoes or potatos, in the end most people will write potatoes. But beware of people who pronounce potatoes and write potatos just because the plural of burrito is burritos :bgrin:

jj2007

include \Masm32\MasmBasic\Res\JBasic.inc        ; OPT_64 1
Init
  wMsgBox 0, "كثير من الناس ما زالوا يعتقدون أن الأرض مسطحة.", "Assembled with ML64:", MB_OK
EndOfCode

TWell

Quote from: nidud on July 31, 2017, 10:07:49 PM
:biggrin:

Quote from: nidud on July 31, 2017, 09:35:57 PM
Yes, garbage in garbage out.

However, this only applies to people without a keyboard, so it is strictly not a programming issue.

I can copy & paste with a keyboard too ;)
But compilers usually just read a file.

nidud

#10
deleted

TWell

Quote from: nidud on July 31, 2017, 11:17:56 PM

Quote from: TWell on July 31, 2017, 11:10:48 PM
I can copy & paste with a keyboard too ;)
But compilers usually just read a file.

Yes, international code points do need some handling, but the point (still) is that this is not a problem for your native language, whatever that is.

You mean the PC's current language, since my native language doesn't determine what I am programming or for which language.
And the Windows virtual keyboard supports quite a few languages.

aw27

Quote from: jj2007 on July 31, 2017, 10:46:15 PM
include \Masm32\MasmBasic\Res\JBasic.inc        ; OPT_64 1
Init
  wMsgBox 0, "كثير من الناس ما زالوا يعتقدون أن الأرض مسطحة.", "Assembled with ML64:", MB_OK
EndOfCode


Behind the scenes you get a pointer to MultiByteToWideChar through GetProcAddress.
You are so smart.  :lol:

nidud

#13
deleted

jj2007

Quote from: aw27 on August 01, 2017, 12:49:30 AM
Behind the scenes you get a pointer to MultiByteToWideChar through GetProcAddress.

Exactly :t

But I see you have similar ideas:
Quote from: aw27 on July 31, 2017, 07:01:40 PM
The trick to producing real UTF16 unicode, the way compilers like Visual Studio do, is:
1) Save the source in UTF8. VS actually alerts you to that when it sees non-local characters in the source.
2) At compile time, it translates all UTF8 to UTF16 and places it in the DATA section.

I tried that some years ago, but decided on the runtime solution because it produces smaller executables in most cases.