wchar declaration macro

nidud · August 01, 2017, 01:20:33 AM

deleted

aw27 · August 01, 2017, 01:22:48 AM

Quote from: jj2007 on August 01, 2017, 01:08:52 AM
I tried that some years ago, but decided for the runtime solution because it produces smaller executables.

I meant that it is a job for an assembler to do at assembly time, not for a macro supported by an undocumented library to do at runtime. I believe the UASM team can do it, and it would be much more useful than other things they have been spending time on that nobody really asked for, like the OOP stuff.

TWell · August 01, 2017, 01:45:38 AM

Quote from: jj2007 on August 01, 2017, 01:08:52 AM
I tried that some years ago, but decided for the runtime solution because it produces smaller executables.

It means conversion code size vs string length ;)

jj2007 · August 01, 2017, 04:21:08 AM

Quote from: nidud on August 01, 2017, 01:20:33 AM
If the compiler or assembler do the conversion no code is added at all.

Right, no code. But it creates WORDs in the initialised data section where, in the case of the Latin alphabet, BYTEs would be enough.

Quote from: TWell on August 01, 2017, 01:45:38 AM
It means conversion code size vs string length ;)

Exactly :t

Passing a UTF8 string to the converter costs 11 bytes of code, versus the 5 bytes that a mov offset xx would take, so the runtime version costs 6 extra bytes per string. The assembly-time version instead costs one extra byte per character in the data section. So if, on average, strings are longer than 6 bytes including the zero terminator, and you have many strings, executable size will shrink. Not that it usually matters, but that is the logic. Not a good one, btw, for a library designed for China.

Code Select

include \masm32\MasmBasic\MasmBasic.inc

.data
txTest	db "Test", 0
txTestW	db "T", 0, "e", 0, "s", 0, "t", 0, 0

  Init

asstime_s:
  mov ecx, offset txTestW
asstime_endp:
  wPrintLine ecx

runtime_s:
  xchg ecx, uChr$("Test")
runtime_endp:
  wPrintLine ecx

CodeSize asstime
CodeSize runtime

EndOfCode

Code Select

Test
Test
5       bytes for asstime
11      bytes for runtime
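
The break-even point can be checked with a few lines of arithmetic. This is only a sketch in C, not part of MasmBasic: the function names are made up for illustration, the 5 and 11 byte figures are simply the CodeSize results shown above, and it assumes Latin text where one byte per character is enough. The converter routine itself is shared by all strings and is not counted.

```c
#include <stddef.h>

/* Per-string code cost in bytes, taken from the CodeSize output above. */
enum { ASSTIME_CODE = 5, RUNTIME_CODE = 11 };

/* Assembly-time conversion: the short mov, plus every character
   (terminator included) stored as a 2-byte WORD.
   n is the string length in characters, including the zero terminator. */
size_t asstime_cost(size_t n)
{
    return ASSTIME_CODE + 2 * n;
}

/* Runtime conversion: the longer call sequence, plus one BYTE per character. */
size_t runtime_cost(size_t n)
{
    return RUNTIME_CODE + n;
}
```

For "Test" (n = 5) this gives 15 bytes at assembly time against 16 at runtime; the two are equal at n = 6, and the runtime version is smaller for anything longer.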

nidud · August 01, 2017, 04:59:05 AM

deleted

aw27 · August 01, 2017, 05:02:17 AM

JJ, you are forgetting one small detail - the Operating System is all Unicode. Along the way, every ANSI function call gets converted to a Unicode call. Using the ANSI variants of Windows API functions just makes your code slower.
And who cares about counting the bytes used by the Unicode strings?

jj2007 · August 01, 2017, 05:16:05 AM

Quote from: nidud on August 01, 2017, 04:59:05 AM
and Norwegian too of cource. You do understand that right?

of cource

nidud · August 01, 2017, 05:36:38 AM

deleted

Queue · August 01, 2017, 06:54:03 AM

As someone whose native language is expressible entirely in ASCII, I generally find unicode to just be a pain in the butt. I wanted to make using the "Wide" versions of API calls easier so that I'd have no reason not to. Any existing macros I'd seen involved changing syntax. Variable declarations went from:

Code Select

szMessage db "Hello!",0

to something arguably quite different (the variable name is no longer the first value on the line) like:

Code Select

utf_16 wszMessage "Hello\x",0

and that was annoying to switch back and forth between if I wanted to compare A and W behavior (to make sure there wasn't a mistake in the conversion, for example).

So this macro isn't meant to support non-ascii characters; it specifically just lets you use ascii text where you're going to be feeding it into a unicode function.

Another macro I use, when I want to avoid the wasted space of unicode (bear with me, I just mean ascii wastefully padded out to two bytes per character, not real unicode), expands an ascii string at runtime. Rather than using something heavy that might misinterpret simple ascii strings based on codepage, I just use a macro that emits a simple loop to expand a string I had predefined.

Code Select


_copysz macro dst:REQ, src:REQ, cfg:REQ, _:VARARG
	local	src_type, dst_type, tmp_regs, tmp_regd
	if @SizeStr(<dst>) ne 3 or @SizeStr(<src>) ne 3 or @SizeStr(<cfg>) ne 3 or @SizeStr(_)
		.err <bad macro argument(s)>
		exitm
	endif
	;; pick byte, word, dword or qword out of the 5-character-wide table, indexed by the width letter
	src_type substr <byte word dwordqword>, @InStr(1,<bwdq>,@SubStr(<cfg>,2,1)) * 5 - 4, 5
	dst_type substr <byte word dwordqword>, @InStr(1,<bwdq>,@SubStr(<cfg>,1,1)) * 5 - 4, 5
	tmp_regs substr <cfg>, 3, 1
	if sizeof dst_type gt sizeof src_type
		if sizeof dst_type eq sizeof word
			mov	@CatStr(%tmp_regs,<h,0>)
		else
			xor	@CatStr(<e>,%tmp_regs,<x,e>,%tmp_regs,<x>)
		endif
	endif
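	;; build a lookup string of the scratch register's names (for letter a: al, ax, eax, rax),
	;; then slice out the names matching the source and destination widths
	tmp_regd catstr tmp_regs, <l>, tmp_regs, <xe>, tmp_regs, <x  r>, tmp_regs, <x>
	tmp_regs substr tmp_regd, sizeof src_type / 2 * 2 + 1, sizeof src_type / 4 + 2
	tmp_regd substr tmp_regd, sizeof dst_type / 2 * 2 + 1, sizeof dst_type / 4 + 2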
	.while 1
		mov	tmp_regs, src_type ptr [src]
		mov	dst_type ptr [dst], tmp_regd
	.break .if !(tmp_regs & tmp_regs)
		repeat sizeof src_type
			inc	src
		endm
		repeat sizeof dst_type
			inc	dst
		endm
	.endw
endm

which is used like:

Code Select


mov	ecx, offset szXml
mov	edx, offset xBuffer
_copysz	edx, ecx, wba

It doesn't matter that it's doing an "improper" conversion, since you'd only use it on ascii strings you control and can verify only contain ascii that will convert to unicode by simply slapping on a zero byte. You can also minimally size the buffer since you already know the input string length (if you're not also appending some unicode you don't strictly control). The resulting code advances each pointer to its terminating null, making it easy to use successively to append multiple strings together. A sacrificial register is specified with the third letter of the third argument.

Code Select

_copysz dst, src, cfg
"dst" is a pointer to the destination buffer loaded in a register,
"src" is a pointer to the source string loaded in a register, and
"cfg" is a short configuration description for the macro,
first letter is width of destination string (b,w,d or q for byte, word, dword or qword),
second letter is width of source string (also b,w,d or q) and
third letter is the sacrificial register, a for eax, b for ebx, c for ecx or d for edx (it has to be one of those 4 because it will need access to 8-bit and 16-bit sized sub-registers).
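
For readers who don't want to trace the macro, the loop that `_copysz edx, ecx, wba` expands to can be sketched roughly in C. This is an approximation for illustration only, not something generated from the macro: the function name `copysz_wb` and the pointer-to-pointer signature are assumptions standing in for the two registers, and it covers only the byte-to-word case.

```c
#include <stdint.h>

/* Widen a zero-terminated ASCII string to 16-bit units by zero-extension.
   *src and *dst play the role of the two pointer registers. On return both
   point AT their terminators (not past them), so a second call appends. */
void copysz_wb(uint16_t **dst, const uint8_t **src)
{
    for (;;) {
        uint16_t unit = **src;  /* high byte stays zero, like clearing ah first */
        **dst = unit;           /* store before testing, so the terminator is copied too */
        if (unit == 0)
            break;
        ++*src;                 /* advance by one source element */
        ++*dst;                 /* advance by one destination element */
    }
}
```

Calling it twice with the same destination pointer and two different sources leaves the second string appended directly after the first, because the first call stops on its terminator and the second call overwrites it.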

My point is, if you want programmers to use unicode so that all languages can be supported, when they themselves might not be hindered by ascii, you want it to be painless for them.

Queue

Queue · August 01, 2017, 10:16:08 AM

Here's it all bundled up as a working example:

Code Select


include \masm32\include\masm32rt.inc

; wchar string splitter
_T macro _T_str:VARARG
	_T_out textequ @CatStr(<>)
	_T_len = @SizeStr(<_T_str>)
	_T_pos = 1
	_T_int = 0
	while _T_pos le _T_len
		if _T_int
			if @InStr(_T_pos,<_T_str>,<!">) eq _T_pos
				if _T_int gt 0
					if @InStr(_T_pos,<_T_str>,<"">) eq _T_pos
						_T_out catstr _T_out, <,'>, @SubStr(<_T_str>,_T_pos,1), <'>
						_T_pos = _T_pos + 1
					else
						_T_int = 0
					endif
				else
					_T_out catstr _T_out, <,'>, @SubStr(<_T_str>,_T_pos,1), <'>
				endif
			elseif @InStr(_T_pos,<_T_str>,<!'>) eq _T_pos
				if _T_int lt 0
					if @InStr(_T_pos,<_T_str>,<''>) eq _T_pos
						_T_out catstr _T_out, <,">, @SubStr(<_T_str>,_T_pos,1), <">
						_T_pos = _T_pos + 1
					else
						_T_int = 0
					endif
				else
					_T_out catstr _T_out, <,">, @SubStr(<_T_str>,_T_pos,1), <">
				endif
			else
				_T_out catstr _T_out, <,">, @SubStr(<_T_str>,_T_pos,1), <">
			endif
			_T_pos = _T_pos + 1
		else
			if @InStr(_T_pos,<_T_str>,<!">) eq _T_pos
				_T_int = 1
			elseif @InStr(_T_pos,<_T_str>,<!'>) eq _T_pos
				_T_int = -1
			elseif @InStr(_T_pos,<_T_str>,< >) eq _T_pos
			elseif @InStr(_T_pos,<_T_str>,<	>) eq _T_pos
			elseif @InStr(_T_pos,<_T_str>,<,>) eq _T_pos
			elseif @InStr(_T_pos,<_T_str>,<,>)
				_T_int = @InStr(_T_pos,<_T_str>,<,>)
				_T_out catstr _T_out, <,>, @SubStr(<_T_str>,_T_pos,_T_int-_T_pos)
				_T_pos = _T_int
				_T_int = 0
			else
				_T_out catstr _T_out, <,>, @SubStr(<_T_str>,_T_pos)
				_T_pos = _T_len
			endif
			_T_pos = _T_pos + 1
		endif
	endm
	_T_out substr _T_out, 2
	exitm <_T_out>
endm
L textequ <_T(>

; null-terminated string copy
_copysz macro dst:REQ, src:REQ, cfg:REQ, _:VARARG
	local	src_type, dst_type, tmp_regs, tmp_regd
	if @SizeStr(<dst>) ne 3 or @SizeStr(<src>) ne 3 or @SizeStr(<cfg>) ne 3 or @SizeStr(_)
		.err <bad macro argument(s)>
		exitm
	endif
	src_type substr <byte word dwordqword>, @InStr(1,<bwdq>,@SubStr(<cfg>,2,1)) * 5 - 4, 5
	dst_type substr <byte word dwordqword>, @InStr(1,<bwdq>,@SubStr(<cfg>,1,1)) * 5 - 4, 5
	tmp_regs substr <cfg>, 3, 1
	if sizeof dst_type gt sizeof src_type
		if sizeof dst_type eq sizeof word
			mov	@CatStr(%tmp_regs,<h,0>)
		else
			xor	@CatStr(<e>,%tmp_regs,<x,e>,%tmp_regs,<x>)
		endif
	endif
	tmp_regd catstr tmp_regs, <l>, tmp_regs, <xe>, tmp_regs, <x  r>, tmp_regs, <x>
	tmp_regs substr tmp_regd, sizeof src_type / 2 * 2 + 1, sizeof src_type / 4 + 2
	tmp_regd substr tmp_regd, sizeof dst_type / 2 * 2 + 1, sizeof dst_type / 4 + 2
	.while 1
		mov	tmp_regs, src_type ptr [src]
		mov	dst_type ptr [dst], tmp_regd
	.break .if !(tmp_regs & tmp_regs)
		repeat sizeof src_type
			inc	src
		endm
		repeat sizeof dst_type
			inc	dst
		endm
	.endw
endm

PROJECT	textequ <"Project Name Equate">

.data

wsTitle	dw L":: ",%PROJECT," :: wchar",0)
 sTitle	db  ":: ", PROJECT," ::  char",0
even
wsTest1	dw L"!h%e&l(l)o	!h{e!<l!!\!l}!o!",0)
 sTest1	db  "!h%e&l(l)o	!h{e!<l!!\!l}!o!",0
even
wsTest2	dw L'!@#$',"%^&*",'()-_',"=+\|",'[]{}',";:,./?	!!",0)
 sTest2	db  '!@#$',"%^&*",'()-_',"=+\|",'[]{}',";:,./?	!!",0
even
wsTest3	dw L"t[]e""s!t3!>			.",0)
 sTest3	db  "t[]e""s!t3!>			.",0
even
wsTest4	dw L"this text converted from wchar to ascii",0)
 sTest4	db  "this text converted from ascii to wchar",0
even
wsTest5	dw L'te!<s''''""t'	, "te!!s''""""t",0)
wsTest6	dw L"te!>st"  ,"te!!!!st!!",0)
wsTest7	dw L'"',"'", 0,1,	5h, 2   ,"$")
wsTest8	dw L"'",'"',	0	, 10h, 13h	,"more")

.data?

align 16
xBuffer	db (sizeof wsTest4) dup(?)
.errnz sizeof wsTest4 - sizeof sTest4 * WORD

.code

EntryPoint:
	invoke	MessageBoxA, NULL, offset  sTest1, offset  sTitle, MB_OK
	invoke	MessageBoxW, NULL, offset wsTest1, offset wsTitle, MB_OK
	invoke	MessageBoxA, NULL, offset  sTest2, offset  sTitle, MB_OK
	invoke	MessageBoxW, NULL, offset wsTest2, offset wsTitle, MB_OK
	invoke	MessageBoxA, NULL, offset  sTest3, offset  sTitle, MB_OK
	invoke	MessageBoxW, NULL, offset wsTest3, offset wsTitle, MB_OK
	mov	ecx, offset wsTest4
	mov	edx, offset xBuffer
	_copysz	edx, ecx, bwa
	invoke	MessageBoxA, NULL, offset xBuffer, offset  sTitle, MB_OK
	mov	ecx, offset  sTest4
	mov	edx, offset xBuffer
	_copysz	edx, ecx, wba
	invoke	MessageBoxW, NULL, offset xBuffer, offset wsTitle, MB_OK
	exit

end EntryPoint

Queue
