Attached a RichMasm beta for testing - use at your own risk.
Main new features:
- open & save files with non-Latin names (an example of an Arabic file is included below)
- search for non-Latin text, e.g. текст
- pass Unicode arguments for testing via OPT_Arg1, as shown below
This requires a recent MasmBasic installation. Extract the attached archives to \Masm32\MasmBasic, then run RichMasmBeta.exe
GuiParas equ "Unicode rocks!!!!" ; in RichMasm, hit F6 to build this application
GuiMenu equ <@File, Open> ; requires MasmBasic (http://masm32.com/board/index.php?topic=94.0)
include \masm32\MasmBasic\Res\MbGui.asm
MakeFont hFont, Height:24
GuiControl MyEdit, "RichEdit", font hFont
wSetWin$ hMyEdit="The commandline passed was"+CrLf$+wCL$()
Event Menu
.if MenuID==0
.if wFileOpen$("Rich source=*.asc|Poor sauce=*.asm|Resource=*.rc")
wSetWin$ hMyEdit=wFileRead$(wFileOpen$())
.endif
.endif
GuiEnd
OPT_Icon Globe ; v v v "Enter text here" in Russian, Chinese and Arabic
OPT_Arg1 Введите текст здесь / 在此输入文字 / أدخل النص هنا
EDIT: Beta removed, the current version (http://masm32.com/board/index.php?topic=94.0) is more up-to-date.
Something you could do is write code in a UNICODE editor and run a process that only reads the code up to the comments. This would allow comments in any language to make the code readable, but would remove them and convert the code to ASCII for assembly/compiling. An assembler de-commenter is simple enough to write so you run the UNICODE through the API to convert it to ASCII, strip the comments and then feed it to the assembler.
nidud's asmc shows how to do it:
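To make the de-commenter idea concrete, here is a minimal sketch in Python (not assembler, just to show the logic). It assumes that comments start with ";" and that strings are quoted with " or '. The function names are made up for illustration, and MASM specifics such as COMMENT blocks or <...> text literals are not handled.

```python
def strip_asm_comment(line: str) -> str:
    """Return the line without its trailing ';' comment.

    A ';' inside a quoted string is kept, because there it is
    part of the string and not a comment marker.
    """
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ";":
            return line[:i].rstrip()
    return line.rstrip()


def decomment(source: str) -> str:
    """Strip the comments, then insist that what is left is plain ASCII."""
    out = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = strip_asm_comment(line)
        if not code.isascii():
            raise ValueError(f"line {number}: non-ASCII text outside a comment")
        out.append(code)
    return "\n".join(out)


if __name__ == "__main__":
    print(decomment('mov eax, 1 ; загрузить единицу\nret ; 返回'))
```

With this approach a non-ASCII character in a string literal raises an error instead of being silently converted, which may or may not be what you want.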
http://masm32.com/board/index.php?topic=6221.msg66308#msg66308
http://masm32.com/board/index.php?topic=5942.msg66207#msg66207
If only a UTF-8 BOM could make /ws=65001 the default when processing a UTF-8 file, there would be one problem less?
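Checking for a BOM is cheap, so a build script could do it before calling the assembler. A small sketch in Python: has_utf8_bom and extra_switches are hypothetical helper names, and the /ws=65001 switch is simply taken from the post above, not verified against any particular asmc version.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def extra_switches(data: bytes) -> list:
    """Hypothetical: add the UTF-8 switch only when the source carries a BOM."""
    return ["/ws=65001"] if has_utf8_bom(data) else []


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + 'szText db "текст", 0'.encode("utf-8")
    print(extra_switches(sample))
```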
Hutch & Tim,
Thanks for your suggestions, I appreciate your interest, really :t
There is one minor problem, though: It works already. Study the examples above, they work absolutely fine and assemble in ML 6.15... no need for more acrobatics ;-)
Under the hood: RichMasm's RichEd20.dll control has always (since 200x?) used Unicode by default. All I had to do is find a way to export UTF-8 text to plain text, and start the build. Actually, the process is a little bit more complicated, but the principle is that simple. And those who believe in purest assembler without any macros can use even ML version 6.14 to process their szText "歡迎", 0 8)
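One way to see why an old assembler can swallow such a string: in UTF-8, every byte of a non-ASCII character is 80h or above, so the text can never contain a quote, a semicolon or any other ASCII character by accident. The Python sketch below only illustrates that property; as_db_line is a made-up helper, and this is not a description of what RichMasm actually does during the build.

```python
def as_db_line(label: str, text: str) -> str:
    """Spell out the UTF-8 bytes of text as a zero-terminated db line."""
    raw = text.encode("utf-8")
    values = ", ".join(f"0{b:02X}h" for b in raw)
    return f"{label} db {values}, 0"


if __name__ == "__main__":
    raw = "歡迎".encode("utf-8")
    # No byte falls into the ASCII range, so none can be a quote or ';'
    print(all(b >= 0x80 for b in raw))
    print(as_db_line("szText", "歡迎"))
```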
Quote from: hutch-- on May 10, 2017, 01:49:58 PM
An assembler de-commenter is simple enough to write so you run the UNICODE through the API to convert it to ASCII, strip the comments and then feed it to the assembler.
I think you need a script writer (even more complex than RichMasm) because you need to prevent the introduction of Unicode characters in code other than strings. Sometimes copying and pasting introduces Unicode characters (especially invisible characters and characters that look like ANSI), and it takes time to find them.
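Finding such stray characters can be automated. A rough sketch in Python, under the same simplifying assumptions as usual (";" starts a comment, strings are quoted with " or '). find_stray_unicode is a hypothetical name; it reports every non-ASCII character that sits outside strings and comments, together with its Unicode name.

```python
import unicodedata


def find_stray_unicode(source: str):
    """Yield (line, column, char, name) for non-ASCII characters that are
    neither inside a quoted string nor inside a ';' comment."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        quote = None  # the quote character we are currently inside, if any
        for col, ch in enumerate(line, start=1):
            if quote:
                if ch == quote:
                    quote = None
                continue
            if ch in "\"'":
                quote = ch
            elif ch == ";":
                break  # the rest of the line is a comment
            elif ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


if __name__ == "__main__":
    # The first line contains a pasted no-break space instead of a blank
    sample = 'mov\u00a0eax, 1 ; текст\nszT db "текст", 0\n'
    for hit in find_stray_unicode(sample):
        print(hit)
```

The Unicode name makes lookalikes easy to spot, e.g. a Cyrillic "а" shows up as CYRILLIC SMALL LETTER A.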
The MasmBasic version of 8 December 2017 (http://masm32.com/board/index.php?topic=94.0) has three new macros for handling UTF-8 strings, uLeft$, uMid$, uRight$:
include \masm32\MasmBasic\MasmBasic.inc
SetGlobals r$="Введите текст здесь" ; "Enter text here" in Russian
SetGlobals c$="在這裡輸入文字" ; "Enter text here" in Chinese
Init
PrintLine "[", r$, "] (original string)"
PrintLine "[", uRight$(r$, 5), "_", uMid$(r$, 9, 5), "_", uLeft$(r$, 7), "] (right_mid_left, fixed)"
PrintLine "[", uRight$(r$, 5), "_", Mid$(r$, Instr_(r$, "текст"), 2*5), "_", uLeft$(r$, 7), "] (right_mid_left, Instr)"
wMsgBox 0, wRec$("["+uLeft$(c$, 5)+"]"), "Chinese, uLeft$(5):", MB_OK
wMsgBox 0, wRec$("["+uRight$(c$, 3)+"]"), "Chinese, uRight$(3):", MB_OK
EndOfCode
Remarks:
- use uLeft$(src, chars) if you know the number of UTF-8 characters needed
- use "normal" Left$() etc if you got the position from Instr_(), which counts bytes; but note that the length must then be given in bytes, too, see 2*5 above
Output:
[Введите текст здесь] (original string)
[здесь_текст_Введите] (right_mid_left, fixed)
[здесь_текст_Введите] (right_mid_left, Instr)
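Where the 2*5 comes from: each Cyrillic letter takes two bytes in UTF-8 (Chinese characters take three), so five letters are ten bytes. The Python sketch below mimics the two approaches with 1-based positions; u_mid, byte_mid and byte_instr are hypothetical stand-ins for illustration, not the MasmBasic implementation.

```python
def u_mid(data: bytes, start: int, count: int) -> bytes:
    """Character-based substring of UTF-8 data, start is 1-based."""
    text = data.decode("utf-8")
    return text[start - 1:start - 1 + count].encode("utf-8")


def byte_mid(data: bytes, start: int, count: int) -> bytes:
    """Byte-based substring, start is 1-based."""
    return data[start - 1:start - 1 + count]


def byte_instr(data: bytes, pattern: bytes) -> int:
    """1-based byte position of pattern, 0 if not found."""
    return data.find(pattern) + 1


if __name__ == "__main__":
    r = "Введите текст здесь".encode("utf-8")
    word = "текст".encode("utf-8")
    print(len(word))  # 10 bytes for 5 characters
    # Character-based: position 9, length 5
    print(u_mid(r, 9, 5).decode("utf-8"))
    # Byte-based: position from the search, length in bytes
    print(byte_mid(r, byte_instr(r, word), 2 * 5).decode("utf-8"))
```

Both calls extract the same word; mixing a byte position with a character count (or vice versa) would cut the string in the wrong place.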
The three macros should work exactly like their Ansi and wide versions (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1159) (if not, please let me know).
I attach a somewhat bigger project including an example of how to use lower$() and UPPER$() with Unicode text.