Re: How to generate an Unicode string under MASM 6.15?

Started by nidud, May 05, 2017, 11:15:13 PM

nidud

deleted

aw27

Another example with a mix of various languages. All you need is an editor that saves in UTF-8; most will do. Tested with JWASM/HJWASM.


.386

.MODEL FLAT, C
option casemap:none

CP_UTF8 equ 65001
MB_OK equ 0
NULL equ 0
option dllimport:<kernel32.dll>
MultiByteToWideChar PROTO STDCALL :DWORD,:DWORD,:DWORD,:DWORD,:DWORD,:DWORD
ExitProcess PROTO STDCALL :DWORD    ; Win32 API functions use STDCALL
option dllimport:<user32.dll>
MessageBoxW PROTO STDCALL :DWORD,:DWORD,:DWORD,:DWORD

.data

; The four strings are contiguous and only the last one is zero-terminated,
; so a conversion starting at sKorean with length -1 picks up all of them.
sKorean db "Korean: 한자"," "
sJapanese db "Japanese: 漢字"," "
sChinese db "Chinese: 汉字"," "
sRussian db "Russian: Прощай", 0,0
; The caption is stored directly as UTF-16 code units.
sCaption dw "L","a","n","g","u","a","g","e"," ","S","a","l","a","d",0,0

.data?
myBuffer      dw 256 dup(?)    ; 256 UTF-16 code units (512 bytes)

.code

start proc
; The last argument (cchWideChar) is counted in wide characters, not bytes.
invoke MultiByteToWideChar, CP_UTF8, 0, offset sKorean, -1, offset myBuffer, 256
invoke MessageBoxW, NULL, addr myBuffer, addr sCaption, MB_OK
invoke  ExitProcess, 0
ret
start endp

end start
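
To see what the CP_UTF8 conversion in the example does with the bytes, here is a minimal portable C sketch. It is not the Windows implementation: `utf8_to_utf16` is a hypothetical helper name, and the decoder is deliberately simplified (it does not reject overlong forms or encoded surrogates). Like cchWideChar, its capacity argument counts 16-bit units, not bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): decode NUL-terminated UTF-8 into
   UTF-16 code units. Returns the number of units written, including the
   terminating 0, or 0 if the input is malformed or dst is too small.
   `cap` is the capacity of dst in 16-bit units, not in bytes. */
size_t utf8_to_utf16(const unsigned char *src, uint16_t *dst, size_t cap)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    for (;;) {
        unsigned char b = *src++;
        uint32_t cp;
        int extra;
        /* The lead byte tells how many continuation bytes follow. */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return 0;               /* stray continuation or invalid lead byte */
        while (extra-- > 0) {
            unsigned char c = *src++;
            if ((c & 0xC0) != 0x80) return 0;   /* also catches a premature NUL */
            cp = (cp << 6) | (c & 0x3Fu);
        }
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 > cap) return 0;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > cap) return 0;
            dst[n++] = (uint16_t)cp;
        }
        if (b == 0) return n;        /* the terminator has been copied */
    }
}
```

For instance, the UTF-8 bytes D0 9F D1 80 ("Пр") decode to the code units 041F 0440, and E6 BC A2 ("漢") decodes to 6F22.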

TWell

W0LF,
npp defaults to UTF-8 and your examples contained Cyrillic text, so was it really 8-bit ANSI text in KOI8 or Windows cp 1251?
I wasn't able to see any Cyrillic text in the linked picture, so do I have to buy better eyeglasses or just more beer :biggrin:

Others:
jwasm and others should accept the fact that UTF-8 is the default for source files these days, as comments need the local language.
How about just warning about the BOM?

When will we see a real UNICODE capable assembler?
Do we have to wait another 20 years?

ml[64] is just a vintage product, a barrier to further development ;)

jj2007

Quote from: W0LF on May 05, 2017, 10:15:09 PM
My code works even without __UNICODE__ with russian symbols.

It works with ANSI if your machine's codepage is Cyrillic. My example works on all machines because it uses UTF-8.
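
The codepage dependence can be illustrated with a small portable C sketch. The function names are hypothetical, and only ASCII and the 0xC0..0xFF range are modelled; the point is that the same ANSI byte becomes a different character depending on which codepage the machine uses.

```c
#include <assert.h>
#include <stdint.h>

/* Windows-1251: bytes 0xC0..0xFF are the Cyrillic letters U+0410..U+044F.
   This sketch handles only ASCII and that range. */
uint16_t cp1251_to_unicode(unsigned char b)
{
    assert(b < 0x80 || b >= 0xC0);
    return (b < 0x80) ? (uint16_t)b : (uint16_t)(0x0410 + (b - 0xC0));
}

/* Windows-1252: bytes 0xC0..0xFF coincide with Latin-1 (U+00C0..U+00FF).
   This sketch handles only ASCII and that range. */
uint16_t cp1252_to_unicode(unsigned char b)
{
    assert(b < 0x80 || b >= 0xC0);
    return (uint16_t)b;
}
```

The byte 0xCF is "П" (U+041F) under cp 1251 but "Ï" (U+00CF) under cp 1252, whereas the UTF-8 sequence D0 9F means "П" everywhere.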

newrobert

Quote from: TWell on May 06, 2017, 02:58:51 AM
W0LF,
npp defaults to UTF-8 and your examples contained Cyrillic text, so was it really 8-bit ANSI text in KOI8 or Windows cp 1251?
I wasn't able to see any Cyrillic text in the linked picture, so do I have to buy better eyeglasses or just more beer :biggrin:

Others:
jwasm and others should accept the fact that UTF-8 is the default for source files these days, as comments need the local language.
How about just warning about the BOM?

When will we see a real UNICODE capable assembler?
Do we have to wait another 20 years?

ml[64] is just a vintage product, a barrier to further development ;)

20 years too long.

jj2007

Quote from: TWell on May 06, 2017, 02:58:51 AM
jwasm and others should accept the fact that UTF-8 is the default for source files these days, as comments need the local language.
How about just warning about the BOM?

My examples above work because all Masm-compatible assemblers accept UTF-8. Open the attached *.asm source in a hex editor and check for yourself.

.data
sCaption db "Заголовок", 0
sText db "Текст на русском!", 0


Btw it builds also with qEditor, although it looks like garbage. In RichMasm you can actually read the Russian text, and you can save edits.
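
For readers without a hex editor at hand, the check can be sketched in portable C. `hex_dump` and `count_non_ascii` are hypothetical helper names. The sketch relies on one property of UTF-8: every byte of a non-ASCII character is 0x80 or above, so it can never look like a quote or a line break to a tool that only understands ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the bytes of s as space-separated uppercase hex into out.
   Returns the number of bytes dumped, or 0 if out is too small. */
size_t hex_dump(const char *s, char *out, size_t cap)
{
    size_t n = 0, pos = 0;
    assert(s != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (; s[n] != '\0'; n++) {
        if (pos + 4 > cap) return 0;   /* room for " XX" plus the NUL */
        pos += (size_t)sprintf(out + pos, n ? " %02X" : "%02X",
                               (unsigned)(unsigned char)s[n]);
    }
    return n;
}

/* Count the bytes with the high bit set, i.e. bytes that belong to
   multi-byte UTF-8 sequences and are opaque to an ASCII-only tool. */
size_t count_non_ascii(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80) count++;
    return count;
}
```

Dumping the first letter of "Заголовок" as stored in a UTF-8 source file gives the two bytes D0 97.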

aw27

Quote from: jj2007 on May 06, 2017, 11:41:02 AM
My examples above work because all Masm-compatible assemblers accept UTF-8.

The assemblers simply don't know: they believe they are dealing with ASCII, and it works in almost every case. But if the UTF-8 file starts with a BOM, that alone is enough for the assembler to reject the file.
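
A pre-build step that checks for the BOM is easy to sketch in portable C. `utf8_bom_length` is a hypothetical helper name; the only fact it relies on is that the UTF-8 BOM is the three bytes EF BB BF at the very start of the file.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many leading bytes to skip before handing the source to an
   assembler: 3 if buf starts with the UTF-8 BOM (EF BB BF), otherwise 0. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}
```

A caller would read the file into memory and write out `buf + utf8_bom_length(buf, len)` onwards.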

hutch--

 :biggrin:

> Btw it builds also with qEditor, although it looks like garbage.

That just says that the method you use does not work in Quick Editor. Look in the example code for working UNICODE applications using the orthodox Microsoft method. Then there is the QE accessory "MultiTool" that will do conversions for you, and if all else fails, UniEdit lets you write anything you like in UNICODE, which you then place in a UNICODE RC script.

QE is a pure ASCII editor, it does not pretend to write UTF8/16 or UNICODE.

jj2007

Quote from: aw27 on May 06, 2017, 05:00:38 PM
The assemblers simply don't know, they believe they are dealing with ASCII and it works in almost every case.

Exactly. And I have yet to see a case where it didn't work.

include \masm32\include\masm32rt.inc
include uChrMacro.inc

.code
start:
  push MB_OK
  push uChr$("Заголовок!")
  push uChr$("Текст на русском!")
  push 0
  call MessageBoxW
  exit
end start


Pure Masm32. It looks simple enough, right? Of course, there are more complicated solutions.

aw27

Quote from: jj2007 on May 06, 2017, 07:19:50 PM
Exactly. And I have yet to see a case where it didn't work.

Your MasmBasic is amazing. Congratulations  :t

nidud

deleted

aw27

Quote from: nidud on May 06, 2017, 10:18:26 PM
error A2008: syntax error : я╗┐include

The BOMs suck  :badgrin:
Intelligent editors don't need a BOM to find out that the text is UTF8.

jj2007

Quote from: aw27 on May 06, 2017, 10:32:43 PM
Intelligent editors don't need a BOM to find out that the text is UTF8.

I would even go one step further: Intelligent editors know when to put a BOM and when to save it as UTF-8 without BOM :bgrin:

(more precisely: ml & clones can't stand the BOM, rc.exe likes it)

hutch--

Bottom line is the UNICODE spec does not require it; it's there for text that may be used on other hardware that uses a different byte order. The UNICODE editor I supply specifically does not have it, as it's designed for Windows UNICODE only.
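
The byte order role of the BOM can be sketched in portable C. The type and function names are hypothetical; the sketch only looks at the two UTF-16 marks (FF FE for little-endian, FE FF for big-endian) and ignores UTF-32, whose little-endian mark also starts with FF FE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { UTF16_UNKNOWN, UTF16_LE, UTF16_BE } utf16_order;

/* Inspect the first two bytes of a buffer for a UTF-16 byte order mark. */
utf16_order utf16_order_from_bom(const unsigned char *buf, size_t len)
{
    if (buf != NULL && len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF) return UTF16_BE;
    }
    return UTF16_UNKNOWN;
}

/* Read one 16-bit code unit from two bytes in the given byte order. */
uint16_t utf16_read_unit(const unsigned char *p, utf16_order order)
{
    assert(p != NULL && order != UTF16_UNKNOWN);
    if (order == UTF16_BE)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

The letter "П" (U+041F) is stored as 1F 04 in little-endian text, the Windows convention, and as 04 1F in big-endian text; without a BOM or outside knowledge a reader cannot tell the two apart.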

nidud

deleted