Re: How to generate an Unicode string under MASM 6.15?

Started by nidud, May 05, 2017, 11:15:13 PM

jj2007

Quote from: hutch-- on May 06, 2017, 11:36:56 PM
the UNICODE spec does not require it

Actually, there is no safe way to distinguish UTF-8 from "pure" ANSI. Most of the sources posted here would work exactly the same way if they were saved as UTF-8, simply because they don't contain any "exotic" characters. This is why the BOMs make sense. What is really, really hard to understand is that neither MASM nor the Watcom clones take a few cycles to test whether there is a BOM to skip at the beginning of the source (three bytes for UTF-8). These things have been around for several decades now...
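
To illustrate the point about ANSI and UTF-8 being indistinguishable, here is a minimal Python sketch. The source line is made up for the example, and Windows-1252 is assumed as the "ANSI" code page; neither comes from the thread.

```python
# Hypothetical source line; any 7-bit ASCII text behaves the same way.
source = b'include \\masm32\\include\\masm32rt.inc ; plain ASCII comment\r\n'

# Decode the same bytes both ways. cp1252 is an assumed stand-in for "ANSI".
as_utf8 = source.decode("utf-8")
as_ansi = source.decode("cp1252")
same_without_exotics = as_utf8 == as_ansi

# A single non-ASCII character is enough to make the two readings diverge.
exotic = "; é\r\n".encode("utf-8")
differs_with_exotics = exotic.decode("utf-8") != exotic.decode("cp1252")

print(same_without_exotics, differs_with_exotics)
```

Without a BOM, a tool reading the first file has nothing in the bytes themselves that says which encoding was intended.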

TWell

Quote
UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF
I think many people would like the idea of having that handled in hjwasm and asmc.
After that, Notepad is no longer evil and comments in a native language are not a problem.
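
A rough sketch of the check being asked for, written in Python rather than inside an assembler. The function name `strip_utf8_bom` is hypothetical; this is not how hjwasm or asmc actually handle input, only the general idea of testing for the three bytes and skipping them.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):        # b'\xef\xbb\xbf'
        return data[len(codecs.BOM_UTF8):]
    return data

# The same source text as Notepad might save it (with BOM) and without.
with_bom = codecs.BOM_UTF8 + b".code\r\n"
without_bom = b".code\r\n"

print(strip_utf8_bom(with_bom) == strip_utf8_bom(without_bom))
```

Files without a BOM pass through unchanged, so existing ASCII sources would not be affected.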

nidud

#17
deleted

hutch--

Until there is a wide acceptance of writing compilers and assemblers to read UNICODE, the methods that have been around since at least WinNT4 will keep doing the job. Code is still generally ASCII and most assemblers/compilers even restrict the upper 128 character set. The exception which has been this way since the earliest Win32 is RC.EXE that has always been capable of UNICODE in Win32.

jj2007

Quote from: hutch-- on May 07, 2017, 02:39:30 AM
Until there is a wide acceptance of writing compilers and assemblers to read UNICODE, the methods that have been around since at least WinNT4 will keep doing the job.

Absolutely :t

And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

nidud

#20
deleted

TWell

M$ C++ is not scared of UTF-8 with a BOM.
So this problem is inherited from ml.exe.

hutch--

 :biggrin:

> And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Or anything else that can write UNICODE to an RC script. With an ASCII editor you can do these.

.data
align 4
  [rename me] \
    dw "T","h","i","s"," ","i","s"," ","a"," ","t","e","s","t",0,0
.code
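
Typing one quoted character per word by hand gets tedious, so a line like the `dw` above could be generated. This is a small Python sketch; the helper name `to_dw_line` is hypothetical, and it only accepts printable ASCII without double quotes so that the output stays a valid operand list.

```python
def to_dw_line(text: str) -> str:
    """Format printable ASCII text as a MASM dw operand list ending in 0,0."""
    if any(not (32 <= ord(ch) < 127) or ch == '"' for ch in text):
        raise ValueError("only printable ASCII without double quotes is supported")
    items = [f'"{ch}"' for ch in text]      # one quoted character per 16-bit word
    return "dw " + ",".join(items) + ",0,0"

print(to_dw_line("This is a test"))
```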


Or this,

; ANSI string of 16 bytes converted to UNICODE
; at 34 bytes using MultiByteToWideChar

  [rename me] \
    db 84,0,104,0,105,0,115,0,32,0,105,0,115,0,32,0
    db 97,0,32,0,116,0,101,0,115,0,116,0,13,0,10,0
    db 0,0
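
The byte counts in the comment can be checked with a short Python sketch. Python's UTF-16LE encoder stands in here for MultiByteToWideChar, which is an assumption that holds for plain ASCII input like this string.

```python
text = "This is a test\r\n"                       # 16 ANSI bytes including CR/LF
wide = text.encode("utf-16-le") + b"\x00\x00"     # append the 16-bit terminator

# The db values exactly as listed in the post.
listed = bytes([
    84, 0, 104, 0, 105, 0, 115, 0, 32, 0, 105, 0, 115, 0, 32, 0,
    97, 0, 32, 0, 116, 0, 101, 0, 115, 0, 116, 0, 13, 0, 10, 0,
    0, 0,
])

print(len(text), len(wide), wide == listed)
```

Each ASCII character becomes its byte value followed by a zero byte, which is why the `db` form is simply the ANSI string interleaved with zeros.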


Or this in a UNICODE RC script.

    STRINGTABLE
    BEGIN
        250,  "早上好计算机程序员。\0"
        251,  "おはようのコンピュータのプログラマー。\0"
        252,  "Хороший программист утром.\0"
        253,  "Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\0"
        254,  "सुप्रभात कंप्यूटर प्रोग्रामर.\0"
        255,  "Chào buổi sáng lập trình máy tính.\0"
        256,  "დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\0"
        257,  "Добро јутро компјутерски програмер.\0"
        258,  "Բարի լույս ծրագրավորող.\0"
        259,  "안녕하세요 컴퓨터 프로그래머.\0"
    END
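
A sketch of producing such a script from a tool instead of an editor, in Python. It assumes that "UNICODE RC script" means UTF-16 little-endian with a BOM, which is the form RC.EXE is generally understood to accept; the one-entry string table is a shortened illustration, not the full table above.

```python
import codecs

# Shortened, illustrative string table; "\\0" keeps the literal \0 in the script.
rc_text = (
    "STRINGTABLE\r\n"
    "BEGIN\r\n"
    '    252,  "Хороший программист утром.\\0"\r\n'
    "END\r\n"
)

# Assumed on-disk form: UTF-16LE BOM (FF FE) followed by UTF-16LE text.
rc_bytes = codecs.BOM_UTF16_LE + rc_text.encode("utf-16-le")

print(rc_bytes[:2].hex(), len(rc_bytes))
```

The resulting bytes would be written to the `.rc` file in binary mode so that no further newline or encoding translation is applied.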


jj2007

Quote from: nidud on May 07, 2017, 03:02:52 AM
Quote from: jj2007 on May 07, 2017, 02:48:05 AM
And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Why do they have to use RichMasm?

Oops, that is a misunderstanding - see attachment.
include \masm32\MasmBasic\MasmBasic.inc  ; Вы знаете, так, где же найти эту библиотеку.
  Init
  uMsgBox 0, "真的,没有人是被迫使用高级编辑。", "Important message:", MB_OK
EndOfCode


(if you can't see the text correctly, try an advanced browser like Firefox, Edge, MSIE, Safari, Opera, Chrome or Vivaldi - there are a few that can display non-Latin alphabets)

nidud

#24
deleted

jj2007

Why would any sane person want to work with such "text"?
What would you do if your browser showed you such gibberish?

nidud

#26
deleted

jj2007

Quote from: nidud on May 07, 2017, 04:59:42 AM
The question is then: why do YOU use it?

The question is: Why do ALL browsers display all kinds of exotic languages correctly? Wouldn't it be so much easier if your browser displayed only the Norwegian subset correctly, and showed you gibberish if, for whatever strange reason, you visited French or German or Russian websites?

Hey, it's 2017. Unicode was invented in the 20th Century.

nidud

#28
deleted

nidud

#29
deleted