Author Topic: Re: How to generate an Unicode string under MASM 6.15?  (Read 3928 times)

jj2007

  • Member
  • *****
  • Posts: 7760
  • Assembler is fun ;-)
    • MasmBasic
Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #15 on: May 07, 2017, 01:20:24 AM »
the UNICODE spec does not require it

Actually, there is no safe way to distinguish UTF-8 from "pure" Ansi. Most of the sources posted here would work exactly the same way if they were saved as UTF-8; simply because they don't contain any "exotic" characters. This is why the BOMs make sense. What is really, really hard to understand is that neither MASM nor the Watcom clones take a few cycles to test if there are two bytes to skip at the beginning of the source. These things have been around for several decades now...
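
A minimal sketch of why the two cannot be told apart safely, assuming Windows-1252 stands in for "Ansi" here. The helper name looks_like_utf8 and the sample strings are made up for illustration. A pure-ASCII source is byte-for-byte the same in both encodings, and a strict UTF-8 decode only rules out some Ansi files; it never proves that a file is UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic only: True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical source lines, invented for this example.
ascii_source = b"mov eax, 1 ; plain comment\r\n"
ansi_source = "; Größe".encode("cp1252")   # "Ansi" with non-ASCII letters
utf8_source = "; Größe".encode("utf-8")    # same text saved as UTF-8

# Pure ASCII decodes to the same text either way, so there is nothing to detect.
same_text = ascii_source.decode("utf-8") == ascii_source.decode("cp1252")

# The UTF-8 bytes are also a legal cp1252 text, just not the intended one.
misread = utf8_source.decode("cp1252")

print(same_text, looks_like_utf8(ansi_source), looks_like_utf8(utf8_source))
print(misread)
```

Without a BOM, the only hint is that non-ASCII Ansi text rarely happens to be valid UTF-8, which is a guess rather than a test.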

TWell

  • Member
  • ****
  • Posts: 748
Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #16 on: May 07, 2017, 02:01:42 AM »
Quote
UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF
I think many people would like the idea of having that handled in hjwasm and asmc.
After that, Notepad is no longer evil, and comments in a native language are not a problem.
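
For reference, the check being asked for is small. The sketch below is only an illustration in Python, not code from hjwasm, asmc or ml.exe; the function name strip_bom and the sample source line are assumptions. Note that the UTF-8 BOM quoted above is three bytes long, while the UTF-16 BOMs are two bytes.

```python
import codecs
from typing import Tuple

# BOMs to look for; UTF-32 is left out to keep the sketch short.
KNOWN_BOMS = (
    ("utf-8", codecs.BOM_UTF8),          # EF BB BF
    ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE
    ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF
)


def strip_bom(data: bytes) -> Tuple[str, bytes]:
    """Return (encoding name, data without BOM); 'unknown' if no BOM is found."""
    for name, bom in KNOWN_BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return "unknown", data


# Hypothetical source file content as Notepad would save it with "UTF-8".
saved_by_notepad = codecs.BOM_UTF8 + b".data\r\n"
encoding, body = strip_bom(saved_by_notepad)
print(encoding, body)
```

An assembler that ran such a test before tokenizing would see .data as the first token instead of three stray bytes.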

nidud

  • Member
  • *****
  • Posts: 1411
    • https://github.com/nidud/asmc
Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #17 on: May 07, 2017, 02:16:02 AM »
What is really, really hard to understand is that neither MASM nor the Watcom clones take a few cycles to test if there are two bytes to skip at the beginning of the source. These things have been around for several decades now...

For what purpose?

I think many people would like the idea of having that handled in hjwasm and asmc.
After that, Notepad is no longer evil, and comments in a native language are not a problem.

What language does not have this ability?

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4935
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #18 on: May 07, 2017, 02:39:30 AM »
Until there is a wide acceptance of writing compilers and assemblers to read UNICODE, the methods that have been around since at least WinNT4 will keep doing the job. Code is still generally ASCII and most assemblers/compilers even restrict the upper 128 character set. The exception which has been this way since the earliest Win32 is RC.EXE that has always been capable of UNICODE in Win32.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

jj2007

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #19 on: May 07, 2017, 02:48:05 AM »
Until there is a wide acceptance of writing compilers and assemblers to read UNICODE, the methods that have been around since at least WinNT4 will keep doing the job.

Absolutely :t

And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

nidud

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #20 on: May 07, 2017, 03:02:52 AM »
And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Why do they have to use RichMasm?

TWell

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #21 on: May 07, 2017, 03:11:52 AM »
The M$ C++ compiler isn't scared of UTF-8 with a BOM.
So this problem is inherited from ml.exe.

hutch--

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #22 on: May 07, 2017, 03:14:36 AM »
 :biggrin:

> And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Or anything else that can write UNICODE to a RC script. With an ASCII editor you can do these.

.data
align 4
  [rename me] \
    dw "T","h","i","s"," ","i","s"," ","a"," ","t","e","s","t",0,0
.code


Or this,

 ; ANSI string of 16 bytes converted to UNICODE
 ; at 34 bytes using MultiByteToWideChar

  [rename me] \
    db 84,0,104,0,105,0,115,0,32,0,105,0,115,0,32,0
    db 97,0,32,0,116,0,101,0,115,0,116,0,13,0,10,0
    db 0,0
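
The byte listing above can be cross-checked without calling MultiByteToWideChar. This is a sketch that assumes the source string was "This is a test" followed by CR LF (16 characters), which is what the byte values suggest; the variable names are made up.

```python
# Assumed original ANSI string: 14 letters/spaces plus CR LF = 16 characters.
text = "This is a test\r\n"

# UTF-16 little-endian, as used by Win32, plus a two-byte zero terminator.
wide = text.encode("utf-16-le") + b"\x00\x00"

# The bytes from the db lines in the post, copied as decimal values.
listing = [
    84, 0, 104, 0, 105, 0, 115, 0, 32, 0, 105, 0, 115, 0, 32, 0,
    97, 0, 32, 0, 116, 0, 101, 0, 115, 0, 116, 0, 13, 0, 10, 0,
    0, 0,
]

print(len(text), len(wide), list(wide) == listing)
```

For characters below 128, each UTF-16 code unit is just the ASCII value followed by a zero byte, which is why the dw and db forms can be typed in a plain ASCII editor.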


Or this in a UNICODE RC script.

    STRINGTABLE
    BEGIN
        250,  "早上好计算机程序员。\0"
        251,  "おはようのコンピュータのプログラマー。\0"
        252,  "Хороший программист утром.\0"
        253,  "Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\0"
        254,  "सुप्रभात कंप्यूटर प्रोग्रामर.\0"
        255,  "Chào buổi sáng lập trình máy tính.\0"
        256,  "დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\0"
        257,  "Добро јутро компјутерски програмер.\0"
        258,  "Բարի լույս ծրագրավորող.\0"
        259,  "안녕하세요 컴퓨터 프로그래머.\0"
    END


jj2007

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #23 on: May 07, 2017, 03:51:32 AM »
And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Why do they have to use RichMasm?

Oops, that is a misunderstanding - see attachment.
Code:
include \masm32\MasmBasic\MasmBasic.inc  ; Вы знаете, так, где же найти эту библиотеку.
  Init
  uMsgBox 0, "真的,没有人是被迫使用高级编辑。", "Important message:", MB_OK
EndOfCode

(if you can't see the text correctly, try an advanced browser like FireFox, Edge, MSIE, Safari, Opera, Chrome or Vivaldi - there are a few that can display non-Latin alphabets)

nidud

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #24 on: May 07, 2017, 04:31:14 AM »
And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Why do they have to use RichMasm?

Oops, that is a misunderstanding - see attachment.

 :biggrin:

What does this have to do with anything?

Just to make this clear: ASCII is not exclusively English. All nations who use keyboards and produce/use software have their own version, even the Chinese but then somewhat simplified.

Code:
LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED

STRINGTABLE
BEGIN
    201 "开始"
    202 "注销..."
    203 "关闭..."
    204 "重新启动..."
    205 "运行..."
    206 "帮助"
END

So why don't you just answer the question?

jj2007

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #25 on: May 07, 2017, 04:37:49 AM »
Why would any sane person want to work with such "text"?
What would you do if your browser showed you such gibberish?

nidud

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #26 on: May 07, 2017, 04:59:42 AM »
So the answer is, as you (and everybody else) obviously know, you don't have to use Unicode/UTF-8 to write code or to write their strings or comments in exotic alphabets.

The question is then: why do YOU use it?

Why would any sane person want to work with such "text"?
What would you do if your browser showed you such gibberish?

:lol:

When I read Russian source code using a Norwegian code page, that's basically what it looks like, yes.  :P

Code:
chi:
;Use "!.!" instead !.! if you need access to files with spaces in names
; ¨á¯®«ì§ã©â¥ ª ¢ë窨, çâ®¡ë ¨¬¥âì ¤®áâ㯠ª ä ©« ¬ á ¯à®¡¥« ¬¨ ¢ ¨¬¥­ å,
; ­® ¯®¬­¨â¥, çâ® ­¥ ¢á¥ áâ àë¥ ¤®á-¯à¨«®¦¥­¨ï ¨å ¯®­¨¬ îâ
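
The effect is easy to reproduce. The sketch below assumes the comment was stored as code page 866 (DOS Cyrillic) bytes and then displayed through a Latin-1 style Western table; that is a guess based on how the gibberish looks, not something the post states. The sample word is my reading of the first visible word of the Russian comment, in lower case.

```python
# Assumed original word from the Russian comment ("use", imperative).
original = "используйте"

# What is actually stored on disk: one byte per letter in code page 866.
raw = original.encode("cp866")

# The same bytes shown through a Western table (Latin-1 assumed here).
mojibake = raw.decode("latin-1")

# Nothing is lost: reinterpreting the bytes with the right table restores the text.
recovered = mojibake.encode("latin-1").decode("cp866")

print(mojibake)
print(recovered)
```

The bytes never change; only the table used to draw them does, which is why switching the code page fixes the display without touching the file.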

jj2007

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #27 on: May 07, 2017, 05:04:06 AM »
The question is then: why do YOU use it?

The question is: Why do ALL browsers display all kinds of exotic languages correctly? Wouldn't it be so much easier if your browser displayed only the Norwegian subset correctly, and showed you gibberish if, for whatever strange reason, you visited French or German or Russian websites?

Hey, it's 2017. Unicode was invented in the 20th Century.

nidud

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #28 on: May 07, 2017, 05:51:45 AM »
The question is then: why do YOU use it?

The question is: Why do ALL browsers display all kinds of exotic languages correctly? Wouldn't it be so much easier if your browser displayed only the Norwegian subset correctly, and showed you gibberish if, for whatever strange reason, you visited French or German or Russian websites?

We have a browser problem?

Quote
Hey, it's 2017. Unicode was invented in the 20th Century.

So, Unicode was invented and all browsers now display all kinds of exotic languages correctly.

Right, so it's all good then.

nidud

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #29 on: May 07, 2017, 06:55:26 AM »
So here's a visual view on how this code page stuff works. The first image is the Norwegian code page I normally use. The second is the Russian version.

In text mode (console mode) the command is mode con CP select=866 with this output:
Code:
Status for device CON:
----------------------
    Lines:          36
    Columns:        94
    Keyboard rate:  31
    Keyboard delay: 0
    Code page:      866

Haven't tested this but I assume there is a quick-menu in Notepad++ to change the code page there as well.

The point is that when you write code (or anything else) you will always have a fixed keyboard adapted to the language you normally use, so a German or Russian fellow will be totally lost given a Norwegian keyboard. The idea that any of those mentioned above see their own language as a secondary (or exotic) sub-language of English, which is the impression you get from reading this thread, is a bit strange. The Germans write their text and comments in the German language using their own ASCII table with their own keyboard layout, just as the English and Russians do, and so on.