Author Topic: Re: How to generate an Unicode string under MASM 6.15?  (Read 2545 times)

nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: How to generate an Unicode string under MASM 6.15?
« Reply #60 on: May 08, 2017, 08:36:04 PM »
Quote
Yes, I agree. The user needs to have that font installed to write that source text in a text editor.
But if an assembler opens that text file looking only at the byte values, it can recognize the size of that Unicode/UTF-8 string and so know that the data is an identifier.

The problem is that the Unicode Standard is huge and contains loads of modifying characters (different versions of the low ASCII characters (CR/LF/SPACE/..)) so you need to know the type of each of them as well.

Quote
Nothing will change, as we know; this board uses English as its common language, as I have said before. It's just an option or feature that can be added. If a person came to this board writing in runic, I would not be able to help.

That is the core of the problem, yes. We have to agree on a fixed language to communicate, in the same way we have to agree on a common programming language in order to write software.

nidud

Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #61 on: May 08, 2017, 08:37:19 PM »
Quote
Why does every single piece of professional software on Windows work with either UTF-8 or Unicode, except assembly IDEs?

Could you name one please.

RichMasm? ;)

So RichMasm does not work with either UTF-8 or Unicode?

Well, neither does the assembler, so maybe that's the reason.

jj2007

  • Member
  • *****
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: Re: How to generate an Unicode string under MASM 6.15?
« Reply #62 on: May 08, 2017, 08:58:54 PM »
Quote
RichMasm? ;)

So RichMasm does not work with either UTF-8 or Unicode?

It works perfectly, as the example shows. And all major software like MS Office, LibreOffice, Skype, Telegram, whatever, have absolutely no problem working with English, Chinese and Arabic simultaneously. So I really wonder why we are making such a big fuss about it here. It's not rocket science, and we are in the 21st century now.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4813
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: How to generate an Unicode string under MASM 6.15?
« Reply #63 on: May 08, 2017, 09:16:48 PM »
 :biggrin:

Well, user apps may be in the 21st century but compilers and assemblers are truly in the 20th century as they generally work only in ASCII.  :P It would not be wise to hold your breath waiting for compilers and assemblers to start working in UNICODE.  :badgrin:
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

jj2007

Re: How to generate an Unicode string under MASM 6.15?
« Reply #64 on: May 08, 2017, 09:47:58 PM »
No need to hold my breath - I just use UTF-8 :P

nidud

Re: How to generate an Unicode string under MASM 6.15?
« Reply #65 on: May 08, 2017, 09:50:41 PM »
Quote
It works perfectly, as the example shows.

So RichMasm is not an assembly IDE?

Quote
And all major software like MS Office, LibreOffice, Skype, Telegram, whatever, have absolutely no problem working with English, Chinese and Arabic simultaneously.

Amazing and somewhat disturbing.

Quote
So I really wonder why we are making such a big fuss about it here.


Who are we and what's the fuss about?

Quote
It's not rocket science,

So what is it then?

Quote
and we are in the 21st century now.

What, already?

mineiro

  • Member
  • ***
  • Posts: 365
Re: How to generate an Unicode string under MASM 6.15?
« Reply #66 on: May 08, 2017, 10:03:52 PM »
I know it's a hard job to implement, but this board gets people from all over the world who can help.
I was reading about UTF-8. We take the first byte of the source code and check its leftmost bit: if it is 0, we are dealing with ASCII and the char is one byte. If the byte starts with 110, the char is two bytes; if it starts with 1110, the char is three bytes. Every byte that follows the first one starts with the bits 10.
1110.... 10...... 10......
So, suppose the source text suffered from bad blocks on the hard disk. When we open that text and the first byte starts with 10......, we know it cannot be a first byte, so we can discard it as part of an identifier and check the next byte. If that one starts with 10...... too, we discard it as well; if the next one starts with 110, oops, we know we have found something valid.
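
The rule described above can be tried out in a few lines. This is only a rough sketch in Python, not how any assembler does it; the function names `utf8_sequence_length` and `resync` are made up for the illustration, and the check looks only at the leading bits, so it does not reject every invalid lead byte.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead byte, or 0 if it cannot start a char."""
    if lead & 0x80 == 0x00:   # 0xxxxxxx: plain ASCII, one byte
        return 1
    if lead & 0xE0 == 0xC0:   # 110xxxxx: two-byte char
        return 2
    if lead & 0xF0 == 0xE0:   # 1110xxxx: three-byte char
        return 3
    if lead & 0xF8 == 0xF0:   # 11110xxx: four-byte char
        return 4
    return 0                  # 10xxxxxx (continuation byte) or an invalid pattern


def resync(data: bytes, pos: int = 0) -> int:
    """Skip bytes that cannot start a char; return the index of the first possible lead byte."""
    while pos < len(data) and utf8_sequence_length(data[pos]) == 0:
        pos += 1
    return pos


label = "ᛒᛓ".encode("utf-8")   # two runic letters, three bytes each
damaged = label[1:]             # lose the first lead byte, as a bad block might
start = resync(damaged)
print(start, damaged[start:].decode("utf-8"))
```

The two leftover continuation bytes are skipped and decoding restarts at the next lead byte, which is the recovery described in the paragraph above.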

I know that an answer can be: "So, build your own preprocessor". I see your side too.

The Greek school philosophers tell us that it's impossible to answer all questions about the universe and nature using only math (calculus, trigonometry and logic).
How do scientists today try to prove their creations? Yes, using math.
It's hard to believe that this multiverse, with so much data still to be discovered, cannot be accessed by math. This is why physics can't join the micro with the macro cosmos, I suppose. We need something new, but the guy who can do this will be a heretic.
As a computer example, we live under a "rectangle tyranny" when talking about images.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

nidud

Re: How to generate an Unicode string under MASM 6.15?
« Reply #67 on: May 08, 2017, 10:04:07 PM »
Quote
No need to hold my breath - I just use UTF-8 :P

Note: Need MasmBasic ; download

ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:

   dec ΤΥΦΧΨΩ
   jnz ᛒᛓᛕᛋᛘᛚᛝᛟᛠ

nidud

Re: How to generate an Unicode string under MASM 6.15?
« Reply #68 on: May 08, 2017, 10:17:25 PM »
Quote
It's hard to believe that this multiverse, with so much data still to be discovered, cannot be accessed by math.
...
It's not rocket science, and we are in the 21st century now.

HELP!  :lol:

mineiro

Re: How to generate an Unicode string under MASM 6.15?
« Reply #69 on: May 08, 2017, 10:47:36 PM »
hehehe sir nidud
sorry, the phrase should be "cannot be accessed only by using math".
It's because the 'scientific method' was created by the Greek school. Guys from centuries ago shaped our way of life.

nidud

Re: How to generate an Unicode string under MASM 6.15?
« Reply #70 on: May 09, 2017, 12:37:00 AM »
 :biggrin:

No worries mineiro, you do at least understand the problems and stick to the issues debated.

As for data handling in MASM, this needs code to convert from ASCII/UTF-8 to Unicode. Asmc will do this more or less the same way it's done in C/C++. HJwasm is also working on this as I understand it, but it is not fully implemented yet.

Using a local code page and ASCII is preferred. This enables switching between Unicode and ASCII using a command-line switch to the assembler and/or by defining __UNICODE__ directly in the source, which is how the include files in MASM32 are set up.

As for the font issue, Notepad++ will handle both ASCII/code page and UTF-8 without BOM. However, UTF-8 is only needed for cosmetic reasons if multiple languages (for some unknown reason) are stuffed into the same file.

The idea of extending the programming language, as debated here, using the Latin-1 Supplement range, or even Unicode, is a totally different issue.

mineiro

Re: How to generate an Unicode string under MASM 6.15?
« Reply #71 on: May 09, 2017, 03:44:31 AM »
People use Unicode a lot today because it has become the default on the internet (web pages). UTF-8 is used a lot in chat programs, and they are trying to make it the default for email too. I'm not sure whether UTF-8 covers all of Unicode; if it does, it can be used as another layer.

I was able to copy different symbols from this topic and paste them into a console under Linux just to get a feel for it. I discovered that I have a font with runic symbols, and others that I can't recognize. So on Linux we have a command interpreter (like command.com or cmd.exe) that accepts Unicode/UTF-8.
Code:
mineiro@assembly:~/asm$ ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:
ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:: comando não encontrado
mineiro@assembly:~/asm$    dec ΤΥΦΧΨΩ
mineiro@assembly:~/asm$    jnz ᛒᛓᛕᛋᛘᛚᛝᛟᛠ
mineiro@assembly:~/asm$       250,  "早上好计算机程序员。\0"
mineiro@assembly:~/asm$         251,  "おはようのコンピュータのプログラマー。\0"
mineiro@assembly:~/asm$         252,  "Хороший программист утром.\0"
mineiro@assembly:~/asm$         253,  "Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\0"
mineiro@assembly:~/asm$         254,  "सुप्रभात कंप्यूटर प्रोग्रामर.\0"
mineiro@assembly:~/asm$         255,  "Chào buổi sáng lập trình máy tính.\0"
mineiro@assembly:~/asm$         256,  "დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\0"
mineiro@assembly:~/asm$         257,  "Добро јутро компјутерски програмер.\0"
mineiro@assembly:~/asm$         258,  "Բարի լույս ծրագրավորող.\0"
mineiro@assembly:~/asm$         259,  "안녕하세요 컴퓨터 프로그래머.\0"
So we don't need code pages to show different symbols. We need code pages only if we are in an MS-DOS environment and think the way we have discussed here.


---edited--: Anyone?
Code:
8
7
6
5
4
3
2
1
a b c d e f g h

1- e2-e4
« Last Edit: May 09, 2017, 05:40:31 AM by mineiro »

nidud

Re: How to generate an Unicode string under MASM 6.15?
« Reply #72 on: May 09, 2017, 06:53:06 AM »
There was a bug in Asmc using option codepage: 65001 -- the length is not ASCII*2 in UTF-8 ...
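
As I read it, the point is that the size of the wide (UTF-16) string cannot be taken as twice the number of source bytes once the source is UTF-8, because one character may occupy one to four bytes there. A small Python sketch of the arithmetic, nothing to do with Asmc's actual code; `encoded_sizes` is a name invented for the example:

```python
def encoded_sizes(text: str) -> tuple[int, int, int]:
    """Return (characters, UTF-8 bytes, UTF-16 bytes) for text."""
    return (len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le")))


for sample in ("Hello", "Заголовок", "早上好", "안녕하세요"):
    chars, utf8_len, utf16_len = encoded_sizes(sample)
    print(f"{sample}: {chars} chars, {utf8_len} UTF-8 bytes, {utf16_len} UTF-16 bytes")
```

Only for pure ASCII text is the UTF-16 size exactly twice the source size; for the Cyrillic sample the two sizes are equal, and for the Chinese and Korean samples the UTF-8 source is larger than the UTF-16 result.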

The visual code page effect: this is copied from Doszip
Code:
; Build: asmc /pe /ws=65001 test.asm

.486
.model flat, stdcall

option dllimport:<kernel32>
ExitProcess proto :dword
option dllimport:<user32>
MessageBoxW proto :ptr, :ptr, :ptr, :dword
option dllimport:none

.code
start:
MessageBoxW( 0,
"早上好计算机程序员。\n"
"おはようのコンピュータのプログラマー。\n"
"Хороший программист утром.\n"
"Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\n"
"सुप्रभात कंप्यूटर प्रोग्रामर.\n"
"Chào buổi sáng lập trình máy tính.\n"
"დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\n"
"Добро јутро компјутерски програмер.\n"
"Ô²Õ¡Ö€Õ« Õ¬Õ¸Ö‚ÕµÕ½ Õ®Ö€Õ¡Õ£Ö€Õ¡Õ¾Õ¸Ö€Õ¸Õ².\n"
"안녕하세요 컴퓨터 프로그래머.\n",
"Message", 0 )
ExitProcess( 0 )

end start

Same file copied from Notepad++
Code:
; Build: asmc /pe /ws=65001 test.asm

.486
.model flat, stdcall

option dllimport:<kernel32>
ExitProcess proto :dword
option dllimport:<user32>
MessageBoxW proto :ptr, :ptr, :ptr, :dword
option dllimport:none

.code
start:
MessageBoxW( 0,
"早上好计算机程序员。\n"
"おはようのコンピュータのプログラマー。\n"
"Хороший программист утром.\n"
"Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\n"
"सुप्रभात कंप्यूटर प्रोग्रामर.\n"
"Chào buổi sáng lập trình máy tính.\n"
"დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\n"
"Добро јутро компјутерски програмер.\n"
"Բարի լույս ծրագրավորող.\n"
"안녕하세요 컴퓨터 프로그래머.\n",
"Message", 0 )
ExitProcess( 0 )

end start

nidud

Re: How to generate an Unicode string under MASM 6.15?
« Reply #73 on: May 09, 2017, 06:29:30 PM »
Quote
People use Unicode a lot today because it has become the default on the internet (web pages). UTF-8 is used a lot in chat programs, and they are trying to make it the default for email too. I'm not sure whether UTF-8 covers all of Unicode; if it does, it can be used as another layer.

As programmers we still deal with bytes, so a DWORD is just a sequence of bytes: db 1,2,3,4, but we lump these together in fixed types: 0x04030201. Same with Unicode, it's just a WORD: 'A' = db 'A',0 = 0x0041.

Everything containing CODE (C/ASM/CSS/HTM) is for this reason ASCII, where numbers are written out as digit characters rather than inserted as raw values (0 = 0x30, not 0x00).
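
The byte layouts above can be checked quickly. A small Python sketch using the standard `struct` module, assuming little-endian order as on x86; the variable names are only for the illustration:

```python
import struct

dword_bytes = bytes([1, 2, 3, 4])               # db 1,2,3,4
dword = struct.unpack("<I", dword_bytes)[0]     # the same four bytes read as one DWORD
print(hex(dword))

wide_a = "A".encode("utf-16-le")                # 'A' as a wide char: db 'A',0
word = struct.unpack("<H", wide_a)[0]           # the same two bytes read as one WORD
print(wide_a, hex(word))

digit = "0".encode("ascii")                     # the digit as written in source text
print(hex(digit[0]))                            # the character code, not the value zero
```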

So the code you deal with as a programmer is always ASCII, and it looks something like this using UTF-8:
Code:
mineiro@assembly:~/asm$ ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:<br>
ᛒᛓᛕᛋᛘᛚᛝᛟᛠ::comando não encontrado<br>
mineiro@assembly:~/asm$&nbsp; &nbsp; dec ΤΥΦΧΨΩ<br>
mineiro@assembly:~/asm$&nbsp; &nbsp; jnz ᛒᛓᛕᛋᛘᛚᛝᛟᛠ<br>

And using Unicode will be something like this:
Code:
Hl&#x0131;o&#xA77A;&#xEA73; b&#x0131;ð ec allar k&#x0131;n&#xA77A;&#x0131;r me&#x0131;r&#x0131;

Quote
So we don't need code pages to show different symbols. We need code pages only if we are in an MS-DOS environment and think the way we have discussed here.

As for the code page setup, this relates to writing and links directly to the keyboard you use, and it differs depending on the language used locally.

If you use an English code page and write the string "Hello" in an ASCII editor what you see is "Hello". If you do the same in a Unicode editor what you see is "Hello".

If you use a Russian code page and write the string "Заголовок" in an ASCII editor what you see is "Заголовок". If you do the same in a Unicode editor what you see is "Заголовок".

So what do programmers do? They write code in ASCII using their local code page.
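
To make the code page point concrete, here is a small Python sketch. It assumes the Russian code page is Windows-1251 and the "English" one is Windows-1252, which the post does not actually say; the variable names are made up for the example.

```python
title = "Заголовок"

raw = title.encode("cp1251")        # one byte per letter under the Russian code page
print(raw.hex(" "))

print(raw.decode("cp1251"))         # same bytes, Russian code page: readable
print(raw.decode("cp1252"))         # same bytes, Western code page: different glyphs

utf8 = title.encode("utf-8")        # in UTF-8 each Cyrillic letter takes two bytes
print(len(raw), len(utf8))
```

The bytes in the file never change; only the table used to draw them does, which is why the string looks right to the person who wrote it on their own code page.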

mineiro

Re: How to generate an Unicode string under MASM 6.15?
« Reply #74 on: May 09, 2017, 09:52:29 PM »
yes sir nidud, I understand;
"Image is nothing, contents is everything"
Not all chars are printable in ASCII; that's why in hexdumps we see a lot of dots in the ASCII column next to the hex numbers. There is also hex 08h, which means the 'backspace' key: if we print it, we eat one char on the screen.

string db "0",08h,"1",08h,"2",08h,0dh,0ah,08h,08h,00h
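
Here is how that string would look in a hexdump, as a rough Python sketch. The helper `hexdump_line` is invented for the example; it treats 20h to 7Eh as printable and shows a dot for everything else, which is the usual convention but still an assumption.

```python
def hexdump_line(data: bytes) -> str:
    """Format bytes as hex pairs followed by an ASCII column with dots for non-printables."""
    hex_part = " ".join(f"{b:02X}" for b in data)
    ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)
    return f"{hex_part}  {ascii_part}"


# The same bytes as the db line above.
data = bytes([0x30, 0x08, 0x31, 0x08, 0x32, 0x08, 0x0D, 0x0A, 0x08, 0x08, 0x00])
print(hexdump_line(data))
```

Backspace, CR, LF and the terminating zero all end up as dots, so the ASCII column alone cannot tell them apart; only the hex column can.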

Another example from memory: what's the visual difference between the space char (20h) and (0ffh)? In the MS-DOS days both were drawn as a blank ('null') glyph.