Author Topic: Re: How to generate an Unicode string under MASM 6.15?  (Read 3734 times)

mineiro

  • Member
  • ***
  • Posts: 365
« Reply #105 on: May 14, 2017, 12:00:14 PM »
Quote
My suggestion was to use a word processor to write documents and programming tools for programming.

Can you tell me what tools you are using for programming? An assembler that does not understand UTF-8 symbols, only ASCII? A text editor that does not understand UTF-8, only ASCII? A debugger that cannot display UTF-8 symbols? A hex editor that cannot show UTF-8 symbols?

UTF-8 text files are plain text like ASCII, so, raw txt.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

nidud

  • Member
  • *****
  • Posts: 1408
    • https://github.com/nidud/asmc
« Reply #106 on: May 14, 2017, 10:14:59 PM »
Quote
My suggestion was to use a word processor to write documents and programming tools for programming.

Can you tell me what tools you are using for programming?

The first sample was written using Notepad++ with UTF-8 encoding. It was assembled and linked using the Asmc Macro Assembler Version 2.24G-mineiro. The object file was produced using the same assembler without the /pe switch, and the object file was disassembled using Agner Fog's objconv.exe.

The second sample was created by Doszip using ASCII encoding.

Quote
An assembler that does not understand UTF-8 symbols, only ASCII?

No assemblers or compilers understand UTF-8.

Quote
A text editor that does not understand UTF-8, only ASCII?

Notepad(++) understands ASCII, UTF-8, and Unicode; programming tools do not.

Quote
A debugger that cannot display utf8 symbols?

Symbols containing characters above 127 are illegal:
Code: [Select]
; Error: symbol names contain illegal characters
Quote
A hex editor that cannot show UTF-8 symbols?

It will show the hexadecimal value of the ASCII bytes in the source, so no.
Code: [Select]
0000000224  6E 22 2C 20 E2 80 80 E2 │ 80 80 29 0D 0A 09 65 78  n", ÔÇÇÔÇÇ)...ex
0000000240  69 74 28 30 29 0D 0A 0D │ 0A 09 65 6E 64 09 E2 80  it(0).....end.ÔÇ
0000000256  80 E2 80 80 E2 80 80                               ÇÔÇÇÔÇÇ
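
The E2 80 80 runs in the dump can be decoded by hand or with a few lines of code. A minimal Python sketch, assuming the bytes are copied correctly from the first dump line and that the text column of the hex editor uses code page 850 (a guess based on the ÔÇÇ it shows); `non_ascii_chars` is a hypothetical helper name:

```python
import unicodedata

# Bytes copied from the first line of the hex dump above.
DUMP = bytes.fromhex("6E 22 2C 20 E2 80 80 E2 80 80 29")


def non_ascii_chars(data: bytes):
    """Decode data as UTF-8 and list (code point, name) for each non-ASCII char."""
    text = data.decode("utf-8")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    for code_point, name in non_ascii_chars(DUMP):
        print(code_point, name)
    # The same bytes read one at a time through a legacy code page
    # (cp850 is an assumption about the hex editor's text column).
    print(DUMP.decode("cp850"))
```

Each three-byte group decodes to a single space-like character, which would be consistent with some editors turning it into a plain space when saving as ASCII.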

Quote
utf8 text files are plain like ascii, so, raw txt.

So let's test this theory and see how it plays out by copying the UTF-8 sample into some text editors and programming tools and saving it as ASCII source code.

Notepad converts the chars to space:
Code: [Select]
.data

  dd 10

.code
   :
dec  
jnz    
printf("%d\n",   )
exit(0)

end    

Notepad++ keeps the UTF-8 chars:
Code: [Select]
.data

   dd 10

.code
   :
dec   
jnz    
printf("%d\n",   )
exit(0)

end    

Doszip converts to space:
Code: [Select]
dd 10

.code
   :
dec
jnz
printf("%d\n", )
exit(0)

end

Pelles C for Windows (poide64.exe in this case) does the same:
Code: [Select]
dd 10

.code
   :
dec
jnz
printf("%d\n", )
exit(0)

end

Qedit does this:
Code: [Select]
?? dd 10

.code
???:
dec ??
jnz ???
printf("%d\n", ??)
exit(0)

end ???

TWell

  • Member
  • ****
  • Posts: 748
« Reply #107 on: May 14, 2017, 11:27:29 PM »
Quote
No assemblers or compilers understand UTF-8.
Not true; many C compilers understand UTF-8, for example Pelles C, gcc, and clang.
Quote
Pelles C for Windows (poide64.exe in this case) does the same
Only if someone doesn't select the UTF-8 file format before saving, as ANSI is the default format.

Maybe this kind of thing needs a new topic.

nidud

« Reply #108 on: May 15, 2017, 12:17:04 AM »
Quote
No assemblers or compilers understand UTF-8.
Not true; many C compilers understand UTF-8, for example Pelles C, gcc, and clang.
Quote
Pelles C for Windows (poide64.exe in this case) does the same
Only if someone doesn't select the UTF-8 file format before saving, as ANSI is the default format.

 :biggrin:

Nonsense, no programming tool understands UTF-8. They may understand the file format and use it in strings and comments (as explained ad nostrum), but you can't use UTF-8 for programming, except with the Asmc Macro Assembler Version 2.24G-mineiro, that is.

Code: [Select]
#include <stdio.h>

char início[] = "início";

int main(void)
{
    printf("%s\n", início);

    return 0;
}

Quote
Maybe this kind of thing needs a new topic.

Yep, that will fix it  :t

jj2007

  • Member
  • *****
  • Posts: 7734
  • Assembler is fun ;-)
    • MasmBasic
« Reply #109 on: May 15, 2017, 02:33:36 AM »
Nonsense, no programming tool understands UTF-8

Code: [Select]
include \masm32\include\masm32rt.inc
.code
start:
  tmp$ CATSTR <print cfm$("This code was assembled with >, @Environ(oAssembler), <, an incredibly old programming tool\n")>
  tmp$
  print "This assembler does not 'understand' UTF-8, but it can use it", 13, 10
  print "Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser", 13, 10
  print "Este montador não 'entende' UTF-8, mas pode usá-lo", 13, 10
  inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10
  exit
end start

Output:
Code: [Select]
This code was assembled with MLv614, an incredibly old programming tool
This assembler does not 'understand' UTF-8, but it can use it
Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser
Este montador não 'entende' UTF-8, mas pode usá-lo
Этот ассемблер не 'понимает' UTF-8, но может его использовать

Quote
as explained ad nostrum

It's 'ad nauseam'.

nidud

« Reply #110 on: May 15, 2017, 03:09:25 AM »
It's 'ad nauseam'.

 :biggrin:

Quote
Used to refer to the fact that something has been done or repeated so often that it has become annoying or tiresome

Nonsense, no programming tool understands UTF-8. They may understand the file format and use it in strings and comments (as explained ad nostrum)

nidud

« Reply #111 on: May 15, 2017, 03:31:52 AM »
Some thoughts about UTF-8.

What do we need UTF-8 for in source code?

To handle "quoted text" not defined in the Basic Latin (ASCII) range (0000-007F) or the Latin-1 Supplement range (0080-00FF) for Windows applications.

Do you normally use UTF-8 "quoted text" declarations in source code?

No. They are normally declared in resource (RC) files.

What about console/text based applications?

The content of the ASCII chart may then change according to location, so this depends on the code page setup for each location. The source may then use "Cyrillic ASCII characters" in the range (0000-00FF) for use in both console and Windows applications. The latter may need to be converted from ASCII to Unicode in the same way as English/Latin.

Example:
http://masm32.com/board/index.php?topic=6221.msg66164#msg66164

The ASCII view from an external/local code page:
http://masm32.com/board/index.php?topic=6221.msg66192#msg66192

The Asmc ASCII to Unicode method:
http://masm32.com/board/index.php?topic=6221.msg66208#msg66208

Should assemblers be able to detect the UTF-8 header (BOM)?

Yes. Maybe also flip (in case of Asmc) the code page to 65001 if detected.
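
Detecting the header is cheap. A minimal Python sketch of the check, not Asmc's actual code; `split_utf8_bom` and `pick_code_page` are hypothetical names, and the fallback code page 1252 is only an assumed default:

```python
import codecs


def split_utf8_bom(data: bytes):
    """Return (has_bom, payload) with the 3-byte UTF-8 BOM (EF BB BF) removed."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


def pick_code_page(data: bytes, default: int = 1252) -> int:
    """Choose 65001 (the Windows code page number for UTF-8) when a BOM is present."""
    has_bom, _ = split_utf8_bom(data)
    return 65001 if has_bom else default
```

A file without the header falls back to the default, so a check like this only helps with editors that actually write the BOM.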

jj2007

« Reply #112 on: May 15, 2017, 03:38:43 AM »
Do you normally use UTF-8 "quoted text" declarations in source code?

No. They are normally declared in resource (RC) files.

You can stick to your clumsy old tools and habits, no problem.

I use
Code: [Select]
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10
because I live in the 21st Century.

nidud

« Reply #113 on: May 15, 2017, 03:49:16 AM »
I use
Code: [Select]
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10
because I live in the 21st Century.

 :biggrin:

Silly me, I thought you lived in Italy and used a Latin keyboard.

HSE

  • Member
  • ****
  • Posts: 552
  • <AMD>< 7-32>
« Reply #114 on: May 15, 2017, 04:04:40 AM »

I use
Code: [Select]
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10

 With such self-explanatory programs, no help is needed in Italy. :biggrin: 

mineiro

« Reply #115 on: May 15, 2017, 07:22:09 AM »
I'm back  :biggrin:

What do we need UTF-8 for in source code?
We need it if the user's language cannot fit in 256 single-byte character codes.
We also need it when users have written their source code under some code page, want to exchange it with other people around the world, and do not want those people to see garbage on screen when they open it. With UTF-8 (unique chars), what you send is exactly what others get; with a code page, what you send is only recognized by others once they switch to the same code page.
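
The garbage effect is easy to reproduce. A minimal Python sketch, assuming a sender who saves under code page 1252 and a reader whose system uses code page 866 (both picked only for illustration); `view_as` is a hypothetical helper:

```python
def view_as(raw: bytes, encoding: str) -> str:
    """Show raw source bytes the way a tool assuming `encoding` would."""
    return raw.decode(encoding, errors="replace")


LINE = 'início db "início"'

# Sender saves with a legacy code page; reader uses another one.
legacy_bytes = LINE.encode("cp1252")
seen_by_reader = view_as(legacy_bytes, "cp866")

# Sender saves as UTF-8; any reader that decodes UTF-8 gets the same text.
utf8_bytes = LINE.encode("utf-8")
seen_as_utf8 = view_as(utf8_bytes, "utf-8")

if __name__ == "__main__":
    print(seen_by_reader)  # differs from LINE: the í byte maps to another letter
    print(seen_as_utf8)    # identical to LINE
```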
Quote
Do you normally use UTF-8 "quoted text" declarations in source code?
I'm writing a translator, and I have started my attack on 'quoted text'. There are only 3 contexts or scopes: strings, code (variables, ...), and comments, whose chars are ignored.
This way I can create a translator that is useful with any assembler in the world, including ones forgotten in the MS-DOS world like TMA.
ሴ db "ሴ"
Given the line above, I can translate the variable name ሴ to ASCII hex: the UTF-8 encoding is unique, so the identifiers stay unique from the assembler's point of view. Now we are dressing UTF-8 as hex ASCII:
utf8_e188b4 db "ሴ"
In another pass, I will handle the string scope:
utf8_e188b4 db 0e1h,88h,0b4h        ;u1234
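
The two passes above can be sketched in a few lines. This is a hedged Python illustration of the idea, not the translator itself; the function names and the one-line `name db "text"` pattern are assumptions made for the example:

```python
import re


def mangle_identifier(name: str) -> str:
    """Replace a non-ASCII identifier with utf8_ plus the hex of its UTF-8 bytes."""
    if name.isascii():
        return name
    return "utf8_" + name.encode("utf-8").hex()


def string_to_db_bytes(text: str) -> str:
    """Render a string as a MASM byte list; hex literals get a leading 0 if needed."""
    parts = []
    for byte in text.encode("utf-8"):
        literal = f"{byte:02x}h"
        if literal[0] in "abcdef":  # MASM hex numbers must start with a digit
            literal = "0" + literal
        parts.append(literal)
    return ",".join(parts)


# Assumed input shape for this sketch: <name> db "<text>"
DB_LINE = re.compile(r'^(\S+)\s+db\s+"([^"]*)"\s*$')


def translate_line(line: str) -> str:
    """Translate one `name db "text"` line; leave any other line untouched."""
    match = DB_LINE.match(line)
    if match is None:
        return line
    name, text = match.groups()
    return f"{mangle_identifier(name)} db {string_to_db_bytes(text)}"
```

Because the output contains only chars <= 7Fh, it could be fed to an assembler that knows nothing about UTF-8.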
Quote
What about console/text based applications?
In a Linux terminal (the counterpart of the Windows console) we are able to work with UTF-8 and Unicode chars. When we run 'cat utf8.txt' (the counterpart of 'type utf8.txt'), we do not see garbage on screen.
Quote
Should assemblers be able to detect the UTF-8 header (BOM)?
This is optional, but checking for the UTF-8 mark and the Unicode mark is good, also for future options like UTF-16, ... . But keep in mind that Linux editors do not insert a BOM into files; in my tests I have seen it only with Notepad (Windows universe).

The real point, which I can now see more clearly, is:
We have a lot of transformation functions that operate on numbers: hex2ascii, bin2ascii, hexascii2hex, ... . We do not have transformation functions that deal with chars, functions like utf8_2_unicode, unicode_2_utf8, utf8_2_asciihex, ... .
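
For illustration, the character functions named above could look like this. A minimal Python sketch, assuming that 'unicode' here means UTF-16 little-endian as used by Windows; the function names come from the list above, while their exact behaviour is an assumption:

```python
def utf8_2_unicode(data: bytes) -> bytes:
    """UTF-8 bytes to UTF-16LE bytes (no BOM added)."""
    return data.decode("utf-8").encode("utf-16-le")


def unicode_2_utf8(data: bytes) -> bytes:
    """UTF-16LE bytes back to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")


def utf8_2_asciihex(data: bytes) -> str:
    """UTF-8 bytes to ASCII hex text, after checking the input is valid UTF-8."""
    data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return data.hex()
```

For example, the three UTF-8 bytes of ሴ go in and the two UTF-16 bytes 34h,12h come out, and the round trip gives the original bytes back.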

So, why not create an assembler that stays useful ad infinitum? As you can see, UTF-8 is just a dress for chars; it will not change any assembler syntax, because all the assemblers I have found only use chars <= 7Fh from the ASCII table.

nidud

« Reply #116 on: May 15, 2017, 08:57:35 AM »
Quote
We do not have transformation functions that deal with chars, functions like utf8_2_unicode, unicode_2_utf8, utf8_2_asciihex, ... .

Unicode and Character Set Functions:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374085(v=vs.85).aspx

ASCII to Unicode using code page example:
http://masm32.com/board/index.php?topic=6221.msg66164#msg66164
http://masm32.com/board/index.php?topic=6221.msg66208#msg66208

UTF-8 to Unicode example:
http://masm32.com/board/index.php?topic=6221.msg66308#msg66308

Unicode to UTF-8 function:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx