Re: How to generate an Unicode string under MASM 6.15?

mineiro · May 14, 2017, 12:00:14 PM

Quote
My suggestion was to use a word processor to write documents and programming tools for programming.

Can you tell me what tools you are using for programming? An assembler that does not understand UTF-8 symbols, only ASCII? A text editor that does not understand UTF-8, only ASCII? A debugger that cannot display UTF-8 symbols? A hex editor that cannot show UTF-8 symbols?

UTF-8 text files are plain text, just like ASCII ones: raw .txt.

nidud · May 14, 2017, 10:14:59 PM

deleted

TWell · May 14, 2017, 11:27:29 PM

Quote
No assemblers or compilers understand UTF-8.

Not true: many C compilers understand UTF-8, for example Pelles C, gcc and clang.

Quote
Pelles C for Windows (poide64.exe in this case) do the same

That happens if you don't select UTF-8 as the file format before saving, because ANSI is the default format.

Maybe this kind of thing needs a new topic.

nidud · May 15, 2017, 12:17:04 AM

deleted

jj2007 · May 15, 2017, 02:33:36 AM

Quote from: nidud on May 15, 2017, 12:17:04 AM
Nonsense, no programming tools understands UTF-8

include \masm32\include\masm32rt.inc
.code
start:
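  ; Build a print statement at assembly time; @Environ(oAssembler) expands to
  ; the value of the oAssembler environment variable (the assembler's name here)
  tmp$ CATSTR <print cfm$("This code was assembled with >, @Environ(oAssembler), <, an incredibly old programming tool\n")>
  ; Expand the text macro built above
  tmp$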
  print "This assembler does not 'understand' UTF-8, but it can use it", 13, 10
  print "Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser", 13, 10
  print "Este montador não 'entende' UTF-8, mas pode usá-lo", 13, 10
  inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10
  exit
end start

Output:

This code was assembled with MLv614, an incredibly old programming tool
This assembler does not 'understand' UTF-8, but it can use it
Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser
Este montador não 'entende' UTF-8, mas pode usá-lo
Этот ассемблер не 'понимает' UTF-8, но может его использовать

Quote
as explained ad nostrum

It's 'ad nauseam'.

nidud · May 15, 2017, 03:09:25 AM

deleted

nidud · May 15, 2017, 03:31:52 AM

deleted

jj2007 · May 15, 2017, 03:38:43 AM

Quote from: nidud on May 15, 2017, 03:31:52 AM
Do you normally use UTF-8 "quoted text" declarations in source code?

No. They are normally declared in resource (RC) files.

You can stick to your clumsy old tools and habits, no problem.

I use

inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10

because I live in the 21st Century.

nidud · May 15, 2017, 03:49:16 AM

deleted

HSE · May 15, 2017, 04:04:40 AM

Quote from: jj2007 on May 15, 2017, 03:38:43 AM

I use
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10

With such self-explanatory programs, no help is needed in Italy.

mineiro · May 15, 2017, 07:22:09 AM

I'm back

Quote from: nidud on May 15, 2017, 03:31:52 AM
What do we need UTF-8 for in source code?

We need it when the user's language does not fit in the 256 values of a single-byte (extended ASCII) code page.
If users have written their source code using some code page and want to exchange it with other people around the world, they don't want others to open that source code and see garbage on screen. With UTF-8 (unique chars), what you send is exactly what others get, instead of being readable only after they switch to the same code page.
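
To illustrate the code page problem described above, here is a minimal sketch in Python. The sample text, the two code pages (cp1252 for the sender, cp866 for the receiver) and the helper name `open_with_codepage` are assumptions picked for the example, not something from this thread.

```python
def open_with_codepage(raw: bytes, codepage: str) -> str:
    """Decode bytes the way an editor set to `codepage` would show them."""
    return raw.decode(codepage)

text = "usá-lo"  # sample text with one non-ASCII character

# Case 1: saved in one single-byte code page, opened in another one.
saved_cp1252 = text.encode("cp1252")
seen_cp866 = open_with_codepage(saved_cp1252, "cp866")

# Case 2: saved as UTF-8 and opened as UTF-8.
saved_utf8 = text.encode("utf-8")
seen_utf8 = open_with_codepage(saved_utf8, "utf-8")

print(repr(seen_cp866))  # differs from the original text
print(repr(seen_utf8))   # identical to the original text
```

The second case works only because both sides agree on UTF-8; the gain is that there is a single encoding to agree on instead of one code page per language.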

Quote
Do you normally use UTF-8 "quoted text" declarations in source code?

I'm writing a translator and have started my attack on 'quoted text'. There can be only 3 contexts or scopes: strings, code (variables, ...) and comments, whose chars are ignored.
This way I can create a translator that is useful for any assembler in the world, including those forgotten in the MS-DOS world, like TMA.
ሴ db "ሴ"
In the scenario above, I can translate the variable name ሴ to ASCII hex, because the UTF-8 encoding of each char is unique, which gives unique identifiers from the assembler's point of view. Now we are dressing UTF-8 as hex ASCII:
utf8_e188b4 db "ሴ"
In another pass, I will handle the string scope.
utf8_e188b4 db 0e1h,88h,0b4h ; U+1234
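
The two passes above can be sketched roughly like this in Python. This is only an illustration under assumptions: the function names `mangle_identifier`, `masm_byte`, `string_to_db` and `translate_line` are hypothetical, and a real translator would need a tokenizer to tell strings, code and comments apart. Note that MASM hex literals must start with a decimal digit, so the byte B4h is written 0b4h.

```python
def mangle_identifier(name: str) -> str:
    """Pass 1: replace a non-ASCII identifier by 'utf8_' + hex of its UTF-8 bytes."""
    if name.isascii():
        return name  # plain ASCII names are left alone
    return "utf8_" + name.encode("utf-8").hex()

def masm_byte(value: int) -> str:
    """Format one byte as a MASM hex literal (leading 0 if it starts with a-f)."""
    digits = f"{value:02x}"
    if digits[0] in "abcdef":
        digits = "0" + digits
    return digits + "h"

def string_to_db(text: str) -> str:
    """Pass 2: turn a quoted string into a comma-separated list of byte literals."""
    return ",".join(masm_byte(b) for b in text.encode("utf-8"))

def translate_line(name: str, text: str) -> str:
    """Translate a line of the form: name db "text"."""
    return f"{mangle_identifier(name)} db {string_to_db(text)}"

print(translate_line("ሴ", "ሴ"))
```

Because the mangled name is derived from the UTF-8 bytes themselves, two different identifiers can never collide after translation.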

Quote
What about console/text based applications?

On a Linux terminal (or the Windows console) we can work with UTF-8 and Unicode chars. When we run 'cat utf8.txt' (or 'type utf8.txt' on Windows), we do not see garbage on screen.

Quote
Should assemblers be able to detect the UTF-8 header (BOM)?

This is optional, but checking for the UTF-8 mark or another Unicode mark is good, also for future options like UTF-16, ... . Keep in mind, though, that Linux editors do not insert a BOM into files; in my tests I have seen one only with Notepad (Windows universe).
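
A BOM check of this kind could look roughly like the Python sketch below. The names `detect_bom` and `read_source` are hypothetical, only the UTF-8 and UTF-16 marks are handled, and falling back to UTF-8 when no BOM is present is an assumption of the example.

```python
import codecs

# Byte order marks the sketch knows about, paired with a codec name.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return (codec name, BOM length), or (None, 0) if no known BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def read_source(data: bytes) -> str:
    """Strip an optional BOM and decode; files without a BOM are assumed UTF-8."""
    name, skip = detect_bom(data)
    return data[skip:].decode(name or "utf-8")

print(detect_bom(codecs.BOM_UTF8 + b".code"))
print(detect_bom(b".code"))
```

Treating the BOM as optional keeps both worlds working: Notepad files that carry the mark and Linux files that do not.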

The real point, which I can see more clearly now, is:
We have a lot of transformation functions that operate on numbers: hex2ascii, bin2ascii, hexascii2hex, ... . We do not have transformation functions that deal with chars, such as utf8_2_unicode, unicode_2_utf8, utf8_2_asciihex, ... .
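
As a rough idea of what such char functions do, here is a Python sketch that reuses the names from the paragraph above. The signatures are assumptions: 'unicode' is taken to mean a Unicode code point (if UTF-16 is meant, the conversion differs), and the hand-written encoder does not reject surrogates or values above 10FFFFh.

```python
def unicode_2_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8 by hand (1 to 4 bytes)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf8_2_unicode(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of code points (uses Python's own codec)."""
    return [ord(ch) for ch in data.decode("utf-8")]

def utf8_2_asciihex(data: bytes) -> str:
    """Render UTF-8 bytes as lowercase ASCII hex digits."""
    return data.hex()

encoded = unicode_2_utf8(0x1234)
print(utf8_2_asciihex(encoded), utf8_2_unicode(encoded))
```

In an assembler library the same bit shifts and masks would be applied to registers; the Python version only shows the logic.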

So, why not create an assembler that stays useful ad infinitum? As you can see, UTF-8 is just a dress for chars; it will not change any assembler syntax, because every assembler I have found uses only chars <= 7Fh of the ASCII table for its syntax.
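
The reason this works is that every byte of a multi-byte UTF-8 sequence is 80h or above, so the bytes <= 7Fh that a byte-oriented assembler looks at are never produced by a non-ASCII char. A small Python sketch of that property follows; the helper name `ascii_skeleton` is hypothetical.

```python
def ascii_skeleton(data: bytes) -> bytes:
    """Keep only the bytes <= 7Fh, i.e. what plain ASCII syntax rules can match."""
    return bytes(b for b in data if b < 0x80)

line = 'ሴ db "ሴ"'
raw = line.encode("utf-8")

# The directive, the spaces and the quotes survive untouched;
# the UTF-8 bytes of the non-ASCII char are all >= 80h and drop out.
print(ascii_skeleton(raw))
print(all(b >= 0x80 for b in "ሴ".encode("utf-8")))
```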

nidud · May 15, 2017, 08:57:35 AM

deleted

The MASM Forum
