Re: How to generate an Unicode string under MASM 6.15?

Started by nidud, May 05, 2017, 11:15:13 PM


mineiro

Quote
My suggestion was to use a word processor to write documents and programming tools for programming.

Can you tell me which tools you are using for programming? An assembler that understands only ASCII and not UTF-8 symbols? A text editor that understands only ASCII and not UTF-8? A debugger that cannot display UTF-8 symbols? A hex editor that cannot show UTF-8 symbols?

UTF-8 text files are plain text, just like ASCII files: raw txt.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

nidud

#106
deleted

TWell

Quote
No assemblers or compilers understand UTF-8.
Not true, many C compilers understand UTF-8, for example PellesC, gcc, clang.
Quote
Pelles C for Windows (poide64.exe in this case) do the same
That happens if someone doesn't select UTF-8 as the file format before saving, because ANSI is the default format.

Maybe this kind of thing needs a new topic.

nidud

#108
deleted

jj2007

Quote from: nidud on May 15, 2017, 12:17:04 AM
Nonsense, no programming tools understands UTF-8

include \masm32\include\masm32rt.inc
.code
start:
  tmp$ CATSTR <print cfm$("This code was assembled with >, @Environ(oAssembler), <, an incredibly old programming tool\n")>
  tmp$
  print "This assembler does not 'understand' UTF-8, but it can use it", 13, 10
  print "Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser", 13, 10
  print "Este montador não 'entende' UTF-8, mas pode usá-lo", 13, 10
  inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10
  exit
end start


Output:
This code was assembled with MLv614, an incredibly old programming tool
This assembler does not 'understand' UTF-8, but it can use it
Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser
Este montador não 'entende' UTF-8, mas pode usá-lo
Этот ассемблер не 'понимает' UTF-8, но может его использовать


Quote
as explained ad nostrum

It's 'ad nauseam'.

nidud

#110
deleted

nidud

#111
deleted

jj2007

Quote from: nidud on May 15, 2017, 03:31:52 AM
Do you normally use UTF-8 "quoted text" declarations in source code?

No. They are normally declared in resource (RC) files.

You can stick to your clumsy old tools and habits, no problem.

I use
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10

because I live in the 21st Century.

nidud

#113
deleted

HSE

Quote from: jj2007 on May 15, 2017, 03:38:43 AM

I use
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10

With such self-explanatory programs, no help is needed in Italy. :biggrin:
Equations in Assembly: SmplMath

mineiro

I'm back  :biggrin:

Quote from: nidud on May 15, 2017, 03:31:52 AM
What do we need UTF-8 for in source code?
We need it when the user's language cannot fit into the 256 values of a single-byte code page.
Suppose a user has written source code using some code page and wants to exchange it with other people around the world, without them seeing garbage on screen when they open it. With UTF-8, where every character has a unique encoding, what you send is exactly what others get. With a code page, what you send is only readable by others once they switch to the same code page.
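The code page problem described above can be reproduced in a few lines. This is a minimal Python sketch rather than assembly. The two code pages (cp1251 for the sender, cp1252 for the reader) and the sample word are assumptions picked purely for illustration.

```python
# Sample word typed by the sender (illustrative choice).
text = "Этот"

# Case 1: the file is saved in a single-byte code page (cp1251 assumed).
legacy_bytes = text.encode("cp1251")
# A reader whose system assumes cp1252 gets different characters
# for exactly the same bytes.
garbled = legacy_bytes.decode("cp1252")

# Case 2: the file is saved as UTF-8; every reader decodes the same text.
utf8_bytes = text.encode("utf-8")
restored = utf8_bytes.decode("utf-8")

print(garbled)
print(restored)
```

In the first case the reader sees accented Latin letters instead of Cyrillic. In the second case the text survives unchanged, at the cost of two bytes per Cyrillic letter.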
Quote
Do you normally use UTF-8 "quoted text" declarations in source code?
I'm writing a translator, and I have started my attack on 'quoted text'. I only need 3 contexts or scopes: one for string construction, one for code construction (variables, ...) and one for comments, whose characters are ignored.
This way I can create a translator that is useful for any assembler in the world, including those forgotten in the MS-DOS world, like TMA.
ሴ db "ሴ"
Given the line above, I can translate the variable name ሴ to ASCII hex. Because the UTF-8 encoding of each character is unique, the resulting identifiers are unique from the assembler's point of view. Now we dress the UTF-8 name as hex ASCII:
utf8_e188b4 db "ሴ"
On another pass, I handle the string construction scope:
utf8_e188b4 db 0e1h,88h,0b4h        ;u1234
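The two passes above can be sketched as follows. This is Python, not assembly, and it is only a rough illustration: the function names and the simplified `name db "text"` line pattern are hypothetical, and a real translator would also need the comment scope and proper tokenizing.

```python
import re

def ident_to_ascii(name: str) -> str:
    """Map an identifier to an ASCII-only name; pure ASCII names pass through."""
    if name.isascii():
        return name
    return "utf8_" + name.encode("utf-8").hex()

def string_to_db_bytes(text: str) -> str:
    """Render a string as comma-separated MASM-style hex byte literals."""
    parts = []
    for b in text.encode("utf-8"):
        lit = f"{b:02x}h"
        # A hex literal must start with a decimal digit, so prefix a-f with 0.
        if lit[0] in "abcdef":
            lit = "0" + lit
        parts.append(lit)
    return ",".join(parts)

def translate_db_line(line: str) -> str:
    """Translate one line of the simplified form: <name> db "<text>"."""
    m = re.fullmatch(r'\s*(\S+)\s+db\s+"([^"]*)"\s*', line)
    if m is None:
        return line  # anything else is left untouched in this sketch
    name, text = m.groups()
    return f"{ident_to_ascii(name)} db {string_to_db_bytes(text)}"

print(translate_db_line('ሴ db "ሴ"'))
```

For the example line this prints `utf8_e188b4 db 0e1h,88h,0b4h`. Byte values that start with a letter get a leading zero so the assembler does not read them as identifiers.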
Quote
What about console/text based applications?
On a Linux terminal (or the Windows console) we can work with UTF-8 and Unicode characters. When we 'cat utf8.txt' (type utf8.txt) we do not see garbage on screen.
Quote
Should assemblers be able to detect the UTF-8 header (BOM)?
This is optional, but checking for the UTF-8 mark and the Unicode mark is a good idea, also with future options like UTF-16 in mind. Keep in mind, though, that Linux editors do not insert a BOM into files; in my tests I have seen it only with Notepad (Windows universe).
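A BOM check of the kind discussed here could look like the sketch below (Python, for illustration only). The function name `detect_bom` is made up. The BOM constants come from Python's standard `codecs` module. A file without a BOM simply returns no encoding, which matches the Linux editor case.

```python
import codecs

# Longer marks are listed first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# A Notepad-style UTF-8 file starts with EF BB BF; skip it before parsing.
source = b"\xef\xbb\xbf.code\r\n"
encoding, skip = detect_bom(source)
print(encoding, source[skip:])
```

This is a heuristic: a tool that reads source files would skip `skip` bytes and then treat the rest as usual.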

The real point, which I now see more clearly, is:
We have a lot of transformation functions that operate on numbers: hex2ascii, bin2ascii, hexascii2hex, ... . We do not have transformation functions that deal with characters, functions like utf8_2_unicode, unicode_2_utf8, utf8_2_asciihex, ... .
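As a rough idea of what such character functions would do, here is a Python sketch using the names suggested above. It assumes that "unicode" means UTF-16 little-endian, as Windows APIs use the term. An assembly library version would have to implement the bit shuffling by hand.

```python
def utf8_2_unicode(data: bytes) -> bytes:
    """UTF-8 bytes -> UTF-16 LE bytes (assumed meaning of 'unicode')."""
    return data.decode("utf-8").encode("utf-16-le")

def unicode_2_utf8(data: bytes) -> bytes:
    """UTF-16 LE bytes -> UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")

def utf8_2_asciihex(data: bytes) -> str:
    """UTF-8 bytes -> lowercase ASCII hex digits, two per byte."""
    return data.hex()

sample = "ሴ".encode("utf-8")          # bytes E1 88 B4, code point U+1234
print(utf8_2_asciihex(sample))
print(utf8_2_unicode(sample).hex())
```

The first line printed is `e188b4`, the second is `3412`, which is the word 1234h stored low byte first.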

So, why not create an assembler that stays useful ad infinitum? As you can see, UTF-8 is just a dress for characters. It will not change any assembler's syntax, because all the assemblers I have looked at use only characters <= 7fh of the ASCII table.
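The reason this works is that UTF-8 encodes every non-ASCII character using only bytes in the range 80h to FFh, so those bytes can never be mistaken for quotes, commas or other syntax characters. A small Python sketch to check this on the example line (the helper name is made up):

```python
def ascii_positions(data: bytes):
    """Offsets of bytes <= 7Fh, the only ones an ASCII-based syntax inspects."""
    return [i for i, b in enumerate(data) if b <= 0x7F]

line = 'ሴ db "ሴ"'.encode("utf-8")

# Keep only the bytes that an ASCII-only tokenizer would recognize.
syntax_only = bytes(line[i] for i in ascii_positions(line))
print(syntax_only)

# Every byte of the encoded non-ASCII character is 80h or above.
print(all(b >= 0x80 for b in "ሴ".encode("utf-8")))
```

What remains after filtering is the directive and the two quote marks; the UTF-8 bytes are opaque payload that a tool can pass through untouched.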

nidud

#116
deleted