Re: How to generate an Unicode string under MASM 6.15?

Started by nidud, May 05, 2017, 11:15:13 PM

nidud

#30
deleted

aw27

Sign of the times: the DOS prompt (OK, let's call it the command prompt) showing multilanguage Unicode.



hutch--

Could you folks minimise the amount of image data being posted as attachments in the forum, it just loads up the server with a mountain of crap.

aw27

Quote from: hutch-- on May 07, 2017, 01:48:41 PM
Could you folks minimise the amount of image data being posted as attachments in the forum, it just loads up the server with a mountain of crap.

I only posted one, not as an attachment, and not crap either.  :biggrin:

jj2007

Quote from: aw27 on May 07, 2017, 01:44:50 PM
Sign of the times: the DOS prompt (OK, let's call it the command prompt) showing multilanguage Unicode.

For me, only Russian works in the console. But some years ago I used to see Chinese and Arabic in the console, too, and I have no idea why that ceased to work. I've tried chcp 65001 and 65000, using Lucida Console and Consolas, etc, no success :(

aw27

Quote from: jj2007 on May 07, 2017, 06:18:55 PM
For me, only Russian works in the console. But some years ago I used to see Chinese and Arabic in the console, too, and I have no idea why that ceased to work. I've tried chcp 65001 and 65000, using Lucida Console and Consolas, etc, no success :(
I tested in Windows 10 using the NSimSun or Gothic fonts. With the Windows 7 default fonts only Russian works, but I think you can add fonts to the command prompt.

jj2007

Quote from: hutch-- on May 07, 2017, 01:48:41 PM
Could you folks minimise the amount of image data being posted as attachments

Point taken, Hutch. Most of my images are stored on my private site, though, and what I attach here is usually not that big.

Here is a nice one by Joel:

Quote
if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will.
...
Also they were liberal hippies in California who wanted to conserve (sneer). If they were Texans they wouldn't have minded guzzling twice the number of bytes. But those Californian wimps couldn't bear the idea of doubling the amount of storage it took for strings, and anyway, there were already all these doggone documents out there using various ANSI and DBCS character sets and who's going to convert them all? Moi? For this reason alone most people decided to ignore Unicode for several years and in the meantime things got worse.

Thus was invented the brilliant concept of UTF-8 ... which has the nice property of also working respectably if you have the happy coincidence of English text and braindead programs that are completely unaware that there is anything other than ASCII.

With "braindead programs" he surely means ML.exe :P

TWell

Quote from: hutch-- on May 07, 2017, 01:48:41 PM
Could you folks minimise the amount of image data being posted as attachments in the forum, it just loads up the server with a mountain of crap.
OK.
I'll start removing my pictures, and maybe later the whole profile  ;)

aw27

Quote from: jj2007 on May 07, 2017, 07:27:15 PM
With "braindead programs" he surely means ML.exe :P
Rewording: UTF-8 allows programs built by code page 437 developers to stay alive in a changing world.

mineiro

A common stumbling block for foreigners learning to program is labels. From a Portuguese speaker's point of view, we can't create labels like "início" or "começo", which both simply mean "start". Students don't understand why they have to write them without accents, cedillas, ... .
This is hard because we can't create function names that way either.
I don't think this will change for a long time, so plain English it is.
It would be nice if we could create function names, labels and variable names (not only data) using Unicode/UTF-8/..., so that within the same source code we could call a function written in Cyrillic, or Arabic, ... . The hard part would be how to type those symbols on a keyboard.
Well, I haven't found a good solution to this; even if it works, the O.S. may well reject such a name.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

aw27

Quote from: mineiro on May 07, 2017, 10:35:08 PM
A common stumbling block for foreigners learning to program is labels.

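Some compilers deal well with UTF-8.
#include <tchar.h>   // _tmain and _TCHAR (Microsoft-specific)

// The source file itself must be saved in a Unicode encoding the compiler can read.
int Прощай()
{
   return 12;
}
int _tmain(int argc, _TCHAR* argv[])
{
   int início =  Прощай();   // accented Latin identifier
   int 好 = início;           // CJK identifier

   return 0;
}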

No assemblers, though.

mineiro

Thanks for the reply, sir aw27.
The only language I can program in is assembly. I have tried C for Dummies and Python too, but I learned more by symbiosis with source code, so I only possess noob knowledge of other languages.
Good to know that some languages accept UTF-8. I was reading about it and it doesn't sound very difficult to implement. One symbol on screen can take 1, 2, 3 or 4 bytes in memory, and the ASCII symbols are absorbed unchanged as the 1-byte case.
https://tools.ietf.org/html/rfc3629
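
To make the "1, 2, 3 or 4 bytes" point concrete, here is a minimal sketch of a UTF-8 encoder in C following the bit layout in RFC 3629. The function name utf8_encode and its return convention (0 for a value that cannot be encoded) are assumptions made for this illustration, not something taken from any assembler or library discussed in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (bit layout per RFC 3629).
   Writes 1..4 bytes into out and returns how many were written.
   Returns 0 for surrogates (U+D800..U+DFFF) and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                 /* ASCII is stored unchanged in one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* surrogates are not valid in UTF-8 */
    if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, 'A' (U+0041) stays a single byte 41, 'í' (U+00ED) becomes the two bytes C3 AD, and '好' (U+597D) becomes the three bytes E5 A5 BD.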

A bit off-topic, but another thing is the 'new line' in source code. On Windows it is CRLF, on Linux only LF. A lot of assemblers fail here because they only look for CRLF.
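
As a rough illustration of how a source reader could accept both conventions, the sketch below finds the end of a line by looking for LF and then drops a preceding CR if there is one. The name line_length and its interface are hypothetical, chosen only for this example; no particular assembler is known to work this way.

```c
#include <stddef.h>

/* Returns the length of the line starting at s, excluding its terminator,
   and stores in *next the offset at which the following line begins.
   Both "\n" (Linux) and "\r\n" (Windows) are accepted as terminators. */
size_t line_length(const char *s, size_t len, size_t *next)
{
    size_t i = 0;

    while (i < len && s[i] != '\n')      /* scan up to the LF, if any */
        i++;

    *next = (i < len) ? i + 1 : len;     /* step over the LF when present */

    if (i < len && i > 0 && s[i - 1] == '\r')
        i--;                             /* CRLF: the CR is not part of the line */

    return i;
}
```

Treating LF as the real terminator and CR as optional padding in front of it means the same loop handles files saved on either system.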

nidud

#43
deleted

mineiro

Hello sir nidud, thanks for the reply.

Quote from: nidud on May 07, 2017, 11:51:11 PM
The Latin characters are not restricted to the English language  ;)
Yes, I agree. The original Latin symbols don't have accents, cedillas, ... . Those were added by French, Spanish, Portuguese, Italian, German, maybe Romanian, I'm not sure...
When we turn the computer on, the symbols we see on screen are just data stored in the BIOS to be put on screen. One chip manufacturer created its own symbol table, and others did the same. This is why ASCII was created: to be a common default among those symbol tables (data).

Quote
You don't need Unicode/UTF for that: That's not the problem.
I was thinking about:
início:
The letter 'í' has an accent. With the correct code page set up it shows as above; if another code page is selected, I see Russian symbols or other characters instead, and we are talking about the same hexadecimal representation. I have failed to write that label in source code with some assemblers because they cut the table down to the plain ASCII range, i.e. the first 128 symbols. So I suppose the same can happen with Russian or other languages.
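
The "same hexadecimal representation, different symbol" effect can be sketched in a few lines of C. The example assumes the source was saved in Windows-1252, where 'í' is the byte ED, and compares it with Windows-1251 (Cyrillic); the helper names cp1252_high and cp1251_high are made up for this illustration and only cover bytes E0..FF.

```c
#include <stdint.h>

/* Both helpers expect a byte in the range 0xE0..0xFF. */

/* Windows-1252: in this range the code page coincides with Latin-1,
   so the byte value is also the Unicode code point (0xED -> U+00ED, 'í'). */
uint32_t cp1252_high(unsigned char b)
{
    return b;
}

/* Windows-1251: the same range holds the Cyrillic small letters
   U+0430..U+044F, so 0xED -> U+043D, 'н'. */
uint32_t cp1251_high(unsigned char b)
{
    return 0x0430u + (uint32_t)(b - 0xE0u);
}
```

The byte on disk never changes; only the table used to interpret it does, which is why a tool that accepts bytes above 7F without knowing the code page cannot tell 'í' from 'н'.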

-edited-
I was also thinking about math symbols. Instead of a "call sum" function we could call sigma: "call Σ". That way a student working through a math book could find the same symbol in function form on the computer.