How important is unicode

Started by sinsi, January 15, 2019, 06:48:57 PM


sinsi

For our non-English speaking members, is EN-US good enough or would you prefer your native language?
I am considering moving to all Unicode, but it is a bit more effort. Is the effort justified?

One problem is getting a good translation. Forget Google et al.; it needs to be done by a native speaker, IMHO.

aw27

In my opinion, the people who matter understand enough English and prefer using the original versions, because translations are commonly bad.
However, another sort of problem may arise; one example is access to folders with native-language names.  :(

jj2007

For various reasons, I've invested quite a bit of time in this (see e.g. the discussions with our Norwegian friend). IMHO the answer is not simple. If you are going non-US-EN for a good reason, you already have three choices:
1. the user's Ansi subset
2. Utf16
3. Utf8

Option 1 may often work, but mixed sets as in the MsgBox below aren't possible. And communicating in a multilingual environment (like our forum, or a company) is difficult.

Option 2, full Unicode, has been adopted by M$ for Windows but is not very common anywhere else

Option 3 is most frequently used on the web etc., but in the biggest single software market, China, Utf8 takes more space than Utf16 (typically three bytes per Chinese character instead of two).

My own choice was to keep all options open; for all programming stuff, commands like MsgBox, Print etc use Ansi; for user interaction, there are uXX and wXX versions. Another question is what the IDE supports, of course.
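
To make the trade-offs between the three options concrete, here is a minimal sketch. It is written in Python purely for illustration (the thread itself is about Masm32), the helper name `encodable` and the sample strings are hypothetical, and cp1252/cp1251 merely stand in for "the user's Ansi subset":

```python
# Sketch: a mixed-language string does not fit into one Ansi code page,
# while both Unicode encodings can hold it.

def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

mixed = "Hello, Привет, 你好"

for codec in ("cp1252", "cp1251", "utf-8", "utf-16-le"):
    print(codec, encodable(mixed, codec))

# Size comparison for purely Chinese text (all four characters are in the BMP):
chinese = "你好世界"
utf8_size = len(chinese.encode("utf-8"))       # 3 bytes per character here
utf16_size = len(chinese.encode("utf-16-le"))  # 2 bytes per character here
print(utf8_size, utf16_size)
```

The Cyrillic part alone would fit cp1251 and the English part fits either code page, but no single Ansi code page covers all three scripts at once.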

TimoVJL

If the text comes from resources, UNICODE is a better choice, as it is the native format there?
May the source be with you

jj2007

Sure, you can put it into resources, but why perform such acrobatics if a simple Print "Привет, Мир" does the job? It's a macro assembler after all.

aw27

Always trolling, misinforming and deceiving. As far as I know, Print is a macro that calls some crappy undocumented MasmBasic library function.

jj2007

I am so sorry, it should have been "print" with lowercase:
print "Вы никогда не должны кормить Хосе Паскоа"

Since this is plain Masm32, you must
a) use an advanced IDE to build the source (attached)
b) launch it from a command prompt
c) issue a chcp 65001, then
d) run donotfeed.exe (1024 bytes, this is purest 100% Real Men™ assembly code!)

... and then you see Вы никогда не должны кормить Хосе Паскоа.
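
Why step c) matters can be sketched as follows. This is an illustration in Python, not the attached Masm32 source; it assumes the executable simply writes the UTF-8 bytes of the string to the console, and it uses code page 437 as a stand-in for whatever OEM code page the console had before `chcp 65001`:

```python
# Sketch: the same bytes read as UTF-8 (code page 65001) versus an OEM code page.

text = "Привет, Мир"
utf8_bytes = text.encode("utf-8")

# Each Cyrillic letter takes two bytes in UTF-8, ASCII characters take one.
print(len(text), "characters ->", len(utf8_bytes), "bytes")

# A console interpreting the bytes as UTF-8 recovers the text...
as_utf8 = utf8_bytes.decode("utf-8")
# ...while a console on an OEM code page (assumed: 437) shows mojibake.
as_cp437 = utf8_bytes.decode("cp437")

print(as_utf8)
print(as_cp437)
```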

aw27

Sure, but print is for console output, and it is not usual to have string resources in console applications. This implies that the suggestion was for a Windows application. So, your reply was not helpful.    :(

jj2007

You are confused, José. This is a console application, it doesn't have resources, and it works nonetheless. Take it easy. Open a bottle of good red wine. Listen to Louis Armstrong and relax.

Raistlin

Why are you guys bolding 😁 Ermmmm me too. But let's look again at the problem. UTF8 works a higher percentage of the time. Let me translate.... the world has an international language, which equals ASM.
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...

felipe

 :redface: And i was thinking that unicode was as simple as putting above your includes something like __UNICODE__ equ 1  :idea:

Oh i think i get your point sinsi. Mmm, yes a good translator is needed indeed.  :idea:

jj2007

Quote from: felipe on January 16, 2019, 05:40:10 AM
:redface: And i was thinking that unicode was as simple as putting above your includes something like __UNICODE__ equ 1  :idea:

Maybe? Show me your version of print "Вы никогда не должны кормить Хосе Паскоа" 8)

felipe

well i don't use print yet...or unicode... :biggrin: but maybe someday  :idea:

hutch--

I imagine it has something to do with your target: if you are pointing your software at a Chinese market, you use the text format that best suits the character display you want. If you are a European non-English language speaker, your OS language version will allow you to display text in your native character set. UNICODE is for cross character set display and while it does that job fine, the obvious trap is that it is twice the size.

It is easy enough to process, add 2 versus add 1, and allocate the character count x 2 for memory, but when big files are involved, UNICODE is a problem. If you have to process very large log files, the double size can run you out of memory.
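
The "add 2 versus add 1" and "character count x 2" points can be sketched like this. It is a Python illustration with made-up sample text, where UTF-16 little-endian stands in for what Windows calls UNICODE; note that the x 2 rule only holds for characters in the Basic Multilingual Plane, since anything beyond it needs a surrogate pair (4 bytes):

```python
# Sketch: stepping through an Ansi buffer (1 byte per character) versus
# a UTF-16 buffer (2 bytes per character for BMP text).

text = "error at line 42"

ansi_bytes = text.encode("cp1252")
utf16_bytes = text.encode("utf-16-le")

# "add 1": advance one byte per character
ansi_chars = [ansi_bytes[i:i + 1].decode("cp1252")
              for i in range(0, len(ansi_bytes), 1)]

# "add 2": advance two bytes per character
utf16_chars = [utf16_bytes[i:i + 2].decode("utf-16-le")
               for i in range(0, len(utf16_bytes), 2)]

print(len(ansi_bytes), len(utf16_bytes))  # buffer sizes in bytes

# Outside the BMP the simple x 2 rule breaks: one emoji needs a surrogate pair.
emoji_size = len("😁".encode("utf-16-le"))
print(emoji_size)
```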

jj2007

Quote from: hutch-- on January 16, 2019, 10:56:23 AM
the obvious trap is that it is twice the size

For UTF16, yes. That's why the UTF8 representation of Unicode is so popular. Even the big log files on Chinese servers are mostly in English.
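
A minimal sketch of that size argument, in Python for illustration only; the log line is invented, not taken from any real server:

```python
# Sketch: a mostly-English log line with a short Chinese message,
# measured in UTF-8 and in UTF-16.

log_line = "2019-01-16 10:56:23 ERROR disk full: 磁盘已满"

utf8_size = len(log_line.encode("utf-8"))
utf16_size = len(log_line.encode("utf-16-le"))

print("characters:", len(log_line))
print("UTF-8 bytes:", utf8_size)
print("UTF-16 bytes:", utf16_size)
```

The ASCII part costs one byte per character in UTF-8 and two in UTF-16, so as long as most of the file is English, the three-byte Chinese characters do not outweigh the savings.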