The MASM Forum

General => The Workshop => Topic started by: sinsi on January 15, 2019, 06:48:57 PM

Title: How important is unicode
Post by: sinsi on January 15, 2019, 06:48:57 PM
For our non-English speaking members, is EN-US good enough or would you prefer your native language?
I am considering moving to all unicode but it is a bit more effort, is the effort justified?

One problem is getting a good translation, forget google et al, it needs to be a native speaker imho.
Title: Re: How important is unicode
Post by: aw27 on January 15, 2019, 07:36:01 PM
In my opinion, the people that matter understand enough English and prefer using the original versions because translations are commonly bad.
However, another sort of problem may arise; one example is access to folders with native-language names.  :(
Title: Re: How important is unicode
Post by: jj2007 on January 15, 2019, 08:27:50 PM
For various reasons, I've invested quite a bit of time in this (see e.g. the discussions with our Norwegian friend (http://masm32.com/board/index.php?topic=6275.msg67256#msg67256)). IMHO the answer is not simple. If you are going non-US-EN for a good reason, you already have three choices:
1. the user's Ansi subset
2. Utf16
3. Utf8

Option 1 may often work, but mixed sets as in the MsgBox below aren't possible, and communicating in a multilingual environment (like our forum, or a company) is difficult.

Option 2, full Unicode, has been adopted by M$ for Windows but is not very common anywhere else.

Option 3 is most frequently used on the web etc., but in the biggest single software market, China, Utf8 takes more space than Utf16 (three bytes per Chinese character instead of two).

My own choice was to keep all options open; for all programming stuff, commands like MsgBox, Print etc use Ansi; for user interaction, there are uXX and wXX versions. Another question is what the IDE supports, of course.
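To put rough numbers on options 2 and 3, here is a minimal sketch in Python (used only to count bytes; the sample strings and the helper name `encoded_sizes` are arbitrary illustrations, not taken from any program in this thread):

```python
# Compare how many bytes the same text needs in UTF-8 and in UTF-16.
# The sample strings are illustrative assumptions.
samples = {
    "English": "Hello, World",
    "Russian": "Привет, Мир",
    "Chinese": "你好，世界",
}

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: {len(text)} chars, UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

For these samples, ASCII text is half the size in UTF-8, Cyrillic comes out about equal, and Chinese is 50% larger in UTF-8 than in UTF-16.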
Title: Re: How important is unicode
Post by: TimoVJL on January 15, 2019, 08:48:52 PM
If the text comes from resources, UNICODE is a better choice, as it is the native format there?
Title: Re: How important is unicode
Post by: jj2007 on January 15, 2019, 11:08:39 PM
Sure, you can put it into resources, but why perform such acrobatics if a simple Print "Привет, Мир" does the job? It's a macro assembler after all.
Title: Re: How important is unicode
Post by: aw27 on January 15, 2019, 11:20:02 PM
Always trolling, misinforming and deceiving. As far as I know, Print is a macro that calls some crappy undocumented MasmBasic library function.
Title: Re: How important is unicode
Post by: jj2007 on January 16, 2019, 02:56:00 AM
I am so sorry, it should have been "print" with lowercase:
print "Вы никогда не должны кормить Хосе Паскоа"

Since this is plain Masm32, you must
a) use an advanced IDE to build the source (attached)
b) launch it from a command prompt
c) issue a chcp 65001, then
d) run donotfeed.exe (1024 bytes, this is purest 100% Real Men™ assembly code!)

... and then you see Вы никогда не должны кормить Хосе Паскоа.
Title: Re: How important is unicode
Post by: aw27 on January 16, 2019, 03:54:00 AM
Sure, but print is for console output, and it is not usual to have string resources in console applications. This implies that the suggestion was for a Windows application. So, your reply was not helpful.    :(
Title: Re: How important is unicode
Post by: jj2007 on January 16, 2019, 05:23:30 AM
You are confused, José. This is a console application, it doesn't have resources, and it works nonetheless. Take it easy. Open a bottle of good red wine. Listen to Louis Armstrong and relax. (https://www.youtube.com/watch?v=A3yCcXgbKrE)
Title: Re: How important is unicode
Post by: Raistlin on January 16, 2019, 05:32:32 AM
Why are you guys bolding 😁 Ermmmm me too. But let's look again at the problem. UTF8 works a higher percentage of the time. Let me translate.... the world has an international language, which equals ASM.
Title: Re: How important is unicode
Post by: felipe on January 16, 2019, 05:40:10 AM
 :redface: And i was thinking that unicode was as simple as putting above your includes something like __UNICODE__ equ 1  :idea:

Oh i think i get your point sinsi. Mmm, yes a good translator is needed indeed.  :idea:
Title: Re: How important is unicode
Post by: jj2007 on January 16, 2019, 05:44:13 AM
Quote from: felipe on January 16, 2019, 05:40:10 AM
:redface: And i was thinking that unicode was as simple as putting above your includes something like __UNICODE__ equ 1  :idea:

Maybe? Show me your version of print "Вы никогда не должны кормить Хосе Паскоа" 8)
Title: Re: How important is unicode
Post by: felipe on January 16, 2019, 09:14:21 AM
well i don't use print yet...or unicode... :biggrin: but maybe someday  :idea:
Title: Re: How important is unicode
Post by: hutch-- on January 16, 2019, 10:56:23 AM
I imagine it has something to do with your target: if you are pointing your software at a Chinese market, you use the text format that best suits the character display you want. If you are a European non-English speaker, your OS language version will allow you to display text in your native character set. UNICODE is for cross-character-set display, and while it does that job fine, the obvious trap is that it is twice the size.

It is easy enough to process, add 2 versus add 1, and allocate the character count x 2 for memory, but when big files are involved, UNICODE is a problem. If you have to process very large log files, the double size can run you out of memory.
Title: Re: How important is unicode
Post by: jj2007 on January 16, 2019, 02:13:17 PM
Quote from: hutch-- on January 16, 2019, 10:56:23 AM
the obvious trap is that it is twice the size

For UTF16, yes. That's why the UTF8 representation of Unicode is so popular. Even the big log files on Chinese servers are mostly in English.
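A quick way to check the log-file case is to encode a single line both ways. This is only a sketch in Python, and the log line itself is a made-up example (mostly ASCII, with a two-character Chinese user name):

```python
# Hypothetical log line: mostly ASCII plus a short Chinese user name.
log_line = "2019-01-16 14:13:17 INFO login ok user=张伟 ip=10.0.0.7\n"

utf8_size = len(log_line.encode("utf-8"))
utf16_size = len(log_line.encode("utf-16-le"))

# UTF-16 always needs 2 bytes per character here ("character count x 2"),
# while UTF-8 pays extra only for the two non-ASCII characters.
print(len(log_line), utf8_size, utf16_size)
```

UTF-16 doubles the whole line, whereas UTF-8 adds only two extra bytes for each of the two Chinese characters.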
Title: Re: How important is unicode
Post by: aw27 on January 16, 2019, 05:52:15 PM
Sinsi's question makes sense only because it is a pain to deal with Unicode in Assembly Language. On the other hand, it is almost transparent in all modern high-level languages and is not only the default selection but in most cases the only selection.
It took many years for high-level languages to reach that level of perfection, and there is a whole lot more to it than JJ even conceives. So, it makes sense in many cases, particularly in ASM, to use Unicode strings in a resource file.
"Sure, you can put it into resources, but why perform such acrobatics if a simple Print "Привет, Мир" does the job?" is a ridiculous response, but acceptable if JJ was finishing drinking his usual bottle of wine after lunch while listening to Louis Armstrong.

Title: Re: How important is unicode
Post by: hutch-- on January 16, 2019, 06:21:08 PM
Basically you write a few accessories and UNICODE becomes a lot easier. 32 bit MASM32 has macros that handle the ascii range to UNICODE, but for the full range there is an accessory that handles the full range of UNICODE and produces DB sequences that are placed in the .DATA section. You can get UNICODE strings from a resource file written in UNICODE, but at least at one stage the API was buggy, whereas the embedded data was reliable.
Title: Re: How important is unicode
Post by: hutch-- on January 16, 2019, 06:51:33 PM
> if JJ was finishing drinking its usual bottle of wine after lunch while listening Louis Armstrong.

That sounds perfectly reasonable to me. My choice is a bottle of pure malt while listening to the many brilliantly good female musicians on Youtube.
Title: Re: How important is unicode
Post by: sinsi on January 16, 2019, 08:00:15 PM
My question was for a few reasons.
- I've always shied away from using resources but nowadays with manifests/version blocks/icons you can't really avoid them, so adding a string table is no big deal.
- The Windows API is apparently all unicode, so any A functions are first converted to unicode then back to ansi, how much overhead?
- As hutch said, unicode uses more memory. A resource section can be discarded but memory used in .data is always in use.

Correct me if I'm wrong, but the masm32 unicode macros are really only for converting english to unicode english yes?
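The ANSI-to-Unicode round trip mentioned in the second point can be modelled in a few lines. This is only a sketch in Python, not the Windows implementation: `ansi_to_wide` and `wide_to_ansi` are hypothetical stand-ins for the conversions around an A function, and Python's `errors="replace"` is an assumption standing in for whatever substitution Windows actually applies.

```python
def ansi_to_wide(data, codepage):
    """Model of the first step of an A function: ANSI bytes -> UTF-16LE."""
    return data.decode(codepage).encode("utf-16-le")

def wide_to_ansi(wide, codepage):
    """Model of the way back; characters missing from the code page become '?'."""
    return wide.decode("utf-16-le").encode(codepage, errors="replace")

russian = "Привет".encode("utf-16-le")

# Cyrillic code page: every letter survives the round trip.
print(wide_to_ansi(russian, "cp1251"))
# Western European code page: the letters cannot be represented.
print(wide_to_ansi(russian, "cp1252"))
```

The cost of the conversion itself is small; the bigger catch is that the result depends on the active code page, which is why mixed-language text does not survive the ANSI route.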
Title: Re: How important is unicode
Post by: jj2007 on January 16, 2019, 08:50:32 PM
Quote from: AW on January 16, 2019, 05:52:15 PM
why perform such acrobatics if a simple Print "Привет, Мир" does the job? is a ridiculous response

Dear José, if your assembly toolchain does not allow a simple print "Привет, Мир" it is your problem, not mine.
Title: Re: How important is unicode
Post by: aw27 on January 16, 2019, 09:01:27 PM
Quote from: jj2007 on January 16, 2019, 02:13:17 PM
Quote from: hutch-- on January 16, 2019, 10:56:23 AM
the obvious trap is that it is twice the size

For UTF16, yes. That's why the UTF8 representation of Unicode is so popular.

Actually, it is wrong. UTF16 is variable-length, as code points are encoded with one or two 16-bit code units. Characters like emojis use two UTF16 code units (a surrogate pair).
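The one-or-two code unit rule is easy to verify. Below is a minimal sketch in Python; the helper name `utf16_units` is just an illustration, and the surrogate arithmetic follows the standard UTF-16 definition.

```python
def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

# Latin, Cyrillic and CJK letters fit in one unit; the emoji U+1F601 needs two.
for ch in ("A", "\u042f", "\u4e2d", "\U0001F601"):
    print(f"U+{ord(ch):04X} needs {utf16_units(ch)} UTF-16 code unit(s)")

# Surrogate pair for U+1F601, computed by hand:
cp = 0x1F601 - 0x10000
high = 0xD800 + (cp >> 10)      # high (lead) surrogate
low = 0xDC00 + (cp & 0x3FF)     # low (trail) surrogate
print(f"{high:04X} {low:04X}")
```

So "add 2 per character" only holds for code points up to U+FFFF; anything above that needs a check for the D800..DBFF lead surrogate range.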
Title: Re: How important is unicode
Post by: aw27 on January 16, 2019, 09:04:41 PM
Quote from: jj2007 on January 16, 2019, 08:50:32 PM
Quote from: AW on January 16, 2019, 05:52:15 PM
why perform such acrobatics if a simple Print "Привет, Мир" does the job? is a ridiculous response

Dear José, if your assembly toolchain does not allow a simple print "Привет, Мир" it is your problem, not mine.

You are not yet sober.   :shock:  When you are, we will talk again.
Title: Re: How important is unicode
Post by: jj2007 on January 16, 2019, 09:11:31 PM
Quote from: sinsi on January 16, 2019, 08:00:15 PM
My question was for a few reasons.
- I've always shied away from using resources but nowadays with manifests/version blocks/icons you can't really avoid them, so adding a string table is no big deal.

Right, but it's clumsy.

Quote
The Windows API is apparently all unicode, so any A functions are first converted to unicode then back to ansi, how much overhead?

Correct, but it would matter only in a tight loop, a very unlikely case.

Quote
unicode uses more memory. A resource section can be discarded but memory used in .data is always in use.

Doesn't make a difference for the exe size, though.

Quote
Correct me if I'm wrong, but the masm32 unicode macros are really only for converting english to unicode english yes?

It seems so, but I am not the right person to answer this. My macros work with Utf8 strings (i.e. true Unicode) in the .DATA section that get converted on the fly to UTF16. That is a few bytes of overhead - push offset utf8string, call makeutf16 - but it allows assembling them directly as shown above (Print (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1110) "Привет, Мир") without using resources, and for strings longer than 10 chars or so it bloats less.
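The store-as-UTF-8, convert-on-demand idea can be sketched as follows. This is Python standing in for the assembly, and `make_utf16` is a hypothetical stand-in named after the `makeutf16` call mentioned above, not the actual MasmBasic routine; on Windows the conversion itself would typically be done with MultiByteToWideChar and CP_UTF8.

```python
def make_utf16(utf8_bytes):
    """Convert a UTF-8 byte string to zero-terminated UTF-16LE."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le") + b"\x00\x00"

# What would sit in the .DATA section: UTF-8 bytes plus a zero terminator.
stored = "Привет, Мир".encode("utf-8") + b"\x00"

# What the W version of an API call would receive after conversion.
wide = make_utf16(stored[:-1])

print(len(stored), len(wide))
```

For this Cyrillic string the saving is small (21 stored bytes against 24), but for ASCII-heavy strings the UTF-8 copy is about half the size of the UTF-16 one.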

However, "true" Unicode can work even with plain Masm32:
include \masm32\include\masm32rt.inc
.code
start:
  invoke SetConsoleOutputCP, 65001 ; force UTF8
  print "Привет, Мир"   ; note the lowercase "p" - this is Masm32, not MasmBasic
  exit
end start


This works fine, see above, but the problem is that you can't build it with some IDEs (I use RichMasm, of course).
Title: Re: How important is unicode
Post by: hutch-- on January 16, 2019, 10:46:01 PM
sinsi,

The macros will only convert what you can type in ascii, but a resource script that is written in UNICODE can be compiled into a resource using any character set: Chinese, Southeast Asian, etc. There is a UNICODE editor in MASM32 that will do the job, and there is a tool that converts true UNICODE into DB sequences, so it can be done but takes a bit more work.

If it is a large block of UNICODE text, you can also save it as a binary resource in a resource file.
Title: Re: How important is unicode
Post by: TimoVJL on January 17, 2019, 01:30:49 AM
WriteConsoleA supports UTF-8 with CP 65001, but some assemblers don't accept a UTF-8 BOM :(, so Notepad isn't a good editor for coding.
.386
.model flat,stdcall
option casemap:none

ExitProcess proto stdcall :dword
GetStdHandle proto stdcall :dword
SetConsoleOutputCP proto stdcall :dword
WriteConsoleA proto stdcall :dword,:dword,:dword,:dword,:dword
includelib kernel32

.data
msg db "Привет ASM",10

.code
mainCRTStartup proc C
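invoke SetConsoleOutputCP, 65001 ; switch console output to UTF-8
invoke GetStdHandle, -11 ; -11 = STD_OUTPUT_HANDLE
invoke WriteConsoleA,eax,addr msg,sizeof msg,0,0 ; sizeof msg = byte count of the UTF-8 string
invoke ExitProcess, 0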
mainCRTStartup endp
end

EDIT: UAsm had BOM handling in version 2.38? http://masm32.com/board/index.php?topic=6422.msg68795#msg68795
but not in UASM v2.47, Nov 17 2018, Masm-compatible assembler?
TestMasm32Cyr.asm(1) : Error A2210: Syntax error: ´╗┐include
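The three odd characters in front of `include` in that error are the UTF-8 BOM bytes (EF BB BF) shown in an OEM code page. A minimal sketch in Python of detecting and stripping the BOM before handing a source file to an assembler; `strip_utf8_bom` is a hypothetical helper, and cp850 is an assumption about the console code page in use:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def strip_utf8_bom(source):
    """Return the source bytes without a leading UTF-8 BOM, if there is one."""
    return source[len(UTF8_BOM):] if source.startswith(UTF8_BOM) else source

# A source file as Notepad would save it with "UTF-8 with BOM".
with_bom = UTF8_BOM + b"include \\masm32\\include\\masm32rt.inc\r\n"
print(strip_utf8_bom(with_bom))

# Decoded as code page 850, the BOM gives the three characters
# that appear in the error message.
as_cp850 = UTF8_BOM.decode("cp850")
```

Saving the source as UTF-8 without a BOM in the editor avoids the problem altogether.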
Title: Re: How important is unicode
Post by: daydreamer on January 17, 2019, 03:07:23 AM
Quote from: sinsi on January 15, 2019, 06:48:57 PM
For our non-English speaking members, is EN-US good enough or would you prefer your native language?
I am considering moving to all unicode but it is a bit more effort, is the effort justified?

One problem is getting a good translation, forget google et al, it needs to be a native speaker imho.

I prefer to use English every day so I get exercise, and sometimes I meet a new person on the internet who introduces me to new words. I think it's maybe best to try to keep comments in code in English; it becomes too hard for a newbie to read both advanced code and comments that nobody can read.
So what does everybody think about commenting code in a native language vs English?
Do you want me to keep trying to write English code comments? (Not sure if "code comments" is correct English.)
Title: Re: How important is unicode
Post by: aw27 on January 17, 2019, 03:26:04 AM
Assemblers don't need a BOM (if there is one, it is useful if they can skip it; this was the UASM idea, but it may have been forgotten somewhere or removed on purpose); editors may need a BOM (unless they guess, or wait for you to tell them what the text encoding is).
Title: Re: How important is unicode
Post by: aw27 on January 17, 2019, 07:50:33 AM
@daydreamer,
Even if you comment only in English you may need to use Unicode to insert an emoticon or the Swedish flag. This is trendy, see Instagram, etc.
Title: Re: How important is unicode
Post by: hutch-- on January 17, 2019, 11:25:06 AM
I have never gone the route of adding the BOM in UNICODE as I only work on x86/64 and simply don't need it. With an assembler, in x86/64, multi-port to other hardware is not a consideration so why bother.
Title: Re: How important is unicode
Post by: daydreamer on January 18, 2019, 04:19:43 AM
Quote from: AW on January 17, 2019, 07:50:33 AM
@daydreamer,
Even if you comment only in English you may need to use Unicode to insert an emoticon or the Swedish flag. This is trendy, see Instagram, etc.

http://www.unicode.org/charts/
I checked this page earlier and found lots of useful things. I especially looked at the game symbols, which might be useful if you want to make a Unicode text-based game (card games, mahjong, chess) without needing to make graphics, but I haven't found flags with a cross yet: Swedish, Danish, Finnish.
That would be a good combination in a game: Unicode characters for the game pieces plus emoticons that you use when a player wins/loses/ties.

But thanks, a Swedish flag might be a good idea to use.

Title: Re: How important is unicode
Post by: aw27 on January 18, 2019, 06:19:13 AM
@daydreamer
Good point. The Unicode code points for the flags exist; what is more difficult is finding a font to render them. For Windows you need the BabelStone Flags Font (http://www.babelstone.co.uk/Fonts/Flags.html).
I installed it and tested it in Word, and it works like a charm.

To make a program render in color you need DirectWrite. I made a program some time ago, which is in the Games forum, to render emojis; it can be easily adapted to render color flags.
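Country flags do not have code points of their own in the charts: each one is a pair of "regional indicator" letters spelling the two-letter country code. A minimal sketch in Python (`country_flag` is a hypothetical helper; whether the pair shows up as a flag still depends on the font, as described above):

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def country_flag(iso_code):
    """Build the flag sequence for a two-letter ISO 3166-1 code such as 'SE'."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in iso_code.upper()
    )

# Swedish, Danish and Finnish flags as code point pairs.
for code in ("SE", "DK", "FI"):
    flag = country_flag(code)
    print(code, " ".join(f"U+{ord(c):04X}" for c in flag))
```

Each regional indicator lies above U+FFFF, so one flag takes four UTF-16 code units.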
Title: Re: How important is unicode
Post by: jj2007 on January 18, 2019, 12:36:39 PM
The DirectX thread is here: http://masm32.com/board/index.php?topic=7480.0
Title: Re: How important is unicode
Post by: aw27 on January 18, 2019, 08:55:31 PM
Thank you JJ.

(https://www.dropbox.com/s/12fcjk00okpqqzz/flags.png?dl=1)
Title: Re: How important is unicode
Post by: felipe on January 19, 2019, 02:27:15 AM
nice
Title: Re: How important is unicode
Post by: daydreamer on January 19, 2019, 09:08:11 AM
Quote from: AW on January 18, 2019, 08:55:31 PM
Thank you JJ.

(https://www.dropbox.com/s/12fcjk00okpqqzz/flags.png?dl=1)
Nice flags.
Doesn't the latest RichEdit support not only Unicode but colors too in a WinAPI text area? In MS Word you can have different colors in a text area, so it is probably possible with flags too.

But making use of mahjong pieces would be harder, because you need to sort them before drawing them, as they are on top of each other in the game.
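The sorting step could look roughly like this. It is only a sketch in Python: the `Tile` layout with layer/row/col fields is an assumption about how such a game might store its board, and the characters come from the Mahjong Tiles block starting at U+1F000.

```python
from typing import NamedTuple

class Tile(NamedTuple):
    layer: int   # 0 is the bottom of the stack
    row: int
    col: int
    char: str    # a character from the Mahjong Tiles block

def draw_order(tiles):
    """Sort tiles so lower layers are drawn first and upper ones cover them."""
    return sorted(tiles, key=lambda t: (t.layer, t.row, t.col))

tiles = [
    Tile(1, 0, 0, "\U0001F004"),
    Tile(0, 0, 1, "\U0001F000"),
    Tile(0, 0, 0, "\U0001F007"),
]

for tile in draw_order(tiles):
    print(tile.layer, tile.row, tile.col, f"U+{ord(tile.char):04X}")
```

Drawing in this order means a tile on a higher layer is always painted after the tiles it sits on.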


Title: Re: How important is unicode
Post by: aw27 on January 20, 2019, 02:34:06 AM
Richedit: From what I read, it is only possible to render Unicode color characters in the windowless control. However, I was not able to do it in some tests performed at the time. It is possible to render to the canvas by subclassing, but it is not the same thing.
Microsoft makes life very hard for developers in the areas they do business, including technologies related to word processing. Things were not like that when Microsoft was only the operating system.
Title: Re: How important is unicode
Post by: TBRANSO1 on January 20, 2019, 03:51:22 AM
Quote from: jj2007 on January 16, 2019, 05:44:13 AM
print "Вы никогда не должны кормить Хосе Паскоа"

Sorry, I don't know anything to add to the conversation...

This is funny... I am not sure if AW knows Russian, but I do... I am not going to "feed"...., but this is funny.

BTW, are you Russian?  If so, why do you have an Italian map?
Title: Re: How important is unicode
Post by: jj2007 on January 20, 2019, 04:25:00 AM
Quote from: TBRANSO1 on January 20, 2019, 03:51:22 AM
but this is funny.

Glad you liked it :P

Quote
BTW, are you Russian?  If so, why do you have an Italian map?

I am German, but I live in Italy. And Russian is a charset that works for the console of many OS versions, in contrast to e.g. Chinese or Arabic. Therefore I use it often to demonstrate Unicode-related features.

Where are you from? I used to speak some Russian many years ago, nowadays I have to rely on DeepL (https://www.deepl.com/translator).
Title: Re: How important is unicode
Post by: TBRANSO1 on January 20, 2019, 04:40:01 AM
Quote from: jj2007 on January 20, 2019, 04:25:00 AM
Where are you from? I used to speak some Russian many years ago, nowadays I have to rely on DeepL (https://www.deepl.com/translator).

Aha, no, I once played a Russian linguist - intelligence expert for the US ARMY and Gov.  I'm from Arizona.  After I got away from the swamp in DC, I have played businessman.  Now, I am playing a programmer.  I sort of miss the work I did back then, and lament the relationship between the US and Russia due to policy blunders for the past two decades... one of the reasons I left.

Title: Re: How important is unicode
Post by: hutch-- on January 20, 2019, 07:40:51 AM
Escaping the swamp in DC sounds like a good move. The deterioration in relations between the US and Russia is very unfortunate as both countries have something to gain by simply trading with each other. Too many fingers in the pie in DC seems to be the problem.
Title: Re: How important is unicode
Post by: daydreamer on January 20, 2019, 10:29:01 AM
@TBRANSO1
good you got outta the swamp


Quote from: AW on January 18, 2019, 08:55:31 PM
Thank you JJ.

(https://www.dropbox.com/s/12fcjk00okpqqzz/flags.png?dl=1)

I got an idea after checking your dx thread again about emoji, flags and game unicode: let's all install the game-useful unicode sets. For example, this could be my character in a game, made of unicode characters on top of each other:
king emoji
swedish flag(t-shirt)
blue jeans
blue swede shoes
So what does your character look like?
Title: Re: How important is unicode
Post by: TBRANSO1 on January 20, 2019, 12:24:25 PM
Quote from: hutch-- on January 20, 2019, 07:40:51 AM
Escaping the swamp in DC sounds like a good move. The deterioration in relations between the US and Russia is very unfortunate as both countries have something to gain by simply trading with each other. Too many fingers in the pie in DC seems to be the problem.

Yep

It's beautiful in the area, BUT comes with a price.

OT:
Those Russians are brilliant in science, math and logic... definitely.  It was explained by a professor that a good proportion of the algorithms that Knuth is famous for have roots in Russian computer science.  When archives were dug up after the fall of the "Empire", a lot of scientific journals and notes were discovered, and although the Russians were underfunded, embargoed and blockaded, there is a lot of work that we can now actually credit to Russians rather than Americans.  Those discoveries were credited to others in the US, Germany, the Netherlands or the UK, but it's good to give credit where it is due.

I have a real-life experience from a forum of chemists around '93 or so, where I was a translator: they were tasked to solve some equation that looked like squiggly lines to me, and the Americans whipped out their TIs, and the Russians solved the equation with pen and paper faster.  "Impressive, very Impressive!" as Darth Vader would say.
Title: Re: How important is unicode
Post by: aw27 on January 20, 2019, 06:05:51 PM
Quote
I got an idea after checking your dx thread again about emoji, flags and game unicode: let's all install the game-useful unicode sets. For example, this could be my character in a game, made of unicode characters on top of each other:
king emoji
swedish flag(t-shirt)
blue jeans
blue swede shoes
So what does your character look like?

Under Windows, Color Flags are truly on the leading edge of Unicode. The technology is not yet stable. As you can read in the http://www.babelstone.co.uk/Fonts/Flags.html article, in the Edge and IE11 browsers Country Subdivision Codes are not rendered as flags. They are rendered like this.
(https://www.dropbox.com/s/56cjzzjx90u2bfj/subdivflags.png?dl=1)
On the other hand, while Word 2016 displays all color flags, it does not do it smoothly: the font tends to reset to Calibri when we paste a flag character.

DirectWrite will not display the Country Subdivision Codes flags (at least I could not make it work).
For instance, the symbol for Belgium: Brussels-Capital Region (BE-BRU) is 7 UTF32 code units and is properly converted to 14 UTF16 code units by MultiByteToWideChar. However, it will only display as a generic black flag on a pole (the same symbol that Notepad displays when I select the BabelStone Flags font and paste the character).

There are other shortcomings such as, in monochrome, flags not being displayed as real flags, etc.
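The 7 and 14 figures for the Brussels flag can be reproduced with a short sketch. This is Python, `subdivision_flag` is a hypothetical helper, and the ISO 3166-2 code BE-BRU for Brussels-Capital Region is assumed. A subdivision flag is a black flag followed by tag characters spelling the code and a cancel tag:

```python
BLACK_FLAG = "\U0001F3F4"   # WAVING BLACK FLAG, the base character
CANCEL_TAG = "\U000E007F"   # CANCEL TAG, terminates the tag sequence

def subdivision_flag(code):
    """Build the tag sequence for a subdivision code such as 'BE-BRU'."""
    letters = code.replace("-", "").lower()
    # Tag characters mirror ASCII at an offset of 0xE0000.
    tags = "".join(chr(0xE0000 + ord(c)) for c in letters)
    return BLACK_FLAG + tags + CANCEL_TAG

flag = subdivision_flag("BE-BRU")
code_points = len(flag)
utf16_units = len(flag.encode("utf-16-le")) // 2
print(code_points, utf16_units)
```

All seven code points lie above U+FFFF, so each one needs a surrogate pair, which gives the 14 UTF16 code units. A renderer that does not understand the tag sequence shows only the base character, the black flag.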
Title: Re: How important is unicode
Post by: TimoVJL on January 22, 2019, 05:43:45 PM
Finally, Windows 10 supports loading a local font for DirectWrite with AddFontResourceEx / RemoveFontResourceEx.
Windows 7 needs a FontFileLoader implementation for that, and that same code doesn't work with 8.1 and 10 :(
The example needs BabelStoneFlags.ttf in the same folder.
Title: Re: How important is unicode
Post by: jj2007 on January 25, 2019, 11:40:42 AM
Is Unicode really that important? (http://masm32.com/board/index.php?topic=94.msg83434#msg83434)
(http://masm32.com/board/index.php?action=dlattach;topic=94.0;attach=8911;image)