The MASM Forum

General => The Workshop => Topic started by: nidud on May 05, 2017, 11:15:13 PM

Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 05, 2017, 11:15:13 PM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 06, 2017, 01:13:35 AM
Another example with a mix of various languages. All you need is an editor that saves in UTF-8; most will do. Tested with JWASM/HJWASM.


.386

.MODEL FLAT, C
option casemap:none

CP_UTF8 equ 65001
MB_OK equ 0
NULL equ 0
option dllimport:<kernel32.dll>
MultiByteToWideChar PROTO STDCALL :DWORD,:DWORD,:DWORD,:DWORD,:DWORD,:DWORD
ExitProcess PROTO STDCALL :DWORD
option dllimport:<user32.dll>
MessageBoxW PROTO STDCALL :DWORD,:DWORD,:DWORD,:DWORD

.data


sKorean db "Korean: 한자"," "
sJapanese db "Japanese: 漢字"," "
sChinese db "Chinese: 汉字"," "
sRussian db "Russian: Прощай", 0,0
sCaption dw "L","a","n","g","u","a","g","e"," ","S","a","l","a","d",0,0

.data?
myBuffer      dw 256 dup(?)   ; 256 wide chars, matching the count passed to MultiByteToWideChar
   
.code
     
start proc
invoke MultiByteToWideChar, CP_UTF8, 0, offset sKorean, -1, offset myBuffer, 256
invoke MessageBoxW, NULL, addr myBuffer, addr sCaption, MB_OK
invoke  ExitProcess, 0
ret
start endp

end start
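
The last argument of MultiByteToWideChar counts wide characters, not bytes, so the receiving buffer needs two bytes per character. As a rough cross-check, here is a minimal Python sketch of the conversion the call performs on the concatenated strings above. It is not part of the original post, utf8_to_utf16le is a hypothetical helper name, and it assumes the source file was saved as UTF-8:

```python
def utf8_to_utf16le(utf8_bytes: bytes) -> bytes:
    """Decode NUL-terminated UTF-8 and re-encode it as NUL-terminated UTF-16LE."""
    text = utf8_bytes.split(b"\x00", 1)[0].decode("utf-8")
    return (text + "\x00").encode("utf-16-le")

# The four db strings above, as they sit in memory one after another.
source = "Korean: 한자 Japanese: 漢字 Chinese: 汉字 Russian: Прощай"
utf8 = source.encode("utf-8") + b"\x00"

wide = utf8_to_utf16le(utf8)
wide_chars = len(wide) // 2   # UTF-16 code units, terminator included

print(len(utf8), "UTF-8 bytes ->", wide_chars, "wide chars,", len(wide), "bytes")
```

For these strings the result stays well below 256 wide characters, but it shows why the buffer size in bytes and the count passed to the API are different numbers.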
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: TWell on May 06, 2017, 02:58:51 AM
W0LF,
npp (Notepad++) defaults to UTF-8, and your examples contained Cyrillic text, so was it really 8-bit ANSI text in KOI8 or Windows code page 1251?
In the linked picture I wasn't able to see any Cyrillic text, so do I need to buy better eyeglasses or just more beer? :biggrin:

Others:
jwasm and the others should accept the fact that UTF-8 is the default in source files these days, as comments need the local language.
How about just warning about the BOM?

When will we see a real UNICODE-capable assembler?
Do we have to wait another 20 years?

ml[64] is just a vintage product, a barrier to further development ;)
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 06, 2017, 04:45:25 AM
Quote from: W0LF on May 05, 2017, 10:15:09 PM
My code works even without __UNICODE__ with russian symbols.

It works with ANSI if your machine's code page is Cyrillic. My example works on all machines because it uses UTF-8.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: newrobert on May 06, 2017, 10:59:03 AM
Quote from: TWell on May 06, 2017, 02:58:51 AM
W0LF,
npp (Notepad++) defaults to UTF-8, and your examples contained Cyrillic text, so was it really 8-bit ANSI text in KOI8 or Windows code page 1251?
In the linked picture I wasn't able to see any Cyrillic text, so do I need to buy better eyeglasses or just more beer? :biggrin:

Others:
jwasm and the others should accept the fact that UTF-8 is the default in source files these days, as comments need the local language.
How about just warning about the BOM?

When will we see a real UNICODE-capable assembler?
Do we have to wait another 20 years?

ml[64] is just a vintage product, a barrier to further development ;)

20 years is too long.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 06, 2017, 11:41:02 AM
Quote from: TWell on May 06, 2017, 02:58:51 AM
jwasm and the others should accept the fact that UTF-8 is the default in source files these days, as comments need the local language.
How about just warning about the BOM?

My examples above work because all Masm-compatible assemblers accept UTF-8. Open the attached *.asm source in a hex editor and check yourself.

.data
sCaption db "Заголовок", 0
sText db "Текст на русском!", 0


Btw it builds also with qEditor, although it looks like garbage. In RichMasm (http://masm32.com/board/index.php?topic=5314.0) you can actually read the Russian text, and you can save edits.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 06, 2017, 05:00:38 PM
Quote from: jj2007 on May 06, 2017, 11:41:02 AM
My examples above work because all Masm-compatible assemblers accept UTF-8.

The assemblers simply don't know, they believe they are dealing with ASCII and it works in almost every case. If the UTF-8 file starts with a BOM, that alone is enough for the assembler to reject the file.
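
Why this works can be seen from the bytes themselves. The sketch below (Python, with a hypothetical helper db_bytes) models an encoding-unaware assembler that copies a quoted string into the output byte for byte. That copying behaviour is an assumption taken from the description above, not from any assembler's source code.

```python
def db_bytes(literal: str) -> bytes:
    """Bytes emitted for:  db "<literal>", 0  when the source file is UTF-8
    and the assembler copies the quoted bytes through unchanged (assumption)."""
    return literal.encode("utf-8") + b"\x00"

data = db_bytes("Заголовок")
print(data.hex(" "))

# Every byte of a multi-byte UTF-8 sequence has its top bit set.
print(all(b >= 0x80 for b in data[:-1]))
```

Because no byte of a multi-byte UTF-8 sequence falls in the ASCII range, none of them can be mistaken for the closing quote, a comma or a line break.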
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: hutch-- on May 06, 2017, 06:12:58 PM
 :biggrin:

> Btw it builds also with qEditor, although it looks like garbage.

That just says that the method you use does not work in Quick Editor. Look in the example code for working UNICODE applications using the orthodox Microsoft method. Then there is the QE accessory "MultiTool" that will do conversions for you, and if all else fails, UniEdit lets you write anything you like in UNICODE, which you then place in a UNICODE RC script.

QE is a pure ASCII editor, it does not pretend to write UTF8/16 or UNICODE.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 06, 2017, 07:19:50 PM
Quote from: aw27 on May 06, 2017, 05:00:38 PM
The assemblers simply don't know, they believe they are dealing with ASCII and it works in almost every case.

Exactly. And I have yet to see a case where it didn't work.
include \masm32\include\masm32rt.inc
include uChrMacro.inc

.code
start:
  push MB_OK
  push uChr$("Заголовок!")
  push uChr$("Текст на русском!")
  push 0
  call MessageBoxW
  exit
end start


Pure Masm32. It looks simple enough, right? Of course, there are more complicated solutions.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 06, 2017, 09:10:08 PM
Quote from: jj2007 on May 06, 2017, 07:19:50 PM
Exactly. And I have yet to see a case where it didn't work.

Your MasmBasic is amazing. Congratulations  :t
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 06, 2017, 10:18:26 PM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 06, 2017, 10:32:43 PM
Quote from: nidud on May 06, 2017, 10:18:26 PM
error A2008: syntax error : я╗┐include

The BOMs suck  :badgrin:
Intelligent editors don't need a BOM to find out that the text is UTF8.
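
The three odd characters in front of "include" are the UTF-8 BOM itself, shown in the console's code page. A quick Python check, assuming the console that printed the error used code page 866 (an assumption; the post does not say):

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF, placed before the first token.
first_line = codecs.BOM_UTF8 + b"include"

# Interpreting the raw bytes with an OEM code page shows what ended up on screen.
print(first_line.decode("cp866"))
```

The assembler sees three unknown bytes glued to its first keyword, which is why it reports a syntax error rather than an encoding problem.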
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 06, 2017, 11:16:44 PM
Quote from: aw27 on May 06, 2017, 10:32:43 PM
Intelligent editors don't need a BOM to find out that the text is UTF8.

I would even go one step further: Intelligent editors (http://masm32.com/board/index.php?topic=5314.0) know when to put a BOM and when to save it as UTF-8 without BOM :bgrin:

(more precisely: ml & clones can't stand the BOM, rc.exe likes it)
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: hutch-- on May 06, 2017, 11:36:56 PM
Bottom line is the UNICODE spec does not require it; it's there for text that may be used on other hardware that uses a different byte order. The UNICODE editor I supply specifically does not write it, as it's designed for Windows UNICODE only.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 06, 2017, 11:42:03 PM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 07, 2017, 01:20:24 AM
Quote from: hutch-- on May 06, 2017, 11:36:56 PM
the UNICODE spec does not require it

Actually, there is no safe way to distinguish UTF-8 from "pure" ANSI. Most of the sources posted here would work exactly the same way if they were saved as UTF-8, simply because they don't contain any "exotic" characters. This is why BOMs make sense. What is really, really hard to understand is that neither MASM nor the Watcom clones take a few cycles to test whether there are three bytes to skip at the beginning of the source. These things have been around for several decades now...
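
To make the "no safe way" point concrete, here is a small Python sketch of the usual best-effort guess. The function name classify and its result strings are invented for illustration; the last call shows ANSI text that is misjudged because its bytes happen to form valid UTF-8.

```python
import codecs

def classify(raw: bytes) -> str:
    """Best-effort guess at the encoding of a source file's bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if all(b < 0x80 for b in raw):
        return "ascii"              # identical bytes in UTF-8 and in any ANSI code page
    try:
        raw.decode("utf-8")         # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "8-bit code page"    # which one cannot be told from the bytes
    return "probably utf-8"

print(classify(b"mov eax, 1"))
print(classify("Текст".encode("utf-8")))
print(classify("Текст".encode("cp1251")))
print(classify("Ã©".encode("cp1252")))   # ANSI text that is also valid UTF-8
```

Only the BOM case is unambiguous; everything else is a guess that works for typical text.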
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: TWell on May 07, 2017, 02:01:42 AM
Quote:
UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF

I think many people would like to have that handled in hjwasm and asmc.
After that, Notepad is no longer evil and comments in one's native language are no longer a problem.
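
Until an assembler handles it, the BOM can be removed in a build step. A minimal Python sketch of such a filter follows; strip_utf8_bom is a hypothetical helper, and hooking it into a particular build is left open.

```python
import codecs

def strip_utf8_bom(raw: bytes) -> bytes:
    """Return the source bytes without a leading EF BB BF, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

source = codecs.BOM_UTF8 + b"include \\masm32\\include\\masm32rt.inc\r\n"
print(strip_utf8_bom(source))
```

Files without a BOM pass through untouched, so the filter is safe to run on every source.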
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 02:16:02 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: hutch-- on May 07, 2017, 02:39:30 AM
Until there is a wide acceptance of writing compilers and assemblers to read UNICODE, the methods that have been around since at least WinNT4 will keep doing the job. Code is still generally ASCII and most assemblers/compilers even restrict the upper 128 character set. The exception which has been this way since the earliest Win32 is RC.EXE that has always been capable of UNICODE in Win32.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 07, 2017, 02:48:05 AM
Quote from: hutch-- on May 07, 2017, 02:39:30 AM
Until there is a wide acceptance of writing compilers and assemblers to read UNICODE, the methods that have been around since at least WinNT4 will keep doing the job.

Absolutely :t

And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 03:02:52 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: TWell on May 07, 2017, 03:11:52 AM
M$ C++ isn't scared of UTF-8 with a BOM.
So this problem is inherited from ml.exe.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: hutch-- on May 07, 2017, 03:14:36 AM
 :biggrin:

> And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Or anything else that can write UNICODE to a RC script. With an ASCII editor you can do these.

.data
align 4
  [rename me] \
    dw "T","h","i","s"," ","i","s"," ","a"," ","t","e","s","t",0,0
.code


Or this,

; ANSI string of 16 bytes converted to UNICODE
; at 34 bytes using MultiByteToWideChar

  [rename me] \
    db 84,0,104,0,105,0,115,0,32,0,105,0,115,0,32,0
    db 97,0,32,0,116,0,101,0,115,0,116,0,13,0,10,0
    db 0,0
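
Tables like the one above are tedious to type by hand. A minimal Python sketch (the helper name utf16_db_lines is made up for illustration) that generates the db lines for any string by encoding it as UTF-16LE and appending the two-byte terminator:

```python
def utf16_db_lines(text: str, per_line: int = 16) -> list:
    """Return MASM db lines holding text as NUL-terminated UTF-16LE."""
    raw = (text + "\0").encode("utf-16-le")
    return [
        "    db " + ",".join(str(b) for b in raw[i:i + per_line])
        for i in range(0, len(raw), per_line)
    ]

for line in utf16_db_lines("This is a test\r\n"):
    print(line)
```

The same helper works for non-Latin text, where the high byte of each pair is no longer zero.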


Or this in a UNICODE RC script.

    STRINGTABLE
    BEGIN
        250,  "早上好计算机程序员。\0"
        251,  "おはようのコンピュータのプログラマー。\0"
        252,  "Хороший программист утром.\0"
        253,  "Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\0"
        254,  "सुप्रभात कंप्यूटर प्रोग्रामर.\0"
        255,  "Chào buổi sáng lập trình máy tính.\0"
        256,  "დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\0"
        257,  "Добро јутро компјутерски програмер.\0"
        258,  "Բարի լույս ծրագրավորող.\0"
        259,  "안녕하세요 컴퓨터 프로그래머.\0"
    END
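
A minimal Python sketch of how an editor or build script could write such a UNICODE RC script: UTF-16LE text prefixed with the FF FE byte order mark. encode_rc_script is a hypothetical helper, and which other encodings a given rc.exe version accepts is not covered here.

```python
import codecs

def encode_rc_script(text: str) -> bytes:
    """Encode an RC script as UTF-16LE with a leading byte order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

script = (
    "STRINGTABLE\r\n"
    "BEGIN\r\n"
    '    252,  "Хороший программист утром.\\0"\r\n'
    "END\r\n"
)
raw = encode_rc_script(script)
print(raw[:2].hex(" "), len(raw), "bytes")
```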

Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 07, 2017, 03:51:32 AM
Quote from: nidud on May 07, 2017, 03:02:52 AM
Quote from: jj2007 on May 07, 2017, 02:48:05 AM
And the handful of non-English coders who want to write their strings or comments in exotic alphabets can use RichMasm.

Why do they have to use RichMasm?

Oops, that is a misunderstanding - see attachment.
include \masm32\MasmBasic\MasmBasic.inc  ; Вы знаете, так, где же найти эту библиотеку.
  Init
  uMsgBox 0, "真的,没有人是被迫使用高级编辑。", "Important message:", MB_OK
EndOfCode


(if you can't see the text correctly, try an advanced browser like FireFox, Edge, MSIE, Safari, Opera, Chrome or Vivaldi - there are a few that can display non-Latin alphabets)
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 04:31:14 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 07, 2017, 04:37:49 AM
Why would any sane person want to work with such "text"?
What would you do if your browser showed you such gibberish?
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 04:59:42 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 07, 2017, 05:04:06 AM
Quote from: nidud on May 07, 2017, 04:59:42 AM
The question is then: why do YOU use it?

The question is: Why do ALL browsers display all kinds of exotic languages correctly? Wouldn't it be so much easier if your browser displayed only the Norwegian subset correctly, and showed you gibberish if, for whatever strange reason, you visited French or German or Russian websites?

Hey, it's 2017. Unicode was invented in the 20th Century.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 05:51:45 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 06:55:26 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 09:09:04 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 07, 2017, 01:44:50 PM
Sign of times, DOS prompt (Ok, let's call it command prompt) showing multilanguage Unicode.

(http://www.atelierweb.com/a/testuni.png)
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: hutch-- on May 07, 2017, 01:48:41 PM
Could you folks minimise the amount of image data being posted as attachments in the forum, it just loads up the server with a mountain of crap.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 07, 2017, 02:55:31 PM
Quote from: hutch-- on May 07, 2017, 01:48:41 PM
Could you folks minimise the amount of image data being posted as attachments in the forum, it just loads up the server with a mountain of crap.

I only posted one, not as attachment and not crap either.  :biggrin:
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 07, 2017, 06:18:55 PM
Quote from: aw27 on May 07, 2017, 01:44:50 PM
Sign of times, DOS prompt (Ok, let's call it command prompt) showing multilanguage Unicode.

For me, only Russian works in the console. But some years ago I used to see Chinese and Arabic in the console, too, and I have no idea why that ceased to work. I've tried chcp 65001 and 65000, using Lucida Console and Consolas, etc, no success :(
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 07, 2017, 06:50:26 PM
Quote from: jj2007 on May 07, 2017, 06:18:55 PM
For me, only Russian works in the console. But some years ago I used to see Chinese and Arabic in the console, too, and I have no idea why that ceased to work. I've tried chcp 65001 and 65000, using Lucida Console and Consolas, etc, no success :(
I tested on Windows 10 using the NSimSun or Gothic fonts. With the Windows 7 default fonts only Russian works, but I think you can add fonts to the command prompt.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 07, 2017, 07:27:15 PM
Quote from: hutch-- on May 07, 2017, 01:48:41 PM
Could you folks minimise the amount of image data being posted as attachments

Point taken, Hutch. Most of my images are stored on my private site, though, and what I attach here is usually not that big.

Here is a nice one by Joel: (https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)

Quote:
if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will.
...
Also they were liberal hippies in California who wanted to conserve (sneer). If they were Texans they wouldn't have minded guzzling twice the number of bytes. But those Californian wimps couldn't bear the idea of doubling the amount of storage it took for strings, and anyway, there were already all these doggone documents out there using various ANSI and DBCS character sets and who's going to convert them all? Moi? For this reason alone most people decided to ignore Unicode for several years and in the meantime things got worse.

Thus was invented the brilliant concept of UTF-8 ... which has the nice property of also working respectably if you have the happy coincidence of English text and braindead programs that are completely unaware that there is anything other than ASCII.

With "braindead programs" he surely means ML.exe :P
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: TWell on May 07, 2017, 08:04:48 PM
Quote from: hutch-- on May 07, 2017, 01:48:41 PM
Could you folks minimise the amount of image data being posted as attachments in the forum, it just loads up the server with a mountain of crap.
OK.
I'll start removing my pictures and maybe later the whole profile  ;)
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 07, 2017, 08:07:03 PM
Quote from: jj2007 on May 07, 2017, 07:27:15 PM
With "braindead programs" he surely means ML.exe :P
Rewording, UTF8 allows programs built by code page 437 developers to stay alive in a changing World.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 07, 2017, 10:35:08 PM
A common stumbling block for non-native English speakers learning to program is labels. From a Portuguese speaker's point of view, we can't create labels like "início" or "começo", which simply mean "start". A student doesn't understand why he or she has to strip the accents, cedillas and so on.
It is hard because we can't create function names that way either.
I don't think this will change for a long time, so plain English it is.
It would be nice if we could write function names, labels and variable names (not only data) in Unicode/UTF-8, so that the same source code could call a function written in Cyrillic, or Arabic, and so on. The hard part would be typing those symbols on a keyboard.
Well, I haven't found a good solution to this; even if it worked, the O.S. would probably reject such a name.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 07, 2017, 11:23:18 PM
Quote from: mineiro on May 07, 2017, 10:35:08 PM
A common stumbling block for non-native English speakers learning to program is labels.

Some compilers deal well with UTF-8.
#include <tchar.h>   // _tmain and _TCHAR (MSVC-specific)
int Прощай()
{
   return 12;
}
int _tmain(int argc, _TCHAR* argv[])
{
   int início =  Прощай();
   int 好 = início;

   return 0;
}
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: aw27 on May 07, 2017, 11:26:22 PM

No assemblers, though.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 07, 2017, 11:41:50 PM
Thanks for the reply, sir aw27;
the only language I can program in is assembly. I have tried C for dummies and Python too, but I learned more through symbiosis with source code, so I only possess beginner's knowledge of other languages.
Good to know that some languages accept UTF-8. I was reading about it and it doesn't sound very difficult to implement. One symbol on screen can take 1, 2, 3 or 4 bytes in memory, and the ASCII symbols are carried over unchanged as the 1-byte case.
https://tools.ietf.org/html/rfc3629
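
A quick way to see the 1 to 4 byte lengths is to encode a few sample characters. This is a minimal Python sketch; utf8_length is a hypothetical helper and the sample characters are arbitrary.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch in ("A", "í", "Я", "汉", "😀"):
    print(repr(ch), utf8_length(ch), ch.encode("utf-8").hex(" "))
```

The ASCII letter keeps its single original byte, which is why plain English sources are already valid UTF-8.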

A bit off-topic, but another issue is the 'new line' in source code. On Windows this is CRLF, on Linux it is only LF. A lot of assemblers fail here because they only look for CRLF.
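
Accepting both conventions costs very little. A minimal Python sketch of a line splitter that treats CRLF, a lone LF and a lone CR alike (split_source_lines is a hypothetical name, not taken from any assembler):

```python
def split_source_lines(raw: bytes) -> list:
    """Split source bytes on CRLF, lone LF or lone CR."""
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.split(b"\n")

windows = b".code\r\nstart:\r\n  ret\r\n"
linux = b".code\nstart:\n  ret\n"
print(split_source_lines(windows) == split_source_lines(linux))
```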
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 07, 2017, 11:51:11 PM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 12:39:53 AM
Hello sir nidud, thanks for the reply;

Quote from: nidud on May 07, 2017, 11:51:11 PM
The Latin characters are not restricted to the English language  ;)
Yes, I agree. The original Latin symbols have no accents, cedillas and so on. Those were added by the French, Spanish, Portuguese, Italians, Germans, maybe the Romanians, I'm not sure...
When we turn the computer on, the symbols we see on screen are just data stored in the BIOS to be put on screen. One chip maker created one symbol table and other makers created their own. This is why ASCII was created: to be a common default among symbol tables (data).

Quote:
You don't need Unicode/UTF for that: That's not the problem.

I was thinking about:
início:
The letter 'í' has an accent. With the right code page selected it displays as above; with another code page selected I see a Russian symbol or some other character instead, although the hexadecimal value is the same. I have failed to write that label in source code with some assemblers because they restrict identifiers to unaccented Latin characters, or to the first 128 symbols of the table. So I assume the same happens with Russian and other languages.
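
The ambiguity is easy to reproduce. In this minimal Python sketch the single byte that stands for 'í' in the Western European code page 1252 is decoded with two Cyrillic code pages as well; the choice of code pages is only an illustration.

```python
accented = "í"
single_byte = accented.encode("cp1252")   # one byte in the Western European code page

# The same byte value means a different letter in each code page.
for code_page in ("cp1252", "cp1251", "cp866"):
    print(code_page, single_byte.hex(), single_byte.decode(code_page))

# In UTF-8 the letter has its own unambiguous two-byte sequence.
print("utf-8", accented.encode("utf-8").hex(" "))
```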

-edited-
I was also thinking about math symbols. Instead of "call sum" we could call sigma: "call Σ". That way a student working through a math book would find the same symbol as a function name on the computer.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 01:13:18 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 04:11:48 AM
Because of ambiguity: the same hexadecimal number represents different symbols (depending on the code page), instead of one hexadecimal combination representing one symbol.
Take the letter "ê", for example: I can use the extended-ASCII symbol "ê" (your point of view, I suppose) or I can use the UTF-8/Unicode symbol "ê".

Well, a good example might be how hexadecimal numbers look from a Russian point of view. I only studied the Russian alphabet years ago to understand what the symbols mean, and I concluded that many Russian symbols can be swapped for Latin characters to get a rough idea of what is written in Russian, from a real beginner's point of view. So yes, I can't read Russian.
They start by reading that, to write hexadecimal numbers, extra numeric symbols have to be appended, so letters of the alphabet were added to cover all the combinations in base 16.
This is the point: for us the default is "0123456789abcdef", the "numbers" a to f, but from another perspective it would be "0123456789аб?диф", where "?" means I don't know the letter c in the Russian alphabet, and I'm not sure whether this is a real example.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 05:01:25 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 07:52:53 AM
We are saying the same thing, sir nidud;
I suppose you are not able to create a variable, a label or a function name with runic symbols in any assembler.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 08, 2017, 08:45:51 AM
Quote from: nidud on May 08, 2017, 01:13:18 AM
This is the table (https://github.com/nidud/asmc/blob/master/source/libc/ltype/_ltype.asm) used to define a label in Asmc

So it would be relatively easy to use the full 200+ chars for creating labels, at the (small) price of losing compatibility with ML, of course.

But would the code be readable on another user's machine if he uses a different codepage?

My ambition here is different, it is to allow coders to create strings in all alphabets: Print "Добро пожаловать" without using clumsy workarounds. All coders understand what Print or start: means, no need to translate that into Russian or Chinese or Arabic. It would actually make our lives more difficult, because 99% of the available help on the Internet is in English. You don't get much help if you search for the keyword Печать (actually, Google is clever enough to translate it - but you get English help back...).

But the text to be processed is for the users of our programs. And the comments are for ourselves, we should be able to insert them in whatever alphabet we like:
include \masm32\include\masm32rt.inc

.data?
buffer db 1000 dup(?)

.code
start:
  mov edi, offset buffer
  invoke crt_printf, chr$("Введите строку: ") ; получить пользовательский ввод
  invoke crt_scanf, chr$("%s"), edi
  invoke crt_printf, chr$("Вы набрали строку «%s»."), edi
  invoke crt__getch
  exit

end start


Output:
Введите строку: Hello
Вы набрали строку «Hello».

Attached as plain text, plain Masm32 example. You can build it in qEditor (it looks better in RichMasm, though).
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 08:53:41 AM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 09:46:48 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 10:21:56 AM
It is possible if you swap out (remove) abc in favor of runic symbols. You have one solution, which is to use a code page; I agree that is necessary, but there are other solutions.
OK, let's not remove abc, and put the runic symbols in the upper 128 positions of the ASCII table. Now we have a problem: we can't mix symbols from different sources, so it is impossible to deal with several languages in the same source code. Unicode/UTF-8 appears as the solution.
I know that fonts are symbols and that they can be represented by hexadecimal numbers, so I could switch the code page or, well, I don't even need to switch the code page because I can change only the symbols (the font). I understand what you wrote.

Labels, from an assembler's point of view, are just identifiers: anchors that reference a memory address. A label is an identifier made of symbols followed by a colon. It is possible to create an assembler that recognizes Unicode/UTF-8 text as identifiers.

You showed a font example; I will show a time example:
I tried to be in the year 1500 BC. In Windows I double-clicked the clock to open the time settings. I entered the year 1500 and, hmmm, I can't set this year. But I saw another way: I can go to the BIOS and change it there. After trying that I reached hmmm again: I can't set the year 1500 in the BIOS setup either. How can we solve this chronological problem? The O.S. filesystems store the file creation time, and with a simple 'dir' or 'ls' I can see the file dates and use them as a filter.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 08, 2017, 10:28:55 AM
Quote from: nidud on May 08, 2017, 09:46:48 AM
Why would it be clumsy for the Russians to write in their own native language?

I write regularly in four different languages. Do I need four PCs now?

Why does every single professional software on Windows work with either UTF-8 or Unicode, except assembly IDEs?
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 10:57:56 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 11:28:53 AM
Yes, but from the assembler's point of view there are no fonts, only encoding schemes.
It is the same thing as turning mnemonic strings into opcodes, but seen from the opcode-to-string direction.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 11:33:13 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 11:42:08 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 01:31:50 PM
Yes, I agree. The user needs to have that font installed to write that source text in a text editor.
But if an assembler opens that text file and looks only at the hexadecimal values, it can work out the length of each Unicode/UTF-8 sequence and so recognize that the data is an identifier.

You know that 90% of people in this world have conservative minds; I think for assembly programmers the number is higher than 100%. Look at your excellent work on the table optimization feature: how many people use it? We don't see many people using it because we are conservative. Nothing will change, as we know; this board uses English as its mother language, as I said before. It is just an option or feature that could be added. If someone came to this board writing in runes I would not be able to help.

Well, this is a good discussion, sir nidud.
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 08, 2017, 06:14:45 PM
Quote from: nidud on May 08, 2017, 11:33:13 AM
Quote:
Why does every single professional software on Windows work with either UTF-8 or Unicode, except assembly IDEs?

Could you name one please.

RichMasm? (http://masm32.com/board/index.php?topic=6224.msg66283#msg66283) ;)
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 08:36:04 PM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 08:37:19 PM
deleted
Title: Re: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 08, 2017, 08:58:54 PM
Quote from: nidud on May 08, 2017, 08:37:19 PM
Quote:
RichMasm? (http://masm32.com/board/index.php?topic=6224.msg66283#msg66283) ;)

So RichMasm do not work with either UTF-8 or Unicode?

It works perfectly, as the example shows (http://masm32.com/board/index.php?topic=6224.msg66283#msg66283). And all major software like MS Office, LibreOffice, Skype, Telegram, whatever, have absolutely no problem working with English, Chinese and Arabic simultaneously. So I really wonder why we are making such a big fuss about it here. It's not rocket science, and we are in the 21st century now.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: hutch-- on May 08, 2017, 09:16:48 PM
 :biggrin:

Well, user apps may be in the 21st century but compilers and assemblers are truly in the 20th century as they generally work only in ASCII.  :P It would not be wise to hold your breath waiting for compilers and assemblers to start working in UNICODE.  :badgrin:
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 08, 2017, 09:47:58 PM
No need to hold my breath - I just use UTF-8 :P
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 09:50:41 PM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 10:03:52 PM
I know it is hard to implement, but this board gets people from all over the world who can help.
I was reading about UTF-8. We take the first byte of a character in the source and check its leftmost bits: if the top bit is 0, we are dealing with ASCII and the character is 1 byte; if the byte starts with 110, the character is 2 bytes; if it starts with 1110, it is 3 bytes; if it starts with 11110, it is 4 bytes. Every byte that follows the first one starts with the bits 10.
1110.... 10...... 10......
Now suppose the source text suffered from bad blocks on the hard disk. When we open the text, the first byte starts with 10......; we know this cannot be a first byte, so we discard it as part of an identifier and check the next byte. It starts with 10...... too, so we discard it; the next one starts with 110, oops, we know we have found something valid.
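
The bit tests described here fit in a few lines. A minimal Python sketch, with hypothetical names sequence_length and resync; it is deliberately simplified and does not reject overlong or out-of-range sequences as a full RFC 3629 decoder must.

```python
def sequence_length(lead: int) -> int:
    """Length announced by a UTF-8 lead byte, or 0 for a continuation/invalid byte."""
    if lead < 0x80:                # 0xxxxxxx: ASCII
        return 1
    if lead >> 5 == 0b110:         # 110xxxxx
        return 2
    if lead >> 4 == 0b1110:        # 1110xxxx
        return 3
    if lead >> 3 == 0b11110:       # 11110xxx
        return 4
    return 0

def resync(raw: bytes, pos: int = 0) -> int:
    """Skip continuation bytes (10xxxxxx) until a possible lead byte is found."""
    while pos < len(raw) and (raw[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

# Simulate damage: drop the lead byte of the first character.
damaged = "日本".encode("utf-8")[1:]
start = resync(damaged)
print(start, damaged[start:].decode("utf-8"))
```

The two orphaned continuation bytes are skipped and decoding resumes at the next lead byte, exactly the recovery described above.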

I know one answer could be: "So, build your own preprocessor". I see your side too.

The philosophers of the Greek schools tell us that it is impossible to answer every question about the universe and nature using only math (calculus, trigonometry and logic).
How do scientists today try to prove their creations? Yes, using math.
It is hard to believe that this multiverse, with so much data still to be discovered, cannot be accessed by math. This is why physics can't join the micro and macro cosmos, I suppose. We need something new, but the guy who does it will be a heretic.
As a computer example, we live under a "rectangle tyranny" when talking about images.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 10:04:07 PM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 08, 2017, 10:17:25 PM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 08, 2017, 10:47:36 PM
hehehe sir nidud
sorry, the phrase is 'cannot be accessed only by using math'.
That's because the 'scientific method' was created by the Greek school. Guys from centuries ago modeled our way of life.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 09, 2017, 12:37:00 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 09, 2017, 03:44:31 AM
People use unicode a lot today because it became the default on the internet (web pages). UTF-8 is used a lot in chat programs, and they are trying to make it the default in emails too. I am not sure whether utf-8 covers all of unicode; if it does, it can be used as another layer.

I was able to copy different symbols from this topic and paste them into console mode under linux just to get a feel for it. I discovered that I have a font with runic symbols, and others that I can't recognize. So on linux we have a command interpreter (like command.com or cmd.exe) that accepts unicode/utf-8.

mineiro@assembly:~/asm$ ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:
ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:: comando não encontrado
mineiro@assembly:~/asm$    dec ΤΥΦΧΨΩ
mineiro@assembly:~/asm$    jnz ᛒᛓᛕᛋᛘᛚᛝᛟᛠ
mineiro@assembly:~/asm$       250,  "早上好计算机程序员。\0"
mineiro@assembly:~/asm$         251,  "おはようのコンピュータのプログラマー。\0"
mineiro@assembly:~/asm$         252,  "Хороший программист утром.\0"
mineiro@assembly:~/asm$         253,  "Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\0"
mineiro@assembly:~/asm$         254,  "सुप्रभात कंप्यूटर प्रोग्रामर.\0"
mineiro@assembly:~/asm$         255,  "Chào buổi sáng lập trình máy tính.\0"
mineiro@assembly:~/asm$         256,  "დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\0"
mineiro@assembly:~/asm$         257,  "Добро јутро компјутерски програмер.\0"
mineiro@assembly:~/asm$         258,  "Բարի լույս ծրագրավորող.\0"
mineiro@assembly:~/asm$         259,  "안녕하세요 컴퓨터 프로그래머.\0"

So, we don't need codepages to show different symbols. We need codepages only if we are in an ms-dos environment and think the way we have been talking here.


---edited--: Anyone?

8
7
6
5
4
3
2
1
a b c d e f g h

1- e2-e4

Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 09, 2017, 06:53:06 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 09, 2017, 06:29:30 PM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 09, 2017, 09:52:29 PM
yes sir nidud, I understand;
"Image is nothing, contents is everything"
Not all chars are printable in ascii; that's why in hexdumps we see a lot of dots in the ascii column next to the hex numbers. Take hex "08h", which means the 'backspace' key: if we print it, we eat one char on screen.

string db "0",08h,"1",08h,"2",08h,0dh,0ah,08h,08h,00h

Another example from memory: what's the visual difference between the space char (20h) and (0ffh)? In the ms-dos days both were drawn as an empty ('null') char.
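
To make the "dots in hexdumps" remark concrete, here is a minimal Python sketch of one hexdump line. The helper name `hexdump_line` is an assumption made for this illustration; the sample bytes are the ones from the `string db` line above.

```python
def hexdump_line(data: bytes) -> str:
    """Format bytes as hex plus an ASCII column; non-printable bytes become dots."""
    hex_part = " ".join(f"{b:02X}" for b in data)
    text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)
    return f"{hex_part}  {text_part}"


# "0",08h,"1",08h,"2",08h,0dh,0ah,08h,08h,00h
sample = bytes([0x30, 0x08, 0x31, 0x08, 0x32, 0x08, 0x0D, 0x0A, 0x08, 0x08, 0x00])
print(hexdump_line(sample))
```

Only the three digits survive in the text column; backspace, CR, LF and the terminating zero all show up as dots.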
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 09, 2017, 10:37:24 PM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 10, 2017, 03:11:20 AM
But I can read this topic on a cellphone, on a TV, on internet of things devices.
I don't know how to change the codepage on a TV, a cellphone, ..., only on PCs.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 10, 2017, 03:45:11 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: TWell on May 10, 2017, 04:20:59 AM
The problem is that DOS/Windows command.com/cmd.exe don't support UTF-8.
The conversion has to be done at some point. There is also an OEM/ANSI problem between CUI and GUI.
Windows NT is internally UNICODE (UTF-16LE).
ml understands only ASCII.
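
A small Python sketch of why a conversion is needed: the same letter has different bytes in each encoding. The choice of cp850 as the console (OEM) codepage and cp1252 as the GUI (ANSI) codepage is an assumption that matches a typical Western European Windows setup; other regions use other codepages. The helper name `encodings_of` is made up for this sketch.

```python
def encodings_of(ch: str) -> dict:
    """Encode one character in an OEM codepage, an ANSI codepage, UTF-8 and UTF-16LE."""
    return {name: ch.encode(name) for name in ("cp850", "cp1252", "utf-8", "utf-16-le")}


for name, raw in encodings_of("æ").items():
    print(f"{name:10} {raw.hex(' ')}")
```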
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 10, 2017, 04:50:03 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 10, 2017, 06:44:03 AM
Attached two simple plain Masm32 sources, with and without Utf8 BOM.
Open them in Notepad, Wordpad, MS Word and qEditor to see the differences (there are differences, it's a mess).

Notepad++ btw treats both as UTF-8, which is slightly incorrect.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 10, 2017, 07:19:18 AM
I am not sure, sir nidud;
I have only read about utf-8; I have not started to code a utf-8 string identifier yet.
Tomorrow I can start coding; well, maybe it will be done 2 weeks later, who knows.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 10, 2017, 08:00:28 AM
hello sir jj2007;
I remember that you talked about 2 bytes at the start of the file; I think they are inserted by Notepad and are not welcome.
What do those bytes mean? Can I ignore those 2 bytes, or can they be some hint?
What have you found out about them?
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 10, 2017, 08:11:00 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 10, 2017, 08:31:27 AM
Quote
What is it you can't see, whats missing?
The symbols/fonts do not match each other; they are different symbols.

Særleg != Særleg

Særleg == 7 symbols on screen
Særleg == 6 symbols on screen
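
One way a 6-letter word can turn into 7 symbols on screen is UTF-8 bytes being displayed through an 8-bit codepage. The Python sketch below reproduces that effect; the choice of cp1252 as the "wrong" codepage is an assumption made for the illustration, since the thread does not say which codepage was active.

```python
word = "Særleg"
raw = word.encode("utf-8")       # æ becomes the two bytes C3 A6
misread = raw.decode("cp1252")   # every byte is shown as one cp1252 symbol

print(len(word), len(raw), len(misread))
print(misread)
```

The text itself is unchanged on disk; only the interpretation of the two bytes for æ differs.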
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 10, 2017, 09:00:14 AM
Quote from: mineiro on May 10, 2017, 08:00:28 AM
What do those bytes mean? Can I ignore those 2 bytes, or can they be some hint?

These three bytes are hex EF BB BF, and they are the UTF-8 "BOM", i.e. byte order mark. There is a fairly good description here (https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8).

The main point of the BOM is that the software knows that the ANSI chars following are UTF-8 encoded. It is a priori not possible to identify with certainty whether an ANSI text is encoded as UTF-8 or with any other codepage. And only if you know the codepage, you can translate text to "true" Unicode for displaying e.g. MessageBoxes correctly. This is why everybody (well, almost everybody) uses UTF-8.
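
A minimal Python sketch of BOM handling, assuming a tool that reads the whole source file as raw bytes. The function name `strip_utf8_bom` is hypothetical; `codecs.BOM_UTF8` is the standard library constant for the three bytes EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes):
    """Return (had_bom, payload) for the raw bytes of a source file."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


print(strip_utf8_bom(b"\xef\xbb\xbf.code"))
print(strip_utf8_bom(b".code"))
```

Without a BOM the function can say nothing about the encoding, which is exactly the uncertainty described above.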
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 10, 2017, 09:02:48 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 10, 2017, 11:00:28 AM
hello sir jj2007;
I checked that. I opened Notepad, did not press any key on the keyboard and saved the text in utf-8 format; the size of the file is exactly those 3 bytes.
Exactly, thanks.

hello sir nidud;
I understood what you said.
Your computer is configured for your region, your country (codepage). Mine is configured for another region, another country (codepage).
To be able to read what you write with your language's symbols (alphabet letters), I need to know the codepage the text was written in, because under my codepage what you have written appears as garbage on screen. And this is mutual: from your point of view, to understand what I write in my language you need to switch your computer to the same codepage, or you will see garbage on screen.
So we start switching through all the codepages we know, in the hope that the garbage strings turn into recognizable symbol strings.
I don't understand your language, but I can recognize its symbols. Yet while switching codepages it may seem that the symbols of your language fit another language's symbols just as well, so I may believe that I have a French/Spanish/Russian text in my hands instead of a Norwegian text. We have lost information.

I suppose that on your screen the symbols below look equal instead of different:
Særleg != Særleg

edited--
changed "writed" to "written"; sorry for my poor English.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 10, 2017, 11:46:22 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 10, 2017, 12:39:46 PM
Quote from: nidud on May 10, 2017, 11:46:22 AM
Do you understand Norwegian?
No, but I can recognize Norwegian chars, letters, symbols.

Quote
I first need to learn you language I think  ;)
Yes, but you can recognize latin chars, letters, symbols.

Quote
Doesn't help if I don't understand the language. In this case we have similar letters, but Chines and Arabic, it doesn't really matter: I can't read it anyway.
Me too, but I can try to recognize their chars, letters, symbols.

Quote
For which purpose?
To turn the garbage data on screen into chars, letters, symbols, so that I can go and use a translator. If the translator returns nonsense words to me, I know that I'm not dealing with that codepage's language.

Quote
How likely do you think it is that one man could have written the sample below with the same keyboard/OS and understand all these languages?

      "早上好计算机程序员。\n"
      "おはようのコンピュータのプログラマー。\n"
      "Хороший программист утром.\n"
      "Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\n"
      "सुप्रभात कंप्यूटर प्रोग्रामर.\n"
      "Chào buổi sáng lập trình máy tính.\n"
      "დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\n"
      "Добро јутро компјутерски програмер.\n"
      "Բարի լույս ծրագրավորող.\n"
      "안녕하세요 컴퓨터 프로그래머.\n",
They may have used some escape char in the browser or text editor, hexadecimal codes, ... to insert those symbols, or a copy and paste operation. In the Firefox browser I was able to enter such chars with this sequence:
control + shift + u
then this appears: u
after that I typed the hexadecimal sequence 2654:
u2654
after that I pressed the Enter key and it turned into a valid symbol: a chess king (char, letter, symbol).


This is what I would like to test, because I read that utf-8 chars have a variable size in bytes, not just 2 bytes when extended chars are shown on screen.
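
The variable size is easy to check with a few code points. This Python sketch uses the chess king U+2654 from above plus three other sample code points chosen here only for illustration; the helper name `utf8_bytes` is an assumption.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8."""
    return chr(code_point).encode("utf-8")


for cp in (0x41, 0xED, 0x2654, 0x1F600):
    encoded = utf8_bytes(cp)
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The chess king needs three bytes (E2 99 94), so a scanner cannot assume that every non-ascii char is two bytes long.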
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 10, 2017, 07:55:35 PM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 10, 2017, 10:30:07 PM
hehehe, this is getting funny;

Well sir nidud, let's go back to the past, to the time when we were not programmers and had no computer skills.
So, as a normal user, my boss tells me to find a program for inventory control. I searched the internet and did not find one, in any language. But I have a creative mind: I looked at the screens of different programs and did not recognize the letters used in them, but I recognized the objects, controls like button, list view, text view, edit box, ... . So I can use a program like an appointment book as an inventory control, for example, instead of putting my fingers in the resources or changing resource items with a program like Resource Hacker.

You are saying that Russians, Chinese and Brazilians don't like to write comments in their source code in their mother languages. I'm telling you the contrary: I like to write source code comments in my mother language, because I have a much bigger vocabulary in it than in my poor English. Also, there are words in my language that don't exist in other languages. So, what's wrong with this? Why can't I create labels, function names and variables in my mother language? I need to remove an accent from a variable name to be compatible with the assembler, writing it the wrong way in my language just to be acceptable to the assembler.
Computers came to make our life easy, not hard.

The Chinese case is more difficult: they speak more than 3 different languages, plus dialects, inside China. Their alphabet cannot fit in a space of 256 symbols. So, let's exclude the Chinese? Let's change Chinese culture to be like ours? No.

Quote
So if we translate JWasm to Russian how would this be done? Well, I'm not capable of doing that, and assuming you don't understand Russian either we will need a Russian to do the actual translation. JWasm is a console application using ASCII strings, so this Russian fellow have to live in Russia, or at least use a Russian OS to write the ASCII strings.

Do we need to convert any of these strings to Unicode? No. Do we need to see these Russian ASCII strings? No. Do we need to use this version of JWasm? No, this is for Russian consumption only.

If you have searched for assembly source code in the last 10 years, you have met this board, but you have probably met the Russian wasm board too.
We are talking about symbols; the assembler doesn't need to care whether a label string was written in one form or another, to the assembler it is still an identifier.
The point is that with codepages I am not able to deal with more than 2 languages in the same source code. So I cannot translate a bible written in Aramaic to Latin and afterwards to Portuguese.
I recognize my faults, my errors. Just because I'm not able to speak different languages does not mean that others are like me.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 11, 2017, 01:03:52 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 11, 2017, 03:32:47 AM
hello sir nidud
yes, it's me   :biggrin:

Quote
No, I'm saying they do this all the time so that's a lie.
Yes, I agree with you, you did not say that. My fault. Sorry. I did not express myself the way I wanted.

Quote
Not sure what you trying to say here but if you read back your own posts it's clear that you understand most of the technical stuff and the limitations with regards to solve it. I think the problem is that you mixing programming language, which in itself is limited, with other types of software that don't have these limitations.
The limit of a programming language is our mind.
If I open an image editing program, I can program in assembly language. I need to know how to calculate addresses in my head and pick the 'color' that matches each hex number. A real example: I open an image editor in grayscale mode, put color 'nop' (90h) in the first position, color 'int' (0cdh) in the second position, and color 20h in the third position. Then I export that 'image' to disk as raw data, rename it to .com, and it works. "Cognitive parallax".

If I talk today about a 'time machine', what do people understand?
They understand that we can travel in time/space.
The real meaning of time machines was to try to predict the next seasons: when to plant, when to grow, when the rains come.

-edited-
https://www.youtube.com/watch?v=7Y_SQBdVHQk
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 12, 2017, 03:43:47 AM
hello sir nidud;
this is what I have done
UTF-8 PSEUDOCODE

if first byte is between <00h to 7fh> then it is ascii (<= 7fh)
if first byte is between <0c0h to 0dfh> then one more byte follows
if second byte is between <80h to 0bfh> it is a valid char
if first byte is between <0e0h to 0efh> then two more bytes follow
if second and third bytes are between <80h to 0bfh> it is a valid char
if first byte is between <0f0h to ...h> then three more bytes follow
if second, third and fourth bytes are between <80h to 0bfh> it is a valid char
...
110b = 2, 1110b = 3, 11110b = 4, 111110b = 5 bytes , ...
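
The length rule in the last line of the pseudocode (count the leading 1 bits of the first byte) can be sketched in Python as follows. The function name is made up for this sketch. Note that it follows the pseudocode literally, so it still reports 5 or more bytes for first bytes such as 0f8h, although current UTF-8 (RFC 3629) stops at 4 bytes.

```python
def utf8_length_from_lead(byte: int) -> int:
    """Sequence length announced by a first byte, or 0 if it cannot start a char."""
    ones = 0
    mask = 0x80
    while mask and byte & mask:  # count leading 1 bits, most significant bit first
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1   # 0xxxxxxx: plain ascii
    if ones == 1:
        return 0   # 10xxxxxx: continuation byte, never a first byte
    return ones    # 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4, ...


for b in (0x41, 0xC3, 0xE1, 0xF0, 0x80):
    print(f"{b:02X}h -> {utf8_length_from_lead(b)}")
```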

note:
Using ascii chars inside raw text files (.txt, not structured), the hex values below are not possible:
00h,01h,02h,03h,04h,05h,06h,07h,08h,   ,   ,0bh,0ch,   ,0eh,0fh ;07h is bell, 08h is backspace, 0bh is vertical tab
10h,11h,12h,13h,14h,15h,16h,17h,18h,19h,1ah,1bh,1ch,1dh,1eh,1fh ;1bh (27 decimal) is the escape key
                                                           ,7fh ;7fh is delete
The hex values above are useful to text editors to keep the text viewer and the text buffer in sync.

09h is tab, 0ah is line feed, 0dh is carriage return ;abstraction: mechanical typewriter
LF happens when we press the arm of the typewriter to feed the paper, rolling it by putting pressure on the rotor
CR happens when we move the pressed typewriter arm fully back
TAB happens when we control how far the arm moves
SPACE happens when the arm moves one step forward
I have excluded the bell sound that rings when the arm reaches the end of the paper

Conclusion:
utf-8 does not use bytes in the range (80h to 0bfh) as first byte
      does not use bytes in the ranges (00h to 7fh) and (0c0h to 0ffh) as second, third or fourth byte

cr,lf,tab,space are globally valid chars


So, if we preserve the whole ascii table (including the codepage chars above 7fh), we cannot tell the encodings apart with certainty.
A scanner example: search for space, tab, cr, lf (non-printable chars). Between ascii and unicode it is easy: a space is 0020h or 2000h (depending on little/big endian byte order) in unicode and 20h in ascii. But in utf-8 and ascii they are the same byte.
We can try to predict the text encoding by utf-8 rules; the language entropy may or may not fit those rules.
início:      utf8 í == 0c3h 0adh
that is: ascii (in) utf8 (í) ascii (cio)

That word, saved as ascii (codepage), will not fit utf8 rules (language entropy).
So, to improve the prediction we can also look for the excluded ascii chars (00h,01h,...), because they are not possible in text files, although assemblers accept them.
But this can still fail, on language entropy and on the excluded ascii chars.

So, to solve this problem you can create a command line switch to tell the assembler whether the input source file is unicode/utf8/ascii. Default: ascii
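
The prediction described above can be sketched in Python by letting the strict UTF-8 decoder apply the rules. This is a heuristic sketch, not a definitive detector: the function name `guess_encoding` and the three labels it returns are assumptions, and a short codepage text can still pass as valid UTF-8 by coincidence, which is why an explicit command line switch remains the safe option.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw source bytes as 'ascii', 'utf-8' or '8-bit codepage'."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict mode raises on any ill-formed sequence
        return "utf-8"
    except UnicodeDecodeError:
        return "8-bit codepage"


print(guess_encoding("início".encode("utf-8")))
print(guess_encoding("início".encode("cp1252")))  # cp1252 is only an example codepage
print(guess_encoding(b"inicio"))
```

In the cp1252 case the single byte 0edh for í is followed by the ascii byte for "c", which breaks the UTF-8 rules, so the word is classified as codepage text.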

edit--
minor correction on pseudocode
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 12, 2017, 06:18:56 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 12, 2017, 08:05:15 AM
hello sir nidud;
I am still trying to make you think about this:
início db "início" ; início is a label, a string (ascii or utf8) and a comment

Quote from: nidud on May 12, 2017, 06:18:56 AM
Given we have no idea whats above 0x7F we just have to accept all chars above as done in the table example.
We have an idea, no? That (7fh) is the delete key on the keyboard. Ascii rules.
OK, above that we can't conclude anything.

Quote
I will assume the the main goal here is to be able to use the Portuguese language included chars above 127
Thank you for the personalized version, but I'm not thinking only of the Portuguese language, I'm thinking about all languages.

Særleg == 6 symbols on screen
ascii (S) utf8 (æ) ascii (rleg)
the entropy of that word in your language under utf8 rules is:
"Særleg" fits the utf8 rules, so it is a valid utf8 text string,
but if you save the word "Særleg" on your computer in ascii (codepage) mode only, it cannot fit the utf8 rules, so it is an ascii string.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 12, 2017, 08:41:12 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 12, 2017, 09:17:47 AM
well sir nidud, I hope I'm not disturbing you, ok.

mineiro@assembly:~/.wine/drive_c$ wine asmc /pe test.asm
Asmc Macro Assembler Version 2.24G
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.

Assembling: test.asm
fixme:ntdll:find_reg_tz_info Can't find matching timezone information in the registry for bias 180, std (d/m/y): 19/02/2017, dlt (d/m/y): 15/10/2017
test.asm(12) : error A2008: syntax error : S
test.asm(15) : error A2167: unexpected literal found in expression : ício:
test.asm(16) : error A2206: missing operator in expression
test.asm(16) : error A2033: invalid INVOKE argument : 0
test.asm(19) : error A2008: syntax error : início
test.asm(19) : error A2088: END directive required at end of file
mineiro@assembly:~/.wine/drive_c$

asmc does not understand utf8 text files
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 12, 2017, 09:30:38 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 12, 2017, 10:44:20 AM
I think you have uploaded the wrong version. Of the 3 downloads, 2 are mine, just to re-check.
Also, going by the date/time included, this file was built today.
mineiro@assembly:~/.wine/drive_c$ ls asmc
asmc224G-mineiro.zip  asmc.exe
mineiro@assembly:~/.wine/drive_c$ ls asmc.exe -sal
296 -rw-rw-r-- 1 mineiro mineiro 303104 Mai 11 14:39 asmc.exe
mineiro@assembly:~/.wine/drive_c$ wine cmd.exe
Versão do CMD Wine 5.1.2600 (1.6.2)

C:\>asmc
Asmc Macro Assembler Version 2.24G
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.

USAGE: ASMC [ options ] filelist
Use option /? for more info

C:\>dir asmc.exe
O volume na unidade C não tem rótulo.
Número de Série do Volume é 0000-0000

Directory of C:

11/5/2017     14:39       303,104  asmc.exe
       1 file                   303,104 bytes
       0 directories     95,769,042,944 bytes free


C:\>
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 13, 2017, 09:33:48 PM
The pseudocode posted before works but can accept invalid unicode chars. After rewriting the code 3 times I arrived at this version, not optimized:

include \masm32\include\masm32rt.inc

.686
.xmm

predict_txt PROTO :dword, :dword
is_valid_utf8_encode PROTO :dword, :dword

.data
pbuffchar dd 2 dup (0)

.data?
houtput dd ?
pfile dd ?
szfile dd ?
temp dd ?

.code
start:
invoke GetStdHandle,STD_OUTPUT_HANDLE
mov houtput,eax

mov pfile,InputFile("utf8.txt")
mov szfile,ecx

invoke predict_txt,pfile,szfile

free pfile
inkey "Done..."
invoke ExitProcess, 0

align 16
predict_txt proc _pfile:DWORD,_szfile:DWORD

mov esi,_pfile
mov ecx,_szfile

next_char:
invoke is_valid_utf8_encode,esi,ecx
;returns 0 if invalid utf8 char
;returns -1 if bound error
;returns sizeof utf8 char otherwise

.if eax == -1 ;abort
jmp quit
.elseif eax == 0 ;invalid utf8, so this byte may belong to an extended ascii (codepage) table
mov eax,1 ;skip only this byte; with eax == 0 esi would never advance and the loop would never end
.elseif eax == 1 ;valid ascii char
;check identifier delimiters
.else ;valid utf8 char, check BOM

.endif
next:
sub ecx,eax
add esi,eax

test ecx,ecx
jnz next_char

quit:
ret
predict_txt endp

align 16
;this function checks for a valid utf8 char in text
;returns:
;-1 if bound error (not enough bytes left in text)
;0 if not valid utf8
;sizeof utf8 char otherwise
is_valid_utf8_encode proc uses esi ecx _ptext:dword,_sztext:dword

LOCAL szbytes:dword

mov szbytes,-1
mov esi,_ptext
mov ecx,_sztext

test ecx,ecx ;eof?
jz done

mov szbytes,0
movzx eax,byte ptr [esi] ;read the first byte
@@: ;count the leading 1 bits of the first byte
inc szbytes
shl al,1 ;0??????? 110????? 1110???? 11110??? 111110?? 1111110? 11111110
jc @B
.if szbytes > 1 ;the loop counted leading ones + 1, so 110????? gave 3 instead of 2
dec szbytes
.endif

.if szbytes > 4 ;not a valid utf8 encoding
mov szbytes,0
jmp done
.endif
.if szbytes > ecx
mov szbytes,-1 ;not enough bytes left to be read
jmp done
.endif

mov szbytes,0
movzx eax,byte ptr [esi] ;read one byte
.if eax <= 07fh
mov szbytes,1
jmp done
.elseif eax >= 0c2h && eax <= 0dfh
.if byte ptr [esi+1] >= 80h && byte ptr [esi+1] <= 0bfh
mov szbytes,2
jmp done
.endif


.elseif eax == 0e0h
.if byte ptr [esi+1] >= 0a0h && byte ptr [esi+1] <= 0bfh
.if byte ptr [esi+2] >= 80h && byte ptr [esi+2] <= 0bfh
mov szbytes,3
jmp done
.endif
.endif
.elseif eax >= 0e1h && eax <= 0ech
.if byte ptr [esi+1] >= 80h && byte ptr [esi+1] <= 0bfh
.if byte ptr [esi+2] >= 80h && byte ptr [esi+2] <= 0bfh
mov szbytes,3
jmp done
.endif
.endif
.elseif eax == 0edh
.if byte ptr [esi+1] >= 80h && byte ptr [esi+1] <= 09fh
.if byte ptr [esi+2] >= 80h && byte ptr [esi+2] <= 0bfh
mov szbytes,3
jmp done
.endif
.endif
.elseif eax >= 0eeh && eax <= 0efh
.if byte ptr [esi+1] >= 80h && byte ptr [esi+1] <= 0bfh
.if byte ptr [esi+2] >= 80h && byte ptr [esi+2] <= 0bfh
mov szbytes,3
jmp done
.endif
.endif


.elseif eax == 0f0h
.if byte ptr [esi+1] >= 90h && byte ptr [esi+1] <= 0bfh
.if byte ptr [esi+2] >= 80h && byte ptr [esi+2] <= 0bfh
.if byte ptr [esi+3] >= 80h && byte ptr [esi+3] <= 0bfh
mov szbytes,4
jmp done
.endif
.endif
.endif

.elseif eax >= 0f1h && eax <= 0f3h
.if byte ptr [esi+1] >= 80h && byte ptr [esi+1] <= 0bfh
.if byte ptr [esi+2] >= 80h && byte ptr [esi+2] <= 0bfh
.if byte ptr [esi+3] >= 80h && byte ptr [esi+3] <= 0bfh
mov szbytes,4
jmp done
.endif
.endif
.endif

.elseif eax == 0f4h
.if byte ptr [esi+1] >= 80h && byte ptr [esi+1] <= 8fh
.if byte ptr [esi+2] >= 80h && byte ptr [esi+2] <= 0bfh
.if byte ptr [esi+3] >= 80h && byte ptr [esi+3] <= 0bfh
mov szbytes,4
jmp done
.endif
.endif
.endif
.endif

done:
mov eax,szbytes
ret
is_valid_utf8_encode endp

end start

edited= inserted -1 error
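
For comparison, the same range table can be written compactly in Python. This is a sketch that mirrors the byte ranges used in the MASM code above, not a port of it: the names `RANGES` and `utf8_char_size` are made up here, and unlike the MASM version a sequence cut off by the end of the data is reported as 0 (ill-formed) rather than -1.

```python
# (range of first byte, allowed range of second byte, total length)
# Third and fourth bytes are always 80h..0BFh.
RANGES = [
    ((0xC2, 0xDF), (0x80, 0xBF), 2),
    ((0xE0, 0xE0), (0xA0, 0xBF), 3),
    ((0xE1, 0xEC), (0x80, 0xBF), 3),
    ((0xED, 0xED), (0x80, 0x9F), 3),
    ((0xEE, 0xEF), (0x80, 0xBF), 3),
    ((0xF0, 0xF0), (0x90, 0xBF), 4),
    ((0xF1, 0xF3), (0x80, 0xBF), 4),
    ((0xF4, 0xF4), (0x80, 0x8F), 4),
]


def utf8_char_size(data: bytes, pos: int) -> int:
    """Size of the UTF-8 char at pos, 0 if ill-formed, -1 if pos is at the end."""
    if pos >= len(data):
        return -1
    first = data[pos]
    if first <= 0x7F:
        return 1
    for (lo, hi), (lo2, hi2), size in RANGES:
        if lo <= first <= hi:
            tail = data[pos + 1:pos + size]
            if len(tail) != size - 1:
                return 0  # sequence is cut off by the end of the data
            if not lo2 <= tail[0] <= hi2:
                return 0
            if any(not 0x80 <= b <= 0xBF for b in tail[1:]):
                return 0
            return size
    return 0  # 80h..0C1h and 0F5h..0FFh can never start a char


print(utf8_char_size("ሴ".encode("utf-8"), 0))
```

The special second-byte ranges after 0e0h, 0edh, 0f0h and 0f4h are what reject overlong forms, surrogates and code points above U+10FFFF.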
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 14, 2017, 12:23:39 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 14, 2017, 08:27:54 AM
hello sir nidud;
I think that doesn't work because it uses spaces, and spaces are identifier delimiters within a scope, although inside a string construction " " they are valid.

; Build: asmc /pe test.asm
.486
.model   flat, c
option   dllimport:<msvcrt.dll>

printf   proto :ptr, :vararg
exit   proto :dword

.data
dd 10

.code
↑:
dec   
jnz ↑
printf("%d\n",)
exit(0)

end   ↑
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 14, 2017, 10:05:48 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 14, 2017, 12:00:14 PM
Quote
My suggestion was to use a word processor to write documents and programming tools for programming.

Can you tell me what tools you are using for programming? An assembler that understands only ascii and not utf8 symbols? A text editor that understands only ascii and not utf8? A debugger that cannot display utf8 symbols? A hex editor that cannot show utf8 symbols?

utf8 text files are plain like ascii ones, so, raw txt.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 14, 2017, 10:14:59 PM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: TWell on May 14, 2017, 11:27:29 PM
Quote
No assemblers or compilers understand UTF-8.
Not true, many C compilers understand UTF-8, for example PellesC, gcc, clang.
Quote
Pelles C for Windows (poide64.exe in this case) do the same
Only if someone doesn't select UTF-8 before saving, as ANSI is the default format.

Maybe this kind of thing needs a new topic.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 15, 2017, 12:17:04 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 15, 2017, 02:33:36 AM
Quote from: nidud on May 15, 2017, 12:17:04 AM
Nonsense, no programming tools understands UTF-8

include \masm32\include\masm32rt.inc
.code
start:
  tmp$ CATSTR <print cfm$("This code was assembled with >, @Environ(oAssembler), <, an incredibly old programming tool\n")>
  tmp$
  print "This assembler does not 'understand' UTF-8, but it can use it", 13, 10
  print "Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser", 13, 10
  print "Este montador não 'entende' UTF-8, mas pode usá-lo", 13, 10
  inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10
  exit
end start


Output:
This code was assembled with MLv614, an incredibly old programming tool
This assembler does not 'understand' UTF-8, but it can use it
Cet assembleur ne 'comprend' pas UTF-8, mais il peut l'utiliser
Este montador não 'entende' UTF-8, mas pode usá-lo
Этот ассемблер не 'понимает' UTF-8, но может его использовать
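
Why an assembler can "use" UTF-8 without understanding it: every byte of a multi-byte UTF-8 char is 80h or above, so it can never be mistaken for the ASCII quote that ends a string. The Python sketch below shows a byte-oriented scanner that only looks for the quote byte; it is a simplified illustration (no single quotes, no escaped quotes) and the function name `quoted_strings` is an assumption, not how ml actually tokenizes.

```python
def quoted_strings(source: bytes):
    """Yield the raw bytes found between double quotes, scanning byte by byte."""
    inside = False
    start = 0
    for i, b in enumerate(source):
        if b == 0x22:  # the ASCII double quote
            if inside:
                yield source[start:i]
            else:
                start = i + 1
            inside = not inside


line = 'print "Este montador não entende UTF-8", 13, 10'.encode("utf-8")
for raw in quoted_strings(line):
    print(raw.decode("utf-8"))
```

The scanner copies the bytes through untouched, and whatever displays them later decides whether they are read as UTF-8.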


Quote
as explained ad nostrum

It's 'ad nauseam'.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 15, 2017, 03:09:25 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 15, 2017, 03:31:52 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: jj2007 on May 15, 2017, 03:38:43 AM
Quote from: nidud on May 15, 2017, 03:31:52 AM
Do you normally use UTF-8 "quoted text" declarations in source code?

No. They are normally declared in resource (RC) files.

You can stick to your clumsy old tools and habits, no problem.

I use
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10

because I live in the 21st Century.
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 15, 2017, 03:49:16 AM
deleted
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: HSE on May 15, 2017, 04:04:40 AM
Quote from: jj2007 on May 15, 2017, 03:38:43 AM

I use
inkey "Этот ассемблер не 'понимает' UTF-8, но может его использовать", 13, 10

With such self-explanatory programs no help is needed in Italy. :biggrin:
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: mineiro on May 15, 2017, 07:22:09 AM
I'm back  :biggrin:

Quote from: nidud on May 15, 2017, 03:31:52 AM
What do we need UTF-8 for in source code?
We need it if the user's language cannot fit in the 256 values of an 8-bit ascii table.
Or if the user has written source code in some codepage and wants to exchange it with other people around the world, without them seeing garbage on screen when they open it. With utf8 (unique chars) what you send is exactly what others get, instead of being recognizable only after they switch to the same codepage.
Quote
Do you normally use UTF-8 "quoted text" declarations in source code?
I'm writing a translator and have started my attack on 'quoted text'. I can have only 3 contexts or scopes: one is string construction, another is code construction (variables, ...) and the last is comments (ignored chars).
This way I can create a translator that is useful for any assembler in this world, including the ones forgotten in the ms-dos world, like TMA.
ሴ db "ሴ"
In the scenario above I can translate the variable name ሴ to ascii hex, because a utf8 encoding is unique, so the identifiers stay unique from the assembler's point of view. Now we are dressing utf8 as hex ascii:
utf8_e188b4 db "ሴ"
On another pass, I will play with the string construction scope.
utf8_e188b4 db 0e1h,88h,0b4h        ;u1234
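
The two passes above can be sketched in Python. Both function names are hypothetical and the rule used here (mangle the whole identifier when it contains any non-ascii char) is an assumption for the illustration; the thread only shows the single-char case.

```python
def mangle_identifier(name: str) -> str:
    """Replace a non-ascii identifier by 'utf8_' plus the hex of its UTF-8 bytes."""
    if name.isascii():
        return name
    return "utf8_" + name.encode("utf-8").hex()


def db_operands(text: str) -> str:
    """Turn a string into a MASM style byte list for a db directive."""
    parts = []
    for b in text.encode("utf-8"):
        h = f"{b:02x}h"
        # MASM hex literals must start with a decimal digit, hence the leading 0
        parts.append(h if h[0].isdigit() else "0" + h)
    return ",".join(parts)


print(f"{mangle_identifier('ሴ')} db {db_operands('ሴ')}")
```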
Quote
What about console/text based applications?
In a linux terminal (the equivalent of the windows console) we can play with utf8 and unicode chars. When we 'cat utf8.txt' (type utf8.txt) we do not see garbage on screen.
Quote
Should assemblers be able to detect the UTF-8 header (BOM)?
This is optional, but checking for the utf8 mark or the unicode mark is good, also for future options like utf16, ... . But keep in mind that linux editors do not insert a BOM in files; I have seen it in my tests only with Notepad (windows universe).

The real point, which I can now feel better, is:
We have a lot of transformation functions that operate on numbers: hex2ascii, bin2ascii, hexascii2hex, ... . We do not have transformation functions that deal with chars, functions like utf8_2_unicode, unicode_2_utf8, utf8_2_asciihex, ... .

So, why not create an assembler that stays useful ad infinitum? As you can see, utf8 is just a dress for chars; it will not change any assembler syntax, because all the syntax I have found uses only chars <= 7fh from the ascii table.
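
As a sketch of what such char transformation functions do, here they are in Python, leaning on the built-in codecs. The function names follow the ones suggested above but are otherwise made up, and "unicode" is taken to mean UTF-16LE as used by Windows, which is an assumption.

```python
def utf8_2_unicode(data: bytes) -> bytes:
    """UTF-8 bytes to UTF-16LE bytes."""
    return data.decode("utf-8").encode("utf-16-le")


def unicode_2_utf8(data: bytes) -> bytes:
    """UTF-16LE bytes to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")


def utf8_2_asciihex(data: bytes) -> str:
    """UTF-8 bytes to a plain ascii hex string."""
    return data.hex()


raw = "ሴ".encode("utf-8")
print(utf8_2_asciihex(raw), utf8_2_unicode(raw).hex(" "))
```

An assembly version would need the same three steps as the decoder in this thread: find the char size, collect the payload bits, and write them out in the target layout.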
Title: Re: How to generate an Unicode string under MASM 6.15?
Post by: nidud on May 15, 2017, 08:57:35 AM
deleted