Re: How to generate an Unicode string under MASM 6.15?

Started by nidud, May 05, 2017, 11:15:13 PM


nidud

#45
deleted

mineiro

Because of ambiguity: with codepages, the same hexadecimal number represents different symbols, instead of one hexadecimal combination representing exactly one symbol.
Take the letter "ê", for example: I can use the extended ASCII symbol "ê" (your point of view, I suppose) or I can use the UTF-8/Unicode symbol "ê".
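
To make the ambiguity concrete, here is a minimal Python sketch (not MASM, and not from anyone's post). The helper name decode_byte is made up for illustration; the three codepages are just examples of what a machine might be configured with.

```python
import unicodedata

def decode_byte(value: int, codepage: str) -> str:
    """Interpret one byte value under the given codepage (hypothetical helper)."""
    return bytes([value]).decode(codepage)

# The same byte, 0xEA, is a different symbol under each codepage.
for codepage in ("cp1252", "cp1251", "cp437"):
    symbol = decode_byte(0xEA, codepage)
    print(codepage, unicodedata.name(symbol))
# cp1252 LATIN SMALL LETTER E WITH CIRCUMFLEX
# cp1251 CYRILLIC SMALL LETTER KA
# cp437 GREEK CAPITAL LETTER OMEGA

# In UTF-8 the letter has one encoding, whatever codepage the machine uses.
print("ê".encode("utf-8").hex())  # c3aa
```

The symbol names are printed instead of the symbols themselves so that the output does not depend on the console's own codepage.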

Well, a good example is how hexadecimal numbers might be understood from a Russian point of view. I only read up on the Russian alphabet years ago to understand the meaning of those symbols, and I concluded that we can swap many Russian symbols for Latin characters to get a rough idea of what is written in Russian, from a very noob point of view. So, yes, I can't read Russian.
They start by learning that writing hexadecimal numbers requires extra numeric symbols, so letters of the alphabet were added to fill out the possible combinations in base 16.
This is the point: for us the default is "0123456789abcdef", with the "numbers" a to f, but from another perspective this could be "0123456789аб?диф", where "?" means I don't know the Russian letter for c, and I'm not sure this is a real example.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

nidud

#47
deleted

mineiro

We are talking about the same thing, sir nidud.
I suppose you're not able to create a variable, a label or a function name using runic symbols in an assembler.

jj2007

Quote from: nidud on May 08, 2017, 01:13:18 AM
This is the table used to define a label in Asmc

So it would be relatively easy to use the full 200+ chars for creating labels, at the (small) price of losing compatibility with ML, of course.

But would the code be readable on another user's machine if he uses a different codepage?

My ambition here is different, it is to allow coders to create strings in all alphabets: Print "Добро пожаловать" without using clumsy workarounds. All coders understand what Print or start: means, no need to translate that into Russian or Chinese or Arabic. It would actually make our lives more difficult, because 99% of the available help on the Internet is in English. You don't get much help if you search for the keyword Печать (actually, Google is clever enough to translate it - but you get English help back...).

But the text to be processed is for the users of our programs. And the comments are for ourselves, we should be able to insert them in whatever alphabet we like:

include \masm32\include\masm32rt.inc

.data?
buffer db 1000 dup(?)

.code
start:
  mov edi, offset buffer
  invoke crt_printf, chr$("Введите строку: ") ; получить пользовательский ввод
  invoke crt_scanf, chr$("%s"), edi
  invoke crt_printf, chr$("Вы набрали строку «%s»."), edi
  invoke crt__getch
  exit

end start


Output:
Введите строку: Hello
Вы набрали строку «Hello».

Attached as plain text, plain Masm32 example. You can build it in qEditor (it looks better in RichMasm, though).
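
For comparison, one workaround for an assembler that only accepts 8-bit source text is to spell out the UTF-16 code units by hand in a dw line. The Python sketch below generates such a line. The function name masm_wide_string and the label msg are made up for illustration, and this is only an assumption about what such a workaround could look like, not something taken from this thread.

```python
def masm_wide_string(label: str, text: str) -> str:
    """Build a MASM 'dw' line holding text as zero-terminated UTF-16 code units.

    Hypothetical helper: every value is written as hex with a leading 0 so
    that the literal always starts with a digit, as MASM requires.
    """
    raw = text.encode("utf-16-le")
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    items = [f"0{unit:04X}h" for unit in units] + ["0"]  # trailing 0 terminator
    return f"{label} dw " + ", ".join(items)

print(masm_wide_string("msg", "Добро"))
# msg dw 00414h, 0043Eh, 00431h, 00440h, 0043Eh, 0
```

A long string would probably have to be split over several dw lines, which this sketch does not attempt.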

nidud

#50
deleted

nidud

#51
deleted

mineiro

It is possible if you swap out (remove) abc in favor of runic symbols. You have one solution, which is to use a codepage, and I agree it is necessary, but there are other solutions.
OK, let's not remove abc, and instead put the runic symbols in the upper half of the ASCII table (above 127). Now we have a problem: we can't mix different symbol sources, so it is impossible to deal with different languages in the same source code. Unicode/UTF-8 appears as the solution.
I know that fonts are symbols and that they can be represented by hexadecimal numbers, so I could switch the codepage, or, well, I don't need to switch the codepage because I can change only the symbols (the font). I understand what you wrote.

From an assembler's point of view, labels are just identifiers, anchors that refer to a memory address. Label identifiers are made of symbols followed by a colon suffix. It's possible to create an assembler that recognizes Unicode/UTF-8 text as identifiers.
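
As a rough sketch of that idea, here is a tiny label recognizer in Python. The function parse_label is hypothetical, and it borrows Python's own Unicode identifier rule (str.isidentifier) as a stand-in for the character table a real assembler would use, so MASM-specific characters such as @ or ? are not accepted here.

```python
from typing import Optional

def parse_label(line: str) -> Optional[str]:
    """Return the label name if the line has the form 'name:', else None.

    Simplification: only a label alone on its line (plus an optional
    ';' comment) is recognized, not 'name: mov eax, 1'.
    """
    code = line.split(";", 1)[0].strip()  # drop a trailing ; comment
    if not code.endswith(":"):
        return None
    name = code[:-1].strip()
    # Assumption: Python's identifier rule replaces the assembler's own table.
    return name if name.isidentifier() else None

print(parse_label("start:"))      # start
print(parse_label("mov eax, 1"))  # None
print(parse_label("метка: ; комментарий") == "метка")  # True
print(parse_label("ᚠᚢᚦ:") == "ᚠᚢᚦ")                    # True
```

Nothing here depends on a codepage: the source is read as Unicode text, so Latin, Cyrillic and runic labels can sit in the same file.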

You showed a font example; I will show a time example:
I tried to be in the year 1500ac. On the Windows O.S., I double-clicked the clock to open the time settings. I entered the year 1500 and, hmmm, I can't set this year. But I saw another solution: I can go to the BIOS and change it there. So I tried that and reached a hmmm again: I can't set the year 1500 in the BIOS setup either. How can we solve this chronological problem? Because O.S. filesystems store each file's creation time, and with a simple 'dir' or 'ls' I can see these file dates and use them as a filter.
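
One possible reason the year 1500 cannot be stored, assuming the value ends up in a Windows FILETIME: that format counts 100-nanosecond intervals since 1 January 1601 in an unsigned 64-bit value, so earlier dates have no representation. The Python sketch below only illustrates the arithmetic; the function to_filetime is made up, and the clock dialog and the BIOS have their own, narrower limits that it does not model.

```python
from datetime import datetime, timedelta

# FILETIME epoch: 1 January 1601 (time zones are ignored in this sketch).
FILETIME_EPOCH = datetime(1601, 1, 1)

def to_filetime(moment: datetime) -> int:
    """Count 100-nanosecond ticks between the FILETIME epoch and moment."""
    return (moment - FILETIME_EPOCH) // timedelta(microseconds=1) * 10

print(to_filetime(datetime(1601, 1, 1, 0, 0, 1)))  # 10000000 (one second)
print(to_filetime(datetime(1500, 1, 1)) < 0)       # True: does not fit an unsigned field
```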

jj2007

Quote from: nidud on May 08, 2017, 09:46:48 AM
Why would it be clumsy for the Russians to write in their own native language?

I write regularly in four different languages. Do I need four PCs now?

Why does every single professional software on Windows work with either UTF-8 or Unicode, except assembly IDEs?

nidud

#54
deleted

mineiro

Yes, but from the assembler's point of view we don't have fonts, we have encoding schemes.
It's just the same thing as translating mnemonic strings to opcodes, but seen from the opcode-to-string direction.

nidud

#56
deleted

nidud

#57
deleted

mineiro

Yes, I agree. The user needs to have that font installed to write that source code in a text editor.
But if an assembler opens that text file and looks only at the hexadecimal numbers, it can recognize the size of that Unicode/UTF-8 string and in this way know that the data is an identifier.
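
The size really can be read from the raw bytes alone, because in UTF-8 the first byte of each character announces how many bytes belong to it. A minimal Python sketch follows; the function names are made up for illustration, and continuation bytes are not validated.

```python
from typing import List

def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 character occupies, given its first byte."""
    if lead < 0x80:
        return 1                  # plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 character")

def split_utf8(data: bytes) -> List[bytes]:
    """Cut a byte string into per-character chunks using only the lead bytes."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks

print([chunk.hex() for chunk in split_utf8("aêᚠ".encode("utf-8"))])
# ['61', 'c3aa', 'e19aa0']
```

No font is involved at any point: the scanner works on numbers only, as the post says.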

You know that 90% of people in this world have conservative minds; I think that for assembly programmers this number is higher than 100%. Look at your excellent work on the table optimization feature: how many people use it? We don't see many people using it because we are conservative. Nothing will change, as we know; this board uses English as its mother language, as I have said before. It's just an option or feature that could be added. If a person came to this board writing in runic, I would not be able to help.

Well, this is a good discussion, sir nidud.

jj2007

Quote from: nidud on May 08, 2017, 11:33:13 AM
Quote
Why does every single professional software on Windows work with either UTF-8 or Unicode, except assembly IDEs?

Could you name one please.

RichMasm? ;)