Re: How to generate an Unicode string under MASM 6.15?

Started by nidud, May 05, 2017, 11:15:13 PM


nidud

#60
deleted

nidud

#61
deleted

jj2007

Quote from: nidud on May 08, 2017, 08:37:19 PM
QuoteRichMasm? ;)

So RichMasm does not work with either UTF-8 or Unicode?

It works perfectly, as the example shows. And all major software like MS Office, LibreOffice, Skype, Telegram, whatever, have absolutely no problem working with English, Chinese and Arabic simultaneously. So I really wonder why we are making such a big fuss about it here. It's not rocket science, and we are in the 21st century now.

hutch--

 :biggrin:

Well, user apps may be in the 21st century but compilers and assemblers are truly in the 20th century as they generally work only in ASCII.  :P It would not be wise to hold your breath waiting for compilers and assemblers to start working in UNICODE.  :badgrin:

jj2007

No need to hold my breath - I just use UTF-8 :P

nidud

#65
deleted

mineiro

I know it's a hard job to implement, but this board receives people from all over the world who can help.
I was reading about UTF-8. We take the first byte of a char in the source code and check its leftmost bits: if the leftmost bit is 0, it is plain ASCII, a one-byte char; if the byte starts with 110, two bytes form one char; if it starts with 1110, three bytes form one char. After the first byte, every byte that follows starts with the bits 10.
1110.... 10...... 10......
So, suppose a text source file suffered from bad blocks on the hard disk. When we open that text and the first byte starts with 10......, we know it cannot be a first byte, so we can discard it and check the next byte. If that one starts with 10...... too, we discard it as well; if the next one starts with 110, oops, we know we have found something valid.
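
The lead-byte check and the resynchronisation step described above can be sketched like this. It is only an illustration in Python, not code from any assembler; the names `utf8_sequence_length` and `resync` are made up for this example, and the 11110 (four-byte) case is added from the UTF-8 definition. It does not reject overlong or out-of-range lead bytes.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead byte.

    Returns 0 if the byte cannot start a character, i.e. it is a
    continuation byte (10xxxxxx) or not a valid lead pattern.
    """
    if (lead & 0x80) == 0x00:   # 0xxxxxxx: plain ASCII, one byte
        return 1
    if (lead & 0xE0) == 0xC0:   # 110xxxxx: two-byte char
        return 2
    if (lead & 0xF0) == 0xE0:   # 1110xxxx: three-byte char
        return 3
    if (lead & 0xF8) == 0xF0:   # 11110xxx: four-byte char
        return 4
    return 0


def resync(data: bytes, pos: int = 0) -> int:
    """Skip bytes that cannot start a char; return the index of the next lead byte."""
    while pos < len(data) and utf8_sequence_length(data[pos]) == 0:
        pos += 1
    return pos


# Simulate the damaged file: the lead byte of the euro sign is lost,
# leaving two continuation bytes in front of a valid two-byte char.
damaged = "€é".encode("utf-8")[1:]
start = resync(damaged)
print(start, damaged[start:].decode("utf-8"))
```

Because continuation bytes always start with 10, a reader never has to look back further than three bytes to find where a char begins.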

I know the answer may be: "So, build your own preprocessor." I see your side too.

Greek school philosophers tell us that it's impossible to solve all the questions of the universe and nature using only math (calculus, trigonometry and logic).
How do scientists today try to prove their creations? Yes, using math.
It's hard to believe that this multiverse, with so much data still to be discovered, cannot be accessed by math. This is why physics can't join the micro with the macro cosmos, I suppose. We need something new, but the guy who can do this will be a heretic.
In the computer example, we live under a "rectangle tyranny" when talking about images.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

nidud

#67
deleted

nidud

#68
deleted

mineiro

hehehe sir nidud
sorry, the phrase is "cannot be accessed only by using math".
It's because the 'scientific method' was created by the Greek school. Guys from centuries ago modeled our way of life.

nidud

#70
deleted

mineiro

#71
People use Unicode a lot today because it has become the default on the internet (web pages). UTF-8 is used a lot in chat programs, and they are trying to make it the default for email too. I'm not sure whether UTF-8 covers all of Unicode; if it does, it can be used as another layer.

I was able to copy different symbols from this topic and paste them into a console under Linux just to get a feel for it. I discovered that I have a runic font installed, along with other symbols I can't recognize. So on Linux we have a command interpreter (the counterpart of command.com or cmd.exe) that accepts Unicode/UTF-8.

mineiro@assembly:~/asm$ ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:
ᛒᛓᛕᛋᛘᛚᛝᛟᛠ:: comando não encontrado
mineiro@assembly:~/asm$    dec ΤΥΦΧΨΩ
mineiro@assembly:~/asm$    jnz ᛒᛓᛕᛋᛘᛚᛝᛟᛠ
mineiro@assembly:~/asm$       250,  "早上好计算机程序员。\0"
mineiro@assembly:~/asm$         251,  "おはようのコンピュータのプログラマー。\0"
mineiro@assembly:~/asm$         252,  "Хороший программист утром.\0"
mineiro@assembly:~/asm$         253,  "Καλή προγραμματιστής ηλεκτρονικών υπολογιστών πρωί.\0"
mineiro@assembly:~/asm$         254,  "सुप्रभात कंप्यूटर प्रोग्रामर.\0"
mineiro@assembly:~/asm$         255,  "Chào buổi sáng lập trình máy tính.\0"
mineiro@assembly:~/asm$         256,  "დილა მშვიდობისა, კომპიუტერული პროგრამისტი.\0"
mineiro@assembly:~/asm$         257,  "Добро јутро компјутерски програмер.\0"
mineiro@assembly:~/asm$         258,  "Բարի լույս ծրագրավորող.\0"
mineiro@assembly:~/asm$         259,  "안녕하세요 컴퓨터 프로그래머.\0"

So, we don't need codepages to show different symbols. We need codepages only if we are in an MS-DOS environment and think the way we have discussed here.
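
To tie this back to the topic title: one way around an assembler that only reads ASCII source is to convert the text outside the assembler and paste in plain numbers. The sketch below is an assumption about how such a helper could look, not a feature of MASM or of any tool mentioned in this thread; `to_masm_dw` is a made-up name. It emits a zero-terminated UTF-16 string as a single `dw` line, so long strings would still need splitting across several lines by hand.

```python
def to_masm_dw(label: str, text: str) -> str:
    """Render text as a zero-terminated UTF-16 string in MASM `dw` syntax."""
    raw = text.encode("utf-16-le")
    # Group the little-endian bytes into 16-bit code units.
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    # A leading 0 keeps every hex literal starting with a digit, as MASM requires.
    words = [f"0{u:04X}h" for u in units] + ["0"]
    return f"{label} dw " + ", ".join(words)


# Two scripts in one string, no codepage involved.
print(to_masm_dw("msg", "AΩ"))

# The same idea in UTF-8: one byte stream holds Greek and Chinese together.
print(len("Ω早".encode("utf-8")))
```

The output contains only ASCII characters, so it can be assembled by a tool that knows nothing about Unicode.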


---edited--: Anyone?

8
7
6
5
4
3
2
1
a b c d e f g h

1- e2-e4


nidud

#72
deleted

nidud

#73
deleted

mineiro

yes sir nidud, I understand;
"Image is nothing, contents is everything"
Not all chars are printable in ASCII; that's why in hexdumps we see a lot of dots in the ASCII column next to the hex numbers. Also, hex 08h means the 'backspace' key; if we print it, the cursor moves back one position, so the next char printed overwrites ("eats") the previous one on screen.

string db "0",08h,"1",08h,"2",08h,0dh,0ah,08h,08h,00h

Another example from memory: what's the visual difference between the space char (20h) and 0FFh? In the MS-DOS days both were drawn as a blank ('null') glyph.
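
The dots in a hexdump can be shown with a few lines. This is a minimal sketch, assuming the usual convention that only bytes 20h to 7Eh are shown as themselves; `hexdump_line` is a made-up name, and the data is the `string db` line above.

```python
def hexdump_line(data: bytes) -> str:
    """Format bytes as hex, followed by an ASCII column with dots for non-printables."""
    hex_part = " ".join(f"{b:02X}" for b in data)
    # Only 20h..7Eh are printable ASCII; everything else becomes a dot.
    ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)
    return f"{hex_part}  {ascii_part}"


# Same bytes as: string db "0",08h,"1",08h,"2",08h,0dh,0ah,08h,08h,00h
data = bytes([0x30, 0x08, 0x31, 0x08, 0x32, 0x08, 0x0D, 0x0A, 0x08, 0x08, 0x00])
print(hexdump_line(data))

# 20h is shown as a blank, 0FFh as a dot: the dump tells them apart
# even when the screen does not.
print(hexdump_line(bytes([0x20, 0xFF])))
```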