UTF8 disk I/O library ....

Started by CommonTater, March 10, 2013, 02:13:26 AM


CommonTater

#15
Quote from: jj2007 on March 10, 2013, 07:45:04 AM
Quote from: CommonTater on March 10, 2013, 07:33:44 AM
You feed it a word processor file and then try to claim my library doesn't work when it can't sort out the formatting marks

No, Utf8.txt is a plain UTF-8 text file, of the type you claim to handle with your library. You can open it in Notepad, \Masm32\qeditor.exe, poide.exe, and it will display 9 lines of Russian, Arabic and Chinese text.

The demo program was to show you how to use the function calls. It's NOT a word processor... it reads and writes wide strings in CONSOLE mode, which does not support multiple code pages the way you're asking it to. There was NO attempt to provide anything except an example of how to use the functions.

GOT THAT?

Make a GUI mode program... a little edit window, and use the functions to save and load the edit window... and watch what happens... OH MY GOD... look at that... it actually works! WOW....

The actual conversions are done with Windows API calls that are known to work... worldwide... and are in use in literally thousands of programs. If you'd bothered to do anything except find a way to foul it up and then claim it doesn't work, you would know that.
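For readers wondering what such a conversion call does: on Windows the usual API for UTF-8 to UTF-16 is `MultiByteToWideChar` with `CP_UTF8` (an assumption here, since the thread does not name the exact calls the library uses). The sketch below is neither the library's code nor the Windows API; it is a portable, hand-rolled decoder with a hypothetical name, `utf8_to_utf16`, that performs the same transformation. Like passing `MB_ERR_INVALID_CHARS`, it fails on malformed input instead of substituting U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
   Returns the number of UTF-16 units written, or -1 on malformed input
   or when the output buffer is too small. */
long utf8_to_utf16(const unsigned char *src, size_t len,
                   uint16_t *dst, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp;
        int extra;
        unsigned char b = src[i++];
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;                 /* stray continuation or invalid lead byte */
        if (i + extra > len) return -1; /* truncated sequence */
        for (int k = 0; k < extra; k++) {
            unsigned char c = src[i++];
            if ((c & 0xC0) != 0x80) return -1;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* reject overlong forms, surrogate code points and values past U+10FFFF */
        static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        if (cp >= 0x10000) {            /* outside the BMP: emit a surrogate pair */
            if (n + 2 > cap) return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > cap) return -1;
            dst[n++] = (uint16_t)cp;
        }
    }
    return (long)n;
}
```

Characters outside the Basic Multilingual Plane (for example, emoji) come out as two UTF-16 units, which is why the result count can be larger than the number of characters.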




dedndave

The differences may be due to the absence or presence of a BOM (byte order mark) at the beginning of the file.
While not required by the UTF-8 spec, it is usually present in Windows Unicode text files.
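For reference, the UTF-8 BOM is the three bytes EF BB BF (U+FEFF encoded as UTF-8); the Unicode standard permits it in UTF-8 but neither requires nor recommends it, and Windows Notepad writes it when saving as UTF-8. A minimal sketch, assuming the file is opened in binary mode on a seekable stream and using a hypothetical helper name `skip_utf8_bom`, of consuming it before the text reaches a converter:

```c
#include <stdio.h>

/* Hypothetical helper: if the stream starts with the UTF-8 BOM (EF BB BF),
   consume it and return 1; otherwise seek back to where it started and
   return 0. Assumes a seekable stream opened in binary mode. */
int skip_utf8_bom(FILE *f)
{
    unsigned char b[3];
    long start = ftell(f);
    size_t got = fread(b, 1, 3, f);
    if (got == 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return 1;
    fseek(f, start, SEEK_SET);  /* no BOM: put the bytes back */
    return 0;
}
```

If the BOM is not skipped, a UTF-8 to UTF-16 conversion typically passes it through as a U+FEFF character at the start of the first string.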

jj2007

Quote from: dedndave on March 10, 2013, 08:29:07 AM
The differences may be due to the absence or presence of a BOM (byte order mark) at the beginning of the file.
While not required by the UTF-8 spec, it is usually present in Windows Unicode text files.

That's what I also suspected but (Main.c):

    // opening for read
    wprintf(L"Reading Utf8.txt\n");
    file = OpenUTF8(L"Utf8.txt", FM_READ);
    if(GetLastError() == ERR_BOM)
      wprintf(L"No BOM! ... ");


The Utf8.txt file has a BOM, and besides, Tater's code works if the UTF-8 strings are English only.

Gunther

Hi CommonTater,

I've read the entire thread, and I'm neither the referee nor the moderator here, but:

Quote from: CommonTater on March 10, 2013, 08:09:03 AM
Listen up ASSHOLE .... Now take your petty STUPID vendettas and ram them right up your FAT HOMO ASS!

that is not our way of speaking here!

Gunther

CommonTater

#19
Quote from: Gunther on March 10, 2013, 08:51:59 AM
that is not our way of speaking here!

Obviously... and I apologize to everyone but JJ.
AND... if he starts his bullshit up again, he'll get exactly the same reaction from me every time!

It's got nothing to do with the Byte Order Mark... My code consumes it before Microsoft's text conversion routines ever see it... in fact, they would be confused by it.

It's got nothing to do with main.c... the test program does exactly what it's supposed to do: it shows you how to call the functions.

It is about the limitations of the Windows console (cmd.exe). The test??.exe files were used for debugging purposes and were provided with the archive as demonstrations. They show you how to call the library's 4 functions... It's NOT a word processor. It's NOT a multilanguage display system... it's a freaking demo program and nothing more.

The reason he thinks it doesn't work is that he is trying to display multiple foreign languages in the WINDOWS CONSOLE, which is bound to a specific code page and, after the first text output, can only correctly display text within that one --singular-- code page.

Let me say this one more time for perfect clarity ...

It's the console window that can't display the text
The library works... or it would not have been online.



Windows GUI, on the other hand, uses a different system and can and does work correctly with the UTF8 library. And most importantly, it works BECAUSE I've used the Windows native conversion APIs... you know, the ones in use all around the world with no problems at all.
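The reverse direction, used when saving text, is normally `WideCharToMultiByte` with `CP_UTF8` on Windows. Again purely as a portable illustration, not the library's code: a hypothetical `utf16_to_utf8` that re-joins surrogate pairs and emits 1 to 4 bytes per code point, failing on unpaired surrogates much as passing `WC_ERR_INVALID_CHARS` does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes.
   Returns the number of bytes written, or -1 for an unpaired surrogate
   or when the output buffer is too small. */
long utf16_to_utf8(const uint16_t *src, size_t len,
                   unsigned char *dst, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp = src[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
            if (i >= len || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return -1;                              /* no matching low surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                                  /* lone low surrogate */
        }
        if (cp < 0x80) {                                /* 1 byte: ASCII */
            if (n + 1 > cap) return -1;
            dst[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {                        /* 2 bytes */
            if (n + 2 > cap) return -1;
            dst[n++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                      /* 3 bytes */
            if (n + 3 > cap) return -1;
            dst[n++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {                                        /* 4 bytes */
            if (n + 4 > cap) return -1;
            dst[n++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return (long)n;
}
```

Whether to write a BOM in front of the output is a separate choice; Notepad-compatible files usually have one.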

Don't believe me? Well, the source code is in the zip file... look up the functions used... Nothing is hidden; it's all right there for anyone to examine...

This is a 100% manufactured problem that exists only in JJ's dementia and noplace else.

And, despite the people telling me to grow a thicker skin or buy a sense of humour....

NO, I do not have to take that crap from anyone.
I never have in the past and I'm not going to start now.

Do we understand one another now?

NB: Edited severely by the Admin

Dubby

Whoa... what happened, guys...??

First, I don't want to insult anyone... and I apologize if I do...

OK, here's my comment..
The CMD window (or console, as you call it) simply uses its selected font to render text... without the font substitution that GUI dialog boxes do...
For a simple test, just open cmd and copy and paste some Unicode characters (not English ones).
Now choose a different available font; on my Win7 system the "Consolas" font is available, so choose that one... now copy and paste some Russian text... it will be displayed correctly...

Why? Simply because the font supports it...
Now paste some Chinese characters, and they will be displayed as nothing but garbage...
So what about choosing another font?
But hey, MS doesn't even let me choose another font..
Just go here: http://blogs.msdn.com/b/oldnewthing/archive/2007/05/16/2659903.aspx
and here: http://support.microsoft.com/kb/247815
and you'll see why...
But it's a different story if you direct the output to another window....


Some people might be good at one thing but not at another...

I'm not against anyone here; the only one I support is the truth... (oops, is that too rhetorical..?)
Again, my apologies to anyone who might have been insulted...

jj2007

Quote from: CommonTater on March 10, 2013, 11:39:55 AM
The reason he thinks it doesn't work is that the dumb shit is trying to do multiple foreign languages in WINDOWS CONSOLE which is bound to a specific code page and after the first text output can only correctly display text within that one --singular-- codepage.

Tater, you are confused. The Windows console is indeed able to display several languages simultaneously, using one code page: UTF-8. My examples demonstrate that very clearly. It is a bit challenging, yes, but it is feasible. Otherwise there would be no console windows in a country that has more than four times the population of the U.S.

Now if you are truly convinced that it's only a console issue, you are free to post a GUI example with a simple edit control. I am really curious whether it works, honestly :biggrin:

jj2007

Quote from: Dubby on March 10, 2013, 06:26:51 PM
For a simple test, just open cmd and copy and paste some Unicode characters (not English ones).
Now choose a different available font; on my Win7 system the "Consolas" font is available, so choose that one... now copy and paste some Russian text... it will be displayed correctly...

Why? Simply because the font supports it...
Now paste some Chinese characters, and they will be displayed as nothing but garbage...

Hi Dubby,

The copy & paste test works fine with Russian, but it is a bit trickier with the more exotic fonts like Arabic or Chinese. Even if the fonts are installed (as on my machines), pasting doesn't work, but you can print them to the console. Ask a billion Chinese, they can confirm that ;-)

Dubby

I think I need to correct something in my previous post...
"It's not about the font but the locale..."

Isn't it because their code page is already set to Chinese?

Simply change the system locale...
I guess almost all non-English folks have their default locale set to their own language...
The attachment below contains 2 images... one with the English locale and one with the Chinese locale; both use UTF-8 as the console output code page..

jj2007

Quote from: Dubby on March 10, 2013, 09:20:32 PM
"It's not about the font but the locale..."

Isn't it because their code page is already set to Chinese?

Simply change the system locale...

Well, mine is set to Italian - and I can display simultaneously Italian, English, Russian, Chinese, Arabic and a number of others. The reason is simple: Codepage UTF-8 alias 65001 is meant for that. It can display every language for which fonts are installed.

It is actually a bit trickier if you look at the details, but it works :biggrin:

Dubby

Okay... would you be kind enough to provide an example... :D
Or at least explain how to achieve it...

jj2007

Quote from: Dubby on March 10, 2013, 09:45:04 PM
Okay... would you be kind enough to provide an example... :D
Or at least explain how to achieve it...

See reply #1.

Just saw that the example there uses "true" Unicode. Here is the UTF-8 version (Utf8.txt attached):

include \masm32\MasmBasic\MasmBasic.inc        ; MasmBasic library (separate download)
  Init
  Recall "Utf8.txt", MyStrings$()        ; read strings from file into an array
  ; remember that a) user needs to set console font to Lucida, b) Chinese & Arabic fonts must be installed
  SetCpUtf8                ; set the codepage
  For_ ebx=0 To eax-1
        Print MyStrings$(ebx), CrLf$        ; print to console
  Next
  Inkey CrLf$, "--- hit any key ---"        ; wait for a keypress
  Exit
end start
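For anyone following along in C rather than MasmBasic, here is a rough counterpart of the program above, offered as a sketch under assumptions: the helper name `print_utf8_lines` is hypothetical, each line is assumed to fit in a 1024-byte buffer, and the only Windows-specific part is `SetConsoleOutputCP(CP_UTF8)`, the Win32 call that switches the console output code page to 65001 (the same idea as the `SetCpUtf8` line). As the comment in the MasmBasic code says, the console font must still contain the glyphs (e.g. Lucida Console) and the Chinese and Arabic fonts must be installed; older Windows versions also have known quirks with code page 65001.

```c
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: copies every line of a UTF-8 stream to `out`,
   dropping a leading BOM, and returns the number of lines copied
   (assuming each line fits in the 1024-byte buffer). On Windows it
   first sets the console output code page to UTF-8. */
int print_utf8_lines(FILE *in, FILE *out)
{
    char line[1024];
    int count = 0;
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);    /* CP_UTF8 == 65001 */
#endif
    while (fgets(line, sizeof line, in)) {
        char *p = line;
        if (count == 0 && strncmp(p, "\xEF\xBB\xBF", 3) == 0)
            p += 3;                 /* skip the UTF-8 BOM on the first line */
        fputs(p, out);
        count++;
    }
    return count;
}
```

Usage, mirroring the MasmBasic example: open the file with `fopen("Utf8.txt", "rb")`, check for `NULL`, call `print_utf8_lines(f, stdout)`, then `fclose(f)`.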