How to convert a UTF-16 string to UTF-8 or OEMCP code

Started by bomz, February 25, 2025, 05:52:29 AM

bomz

Hi to all! How do I convert a UTF-16 string (from FindNextFileW) to UTF-8 or to the OEM code page (OEMCP)?

zedd151

Quote from: bomz on February 25, 2025, 05:52:29 AM
Hi to all! How do I convert a UTF-16 string (from FindNextFileW) to UTF-8 or to the OEM code page (OEMCP)?
Hi bomz. I had asked that your post be made into its own topic, where you might get a better response than posting your question to an unrelated existing topic.

I myself do not have an answer for you, as I work solely with ascii, 99.9 percent of the time.
¯\_(ツ)_/¯   :azn:

'As we don't do "requests", show us your code first.'  -  hutch—

NoCforMe

[deleted]
Assembly language programming should be fun. That's why I do it.

bomz

The problem showed up with video files downloaded from YouTube: Windows can't display the file names, but it can still address the files (delete, move, copy, rename, rename back). There is no way to do this in the console for a single file, only in bulk with a mask such as *.mp4.


zedd151

Quote from: bomz on February 25, 2025, 11:21:20 AM
The problem showed up with video files downloaded from YouTube: Windows can't display the file names [...] only in bulk with a mask such as *.mp4.
Did this just start recently, or has it always been this way? I am just curious.

NoCforMe

[deleted]

sinsi

Quote from: NoCforMe on February 25, 2025, 10:57:49 AM
If the 16-bit text is anything like Unicode, which I suspect it is, you can simply do the "read a word, write a byte" method:

; Assuming ECX points to your UTF-16 string,
; and EDX points to where you want to store the ASCII:

next:   MOV   AX, [ECX]     ; Read a WORD.
        MOV   [EDX], AL     ; Write a BYTE.
        TEST  AX, AX        ; If it's a zero,
        JZ    done          ;   that's the end.
        ADD   ECX, 2        ; Advance your pointers.
        INC   EDX
        JMP   next

done:

You don't have to use those registers; I just wrote the code that one way.
But you get the idea.

That is so wrong it's giving me double vision :dazzled:
Try it on real unicode, like some chinese text.
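
To see the failure concretely, here is a minimal sketch of the same "read a word, write a byte" loop. It is written in C rather than MASM only to keep it short. The names `narrow_low_bytes` and `is_ascii_only` are made up for this illustration and are not from any library. The loop is only lossless when every code unit is below 0x80, which is what the second helper checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the quoted loop: store the low byte of each 16-bit unit,
   then stop once the unit just stored was the terminating zero.
   Returns the number of bytes written, not counting the terminator. */
size_t narrow_low_bytes(const uint16_t *src, unsigned char *dst)
{
    size_t n = 0;
    for (;;) {
        dst[n] = (unsigned char)(src[n] & 0xFF);  /* MOV [EDX], AL */
        if (src[n] == 0)                          /* TEST AX, AX / JZ done */
            return n;
        n++;
    }
}

/* Returns 1 if every code unit is plain 7-bit ASCII, i.e. the only
   case in which dropping the high byte loses nothing. */
int is_ascii_only(const uint16_t *src)
{
    for (; *src != 0; src++) {
        if (*src >= 0x80)
            return 0;
    }
    return 1;
}
```

With the two code units of 中文 (0x4E2D, 0x6587) the loop produces the bytes 0x2D 0x87: a hyphen followed by a non-ASCII byte. A character such as U+4E00 has a zero low byte, so the output would also be cut short when read back as a zero-terminated string.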

NoCforMe

Well, if that won't work, you'll need to use some kind of translation table to convert UTF-16, right?

zedd151

Quote from: sinsi on February 25, 2025, 12:03:29 PM
That is so wrong it's giving me double vision :dazzled:
Try it on real unicode, like some chinese text.
Looks like unicode->ascii conversion code.

sinsi

Quote
Zài zhēnzhèng de unicode shàng chángshì yīxià, lìrú yīxiē zhōngwén wénzì.
5A C3 A0 69 20 7A 68 C4 93 6E 7A 68 C3 A8 6E 67 20 64 65 20 75 6E 69 63 6F 64 65 20 73 68 C3 A0
6E 67 20 63 68 C3 A1 6E 67 73 68 C3 AC 20 79 C4 AB 78 69 C3 A0 2C 20 6C C3 AC 72 C3 BA 20 79 C4
AB 78 69 C4 93 20 7A 68 C5 8D 6E 67 77 C3 A9 6E 20 77 C3 A9 6E 7A C3 AC 2E
Take every second byte away and you're still left with gibberish :biggrin:


Quote from: NoCforMe on February 25, 2025, 12:26:28 PM
Well, if that won't work, you'll need to use some kind of translation table to convert UTF-16, right?

Quote from: tenkey on February 25, 2025, 11:24:53 AM
For UTF-8, you could try WideCharToMultiByte.

Even WideCharToMultiByte isn't guaranteed to convert properly, or to convert to an ANSI string (as the ToMultiByte part implies).


NoCforMe

So how does that stuff get translated to ASCII? Or can it even be?

jj2007

Quote from: sinsi on February 25, 2025, 12:03:29 PM
That is so wrong it's giving me double vision :dazzled:
Try it on real unicode, like some chinese text.

Indeed. Windows stores all file names internally as UTF-16. You either use them "as is" in UTF-16, or you convert them to UTF-8 for display purposes.

Quote from: tenkey on February 25, 2025, 11:24:53 AM
For UTF-8, you could try WideCharToMultiByte.

Exactly.
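
On Windows the conversion is a single call to WideCharToMultiByte, with CP_UTF8 as the code page (the last two parameters must be NULL in that case) or CP_OEMCP for the console's OEM code page. The UTF-8 direction is lossless for well-formed UTF-16; an OEM code page can only hold the characters it contains. To show what the UTF-8 case does under the hood, here is a portable C sketch. The function name `utf16_to_utf8` is hypothetical, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch, not a statement about what the API does.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a zero-terminated UTF-16 string as zero-terminated UTF-8.
   Returns the number of bytes written (terminator excluded), or
   (size_t)-1 if dst (capacity cap bytes) is too small.
   Assumption of this sketch: unpaired surrogates become U+FFFD. */
size_t utf16_to_utf8(const uint16_t *src, unsigned char *dst, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;

    while (*src != 0) {
        uint32_t cp = *src++;

        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF) {
            /* High + low surrogate: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  /* lone surrogate */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len >= cap)  /* keep one byte for the terminator */
            return (size_t)-1;

        if (len == 1) {
            dst[n++] = (unsigned char)cp;
        } else if (len == 2) {
            dst[n++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            dst[n++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }

    dst[n] = 0;
    return n;
}
```

Fed the code units 'Z', 0x00E0, 'i', it produces 5A C3 A0 69, the same four bytes that open the hex dump posted earlier in the thread.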
