How convert UTF-16 string to UTF-8 or OEMCP code

Started by bomz, February 25, 2025, 05:52:29 AM

jj2007

Quote from: bomz on February 25, 2025, 11:56:40 PM
perhaps I am wrong to use console streams?

Usually no problem, bomz, but consoles tend to display only the locally common character sets. In Europe you may be able to display Cyrillic text but no Chinese; in China the console will surely display Chinese and English text but probably no Russian.

If you want to see all kinds of characters, use a static, edit or richedit control.
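
A rough way to see the same limitation outside the console, sketched in Python rather than assembly purely for portability: a code page can only encode the characters it contains. The code pages cp866 (Russian OEM) and cp437 (US OEM) are assumptions chosen for illustration; the actual console code page depends on the machine, and `fits_code_page` is a hypothetical helper name.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


russian = "Привет"
chinese = "你好"

# cp866 contains Cyrillic letters but no Chinese characters.
print("Russian in cp866:", fits_code_page(russian, "cp866"))
print("Chinese in cp866:", fits_code_page(chinese, "cp866"))
# cp437 contains neither.
print("Russian in cp437:", fits_code_page(russian, "cp437"))
```

A control that works with UTF-16 text does not go through such a code page, which is why it can show all of these strings.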

bomz

The problem is not seeing the text, which the console cannot display anyway. The directory listing is redirected to a file; what I need is to save the true name to a file after the download and ffmpeg, so that it is possible to rename and rename back, copy, move, delete....

jj2007

Quote from: bomz on February 26, 2025, 08:16:24 AM
have possibility to rename and back rename, copy, move, delete....

No problem if you systematically use the xxxW functions and UTF-16 :thumbsup:

bomz

    invoke MultiByteToWideChar,CP_UTF8,0,hMemPointer,ebx,0,0
    mov _bs, eax
    invoke MultiByteToWideChar,CP_UTF8,0,hMemPointer,ebx,Buffer,esi
    invoke WideCharToMultiByte,CP_OEMCP,0,Buffer,_bs,hMemPointer,SizeofhMemory,0,0

If I use MultiByteToWideChar as suggested above, the code that converts UTF-8 to OEMCP should work with UTF-16 too. But if I change CP_UTF8 to 1200, an error occurs, because there is no UTF-16 code page in Windows, which is no surprise because Windows can't display such names correctly. It looks like the Windows API can't work with UTF-16 itself and only passes UTF-16 names on to the kernel functions. If I pass UTF-16 text from one Windows API to another, or try to use console streams, the text gets corrupted. Or something like that.
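
For readers following the snippet above, here is a minimal Python sketch of the same two-step idea (UTF-8 bytes to Unicode text, then text to OEM bytes). It is only an analogue of the API calls, not the calls themselves. The helper names and the choice of cp866 as the OEM code page are assumptions; CP_OEMCP resolves to whatever the system uses.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units the text needs, similar to the size query call."""
    return len(text.encode("utf-16-le")) // 2


def utf8_to_oem(utf8_bytes: bytes, oem_code_page: str = "cp866") -> bytes:
    """Convert UTF-8 bytes to an OEM code page, replacing unmappable characters."""
    # Step 1 (MultiByteToWideChar analogue): UTF-8 bytes -> Unicode text.
    text = utf8_bytes.decode("utf-8")
    # Step 2 (WideCharToMultiByte analogue): text -> OEM bytes.
    # Characters missing from the code page become "?".
    return text.encode(oem_code_page, errors="replace")


source = "Привет".encode("utf-8")
print(utf16_code_units(source.decode("utf-8")))  # buffer size in characters
print(utf8_to_oem(source))
print(utf8_to_oem("日本".encode("utf-8")))  # not in cp866, so replaced
```

The second step is lossy by design: once a name has been squeezed into an OEM code page, the original characters cannot be recovered from the result.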

jj2007

Quote from: bomz on February 28, 2025, 01:22:18 AM
MultiByteToWideChar... must work with UTF-16 too.

Well, not really: if your source is UTF-16, it is already wide char... nothing to convert :cool:
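
To illustrate the point in a neutral language (a Python sketch, not the Win32 calls): when the source is already UTF-16, a single conversion step reaches either target. `from_utf16le` is a hypothetical helper, and cp866 again stands in for the OEM code page as an assumption.

```python
def from_utf16le(raw: bytes, target: str) -> bytes:
    """Convert UTF-16LE bytes straight to the target encoding in one step."""
    return raw.decode("utf-16-le").encode(target, errors="replace")


name = "файл 日本.mp4"
raw_utf16 = name.encode("utf-16-le")  # what a xxxW function would hand back

as_utf8 = from_utf16le(raw_utf16, "utf-8")
as_oem = from_utf16le(raw_utf16, "cp866")

print(as_utf8.decode("utf-8") == name)  # UTF-8 keeps every character
print(as_oem.decode("cp866"))           # the OEM copy loses the Chinese part
```

There is no intermediate "UTF-16 to wide char" step here because the wide form is the starting point.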

bomz

First of all, I am not trying to convert anything, just to redirect the console stream to a file, so that I can rename the files back after ffmpeg. As a result I want to get a pool of console utilities that can work with such files.

zedd151

Quote from: bomz on February 28, 2025, 03:51:38 AM
First I do not try convert anything,

But the title of this thread clearly says something different. "How convert UTF-16 string to UTF-8 or OEMCP code"

Do you see how this can cause confusion?
¯\_(ツ)_/¯   :azn:


jj2007

Quote from: bomz on February 28, 2025, 01:22:18 AM
MultiByteToWideChar, code for convert UTF-8 to OEMCP must work with UTF-16

Quote from: bomz on February 28, 2025, 03:51:38 AM
I do not try convert anything

Please make a decision :cool:

ognil


Q: How to convert a UTF-16 string to UTF-8 and to OEMCP, and redirect the console stream to a file, using MASM64 assembly on Windows 10

A: To achieve the tasks of converting a UTF-16 string to UTF-8, converting it further to OEM code page (OEMCP), and redirecting the console stream to a file using MASM64 assembly on Windows 10, we need to break the problem into several steps. Each step involves interacting with the Windows API, as MASM64 does not provide built-in functions for these operations.
Below is a detailed explanation of the process, followed by an example implementation in MASM64 assembly:

Step-by-Step Explanation

1. Convert UTF-16 String to UTF-8
The Windows API provides the WideCharToMultiByte function, which can be used to convert a UTF-16 (wide character) string to a UTF-8 encoded string. The function requires specifying the code page as CP_UTF8.

2. Convert UTF-8 String to OEM Code Page
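WideCharToMultiByte only accepts UTF-16 input, so a UTF-8 string cannot be passed to it directly. Either convert the UTF-8 string back to UTF-16 with MultiByteToWideChar (code page CP_UTF8) and then call WideCharToMultiByte with the OEM code page (CP_OEMCP), or convert the original UTF-16 string to CP_OEMCP in a single call. Characters that the OEM code page cannot represent are replaced with a default character.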

3. Redirect Console Stream to a File
To redirect the console output to a file, you can use the CreateFile function to open or create a file, and then use SetStdHandle to redirect the standard output (STD_OUTPUT_HANDLE) to the file handle.

4. Write to the Console/File
Finally, you can use the WriteFile function to write the converted string to the redirected handle. WriteConsole only works with real console handles and fails once the output has been redirected to a file.

MASM64 Assembly Implementation
Below is an example implementation in MASM64 assembly that performs the above steps:
; MASM64 Example: Convert UTF-16 to UTF-8, then to OEMCP, and redirect console output to a file
; Error checking is omitted for brevity: each conversion returns 0 on failure.
include \masm64\include\masm64rt.inc
.data
    utf16_string dw 'H','e','l','l','o',',',' ','W','o','r','l','d','!',0  ; UTF-16 string (16-bit units, null-terminated)
    utf8_buffer db 256 dup(0)                   ; Buffer for UTF-8 conversion
    wide_buffer dw 256 dup(0)                   ; Buffer for the UTF-16 round trip (256 characters)
    oem_buffer db 256 dup(0)                    ; Buffer for OEMCP conversion
    file_handle HANDLE ?                        ; Handle for the output file
    oem_length DWORD ?                          ; Length of the OEMCP string in bytes
    bytes_written DWORD ?                       ; Number of bytes written
.code
main proc
 ; Step 1: Convert UTF-16 to UTF-8
 invoke WideCharToMultiByte, CP_UTF8, 0, offset utf16_string, -1, offset utf8_buffer, sizeof utf8_buffer, NULL, NULL
 ; Step 2: Convert UTF-8 to OEMCP by going back through UTF-16
 ; (the wide buffer size is given in characters, not bytes)
 invoke MultiByteToWideChar, CP_UTF8, 0, offset utf8_buffer, -1, offset wide_buffer, 256
 invoke WideCharToMultiByte, CP_OEMCP, 0, offset wide_buffer, -1, offset oem_buffer, sizeof oem_buffer, NULL, NULL
 dec eax                                        ; the returned count includes the terminating zero
 mov oem_length, eax
 ; Step 3: Open/Create a file for output
 invoke CreateFile, "output.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL
 mov file_handle, rax
 ; Step 4: Redirect console output to the file
 invoke SetStdHandle, STD_OUTPUT_HANDLE, file_handle
 ; Step 5: Write only the converted bytes of the OEMCP string to the file
 invoke GetStdHandle, STD_OUTPUT_HANDLE
 invoke WriteFile, rax, offset oem_buffer, oem_length, addr bytes_written, NULL
 ; Cleanup
 invoke CloseHandle, file_handle
 ; Exit program
 invoke ExitProcess, 0
main endp
end

Explanation of Key Functions
WideCharToMultiByte :
Converts a wide character string (UTF-16) to a multibyte string (UTF-8 or OEMCP).
Parameters:
CodePage: Specifies the target code page (e.g., CP_UTF8 or CP_OEMCP).
dwFlags: Conversion flags (usually 0).
lpWideCharStr: Pointer to the source UTF-16 string.
cchWideChar: Length of the source string in characters (-1 for null-terminated).
lpMultiByteStr: Pointer to the buffer for the converted string.
cbMultiByte: Size of the buffer in bytes.
lpDefaultChar: Default character for unmappable characters (optional; must be NULL with CP_UTF8).
lpUsedDefaultChar: Indicates if default characters were used (optional; must be NULL with CP_UTF8).
MultiByteToWideChar :
Converts a multibyte string (UTF-8) back to a wide character string (UTF-16).
Used here to prepare the UTF-8 string for conversion to OEMCP.
CreateFile :
Creates or opens a file for writing.
Returns a handle to the file.
SetStdHandle :
Redirects the standard output handle to the specified file handle.
WriteFile :
Writes data to the file or redirected console.
CloseHandle :
Closes the file handle after writing.

jj2007

Quote from: ognil on February 28, 2025, 05:33:48 AM
; Step 1: Convert UTF-16 to UTF-8
 invoke WideCharToMultiByte, CP_UTF8, 0, offset utf16_string, -1, offset utf8_buffer, sizeof utf8_buffer, NULL, NULL
 ; Step 2: Convert UTF-8 to OEMCP
 invoke MultiByteToWideChar, CP_UTF8, 0, offset utf8_buffer, -1, offset oem_buffer, sizeof oem_buffer

So, in step 1 you convert Utf-16 to Utf-8. Then, in step 2, you convert the converted string to CP_OEM. That's genius :thumbsup:

(minor criticism: the "oem_buffer" should be named "wide_buffer" because that's what it gets)

bomz

It is impossible to do this using console streams. The UTF-16 text is only preserved by writing it directly to a file.
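
A small Python sketch of that idea, assuming the goal is a name list that can be read back exactly: the names are written to a file as UTF-16 with a byte order mark instead of passing through a console code page. `save_names`, `load_names`, the sample names and cp866 are all illustrative assumptions.

```python
import os
import tempfile


def save_names(path: str, names: list[str]) -> None:
    """Write one name per line as UTF-16; the codec emits a byte order mark."""
    with open(path, "w", encoding="utf-16", newline="\n") as f:
        for name in names:
            f.write(name + "\n")


def load_names(path: str) -> list[str]:
    """Read the names back; the byte order mark selects the byte order."""
    with open(path, "r", encoding="utf-16", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


names = ["файл.mp4", "日本語.mkv", "plain.txt"]
path = os.path.join(tempfile.mkdtemp(), "names.txt")

save_names(path, names)
restored = load_names(path)

# For comparison: the same names after a trip through an OEM code page.
oem_copy = [n.encode("cp866", errors="replace").decode("cp866") for n in names]

print("UTF-16 file round trip intact:", restored == names)
print("OEM round trip intact:", oem_copy == names)
```

With a lossless list like this, renaming files back after processing is a matter of reading the saved names and passing them to the wide-character file functions.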

jj2007

Using your executable. Where is the source btw?
