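Lots of people argue on the web about whether Utf-16 is "harmful", and about the pros and cons of Utf-8 vs Utf-16. As a matter of fact, both work just fine, as demonstrated below with two compound characters: a Hebrew letter with two combining marks (three code points, all inside the Basic Multilingual Plane), and a mathematical double-struck letter that lies outside the Basic Multilingual Plane and therefore needs a surrogate pair in Utf-16. For Chinese text, the space requirements are very similar, as demonstrated above; for English text, Utf-8 clearly wins, of course.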
include \masm32\MasmBasic\MasmBasic.inc
Init
Cls
Let edi="aaa  שָׁ  bbb  𝕥  ccc" ; two compound characters: 6 bytes (letter + 2 combining marks) and 4 bytes (one code point outside the BMP) in Utf-8
PrintLine "Utf-8:", CrLf$, HexDump$(edi, 32)
Let esi=wRec$(edi) ; convert to Utf-16
PrintLine "Utf-16:", CrLf$, HexDump$(esi, 48)
if 0 ; change to 1 to save the Utf-16 string to disk
lea ecx, [2*wLen(esi)]
FileWrite "compound.txt", esi, ecx
endif
uMsgBox 0, edi, "Utf-8:", MB_OK
wMsgBox 0, esi, "Utf-16:", MB_OK
EndOfCode
Output (hex 20 = decimal 32 is the space character):
Utf-8:
002BD2B0 61 61 61 20 20 D7 A9 D7 81 D6 B8 20 20 62 62 62 aaa שָׁ bbb
002BD2C0 20 20 F0 9D 95 A5 20 20 63 63 63 00 00 00 00 00 𝕥 ccc.....
Utf-16:
0028DEF0 61 00 61 00 61 00 20 00 20 00 E9 05 C1 05 B8 05 a.a.a. . .�.�.�.
0028DF00 20 00 20 00 62 00 62 00 62 00 20 00 20 00 35 D8 . .b.b.b. . .5�
0028DF10 65 DD 20 00 20 00 63 00 63 00 63 00 00 00 00 00 e� . .c.c.c.....
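
The byte sequences in the two dumps can be cross-checked outside MasmBasic. The following is a minimal sketch in Python, not part of the original snippet; the helper names `hexbytes` and `surrogate_pair` are made up for illustration, and the string is rebuilt from the code points visible in the Utf-16 dump (U+05E9, U+05C1, U+05B8 and U+1D565), with two spaces between the groups as the hex output suggests.

```python
# Cross-check of the hex dumps, in Python rather than MasmBasic.
# Assumption: the test string consists of exactly these code points,
# with two spaces between the groups, as read off the Utf-16 dump.
s = "aaa  \u05E9\u05C1\u05B8  bbb  \U0001D565  ccc"

utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-le")  # little-endian, no byte order mark, as in the dump


def hexbytes(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def surrogate_pair(cp):
    """Split a code point above U+FFFF into its Utf-16 high and low surrogates."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


print("Utf-8: ", len(utf8), "bytes")
print(hexbytes(utf8))
print("Utf-16:", len(utf16), "bytes")
print(hexbytes(utf16))

hi, lo = surrogate_pair(0x1D565)
print(f"U+1D565 -> surrogates {hi:04X} {lo:04X}")
```

Run as-is, this should report 27 bytes for Utf-8 and 44 bytes for Utf-16 (terminating zeros not counted), and the surrogates D835 and DD65, which appear in the dump in little-endian byte order as 35 D8 65 DD.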