JSON routines

Started by Biterider, December 07, 2020, 08:16:05 AM

jj2007

Quote from: Biterider on December 11, 2022, 07:33:23 PM
   Implementations MUST NOT add a byte order mark (U+FEFF) to the
   beginning of a networked-transmitted JSON text.  In the interests of
   interoperability, implementations that parse JSON texts MAY ignore
   the presence of a byte order mark rather than treating it as an
   error.

A silly rule. Every parser can find the BOM in a few microseconds, set its encoding, and then ignore the two or three bytes. It is much, much more difficult to determine if a text is Ansi (which codepage?), Utf8 or Utf16.

Practically all webpages have a <meta charset="utf-8"> on top. That is a 22-byte "BOM".

Quote
Later: I got it. The 3 byte BOM is the UTF8 encoding of U+FEFF  :biggrin:

Nice find :tongue:
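
Checking for a BOM is a fixed comparison of at most three leading bytes. Below is a minimal C sketch; the names are hypothetical, and UTF-32 BOMs are ignored for brevity. The UTF-8 BOM follows directly from encoding U+FEFF: its 16 bits 1111 1110 1111 1111 split into 1111 | 111011 | 111111, and adding the three-byte prefixes 1110, 10, 10 gives EF BB BF.

```c
#include <stddef.h>

/* Encodings a byte order mark can announce (hypothetical names). */
typedef enum { ENC_NONE, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE } bom_encoding;

/* Inspect the first bytes of buf. Returns the encoding announced by the BOM
   and stores the BOM length in *skip, so the caller can ignore those bytes. */
bom_encoding detect_bom(const unsigned char *buf, size_t len, size_t *skip)
{
    /* EF BB BF: U+FEFF encoded as UTF-8 */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *skip = 3;
        return ENC_UTF8;
    }
    /* FF FE: U+FEFF stored as a little-endian 16-bit unit */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *skip = 2;
        return ENC_UTF16LE;
    }
    /* FE FF: U+FEFF stored as a big-endian 16-bit unit */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *skip = 2;
        return ENC_UTF16BE;
    }
    *skip = 0;      /* no BOM: the encoding must be guessed or agreed upon */
    return ENC_NONE;
}
```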

HSE

Quote from: HSE on December 10, 2022, 05:42:31 AM
data0 is UTF-8 and data1 is ANSI

I found that my use of the name "ANSI" is totally wrong!! (but almost everybody uses it the same way  :biggrin:)

The character encoding to which I refer is a superset of ISO 8859-1 (which is basically ASCII + Latin-1 Supplement) in terms of printable characters, but it differs from it: in the range 0x80 to 0x9F, where ISO 8859-1 has only control codes, it has additional printable characters such as € and the curly quotes. It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252".
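
The difference can be made concrete in a few lines of C: the two encodings only disagree on bytes 0x80 to 0x9F. This is a minimal sketch with hypothetical function names. Mapping the five bytes that Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) to the C1 control with the same value is an assumption of the sketch; it matches what ISO 8859-1 does.

```c
#include <stdint.h>

/* Code points for Windows-1252 bytes 0x80..0x9F. The unassigned bytes are
   mapped to the C1 control with the same value (assumption of this sketch). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Windows-1252 byte -> Unicode code point */
uint16_t cp1252_to_unicode(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_high[b - 0x80];
    return b;   /* 0x00..0x7F is ASCII, 0xA0..0xFF equals the Latin-1 Supplement */
}

/* ISO 8859-1 byte -> Unicode code point: always the identity */
uint16_t latin1_to_unicode(unsigned char b)
{
    return b;
}
```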

Quote
Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."

Quite interesting  :thumbsup:
Equations in Assembly: SmplMath

jj2007

Agreed. "Ansi" in its common usage is a family of character sets that includes not only Windows-1252 but also the popular "DOS graphics" charset.

CP_UTF8 is a different animal.

HSE

Hi Biterider!

When assembling in ANSI mode, these few modifications let you read UTF-8 or ANSI files, even if it is not very efficient for ANSI (and the file is always saved as ANSI).

Code (old Json.inc)

JSON_FILE_UTF8      equ 0          ; xTipo value: input stream is UTF-8
JSON_FILE_ANSI      equ 1          ; xTipo value: input stream is ANSI

Method Json.Read, uses xbx xdi xsi, pStream:$ObjPtr(Stream), xTipo:XWORD
  local lMemBlockUTF8:XWORD
   · · ·
  mov xbx, $OCall(pStream::Stream.GetSize)
  .if eax != -1
    mov lMemBlockUTF8, xax
   · · ·
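         .if xTipo == JSON_FILE_UTF8
            ; UTF-8 input: decode to UTF-16 with UTF8ToWide
            invoke UTF8ToWide, pMemBlockWide, pMemBlockUTF8, ebx
         .else
            ; ANSI input: decode with the active system code page (CP_ACP),
            ; which is 1252 only on Western European/US Windows installations
            invoke MultiByteToWideChar, CP_ACP, 0, pMemBlockUTF8, lMemBlockUTF8, pMemBlockWide, ebx
         .endif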
   · · ·
MethodEnd
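
Json.Read above relies on the caller passing the right xTipo flag. If the flag is unknown, a common heuristic is to test whether the buffer is well-formed UTF-8: Windows-1252 text containing accented letters rarely is, because a byte such as 0xE9 ("é") is a UTF-8 lead byte that must be followed by two continuation bytes. Below is a minimal C sketch; is_valid_utf8 is a hypothetical helper, not part of Json.inc. Pure ASCII passes, which is harmless since it reads the same either way.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s[0..len) is well-formed UTF-8, else 0.
   Overlong forms, encoded UTF-16 surrogates and values above U+10FFFF
   are rejected. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;                 /* continuation bytes expected */
        unsigned long cp;         /* decoded code point */
        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return 0;            /* continuation byte, C0/C1 or F5..FF as lead */
        if (len - i <= n) return 0;                      /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;     /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;             /* overlong, surrogate or out of range */
        i += n + 1;
    }
    return 1;
}
```

A caller could then pick the flag with something like `tipo = is_valid_utf8(buf, len) ? 0 : 1;`, where 0 and 1 are the JSON_FILE_UTF8 and JSON_FILE_ANSI values defined above. It remains a guess, not a proof of the encoding.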


Regards, HSE.

Biterider

Hi HSE
Good idea.  :thumbsup:
If you are thinking about compiling this thing for ANSI, there is a way to simplify it.
What about when all internal processing is done with ANSI? The only problem is reading a UTF8 stream, but there is no solution for that if the code point is beyond the ANSI range.  :sad:

This means that compiling for ANSI falls into the "closed ecosystem" category.  :biggrin:
Such a change can be easily implemented.
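
The loss beyond the ANSI range can be shown with a small encoder: anything outside ASCII, the Latin-1 Supplement and the 27 extra Windows-1252 characters has no single-byte form. This is a minimal C sketch with hypothetical names; the '?' fallback is this sketch's choice, comparable in spirit to the default character of WideCharToMultiByte.

```c
#include <stddef.h>
#include <stdint.h>

/* Unicode code points behind Windows-1252 bytes 0x80..0x9F;
   0 marks the five unassigned bytes. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Encode one code point as a Windows-1252 byte, or '?' if there is none. */
unsigned char unicode_to_cp1252(uint32_t cp)
{
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (unsigned char)cp;      /* ASCII and Latin-1 Supplement map 1:1 */
    for (int i = 0; i < 32; i++)
        if (cp1252_high[i] == cp)      /* cp >= 0x80 here, so 0 entries never match */
            return (unsigned char)(0x80 + i);
    return '?';                        /* e.g. Cyrillic or CJK: the character is lost */
}

/* Encode n code points into out; returns how many characters were lost. */
size_t encode_cp1252(const uint32_t *cps, size_t n, unsigned char *out)
{
    size_t lost = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = unicode_to_cp1252(cps[i]);
        if (out[i] == '?' && cps[i] != '?')
            lost++;
    }
    return lost;
}
```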

Biterider

jj2007

Quote from: Biterider on December 14, 2022, 05:28:13 AM
The only problem is reading a UTF8 stream

Remember that streams are agnostic regarding codepages: they are just bytes. For an assembler and for the Windows functions ending with -A, "Ansi" and Utf8 are the same thing. It is when you reconvert the bytes to Utf16 with the wrong codepage, or hand them to a -W function as if they already were Utf16, that a MessageBoxW looks Arabic or Chinese.
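
The point can be seen with two deliberate misreadings of the same bytes, sketched in C (both function names are hypothetical). The first widens one byte per character, which is what a Windows-1252 conversion does for bytes outside 0x80 to 0x9F: the UTF-8 bytes C3 A9 ("é") come out as U+00C3 U+00A9, i.e. "Ã©". The second passes the bytes on as if they already were UTF-16LE: two ASCII letters such as "ab" become the single unit U+6261, which lies in the CJK Unified Ideographs block, which is why such text "looks Chinese".

```c
#include <stddef.h>
#include <stdint.h>

/* Misreading 1: one byte per character. For bytes 0x00..0x7F and
   0xA0..0xFF this is what a Windows-1252 or ISO 8859-1 conversion
   produces; 0x80..0x9F is not special-cased in this sketch. */
size_t widen_byte_per_char(const unsigned char *s, size_t len, uint16_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = s[i];
    return len;
}

/* Misreading 2: treat the bytes as if they already were UTF-16LE,
   so every two bytes form one 16-bit unit. */
size_t pair_as_utf16le(const unsigned char *s, size_t len, uint16_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        out[n++] = (uint16_t)(s[i] | (s[i + 1] << 8));
    return n;   /* an odd trailing byte is ignored in this sketch */
}
```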

HSE

Biterider,

Quote from: Biterider on December 14, 2022, 05:28:13 AM
The only problem is reading a UTF8 stream, but there is no solution for that if the code point is beyond the ANSI range.  :sad:

No big problem. These days I'm not reading much Russian, Chinese or Quenya :biggrin: :biggrin:

Quote from: Biterider on December 14, 2022, 05:28:13 AM
This means that compiling for ANSI falls into the "closed ecosystem" category.  :biggrin:
What about when all internal processing is done with ANSI?

The idea behind not simplifying anything is that, when assembling for WIDE, the "ecosystem opens a little": you can read ANSI or UTF8, and always save UTF8  :biggrin:

I just can't test it, because the app crashes nicely when assembled for WIDE  :sad:  (some interface problem, I guess)

Later:
:thumbsup: Assembled for WIDE, it can read ANSI and save UTF8
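
Saving as UTF8 from a WIDE build means encoding UTF-16 back to UTF-8. Below is a minimal C sketch; utf16_to_utf8 is a hypothetical helper, not an ObjAsm routine. It writes no BOM, in line with the rule quoted at the top of the thread, joins surrogate pairs into one 4-byte sequence and replaces unpaired surrogates with U+FFFD. The caller must provide 3 output bytes per UTF-16 unit, which covers the worst case.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode len UTF-16 units as UTF-8 into out (at least 3 * len bytes).
   Returns the number of bytes written; no BOM is emitted. */
size_t utf16_to_utf8(const uint16_t *w, size_t len, unsigned char *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp = w[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {               /* high surrogate */
            if (i < len && w[i] >= 0xDC00 && w[i] <= 0xDFFF)
                cp = 0x10000 + ((cp - 0xD800) << 10) + (w[i++] - 0xDC00);
            else
                cp = 0xFFFD;                              /* unpaired high */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                  /* unpaired low */
        }
        if (cp < 0x80) {                                  /* 1 byte */
            out[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {                          /* 2 bytes */
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                        /* 3 bytes */
            out[n++] = (unsigned char)(0xE0 | (cp >> 12));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {                                          /* 4 bytes */
            out[n++] = (unsigned char)(0xF0 | (cp >> 18));
            out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}
```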

HSE

HSE

Quote from: jj2007 on December 14, 2022, 06:05:53 AM
Remember that streams are agnostic regarding codepages. For an assembler and for the Windows functions ending with -A, "Ansi" and Utf8 are the same thing. It is when you reconvert it to Utf16 that a MessageBoxW looks arabic or chinese.

I was sending a UTF8 stream to a Wide control  :eusa_snooty:

jj2007

Quote from: HSE on December 14, 2022, 06:35:49 AM
I was sending a UTF8 stream to a Wide control  :eusa_snooty:

That won't work :biggrin:

Coding something similar to wRec$(someUtf8$) shouldn't take you more than ten minutes, though: MultiByteToWideChar is your friend, of course.
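
Since MultiByteToWideChar only exists on Windows, here is a portable C sketch of the same UTF-8 to UTF-16 step. utf8_to_utf16 is a hypothetical name; it is neither MasmBasic's wRec$ nor the Windows API. Similar in spirit to calling MultiByteToWideChar with cchWideChar = 0, passing a NULL output buffer only returns the number of UTF-16 units needed, so the caller can size the buffer first. In this sketch, invalid or truncated sequences become U+FFFD, and code points above U+FFFF are written as surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert len bytes of UTF-8 to UTF-16. With out == NULL, nothing is
   written and only the required number of UTF-16 units is returned. */
size_t utf8_to_utf16(const unsigned char *s, size_t len, uint16_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t extra;                               /* continuation bytes expected */
        if (c < 0x80)                    { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; }
        else                             { cp = 0xFFFD;   extra = 0; } /* bad lead */

        size_t k = 1;                               /* bytes consumed so far */
        while (k <= extra && i + k < len && (s[i + k] & 0xC0) == 0x80) {
            cp = (cp << 6) | (s[i + k] & 0x3F);
            k++;
        }
        if (k <= extra ||                           /* truncated sequence */
            (extra == 2 && cp < 0x800) ||           /* overlong 3-byte form */
            (extra == 3 && cp < 0x10000) ||         /* overlong 4-byte form */
            (cp >= 0xD800 && cp <= 0xDFFF) ||       /* encoded surrogate */
            cp > 0x10FFFF)
            cp = 0xFFFD;
        i += k;

        if (cp >= 0x10000) {                        /* needs a surrogate pair */
            if (out) {
                cp -= 0x10000;
                out[n]     = (uint16_t)(0xD800 | (cp >> 10));
                out[n + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            }
            n += 2;
        } else {
            if (out) out[n] = (uint16_t)cp;
            n += 1;
        }
    }
    return n;
}
```

Usage: call once with out = NULL to get the length, allocate that many uint16_t units (plus one if a terminating zero is wanted), then call again with the buffer.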

HSE

Quote from: jj2007 on December 14, 2022, 09:19:26 AM
Coding something similar to wRec$(someUtf8$) shouldn't take you more than ten minutes, though: MultiByteToWideChar is your friend, of course.

Yes, but it could be more interesting to obtain the stream directly from the JSON tree processing, for the case where a new UTF8 stream is unnecessary and you want a specific text format.