The MASM Forum

Projects => ObjAsm => Topic started by: Biterider on December 07, 2020, 08:16:05 AM

Title: JSON routines
Post by: Biterider on December 07, 2020, 08:16:05 AM
Hi
JSON is a must-have when dealing with servers or high-level languages.
I recently had to try my luck with Python, which has a large number of libraries, including one for JSON.

To exchange data with my applications, I started writing my own JSON read/write routines.
There are many libraries out there, but I wanted to integrate the functionality seamlessly into my code.

My first attempt is a simple write routine (stringify) that is encapsulated in an object.
When I have a little more time, I will complete the read (parse) routine.
It's a work in progress.  :biggrin:
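For readers new to JSON, a "stringify" routine simply walks a value tree and emits text. The following is a minimal Python sketch of the idea, not Biterider's ObjAsm object. The function name `stringify` is made up for illustration, and only the escapes JSON requires are handled.

```python
def stringify(value):
    """Serialize nested dict/list/str/int/float/bool/None values to compact JSON text."""
    if value is None:
        return "null"
    if value is True:          # test bools before ints: True is also an int in Python
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        return repr(value)     # NaN/Infinity are not valid JSON and are not handled here
    if isinstance(value, str):
        out = []
        for ch in value:
            if ch == '"':
                out.append('\\"')
            elif ch == '\\':
                out.append('\\\\')
            elif ord(ch) < 0x20:
                # control characters must be escaped; the \u00XX form covers all of them
                out.append('\\u%04x' % ord(ch))
            else:
                out.append(ch)
        return '"' + ''.join(out) + '"'
    if isinstance(value, (list, tuple)):
        return '[' + ','.join(stringify(v) for v in value) + ']'
    if isinstance(value, dict):
        return '{' + ','.join(stringify(str(k)) + ':' + stringify(v)
                              for k, v in value.items()) + '}'
    raise TypeError("cannot serialize %r" % type(value))
```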

If anyone wants to try it, the source is attached.

Regards, Biterider
Title: Re: JSON routines
Post by: Biterider on December 14, 2020, 04:33:13 AM
Hi
Today I finished the JSON parser, which strictly follows the guidelines of https://www.json.org/.

The ANSI compilation target was a little tricky because the native JSON encoding is UTF-8. The solution was to use wide strings internally, which are converted to the current code page at the end.
The WIDE compilation target was easier to implement thanks to the UTF8ToWide function.
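The conversion chain for the ANSI target can be illustrated in Python. This is a sketch of the data flow only, not the ObjAsm code. It assumes the current code page is 1252; the helper names are hypothetical, and on Windows the corresponding API calls would be MultiByteToWideChar and WideCharToMultiByte.

```python
def utf8_to_wide(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16LE, the layout of Windows WIDE strings."""
    return data.decode("utf-8").encode("utf-16-le")


def wide_to_codepage(wide: bytes, codepage: str = "cp1252") -> bytes:
    """Convert UTF-16LE bytes to a single-byte code page.

    Characters the code page cannot represent are replaced with '?' in this sketch.
    """
    return wide.decode("utf-16-le").encode(codepage, errors="replace")
```

A WIDE build stops after the first step; an ANSI build also needs the second, which is where characters outside the code page are lost.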

I successfully tested the code on several JSON files that I found on the internet.

A note on numeric and boolean values: both are recognized when reading, but not converted into binary values.
It is up to the host to interpret the string value and convert it to the desired format, e.g. BYTE, WORD, DWORD, QWORD, REAL4, REAL8 etc.
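As an illustration of what "up to the host" might look like, here is a small Python sketch that turns raw JSON value strings into typed values, with a range check for a DWORD. The function names are hypothetical, and the sketch assumes the parser has already validated the token as a legal JSON number or literal.

```python
def to_dword(text: str) -> int:
    """Interpret a JSON number token as an unsigned 32-bit integer (DWORD)."""
    value = int(text)  # raises ValueError for fractions/exponents such as "1.5" or "1e3"
    if not 0 <= value <= 0xFFFFFFFF:
        raise OverflowError(f"{text} does not fit in a DWORD")
    return value


def to_real8(text: str) -> float:
    """Interpret a JSON number token as a double (REAL8)."""
    return float(text)


def to_bool(text: str) -> bool:
    """Interpret the JSON literals true/false."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a JSON boolean: {text!r}")
```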

Performance testing remains to be done over the next few days, but I am confident that it will be quite acceptable.  :biggrin:


The source is attached to the first post.

Regards, Biterider
Title: Re: JSON routines
Post by: Biterider on December 20, 2020, 01:00:01 AM
Hi
Finally, I finished JSON support and added 3 routines for encoding and decoding JSON escape sequences.

JsonEscDecode: converts a string containing escape sequences into a plain wide string
JsonEscEncode: converts a wide string containing special characters into a JSON string
JsonEscEncodeSize: calculates the required byte size to allocate a JSON string.

Note: these routines are very similar to the percent encoding of URL strings https://en.wikipedia.org/wiki/Percent-encoding  :icon_idea:

Biterider

Title: Re: JSON routines
Post by: Biterider on January 10, 2021, 05:50:53 PM
Hi
I would like to post a link to a very handy tool by fearless, written in asm, that displays JSON data as a tree structure.  :thumbsup:
It has already been mentioned a few times here in the forum; the direct link is:
https://github.com/mrfearless/cjsontree

Biterider

Title: Re: JSON routines
Post by: fearless on January 10, 2021, 08:13:29 PM
Thanks Biterider,
Hopefully it's useful to someone. It is similar to the xmltree tool that is somewhere on the forum, but more developed. It started out as a proof of concept for using the cJSON library (https://github.com/DaveGamble/cJSON), but it grew over time through iterations that fixed issues and added a few more features. I had plans to add more, but my interest only carried it so far. I found JSON preferable to XML, as it seems less cluttered and doesn't use XML's tag syntax (which can add to the size of the file/data).
Title: Re: JSON routines
Post by: avcaballero on January 10, 2021, 08:59:50 PM
https://www.guru99.com/json-vs-xml-difference.html

JSON sample

{
  "student": [

     {
        "id":"01",
        "name": "Tom",
        "lastname": "Price"
     },

     {
        "id":"02",
        "name": "Nick",
        "lastname": "Thameson"
     }
  ]   
}


XML sample

<?xml version="1.0" encoding="UTF-8" ?>
<root>
<student>
<id>01</id>
<name>Tom</name>
<lastname>Price</lastname>
</student>
<student>
<id>02</id>
<name>Nick</name>
<lastname>Thameson</lastname>
</student>
</root>


I am more used to working with XML and did not know JSON, but there does not seem to be much difference between the two. What I find interesting is that there seem to be NoSQL databases that use this structure to store data; I have never seen one, and I don't see its usefulness over an SQL database.
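One way to see how close the two formats are is to load both samples into the same data structure. A Python sketch using the standard `json` and `xml.etree.ElementTree` modules; the sample text is compacted and the helper names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

JSON_TEXT = '''{"student": [
  {"id": "01", "name": "Tom", "lastname": "Price"},
  {"id": "02", "name": "Nick", "lastname": "Thameson"}]}'''

XML_TEXT = '''<?xml version="1.0" encoding="UTF-8" ?>
<root>
<student><id>01</id><name>Tom</name><lastname>Price</lastname></student>
<student><id>02</id><name>Nick</name><lastname>Thameson</lastname></student>
</root>'''


def students_from_json(text):
    """Return the list of student records from the JSON sample."""
    return json.loads(text)["student"]


def students_from_xml(text):
    """Build the same list of dicts from the XML sample (one dict per <student>)."""
    root = ET.fromstring(text.encode("utf-8"))  # bytes, so the encoding declaration is honored
    return [{child.tag: child.text for child in student}
            for student in root.findall("student")]
```

Both functions yield the same list of dictionaries; the difference is mostly syntax and size on disk.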

Quote
MongoDB is a document database, which means it stores data in JSON-like documents. We believe this is the most natural way to think about data, and is much more expressive and powerful than the traditional row/column model.

https://www.mongodb.com/


Quote
When people use the term "NoSQL database", they typically use it to refer to any non-relational database. Some say the term "NoSQL" stands for "non SQL" while others say it stands for "not only SQL." Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables.

A common misconception is that NoSQL databases or non-relational databases don't store relationship data well. NoSQL databases can store relationship data—they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier than in SQL databases, because related data doesn't have to be split between tables.

https://www.mongodb.com/nosql-explained


Ah, BTW, thank you for the code  :thumbsup:
Title: Re: JSON routines
Post by: HSE on December 10, 2022, 02:32:28 AM
Hi Biterider!

There is a small problem in the encoding conversion of some unusual character codes (mostly accidentals  :biggrin:).

SysSetup OOP, WIN32, ANSI_STRING

    OCall [xsi].DskStreamIn::DiskStream.Init, NULL, $OfsCStr("data0.json"), GENERIC_READ, \
                                    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0
    OCall xsi.Read, addr [xsi].DskStreamIn

    OCall [xsi].DskStreamOut::DiskStream.Init, NULL, $OfsCStr("data1.json"), GENERIC_WRITE, \
                                    0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0

    New MemoryStream
    mov pBufferM, xax
    OCall pBufferM::MemoryStream.Init, xsi, 5*1024, 5*1024, -1

    OCall xsi.Write, pBufferM

    OCall xsi.Write, addr [xsi].DskStreamOut


Apparently there is no problem with pBufferM (which is shown in a RichEdit), but some byte sequences change between data0 and data1:

   E28093  ->  96
   C2A9     ->  A9
   C2AE     ->  AE

That looks correct, because code page 1252 is used as the default, but the JSON routines fail to create the tree from data1 with these changes.
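The byte changes listed above are exactly what re-encoding from UTF-8 to code page 1252 produces. A lone 0x96, 0xA9 or 0xAE byte is not valid UTF-8, which is consistent with the failure if the reader expects UTF-8 input. A small Python check; the helper names are illustrative only.

```python
def utf8_to_cp1252(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes in Windows code page 1252."""
    return data.decode("utf-8").encode("cp1252")


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For example, the en dash U+2013 is `E2 80 93` in UTF-8 but the single byte `96` in code page 1252.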

Manually replacing the first new code with 2Dh (minus sign) and erasing the other two works perfectly.

Thanks, HSE.
Title: Re: JSON routines
Post by: Biterider on December 10, 2022, 05:04:40 AM
Hi HSE
I'll look into it. Can you send me the input file (data0.json)?

Regards, Biterider
Title: Re: JSON routines
Post by: HSE on December 10, 2022, 05:42:31 AM
data0 is UTF-8 and data1 is ANSI
Title: Re: JSON routines
Post by: Biterider on December 10, 2022, 08:35:45 PM
Hi
I found the problem. Contrary to my statement in the JSON description "This implementation uses WIDE strings internally", when compiling for ANSI strings some conversions take place and the data is stored as ANSI. For this reason, the JSON file that is written back is ANSI. If you compile the application for WIDE strings, everything works as it should.
The solution is to really convert everything to WIDE strings, with the consequence that, for example, the search for a specific key must be done with a WIDE string, regardless of the TARGET_STR_TYPE setting.
Before changing this, I would like to discuss it first.  :rolleyes:

Biterider
Title: Re: JSON routines
Post by: jj2007 on December 10, 2022, 11:12:23 PM
I would avoid ANSI. You can convert Utf8 to Utf16 and back without loss, but Ansi<->Utf16 is a different story. There is a reason why the Internet runs on Utf8.
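This point is easy to check: any Unicode text survives a round trip through UTF-8 or UTF-16, but a single-byte code page can only represent the 256 characters in its table. A Python sketch; `round_trips` is a made-up helper name.

```python
def round_trips(text: str, encoding: str) -> bool:
    """True if text survives encoding to bytes and decoding back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


# En dash, Cyrillic Zhe and an emoji: fine in UTF-8/UTF-16, not all in code page 1252.
SAMPLE = "Tom \u2013 \u0416 \U0001F600"
```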
Title: Re: JSON routines
Post by: HSE on December 11, 2022, 03:35:19 AM
Quote from: Biterider on December 10, 2022, 08:35:45 PM
Contrary to my statement in the JSON description "This implementation uses WIDE strings internally"

You also state "The default encoding is UTF-8." So, ideally, something loaded as UTF-8 should be saved as UTF-8 (unless some explicit option says otherwise). But that is just a feature.

Quote from: Biterider on December 10, 2022, 08:35:45 PM
, when compiling for ANSI strings, some conversions happen and the result is saved as ANSI. For this reason, when writing back the JSON file, the result is ANSI.

If I was not doing something wrong  :biggrin:, then, from a programming point of view, this is the problem: the JSON routines save in extended ANSI encoding, but can only read reduced ANSI encoding.

Quote from: Biterider on December 10, 2022, 08:35:45 PM
Before changing this, I would like to discuss it first.  :rolleyes:

For what I want, making the JSON routines correctly load extended ANSI encoding would be more than enough  :thumbsup:

I don't know if it is worth the effort to change things so that a WIDE application can work with ANSI processing and files. If it comes for free, that would be nice  :biggrin:

Perhaps there could be this more general Json.inc and a separate JsonECMA.inc.

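For example, in Android Java the JSON stream "must be" UTF-8, but there are no limitations on the file encoding; you just have to do an independent conversion first:
import java.io.File;
import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;

File file = new File("My.json");
FileInputStream inputStream = new FileInputStream(file);
int size = inputStream.available();      // remaining bytes; for a local file usually the file size
byte[] buffer = new byte[size];
inputStream.read(buffer);                // may read fewer bytes in general; fine for a small local file
inputStream.close();

// Decode the single-byte ISO 8859-1 file bytes into a Java String (UTF-16 internally).
// For a code page 1252 file, Charset.forName("windows-1252") also maps 0x80-0x9F correctly.
String json = new String(buffer, StandardCharsets.ISO_8859_1);

JSONObject MyJson = new JSONObject(json);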




Title: Re: JSON routines
Post by: Biterider on December 11, 2022, 07:30:17 PM
Hi
Looking at some sources of information, I stumbled across the JSON RFC specification.
Specifically, chapter 8.1 (https://www.rfc-editor.org/rfc/rfc8259#section-8.1) defines:
Quote
JSON text exchanged between systems that are not part of a closed ecosystem MUST be UTF-8 encoded
I read it like this:
If you want to achieve interoperability between applications, you must use UTF8.
When you do your own thing you can do whatever you prefer...

Biterider
Title: Re: JSON routines
Post by: Biterider on December 11, 2022, 07:33:23 PM
Hi
Reading further
Quote
Implementations MUST NOT add a byte order mark (U+FEFF) to the
   beginning of a networked-transmitted JSON text.  In the interests of
   interoperability, implementations that parse JSON texts MAY ignore
   the presence of a byte order mark rather than treating it as an
   error.

This is something that needs to be added to the current implementation  :icon_idea:

What puzzles me is that U+FEFF is the BOM for UTF-16. I would expect the 3-byte BOM EF BB BF for UTF-8...  :rolleyes:


Later: I got it. The 3-byte BOM is the UTF-8 encoding of U+FEFF  :biggrin:
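Both facts are easy to confirm in Python: U+FEFF encodes to `EF BB BF` in UTF-8 and to `FF FE` in UTF-16LE. The sketch below also shows the tolerant behavior the RFC permits, skipping a leading UTF-8 BOM before parsing. `strip_utf8_bom` is an illustrative name; Python's own `utf-8-sig` codec does the same job.

```python
BOM_CHAR = "\ufeff"


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark, as RFC 8259 allows a parser to do."""
    bom = BOM_CHAR.encode("utf-8")  # b'\xef\xbb\xbf'
    return data[len(bom):] if data.startswith(bom) else data
```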

Biterider
Title: Re: JSON routines
Post by: HSE on December 11, 2022, 09:28:47 PM
Perfect  :thumbsup:

And I like the definition: I have a "closed ecosystem"  :biggrin: :biggrin:
Title: Re: JSON routines
Post by: jj2007 on December 11, 2022, 10:43:19 PM
Quote from: Biterider on December 11, 2022, 07:33:23 PM
   Implementations MUST NOT add a byte order mark (U+FEFF) to the
   beginning of a networked-transmitted JSON text.  In the interests of
   interoperability, implementations that parse JSON texts MAY ignore
   the presence of a byte order mark rather than treating it as an
   error.

A silly rule. Every parser can find the BOM in a few microseconds, set its encoding, and then ignore the two or three bytes. It is much, much more difficult to determine if a text is Ansi (which codepage?), Utf8 or Utf16.

Practically all webpages have a <meta charset="utf-8"> on top. That is a 22-byte "BOM".
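Detecting an encoding from a BOM really is just a prefix comparison. A Python sketch using the BOM constants from the standard `codecs` module; `sniff_bom` is a made-up name. Without a BOM, the caller still has to guess, for example by trying a strict UTF-8 decode first.

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) from a leading BOM, or (None, 0) if there is none."""
    # Check the longer BOMs first: the UTF-32LE BOM starts with the UTF-16LE one.
    for bom, name in ((codecs.BOM_UTF32_LE, "utf-32-le"),
                      (codecs.BOM_UTF32_BE, "utf-32-be"),
                      (codecs.BOM_UTF8, "utf-8"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```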

Quote
Later: I got it. The 3-byte BOM is the UTF-8 encoding of U+FEFF  :biggrin:

Nice find :tongue:
Title: Re: JSON routines
Post by: HSE on December 13, 2022, 12:36:26 AM
Quote from: HSE on December 10, 2022, 05:42:31 AM
data0 is UTF-8 and data1 is ANSI

I found that my use of the name "ANSI" is totally wrong!! (but almost everybody uses it the same way  :biggrin:)

The character encoding to which I refer is a superset of ISO 8859-1 (which is essentially ASCII plus the Latin-1 Supplement) in terms of printable characters, but differs from it and has additional characters. It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252" (https://en.wikipedia.org/wiki/Windows-1252).

Quote
Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."

Quite interesting  :thumbsup:
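The practical difference between windows-1252 and ISO 8859-1 lies in bytes 0x80 to 0x9F. Windows-1252 puts printable characters there, such as the en dash at 0x96 and the euro sign at 0x80, while ISO 8859-1 has C1 control codes. The bytes 0xA0 to 0xFF decode the same in both. A Python sketch; `decode_both` is a made-up name. Note that Python's cp1252 codec raises an error for the five unassigned bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D.

```python
def decode_both(data: bytes):
    """Decode the same bytes as windows-1252 and as ISO 8859-1 (Latin-1)."""
    return data.decode("cp1252"), data.decode("latin-1")
```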
Title: Re: JSON routines
Post by: jj2007 on December 13, 2022, 02:02:32 AM
Agreed. "Ansi" in its common usage is a family of character sets that comprises Windows-1252 but also the popular "DOS graphics" charset (https://en.wikipedia.org/wiki/Code_page_437#Character_set).
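The same byte can mean entirely different characters depending on which of these code pages is assumed. For example, 0xC9 is a box-drawing character in the DOS code page 437 but a capital E with acute accent in windows-1252, while the ASCII range agrees. A Python sketch; the helper name is hypothetical.

```python
def same_byte_two_pages(b: int):
    """Decode one byte as DOS code page 437 and as windows-1252."""
    data = bytes([b])
    return data.decode("cp437"), data.decode("cp1252")
```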

CP_UTF8 is a different animal.
Title: Re: JSON routines
Post by: HSE on December 14, 2022, 04:28:54 AM
Hi Biterider!

When assembling in ANSI mode, these few modifications let you read UTF-8 or ANSI files, even if it is not very efficient for ANSI (and it always saves ANSI).

Code (old Json.inc):

JSON_FILE_UTF8      equ 0                 ; input stream is UTF-8 (default)
JSON_FILE_ANSI      equ 1                 ; input stream uses the ANSI code page (CP_ACP)

Method Json.Read, uses xbx xdi xsi, pStream:$ObjPtr(Stream), xTipo:XWORD
  local lMemBlockUTF8:XWORD
   · · ·
  mov xbx, $OCall(pStream::Stream.GetSize)
  .if eax != -1
    mov lMemBlockUTF8, xax
   · · ·
         .if xTipo == JSON_FILE_UTF8 
            invoke UTF8ToWide, pMemBlockWide, pMemBlockUTF8, ebx
         .else
            ; CP_ACP: convert using the system's current ANSI code page
            invoke MultiByteToWideChar,CP_ACP,0,pMemBlockUTF8,lMemBlockUTF8,pMemBlockWide, ebx
         .endif
   · · ·
MethodEnd
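The same idea, sketched in Python for illustration only: the caller states the file encoding, and both branches produce the same internal string. The constants mirror HSE's JSON_FILE_UTF8/JSON_FILE_ANSI, `read_json_text` is a hypothetical name, and CP_ACP is assumed to be code page 1252.

```python
JSON_FILE_UTF8 = 0
JSON_FILE_ANSI = 1


def read_json_text(data: bytes, file_type: int = JSON_FILE_UTF8) -> str:
    """Decode a JSON byte stream into a (wide) string according to the caller's flag."""
    if file_type == JSON_FILE_UTF8:
        return data.decode("utf-8")
    return data.decode("cp1252")  # stands in for CP_ACP, assumed to be 1252 here
```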


Regards, HSE.
Title: Re: JSON routines
Post by: Biterider on December 14, 2022, 05:28:13 AM
Hi HSE
Good idea.  :thumbsup:
If you are thinking about compiling this thing for ANSI, there is a way to simplify it.
What about when all internal processing is done with ANSI? The only problem is reading a UTF-8 stream, but there is no solution for that if the code point is beyond the ANSI range.  :sad:

This means that compiling for ANSI falls into the "closed ecosystem" category.  :biggrin:
Such a change can be easily implemented.

Biterider
Title: Re: JSON routines
Post by: jj2007 on December 14, 2022, 06:05:53 AM
Quote from: Biterider on December 14, 2022, 05:28:13 AM
The only problem is reading a UTF-8 stream

Remember that streams are agnostic regarding codepages. For an assembler and for the Windows functions ending with -A, "Ansi" and Utf8 are the same thing. It is when you reconvert it to Utf16 that a MessageBoxW looks Arabic or Chinese.
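Since a byte stream carries no code page, what you see depends entirely on how it is decoded. The Python sketch below encodes text as UTF-8 and then decodes the bytes as if they were code page 1252, producing the familiar garbled output. `misread` is a made-up name; unassigned cp1252 bytes are replaced with U+FFFD here.

```python
def misread(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were code page 1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")
```

For example, "é" (`C3 A9` in UTF-8) comes back as the two characters "Ã©", while plain ASCII is unaffected.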
Title: Re: JSON routines
Post by: HSE on December 14, 2022, 06:07:20 AM
Biterider,

Quote from: Biterider on December 14, 2022, 05:28:13 AM
The only problem is reading a UTF-8 stream, but there is no solution for that if the code point is beyond the ANSI range.  :sad:

No big problem. These days I'm not reading much Russian, Chinese or Quenya :biggrin: :biggrin:

Quote from: Biterider on December 14, 2022, 05:28:13 AM
This means that compiling for ANSI falls into the "closed ecosystem" category.  :biggrin:
What about when all internal processing is done with ANSI?

The idea behind not simplifying anything is that, when assembling for WIDE, the "ecosystem opens a little": you can read ANSI or UTF-8, and always save UTF-8  :biggrin:

I just can't test it, because I get a nice crash when the app is assembled for WIDE  :sad:  (some interface problem, I guess)

Later:
            :thumbsup: Assembled for WIDE, it can read ANSI and save UTF-8

HSE
Title: Re: JSON routines
Post by: HSE on December 14, 2022, 06:35:49 AM
Quote from: jj2007 on December 14, 2022, 06:05:53 AM
Remember that streams are agnostic regarding codepages. For an assembler and for the Windows functions ending with -A, "Ansi" and Utf8 are the same thing. It is when you reconvert it to Utf16 that a MessageBoxW looks Arabic or Chinese.

I was sending a UTF-8 stream to a Wide control  :eusa_snooty:
Title: Re: JSON routines
Post by: jj2007 on December 14, 2022, 09:19:26 AM
Quote from: HSE on December 14, 2022, 06:35:49 AMI was sending an UTF8 stream to a Wide control  :eusa_snooty:

That won't work :biggrin:

Coding something similar to wRec$(someUtf8$) (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1419) shouldn't take you more than ten minutes, though: MultiByteToWideChar is your friend, of course.
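What such a helper has to produce for a Wide control is a zero-terminated UTF-16LE buffer. The Python sketch below mimics the usual two-pass pattern: first count the UTF-16 code units (as when MultiByteToWideChar is called with cchWideChar = 0 to get the required size), then convert. It is not MasmBasic's wRec$, and the function name is hypothetical.

```python
def utf8_to_wide_z(data: bytes) -> bytes:
    """Convert UTF-8 bytes to a zero-terminated UTF-16LE buffer."""
    text = data.decode("utf-8")
    # Pass 1: count UTF-16 code units; characters above U+FFFF need a surrogate pair.
    units = sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)
    buf = bytearray(2 * (units + 1))          # +1 unit for the terminating zero
    # Pass 2: the actual conversion into the preallocated buffer.
    encoded = text.encode("utf-16-le")
    buf[:len(encoded)] = encoded
    return bytes(buf)
```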
Title: Re: JSON routines
Post by: HSE on December 14, 2022, 10:48:59 AM
Quote from: jj2007 on December 14, 2022, 09:19:26 AM
Coding something similar to wRec$(someUtf8$) (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1419) shouldn't take you more than ten minutes, though: MultiByteToWideChar is your friend, of course.

Yes, but it could be more interesting to obtain the stream directly from the JSON tree processing, in case a new UTF-8 stream is unnecessary and you want a specific text format.