JSON routines

Started by Biterider, December 07, 2020, 08:16:05 AM


Biterider

Hi
JSON is a must-have when dealing with servers or high-level languages.
I recently had to try my luck with Python, which has a large number of libraries, including one for JSON.

To exchange data from my applications, I started writing my own JSON read/write routines.
There are many libraries out there, but I wanted to integrate the functionality seamlessly into my code.

My first attempt is a simple write routine (stringify) that is encapsulated in an object.
When I have a little more time, I will complete the read (parse) routine.
It's a work in progress.  :biggrin:

If anyone wants to try it, the source is attached.

Regards, Biterider

Biterider

Hi
Today I finished the JSON parser, which strictly follows the guidelines of https://www.json.org/.

The ANSI compilation target was a little tricky, as the native JSON format is UTF-8. The solution to this problem is to use wide strings internally, which are translated to the current code page at the end.
The WIDE compilation target was easier to implement thanks to the UTF8ToWide function.
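For illustration, a minimal C sketch of that internal path using the Win32 conversion functions (this is my own rendering of the idea, not the ObjAsm code from the attachment):

#include <windows.h>

// Sketch: translate a UTF-8 JSON text to the current ANSI code page by
// going through a wide (UTF-16) intermediate, as described above.
// Returns a heap buffer the caller must free with HeapFree, or NULL.
char *Utf8ToAnsi(const char *utf8)
{
    int wideLen = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (wideLen == 0) return NULL;

    WCHAR *wide = (WCHAR *) HeapAlloc(GetProcessHeap(), 0, wideLen * sizeof(WCHAR));
    if (!wide) return NULL;
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, wideLen);

    int ansiLen = WideCharToMultiByte(CP_ACP, 0, wide, -1, NULL, 0, NULL, NULL);
    if (ansiLen == 0) {
        HeapFree(GetProcessHeap(), 0, wide);
        return NULL;
    }
    char *ansi = (char *) HeapAlloc(GetProcessHeap(), 0, ansiLen);
    if (ansi)
        WideCharToMultiByte(CP_ACP, 0, wide, -1, ansi, ansiLen, NULL, NULL);

    HeapFree(GetProcessHeap(), 0, wide);
    return ansi;
}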

I successfully tested the code on several JSON files that I found on the internet.

A note on numeric and boolean values: both are recognized when reading, but not translated into a value.
It is up to the host to interpret the string value and convert it to the desired format, e.g. BYTE, WORD, DWORD, QWORD, REAL4, REAL8 etc.
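A hedged C illustration of that host-side step (the helper names and the chosen target types are mine, not part of the attached code):

#include <stdlib.h>
#include <string.h>

// Hypothetical host-side helpers: the parser returns the raw string,
// and the caller converts it to whatever width or format it needs.
unsigned long JsonToDword(const char *s) { return strtoul(s, NULL, 10); }
double        JsonToReal8(const char *s) { return strtod(s, NULL); }
int           JsonToBool (const char *s) { return strcmp(s, "true") == 0; }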

Performance testing remains for the next few days, but I am confident that it will be very acceptable.  :biggrin:


The source is attached to the first post.

Regards, Biterider

Biterider

Hi
Finally, I finished JSON support and added 3 routines for encoding and decoding JSON escape sequences.

JsonEscDecode: converts a string containing escape sequences into a plain wide string
JsonEscEncode: converts a wide string containing special characters into a JSON string
JsonEscEncodeSize: calculates the required byte size to allocate a JSON string.

Note: these routines are very similar to the percent encoding of URL strings https://en.wikipedia.org/wiki/Percent-encoding  :icon_idea:
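To illustrate the encoding direction, a minimal C sketch of the same idea (the interface and name are mine, not the ones from the attachment):

#include <stdio.h>
#include <wchar.h>

// Map the characters JSON requires to be escaped onto their sequences and
// emit any other control character below 20h as \uXXXX.
void JsonEscEncodeSketch(const wchar_t *src, FILE *out)
{
    for (; *src; src++) {
        switch (*src) {
        case L'"':  fputws(L"\\\"", out); break;
        case L'\\': fputws(L"\\\\", out); break;
        case L'\b': fputws(L"\\b", out);  break;
        case L'\f': fputws(L"\\f", out);  break;
        case L'\n': fputws(L"\\n", out);  break;
        case L'\r': fputws(L"\\r", out);  break;
        case L'\t': fputws(L"\\t", out);  break;
        default:
            if (*src < 0x20)
                fwprintf(out, L"\\u%04X", (unsigned)*src);
            else
                fputwc(*src, out);
        }
    }
}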

Biterider


Biterider

Hi
I would like to post a link to a very handy tool written in asm by fearless to display the JSON information in the form of a tree structure.  :thumbsup:
It has already been mentioned a few times here in the forum, the direct link is:
https://github.com/mrfearless/cjsontree

Biterider


fearless

Thanks Biterider,
Hopefully it's useful to someone. It is similar to the xmltree tool that is on the forums somewhere, but more developed. It started more as a proof of concept for using the cJSON library (https://github.com/DaveGamble/cJSON), but grew over time with iterations to fix issues and add a few more features. I had plans to add some more stuff, but my interest only carried it so far. I found using JSON preferable to XML, as it seems less crowded and doesn't use the tag syntax of XML (which can add to the size of the file/data).
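For anyone curious, a minimal example of the cJSON API the tool is built on (my own sketch, error handling kept short):

#include <stdio.h>
#include <stdlib.h>
#include "cJSON.h"   // https://github.com/DaveGamble/cJSON

// Parse a JSON text, read one field, then print the tree back out.
int main(void)
{
    const char *text = "{\"student\": [{\"id\": \"01\", \"name\": \"Tom\"}]}";

    cJSON *root = cJSON_Parse(text);
    if (!root) {
        printf("parse error before: %s\n", cJSON_GetErrorPtr());
        return 1;
    }

    cJSON *student = cJSON_GetArrayItem(cJSON_GetObjectItem(root, "student"), 0);
    cJSON *name    = cJSON_GetObjectItem(student, "name");
    if (cJSON_IsString(name))
        printf("name = %s\n", name->valuestring);

    char *printed = cJSON_Print(root);   // pretty-printed, heap-allocated
    printf("%s\n", printed);

    free(printed);                       // cJSON_Print uses malloc by default
    cJSON_Delete(root);
    return 0;
}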

avcaballero

https://www.guru99.com/json-vs-xml-difference.html

JSON sample

{
  "student": [

     {
        "id":"01",
        "name": "Tom",
        "lastname": "Price"
     },

     {
        "id":"02",
        "name": "Nick",
        "lastname": "Thameson"
     }
  ]   
}


XML sample

<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>


I am more used to working with XML and did not know JSON, but there does not seem to be much difference between one and the other. What I find interesting is that there seem to be NoSQL databases that use this structure to store data, which I have never used, and I don't see their usefulness over SQL databases.

Quote
MongoDB is a document database, which means it stores data in JSON-like documents. We believe this is the most natural way to think about data, and is much more expressive and powerful than the traditional row/column model.

https://www.mongodb.com/


Quote
When people use the term "NoSQL database", they typically use it to refer to any non-relational database. Some say the term "NoSQL" stands for "non SQL" while others say it stands for "not only SQL." Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables.

A common misconception is that NoSQL databases or non-relational databases don't store relationship data well. NoSQL databases can store relationship data—they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier than in SQL databases, because related data doesn't have to be split between tables.

https://www.mongodb.com/nosql-explained


Ah, BTW, thank you for the code  :thumbsup:

HSE

Hi Biterider!

There is a little problem in the encoding translation of some unusual codes (mostly special characters  :biggrin:).

SysSetup OOP, WIN32, ANSI_STRING

    ; Open data0.json for reading (the Init parameters mirror CreateFile)
    OCall [xsi].DskStreamIn::DiskStream.Init, NULL, $OfsCStr("data0.json"), GENERIC_READ, \
                                    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0
    OCall xsi.Read, addr [xsi].DskStreamIn              ;Parse the JSON from the input stream

    ; Create data1.json for writing
    OCall [xsi].DskStreamOut::DiskStream.Init, NULL, $OfsCStr("data1.json"), GENERIC_WRITE, \
                                    0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0

    ; Create a 5KB growable memory buffer
    New MemoryStream
    mov pBufferM, xax
    OCall pBufferM::MemoryStream.Init, xsi, 5*1024, 5*1024, -1

    OCall xsi.Write, pBufferM                           ;Write the JSON tree to the memory buffer

    OCall xsi.Write, addr [xsi].DskStreamOut            ;Write the JSON tree back to data1.json


Apparently there is no problem in pBuffer (which is shown in a RichEdit), but some codes are changed between data0 and data1:

   E28093  ->  96
   C2A9    ->  A9
   C2AE    ->  AE

That looks correct, because codepage 1252 is used as default, but the JSON routines fail to create the tree from data1 with these changes.

Manually replacing the first new code with 2Dh (minus sign) and erasing the other two works perfectly.
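For reference, those are exactly the bytes the default western code page produces: U+2013 (EN DASH, E2 80 93 in UTF-8) is 96h in codepage 1252, U+00A9 (©) is A9h and U+00AE (®) is AEh. A tiny C check (my own, just to confirm the mapping):

#include <windows.h>
#include <stdio.h>

// Converting these code points to codepage 1252 yields exactly 96h, A9h, AEh.
int main(void)
{
    const WCHAR wide[] = { 0x2013, 0x00A9, 0x00AE, 0 };   // en dash, (c), (r)
    char ansi[8];
    WideCharToMultiByte(1252, 0, wide, -1, ansi, sizeof(ansi), NULL, NULL);
    for (int i = 0; ansi[i]; i++)
        printf("%02X ", (unsigned char)ansi[i]);          // prints: 96 A9 AE
    return 0;
}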

Thanks, HSE.
Equations in Assembly: SmplMath

Biterider

Hi HSE
I'll look into it. Can you send me the input file (data0.json)?

Regards, Biterider

HSE

data0 is UTF-8 and data1 is ANSI
Equations in Assembly: SmplMath

Biterider

Hi
I found the problem. Contrary to my statement in the JSON description "This implementation uses WIDE strings internally", when compiling for ANSI strings some conversions happen and the result is stored as ANSI. For this reason, when writing the JSON file back, the result is ANSI. If you compile the application for WIDE strings, everything works as it should.
The solution to this situation is to really convert everything to WIDE strings, with the consequence that, for example, the search for a specific key must be done with a WIDE string, regardless of the TARGET_STR_TYPE setting.
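A minimal C sketch of what that would mean for an ANSI build (FindKeyW and FindKeyA are hypothetical stand-ins for the real lookup, just to show the conversion point):

#include <windows.h>
#include <stddef.h>

// The tree keeps WIDE keys only; an ANSI build converts the caller's key
// once and then searches with the wide form.
static void *FindKeyW(const wchar_t *key) { (void)key; return NULL; }  // stub

void *FindKeyA(const char *ansiKey)
{
    wchar_t wide[256];
    MultiByteToWideChar(CP_ACP, 0, ansiKey, -1, wide, 256);
    return FindKeyW(wide);
}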
Before changing this, I would like to discuss it first.  :rolleyes:

Biterider

jj2007

I would avoid ANSI. You can perfectly convert Utf8 to Utf16 and back, but Ansi<->Utf16 is a different story. There is a reason why the Internet runs on Utf8.
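A small C demonstration of the difference (my own; the ANSI result depends on the system code page, codepage 1252 assumed here):

#include <windows.h>
#include <stdio.h>
#include <wchar.h>

// UTF-16 -> UTF-8 -> UTF-16 is lossless; UTF-16 -> ANSI is not. The Greek
// delta below survives the UTF-8 round trip but has no slot in cp1252.
int main(void)
{
    const WCHAR wide[] = { L'a', 0x0394, L'z', 0 };   // 'a', GREEK DELTA, 'z'
    char utf8[16], ansi[16];
    WCHAR back[16];

    WideCharToMultiByte(CP_UTF8, 0, wide, -1, utf8, sizeof(utf8), NULL, NULL);
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, back, 16);
    wprintf(L"utf8 round trip ok: %d\n", wcscmp(wide, back) == 0);   // 1

    BOOL lost = FALSE;
    WideCharToMultiByte(CP_ACP, 0, wide, -1, ansi, sizeof(ansi), NULL, &lost);
    wprintf(L"ansi lost characters: %d\n", lost);   // 1 on a 1252 system
    return 0;
}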

HSE

Quote from: Biterider on December 10, 2022, 08:35:45 PM
Contrary to my statement in the JSON description "This implementation uses WIDE strings internally"

You also state "The default encoding is UTF-8." Then, ideally, something loaded as UTF-8 should be saved as UTF-8 (unless some explicit option says otherwise). But that is just a feature.

Quote from: Biterider on December 10, 2022, 08:35:45 PM
, when compiling for ANSI strings, some conversions happen and the result is saved as ANSI. For this reason, when writing back the JSON file, the result is ANSI.

If I am not doing something wrong  :biggrin:, then, from a programming point of view, here is the problem: the JSON routines save in extended ANSI encoding, but can only read reduced ANSI encoding.

Quote from: Biterider on December 10, 2022, 08:35:45 PM
Before changing this, I would like to discuss it first.  :rolleyes:

For what I want, making the JSON routines correctly load extended ANSI encoding could be more than enough  :thumbsup:

I don't know if it is worth the effort to change things so that a WIDE application can work with ANSI processes and files. If it comes for free, that would be nice  :biggrin:

Perhaps there could be a more general Json.inc and a JsonECMA.inc.

For example, in Android Java a JSON stream "must be" UTF-8. But there are no limitations on the file encoding; you just have to make an independent conversion first:
File file = new File("My.json");
FileInputStream inputStream = new FileInputStream(file);
int size = inputStream.available();
byte[] buffer = new byte[size];
inputStream.read(buffer);
inputStream.close();

String json = new String(buffer, "ISO_8859_1");  // the ANSI conversion: decode the Latin-1 bytes into a string

JSONObject MyJson = new JSONObject(json);




Equations in Assembly: SmplMath

Biterider

Hi
Looking at some sources of information, I stumbled across the RFC JSON specification (RFC 8259).
Specifically, chapter 8.1 defines:
Quote
JSON text exchanged between systems that are not part of a closed ecosystem MUST be UTF-8 encoded
I read it like this:
If you want to achieve interoperability between applications, you must use UTF8.
When you do your own thing you can do whatever you prefer...

Biterider

Biterider

Hi
Reading further
Quote
Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

This is something that needs to be added to the current implementation  :icon_idea:

What puzzles me is that U+FEFF is the BOM for UTF-16. I would expect the 3-byte BOM EF BB BF for UTF-8...  :rolleyes:


Later: I got it. The 3-byte BOM is the UTF-8 encoding of U+FEFF  :biggrin:
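A minimal C sketch of that addition (the function name is mine):

// Skip a leading UTF-8 BOM (EF BB BF, the UTF-8 encoding of U+FEFF)
// before handing the text to the parser; everything else passes through.
const char *SkipUtf8Bom(const char *text)
{
    if ((unsigned char)text[0] == 0xEF &&
        (unsigned char)text[1] == 0xBB &&
        (unsigned char)text[2] == 0xBF)
        return text + 3;
    return text;
}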

Biterider

HSE

Perfect  :thumbsup:

And I like the definition: I have a "closed ecosystem"  :biggrin: :biggrin:
Equations in Assembly: SmplMath